Control 2.4: Business Continuity and Disaster Recovery
Control ID: 2.4 | Pillar: Management | Regulatory Reference: GLBA 501(b), SOX 404, FINRA 4511, OCC 2011-12, FFIEC BC/DR Guidance | Last UI Verified: January 2026 | Governance Levels: Baseline / Recommended / Regulated | Last Verified: 2026-02-03
Agent 365 Architecture Update
Agent 365 Observability supports business continuity planning by providing unified telemetry for disaster recovery testing and agent health monitoring across platforms. See Unified Agent Governance for observability architecture.
Objective
Ensure critical Copilot Studio agents remain available or can be rapidly restored following service disruptions, regional outages, or disaster scenarios. Establish backup procedures, recovery objectives, failover capabilities, and regular testing to meet regulatory requirements for operational resilience.
Why This Matters for FSI
- GLBA 501(b): Protect against threats to availability - resilient agent architecture required
- SOX 404: Internal controls must include documented recovery procedures
- FINRA 4511: Business continuity requirements extend to agent availability
- OCC 2011-12: Operational risk management includes BC/DR for critical systems
- FFIEC BC/DR: Recovery testing required with documented results
Control Description
This control addresses key FSI requirements for operational resilience of AI agents:
| Capability | Description | FSI Relevance |
|---|---|---|
| Recovery Time Objective (RTO) | Maximum acceptable downtime | Regulatory examination focus |
| Recovery Point Objective (RPO) | Maximum acceptable data loss | Data integrity requirements |
| Geo-Redundancy | Cross-region failover capability | Regional outage protection |
| Regular Testing | Annual DR test with documented results | FFIEC examination readiness |
Agent Criticality Classification
| Tier | Description | RTO | RPO | Examples |
|---|---|---|---|---|
| Tier 1 - Critical | Business cannot operate | <1 hour | <15 min | Trading assistant, Payment processor |
| Tier 2 - High | Significant impact | <4 hours | <1 hour | Customer service, Compliance agent |
| Tier 3 - Medium | Moderate impact | <24 hours | <4 hours | Internal HR bot, IT help desk |
| Tier 4 - Low | Minimal impact | <72 hours | <24 hours | Personal productivity agents |
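
The tier table can be encoded as data so that measured recovery times from a DR test are compared against targets mechanically. The following is a minimal Python sketch, not part of the FSI-AgentGov-Solutions tooling. The names `TierTarget`, `TIER_TARGETS`, and `find_gaps` are hypothetical, and the sketch assumes the table's `<` bounds are strict upper limits.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TierTarget:
    """RTO/RPO upper bounds for one criticality tier (hypothetical structure)."""
    name: str
    rto: timedelta
    rpo: timedelta


# Values copied from the Agent Criticality Classification table.
TIER_TARGETS = {
    1: TierTarget("Critical", timedelta(hours=1), timedelta(minutes=15)),
    2: TierTarget("High", timedelta(hours=4), timedelta(hours=1)),
    3: TierTarget("Medium", timedelta(hours=24), timedelta(hours=4)),
    4: TierTarget("Low", timedelta(hours=72), timedelta(hours=24)),
}


def find_gaps(tier: int, measured_rto: timedelta, measured_rpo: timedelta) -> list[str]:
    """Return a list of target misses; an empty list means both targets were met."""
    target = TIER_TARGETS[tier]
    gaps = []
    # Assumption: targets are written as "<", so a value equal to the bound is a miss.
    if measured_rto >= target.rto:
        gaps.append(f"RTO {measured_rto} not under {target.rto}")
    if measured_rpo >= target.rpo:
        gaps.append(f"RPO {measured_rpo} not under {target.rpo}")
    return gaps
```

For example, `find_gaps(2, timedelta(hours=3), timedelta(minutes=90))` reports a gap for RPO only, because a 3-hour recovery is inside the Tier 2 RTO while 90 minutes of data loss exceeds the Tier 2 RPO.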
Key Configuration Points
- Classify agents by criticality tier with defined RTO/RPO
- Create secondary region environments for geo-redundancy
- Configure automated solution backup with appropriate retention
- Deploy agents to DR environment with connection references updated
- Create DR runbook with declaration criteria and recovery procedures
- Establish service health monitoring and alerting
- Schedule and document DR testing at least annually (quarterly for Zone 3)
Automation Available
See DR Testing Framework in FSI-AgentGov-Solutions for automated validation of AI agent disaster recovery procedures against defined RTO/RPO targets with gap identification.
Zone-Specific Requirements
| Zone | Requirement | Rationale |
|---|---|---|
| Zone 1 (Personal) | Weekly backup; 72-hour RTO acceptable; no DR environment required | Low criticality, minimal business impact |
| Zone 2 (Team) | Daily backup; 4-hour RTO; warm standby DR recommended; annual testing | Shared agents require higher availability |
| Zone 3 (Enterprise) | Continuous backup; <1-hour RTO; hot standby required; quarterly testing | Customer-facing, regulatory examination focus |
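
The backup and testing cadences in the zone table lend themselves to a simple staleness check. The sketch below is illustrative only: `ZONE_POLICY`, `backup_is_stale`, and `dr_test_overdue` are hypothetical names, "annual" is approximated as 365 days and "quarterly" as 91 days, and Zone 3's continuous backup is not evaluated because the table gives no fixed interval for it.

```python
from datetime import datetime, timedelta

# Cadences taken from the Zone-Specific Requirements table.
# None means the table states no fixed interval for that zone.
# Assumption: annual = 365 days, quarterly = 91 days.
ZONE_POLICY = {
    1: {"max_backup_age": timedelta(weeks=1), "test_interval": None},
    2: {"max_backup_age": timedelta(days=1), "test_interval": timedelta(days=365)},
    3: {"max_backup_age": None, "test_interval": timedelta(days=91)},
}


def backup_is_stale(zone: int, last_backup: datetime, now: datetime) -> bool:
    """True if the latest backup is older than the zone's backup interval."""
    max_age = ZONE_POLICY[zone]["max_backup_age"]
    if max_age is None:
        # Continuous backup (Zone 3) needs a different check; not covered here.
        return False
    return now - last_backup > max_age


def dr_test_overdue(zone: int, last_test: datetime, now: datetime) -> bool:
    """True if the last DR test is older than the zone's testing interval."""
    interval = ZONE_POLICY[zone]["test_interval"]
    if interval is None:
        # The table sets no testing cadence for this zone.
        return False
    return now - last_test > interval
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the functions, keeps the checks deterministic when they are replayed against historical evidence.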
Roles & Responsibilities
| Role | Responsibility |
|---|---|
| Power Platform Admin | Environment management, backup configuration, DR environment setup |
| IT Operations | DR activation, failover execution, monitoring |
| AI Governance Lead | Criticality classification, RTO/RPO approval |
| Compliance Officer | DR test documentation, regulatory examination readiness |
Related Controls
| Control | Relationship |
|---|---|
| 2.1 - Managed Environments | DR environment governance |
| 2.3 - Change Management | Solution versioning for backup |
| 3.1 - Agent Inventory | Critical agent identification |
| 3.4 - Incident Reporting | DR event documentation |
Implementation Playbooks
Step-by-Step Implementation
This control has detailed playbooks for implementation, automation, testing, and troubleshooting:
- Portal Walkthrough — Step-by-step portal configuration
- PowerShell Setup — Automation scripts
- Verification & Testing — Test cases and evidence collection
- Troubleshooting — Common issues and resolutions
Verification Criteria
Confirm control effectiveness by verifying:
- All agents classified by criticality tier with documented RTO/RPO
- Automated backups running on schedule with retention policy enforced
- DR environment configured and current with production
- DR runbook documented with declaration criteria and recovery steps
- DR test completed at the required cadence (at least annually; quarterly for Zone 3) with RTO/RPO targets met
- Test results documented and retained for regulatory examination
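
One way to track these criteria is as a checklist that reports anything still open. This is a minimal sketch under stated assumptions: the criterion keys and the `unmet_criteria` helper are hypothetical labels for the six bullets above, and a criterion with no recorded evidence is treated as unmet.

```python
# Hypothetical keys, one per verification bullet, in checklist order.
CRITERIA = (
    "agents_classified",
    "backups_on_schedule",
    "dr_environment_current",
    "runbook_documented",
    "dr_test_met_targets",
    "test_results_retained",
)


def unmet_criteria(evidence: dict[str, bool]) -> list[str]:
    """Return criteria that are missing or False, preserving checklist order."""
    return [name for name in CRITERIA if not evidence.get(name, False)]
```

An empty result means every criterion has affirmative evidence recorded. Anything returned is a candidate item for the examination-readiness gap list.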
Additional Resources
- Microsoft Learn: Power Platform backup and restore
- Microsoft Learn: Environment strategy
- Microsoft Learn: Business continuity guidance
- Microsoft Learn: Azure regions for Power Platform
Updated: January 2026 | Version: v1.2 | UI Verification Status: Current