Control 2.4: Business Continuity and Disaster Recovery
Control ID: 2.4
Pillar: Management
Regulatory Reference: FFIEC Business Continuity Management (BCM) Handbook (Nov 2019); FINRA Rule 4370 (Business Continuity Plans); FINRA Rule 4511 / Notice 25-07; SEC 17a-3 / 17a-4 (record retention and reconstruction); SOX 404 (internal controls); GLBA 501(b) (Safeguards Rule – availability); OCC Heightened Standards (12 CFR 30 Appendix D – risk governance); Interagency Paper on Sound Practices to Strengthen Operational Resilience (Oct 2020 – OCC / Fed / FDIC)
Last UI Verified: April 2026
Governance Levels: Baseline / Recommended / Regulated
Agent 365 Architecture Update
Agent 365 Observability supports business continuity planning by providing unified telemetry for disaster recovery testing and agent health monitoring across platforms. See Unified Agent Governance for observability architecture.
Microsoft Shared-Responsibility Boundary
Microsoft does not publish a contractual Recovery Time Objective (RTO) or Recovery Point Objective (RPO) for Power Platform, Dataverse, or Copilot Studio. The Online Services SLA covers availability (financially backed 99.9%) — not recovery time after a destructive event or regional outage. Customers remain responsible for defining RTO/RPO, deploying secondary-region environments, exporting solutions and Dataverse data, and exercising failover. This control documents the customer-side controls that supplement the Microsoft platform.
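To make the gap between an availability SLA and a recovery objective concrete, the following minimal sketch converts an availability percentage into the downtime it permits. The function name and the 30-day and 365-day periods are assumptions for illustration; the SLA's actual measurement period and exclusions are defined in the Microsoft Online Services SLA document, not here.

```python
# Illustrative arithmetic only. The function name and period lengths are
# assumptions for this sketch, not figures published by Microsoft.

def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime an availability target permits over period_days."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


monthly = allowed_downtime_minutes(99.9)       # assumed 30-day month
yearly = allowed_downtime_minutes(99.9, 365)   # assumed 365-day year
print(f"{monthly:.1f} min/month, {yearly / 60:.2f} h/year")
```

Under these assumptions a 99.9% target tolerates roughly 43 minutes of downtime in a 30-day month. That figure describes cumulative uptime only; it says nothing about how long recovery from a single destructive event may take, which is why RTO and RPO remain customer-defined.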
Objective
Define how the organization keeps critical Copilot Studio agents available, or restores them within board-approved time and data-loss windows, following a service disruption, regional outage, accidental deletion, malicious destruction, or third-party (Microsoft) incident. Establish backup procedures, recovery objectives, failover patterns, and an exercise cadence that support the FSI regulatory expectations cited above.
Why This Matters for FSI
- FFIEC BCM Handbook (Nov 2019): Examiners expect a documented Business Impact Analysis (BIA), explicit RTO/RPO per critical system, tested recovery procedures, and evidence that third-party (cloud) dependencies are part of the resilience program. AI agents that handle customer interactions, surveillance, or trading workflows are in scope as critical systems.
- FINRA Rule 4370 (Business Continuity Plans): Member firms must maintain a written BCP addressing mission-critical systems, alternate communications, data backup and recovery, and operational assessment. Agents that participate in customer-facing workflows or recordkeeping are mission-critical and must appear in the BCP inventory.
- FINRA Rule 4511 / FINRA Notice 25-07: Required books and records — including supervisory artifacts created by or supporting AI agents — must remain recoverable for the prescribed retention period; AI-related records receive heightened scrutiny under Notice 25-07.
- SEC Rule 17a-3 / 17a-4: Records must be preservable and reconstructable for the full retention term, in non-erasable / non-rewritable form (or equivalent audit-trail electronic recordkeeping under the 2022 amendment). DR planning helps demonstrate the firm's continuing ability to reconstruct required records after a disruption.
- SOX 404: Internal control over financial reporting includes IT general controls (ITGCs) for backup and recovery; documented and tested DR procedures for in-scope agents support management's annual assessment.
- GLBA 501(b) / Safeguards Rule: Administrative, technical, and physical safeguards must protect the availability of customer information; this requires a tested recovery program for systems that process or store NPI.
- OCC Heightened Standards – 12 CFR 30 Appendix D: Covered banks must maintain a Risk Governance Framework that explicitly addresses operational resilience, including independent assurance over recovery capabilities for critical operations.
- Interagency Paper on Sound Practices to Strengthen Operational Resilience (Oct 2020): Joint OCC / Federal Reserve / FDIC guidance — large firms should set tolerances for disruption, map dependencies (including cloud and AI), and exercise against severe-but-plausible scenarios.
Control Description
This control applies the customer-side resilience pattern for Copilot Studio and Power Platform agents:
| Capability | Description | FSI Relevance |
|---|---|---|
| Business Impact Analysis (BIA) | Per-agent classification with criticality, dependencies, RTO, RPO | Required by FFIEC BCM and FINRA 4370 |
| Recovery Time Objective (RTO) | Customer-defined maximum tolerable downtime | Examiner focus; not a Microsoft commitment |
| Recovery Point Objective (RPO) | Customer-defined maximum tolerable data loss | Bounded by Dataverse system-backup cadence and customer export frequency |
| Microsoft system backups | Automatic Dataverse backups (production: retained 28 days; sandbox: 7 days) | Baseline platform protection — does not replace customer-managed exports |
| Customer-managed exports | Solution exports, Dataverse data exports, environment variable / connection-reference snapshots | Supports cross-region recovery and meets reconstruction obligations |
| Secondary-region environment | Customer-provisioned Power Platform environment in a different Azure region | Required for regional-outage scenarios; Microsoft does not auto-failover Dataverse cross-region |
| Agent identity continuity | Entra app registrations, service principals, and Entra Agent IDs are tenant-scoped (global) | Survives single-region outages without re-registration; federated credentials replicate globally |
| Tabletop and live failover exercises | Documented scenario tests with measured outcomes | FFIEC and Interagency Paper expect evidence of exercise, not just plans |
Agent Criticality Classification
The tier table below is a starting framework. Each firm should ratify RTO / RPO numbers through its BCP committee against board-approved disruption tolerances.
| Tier | Description | Indicative RTO | Indicative RPO | Examples |
|---|---|---|---|---|
| Tier 1 — Critical | Business cannot operate; customer or market impact | < 1 hour | < 15 min | Trading-floor assistant, payment instruction agent |
| Tier 2 — High | Significant operational impact | < 4 hours | < 1 hour | Customer service agent, supervisory / surveillance agent |
| Tier 3 — Medium | Moderate internal impact | < 24 hours | < 4 hours | Internal HR bot, IT help desk |
| Tier 4 — Low | Minimal impact | < 72 hours | < 24 hours | Personal productivity agents |
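The tier table above can be held as data so that exercise results are compared against it mechanically. The sketch below is one possible encoding, not a prescribed format: the `Tier` class, the `TIERS` mapping, and `meets_targets` are hypothetical names, and the values are the indicative ones from the table, which a firm would replace with its ratified numbers.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta  # indicative maximum tolerable downtime
    rpo: timedelta  # indicative maximum tolerable data loss


# Indicative values copied from the table above (hypothetical encoding).
TIERS = {
    1: Tier("Critical", timedelta(hours=1), timedelta(minutes=15)),
    2: Tier("High", timedelta(hours=4), timedelta(hours=1)),
    3: Tier("Medium", timedelta(hours=24), timedelta(hours=4)),
    4: Tier("Low", timedelta(hours=72), timedelta(hours=24)),
}


def meets_targets(tier: int, measured_rto: timedelta, measured_rpo: timedelta) -> bool:
    """True when both measured values are strictly below the tier's limits."""
    t = TIERS[tier]
    return measured_rto < t.rto and measured_rpo < t.rpo


# Example: a Tier 2 agent recovered in 3 hours with 30 minutes of data loss.
print(meets_targets(2, timedelta(hours=3), timedelta(minutes=30)))
```

The comparison is strict because the table states its limits with "<"; a firm whose ratified tolerances are inclusive would change the operator accordingly.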
Microsoft Platform Recovery Behaviors (April 2026)
| Layer | Microsoft-managed | Customer-managed |
|---|---|---|
| Dataverse system backup | Continuous backup; production environments retained 28 days, sandbox 7 days | Trigger manual backups before risky changes; export to long-term storage for retention beyond 28 days |
| Dataverse restore | In-place or to new environment within the same region (subject to capacity) | Cross-region restore is not a native operation — requires solution + data redeployment to a pre-provisioned secondary environment |
| Copy environment | Full or minimal copy supported between environments | Use to refresh DR environment from production on a defined cadence |
| Reset environment | Available for sandbox only | Not a recovery primitive for production |
| Copilot Studio agent portability | Agents move via solution export / import | Knowledge sources, environment variables, connection references, and certain custom topics may need separate handling — verify component coverage |
| Entra Agent ID and service principals | Tenant-global; not region-bound | Document credential rotation and conditional access for the DR environment |
| Service availability SLA | 99.9% financially backed | SLA covers uptime, not recovery — define your own RTO / RPO |
Key Configuration Points
- Maintain a Business Impact Analysis (BIA) covering every Zone 2 and Zone 3 agent, with assigned criticality tier, RTO, RPO, dependencies (data sources, connectors, knowledge sources, downstream systems), and named recovery owner
- Provision a secondary-region Power Platform environment for each production environment hosting Tier 1 or Tier 2 agents, in a different Azure region within the same regulatory geography
- Schedule customer-managed solution exports (managed solutions) and Dataverse data exports on a cadence that meets the agent's RPO target; store exports in immutable / WORM-capable storage to support SEC 17a-4 reconstruction
- Document and apply the same DLP, Conditional Access, and identity policies in the secondary-region environment as in production (see Controls 1.x and 2.1)
- Maintain a written DR runbook covering declaration authority, communication plan, failover steps per tier, failback steps, and post-event evidence collection
- Subscribe required admin distribution lists to Microsoft 365 Service Health and Message Center alerts for Power Platform, Dataverse, Copilot Studio, and Entra ID
- Schedule and document recovery exercises: at minimum annually for Zone 2 and quarterly for Zone 3 (consistent with FFIEC and FINRA 4370 expectations); retain evidence for the regulatory retention period
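The export-cadence point above can be sanity-checked with simple arithmetic. The sketch below assumes customer-managed exports are the only recovery source for a given dataset (Microsoft system backups may tighten the bound in practice) and that each export captures state as of its start time. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(export_interval: timedelta,
                         export_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on data loss if a failure hits just before the next export completes.

    The last usable export then reflects state from one full interval plus
    one export run earlier.
    """
    return export_interval + export_duration


def cadence_meets_rpo(export_interval: timedelta,
                      rpo: timedelta,
                      export_duration: timedelta = timedelta(0)) -> bool:
    """True when the worst-case loss implied by the cadence fits inside the RPO."""
    return worst_case_data_loss(export_interval, export_duration) <= rpo


# Example: a daily export cannot by itself satisfy a 1-hour RPO.
print(cadence_meets_rpo(timedelta(days=1), timedelta(hours=1)))
# Example: a 10-minute cadence with 3-minute exports fits a 15-minute RPO.
print(cadence_meets_rpo(timedelta(minutes=10), timedelta(minutes=15), timedelta(minutes=3)))
```

Where the check fails, the gap has to be closed by another mechanism, such as platform backups or more frequent data export, and that dependency belongs in the BIA entry.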
Automation Available
See DR Testing Framework in FSI-AgentGov-Solutions for automated validation of AI agent disaster recovery procedures against defined RTO/RPO targets, with gap identification and exercise evidence packaging.
Zone-Specific Requirements
| Zone | Requirement | Rationale |
|---|---|---|
| Zone 1 (Personal) | Listed in BIA only if used for any record-bearing activity. Rely on Microsoft system backups (28-day Dataverse retention). No secondary-region environment required. RTO ≤ 72 hours acceptable. | Minimal regulatory exposure; recovery cost should not exceed business impact. |
| Zone 2 (Team) | Mandatory BIA entry. Customer-managed solution export at least daily; weekly export retained ≥ 90 days. Warm standby in secondary region recommended for any agent supporting recordkeeping or customer interaction. RTO ≤ 4 hours; RPO ≤ 1 hour. Annual documented recovery exercise. | Shared agents typically participate in supervised business workflows and are in-scope for FINRA 4370 / FFIEC BCM. |
| Zone 3 (Enterprise) | Mandatory BIA entry with board-ratified RTO / RPO. Hot or warm standby in secondary region required; secondary-region environment kept current (≤ 24 hours stale) via automated sync. Solution and data exports replicated to immutable storage to support SEC 17a-4 reconstruction. RTO ≤ 1 hour; RPO ≤ 15 minutes. Quarterly exercise (mix of tabletop and live failover); independent assurance review at least annually per OCC Appendix D. | Customer-facing or market-facing agents; primary examiner focus area; failures may trigger Reg SCI-equivalent reporting for affected SCI entities. |
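One way to apply the zone table is to encode each row and list the gaps for a given agent's BIA entry. This is a minimal sketch under stated assumptions: `ZoneRequirement`, `ZONES`, and `zone_gaps` are hypothetical names, only the RTO, RPO, and standby columns are modeled, and the Zone 2 standby is treated as optional because the table says "recommended".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ZoneRequirement:
    max_rto: timedelta
    max_rpo: Optional[timedelta]  # None: the table states no RPO for this zone
    standby_required: bool


# Values taken from the zone table above (hypothetical encoding).
ZONES = {
    1: ZoneRequirement(timedelta(hours=72), None, False),
    2: ZoneRequirement(timedelta(hours=4), timedelta(hours=1), False),
    3: ZoneRequirement(timedelta(hours=1), timedelta(minutes=15), True),
}


def zone_gaps(zone: int, rto: timedelta, rpo: Optional[timedelta],
              has_standby: bool) -> list[str]:
    """Return human-readable gaps between a BIA entry and its zone row."""
    req = ZONES[zone]
    found = []
    if rto > req.max_rto:
        found.append(f"RTO {rto} exceeds zone limit {req.max_rto}")
    if req.max_rpo is not None and (rpo is None or rpo > req.max_rpo):
        found.append(f"RPO missing or above zone limit {req.max_rpo}")
    if req.standby_required and not has_standby:
        found.append("secondary-region standby missing")
    return found


# Example: a Zone 3 agent with a 2-hour RTO, 30-minute RPO, and no standby.
for gap in zone_gaps(3, timedelta(hours=2), timedelta(minutes=30), False):
    print(gap)
```

An empty list means the entry satisfies the modeled columns only; export retention, exercise cadence, and independent assurance still need separate evidence.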
Roles & Responsibilities
| Role | Responsibility |
|---|---|
| Power Platform Admin | Provision and maintain secondary-region environments; configure manual backups, copy-environment schedules, and solution export pipelines; execute platform-side failover steps |
| Environment Admin | Maintain parity of DLP, security roles, environment variables, and connection references between primary and DR environments |
| AI Governance Lead | Maintain agent BIA inventory; recommend tier assignment; chair recovery exercises; report exercise outcomes to BCP committee |
| Compliance Officer | Approve RTO / RPO against regulatory tolerances; retain exercise evidence per SEC 17a-4 timelines; coordinate examiner responses |
| Entra Agent ID Admin | Confirm agent identities, federated credentials, and service principal scopes function in DR environment; document credential rotation procedures |
Related Controls
| Control | Relationship |
|---|---|
| 2.1 - Managed Environments | DR environment must be a Managed Environment with parity policies |
| 2.3 - Change Management | Solution versioning, ALM pipelines, and rollback feed the recovery process |
| 2.13 - Documentation and Record Keeping | BIA, runbook, and exercise evidence are recordkeeping artifacts |
| 2.26 - Entra Agent ID Identity Governance | Agent identity and credential continuity in the DR environment |
| 3.1 - Agent Inventory and Metadata Management | Source of critical-agent inventory feeding the BIA |
| 3.4 - Incident Reporting | DR declaration and post-event RCA documentation |
Implementation Playbooks
Step-by-Step Implementation
This control has detailed playbooks for implementation, automation, testing, and troubleshooting:
- Portal Walkthrough — Step-by-step portal configuration
- PowerShell Setup — Automation scripts
- Verification & Testing — Test cases and evidence collection
- Troubleshooting — Common issues and resolutions
Verification Criteria
Confirm control effectiveness by verifying:
- A current BIA exists listing every Zone 2 and Zone 3 agent with assigned criticality tier, board-ratified RTO and RPO, dependencies, and named recovery owner
- Microsoft system backups are visible in PPAC for production environments hosting Tier 1 / Tier 2 agents, and customer-managed solution / data export jobs are running on the cadence required by each agent's RPO
- A secondary-region environment exists for each Zone 3 production environment, with parity DLP, Conditional Access, security roles, environment variables, and connection references; freshness is ≤ 24 hours
- A written DR runbook is published, version-controlled, and references current contact lists, declaration authority, and tier-specific recovery steps
- The most recent recovery exercise (annual for Zone 2, quarterly for Zone 3) has documented results showing whether RTO and RPO targets were met, which gaps were identified, and that remediation is tracked to closure
- Exercise evidence and DR event records are retained in the firm's regulatory recordkeeping system for the period required by SEC 17a-4 and FINRA 4511
- Entra Agent ID, service principals, and federated credentials referenced by recovered agents authenticate successfully in the DR environment without re-registration
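Two of the criteria above, exercise recency and DR environment freshness, reduce to date comparisons that a reviewer could script. The sketch below is illustrative only: the names are hypothetical, the day counts chosen for "annual" and "quarterly" are assumptions a firm would set itself, and timestamps are expected to come from the firm's own evidence records in a single time zone such as UTC.

```python
from datetime import datetime, timedelta

# Assumed day counts for "annual" (Zone 2) and "quarterly" (Zone 3).
EXERCISE_INTERVAL = {2: timedelta(days=365), 3: timedelta(days=92)}
# Freshness limit for the secondary-region environment, per the criteria above.
MAX_DR_STALENESS = timedelta(hours=24)


def exercise_overdue(zone: int, last_exercise: datetime, now: datetime) -> bool:
    """True when the last documented exercise is older than the zone's interval."""
    return now - last_exercise > EXERCISE_INTERVAL[zone]


def dr_is_fresh(last_sync: datetime, now: datetime) -> bool:
    """True when the DR environment was synchronized within the staleness limit."""
    return now - last_sync <= MAX_DR_STALENESS


# Example with fixed timestamps so the result is reproducible.
now = datetime(2026, 4, 1, 5, 0)
print(exercise_overdue(3, datetime(2025, 12, 1), now))   # 121 days since last exercise
print(dr_is_fresh(datetime(2026, 3, 31, 6, 0), now))     # 23 hours since last sync
```

A passing check shows only that the dates are in range; the reviewer still needs to inspect the exercise report and the sync job's output as evidence.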
Additional Resources
Microsoft Learn (April 2026)
- Backup and restore environments — Power Platform
- Business continuity and disaster recovery — Power Platform
- Copy an environment
- Environment strategy and lifecycle
- Azure regions and data location for Power Platform
- Solution import and export — ALM
- Copilot Studio: ALM with Power Platform pipelines
- Service availability SLA — Microsoft Online Services
US FSI Regulatory References
- FFIEC IT Examination Handbook — Business Continuity Management (Nov 2019)
- FINRA Rule 4370 — Business Continuity Plans and Emergency Contact Information
- SEC Rule 17a-4 — Records to be preserved by certain exchange members, brokers, and dealers
- 12 CFR Part 30 Appendix D — OCC Guidelines Establishing Heightened Standards
- Interagency Paper on Sound Practices to Strengthen Operational Resilience (Oct 2020)
Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current