Control 2.4: Business Continuity and Disaster Recovery

Control ID: 2.4
Pillar: Management
Regulatory Reference: FFIEC Business Continuity Management (BCM) Handbook (Nov 2019); FINRA Rule 4370 (Business Continuity Plans); FINRA Rule 4511 / Notice 25-07; SEC 17a-3 / 17a-4 (record retention and reconstruction); SOX 404 (internal controls); GLBA 501(b) (Safeguards Rule – availability); OCC Heightened Standards (12 CFR 30 Appendix D – risk governance); Interagency Paper on Sound Practices to Strengthen Operational Resilience (Oct 2020 – OCC / Fed / FDIC)
Last UI Verified: April 2026
Governance Levels: Baseline / Recommended / Regulated


Agent 365 Architecture Update

Agent 365 Observability supports business continuity planning by providing unified telemetry for disaster recovery testing and agent health monitoring across platforms. See Unified Agent Governance for observability architecture.

Microsoft Shared-Responsibility Boundary

Microsoft does not publish a contractual Recovery Time Objective (RTO) or Recovery Point Objective (RPO) for Power Platform, Dataverse, or Copilot Studio. The Online Services SLA covers availability (financially backed 99.9%) — not recovery time after a destructive event or regional outage. Customers remain responsible for defining RTO/RPO, deploying secondary-region environments, exporting solutions and Dataverse data, and exercising failover. This control documents the customer-side controls that supplement the Microsoft platform.

Objective

Define how the organization keeps critical Copilot Studio agents available, or restores them within board-approved time and data-loss windows, following a service disruption, regional outage, accidental deletion, malicious destruction, or third-party (Microsoft) incident. Establish backup procedures, recovery objectives, failover patterns, and an exercise cadence that support the FSI regulatory expectations cited above.


Why This Matters for FSI

  • FFIEC BCM Handbook (Nov 2019): Examiners expect a documented Business Impact Analysis (BIA), explicit RTO/RPO per critical system, tested recovery procedures, and evidence that third-party (cloud) dependencies are part of the resilience program. AI agents that handle customer interactions, surveillance, or trading workflows are in scope as critical systems.
  • FINRA Rule 4370 (Business Continuity Plans): Member firms must maintain a written BCP addressing mission-critical systems, alternate communications, data backup and recovery, and operational assessment. Agents that participate in customer-facing workflows or recordkeeping are mission-critical and must appear in the BCP inventory.
  • FINRA Rule 4511 / FINRA Notice 25-07: Required books and records — including supervisory artifacts created by or supporting AI agents — must remain recoverable for the prescribed retention period; AI-related records receive heightened scrutiny under Notice 25-07.
  • SEC Rule 17a-3 / 17a-4: Records must be preservable and reconstructable for the full retention term, in non-erasable / non-rewritable form (or equivalent audit-trail electronic recordkeeping under the 2022 amendment). DR planning helps demonstrate the firm's continuing ability to reconstruct required records after a disruption.
  • SOX 404: Internal control over financial reporting includes IT general controls (ITGCs) for backup and recovery; documented and tested DR procedures for in-scope agents support management's annual assessment.
  • GLBA 501(b) / Safeguards Rule: Administrative, technical, and physical safeguards must protect the availability of customer information; this requires a tested recovery program for systems that process or store NPI.
  • OCC Heightened Standards – 12 CFR 30 Appendix D: Covered banks must maintain a Risk Governance Framework that explicitly addresses operational resilience, including independent assurance over recovery capabilities for critical operations.
  • Interagency Paper on Sound Practices to Strengthen Operational Resilience (Oct 2020): Joint OCC / Federal Reserve / FDIC guidance — large firms should set tolerances for disruption, map dependencies (including cloud and AI), and exercise against severe-but-plausible scenarios.

Control Description

This control applies the customer-side resilience pattern for Copilot Studio and Power Platform agents:

| Capability | Description | FSI Relevance |
| --- | --- | --- |
| Business Impact Analysis (BIA) | Per-agent classification with criticality, dependencies, RTO, RPO | Required by FFIEC BCM and FINRA 4370 |
| Recovery Time Objective (RTO) | Customer-defined maximum tolerable downtime | Examiner focus; not a Microsoft commitment |
| Recovery Point Objective (RPO) | Customer-defined maximum tolerable data loss | Bounded by Dataverse system-backup cadence and customer export frequency |
| Microsoft system backups | Automatic Dataverse backups (production: retained 28 days; sandbox: 7 days) | Baseline platform protection; does not replace customer-managed exports |
| Customer-managed exports | Solution exports, Dataverse data exports, environment variable / connection-reference snapshots | Supports cross-region recovery and meets reconstruction obligations |
| Secondary-region environment | Customer-provisioned Power Platform environment in a different Azure region | Required for regional-outage scenarios; Microsoft does not auto-failover Dataverse cross-region |
| Agent identity continuity | Entra app registrations, service principals, and Entra Agent IDs are tenant-scoped (global) | Survives single-region outages without re-registration; federated credentials replicate globally |
| Tabletop and live failover exercises | Documented scenario tests with measured outcomes | FFIEC and Interagency Paper expect evidence of exercise, not just plans |

Agent Criticality Classification

The tier table below is a starting framework. Each firm should ratify RTO / RPO numbers through its BCP committee against board-approved disruption tolerances.

| Tier | Description | Indicative RTO | Indicative RPO | Examples |
| --- | --- | --- | --- | --- |
| Tier 1: Critical | Business cannot operate; customer or market impact | < 1 hour | < 15 min | Trading-floor assistant, payment instruction agent |
| Tier 2: High | Significant operational impact | < 4 hours | < 1 hour | Customer service agent, supervisory / surveillance agent |
| Tier 3: Medium | Moderate internal impact | < 24 hours | < 4 hours | Internal HR bot, IT help desk |
| Tier 4: Low | Minimal impact | < 72 hours | < 24 hours | Personal productivity agents |
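
The tier table can be held as data so that exercise results are compared against targets mechanically. The following is a minimal sketch, not part of any Microsoft tooling: the names `Tier`, `RecoveryTarget`, `INDICATIVE_TARGETS`, and `meets_target` are hypothetical, and the values are the indicative figures from the table, to be replaced by whatever the BCP committee ratifies.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Tier(Enum):
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass(frozen=True)
class RecoveryTarget:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data loss


# Indicative values copied from the tier table; ratified values replace these.
INDICATIVE_TARGETS = {
    Tier.CRITICAL: RecoveryTarget(timedelta(hours=1), timedelta(minutes=15)),
    Tier.HIGH: RecoveryTarget(timedelta(hours=4), timedelta(hours=1)),
    Tier.MEDIUM: RecoveryTarget(timedelta(hours=24), timedelta(hours=4)),
    Tier.LOW: RecoveryTarget(timedelta(hours=72), timedelta(hours=24)),
}


def meets_target(tier, measured_downtime, measured_data_loss):
    """Return (rto_met, rpo_met). The table states strict '<' bounds."""
    target = INDICATIVE_TARGETS[tier]
    return (measured_downtime < target.rto, measured_data_loss < target.rpo)
```

For example, a Tier 2 exercise that restored service in 3 hours but lost 90 minutes of data would meet the RTO and miss the RPO.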

Microsoft Platform Recovery Behaviors (April 2026)

| Layer | Microsoft-managed | Customer-managed |
| --- | --- | --- |
| Dataverse system backup | Continuous backup; production environments retained 28 days, sandbox 7 days | Trigger manual backups before risky changes; export to long-term storage for retention beyond 28 days |
| Dataverse restore | In-place or to new environment within the same region (subject to capacity) | Cross-region restore is not a native operation; it requires solution + data redeployment to a pre-provisioned secondary environment |
| Copy environment | Full or minimal copy supported between environments | Use to refresh DR environment from production on a defined cadence |
| Reset environment | Available for sandbox only | Not a recovery primitive for production |
| Copilot Studio agent portability | Agents move via solution export / import | Knowledge sources, environment variables, connection references, and certain custom topics may need separate handling; verify component coverage |
| Entra Agent ID and service principals | Tenant-global; not region-bound | Document credential rotation and conditional access for the DR environment |
| Service availability SLA | 99.9% financially backed | SLA covers uptime, not recovery; define your own RTO / RPO |
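
The retention windows above determine whether a requested restore point can still come from a Microsoft system backup or must come from a customer-managed export. The following is a minimal sketch of that decision, assuming the 28-day and 7-day figures stated in the table. `SYSTEM_BACKUP_RETENTION` and `restore_source` are hypothetical names, and the boundary is treated conservatively: a restore point exactly at the retention limit is routed to customer exports.

```python
from datetime import datetime, timedelta, timezone

# Retention figures as stated in the table above.
SYSTEM_BACKUP_RETENTION = {
    "production": timedelta(days=28),
    "sandbox": timedelta(days=7),
}


def restore_source(env_type, restore_point, now):
    """Return 'system-backup' if the restore point is inside the retention
    window for this environment type, otherwise 'customer-export'."""
    if restore_point > now:
        raise ValueError("restore point is in the future")
    age = now - restore_point
    if age < SYSTEM_BACKUP_RETENTION[env_type]:
        return "system-backup"
    return "customer-export"


now = datetime(2026, 4, 30, tzinfo=timezone.utc)
print(restore_source("production", now - timedelta(days=10), now))  # system-backup
print(restore_source("sandbox", now - timedelta(days=10), now))     # customer-export
```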

Key Configuration Points

  • Maintain a Business Impact Analysis (BIA) covering every Zone 2 and Zone 3 agent, with assigned criticality tier, RTO, RPO, dependencies (data sources, connectors, knowledge sources, downstream systems), and named recovery owner
  • Provision a secondary-region Power Platform environment for each production environment hosting Tier 1 or Tier 2 agents, in a different Azure region within the same regulatory geography
  • Schedule customer-managed solution exports (managed solutions) and Dataverse data exports on a cadence that meets the agent's RPO target; store exports in immutable / WORM-capable storage to support SEC 17a-4 reconstruction
  • Document and apply the same DLP, Conditional Access, and identity policies in the secondary-region environment as in production (see Controls 1.x and 2.1)
  • Maintain a written DR runbook covering declaration authority, communication plan, failover steps per tier, failback steps, and post-event evidence collection
  • Subscribe required admin distribution lists to Microsoft 365 Service Health and Message Center alerts for Power Platform, Dataverse, Copilot Studio, and Entra ID
  • Schedule and document recovery exercises: at minimum annually for Zone 2 and quarterly for Zone 3 (consistent with FFIEC and FINRA 4370 expectations); retain evidence for the regulatory retention period
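
An export cadence meets an RPO only if the longest gap between consecutive exports, including the time since the latest one, stays within the RPO. A failure just before the next export loses everything written since the previous one. The following is a minimal sketch of that check over a list of export timestamps. The function names and the sample log are hypothetical, and how the timestamps are obtained is left open.

```python
from datetime import datetime, timedelta, timezone


def worst_export_gap(export_times, now):
    """Longest interval not covered by an export, including time since the latest one."""
    if not export_times:
        raise ValueError("no exports recorded")
    times = sorted(export_times)
    if now < times[-1]:
        raise ValueError("'now' precedes the latest export")
    points = times + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def cadence_meets_rpo(export_times, now, rpo):
    """True when no gap between exports exceeds the RPO."""
    return worst_export_gap(export_times, now) <= rpo


# Hypothetical export log for one agent.
day = datetime(2026, 4, 30, tzinfo=timezone.utc)
exports = [day, day + timedelta(hours=1), day + timedelta(hours=2, minutes=30)]
now = day + timedelta(hours=3)
print(worst_export_gap(exports, now))  # 1:30:00
```

With this log, a 1-hour RPO is not met because one gap is 90 minutes; a 2-hour RPO is.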

Automation Available

See DR Testing Framework in FSI-AgentGov-Solutions for automated validation of AI agent disaster recovery procedures against defined RTO/RPO targets, with gap identification and exercise evidence packaging.


Zone-Specific Requirements

| Zone | Requirement | Rationale |
| --- | --- | --- |
| Zone 1 (Personal) | Listed in BIA only if used for any record-bearing activity. Rely on Microsoft system backups (28-day Dataverse retention). No secondary-region environment required. RTO ≤ 72 hours acceptable. | Minimal regulatory exposure; recovery cost should not exceed business impact. |
| Zone 2 (Team) | Mandatory BIA entry. Customer-managed solution export at least daily; weekly export retained ≥ 90 days. Warm standby in secondary region recommended for any agent supporting recordkeeping or customer interaction. RTO ≤ 4 hours; RPO ≤ 1 hour. Annual documented recovery exercise. | Shared agents typically participate in supervised business workflows and are in-scope for FINRA 4370 / FFIEC BCM. |
| Zone 3 (Enterprise) | Mandatory BIA entry with board-ratified RTO / RPO. Hot or warm standby in secondary region required; secondary-region environment kept current (≤ 24 hours stale) via automated sync. Solution and data exports replicated to immutable storage to support SEC 17a-4 reconstruction. RTO ≤ 1 hour; RPO ≤ 15 minutes. Quarterly exercise (mix of tabletop and live failover); independent assurance review at least annually per OCC Appendix D. | Customer-facing or market-facing agents; primary examiner focus area; failures may trigger Reg SCI-equivalent reporting for affected SCI entities. |

Roles & Responsibilities

| Role | Responsibility |
| --- | --- |
| Power Platform Admin | Provision and maintain secondary-region environments; configure manual backups, copy-environment schedules, and solution export pipelines; execute platform-side failover steps |
| Environment Admin | Maintain parity of DLP, security roles, environment variables, and connection references between primary and DR environments |
| AI Governance Lead | Maintain agent BIA inventory; recommend tier assignment; chair recovery exercises; report exercise outcomes to BCP committee |
| Compliance Officer | Approve RTO / RPO against regulatory tolerances; retain exercise evidence per SEC 17a-4 timelines; coordinate examiner responses |
| Entra Agent ID Admin | Confirm agent identities, federated credentials, and service principal scopes function in DR environment; document credential rotation procedures |

| Control | Relationship |
| --- | --- |
| 2.1 - Managed Environments | DR environment must be a Managed Environment with parity policies |
| 2.3 - Change Management | Solution versioning, ALM pipelines, and rollback feed the recovery process |
| 2.13 - Documentation and Record Keeping | BIA, runbook, and exercise evidence are recordkeeping artifacts |
| 2.26 - Entra Agent ID Identity Governance | Agent identity and credential continuity in the DR environment |
| 3.1 - Agent Inventory and Metadata Management | Source of critical-agent inventory feeding the BIA |
| 3.4 - Incident Reporting | DR declaration and post-event RCA documentation |

Implementation Playbooks

Step-by-Step Implementation

This control has detailed playbooks for implementation, automation, testing, and troubleshooting.


Verification Criteria

Confirm control effectiveness by verifying:

  1. A current BIA exists listing every Zone 2 and Zone 3 agent with assigned criticality tier, board-ratified RTO and RPO, dependencies, and named recovery owner
  2. Microsoft system backups are visible in PPAC for production environments hosting Tier 1 / Tier 2 agents, and customer-managed solution / data export jobs are running on the cadence required by each agent's RPO
  3. A secondary-region environment exists for each Zone 3 production environment, with parity DLP, Conditional Access, security roles, environment variables, and connection references; freshness is ≤ 24 hours
  4. A written DR runbook is published, version-controlled, and references current contact lists, declaration authority, and tier-specific recovery steps
  5. The most recent recovery exercise (annual for Zone 2, quarterly for Zone 3) has documented results showing whether RTO and RPO targets were met, gaps identified, and remediation tracked to closure
  6. Exercise evidence and DR event records are retained in the firm's regulatory recordkeeping system for the period required by SEC 17a-4 and FINRA 4511
  7. Entra Agent ID, service principals, and federated credentials referenced by recovered agents authenticate successfully in the DR environment without re-registration
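
Criteria 3 and 5 lend themselves to a simple automated check over the evidence a firm already records. The following is a minimal sketch under stated assumptions: `AgentEvidence` and `findings` are hypothetical names, "annual" and "quarterly" are approximated as 365 and 92 days, and the 24-hour freshness limit is the Zone 3 figure from this control. It does not query any Microsoft API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: annual ~ 365 days (Zone 2), quarterly ~ 92 days (Zone 3).
EXERCISE_INTERVAL = {2: timedelta(days=365), 3: timedelta(days=92)}
MAX_DR_STALENESS = timedelta(hours=24)  # Zone 3 freshness requirement


@dataclass
class AgentEvidence:
    name: str
    zone: int
    last_exercise: datetime
    last_dr_sync: Optional[datetime] = None  # None when no DR environment is recorded


def findings(agent, now):
    """Return a list of gaps against the exercise-cadence and freshness criteria."""
    issues = []
    interval = EXERCISE_INTERVAL.get(agent.zone)
    if interval is not None and now - agent.last_exercise > interval:
        issues.append("recovery exercise overdue")
    if agent.zone == 3:
        if agent.last_dr_sync is None:
            issues.append("no secondary-region environment recorded")
        elif now - agent.last_dr_sync > MAX_DR_STALENESS:
            issues.append("DR environment stale (> 24 hours)")
    return issues
```

An empty list means the agent passes these two checks only; the remaining criteria still need manual or separate verification.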

Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current