Control 2.4: Business Continuity and Disaster Recovery — Verification & Testing
Verification, exercise scenarios, and evidence collection for Control 2.4: Business Continuity and Disaster Recovery.
Audience
M365 administrators, AI Governance Leads, and Compliance Officers in US financial services who need to evidence that customer-side BC/DR controls for Copilot Studio agents operate effectively. Evidence produced by these tests supports FFIEC BCM examination, FINRA Rule 4370 review, SEC 17a-4 reconstruction obligations, SOX 404 ITGC testing, and OCC Heightened Standards (Appendix D) independent assurance.
Pre-Verification Prerequisites
| Item | Source |
|---|---|
| Approved BIA listing every in-scope agent with tier, RTO, RPO, dependencies | AI Governance Lead |
| DR runbook (current version, Compliance-approved) | Portal Walkthrough §7 |
| Provisioned secondary-region environment with parity policies | Portal Walkthrough §2 |
| Customer-managed solution exports landing in immutable storage | PowerShell Setup §4 |
| Service Health alerts routed to DR distribution list | Portal Walkthrough §6 |
| Independent observer assigned (Compliance or Internal Audit) for Zone 3 exercises | OCC Appendix D requires independent assurance |
Verification Checklist
Backup verification
- PPAC shows continuous Dataverse system backups for the past 28 days (production) or 7 days (sandbox) per environment
- At least one manual backup created during the most recent change window, labeled with a traceable change ID
- Customer-managed solution export job has run within the cadence required by the highest-tier agent's RPO
- Solution exports are landing in immutable Azure Blob (or equivalent WORM-capable) storage with a time-based retention policy that meets SEC 17a-4
- SHA-256 hash sidecars exist alongside every export for independent integrity verification
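The sidecar integrity check in the last item can be scripted by an independent verifier. The following is a minimal sketch, not the playbook's own tooling: it assumes each export `<file>` has a sidecar named `<file>.sha256` whose first whitespace-separated token is the hex digest. That naming convention and the function names `sha256_of` and `verify_export` are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large solution exports need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_export(export_path: Path) -> bool:
    """Compare an export's hash with its sidecar (assumed '<file>.sha256' layout)."""
    sidecar = export_path.with_name(export_path.name + ".sha256")
    if not sidecar.exists():
        return False  # a missing sidecar is treated as a verification failure
    tokens = sidecar.read_text(encoding="utf-8").split()
    if not tokens:
        return False  # an empty sidecar cannot prove integrity
    return sha256_of(export_path) == tokens[0].lower()
```

Treating a missing or empty sidecar as a failure keeps the check conservative: an export without independent hash evidence should not count toward the checklist item.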
DR environment verification
- Secondary-region environment exists for every Zone 3 production environment, in a different Azure region within the same regulatory geography
- DR environment is enabled as a Managed Environment with parity DLP policies
- Latest managed solution(s) imported in DR; version matches the most recent production release
- Application users present in DR for every Entra Agent ID / service principal used by in-scope agents
- Conditional Access targeting DR environment URLs is documented and tested
- Connection references can be re-bound; reference inventory documented
Identity continuity verification
- App registrations and Entra Agent IDs used by in-scope agents have federated credentials present and not expiring within 30 days
- Client secrets (where used) are stored in a region-paired Key Vault with rotation owner documented
- Smoke test (WhoAmI) succeeds against DR environment using production-equivalent identity
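The 30-day expiry check on agent credentials can be expressed as a small filter over an exported credential inventory. This is a sketch under stated assumptions: the `Credential` record and its fields are hypothetical and would need to be populated from whatever inventory the firm actually maintains; no Entra or Key Vault API is called here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Credential:
    owner: str            # app registration or agent identity name (illustrative)
    kind: str             # e.g. "federated" or "secret" (illustrative labels)
    expires_at: datetime  # timezone-aware expiry timestamp


def expiring_soon(credentials, now, window_days=30):
    """Return credentials that are already expired or expire within the window."""
    cutoff = now + timedelta(days=window_days)
    return [c for c in credentials if c.expires_at <= cutoff]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        Credential("surveillance-agent", "secret", now + timedelta(days=12)),
        Credential("helpdesk-agent", "federated", now + timedelta(days=200)),
    ]
    for cred in expiring_soon(inventory, now):
        print(f"ROTATE: {cred.owner} ({cred.kind}) expires {cred.expires_at:%Y-%m-%d}")
```

Already-expired credentials are deliberately included in the result, since they fail the checklist item just as an imminent expiry does.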
Runbook verification
- Runbook is published, version-controlled, and approved by Compliance within the past 12 months
- Declaration authority list and contact details are current (no departed personnel)
- Failover and failback sequences include explicit time targets aligned to BIA RTO
- Communication templates exist for stakeholder, regulator, and customer notifications
Exercise verification
- Most recent Zone 3 exercise occurred within the past 90 days; Zone 2 within the past 12 months
- Exercise documentation includes scenario, scope, participants, observer, and timestamps for each phase
- Measured RTO and RPO recorded against targets
- Gaps tracked to closure with owner and date
- Compliance Officer sign-off recorded
- Exercise evidence is retained for ≥ 6 years in the firm's regulatory recordkeeping system
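The exercise recency item lends itself to an automated check against the exercise log. The sketch below assumes "12 months" can be approximated as 365 days and that zones without a listed cadence impose no recency requirement; the names `MAX_AGE_DAYS` and `exercise_is_current` are illustrative.

```python
from datetime import date, timedelta
from typing import Optional

# Maximum age of the most recent exercise, from the checklist above.
# Assumption: 12 months is approximated as 365 days.
MAX_AGE_DAYS = {"Zone 2": 365, "Zone 3": 90}


def exercise_is_current(zone: str, last_exercise: Optional[date], today: date) -> bool:
    """True if the zone's most recent exercise falls within its required window."""
    limit = MAX_AGE_DAYS.get(zone)
    if limit is None:
        return True  # no recency requirement listed for this zone
    if last_exercise is None:
        return False  # a required exercise has never been recorded
    return (today - last_exercise) <= timedelta(days=limit)
```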
Zone-Specific BC/DR Targets
These are starting framework targets; firms ratify the actual numbers in the BIA against board-approved disruption tolerances.
| Configuration | Zone 1 | Zone 2 | Zone 3 |
|---|---|---|---|
| RTO target (indicative) | ≤ 72 hours | ≤ 4 hours | ≤ 1 hour |
| RPO target (indicative) | ≤ 24 hours | ≤ 1 hour | ≤ 15 minutes |
| Customer-managed export cadence | None required | Daily (minimum) | Aligned to RPO (typically continuous via Synapse Link) |
| Long-term immutable retention | Not required | ≥ 90 days | ≥ 6 years (SEC 17a-4) |
| Secondary-region environment | Not required | Warm standby recommended | Hot or warm standby required |
| Documented exercise cadence | Not required | Annual | Quarterly (mix of tabletop and live) |
| Independent assurance | Not required | Internal review | Annual independent review (OCC Appendix D) |
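Holding the indicative targets as data makes it straightforward to score measured exercise results consistently. The following sketch encodes only the RTO and RPO rows of the table above; `ZoneTarget`, `INDICATIVE_TARGETS`, and `evaluate` are illustrative names, and a firm would substitute the numbers ratified in its BIA.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ZoneTarget:
    rto: timedelta
    rpo: timedelta


# Indicative values copied from the table; replace with BIA-ratified numbers.
INDICATIVE_TARGETS = {
    "Zone 1": ZoneTarget(rto=timedelta(hours=72), rpo=timedelta(hours=24)),
    "Zone 2": ZoneTarget(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    "Zone 3": ZoneTarget(rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
}


def evaluate(zone, measured_rto, measured_rpo, targets=INDICATIVE_TARGETS):
    """Score measured recovery time and data loss against the zone's targets."""
    target = targets[zone]
    return {
        "RTO": "Pass" if measured_rto <= target.rto else "Fail",
        "RPO": "Pass" if measured_rpo <= target.rpo else "Fail",
    }
```

For example, a Zone 3 exercise that recovered in 52 minutes but lost 20 minutes of data would pass on RTO and fail on RPO.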
Backup Retention Layers
| Layer | Retention | Purpose |
|---|---|---|
| Microsoft system backups (Dataverse production) | 28 days | In-region operational recovery |
| Microsoft system backups (Dataverse sandbox) | 7 days | In-region operational recovery |
| Customer manual backups (Power Platform) | Per published Power Platform policy — verify before relying on a window | Pre-change rollback |
| Local staging exports | ≤ 14 days | Short-term staging only — not regulatory evidence |
| Immutable Azure Blob exports | ≥ 6 years (or longest applicable retention) | SEC 17a-4 reconstruction; FINRA 4511 books and records |
DR Exercise Scenarios
Run each scenario at least once per Zone 3 quarterly cycle, mixing tabletop and live execution. A customer-side exercise does not require notifying Microsoft, provided it is not represented to other tenants as a Microsoft outage.
Scenario 1 — Regional outage simulation (live failover)
Objective: Validate ability to fail Tier 1 agents over to the DR environment within RTO.
Pre-conditions:
- DR environment refreshed within the past 24 hours
- Identity continuity check passed within the past 7 days (see PowerShell Setup §5)
- Business stakeholders notified
- Independent observer present
Procedure:
| Phase | Activity | Target | Owner |
|---|---|---|---|
| 1 | Declare exercise; primary "marked unavailable" in change-control terms | 5 min | On-call manager |
| 2 | Validate DR environment health (smoke test) | 10 min | Power Platform Admin |
| 3 | Re-bind connection references and refresh OAuth tokens in DR | 15 min | Environment Admin |
| 4 | Cut traffic / channel publish to DR endpoint | 10 min | Power Platform Admin |
| 5 | Functional smoke test on Tier 1 agent | 15 min | Business owner |
| 6 | Notify stakeholders, log declaration time | 5 min | AI Governance Lead |
| Total | Failover complete | ≤ 60 min | |
Pass criteria:
- All Tier 1 agents respond in DR within RTO
- No data loss exceeding documented RPO
- All runbook steps executed without undocumented deviation
- Identity, DLP, and Conditional Access enforced in DR
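The per-phase targets in the Scenario 1 procedure sum to the 60-minute budget, so an observer can score each phase as well as the total. This sketch assumes phase durations are recorded in whole minutes keyed by phase number; `PHASE_TARGETS_MIN`, `phase_overruns`, and `total_within_budget` are illustrative names.

```python
# Per-phase targets in minutes, copied from the Scenario 1 procedure table.
PHASE_TARGETS_MIN = {1: 5, 2: 10, 3: 15, 4: 10, 5: 15, 6: 5}


def phase_overruns(durations_min):
    """Return {phase: minutes over target} for phases that exceeded their target."""
    return {
        phase: actual - PHASE_TARGETS_MIN[phase]
        for phase, actual in durations_min.items()
        if actual > PHASE_TARGETS_MIN[phase]
    }


def total_within_budget(durations_min, budget_min=60):
    """True if the measured end-to-end failover time fits the overall budget."""
    return sum(durations_min.values()) <= budget_min
```

A run can stay within the overall budget while an individual phase overruns; recording both helps identify which runbook step needs attention even when the exercise passes.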
Scenario 2 — Single-agent restore from backup (live)
Objective: Validate ability to restore a single agent from a customer-managed solution export.
Procedure:
- Select a non-production agent
- Export current state as baseline (managed solution + Dataverse data)
- Make a destructive change (delete solution components)
- Restore from the most recent managed solution export
- Re-bind connection references
- Verify functional parity against baseline
Pass criteria:
- Agent restored within 30 minutes
- All exportable components recovered
- Components excluded from solution export (knowledge sources, etc.) recovered via documented secondary procedure
- Test conversation produces expected response
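Functional parity against the baseline can be backed by a component-level comparison, which also yields the component coverage percentage recorded in the exercise report. The sketch below assumes the baseline and restored inventories are available as collections of component identifiers; the identifier format and the name `component_coverage` are hypothetical.

```python
def component_coverage(baseline, restored):
    """Return (percent of baseline components restored, sorted missing components)."""
    baseline_set = set(baseline)
    if not baseline_set:
        raise ValueError("baseline inventory is empty; nothing to compare")
    missing = baseline_set - set(restored)
    percent = 100.0 * (len(baseline_set) - len(missing)) / len(baseline_set)
    return percent, sorted(missing)
```

Anything listed as missing is a candidate for the documented secondary procedure, for example components that are excluded from solution export.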
Scenario 3 — Tenant-wide auth failure (tabletop)
Objective: Validate runbook coverage for an Entra outage affecting credential acquisition.
Procedure:
- Walk through declaration criteria for an identity-plane outage
- Confirm break-glass identities and out-of-band communication channels
- Identify which agents become unrecoverable until identity is restored
- Document escalation and customer-notification thresholds
Pass criteria: Runbook explicitly addresses identity-plane scenarios; participants can locate break-glass procedures within 5 minutes.
Scenario 4 — Malicious deletion of agent (live, sandbox-only)
Objective: Validate ability to recover from accidental or malicious deletion of an agent solution.
Procedure:
- In a sandbox environment, delete a non-production agent's solution
- Restore from immutable Azure Blob using the latest export and SHA-256 verification
- Validate component coverage and conduct functional test
Pass criteria: Recovery completes within tier RTO; integrity hash matches export sidecar.
Scenario 5 — Failback to primary
Objective: Validate clean return to the primary environment after the simulated outage.
Procedure:
- Operate in DR for the documented exercise window (minimum 2 hours)
- Inventory configuration and data changes made in DR
- Export and import changes back to primary; reconcile conflicts
- Cut traffic back to primary; place DR back into warm-standby state
- Verify primary functionality
Pass criteria: No data loss during failback; no unresolved conflicts; users experience no functional regression.
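The inventory and reconciliation steps can be supported by classifying records according to where they changed after failover. This is a simplified sketch that assumes each environment can produce a map of record ID to last-modified timestamp and that timestamps are directly comparable; it is not a Dataverse API call, and `classify_failback_changes` is an illustrative name.

```python
from datetime import datetime


def classify_failback_changes(primary, dr, failover_at):
    """Split DR-side changes into clean imports and conflicts.

    primary, dr: dict mapping record ID to last-modified datetime.
    failover_at: the moment traffic was cut over to DR.
    """
    to_import, conflicts = [], []
    for record_id, dr_modified in dr.items():
        if dr_modified <= failover_at:
            continue  # not changed while operating in DR
        primary_modified = primary.get(record_id)
        if primary_modified is not None and primary_modified > failover_at:
            conflicts.append(record_id)  # changed on both sides since failover
        else:
            to_import.append(record_id)  # changed (or created) only in DR
    return sorted(to_import), sorted(conflicts)
```

Records in the conflict list need manual reconciliation before the "no unresolved conflicts" pass criterion can be signed off.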
DR Exercise Documentation Template
Use this template for every exercise; retain ≥ 6 years.
Exercise summary
| Field | Value |
|---|---|
| Exercise date | |
| Exercise type | Tabletop / Partial / Full |
| Scenario | (e.g., Scenario 1 — Regional outage simulation) |
| Scope (agents, environments) | |
| Participants | |
| Independent observer | |
| Runbook version exercised | |
Measured outcomes
| Metric | Target | Actual | Result |
|---|---|---|---|
| Tier 1 RTO | ≤ 1 hr | | Pass / Fail |
| Tier 2 RTO | ≤ 4 hr | | Pass / Fail |
| RPO (data loss measured) | ≤ 15 min | | Pass / Fail |
| Component coverage (% of solution restored) | 100% | | Pass / Fail |
| Identity continuity (WhoAmI in DR) | Pass | | Pass / Fail |
| Runbook executed without deviation | Yes | | Pass / Fail |
Issues and corrective actions
| Issue | Severity | Root cause | Corrective action | Owner | Target close date | Status |
|---|---|---|---|---|---|---|
| | High / Med / Low | | | | | |
Lessons learned
(narrative)
Sign-off
| Role | Name | Date |
|---|---|---|
| AI Governance Lead | | |
| Power Platform Admin | | |
| Compliance Officer | | |
| Independent observer | | |
FSI Use Case — Trading-Floor Surveillance Agent
A surveillance agent that supports supervisory review under FINRA Rule 3110 is typically Tier 1 or Tier 2. Treatment:
- Recordkeeping: Conversation transcripts and supervisory dispositions are books and records; export to immutable storage with SEC 17a-4 retention
- Architecture: Primary in East US, warm standby in West US; continuous Synapse Link replication of supervisory tables to ADLS Gen2, so that the replication interval is consistent with the ≤ 15 min RPO
- RTO / RPO: ≤ 1 hr / ≤ 15 min, ratified by the BCP committee
- Exercise cadence: Quarterly exercises (one full live failover and three tabletop per year), with Internal Audit observation for the annual full exercise
- Examiner artifacts: BIA entry, runbook, last four exercise reports with measured outcomes, immutability policy on the export Blob, agent identity rotation log
Related Playbooks
- Portal Walkthrough — Step-by-step portal configuration
- PowerShell Setup — Automation scripts
- Troubleshooting — Common issues and resolutions
Updated: April 2026 | Version: v1.4.0