Verification & Testing — Control 2.6: Model Risk Management (OCC 2011-12 / SR 11-7)
Examiner-defensible verification package for Control 2.6 — Model Risk Management Alignment with OCC 2011-12 / SR 11-7. This playbook produces, signs, and retains the artifacts required to demonstrate to the OCC, Federal Reserve, FDIC, NYDFS, FINRA, SEC, and Internal Audit that the firm's Model Risk Management program is operating with respect to the AI agents in scope — not that Microsoft tooling is configured.
Audience: Compliance Officer, Internal Audit, MRM Committee secretariat, Model Risk Manager (independent validation function), AI Governance Lead. The detailed portal navigation and PowerShell automation that produce the evidence consumed here are in the sister Portal Walkthrough and PowerShell Setup playbooks; this playbook focuses on the audit and examination question: is the firm's MRM program actually operating with these agents in it, and is the MRM Committee accepting the validation?
Last UI verified: April 2026.
Document Conventions
| Convention | Value |
|---|---|
| Hedged regulatory language | This playbook supports compliance with, but does not by itself ensure compliance with, OCC Bulletin 2011-12, Federal Reserve SR 11-7, FDIC FIL-22-2017, FFIEC IT Examination Handbook, FINRA Rule 3110 (Supervision), FINRA Rule 4511 (Books and Records), FINRA Regulatory Notice 25-07 (AI Tools — RFC, contextual only), SEC Rules 17a-3 / 17a-4 (Recordkeeping), SOX §§ 302 / 404, GLBA § 501(b), CFTC Regulation 1.31, and NYDFS 23 NYCRR Part 500. A clean execution of every test in this playbook does not guarantee regulatory compliance, does not replace the firm's Model Risk Management Committee, does not replace independent model validation by qualified personnel, does not replace the effective-challenge process required by SR 11-7, and does not replace the registered-principal supervisory review required by FINRA Rule 3110. |
| Canonical role names | Per docs/reference/role-catalog.md. The roles relevant to this playbook are: Model Risk Manager (independent validation function — second line), MRM Committee (governance body), AI Governance Lead, Compliance Officer, Internal Audit (third line), AI Administrator, Power Platform Admin, Purview Compliance Admin, Entra Global Reader (read-only witness). |
| Run identifier | Each verification cycle is tagged MRM26-yyyyMMdd-HHmmss-<8charGuid> and embedded in every evidence record and artifact filename. |
| Cycle cadence | Pre-deployment (§2), quarterly ongoing-monitoring (§3), risk-commensurate outcomes-analysis (§4 — not fixed annual), per-vendor-event vendor-model (§5), per-MRM-Committee-meeting effective-challenge (§6), continuous retention (§7), quarterly sovereign parity (§8), per-retirement event (§9), on-demand examination rehearsal (§10). |
| Scoring | Every test emits one of Clean / Anomaly / Pending / NotApplicable / Error per §11. The MRM Committee — not this playbook — accepts or rejects the underlying validation. |
| Evidence retention | Six (6) years on 17a-4(f)-compliant media (Purview retention label with deletion lock, or an approved 17a-4(f) vendor) for: model-classification decisions, Agent Cards, validation memos, MRM Committee minutes, monitoring reports, outcomes-analysis runs, vendor-model dispositions, and retirement records. The first two years must be readily accessible. |
Non-substitution. Microsoft Purview DSPM for AI, the Agent 365 Admin Center, the Power Platform CoE export, Copilot Studio Analytics, Azure AI Foundry monitoring and evaluators, and Microsoft Entra Agent ID are inventory, evaluation, and evidence-collection surfaces. They do not replace, and must not be presented to the MRM Committee or to examiners as a substitute for, the firm's MRM Committee, independent validation function, effective-challenge process, three-lines-of-defense governance, registered-principal supervision under FINRA Rule 3110, or 17a-4(f) books-and-records retention.
§1 Verification Scope
1.1 What this playbook verifies — and what it does not
| Question this playbook does answer | Question this playbook does not answer |
|---|---|
| Is every AI agent that meets the firm's model-classification criteria recorded in the firm's model inventory? | Is every AI agent in the tenant a "model"? (Classification is an MRM judgement, not a tool output.) |
| Has each in-scope agent been pre-deployment validated by personnel functionally independent of the model owner / developer? | Is the validation methodology adequate? (That is the MRM Committee's judgement.) |
| Has the MRM Committee accepted the validation, with effective-challenge questions and dispositions documented in minutes? | Does acceptance equal regulatory compliance? (It does not.) |
| Are ongoing-monitoring reports produced quarterly, with thresholds, named owners, and breach escalation to the firm's IR / RCA workflow? | Are the thresholds set at the right values? (That is a model-risk-policy decision.) |
| Are vendor-model changes (Microsoft default-model migration, Foundry deprecations, Anthropic Claude availability) detected and dispositioned before they take effect? | Will Microsoft ship a model change without notice? (This is a vendor-risk concern handled in Control 2.7.) |
| Is validation evidence retained on 17a-4(f)-compliant media for 6 years? | Is the firm's retention vendor itself 17a-4(f)-attested? (That is a vendor-due-diligence question.) |
1.2 Population of in-scope agents
The verification population is the set of AI agents that the firm's model-classification process has classified as Tier 1 or Tier 2. The population is anchored to three independent sources reconciled together:
| Source | Surface | What it provides | Reconciliation rule |
|---|---|---|---|
| A. Authoritative AI inventory | Microsoft Purview DSPM for AI inventory export (cross-reference Control 1.6) | Every Copilot, Copilot Studio, Foundry, and Entra-registered AI app the tenant has produced or exposed | Source of truth for "does this agent exist in the tenant?" |
| B. Agent 365 Admin Center inventory | Agent 365 Admin Center → Agents (cross-reference Control 2.25) | Every Agent 365-surfaced agent with publication / activation approval, governance template, owner, and license state | Source of truth for "is this agent governance-bound?" |
| C. Power Platform CoE export | CoE Starter Kit → Copilot Studio inventory export (CSV / Power BI) | Every Copilot Studio agent across managed environments with environment, owner, last-modified, model selection | Source of truth for "is this agent under Power Platform governance?" |
Reconciliation produces a single in-scope register. Every agent that appears in source A but not in source B is investigated as a potential governance gap (escalate to Control 2.25) or as orphaned (escalate to Control 3.6). Every agent in source C without a corresponding entry in source A is flagged as a DSPM coverage gap.
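The three-way reconciliation rule reduces to plain set arithmetic over agent IDs. A minimal Python sketch follows; the function name and the flat ID sets are illustrative (the real register carries far more metadata per agent):

```python
def reconcile(dspm_ids: set[str], agent365_ids: set[str], coe_ids: set[str]) -> dict[str, set[str]]:
    """Reconcile the three inventory sources (A, B, C) into one register
    plus the two exception queues described in the reconciliation rules."""
    return {
        # Union of all three sources = candidate in-scope population
        "register": dspm_ids | agent365_ids | coe_ids,
        # In DSPM (A) but not Agent 365 (B): potential governance gap (Control 2.25 / 3.6)
        "governance_gap": dspm_ids - agent365_ids,
        # In CoE (C) but not DSPM (A): DSPM coverage gap
        "dspm_coverage_gap": coe_ids - dspm_ids,
    }
```

For example, `reconcile({"agt-1", "agt-2"}, {"agt-1"}, {"agt-2", "agt-3"})` flags `agt-2` as a governance gap and `agt-3` as a DSPM coverage gap.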
1.3 Model-tiering coverage check
For each agent in the reconciled register, confirm exactly one of the following classifications has been recorded by the firm's MRM-designated classifier (typically the Model Risk Manager with input from the AI Governance Lead and business owner):
| Classification | MRM treatment | Verification expectation |
|---|---|---|
| Tier 1 (high — material business impact, customer decisions, financial calculations, risk scoring) | Full MRM treatment with independent validation | In §2–§9 below |
| Tier 2 (medium — customer-facing recommendations, significant but not material impact) | Full MRM treatment with independent validation | In §2–§9 below |
| Tier 3 (low — internal use, minimal business impact) | Lightweight self-assessment plus second-line review | Subset coverage in §2–§9; not the focus of this playbook |
| Non-model (information retrieval only, internal productivity, no decision influence) | Standard agent governance only — out of scope for MRM | Recorded with rationale; periodically re-evaluated by Compliance |
Coverage assertion: every agent in the reconciled register has exactly one classification, with rationale, classifier identity, classifier date, and classifier-independence statement. Any agent without a classification is Pending and the MRM Committee secretariat is notified within five business days.
1.4 Evidence record schema
Every test in §2–§10 emits one JSON evidence record conforming to the following schema. The §10.4 examination-readiness pack assembler refuses to publish a pack containing a record that fails schema validation.
```json
{
  "control_id": "2.6",
  "run_id": "MRM26-20260415-093012-a1b2c3d4",
  "run_timestamp": "2026-04-15T09:30:12Z",
  "tenant_id": "11111111-2222-3333-4444-555555555555",
  "tenant_display_name": "Contoso Bank, N.A.",
  "cloud": "Commercial",
  "namespace": "PREDEPLOY | MONITOR | OUTCOMES | VENDOR | CHALLENGE | RETAIN | SOV | RETIRE | EXAM",
  "agent_id": "agt-22001",
  "agent_name": "Wealth-Mgmt Investment-Calculator",
  "model_tier": "1",
  "underlying_model": "GPT-5 (Foundry)",
  "underlying_model_version": "2026-03-15",
  "criterion": "VC-1",
  "subject_type": "validation_memo | committee_minute | monitoring_report | foundry_eval_run | vendor_disposition | retention_label | sov_attestation | retirement_record",
  "status": "Clean | Anomaly | Pending | NotApplicable | Error",
  "assertion": "<plain-English statement of what was tested>",
  "observed_value": { "...": "..." },
  "expected_value": { "...": "..." },
  "evidence_artifacts": ["validation-memo-agt-22001-2026Q1.pdf"],
  "evidence_sha256": "f3a1...",
  "regulator_mappings": ["OCC-2011-12","SR-11-7","FINRA-3110","FINRA-4511","SEC-17a-4","SOX-404"],
  "operator_upn": "model.risk.manager@contoso.com",
  "mrm_committee_acceptance_ref": "MRMC-2026Q1-min-§4.b",
  "schema_version": "1.0"
}
```
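A pack assembler can enforce the schema with a lightweight pre-publication validator. This sketch checks a subset of required keys plus the status, namespace, and run-identifier formats; the key names follow the schema, but the function itself is illustrative and is not the §10.4 assembler's actual implementation:

```python
import re

REQUIRED_KEYS = {
    "control_id", "run_id", "run_timestamp", "tenant_id", "namespace",
    "agent_id", "criterion", "subject_type", "status", "assertion",
    "observed_value", "expected_value", "evidence_artifacts",
    "evidence_sha256", "regulator_mappings", "operator_upn", "schema_version",
}
VALID_STATUSES = {"Clean", "Anomaly", "Pending", "NotApplicable", "Error"}
VALID_NAMESPACES = {"PREDEPLOY", "MONITOR", "OUTCOMES", "VENDOR",
                    "CHALLENGE", "RETAIN", "SOV", "RETIRE", "EXAM"}
RUN_ID_PATTERN = re.compile(r"^MRM26-\d{8}-\d{6}-[0-9a-f]{8}$")

def validate_record(rec: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record may ship."""
    errors = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - rec.keys())]
    if rec.get("status") not in VALID_STATUSES:
        errors.append("invalid status")
    if rec.get("namespace") not in VALID_NAMESPACES:
        errors.append("invalid namespace")
    if not RUN_ID_PATTERN.match(rec.get("run_id", "")):
        errors.append("run_id does not match MRM26-yyyyMMdd-HHmmss-<8charGuid>")
    return errors
```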
§2 Pre-Deployment Validation Evidence Checklist (PREDEPLOY namespace)
Criterion mapping: Control 2.6 Verification Criteria 1, 2, 3, 4 (pre-deployment subset), 5.
For every agent classified Tier 1 or Tier 2 in §1.3, confirm the following five evidence artifacts exist, are dated, are signed, are version-pinned, and are retained per §7.
2.1 Evidence artifact 1 — Model-classification decision
| Item | Expectation | Source |
|---|---|---|
| Classification (Tier 1 / 2 / 3 / Non-model) | Recorded with a written rationale ≥ 200 words referencing the OCC 2011-12 model definition | Model inventory record (Dataverse / SharePoint per Portal Walkthrough §3) |
| Classifier identity | Named individual, role, employee ID | Inventory record |
| Classifier independence statement | "The classifier is not the model owner / developer for this agent" or, where the same person plays both roles, an explicit second-line counter-signature | Inventory record + e-signature |
| Classification date | UTC timestamp; not back-dated more than 5 business days from approval | Inventory record version history |
| Re-classification trigger | Recorded next-review date, plus material-change triggers (model change, knowledge-source change, business-purpose change) | Inventory record |
Anomaly if any field is missing. Pending if the agent is in pre-classification triage and within the 30-day grace period defined in the firm's model risk policy.
2.2 Evidence artifact 2 — Agent Card
The Agent Card is the firm's plain-language model card. It is the document the validators read first and the document the MRM Committee references.
Required sections (verifier checks each is present and non-empty):
- Intended use — what business decisions or recommendations the agent supports
- Out-of-scope use — what the agent must not be used for
- Data — every knowledge source, RAG corpus, training input, and customer-data category the agent touches (cross-reference Control 2.16)
- Underlying model and version — the specific foundation model and provider; "Copilot Studio default" is not acceptable per the control's underlying-model admonition
- Limitations — failure modes, edge cases, populations or scenarios where the agent is known to underperform
- Performance benchmarks — quantitative results from pre-deployment outcomes analysis (§4)
- Known failure modes — categorical list with mitigation or compensating control
- Change history — every prompt change, knowledge-source change, model change, with date, approver, and material/non-material classification
Anomaly if any of the first seven sections is empty for a Tier 1 agent, or if the Change-history section has no entries since the agent's last validation.
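The section checks above can be automated against a parsed Agent Card. A sketch, assuming the card has been parsed into a section-name-to-content dictionary (a hypothetical representation; the real card may be a PDF or SharePoint page):

```python
AGENT_CARD_SECTIONS = [
    "Intended use", "Out-of-scope use", "Data", "Underlying model and version",
    "Limitations", "Performance benchmarks", "Known failure modes", "Change history",
]

def agent_card_status(card: dict[str, str], tier: str,
                      changes_since_last_validation: int) -> str:
    """Apply the Agent Card anomaly rule to one parsed card (sketch)."""
    first_seven = AGENT_CARD_SECTIONS[:7]
    if tier == "1" and any(not card.get(s, "").strip() for s in first_seven):
        return "Anomaly"   # a required section is empty for a Tier 1 agent
    if changes_since_last_validation == 0:
        return "Anomaly"   # Change history silent since the last validation
    return "Clean"
```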
2.3 Evidence artifact 3 — Pre-deployment validation memo
The validation memo is authored by personnel functionally independent of the model owner / developer, per SR 11-7 § V. Internal second-line MRM staff satisfy the independence test. An external assessment is one way to demonstrate independence (often commissioned for Tier 1) but is neither required nor a substitute for the firm's own second-line validation.
| Section | Expectation |
|---|---|
| Conceptual soundness | Validator's assessment of design appropriateness, methodology, assumptions, and stated limitations — citing the Agent Card, DSPM for AI inventory record, and Agent 365 governance template binding |
| Data assessment | Validator's review of knowledge sources for appropriateness, quality, representativeness (cross-reference Control 2.16) |
| Outcomes-analysis interpretation | Validator's reading of the Foundry evaluator runs (§4) — including any evaluator scores below the firm's threshold and the validator's recommendation (accept / accept-with-condition / reject) |
| Implementation verification | Validator's confirmation that the agent in production matches the validated configuration — includes manifest hash, governance-template binding, Entra Agent ID binding (cross-reference Control 2.26) |
| Limitations and caveats | Validator's plain statement of what the validation does not cover |
| Validator identity & independence | Named validator, role, line-of-defense, independence statement |
| Validator signature | Wet, e-signature, or cryptographic signature with timestamp |
Anomaly if the validator and the model owner / developer are the same individual without a documented compensating second-line counter-signature.
2.4 Evidence artifact 4 — MRM Committee minutes
For each Tier 1 agent, the MRM Committee must have formally accepted the validation. Minutes must reference the agent by ID and capture:
- Date and attendees (with line-of-defense designation)
- Specific challenge questions raised by the independent validator and other Committee members not involved in model development
- Disposition of each challenge question (accepted / requires-condition / requires-revalidation / rejected)
- Conditions imposed on production use (if any)
- Final decision: approved / approved-with-conditions / rejected
- Next scheduled re-validation date (risk-commensurate; see §4.1)
- Secretariat signature
Anomaly if minutes reference the agent but do not record at least one specific challenge question and disposition. Anomaly if the decision is "approved" but at least one Committee member counter-flagged the agent and no rationale for overriding is recorded.
2.5 Evidence artifact 5 — Identity attribution
Confirm the agent has a Microsoft Entra Agent ID (or, where Entra Agent ID is not yet at parity, a documented compensating identity binding) recorded in the inventory record. Cross-reference Control 2.26. Without an attributable identity the agent cannot be supervised under FINRA Rule 3110 and cannot be retired cleanly under §9.
2.6 PREDEPLOY scoring
| Status | Condition |
|---|---|
| Clean | All five evidence artifacts present, dated, signed, retained per §7, and MRM Committee minutes reference the agent. |
| Anomaly | Any artifact missing a required field; classifier or validator independence cannot be demonstrated; Committee minutes lack challenge-and-disposition. |
| Pending | Agent is within the firm's documented pre-production grace window; classification or validation is in flight with target completion date. |
| NotApplicable | Agent re-classified as Non-model with rationale; or agent retired before validation cycle (cross-reference §9). |
| Error | Inventory record unreachable; e-signature service down; evidence-store retrieval failure. Re-run cycle and escalate. |
§3 Ongoing-Monitoring Evidence Test (MONITOR namespace)
Criterion mapping: Control 2.6 Verification Criteria 4 (ongoing-monitoring subset), 6.
Cadence: Quarterly per agent at minimum; more frequently if the firm's model risk policy requires.
3.1 Per-agent monitoring report
For each in-scope agent, the firm's monitoring report for the quarter combines:
| Source | What the validator consumes |
|---|---|
| Copilot Studio Analytics | Conversation count, escalation / fallback rate, user-satisfaction (CSAT) where collected, response latency, error rate |
| Azure AI Foundry monitoring (where the underlying model runs on Foundry) | Token usage, latency percentiles, model-side error rates, throttling events, content-filter triggers |
| Microsoft Purview DSPM for AI interaction reports | Sensitive-prompt rate, sensitive-response rate, sentiment, customer-comment topic clustering, label-applied rate |
| Application Insights / Azure Monitor (where wired up by the agent's developer) | Custom application telemetry: tool-call counts, knowledge-source-hit rates, downstream API latency |
The validator (second line, not the model owner) signs a one-page summary stating: KPIs for the quarter, drift versus prior quarters, threshold-breach count, and whether the validator recommends continued operation or escalation.
3.2 Threshold-breach routing
| Item | Expectation |
|---|---|
| Thresholds defined per agent | Documented in the model inventory and Agent Card |
| Alerts route to a named owner | Single named individual (not a distribution list) plus secondary on-call; recorded in inventory record |
| Threshold breach → IR ticket | Every breach opens an incident per Control 3.4, with Root Cause Analysis |
| RCA feedback loop | RCA findings feed back to the MRM Committee at the next regular meeting; material findings trigger out-of-cycle review |
Anomaly if any in-scope Tier 1 / Tier 2 agent lacks defined thresholds, lacks a named owner, or has a breach this quarter without a corresponding Control 3.4 IR ticket.
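The three anomaly conditions can be swept in a single pass over the register. A sketch with illustrative field names (`thresholds`, `owner_upn`, `breaches`, and `ir_ticket` are assumptions about the inventory record's shape, not a documented schema):

```python
def monitor_anomalies(agents: list[dict]) -> list[str]:
    """Sweep the register for the threshold-breach anomaly conditions (sketch)."""
    findings = []
    for a in agents:
        if a.get("tier") not in ("1", "2"):
            continue   # the rule applies to Tier 1 / Tier 2 agents only
        if not a.get("thresholds"):
            findings.append(f"{a['agent_id']}: no thresholds defined")
        if not a.get("owner_upn"):
            findings.append(f"{a['agent_id']}: no named owner")
        findings += [f"{a['agent_id']}: breach without IR ticket"
                     for b in a.get("breaches", []) if not b.get("ir_ticket")]
    return findings
```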
3.3 MONITOR scoring
| Status | Condition |
|---|---|
| Clean | Quarterly monitoring report produced for the agent, signed by an independent validator, with thresholds, owner, breach count, and RCA references. |
| Anomaly | Report produced but missing signature; thresholds undefined; owner unnamed; breach without IR ticket. |
| Pending | Quarter not yet closed; report in draft within firm's documented production window (typically 15 business days after quarter end). |
| NotApplicable | Agent newly deployed mid-quarter; first full quarter not yet observable. |
| Error | Telemetry surface unavailable; analytics export failed. Re-run after surface restored. |
§4 Outcomes-Analysis Evidence Test (OUTCOMES namespace)
Criterion mapping: Control 2.6 Verification Criterion 4 (outcomes-analysis subset).
4.1 Risk-commensurate cadence — not fixed annual
SR 11-7 § V requires outcomes analysis at a frequency commensurate with risk, model complexity, and change activity — not on a fixed annual basis. The firm's model risk policy must define how cadence is set per agent. The verifier confirms the cadence is documented and being followed; the verifier does not opine on whether the cadence is correct.
| Driver | Effect on cadence |
|---|---|
| Tier 1 agent, customer-facing | More frequent (commonly quarterly or per material change) |
| Tier 2 agent, internal decision support | Commonly semi-annual or per material change |
| Material change (model, prompt, knowledge source) | Outcomes-analysis run before the change is approved for production |
| Vendor-driven model change | Outcomes-analysis run as part of vendor-model disposition (§5) |
| Threshold breach (§3) | Out-of-cycle outcomes-analysis run as part of RCA |
4.2 Foundry built-in evaluator coverage
For each outcomes-analysis run, confirm the following Foundry built-in evaluators were exercised against a representative evaluation dataset (the dataset itself is approved by the MRM Committee and version-controlled):
| Evaluator family | Specific evaluators | What it tells the validator |
|---|---|---|
| Quality | groundedness, relevance, coherence, fluency | How accurately and coherently the agent answers from its grounding |
| Safety | hate / unfairness, violence, sexual, self-harm, protected materials | Whether the agent produces or amplifies disallowed content |
| Agent-specific | tool-call accuracy, task-completion, intent-resolution | Whether the agent invokes the correct tools, completes multi-step tasks, and resolves user intent |
| Custom (firm-defined) | FSI-specific evaluators (e.g., Reg BI suitability heuristics, AML red-flag detection) | Validator-defined business-specific quality checks |
Evaluation dataset expectations:
- Representative of intended-use scenarios documented in the Agent Card
- Includes adversarial / edge-case scenarios proportionate to risk tier
- Versioned and retained alongside the run (so that re-runs are reproducible)
- Reviewed by the MRM Committee at least annually, or upon material change to intended use
4.3 Run record
Each Foundry evaluator run produces a record retained alongside the validation memo. The verifier confirms each run record contains:
- Agent ID and version (manifest hash)
- Underlying model and version at the time of the run
- Evaluation dataset name and version
- Per-evaluator score (numeric) plus any failed thresholds
- Validator's interpretation: pass / pass-with-condition / fail
- Validator signature
- Cross-reference into the MRM Committee minutes that disposed the run
4.4 OUTCOMES scoring
| Status | Condition |
|---|---|
| Clean | Run executed within the cadence defined in the model risk policy; all relevant evaluator families exercised; run record signed; Committee disposition recorded. |
| Anomaly | Run skipped beyond the policy-defined cadence; safety evaluators omitted for a Tier 1 / Tier 2 agent; run executed but no validator interpretation. |
| Pending | Run scheduled but not yet complete within policy window. |
| NotApplicable | Tier 3 or Non-model agent under firm's policy; or agent retired (§9). |
| Error | Foundry evaluator harness unavailable; dataset retrieval failure. |
§5 Vendor-Model Governance Test (VENDOR namespace, SR 11-7 § V)
Criterion mapping: Control 2.6 Verification Criterion 7. SR 11-7 § V requires vendor-supplied models be validated with the same rigour as internally-developed models.
5.1 Vendor-event watchlist
For each Microsoft / vendor surface that can change the underlying model behind an in-scope agent, confirm the firm has a subscribed monitoring channel, a named disposition owner, and a documented lead time:
| Vendor event | Surface to monitor | Disposition owner | Required lead time |
|---|---|---|---|
| Microsoft Copilot Studio default-model migration (e.g., GPT-5 → next default) | Microsoft 365 Message Center; Power Platform release plan | AI Governance Lead + Model Risk Manager | ≥ 30 days before effective date for Tier 1; per policy for lower tiers |
| Foundry model deprecation | Azure AI Foundry "Models" page; Azure Service Health | AI Administrator + Model Risk Manager | Per Microsoft published deprecation window |
| Foundry evaluator changes (judge-model updates, evaluator GA / deprecation) | Foundry release notes | Model Risk Manager | Before next outcomes-analysis run |
| Anthropic Claude availability changes in Copilot Studio (commercial-cloud and sovereign-cloud delta) | Copilot Studio release notes; sovereign roadmap | AI Governance Lead | Before publication / activation in production |
| Azure OpenAI Service model version pinning changes | Azure portal / Azure OpenAI release notes | AI Administrator | Before next in-scope agent deployment |
| Third-party connector / partner-model availability | Partner channel | AI Governance Lead | Per firm vendor-risk policy (Control 2.7) |
5.2 Disposition record
For each vendor event affecting an in-scope agent, the firm produces a vendor-model disposition record before the event takes effect. The verifier confirms each record contains:
| Field | Expectation |
|---|---|
| Vendor event description | Short narrative + link to vendor announcement |
| Effective date (vendor-stated) | UTC timestamp |
| Disposition decision date | Must precede the effective date |
| Affected in-scope agents | List of agent IDs from the §1.2 register |
| Impact assessment | Per-agent: does this materially change behaviour, performance, safety, or supervisory profile? |
| Decision | accept-as-non-material / accept-with-revalidation / pin-to-prior-version / suspend agent / retire agent (cross-reference §9) |
| Re-validation reference | If "accept-with-revalidation": pointer to the §4 outcomes-analysis run executed against the new model |
| Agent Card and validation memo updates | Confirm both updated to reference the new underlying model and version |
| MRM Committee acceptance | Pointer to minutes accepting the disposition |
| Signatures | Disposition owner + Model Risk Manager + (for Tier 1) MRM Committee Chair |
Anomaly if a vendor event has taken effect in production without a dated disposition record, or if the disposition decision date is after the effective date (i.e., disposed retrospectively).
5.3 VENDOR scoring
| Status | Condition |
|---|---|
| Clean | All vendor-event channels subscribed with named owners; every event with potential in-scope impact has a dated, signed disposition record predating the effective date; Agent Card and validation memo updated. |
| Anomaly | Vendor event with in-scope impact dispositioned after effective date; Agent Card not updated to reflect new underlying model; subscription gap on a watchlist channel. |
| Pending | Vendor event detected; impact assessment in flight within firm's documented disposition window. |
| NotApplicable | Vendor event affects only Non-model agents under firm's classification. |
| Error | Vendor channel feed unavailable; disposition store unreachable. |
§6 Effective-Challenge Test (CHALLENGE namespace)
Criterion mapping: Control 2.6 Verification Criterion 5; SR 11-7 § III ("effective challenge").
Effective challenge is the cornerstone of SR 11-7. Microsoft tooling cannot perform it. This test confirms that the firm's MRM Committee minutes evidence challenge questions raised by personnel not involved in model development for each in-scope Tier 1 agent, and that those questions were dispositioned.
6.1 Per-Tier-1-agent assertions
For each Tier 1 agent reviewed in the quarter, the MRM Committee minutes must:
- Identify the challenger. A named individual functionally independent of the model owner / developer raised at least one substantive challenge question. "Substantive" excludes administrative or scheduling questions.
- Record the question. The challenge question is recorded verbatim or paraphrased with sufficient specificity to be examined later.
- Record the disposition. The model owner / validator's response is recorded; the Committee accepted, conditionally accepted, or rejected the response.
- Record any conditions. Conditions imposed on continued production use are tracked to closure in subsequent meetings (verifier traces forward at least one quarter).
- Independence of challenger. The challenger is not the model owner, the model developer, or anyone reporting to the model owner. Where the challenger reports up through the same business line, a documented second-line counter-signature is required.
6.2 FINRA Rule 3110 supervisory linkage
For each Tier 1 agent that supports an in-scope FINRA business activity (broker-dealer customer interactions, investment recommendations, AML monitoring, trade surveillance, communications with the public), the verifier confirms the registered-principal supervisory layer (Control 2.12) is linked into the MRM workflow:
| Linkage point | Expectation |
|---|---|
| Registered-principal sign-off on validation acceptance | Recorded in MRM Committee minutes or in a parallel supervisory log |
| Supervisory review of Communication Compliance findings (Control 1.10) | Findings flow into the §3 monitoring report |
| Books-and-records preservation of supervisory review evidence | 6-year 17a-4(f) retention per §7 |
Anomaly if a Tier 1 agent supports an in-scope FINRA activity and no registered-principal sign-off appears in either the MRM minutes or the supervisory log for the quarter under review.
6.3 CHALLENGE scoring
| Status | Condition |
|---|---|
| Clean | Every Tier 1 agent in scope has at least one substantive challenge question and disposition recorded in minutes for the cycle; conditions tracked forward; FINRA Rule 3110 linkage in place where applicable. |
| Anomaly | Tier 1 agent reviewed without recorded challenge; challenger independence not demonstrable; conditions imposed but not tracked forward; FINRA linkage missing where required. |
| Pending | Committee meeting deferred within the firm's documented tolerance window. |
| NotApplicable | No Tier 1 agents in scope this cycle (rare; document the population check from §1.2). |
| Error | Minutes repository unreachable; secretariat signature service down. |
§7 Retention Test (RETAIN namespace)
Criterion mapping: Control 2.6 Verification Criterion 9; SEC Rule 17a-4(f); FINRA Rule 4511; SOX § 802.
7.1 Population of records to retain
| Record class | Minimum retention | Acceptable medium |
|---|---|---|
| Model-classification decisions | 6 years (life-of-model + 6 years for Tier 1) | Purview retention label with deletion lock; or approved 17a-4(f) vendor |
| Agent Cards (every version) | 6 years (life-of-model + 6 years for Tier 1) | Purview retention label with version history; or approved 17a-4(f) vendor |
| Pre-deployment validation memos | 6 years | Purview retention label; or approved 17a-4(f) vendor |
| MRM Committee minutes | 7 years (firm-wide governance practice; minimum 6 for SEC 17a-4) | Purview retention label; or approved 17a-4(f) vendor |
| Quarterly monitoring reports | 6 years | Purview retention label; or approved 17a-4(f) vendor |
| Foundry outcomes-analysis run records (with evaluation datasets) | 6 years | Purview retention label; or approved 17a-4(f) vendor |
| Vendor-model dispositions | 6 years | Purview retention label; or approved 17a-4(f) vendor |
| Model-retirement records | 6 years from retirement date | Purview retention label; or approved 17a-4(f) vendor |
Copilot Studio Analytics dashboards and Foundry monitoring dashboards are not 17a-4(f) WORM-compliant on their own. Periodic exports of these dashboards are the records of the business and must land in the retention medium above. Cross-reference Control 1.7 and Control 1.9.
7.2 Retention-label binding test
For a sample of 10 records (or 100% if fewer than 10 exist this cycle) drawn proportionally across record classes, the verifier confirms the binding of a Purview retention label with retentionDuration ≥ 6 years and deletionLocked = true (or, for vendor-stored records, the vendor's 17a-4(f) attestation reference). Confirm via Purview compliance portal or via Get-RetentionCompliancePolicy per PowerShell Setup and Control 1.7.
7.3 Sample retrieve-and-restore test
For one record from each of the eight record classes, the verifier (Purview Compliance Admin or equivalent) executes a retrieval from the retention store and confirms:
| Test step | Expectation |
|---|---|
| Retrieval latency | Within firm's documented "readily accessible" SLA for records less than 2 years old (typically same business day) |
| Content integrity | SHA-256 of retrieved bytes matches the SHA-256 stored in the original evidence record |
| Immutability proof | Retention label still bound; deletion lock still enforced; tampering attempt would be auditable |
| Restoration to working location | Record can be restored to a read-only working location for examiner inspection without removing the immutability binding |
Anomaly if any sample record fails latency, integrity, immutability, or restoration. Anomaly if a record older than 2 years cannot be retrieved within the firm's documented "near-line" SLA.
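The content-integrity step is a straight digest comparison. A minimal sketch; the real verifier would additionally emit the result as a §1.4 evidence record:

```python
import hashlib

def integrity_status(retrieved: bytes, recorded_sha256: str) -> str:
    """Compare the retrieved record's SHA-256 against the stored digest (sketch)."""
    actual = hashlib.sha256(retrieved).hexdigest()
    return "Clean" if actual == recorded_sha256.lower() else "Anomaly"
```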
7.4 RETAIN scoring
| Status | Condition |
|---|---|
| Clean | All eight record classes have policy-bound retention; sample of 10 + 8 retrieve-and-restore passes integrity and immutability checks. |
| Anomaly | Any record class lacks policy binding; sample retrieval fails; immutability binding broken; deletion-lock not enforced. |
| Pending | Retention policy migration in progress with documented target date. |
| NotApplicable | Record class produced zero records this cycle (document why). |
| Error | Purview surface unavailable; vendor 17a-4(f) portal unreachable. |
§8 Sovereign-Cloud Parity Test (SOV namespace)
Criterion mapping: Control 2.6 Sovereign-cloud admonition; applies to GCC, GCC High, DoD tenants.
8.1 Surface-by-surface parity check
The Microsoft surfaces this control depends on have known parity gaps in sovereign clouds at the verification date. For tenants in GCC / GCC High / DoD, the verifier confirms the firm has identified the gap and has a compensating manual control in place — re-verified quarterly.
| Surface | Commercial behaviour | Sovereign expectation | Compensating control if absent |
|---|---|---|---|
| Microsoft Purview DSPM for AI inventory | Authoritative AI inventory across Copilot, Copilot Studio, Foundry, Entra-registered AI apps | Verify regional availability and connector parity per Microsoft 365 Government roadmap | Manual SharePoint-list inventory reconciled monthly to Power Platform CoE export and Agent 365 inventory; sourced from operational sweeps documented in the model risk policy |
| Azure AI Foundry evaluator harness | Built-in groundedness / safety / agent-specific evaluators | Verify Foundry availability and judge-model parity in Azure Government / DoD; some evaluators rely on commercial-cloud-hosted judge models | Documented manual evaluation procedure per agent tier; rubric scored by independent validator |
| Agent 365 Admin Center governance console | Publication / activation approval surface; governance templates | Microsoft has not announced sovereign parity at Agent 365 GA; cross-reference Control 2.25 | Manual change-ticket-driven publish / activate workflow; AI Governance Lead approval recorded in firm ticketing system with retention binding |
| Microsoft Entra Agent ID lifecycle workflows | Identity-bound agent governance | Verify lifecycle-workflow parity per Control 2.26 | Manual Entra service-principal-based identity binding with documented sponsor and review cadence |
| Anthropic Claude in Copilot Studio | Available in commercial cloud | Independently gated per region | Treat as unavailable for in-scope agents until verified; pin to GA Microsoft / Azure OpenAI models |
8.2 Quarterly dual-signed sovereign attestation
Each quarter, for sovereign tenants only, produce a dual-signed attestation:
| Field | Expectation |
|---|---|
| Cloud | GCC / GCC High / DoD |
| As-of date | Last business day of the quarter |
| Surface-by-surface parity statement | Available / partial / unavailable per surface |
| Compensating control statement | For each unavailable surface, the compensating control reference and its operating effectiveness |
| In-scope agent count under compensating controls | Number of agents whose MRM evidence depends on a manual compensating control this quarter |
| Re-check date | Start of next quarter |
| Signatures | AI Governance Lead + Compliance Officer |
8.3 SOV scoring
| Status | Condition |
|---|---|
| Clean | Tenant is Commercial (test passes by definition); or sovereign tenant with current quarterly dual-signed attestation and compensating controls demonstrably operating. |
| Anomaly | Sovereign tenant lacks current attestation; compensating control not demonstrably operating for an in-scope agent. |
| Pending | Attestation in production within firm's quarterly close window. |
| NotApplicable | Not applicable for Commercial tenants other than to record the cloud detection result. |
| Error | Cloud detection unavailable; sovereign roadmap reference unreachable. |
§9 Model-Retirement Test (RETIRE namespace)
Criterion mapping: Control 2.6 Verification Criterion 10.
9.1 Per-cycle assertion
At least once per verification cycle (quarterly), the firm exercises its documented model-retirement / sunset workflow against a real in-scope agent. If no in-scope agent is being retired in the cycle, the firm exercises a tabletop walkthrough against a hypothetical retirement and records the artifacts the workflow would produce.
9.2 Retirement record
For each retirement (real or tabletop), the verifier confirms a retirement record exists containing:
| Field | Expectation |
|---|---|
| Agent ID and name | Matches the §1.2 register |
| Retirement decision date | Recorded in MRM Committee minutes |
| Retirement effective date | UTC timestamp; date the agent ceased availability |
| Reason for retirement | Business-driven / model-risk-driven / vendor-driven (model deprecation) / replacement / regulatory |
| Disposition of in-flight conversations and pending tasks | Drained, transferred to replacement, or terminated with user notification |
| Identity disposal | Entra Agent ID disabled; Power Platform solution archived; Agent 365 publication revoked |
| Knowledge-source / RAG corpus disposition | Retained for retention period or transferred to replacement agent (cross-reference Control 2.16) |
| Communications to users | Notification text retained |
| Evidence preservation | Development, validation, monitoring, outcomes-analysis, and retirement evidence bundled and bound to retention label per §7 |
| Retirement signatures | AI Governance Lead + Model Risk Manager + (for Tier 1) MRM Committee Chair |
9.3 Post-retirement orphan check
After retirement, confirm via Control 3.6 detection cycles that the retired agent does not re-appear as an orphan. A re-appearance indicates the retirement was incomplete (typically: identity disabled but solution still resolvable in the tenant).
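The re-check reduces to a set intersection between the retired-agent register and the agent IDs surfaced by the latest detection sweep. A minimal sketch, assuming the two ID lists are exported from the §1.2 register and the Control 3.6 detection cycle respectively (both inputs here are hypothetical):

```python
def orphan_recheck(retired_agent_ids, detected_agent_ids):
    """Post-retirement orphan check: any retired agent ID that still
    appears in a detection sweep indicates an incomplete retirement
    (e.g. identity disabled but solution still resolvable)."""
    reappeared = sorted(set(retired_agent_ids) & set(detected_agent_ids))
    return {
        "status": "Anomaly" if reappeared else "Clean",
        "reappeared": reappeared,
    }
```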
9.4 RETIRE scoring
| Status | Condition |
|---|---|
| Clean | At least one retirement (real or tabletop) exercised this cycle with all required record fields and signatures; post-retirement orphan check passes. |
| Anomaly | Retirement record missing fields; identity not disabled; evidence not preserved; retired agent re-appears as orphan in Control 3.6 cycle. |
| Pending | Retirement in flight; effective date in future. |
| NotApplicable | Tabletop only this cycle; record the rationale. |
| Error | Retirement record store unreachable. |
§10 Examination-Readiness Rehearsal (EXAM namespace)
Criterion mapping: Aggregates §1–§9; supports OCC, Federal Reserve, FDIC, NYDFS, FINRA, SEC examination requests.
The examiner will not ask "is DSPM for AI configured?" The examiner will ask: "Show me your AI model inventory. Pick one Tier 1 agent. Show me the validation memo, the MRM Committee minutes that accepted it, the last four quarters of monitoring, and the most recent outcomes-analysis run." This rehearsal confirms the firm can produce those artifacts on demand within examiner-acceptable timeframes.
10.1 Rehearsal cadence
The rehearsal is exercised at least annually, and additionally:
- 30 days before any pre-announced regulatory examination
- After any material change to the MRM program, the model inventory schema, or the retention medium
- After any change in MRM Committee chair or Model Risk Manager
10.2 Rehearsal scenarios
The MRM Committee secretariat selects one scenario from each of the four families and times the response. Target SLAs are illustrative; the firm sets its own SLAs in the model risk policy.
| Scenario family | Examiner-style request | Target SLA |
|---|---|---|
| Inventory completeness | "Provide the full inventory of in-scope AI agents with tier, underlying model, validation status, and last-validation date." | ≤ 1 business day |
| Per-agent validation pack | "For agent [randomly selected Tier 1], provide the model-classification decision, Agent Card, pre-deployment validation memo, and MRM Committee minutes accepting the validation." | ≤ 2 business days |
| Per-agent ongoing operations | "For agent [randomly selected Tier 1 or Tier 2], provide the last four quarters of monitoring reports and the most recent outcomes-analysis runs." | ≤ 3 business days |
| MRM Committee operating effectiveness | "Provide the MRM Committee minutes for any approved Tier 1 agent over the past year, with documented effective-challenge questions and dispositions." | ≤ 3 business days |
10.3 Rehearsal record
Each rehearsal produces a record containing: scenario selected, agent or population queried, time-to-produce, completeness of produced bundle, gaps identified, and remediation actions assigned. The record is signed by the MRM Committee Chair and retained per §7.
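The time-to-produce measurement can be sketched as a business-day count against the firm-set SLA. This sketch excludes weekends only; a real implementation would also exclude firm holidays, and the dates and SLA values below are illustrative.

```python
from datetime import date, timedelta

def business_days_between(start, end):
    """Count business days from the request date to the production
    date, excluding weekends (firm holidays would also be excluded
    in a production implementation)."""
    days, d = 0, start
    while d < end:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday..Friday
            days += 1
    return days

def rehearsal_timing(requested, produced, sla_business_days):
    """SLA check for one rehearsal scenario."""
    elapsed = business_days_between(requested, produced)
    return {
        "elapsed_business_days": elapsed,
        "sla_business_days": sla_business_days,
        "status": "Clean" if elapsed <= sla_business_days else "Anomaly",
    }
```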
10.4 Examination-pack assembler
For automated rehearsals, the firm runs an examination-pack assembler (a wrapper around the §1.4 evidence schema) that:
- Accepts an agent ID or population filter
- Pulls every PREDEPLOY, MONITOR, OUTCOMES, VENDOR, CHALLENGE, RETAIN, and RETIRE evidence record for the agent / population
- Validates each record against the §1.4 schema; refuses to assemble if any record fails
- Produces a single signed bundle (Merkle-rooted SHA-256 chain over the canonical-JSON evidence records) plus a plain-language summary suitable for examiner delivery
- Logs assembly to the audit trail per Control 1.7
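The Merkle-rooted chain over canonical-JSON evidence records can be sketched as below. The canonicalization here (sorted keys, compact separators) is one common convention, assumed rather than mandated by the §1.4 schema; the record contents are hypothetical.

```python
import hashlib
import json

def canonical_digest(record):
    """SHA-256 over a canonical-JSON encoding (sorted keys, compact
    separators) of one evidence record, so logically identical records
    always hash identically."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).digest()

def merkle_root(records):
    """Merkle root over the canonical digests of the evidence records,
    duplicating the last node on odd-sized levels. Refuses to assemble
    an empty bundle, mirroring the assembler's refusal behavior."""
    level = [canonical_digest(r) for r in records]
    if not level:
        raise ValueError("refusing to assemble an empty bundle")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

An examiner (or Internal Audit) can recompute the root from the delivered records and compare it to the signed value; any altered, inserted, or dropped record changes the root.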
10.5 EXAM scoring
| Status | Condition |
|---|---|
| Clean | All four rehearsal scenarios completed within firm-set SLAs; pack assembler produces a complete, signed bundle; gaps remediated. |
| Anomaly | One or more rehearsal scenarios missed SLA; pack assembler refused due to schema-invalid records; gap identified without remediation owner. |
| Pending | Rehearsal scheduled; in flight within firm's documented tolerance. |
| NotApplicable | First-cycle rehearsal not yet due (newly stood-up MRM-for-AI program within firm's documented grace window). |
| Error | Pack assembler unavailable; evidence store offline. |
§11 Scoring Rubric
Every test in §2–§10 emits exactly one of the following statuses. The MRM Committee secretariat aggregates statuses into the cycle attestation; the MRM Committee — not this playbook — accepts or rejects the underlying validation.
| Status | Definition | What it means for the cycle |
|---|---|---|
| Clean | The assertion was tested, the expected evidence was found, and it satisfies the criterion as defined in this playbook and in Control 2.6's Verification Criteria. | No remediation required for this assertion; the assertion contributes to a "clean" cycle. |
| Anomaly | The assertion was tested and the evidence was either missing, incomplete, unsigned, mis-dated, mis-bound to retention, lacking independence, lacking effective challenge, lacking owner, or otherwise non-conforming to the criterion. | Remediation owner assigned; remediation tracked to closure in next cycle; aggregated into the cycle attestation as an open finding to the MRM Committee. |
| Pending | The assertion is in flight within a firm-documented tolerance window — for example, quarterly monitoring report still in draft within the 15-business-day production window, or a vendor-event disposition under review within the firm's documented disposition window. | Re-tested at next cycle close; if still Pending past the tolerance window, automatically escalates to Anomaly. |
| NotApplicable | The assertion does not apply this cycle — for example, no Tier 1 agents in scope, or agent re-classified as Non-model with rationale, or no retirements (real) this cycle and tabletop performed instead. | Recorded with rationale; reviewed at next quarterly cycle to confirm the rationale still holds. |
| Error | The test could not be executed because of a tooling or surface failure (Purview portal unreachable, retention store offline, vendor channel feed down, signature service unavailable). The status reflects test execution, not the firm's compliance posture. | Re-run within firm-documented retry window; if persistent, escalate to AI Administrator and to the Microsoft support channel; document the workaround used to produce evidence in the interim. |
11.1 Cycle-level aggregation
The cycle attestation rolls up every assertion into a single MRM cycle status:
| Cycle status | Aggregation rule |
|---|---|
| Clean cycle | All assertions are Clean, NotApplicable, or Pending within tolerance. Zero Anomaly and zero Error. |
| Cycle with findings | One or more Anomaly. Cycle attestation lists each finding with remediation owner, target close date, and risk rating. MRM Committee disposes each finding at next regular meeting. |
| Cycle blocked | One or more Error that prevents production of evidence required by Control 2.6 Verification Criteria. Cycle attestation lists each blocker with restoration plan and interim manual control. |
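The roll-up in the table above can be sketched as a precedence check. One assumption is made explicit here: when a cycle contains both Error and Anomaly statuses, this sketch reports "Cycle blocked" (Error takes precedence), since evidence production is impeded; the aggregation rules do not state a precedence, so the firm's policy governs. Escalation of Pending past its tolerance window is assumed to happen upstream, before aggregation, per the §11 rubric.

```python
def aggregate_cycle(statuses):
    """Roll up per-assertion statuses into the single MRM cycle status.
    Inputs are assumed to be post-tolerance statuses: a Pending that
    has exceeded its window has already been escalated to Anomaly."""
    allowed = {"Clean", "Anomaly", "Pending", "NotApplicable", "Error"}
    unknown = set(statuses) - allowed
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    if "Error" in statuses:
        return "Cycle blocked"       # evidence production impeded
    if "Anomaly" in statuses:
        return "Cycle with findings"  # findings go to the MRM Committee
    return "Clean cycle"
```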
11.2 What a clean cycle does not mean
A clean cycle indicates the firm's MRM program is operating with the in-scope AI agents and producing the evidence the framework expects. It does not mean the agents are safe, the models are accurate, the validations are correct, the MRM Committee's challenge was sufficient, or the firm is in regulatory compliance. Those judgements remain with the MRM Committee, the Compliance Officer, the Internal Audit function, and the firm's regulators. The Microsoft surfaces this control depends on support, but do not substitute for, those judgements.
Related Playbooks
- Portal Walkthrough — Step-by-step portal configuration that produces the inventory, Agent Card, and monitoring evidence consumed here
- PowerShell Setup — Automation scripts for inventory export, monitoring report generation, and pack assembly
- Troubleshooting — Common issues and resolutions
Related Controls
- Control 1.6 — Microsoft Purview DSPM for AI — Authoritative AI inventory feeding §1.2
- Control 1.7 — Comprehensive Audit Logging — 17a-4(f) WORM retention path for §7
- Control 1.9 — Data Retention and Deletion — Retention schedule for validation evidence
- Control 1.10 — Communication Compliance — Supervisory review feeding §6.2
- Control 2.3 — Change Management — Vendor-driven change capture for §5
- Control 2.7 — Vendor and Third-Party Risk Management — Vendor-model governance under SR 11-7 § V
- Control 2.12 — FINRA Rule 3110 Supervision — Registered-principal linkage in §6.2
- Control 2.16 — RAG Source Integrity Validation — Knowledge-source integrity feeding §2.2 and §9.2
- Control 2.25 — Agent 365 Admin Center Governance Console — Publication / activation approval surface
- Control 2.26 — Entra Agent ID Identity Governance — Identity attribution for §2.5 and §9.2
- Control 3.4 — Incident Reporting and Root Cause Analysis — Threshold-breach escalation in §3.2
- Control 3.6 — Orphaned Agent Detection — Post-retirement orphan check in §9.3
Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current