Control 2.20: Adversarial Testing and Red Team Framework
Control ID: 2.20
Pillar: Management
Regulatory Reference: OCC Bulletin 2011-12, Federal Reserve SR 11-7, FINRA Rule 3110, FINRA Regulatory Notice 25-07, NIST AI RMF (with Generative AI Profile, NIST AI 600-1), MITRE ATLAS, OWASP Top 10 for LLM Applications (2025)
Last UI Verified: April 2026
Governance Levels: Baseline / Recommended / Regulated
Objective
Establish a proactive adversarial testing program to identify vulnerabilities in AI agents before deployment through red team exercises, prompt injection testing, jailbreak simulations, and robustness validation using golden datasets.
Why This Matters for FSI
- OCC Bulletin 2011-12 / Federal Reserve SR 11-7: Effective challenge of model design helps meet supervisory expectations for independent stress and robustness testing of AI agents used in business decisions.
- FINRA Rule 3110 / Notice 25-07 (March 2025): Documented adversarial testing supports the firm's supervisory system and Written Supervisory Procedures (WSPs) for AI tools used by associated persons.
- SEC Rule 17a-4(b)(4) / 18a-6: Red-team test artifacts (prompts, responses, severity, remediation) are business records that must be preserved when they evidence supervision of an AI system.
- GLBA 501(b) Safeguards Rule: Periodic adversarial assessment helps demonstrate ongoing evaluation of safeguards on systems handling customer NPI.
- NIST AI RMF (MEASURE 2.6, MANAGE 4.1): Structured red-team programs support the "Measure" and "Manage" functions for generative AI risk.
No companion solution by design
Not all controls have a companion solution in FSI-AgentGov-Solutions; solution mapping is selective by design. This control is operated via native Microsoft admin surfaces and verified by the framework's assessment-engine collectors. See the Solutions Index for the catalog and coverage scope.
Control Description
This control establishes a proactive adversarial testing program — distinct from the detection posture in Control 1.21 (Adversarial Input Logging) and the general QA posture in Control 2.5 (Testing & Validation). The program is built on six elements:
- Red Team Program Charter — Documented scope, rules of engagement, authorization, separation of duties, and out-of-scope assets. Aligned with the firm's WSPs and the Microsoft AI Red Team operating model.
- Attack Library — Curated, version-controlled test cases covering OWASP Top 10 for LLM Applications (2025), MITRE ATLAS techniques, and FSI-specific abuse cases (NPI exfiltration, MNPI elicitation, unsuitable-recommendation prompting, suitability bypass).
- Golden Dataset Validation — Reference datasets of representative domain-specific Q&A pairs used to detect regressions in agent accuracy and refusal behaviour after configuration or model changes.
- Tooling and Automation — Integration with Microsoft PyRIT (Python Risk Identification Toolkit) for orchestrated probes against Azure OpenAI / Azure AI Foundry-backed agents, and Azure AI Foundry built-in Risk and Safety Evaluations for content-safety scoring.
- Evidence and Remediation Tracking — SHA-256-hashed, WORM-retained test packs with severity classification (Critical / High / Medium / Low) and SLA-based remediation tracked to closure.
- Cadence and Independence — Pre-deployment gates plus quarterly (Zone 2) or continuous + annual third-party (Zone 3) cycles, with operator role separation enforced for model-validation independence per OCC 2011-12.
Adversarial testing is preventive; Control 1.21 is detective. Both should be deployed together for Zone 2 and Zone 3 agents.
Key Configuration Points
- Charter the red team program with named operators, rules of engagement, authorization, and out-of-scope assets; review annually.
- Maintain an attack library that maps to OWASP Top 10 for LLM Applications (2025) and MITRE ATLAS technique IDs; version-control under change management.
- Build and maintain a golden dataset of at least 100 domain-specific Q&A pairs per Zone 3 agent; refresh after any material model, prompt, or grounding-source change.
- Integrate adversarial probes into the pre-deployment pipeline (gate releases on a defined defense-rate threshold per zone).
- Use Microsoft PyRIT for orchestrated probing of Azure OpenAI / Azure AI Foundry-backed agents; use Azure AI Foundry Risk and Safety Evaluations for content-safety scoring.
- Define remediation SLAs: Critical (24 hours to fix or compensating control), High (7 days), Medium (30 days), Low (next release).
- Schedule quarterly red-team exercises for Zone 2 agents; continuous testing plus annual independent third-party assessment for Zone 3.
- Preserve test artefacts (prompts, responses, severity, remediation, SHA-256 manifest) to immutable storage for the firm's records-retention horizon (commonly 6+ years for FINRA 4511 / SEC 17a-4(b)(4)).
- Enforce operator role separation — the red-team operator may not also be the agent's primary developer or owner without a co-signer.
Zone-Specific Requirements
| Zone | Requirement | Rationale |
|---|---|---|
| Zone 1 (Personal) | Basic prompt injection tests; pre-deployment only | Low risk, minimal adversarial exposure |
| Zone 2 (Team) | Comprehensive test suite; quarterly testing; golden dataset validation | Shared agents warrant structured testing |
| Zone 3 (Enterprise) | Full red team program; continuous testing; third-party assessment annually; immediate remediation | Customer-facing requires maximum adversarial validation |
Roles & Responsibilities
| Role | Responsibility |
|---|---|
| AI Governance Lead | Charter, scope, rules of engagement, remediation oversight, and reporting to the AI governance committee |
| Security Architect | Maintain attack library; align test coverage with MITRE ATLAS and OWASP LLM Top 10 (2025) |
| Cloud Security Architect | Execute adversarial probes; tune detection patterns; reconcile findings with Control 1.21 detection telemetry |
| Model Risk Manager | Independent challenge per OCC 2011-12 / SR 11-7; sign off on golden dataset adequacy |
| Agent Owner | Remediate identified vulnerabilities within SLA; provide configuration and prompt-design fixes |
| Copilot Studio Agent Author | Implement remediations (topic, instruction, grounding-scope, content-filter changes) |
| Compliance Officer | Confirm WSP language reflects documented testing cadence and findings retention |
Related Controls
| Control | Relationship |
|---|---|
| 1.21 - Adversarial Input Logging | Detection complements proactive testing |
| 2.5 - Testing & Validation | Adversarial testing is part of validation |
| 2.6 - Model Risk Management | Red team supports model validation |
| 2.11 - Bias Testing | Complementary testing approach |
Implementation Playbooks
Step-by-Step Implementation
This control has detailed playbooks for implementation, automation, testing, and troubleshooting:
- Portal Walkthrough — Step-by-step portal configuration
- PowerShell Setup — Automation scripts
- Verification & Testing — Test cases and evidence collection
- Troubleshooting — Common issues and resolutions
Verification Criteria
Confirm control effectiveness by verifying:
- Red team program charter exists, is approved, and documents scope, authorization, rules of engagement, and operator role separation.
- Attack library covers all OWASP Top 10 for LLM Applications (2025) categories and at least the high-relevance MITRE ATLAS techniques for the agent's deployment surface.
- Golden dataset for each Zone 3 agent contains ≥100 representative Q&A pairs and shows a documented refresh date within the last quarter.
- Pre-deployment adversarial probes run automatically and gate releases against a defined defense-rate threshold; evidence pack (prompts, responses, severity, manifest with SHA-256 hashes) is generated per run.
- Remediation tracking shows SLA conformance for identified vulnerabilities; overdue items have a documented compensating control and risk-acceptance signature.
- For Zone 3 agents, an annual independent third-party adversarial assessment report is on file and findings are tracked to closure.
- Test artefacts are stored on immutable / WORM-capable media for the firm's records-retention horizon and are reconcilable with Control 1.21 detection events by ConversationId / AgentId.
Additional Resources
- MITRE ATLAS: Adversarial Threat Landscape for AI Systems (tactics and techniques catalog)
- OWASP Top 10 for LLM Applications (2025)
- Microsoft AI Red Team
- Microsoft PyRIT — Python Risk Identification Toolkit for generative AI
- Azure AI Foundry — Risk and Safety Evaluations for generative AI
- Azure AI Content Safety — Prompt Shields
- NIST AI Risk Management Framework (with Generative AI Profile, NIST AI 600-1)
- FINRA Regulatory Notice 25-07 — AI tools and supervisory obligations (March 2025)
Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current