
Control 2.20: Adversarial Testing and Red Team Framework

Control ID: 2.20
Pillar: Management
Regulatory Reference: OCC Bulletin 2011-12, Federal Reserve SR 11-7, FINRA Rule 3110, FINRA Regulatory Notice 25-07, NIST AI RMF (with Generative AI Profile, NIST AI 600-1), MITRE ATLAS, OWASP Top 10 for LLM Applications (2025)
Last UI Verified: April 2026
Governance Levels: Baseline / Recommended / Regulated


Objective

Establish a proactive adversarial testing program to identify vulnerabilities in AI agents before deployment through red team exercises, prompt injection testing, jailbreak simulations, and robustness validation using golden datasets.


Why This Matters for FSI

  • OCC Bulletin 2011-12 / Federal Reserve SR 11-7: Effective challenge of model design helps meet supervisory expectations for independent stress and robustness testing of AI agents used in business decisions.
  • FINRA Rule 3110 / Notice 25-07 (March 2025): Documented adversarial testing supports the firm's supervisory system and Written Supervisory Procedures (WSPs) for AI tools used by associated persons.
  • SEC Rule 17a-4(b)(4) / 18a-6: Red-team test artifacts (prompts, responses, severity, remediation) are business records that must be preserved when they evidence supervision of an AI system.
  • GLBA 501(b) Safeguards Rule: Periodic adversarial assessment helps demonstrate ongoing evaluation of safeguards on systems handling customer NPI.
  • NIST AI RMF (MEASURE 2.6, MANAGE 4.1): Structured red-team programs support the "Measure" and "Manage" functions for generative AI risk.

No companion solution by design

Not all controls have a companion solution in FSI-AgentGov-Solutions; solution mapping is selective by design. This control is operated via native Microsoft admin surfaces and verified by the framework's assessment-engine collectors. See the Solutions Index for the catalog and coverage scope.

Control Description

This control establishes a proactive adversarial testing program — distinct from the detection posture in Control 1.21 (Adversarial Input Logging) and the general QA posture in Control 2.5 (Testing & Validation). The program is built on six elements:

  1. Red Team Program Charter — Documented scope, rules of engagement, authorization, separation of duties, and out-of-scope assets. Aligned with the firm's WSPs and the Microsoft AI Red Team operating model.
  2. Attack Library — Curated, version-controlled test cases covering OWASP Top 10 for LLM Applications (2025), MITRE ATLAS techniques, and FSI-specific abuse cases (NPI exfiltration, MNPI elicitation, unsuitable-recommendation prompting, suitability bypass).
  3. Golden Dataset Validation — Reference datasets of representative domain-specific Q&A pairs used to detect regressions in agent accuracy and refusal behavior after configuration or model changes.
  4. Tooling and Automation — Integration with Microsoft PyRIT (Python Risk Identification Toolkit) for orchestrated probes against Azure OpenAI / Azure AI Foundry-backed agents, and Azure AI Foundry built-in Risk and Safety Evaluations for content-safety scoring.
  5. Evidence and Remediation Tracking — SHA-256-hashed, WORM-retained test packs with severity classification (Critical / High / Medium / Low) and SLA-based remediation tracked to closure.
  6. Cadence and Independence — Pre-deployment gates plus quarterly (Zone 2) or continuous + annual third-party (Zone 3) cycles, with operator role separation enforced for model-validation independence per OCC 2011-12.

Adversarial testing is preventive; Control 1.21 is detective. Both should be deployed together for Zone 2 and Zone 3 agents.
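The golden-dataset regression check described above can be sketched in a few lines. This is a minimal illustration, assuming the agent is callable as a plain function and that a "pass" means the expected text appears in the agent's answer; `GoldenPair`, `evaluate_agent`, and the matching rule are illustrative, not part of the framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenPair:
    question: str
    expected: str  # expected answer fragment (or refusal wording for refusal cases)

def evaluate_agent(agent: Callable[[str], str],
                   golden_pairs: list[GoldenPair],
                   baseline_pass_rate: float) -> dict:
    """Run every golden question through the agent and compare pass rates.

    A pair passes when the expected text appears in the agent's answer
    (case-insensitive). A regression is flagged when the pass rate drops
    below the recorded baseline.
    """
    passed = 0
    failures = []
    for pair in golden_pairs:
        answer = agent(pair.question)
        if pair.expected.lower() in answer.lower():
            passed += 1
        else:
            failures.append(pair.question)
    pass_rate = passed / len(golden_pairs)
    return {
        "pass_rate": pass_rate,
        "regressed": pass_rate < baseline_pass_rate,
        "failures": failures,
    }
```

In practice the matching rule would be richer (semantic similarity, refusal classifiers), but the shape is the same: fixed inputs, recorded baseline, automated comparison after every material change.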


Key Configuration Points

  • Charter the red team program with named operators, rules of engagement, authorization, and out-of-scope assets; review annually.
  • Maintain an attack library that maps to OWASP Top 10 for LLM Applications (2025) and MITRE ATLAS technique IDs; version-control under change management.
  • Build and maintain a golden dataset of at least 100 domain-specific Q&A pairs per Zone 3 agent; refresh after any material model, prompt, or grounding-source change.
  • Integrate adversarial probes into the pre-deployment pipeline (gate releases on a defined defense-rate threshold per zone).
  • Use Microsoft PyRIT for orchestrated probing of Azure OpenAI / Azure AI Foundry-backed agents; use Azure AI Foundry Risk and Safety Evaluations for content-safety scoring.
  • Define remediation SLAs: Critical (24 hours to fix or compensating control), High (7 days), Medium (30 days), Low (next release).
  • Schedule quarterly red-team exercises for Zone 2 agents; continuous testing plus annual independent third-party assessment for Zone 3.
  • Preserve test artifacts (prompts, responses, severity, remediation, SHA-256 manifest) to immutable storage for the firm's records-retention horizon (commonly 6+ years for FINRA 4511 / SEC 17a-4(b)(4)).
  • Enforce operator role separation — the red-team operator may not also be the agent's primary developer or owner without a co-signer.
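The defense-rate release gate above can be sketched as a simple check over probe outcomes. The per-zone thresholds below are placeholders (this control does not prescribe specific values), and `release_gate` is a hypothetical helper name.

```python
# Per-zone minimum defense rates: the share of adversarial probes the agent
# must block before release. Values here are illustrative, not prescribed.
ZONE_THRESHOLDS = {"zone1": 0.80, "zone2": 0.95, "zone3": 0.99}

def defense_rate(results: list[bool]) -> float:
    """Fraction of adversarial probes the agent successfully defended."""
    if not results:
        raise ValueError("no probe results: gate cannot evaluate an empty run")
    return sum(results) / len(results)

def release_gate(zone: str, results: list[bool]) -> bool:
    """True when the probe run meets the zone's defense-rate threshold."""
    return defense_rate(results) >= ZONE_THRESHOLDS[zone]
```

A pipeline step would feed this the per-probe blocked/not-blocked outcomes from the test run and fail the build when the gate returns False.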

Zone-Specific Requirements

  • Zone 1 (Personal): basic prompt injection tests; pre-deployment only. Rationale: low risk, minimal adversarial exposure.
  • Zone 2 (Team): comprehensive test suite; quarterly testing; golden dataset validation. Rationale: shared agents warrant structured testing.
  • Zone 3 (Enterprise): full red team program; continuous testing; annual third-party assessment; immediate remediation. Rationale: customer-facing agents require maximum adversarial validation.

Roles & Responsibilities

  • AI Governance Lead: charter, scope, rules of engagement, remediation oversight, and reporting to the AI governance committee.
  • Security Architect: maintain the attack library; align test coverage with MITRE ATLAS and the OWASP LLM Top 10 (2025).
  • Cloud Security Architect: execute adversarial probes; tune detection patterns; reconcile findings with Control 1.21 detection telemetry.
  • Model Risk Manager: independent challenge per OCC 2011-12 / SR 11-7; sign off on golden dataset adequacy.
  • Agent Owner: remediate identified vulnerabilities within SLA; provide configuration and prompt-design fixes.
  • Copilot Studio Agent Author: implement remediations (topic, instruction, grounding-scope, content-filter changes).
  • Compliance Officer: confirm WSP language reflects documented testing cadence and findings retention.

Control Relationship
  • Control 1.21 (Adversarial Input Logging): detection complements proactive testing.
  • Control 2.5 (Testing & Validation): adversarial testing is part of validation.
  • Control 2.6 (Model Risk Management): red-team findings support model validation.
  • Control 2.11 (Bias Testing): complementary testing approach.

Implementation Playbooks

Step-by-Step Implementation

This control has detailed playbooks covering implementation, automation, testing, and troubleshooting.


Verification Criteria

Confirm control effectiveness by verifying:

  1. Red team program charter exists, is approved, and documents scope, authorization, rules of engagement, and operator role separation.
  2. Attack library covers all OWASP Top 10 for LLM Applications (2025) categories and at least the high-relevance MITRE ATLAS techniques for the agent's deployment surface.
  3. Golden dataset for each Zone 3 agent contains ≥100 representative Q&A pairs and shows a documented refresh date within the last quarter.
  4. Pre-deployment adversarial probes run automatically and gate releases against a defined defense-rate threshold; evidence pack (prompts, responses, severity, manifest with SHA-256 hashes) is generated per run.
  5. Remediation tracking shows SLA conformance for identified vulnerabilities; overdue items have a documented compensating control and risk-acceptance signature.
  6. For Zone 3 agents, an annual independent third-party adversarial assessment report is on file and findings are tracked to closure.
  7. Test artifacts are stored on immutable / WORM-capable media for the firm's records-retention horizon and are reconcilable with Control 1.21 detection events by ConversationId / AgentId.

Additional Resources


Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current