Control 2.20 — Portal Walkthrough: Adversarial Testing and Red Team Framework
Control: 2.20 — Adversarial Testing and Red Team Framework
Pillar: Management
Last UI Verified: April 2026
Estimated time: 6–10 hours initial setup; 2–8 hours per recurring test cycle (varies by zone and agent surface)
Governance Levels: Baseline / Recommended / Regulated
READ FIRST — what this walkthrough is and is NOT
This walkthrough configures the proactive adversarial testing program (red-team charter, attack library, isolated test environment, golden dataset, pre-deployment gate, evidence pack) for Microsoft 365 Copilot, Copilot Studio agents, and Azure AI Foundry-backed FSI agents.
It is NOT a substitute for the following sibling controls. Each is a separate configuration surface with its own playbook:
| If you need… | Use Control | Why this is not 2.20 |
|---|---|---|
| Detection of prompt injection / jailbreak / XPIA in production | 1.21 — Adversarial Input Logging | 2.20 is preventive (pre-deployment + scheduled probes); 1.21 is detective (production telemetry) |
| General agent QA, regression testing, suitability validation | 2.5 — Testing & Validation | 2.5 covers functional QA; 2.20 covers adversarial / safety scenarios |
| Independent model validation per OCC 2011-12 / SR 11-7 | 2.6 — Model Risk Management | 2.6 is the program; 2.20 is one input to 2.6 |
| Bias and fairness testing | 2.11 — Bias Testing & Fairness | 2.11 covers protected-class fairness; 2.20 covers safety / abuse / exfiltration |
| Sensitive Information Type tuning so DLP / Comm Compliance fires correctly during red-team probes | 1.13 — SITs and Pattern Recognition | SIT tuning is a prerequisite for measuring exfiltration test outcomes |
| Content Safety / Prompt Shields configuration on the Azure AI deployment under test | 1.21 §3 (Prompt Shields surface) | 2.20 measures whether Prompt Shields blocks the attack; it does not configure Prompt Shields |
| Evidence retention to WORM under SEC 17a-4(f) | 1.19 — eDiscovery for Agent Interactions | 2.20 generates evidence; 1.19 holds it |
Hedged-language reminder — supports, does not guarantee
The procedures below support compliance with OCC Bulletin 2011-12, Federal Reserve SR 11-7, FINRA Rule 3110, FINRA Regulatory Notice 25-07 (March 2025), SEC Rule 17a-4(b)(4), GLBA 501(b), and the NIST AI RMF Generative AI Profile (NIST AI 600-1). They do not by themselves guarantee regulatory compliance. The Compliance Officer, Model Risk Manager, and CISO must independently validate that the firm's WSPs reference the documented testing cadence, the actual defense-rate thresholds in use, and the firm's records-retention horizon.
What this walkthrough covers — surfaces & owners
| # | Surface | Portal | Owner role | Notes |
|---|---|---|---|---|
| 1 | Red-team program charter (document) | Internal GRC system | AI Governance Lead | Charter must precede any tenant action |
| 2 | Isolated test environment | `admin.powerplatform.microsoft.com` | Power Platform Admin | Sandbox; no production data |
| 3 | Test agent deployment (Copilot Studio) | `copilotstudio.microsoft.com` | Copilot Studio Agent Author | Cloned from production at known config snapshot |
| 4 | Azure AI Foundry deployment under test (if applicable) | `ai.azure.com` | Azure AI Owner | Read-only review of content filters & Prompt Shields |
| 5 | Attack library + golden dataset (version-controlled) | Internal repo | Security Architect + Model Risk Manager | Reviewed quarterly |
| 6 | Pre-deployment gate (CI/CD pipeline) | Azure DevOps / GitHub Actions | Cloud Security Architect | Gate threshold defined per zone |
| 7 | Evidence pack (test results + SHA-256 manifest) | Storage with WORM / immutability lock | Cloud Security Architect | Reconcile with Control 1.19 hold |
| 8 | Findings register + remediation tracking | Internal GRC system | AI Governance Lead | SLA enforcement; risk-acceptance signatures |
§0 Coverage boundary, sovereign-cloud notes, and portal vs script matrix
0.1 Coverage boundary
In scope:
- Microsoft 365 Copilot, Copilot Studio agents, and Azure OpenAI / Azure AI Foundry-backed agents in any zone.
- Adversarial test categories: prompt injection (direct + indirect / XPIA), jailbreak, system-prompt extraction, data exfiltration (NPI / MNPI), suitability bypass, encoded evasion (Base64, Unicode confusables, zero-width), tool / plugin abuse, multi-turn manipulation.
- Pre-deployment probes and scheduled cycles (monthly Z3, quarterly Z2, annual Z1).
Out of scope (handled by sibling controls — see READ FIRST):
- Production detection (1.21), eDiscovery hold (1.19), bias testing (2.11), general QA (2.5), connector / DLP authoring (3.1), and incident response workflow (3.4).
0.2 Sovereign-cloud applicability
| Capability | Commercial | GCC | GCC High | DoD | Notes |
|---|---|---|---|---|---|
| Copilot Studio test environment | GA | GA | Limited preview — verify | Limited — verify | Re-confirm against M365 Government service description |
| Azure AI Foundry — Risk and Safety Evaluations | GA | Rolling | Lagging — verify | Lagging — verify | Substitute: PyRIT + custom scoring on Foundry diagnostic logs |
| Microsoft PyRIT | OSS — runs anywhere | OSS | OSS | OSS | The toolkit is open source; sovereign concern is the target surface, not PyRIT itself |
| Azure AI Content Safety — Prompt Shields | GA | Rolling | Lagging — verify | Lagging — verify | If Prompt Shields is unavailable, document compensating control (pre-prompt classifier in app code) |
Treat any cross-cloud parity claim as time-bound
Re-verify against the Microsoft 365 Government service description before treating any item above as a primary control in GCC / GCC High / DoD. Document the verification date in the change ticket.
0.3 Portal vs script matrix
| Step | Portal? | Script? | Notes |
|---|---|---|---|
| Author program charter | Internal GRC / Word | n/a | Document, sign, store in records system |
| Create isolated test environment | ✅ PPAC | ✅ `New-AdminPowerAppEnvironment` | Portal recommended for first-time setup; script for repeatable refresh |
| Clone production agent into sandbox | ✅ Copilot Studio (Solutions export/import) | ✅ Solution CLI | Portal walkthrough below; see powershell-setup.md for solution-based pipeline |
| Maintain attack library | Source repo | n/a | Version control (Git) is the system of record |
| Run probes against agent endpoint | ❌ | ✅ PyRIT / `Invoke-AdversarialTests.ps1` | No portal-driven probe runner |
| Run Azure AI Foundry Risk & Safety Evaluations | ✅ AI Foundry | ✅ Foundry SDK | Portal-driven for ad-hoc; SDK for pipeline |
| Capture evidence pack (SHA-256) | n/a | ✅ Mandatory script step | See powershell-setup.md §5 |
| Track findings & remediation | Internal GRC / DevOps work-item system | API | Out of scope for Microsoft portals |
§1 Pre-flight gates
Complete every gate before opening any portal. Most defects in red-team programs trace to a missed pre-flight.
1.1 Authorization gate
- Red-team program charter signed by AI Governance Lead, CISO, and Compliance Officer.
- Rules of Engagement (ROE) document lists in-scope agents, test categories, time windows, and prohibited actions (no destructive payloads, no exfiltration of real customer data, no production-data targeting).
- Out-of-scope assets explicitly listed (production tenants, customer-facing endpoints, third-party SaaS not owned by the firm).
- Operator separation-of-duties attestation: red-team operators are not the agent's primary developer or owner. Co-signer attestation captured if exception granted.
- Legal review confirms ROE does not violate Computer Fraud and Abuse Act (CFAA) or vendor terms of service for any in-scope SaaS.
1.2 Environment gate
- Sandbox Power Platform environment exists, labelled `RedTeam-{agent}-{yyyymm}`, type Sandbox, region matched to production.
- DLP and Managed Environment posture in the sandbox matches production at a known snapshot date (record snapshot ID).
- Sandbox environment contains no production customer data. Synthetic data only. Verify via Purview Data Map scan of the sandbox storage.
1.3 Identity gate
- Test users exist in a non-licensed test OU; do not reuse production identities.
- Operator account uses Privileged Identity Management (PIM) for elevation; standing access not granted.
- Audit logging is enabled in the sandbox (Control 1.7) so probe activity is captured and retrievable.
1.4 Tooling gate
- Microsoft PyRIT installed at a CAB-approved version on the operator workstation or pipeline runner.
- Attack library repository checked out at a tagged commit; commit SHA recorded for the run.
- Golden dataset checked out at a tagged version; version recorded.
- Evidence storage location confirmed and immutability / retention policy verified (link to Control 1.19 hold).
§2 Step-by-step portal configuration
Step 1 — Author and sign the red-team program charter
The charter is a documentary prerequisite, not a portal action. Capture the following sections in the firm's GRC system:
| Section | Required content |
|---|---|
| Purpose | Why the program exists; mapping to OCC 2011-12 / SR 11-7 / FINRA 3110 / Notice 25-07 |
| Scope | Which agents, which zones, which Microsoft surfaces |
| Authorization | Signatures: AI Governance Lead, CISO, Compliance Officer, Legal acknowledgment |
| Rules of Engagement | Permitted tactics, prohibited actions, time windows, communication protocol |
| Operator roster + separation-of-duties | Names, roles, exceptions with co-signer attestations |
| Cadence | Z1 annual, Z2 quarterly, Z3 monthly + annual independent third-party |
| Evidence policy | Artefact taxonomy, SHA-256 manifest requirement, WORM destination, retention horizon |
| Remediation SLA | Critical (24 h), High (7 d), Medium (30 d), Low (next release) |
| Reporting | Cadence and audience for executive summary, technical report, board report |
Store the signed charter in records-retention scope (link to Control 1.19 hold to satisfy SEC 17a-4(b)(4) / 18a-6 preservation).
Step 2 — Create an isolated test environment (PPAC)
- Open Power Platform Admin Center.
- Environments → New.
- Configure:
- Name: `RedTeam-{AgentName}-{YYYYMM}` (e.g. `RedTeam-AdvisoryBot-202604`)
- Type: Sandbox
- Region: Same as production
- Dataverse: Add only if production agent uses Dataverse
- Security group: Restrict to red-team operator group
- After provisioning, open the environment → Settings → Privacy + Security:
- Confirm DLP policies match production at snapshot date.
- Enable detailed auditing.
- Apply the Managed Environment posture if production is managed.
Do not skip the security-group restriction
Without a security group, default-tenant access lets unintended users see test prompts that may contain sensitive synthetic data. Restrict at create-time, not after.
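The naming convention from this step can be validated automatically before provisioning. A minimal sketch; the allowed agent-name characters are an assumption to adjust to firm standards:

```python
import re

# Matches the sandbox naming convention from Step 2: RedTeam-{AgentName}-{YYYYMM}.
# AgentName is assumed to be alphanumeric; the month component must be 01-12.
NAME_PATTERN = re.compile(r"RedTeam-[A-Za-z0-9]+-(20\d{2})(0[1-9]|1[0-2])")

def is_valid_redteam_env(name: str) -> bool:
    """True if the environment name follows the RedTeam-{AgentName}-{YYYYMM} convention."""
    return NAME_PATTERN.fullmatch(name) is not None
```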
Step 3 — Clone the production agent into the sandbox
For Copilot Studio agents:
- Open Copilot Studio.
- Switch to the production environment.
- Open the agent → Settings → Solutions → Export as solution → Managed.
- Switch to the sandbox environment.
- Solutions → Import → upload the exported solution.
- After import, open the agent and disable any production connectors (Graph, SharePoint, custom connectors) or rebind them to sandbox equivalents.
- Re-publish the agent to a sandbox channel only (do not re-publish to Teams / Web for the production audience).
For Azure AI Foundry-backed agents:
- Open Azure AI Foundry.
- Select the production project.
- Deployments → Export the model deployment configuration (JSON).
- In the sandbox project, recreate the deployment with the same model, version, and content-filter (RAI policy) configuration.
- Confirm Prompt Shields (UPIA + XPIA) state matches production. Record the snapshot in the evidence pack.
Step 4 — Verify content-safety baseline on the agent under test
Open the test agent's Azure AI Foundry deployment (if applicable):
- Deployments → {deployment} → Content filter (RAI policy).
- Verify Prompt Shields:
- User prompt attacks — should be Annotate and Block for Zone 3, at least Annotate for Zone 2.
- Document attacks (XPIA) — should be Annotate and Block where Prompt Shields supports it.
- Capture screenshots and the policy JSON via the Foundry SDK (see powershell-setup.md §3) for the evidence pack.
This baseline is the expected defense posture the red-team probes will measure against. If Prompt Shields is not configured, the probe will record near-100 % attack success — the finding is then misattributed (it's a Control 1.21 / Foundry config gap, not an agent design defect).
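Comparing the captured policy JSON against the production snapshot catches baseline drift before the probe run. A minimal sketch that diffs two flat policy documents; the key names in the test are illustrative, not the actual RAI policy schema:

```python
import json

def policy_drift(prod_json: str, sandbox_json: str) -> list[str]:
    """Return the keys whose values differ between the production and sandbox
    content-filter (RAI policy) snapshots. An empty list means no drift."""
    prod, sandbox = json.loads(prod_json), json.loads(sandbox_json)
    keys = set(prod) | set(sandbox)
    return sorted(k for k in keys if prod.get(k) != sandbox.get(k))
```

Any non-empty result should halt the cycle: probing against a drifted baseline produces findings that cannot be attributed cleanly.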
Step 5 — Stage the attack library and golden dataset
- Check out the attack library repo at the tagged commit approved by the Security Architect:
- Confirm coverage matrix maps to OWASP Top 10 for LLM Applications (2025) IDs (LLM01 … LLM10) and at least the high-relevance MITRE ATLAS techniques for the agent's surface.
- Confirm the FSI-specific abuse families exist:
- NPI exfiltration (account number elicitation, SSN, DOB, address)
- MNPI elicitation (pre-earnings, M&A, insider information)
- Unsuitable-recommendation prompting (high-risk product to unsuitable persona)
- Suitability-bypass via persona injection
- Customer-impersonation prompting
- Check out the golden dataset at the tagged version. Confirm minimum 100 Q&A pairs for Z3 agents; record the refresh date.
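The coverage and dataset-size checks above can be scripted into the tooling gate. A sketch under assumed tag names; the family identifiers are illustrative, not a Microsoft taxonomy:

```python
# Required OWASP Top 10 for LLM Applications IDs (LLM01..LLM10) plus the
# FSI-specific abuse families listed above (illustrative internal tag names).
REQUIRED_OWASP = {f"LLM{i:02d}" for i in range(1, 11)}
REQUIRED_FSI = {"npi_exfiltration", "mnpi_elicitation", "unsuitable_recommendation",
                "suitability_bypass", "customer_impersonation"}

def coverage_gaps(library_tags: set[str]) -> set[str]:
    """Return the required categories missing from the attack library's tag set."""
    return (REQUIRED_OWASP | REQUIRED_FSI) - library_tags

def golden_dataset_ok(num_pairs: int, zone: int) -> bool:
    """Minimum golden-dataset size per governance zone (25 / 50 / 100, per §3)."""
    return num_pairs >= {1: 25, 2: 50, 3: 100}[zone]
```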
Step 6 — Configure the pre-deployment gate
The gate is a CI/CD job that runs the adversarial probe before any production release of an agent.
| Setting | Zone 1 | Zone 2 | Zone 3 |
|---|---|---|---|
| Gate enforcement | Advisory | Block on regression | Block on threshold breach |
| Defense-rate threshold | n/a | ≥ 90 % | ≥ 98 % (Critical / High families); ≥ 95 % overall |
| Required test families (per release) | Smoke | Full quarterly suite | Full monthly suite + delta probes for changed surface |
| Sign-off required to override | n/a | Agent Owner + AI Governance Lead | Agent Owner + AI Governance Lead + Model Risk Manager |
| Evidence pack written to WORM | Optional | Required | Required + reconciled with Control 1.19 hold |
Implement the gate as a pipeline stage in Azure DevOps or GitHub Actions; the probe runner is described in powershell-setup.md §2. The pipeline must fail closed: if the runner errors out or the evidence pack does not generate a SHA-256 manifest, the release does not proceed.
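The fail-closed behavior can be reduced to a single decision function in the pipeline stage. A sketch assuming the Zone 3 defaults from the table; wire the return value to the pipeline's exit code:

```python
# Zone 3 gate thresholds from the table above (>= 0.95 overall, >= 0.98 for
# Critical/High families). A None rate means the probe runner errored out.
def gate_decision(overall_rate, critical_rate, manifest_exists,
                  overall_min=0.95, critical_min=0.98):
    """Fail closed: a runner error or a missing SHA-256 manifest blocks release."""
    if overall_rate is None or critical_rate is None or not manifest_exists:
        return "BLOCK"
    if overall_rate >= overall_min and critical_rate >= critical_min:
        return "PASS"
    return "BLOCK"
```

Note that "PASS" is only reachable when both rates are present and the manifest exists; every failure mode, including the runner producing no output at all, falls through to "BLOCK".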
Step 7 — Run the cycle
For each scheduled cycle:
- Confirm pre-flight gates §1.1 – §1.4.
- Execute the probe runner against the test agent's endpoint (see powershell-setup.md).
- Run Azure AI Foundry Risk & Safety Evaluations on the same prompt set for cross-validation.
- Generate the evidence pack (test results CSV + JSON + SHA-256 manifest + screenshot evidence).
- Triage findings: assign severity (Critical / High / Medium / Low) and SLA.
- Open remediation work items for each finding; track to closure.
- Re-test after remediation (regression probe).
- Publish executive summary; archive full pack to WORM.
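The SHA-256 manifest for the evidence pack can be produced with the standard library alone. A minimal sketch that hashes every file under the evidence directory before the pack is archived to WORM:

```python
import hashlib
from pathlib import Path

def build_manifest(evidence_dir: str) -> dict[str, str]:
    """Map each file in the evidence pack to its SHA-256 hex digest.
    Write the result alongside the pack before archiving to WORM."""
    manifest = {}
    for path in sorted(Path(evidence_dir).rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(evidence_dir))
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest
```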
Step 8 — Reconcile with detection (Control 1.21)
For each test cycle, confirm that probe activity produced expected detection telemetry under Control 1.21:
- Did Prompt Shields fire on the encoded-evasion family? (Cross-check Azure AI Content Safety logs.)
- Did Defender XDR / Defender for Cloud surface AI alerts in the seconds-to-minutes latency window?
- Did Purview Communication Compliance flag the supervisory-relevant prompts?
- Did the Unified Audit Log capture the `CopilotInteraction` records (metadata only — UAL does not contain the full prompt body)?
Reconciliation gaps mean either Control 2.20 attack-library coverage is incomplete or Control 1.21 detection is mis-tuned. Both findings flow to the Control 1.21 owner and the AI Governance Lead.
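The reconciliation itself is a set difference per detection plane, joined on ConversationId. A minimal sketch; the plane names in the test are illustrative:

```python
def reconciliation_gaps(probe_ids: set[str],
                        detections_by_plane: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each detection plane, return the probe ConversationIds that produced
    no telemetry on that plane. Planes with full coverage are omitted."""
    return {plane: probe_ids - seen
            for plane, seen in detections_by_plane.items()
            if probe_ids - seen}
```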
§3 Configuration by governance zone
| Setting | Zone 1 (Personal) | Zone 2 (Team) | Zone 3 (Enterprise) |
|---|---|---|---|
| Charter required | Lightweight (one-page) | Full charter | Full charter + annual board review |
| Test environment isolation | Optional (may test in dev) | Required (sandbox) | Required (sandbox + network isolation) |
| Test cadence | Annual | Quarterly | Monthly + continuous on changes |
| Attack library coverage | OWASP LLM01 + LLM02 + LLM06 minimum | Full OWASP LLM Top 10 | Full OWASP LLM Top 10 + MITRE ATLAS + FSI abuse families |
| Golden dataset minimum size | 25 Q&A pairs | 50 Q&A pairs | 100+ Q&A pairs per agent |
| Pre-deployment gate | Advisory | Block on regression | Block on threshold breach |
| Independent third-party assessment | None | Consider | Required annually |
| Remediation SLA — Critical | 30 days | 7 days | 24 hours (or compensating control) |
| Remediation SLA — High | 60 days | 14 days | 7 days |
| Evidence retention | Per firm policy | 6+ years (FINRA 4511) | 6+ years to WORM (SEC 17a-4(f)) |
| Board reporting | None | Annual | Quarterly |
§4 FSI example — Investment Advisory bot (Zone 3)
```yaml
program:
  name: Investment Advisory Bot — Red Team
  charter_version: v2026.04
  charter_signed_by: [AI Governance Lead, CISO, Compliance Officer]
  rules_of_engagement: ROE-2026-Q2-AdvisoryBot.pdf
  legal_review: 2026-04-02
environment:
  name: RedTeam-AdvisoryBot-202604
  type: Sandbox
  region: us
  dataverse: enabled
  managed_environment: true
  dlp_snapshot: prod-dlp-2026-04-01
agent_under_test:
  copilot_studio_solution: AdvisoryBot_Managed_v2.4.1.zip
  foundry_deployment: gpt-4o-2024-08-06 (eastus2 advisorybot-prod snapshot)
  prompt_shields:
    user_prompt_attacks: AnnotateAndBlock
    document_attacks: AnnotateAndBlock
attack_library:
  repo: agentgov-attack-library
  tag: v2026.04
  commit: 7c2a91b
  coverage:
    owasp_llm_top10: [LLM01..LLM10]
    mitre_atlas: [AML.T0051, AML.T0054, AML.T0055, AML.T0057]
    fsi_families:
      - NPI exfiltration (45 prompts)
      - MNPI elicitation (20 prompts)
      - Unsuitable recommendation (30 prompts)
      - Suitability bypass via persona (15 prompts)
      - Customer impersonation (10 prompts)
golden_dataset:
  version: GD-2026-Q2-Advisory
  size: 142 Q&A pairs
  last_refresh: 2026-04-05
  refreshed_by: Model Risk Manager
cadence:
  pre_deployment_gate: every release
  scheduled: monthly (1st Monday)
  third_party: annual (Q3 contracted)
gate_thresholds:
  defense_rate_overall_min: 0.95
  defense_rate_critical_min: 0.98
  evidence_pack_required: true
  sha256_manifest_required: true
  worm_destination: purview-hold-2.20-advisorybot
remediation_slas:
  Critical: 24h
  High: 7d
  Medium: 30d
  Low: next release
reconciliation:
  control_1_21_event_join_key: ConversationId
  expected_detection_planes:
    - PromptShields (synchronous)
    - DefenderForCloud_AI_workload_alerts (seconds–minutes)
    - DefenderXDR_Copilot_detections (seconds–minutes)
    - Purview_CommComplianceCopilot (minutes–hours)
    - UnifiedAudit_CopilotInteraction (minutes–hours; metadata only)
reporting:
  executive_summary: monthly
  technical_report: post-cycle
  board_report: quarterly
```
§5 Validation
After completing the steps above, confirm:
- Charter signed and stored in records system.
- Sandbox environment exists, isolated, no production data, audit logging enabled.
- Test agent deployed to sandbox; production connectors disabled or rebound.
- Content-safety baseline (Prompt Shields state) captured to evidence pack.
- Attack library and golden dataset checked out at tagged versions; versions recorded.
- Pre-deployment gate configured in pipeline and verified to fail-closed on probe error.
- First cycle completed; evidence pack with SHA-256 manifest written to WORM destination.
- Findings opened in GRC / work-item system with severity, SLA, and owner.
- Reconciliation with Control 1.21 detection telemetry documented.
Back to Control 2.20 · PowerShell Setup · Verification & Testing · Troubleshooting