
Control 1.21: Adversarial Input Logging

Control ID: 1.21
Pillar: Security
Regulatory Reference: FINRA 3110, FINRA 4511, FINRA 25-07, SEC 17a-4(b)(4), GLBA 501(b), OCC 2011-12 / Fed SR 11-7, NIST SP 800-53 SI-4 / AU-6 / SI-10, NIST CSF 2.0 DE.CM / DE.AE, NIST AI RMF 1.0 + GenAI Profile (NIST AI 600-1) MEASURE-2.6 / MANAGE-4.1, MITRE ATLAS (Initial Access via Prompt Injection, Defense Evasion via Encoding)
Last UI Verified: April 2026
Governance Levels: Baseline / Recommended / Regulated


Objective

Implement detection and logging for adversarial inputs targeting AI agents, including prompt injection attacks, jailbreak attempts, and encoding-based evasion techniques, to provide early warning of manipulation attempts and to support incident response.


Why This Matters for FSI

  • FINRA Rule 3110 + Regulatory Notice 25-07 (March 2025): Supports the supervisory system FSI member firms must apply to AI-mediated communications. Notice 25-07 reaffirms that existing supervisory and recordkeeping obligations apply to generative-AI tools; it does not mandate a fixed real-time SLA, but expects near-real-time review consistent with the firm's WSP and the latency of the underlying Microsoft signals.
  • FINRA Rule 4511 + SEC 17a-4(b)(4): Helps preserve evidence of adversarial-input attempts and SOC response actions as books-and-records (typically 6 years, first 2 years easily accessible) where the attempt touches a customer-facing or recordkeeping-scope agent.
  • GLBA 501(b) Safeguards Rule: Administrative and technical safeguards through security-event detection, logging, and incident response for systems that touch customer NPI — including AI agents that ground on or generate NPI.
  • OCC 2011-12 / Fed SR 11-7 (Model Risk Management): Adversarial inputs are a model-risk surface; detection and logging support the ongoing monitoring expectations for AI/ML systems used in the bank.
  • NIST SP 800-53 SI-4 (System Monitoring), SI-10 (Information Input Validation), AU-6 (Audit Review, Analysis & Reporting): Direct mapping for input-validation and audit-review obligations.
  • NIST AI RMF 1.0 + Generative AI Profile (NIST AI 600-1): MEASURE-2.6 (security) and MANAGE-4.1 (incident handling) call for adversarial-input detection, tamper-resistant logging, and ongoing red-teaming.
  • MITRE ATLAS: Adversarial-input detection maps to ATLAS techniques for LLM Prompt Injection (direct / indirect) and LLM Jailbreak; aligning detections to ATLAS supports threat-hunt repeatability.

No companion solution by design

Not all controls have a companion solution in FSI-AgentGov-Solutions; solution mapping is selective by design. This control is operated via native Microsoft admin surfaces and verified by the framework's assessment-engine collectors. See the Solutions Index for the catalog and coverage scope.

License Requirements

  • Azure AI Content Safety / Prompt Shields — included with Azure OpenAI / Azure AI Foundry inference; verify region SKU at deploy.
  • Microsoft Defender for Cloud — Threat Protection for AI Workloads — paid Defender for Cloud plan; per-token consumption-based billing for AI workload protection.
  • Microsoft Defender XDR for Microsoft 365 Copilot — requires Microsoft 365 Copilot entitlement plus the relevant Defender plan (typically Microsoft 365 E5 Security or Defender for Office 365 P2).
  • Microsoft Purview Communication Compliance — Microsoft 365 E5 / E5 Compliance, or Communication Compliance add-on; "Detect Microsoft Copilot Interactions" template requires the Copilot entitlement on monitored users.
  • Microsoft Purview DSPM for AI — DSPM for AI entitlement (per current Microsoft licensing); E5 Compliance recommended.
  • Microsoft 365 Unified Audit (Standard / Advanced) — Standard Audit included with most M365 SKUs; Advanced Audit (long retention, high-value events) requires E5 / E5 Compliance.
  • Microsoft Sentinel — pay-as-you-go ingestion; Content Hub solutions for Microsoft 365 Copilot and Defender for Cloud are no-cost installs but ingestion is billed.
  • Microsoft Entra ID P2 — required for Conditional Access response actions tied to Defender XDR Copilot incidents.

Re-verify SKU eligibility at deploy time against the Microsoft 365 security & compliance licensing guidance.

Sovereign Cloud Parity (verify at deploy time)

| Capability | Commercial | GCC | GCC High | DoD |
| --- | --- | --- | --- | --- |
| Azure AI Content Safety — Prompt Shields | GA | Verify per release | Verify per release / often lagging | Verify per release / often lagging |
| Defender for Cloud — Threat Protection for AI Workloads | GA | Rolling | Lagging — verify | Lagging — verify |
| Defender XDR for Microsoft 365 Copilot detections | GA | Rolling | Lagging — verify | Lagging — verify |
| Purview Communication Compliance — Prompt Shield classifier | GA | Rolling | Lagging — verify | Lagging — verify |
| Purview DSPM for AI | GA | Rolling | Lagging — verify | Lagging — verify |
| Microsoft 365 Copilot / Copilot Studio | GA | GA | Limited preview as of early 2026 — verify | Limited / verify |
| Microsoft Sentinel — Content Hub solutions for Copilot / DfC AI | GA | GA | GA | GA (verify per solution) |

Treat any cross-cloud parity gap as a compensating-control conversation, not an assumption of feature parity. Broker-dealer or federal-adjacent advisory tenants on GCC / GCC High / DoD must re-verify against the Microsoft 365 government service description before relying on any of the above as a primary control.

Control Description

Adversarial-input detection in the Microsoft estate is delivered by five distinct Microsoft signal planes, plus cross-plane correlation in Sentinel. FSI deployments must reason about them separately — they have different latencies, different blocking semantics, different licensing, and different sovereign-cloud availability.

  1. Inference-time guardrail — Azure AI Content Safety Prompt Shields. GA. Inline classifier on the Azure OpenAI / Azure AI Foundry inference path that detects and (when configured) blocks direct prompt injection / jailbreak (UPIA) and indirect / document-grounded prompt injection (XPIA). This is the only Microsoft signal plane that operates synchronously with the prompt — i.e., genuinely real-time and pre-response. Required for any Zone 3 agent built on Azure OpenAI or Azure AI Foundry.
  2. Azure AI workload telemetry — Microsoft Defender for Cloud, Threat Protection for AI Workloads. GA. Consumes Prompt Shields and adds Defender's own classifiers; raises typed alerts including Jailbreak attempt detected, Suspected prompt injection, Sensitive data exposure in AI response, Credential theft in AI response, Suspected data exfiltration. Alerts flow to the Defender portal, Defender XDR, and Sentinel. Near-real-time (seconds–minutes from the inference event).
  3. M365 Copilot detection plane — Microsoft Defender XDR for Microsoft 365 Copilot. Surfaces native UPIA and XPIA alerts for Microsoft 365 Copilot interactions (including XPIA from poisoned files, emails, and SharePoint content), correlates with user / device / identity signals across Defender XDR, and is the canonical M365-side incident plane for adversarial events against Copilot.
  4. Supervisory plane — Microsoft Purview Communication Compliance, "Detect Microsoft Copilot Interactions" policy template. GA. Built-in Prompt Shield classifier (prompt-injection / jailbreak risk) and Protected Material classifier (copyrighted / branded output). Places matched interactions into a reviewer queue with attestation — the only Microsoft surface that produces a supervisory-grade audit trail appropriate for FINRA 3110 / Notice 25-07. Detective only, not preventive.
  5. Audit & evidence plane — Microsoft 365 Unified Audit (CopilotInteraction) + Purview eDiscovery + DSPM for AI. Preserves the who / when / which agent / which thread / sensitive-data-touched metadata for every Copilot interaction. Note: the standard CopilotInteraction audit record does not include full prompt body text — pattern matching on prompt content must come from Prompt Shields / Defender / Comm Compliance, not from KQL over audit. Pair with Control 1.19 (eDiscovery for Agent Interactions) to preserve and produce attack evidence under SEC 17a-4(b)(4).
  6. Cross-plane correlation — Microsoft Sentinel. Ingest Defender XDR, Defender for Cloud, Purview, and Power Platform signals; use Sentinel Content Hub solutions (Microsoft 365 Copilot, Defender for Cloud) for prebuilt analytics rules, hunting queries, and workbooks rather than building from scratch. Sentinel rule windows do not make audit-derived signals real-time — genuinely synchronous detection comes only from #1, with near-real-time alerting from #2 and #3.

Latency reality check. Only Prompt Shields is synchronous with the prompt. Defender for Cloud / Defender XDR alerts are typically seconds–minutes. Communication Compliance, DSPM-for-AI, and Unified Audit-derived signals can lag from minutes to hours. WSP language and Zone 3 SLAs must be written against these documented latencies, not against an aspirational "real-time" claim.
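The inference-time guardrail in plane #1 can be sketched as a request builder plus a zone-posture decision, per the Annotate-vs-Block split this control prescribes. The `text:shieldPrompt` operation and `api-version` reflect the public Azure AI Content Safety REST surface at the time of writing and should be re-verified against the current reference; the function names and zone mapping are illustrative scaffolding, not a Microsoft SDK.

```python
"""Sketch: invoking Azure AI Content Safety Prompt Shields and applying this
control's zone posture (Annotate for Zones 1/2, Block for Zone 3).
Endpoint path, api-version, and payload shape are assumptions to re-verify
against the current Azure AI Content Safety REST reference."""

API_VERSION = "2024-09-01"  # re-verify the GA api-version at deploy time


def build_shield_request(endpoint: str, user_prompt: str, documents: list[str]):
    """Assemble URL + JSON body for the text:shieldPrompt operation.
    userPrompt covers UPIA (direct injection); documents covers XPIA
    (indirect, document-grounded injection)."""
    url = (f"{endpoint.rstrip('/')}/contentsafety/text:shieldPrompt"
           f"?api-version={API_VERSION}")
    body = {"userPrompt": user_prompt, "documents": documents}
    return url, body


def decide_action(shield_response: dict, zone: int) -> str:
    """Map a Prompt Shields result onto the zone posture above.
    UPIA hit = userPromptAnalysis.attackDetected; XPIA hit = any
    documentsAnalysis entry with attackDetected."""
    upia = shield_response.get("userPromptAnalysis", {}).get("attackDetected", False)
    xpia = any(d.get("attackDetected", False)
               for d in shield_response.get("documentsAnalysis", []))
    if not (upia or xpia):
        return "allow"
    # Zone 3 blocks at the inference boundary; Zones 1/2 annotate and log.
    return "block" if zone == 3 else "annotate"
```

In practice the HTTP call (with an `Ocp-Apim-Subscription-Key` or Entra token header) sits between these two functions; keeping the posture decision separate makes the Annotate/Block split testable without a live endpoint.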


Key Configuration Points

  • Enable Azure AI Content Safety Prompt Shields on every Azure OpenAI / Azure AI Foundry deployment that backs an FSI agent: turn on jailbreak (direct injection) detection and indirect attack (document-grounded XPIA) detection. Default action Annotate for Zone 1/2, Block for Zone 3. Verify category coverage (multilingual, encoded, obfuscated) at deploy.
  • Enable Microsoft Defender for Cloud — Threat Protection for AI Workloads on the Azure subscription(s) hosting AI inference resources. Confirm the plan is On; verify alerts surface in the Defender portal and stream to Defender XDR + Sentinel.
  • Enable Defender XDR for Microsoft 365 Copilot detections. Verify UPIA/XPIA alerts appear in the Defender portal Incidents queue and that the Microsoft 365 Copilot alert source is enabled.
  • Configure the Purview Communication Compliance "Detect Microsoft Copilot Interactions" policy template with Prompt Shield classifier and Protected Material classifier enabled; scope to Copilot users in Zones 2 and 3; assign reviewer roles per FINRA 3110 supervisory hierarchy; document the review SLA in the WSP.
  • Enable Purview Audit (Control 1.7) so CopilotInteraction records are captured for every Copilot / Copilot Studio agent thread; cross-reference Control 1.19 for retention via eDiscovery hold.
  • Enable Microsoft Sentinel Content Hub solutions for Microsoft 365 Copilot and Defender for Cloud to deploy prebuilt analytics rules, hunting queries, and workbooks; avoid hand-rolling KQL over CopilotInteraction for prompt-content pattern matching — that data is not in the standard schema.
  • Define zone-specific response posture explicitly: Zone 1 = annotate-and-log only (Prompt Shields annotate; Comm Compliance review queue); Zone 2 = annotate + alert + supervisory review; Zone 3 = Prompt Shields block + Defender XDR auto-incident + Comm Compliance high-priority review + (where applicable) Copilot Studio agent disable via governance automation.
  • Preserve attack evidence per the firm's WSP retention schedule — for broker-dealer recordkeeping-scope agents this is typically 6 years (first 2 easily accessible) under SEC 17a-4(b)(4); pair Comm Compliance case + Defender XDR incident export + CopilotInteraction records under an eDiscovery hold (Control 1.19).
  • Run quarterly AI red-team exercises against Zone 3 agents using Microsoft's PyRIT (Python Risk Identification Toolkit) or equivalent; capture findings into the agent risk register (Control 1.2).
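The evidence-preservation bullet above implies two concrete milestone dates per confirmed adversarial event. A minimal sketch, assuming the typical SEC 17a-4(b)(4) broker-dealer schedule cited in this control (6 years total, first 2 easily accessible); the function names are illustrative, and the firm's WSP retention schedule remains authoritative.

```python
"""Sketch: computing the preservation windows an eDiscovery hold (Control 1.19)
must cover for a confirmed adversarial event, per the typical SEC 17a-4(b)(4)
schedule referenced in this control. Illustrative only; the WSP governs."""
from datetime import date

RETENTION_YEARS = 6    # total books-and-records retention
ACCESSIBLE_YEARS = 2   # "easily accessible" window


def _add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 with a non-leap target year -> Feb 28
        return d.replace(year=d.year + years, day=28)


def retention_windows(event_date: date) -> dict:
    """Return the two milestone dates for the hold covering the
    Comm Compliance case + Defender XDR export + CopilotInteraction records."""
    return {
        "easily_accessible_until": _add_years(event_date, ACCESSIBLE_YEARS),
        "retain_until": _add_years(event_date, RETENTION_YEARS),
    }
```

For example, `retention_windows(date(2026, 4, 1))` yields an easily-accessible milestone of 2028-04-01 and a retain-until date of 2032-04-01.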

Zone-Specific Requirements

| Zone | Requirement | Rationale |
| --- | --- | --- |
| Zone 1 (Personal) | Prompt Shields Annotate; Defender for Cloud AI alerts on (informational); Comm Compliance review queue weekly; no preventive blocking | Low risk; awareness-focused; preserve user productivity |
| Zone 2 (Team) | Prompt Shields Annotate (Block on jailbreak only); Defender for Cloud AI alerts integrated to Sentinel; Comm Compliance review queue with named reviewer; weekly supervisory review | Balanced protection for shared agents with supervisory accountability |
| Zone 3 (Enterprise / customer-facing) | Prompt Shields Block on direct + indirect injection; Defender for Cloud AI threat protection plan On; Defender XDR auto-incident; Comm Compliance high-priority queue with daily reviewer SLA; quarterly AI red-team exercise (PyRIT or equivalent); attack evidence retained per SEC 17a-4(b)(4) cadence; sovereign-cloud parity confirmed for every primitive in scope | Maximum protection for customer-NPI / MNPI / order-facing agents |

Roles & Responsibilities

| Role | Responsibility |
| --- | --- |
| Azure / Defender for Cloud Admin | Enable Defender for Cloud Threat Protection for AI Workloads on relevant subscriptions; tune AI alert policies |
| AI / Azure AI Foundry Admin | Enable and tune Azure AI Content Safety Prompt Shields on every Azure OpenAI / Foundry deployment backing an FSI agent |
| Defender XDR Admin | Enable Microsoft 365 Copilot detection source; manage UPIA/XPIA incident response playbooks |
| Sentinel SOC Analyst | Triage incidents; install Microsoft 365 Copilot + Defender for Cloud Content Hub solutions; build/maintain hunting queries |
| Purview Communication Compliance Admin | Configure the Detect Microsoft Copilot Interactions template with Prompt Shield + Protected Material classifiers; manage reviewer assignments and SLA |
| Purview Data Security AI Admin | Validate DSPM-for-AI signal coverage feeding adversarial-input correlation; review Activity Explorer evidence |
| AI Governance Lead | Own the cross-plane policy (Annotate vs Block by zone); convene quarterly red-team review |
| AI Red Team / Pen-test Owner | Execute PyRIT-driven adversarial test plans; route findings to agent risk register (Control 1.2) |
| Designated Supervisor / Registered Principal | FINRA 3110 supervisory sign-off on Comm Compliance Copilot review queue |
| Compliance Officer | Validate regulatory mapping (FINRA 3110 / 4511 / 25-07 / SEC 17a-4 / GLBA / OCC 2011-12); accept residual risk |
| CISO | Approve Zone 3 blocking posture and incident-handling SLAs |

Control Relationships

| Control | Relationship |
| --- | --- |
| 1.6 - DSPM for AI | Sensitive-data exposure signals correlate with adversarial-input detections; XPIA mitigations rely on accurate DSPM coverage |
| 1.7 - Audit Logging | Provides CopilotInteraction and Purview audit evidence trail for adversarial events |
| 1.8 - Runtime Protection | Complementary runtime threat detection (endpoint / identity) for the user side of an attack |
| 1.10 - Communication Compliance | Hosts the Detect Microsoft Copilot Interactions policy with the Prompt Shield + Protected Material classifiers — the supervisory plane for FINRA 3110 / 25-07 |
| 1.13 - Sensitive Information Types | SIT signal feeds Comm Compliance and DSPM-for-AI quality; sensitive-data-in-AI-response alerts depend on it |
| 1.14 - Data Minimization & Agent Scope | XPIA blast radius is governed by what an agent can ground on — minimization is a primary XPIA mitigation |
| 1.19 - eDiscovery for Agent Interactions | Preservation and production of attack evidence for SEC 17a-4(b)(4) / FINRA 4511 |
| 1.24 - Defender for AI Services | Defender for Cloud Threat Protection for AI Workloads (Azure-side detection plane) |
| 3.4 - Incident Reporting | Incident response and root-cause analysis for confirmed adversarial events |
| 3.9 - Sentinel Integration | Cross-plane correlation, Content Hub solutions, hunting |
| 4.6 - Grounding Scope Governance | Reduces the corpus an attacker can poison for XPIA — primary preventive mitigation |

Implementation Playbooks

Step-by-Step Implementation

This control has detailed playbooks for implementation, automation, testing, and troubleshooting.


Verification Criteria

Confirm control effectiveness by verifying:

  1. Prompt Shields enabled on every Azure OpenAI / Azure AI Foundry deployment backing an FSI agent: jailbreak (direct injection) On; indirect attack (XPIA) On; default action set per zone (Annotate Zones 1/2; Block Zone 3).
  2. Synthetic direct-injection test ("role-override / system-prompt-extraction" pattern) submitted to a Zone 3 agent is blocked at the inference boundary by Prompt Shields and produces a Jailbreak attempt detected alert in Microsoft Defender for Cloud — Threat Protection for AI Workloads.
  3. Synthetic XPIA test (poisoned document seeded into a SharePoint library the agent grounds on, then queried via a benign Copilot prompt) produces an XPIA alert in Defender XDR for Microsoft 365 Copilot.
  4. Purview Communication Compliance "Detect Microsoft Copilot Interactions" policy is active for all Copilot users in Zone 2/3; the synthetic test interactions appear in the reviewer queue with the Prompt Shield classifier match flag; reviewer SLA met.
  5. Defender XDR incident is auto-created for the Zone 3 synthetic event with full kill-chain entities (user, agent, prompt source, grounded files); incident streams into Sentinel.
  6. Sentinel Content Hub solutions for Microsoft 365 Copilot and Microsoft Defender for Cloud are installed; at least one analytics rule from each is Enabled; corresponding workbook renders.
  7. CopilotInteraction audit records for the test interactions are present in Purview Audit and accessible via the Search-UnifiedAuditLog API; cross-reference Control 1.7.
  8. eDiscovery hold (Control 1.19) placed on the test custodian preserves the Comm Compliance case + Defender XDR export + CopilotInteraction records for the agreed retention window (typically 6 years for broker-dealer scope under SEC 17a-4(b)(4)).
  9. Sovereign-cloud availability of every primitive in scope (Prompt Shields, Defender for Cloud AI, Defender XDR for Copilot, Comm Compliance Prompt Shield classifier, DSPM for AI) is confirmed available in the tenant's cloud (Commercial / GCC / GCC High / DoD) — capture the verification evidence per release.
  10. Quarterly AI red-team exercise (PyRIT or equivalent) executed against Zone 3 agents within the past 90 days; findings logged in the agent risk register (Control 1.2) and any new detection rules promoted to Sentinel.
  11. WSP / Supervisory documentation explicitly states the FINRA 3110 / Notice 25-07 review cadence, names the Designated Supervisor, and references the latency characteristics of the Microsoft signals (Prompt Shields synchronous; Defender alerts seconds–minutes; Comm Compliance + audit minutes–hours) — no "real-time" overclaim.
  12. False-positive review log maintained on a defined cadence (recommended weekly for Zone 3); findings routed back to Prompt Shields tuning, Comm Compliance policy thresholds, and Sentinel rule tuning.
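The synthetic tests in criteria 2 and 3 can be encoded as a repeatable plan so each verification run produces the same pass/fail evidence. Everything below is hypothetical harness scaffolding (the `SyntheticCase` type, `TEST_PLAN` contents, and `evaluate` helper are not part of any Microsoft tooling); a firm would wire the observed outcomes from its own PyRIT orchestrator or direct test calls.

```python
"""Sketch: criteria 2-3 as a structured synthetic test plan. Case names and
expected alerts mirror the verification criteria above; all identifiers here
are illustrative scaffolding, not a Microsoft or PyRIT API."""
from dataclasses import dataclass


@dataclass
class SyntheticCase:
    name: str
    vector: str           # "UPIA" (direct) or "XPIA" (indirect, grounded doc)
    zone: int
    expected_action: str  # per the zone posture defined in this control
    expected_alert: str   # the alert the relevant detection plane should raise


TEST_PLAN = [
    SyntheticCase("role-override / system-prompt-extraction", "UPIA", 3,
                  "block", "Jailbreak attempt detected (Defender for Cloud)"),
    SyntheticCase("poisoned SharePoint doc + benign prompt", "XPIA", 3,
                  "block", "XPIA alert (Defender XDR for Microsoft 365 Copilot)"),
    SyntheticCase("role-override on personal agent", "UPIA", 1,
                  "annotate", "Comm Compliance review-queue entry"),
]


def evaluate(observed: dict) -> list[str]:
    """Compare observed outcomes (case name -> action taken) against the plan;
    return names of failed cases for the false-positive/negative review log."""
    return [c.name for c in TEST_PLAN
            if observed.get(c.name) != c.expected_action]
```

Failed case names feed the false-positive review log in criterion 12 and, where a detection gap is confirmed, new Sentinel rules per criterion 10.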

Additional Resources


Updated: April 2026 | Version: v1.4.0 | UI Verification Status: Current (re-verified April 2026)