# Verification & Testing: Control 2.9 - Agent Performance Monitoring and Optimization

**Last Updated:** April 2026
**Audience:** M365 administrators preparing audit-ready evidence for FINRA / SEC / OCC examinations of AI agent monitoring.
## How verification maps to the regulatory ask
| Test | Helps demonstrate |
|---|---|
| TC-2.9-01 to TC-2.9-04 | FINRA 4511 / SEC 17a-3/4 record-keeping; FINRA 25-07 supervisory review |
| TC-2.9-05 to TC-2.9-07 | OCC 2011-12 / Fed SR 11-7 ongoing model performance monitoring |
| TC-2.9-08 to TC-2.9-09 | SOX 404 control-effectiveness testing |
| TC-2.9-10 | GLBA 501(b) integrity of customer-information processing |
## Manual verification steps
### Test 1 — Native analytics data flow (TC-2.9-01)
- Open the Power Platform admin center (PPAC) → Analytics → Copilot Studio.
- Select an environment with Zone 2/3 agents.
- Confirm sessions, resolution rate, and CSAT data for the trailing 7+ days.
- Expected: non-zero sessions, recent timestamps, no banner indicating analytics is disabled.
- Evidence: screenshot saved to `maintainers-local/tenant-evidence/2.9/`.
### Test 2 — Application Insights linkage (TC-2.9-02)
- In Copilot Studio, open each Zone 2/3 agent → Settings → Advanced → Application Insights.
- Confirm a connection string is configured.
- In the Azure portal, run the KQL query `requests | where timestamp > ago(24h) | summarize count()`.
- Expected: count > 0 within the last 24 h. An empty result alongside a "configured" connection string is a monitoring gap — escalate.
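The Test 2 decision logic can be captured in a short sketch. This is an illustrative helper, not part of the official scripts; the `LinkageCheck` type and `classify` function are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class LinkageCheck:
    """Result of one agent's Application Insights linkage test (TC-2.9-02)."""
    agent: str
    connection_configured: bool
    request_count_24h: int

def classify(check: LinkageCheck) -> str:
    # The dangerous case is a configured connection string with zero
    # ingested requests: monitoring looks enabled but nothing is flowing.
    if not check.connection_configured:
        return "not-configured"
    if check.request_count_24h == 0:
        return "monitoring-gap"  # escalate per the runbook
    return "ok"
```

The "monitoring-gap" branch is the one the test calls out for escalation; a plain missing connection string is a configuration finding rather than a silent telemetry failure.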
### Test 3 — Power BI dashboard accuracy (TC-2.9-03)
- Open the `Agent-Performance-Analytics` workspace.
- Compare KPI card values to the raw query results in PPAC and Application Insights.
- Verify the dataset refresh timestamp is within the configured SLA (Zone 3: ≤ 1 hour).
- Expected: values match within the refresh window.
### Test 4 — Alert triggering (TC-2.9-04)
- Temporarily lower an alert threshold (e.g., error rate > 0.1%).
- Wait for the next evaluation interval (Power Automate ≤ 1 h, Azure Monitor ≤ 5 min).
- Confirm the notification arrives in Teams / email / paging system.
- Restore the threshold immediately and record the test in the change log.
- Expected: alert delivered to all configured channels within the documented SLA.
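The expected fan-out behavior in Test 4 (one notification per configured channel, nothing when the metric is within threshold) can be sketched as follows; `evaluate_alert` is an illustrative name, not a real Power Automate or Azure Monitor API:

```python
def evaluate_alert(metric_value: float, threshold: float,
                   channels: list[str]) -> list[dict]:
    """Fan a breached metric out to every configured channel (Test 4).

    Returns one notification record per channel, or an empty list when
    the metric is at or below the threshold."""
    if metric_value <= threshold:
        return []
    return [{"channel": c, "metric": metric_value, "threshold": threshold}
            for c in channels]
```

During the test, temporarily lowering the threshold (as in the steps above) makes the breach condition true, so every channel should receive exactly one notification within the evaluation interval.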
### Test 5 — Latency percentile evidence (TC-2.9-05)
- Run the KPI script (`Get-AgentKpis.ps1`) for the trailing 30 days.
- Verify p50 / p95 / p99 are populated for every Zone 2/3 agent.
- Expected: percentiles within the zone targets; outliers documented in the optimization backlog.
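If you need to cross-check the script's percentile figures against raw latency samples, the standard library can compute them directly. Note that Application Insights' own percentile estimation may differ slightly from this interpolation method:

```python
from statistics import quantiles

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50 / p95 / p99 from raw latency samples (Test 5).

    Uses the 'inclusive' method, which interpolates between sorted
    sample points; treat small deviations from the dashboard as normal."""
    qs = quantiles(samples_ms, n=100, method="inclusive")
    # quantiles() returns 99 cut points; index i-1 is the i-th percentile.
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

A large gap between p95 here and p95 on the dashboard is worth investigating before filing the evidence, since it usually means the two sources are sampling different request populations.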
### Test 6 — Quarterly model performance memo (TC-2.9-06, Zone 3 only)
- Confirm the Model Risk Manager has produced a quarterly memo summarizing:
- KPI trend versus prior quarter
- Drift indicators (input distribution, output quality)
- Hallucination / grounding metrics where measured
- Optimization actions taken and their results
- Expected: memo exists, dated within the prior quarter, referenced in MRM register per OCC 2011-12.
### Test 7 — Hallucination / grounding telemetry (TC-2.9-07, Zone 3 — if implemented)
- Confirm a custom event query (e.g., `customEvents | where name == "HallucinationDetected"`) returns rows.
- Compare against the documented sampling methodology (e.g., 5% of sessions evaluated by the Azure AI Evaluation SDK).
- Expected: non-zero events; rate trends visible on the RAI dashboard.
### Test 8 — Review meeting cadence (TC-2.9-08)
- Inspect the calendar series for weekly / monthly / quarterly reviews.
- Pull the last 3 months of meeting minutes from WORM-capable storage.
- Expected: every meeting documented with attendees, KPI snapshot, and decisions.
### Test 9 — Data export to immutable storage (TC-2.9-09)
- Navigate to the ADLS Gen2 container or storage account holding the export.
- Verify that recent files exist (daily cadence) and that the immutability policy is enabled in time-based retention mode.
- Expected: files present; retention policy locked; deletion attempts denied (test non-destructively with `Remove-AzStorageBlob -WhatIf`).
### Test 10 — End-to-end customer-impact scenario (TC-2.9-10, Zone 3)
- Replay a synthetic customer transcript known to stress the agent.
- Observe latency, error rate, escalation, and CSAT proxy in the dashboard.
- Expected: all signals captured; alerts behave per design; transcripts retained per SEC 17a-4.
## Test case matrix
| Test ID | Scenario | Expected | Pass / Fail |
|---|---|---|---|
| TC-2.9-01 | Native analytics shows agent data | Sessions, CSAT visible (≤ 48 h lag) | |
| TC-2.9-02 | App Insights ingesting telemetry | Last 24 h count > 0 | |
| TC-2.9-03 | Power BI dashboard accurate | Matches source within refresh window | |
| TC-2.9-04 | Alert triggers on threshold breach | Notification delivered to all channels | |
| TC-2.9-05 | Latency percentiles available | p50 / p95 / p99 populated | |
| TC-2.9-06 | Quarterly MRM memo exists | Dated within prior quarter | |
| TC-2.9-07 | RAI / hallucination telemetry | Events emitted per sampling design | |
| TC-2.9-08 | Review cadence honored | Minutes for last 3 months on WORM | |
| TC-2.9-09 | ADLS export immutable | Files present, retention locked | |
| TC-2.9-10 | Synthetic customer scenario | All signals + alert behaviors correct |
## Evidence collection checklist
- Screenshot: PPAC → Analytics → Copilot Studio
- Screenshot: Copilot Studio agent → Settings → Application Insights connection
- Screenshot: Power BI dashboard KPI cards with refresh timestamp
- Screenshot: Alert notification (Teams / email / paging system)
- Screenshot: ADLS Gen2 container with files + immutability policy detail
- Export: `agent-inventory-*.json` + manifest with SHA-256 (Script 1)
- Export: `appinsights-linkage-*.json` (Script 2)
- Export: `kpis-30-day-*.json` (Script 3)
- Document: quarterly MRM memo (Zone 3)
- Document: review meeting calendar series + last 3 months of minutes
- Document: change log entry for each alert threshold test (Test 4)
Stage all evidence under `maintainers-local/tenant-evidence/2.9/`. The folder is gitignored — never commit tenant data.
## Attestation statement template
## Control 2.9 Attestation - Agent Performance Monitoring and Optimization
**Organization:** [Organization Name]
**Control Owner:** [Name / Role]
**Tenant ID:** [Tenant ID]
**Cloud:** [Commercial / GCC / GCC High / DoD]
**Period:** [YYYY-Q#]
I attest that, for the period above:
1. Copilot Studio analytics is enabled and producing data for all in-scope agents.
2. Application Insights is linked to every Zone 2 and Zone 3 agent and ingesting telemetry.
3. Performance KPIs are defined and approved per zone:
- Zone 1: error rate < [X]%, p95 < [X] s
- Zone 2: error rate < [X]%, p95 < [X] s, CSAT ≥ [X]
- Zone 3: error rate < [X]%, p95 < [X] s, CSAT ≥ [X]
4. Alerts are configured for error rate, latency, and (Zone 2/3) CSAT, with escalation
paths through [Teams / email / paging] and tested within the period (date: [date]).
5. Review cadence is operating:
- Weekly operational: [day / time]
- Monthly business: [day / time]
- Quarterly executive: [day / time]
- Quarterly model risk (Zone 3): [day / time], memo dated [date]
6. Telemetry is retained for [N] days in Application Insights and [N] days on WORM-capable
storage to help meet SEC 17a-4(f) and FINRA 4511 record-keeping requirements.
7. Sovereign cloud feature parity has been verified against current Microsoft Learn
documentation; gaps (if any) are documented in [reference].
**Agents monitored:** [Count]
**Evidence package:** [path / SHA-256 of manifest]
**Signature:** _______________________
**Date:** _______________________
Back to Control 2.9 | Portal Walkthrough | PowerShell Setup | Troubleshooting