# Verification & Testing: Control 2.9 - Agent Performance Monitoring and Optimization

**Last Updated:** April 2026
**Audience:** M365 administrators preparing audit-ready evidence for FINRA / SEC / OCC examinations of AI agent monitoring.
## How verification maps to the regulatory ask
| Test | Helps demonstrate |
|---|---|
| TC-2.9-01 to TC-2.9-04 | FINRA 4511 / SEC 17a-3/4 record-keeping; FINRA 25-07 supervisory review |
| TC-2.9-05 to TC-2.9-07 | OCC 2011-12 / Fed SR 11-7 ongoing model performance monitoring |
| TC-2.9-08 to TC-2.9-09 | SOX 404 control-effectiveness testing |
| TC-2.9-10 | GLBA 501(b) integrity of customer-information processing |
## Manual verification steps
### Test 1 — Native analytics data flow (TC-2.9-01)
- Open the Power Platform admin center (PPAC) → Analytics → Copilot Studio.
- Select an environment with Zone 2/3 agents.
- Confirm sessions, resolution rate, and CSAT data for the trailing 7+ days.
- Expected: non-zero sessions, recent timestamps, no banner indicating analytics is disabled.
- Evidence: screenshot saved to `maintainers-local/tenant-evidence/2.9/`.
### Test 2 — Application Insights linkage (TC-2.9-02)
- In Copilot Studio, open each Zone 2/3 agent → Settings → Advanced → Application Insights.
- Confirm a connection string is configured.
- In the Azure portal, run the KQL query `requests | where timestamp > ago(24h) | summarize count()`.
- Expected: count > 0 within the last 24 h. An empty result alongside a "configured" connection string is a monitoring gap — escalate.
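The Test 2 decision logic can be captured in a short sketch. This is an illustrative helper, not part of the official scripts; the `LinkageCheck` type and `classify` function are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class LinkageCheck:
    """Result of one agent's Application Insights linkage test (TC-2.9-02)."""
    agent: str
    connection_configured: bool
    request_count_24h: int

def classify(check: LinkageCheck) -> str:
    # The dangerous case is a configured connection string with zero
    # ingested requests: monitoring looks enabled but nothing is flowing.
    if not check.connection_configured:
        return "not-configured"
    if check.request_count_24h == 0:
        return "monitoring-gap"  # escalate per the runbook
    return "ok"
```

The "monitoring-gap" branch is the one the test calls out for escalation; a plain missing connection string is a configuration finding rather than a silent telemetry failure.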
### Test 3 — Power BI dashboard accuracy (TC-2.9-03)
- Open the `Agent-Performance-Analytics` workspace.
- Compare KPI card values to the raw query results in PPAC and Application Insights.
- Verify the dataset refresh timestamp is within the configured SLA (Zone 3: ≤ 1 hour).
- Expected: values match within the refresh window.
### Test 4 — Alert triggering (TC-2.9-04)
- Temporarily lower an alert threshold (e.g., error rate > 0.1%).
- Wait for the next evaluation interval (Power Automate ≤ 1 h, Azure Monitor ≤ 5 min).
- Confirm the notification arrives in Teams / email / paging system.
- Restore the threshold immediately and record the test in the change log.
- Expected: alert delivered to all configured channels within the documented SLA.
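The expected fan-out behavior in Test 4 (one notification per configured channel, nothing when the metric is within threshold) can be sketched as follows; `evaluate_alert` is an illustrative name, not a real Power Automate or Azure Monitor API:

```python
def evaluate_alert(metric_value: float, threshold: float,
                   channels: list[str]) -> list[dict]:
    """Fan a breached metric out to every configured channel (Test 4).

    Returns one notification record per channel, or an empty list when
    the metric is at or below the threshold."""
    if metric_value <= threshold:
        return []
    return [{"channel": c, "metric": metric_value, "threshold": threshold}
            for c in channels]
```

During the test, temporarily lowering the threshold (as in the steps above) makes the breach condition true, so every channel should receive exactly one notification within the evaluation interval.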
### Test 5 — Latency percentile evidence (TC-2.9-05)
- Run the KPI script (`Get-AgentKpis.ps1`) for the trailing 30 days.
- Verify p50 / p95 / p99 are populated for every Zone 2/3 agent.
- Expected: percentiles within the zone targets; outliers documented in the optimization backlog.
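If you need to cross-check the script's percentile figures against raw latency samples, the standard library can compute them directly. Note that Application Insights' own percentile estimation may differ slightly from this interpolation method:

```python
from statistics import quantiles

def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50 / p95 / p99 from raw latency samples (Test 5).

    Uses the 'inclusive' method, which interpolates between sorted
    sample points; treat small deviations from the dashboard as normal."""
    qs = quantiles(samples_ms, n=100, method="inclusive")
    # quantiles() returns 99 cut points; index i-1 is the i-th percentile.
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

A large gap between p95 here and p95 on the dashboard is worth investigating before filing the evidence, since it usually means the two sources are sampling different request populations.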
### Test 6 — Quarterly model performance memo (TC-2.9-06, Zone 3 only)
- Confirm the Model Risk Manager has produced a quarterly memo summarizing:
- KPI trend versus prior quarter
- Drift indicators (input distribution, output quality)
- Hallucination / grounding metrics where measured
- Optimization actions taken and their results
- Expected: memo exists, dated within the prior quarter, referenced in MRM register per OCC 2011-12.
### Test 7 — Hallucination / grounding telemetry (TC-2.9-07, Zone 3 — if implemented)
- Confirm a custom event query (e.g., `customEvents | where name == "HallucinationDetected"`) returns rows.
- Compare against the documented sampling methodology (e.g., 5% of sessions evaluated by the Azure AI Evaluation SDK).
- Expected: non-zero events; rate trends visible on the RAI dashboard.
### Test 8 — Review meeting cadence (TC-2.9-08)
- Inspect the calendar series for weekly / monthly / quarterly reviews.
- Pull the last 3 months of meeting minutes from WORM-capable storage.
- Expected: every meeting documented with attendees, KPI snapshot, and decisions.
### Test 9 — Data export to immutable storage (TC-2.9-09)
- Navigate to the ADLS Gen2 container or storage account holding the export.
- Verify that recent files exist (daily cadence) and that the immutability policy is enabled in time-based retention mode.
- Expected: files present; retention policy locked; deletion attempts denied (test non-destructively with `Remove-AzStorageBlob -WhatIf`).
### Test 10 — End-to-end customer-impact scenario (TC-2.9-10, Zone 3)
- Replay a synthetic customer transcript known to stress the agent.
- Observe latency, error rate, escalation, and CSAT proxy in the dashboard.
- Expected: all signals captured; alerts behave per design; transcripts retained per SEC 17a-4.
## Test case matrix
| Test ID | Scenario | Expected | Pass / Fail |
|---|---|---|---|
| TC-2.9-01 | Native analytics shows agent data | Sessions, CSAT visible (≤ 48 h lag) | |
| TC-2.9-02 | App Insights ingesting telemetry | Last 24 h count > 0 | |
| TC-2.9-03 | Power BI dashboard accurate | Matches source within refresh window | |
| TC-2.9-04 | Alert triggers on threshold breach | Notification delivered to all channels | |
| TC-2.9-05 | Latency percentiles available | p50 / p95 / p99 populated | |
| TC-2.9-06 | Quarterly MRM memo exists | Dated within prior quarter | |
| TC-2.9-07 | RAI / hallucination telemetry | Events emitted per sampling design | |
| TC-2.9-08 | Review cadence honored | Minutes for last 3 months on WORM | |
| TC-2.9-09 | ADLS export immutable | Files present, retention locked | |
| TC-2.9-10 | Synthetic customer scenario | All signals + alert behaviors correct |
## Evidence collection checklist
- Screenshot: PPAC → Analytics → Copilot Studio
- Screenshot: Copilot Studio agent → Settings → Application Insights connection
- Screenshot: Power BI dashboard KPI cards with refresh timestamp
- Screenshot: Alert notification (Teams / email / paging system)
- Screenshot: ADLS Gen2 container with files + immutability policy detail
- Export: `agent-inventory-*.json` + manifest with SHA-256 (Script 1)
- Export: `appinsights-linkage-*.json` (Script 2)
- Export: `kpis-30-day-*.json` (Script 3)
- Document: quarterly MRM memo (Zone 3)
- Document: review meeting calendar series + last 3 months of minutes
- Document: change log entry for each alert threshold test (Test 4)
Stage all evidence under `maintainers-local/tenant-evidence/2.9/`. The folder is gitignored — never commit tenant data.
## Attestation statement template
## Control 2.9 Attestation - Agent Performance Monitoring and Optimization
**Organization:** [Organization Name]
**Control Owner:** [Name / Role]
**Tenant ID:** [Tenant ID]
**Cloud:** [Commercial / GCC / GCC High / DoD]
**Period:** [YYYY-Q#]
I attest that, for the period above:
1. Copilot Studio analytics is enabled and producing data for all in-scope agents.
2. Application Insights is linked to every Zone 2 and Zone 3 agent and ingesting telemetry.
3. Performance KPIs are defined and approved per zone:
- Zone 1: error rate < [X]%, p95 < [X] s
- Zone 2: error rate < [X]%, p95 < [X] s, CSAT ≥ [X]
- Zone 3: error rate < [X]%, p95 < [X] s, CSAT ≥ [X]
4. Alerts are configured for error rate, latency, and (Zone 2/3) CSAT, with escalation
paths through [Teams / email / paging] and tested within the period (date: [date]).
5. Review cadence is operating:
- Weekly operational: [day / time]
- Monthly business: [day / time]
- Quarterly executive: [day / time]
- Quarterly model risk (Zone 3): [day / time], memo dated [date]
6. Telemetry is retained for [N] days in Application Insights and [N] days on WORM-capable
storage to help meet SEC 17a-4(f) and FINRA 4511 record-keeping requirements.
7. Sovereign cloud feature parity has been verified against current Microsoft Learn
documentation; gaps (if any) are documented in [reference].
**Agents monitored:** [Count]
**Evidence package:** [path / SHA-256 of manifest]
**Signature:** _______________________
**Date:** _______________________
Back to Control 2.9 | Portal Walkthrough | PowerShell Setup | Troubleshooting