Troubleshooting: Control 2.9 - Agent Performance Monitoring and Optimization
Last Updated: April 2026
Audience: M365 administrators investigating monitoring gaps that could create false-clean evidence for FINRA / SEC / OCC examinations.
Quick reference
| Symptom | Likely cause | First action |
|---|---|---|
| Analytics dashboard empty for new agents | Initial population window (24–48 h) not yet elapsed, or analytics disabled | Confirm in PPAC → Analytics → Copilot Studio |
| Sovereign tenant returns no data | Wrong `-Endpoint` on `Add-PowerAppsAccount`; commercial endpoint hit instead of GCC / GCC High / DoD | Re-run with the correct endpoint per the PowerShell baseline |
| App Insights "configured" but no telemetry | Connection string typo; ingestion lag; sampling at 0% | Run Script 2 in powershell-setup.md; check sampling settings |
| Power BI dashboard stale | Dataflow refresh failure; ADLS export lag; gateway down | Check refresh history; manually refresh |
| Alerts not firing | Power Automate flow disabled; threshold above current values; smart detection still in baselining | Verify flow run history; lower threshold to test |
| CSAT empty | Survey topic not published or not routed at end of conversation | Add satisfaction survey topic to agent |
| Get-AdminPowerAppEnvironmentRoleAssignment empty on Dataverse env | Documented Microsoft behavior — these cmdlets do not work on Dataverse-backed environments | Use PPAC Dataverse Security Roles |
Detailed troubleshooting
Issue 1 — Copilot Studio analytics empty after deployment
Symptoms: No sessions, conversations, topic resolution, or CSAT visible for an agent that is in production use.
Investigation:
- Confirm the agent is published (analytics aggregates only published-channel sessions, not test-pane traffic).
- Confirm at least 24 h has elapsed since first user session — initial population is not real-time.
- Confirm the user viewing analytics has the AI Administrator, Power Platform Administrator, or environment Maker role on the environment.
- In sovereign tenants, confirm the analytics service is GA in your cloud — GCC High / DoD historically lag commercial by several months.
Resolution: Wait the documented lag, then verify role and publication. If still empty after 72 h with confirmed user activity, open a Microsoft support ticket — this is platform-side.
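As a quick role check, environment role assignments can be listed from PowerShell (a minimal sketch; the environment GUID and email are placeholders, and per Known limitations this cmdlet silently returns nothing on Dataverse-backed environments, so use PPAC Dataverse Security Roles there):

```powershell
# Sketch: confirm the analytics viewer holds an environment role.
# NOTE: returns empty on Dataverse-backed environments (documented behavior).
# '<environment-guid>' and the email below are placeholders.
Add-PowerAppsAccount   # add -Endpoint for sovereign clouds
Get-AdminPowerAppEnvironmentRoleAssignment -EnvironmentName '<environment-guid>' |
    Where-Object { $_.PrincipalEmail -eq 'auditor@contoso.com' }
```

Output property names can vary by module version; inspect the raw objects before filtering if nothing matches.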
Issue 2 — False-clean evidence in sovereign clouds
Symptoms: PowerShell scripts complete with Write-Host "PASS" lines but report zero environments / zero agents in a tenant known to have many.
Root cause: Add-PowerAppsAccount was called without an -Endpoint parameter (or with prod), authenticating against the commercial cloud while the tenant lives in usgov, usgovhigh, or dod. No error is thrown — the cmdlet returns an empty collection.
Resolution: Always parameterize the endpoint per the PowerShell baseline. Add a guard:
```powershell
$envs = Get-AdminPowerAppEnvironment
if (-not $envs) {
    throw "No environments returned. Verify the -Endpoint parameter matches your tenant's sovereign cloud."
}
```
FSI impact: Any monitoring evidence collected with the wrong endpoint must be discarded and the run repeated. Document the corrected run in your audit trail.
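Putting the guard together with an explicit, mandatory endpoint makes the wrong-cloud failure mode impossible to hit silently (a sketch; assumes the Microsoft.PowerApps.Administration.PowerShell module is installed, and the ValidateSet values mirror the documented `-Endpoint` options):

```powershell
param(
    # Endpoint must match the tenant's cloud; defaulting to commercial is the failure mode above
    [Parameter(Mandatory)]
    [ValidateSet('prod', 'usgov', 'usgovhigh', 'dod')]
    [string]$Endpoint
)

Add-PowerAppsAccount -Endpoint $Endpoint

$envs = Get-AdminPowerAppEnvironment
if (-not $envs) {
    # Fail loudly instead of emitting a false-clean "PASS"
    throw "No environments returned from endpoint '$Endpoint'. Do not record this run as evidence."
}
Write-Host "Retrieved $($envs.Count) environment(s) via endpoint '$Endpoint'."
```

Making `-Endpoint` mandatory (no default) forces every evidence run to state which cloud it authenticated against, which is exactly what the audit trail needs.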
Issue 3 — Application Insights linked but no telemetry ingested
Symptoms: Settings → Application Insights shows a connection string; the KQL query `requests | where timestamp > ago(24h) | count` returns 0.
Investigation steps:
- Sampling: check `customDimensions['samplingRate']`. If sampling is forced to 0% by an applicationinsights.config override or a telemetry processor, all data is dropped.
- Ingestion lag: new resources can take 10–15 minutes for first events; if the link was just configured, wait.
- Connection string mismatch: confirm the string in Copilot Studio resolves to the same App Insights resource you are querying. A common error is pasting the string from a sibling environment.
- Daily cap: Application Insights has a configurable daily cap; if hit, ingestion stops until the next UTC day.
- Network egress: if your tenant uses Private Link / restricted egress, confirm Copilot Studio's egress is allowlisted to the App Insights ingestion endpoint.
Resolution: Address the specific failure. After fix, allow 30 minutes and re-query.
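To separate "nothing ingested at all" from "requests empty but other telemetry flowing", a single KQL sweep across the main tables helps; run it against the resource the connection string actually resolves to (a sketch):

```kusto
// Count stored events per table over 24 h. In App Insights, each stored
// record's itemCount is the number of original events it represents, so
// preSamplingWeight > events means sampling is discarding records.
union requests, dependencies, traces, customEvents, exceptions
| where timestamp > ago(24h)
| summarize events = count(), preSamplingWeight = sum(itemCount) by itemType
```

Zero rows across all tables points to the connection string, daily cap, or egress causes above; rows only in some tables points to sampling or the specific signal path.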
Issue 4 — Power BI dashboard not refreshing
Symptoms: Dashboard tiles show stale timestamps or refresh history shows failures.
Investigation:
- Source credentials: dataflow / dataset credentials may have expired (especially OAuth tokens). Re-authenticate.
- ADLS Gen2 connector: verify the service principal still has Storage Blob Data Reader on the container.
- On-premises gateway (if used): confirm the gateway service is running and the Power BI tenant shows it as Online.
- Schema drift: if an analytics export added or removed columns, the dataset may fail with "column not found." Edit the dataflow to remove the reference or add a default.
- Capacity throttling: Premium / Fabric capacity nearing CU exhaustion can throttle scheduled refresh. Inspect capacity metrics.
Resolution: Address the root cause and trigger a manual refresh to confirm. Document the cause in the change log if customer-facing reporting was affected.
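Refresh history can also be pulled programmatically, which is useful for attaching evidence of the failure and the fix to the change log (a sketch using the MicrosoftPowerBIMgmt module; the workspace and dataset GUIDs are placeholders):

```powershell
# Sketch: last five refreshes for a dataset via the Power BI REST API
Connect-PowerBIServiceAccount
$workspaceId = '<workspace-guid>'   # placeholder
$datasetId   = '<dataset-guid>'    # placeholder
$history = Invoke-PowerBIRestMethod -Method Get `
    -Url "groups/$workspaceId/datasets/$datasetId/refreshes?`$top=5" | ConvertFrom-Json
$history.value |
    Select-Object refreshType, status, startTime, endTime, serviceExceptionJson
```

The `serviceExceptionJson` field on a failed entry usually names the root cause (expired credential, schema drift, capacity throttle) directly.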
Issue 5 — Alert flow / rule not triggering on threshold breach
Symptoms: Threshold is clearly exceeded in the dashboard but no notification arrived.
Power Automate path:
- Confirm flow status is On (a tenant DLP policy change can disable flows that use a now-blocked connector).
- Inspect the last 5 runs — common failures: 401 from Power BI dataset (refresh stale credentials); type mismatch in the threshold condition (string vs number).
- Verify the Teams / email connector still has a valid principal — service-account password rotations break unattended flows.
Azure Monitor path:
- Confirm the alert rule is Enabled and the action group has at least one receiver.
- Smart Detection alerts require 7+ days of baseline data — newly created resources will not fire smart alerts.
- Confirm the metric being evaluated is the right one: `requests/failed` and `requests/duration` are distinct signals.
- Check the action group history: Teams webhooks expire if the channel is deleted.
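On the Azure Monitor path, the enabled state and action-group wiring of every rule in a resource group can be checked in one pass (a sketch using the Az.Monitor module; the resource group name is a placeholder, and property names can differ slightly across module versions):

```powershell
# Sketch: list metric alert rules with enabled state and attached action count
Connect-AzAccount
Get-AzMetricAlertRuleV2 -ResourceGroupName 'rg-agent-monitoring' |
    Select-Object Name, Enabled,
        @{ Name = 'ActionGroups'; Expression = { ($_.Actions | Measure-Object).Count } }
```

A rule that shows Enabled with zero action groups evaluates correctly but notifies no one, which presents exactly as this symptom.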
Resolution: Fix the failure, then run the Test 4 procedure in verification-testing.md to confirm and document.
Issue 6 — Performance degradation detected
Symptoms: Error rate or p95 latency exceeds zone threshold sustained over multiple intervals.
Triage:
- Recent changes: review the agent change log (Control 2.5) for topic edits, model swaps, knowledge source updates, or connector changes in the last 7 days.
- Backend dependencies: check Service Health (Azure + M365), then any custom APIs the agent calls.
- Capacity: environment capacity metrics in PPAC — sessions per minute hitting the environment limit causes 429s that surface as errors.
- Model degradation: for Zone 3, request a hallucination / grounding sample from the Model Risk Manager — performance drops can correlate with model behavior changes after platform updates.
- Knowledge source freshness: SharePoint or web-source indexing failures can degrade generative answer quality without changing the agent.
Resolution: roll back the most recent change if correlated; otherwise file a Microsoft support ticket and document the incident under Control 2.7 (Incident Management).
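For the capacity check in the triage above, 429 throttling and latency trends are visible directly in Application Insights (a sketch; note that `resultCode` is stored as a string in the requests table):

```kusto
// Hourly totals, throttled (429) count, and p95 latency
requests
| where timestamp > ago(24h)
| summarize total = count(),
            throttled = countif(resultCode == "429"),
            p95_ms = percentile(duration, 95)
    by bin(timestamp, 1h)
| order by timestamp asc
```

A throttled count that tracks session peaks supports the capacity hypothesis; a flat throttled count with rising p95 points at backend dependencies or a recent change instead.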
Issue 7 — Hallucination / grounding metrics absent (Zone 3)
Symptoms: RAI dashboard shows no events or zero rate.
Likely causes:
- The Azure AI Evaluation SDK pipeline is not running. Confirm the scheduled job (Azure Function / Logic App / Power Automate) executed in the last interval.
- Custom events emitted by the evaluator are written to a different App Insights resource than the dashboard queries.
- The sampling rate is too low — at 1% sampling on a low-volume agent, weeks may pass with zero captured events.
Resolution: Verify the evaluator is running, points to the correct App Insights, and the sampling rate is calibrated for agent volume. Document the sampling methodology in the MRM memo.
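To rule out the wrong-resource cause, query customEvents in the resource the dashboard actually points at (a sketch; the event-name filter is an assumption, so substitute the names your evaluator pipeline actually emits):

```kusto
// Daily counts of evaluator custom events over the last 7 days.
// The name filter below is illustrative, not a platform-defined event name.
customEvents
| where timestamp > ago(7d)
| where name has "grounding" or name has "hallucination"
| summarize events = count() by name, bin(timestamp, 1d)
| order by timestamp asc
```

Zero rows here, with the scheduled job confirmed running, means the evaluator is writing to a different App Insights resource or sampling is dropping everything.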
Escalation path
- Agent Owner — agent-specific issues, topic edits
- Power Platform Admin / AI Administrator — analytics, App Insights linkage, environment capacity
- AI Governance Lead — KPI definitions, threshold tuning, cross-zone trends
- Model Risk Manager — sustained Zone 3 degradation, MRM memo updates (OCC 2011-12 / SR 11-7)
- Microsoft Support — platform issues confirmed reproducible after triage
Known limitations
| Limitation | Impact | Workaround |
|---|---|---|
| 24–48 h lag for first analytics population | New deployments have a blind window | Combine with App Insights (near real-time) for the first week |
| CSAT requires user prompt | No score without a survey topic | Publish satisfaction survey at end of conversation |
| Cross-environment views in PPAC are limited | Multi-environment tenants need manual aggregation | Use ADLS export + Power BI to unify |
| Application Insights minimum sampling | Very low-traffic agents may not show events | Set sampling to 100% for Zone 3 small-volume agents |
| Sovereign cloud feature drift | Some features lag commercial by months | Verify feature parity each quarter; document gaps |
| *-AdminPowerAppEnvironmentRoleAssignment cmdlets do not work on Dataverse-backed environments | Returns empty silently | Use PPAC Dataverse Security Roles per the PowerShell baseline |
Back to Control 2.9 | Portal Walkthrough | PowerShell Setup | Verification & Testing