Spec: Confidence / Uncertainty Capture and Routing

Purpose: Ensure agent outputs are handled according to risk by capturing confidence/uncertainty and enforcing routing rules (review, escalation, block) based on zone, action type, and confidence.
Applies to: Zone 3 (required), Zone 2 (recommended), Zone 1 (baseline banding recommended).
Regulatory driver: FINRA highlights that AI agents may perform multi-step reasoning that reduces transparency/explainability and may act autonomously beyond authority, reinforcing the need for governance, oversight, and guardrails.
Evidence driver: Microsoft Purview Copilot audit records include structured interaction metadata and can include message-level flags like JailbreakDetected (prompt jailbreak attempt) and resource-level XPIADetected (cross prompt injection attack detection) that are useful "uncertainty/risk signals" for routing.

1) Definitions

Confidence: A measure of how likely the agent’s output is correct/appropriate for the request (not “model probability” unless the model provides it).
Uncertainty: Evidence that the agent may be wrong, missing required information, or operating outside its designed capability.
Routing: The process that determines whether the output is (a) allowed to proceed, (b) requires human review, (c) is escalated, or (d) is blocked.

2) Confidence representation (choose one)

Option A (recommended): 3-band confidence

High / Medium / Low
Works across different agent implementations and avoids false precision.

Option B: Numeric score

confidence_score in [0.0, 1.0]
Only if consistently derivable and calibrated.

For regulated governance, consistency and auditability matter more than “perfect” confidence math.

3) Inputs to compute confidence (minimum viable)

Confidence should be computed from a policy-driven heuristic (even if the model doesn’t provide a score), using signals like:

C1) Source support

Number of distinct sources used
Presence of authoritative internal sources vs “no source”
Conflicts between sources

C2) Policy and boundary signals

Any DLP/policy blocks (PolicyDetails present)
Any restricted label encountered in AccessedResources.SensitivityLabelId

C3) Attack/abuse signals (risk, not “confidence”)

Any prompt message flagged JailbreakDetected == true
Any resource flagged XPIADetected == true

C4) Task type + criticality

Informational vs recommendation vs execute
Customer-facing vs internal
Consequential decision context

4) Routing policy (required)

4.1 Routing matrix

Zone	Decision type	Confidence	Default route	Notes
Zone 1	Inform	High/Med/Low	Allow	Log band + key metadata
Zone 1	Recommend/Execute	Any	Block or review	Zone 1 should not execute
Zone 2	Inform/Recommend	High	Allow	With monitoring
Zone 2	Inform/Recommend	Medium	Review optional	Reviewer sampling recommended
Zone 2	Inform/Recommend	Low	Mandatory review	Create ticket/case
Zone 2	Execute	Any	Mandatory review	Execute requires approval
Zone 3	Inform/Recommend	High	Allow	Monitor + log
Zone 3	Inform/Recommend	Medium	Review required	Default to review in regulated
Zone 3	Inform/Recommend	Low	Escalate	Block auto actions
Zone 3	Execute	High only	Allow only if AAM allows	Must still pass hard limits
Zone 3	Execute	Medium/Low	Block + escalate	No auto-exec

4.2 Mandatory escalation triggers (override confidence)

Regardless of confidence band, escalate/block if: - Prohibited action attempted (AAM violation) - Restricted label boundary crossed - JailbreakDetected == true for prompt message - XPIADetected == true for accessed resource - Scope drift detected (connector/site drift) - Missing required sources for a regulated recommendation (“unsupported claim”)

5) Logging requirements (tie to Decision Log Schema)

For each material interaction: - confidence_band (required) - confidence_factors (optional list, recommended) - routing_outcome (allow/review/escalate/block) - routing_reason_codes (required if not “allow”) - review_required boolean + reviewer outcome if applicable

Also log whether: - jailbreak signal observed - XPIA signal observed

6) Human review requirements

Review checklist (minimum)

Is the output supported by sources?
Does it comply with communications/disclosure rules if customer-facing?
Does it contain any sensitive data that should be masked?
Is the action within authorized scope?

Reviewer actions

Approve as-is
Modify output
Reject output
Escalate to compliance/security

7) Calibration and QA (monthly/quarterly)

Sample Low/Medium/High outputs and measure:
approval rate
correction rate
false-negative rate (High confidence but wrong)
Adjust confidence heuristics and thresholds
Record changes via change management

FINRA underscores governance/testing/monitoring expectations for GenAI use and the need for supervisory processes under FINRA Rule 3110.

8) “Done” criteria

Confidence bands recorded for Zone 2/3 agents in decision logs.
Routing matrix enforced (no bypass without logged override).
Mandatory escalation triggers implemented and tested.
Monthly calibration report exists and is reviewed by governance stakeholders.

FSI Agent Governance Framework v1.2.51 - January 2026

Specification	Relationship
Human-in-the-Loop Triggers	Defines when low-confidence responses require human review — uses confidence thresholds defined in this spec
Zone 1 Minimum Explainability	Specifies minimum transparency requirements for confidence scores shown to users