Control 3.6 — Troubleshooting: Orphaned Agent Detection and Remediation

Scope. This playbook is the diagnostic companion to Control 3.6 — Orphaned Agent Detection and Remediation. It covers detection drift, sponsor/HR signal breaks, reassignment failures, bulk remediation partial success, terminal-delete blocks (Purview hold / retention label), multi-source reconciliation instability, false-positive owner classifications, and SIEM ingestion gaps.

Regulatory framing. Accurate, timely orphan detection and remediation supports compliance with FINRA Rule 4511 (records), SEC Rule 17a-4(f) (WORM retention of the orphan register and remediation evidence), SOX §404 (ITGC — access provisioning/de-provisioning around financially significant systems), GLBA Safeguards Rule (access controls over nonpublic personal information), and OCC Bulletin 2013-29 / Fed SR 26-2 (formerly SR 11-7) (third-party and model risk for AI agents that persist after sponsor separation). Controls described here support — they do not replace registered-principal supervisory review under FINRA Rule 3110 or firm-specific model-risk governance.

Audience. AI Administrator, Power Platform Admin, Entra Agent ID Admin, Entra Identity Governance Admin, Entra Global Reader, AI Governance Lead, Purview Compliance Admin, and SOC analysts supporting examiner/audit response.

Sibling playbooks. portal-walkthrough.md · powershell-setup.md · verification-testing.md

§0 Triage Tree
§1 Diagnostic Data Collection
§2 Pillar — DETECT-NO-RESULTS
§3 Pillar — DETECT-DATA-DRIFT
§4 Pillar — HR-SIGNAL-MISSING
§5 Pillar — SPONSOR-TRANSFER-FAIL
§6 Pillar — REASSIGN-FAIL
§7 Pillar — BULK-PARTIAL-FAIL
§8 Pillar — TERMINAL-DELETE-BLOCKED
§9 Pillar — RECONCILE-DRIFT
§10 Pillar — FALSE-POSITIVE-DL-OWNER
§11 Pillar — SIEM-NO-EVENTS
§12 Runbook RB-01 — M&A / RIF Mass Departure
§13 Runbook RB-02 — Examiner Pulls Register with SIEM Gaps
§14 Runbook RB-03 — Sponsor Termination Cascade
§15 Runbook RB-04 — Bulk Reassign Over-Applied
§16 Runbook RB-05 — SOX Zero-Orphan Attestation Failure
§17 Runbook RB-06 — Litigation-Hold Deletion Blocked
§18 Escalation Matrix
§19 Evidence Collection During Incident
§20 Cross-References

§0 Triage Tree

0.1 Symptom → Pillar Map

#	Presenting Symptom	Most Likely Pillar	First Diagnostic
1	`Find-OrphanedAgents` returns 0 rows but known terminated sponsors exist	§2 DETECT-NO-RESULTS	`Get-Agt36Health -Surface Detection`
2	Agent 365 Ownerless Agents card shows N, script shows N±Δ	§3 DETECT-DATA-DRIFT	GQ-01 vs. card export diff
3	Terminated user's `employeeLeaveDateTime` is null in Graph	§4 HR-SIGNAL-MISSING	GQ-03 HR attribute audit
4	Entra Lifecycle Workflow "Manager Transfer" didn't fire on leaver	§5 SPONSOR-TRANSFER-FAIL	LCW run history + GQ-04
5	`Set-AgentOwner` throws "principal not licensed for environment"	§6 REASSIGN-FAIL	`Get-Agt36OwnerEligibility`
6	Bulk remediation reports 47 of 52 succeeded, no per-agent detail	§7 BULK-PARTIAL-FAIL	Correlation-ID replay
7	`Remove-OrphanedAgent` returns "PolicyHold" / "LabelRetained"	§8 TERMINAL-DELETE-BLOCKED	Purview label + hold check
8	Orphan register count changes ±15% run-to-run with no known events	§9 RECONCILE-DRIFT	3-source reconciliation diff
9	Ownerless Agents card not visible	§2 DETECT-NO-RESULTS (confirm PRE-06 status)	`Get-Agt36Health -Surface Detection`
10	Agent owned by a Distribution List flagged as "orphan"	§10 FALSE-POSITIVE-DL-OWNER	Owner principalType inspect
11	Sentinel shows no `AgentOrphanDetectionRun` events for >24h	§11 SIEM-NO-EVENTS	KQL-01 + connector health
12	Examiner asks for orphan register as-of prior quarter, cannot produce	§13 RB-02	WORM snapshot retrieval
13	RIF event produced 600 leavers, LCW throttled	§12 RB-01	Throttle metrics + batch plan
14	SOX attestation shows 3 orphaned Zone 3 agents past SLA	§16 RB-05	SLA clock audit
15	Litigation hold owner refuses delete despite terminal SLA breach	§17 RB-06	Custodian coordination

0.2 Severity Matrix

Severity	Criteria	Response Target	Notifications
SEV-1	Detection pipeline is silent >24h and HR leaver events exist in the window; or examiner/regulator is on-the-clock	Triage within 1h; mitigation within 4h	AI Governance Lead, CISO delegate, Legal (if examiner)
SEV-2	Zone 3 (Enterprise) orphan past 7-day remediation SLA with no owner candidate; bulk remediation <90% success	Triage within 4h; mitigation within 24h	AI Governance Lead, Control Owner
SEV-3	Zone 2 orphan past SLA; reconciliation drift 5-15%; single reassign failure	Triage within 1 business day	Control Owner, sponsor line manager
SEV-4	Zone 1 orphan past SLA; cosmetic/reporting defect; single false positive	Triage within 5 business days	Control Owner only

0.3 Pre-Escalation Checklist

Before escalating any incident to SEV-2 or SEV-1, the on-call administrator must confirm and record:

Evidence preservation initiated — before any remediation action, capture E-01 (orphan register snapshot at time of detection) and E-02 (raw detection-run log) per §1.5.
Scope quantified — exact count of affected agents, by Zone, with tenant and environment IDs.
Examiner/litigation flag — confirmation from Legal / Compliance whether any affected agent is subject to an active examination, subpoena, or litigation hold (determines whether §8 TERMINAL-DELETE-BLOCKED logic must be honored before any delete).
Control dependencies verified — status of upstream controls 1.2 Agent Registry, 2.25 Agent 365 Console, 2.26 Entra Agent ID, 3.1 Agent Inventory.

0.4 Examiner Artifact Preservation (Do-This-First)

If the incident could be pulled into an examination, subpoena, or internal investigation:

Do not delete any orphan entries, detection logs, or reassignment audit rows until the Legal Hold Coordinator confirms preservation.
Snapshot the current orphan register to an immutable storage location (Blob with legal hold, or Purview retention label FSI-Examiner-Preserve-Indefinite).
Record the snapshot hash (SHA-256), timestamp (UTC), operator UPN, and ticket ID in the incident record.
Do not run Remove-OrphanedAgent or Set-AgentOwner -Force against any preserved agent without written Legal approval.
Preserve raw Graph and Power Platform activity logs for the affected agents for 7 years (FINRA 4511 floor) or longer if Legal specifies.
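
The hash-and-record step above could be scripted along these lines. This is a minimal sketch, not the control's own tooling: the function name New-Agt36SnapshotRecord and its parameters are hypothetical, and it assumes the snapshot file has already been written to the immutable location. Only Get-FileHash, Test-Path, and Get-Date are standard cmdlets.

```powershell
# Hypothetical helper: builds the incident-record fields for a register snapshot.
function New-Agt36SnapshotRecord {
    param(
        [Parameter(Mandatory)] [string] $SnapshotPath,  # file already in immutable storage
        [Parameter(Mandatory)] [string] $OperatorUpn,
        [Parameter(Mandatory)] [string] $TicketId
    )
    if (-not (Test-Path -LiteralPath $SnapshotPath -PathType Leaf)) {
        throw "Snapshot file not found: $SnapshotPath"
    }
    # Hash the file bytes exactly as stored; do not re-export before hashing.
    $hash = Get-FileHash -LiteralPath $SnapshotPath -Algorithm SHA256
    [pscustomobject]@{
        SnapshotPath = $SnapshotPath
        Sha256       = $hash.Hash
        TimestampUtc = (Get-Date).ToUniversalTime().ToString('o')  # ISO 8601, UTC
        OperatorUpn  = $OperatorUpn
        TicketId     = $TicketId
    }
}
```

A call such as New-Agt36SnapshotRecord -SnapshotPath .\orphan-register.csv -OperatorUpn 'operator@contoso.example' -TicketId 'INC-0001' | ConvertTo-Json produces a block suitable for pasting into the incident record; the path, UPN, and ticket ID shown are placeholders.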

§1 Diagnostic Data Collection

All diagnostic helpers are thin, read-only wrappers that call the underlying Graph, Power Platform, and Log Analytics APIs and standardize output for evidence capture. They never mutate state. Names follow the Get-Agt36* convention.

1.1 Helper Inventory

Helper	Purpose	Inputs	Output
`Get-Agt36Health`	One-shot health snapshot across detection, HR connector, LCW, Purview hold services, Agent 365 card, SIEM connector	`-Surface <Detection|HR|LCW|Purview|Card|SIEM|All>`	PSCustomObject with per-surface status, lastSuccess, lagMinutes
`Get-Agt36Orphan`	Pulls current orphan register with full metadata; supports as-of queries against WORM snapshots	`-AsOf <datetime>`, `-Zone <1|2|3>`, `-IncludeShadow`	Array of orphan records with category, detection source, SLA clock
`Get-Agt36SponsorImpact`	Given a leaver UPN, lists all agents owned, co-owned, or uniquely operated by that principal	`-LeaverUpn <string>`, `-Lookback <days>`	Impact report grouped by Zone
`Get-Agt36ReconcileState`	Runs a live 3-source reconciliation: Power Platform DLP inventory, Graph `applications`/`servicePrincipals`, Agent 365 export	`-Tenant <guid>`	Reconciliation delta report with hashes
`Get-Agt36OwnerEligibility`	Tests whether a candidate owner meets licensing, environment membership, and SoD requirements before reassignment	`-CandidateUpn <string>`, `-AgentId <guid>`	Eligibility verdict with blocking reasons
`Get-Agt36RetentionBindings`	For an agent, lists Purview retention labels, litigation holds, and eDiscovery case memberships	`-AgentId <guid>`	Binding list with hold IDs and custodians

All helpers return objects compatible with Export-Clixml, ConvertTo-Json -Depth 8, and ConvertTo-Csv for evidence archival.

1.2 Graph Query Catalog

ID	Purpose	Endpoint	Required Permission
GQ-01	Enumerate servicePrincipals with `tags` containing `AgentId`	`GET /servicePrincipals?$filter=tags/any(t:t eq 'AgentId')&$select=id,appId,displayName,owners,tags,signInActivity`	`Application.Read.All`
GQ-02	Identify ownerless agent app registrations	`GET /applications?$expand=owners&$filter=owners/$count eq 0&$count=true` (advanced query; send header `ConsistencyLevel: eventual`)	`Application.Read.All`
GQ-03	Audit HR attribute presence on separated users	`GET /users?$filter=accountEnabled eq false&$select=id,userPrincipalName,employeeLeaveDateTime,employeeOrgData,manager`	`User.Read.All`
GQ-04	Lifecycle Workflow run history for a workflow	`GET /identityGovernance/lifecycleWorkflows/workflows/{id}/runs?$top=200`	`LifecycleWorkflows.Read.All`
GQ-05	Agent owners with expanded principalType (user / group / servicePrincipal)	`GET /applications/{id}/owners?$select=id,userPrincipalName,displayName,@odata.type`	`Application.Read.All`
GQ-06	Agent sign-in activity for staleness scoring	`GET /auditLogs/signIns?$filter=appId eq '{appId}' and createdDateTime ge {iso}`	`AuditLog.Read.All`
GQ-07	Directory role assignments to detect admin-owned agents	`GET /directoryRoles?$expand=members`	`RoleManagement.Read.Directory`
GQ-08	Guest/external owner enumeration (high risk)	`GET /applications?$expand=owners&$filter=owners/any(o:o/userType eq 'Guest')`	`Application.Read.All`

Permissions must be granted to the detection pipeline's managed identity or app registration; see 2.26 Entra Agent ID for the SoD-aware permission set.

1.3 KQL Query Catalog (Log Analytics / Sentinel)

// KQL-01 — Detection run cadence and success over last 14 days
AgentGovernance_CL
| where TimeGenerated > ago(14d)
| where EventType_s == "AgentOrphanDetectionRun"
| summarize Runs = count(), Successes = countif(Status_s == "Success"),
            LastSuccess = maxif(TimeGenerated, Status_s == "Success")
  by Tenant_s, bin(TimeGenerated, 1d)
| extend LagHours = datetime_diff('hour', now(), LastSuccess)
| where LagHours > 24 or Successes < Runs

// KQL-02 — Reassignment actions, grouped by operator and outcome
AgentGovernance_CL
| where TimeGenerated > ago(30d)
| where EventType_s in ("AgentOwnerReassigned", "AgentOwnerReassignFailed")
| summarize count() by OperatorUpn_s, EventType_s, FailureReason_s
| order by OperatorUpn_s asc

// KQL-03 — Purview hold / label block events blocking terminal delete
AgentGovernance_CL
| where EventType_s == "AgentTerminalDeleteBlocked"
| project TimeGenerated, AgentId_g, BlockReason_s, HoldId_s, CustodianUpn_s, Zone_s
| order by TimeGenerated desc

// KQL-04 — HR attribute propagation lag
AgentGovernance_CL
| where EventType_s == "HrAttributeObserved"
| extend LagMinutes = datetime_diff('minute', TimeGenerated, HrSystemTimestamp_t)
| summarize p50 = percentile(LagMinutes, 50), p95 = percentile(LagMinutes, 95),
            p99 = percentile(LagMinutes, 99) by bin(TimeGenerated, 1d)

// KQL-05 — Reconciliation drift alarms
AgentGovernance_CL
| where EventType_s == "AgentInventoryReconciliation"
| extend DriftPct = todouble(DriftCount_d) / todouble(TotalCount_d) * 100.0
| where DriftPct > 5.0
| project TimeGenerated, Tenant_s, DriftPct, SourceA_s, SourceB_s, DeltaHash_s

// KQL-06 — SIEM connector health (self-test ping)
AgentGovernance_CL
| where EventType_s == "ConnectorSelfTest"
| summarize LastPing = max(TimeGenerated) by ConnectorName_s
| extend SilenceMinutes = datetime_diff('minute', now(), LastPing)
| where SilenceMinutes > 30

1.4 Power Platform / Agent 365 Exports

Export	Surface	Format	Cadence
Ownerless Agents card CSV	Agent 365 Admin Center	CSV	On-demand
DLP policy inventory	Power Platform Admin Center	JSON (`Get-AdminDlpPolicy`)	Daily
Environment admin audit	Power Platform Admin Center	CSV	On-demand
Microsoft Copilot Studio publisher audit	Copilot Studio Admin	CSV	On-demand

1.5 Evidence Floor

Every Control 3.6 incident ticket must carry the following evidence artifacts, stored in immutable storage with FSI-Records-6yr-WORM or stricter retention:

ID	Artifact	Source	Retention
E-01	Orphan register snapshot at time of detection (CSV + SHA-256)	`Get-Agt36Orphan | Export-Csv`	7 years (SEC 17a-4(f) floor)
E-02	Raw detection-run log (JSONL)	Log Analytics export	7 years
E-03	Graph query responses (GQ-01..GQ-08 as relevant)	Graph API	7 years
E-04	Lifecycle Workflow run record (if SPONSOR-TRANSFER-FAIL)	`GET /lifecycleWorkflows/.../runs/{id}`	7 years
E-05	Reassignment audit rows (before/after owner, operator, justification)	`AgentGovernance_CL`	7 years
E-06	Bulk job correlation-ID result set (BULK-PARTIAL-FAIL)	Job orchestrator	7 years
E-07	Purview label / hold binding report (TERMINAL-DELETE-BLOCKED)	`Get-Agt36RetentionBindings`	7 years
E-08	Incident timeline with UTC timestamps, operator UPNs, ticket IDs	Ticketing system	7 years
E-09	Post-incident root-cause memo and control change log	Incident review	7 years

Hashes (SHA-256) of E-01 through E-07 must be recorded in the incident ticket and cross-filed with the Legal Hold Coordinator if any affected agent is subject to active examination or litigation hold.

§2 Pillar — DETECT-NO-RESULTS

Symptom. Find-OrphanedAgents (see powershell-setup.md) returns zero records even though terminated sponsors with known agent ownership exist in the tenant.

2.1 Symptom Catalog

Variant	Observable
Empty array	`Find-OrphanedAgents` returns `@()` with exit code 0
Silent failure	Cmdlet returns nothing; no log line emitted to `AgentGovernance_CL`
Filter collapse	Results returned but every row has same `detectedSource`, missing categories 2-10
Auth fail masquerading as empty	Cmdlet swallows 401/403 and returns empty set

2.2 Root Cause Matrix

Cause	Likelihood	Verification
Graph permission missing (`Application.Read.All`, `User.Read.All`, `AuditLog.Read.All`)	High	`Get-MgContext | Select-Object -ExpandProperty Scopes`
Managed identity lost role assignment	Medium	GQ-07 directory role check; Entra audit log `Remove member from role` events
Detection filter too narrow (only category 1 "ownerless")	High	Inspect cmdlet source; confirm all 10 categories enumerated
HR connector silent — `employeeLeaveDateTime` is null everywhere	Medium	GQ-03 distribution: percentage of disabled accounts with non-null leave date
Token cache stale after tenant change	Low	`Disconnect-MgGraph; Connect-MgGraph` and retry
Tenant throttling returning empty pages	Low	Inspect Graph response headers `Retry-After`, `x-ms-throttle-*`

2.3 Diagnostic Procedure

Run Get-Agt36Health -Surface Detection. If lastSuccess is more than 24h old or permissionSet shows missing scopes, treat as SEV-1 and proceed to §11 SIEM-NO-EVENTS in parallel to confirm the detection job is even executing.
Execute GQ-01 directly against Graph with the pipeline's credentials. If GQ-01 returns results but Find-OrphanedAgents does not, the cmdlet's filter logic is at fault — not the upstream data.
Execute GQ-02 (ownerless apps) and GQ-03 (HR attribute audit). Confirm that both return non-empty result sets. If GQ-03 returns disabled users but employeeLeaveDateTime is null on every one of them, escalate to §4 HR-SIGNAL-MISSING.
Inspect AgentGovernance_CL for AgentOrphanDetectionRun events in the last 24h (KQL-01). If absent, the detection job itself did not execute — investigate scheduler (Azure Automation, Logic App, or Power Automate flow owning the run).

2.4 Resolution Steps

Permission gap. Re-consent the managed identity or app registration with the minimum scopes listed in §1.2. Record the before/after permission set as E-03.
Narrow filter. File a change request per Control 2.3 Change Management to widen detection logic to all 10 signal categories. Do not patch in production without CAB approval.
Throttling. Add exponential backoff that honors the Retry-After header to the detection helper; raise the page-size ceiling only after measuring 429 rates.
Token/context. Disconnect-MgGraph; Connect-MgGraph -Identity (for managed identity) or -Scopes (for interactive re-consent). Verify with Get-MgContext.
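
The throttling item above could be sketched as follows. The helper names Get-BackoffDelaySeconds and Invoke-WithBackoff are hypothetical. So is the convention that the caller places a parsed Retry-After value in the exception's Data dictionary under the key RetryAfterSeconds; how the header is actually read depends on the HTTP client in use. The sketch retries on any error and adds no jitter, so a production helper would likely restrict retries to throttling responses.

```powershell
# Delay before retry number $Attempt (1-based). A server-supplied
# Retry-After value takes precedence over the computed backoff.
function Get-BackoffDelaySeconds {
    param(
        [Parameter(Mandatory)] [int] $Attempt,
        [Nullable[int]] $RetryAfterSeconds = $null,
        [int] $BaseSeconds = 2,
        [int] $MaxSeconds  = 120
    )
    if ($null -ne $RetryAfterSeconds -and $RetryAfterSeconds -gt 0) {
        return [int]$RetryAfterSeconds
    }
    $delay = $BaseSeconds * [math]::Pow(2, $Attempt - 1)
    return [int][math]::Min([double]$delay, [double]$MaxSeconds)
}

# Runs $Action, retrying on failure up to $MaxAttempts times.
# $Sleep is injectable so the wait can be stubbed out in tests.
function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [scriptblock] $Sleep = { param($s) Start-Sleep -Seconds $s }
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            if ($attempt -eq $MaxAttempts) { throw }
            # Assumption: caller stored the parsed header value in Exception.Data.
            $retryAfter = $_.Exception.Data['RetryAfterSeconds']
            $delay = Get-BackoffDelaySeconds -Attempt $attempt -RetryAfterSeconds $retryAfter
            & $Sleep $delay | Out-Null
        }
    }
}
```

With the defaults, the computed delays are 2, 4, 8, and 16 seconds before the fifth and final attempt, unless the service supplies its own Retry-After value.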

2.5 Verification

Find-OrphanedAgents returns at least the expected count (compare against the direct GQ-01 enumeration from the §2.3 diagnostic procedure).
KQL-01 shows a successful AgentOrphanDetectionRun in the last hour with status Success and non-zero AgentsScanned_d.
Evidence E-01 and E-02 captured and hashed.
All 10 detection categories represented in the output (spot-check each category has at least one test case or documented zero-finding rationale).
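
The category spot-check in the last item could be automated roughly as below. This is a sketch under assumptions: the function name is hypothetical, and it assumes each orphan record exposes an integer category property numbered 1 through 10, which the source implies but does not specify.

```powershell
# Hypothetical check: which detection categories have no rows in the output?
function Get-MissingDetectionCategory {
    param(
        [object[]] $Orphan = @(),
        [int] $CategoryCount = 10
    )
    $seen = [System.Collections.Generic.HashSet[int]]::new()
    foreach ($row in $Orphan) {
        # Assumed property name; adjust to the register's actual schema.
        [void]$seen.Add([int]$row.category)
    }
    # Emit every category number that never appeared.
    1..$CategoryCount | Where-Object { -not $seen.Contains($_) }
}
```

Each category number returned needs either a test case or a documented zero-finding rationale before the verification item can be signed off.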

2.6 Cross-References

2.7 Examiner Artifact Preservation

Before any configuration change, export the current detection cmdlet source, scheduler definition, and permission grants to E-03. If the incident is examiner-adjacent, record the full permission-set delta (before/after) signed by the AI Administrator and AI Governance Lead.

§3 Pillar — DETECT-DATA-DRIFT

Symptom. The Agent 365 Ownerless Agents governance card and Find-OrphanedAgents disagree on the orphan count by more than the 5% reconciliation tolerance.

3.1 Symptom Catalog

Variant	Observable
Card > script	Card reports 42 ownerless; script reports 31
Script > card	Script reports 58; card reports 47
Category disagreement	Both agree on total, disagree on which agents are in category 1 vs. 4
Transient drift	Drift resolves after 60-90 minutes (propagation lag)

3.2 Root Cause Matrix

Cause	Likelihood	Verification
Agent 365 card covers only category 1 (ownerless); script covers all 10	Very High (by design)	Control 3.6 §Ownerless Agents card scope note
Card refresh lag (up to 24h)	High	Card footer "Last refreshed" timestamp
Script running against stale cache	Medium	Clear helper's local cache and rerun
Group-owned agents counted differently (expand vs. not expand)	Medium	GQ-05 principalType analysis
Guest/external owner exclusions differ	Medium	GQ-08 + card filter settings
Soft-deleted vs. hard-deleted agents in 30-day window	Low	Graph `directory/deletedItems` comparison

3.3 Diagnostic Procedure

Export the Ownerless Agents card to CSV; capture timestamp and row count.
Run Get-Agt36Orphan -Zone 3,2,1 -IncludeShadow at the same wall-clock time.
Compute a set diff (not just a count diff) on agentId: rows only in card, rows only in script, rows in both.
For rows only in card: inspect ownership in Graph (GQ-05). They should be category 1 (truly ownerless). If they are not in script output, the script's category-1 filter has a gap.
For rows only in script: map each to its category (2-10). These are expected — the card does not cover categories 2-10.
Record the diff as E-03 with both source hashes.
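
The set diff in the procedure above can be sketched as follows. The function name Compare-AgentIdSet is hypothetical, and the sketch assumes both sources expose a comparable agentId string; identifiers are compared case-insensitively because GUID casing often differs between exports.

```powershell
# Hypothetical helper: three-way set diff on agent identifiers.
function Compare-AgentIdSet {
    param(
        [string[]] $CardAgentId   = @(),
        [string[]] $ScriptAgentId = @()
    )
    $cmp       = [System.StringComparer]::OrdinalIgnoreCase
    $cardSet   = [System.Collections.Generic.HashSet[string]]::new($cmp)
    $scriptSet = [System.Collections.Generic.HashSet[string]]::new($cmp)
    foreach ($id in $CardAgentId)   { if ($id) { [void]$cardSet.Add($id.Trim()) } }
    foreach ($id in $ScriptAgentId) { if ($id) { [void]$scriptSet.Add($id.Trim()) } }

    [pscustomobject]@{
        OnlyInCard   = @($cardSet   | Where-Object { -not $scriptSet.Contains($_) } | Sort-Object)
        OnlyInScript = @($scriptSet | Where-Object { -not $cardSet.Contains($_) }   | Sort-Object)
        InBoth       = @($cardSet   | Where-Object { $scriptSet.Contains($_) }      | Sort-Object)
    }
}
```

Rows in OnlyInCard feed the category-1 gap check, and rows in OnlyInScript are the ones to map to categories 2-10. The inputs would typically come from the card CSV and the register output, for example (Import-Csv .\card.csv).agentId, where the file and column names are placeholders.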

3.4 Resolution Steps

Expected architectural drift (card scope). Not an incident. Document the reconciliation narrative in the monthly governance review; the script is the system of record per Control 3.6.
Card refresh lag. Wait up to 24h and re-compare. If drift persists beyond 24h after a known change event, open a Microsoft support case referencing the card refresh cadence.
Script cache staleness. Clear helper cache: Get-Agt36Orphan -ClearCache. Rerun and re-diff.
Principal-type inconsistency. Align script's principalType handling with the card's (user + group + servicePrincipal) and document in the detection cmdlet's header comment.

3.5 Verification

Set diff resolves to zero for category 1 rows.
Non-category-1 rows are present only in the script output, which is expected.
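The category-1 agentId set in E-01 (orphan register) matches the set in the card export filed under E-03; compare hashes of the sorted agentId lists rather than of the raw files, since the two exports differ in format.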
Monthly reconciliation narrative updated in the Control 3.6 governance log.
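
For the hash comparison in the list above, a byte-level hash of two differently formatted exports will not match even when they name the same agents. One option, assumed here rather than taken from the control, is to hash a canonical form of the category-1 agentId set: trimmed, lowercased, de-duplicated, ordinally sorted, and newline-joined. The function name is hypothetical.

```powershell
# Hypothetical helper: order- and case-independent SHA-256 of an ID set.
function Get-AgentIdSetHash {
    param([string[]] $AgentId = @())
    # SortedSet de-duplicates and sorts; Ordinal keeps the order culture-independent.
    $set = [System.Collections.Generic.SortedSet[string]]::new([System.StringComparer]::Ordinal)
    foreach ($id in $AgentId) {
        if ($id) { [void]$set.Add($id.Trim().ToLowerInvariant()) }
    }
    $canonical = @($set) -join "`n"
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($canonical)
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $hash = $sha.ComputeHash($bytes)
    }
    finally {
        $sha.Dispose()
    }
    # Lowercase hex string, 64 characters.
    -join ($hash | ForEach-Object { $_.ToString('x2') })
}
```

Equal hashes from the two sources indicate identical category-1 membership. The canonicalization rule itself would need to be recorded alongside the evidence so the comparison is reproducible.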

3.6 Cross-References

3.7 Examiner Artifact Preservation

Archive both the card CSV and the script output side-by-side with the reconciliation narrative. Examiners frequently ask "why do these two numbers differ" and the prepared narrative must be retrievable within the examination response SLA.

§4 Pillar — HR-SIGNAL-MISSING

Symptom. employeeLeaveDateTime, employeeHireDate, or manager is not populated on Entra user objects for known HR events, breaking every downstream lifecycle signal.

4.1 Symptom Catalog

Variant	Observable
Global null	GQ-03 shows 0% populated across all disabled users
Partial propagation	Some populations (FTE) populated; contractors not
Time zone truncation	Leave date set to midnight UTC regardless of local termination time
Stale update	Leave date populated but does not match HRIS of record

4.2 Root Cause Matrix

Cause	Likelihood	Verification
HR connector (Workday, SuccessFactors, Entra Cloud Sync with HR source) not running	High	Connector run history
Attribute mapping missing for `employeeLeaveDateTime`	High	Provisioning attribute mapping JSON
Non-FTE populations excluded from HR sync scope	High	Sync scope rules
Source HRIS does not publish leave date	Medium	HRIS data team confirmation
Provisioning throttled / quarantined	Low	Provisioning logs

4.3 Diagnostic Procedure

Run GQ-03 and compute coverage: (disabled users with a non-null employeeLeaveDateTime ÷ all users with accountEnabled = false) × 100.
Pull Entra provisioning logs for the HR connector: filter on action=Update, last 7 days.
Compare a known-terminated user's Entra record to the HRIS source-of-truth record. Record deltas.
Verify attribute mapping in the provisioning configuration (portal: Entra ID → Users → Provisioning → Attribute mappings).
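
The coverage calculation from the first step could be sketched like this. The function name is hypothetical; the input is assumed to be the array of user objects returned by GQ-03, carrying the accountEnabled and employeeLeaveDateTime properties named in that query.

```powershell
# Hypothetical helper: share of disabled users that carry a leave date.
function Get-LeaveDateCoverage {
    param([object[]] $User = @())
    $disabled = @($User | Where-Object { $_.accountEnabled -eq $false })
    $withDate = @($disabled | Where-Object {
        $null -ne $_.employeeLeaveDateTime -and "$($_.employeeLeaveDateTime)" -ne ''
    })
    # Coverage is undefined when there are no disabled users, so return $null.
    $pct = if ($disabled.Count -eq 0) { $null }
           else { [math]::Round(100.0 * $withDate.Count / $disabled.Count, 1) }
    [pscustomobject]@{
        Disabled      = $disabled.Count
        WithLeaveDate = $withDate.Count
        CoveragePct   = $pct
    }
}
```

A CoveragePct of 0 corresponds to the "Global null" variant in the symptom catalog, while a value that differs sharply between FTE and contractor subsets points at partial propagation.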

4.4 Resolution Steps

Connector down. Restart connector; if quarantined, clear quarantine after diagnosing root cause (typically >10% mismatch rate during pilot).
Mapping gap. File change request (Control 2.3) to add employeeLeaveDateTime mapping; coordinate with HR systems team on the source-attribute name.
Scope exclusion. If contractor population is excluded by policy, document the compensating control (manual monthly HR export → orphan register reconciliation).
Time zone. Document the UTC midnight convention in the Control 3.6 data dictionary; downstream cmdlets must not assume local-time precision.

4.5 Verification

GQ-03 coverage ≥ 95% on disabled users terminated in the last 30 days.
Representative sample of 10 recent terminations — Entra record matches HRIS within ±1 business day.
Entra provisioning logs show connector healthy with <5% failure rate.
Orphan detection rerun shows HR-signal-driven categories (2, 3, 5) returning expected volumes.

4.6 Cross-References

4.7 Examiner Artifact Preservation

Preserve the HR connector run history and attribute mapping configuration as E-03. For FSI examinations focused on JML (Joiner-Mover-Leaver) controls, examiners will request evidence that HR events flow into identity governance within a documented SLA.

§5 Pillar — SPONSOR-TRANSFER-FAIL

Symptom. Entra Identity Governance Lifecycle Workflow "Manager Transfer on Leaver" did not fire, or fired but did not transfer agent ownership to the leaver's manager.

5.1 Symptom Catalog

Variant	Observable
Workflow did not fire	`GET /identityGovernance/lifecycleWorkflows/workflows/{id}/runs` has no run for the leaver
Workflow fired but no-op	Run record exists; action log shows 0 objects transferred
Partial transfer	Some agents transferred; some skipped with `TargetNotEligible`
Manager missing	Leaver has no `manager` attribute; workflow has no transfer target

5.2 Root Cause Matrix

Cause	Likelihood	Verification
Workflow trigger condition does not match leaver's attributes	High	Trigger JSON vs. leaver attribute dump
`manager` attribute null on leaver	High	GQ-03 extended with `manager` expansion
Workflow scope excludes leaver's department/OU	Medium	Workflow scope rules
Workflow disabled or in draft state	Medium	Workflow `isEnabled` property
Custom task script failed (returned non-zero)	Medium	Task run result
Licensing: required Microsoft Entra ID Governance licensing not in place for the users in workflow scope	Low	Tenant license audit

5.3 Diagnostic Procedure

Identify the specific workflow by name; record its id.
GQ-04: pull last 50 runs; filter to leaver's UPN.
If no run: confirm the leaver matches the workflow's scope and trigger (employeeLeaveDateTime in last N days + scope attribute match).
If run exists but no-op: inspect each task's status and processingResult. "Task completed, 0 objects" typically means the identity had no owned apps at run time — which may be correct if a prior run already transferred them.
For partial transfer: inspect the skipped agents' owner eligibility with Get-Agt36OwnerEligibility -CandidateUpn <manager>.

5.4 Resolution Steps

Trigger mismatch. Adjust trigger expression (portal or Graph PATCH), requires Entra Identity Governance Admin. File Control 2.3 change request.
Missing manager. Escalate to HR to populate; until populated, workflow cannot transfer. Interim: manual reassignment per §6 REASSIGN-FAIL procedures with AI Governance Lead approval.
Eligibility block. Transfer target (manager) lacks required license or environment membership. Either license them or pick an alternate owner (line-of-business agent steward).
Disabled workflow. Re-enable after root-cause of disabling is documented (often: emergency disable during incident).

5.5 Verification

GQ-04 shows a successful run for each recent leaver within the workflow's SLA (typically 2h post employeeLeaveDateTime).
Post-transfer, GQ-05 on each affected agent shows the new manager as owner.
E-04 (LCW run record) captured.
Entra audit log shows Update application owners events matching the workflow run correlation ID.

5.6 Cross-References

5.7 Examiner Artifact Preservation

Preserve the workflow definition JSON, run record, and resulting Entra audit events as an E-04 bundle. Manager-transfer is a critical JML control; examiners will expect end-to-end traceability from HR event → LCW run → ownership change → Entra audit.

§6 Pillar — REASSIGN-FAIL

Symptom. Set-AgentOwner or Set-AdminPowerAppOwner fails with errors such as "principal not licensed for environment", "not a member of environment", "SoD violation", or "insufficient privileges on target".

6.1 Symptom Catalog

Variant	Observable
License block	Error `UserLicenseNotFound` or `NotLicensedForEnvironment`
Environment membership block	Error `UserNotEnvironmentMember`
SoD block	Error `SegregationOfDutiesViolation` (operator and candidate overlap on incompatible roles)
Privilege block	Error `InsufficientPrivileges` on operator
Silent success-then-revert	Reassignment appears to succeed, then reverts within minutes (policy override)

6.2 Root Cause Matrix

Cause	Likelihood	Verification
Candidate owner lacks Power Apps per-user or per-app license	High	`Get-MgUserLicenseDetail -UserId <upn>`
Candidate not in the target environment's access list	High	Power Platform Admin Center → Environment → Access
SoD policy blocks cross-role reassignment (e.g., assigning ownership to a user with incompatible Entra role)	Medium	Control 2.8 SoD matrix
Operator is Global Reader — read-only	Medium	`Get-MgContext` scope enumeration
Conditional Access blocks the candidate (device compliance, MFA)	Low	Sign-in logs for candidate
Managed Environments policy enforces stricter owner rules	Low	Managed Environments settings

6.3 Diagnostic Procedure

Get-Agt36OwnerEligibility -CandidateUpn <upn> -AgentId <guid>. Read the verdict block — it enumerates every blocker.
If license block: confirm with Get-MgUserLicenseDetail. Provision license (requires License Admin) or pick a different candidate.
If environment block: add candidate to target environment with correct role.
If SoD block: consult Control 2.8 SoD matrix. Either change the candidate or route through an approved break-glass procedure.
If silent revert: inspect environment's Managed Environments policy for owner-enforcement rules; disable enforcement temporarily with change-request approval.

6.4 Resolution Steps

Licensing. Coordinate with License Admin to provision Power Apps license. Do not proxy the license through a service principal — ownership must resolve to a human or approved group.
Environment membership. Add candidate via Power Platform Admin Center → Environment → Access. Record the add event as E-03.
SoD. Select an alternate candidate in line with Control 2.8. If none exists, escalate to AI Governance Lead for documented exception with compensating control.
Privileges. Operator must hold Power Platform Admin or equivalent; re-authenticate with the correct principal.

6.5 Verification

Set-AgentOwner succeeds; exit code 0 with new ownerId in response.
GQ-05 on the agent shows the new owner.
Audit event AgentOwnerReassigned present in AgentGovernance_CL (KQL-02) with operator, justification, old/new owner.
E-05 captured.

6.6 Cross-References

6.7 Examiner Artifact Preservation

For each reassignment, preserve the full E-05 audit row including justification free-text. Examiners will sample reassignments and ask for the business rationale; the justification field is the primary evidence.

§7 Pillar — BULK-PARTIAL-FAIL

Symptom. A bulk reassignment or bulk terminal-delete job reports "47 of 52 succeeded" with no per-agent detail, or hangs partway through.

7.1 Symptom Catalog

Variant	Observable
Aggregate counter only	Log says "5 failed" with no agentId list
Hang	Job status `Running` for >4h with no progress events
Inconsistent retry	Some failures retried, others not — no deterministic policy
Correlation missing	No correlation ID tying bulk job to individual agent actions

7.2 Root Cause Matrix

Cause	Likelihood	Verification
Per-item error logging not emitted by job orchestrator	High	Inspect orchestrator source / logs
Throttling mid-batch (429) with silent drop	High	Response-header telemetry
Mixed-cause failures (some license, some SoD, some hold) hidden behind single counter	Medium	Per-agent replay
Orchestrator process restart mid-run	Low	Process uptime telemetry

7.3 Diagnostic Procedure

Pull the bulk job's correlation ID.
Query AgentGovernance_CL for all events with that correlation ID (KQL-02 variant).
For each agent in the input set missing a success event, replay the single-agent reassignment using Get-Agt36OwnerEligibility first, then Set-AgentOwner with the same parameters.
Record per-agent outcome into E-06.

7.4 Resolution Steps

Instrumentation. File Control 2.3 change request to add per-item logging (agentId, status, errorCode, errorMessage, durationMs) to the orchestrator. Until patched, the manual per-agent replay described in §7.3 is required.
Throttling. Implement batch sizing (e.g., 25 at a time) with backoff; respect Retry-After headers.
Mixed-cause surfacing. Aggregate log must include a failure-reason histogram, not just a count.
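
The batch sizing and failure-reason histogram described above could look roughly like this. Both function names are hypothetical. The per-item fields status and errorCode follow the logging schema proposed in the instrumentation item, and the literal status value 'Success' is an assumption.

```powershell
# Hypothetical helper: splits an input manifest into fixed-size batches.
function Split-IntoBatch {
    param(
        [object[]] $Item = @(),
        [int] $BatchSize = 25
    )
    if ($BatchSize -lt 1) { throw 'BatchSize must be at least 1.' }
    for ($i = 0; $i -lt $Item.Count; $i += $BatchSize) {
        $end = [math]::Min($i + $BatchSize, $Item.Count) - 1
        # The leading comma emits each batch as a single array object.
        , @($Item[$i..$end])
    }
}

# Hypothetical helper: counts failures per error code, most frequent first.
function Get-FailureReasonHistogram {
    param([object[]] $Result = @())
    $Result |
        Where-Object { $_.status -ne 'Success' } |
        Group-Object -Property errorCode |
        Sort-Object -Property Count -Descending |
        Select-Object @{ Name = 'ErrorCode'; Expression = { $_.Name } }, Count
}
```

A 52-agent manifest splits into batches of 25, 25, and 2. Pausing between batches and honoring Retry-After is left to the caller.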

7.5 Verification

Every agent in the input set has exactly one terminal event (success or classified failure) in AgentGovernance_CL.
E-06 shows per-agent outcome with correlation ID traceable end-to-end.
Re-run of the same input set converges to steady state within 2 attempts.
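
The "exactly one terminal event per agent" check above could be sketched as below. The function name is hypothetical, and the sketch assumes the terminal events (success or classified failure) have already been exported from AgentGovernance_CL for the job's correlation ID, each carrying an agentId property.

```powershell
# Hypothetical check: every input agent has exactly one terminal event.
function Test-BulkJobCompleteness {
    param(
        [string[]] $InputAgentId  = @(),
        [object[]] $TerminalEvent = @()
    )
    # Count terminal events per agent (keys lowercased for stable matching).
    $counts = @{}
    foreach ($e in $TerminalEvent) {
        $key = "$($e.agentId)".ToLowerInvariant()
        $counts[$key] = 1 + [int]$counts[$key]
    }
    $missing    = @($InputAgentId | Where-Object { -not $counts.ContainsKey($_.ToLowerInvariant()) })
    $duplicated = @($InputAgentId | Where-Object { [int]$counts[$_.ToLowerInvariant()] -gt 1 })
    [pscustomobject]@{
        Missing    = $missing
        Duplicated = $duplicated
        Complete   = ($missing.Count -eq 0 -and $duplicated.Count -eq 0)
    }
}
```

Agents listed under Missing are the candidates for single-agent replay. Agents under Duplicated indicate that a retry emitted a second terminal event and need manual review before E-06 is finalized.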

7.6 Cross-References

§6 REASSIGN-FAIL
§8 TERMINAL-DELETE-BLOCKED
Control 2.3 Change Management

7.7 Examiner Artifact Preservation

A bulk job is examination-sensitive because it represents a population-level control action. Preserve the input manifest (list of agent IDs), the per-item outcome ledger (E-06), and the aggregate summary. Examiners will reconcile input vs. outcome counts line-by-line.

§8 Pillar — TERMINAL-DELETE-BLOCKED

Symptom. Remove-OrphanedAgent returns PolicyHold, LabelRetained, LitigationHoldActive, or eDiscoveryCaseMember and refuses to delete.

8.1 Symptom Catalog

Variant	Observable
Purview retention label binding	Label with retention rule + record designation prevents delete
Litigation hold on owner mailbox	Owner's mailbox on hold; associated artifacts preserved
eDiscovery case custodian	Agent listed as custodian-adjacent in active case
Managed Environments data-loss policy	Environment-level policy requires manual dual approval

8.2 Root Cause Matrix

This is by design — terminal delete must never silently override a legal preservation obligation. The troubleshooting task is to confirm the block is valid, route to the correct preservation workflow, and advance the remediation clock correctly.

Cause	Likelihood	Action
Purview retention label (record)	High	Confirm label; route to Records Management workflow
Litigation hold	High	Coordinate with Legal Hold Coordinator
eDiscovery custodian	Medium	Coordinate with Case Manager
Stale hold not released after matter close	Medium	Request hold release from Legal
Label misapplied (false positive)	Low	File label-correction request with Purview Compliance Admin

8.3 Diagnostic Procedure

Run Get-Agt36RetentionBindings -AgentId <guid> to enumerate every label and hold binding, with custodian UPNs.
For each binding, open the corresponding case/hold in Purview Compliance and confirm status.
If any binding is active, the agent must not be deleted. It may be (a) retained in place with ownership reassigned to a records steward, or (b) soft-deleted with the retention binding preserved — whichever the Records Management policy specifies.
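
The gating rule in the last step can be sketched as follows. The binding object shape (bindingType, status) and the 'Active' status value are assumptions; the real Get-Agt36RetentionBindings output may differ.

```powershell
# Sketch only. The bindingType/status properties and the 'Active' value are
# assumed; adapt to the actual Get-Agt36RetentionBindings output.

function Get-TerminalDeleteDecision {
    param([object[]]$Bindings)

    # Any single active binding is enough to block deletion.
    $active = @($Bindings | Where-Object { $_.status -eq 'Active' })

    if ($active.Count -gt 0) {
        return [pscustomobject]@{
            DeleteAllowed  = $false
            RegisterStatus = 'Retained-Under-Hold'
            BlockingTypes  = @($active | ForEach-Object { $_.bindingType })
        }
    }
    [pscustomobject]@{
        DeleteAllowed  = $true
        RegisterStatus = $null
        BlockingTypes  = @()
    }
}

# Hypothetical bindings: one released label, one active litigation hold.
$bindings = @(
    [pscustomobject]@{ bindingType = 'RetentionLabel'; status = 'Released' }
    [pscustomobject]@{ bindingType = 'LitigationHold'; status = 'Active' }
)
Get-TerminalDeleteDecision -Bindings $bindings
```

An empty binding list yields DeleteAllowed = $true here, so the sketch presumes the enumeration itself succeeded; a failed enumeration should be treated as a block, not as "no bindings".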

8.4 Resolution Steps

Active hold, valid. Reassign ownership to records steward or designated custodian. Document the decision and file in the orphan register with status Retained-Under-Hold. The remediation SLA clock stops at the point of hold confirmation per Control 3.6 §Evidence & Retention.
Stale hold. Request hold release from Legal Hold Coordinator; attach evidence that the matter is closed. Re-run terminal delete after release is confirmed in Purview.
Misapplied label. Purview Compliance Admin corrects label assignment; retain full audit trail of the correction.

8.5 Verification

For each agent blocked, there is a documented decision: Retained-Under-Hold, Released-Then-Deleted, or Label-Corrected-Then-Deleted.
E-07 (retention binding report) captured with timestamp and operator.
No terminal delete occurred against an agent with an active legal preservation obligation.

8.6 Cross-References

8.7 Examiner Artifact Preservation

This pillar is itself an evidence-preservation pillar. Every blocked delete produces an artifact (E-07) that becomes part of the agent's lifecycle record. Examiners view intact preservation as a positive control signal.

§9 Pillar — RECONCILE-DRIFT

Symptom. The 3-source reconciliation (Power Platform inventory + Graph + Agent 365) shows >5% drift run-to-run with no corresponding change events, indicating reconciliation logic instability rather than real change.

9.1 Symptom Catalog

Variant	Observable
High-frequency oscillation	Drift bounces between +5% and -5% every run
One-sided drift	One source systematically higher than the others
Hash mismatch with zero row-level diff	Hashes differ but row-level comparison shows identical content

9.2 Root Cause Matrix

Cause	Likelihood	Verification
Eventual-consistency window (Graph and Power Platform propagate at different rates)	High	Stagger run timestamps 15 min apart; drift should converge
Soft-deleted agents counted differently	High	Include/exclude soft-deleted uniformly
Key normalization differs (appId vs. objectId vs. agentId)	Medium	Inspect join keys in reconciliation code
Sort order differences break hash even when content identical	Medium	Canonicalize order before hashing

9.3 Diagnostic Procedure

Run Get-Agt36ReconcileState -Tenant <guid>. Capture the delta report.
For each row in the delta: confirm whether it is present in all 3 sources, or absent from one.
Compute an order-independent hash on each source and compare.
Stagger a second run 30 minutes later and compare — persistent drift is structural; transient drift is propagation lag.
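
Step 2 above (present in all three sources, or absent from one) can be sketched as a per-identifier presence check. This assumes each source has already been exported and normalized to a list of agentId strings; the function name is hypothetical.

```powershell
# Sketch only. Assumes each source export is already reduced to a list of
# normalized agentId strings.

function Get-SourcePresenceGap {
    param(
        [string[]]$PowerPlatform,
        [string[]]$Graph,
        [string[]]$Agent365
    )
    # Union of every identifier seen in any source.
    $all = @($PowerPlatform + $Graph + $Agent365 | Sort-Object -Unique)

    foreach ($id in $all) {
        $missing = @()
        if ($PowerPlatform -notcontains $id) { $missing += 'PowerPlatform' }
        if ($Graph         -notcontains $id) { $missing += 'Graph' }
        if ($Agent365      -notcontains $id) { $missing += 'Agent365' }
        # Emit only identifiers that at least one source lacks.
        if ($missing.Count -gt 0) {
            [pscustomobject]@{ agentId = $id; MissingFrom = ($missing -join ',') }
        }
    }
}

# Hypothetical exports.
$pp   = 'a1', 'a2', 'a3'
$gr   = 'a1', 'a2'
$a365 = 'a1', 'a3', 'a4'
Get-SourcePresenceGap -PowerPlatform $pp -Graph $gr -Agent365 $a365
```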

9.4 Resolution Steps

Propagation lag. Document the eventual-consistency window; raise the drift-alarm threshold or debounce with a 60-minute moving average.
Normalization. Align join keys; canonicalize to agentId (the governance identifier). File Control 2.3 change request.
Sort-independence. Replace sequence-dependent hash with sorted-merge hash.
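
The debounce option under Propagation lag can be sketched as a windowed mean over recent drift samples. The sample shape (TimeUtc, DriftPct), the function name, and the use of the signed mean are assumptions made for illustration.

```powershell
# Sketch only. Sample shape and function name are assumptions. Drift is kept
# signed so that +/- oscillation averages out instead of alarming.

function Test-DriftAlarm {
    param(
        [object[]]$Samples,            # objects with TimeUtc and DriftPct
        [datetime]$NowUtc,
        [double]$ThresholdPct = 5,
        [int]$WindowMinutes = 60
    )
    $cutoff = $NowUtc.AddMinutes(-$WindowMinutes)
    $window = @($Samples | Where-Object { $_.TimeUtc -gt $cutoff -and $_.TimeUtc -le $NowUtc })
    if ($window.Count -eq 0) { return $false }

    $mean = ($window | Measure-Object -Property DriftPct -Average).Average
    return ([math]::Abs($mean) -gt $ThresholdPct)
}

$now = [datetime]::new(2026, 1, 1, 12, 0, 0, [DateTimeKind]::Utc)

# Oscillating drift: individual runs exceed 5%, the 60-minute mean does not.
$oscillating = @(
    [pscustomobject]@{ TimeUtc = $now.AddMinutes(-45); DriftPct =  6 }
    [pscustomobject]@{ TimeUtc = $now.AddMinutes(-30); DriftPct = -6 }
    [pscustomobject]@{ TimeUtc = $now.AddMinutes(-15); DriftPct =  6 }
    [pscustomobject]@{ TimeUtc = $now;                 DriftPct = -6 }
)
Test-DriftAlarm -Samples $oscillating -NowUtc $now
```

One-sided drift (for example 7, 8, 6, 7) keeps a mean above the threshold and still alarms, which is the intended distinction between propagation noise and a structural gap.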

9.5 Verification

KQL-05 shows drift < 5% for 3 consecutive runs.
Hash computation is order-independent (test: shuffle input, re-hash, compare).
Reconciliation narrative in E-09 explains any residual expected drift.
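
The shuffle test in the second check can be sketched as below. This is one possible canonicalization (ordinal sort, then SHA-256 over the joined rows), not necessarily the one the reconciliation code uses; the function name is hypothetical, and rows are assumed to be pre-serialized strings without embedded newlines.

```powershell
# Sketch only. Assumes each row is already serialized to a single-line string.

function Get-OrderIndependentHash {
    param([string[]]$Rows)

    # Canonicalize: ordinal (culture-independent) sort before hashing.
    $list = [System.Collections.Generic.List[string]]::new()
    foreach ($r in $Rows) { $list.Add($r) }
    $list.Sort([StringComparer]::Ordinal)

    $bytes = [Text.Encoding]::UTF8.GetBytes(($list.ToArray() -join "`n"))
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try     { $digest = $sha.ComputeHash($bytes) }
    finally { $sha.Dispose() }

    return ([BitConverter]::ToString($digest) -replace '-', '')
}

# Shuffle test: same content, different order, same hash.
$rows     = 'a1|owner1', 'a2|owner2', 'a3|owner3'
$shuffled = @($rows | Sort-Object { Get-Random })
(Get-OrderIndependentHash -Rows $rows) -eq (Get-OrderIndependentHash -Rows $shuffled)
```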

9.6 Cross-References

9.7 Examiner Artifact Preservation

Preserve both the raw source exports and the reconciliation delta. Examiners may request "show me that your three inventory sources agree" and the prepared reconciliation artifact is the direct response.

§10 Pillar — FALSE-POSITIVE-DL-OWNER

Symptom. An agent owned by a Distribution List, Microsoft 365 Group, or Security Group is flagged as "orphan" because the detection cmdlet resolved ownership to "no user principal".

10.1 Symptom Catalog

Variant	Observable
DL owner	Owner is a mail-enabled distribution list
Security group owner	Owner is a security group with members
Dynamic group	Owner is a dynamic M365 group, membership resolves at runtime
Nested group	Owner is a group whose only member is another group

10.2 Root Cause Matrix

Cause	Likelihood	Verification
Detection cmdlet does not expand group membership to check for human members	High	Inspect cmdlet source
Group membership resolves to zero users (empty DL) — genuine orphan-equivalent	Medium	Expand membership at detection time
Nested group depth exceeds cmdlet's expansion limit	Low	Increase expansion depth (cap at 3 levels per governance policy)

10.3 Diagnostic Procedure

For each flagged agent, run GQ-05 and inspect owner @odata.type.
If type is #microsoft.graph.group: expand membership with GET /groups/{id}/transitiveMembers.
If expanded membership contains ≥1 active user, this is a false positive — the agent is owned by a group with valid human members.
If expanded membership is empty or contains only disabled users, this is a genuine orphan-equivalent and should remain in the register with a category-specific flag.
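
The classification in the last two steps can be sketched as a pure function over the transitiveMembers response. The function name is hypothetical, and the sketch assumes accountEnabled was requested explicitly (for example via $select), since it may not be among the properties Graph returns by default.

```powershell
# Sketch only. Operates on already-retrieved transitiveMembers items; assumes
# each item carries '@odata.type' and, for users, accountEnabled.

function Get-GroupOwnerClassification {
    param([object[]]$TransitiveMembers)

    $activeUsers = @($TransitiveMembers | Where-Object {
        $_.'@odata.type' -eq '#microsoft.graph.user' -and $_.accountEnabled -eq $true
    })

    if ($activeUsers.Count -ge 1) { 'false-positive' }
    else                          { 'ownerless-group-resolved' }
}

# Hypothetical expansion: a nested group plus one disabled and one active user.
$members = @(
    [pscustomobject]@{ '@odata.type' = '#microsoft.graph.group'; id = 'g2' }
    [pscustomobject]@{ '@odata.type' = '#microsoft.graph.user';  id = 'u1'; accountEnabled = $false }
    [pscustomobject]@{ '@odata.type' = '#microsoft.graph.user';  id = 'u2'; accountEnabled = $true }
)
Get-GroupOwnerClassification -TransitiveMembers $members
```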

10.4 Resolution Steps

False positive. File Control 2.3 change request to enhance the detection cmdlet to expand group membership (up to 3 levels) and classify as orphan only if transitive membership has zero active users.
Empty DL. Retain in orphan register under new category "ownerless-group-resolved". Apply same remediation workflow as a direct ownerless agent.
Temporary workaround. Maintain a curated allowlist of group owners known to be valid (e.g., agent-stewards@org), with quarterly recertification.

10.5 Verification

Flagged DL/group-owned agents with valid transitive human members no longer appear in the orphan register.
Empty DL/group-owned agents appear with the correct category flag.
Allowlist recertified in the last quarter (if used as interim).

10.6 Cross-References

10.7 Examiner Artifact Preservation

Document the false-positive analysis as part of the monthly governance review. Examiners appreciate explicit false-positive handling because it demonstrates control calibration maturity.

§11 Pillar — SIEM-NO-EVENTS

Symptom. Sentinel (or other SIEM) shows no AgentOrphanDetectionRun, AgentOwnerReassigned, or AgentTerminalDelete events for more than 24 hours, despite operational activity.

11.1 Symptom Catalog

Variant	Observable
Total silence	KQL-01, KQL-02, KQL-03 all empty for 24h+
Partial silence	Detection events arrive; reassignment events do not
Connector unhealthy	Data Connector status in Sentinel shows "Degraded" or "Disconnected"
Table schema drift	Events arrive but in a different table or with renamed columns

11.2 Root Cause Matrix

Cause	Likelihood	Verification
Data Collection Rule (DCR) misconfigured or disabled	High	Azure Monitor → DCR inventory
Log Analytics workspace key rotated without updating sender	High	Sender configuration vs. current workspace key
Sender identity lost `Log Analytics Contributor` role	Medium	RBAC audit
Network egress blocked (private endpoint / NSG change)	Medium	Connectivity test from sender
Ingestion cap hit (daily cap reached)	Medium	Log Analytics workspace usage
Table name change (Custom Logs v2 migration)	Low	Schema audit

11.3 Diagnostic Procedure

KQL-06 (Connector self-test ping). If self-test is silent, the sender pipeline is down.
Azure Portal → Log Analytics workspace → Usage and estimated costs → check daily cap and ingestion.
From the sender host, test connectivity to the ingestion endpoint (public or private).
Verify sender identity's role assignments against the workspace.

11.4 Resolution Steps

DCR disabled. Re-enable; verify with KQL-06 self-test within 15 minutes.
Key rotated. Update sender; restart sender service; confirm with KQL-06.
RBAC lost. Re-assign Log Analytics Contributor or Monitoring Metrics Publisher to sender identity.
Network. Diagnose NSG/private endpoint; file network change request.
Ingestion cap. Coordinate with platform team on cap increase or workload rebalancing.

11.5 Verification

KQL-06 returns a ping within the last 10 minutes.
KQL-01, KQL-02 return events consistent with current operational tempo.
Sentinel Data Connector status is Healthy.
E-02 recoverable for the silent window (from sender-side local buffer if available, or post-incident manual replay).

11.6 Cross-References

§13 Runbook RB-02 — Examiner Pulls Register with SIEM Gaps
Control 3.1 Agent Inventory

11.7 Examiner Artifact Preservation

During SIEM silence, the sender host's local buffer (if implemented) is often the only record. Preserve the local buffer until events are confirmed ingested. Document the silent window with start/end UTC timestamps in E-09 for examiner transparency.
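
To document the silent window with start and end UTC timestamps, one approach is to scan the ingested event times for gaps longer than the 24-hour symptom threshold. This is a hedged sketch: the function name is hypothetical, and the timestamps are assumed to have been exported from the SIEM beforehand.

```powershell
# Sketch only. Works on a plain list of UTC event timestamps.

function Get-SilentWindow {
    param(
        [datetime[]]$EventTimesUtc,
        [double]$MinGapHours = 24
    )
    $sorted = @($EventTimesUtc | Sort-Object)
    for ($i = 1; $i -lt $sorted.Count; $i++) {
        $gap = $sorted[$i] - $sorted[$i - 1]
        if ($gap.TotalHours -gt $MinGapHours) {
            [pscustomobject]@{
                StartUtc = $sorted[$i - 1]   # last event before the silence
                EndUtc   = $sorted[$i]       # first event after the silence
                GapHours = [math]::Round($gap.TotalHours, 1)
            }
        }
    }
}

# Hypothetical event times with one 42-hour gap.
$utc = [DateTimeKind]::Utc
$times = @(
    [datetime]::new(2026, 1, 1, 0, 0, 0, $utc)
    [datetime]::new(2026, 1, 1, 6, 0, 0, $utc)
    [datetime]::new(2026, 1, 3, 0, 0, 0, $utc)
    [datetime]::new(2026, 1, 3, 1, 0, 0, $utc)
)
Get-SilentWindow -EventTimesUtc $times
```

The sketch only finds gaps between observed events; silence that runs up to the present moment needs a separate comparison of the latest event against the current time.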

§12 Runbook RB-01 — M&A / RIF Mass Departure

Trigger. M&A divestiture, reduction in force, or mass reorganization generates >100 leaver events in a <72-hour window, potentially orphaning hundreds of agents.

12.1 Pre-Conditions

HR has confirmed the population cutoff list (final leaver UPNs).
Legal has confirmed whether any leaver is subject to litigation hold.
AI Governance Lead has briefed Control Owners on the event window.
Baseline orphan register (E-01) captured immediately pre-event.

12.2 Procedure

T-24h: Snapshot current orphan register and agent inventory (E-01, E-03). Hash and file.
T-0: HR event propagates. Monitor GQ-03 coverage; expect employeeLeaveDateTime to be populated within 4h.
T+4h: Run Get-Agt36SponsorImpact across the full leaver list. Produce an impact report grouped by Zone with total affected agent count.
T+8h: Entra Lifecycle Workflows should have fired. Verify via GQ-04 batch query. Throttling is likely — if <80% processed at T+8h, escalate to parallel-batch manual reassignment.
T+24h: For agents still orphaned post-LCW, initiate batched Set-AgentOwner in groups of 25 with 5-minute pauses. Capture per-agent outcomes (E-05, E-06).
T+72h: Run full reconciliation (Get-Agt36ReconcileState). Zero Zone 3 agents should remain orphaned.
T+7d: Produce RIF remediation summary memo (E-10).
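
The T+24h batching step (groups of 25 with 5-minute pauses) can be planned as a dry run before any Set-AgentOwner call is made. The function name is hypothetical, and the offsets count only the pauses, not the time each batch takes to execute.

```powershell
# Sketch only. Produces a plan; it does not call Set-AgentOwner.

function Get-BatchPlan {
    param(
        [string[]]$AgentIds,
        [int]$BatchSize = 25,
        [int]$PauseMinutes = 5
    )
    $batchNo = 0
    for ($i = 0; $i -lt $AgentIds.Count; $i += $BatchSize) {
        $end = [math]::Min($i + $BatchSize, $AgentIds.Count) - 1
        [pscustomobject]@{
            Batch              = $batchNo + 1
            StartOffsetMinutes = $batchNo * $PauseMinutes   # pauses only
            AgentIds           = @($AgentIds[$i..$end])
        }
        $batchNo++
    }
}

# Hypothetical population of 60 still-orphaned agents.
$ids  = [string[]](1..60 | ForEach-Object { "agent-$_" })
$plan = @(Get-BatchPlan -AgentIds $ids)
$plan | Select-Object Batch, StartOffsetMinutes, @{ Name = 'Size'; Expression = { $_.AgentIds.Count } }
```

For 60 agents this yields three batches (25, 25, 10) starting at offsets of 0, 5, and 10 minutes, which gives a lower bound on wall-clock time before the job starts.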

12.3 Evidence Bundle

E-01 (T-24h baseline), E-03 (Graph queries at multiple checkpoints), E-04 (LCW runs), E-05 (reassignments), E-06 (bulk job outcomes), E-09 (incident timeline), E-10 (summary memo).

12.4 Post-Incident Actions

Tune LCW throttle configuration based on observed throughput.
Document any novel orphan patterns as new detection categories or false-positive rules.
Conduct lessons-learned with AI Governance Lead; update this runbook.

12.5 Cross-References

§5 SPONSOR-TRANSFER-FAIL
§7 BULK-PARTIAL-FAIL
Control 2.26 Entra Agent ID

§13 Runbook RB-02 — Examiner Pulls Register with SIEM Gaps

Trigger. Examiner or internal audit requests the orphan register "as of" a prior date, but SIEM ingestion had silent windows during that period.

13.1 Pre-Conditions

Examiner request scope documented (as-of date, agent population, detail level).
Legal Hold Coordinator engaged if the examination is formal.
WORM storage access available (read-only with audit) to the AI Governance Lead.

13.2 Procedure

T+0: Acknowledge request; confirm scope in writing with examiner and AI Governance Lead.
T+1h: Retrieve WORM-stored orphan register snapshot (E-01) closest to the examiner's as-of date. Verify hash.
T+2h: Cross-check against detection-run logs (E-02). Identify any silent windows from §11 SIEM-NO-EVENTS post-incident records.
T+4h: For each silent window, retrieve sender-side local buffer or re-run Get-Agt36Orphan -AsOf <datetime> against Graph / Power Platform historical data where available. Document explicitly which sources were usable for which time ranges.
T+24h: Package register, supporting evidence, and silent-window narrative for examiner. Include explicit statement of known gaps — do not paper over gaps; examiners view candor favorably.
T+48h: AI Governance Lead review and dual-signature on the examiner deliverable.

13.3 Evidence Bundle

E-01, E-02, E-03 (historical Graph query replays), E-09, E-10 (examiner response memo with silent-window narrative).

13.4 Post-Incident Actions

Remediate any SIEM gaps identified during the silent-window narrative.
Implement sender-side durable buffering if not already in place.
Update the retention policy on WORM storage if the examiner-relevant window is older than the current retention.

13.5 Cross-References

§11 SIEM-NO-EVENTS
Control 3.5 Cost Allocation & Budget Tracking (for examiner cost-impact questions)

§14 Runbook RB-03 — Sponsor Termination Cascade

Trigger. A single high-privilege sponsor's termination triggers orphaning of 10+ Zone 3 agents, including cross-business-unit agents.

14.1 Pre-Conditions

HR confirmed termination with effective date.
Sponsor's manager attribute is populated in Entra.
Business-unit agent stewards identified for each affected Zone 3 agent.

14.2 Procedure

T-24h (if planned): Run Get-Agt36SponsorImpact -LeaverUpn <sponsor>. Distribute to business-unit stewards for reassignment pre-planning.
T-0: Termination effective. employeeLeaveDateTime populated.
T+2h: LCW fires. Verify GQ-04. Manager-transfer default covers agents where business-unit steward agrees.
T+4h: For agents requiring alternate owner (not manager), execute Set-AgentOwner with documented business-steward approval (E-05).
T+24h: Zero orphans in sponsor's impact set. Final reconciliation and evidence bundle.

14.3 Evidence Bundle

E-03 impact report, E-04 LCW runs, E-05 per-agent reassignments with business-unit approvals, E-09 timeline.

14.4 Post-Incident Actions

Review whether the sponsor held single-owner status on any agent (SoD risk pattern); add co-owner policy for Zone 3 agents with privileged sponsors.

14.5 Cross-References

§5 SPONSOR-TRANSFER-FAIL
Control 2.8 SoD

§15 Runbook RB-04 — Bulk Reassign Over-Applied

Trigger. A bulk reassignment job transferred ownership of agents that were not intended to be in scope, due to an over-broad input filter.

15.1 Pre-Conditions

The over-application has been detected (typically: unexpected ownership-change events in KQL-02).
Input manifest preserved.
Legal notified if any over-applied agent is examination-adjacent.

15.2 Procedure

T+0: Freeze further bulk jobs. Snapshot current state (E-03, E-05).
T+1h: Reconstruct the intended input set vs. the actual processed set. Identify the over-applied subset.
T+2h: For each over-applied agent, retrieve the pre-change owner from E-05 audit rows. Re-run Set-AgentOwner to restore the original owner with justification "Reversal of RB-04 incident ".
T+8h: Full reconciliation. Confirm all over-applied agents are restored.
T+24h: Root cause of filter over-broadness documented. Input-validation enhancement filed under Control 2.3.
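
The T+1h and T+2h steps (identify the over-applied subset, then look up each pre-change owner) can be sketched as one pass over the E-05 audit rows. The row shape (agentId, previousOwner, newOwner) and the function name are assumptions; the sketch produces a reversal plan only and does not call Set-AgentOwner.

```powershell
# Sketch only. Audit row property names are assumed, not taken from E-05.

function Get-ReversalPlan {
    param(
        [string[]]$IntendedIds,   # agents that should have been in scope
        [object[]]$AuditRows      # one row per ownership change made by the job
    )
    $AuditRows |
        Where-Object { $IntendedIds -notcontains $_.agentId } |   # over-applied subset
        ForEach-Object {
            [pscustomobject]@{
                agentId      = $_.agentId
                CurrentOwner = $_.newOwner
                RestoreOwner = $_.previousOwner
            }
        }
}

# Hypothetical job: a1 and a2 were intended; a3 and a4 were swept in by the filter.
$intended = 'a1', 'a2'
$audit = @(
    [pscustomobject]@{ agentId = 'a1'; previousOwner = 'u1'; newOwner = 'u9' }
    [pscustomobject]@{ agentId = 'a2'; previousOwner = 'u2'; newOwner = 'u9' }
    [pscustomobject]@{ agentId = 'a3'; previousOwner = 'u3'; newOwner = 'u9' }
    [pscustomobject]@{ agentId = 'a4'; previousOwner = 'u4'; newOwner = 'u9' }
)
Get-ReversalPlan -IntendedIds $intended -AuditRows $audit
```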

15.3 Evidence Bundle

E-05 (both original and reversal rows), E-06 (original bulk job), E-09, E-10.

15.4 Post-Incident Actions

Add pre-flight dry-run with diff preview to the bulk orchestrator.
Require dual approval on any bulk job exceeding 50 agents.

15.5 Cross-References

§16 Runbook RB-05 — SOX Zero-Orphan Attestation Failure

Trigger. Quarterly SOX attestation for financially significant systems (Zone 3, tagged SOX-ICFR-Relevant) shows >0 orphaned agents past remediation SLA.

16.1 Pre-Conditions

SOX attestation window is active (typically last 2 weeks of quarter).
Control Owner and SOX Control Tester identified.
Audit committee reporting timeline known.

16.2 Procedure

T+0: SOX Tester flags orphan count > 0. AI Governance Lead acknowledges.
T+1h: For each orphan, produce SLA clock audit — start time (detection), current time, breach magnitude.
T+2h: Classify each breach: genuine remediation miss, TERMINAL-DELETE-BLOCKED (valid preservation), or data-quality artifact.
T+24h: Remediate the genuine misses. For hold-blocked items, ensure the register correctly reflects Retained-Under-Hold status per §8.4.
T+3d: Control Owner and AI Governance Lead sign a deficiency memo if any genuine miss exceeded materiality thresholds; route through Internal Audit per SOX §404 process.
T+7d: Remediation evidence (E-05, E-07 as applicable, E-10) bundled for Internal Audit.
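
The T+1h SLA clock audit (start time, current time, breach magnitude) can be sketched as below. The SLA length is passed in as a parameter because this section does not state it; the function name and the DetectedUtc property are assumptions, and the sketch does not model the clock stop for Retained-Under-Hold items.

```powershell
# Sketch only. SlaDays must come from the control specification; hold-related
# clock stops are not modeled here.

function Get-SlaBreach {
    param(
        [object[]]$Orphans,     # objects with agentId and DetectedUtc
        [datetime]$NowUtc,
        [int]$SlaDays
    )
    foreach ($o in $Orphans) {
        $deadline = $o.DetectedUtc.AddDays($SlaDays)
        $over     = $NowUtc - $deadline
        [pscustomobject]@{
            agentId     = $o.agentId
            DetectedUtc = $o.DetectedUtc
            DeadlineUtc = $deadline
            Breached    = ($over.TotalDays -gt 0)
            BreachDays  = [math]::Max(0.0, [math]::Round($over.TotalDays, 1))
        }
    }
}

$utc = [DateTimeKind]::Utc
$now = [datetime]::new(2026, 2, 10, 0, 0, 0, $utc)
$orphans = @(
    [pscustomobject]@{ agentId = 'a1'; DetectedUtc = [datetime]::new(2026, 1, 1, 0, 0, 0, $utc) }
    [pscustomobject]@{ agentId = 'a2'; DetectedUtc = [datetime]::new(2026, 2, 1, 0, 0, 0, $utc) }
)
# Illustrative 30-day SLA: a1 is 10 days past deadline, a2 is still inside it.
Get-SlaBreach -Orphans $orphans -NowUtc $now -SlaDays 30
```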

16.3 Evidence Bundle

E-01 at start of window, E-01 at attestation time, E-05 remediations, E-07 hold records, E-09 timeline, E-10 deficiency memo (if applicable).

16.4 Post-Incident Actions

Tune detection SLA tiering if a recurring category is slipping.
Review whether Zone 3 SOX-tagged agents require stricter owner policy (e.g., mandatory co-owner).

16.5 Cross-References

§17 Runbook RB-06 — Litigation-Hold Deletion Blocked

Trigger. An agent is past terminal-deletion SLA but Remove-OrphanedAgent is blocked by an active litigation hold or eDiscovery case binding.

17.1 Pre-Conditions

Legal Hold Coordinator identified.
eDiscovery Case Manager identified (if applicable).
Get-Agt36RetentionBindings output available (E-07).

17.2 Procedure

T+0: Confirm the block is valid via Purview Compliance. This is not an error to work around — it is a legal obligation.
T+1h: Route the agent to Retained-Under-Hold status in the orphan register. Reassign ownership to the designated records steward instead of deleting the agent. The remediation SLA clock stops at this point per Control 3.6 §Evidence & Retention.
T+24h: Document in the register the hold ID, custodian, and expected matter-close date.
On hold release (date varies): Legal Hold Coordinator notifies AI Governance Lead. Re-run Remove-OrphanedAgent. Capture E-05 and E-07, and file the Legal release notice as E-10.

17.3 Evidence Bundle

E-07 (binding at start, release at end), E-05 (final deletion), Legal release notice, E-09 end-to-end timeline (often spanning months or years).

17.4 Post-Incident Actions

Review the agent's lifecycle record for completeness; the full retained-through-hold narrative is examination-grade.
If a recurring pattern emerges (same custodian, multiple holds), coordinate with Legal on whether a more structured custodian-of-record role is appropriate.

17.5 Cross-References

§18 Escalation Matrix

Severity	Primary Responder	Escalation L1	Escalation L2	External
SEV-1	AI Administrator on-call	AI Governance Lead	CISO delegate	Legal, examiner liaison
SEV-2	AI Administrator	Control Owner	AI Governance Lead	—
SEV-3	AI Administrator	Control Owner	—	—
SEV-4	AI Administrator	—	—	—

Cross-functional contacts (always reachable via established on-call rotation):

Entra Identity Governance Admin — Lifecycle Workflows, HR attribute provisioning (§4, §5).
Power Platform Admin — environment membership, DLP policy, bulk job orchestration (§6, §7).
Purview Compliance Admin — retention labels, holds, eDiscovery (§8, RB-06).
Legal Hold Coordinator — litigation holds, examiner artifact preservation, RB-02, RB-06.
AI Governance Lead — policy interpretation, dual-signature on SEV-1/2 evidence bundles.

Notifications are dispatched via the firm's ticketing system with SLA clocks per the severity matrix. The AI Governance Lead carries authority to waive non-regulatory SLAs in the case of a compensating incident; such waivers are recorded in E-10.

§19 Evidence Collection During Incident

Evidence collection is continuous and non-destructive. The following conventions apply to every Control 3.6 incident:

Capture before you act. E-01 and E-02 must be captured prior to any remediation step, with their hashes recorded in the incident ticket.
Hash every artifact. SHA-256 at minimum; include filename, size, and UTC capture time.
Chain of custody. Every handoff of evidence (operator → analyst → Legal) records UPN and UTC timestamp.
Immutable storage. All evidence artifacts ultimately land in WORM storage with FSI-Records-6yr-WORM retention or stricter. Examination-flagged artifacts escalate to FSI-Examiner-Preserve-Indefinite.
Never edit a primary artifact. Derivations (e.g., filtered views for examiner response) are separate files that reference the primary.
Cross-file with Legal whenever examiner, subpoena, or litigation hold touches the incident.
Complete the evidence bundle before closing the ticket. E-09 (timeline) and E-10 (root-cause memo) are mandatory for SEV-1 and SEV-2.

19.1 Standard Evidence Capture Skeleton

# Capture orphan register snapshot (E-01).
# Assumes $TicketId and $EvidenceRoot are set by the caller before this runs.
$ts    = (Get-Date).ToUniversalTime().ToString("yyyyMMddTHHmmssZ")
$stem  = "INC-$($TicketId)-$ts"
$regP  = "$EvidenceRoot\$stem-E01-orphan-register.csv"

# Export the register for Zones 3, 2, and 1.
Get-Agt36Orphan -Zone 3,2,1 -IncludeShadow |
    Export-Csv -Path $regP -NoTypeInformation -Encoding UTF8

# Hash the exported file and record it in the evidence manifest.
$hash  = (Get-FileHash -Path $regP -Algorithm SHA256).Hash
Write-EvidenceManifest -TicketId $TicketId -ArtifactId E-01 `
    -Path $regP -Sha256 $hash -CapturedBy (whoami) -CapturedAtUtc $ts

The Write-EvidenceManifest helper writes a signed JSON manifest to the same WORM storage location, keyed by ticket ID. See powershell-setup.md for the full helper definition.

§20 Cross-References

20.1 Control Cross-References

Control 3.6 — Orphaned Agent Detection and Remediation — source-of-truth control specification.
Control 1.2 — Agent Registry and Integrated Apps Management — upstream inventory.
Control 2.25 — Agent 365 Admin Center Governance Console — Ownerless Agents card.
Control 2.26 — Entra Agent ID Identity Governance — Lifecycle Workflows, HR signal.
Control 2.3 — Change Management and Release Planning — for all detection/orchestrator changes.
Control 2.8 — Access Control and Segregation of Duties — reassignment SoD rules.
Control 3.1 — Agent Inventory and Metadata Management — inventory reconciliation.
Control 3.5 — Cost Allocation and Budget Tracking — orphan cost reporting.
Control 3.13 — Agent 365 Admin Center Analytics — analytics surface.

20.2 Sibling Playbook Cross-References

portal-walkthrough.md
powershell-setup.md
verification-testing.md

20.3 Regulatory Cross-References

FINRA Rule 4511 — books and records preservation; register is system-of-record artifact.
FINRA Rule 3110 — supervisory review; this playbook supports, but does not replace, registered-principal supervisory review.
SEC Rule 17a-4(f) — WORM retention for broker-dealer records; 6-year floor applied to E-01 through E-10.
SOX §404 — ITGC for access provisioning/de-provisioning; see RB-05.
GLBA Safeguards Rule — access controls over NPI; orphan agents are access-control risks.
OCC Bulletin 2013-29 / Fed SR 26-2 (formerly SR 11-7) — third-party and model risk; AI agents that persist past sponsor separation are in-scope risk events.

Updated: May 2026 | Version: v1.6.2 | UI Verification Status: Current
