ASARD Troubleshooting Guide
Overview
This guide provides diagnostic procedures and resolutions for common issues encountered when operating the Agent Sharing Access Restriction Detector (ASARD) solution.
Scope: This guide covers operational issues, error diagnostics, and resolution procedures. For deployment instructions, see Deployment Guide. For exception management procedures, see Exception Management Guide.
Escalation: If issues persist after following resolution procedures, escalate to:
- Platform issues: Power Platform support or tenant admin team
- Script issues: Development team or governance automation owner
- Policy questions: Information security or compliance team
Diagnostic Approach
When troubleshooting, start with the most recent logs and error messages. Most issues can be diagnosed from script output, API responses, or Power Automate run history.
BAP API Authentication Issues
Symptom
- Detection script fails with 401 Unauthorized errors
- Error message: "error": {"code": "Unauthorized", "message": "Unauthorized"}
- Script output shows authentication failures when enumerating environments or agents
Causes
- Expired access token: Access tokens are short-lived (by default Microsoft Entra ID issues them with a lifetime of roughly 60 to 90 minutes)
- Insufficient API permissions: App registration missing required permissions
- Wrong tenant ID: Script configured with incorrect tenant
- Missing admin consent: API permissions not granted admin consent
- Disabled or deleted app registration: Service principal no longer exists
Diagnostic Steps
1. Verify token acquisition:
# Enable debug logging in script
export AZURE_IDENTITY_LOGGING=1
python detect_agent_sharing_violations.py --dry-run
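If the script's own debug switch does not show enough detail, azure-identity also writes diagnostics through Python's standard logging module under the `azure.identity` logger. A minimal sketch, assuming you can add a few lines near the top of the detection script; the helper name is illustrative:

```python
import logging
import sys

def enable_azure_identity_debug_logging() -> logging.Logger:
    """Route azure.identity DEBUG records (token requests, credential errors) to stderr."""
    logger = logging.getLogger("azure.identity")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream=sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

if __name__ == "__main__":
    log = enable_azure_identity_debug_logging()
    log.debug("azure.identity debug logging enabled")
```

Child loggers such as those inside `azure.identity._credentials` propagate to this logger, so one handler covers the whole package.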
2. Check API permissions:
- Navigate to Azure Portal > App registrations > Your app
- Go to API permissions
- Verify the following permissions exist and are granted:
- PowerApps-Advisor / Analysis.Read.All (or appropriate BAP API permission)
- Microsoft Graph / Group.Read.All
- Check Status column shows "Granted for [tenant]"
3. Validate configuration:
# Verify environment variables
echo $AZURE_CLIENT_ID
echo $AZURE_TENANT_ID
# Do NOT echo client secret for security reasons
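To confirm all three variables are present without ever printing the secret, a small check can report only which names are missing and whether the IDs look like GUIDs (the format shown on the Microsoft Entra ID Overview page). A sketch; the helper is illustrative and not part of the ASARD scripts:

```python
import os
import uuid

REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

def check_auth_config(env=os.environ) -> list[str]:
    """Return a list of problems; the secret's value is never included."""
    problems = [f"{name} is missing or empty" for name in REQUIRED_VARS if not env.get(name)]
    for name in ("AZURE_TENANT_ID", "AZURE_CLIENT_ID"):
        value = env.get(name)
        if value:
            try:
                uuid.UUID(value)
            except ValueError:
                problems.append(f"{name} does not look like a GUID")
    return problems

if __name__ == "__main__":
    for problem in check_auth_config() or ["Configuration looks complete"]:
        print(problem)
```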
4. Test token manually:
from azure.identity import ClientSecretCredential
import os
credential = ClientSecretCredential(
tenant_id=os.environ["AZURE_TENANT_ID"],
client_id=os.environ["AZURE_CLIENT_ID"],
client_secret=os.environ["AZURE_CLIENT_SECRET"]
)
token = credential.get_token("https://api.bap.microsoft.com/.default")
print(f"Token acquired: {token.token[:20]}...")
print(f"Expires on: {token.expires_on}")
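Many 401 causes are visible inside the token itself: the audience (`aud`), tenant (`tid`), application ID (`appid` or `azp`), application roles (`roles`) and expiry (`exp`) are standard Microsoft Entra access token claims. A sketch that decodes the payload with the standard library only; it does not verify the signature, so use it for diagnostics, never for authorization decisions:

```python
import base64
import json
from datetime import datetime, timezone

def decode_jwt_claims(jwt: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))

def summarize_claims(claims: dict) -> dict:
    """Pick out the claims most relevant to 401 troubleshooting."""
    expires = None
    if "exp" in claims:
        expires = datetime.fromtimestamp(claims["exp"], tz=timezone.utc).isoformat()
    return {
        "audience": claims.get("aud"),
        "tenant": claims.get("tid"),
        "app_id": claims.get("appid") or claims.get("azp"),
        "roles": claims.get("roles", []),
        "expires_utc": expires,
    }

# Usage after the token test above:
# print(summarize_claims(decode_jwt_claims(token.token)))
```

A `tid` that differs from AZURE_TENANT_ID points to the wrong tenant; an empty `roles` list after adding Graph application permissions usually means admin consent has not been granted yet.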
Resolution
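For expired tokens:
- azure-identity credentials (built on MSAL) cache the access token and acquire a new one when it nears expiry, but only when get_token() is called again
- If the script runs longer than the token lifetime, make sure it requests a token for each call (or batch of calls) instead of reusing a token string captured at startup
- No manual action required in most cases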
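In practice, refresh logic means building the Authorization header from a fresh `get_token()` call each time rather than storing `token.token` once. Because the credential serves its cached token while it is still valid, this costs nothing extra on most calls. A minimal sketch; `bearer_headers` is an illustrative helper name:

```python
BAP_SCOPE = "https://api.bap.microsoft.com/.default"

def bearer_headers(credential, scope: str = BAP_SCOPE) -> dict:
    """Build request headers from get_token() so an expiring token is replaced automatically.

    credential is any azure-identity credential (for example ClientSecretCredential).
    """
    access_token = credential.get_token(scope)
    return {"Authorization": f"Bearer {access_token.token}"}

# Usage inside an enumeration loop:
# for env in environments:
#     response = requests.get(env_url, headers=bearer_headers(credential), timeout=60)
```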
For insufficient permissions:
1. Add missing API permissions in Azure Portal
2. Click Grant admin consent for your tenant
3. Wait 5-10 minutes for permission propagation
4. Retry detection script
For wrong tenant ID:
1. Obtain correct tenant ID from Azure Portal > Microsoft Entra ID > Overview
2. Update AZURE_TENANT_ID environment variable or configuration
3. Retry
For missing admin consent:
1. Navigate to App registrations > Your app > API permissions
2. Click Grant admin consent for [tenant]
3. Confirm the action
4. Verify Status column updates to "Granted"
For disabled app registration:
1. Check app registration status in Azure Portal
2. If deleted, recreate it following Deployment Guide Step 1
3. Update configuration with new client ID and secret
Security Group Resolution Failures
Symptom
- Agents marked as violations despite being shared with approved groups
- Warning: Unable to resolve security group: <group-id>
- Console output shows group resolution failures
- False positive violations for agents with approved sharing
Causes
- Group not found: Group ID does not exist in Microsoft Entra ID
- Deleted security groups: Group was deleted after agent sharing configured
- Nested groups not supported: The agent is shared with a group that is only an indirect (nested) member of an approved group, which the script does not recognize
- Incorrect group ID format: Group object ID vs. group name vs. group email
- Insufficient Graph API permissions: Cannot read group membership
Diagnostic Steps
1. Verify group exists in Microsoft Entra ID:
# Using Microsoft Graph PowerShell
Connect-MgGraph -Scopes "Group.Read.All"
Get-MgGroup -GroupId "<group-id>"
2. Check group ID format:
- Group object ID: 12345678-1234-1234-1234-123456789abc (GUID)
- Group name: Finance-Agents-Prod (display name)
- Group email: finance-agents@contoso.com (mail attribute)
- Script expects object ID (GUID format)
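Because the script expects object IDs, a quick format check over the group ID values from the policy table catches display names and mail addresses before a scan runs. A sketch, assuming you have already read that column into a list of strings; wildcard `*` records are skipped since they are not group IDs:

```python
import uuid

def find_non_guid_ids(group_ids: list[str]) -> list[str]:
    """Return entries that are not GUIDs (likely display names or email addresses)."""
    bad = []
    for value in group_ids:
        if value == "*":
            continue  # wildcard policy records intentionally hold no group ID
        try:
            uuid.UUID(value.strip())
        except (ValueError, AttributeError):  # AttributeError covers None values
            bad.append(value)
    return bad

# find_non_guid_ids(["12345678-1234-1234-1234-123456789abc", "Finance-Agents-Prod"])
# → ["Finance-Agents-Prod"]
```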
3. Test group resolution:
import asyncio
import os
from azure.identity import ClientSecretCredential
from msgraph import GraphServiceClient
credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"]
)
async def main():
    # The Graph SDK is async, so await must run inside a coroutine
    client = GraphServiceClient(credential, scopes=["https://graph.microsoft.com/.default"])
    group = await client.groups.by_group_id("<group-id>").get()
    print(f"Group: {group.display_name}")
asyncio.run(main())
4. Review approved group policy table:
- Navigate to Dataverse > gov_asardsecuritygrouppolicy
- Verify group IDs are in object ID (GUID) format
- Check gov_isactive field is set to Yes
- Confirm zone matches environment classification
Resolution
For deleted groups:
1. Identify replacement group or create new approved group
2. Update gov_asardsecuritygrouppolicy table with new group ID
3. Update agent sharing to use new group (if remediation needed)
4. Re-run detection scan
For incorrect group ID format:
1. Obtain correct object ID from Microsoft Entra ID:
Get-MgGroup -Filter "displayName eq 'Finance-Agents-Prod'" | Select-Object Id, DisplayName
2. Update the gov_asardsecuritygrouppolicy table with the correct object ID
3. Re-run detection scan
For nested groups (design limitation):
- Current behavior: Script does NOT resolve nested group membership
- Workaround options:
1. Flatten sharing: Share agent directly with approved group (not nested)
2. Create exception: Add exception record for legitimate nested group sharing
3. Enhance script: Modify bap_admin_client.py to recursively resolve groups (requires Microsoft Graph transitive membership query)
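Option 3 could be built on Microsoft Graph's `transitiveMemberOf` relationship: for each group an agent is shared with, list every group it belongs to directly or through nesting, and treat the sharing as approved if any of those IDs is approved. A sketch under stated assumptions: `fetch_json(url)` is a hypothetical helper that performs an authenticated GET with a Graph token and returns the parsed JSON, and `is_transitively_approved` is an illustrative name, not an existing bap_admin_client.py function:

```python
GRAPH = "https://graph.microsoft.com/v1.0"

def is_transitively_approved(group_id: str, approved_ids: set[str], fetch_json) -> bool:
    """True if group_id is approved directly or nested (at any depth) in an approved group."""
    approved = {a.lower() for a in approved_ids}  # GUID comparison is case-insensitive
    if group_id.lower() in approved:
        return True
    url = f"{GRAPH}/groups/{group_id}/transitiveMemberOf?$select=id"
    while url:
        page = fetch_json(url)
        # Results can include directory roles too; their IDs simply never match
        if any(str(item.get("id", "")).lower() in approved for item in page.get("value", [])):
            return True
        url = page.get("@odata.nextLink")  # absent on the last page
    return False
```

Following `@odata.nextLink` matters for large tenants, since Graph returns membership in pages.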
For insufficient permissions:
1. Verify Group.Read.All permission granted to app registration
2. Grant admin consent if missing
3. Retry detection script
For stale group policy:
1. Review all records in gov_asardsecuritygrouppolicy
2. Set gov_isactive = false for groups that no longer exist
3. Add new approved groups as needed
4. Document group lifecycle process to prevent future staleness
Agent Enumeration Failures
Symptom
- Detection script reports fewer agents than expected
- Warning: Failed to enumerate agents in environment: <env-name>
- Incomplete agent inventory in violation reports
- Script completes but misses environments or agents
Causes
- API throttling: Too many requests to BAP Admin API
- Environment access denied: Service principal lacks permissions for environment
- Deleted agents: Agent was deleted after enumeration started
- API timeout: Request took too long and timed out
- Transient API failures: Temporary service issues
Diagnostic Steps
1. Check API throttling headers:
# Add logging in bap_admin_client.py
response = requests.get(url, headers=headers)
print(f"Rate limit remaining: {response.headers.get('x-ratelimit-remaining')}")
print(f"Rate limit reset: {response.headers.get('x-ratelimit-reset')}")
If x-ratelimit-remaining is low or 0, throttling is occurring. An HTTP 429 (Too Many Requests) status code is also a direct sign of throttling.
2. Verify environment permissions:
# Check service principal has access to environment
pac admin list --environment <env-id>
3. Test agent enumeration for specific environment:
from bap_admin_client import BAPAdminClient
client = BAPAdminClient()
agents = client.get_agents(environment_id="<env-id>")
print(f"Found {len(agents)} agents")
4. Review script logs:
python detect_agent_sharing_violations.py --dry-run 2>&1 | tee detection.log
Resolution
For API throttling:
1. Implement exponential backoff: Modify bap_admin_client.py:
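import time
def get_agents_with_retry(env_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            return get_agents(env_id)
        except ThrottlingException:  # raised by bap_admin_client when the API throttles
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error instead of returning None
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            print(f"Throttled, waiting {wait_time}s...")
            time.sleep(wait_time)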
2. Reduce request rate: Add delay between environment enumerations:
for env in environments:
    agents = get_agents(env.id)
    time.sleep(1)  # 1 second delay between requests
For environment access denied:
1. Grant System Administrator role to the service principal:
- Navigate to Power Platform Admin Center
- Select environment > Settings > Users + permissions > Application users
- Add the app registration as an application user (if not already present) and assign it the System Administrator security role
2. Use delegated authentication: Switch from application permissions to delegated permissions with user context
For deleted agents:
- Expected behavior: Agent deleted between enumeration and evaluation
- Script should handle this gracefully with try/except
- No action required if script continues
For API timeouts:
1. Increase timeout in requests (requests applies no timeout by default, so a stalled call can block indefinitely):
response = requests.get(url, headers=headers, timeout=60) # 60 seconds
For transient failures:
1. Retry detection script
2. If persistent, check Microsoft 365 Service Health for platform issues
3. Implement retry logic with exponential backoff
Dataverse Write Errors
Symptom
- Detection script completes but no violations written to Dataverse
- Error: "error": {"code": "0x80040237", "message": "Duplicate key detected"}
- Error: "error": {"code": "0x80048306", "message": "Request throttled"}
- Violation records missing or incomplete in gov_asardsharingviolation table
Causes
- Schema mismatch: Dataverse table schema does not match script expectations
- Alternate key conflicts: Attempting to create record with duplicate key
- API throttling: Too many Dataverse write operations
- Field validation errors: Data violates field constraints (length, format)
- Missing required fields: Script not providing all required attributes
Diagnostic Steps
1. Verify schema version:
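# Check table definition (dataverse_url and a token issued for the Dataverse resource,
# scope "<dataverse_url>/.default", are assumed to be set)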
import requests
response = requests.get(
f"{dataverse_url}/api/data/v9.2/EntityDefinitions(LogicalName='gov_asardsharingviolation')",
headers={"Authorization": f"Bearer {token}"}
)
print(response.json())
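To see exactly which columns are missing, request only the attribute names with `.../EntityDefinitions(LogicalName='gov_asardsharingviolation')/Attributes?$select=LogicalName` and compare them with the fields the script writes. A sketch; `EXPECTED_COLUMNS` mirrors the fields in the test record of step 2 and is an assumption to adjust for your schema version:

```python
EXPECTED_COLUMNS = {
    "gov_agentid", "gov_environmentid", "gov_sharingprincipalid",
    "gov_zone", "gov_detectiondate", "gov_status",
}

def missing_columns(attribute_payload: dict, expected: set[str] = EXPECTED_COLUMNS) -> set[str]:
    """Return expected logical names absent from an Attributes query response."""
    actual = {attr["LogicalName"] for attr in attribute_payload.get("value", [])}
    return expected - actual

# print(sorted(missing_columns(response.json())))
```

A non-empty result means the table predates the script version in use; the --update run of the schema script under Resolution is the usual fix.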
2. Test single record write:
violation_record = {
"gov_agentid": "test-agent-123",
"gov_environmentid": "test-env-456",
"gov_sharingprincipalid": "test-group-789",
"gov_zone": "production",
"gov_detectiondate": "2026-02-13T19:00:00Z",
"gov_status": "active"
}
response = requests.post(
f"{dataverse_url}/api/data/v9.2/gov_asardsharingviolations",
headers={"Authorization": f"Bearer {token}"},
json=violation_record
)
print(f"Status: {response.status_code}")
print(f"Response: {response.text}")
3. Check for duplicate keys:
- Review alternate key definition on gov_asardsharingviolation table
- Typical alternate key: gov_agentid + gov_sharingprincipalid + gov_detectiondate
- Query existing records to find conflicts:
GET /api/data/v9.2/gov_asardsharingviolations?$filter=gov_agentid eq 'test-agent-123'
4. Monitor API throttling:
# Enable Dataverse API throttling logging
export DATAVERSE_DEBUG=1
python detect_agent_sharing_violations.py
Resolution
For schema mismatch: 1. Re-run schema creation script:
python create_asard_dataverse_schema.py --update
For alternate key conflicts:
1. Identify conflict: Determine which record has the duplicate key
2. Update instead of insert: Modify script to use PATCH for existing records:
# Check if record exists (dataverse_url, token, agent_id, group_id and violation_record are assumed to be defined)
headers = {"Authorization": f"Bearer {token}"}
filter_query = f"gov_agentid eq '{agent_id}' and gov_sharingprincipalid eq '{group_id}'"
existing = requests.get(
    f"{dataverse_url}/api/data/v9.2/gov_asardsharingviolations?$filter={filter_query}",
    headers=headers
).json()['value']
if existing:
    # Update existing record
    record_id = existing[0]['gov_asardsharingviolationid']
    requests.patch(f"{dataverse_url}/api/data/v9.2/gov_asardsharingviolations({record_id})", headers=headers, json=violation_record)
else:
    # Create new record
    requests.post(f"{dataverse_url}/api/data/v9.2/gov_asardsharingviolations", headers=headers, json=violation_record)
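Dataverse can also do the check-then-write in one call: a PATCH addressed by alternate key (for example `gov_asardsharingviolations(gov_agentid='...',gov_sharingprincipalid='...')`) updates the matching row or creates it if none exists. A sketch of building that address, assuming the key columns you pass match the table's actual alternate key definition; every value is quoted as an OData string literal (embedded single quotes doubled), so this suits text key columns only. The same `odata_string` helper also makes the `$filter` above safe for values containing quotes:

```python
from urllib.parse import quote

def odata_string(value: str) -> str:
    """Return value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def upsert_url(dataverse_url: str, entity_set: str, key_values: dict[str, str]) -> str:
    """Build an alternate-key address such as .../entityset(k1='a',k2='b')."""
    pairs = []
    for name, value in key_values.items():
        literal = quote(odata_string(value), safe="'")  # percent-encode, keep the quotes
        pairs.append(f"{name}={literal}")
    return f"{dataverse_url}/api/data/v9.2/{entity_set}({','.join(pairs)})"

# url = upsert_url(dataverse_url, "gov_asardsharingviolations",
#                  {"gov_agentid": agent_id, "gov_sharingprincipalid": group_id})
# requests.patch(url, headers=headers, json=violation_record)  # creates or updates
```

Adding an `If-Match: *` header to that PATCH turns it into update-only, which is useful when a missing row should be treated as an error.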
For API throttling:
1. Batch write operations: Use $batch endpoint to reduce request count
2. Implement backoff: Catch throttling errors (429 status code) and retry:
import time
if response.status_code == 429:
    retry_after = int(response.headers.get('Retry-After', 60))  # seconds to wait
    time.sleep(retry_after)
    # then resend the same request
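In a scan loop, the request also has to be resent and the number of attempts bounded. A sketch of that loop, assuming `send` is any zero-argument callable that performs the request and returns a requests-style response (an object with `status_code` and `headers`); the server's `Retry-After` hint is preferred and exponential backoff is used only when the header is absent:

```python
import time

def send_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Call send() until the response is not 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
        delay = int(retry_after) if retry_after and retry_after.isdigit() else 2 ** attempt
        if attempt < max_attempts - 1:
            sleep(delay)
    return response  # still 429 after the last attempt; the caller decides how to fail

# response = send_with_retry(lambda: requests.post(url, headers=headers, json=violation_record, timeout=60))
```

Passing `sleep` as a parameter keeps the helper testable without real waits.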
For field validation errors:
1. Review error message for specific field causing validation failure
2. Common issues:
- String too long: Trim or truncate values to field max length
- Invalid date format: Use ISO 8601 format: YYYY-MM-DDTHH:MM:SSZ
- Invalid choice value: Ensure choice field values match defined options
3. Update script to validate data before write
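Step 3 could look like the following sketch. The limits in `MAX_LENGTHS` are placeholders, not the real ASARD column definitions; read each column's actual MaxLength from the table metadata first. Over-long values raise by default, because silently truncating an ID would corrupt the record; truncation is opt-in for descriptive text:

```python
from datetime import datetime, timezone

# Placeholder limits: replace with each column's MaxLength from the table definition
MAX_LENGTHS = {"gov_zone": 50, "gov_sharingprincipalid": 100}

def to_iso8601_utc(value: datetime) -> str:
    """Format as YYYY-MM-DDTHH:MM:SSZ; naive datetimes are assumed to already be UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def prepare_record(record: dict, max_lengths: dict = MAX_LENGTHS, truncate: bool = False) -> dict:
    """Return a validated copy of record, ready for a Dataverse write."""
    clean = {}
    for field, value in record.items():
        if isinstance(value, datetime):
            value = to_iso8601_utc(value)
        elif isinstance(value, str) and field in max_lengths and len(value) > max_lengths[field]:
            if not truncate:
                raise ValueError(f"{field} is {len(value)} characters; limit is {max_lengths[field]}")
            value = value[: max_lengths[field]]  # only safe for descriptive text, never for IDs
        clean[field] = value
    return clean
```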
For missing required fields:
1. Review table definition to identify required fields
2. Update script to provide all required attributes
3. Set default values for optional fields
Detection False Positives/Negatives
Symptom
- False positives: Agents marked as violations despite compliant sharing
- False negatives: Non-compliant agents not detected as violations
- Zone classification incorrect for environments
- Approved groups not recognized
Causes
- Zone classification errors: Environment names do not match classification rules
- Sharing principal parsing issues: Script cannot parse sharing configuration format
- Approved group policy gaps: Missing entries in policy table
- Zone-specific policy not applied: Group approved for one zone but agent in different zone
- Wildcard policy misconfigured: Zone-wide approval not working as expected
Diagnostic Steps
1. Test zone classification:
from asard_zone_rules import classify_environment_zone
test_envs = [
"Finance-Production-01",
"HR-Development-Sandbox",
"Test-Environment-QA",
"MyApp-Prod"
]
for env_name in test_envs:
zone = classify_environment_zone(env_name)
print(f"{env_name} -> {zone}")
- Review the output for incorrect or unknown zone assignments
2. Review sharing principal structure:
# Log sharing principals during detection
agent = get_agent(agent_id)
print(f"Sharing principals: {agent.get('sharingPrincipals', [])}")
3. Audit approved group policy:
SELECT gov_groupobjectid, gov_groupname, gov_zone, gov_isactive
FROM gov_asardsecuritygrouppolicy
WHERE gov_isactive = 1
4. Trace specific violation:
python detect_agent_sharing_violations.py --dry-run --agent-id "<agent-id>"
- Use the --agent-id flag to debug a specific agent
- Review console output for evaluation logic
Resolution
For zone classification errors:
1. Update classification rules: Edit asard_zone_rules.py:
def classify_environment_zone(env_name: str) -> str:
    env_lower = env_name.lower()
    # Add your custom keywords
    if 'prod' in env_lower or 'prd' in env_lower:
        return 'production'
    # ... add more patterns
    return 'unknown'  # assumed fallback when no pattern matches
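Substring checks over-match easily: `'prod' in env_lower` is also true for a name like "Product-Dev". One alternative is to split the name into tokens and match whole tokens. A sketch with illustrative keyword lists and zone names (only `production` appears elsewhere in this guide); it is a drop-in variant of the function above, not ASARD's actual rules:

```python
import re

# Illustrative keyword sets; checked in order, first matching zone wins
ZONE_KEYWORDS = [
    ("production", {"prod", "prd", "production"}),
    ("development", {"dev", "development", "sandbox"}),
    ("test", {"test", "qa", "uat"}),
]

def classify_environment_zone(env_name: str) -> str:
    """Classify by whole tokens split on any non-alphanumeric character."""
    tokens = set(re.split(r"[^a-z0-9]+", env_name.lower()))
    for zone, keywords in ZONE_KEYWORDS:
        if tokens & keywords:
            return zone
    return "unknown"
```

Names without separators (for example "MyAppProd") form a single token and fall through to `unknown`, which makes them easy to spot in the test output from step 1.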
For sharing principal parsing issues:
1. Review API response structure: Examine raw BAP Admin API response
2. Update parsing logic: Modify detect_agent_sharing_violations.py:
def parse_sharing_principals(agent_data):
    principals = agent_data.get('sharingPrincipals') or []
    group_ids = []
    # Add error handling and logging
    for principal in principals:
        if not isinstance(principal, dict):
            print(f"Unexpected principal format: {principal}")
            continue
        # Extract group ID with fallback logic
        group_id = principal.get('id') or principal.get('principalId')
        if group_id is None:
            print(f"Principal without an ID: {principal}")
            continue
        group_ids.append(group_id)
    return group_ids
For approved group policy gaps:
1. Identify missing groups from false positive violations
2. Validate with security team that groups should be approved
3. Add missing groups to gov_asardsecuritygrouppolicy table
4. Re-run detection scan
For zone-specific policy issues:
1. Verify zone matching: Ensure agent's environment zone matches policy zone
2. Add cross-zone policies: If group should be approved in multiple zones:
- Create separate policy records for each zone
- Or use wildcard zone (*) if appropriate
3. Review zone hierarchy: Some organizations need parent/child zone logic
For wildcard policy misconfigured:
1. Verify wildcard format: Policy record should have:
- gov_groupobjectid = "*"
- gov_zone = "<target-zone>"
- gov_isactive = true
2. Check script logic: Ensure detection script recognizes wildcard:
# approved_groups must hold only the active group IDs approved for this agent's zone
if group_id in approved_groups or "*" in approved_groups:
    return True  # Approved
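The `"*" in approved_groups` test only acts as a zone-wide approval if `approved_groups` was built for the agent's zone alone. A sketch of that lookup, assuming policy rows are dicts with the `gov_groupobjectid`, `gov_zone` and `gov_isactive` columns used above, and assuming that records with zone `*` apply to every zone; the function names are illustrative:

```python
from collections import defaultdict

def build_zone_policy(rows: list[dict]) -> dict[str, set[str]]:
    """Map each zone to the active approved group IDs (lower-cased) for that zone."""
    policy = defaultdict(set)
    for row in rows:
        if row.get("gov_isactive"):
            policy[row["gov_zone"]].add(row["gov_groupobjectid"].lower())
    return policy

def is_group_approved(group_id: str, zone: str, policy: dict[str, set[str]]) -> bool:
    # Records for the agent's zone plus records whose zone is "*" (all zones)
    approved = policy.get(zone, set()) | policy.get("*", set())
    # A "*" group ID approves every group, but only within that record's zone
    return "*" in approved or group_id.lower() in approved
```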
Remediation Failures
Symptom
- remediate_agent_sharing.py script fails during execution
- Error: "error": {"code": "BadRequest", "message": "Invalid PATCH operation"}
- Agent sharing not removed after remediation runs
- Script reports success but sharing still present
Causes
- PATCH operation errors: API rejects sharing removal request
- Validation failures: Sharing configuration cannot be modified (e.g., owner sharing)
- Concurrent modifications: Another process modified agent during remediation
- Insufficient permissions: Service principal cannot modify agent sharing
- Agent state issues: Agent in draft mode or being published
Diagnostic Steps
1. Test remediation in WhatIf mode:
python remediate_agent_sharing.py --whatif
2. Test single agent remediation:
python remediate_agent_sharing.py --agent-id "<agent-id>" --dry-run
3. Check agent state:
agent = bap_client.get_agent(agent_id)
print(f"State: {agent.get('state')}")
print(f"Owner: {agent.get('owner')}")
print(f"Sharing: {agent.get('sharingPrincipals')}")
4. Verify permissions:
# Check service principal role in environment
pac admin list --environment <env-id>
Resolution
For PATCH operation errors:
1. Review API error details: Check the full error response for the specific issue
2. Validate PATCH payload: Ensure the request body format matches API requirements. Omit the group being removed from the array; JSON does not allow comments, so the payload lists only the principals to keep:
{
  "sharingPrincipals": [
    {"id": "keep-this-group-id"}
  ]
}
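3. Alternative: Remove the principal with the Power Platform CLI (confirm the command is available in your installed PAC CLI version):
pac agent share --agent-id <agent-id> --remove --principal <group-id>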
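The payload for step 2 can be derived from the agent's current principals by dropping the ones being removed. A sketch, assuming principals are dicts carrying the group ID in `id` as in the payload above; whether the API expects the full remaining list in exactly this shape should be confirmed against the error details from step 1:

```python
def build_sharing_patch(current_principals: list[dict], remove_ids: set[str]) -> dict:
    """Keep every principal whose id is not being removed (case-insensitive GUID match)."""
    remove = {rid.lower() for rid in remove_ids}
    kept = [p for p in current_principals if str(p.get("id", "")).lower() not in remove]
    return {"sharingPrincipals": kept}
```

Building the list from a fresh read of the agent, rather than from the detection run's snapshot, avoids dropping principals added in the meantime.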
For validation failures (owner sharing):
- Limitation: Agent owners cannot be removed via sharing API
- Workaround: Change agent owner first, then remove old owner from sharing
- Exception: Create exception for owner sharing if legitimate
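For concurrent modifications:
1. Implement optimistic concurrency: Use ETag headers:
# Get current ETag (agent_url, headers and data are assumed to be defined)
response = requests.get(agent_url, headers=headers)
etag = response.headers.get('ETag')
# Include in PATCH request
patch_headers = {**headers, 'If-Match': etag}
patch_response = requests.patch(agent_url, headers=patch_headers, json=data)
if patch_response.status_code == 412:
    # Precondition Failed: the agent changed since the GET; re-read it and retry
    print("Agent modified concurrently; re-read and retry remediation")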
For insufficient permissions:
1. Grant System Administrator role to service principal in target environment
2. Or use delegated authentication with a user account that has appropriate permissions
For agent state issues:
1. Wait for publishing: If agent is being published, retry after completion
2. Handle draft agents: Skip remediation for draft agents or wait until published
3. Add state check (inside the per-agent remediation function):
if agent.get('state') != 'Published':
    print(f"Skipping agent in {agent.get('state')} state")
    return
Post-validation required:
- After successful remediation, always re-run detection scan
- Verify sharing actually removed (not just API success)
- Update violation status in Dataverse
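For the second check, a fresh read of the agent after remediation can be compared with the IDs that were meant to be removed. A sketch, assuming the same `sharingPrincipals` shape printed in the diagnostic step above; `still_shared` is an illustrative name:

```python
def still_shared(agent: dict, removed_ids: set[str]) -> set[str]:
    """Return the removed IDs that are still present in the agent's sharing principals."""
    present = {str(p.get("id", "")).lower() for p in agent.get("sharingPrincipals") or []}
    return {rid for rid in removed_ids if rid.lower() in present}

# leftovers = still_shared(bap_client.get_agent(agent_id), removed_ids)
# if leftovers: keep the violation open and log the leftovers instead of marking it resolved
```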
Exception Workflow Issues
Symptom
- Expired exceptions not reset to active violations
- Exception expiration notifications not sent
- Exception review workflow does not trigger
- Exception records not created properly
Causes
- Flow misconfiguration: Triggers or conditions not set correctly
- Webhook failures: Teams webhook URL invalid or channel deleted
- Trigger conditions: Recurrence schedule or data trigger not working
- Dataverse connection issues: Flow cannot read/write exception records
- Adaptive card errors: Card payload invalid or missing required fields
Diagnostic Steps
1. Check flow run history:
- Navigate to Power Automate > My flows > Exception Review Workflow
- Click Run history
- Review failed runs for error details
2. Test webhook manually:
curl -X POST <webhook-url> \
-H "Content-Type: application/json" \
-d '{"text": "Test notification from ASARD"}'
3. Query exception records:
SELECT gov_asardexceptionid, gov_expirationdate, gov_status
FROM gov_asardexception
WHERE gov_expirationdate < GETDATE() AND gov_status = 'active'
4. Test flow manually:
- Open flow in Power Automate
- Click Test > Manually
- Provide test data and observe execution
Resolution
For flow misconfiguration:
1. Verify recurrence trigger:
- Open flow > Edit trigger
- Confirm schedule is correct (e.g., Daily at 8 AM)
- Check time zone settings
2. Update filter conditions:
- In "Get expired exceptions" action
- Filter query should be:
gov_expirationdate lt '@{utcNow()}' and gov_status eq 'active'
For webhook failures:
1. Regenerate webhook:
- Remove old webhook from Teams channel
- Create new webhook
- Update flow with new URL
2. Test webhook connectivity:
- Use curl or Postman to send test message
- Check Teams channel for delivery
For trigger conditions:
1. Review trigger settings:
- Recurrence: Verify interval and frequency
- Dataverse trigger: Check table and filter conditions
2. Enable flow:
- Ensure flow is turned on (not disabled)
- Check if any connections need reauthentication
For Dataverse connection issues:
1. Reauthenticate connection:
- Flow > Edit > Connections
- Remove and re-add Dataverse connection
- Use account with appropriate permissions
2. Check table permissions:
- Verify flow's connection has read/write access to exception table
For adaptive card errors:
1. Validate card payload:
- Copy adaptive card JSON from flow action
- Test in Adaptive Cards Designer
- Fix any schema errors
2. Check dynamic content:
- Ensure all dynamic fields (e.g., @{items('Apply_to_each')?['gov_agentid']}) have fallback values
- Add null checks: @{coalesce(items('Apply_to_each')?['gov_agentid'], 'N/A')}
Monitoring exceptions workflow health:
- Set up flow analytics in Power Automate
- Configure failure notifications to governance team
- Review run history weekly for patterns
Related Documentation
- ASARD Deployment Guide
- ASARD Exception Management Guide
- Power Platform Admin API Documentation
- Dataverse Web API Documentation
- Power Automate Troubleshooting
Updated: February 2026 | Version: v1.2