Troubleshooting

Common issues, error recovery procedures, and rollback guidance.

Error Categories

| Category | Severity | Example |
| --- | --- | --- |
| Authentication | High | SP credential expired |
| Provisioning | High | Environment creation failed |
| Configuration | Medium | Baseline settings not applied |
| Integration | Medium | Flow trigger not firing |
| Validation | Low | Invalid security group name |

Authentication Errors

401 Unauthorized - Service Principal

Symptoms:

  • Power Automate flow fails with 401
  • Error: "The client credentials are invalid"

Causes:

  1. Client secret expired
  2. Secret not correctly stored in Key Vault
  3. Wrong tenant/client ID

Resolution:

  1. Check secret expiry in Entra ID:
    • App registrations > Your app > Certificates & secrets
    • Verify secret hasn't expired

  2. Rotate if expired:

    python scripts/register_service_principal.py \
      --tenant-id <tenant> \
      --app-name ELM-Provisioning-ServicePrincipal \
      --key-vault-name <vault> \
      --expiry-days 90 \
      --rotate-secret \
      --verbose
    

  3. Test connection:

    python scripts/elm_client.py --test-connection
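
The expiry check can also be scripted. The following is a minimal Python sketch, assuming the secret metadata has already been exported as a list of dictionaries with `keyId` and `endDateTime` fields (the shape Microsoft Graph uses for `passwordCredentials`). The function name `secret_status`, the 14-day warning window, and the sample values are illustrative assumptions and are not part of the ELM scripts.

```python
from datetime import datetime, timedelta, timezone


def secret_status(credentials, now=None, warn_days=14):
    """Classify each credential as 'expired', 'expiring', or 'ok'."""
    now = now or datetime.now(timezone.utc)
    report = {}
    for cred in credentials:
        # Timestamps are assumed to be ISO 8601 with a trailing "Z" (UTC)
        end = datetime.fromisoformat(cred["endDateTime"].replace("Z", "+00:00"))
        if end <= now:
            report[cred["keyId"]] = "expired"
        elif end - now <= timedelta(days=warn_days):
            report[cred["keyId"]] = "expiring"
        else:
            report[cred["keyId"]] = "ok"
    return report


if __name__ == "__main__":
    # Hypothetical sample data for illustration only
    sample = [{"keyId": "example-key", "endDateTime": "2026-01-20T00:00:00Z"}]
    print(secret_status(sample))
```

Any credential reported as "expired" calls for the rotation step; "expiring" is a prompt to schedule one.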
    

403 Forbidden - Insufficient Permissions

Symptoms:

  • Environment creation returns 403
  • "The caller does not have the required permissions"

Causes:

  1. SP not registered as Power Platform Management App
  2. SP application user missing in Dataverse
  3. ELM Admin role not assigned

Resolution:

  1. Verify PPAC registration:
    • admin.powerplatform.microsoft.com > Settings > Service principal
    • Status should show "Enabled"

  2. Re-register if needed:
    • Enter Application ID again
    • Click Create

  3. Verify Dataverse application user:
    • Environment > Settings > Users
    • Filter to Application users
    • Confirm SP appears with ELM Admin role

Provisioning Errors

Environment Creation Timeout

Symptoms:

  • Flow exceeds 60-minute timeout
  • State stuck at "Provisioning"
  • ProvisioningLog shows "ProvisioningStarted" but no "EnvironmentCreated"

Causes:

  1. Power Platform capacity issues
  2. Region-specific delays
  3. Service disruption

Resolution:

  1. Check environment status in PPAC:
    • admin.powerplatform.microsoft.com > Environments
    • Search for environment name
    • Check provisioning state

  2. If environment exists but flow timed out:
    • Manually update EnvironmentRequest:
      • Set fsi_environmentid to environment GUID
      • Set fsi_state to Provisioning (6)
    • Re-trigger remaining steps manually or wait for next flow run

  3. If environment doesn't exist:
    • Set fsi_state back to Approved (4)
    • Flow will re-trigger and retry
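
The two timeout branches above reduce to a small decision on whether the environment was actually created. The sketch below is illustrative only: `RequestState` and `timeout_recovery_update` are hypothetical names, and the enum lists only the state codes mentioned in this guide (4, 6, and 8). It returns the field values to set on the EnvironmentRequest record and does not call Dataverse.

```python
from enum import IntEnum
from typing import Optional


class RequestState(IntEnum):
    # Only the codes named in this guide; other states are intentionally omitted
    APPROVED = 4
    PROVISIONING = 6
    FAILED = 8


def timeout_recovery_update(environment_id: Optional[str]) -> dict:
    """Return the EnvironmentRequest fields to set after a flow timeout."""
    if environment_id:
        # Environment exists in PPAC: record its GUID and keep provisioning going
        return {
            "fsi_environmentid": environment_id,
            "fsi_state": int(RequestState.PROVISIONING),
        }
    # Environment was never created: reset to Approved so the flow retries
    return {"fsi_state": int(RequestState.APPROVED)}


if __name__ == "__main__":
    print(timeout_recovery_update(None))
```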

Environment Creation Failed

Symptoms:

  • Power Platform connector returns error
  • ProvisioningLog shows "ProvisioningFailed"

Common Error Messages:

| Error | Cause | Resolution |
| --- | --- | --- |
| "EnvironmentQuotaExceeded" | Tenant environment limit | Request quota increase or delete unused environments |
| "InvalidRegion" | Region not available | Use different region or verify region spelling |
| "CapacityNotAvailable" | No capacity in region | Try different region or wait |
| "DuplicateName" | Environment name exists | Use unique name |

Resolution:

  1. Review error in ProvisioningLog fsi_errormessage
  2. Address root cause
  3. Set fsi_state back to Approved (4) to retry
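
When triaging many failed requests, the error table can be turned into a lookup. This is a hedged sketch: the dictionary mirrors the table in this section, while `suggest_resolution` and the sample message are hypothetical, and the substring match assumes the error code appears verbatim in fsi_errormessage.

```python
RESOLUTIONS = {
    "EnvironmentQuotaExceeded": "Request quota increase or delete unused environments",
    "InvalidRegion": "Use different region or verify region spelling",
    "CapacityNotAvailable": "Try different region or wait",
    "DuplicateName": "Use unique name",
}

FALLBACK = "Review fsi_errormessage in ProvisioningLog and address the root cause"


def suggest_resolution(error_message: str) -> str:
    """Return the documented resolution for the first known code in the message."""
    for code, resolution in RESOLUTIONS.items():
        if code in error_message:
            return resolution
    return FALLBACK


if __name__ == "__main__":
    # Hypothetical message text for illustration
    print(suggest_resolution("Create failed: DuplicateName"))
```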

Managed Environment Enable Failed

Symptoms:

  • Environment created but not managed
  • ProvisioningLog missing "ManagedEnabled" entry

Causes:

  1. API version mismatch
  2. Environment type doesn't support managed
  3. Transient API error

Resolution:

  1. Manually enable via PPAC:
    • Environment > Settings > Edit
    • Enable Managed Environment

  2. Or via PowerShell:

    $config = [pscustomobject]@{ protectionLevel = "Standard"; settings = [pscustomobject]@{ extendedSettings = @{} } }
    Set-AdminPowerAppEnvironmentGovernanceConfiguration `
      -EnvironmentName <env-id> `
      -UpdatedGovernanceConfiguration $config
    

  3. Log manual action in ProvisioningLog

Environment Group Assignment Failed

Symptoms:

  • Environment created but not in expected group
  • Error: "Environment group not found"

Causes:

  1. Group name mismatch (case-sensitive)
  2. Group deleted
  3. Transient API error

Resolution:

  1. Verify group exists:
    • admin.powerplatform.microsoft.com > Environment groups
    • Confirm exact name matches flow variable

  2. Manually add to group:
    • Select environment group
    • Add environment

  3. Update flow variable if name changed
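
Because the group lookup is case-sensitive, it helps to tell a true "not found" apart from a casing mismatch. A minimal sketch, assuming the list of existing group names has been copied out of PPAC by hand; `diagnose_group_name` and the sample names are hypothetical.

```python
def diagnose_group_name(expected: str, existing: list) -> str:
    """Compare the flow variable value against the environment group names."""
    if expected in existing:
        return "match"
    # Same letters but different casing is the most common cause of lookup failure
    folded = [name for name in existing if name.casefold() == expected.casefold()]
    if folded:
        return f"case mismatch: found '{folded[0]}'"
    return "not found"


if __name__ == "__main__":
    # Hypothetical group names for illustration
    groups = ["FSI-Zone2-Regulated", "FSI-Zone1-Personal"]
    print(diagnose_group_name("fsi-zone2-regulated", groups))
```

A "case mismatch" result points to updating the flow variable; "not found" suggests the group was deleted or renamed.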


Configuration Errors

Baseline Configuration Not Applied

Symptoms:

  • Environment created but auditing not enabled
  • Session timeout not set

Causes:

  1. Child flow failed
  2. Organization ID lookup failed
  3. Permission denied on settings

Resolution:

  1. Check child flow run history
  2. Manually apply settings via PowerShell:

    # Connect to the target environment
    $conn = Connect-CrmOnline -ServerUrl "https://<env>.crm.dynamics.com"

    # Get the organization ID of the connected environment
    $orgId = (Invoke-CrmWhoAmI -conn $conn).OrganizationId

    # Update audit and session timeout settings
    Set-CrmRecord -conn $conn -EntityLogicalName organization -Id $orgId -Fields @{
      "isauditenabled" = $true
      "isuseraccessauditenabled" = $true
      "auditretentionperiodv2" = 365
      "sessiontimeoutenabled" = $true
      "sessiontimeoutinmins" = 480
    }
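
After applying the settings, it is worth confirming that the organization record matches the baseline. The sketch below assumes the current values have been read into a plain dictionary (for example from a Dataverse query); the `BASELINE` values are the ones used in the PowerShell above, and `baseline_drift` is a hypothetical helper name.

```python
BASELINE = {
    "isauditenabled": True,
    "isuseraccessauditenabled": True,
    "auditretentionperiodv2": 365,
    "sessiontimeoutenabled": True,
    "sessiontimeoutinmins": 480,
}


def baseline_drift(actual: dict) -> dict:
    """Return {setting: (actual, expected)} for every setting that differs."""
    return {
        key: (actual.get(key), expected)
        for key, expected in BASELINE.items()
        if actual.get(key) != expected
    }


if __name__ == "__main__":
    # Hypothetical current values for illustration
    current = dict(BASELINE, isauditenabled=False)
    print(baseline_drift(current))
```

An empty result means the baseline is fully applied; a missing setting is reported with an actual value of None.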

Security Group Binding Failed

Symptoms:

  • Zone 2/3 environment without security group
  • Error: "Security group not found"

Causes:

  1. Invalid security group ID
  2. Group deleted after request
  3. Graph API permission issue

Resolution:

  1. Verify group exists:

    az ad group show --group "<group-id>"
    

  2. If group exists, manually bind:
    • PPAC > Environment > Settings > Edit
    • Security group: Select correct group

  3. If group doesn't exist, contact requester for valid group


Integration Errors

Flow Trigger Not Firing

Symptoms:

  • Request approved but no provisioning starts
  • Flow run history shows no recent runs

Causes:

  1. Trigger filter condition not met
  2. Flow disabled
  3. Connection expired

Resolution:

  1. Verify flow is enabled:
    • make.powerautomate.com > My flows
    • Check flow status

  2. Check connections:
    • All connections should show green checkmark

  3. Verify trigger condition:
    • Filter: fsi_state eq 4
    • Manually verify request has state = 4 (Approved)

  4. Test trigger:
    • Update a test request to state = Approved
    • Check if flow runs

Copilot Agent Not Responding

Symptoms:

  • Agent doesn't respond to requests
  • Error: "Something went wrong"

Causes:

  1. Agent not published
  2. Authentication issue
  3. Power Automate action failed

Resolution:

  1. Verify agent is published:
    • Copilot Studio > Agent > Publish status

  2. Check authentication:
    • Settings > Security
    • Verify "Authenticate with Microsoft" is configured

  3. Test topics individually:
    • Use Test panel in Copilot Studio
    • Check for errors in each node

Validation Errors

Invalid Environment Name

Symptoms:

  • Agent rejects environment name
  • Error: "Name must follow pattern"

Resolution:

Correct naming format: DEPT-Purpose-TYPE

| Valid | Invalid |
| --- | --- |
| FIN-Reporting-PROD | Finance-Reporting-Production |
| IT-DevTest-SANDBOX | it-devtest-sandbox |
| COMP-Risk-DEV | COMP_Risk_DEV |
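
The DEPT-Purpose-TYPE format can be pre-checked before a request is submitted. The regular expression below is an assumption inferred from the valid and invalid examples in this section (a 2-4 letter uppercase department, a capitalized purpose, and a type of PROD, SANDBOX, or DEV); the agent's actual rule may be stricter or allow other types.

```python
import re

# Assumed pattern, inferred from the examples above; not the agent's own definition
NAME_PATTERN = re.compile(r"[A-Z]{2,4}-[A-Z][A-Za-z0-9]*-(PROD|SANDBOX|DEV)")


def is_valid_environment_name(name: str) -> bool:
    """Return True if the whole name matches the assumed DEPT-Purpose-TYPE pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("FIN-Reporting-PROD", "COMP_Risk_DEV"):
        print(candidate, is_valid_environment_name(candidate))
```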

Security Group Not Found

Symptoms:

  • Flow fails during group validation
  • Error: "Group not found in Entra ID"

Resolution:

  1. Verify exact group name with requester
  2. Search in Entra ID:
    • entra.microsoft.com > Groups
    • Search by name or ID

  3. If group is correct but lookup fails:
    • Check Graph API permissions
    • Verify Group.Read.All permission granted

Rollback Procedures

When to Rollback

| Scenario | Rollback? | Action |
| --- | --- | --- |
| Environment created, config failed | No | Complete config manually |
| Environment created, wrong zone | Maybe | Reconfigure or delete if <1 hour |
| Environment created with wrong name | Yes | Delete and recreate |
| Provisioning stuck | Maybe | Check PPAC first |

Manual Rollback Steps

  1. Log rollback initiation:
    • Create ProvisioningLog entry with action = RollbackInitiated

  2. Delete environment (if appropriate):

    # Only if environment is <1 hour old and hasn't been used
    Remove-AdminPowerAppEnvironment -EnvironmentName "<env-id>" -Confirm
    

  3. Update request:
    • Set fsi_state = Failed (8)
    • Clear fsi_environmentid
    • Clear fsi_environmenturl

  4. Log rollback completion:
    • Create ProvisioningLog entry with action = RollbackCompleted

  5. Notify requester:
    • Explain what happened
    • Provide next steps (resubmit or modified request)

Rollback Decision Matrix

| Time Since Creation | Data Added | Rollback? |
| --- | --- | --- |
| < 5 minutes | No | Yes - Auto |
| 5-60 minutes | No | Yes - Manual approval |
| > 60 minutes | No | Review case-by-case |
| Any | Yes | No - Manual remediation |
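
The matrix above can be expressed as a single function, which makes the boundary cases explicit. This is a sketch, not the flow's implementation: `rollback_decision` is a hypothetical name, and treating exactly 5 and exactly 60 minutes as part of the "5-60 minutes" row is an assumption.

```python
def rollback_decision(minutes_since_creation: float, data_added: bool) -> str:
    """Map environment age and data presence to the matrix outcome."""
    if data_added:
        # Data present overrides age: never roll back automatically
        return "No - Manual remediation"
    if minutes_since_creation < 5:
        return "Yes - Auto"
    if minutes_since_creation <= 60:
        return "Yes - Manual approval"
    return "Review case-by-case"


if __name__ == "__main__":
    print(rollback_decision(12, data_added=False))
```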

Evidence Collection Errors

Export Script Fails

Symptoms:

  • export_quarterly_evidence.py returns error
  • Empty or incomplete export files

Causes:

  1. Authentication failure
  2. Date range issues
  3. Large dataset timeout

Resolution:

  1. Test authentication:

    python scripts/elm_client.py --test-connection
    

  2. Try smaller date range:

    python scripts/export_quarterly_evidence.py \
      --start-date 2026-01-01 \
      --end-date 2026-01-31
    

  3. Check FetchXML query in script output (verbose mode)
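
To apply the smaller-date-range workaround across a whole quarter, the range can be split into calendar months and exported one month at a time. The helper names below are hypothetical; the `--start-date` and `--end-date` flags are the ones shown above, and the sketch assumes the end date is inclusive, as the January example suggests.

```python
from datetime import date, timedelta


def monthly_ranges(start: date, end: date) -> list:
    """Split [start, end] into consecutive (first_day, last_day) calendar-month chunks."""
    ranges = []
    cursor = start
    while cursor <= end:
        # First day of the month after the cursor
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        ranges.append((cursor, min(next_month - timedelta(days=1), end)))
        cursor = next_month
    return ranges


def export_commands(start: date, end: date) -> list:
    """Build one export command line per monthly chunk."""
    return [
        "python scripts/export_quarterly_evidence.py "
        f"--start-date {s.isoformat()} --end-date {e.isoformat()}"
        for s, e in monthly_ranges(start, end)
    ]


if __name__ == "__main__":
    for command in export_commands(date(2026, 1, 1), date(2026, 3, 31)):
        print(command)
```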

Immutability Check Fails

Symptoms:

  • validate_immutability.py reports violations
  • Unexpected audit entries found

Investigation:

  1. Run detailed check:

    python scripts/validate_immutability.py \
      --environment-url https://<org>.crm.dynamics.com \
      --verbose
    

  2. Review audit entries:
    • Who attempted modification?
    • When did it occur?
    • Was it blocked?

  3. If modifications succeeded:
    • CRITICAL: Security incident
    • Review security role configuration
    • Check for System Administrator overrides
    • Document and report per security policy

Getting Help

Information to Gather

Before escalating, collect:

  1. Request details:
    • Request number (REQ-XXXXX)
    • Requested environment name
    • Zone classification

  2. Error information:
    • Exact error message
    • ProvisioningLog entries
    • Flow run ID (correlation ID)

  3. Environment:
    • Governance environment URL
    • Target environment (if created)
    • Timestamp of failure
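
A quick completeness check can catch missing details before a ticket is raised. In this sketch the dictionary keys are hypothetical names for the items listed above; the target environment is left out of the required set because it only exists if creation succeeded.

```python
# Hypothetical key names, one per required item in the checklist above
REQUIRED_FIELDS = [
    "request_number",
    "environment_name",
    "zone",
    "error_message",
    "provisioning_log_entries",
    "flow_run_id",
    "governance_environment_url",
    "failure_timestamp",
]


def missing_escalation_fields(bundle: dict) -> list:
    """Return the required items that are absent or empty in the bundle."""
    return [field for field in REQUIRED_FIELDS if not bundle.get(field)]


if __name__ == "__main__":
    # Hypothetical partial bundle for illustration
    print(missing_escalation_fields({"request_number": "REQ-XXXXX"}))
```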

Escalation Path

| Issue Type | Contact |
| --- | --- |
| Flow failures | Platform Operations |
| Permission issues | Identity team |
| API errors | Microsoft Support |
| Security incidents | Security Operations |

Microsoft Support

For platform issues, open a support case:

  1. Go to admin.powerplatform.microsoft.com > Help + support
  2. Include:
    • Environment ID
    • Correlation ID from flow
    • Error message
    • Timestamp (UTC)

Preventive Measures

Monitoring

Set up alerts for:

  • Flow failures (Power Automate)
  • Environment creation failures (PPAC alerts)
  • Credential expiry (90 days before)
  • Immutability violations (weekly check)

Regular Maintenance

| Task | Frequency | Script/Action |
| --- | --- | --- |
| Credential rotation | Every 90 days | register_service_principal.py --rotate-secret |
| Immutability check | Weekly | validate_immutability.py |
| Role privilege audit | Monthly | verify_role_privileges.py |
| Connection health check | Weekly | Manual in Power Automate |
| Quarterly evidence export | Quarterly | export_quarterly_evidence.py |
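
The maintenance schedule can be tracked with a small overdue check. This is a sketch under stated assumptions: `overdue_tasks` is a hypothetical helper, "Monthly" is approximated as 30 days and "Quarterly" as 91 days, and the last-run dates would have to come from your own records.

```python
from datetime import date, timedelta

# Interval in days per task; 30 and 91 are approximations of monthly and quarterly
INTERVALS = {
    "Credential rotation": 90,
    "Immutability check": 7,
    "Role privilege audit": 30,
    "Connection health check": 7,
    "Quarterly evidence export": 91,
}


def overdue_tasks(last_run: dict, today: date) -> list:
    """Return tasks never run or last run longer ago than their interval."""
    overdue = []
    for task, days in INTERVALS.items():
        last = last_run.get(task)
        if last is None or today - last > timedelta(days=days):
            overdue.append(task)
    return overdue


if __name__ == "__main__":
    # Hypothetical last-run date for illustration
    print(overdue_tasks({"Immutability check": date(2026, 3, 20)}, date(2026, 4, 1)))
```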