Portal Walkthrough: Control 2.11 - Bias Testing and Fairness Assessment
Last Updated: April 2026
Portals: Microsoft Copilot Studio (test pane + analytics), Power Automate (orchestration), Power BI (fairness dashboard), Microsoft Purview (WORM evidence retention), SharePoint (evidence library)
Estimated Time: 8–16 hours for the initial assessment; 2–4 hours per quarterly cycle once automated
Audience: AI Governance Lead, Data Science Team, Compliance Officer, Agent Owner. M365 admin role required to configure WORM retention and the SharePoint evidence library.
Prerequisites
- Protected-class scope documented and signed off by the Compliance Officer (ECOA + state-law additions; rationale for any class scoped out)
- Synthetic test dataset prepared with sample-size justification (see verification-testing.md)
- Fairness metrics selected and thresholds approved (demographic parity, equalized odds, calibration, disparate-impact ratio)
- Agent endpoint or Copilot Studio test access available to the Data Science Team
- SharePoint evidence library provisioned with Purview retention label (WORM, ≥7 years for FINRA / SEC scope)
- Model Risk Manager identified as independent reviewer (Zone 3)
- PowerShell baseline reviewed before running any scripts in powershell-setup.md
Read the FSI PowerShell baseline first
Before running any of the supporting PowerShell, read the PowerShell Authoring Baseline for FSI Implementations. It is the canonical source for module pinning, mutation safety (-WhatIf / SupportsShouldProcess), and SHA-256 evidence emission.
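The baseline's SHA-256 evidence pattern can also be sketched in Python for any part of the harness that runs outside PowerShell. This is a minimal sketch, not the baseline's implementation. The manifest layout (a JSON object whose `files` map holds relative path → hex digest), the file name `manifest.sha256.json`, and the function names are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(evidence_dir: Path, manifest_name: str = "manifest.sha256.json") -> dict:
    """Hash every file under evidence_dir (except the manifest) and write the manifest.

    Hypothetical layout: {"algorithm": "SHA-256", "files": {"<relative path>": "<hex>"}}.
    """
    entries = {}
    for path in sorted(evidence_dir.rglob("*")):
        if path.is_file() and path.name != manifest_name:
            entries[path.relative_to(evidence_dir).as_posix()] = sha256_of(path)
    manifest = {"algorithm": "SHA-256", "files": entries}
    (evidence_dir / manifest_name).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Writing the manifest alongside the artifacts, and uploading both into the WORM-labeled library together, keeps each digest next to the file it attests to.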
Step-by-Step Configuration
Step 1 — Document Protected Classes
Capture the agent's scope decision in a one-page memo signed by the Compliance Officer. The default ECOA list is below; add state-law classes as needed.
| # | Protected Class | ECOA / Reg B Citation | In-Scope? | Rationale |
|---|---|---|---|---|
| 1 | Race | 15 U.S.C. § 1691(a)(1); 12 CFR § 1002.4 | ||
| 2 | Color | 15 U.S.C. § 1691(a)(1) | ||
| 3 | Religion | 15 U.S.C. § 1691(a)(1) | ||
| 4 | National Origin | 15 U.S.C. § 1691(a)(1) | ||
| 5 | Sex (incl. gender identity / sexual orientation per CFPB interpretive rule) | 15 U.S.C. § 1691(a)(1) | ||
| 6 | Marital Status | 15 U.S.C. § 1691(a)(1) | ||
| 7 | Age (applicants able to legally contract) | 15 U.S.C. § 1691(a)(1) | ||
| 8 | Receipt of Public Assistance Income | 15 U.S.C. § 1691(a)(2) | ||
| 9 | Good-faith exercise of rights under the Consumer Credit Protection Act (CCPA) | 15 U.S.C. § 1691(a)(3) | ||
| 10+ | State-specific (e.g., military status, source of income) | State law | | |
Caveat: "Out of scope" is only defensible if the agent cannot influence credit, lending, account-opening, pricing, or insurance underwriting decisions. Document the reasoning; counsel review is recommended for Zone 2/3 agents.
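One way to keep the scope memo honest is to hold the table as structured data and block sign-off while any class lacks a decision or any out-of-scope class lacks a rationale. A minimal sketch; the `ProtectedClass` type and `validate_scope` helper are hypothetical, not part of the control's tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtectedClass:
    name: str
    citation: str
    in_scope: Optional[bool] = None  # None means no decision has been recorded yet
    rationale: str = ""


def validate_scope(classes: list[ProtectedClass]) -> list[str]:
    """Return the problems that block sign-off; an empty list means the memo is ready."""
    problems = []
    for pc in classes:
        if pc.in_scope is None:
            problems.append(f"{pc.name}: no in-scope decision recorded")
        elif not pc.in_scope and not pc.rationale.strip():
            problems.append(f"{pc.name}: scoped out without a documented rationale")
    return problems
```

Running the check before routing the memo to the Compliance Officer makes an undocumented "out of scope" decision visible rather than silent.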
Step 2 — Build the Synthetic Test Dataset
- Generate synthetic personas (do not use production customer PII). Tooling options: in-house generators, Microsoft Presidio for safe transformation, or vendor synthetic-data tools approved by your data office.
- Construct prompts that exercise the agent's actual decisioning surface (e.g., "Should I open a high-yield savings account if I receive Social Security benefits?").
- Sample-size minimums per zone are documented in verification-testing.md. At a minimum:
  - Zone 2: 1,000 cases per group within each protected class
  - Zone 3: 2,000 cases per group within each protected class, plus intersectional pairs (e.g., race × sex)
- Store the dataset and methodology memo in the SharePoint evidence library with the WORM retention label applied.
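The dataset steps above can be sketched as a seeded generator that renders the same prompt once for every group of a class, so that a difference in the agent's answer can be attributed to the protected attribute alone. This is a minimal counterfactual-set sketch under stated assumptions: the template, group labels, and `build_counterfactual_cases` name are illustrative, and the real construction should follow the methodology memo and verification-testing.md.

```python
import itertools
import random


def build_counterfactual_cases(template: str, attribute: str, groups: list,
                               base_contexts: list, per_group: int, seed: int = 0) -> list:
    """Return synthetic cases in which only the protected-attribute value varies.

    Each selected base context is rendered once per group, so every group gets
    the same prompt apart from the attribute, and each group receives exactly
    per_group cases.
    """
    rng = random.Random(seed)  # fixed seed so the dataset can be regenerated for audit
    contexts = list(base_contexts)
    rng.shuffle(contexts)
    # Reuse contexts in shuffled order until the per-group quota is met.
    chosen = list(itertools.islice(itertools.cycle(contexts), per_group))
    cases = []
    for case_id, context in enumerate(chosen):
        for group in groups:
            cases.append({
                "case_id": case_id,
                "attribute": attribute,
                "group": group,
                "prompt": template.format(group=group, context=context),
            })
    return cases
```

Setting `per_group=1000` (Zone 2) or `per_group=2000` (Zone 3) gives each group the zone minimum; intersectional pairs can be generated the same way by passing combined labels as the groups.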
Step 3 — Establish Fairness Metrics and Thresholds
| Metric | Definition | Suggested Threshold | Regulatory Anchor |
|---|---|---|---|
| Demographic Parity | Equal positive-outcome rate across groups | ≤5 percentage-point gap, paired with a statistical-significance test | Fed SR 26-2 (formerly SR 11-7) disparate-outcome review |
| Disparate Impact Ratio (Four-Fifths Rule) | min(group rate) / max(group rate) | Ratio ≥ 0.80 | EEOC 29 CFR § 1607.4(D) (industry benchmark also referenced in fair-lending exams) |
| Equalized Odds | Equal TPR and FPR across groups | ≤5 pp gap on each | Fed SR 26-2 (formerly SR 11-7) model performance assessment |
| Equal Opportunity | Equal TPR across groups | ≤5 pp gap | Used when false negatives are the harm |
| Calibration | Predicted probability matches actual rate | ≤10 pp gap by decile | Fed SR 26-2 (formerly SR 11-7) model accuracy |
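For reference, the table's metrics are written out below for a binary decision Ŷ, true outcome Y, protected attribute A with groups a and b, predicted probability p̂, and score decile D. These are the standard definitions; how a gap is aggregated across more than two groups (for example, the largest pairwise gap) is an assumption for the methodology memo to settle.

```latex
\begin{aligned}
\text{DP gap} &= \bigl|\Pr(\hat{Y}=1 \mid A=a) - \Pr(\hat{Y}=1 \mid A=b)\bigr| \\
\text{DI ratio} &= \frac{\min_{g}\,\Pr(\hat{Y}=1 \mid A=g)}{\max_{g}\,\Pr(\hat{Y}=1 \mid A=g)} \\
\text{TPR gap} &= \bigl|\Pr(\hat{Y}=1 \mid Y=1, A=a) - \Pr(\hat{Y}=1 \mid Y=1, A=b)\bigr| \\
\text{FPR gap} &= \bigl|\Pr(\hat{Y}=1 \mid Y=0, A=a) - \Pr(\hat{Y}=1 \mid Y=0, A=b)\bigr| \\
\text{Calibration gap}_{g,d} &= \bigl|\,\mathbb{E}[\hat{p} \mid A=g, D=d] - \Pr(Y=1 \mid A=g, D=d)\bigr|
\end{aligned}
```

Equal opportunity uses only the TPR gap; equalized odds requires both the TPR and FPR gaps to meet the threshold.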
A single "±5%" threshold is not sufficient as a pass/fail gate. Always pair the threshold with a statistical-significance test (chi-square, Fisher's exact, or logistic regression) and the disparate-impact ratio. See verification-testing.md for sample-size guidance.
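A minimal sketch of pairing the threshold with a significance test, restricted to a two-group comparison so the chi-square p-value (one degree of freedom) can be computed with the standard library. It assumes, as an interpretation rather than a rule stated by the control, that a demographic-parity finding needs both a gap above 5 percentage points and p < 0.05, and that a disparate-impact ratio below 0.80 is a finding on its own. The function names are hypothetical.

```python
import math


def chi_square_2x2(pos_a: int, n_a: int, pos_b: int, n_b: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on a 2x2 outcome table.

    Returns (statistic, p_value). With one degree of freedom the chi-square
    survival function is erfc(sqrt(x / 2)). Every expected cell must be
    non-zero; by convention each should be at least 5 for the test to be valid.
    """
    observed = [[pos_a, n_a - pos_a], [pos_b, n_b - pos_b]]
    total = n_a + n_b
    col_totals = [pos_a + pos_b, total - (pos_a + pos_b)]
    row_totals = [n_a, n_b]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic, math.erfc(math.sqrt(statistic / 2))


def parity_finding(pos_a: int, n_a: int, pos_b: int, n_b: int,
                   max_gap: float = 0.05, alpha: float = 0.05, min_di: float = 0.80) -> dict:
    """Apply the gap-plus-significance rule and the four-fifths rule to two groups."""
    rate_a, rate_b = pos_a / n_a, pos_b / n_b
    gap = abs(rate_a - rate_b)
    di_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    _, p_value = chi_square_2x2(pos_a, n_a, pos_b, n_b)
    # Assumed rule: a significant gap above max_gap, or a DI ratio below min_di, is a finding.
    breach = (gap > max_gap and p_value < alpha) or di_ratio < min_di
    return {"gap": gap, "p_value": p_value, "di_ratio": di_ratio, "breach": breach}
```

For example, positive-outcome rates of 60% and 50% on 1,000 cases each give a 10-point gap with p far below 0.05, so `breach` is true even though the DI ratio (about 0.83) passes. For more than two groups, or for the logistic-regression analysis Zone 3 calls for, a statistics library such as SciPy (`scipy.stats.chi2_contingency`) is the more practical route.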
Step 4 — Configure the Test Harness in Copilot Studio
- Open Copilot Studio.
- Select the target agent → Test your agent (right-side pane).
- For interactive smoke tests, paste a small subset of prompts and capture a screenshot of each response (if working locally, store screenshots in `maintainers-local/tenant-evidence/2.11/`; never commit them to GitHub).
- For at-scale automation, retrieve the agent's Direct Line secret or REST endpoint:
  - Settings → Channels → Direct Line / Custom website (or use the published Microsoft 365 Copilot endpoint where applicable).
  - Treat the secret as a credential: store it in Azure Key Vault and reference it from the test runner.
- Wire the runner from powershell-setup.md (`Invoke-FsiBiasTestSuite`) to consume the dataset built in Step 2 and emit results JSON plus a SHA-256 manifest.
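For teams prototyping outside the PowerShell runner, a single prompt can be sent through the Bot Framework Direct Line 3.0 REST API (start a conversation, post a message activity, read back activities). This is a minimal sketch: the `DIRECT_LINE_BASE` host is an assumption (regional or Copilot Studio-specific hosts may differ), the helper names are hypothetical, and the fixed wait stands in for proper watermark polling.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumption: the global Direct Line host; regional hosts may differ for your agent.
DIRECT_LINE_BASE = "https://directline.botframework.com/v3/directline"


def message_activity(text: str, user_id: str) -> dict:
    """Build a minimal Bot Framework message activity for the test user."""
    return {"type": "message", "from": {"id": user_id}, "text": text}


def agent_replies(activities: list, user_id: str) -> list:
    """Keep the text of message activities sent by the agent, dropping the user's echo."""
    return [a.get("text", "") for a in activities
            if a.get("type") == "message" and a.get("from", {}).get("id") != user_id]


def _call(method: str, url: str, secret: str, body: Optional[dict] = None) -> dict:
    """Send one authenticated Direct Line request and decode the JSON response."""
    if body is not None:
        data = json.dumps(body).encode("utf-8")
    else:
        data = b"" if method == "POST" else None  # empty POST still sends Content-Length: 0
    request = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {secret}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = response.read()
    return json.loads(payload) if payload else {}


def ask_agent(prompt: str, secret: str, user_id: str = "fsi-bias-tester",
              wait_seconds: float = 5.0) -> list:
    """Open a conversation, send one prompt, and return the agent's reply texts."""
    conversation = _call("POST", f"{DIRECT_LINE_BASE}/conversations", secret)
    activities_url = f"{DIRECT_LINE_BASE}/conversations/{conversation['conversationId']}/activities"
    _call("POST", activities_url, secret, message_activity(prompt, user_id))
    time.sleep(wait_seconds)  # crude fixed wait; a real runner would poll using the watermark
    return agent_replies(_call("GET", activities_url, secret).get("activities", []), user_id)
```

A runner would call `ask_agent(prompt, secret)` once per dataset row, with `secret` retrieved from Key Vault at run time (for example, injected by the pipeline as an environment variable) rather than hard-coded.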
Step 5 — Orchestrate with Power Automate (Zone 3)
- In make.powerautomate.com, create a scheduled cloud flow named `FSI-2.11-QuarterlyBiasRun`.
- Trigger: Recurrence, every 90 days (Zone 3) or on agent-publish via webhook.
- Actions (suggested):
  - HTTP → trigger an Azure DevOps / GitHub Actions pipeline that runs `Invoke-FsiBiasTestSuite`.
  - Get file content from the SharePoint evidence library (results JSON written by the pipeline).
  - Parse JSON → extract the metrics summary.
  - Condition → if any metric breaches its threshold or the DI ratio is below 0.80, branch:
    - Create a high-priority work item / Planner task assigned to the Agent Owner.
    - Notify the AI Governance Lead and Compliance Officer via Teams.
  - Apply the retention label to the results file (Purview), confirming the WORM lock.
- Save and run with Test → Manually before enabling the schedule.
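The Parse JSON and Condition actions in this flow amount to a small routing rule, which can be unit-tested before it is encoded in the flow. A minimal Python sketch, assuming a hypothetical results schema in which each metric entry carries its class, value, threshold, and threshold direction; the field names and action labels are illustrative, not the output format of `Invoke-FsiBiasTestSuite`.

```python
def route_results(results: dict, min_di: float = 0.80) -> dict:
    """Decide whether a bias-test run needs escalation.

    Hypothetical schema:
      {"metrics": [{"class": str, "metric": str, "value": float,
                    "threshold": float, "direction": "max" | "min"}]}
    "max" means the value must not exceed the threshold (gaps);
    "min" means the value must not fall below it (ratios).
    """
    breaches = []
    for m in results.get("metrics", []):
        too_high = m["direction"] == "max" and m["value"] > m["threshold"]
        too_low = m["direction"] == "min" and m["value"] < m["threshold"]
        # The flow's explicit guard: any disparate-impact ratio below 0.80 escalates.
        di_fail = m["metric"] == "disparate_impact_ratio" and m["value"] < min_di
        if too_high or too_low or di_fail:
            breaches.append(f'{m["class"]}:{m["metric"]}')
    return {
        "escalate": bool(breaches),
        "breaches": breaches,
        # Mirrors the two branch actions: a work item for the Agent Owner and a Teams notice.
        "actions": ["create_work_item:AgentOwner",
                    "notify:AIGovernanceLead,ComplianceOfficer"] if breaches else [],
    }
```

Keeping the threshold direction in the data, rather than hard-coding it per metric, lets the same rule cover gaps (upper bounds) and ratios (lower bounds).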
Step 6 — Visualize in Power BI
- Create a Power BI semantic model (formerly called a dataset) over the SharePoint results JSON (or a Dataverse table if you use one).
- Build the Fairness Dashboard with:
  - Demographic parity gap by class, by quarter (line chart)
  - Disparate-impact ratio with a 0.80 reference line
  - Equalized-odds heatmap (TPR / FPR by group)
  - Open remediation items by severity, with an SLA-breach indicator
- Publish to a workspace governed by Purview sensitivity labels; restrict to AI Governance Lead, Compliance Officer, Model Risk Manager, and Agent Owners.
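Power BI visuals such as the trend line and heatmap are easiest to build over a long table (one row per quarter, class, group, and metric) rather than nested JSON. A minimal sketch that flattens results into CSV, assuming a hypothetical nested layout of quarter → class → group → metric values; Power Query can also parse JSON directly, so this is one option, not a required step.

```python
import csv
import io

FIELDS = ["quarter", "class", "group", "metric", "value"]


def flatten_results(results: dict) -> list:
    """Turn {quarter: {class: {group: {metric: value}}}} into one row per value."""
    rows = []
    for quarter, classes in sorted(results.items()):
        for pclass, groups in sorted(classes.items()):
            for group, metrics in sorted(groups.items()):
                for metric, value in sorted(metrics.items()):
                    rows.append({"quarter": quarter, "class": pclass, "group": group,
                                 "metric": metric, "value": value})
    return rows


def to_csv(rows: list) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```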
Step 7 — Configure Evidence Retention (Purview + SharePoint)
- In the Microsoft Purview portal → Records Management → File plan (or Data Lifecycle Management → Retention labels), create or reuse a label such as `FSI-Records-7yr-WORM` configured to retain items and mark them as records (record or regulatory record) for ≥7 years.
- Apply the label to the SharePoint evidence library via an auto-apply policy (target: the site path of the bias-testing library).
- Upload a sample file and verify the label took effect: the library's retention-label column should show `FSI-Records-7yr-WORM`, and the item should be locked as a record.
- For broader retention configuration across the agent program, align with Control 3.3 — Compliance and Regulatory Reporting.
Step 8 — Independent Validation Sign-off (Zone 3)
- The Model Risk Manager (independent of the model owner) reviews:
  - Methodology memo (Steps 1–3)
  - Test dataset construction and sample-size justification
  - Statistical analysis output and disparate-impact ratio
  - Remediation history and re-test evidence
- Sign-off is captured in the SharePoint evidence library as a PDF attestation (template in verification-testing.md) with the WORM retention label.
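The separation-of-duties requirement can be checked mechanically before an attestation is accepted. A minimal sketch, assuming a hypothetical attestation record that names the validator, the agent and model owners, and which of the four review items were covered; it checks independence and completeness only and does not replace the Model Risk Manager's judgment.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the four review items listed in Step 8.
REQUIRED_ITEMS = (
    "methodology_memo",
    "dataset_and_sample_size",
    "statistical_analysis_and_di_ratio",
    "remediation_and_retest",
)


@dataclass
class Attestation:
    validator: str
    agent_owner: str
    model_owner: str
    reviewed_items: set = field(default_factory=set)


def attestation_issues(a: Attestation) -> list:
    """Return reasons the attestation cannot be accepted; empty means acceptable."""
    issues = []
    owners = {a.agent_owner.strip().lower(), a.model_owner.strip().lower()}
    if a.validator.strip().lower() in owners:
        issues.append("validator is not independent of the agent or model owner")
    missing = [item for item in REQUIRED_ITEMS if item not in a.reviewed_items]
    if missing:
        issues.append("missing review items: " + ", ".join(missing))
    return issues
```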
Configuration by Governance Level
| Setting | Baseline (Zone 1) | Recommended (Zone 2) | Regulated (Zone 3) |
|---|---|---|---|
| Testing Frequency | Annual self-attestation | Pre-deployment + on material change | Pre-deployment + quarterly + on material change |
| Test Dataset Size | 500 / group | 1,000 / group | 2,000 / group + intersectional pairs |
| Metrics | Demographic parity | + Equalized odds | + Disparate-impact ratio (4/5ths) + calibration |
| Statistical Test | Optional | Chi-square / Fisher's exact | Chi-square + logistic regression with confidence intervals |
| Independent Validation | Not required | Internal peer review | Independent function (Fed SR 26-2 (formerly SR 11-7) effective challenge) |
| Documentation | Summary memo | Full report | Full report + independent attestation |
| Remediation SLA | 30 days | 14 days | Critical 24h / High 7d / Medium 30d |
| Evidence Retention | 3 years | 5 years | ≥7 years WORM (FINRA 4511 / SEC 17a-4(f)) |
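Keeping the governance-level table in code lets a proposed test plan be checked against its zone before a run is scheduled. A minimal sketch that encodes a subset of the rows above (per-group sample size, metrics, independent validation, retention years); intersectional pairs, cadence, and SLAs are left out, and the dictionary keys and plan fields are hypothetical names chosen for illustration.

```python
ZONE_REQUIREMENTS = {
    1: {"min_per_group": 500, "metrics": {"demographic_parity"},
        "independent_validation": False, "retention_years": 3},
    # Zone 2 uses internal peer review, not an independent function.
    2: {"min_per_group": 1000, "metrics": {"demographic_parity", "equalized_odds"},
        "independent_validation": False, "retention_years": 5},
    3: {"min_per_group": 2000,
        "metrics": {"demographic_parity", "equalized_odds", "disparate_impact_ratio", "calibration"},
        "independent_validation": True, "retention_years": 7},
}


def plan_gaps(zone: int, plan: dict) -> list:
    """Compare a proposed test plan against the zone's minimums."""
    req = ZONE_REQUIREMENTS[zone]
    gaps = []
    if plan.get("per_group", 0) < req["min_per_group"]:
        gaps.append(f"per-group sample below {req['min_per_group']}")
    missing = sorted(req["metrics"] - set(plan.get("metrics", [])))
    if missing:
        gaps.append("missing metrics: " + ", ".join(missing))
    if req["independent_validation"] and not plan.get("independent_validator"):
        gaps.append("no independent validator assigned")
    if plan.get("retention_years", 0) < req["retention_years"]:
        gaps.append(f"retention below {req['retention_years']} years")
    return gaps
```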
FSI Example Configuration
- Agent: Retail Banking Account-Opening Assistant
- Zone: 3 (Enterprise: customer-facing, influences account-opening decisions)
- Regulatory Scope: ECOA / Reg B, FINRA 3110, Fed SR 26-2 (formerly SR 11-7), CFPB Circular 2023-03
- Protected Classes (in scope):
  - Race (5 categories per Census)
  - Sex (incl. gender identity)
  - Age brackets: 18-25, 26-35, 36-50, 51-64, 65+
  - National Origin (10 categories; proxy via inferred location, see methodology memo)
  - Receipt of Public Assistance Income
- Test Dataset:
  - Total: 12,000 synthetic personas
  - Per-group minimum: 2,000
  - Intersectional pairs: race × sex, age × public-assistance
  - Source: Synthetic (Presidio-transformed), no production PII
- Metrics & Thresholds:
  - Demographic Parity: ≤5pp gap, p<0.05
  - Disparate Impact Ratio: ≥0.80 (four-fifths rule)
  - Equalized Odds (TPR): ≤5pp gap, p<0.05
  - Equalized Odds (FPR): ≤5pp gap, p<0.05
  - Calibration: ≤10pp gap by decile
- Cadence:
  - Pre-deployment: Required (gate in release pipeline)
  - Quarterly: Q-end + 30 days
  - On material change: Re-validation under Fed SR 26-2 (formerly SR 11-7)
- Evidence:
  - Library: SharePoint > FSI-AIGov > Bias-Testing-Evidence
  - Retention Label: FSI-Records-7yr-WORM (Purview)
  - Manifest: SHA-256 per artifact (per the PowerShell baseline)
- Sign-off:
  - Methodology approver: Compliance Officer
  - Independent validator: Model Risk Manager (separation of duties from agent owner)
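A quick arithmetic check confirms whether a total persona count can meet the per-group floor for every class at once. A minimal sketch, assuming each persona carries exactly one value for every protected class and that each intersectional cell must meet the same floor as a single group; both assumptions should be matched to the methodology memo, and `required_personas` is a hypothetical helper.

```python
def required_personas(groups_per_class: dict, per_group_min: int, pairs=()) -> int:
    """Smallest total that gives every group, and every listed intersection, the floor."""
    needed = [count * per_group_min for count in groups_per_class.values()]
    for first, second in pairs:
        # An intersection of two classes has (groups in first) x (groups in second) cells.
        needed.append(groups_per_class[first] * groups_per_class[second] * per_group_min)
    return max(needed)
```

For the example configuration, counting only the classes whose group counts are stated (5 race categories, 5 age brackets, 10 national-origin categories, and, as an assumption, 2 public-assistance values), `required_personas({"race": 5, "age": 5, "national_origin": 10, "public_assistance": 2}, 2000, pairs=[("age", "public_assistance")])` returns 20,000, driven by the national-origin class and the age × public-assistance cells.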
Validation
After completing these steps, verify:
- Protected-class scope memo signed by Compliance Officer
- Synthetic test dataset stored in WORM-labeled SharePoint library
- Fairness metrics and thresholds approved and documented
- Test harness produces JSON results + SHA-256 manifest
- Power Automate orchestrator runs end-to-end on a test execution
- Power BI dashboard renders current quarter's metrics
- Purview retention label confirmed on a sample evidence file
- Independent validation attestation present for any Zone 3 agent
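The "JSON results + SHA-256 manifest" check can be automated by re-hashing each listed artifact and comparing it with the manifest. A minimal sketch, assuming a hypothetical manifest file `manifest.sha256.json` whose `files` object maps relative paths to hex SHA-256 digests; it reports files that match, are missing, or were altered.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(evidence_dir: Path, manifest_name: str = "manifest.sha256.json") -> dict:
    """Re-hash each file listed in the manifest and sort it into ok / missing / altered."""
    manifest = json.loads((evidence_dir / manifest_name).read_text())
    report = {"ok": [], "missing": [], "altered": []}
    for rel_path, expected in sorted(manifest["files"].items()):
        path = evidence_dir / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        # Whole-file read keeps the sketch short; stream large artifacts in chunks instead.
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected.lower():
            report["altered"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report
```

An empty `missing` and `altered` list is the pass condition; anything else should be treated as an evidence-integrity finding.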
Updated: May 2026 | Version: v1.6.2 | UI Verification Status: Current