Control 2.5: Testing, Validation, and Quality Assurance
Control ID: 2.5 | Pillar: Management | Regulatory Reference: SOX 302/404, FINRA Rule 4511, FINRA Rule 3110, GLBA 501(b), OCC 2011-12 | Last UI Verified: January 2026 | Governance Levels: Baseline / Recommended / Regulated | Last Verified: 2026-02-03
Agent 365 Architecture Update
Agent 365 lifecycle management supports promotion gates that can enforce testing and validation requirements before agents move between environments. See Unified Agent Governance for promotion gate configuration.
Objective
Ensure Copilot Studio agents function correctly, securely, and fairly before production deployment through comprehensive testing: functional validation, security testing, performance benchmarking, bias detection, and user acceptance testing.
Why This Matters for FSI
- SOX 302/404: Internal control testing requires documented test procedures and results
- FINRA Rule 4511: Testing records support books-and-records requirements for validated agent behavior
- OCC 2011-12: Model validation requires independent testing of AI behavior
- FINRA Rule 3110: AI supervision requires bias testing before deployment
- GLBA 501(b): Security program requires security testing verification
Control Description
This control establishes comprehensive testing requirements for AI agents across the development lifecycle:
| Test Type | Description | FSI Application |
|---|---|---|
| Functional Testing | Verify agent responds correctly to user queries | Customer accuracy |
| Security Testing | Validate data protection and access controls | Data protection |
| Performance Testing | Verify agents respond within 3 seconds (Zone 1-2) or 2 seconds (Zone 3) for standard queries | Customer experience |
| Bias Testing | Detect and mitigate unfair treatment | FINRA 3110 compliance |
| Regression Testing | Confirm changes don't break existing functionality | Change management |
| UAT | Business validation before production deployment | Stakeholder approval |
Evaluation Gates
| Gate | Lifecycle Stage | Required Validations |
|---|---|---|
| Gate 1 | Design > Build | Business justification, risk classification |
| Gate 2 | Build > Evaluate | Prompt injection testing, authentication validation |
| Gate 3 | Evaluate > Deploy | Test pass rate >95%, UAT sign-off |
| Gate 4 | Deploy > Monitor | Compliance approval, rollback plan documented |
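As a minimal sketch, the Gate 3 (Evaluate > Deploy) criteria in the table above can be expressed as an automated promotion check. The names below (`GateResult`, `check_gate3`) are illustrative assumptions, not part of any Copilot Studio or Agent 365 API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Gate 3 check: promotion is blocked unless the
# test pass rate exceeds 95% and UAT sign-off is recorded.
@dataclass
class GateResult:
    passed: bool
    reasons: list[str]

def check_gate3(tests_passed: int, tests_total: int, uat_signed_off: bool) -> GateResult:
    reasons = []
    pass_rate = tests_passed / tests_total if tests_total else 0.0
    if pass_rate <= 0.95:
        reasons.append(f"pass rate {pass_rate:.1%} does not exceed 95%")
    if not uat_signed_off:
        reasons.append("UAT sign-off missing")
    return GateResult(passed=not reasons, reasons=reasons)

result = check_gate3(tests_passed=97, tests_total=100, uat_signed_off=True)
print(result.passed)  # True
```

In practice the same shape extends to Gates 1, 2, and 4 by swapping in their required validations; the point is that each gate's criteria are machine-checkable, which makes the gates auditable.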
Key Configuration Points
- Establish dedicated test environments with production-equivalent configuration
- Create golden datasets with known-correct question-answer pairs (Zone 2: 50+, Zone 3: 150+ entries)
- Configure Copilot Studio Agent Evaluation metrics (groundedness, relevance, coherence)
- Use Agent Evaluation with Test Sets (Preview) for structured multi-version comparison testing:
  - Create test set CSV templates with expected question-answer pairs
  - Run evaluations across agent versions with automated scoring
  - Review activity maps and thumbs-up/down feedback analytics
  - Track evaluation results over time for regression detection
- Consider the Copilot Studio Kit (GA, October 2025) from Power CAT for additional testing tools including AI content validation and KPI analysis
- Implement automated regression testing in CI/CD pipeline
- Track hallucination rate with zone-appropriate thresholds (Zone 2: <5%, Zone 3: <2%)
- Require UAT sign-off from business owners before production deployment
- Retain test evidence per regulatory requirements (Zone 2: 3 years, Zone 3: 7–10 years)
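The golden-dataset and hallucination-threshold points above can be combined into a simple regression check. This is a hedged sketch only: the CSV columns and exact-match scoring are assumptions, not the Copilot Studio test-set schema, and counting every mismatch as a hallucination is deliberately conservative:

```python
import csv
import io

# Illustrative golden dataset; real ones would hold 50+ (Zone 2) or
# 150+ (Zone 3) entries, as noted above.
GOLDEN_CSV = """question,expected_answer
What is the wire transfer limit?,USD 10000 per day
What are branch hours?,9am to 5pm Monday through Friday
"""

# Zone-appropriate thresholds from the configuration points above.
HALLUCINATION_THRESHOLDS = {"zone2": 0.05, "zone3": 0.02}

def evaluate(agent_answers: dict, zone: str) -> bool:
    """Return True if the mismatch rate is under the zone's threshold.

    Exact-match comparison is a placeholder; production scoring would use
    groundedness/relevance metrics from Agent Evaluation instead.
    """
    rows = list(csv.DictReader(io.StringIO(GOLDEN_CSV)))
    misses = sum(
        1 for r in rows
        if agent_answers.get(r["question"], "").strip() != r["expected_answer"]
    )
    return misses / len(rows) < HALLUCINATION_THRESHOLDS[zone]

answers = {
    "What is the wire transfer limit?": "USD 10000 per day",
    "What are branch hours?": "9am to 5pm Monday through Friday",
}
print(evaluate(answers, "zone3"))  # True
```

Wiring a check like this into the CI/CD pipeline gives the automated regression testing the list calls for: a failing run blocks promotion before UAT.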
February 2026 Pipeline Deadline
Pipeline target environments used for testing will be automatically enabled as Managed Environments starting February 2026. Organizations should review Control 2.1 for deadline details and licensing requirements.
Zone-Specific Requirements
| Zone | Requirement | Rationale |
|---|---|---|
| Zone 1 (Personal) | Basic functional testing; security scan only; informal documentation | Low risk, personal use |
| Zone 2 (Team) | Full test suite required; golden dataset recommended; UAT sign-off required | Shared agents require higher standards |
| Zone 3 (Enterprise) | Comprehensive testing with independent review; golden dataset mandatory; Compliance approval | Customer-facing, regulatory examination focus |
Roles & Responsibilities
| Role | Responsibility |
|---|---|
| AI Governance Lead | Define testing standards, approve test plans |
| QA Lead | Execute test suites, manage golden datasets |
| Business Owner | Complete UAT, sign off on production readiness |
| Compliance Officer | Approve Zone 3 deployment, verify regulatory tests |
Related Controls
| Control | Relationship |
|---|---|
| 2.2 - Environment Groups | Test environment classification |
| 2.3 - Change Management | Pre-deployment testing gates |
| 2.11 - Bias Testing | Fairness assessment |
| 2.18 - Conflict of Interest Testing | COI testing for agent recommendations (COI Testing Framework) |
| 2.20 - Adversarial Testing | Security testing |
Implementation Playbooks
Step-by-Step Implementation
This control has detailed playbooks for implementation, automation, testing, and troubleshooting:
- Portal Walkthrough — Step-by-step portal configuration
- PowerShell Setup — Automation scripts
- Verification & Testing — Test cases and evidence collection
- Troubleshooting — Common issues and resolutions
Verification Criteria
Confirm control effectiveness by verifying:
- Test strategy documented with zone-appropriate requirements
- Test environments configured to match production settings
- Golden dataset created with minimum entries for zone level
- Performance testing confirms response times meet targets: <3s for Zone 1-2, <2s for Zone 3
- Automated regression tests integrated in deployment pipeline
- UAT sign-off obtained from business owners
- Test evidence retained per regulatory retention policy
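For the retention item above, a small helper can flag whether a test artifact is still inside its retention window (using the lower bound of the Zone 3 range of 7-10 years). Purely illustrative; the authoritative check belongs in your records-management system:

```python
from datetime import date

# Minimum retention periods from the zone requirements above (assumption:
# Zone 3 uses the 7-year lower bound of its 7-10 year range).
RETENTION_YEARS = {"zone2": 3, "zone3": 7}

def must_retain(created: date, zone: str, today: date) -> bool:
    """True while the artifact is still within its retention window.

    Naive year arithmetic: a Feb 29 creation date would need special
    handling before date.replace() is safe.
    """
    cutoff = created.replace(year=created.year + RETENTION_YEARS[zone])
    return today < cutoff

print(must_retain(date(2024, 1, 15), "zone3", date(2026, 2, 3)))  # True
```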
Additional Resources
Regulatory Guidance:
- FINRA Regulatory Notice 15-09: Algorithmic Trading Strategies — Precedent for automated system testing and supervision; applies testing principles to AI agents
Microsoft Documentation:
- Microsoft Learn: Test your Copilot Studio agent
- Microsoft Learn: Solution checker
- Microsoft Learn: Power Platform testing guidance
- Microsoft Learn: Azure DevOps Test Plans
FINRA Notice 15-09 Testing Precedent
FINRA Regulatory Notice 15-09 (March 2015) addresses supervision of algorithmic trading strategies and provides a useful precedent for AI agent testing. Key principles include:
- Pre-deployment testing: Test strategies in controlled environments before production
- Ongoing monitoring: Continuously monitor algorithm performance
- Kill switch capability: Ability to halt algorithm operation quickly
- Change testing: Re-test after any modification
These principles apply directly to AI agent governance. Treat AI agents as "algorithms" requiring similar rigor.
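The kill-switch principle from Notice 15-09 can be sketched as a thread-safe flag that an agent gateway checks before serving each request. This is an assumption-laden toy, not a real control-plane design; in production the flag would live in an external feature-management or configuration service:

```python
import threading

class KillSwitch:
    """Minimal kill-switch sketch: operators can halt the agent at once."""

    def __init__(self):
        self._halted = threading.Event()

    def halt(self):
        # Operator action: stop the agent serving requests immediately.
        self._halted.set()

    def allows(self) -> bool:
        # Checked by the gateway on every incoming request.
        return not self._halted.is_set()

switch = KillSwitch()
print(switch.allows())  # True
switch.halt()
print(switch.allows())  # False
```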
Agent 365 SDK Testing (Preview)
Preview Notice
Microsoft Agent 365 SDK and Agent Essentials are in limited preview (Frontier program). Verify feature availability and GA timelines before implementing production controls dependent on these capabilities. Expect changes before general availability.
Agent 365 SDK provides additional testing capabilities for Blueprint-registered agents:
- CLI-based test execution for CI/CD integration
- Evaluation metrics aligned with Blueprint promotion gates
CLI-Based Test Execution:
Agent 365 SDK includes a test runner CLI for automated testing in CI/CD pipelines:
```shell
# Run agent test suite
agent365 test run --manifest ./agent-manifest.json --dataset ./golden-dataset.json

# Run specific test category
agent365 test run --category security --manifest ./agent-manifest.json

# Export test results for compliance evidence
agent365 test run --output ./test-results.json --format compliance
```
Test Categories Aligned with Blueprint Gates:
| Category | Gate | Validation |
|---|---|---|
| Functional | Build → Deploy | Response accuracy against golden dataset |
| Security | Build → Deploy | Prompt injection resistance, auth validation |
| Performance | Build → Deploy | Response time <2s (Zone 3), <3s (Zone 2) |
| Compliance | Deploy → Production | Data handling, audit logging verification |
CI/CD Integration Example (Azure DevOps):
```yaml
# azure-pipelines.yml excerpt
- task: Bash@3
  displayName: 'Run Agent 365 Tests'
  inputs:
    targetType: 'inline'  # required for an inline script (default is filePath)
    script: |
      agent365 test run \
        --manifest $(Build.SourcesDirectory)/agent-manifest.json \
        --dataset $(Build.SourcesDirectory)/tests/golden-dataset.json \
        --output $(Build.ArtifactStagingDirectory)/test-results.json \
        --threshold 95
    failOnStderr: true
```
Zone-Specific Test Requirements:
| Zone | CLI Test Requirement | Evidence Retention |
|---|---|---|
| Zone 1 | Manual testing acceptable | 90 days |
| Zone 2 | CLI tests in deployment pipeline | 3 years |
| Zone 3 | CLI tests required; blocking on failure | 7–10 years |
- Microsoft Learn: Agent 365 SDK Overview (Preview), including testing capabilities
Updated: January 2026 | Version: v1.2 | UI Verification Status: Current