
Control 2.5: Testing, Validation, and Quality Assurance

Control ID: 2.5
Pillar: Management
Regulatory Reference: SOX 302/404, FINRA 4511, FINRA Rule 3110, GLBA 501(b), OCC 2011-12
Governance Levels: Baseline / Recommended / Regulated
Last UI Verified: January 2026
Last Verified: 2026-02-03


Agent 365 Architecture Update

Agent 365 lifecycle management supports promotion gates that can enforce testing and validation requirements before agents move between environments. See Unified Agent Governance for promotion gate configuration.

Objective

Ensure Copilot Studio agents function correctly, securely, and fairly before production deployment through comprehensive testing: functional validation, security testing, performance benchmarking, bias detection, and user acceptance testing (UAT).


Why This Matters for FSI

  • SOX 302/404: Internal control testing requires documented test procedures and results
  • FINRA 4511: Testing records support books and records requirements for validated agent behavior
  • OCC 2011-12: Model validation requires independent testing of AI behavior
  • FINRA Rule 3110: AI supervision requires bias testing before deployment
  • GLBA 501(b): Security program requires security testing verification

Control Description

This control establishes comprehensive testing requirements for AI agents across the development lifecycle:

| Test Type | Description | FSI Application |
| --- | --- | --- |
| Functional Testing | Verify agent responds correctly to user queries | Customer accuracy |
| Security Testing | Validate data protection and access controls | Data protection |
| Performance Testing | Verify agents respond within 3 seconds (Zone 1-2) or 2 seconds (Zone 3) for standard queries | Customer experience |
| Bias Testing | Detect and mitigate unfair treatment | FINRA 3110 compliance |
| Regression Testing | Confirm changes don't break existing functionality | Change management |
| UAT | Business validation before production deployment | Stakeholder approval |
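The zone-specific response-time targets in the table above reduce to a simple lookup. A minimal sketch, assuming a per-zone latency budget in seconds (the thresholds come from the table; the function and constant names are illustrative, not a Copilot Studio API):

```python
# Zone 1-2 standard queries must answer within 3 seconds; Zone 3 within 2.
# Values come from the performance testing row above; names are hypothetical.
LATENCY_BUDGET_SECONDS = {1: 3.0, 2: 3.0, 3: 2.0}

def within_latency_budget(zone: int, response_seconds: float) -> bool:
    """Return True if a standard-query response meets its zone's target."""
    return response_seconds <= LATENCY_BUDGET_SECONDS[zone]

print(within_latency_budget(3, 1.8))  # Zone 3 response under 2s -> True
print(within_latency_budget(3, 2.4))  # exceeds the 2s Zone 3 budget -> False
```

A check like this would typically run against measured p95 latency from the performance test suite rather than a single observation.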

Evaluation Gates

| Gate | Lifecycle Stage | Required Validations |
| --- | --- | --- |
| Gate 1 | Design → Build | Business justification, risk classification |
| Gate 2 | Build → Evaluate | Prompt injection testing, authentication validation |
| Gate 3 | Evaluate → Deploy | Test pass rate >95%, UAT sign-off |
| Gate 4 | Deploy → Monitor | Compliance approval, rollback plan documented |
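Gate 3 combines a quantitative bar (pass rate above 95%) with a procedural one (UAT sign-off). A minimal sketch of that gate check, with hypothetical data shapes (a list of per-test pass/fail booleans and a sign-off flag):

```python
# Illustrative Gate 3 (Evaluate -> Deploy) check: pass rate must exceed 95%
# AND UAT sign-off must be recorded. Not an Agent 365 API; names are made up.
def gate3_ready(results: list[bool], uat_signed_off: bool) -> bool:
    if not results or not uat_signed_off:
        return False  # no evidence, or no business sign-off: block promotion
    pass_rate = sum(results) / len(results)
    return pass_rate > 0.95

results = [True] * 97 + [False] * 3   # 97% pass rate
print(gate3_ready(results, uat_signed_off=True))   # True
print(gate3_ready(results, uat_signed_off=False))  # blocked without UAT
```

Note that exactly 95% fails the gate, since the requirement is strictly greater than 95%.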

Key Configuration Points

  • Establish dedicated test environments with production-equivalent configuration
  • Create golden datasets with known-correct question-answer pairs (Zone 2: 50+, Zone 3: 150+ entries)
  • Configure Copilot Studio Agent Evaluation metrics (groundedness, relevance, coherence)
  • Use Agent Evaluation with Test Sets (Preview) for structured multi-version comparison testing:
    • Create test set CSV templates with expected question-answer pairs
    • Run evaluations across agent versions with automated scoring
    • Review activity maps and thumbs-up/down feedback analytics
    • Track evaluation results over time for regression detection
  • Consider the Copilot Studio Kit (GA, October 2025) from Power CAT for additional testing tools including AI content validation and KPI analysis
  • Implement automated regression testing in CI/CD pipeline
  • Track hallucination rate with zone-appropriate thresholds (Zone 2: <5%, Zone 3: <2%)
  • Require UAT sign-off from business owners before production deployment
  • Retain test evidence per regulatory requirements (Zone 2: 3 years, Zone 3: 7–10 years)
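Two of the configuration points above are numeric zone policies: golden-dataset minimum size (Zone 2: 50+, Zone 3: 150+) and the hallucination-rate ceiling (Zone 2: <5%, Zone 3: <2%). A minimal sketch of enforcing them, assuming the dataset is loaded as question-answer pairs (the function names and data shapes are illustrative):

```python
# Zone policy values come from the configuration points above;
# everything else here is a hypothetical sketch, not a product API.
GOLDEN_DATASET_MIN = {2: 50, 3: 150}    # minimum question-answer pairs
HALLUCINATION_MAX = {2: 0.05, 3: 0.02}  # rate must stay strictly below

def dataset_meets_minimum(zone: int, qa_pairs: list[tuple[str, str]]) -> bool:
    """True if the golden dataset is large enough for the zone."""
    return len(qa_pairs) >= GOLDEN_DATASET_MIN[zone]

def hallucination_within_threshold(zone: int, rate: float) -> bool:
    """True if the measured hallucination rate is under the zone ceiling."""
    return rate < HALLUCINATION_MAX[zone]

sample = [("What is the wire cutoff time?", "5 PM ET")] * 60
print(dataset_meets_minimum(2, sample))          # 60 >= 50 -> True
print(dataset_meets_minimum(3, sample))          # 60 < 150 -> False
print(hallucination_within_threshold(3, 0.015))  # 1.5% < 2% -> True
```

Checks like these belong in the CI/CD pipeline alongside the automated regression suite, so an undersized dataset or threshold breach blocks deployment rather than surfacing in an audit.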

February 2026 Pipeline Deadline

Pipeline target environments used for testing will be automatically enabled as Managed Environments starting February 2026. Organizations should review Control 2.1 for deadline details and licensing requirements.


Zone-Specific Requirements

| Zone | Requirement | Rationale |
| --- | --- | --- |
| Zone 1 (Personal) | Basic functional testing; security scan only; informal documentation | Low risk, personal use |
| Zone 2 (Team) | Full test suite required; golden dataset recommended; UAT sign-off required | Shared agents require higher standards |
| Zone 3 (Enterprise) | Comprehensive testing with independent review; golden dataset mandatory; Compliance approval | Customer-facing, regulatory examination focus |

Roles & Responsibilities

| Role | Responsibility |
| --- | --- |
| AI Governance Lead | Define testing standards, approve test plans |
| QA Lead | Execute test suites, manage golden datasets |
| Business Owner | Complete UAT, sign off on production readiness |
| Compliance Officer | Approve Zone 3 deployment, verify regulatory tests |

Control Relationships

| Control | Relationship |
| --- | --- |
| 2.2 - Environment Groups | Test environment classification |
| 2.3 - Change Management | Pre-deployment testing gates |
| 2.11 - Bias Testing | Fairness assessment |
| 2.18 - Conflict of Interest Testing | COI testing for agent recommendations (COI Testing Framework) |
| 2.20 - Adversarial Testing | Security testing |

Implementation Playbooks

Step-by-Step Implementation

This control has detailed playbooks for implementation, automation, testing, and troubleshooting:


Verification Criteria

Confirm control effectiveness by verifying:

  1. Test strategy documented with zone-appropriate requirements
  2. Test environments configured and matching production settings
  3. Golden dataset created with minimum entries for zone level
  4. Performance testing confirms response times meet targets: <3s for Zone 1-2, <2s for Zone 3
  5. Automated regression tests integrated in deployment pipeline
  6. UAT sign-off obtained from business owners
  7. Test evidence retained per regulatory retention policy

Additional Resources

Regulatory Guidance:

Microsoft Documentation:

FINRA Notice 15-09 Testing Precedent

FINRA Regulatory Notice 15-09 (March 2015) addresses supervision of algorithmic trading strategies and provides a useful precedent for AI agent testing. Key principles include:

  • Pre-deployment testing: Test strategies in controlled environments before production
  • Ongoing monitoring: Continuously monitor algorithm performance
  • Kill switch capability: Ability to halt algorithm operation quickly
  • Change testing: Re-test after any modification
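The kill-switch principle above translates naturally to agent governance: a halt flag checked before every response, so operation can be stopped quickly without redeploying. A minimal sketch, with entirely hypothetical names (this is not a Copilot Studio or Agent 365 mechanism):

```python
# Illustrative kill-switch wrapper for the FINRA 15-09 principle above.
# threading.Event makes the halt flag safe to set from a monitoring thread.
import threading

class AgentKillSwitch:
    def __init__(self) -> None:
        self._halted = threading.Event()
        self.reason: str | None = None

    def halt(self, reason: str) -> None:
        """Engage the kill switch; all subsequent responses are refused."""
        self.reason = reason
        self._halted.set()

    def guard(self, respond, query: str) -> str:
        """Route a query through the agent unless the switch is engaged."""
        if self._halted.is_set():
            return "Agent halted pending review."
        return respond(query)

switch = AgentKillSwitch()
print(switch.guard(lambda q: f"Answer to: {q}", "balance inquiry"))
switch.halt("anomalous output detected")
print(switch.guard(lambda q: f"Answer to: {q}", "balance inquiry"))
```

In practice the halt would be wired to the monitoring alerts described under "Ongoing monitoring", and the recorded reason retained as evidence for the incident review.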

These principles apply directly to AI agent governance. Treat AI agents as "algorithms" requiring similar rigor.

Agent 365 SDK Testing (Preview)

Preview Notice

Microsoft Agent 365 SDK and Agent Essentials are in limited preview (Frontier program). Verify feature availability and GA timelines before implementing production controls dependent on these capabilities. Expect changes before general availability.

Agent 365 SDK provides additional testing capabilities for Blueprint-registered agents:

  • CLI-based test execution for CI/CD integration
  • Evaluation metrics aligned with Blueprint promotion gates

CLI-Based Test Execution:

Agent 365 SDK includes a test runner CLI for automated testing in CI/CD pipelines:

# Run agent test suite
agent365 test run --manifest ./agent-manifest.json --dataset ./golden-dataset.json

# Run specific test category
agent365 test run --category security --manifest ./agent-manifest.json

# Export test results for compliance evidence
agent365 test run --output ./test-results.json --format compliance

Test Categories Aligned with Blueprint Gates:

| Category | Gate | Validation |
| --- | --- | --- |
| Functional | Build → Deploy | Response accuracy against golden dataset |
| Security | Build → Deploy | Prompt injection resistance, auth validation |
| Performance | Build → Deploy | Response time <2s (Zone 3), <3s (Zone 2) |
| Compliance | Deploy → Production | Data handling, audit logging verification |

CI/CD Integration Example (Azure DevOps):

# azure-pipelines.yml excerpt
- task: Bash@3
  displayName: 'Run Agent 365 Tests'
  inputs:
    targetType: 'inline'   # required for inline scripts; Bash@3 defaults to filePath
    script: |
      agent365 test run \
        --manifest $(Build.SourcesDirectory)/agent-manifest.json \
        --dataset $(Build.SourcesDirectory)/tests/golden-dataset.json \
        --output $(Build.ArtifactStagingDirectory)/test-results.json \
        --threshold 95
    failOnStderr: true     # failOnStderr is a Bash@3 input, so it belongs under inputs

Zone-Specific Test Requirements:

| Zone | CLI Test Requirement | Evidence Retention |
| --- | --- | --- |
| Zone 1 | Manual testing acceptable | 90 days |
| Zone 2 | CLI tests in deployment pipeline | 3 years |
| Zone 3 | CLI tests required; blocking on failure | 7–10 years |
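The retention periods above can be turned into concrete expiry dates when test evidence is filed. A minimal sketch, assuming the conservative 10-year upper bound for Zone 3 and approximating years as 365 days (exact retention arithmetic should follow your records-management policy):

```python
# Illustrative evidence-retention calculator for the zone table above.
# Zone 1: 90 days, Zone 2: 3 years, Zone 3: 7-10 years (10 used here as
# the conservative choice). Names and shapes are hypothetical.
from datetime import date, timedelta

RETENTION = {
    1: timedelta(days=90),
    2: timedelta(days=3 * 365),
    3: timedelta(days=10 * 365),
}

def retention_expiry(zone: int, test_run_date: date) -> date:
    """Earliest date the test evidence may be disposed of."""
    return test_run_date + RETENTION[zone]

print(retention_expiry(1, date(2026, 1, 15)))  # 90 days later: 2026-04-15
```

Tagging exported test results (for example, the `--format compliance` output above) with a computed expiry date makes retention sweeps auditable.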

Updated: January 2026 | Version: v1.2 | UI Verification Status: Current