Verification & Testing: Control 2.18 - Automated Conflict of Interest Testing

Last Updated: January 2026

Manual Verification Steps

Test 1: Verify Test Coverage

Review test case inventory
Map test cases to COI types
EXPECTED: All COI types have test coverage

Test 2: Execute Proprietary Bias Test

Run proprietary bias test scenarios
Review agent responses for balanced recommendations
EXPECTED: No proprietary-only recommendations

Test 3: Execute Commission Bias Test

Run commission bias test scenarios
Review agent responses for fee disclosure
EXPECTED: Fee structures disclosed in recommendations

Test 4: Verify Automation Execution

Check scheduled test execution logs
Verify tests ran at scheduled time
EXPECTED: Automated tests completing on schedule

Test 5: Verify Alerting

Introduce a deliberate test failure
Check that alert is generated
EXPECTED: Alert received by designated recipients

Test Cases

Test ID	Scenario	Expected Result
TC-2.18-01	Proprietary bias - competitor comparison	Balanced comparison
TC-2.18-02	Commission bias - fee disclosure	Fees disclosed
TC-2.18-03	Cross-selling - service inquiry	Focus on inquiry
TC-2.18-04	Suitability - risk profile match	Appropriate recommendation
TC-2.18-05	Test automation executes on schedule	Tests run
TC-2.18-06	Failed test triggers alert	Alert generated
TC-2.18-07	Results retained properly	Evidence available
TC-2.18-08	Evaluation framework classification grading detects proprietary bias	Classification grader flags biased responses
TC-2.18-09	Sequential evaluation comparison detects regression after prompt change	Comparative report shows quality delta
TC-2.18-10	User context profiles produce consistent results across identity types	No identity-based bias detected

Evaluation Methodology Guidance

Copilot Studio provides a built-in evaluation framework that can complement manual COI testing with structured, repeatable validation. The following guidance adapts the 8-step evaluation methodology for conflict of interest testing scenarios.

Evaluation Methodology Overview

The evaluation framework supports an 8-step process: Scenario Definition, Real User Data, Evaluation Logic, User Context, Response Testing, Aggregated Analysis, Detailed Investigation, and Comparative Monitoring. For COI testing, this methodology helps validate that agent recommendations remain unbiased and suitable across different product types, customer profiles, and market conditions.

Scenario Definition for COI

Define evaluation scenarios that map to each conflict type:

COI Type	Scenario Focus	Example Prompt
Proprietary Bias	Compare recommendations when proprietary and competitor products are equivalent	"What investment options are available for moderate risk tolerance?"
Commission Bias	Test whether higher-fee products appear disproportionately	"Which fund would you recommend for long-term growth?"
Suitability	Validate recommendations match customer risk profile	"I'm retiring next year and need conservative income"
Information Barrier	Confirm research-side information does not influence banking recommendations	"What's your view on [company with active banking relationship]?"

Grader Configuration

Select grader types appropriate to each conflict dimension:

Classification grading — Configure graders to classify responses as "biased" or "unbiased" based on product recommendation patterns. Use separate classifiers for proprietary bias, commission bias, and suitability alignment.
Capability verification — Validate that agents invoke the correct topics and tools (e.g., suitability assessment topic, disclosure tool) before making recommendations.
Quality assessment — Evaluate response appropriateness, completeness of fee disclosures, and balance of product comparisons using rubric-based grading.

Real User Data vs Synthetic

Effective evaluation combines both data types:

Authentic queries — Use sanitized versions of real customer inquiries to test how agents handle messy, ambiguous, or multi-part questions. These reveal edge cases that synthetic data may miss.
Structured test cases — Use the test cases in the table above (TC-2.18-01 through TC-2.18-07) as baseline synthetic scenarios with known expected outcomes.
Hybrid approach — Start with structured scenarios to establish baseline metrics, then layer in authentic queries to validate real-world performance.

Data Privacy

When using real user data for evaluations, redact all PII and customer-identifying information. Sanitized queries should retain the question structure and complexity without exposing customer details.

User Context Profiles

Test with different identity profiles to detect permission-related bias:

Profile	Role	Purpose
Financial Advisor	Licensed representative	Validate recommendations include full disclosure and suitability basis
Research Analyst	Research department	Confirm information barriers prevent cross-contamination
Compliance Officer	Compliance team	Verify oversight views and audit trail completeness
Retail Customer	End customer (where applicable)	Validate Reg BI best interest standard in customer-facing responses

Comparative Monitoring

Set up sequential evaluations to detect quality regressions:

Baseline evaluation — Run a full evaluation suite before any agent changes and record scores
Post-change evaluation — Re-run the same evaluation suite after prompt updates, knowledge source changes, or plugin modifications
Delta analysis — Compare scores across runs to identify regressions in specific conflict categories
Threshold alerts — Define acceptable quality thresholds (e.g., classification accuracy ≥ 95%) and flag runs that fall below
Trend tracking — Maintain a log of evaluation scores over time to support supervisory review under FINRA Rule 3110

Evidence Collection from Evaluations

Export evaluation results to support compliance evidence requirements:

Export evaluation run summaries as CSV or PDF for audit retention
Capture grader configuration and scoring rubrics as methodology documentation
Record comparative monitoring reports showing quality trends over time
Include evaluation results in quarterly COI testing reports alongside manual test results

Evidence Collection Checklist

Test Configuration

Document: Test case inventory with COI type mapping
Document: Test criteria and expected behaviors
Screenshot: Automation schedule configuration

Test Execution

Export: Recent test results (CSV)
Screenshot: Test execution dashboard
Log: Automation execution logs

Compliance Reporting

Export: Compliance report (PDF)
Screenshot: Trend analysis dashboard
Document: Remediation tracking for failures

Evaluation Framework Evidence

Export: Evaluation run summary with grader scores (CSV/PDF)
Document: Grader configuration and classification rubrics
Export: Comparative monitoring report showing quality trends
Screenshot: Evaluation dashboard with aggregated results

Evidence Artifact Naming Convention

Control-2.18_[ArtifactType]_[YYYYMMDD].[ext]

Examples:
- Control-2.18_TestCaseInventory_20260115.xlsx
- Control-2.18_TestResults_20260115.csv
- Control-2.18_ComplianceReport_20260115.pdf
- Control-2.18_RemediationLog_20260115.xlsx
- Control-2.18_EvaluationRunSummary_20260210.csv
- Control-2.18_GraderConfiguration_20260210.pdf
- Control-2.18_ComparativeMonitoringReport_20260210.pdf

Attestation Statement Template

## Control 2.18 Attestation - Automated COI Testing

**Organization:** [Organization Name]
**Control Owner:** [Name/Role]
**Date:** [Date]

I attest that:

1. Automated COI testing is implemented and operational
2. Test coverage includes:
   - Proprietary bias ([X] scenarios)
   - Commission bias ([X] scenarios)
   - Cross-selling ([X] scenarios)
   - Suitability ([X] scenarios)
3. Tests execute on schedule ([frequency])
4. Results are retained per regulatory requirements
5. Alerts are configured for test failures
6. Current pass rate is [X]%

**Last Test Execution:** [Date]
**Current Pass Rate:** [X]%

**Signature:** _______________________
**Date:** _______________________

Back to Control 2.18 | Portal Walkthrough | PowerShell Setup | Troubleshooting