Control 1.4: Semantic Index Governance and Scope Control
Control ID: 1.4 | Pillar: Readiness & Assessment | Regulatory Reference: GLBA §501(b), FFIEC IT Handbook (Information Security Booklet), Sarbanes-Oxley §§302/404 (where applicable to ICFR) | Last Verified: 2026-05-25 | Governance Levels: Baseline / Recommended / Regulated
Objective
Understand and govern the scope of the Microsoft 365 Semantic Index, which underpins Copilot's ability to ground responses in organizational content. This control helps organizations understand which content the Semantic Index processes across M365 workloads, how it works with Microsoft Graph to give Copilot contextual understanding, and which governance levers control indexing scope. Governing the Semantic Index supports compliance with data protection requirements by providing visibility into the content pipeline that feeds Copilot's AI capabilities.
Why This Matters for FSI
- GLBA §501(b): Understanding which customer data is processed by the Semantic Index is essential for maintaining safeguards over non-public personal information. The Semantic Index creates derived representations of content that must be governed alongside the source data.
- FFIEC IT Handbook (Information Security): Risk assessment for new technology deployments must include understanding the technology's data processing pipeline. The Semantic Index is a core component of Copilot's data processing architecture.
- Sarbanes-Oxley §§302/404 — where applicable to ICFR: Financial reporting data processed by the Semantic Index could be surfaced by Copilot in contexts outside normal financial controls. Understanding indexing scope supports internal control integrity where AI tools affect financial reporting processes.
- SEC Regulation S-P: The Semantic Index processes content that may include consumer financial information subject to privacy safeguards. Governance of indexing scope supports these safeguards.
- SR 11-7 / OCC Bulletin 2011-12 (Model Risk Management): The Semantic Index transforms raw content into vector embeddings that influence Copilot's responses. While not a traditional model, understanding this transformation pipeline aligns with model risk management principles.
Control Description
What Is the Semantic Index?
The Microsoft 365 Semantic Index is a content processing layer that enhances Microsoft Graph by creating vector embeddings (mathematical representations) of organizational content. These embeddings enable Copilot to understand content semantically rather than relying solely on keyword matching.
| Component | Role | Data Flow |
|---|---|---|
| Microsoft Graph | Provides structured access to M365 data (files, emails, chats, meetings) | Source of raw content and metadata |
| Semantic Index | Creates vector embeddings of Graph content for semantic understanding | Processes Graph content into searchable embeddings |
| Copilot Orchestrator | Combines user prompt with Semantic Index results and LLM capabilities | Retrieves relevant content via Semantic Index, sends to LLM for response generation |
| LLM (Large Language Model) | Generates responses grounded in retrieved content | Receives content context from orchestrator, produces response |
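To illustrate why embeddings can retrieve content that keyword matching misses, the following minimal sketch ranks documents by cosine similarity. The file names and three-dimensional vectors are hand-made assumptions for illustration only; they do not reflect how the Semantic Index computes, sizes, or stores embeddings.

```python
import math

# Hypothetical, hand-assigned 3-dimensional "embeddings".
# Real embeddings are produced by a model and have many more dimensions.
DOCS = {
    "Q3 earnings forecast.docx": [0.9, 0.1, 0.0],
    "Quarterly revenue outlook.pptx": [0.8, 0.2, 0.1],
    "Cafeteria menu.pdf": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def keyword_match(query, docs):
    """Return documents whose name contains any query term."""
    terms = query.lower().split()
    return [name for name in docs if any(t in name.lower() for t in terms)]


def semantic_rank(query_vec, docs, top_k=2):
    """Return the top_k documents ordered by similarity to the query vector."""
    ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
    return ranked[:top_k]


query = "earnings forecast"
query_vec = [0.85, 0.15, 0.05]  # assumed embedding of the query text

keyword_hits = keyword_match(query, DOCS)
semantic_hits = semantic_rank(query_vec, DOCS)
print("keyword:", keyword_hits)
print("semantic:", semantic_hits)
```

In this toy data, the revenue outlook deck shares no keyword with the query but sits close to it in vector space, so only the semantic ranking returns it. For governance purposes, the takeaway is that content can be retrieved even when its title or text does not match the user's wording.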
What Gets Indexed
The Semantic Index processes content across multiple M365 workloads:
| Workload | Content Indexed | Scope |
|---|---|---|
| SharePoint Online | Document content, metadata, site pages | All sites the tenant has enabled for search indexing |
| OneDrive for Business | Personal files, shared documents | All OneDrive content included in search |
| Exchange Online | Email body, subject, attachments (text extracted) | User's mailbox content |
| Microsoft Teams | Chat messages, channel messages, meeting transcripts | All Teams content the user participates in |
| Loop | Loop component content | Shared Loop workspaces |
| OneNote | Notebook pages and sections | Notebooks accessible to the user |
How Semantic Index Respects Permissions
The Semantic Index does not create a separate permission model. Access to indexed content follows the same Microsoft Graph permissions that govern direct access:
| Principle | Implementation |
|---|---|
| User-scoped access | Copilot queries the Semantic Index in the context of the calling user's permissions |
| No privilege escalation | The Semantic Index does not grant access to content the user cannot already access via Graph |
| Real-time permission evaluation | Permission checks occur at query time, not at indexing time, so permission changes take effect without re-indexing (propagation timing may vary by workload) |
| Tenant boundary | Semantic Index content is isolated per tenant with no cross-tenant access |
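The query-time evaluation principle in the table above can be sketched as a filter applied at retrieval rather than at indexing. This is a simplified model under stated assumptions: `IndexedItem`, `acl`, and `retrieve` are hypothetical names, and the in-memory dictionary stands in for the live permission source.

```python
from dataclasses import dataclass


@dataclass
class IndexedItem:
    item_id: str
    workload: str


# Hypothetical stand-in for the live permission source.
# It is kept separate from the index on purpose.
acl = {
    "doc-1": {"alice", "bob"},
    "doc-2": {"alice"},
}

index = [
    IndexedItem("doc-1", "SharePoint"),
    IndexedItem("doc-2", "SharePoint"),
]


def retrieve(user, index, acl):
    """Return only the items the calling user can access at query time."""
    return [item.item_id for item in index if user in acl.get(item.item_id, set())]


before = retrieve("bob", index, acl)
acl["doc-1"].discard("bob")  # permission revoked; the index itself is not rebuilt
after = retrieve("bob", index, acl)

print(before, after)
```

Because the permission check reads the current access list on every query, revoking access changes the result without touching the indexed items. In a real tenant, propagation timing may vary by workload, as noted in the table.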
Governance Implications
| Governance Concern | Implication | Mitigation |
|---|---|---|
| Indexing scope is broad by default | All search-indexed content is available to the Semantic Index | Use Restricted SharePoint Search (RSS, Control 1.3) to limit Copilot search scope; use Restricted Content Discovery (RCD) to exclude specific sites. The Semantic Index does not expose per-item admin controls, so scope governance operates at the site, workload, and policy level |
| Embeddings are derived data | Vector embeddings are mathematical representations of content, not copies, but they enable content retrieval | Treat embedding governance as an extension of source content governance |
| Cross-workload synthesis | Semantic Index enables Copilot to combine information from multiple workloads | Monitor via Data Security Posture Management (DSPM) Activity Explorer for unexpected cross-workload data surfacing |
| Indexing latency | New content is indexed with some delay, whereas permission changes are evaluated at query time | Factor indexing latency into content lifecycle governance |
| No selective user-level indexing opt-out | Administrators cannot exclude specific users' content from the Semantic Index while keeping Copilot enabled for those users | Use workload-level controls and Copilot license assignment to manage scope |
Controlling What Gets Indexed
While the Semantic Index does not offer granular per-item indexing controls, several governance levers affect what content enters the index:
| Control Lever | Mechanism | Scope |
|---|---|---|
| SharePoint search indexing | Sites excluded from SharePoint search are also excluded from Semantic Index | Per-site |
| Restricted SharePoint Search (RSS) | Limits Copilot grounding to allow-listed sites (does not prevent indexing, but prevents retrieval) | Tenant-wide, up to 100 sites |
| Restricted Content Discovery (RCD) | Excludes specific sites from Copilot discovery while maintaining direct access | Per-site via SharePoint Advanced Management (SAM) |
| Copilot license assignment | Users without Copilot licenses do not query the Semantic Index (but their content may still be indexed for others) | Per-user |
| Sensitivity labels + DLP | DLP policies can restrict Copilot from processing content with specific sensitivity labels | Per-label |
| Information barriers | Prevents Copilot from surfacing content across barrier boundaries | Per-segment |
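As a rough way to reason about how the site-level levers above combine, the sketch below evaluates whether a site would be available for Copilot grounding. It is a simplified model, not Microsoft's evaluation logic: the `Site` fields, the rule ordering, and the example URLs are assumptions, and the sketch ignores DLP, information barriers, and any other content paths the real features may allow.

```python
from dataclasses import dataclass

RSS_ALLOW_LIST_LIMIT = 100  # limit stated in the table above


@dataclass(frozen=True)
class Site:
    url: str
    search_indexed: bool = True  # False if excluded from SharePoint search
    rcd_enabled: bool = False    # Restricted Content Discovery applied


def allow_list_within_limit(allow_list):
    """Check the RSS allow list against the documented site limit."""
    return len(allow_list) <= RSS_ALLOW_LIST_LIMIT


def in_grounding_scope(site, rss_enabled, allow_list):
    """Return (in_scope, reason) for a site under the assumed rule ordering."""
    if not site.search_indexed:
        return False, "excluded from search indexing"
    if site.rcd_enabled:
        return False, "Restricted Content Discovery"
    if rss_enabled and site.url not in allow_list:
        return False, "not on RSS allow list"
    return True, "in scope"


# Hypothetical example tenant.
allow_list = {"https://contoso.sharepoint.com/sites/policies"}
sites = [
    Site("https://contoso.sharepoint.com/sites/policies"),
    Site("https://contoso.sharepoint.com/sites/deal-room", rcd_enabled=True),
    Site("https://contoso.sharepoint.com/sites/archive", search_indexed=False),
    Site("https://contoso.sharepoint.com/sites/marketing"),
]

for site in sites:
    print(site.url, in_grounding_scope(site, rss_enabled=True, allow_list=allow_list))
```

Recording the reason alongside the result makes it easier to document why a site is in or out of scope, which supports the decision rationale required later in this control.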
Metadata-Aware Query Readiness
The Semantic Index supports metadata-aware scoped queries at the library and folder level — Copilot can filter retrieval by SharePoint column values, content types, and managed metadata when content is properly tagged. Organizations should assess metadata readiness as a governance lever:
| Readiness Dimension | What to Assess | Copilot Impact |
|---|---|---|
| Column/metadata population | Are SharePoint library columns (document type, department, status, classification) consistently populated across in-scope sites? | Copilot can scope retrieval to specific metadata values (e.g., "approved" documents only) when metadata is populated; empty columns reduce retrieval precision |
| Content type adoption | Are SharePoint content types used to classify documents within libraries? | Content types provide structured metadata that Copilot uses for scoped queries — libraries using only the default Document content type miss this governance lever |
| Managed metadata (term store) | Is the tenant term store configured with a consistent taxonomy for key business domains? | Consistent taxonomy enables Copilot to understand organizational relationships (department hierarchy, product taxonomy, regulatory domain) and scope queries accordingly |
| Folder structure conventions | Do libraries use a consistent folder hierarchy with meaningful naming? | Copilot can scope retrieval to folder paths, but inconsistent or deeply nested structures reduce the effectiveness of folder-based scoping |
| Default column values | Are default column values configured at the library or folder level to support auto-classification? | Default values reduce the burden on content authors and improve metadata completeness, which in turn improves Copilot retrieval precision |
Assessment steps:
- Identify the top 20 SharePoint sites in Copilot scope by usage volume
- For each site, audit metadata population rates for key columns (target: ≥80% population for governance-relevant columns)
- Review content type usage — sites relying solely on the default Document content type should be prioritized for content type deployment
- Verify term store taxonomy covers the institution's primary business domains and regulatory categories
- Document metadata readiness gaps in the Copilot governance assessment and include in the remediation roadmap (see Control 1.8 for the full information architecture review)
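The population-rate and content-type checks in the assessment steps can be sketched as follows. The item dictionaries are hypothetical stand-ins for an exported library inventory; the column names and the export shape are assumptions, while the 80% target comes from the steps above.

```python
POPULATION_TARGET = 0.80  # target from the assessment steps


def population_rate(items, column):
    """Fraction of items with a non-empty value in the given column."""
    if not items:
        return 0.0
    populated = sum(1 for item in items if item.get(column) not in (None, ""))
    return populated / len(items)


def assess_site(items, columns, target=POPULATION_TARGET):
    """Summarize metadata readiness for one site's exported items."""
    rates = {column: population_rate(items, column) for column in columns}
    gaps = sorted(column for column, rate in rates.items() if rate < target)
    content_types = {item.get("ContentType", "Document") for item in items}
    return {
        "rates": rates,
        "gaps": gaps,
        # True when the library relies solely on the default Document content type.
        "default_content_type_only": content_types <= {"Document"},
    }


# Hypothetical export of five library items.
sample_items = [
    {"ContentType": "Document", "Department": "Risk", "Status": "Approved"},
    {"ContentType": "Document", "Department": "Risk", "Status": ""},
    {"ContentType": "Document", "Department": "", "Status": None},
    {"ContentType": "Document", "Department": "Compliance", "Status": "Draft"},
    {"ContentType": "Document", "Department": "Compliance", "Status": "Approved"},
]

report = assess_site(sample_items, ["Department", "Status"])
print(report)
```

In this sample, `Department` meets the target at 4 of 5 items, `Status` falls short at 3 of 5, and the library is flagged for content type deployment. The resulting gaps can feed directly into the remediation roadmap.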
Authoritative Content Management for Copilot Search
Administrators can designate SharePoint sites as authoritative content to influence how Copilot ranks and prioritizes organizational knowledge when generating responses. Authoritative sites receive higher ranking in Copilot Search results, improving the trustworthiness and relevance of AI-generated answers.
Portal: M365 Admin Center > Copilot > Search > Authoritative content
| Setting | Description | Limit |
|---|---|---|
| Authoritative sites | SharePoint sites designated as high-trust sources for Copilot Search ranking | Verify current service limits in Microsoft documentation |
| Bookmarks | Admin-defined answers that appear prominently for specific search queries | Managed from M365 Admin Center > Settings > Search & intelligence > Answers |
| Acronyms | Organization-specific acronym definitions surfaced by Copilot and Microsoft Search | Managed from M365 Admin Center > Settings > Search & intelligence > Answers |
FSI relevance:
- Policy documents: Designate SharePoint sites hosting compliance policies, written supervisory procedures, and regulatory guidance as authoritative to ensure Copilot prioritizes official firm positions over informal content
- Approved product materials: Mark sites containing approved marketing materials, product disclosures, and client-facing templates as authoritative to reduce the risk of Copilot surfacing draft or unapproved content
- Regulatory guidance: Designate sites hosting internal regulatory interpretations and examination preparation materials as authoritative sources
- Bookmarks: Create bookmarks for frequently asked compliance or operational questions so Copilot and Microsoft Search return the approved answer consistently
- Acronyms: Define firm-specific and industry acronyms (e.g., MNPI, NPI, WSP, CRD) to help Copilot interpret user prompts accurately
Organizations should review and update authoritative site designations, bookmarks, and acronyms at least quarterly to reflect changes in content governance and organizational priorities.
Copilot Surface Coverage
| Copilot Surface | Semantic Index Dependency | Governance Focus |
|---|---|---|
| Microsoft 365 Copilot Chat | Primary | Full cross-workload Semantic Index search; broadest governance scope |
| Word / Excel / PowerPoint | High | Document-level semantic search for reference and drafting |
| Outlook | High | Email semantic search for summarization and drafting |
| Teams | High | Chat and meeting content semantic search |
| SharePoint | High | Site-scoped semantic search for content discovery |
| OneDrive | Medium | Personal content semantic search |
| Loop | Medium | Collaborative content semantic search |
| Copilot Pages | High | Semantic search for AI-generated collaborative artifacts |
| Copilot Notebooks | High | Semantic search for AI-generated analytical notebooks |
| Viva | Medium | Organizational insights grounded in semantic understanding |
Governance Levels
| Level | Requirement | Rationale |
|---|---|---|
| Baseline | Document understanding of what the Semantic Index indexes across the organization's M365 workloads. Identify which workloads contain sensitive content that will be processed by the Semantic Index. Review SharePoint search indexing configuration for sites containing regulated data. | Minimum understanding of the Semantic Index data pipeline to inform governance decisions. |
| Recommended | All Baseline requirements plus: map Semantic Index scope against data classification inventory (from Control 1.1). Implement RSS or RCD controls to limit Copilot grounding scope for sensitive sites. Monitor Copilot content retrieval patterns via DSPM Activity Explorer. Document Semantic Index governance posture in Copilot governance documentation. | Provides active governance of Semantic Index scope with monitoring and documented controls. |
| Regulated | All Recommended requirements plus: conduct formal risk assessment of Semantic Index data processing for regulated data types (NPI, MNPI, PII). Document Semantic Index architecture and data flow in technology risk assessment. Evaluate embedding storage and lifecycle governance. Include Semantic Index scope in periodic Copilot governance reviews. Engage information security team in Semantic Index governance decisions. | Comprehensive understanding and governance of the Semantic Index data processing pipeline that supports examination readiness and regulatory inquiries about AI data handling. |
Setup & Configuration
Step 1: Inventory Current Search Indexing Configuration
Review which SharePoint sites are currently included in search indexing:
Navigate to SharePoint admin center > Settings > Search to review tenant-level search configuration.
For site-level control, review individual site search settings:
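# PowerShell reference (see PowerShell Setup playbook for full walkthrough):
# Inventory sites:
#   Get-SPOSite -Limit All | Select-Object Url, Title, Template
# Check Restricted Content Discovery status for one site:
#   Get-SPOSite -Identity <site-url> | Select-Object Url, RestrictContentOrgWideSearch
# Review search index exclusion status per site (Site settings > Search and offline availability)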
Step 2: Map Semantic Index Scope to Data Classification
Cross-reference the workloads and sites indexed by the Semantic Index against the data classification inventory from Control 1.1:
| Workload | Sites/Content in Index | Sensitivity Level | Governance Action Needed |
|---|---|---|---|
| SharePoint | [List sites] | [Classification] | [Action or N/A] |
| OneDrive | All user OneDrive | [Classification] | [Action or N/A] |
| Exchange | All mailboxes | [Classification] | [Action or N/A] |
| Teams | All teams/channels | [Classification] | [Action or N/A] |
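One way to populate the mapping table is to join the list of indexed sites against the Control 1.1 classification inventory and flag gaps. The sketch below assumes both inputs are available as simple Python structures; the URLs, the classification labels, and the `SENSITIVE_LEVELS` set are hypothetical and should be replaced with the institution's own taxonomy.

```python
# Assumed labels that trigger a governance review; adjust to the local scheme.
SENSITIVE_LEVELS = {"Confidential", "Highly Confidential"}


def map_scope(indexed_sites, inventory, workload="SharePoint"):
    """Build mapping-table rows for indexed sites using the classification inventory."""
    rows = []
    for url in indexed_sites:
        level = inventory.get(url)
        if level is None:
            action = "Classify site before assessing scope"
        elif level in SENSITIVE_LEVELS:
            action = "Review scope controls (Step 3)"
        else:
            action = "N/A"
        rows.append({
            "workload": workload,
            "site": url,
            "sensitivity": level or "Unclassified",
            "action": action,
        })
    return rows


# Hypothetical inputs.
indexed_sites = [
    "https://contoso.sharepoint.com/sites/finance",
    "https://contoso.sharepoint.com/sites/intranet",
    "https://contoso.sharepoint.com/sites/new-project",
]
classification_inventory = {
    "https://contoso.sharepoint.com/sites/finance": "Highly Confidential",
    "https://contoso.sharepoint.com/sites/intranet": "General",
}

rows = map_scope(indexed_sites, classification_inventory)
for row in rows:
    print(row)
```

Sites that appear in the index but not in the inventory are surfaced explicitly, since an unclassified site cannot be assessed against the scope controls in Step 3.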
Step 3: Implement Scope Controls
Based on the mapping above, implement appropriate scope controls:
- For sites with sensitive content not yet remediated: Leave the site off the RSS allow list (Control 1.3) so that Copilot does not ground on it, or apply RCD to exclude the specific site
- For workloads with regulated data: Ensure DLP policies and sensitivity labels are in place
- For content that should not be in Copilot scope: Exclude from SharePoint search indexing
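The three rules above can be expressed as a small decision helper so that recommendations are applied consistently across sites. This is an illustrative sketch only: the `SiteAssessment` fields are hypothetical flags that an assessor would set by hand, and the returned strings paraphrase the bullets rather than name specific product settings.

```python
from dataclasses import dataclass


@dataclass
class SiteAssessment:
    url: str
    sensitive_unremediated: bool = False    # sensitive content not yet remediated
    regulated_data: bool = False            # workload or site holds regulated data
    must_stay_out_of_copilot: bool = False  # content should not be in Copilot scope


def recommend_controls(assessment):
    """Return the scope controls suggested by the Step 3 rules."""
    actions = []
    if assessment.must_stay_out_of_copilot:
        actions.append("Exclude from SharePoint search indexing")
    if assessment.sensitive_unremediated:
        actions.append("Leave off RSS allow list or apply RCD")
    if assessment.regulated_data:
        actions.append("Confirm DLP policies and sensitivity labels")
    return actions or ["No additional scope control"]


# Hypothetical example.
deal_room = SiteAssessment(
    "https://contoso.sharepoint.com/sites/deal-room",
    sensitive_unremediated=True,
    regulated_data=True,
)
print(recommend_controls(deal_room))
```

A site can match more than one rule, so the helper returns a list rather than a single control.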
Step 4: Document Semantic Index Governance Posture
Create and maintain documentation that describes:
- Current Semantic Index scope across workloads
- Controls in place to govern indexing scope
- Monitoring mechanisms for Copilot content retrieval
- Decision rationale for scope inclusions and exclusions
Financial Sector Considerations
- Material Non-Public Information (MNPI): Content containing MNPI (earnings data, M&A details, trading strategies) is indexed by the Semantic Index if it resides in indexed M365 locations. Financial institutions must ensure information barriers and permission controls prevent Copilot from surfacing MNPI across restricted boundaries.
- Embedding Data Residency: The Semantic Index processes and stores embeddings within the Microsoft 365 tenant boundary. For institutions with data residency requirements, confirm that embedding processing aligns with data residency commitments in Microsoft's service agreements.
- Audit Trail for AI Processing: Regulators may inquire about what data the AI system processes. Understanding Semantic Index scope enables institutions to provide informed responses about Copilot's data processing pipeline.
- Model Risk Management Parallels: While the Semantic Index is not a model in the traditional SR 11-7 / OCC Bulletin 2011-12 sense, it transforms data in ways that affect AI outputs. Institutions with mature model risk management practices should consider whether the Semantic Index warrants inclusion in their AI/ML inventory.
- Cross-Workload Data Leakage: The Semantic Index enables Copilot to synthesize information across workloads (e.g., combining SharePoint document content with email context). This cross-workload synthesis may surface correlations that would not be apparent from any single data source, creating novel information leakage risks.
- Retention and Deletion: When source content is deleted or its retention period expires, the corresponding Semantic Index representations are expected to be removed according to Microsoft's service timelines. Organizations should use tenant testing to verify that this lifecycle alignment meets their regulatory retention requirements.
Verification Criteria
- Documentation exists describing what the Semantic Index indexes across each M365 workload in the organization
- SharePoint search indexing configuration has been reviewed and sites containing regulated data are documented
- Semantic Index scope has been mapped against the data classification inventory from Control 1.1
- Appropriate scope controls (RSS, RCD, search exclusion) are in place for sites containing sensitive or regulated content
- DSPM Activity Explorer monitoring is configured to track Copilot content retrieval patterns (Recommended and Regulated levels)
- Formal risk assessment of Semantic Index data processing has been conducted for regulated data types (Regulated level)
- Semantic Index governance posture is documented in the organization's Copilot governance documentation
- Information security team has reviewed and approved the Semantic Index governance posture (Regulated level)
- Semantic Index data flow and architecture are documented in the organization's technology risk assessment (Regulated level)
- Periodic review cadence for Semantic Index governance is established (quarterly minimum)
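The criteria above can be tracked as a checklist filtered by governance level. The sketch below covers a subset of the criteria and assumes that any criterion without a level annotation in the list applies from Baseline upward; that reading, along with the criterion identifiers, is an assumption to confirm against the Governance Levels table.

```python
LEVELS = ["Baseline", "Recommended", "Regulated"]

# Criterion identifier -> lowest level at which it applies (assumed mapping).
CRITERIA = {
    "workload_scope_documented": "Baseline",
    "search_indexing_reviewed": "Baseline",
    "dspm_monitoring_configured": "Recommended",
    "formal_risk_assessment": "Regulated",
    "infosec_approval": "Regulated",
}


def outstanding(level, evidence):
    """List criteria that apply at the given level but lack recorded evidence."""
    rank = LEVELS.index(level)
    return sorted(
        name
        for name, min_level in CRITERIA.items()
        if LEVELS.index(min_level) <= rank and not evidence.get(name, False)
    )


# Hypothetical evidence register.
evidence = {
    "workload_scope_documented": True,
    "search_indexing_reviewed": True,
}

print(outstanding("Baseline", evidence))
print(outstanding("Regulated", evidence))
```

With this evidence, nothing is outstanding at Baseline, while the Regulated level still lacks the monitoring, risk assessment, and information security approval items.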
Additional Resources
- Microsoft Learn: Semantic indexing for Microsoft 365 Copilot
- Microsoft Learn: Microsoft Graph overview
- Microsoft Learn: How Microsoft 365 Copilot works
- Microsoft Learn: Data, Privacy, and Security for Microsoft 365 Copilot
- SR 11-7 / OCC Bulletin 2011-12: Model Risk Management
- Related Controls: 1.3 Restricted SharePoint Search, 1.1 Copilot Readiness Assessment, 1.7 SharePoint Advanced Management, 3.1 Copilot Audit Logging, 4.5 Usage Analytics
- Playbooks: Portal Walkthrough, PowerShell Setup, Verification & Testing, Troubleshooting
FSI Copilot Governance Framework v1.4.0 - April 2026