Control 1.4: Semantic Index Governance and Scope Control
Control ID: 1.4
Pillar: Readiness & Assessment
Regulatory Reference: GLBA 501(b), FFIEC IT Handbook (Information Security Booklet), SOX 302/404
Last Verified: 2026-02-17
Governance Levels: Baseline / Recommended / Regulated
Objective
Understand and govern the scope of the Microsoft 365 Semantic Index, which underpins Copilot's ability to ground responses in organizational content. This control helps organizations comprehend what the Semantic Index indexes across M365 workloads, how it interacts with Microsoft Graph to provide Copilot with contextual understanding, and what governance levers are available to control indexing scope. Proper Semantic Index governance supports compliance with data protection requirements by providing visibility into the content pipeline that feeds Copilot's AI capabilities.
Why This Matters for FSI
- GLBA 501(b): Understanding which customer data the Semantic Index processes is essential for maintaining safeguards over nonpublic personal information (NPI). The Semantic Index creates derived representations of content that must be governed alongside the source data.
- FFIEC IT Handbook (Information Security): Risk assessment for new technology deployments must include understanding the technology's data processing pipeline. The Semantic Index is a core component of Copilot's data processing architecture.
- SOX 302/404: Financial reporting data processed by the Semantic Index could be surfaced by Copilot in contexts outside normal financial controls. Understanding indexing scope supports internal control integrity.
- SEC Regulation S-P: The Semantic Index processes content that may include consumer financial information subject to privacy safeguards. Governance of indexing scope supports these safeguards.
- SR 11-7 / OCC Bulletin 2011-12 (Model Risk Management): The Semantic Index transforms raw content into vector embeddings that influence Copilot's responses. Although it is not a traditional model, understanding this transformation pipeline aligns with model risk management principles.
Control Description
What Is the Semantic Index?
The Microsoft 365 Semantic Index is a content processing layer that enhances Microsoft Graph by creating vector embeddings (mathematical representations) of organizational content. These embeddings enable Copilot to understand content semantically rather than relying solely on keyword matching.
| Component | Role | Data Flow |
|---|---|---|
| Microsoft Graph | Provides structured access to M365 data (files, emails, chats, meetings) | Source of raw content and metadata |
| Semantic Index | Creates vector embeddings of Graph content for semantic understanding | Processes Graph content into searchable embeddings |
| Copilot Orchestrator | Combines user prompt with Semantic Index results and LLM capabilities | Retrieves relevant content via Semantic Index, sends to LLM for response generation |
| LLM (Large Language Model) | Generates responses grounded in retrieved content | Receives content context from orchestrator, produces response |
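The retrieval step in this data flow can be illustrated with a deliberately simplified Python sketch. Content is stored as vectors, a query vector is compared against them by cosine similarity, and the closest matches become grounding context for the LLM. The three-dimensional vectors, the document names, and the `retrieve` function are all hypothetical. Real embeddings are model-generated and much larger, and this is not Microsoft's implementation.

```python
import math

# Hypothetical, hand-written 3-dimensional "embeddings". Real embeddings are
# model-generated and have hundreds of dimensions.
INDEX = {
    "q3-earnings-draft.docx": [0.9, 0.1, 0.0],
    "branch-holiday-schedule.docx": [0.0, 0.2, 0.9],
    "loan-loss-reserve-memo.docx": [0.8, 0.3, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, k=2):
    """Return the k document names whose vectors are closest to the query."""
    ranked = sorted(INDEX, key=lambda name: cosine(query_vector, INDEX[name]), reverse=True)
    return ranked[:k]


# A query vector for "quarterly financial results" lands near the finance
# documents even though it shares no keywords with them.
print(retrieve([0.85, 0.2, 0.05]))  # ['q3-earnings-draft.docx', 'loan-loss-reserve-memo.docx']
```

Matching is done on vector proximity rather than shared terms. This is why semantic retrieval can surface related content that a keyword search would miss.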
What Gets Indexed
The Semantic Index processes content across multiple M365 workloads:
| Workload | Content Indexed | Scope |
|---|---|---|
| SharePoint Online | Document content, metadata, site pages | All sites not excluded from search indexing |
| OneDrive for Business | Personal files, shared documents | All OneDrive content included in search |
| Exchange Online | Email body, subject, attachments (text extracted) | User's mailbox content |
| Microsoft Teams | Chat messages, channel messages, meeting transcripts | All Teams content the user participates in |
| Loop | Loop component content | Shared Loop workspaces |
| OneNote | Notebook pages and sections | Notebooks accessible to the user |
How Semantic Index Respects Permissions
The Semantic Index does not create a separate permission model. Access to indexed content follows the same Microsoft Graph permissions that govern direct access:
| Principle | Implementation |
|---|---|
| User-scoped access | Copilot queries the Semantic Index in the context of the calling user's permissions |
| No privilege escalation | The Semantic Index does not grant access to content the user cannot already access via Graph |
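| Query-time permission evaluation | Permissions are checked at query time rather than fixed at indexing time, so permission changes apply to subsequent queries once they propagate through the service, without waiting for content to be re-indexed |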
| Tenant boundary | Semantic Index content is isolated per tenant with no cross-tenant access |
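The four principles in this table can be modeled as a filter applied to retrieval candidates for the calling user. The `IndexedItem` type and `trim_for_user` function below are hypothetical and show only the concept. Each index entry carries its current access list. The user check and tenant check run when the query is made, so revoking access changes the next result without touching the index entry.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """Hypothetical index entry; it stays indexed regardless of who can read it."""
    item_id: str
    tenant_id: str
    allowed_users: set = field(default_factory=set)


def trim_for_user(candidates, user, tenant_id):
    """Keep only items the calling user can already open, inside their own tenant.

    The check runs here, at query time, against each item's current access list,
    so the function never returns anything the user could not open directly.
    """
    return [
        item.item_id
        for item in candidates
        if item.tenant_id == tenant_id and user in item.allowed_users
    ]


memo = IndexedItem("mna-memo.docx", "contoso", {"alice", "bob"})
other_tenant = IndexedItem("fabrikam-plan.docx", "fabrikam", {"alice"})

print(trim_for_user([memo, other_tenant], "bob", "contoso"))  # ['mna-memo.docx']
memo.allowed_users.discard("bob")  # access revoked; the index entry itself is unchanged
print(trim_for_user([memo, other_tenant], "bob", "contoso"))  # []
```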
Governance Implications
| Governance Concern | Implication | Mitigation |
|---|---|---|
| Indexing scope is broad by default | All search-indexed content is available to the Semantic Index | Use RSS (Control 1.3) to limit Copilot search scope; use Restricted Content Discovery to exclude specific sites |
| Embeddings are derived data | Vector embeddings are mathematical representations of content, not copies, but they enable content retrieval | Treat embedding governance as an extension of source content governance |
| Cross-workload synthesis | Semantic Index enables Copilot to combine information from multiple workloads | Monitor via DSPM Activity Explorer for unexpected cross-workload data surfacing |
| Indexing latency | New content is indexed with some delay; permission changes are evaluated at query time | Factor indexing latency into content lifecycle governance |
| No selective user-level indexing opt-out | Administrators cannot exclude specific users' content from the Semantic Index while keeping Copilot enabled for those users | Use workload-level controls and Copilot license assignment to manage scope |
Controlling What Gets Indexed
While the Semantic Index does not offer granular per-item indexing controls, several governance levers affect what content enters the index:
| Control Lever | Mechanism | Scope |
|---|---|---|
| SharePoint search indexing | Sites excluded from SharePoint search are also excluded from Semantic Index | Per-site |
| Restricted SharePoint Search (RSS) | Limits Copilot grounding to allow-listed sites (does not prevent indexing, but prevents retrieval) | Tenant-wide, up to 100 sites |
| Restricted Content Discovery (RCD) | Excludes specific sites from Copilot discovery while maintaining direct access | Per-site, via SharePoint Advanced Management (SAM) |
| Copilot license assignment | Users without Copilot licenses do not query the Semantic Index (but their content may still be indexed for others) | Per-user |
| Sensitivity labels + DLP | DLP policies can restrict Copilot from processing content with specific sensitivity labels | Per-label |
| Information barriers | Prevents Copilot from surfacing content across barrier boundaries | Per-segment |
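The site-level levers in this table can be combined into one decision. The Python sketch below is a simplified model with hypothetical names (`SiteControls`, `copilot_can_ground_on`, `validate_rss_allowed_list`), and it is not how the service evaluates requests. It does not model item-level DLP or information barriers. It also omits the user-specific content that real RSS behavior still admits, such as a user's own OneDrive and files they have recently worked with.

```python
from dataclasses import dataclass

# RSS allowed-list limit, as stated in the table above.
RSS_ALLOWED_LIST_LIMIT = 100


@dataclass
class SiteControls:
    """Hypothetical per-site settings collected during the Step 1 inventory."""
    url: str
    search_excluded: bool = False  # hidden from SharePoint search
    rcd_enabled: bool = False      # Restricted Content Discovery applied


def validate_rss_allowed_list(allowed_sites):
    """Raise ValueError if the RSS allowed list exceeds the 100-site limit."""
    if len(allowed_sites) > RSS_ALLOWED_LIST_LIMIT:
        raise ValueError(
            f"RSS allows at most {RSS_ALLOWED_LIST_LIMIT} sites; got {len(allowed_sites)}"
        )


def copilot_can_ground_on(site, user_licensed, rss_enabled, rss_allowed_sites):
    """Simplified model of whether a site's content can ground a user's Copilot response.

    Not modeled: sensitivity-label DLP and information barriers (item and segment
    level), and the user-specific content that RSS still admits.
    """
    if not user_licensed:
        return False  # unlicensed users do not query the Semantic Index at all
    if site.search_excluded:
        return False  # never enters the search index, so never in the Semantic Index
    if site.rcd_enabled:
        return False  # hidden from Copilot discovery; direct access is unaffected
    if rss_enabled and site.url not in rss_allowed_sites:
        return False  # still indexed, but retrieval is limited to the allowed list
    return True
```

The checks are ordered from broadest to narrowest, so the first lever that blocks a site determines the result. When documenting why a site is out of scope, record that lever.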
Copilot Surface Coverage
| Copilot Surface | Semantic Index Dependency | Governance Focus |
|---|---|---|
| Microsoft 365 Copilot Chat | Primary | Full cross-workload Semantic Index search; broadest governance scope |
| Word / Excel / PowerPoint | High | Document-level semantic search for reference and drafting |
| Outlook | High | Email semantic search for summarization and drafting |
| Teams | High | Chat and meeting content semantic search |
| SharePoint | High | Site-scoped semantic search for content discovery |
| OneDrive | Medium | Personal content semantic search |
| Loop | Medium | Collaborative content semantic search |
| Copilot Pages | High | Semantic search for AI-generated collaborative artifacts |
| Viva | Medium | Organizational insights grounded in semantic understanding |
Governance Levels
| Level | Requirement | Rationale |
|---|---|---|
| Baseline | Document understanding of what the Semantic Index indexes across the organization's M365 workloads. Identify which workloads contain sensitive content that will be processed by the Semantic Index. Review SharePoint search indexing configuration for sites containing regulated data. | Minimum understanding of the Semantic Index data pipeline to inform governance decisions. |
| Recommended | All Baseline requirements plus: map Semantic Index scope against data classification inventory (from Control 1.1). Implement RSS or RCD controls to limit Copilot grounding scope for sensitive sites. Monitor Copilot content retrieval patterns via DSPM Activity Explorer. Document Semantic Index governance posture in Copilot governance documentation. | Provides active governance of Semantic Index scope with monitoring and documented controls. |
| Regulated | All Recommended requirements plus: conduct formal risk assessment of Semantic Index data processing for regulated data types (NPI, MNPI, PII). Document Semantic Index architecture and data flow in technology risk assessment. Evaluate embedding storage and lifecycle governance. Include Semantic Index scope in periodic Copilot governance reviews. Engage information security team in Semantic Index governance decisions. | Comprehensive understanding and governance of the Semantic Index data processing pipeline that supports examination readiness and regulatory inquiries about AI data handling. |
Setup & Configuration
Step 1: Inventory Current Search Indexing Configuration
Review which SharePoint sites are currently included in search indexing:
Navigate to SharePoint Admin Center > Settings > Search to review tenant-level search configuration.
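For site-level control, review each site's search setting ("Allow this site to appear in search results", exposed as the web's NoCrawl property). Get-PnPWeb reads the site in the current PnP connection, so repeat it for each site URL:

```powershell
# PowerShell reference (see Playbook 1.4.2); assumes Connect-SPOService and Connect-PnPOnline have already run.
Get-SPOSite -Limit All | Select-Object Url, Template
# NoCrawl = True means the site is excluded from search, and therefore from the Semantic Index.
Get-PnPWeb -Includes NoCrawl | Select-Object Url, NoCrawl
```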
Step 2: Map Semantic Index Scope to Data Classification
Cross-reference the workloads and sites indexed by the Semantic Index against the data classification inventory from Control 1.1:
| Workload | Sites/Content in Index | Sensitivity Level | Governance Action Needed |
|---|---|---|---|
| SharePoint | [List sites] | [Classification] | [Action or N/A] |
| OneDrive | All user OneDrive | [Classification] | [Action or N/A] |
| Exchange | All mailboxes | [Classification] | [Action or N/A] |
| Teams | All teams/channels | [Classification] | [Action or N/A] |
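Once both inventories are exported, part of the SharePoint row of this mapping can be automated. The sketch below uses two inline CSV exports with hypothetical columns: Url and NoCrawl for the site inventory, and Url and Sensitivity for the Control 1.1 classification inventory. The URLs and labels are illustrative sample data. Sites missing from the classification inventory are flagged as UNCLASSIFIED instead of being treated as low risk.

```python
import csv
import io

# Hypothetical exports; column names and rows are illustrative only.
SITE_INVENTORY_CSV = """Url,NoCrawl
https://contoso.sharepoint.com/sites/treasury,False
https://contoso.sharepoint.com/sites/hr-archive,True
https://contoso.sharepoint.com/sites/marketing,False
"""

CLASSIFICATION_CSV = """Url,Sensitivity
https://contoso.sharepoint.com/sites/treasury,Regulated
https://contoso.sharepoint.com/sites/hr-archive,Confidential
"""


def map_scope(site_csv, classification_csv):
    """Return one row per site: whether it is search-indexed and its classification."""
    labels = {
        row["Url"]: row["Sensitivity"]
        for row in csv.DictReader(io.StringIO(classification_csv))
    }
    rows = []
    for site in csv.DictReader(io.StringIO(site_csv)):
        rows.append({
            "url": site["Url"],
            # NoCrawl = True hides the site from search, and so from the Semantic Index.
            "in_index": site["NoCrawl"].strip().lower() != "true",
            # A missing classification is an inventory gap, not a "safe" default.
            "sensitivity": labels.get(site["Url"], "UNCLASSIFIED"),
        })
    return rows


for row in map_scope(SITE_INVENTORY_CSV, CLASSIFICATION_CSV):
    print(row)
```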
Step 3: Implement Scope Controls
Based on the mapping above, implement appropriate scope controls:
- For sites with sensitive content not yet remediated: Omit the site from the RSS allowed list or apply RCD
- For workloads with regulated data: Ensure DLP policies and sensitivity labels are in place
- For content that should not be in Copilot scope: Exclude from SharePoint search indexing
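The three rules in this step can be encoded as a lookup from a site's status to its actions, which helps keep the "Governance Action Needed" column consistent across reviewers. The function name, its parameters, and the classification values ("Confidential", "Regulated") are hypothetical. Substitute the institution's own scheme from Control 1.1. Here `remediated` means the site's oversharing and permission issues have been fixed.

```python
# Hypothetical classification values treated as sensitive.
SENSITIVE_LEVELS = {"Confidential", "Regulated"}


def recommend_actions(in_index, sensitivity, remediated, belongs_in_copilot_scope=True):
    """Translate the Step 3 rules into a list of actions for one site."""
    if not in_index:
        return ["N/A (not search-indexed)"]
    if not belongs_in_copilot_scope:
        # Rule 3: content that should never ground Copilot is removed from search.
        return ["Exclude from SharePoint search indexing"]
    actions = []
    if sensitivity in SENSITIVE_LEVELS and not remediated:
        # Rule 1: RSS is an allow-list, so "excluding" a site means leaving it off.
        actions.append("Omit from RSS allowed list or apply RCD")
    if sensitivity == "Regulated":
        # Rule 2: regulated data needs DLP and labels whether or not RSS/RCD apply.
        actions.append("Confirm DLP policies and sensitivity labels are in place")
    return actions or ["N/A"]


print(recommend_actions(True, "Regulated", remediated=False))
```

A regulated, unremediated site receives both actions, because the rules are independent rather than alternatives.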
Step 4: Document Semantic Index Governance Posture
Create and maintain documentation that describes:
- Current Semantic Index scope across workloads
- Controls in place to govern indexing scope
- Monitoring mechanisms for Copilot content retrieval
- Decision rationale for scope inclusions and exclusions
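Keeping the posture as a structured record makes it easier to version, diff between reviews, and check for gaps. The sketch below defines a hypothetical record with one field for each of the four documentation elements, serializes it to JSON, and reports elements that are still empty. The class names, field names, and sample values are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class ScopeDecision:
    target: str     # site URL or workload name
    decision: str   # e.g., "included", "excluded via RCD"
    rationale: str


@dataclass
class SemanticIndexPosture:
    """Hypothetical posture record covering the four Step 4 elements."""
    as_of: str
    scope_by_workload: dict = field(default_factory=dict)
    scope_controls: list = field(default_factory=list)
    monitoring: list = field(default_factory=list)
    decisions: list = field(default_factory=list)

    def missing_elements(self):
        """Names of documentation elements that are still empty."""
        required = ["scope_by_workload", "scope_controls", "monitoring", "decisions"]
        return [name for name in required if not getattr(self, name)]

    def to_json(self):
        # asdict also converts the nested ScopeDecision entries.
        return json.dumps(asdict(self), indent=2)


posture = SemanticIndexPosture(
    as_of=date(2026, 2, 17).isoformat(),
    scope_by_workload={"SharePoint": "All search-indexed sites", "Exchange": "All mailboxes"},
    scope_controls=["RSS enabled", "RCD on M&A sites"],
    monitoring=["DSPM Activity Explorer review"],
    decisions=[ScopeDecision("https://contoso.sharepoint.com/sites/mna", "excluded via RCD", "Contains MNPI")],
)
print(posture.missing_elements())  # []
print(posture.to_json())
```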
Financial Sector Considerations
- Material Non-Public Information (MNPI): Content containing MNPI (earnings data, M&A details, trading strategies) is indexed by the Semantic Index if it resides in indexed M365 locations. Financial institutions must ensure information barriers and permission controls prevent Copilot from surfacing MNPI across restricted boundaries.
- Embedding Data Residency: The Semantic Index processes and stores embeddings within the Microsoft 365 tenant boundary. For institutions with data residency requirements, confirm that embedding processing aligns with data residency commitments in Microsoft's service agreements.
- Audit Trail for AI Processing: Regulators may inquire about what data the AI system processes. Understanding Semantic Index scope enables institutions to provide informed responses about Copilot's data processing pipeline.
- Model Risk Management Parallels: While the Semantic Index is not a model in the traditional SR 11-7 / OCC Bulletin 2011-12 sense, it transforms data in ways that affect AI outputs. Institutions with mature model risk management practices should consider whether the Semantic Index warrants inclusion in their AI/ML inventory.
- Cross-Workload Data Leakage: The Semantic Index enables Copilot to synthesize information across workloads (e.g., combining SharePoint document content with email context). This cross-workload synthesis may surface correlations that would not be apparent from any single data source, creating novel information leakage risks.
- Retention and Deletion: When source content is deleted or retention policies expire, the corresponding Semantic Index embeddings are also removed. Verify that this lifecycle alignment meets regulatory retention requirements.
Verification Criteria
- Documentation exists describing what the Semantic Index indexes across each M365 workload in the organization
- SharePoint search indexing configuration has been reviewed and sites containing regulated data are documented
- Semantic Index scope has been mapped against the data classification inventory from Control 1.1
- Appropriate scope controls (RSS, RCD, search exclusion) are in place for sites containing sensitive or regulated content
- DSPM Activity Explorer monitoring is configured to track Copilot content retrieval patterns (Recommended and Regulated levels)
- Formal risk assessment of Semantic Index data processing has been conducted for regulated data types (Regulated level)
- Semantic Index governance posture is documented in the organization's Copilot governance documentation
- Information security team has reviewed and approved the Semantic Index governance posture (Regulated level)
- Semantic Index data flow and architecture are documented in the organization's technology risk assessment (Regulated level)
- Periodic review cadence for Semantic Index governance is established (quarterly minimum)
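Because the criteria apply cumulatively by governance level, a simple tracker can list which ones are still open for a target level. In the sketch below, each criterion is assigned a minimum level. Where a criterion above carries no level tag, the level comes from the Governance Levels table (for example, scope mapping is a Recommended requirement and periodic review is a Regulated one). That mapping is an interpretation, not part of the control text. The criterion wording is abbreviated and the function names are hypothetical.

```python
LEVELS = ["Baseline", "Recommended", "Regulated"]

# (abbreviated criterion, minimum level); untagged levels inferred from the
# Governance Levels table, which is an assumption.
CRITERIA = [
    ("Workload indexing scope documented", "Baseline"),
    ("Search indexing reviewed; regulated-data sites documented", "Baseline"),
    ("Scope mapped to Control 1.1 classification inventory", "Recommended"),
    ("RSS/RCD/search exclusion in place for sensitive sites", "Recommended"),
    ("DSPM Activity Explorer monitoring configured", "Recommended"),
    ("Formal risk assessment for regulated data types", "Regulated"),
    ("Posture in Copilot governance documentation", "Recommended"),
    ("Information security review and approval", "Regulated"),
    ("Data flow in technology risk assessment", "Regulated"),
    ("Periodic review cadence (quarterly minimum)", "Regulated"),
]


def applicable(level):
    """Criteria required at `level`, including those inherited from lower levels."""
    cutoff = LEVELS.index(level)
    return [name for name, minimum in CRITERIA if LEVELS.index(minimum) <= cutoff]


def open_items(level, evidence):
    """Applicable criteria with no evidence reference recorded yet."""
    return [name for name in applicable(level) if not evidence.get(name)]


print(len(applicable("Baseline")), len(applicable("Recommended")), len(applicable("Regulated")))  # 2 6 10
```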
Additional Resources
- Microsoft Learn: Semantic Index for Copilot
- Microsoft Learn: Microsoft Graph overview
- Microsoft Learn: How Microsoft 365 Copilot works
- Microsoft Learn: Data, Privacy, and Security for Microsoft 365 Copilot
- Federal Reserve SR 11-7 / OCC Bulletin 2011-12: Supervisory Guidance on Model Risk Management
- Related Controls: 1.3 Restricted SharePoint Search, 1.1 Copilot Readiness Assessment, 1.7 SharePoint Advanced Management, 3.1 Copilot Audit Logging, 4.5 Usage Analytics
- Playbooks: Playbook 1.4.1 (Semantic Index Architecture Briefing), Playbook 1.4.2 (Search Index Audit Scripts), Playbook 1.4.3 (Scope Control Configuration), Playbook 1.4.4 (Semantic Index Risk Assessment Template)
FSI Copilot Governance Framework v1.2.1 - March 2026