Control 1.4: Semantic Index Governance and Scope Control

Control ID: 1.4
Pillar: Readiness & Assessment
Regulatory Reference: GLBA 501(b), FFIEC IT Handbook (Information Security Booklet), SOX 302/404
Last Verified: 2026-02-17
Governance Levels: Baseline / Recommended / Regulated

Objective

Understand and govern the scope of the Microsoft 365 Semantic Index, which underpins Copilot's ability to ground responses in organizational content. This control helps organizations comprehend what the Semantic Index indexes across M365 workloads, how it interacts with Microsoft Graph to provide Copilot with contextual understanding, and what governance levers are available to control indexing scope. Proper Semantic Index governance supports compliance with data protection requirements by providing visibility into the content pipeline that feeds Copilot's AI capabilities.

Why This Matters for FSI

GLBA 501(b): Understanding which customer data is processed by the Semantic Index is essential for maintaining safeguards over nonpublic personal information (NPI). The Semantic Index creates derived representations of content that must be governed alongside the source data.
FFIEC IT Handbook (Information Security): Risk assessment for new technology deployments must include understanding the technology's data processing pipeline. The Semantic Index is a core component of Copilot's data processing architecture.
SOX 302/404: Financial reporting data processed by the Semantic Index could be surfaced by Copilot in contexts outside normal financial controls. Understanding indexing scope supports internal control integrity.
SEC Regulation S-P: The Semantic Index processes content that may include consumer financial information subject to privacy safeguards. Governance of indexing scope supports these safeguards.
Federal Reserve SR 11-7 / OCC Bulletin 2011-12 (Model Risk Management): The Semantic Index transforms raw content into vector embeddings that influence Copilot's responses. While not a traditional model, understanding this transformation pipeline aligns with model risk management principles.

Control Description

What Is the Semantic Index?

The Microsoft 365 Semantic Index is a content processing layer that enhances Microsoft Graph by creating vector embeddings (mathematical representations) of organizational content. These embeddings enable Copilot to understand content semantically rather than relying solely on keyword matching.

Component	Role	Data Flow
Microsoft Graph	Provides structured access to M365 data (files, emails, chats, meetings)	Source of raw content and metadata
Semantic Index	Creates vector embeddings of Graph content for semantic understanding	Processes Graph content into searchable embeddings
Copilot Orchestrator	Combines user prompt with Semantic Index results and LLM capabilities	Retrieves relevant content via Semantic Index, sends to LLM for response generation
LLM (Large Language Model)	Generates responses grounded in retrieved content	Receives content context from orchestrator, produces response
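
To illustrate why embeddings can retrieve content that keyword matching misses, here is a minimal Python sketch. The three-dimensional vectors are hand-made toy values, not output from any Microsoft model, and the function name, document titles, and query are hypothetical choices made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (assumed values): axes loosely mean [finance, HR, travel].
documents = {
    "Q3 earnings summary": [0.9, 0.1, 0.0],
    "Vacation policy": [0.0, 0.8, 0.6],
}
query_text = "quarterly revenue results"
query_embedding = [0.8, 0.2, 0.1]

# Keyword matching: no word of the query appears in either title.
query_words = set(query_text.lower().split())
keyword_hits = [
    title for title in documents
    if query_words & set(title.lower().split())
]

# Semantic matching: rank titles by embedding similarity to the query.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_embedding, documents[title]),
    reverse=True,
)
```

In this toy data the keyword search finds nothing, while the similarity ranking places the earnings document first because its vector points in nearly the same direction as the query's.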

What Gets Indexed

The Semantic Index processes content across multiple M365 workloads:

Workload	Content Indexed	Scope
SharePoint Online	Document content, metadata, site pages	All sites the tenant has enabled for search indexing
OneDrive for Business	Personal files, shared documents	All OneDrive content included in search
Exchange Online	Email body, subject, attachments (text extracted)	User's mailbox content
Microsoft Teams	Chat messages, channel messages, meeting transcripts	All Teams content the user participates in
Loop	Loop component content	Shared Loop workspaces
OneNote	Notebook pages and sections	Notebooks accessible to the user

How Semantic Index Respects Permissions

The Semantic Index does not create a separate permission model. Access to indexed content follows the same Microsoft Graph permissions that govern direct access:

Principle	Implementation
User-scoped access	Copilot queries the Semantic Index in the context of the calling user's permissions
No privilege escalation	The Semantic Index does not grant access to content the user cannot already access via Graph
Real-time permission evaluation	Permission checks occur at query time, not at indexing time, so permission changes take effect immediately
Tenant boundary	Semantic Index content is isolated per tenant with no cross-tenant access
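
The principles in the table above can be sketched as a query-time filter. This is an illustrative model only, not the product's implementation: `IndexedItem`, `trim_results`, and the in-memory access-control dictionary are hypothetical stand-ins for the index and for Microsoft Graph permissions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedItem:
    title: str
    tenant_id: str

def trim_results(candidates, user, tenant_id, current_acl):
    """Return only the candidates the user may read at query time.

    current_acl maps title -> set of users and is consulted on every call,
    so a permission change is reflected without re-indexing the item.
    """
    visible = []
    for item in candidates:
        if item.tenant_id != tenant_id:
            continue  # tenant boundary: never return another tenant's items
        if user in current_acl.get(item.title, set()):
            visible.append(item)
    return visible

index = [
    IndexedItem("Board minutes", "contoso"),
    IndexedItem("Lunch menu", "contoso"),
    IndexedItem("Partner memo", "fabrikam"),
]
acl = {
    "Board minutes": {"alice"},
    "Lunch menu": {"alice", "bob"},
    "Partner memo": {"bob"},
}

before = [i.title for i in trim_results(index, "bob", "contoso", acl)]
acl["Board minutes"].add("bob")  # access granted after the item was indexed
after = [i.title for i in trim_results(index, "bob", "contoso", acl)]
```

Here `before` contains only the lunch menu and `after` also contains the board minutes, while the other tenant's memo is never returned to a Contoso query regardless of its access list.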

Governance Implications

Governance Concern	Implication	Mitigation
Indexing scope is broad by default	All search-indexed content is available to the Semantic Index	Use Restricted SharePoint Search (RSS, Control 1.3) to limit Copilot search scope; use Restricted Content Discovery (RCD) to exclude specific sites
Embeddings are derived data	Vector embeddings are mathematical representations of content, not copies, but they enable content retrieval	Treat embedding governance as an extension of source content governance
Cross-workload synthesis	Semantic Index enables Copilot to combine information from multiple workloads	Monitor via Data Security Posture Management (DSPM) Activity Explorer for unexpected cross-workload data surfacing
Indexing latency	New content is indexed with some delay; permission changes are evaluated in real-time	Factor indexing latency into content lifecycle governance
No selective user-level indexing opt-out	Administrators cannot exclude specific users' content from the Semantic Index while keeping Copilot enabled for those users	Use workload-level controls and Copilot license assignment to manage scope
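
One way to act on the cross-workload monitoring mitigation above is to group exported activity records by interaction and flag those that drew on more than one workload. The sketch below assumes a hypothetical export format: the field names `interaction_id`, `user`, and `workload` are invented for illustration and do not reflect the actual Activity Explorer schema.

```python
from collections import defaultdict

# Hypothetical export rows: one per resource referenced in a Copilot interaction.
events = [
    {"interaction_id": "i-1", "user": "alice", "workload": "SharePoint"},
    {"interaction_id": "i-1", "user": "alice", "workload": "Exchange"},
    {"interaction_id": "i-2", "user": "bob", "workload": "Teams"},
    {"interaction_id": "i-2", "user": "bob", "workload": "Teams"},
]

def cross_workload_interactions(rows, min_workloads=2):
    """Map interaction id -> sorted workloads, for interactions spanning several."""
    workloads = defaultdict(set)
    for row in rows:
        workloads[row["interaction_id"]].add(row["workload"])
    return {
        interaction: sorted(names)
        for interaction, names in workloads.items()
        if len(names) >= min_workloads
    }

flagged = cross_workload_interactions(events)
```

A flagged interaction is not a finding by itself; it is a candidate for review against the sensitivity of the sources involved.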

Controlling What Gets Indexed

While the Semantic Index does not offer granular per-item indexing controls, several governance levers affect what content enters the index:

Control Lever	Mechanism	Scope
SharePoint search indexing	Sites excluded from SharePoint search are also excluded from Semantic Index	Per-site
Restricted SharePoint Search (RSS)	Limits Copilot grounding to allow-listed sites (does not prevent indexing, but prevents retrieval)	Tenant-wide, up to 100 sites
Restricted Content Discovery (RCD)	Excludes specific sites from Copilot discovery while maintaining direct access	Per-site via SharePoint Advanced Management (SAM)
Copilot license assignment	Users without Copilot licenses do not query the Semantic Index (but their content may still be indexed for others)	Per-user
Sensitivity labels + DLP	DLP policies can restrict Copilot from processing content with specific sensitivity labels	Per-label
Information barriers	Prevents Copilot from surfacing content across barrier boundaries	Per-segment
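
The interaction of the three SharePoint levers in the table above can be sketched as a simple decision function. This is a simplified model that follows only the rules stated in the table and ignores any other product behavior; `SiteSettings`, `TenantSettings`, and `grounding_status` are hypothetical names, and the URLs are placeholders.

```python
from dataclasses import dataclass, field

RSS_MAX_ALLOWED_SITES = 100  # limit stated in the table above

@dataclass
class SiteSettings:
    url: str
    search_indexed: bool = True  # False: site excluded from SharePoint search
    rcd_enabled: bool = False    # True: Restricted Content Discovery applied

@dataclass
class TenantSettings:
    rss_enabled: bool = False
    rss_allowed_urls: set = field(default_factory=set)

    def __post_init__(self):
        if len(self.rss_allowed_urls) > RSS_MAX_ALLOWED_SITES:
            raise ValueError("RSS allowed list exceeds 100 sites")

def grounding_status(site, tenant):
    """Return (in_scope, reason) for one site under the modeled levers."""
    if not site.search_indexed:
        return False, "excluded from search indexing"
    if site.rcd_enabled:
        return False, "Restricted Content Discovery"
    if tenant.rss_enabled and site.url not in tenant.rss_allowed_urls:
        return False, "not on RSS allowed list"
    return True, "in scope"

tenant = TenantSettings(
    rss_enabled=True,
    rss_allowed_urls={"https://contoso.sharepoint.com/sites/policies"},
)
policies = SiteSettings("https://contoso.sharepoint.com/sites/policies")
deal_room = SiteSettings("https://contoso.sharepoint.com/sites/deal-room")
```

With these values, `grounding_status(policies, tenant)` reports the site as in scope, while `deal_room` is out of scope because it is not on the allowed list.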

Copilot Surface Coverage

Copilot Surface	Semantic Index Dependency	Governance Focus
Microsoft 365 Copilot Chat	Primary	Full cross-workload Semantic Index search; broadest governance scope
Word / Excel / PowerPoint	High	Document-level semantic search for reference and drafting
Outlook	High	Email semantic search for summarization and drafting
Teams	High	Chat and meeting content semantic search
SharePoint	High	Site-scoped semantic search for content discovery
OneDrive	Medium	Personal content semantic search
Loop	Medium	Collaborative content semantic search
Copilot Pages	High	Semantic search for AI-generated collaborative artifacts
Viva	Medium	Organizational insights grounded in semantic understanding

Governance Levels

Level	Requirement	Rationale
Baseline	Document understanding of what the Semantic Index indexes across the organization's M365 workloads. Identify which workloads contain sensitive content that will be processed by the Semantic Index. Review SharePoint search indexing configuration for sites containing regulated data.	Minimum understanding of the Semantic Index data pipeline to inform governance decisions.
Recommended	All Baseline requirements plus: map Semantic Index scope against data classification inventory (from Control 1.1). Implement RSS or RCD controls to limit Copilot grounding scope for sensitive sites. Monitor Copilot content retrieval patterns via DSPM Activity Explorer. Document Semantic Index governance posture in Copilot governance documentation.	Provides active governance of Semantic Index scope with monitoring and documented controls.
Regulated	All Recommended requirements plus: conduct formal risk assessment of Semantic Index data processing for regulated data types (NPI, MNPI, PII). Document Semantic Index architecture and data flow in technology risk assessment. Evaluate embedding storage and lifecycle governance. Include Semantic Index scope in periodic Copilot governance reviews. Engage information security team in Semantic Index governance decisions.	Comprehensive understanding and governance of the Semantic Index data processing pipeline that supports examination readiness and regulatory inquiries about AI data handling.

Setup & Configuration

Step 1: Inventory Current Search Indexing Configuration

Review which SharePoint sites are currently included in search indexing:

Navigate to SharePoint Admin Center > Settings > Search to review tenant-level search configuration.

For site-level control, review individual site search settings:

# PowerShell reference (see Playbook 1.4.2); reports whether Restricted Content Discovery is set on one site:
# Get-SPOSite -Identity <site-url> | Select-Object Url, RestrictContentOrgWideSearch
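
Once site settings have been exported, the inventory can be split into indexed and excluded sites for review. The following Python sketch assumes a CSV export with `Url` and `SearchIndexed` columns; both column names and the sample rows are hypothetical, and an in-memory string stands in for the exported file.

```python
import csv
import io

# Stand-in for an exported file; column names and rows are hypothetical.
export = io.StringIO(
    "Url,SearchIndexed\n"
    "https://contoso.sharepoint.com/sites/finance,True\n"
    "https://contoso.sharepoint.com/sites/archive,False\n"
)

def split_by_indexing(handle):
    """Return (indexed_urls, excluded_urls) from a CSV file-like object."""
    indexed, excluded = [], []
    for row in csv.DictReader(handle):
        is_indexed = row["SearchIndexed"].strip().lower() == "true"
        (indexed if is_indexed else excluded).append(row["Url"])
    return indexed, excluded

indexed_sites, excluded_sites = split_by_indexing(export)
```

The two lists give a starting point for the mapping exercise in Step 2.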

Step 2: Map Semantic Index Scope to Data Classification

Cross-reference the workloads and sites indexed by the Semantic Index against the data classification inventory from Control 1.1:

Workload	Sites/Content in Index	Sensitivity Level	Governance Action Needed
SharePoint	[List sites]	[Classification]	[Action or N/A]
OneDrive	All user OneDrive	[Classification]	[Action or N/A]
Exchange	All mailboxes	[Classification]	[Action or N/A]
Teams	All teams/channels	[Classification]	[Action or N/A]
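
The cross-reference described above can be sketched as a join between the indexed locations and the classification inventory. All inputs below are hypothetical sample data; the label names, the `SENSITIVE_LEVELS` set, and the action wording are assumptions that an organization would replace with its own taxonomy.

```python
# Hypothetical inputs: indexed locations (Step 1) and classifications (Control 1.1).
indexed_locations = [
    ("SharePoint", "sites/finance-reporting"),
    ("SharePoint", "sites/cafeteria"),
    ("Teams", "teams/deal-room"),
]
classification_inventory = {
    "sites/finance-reporting": "Highly Confidential",
    "sites/cafeteria": "General",
}
SENSITIVE_LEVELS = {"Confidential", "Highly Confidential"}  # assumed label names

def build_scope_map(locations, inventory):
    """Return rows of (workload, location, sensitivity, governance action)."""
    rows = []
    for workload, location in locations:
        level = inventory.get(location, "Unclassified")
        if level == "Unclassified":
            action = "Classify content first"
        elif level in SENSITIVE_LEVELS:
            action = "Review scope controls"
        else:
            action = "N/A"
        rows.append((workload, location, level, action))
    return rows

scope_map = build_scope_map(indexed_locations, classification_inventory)
```

Locations missing from the inventory are surfaced explicitly as "Unclassified" rather than silently treated as low sensitivity, since an unclassified location is itself a governance gap.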

Step 3: Implement Scope Controls

Based on the mapping above, implement appropriate scope controls:

For sites with sensitive content not yet remediated: Leave the site off the RSS allowed list, or apply RCD
For workloads with regulated data: Ensure DLP policies and sensitivity labels are in place
For content that should not be in Copilot scope: Exclude from SharePoint search indexing
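
The three cases above can be expressed as a small rule function so that each mapped location receives a consistent recommendation. This is a sketch under the assumption that each location has already been assessed against three yes/no questions; the function and parameter names are hypothetical.

```python
def recommend_controls(unremediated_sensitive, regulated_data, must_be_out_of_scope):
    """Return the scope controls suggested by the three cases listed above."""
    controls = []
    if must_be_out_of_scope:
        controls.append("Exclude from SharePoint search indexing")
    if unremediated_sensitive:
        controls.append("Leave off RSS allowed list or apply RCD")
    if regulated_data:
        controls.append("Confirm DLP policies and sensitivity labels")
    return controls or ["No additional scope control"]

deal_room_controls = recommend_controls(
    unremediated_sensitive=True,
    regulated_data=True,
    must_be_out_of_scope=False,
)
```

The cases are not mutually exclusive, so a single location can receive more than one recommendation.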

Step 4: Document Semantic Index Governance Posture

Create and maintain documentation that describes:

Current Semantic Index scope across workloads
Controls in place to govern indexing scope
Monitoring mechanisms for Copilot content retrieval
Decision rationale for scope inclusions and exclusions
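
Keeping this documentation as a structured record makes it easier to version and compare between reviews. The sketch below is one possible shape, not a prescribed schema: the `GovernancePosture` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class GovernancePosture:
    """One record covering the four documentation areas listed above."""
    scope_by_workload: dict  # workload -> description of indexed scope
    controls_in_place: list  # scope controls currently applied
    monitoring: list         # how Copilot content retrieval is monitored
    decisions: list = field(default_factory=list)  # inclusion/exclusion rationale

posture = GovernancePosture(
    scope_by_workload={"SharePoint": "All search-indexed sites"},
    controls_in_place=["RCD on sites/deal-room"],
    monitoring=["DSPM Activity Explorer review"],
    decisions=[
        {"item": "sites/deal-room", "decision": "exclude", "rationale": "Contains MNPI"}
    ],
)

# Serialize to JSON so the record can be stored and diffed between reviews.
posture_json = json.dumps(asdict(posture), indent=2, sort_keys=True)
```

Sorting keys on output keeps successive versions of the record stable, so a diff shows only genuine changes in posture.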

Financial Sector Considerations

Material Non-Public Information (MNPI): Content containing MNPI (earnings data, M&A details, trading strategies) is indexed by the Semantic Index if it resides in indexed M365 locations. Financial institutions must ensure information barriers and permission controls prevent Copilot from surfacing MNPI across restricted boundaries.
Embedding Data Residency: The Semantic Index processes and stores embeddings within the Microsoft 365 tenant boundary. For institutions with data residency requirements, confirm that embedding processing aligns with data residency commitments in Microsoft's service agreements.
Audit Trail for AI Processing: Regulators may inquire about what data the AI system processes. Understanding Semantic Index scope enables institutions to provide informed responses about Copilot's data processing pipeline.
Model Risk Management Parallels: While the Semantic Index is not a model in the traditional SR 11-7 / OCC Bulletin 2011-12 sense, it transforms data in ways that affect AI outputs. Institutions with mature model risk management practices should consider whether the Semantic Index warrants inclusion in their AI/ML inventory.
Cross-Workload Data Leakage: The Semantic Index enables Copilot to synthesize information across workloads (e.g., combining SharePoint document content with email context). This cross-workload synthesis may surface correlations that would not be apparent from any single data source, creating novel information leakage risks.
Retention and Deletion: When source content is deleted, including when a retention period expires and the content is disposed of, the corresponding Semantic Index embeddings are also removed. Verify that this lifecycle alignment meets regulatory retention requirements.

Verification Criteria

Documentation exists describing what the Semantic Index indexes across each M365 workload in the organization
SharePoint search indexing configuration has been reviewed and sites containing regulated data are documented
Semantic Index scope has been mapped against the data classification inventory from Control 1.1
Appropriate scope controls (RSS, RCD, search exclusion) are in place for sites containing sensitive or regulated content
DSPM Activity Explorer monitoring is configured to track Copilot content retrieval patterns (Recommended and Regulated levels)
Formal risk assessment of Semantic Index data processing has been conducted for regulated data types (Regulated level)
Semantic Index governance posture is documented in the organization's Copilot governance documentation
Information security team has reviewed and approved the Semantic Index governance posture (Regulated level)
Semantic Index data flow and architecture are documented in the organization's technology risk assessment (Regulated level)
Periodic review cadence for Semantic Index governance is established (quarterly minimum)
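
Because the governance levels are cumulative, the criteria above can be tracked as a checklist filtered by target level. The sketch below covers only a subset of the criteria, with wording abbreviated; the level assigned to each item follows the level tags in the list and the Governance Levels table, and the names `CRITERIA` and `outstanding` are hypothetical.

```python
LEVELS = ["Baseline", "Recommended", "Regulated"]

# (criterion, lowest level at which it applies); a subset, wording abbreviated.
CRITERIA = [
    ("Index scope documented per workload", "Baseline"),
    ("Search indexing configuration reviewed", "Baseline"),
    ("DSPM Activity Explorer monitoring configured", "Recommended"),
    ("Formal risk assessment conducted", "Regulated"),
    ("Information security review and approval", "Regulated"),
]

def outstanding(target_level, completed):
    """List criteria required at target_level that are not yet completed."""
    rank = LEVELS.index(target_level)
    return [
        name
        for name, level in CRITERIA
        if LEVELS.index(level) <= rank and name not in completed
    ]

remaining = outstanding("Recommended", {"Index scope documented per workload"})
```

An institution targeting the Regulated level would see every unmet item from all three levels, which mirrors the "All Baseline requirements plus" wording in the Governance Levels table.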

Additional Resources

Microsoft Learn: Semantic Index for Copilot
Microsoft Learn: Microsoft Graph overview
Microsoft Learn: How Microsoft 365 Copilot works
Microsoft Learn: Data, Privacy, and Security for Microsoft 365 Copilot
Federal Reserve SR 11-7 / OCC Bulletin 2011-12: Supervisory Guidance on Model Risk Management
Related Controls: 1.3 Restricted SharePoint Search, 1.1 Copilot Readiness Assessment, 1.7 SharePoint Advanced Management, 3.1 Copilot Audit Logging, 4.5 Usage Analytics
Playbooks: Playbook 1.4.1 (Semantic Index Architecture Briefing), Playbook 1.4.2 (Search Index Audit Scripts), Playbook 1.4.3 (Scope Control Configuration), Playbook 1.4.4 (Semantic Index Risk Assessment Template)

FSI Copilot Governance Framework v1.2.1 - March 2026