Control 1.4: Semantic Index Governance and Scope Control

Control ID: 1.4
Pillar: Readiness & Assessment
Regulatory Reference: GLBA 501(b), FFIEC IT Handbook (Information Security Booklet), SOX 302/404
Last Verified: 2026-02-17
Governance Levels: Baseline / Recommended / Regulated


Objective

Understand and govern the scope of the Microsoft 365 Semantic Index, which underpins Copilot's ability to ground responses in organizational content. This control helps organizations comprehend what the Semantic Index indexes across M365 workloads, how it interacts with Microsoft Graph to provide Copilot with contextual understanding, and what governance levers are available to control indexing scope. Proper Semantic Index governance supports compliance with data protection requirements by providing visibility into the content pipeline that feeds Copilot's AI capabilities.


Why This Matters for FSI

  • GLBA 501(b): Understanding which customer data is processed by the Semantic Index is essential for maintaining safeguards over non-public personal information. The Semantic Index creates derived representations of content that must be governed alongside the source data.
  • FFIEC IT Handbook (Information Security): Risk assessment for new technology deployments must include understanding the technology's data processing pipeline. The Semantic Index is a core component of Copilot's data processing architecture.
  • SOX 302/404: Financial reporting data processed by the Semantic Index could be surfaced by Copilot in contexts outside normal financial controls. Understanding indexing scope supports internal control integrity.
  • SEC Regulation S-P: The Semantic Index processes content that may include consumer financial information subject to privacy safeguards. Governance of indexing scope supports these safeguards.
  • Federal Reserve SR 11-7 / OCC Bulletin 2011-12 (Model Risk Management): The Semantic Index transforms raw content into vector embeddings that influence Copilot's responses. While the Semantic Index is not a traditional model, understanding this transformation pipeline aligns with model risk management principles.

Control Description

What Is the Semantic Index?

The Microsoft 365 Semantic Index is a content processing layer that enhances Microsoft Graph by creating vector embeddings (mathematical representations) of organizational content. These embeddings enable Copilot to understand content semantically rather than relying solely on keyword matching.

| Component | Role | Data Flow |
| --- | --- | --- |
| Microsoft Graph | Provides structured access to M365 data (files, emails, chats, meetings) | Source of raw content and metadata |
| Semantic Index | Creates vector embeddings of Graph content for semantic understanding | Processes Graph content into searchable embeddings |
| Copilot Orchestrator | Combines user prompt with Semantic Index results and LLM capabilities | Retrieves relevant content via Semantic Index, sends to LLM for response generation |
| LLM (Large Language Model) | Generates responses grounded in retrieved content | Receives content context from orchestrator, produces response |
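
To illustrate why embeddings allow matching by meaning rather than by keyword, the following minimal sketch ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors and file names are invented for illustration; the embedding model, dimensionality, and ranking logic actually used by the Semantic Index are not described here and should not be inferred from this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real embeddings have far more dimensions.
EMBEDDINGS = {
    "q3-earnings-summary.docx": [0.9, 0.1, 0.0],
    "quarterly-revenue-report.xlsx": [0.8, 0.2, 0.1],
    "office-parking-policy.pdf": [0.0, 0.1, 0.9],
}

def rank_by_similarity(query_vector, embeddings):
    """Return document names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in embeddings.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Assumed embedding of a prompt such as "how did we do financially last quarter?",
# which shares no keywords with the file names above.
ranking = rank_by_similarity([0.9, 0.05, 0.0], EMBEDDINGS)
print(ranking)
```

Under these assumed vectors, the two finance documents rank ahead of the parking policy even though the prompt contains none of their keywords, which is the behavior the governance concerns in this control are built around.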

What Gets Indexed

The Semantic Index processes content across multiple M365 workloads:

| Workload | Content Indexed | Scope |
| --- | --- | --- |
| SharePoint Online | Document content, metadata, site pages | All sites the tenant has enabled for search indexing |
| OneDrive for Business | Personal files, shared documents | All OneDrive content included in search |
| Exchange Online | Email body, subject, attachments (text extracted) | User's mailbox content |
| Microsoft Teams | Chat messages, channel messages, meeting transcripts | All Teams content the user participates in |
| Loop | Loop component content | Shared Loop workspaces |
| OneNote | Notebook pages and sections | Notebooks accessible to the user |

How Semantic Index Respects Permissions

The Semantic Index does not create a separate permission model. Access to indexed content follows the same Microsoft Graph permissions that govern direct access:

| Principle | Implementation |
| --- | --- |
| User-scoped access | Copilot queries the Semantic Index in the context of the calling user's permissions |
| No privilege escalation | The Semantic Index does not grant access to content the user cannot already access via Graph |
| Real-time permission evaluation | Permission checks occur at query time, not at indexing time, so permission changes take effect immediately |
| Tenant boundary | Semantic Index content is isolated per tenant with no cross-tenant access |
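
The query-time evaluation described above can be pictured with a small sketch. It is a simplified model, not the Microsoft Graph permission implementation: `IndexedItem`, `allowed_users`, and the sample items are hypothetical names used only to show that access is checked when a query runs, so a revoked permission affects the next query without any re-indexing.

```python
from dataclasses import dataclass, field

@dataclass
class IndexedItem:
    """Hypothetical indexed item with its current access list."""
    item_id: str
    allowed_users: set = field(default_factory=set)

def retrieve_for_user(user, candidates):
    """Keep only candidates the user can access at the moment of the query."""
    return [item.item_id for item in candidates if user in item.allowed_users]

deal_memo = IndexedItem("deal-memo", {"alice", "bob"})
handbook = IndexedItem("hr-handbook", {"alice", "bob", "carol"})
index = [deal_memo, handbook]

before = retrieve_for_user("bob", index)   # bob can currently reach both items
deal_memo.allowed_users.discard("bob")     # permission revoked; nothing is re-indexed
after = retrieve_for_user("bob", index)    # the very next query reflects the change

print(before, after)
```

The same property cuts both ways: content that is overshared at the source is retrievable for every user who has access, which is why scope controls are applied at the site and workload level.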

Governance Implications

| Governance Concern | Implication | Mitigation |
| --- | --- | --- |
| Indexing scope is broad by default | All search-indexed content is available to the Semantic Index | Use Restricted SharePoint Search (RSS, Control 1.3) to limit Copilot search scope; use Restricted Content Discovery (RCD) to exclude specific sites |
| Embeddings are derived data | Vector embeddings are mathematical representations of content, not copies, but they enable content retrieval | Treat embedding governance as an extension of source content governance |
| Cross-workload synthesis | Semantic Index enables Copilot to combine information from multiple workloads | Monitor via Data Security Posture Management (DSPM) Activity Explorer for unexpected cross-workload data surfacing |
| Indexing latency | New content is indexed with some delay; permission changes are evaluated in real time | Factor indexing latency into content lifecycle governance |
| No selective user-level indexing opt-out | Administrators cannot exclude specific users' content from the Semantic Index while keeping Copilot enabled for those users | Use workload-level controls and Copilot license assignment to manage scope |

Controlling What Gets Indexed

While the Semantic Index does not offer granular per-item indexing controls, several governance levers affect what content enters the index:

| Control Lever | Mechanism | Scope |
| --- | --- | --- |
| SharePoint search indexing | Sites excluded from SharePoint search are also excluded from Semantic Index | Per-site |
| Restricted SharePoint Search (RSS) | Limits Copilot grounding to allow-listed sites (does not prevent indexing, but prevents retrieval) | Tenant-wide, up to 100 sites |
| Restricted Content Discovery (RCD) | Excludes specific sites from Copilot discovery while maintaining direct access | Per-site via SharePoint Advanced Management (SAM) |
| Copilot license assignment | Users without Copilot licenses do not query the Semantic Index (but their content may still be indexed for others) | Per-user |
| Sensitivity labels + DLP | DLP policies can restrict Copilot from processing content with specific sensitivity labels | Per-label |
| Information barriers | Prevents Copilot from surfacing content across barrier boundaries | Per-segment |
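
How the three site-level levers above combine can be sketched as a simple decision function. This is a deliberately simplified model under stated assumptions: it ignores per-user permissions, sensitivity labels, DLP, information barriers, and any content RSS may still allow for an individual user. `SiteState`, `copilot_can_ground_on`, and the example URLs are hypothetical names, not product APIs.

```python
from dataclasses import dataclass

RSS_MAX_SITES = 100  # allow-list limit stated in the table above

@dataclass(frozen=True)
class SiteState:
    url: str
    in_search_index: bool = True   # False if the site is excluded from SharePoint search
    rcd_enabled: bool = False      # Restricted Content Discovery applied to the site

def validate_allow_list(urls):
    """Return the allow list as a set, rejecting lists above the RSS limit."""
    unique = set(urls)
    if len(unique) > RSS_MAX_SITES:
        raise ValueError(f"RSS allow list holds at most {RSS_MAX_SITES} sites")
    return unique

def copilot_can_ground_on(site, rss_enabled, rss_allow_list):
    """Simplified check of whether tenant-wide Copilot grounding may draw on a site."""
    if not site.in_search_index:
        return False   # content never enters the index
    if site.rcd_enabled:
        return False   # indexed, but excluded from Copilot discovery
    if rss_enabled and site.url not in rss_allow_list:
        return False   # indexed, but not retrievable while RSS is on
    return True

allow_list = validate_allow_list(["https://contoso.example/sites/policies"])
policies = SiteState("https://contoso.example/sites/policies")
deals = SiteState("https://contoso.example/sites/deals")
legal = SiteState("https://contoso.example/sites/legal", rcd_enabled=True)

print(copilot_can_ground_on(policies, True, allow_list))  # allow-listed site
print(copilot_can_ground_on(deals, True, allow_list))     # not on the allow list
print(copilot_can_ground_on(legal, False, set()))         # RCD applies even with RSS off
```

The ordering of the checks mirrors the distinction in the table: search exclusion keeps content out of the index, whereas RSS and RCD leave it indexed and restrict retrieval.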

Copilot Surface Coverage

| Copilot Surface | Semantic Index Dependency | Governance Focus |
| --- | --- | --- |
| Microsoft 365 Copilot Chat | Primary | Full cross-workload Semantic Index search; broadest governance scope |
| Word / Excel / PowerPoint | High | Document-level semantic search for reference and drafting |
| Outlook | High | Email semantic search for summarization and drafting |
| Teams | High | Chat and meeting content semantic search |
| SharePoint | High | Site-scoped semantic search for content discovery |
| OneDrive | Medium | Personal content semantic search |
| Loop | Medium | Collaborative content semantic search |
| Copilot Pages | High | Semantic search for AI-generated collaborative artifacts |
| Viva | Medium | Organizational insights grounded in semantic understanding |

Governance Levels

| Level | Requirement | Rationale |
| --- | --- | --- |
| Baseline | Document understanding of what the Semantic Index indexes across the organization's M365 workloads. Identify which workloads contain sensitive content that will be processed by the Semantic Index. Review SharePoint search indexing configuration for sites containing regulated data. | Minimum understanding of the Semantic Index data pipeline to inform governance decisions. |
| Recommended | All Baseline requirements plus: map Semantic Index scope against data classification inventory (from Control 1.1). Implement RSS or RCD controls to limit Copilot grounding scope for sensitive sites. Monitor Copilot content retrieval patterns via DSPM Activity Explorer. Document Semantic Index governance posture in Copilot governance documentation. | Provides active governance of Semantic Index scope with monitoring and documented controls. |
| Regulated | All Recommended requirements plus: conduct formal risk assessment of Semantic Index data processing for regulated data types (NPI, MNPI, PII). Document Semantic Index architecture and data flow in technology risk assessment. Evaluate embedding storage and lifecycle governance. Include Semantic Index scope in periodic Copilot governance reviews. Engage information security team in Semantic Index governance decisions. | Comprehensive understanding and governance of the Semantic Index data processing pipeline that supports examination readiness and regulatory inquiries about AI data handling. |

Setup & Configuration

Step 1: Inventory Current Search Indexing Configuration

Review which SharePoint sites are currently included in search indexing:

Navigate to SharePoint Admin Center > Settings > Search to review tenant-level search configuration.

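For site-level control, review each site's search settings (Site settings > Search and offline availability) and its Restricted Content Discovery state:

# PowerShell reference (see Playbook 1.4.2). RestrictContentOrgWideSearch reports
# whether Restricted Content Discovery is enabled for the site:
# Get-SPOSite -Identity <site-url> | Select-Object Url, RestrictContentOrgWideSearch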

Step 2: Map Semantic Index Scope to Data Classification

Cross-reference the workloads and sites indexed by the Semantic Index against the data classification inventory from Control 1.1:

| Workload | Sites/Content in Index | Sensitivity Level | Governance Action Needed |
| --- | --- | --- | --- |
| SharePoint | [List sites] | [Classification] | [Action or N/A] |
| OneDrive | All user OneDrive | [Classification] | [Action or N/A] |
| Exchange | All mailboxes | [Classification] | [Action or N/A] |
| Teams | All teams/channels | [Classification] | [Action or N/A] |
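
The cross-reference in this step can be sketched as a join between the list of indexed locations and the classification inventory. The sketch below is illustrative only: the inventory format, the label names in `SENSITIVE_LEVELS`, and the action wording are assumptions, since the structure of the Control 1.1 inventory is not defined in this control.

```python
SENSITIVE_LEVELS = {"Confidential", "Restricted"}  # hypothetical label names

def map_scope_to_classification(indexed_locations, classification_inventory):
    """Build one mapping row per indexed location.

    indexed_locations: list of (workload, location) tuples.
    classification_inventory: dict of location -> sensitivity level.
    """
    rows = []
    for workload, location in indexed_locations:
        level = classification_inventory.get(location)
        if level is None:
            action = "Classify before relying on scope controls"
        elif level in SENSITIVE_LEVELS:
            action = "Review scope controls"
        else:
            action = "N/A"
        rows.append({
            "workload": workload,
            "location": location,
            "sensitivity": level or "Unclassified",
            "action": action,
        })
    return rows

# Hypothetical sample data.
indexed = [
    ("SharePoint", "sites/treasury"),
    ("SharePoint", "sites/cafeteria-menus"),
    ("Teams", "teams/deal-room"),
]
inventory = {"sites/treasury": "Restricted", "sites/cafeteria-menus": "General"}

mapping = map_scope_to_classification(indexed, inventory)
for row in mapping:
    print(row)
```

Locations that appear in the index but not in the inventory are flagged separately, because an unclassified location cannot be assigned a defensible scope decision.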

Step 3: Implement Scope Controls

Based on the mapping above, implement appropriate scope controls:

  • For sites with sensitive content not yet remediated: Enable RSS and leave the site off its allowed list, or apply RCD
  • For workloads with regulated data: Ensure DLP policies and sensitivity labels are in place
  • For content that should not be in Copilot scope: Exclude from SharePoint search indexing
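
The three rules above can be expressed as a small rule function that turns a site assessment into a list of recommended actions. This is a sketch under assumptions: the `SiteAssessment` fields are hypothetical flags an organization would populate from its own mapping exercise, and the action strings simply restate the bullets.

```python
from dataclasses import dataclass

@dataclass
class SiteAssessment:
    """Hypothetical outcome of the Step 2 mapping for one site or workload."""
    name: str
    unremediated_sensitive_content: bool = False
    regulated_data: bool = False
    must_stay_out_of_copilot: bool = False

def recommend_controls(site):
    """Return the scope controls suggested by the Step 3 rules, strongest first."""
    actions = []
    if site.must_stay_out_of_copilot:
        actions.append("Exclude from SharePoint search indexing")
    if site.unremediated_sensitive_content:
        actions.append("Keep off the RSS allowed list or apply RCD")
    if site.regulated_data:
        actions.append("Confirm DLP policies and sensitivity labels")
    return actions

deal_room = SiteAssessment("deal-room", unremediated_sensitive_content=True,
                           regulated_data=True)
intranet = SiteAssessment("intranet-news")

print(recommend_controls(deal_room))
print(recommend_controls(intranet))
```

The rules are not mutually exclusive, so a single site can receive more than one action; an empty list means no additional scope control is indicated by these rules alone.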

Step 4: Document Semantic Index Governance Posture

Create and maintain documentation that describes:

  • Current Semantic Index scope across workloads
  • Controls in place to govern indexing scope
  • Monitoring mechanisms for Copilot content retrieval
  • Decision rationale for scope inclusions and exclusions
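
One way to keep this documentation consistent between reviews is to hold it as a structured record that can be versioned and exported. The sketch below is an assumption about format, not a framework requirement: the `SemanticIndexPosture` class and its field names are hypothetical and simply mirror the four items listed in this step.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SemanticIndexPosture:
    """Hypothetical record with one field per documentation item in Step 4."""
    scope_by_workload: dict    # current Semantic Index scope across workloads
    scope_controls: list       # controls in place to govern indexing scope
    monitoring: list           # monitoring mechanisms for Copilot content retrieval
    decision_rationale: list   # why locations were included or excluded

posture = SemanticIndexPosture(
    scope_by_workload={"SharePoint": ["sites/policies"], "Exchange": ["All mailboxes"]},
    scope_controls=["RSS allowed list", "RCD on sites/deals"],
    monitoring=["DSPM Activity Explorer"],
    decision_rationale=["sites/deals excluded pending remediation"],
)

# Serialize for storage alongside other Copilot governance documentation.
posture_json = json.dumps(asdict(posture), indent=2)
print(posture_json)
```

Storing the record as JSON makes differences between review cycles easy to compare, which supports the periodic review expected at the Regulated level.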


Financial Sector Considerations

  • Material Non-Public Information (MNPI): Content containing MNPI (earnings data, M&A details, trading strategies) is indexed by the Semantic Index if it resides in indexed M365 locations. Financial institutions must ensure information barriers and permission controls prevent Copilot from surfacing MNPI across restricted boundaries.
  • Embedding Data Residency: The Semantic Index processes and stores embeddings within the Microsoft 365 tenant boundary. For institutions with data residency requirements, confirm that embedding processing aligns with data residency commitments in Microsoft's service agreements.
  • Audit Trail for AI Processing: Regulators may inquire about what data the AI system processes. Understanding Semantic Index scope enables institutions to provide informed responses about Copilot's data processing pipeline.
  • Model Risk Management Parallels: While the Semantic Index is not a model in the traditional SR 11-7 / OCC Bulletin 2011-12 sense, it transforms data in ways that affect AI outputs. Institutions with mature model risk management practices should consider whether the Semantic Index warrants inclusion in their AI/ML inventory.
  • Cross-Workload Data Leakage: The Semantic Index enables Copilot to synthesize information across workloads (e.g., combining SharePoint document content with email context). This cross-workload synthesis may surface correlations that would not be apparent from any single data source, creating novel information leakage risks.
  • Retention and Deletion: When source content is deleted or retention policies expire, the corresponding Semantic Index embeddings are also removed. Verify that this lifecycle alignment meets regulatory retention requirements.

Verification Criteria

  1. Documentation exists describing what the Semantic Index indexes across each M365 workload in the organization
  2. SharePoint search indexing configuration has been reviewed and sites containing regulated data are documented
  3. Semantic Index scope has been mapped against the data classification inventory from Control 1.1
  4. Appropriate scope controls (RSS, RCD, search exclusion) are in place for sites containing sensitive or regulated content
  5. DSPM Activity Explorer monitoring is configured to track Copilot content retrieval patterns (Recommended and Regulated levels)
  6. Formal risk assessment of Semantic Index data processing has been conducted for regulated data types (Regulated level)
  7. Semantic Index governance posture is documented in the organization's Copilot governance documentation
  8. Information security team has reviewed and approved the Semantic Index governance posture (Regulated level)
  9. Semantic Index data flow and architecture are documented in the organization's technology risk assessment (Regulated level)
  10. Periodic review cadence for Semantic Index governance is established (quarterly minimum)
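
The criteria above can be tracked per governance level with a simple checklist evaluator. This is a minimal sketch under one explicit assumption: criteria that do not name a level (1, 2, 3, 4, 7, and 10) are treated as applying to every level, which an organization may wish to adjust. The names `CRITERIA_LEVELS` and `outstanding_criteria` are hypothetical.

```python
ALL_LEVELS = {"Baseline", "Recommended", "Regulated"}

# Criterion number -> levels it applies to. Level tags for 5, 6, 8 and 9 come from
# the list above; treating untagged criteria as applying to all levels is an assumption.
CRITERIA_LEVELS = {
    1: ALL_LEVELS,
    2: ALL_LEVELS,
    3: ALL_LEVELS,
    4: ALL_LEVELS,
    5: {"Recommended", "Regulated"},
    6: {"Regulated"},
    7: ALL_LEVELS,
    8: {"Regulated"},
    9: {"Regulated"},
    10: ALL_LEVELS,
}

def outstanding_criteria(level, completed):
    """Return the sorted criterion numbers still open for the given level."""
    if level not in ALL_LEVELS:
        raise ValueError(f"Unknown governance level: {level}")
    return sorted(
        number for number, levels in CRITERIA_LEVELS.items()
        if level in levels and number not in completed
    )

done = {1, 2, 3, 4, 5, 7, 10}
print(outstanding_criteria("Recommended", done))
print(outstanding_criteria("Regulated", done))
```

With the sample `done` set, nothing remains open at the Recommended level, while the three Regulated-only criteria are still outstanding.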


FSI Copilot Governance Framework v1.2.1 - March 2026