Control 1.13: Sensitive Information Types (SITs) and Pattern Recognition

Control ID: 1.13
Pillar: Security
Regulatory Reference: FINRA 4511, FINRA 3110, FINRA RN 24-09, SEC 17a-3, SEC 17a-4, SEC Reg S-P, GLBA 501(b), SOX 404, PCI-DSS, CFTC 1.31 (swap dealers), OCC Bulletin 2026-13 (formerly OCC 2011-12) / Fed SR 26-2 (formerly SR 11-7) (model risk for trainable classifiers)
Last UI Verified: May 2026
Governance Levels: Baseline / Recommended / Regulated

Objective

Configure Sensitive Information Types (SITs) to automatically detect and classify financial data, including customer NPI (SSN, account numbers), regulatory identifiers (CRD, CUSIP), and material non-public information (MNPI), so that DLP policies and AI data governance controls can act on it.

Why This Matters for FSI

GLBA 501(b): Enables automatic detection and protection of customer NPI
SEC Reg S-P (2024 Amendments): Identifies customer information (broadened to include NPI from other financial institutions) requiring safeguards, incident response programs, and 30-day breach notification
FINRA 4511: Classifies records containing customer information for retention
FINRA RN 24-09 / Rule 3110: SIT-based classification supports required governance policies for privacy, integrity, and accuracy of data processed by AI tools
PCI-DSS: Detects credit card numbers in in-scope systems
SOX 404: Identifies financial reporting data for integrity controls

No companion solution by design

Not all controls have a companion solution in FSI-AgentGov-Solutions; solution mapping is selective by design. This control is operated via native Microsoft admin surfaces and verified by the framework's assessment-engine collectors. See the Solutions Index for the catalog and coverage scope.

Control Description

This control establishes Sensitive Information Types and adjacent classification primitives as the foundation of data classification in Microsoft 365. Microsoft Purview treats these as distinct detection paths, not a single feature, and FSI deployments must reason about them separately:

Built-in pattern SITs — Microsoft-provided regex/checksum/proximity detectors (e.g., U.S. Social Security Number, Credit Card Number with Luhn, U.S. Bank Account Number, ABA Routing Number, ITIN, EIN, CUSIP). Available with base Purview entitlement.
Custom pattern SITs — Organization-specific detectors built from primary elements, supporting elements, regex (with optional validators / checksums), keyword lists or dictionaries, proximity, and Low / Medium / High confidence levels. Examples relevant to FSI include internal account-number formats, FINRA CRD Number (custom — no checksum exists), and MNPI indicator keyword-dictionary SITs.
Named Entity Recognition (NER) SITs — Bundled and unbundled named-entity detectors (e.g., "All full names", "All physical addresses", medical conditions). NER is used inside DLP, sensitivity labels, retention, eDiscovery, and DSPM-for-AI signals. Requires Microsoft 365 E5 / E5 Compliance entitlement.
Trainable classifiers — ML-based complement to pattern SITs. Useful where regex cannot reasonably express intent (e.g., MNPI, source code, customer complaints). See the FSI governance gate below before relying on a trainable classifier as a primary control.
Exact Data Match (EDM) — Detection against a one-way salted hash of customer master data uploaded via the EDM Upload Agent. Required where deterministic match against actual customer records is needed (e.g., specific account numbers, beneficiary records).
Keyword dictionaries — Reusable term lists referenced by SITs (e.g., competitor names, restricted entities, project codenames).

These primitives are then consumed by DLP (including DLP for Microsoft 365 Copilot), sensitivity labels, retention policies, Communication Compliance, eDiscovery, and DSPM for AI to deliver the protections required under FINRA 4511, SEC Reg S-P, GLBA 501(b), and FINRA RN 24-09.
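
To illustrate how a custom pattern SIT combines a primary element with supporting evidence, the following Python sketch scores a candidate match by keyword proximity. It is not Purview's detection engine. The regex (a standalone 5 to 8 digit number), the keyword list, the 300-character window, and the names `PRIMARY`, `KEYWORDS`, `PROXIMITY`, and `detect` are all assumptions chosen for illustration, loosely modeled on the CRD-style case where no checksum is available.

```python
import re
from dataclasses import dataclass

# Hypothetical primary element: a standalone 5-8 digit number (assumed format).
PRIMARY = re.compile(r"(?<!\d)\d{5,8}(?!\d)")
# Hypothetical supporting keywords (assumed list).
KEYWORDS = re.compile(r"\b(?:CRD|Central Registration Depository)\b", re.IGNORECASE)
# Assumed proximity window, in characters, around the primary match.
PROXIMITY = 300


@dataclass(frozen=True)
class Match:
    value: str
    start: int
    confidence: str  # "Low" or "Medium"; no "High" because there is no checksum


def detect(text: str) -> list[Match]:
    """Return primary matches, raised to Medium when a keyword is nearby."""
    keyword_spans = [m.span() for m in KEYWORDS.finditer(text)]
    matches = []
    for m in PRIMARY.finditer(text):
        # A keyword counts as supporting evidence if any part of it falls
        # within PROXIMITY characters of the primary match.
        near = any(
            k_end >= m.start() - PROXIMITY and k_start <= m.end() + PROXIMITY
            for k_start, k_end in keyword_spans
        )
        matches.append(Match(m.group(), m.start(), "Medium" if near else "Low"))
    return matches
```

A number with no nearby keyword stays at Low confidence, which is why pattern-only detectors of this kind are better suited to alerting than to blocking.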

License & Entitlement

Capability	Minimum License (verify per release)
Built-in & custom pattern SITs in DLP / labels / retention	Microsoft Purview (base)
Custom SIT keyword dictionaries	Microsoft Purview (base)
Credential-scanning SITs	Microsoft 365 E5 / E5 Compliance
Named Entity Recognition (NER) — bundled & unbundled	Microsoft 365 E5 / E5 Compliance
Exact Data Match (EDM) — production use	Microsoft 365 E5 / E5 Compliance
Trainable classifiers (custom)	Microsoft 365 E5 / E5 Compliance
DSPM for AI consumption of SITs	DSPM for AI entitlement (per current Microsoft licensing)
DLP for Microsoft 365 Copilot using SIT conditions	Microsoft 365 E5 / E5 Compliance + Copilot entitlement (verify SIT-based prompt-restriction GA vs preview at publication time)
Communication Compliance use of SITs	Communication Compliance (E5 / add-on)
eDiscovery Premium use of SITs in conditions	E5 Compliance / eDiscovery Premium add-on

Re-verify against the Microsoft 365 security & compliance licensing guidance before signing customer entitlements.

Trainable Classifier — FSI Governance Gate

Before relying on a trainable classifier (built-in or custom) as a primary detection control:

Confirm the classifier is GA, not Preview, for the targeted workload.
Sample and document training-data provenance, quality, and review per OCC Bulletin 2026-13 (formerly OCC 2011-12) / Fed SR 26-2 (formerly SR 11-7) model-risk expectations.
Do not use a trainable classifier as the sole evidence for record-classification under SEC 17a-4 / FINRA 4511 — pair with a deterministic SIT or sensitivity label so the auditable record has a reproducible match.
Re-validate accuracy on a defined cadence (recommended quarterly for Zone 3) and capture the validation in the SIT/classifier evidence file.
Custom trainable classifiers are English-only at GA — confirm regional language coverage before scoping multilingual customer datasets.
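
The re-validation step in the gate above can be sketched as a precision / recall check over a human-reviewed sample. This is a minimal illustration, not a model-risk methodology: the thresholds (0.90 precision, 0.85 recall) and the names `ValidationResult` and `validate` are hypothetical and should be replaced by whatever the firm's model-risk function defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    precision: float
    recall: float
    passed: bool


def validate(
    sample: list[tuple[bool, bool]],
    min_precision: float = 0.90,  # hypothetical acceptance threshold
    min_recall: float = 0.85,     # hypothetical acceptance threshold
) -> ValidationResult:
    """Score a reviewed sample of (actual, predicted) classifier outcomes."""
    tp = sum(1 for actual, predicted in sample if actual and predicted)
    fp = sum(1 for actual, predicted in sample if not actual and predicted)
    fn = sum(1 for actual, predicted in sample if actual and not predicted)
    # Guard against empty denominators when the sample has no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    passed = precision >= min_precision and recall >= min_recall
    return ValidationResult(precision, recall, passed)
```

The returned values, together with the sample's provenance, are the kind of record that would go into the SIT/classifier evidence file each validation cycle.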

Copilot / Agent Integration

SITs do not protect Microsoft 365 Copilot or agent grounding directly — they are the detection primitive that feeds the policies that do:

DLP for Microsoft 365 Copilot — DLP policies scoped to the Copilot workload use SIT conditions to exclude items from Copilot processing or restrict prompts that contain matched SITs. Required for FINRA RN 24-09 / Rule 3110 governance of AI data inputs. Confirm SIT-based prompt restriction GA vs preview status at publication time.
Sensitivity labels — Auto-labeling rules driven by SITs can mark items as excluded from Copilot processing, which is the GA-level integration today.
DSPM for AI — Surfaces "sensitive data shared with Copilot" telemetry using the same SITs. Tuning a SIT in this control directly affects DSPM-for-AI signal quality. Note that Microsoft is consolidating the classic AI Hub experience into a unified DSPM experience — reference the capability generically rather than hard-coding a portal name.
Agent grounding (Microsoft Copilot Studio, Microsoft 365 Agents) — Agents inherit DLP-for-Copilot exclusions when configured under a governed environment (see Control 2.x — Environment Governance) and when the agent's grounding sources are SharePoint / Exchange / OneDrive content that DLP can evaluate.

A SIT false negative (e.g., an SSN that is not detected) can cause customer NPI to surface in Copilot output. Treat such failures as reportable events; see the Incident Handling cross-reference in Related Controls and the §1 FSI Incident Handling section of the troubleshooting playbook.

Key Configuration Points

Review and enable the built-in financial / PII SITs in Purview > Data classification > Sensitive info types: U.S. Social Security Number, Credit Card Number (Luhn-validated), U.S. Bank Account Number, ABA Routing Number, U.S. Individual Taxpayer Identification Number (ITIN), U.S. Employer Identification Number (EIN), CUSIP, and the bundled NER SITs ("All full names", "All physical addresses") where E5 entitlement permits.
Create custom pattern SITs for internal account-number formats — use anchored, scoped regex (avoid bare ^/$ and unbounded wildcards), rely on supporting evidence (keywords, proximity, dictionaries), and pin Low / Medium / High confidence to the SIT's actual numeric thresholds (which vary per SIT).
Create the FINRA CRD Number SIT as a custom SIT — CRD numbers have no published checksum, so confidence tuning relies entirely on regex precision and keyword proximity. Set initial confidence to Medium, monitor false positives weekly during pilot, and never rely on CRD detection alone for blocking actions.
Build the MNPI indicator SIT from a curated keyword dictionary plus deal-codename / project-name supporting elements; pair it with a trainable classifier only after the FSI Governance Gate above is met.
Configure EDM for high-value customer master data: define schema (≤32 searchable columns), salt and hash the source dataset with the EDM Upload Agent, schedule daily refresh, and verify the rule package shows Active before relying on it for a DLP block action. Re-verify limits in Learn about EDM-based SITs at publication.
Set DLP and auto-labeling confidence by named level (Low / Medium / High) per use case: High for block actions, Medium for alert / educate, Low only for diagnostic policies. The underlying numeric thresholds (e.g., 85 / 75 / 65) are defined per SIT, so open the SIT's entity definition before tuning.
Test every SIT in Test pane → Content Explorer → DLP simulation → DLP-for-Copilot simulation before promoting to enforce. Capture seeded synthetic test data and the resulting matches as evidence (do not seed real customer NPI into test corpora).
Maintain a false-positive / false-negative review cadence (recommended monthly for Zone 3) and route findings into the SIT evidence file.
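
For the synthetic test data called for above, one option is to generate card-like numbers that pass the Luhn check without corresponding to any real customer record. The sketch below is an assumption-laden illustration: the `400000` prefix, the 16-digit length, and the function names are hypothetical, and whether a given tenant's Credit Card Number SIT matches these values (it may also weigh keywords or other supporting evidence) should be confirmed in the Test pane.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is
    # doubled because the check digit will sit to its right.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a complete digit string against the Luhn algorithm."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def synthetic_card(rng: random.Random, prefix: str = "400000", length: int = 16) -> str:
    """Build a Luhn-valid synthetic number (prefix and length are assumptions)."""
    filler = "".join(str(rng.randrange(10)) for _ in range(length - len(prefix) - 1))
    body = prefix + filler
    return body + luhn_check_digit(body)


# A seeded generator keeps the evidence corpus reproducible between test runs.
rng = random.Random(13)
seed_values = [synthetic_card(rng) for _ in range(3)]
```

Recording the seed alongside the generated values lets a reviewer regenerate the same corpus when re-running the test sequence.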

Zone-Specific Requirements

Zone	Requirement	Rationale
Zone 1 (Personal)	Built-in SITs only; alert-only DLP; High confidence threshold	Low risk, awareness-focused
Zone 2 (Team)	Built-in + foundational custom SITs; alert and educate; Medium–High confidence	Team data requires classification
Zone 3 (Enterprise)	Full SIT library + EDM + NER (where licensed) + governed trainable classifiers; block on High confidence; alert on Medium; DLP-for-Copilot and DSPM-for-AI integration required	Customer-facing data requires comprehensive detection and AI-grounding controls
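
The zone table can be read as a lookup from zone and match confidence to an enforcement action. The following sketch encodes that reading as data. The zone keys, action strings, and the `"none"` fallback for combinations the table does not mention are illustrative assumptions, not Purview policy syntax.

```python
from enum import Enum


class Confidence(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


# Derived from the zone table; identifiers and action names are hypothetical.
ZONE_POLICY = {
    "zone1": {Confidence.HIGH: "alert"},
    "zone2": {
        Confidence.MEDIUM: "alert_and_educate",
        Confidence.HIGH: "alert_and_educate",
    },
    "zone3": {Confidence.MEDIUM: "alert", Confidence.HIGH: "block"},
}


def action_for(zone: str, confidence: Confidence) -> str:
    """Return the action for a match, or "none" if the zone sets no action."""
    return ZONE_POLICY[zone].get(confidence, "none")
```

Keeping the mapping in one structure makes it straightforward to diff the intended zone behavior against the DLP rules actually deployed.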

Roles & Responsibilities

Role	Responsibility
Purview Info Protection Admin	Create and manage SITs, tune confidence levels, deploy EDM rule packages
Purview Compliance Admin	Approve custom SIT definitions, review detections in Content Explorer / Activity Explorer
Purview Data Security AI Admin	Validate SIT coverage feeding DSPM for AI signals and DLP-for-Copilot policies
AI Administrator	Coordinate Copilot feature controls that depend on SIT-driven sensitivity labels
Compliance Officer	Validate regulatory mapping (FINRA 4511 / SEC 17a-3/4 / Reg S-P / GLBA) and accept residual risk
Legal (organizational function)	Review MNPI and privileged-information SIT definitions and keyword dictionaries

Related Controls

Control	Relationship
1.5 - DLP and Sensitivity Labels	SITs are used in DLP policy conditions and as auto-labeling triggers
1.6 - DSPM for AI	SIT detections drive AI data exposure monitoring; SIT false negatives degrade DSPM-for-AI signal quality
1.7 - Audit Logging	SIT detection events surface in Unified Audit Log and Activity Explorer; required for SEC 17a-4 / FINRA 4511 evidence
1.10 - Communication Compliance	SITs used in communication review policies for MNPI, suitability, and customer-NPI conditions
Incident Response Playbook (AI)	SIT false negatives that surface customer NPI through Copilot / agent grounding may trigger Reg S-P 30-day breach notification — triage as reportable events
2.1 - Managed Environments — Customer Lockbox & Data Residency Posture	SIT-detected NPI/MNPI inherits the residency and Customer Lockbox posture documented in the 2.1 sub-section

Implementation Playbooks

Step-by-Step Implementation

This control has detailed playbooks for implementation, automation, testing, and troubleshooting:

Portal Walkthrough — Step-by-step portal configuration
PowerShell Setup — Automation scripts
Verification & Testing — Test cases and evidence collection
Troubleshooting — Common issues and resolutions

Verification Criteria

Confirm control effectiveness by verifying:

Built-in financial & PII SITs (SSN, Credit Card Number, U.S. Bank Account Number, ABA Routing Number, ITIN, EIN, CUSIP) appear in Purview > Data classification > Sensitive info types with the expected publisher.
Custom SITs (e.g., FINRA CRD, internal account formats, MNPI keyword-dictionary SIT) appear with the documented publisher and confidence levels.
Seeded synthetic test document is detected in Content Explorer within the documented telemetry window (typically up to 24 hours; verify against your tenant baseline rather than assuming a fixed SLA).
Test DLP rule fires on the seeded SIT match across Exchange, SharePoint, OneDrive, and Teams workloads.
DLP-for-Copilot policy containing the SIT correctly excludes / redacts the seeded item from Copilot grounding (verify with a controlled prompt and capture both input and output as evidence).
DSPM for AI dashboard registers the SIT detection under "Sensitive data shared with Copilot" telemetry within the documented window.
EDM rule package shows Active; hash refresh executed within the agreed cadence (typically daily); seeded exact-match record detected.
NER-backed SITs (e.g., "All full names") return matches on a seeded NER test document (E5 license required).
False-positive / false-negative review log maintained on a defined cadence (recommended monthly for Zone 3); findings routed back into SIT tuning and DLP policy adjustments.
Trainable classifiers in use have current model-risk validation evidence per OCC Bulletin 2026-13 (formerly OCC 2011-12) / Fed SR 26-2 (formerly SR 11-7) if they drive a primary control.
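
Several of the criteria above are cadence checks (daily EDM hash refresh, monthly false-positive / false-negative review, quarterly classifier re-validation). A minimal sketch of a staleness check over recorded evidence timestamps follows. The item names, the day counts used to approximate "monthly" and "quarterly", and the `stale_evidence` function are assumptions for illustration, and the timestamps would need to come from the firm's own evidence file.

```python
from datetime import datetime, timedelta

# Maximum age per evidence item; day counts are approximations (assumed).
MAX_AGE = {
    "edm_hash_refresh": timedelta(days=1),        # typically daily
    "fp_fn_review": timedelta(days=31),           # recommended monthly (Zone 3)
    "classifier_validation": timedelta(days=92),  # recommended quarterly (Zone 3)
}


def stale_evidence(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """List evidence items that are missing or older than their cadence."""
    stale = []
    for item, max_age in MAX_AGE.items():
        ran_at = last_run.get(item)
        if ran_at is None or now - ran_at > max_age:
            stale.append(item)
    return stale
```

An item with no recorded timestamp is treated as stale, so a missing evidence entry fails the check rather than passing silently.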

Additional Resources

Updated: June 2026 | Version: v1.6.2 | UI Verification Status: Current