Control 1.13: Sensitive Information Types (SITs) and Pattern Recognition
Overview
Control ID: 1.13
Control Name: Sensitive Information Types (SITs) and Pattern Recognition
Regulatory Reference: FINRA 4511, SEC Reg S-P, GLBA 501(b), SOX 404
Setup Time: 2-4 hours (initial); ongoing refinement
Purpose
Sensitive Information Types (SITs) are the foundation of data classification and protection in Microsoft 365. For financial services, properly configured SITs enable automatic detection of:
- Customer NPI: Social Security numbers, account numbers, tax IDs
- Financial Data: Credit card numbers, bank accounts, trading symbols
- Regulatory/Market Identifiers: CRD numbers, CUSIP (and internal identifiers)
- Proprietary Information: Trade secrets, material non-public information (MNPI)
SITs power DLP policies (Control 1.5), sensitivity labels, DSPM for AI (Control 1.6), and retention policies, making them critical for comprehensive FSI data governance.
US-only scope note
This control implementation guidance is U.S.-only:
- Use U.S. identifiers (e.g., SSN, ITIN, U.S. bank account/routing formats, CUSIP, FINRA CRD) and organization-specific internal identifiers.
- Avoid non-U.S. examples or patterns (for example, international bank identifiers). If your organization operates outside the U.S., create a separate addendum with jurisdiction-specific SITs and validation steps.
Prerequisites
Primary Owner Admin Role: Purview Info Protection Admin
Supporting Roles: Purview Compliance Admin
Required Licenses
- Microsoft 365 E5 OR Microsoft 365 E5 Compliance (full functionality)
- Microsoft 365 E3 (basic built-in SITs only)
- Microsoft Purview Information Protection
Required Permissions
- Compliance Administrator or Global Administrator (create/modify SITs)
- Information Protection Admin (manage SITs)
- Security Reader (view SITs and reports)
Purview tenant prerequisites (required for validation and evidence)
- Access the correct portal
  - Use the Microsoft Purview compliance portal: https://compliance.microsoft.com
  - Navigation (Dec 2025 UI): Solutions → Data classification → Classifiers
- Permissions to create SITs
  - Ensure the implementing admin is assigned a role group that includes Information Protection and Data Classification authoring permissions (commonly via Compliance Administrator or Information Protection Admin, depending on org policy).
- Permissions to validate SITs (Content explorer / Activity explorer)
  - Ensure the implementing admin (or designated tester) has permissions that allow:
    - Viewing classified content results in Content explorer
    - Viewing signal/events in Activity explorer (where applicable)
  - If those explorers are visible but empty or blocked, it is almost always a role/permission assignment gap.
- Data locations available for testing
  - Confirm at least one Microsoft 365 location is in use and in-scope for scanning (SharePoint/OneDrive content is the most straightforward for SIT validation).
  - Confirm your test site/library is accessible to the test account and is not excluded by existing governance controls.
- Operational readiness
  - Identify a secure internal location for:
    - Custom SIT inventory exports
    - SIT pattern documentation
    - Evidence screenshots and test results
Dependencies
- Control 1.5 (DLP): SITs are used in DLP policies
- Control 1.6 (DSPM for AI): SITs enable AI data exposure monitoring
- Control 1.7 (Audit Logging): SIT detection events logged
Pre-Setup Checklist
- [ ] Inventory of sensitive data types handled by organization
- [ ] Sample data patterns for custom SITs (sanitized)
- [ ] Approval for custom SIT definitions from Compliance/Legal
- [ ] Test environment for SIT validation
- [ ] Documentation of internal identifier formats
Governance Levels
Baseline (Level 1)
Define and implement SITs for common financial data (SSN, account numbers, credit cards).
Recommended (Level 2-3)
Custom SITs for industry-specific sensitive data; regex patterns for internal identifiers.
Regulated/High-Risk (Level 4)
Comprehensive SIT library with ML-based detection; continuous updates per threat landscape.
Setup & Configuration
Step 1: Review Built-in Financial SITs
Portal Path: Microsoft Purview → Data classification → Classifiers → Sensitive info types
Microsoft provides numerous built-in SITs for financial data. Review and enable these:
Essential Financial SITs
| SIT Name | Description | Use Case |
|---|---|---|
| U.S. Social Security Number (SSN) | 9-digit SSN with various formats | Customer identification |
| U.S. Bank Account Number | Bank account detection | Payment/transfer monitoring |
| Credit Card Number | Visa, MC, Amex, Discover patterns | PCI-DSS compliance |
| ABA Routing Number | 9-digit bank routing numbers | Wire transfer protection |
| U.S. Individual Taxpayer ID (ITIN) | 9-digit tax IDs starting with 9 | Tax document protection |
| U.S. Driver's License Number | State-specific DL patterns | Identity verification |
| U.S. Passport Number | 9-digit passport numbers | KYC documents |
Investment/Trading SITs
| SIT Name | Description | Use Case |
|---|---|---|
| CUSIP | 9-character security identifiers | Trading/portfolio data |
| IP Address | IPv4 and IPv6 addresses | System access monitoring |
To view all built-in SITs:
- Navigate to Purview → Data classification → Sensitive info types
- Filter by category: Financial
- Review description and detection patterns
Step 1A: Confirm you can access SIT authoring and validation screens
- Go to https://compliance.microsoft.com
- Navigate: Solutions → Data classification → Classifiers → Sensitive info types
- Validate:
- EXPECTED: You can open a built-in SIT and view its patterns, confidence, and supporting evidence configuration.
- Optional but recommended for implementers:
- Open Content explorer (same Solution area) to confirm you have viewing rights for later validation.
Step 2: Create Custom FSI SITs
Portal Path: Microsoft Purview → Data classification → Classifiers → Sensitive info types → + Create sensitive info type
Pattern design checklist (use for every custom SIT)
Before creating a custom SIT pattern, define:
- Canonical format (what the identifier truly looks like)
- Allowed variations (spaces, hyphens, prefixes, “CRD#”, etc.)
- Corroborating context (keywords near the identifier that reduce false positives)
- Exclusions (test prefixes, known non-sensitive lookalikes)
- Target confidence (start higher; relax only after measured false-negative review)
Custom SIT 1: Internal Account Number
- Click + Create sensitive info type
- Configure:
  - Name: `FSI-Internal-Account-Number`
  - Description: "Detects internal customer account numbers in format XXX-XXXXXX-XX"
- Click Next → Create pattern
- Add primary element:
  - Type: Regular expression
  - Pattern: `\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b`
  - Confidence level: High (85)
- Add supporting element (optional):
  - Keywords: "account", "acct", "customer number"
  - Within: 300 characters
- Set Character proximity: 300 characters
- Click Create
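Before entering the pattern in the portal, it can help to try it against a few sanitized samples offline. The following is a minimal Python sketch, assuming Python's `re` engine behaves like the Purview regex engine for this simple pattern (which should be confirmed in the portal's own test feature). The helper name `find_accounts` and the sample strings are illustrative, not part of the control.

```python
import re

# Same expression as the FSI-Internal-Account-Number primary element.
ACCOUNT_RE = re.compile(r"\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b")


def find_accounts(text: str) -> list:
    """Return every substring that matches the internal account format."""
    return ACCOUNT_RE.findall(text)


# Sanitized samples: one expected hit and three deliberate near-misses.
samples = {
    "Customer account ABC-123456-XY opened today": ["ABC-123456-XY"],
    "lowercase abc-123456-xy should not match": [],
    "too few digits ABC-12345-XY": [],
    "embedded XABC-123456-XYZ should not match": [],
}

for text, expected in samples.items():
    assert find_accounts(text) == expected, text
```

Keeping the near-miss samples alongside the expected hits makes it easier to see whether a later pattern change widens the match unintentionally.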
Custom SIT 2: FINRA CRD Number
- Click + Create sensitive info type
- Configure:
  - Name: `FSI-FINRA-CRD-Number`
  - Description: "Detects FINRA Central Registration Depository numbers"
- Add pattern:
  - Type: Regular expression
  - Pattern: `\b(?:CRD\s*#?\s*)?([1-9]\d{4,7})\b`
  - Confidence level: Medium (75)
- Add supporting keywords:
  - "CRD", "registered representative", "broker", "advisor"
- Click Create
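Because the `CRD` prefix is optional in this pattern, any bare 5 to 8 digit number is a candidate, which is why the supporting keywords matter. The sketch below illustrates that idea in Python. The 300-character window is an assumption borrowed from Custom SIT 1, the function name `crd_candidates` is hypothetical, and the substring keyword check is a simplification of how Purview evaluates supporting elements.

```python
import re

CRD_RE = re.compile(r"\b(?:CRD\s*#?\s*)?([1-9]\d{4,7})\b")
KEYWORDS = ("crd", "registered representative", "broker", "advisor")
PROXIMITY = 300  # characters on each side of the match (assumed value)


def crd_candidates(text: str) -> list:
    """Return (number, has_nearby_keyword) for each candidate CRD number."""
    lowered = text.lower()
    results = []
    for match in CRD_RE.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        window = lowered[start:match.end() + PROXIMITY]
        has_keyword = any(keyword in window for keyword in KEYWORDS)
        results.append((match.group(1), has_keyword))
    return results


# A bare number matches the regex but has no corroborating context.
print(crd_candidates("Invoice total 1234567 due Friday"))
# The same number with a CRD prefix is corroborated.
print(crd_candidates("Rep profile: CRD# 1234567"))
```

Candidates without a nearby keyword are the ones most likely to be false positives, so they are a natural focus during tuning.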
Custom SIT 3: Material Non-Public Information (MNPI)
- Click + Create sensitive info type
- Configure:
  - Name: `FSI-MNPI-Indicators`
  - Description: "Detects potential material non-public information indicators"
- Add pattern using keywords:
  - Type: Keyword dictionary
  - Keywords:
    - "earnings announcement", "merger", "acquisition target"
    - "quarterly results", "guidance revision", "restructuring"
    - "SEC filing", "insider", "material information"
    - "trading restriction", "blackout period"
  - Confidence level: Low (65)
- Add a supporting element that requires a financial entity mention (for example, a company name) within the proximity window
- Click Create
Custom SIT 4: Trade Details
- Click + Create sensitive info type
- Configure:
  - Name: `FSI-Trade-Details`
  - Description: "Detects trading activity patterns"
- Add pattern:
  - Type: Regular expression
  - Pattern: `\b(BUY|SELL|HOLD)\s+\d+(?:,\d{3})*\s+(?:shares?|units?|contracts?)\s+(?:of\s+)?[A-Z]{1,5}\b`
- Add supporting keywords:
  - "execute", "trade", "order", "position", "fill"
- Click Create
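The trade pattern is strict about casing and about the unit word, so it is worth checking which phrasings it catches and which it misses. A minimal Python sketch follows, assuming Python's `re` is a reasonable stand-in for a first-pass check. The sample sentences and the name `find_trades` are made up for illustration.

```python
import re

TRADE_RE = re.compile(
    r"\b(BUY|SELL|HOLD)\s+\d+(?:,\d{3})*\s+"
    r"(?:shares?|units?|contracts?)\s+(?:of\s+)?[A-Z]{1,5}\b"
)


def find_trades(text: str) -> list:
    """Return the full text of each trade instruction the pattern matches."""
    return [match.group(0) for match in TRADE_RE.finditer(text)]


# Matches: upper-case verb, quantity, unit word, ticker.
print(find_trades("Please BUY 1,000 shares of MSFT at market"))
print(find_trades("SELL 50 contracts ES before close"))

# Misses: lower-case verb, and a quantity with no unit word.
print(find_trades("buy 100 shares of msft"))
print(find_trades("BUY 100 MSFT"))
```

The two misses are candidate false negatives. Whether they matter depends on how trade instructions are actually written in your organization's messages and documents.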
Step 3: Create Keyword Dictionaries
Portal Path: Microsoft Purview → Data classification → Classifiers → EDM classifiers → Keyword dictionaries
Create keyword dictionaries for complex detection:
FSI Competitor Names Dictionary
- Click Create keyword dictionary
- Configure:
  - Name: `FSI-Competitor-Names`
  - Description: "List of competitor companies for MNPI monitoring"
- Upload or enter keywords:
  - Goldman Sachs
  - Morgan Stanley
  - JP Morgan
  - Citigroup
  - Bank of America
  - Wells Fargo
  - [Add your specific competitors]
- Click Create
Step 4: Configure Exact Data Match (EDM) Classification
Portal Path: Microsoft Purview → Data classification → Classifiers → EDM classifiers
EDM matches content against values from your actual customer records. The source data is hashed locally before upload, so cleartext PII is not sent to the service:
- Click + Create EDM classifier
- Define schema:
  - Name: `FSI-Customer-Data-EDM`
  - Description: "Exact match for customer account data"
- Add columns:
  - CustomerAccountNumber (searchable)
  - SSN (searchable)
  - CustomerName (supporting)
  - DateOfBirth (supporting)
- Configure matching:
  - Primary: CustomerAccountNumber
  - Corroborative: SSN + CustomerName
- Upload hashed data source (see PowerShell section)
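The idea behind matching on hashed data can be sketched in a few lines of Python. This is a conceptual illustration only: the actual hashing scheme used by the EDM Upload Agent is not reproduced here, and SHA-256, the normalization step, and the function names are all assumptions made for the example.

```python
import hashlib


def salted_hash(value: str, salt: str) -> str:
    """Hash a normalized value with a salt (illustrative scheme only)."""
    normalized = value.strip().upper()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def build_index(values, salt: str) -> set:
    """Build the lookup set that would be stored instead of cleartext."""
    return {salted_hash(value, salt) for value in values}


def is_exact_match(candidate: str, index: set, salt: str) -> bool:
    """Check a candidate string against the hashed index."""
    return salted_hash(candidate, salt) in index


salt = "example-salt"  # placeholder; a real salt must be kept secret
index = build_index(["ABC-123456-XY"], salt)

assert is_exact_match(" abc-123456-xy ", index, salt)   # same value after normalization
assert not is_exact_match("ABC-123456-XZ", index, salt)  # one character off
assert "ABC-123456-XY" not in index                      # cleartext is not stored
```

The point of the sketch is that a lookalike value that merely fits the account format does not match, which is why EDM tends to produce fewer false positives than a regex for known values.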
Step 5: Test SIT Detection
Portal Path: Microsoft Purview → Data classification → Content explorer
- Navigate to Content explorer
- Filter by sensitive information type
- Review detected items
- Validate accuracy:
- True positives: Correctly identified sensitive data
- False positives: Non-sensitive data flagged
- False negatives: Sensitive data missed
Create test documents:
- Create a Word/Excel file with test data:
  - Test Account: ABC-123456-XY
  - Test SSN: 123-45-6789
  - Test Credit Card: 4111-1111-1111-1111
- Upload to SharePoint
- Wait for classification (up to 24 hours)
- Verify SIT detection in Content Explorer
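Checksum-validated built-in types such as Credit Card Number generally ignore numbers that fail the Luhn check, so a made-up card number in a test document may never be detected. The Python sketch below is a standard Luhn implementation that can be used to confirm a test value before uploading it. The function name `luhn_valid` is illustrative, and how strictly Purview applies the checksum should be verified against Microsoft's SIT entity definitions.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


assert luhn_valid("4111-1111-1111-1111")      # test value used in this step
assert not luhn_valid("4111-1111-1111-1112")  # last digit changed
```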
Step 6: Tune SIT Accuracy
Based on testing results, adjust SITs:
Reduce False Positives:
- Edit the SIT → Patterns
- Add exclusions:
- Exclude patterns: Common false positive formats
- Add keywords: Require supporting context
- Increase confidence threshold
Reduce False Negatives:
- Edit the SIT → Patterns
- Add pattern variations
- Lower confidence threshold (carefully)
- Add alternative keyword groups
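Tuning decisions are easier to defend when the test results are reduced to precision and recall. The Python sketch below shows the arithmetic, assuming you have counted true positives, false positives, and false negatives from a labeled test set. The class name and the sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SitTestResult:
    """Counts from one SIT validation run against a labeled test set."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        """Share of flagged items that were actually sensitive."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        """Share of sensitive items that were flagged."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


# Hypothetical run: 45 correct hits, 5 false alarms, 15 misses.
run = SitTestResult(true_positives=45, false_positives=5, false_negatives=15)
print(f"precision={run.precision:.2f} recall={run.recall:.2f}")
```

Low precision points toward the "Reduce False Positives" steps, and low recall toward the "Reduce False Negatives" steps. Recording both values before and after each change also provides the true/false positive evidence called for in the verification checklist.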
PowerShell Configuration
# Sensitive Information Types Configuration
# Requires: Exchange Online PowerShell (Security & Compliance)
# Connect to Security & Compliance PowerShell
Connect-IPPSSession -UserPrincipalName admin@contoso.com
# Get all sensitive information types
$AllSITs = Get-DlpSensitiveInformationType
Write-Host "Total SITs available: $($AllSITs.Count)"
# Filter for financial SITs
$FinancialSITs = $AllSITs | Where-Object {
$_.Name -match "credit card|bank account|SSN|social security|ABA|tax"
}
$FinancialSITs | Select-Object Name, Description | Format-Table
# Get details of a specific SIT
Get-DlpSensitiveInformationType -Identity "U.S. Social Security Number (SSN)" |
Select-Object Name, Description, RulePackageId
# Create a new custom SIT for internal account numbers
# The Entity id and the LocalizedStrings Resource idRef must be the same GUID,
# so generate it once and reuse it.
$EntityId = New-Guid
$RuleXML = @"
<?xml version="1.0" encoding="utf-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="$(New-Guid)">
    <Version build="0" major="1" minor="0" revision="0"/>
    <Publisher id="$(New-Guid)"/>
    <Details defaultLangCode="en">
      <LocalizedDetails langcode="en">
        <PublisherName>Contoso Financial</PublisherName>
        <Name>FSI Custom Rule Pack</Name>
        <Description>Custom SITs for financial services</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <Entity id="$EntityId" patternsProximity="300" recommendedConfidence="75">
      <!-- Regex plus corroborating keyword: high confidence -->
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_InternalAccountNumber"/>
        <Match idRef="Keyword_AccountContext"/>
      </Pattern>
      <!-- Regex alone: medium confidence -->
      <Pattern confidenceLevel="75">
        <IdMatch idRef="Regex_InternalAccountNumber"/>
      </Pattern>
    </Entity>
    <!-- Regex, Keyword, and LocalizedStrings are children of Rules -->
    <Regex id="Regex_InternalAccountNumber">\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b</Regex>
    <Keyword id="Keyword_AccountContext">
      <Group matchStyle="word">
        <Term>account</Term>
        <Term>customer number</Term>
        <Term>acct</Term>
        <Term>client id</Term>
      </Group>
    </Keyword>
    <LocalizedStrings>
      <Resource idRef="$EntityId">
        <Name default="true" langcode="en-us">FSI Internal Account Number</Name>
        <Description default="true" langcode="en-us">Detects internal customer account numbers</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"@
# Save XML and upload
$RuleXML | Out-File "C:\Governance\SITs\FSI-Custom-SIT.xml" -Encoding UTF8
# Create the rule package
New-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes("C:\Governance\SITs\FSI-Custom-SIT.xml"))
# Get custom SITs
Get-DlpSensitiveInformationType | Where-Object { $_.Publisher -eq "Contoso Financial" }
# Create keyword dictionary
$Keywords = @"
Goldman Sachs
Morgan Stanley
JP Morgan
Citigroup
Bank of America
Wells Fargo
"@
# Keyword data is passed as Unicode (UTF-16 LE) bytes, one term per line
New-DlpKeywordDictionary -Name "FSI-Competitor-Names" `
-Description "Major financial institution names for competitive intelligence monitoring" `
-FileData ([System.Text.Encoding]::Unicode.GetBytes($Keywords))
# Get all keyword dictionaries
Get-DlpKeywordDictionary | Select-Object Name, Description
# Test SIT detection against a text string (no file required)
# Test-DataClassification returns its matches in the ClassificationResults property
$TestContent = "Customer account ABC-123456-XY with SSN 123-45-6789"
$TestResult = Test-DataClassification -TextToClassify $TestContent
$TestResult.ClassificationResults | Format-Table -AutoSize
# Export all custom SITs for documentation
$CustomSITs = Get-DlpSensitiveInformationType | Where-Object { $_.Publisher -ne "Microsoft Corporation" }
$CustomSITs | Select-Object Name, Description, Publisher, RulePackageId |
Export-Csv "C:\Governance\SITs\Custom-SIT-Inventory-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation
# EDM Classifier Setup (hash and upload data)
# Note: This requires the EDM Upload Agent installed locally
# Step 1: Create schema file
$SchemaXML = @"
<?xml version="1.0" encoding="utf-8"?>
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
<DataStore name="FSICustomerData" description="Customer account data for exact matching">
<Field name="CustomerAccountNumber" searchable="true"/>
<Field name="SSN" searchable="true"/>
<Field name="CustomerName" searchable="false"/>
<Field name="DateOfBirth" searchable="false"/>
</DataStore>
</EdmSchema>
"@
$SchemaXML | Out-File "C:\EDM\FSICustomerDataSchema.xml" -Encoding UTF8
# Step 2: Upload the schema (Security & Compliance PowerShell, not the upload agent)
New-DlpEdmSchema -FileData ([System.IO.File]::ReadAllBytes("C:\EDM\FSICustomerDataSchema.xml"))
# Step 3: Authorize the agent, then hash the data file (run from the EDM Upload Agent directory)
# .\EdmUploadAgent.exe /Authorize
# .\EdmUploadAgent.exe /CreateHash /DataFile C:\EDM\CustomerData.csv /HashLocation C:\EDM\Hash /Schema C:\EDM\FSICustomerDataSchema.xml /Salt <your-salt>
# Step 4: Upload hashed data (the agent writes an .EdmHash file to the hash location)
# .\EdmUploadAgent.exe /UploadHash /DataStoreName FSICustomerData /HashFile C:\EDM\Hash\CustomerData.EdmHash
Write-Host "SIT configuration complete" -ForegroundColor Green
Financial Sector Considerations
Regulatory Alignment
| Regulation | SIT Requirement |
|---|---|
| GLBA 501(b) | Detect and protect customer NPI (SSN, account numbers) |
| SEC Reg S-P | Identify customer records for privacy protection |
| FINRA 4511 | Classify records containing customer information |
| PCI-DSS | Detect credit card numbers in scope systems |
| SOX 404 | Identify and protect financial reporting data |
Zone-Specific Configuration
Zone 1 (Personal Productivity)
SITs Used: Basic built-in SITs
Detection Mode: Alert only
Confidence Threshold: High (85+)
Action: Log for awareness
Zone 2 (Team Collaboration)
SITs Used: Built-in + basic custom
Detection Mode: Alert + educate
Confidence Threshold: Medium-High (75-85)
Action: Notify user, log to compliance
Zone 3 (Enterprise Managed)
SITs Used: Full library + EDM
Detection Mode: Block on high confidence
Confidence Threshold: Low and above (65+)
Action: Block, notify compliance, require justification
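The zone settings above can be captured as a small lookup table, which is handy when documenting or reviewing policy behavior. The Python sketch below is a simplification: it treats each zone's threshold as a single cut-off for that zone's listed action and does not model Zone 3's separate "block on high confidence" behavior. The names `ZONE_POLICY` and `action_for` are hypothetical.

```python
from typing import Optional

# Thresholds and actions copied from the zone descriptions above.
ZONE_POLICY = {
    1: {"min_confidence": 85, "action": "Log for awareness"},
    2: {"min_confidence": 75, "action": "Notify user, log to compliance"},
    3: {"min_confidence": 65,
        "action": "Block, notify compliance, require justification"},
}


def action_for(zone: int, confidence: int) -> Optional[str]:
    """Return the zone's action if the match meets its threshold, else None."""
    policy = ZONE_POLICY[zone]
    if confidence >= policy["min_confidence"]:
        return policy["action"]
    return None


# A medium-confidence (75) match is ignored in Zone 1 but acted on in Zone 2.
print(action_for(1, 75))
print(action_for(2, 75))
```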
FSI SIT Library Template
Create these custom SITs for comprehensive FSI coverage:
| SIT Name | Pattern/Keywords | Use Case |
|---|---|---|
| FSI-Internal-Account-Number | Company-specific format | Internal account detection |
| FSI-FINRA-CRD-Number | CRD# patterns | Registered rep identification |
| FSI-CUSIP-Extended | 9-char + trading context | Security identification |
| FSI-MNPI-Indicators | Earnings, merger, insider keywords | Insider trading prevention |
| FSI-Trade-Details | Buy/Sell patterns + quantities | Trading activity monitoring |
| FSI-Wire-Transfer | Wire instructions + amounts | Payment monitoring |
| FSI-Client-PII-Bundle | SSN + Name + Account | Customer data package |
| FSI-Net-Worth | Dollar amounts + wealth keywords | Privacy for HNW clients |
Verification & Testing
Verification Steps
- Confirm Built-in SITs Available:
  - Navigate to Purview → Data classification → Sensitive info types
  - Filter by "Financial"
  - EXPECTED: Core financial SITs listed and active
- Test SIT Detection:
  - Upload test document with sample sensitive data
  - Wait 24 hours for indexing
  - Check Content Explorer for detection
  - EXPECTED: Test data correctly identified
- Verify Custom SITs:
  - Navigate to Sensitive info types → Filter by Publisher (your org)
  - EXPECTED: All custom SITs created and active
- Test DLP Policy Integration:
  - Create a test DLP policy using custom SIT
  - Trigger policy with test content
  - EXPECTED: Policy fires on SIT detection
- Validate EDM Classifier:
  - Check EDM classifier status (Active/Indexing)
  - Test with exact data from source
  - EXPECTED: Exact match detected
Verification Evidence
- [ ] Screenshot: Built-in financial SITs list
- [ ] Export: Custom SIT definitions (XML)
- [ ] Documentation: SIT pattern definitions and rationale
- [ ] Screenshot: Content Explorer showing detections
- [ ] Test results: True/false positive rates
- [ ] Documentation: EDM schema and refresh schedule
Troubleshooting & Validation
Issue: SIT Not Detecting Expected Content
Symptoms: Known sensitive data not being flagged
Solutions:
- Verify SIT is active (not in test mode)
- Check regex pattern matches actual data format
- Ensure content has been indexed (can take 24 hours)
- Review confidence threshold - may be too high
- Check if keywords are required but missing from content
- Test pattern at regex101.com or similar
Issue: Too Many False Positives
Symptoms: Non-sensitive data triggering SIT alerts
Solutions:
- Add exclusion patterns (e.g., test data prefixes)
- Increase confidence level requirement
- Add required keywords for context
- Implement secondary elements for corroboration
- Use exact data match instead of regex for known values
Issue: Custom SIT Creation Fails
Symptoms: Error when uploading rule package XML
Solutions:
- Validate XML syntax
- Ensure all GUIDs are unique and valid format
- Check regex syntax in XML (escape special characters)
- Verify required XML namespace declarations
- Review error message for specific element issues
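Several of these checks can be run locally before uploading. The Python sketch below parses a rule package and confirms that every `IdMatch`/`Match` reference points at a defined `Regex` or `Keyword`, and that each `Resource` `idRef` matches an `Entity` `id`. It is a partial pre-flight check under the assumption that the package uses the namespace shown in the PowerShell section; it does not validate against Microsoft's full schema, and `check_rule_package` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MCE = "http://schemas.microsoft.com/office/2011/mce"


def check_rule_package(xml_text: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"XML is not well-formed: {exc}"]

    rules = root.find(f"{{{MCE}}}Rules")
    if rules is None:
        return ["<Rules> element not found in the expected namespace"]

    def ids(tag: str) -> set:
        # Collect the id attribute of every element with this tag under Rules.
        return {el.get("id") for el in rules.iter(f"{{{MCE}}}{tag}")}

    entity_ids = ids("Entity")
    element_ids = ids("Regex") | ids("Keyword")

    problems = []
    for tag in ("IdMatch", "Match"):
        for el in rules.iter(f"{{{MCE}}}{tag}"):
            if el.get("idRef") not in element_ids:
                problems.append(
                    f"<{tag}> references undefined id {el.get('idRef')!r}")
    for el in rules.iter(f"{{{MCE}}}Resource"):
        if el.get("idRef") not in entity_ids:
            problems.append(
                f"<Resource> idRef {el.get('idRef')!r} matches no <Entity> id")
    return problems
```

A mismatch between the `Resource` `idRef` and the `Entity` `id` is easy to introduce when GUIDs are generated inline, so it is worth checking even when the XML is otherwise well-formed. Note that `Regex` or `Keyword` elements placed outside `Rules` are reported as undefined references, since the search is scoped to `Rules`.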
Issue: EDM Classifier Not Matching
Symptoms: EDM data uploaded but no matches occurring
Solutions:
- Verify hash upload completed successfully
- Check schema column names match data file headers
- Confirm searchable columns are marked correctly
- Allow 24-48 hours for initial indexing
- Verify data format matches expected pattern exactly
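A header mismatch between the schema and the data file is quick to rule out locally. The Python sketch below compares the `Field` names in an EDM schema with the first row of a CSV, assuming the schema uses the namespace shown in the PowerShell section and the data file has a header row. The function names are hypothetical, and the sample inputs are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def schema_fields(schema_xml: str) -> list:
    """Return the Field names declared in an EDM schema document."""
    root = ET.fromstring(schema_xml)
    return [field.get("name") for field in root.iter(f"{{{EDM_NS}}}Field")]


def header_mismatches(schema_xml: str, csv_text: str) -> dict:
    """Compare schema field names with the CSV header row (case-sensitive)."""
    fields = set(schema_fields(schema_xml))
    headers = set(next(csv.reader(io.StringIO(csv_text))))
    return {
        "missing_from_csv": sorted(fields - headers),
        "not_in_schema": sorted(headers - fields),
    }


schema = (
    '<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">'
    '<DataStore name="FSICustomerData" description="demo">'
    '<Field name="CustomerAccountNumber" searchable="true"/>'
    '<Field name="DateOfBirth" searchable="false"/>'
    '</DataStore></EdmSchema>'
)
data = "CustomerAccountNumber,DOB\nABC-123456-XY,1980-01-01\n"

# DateOfBirth is declared in the schema, but the file calls the column DOB.
print(header_mismatches(schema, data))
```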
Additional Resources
- Sensitive Information Types Overview
- Create Custom SITs
- Keyword Dictionaries
- Exact Data Match
- SIT Entity Definitions
- Test SITs
Related Controls
| Control | Relationship |
|---|---|
| Control 1.5 | SITs are used in DLP policy conditions |
| Control 1.6 | SITs enable AI data exposure monitoring |
| Control 1.7 | SIT detection events are logged |
| Control 1.10 | SITs used in communication compliance policies |
Support & Questions
For implementation support or questions about this control, contact:
- Information Protection Team: SIT creation and tuning
- Compliance Officer: Regulatory requirements for data classification
- Legal: Review of custom SIT definitions for MNPI/confidential
- IT Security: Integration with DLP and monitoring
Updated: Dec 2025
Version: v1.0 Beta (Dec 2025)
UI Verification Status: ❌ Needs verification