Control 1.13: Sensitive Information Types (SITs) and Pattern Recognition

Overview

Control ID: 1.13
Control Name: Sensitive Information Types (SITs) and Pattern Recognition
Regulatory Reference: FINRA 4511, SEC Reg S-P, GLBA 501(b), SOX 404
Setup Time: 2-4 hours (initial); ongoing refinement


Purpose

Sensitive Information Types (SITs) are the foundation of data classification and protection in Microsoft 365. For financial services, properly configured SITs enable automatic detection of:

  • Customer NPI: Social Security numbers, account numbers, tax IDs
  • Financial Data: Credit card numbers, bank accounts, trading symbols
  • Regulatory/Market Identifiers: FINRA CRD numbers, CUSIP, and internal identifiers
  • Proprietary Information: Trade secrets, material non-public information (MNPI)

SITs power DLP policies (Control 1.5), sensitivity labels, DSPM for AI (Control 1.6), and retention policies, making them critical for comprehensive FSI data governance.

US-only scope note

This control implementation guidance is U.S.-only:

  • Use U.S. identifiers (e.g., SSN, ITIN, U.S. bank account/routing formats, CUSIP, FINRA CRD) and organization-specific internal identifiers.
  • Avoid non-U.S. examples or patterns (for example, international bank identifiers). If your organization operates outside the U.S., create a separate addendum with jurisdiction-specific SITs and validation steps.

Prerequisites

Primary Owner Admin Role: Purview Info Protection Admin
Supporting Roles: Purview Compliance Admin

Required Licenses

  • Microsoft 365 E5 OR Microsoft 365 E5 Compliance (full functionality)
  • Microsoft 365 E3 (basic built-in SITs only)
  • Microsoft Purview Information Protection

Required Permissions

  • Compliance Administrator or Global Administrator (create/modify SITs)
  • Information Protection Admin (manage SITs)
  • Security Reader (view SITs and reports)

Purview tenant prerequisites (required for validation and evidence)

  1. Access the correct portal
    • Use the Microsoft Purview compliance portal: https://compliance.microsoft.com
    • Navigation (Dec 2025 UI): Solutions → Data classification → Classifiers

  2. Permissions to create SITs
    • Ensure the implementing admin is assigned a role group that includes Information Protection and Data Classification authoring permissions (commonly via Compliance Administrator or Information Protection Admin, depending on org policy).

  3. Permissions to validate SITs (Content explorer / Activity explorer)
    • Ensure the implementing admin (or designated tester) has permissions that allow:
      • Viewing classified content results in Content explorer
      • Viewing signal/events in Activity explorer (where applicable)
    • If those explorers are visible but empty or blocked, the cause is almost always a role/permission assignment gap.

  4. Data locations available for testing
    • Confirm at least one Microsoft 365 location is in use and in scope for scanning (SharePoint/OneDrive content is the most straightforward for SIT validation).
    • Confirm your test site/library is accessible to the test account and is not excluded by existing governance controls.

  5. Operational readiness
    • Identify a secure internal location for:
      • Custom SIT inventory exports
      • SIT pattern documentation
      • Evidence screenshots and test results

Dependencies

  • Control 1.5 (DLP): SITs are used in DLP policies
  • Control 1.6 (DSPM for AI): SITs enable AI data exposure monitoring
  • Control 1.7 (Audit Logging): SIT detection events logged

Pre-Setup Checklist

  • [ ] Inventory of sensitive data types handled by organization
  • [ ] Sample data patterns for custom SITs (sanitized)
  • [ ] Approval for custom SIT definitions from Compliance/Legal
  • [ ] Test environment for SIT validation
  • [ ] Documentation of internal identifier formats

Governance Levels

Baseline (Level 1)

Define and implement SITs for common financial data (SSN, account numbers, credit cards).

Recommended (Level 2-3)

Custom SITs for industry-specific sensitive data; regex patterns for internal identifiers.

Regulated/High-Risk (Level 4)

Comprehensive SIT library with ML-based detection; continuous updates per threat landscape.


Setup & Configuration

Step 1: Review Built-in Financial SITs

Portal Path: Microsoft Purview → Data classification → Classifiers → Sensitive info types

Microsoft provides numerous built-in SITs for financial data. Built-in SITs do not need to be enabled individually; review the following and confirm they are available in your tenant:

Essential Financial SITs

| SIT Name | Description | Use Case |
|---|---|---|
| U.S. Social Security Number (SSN) | 9-digit SSN with various formats | Customer identification |
| U.S. Bank Account Number | Bank account detection | Payment/transfer monitoring |
| Credit Card Number | Visa, MC, Amex, Discover patterns | PCI-DSS compliance |
| ABA Routing Number | 9-digit bank routing numbers | Wire transfer protection |
| U.S. Individual Taxpayer ID (ITIN) | 9-digit tax IDs starting with 9 | Tax document protection |
| U.S. Driver's License Number | State-specific DL patterns | Identity verification |
| U.S. Passport Number | 9-digit passport numbers | KYC documents |

Investment/Trading SITs

| SIT Name | Description | Use Case |
|---|---|---|
| CUSIP | 9-character security identifiers | Trading/portfolio data |
| IP Address | IPv4 and IPv6 addresses | System access monitoring |

To view all built-in SITs:

  1. Navigate to Purview → Data classification → Sensitive info types
  2. Filter by category: Financial
  3. Review description and detection patterns

Step 1A: Confirm you can access SIT authoring and validation screens

  1. Go to https://compliance.microsoft.com
  2. Navigate: Solutions → Data classification → Classifiers → Sensitive info types
  3. Validate:
    • EXPECTED: You can open a built-in SIT and view its patterns, confidence, and supporting evidence configuration.
  4. Optional but recommended for implementers:
    • Open Content explorer (same Solution area) to confirm you have viewing rights for later validation.

Step 2: Create Custom FSI SITs

Portal Path: Microsoft Purview → Data classification → Classifiers → Sensitive info types → + Create sensitive info type

Pattern design checklist (use for every custom SIT)

Before creating a custom SIT pattern, define:

  • Canonical format (what the identifier truly looks like)
  • Allowed variations (spaces, hyphens, prefixes, “CRD#”, etc.)
  • Corroborating context (keywords near the identifier that reduce false positives)
  • Exclusions (test prefixes, known non-sensitive lookalikes)
  • Target confidence (start higher; relax only after measured false-negative review)

Custom SIT 1: Internal Account Number

  1. Click + Create sensitive info type
  2. Configure:
    • Name: FSI-Internal-Account-Number
    • Description: "Detects internal customer account numbers in format XXX-XXXXXX-XX"
  3. Click Next → Create pattern
  4. Add primary element:
    • Type: Regular expression
    • Pattern: \b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b
    • Confidence level: High (85)
  5. Add supporting element (optional):
    • Keywords: "account", "acct", "customer number"
    • Character proximity: 300 characters
  6. Click Create
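
Before entering the pattern in the portal, it can help to exercise it offline against sanitized samples. The sketch below is illustrative only: it uses Python's re module as a stand-in for Purview's regex engine (behaviour may differ, for example in case handling), and the helper name find_account_numbers is hypothetical. It applies the primary pattern and reports whether one of the supporting keywords appears within 300 characters of each match.

```python
import re

# Pattern, keywords, and proximity copied from the steps above.
# Python's engine is a stand-in for Purview's; treat results as indicative only.
ACCOUNT_RE = re.compile(r"\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b")
KEYWORD_RE = re.compile(r"\b(?:account|acct|customer number)\b", re.IGNORECASE)
PROXIMITY = 300  # characters on either side of the match


def find_account_numbers(text):
    """Return (identifier, keyword_nearby) pairs for each candidate match."""
    results = []
    for m in ACCOUNT_RE.finditer(text):
        window = text[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        results.append((m.group(), bool(KEYWORD_RE.search(window))))
    return results


if __name__ == "__main__":
    sample = "Customer account ABC-123456-XY was reviewed on Monday."
    for value, corroborated in find_account_numbers(sample):
        print(value, "(keyword nearby)" if corroborated else "(no keyword nearby)")
```

A match with a nearby keyword corresponds to the higher-confidence case; a bare match is the one to scrutinize for false positives.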

Custom SIT 2: FINRA CRD Number

  1. Click + Create sensitive info type
  2. Configure:
    • Name: FSI-FINRA-CRD-Number
    • Description: "Detects FINRA Central Registration Depository numbers"
  3. Add pattern:
    • Type: Regular expression
    • Pattern: \b(?:CRD\s*#?\s*)?([1-9]\d{4,7})\b
    • Confidence level: Medium (75)
  4. Add supporting keywords:
    • "CRD", "registered representative", "broker", "advisor"
  5. Click Create
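
Because the "CRD" prefix in this pattern is optional, the expression also matches any standalone 5-8 digit number. A quick offline check makes that visible and shows why the supporting keywords matter. This is a sketch using Python's re module as a stand-in for Purview's engine; crd_candidates is a hypothetical helper name.

```python
import re

# Pattern copied from Custom SIT 2; Python's re engine stands in for Purview's.
CRD_RE = re.compile(r"\b(?:CRD\s*#?\s*)?([1-9]\d{4,7})\b")


def crd_candidates(text):
    """Return every digit run the pattern would treat as a possible CRD number."""
    return [m.group(1) for m in CRD_RE.finditer(text)]


if __name__ == "__main__":
    for line in ["Rep CRD# 1234567", "Invoice 20251231 paid", "Office ZIP 10001"]:
        print(f"{line!r} -> {crd_candidates(line)}")
```

The invoice date and ZIP code are both returned as candidates, so without keyword corroboration this SIT would likely be noisy. Consider requiring the keywords for the Medium (75) pattern rather than treating them as optional.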

Custom SIT 3: Material Non-Public Information (MNPI)

  1. Click + Create sensitive info type
  2. Configure:
    • Name: FSI-MNPI-Indicators
    • Description: "Detects potential material non-public information indicators"
  3. Add pattern using keywords:
    • Type: Keyword dictionary
    • Keywords:
      • "earnings announcement", "merger", "acquisition target"
      • "quarterly results", "guidance revision", "restructuring"
      • "SEC filing", "insider", "material information"
      • "trading restriction", "blackout period"
    • Confidence level: Medium (65)
  4. Add proximity requirement with financial entity mentions
  5. Click Create
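
The effect of the proximity requirement can be sketched offline: a keyword counts only when a financial entity name appears nearby. In the sketch below the entity list is a small sample, the 300-character window is an assumption (the step above does not state a value), and mnpi_hits is a hypothetical helper; Purview's own matching may behave differently.

```python
import re

# Keywords copied from the step above.
MNPI_TERMS = [
    "earnings announcement", "merger", "acquisition target",
    "quarterly results", "guidance revision", "restructuring",
    "SEC filing", "insider", "material information",
    "trading restriction", "blackout period",
]
# Sample entity names; replace with your own list.
ENTITIES = ["Goldman Sachs", "Morgan Stanley", "Wells Fargo"]
PROXIMITY = 300  # assumed window size in characters


def _alternation(terms):
    """Compile a case-insensitive whole-word alternation of the given terms."""
    body = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


TERM_RE = _alternation(MNPI_TERMS)
ENTITY_RE = _alternation(ENTITIES)


def mnpi_hits(text):
    """Return MNPI keywords that have an entity name within PROXIMITY characters."""
    hits = []
    for m in TERM_RE.finditer(text):
        window = text[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        if ENTITY_RE.search(window):
            hits.append(m.group().lower())
    return hits


if __name__ == "__main__":
    print(mnpi_hits("Draft merger terms with Morgan Stanley"))
    print(mnpi_hits("The merger of two bird species"))
```

Generic words such as "merger" or "insider" are common in ordinary text, which is why the entity corroboration is doing most of the work here.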

Custom SIT 4: Trade Details

  1. Click + Create sensitive info type
  2. Configure:
    • Name: FSI-Trade-Details
    • Description: "Detects trading activity patterns"
  3. Add pattern:
    • Type: Regular expression
    • Pattern: \b(BUY|SELL|HOLD)\s+\d+(?:,\d{3})*\s+(?:shares?|units?|contracts?)\s+(?:of\s+)?[A-Z]{1,5}\b
  4. Add supporting keywords:
    • "execute", "trade", "order", "position", "fill"
  5. Click Create
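
A short offline check of the trade pattern against sanitized sample sentences can surface gaps before the SIT is published. This sketch uses Python's re module, which is case-sensitive by default; whether Purview applies the same case handling is an assumption to confirm in a test tenant. trade_mentions is a hypothetical helper name.

```python
import re

# Pattern copied from Custom SIT 4; Python's re engine stands in for Purview's.
TRADE_RE = re.compile(
    r"\b(BUY|SELL|HOLD)\s+\d+(?:,\d{3})*\s+"
    r"(?:shares?|units?|contracts?)\s+(?:of\s+)?[A-Z]{1,5}\b"
)


def trade_mentions(text):
    """Return the full text of each trade instruction the pattern matches."""
    return [m.group(0) for m in TRADE_RE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "Please BUY 1,000 shares of MSFT today",
        "SELL 50 contracts ES",
        "buy 100 shares of MSFT",  # lowercase verb: not matched by this engine
    ]
    for s in samples:
        print(f"{s!r} -> {trade_mentions(s)}")
```

If lowercase or mixed-case instructions ("Buy 100 shares...") are common in your content, add them to the test set and widen the pattern accordingly.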

Step 3: Create Keyword Dictionaries

Portal Path: Microsoft Purview → Data classification → Classifiers → EDM classifiers → Keyword dictionaries

Create keyword dictionaries for complex detection:

FSI Competitor Names Dictionary

  1. Click Create keyword dictionary
  2. Configure:
    • Name: FSI-Competitor-Names
    • Description: "List of competitor companies for MNPI monitoring"
  3. Upload or enter keywords:
    Goldman Sachs
    Morgan Stanley
    JP Morgan
    Citigroup
    Bank of America
    Wells Fargo
    [Add your specific competitors]
    
  4. Click Create

Step 4: Configure Exact Data Match (EDM) Classification

Portal Path: Microsoft Purview → Data classification → Classifiers → EDM classifiers

EDM matches content against values taken from your actual customer records. The source data is hashed locally before upload, so only salted hashes (not clear-text PII) are stored in the cloud:

  1. Click + Create EDM classifier
  2. Define schema:
    • Name: FSI-Customer-Data-EDM
    • Description: "Exact match for customer account data"
  3. Add columns:
    • CustomerAccountNumber (searchable)
    • SSN (searchable)
    • CustomerName (supporting)
    • DateOfBirth (supporting)
  4. Configure matching:
    • Primary: CustomerAccountNumber
    • Corroborative: SSN + CustomerName
  5. Upload hashed data source (see PowerShell section)

Step 5: Test SIT Detection

Portal Path: Microsoft Purview → Data classification → Content explorer

  1. Navigate to Content explorer
  2. Filter by sensitive information type
  3. Review detected items
  4. Validate accuracy:
    • True positives: Correctly identified sensitive data
    • False positives: Non-sensitive data flagged
    • False negatives: Sensitive data missed
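
The three counts above can be turned into rates for the evidence file. The following is a minimal sketch of the standard precision and recall calculation. The function name detection_metrics and the sample counts are illustrative, not measured results.

```python
def detection_metrics(true_pos, false_pos, false_neg):
    """Compute precision and recall from manually reviewed detection counts.

    precision = share of flagged items that were truly sensitive
    recall    = share of truly sensitive items that were flagged
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # Illustrative counts only: 45 correct detections, 5 false alarms, 10 misses.
    metrics = detection_metrics(true_pos=45, false_pos=5, false_neg=10)
    print(f"precision={metrics['precision']:.2f} recall={metrics['recall']:.2f}")
```

Low precision points toward the false-positive tuning in Step 6; low recall points toward the false-negative tuning.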

Create test documents:

  1. Create a Word/Excel file with test data:
    Test Account: ABC-123456-XY
    Test SSN: 123-45-6789
    Test Credit Card: 4111-1111-1111-1111
    
  2. Upload to SharePoint
  3. Wait for classification (up to 24 hours)
  4. Verify SIT detection in Content Explorer
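
Card-number detectors commonly validate a Luhn checksum, so a test card number that fails the checksum may never be flagged. The sketch below is a generic Luhn check (not Purview's implementation) for confirming that a sanitized test value such as 4111-1111-1111-1111 is checksum-valid before it goes into a test document. The 12-digit minimum is an assumption for illustration.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Separators such as spaces and hyphens are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed minimum length for a card-like number
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    for candidate in ["4111-1111-1111-1111", "4111-1111-1111-1112"]:
        print(candidate, "valid" if luhn_valid(candidate) else "invalid")
```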

Step 6: Tune SIT Accuracy

Based on testing results, adjust SITs:

Reduce False Positives:

  1. Edit the SIT → Patterns
  2. Add exclusions:
    • Exclude patterns: Common false positive formats
    • Add keywords: Require supporting context
  3. Increase confidence threshold

Reduce False Negatives:

  1. Edit the SIT → Patterns
  2. Add pattern variations
  3. Lower confidence threshold (carefully)
  4. Add alternative keyword groups
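
One way to reason about exclusions before changing the SIT is to split offline matches into "kept" and "excluded" and review both lists. In this sketch the prefixes TST and DEM are hypothetical placeholders for whatever test or demo prefixes your organization uses, and split_matches is a hypothetical helper; the pattern is the internal account number regex from Step 2.

```python
import re

# Internal account number pattern from Step 2.
ACCOUNT_RE = re.compile(r"\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b")
# Hypothetical prefixes reserved for test and demo data.
EXCLUDED_PREFIXES = ("TST", "DEM")


def split_matches(text):
    """Return (kept, excluded) lists of matches based on the prefix exclusions."""
    kept, excluded = [], []
    for m in ACCOUNT_RE.finditer(text):
        value = m.group()
        if value.startswith(EXCLUDED_PREFIXES):
            excluded.append(value)
        else:
            kept.append(value)
    return kept, excluded


if __name__ == "__main__":
    kept, excluded = split_matches("Live ABC-123456-XY, sample TST-000001-00")
    print("kept:", kept)
    print("excluded:", excluded)
```

Reviewing the excluded list guards against an exclusion that is too broad and silently hides real identifiers.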

PowerShell Configuration

# Sensitive Information Types Configuration
# Requires: Exchange Online PowerShell (Security & Compliance)

# Connect to Security & Compliance PowerShell
Connect-IPPSSession -UserPrincipalName admin@contoso.com

# Get all sensitive information types
$AllSITs = Get-DlpSensitiveInformationType
Write-Host "Total SITs available: $($AllSITs.Count)"

# Filter for financial SITs
$FinancialSITs = $AllSITs | Where-Object {
   $_.Name -match "credit card|bank account|SSN|social security|ABA|tax"
}
$FinancialSITs | Select-Object Name, Description | Format-Table

# Get details of a specific SIT
Get-DlpSensitiveInformationType -Identity "U.S. Social Security Number (SSN)" |
    Select-Object Name, Description, RulePackageId

# Create a new custom SIT for internal account numbers
# The Entity id and the Resource idRef must be the same GUID, so generate it once
$EntityId = New-Guid
$RuleXML = @"
<?xml version="1.0" encoding="utf-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="$(New-Guid)">
    <Version build="0" major="1" minor="0" revision="0"/>
    <Publisher id="$(New-Guid)"/>
    <Details defaultLangCode="en">
      <LocalizedDetails langcode="en">
        <PublisherName>Contoso Financial</PublisherName>
        <Name>FSI Custom Rule Pack</Name>
        <Description>Custom SITs for financial services</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <Entity id="$EntityId" patternsProximity="300" recommendedConfidence="75">
      <!-- Higher confidence when a context keyword is near the identifier -->
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_InternalAccountNumber"/>
        <Match idRef="Keyword_AccountContext"/>
      </Pattern>
      <!-- Lower confidence for the identifier on its own -->
      <Pattern confidenceLevel="75">
        <IdMatch idRef="Regex_InternalAccountNumber"/>
      </Pattern>
    </Entity>
    <!-- Regex, Keyword, and LocalizedStrings are children of Rules, not of Entity or RulePackage -->
    <Regex id="Regex_InternalAccountNumber">\b[A-Z]{3}-\d{6}-[A-Z0-9]{2}\b</Regex>
    <Keyword id="Keyword_AccountContext">
      <Group matchStyle="word">
        <Term>account</Term>
        <Term>customer number</Term>
        <Term>acct</Term>
        <Term>client id</Term>
      </Group>
    </Keyword>
    <LocalizedStrings>
      <Resource idRef="$EntityId">
        <Name default="true" langcode="en-us">FSI Internal Account Number</Name>
        <Description default="true" langcode="en-us">Detects internal customer account numbers</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"@

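# Save XML and upload (create the output folder first if it does not exist)
New-Item -ItemType Directory -Path "C:\Governance\SITs" -Force | Out-Null
$RuleXML | Out-File "C:\Governance\SITs\FSI-Custom-SIT.xml" -Encoding UTF8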

# Create the rule package
New-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes("C:\Governance\SITs\FSI-Custom-SIT.xml"))

# Get custom SITs
Get-DlpSensitiveInformationType | Where-Object { $_.Publisher -eq "Contoso Financial" }

# Create keyword dictionary
$Keywords = @"
Goldman Sachs
Morgan Stanley
JP Morgan
Citigroup
Bank of America
Wells Fargo
"@

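# Keyword data is expected in Unicode (UTF-16) encoding, one term per line;
# verify against current Microsoft guidance before running in production
New-DlpKeywordDictionary -Name "FSI-Competitor-Names" `
    -Description "Major financial institution names for competitive intelligence monitoring" `
    -FileData ([System.Text.Encoding]::Unicode.GetBytes($Keywords))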

# Get all keyword dictionaries
Get-DlpKeywordDictionary | Select-Object Name, Description

# Test SIT detection against a sample string (no file is needed)
# Using Test-DataClassification cmdlet
$TestContent = "Customer account ABC-123456-XY with SSN 123-45-6789"
$TestResult = Test-DataClassification -TextToClassify $TestContent
# Matches are returned in the ClassificationResults property
$TestResult.ClassificationResults | Format-Table -AutoSize

# Export all custom SITs for documentation
$CustomSITs = Get-DlpSensitiveInformationType | Where-Object { $_.Publisher -ne "Microsoft Corporation" }
$CustomSITs | Select-Object Name, Description, Publisher, RulePackageId |
    Export-Csv "C:\Governance\SITs\Custom-SIT-Inventory-$(Get-Date -Format 'yyyyMMdd').csv" -NoTypeInformation

# EDM Classifier Setup (hash and upload data)
# Note: This requires the EDM Upload Agent installed locally

# Step 1: Create schema file
$SchemaXML = @"
<?xml version="1.0" encoding="utf-8"?>
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="FSICustomerData" description="Customer account data for exact matching">
    <Field name="CustomerAccountNumber" searchable="true"/>
    <Field name="SSN" searchable="true"/>
    <Field name="CustomerName" searchable="false"/>
    <Field name="DateOfBirth" searchable="false"/>
  </DataStore>
</EdmSchema>
"@
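New-Item -ItemType Directory -Path "C:\EDM" -Force | Out-Null
$SchemaXML | Out-File "C:\EDM\FSICustomerDataSchema.xml" -Encoding UTF8

# Step 2: Upload the schema with Security & Compliance PowerShell
# (the EDM Upload Agent hashes and uploads data; schemas are uploaded with New-DlpEdmSchema)
$EdmSchemaBytes = [System.IO.File]::ReadAllBytes("C:\EDM\FSICustomerDataSchema.xml")
New-DlpEdmSchema -FileData $EdmSchemaBytes -Confirm:$true

# Step 3: Authorize the agent, then hash the data file (run from the EDM Upload Agent directory)
# .\EdmUploadAgent.exe /Authorize
# .\EdmUploadAgent.exe /CreateHash /DataFile C:\EDM\CustomerData.csv /HashLocation C:\EDM\Hash /Schema C:\EDM\FSICustomerDataSchema.xml
# Optional: append /Salt <your-salt> to supply your own salt instead of a generated one

# Step 4: Upload hashed data (the hash file takes the data file name with an .EdmHash extension)
# .\EdmUploadAgent.exe /UploadHash /DataStoreName FSICustomerData /HashFile C:\EDM\Hash\CustomerData.EdmHash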

Write-Host "SIT configuration complete" -ForegroundColor Green

Financial Sector Considerations

Regulatory Alignment

| Regulation | SIT Requirement |
|---|---|
| GLBA 501(b) | Detect and protect customer NPI (SSN, account numbers) |
| SEC Reg S-P | Identify customer records for privacy protection |
| FINRA 4511 | Classify records containing customer information |
| PCI-DSS | Detect credit card numbers in in-scope systems |
| SOX 404 | Identify and protect financial reporting data |

Zone-Specific Configuration

Zone 1 (Personal Productivity)

SITs Used: Basic built-in SITs
Detection Mode: Alert only
Confidence Threshold: High (85+)
Action: Log for awareness

Zone 2 (Team Collaboration)

SITs Used: Built-in + basic custom
Detection Mode: Alert + educate
Confidence Threshold: Medium-High (75-85)
Action: Notify user, log to compliance

Zone 3 (Enterprise Managed)

SITs Used: Full library + EDM
Detection Mode: Block on high confidence
Confidence Threshold: Medium (65+)
Action: Block, notify compliance, require justification
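
The zone settings above can be expressed as a small lookup table, which makes the intended behaviour easy to review with stakeholders. This is an illustrative sketch, not a Purview API: the action labels and the action_for helper are hypothetical. It also encodes one assumption that should be confirmed with your policy owner: in Zone 3, matches between 65 and 84 notify compliance, and only matches at 85 or above are blocked ("Block on high confidence").

```python
# Zone -> (minimum confidence to act on, default action), taken from the zone settings above.
ZONE_POLICY = {
    1: (85, "log"),          # Zone 1: alert only, log for awareness
    2: (75, "notify_user"),  # Zone 2: notify user, log to compliance
    3: (65, "block"),        # Zone 3: block, notify compliance, require justification
}
HIGH_CONFIDENCE = 85


def action_for(zone, confidence):
    """Return the illustrative action label for a detection in the given zone."""
    threshold, action = ZONE_POLICY[zone]
    if confidence < threshold:
        return "ignore"
    # Assumption: Zone 3 blocks only on high confidence; lower matches go to compliance.
    if zone == 3 and confidence < HIGH_CONFIDENCE:
        return "notify_compliance"
    return action


if __name__ == "__main__":
    for zone, confidence in [(1, 75), (2, 75), (3, 65), (3, 90)]:
        print(f"zone {zone}, confidence {confidence}: {action_for(zone, confidence)}")
```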

FSI SIT Library Template

Create these custom SITs for comprehensive FSI coverage:

| SIT Name | Pattern/Keywords | Use Case |
|---|---|---|
| FSI-Internal-Account-Number | Company-specific format | Internal account detection |
| FSI-FINRA-CRD-Number | CRD# patterns | Registered rep identification |
| FSI-CUSIP-Extended | 9-char + trading context | Security identification |
| FSI-MNPI-Indicators | Earnings, merger, insider keywords | Insider trading prevention |
| FSI-Trade-Details | Buy/Sell patterns + quantities | Trading activity monitoring |
| FSI-Wire-Transfer | Wire instructions + amounts | Payment monitoring |
| FSI-Client-PII-Bundle | SSN + Name + Account | Customer data package |
| FSI-Net-Worth | Dollar amounts + wealth keywords | Privacy for HNW clients |

Verification & Testing

Verification Steps

  1. Confirm Built-in SITs Available:
    • Navigate to Purview → Data classification → Sensitive info types
    • Filter by "Financial"
    • EXPECTED: Core financial SITs listed and active

  2. Test SIT Detection:
    • Upload test document with sample sensitive data
    • Wait 24 hours for indexing
    • Check Content Explorer for detection
    • EXPECTED: Test data correctly identified

  3. Verify Custom SITs:
    • Navigate to Sensitive info types → Filter by Publisher (your org)
    • EXPECTED: All custom SITs created and active

  4. Test DLP Policy Integration:
    • Create a test DLP policy using custom SIT
    • Trigger policy with test content
    • EXPECTED: Policy fires on SIT detection

  5. Validate EDM Classifier:
    • Check EDM classifier status (Active/Indexing)
    • Test with exact data from source
    • EXPECTED: Exact match detected

Verification Evidence

  • [ ] Screenshot: Built-in financial SITs list
  • [ ] Export: Custom SIT definitions (XML)
  • [ ] Documentation: SIT pattern definitions and rationale
  • [ ] Screenshot: Content Explorer showing detections
  • [ ] Test results: True/false positive rates
  • [ ] Documentation: EDM schema and refresh schedule

Troubleshooting & Validation

Issue: SIT Not Detecting Expected Content

Symptoms: Known sensitive data not being flagged

Solutions:

  1. Verify SIT is active (not in test mode)
  2. Check regex pattern matches actual data format
  3. Ensure content has been indexed (can take 24 hours)
  4. Review confidence threshold - may be too high
  5. Check if keywords are required but missing from content
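  6. Test the pattern in a regex tester (regex101.com or similar) using sanitized samples only; do not paste real customer data into an external tool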

Issue: Too Many False Positives

Symptoms: Non-sensitive data triggering SIT alerts

Solutions:

  1. Add exclusion patterns (e.g., test data prefixes)
  2. Increase confidence level requirement
  3. Add required keywords for context
  4. Implement secondary elements for corroboration
  5. Use exact data match instead of regex for known values

Issue: Custom SIT Creation Fails

Symptoms: Error when uploading rule package XML

Solutions:

  1. Validate XML syntax
  2. Ensure all GUIDs are unique and valid format
  3. Check regex syntax in XML (escape special characters)
  4. Verify required XML namespace declarations
  5. Review error message for specific element issues
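
Several of these checks can be run locally before uploading. The sketch below is a rough lint, not a schema validator: it parses the rule package, confirms that the RulePack, Publisher, and Entity ids parse as GUIDs and are unique, that every IdMatch/Match idRef has a matching Regex or Keyword, and that every Resource idRef points at an Entity id. lint_rule_package is a hypothetical helper name; the element names are the ones used in the rule package in the PowerShell section.

```python
import uuid
import xml.etree.ElementTree as ET

MCE_NS = "http://schemas.microsoft.com/office/2011/mce"


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def lint_rule_package(xml_text):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"XML syntax error: {exc}"]

    problems = []
    if root.tag != f"{{{MCE_NS}}}RulePackage":
        problems.append("root is not RulePackage in the expected namespace")

    guids, entity_ids = [], set()
    defined, referenced, resources = set(), set(), set()
    for el in root.iter():
        name = _local(el.tag)
        if name in ("RulePack", "Publisher", "Entity"):
            guids.append(el.get("id", ""))
            if name == "Entity":
                entity_ids.add(el.get("id", ""))
        elif name in ("Regex", "Keyword"):
            defined.add(el.get("id", ""))
        elif name in ("IdMatch", "Match"):
            referenced.add(el.get("idRef", ""))
        elif name == "Resource":
            resources.add(el.get("idRef", ""))

    for value in guids:
        try:
            uuid.UUID(value)  # accepts the common GUID string formats
        except ValueError:
            problems.append(f"invalid GUID: {value!r}")
    if len(set(guids)) != len(guids):
        problems.append("GUIDs are not unique")
    for ref in sorted(referenced - defined):
        problems.append(f"idRef {ref!r} has no matching Regex or Keyword")
    for ref in sorted(resources - entity_ids):
        problems.append(f"Resource idRef {ref!r} does not match any Entity id")
    return problems
```

Usage: read the saved XML file into a string and print each item returned by lint_rule_package. A clean result does not guarantee the service will accept the package, but it rules out the most common structural mistakes.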

Issue: EDM Classifier Not Matching

Symptoms: EDM data uploaded but no matches occurring

Solutions:

  1. Verify hash upload completed successfully
  2. Check schema column names match data file headers
  3. Confirm searchable columns are marked correctly
  4. Allow 24-48 hours for initial indexing
  5. Verify data format matches expected pattern exactly
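
The column-name check in item 2 is easy to automate before hashing. The sketch below compares the Field names in an EDM schema file with the header row of the source CSV. It only reads the header line, so no customer values are processed. header_mismatches and schema_fields are hypothetical helper names; the namespace is the one used in the schema in the PowerShell section.

```python
import csv
import io
import xml.etree.ElementTree as ET

EDM_NS = "{http://schemas.microsoft.com/office/2018/edm}"


def schema_fields(schema_xml):
    """Return the Field names declared in an EDM schema document."""
    root = ET.fromstring(schema_xml)
    return [field.get("name") for field in root.iter(EDM_NS + "Field")]


def header_mismatches(schema_xml, csv_text):
    """Compare schema field names with the CSV header row (exact, case-sensitive)."""
    fields = schema_fields(schema_xml)
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return {
        "missing_in_csv": [f for f in fields if f not in header],
        "extra_in_csv": [h for h in header if h not in fields],
    }


if __name__ == "__main__":
    schema = (
        '<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">'
        '<DataStore name="FSICustomerData">'
        '<Field name="CustomerAccountNumber" searchable="true"/>'
        '<Field name="SSN" searchable="true"/>'
        "</DataStore></EdmSchema>"
    )
    print(header_mismatches(schema, "CustomerAccountNumber,Ssn\n"))
```

Any entry in either list is worth resolving before running the hash step, since a renamed or misspelled column is a common reason for an EDM classifier that never matches.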

Additional Resources


| Control | Relationship |
|---|---|
| Control 1.5 | SITs are used in DLP policy conditions |
| Control 1.6 | SITs enable AI data exposure monitoring |
| Control 1.7 | SIT detection events are logged |
| Control 1.10 | SITs used in communication compliance policies |

Support & Questions

For implementation support or questions about this control, contact:

  • Information Protection Team: SIT creation and tuning
  • Compliance Officer: Regulatory requirements for data classification
  • Legal: Review of custom SIT definitions for MNPI/confidential
  • IT Security: Integration with DLP and monitoring

Updated: Dec 2025
Version: v1.0 Beta (Dec 2025)
UI Verification Status: ❌ Needs verification