OpenRedaction

OpenRedaction is an open-source JavaScript/TypeScript library for detecting and redacting PII with a regex-first approach. It runs locally by default and can be combined with an optional hosted API for AI-assisted detection.

What is OpenRedaction?

OpenRedaction is a production-ready library that helps you keep sensitive data out of logs, prompts, and analytics pipelines. It combines 570+ curated regex patterns with advanced context-aware validation, checksum verification, and multiple redaction modes.

Key principles:

  • Regex-first: Pattern-based detection runs locally, which keeps it fast and private
  • Fully open source: MIT licensed, no vendor lock-in
  • Privacy-first: All detection happens locally by default
  • Production-ready: Battle-tested with 450+ passing tests
  • Hardened patterns: Advanced validation with checksums and context filtering

Installation

npm install openredaction

Basic Usage (Regex-Only)

The library works entirely with regex patterns by default. All detection happens locally in your application.

import { OpenRedaction } from 'openredaction';

const redactor = new OpenRedaction({
  redactionMode: 'placeholder'
});

const result = await redactor.detect('My name is John Smith and my email is john@example.com');

console.log(result.redacted);
// "My name is [NAME_XXXX] and my email is [EMAIL_XXXX]"

console.log(result.detections);
// [{ type: 'EMAIL', value: 'john@example.com', placeholder: '[EMAIL_XXXX]', ... }, ...]

Simple Redaction Example

import { OpenRedaction } from 'openredaction';

const redactor = new OpenRedaction({
  includeNames: true,
  includeEmails: true,
  includePhones: true,
  redactionMode: 'mask-middle'
});

const input = "Contact Sarah Jones at sarah@example.com or call +1 202-555-0110";
const { redacted } = await redactor.detect(input);

console.log(redacted);
// "Contact S***h J***s at s***@example.com or call +1 ***-***-0110"

Pre-processing for LLM Pipelines

import { OpenRedaction } from 'openredaction';

const redactor = new OpenRedaction({
  preset: 'gdpr',
  redactionMode: 'token-replace',
  deterministic: true
});

// sendToLLM is a placeholder for your own LLM client call; it is not part of openredaction
async function sanitizeForLLM(text: string) {
  const { redacted, redactionMap } = await redactor.detect(text);

  // Only the redacted text leaves the application
  const response = await sendToLLM(redacted);

  // Optionally restore original values for trusted destinations
  const restored = redactor.restore(response, redactionMap);
  return { redacted, restored, redactionMap };
}
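
To illustrate the idea behind deterministic token replacement and restoration, here is a minimal standalone sketch. It is not the library's implementation: the `tokenize` and `restore` helpers, the `RedactionMap` shape, the `[EMAIL_n]` token format, and the simplified email regex are all assumptions made for this example.

```typescript
// Hypothetical sketch: maps each token back to the original value it replaced.
type RedactionMap = Record<string, string>;

// Simplified email pattern, for illustration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function tokenize(
  text: string,
  map: RedactionMap = {},
): { redacted: string; map: RedactionMap } {
  // Reverse index so the same value always receives the same token.
  const byValue = new Map<string, string>(
    Object.entries(map).map(([token, value]) => [value, token] as [string, string]),
  );
  const redacted = text.replace(EMAIL, (match) => {
    let token = byValue.get(match);
    if (token === undefined) {
      token = `[EMAIL_${byValue.size + 1}]`;
      byValue.set(match, token);
      map[token] = match;
    }
    return token;
  });
  return { redacted, map };
}

function restore(text: string, map: RedactionMap): string {
  let out = text;
  for (const [token, value] of Object.entries(map)) {
    // split/join replaces every occurrence without regex escaping concerns.
    out = out.split(token).join(value);
  }
  return out;
}

const sample = tokenize("Mail a@x.io, cc b@y.org, bcc a@x.io");
console.log(sample.redacted); // Mail [EMAIL_1], cc [EMAIL_2], bcc [EMAIL_1]
console.log(restore(sample.redacted, sample.map)); // Mail a@x.io, cc b@y.org, bcc a@x.io
```

Because repeated values share one token, the model on the other side can still tell that two mentions refer to the same entity without seeing the value itself.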

PII Types & Patterns Overview

OpenRedaction detects 570+ PII patterns across multiple categories:

Personal Information

  • Email addresses
  • Phone numbers (US, UK, International)
  • Names (with context-aware validation)
  • Social Security Numbers (SSN)
  • Passports, Driver's Licenses

Financial (13+ patterns)

  • Credit Cards (with Luhn validation)
  • IBANs, Bank Accounts
  • Swift Codes, Routing Numbers
  • Cryptocurrency addresses
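
As an illustration of why checksum validation cuts down false positives, the following is a minimal sketch of the standard Luhn check that a regex match for a card number can be passed through. The function name `passesLuhn` and the 12 to 19 digit length bound are assumptions for this example, not the library's internals.

```typescript
// Hypothetical helper: returns true when the digits satisfy the Luhn checksum.
function passesLuhn(candidate: string): boolean {
  // Drop the spaces and hyphens commonly used as card number separators.
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false;
  // Walk from the rightmost digit, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111 1111 1111 1111")); // true
console.log(passesLuhn("4111 1111 1111 1112")); // false
```

A 16-digit order number or timestamp matches a card-shaped regex just as well as a real card does, and a random one passes the Luhn check only about one time in ten.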

Government IDs (50+ countries)

  • SSN, NINO, NHS Numbers
  • Tax IDs, VAT Numbers
  • Company Registration Numbers
  • ITIN, SIN, and more

Healthcare

  • Medical Record Numbers
  • NHS Numbers, CHI, EHIC
  • Health Insurance IDs
  • Prescription Numbers, DEA Numbers

Digital Identity

  • API Keys, OAuth Tokens
  • JWT, Bearer Tokens
  • Social Media IDs

Industries (25+)

  • Retail, Legal, Real Estate
  • Logistics, Insurance, Healthcare
  • Emergency Response, Hospitality
  • Professional Certifications, and more

Advanced Configuration

OpenRedaction is highly configurable. Pass options to the constructor to tailor detection and redaction:

const redactor = new OpenRedaction({
  // Toggle built-in categories
  includeNames: true,
  includeAddresses: false,
  includeEmails: true,
  
  // Filter by category or specific patterns
  categories: ['financial'],
  patterns: ['EMAIL', 'SSN'],
  
  // Add custom patterns
  customPatterns: [
    {
      type: 'EMPLOYEE_ID',
      regex: /EMP-\d{4}/g,
      priority: 10,
      placeholder: '[EMPLOYEE_ID_{n}]',
      severity: 'medium',
    },
  ],
  
  // Whitelist approved terms
  whitelist: ['ACME Corp'],
  
  // Redaction modes
  redactionMode: 'mask-all', // placeholder | mask-middle | mask-all | format-preserving | token-replace
  
  // Compliance presets
  preset: 'hipaa', // gdpr | hipaa | ccpa | finance | education | transportation
  
  // Advanced options
  deterministic: true,           // Stable placeholders for same value
  enableContextAnalysis: true,   // Context-aware filtering
  confidenceThreshold: 0.5,
  enableCache: true,
});
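
When writing a custom pattern, it can help to try the regex on its own first. The sketch below is a standalone illustration and does not use the library: the `CustomPattern` interface and `applyPattern` helper are hypothetical, and treating `{n}` as a running counter is an assumption about the placeholder template. It also adds `\b` word boundaries, which the `EMP-\d{4}` pattern above does not have, so that longer IDs such as `EMP-12345` are not partially matched.

```typescript
// Hypothetical shape mirroring a subset of the custom pattern fields.
interface CustomPattern {
  type: string;
  regex: RegExp;
  placeholder: string;
}

function applyPattern(text: string, pattern: CustomPattern): string {
  // Rebuild the regex so shared lastIndex state on a global regex cannot leak
  // between calls, and make sure the global flag is set.
  const flags = pattern.regex.flags.includes("g")
    ? pattern.regex.flags
    : pattern.regex.flags + "g";
  const regex = new RegExp(pattern.regex.source, flags);

  let n = 0;
  // Assumption: {n} is replaced by a counter that increments per match.
  return text.replace(regex, () => pattern.placeholder.replace("{n}", String(++n)));
}

const employeeId: CustomPattern = {
  type: "EMPLOYEE_ID",
  regex: /\bEMP-\d{4}\b/g,
  placeholder: "[EMPLOYEE_ID_{n}]",
};

console.log(applyPattern("EMP-1234 reports to EMP-9876", employeeId));
// [EMPLOYEE_ID_1] reports to [EMPLOYEE_ID_2]
```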

Common Presets

  • gdpr — General data protection defaults (EU)
  • hipaa — Health data emphasis (US)
  • ccpa — Consumer privacy defaults (California)
  • finance, education, transportation — Sector-focused bundles

Ecosystem

OpenRedaction is part of a broader ecosystem:

  • openredaction (this package) — Core library for local, regex-based PII detection and redaction
  • openredaction-api — Optional hosted API that wraps this library and provides AI-assisted detection with API keys and rate limiting
  • openredaction-site — Website and playground where you can try the library and hosted API in your browser

Using the Hosted API (Optional)

If you want AI-assisted detection or don't want to run your own server, you can call the OpenRedaction hosted API with an API key. Note: Regex-based self-hosted usage is completely free and doesn't require any API key.

// Call the hosted API directly with fetch
const response = await fetch('https://api.openredaction.com/ai-detect', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.OPENREDACTION_API_KEY!,
  },
  body: JSON.stringify({ 
    text: 'John Smith, john@example.com' 
  }),
});

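// Fail loudly on non-2xx responses instead of parsing an error body as a result
if (!response.ok) {
  throw new Error(`OpenRedaction API request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);
// { entities: [...], aiUsed: true }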

Important:

  • The hosted API provides AI-assisted detection and requires an API key
  • The core library (this package) runs entirely locally, is free, and does not include AI-assisted detection
  • For maximum privacy, use the library locally without any API calls

Limitations & Disclaimers

  • Best-effort detection: Regex-based detection may miss edge cases and context-dependent PII
  • Pattern coverage: The 570+ patterns are not exhaustive and may not cover every PII type
  • AI-assisted detection: AI-assisted detection is provided via the hosted API service, not via this library package
  • Manual review recommended: For highly sensitive use cases, manually review redacted output
  • No guarantees: This library is provided as-is without warranties. Use at your own risk

Contributing

We welcome contributions! OpenRedaction is fully open source and community-driven.

Contribution Areas

  • Pattern improvements: We maintain a regex hardening plan and welcome contributions to improve or extend patterns
  • Bug fixes: Report issues and submit fixes
  • Documentation: Help improve docs and examples
  • Tests: Add test cases for edge cases

Contribution Flow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests (if applicable)
  5. Submit a pull request

See CONTRIBUTING.md for our workflow, coding standards, and testing steps.

Hardening & Pattern Development

We continuously improve our regex patterns and detection accuracy. Recent improvements include:

  • Enhanced checksum validation for financial patterns
  • Context-aware false positive filtering
  • Separator normalization for international formats
  • Advanced validation for government IDs and crypto addresses
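
As one way to picture how separator normalization and checksum validation can work together, here is a minimal sketch that normalizes an IBAN candidate and then applies the standard mod-97 check. The `isValidIban` name and the set of accepted separators are assumptions for this example, and country-specific length rules are deliberately left out.

```typescript
// Hypothetical helper: normalizes separators, then runs the IBAN mod-97 check.
function isValidIban(input: string): boolean {
  // Normalize: strip spaces, dots, and hyphens, and uppercase the letters.
  const iban = input.replace(/[\s.-]/g, "").toUpperCase();
  // Country code, two check digits, then 11 to 30 alphanumerics (15 to 34 total).
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;

  // Move the first four characters to the end before computing the remainder.
  const rearranged = iban.slice(4) + iban.slice(0, 4);

  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    // Digits map to 0..9, letters A..Z map to 10..35.
    const value = code >= 65 ? code - 55 : code - 48;
    // Letters contribute two decimal digits, so shift by 100 instead of 10.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

console.log(isValidIban("GB82 WEST 1234 5698 7654 32")); // true
console.log(isValidIban("gb82-west-1234-5698-7654-32")); // true
console.log(isValidIban("GB83 WEST 1234 5698 7654 32")); // false
```

Computing the remainder incrementally avoids needing big-integer arithmetic for the 30-plus digit number the check is defined over.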

Community & Support

  • Report bugs or request features: Open a GitHub issue with details and reproduction steps
  • Questions or discussions: Use GitHub Discussions or issues to talk through ideas
  • Attribution: Mention "OpenRedaction" and link to this repository in research or production use

License

OpenRedaction is licensed under the MIT License.
