AI Smart Redact
AI Smart Redact detects and permanently removes sensitive information from PDFs. The service runs entirely within your infrastructure, so no customer data leaves your environment. It is built for regulated sectors (government, financial services, insurance, healthcare, and legal) that require full data sovereignty, provable compliance, and complete auditability.
How smart redaction works
AI Smart Redact processes documents through a four-stage pipeline.
- Input. An integrating system submits a PDF. AI Smart Redact encrypts it immediately.
- Detect. The detection engine identifies personally identifiable information (PII) using a hybrid of an AI model and a deterministic rules engine.
- Review. A reviewer inspects, dismisses, or adds detections, and then approves the set before any redaction is applied.
- Redact. AI Smart Redact creates a new PDF by copying only the visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
To bring the stack up with Docker Compose, refer to Get started with AI Smart Redact.
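The Review stage can be pictured as a small set operation: engine detections minus the reviewer's dismissals, plus the reviewer's additions. The following is a minimal sketch of that idea only. The `Detection` shape, the `apply_review` helper, and the sample values are hypothetical and are not part of the AI Smart Redact API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detected span of sensitive text (hypothetical shape, for illustration)."""
    page: int
    text: str
    entity_type: str
    source: str  # "ai", "rules", or "reviewer"


def apply_review(detected, dismissed, added):
    """Return the approved set: detections minus dismissals, plus reviewer additions."""
    kept = [d for d in detected if d not in dismissed]
    return kept + [a for a in added if a not in kept]


detected = [
    Detection(1, "Jane Doe", "PERSON", "ai"),
    Detection(1, "Acme Corp", "ORGANIZATION", "ai"),
    Detection(2, "CH93 0076 2011 6238 5295 7", "IBAN", "rules"),
]
# The reviewer dismisses one detection and marks a phone number the engine missed.
dismissed = [detected[1]]
added = [Detection(2, "+41 44 555 01 23", "PHONE", "reviewer")]

# Only this approved set would be handed to the Redact stage.
approved = apply_review(detected, dismissed, added)
for d in approved:
    print(d.page, d.entity_type, d.source)
```

Keeping the approved set separate from the raw detections mirrors the pipeline's guarantee that nothing is redacted before a reviewer signs off.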
Detection engine
AI Smart Redact combines two complementary detection approaches:
- AI model. A non-generative Named Entity Recognition (NER) model. It identifies context-dependent entities (people, organizations, addresses) and supports English, German, French, Italian, Spanish, Portuguese, and Dutch. The model works out of the box; no customer data is needed for training. Because it is non-generative, it only labels text that is already in the document, so it can’t hallucinate or produce new content.
- Rules engine. A deterministic pattern matcher for structured identifiers: credit card numbers, IBANs, account numbers, case IDs, and other domain-specific patterns. Each match is explainable, and checksum or format validation rejects false-positive matches.
You can extend both: add new PII entity types through configuration, and add new patterns without retraining the model. For the full pipeline and per-method details, refer to Detection.
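To illustrate how checksum validation can reject false-positive pattern matches, here is a minimal sketch that uses two well-known public algorithms: the Luhn check for card numbers and the ISO 13616 mod-97 check for IBANs. The function names and the candidate regex are assumptions made for this example. They are not the product's rule syntax or its internal implementation.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    s = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


# Hypothetical candidate pattern: 13 to 19 digits with optional separators.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def find_card_numbers(text: str) -> list[str]:
    """Keep only pattern matches that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


sample = "Card 4111 1111 1111 1111 and order 1234 5678 9012 3456."
print(find_card_numbers(sample))
print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

Both strings in `sample` match the digit pattern, but only the first passes the Luhn check, which is the kind of filtering that keeps order numbers and similar digit runs from being flagged as card numbers.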
Key features
AI Smart Redact provides:
- Self-hosted: Deploy in your own infrastructure. License validation is offline. Runtime usage reporting connects to the Pdftools licensing server, or to an on-premise License Gateway Service for air-gapped deployments.
- True redaction: The output PDF contains only visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
- Multilingual detection: The AI model recognizes context-dependent entities in English, German, French, Italian, Spanish, Portuguese, and Dutch out of the box.
- Human-in-the-Loop (HITL) review: A reviewer inspects the detections and approves the final set before any redaction is applied.
- Full audit trail: OpenTelemetry integration provides per-job traceability. Every detection and redaction action is logged for compliance verification.
Compliance
AI Smart Redact targets regulated industries where data sovereignty and provable handling are non-negotiable. The deployment model and the detection pipeline together cover the following regimes:
| Regulation or standard | How AI Smart Redact supports it |
|---|---|
| GDPR Art. 5(1)(b,c,e) | Purpose limitation, data minimization, and storage limitation through per-file AES-256-GCM encryption and crypto-erasure (refer to Data handling). |
| GDPR Art. 30 | Records of processing activities through the OpenTelemetry audit trail (every detection and redaction action is logged per job). |
| GDPR Art. 32 | Security of processing through encryption at rest, JWT-authenticated APIs, and self-hosted deployment that keeps data in your environment. |
| GDPR Art. 35 | Data protection impact assessment supported by deterministic rule matches and an explainable, non-generative AI model. |
| NIST SP 800-88 | Provable media sanitization through DEK token deletion (crypto-erasure makes encrypted files cryptographically unrecoverable). |
| Data sovereignty | Fully self-hosted on your infrastructure or in an air-gapped environment. Offline license validation. No customer data ever leaves your network. |
For the encryption mechanism, DEK token lifecycle, and erasure scenarios, refer to Data handling.
Data handling
AI Smart Redact treats every uploaded file as sensitive from the moment it arrives. Files are encrypted at rest with a per-file key, the key tokens live only as long as a job needs them, and deleting a token renders the underlying file cryptographically unrecoverable. The sections that follow cover the encryption scheme, where DEK tokens are cached during the human review workflow, and how crypto-erasure is triggered.
File encryption
AI Smart Redact encrypts each uploaded file at rest using AES-256-GCM with a unique per-file Data Encryption Key (DEK). The Manager doesn’t persist DEK tokens; it returns each token to the integrating system, which holds it. The Orchestrator caches tokens temporarily for the human review workflow only; refer to DEK token storage in the human review workflow. Without the token, the encrypted file is cryptographically unreadable.
DEK token storage in the human review workflow
During human review, the Orchestrator caches each DEK token until the reviewer finishes. Two backends are available:
| Backend | Configuration and usage |
|---|---|
| Redis (recommended) | Configure with Redis__ConnectionString on the Orchestrator. Deploy without persistence (no AOF, no RDB) so cached tokens are lost on restart, which is what guarantees crypto-erasure. |
| In-memory (fallback) | Used automatically when Redis__ConnectionString is empty. Single-instance only; tokens don’t survive a restart or scale across replicas. |
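
The behavior of a non-persistent token cache can be sketched in a few lines. The example below is an illustration under stated assumptions, not the Orchestrator's implementation: the class name, the 900-second TTL, and the injectable clock are all hypothetical. It shows why a cache that lives only in process memory loses every token on restart, and how an expired entry stops being served.

```python
import time


class InMemoryTokenCache:
    """Holds DEK tokens in process memory only; nothing is written to disk."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without waiting
        self._entries = {}   # file_id -> (token, expires_at)

    def put(self, file_id, token):
        self._entries[file_id] = (token, self._clock() + self._ttl)

    def get(self, file_id):
        entry = self._entries.get(file_id)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the token so it can no longer be used to decrypt.
            del self._entries[file_id]
            return None
        return token

    def delete(self, file_id):
        """Discard a token explicitly. Returns True if one was present."""
        return self._entries.pop(file_id, None) is not None


# Simulated clock (assumption for the demo): advance time by hand.
now = [0.0]
cache = InMemoryTokenCache(ttl_seconds=900, clock=lambda: now[0])
cache.put("file-1", "dek-token-abc")
before_expiry = cache.get("file-1")
now[0] = 901.0
after_expiry = cache.get("file-1")
print(before_expiry, after_expiry)
```

Because the dictionary exists only inside one process, a second replica would not see these tokens, which matches the single-instance limitation of the in-memory backend.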
Crypto-erasure
Deleting a DEK token makes the corresponding file permanently unrecoverable, even if encrypted blobs remain in backup storage. This supports provable deletion in line with General Data Protection Regulation (GDPR) Art. 5(1)(e) and NIST SP 800-88.
The following scenarios trigger crypto-erasure:
| Scenario | Result |
|---|---|
| Client deletes the DEK token | File is immediately and permanently unrecoverable. |
| DEK token time to live (TTL) expires | Server rejects further operations; file is unrecoverable. |
| Client calls DELETE /v1/files/{fileId} | Encrypted blob deleted; token discarded. |
For the regulations and standards this design supports, refer to Compliance.
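As a sketch of the third scenario, the snippet below builds a DELETE request for the `/v1/files/{fileId}` endpoint with the Python standard library. Only the path and the HTTP method come from this page. The base URL, the file ID, and the `Authorization: Bearer` header format are assumptions for illustration, so check the API reference for your deployment before relying on them.

```python
import urllib.request
from urllib.parse import quote


def build_delete_file_request(base_url, file_id, jwt):
    """Build (but do not send) a DELETE request for one uploaded file."""
    # Percent-encode the ID so it cannot alter the request path.
    url = f"{base_url.rstrip('/')}/v1/files/{quote(file_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumption: the JWT is passed as a bearer token.
        headers={"Authorization": f"Bearer {jwt}"},
    )


# Hypothetical host, port, and file ID.
req = build_delete_file_request("http://localhost:8080", "3f2a9c", "<your-jwt>")
print(req.get_method(), req.full_url)

# To send the request against a running deployment:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```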
Deployment
AI Smart Redact ships as Docker images and supports Docker Compose and Kubernetes deployments. The full CPU stack requires approximately 8.5 GB RAM and 9.5 CPU cores across the service containers. For the per-service breakdown, refer to System requirements. To bring the stack up, refer to Get started with AI Smart Redact.
A CUDA-compatible GPU is optional but recommended for higher detection throughput at scale. For more details, refer to Scale and Worker configuration.
Licensing
AI Smart Redact is licensed per deployment. For setup, refer to Licensing. To get a license or discuss pricing, contact sales.