
AI Smart Redact

AI Smart Redact detects and permanently removes sensitive information from PDFs. The service runs entirely within your infrastructure, so no customer data leaves your environment. It is built for regulated sectors (government, financial services, insurance, healthcare, and legal) that require full data sovereignty, provable compliance, and complete auditability.

How smart redaction works

AI Smart Redact processes documents through a four-stage pipeline.

Figure: AI Smart Redact workflow: upload document, detect sensitive data, review findings, apply redactions, download redacted output.
  1. Input. An integrating system submits a PDF. AI Smart Redact encrypts it immediately.
  2. Detect. The detection engine identifies personally identifiable information (PII) using a hybrid of an AI model and a deterministic rules engine.
  3. Review. A reviewer inspects, dismisses, or adds detections, and then approves the set before any redaction is applied.
  4. Redact. AI Smart Redact creates a new PDF by copying only the visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
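
The approval gate between the Review and Redact stages can be sketched as follows. This is a minimal illustration, not the product's data model: the `Detection` fields, the `ReviewSession` class, and the rule that any change resets an earlier approval are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Detection:
    """One proposed redaction. Field names are illustrative, not the product schema."""
    page: int
    text: str
    entity_type: str
    source: str  # "model", "rules", or "reviewer"


@dataclass
class ReviewSession:
    """Hypothetical container for the detections a reviewer is working on."""
    detections: list = field(default_factory=list)
    approved: bool = False

    def dismiss(self, detection: Detection) -> None:
        self.detections.remove(detection)
        self.approved = False  # assumption: any change invalidates an earlier approval

    def add(self, detection: Detection) -> None:
        self.detections.append(detection)
        self.approved = False

    def approve(self) -> None:
        self.approved = True


def detections_to_redact(session: ReviewSession) -> list:
    """Return the approved set, refusing to proceed without reviewer approval."""
    if not session.approved:
        raise PermissionError("review not approved; no redaction may be applied")
    return list(session.detections)
```

In this sketch, calling `detections_to_redact` before `approve()` raises an error, which mirrors the rule that nothing is redacted until the reviewer has signed off on the set.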
Ready to deploy?

Start with Get started with AI Smart Redact to bring the stack up with Docker Compose.

Detection engine

AI Smart Redact combines two complementary detection approaches:

  • AI model. A non-generative Named Entity Recognition (NER) model. It identifies context-dependent entities (people, organizations, addresses) and supports English, German, French, Italian, Spanish, Portuguese, and Dutch. The model works out of the box; no customer data is needed for training. It can’t hallucinate or produce output beyond text in the document.
  • Rules engine. A deterministic pattern matcher for structured identifiers: credit card numbers, IBANs, account numbers, case IDs, and other domain-specific patterns. Each match is explainable, and checksum or format validation rejects false-positive matches.

You can extend both: add new PII entity types through configuration, and add new patterns without retraining the model. For the full pipeline and per-method details, refer to Detection.
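
To illustrate how checksum validation can reject false-positive pattern matches, the sketch below pairs a candidate pattern with the Luhn check for card numbers and the ISO 13616 mod-97 check for IBANs. Both algorithms are public standards, but the regular expression and the function names are assumptions made for this example; they don't describe how the AI Smart Redact rules engine is configured or implemented.

```python
from __future__ import annotations

import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if the IBAN passes the ISO 13616 mod-97 check."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move the first four characters to the end, then map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_card_numbers(text: str) -> list[str]:
    """Return candidate matches that also pass the Luhn checksum."""
    return [m.group(0) for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group(0))]
```

A 16-digit reference number that happens to match the pattern fails the checksum and is dropped, while a well-formed card number is kept. This is the sense in which each match is explainable: the pattern and the checksum together are the full reason for the detection.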

Key features

AI Smart Redact provides:

  • Self-hosted: Deploy in your own infrastructure. License validation is offline. Runtime usage reporting connects to the Pdftools licensing server, or to an on-premise License Gateway Service for air-gapped deployments.
  • True redaction: The output PDF contains only visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
  • Multilingual detection: The AI model recognizes context-dependent entities in English, German, French, Italian, Spanish, Portuguese, and Dutch out of the box.
  • Human-in-the-Loop (HITL) review: A reviewer inspects the detections and approves the final set before any redaction is applied.
  • Full audit trail: OpenTelemetry integration provides per-job traceability. Every detection and redaction action is logged for compliance verification.

Compliance

AI Smart Redact targets regulated industries where data sovereignty and provable handling are non-negotiable. The deployment model and the detection pipeline together cover the following regimes:

Regulation or standard | How AI Smart Redact supports it
GDPR Art. 5(1)(b,c,e) | Purpose limitation, data minimization, and storage limitation through per-file AES-256-GCM encryption and crypto-erasure (refer to Data handling).
GDPR Art. 30 | Records of processing activities through the OpenTelemetry audit trail (every detection and redaction action is logged per job).
GDPR Art. 32 | Security of processing through encryption at rest, JWT-authenticated APIs, and self-hosted deployment that keeps data in your environment.
GDPR Art. 35 | Data protection impact assessment supported by deterministic rule matches and an explainable, non-generative AI model.
NIST SP 800-88 | Provable media sanitization through DEK token deletion (crypto-erasure makes encrypted files cryptographically unrecoverable).
Data sovereignty | Fully self-hosted on your infrastructure or in an air-gapped environment. Offline license validation. No customer data ever leaves your network.

For the encryption mechanism, DEK token lifecycle, and erasure scenarios, refer to Data handling.

Data handling

AI Smart Redact treats every uploaded file as sensitive from the moment it arrives. Files are encrypted at rest with a per-file key, the key tokens live only as long as a job needs them, and deleting a token renders the underlying file cryptographically unrecoverable. The sections that follow cover the encryption scheme, where DEK tokens are cached during the human review workflow, and how crypto-erasure is triggered.

File encryption

AI Smart Redact encrypts each uploaded file at rest using AES-256-GCM with a unique per-file Data Encryption Key (DEK). The Manager doesn’t persist DEK tokens; it returns each token to the integrating system, which holds it. The Orchestrator caches tokens temporarily for the human review workflow only; refer to DEK token storage in the human review workflow. Without the token, the encrypted file is cryptographically unreadable.

DEK token storage in the human review workflow

During human review, the Orchestrator caches each DEK token until the reviewer finishes. Two backends are available:

Backend | When to use
Redis (recommended) | Configure with Redis__ConnectionString on the Orchestrator. Deploy without persistence (no AOF, no RDB) so cached tokens are lost on restart, which is what guarantees crypto-erasure.
In-memory (fallback) | Used automatically when Redis__ConnectionString is empty. Single-instance only; tokens don’t survive a restart or scale across replicas.
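
The behavior of a memory-only token cache can be sketched as follows. This is a minimal illustration under stated assumptions, not the Orchestrator's implementation: the class name, its methods, and the injectable clock are hypothetical. It shows why such a cache can't survive a restart or be shared across replicas (the tokens live in a single process's dictionary) and how an expired entry is dropped rather than returned.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTokenCache:
    """Hypothetical TTL cache for DEK tokens, held only in process memory."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, file_id: str, token: str) -> None:
        """Cache a token together with its expiry time."""
        self._entries[file_id] = (token, self._clock() + self._ttl)

    def get(self, file_id: str) -> Optional[str]:
        """Return the token, or None if it is missing or expired."""
        entry = self._entries.get(file_id)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the token so it can never be handed out again.
            del self._entries[file_id]
            return None
        return token

    def discard(self, file_id: str) -> None:
        """Remove a token, for example when the reviewer finishes."""
        self._entries.pop(file_id, None)
```

Because nothing is written to disk, stopping the process discards every cached token. A Redis deployment without AOF or RDB persistence has the same property while remaining reachable from several Orchestrator replicas.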

Crypto-erasure

Deleting a DEK token makes the corresponding file permanently unrecoverable, even if encrypted blobs remain in backup storage. This supports provable deletion in line with General Data Protection Regulation (GDPR) Art. 5(1)(e) and NIST SP 800-88.

The following scenarios trigger crypto-erasure:

Scenario | Result
Client deletes the DEK token | File is immediately and permanently unrecoverable.
DEK token time to live (TTL) expires | Server rejects further operations; file is unrecoverable.
Client calls DELETE /v1/files/{fileId} | Encrypted blob deleted; token discarded.
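
As a rough sketch of the third scenario, the snippet below builds a DELETE request for the `/v1/files/{fileId}` endpoint using Python's standard library. The endpoint path comes from the table above. The base URL, the helper name, and the use of a `Bearer` authorization header are assumptions for illustration; check the API reference for the actual authentication scheme and response codes.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(base_url: str, file_id: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one uploaded file."""
    # Percent-encode the identifier so it is safe to place in the URL path.
    url = f"{base_url.rstrip('/')}/v1/files/{quote(file_id, safe='')}"
    return urllib.request.Request(
        url,
        method="DELETE",
        # Assumption: the JWT is sent as a Bearer token.
        headers={"Authorization": f"Bearer {jwt}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. The integrating system would typically also discard its own copy of the DEK token at this point.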

For the regulations and standards this design supports, refer to Compliance.

Deployment

AI Smart Redact ships as Docker images and supports Docker Compose and Kubernetes deployments. The full CPU stack requires approximately 8.5 GB RAM and 9.5 CPU cores across the service containers. For the per-service breakdown, refer to System requirements. To bring the stack up, refer to Get started with AI Smart Redact.

GPU acceleration

A CUDA-compatible GPU is optional but recommended for higher detection throughput at scale. For more details, refer to Scale and Worker configuration.

Licensing

AI Smart Redact is licensed per deployment. For setup, refer to Licensing. To get a license or discuss pricing, contact sales.