
This repository contains a reference implementation and deployment scaffold for a privacy-preserving PDF sanitization agent (Content Disarm & Reconstruction focused). The project is intended for defensive use only — to remove active content and sensitive metadata from incoming PDFs before distribution or storage.


zjncs/pdf-sanitizer-agent


pdf-sanitizer-agent


Project goals

- Provide a local-first, auditable pipeline to sanitize PDF files.
- Minimize retention of PII and full document text in logs by design.
- Offer configurable sanitization policies (light ↔ strong).
- Provide easy deployment patterns: CLI, Docker, AWS Lambda / S3 trigger, and email gateway examples.
- Be a developer-friendly open-source project with tests and CI.
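The light ↔ strong policies and the log-minimization goal could be modeled along these lines. This is a minimal sketch assuming two presets; `Policy`, `POLICIES`, `get_policy`, `audit_record` and all field names are hypothetical, not this repository's API. The audit record identifies an input only by its SHA-256 digest and byte size, so no filename, document metadata, or extracted text reaches the log.

```python
"""Hypothetical sketch: named sanitization policies plus a log-minimized audit record.

Policy, POLICIES, get_policy and audit_record are illustrative names, not this
project's API. The PDF structures named in comments are standard PDF features.
"""
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    strip_javascript: bool      # document-level /JavaScript and /JS actions
    strip_auto_actions: bool    # /OpenAction and /AA (additional actions)
    strip_embedded_files: bool  # /EmbeddedFiles name tree and file attachments
    strip_metadata: bool        # /Info dictionary and XMP /Metadata stream
    rasterize_pages: bool       # re-render pages as images; drops text, links, forms


# Assumed presets: "light" keeps page content and attachments,
# "strong" removes everything above and also rasterizes pages.
POLICIES = {
    "light": Policy(name="light", strip_javascript=True, strip_auto_actions=True,
                    strip_embedded_files=False, strip_metadata=True,
                    rasterize_pages=False),
    "strong": Policy(name="strong", strip_javascript=True, strip_auto_actions=True,
                     strip_embedded_files=True, strip_metadata=True,
                     rasterize_pages=True),
}


def get_policy(name: str) -> Policy:
    """Look up a preset by name, failing loudly on typos."""
    try:
        return POLICIES[name]
    except KeyError:
        raise ValueError(
            f"unknown policy {name!r}; expected one of {sorted(POLICIES)}"
        ) from None


def audit_record(pdf_bytes: bytes, policy: Policy, removed: dict[str, int]) -> str:
    """Build a JSON log line that identifies the input only by digest and size.

    No filename, title, author, or extracted text is included.
    """
    return json.dumps(
        {
            "sha256": hashlib.sha256(pdf_bytes).hexdigest(),
            "size_bytes": len(pdf_bytes),
            "policy": policy.name,
            "removed": dict(sorted(removed.items())),  # e.g. {"javascript": 2}
        },
        sort_keys=True,
    )
```

A plain digest still lets anyone who already holds a given file confirm that it was processed; if that matters, a keyed hash (for example `hmac` with a deployment secret) avoids it.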

Quick links

- License: MIT
- Language: Python (>=3.9)
- Tools used: pikepdf, qpdf, exiftool / mat2, ghostscript (optional), pytest
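The external tools above could be chained roughly as follows. This is a sketch under stated assumptions: the flags are standard options of Ghostscript, exiftool and qpdf, but `build_commands`, `run_commands`, the step order and the intermediate file names are hypothetical, not taken from this repository. The exiftool-then-qpdf order matters because exiftool edits PDFs through an incremental update, leaving the original metadata recoverable until the file is fully rewritten.

```python
"""Hypothetical sketch: chain external tools into one sanitization pipeline.

The flags are standard options of Ghostscript, exiftool and qpdf; the function
names, file layout and step order are illustrative, not this project's code.
"""
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_commands(src: Path, dst: Path, workdir: Path, strong: bool) -> list[list[str]]:
    """Return the argv lists for each step, in order, without running them."""
    cmds: list[list[str]] = []
    current = src
    if strong:
        # Optional Ghostscript pass: regenerate the PDF from Ghostscript's
        # interpretation of it instead of copying objects verbatim.
        # -dSAFER restricts file-system access during interpretation.
        distilled = workdir / "distilled.pdf"
        cmds.append([
            "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
            f"-sOutputFile={distilled}", str(current),
        ])
        current = distilled
    # Clear all writable metadata tags into a new file. For PDFs, exiftool
    # appends an incremental update, so the old values are still in the file.
    cleaned = workdir / "cleaned.pdf"
    cmds.append(["exiftool", "-all:all=", "-o", str(cleaned), str(current)])
    # Full rewrite with qpdf: only objects reachable from the current trailer
    # are written, which should drop the superseded metadata objects.
    cmds.append(["qpdf", "--linearize", str(cleaned), str(dst)])
    return cmds


def run_commands(cmds: list[list[str]]) -> None:
    """Execute each step, stopping at the first missing tool or failure."""
    for argv in cmds:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"required tool not on PATH: {argv[0]}")
        # check=True raises CalledProcessError on any non-zero exit status.
        # qpdf exits with 3 when it succeeds with warnings; this sketch
        # treats that as a failure.
        subprocess.run(argv, check=True, capture_output=True)
```

Keeping command construction separate from execution lets the pipeline be unit-tested without the tools installed. In use, `workdir` would typically come from `tempfile.TemporaryDirectory()`, for example `run_commands(build_commands(Path("in.pdf"), Path("out.pdf"), Path(tmp), strong=True))`. On Windows the Ghostscript console binary is usually named `gswin64c` rather than `gs`.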

