Run OpenAI's PII detector entirely in your browser.
100% local inference · WebGPU · no backend · no data transmission.
Try it now: labs.montevive.ai/openai-privacy-demo
Hosted on the Montevive Labs subdomain. First load fetches ~770 MB of model weights from the Hugging Face CDN (cached in your browser afterwards); every subsequent visit starts instantly. Open your browser's DevTools Network tab to verify for yourself that nothing is sent back to a server.
A small browser app that runs openai/privacy-filter
— OpenAI's bidirectional token classifier for personal data detection — entirely on the
user's device. Model weights are downloaded once from the Hugging Face CDN, cached in
IndexedDB, and inference runs on the local GPU via WebGPU (with a WASM CPU fallback for
browsers without WebGPU). There is no backend. There are no API calls. Your text never
leaves the tab it's typed into.
Built by Montevive.ai as a concrete example of the privacy-first techniques we advocate for. Secure AI for secure decisions.
- 100% local inference: model weights live in IndexedDB, tensors live on the user's GPU. No server, no API, no telemetry.
- WebGPU first, WASM fallback: uses `navigator.gpu` when available, falls back to ONNX Runtime Web on CPU otherwise.
- Adaptive precision: detects `shader-f16` support and picks the `q4f16` (772 MB) variant when it's safe, or `q4` (875 MB) otherwise. Manual override in an Advanced toggle.
- Pre-flight system check: shows WebGPU / `shader-f16` / GPU buffer / device memory / storage quota probes before any bytes are fetched. No auto-download.
- Web-Worker inference: keeps the UI thread responsive during model load and scoring.
- Masked output + entity table: 8 PII categories (`private_person`, `private_email`, `private_phone`, `private_url`, `private_address`, `private_date`, `account_number`, `secret`) with character-level spans and confidence scores.
- Light + dark theme: honors `prefers-color-scheme`, with a manual toggle persisted in `localStorage`.
- Deploy-anywhere static build: a single `BASE_PATH=/repo/ npm run build` produces a drop-in GitHub Pages site.
```bash
git clone https://github.com/montevive/openai-privacy-filter-web.git
cd openai-privacy-filter-web
npm install
npm run dev    # open http://localhost:5173
```

Requires Node 18+, a modern browser (Chrome 120+, Edge 120+, Safari 26+, or Firefox 145+ on macOS ARM), and ~1 GB of free IndexedDB storage on first visit.
```
┌──────────────┐   ┌────────────────┐   ┌────────────────────┐
│   App.tsx    │──►│   worker.ts    │──►│  transformers.js   │
│     (UI)     │   │  (Web Worker)  │   │      pipeline      │
└──────┬───────┘   └────────────────┘   └─────────┬──────────┘
       │                                          │
       │ postMessage { type: 'run', text }        │ fetch once
       ▼                                          ▼
 diagnostics.ts                         ┌────────────────────┐
 (WebGPU / CPU                          │  Hugging Face CDN  │
  capability probe)                     │ openai/privacy-fltr│
                                        └─────────┬──────────┘
                                                  │ cached in
                                                  ▼
                                        ┌────────────────────┐
                                        │ Browser IndexedDB  │
                                        └────────────────────┘
```
- Pre-flight. On mount, `src/diagnostics.ts` probes the browser: `navigator.gpu.requestAdapter()`, `adapter.features.has('shader-f16')`, `adapter.limits.maxBufferSize`, `navigator.deviceMemory`, `navigator.storage.estimate()`. It returns a recommended `{device, dtype}` pair and never fires a request for the model.
- User action. The Load model button is the only trigger for the ~800 MB download. Progress streams per-file from the HF CDN.
- Inference. `src/worker.ts` keeps a singleton `TokenClassificationPipeline` alive. Each input sentence is scored with `aggregation_strategy: "simple"`; character offsets are reconstructed locally (the BPE tokenizer doesn't expose them, so we walk the input with `indexOf`).
- Render. `src/App.tsx` shows a colour-coded masked view plus a table of `(label, text, score, range)` per detected entity.
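The offset reconstruction and masked rendering described in the steps above can be sketched roughly as follows. This is an illustration, not the code in `src/worker.ts` or `src/App.tsx`: the `RawEntity` shape, the function names `locateEntities` and `maskText`, and the `[label]` placeholder format are all assumptions, and the sketch presumes each entity's text occurs verbatim in the input and that entities arrive in reading order.

```typescript
// Shape loosely modelled on an aggregated token-classification result (an assumption).
type RawEntity = { entity_group: string; word: string; score: number };
type LocatedEntity = { label: string; text: string; score: number; start: number; end: number };

// Walk the input left to right, searching for each entity's text from the end
// of the previous match so repeated strings map to successive occurrences.
function locateEntities(input: string, raw: RawEntity[]): LocatedEntity[] {
  const located: LocatedEntity[] = [];
  let cursor = 0;
  for (const r of raw) {
    const needle = r.word.trim();
    if (needle.length === 0) continue;
    const start = input.indexOf(needle, cursor);
    if (start === -1) continue; // not found verbatim; skip rather than guess
    const end = start + needle.length;
    located.push({ label: r.entity_group, text: needle, score: r.score, start, end });
    cursor = end;
  }
  return located;
}

// Replace each located span with a [label] placeholder, keeping the rest intact.
function maskText(input: string, entities: LocatedEntity[]): string {
  let out = '';
  let pos = 0;
  for (const e of entities) {
    out += input.slice(pos, e.start) + `[${e.label}]`;
    pos = e.end;
  }
  return out + input.slice(pos);
}
```

Searching from a moving cursor rather than from index 0 is what keeps two identical email addresses in one sentence from collapsing onto the same span.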
| Browser | WebGPU | shader-f16 | Active variant | Notes |
|---|---|---|---|---|
| Chrome / Edge 120+ (Windows, Linux, macOS, macOS ARM) | ✅ | ✅ | `q4f16` (772 MB) | Best experience |
| Safari 26+ (macOS / iOS) | ✅ | ✅ | `q4f16` | Stable since Sept 2025 on macOS Tahoe |
| Firefox 145+ (macOS ARM) | ✅ | partial | `q4` or `q4f16` | WebGPU on Mac ARM; variable elsewhere |
| Safari ≤ 18 | ❌ | — | `q4f16` via WASM | Falls back to CPU (~1 s/sentence) |
| Chrome on Android (120+) | depends | device-specific | | Works on higher-end SoCs |
| Older desktop Linux without `shader-f16` | ✅ | ❌ | `q4` (875 MB) | Auto-selected; pure int4 |
If shader-f16 is missing, the app automatically picks q4. If WebGPU is unavailable, it falls back to the WASM CPU backend. Both choices are shown in the system-check card before anything is downloaded.
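The selection rule in the paragraph above could be expressed as a small pure function. This is a sketch under assumptions: `Capabilities` and `recommendBackend` are hypothetical names, not the ones used in `src/diagnostics.ts`, and the WASM branch returns `q4f16` only because the support table lists that variant for the CPU fallback.

```typescript
// Hypothetical input shape; the real probe lives in src/diagnostics.ts.
type Capabilities = { hasWebGPU: boolean; hasShaderF16: boolean };
type Recommendation = { device: 'webgpu' | 'wasm'; dtype: 'q4f16' | 'q4' };

function recommendBackend(caps: Capabilities): Recommendation {
  // No WebGPU: CPU fallback. The support table lists q4f16 for this path.
  if (!caps.hasWebGPU) return { device: 'wasm', dtype: 'q4f16' };
  // WebGPU without shader-f16: use the pure int4 variant.
  if (!caps.hasShaderF16) return { device: 'webgpu', dtype: 'q4' };
  // WebGPU with shader-f16: the default variant.
  return { device: 'webgpu', dtype: 'q4f16' };
}
```

In a browser, the two flags would come from `navigator.gpu.requestAdapter()` and `adapter.features.has('shader-f16')`. Keeping the decision separate from the probing makes it easy to unit-test without a GPU.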
All five ONNX variants published by OpenAI on the Hub are supported. Only the first two are exposed by default; the rest are reachable through the Advanced toggle.
| Dtype | File | On-disk | Best for |
|---|---|---|---|
| `q4f16` | `model_q4f16.onnx` | 772 MB | WebGPU with `shader-f16` (default) |
| `q4` | `model_q4.onnx` | 875 MB | WebGPU without `shader-f16` |
| `fp16` | `model_fp16.onnx` | 2.6 GB | Powerful devices prioritizing quality |
| `q8` | `model_quantized.onnx` | 1.5 GB | CPU fallback on older hardware |
Size → speed measurements on CPU are in our internal research notes.
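As an illustration of how the on-disk sizes relate to the storage-quota probe, here is a minimal sketch that checks whether a variant fits in the remaining quota. The sizes come from the table above (assuming 1 MB = 10^6 bytes); `fitsInStorage` and `StorageSnapshot` are hypothetical names, and treating an unknown quota as "does not fit" is a design assumption rather than the app's documented behaviour.

```typescript
type Dtype = 'q4f16' | 'q4' | 'fp16' | 'q8';

// On-disk sizes from the table above, in bytes (assuming 1 MB = 10^6 bytes).
const VARIANT_BYTES: Record<Dtype, number> = {
  q4f16: 772e6,
  q4: 875e6,
  fp16: 2.6e9,
  q8: 1.5e9,
};

// Mirrors the { quota, usage } fields returned by navigator.storage.estimate().
type StorageSnapshot = { quota?: number; usage?: number };

// True when the variant fits in the remaining quota.
// An unknown quota is treated as "cannot confirm", so it returns false.
function fitsInStorage(dtype: Dtype, snap: StorageSnapshot): boolean {
  if (snap.quota === undefined) return false;
  const free = snap.quota - (snap.usage ?? 0);
  return free >= VARIANT_BYTES[dtype];
}
```

In the browser the snapshot would be obtained with `await navigator.storage.estimate()` before the download is offered.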
```bash
npm run dev      # start Vite dev server on :5173
npm run build    # typecheck + production build → dist/
npm run preview  # serve the built dist/ locally
npm run lint     # eslint
```

```
.
├── index.html                 # theme boot + OG/Twitter meta
├── public/
│   └── img/
│       └── logo-montevive.png
├── src/
│   ├── App.tsx                # UI: Header, DiagnosticsPanel, ResultsPanel, Footer
│   ├── App.css                # Light + dark palettes, Montevive colors
│   ├── diagnostics.ts         # WebGPU / browser capability probes + recommendation
│   ├── main.tsx
│   ├── types.ts               # WorkerMessage + Entity + Diagnostics types
│   └── worker.ts              # Singleton transformers.js pipeline
├── vite.config.ts             # base: process.env.BASE_PATH ?? '/'
├── deploy/
│   ├── Dockerfile             # multi-stage: Vite build → nginx
│   ├── nginx.conf
│   ├── landing/               # labs.montevive.ai root landing page
│   ├── k8s/                   # Namespace, Deployment, Service, HTTPRoute, Certificate
│   └── README.md              # DNS / deploy / rollback docs
└── .github/workflows/
    └── publish.yml            # build + push to ghcr.io on push to main
```
To add a new diagnostics probe:

- Extend the `Diagnostics` interface in `src/types.ts`.
- Compute the new field in `runDiagnostics()` inside `src/diagnostics.ts`.
- Add a row to `DiagnosticsPanel` in `src/App.tsx` with a pass/warn/fail icon.
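The three steps above might look like this for a hypothetical `cpuThreads` probe. Everything here is illustrative: the field name, the `DiagnosticsSketch` interface, the helper names, and the pass/fail threshold are assumptions, not code from the repository.

```typescript
// Step 1 (src/types.ts): extend the interface. Only the new field is shown;
// `cpuThreads` is a hypothetical example, not a field the repo defines.
interface DiagnosticsSketch {
  cpuThreads: number | null;
}

type ProbeStatus = 'pass' | 'warn' | 'fail';

// Step 2 (src/diagnostics.ts): compute the field. The value is passed in so the
// function stays pure; in the browser it would be navigator.hardwareConcurrency.
function probeCpuThreads(hardwareConcurrency: number | undefined): DiagnosticsSketch {
  return { cpuThreads: hardwareConcurrency ?? null };
}

// Step 3 (src/App.tsx): map the field to the icon shown in the panel row.
// The threshold of 4 threads is an illustrative assumption.
function cpuThreadsStatus(d: DiagnosticsSketch): ProbeStatus {
  if (d.cpuThreads === null) return 'warn'; // browser did not report a value
  return d.cpuThreads >= 4 ? 'pass' : 'fail';
}
```

Returning `warn` rather than `fail` for a missing value keeps an unsupported probe from blocking users whose browsers simply do not expose it.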
This repo builds cleanly to a static bundle. From the root of the repository:
```bash
BASE_PATH=/openai-privacy-filter-web/ npm run build
```

Then publish `dist/` using the `actions/deploy-pages` workflow or by pushing to a `gh-pages` branch.
A minimal workflow (save as .github/workflows/pages.yml):
```yaml
name: Deploy to GitHub Pages
on:
  push: { branches: [main] }
  workflow_dispatch:
permissions: { pages: write, id-token: write, contents: read }
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci && BASE_PATH=/${{ github.event.repository.name }}/ npm run build
      - uses: actions/upload-pages-artifact@v3
        with: { path: dist }
  deploy:
    needs: build
    runs-on: ubuntu-latest
    # Block style here: an unquoted ${{ }} expression is not valid inside a YAML flow mapping.
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4
```

`npm run build` with no env vars produces a root-hosted site. All assets are fingerprinted, so long-cache headers are safe on everything except `index.html`.
- The app requests ~800 MB of model files from `huggingface.co` on first load. If you self-host, you'll need to mirror those files and point transformers.js at your mirror via `env.remoteHost`.
- WebGPU requires an HTTPS context outside of `localhost`. GitHub Pages / Netlify / Cloudflare Pages all qualify out of the box.
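If you do mirror the weights, it helps to know which URLs the mirror has to serve. The sketch below builds one such URL, assuming the loader resolves files as `{model}/resolve/{revision}/` under the remote host. That path layout, the `onnx/` file location, the mirror hostname, and the `mirrorUrl` helper are all assumptions to verify against your transformers.js version.

```typescript
// Builds the URL a mirror must serve for one model file, assuming a
// '{model}/resolve/{revision}/' path layout (an assumption about the loader's
// defaults; check your transformers.js version).
function mirrorUrl(remoteHost: string, model: string, revision: string, file: string): string {
  const host = remoteHost.replace(/\/+$/, '') + '/'; // normalise to exactly one trailing slash
  const template = '{model}/resolve/{revision}/';
  const path = template.replace('{model}', model).replace('{revision}', revision);
  return host + path + file;
}

// Hypothetical mirror host and file path, for illustration only.
const exampleMirrorUrl = mirrorUrl(
  'https://models.example.com',
  'openai/privacy-filter',
  'main',
  'onnx/model_q4f16.onnx',
);

// In the app, the mirror would be selected before the pipeline is created, roughly:
//   import { env } from '@huggingface/transformers';
//   env.remoteHost = 'https://models.example.com/';
```

Serving the files under the same relative layout as the Hub means only the host needs to change in the app.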
This is the whole point of the demo. To make it as honest as possible:
- No server-side inference. The repo has no backend. Inference runs entirely in the visitor's browser.
- No analytics or telemetry. No Google Analytics, no Plausible, no Sentry, no third-party scripts. The only network requests made after page load are to the Hugging Face CDN for model weights (once, then cached).
- No tracking cookies. The only things persisted are the theme preference (`localStorage`) and the model weights (IndexedDB).
- Your text is never transmitted. The textarea content never leaves the browser: it's passed by `postMessage` to a same-origin Web Worker and nothing else.
If you fork this and add analytics, please update this section so the statement remains literally true.
- Architecture. Pre-norm transformer encoder with grouped-query attention, 128-expert MoE, 50M active / 1.5B total parameters.
- Output. 33 BIOES token classes over 8 privacy categories, decoded with either HF's built-in `aggregation_strategy: "simple"` (what this demo uses) or a constrained Viterbi decoder (shipped with the model but not wired up in the browser yet).
- License. Apache 2.0; commercial use permitted.
- Model card. Full card (PDF).
- Disclaimer. The model's authors explicitly flag it as a "redaction and data-minimization aid, not an anonymization, compliance, or safety guarantee." High-stakes deployments should layer it with policy, audit and human review.
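The count of 33 classes follows from the BIOES scheme: four tags (Begin, Inside, End, Single) for each of the 8 categories, plus one Outside class, so 8 × 4 + 1 = 33. A minimal sketch that enumerates them is below; the `TAG-category` label spelling is an assumption, and the model's own config may name the classes differently.

```typescript
const CATEGORIES = [
  'private_person', 'private_email', 'private_phone', 'private_url',
  'private_address', 'private_date', 'account_number', 'secret',
];

// B(egin), I(nside), E(nd), S(ingle) per category, plus one O(utside) class.
// The "TAG-category" spelling is an illustrative assumption.
function bioesLabels(categories: string[]): string[] {
  const labels = ['O'];
  for (const c of categories) {
    for (const tag of ['B', 'I', 'E', 'S']) labels.push(`${tag}-${c}`);
  }
  return labels;
}
```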
This demo wouldn't exist without the work of several teams who chose to give their research away. Heartfelt thanks to:
- OpenAI: thank you for training the privacy-filter model and, crucially, for releasing it under Apache 2.0. An on-device PII detector with a permissive license is exactly what the ecosystem needed; the fact that we can run it in a browser tab, commercially, without phoning home, is a direct consequence of that choice. Extra thanks for shipping pre-quantized ONNX variants (including `q4f16`) right in the repo; the demo works out of the box because of that.
- Hugging Face: thank you for transformers.js and the whole WebGPU + ONNX pipeline stack. The v4 release turned "run any HF model in the browser" from a party trick into a boring one-liner, and we appreciate it. Thanks also for hosting the weights on the Hub and keeping the CDN fast.
- ONNX Runtime: thank you for the Web backend. The WebGPU execution provider (and the WASM fallback that picks up the slack on Safari ≤ 18) is what actually makes this fast on consumer hardware.
- The WebGPU working group: thank you for shipping a real GPU API to the browser. Running a 1.5B-parameter model on-device at ~50 ms/sentence is genuinely new, and it's only possible because you landed the standard.
- The `tokenizers` and `onnxruntime-web` maintainers: thank you for the countless hours of unglamorous work that make everything above Just Work™ for end users.
- Everyone who reported issues, wrote blog posts, and answered our questions while we were getting WebGPU + `shader-f16` + transformers.js v4 to cooperate: you made this a weekend instead of a month.
And of course, Montevive.ai built and published the demo itself — if it's useful to you, we'd love to hear about it.
Copyright © Montevive.ai. Licensed under the Apache License, Version 2.0. See LICENSE for the full text.
The underlying model is distributed separately by OpenAI under Apache 2.0.
Secure AI for secure decisions. We help companies make strategic use of AI safely, with legal compliance and without putting their information at risk. 100% AI, 99% security.
Built with ♥ by Montevive.ai