privacy is a DuckDB extension that automatically privatizes SQL aggregate queries. It supports two privacy mechanisms:
- PAC mode (default): Empirical membership-inference-attack (MIA) resistance via the PAC Privacy framework. Adds calibrated noise derived from the query's own variance over parallel sub-samples. Tight theoretical MI bounds; no per-query sensitivity analysis needed.
- DP-elastic mode: Formal (Ξ΅,Ξ΄)-differential privacy via elastic sensitivity. Computes a per-query sensitivity bound from the join structure (max frequency per table), then injects calibrated Laplace noise. Provides a provable Ξ΅-DP guarantee.
Both modes share the same DDL and require no changes to query syntax β the extension rewrites aggregate query plans transparently.
This works on DuckDB v1.5 and beyond. See https://duckdb.org/install to install. Or if you do not want to install anything: this extension is also distributed in WASM, so you can run the examples also in https://shell.duckdb.org in a browser.
INSTALL privacy FROM community;
LOAD privacy;Before running private queries, you declare the privacy structure using three DDL constructs:
PRIVACY_KEY (col) β Identifies which column is the privacy unit key. The privacy unit (PU) is the entity being protected: one PU = one customer, user, or individual. Composite keys are supported.
PRIVACY_LINK (local_col) REFERENCES table(col) β Declares that a table's rows belong to a PU via a foreign key. Links propagate privacy through joins: if lineitem links to orders which links to customer, then every lineitem row is tied to a customer (the PU).
PROTECTED (col) β Marks a column as sensitive. Protected columns cannot be returned directly; they can only be accessed inside aggregate functions (SUM, COUNT, etc.). On PU tables, all columns are protected by default.
-- Mark the entity being protected as the privacy unit
ALTER TABLE customer ADD PRIVACY_KEY (c_custkey);
ALTER TABLE customer SET PU;
-- Declare join paths so privacy propagates through the schema
ALTER TABLE orders ADD PRIVACY_LINK (o_custkey) REFERENCES customer(c_custkey);
ALTER TABLE lineitem ADD PRIVACY_LINK (l_orderkey) REFERENCES orders(o_orderkey);
-- Mark sensitive columns (optional on PU tables β all columns protected by default)
ALTER PU TABLE customer ADD PROTECTED (c_name);
ALTER PU TABLE customer ADD PROTECTED (c_acctbal);| PAC (default) | DP-elastic | |
|---|---|---|
| Privacy guarantee | Empirical MIA resistance, theoretical MI bound | Formal (Ξ΅,Ξ΄)-differential privacy |
| Noise calibration | From query variance across 64 sub-samples | From elastic sensitivity of the join tree |
| Joins | Missing PU joins injected automatically from PRIVACY_LINK | FK chain derived from PRIVACY_LINK metadata; explicit joins also work |
| SUM / AVG | Works automatically | Requires SET dp_sum_bound for value clipping |
| When to use | General-purpose; works on most queries | When you need a formal DP certificate |
SET privacy_mode = 'pac'; -- default
SET privacy_mode = 'dp_elastic'; -- formal Ξ΅-DPUsing the TPC-H benchmark. Customers place orders consisting of lineitems. Customer data and purchase history are sensitive.
-- Generate TPC-H database
INSTALL tpch;
LOAD tpch;
CALL dbgen(sf=1);
-- Declare privacy structure
ALTER TABLE customer ADD PRIVACY_KEY (c_custkey);
ALTER TABLE customer SET PU;
ALTER PU TABLE customer ADD PROTECTED (c_custkey);
ALTER PU TABLE customer ADD PROTECTED (c_comment);
ALTER PU TABLE customer ADD PROTECTED (c_acctbal);
ALTER PU TABLE customer ADD PROTECTED (c_name);
ALTER PU TABLE customer ADD PROTECTED (c_address);
ALTER TABLE orders ADD PRIVACY_LINK (o_custkey) REFERENCES customer(c_custkey);
ALTER TABLE lineitem ADD PRIVACY_LINK (l_orderkey) REFERENCES orders(o_orderkey);
ALTER TABLE orders ADD PROTECTED (o_comment);
ALTER TABLE lineitem ADD PROTECTED (l_comment);
-- Protected columns cannot be returned directly
SELECT c_name FROM customer;
-- Error: protected column 'customer.c_name' can only be accessed inside
-- aggregate functions (e.g., SUM, COUNT, AVG, MIN, MAX)
-- PAC rewrites the aggregate and injects the join to orders automatically
-- The noised result is close to the true answer but safe against MIA
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice) FROM lineitem GROUP BY ALL;
ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ
β l_returnflag β l_linestatus β sum(l_extendedprice) β
β varchar β varchar β decimal(38,2) β
ββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββ€
β A β F β 57278925373.44 β
β N β F β 1515625185.28 β
β N β O β 116295729152.00 β
β R β F β 57318996705.28 β
ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββ
-- The rewritten plan β note pac_noised_sum and the injected join to orders
EXPLAIN SELECT l_returnflag, l_linestatus, SUM(l_extendedprice) FROM lineitem GROUP BY ALL;
βββββββββββββββββββββββββββββ
β PERFECT_HASH_GROUP_BY β
β ββββββββββββββββββββ β
β Groups: #0 #1 β
β Aggregates: β
β pac_noised_sum(#2, #3) β
βββββββββββββββ¬ββββββββββββββ
βββββββββββββββ΄ββββββββββββββ
β HASH_JOIN β
β ββββββββββββββββββββ β
β Join Type: INNER β
β Conditions: #3 = #0 ββββββββββββββββ
β ~6,036,047 rows β β
βββββββββββββββ¬ββββββββββββββ β
βββββββββββββββ΄βββββββββββββββββββββββββββββ΄ββββββββββββββ
β SEQ_SCAN ββ PROJECTION β
β ββββββββββββββββββββ ββ ββββββββββββββββββββ β
β memory.main.lineitem ββ pac_pu=pac_hash(hash(#1)) β
β l_returnflag ββ #0 β
β l_linestatus ββ β
β l_extendedprice ββ β
β l_orderkey ββ β
β ~6,001,215 rows ββ ~1,500,000 rows β
ββββββββββββββββββββββββββββββββββββββββββββ¬ββββββββββββββ
βββββββββββββββ΄ββββββββββββββ
β SEQ_SCAN β
β ββββββββββββββββββββ β
β memory.main.orders β
β o_orderkey β
β o_custkey β
β ~1,500,000 rows β
βββββββββββββββββββββββββββββ
-- Each query uses a different hash function β results vary slightly per run
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice) FROM lineitem GROUP BY ALL;
ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ
β l_returnflag β l_linestatus β sum(l_extendedprice) β
β varchar β varchar β decimal(38,2) β
ββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββ€
β A β F β 58988885442.56 β
β N β F β 1613206650.88 β
β N β O β 119904634142.72 β
β R β F β 58803811778.56 β
ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββ
-- Disable noise to see the true answer
SET privacy_noise = false;
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice) FROM lineitem GROUP BY ALL;
ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ
β l_returnflag β l_linestatus β sum(l_extendedprice) β
β varchar β varchar β decimal(38,2) β
ββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββ€
β A β F β 56586554400.73 β
β N β F β 1487504710.38 β
β N β O β 114935210409.19 β
β R β F β 56568041380.90 β
ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββSame dataset and privacy structure. No explicit joins are needed β the extension derives the FK chain from PRIVACY_LINK metadata automatically.
-- (same DDL as above β reuse the privacy structure already declared)
SET privacy_mode = 'dp_elastic';
SET dp_epsilon = 1.0;
SET dp_delta = 1e-6;
SET dp_sum_bound = 100000; -- max absolute l_extendedprice per row, for clipping
SET privacy_seed = 42;
-- No joins needed: the extension derives lineitem β orders β customer from PRIVACY_LINK metadata
SELECT l_returnflag, l_linestatus, SUM(l_extendedprice) FROM lineitem GROUP BY ALL ORDER BY ALL;
ββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββββ
β l_returnflag β l_linestatus β sum(l_extendedprice) β
β varchar β varchar β decimal(38,2) β
ββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββ€
β A β F β 56589946031.64 β
β N β F β 1534396959.73 β
β N β O β 114528541709.94 β
β R β F β 56505612137.93 β
ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββββββββββThe extension computes elastic sensitivity as the product of per-table max frequencies along the lineitem β orders β customer chain (via auxiliary MAX(COUNT(*)) queries, not by executing the join), clips each value to [-dp_sum_bound, dp_sum_bound], and injects Laplace(sensitivity/Ξ΅) noise into the aggregate result. This gives a formal (Ξ΅,Ξ΄)-DP guarantee at the privacy-unit (customer) level.
- You declare which table is the privacy unit and which columns to protect.
- You link related tables with
PRIVACY_LINKto propagate privacy through joins. - The extension intercepts every aggregate query, hashes each PU key into a 64-bit value, and uses the bits to create 64 sub-samples (possible worlds). Each aggregate runs on all sub-samples independently. The final result is taken from one secret world, noised using the variance across all worlds. Each query uses a different hash function and selects a different secret world.
PAC bounds the mutual information (MI) between the query output and whether any individual is in the database. The pac_mi parameter sets this bound: at the default pac_mi = 1/128, an attacker gains at most 1/128 nats of information per query. PAC is not differential privacy β it provides empirical MIA resistance with theoretical MI bounds, not a formal Ξ΅-DP guarantee.
Uses elastic sensitivity (Flex, Johnson/Near/Song VLDB 2018) to derive a per-query upper bound on local sensitivity from the join structure. The sensitivity equals the product of per-table max frequencies along the FK join chain β computed from auxiliary MAX(COUNT(*)) queries, not by executing the user's join. Each SUM/AVG input is clipped to [-dp_sum_bound, dp_sum_bound] and calibrated Laplace noise is added to the aggregate result.
AVG(x) is rewritten to noised SUM(x) / COUNT(*). With k user-visible aggregates in a query the budget is split evenly under sequential composition β each consumes Ξ΅/k, and AVG further splits its share into Ξ΅/(2k) per component, so the total cost across the query stays at Ξ΅. WHERE, HAVING, and FILTER (WHERE β¦) aggregate clauses are honoured: the predicates run before noise is added (or, for HAVING, against the already-noised value, which is post-processing and DP-safe). When privacy_min_group_count is set, low-count groups are dropped via Ο-thresholding (a filter on the noised value, also post-processing).
This gives a formal (Ξ΅,Ξ΄)-DP guarantee at the user level β neighboring datasets differ by removing all rows belonging to one privacy unit across all linked tables. Requires Ξ΄ > 0 for smooth elastic sensitivity (tighter noise); set dp_delta = 0 for pure Ξ΅-DP with global elastic sensitivity.
| Setting | Default | Mode | Description |
|---|---|---|---|
privacy_mode |
pac |
both | Active privacy mechanism: pac or dp_elastic |
privacy_seed |
random | both | Fix seed for reproducible results |
privacy_noise |
true |
both | Toggle noise injection |
privacy_min_group_count |
NULL |
both | Suppress GROUP BY cells below this count (PAC: low-SNR suppression; DP: Ο-thresholding) |
pac_mi |
1/128 |
pac | Mutual information bound (higher = less noise, more leakage) |
pac_ctas |
true |
pac | Propagate PAC metadata through CTAS |
privacy_diffcols |
NULL |
both | Utility diff: compare noised vs exact results |
dp_epsilon |
1.0 |
dp_elastic | Privacy budget Ξ΅ |
dp_delta |
1e-6 |
dp_elastic | Failure probability Ξ΄ (0 = pure DP) |
dp_sum_bound |
required | dp_elastic | Clipping bound for SUM/AVG inputs: values are clipped to [-bound, bound] |
PAC mode supports SUM, COUNT, AVG, MIN, MAX, and COUNT(DISTINCT). DP-elastic supports SUM, COUNT, and AVG β MIN and MAX have unbounded sensitivity in DP and are not supported. Joins, subqueries (correlated and uncorrelated), UNION/UNION ALL, GROUP BY, HAVING, ORDER BY, and LIMIT all work. Window functions and EXCEPT/INTERSECT are not yet supported.
-- Create a new PU table
CREATE PU TABLE t (col1 INT, col2 INT, PRIVACY_KEY (col1), PROTECTED (col2));
-- Or convert an existing table
ALTER TABLE t ADD PRIVACY_KEY (col1);
ALTER TABLE t SET PU;
ALTER TABLE t UNSET PU;
-- Non-PU tables: use ALTER TABLE
ALTER TABLE orders ADD PRIVACY_LINK (fk_col) REFERENCES t(col1);
ALTER TABLE orders ADD PROTECTED (sensitive_col);
-- PU tables: use ALTER PU TABLE
ALTER PU TABLE t ADD PROTECTED (col2);
ALTER PU TABLE t DROP PROTECTED (col2);CREATE TABLE AS SELECT from a PU table automatically propagates privacy metadata to the new table (see syntax docs).
For implementation details, see the docs/ folder:
Parser | Query Operators | PAC Functions | Runtime Checks | Tests | Benchmarks
I. Battiston, D. Yuan, X. Zhu, P. Boncz. SIMD-PAC-DB: Pretty Performant PAC Privacy. 2026.
N. Johnson, J.M. Near, D. Song. Towards Practical Differential Privacy for SQL Queries (Flex / Elastic Sensitivity). VLDB 2018.
R. Wilson, C. Zhang, W. Lam, D. Desfontaines, D. Simmons-Marengo, B. Gipson. Differentially Private SQL with Bounded User Contribution. PoPETs 2020.
@misc{battiston2026simdpacdbprettyperformantpac,
title={SIMD-PAC-DB: Pretty Performant PAC Privacy},
author={Ilaria Battiston and Dandan Yuan and Xiaochen Zhu and Peter Boncz},
year={2026},
eprint={2603.15023},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/abs/2603.15023},
}This extension is maintained by @ila (ilaria@cwi.nl).