Skip to content

Tags: PostHog/duckgres

Tags

latest

Toggle latest's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore(httpfs): bump extension to v1.5.3-cred-refresh — mid-statement …

…credential recovery (#780)

Rolls the PostHog httpfs fork patch (PostHog/duckdb-httpfs branch
cred-refresh-read-path, tag commit b1fece6, based on the previously
deployed v1.5.3-stoi-fix) into both images and the worker-CD matrix
default row. On an auth failure (400/403) in any S3 request path —
range GET, full GET, HEAD, DELETE, and the multipart-upload POST/PUT —
the extension re-resolves the LATEST COMMITTED ducklake_s3 secret
(bypassing the statement's MVCC snapshot) and retries once with the
refreshed credentials.

Combined with the control plane's credential-refresh scheduler (which
keeps committing fresh STS credentials via CREATE OR REPLACE SECRET on
the worker's controlDB every <=30min), this removes the
statement-length ceiling on STS-backed DuckLake/Iceberg access: a
backfill DELETE/SELECT can now run for hours, surviving expiry of the
token it started with. Stock httpfs has no such recovery — pinned by
TestInFlightScanDiesOnCredentialRotation; the patched behavior is
proven by TestInFlightScanSurvivesRotationWithPatchedHTTPFS
(tests/integration, real MinIO rotation + revocation mid-scan, exact
row count).

Harness: assert_fork_extensions now expects the new build's SHA
(EXPECT_HTTPFS_SHA=b1fece6), so the e2e Job fails if workers ship any
other httpfs build; README documents why real in-statement STS expiry
is not soak-tested in-Job (900s AssumeRole floor vs Job budget) and
where the deterministic coverage lives.

Rollback: revert this commit — v1.5.3-stoi-fix release assets remain
published; the 1.5.2 matrix row is untouched.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

build-f414f19

Toggle build-f414f19's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(flightclient): stop doubling backslashes in interpolated string a…

…rgs (#776)

* fix(flightclient): stop doubling backslashes in interpolated string args

formatArgValue escaped backslashes C-style (\ -> \\) before wrapping
bound string parameters in single-quoted literals. DuckDB standard
'...' literals — like standard-conforming PostgreSQL — treat backslash
as a literal character, so the doubling silently corrupted every bound
string containing a backslash (Windows paths, regexes, JSON with
escapes) on the process/remote worker backends: 'a\b' arrived in the
table as a\\b. Only the single quote needs doubling in these
literals; the BLOB branch ('\x…'::BLOB) is unaffected because blob
casts do interpret \x escapes.

The new test round-trips interpolated literals through a real DuckDB
instance, so it encodes the engine's actual literal semantics rather
than an assumption about them.

* test(flightclient): satisfy errcheck on db.Close in backslash round-trip test

build-e28ec9d

Toggle build-e28ec9d's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(usersecrets): redact secret DDL anywhere in a multi-statement que…

…ry (#769)

* fix(usersecrets): redact secret DDL anywhere in a multi-statement query

RedactForLog only classified the statement head, so a client could prefix
any leading statement (e.g. "SELECT 1; CREATE PERSISTENT SECRET foo (...,
SECRET 'realKey')") and the head no longer classified as secret DDL — the
query, including credential literals, was returned verbatim and written to
logs, OTel spans, the persisted query log, and pg_stat_activity. This
violates the "Never log/store secret statement text" invariant.

RedactForLog now splits the query on top-level semicolons (reusing the
existing literal/identifier/comment-aware tokenization, not a naive
strings.Split) and, if ANY top-level segment classifies as CREATE SECRET,
replaces the whole query with the fixed redaction placeholder. The
single-statement fast path is unchanged. Over-redaction is intentional:
false positives only cost log fidelity, never credential exposure.

Also fixes the same head-only blind spot in the Flight SQL ingress
RejectPersistentSecretDDL check via a new ContainsPersistentSecretDDL
helper that scans every top-level statement.

Generated-By: PostHog Code
Task-Id: 96557416-3af0-4633-bcf0-164d1e64cf22

* fix(usersecrets): redact secret literals echoed in error messages

RedactForLog scrubs the query attribute, but engine error messages echo
the offending SQL verbatim (DuckDB: `LINE 1: ... SECRET 'literal'`), so a
failed CREATE SECRET leaked the credential through the unredacted error
sinks even though the query text was redacted:

  msg="Query execution errored." query="CREATE TEMPORARY SECRET badt (…redacted)"
    error="...Parser Error: ... SECRET 'realkey' BOGUS)"

This violated the same "never log/store secret statement text" invariant
the PR set out to enforce, just via the error channel instead of the
query channel. Reproduced live against the multi-tenant cluster.

Add usersecrets.RedactErrorForLog(query, errMsg): when the originating
query carries CREATE SECRET DDL (head or any top-level statement, shared
tokenizer with RedactForLog so they can't drift), the whole error is
replaced with a fixed placeholder. Over-redaction only costs diagnostic
detail, never credential exposure; non-secret errors pass through.

Apply it at every sink that logs a query's error: logQueryError and
logQueryFinished (slog `error` attr) and logQuery (query-log `Exception`
column), each classifying against the original query before it is
replaced with the redacted form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

build-d79ae9c

Toggle build-d79ae9c's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore(httpfs): bump extension to v1.5.3-cred-refresh — mid-statement …

…credential recovery (#780)

Rolls the PostHog httpfs fork patch (PostHog/duckdb-httpfs branch
cred-refresh-read-path, tag commit b1fece6, based on the previously
deployed v1.5.3-stoi-fix) into both images and the worker-CD matrix
default row. On an auth failure (400/403) in any S3 request path —
range GET, full GET, HEAD, DELETE, and the multipart-upload POST/PUT —
the extension re-resolves the LATEST COMMITTED ducklake_s3 secret
(bypassing the statement's MVCC snapshot) and retries once with the
refreshed credentials.

Combined with the control plane's credential-refresh scheduler (which
keeps committing fresh STS credentials via CREATE OR REPLACE SECRET on
the worker's controlDB every <=30min), this removes the
statement-length ceiling on STS-backed DuckLake/Iceberg access: a
backfill DELETE/SELECT can now run for hours, surviving expiry of the
token it started with. Stock httpfs has no such recovery — pinned by
TestInFlightScanDiesOnCredentialRotation; the patched behavior is
proven by TestInFlightScanSurvivesRotationWithPatchedHTTPFS
(tests/integration, real MinIO rotation + revocation mid-scan, exact
row count).

Harness: assert_fork_extensions now expects the new build's SHA
(EXPECT_HTTPFS_SHA=b1fece6), so the e2e Job fails if workers ship any
other httpfs build; README documents why real in-statement STS expiry
is not soak-tested in-Job (900s AssumeRole floor vs Job budget) and
where the deterministic coverage lives.

Rollback: revert this commit — v1.5.3-stoi-fix release assets remain
published; the 1.5.2 matrix row is untouched.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

build-d8dd53a

Toggle build-d8dd53a's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(flightclient): stop doubling backslashes in interpolated string a…

…rgs (#776)

* fix(flightclient): stop doubling backslashes in interpolated string args

formatArgValue escaped backslashes C-style (\ -> \\) before wrapping
bound string parameters in single-quoted literals. DuckDB standard
'...' literals — like standard-conforming PostgreSQL — treat backslash
as a literal character, so the doubling silently corrupted every bound
string containing a backslash (Windows paths, regexes, JSON with
escapes) on the process/remote worker backends: 'a\b' arrived in the
table as a\\b. Only the single quote needs doubling in these
literals; the BLOB branch ('\x…'::BLOB) is unaffected because blob
casts do interpret \x escapes.

The new test round-trips interpolated literals through a real DuckDB
instance, so it encodes the engine's actual literal semantics rather
than an assumption about them.

* test(flightclient): satisfy errcheck on db.Close in backslash round-trip test

build-cedafa4

Toggle build-cedafa4's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(s3): survive STS token expiry — freshness floor + httpfs fork cre…

…dential refresh proof (#778)

* fix(controlplane): raise STS freshness floor to 35min; surface expiry on config-store aws path

DuckDB statements capture S3 credentials when they start executing (secret
lookups resolve against the statement's MVCC snapshot), so the credential-
refresh scheduler's CREATE OR REPLACE SECRET push never reaches an in-flight
statement. With a 10min cache safety margin, a statement could begin on
near-expiry cached STS creds and die with ExpiredToken after as little as
10 minutes — which is what killed a long-running events-backfill DELETE.

- Raise stsCacheSafetyMargin 10min -> 35min: every credential capture point
  (worker activation, scheduler refresh push) now hands out creds with at
  least 35min of runway, and since the margin exceeds the scheduler's 30min
  lookahead, a refresh push always stamps the worker out of the "due"
  window — eliminating the 5s identical-creds re-push churn observed in
  prod (~700 push events / 15min).
- Move credentialRefreshLookahead to package scope and add a compile-time
  guard that fails the build if the margin ever drops to/below the
  lookahead.
- Fix a latent bug: the config-store "aws" provider path brokers STS creds
  but returned no expiration, leaving S3CredentialsExpiresAt nil — the
  refresh scheduler never refreshed those workers, so any session
  outliving the 1h token died. The path now returns the STS expiry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(s3): pin no-mid-statement-credential-recovery; STS session-duration knob; always pin SESSION_TOKEN

Investigation result (REFUTES the rotating-credential-file design explored on
this branch): DuckDB v1.5.3's refresh-on-403 hook (TryRefreshS3Secret) lives
only in S3FileHandle::Initialize and fires only when the file open performs a
network request. It never does for scan workloads: the S3 glob (ListObjectsV2),
DuckLake, and Iceberg file lists all pre-populate file_size/etag/last_modified
in extended_info, which marks the handle initialized and skips the HEAD — the
first auth failure surfaces on a range GET in the read path, which has no
refresh handling, and the statement dies. Verified empirically against MinIO
(an in-flight 48-file scan died on a newly-opened file's first GET despite the
credential file being rewritten and the secret CREATE OR REPLACE'd first) and
at source level in PostHog/duckdb-httpfs v1.5.3-stoi-fix, duckdb/ducklake, and
duckdb/duckdb-iceberg v1.5. The credential_chain + rotating-file mechanism
would therefore only guarantee freshness for NEW statements — which plain
CREATE OR REPLACE SECRET rotation already achieves — so it is not included;
the 35-min freshness floor (26fede7) remains the only (and honest) guarantee.

What this commit ships instead:

- tests/integration/credential_rotation_pin_test.go: pins the limitation the
  freshness floor is designed around — an in-flight scan does NOT survive
  credential rotation (MVCC secret snapshot + no HEAD at open), while a fresh
  statement on the rotated secret does. If a DuckDB upgrade ever makes the
  in-flight scan survive, the test fails loudly telling us to revisit the
  floor.
- DUCKGRES_STS_SESSION_DURATION (env-only): overrides the 1h AssumeRole
  session duration (clamped to AWS's 900s floor) so a soak test can exercise
  real in-statement expiry. credentialRefreshLookahead (duration/2) and
  stsCacheSafetyMargin (lookahead+5m, the historical 35m at default) now
  derive from it; the margin>lookahead invariant holds by construction and is
  unit-tested.
- buildConfigSecret always emits SESSION_TOKEN (empty pins "no token"):
  httpfs copies a host AWS_SESSION_TOKEN env var into the global
  s3_session_token setting at load and secret lookup falls back to settings
  for omitted keys, signing with a mismatched (key, token) pair.
- tests/e2e-mw-dev/README.md: documents why the freshness floor cannot be
  asserted in-Job and where its coverage lives.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(s3): prove mid-statement credential recovery with patched httpfs fork

The PostHog httpfs fork (PostHog/duckdb-httpfs branch
cred-refresh-read-path) adds what DuckDB v1.5.3 lacks: on an auth
failure (400/403) in any S3 request path it re-resolves the LATEST
COMMITTED secret — bypassing the statement's MVCC snapshot, so the
credential-refresh scheduler's CREATE OR REPLACE SECRET pushes are
picked up mid-statement — and retries once with the new credentials.
That removes the statement-length ceiling the STS freshness floor could
only push to ~35-60min.

Restructure the credential-rotation pin test into a shared scenario
(rotate to user B on a side connection, then disable user A while a
glob scan is in flight) driving two tests:

- TestInFlightScanDiesOnCredentialRotation (unchanged semantics): the
  STOCK extension still dies on the first post-revocation range GET —
  keeps pinning the upstream limitation so we notice if DuckDB gains
  recovery natively.
- TestInFlightScanSurvivesRotationWithPatchedHTTPFS (new): with
  DUCKGRES_TEST_PATCHED_HTTPFS pointing at a locally-built patched
  extension, the SAME in-flight scan survives rotation + revocation
  with an exact row count (no loss or duplication on retry).

Verified locally against MinIO: stock dies at T+4.7s with 403; patched
completes 1,920,000 rows at T+25s with the starting credentials revoked
at T+4.2s.

Also update the stsCacheSafetyMargin doc: once a fork release with the
patch is pinned via HTTPFS_EXTENSION_TAG in Dockerfile.worker, the
freshness floor becomes defense-in-depth rather than the only guarantee
for long statements.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(controlplane): clamp DUCKGRES_STS_SESSION_DURATION to the 1h role-chaining ceiling

Review finding (codex): values above 1h passed straight through to
AssumeRole. STS does not truncate an oversized DurationSeconds — it
REJECTS the call (role chaining via EKS Pod Identity caps at 3600s, and
every duckling-* role has MaxSessionDuration=3600), so an accidentally
high env value (e.g. "2h") would break activation for every org.
Clamp down to maxSTSSessionDuration=1h with a loud warning, mirroring
the existing 900s floor clamp; TestResolveSTSSessionDuration covers
both bounds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

build-b29f4c9

Toggle build-b29f4c9's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(duckdbservice): wipe all user secrets between sessions, not just …

…persistent (#771)

The cross-user isolation wipe at session create only dropped PERSISTENT
secrets. Clients can create non-persistent secrets via plain CREATE SECRET
or CREATE TEMPORARY SECRET, which the control plane passes through verbatim
to the org's worker DuckDB. DuckDB secrets are instance-global and survive
connection eviction, so a temporary secret created by user A of an org
lingered on the hot-idle worker pod and was inherited by the next user B of
the same org, who could then read A's private buckets or enumerate A's
credentials via duckdb_secrets().

Rename wipeUserPersistentSecrets to wipeUserSecrets and make it enumerate
every secret (SELECT name, persistent FROM duckdb_secrets()), dropping all
non-allowlisted ones with the storage-correct DROP form (PERSISTENT vs
TEMPORARY per the persistent flag). System-managed secrets are preserved via
usersecrets.IsReservedName (ducklake_s3 / iceberg_sigv4 / iceberg_oauth plus
the __default_*/duckgres_* prefixes), which activation re-creates. The
mandatory fail-the-session-on-wipe-error semantics at CreateSession and the
best-effort wipe at DestroySession are unchanged.

Adds a temporary-secret regression unit test and a TEMPORARY-secret variant
to the e2e persistent_user_secret_isolation harness assertion.

Generated-By: PostHog Code
Task-Id: 96557416-3af0-4633-bcf0-164d1e64cf22

build-4573179

Toggle build-4573179's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
refactor query outcome metrics (#779)

build-941882a

Toggle build-941882a's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(usersecrets): redact secret DDL anywhere in a multi-statement que…

…ry (#769)

* fix(usersecrets): redact secret DDL anywhere in a multi-statement query

RedactForLog only classified the statement head, so a client could prefix
any leading statement (e.g. "SELECT 1; CREATE PERSISTENT SECRET foo (...,
SECRET 'realKey')") and the head no longer classified as secret DDL — the
query, including credential literals, was returned verbatim and written to
logs, OTel spans, the persisted query log, and pg_stat_activity. This
violates the "Never log/store secret statement text" invariant.

RedactForLog now splits the query on top-level semicolons (reusing the
existing literal/identifier/comment-aware tokenization, not a naive
strings.Split) and, if ANY top-level segment classifies as CREATE SECRET,
replaces the whole query with the fixed redaction placeholder. The
single-statement fast path is unchanged. Over-redaction is intentional:
false positives only cost log fidelity, never credential exposure.

Also fixes the same head-only blind spot in the Flight SQL ingress
RejectPersistentSecretDDL check via a new ContainsPersistentSecretDDL
helper that scans every top-level statement.

Generated-By: PostHog Code
Task-Id: 96557416-3af0-4633-bcf0-164d1e64cf22

* fix(usersecrets): redact secret literals echoed in error messages

RedactForLog scrubs the query attribute, but engine error messages echo
the offending SQL verbatim (DuckDB: `LINE 1: ... SECRET 'literal'`), so a
failed CREATE SECRET leaked the credential through the unredacted error
sinks even though the query text was redacted:

  msg="Query execution errored." query="CREATE TEMPORARY SECRET badt (…redacted)"
    error="...Parser Error: ... SECRET 'realkey' BOGUS)"

This violated the same "never log/store secret statement text" invariant
the PR set out to enforce, just via the error channel instead of the
query channel. Reproduced live against the multi-tenant cluster.

Add usersecrets.RedactErrorForLog(query, errMsg): when the originating
query carries CREATE SECRET DDL (head or any top-level statement, shared
tokenizer with RedactForLog so they can't drift), the whole error is
replaced with a fixed placeholder. Over-redaction only costs diagnostic
detail, never credential exposure; non-secret errors pass through.

Apply it at every sink that logs a query's error: logQueryError and
logQueryFinished (slog `error` attr) and logQuery (query-log `Exception`
column), each classifying against the original query before it is
replaced with the redacted form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

build-660c0e0

Toggle build-660c0e0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix(flightsqlingress): never log raw Flight session token (#768)

The x-duckgres-session token is a standalone bearer credential — token-only
auth grants full session access. reconnectByToken logged the raw token in
plaintext on two reachable error paths (durable record load failure and
reconnect failure), which fire under normal ops (CP restart/failover, worker
OwnerEpoch mismatch) for LIVE valid tokens. The global slog redaction handler
only scrubs password=<value> patterns, so a bare token attr passed through
verbatim to stderr and the OTLP/log exporter, letting anyone with log read
access lift and replay it to hijack the victim's org-bound session.

Fix:
- Add tokenFingerprint() returning the first 8 hex chars of sha256(token) and
  log "token_fp" instead of the raw "token" at both sites. Auth behavior is
  unchanged; only logging changes.
- Defense-in-depth: the cliboot RedactingHandler now scrubs any slog attr whose
  key is "token" to [REDACTED] (token_fp is preserved).
- Unit tests for tokenFingerprint (deterministic, != input, fixed length,
  empty-safe) and for the key-based redaction.

Generated-By: PostHog Code
Task-Id: 96557416-3af0-4633-bcf0-164d1e64cf22