fix race condition in sandbox pack lock#8183

Merged: BridgeAR merged 1 commit into master from fix-sandbox-lock-race on Apr 30, 2026

@rochdev rochdev commented Apr 30, 2026

What does this PR do?

Fix race condition in sandbox pack lock.

Motivation

The race condition causes the process to hang.

Explanation from Claude:

The hang is the 60-second retry backoff in execHelperAsync (line 408):

1. bun pm pack fails on the first try (transient bun issue or resource contention).
2. execHelperAsync catches the error, waits 60 seconds, then retries.
3. The retry usually succeeds, which matches "then go through the retry so it can finish".

That 60-second sleep is what you're experiencing as the hang. Since bun pm pack runs without a timeout and prints no output during the wait, it looks completely frozen.
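
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual execHelperAsync: the name `runWithOneRetry`, its parameters, and the log message are hypothetical. Only the "fail, wait 60 seconds, retry once" shape comes from the explanation.

```javascript
'use strict'

const { setTimeout: sleep } = require('node:timers/promises')

// Hypothetical stand-in for the retry logic: run `task`, and if it rejects,
// wait `retryDelayMs` before trying exactly one more time.
async function runWithOneRetry (task, retryDelayMs = 60_000) {
  try {
    return await task()
  } catch (firstError) {
    // Without a message like this, the wait is silent and looks like a hang.
    console.error(`first attempt failed (${firstError.message}); retrying in ${retryDelayMs} ms`)
    await sleep(retryDelayMs)
    // A second failure propagates to the caller.
    return task()
  }
}
```

Logging before the sleep is an assumption added here for illustration; it shows one way a silent 60-second wait could be told apart from a frozen process.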

There's also a latent bug in the locking: if both attempts fail, packTarball throws, and the finally block at line 488 only closes the file descriptor — it never deletes the lock file. Any subsequent call to packTarballWithLock then sees EEXIST and enters the while (!existsSync(tarballPath)) polling loop at line 480 forever, because the tarball will never appear. That would be a true infinite hang rather than just a 60-second delay.

The bug has two parts:

Bug 1: When packTarball fails, the finally block closes the file descriptor but never deletes the lock file. So on a subsequent call, fs.open(lockFile, 'wx') throws EEXIST and the code enters the while (!existsSync(tarballPath)) polling loop — but since no one holds the lock anymore, the tarball will never appear, and the loop runs forever.

Bug 2: The polling loop only exits when the tarball appears. It has no way to detect that the lock holder gave up.
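
The stale-lock condition behind Bug 1 can be reproduced in isolation. The sketch below is not the project's code: the function name, the temporary directory, and the file name `pack.lock` are made up for illustration. It only shows that closing a descriptor opened with the `'wx'` flag leaves the file on disk, so the next exclusive open fails with EEXIST.

```javascript
'use strict'

const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Returns what a second acquirer observes after the first holder closed its
// descriptor without unlinking the lock file.
function reproduceStaleLock () {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stale-lock-'))
  const lockFile = path.join(dir, 'pack.lock') // hypothetical file name

  const fd = fs.openSync(lockFile, 'wx') // 'wx' fails if the path already exists
  fs.closeSync(fd) // closing the descriptor does not remove the file

  try {
    fs.closeSync(fs.openSync(lockFile, 'wx'))
    return 'acquired'
  } catch (err) {
    return err.code
  } finally {
    fs.rmSync(dir, { recursive: true, force: true })
  }
}

console.log(reproduceStaleLock()) // prints "EEXIST"
```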

The fix: delete the lock file in finally, and in the polling loop, check if the lock disappeared without a tarball and retry from scratch.

The fix in two parts:

finally now deletes the lock file — so if packing fails, the lock is released and the next call can acquire it and try again. Previously the comment "no need to clean up" was wrong: the temp directory is per-process (created fresh each run via id() on line 21), so the lock file would persist across all the createSandbox calls within the same test run.

Polling loop checks for a vanished lock — after each 100ms sleep it checks whether the lock file is gone but the tarball still isn't there. If so, the lock holder failed, and this waiter retries from scratch rather than looping forever.
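
Taken together, the two parts of the fix might look roughly like the sketch below. It is an approximation under stated assumptions, not the merged implementation: `packWithLock`, its parameters, and the up-front check for an existing tarball are hypothetical, and `pack` stands in for whatever actually produces the tarball. The 100ms poll interval and the "lock gone, no tarball, retry from scratch" rule come from the description.

```javascript
'use strict'

const fs = require('node:fs')
const { setTimeout: sleep } = require('node:timers/promises')

// Hypothetical sketch: `pack` is any async function that writes `tarballPath`.
async function packWithLock (lockFile, tarballPath, pack, pollMs = 100) {
  while (true) {
    // Assumption: an existing tarball means there is nothing left to do.
    if (fs.existsSync(tarballPath)) return tarballPath

    let handle
    try {
      handle = await fs.promises.open(lockFile, 'wx') // exclusive create
    } catch (err) {
      if (err.code !== 'EEXIST') throw err
      // Someone else holds the lock. Wait for the tarball, but stop waiting
      // if the lock vanishes without one, because the holder has failed.
      while (!fs.existsSync(tarballPath) && fs.existsSync(lockFile)) {
        await sleep(pollMs)
      }
      continue // either return the tarball or retry from scratch
    }

    try {
      await pack(tarballPath)
      return tarballPath
    } finally {
      // Release the lock on success and on failure.
      await handle.close()
      await fs.promises.rm(lockFile, { force: true })
    }
  }
}
```

Closing the handle before removing the file is a deliberate ordering choice in this sketch, since some platforms refuse to delete a file that is still open.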

Additional Notes

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.19%. Comparing base (96c92d9) to head (9e0dd47).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8183      +/-   ##
==========================================
+ Coverage   71.83%   77.19%   +5.36%     
==========================================
  Files         732      813      +81     
  Lines       33452    37658    +4206     
==========================================
+ Hits        24030    29070    +5040     
+ Misses       9422     8588     -834     
Flag Coverage Δ
aiguard-macos 35.82% <ø> (-0.09%) ⬇️
aiguard-ubuntu 35.93% <ø> (-0.09%) ⬇️
aiguard-windows 35.72% <ø> (-0.09%) ⬇️
apm-capabilities-tracing-macos 48.15% <ø> (?)
apm-capabilities-tracing-ubuntu-active 48.23% <ø> (?)
apm-capabilities-tracing-ubuntu-latest 48.20% <ø> (?)
apm-capabilities-tracing-ubuntu-maintenance 48.23% <ø> (?)
apm-capabilities-tracing-ubuntu-oldest 48.21% <ø> (?)
apm-capabilities-tracing-windows 48.02% <ø> (?)
apm-integrations-aerospike-18-gte.5.2.0 34.97% <ø> (-0.09%) ⬇️
apm-integrations-aerospike-20-gte.5.5.0 34.99% <ø> (-0.09%) ⬇️
apm-integrations-aerospike-22-gte.5.12.1 34.99% <ø> (-0.09%) ⬇️
apm-integrations-aerospike-22-gte.6.0.0 34.99% <ø> (-0.09%) ⬇️
apm-integrations-aerospike-eol- 34.89% <ø> (-0.09%) ⬇️
apm-integrations-child-process 36.15% <ø> (-0.09%) ⬇️
apm-integrations-confluentinc-kafka-javascript-18 41.83% <ø> (-0.09%) ⬇️
apm-integrations-confluentinc-kafka-javascript-20 41.85% <ø> (-0.09%) ⬇️
apm-integrations-confluentinc-kafka-javascript-22 41.85% <ø> (-0.09%) ⬇️
apm-integrations-confluentinc-kafka-javascript-24 41.78% <ø> (-0.09%) ⬇️
apm-integrations-couchbase-18 35.15% <ø> (-0.09%) ⬇️
apm-integrations-couchbase-eol 35.23% <ø> (-0.09%) ⬇️
apm-integrations-dns 35.00% <ø> (-0.09%) ⬇️
apm-integrations-elasticsearch 35.55% <ø> (-0.09%) ⬇️
apm-integrations-http-latest 42.98% <ø> (-0.08%) ⬇️
apm-integrations-http-maintenance 43.04% <ø> (-0.08%) ⬇️
apm-integrations-http-oldest 43.05% <ø> (-0.08%) ⬇️
apm-integrations-http2 40.35% <ø> (-0.09%) ⬇️
apm-integrations-kafkajs-latest 41.72% <ø> (-0.08%) ⬇️
apm-integrations-kafkajs-oldest 41.77% <ø> (-0.09%) ⬇️
apm-integrations-net 35.66% <ø> (-0.09%) ⬇️
apm-integrations-next-11.1.4 29.43% <ø> (-0.08%) ⬇️
apm-integrations-next-13.2.0 31.28% <ø> (-0.09%) ⬇️
apm-integrations-next-gte.10.2.0.and.lt.11 23.33% <ø> (ø)
apm-integrations-next-gte.11.0.0.and.lt.13 31.29% <ø> (-0.09%) ⬇️
apm-integrations-next-gte.13.0.0.and.lt.14 31.54% <ø> (-0.09%) ⬇️
apm-integrations-next-gte.14.0.0.and.lte.14.2.6 31.36% <ø> (-0.09%) ⬇️
apm-integrations-next-gte.14.2.7.and.lt.15 31.36% <ø> (-0.09%) ⬇️
apm-integrations-next-gte.15.0.0 31.42% <ø> (-0.09%) ⬇️
apm-integrations-prisma-18-gte.6.16.0.and.lt.7.0.0 35.52% <ø> (-0.09%) ⬇️
apm-integrations-prisma-latest-all 35.85% <ø> (-0.09%) ⬇️
apm-integrations-sharedb 34.57% <ø> (-0.09%) ⬇️
apm-integrations-tedious 35.13% <ø> (-0.09%) ⬇️
appsec-express 52.78% <ø> (-0.07%) ⬇️
appsec-fastify 49.26% <ø> (-0.07%) ⬇️
appsec-graphql 49.44% <ø> (-0.17%) ⬇️
appsec-kafka 42.07% <ø> (-0.09%) ⬇️
appsec-ldapjs 41.30% <ø> (-0.08%) ⬇️
appsec-lodash 41.42% <ø> (-0.08%) ⬇️
appsec-macos 56.79% <ø> (-0.07%) ⬇️
appsec-mongodb-core 45.69% <ø> (-0.07%) ⬇️
appsec-mongoose 46.57% <ø> (-0.07%) ⬇️
appsec-mysql 48.74% <ø> (-0.07%) ⬇️
appsec-next-latest-11.1.4 29.60% <ø> (-0.08%) ⬇️
appsec-next-latest-13.2.0 31.49% <ø> (-0.09%) ⬇️
appsec-next-latest-gte.10.2.0.and.lt.11 31.60% <ø> (ø)
appsec-next-latest-gte.11.0.0.and.lt.13 31.47% <ø> (-0.09%) ⬇️
appsec-next-latest-gte.13.0.0.and.lt.14 31.67% <ø> (-0.09%) ⬇️
appsec-next-latest-gte.14.0.0.and.lte.14.2.6 31.52% <ø> (-0.09%) ⬇️
appsec-next-latest-gte.14.2.7.and.lt.15 31.52% <ø> (-0.09%) ⬇️
appsec-next-latest-gte.15.0.0 31.52% <ø> (-0.09%) ⬇️
appsec-next-oldest-11.1.4 29.62% <ø> (-0.08%) ⬇️
appsec-next-oldest-13.2.0 31.73% <ø> (-0.09%) ⬇️
appsec-next-oldest-gte.10.2.0.and.lt.11 31.76% <ø> (ø)
appsec-next-oldest-gte.11.0.0.and.lt.13 31.49% <ø> (-0.09%) ⬇️
appsec-next-oldest-gte.13.0.0.and.lt.14 31.92% <ø> (-0.09%) ⬇️
appsec-next-oldest-gte.14.0.0.and.lte.14.2.6 31.77% <ø> (-0.09%) ⬇️
appsec-next-oldest-gte.14.2.7.and.lt.15 31.77% <ø> (-0.09%) ⬇️
appsec-next-oldest-gte.15.0.0 31.77% <ø> (-0.09%) ⬇️
appsec-node-serialize 40.60% <ø> (-0.08%) ⬇️
appsec-passport 44.59% <ø> (-0.08%) ⬇️
appsec-postgres 48.33% <ø> (-0.07%) ⬇️
appsec-sourcing 40.09% <ø> (-0.08%) ⬇️
appsec-stripe 42.32% <ø> (-0.08%) ⬇️
appsec-template 40.76% <ø> (-0.08%) ⬇️
appsec-ubuntu 56.87% <ø> (-0.07%) ⬇️
appsec-windows 56.66% <ø> (-0.09%) ⬇️
debugger-ubuntu-active 62.13% <ø> (-0.33%) ⬇️
debugger-ubuntu-latest 62.03% <ø> (-0.33%) ⬇️
debugger-ubuntu-maintenance 62.13% <ø> (-0.33%) ⬇️
debugger-ubuntu-oldest 62.23% <ø> (-0.33%) ⬇️
instrumentations-instrumentation-bluebird 29.88% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-body-parser 37.74% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-child_process 35.53% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-cookie-parser 31.81% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express 32.03% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 31.93% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-session 37.38% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-fs 29.56% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-generic-pool 30.59% <ø> (ø)
instrumentations-instrumentation-http 36.99% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-knex 29.85% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-light-my-request 37.30% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-mongoose 30.94% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-multer 37.52% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-mysql2 35.49% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-passport 41.17% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-passport-http 40.95% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-passport-local 41.46% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-pg 35.03% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-promise 29.82% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-promise-js 29.83% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-q 29.86% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-url 29.83% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-when 29.84% <ø> (-0.09%) ⬇️
llmobs-ai 38.45% <ø> (-0.09%) ⬇️
llmobs-anthropic 37.94% <ø> (-0.09%) ⬇️
llmobs-bedrock 37.15% <ø> (-0.08%) ⬇️
llmobs-google-genai 37.58% <ø> (-0.08%) ⬇️
llmobs-langchain 37.09% <ø> (-0.07%) ⬇️
llmobs-openai 41.26% <ø> (-0.08%) ⬇️
llmobs-vertex-ai 37.77% <ø> (-0.09%) ⬇️
openfeature-unit-active 50.43% <ø> (-0.52%) ⬇️
openfeature-unit-latest 50.28% <ø> (-0.52%) ⬇️
openfeature-unit-maintenance 50.43% <ø> (-0.52%) ⬇️
openfeature-unit-oldest 50.43% <ø> (-0.52%) ⬇️
platform-core 36.53% <ø> (ø)
platform-esbuild 40.80% <ø> (ø)
platform-instrumentations-misc 31.34% <ø> (ø)
platform-shimmer 42.11% <ø> (ø)
platform-unit-guardrails 35.88% <ø> (ø)
platform-webpack 20.78% <ø> (ø)
plugins-azure-durable-functions 25.36% <ø> (ø)
plugins-azure-event-hubs 25.51% <ø> (ø)
plugins-azure-service-bus 24.92% <ø> (ø)
plugins-bullmq 40.68% <ø> (-0.09%) ⬇️
plugins-cassandra 35.26% <ø> (-0.09%) ⬇️
plugins-cookie 26.47% <ø> (ø)
plugins-cookie-parser 26.28% <ø> (ø)
plugins-crypto 27.32% <ø> (ø)
plugins-dd-trace-api 35.48% <ø> (-0.09%) ⬇️
plugins-express-mongo-sanitize 26.42% <ø> (ø)
plugins-express-session 26.24% <ø> (ø)
plugins-fastify 39.38% <ø> (-0.09%) ⬇️
plugins-fetch 35.87% <ø> (-0.09%) ⬇️
plugins-fs 35.75% <ø> (-0.09%) ⬇️
plugins-generic-pool 25.40% <ø> (ø)
plugins-google-cloud-pubsub 43.05% <ø> (-0.11%) ⬇️
plugins-grpc 38.12% <ø> (-0.09%) ⬇️
plugins-handlebars 26.46% <ø> (ø)
plugins-hapi 37.36% <ø> (-0.09%) ⬇️
plugins-hono 37.61% <ø> (-0.09%) ⬇️
plugins-ioredis 35.80% <ø> (-0.09%) ⬇️
plugins-knex 26.14% <ø> (ø)
plugins-langgraph 35.14% <ø> (-0.09%) ⬇️
plugins-ldapjs 24.02% <ø> (ø)
plugins-light-my-request 25.88% <ø> (ø)
plugins-limitd-client 30.12% <ø> (-0.09%) ⬇️
plugins-lodash 25.47% <ø> (ø)
plugins-mariadb 36.67% <ø> (-0.14%) ⬇️
plugins-memcached 35.45% <ø> (-0.09%) ⬇️
plugins-microgateway-core 36.45% <ø> (-0.09%) ⬇️
plugins-modelcontextprotocol-sdk 34.39% <ø> (-0.09%) ⬇️
plugins-moleculer 38.14% <ø> (-0.09%) ⬇️
plugins-mongodb 36.75% <ø> (+0.02%) ⬆️
plugins-mongodb-core 36.27% <ø> (-0.12%) ⬇️
plugins-mongoose 36.12% <ø> (-0.17%) ⬇️
plugins-multer 26.24% <ø> (ø)
plugins-mysql 36.40% <ø> (-0.22%) ⬇️
plugins-mysql2 36.50% <ø> (-0.09%) ⬇️
plugins-node-serialize 26.51% <ø> (ø)
plugins-opensearch 35.11% <ø> (-0.09%) ⬇️
plugins-passport-http 26.30% <ø> (ø)
plugins-pino 31.91% <ø> (-0.09%) ⬇️
plugins-postgres 34.61% <ø> (ø)
plugins-process 27.32% <ø> (ø)
plugins-pug 26.47% <ø> (ø)
plugins-redis 36.01% <ø> (-0.09%) ⬇️
plugins-router 39.78% <ø> (-0.21%) ⬇️
plugins-sequelize 25.18% <ø> (ø)
plugins-test-and-upstream-amqp10 35.64% <ø> (-0.22%) ⬇️
plugins-test-and-upstream-amqplib 40.92% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-apollo 36.61% <ø> (-0.08%) ⬇️
plugins-test-and-upstream-avsc 35.10% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-bunyan 31.26% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-connect 37.95% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-graphql 37.29% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-koa 37.56% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-protobufjs 35.32% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-rhea 41.00% <ø> (-0.09%) ⬇️
plugins-undici 36.50% <ø> (-0.21%) ⬇️
plugins-url 27.32% <ø> (ø)
plugins-valkey 35.49% <ø> (-0.09%) ⬇️
plugins-vm 27.32% <ø> (ø)
plugins-winston 31.86% <ø> (-0.09%) ⬇️
plugins-ws 39.07% <ø> (-0.09%) ⬇️
profiling-macos 40.69% <ø> (-0.08%) ⬇️
profiling-ubuntu 41.30% <ø> (-0.08%) ⬇️
profiling-windows 40.87% <ø> (-0.08%) ⬇️
serverless-azure-functions-client 25.25% <ø> (ø)
serverless-azure-functions-eventhubs 25.25% <ø> (ø)
serverless-azure-functions-servicebus 25.25% <ø> (ø)
serverless-lambda 33.56% <ø> (-0.09%) ⬇️

@github-actions (Contributor)

Overall package size

Self size: 5.67 MB
Deduped: 6.52 MB
No deduping: 6.52 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

datadog-official Bot commented Apr 30, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 71.49% (+6.41%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9e0dd47

rochdev commented Apr 30, 2026

@watson Did you consider alternatives? In my experience, adding locks is how you end up with an infinite stream of problems (which we already have with integration tests, so I'm not a fan of adding locks to the mix)

@rochdev rochdev marked this pull request as ready for review April 30, 2026 04:09

pr-commenter Bot commented Apr 30, 2026

Benchmarks

Benchmark execution time: 2026-04-30 04:16:49

Comparing candidate commit 9e0dd47 in PR branch fix-sandbox-lock-race with baseline commit 96c92d9 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1349 metrics, 95 unstable metrics.

@BridgeAR BridgeAR merged commit 2fa6f67 into master Apr 30, 2026
877 checks passed
@BridgeAR BridgeAR deleted the fix-sandbox-lock-race branch April 30, 2026 09:43
@dd-octo-sts dd-octo-sts Bot mentioned this pull request Apr 30, 2026

2 participants