Skip to content

fix: Resolve TOCTOU race condition in data version allocation#359

Merged
stack72 merged 1 commit intomainfrom
fix/toctou-race-version-allocation
Feb 16, 2026
Merged

fix: Resolve TOCTOU race condition in data version allocation#359
stack72 merged 1 commit intomainfrom
fix/toctou-race-version-allocation

Conversation

@stack72
Copy link
Contributor

@stack72 stack72 commented Feb 16, 2026

Summary

Fixes #327 — TOCTOU race condition in unified_data_repository.ts where concurrent allocateVersion() and save() calls could compute the same version number, causing silent data loss.

Reasoning

Both allocateVersion() and save() followed a read-then-create pattern: read existing version directories, compute max + 1, then create the directory with ensureDir(). Because ensureDir() uses recursive: true, it silently succeeds even if the directory already exists — so two concurrent calls that both compute version N would both "succeed", with the last writer silently overwriting the first.

This is a real risk because workflow steps execute in parallel via Promise.allSettled() and jobs execute in parallel via Promise.all().

The fix uses Deno.mkdir() without recursive: true as an atomic claim mechanism. mkdir is an atomic kernel-level operation — it either creates the directory or throws AlreadyExists. On collision, we increment the version and retry. This avoids introducing any locking infrastructure; the directory creation itself is the lock.

Plan vs Implementation

Plan Step Status Notes
Step 1: Add failing concurrent tests Done Two tests: concurrent allocateVersion() (10 parallel calls, assert unique versions) and concurrent save() (10 parallel calls with distinct content, assert unique versions + content preserved). Uses Deno.makeTempDir(), Data.create(), Promise.all().
Step 2: Add atomicAllocateVersionDir() private method Done Added after getDataNameDir(). Ensures parent dir with recursive: true, reads starting version via listVersions(), loops with non-recursive Deno.mkdir(), catches AlreadyExists to retry with incremented version, rethrows other errors, caps at 100 retries.
Step 3: Update allocateVersion() Done Replaced racy listVersions + ensureDir block with atomicAllocateVersionDir() call. Ownership validation unchanged.
Step 4: Update save() Done Same replacement. Metadata write, content write, symlink update all unchanged.
Step 5: Run tests and confirm pass Done All 1713 tests pass including the 2 new concurrent tests.
Step 6: Remove unused ensureDir import Done import { ensureDir } from "@std/fs" removed — no other usages in the file.

No deviations from the plan. All steps implemented exactly as specified.

Files Changed

  • src/infrastructure/persistence/unified_data_repository.ts — core fix (atomic version allocation)
  • src/infrastructure/persistence/unified_data_repository_test.ts — concurrent race condition tests

Test plan

  • deno check — type checking passes
  • deno lint — linting passes
  • deno fmt — formatting passes
  • deno run test — all 1713 tests pass
  • deno run compile — binary compiles successfully

🤖 Generated with Claude Code

Use Deno.mkdir() (non-recursive) as an atomic claim mechanism for
version directory creation. On AlreadyExists collision, increment
and retry. This eliminates the read-then-create race between
concurrent allocateVersion() and save() calls.

Closes #327

Co-Authored-By: Magistr <2269529+umag@users.noreply.github.com>
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM - Approved

This PR correctly fixes the TOCTOU race condition in version allocation. The implementation is well-designed and follows best practices.

What's Good

Correct Fix: Using Deno.mkdir() without recursive: true as an atomic claim mechanism is the right approach. The kernel-level atomicity of mkdir provides a reliable lock without introducing additional infrastructure.

Code Quality:

  • TypeScript strict mode compliance - no any types
  • Proper named exports
  • License header present
  • Clean error handling with proper re-throw of non-collision errors

Test Coverage:

  • Two new tests directly verify the race condition fix
  • Tests use proper isolation with Deno.makeTempDir()
  • Cleanup in finally blocks ensures no test pollution

DDD Alignment:

  • atomicAllocateVersionDir is correctly a private implementation detail
  • Repository pattern maintained - persistence concerns stay in infrastructure layer
  • No domain logic changes required

Minor Suggestions (Non-blocking)

  1. Unused return value: atomicAllocateVersionDir returns { version, versionDir } but callers only use version. Consider returning just { version } for clarity, though this is harmless as-is.

  2. Content verification: The concurrent save test verifies content is not null, but could verify each version contains the expected content-N string to confirm no overwrites occurred. Current test is sufficient to catch the bug though.

Summary

The fix is correct, well-tested, and follows the project's code style guidelines. The approach is elegant - leveraging OS-level atomicity rather than introducing application-level locking.

@stack72 stack72 merged commit fbac7c1 into main Feb 16, 2026
4 checks passed
@stack72 stack72 deleted the fix/toctou-race-version-allocation branch February 16, 2026 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TOCTOU race condition in data version allocation

1 participant