noms chunk journal unrecoverable after unclean shutdown ('journal index is malformed' / 'corrupted journal at offset N') — request repair tooling #11181

@vbtcl

Summary

When a dolt sql-server is terminated uncleanly (for example by SIGKILL mid-write, or when two processes briefly contend for one data_dir), the noms chunk journal index can become malformed. The server then refuses to start, and there is no built-in repair path:

```
possible data loss detected in journal file at offset 8322590399: corrupted journal
database "dolt" is locked by another dolt process; ... stop the dolt process which
   currently holds an exclusive write lock on the database
error bootstrapping chunk journal: journal index is malformed
```

Today the only recovery is to manually move the data dir aside and restore from a backup, which loses any data not captured in that backup.

Requests

  1. Journal repair/recovery tooling — a dolt subcommand (or automatic recovery on boot behind a flag) that truncates the journal to the last valid offset. The error already computes "possible data loss detected ... at offset N", so the last-good boundary is known.
  2. Stronger concurrent-access guard — a second process attaching to a locked data_dir should fail before any write that can corrupt the journal, rather than after.

Context

We run many Dolt databases under an orchestration layer (gas-city) where unclean restarts occasionally happen. This corruption recurs every few days, and each event is a full outage. Dolt 2.0.7, linux/arm64. Happy to provide a corrupted store for repro.
