Summary
When a dolt sql-server is terminated uncleanly (SIGKILL mid-write, or two processes briefly contending for one data_dir), the noms chunk journal index can become malformed. The server then refuses to start, and there is no built-in repair path:
possible data loss detected in journal file at offset 8322590399: corrupted journal
database "dolt" is locked by another dolt process; ... stop the dolt process which
currently holds an exclusive write lock on the database
error bootstrapping chunk journal: journal index is malformed
Today the only recovery is to manually move the data dir aside and restore from a backup — lossy if the DB wasn't backed up.
Requests
- Journal repair/recovery tooling: a dolt subcommand (or automatic recovery on boot behind a flag) that truncates the journal to the last valid offset. The error already computes "possible data loss detected ... at offset N", so the last-good boundary is known.
- Stronger concurrent-access guard: a second process attaching to a locked data_dir should fail before any write that can corrupt the journal, rather than after.
Context
Running many Dolt databases under an orchestration layer (gas-city) where unclean restarts occasionally happen; this corruption recurs every few days and each event is a full outage. Dolt 2.0.7, linux/arm64. Happy to provide a corrupted store for repro.