Tags: deepcausa/datawal
Add dm-flakey power-loss harness + release 0.1.5 (#31)

* feat(testing): add dm-flakey power-loss simulation harness (#31)

Add a Linux device-mapper based power-loss harness that exercises the WAL under a fault model stronger than SIGKILL: the harness flips the dm-flakey layer backing the ext4 filesystem to error_writes, force-unmounts, then reloads the layer healthy, remounts, and verifies the reopened store against an fsync-ordered oracle.

New artefacts:
- crates/datawal-core/examples/power_loss_workload.rs: deterministic put/delete workload, fsync per op, append-on-fsync JSONL oracle on a separate filesystem.
- crates/datawal-core/examples/power_loss_validate.rs: post-fault validator. Prints RecoveryReport, checks per-key prefix (Inv 3), payload integrity (Inv 4) and no-extras (Inv 5).
- scripts/power_loss_dm_flakey.sh: orchestrator. losetup -> dmsetup flakey healthy -> mkfs.ext4 -> mount -> workload -> reload to error_writes -> umount -f -> reload healthy -> remount -> validate. Strict prefix guardrails: /tmp/datawal-powerloss-* and datawal-test-* dm names only. Linux + root only.
- scripts/power_loss_cleanup.sh: idempotent teardown, prefix-guarded.
- docs/power-loss-testing.md: harness contract, env-var table, exit codes, 'what this is not' framing.
- docs/power-loss-results.md: sanitized record of a verified run (50000 ops, 47827 puts + 2173 dels, 3918 live keys; recovery files_scanned=1 records_replayed=50000 tail_truncated=0 mid_stream_errors=0; validator OK with extras=0).

Doc updates:
- README.md: Durability evidence section linking the harness.
- docs/roadmap.md: row for issue #31, dm-log-writes noted as v2.
- CHANGELOG.md: Unreleased entry.

Public Rust API unchanged. Wire format unchanged. Linux-only, root-only, not part of CI.

* chore: release datawal 0.1.5

Docs-and-testing release. Adds the dm-flakey power-loss harness from #31 (committed separately) and aligns the README with the post-alpha state of the crate. Public Rust API unchanged. Wire format (WIRE_VERSION = 1) unchanged. Corpus fixtures unchanged.

README cleanup:
- Replaced 'alpha crate' / 'alpha release' / 'alpha limits' / 'not production-ready' wording with pre-1.0 scoped-production framing.
- 'What is in' now lists RecordLogReader, scan_iter, datawal CLI, four TLA+ models, fuzz, proptest, crash injection, ENOSPC, soak, dm-flakey, Criterion benches. Removed stale fixed counts.
- 'What is not in' dropped 'Reader API / concurrent reads' (now in) and added 'Group commit / configurable fsync policy'.
- Limits table updated: Readers = snapshot-at-open RecordLogReader; scan() = eager Vec<Record> with scan_iter() lazy alternative; DataWal keydir = offsets in memory with on-demand CRC-checked I/O; Production status = scoped production use.
- Evidence stack reframed by layer (spec / wire / formal / parser / recovery / long-run / performance / operations) instead of fixed counts.
- Formal models section updated to four models including ReadWhileWrite.tla.
- Running section adds 'cargo run -p datawal-cli -- --help'.

* style: cargo fmt power_loss_workload.rs

Single-line collapse of the write_oracle_line(OracleLine::Del { .. }) call to satisfy rustfmt on stable. Reformats only crates/datawal-core/examples/power_loss_workload.rs. No behavioural change.

* style: silence clippy on power-loss examples

Two findings under 'cargo clippy -- -D warnings', both in the new examples added in cd79bfa. Public API and behaviour unchanged.

power_loss_validate.rs:
- Replace the 4-tuple return of load_oracle() with a named struct OracleAggregate { effect, last_seq, puts, dels } so the signature is no longer flagged by clippy::type_complexity. The call site uses field destructuring; no behavioural change.

power_loss_workload.rs:
- Add #[allow(clippy::modulo_one)] on fn run() with a comment explaining why FSYNC_BATCH=1 today and why the modulo structure is intentional (it stays correct if the constant is ever raised for batched-fsync tuning).

Verified locally:
- cargo clippy --workspace --all-targets -- -D warnings clean
- cargo fmt --all -- --check clean
- cargo test --workspace passes
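
The oracle check performed by the validator can be illustrated with a minimal sketch. It is not the code in power_loss_validate.rs: the `OracleOp`, `Violation`, `expected_state` and `validate` names are hypothetical, the store is modelled as a plain `HashMap`, and the check is simplified to "the recovered state equals the net effect of every acknowledged op", which only loosely mirrors the payload-integrity and no-extras invariants named above.

```rust
use std::collections::HashMap;

/// One acknowledged operation read back from the oracle (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum OracleOp {
    Put { key: String, value: Vec<u8> },
    Del { key: String },
}

/// Ways in which a recovered store can disagree with the oracle.
#[derive(Debug, PartialEq)]
pub enum Violation {
    /// An acknowledged put is not visible after recovery.
    MissingKey(String),
    /// The key is present but its bytes differ from the acknowledged payload.
    PayloadMismatch(String),
    /// The store holds a live key that the oracle never left live.
    ExtraKey(String),
}

/// Fold the ops in order into their net effect: key -> live value.
pub fn expected_state(ops: &[OracleOp]) -> HashMap<String, Vec<u8>> {
    let mut live = HashMap::new();
    for op in ops {
        match op {
            OracleOp::Put { key, value } => {
                live.insert(key.clone(), value.clone());
            }
            OracleOp::Del { key } => {
                live.remove(key);
            }
        }
    }
    live
}

/// Compare the recovered store against the oracle's net effect.
/// The order of the returned violations is unspecified.
pub fn validate(ops: &[OracleOp], recovered: &HashMap<String, Vec<u8>>) -> Vec<Violation> {
    let expected = expected_state(ops);
    let mut violations = Vec::new();
    for (key, value) in &expected {
        match recovered.get(key) {
            None => violations.push(Violation::MissingKey(key.clone())),
            Some(got) if got != value => violations.push(Violation::PayloadMismatch(key.clone())),
            Some(_) => {}
        }
    }
    for key in recovered.keys() {
        if !expected.contains_key(key) {
            violations.push(Violation::ExtraKey(key.clone()));
        }
    }
    violations
}
```

An empty result from `validate` corresponds to a clean run in this simplified model. A real validator would presumably also have to tolerate an op that reached disk just before the fault but was never appended to the oracle, which this sketch does not attempt.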
datawal 0.1.4

First non-alpha release of datawal. Closes the 8-PR 0.1.4 quality kit (PRs #23, #24, #25, #26, #27, #28, #29, #30). Skips 0.1.1 / 0.1.2 / 0.1.3 -- each PR in the kit was a quality increment, not a hotfix, and the 'alpha' qualifier is dropped now that the crate has been validated in active use.

Highlights since 0.1.0-alpha.1:
- RecordLog::scan_iter (record-level lazy iterator).
- RecordLog::is_poisoned (poison-writer + 9 stable reasons).
- RecordLogReader (lockless reader) + formal/ReadWhileWrite.tla.
- Keydir-by-offset: O(keys) memory instead of O(keys + values), per-get pread + CRC32C revalidation via fd-pool LRU.
- datawal CLI binary: 8 subcommands (scan/get/report/verify/dump/check inspection + export/compact source-untouched). JSON schema datawal.cli.v1.
- 4 TLA+ models gated in CI.
- Reference benches on ext4-NVMe in docs/benchmarks/v0.1.4-reference.md.

Wire format unchanged (WIRE_VERSION=1, frozen). MSRV unchanged (1.75.0). Public surface: additive only modulo &mut self on DataWal::get/items/compact_to/export_jsonl (semver-acceptable in 0.1.x line).
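
The keydir-by-offset idea (keep only offsets in memory, re-read and re-check the payload on every get) can be sketched as follows. This is an illustration under stated assumptions, not datawal's implementation: `OffsetKeydir` and `Entry` are hypothetical names, an in-memory `Vec<u8>` stands in for the segment file that the real crate reads with pread, and there is no fd pool.

```rust
use std::collections::HashMap;

/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// What the keydir holds per key: where the value lives, not the value itself.
#[derive(Debug, Clone, Copy)]
pub struct Entry {
    pub offset: usize,
    pub len: usize,
    pub crc: u32,
}

/// Hypothetical keydir: memory use grows with the number of keys only.
#[derive(Default)]
pub struct OffsetKeydir {
    /// Stand-in for the on-disk segment (assumption: real code reads a file).
    pub segment: Vec<u8>,
    pub index: HashMap<String, Entry>,
}

impl OffsetKeydir {
    /// Append the value to the segment and remember only its location and CRC.
    pub fn put(&mut self, key: &str, value: &[u8]) {
        let entry = Entry {
            offset: self.segment.len(),
            len: value.len(),
            crc: crc32c(value),
        };
        self.segment.extend_from_slice(value);
        self.index.insert(key.to_string(), entry);
    }

    /// Re-read the payload at the stored offset and revalidate its checksum.
    pub fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        let entry = match self.index.get(key) {
            Some(e) => *e,
            None => return Ok(None),
        };
        let bytes = self
            .segment
            .get(entry.offset..entry.offset + entry.len)
            .ok_or_else(|| format!("offset out of range for key {key}"))?;
        if crc32c(bytes) != entry.crc {
            return Err(format!("CRC mismatch for key {key}"));
        }
        Ok(Some(bytes.to_vec()))
    }
}
```

The trade-off the sketch shows is that every get pays one read plus one checksum pass, in exchange for not holding values in memory and for detecting corruption that appears after the index was built.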
datawal v0.1.0-alpha

First hardened release after the v0.1-pre walking skeleton.

Highlights:
- Real CRC-32C (Castagnoli), pinned by a known-vector test.
- Single-writer per directory via fs2 fd-based advisory lock; second open on the same directory fails fast; Drop / process exit releases.
- Durability boundary made explicit: append is recoverable, fsync is the durability boundary (sync_all on segment + fsync containing dir).
- TLA+ models for RecordLog, KeydirProjection, Compaction; model-checked with TLC 2.19.
- Wire-format corpus committed under tests/corpus/, plus 11 tests exercising it (valid scan, tail truncation, bad CRC in a closed segment, unknown version, delete tombstone projection, compact_to).
- 58 tests total, all green.

Not yet:
- CAS, PyO3, server, multi-writer, compression, in-place compact, reader API, query, semantic codecs.

Wording: model-checked under documented assumptions, not formally verified. The models pin the protocol; they do not check the Rust implementation.
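
A known-vector test for CRC-32C can look like the sketch below. The bitwise `crc32c` function is a reference-style implementation written for this example, not the crate's own code; the pinned value is the standard CRC-32C check value for the ASCII string "123456789".

```rust
/// Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
/// initial value 0xFFFFFFFF, final bitwise inversion.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// Standard check value for CRC-32C over b"123456789".
pub const CRC32C_CHECK: u32 = 0xE306_9283;

/// Returns true when the implementation reproduces the known vector.
pub fn known_vector_holds() -> bool {
    crc32c(b"123456789") == CRC32C_CHECK
}
```

Pinning the vector guards against silently swapping in the more common IEEE CRC-32 polynomial, which yields a different value (0xCBF43926) for the same input.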
ci: add tag-gated release job for crates.io publish

New job 'release' triggers only on push of tags matching 'v*'. Gates publish on success of rust matrix + dry-run + formal + corpus. Verifies that the tag name (minus 'v' prefix) matches Cargo.toml version before invoking 'cargo publish -p datawal'. This catches the common mistake of tagging without bumping (or vice versa) before the package hits crates.io irrevocably.

Uses secrets.CARGO_REGISTRY_TOKEN.

To release: bump Cargo.toml, commit, 'git tag v<version> && git push origin v<version>'.
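
The tag-versus-manifest check described above can be sketched in Rust as follows. The CI job itself most likely does this in a shell step; `package_version` and `tag_matches` are hypothetical helpers, and the manifest is scanned line by line rather than parsed as real TOML.

```rust
/// Extract the `version` value from the `[package]` table of a Cargo.toml text.
/// Line-based sketch, not a TOML parser: it assumes `version = "x.y.z"` sits
/// on a single line inside `[package]`.
pub fn package_version(cargo_toml: &str) -> Option<String> {
    let mut in_package = false;
    for line in cargo_toml.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Track which table we are in; only `[package]` counts.
            in_package = line == "[package]";
            continue;
        }
        if in_package {
            if let Some((key, value)) = line.split_once('=') {
                if key.trim() == "version" {
                    return Some(value.trim().trim_matches('"').to_string());
                }
            }
        }
    }
    None
}

/// True only when the tag is `v<version>` and `<version>` equals the manifest version.
pub fn tag_matches(tag: &str, cargo_toml: &str) -> bool {
    match (tag.strip_prefix('v'), package_version(cargo_toml)) {
        (Some(tag_version), Some(manifest_version)) => tag_version == manifest_version,
        _ => false,
    }
}
```

Failing closed when either side cannot be determined (no 'v' prefix, no version line) matches the intent of the gate: a publish to crates.io cannot be undone, so an ambiguous state should block the release.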