hdvrescue

Non-destructive recovery of footage from damaged HDV (Sony MPEG-TS) tape captures — the .mpeg/.m2t files a disk-recovery tool (EasyRecovery, PhotoRec) produces when it carves a lost partition. Such files mix the original recording bytes with chunks of unrelated neighbouring files, and a single recording is often split across several carved files. Players freeze mid-playback on them.

hdvrescue rebuilds clean, seekable files without ever re-encoding or remuxing. Everything is byte-level 188-byte transport-stream manipulation; ffmpeg is never used, because even ffmpeg -c copy strips Sony's private AUX stream (stream_type 0xA1, usually PID 0x811) — the stream that carries the camera's recording date/time. That timecode is preserved exactly and is used to name outputs and to correlate fragments across files.
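To make "byte-level 188-byte transport-stream manipulation" concrete, here is a minimal sketch of decoding the fixed 4-byte header of one packet, following the standard MPEG-TS layout. The names `TsHeader` and `parse_ts_header` are illustrative assumptions, not hdvrescue's actual API, and the example packet is synthetic.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    pid: int                  # 13-bit packet identifier
    payload_unit_start: bool  # set on the first packet of a PES packet or section
    continuity_counter: int   # 4-bit counter, tracked per PID
    has_adaptation: bool      # adaptation field present (may carry a PCR)
    has_payload: bool


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one 188-byte transport-stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError(f"expected {TS_PACKET_SIZE} bytes, got {len(packet)}")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return TsHeader(
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        payload_unit_start=bool(packet[1] & 0x40),
        continuity_counter=packet[3] & 0x0F,
        has_adaptation=bool(packet[3] & 0x20),
        has_payload=bool(packet[3] & 0x10),
    )


# Synthetic packet on PID 0x811 (the usual AUX PID named above), payload only.
example = bytes([0x47, 0x48, 0x11, 0x17]) + bytes(184)
header = parse_ts_header(example)
print(hex(header.pid), header.continuity_counter)
```

Because every packet is self-describing in this way, a tool can select or copy packets by PID without touching their payload, which is what keeps the private AUX stream intact.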

How it works

A non-destructive, report-driven pipeline. The original sources are read-only throughout; bytes are copied exactly once, at the final build step.

   carved sources (.mpeg/.m2t, read-only)
        │
  scan   cut each source into trustworthy SPANS; everything else is a GAP.
  │      spans ∪ gaps tile the file exactly — nothing is silently dropped.
  ▼ report.json   byte-precise: offsets, PMT, PCR, AUX timecode, confidence
        │
  plan   group spans into output recordings: merge what is provably continuous
  │      (even across different source files), split where it is not.
  ▼ plan.json     human-editable: reorder / cut / force-join / rename
        │
  build  copy the exact byte ranges from the originals, fix continuity at each
  │      seam, name by AUX timecode, self-verify the timecode survived.
  ▼ out/2007-10-18_09-14-03.m2t  …

Why report-first instead of carve-then-merge: a scan that immediately writes out fragments has to make irreversible decisions (minimum length, where to cut) while it still has the least information. Here, scan only describes the bytes; the merge/split decisions happen later against the full picture and are reviewable and editable before a single output byte is written.
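The invariant stated in the diagram, that spans and gaps together tile each source exactly, can be checked mechanically. The sketch below assumes each region is a `(start, end)` byte range with an exclusive end; the real `report.json` layout is described in docs/report-format.md and may differ, and `tiles_exactly` is a hypothetical helper.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start_offset, end_offset), end exclusive


def tiles_exactly(regions: List[Region], file_size: int) -> bool:
    """Return True if the regions cover [0, file_size) with no hole or overlap."""
    cursor = 0
    for start, end in sorted(regions):
        # Each region must begin where the previous one ended and be non-empty.
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == file_size


# Illustrative values: two spans separated by one gap in a 5640-byte file.
spans = [(0, 1880), (3760, 5640)]
gaps = [(1880, 3760)]
print(tiles_exactly(spans + gaps, 5640))
```

A check of this kind is what makes "nothing is silently dropped" verifiable: any byte that belongs to neither a span nor a gap shows up as a hole.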

Install

Python 3.9+, standard library only — no third-party packages, no ffmpeg.

pip install -e .          # provides the `hdvrescue` command

Use

One shot:

hdvrescue recover CLIP001.mpeg CLIP002.mpeg -o out/ --verify

…or stay in the loop and inspect/edit between steps:

hdvrescue scan  CLIP001.mpeg CLIP002.mpeg -o report.json
hdvrescue plan  report.json -o plan.json      # then edit plan.json by hand
hdvrescue build plan.json --report report.json -o out/
hdvrescue verify out/2007-10-18_09-14-03.m2t

plan.json is the source of truth for build: reorder members, set "enabled": false, move a span from one output to another, or rename an output, and build honours it.

Commands

| Command | Purpose |
| --- | --- |
| `scan INPUT… -o report.json` | Describe each source as byte-precise spans + gaps. |
| `plan report.json -o plan.json` | Propose merges and splits into a hand-editable plan. |
| `build plan.json -o out/` | Materialize the plan into final `.m2t` files (reads `report.json` beside the plan, or `--report`). |
| `report plan.json` | Render the plan as a human-readable Markdown summary (writes `<plan>.md`; `-o -` for stdout). |
| `verify FILE` | Is the Sony AUX recording timecode still readable? Exit `0` yes, `1` no, `2` error. |
| `recover INPUT… -o out/` | `scan → plan → build` in one pass; `report.json`/`plan.json` are saved in `out/`. |
| `dedup report.json` | Byte-verify duplicate fragments (same footage carved twice): reports which are `contained`/`identical` (safe to drop) vs `diverges`. |
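Since `verify` reports its result through the exit code, it is easy to script over a whole output folder. The following is a small sketch that only relies on the documented `hdvrescue verify FILE` invocation and its exit codes `0`, `1` and `2`; the helper names and status strings are assumptions made for illustration.

```python
import subprocess
from typing import Dict, Sequence

# Exit codes as documented in the command table above.
VERIFY_STATUS = {0: "timecode readable", 1: "timecode missing", 2: "error"}


def describe_verify_exit(code: int) -> str:
    """Map a `hdvrescue verify` exit code to a short status string."""
    return VERIFY_STATUS.get(code, f"unexpected exit code {code}")


def verify_outputs(paths: Sequence[str]) -> Dict[str, str]:
    """Run `hdvrescue verify` on each file and collect a status per path."""
    results = {}
    for path in paths:
        proc = subprocess.run(["hdvrescue", "verify", path])
        results[path] = describe_verify_exit(proc.returncode)
    return results
```

Calling `verify_outputs(["out/2007-10-18_09-14-03.m2t"])` requires `hdvrescue` to be installed and on `PATH`.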

Key scan knobs: --cc-tolerance {strict,lenient} (strict produces more, smaller, internally-cleaner spans), --pcr-jump-sec, --aux-boundary-sec. Key plan knobs: --max-pcr-jump-sec (within one source, merge same-day footage while the PCR clock stays seekable — raise it to merge more aggressively) and --max-chain-sec (how large an AUX gap may be across carved files and still be chained).
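For readers unfamiliar with the PCR clock these thresholds refer to, here is a sketch of reading a PCR from a packet's adaptation field (standard MPEG-TS layout: a 33-bit base at 90 kHz plus a 9-bit extension, giving a 27 MHz value) and flagging a jump against a threshold in seconds. The functions `read_pcr` and `is_pcr_jump` are hypothetical, and how hdvrescue itself applies `--pcr-jump-sec` is not shown here; treating a backward step as a jump is an assumption of this sketch.

```python
from typing import Optional

PCR_HZ = 27_000_000
PCR_WRAP = (1 << 33) * 300  # the 33-bit base wraps, so the full value wraps here


def read_pcr(packet: bytes) -> Optional[int]:
    """Return the 27 MHz PCR carried by this packet, or None if it has none."""
    if len(packet) != 188 or packet[0] != 0x47:
        return None
    if not packet[3] & 0x20:                   # no adaptation field
        return None
    if packet[4] < 7 or not packet[5] & 0x10:  # field too short or PCR_flag clear
        return None
    b = packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext


def is_pcr_jump(prev: int, cur: int, max_jump_sec: float) -> bool:
    """True if the clock moved forward by more than max_jump_sec, or backward."""
    delta = (cur - prev) % PCR_WRAP  # wrap-aware forward distance
    return delta > max_jump_sec * PCR_HZ


# Synthetic adaptation-only packet whose PCR is exactly one second (base 90000).
example = bytes([0x47, 0x00, 0x00, 0x20, 183, 0x10,
                 0x00, 0x00, 0xAF, 0xC8, 0x7E, 0x00]) + b"\xff" * 176
print(read_pcr(example))
```

A backward step produces a very large wrapped distance, so it is flagged as a jump with any realistic threshold.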

What you get

Outputs named by their AUX recording timecode: YYYY-MM-DD_HH-MM-SS.m2t (_a, _b, … on collisions).
A recording split across several carved files is reassembled into one output.
A camera pause/resume (the AUX timecode jumps but the capture ran on) is kept in one output, not split — and same-day footage stays merged as long as the PCR clock proves the result is still seekable (the scrub bar stays draggable).
Over-recorded tape residue and disk-recovery cross-file contamination (real footage from a different session that the recovery tool spliced in) are separated into their own outputs by recording date — not welded into your clip.
The 0xA1 AUX stream is byte-preserved, so the recording timecode is intact.
When the recovery tool carved the same footage twice, dedup byte-verifies the copies so you can drop a confirmed-redundant one with confidence.
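The naming rule at the top of this list can be sketched as follows. This is an illustration under one assumption the text does not settle: that the first output keeps the bare timecode name and only later collisions receive `_a`, `_b`, and so on. `output_name` is a hypothetical helper, not hdvrescue's own function.

```python
from datetime import datetime
from string import ascii_lowercase
from typing import Set


def output_name(recorded_at: datetime, taken: Set[str]) -> str:
    """Build YYYY-MM-DD_HH-MM-SS.m2t, adding _a, _b, ... if the name is taken."""
    stem = recorded_at.strftime("%Y-%m-%d_%H-%M-%S")
    candidates = [stem] + [f"{stem}_{letter}" for letter in ascii_lowercase]
    for candidate in candidates:
        name = candidate + ".m2t"
        if name not in taken:
            taken.add(name)  # reserve the name for later calls
            return name
    raise RuntimeError(f"too many outputs share the timecode {stem}")


taken: Set[str] = set()
start = datetime(2007, 10, 18, 9, 14, 3)
first = output_name(start, taken)
second = output_name(start, taken)
print(first, second)
```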

Documentation

docs/hdv-internals.md — HDV/MPEG-TS byte-level reference.
docs/report-format.md — report.json schema.
docs/plan-format.md — plan.json schema and how to edit it.

Development

python -m unittest discover -s tests

Tests are deterministic and use tiny synthetic transport streams built in tests/fixtures.py — no sample captures required.
