8 releases
| new 0.0.10 | May 10, 2026 |
|---|---|
| 0.0.9 | May 5, 2026 |
| 0.0.6 | Apr 25, 2026 |
#619 in Video
157 downloads per month
Used in oxideav-meta
770KB
16K
SLoC
oxideav-vp9
Pure-Rust VP9 video decoder. Parses the full uncompressed header (§6.2), the compressed header (§6.3), walks the tile / superblock partition quadtree (§6.4.2) and reconstructs keyframe / intra-only and inter frames (single and compound reference, scaled references) across one or more tiles, down to pixels via intra prediction, 8-tap sub-pel motion compensation, inverse transform and dequantised-coefficient add. Zero C dependencies.
Part of the oxideav framework but usable standalone.
Decode support today
- Keyframe / intra-only frames (
FrameType::Keyand theintra_onlymid-stream variant) decode to aVideoFramewith 3-plane 8-bit 4:2:0 luma + chroma data. - Inter (P) frames — single and compound reference. The decoder
keeps the VP9 8-slot DPB (§6.2,
refresh_frame_flagssemantics), reads every block'sis_inter/ref_frame/inter_mode/ MV and runs §8.5.2 motion compensation. For compound blocks (§6.4.17 withreference_mode∈ {COMPOUND, SELECT}) both references are MC'd independently, then blended per §8.5.2 withRound2(a + b, 1). - Scaled references (§8.5.2.3). When a reference frame's size
differs from the current frame, the MV and sub-pel interpolator pick
up
x_step_q4 = (16 * (RefW << 14) / CurW) >> 14(same for y). - Multi-tile frames (§6.4). Tile-column and tile-row partitioning
is supported. The tile payload is split at 4-byte big-endian length
prefixes (last tile consumes the remainder), the boolean engine is
reset per tile, and §6.4.1
get_tile_offsetyields each tile's pixel bounds. The §8.8 loop filter runs once after all tiles are decoded. - Segmentation deltas (§8.6.1 / §8.8.1):
SEG_LVL_ALT_Qoverrides the block quantiser;SEG_LVL_ALT_Loverrides the per-segment loop-filter level. Both respectabs_deltavs delta mode. The per-block segmentation-map read is still scaffold — every block currently reportssegment_id = 0. - All 10 intra modes (§8.5.1):
DC_PRED,V_PRED,H_PRED,D45_PRED,D135_PRED,D117_PRED,D153_PRED,D207_PRED,D63_PRED,TM_PRED. Block sides 4 / 8 / 16 / 32 supported. 127 / 129 neighbour padding handled per libvpxbuild_intra_predictors. - All four inverse-transform sizes (§8.7.1): 4×4, 8×8, 16×16, 32×32.
The four 2-D tx-types
DCT_DCT,ADST_DCT,DCT_ADST,ADST_ADSTare covered for 4 / 8 / 16; 32×32 is DCT-only per spec. The lossless 4×4 Walsh-Hadamard is wired viaTxType::WhtWht. - Sub-pel interpolation (§8.5.1): full 16-phase 8-tap filter
banks for
EIGHTTAP,EIGHTTAP_SMOOTH,EIGHTTAP_SHARP, plusBILINEAR. Per-frame filter or per-blockSWITCHABLEboth supported. 1/8-pel luma, 1/16-pel chroma. - MV decode (§6.4.19): full
mv_joint+ per-componentsign / class / class0_* / bits / fr / hptree withallow_high_precision_mvgating the extra bit.NEARESTMV/NEARMV/ZEROMV/NEWMVmodes supported (see "Deferred" below for the MV-candidate caveat). - Coefficient decode (§6.4.23 / §8.5.2 / §10.5): full token tree
(EOB / ZERO / ONE model nodes + Pareto8 tail for TWO/THREE/FOUR/
CAT1..6), sign bit, per-position context via neighbour magnitudes,
dequantisation against
DC_QLOOKUP/AC_QLOOKUPand the 32×32dq_shift. - Loop filter / deblocking (§8.8): after reconstruction the tile
walker applies the §8.8 deblocking pass — §8.8.1
LvlLookupwith ref / mode deltas, §8.8.2 raster walk in 8×8-MI units over both passes and all three planes, §8.8.3 filter-size clamp, §8.8.4 adaptive filter strength (limit / blimit / thresh) and §8.8.5 sample filter (narrowfilter4, widefilter8/filter16withflat_mask/flat_mask2). Segmentation deltas and multi-tile coordination are still pending. 8-bit 4:2:0 only. - IVF demux (
ivf::iter_frames) lets tests feed ffmpeg-generated clips directly.
Example
use oxideav_codec::Decoder;
use oxideav_core::{CodecId, CodecParameters, Frame, Packet, TimeBase};
use oxideav_vp9::{ivf, make_decoder, CODEC_ID_STR};
let data = std::fs::read("clip.ivf")?;
let params = CodecParameters::video(CodecId::new(CODEC_ID_STR));
let mut dec = make_decoder(¶ms)?;
for (pts, frame) in ivf::iter_frames(&data)?.enumerate() {
let frame = frame?;
let pkt = Packet::new(pts as i64, TimeBase::new(1, 24), frame.payload.to_vec());
dec.send_packet(&pkt)?;
let Frame::Video(v) = dec.receive_frame()? else { unreachable!() };
println!("frame {pts}: {}x{} {:?}", v.width, v.height, v.format);
}
# Ok::<(), Box<dyn std::error::Error>>(())
The shipped integration tests live at tests/vp9_intra_fixture.rs
(128×128 keyframe, luma-statistics gate) and
tests/vp9_inter_fixture.rs (1 key + 1 P frame, asserts the P frame
decodes into Frame::Video and differs from the keyframe at pixel
level).
Encode support (round 40)
encode_keyframe_yuv(&EncoderParams, &YuvFrame) produces a valid VP9
keyframe IVF payload from source 4:2:0 8-bit pixels.
- Per-block luma intra-mode RDO (round 40). Every 8×8+ block runs
a 4-mode SSE-pick across
{DC_PRED, V_PRED, H_PRED, TM_PRED}against decoder-shape neighbour buffers (reconintra::NeighbourBuf::build, 127/129 padding, same as the decoder reads). The picked mode applies to every 4×4 TX sub-block. Mode trackers (above_mode_4x4/left_mode_4x4) mirrorIntraTile's state soKF_Y_MODE_PROBS[a][l]resolves to the same row on both sides. - Forward 4×4 DCT + quantise + token-code the residual with VP9 EOB / ZERO / ONE / TWO–FOUR / CAT1–6 Pareto8 tokens.
- QP-derived loop-filter level (round 40).
EncoderParams::keyframenow seedsloop_filter_levelfrombase_q_idxvia a libvpx-shapeq*0.45 + 1heuristic clamped to[0, 63]. Lossless (q == 0) keeps the filter disabled. Atq = 64the level lands at 29 (vs. the previous fixed-zero), which lets §8.8 deblocking smooth out residual block edges.
The bitstream round-trips through both our own Vp9Decoder and through
ffmpeg 8.1.
Quality at base_q_idx = 64:
| fixture | r2 (DC-only, lf=0) | round 40 (RDO + lf=29) |
|---|---|---|
| 256×256 smooth gradient Y | 50.60 dB | 53.06 dB |
| 256×256 horizontal stripes Y | n/a (new) | 47.62 dB |
| 256×256 chroma (uniform) | ∞ dB | ∞ dB |
The smooth-gradient gain (+2.46 dB) is from QP-derived deblocking; the DC mode wins the SSE picker on a perfectly smooth signal so the mode-RDO path is a no-op there. The mode-RDO path lights up on the horizontal-stripes fixture, where V_PRED tracks the row-to-row step exactly and DC's row-average baseline would lose meaningfully.
encode_keyframe(&EncoderParams) remains available for callers that
want a skip=1 / midgrey-only stream without pixel content.
Installation
[dependencies]
oxideav-core = "0.1"
oxideav-codec = "0.1"
oxideav-vp9 = "0.0"
Registering the decoder
use oxideav_codec::CodecRegistry;
let mut reg = CodecRegistry::new();
oxideav_vp9::register(&mut reg);
Round-22 §6.4.3 HORZ/VERT sub-8×8 fix + §9.3.2 spec-literal sub-mode anchor
tests/vp9_lossless_pattern.rs compares against an ffmpeg testsrc -lossless 1 reference. Headline numbers per round:
| round | lossless pattern Y | lossless c64 Y | lossless gray Y | compound Y mean |
|---|---|---|---|---|
| r17 | 9.69 dB | n/a | 66.77 dB | 10.63 dB |
| r18 | 9.90 dB | n/a | 66.77 dB | 10.72 dB |
| r19 | 9.90 dB | 61.90 dB | 66.77 dB | 10.72 dB |
| r20 | 9.67 dB | 70.10 dB | 45.43 dB | 10.20 dB |
| r21 | 10.41 dB | ∞ (bit-exact) | 45.43 dB | 9.28 dB |
| r22 | 47.70 dB | ∞ (bit-exact) | 45.43 dB | 9.54 dB |
| r23 | 47.70 dB | ∞ (bit-exact) | 45.43 dB | 9.55 dB |
| r24 | 47.70 dB | ∞ (bit-exact) | ∞ (bit-exact) | 9.45 dB |
| r25 | 47.70 dB | ∞ (bit-exact) | ∞ (bit-exact) | 9.45 dB |
Round 23 mirrors the round-22 §6.4.3 sub-8×8 one-call fix to the
inter decode_partition (inter.rs), eliminating the 2× HORZ/VERT
and 4× SPLIT over-call at bsize=BLOCK_8X8. Lossless fixtures
remain bit-exact / r22-stable (they're keyframe-only). The compound
mean Y improvement is small (+0.01 dB) — the dominant remaining
asymmetry has moved to decode_inter_block itself, which reads
one inter_mode + one MV per ref slot regardless of bs,
violating §6.4.16's per-4×4-sub-block read for MiSize<BLOCK_8X8.
That is the round-24+ work.
Round 22 lands two paired §6.4.3 / §9.3.2 fixes that, together, unlock a +37 dB jump on the lossless-pattern Y plane and lift both chroma planes to bit-exact.
decode_partition HORZ/VERT one-call branch (block.rs)
§6.4.3 decode_partition:
if ( subsize < BLOCK_8X8 || partition == PARTITION_NONE )
decode_block( r, c, subsize )
else if ( partition == PARTITION_HORZ ) {
decode_block( r, c, subsize )
if ( hasRows ) decode_block( r + halfBlock8x8, c, subsize )
} ...
The leading branch fires first when subsize < BLOCK_8X8. For
bsize=BLOCK_8X8 + PARTITION_HORZ → subsize=BLOCK_8X4, the
sub-8×8 branch wins and decode_block is called once —
§6.4.6 read_intra_frame_mode_info then reads the 4 sub-modes
internally. The same applies to BLOCK_8X8 + PARTITION_VERT → BLOCK_4X8.
Our intra decode_partition was unconditionally calling
decode_block twice for HORZ/VERT, double-reading mode-info and
desynchronising the bool decoder for every sub-8×8 partition. The
fix gates the second call on bsize > 8. Same shape as the
round-13 SPLIT fix.
read_intra_sub_mode spec-literal neighbour anchor (block.rs)
§9.3.2 default_intra_mode for MiSize < BLOCK_8X8:
if (idy) abovemode = sub_modes[idx]
else abovemode = AvailU ? SubModes[MiRow-1][MiCol][2 + idx] : DC
if (idx) leftmode = sub_modes[idy*2]
else leftmode = AvailL ? SubModes[MiRow][MiCol-1][1 + idy*2] : DC
In the §9.3.2 NOTE "two 1D arrays" layout (which our decoder uses):
above_mode_4x4[mi_col*2 + 0] = sub_modes[2](writer for cell above, position(idy=1, idx=0))above_mode_4x4[mi_col*2 + 1] = sub_modes[3]((idy=1, idx=1))left_mode_4x4 [mi_row*2 + 0] = sub_modes[1]((idy=0, idx=1))left_mode_4x4 [mi_row*2 + 1] = sub_modes[3]((idy=1, idx=1))
The reader index that picks the spec-literal slot is therefore
above_mode_4x4[mi_col*2 + idx] and
left_mode_4x4 [mi_row*2 + idy].
The round-15 code used a constant +1 on both sides (always
sub_modes[3], the bottom-right). Rounds 18-21 measured the
spec-literal switch and saw a 1 dB compound regression — but that
regression was an artefact of the upstream HORZ/VERT double-call
(see above). Once both fixes land together, the spec-literal
anchor is uniformly better.
Per-fixture / per-variant audit (round 22)
Pre-fix baseline = r21 anchor (both +1):
| variant | pattern Y | compound Y | intra fixture mean | c64 |
|---|---|---|---|---|
r21 (both +1) |
10.41 dB | 9.28 dB | 89 | ∞ |
| spec-A0+L0 only | 9.94 dB | 10.79 dB | 6 (FAIL) | ∞ |
| HORZ/VERT fix only | (not measured separately — pairs with anchor) | |||
| r22 (both fixes paired) | 47.70 dB | 9.54 dB | 111 | ∞ |
The "spec-A0+L0 only" row is the lesson: the two §6.4.3 / §9.3.2
divergences were masking each other. Spec-correct on one but not
the other produces a desync (vp9_intra_fixture luma mean
collapses from 89 to 6). Spec-correct on both lifts pattern Y from
9.94 → 47.70 dB and intra fixture mean from 89 → 111.
The 0.74 dB compound regression vs. r21 (10.20 → 9.54) was
attributed to the parallel inter.rs divergence. Round 23
(below) confirms only ~0.01 dB of that is the inter
decode_partition call-count bug; the rest is in
decode_inter_block itself, which still under-reads sub-8×8
inter mode-info per §6.4.16.
vp9-lossless-pattern.ivf is now within 337 of 16384 luma
bytes of bit-exact (98% of luma matches; both chroma planes
match 100%).
Round-23 §6.4.3 sub-8×8 fix on the inter path
Round 23 ports the round-22 block.rs::decode_partition HORZ/VERT
one-call branch to inter.rs::decode_partition, and additionally
collapses the bsize=8 SPLIT 4-call loop down to one
decode_block on BLOCK_4X4 — both per the same §6.4.3
"if (subsize < BLOCK_8X8 || partition == NONE) decode_block(r,c,subsize)"
leading branch.
Pre-r23 the inter path was symmetric with the pre-r22 intra path:
| bsize | partition | pre-r23 calls | r23 calls | spec |
|---|---|---|---|---|
| 8 | NONE | 1 | 1 | 1 |
| 8 | HORZ | 2 | 1 | 1 |
| 8 | VERT | 2 | 1 | 1 |
| 8 | SPLIT | 4 | 1 | 1 |
| 16 | HORZ | 2 | 2 | 2 |
| 16 | VERT | 2 | 2 | 2 |
The 4× SPLIT over-call at bsize=8 was particularly damaging
because each decode_block on a BLOCK_4X4 inter sub-block
re-reads comp_mode_prob, comp_ref_prob /
single_ref_prob, inter_mode_probs, interp_filter_probs,
segment_id, skip_probs, is_inter_prob, tx_size_probs,
plus the entire coefficient detoken — 4× of every per-block
context read where the spec wants 1×.
Effect on vp9-compound.ivf (192×128, 6 shown frames):
frame 0: Y=9.23 dB U=10.02 dB V=8.96 dB (unchanged)
frame 1: Y=9.56 dB U=10.10 dB V=8.82 dB (unchanged)
frame 2: Y=9.66 dB U=10.05 dB V=8.78 dB (unchanged)
frame 3: Y=9.77 dB ← was 9.68 dB (+0.09 dB)
frame 4: Y=9.25 dB U=10.12 dB V=9.17 dB (unchanged)
frame 5: Y=9.86 dB U=10.51 dB V=8.83 dB (unchanged)
mean luma PSNR: 9.55 dB (was 9.54)
Pattern Y stays at 47.70 dB; c64 stays bit-exact (∞ dB). Five of six compound frames also stayed put — the bug only fires on sub-8×8 inter partitions in frame 3. This validates the structural fix as spec-correct but localises the dominant compound dB drop to the inter mode-info reader (next round).
r24+ items now exposed
- §6.4.16
inter_block_mode_infoper-4×4-sub-block iteration: forMiSize < BLOCK_8X8the spec runs the inter_mode read andassign_mvfour times (once per 4×4) before motion-compensating each sub-block independently. Ourdecode_inter_blockreads oneinter_mode+ one MV per ref slot, then MCs the wholebsas a single block. - The pattern of "same-direction asymmetry" between intra and
inter sub-mode reads (cf. r19 / r21 audit notes for
read_intra_sub_mode) probably also exists for inter MVs. Check whether the §9.3.2 sub-MV neighbour anchor needs the samemi_col*2 + idx/mi_row*2 + idyspec-literal rewrite that r22 landed for intra.
Round-24 §6.4.16 sub-8×8 inter mode-info iteration
Round 24 implements the §6.4.16 per-4×4-sub-block iteration in
decode_inter_block. Pre-r24 the inter path read one
inter_mode + one assign_mv per ref per cell regardless of
bs, then MC'd the whole block as a single rectangle. Per spec,
when MiSize < BLOCK_8X8 the reads must happen in a
(idy, idx) walk with steps num_4x4_h / num_4x4_w:
| bs | num_4x4_w | num_4x4_h | sub-block reads |
|---|---|---|---|
| B4x4 | 1 | 1 | 4 |
| B4x8 | 1 | 2 | 2 |
| B8x4 | 2 | 1 | 2 |
| B8x8+ | 2 | 2 | 1 (single cell) |
Each iteration reads its own inter_mode and per-ref
assign_mv, then MCs the matching 4×4-aligned luma sub-rectangle
independently. Chroma stays cell-level (per §8.5.2.2 chroma uses
the cell-level MV under 4:2:0). The cell-level mv_grid /
MiInfo records the LAST sub-block's mode/MV per libvpx
convention.
This is the structural counterpart to the round-22
read_intra_sub_mode (§6.4.6) iteration in block.rs.
Effect on the existing fixtures:
| fixture | r23 | r24 |
|---|---|---|
vp9-lossless-pattern.ivf Y |
47.70 dB | 47.70 dB (same) |
vp9-lossless-pattern.ivf UV |
∞ (bit-exact) | ∞ (bit-exact) |
vp9-lossless-c64.ivf all |
∞ (bit-exact) | ∞ (bit-exact) |
vp9-lossless-gray.ivf Y |
∞ (all 25 frames) | ∞ (all 25 frames) |
vp9-compound.ivf Y mean |
9.55 dB | 9.45 dB (-0.10 dB) |
The lossless fixtures all stay bit-exact / unchanged — the
extra bool-decoder bits the new iteration consumes ARE present
in the bitstream (validates the spec-correct over-read does NOT
desync). The compound mean drops 0.10 dB — within fixture noise
for a 6-frame corpus and consistent with the README r24+ note
that the dominant compound asymmetry now lives in
find_mv_refs's neighbour scan (sub-8×8 cells use cell-level
candidates instead of per-4×4 sub-block candidates).
r25+ items now exposed
- §6.5 per-sub-block
find_mv_refscandidate refinement: spec runsfind_mv_refsper 4×4 sub-block forMiSize < BLOCK_8X8so each sub-block sees its own neighbour MV pool. Our r24 implementation uses cell-level candidates for every sub-block (refs_a / refs_b are computed once per cell). - §9.3.2 sub-MV neighbour anchor — same shape as the r22
intra
+idx/+idyrewrite, applied to per-sub-block MV candidate selection inside the new iteration.
Round-25 §6.5.14 within-MB sub-block MV mixing (append_sub8x8_mvs)
Round 25 partially closes the r24+ "per-sub-block find_mv_refs"
gap by wiring up the §6.5.14 append_sub8x8_mvs path. For the
sub-8×8 inter MBs (B4x4 / B4x8 / B8x4) the §6.4.16 (idy, idx) loop
now refines the (NearestMv, NearMv) pair per-sub-block by
mixing in the MVs that assign_mv already chose for prior
sub-blocks of the same MB:
append_sub8x8_mvs(block, refList) {
find_mv_refs(refFrame, block) // RefListMv[] (cell-level)
if block == 0:
sub8x8Mvs = RefListMv[0..2]
else if block <= 2:
sub8x8Mvs[0] = BlockMvs[refList][0] // sub-block 0's MV
else: // block == 3
sub8x8Mvs[0] = BlockMvs[refList][2] // sub-block 2's MV
// then walk idx=1, 0 (skip dups vs sub8x8Mvs[0])
// pad with cell-level RefListMv entries (skip dups), then ZeroMv
NearestMv = sub8x8Mvs[0]
NearMv = sub8x8Mvs[1]
}
The cell-level BestMv (= RefListMv[0]) is preserved by a new
MvRefs::best_mv_override field that lets the per-sub-block
refined MvRefs pin BestMv (used by NEWMV) to the cell value
while NEAREST/NEAR pull from the rebuilt list[0..2] — matching
the spec invariant that only NEAREST/NEAR are refined per
sub-block.
The remaining gap (the neighbour per-4×4 SubMvs[r][c][refList][idx]
lookup from §6.5.11 get_sub_block_mv, which would let prior
neighbour-MB sub-block MVs be selected based on the requesting
sub-block's (mv_ref_search[i][1]) column delta) is still
cell-level. That path requires a SubMvs table on InterMiCell
which is r26+.
Effect on the existing fixtures:
| fixture | r24 | r25 |
|---|---|---|
vp9-lossless-pattern.ivf Y |
47.70 dB / 337 px | 47.70 dB / 337 px |
vp9-lossless-pattern.ivf UV |
∞ (bit-exact) | ∞ (bit-exact) |
vp9-lossless-c64.ivf all |
∞ (bit-exact) | ∞ (bit-exact) |
vp9-lossless-gray.ivf Y |
∞ (all 25 frames) | ∞ (all 25 frames) |
vp9-inter f1↔f2 luma diffs |
4795 px | 4795 px |
vp9-compound shown frames |
6 / 6 | 6 / 6 |
All keyframe-only fixtures stay bit-exact (the change only fires
on sub-8×8 inter MBs, and the lossless fixtures don't have any).
The vp9-inter and vp9-compound fixtures stay byte-identical
at the visible level because their P-frames don't contain a
sub-8×8 inter MB whose §6.4.16 read path actually selects
NEAREST/NEAR with a non-block-0 sub-block — for those streams
the cell-level (Nearest, Near) match the §6.5.14-refined values
or the fixture's inter_mode lands on ZEROMV/NEWMV (where the
refinement is a no-op). The wiring is structurally spec-correct
and lifts the residual asymmetry flagged in the r24 README; we
expect a measurable PSNR delta on a denser sub-8×8 inter
fixture (r26+ — needs a vp9-sub8x8-inter fixture or a
compound clip with partition_split denser than the r6
6-frame reference).
Tests added (under mvref::tests):
append_sub8x8_block0_returns_cell_pair_unchangedappend_sub8x8_block1_anchors_on_block0_mvappend_sub8x8_block2_anchors_on_block0_mvappend_sub8x8_block3_walks_block2_then_1_then_0_per_specappend_sub8x8_block3_skips_duplicate_block_mvsappend_sub8x8_block_dedups_against_cell_refs_then_falls_back_to_zeromvrefs_best_mv_uses_override_when_present
r26+ items still open
(All resolved in round 26 below.)
Round-26 §6.5.11 between-MB sub-block MV lookup (get_sub_block_mv)
Round 26 (#190) closes the second axis flagged in round 25 — the
neighbour per-4×4 SubMvs[r][c][refList][idx] lookup from
§6.5.11 get_sub_block_mv. With this in, the sub-8×8 inter
NEAREST/NEAR refinement now consults both asymmetries the spec
calls for:
- Within the current MB: prior-sub-block
BlockMvsare mixed into(NearestMv, NearMv)via §6.5.14append_sub8x8_mvs(round 25, #180). - Between MBs: when the current sub-block looks up its
first-two-neighbour candidates, the neighbour cell's
SubMvs[refList][idx]is selected —idxchosen byidx_n_column_to_subblock[block][delta_col == 0]— instead of the neighbour's cell anchor MV. (Round 26, this round.)
Concretely, three changes:
- Storage.
InterMiCellgainssub_mvs: [[Mv; 4]; 2]. The inter writer ininter.rs::decode_inter_blockpopulates it fromblock_mvs_a/block_mvs_bfor sub-8×8 blocks (the per-sub-block MVs already collected by the round-25 §6.4.16 loop) and from the cell-level MV repeated four times for=8×8 blocks (per §6.4.16 line 2700:
for block in 0..4: BlockMvs[refList][block] = Mv[refList]). This matches §6.4.4 line 2422 verbatim:SubMvs[r+y][c+x][refList][b] = BlockMvs[refList][b]. - Lookup.
mvref::get_sub_block_mv(cell, refList, deltaCol, block)mirrors §6.5.11. For the cell-levelblock = -1call it returnscell.sub_mvs[refList][3](=cell.mvper the §6.4.4 invariant) so the legacy path stays bit-identical. Forblock ∈ 0..=3it picks viaIDX_N_COLUMN_TO_SUBBLOCK[block][delta_col == 0], the 4×2 table from the spec ({1,2}, {1,3}, {3,2}, {3,3}). - Wiring.
find_mv_refs_geom_block(..., block)is the new block-aware entrypoint. The first-two-neighbour loop now readsget_sub_block_mv(&cell, j, dc, block)instead ofcell.mv[j]; subsequent neighbour scans (§6.5.7 / §6.5.8) are unchanged per spec. The sub-8×8 inter path (InterTile::find_subblock_mv_refs) calls this for eachblock_idx ∈ 0..=3and threads the result intosub8x8_refined_refs, which still pinsBestMvto the cell-levelRefListMv[0]per §6.4.16 (NEWMV must use the outerfind_best_ref_mvsresult).
The pre-existing fixtures stay byte-identical at the visible
level for the same reason r25's within-MB axis didn't shift
PSNR: their P-frames don't contain a sub-8×8 inter MB whose
first-two-neighbour cell is itself sub-8×8 with diverging
SubMvs[0..3]. Together the two axes lift the structural
asymmetry flagged across r24 and r25 and unblock a
B4x4-heavy inter PSNR gain on a denser future fixture.
Tests added (under mvref::tests):
get_sub_block_mv_cell_level_picks_index_3get_sub_block_mv_indexes_per_idx_n_column_to_subblock_tablefind_mv_refs_block_minus_one_matches_cell_level_pathfind_mv_refs_block_picks_neighbour_sub_mv_per_specfind_mv_refs_block_left_neighbour_uses_dc_neq_0_column
Round-20 §7.4.6 spec-ctx skip read
Round 20 fixed the §6.4.8 / §7.4.6 skip-context wiring. Since round
13 the decoder had been reading skip against the constant
skip_probs[0]=192, ignoring the spec ctx
(AvailU ? AboveSkip : 0) + (AvailL ? LeftSkip : 0). The
vp9-lossless-c64-constant.ivf fixture pinpointed the bug: a 16×8
H_PRED block at (mi_row=1, mi_col=2) where the libvpx encoder had
emitted skip=true against skip_probs[1]=128 (left=0,above=1) but
our decoder kept reading against skip_probs[0]=192, decoding
skip=false and consuming 12 spurious 4×4 token reads which then
ran the bool decoder past EOF and corrupted the rest of the frame.
The fix is one line in block.rs::decode_block (intra path) and a
matching line in inter.rs — both now pass skip_ctx(mi_row, mi_col) into skip_probs[]. The encoder side in
encoder/tile.rs::emit_block was updated to use the same per-context
prob so the self-roundtrip stays bit-exact.
Effect:
vp9-lossless-c64-constant.ivflifts 61.9 → 70.1 dB, byte diffs collapse from 29 → 20 (all chroma is now bit-exact).vp9-lossless-gray.ivfdrops 66.77 → 45.43 dB. The gray fixture is a degeneratevec![126; 64*64]reference: a single bool-decoder mis-skip used to score 66.77 dB by chance because every block's DC prediction matched 126; the spec-correct skip read now exposes the underlying compressed-header / coef-prob drift that the bool-decoder happy-accident was masking. Test threshold is 30 dB; we still pass.vp9-lossless-pattern.ivfY drops 9.90 → 9.67 dB (still > 8 dB threshold). The pattern fixture has high-frequency content that exercises every sub-8x8 mode-info path; the round-20 fix is one of several stacked spec divergences and lifting pattern Y to ≥30 dB is r21+.- compound mean luma 10.49 → 10.20 dB (test prints only, no threshold; the inter path also moved to spec ctx for parity).
Audited but ruled in/out:
- WHT round-trip (§8.7.1.10): bit-correct on DC=16, DC=-1792 (the exact first-block scenario), and a non-DC AC1 sanity case.
- DC_PRED at frame top-left: bit-exact reconstruction of pixel value 16 from predictor 128 with WHT(-1792) on the first 16×16 block of every fixture (lossless-pattern, c64, gray).
- (Note: the r19 audit ruled
update_partition_ctxspec-literal out — that ruling was overturned in r21 once the r20 skip-ctx fix removed the upstream bool-decoder drift that was tainting the r19 measurement.)
Known gaps
Both the intra and inter paths ship, but a handful of accuracy refinements remain before the decoder matches a libvpx reference bit-for-bit:
- MV-candidate list (
find_mv_refs/find_best_ref_mvs):NEARESTMVandNEARMVcurrently resolve to a zero predictor instead of the full neighbour-derived candidate list (§6.4.17).ZEROMVandNEWMVare correct;NEARESTMV/NEARMVdegrade to the ZEROMV spatial result. - Per-block segmentation map:
SEG_LVL_ALT_Q/SEG_LVL_ALT_Ldeltas are applied, but the segmentation-map tree decode + temporal predicted-segment lookup is not wired — every block falls through to segment 0.SEG_LVL_REF_FRAME/SEG_LVL_SKIPare not applied. - Neighbour-aware probability contexts: the crate uses context 0
for the partition tree (§6.4.2
partition_plane_context), for the KF intra-mode probability selection, forskip_prob,is_inter,comp_modeandcomp_ref. Output diverges from the libvpx reference but stays plausible. - Higher bit depths (profiles 2 / 3): the parser recognises
10-bit / 12-bit colour configs but the reconstruction pipeline only
runs on 8-bit
Yuv420P. - 4:2:2 and 4:4:4 subsampling (profiles 1 / 3): the parser
handles the header bits but the output format stays
Yuv420P. B/ reordered frames: VP9 can emit no-show altref frames (show_frame=0). These flow through the DPB correctly and a latershow_existing_framepacket now surfaces the referenced DPB slot as a visibleFrame::Video(§6.2 / §8.2), matching the spec's pure pass-through dispatch (no compressed header / tile decode for the show-existing packet itself).
None of the above prevent the crate from decoding a standard
libvpx-vp9 -g N 8-bit 4:2:0 IPPP stream into frames the caller can
render.
Encoder (experimental — round 40)
The encoder module produces a valid VP9 keyframe bitstream accepted
by ffmpeg / libvpx and round-trippable through this crate's decoder.
Scope today:
- Profile 0, 4:2:0 8-bit, single tile.
- Emits the full §6.2 uncompressed header, §6.3 compressed header (tx_mode, coef_probs-update flags, skip_prob-update flags) and the tile / partition / block symbols per §6.4.
- Per-block luma intra-mode RDO across
{DC, V, H, TM}(round 40);DC_PREDchroma; forward 4×4 DCT + quantise + token-coded residual. - QP-derived loop filter level (round 40) —
EncoderParams::keyframepicks a non-zero deblocking strength viadefault_filter_level. - Partition / intra-mode / skip-context / nonzero-context trackers mirror the decoder's §7.4.6 state exactly so probability rows resolve identically on encode and decode sides.
BoolEncoderis the inverse ofBoolDecoder— standard binary range coder with pending-byte carry propagation. Roundtrip-tested against the decoder with mixed / skewed / equal / carry-forcing probs and 2048-symbol PRNG streams.
Self-roundtrip: the produced bitstream decodes through
Vp9Decoder::send_packet → receive_frame and yields ≥ 50 dB PSNR_Y
on smooth content at base_q_idx = 64.
ffmpeg-acceptance: tests/vp9_encoder_ffmpeg.rs decodes the frame
with the system ffmpeg binary and confirms zero decode errors.
Deferred to follow-up work:
- P-frame / inter encode — the decoder side ships a full inter path (single + compound ref, sub-pel MC, scaled refs, MV decode) but the encoder currently produces keyframes only.
- Multi-tile output — the decoder reads multi-tile bitstreams; encode emits a single tile.
- Two-pass ABR (collect first-pass stats, distribute QP per second-pass).
- Directional intra modes (
D45 / D135 / D117 / D153 / D207 / D63) as additional RDO candidates. - RDO across (mode, partition, tx_size) instead of fixed 8×8/4×4.
- Per-segment QP / loop-filter delta encoding.
Entry points: encoder::encode_keyframe(&EncoderParams) yields a
midgrey skip=1 frame; encoder::encode_keyframe_yuv(&EncoderParams, &YuvFrame) encodes source 4:2:0 pixels.
Codec / container IDs
- Codec:
"vp9"; maps from MP4'svp09sample entry. - IVF demux:
oxideav_vp9::ivf::iter_frames(a thin helper for tests and standalone use, not yet a fullContainerimpl).
License
MIT — see LICENSE.
Dependencies
~0.9–1.6MB
~32K SLoC