Skip to content

perf(explore, needs lean4#14053): decode wide bit-refill — ugetUInt64LE in refill #2630

@kim-em

Description

@kim-em

Status: exploration — BLOCKED on lean4#14053 (open). Do not claim until #14053 merges.

Idea

InflateBuf.refill fills the bit buffer with a 7-iteration byte loop. With #14053 it becomes one bulk ugetUInt64LE + shift + advance (byte-assembly fallback near the buffer end). refill is 13.9% of decode (clean decode-ab profile).

Measured (pr-release-14053 kernel microbench)

byte-refill vs wide-refill = 1.63× (earlier campaign run measured 2.08×; ≥1.5× robustly). refill at 13.9% → estimate ~5% decode.

Why we might reject

  • Earlier the matcher USize-refill in-situ regressed (−19%) due to per-call tuple allocation on a short loop — the decode refill must be wired so it does not reintroduce that tuple-alloc boundary. Confirm in situ.

Status

Kernel numbers staged (Refill.lean on pr-release-14053); lands when #14053 merges.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions