Minimal, config-first automation for a full MoE compression run:
- observations
- pruning
- upload pruned weights to Hugging Face
- quantization
- upload quantized weights to Hugging Face
- benchmarking
- render one auditable run report
This repo is intentionally small. It does not pretend every MoE uses the same vendor command line.
What it does provide:
- one pipeline runner that executes a full run from one JSON config
- one calibration-bundle builder for local JSONL data plus public Hugging Face datasets
- one normalized report renderer
What you configure up front:
- model path or model id
- local calibration dataset path
- public calibration mix size
- prune percentages
- quantization method and scheme, such as w4a16
- Hugging Face repo names for pruned and quantized outputs
- the exact observe, prune, upload, quantize, benchmark, and manifest-assembly commands for your MoE
compress/
README.md
requirements.txt
scripts/
run_moe_pipeline.py
build_master_calibration_bundle.py
render_reap_run_report.py
examples/
automatic_pipeline.example.json
master_calibration_bundle.example.json
run_report_manifest.example.json
run_moe_pipeline.py is the entrypoint. It runs named stages in order, captures logs, writes pipeline state, and stops on first failure.
Supported stage types:
- build_calibration_bundle
- command
- render_report
The runner is model agnostic because the architecture-specific work stays in your configured command stages.
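The contract above, run named stages in order, capture logs, stop on first failure, can be sketched as follows. This is an illustrative sketch, not the actual runner; the function names and the `{name}.log` convention are assumptions:

```python
import subprocess
from pathlib import Path


def run_command_stage(name: str, command: str, run_dir: Path) -> int:
    """Run one command stage, capture its output to a log file, return the exit code."""
    log_path = run_dir / f"{name}.log"  # hypothetical log naming
    with log_path.open("w") as log:
        result = subprocess.run(command, shell=True, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def run_stages(stages: list[tuple[str, str]], run_dir: Path) -> list[str]:
    """Run stages in order; stop at the first nonzero exit code."""
    completed = []
    for name, command in stages:
        if run_command_stage(name, command, run_dir) != 0:
            break  # stop on first failure; the failing stage's log remains for inspection
        completed.append(name)
    return completed
```

The point of stopping on first failure is that later stages (upload, quantize) must never run against artifacts a failed earlier stage did not produce.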
The pipeline config supports placeholders like {model_path}.
Available variables:
- top-level parameters.*
- {repo_root}, {run_dir}, {pipeline_name}
- stage outputs, such as:
  - {stage_build_calibration_bundle_output_dir}
  - {stage_build_calibration_bundle_summary_json}
  - {stage_build_calibration_bundle_merged_output_jsonl}
  - {stage_observations_log_path}
  - {stage_quantize_status}
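One simple way to implement this kind of placeholder expansion is regex-based substitution. A sketch, assuming (this may differ from the real runner) that unknown placeholder names are left untouched rather than raising an error:

```python
import re


def resolve_placeholders(template: str, variables: dict[str, str]) -> str:
    """Replace {name} placeholders with their values; leave unknown names as-is."""
    def substitute(match: re.Match) -> str:
        return variables.get(match.group(1), match.group(0))
    # allow letters, digits, underscores, and dots (for parameters.* style names)
    return re.sub(r"\{([A-Za-z0-9_.]+)\}", substitute, template)
```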
build_calibration_bundle and render_report both support either a file path or inline JSON:
- config or inline_config
- manifest or inline_manifest
That means you can drive the whole pipeline from one file.
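A sketch of how a stage could accept either form. The precedence rule here (inline wins when both are present) is an assumption, not a documented behavior:

```python
import json
from pathlib import Path


def load_stage_config(stage: dict) -> dict:
    """Return a stage's config, whether given inline or as a file path.

    Assumed precedence: inline_config, if present, wins over a config path.
    """
    if "inline_config" in stage:
        return stage["inline_config"]
    return json.loads(Path(stage["config"]).read_text())
```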
The bundled calibration example uses the split that has proven most practical for code and agentic REAP work:
- one long-context lane from local data
- one broad short-mix lane from local data plus public coverage
- 100 rows from each public dataset:
  - theblackcat102/evol-codealpaca-v1
  - Salesforce/xlam-function-calling-60k
  - SWE-bench/SWE-smith-trajectories
  - open-r1/Mixture-of-Thoughts (code)
  - open-r1/Mixture-of-Thoughts (math)
  - open-r1/Mixture-of-Thoughts (science)
If your deployment traffic is not code or agentic, change the mix. Do not cargo-cult this bundle into a different workload.
From the repo root:
uv run ./scripts/run_moe_pipeline.py \
--config ./examples/automatic_pipeline.example.json

The example pipeline is a template. Replace the example command strings with the real commands for your MoE stack.
Recommended stage order:
- build the calibration bundle
- run observations on the base model
- prune the requested variants
- upload pruned checkpoints
- quantize the validated prune outputs
- upload quantized checkpoints
- benchmark the variants you care about
- assemble one normalized run manifest
- render the final report
Dry run:
uv run ./scripts/build_master_calibration_bundle.py \
--config ./examples/master_calibration_bundle.example.json \
--output-dir ./output/calibration-plan \
--dry-run

Real build:
uv run --with datasets ./scripts/build_master_calibration_bundle.py \
--config ./examples/master_calibration_bundle.example.json \
--output-dir ./output/master-calibration

Outputs:
- one JSONL per lane
- one merged JSONL
- one summary JSON
- one Markdown summary
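The merge step itself is mechanical. A minimal sketch of producing the merged JSONL and summary JSON from per-lane files; the file names `merged.jsonl` and `summary.json` and the summary shape are assumptions, not the script's actual output contract:

```python
import json
from pathlib import Path


def merge_lanes(lane_paths: list[Path], output_dir: Path) -> dict:
    """Concatenate per-lane JSONL files into one merged JSONL plus a row-count summary."""
    output_dir.mkdir(parents=True, exist_ok=True)
    merged_path = output_dir / "merged.jsonl"  # hypothetical output names
    counts = {}
    with merged_path.open("w") as merged:
        for lane in lane_paths:
            rows = lane.read_text().splitlines()
            counts[lane.stem] = len(rows)
            for row in rows:
                json.loads(row)  # fail fast on a malformed row instead of merging it
                merged.write(row + "\n")
    summary = {"lanes": counts, "total_rows": sum(counts.values())}
    (output_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```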
uv run ./scripts/render_reap_run_report.py \
--manifest ./examples/run_report_manifest.example.json \
--output-dir ./output/example-report

Outputs:
- report.json
- report.md
- index.html
The report renderer expects a JSON manifest with these sections:
- model
- calibration
- pruning
- quantization
- publishing
- benchmarking
- results
The pipeline runner does not invent these facts. Your configured commands should write the artifacts and produce a normalized manifest file at the end of the run.
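Since your tooling produces the manifest, a cheap sanity check before rendering is to verify the required sections exist. A sketch (the renderer's actual validation, if any, may be stricter):

```python
# The seven top-level sections the report renderer expects.
REQUIRED_SECTIONS = ("model", "calibration", "pruning", "quantization",
                     "publishing", "benchmarking", "results")


def missing_sections(manifest: dict) -> list[str]:
    """Return the required top-level sections absent from a run manifest."""
    return [section for section in REQUIRED_SECTIONS if section not in manifest]
```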
That is the correct boundary:
- this repo handles orchestration, calibration planning, and reporting
- your MoE-specific tooling handles observations, pruning, quantization, benchmarking, and upload mechanics
Treat examples/automatic_pipeline.example.json as the one file you edit for a new model. Keep the stage order. Replace the command strings. Point the final report stage at the normalized manifest produced by your tooling.