Releases: cchuter/ds4
mgpu v0.1.0 — multi-GPU + perf wave
First tagged release of the multi-GPU (mgpu) line, with the perf wave merged on top.
Highlights
Multi-GPU (mgpu) v0
- Monotonic-contiguous layer-placement packer (wave 1)
- Device-aware CUDA plumbing and per-device selective model cache (wave 1)
- Engine + placement scaffolding (wave 2)
- `--gpu-vram` / `--gpu-devices` wired into all four binaries (wave 2)
- CUDA-side per-device plumbing and engine-side multi-tier dispatch (wave 3a)
- v0 baseline bench harness + report (wave 3)
- Correctness tests: smoke output + deterministic token compare
- Ctx plumbed into placement, per-layer KV math, auto-mode reserve, upfront refusal
Performance wave (merged at tip)
- perf-01: prefill tensorcore path
- perf-02: opt-in split-KV / flash-decode attention kernels (with exact-window sizing fix)
- perf-04: fill SMs for routed-MoE gate/up decode via finer launch geometry
- Safer, smaller q8→f16 cache reserve on high-VRAM cards
Q8_K routing
- Route q8_k experts on CPU and Metal
- Multi-block q8_k metal MoE coverage
Verifying
- Tag: `mgpu-v0.1.0`
- Commit: `0b6035b`
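One way to check that a local clone matches the tag and commit listed above is to resolve the tag and compare hash prefixes. This is a minimal sketch: `verify_tag` is a hypothetical helper name, not something shipped in the repository, and it relies only on standard `git rev-parse` behavior.

```shell
# Hypothetical helper (not part of ds4): succeeds when TAG resolves to a
# commit whose hash starts with EXPECTED.
verify_tag() {
  tag="$1"
  expected="$2"
  # An empty prefix would match any hash, so refuse it.
  [ -n "$expected" ] || { echo "expected commit prefix is empty" >&2; return 2; }
  # ^{commit} peels an annotated tag down to the commit it points at.
  full=$(git rev-parse --verify --quiet "${tag}^{commit}") || {
    echo "tag ${tag} not found" >&2
    return 2
  }
  case "$full" in
    "$expected"*) echo "ok: ${tag} -> ${full}" ;;
    *) echo "mismatch: ${tag} -> ${full}, expected ${expected}" >&2
       return 1 ;;
  esac
}
```

Run from inside a clone after fetching tags, `verify_tag mgpu-v0.1.0 0b6035b` should print an `ok:` line if the tag points at the commit listed above.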
Getting it
```
git clone --branch mgpu-v0.1.0 --depth 1 https://github.com/cchuter/ds4.git
```
or, if already cloned:
```
git fetch --tags origin && git checkout mgpu-v0.1.0
```
Note on `main`
`main` was force-updated to this tag's commit; the previous tip is no longer reachable from `main`. If you had local work on top of the old `main`, rebase onto `mgpu-v0.1.0`.
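The rebase can be sketched as below. This is an illustration under assumptions, not a prescribed procedure: `rebase_onto_release` is a hypothetical helper name, and `old_base` is a value you have to supply yourself, namely the last commit of the old upstream `main` that your work was built on (for example, found via `git reflog` or `git log`). The release notes do not state that commit.

```shell
# Hypothetical helper (not part of ds4): replay the commits after OLD_BASE
# on BRANCH onto the release tag, keeping a backup pointer first.
rebase_onto_release() {
  old_base="$1"                 # assumption: last old-main commit under your work
  branch="$2"                   # local branch that carries your commits
  target="${3:-mgpu-v0.1.0}"    # tag to rebase onto
  # Keep a pointer to the pre-rebase state so nothing is lost.
  git branch "backup/${branch}" "$branch" || return 1
  # Replay only the commits in old_base..branch on top of the tag.
  git rebase --onto "$target" "$old_base" "$branch"
}
```

For example, `rebase_onto_release <old-base> main` after `git fetch --tags origin`. If the rebase goes wrong, `git rebase --abort` returns to the previous state, and `backup/main` still points at the original commits.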