Releases: cchuter/ds4

mgpu v0.1.0 — multi-GPU + perf wave

10 Jun 15:18

First tagged release of the multi-GPU (mgpu) line, with the perf wave merged on top.

Highlights

Multi-GPU (mgpu) v0

  • Monotonic-contiguous layer-placement packer (wave 1)
  • Device-aware CUDA plumbing and per-device selective model cache (wave 1)
  • Engine + placement scaffolding (wave 2)
  • --gpu-vram / --gpu-devices wired into all four binaries (wave 2)
  • CUDA-side per-device plumbing and engine-side multi-tier dispatch (wave 3a)
  • v0 baseline bench harness + report (wave 3)
  • Correctness tests: smoke output + deterministic token compare
  • Context size (ctx) plumbed into placement, per-layer KV math, auto-mode reserve, and upfront refusal

Performance wave (merged at tip)

  • perf-01: prefill tensorcore path
  • perf-02: opt-in split-KV / flash-decode attention kernels (with exact-window sizing fix)
  • perf-04: fill SMs for routed-MoE gate/up decode via finer launch geometry
  • Safer, smaller q8→f16 cache reserve on high-VRAM cards

Q8_K routing

  • Route q8_k experts on CPU and Metal
  • Multi-block q8_k Metal MoE coverage

Verifying

  • Tag: `mgpu-v0.1.0`
  • Commit: `0b6035b`
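
One way to check that a local checkout matches the tag and commit listed above is to resolve the tag to its commit and compare it against the expected short hash. The helper below is only a sketch: `verify_tag` is a hypothetical name, not something shipped in the repository, and it assumes it is run from inside a clone that has already fetched tags.

```shell
# Hypothetical helper (not part of ds4): check that TAG resolves to a commit
# whose hash starts with EXPECTED_PREFIX.
# Returns 0 on match, 1 on mismatch, 2 if the arguments or tag are unusable.
verify_tag() {
  tag=$1
  expected=$2
  [ -n "$tag" ] && [ -n "$expected" ] || { echo "usage: verify_tag TAG SHA_PREFIX" >&2; return 2; }

  # ^{commit} peels an annotated tag down to the commit it points at.
  actual=$(git rev-parse --verify --quiet "${tag}^{commit}") || {
    echo "tag $tag not found" >&2
    return 2
  }

  case "$actual" in
    "$expected"*) echo "ok: $tag -> $actual" ;;
    *) echo "mismatch: $tag -> $actual (expected $expected...)" >&2; return 1 ;;
  esac
}
```

For this release, the call would be `verify_tag mgpu-v0.1.0 0b6035b`.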

Getting it

```
git clone --branch mgpu-v0.1.0 --depth 1 https://github.com/cchuter/ds4.git
```
Or, if you have already cloned the repository:
```
git fetch --tags origin && git checkout mgpu-v0.1.0
```

Note on `main`

`main` was force-updated to this tag's commit; the previous tip is no longer reachable from `main`. If you had local work on top of the old `main`, rebase onto `mgpu-v0.1.0`.
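
Because the old `main` history is no longer part of the new one, a plain rebase may try to replay old upstream commits along with your own. A possible way to replay only your own commits is `git rebase --onto`, sketched below. `rebase_onto_release` is a hypothetical helper name, and the sketch assumes you can still identify the old upstream tip your work was based on (for example from `git reflog show origin/main`).

```shell
# Hypothetical helper (not part of ds4): move the commits in OLD_TIP..BRANCH
# onto the mgpu-v0.1.0 tag, leaving a backup pointer to the original branch.
rebase_onto_release() {
  old_tip=$1   # assumption: the old upstream main commit your work forked from
  branch=$2    # local branch carrying your work

  # Keep a pointer to the pre-rebase state so the rebase can be undone.
  git branch "backup/$branch" "$branch" || return 1

  # Replay only the commits after old_tip onto the release tag.
  git rebase --onto mgpu-v0.1.0 "$old_tip" "$branch"
}
```

For example, `rebase_onto_release <old-main-sha> my-feature`, where both arguments are placeholders for your own values. If the rebase goes wrong, `backup/my-feature` still points at the original commits.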