DeepSeek V4 Flash currently seems to be the 192GB VRAM king (though I never managed to test MiMo V2.5, since I had to limit context to 90K even though 200K should be possible in theory).
However, it's currently stuck in endless churn in vLLM:
Kernels and forks all over the place:
And DeepSeek doesn't plan to support SM120:
And FlashAttention has no MLA kernel and no plans for one soon:
Running the model requires:
- Mixed-precision FP8/MXFP4 support
- MLA kernel (Multi-head Latent Attention)
- DSA kernel (DeepSeek Sparse Attention)
- NSA kernel (Native Sparse Attention) / Lightning Indexer
- mHC kernel (Manifold-Constrained Hyper-Connections)
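
To make the MXFP4 item more concrete, here is a rough pure-Python sketch of what the format does, as I understand the OCP Microscaling spec: each block of 32 values shares one power-of-two scale, and each value is stored as a 4-bit E2M1 number. The function names (`quantize_block`, `dequantize_block`) are made up for illustration. This is not how vLLM or DeepSeek implement it, and real kernels pack the bits and run on the GPU.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is a separate bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # MX formats share one scale per 32-element block
E2M1_EMAX = 2    # largest E2M1 exponent: 6.0 = 1.5 * 2**2


def quantize_block(block):
    """Return (shared_exponent, quantized_values) for one block of floats.

    Illustrative only: values are kept as Python floats, not packed 4-bit codes.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale, chosen so the largest magnitude in the
    # block lands in the top binade of the E2M1 range.
    exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** exp
    quantized = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_VALUES[-1])  # clamp to max magnitude
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        quantized.append(math.copysign(nearest, x))
    return exp, quantized


def dequantize_block(exp, quantized):
    """Undo the shared scaling; the rounding error is not recoverable."""
    return [q * 2.0 ** exp for q in quantized]
```

Values that already sit on the E2M1 grid survive a round trip unchanged, everything else is rounded to the nearest grid point, which is why these kernels need care about where FP8 versus MXFP4 gets used.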