[Killer app] Implement DeepSeek v4 Flash for SM120 #28

@mratsim

DeepSeek V4 Flash currently seems to be the king of models that fit in 192GB of VRAM (though I never managed to test MiMo V2.5, because it required limiting context to 90K when 200K is possible in theory).

However, support for it is currently stuck in endless churn in vLLM:

Kernels and forks all over the place:

And DeepSeek doesn't plan to support SM120:

And FlashAttention has no MLA kernel, with no plans to add one soon:

This requires:

  • Mixed-precision FP8/MXFP4 support
  • MLA kernel (Multi-head Latent Attention)
  • DSA kernel (DeepSeek Sparse Attention)
  • NSA kernel (Native Sparse Attention) / Lightning Indexer
  • mHC kernel (Manifold-Constrained Hyper-Connections)
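
To make the MXFP4 requirement concrete, here is a minimal pure-Python sketch of block-scaled FP4 quantization as I understand the OCP Microscaling (MX) format: 32 elements share one power-of-two scale, and each element is an E2M1 value. This is not the vLLM or DeepSeek kernel. The names `quantize_block` and `dequantize_block` are hypothetical, codes are kept as decoded floats rather than packed 4-bit patterns, and ties round toward the smaller magnitude for simplicity.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is handled separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32   # number of elements sharing one scale in the MX formats
E2M1_EMAX = 2     # exponent of the largest E2M1 magnitude (6 = 1.5 * 2**2)


def quantize_block(values):
    """Quantize one block of floats; returns (shared_exponent, codes).

    Hypothetical helper: codes are decoded E2M1 values, not packed nibbles.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # Power-of-two shared scale chosen so the block maximum fits the E2M1 range.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        mag = min(abs(v) / scale, E2M1_VALUES[-1])  # saturate at 6
        # Simplification: ties resolve to the smaller magnitude.
        nearest = min(E2M1_VALUES, key=lambda q: abs(q - mag))
        codes.append(math.copysign(nearest, v))
    return shared_exp, codes


def dequantize_block(shared_exp, codes):
    """Reconstruct floats from the shared exponent and E2M1 codes."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


if __name__ == "__main__":
    block = [6.0, -3.0, 0.5, 0.0] + [1.0] * (BLOCK_SIZE - 4)
    exp, codes = quantize_block(block)
    print(exp, dequantize_block(exp, codes)[:4])
```

A real kernel would pack two 4-bit codes per byte, store the scale as an 8-bit exponent, and fuse dequantization into the matmul; the sketch only shows the numerics that such a kernel has to reproduce.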