Stars
FlashInfer: Kernel Library for LLM Serving
AI agents that autonomously run research on single-GPU nanochat training
LLM-powered intelligent analyzer for A-share/H-share/US stocks: multi-source market data + real-time news + LLM decision dashboard + multi-channel notifications; runs on a schedule at zero cost, entirely on free tiers. LLM-powered stock analysis system for A/H/US markets.
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Wan: Open and Advanced Large-Scale Video Generative Models
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
DFlash: Block Diffusion for Flash Speculative Decoding
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2–5× speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
SGLang is a high-performance serving framework for large language models and multimodal models.
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
[ICLR 2026] Taming large-scale few-step training with self-adversarial flows! 👏🏻
(arXiv) MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
A lightweight inference framework for image and video generation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Dimple, the first Discrete Diffusion Multimodal Large Language Model
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
dInfer: An Efficient Inference Framework for Diffusion Language Models
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation (CVPR 2026 Highlight)'