Awesome MoE Diffusion Models

18 Updated Mar 25, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,393 896 Updated Apr 14, 2026

AI agents running research on single-GPU nanochat training automatically

Python 72,168 10,527 Updated Mar 26, 2026

LLM-powered stock analysis system for A/H/US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, with scheduled runs at zero cost, entirely free.

Python 29,852 30,585 Updated Apr 13, 2026

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Python 276 26 Updated Apr 7, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,252 1,859 Updated Mar 17, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 780 205 Updated Apr 2, 2026

DFlash: Block Diffusion for Flash Speculative Decoding

Python 1,138 77 Updated Apr 14, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 357,207 72,508 Updated Apr 14, 2026

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

Python 566 78 Updated Apr 13, 2026

WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.

Python 639 43 Updated Mar 3, 2026

Summary notes on LLM quantization.

Python 69 8 Updated Jan 8, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,454 250 Updated Apr 8, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,296 398 Updated Jan 17, 2026

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 2,249 156 Updated Dec 8, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,795 5,342 Updated Apr 14, 2026

Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"

Python 2,008 237 Updated Apr 8, 2026

[ICLR 2026] Taming large-scale few-step training with self-adversarial flows! 👏🏻

Python 507 26 Updated Feb 24, 2026

Nano vLLM

Python 12,883 1,925 Updated Apr 13, 2026
Python 10,966 741 Updated Feb 9, 2026

(arXiv) MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Python 1,131 49 Updated Feb 26, 2026

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,574 77 Updated Oct 16, 2025

Light Image Video Generation Inference Framework

Python 2,172 186 Updated Apr 14, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,008 319 Updated Apr 14, 2026

Qwen-Image-Lightning: Speed up Qwen-Image model with distillation

Python 1,289 44 Updated Jan 1, 2026

Dimple, the first Discrete Diffusion Multimodal Large Language Model

Python 117 6 Updated Jul 9, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 2,203 150 Updated Nov 4, 2025

dInfer: An Efficient Inference Framework for Diffusion Language Models

Python 452 44 Updated Feb 11, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,329 857 Updated Mar 22, 2026

This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation' (CVPR2026 Highlight)

Python 2,101 180 Updated Apr 11, 2026