- 15:53 (UTC -08:00)
- https://rogerw.io
- in/rogerywang
- @rogerw0108
Stars
A framework for efficient model inference with omni-modality models
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A Datacenter Scale Distributed Inference Serving Framework
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
FlashMLA: Efficient Multi-head Latent Attention Kernels
How to optimize some algorithms in CUDA.
Entropy-Based Sampling and Parallel CoT Decoding
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A high-throughput and memory-efficient inference and serving engine for LLMs