ywang96

Roger Wang ywang96

Move faster and we will get there

254 followers · 36 following

Achievements

x3 x4

Organizations

Stars

torchspec-project / TorchSpec

A PyTorch native library for training speculative decoding models

Python · 80 stars · 13 forks · Updated Apr 14, 2026

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python · 4,276 stars · 747 forks · Updated Apr 14, 2026

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook · 18,955 stars · 1,731 forks · Updated Jan 30, 2026

tx-project / tx

Cross-platform transformer training

Python · 3 stars · 1 fork · Updated Nov 14, 2025

ByteDance-Seed / VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python · 1,820 stars · 179 forks · Updated Apr 14, 2026

SJTU-DENG-Lab / Discrete-Diffusion-Forcing

Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference

Python · 247 stars · 17 forks · Updated Feb 3, 2026

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python · 779 stars · 204 forks · Updated Apr 2, 2026

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,546 stars · 1,020 forks · Updated Apr 14, 2026

hiyouga / EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python · 4,846 stars · 366 forks · Updated Apr 6, 2026

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,556 stars · 1,003 forks · Updated Apr 7, 2026

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 2,918 stars · 267 forks · Updated Apr 9, 2026

xjdr-alt / entropix

Entropy Based Sampling and Parallel CoT Decoding

Python · 3,432 stars · 321 forks · Updated Nov 13, 2024

hu-po / docs

documentation for content creation

HTML · 233 stars · 22 forks · Updated Oct 3, 2025

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python · 1,336 stars · 71 forks · Updated Jan 27, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,568 stars · 1,784 forks · Updated Apr 9, 2026

fixie-ai / ultravox

A fast multimodal LLM for real-time voice

Python · 4,399 stars · 371 forks · Updated Dec 12, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 76,481 stars · 15,544 forks · Updated Apr 14, 2026

vllm-project / vllm-nccl

Manages vllm-nccl dependency

Python · 18 stars · 3 forks · Updated Jun 3, 2024

zeux / calm

CUDA/Metal accelerated language model inference

C · 634 stars · 31 forks · Updated May 29, 2025
