jianyuh

Jianyu Huang jianyuh

Beat the speed of light.

294 followers · 39 following

Meta
http://jianyuhuang.com/

Achievements

x3 x2

Organizations

Lists (1)

MLSys_Learn

4 repositories

Stars

KellerJordan / modded-nanogpt

NanoGPT (124M) in 3 minutes

Python · 3,812 stars · 498 forks · Updated Nov 6, 2025

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 49,481 stars · 8,291 forks · Updated Nov 12, 2025

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python · 1,918 stars · 153 forks · Updated Nov 13, 2025

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript · 82,392 stars · 9,238 forks · Updated Nov 14, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 1,945 stars · 153 forks · Updated Nov 13, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 8,440 stars · 835 forks · Updated Nov 6, 2025

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python · 2,296 stars · 147 forks · Updated Nov 13, 2025

skyzh / tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python · 3,405 stars · 229 forks · Updated Nov 2, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python · 8,832 stars · 1,066 forks · Updated Nov 3, 2025

huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Python · 2,318 stars · 258 forks · Updated Sep 3, 2025

mem0ai / mem0

Universal memory layer for AI Agents

Python · 43,089 stars · 4,654 forks · Updated Nov 13, 2025

HazyResearch / Megakernels

kernels, of the mega variety

Python · 599 stars · 27 forks · Updated Sep 28, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python · 4,144 stars · 252 forks · Updated Nov 10, 2025

NVIDIA-NeMo / RL

Scalable toolkit for efficient model reinforcement

Python · 1,024 stars · 166 forks · Updated Nov 14, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook · 5,291 stars · 533 forks · Updated Sep 23, 2025

deepseek-ai / DeepSeek-Prover-V2

1,200 stars · 90 forks · Updated Jul 18, 2025

MoE-Inf / awesome-moe-inference

Curated collection of papers in MoE model inference

302 stars · 11 forks · Updated Oct 20, 2025

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

1,601 stars · 86 forks · Updated Nov 13, 2025

simplescaling / s1

s1: Simple test-time scaling

Python · 6,596 stars · 762 forks · Updated Jun 25, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,264 stars · 620 forks · Updated Nov 13, 2025

yifuwang / symm-mem-recipes

Python · 147 stars · 14 forks · Updated Dec 27, 2024

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ · 530 stars · 69 forks · Updated Nov 7, 2025

fla-org / native-sparse-attention

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python · 920 stars · 48 forks · Updated Mar 19, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,462 stars · 959 forks · Updated Oct 24, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,878 stars · 741 forks · Updated Oct 15, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,719 stars · 988 forks · Updated Nov 6, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,862 stars · 899 forks · Updated Sep 30, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 15,566 stars · 2,515 forks · Updated Nov 13, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 stars · 286 forks · Updated May 15, 2025

Open-Reasoner-Zero / Open-Reasoner-Zero

Official Repo for Open-Reasoner-Zero

Python · 2,061 stars · 119 forks · Updated Jun 2, 2025