Jianyu Huang jianyuh

Beat the speed of light.

289 followers · 39 following

Meta
http://jianyuhuang.com/

Achievements

x3 x2

Lists (1)

MLSys_Learn

4 repositories

Stars

KellerJordan / modded-nanogpt

NanoGPT (124M) in 3 minutes

Python 3,769 489 Updated Nov 6, 2025

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 49,051 8,214 Updated Dec 9, 2024

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python 1,445 113 Updated Nov 5, 2025

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 81,687 9,123 Updated Nov 6, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,936 148 Updated Nov 5, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,332 826 Updated Nov 6, 2025

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,206 136 Updated Nov 5, 2025

skyzh / tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,386 229 Updated Nov 2, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python 8,380 1,022 Updated Nov 3, 2025

huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Python 2,298 253 Updated Sep 3, 2025

mem0ai / mem0

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 42,762 4,612 Updated Nov 4, 2025

HazyResearch / Megakernels

kernels, of the mega variety

Python 597 26 Updated Sep 28, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 4,076 248 Updated Nov 6, 2025

NVIDIA-NeMo / RL

Scalable toolkit for efficient model reinforcement

Python 1,008 166 Updated Nov 6, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 5,255 524 Updated Sep 23, 2025

deepseek-ai / DeepSeek-Prover-V2

1,199 89 Updated Jul 18, 2025

MoE-Inf / awesome-moe-inference

Curated collection of papers in MoE model inference

297 11 Updated Oct 20, 2025

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

1,582 86 Updated Nov 4, 2025

simplescaling / s1

s1: Simple test-time scaling

Python 6,593 764 Updated Jun 25, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,228 617 Updated Nov 6, 2025

yifuwang / symm-mem-recipes

Python 147 14 Updated Dec 27, 2024

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 523 69 Updated Oct 27, 2025

fla-org / native-sparse-attention

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 919 47 Updated Mar 19, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,442 957 Updated Oct 24, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,863 738 Updated Oct 15, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,696 973 Updated Nov 6, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,849 896 Updated Sep 30, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,179 2,438 Updated Nov 6, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,929 286 Updated May 15, 2025

Open-Reasoner-Zero / Open-Reasoner-Zero

Official Repo for Open-Reasoner-Zero

Python 2,059 119 Updated Jun 2, 2025