Organizations

@ULAFF @facebookresearch @pytorch

NanoGPT (124M) in 3 minutes

Python 3,812 498 Updated Nov 6, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 49,481 8,291 Updated Nov 12, 2025

Post-training with Tinker

Python 1,918 153 Updated Nov 13, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 82,392 9,238 Updated Nov 14, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,945 153 Updated Nov 13, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,440 835 Updated Nov 6, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,296 147 Updated Nov 13, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,405 229 Updated Nov 2, 2025

Nano vLLM

Python 8,832 1,066 Updated Nov 3, 2025

Minimalistic large language model 3D-parallelism training

Python 2,318 258 Updated Sep 3, 2025

Universal memory layer for AI Agents

Python 43,089 4,654 Updated Nov 13, 2025

kernels, of the mega variety

Python 599 27 Updated Sep 28, 2025

My learning notes/codes for ML SYS.

Python 4,144 252 Updated Nov 10, 2025

Scalable toolkit for efficient model reinforcement

Python 1,024 166 Updated Nov 14, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,291 533 Updated Sep 23, 2025

Curated collection of papers in MoE model inference

302 11 Updated Oct 20, 2025

Large Language Model (LLM) Systems Paper List

1,601 86 Updated Nov 13, 2025

s1: Simple test-time scaling

Python 6,596 762 Updated Jun 25, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,264 620 Updated Nov 13, 2025
Python 147 14 Updated Dec 27, 2024

Perplexity GPU Kernels

C++ 530 69 Updated Nov 7, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 920 48 Updated Mar 19, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,462 959 Updated Oct 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,878 741 Updated Oct 15, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,719 988 Updated Nov 6, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,862 899 Updated Sep 30, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,566 2,515 Updated Nov 13, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

Official Repo for Open-Reasoner-Zero

Python 2,061 119 Updated Jun 2, 2025