Organizations

@ULAFF @facebookresearch @pytorch

NanoGPT (124M) in 3 minutes

Python 3,769 489 Updated Nov 6, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 49,051 8,214 Updated Dec 9, 2024

Post-training with Tinker

Python 1,445 113 Updated Nov 5, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 81,687 9,123 Updated Nov 6, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,936 148 Updated Nov 5, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,332 826 Updated Nov 6, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,206 136 Updated Nov 5, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,386 229 Updated Nov 2, 2025

Nano vLLM

Python 8,380 1,022 Updated Nov 3, 2025

Minimalistic large language model 3D-parallelism training

Python 2,298 253 Updated Sep 3, 2025

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 42,762 4,612 Updated Nov 4, 2025

kernels, of the mega variety

Python 597 26 Updated Sep 28, 2025

My learning notes/codes for ML SYS.

Python 4,076 248 Updated Nov 6, 2025

Scalable toolkit for efficient model reinforcement

Python 1,008 166 Updated Nov 6, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,255 524 Updated Sep 23, 2025

Curated collection of papers in MoE model inference

297 11 Updated Oct 20, 2025

Large Language Model (LLM) Systems Paper List

1,582 86 Updated Nov 4, 2025

s1: Simple test-time scaling

Python 6,593 764 Updated Jun 25, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,228 617 Updated Nov 6, 2025
Python 147 14 Updated Dec 27, 2024

Perplexity GPU Kernels

C++ 523 69 Updated Oct 27, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 919 47 Updated Mar 19, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,442 957 Updated Oct 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,863 738 Updated Oct 15, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,696 973 Updated Nov 6, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,849 896 Updated Sep 30, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,179 2,438 Updated Nov 6, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,929 286 Updated May 15, 2025

Official Repo for Open-Reasoner-Zero

Python 2,059 119 Updated Jun 2, 2025