Stars
A kernel library written in tilelang
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
You are a once-promising P8-level engineer; when Anthropic first set your grade, expectations for you were high. A high-agency skill for agent use. Your AI has been placed on a PIP. 30 days to show improvement.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
An agent for CUDA compute-communication kernel co-design
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
Distributed attention toward linear scalability for ultra-long-context, heterogeneous-data training
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training for deep learning & machine learning, plus cross-platform deployment)
Triton-based implementation of Sparse Mixture of Experts.
LM engine is a library for pretraining/finetuning LLMs
Accelerating MoE with IO and Tile-aware Optimizations
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Measure and optimize the energy consumption of your AI applications!
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
slime is an LLM post-training framework for RL Scaling.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
We aim to redefine the portability, performance, programmability, and maintainability of data-parallel libraries by using C++ standard features instead of creating new compilers.
Supercharge Your LLM with the Fastest KV Cache Layer