Starred repositories
I self-petitioned my EB1A and got approved. This repository contains my original petition, RFE response, and links to the resources I used.
A simple pip-installable Python tool to generate your HTML citation world map from your Google Scholar ID.
Example Claude skill for explaining technical AI concepts.
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
A Datacenter Scale Distributed Inference Serving Framework
FlashInfer: Kernel Library for LLM Serving
slime is an LLM post-training framework for RL Scaling.
Scalable toolkit for efficient model reinforcement
My learning notes for ML SYS.
Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
A collection of awesome researchers and papers about disaggregated memory.
DeepEP: an efficient expert-parallel communication library
A quick guide (especially) for trending instruction finetuning datasets
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
GLake: optimizing GPU memory management and IO transmission.
Awesome-LLM: a curated list of Large Language Model resources
Ongoing research training transformer models at scale
Open-source benchmark suite for cloud microservices