Starred repositories
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
My continuously updated Machine Learning, Probabilistic Models and Deep Learning notes and demos (2000+ slides), with video links
A kernel library written in tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Open source repository of plugins primarily intended for knowledge workers to use in Claude Cowork
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Mawaqit integration - salat time and nearest mosque - in Home Assistant
Accelerating MoE with IO and Tile-aware Optimizations
LM engine is a library for pretraining/finetuning LLMs
DeepEP: an efficient expert-parallel communication library
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Frozen-in-time version of our Paper Finder agent for reproducing evaluation results
Easily embed, cluster and semantically label text datasets
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Distributed Compiler based on Triton for Parallel Systems
A debugging and profiling tool that can trace and visualize python code execution
A flexible and efficient training framework for large-scale alignment tasks
torchcomms: a modern PyTorch communications API
CSCS User Lab Day – Meet the Swiss National Supercomputing Centre
Post-training with Tinker
iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool
A tool for bandwidth measurements on NVIDIA GPUs.
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
Analyze computation-communication overlap in V3/R1.
Pipeline Parallelism Emulation and Visualization
Bridge Megatron-Core to Hugging Face/Reinforcement Learning