Stars
An instrumentation tool to monitor queue depths in tokio channels
DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Cost-efficient and pluggable Infrastructure components for GenAI inference
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Tempo is a system for declarative, efficient, end-to-end compiled dynamic deep learning
Large Language Model (LLM) Systems Paper List
Lightweight coding agent that runs in your terminal
Analyze computation-communication overlap in V3/R1.
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
Watches files and records, or triggers actions, when they change.
Dynamic resource changes for multi-dimensional parallelism training
Fully open reproduction of DeepSeek-R1
Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Recipes to scale inference-time compute of open models
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitchfix etc. Blog: mlengineer.io.
A low-latency & high-throughput serving engine for LLMs
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Minimal, single page, smooth-scrolling theme for Hugo static site generator.
A bibliography and survey of the papers surrounding o1
Official inference library for Mistral models
📺 Discover the latest machine learning / AI courses on YouTube.