Stars
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
FlashInfer: Kernel Library for LLM Serving
A high-throughput and memory-efficient inference and serving engine for LLMs
The Unified Intent Interface: The easiest way to build intent-powered UIs
A collection of full time roles in SWE, Quant, and PM for new grads.
Building blocks for foundation models.
A modular, extensible framework for LLM inference benchmarking that supports multiple benchmarking harnesses and paradigms.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Ongoing research training transformer models at scale
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Collection of Summer 2026 tech internships!
Development repository for the Triton language and compiler
A tool for examining GPU scheduling behavior.
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
A latent text-to-image diffusion model
CVNets: A library for training computer vision networks
An open-source efficient deep learning framework/compiler, written in python.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models spanning text, vision, audio, and multimodal tasks, for both inference and training.
A library that provides an embeddable, persistent key-value store for fast storage.