Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
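For a sense of the workflow those notebooks cover, here is a minimal sketch of point-prompted inference with the segment-anything package; the ViT-H checkpoint filename and the synthetic input image are assumptions for illustration, not code from the repository itself:

```python
# Minimal sketch of SAM point-prompted inference (assumes the
# segment-anything package is installed and the ViT-H checkpoint
# sam_vit_h_4b8939.pth has been downloaded).
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)  # computes the image embedding once

# One foreground point prompt (label 1) at pixel (256, 256).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return three candidate masks with scores
)
```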
A collection of full-time roles in SWE, Quant, and PM for new grads.
FlashMLA: Efficient Multi-head Latent Attention Kernels
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A curated list of projects related to the reMarkable tablet
A Datacenter Scale Distributed Inference Serving Framework
FlashInfer: Kernel Library for LLM Serving
We want to create a repo to illustrate the usage of Transformers, in Chinese.
Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
My Python scripts to make high-quality figures for publications in top AI conferences and journals.
Universal cross-platform tokenizer bindings to HuggingFace tokenizers and SentencePiece.
Quick, visual, principled introduction to PyTorch code through five Colab notebooks.
Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower execution latency, and lower execution cost. Also has a simple …
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
A collection of papers on retrieval-based (augmented) language models.
A lightweight design for computation-communication overlap.
Unofficial description of the CUDA assembly (SASS) instruction sets.