Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
CUDA Templates and Python DSLs for High-Performance Linear Algebra
FlashInfer: Kernel Library for LLM Serving
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solution team.
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
A list of awesome compiler projects and papers for tensor computation and deep learning.
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Causal depthwise conv1d in CUDA, with a PyTorch interface
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Awesome-LLM: a curated list of Large Language Models
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Resources repo for Ajay CXX twitch stream: https://twitch.tv/ajaycxx
MIT unofficial thesis template from overleaf, updated for 2023
Source code for Twitter's Recommendation Algorithm
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
A timeline of the latest AI models for audio generation, starting in 2023!
🦜🔗 The platform for reliable agents.
A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
Set of React components for PDF annotation