Stars
Fast and memory-efficient exact attention
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Official QEMU mirror. Please see https://www.qemu.org/contribute/ for how to submit changes to QEMU. Pull Requests are ignored. Please only use release tarballs from the QEMU website.
Lists of company-wise questions available on LeetCode Premium. Every csv file in the companies directory corresponds to a list of questions on leetcode for a specific company based on the leetcode …
Official inference library for Mistral models
Optimizer and compiler/toolchain library for WebAssembly
Efficient Triton Kernels for LLM Training
Efficient vision foundation models for high-resolution generation and perception.
A list of awesome compiler projects and papers for tensor computation and deep learning.
💥💻💥 A data-parallel functional programming language
This is originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Implementation of Peter Shirley's Ray Tracing In One Weekend book using Vulkan and NVIDIA's RTX extension.
Working draft of the proposed RISC-V V vector extension
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
A listing of compiler, language and runtime teams for people looking for jobs in this area
A list of tutorials, papers, talks, and open-source projects for emerging compiler and architecture
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
A sparse attention kernel supporting mixed sparse patterns
Repository to host and maintain SCALE-Sim code
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
(Cir)cuit (C)ompiler. Compiling high-level languages to circuits for SMT, zero-knowledge proofs, and more.
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
Sys: A Static/Symbolic Tool for Finding Good Bugs in Good (Browser) Code