Stars
TokenSpeed is a speed-of-light LLM inference engine.
Vera: a programming language designed for LLMs to write
Offline optimization of your disaggregated Dynamo graph
A PyTorch-native inference engine with cache, parallelism, quantization and CPU offload for DiTs.
Visualize CPython's specializing, adaptive interpreter. 🔥
Our first fully AI-generated deep learning system
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.
Autonomous GPU Kernel Generation & Optimization via Deep Agents
magic-trace collects and displays high-resolution traces of what a process is doing
🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrate with arxiv and paper with code to provide better code/research plans 🧰 OpenAI, Anthropic, Gemini, Ollam…
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Wave: Python Domain-Specific Language for High Performance Machine Learning
jax-triton contains integrations between JAX and OpenAI Triton
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Development repository for the Triton-Linalg conversion
Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
Maple Mono: Open source monospace font with round corners, ligatures and Nerd-Font icons for IDE and terminal, an exact 2:1 width ratio between Chinese and English glyphs, and fine-grained customization options.
QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.
A high-throughput and memory-efficient inference and serving engine for LLMs
IREE's PyTorch Frontend, based on Torch Dynamo.