Lists (29)
AI
ai_tool
ai工具类建站app (AI-tool site-builder apps)
data
finance
fine-tuning
free_serve (free services)
front-lowcode
game
gfw
gis
hardware
java-frame
js
KB
lowcode
mcp
NLP
note
python
Robot
RPA
safe
shop
spider
video
vert.x
vertx
wp
Starred repositories
FlashInfer: Kernel Library for LLM Serving
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention that achieves a 2-5x speedup over FlashAttention, without losing end-to-end metrics across language, image, and video models.
Flash Attention in ~100 lines of CUDA (forward pass only)
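The forward pass that repo implements rests on one trick: compute softmax(QK^T)V block by block while carrying a running row max and row sum, so the full n x n attention matrix never materializes. A minimal NumPy sketch of that online-softmax recurrence follows; it is illustrative only, while the CUDA version tiles the same loop across shared memory and threads:

```python
import numpy as np

def flash_attention_forward(Q, K, V, block_size=64):
    """Single-head attention computed block by block with an online softmax,
    the core recurrence of the FlashAttention forward pass (illustration)."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max of logits per query row
    row_sum = np.zeros(n)           # running softmax denominator per row

    for start in range(0, n, block_size):
        Kb = K[start:start + block_size]
        Vb = V[start:start + block_size]
        S = (Q @ Kb.T) * scale                     # logits for this KV block
        new_max = np.maximum(row_max, S.max(axis=1))
        P = np.exp(S - new_max[:, None])           # unnormalized block probs
        corr = np.exp(row_max - new_max)           # rescale earlier partials
        row_sum = row_sum * corr + P.sum(axis=1)
        O = O * corr[:, None] + P @ Vb
        row_max = new_max

    return O / row_sum[:, None]

# Sanity check against the naive quadratic-memory softmax.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_forward(Q, K, V), ref, atol=1e-6)
```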
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
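The general pattern behind training-free sparse attention is a cheap block-scoring pass followed by exact attention over only the surviving blocks. The toy NumPy sketch below shows that pattern; the mean-pooling selection rule is a placeholder assumption, not SpargeAttention's actual algorithm:

```python
import numpy as np

def block_sparse_attention(Q, K, V, block=32, keep=4):
    """Toy block-sparse attention: pool Q and K within blocks, rank KV blocks
    per query block by pooled similarity, then run exact attention over the
    top `keep` blocks only. Assumes n % block == 0. Output is approximate,
    since skipped blocks contribute nothing to the softmax."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    nb = n // block
    Qp = Q.reshape(nb, block, d).mean(axis=1)          # pooled query per block
    Kp = K.reshape(nb, block, d).mean(axis=1)          # pooled key per block
    top = np.argsort(-(Qp @ Kp.T), axis=1)[:, :keep]   # top KV blocks per Q block

    O = np.zeros_like(Q)
    for i in range(nb):
        cols = np.sort(np.concatenate(
            [np.arange(j * block, (j + 1) * block) for j in top[i]]))
        S = (Q[i * block:(i + 1) * block] @ K[cols].T) * scale
        P = np.exp(S - S.max(axis=1, keepdims=True))
        O[i * block:(i + 1) * block] = (P / P.sum(axis=1, keepdims=True)) @ V[cols]
    return O

rng = np.random.default_rng(2)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
out = block_sparse_attention(Q, K, V)  # attends to 4 of 8 KV blocks per row
```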
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling, and quantization.
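For the speculative-sampling technique named above, the standard accept/reject rule (Leviathan et al. 2023; Chen et al. 2023) is short enough to sketch. The distributions below are plain arrays standing in for real draft/target model outputs; nothing here is CPM.cu's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_draft, p_target, token):
    """A cheap draft model proposes `token`; the target model keeps it with
    probability min(1, p_target/p_draft), otherwise resamples from the
    normalized residual. This preserves the target distribution exactly."""
    if rng.random() < min(1.0, p_target[token] / p_draft[token]):
        return token
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

V = 8  # toy vocabulary
p_draft = rng.dirichlet(np.ones(V))
p_target = rng.dirichlet(np.ones(V))
draft_token = rng.choice(V, p=p_draft)
print(speculative_step(p_draft, p_target, draft_token))
```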
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
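The quantized-attention entries above share one core move: quantize Q and K to INT8, do the QK^T matmul in integer arithmetic, and dequantize before the softmax. A simplified NumPy sketch of that idea; published kernels also use per-block scales and K smoothing, none of which is reproduced here:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x is approximately scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_attention(Q, K, V):
    """Attention with Q and K quantized to INT8 before the QK^T matmul,
    accumulated in int32, then dequantized for the softmax (illustration)."""
    d = Q.shape[-1]
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    S = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk / np.sqrt(d))
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((128, 64)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(64)
P = np.exp(S - S.max(axis=-1, keepdims=True))
ref = (P / P.sum(axis=-1, keepdims=True)) @ V
err = np.abs(int8_attention(Q, K, V) - ref).max()
print(f"max abs error vs. FP64 attention: {err:.4f}")  # small, not zero
```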