A Ph.D. student in the Database Group @ THU
- Tsinghua University
- Beijing, China
Stars
2 stars written in Cuda
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
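A minimal sketch of the general idea behind quantized attention, in plain PyTorch: round Q and K to INT8 with per-token scales, compute the score matmul on integer values, then dequantize before the softmax so accuracy-sensitive steps stay in floating point. This illustrates the technique only; it is not the repository's actual CUDA kernels, which run the integer matmul on INT8 tensor cores. The names `quantize_int8` and `int8_attention` are hypothetical.

```python
import torch

def quantize_int8(x):
    # Symmetric per-token quantization: the largest |value| per row maps to 127.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def int8_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim) floating-point tensors.
    qi, qs = quantize_int8(q)
    ki, ks = quantize_int8(k)
    # Integer score matmul, emulated in int32 on CPU; a real kernel would
    # use INT8 tensor cores on the GPU.
    scores = qi.to(torch.int32) @ ki.to(torch.int32).transpose(-1, -2)
    # Dequantize with the saved scales, then softmax in floating point.
    scores = scores.float() * qs * ks.transpose(-1, -2)
    scores = scores / q.shape[-1] ** 0.5
    probs = torch.softmax(scores, dim=-1)
    return probs @ v

q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = int8_attention(q, k, v)  # close to torch.softmax(q @ k.mT / 8, -1) @ v
```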
[ICML2025] SpargeAttention: a training-free sparse attention that accelerates inference for any model.
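In the same spirit, a toy block-sparse attention sketch: pool queries and keys into blocks, keep only the highest-scoring key blocks for each query block, and mask out the rest. The selection criterion here (pooled dot products with a fixed top-k) is a hedged stand-in, not SpargeAttention's actual prediction method, and for brevity the mask is applied to a dense score matrix; a real kernel skips the masked blocks entirely, which is where the speedup comes from.

```python
import torch

def block_sparse_attention(q, k, v, block=32, keep=2):
    # Toy block-sparse attention: score pooled (query block, key block)
    # pairs, keep the top-`keep` key blocks per query block, mask the rest.
    b, h, n, d = q.shape
    nb = n // block                              # assumes n % block == 0
    qb = q.view(b, h, nb, block, d).mean(dim=3)  # pooled query blocks
    kb = k.view(b, h, nb, block, d).mean(dim=3)  # pooled key blocks
    block_scores = qb @ kb.transpose(-1, -2)     # (b, h, nb, nb)
    top = block_scores.topk(keep, dim=-1).indices
    block_mask = torch.zeros_like(block_scores, dtype=torch.bool)
    block_mask.scatter_(-1, top, True)
    # Expand the block-level mask to token resolution: (b, h, n, n).
    mask = block_mask.repeat_interleave(block, dim=-2)
    mask = mask.repeat_interleave(block, dim=-1)
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = block_sparse_attention(q, k, v)
```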