- University of Science and Technology of China
- Hefei
- UTC +08:00
- https://qiaolian9.github.io/
- https://orcid.org/0000-0002-3366-9881
Stars
Our first fully AI-generated deep learning system
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Ongoing research training transformer models at scale
[HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache.
Light Image Video Generation Inference Framework
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
Accelerating MoE with IO and Tile-aware Optimizations
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Official inference repo for FLUX.2 models
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…
Provides pre-built flash-attention package wheels for Linux and Windows, built with GitHub Actions
Building the Virtuous Cycle for AI-driven LLM Systems
📄 Awesome CV is a LaTeX template for your outstanding job application
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Efficient End2End Compiler for Mixed-Precision Deep Learning
Sparse Attention; Sparse Linear; Diffusion Transformer