qiaolian9

Focusing

Liang Qiao qiaolian9

12 followers · 23 following

University of Science and Technology of China
Hefei
https://qiaolian9.github.io/
https://orcid.org/0000-0002-3366-9881

Achievements

Highlights

Pro

Lists (6)

cuda learning

Diffusion Inference

DL Compilers

13 repositories

DSL

KernelOptimization

LLMs inference

Stars

NVlabs / vibetensor

Our first fully AI generated deep learning system

Python 544 38 Updated Feb 2, 2026

Tencent / hpc-ops

High Performance LLM Inference Operator Library

C++ 729 57 Updated Feb 5, 2026

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 642 35 Updated Feb 14, 2026

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 15,221 3,604 Updated Feb 18, 2026

OpenBitSys / BitDecoding

[HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.

C++ 80 7 Updated Dec 18, 2025

ModelTC / LightX2V

Light Image Video Generation Inference Framework

Python 1,964 161 Updated Feb 11, 2026

AndreSlavescu / mHC.cu

mHC kernels implemented in CUDA

Cuda 252 19 Updated Jan 14, 2026

sgl-project / mini-sglang

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,504 436 Updated Feb 17, 2026

NVIDIA / cuda-tile

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 833 61 Updated Feb 13, 2026

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python 586 53 Updated Feb 12, 2026

NVIDIA / TileGym

Helpful kernel tutorials and examples for tile-based GPU programming

Python 645 46 Updated Feb 17, 2026

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,926 116 Updated Feb 17, 2026

dsl-learn / cutile-learn

NVIDIA cuTile learn

Python 162 1 Updated Dec 9, 2025

black-forest-labs / flux2

Official inference repo for FLUX.2 models

Python 1,792 108 Updated Feb 17, 2026

microsoft / MInference

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filli…

Python 1,188 75 Updated Sep 30, 2025

InfiniTensor / InfiniTensor

C++ 289 69 Updated Feb 4, 2026

mjun0812 / flash-attention-prebuild-wheels

Provide with pre-build flash-attention package wheels on Linux and Windows platforms using GitHub Actions

Python 938 58 Updated Feb 18, 2026

kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

C 363 64 Updated Feb 7, 2026

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 22,479 2,056 Updated Jan 27, 2026

flashinfer-ai / flashinfer-bench

Building the Virtuous Cycle for AI-driven LLM Systems

Python 177 26 Updated Feb 13, 2026

ChandlerGuan / mercury_artifact

Python 22 6 Updated Oct 1, 2025

posquit0 / Awesome-CV

📄 Awesome CV is LaTeX template for your outstanding job application

TeX 26,743 5,174 Updated Feb 10, 2026

pytorch / helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 757 103 Updated Feb 18, 2026

PKU-DAIR / Hetu-DiT

Python 35 Updated Oct 16, 2025

tile-ai / Ladder

Efficient End2End Compiler for Mixed-Precision Deep Learning

Python 10 Updated Feb 8, 2025

flashinfer-ai / cubloaty

a size profiler for cuda binary

Python 72 Updated Jan 15, 2026

tw93 / Mole

🐹 Deep clean and optimize your Mac.

Shell 35,176 960 Updated Feb 16, 2026

qiaolian9 / FlashOmni

Sparse Attention; Sparse Linear; Diffusion Transformer

Cuda 5 Updated Nov 1, 2025

deepseek-ai / DeepSeek-V3.2-Exp

Python 1,480 139 Updated Nov 18, 2025

apache / tvm-ffi

Open ABI and FFI for Machine Learning Systems

C++ 346 60 Updated Feb 17, 2026