qiaolian9 · Pro · Focusing

Our first fully AI generated deep learning system

Python 544 38 Updated Feb 2, 2026

High Performance LLM Inference Operator Library

C++ 729 57 Updated Feb 5, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 642 35 Updated Feb 14, 2026

Ongoing research training transformer models at scale

Python 15,221 3,604 Updated Feb 18, 2026

[HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.

C++ 80 7 Updated Dec 18, 2025

Light Image Video Generation Inference Framework

Python 1,964 161 Updated Feb 11, 2026

mHC kernels implemented in CUDA

Cuda 252 19 Updated Jan 14, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,504 436 Updated Feb 17, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 833 61 Updated Feb 13, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 586 53 Updated Feb 12, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python 645 46 Updated Feb 17, 2026
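
For readers unfamiliar with the tile-based style these tutorials cover, below is a minimal sketch of a tile-based elementwise kernel. Triton is used here only as a representative tile-based framework (an assumption; the tutorials may target a different stack such as cuTile), and the kernel shown is a generic illustration rather than code from the repository.

```python
# Minimal tile-based GPU kernel sketch (Triton used as a representative framework).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance owns one tile of BLOCK_SIZE contiguous elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the last, partially filled tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The point of the tile abstraction is that the kernel reasons about one block of data per program instance, while the framework handles thread mapping and vectorization.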

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,926 116 Updated Feb 17, 2026

NVIDIA cuTile learn

Python 162 1 Updated Dec 9, 2025

Official inference repo for FLUX.2 models

Python 1,792 108 Updated Feb 17, 2026

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, approximately and dynamically computes sparse attention, reducing inference latency by up to 10x for pre-filling…

Python 1,188 75 Updated Sep 30, 2025

Provides pre-built flash-attention package wheels for Linux and Windows, built using GitHub Actions

Python 938 58 Updated Feb 18, 2026
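
As a quick post-install sanity check for one of these wheels, here is a minimal sketch assuming the standard flash_attn Python API (flash_attn_func), a CUDA GPU, and fp16 inputs:

```python
# Sanity-check a flash-attention wheel install (assumes the standard flash_attn API).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention; output keeps the (batch, seqlen, nheads, headdim) layout.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)
```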

Open Neural Network Exchange to C compiler.

C 363 64 Updated Feb 7, 2026
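
To illustrate the input side of such an ONNX-to-C compiler, the sketch below exports a tiny PyTorch model to ONNX with the standard torch.onnx.export API; the compiler's own command-line interface is not shown, since its exact flags are project-specific.

```python
# Produce a small ONNX model of the kind an ONNX-to-C compiler consumes.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
dummy = torch.randn(1, 4)

# Writes model.onnx, which can then be translated into dependency-free C source.
torch.onnx.export(model, dummy, "model.onnx", input_names=["x"], output_names=["y"])
```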

Contexts Optical Compression

Python 22,479 2,056 Updated Jan 27, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 177 26 Updated Feb 13, 2026

📄 Awesome CV is a LaTeX template for your outstanding job application

TeX 26,743 5,174 Updated Feb 10, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 757 103 Updated Feb 18, 2026

Python 35 Updated Oct 16, 2025

Efficient End2End Compiler for Mixed-Precision Deep Learning

Python 10 Updated Feb 8, 2025

A size profiler for CUDA binaries

Python 72 Updated Jan 15, 2026

🐹 Deep clean and optimize your Mac.

Shell 35,176 960 Updated Feb 16, 2026

Sparse Attention; Sparse Linear; Diffusion Transformer

Cuda 5 Updated Nov 1, 2025

Open ABI and FFI for Machine Learning Systems

C++ 346 60 Updated Feb 17, 2026