AI Frameworks Engineer @intel
SH (UTC +08:00) - https://yiliu30.github.io/
Stars
A connector program that connects an opencode session to Slack
Train transformer language models with reinforcement learning.
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Doing simple retrieval from LLM models at various context lengths to measure accuracy
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
Implementation of FlashAttention in PyTorch
A sparse attention kernel supporting mixed sparse patterns
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Unofficial description of the CUDA assembly (SASS) instruction sets.
Nvidia Instruction Set Specification Generator
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Light Image Video Generation Inference Framework
[ICLR 2026] rCM: SOTA JVP-Based Diffusion Distillation & Few-Step Video Generation & Scaling Up sCM/MeanFlow
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Distributed MoE in a Single Kernel [NeurIPS '25]
A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
A framework for efficient model inference with omni-modality models
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
An MLIR-based toolkit targeting Intel heterogeneous hardware