AI Frameworks Engineer @intel
SH (UTC +08:00) - https://yiliu30.github.io/
Stars
A connector program that connects an opencode session to Slack
Train transformer language models with reinforcement learning.
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Doing simple retrieval from LLM models at various context lengths to measure accuracy
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
Implementation of FlashAttention in PyTorch
A sparse attention kernel supporting mixed sparse patterns
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
Unofficial description of the CUDA assembly (SASS) instruction sets.
Nvidia Instruction Set Specification Generator
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Light Image Video Generation Inference Framework
[ICLR 2026] rCM: SOTA JVP-Based Diffusion Distillation & Few-Step Video Generation & Scaling Up sCM/MeanFlow
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Distributed MoE in a Single Kernel [NeurIPS '25]
A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
A framework for efficient model inference with omni-modality models
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
An MLIR-based toolkit targeting Intel heterogeneous hardware