AI Frameworks Engineer @intel
SH (UTC +08:00)
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Distributed MoE in a Single Kernel [NeurIPS '25]
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
A framework for efficient model inference with omni-modality models
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
MLIR-based toolkit targeting Intel heterogeneous hardware
GEMM performance kernels for Intel GPUs, NVIDIA GPUs, and Intel CPUs, written using the SYCL joint matrix extension
An NVIDIA-curated collection of educational resources related to general-purpose GPU programming.
A debugging and profiling tool that can trace and visualize Python code execution
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Profiling Tools Interfaces for GPU (PTI for GPU): a tools library and getting-started documentation for easily running performance analysis on Intel(R) Processor Graphics
SYCL implementation of Fused MLPs for Intel GPUs
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Helpful tools and examples for working with flex-attention
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
🥢 Cooking the Lao Xiang Ji (老乡鸡) 🐔 way. The main portion was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report" and has been organized, edited, and compiled. CookLikeHOC.