yiliu30
🌍
Working on site
  • AI Frameworks Engineer @intel
  • SH
  • 21:37 (UTC +08:00)


CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 322 23 Updated Dec 20, 2025

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.

Python 193 19 Updated Dec 23, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 157 18 Updated Dec 23, 2025

A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…

TypeScript 14,634 1,502 Updated Dec 23, 2025

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 127 5 Updated Nov 26, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 475 26 Updated Dec 23, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,658 85 Updated Dec 20, 2025

A framework for efficient model inference with omni-modality models

Python 1,347 182 Updated Dec 23, 2025

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 178 15 Updated Dec 23, 2025

👷 Build compute kernels

Nix 195 33 Updated Dec 23, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,246 1,190 Updated Dec 23, 2025

MLIR-based toolkit targeting Intel heterogeneous hardware

C++ 49 16 Updated Feb 25, 2025
Shell 111 15 Updated Dec 22, 2025

GEMM performance kernels for Intel GPUs, NVIDIA GPUs, and Intel CPUs, written using the SYCL joint matrix extension

C++ 6 4 Updated Apr 3, 2025

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,013 180 Updated Dec 12, 2025

A debugging and profiling tool that can trace and visualize Python code execution

Python 7,460 467 Updated Dec 21, 2025

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 277 22 Updated Jul 16, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,025 886 Updated Dec 4, 2025

Profiling Tools Interfaces for GPU (PTI for GPU) is a set of getting-started documentation and a tools library that makes it easy to begin performance analysis on Intel(R) Processor Graphics

C++ 255 65 Updated Dec 17, 2025

Samples for Intel® oneAPI Toolkits

C++ 1,118 741 Updated Nov 21, 2025

SYCL implementation of Fused MLPs for Intel GPUs

C++ 49 11 Updated Nov 24, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 780 51 Updated Oct 15, 2025

Helpful tools and examples for working with flex-attention

Python 1,093 66 Updated Dec 22, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,290 354 Updated Dec 23, 2025
Python 39 3 Updated Dec 14, 2025

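🥢 Cook like Laoxiangji 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.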

JavaScript 22,592 2,285 Updated Oct 17, 2025

🔀 yet another mixture of experts

Python 22 2 Updated Sep 19, 2025

Go ahead and axolotl questions

Python 10,986 1,223 Updated Dec 23, 2025