AI Frameworks Engineer @intel
SH (UTC +08:00)
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Distributed MoE in a Single Kernel [NeurIPS '25]
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
A framework for efficient model inference with omni-modality models
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
MLIR-based toolkit targeting Intel heterogeneous hardware
GEMM performance kernels for Intel GPUs, NVIDIA GPUs, and Intel CPUs, written using the SYCL joint matrix extension
An NVIDIA-curated collection of educational resources related to general-purpose GPU programming.
A debugging and profiling tool that can trace and visualize Python code execution
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Profiling Tools Interfaces for GPU (PTI for GPU): a tools library and getting-started documentation for easily running performance analysis on Intel(R) Processor Graphics
SYCL implementation of Fused MLPs for Intel GPUs
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Helpful tools and examples for working with flex-attention
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
🥢 Cooking the Lao Xiang Ji (老乡鸡) 🐔 way. The main portion was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the "Lao Xiang Ji Dish Traceability Report" and has been organized, edited, and compiled. CookLikeHOC.