a243845305


A lightweight, single-header C++11 Jinja2 template engine for LLM chat templates.

C++ · 14 stars · 3 forks · Updated Dec 23, 2025

A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels.

C++ · 4,290 stars · 354 forks · Updated Dec 23, 2025

Open Machine Learning Compiler Framework

Python · 12,958 stars · 3,743 forks · Updated Dec 23, 2025

Development repository for the Triton language and compiler

MLIR · 17,913 stars · 2,465 forks · Updated Dec 23, 2025

FlagTree is a unified compiler, forked from triton-lang/triton, that supports multiple AI chip backends for custom deep learning operations.

C++ · 148 stars · 31 forks · Updated Dec 23, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python · 808 stars · 181 forks · Updated Dec 23, 2025

Hands-On Practical MLIR Tutorial

C++ · 690 stars · 103 forks · Updated Oct 20, 2023

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM · 36,057 stars · 15,565 forks · Updated Dec 23, 2025

A collection of compiler learning resources.

Python · 2,622 stars · 362 forks · Updated Mar 19, 2025

Yinghan's Code Sample

Cuda · 361 stars · 61 forks · Updated Jul 25, 2022

Cuda · 18 stars · 1 fork · Updated Jul 31, 2023

Compilable official NVIDIA OpenCL sample code: https://developer.nvidia.com/opencl

C · 2 stars · Updated Dec 17, 2020

Back up iPhone photos

Python · 5 stars · 2 forks · Updated Oct 27, 2023

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Python · 1,776 stars · 274 forks · Updated Mar 28, 2024

mperf is an operator performance-tuning toolbox for mobile/embedded platforms.

C++ · 192 stars · 32 forks · Updated Aug 17, 2023

C++ · 256 stars · 40 forks · Updated Sep 15, 2023

This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, s…

Cuda · 1,208 stars · 178 forks · Updated Jul 29, 2023

C++ Parallel Computing and Asynchronous Networking Framework

C++ · 14,255 stars · 2,563 forks · Updated Dec 1, 2025

Notes on the Qualcomm OpenCL SDK

C++ · 37 stars · 9 forks · Updated Nov 8, 2018

An implementation of sgemm_kernel on the L1d cache.

Assembly · 233 stars · 33 forks · Updated Feb 26, 2024

CPU cache latency experiments

C · 1 star · Updated Jan 21, 2022

Correlation demo in OpenCL that uses local memory.

C · 1 star · Updated Feb 24, 2015

OpenCL memory tester for GPUs

C++ · 145 stars · 27 forks · Updated Jan 23, 2021

The official rendering library for PAG (Portable Animated Graphics) files that renders After Effects animations natively across multiple platforms.

C++ · 5,551 stars · 504 forks · Updated Dec 23, 2025

The SHOC Benchmark Suite

Makefile · 259 stars · 105 forks · Updated Oct 6, 2025

45 stars · 17 forks · Updated Dec 18, 2020

A primitive library for neural networks

C++ · 1,369 stars · 223 forks · Updated Nov 24, 2024

arm-neon

C++ · 92 stars · 23 forks · Updated Aug 2, 2024