a243845305


Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ · 3,849 stars · 298 forks · Updated Nov 5, 2025

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Python · 12,789 stars · 3,690 forks · Updated Nov 5, 2025

Development repository for the Triton language and compiler

MLIR · 17,469 stars · 2,360 forks · Updated Nov 5, 2025

FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton.

C++ · 128 stars · 18 forks · Updated Nov 5, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python · 747 stars · 147 forks · Updated Nov 5, 2025

Hands-On Practical MLIR Tutorial

C++ · 647 stars · 93 forks · Updated Oct 20, 2023

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM · 35,241 stars · 15,083 forks · Updated Nov 5, 2025

A collection of compiler learning resources.

Python · 2,576 stars · 358 forks · Updated Mar 19, 2025

Yinghan's Code Sample

CUDA · 354 stars · 62 forks · Updated Jul 25, 2022
CUDA · 18 stars · 1 fork · Updated Jul 31, 2023

Compilable official NVIDIA OpenCL sample code, https://developer.nvidia.com/opencl

C · 2 stars · Updated Dec 17, 2020

Back up iPhone photos.

Python · 5 stars · 2 forks · Updated Oct 27, 2023

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Python · 1,764 stars · 270 forks · Updated Mar 28, 2024

mperf is an operator performance-tuning toolbox for mobile and embedded platforms.

C++ · 191 stars · 32 forks · Updated Aug 17, 2023
C++ · 253 stars · 40 forks · Updated Sep 15, 2023

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

CUDA · 1,181 stars · 172 forks · Updated Jul 29, 2023

C++ Parallel Computing and Asynchronous Networking Framework

C++ · 14,154 stars · 2,551 forks · Updated Nov 3, 2025

Notes on the Qualcomm OpenCL SDK

C++ · 36 stars · 9 forks · Updated Nov 8, 2018

This is an implementation of sgemm_kernel on L1d cache.

Assembly · 230 stars · 33 forks · Updated Feb 26, 2024

CPU cache latency experiments

C · 1 star · Updated Jan 21, 2022

Correlation demo in OpenCL that uses local memory.

C · 1 star · Updated Feb 24, 2015

OpenCL memory tester for GPUs

C++ · 144 stars · 26 forks · Updated Jan 23, 2021

The official rendering library for PAG (Portable Animated Graphics) files that renders After Effects animations natively across multiple platforms.

C++ · 5,496 stars · 491 forks · Updated Nov 5, 2025

The SHOC Benchmark Suite

Makefile · 257 stars · 105 forks · Updated Oct 6, 2025
45 stars · 17 forks · Updated Dec 18, 2020

A primitive library for neural networks

C++ · 1,367 stars · 222 forks · Updated Nov 24, 2024

arm-neon

C++ · 92 stars · 23 forks · Updated Aug 2, 2024

A simple high performance CUDA GEMM implementation.

CUDA · 414 stars · 42 forks · Updated Jan 4, 2024