sazczmh

A Quirky Assortment of CuTe Kernels

Python 612 48 Updated Oct 9, 2025

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

187 4 Updated Aug 26, 2025

An NVIDIA-curated collection of educational resources related to general-purpose GPU programming.

Jupyter Notebook 737 123 Updated Oct 6, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,555 1,474 Updated Sep 25, 2025

LLM Inference analyzer for different hardware platforms

Jupyter Notebook 94 19 Updated Jul 8, 2025
C++ 12 4 Updated Oct 6, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 145 14 Updated Jul 18, 2025

Simple Python library for generating your own Perfetto traces for your application. Can be used both for app instrumentation and for custom trace generation.

Python 18 6 Updated Jun 22, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 1 Updated Mar 25, 2025

Some knowledge about SystemC/TLM, etc.

25 5 Updated Jun 8, 2023

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,784 710 Updated Oct 9, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,798 907 Updated Sep 30, 2025

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,097 1,938 Updated Oct 8, 2025

An easy-to-understand TensorOp matmul tutorial

C++ 379 49 Updated Sep 21, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 6,178 190 Updated Oct 7, 2025

A tool for converting text log files to the VCD format.

C++ 30 2 Updated Apr 26, 2021

Animation engine for explanatory math videos

Python 81,080 6,890 Updated Jun 14, 2025

Konata is an instruction pipeline visualizer for Onikiri2-Kanata/Gem5-O3PipeView formats. You can download the pre-built binaries from https://github.com/shioyadan/Konata/releases

JavaScript 482 43 Updated Apr 8, 2024

A curated collection of rules that I found to be reasonably good

JavaScript 945 207 Updated Jan 4, 2025

This is a Chinese translation of the CUDA programming guide

1,695 247 Updated Nov 13, 2024

🌊 Digital timing diagram rendering engine

JavaScript 3,256 389 Updated Jul 10, 2025

Verilog/SystemVerilog Syntax and Omni-completion

Vim Script 405 94 Updated Oct 13, 2024

Spectacle allows you to organize your windows without using a mouse.

Objective-C 13,656 840 Updated Jan 15, 2022

Helping everyone get started with FPGAs by sharing excellent FPGA-related articles and projects

5,012 761 Updated May 15, 2022

Report historical and statistical real time of the system, keeping it between restarts. Like uptime command but with more interesting output.

Python 298 40 Updated Aug 19, 2025

🖥️ macOS status monitoring app written in SwiftUI.

Swift 9,632 358 Updated May 25, 2024

Raspberry Pi PCI Express device compatibility database

HTML 1,823 168 Updated Oct 9, 2025