sazczmh

sazc sazczmh

84 followers · 24 following

Stars

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python 612 48 Updated Oct 9, 2025

attention-survey / Efficient_Attention_Survey

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

187 4 Updated Aug 26, 2025

CoffeeBeforeArch / nvbit_tools

C 13 6 Updated Sep 11, 2020

NVIDIA / accelerated-computing-hub

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 737 123 Updated Oct 6, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,555 1,474 Updated Sep 25, 2025

abhibambhaniya / GenZ-LLM-Analyzer

LLM Inference analyzer for different hardware platforms

Jupyter Notebook 94 19 Updated Jul 8, 2025

ademeure / QuickRunCUDA

C++ 12 4 Updated Oct 6, 2025

0xD0GF00D / DocumentSASS

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 145 14 Updated Jul 18, 2025

ihavnoid / tg4perfetto

Simple Python library for generating your own Perfetto traces for your application. Can be used for both app instrumentation and custom trace generation (for your own purposes).

Python 18 6 Updated Jun 22, 2025

sazczmh / DeepGEMM

Forked from deepseek-ai/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 1 Updated Mar 25, 2025

chenhaoc / ESL_Learning

Some knowledge about SystemC/TLM, etc.

25 5 Updated Jun 8, 2023

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,784 710 Updated Oct 9, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,798 907 Updated Sep 30, 2025

ColfaxResearch / cfx-article-src

C++ 148 29 Updated May 7, 2025

compiler-explorer / compiler-explorer

Run compilers interactively from your web browser and interact with the assembly

TypeScript 18,097 1,938 Updated Oct 8, 2025

HuyNguyen-hust / hopper-gemm-101

Cuda 10 1 Updated Dec 22, 2024

KnowingNothing / MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

C++ 379 49 Updated Sep 21, 2024

XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 6,178 190 Updated Oct 7, 2025

WojciechRynczuk / vcdMaker

A tool for converting text log files to the VCD format.

C++ 30 2 Updated Apr 26, 2021

3b1b / manim

Animation engine for explanatory math videos

Python 81,080 6,890 Updated Jun 14, 2025

shioyadan / Konata

Konata is an instruction pipeline visualizer for Onikiri2-Kanata/Gem5-O3PipeView formats. You can download the pre-built binaries from https://github.com/shioyadan/Konata/releases

JavaScript 482 43 Updated Apr 8, 2024