Ammar-Alnagar

Deciphering the GPU manuscript.....

Ammar Ammar-Alnagar

34 followers · 18 following

Highlights

Developer Program Member

Stars

AdepojuJeremy / CUDA-120-DAYS--CHALLENGE

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell · 865 stars · 100 forks · Updated Mar 29, 2025

NVIDIA / accelerated-computing-hub

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook · 1,166 stars · 209 forks · Updated Feb 9, 2026

gpu-mode / popcorn-cli

Rust · 95 stars · 17 forks · Updated Feb 9, 2026

srush / GPU-Puzzles

Solve puzzles. Learn CUDA.

Jupyter Notebook · 11,940 stars · 927 forks · Updated Sep 1, 2024

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,247 stars · 1,670 forks · Updated Feb 4, 2026

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook · 5,709 stars · 571 forks · Updated Feb 1, 2026

godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine

C++ · 106,504 stars · 24,294 forks · Updated Feb 9, 2026

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 9,636 stars · 952 forks · Updated Feb 5, 2026

NVIDIA / cuda-samples

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

C · 8,826 stars · 2,265 forks · Updated Jan 6, 2026

openvex / spec

OpenVEX Specification

167 stars · 20 forks · Updated Jan 16, 2026

MekkCyber / CutlassAcademy

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

251 stars · 12 forks · Updated May 6, 2025

Frikallo / axiom

High-performance C++ tensor library with NumPy/PyTorch-like API, SIMD vectorization, BLAS acceleration, and Metal GPU support.

C++ · 38 stars · 1 fork · Updated Feb 10, 2026

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 12,849 stars · 2,095 forks · Updated Feb 10, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,169 stars · 817 forks · Updated Feb 3, 2026

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 1,916 stars · 112 forks · Updated Feb 3, 2026

NVIDIA / cuda-tile

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR · 827 stars · 60 forks · Updated Jan 14, 2026

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ · 12,680 stars · 2,317 forks · Updated Feb 9, 2026