
AI agents running research on single-GPU nanochat training automatically

Python · 70,253 stars · 10,245 forks · Updated Mar 26, 2026

Python · 2 stars · Updated Feb 5, 2026

Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools

Python · 189 stars · 12 forks · Updated Mar 12, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ · 924 stars · 76 forks · Updated Apr 1, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python · 699 stars · 60 forks · Updated Apr 10, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 2,014 stars · 130 forks · Updated Apr 11, 2026

Helpful tools and examples for working with flex-attention

Python · 1,174 stars · 76 forks · Updated Apr 1, 2026

Parrot is a C++ library for fused array operations using CUDA/Thrust. It provides efficient GPU-accelerated operations with lazy evaluation semantics, allowing for chaining of operations without un…

Cuda · 260 stars · 16 forks · Updated Apr 9, 2026

Customized matrix multiplication kernels

Jupyter Notebook · 57 stars · 6 forks · Updated Mar 5, 2022

Simple, portable, and self-contained stacktrace library for C++11 and newer

C++ · 1,408 stars · 166 forks · Updated Mar 13, 2026

Header-only C++/Python library for fast approximate nearest neighbors

C++ · 5,164 stars · 810 forks · Updated Mar 28, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 10,236 stars · 1,038 forks · Updated Apr 8, 2026

FlashInfer: Kernel Library for LLM Serving

Python · 5,366 stars · 886 forks · Updated Apr 11, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 25,646 stars · 5,282 forks · Updated Apr 11, 2026

Convert PDF to markdown + JSON quickly with high accuracy

Python · 33,659 stars · 2,327 forks · Updated Apr 10, 2026

A community-maintained Python framework for creating mathematical animations.

Python · 37,714 stars · 2,774 forks · Updated Apr 7, 2026

Generate audiobooks from e-books, voice cloning & 1158+ languages!

Python · 18,658 stars · 1,531 forks · Updated Apr 10, 2026

A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.

214,441 stars · 12,816 forks · Updated Nov 19, 2024

NVIDIA Math Libraries for the Python Ecosystem

Cython · 557 stars · 33 forks · Updated Mar 11, 2026

llama3 implementation one matrix multiplication at a time

Jupyter Notebook · 15,242 stars · 1,285 forks · Updated May 23, 2024

LLM training in simple, raw C/CUDA

Cuda · 29,511 stars · 3,512 forks · Updated Jun 26, 2025

The official Meta Llama 3 GitHub site

Python · 29,292 stars · 3,530 forks · Updated Jan 26, 2025

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python · 1,451 stars · 113 forks · Updated Apr 6, 2026

🔥Highlighting the top ML papers every week.

12,275 stars · 769 forks · Updated Jul 20, 2025

Zero Bubble Pipeline Parallelism

Python · 452 stars · 34 forks · Updated May 7, 2025

GPU programming related news and material links

2,091 stars · 125 forks · Updated Mar 8, 2026

A dbg(…) macro for C++

C++ · 3,233 stars · 274 forks · Updated Feb 14, 2026

A minimal programming example for a chat server

C · 7,526 stars · 844 forks · Updated Jan 27, 2024