This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,317 181 Updated Jul 29, 2023

karpathy / autoresearch

AI agents running research on single-GPU nanochat training automatically

Python 88,217 12,766 Updated Mar 26, 2026

nanocoai / nanoclaw

A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…

TypeScript 29,953 12,897 Updated Jun 22, 2026

basecamp / omarchy

Beautiful, Modern & Opinionated Linux

Shell 23,716 2,389 Updated Jun 23, 2026

moonshine-ai / moonshine

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

C++ 8,526 462 Updated Jun 17, 2026

alibaba / AliSQL

AliSQL is a MySQL branch that originated at Alibaba Group. See the Release Notes at the bottom for documentation.

C++ 5,826 893 Updated May 12, 2026

alibaba / zvec

A lightweight, lightning-fast, in-process vector database

C++ 12,237 726 Updated Jun 23, 2026

progschj / ThreadPool

A simple C++11 Thread Pool implementation

C++ 8,757 2,341 Updated Jul 20, 2024

xai-org / x-algorithm

Algorithm powering the For You feed on X

Rust 26,257 4,504 Updated May 15, 2026

microsoft / DiskANN

A vector indexing library to bring fast, fresh and filtered search to your database

Rust 1,857 427 Updated Jun 23, 2026

1rgs / nanocode

Minimal Claude Code alternative. Single Python file, zero dependencies, ~250 lines.

Python 2,437 237 Updated Jan 14, 2026

Roxanne0321 / vsag

Forked from antgroup/vsag

vsag is a vector indexing library used for similarity search.

C++ 6 Updated Jun 22, 2026

FisherKK FisherKKK

Stars

rwitten / HighPerfLLMs2024

NVIDIA / FasterTransformer

google-ai-edge / model-explorer

NVIDIA-developer-blog / code-samples

pytorch / pytorch

NVIDIA / CUDALibrarySamples

Liu-xiandong / How_to_optimize_in_GPU

karpathy / autoresearch

nanocoai / nanoclaw

basecamp / omarchy

moonshine-ai / moonshine

alibaba / AliSQL

alibaba / zvec

progschj / ThreadPool

xai-org / x-algorithm

microsoft / DiskANN

1rgs / nanocode

Roxanne0321 / vsag

ComposioHQ / awesome-claude-skills

3b1b / manim

sgl-project / mini-sglang

huggingface / lerobot

yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

yuninxia / hands-on-simd-programming

uestc-huangyw / Awesome-Generative-Recommendation

aibrix / PrisKV

sonnyli / flash_attention_from_scratch

karpathy / nanochat

Jokeren / Awesome-GPU

Fridge003 / Cuda-Learn-By-Practice