Lists (7)
Awesome-X: Awesome Lists
Courses: Courses or Tutorials
Cpp Header Libs: Header-Only Libraries
GPGPU Programming
Resources: Collection of open source resources
Tiny/Nano-X: Tiny Projects for educational purpose
Tools: Simple but quite good tools

Starred repositories
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
A community-maintained Python framework for creating mathematical animations.
Unofficial description of the CUDA assembly (SASS) instruction sets.
Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
Third party assembler and GEMM library for NVIDIA Kepler GPU
Nvidia Instruction Set Specification Generator
Helpful kernel tutorials and examples for tile-based GPU programming
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Collective communications library with various primitives for multi-machine training.
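🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform trending-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and AI conversational analysis (dig into the news in natural language: trend tracking, sentiment analysis, similarity search, and more, 13 tools in total). Supports push via WeCom, personal WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, and Slack; phone notifications in 1 minute, no need for…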
cuVS - a library for vector search and clustering on the GPU
Efficient Triton Kernels for LLM Training
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
Optimized primitives for collective multi-GPU communication
Fast and memory-efficient exact attention
DeepEP: an efficient expert-parallel communication library
Collection of benchmarks to measure basic GPU capabilities
A markup-based typesetting system that is powerful and easy to learn.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
FlashInfer: Kernel Library for LLM Serving