Stars
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations
Efficient GPU communication over multiple NICs.
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
A Claude Code skill to delegate prompts to Codex
Spec-driven development (SDD) for AI coding assistants.
🔥 A minimal training framework for scaling FLA models
A 5-20x faster experimental Homebrew alternative
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Flexible and Pluggable Serving Engine for Diffusion LLMs
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
An asynchronous streaming data management module for efficient post-training.
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Moves makes it easier than ever to position your windows juuust right
This repository contains the code for the ICLR 2026 paper “DASH: Deterministic Attention Scheduling for High-Throughput Reproducible LLM Training”, developed on top of the FlashAttention codebase.
[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods for kernel generation
Official Implementation of DART (Diffusion-Inspired Speculative Decoding for Fast LLM Inference).
Fast, Sharp & Reliable Agentic Intelligence
Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to …