Stars
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations
Efficient GPU communication over multiple NICs.
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
A Claude Code skill to delegate prompts to Codex
Spec-driven development (SDD) for AI coding assistants.
🔥 A minimal training framework for scaling FLA models
A 5-20x faster experimental Homebrew alternative
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Flexible and Pluggable Serving Engine for Diffusion LLMs
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
An asynchronous streaming data management module for efficient post-training.
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Moves makes it easier than ever to position your windows juuust right
This repository contains the code for the ICLR 2026 paper “DASH: Deterministic Attention Scheduling for High-Throughput Reproducible LLM Training”, developed on top of the FlashAttention codebase.
[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods for kernel generation
Official Implementation of DART (Diffusion-Inspired Speculative Decoding for Fast LLM Inference).
Fast, Sharp & Reliable Agentic Intelligence
Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to …