Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training

Python 10 1 Updated Feb 7, 2026

mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations

Python 69 2 Updated Jan 12, 2026

Efficient GPU communication over multiple NICs.

C++ 24 4 Updated Nov 20, 2025

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 334 50 Updated Feb 16, 2026

A Claude Code skill to delegate prompts to Codex

595 51 Updated Feb 9, 2026

Spec-driven development (SDD) for AI coding assistants.

TypeScript 24,475 1,633 Updated Feb 18, 2026

🔥 A minimal training framework for scaling FLA models

Python 349 56 Updated Nov 15, 2025

NVIDIA cuTile learn

Python 162 1 Updated Dec 9, 2025

A 5-20x faster experimental Homebrew alternative

Rust 6,374 143 Updated Feb 17, 2026

C++ 165 43 Updated Feb 5, 2026

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 2,184 254 Updated Jan 27, 2026

Flexible and Pluggable Serving Engine for Diffusion LLMs

Python 58 11 Updated Feb 14, 2026

A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.

Python 59 4 Updated Feb 2, 2026

Training library for Megatron-based models with bidirectional Hugging Face conversion capability

Python 433 173 Updated Feb 18, 2026

An asynchronous streaming data management module for efficient post-training.

Python 28 9 Updated Feb 13, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 19 2 Updated Dec 24, 2025

The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.

Python 5,784 640 Updated Feb 18, 2026

Moves makes it easier than ever to position your windows juuust right

Swift 210 11 Updated Feb 9, 2026
Python 4 Updated Dec 31, 2025

This repository contains the code for the ICLR 2026 paper “DASH: Deterministic Attention Scheduling for High-Throughput Reproducible LLM Training”, developed on top of the FlashAttention codebase.

Python 7 Updated Jan 31, 2026

CUDA best practices & notes

Python 10 Updated Oct 24, 2025

[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Python 83 5 Updated Feb 6, 2026

Official Implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference).

Python 42 1 Updated Feb 8, 2026
Python 169 30 Updated Feb 12, 2026

Fast, Sharp & Reliable Agentic Intelligence

C++ 1,283 40 Updated Feb 13, 2026
Python 47 4 Updated Feb 5, 2026
C++ 66 6 Updated Feb 14, 2026

Multimodal deep-research MLLM and benchmark. The first long-horizon multimodal deep-research MLLM, extending the number of reasoning turns to dozens and the number of search-engine interactions to …

Python 350 36 Updated Feb 9, 2026

Learning TileLang with 10 puzzles!

Python 136 15 Updated Jan 30, 2026