yh8899

Starred repositories

UniRL is a Framework for Unified Multimodal Model Reinforcement Learning

Python 663 41 Updated Jun 19, 2026

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Python 771 39 Updated Jun 3, 2026

PyTorch Single Controller

Rust 1,050 161 Updated Jun 21, 2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python 215 22 Updated Jun 21, 2026

Flow Map OPD for AnyStep Video Diffusion

Python 369 8 Updated May 23, 2026

Official Repo of "D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models"

Python 255 7 Updated May 22, 2026

A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.

179,632 18,380 Updated Apr 20, 2026

NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep Learning models with a focus on NVIDIA GPUs.

Python 275 31 Updated Jun 3, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 314 23 Updated May 31, 2026

Bash is all you need - A nano claude code–like 「agent harness」, built from 0 to 1

Python 67,662 11,007 Updated Jun 21, 2026

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript 219,221 33,602 Updated Jun 21, 2026

rCM & Causal-rCM: Leading and Unified Algorithms/Infrastructures for Bidirectional/Autoregressive Video Diffusion Distillation at Scale

Python 704 26 Updated Jun 5, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,536 265 Updated Jun 17, 2026

A framework for efficient model inference with omni-modality models

Python 5,226 1,150 Updated Jun 21, 2026
Python 11,590 790 Updated Feb 9, 2026

flex-block-attn: an efficient block sparse attention computation library

Jupyter Notebook 130 14 Updated Dec 26, 2025
Python 1,868 282 Updated Jun 19, 2026

Transforming Video Diffusion with Temporal Sparse Attention

Python 49 5 Updated Apr 8, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 890 154 Updated Jun 21, 2026

torchcomms: a modern PyTorch communications API

C++ 372 153 Updated Jun 21, 2026

PyTorch-native post-training at scale

Python 687 97 Updated Jun 21, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 313 19 Updated Feb 24, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 852 58 Updated Jun 21, 2026

Aiming to integrate most existing feature caching-based diffusion acceleration schemes into a unified framework.

Python 104 11 Updated Oct 23, 2025

Lightweight Image Video Action Generation Inference Framework

Python 2,428 220 Updated Jun 21, 2026

Trainable fast and memory-efficient sparse attention

Python 709 52 Updated Jun 21, 2026

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,030 417 Updated Jun 21, 2026

A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.

Python 1,204 75 Updated Jun 16, 2026

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python 1,370 82 Updated Aug 7, 2025