- NVIDIA
- Boston
- yukangchen.com
- https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en
- @yukangchen_
- in/yukang-chen-35aaa2151
Stars
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
Frequency-based KV cache pruning for llama.cpp — 25% cache reduction, improved PPL at long context. GPU compaction kernel for HIP/ROCm.
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
[CVPRW oral 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
AutoGaze automatically removes redundant patches in a video, reducing #tokens in ViT/MLLM by 4x-100x.
Official code of Motus: A Unified Latent Action World Model
Code for the paper “Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs”
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Official repository of Utonia: Toward One Encoder for All Point Clouds
CVPR 2026 | Official Implementation of "MultiShotMaster: A Controllable Multi-Shot Video Generation Framework" 🔥
Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning
A rejection-sampling based distribution alignment method for extreme actor-policy mismatch RL Training
[ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
NVIDIA FastGen: Fast Generation from Diffusion Models
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Consistent Autoregressive Video Generation with Long Context
Code for "LIVE: Long-horizon Interactive Video World ModEling"
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
A unified inference and post-training framework for accelerated video generation.