Stars
UniRL is a framework for unified multimodal model reinforcement learning.
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
OfficeCLI is the first and best Office suite purpose-built for AI agents to read, edit, and automate Word, Excel, and PowerPoint files. Free, open-source, single binary, no Office installation requ…
Making daily work at MSRA easier — especially cluster training, data management, and server operations.
Bridging the gap between image generation and real-world design: a benchmark for structured, multi-constraint commercial visual content generation.
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
Build coherent and visually polished multimodal webpages with hierarchical planning, AIGC tools, and iterative reflection.
Community-contributed instructions, agents, skills, and configurations to help you make the most of GitHub Copilot.
📷 [CVPR'26] Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!
GRADE: Grounded Reasoning Assessment for Discipline-informed Editing
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
Code repo for "EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation"
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
[ICML 2026 Oral] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
[ICML 2026] Official implementation of VLANeXt.
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
A unified framework for easy reinforcement learning in Flow-Matching models
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
Wan: Open and Advanced Large-Scale Video Generative Models
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Official training and inference code for the bitwise tokenizer
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++