Stars
Official implementation of "Continuous Autoregressive Language Models"
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
AgenTracer: A Lightweight Failure Attributor for Agentic Systems
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
On the Effect of Instruction Tuning Loss on Generalization
Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".
A curated paper list on efficient Mixture-of-Experts (MoE) for LLMs
Source code of "Dr.LLM: Dynamic Layer Routing in LLMs"
Official implementation of "Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models"
"LightAgent: Lightweight and Cost-Effective Mobile Agents"
Demystifying Reinforcement Learning in Agentic Reasoning
Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"
AgentFlow: In-the-Flow Agentic System Optimization
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
🔥🔥🔥 Latest Papers, Code, and Datasets on Video-LMM Post-Training
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star ⭐ if you find it useful.
Official repo for "Self-Forcing++: High-Quality Long Video Generation"
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, BrowseComp, and xBench.
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.