- MIT
- Boston
- https://jameshujy.github.io/
Stars
Official implementation of "Figure It Out: Improve the Frontier of Reasoning with Active Visual Thinking"
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A General, Accurate, Long-Horizon, and Efficient Mobile Agent driven by Multimodal Foundation Models
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence https://arxiv.org/abs/2509.03505
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen3-Coder is the code version of Qwen3, the large language model series developed by Qwen team.
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"
Scalable RL solution for advanced reasoning of language models
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
This package contains the original 2012 AlexNet code.
A series of technical reports on Slow Thinking with LLMs