- Zhejiang University
- Hangzhou, China
- https://jianbiaomei.github.io
Stars
NEO Series: Native Vision-Language Models from First Principles
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
AI agents running research on single-GPU nanochat training automatically
Hy3 preview (295B A21B), a leading reasoning and agent model in its size, with great cost efficiency
The agent that grows with you
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Supercharge your AI agents by versioning, tracking, and merging overlapping skills.
OpenClaw-RL: Train any agent simply by talking
[RSS 2026] Causal video-action world model for generalist robot control
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
[ICML-2026] Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
[ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
RynnVLA-002: A Unified Vision-Language-Action and World Model
The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
[NeurIPS'24 Spotlight] GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
A unified inference and post-training framework for accelerated video generation.