Stars
Official code for "SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization"
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
Official code for PEARL: Personalized Streaming Video Understanding Model
This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and proces…
Advancing AI by embracing human-likeness for better AI understanding, human–AI collaboration, and social simulation, bridging technology and genuine human experience.
[ACL 2026] CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
[CVPR'26 Highlight] SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.
Foundations of Medical Large Language Model Learning
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.
Fast, Sharp & Reliable Agentic Intelligence
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
[ACL 2026 Findings] CoV: Chain-of-View Prompting for Spatial Reasoning
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
[CVPR 2026] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its size.
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Code for "In-Context Former: Lightning-fast Compressing Context for Large Language Model" (Findings of EMNLP 2024)
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.