Stars
Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
Official code for PEARL: Personalized Streaming Video Understanding Model
This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and proces…
Advancing AI by embracing human-likeness for better AI understanding, human–AI collaboration, and social simulation, bridging technology and genuine human experience.
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
[CVPR'26] SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.
Foundations of Medical Large Language Model Learning
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
CoCo: CoCo as CoT for Text-to-Image Preview and Rare Concept Generation
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun’s Multimodal Intelligence team.
Fast, Sharp & Reliable Agentic Intelligence
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
[CVPR 2026] InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its size.
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Code for "In-Context Former: Lightning-fast Compressing Context for Large Language Model" (Findings of EMNLP 2024)
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.