Stars
Official implementation of "Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation".
🔎 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
CLIP+MLP Aesthetic Score Predictor
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
slime is an LLM post-training framework for RL Scaling.
[BMVC2021] The first image composition assessment dataset. Used in the paper "Image Composition Assessment with Saliency-augmented Multi-pattern Pooling". Useful for image composition assessment, i…
This is the official code repo for DiT4DiT, a Vision-Action-Model (VAM) framework that combines video generation model with flow-matching-based action prediction for generalizable robotic manipulat…
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition
[CVPR2021] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Next generation frontend tooling. It's fast!
General technology for enabling AI capabilities w/ LLMs and MLLMs
A framework for efficient model inference with omni-modality models
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
AerialClaw: Towards General Intelligence for Autonomous Aerial Agents
The agent that grows with you
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Post-training with Tinker
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞
⏬ Download AVA dataset (A Large-Scale Database for Aesthetic Visual Analysis)
Wan: Open and Advanced Large-Scale Video Generative Models