Stars
Official implementation of "E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models"
GLUEMAP: Global Structure-from-Motion Meets Feedforward Reconstruction
High-performance inference and serving library for interactive autoregressive video and world models
Official implementation of the paper "VLM³: Vision Language Models Are Native 3D Learners".
Scaling Diffusion Transformers with Mixture of Experts
Simple 3D mapping and physics simulation in Blender
[ICML 2026] Code for Equilibrium Reasoners: learning attractor dynamics for scalable reasoning
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
TIPSv2 (CVPR'26) and TIPS (ICLR'25)
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)
A feed-forward 3D foundation model for reconstructing scenes from streaming data
SteerViT is a framework that equips any ViT with the ability to steer both its global and local visual representations with natural language.
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.
The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.
Collected information on the Happy Horse AI video generator model. Official demo and updates at happyhorses.io.
Recipe for a General, Powerful, Scalable Graph Transformer
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
Official implementation of Categorical Flow Maps on text.