Organizations

@ant-research


Official implementation of "E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models"

105 1 Updated Jun 4, 2025

GLUEMAP: Global Structure-from-Motion Meets Feedforward Reconstruction

Python 253 12 Updated May 26, 2026
Python 18 1 Updated May 30, 2026

High-performance inference and serving library for interactive autoregressive video and world models

Python 317 16 Updated Jun 13, 2026

Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".

Jupyter Notebook 293 9 Updated Jun 1, 2026

Scaling Diffusion Transformers with Mixture of Experts

Python 426 20 Updated Sep 9, 2024
Python 191 14 Updated May 30, 2026

Simple 3D mapping and physics simulation in Blender

Python 26 2 Updated Jun 2, 2026

[ICML 2026] Code for Equilibrium Reasoners: learning attractor dynamics for scalable reasoning

Python 38 5 Updated Jun 1, 2026

This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.

Python 1,420 73 Updated Aug 4, 2025

Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Python 213 16 Updated May 25, 2026

(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

Python 74 Updated May 21, 2026

[CVPR 2026 Oral] VGGT Omega

Python 2,950 120 Updated May 18, 2026

Flow Map OPD for AnyStep Video Diffusion

Python 366 8 Updated May 23, 2026
Python 64 1 Updated May 14, 2026

Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation

Python 712 28 Updated Jun 9, 2026

TIPSv2 (CVPR'26) and TIPS (ICLR'25)

Jupyter Notebook 544 36 Updated Jun 1, 2026

Model souping for LLMs

Python 73 4 Updated Nov 18, 2025

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Python 514 48 Updated Jul 15, 2024

Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)

Python 173 3 Updated Apr 3, 2026

A feed-forward 3D foundation model for reconstructing scenes from streaming data

Python 7,197 712 Updated Jun 2, 2026

SteerViT is a framework that equips any ViT with the ability to steer both its global and local visual representations with natural language.

Python 103 5 Updated Jun 8, 2026

An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Python 423 45 Updated Jun 13, 2026

Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.

Python 1,283 54 Updated Jun 3, 2026

The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.

TypeScript 139 6 Updated May 18, 2026

Information collection for the Happy Horse AI video generator model. Official demo and updates at happyhorses.io.

633 60 Updated May 12, 2026

Recipe for a General, Powerful, Scalable Graph Transformer

Python 862 155 Updated Jul 4, 2024

JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.

Python 2,169 157 Updated Jun 12, 2026

SOTA Open Source TTS

Python 30,794 2,626 Updated Jun 9, 2026

Official implementation of Categorical Flow Maps on text.

Python 59 5 Updated Feb 16, 2026