sugar-fly
  • PhD student, Wuhan University
  • Wuhan

Starred repositories

CoTracker is a model for tracking any point (pixel) on a video.

Jupyter Notebook 4,989 377 Updated Mar 3, 2026

GLUEMAP: Global Structure-from-Motion Meets Feedforward Reconstruction

Python 265 12 Updated May 26, 2026

Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".

Jupyter Notebook 318 9 Updated Jun 1, 2026

UFM: A Unified Dense Image Correspondence Estimator for both Optical Flow & Wide Baseline Matching Tasks. Matches any pair of images. (NeurIPS 2025)

Python 330 21 Updated Apr 4, 2026

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Python 797 91 Updated Oct 8, 2024

[ICLR 2026] PyTorch implementation of "The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge".

Python 63 6 Updated May 13, 2026

[SIGGRAPH 2026 / TOG] Official code of the paper "UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors".

Python 233 9 Updated May 15, 2026

[ICML 2026] 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

Python 176 3 Updated May 18, 2026

Awesome List for On-Policy Distillation

653 11 Updated Jun 13, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Python 683 43 Updated May 30, 2026

This is a project about visual spatial reasoning.

HTML 136 5 Updated May 6, 2026

A paper list for spatial reasoning

755 42 Updated Jan 19, 2026

[ICLR 2026] Official implementation of the paper "📷 On the Generalization Capacities of MLLMs for Spatial Intelligence"

Python 29 1 Updated Mar 17, 2026

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 4,185 512 Updated Mar 23, 2026

[CVPR 2026 Oral] "MARCO: Navigating the Unseen Space of Semantic Correspondence"

Python 139 6 Updated Apr 21, 2026

[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"

Python 199 3 Updated Mar 19, 2026
Python 1,237 78 Updated Nov 20, 2025

Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"

Python 368 20 Updated Apr 17, 2026

[ICCV '25 Highlight] CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching

Python 37 4 Updated Jul 25, 2025

A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.

3,032 125 Updated Jun 12, 2026

The code for the paper "Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors"

Jupyter Notebook 240 8 Updated Nov 28, 2025

[ECCV 2026] Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

Python 418 22 Updated Jun 18, 2026
Python 77 2 Updated Oct 1, 2025
Python 339 16 Updated Apr 24, 2026

[CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation.

Python 296 15 Updated May 30, 2026

[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning

Python 2,023 156 Updated May 18, 2026

[CVPR 2026] Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Python 237 7 Updated May 7, 2026

[ICLR'26] This repository is the implementation of "3D Aware Region Prompted Vision Language Model"

Python 26 Updated Feb 19, 2026

Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.

Go 60,898 4,478 Updated Jun 15, 2026

[ICCV 2025 Highlight] No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views

Python 152 8 Updated Dec 5, 2025