
Organizations

@GradientSpaces


Some useful tools for my own research (e.g. visualization of SLAM trajectories, code to generate sliding-bar videos)

Python 4 Updated Jun 3, 2025

[CVPR 2026] WildPose: A Unified Framework for Robust Pose Estimation in the Wild

Python 7 Updated May 14, 2026
Python 37 3 Updated Jan 8, 2026

[Preprint] Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale

49 Updated Apr 14, 2026

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 13,105 1,457 Updated May 16, 2026

Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models

47 1 Updated Mar 24, 2026

Native and Compact Structured Latents for 3D Generation

Python 6,903 848 Updated Jan 10, 2026

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

2,082 91 Updated May 15, 2026

[ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Python 321 13 Updated Dec 21, 2025

Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding

Python 143 5 Updated May 16, 2026

Inference, evaluation and analysis code for STEVO-Bench

Python 12 1 Updated May 9, 2026

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,282 157 Updated Apr 13, 2026

Official Implementation of Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

15 Updated Apr 26, 2026

Code repository for "DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers"

Python 96 6 Updated Oct 28, 2025

Strategic research thinking agents for Claude Code — idea evaluation, project triage, and structured brainstorming. Helps you decide which papers to write, not just how to write them.

653 56 Updated Apr 13, 2026

Official implementation of "Repurposing Geometric Foundation Models for Multi-view Diffusion"

Python 196 8 Updated Apr 1, 2026

[ICLR '26 Oral] Official repository of the paper "AnyUp: Universal Feature Upsampling".

Jupyter Notebook 543 36 Updated Apr 17, 2026

[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models

Python 348 19 Updated Dec 1, 2025

[ICLR 2026] Official implementation of "Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation"

Python 24 Updated May 4, 2026

TIPSv2 (CVPR'26) and TIPS (ICLR'25)

Jupyter Notebook 488 31 Updated May 15, 2026

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

JavaScript 60,870 3,386 Updated May 12, 2026
Jupyter Notebook 44 Updated Apr 15, 2026

[ICLR'26] This repository is the implementation of "3D Aware Region Prompted Vision Language Model"

Python 24 Updated Feb 19, 2026
Python 4,664 460 Updated Apr 15, 2026

Code for the Molmo2 Vision-Language Model

Python 539 36 Updated Mar 18, 2026

[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".

Python 216 12 Updated Jun 4, 2025

[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.

Python 806 54 Updated Nov 10, 2025

Code and data for UniEgoMotion (ICCV 2025)

Python 53 4 Updated Apr 18, 2026

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 776 35 Updated May 15, 2026