Organizations

@PJLab-ADG @Pointcept @worldbench @WorldDock

Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]

222 13 Updated Apr 13, 2026

Official implementation for "HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions".

C++ 391 36 Updated Mar 30, 2026

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 11,135 1,278 Updated Apr 10, 2026

Official implementation for "Language-Conditioned World Modeling for Visual Navigation"

Python 8 Updated Apr 1, 2026
Python 6 1 Updated Apr 3, 2026

Official implementation of paper "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation"

Python 286 34 Updated Mar 26, 2026

Official implementation of Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions (NeurIPS DB Track'24 Spotlight).

C++ 54 7 Updated Dec 20, 2024

[CVPR 2026] ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps

Python 77 3 Updated Feb 22, 2026

[ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Python 43 2 Updated Feb 22, 2026

Event Camera Vision in the Era of Large Models: A Survey

6 2 Updated Apr 12, 2026

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

HTML 145 12 Updated Apr 4, 2026

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

HTML 376 36 Updated Apr 12, 2026

[CVPR 2026 Oral] WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Python 205 16 Updated Jan 18, 2026

Learning to Remove Lens Flare in Event Camera

Python 12 Updated Dec 24, 2025

[CVPR 2026 Highlight] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Python 17 1 Updated Dec 20, 2025

"Paper2Slides: From Paper to Presentation in One Click"

Python 3,294 434 Updated Mar 15, 2026

[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

Python 209 13 Updated Dec 26, 2025

[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Python 43 3 Updated Nov 30, 2025

Official repo for the paper "EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing"

Python 71 Updated Dec 20, 2025

[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding

Python 66 1 Updated Feb 10, 2026
Jupyter Notebook 268 56 Updated Apr 14, 2026

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

429 22 Updated Sep 22, 2025

[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

Python 193 13 Updated Dec 12, 2025

Official Competition Toolkit for The 2025 RoboSense Challenge

10 Updated Jan 16, 2026

Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"

Python 141 6 Updated Dec 18, 2025

[ICCV 2025] Perspective-Invariant 3D Object Detection

Python 174 14 Updated Dec 22, 2025

[ACL 2026 Findings, ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

117 1 Updated Apr 8, 2026

AevaScenes Python SDK

Python 49 10 Updated Nov 6, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,958 1,732 Updated Jan 30, 2026

[SIGGRAPH Asia 2025] WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Python 185 12 Updated Mar 30, 2026