
Organizations

@PJLab-ADG @Pointcept @worldbench @WorldDock


🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

HTML 31 4 Updated Dec 24, 2025

🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

HTML 124 8 Updated Dec 24, 2025

🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Python 161 13 Updated Dec 19, 2025

Learning to Remove Lens Flare in Event Camera

Python 11 Updated Dec 24, 2025

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Python 9 Updated Dec 20, 2025

Paper2Slides: From Paper to Presentation in One Click

Python 2,490 339 Updated Dec 19, 2025

[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

Python 195 10 Updated Dec 12, 2025

[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Python 40 2 Updated Nov 30, 2025

Official repository for the paper "EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing"

Python 20 Updated Dec 20, 2025

[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding

Python 61 Updated Oct 23, 2025

Jupyter Notebook 253 53 Updated Dec 22, 2025

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

396 19 Updated Sep 22, 2025

[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

Python 178 12 Updated Dec 12, 2025

Official Competition Toolkit for The 2025 RoboSense Challenge

9 Updated Nov 30, 2025

Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"

Python 127 7 Updated Dec 18, 2025

[ICCV 2025] Perspective-Invariant 3D Object Detection

Python 153 11 Updated Dec 22, 2025

The official implementation of the paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”

110 1 Updated Oct 7, 2025

[arXiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Python 35 1 Updated Oct 29, 2025

AevaScenes Python SDK

Python 41 10 Updated Nov 6, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,384 1,456 Updated Nov 28, 2025

[SIGGRAPH Asia 2025] WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Python 157 10 Updated Dec 8, 2025

🌐 3D and 4D World Modeling: A Survey

HTML 744 41 Updated Dec 17, 2025

🌐 A Roadmap for 3D Scene Understanding in the Wild

HTML 21 Updated Dec 19, 2025

Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap

66 3 Updated Jul 15, 2025

[Accepted by TPAMI] Human Motion Video Generation: A Survey (https://ieeexplore.ieee.org/document/11106267)

284 11 Updated Dec 24, 2025

Python 1 Updated Sep 1, 2025

[CVPR 2025] Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Python 241 9 Updated Sep 17, 2025

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Python 804 10 Updated Dec 14, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,225 40 Updated Dec 23, 2025