Organizations

@PJLab-ADG @Pointcept @worldbench @WorldDock

🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

HTML 68 1 Updated Dec 19, 2025

🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Python 131 9 Updated Dec 19, 2025

Learning to Remove Lens Flare in Event Camera

Python 11 Updated Dec 17, 2025

U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences

Python 7 Updated Dec 7, 2025

"Paper2Slides: From Paper to Presentation in One Click"

Python 2,357 319 Updated Dec 19, 2025

[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

Python 196 10 Updated Dec 12, 2025

[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Python 38 2 Updated Nov 30, 2025

Official Repo for Paper "EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing"

Python 18 Updated Dec 20, 2025

[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding

Python 60 Updated Oct 23, 2025
Jupyter Notebook 251 53 Updated Dec 19, 2025

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

396 19 Updated Sep 22, 2025

[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

Python 178 12 Updated Dec 12, 2025

Official Competition Toolkit for The 2025 RoboSense Challenge

8 Updated Nov 30, 2025

Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"

Python 127 7 Updated Dec 18, 2025

[ICCV 2025] Perspective-Invariant 3D Object Detection

Python 152 11 Updated Dec 17, 2025

The official implementation of paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”

109 1 Updated Oct 7, 2025

[arxiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Python 34 1 Updated Oct 29, 2025

AevaScenes Python SDK

Python 40 10 Updated Nov 6, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,279 1,447 Updated Nov 28, 2025

[SIGGRAPH Asia 2025] WorldExplorer: Towards Generating Fully Navigable 3D Scenes

Python 157 10 Updated Dec 8, 2025

🌐 3D and 4D World Modeling: A Survey

HTML 706 41 Updated Dec 17, 2025

🌐 A curated collection of large-scale 3D scene understanding models with real-world applications

HTML 20 Updated Dec 19, 2025

Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap

66 3 Updated Jul 15, 2025

【Accepted by TPAMI】Human Motion Video Generation: A Survey (https://ieeexplore.ieee.org/document/11106267)

283 11 Updated Dec 19, 2025
Python 1 Updated Sep 1, 2025

[CVPR 25] Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation

Python 241 9 Updated Sep 17, 2025

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Python 803 10 Updated Dec 14, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,213 39 Updated Oct 4, 2025

[NeurIPS 2024] Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

Python 330 7 Updated Jan 21, 2025