- National University of Singapore
- Singapore
- https://ldkong.com
- in/ldkong
- @ldkong1205
Stars
🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
"Paper2Slides: From Paper to Presentation in One Click"
[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D
[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Official repo for the paper "EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing"
[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
Official Competition Toolkit for The 2025 RoboSense Challenge
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
[ICCV 2025] Perspective-Invariant 3D Object Detection
The official implementation of the paper "VChain: Chain-of-Visual-Thought for Reasoning in Video Generation"
[arXiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[SIGGRAPH Asia 2025] WorldExplorer: Towards Generating Fully Navigable 3D Scenes
🌐 3D and 4D World Modeling: A Survey
🌐 A curated collection of large-scale 3D scene understanding models with real-world applications
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
[Accepted by TPAMI] Human Motion Video Generation: A Survey (https://ieeexplore.ieee.org/document/11106267)
[CVPR 25] Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
4DNeX: Feed-Forward 4D Generative Modeling Made Easy
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[NeurIPS 2024] Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models