- National University of Singapore
- Singapore
- https://ldkong.com
- in/ldkong
- @ldkong1205
Stars
Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]
Official implementation for "HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions".
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper.
Official implementation for "Language-Conditioned World Modeling for Visual Navigation"
Official implementation of paper "Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation"
Official implementation of Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions (NeurIPS DB Track'24 Spotlight).
[CVPR 2026] ReasonMap: Towards Fine-Grained Visual Reasoning from Transit Maps
[ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
Event Camera Vision in the Era of Large Models: A Survey
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
[CVPR 2026 Oral] WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
[CVPR 2026 Highlight] U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
"Paper2Slides: From Paper to Presentation in One Click"
[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D
[NeurIPS 2025] SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Official Repo for Paper "EditMGT Unleashing the Potential of Masked Generative Transformer in Image Editing"
[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
[AAAI 2026 Oral] LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
Official Competition Toolkit for The 2025 RoboSense Challenge
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
[ICCV 2025] Perspective-Invariant 3D Object Detection
[ACL 2026 Findings, ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[SIGGRAPH Asia 2025] WorldExplorer: Towards Generating Fully Navigable 3D Scenes