jzh15/README.md

🌟 Jian Zhang | 张舰


Homepage · Google Scholar · CV · Email


Jian Zhang

🎓 Graduate Student
Xiamen University

🔬 Research Intern
Baidu Inc.

🚀 Research Vision

My long-term vision follows a progressive pathway: first achieving 3D-consistent content generation, then developing comprehensive 3D understanding, and ultimately enabling intelligent embodied agents that can navigate and interact within these 3D environments.

🎯 Current Focus Areas

  • 🎬 3D-Consistent Content Generation
  • 🔬 3D Spatial Understanding
  • 🤖 3D Embodied Agents
  • 🎮 Virtual Worlds & Metaverse Applications

🎓 Education

  • Graduate Student | Xiamen University (Sept 2023 - Present)
  • B.S. Artificial Intelligence | Nanchang University (Sept 2019 - June 2023)

💼 Experience

  • Research Intern | Baidu Inc. (Aug 2025 - Present) - Video Generation Research
  • Research Assistant | Texas A&M University (May 2025 - Aug 2025) - 3D Vision & Embodied Intelligence
  • Research Assistant | VITA Group, University of Texas at Austin (Jan 2024 - May 2025) - 3D Spatial Reconstruction & Understanding

📚 Featured Publications

🔥 Recent Highlights

🌟 VLM-3R: Vision-Language Models Augmented with 3D Reconstruction

ArXiv 2025 | Jian Zhang*, Zhiwen Fan*, et al.

A unified VLM framework that incorporates 3D reconstructive instruction tuning, processing monocular video into implicit 3D tokens for spatial assistance and embodied reasoning.

Paper · Code · Project · Demo
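The core idea of feeding reconstruction-derived tokens to a language model alongside text can be illustrated with a toy prefix-token sketch. All shapes, and the identity-matrix stand-in for a learned projector, are hypothetical; this is not the VLM-3R implementation:

```python
import numpy as np

# Toy sketch: turn per-frame video features into "implicit 3D tokens"
# via a stand-in projector, then prepend them to the text token
# sequence a language model would consume. Shapes are illustrative.
rng = np.random.default_rng(0)
d_model = 16

frame_feats = rng.normal(size=(8, d_model))   # features from 8 video frames
projector = np.eye(d_model)                   # stand-in for a learned 3D projector
tokens_3d = frame_feats @ projector           # "implicit 3D tokens"

text_tokens = rng.normal(size=(5, d_model))   # embedded instruction tokens
sequence = np.concatenate([tokens_3d, text_tokens], axis=0)
print(sequence.shape)                         # (13, 16): 3D prefix + text
```

In the real system the prefix comes from a 3D reconstruction backbone rather than raw frame features, but the injection point into the LLM's input sequence is the same shape-level idea.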


๐ŸŒ DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Preprint | Kairun Wen*, Yuzhi Huang*, ..., Jian Zhang, et al.

Large-scale dataset with 100K+ videos, 800K+ masks, and 10M+ frames for understanding dynamic physical worlds with evolving 3D structure and motion.

Project · Paper · Code · Demo


๐Ÿ† Large Spatial Model: End-to-end Unposed Images to Semantic 3D

NeurIPS 2024 | Jian Zhang*, Zhiwen Fan*, et al.

First real-time semantic 3D reconstruction system that directly processes unposed RGB images into semantic radiance fields in a single feed-forward pass.

Paper · Code · Project


⚡ InstantSplat: Sparse-view Gaussian Splatting in Seconds

ArXiv 2024 | Zhiwen Fan*, Kairun Wen*, ..., Jian Zhang, et al.

Lightning-fast sparse-view 3D scene reconstruction using a self-supervised framework that optimizes the 3D scene representation and camera poses simultaneously.

Paper · Code · Project
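The joint scene-and-pose optimization idea can be shown with a toy gradient-descent loop. This is a deliberately simplified 2D analogue (points plus a single camera translation, quadratic loss), not the actual InstantSplat code, which optimizes 3D Gaussians and full camera poses against rendered images:

```python
import numpy as np

# Toy analogue of joint optimization: recover point positions and a
# camera translation simultaneously by gradient descent on a
# reprojection-style residual.
rng = np.random.default_rng(0)

true_points = rng.normal(size=(5, 2))      # ground-truth 2D points
true_shift = np.array([0.5, -0.3])         # ground-truth camera translation
observed = true_points + true_shift        # "observations" in the image

points = np.zeros((5, 2))                  # scene parameters, init at origin
shift = np.zeros(2)                        # camera parameter
lr = 0.1
for _ in range(500):
    residual = (points + shift) - observed
    # Gradients of 0.5 * ||residual||^2 w.r.t. both parameter sets,
    # applied in the same step (no alternating stages).
    points -= lr * residual
    shift -= lr * residual.mean(axis=0)

loss = 0.5 * np.sum(((points + shift) - observed) ** 2)
print(f"final loss: {loss:.2e}")
```

Note the gauge ambiguity: only the sum `points + shift` is constrained, so the toy problem (like real joint scene/pose optimization) admits a family of equivalent solutions; the loss still drives the combination to the observations.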



🌟 Open for Opportunities

🎬 3D-Consistent Video Generation
Creating spatially coherent visual content

🔬 3D Spatial Understanding
Developing comprehensive 3D perception

🤝 Research Collaborations
Building the future of 3D AI together

Particularly interested in opportunities that bridge cutting-edge research with real-world applications.


📫 Contact

Email


Thanks for visiting!

Building the future of 3D AI, one breakthrough at a time ✨

Pinned

  1. NVlabs/LSM (Python · 222 stars · 9 forks)
     [NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D

  2. VITA-Group/VLM-3R (Python · 311 stars · 23 forks)
     VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

  3. NVlabs/InstantSplat (Python · 1.6k stars · 135 forks)
     InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds