jzh15/README.md

🌟 Jian Zhang | 张舰


Homepage · Google Scholar · CV · Email


Jian Zhang

🎓 Graduate Student
Xiamen University

🔬 Research Intern
Baidu Inc.

🚀 Research Vision

My long-term vision follows a progressive pathway: first achieving 3D-consistent content generation, then developing comprehensive 3D understanding, and ultimately enabling intelligent embodied agents that can navigate and interact within these 3D environments.

🎯 Current Focus Areas

  • 🎬 3D-Consistent Content Generation
  • 🔬 3D Spatial Understanding
  • 🤖 3D Embodied Agents
  • 🎮 Virtual Worlds & Metaverse Applications

🎓 Education

  • Graduate Student | Xiamen University (Sept 2023 - Present)
  • B.S. Artificial Intelligence | Nanchang University (Sept 2019 - June 2023)

💼 Experience

  • Research Intern | Baidu Inc. (Aug 2025 - Present) - Video Generation Research
  • Research Assistant | Texas A&M University (May 2025 - Aug 2025) - 3D Vision & Embodied Intelligence
  • Research Assistant | VITA Group, University of Texas at Austin (Jan 2024 - May 2025) - 3D Spatial Reconstruction & Understanding

📚 Featured Publications

🔥 Recent Highlights

🌟 VLM-3R: Vision-Language Models Augmented with 3D Reconstruction

ArXiv 2025 | Jian Zhang*, Zhiwen Fan*, et al.

A unified VLM framework that incorporates 3D reconstructive instruction tuning, processing monocular video into implicit 3D tokens for spatial assistance and embodied reasoning.

Paper · Code · Project · Demo
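The core idea of feeding reconstruction-derived tokens to a language model alongside text can be illustrated with a toy prefix-token sketch. All shapes, and the identity-matrix stand-in for a learned projector, are hypothetical; this is not the VLM-3R implementation:

```python
import numpy as np

# Toy sketch: turn per-frame video features into "implicit 3D tokens"
# via a stand-in projector, then prepend them to the text token
# sequence a language model would consume. Shapes are illustrative.
rng = np.random.default_rng(0)
d_model = 16

frame_feats = rng.normal(size=(8, d_model))   # features from 8 video frames
projector = np.eye(d_model)                   # stand-in for a learned 3D projector
tokens_3d = frame_feats @ projector           # "implicit 3D tokens"

text_tokens = rng.normal(size=(5, d_model))   # embedded instruction tokens
sequence = np.concatenate([tokens_3d, text_tokens], axis=0)
print(sequence.shape)                         # (13, 16): 3D prefix + text
```

In the real system the prefix comes from a 3D reconstruction backbone rather than raw frame features, but the injection point into the LLM's input sequence is the same shape-level idea.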


๐ŸŒ DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Preprint | Kairun Wen*, Yuzhi Huang*, ..., Jian Zhang, et al.

Large-scale dataset with 100K+ videos, 800K+ masks, and 10M+ frames for understanding dynamic physical worlds with evolving 3D structure and motion.

Project · Paper · Code · Demo


๐Ÿ† Large Spatial Model: End-to-end Unposed Images to Semantic 3D

NeurIPS 2024 | Jian Zhang*, Zhiwen Fan*, et al.

First real-time semantic 3D reconstruction system that directly processes unposed RGB images into semantic radiance fields in a single feed-forward pass.

Paper · Code · Project


⚡ InstantSplat: Sparse-view Gaussian Splatting in Seconds

ArXiv 2024 | Zhiwen Fan*, Kairun Wen*, ..., Jian Zhang, et al.

Lightning-fast sparse-view 3D scene reconstruction using a self-supervised framework that optimizes the 3D scene representation and camera poses simultaneously.

Paper · Code · Project
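The joint scene-and-pose optimization idea can be shown with a toy gradient-descent loop. This is a deliberately simplified 2D analogue (points plus a single camera translation, quadratic loss), not the actual InstantSplat code, which optimizes 3D Gaussians and full camera poses against rendered images:

```python
import numpy as np

# Toy analogue of joint optimization: recover point positions and a
# camera translation simultaneously by gradient descent on a
# reprojection-style residual.
rng = np.random.default_rng(0)

true_points = rng.normal(size=(5, 2))      # ground-truth 2D points
true_shift = np.array([0.5, -0.3])         # ground-truth camera translation
observed = true_points + true_shift        # "observations" in the image

points = np.zeros((5, 2))                  # scene parameters, init at origin
shift = np.zeros(2)                        # camera parameter
lr = 0.1
for _ in range(500):
    residual = (points + shift) - observed
    # Gradients of 0.5 * ||residual||^2 w.r.t. both parameter sets,
    # applied in the same step (no alternating stages).
    points -= lr * residual
    shift -= lr * residual.mean(axis=0)

loss = 0.5 * np.sum(((points + shift) - observed) ** 2)
print(f"final loss: {loss:.2e}")
```

Note the gauge ambiguity: only the sum `points + shift` is constrained, so the toy problem (like real joint scene/pose optimization) admits a family of equivalent solutions; the loss still drives the combination to the observations.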



🌟 Open for Opportunities

🎬 3D-Consistent Video Generation
Creating spatially coherent visual content

🔬 3D Spatial Understanding
Developing comprehensive 3D perception

🤝 Research Collaborations
Building the future of 3D AI together

Particularly interested in opportunities that bridge cutting-edge research with real-world applications.


📫 Contact

Email


Thanks for visiting!

Building the future of 3D AI, one breakthrough at a time ✨

Pinned

  1. NVlabs/LSM (Python · 222 stars · 9 forks)
     [NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D

  2. VITA-Group/VLM-3R (Python · 311 stars · 23 forks)
     VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

  3. NVlabs/InstantSplat (Python · 1.6k stars · 135 forks)
     InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds