I am an incoming Ph.D. student at the National University of Singapore (NUS). I received my Master's degree in Computer Technology from the University of Chinese Academy of Sciences (UCAS) in 2026. Prior to this, I received my Bachelor's degree in Software Engineering from Huazhong University of Science and Technology (HUST) in 2023. My research interests include Visual Tracking, Diffusion Models, Multi-modal Learning, and Large Language Models. I am currently exploring a unified omnimodal foundation model spanning the vision, audio, and text modalities. I hope such a model can achieve synergy between generation and understanding, with each capability benefiting the other, thereby extending the intelligence boundaries of existing models. I firmly believe that this approach can unify the paradigms of world models and vision-language-action models, and in doing so benefit interactions with different physical devices and the real world.
🎯 Focusing
Incoming PhD student at NUS. Focusing on Multimodal Learning (CVer & NLPer).
Pinned
- DiffusionTrack (Public): [AAAI 2024] DiffusionTrack: Diffusion Model For Multi-Object Tracking. DiffusionTrack is the first work to employ the diffusion model for multi-object tracking by formulating it as a generative noi…
- GUI-R1 (Public, forked from ritzz-ai/GUI-R1): Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
Python 3