- The University of Hong Kong (HKU)
- Hong Kong
- https://qi-zhangyang.github.io/
Stars
[CoRL 2025] Repository relating to "TrackVLA: Embodied Visual Tracking in the Wild"
Low-level locomotion policy training in Isaac Lab
A toolbox for spectral compressive imaging reconstruction including MST (CVPR 2022), CST (ECCV 2022), DAUHST (NeurIPS 2022), BiSCI (NeurIPS 2023), HDNet (CVPR 2022), MST++ (CVPRW 2022), etc.
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
A framework combining the abilities of the QwenVL and DeepSeek APIs to enable visual interaction using the DeepSeek model.
A workflow for DeepSeek to automatically write exam questions
Three examples using the AutoGLM API to control a mobile device through an ESP32 and a web server
An open framework for blind navigation based on ESP32
Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
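🔥🔥🔥 Professional iOS obfuscation tool, shell-app ("vest") toolkit, and IPA static analysis tool (similarity comparison, sensitive-word detection). Trial version available; 100% pass rate on automated review; resolves App Store 4.3 and 2.3.1 issues; supports C, C++, Objective-C, Dart, and Swift, plus renaming of all kinds of resources; obfuscation, foolproof operation, and one-click packaging; a good UI; multi-package management with one feature set per package; supports Unity3D, the cocos2d family, SwiftUI, Flutter, Unreal…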
Official style files for papers submitted to venues of the Association for Computational Linguistics
LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar OpenAI / aisuite interfac…
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Solve Visual Understanding with Reinforced VLMs
Witness the aha moment of VLM with less than $3.
[RSS 2024 & RSS 2025] VLN-CE evaluation code of NaVid and Uni-NaVid
Fully open reproduction of DeepSeek-R1
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[CVPR 2024] Code for the paper 'Towards Learning a Generalist Model for Embodied Navigation'
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
MultiScan: Scalable RGBD scanning for 3D environments with articulated objects
Open3D: A Modern Library for 3D Data Processing