Hong Kong University of Science and Technology
Shanghai, China
https://hq-King.github.io
Stars
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
Reference PyTorch implementation and models for DINOv3
[NeurIPS 2024 Best Paper Award] [GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Notes about courses Dive into Deep Learning by Mu Li
A practical Python tutorial covering: Python basics, advanced Python features, object-oriented programming, multithreading, databases, data science, Flask, and web crawler development.
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024)
The official implementation of Segment Any 3D Gaussians (AAAI-25)
Solutions to the exercises in Dive into Deep Learning (《动手学深度学习》); available to read online at:
Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training"
Open-vocabulary Object Segmentation with Diffusion Models
Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"