The Hong Kong Polytechnic University
Stars
Taming large-scale full-parameter few-step training with self-adversarial flows! 👏🏻
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Paper list for LLM/MLLM-based image segmentation
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
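As background for the flow-matching entries in this list, here is a minimal NumPy sketch of the standard conditional flow-matching construction (the linear/rectified interpolant between a noise sample and a data sample, whose time derivative is the regression target for the velocity network). This is an illustrative assumption about the common setup, not code from the library above.

```python
import numpy as np

def linear_interpolant(x0, x1, t):
    """Linear path between noise x0 and data x1 at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for the velocity network: d/dt x_t = x1 - x0."""
    return x1 - x0

# Toy check: a finite difference of the interpolant matches the target.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # "noise" sample
x1 = rng.standard_normal(4)   # "data" sample
t, dt = 0.3, 1e-6
fd = (linear_interpolant(x0, x1, t + dt) - linear_interpolant(x0, x1, t)) / dt
assert np.allclose(fd, target_velocity(x0, x1), atol=1e-4)
```

In training, a network v_theta(x_t, t) would be fit to `target_velocity` by mean-squared error over random t, x0, x1 pairs.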
Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.
A summary of related works on flow matching and stochastic interpolants
[AAAI 2026] The Official Implementation for "Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation"
Implementation of FP8/INT8 rollout for RL training without performance drop.
Normal-Abnormal Guided Generalist Anomaly Detection (NeurIPS 2025)
PyTorch re-implementation for MeanFlow
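For the MeanFlow entries, the core idea is to learn an *average* velocity u(z_t, r, t) over an interval [r, t] rather than the instantaneous velocity, so that sampling takes a single step: x ≈ z_1 − (t − r)·u(z_1, 0, 1). A hedged NumPy sketch, where the trained network is replaced by the analytic average velocity of a straight interpolant (for which the average equals the instantaneous velocity, so one step is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)     # "data" we hope to recover
eps = rng.standard_normal(4)   # Gaussian prior sample
z1 = eps                       # for z_t = (1 - t) * x + t * eps, z_1 is pure noise

def u_avg(z, r, t, eps=eps, x=x):
    # Oracle average velocity for the straight flow v = eps - x
    # (constant in t, so its average over [r, t] equals itself).
    # A trained MeanFlow network would be queried here instead.
    return eps - x

# One-step MeanFlow-style sampling over the full interval [0, 1].
x_hat = z1 - (1.0 - 0.0) * u_avg(z1, 0.0, 1.0)
assert np.allclose(x_hat, x)   # exact for a straight flow
```

For a curved (multi-step) flow, the learned average velocity absorbs the curvature, which is what makes few-step or one-step generation possible.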
[JMS 2025] A Comprehensive Survey for Real-World Industrial Surface Defect Detection: Challenges, Approaches, and Prospects (Journal of Manufacturing Systems)
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
[ICLR 2025] Point-SAM: Promptable 3D Segmentation Model for Point Clouds
This repository is the official implementation of Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation.
[NeurIPS 2025 Spotlight] "SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation."
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Similar to the 2D Base Model, the 3D Base Model is a bridge between images and 3D data.
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)