Stars
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
[SIGGRAPH 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
This is a project about visual spatial reasoning.
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
A Collection of Papers and Codes for CVPR2025/CVPR2024/CVPR2021/CVPR2020 Low Level Vision
Collection of the latest spatial, 3D, and video/temporal reasoning papers
Awesome Spatial Intelligence (Personal Use)
[MICCAI 2025] Official code for "Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster"
Official implementation of "Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals" (NeurIPS 2025)
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model (ICLR 2025 Oral)
Official Code for Paper "A General Knowledge Injection Framework for ICD Coding" (ACL 2025 Findings)
[ISBI 2023] Official PyTorch implementation of "CMU-Net: A Strong ConvMixer-based Medical Ultrasound Image Segmentation Network"
[MICCAI 2024] HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
[MedIA 2025] MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
[ISBI 2024 Oral] Official PyTorch Code base for "CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion"
A PyTorch implementation of medical image segmentation U-shape architecture benchmarks
The official implementation of "ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training"
LAVIS - A One-stop Library for Language-Vision Intelligence
The official implementation of AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
A curated publication list on evidential deep learning.
[ICLR 2025] Official code of Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
[MedIA 2025] Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation