Stars
Awesome-LLM: a curated list of Large Language Models
✨✨Latest Advances on Multimodal Large Language Models
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
LAVIS - A One-stop Library for Language-Vision Intelligence
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
A Collection of Papers and Codes for CVPR2025/CVPR2024/CVPR2021/CVPR2020 Low Level Vision
[SIGGRAPH 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model (ICLR 2025 Oral)
The official implementation of AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
A curated publication list on evidential deep learning.
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking
Official implementation of "Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals" (NeurIPS 2025)
A PyTorch implementation of U-shaped architecture benchmarks for medical image segmentation
[ISBI 2024 Oral] Official PyTorch codebase for "CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion"
[ISBI 2023] Official PyTorch implementation of "CMU-Net: A Strong ConvMixer-based Medical Ultrasound Image Segmentation Network"
This is a project about visual spatial reasoning.
Privacy-oriented proxy & network manager, supporting WireGuard, L7 firewall, App-based policies and scripted MitM.
The official implementation of "ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training"
[MICCAI 2025] Official code for "Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster"
[MedIA 2025] MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
Awesome Spatial Intelligence (Personal Use)
Collection of the latest spatial, 3D, and video/temporal reasoning papers