Vision Foundation Models: SAM, ViT, CLIP, DINOv2, object detection, segmentation, and multimodal AI for computer vision.
computer-vision sam yolo image-recognition object-detection vit clip semantic-segmentation zero-shot-learning mae instance-segmentation vision-transformers foundation-models visual-understanding open-vocabulary dinov2 grounding-dino multimodal-ai
Updated Nov 10, 2025 - Makefile