Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture.
Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.
open-deepfake-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image is fake or real using the OpenDeepfake-Preview dataset. The model uses the SiglipForImageClassification architecture.
Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.
Face-Mask-Detection is a binary image classification model based on google/siglip2-base-patch16-224, trained to detect whether a person is wearing a face mask or not. This model can be used in public health monitoring, access control systems, and workplace compliance enforcement.
Multilabel-Portrait-SigLIP2 is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies portrait-style images into one of the following visual portrait categories:
Multisource-121-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into 121 domain categories using the SiglipForImageClassification architecture.
facial-age-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect and classify human faces into age groups ranging from early childhood to elderly adults. The model uses the SiglipForImageClassification architecture.
Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.
open-age-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to classify the estimated age group of a person from an image. The model uses the SiglipForImageClassification architecture.
Document-Type-Detection is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to detect and classify types of documents from scanned or photographed images. This model is helpful for automated document sorting, OCR pipelines, and digital archiving systems.
Gender-Classifier-Mini is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images based on gender using the SiglipForImageClassification architecture.
Traffic-Density-Classification is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into traffic density categories using the SiglipForImageClassification architecture.
IndoorOutdoorNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images as either Indoor or Outdoor using the SiglipForImageClassification architecture.
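The entries above share one architecture, SiglipForImageClassification, and differ mainly in whether the task is single-label or multi-label. The sketch below shows one way the raw logits from such a model might be turned into labels. It is not taken from any of the model cards: the helper names, the 0.5 threshold for the multi-label case, and the `checkpoint` argument (a placeholder for whichever repository id you load) are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_single_label(logits: Sequence[float], id2label: Dict[int, str]) -> str:
    """Single-label case: return the one class with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best]


def decode_multi_label(
    logits: Sequence[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,  # assumed cut-off, tune per model
) -> List[str]:
    """Multi-label case: score each class independently and keep those above the threshold."""
    return [id2label[i] for i, x in enumerate(logits) if sigmoid(x) >= threshold]


def predict_logits(image_path: str, checkpoint: str) -> List[float]:
    """Run one image through a fine-tuned checkpoint and return its logits.

    `checkpoint` is a placeholder for the repository id or local path of the
    model being loaded. Heavy imports stay local so the helpers above can be
    used without torch or transformers installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SiglipForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = SiglipForImageClassification.from_pretrained(checkpoint)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze(0).tolist()
```

For a single-label model such as IndoorOutdoorNet, the result of `predict_logits` would go to `decode_single_label`. For a multi-label model such as Multilabel-GeoSceneNet it would go to `decode_multi_label`. In both cases the label mapping would normally come from the loaded model's `config.id2label`.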