Fovea is a unified command-line interface to computer vision APIs from Google, Microsoft, AWS, Clarifai, Imagga, IBM Watson, and SightHound.
Use Fovea if you want to:
- Easily classify images in a shell script.
- Compare alternative computer vision APIs.
The table below characterizes Fovea's current feature coverage. Most vendors offer broadly similar features, but their output formats differ. Where possible, Fovea uses a tabular output mode suitable for interactive shell sessions and scripts. If a particular feature is not supported by this tabular output mode, vendor-specific JSON is available instead.
| Feature | Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|---|
| Labels | ✅️️ | ✅ ️️ | ✅️️ | ✅ | ✅ | ✅ | ✅ ️️ | ✅ ️️ | |
| Label i18n | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||
| Faces | ✅️️ | ✅️️ | ✅️️ | ✅ | ✅ | ✅️️ | ✅️️ | ✅️️ | |
| Landmarks | ✅ | ✅️️ | ✅️ ️ | ||||||
| Text (OCR) | ✅ | ✅️️️ | ️️❌ | ✅️️ | |||||
| Emotions | ✅️️ | ✅️️ | ❌️ | ✅ | ❌ | ✅️️ | |||
| Description | ✅️️ | ❌ | ✅️️ | ||||||
| Adult (NSFW) | ✅ | ✅️️ | ✅️️ | ❌ | ✅️️ | ✅️️ | |||
| Categories | ✅️️ | ✅️️ | ✅️️ | ✅️️ | |||||
| Image Type | ✅️ | ❌ | ✅️ ️ | ||||||
| Color | ✅️️ | ✅️️ | ❌ | ❌ | ✅️️ | ||||
| Celebrities | ✅ | ✅️ | ✅ | ✅️ | ✅️ | ✅ | |||
| Vehicles | ✅ | ✅️ | ✅ |
✅ indicates a working feature, ❌ indicates a missing feature, and empty cells represent features not supported by a particular vendor.
Clone the Fovea repository, install its dependencies, and source its environment script.
[user@host]$ git clone https://github.com/28mm/Fovea.git
[user@host]$ cd Fovea
[user@host]$ pip3 install -r requirements.txt
[user@host]$ source fovea-env.sh
Obtain credentials for the web services you plan to use. These should be supplied to Fovea via environment variables. See fovea-env.sh for a template.
- Google Cloud Vision API: https://cloud.google.com/vision/docs/
- Microsoft Computer Vision API: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api
- Amazon Web Services Rekognition: https://aws.amazon.com/rekognition/
- IBM Watson Image Recognition: https://www.ibm.com/watson/developercloud/visual-recognition.html
- Clarifai: https://developer.clarifai.com/
- Imagga: https://docs.imagga.com
- SightHound: https://www.sighthound.com/products/cloud/
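Before invoking Fovea from a script, it can help to confirm that the credentials for the chosen provider are actually set. The sketch below is not part of Fovea: the variable names are taken from the fovea-env.sh template that follows, but the grouping of variables by provider, and the assumption that every variable in a group is required, are inferred from the names alone.

```python
import os

# Assumed grouping of the fovea-env.sh variables by provider (inferred from names).
PROVIDER_VARS = {
    "google": ["GOOG_CV_KEY"],
    "microsoft": ["MSFT_CV_KEY"],
    "amazon": ["AWS_CV_KEY_ID", "AWS_CV_KEY_SECRET", "AWS_CV_REGION"],
    "clarifai": ["CLARIFAI_CLIENT_ID", "CLARIFAI_CLIENT_SECRET", "CLARIFAI_ACCESS_TOKEN"],
    "watson": ["WATSON_CV_URL", "WATSON_CV_KEY"],
    "imagga": ["IMAGGA_ID", "IMAGGA_SECRET"],
    "sighthound": ["SIGHTHOUND_TOKEN"],
}


def missing_credentials(provider, environ=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in PROVIDER_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in PROVIDER_VARS:
        missing = missing_credentials(provider)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real environment.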
export GOOG_CV_KEY=""
export MSFT_CV_KEY=""
export AWS_CV_KEY_ID=""
export AWS_CV_KEY_SECRET=""
export AWS_CV_REGION=""
export CLARIFAI_CLIENT_ID=""
export CLARIFAI_CLIENT_SECRET=""
export CLARIFAI_ACCESS_TOKEN=""
export WATSON_CV_URL=""
export WATSON_CV_KEY=""
export IMAGGA_ID=""
export IMAGGA_SECRET=""
export SIGHTHOUND_TOKEN=""
usage: fovea [-h]
[--provider {google,microsoft,amazon,opencv,watson,clarifai,imagga,facebook,sighthound}]
[--google | --microsoft | --amazon | --opencv | --watson | --clarifai | --facebook | --imagga | --sighthound]
[--output {tabular,json,yaml}] [--tabular | --json | --yaml]
[--lang LANG] [--ocr-lang OCR_LANG] [--max-labels MAX_LABELS]
[--precision PRECISION] [--labels] [--faces] [--text]
[--emotions] [--description] [--celebrities] [--adult]
[--categories] [--image_type] [--color] [--landmarks]
[--vehicles] [--confidence confidence threshold] [--ontology]
[--model MODELS] [--list-models] [--list-langs]
[--list-ocr-langs]
[files [files ...]]
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ✅ ️️ | ✅️️ | ✅ | ✅ | ✅ | ✅ ️️ | ✅ ️️ |
If no other flags are set, --labels is set by default, and --provider is set to google.
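In tabular mode, label results are printed one per line as a confidence score followed by the label text, which may contain spaces. A minimal sketch of consuming that output from Python follows. The helper names `build_label_command` and `parse_labels` are hypothetical, the flags are taken from the usage text above, and whitespace-separated fields are an assumption based on the sample output.

```python
def build_label_command(image, provider="google", max_labels=None):
    """Build an argv list for a label query, using flags from the usage text."""
    argv = ["fovea", "--provider", provider, "--labels"]
    if max_labels is not None:
        argv += ["--max-labels", str(max_labels)]
    argv.append(image)
    return argv


def parse_labels(output):
    """Parse tabular label output into (confidence, label) tuples."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only once so multi-word labels stay intact.
        score, label = line.split(maxsplit=1)
        results.append((float(score), label))
    return results


sample = "0.76 biology\n0.72 organism\n0.61 invertebrate\n0.50 deep sea fish\n"
labels = parse_labels(sample)
```

With Fovea installed and credentials set, the command could be run with `subprocess.run(build_label_command("pleur.jpg"), capture_output=True, text=True)` and its `stdout` passed to `parse_labels`.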
[user@host]$ fovea inverts/cten/pleur.jpg
0.76 biology
0.72 organism
0.61 invertebrate
0.50 deep sea fish
[user@host]$ fovea --clarifai inverts/cten/pleur.jpg
1.00 invertebrate
0.99 science
0.97 no person
0.97 desktop
0.96 biology
0.96 exploration
0.94 nature
0.93 underwater
0.93 one
0.92 wildlife
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Several providers offer label translations, and all default to English (en). List the languages a given provider supports with the --list-langs flag.
[user@host]$ fovea --microsoft --list-langs
en
zh
From the list of vendor-supported languages, set the desired language with the --lang argument.
[user@host]$ fovea http://omp.gso.uri.edu/ompweb/doee/biota/inverts/cten/pleur.jpg --clarifai --lang ar
0.99 لافقاريات
0.99 العلوم
0.96 لا يجوز لأي شخص
0.96 خلفية حاسوب
0.95 علم الاحياء
0.95 استكشاف
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ✅️️ | ✅️️ | ✅ | ✅ | ✅️️ | ✅️️ | ✅️️ |
Most vendors support face detection. In addition, OpenCV's pre-trained Haar cascade is available with the --faces and --opencv flags. The bounding box for each detected face is reported in a four-field format, described below.
- Left-X
- Top-Y
- Width
- Height
See examples/face-detection for further information.
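Many drawing and cropping routines expect corner coordinates rather than an origin plus a size. The sketch below converts the four-field bounding box described above into right and bottom edges. The `Box` type and `parse_face_line` helper are hypothetical names, and whitespace-separated numeric fields are an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A bounding box in Fovea's Left-X, Top-Y, Width, Height order."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height


def parse_face_line(line):
    """Read the first four whitespace-separated fields of a line as a Box."""
    left, top, width, height = (float(v) for v in line.split()[:4])
    return Box(left, top, width, height)


# Illustrative input only; not the output of a real Fovea run.
box = parse_face_line("432 134 148 148")
corners = (box.left, box.top, box.right, box.bottom)
```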
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅️️ | ✅️ ️ |
At present, only Google supports landmark and location detection.
[user@host]$ fovea --landmarks ../ex/rattlesnake-ledge.jpg
0.35 Rattlesnake Lake 47.436158,-121.77812576293945
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅️️️ | ️️❌ | ✅️️ |
OCR is only supported in the JSON output mode, and its format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ✅️️ | ❌️ | ✅️️ | ❌ | ✅️️ |
Emotion detection is only supported in the JSON output mode, and its format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ❌ | ✅️️ |
Scene descriptions are only available in the JSON output mode, and their format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅️️ | ✅️️ | ❌ | ✅️️ | ✅️️ |
The parameters for NSFW and adult image detection vary somewhat between vendors. Google reports likelihood levels (such as VERY_UNLIKELY, LIKELY, and VERY_LIKELY) rather than scores; Fovea converts these to numeric values (0.01, 0.25, 0.50, 0.75, 0.99) to follow the convention established by the other services.
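The conversion of Google's likelihood levels can be pictured as a simple lookup. The sketch below is an illustration, not Fovea's actual code: the level names come from the Cloud Vision API's likelihood enumeration, and pairing them in ascending order with the five numeric values above is an assumption.

```python
# Assumed pairing of likelihood levels with the five numeric values.
LIKELIHOOD_SCORES = {
    "VERY_UNLIKELY": 0.01,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.50,
    "LIKELY": 0.75,
    "VERY_LIKELY": 0.99,
}


def likelihood_to_score(likelihood):
    """Return the numeric score for a likelihood name, or None if unrecognized."""
    return LIKELIHOOD_SCORES.get(likelihood.upper())


score = likelihood_to_score("UNLIKELY")
```

Returning None for unrecognized names (for example UNKNOWN) leaves the caller to decide whether to skip the result or treat it as an error.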
[user@host]$ fovea --adult --google test.jpg
0.25 nsfw
[user@host]$ fovea --adult --clarifai test.jpg
0.99 sfw
0.01 nsfw
[user@host]$ fovea --adult --microsoft test.jpg
0.13 nsfw
0.07 racy
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ✅️️ | ✅️️ | ✅️️ |
Categorization is only available in the JSON output mode, and its format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️ | ❌ | ✅️ ️ |
Image type detection is only available in the JSON output mode, and its format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️️ | ✅️️ | ❌ | ❌ | ✅️️ |
Dominant color detection is only available in the JSON output mode, and its format is vendor specific.
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅️ | ✅ | ✅️ | ✅️ | ✅ |
Celebrity face matches are reported in a seven field format (if --ontology is set), or a six field format (if --ontology is not set). These formats are described below.
- Left-X
- Top-Y
- Width
- Height
- Confidence Score
- Ontology Link or a placeholder (if --ontology is set)
- The Celebrity's Name
[user@host]$ fovea obamas.jpg --microsoft --celebrities
432 134 148 148 0.95 Barack Hussein Obama
279 191 117 117 1.00 Michelle Obama
In contrast to IBM and Microsoft, which return only their highest-confidence results, Clarifai returns a long list of possible matches for each face. Exclude lower-confidence matches with the --confidence <threshold> parameter.
[user@host]$ fovea --celebrities --clarifai --confidence 0.9 --ontology obamas.jpg
427 122 162 162 0.99 ai_5XjK3npz barack obama
266 179 140 140 0.95 ai_z2S44mJX michelle obama
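Because a celebrity's name can contain spaces, a script cannot simply split each line into a fixed number of columns. The sketch below reads the fixed leading fields and treats the remainder of the line as the name. The helper name `parse_celebrity_line` is hypothetical, and it assumes whitespace-separated fields, integer box coordinates, and an ontology link (or placeholder) without embedded spaces.

```python
def parse_celebrity_line(line, ontology=False):
    """Parse one celebrity result line; set ontology=True if --ontology was used."""
    fields = line.split()
    # Four box fields and a confidence score, plus the ontology link if present.
    fixed = 6 if ontology else 5
    if len(fields) <= fixed:
        raise ValueError(f"expected more than {fixed} fields: {line!r}")
    left, top, width, height = (int(v) for v in fields[:4])
    return {
        "box": (left, top, width, height),
        "confidence": float(fields[4]),
        "ontology": fields[5] if ontology else None,
        "name": " ".join(fields[fixed:]),
    }


plain = parse_celebrity_line("432 134 148 148 0.95 Barack Hussein Obama")
linked = parse_celebrity_line(
    "427 122 162 162 0.99 ai_5XjK3npz barack obama", ontology=True
)
```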
| Microsoft | Amazon | Clarifai | Watson | Imagga | S. Hound | Tabular | JSON | |
|---|---|---|---|---|---|---|---|---|
| ✅️ | ✅️ | ✅ |
Vehicle detection and recognition are only available with SightHound. Recognized cars are reported in a ten field format, described below.
- Left-X
- Top-Y
- Width
- Height
- Confidence Score (Make)
- Make (e.g. Cadillac)
- Confidence Score (Model)
- Model (e.g. Ats)
- Confidence Score (Color)
- Color (e.g. Black)
[user@host]$ fovea --sighthound --vehicles batmobile.jpg
15 112 580 238 0.08 Cadillac 0.08 Ats 0.66 black
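The ten-field vehicle format can be unpacked in the same way. This is a minimal sketch with a hypothetical helper name, `parse_vehicle_line`. It assumes whitespace-separated fields and single-word make, model, and color values, so a multi-word make would need a different delimiter or the JSON output mode.

```python
def parse_vehicle_line(line):
    """Parse one ten-field vehicle line into box, make, model, and color."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    left, top, width, height = (int(v) for v in fields[:4])
    return {
        "box": (left, top, width, height),
        # Each attribute is stored as (value, confidence).
        "make": (fields[5], float(fields[4])),
        "model": (fields[7], float(fields[6])),
        "color": (fields[9], float(fields[8])),
    }


vehicle = parse_vehicle_line("15 112 580 238 0.08 Cadillac 0.08 Ats 0.66 black")
```

Keeping each confidence score next to its value makes it straightforward to discard low-confidence attributes, such as the 0.08 make and model in the example above.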