This repo aims to provide a ready-to-go TensorFlow environment for image captioning inference using a pre-trained model. For training from scratch or fine-tuning, please refer to the Tensorflow Model Repo.
The Show and Tell model is a deep neural network that learns how to describe the content of images. For example:
Show and Tell: A Neural Image Caption Generator
A TensorFlow implementation of the image-to-text model described in the paper:
"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.
IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).
Full text available at: http://arxiv.org/abs/1609.06647
Please refer to the original Tensorflow Model Repo.
I strongly suggest running the following command in your CLI to get all the required packages:
pip install -r requirement.txt
Alternatively, you can install the required packages below manually:
- TensorFlow 1.0 or greater (instructions)
- NumPy (instructions)
- Natural Language Toolkit (NLTK):
  - First install NLTK (instructions)
  - Then install the NLTK data package "punkt" (instructions)
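Whichever install route you take, a quick check that the packages can be located may save a failed run later. The snippet below is a minimal sketch, not part of this repo: the helper name `find_missing` is hypothetical, and it assumes the packages are imported under the names `tensorflow`, `numpy`, and `nltk`.

```python
import importlib.util

# Import names for the packages listed above (assumed, see lead-in).
REQUIRED_MODULES = ("tensorflow", "numpy", "nltk")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be found. It does not check their versions or whether the NLTK "punkt" data is installed.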
Download inceptionv3 finetuned parameters over 1M. You will get 4 files; make sure to put all of them into im2txt/model/Hugh/train/:
- newmodel.ckpt-2000000.data-00000-of-00001
- newmodel.ckpt-2000000.index
- newmodel.ckpt-2000000.meta
- checkpoint
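To confirm that all 4 files landed in the right place before running inference, something like the following could be used. It is a minimal sketch under the assumption that you run it from the repo root; the helper name `missing_checkpoint_files` is hypothetical and not part of this repo.

```python
import os

# Directory and file names as listed above.
CHECKPOINT_DIR = "im2txt/model/Hugh/train"
EXPECTED_FILES = (
    "newmodel.ckpt-2000000.data-00000-of-00001",
    "newmodel.ckpt-2000000.index",
    "newmodel.ckpt-2000000.meta",
    "checkpoint",
)


def missing_checkpoint_files(directory=CHECKPOINT_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    if missing:
        print("Missing from %s: %s" % (CHECKPOINT_DIR, ", ".join(missing)))
    else:
        print("All 4 checkpoint files are in place.")
```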
Your downloaded Show and Tell model can generate captions for any JPEG image! The following command line will generate captions for such an image.
python im2txt/run_inference.py --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" --vocab_file="im2txt/data/Hugh/word_counts.txt" --input_files="im2txt/data/images/test.jpg"
Example output:
Captions for image test.jpg:
0) a young boy wearing a hat and tie . (p=0.000195)
1) a young boy wearing a blue shirt and tie . (p=0.000100)
2) a young boy wearing a blue shirt and a tie . (p=0.000045)
Note: you may get different results. Some variation between different models is expected.
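If you want to use the captions in another script, the printed lines can be parsed into structured values. The snippet below is a rough sketch that assumes the output has exactly the format shown in the example above (`N) caption (p=value)`). The helper name `parse_captions` is hypothetical and not part of this repo.

```python
import re

# Matches lines such as "  0) a young boy wearing a hat and tie . (p=0.000195)".
CAPTION_LINE = re.compile(r"^\s*(\d+)\)\s+(.*?)\s+\(p=([0-9.eE+-]+)\)\s*$")


def parse_captions(output):
    """Return (rank, caption, probability) tuples found in `output`."""
    results = []
    for line in output.splitlines():
        match = CAPTION_LINE.match(line)
        if match:
            rank, caption, prob = match.groups()
            results.append((int(rank), caption, float(prob)))
    return results


if __name__ == "__main__":
    sample = """Captions for image test.jpg:
  0) a young boy wearing a hat and tie . (p=0.000195)
  1) a young boy wearing a blue shirt and tie . (p=0.000100)
  2) a young boy wearing a blue shirt and a tie . (p=0.000045)"""
    for rank, caption, prob in parse_captions(sample):
        print(rank, caption, prob)
```

Lines that do not match the pattern, such as the "Captions for image ..." header, are skipped.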
Here is the image:
First, check this thread; you will likely find an answer there. Otherwise, open an issue and I will try to help you.