eggry/image-retrieval-demo

Image Retrieval Demo

This is a demo image search engine powered by BGE-VL and Faiss. It supports two retrieval modes:

  • Text-to-Image Retrieval: Search for images using a textual query.
  • Image+Text Retrieval: Search for images based on both an image and a textual query.

The demo uses BAAI/BGE-VL-large to extract image embeddings and leverages Faiss for efficient similarity search. Image embeddings are extracted and stored as a file for fast future retrieval.

Run

  1. Install all requirements.
    pip install -r requirements.txt
    
  2. Modify line 21 of demo.py to specify your device (gpu or cpu).

    Note: The model requires ~2GB VRAM for inference.

  3. Modify line 48 to point to the folder containing your images. The embeddings for these images will be generated and added to the index.
  4. Run demo.py with Python. You may use a Jupyter interactive window for a better experience.

Demo Overview

Embeddings were generated for 8,319 images. Each embedding is a 768-dimensional vector, requiring ~25MiB of storage for all vectors (~3MiB per 1,000 embeddings). Embedding extraction speed varies by hardware:

  • Intel i7-1255U: ~1.5 embeddings/sec.
  • NVIDIA RTX 2050: ~8.2 embeddings/sec.
  • NVIDIA A800: ~39.1 embeddings/sec.
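
The storage and timing figures above can be checked with a little arithmetic. The sketch below assumes each embedding value is stored as a 4-byte float32 (the README does not state the dtype); under that assumption the full set comes to about 24.4 MiB, in line with the ~25MiB quoted. The function names are made up for this example.

```python
NUM_IMAGES = 8_319
DIM = 768
BYTES_PER_VALUE = 4  # assumption: float32 storage

# Throughput figures (embeddings per second) as listed above.
THROUGHPUT = {
    "Intel i7-1255U": 1.5,
    "NVIDIA RTX 2050": 8.2,
    "NVIDIA A800": 39.1,
}


def storage_mib(num_vectors, dim=DIM, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of the embedding matrix in MiB, ignoring any file overhead."""
    return num_vectors * dim * bytes_per_value / 2**20


def extraction_minutes(num_vectors, embeddings_per_sec):
    """Estimated wall-clock time to embed num_vectors images, in minutes."""
    return num_vectors / embeddings_per_sec / 60


print(f"All {NUM_IMAGES} embeddings: {storage_mib(NUM_IMAGES):.1f} MiB")
print(f"Per 1,000 embeddings: {storage_mib(1000):.2f} MiB")
for device, rate in THROUGHPUT.items():
    print(f"{device}: {extraction_minutes(NUM_IMAGES, rate):.1f} min")
```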

For both retrieval modes, I used the same target image (a color wheel of Vocaloid characters) to test the performance.

For Text-to-Image Retrieval, I used the query: rainbow, color wheel, colorful, color circle. The target image ranked #79. This is quite low: many less relevant images ranked higher, so I initially missed the target image.
Text-to-Image Retrieval Results

For Image+Text Retrieval, I used the text query: Find a picture that also a color wheel, together with a query image, a different picture of a color wheel of Vocaloid characters. In this case, the target image ranked #12, which is better but still not ideal, as many less relevant images ranked higher.
Image+Text Retrieval Results

Acknowledgment

This demo was inspired by bd4sur/meme-search. Much of the code is adapted from the official demo of BGE-VL.

This demo was created to help me search for this color wheel of Vocaloid characters. Special thanks to @lapis_jam2 for drawing this colorful work and motivating me to build this demo :)
