This repository provides tools to convert PowerPoint presentations (.pptx
) into images for each slide and then detect logos on those slides using Azure OpenAI. The project includes two main scripts:
Converts each .pptx
file in a given input folder to PNG images for each slide. These images are saved in a specified output folder with filenames indicating the slide number. The conversion process involves:
- Converting
.pptx
files to PDF using LibreOffice. - Converting the PDF into PNG images for each slide using
pdftoppm
.
-
Place your
.pptx
files in the input folder (e.g.,data
). -
Run the script:
python convert_pptx_to_png.py
-
The script will save PNG images for each slide in the output folder (e.g.,
slides
).
The default directories for input and output are:
INPUT_FOLDER = 'data' # Path to the input folder with .pptx files
OUTPUT_FOLDER = 'slides' # Path to the output folder for PNG images
You can change INPUT_FOLDER
and OUTPUT_FOLDER
to customize where files are read and saved.
Processes each PNG image of a slide, sending it to an Azure OpenAI model that detects company logos. The script reads all PNG files in a specified folder, converts each image to base64, and sends it to the OpenAI API. Results are saved in an output file with the detected logos for each slide.
- Ensure that the PNG images from
convert_pptx_to_png.py
are in the slides directory (e.g.,slides
). - Run the script:
python detect_logos_gpt4o.py
- The results will be saved in an output file (
output/logos_output_gpt4o.txt
).
The default directory for PNG files is:
SLIDES_DIR = 'slides' # Path to the folder with PNG files for processing
You can change SLIDES_DIR
if your images are saved in a different directory.
Before running detect_logos_gpt4o.py
, create a .env
file in the project root with the necessary Azure OpenAI API credentials. The file should contain:
AZURE_OPENAI_API_KEY
: Your API key for Azure OpenAI.AZURE_OPENAI_ENDPOINT
: The endpoint URL for Azure OpenAI.AZURE_OPENAI_DEPLOYMENT_ID
: The deployment ID of your model.
An example .env
file:
AZURE_OPENAI_API_KEY=your_api_key
AZURE_OPENAI_ENDPOINT=https://your_endpoint.openai.azure.com
AZURE_OPENAI_DEPLOYMENT_ID=your_deployment_id
These values will be used by detect_logos_gpt4o.py
to authenticate and interact with the Azure OpenAI API.
- Python (3.11 or later)
- LibreOffice: Required for converting
.pptx
to PDF on Linux. - poppler-utils: Required for converting PDF to PNG.
- Java: Needed for LibreOffice if not already installed.
-
Install LibreOffice:
sudo apt update sudo apt install -y libreoffice-core libreoffice-impress libreoffice-script-provider-python sudo apt upgrade -y libreoffice
-
Install Java (if needed):
sudo apt install -y default-jre java -version sudo update-alternatives --config java echo "export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64" >> ~/.bashrc echo "export PATH=$PATH:$JAVA_HOME/bin" >> ~/.bashrc source ~/.bashrc
-
Install Poppler Utils:
sudo apt-get update sudo apt-get install poppler-utils
-
Test Installation:
soffice --headless --convert-to pdf ./data/sample_deck.pptx --outdir ./data/