gateway

OpenAI PDF Tool

Overview

This tool uses GPTScript Gateway and OpenAI's GPT-4o to process and extract text from PDF files. Each page of the PDF is processed individually and the extracted text is then consumed by the LLM.

Usage

Configure your GPTScript Gateway API key as an environment variable.
Run the tool with the PDF file you want to process.
The tool will output the extracted text for each page of the PDF.

Example

export GPTSCRIPT_GATEWAY_API_KEY="your_openai_api_key"
gptscript eval --tools github.com/gptscript-ai/pdf-tool/gateway "use /path/to/pdf/file.pdf and report the contents of the file"

Detailed Description

tool.gpt

Name: pdf_vision
Description: Convert PDF to images and use GPT-4o vision to parse out text info.
Params:
- file_path: Path to the PDF file to analyze.
- prompt: Information to extract from the PDF.
- max_tokens: Integer value of tokens to have created by the LLM. Default is 300.

tool.py

The tool.py script performs the following steps:

Convert PDF to Images: Each page of the PDF is converted to an image using the fitz library (PyMuPDF).
Encode Image: The image is encoded to a base64 string.
Send Image to OpenAI: The base64 image is sent to OpenAI through GPTScript Gateway for analysis.
Output Extracted Data: The extracted text is printed to the console.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
requirements.txt		requirements.txt
tool.gpt		tool.gpt
tool.py		tool.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gateway

gateway

README.md

OpenAI PDF Tool

Overview

Usage

Example

Detailed Description

tool.gpt

tool.py

Files

gateway

Directory actions

More options

Directory actions

More options

Latest commit

History

gateway

Folders and files

parent directory

README.md

OpenAI PDF Tool

Overview

Usage

Example

Detailed Description

tool.gpt

tool.py