Skip to content

Latest commit

 

History

History

gateway

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

OpenAI PDF Tool

Overview

This tool uses GPTScript Gateway and OpenAI's GPT-4o to process and extract text from PDF files. Each page of the PDF is processed individually and the extracted text is then consumed by the LLM.

Usage

  1. Configure your GPTScript Gateway API key as an environment variable.
  2. Run the tool with the PDF file you want to process.
  3. The tool will output the extracted text for each page of the PDF.

Example

export GPTSCRIPT_GATEWAY_API_KEY="your_openai_api_key"
gptscript eval --tools github.com/gptscript-ai/pdf-tool/gateway "use /path/to/pdf/file.pdf and report the contents of the file"

Detailed Description

tool.gpt

  • Name: pdf_vision
  • Description: Convert PDF to images and use GPT-4o vision to parse out text info.
  • Params:
    • file_path: Path to the PDF file to analyze.
    • prompt: Information to extract from the PDF.
    • max_tokens: Integer value of tokens to have created by the LLM. Default is 300.

tool.py

The tool.py script performs the following steps:

  1. Convert PDF to Images: Each page of the PDF is converted to an image using the fitz library (PyMuPDF).
  2. Encode Image: The image is encoded to a base64 string.
  3. Send Image to OpenAI: The base64 image is sent to OpenAI through GPTScript Gateway for analysis.
  4. Output Extracted Data: The extracted text is printed to the console.