This tool uses GPTScript Gateway and OpenAI's GPT-4o to process and extract text from PDF files. Each page of the PDF is processed individually and the extracted text is then consumed by the LLM.
- Configure your GPTScript Gateway API key as an environment variable.
- Run the tool with the PDF file you want to process.
- The tool will output the extracted text for each page of the PDF.
export GPTSCRIPT_GATEWAY_API_KEY="your_openai_api_key"
gptscript eval --tools github.com/gptscript-ai/pdf-tool/gateway "use /path/to/pdf/file.pdf and report the contents of the file"
- Name: pdf_vision
- Description: Convert PDF to images and use GPT-4o vision to parse out text info.
- Params:
file_path
: Path to the PDF file to analyze.prompt
: Information to extract from the PDF.max_tokens
: Integer value of tokens to have created by the LLM. Default is 300.
The tool.py
script performs the following steps:
- Convert PDF to Images: Each page of the PDF is converted to an image using the
fitz
library (PyMuPDF). - Encode Image: The image is encoded to a base64 string.
- Send Image to OpenAI: The base64 image is sent to OpenAI through GPTScript Gateway for analysis.
- Output Extracted Data: The extracted text is printed to the console.