llm-pdf-retrieval

This small project uses Large Language Models (LLMs) to automatically extract structured data from a set of scholarly articles in PDF format. It combines a Mistral model, lightweight Retrieval-Augmented Generation (RAG), and LangChain to process the input documents and identify the key details specified by the user. The main script writes a JSON file containing the key information retrieved from one or more articles.

Configuration

Edit config.py to set your own INPUT_PATH and OUTPUT_PATH.
Edit config.py to add your own Mistral model under MODEL_NAME.
Toggle PARSER_USAGE in config.py to True if you would like to use a specific parser.
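
As a rough illustration of the settings listed above, a config.py could look like the sketch below. Only the four setting names come from this README; every value is a placeholder assumption (including the use of pathlib paths and the model alias), so adapt them to the actual file in the repository.

```python
# config.py (illustrative sketch; all values are placeholder assumptions)
from pathlib import Path

# Location of the PDF articles to process (assumed here to be a folder).
INPUT_PATH = Path("pdfs")

# File where the extracted information is written (assumed to be a JSON file).
OUTPUT_PATH = Path("outputs") / "output.json"

# Name of the Mistral model to query; replace with the model you have access to.
MODEL_NAME = "mistral-small-latest"

# Set to True to use the specific parser mentioned above.
PARSER_USAGE = False
```

Keeping these values in one module means main.py can be rerun on a different set of articles by editing a single file.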

Installation

To install and run the project locally, follow the steps below:

Obtain a Mistral API key (see Mistral API Configuration below).
Install Python. Version 3.12.3 was used for this development.
Clone the repository from terminal (git must be installed):
```
git clone https://github.com/carobs9/llm-pdf-retrieval.git
```
- Cloning a repository
Navigate to the project directory:
```
cd [YOUR PROJECT DIRECTORY]
```
Create a virtual environment:
```
python3.12 -m venv <env_name>
```
- Creating Virtual Environments
Activate the virtual environment:

Mac:
```
source <env_name>/bin/activate
```
Windows:
```
./<env_name>/Scripts/activate
```
Linux:
```
source <env_name>/bin/activate
```
- Activating a virtual environment

Install the dependencies:
```
pip install -r requirements.txt
```
- Installing Packages

Mistral API Configuration

This project uses a Mistral LLM to obtain results.

Go to the official Mistral website and click on "Try the API".
Create an account and click on API Keys.
Click on "Create new key" and store it in your environment file.
In a PowerShell terminal, set the key as an environment variable for the current session:
```
$env:MISTRAL_API_KEY = "[your_api_key]"
```
In the same terminal, run:
```
python main.py
```
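
Because the key is passed through the MISTRAL_API_KEY environment variable, a script can check for it before making any API call. The helper below is a minimal sketch of such a check; the function name get_mistral_api_key is hypothetical and is not taken from main.py.

```python
import os


def get_mistral_api_key(env=None):
    """Return the Mistral API key, or raise a clear error if it is missing.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    key = env.get("MISTRAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set. Set it in the current terminal "
            "session before running main.py."
        )
    return key
```

Failing early with an explicit message is usually easier to diagnose than an authentication error raised later by the API client.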

Structure

llm-pdf-retrieval
|  |___ config.py
|  |___ main.py
|  |___ README.md
|  |___ requirements.txt
|
|___ outputs/
|     |___ output.json
|
|___ pdfs/
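
Given the layout above, the input and output handling can be sketched with the standard library alone. The helper names list_pdfs and write_output are hypothetical, and the shape of the results dictionary is an assumption; the actual main.py may organise this differently.

```python
import json
from pathlib import Path


def list_pdfs(input_dir):
    """Return the PDF files directly inside input_dir, sorted by name."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def write_output(results, output_file):
    """Write the extracted results as JSON, creating the folder if needed."""
    output_file = Path(output_file)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)
```

For example, `write_output({"paper.pdf": {"title": "..."}}, "outputs/output.json")` would create outputs/ if it does not exist and store one entry per article.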
