To install AutoRAG, you can use pip:

```bash
pip install AutoRAG
```
Plus, it is recommended to install the PyOpenSSL and nltk libraries for full features.

```bash
pip install --upgrade pyOpenSSL
pip install nltk
python3 -c "import nltk; nltk.download('punkt_tab')"
python3 -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')"
```
Having trouble with installation?
First, check out the [troubleshooting](troubleshooting.md) page.
AutoRAG is not fully supported on Windows yet. There are several constraints for Windows users:

- The TART, UPR, and MonoT5 passage rerankers do not support Windows.
- Parsing might not work properly in a Windows environment.
- The FlagEmbedding passage reranker cannot be used with a batch size of 1. The default batch size is 64.

Due to these constraints, we recommend using Docker images to run AutoRAG on Windows.
Also, Windows users must make sure to upgrade to v0.3.1 or later.
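As an illustration of the constraints above, a small platform check like this sketch (a hypothetical helper, not part of AutoRAG) can warn Windows users before they hit them:

```python
import platform
import warnings

# Hypothetical helper, not part of AutoRAG: known Windows limitations.
WINDOWS_LIMITATIONS = [
    "TART, UPR, and MonoT5 passage rerankers are unsupported",
    "parsing might not work properly",
    "FlagEmbedding passage reranker cannot run with batch size 1",
]

def check_windows_support() -> bool:
    """Return True if the platform is fully supported, warning otherwise."""
    if platform.system() == "Windows":
        for limitation in WINDOWS_LIMITATIONS:
            warnings.warn(f"AutoRAG on Windows: {limitation}")
        return False
    return True
```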
For using local models, you need to install some additional dependencies.

```bash
pip install "AutoRAG[gpu]"
```
For parsing, you need to install some local packages like libmagic, tesseract, and poppler. The installation method depends on your OS.
After installing these, you can install AutoRAG with parsing support like below.

```bash
pip install "AutoRAG[parse]"
```
You can install optional dependencies for the Korean language.

```bash
pip install "AutoRAG[ko]"
```

After that, you have to install JDK 17 to use konlpy.
Plus, remember to set the environment variables for the JDK (JAVA_HOME and PATH).
The instructions for Mac users are here.
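Since konlpy can fail with confusing errors when the JDK is missing, a quick environment check like this sketch (hypothetical, not part of AutoRAG or konlpy) may help:

```python
import os
import shutil

def jdk_ready() -> bool:
    """Check that JAVA_HOME is set and `java` is on PATH, as konlpy requires."""
    java_home = os.environ.get("JAVA_HOME")
    java_bin = shutil.which("java")
    if not java_home:
        print("JAVA_HOME is not set")
    if not java_bin:
        print("java executable not found on PATH")
    return bool(java_home and java_bin)
```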
You can install optional dependencies for the Japanese language.

```bash
pip install "AutoRAG[ja]"
```
To use LLM and embedding models, it is common to use OpenAI models. If you want to use other models, check out here.
You need to set the OPENAI_API_KEY environment variable. You can get your API key here.

```bash
export OPENAI_API_KEY="sk-...your-api-key..."
```
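AutoRAG reads this variable from the process environment; you can confirm it is visible with a quick stdlib check like this sketch (the helper name is our own, not an AutoRAG function):

```python
import os

def openai_key_present() -> bool:
    """Return whether OPENAI_API_KEY is visible to this process."""
    return bool(os.environ.get("OPENAI_API_KEY"))
```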
Or, as an alternative, you can set the env variable using a `.env` file.

```bash
pip install python-dotenv
```

Then, make a `.env` file at your root folder like below.

```
OPENAI_API_KEY=sk-...your-api-key...
```
And when you run AutoRAG, you can use the code below to load the `.env` file.

```python
from dotenv import load_dotenv

load_dotenv()
```
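If you prefer not to add the python-dotenv dependency, a minimal stdlib sketch of the same idea looks like the following. Unlike the real library, it only handles simple `KEY=VALUE` lines and `#` comments:

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> None:
    """Naive .env loader: KEY=VALUE lines, '#' comments; existing vars take precedence."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Like load_dotenv's default, do not overwrite variables already set.
        os.environ.setdefault(key.strip(), value.strip())
```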
And you are ready to use AutoRAG!
If you want to build AutoRAG from source, the first step is to clone the AutoRAG repository.

```bash
git clone https://github.com/Marker-Inc-Korea/AutoRAG.git
```

And install AutoRAG in editable mode.

```bash
cd AutoRAG
pip install -e .
```
Then, for testing and building the documentation, you need to install some additional packages.

```bash
pip install -r tests/requirements.txt
pip install -r docs/requirements.txt
```
For testing, you have to set the environment variable in `pytest.ini`.
Make a new `pytest.ini` file at the root of the project and write the below.

```ini
[pytest]
env =
    OPENAI_API_KEY=sk-...your-api-key...

log_cli=true
log_cli_level=INFO
```
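The `env =` option comes from the pytest-env plugin. As a hedged alternative sketch, a `conftest.py` at the project root can set the same variable with the stdlib only:

```python
# conftest.py -- hypothetical alternative to the pytest-env plugin's `env =` option
import os

def pytest_configure(config):
    """Called by pytest at startup; set test env vars unless already set."""
    os.environ.setdefault("OPENAI_API_KEY", "sk-...your-api-key...")
```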
After that, you can run the tests with pytest.

```bash
python -m pytest -n auto
```
After this, please check out our documentation for contributors. We are still writing it, so please bear with us for a while.
Tip: If you want to build an image for a GPU version, you can use `autoraghq/autorag:gpu` or `autoraghq/autorag:gpu-parsing`.
To run AutoRAG using Docker, follow these steps:

```bash
docker build --target production -t autorag:prod .
```

This command builds the production-ready Docker image, using only the `production` stage defined in the `Dockerfile`.
Run the container with the following command:

```bash
docker run --rm -it \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v $(pwd)/sample_config:/usr/src/app/sample_config \
  -v $(pwd)/projects:/usr/src/app/projects \
  autoraghq/autorag:api evaluate \
  --config /usr/src/app/sample_config/rag/simple/simple_openai.yaml \
  --qa_data_path /usr/src/app/projects/test01/qa_validation.parquet \
  --corpus_data_path /usr/src/app/projects/test01/corpus.parquet \
  --project_dir /usr/src/app/projects/test01
```
- `-v ~/.cache/huggingface:/root/.cache/huggingface`: Mounts the host's Hugging Face cache into the container, allowing it to access pre-downloaded models.
- `-v $(pwd)/sample_config:/usr/src/app/sample_config`: Mounts the local `sample_config` directory into the container.
- `-v $(pwd)/projects:/usr/src/app/projects`: Mounts the local `projects` directory into the container.
- `autoraghq/autorag:api evaluate`: Executes the `evaluate` command inside the `autoraghq/autorag:api` container.
- `--config`, `--qa_data_path`, `--corpus_data_path`, `--project_dir`: Specify the paths to the configuration file, QA dataset, corpus data, and project directory.
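Since the run will fail if the bind-mount source directories are missing, a small host-side pre-flight check (a sketch, not an official AutoRAG script) can save a round-trip:

```bash
# Hypothetical pre-flight check: confirm the bind-mount source directories exist.
for dir in sample_config projects; do
  if [ -d "$dir" ]; then
    echo "ok: $dir"
  else
    echo "missing: $dir (create it before running docker run)"
  fi
done
```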
Alternatively, you can mount the Hugging Face cache to a custom location inside the container and set the `HF_HOME` environment variable:
```bash
docker run --rm -it \
  -v ~/.cache/huggingface:/cache/huggingface \
  -v $(pwd)/sample_config:/usr/src/app/sample_config \
  -v $(pwd)/projects:/usr/src/app/projects \
  -e HF_HOME=/cache/huggingface \
  autoraghq/autorag:api evaluate \
  --config /usr/src/app/sample_config/rag/simple/simple_openai.yaml \
  --qa_data_path /usr/src/app/projects/test01/qa_validation.parquet \
  --corpus_data_path /usr/src/app/projects/test01/corpus.parquet \
  --project_dir /usr/src/app/projects/test01
```
- `-v ~/.cache/huggingface:/cache/huggingface`: Mounts the host's Hugging Face cache to `/cache/huggingface` inside the container.
- `-e HF_HOME=/cache/huggingface`: Sets the `HF_HOME` environment variable to point to the mounted cache directory.
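Hugging Face libraries resolve their cache root from `HF_HOME`, falling back to `~/.cache/huggingface` when it is unset. A simplified stdlib sketch of that resolution order (an approximation, not the libraries' actual code):

```python
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    """Approximate how Hugging Face libraries resolve the cache root (simplified)."""
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        return Path(hf_home)
    return Path.home() / ".cache" / "huggingface"
```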
To manually access the container for debugging or testing, start a Bash shell:

```bash
docker run --rm -it --entrypoint /bin/bash autoraghq/autorag:api
```
This command allows you to explore the container’s filesystem, run commands manually, or inspect logs for troubleshooting.
To use the GPU version, you must install CUDA and cuDNN on your host system. The image is built on CUDA 11.8 and the PyTorch Docker image.
```bash
docker run --rm -it \
  -v ~/.cache/huggingface:/cache/huggingface \
  -v $(pwd)/sample_config:/usr/src/app/sample_config \
  -v $(pwd)/projects:/usr/src/app/projects \
  -e HF_HOME=/cache/huggingface \
  --gpus all \
  autoraghq/autorag:gpu evaluate \
  --config /usr/src/app/sample_config/rag/simple/simple_openai.yaml \
  --qa_data_path /usr/src/app/projects/test01/qa_validation.parquet \
  --corpus_data_path /usr/src/app/projects/test01/corpus.parquet \
  --project_dir /usr/src/app/projects/test01
```

Be sure to include the `--gpus all` flag.
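Before reaching for `--gpus all`, you can confirm the host actually exposes an NVIDIA driver. This is a generic check, not specific to AutoRAG:

```bash
# Check for the NVIDIA driver CLI; --gpus all fails without drivers and the container toolkit.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader
else
  echo "nvidia-smi not found: install NVIDIA drivers and nvidia-container-toolkit"
fi
```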
- Ensure that the necessary directories (`sample_config` and `projects`) are present on the host system.
- If running in a CI/CD pipeline, consider using environment variables or `.env` files to manage API keys and paths dynamically.