- Clone the repo
- Create and activate a conda environment
conda create -n SUNAR python=3.10.3
conda activate SUNAR
- Install the package in editable mode
pip install -e .
Download the 2wikimultihopQA data archive with
curl https://gitlab.tudelft.nl/anonymous_arr/bcqa_data/-/raw/main/2wikimultihopQA.zip -o 2wikimultihopQA.zip
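The downloaded archive has to be unpacked before it can be used. The following is a minimal sketch using only the Python standard library. The destination folder `data` and the helper name `extract_archive` are assumptions for illustration; the evaluation scripts may expect a different location, so check the paths in evaluation/config.ini.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="data"):
    """Unpack a zip archive into dest_dir and return the member names.

    dest_dir="data" is an assumption, not something the repo prescribes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    archive = Path("2wikimultihopQA.zip")
    if archive.exists():
        for name in extract_archive(archive):
            print(name)
    else:
        print(f"{archive} not found; download it first")
```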
To download the document graph of the corpus used in neighborhood adaptive retrieval for MusiqueQA and WikimultihopQA, use this link: https://drive.google.com/drive/folders/1zyWtCyhQzxaMQpM6uXT5oqvtqFMg7nYl?usp=sharing
If you want to create your own corpus graph, run
python evaluation/form_corpus_graph.py
Download wiki_musique_corpus.json from https://drive.google.com/drive/folders/1zyWtCyhQzxaMQpM6uXT5oqvtqFMg7nYl?usp=drive_link and save it in the data folder.
If you do not want to run SPLADE, download wqa_splade_docs.json from https://drive.google.com/drive/folders/1zyWtCyhQzxaMQpM6uXT5oqvtqFMg7nYl?usp=drive_link and place it inside data/intermediate_outputs/
In evaluation/config.ini, configure the corresponding paths to the downloaded files. Add the project root directory to the PYTHONPATH variable.
Additionally, set the following environment variables in the terminal
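Before moving on, it can help to confirm that the downloaded files are where the instructions place them. This is a small sketch, not part of the repo. It assumes the `data` folder sits directly under the project root, and `missing_files` is a hypothetical helper name. The second file is only needed when skipping the SPLADE run.

```python
from pathlib import Path

# Locations taken from the instructions above, assumed relative to the project root.
EXPECTED_FILES = [
    "data/wiki_musique_corpus.json",
    "data/intermediate_outputs/wqa_splade_docs.json",
]


def missing_files(project_root=".", expected=EXPECTED_FILES):
    """Return the expected files that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected data files are present.")
```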
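A quick way to catch a mistyped path is to read the config file back and test each value. The sketch below makes no assumption about section or key names in evaluation/config.ini; it reads whatever is there. `report_config_paths` is a hypothetical helper, and any entry that is not a file path is simply reported as not found.

```python
import configparser
from pathlib import Path


def report_config_paths(config_file="evaluation/config.ini"):
    """Return (section, key, value, exists) for every entry in the config file.

    Section and key names come from the file itself; none are assumed.
    """
    # interpolation=None keeps literal '%' characters in paths from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read returns the list of files it managed to parse.
    if not parser.read(config_file):
        raise FileNotFoundError(config_file)
    rows = []
    for section in parser.sections():
        for key, value in parser.items(section):
            exists = bool(value) and Path(value).expanduser().exists()
            rows.append((section, key, value, exists))
    return rows


if __name__ == "__main__":
    try:
        for section, key, value, exists in report_config_paths():
            status = "ok" if exists else "not found"
            print(f"[{section}] {key} = {value} ({status})")
    except FileNotFoundError as err:
        print(f"Config file not found: {err}")
```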
export PYTHONPATH=/path
export OPENAI_KEY=<your openai key>
export huggingface_token=<your huggingface token to access models>
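A missing variable usually only surfaces partway through a run, so it may be worth checking up front. This is a minimal sketch using the variable names from the export commands above; `unset_variables` is a hypothetical helper, not part of the repo.

```python
import os

# Variable names exactly as used in the export commands above.
REQUIRED_VARS = ["PYTHONPATH", "OPENAI_KEY", "huggingface_token"]


def unset_variables(environ=None, required=REQUIRED_VARS):
    """Return the required variables that are missing or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    unset = unset_variables()
    if unset:
        print("Please export: " + ", ".join(unset))
    else:
        print("All required environment variables are set.")
```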
To run first-stage retrieval, run
python evaluation/wikimultihop/run_splade_inference.py
To directly reproduce the best results in the paper (Searchain+SUNAR) in Table 2, run
python evaluation/wikimultihop/llms/run_searchain_sunar.py
Note: the above script uses evidence saved from a run of the SUNAR algorithm and performs only inference, to enable easy reproduction of the results in the paper.
To run SUNAR end-to-end and reproduce the SUNAR numbers in Table 1, run
python evaluation/wikimultihop/llms/run_sunar.py