Final project for the Information Retrieval and Generation class
git clone https://github.com/olaghattas/irg_final_project.git
cd irg_final_project
- (Recommended) create/activate a conda environment:
conda create -n rag python=3.11
conda activate rag
- Install Python packages:
pip install -r requirements.txt
cd src/getting_started
python download_dataset.py
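For reference, this step fetches the LitSearch queries and corpus into dataset/. A minimal sketch of what such a download script can look like is below, assuming the data is pulled from the Hugging Face Hub; the repository id and config names are assumptions and may differ from what download_dataset.py actually does.

```python
# Hypothetical sketch of the dataset download step (not the repo's exact code).
# The dataset id and config names below are assumptions inferred from the
# directory names used later in this README.
from datasets import load_dataset

queries = load_dataset("princeton-nlp/LitSearch", "query")
corpus = load_dataset("princeton-nlp/LitSearch", "corpus_clean")

queries.save_to_disk("dataset/LitSearch_query")
corpus.save_to_disk("dataset/LitSearch_corpus_clean")
```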
Make sure you have Ollama installed on your system. To install it, follow the instructions at https://ollama.com/download.
After installing Ollama, run the following to pull the base model, build the 16k-context variant, and start the server:
cd <project_root>
ollama pull llama3.1:8b-instruct-q8_0
ollama create llama3.1:8b-instruct-q8_0-16k -f src/getting_started/Modelfile
ollama serve # Starts the Ollama server; keep it running (e.g. in a separate terminal) while using the methods below.
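To verify that the server and the 16k-context model are reachable before running any method, you can send a quick test request to Ollama's local REST API. This snippet is illustrative and not part of the repository:

```python
# Optional sanity check: query the local Ollama server via its REST API to
# confirm the custom model built above is available.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b-instruct-q8_0-16k",
        "prompt": "Reply with OK.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```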
cd irg_final_project
python3 src/methods/run_scratch.py \
--corpus dataset/LitSearch_corpus_clean \
--query dataset/LitSearch_query \
--topk 50
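The run-file name suggests SMART ltc.nnn weighting: documents use log-tf times idf with cosine normalization, and queries use raw term counts with no idf and no normalization. A rough sketch of that scheme is below; the actual run_scratch.py implementation may differ in details such as the log base or tokenization.

```python
# Illustrative ltc.nnn scoring in SMART notation (not the repo's exact code).
import math
from collections import Counter

def ltc_doc_vector(doc_terms, df, n_docs):
    # l: 1 + log(tf), t: log(N / df), c: cosine (L2) normalization
    tf = Counter(doc_terms)
    weights = {t: (1 + math.log(c)) * math.log(n_docs / df[t]) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}

def nnn_query_score(query_terms, doc_vec):
    # n.n.n on the query side: raw tf, no idf, no normalization
    q_tf = Counter(query_terms)
    return sum(c * doc_vec.get(t, 0.0) for t, c in q_tf.items())
```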
python3 src/methods/run_ce.py \
--run_file run_files/ltc_nnn_scratch_topk_1000.run \
--corpus dataset/LitSearch_corpus_clean \
--query dataset/LitSearch_query \
--topk 50
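As the script name suggests, run_ce.py reranks the top-k candidates from an existing run file. If the reranker is a sentence-transformers cross-encoder, the core step looks roughly like the sketch below; the model name is illustrative, not necessarily the one used by the script.

```python
# Rough sketch of cross-encoder reranking (illustrative model name).
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates):
    """candidates: list of (doc_id, doc_text) taken from the first-stage run file."""
    scores = model.predict([(query, text) for _, text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(doc_id, float(score)) for (doc_id, _), score in ranked]
```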
Dense retrieval dependencies conflict with the LLM reranker, so this method must be run in an environment created using requirements_dense.txt (not requirements.txt).
Run from the project root:
cd irg_final_project
python3 src/methods/BM25_LLMExp_DenseRetrieval.py
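For intuition, the dense-retrieval stage of this method can be sketched with a sentence-transformers bi-encoder; the encoder name and helper below are assumptions, not the script's actual code.

```python
# Illustrative dense-retrieval step: encode query and documents, rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed model

def dense_topk(query, doc_ids, doc_texts, k=50):
    q_emb = encoder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
    d_emb = encoder.encode(doc_texts, convert_to_tensor=True, normalize_embeddings=True)
    scores = util.cos_sim(q_emb, d_emb)[0]            # one similarity per document
    top = scores.topk(min(k, len(doc_texts)))
    return [(doc_ids[i], float(s)) for s, i in zip(top.values, top.indices)]
```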
This also requires the requirements_dense.txt environment.
Run from the project root:
cd irg_final_project
python3 src/methods/tfidf_DenseRetrieval.py --runfile tfidf_runfile.run
tfidf_runfile.run must be generated before running the script (see Step 6: TF-IDF (lnc.nnn)). Ensure the file is located inside the run_files directory.
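Run files use the standard TREC format, one line per retrieved document: query_id Q0 doc_id rank score tag. A minimal parser, should you want to inspect one (illustrative, not part of the repo):

```python
# Minimal TREC run-file parser (illustrative).
from collections import defaultdict

def read_trec_run(path):
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _, docid, rank, score, _tag = line.split()
            run[qid].append((docid, int(rank), float(score)))
    return run

run = read_trec_run("run_files/tfidf_runfile.run")
```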
This notebook must be run in the environment created with requirements.txt.
- First generate the dense-retrieval runfile using Step 2.
- Update the Jupyter notebook with the correct path to the generated runfile.
- Run the Jupyter notebook BM25_LLMExp_DenseRetrieval_LLMRe-rank.ipynb
Run the Jupyter notebook bm25_LLMExp_LLMRerank.ipynb
Run the Jupyter notebook tfidf_lnc_nnn.ipynb
Run the Jupyter notebook tfidf_lnc-nnn_ThesaurusExp_LLMRerank.ipynb
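The LLM-reranking notebooks prompt the local Ollama model to score query-document pairs. Conceptually, the step looks something like the sketch below; the prompt wording, model name, and score parsing are illustrative only, not taken from the notebooks.

```python
# Conceptual sketch of LLM reranking via the local Ollama server (illustrative).
import re
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def llm_relevance_score(query, doc_text, model="llama3.1:8b-instruct-q8_0-16k"):
    prompt = (
        "On a scale from 0 to 10, how relevant is this paper to the query?\n"
        f"Query: {query}\nPaper: {doc_text}\n"
        "Answer with a single number only."
    )
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    match = re.search(r"\d+(\.\d+)?", r.json()["response"])
    return float(match.group()) if match else 0.0
```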
python evaluate.py --qrels <qrels_path> --runs <run1> <run2> ... --metric <metric_name> --output <output_dir>
- --qrels: Path to qrels file (TREC format)
- --runs: One or more run files (TREC format)
- --metric: Metric to compute (ndcg@K, p@K, p@R, ap, map)
- --output: Directory where result files are written
The script will generate:
- Per-query CSV results
- A summary CSV file
- Printed summary statistics
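As an independent sanity check, the same kinds of metrics can be computed for a single run with pytrec_eval. The example below is not the repo's evaluate.py, and the qrels path is illustrative:

```python
# Independent check with pytrec_eval: MAP, nDCG@10, and P@10 for one TREC run.
import pytrec_eval

def load_qrels(path):
    qrels = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels

def load_run(path):
    run = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, _rank, score, _tag = line.split()
            run.setdefault(qid, {})[docid] = float(score)
    return run

qrels = load_qrels("dataset/qrels.txt")  # illustrative path
run = load_run("run_files/ltc_nnn_scratch_topk_1000.run")
evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg_cut_10", "P_10"})
per_query = evaluator.evaluate(run)
print("MAP:", sum(m["map"] for m in per_query.values()) / len(per_query))
```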