G-Retriever

This repository contains the source code for the paper "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering".

G-Retriever is a flexible question-answering framework for real-world textual graphs, applicable to tasks such as scene graph understanding, common sense reasoning, and knowledge graph reasoning. Here, we attempt to adapt it for anomaly explanation.

G-Retriever integrates the strengths of Graph Neural Networks (GNNs), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), and can be fine-tuned to enhance graph understanding via soft prompting.
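As a rough illustration of soft prompting over a graph, the toy sketch below summarises node embeddings by mean pooling, maps the result into the LLM's token-embedding space with a single linear projection, and prepends it to the question's token embeddings. The pooling choice, the single linear layer, the plain-list vectors, and every function name here (`mean_pool`, `project`, `build_llm_input`) are assumptions for illustration, not the repository's code. In the frozen-LLM setting, only parameters such as `weights` and `bias` (plus the GNN producing the node embeddings) would be trained.

```python
"""Toy graph soft prompt: pool node embeddings, project, prepend (illustrative only)."""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(node_embeddings: List[Vector]) -> Vector:
    """Average per-node vectors into one graph-level vector (assumed readout)."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(vec[i] for vec in node_embeddings) / n for i in range(dim)]


def project(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute y = W x + b, mapping GNN space to the LLM embedding size (len(weights))."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def build_llm_input(node_embeddings: List[Vector], weights: Matrix, bias: Vector,
                    token_embeddings: List[Vector]) -> List[Vector]:
    """Prepend the projected graph vector as a single soft-prompt 'token'."""
    graph_token = project(weights, bias, mean_pool(node_embeddings))
    return [graph_token] + token_embeddings


if __name__ == "__main__":
    nodes = [[1.0, 3.0], [3.0, 5.0]]            # two nodes, GNN dim 2
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # projects to LLM dim 3
    b = [0.0, 0.0, 0.5]
    question_tokens = [[0.1, 0.2, 0.3]]         # one question token, dim 3
    print(build_llm_input(nodes, W, b, question_tokens))
    # [[2.0, 4.0, 6.5], [0.1, 0.2, 0.3]]
```

In a real model the vectors would be tensors and the projection a learned layer; the plain lists only keep the data flow visible.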

Here are some resources to learn more about the dataset we constructed, how we constructed it, and its results:
"Medium Article"
"Slides"

Citation

@article{he2024g,
  title={G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering},
  author={He, Xiaoxin and Tian, Yijun and Sun, Yifei and Chawla, Nitesh V and Laurent, Thomas and LeCun, Yann and Bresson, Xavier and Hooi, Bryan},
  journal={arXiv preprint arXiv:2402.07630},
  year={2024}
}

Environment setup

conda create --name g_retriever python=3.9 -y
conda activate g_retriever

# https://pytorch.org/get-started/locally/
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.version.cuda)"
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu118.html

pip install peft
pip install pandas
pip install ogb
pip install transformers
pip install wandb
pip install sentencepiece
pip install torch_geometric
pip install datasets
pip install pcst_fast
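The version checks above determine which PyG wheel index to use. The helper below is not part of the repository (`pyg_wheel_index`, `missing_modules` and `REQUIRED_MODULES` are hypothetical names). It derives the `-f` URL from the printed torch and CUDA versions, following the same pattern as the command above, and lists any of the pip-installed packages that cannot be found. It assumes each package's import name matches its pip name, which holds for the packages listed here; whether a wheel index actually exists for a given version combination is not guaranteed.

```python
"""Environment sanity checks for the setup above (a sketch, not part of the repo)."""
import importlib.util
from typing import List

# Import names of the packages installed with pip above.
REQUIRED_MODULES = [
    "peft", "pandas", "ogb", "transformers", "wandb",
    "sentencepiece", "torch_geometric", "datasets", "pcst_fast",
]


def pyg_wheel_index(torch_version: str, cuda_version: str) -> str:
    """Build the PyG wheel index URL passed to `pip install -f`.

    torch_version: value of `torch.__version__`, e.g. "2.0.1" or "2.0.1+cu118".
    cuda_version: value of `torch.version.cuda`, e.g. "11.8".
    """
    base = torch_version.split("+")[0]              # drop a local tag such as "+cu118"
    cuda_tag = "cu" + cuda_version.replace(".", "")  # "11.8" -> "cu118"
    return f"https://data.pyg.org/whl/torch-{base}+{cuda_tag}.html"


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(pyg_wheel_index("2.0.1", "11.8"))
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing or "none")
```

`find_spec` only locates a module without importing it, so the check is cheap even for heavy packages such as `transformers`.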

Data Preprocessing

# expla_graphs
python -m src.dataset.preprocess.expla_graphs
python -m src.dataset.expla_graphs

# scene_graphs, might take a while
python -m src.dataset.preprocess.scene_graphs
python -m src.dataset.scene_graphs

# webqsp
python -m src.dataset.preprocess.webqsp
python -m src.dataset.webqsp
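Each dataset is prepared in two steps: first the `src.dataset.preprocess.<name>` module, then `src.dataset.<name>`. The sketch below is a hypothetical helper, not shipped with the repository, that runs both steps for all three datasets in the order listed above and stops at the first failing step. It assumes it is launched from the repository root so that the `src.dataset` modules resolve. By default it only prints the plan; pass `--run` to execute it.

```python
"""Run both preprocessing steps for each dataset, in order (hypothetical helper)."""
import subprocess
import sys
from typing import List

DATASETS = ["expla_graphs", "scene_graphs", "webqsp"]


def preprocessing_commands(dataset: str) -> List[List[str]]:
    """Return the two module invocations for one dataset, in README order."""
    return [
        [sys.executable, "-m", f"src.dataset.preprocess.{dataset}"],
        [sys.executable, "-m", f"src.dataset.{dataset}"],
    ]


def run_all(datasets: List[str], dry_run: bool = True) -> List[List[str]]:
    """Execute (or, with dry_run, only collect) the commands for each dataset.

    check=True raises CalledProcessError on the first failing step instead
    of silently continuing with the next one.
    """
    planned = []
    for dataset in datasets:
        for cmd in preprocessing_commands(dataset):
            if not dry_run:
                subprocess.run(cmd, check=True)
            planned.append(cmd)
    return planned


if __name__ == "__main__":
    do_run = "--run" in sys.argv[1:]
    for cmd in run_all(DATASETS, dry_run=not do_run):
        print(" ".join(cmd))
```

Using `sys.executable` keeps the steps inside the same (conda) interpreter that launched the script.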

Training

Replace the paths to the LLM checkpoints in src/model/__init__.py, then run one of the following configurations.

1) Inference-Only LLM

python inference.py --dataset scene_graphs --model_name inference_llm --llm_model_name 7b_chat

2) Frozen LLM + Prompt Tuning

# prompt tuning
python train.py --dataset scene_graphs_baseline --model_name pt_llm

# G-Retriever
python train.py --dataset scene_graphs --model_name graph_llm

3) Tuned LLM

# finetune LLM with LoRA
python train.py --dataset scene_graphs_baseline --model_name llm --llm_frozen False

# G-Retriever with LoRA
python train.py --dataset scene_graphs --model_name graph_llm --llm_frozen False
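The training commands differ along two axes: whether the model is `graph_llm` on the full dataset or a baseline (`pt_llm` or `llm`) on the `_baseline` variant of the dataset, and whether `--llm_frozen False` is passed to fine-tune the LLM with LoRA (the frozen-LLM commands omit the flag, which suggests the default keeps the LLM frozen). The sketch below keeps all five commands as data so the settings can be compared or scripted. `CONFIGS` and `build_command` are hypothetical names, and the argument values are copied from the commands above rather than from the repository's argument parser.

```python
"""The five configurations above, kept as data (hypothetical helper, not in the repo)."""
import shlex
from typing import Dict, List, Tuple

# name -> (script, --dataset, --model_name, extra flags); values copied from the README.
CONFIGS: Dict[str, Tuple[str, str, str, List[str]]] = {
    # 1) Inference-only LLM, no training.
    "inference_only": ("inference.py", "scene_graphs", "inference_llm",
                       ["--llm_model_name", "7b_chat"]),
    # 2) Frozen LLM: prompt tuning on the baseline dataset.
    "prompt_tuning": ("train.py", "scene_graphs_baseline", "pt_llm", []),
    # 2) Frozen LLM: G-Retriever.
    "g_retriever": ("train.py", "scene_graphs", "graph_llm", []),
    # 3) LLM fine-tuned with LoRA on the baseline dataset.
    "lora": ("train.py", "scene_graphs_baseline", "llm", ["--llm_frozen", "False"]),
    # 3) G-Retriever with LoRA.
    "g_retriever_lora": ("train.py", "scene_graphs", "graph_llm", ["--llm_frozen", "False"]),
}


def build_command(name: str) -> List[str]:
    """Return the argv list for one configuration."""
    script, dataset, model_name, extra = CONFIGS[name]
    return ["python", script, "--dataset", dataset, "--model_name", model_name, *extra]


if __name__ == "__main__":
    for name in CONFIGS:
        # shlex.join (Python 3.8+) renders argv as a copy-pasteable shell line.
        print(f"{name:18s} {shlex.join(build_command(name))}")
```

Keeping argv as a list (rather than one string) also lets each command be passed straight to `subprocess.run` without shell quoting issues.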

Reproducibility

Use run.sh to run the experiments and reproduce the results published in the paper's main table.
