GOOD: A Graph Out-of-Distribution Benchmark
We are actively building the document.
Out-of-distribution (OOD) learning deals with scenarios in which training and test data follow different distributions. Although general OOD problems have been intensively studied in machine learning, graph OOD is only an emerging area of research. Currently, there lacks a systematic benchmark tailored to graph OOD method evaluation. This project is for Graph Out-of-distribution development, known as GOOD. We explicitly make distinctions between covariate and concept shifts and design data splits that accurately reflect different shifts. We consider both graph and node prediction tasks as there are key differences when designing shifts. Currently, GOOD contains 8 datasets with 14 domain selections. When combined with covariate, concept, and no shifts, we obtain 42 different splits. We provide performance results on 7 commonly used baseline methods with 10 random runs. This results in 294 dataset-model combinations in total. Our results show significant performance gaps between in-distribution and OOD settings. We hope our results also shed light on different performance trends between covariate and concept shifts by different methods. This GOOD benchmark is a growing project and expects to expand in both quantity and variety of resources as the area develops. Any contribution is welcomed!
Whether you are an experienced researcher for graph out-of-distribution problem or a first-time learner of graph deep learning, here are several reasons for you to use GOOD as your Graph OOD research, study, and development toolkit.
- Easy-to-use APIs: GOOD provides simple apis for loading OOD algorithms, graph neural networks, and datasets, so that you can take only several lines of code to start.
- Flexibility: Full OOD split generalization code is provided for extensions and any new graph OOD dataset contributions. OOD algorithm base class can be easily overwrite to create new OOD methods.
- Easy-to-extend architecture: In addition to play as a package, GOOD is also an integrated and well-organized project ready to be further developed.
All algorithms, models, and datasets can be easily registered by
registerand automatically embedded into the designed pipeline like a breeze! The only thing the user need to do is writing your own OOD algorithm class, your own model class, or your new dataset class. Then you can compare your results with the leaderboard. - Easy comparisons with the leaderboard: We provide insightful comparisons from multiple perspectives. Any researches and studies can use our leaderboard results for comparison. Note that, this is a growing project, so we will include new OOD algorithms gradually. Besides, if you hope to include your algorithms in the leaderboard, please contact us or contribute to this project. A big welcome!
- Reproducibility:
- OOD Datasets: GOOD provides full OOD split generalization code for reproduction, and generation for new datasets.
- Leaderboard results: One random seed round results are provided, and loaded models pass the test result reproduction.
GOOD depends on PyTorch (>=1.6.0), PyG (>=2.0), and RDKit (>=2020.09.5). For more details: conda environment
Note that we currently test on PyTorch (==1.10.1), PyG (==2.0.3), RDKit (==2020.09.5); thus we strongly encourage to install these versions.
Attention! Due to a known issue, please install PyG through Pip to avoid incompatibility.
pip install graph-oodgit clone https://github.com/divelab/GOOD.git && cd GOOD
pip install -e .There are two ways to import 8 GOOD datasets with 14 domain selections and totally 42 splits:
# Directly import
from GOOD.data.good_datasets.good_hiv import GOODHIV
hiv_datasets = GOODHIV.load(dataset_root, domain='scaffold', shift='covariate', generate=False)
# Or using register
from GOOD import register as good_reg
hiv_datasets = good_reg.datasets['GOODHIV'].load(dataset_root, domain='scaffold', shift='covariate', generate=False)
cmnist_datasets = good_reg.datasets['GOODCMNIST'].load(dataset_root, domain='color', shift='concept', generate=False)The best and fair way to compare algorithms with the leaderboard is to use the same and similar graph encoder structure;
therefore, we provide GOOD GNN apis to support. Here, we use an objectified dictionary config to pass parameters. More
details about the config: Document of config (pending)
To use exact GNN
from GOOD.networks.models.GCNs import GCN
model = GCN(config)
# Or
from GOOD import register as good_reg
model = good_reg.models['GCN'](config)To only use parts of GNN
from GOOD.networks.models.GINvirtualnode import GINEncoder
encoder = GINEncoder(config)Try to apply OOD algorithms to your own models?
from GOOD.ood_algorithms.algorithms.VREx import VREx
ood_algorithm = VREx(config)
# Then you can provide it to your model for necessary ood parameters,
# and use its hook-like function to process your input, output, and loss.It is a good beginning to directly make it work. Here, we provide command line script goodtg (GOOD to go) to access the main function located at GOOD.kernel.pipeline:main.
Choosing a config file in configs/GOOD_configs, we can start a task:
goodtg --config_path GOOD_configs/GOODCMNIST/color/concept/DANN.yamlSpecifically, the task is clearly divided into three parts:
- Config
from GOOD import config_summoner
from GOOD.utils.args import args_parser
from GOOD.utils.logger import load_logger
args = args_parser()
config = config_summoner(args)
load_logger(config)- Loader
from GOOD.kernel.pipeline import initialize_model_dataset
from GOOD.ood_algorithms.ood_manager import load_ood_alg
model, loader = initialize_model_dataset(config)
ood_algorithm = load_ood_alg(config.ood.ood_alg, config)Or concretely,
from GOOD.data import load_dataset, create_dataloader
from GOOD.networks.model_manager import load_model
from GOOD.ood_algorithms.ood_manager import load_ood_alg
dataset = load_dataset(config.dataset.dataset_name, config)
loader = create_dataloader(dataset, config)
model = load_model(config.model.model_name, config)
ood_algorithm = load_ood_alg(config.ood.ood_alg, config)- Train/test pipeline
from GOOD.kernel.pipeline import load_task
load_task(config.task, model, loader, ood_algorithm, config)Or concretely,
# Train
from GOOD.kernel.train import train
train(model, loader, ood_algorithm, config)
# Test
from GOOD.kernel.evaluation import evaluate
test_stat = evaluate(model, loader, ood_algorithm, 'test', config)