Skip to content

A PyTorch implementation of Knowledge Graph Pseudo-Labeling (KGPL) for Cold-Start Mitigation in Personalized Recommendation Systems. (https://arxiv.org/abs/2011.05061)

License

Notifications You must be signed in to change notification settings

dna-witch/KGPL-PyTorch

Repository files navigation

Alleviating Cold-Start Problems in Recommendation through Pseudo-Labelling over Knowledge Graph

A PyTorch implementation of Knowledge Graph Pseudo-Labeling (KGPL), inspired by the paper: "Alleviating Cold-Start Problems in Recommendation through Pseudo-Labelling over Knowledge Graph" by Riku Togashi, Mayu Otani, and Shin’ichi Satoh. (https://arxiv.org/abs/2011.05061)

Overview

This project provides a functional PyTorch implementation of the KGPL model, which addresses cold-start problems in personalized recommendation systems by pseudo-labeling over knowledge graphs.

The recommendation model uses a knowledge graph to identify potential positive items for each user by focusing on neighbors in the graph structure, and treats unobserved user-item interactions as weakly-positive instances via pseudo-labeling. To mitigate popularity bias, the model uses an improved negative sampling strategy. The recommender also implements a co-training approach with dual student models to improve learning stability and robustness.

Results

The PyTorch reimplementation of KGPL demonstrates stable co-training dynamics and successfully replicates the original model’s behavior. Both student models (f and g) showed synchronized convergence over 40 epochs, with training loss decreasing from ~5.04 to ~1.81, confirming effective learning from both observed and pseudo-labeled instances.

Training Loss over Co-Training Epochs

Validation Performance

  • Recall@20 increased from ~0.67% (epoch 1) to ~15.3% (epoch 40)
  • Recall@10 reached ~9.4%
  • Recall@5 reached ~6.2%

Most learning occurred in the first 20 epochs, followed by gradual fine-tuning. Validation metrics plateaued without decline, indicating no overfitting.

Validation Recall across Cutoffs over Epochs

Cold-Start Analysis

  • Users with ≤1 interaction: Recall@20 ~8.3%
  • Users with ≤2 interactions: Recall@20 ~20.3%

Performance improves steadily as interaction history increases, showing that the KGPL model effectively mitigates cold-start issues using pseudo-labeling.

Top-K Test Set Evaluation

Metric PyTorch Implementation TensorFlow Implementation
Recall@5 7.1% 9.93%
Recall@10 12.4% 15.47%
Recall@20 17.6% 22.25%
Precision@20 2.0% 2.3%

The PyTorch model's metrics are closely aligned with the TensorFlow version, showing consistent performance trends. The small differences are likely due to minor implementation or environment setup differences, and overall, the reimplementation was successful.


Repository Structure

KGPL-PyTorch/
├── conf/                   # Configuration files for experiments
├── data/                   # Datasets and data loaders
├── preprocess/             # Data preprocessing scripts
├── utils/                  # Utility functions and evaluation metrics
├── model.py                # Implementation of the KGPL model
├── KGPL_MUSIC_FINAL_40.ipynb  # Example notebook demonstrating usage
├── requirements.txt        # Python dependencies
├── CHANGELOG.md            # Record of changes and updates
├── LICENSE                 # MIT License
└── README.md               # Project overview and instructions

Getting Started

Run the following commands to clone the repository, create a virtual environment, and install the required packages to set up the model environment.

git clone https://github.com/dna-witch/KGPL-PyTorch.git
cd KGPL-PyTorch
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage and Workflow

The KGPL_MUSIC_FINAL_40.ipynb notebook provides a step-by-step example of preprocessing data, co-training, and evaluating the KGPL model on a benchmark dataset. It's a great starting point to understand the workflow and experiment with the recommender model!

Contributors

Shakuntala Mitra @dna-witch

Taylor Hawks @taylorhawks


Changelog

Adding to the changelog

First, identify the last commit hash recorded in CHANGELOG.md. Then, use the following command (replacing LAST_COMMIT_HASH with the actual hash):

git log --pretty=format:"## %h%n #### %ad %n%n%s%n%n%b%n" --date=short LAST_COMMIT_HASH..HEAD >> CHANGELOG.md

This appends all new commits since LAST_COMMIT_HASH to the end of the changelog.

Citation

If you find this implementation useful for your research, please cite the original paper:

@article{togashi2020alleviating,
  title={Alleviating Cold-Start Problems in Recommendation through Pseudo-Labelling over Knowledge Graph},
  author={Togashi, Riku and Otani, Mayu and Satoh, Shin’ichi},
  journal={arXiv preprint arXiv:2011.05061},
  year={2020}
}

About

A PyTorch implementation of Knowledge Graph Pseudo-Labeling (KGPL) for Cold-Start Mitigation in Personalized Recommendation Systems. (https://arxiv.org/abs/2011.05061)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •