🧠🔗 CoPCA: Capturing Symbolic Knowledge of Constraints and Incompleteness to Guide Inductive Learning in Neuro-Symbolic Knowledge Graph Completion
Welcome to the official repository for CoPCA, a novel framework that integrates symbolic constraints and incomplete knowledge to guide neuro-symbolic learning. This pipeline enhances the quality of Knowledge Graph Embeddings (KGEs) through logical rule mining, heuristic categorization, and constraint-based learning — paving the way for more explainable and robust downstream tasks such as link prediction.
The CoPCA Pipeline follows these major steps:
- Validation of the Knowledge Graph (KG) using SHACL constraints.
- Mining of Horn rules over the KG using AMIE.
- CoPCA model: categorization of the logical rules into valid and invalid heuristics.
- Transformation of the input KG into a refined KG′ using symbolic knowledge.
- Numerical Knowledge Graph Embedding of KG′ using state-of-the-art KGE models.
- Downstream tasks: link prediction over the vectorized KG representations.
Example downstream task: Predicting whether a football player (yago:Ronaldo) is affiliated with a particular sports team (yago:Portugal).
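As a rough illustration of this downstream task, the sketch below shows how a candidate triple can be scored with a trained KGE model using PyKEEN (referenced below). This is not part of the CoPCA scripts; the entity and relation labels are placeholders that are assumed to exist in the benchmark vocabulary.

```python
# Illustrative sketch only (not part of the CoPCA pipeline scripts).
import torch
from pykeen.pipeline import pipeline

# Train a small TransE model on the built-in YAGO3-10 benchmark (few epochs, demo only).
result = pipeline(model="TransE", dataset="YAGO310",
                  training_kwargs=dict(num_epochs=5))

tf = result.training  # triples factory holding the label-to-id mappings
hrt = torch.as_tensor([[
    tf.entity_to_id["Cristiano_Ronaldo"],                 # head entity (assumed label)
    tf.relation_to_id["isAffiliatedTo"],                  # relation (assumed label)
    tf.entity_to_id["Portugal_national_football_team"],   # tail entity (assumed label)
]])

# Higher (less negative) scores indicate a more plausible triple.
print(result.model.score_hrt(hrt))
```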
├── KG/ # Original, valid, and invalid KGs for benchmarks
│ ├── french_royalty/
│ ├── YAGO3-10/
│ └── DB100K/
│
├── Rules/ # AMIE-mined Horn rules over KGs
│
├── Constraints/ # SHACL constraints for respective KGs
│
├── Symbolic Learning/ # Scripts for heuristic transformation and categorization
│
├── Numerical Learning/ # KGE pipeline scripts (kge.py) and configs (input.json)
├── requirements.txt # Necessary dependencies
└── README.md
KG Size | Benchmark | #Triples | #Entities | #Relations |
---|---|---|---|---|
Large | DB100K | 695,572 | 99,604 | 470 |
Medium | YAGO3-10 | 1,080,264 | 123,086 | 37 |
Small | French Royalty | 10,526 | 2,601 | 12 |
KG Size | Benchmark | #Constraints | #Valid | #Invalid |
---|---|---|---|---|
Large | DB100K | 6 | 406,533 | 45,842 |
Medium | YAGO3-10 | 4 | 407,480 | 44,444 |
Small | French Royalty | 2 | 1,979 | 243 |
We evaluate KG completion using the following KGE models:
- TransE, TransH, TransD
- RotatE, ComplEx, TuckER
- CompGCN
Metrics reported:
- Hits@1, Hits@3, Hits@5, Hits@10
- Mean Reciprocal Rank (MRR)
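For reference, these ranking metrics are computed from the rank of the true entity for each test triple; the plain-Python sketch below shows the standard definitions (independent of any specific KGE library).

```python
# Plain-Python sketch of the reported ranking metrics, computed from the
# 1-based ranks assigned to the true entities of the test triples.
def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 12, 2, 7]  # toy example
print({f"Hits@{k}": hits_at_k(ranks, k) for k in (1, 3, 5, 10)})
print("MRR:", mean_reciprocal_rank(ranks))
```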
git clone https://github.com/SDM-TIB/CoPCA.git
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Navigate to the Symbolic Learning/ directory and follow the steps below:
python Validation.py
This script will execute the SHACL constraints over the respective benchmark KG.
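The implementation details of Validation.py are specific to this repository; as a rough sketch of the underlying idea, SHACL validation of a benchmark KG against a shapes graph can be performed with the pySHACL library as shown below (file paths are hypothetical).

```python
# Illustrative sketch only (not the Validation.py implementation).
from rdflib import Graph
from pyshacl import validate

# File paths are hypothetical placeholders for a benchmark KG and its SHACL shapes.
data_graph = Graph().parse("KG/french_royalty/original.nt", format="nt")
shapes_graph = Graph().parse("Constraints/french_royalty_shapes.ttl", format="turtle")

# Returns whether the KG conforms, plus an RDF report graph and a textual report
# listing the constraint violations.
conforms, report_graph, report_text = validate(data_graph, shacl_graph=shapes_graph)
print("Conforms:", conforms)
print(report_text)
```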
python constraint-driven-pca-calculator.py --input input.json
This script estimates the metrics PCA_valid and PCA_invalid.
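The exact definitions are part of the CoPCA model; assuming they follow AMIE's partial completeness assumption (PCA) confidence, restricted to the constraint-valid and constraint-invalid partitions of the KG, a simplified computation for a rule body(x, y) ⇒ r(x, y) could look as follows (helper names are hypothetical).

```python
# Simplified sketch under an assumption: AMIE-style PCA confidence of a rule
# "body(x, y) => r(x, y)", evaluated separately on the valid and invalid partitions.
def pca_confidence(predictions, triples, head_relation):
    """predictions: set of (x, y) pairs entailed by the rule body;
    triples: set of (s, p, o) facts in one partition (valid or invalid)."""
    known = {(s, o) for s, p, o in triples if p == head_relation}
    subjects_with_head_fact = {s for s, _ in known}
    hits = sum(pair in known for pair in predictions)
    # PCA denominator: predictions whose subject already has some fact for the head relation.
    denom = sum(x in subjects_with_head_fact for x, _ in predictions)
    return hits / denom if denom else 0.0

# PCA_valid / PCA_invalid would then be obtained by applying the same computation
# to the valid and invalid triple partitions produced by the SHACL validation step.
```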
python Symbolic_predictions.py --input input-symbolicPred.json
This script generates the enriched KG based on the selected valid and invalid rules. This enriched KG is then used to measure the performance of the numerical learning models.
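The enrichment logic itself lives in the repository's scripts; at a high level, predictions of selected valid rules can be added to the KG while predictions supported only by invalid rules are discarded, as in the rough sketch below (the rule representation and example facts are hypothetical).

```python
# Rough sketch only (hypothetical rule representation, not the repository's code).
def apply_rule(kg, body_relation, head_relation):
    """For a simple Horn rule body_relation(x, y) => head_relation(x, y),
    return the head triples it entails over the KG."""
    return {(s, head_relation, o) for s, p, o in kg if p == body_relation}

kg = {("Louis_XIV", "hasChild", "Louis_de_France")}              # toy fact
valid_predictions = apply_rule(kg, "hasChild", "hasSuccessor")   # from a rule kept as valid
invalid_predictions = apply_rule(kg, "hasChild", "hasSpouse")    # from a rule flagged as invalid

# KG' keeps the original facts plus predictions of valid rules,
# excluding those produced only by invalid rules.
kg_prime = (kg | valid_predictions) - invalid_predictions
print(sorted(kg_prime))
```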
Now navigate to the Numerical Learning/ directory to execute the KGE models:
python kge.py
This script takes input.json as input to select the respective KGE model, the benchmark KG, and the path where the results (e.g., Hits@1) are stored.
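The concrete schema of input.json is defined by kge.py; purely as a hypothetical illustration of the kind of information it carries (model choice, benchmark KG, and results path), such a configuration could be produced as follows. The field names are assumptions, not the actual keys expected by the script.

```python
# Hypothetical illustration only: the actual keys expected by kge.py may differ.
import json

config = {
    "kge_model": "TransE",           # which KGE model to train (assumed key)
    "benchmark": "french_royalty",   # which benchmark KG variant to embed (assumed key)
    "results_path": "results/french_royalty_transe/",  # where metrics such as Hits@1 are written (assumed key)
}

with open("input.json", "w") as f:
    json.dump(config, f, indent=2)
```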
- French Royalty KG
- YAGO3-10
- DB100K
Each benchmark includes original, validated, and constraint-filtered variants of the KG. Find DB100K and YAGO3-10 benchmarks in Leibniz Data Manager: https://doi.org/10.57702/y3f76e2h
- PyKEEN – Ali et al., 2021: paper
- SPaRKLE – Purohit et al., 2023: paper
- VISE – Purohit et al., 2024: paper
- VANILLA – Purohit et al., 2025: paper
CoPCA has been developed by members of the Scientific Data Management Group at TIB as an ongoing research effort. The development is coordinated and supervised by Maria-Esther Vidal. Developed and maintained by:
- Disha Purohit and Yashrajsinh Chudasama
- Feel free to reach out for any issues related to reproducibility or implementation at: disha.purohit@tib.eu
This project is licensed under the MIT License.
This project builds upon contributions and tools from the neuro-symbolic and knowledge representation communities.