Deterministic record-linkage and data enrichment pipeline designed for multi-source datasets without a shared primary key.
Updated Dec 29, 2025 · Python
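Deterministic linkage without a shared primary key usually means requiring exact agreement on a composite key built from cleaned identifying fields. The following is a minimal Python sketch of that idea. It is not this pipeline's actual implementation. The field names (`first_name`, `last_name`, `dob`, `postcode`) and the helper functions are hypothetical, and the sketch assumes dates are already in a single format.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def match_key(record: dict) -> tuple:
    """Composite key from hypothetical fields: name, date of birth, postcode."""
    return (
        normalize(record["first_name"]),
        normalize(record["last_name"]),
        record["dob"],  # assumed already standardized, e.g. "1980-04-02"
        normalize(record["postcode"]),
    )


def deterministic_link(
    left: list[dict], right: list[dict]
) -> tuple[list[tuple[int, int]], list[int]]:
    """Link records whose composite keys agree exactly.

    Returns (links, ambiguous): links holds (left_index, right_index) pairs
    with exactly one candidate; ambiguous holds left indices whose key
    matched more than one right-hand record and need manual review.
    """
    index = defaultdict(list)
    for j, rec in enumerate(right):
        index[match_key(rec)].append(j)

    links, ambiguous = [], []
    for i, rec in enumerate(left):
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:
            links.append((i, candidates[0]))
        elif len(candidates) > 1:
            ambiguous.append(i)
    return links, ambiguous
```

Indexing the right-hand source by key keeps the join linear in the number of records. Flagging ambiguous keys separately, instead of picking one arbitrarily, keeps the result deterministic and auditable.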
A fully customizable tool for generating comprehensive data linkage quality reports.
Streamlines the design and testing of data linkage algorithms by removing programming requirements for data analysts, using R scripts and a database of linkage tests.
Minimizes the time data analysts spend cleaning and processing data by providing a user-friendly system for converting source data into a clean format.
Stata package to implement probabilistic record linkage
Merging data from UK Companies House RDF databases and Wikidata using OWL2 and Python
A collection of awesome resources regarding Record Linkage.
Fast and simple probabilistic data matching package
Interpretable metadata for the results of NHS England record linkage
KDD'23 | MedLink: De-Identified Patient Health Record Linkage
Python package containing functions used for deterministically matching a Post Enumeration Survey to census data
The aim is to implement and evaluate a data linkage system and a classification system using sound data science principles.
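Several projects above implement probabilistic record linkage. A common formulation is the Fellegi-Sunter model. For each compared field it uses an m-probability (chance the field agrees given a true match) and a u-probability (chance it agrees given a non-match). The model sums the log likelihood ratios into a match weight and classifies record pairs using two thresholds. The following is a minimal Python sketch of that model, not the method of any listed package. The m/u values and thresholds are hypothetical. Real tools usually estimate them from the data, for example with the EM algorithm.

```python
import math

# Hypothetical (m, u) probabilities per field, fixed here for illustration only.
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio for one field's agreement or disagreement."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(a: dict, b: dict, m_u: dict = M_U) -> float:
    """Total weight, assuming fields are conditionally independent."""
    return sum(field_weight(a[f] == b[f], m, u) for f, (m, u) in m_u.items())


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Fellegi-Sunter decision rule with two hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"
```

With these illustrative values, a pair agreeing on surname and birth year but not first name scores about 8.9. That falls between the thresholds, so the pair is classified as a possible match for clerical review.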