CorrectMatch

Installation

CorrectMatch requires gfortran to precompile the mvndst routines.

On macOS, install GCC with Homebrew (the gcc formula includes gfortran): brew install gcc

On GNU/Linux, install the gfortran package with your preferred package manager:

sudo apt-get install gfortran  # on Debian-based systems
sudo pacman -S gcc-fortran     # on Arch Linux-based systems
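
After installing, it can help to confirm that gfortran is visible from Julia before building the package. The snippet below is a minimal sketch that only uses Julia's standard Sys.which; the variable name gfortran_path and the messages are illustrative, and the check is not part of CorrectMatch itself.

```julia
# Look up gfortran on the PATH of the current Julia session.
# Sys.which returns the full path as a String, or `nothing` if not found.
gfortran_path = Sys.which("gfortran")

if gfortran_path === nothing
    @warn "gfortran was not found on PATH; install it before building CorrectMatch."
else
    println("Found gfortran at ", gfortran_path)
end
```

If the warning appears right after installation, restarting the shell (and Julia) so that PATH is refreshed is usually the first thing to try.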

Usage

CorrectMatch works directly with DataFrames, automatically handling categorical variables:

using CorrectMatch
using DataFrames

# Load your data as a DataFrame
df = DataFrame(
    age = [25, 30, 35, 25, 30],
    gender = ["M", "F", "X", "M", "M"],
    city = ["NYC", "LA", "NYC", "NYC", "SF"]
)

# Compute population metrics directly on DataFrame
uniqueness(df)   # 0.60 (fraction of unique records)
correctness(df)  # 0.80 (fraction of correctly re-identifiable records)

# Fit a Gaussian copula model
G = fit_mle(GaussianCopula, df)

# Generate synthetic data
d_sim = rand(G, 100)
uniqueness(d_sim)  # e.g., 0.04
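
The two population metrics on the five-record example can be reproduced by hand. This is a minimal sketch in plain Julia, not CorrectMatch's implementation: record_counts, naive_uniqueness and naive_correctness are hypothetical helpers, and the correctness formula (number of distinct records divided by number of records) is an assumption chosen because it matches the 0.80 quoted above.

```julia
# Hypothetical helpers, not part of CorrectMatch.

# Count how many times each record (a tuple of attribute values) occurs.
function record_counts(rows)
    counts = Dict{eltype(rows),Int}()
    for r in rows
        counts[r] = get(counts, r, 0) + 1
    end
    return counts
end

# Fraction of records that occur exactly once.
naive_uniqueness(rows) = count(==(1), values(record_counts(rows))) / length(rows)

# Assumed definition: each record is matched correctly with probability
# 1/k, where k is the number of records sharing its values. Averaged over
# all records, this equals (number of distinct records) / (number of records).
naive_correctness(rows) = length(record_counts(rows)) / length(rows)

# The same five records as the DataFrame example.
rows = [
    (25, "M", "NYC"),
    (30, "F", "LA"),
    (35, "X", "NYC"),
    (25, "M", "NYC"),
    (30, "M", "SF"),
]

u = naive_uniqueness(rows)   # 3 of 5 records are unique: 0.6
c = naive_correctness(rows)  # 4 distinct records out of 5: 0.8
```

Only the (25, "M", "NYC") record is duplicated, so three records are unique and there are four distinct records in total.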

Individual uniqueness and correctness can be computed for any record:

indiv = df[1, :]
individual_uniqueness(G, indiv, 100)  # e.g., 0.20
individual_correctness(G, indiv, 100)  # e.g., 0.50

# Or pass raw values
individual_uniqueness(G, [35, "M", "NYC"], 100)  # e.g., 0.35
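
To build intuition for the individual metrics, the sketch below uses two textbook formulas under an independence assumption. It assumes, without confirmation from this README, that the third argument above is a population size n and that the model assigns a probability p to the record. The helpers sketch_uniqueness and sketch_correctness are hypothetical and are not CorrectMatch functions.

```julia
# Hypothetical helpers, not CorrectMatch functions.
# p: probability that one random person has this exact record (assumed given).
# n: population size.

# Probability that none of the other n - 1 people share the record.
sketch_uniqueness(p, n) = (1 - p)^(n - 1)

# Expected value of 1 / (1 + K), where K ~ Binomial(n - 1, p) is the number
# of other people sharing the record. The limit for p -> 0 is 1.
sketch_correctness(p, n) = p == 0 ? 1.0 : (1 - (1 - p)^n) / (n * p)

n = 100
# Pick the p for which the sketched uniqueness is 0.20 in a population of 100.
p = 1 - 0.20^(1 / (n - 1))

u = sketch_uniqueness(p, n)   # 0.20 by construction
k = sketch_correctness(p, n)  # roughly 0.50
```

Under these assumptions, a uniqueness of 0.20 at n = 100 goes together with a correctness of about 0.50, which is consistent with the illustrative values shown in the comments above.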

Working with integer matrices

CorrectMatch also works directly with integer matrices, in which each column is a categorical variable encoded as integers starting from 1. The matrix API exposes the exact_marginal=false option, which can give better-fitting distributions for small, sparse datasets.

# Create a simple dataset of 1000 records and 3 columns
d = rand(1:10, 1000, 3)
uniqueness(d)

G = fit_mle(GaussianCopula, d)
d_sim = rand(G, 1000)
uniqueness(d_sim)
correctness(d_sim)

# Individual metrics (the matrix API expects integer codes starting from 1)
individual_uniqueness(G, [5, 5, 5], 1000)
individual_correctness(G, [5, 5, 5], 1000)
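
Raw categorical columns need to be converted to integer codes starting from 1 before the matrix API can be used. The following is one possible encoding in plain Julia, assuming codes are assigned in order of first appearance. The helpers encode_column and encode_matrix are hypothetical names and are not provided by CorrectMatch.

```julia
# Hypothetical helpers, not part of CorrectMatch.

# Map each distinct value of a column to an integer code 1, 2, 3, ...
# in order of first appearance.
function encode_column(col)
    codes = Dict{eltype(col),Int}()
    # get! returns the existing code, or stores and returns the next free one.
    return [get!(codes, v, length(codes) + 1) for v in col]
end

# Encode several columns and place them side by side in one integer matrix.
encode_matrix(cols...) = hcat((encode_column(c) for c in cols)...)

age = [25, 30, 35, 25, 30]
gender = ["M", "F", "X", "M", "M"]
city = ["NYC", "LA", "NYC", "NYC", "SF"]

d = encode_matrix(age, gender, city)
# d is a 5×3 integer matrix:
#  1  1  1
#  2  2  2
#  3  3  1
#  1  1  1
#  2  1  3
```

The resulting matrix has one row per record and one column per variable, which is the layout used in the matrix examples above.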

See the examples folder to learn how to load a CSV file and estimate the metrics from a small sample.

License

GNU General Public License v3.0

See LICENSE for the full text.

Patent-pending code. Additional support and details are available for commercial use.
