feat: add sortable keys for record linkage #654

Draft
adamdecaf wants to merge 1 commit into moov-io:master from adamdecaf:feat-add-record-linkage

@adamdecaf adamdecaf commented Jul 2, 2025

Member

The idea is to generate a list of sortable keys (the buckets each field hashes into) so that we can find similar records. You can compare against several of these keys at once and grab the rows that sort just above or below a target key, which shrinks the number of detailed similarity-scoring calls we need to make.

"TYPE:0230"
"NAME:0190"

// Country | Type | Identifier
"GOVID:C0173|T0190|X0146"

// Country | State | PostalCode | City | Line1 | Line2 [optional]
"ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
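To make the layout above concrete, here is a minimal sketch of splitting a key into its kind and its ordered segments. `SortableKey` and `ParseKey` are hypothetical names used only for illustration, not the API in this PR, and the sketch assumes keys always take the form `KIND:segment|segment|...` as in the examples.

```go
package main

import (
	"fmt"
	"strings"
)

// SortableKey is a hypothetical parsed form of a key such as
// "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173".
type SortableKey struct {
	Kind     string   // e.g. "TYPE", "NAME", "GOVID", "ADDR"
	Segments []string // ordered from general to specific
}

// ParseKey splits a raw key on the first ":" and then on "|".
// It assumes the KIND:seg|seg|... layout shown in the examples.
func ParseKey(raw string) (SortableKey, error) {
	kind, rest, ok := strings.Cut(raw, ":")
	if !ok || kind == "" || rest == "" {
		return SortableKey{}, fmt.Errorf("malformed sortable key %q", raw)
	}
	return SortableKey{Kind: kind, Segments: strings.Split(rest, "|")}, nil
}

func main() {
	key, err := ParseKey("ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173")
	if err != nil {
		panic(err)
	}
	fmt.Println(key.Kind, key.Segments)
}
```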

You could then compute traditional string distance metrics over these sortable keys to rank the most similar records. Within each key, the segments move from general data to more specific.
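As a rough illustration of ranking by string distance, the sketch below uses Levenshtein edit distance, which is just one possible metric and not necessarily the one this PR will use. `Levenshtein` and `RankBySimilarity` are hypothetical helpers, and the candidate keys are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// Levenshtein returns the edit distance between a and b.
// It compares bytes, which is fine for the ASCII keys shown above.
func Levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// RankBySimilarity returns a copy of candidates ordered by ascending
// edit distance to target (closest first).
func RankBySimilarity(target string, candidates []string) []string {
	ranked := append([]string(nil), candidates...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return Levenshtein(target, ranked[i]) < Levenshtein(target, ranked[j])
	})
	return ranked
}

func main() {
	target := "GOVID:C0173|T0190|X0146"
	candidates := []string{ // made-up keys for illustration
		"GOVID:C0173|T0190|X0999",
		"GOVID:C0173|T0190|X0147",
		"GOVID:C0001|T0002|X0003",
	}
	for _, key := range RankBySimilarity(target, candidates) {
		fmt.Println(Levenshtein(target, key), key)
	}
}
```

Whether a small edit distance between bucket numbers implies similar underlying values depends on how the buckets are assigned, so a ranking like this is best treated as a coarse first pass before the detailed scoring.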

Because the broad fields sit on the left, the keys support prefix filtering in SQL. You could strip out the Line1/Line2 data and filter down to the city level, or find the rows near an exact address by grabbing those that sort just above and just below the target key.
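Both lookups can be sketched in memory over a sorted slice of keys, which behaves much like an indexed column queried with `LIKE 'prefix%'` or with range comparisons. The helper names (`PrefixThrough`, `FilterByPrefix`, `Neighbors`) and the sample keys are assumptions for illustration, and the sketch assumes fixed-width bucket numbers so that one segment is never a prefix of a different one.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PrefixThrough keeps the first n "|"-separated segments of key,
// e.g. n=4 on an ADDR key drops the Line1/Line2 data.
func PrefixThrough(key string, n int) string {
	parts := strings.SplitN(key, "|", n+1)
	if len(parts) > n {
		parts = parts[:n]
	}
	return strings.Join(parts, "|")
}

// FilterByPrefix returns every key in sorted that starts with prefix.
// sorted must be in ascending order.
func FilterByPrefix(sorted []string, prefix string) []string {
	start := sort.SearchStrings(sorted, prefix)
	end := start
	for end < len(sorted) && strings.HasPrefix(sorted[end], prefix) {
		end++
	}
	return sorted[start:end]
}

// Neighbors returns up to n keys sorting before target and up to n
// sorting after it, plus target itself when it is present.
func Neighbors(sorted []string, target string, n int) []string {
	i := sort.SearchStrings(sorted, target)
	lo := i - n
	if lo < 0 {
		lo = 0
	}
	hi := i + n
	if i < len(sorted) && sorted[i] == target {
		hi++ // make room for the target row itself
	}
	if hi > len(sorted) {
		hi = len(sorted)
	}
	return sorted[lo:hi]
}

func main() {
	keys := []string{ // made-up keys for illustration
		"ADDR:C0143|S0030|P0001|Y0002|L0005",
		"ADDR:C0143|S0021|P0007|Y0023|L0240",
		"ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173",
		"ADDR:C0143|S0021|P0009|Y0040|L0011",
		"ADDR:C0143|S0021|P0007|Y0023|L0150",
	}
	sort.Strings(keys)

	target := "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
	city := PrefixThrough(target, 4) // Country | State | PostalCode | City
	fmt.Println("same city:", FilterByPrefix(keys, city))
	fmt.Println("nearby:", Neighbors(keys, target, 1))
}
```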

@adamdecaf adamdecaf force-pushed the feat-add-record-linkage branch from c342518 to d1a2f71 (November 11, 2025 22:43)
@adamdecaf adamdecaf force-pushed the feat-add-record-linkage branch from d1a2f71 to 6961426 (June 2, 2026 15:53)