#ontology #similarity #hpo

phrank

An ontology-based similarity algorithm for patient-wise similarity. Originally published by Karthik A. Jagadeesh et al. in 2018.

17 releases

Uses new Rust 2024

new 0.2.16 Jun 14, 2026
0.2.15 Jun 10, 2026
0.2.11 Apr 30, 2026

#1787 in Algorithms

MIT license

31KB
551 lines

Phrank Similarity Engine

Phrank is a high-performance, phenotype-driven similarity engine that scores how similar the patients in a cohort are to one another. It uses information theory to quantify the significance of shared phenotypic features (such as Human Phenotype Ontology terms) by their Information Content (IC), so the rarer a shared phenotype is, the more weight it carries in the similarity score.
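The IC weighting described above can be sketched in a few lines of Rust. This is a minimal illustration, not this crate's API: it assumes IC is the negative base-2 logarithm of how often a term is annotated, and the function names `information_content` and `shared_ic` are hypothetical. The crate's exact formula may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Information content of a term annotated to `annotated` out of `total`
/// records: IC = -log2(annotated / total). Rare terms get large values.
/// (Assumed formula, for illustration only.)
fn information_content(annotated: usize, total: usize) -> f64 {
    assert!(annotated > 0 && annotated <= total);
    -((annotated as f64) / (total as f64)).log2()
}

/// Sum the IC of the terms two patients have in common.
/// Terms missing from the `ic` table contribute nothing.
fn shared_ic(a: &HashSet<&str>, b: &HashSet<&str>, ic: &HashMap<&str, f64>) -> f64 {
    a.intersection(b)
        .map(|term| ic.get(*term).copied().unwrap_or(0.0))
        .sum()
}
```

Under this assumption, a term seen in 1 of 1024 records carries 10 bits of information, while a term seen in half of them carries 1 bit, so a single rare shared phenotype can outweigh several common ones.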

Built with speed and scalability in mind, this crate utilizes parallel processing, sparse matrix representation, and efficient caching to compute pairwise similarity matrices for large patient cohorts rapidly.

🚀 Key Features

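Information-Theory Driven: Uses Information Content (IC) to weight rare phenotypes higher than common ones. The algorithm automatically propagates annotations up the ontology hierarchy, so a patient annotated with a specific term also counts as annotated with all of that term's ancestors.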
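Propagating annotations upward amounts to taking the ancestor closure of a patient's term set. The sketch below shows one way to do that, assuming terms are plain integer IDs and the ontology is given as a child-to-parents map; both the representation and the function name `with_ancestors` are hypothetical and not taken from this crate.

```rust
use std::collections::{HashMap, HashSet};

/// Return the given terms together with all of their ancestors,
/// following child -> parents edges until the roots are reached.
fn with_ancestors(terms: &[u32], parents: &HashMap<u32, Vec<u32>>) -> HashSet<u32> {
    let mut closure = HashSet::new();
    let mut stack: Vec<u32> = terms.to_vec();
    while let Some(term) = stack.pop() {
        // `insert` returns false for terms already visited, so an ancestor
        // reachable through several paths is expanded only once.
        if closure.insert(term) {
            if let Some(ps) = parents.get(&term) {
                stack.extend(ps.iter().copied());
            }
        }
    }
    closure
}
```

Two patients annotated with different specific terms can then still share the more general ancestors of those terms, which is what lets related but non-identical phenotypes contribute to the score.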

High-Performance Parallelism: Uses rayon to compute the Cartesian product of patient pairs across the cohort in parallel.

Efficient Memory Usage: Uses sprs to store the similarity scores as a sparse matrix in coordinate format.
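The shape of a parallel pairwise computation that emits coordinate-format entries can be sketched with the standard library alone. To stay dependency-free, this example swaps in `std::thread::scope` for rayon and plain `(row, col, value)` tuples for sprs types, and uses a toy score (the count of shared terms) in place of an IC-weighted one. All names here are hypothetical and not part of this crate.

```rust
use std::collections::HashSet;
use std::thread;

/// One non-zero entry of the similarity matrix in coordinate (COO) form.
type Triplet = (usize, usize, f64);

/// Toy score: number of shared terms (stand-in for an IC-weighted score).
fn score(a: &HashSet<u32>, b: &HashSet<u32>) -> f64 {
    a.intersection(b).count() as f64
}

/// Score every unordered patient pair (i < j), splitting the rows across
/// `workers` scoped threads and keeping only non-zero entries.
fn pairwise_triplets(patients: &[HashSet<u32>], workers: usize) -> Vec<Triplet> {
    let n = patients.len();
    let rows: Vec<usize> = (0..n).collect();
    let workers = workers.max(1);
    let chunk = ((n + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk)
            .map(|block| {
                s.spawn(move || {
                    let mut local: Vec<Triplet> = Vec::new();
                    for &i in block {
                        for j in (i + 1)..n {
                            let v = score(&patients[i], &patients[j]);
                            if v > 0.0 {
                                local.push((i, j, v));
                            }
                        }
                    }
                    local
                })
            })
            .collect();
        // Join in spawn order so the result stays sorted by row.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<Triplet>>()
    })
}
```

Storing only the non-zero pairs is what makes a sparse representation pay off: in a large cohort, most patient pairs may share few or no informative terms.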

Smart Caching: Uses moka to cache expensive ancestor lookups in a thread-safe manner, significantly reducing redundant ontology traversals.

Extensible Architecture: Designed around the OntologyTraversal trait, allowing you to plug in different ontology backends seamlessly. It natively includes an adapter for ontolius.
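The combination of a pluggable backend and cached ancestor lookups can be illustrated as follows. This is a sketch under stated assumptions: the trait `AncestorSource` is hypothetical (the methods of the crate's `OntologyTraversal` trait are not shown here and may look different), and a `Mutex<HashMap>` stands in for moka so the example needs no external crates.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical backend trait; the crate's `OntologyTraversal` may differ.
trait AncestorSource {
    fn ancestors(&self, term: u32) -> HashSet<u32>;
}

/// Wraps any backend and memoizes its ancestor sets behind a mutex.
struct Cached<B: AncestorSource> {
    backend: B,
    cache: Mutex<HashMap<u32, Arc<HashSet<u32>>>>,
}

impl<B: AncestorSource> Cached<B> {
    fn new(backend: B) -> Self {
        Self { backend, cache: Mutex::new(HashMap::new()) }
    }

    /// Return the cached ancestor set, asking the backend only on a miss.
    /// The lock is held during the lookup, which keeps the sketch simple.
    fn ancestors(&self, term: u32) -> Arc<HashSet<u32>> {
        let mut cache = self.cache.lock().expect("cache mutex poisoned");
        cache
            .entry(term)
            .or_insert_with(|| Arc::new(self.backend.ancestors(term)))
            .clone()
    }
}

/// Toy backend: a linear chain in which term `n` has ancestors `0..n`.
/// `calls` counts how often the backend is actually consulted.
struct Chain {
    calls: AtomicUsize,
}

impl AncestorSource for Chain {
    fn ancestors(&self, term: u32) -> HashSet<u32> {
        self.calls.fetch_add(1, Ordering::Relaxed);
        (0..term).collect()
    }
}
```

Because the wrapper depends only on the trait, a different ontology backend can be swapped in without touching the caching logic, and repeated lookups of the same term reuse one shared `Arc` instead of traversing the ontology again.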

Credit

Original Publication by Karthik A. Jagadeesh et al. here

Dependencies

~12MB
~188K SLoC