52.3194° N · 4.9421° E · v2.0
§ 01 · Overview
2.0M points · 14ms / query

Trace any path through millions of points.

Like UMAP, but neighbors are preserved exactly. TMAP builds a minimum spanning tree over your data, so you can compute shortest paths, measure tree distance, and extract sub-trees. These are queries that scatter-plot embeddings can’t answer.

Metric
jaccard · euclidean · cosine · custom
Data
molecules · proteins · embeddings · images · text
License
MIT · open source
egfr_sar.tmap
live · drag to explore
10,482 molecules · MHFP6
t-MAP v2.0
Figure 1. EGFR inhibitor SAR. Each node is a molecule; nearby nodes share structural features.
§ 02  ·  How it works

Four steps from vectors to a navigable map.

Step 01
vectors
A 2D array - fingerprints, embeddings, counts, anything numeric.
Step 02
kNN graph
Each point connects to its k nearest neighbors under the chosen metric.
Step 03
spanning tree
Prim's algorithm keeps just enough edges to connect every node at minimum total weight - no cycles.
Step 04
2D layout
Force-directed layout of the tree in two navigable dimensions.
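
Steps 01 to 03 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not TMAP's implementation: the neighbor search is brute force with Euclidean distance rather than an approximate index, and the helper names `knn_graph` and `prim_mst` are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Brute-force kNN graph as {node: {neighbor: distance}}, kept undirected."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def prim_mst(graph, start=0):
    """Prim's algorithm: grow the tree from `start`, always taking the cheapest
    edge that reaches a node not yet in the tree."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    edges = []
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        edges.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return edges


# Step 01: vectors (toy 2D points; real inputs would be fingerprints or embeddings)
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (5.0, 1.0)]
# Step 02 and Step 03: kNN graph, then spanning tree
tree = prim_mst(knn_graph(points, k=2))
print(tree)
```

If the kNN graph is disconnected, this sketch only spans the component that contains `start`; a larger `k` or one tree per component would be needed in that case.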
→ The idea

TMAP builds a minimum spanning tree over a nearest-neighbor graph, then lays that tree out in two dimensions. Unlike t-SNE or UMAP, neighbors are preserved exactly: they’re connected by tree edges, so there is no lossy projection from the high-dimensional space.

The graph is the source of truth; the layout is one way of looking at it. Change the random seed and you get a different drawing of the same tree.
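
To make the seed point concrete, here is a toy force-directed layout in plain Python. It is a sketch under assumptions, not TMAP's layout engine: `spring_layout` and its force model (pairwise repulsion, attraction along tree edges) are illustrative choices. The edge list goes in unchanged both times; only the starting positions depend on the seed.

```python
import math
import random


def spring_layout(edges, n_nodes, seed, iterations=200, max_step=0.05):
    """Toy force-directed layout: tree edges attract, all node pairs repel."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_nodes)]
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n_nodes)]
        # Repulsion between every pair, magnitude roughly 1 / distance.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[i][0] += dx / d2
                force[i][1] += dy / d2
                force[j][0] -= dx / d2
                force[j][1] -= dy / d2
        # Attraction along tree edges, magnitude proportional to distance.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            force[u][0] -= dx
            force[u][1] -= dy
            force[v][0] += dx
            force[v][1] += dy
        # Move each node along its net force, capped at max_step per iteration.
        for i in range(n_nodes):
            fx, fy = force[i]
            norm = math.hypot(fx, fy)
            scale = min(max_step, norm) / norm if norm > 0 else 0.0
            pos[i][0] += fx * scale
            pos[i][1] += fy * scale
    return pos


edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # the tree: identical for both layouts
layout_a = spring_layout(edges, n_nodes=5, seed=1)
layout_b = spring_layout(edges, n_nodes=5, seed=2)
print(layout_a != layout_b)  # the coordinates differ, the edges do not
```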

→ The pipeline

The pipeline has four stages: approximate nearest neighbors (MinHash LSH for binary inputs, cosine or euclidean distance for continuous ones), a sparse kNN graph, Prim’s algorithm for the spanning tree, and a force-directed layout.
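
The MinHash idea behind the binary-input stage can be illustrated with a generic signature scheme. This is a sketch, assuming seeded BLAKE2 hashes stand in for random permutations; it is not MHFP6 and not TMAP's LSH index, and the function names are hypothetical. The fraction of signature positions where two sets agree estimates their Jaccard similarity.

```python
import hashlib


def minhash_signature(items, num_perm=256):
    """For each seeded hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_perm):
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        f"{seed}:{item}".encode(), digest_size=8
                    ).digest(),
                    "big",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Share of positions where the two signatures hold the same minimum."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    return len(a & b) / len(a | b)


# Two toy "fingerprints": sets of feature ids with 40 shared out of 120 total.
a = set(range(0, 80))
b = set(range(40, 120))
sig_a, sig_b = minhash_signature(a), minhash_signature(b)
print(exact_jaccard(a, b), estimate_jaccard(sig_a, sig_b))
```

The estimate tightens as `num_perm` grows; an LSH index would then bucket signatures so that only likely neighbors are compared.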

Each stage is parallelized and streaming. On a single workstation the full pipeline scales to tens of millions of points - no GPU required. Larger jobs distribute trivially because the kNN step is embarrassingly parallel.

→ Why a tree

Because the result is an actual tree, you can compute exact shortest paths between any two points, measure tree distance, and extract sub-trees for analysis.

These are queries that scatter-plot embeddings can’t answer - and they’re what makes TMAP useful for SAR analysis, lineage tracing, and any task where the route between two points matters as much as the points themselves.
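
The three tree queries can be sketched on a small weighted edge list. This is an illustration in plain Python, not TMAP's API: `build_adjacency`, `tree_path`, `tree_distance`, and `subtree` are hypothetical helpers, and the code assumes the edges form a single connected tree.

```python
from collections import deque


def build_adjacency(edges):
    """Turn (u, v, weight) tree edges into {node: {neighbor: weight}}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def tree_path(adj, source, target):
    """The unique path between two nodes of a tree, found by BFS."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def tree_distance(adj, path):
    """Sum of edge weights along a path."""
    return sum(adj[u][v] for u, v in zip(path, path[1:]))


def subtree(adj, root, blocked):
    """Nodes on root's side of the tree once the edge root-blocked is cut."""
    seen = {root, blocked}
    stack = [root]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {blocked}


edges = [("A", "B", 1.0), ("B", "C", 2.0), ("B", "D", 0.5), ("D", "E", 1.5)]
adj = build_adjacency(edges)
path = tree_path(adj, "A", "E")
print(path, tree_distance(adj, path), subtree(adj, "D", blocked="B"))
```

Because a tree has exactly one path between any two nodes, the BFS result is the shortest path by construction; no weighted search is needed.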

Try it: click any two points in the live demo to trace the path through the tree.
§ 03  ·  The tool

Not just a map - a workbench.

Build a TMAP, then explore it. These are three of the tools you get in the browser. For the full list, see the manual.

Figure 2a · filter panel · interactive
Figure 2a. pIC50 slider and target-class checkboxes dim 42% of 10k molecules without losing tree topology.

Filter by any property, in real time.

Multi-column filter panels - numeric ranges, categoricals, computed expressions. The tree stays visible underneath so you can see how the subset relates to the rest.

Figure 2b · lasso + export · interactive
Figure 2b. 412 molecules lassoed from a kinase-enriched sub-tree; mean pIC50 = 7.4.

Lasso a region, export to CSV.

Draw a free-form selection to pull out any subset of the map. Live summary statistics on the selection; one-click CSV export for downstream analysis in pandas or a notebook.

Figure 2c · jupyter notebook · interactive
Jupyter notebook cell showing TMAP().fit(X) and an inline interactive scatter output
Figure 2c. The entire notebook path: import, fit, show. The output cell is a live, pannable, lasso-able scatter.

Three lines in a notebook.

Sklearn-style API: TMAP().fit(X) and you have a map. Call .show() for an interactive inline view via jupyter-scatter - no HTML export, no extra files.

§ 04  ·  API

One shape, any data.

TMAP 2.0 is a single sklearn-style class: fit, transform, add_points, save, load. It works on anything you can represent as a vector.

Install
pip install tmap2
# Build a TMAP from anything - here, sentence embeddings
from tmap import TMAP
from sentence_transformers import SentenceTransformer

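# documents: a list of strings; topic: one label per document (both defined elsewhere)
enc = SentenceTransformer("all-MiniLM-L6-v2")
X = enc.encode(documents)  # 2D array, one embedding row per document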

model = TMAP(metric="cosine", seed=42).fit(X)
model.plot("docs.html", color=topic)
§ 05  ·  Used by

In the wild.

Case 01 · 2024

Medicinal chemistry SAR exploration

TMAP over 2M internal compounds at a large pharma company - used to identify scaffold hops between discovery programs and pull neighboring series for re-screening.

Case 02 · 2024

Protein fold clustering

Embedding of 50k AlphaFold predictions into a single tree to identify evolutionarily related folds outside the known SCOP classes.

Case 03 · 2023

Natural product diversity

COCONUT database visualization for scaffold diversity analysis in natural-product-inspired compound libraries.

§ 06  ·  Try it

Drive it yourself.

egfr_sar.tmap · 10,482 molecules
live · drag · scroll · lasso
MHFP6 · Jaccard
Shift-click to lasso · / to search
Figure 6. The full interactive viewer. Filter by property, search by ChEMBL ID, lasso a region, inspect any molecule's 2D structure.