Phylotrajectories.jl infers cell-type phylogenies from single-cell
clonotype-by-cell-type count matrices. It treats clonotype frequencies
as an Ornstein–Uhlenbeck (OU) process (Brownian motion with
mean reversion towards an equilibrium) diffusing along an unknown tree
of cell phenotypes, and samples the joint posterior over topology,
branch lengths and OU parameters via Metropolis-Hastings MCMC.
See the docs for the full reference.
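
For intuition about the mean-reverting dynamics described above, here is a minimal Euler–Maruyama simulation of a one-dimensional OU process. It is only a sketch and not the package's implementation: `simulate_ou`, `sigma` and `dt` are illustrative names, and how the package parameterises the noise term is an assumption left open here.

```julia
using Random

# Euler-Maruyama discretisation of dX = theta * (mu - X) dt + sigma dW.
# All names are illustrative; none of this is part of the package API.
function simulate_ou(rng::AbstractRNG, x0::Float64; mu::Float64, theta::Float64,
                     sigma::Float64, dt::Float64, n_steps::Int)
    path = Vector{Float64}(undef, n_steps + 1)
    path[1] = x0
    for i in 1:n_steps
        drift = theta * (mu - path[i]) * dt    # pull towards the equilibrium mu
        noise = sigma * sqrt(dt) * randn(rng)  # scaled Brownian increment
        path[i + 1] = path[i] + drift + noise
    end
    return path
end

rng = MersenneTwister(1)
path = simulate_ou(rng, 5.0; mu = 1.5, theta = 0.1, sigma = 1.0,
                   dt = 0.01, n_steps = 1_000)
```

Setting `sigma = 0.0` removes the noise and leaves a pure exponential relaxation towards `mu`, which makes the role of `theta` easy to see.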

using Pkg
Pkg.add(url = "https://github.com/MurrellGroup/Phylotrajectories.jl")

The repository ships with a tiny simulated count matrix under
examples/data/simulated_clone_data.csv and a runnable end-to-end
notebook at examples/usage_example.ipynb.

using Phylotrajectories, StatsBase
# Wide-form CSV: rows are clonotypes, columns are cell-type subsets.
_, cluster_names, _, count_matrix = import_count_matrix(
    "examples/data/simulated_clone_data.csv",
)
# Subsample for a quick demo
sampled_indices = sample(1:size(count_matrix, 2), 750; replace = false)
count_matrix = count_matrix[:, sampled_indices]
plot_init, init_tree, trees, LLs, models, root_ps, upd =
    tree_inference(
        OUContinuousModel(burn_in = 5_000, sample_interval = 100, n_samples = 100),
        cluster_names, count_matrix;
        eqmu = 1.5, eqtheta = 0.1, v = 1.0, d = 0.5, g = 0.5,
    )
ladderize!.(trees)
hip, node2logcred, node2support = HIPSTR(trees; getcred = true, getsupport = true)

The notebook also produces the diagnostic dashboard, a HIPSTR credibility-annotated tree, and a Newick of the consensus.

import_count_matrix accepts two file shapes:

clono_info, cluster_names, cluster_sizes, count_matrix = import_count_matrix(
    "data/clone_data_HDM.tsv",
    :Clonotype, :cell_types, :TRB_cdr3aa,
    cluster_filters = ["Proliferating"],
)

:Clonotype, :cell_types and :TRB_cdr3aa are the column names that hold
the clonotype, cell-type label and TRB CDR3 sequence respectively. The
cluster_filters keyword drops named clusters before pivoting.
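
To make the filter-then-pivot step concrete, here is a small sketch in plain Julia. It is not the package's code: `pivot_counts` and the toy `records` are hypothetical, and the orientation chosen (rows are clonotypes, columns are cell types) is an assumption, as is the choice to drop the filtered cells entirely.

```julia
# Hypothetical long-form records: one (clonotype, cell_type) pair per cell.
records = [("c1", "Naive"), ("c1", "Effector"), ("c2", "Naive"),
           ("c1", "Naive"), ("c3", "Proliferating")]

# Drop cells from filtered clusters, then count cells per
# (clonotype, cell type) pair into a dense integer matrix.
function pivot_counts(records; cluster_filters = String[])
    kept = [r for r in records if !(r[2] in cluster_filters)]
    clonotypes = sort(unique(first.(kept)))
    clusters = sort(unique(last.(kept)))
    row = Dict(c => i for (i, c) in enumerate(clonotypes))
    col = Dict(c => j for (j, c) in enumerate(clusters))
    counts = zeros(Int, length(clonotypes), length(clusters))
    for (clono, cluster) in kept
        counts[row[clono], col[cluster]] += 1
    end
    return clonotypes, clusters, counts
end

clonotypes, clusters, counts =
    pivot_counts(records; cluster_filters = ["Proliferating"])
```

In this sketch a clonotype seen only in a filtered cluster (here `c3`) disappears from the matrix; whether import_count_matrix behaves the same way is not stated in this README.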

clono_info, cluster_names, cluster_sizes, count_matrix =
    import_count_matrix("data/Clone_counts_HDM.csv")

plot_init, init_tree, trees, LLs, models, root_ps, upd =
    tree_inference(
        OUContinuousModel(burn_in = 2_000, sample_interval = 50, n_samples = 200),
        cluster_names, count_matrix;
        eqmu = 1.5, eqtheta = 0.1, v = 1.0, d = 0.5, g = 0.5,
    )

The returned tuple is:

- plot_init: Plots.Plot of the starting tree (handy for sanity checks)
- init_tree: FelNode actually used to start MCMC
- trees: Vector{FelNode} of posterior topology samples
- LLs: log-likelihoods at the sampled points
- models: sampled (θ, v, μ) triples
- root_ps: sampled root state distributions
- upd: the Update object holding acceptance ratios and proposal stats

See the Models & parameters
reference for every available knob on OUContinuousModel.
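
The three sampler settings passed to OUContinuousModel above (burn_in, sample_interval, n_samples) can be pictured with a textbook random-walk Metropolis-Hastings loop. The sketch below is generic and is not the package's sampler: `rw_metropolis`, `step` and the standard-normal target are illustrative, and the assumption that the chain runs for `burn_in + sample_interval * n_samples` iterations is mine rather than something this README states.

```julia
using Random

# Generic random-walk Metropolis-Hastings with burn-in and thinning.
# Illustrative only; names and iteration count are assumptions.
function rw_metropolis(rng, logdensity, x0; burn_in, sample_interval,
                       n_samples, step = 1.0)
    x, lp = x0, logdensity(x0)
    samples = Float64[]
    lls = Float64[]
    for iter in 1:(burn_in + sample_interval * n_samples)
        proposal = x + step * randn(rng)
        lp_new = logdensity(proposal)
        # Symmetric proposal, so the log acceptance ratio is lp_new - lp.
        if log(rand(rng)) < lp_new - lp
            x, lp = proposal, lp_new
        end
        # After burn-in, keep every sample_interval-th state.
        if iter > burn_in && (iter - burn_in) % sample_interval == 0
            push!(samples, x)
            push!(lls, lp)
        end
    end
    return samples, lls
end

rng = MersenneTwister(42)
samples, lls = rw_metropolis(rng, x -> -0.5 * x^2, 0.0;
                             burn_in = 2_000, sample_interval = 50,
                             n_samples = 200)
```

Here `samples` and `lls` play the roles that `models` and `LLs` play in the returned tuple: one retained state and one log-density value per kept iteration.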

src/                # package source
  inference/        # OU MCMC samplers
  viz/              # tree-on-UMAP plotting helpers
  utils/            # post-processing helpers (HIPSTR, clone matrices, metrics)
  importing.jl      # import_count_matrix
  simulations.jl    # sim_count_matrix
docs/               # Documenter source
examples/           # bundled simulated dataset + Jupyter notebook
test/               # unit tests