Phylotrajectories

Phylotrajectories.jl infers cell-type phylogenies from single-cell clonotype-by-cell-type count matrices. It models clonotype frequencies as an Ornstein–Uhlenbeck (OU) process, i.e. Brownian motion with mean reversion towards an equilibrium, diffusing along an unknown tree of cell phenotypes, and it samples the joint posterior over topology, branch lengths, and OU parameters with Metropolis-Hastings MCMC.
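
For intuition about the mean-reverting behaviour of an OU process, here is a minimal standalone sketch that simulates one along a single time axis (not along a tree). It does not use the package. The function name simulate_ou and the parameterization dX = θ(μ − X)dt + √v dW are assumptions made for illustration, and the way the package's eqmu, eqtheta and v map onto μ, θ and v here is likewise assumed.

```julia
using Random, Statistics

# Exact one-step OU transition under the assumed parameterization
# dX = θ(μ - X)dt + sqrt(v) dW: the state decays towards μ at rate θ.
function simulate_ou(x0, μ, θ, v, dt, n_steps; rng = Random.default_rng())
    path = Vector{Float64}(undef, n_steps + 1)
    path[1] = x0
    decay = exp(-θ * dt)
    # Standard deviation of the Gaussian increment over one step of length dt.
    step_sd = sqrt(v / (2 * θ) * (1 - exp(-2 * θ * dt)))
    for i in 1:n_steps
        path[i + 1] = μ + (path[i] - μ) * decay + step_sd * randn(rng)
    end
    return path
end

path = simulate_ou(5.0, 1.5, 0.1, 1.0, 0.1, 20_000; rng = MersenneTwister(1))
# After the initial transient the path fluctuates around μ, with
# stationary variance v / (2θ) under this parameterization.
println("late-path mean ≈ ", mean(path[5_000:end]))
```

Setting v = 0 removes the noise and leaves a pure exponential decay towards μ, which is a quick way to check the mean-reversion term in isolation.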

See the docs for the full reference.

Installation

using Pkg
Pkg.add(url = "https://github.com/MurrellGroup/Phylotrajectories.jl")

Quick start on the bundled simulated dataset

The repository ships with a tiny simulated count matrix under examples/data/simulated_clone_data.csv and a runnable end-to-end notebook at examples/usage_example.ipynb.

using Phylotrajectories, StatsBase

# Wide-form CSV: rows are clonotypes, columns are cell-type subsets.
_, cluster_names, _, count_matrix = import_count_matrix(
    "examples/data/simulated_clone_data.csv",
)

# Subsample 750 columns of count_matrix (without replacement) for a quick demo
sampled_indices = sample(1:size(count_matrix, 2), 750; replace = false)
count_matrix = count_matrix[:, sampled_indices]

plot_init, init_tree, trees, LLs, models, root_ps, upd =
    tree_inference(
        OUContinuousModel(burn_in = 5_000, sample_interval = 100, n_samples = 100),
        cluster_names, count_matrix;
        eqmu = 1.5, eqtheta = 0.1, v = 1.0, d = 0.5, g = 0.5,
    )

ladderize!.(trees)
hip, node2logcred, node2support = HIPSTR(trees; getcred = true, getsupport = true)

The notebook also produces the diagnostic dashboard, a HIPSTR credibility-annotated tree, and a Newick string of the consensus tree.
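
Posterior tree samples are commonly summarized by clade support, the fraction of sampled trees that contain a given clade. The sketch below computes that quantity for toy rooted trees written as nested tuples. The representation, the helper names (leaves, collect_clades!, clade_support) and the labels "A", "B", "C" are all hypothetical; this is not the package's FelNode type, and whether HIPSTR's node2support follows exactly this definition is an assumption.

```julia
# A toy tree is either a leaf name (String) or a Tuple of subtrees.
leaves(t::String) = [t]
leaves(t::Tuple) = sort!(vcat(map(leaves, t)...))

# Record the sorted leaf set under every internal node of one tree.
collect_clades!(acc::Set{Vector{String}}, ::String) = acc
function collect_clades!(acc::Set{Vector{String}}, t::Tuple)
    push!(acc, leaves(t))
    foreach(child -> collect_clades!(acc, child), t)
    return acc
end

# Fraction of trees in which each clade appears.
function clade_support(trees)
    counts = Dict{Vector{String}, Int}()
    for t in trees
        for clade in collect_clades!(Set{Vector{String}}(), t)
            counts[clade] = get(counts, clade, 0) + 1
        end
    end
    return Dict(clade => n / length(trees) for (clade, n) in counts)
end

# Three hypothetical posterior samples over leaves A, B, C.
toy_trees = [
    (("A", "B"), "C"),
    (("A", "B"), "C"),
    (("A", "C"), "B"),
]
support = clade_support(toy_trees)
println(support[["A", "B"]])
```

Here the clade {A, B} appears in two of the three toy trees, while the root clade containing all leaves appears in every tree.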

Importing data

import_count_matrix accepts two file shapes:

Long-form (one row per cell)

clono_info, cluster_names, cluster_sizes, count_matrix = import_count_matrix(
    "data/clone_data_HDM.tsv",
    :Clonotype, :cell_types, :TRB_cdr3aa,
    cluster_filters = ["Proliferating"],
)

:Clonotype, :cell_types and :TRB_cdr3aa are the column names that hold the clonotype, cell-type label and TRB CDR3 sequence respectively. The cluster_filters keyword drops named clusters before pivoting.
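
To make "dropping clusters before pivoting" concrete, here is a small standalone sketch that turns long-form records into a clonotype-by-cluster count matrix. It does not call import_count_matrix. The function pivot_counts, the toy records and the matrix orientation (rows are clonotypes, columns are clusters) are illustrative assumptions, and the package's own output may be laid out differently.

```julia
# Hypothetical long-form records: one (clonotype, cell_type) pair per cell.
cells = [
    ("clone1", "Th1"), ("clone1", "Th1"), ("clone1", "Th2"),
    ("clone2", "Th2"), ("clone2", "Proliferating"),
    ("clone3", "Th1"),
]

function pivot_counts(cells; cluster_filters = String[])
    # Drop cells whose cluster is filtered out, then count the rest.
    kept = [c for c in cells if !(c[2] in cluster_filters)]
    clonotypes = sort!(unique(first.(kept)))
    clusters = sort!(unique(last.(kept)))
    row = Dict(name => i for (i, name) in enumerate(clonotypes))
    col = Dict(name => j for (j, name) in enumerate(clusters))
    counts = zeros(Int, length(clonotypes), length(clusters))
    for (clone, cluster) in kept
        counts[row[clone], col[cluster]] += 1
    end
    return clonotypes, clusters, counts
end

clonotypes, clusters, counts =
    pivot_counts(cells; cluster_filters = ["Proliferating"])
display(counts)
```

With the "Proliferating" cluster filtered out, only the Th1 and Th2 columns remain, and clone2 keeps a single Th2 cell.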

Wide-form (already-pivoted CSV)

clono_info, cluster_names, cluster_sizes, count_matrix =
    import_count_matrix("data/Clone_counts_HDM.csv")

Performing inference

plot_init, init_tree, trees, LLs, models, root_ps, upd =
    tree_inference(
        OUContinuousModel(burn_in = 2_000, sample_interval = 50, n_samples = 200),
        cluster_names, count_matrix;
        eqmu = 1.5, eqtheta = 0.1, v = 1.0, d = 0.5, g = 0.5,
    )

The returned tuple is:

  • plot_init: a Plots.Plot of the starting tree (handy for sanity checks),
  • init_tree: the FelNode actually used to start MCMC,
  • trees: a Vector{FelNode} of posterior topology samples,
  • LLs: log-likelihoods at the sampled points,
  • models: sampled (θ, v, μ) triples,
  • root_ps: sampled root state distributions,
  • upd: the Update object holding acceptance ratios and proposal stats.

See the Models & parameters reference for every available knob on OUContinuousModel.
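
As a rough illustration of how burn-in, thinning and sample count conventionally interact in Metropolis-Hastings, the sketch below runs a random-walk sampler on a one-dimensional standard normal target. It is a generic textbook sampler, not the package's tree sampler. The function mh_sample and its step argument are hypothetical, and the assumption that a run takes burn_in + sample_interval * n_samples iterations may not match how tree_inference counts them.

```julia
using Random

# Random-walk Metropolis-Hastings on a scalar target, keeping every
# `sample_interval`-th state once `burn_in` iterations have passed.
function mh_sample(logtarget, x0; burn_in, sample_interval, n_samples,
                   step = 1.0, rng = Random.default_rng())
    x, lp = x0, logtarget(x0)
    samples = Float64[]
    accepted = 0
    total = burn_in + sample_interval * n_samples
    for iter in 1:total
        proposal = x + step * randn(rng)
        lp_new = logtarget(proposal)
        # Symmetric proposal: accept with probability min(1, target ratio).
        if log(rand(rng)) < lp_new - lp
            x, lp = proposal, lp_new
            accepted += 1
        end
        if iter > burn_in && (iter - burn_in) % sample_interval == 0
            push!(samples, x)
        end
    end
    return samples, accepted / total
end

logtarget(x) = -0.5 * x^2   # standard normal, up to an additive constant
samples, acceptance = mh_sample(logtarget, 3.0;
    burn_in = 2_000, sample_interval = 50, n_samples = 200,
    rng = MersenneTwister(42))
println(length(samples), " samples kept")
```

Under this convention, the settings used in the inference call above would correspond to 2_000 + 50 * 200 = 12_000 iterations, of which 200 states are retained.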

Repository layout

src/                # package source
  inference/        # OU MCMC samplers
  viz/              # tree-on-UMAP plotting helpers
  utils/            # post-processing helpers (HIPSTR, clone matrices, metrics)
  importing.jl      # import_count_matrix
  simulations.jl    # sim_count_matrix
docs/               # Documenter source
examples/           # bundled simulated dataset + Jupyter notebook
test/               # unit tests
