This repository hosts tooling for working with mixed-type temporal signals and phage module annotations. It contains two complementary components:
- Hybrid DS-HDP-HMM sampler – a disentangled sticky HDP-HMM implementation
  capable of handling Gaussian, categorical and beta-count observations through
  a unified emission model. The implementation lives in the `hybrid/` package
  and exposes the `HybridEmissionModel`, `HybridStateStatistics` and
  `DSHDPHMMHybridSampler` classes for probabilistic sequence modelling.
- Fused Gromov–Wasserstein (FGW) comparison pipeline – utilities to align
  and compare phage genome module annotations. The `hybrid.fgw_pipeline` module
  builds the cross-module affinity matrix, distance matrices and transport plan
  used for the FGW alignment. A thin CLI wrapper is provided in
  `hybrid.compare_fgw`.

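
For readers new to the "disentangled sticky" idea behind the sampler, the following is a minimal generative sketch using only the standard library. It is not the repository's sampler and does not use its classes. It assumes a finite (weak-limit) approximation with a fixed number of states, and the names `simulate_ds_states`, `rho`, `gamma` and `concentration` are hypothetical. The point it illustrates is that each state has its own self-persistence probability, drawn separately from the distribution over which state to move to next.

```python
import random


def sample_dirichlet(shapes, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(max(s, 1e-9), 1.0) for s in shapes]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_ds_states(num_states, length, rho=(8.0, 2.0),
                       gamma=4.0, concentration=5.0, seed=0):
    """Simulate a state path from a finite disentangled-sticky prior.

    All hyperparameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Global state weights shared by every row of the transition matrix.
    beta = sample_dirichlet([gamma / num_states] * num_states, rng)
    # Per-state self-persistence probability, independent of the row below.
    kappa = [rng.betavariate(*rho) for _ in range(num_states)]
    # Per-state distribution over the next state when the chain does not stick.
    pi_bar = [sample_dirichlet([concentration * b for b in beta], rng)
              for _ in range(num_states)]

    states = [rng.choices(range(num_states), weights=beta)[0]]
    for _ in range(length - 1):
        prev = states[-1]
        if rng.random() < kappa[prev]:
            states.append(prev)  # stick to the current state
        else:
            states.append(rng.choices(range(num_states), weights=pi_bar[prev])[0])
    return states


if __name__ == "__main__":
    print(simulate_ds_states(num_states=4, length=30, seed=1))
```

Raising the first entry of `rho` relative to the second makes every state more persistent without changing how the chain moves when it does switch.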
The repository also ships a small collection of annotated example genomes under
`phages-example-datas/` that can be used to exercise the pipeline.

- Python 3.9 or newer
- NumPy
- SciPy
- Matplotlib
- POT (`pip install pot`) for the FGW solver
- pytest for running the automated tests (optional)

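
The package names above do not all match their import names: POT is imported as `ot`. A small sketch for checking that the listed dependencies can be imported in the active environment is shown below. The helper name `missing_modules` is hypothetical and not part of this repository.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (POT installs the `ot` module).
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "ot"]
OPTIONAL_MODULES = ["pytest"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED_MODULES), ("optional", OPTIONAL_MODULES)):
        missing = missing_modules(names)
        status = "all present" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```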
A virtual environment is highly recommended:

    python -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install numpy scipy matplotlib pot pytest

| Path | Description |
|---|---|
| `hybrid/` | Hybrid emission DS-HDP-HMM sampler, FGW comparison utilities and the CLI entry point. |
| `phages-example-datas/` | Example phage module annotations (`*_segments.tsv`) and optional gene annotations (`*_genes.tsv`). |
| `hybrid/tests/` | Pytest-based regression test that exercises the FGW pipeline end-to-end. |

The command line interface wraps compare_phages_fgw and materialises all
artefacts (transport plan, affinity matrices, report and plots) in an output
directory. A minimal example using the bundled data looks as follows:

    python -m hybrid.compare_fgw \
      --a phages-example-datas/GU988610.2/GU988610.2_segments.tsv \
      --b phages-example-datas/IMGVR_UViG_2684623197_000002/IMGVR_UViG_2684623197_000002_segments.tsv \
      --out /tmp/fgw-comparison

The command produces the following files inside `/tmp/fgw-comparison`:
- `W.npy`, `T.npy`, `D_A.npy`, `D_B.npy` – NumPy arrays describing the feature matrix, optimal transport plan and intra-phage distances.
- `fgwdistance.json`, `p.json`, `q.json` – JSON summaries of the numerical outputs.
- `heatmap.png` – a heatmap visualising the affinity matrix and transport plan.
- `report.json` – a structured report containing coverage statistics, runtime metadata and the candidate offsets explored during matching.
- `cli_summary.json` – a short JSON blob mirroring the return value of the API.

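
When scripting around the CLI, it can help to confirm that a run left all of the files listed above in place. The following sketch does that with the standard library only. The helper `missing_artefacts` is hypothetical, and the file names are copied from the list above exactly as written there.

```python
from pathlib import Path

# File names taken verbatim from the list of CLI outputs above.
EXPECTED_ARTEFACTS = (
    "W.npy", "T.npy", "D_A.npy", "D_B.npy",
    "fgwdistance.json", "p.json", "q.json",
    "heatmap.png", "report.json", "cli_summary.json",
)


def missing_artefacts(out_dir):
    """Return the expected output files that are absent from out_dir."""
    out = Path(out_dir)
    return [name for name in EXPECTED_ARTEFACTS if not (out / name).is_file()]


if __name__ == "__main__":
    missing = missing_artefacts("/tmp/fgw-comparison")
    print("all artefacts present" if not missing else f"missing: {missing}")
```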
For programmatic use you can import the API directly:

    from hybrid.fgw_pipeline import compare_phages_fgw

    result = compare_phages_fgw(
        "phages-example-datas/GU988610.2/GU988610.2_segments.tsv",
        "phages-example-datas/IMGVR_UViG_2684623197_000002/IMGVR_UViG_2684623197_000002_segments.tsv",
        "./outputs",
        method="voting",
        alpha=0.6,
        reg=1e-2,
        max_iter=200,
        tol=1e-6,
    )
    print(result["fgw_distance"])

Refer to the function docstrings in `hybrid/fgw_pipeline.py` for a detailed
description of all parameters.

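
To give some intuition for a trade-off parameter such as `alpha`, the sketch below evaluates the standard square-loss FGW objective for a given transport plan using NumPy. It follows the convention used by POT, where `alpha` weights the structure term and `1 - alpha` the feature term. Whether `compare_phages_fgw` uses exactly this convention, and how it turns the affinity matrix into a feature cost, are assumptions here, so the docstrings remain the authority. The function name `fgw_objective` and the toy matrices are illustrative only.

```python
import numpy as np


def fgw_objective(M, C1, C2, T, alpha):
    """Square-loss FGW objective for a fixed transport plan T.

    M  : (n, m) feature cost between items of A and items of B
    C1 : (n, n) intra-A distances, C2 : (m, m) intra-B distances
    T  : (n, m) transport plan
    """
    M, C1, C2, T = (np.asarray(x, dtype=float) for x in (M, C1, C2, T))
    # Feature term: cost of moving mass between dissimilar items.
    feature_term = float(np.sum(M * T))
    # Structure term: sum over i, j, k, l of (C1[i, k] - C2[j, l])**2 * T[i, j] * T[k, l].
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    pair_mass = T[:, :, None, None] * T[None, None, :, :]
    structure_term = float(np.sum(diff ** 2 * pair_mass))
    return (1.0 - alpha) * feature_term + alpha * structure_term


if __name__ == "__main__":
    # Toy example: two items on each side, matched one-to-one.
    M = [[1.0, 0.0], [0.0, 1.0]]
    C1 = [[0.0, 1.0], [1.0, 0.0]]
    C2 = [[0.0, 2.0], [2.0, 0.0]]
    T = np.eye(2) / 2
    print(fgw_objective(M, C1, C2, T, alpha=0.6))
```

The four-dimensional broadcast is only practical for small matrices; it is written this way to mirror the formula rather than for speed.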
After installing the dependencies you can verify the pipeline end-to-end:

    pytest hybrid/tests/test_fgw_pipeline.py

The test downloads no external resources; it reuses the data shipped with the repository and writes temporary artefacts into a pytest-managed directory.

If you use this code or derive ideas from it in academic work, please cite:
- Zhou, D., Gao, Y., Paninski, L. (2020). Disentangled sticky hierarchical Dirichlet process hidden Markov model. ECML. arXiv:2004.03019