22 releases

0.9.0 Apr 5, 2026
0.8.5 Apr 6, 2025
0.7.7 Dec 27, 2024
0.7.5 Sep 20, 2024
0.5.5 Mar 4, 2022

MIT and GPL-3.0 licenses

nwr - NCBI taxonomy/assembly WRangler

nwr is a command line NCBI taxonomy and assembly WRangler.

Install

Current release: 0.9.0

cargo install nwr

# or
cargo install --path . --force # --offline

nwr help

$ nwr help
`nwr` is a command line **N**CBI taxonomy and assembly **WR**angler.

Usage: nwr [COMMAND]

Commands:
  download     Download the latest releases of `taxdump` and assembly reports
  txdb         Init the taxonomy database
  ardb         Init the assembly database
  info         Information of Taxonomy ID(s) or scientific name(s)
  lineage      Output the lineage of the term
  member       List members (of certain ranks) under ancestral term(s)
  append       Append fields of higher ranks to a TSV file
  restrict     Restrict taxonomy terms to ancestral descendants
  common       Output the common tree of terms
  template     Create dirs, data and scripts for a phylogenomic research
  kb           Prints docs (knowledge bases)
  seqdb        Init the seq database
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

Subcommand groups:

* Database
    * download / txdb / ardb
* Taxonomy
    * info / lineage / member / append / restrict / common
* Assembly
    * template / kb / seqdb

Examples

Usage of each command

For practical uses of nwr and other awesome companions, follow this page.

nwr download

nwr txdb

nwr info "Homo sapiens" 4932

nwr lineage "Homo sapiens"
nwr lineage 4932

nwr restrict "Vertebrata" -c 2 -f tests/nwr/taxon.tsv
##sci_name       tax_id
#Human   9606
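
To try `nwr restrict` without the repository's test data, the sketch below builds a small TSV by hand. The file layout (a `#`-prefixed header, names in column 1, Taxonomy IDs in column 2) is an assumption inferred from the output shown above, and the temporary file name is purely illustrative. The `nwr` call is guarded, since it needs both the binary and a taxonomy database created by `nwr txdb`.

```shell
# Toy input (assumed layout): '#' header line, name in column 1, Taxonomy ID in column 2.
taxon_tsv="$(mktemp)"
printf '#sci_name\ttax_id\nHuman\t9606\nYeast\t4932\n' > "${taxon_tsv}"

# Column 2 is the column that `-c 2` points at.
cut -f 2 "${taxon_tsv}"

# Run the filter only when nwr is installed; it also needs a database from `nwr txdb`.
if command -v nwr > /dev/null 2>&1; then
    nwr restrict "Vertebrata" -c 2 -f "${taxon_tsv}" ||
        echo "nwr restrict failed; has 'nwr txdb' been run?" >&2
fi
```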

nwr member "Homo"

nwr append tests/nwr/taxon.tsv -c 2 -r species -r family --id

nwr ardb
nwr ardb --genbank

nwr common "Escherichia coli" 4932 Drosophila_melanogaster 9606 Mus_musculus

# rm ~/.nwr/*.dmp

Development

cargo test --color=always --package nwr --test cli_nwr command_template -- --show-output

# Downloads are slow in debug builds, so build with --release
cargo run --release --bin nwr download

# tests/nwr/
cargo run --bin nwr txdb -d tests/nwr/

cargo run --bin nwr info -d tests/nwr/ --tsv Viruses "Actinophage JHJ-1" "Bacillus phage bg1"

cargo run --bin nwr common -d tests/nwr/ "Actinophage JHJ-1" "Bacillus phage bg1"

cargo run --bin nwr template tests/assembly/Trichoderma.assembly.tsv --ass -o stdout

seqdb

export SPECIES="$HOME/data/Archaea/Protein/Sulfolobus_acidocaldarius"

cargo run --bin nwr seqdb -d ${SPECIES} --init --strain

cargo run --bin nwr seqdb -d ${SPECIES} \
    --size <(
        hnsm size ${SPECIES}/pro.fa.gz
    ) \
    --clust

cargo run --bin nwr seqdb -d ${SPECIES} \
    --anno <(
        gzip -dcf "${SPECIES}"/anno.tsv.gz
    ) \
    --asmseq <(
        gzip -dcf "${SPECIES}"/asmseq.tsv.gz
    )

cargo run --bin nwr seqdb -d ${SPECIES} --rep f1="${SPECIES}"/fam88_cluster.tsv

echo "
    SELECT
        *
    FROM asm
    WHERE 1=1
    " |
    sqlite3 -tabs ${SPECIES}/seq.sqlite

echo "
    SELECT
        COUNT(distinct asm_seq.asm_id)
    FROM asm_seq
    WHERE 1=1
    " |
    sqlite3 -tabs ${SPECIES}/seq.sqlite

echo "
.header ON
    SELECT
        'species' AS species,
        COUNT(distinct asm_seq.asm_id) AS strain,
        COUNT(*) AS total,
        COUNT(distinct rep_seq.seq_id) AS dedup,
        COUNT(distinct rep_seq.rep_id) AS rep
    FROM asm_seq
    JOIN rep_seq ON asm_seq.seq_id = rep_seq.seq_id
    WHERE 1=1
    " |
    sqlite3 -tabs ${SPECIES}/seq.sqlite
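
To see what the four counts in the last summary query mean, the same query can be run against a toy in-memory database. This is only a sketch: the two-column table definitions and the sample rows below are assumptions for illustration, not the real `seqdb` schema, which has more columns.

```shell
# Toy data: two assemblies (A1, A2) sharing sequence s1;
# s1 and s2 fall into representative cluster r1, s3 into r2.
# Table layouts are assumed, keeping only the columns the query touches.
summary=$(
    sqlite3 -tabs :memory: <<'EOF'
.header ON
CREATE TABLE asm_seq (asm_id TEXT, seq_id TEXT);
CREATE TABLE rep_seq (rep_id TEXT, seq_id TEXT);
INSERT INTO asm_seq VALUES ('A1','s1'), ('A1','s2'), ('A2','s1'), ('A2','s3');
INSERT INTO rep_seq VALUES ('r1','s1'), ('r1','s2'), ('r2','s3');
SELECT
    'species' AS species,
    COUNT(distinct asm_seq.asm_id) AS strain,
    COUNT(*) AS total,
    COUNT(distinct rep_seq.seq_id) AS dedup,
    COUNT(distinct rep_seq.rep_id) AS rep
FROM asm_seq
JOIN rep_seq ON asm_seq.seq_id = rep_seq.seq_id;
EOF
)
echo "${summary}"
```

With this data, `strain` is 2 (assemblies), `total` is 4 (assembly-sequence pairs), `dedup` is 3 (distinct sequences), and `rep` is 2 (representative clusters).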


Plots

# venn
nwr plot venn \
    tests/plot/rocauc.result.tsv \
    tests/plot/mcox.05.result.tsv |
    tectonic - &&
    mv texput.pdf venn2.pdf

nwr plot venn \
    tests/plot/rocauc.result.tsv \
    tests/plot/mcox.05.result.tsv \
    tests/plot/mcox.result.tsv |
    tectonic - &&
    mv texput.pdf venn3.pdf

nwr plot venn \
    tests/plot/rocauc.result.tsv \
    tests/plot/rocauc.result.tsv \
    tests/plot/mcox.05.result.tsv \
    tests/plot/mcox.result.tsv |
    tectonic - &&
    mv texput.pdf venn4.pdf

plotr venn tests/plot/rocauc.result.tsv tests/plot/mcox.05.result.tsv

tectonic docs/venn4.tex

# histo
nwr plot hh tests/plot/hist.tsv -g 2 --bins 20 --xl "" --unit 0.5,1.5 |
    tectonic - &&
    mv texput.pdf hist.pdf

nwr plot hh tests/plot/hist.tsv --bins 30 --xl "" --xmm 45,75 --unit 0.5,1.5 |
    tectonic - &&
    mv texput.pdf hist.pdf

cargo run --bin nwr plot hh tests/plot/adomain.tsv -g 2 --bins 40 --xl "" --yl "" --unit 0.3,0.5 |
    tectonic - &&
    mv texput.pdf hist.pdf

tectonic docs/heatmap.tex

# nrps
cargo run --bin nwr plot nrps tests/plot/srf.tsv --legend --color blue |
    tectonic - &&
    mv texput.pdf srf.pdf

tectonic docs/nrps.tex

tectonic docs/da.tex
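
The plot commands above all repeat one pattern: pipe LaTeX into `tectonic -`, then rename the resulting `texput.pdf`. A small wrapper can remove that repetition. `tex2pdf` is a hypothetical helper name, not something nwr provides; the sketch relies only on the `tectonic -` behaviour already used above.

```shell
# Hypothetical helper (not part of nwr): read LaTeX on stdin, write the PDF to "$1".
# Relies on `tectonic -` naming its output texput.pdf, as in the commands above.
tex2pdf() {
    out="${1:?usage: tex2pdf OUTPUT.pdf}"
    tectonic - && mv texput.pdf "${out}"
}

# Usage, mirroring the two-set venn example:
# nwr plot venn tests/plot/rocauc.result.tsv tests/plot/mcox.05.result.tsv | tex2pdf venn2.pdf
```

As in the original one-liners, the rename only happens when `tectonic` exits successfully, so a failed render does not leave a stale file under the target name.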

Database schema

brew install k1LoW/tap/tbls

tbls doc sqlite://./tests/nwr/taxonomy.sqlite docs/txdb

tbls doc sqlite://./tests/nwr/ar_refseq.sqlite docs/ardb

txdb

ardb
