22 releases
| Version | Released |
|---|---|
| 0.9.0 | Apr 5, 2026 |
| 0.8.5 | Apr 6, 2025 |
| 0.7.7 | Dec 27, 2024 |
| 0.7.5 | Sep 20, 2024 |
| 0.5.5 | Mar 4, 2022 |
nwr - NCBI taxonomy/assembly WRangler
nwr is a command-line NCBI taxonomy and assembly WRangler.
Install
Current release: 0.9.0
cargo install nwr
# or
cargo install --path . --force # --offline
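After installing, it can help to confirm that the binary is reachable before running anything else. The following is a minimal sketch, not part of nwr itself; `have_cmd` is a hypothetical helper name, and the hint assumes cargo's default install location of `$HOME/.cargo/bin`.

```shell
# Report whether a command is available on PATH.
# `have_cmd` is a hypothetical helper, not something nwr provides.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd nwr; then
    # -V/--version is listed in the `nwr help` output
    nwr --version
else
    # cargo installs binaries into ~/.cargo/bin unless configured otherwise
    echo "nwr not found; is \$HOME/.cargo/bin on PATH?" >&2
fi
```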
nwr help
$ nwr help
`nwr` is a command line **N**CBI taxonomy and assembly **WR**angler.
Usage: nwr [COMMAND]
Commands:
  download  Download the latest releases of `taxdump` and assembly reports
  txdb      Init the taxonomy database
  ardb      Init the assembly database
  info      Information of Taxonomy ID(s) or scientific name(s)
  lineage   Output the lineage of the term
  member    List members (of certain ranks) under ancestral term(s)
  append    Append fields of higher ranks to a TSV file
  restrict  Restrict taxonomy terms to ancestral descendants
  common    Output the common tree of terms
  template  Create dirs, data and scripts for a phylogenomic research
  kb        Prints docs (knowledge bases)
  seqdb     Init the seq database
  help      Print this message or the help of the given subcommand(s)
Options:
  -h, --help     Print help
  -V, --version  Print version
Subcommand groups:
* Database
    * download / txdb / ardb
* Taxonomy
    * info / lineage / member / append / restrict / common
* Assembly
    * template / kb / seqdb
Examples
Usage of each command
For practical uses of nwr and other awesome companions, follow this page.
nwr download
nwr txdb
nwr info "Homo sapiens" 4932
nwr lineage "Homo sapiens"
nwr lineage 4932
nwr restrict "Vertebrata" -c 2 -f tests/nwr/taxon.tsv
##sci_name tax_id
#Human 9606
nwr member "Homo"
nwr append tests/nwr/taxon.tsv -c 2 -r species -r family --id
nwr ardb
nwr ardb --genbank
nwr common "Escherichia coli" 4932 Drosophila_melanogaster 9606 Mus_musculus
# rm ~/.nwr/*.dmp
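The `restrict` and `append` examples above pass `-c 2` together with a TSV file. The sketch below builds a small stand-in file to show the layout those commands appear to expect: a `#`-prefixed header, a name in column 1, and an NCBI Taxonomy ID in column 2. The rows are illustrative and are not the actual contents of `tests/nwr/taxon.tsv`; it also assumes that `-c` counts columns from 1.

```shell
# Build a two-column TSV: name in column 1, NCBI Taxonomy ID in column 2.
# Illustrative rows only; not the real tests/nwr/taxon.tsv.
tmp_tsv="$(mktemp)"
printf '#sci_name\ttax_id\n' >  "$tmp_tsv"
printf 'Human\t9606\n'       >> "$tmp_tsv"
printf 'Yeast\t4932\n'       >> "$tmp_tsv"

# Column 2 is what `-c 2` would point at (assuming 1-based column numbers).
# Skip the header line, take the second tab-separated field, join with spaces.
ids="$(grep -v '^#' "$tmp_tsv" | cut -f 2 | paste -s -d ' ' -)"
echo "$ids"
```

With nwr and its taxonomy database in place, the same file could be passed as `-f "$tmp_tsv"` to `nwr restrict`. Remove the temporary file with `rm -f "$tmp_tsv"` when done.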
Development
cargo test --color=always --package nwr --test cli_nwr command_template -- --show-output
# use --release: downloads are slow in debug builds
cargo run --release --bin nwr download
# tests/nwr/
cargo run --bin nwr txdb -d tests/nwr/
cargo run --bin nwr info -d tests/nwr/ --tsv Viruses "Actinophage JHJ-1" "Bacillus phage bg1"
cargo run --bin nwr common -d tests/nwr/ "Actinophage JHJ-1" "Bacillus phage bg1"
cargo run --bin nwr template tests/assembly/Trichoderma.assembly.tsv --ass -o stdout
seqdb
export SPECIES="$HOME/data/Archaea/Protein/Sulfolobus_acidocaldarius"
cargo run --bin nwr seqdb -d ${SPECIES} --init --strain
cargo run --bin nwr seqdb -d ${SPECIES} \
--size <(
hnsm size ${SPECIES}/pro.fa.gz
) \
--clust
cargo run --bin nwr seqdb -d ${SPECIES} \
--anno <(
gzip -dcf "${SPECIES}"/anno.tsv.gz
) \
--asmseq <(
gzip -dcf "${SPECIES}"/asmseq.tsv.gz
)
cargo run --bin nwr seqdb -d ${SPECIES} --rep f1="${SPECIES}"/fam88_cluster.tsv
echo "
SELECT
*
FROM asm
WHERE 1=1
" |
sqlite3 -tabs ${SPECIES}/seq.sqlite
echo "
SELECT
COUNT(distinct asm_seq.asm_id)
FROM asm_seq
WHERE 1=1
" |
sqlite3 -tabs ${SPECIES}/seq.sqlite
echo "
.header ON
SELECT
'species' AS species,
COUNT(distinct asm_seq.asm_id) AS strain,
COUNT(*) AS total,
COUNT(distinct rep_seq.seq_id) AS dedup,
COUNT(distinct rep_seq.rep_id) AS rep
FROM asm_seq
JOIN rep_seq ON asm_seq.seq_id = rep_seq.seq_id
WHERE 1=1
" |
sqlite3 -tabs ${SPECIES}/seq.sqlite
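To see what the four counts in the last query mean, the same `SELECT` can be tried against a toy in-memory database. This is a sketch under stated assumptions: the two tables below carry only the columns the queries above reference (`asm_id`, `seq_id`, `rep_id`), the rows are made up, and the real `seq.sqlite` schema may well contain more columns and tables.

```shell
# Toy stand-ins for asm_seq and rep_seq; the real schema may differ.
# Two assemblies (A1, A2), three distinct sequences, two representatives.
if command -v sqlite3 >/dev/null 2>&1; then
    counts="$(sqlite3 -tabs :memory: <<'SQL'
CREATE TABLE asm_seq (asm_id TEXT, seq_id TEXT);
CREATE TABLE rep_seq (rep_id TEXT, seq_id TEXT);
INSERT INTO asm_seq VALUES ('A1','s1'), ('A1','s2'), ('A2','s1'), ('A2','s3');
INSERT INTO rep_seq VALUES ('r1','s1'), ('r1','s2'), ('r2','s3');
SELECT
    COUNT(DISTINCT asm_seq.asm_id) AS strain,  -- assemblies seen
    COUNT(*) AS total,                         -- joined rows
    COUNT(DISTINCT rep_seq.seq_id) AS dedup,   -- distinct sequences
    COUNT(DISTINCT rep_seq.rep_id) AS rep      -- distinct representatives
FROM asm_seq
JOIN rep_seq ON asm_seq.seq_id = rep_seq.seq_id;
SQL
)"
    echo "$counts"
fi
```

In this toy data `s1` is shared by both assemblies, so `total` (4) exceeds `dedup` (3), and `s1` and `s2` share the representative `r1`, so `rep` (2) is smaller still.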
Plots
# venn
nwr plot venn \
tests/plot/rocauc.result.tsv \
tests/plot/mcox.05.result.tsv |
tectonic - &&
mv texput.pdf venn2.pdf
nwr plot venn \
tests/plot/rocauc.result.tsv \
tests/plot/mcox.05.result.tsv \
tests/plot/mcox.result.tsv |
tectonic - &&
mv texput.pdf venn3.pdf
nwr plot venn \
tests/plot/rocauc.result.tsv \
tests/plot/rocauc.result.tsv \
tests/plot/mcox.05.result.tsv \
tests/plot/mcox.result.tsv |
tectonic - &&
mv texput.pdf venn4.pdf
plotr venn tests/plot/rocauc.result.tsv tests/plot/mcox.05.result.tsv
tectonic docs/venn4.tex
# histo
nwr plot hh tests/plot/hist.tsv -g 2 --bins 20 --xl "" --unit 0.5,1.5 |
tectonic - &&
mv texput.pdf hist.pdf
nwr plot hh tests/plot/hist.tsv --bins 30 --xl "" --xmm 45,75 --unit 0.5,1.5 |
tectonic - &&
mv texput.pdf hist.pdf
cargo run --bin nwr plot hh tests/plot/adomain.tsv -g 2 --bins 40 --xl "" --yl "" --unit 0.3,0.5 |
tectonic - &&
mv texput.pdf hist.pdf
tectonic docs/heatmap.tex
# nrps
cargo run --bin nwr plot nrps tests/plot/srf.tsv --legend --color blue |
tectonic - &&
mv texput.pdf srf.pdf
tectonic docs/nrps.tex
tectonic docs/da.tex
Database schema
brew install k1LoW/tap/tbls
tbls doc sqlite://./tests/nwr/taxonomy.sqlite docs/txdb
tbls doc sqlite://./tests/nwr/ar_refseq.sqlite docs/ardb