GitHub - jianshu93/dotANI: GPU-accelerated ANI computation at large scale

DotANI: Ultra-fast and Memory efficient ANI computation with GPU acceleration

dotANI first samples the k-mer set and then estimates set intersections via DotHash. The k-mer hashes are encoded into hyperdimensional vectors (HVs) using hyperdimensional computing (HDC) encoding to obtain a better tradeoff among ANI estimation quality, sketch size, and computation speed. To obtain the cardinality of each genome, we use UltraLogLog (ULL). A sketch generated by dotANI therefore has two parts: the DotHash sketch and the ULL sketch. ANI estimation in dotANI reduces to highly vectorized vector multiplication. dotANI also provides database search.
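To make the DotHash idea concrete, here is a minimal Python sketch; it is not dotANI's implementation. It assumes each k-mer is mapped to a pseudo-random vector of +1/-1 entries of dimension d, that a sketch is the element-wise sum of those vectors, and that the dot product of two sketches divided by d estimates the intersection size. The helper names (element_vector, dothash_sketch, estimate_intersection) and the use of SHAKE-256 as the hash are illustrative assumptions; d = 4096 and seed = 1447 simply mirror the CLI defaults.

```python
import hashlib

def element_vector(item, d, seed):
    """Map an item to a deterministic pseudo-random vector of +1/-1 entries."""
    n_bytes = (d + 7) // 8
    digest = hashlib.shake_256(f"{seed}:{item}".encode()).digest(n_bytes)
    bits = int.from_bytes(digest, "little")
    return [1 if (bits >> j) & 1 else -1 for j in range(d)]

def dothash_sketch(items, d=4096, seed=1447):
    """Sum the +1/-1 vectors of all items into one d-dimensional sketch."""
    sketch = [0] * d
    for item in items:
        for j, sign in enumerate(element_vector(item, d, seed)):
            sketch[j] += sign
    return sketch

def estimate_intersection(sketch_a, sketch_b):
    """Shared items contribute d to the dot product; unrelated pairs
    contribute zero-mean noise, so dot / d approximates |A ∩ B|."""
    return sum(x * y for x, y in zip(sketch_a, sketch_b)) / len(sketch_a)

# Toy sets standing in for sampled k-mer sets: 200 items each, 100 shared.
set_a = {f"kmer{i}" for i in range(200)}
set_b = {f"kmer{i}" for i in range(100, 300)}
estimate = estimate_intersection(dothash_sketch(set_a), dothash_sketch(set_b))
print(f"true intersection: 100, estimated: {estimate:.1f}")
```

In this toy model the estimation noise shrinks as d grows, which illustrates the tradeoff between sketch size and estimation quality.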

Quickstart

Installation

Basic Installation

dotANI requires the Rust toolchain and Cargo to be installed. We recommend building dotANI with the following commands:

git clone https://github.com/jianshu93/dotANI.git
cd dotANI

# Without GPU acceleration for sketching
cargo build --release

Install with GPU Support

dotANI supports GPU acceleration. GPU mode requires an NVIDIA GPU driver. Use nvidia-smi or nvcc -V to check whether the driver is installed, then run the following command to build with GPU support:

# With GPU acceleration for sketching and distance calculation, tested on RTX 3090
cargo build --release --features cuda

Currently, only NVIDIA GPUs are supported. We tested compatibility on a desktop RTX 4090 and a laptop RTX 4060 with CUDA 12.x.

Usage

The current version supports the following functions:

1. Genome sketching for .fa/.fna/.fasta files

Example:
dotani sketch -p ./data -o ./fna.sketch

Usage: dotani sketch [OPTIONS] --path <path> --out <out>

Options:
  -p, --path <path>                Input folder path containing .fna/.fa/.fasta files (gzip/bzip2/xz/zstd compressed files supported, e.g., .fna.gz, .fa.bz2, .fasta.xz, .fna.zst)
  -o, --out <out>                  Output DotHash sketch file
  -T, --threads <threads>          Number of threads, default all logical cores
  -C, --canonical <canonical>      Whether to use canonical k-mers [default: true] [possible values: true, false]
  -k, --ksize <ksize>              k-mer size for sketching [default: 16]
  -S, --seed <seed>                Hash seed [default: 1447]
      --ull-p <ull_p>              UltraLogLog precision parameter [default: 14]
  -d, --hv-d <hv_d>                Dimension for hypervector [default: 4096]
  -Q, --quant-scale <quant_scale>  Scaling factor for HV quantization [default: 1.0]
  -h, --help                       Print help
  -V, --version                    Print version
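The --canonical and --ksize options can be illustrated with a short Python sketch. It assumes the common convention that the canonical form of a k-mer is the lexicographically smaller of the k-mer and its reverse complement; whether dotANI picks the representative lexicographically or by hash value is not stated here, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]

def canonical_kmers(seq, k=16, canonical=True):
    """Return the set of k-mers of seq, optionally in canonical form."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # assumption: skip windows with ambiguous bases such as N
        if canonical:
            kmer = min(kmer, reverse_complement(kmer))
        out.add(kmer)
    return out

forward = "ACGTTGCATGGACCTA"
reverse = reverse_complement(forward)
# With canonical k-mers, both strands yield the same k-mer set.
print(canonical_kmers(forward, k=5) == canonical_kmers(reverse, k=5))  # True
```

Canonical k-mers make a sketch independent of which strand of a genome was assembled, which is presumably why the option defaults to true.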

2. ANI estimation and database search

Example:
dotani dist -r fna1.sketch -q fna2.sketch -o output.ani

Options:
-r, --path_r <PATH_R>           Path to ref sketch file
-q, --path_q <PATH_Q>           Path to query sketch file
-o, --out <OUT>                 Output path 
-t, --thread <THREAD>           Threads used for computation [default: 16]
-a, --ani_th <ANI_TH>           ANI threshold [default: 85.0]
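The sketch below shows one way an intersection estimate and two cardinality estimates could be turned into an ANI value and compared against --ani_th. It assumes the Mash-style transformation ANI = 1 + ln(2J / (1 + J)) / k, where J is the Jaccard index, and assumes the threshold acts as a simple lower bound in percent. The README does not state the exact formula dotANI uses, so treat this as an illustration; the function names are hypothetical.

```python
import math

def jaccard(intersection, card_a, card_b):
    """Jaccard index from an intersection size and two set cardinalities."""
    union = card_a + card_b - intersection
    if union <= 0:
        return 0.0
    return min(1.0, max(0.0, intersection / union))

def ani_percent(intersection, card_a, card_b, k=16):
    """Mash-style ANI estimate in percent (assumed formula, see lead-in)."""
    j = jaccard(intersection, card_a, card_b)
    if j <= 0.0:
        return 0.0
    ani = 1.0 + math.log(2.0 * j / (1.0 + j)) / k
    return 100.0 * max(0.0, ani)

def passes_threshold(ani, ani_th=85.0):
    """Assumed semantics: keep pairs whose ANI is at least the threshold."""
    return ani >= ani_th

# Two genomes with 200 distinct k-mers each, 100 of them shared (J = 1/3).
ani = ani_percent(100, 200, 200, k=16)
print(f"ANI = {ani:.2f}%, reported: {passes_threshold(ani)}")
```

In dotANI's terms, the intersection would come from the DotHash part of the sketch and the cardinalities from the ULL part.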

3. Faster sketching on GPU

dotANI supports offloading the k-mer hashing and sampling steps to the GPU to speed up sketching. Use the following commands to run on a GPU:

dotani-cuda sketch -p ./data -o ./fna.sketch
dotani-cuda dist -r fna1.sketch -q fna2.sketch -o output.ani
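For scripted pipelines, a small Python helper can assemble the same commands and fall back to the CPU binary when dotani-cuda is not available. This sketch assumes both binaries are on PATH under the names used in this README; the helper names are hypothetical, and finding the GPU binary does not guarantee that a working GPU is present.

```python
import shlex
import shutil

def pick_binary():
    """Prefer the GPU build if it is found on PATH (assumption: both
    binaries are installed under the names used in this README)."""
    return "dotani-cuda" if shutil.which("dotani-cuda") else "dotani"

def sketch_cmd(binary, path, out):
    """Command line for sketching a folder of genomes."""
    return [binary, "sketch", "-p", path, "-o", out]

def dist_cmd(binary, ref, query, out):
    """Command line for ANI estimation between two sketch files."""
    return [binary, "dist", "-r", ref, "-q", query, "-o", out]

binary = pick_binary()
commands = [
    sketch_cmd(binary, "./data", "./fna.sketch"),
    dist_cmd(binary, "fna1.sketch", "fna2.sketch", "output.ani"),
]
for cmd in commands:
    # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```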
