KFC (K-mer Fast Counter) is a fast and space-efficient k-mer counter based on hyper-k-mers.
It is particularly well-suited for counting large k-mers (with k ≥ 63) from long reads with a low error rate.
It can also filter k-mers based on their count and only retrieve the k-mers above a certain threshold.
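To make the counting and filtering steps concrete, here is a minimal Rust sketch of naive k-mer counting followed by a count threshold. It is not KFC's algorithm: KFC relies on hyper-k-mers, whereas this sketch simply stores every k-mer in a hash map. The function names `count_kmers` and `filter_by_count` are hypothetical, and the sketch makes two assumptions: only the forward strand is counted (no canonical k-mers), and a k-mer is kept when its count is greater than or equal to the threshold.

```rust
use std::collections::HashMap;

/// Count every k-mer (substring of length `k`) found in the reads.
/// Illustration only: forward strand only, no canonical k-mers.
fn count_kmers(reads: &[&str], k: usize) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for read in reads {
        let bytes = read.as_bytes();
        // Skip reads too short to contain a k-mer (and avoid k == 0).
        if k == 0 || bytes.len() < k {
            continue;
        }
        for window in bytes.windows(k) {
            let kmer = String::from_utf8_lossy(window).into_owned();
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}

/// Keep the k-mers whose count is at least `threshold` (assumed inclusive),
/// sorted so that the output is deterministic.
fn filter_by_count(counts: &HashMap<String, u32>, threshold: u32) -> Vec<(String, u32)> {
    let mut kept: Vec<(String, u32)> = counts
        .iter()
        .filter(|(_, count)| **count >= threshold)
        .map(|(kmer, count)| (kmer.clone(), *count))
        .collect();
    kept.sort();
    kept
}

fn main() {
    let reads = ["ACGTACGT", "CGTAC"];
    let counts = count_kmers(&reads, 4);
    for (kmer, count) in filter_by_count(&counts, 2) {
        println!("{kmer}\t{count}");
    }
}
```

A hash map of full k-mers grows quickly with large k and long reads, which is the kind of cost a space-efficient counter such as KFC is designed to avoid.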
If you have not installed Rust yet, please visit rustup.rs to install it.
Then clone this repository and build KFC using:
git clone https://github.com/lrobidou/KFC
cd KFC
RUSTFLAGS="-C target-cpu=native" cargo +nightly build --release -F nightly
Make sure to set RUSTFLAGS="-C target-cpu=native" to use the fastest instructions available on your architecture.
If you cannot use Rust nightly, you can also build KFC in stable mode (which may be slightly slower):
RUSTFLAGS="-C target-cpu=native" cargo build --release
This will create a binary located at target/release/kfc.
The KFC binary provides two main subcommands: build (to count k-mers from a FASTA/Q file) and dump (to extract the k-mers contained in a KFC index).
You can view the detailed usage of each subcommand using:
./kfc <subcommand> -h
The first step in any KFC workflow is to build a KFC index:
./kfc build -k <k> -i <FASTA/Q> -o <index>.kfc
Once the KFC index is computed, it is possible to dump it to text. The k-mers are not ordered.
./kfc dump -t <threshold> -i <index>.kfc --output-text <kmers>.txt
KFC supports the K-mer File Format, KFF (see Dufresne et al., The K-mer File Format: a standardized and compact disk representation of sets of k-mers). As such, it is possible to dump a KFC index into a KFF file. The count of each k-mer is encoded in the KFF file.
./kfc dump --input-index <index>.kfc --output-kff <index>.kff
Warning: KFC only handles KFF files built by KFC.
Reading a KFF file produced by KFC should be possible with any implementation supporting KFF, but we recommend relying on KFC for this task. A KFF file built by KFC respects some assumptions on the k-mer counts, which allows KFC to dump it with lower memory consumption. Conversely, files that do not respect these assumptions would produce invalid counts if dumped by KFC.
./kfc kff-dump --input-kff <index>.kff --output-text <index>.txt