This is the denovo pipeline from the Sequana project.
| Overview: | De-novo assembly pipeline for short-read Illumina data (bacterial genomes) |
|---|---|
| Input: | A set of paired or single-end FastQ files |
| Output: | Assembled FASTA contigs, annotation (GFF/GenBank), variant calls (VCF), HTML reports |
| Status: | Production |
| Documentation: | This README and https://sequana.readthedocs.io |
| Citation: | Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, https://doi.org/10.21105/joss.00352 |
If you already have all requirements, install the package with pip:
pip install sequana_denovo --upgrade
You will need third-party tools (spades, prokka, quast, etc.). Install all dependencies at once:
mamba env create -f environment.yml
Scan FastQ files in a directory and set up the pipeline (replace DATAPATH with your input directory):
sequana_denovo --input-directory DATAPATH
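As a rough illustration of what scanning a directory for paired or single-end data involves, the Python sketch below groups FastQ file names into samples. It assumes the common Illumina `_R1_`/`_R2_` read tag in file names. The `group_fastq` helper and the `READ_TAG` pattern are hypothetical and are not taken from the pipeline's code, whose matching rules may differ.

```python
import re
from collections import defaultdict

# Assumed naming convention: sample_R1_001.fastq.gz / sample_R2_001.fastq.gz
READ_TAG = re.compile(r"_R([12])_")


def group_fastq(filenames):
    """Group FastQ file names by sample; each sample maps 'R1'/'R2' to a file."""
    samples = defaultdict(dict)
    for name in sorted(filenames):
        match = READ_TAG.search(name)
        if match is None:
            # No read tag: treat the file as single-end data.
            samples[name.split(".")[0]]["R1"] = name
            continue
        sample = name[:match.start()]
        samples[sample]["R" + match.group(1)] = name
    return dict(samples)


files = ["A_R1_001.fastq.gz", "A_R2_001.fastq.gz", "B_R1_001.fastq.gz"]
for sample, reads in group_fastq(files).items():
    layout = "paired" if len(reads) == 2 else "single-end"
    print(sample, layout)
```

A sample with both `R1` and `R2` entries would be treated as paired-end, and a sample with only `R1` as single-end.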
To skip Prokka annotation:
sequana_denovo --input-directory DATAPATH --skip-prokka
To tune the SPAdes memory limit (default 64 GB) and the memory budget for digital normalisation:
sequana_denovo --input-directory DATAPATH --spades-memory 32 --digital-normalisation-max-memory-usage 1e9
This creates a denovo/ directory with the pipeline and configuration file. Execute the pipeline locally:
cd denovo
sh denovo.sh
If you are familiar with Snakemake, you can also run the pipeline directly:
snakemake -s denovo.rules --cores 4 --stats stats.txt
See .sequana/profile/config.yaml to tune Snakemake behaviour (cores, cluster settings, etc.).
To run the pipeline with Apptainer containers, initialise the working directory as follows:
sequana_denovo --input-directory DATAPATH --use-apptainer
By default, container images are downloaded into the working directory. To store them in a shared location instead:
sequana_denovo --input-directory DATAPATH --use-apptainer --apptainer-prefix ~/.sequana/apptainers
Then run as usual:
cd denovo
sh denovo.sh
This pipeline requires the following executables (install via bioconda/conda):
- spades or unicycler: de-novo assembler (--assembler option)
- khmer: digital normalisation (normalize-by-median.py, filter-abund.py, etc.)
- quast: assembly quality assessment
- prokka: genome annotation (optional, --skip-prokka)
- busco: assembly completeness assessment (optional)
- checkm-genome: genome completeness and contamination (optional)
- bwa + sambamba: read mapping back to assembly
- freebayes: variant calling
- samtools: BAM/SAM processing
- seqkit: contig filtering by length
- blast: taxonomic identification of contigs (optional)
- multiqc: aggregated HTML report
- graphviz: pipeline DAG image
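Before launching a run, it can help to confirm that the tools listed above are on the `PATH`. The following Python sketch does this with the standard library. The executable names in `REQUIRED` are assumptions (package names and executable names do not always match, and they can vary between versions), and `missing_tools` is a hypothetical helper, not part of Sequana.

```python
import shutil

# Assumed executable names for the non-optional tools; adjust to your install.
REQUIRED = [
    "spades.py", "quast.py", "bwa", "sambamba", "freebayes",
    "samtools", "seqkit", "multiqc", "dot",
]


def missing_tools(tools, which=shutil.which):
    """Return the subset of tools that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.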
This Snakemake pipeline assembles bacterial (or other small) genomes from short Illumina reads.
Digital normalisation (khmer): optionally reduces sequencing depth to a target coverage level, discarding redundant reads. This lowers memory usage and speeds up assembly without significantly impacting quality.
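The idea behind the digital normalisation step above can be sketched in a few lines of Python: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a target coverage. This is a simplified illustration and not khmer's implementation. It uses an exact `Counter` rather than a memory-bounded probabilistic counting structure, ignores reverse complements, and the small `k` and `cutoff` values are chosen only to keep the example readable.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping substrings of length k in the read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_by_median(reads, k=4, cutoff=3):
    """Keep a read only while the median count of its k-mers is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read shorter than k: skipped in this sketch
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)  # only kept reads contribute to the counts
    return kept


reads = ["ACGTTGCAAT"] * 10 + ["GGGCCCTTTA"]
kept = normalize_by_median(reads, k=4, cutoff=3)
print(len(reads), "reads in,", len(kept), "reads kept")
```

In this toy input the heavily duplicated read is retained only until its k-mers reach the cutoff, while the rare read is always retained, which is why redundant depth is removed without losing low-coverage regions.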
Assembly: SPAdes (default) or Unicycler. SPAdes uses multiple k-mer sizes and is recommended for most bacterial genomes. Unicycler is designed for hybrid or circular assemblies.
Quality assessment (QUAST): reports assembly statistics (N50, number of contigs, total length, GC%, coverage depth) with an interactive Icarus contig browser.
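For readers unfamiliar with N50, the Python sketch below computes it from a list of contig lengths: it is the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` is illustrative; QUAST computes this and many other statistics itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length reaches 50%.
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6]))  # total 20; 6 + 5 = 11 >= 10, so N50 is 5
```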
Annotation (Prokka): rapid prokaryotic genome annotation producing GFF, GenBank, and other standard formats.
Coverage analysis (sequana_coverage): reads are mapped back to the assembly with BWA, duplicates flagged with Sambamba, and per-contig coverage profiles computed and visualised.
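As a minimal illustration of a per-contig coverage summary, the Python sketch below averages depth values from tab-separated `contig`, `position`, `depth` records, which is the layout written by `samtools depth`. It is not how sequana_coverage works internally; the helper name `mean_depth_per_contig` is hypothetical, and the sketch only averages over the positions present in the input.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines):
    """Average the depth column per contig from 'contig<TAB>pos<TAB>depth' lines."""
    totals = defaultdict(int)
    positions = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        totals[contig] += int(depth)
        positions[contig] += 1
    # Positions missing from the input are not counted as zero coverage.
    return {contig: totals[contig] / positions[contig] for contig in totals}


records = ["contig_1\t1\t10", "contig_1\t2\t20", "contig_2\t1\t5"]
for contig, depth in mean_depth_per_contig(records).items():
    print(contig, depth)
```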
Variant calling (Freebayes): detects SNPs and small indels between the assembled consensus and the mapped reads.
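To get a quick overview of the calls in a VCF file, the records can be classified by comparing the lengths of the REF and ALT alleles. The Python sketch below is an assumption-laden simplification: `classify` and `count_variants` are hypothetical helpers, any allele pair of unequal length is counted as an indel (including complex events), and equal-length multi-base substitutions are grouped under "other".

```python
def classify(ref, alt):
    """Classify one REF/ALT allele pair by allele length only."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variants(vcf_lines):
    """Count SNPs, indels and other events in the data lines of a VCF."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):  # ALT may list several alleles
            counts[classify(ref, alt)] += 1
    return counts


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t10\t.\tA\tG\t50\t.\t.",
    "contig_1\t20\t.\tAT\tA\t60\t.\t.",
    "contig_2\t5\t.\tAC\tGT,A\t40\t.\t.",
]
print(count_variants(example))
```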
Completeness (BUSCO / CheckM): optionally assess assembly completeness against conserved single-copy orthologs (BUSCO) or lineage-specific marker genes (CheckM).
Taxonomic identification (BLAST): optionally BLASTs the top contigs against the nt database to identify their taxonomy.
A summary HTML report (summary.html) with per-sample assembly statistics and embedded
coverage plots is generated at the end of the run, alongside a MultiQC report.
See the latest documented configuration file for all available parameters.
| Version | Description |
|---|---|
| 0.12.0 | |
| 0.11.1 | |
| 0.11.0 | |
| 0.10.0 | |
| 0.9.0 | |
| 0.8.5 | |
| 0.8.4 | |
| 0.8.3 | |
| 0.8.2 | |
| 0.8.1 | |
| 0.8.0 | First release. |
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.