This is the code repository for bjorn - a suite of miscellaneous tools that can be used to:
-
generate information for large-scale genomic surveillance of SARS-CoV-2 sequences. This functionality relies on external tools such as
datafunk,minimap2, andpandas. -
prepare results and data files from SARS-CoV-2 sequencing analysis for release to public databases such as GISAID, Google Cloud, and GitHub
- Install Anaconda: instructions can be found here
- Create the
bjornenvironment
conda env create -f env/linux.yml -n bjorn- Activate environment
conda activate bjorn- Install datafunk (inside the activated environment): instructions (ensure environment is activated during installation)
- Activate
bjornenvironment
conda activate bjorn- Copy and modify
test_config.jsonto specify your parameters such as- gisaid sign-in info
- output directory where results are saved
- number of CPU cores available for use by minimap2
- current date (false for latest; fix for testing)
- Run the
run_pipeline.shscript to initiate the Snakemake pipeline
bash run_pipeline.sh [CONFIG_FILE]- Activate
bjornenvironment
conda activate bjorn- Open
run_alab_release.shto specify your parameters such as- filepath to sample sheet containing sample metadata (input)
- filepath to updated metadata of samples that have already been uploaded
- output directory where results are saved
- number of CPU cores available for use
- minimum coverage required for each sample (QC filter)
- minimum average depth required for each sample (QC filter)
- DEFAULT: test parameters
- Open
config.jsonto specify your parameters such as- list of SARS-CoV-2 genes that are considered non-concerning
- i.e. the occurrence of open-read frame (ORF) altering mutations can be accepted
- e.g. ['ORF8', 'ORF10']
- list of SARS-CoV-2 mutations that are considered non-concerning
- i.e. the occurrence of
ORF8:Q27_can be accepted (B117 exists) - e.g. ['ORF8:Q27_']
- i.e. the occurrence of
- list of SARS-CoV-2 genes that are considered non-concerning
- Run the
run_alab_release.shscript to initiate the data release pipeline
bash run_alab_release.sh