This session is part of Biodiversity Genomics Academy 2024
Genome Reference Informatics Team, Wellcome Sanger Institute
- Jo Wood
- Joanna Collins
- Dominic Absolon
- Tom Mathers
- Michael Paulini
- Camilla Santos
By the end of this session you will be able to:
- Understand the manual curation pipeline
- Understand how JBrowse can be used both as a tool to aid curation and for post-curation analysis
- Interpret a Hi-C map
- Understand how to curate an assembly using PretextView and associated curation scripts
- Create an updated/curated assembly FASTA
Prerequisites:
- Understanding of the terms genome assembly, reads and contigs
- Understanding of Linux command line basics
- Having read/watched the rapid curation documentation available at https://gitlab.com/wtsi-grit/rapid-curation/-/tree/main
- Having read through the Interpreting_HiC_maps guide
- Access to a three-button mouse
[!warning] "Please make sure you MEET THE PREREQUISITES and READ THE DESCRIPTION above"
You will get the most out of this session if you meet the prerequisites above.
Please also read the description carefully to see if this session is relevant to you.
If you don't meet the prerequisites, change your mind after reading the description, or are no longer available at the session time, please email tol-training at sanger.ac.uk to cancel your slot so that someone on the waitlist can attend.
These instructions guide you through the processing steps for the ilThyBati1 files.

Step 1. Generate a TPF file from the assembly FASTA with rapid_split.pl:

```bash
perl rapid_split.pl -fa ./test_data/ilThyBati1.PB.asm1.purge1.polish1.scaff1.decontaminated.fa
```

The output file will be named ilThyBati1.PB.asm1.purge1.polish1.scaff1.decontaminated.fa.tpf
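To make the TPF structure more concrete, here is a minimal Python sketch that produces TPF-style lines by splitting each scaffold at runs of N. It is not rapid_split.pl itself: the line layout ("?" component lines with 1-based "scaffold:start-end" coordinates and PLUS orientation, and tab-separated "GAP TYPE-2 length" gap lines) is an assumption modelled on the GRIT TPF convention, and the helper names read_fasta and split_at_gaps are hypothetical.

```python
import re
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: sequence}."""
    seqs: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the name
            name, chunks = line[1:].split()[0], []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def split_at_gaps(name: str, seq: str) -> List[str]:
    """Return TPF-style lines for one scaffold, splitting it at runs of N (assumed layout)."""
    lines: List[str] = []
    pos = 0  # 0-based start of the next contig
    for gap in re.finditer(r"[Nn]+", seq):
        if gap.start() > pos:
            # 1-based, inclusive coordinates of the contig before this gap
            lines.append(f"?\t{name}:{pos + 1}-{gap.start()}\t{name}\tPLUS")
        lines.append(f"GAP\tTYPE-2\t{gap.end() - gap.start()}")
        pos = gap.end()
    if pos < len(seq):
        lines.append(f"?\t{name}:{pos + 1}-{len(seq)}\t{name}\tPLUS")
    return lines


if __name__ == "__main__":
    demo = ">scaffold_1\nACGTACGTNNNNNACGT\n>scaffold_2\nGGGG\n"
    for scaffold, sequence in read_fasta(demo).items():
        print("\n".join(split_at_gaps(scaffold, sequence)))
```

For the demo input, scaffold_1 becomes two components separated by a 5 bp gap line, while scaffold_2 (which has no Ns) stays a single component.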
Launch PretextView in the background:

```bash
./PretextView &
```

Important: you must have pop-ups enabled in your browser, as this opens the software in a new window.
Ensure that any gap you edit in the map is also recorded in the TPF file you created in Step 1, by adding a ">" at the start of the corresponding "GAP" line.
"Paint" chromosomes using chromosome painting mode (keyboard shortcut "s"). Also add any required metadata tags, for example "X", "Z" or "haplotig", using metadata tagging mode (keyboard shortcut "m").
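If you prefer to script the TPF gap edit rather than make it by hand in a text editor, the sketch below shows the idea: it prefixes the n-th GAP line of a TPF with ">" and lists which gaps are already marked. The helpers mark_gap and marked_gaps are hypothetical, and the tab-separated "GAP TYPE-2 length" line layout is an assumption.

```python
from typing import List


def _is_gap(line: str) -> bool:
    # A GAP line, whether or not it already carries the ">" edit marker
    return line.lstrip(">").startswith("GAP")


def mark_gap(tpf_lines: List[str], gap_index: int) -> List[str]:
    """Return a copy of tpf_lines with the gap_index-th GAP line (1-based) prefixed by '>'."""
    out = list(tpf_lines)
    seen = 0
    for i, line in enumerate(out):
        if _is_gap(line):
            seen += 1
            if seen == gap_index:
                if not line.startswith(">"):
                    out[i] = ">" + line
                return out
    raise IndexError(f"TPF has only {seen} GAP lines")


def marked_gaps(tpf_lines: List[str]) -> List[int]:
    """1-based indices of GAP lines that already carry the '>' edit marker."""
    result, seen = [], 0
    for line in tpf_lines:
        if _is_gap(line):
            seen += 1
            if line.startswith(">"):
                result.append(seen)
    return result
```

Returning a copy rather than editing in place makes it easy to compare the original and edited TPF before writing anything back to disk.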
Generate an AGP file by opening the PretextView menu (keyboard shortcut "u") and clicking "Generate AGP".
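Before running the next step, it can help to sanity-check the AGP. The sketch below is a minimal parser assuming the standard tab-separated AGP layout (object, object_beg, object_end, part_number, component_type, ...), with "N" or "U" in the component_type column marking gaps and lines starting with "#" treated as comments. For each object it reports the number of components, the number of gaps and the object length. The function name summarise_agp is hypothetical, and any extra columns beyond the ones it reads are ignored.

```python
from typing import Dict


def summarise_agp(agp_text: str) -> Dict[str, Dict[str, int]]:
    """Per-object component count, gap count and length from AGP text."""
    summary: Dict[str, Dict[str, int]] = {}  # dicts keep insertion order
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        obj, obj_end, comp_type = cols[0], int(cols[2]), cols[4]
        entry = summary.setdefault(obj, {"components": 0, "gaps": 0, "length": 0})
        if comp_type in ("N", "U"):
            entry["gaps"] += 1
        else:
            entry["components"] += 1
        # The object length is the largest object_end seen for that object
        entry["length"] = max(entry["length"], obj_end)
    return summary
```

Comparing these counts with what you expect from the map (for example, the number of painted chromosomes) is a quick way to catch an AGP that was exported before curation finished.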
Step 6. Run rapid_pretext2tpf_XL.py. It takes as input the AGP you generated, your edited TPF and the original FASTA file. It outputs a CSV file describing the chromosomes, a new TPF file reflecting the structure of the curated FASTA, and a TPF file for any haplotigs marked during curation.

```bash
python rapid_pretext2tpf_XL.py ./test_data/ilThyBati1.PB.asm1.purge1.polish1.scaff1.decontaminated.fa ./test_data/ilThyBati1_precurated.pretext.agp_1
```

The output files appear in the working directory and are named:
- chrs.csv
- rapid_prtxt_XL.tpf
- haps_rapid_prtxt_XL.tpf
Build the curated assembly FASTA with rapid_join.pl, using the chromosome CSV, the original FASTA and the new TPF:

```bash
perl rapid_join.pl -csv chrs.csv -fa ./test_data/ilThyBati1.PB.asm1.purge1.polish1.scaff1.decontaminated.fa -tpf rapid_prtxt_XL.tpf -out ilThyBati1_1
```

N.B. rapid_join.pl can also output a FASTA file of the haplotigs: run the script with the -hap flag and the haps_rapid_prtxt_XL.tpf file.
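Conceptually, building the curated FASTA means walking the TPF in order, extracting each component from the original sequences by its coordinates, reverse-complementing components in MINUS orientation and filling gaps with Ns. The sketch below illustrates only that idea. It is not how rapid_join.pl is implemented, it ignores the chromosome CSV, and it assumes a TPF layout of tab-separated "? scaffold:start-end scaffold PLUS|MINUS" component lines (1-based, inclusive coordinates) and "GAP TYPE-2 length" gap lines. The function names reverse_complement and join_components are hypothetical.

```python
import re
from typing import Dict, List

# Translation table for complementing DNA bases (upper and lower case)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def join_components(tpf_lines: List[str], sequences: Dict[str, str]) -> str:
    """Concatenate TPF components in order, filling GAP lines with Ns."""
    parts: List[str] = []
    for line in tpf_lines:
        cols = line.lstrip(">").split("\t")  # tolerate '>'-marked gap lines
        if cols[0] == "GAP":
            parts.append("N" * int(cols[2]))
            continue
        # Component field looks like "scaffold_1:14-17" (1-based, inclusive)
        match = re.fullmatch(r"(.+):(\d+)-(\d+)", cols[1])
        if match is None:
            raise ValueError(f"unrecognised component: {cols[1]}")
        name, start, end = match.group(1), int(match.group(2)), int(match.group(3))
        piece = sequences[name][start - 1:end]
        parts.append(reverse_complement(piece) if cols[3] == "MINUS" else piece)
    return "".join(parts)
```

For a real assembly you would also need line wrapping on output and one such join per chromosome, which is the bookkeeping the curation scripts handle for you.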