Host read removal with Bowtie 2 #49
…stom content file

Bowtie2 host removal is fine.

Ok thanks, will do!
zcat ${reads[0]} | echo "Read pairs before removal: \$((`wc -l`/4))" >>${name}_remove_host.log
zcat ${name}_host_unmapped_1.fastq.gz | echo "Read pairs after removal: \$((`wc -l`/4))" >>${name}_remove_host.log
Is this just for debugging? I can't see where ${name}_remove_host.log is used elsewhere.
Might be a bit of an expensive operation if so, zcat on a big FastQ file can take quite a while. And I guess we get this info from your bowtie logs MultiQC module anyway?
True, I originally copied it from the remove_phix process, but it can be removed here.
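For reference, the count that the two log lines compute can be sketched as a small shell helper. This is only an illustration: `count_reads` is a hypothetical name, `gzip -dc` stands in for `zcat`, and the sketch assumes a well-formed FASTQ file with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the records in a gzipped FASTQ file.
# Assumes exactly four lines per record (no wrapped sequences).
count_reads() {
    local lines
    # The whole file is decompressed just to count its lines.
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}
```

Called as `count_reads sample_R1.fastq.gz` (a made-up file name), it prints the number of reads in that file. The full decompression is what makes the step costly on large inputs, which is the concern raised above.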

script:
def sensitivity = params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive"
if ( !params.single_end ) {
To avoid duplicating most of the process code (single end and paired end), you can define an input parameter, as at https://github.com/nf-core/mag/blob/master/main.nf#L672
ok, but in this case there are multiple parts in the process code affected:
-1 "${reads[0]}" -2 "${reads[1]}"

--un-conc-gz ${name}_host_unmapped_%.fastq.gz \
--al-conc-gz ${name}_host_mapped_%.fastq.gz \

zcat ${name}_host_mapped_1.fastq.gz | awk '{if(NR%4==1) print substr(\$0, 2)}' > ${name}_host_mapped_1.read_ids.txt
zcat ${name}_host_mapped_2.fastq.gz | awk '{if(NR%4==1) print substr(\$0, 2)}' > ${name}_host_mapped_2.read_ids.txt
Maybe in this case it doesn't necessarily get cleaner if solved like this?
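To make the trade-off in this thread concrete, here is a rough shell sketch of what a single input/output parameter would have to cover. It is not the pipeline's code: `bowtie2_io_args` is a hypothetical helper, and the single-end output file names are assumptions. The flags themselves (`-1`/`-2`, `-U`, `--un-conc-gz`, `--al-conc-gz`, `--un-gz`, `--al-gz`) are standard Bowtie 2 options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Bowtie 2 input and output options for
# one read file (single-end) or two read files (paired-end).
# Usage: bowtie2_io_args NAME READS_1 [READS_2]
bowtie2_io_args() {
    local name=$1
    shift
    if [ "$#" -eq 2 ]; then
        # Paired-end: Bowtie 2 replaces "%" with 1 and 2 in the output names.
        printf '%s\n' "-1 $1 -2 $2 --un-conc-gz ${name}_host_unmapped_%.fastq.gz --al-conc-gz ${name}_host_mapped_%.fastq.gz"
    else
        # Single-end: output names are assumed here, not taken from the PR.
        printf '%s\n' "-U $1 --un-gz ${name}_host_unmapped.fastq.gz --al-gz ${name}_host_mapped.fastq.gz"
    fi
}
```

Even with such a helper, the read ID extraction would still need its own per-mode handling (two mapped files versus one), which is the point made above about the result not necessarily being cleaner.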

OK, I now added a test using the Currently this host removal only works for short reads. If

In the MAG

Hi @ewels or @apeltzer, could one of you tell me by any chance what the purpose of the line

Link to line in question: Line 18 in 4c2f61c
Nothing to do with me I'm afraid. Looks like it was added by @HadrienG
Phil

(but I agree, I can't see anything obvious that it is doing, and I suspect that it can be removed from both config files)

OK, thanks @ewels !
I think that's fine. Using proper settings for ONT qc (e.g.

Ok, will change it then so that the already filtered short reads will be used to filter the long reads.
…ost removal is run in combination with long reads.
560b7c3 to e3db180

Hi @d4straub, thanks for your input. The following points were added/changed now:
I tested this locally for a very basic example, and with
Best,

If I am not mistaken there are 3 tests now:
wouldn't it be good to test also
My reasoning is that there are now some channels that are not tested otherwise. And further changes breaking processes that use these channels might go undetected when the test is missing. What do you think? edit: layout

Ok, I added a corresponding test, but it is more for channel testing, as I didn't add host reads to the long read dataset.
d4straub left a comment
Great, thanks! Looks good!

I made the saving of the host read ids optional and added some sorting, since the order of the Bowtie 2 results is not reproducible.
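As an illustration of the sorting mentioned here, the sketch below combines the `awk` header extraction quoted earlier in the thread with a locale-independent sort. It is an assumption about how such a step could look, not the code that was committed; `sorted_read_ids` is a hypothetical name and `gzip -dc` stands in for `zcat`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the read IDs of a gzipped FASTQ file in a
# stable, locale-independent order.
sorted_read_ids() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 1 { print substr($0, 2) }' \
        | LC_ALL=C sort
}
```

The `awk` step keeps every fourth line starting at the first (the header line) and drops the leading `@`. Setting `LC_ALL=C` makes the sort order independent of the machine's locale, so two runs on the same reads should give identical ID files even if the aligner emits them in a different order.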
Requested changes were either addressed or reasonably dismissed
Here is a suggestion for host read removal using bowtie2. For the host reference sequence either iGenomes or a user specified Fasta reference file can be used.
I also added a MultiQC section for this to display which fraction of reads maps against the host reference and is thus filtered out using a custom content file as suggested by @ewels and @drpatelh (MultiQC/MultiQC#1199), instead of using the standard Bowtie 2 MultiQC format (with its multiple different mapping categories). Moreover, I separated the MultiQC FastQC for before and after preprocessing.
If using this bowtie2 based strategy would be fine for you, I can add a test.
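A custom content file of the kind described above could look roughly like the output of the following sketch. Everything here is an assumption made for illustration: the helper names, the column labels, and the choice of a single `# plot_type` header line, which follows MultiQC's custom content header convention and should be checked against the MultiQC documentation. The file actually generated in this PR may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one TSV row with the reads removed as host
# and the reads retained, given the read counts before and after removal.
host_removal_row() {
    local sample=$1 before=$2 after=$3
    printf '%s\t%d\t%d\n' "$sample" "$(( before - after ))" "$after"
}

# Hypothetical layout: a comment header, a column header, then one row.
# Column labels are made up for this sketch.
write_host_removal_table() {
    printf "# plot_type: 'bargraph'\n"
    printf 'Sample\thost reads\tnon-host reads\n'
    host_removal_row "$@"
}
```

For example, `write_host_removal_table sample1 100 90 > host_removal_mqc.tsv` (sample name and file name made up) would record 10 host reads and 90 retained reads, which collapses the Bowtie 2 mapping categories into the two values of interest.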
PR checklist
… nextflow run . -profile test,docker).
… nf-core lint .).
… docs is updated
CHANGELOG.md is updated
README.md is updated
Learn more about contributing: https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md