[PATCH] allow bwa to parse Casava 1.8 fastqs and filterer flagged #1

Closed
RoelKluin wants to merge 3 commits into lh3:master from RoelKluin:master

@RoelKluin

Contributor

In Casava 1.8 the fastq output changed: the read name is now followed by a
space, which bwa wasn't parsing correctly. This patch fixes that and enables
bwa to filter sequences marked :Y: by Casava. The tag is removed from the output.

Signed-off-by: RoelKluin <roel.kluin@gmail.com>


lh3 commented Jul 8, 2011

Owner

Thanks. Nonetheless, you should not modify kseq.h. By convention, a FASTA name should not contain any spaces; anything after the first space is a comment. Allowing spaces in sequence names, as the modified kseq.h does, will cause problems for other input sequences. If you want the string after the space in the fasta/q header lines, you should check "kseq_t::comment". Could you help modify the patch to use "kseq_t::comment" without touching kseq.h? Thanks.
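
To illustrate the name/comment convention described above, here is a minimal sketch in plain C. It does not use kseq.h; `split_header` is a hypothetical helper written for this example, and it only approximates the split that a parser following this convention would perform on a header line (with the leading `@` or `>` already removed).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper for illustration only, not part of kseq.h or bwa.
 * Splits a header line in place at the first whitespace character.
 * On return, `header` holds just the sequence name; the return value
 * points at the comment, or is NULL when the header has no comment. */
static char *split_header(char *header)
{
    char *p = header;

    /* The name runs up to the first whitespace character. */
    while (*p != '\0' && !isspace((unsigned char)*p))
        ++p;

    if (*p == '\0')
        return NULL;    /* no whitespace, so no comment */

    *p = '\0';          /* terminate the name */
    return p + 1;       /* everything after the separator is the comment */
}
```

For the Casava 1.8 header `EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG`, this leaves `EAS139:136:FC706VJ:2:5:1000:12850` as the name and returns `1:Y:18:ATCACG` as the comment, so the name seen by downstream tools is unchanged.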

…which bwa"

This reverts commit 36cd4f9.

The comment shouldn't be included in the sequence name.
In Casava 1.8 the fastq output changed, e.g.

@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@

The part after the space, treated as comment by bwa, contains the fields:
<read number>:<is filtered>:<control number>:<barcode sequence>

With `Y' Casava indicates that a sequence should be filtered. This patch
enables bwa, with an -Y flag, to filter these sequences.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
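
The filter check described in the commit message above can be sketched as follows. This is not the code from the patch; `casava_is_filtered` is a hypothetical helper, and it assumes the comment has exactly the `<read number>:<is filtered>:<control number>:<barcode sequence>` layout shown, with `Y` in the second field marking a read to be filtered.

```c
#include <string.h>

/* Hypothetical helper for illustration only, not taken from the patch.
 * Returns 1 when a Casava 1.8 comment marks the read as filtered,
 * i.e. when the second colon-separated field is exactly "Y".
 * Returns 0 for a NULL, empty, or differently shaped comment. */
static int casava_is_filtered(const char *comment)
{
    const char *colon;

    if (comment == NULL)
        return 0;

    /* Skip the <read number> field. */
    colon = strchr(comment, ':');
    if (colon == NULL)
        return 0;

    /* The <is filtered> field must be the single character 'Y'.
     * Short-circuit evaluation keeps us from reading past the end. */
    return colon[1] == 'Y' && colon[2] == ':';
}
```

With the example record above, the comment `1:Y:18:ATCACG` yields 1, so a caller honouring the -Y flag would skip that read, while `1:N:18:ATCACG` yields 0 and the read is kept.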
RoelKluin closed this Jul 10, 2011
pmarks added a commit to 10XGenomics/bwa that referenced this pull request Jan 4, 2019
ksprintf is inline, to avoid collision with htslib
teepean added a commit to teepean/bwa that referenced this pull request Jun 1, 2026
lh3#5 real)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>