Explore a fix for broken casava format handling in a number of scripts #818
|
Confronting #817 head-on, we have a few different problems:
It looks to me like the right thing to do is to make sure that parse_description is always set to false. cc @standage |
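To illustrate why the description matters for Casava 1.8 reads, here is a minimal sketch. It assumes that parsing the description means splitting the header at the first space; `split_header` is a hypothetical helper written for this comment, not screed's or khmer's actual API, and the read names are made up.

```python
def split_header(header, parse_description=True):
    """Hypothetical helper (not a real screed/khmer function).

    Assumption: with parse_description=True the header is split at the
    first space into (name, description); otherwise the whole header is
    kept as the name.
    """
    header = header.lstrip('@')
    if parse_description and ' ' in header:
        name, description = header.split(' ', 1)
        return name, description
    return header, ''


# Made-up Casava 1.8 style headers: the read number lives in the description.
read1 = '@HWI-ST1:1:FC:1:1101:1000:2000 1:N:0:ACGT'
read2 = '@HWI-ST1:1:FC:1:1101:1000:2000 2:N:0:ACGT'

# Splitting drops the part that tells the two mates apart.
same_when_split = split_header(read1)[0] == split_header(read2)[0]

# Keeping the full header preserves the pairing information.
same_when_kept = (split_header(read1, parse_description=False)[0]
                  == split_header(read2, parse_description=False)[0])
```

Under that assumption, both mates collapse to the same name once the description is dropped, which is how splitting and interleaving can lose the pairing information.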
|
Addresses #763 |
|
This fixes the splitting/interleaving issues with Casava 1.8 AFAICS. |
|
The changes might not be fun but if the tests pass without modifications then we are violating the semantic versioning. |
|
Sorry, clarify: yes, fix, and to heck with semver?
|
|
mea culpa; ----> Then we are not violating semantic versioning if the
On Sat Feb 21 2015 at 8:34:43 PM C. Titus Brown notifications@github.com
|
|
To summarize my understanding: khmer has always mangled sequence record names, either by dropping descriptions or by appending different pairing notation on the end. This PR aims to bring khmer in line with expected practice: both old- and new-style pair naming are accepted everywhere, and descriptions will no longer be dropped.
With respect to semantic versioning we have three options:
A) We declare the old behavior a horrible bug, widely publicize the correction, and thus don't increment the major version number.
B) We accept that the old behavior was present for such a long time that it likely required workarounds, which will break once it is fixed. Therefore we increment the major version to indicate this, following our commitment to semantic (and not 'marketing') versioning.
C) We hide the corrected behavior behind a switch and wait for the 2.0 release to make it the default behavior.
As Titus privately pointed out to me, seeing what changes will be required to update the protocols (if any) will inform the wisdom of option B or C. |
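For concreteness, accepting both pair-naming styles could look roughly like the sketch below. `looks_like_pair` is a hypothetical name for illustration and is not the code in this PR. It assumes old-style names end in `/1` and `/2`, and that Casava 1.8 names share an identifier followed by a description starting with `1:` or `2:`.

```python
def looks_like_pair(name1, name2):
    """Hypothetical check (not khmer's implementation) for mate pairs.

    Assumed conventions:
      old style:   'read/1' and 'read/2'
      Casava 1.8:  'read 1:N:0:ACGT' and 'read 2:N:0:ACGT'
    """
    # Old style: identical names apart from the /1 and /2 suffixes.
    if name1.endswith('/1') and name2.endswith('/2'):
        return name1[:-2] == name2[:-2]

    # Casava 1.8 style: same identifier, with the read number leading
    # the description after the first space.
    if ' ' in name1 and ' ' in name2:
        id1, desc1 = name1.split(' ', 1)
        id2, desc2 = name2.split(' ', 1)
        return (id1 == id2
                and desc1.startswith('1:')
                and desc2.startswith('2:'))

    return False


old_ok = looks_like_pair('read7/1', 'read7/2')
new_ok = looks_like_pair('read7 1:N:0:ACGT', 'read7 2:N:0:ACGT')
mismatch = looks_like_pair('read7 1:N:0:ACGT', 'read8 2:N:0:ACGT')
```

Note that the Casava 1.8 branch only works if the description is still attached to the name, which is the reason for not dropping it.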
|
While additional tests are needed, and it's not yet ready for a full CR, it's mostly done, I think? Thoughts welcome. I'll finish sanding down the corners tomorrow or later today, as time permits. |
OK, so it's a slightly bigger issue than I thought :). |
|
…mats Conflicts: ChangeLog
|
|
Ready for review, w00t. |
|
Additional issues/bugs found & fixed:
|
'For khmer 1.x count-median.py will split sequence names at the first space which means that some ..'
It would be nifty if Casava 1.8 formatted test sequence files had 'c18' in their name.
|
LGTM! |
|
w00t! |
See #817, among others.