Skip to content

Add file type signatures to binary files#1031

Merged
mr-c merged 3 commits into
masterfrom
feature/magicnumbers
Jun 14, 2015
Merged

Add file type signatures to binary files#1031
mr-c merged 3 commits into
masterfrom
feature/magicnumbers

Conversation

@mr-c

@mr-c mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor

Closes #519

@mr-c mr-c added this to the 2.0 milestone May 26, 2015
@mr-c mr-c force-pushed the feature/magicnumbers branch from 9d34ece to b50d5bf Compare May 26, 2015 17:28
Comment thread lib/counting.cc Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with oxli and CountGraph being used internally, but it seems problematic to put them in diagnostic/error output, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We can change it back during the planned renames.

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author
  • Is it mergeable?
  • Did it pass the tests?
  • If it introduces new functionality in scripts/ is it tested?
    Check for code coverage with make clean diff-cover
  • Is it well formatted? Look at make pep8, make diff_pylint_report,
    make cppcheck, and make doc output. Use make format and manual
    fixing as needed.
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Is the Copyright year up to date?

@ctb

ctb commented May 26, 2015

Copy link
Copy Markdown
Member

Could you add some discussion about implementation and implications moving forward? How does this break backwards compatibility, etc. etc.?

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author

Discussion in the docs or here?

Old counting table and presence tables will not be loadable with khmer 2.0. Likewise counting table and presence tables created with 2.0 will not load with khmer 1.4 and prior.

@ctb

ctb commented May 26, 2015

Copy link
Copy Markdown
Member

On Tue, May 26, 2015 at 10:48:22AM -0700, Michael R. Crusoe wrote:

Discussion in the docs or here?

Old counting table and presence tables will not be loadable with khmer 2.0. Likewise counting table and presence tables created with 2.0 will not load with khmer 1.4 and prior.

Here is fine. OK, anything else we should know? :)

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author

To add the header to an old file:

echo -n 'OXLICountGraph' > new_file.ct
cat old_counting_table.ct >> new_file.ct

# or 

echo -n 'OXLINodeGraph_' > new_file.pt
cat old_presence_table.ct >> new_file.pt

@luizirber

Copy link
Copy Markdown
Member

Marginally related to this PR: What is the official procedure to add this signature to libmagic?

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author

@luizirber I just left a comment in the tracking issue #519 (comment)

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author

Hmm.. I just noticed that we do include the sub-file type as the second field. I'm going to switch to 'OXLI' as the signature. According to the file documentation we just need 32 unique bits, so that should suffice.

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author
  • tagset
  • stoptags
  • subset

@mr-c

mr-c commented May 26, 2015

Copy link
Copy Markdown
Contributor Author

@ctb @luizirber ready for review & merge

@kdm9

kdm9 commented May 27, 2015

Copy link
Copy Markdown
Contributor

Somewhere in the back of my mind I remember being told to include a non-printable character in the magic string for binary files, as it helps some things (like grep) detect a file is a binary one. I have no idea if it's relevant here, but I know the likes of BAM, gz etc do so. [citation needed]

@ctb

ctb commented May 27, 2015

Copy link
Copy Markdown
Member

On Tue, May 26, 2015 at 12:44:02PM -0700, Michael R. Crusoe wrote:

  • tagset
  • stoptags
  • subset

also, labels - see #1021.

@mr-c

mr-c commented May 27, 2015

Copy link
Copy Markdown
Contributor Author

@kdmurray91 Good point, though our versioning & subtype bytes that follow the magic are not in the printable ASCII range so we are already fine on the point. For example, file already correctly identifies a counting table as being of type 'data' prior to the introduction of our rules.

@mr-c

mr-c commented May 27, 2015

Copy link
Copy Markdown
Contributor Author

@ctb do you want me to add the signature to your PR or later to this PR for merging after #1021 ?

@ctb

ctb commented May 27, 2015

Copy link
Copy Markdown
Member

On Wed, May 27, 2015 at 07:27:04AM -0700, Michael R. Crusoe wrote:

@ctb do you want me to add the signature to your PR or later to this PR for merging after #1021 ?

later, I think.

@mr-c mr-c force-pushed the feature/magicnumbers branch from 0290083 to b813793 Compare May 29, 2015 19:11
@mr-c

mr-c commented May 29, 2015

Copy link
Copy Markdown
Contributor Author

@ctb updated for the recently merged labels/tags format. Ready for review & merge

Comment thread doc/whats-new-2.0.rst Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have

@ctb

ctb commented May 31, 2015

Copy link
Copy Markdown
Member

Other than the comments, looks good. Note that this PR passes the truncation tests I'm adding in #1048 too, so +1 that :)

@mr-c

mr-c commented Jun 14, 2015

Copy link
Copy Markdown
Contributor Author

@ctb comments are addressed; ready for review & merge

Comment thread .gitattributes Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be merged? If so, please add to ChangeLog.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. Probably shouldn't be merged as not everyone will have git-merge-changelog installed. I've moved it to my personal site-wide configuration file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Sun, Jun 14, 2015 at 12:34:20PM -0700, Michael R. Crusoe wrote:

@@ -1 +1,2 @@
khmer/_version.py export-subst
+ChangeLog merge=merge-changelog

Thanks for catching that. Probably shouldn't be merged as not everyone will have git-merge-changelog installed. I've moved it to my personal site-wide configuration file.

(It doesn't seem to be a problem for merging, note; I ran a merge with it
set, and no problem.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Something to investigate for another day.

On Sun, Jun 14, 2015 at 12:36 PM C. Titus Brown notifications@github.com
wrote:

In .gitattributes
#1031 (comment):

@@ -1 +1,2 @@
khmer/_version.py export-subst
+ChangeLog merge=merge-changelog

On Sun, Jun 14, 2015 at 12:34:20PM -0700, Michael R. Crusoe wrote: > @@ -1
+1,2 @@ > khmer/_version.py export-subst > +ChangeLog merge=merge-changelog
Thanks for catching that. Probably shouldn't be merged as not everyone will
have git-merge-changelog installed. I've moved it to my personal site-wide
configuration file.
(It doesn't seem to be a problem for merging, note; I ran a merge with it
set, and no problem.)


Reply to this email directly or view it on GitHub
https://github.com/dib-lab/khmer/pull/1031/files#r32383396.

Michael R. Crusoe: Programmer & Bioinformatician crusoe@ucdavis.edu
mcrusoe@msu.edu
The lab for Data Intensive Biology; University of California, Davis
https://impactstory.org/MichaelRCrusoe http://twitter.com/biocrusoe

@ctb

ctb commented Jun 14, 2015

Copy link
Copy Markdown
Member

Other than the .gitattributes change, LGTM. Please go ahead and merge (after fixing .gitattributes thing one way or another)

@mr-c mr-c force-pushed the feature/magicnumbers branch from 2aa5787 to 8718de6 Compare June 14, 2015 19:33
@mr-c

mr-c commented Jun 14, 2015

Copy link
Copy Markdown
Contributor Author

Jenkins, test this please

mr-c added a commit that referenced this pull request Jun 14, 2015
Add file type signatures to binary files
@mr-c mr-c merged commit 7f72b99 into master Jun 14, 2015
@mr-c mr-c deleted the feature/magicnumbers branch June 14, 2015 22:06
@mr-c

mr-c commented Jun 14, 2015

Copy link
Copy Markdown
Contributor Author

@ctb, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2.0 file formats need a magic header

4 participants