Support JavaScript and HTML files #3
Merged
hishamhm merged 3 commits into hishamhm:master on Feb 27, 2014
JavaScript has the same comment and string syntax as the C-like languages, so that code is reused.
Previously, is(name, "h") would match "x.html" because it contains ".h". Now, anchor the pattern to the end of the filename string.
With this commit codegrep will recognise *.html and *.htm files as HTML and process HTML comments. It will also handle <script> and <style> tags, using the appropriate comment and string formats within those.
Contributor
Author
GitHub's comment rendering seems to break (on purpose?) if I put the <> around the script tag in the PR description: it should read <script ... </script>, and likewise for style. Note that the > is deliberately absent from the opening tags so that arbitrary attributes can follow the tag name.
hishamhm added a commit that referenced this pull request on Feb 27, 2014
Support JavaScript and HTML files
Owner
Merged! At some point this script probably needs a cleanup/rewrite to make it cleaner/faster/smarter, but for its humble purposes I think it's holding up nicely so far :)
Owner
(oh, btw, just pushed a commit that makes it ridiculously faster... I would have never guessed that the silly is() function was so expensive (fun fact: most of the speed gain was from removing the initial .* from the string matching))
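As a rough illustration of why the leading .* can go, here is a minimal Python sketch. It is not codegrep's own code: the pattern names and the sample filenames are made up, and Python's regex engine is only a stand-in for whatever matcher the script uses, so the timings it prints say nothing definite about the real speed-up. What it does show is that, for a search-style match, the prefix changes the cost but not the result.

```python
import re
import timeit

# Hypothetical patterns: the same end-anchored extension test,
# with and without a leading ".*".
WITH_PREFIX = re.compile(r".*\.h\Z")
WITHOUT_PREFIX = re.compile(r"\.h\Z")

# Made-up sample filenames.
NAMES = ["main.c", "util.h", "index.html", "a/very/long/path/to/some/file.h"]


def matches(pattern, names):
    """Return one boolean per name: does the pattern match anywhere in it?"""
    return [bool(pattern.search(name)) for name in names]


# A search already tries every starting position, so the ".*" prefix is
# redundant: both patterns accept exactly the same names.
same_results = matches(WITH_PREFIX, NAMES) == matches(WITHOUT_PREFIX, NAMES)

if __name__ == "__main__":
    for label, pattern in (("with .*", WITH_PREFIX), ("without .*", WITHOUT_PREFIX)):
        seconds = timeit.timeit(lambda: matches(pattern, NAMES), number=10000)
        print(label, seconds)
```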
This adds support for *.js, *.html, and *.htm.
JavaScript has the same comment and string syntax as C++, and so isn't terribly interesting here.
HTML support is more interesting and I'm not so sure about the approach I took. It has two components: ordinary comments, and embedded languages. It understands both script ... /script and style ... /style, switches to the appropriate comment and string styles inside those blocks, and switches back at the end. It does that by adding a new string-valued state variable, "htmlmode", which records which of those contexts the parser is currently in. I'm a bit torn about that approach, but it seems the simplest for the moment.
It assumes script tags contain JavaScript and doesn't try to handle CDATA sections. It also does not currently treat quoted attribute values as strings: it's not clear whether it should, and it would first need a way to tell that it was inside <> anyway. Arguably it makes more sense to instead treat everything that isn't inside <> as a string, which might suggest HTML isn't a suitable language to support (but it's useful for my purpose, where there's little direct textual content anyway).
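To make the "htmlmode" idea concrete, here is a minimal Python sketch of that kind of mode switching. It is an illustration under assumptions, not the code in this PR: the names OPEN_TAGS, CLOSE_TAGS, COMMENT_SYNTAX and split_by_htmlmode are hypothetical, and it shares the limitations described above (script is assumed to be JavaScript, CDATA is ignored, and a > inside an attribute value would confuse it).

```python
import re

# Hypothetical patterns. The opening-tag patterns stop at the tag name,
# without '>', so that arbitrary attributes may follow.
OPEN_TAGS = {
    "script": re.compile(r"<script\b", re.IGNORECASE),
    "style": re.compile(r"<style\b", re.IGNORECASE),
}
CLOSE_TAGS = {
    "script": re.compile(r"</script\s*>", re.IGNORECASE),
    "style": re.compile(r"</style\s*>", re.IGNORECASE),
}

# Comment delimiters that apply in each mode, as (start, end) pairs.
COMMENT_SYNTAX = {
    "html": [("<!--", "-->")],
    "script": [("//", "\n"), ("/*", "*/")],
    "style": [("/*", "*/")],
}


def split_by_htmlmode(text):
    """Split text into (htmlmode, chunk) pairs; htmlmode is 'html', 'script' or 'style'."""
    htmlmode = "html"
    pos = 0
    chunks = []
    while pos < len(text):
        if htmlmode == "html":
            # Look for the earliest opening tag of an embedded language.
            hits = []
            for name, pattern in OPEN_TAGS.items():
                match = pattern.search(text, pos)
                if match:
                    hits.append((match.start(), name))
            if not hits:
                chunks.append((htmlmode, text[pos:]))
                break
            start, name = min(hits)
            # The opening tag itself, up to its '>', still counts as HTML.
            end = text.find(">", start)
            end = len(text) if end == -1 else end + 1
            chunks.append((htmlmode, text[pos:end]))
            htmlmode, pos = name, end
        else:
            match = CLOSE_TAGS[htmlmode].search(text, pos)
            if match is None:
                # Unterminated block: the rest stays in the embedded mode.
                chunks.append((htmlmode, text[pos:]))
                break
            if match.start() > pos:
                chunks.append((htmlmode, text[pos:match.start()]))
            # The closing tag is handed back to HTML mode.
            htmlmode, pos = "html", match.start()
    return chunks
```

A comment stripper built on this would look up COMMENT_SYNTAX[htmlmode] for each chunk, so that // is only treated as a comment inside script blocks and <!-- only outside them.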
In any case, e05b78f should probably be merged as I think it's a straightforward bug fix - originally *.html was recognised as a C header file because of the presence of ".h" in the name, but the extensions should really be anchored to the end of the string.
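The difference between the old and new matching can be sketched in a few lines of Python. This is only an illustration of the bug, not the actual is() function: both helper names below are hypothetical.

```python
import re


def is_ext_unanchored(name: str, ext: str) -> bool:
    """Buggy variant: true if '.<ext>' appears anywhere in the name."""
    return re.search(r"\." + re.escape(ext), name) is not None


def is_ext_anchored(name: str, ext: str) -> bool:
    """Fixed variant: true only if the name ends with '.<ext>'."""
    return re.search(r"\." + re.escape(ext) + r"\Z", name) is not None
```

With these, is_ext_unanchored("x.html", "h") is true, which is the misclassification described above, while is_ext_anchored("x.html", "h") is false and is_ext_anchored("x.h", "h") is still true.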