Support JavaScript and HTML files #3

Merged
hishamhm merged 3 commits into hishamhm:master from mwh:master
Feb 27, 2014


@mwh mwh commented Feb 27, 2014

This adds support for *.js, *.html, and *.htm.

JavaScript has the same comment and string syntax as C++, and so isn't terribly interesting here.

HTML support is more interesting and I'm not so sure about the approach I took. It has two components: ordinary comments, and embedded languages. It understands both script ... /script and style ... /style and switches to the appropriate commenting styles for those blocks, then back at the end. It does that by adding a new string-valued state variable "htmlmode", which is updated according to which of those modes it's in. I'm a bit torn about that approach, but it seems the simplest for the moment.
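A minimal sketch of the mode-tracking idea described above, written in Python purely for illustration. The names `html_modes`, `COMMENT_SYNTAX`, and `OPENERS` are hypothetical and not taken from codegrep, and the sketch ignores CDATA sections and openers that appear inside HTML comments.

```python
# Sketch only: these names and tables are illustrative, not codegrep's own.
COMMENT_SYNTAX = {
    "html": [("<!--", "-->")],
    "script": [("//", "\n"), ("/*", "*/")],  # assumes script means JavaScript
    "style": [("/*", "*/")],
}

# Opening tags are matched without the closing ">" so attributes may follow.
OPENERS = [("<script", "script"), ("<style", "style")]


def html_modes(lines):
    """Return the htmlmode in effect at the end of each input line."""
    mode = "html"
    modes = []
    for line in lines:
        lowered = line.lower()
        pos = 0
        while True:
            if mode == "html":
                # Find the earliest embedded-language opener on this line.
                hits = [(lowered.find(tag, pos), tag, new_mode)
                        for tag, new_mode in OPENERS]
                hits = [h for h in hits if h[0] != -1]
                if not hits:
                    break
                index, tag, mode = min(hits)
                pos = index + len(tag)
            else:
                # Inside script or style: only the matching close tag ends it.
                closer = "</" + mode + ">"
                index = lowered.find(closer, pos)
                if index == -1:
                    break
                mode = "html"
                pos = index + len(closer)
        modes.append(mode)
    return modes


page = [
    "<html>",
    "<script type='text/javascript'>",
    "// a JavaScript comment",
    "</script>",
    "<style>",
    "/* a CSS comment */",
    "</style><p>",
]
modes = html_modes(page)
```

Each entry of `modes` can then be used to look up the comment delimiters in `COMMENT_SYNTAX` for the following line, which is the "switch and switch back" behaviour the description refers to.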

It assumes script tags are JavaScript and doesn't try to address CDATA sections. It also does not currently treat quoted attribute values as strings - it's not clear whether it should or not, but it would need to know how to tell it was inside <> first anyway. Arguably it would make more sense to treat everything that isn't inside <> the way a string is treated and exclude it, which might suggest HTML is not a suitable language to support (but it's useful for my purpose, where there's little direct textual content anyway).

In any case, e05b78f should probably be merged as I think it's a straightforward bug fix - originally *.html was recognised as a C header file because of the presence of ".h" in the name, but the extensions should really be anchored to the end of the string.
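To illustrate the bug, here is a small sketch using Python's `re` module as a stand-in for whatever pattern matching codegrep actually uses. The helper names `is_unanchored` and `is_anchored` are hypothetical; only the behaviour (".h" matching anywhere versus only at the end of the name) comes from the description above.

```python
import re


def is_unanchored(name, ext):
    """Old behaviour (sketch): '.ext' may appear anywhere in the name."""
    return re.search(r"\." + re.escape(ext), name) is not None


def is_anchored(name, ext):
    """Fixed behaviour (sketch): '.ext' must be the end of the name."""
    return re.search(r"\." + re.escape(ext) + r"\Z", name) is not None


# "x.html" contains ".h", so the unanchored check misreads it as a C header.
misdetected = is_unanchored("x.html", "h")
# Anchoring to the end of the string rejects it, while real headers still match.
fixed = is_anchored("x.html", "h")
still_header = is_anchored("x.h", "h")
```

With the anchored form, `x.html` is only recognised when checked against the `html` extension.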

mwh added 3 commits February 27, 2014 20:55
JavaScript has the same comment and string syntax as the C-like languages, so that code is reused.

Previously, is(name, "h") would match "x.html" because it contains ".h". Now, anchor the pattern to the end of the filename string.

With this commit codegrep will recognise *.html and *.htm files as HTML and process HTML comments. It will also handle <script> and <style> tags, using the appropriate comment and string formats within those.

mwh commented Feb 27, 2014

GitHub's comments seem to break (on purpose?) if I put the <> on the script tag above - it should be <script ... </script> and the same for style. Notably the > is absent on the opening tags to let arbitrary attributes appear.

hishamhm added a commit that referenced this pull request Feb 27, 2014
Support JavaScript and HTML files
@hishamhm hishamhm merged commit e31a238 into hishamhm:master Feb 27, 2014
@hishamhm
Owner

Merged!

At some point this script probably needs a cleanup/rewrite to make it cleaner/faster/smarter, but for its humble purposes I think it's holding up nicely so far :)

@hishamhm
Owner

(Oh, btw, I just pushed a commit that makes it ridiculously faster... I would never have guessed that the silly is() function was so expensive. Fun fact: most of the speed gain came from removing the initial .* from the string matching.)
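A rough sketch of why the leading .* can be dropped, again using Python's `re` as a stand-in for the project's own matching (an assumption; the pattern and helper names here are hypothetical). When the matcher already searches at every starting position, the prefixed and unprefixed patterns accept the same names, and the prefix only gives a backtracking matcher more to try.

```python
import re

# Both patterns ask "does the name end in .h?"; only the first has a ".*" prefix.
WITH_PREFIX = re.compile(r".*\.h\Z")
WITHOUT_PREFIX = re.compile(r"\.h\Z")


def same_answer(name):
    """True when both patterns agree on whether the name matches."""
    with_prefix = WITH_PREFIX.search(name) is not None
    without_prefix = WITHOUT_PREFIX.search(name) is not None
    return with_prefix == without_prefix


names = ["x.h", "x.html", "dir/sub/file.h", "README", ".h"]
agreement = [same_answer(n) for n in names]
```

No timings are shown here, since the actual speed-up depends on the matching engine and the inputs.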
