ECMAScript2/htmlparser

ES2 HTML Parser

Compact JavaScript HTML parser.

  1. Target and Development Environments
  2. Demo
  3. Functions and Features
  4. Development and test
  5. Links
  6. License

1. Target and Development Environments

  1. Does not use RegExp, so it works in a wide range of environments, at the cost of speed
  2. Written in Closure Script
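
To illustrate what scanning without RegExp can look like, here is a minimal sketch that reads a tag name by comparing character codes. The helper names `isNameChar` and `readTagName` are hypothetical and are not taken from es2-html-parser; the set of accepted characters is also a simplifying assumption.

```javascript
// Hypothetical helpers for illustration only; not the parser's actual code.

// Returns true for characters accepted in a tag name in this sketch:
// ASCII letters, digits, '-' and ':'.
function isNameChar(code) {
  return (code >= 48 && code <= 57) ||   // 0-9
         (code >= 65 && code <= 90) ||   // A-Z
         (code >= 97 && code <= 122) ||  // a-z
         code === 45 ||                  // '-'
         code === 58;                    // ':'
}

// Reads a tag name starting at index `start`, without any RegExp.
function readTagName(html, start) {
  var i = start;
  while (i < html.length && isNameChar(html.charCodeAt(i))) {
    ++i;
  }
  return html.substring(start, i);
}
```

A character-by-character loop like this avoids depending on a regular expression engine, which is one plausible reason such a parser trades speed for portability.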

2. Demo

Blogger Template Cleaner

3. Functions and Features

HTML document fragments as web designers typically write them are generally parsed correctly.

  1. The document tree is constructed correctly even when optional end tags are omitted.
    • caption,dd,li,td,dt,th,p,rb,rp,rt,html,head,colgroup,optgroup,option,tbody,thead,tfoot,tr,rbc,rtc
  2. Broken document fragments in conditional comments can also be parsed.
    • <!--[if IE 8]> </div><br clear=both><div> <![endif]-->
      • Retrieve and parse the comment text @see
    • Element missing end tag
      • An element that is closed automatically, that is, one whose end tag is not present in the document, whose end tag may not be omitted, and which is not closed by another start tag, is identified by the isInvalidEndTagOmission flag. (onParseEndTag)
    • Element missing start tag
      • The isMissingStartTag flag is true. (onParseEndTag)
  3. Unlike parse5, missing <html>, <head>, and <body> elements are not supplied to create a complete HTML document.
  4. ⚠️ For invalid documents such as <table><p>, the structure of the resulting tree differs from the specification.
  5. ⚠️ Newline characters in <script>, <style>, <textarea>, <title>, <plaintext>, <xmp>, <listing> are not removed. Test page
  6. RawTextElements (<script>, <style>, <textarea>, <title>, <plaintext>, <xmp>, <listing>) can contain a ProcessingInstruction. (#1)
  7. Parsing can be paused and resumed
  8. XHTML
  9. ⚠️ In a complete document, the <body> tag must not be omitted.
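
The handling of omitted optional end tags in item 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the parser's actual algorithm: the `IMPLICIT_CLOSERS` table, the `buildTree` function, and the token format are all hypothetical, and the table covers only a few of the elements listed above.

```javascript
// Hypothetical, simplified table: an open element and the start tags
// that implicitly close it. The real parser covers many more elements.
var IMPLICIT_CLOSERS = {
  li: ['li'],
  dt: ['dt', 'dd'],
  dd: ['dt', 'dd'],
  td: ['td', 'th', 'tr'],
  th: ['td', 'th', 'tr'],
  tr: ['tr']
};

// Builds a tree from a flat token list such as
// [{ type: 'start', tag: 'ul' }, { type: 'end', tag: 'ul' }].
function buildTree(tokens) {
  var root = { tag: '#root', children: [] };
  var stack = [root];

  for (var i = 0; i < tokens.length; ++i) {
    var token = tokens[i];
    if (token.type === 'start') {
      // Close every open element whose end tag is implied by this start tag.
      while (
        Object.prototype.hasOwnProperty.call(IMPLICIT_CLOSERS, stack[stack.length - 1].tag) &&
        IMPLICIT_CLOSERS[stack[stack.length - 1].tag].indexOf(token.tag) !== -1
      ) {
        stack.pop();
      }
      var node = { tag: token.tag, children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    } else if (token.type === 'end') {
      // Pop back to the matching open element; ignore unmatched end tags.
      for (var j = stack.length - 1; j > 0; --j) {
        if (stack[j].tag === token.tag) {
          stack.length = j;
          break;
        }
      }
    }
  }
  return root;
}

// Tokens for: <ul><li>a<li>b</ul> (text nodes left out for brevity)
var tree = buildTree([
  { type: 'start', tag: 'ul' },
  { type: 'start', tag: 'li' },
  { type: 'start', tag: 'li' },
  { type: 'end', tag: 'ul' }
]);
```

In this sketch the second `<li>` becomes a sibling of the first rather than its child, which is the behavior the optional end tag rules call for.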

4. Development and test

git clone https://github.com/ECMAScript2/es2-html-parser
cd es2-html-parser
npm i

gulp dist

npm run test

See src/js/example/*.js for how to write the handler. A SAX-style API is provided.
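
As a rough idea of what a SAX-style handler that uses the flags described above might look like, here is a minimal sketch. The callback name `onParseEndTag` and the flag names come from this README, but the argument order, the `createDiagnosticsHandler` factory, and the simulated calls are assumptions made for illustration; the files in src/js/example/ show the real signatures.

```javascript
// ASSUMPTION: the argument order (tagName, isInvalidEndTagOmission,
// isMissingStartTag) is illustrative only and may differ from the real API.
function createDiagnosticsHandler() {
  var problems = [];
  return {
    problems: problems,
    onParseEndTag: function (tagName, isInvalidEndTagOmission, isMissingStartTag) {
      if (isMissingStartTag) {
        problems.push('</' + tagName + '> has no matching start tag');
      } else if (isInvalidEndTagOmission) {
        problems.push('<' + tagName + '> is missing a required end tag');
      }
    }
  };
}

// Simulated callbacks, standing in for what a parser might report for
// the fragment: </div><br clear=both><div>
var handler = createDiagnosticsHandler();
handler.onParseEndTag('div', false, true);  // </div> without a start tag
handler.onParseEndTag('div', true, false);  // <div> never closed
```

Collecting messages in an array, as here, keeps the handler free of side effects, so the caller decides how to report broken fragments.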

See test/*.js for how to use the parser.

5. Links

  1. Original code by Erik John Resig (ejohn.org): an early JavaScript HTML parser; compact code, but useful in most cases
  2. pettanR / webframework / js / 02_Dom / 09_HTMLParser.js: based on John Resig's code, without regular expressions
  3. html.json: a project that uses es2-html-parser
  4. クラウド番外地 / I wrote a JavaScript HTML parser of just under 7 KB

6. License

ES2 HTML Parser is licensed under the MIT License.

(C) 2024-2025 itozyun(blog)

About

HTML Parser Implementation for html.json and ES2 Web Browsers
