SwiftSoup is a Swift library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.
SwiftSoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
- scrape and parse HTML from a URL, file, or string
- find and extract data, using DOM traversal or CSS selectors
- manipulate the HTML elements, attributes, and text
- clean user-submitted content against a safe white-list, to prevent XSS attacks
- output tidy HTML
SwiftSoupis designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup;SwiftSoupwill create a sensible parse tree.
SwiftSoup is available through CocoaPods. To install it, simply add the following line to your Podfile:
pod "SwiftSoup"To parse a HTML document:
let html = "<html><head><title>First parse</title></head>"
+ "<body><p>Parsed HTML into a doc.</p></body></html>"
let doc: Document = try SwiftSoup.parse(html)
return try doc.text()- unclosed tags (e.g.
<p>Lorem <p>Ipsumparses to<p>Lorem</p> <p>Ipsum</p>) - implicit tags (e.g. a naked
<td>Table data</td>is wrapped into a<table><tr><td>...) - reliably creating the document structure (
htmlcontaining aheadandbody, and only appropriate elements within the head)
###The object model of a document
- Documents consist of Elements and TextNodes
- The inheritance chain is:
DocumentextendsElementextendsNode.TextNodeextendsNode. - An Element contains a list of children Nodes, and has one parent Element. They also have provide a filtered list of child Elements only.
Nabil Chatbi, scinfu@gmail.com
SwiftSoup was ported to Swift from Java Jsoup library.
SwiftSoup is available under the MIT license. See the LICENSE file for more info.