soup

Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go whose interface closely mirrors that of BeautifulSoup.

Functions implemented so far:

func Get(string) (string, error) // Takes the URL as an argument, returns the HTML as a string
func HTMLParse(string) struct{} // Takes the HTML string as an argument, returns a pointer to the constructed DOM
func Find(...string) struct{} // Takes the element tag and an optional attribute key-value pair, returns a pointer to the first occurrence
func FindAll(...string) []struct{} // Same as Find(), but returns pointers to all occurrences
func FindNextSibling() struct{} // Returns a pointer to the next sibling of the Element in the DOM
func FindNextElementSibling() struct{} // Returns a pointer to the next element sibling of the Element in the DOM
func FindPrevSibling() struct{} // Returns a pointer to the previous sibling of the Element in the DOM
func FindPrevElementSibling() struct{} // Returns a pointer to the previous element sibling of the Element in the DOM
func Attrs() map[string]string // Returns a map from the Element's attribute names to their values
func Text() string // Returns the full text inside a non-nested tag

The struct returned by these functions has two fields:

Pointer, a pointer to the current HTML node
NodeValue, the current HTML node's value: the tag name for an ElementNode, or the text for a TextNode

Installation

Install the package using the command:

go get github.com/anaskhan96/soup

Example

The example below scrapes the "Comics I Enjoy" section (link text and URLs) from xkcd. More examples are available in the repository's examples directory.

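package main

import (
	"fmt"
	"os"

	"github.com/anaskhan96/soup"
)

func main() {
	// Fetch the xkcd front page as an HTML string.
	resp, err := soup.Get("https://xkcd.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching page:", err)
		os.Exit(1)
	}

	// Parse the HTML, locate the div with id "comicLinks",
	// and collect every anchor tag inside it.
	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")

	// Print each link's text alongside its href attribute.
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}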
