Web Crawler

A simple web crawler built with Go, powered by goroutines.

Demo

[asciicast demo recording]

How to build the binary

  1. Run go get -v to download the dependencies
  2. Run go build -o webcrawler to build the binary

Usage

./webcrawler -baseurl https://golang.org -max-depth 2

This will start the crawler and generate two files:

  1. url-tree.txt, which shows the links between pages. The default file name can be changed with the -tree-file-name flag, and tree generation can be disabled by setting the -show-tree flag to false (a sketch of these flag definitions is shown below).
  2. sitemap.xml, which contains the sitemap in XML format.
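For reference, here is a minimal sketch of how these command-line flags could be defined with Go's standard flag package. The flag names match the ones documented above, but the variable names, defaults, and help text are assumptions, not the repository's actual code.

package main

import (
	"flag"
	"fmt"
)

func main() {
	// Flag names follow the README; defaults and variable names are assumed.
	baseURL := flag.String("baseurl", "", "root URL to start crawling from")
	maxDepth := flag.Int("max-depth", 2, "maximum link depth to follow")
	treeFile := flag.String("tree-file-name", "url-tree.txt", "file to write the page-links tree to")
	showTree := flag.Bool("show-tree", true, "whether to generate the page-links tree")
	flag.Parse()

	fmt.Println(*baseURL, *maxDepth, *treeFile, *showTree)
	// The real crawler would start fetching pages from *baseURL here.
}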

How to run tests

go test -v ./...

Things that can be improved

  1. Command-line flags: flag handling could be improved by using https://github.com/spf13/cobra.
  2. Configuration: it would be nice to have some configuration management; this could be done with https://github.com/spf13/viper.
  3. Performance: all the goroutines write to a single shared state. Performance might improve if channels were used instead, though benchmarking would be needed to measure the actual gain (see the sketch below).
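To illustrate the third point, here is a minimal sketch, not the repository's code, of the channel-based alternative: instead of every goroutine locking a shared map, workers send their results on a channel and a single goroutine owns the map. The result type and URLs below are hypothetical.

package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical item produced by a crawl worker.
type result struct {
	url   string
	links []string
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b"}

	results := make(chan result)
	var wg sync.WaitGroup

	// Workers send their findings on the channel instead of
	// writing to a mutex-protected shared map.
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			// A real worker would fetch and parse the page here.
			results <- result{url: u, links: []string{u + "/child"}}
		}(u)
	}

	// Close the channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// A single goroutine owns the state, so no locking is needed.
	tree := make(map[string][]string)
	for r := range results {
		tree[r.url] = r.links
	}
	fmt.Println(tree)
}

Whether this beats the shared-state approach depends on contention and workload, which is why the item above calls for benchmarking.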

Issue with Page Links Tree

Please note that the generated tree shows the children of any given node only once. For example, if a page's link structure is

foo
 |-bar
      |-lorem
            |-ipsum
 |-lorem
      |-ipsum

the generated tree would look like

foo
 |-bar
      |-lorem
            |-ipsum
 |-lorem

That is, the child nodes of any given node are shown only once.
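This is the behaviour you would get from printing the tree with a visited set: a node's subtree is expanded only the first time the node is encountered, and later occurrences are printed as leaves. Below is a minimal illustrative sketch, not the repository's code; the link graph mirrors the example above.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical link graph matching the example above.
	links := map[string][]string{
		"foo":   {"bar", "lorem"},
		"bar":   {"lorem"},
		"lorem": {"ipsum"},
	}

	visited := make(map[string]bool)

	// printTree expands a node's children only the first time the
	// node is seen; repeated nodes are printed without children.
	var printTree func(node string, depth int)
	printTree = func(node string, depth int) {
		fmt.Printf("%s|-%s\n", strings.Repeat("  ", depth), node)
		if visited[node] {
			return
		}
		visited[node] = true
		for _, child := range links[node] {
			printTree(child, depth+1)
		}
	}

	printTree("foo", 0)
}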
