dsp25no/crawler

crawler

This tool crawls websites and measures their usage of JavaScript plugins. It was written as part of a diploma project.

Installation

via Git clone

  • Clone this repository
git clone https://github.com/dsp25no/crawler.git
  • Install the Python requirements (only Python 3 is supported)
cd crawler
pip install -r requirements.txt
  • Install Docker with the Splash image (Docker from version 1.21 is supported)

Usage

Usage: crawler.py [OPTIONS] TARGETS_LIST

Options:
  --debug             Include this flag to enable debug messages.
  --filters FILENAME  File with filters to find metrics.  [required]
  --offline           Include this flag to load HARs from save.
  -h, --help          Show this message and exit.

Format of filters:

#This is comment
filter_name regular_expression_for_filter
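
The filter format above can be read with a few lines of Python. This is a minimal sketch rather than the tool's own parser: the function name `parse_filters` and the sample `jquery` filter are hypothetical, and it assumes that the filter name is separated from the regular expression by the first run of whitespace.

```python
import re


def parse_filters(text):
    """Parse filter definitions into a {name: compiled_regex} dict.

    Assumption: the name ends at the first run of whitespace and the
    rest of the line is the regular expression.
    """
    filters = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        name, pattern = line.split(None, 1)
        filters[name] = re.compile(pattern)
    return filters


# Hypothetical filter file content, for illustration only.
sample = r"""#This is comment
jquery jquery(\.min)?\.js
"""

filters = parse_filters(sample)
for name, regex in filters.items():
    print(name, bool(regex.search("https://example.com/js/jquery.min.js")))
```

Compiling each pattern once up front keeps matching cheap when the same filters are applied to many requests.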

Format of targets list:

{ "Target_class": [
    {
      "name" : "target name",
      "url" : "optional url of target",
      "param" : "optional param",
      "another param" : "another optional param"
    }
    ...
  ],
  "Another_target_class": [
    ...
  ]
}

Target_class must inherit from Target, and the file containing the class code must be in the targets directory.
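
The relation between the targets list and the target classes can be sketched as follows. Everything here is an assumption made for illustration: the `Target` class is a stand-in (the real base class in the repository may have a different constructor), `Site` is a hypothetical subclass, and `build_targets` is a hypothetical helper that maps each top-level key to a class of the same name.

```python
import json


class Target:
    """Stand-in base class; the real Target in the repository may differ."""

    def __init__(self, name, url=None, params=None):
        self.name = name
        self.url = url
        self.params = params or {}


class Site(Target):
    """Hypothetical target class, named after a key in the targets list."""


def build_targets(text, classes):
    """Create one object per entry, using the class named by each key."""
    targets = []
    for class_name, entries in json.loads(text).items():
        cls = classes[class_name]
        if not issubclass(cls, Target):
            raise TypeError(f"{class_name} must inherit from Target")
        for entry in entries:
            # Everything except "name" and "url" is kept as an optional param.
            params = {k: v for k, v in entry.items() if k not in ("name", "url")}
            targets.append(cls(entry["name"], entry.get("url"), params))
    return targets


# Hypothetical targets list, for illustration only.
sample_json = """
{ "Site": [
    { "name": "example", "url": "https://example.com", "param": "value" }
  ]
}
"""

targets = build_targets(sample_json, {"Site": Site})
for target in targets:
    print(type(target).__name__, target.name, target.url, target.params)
```

Passing the extra keys as a plain dictionary avoids problems with parameter names that are not valid Python identifiers, such as "another param".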

Example

  • Check that the Docker daemon is running
  • Copy the files from one of the examples subdirectories to the tool's directory
  • Run the tool:
python crawler.py --filters filters.txt targets.json
