GitHub - PeymanNr/Crawler-Digiato

Introduction

This is a crawler written in pure Python.

It can crawl all categories and collect the links of all articles in a specific category.
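As a rough illustration of how article links can be collected from a category page with only the standard library, here is a minimal sketch. The class `LinkCollector`, the function `extract_links`, the sample markup, and the `example.com` URL are all hypothetical and are not taken from this repository's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                # Skip links that were already seen on this page.
                if url not in self.links:
                    self.links.append(url)


def extract_links(html, base_url):
    """Return the de-duplicated, absolute links found in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical category page markup, used only for illustration.
sample_html = (
    '<a href="/tech/article-1">One</a>'
    '<a href="/tech/article-2">Two</a>'
    '<a href="/tech/article-1">Duplicate</a>'
)
links = extract_links(sample_html, "https://example.com")
print(links)
```

A real crawler would additionally filter the links so that only article URLs within the chosen category are kept.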

It can also crawl all articles on the Zoomit.ir website and extract these details:

Title
Body
Author
Posted DateTime
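The four fields above could be represented as a small record type. This is only a sketch: the `Article` class and its field names are assumptions made for illustration and may not match the structures in `models.py`.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class Article:
    """One crawled article with the four details listed above."""

    title: str
    body: str
    author: str
    posted_at: datetime  # the "Posted DateTime" field


# Sample values, invented for this example.
article = Article(
    title="Sample title",
    body="Sample body text.",
    author="Sample author",
    posted_at=datetime(2023, 1, 15, 9, 30),
)

# Convert to a plain dict, e.g. before writing the record to a database.
record = asdict(article)
print(record)
```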

## Installation instructions:

Create and activate your virtual environment (on Linux or macOS):

python3 -m venv venv
source venv/bin/activate

Install pip packages:

pip install -r requirements.txt

Create a MySQL Database:

CREATE DATABASE dbname;

Grant all permissions to the database:

GRANT ALL PRIVILEGES ON dbname.* TO 'username'@'localhost';

Add your local config to `config.py`:

SQL_USERNAME = ""
SQL_PASSWORD = ""
SQL_HOST = ""
SQL_PORT = 0
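To show how these four settings might be checked and combined before connecting, here is a minimal sketch. The helper `build_mysql_url`, the sample values, and the `mysql://` URL form are assumptions for illustration. The repository may connect to MySQL in a different way.

```python
from urllib.parse import quote

# Sample values for illustration; replace them with your own settings.
SQL_USERNAME = "crawler"
SQL_PASSWORD = "p@ss word"
SQL_HOST = "localhost"
SQL_PORT = 3306  # MySQL's default port


def build_mysql_url(username, password, host, port, database):
    """Validate the settings and return a mysql:// connection URL."""
    if not username or not host:
        raise ValueError("SQL_USERNAME and SQL_HOST must be set")
    if not 0 < port < 65536:
        raise ValueError("SQL_PORT must be between 1 and 65535")
    # Percent-encode the credentials so characters like "@" stay unambiguous.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"mysql://{user}:{secret}@{host}:{port}/{database}"


url = build_mysql_url(SQL_USERNAME, SQL_PASSWORD, SQL_HOST, SQL_PORT, "dbname")
print(url)
```

Note that the placeholder `SQL_PORT = 0` from the template would be rejected by this check, which makes a forgotten setting fail early.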

Run the Crawler:

To see all the categories:

python config.py category_name

To start crawling:

Step 1:
python start.py find_links
Step 2:
python start.py save_data
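The two-step invocation above suggests a script that dispatches on its first argument. The sketch below shows one way to do that with `argparse`. The function bodies are placeholders and the `run` helper is hypothetical, so this is not the actual content of `start.py`.

```python
import argparse


def find_links():
    # Placeholder for step 1: collect and store the article links.
    return "find_links"


def save_data():
    # Placeholder for step 2: download each stored link and save its data.
    return "save_data"


# Map each command-line word to the function that handles it.
COMMANDS = {"find_links": find_links, "save_data": save_data}


def run(argv):
    """Parse the command name from argv and call the matching function."""
    parser = argparse.ArgumentParser(description="Two-step crawler sketch")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()


# Equivalent to passing "find_links" as the first command-line argument.
result = run(["find_links"])
print(result)
```

Keeping the steps separate means the link list from step 1 can be reused if step 2 is interrupted and has to be restarted.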

