Glassdoor job scraper

A web scraper for the popular job listing site "Glassdoor", written in Python with Beautiful Soup.

Intended to work without sign-in. The user provides a 'base url' to scrape from, based on the desired job role and country.
The user sets a 'target job size', i.e. the number of individual job listings to scrape.
The Python script scrapes the job link, role, company and job description from Glassdoor results.
Scraped information is returned to the user as an output CSV file.
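
As an illustration of the output step, here is a minimal sketch of writing scraped listings to CSV with Python's standard csv module. The column names, the helper write_listings_csv and the sample row are assumptions made for this example; the actual script may name and order its columns differently.

```python
import csv
import io

# Column names are assumptions based on the four scraped fields listed above.
FIELDNAMES = ["job_link", "role", "company", "job_description"]


def write_listings_csv(listings, file_obj):
    """Write a list of listing dicts to an open text file as CSV (hypothetical helper)."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for listing in listings:
        writer.writerow(listing)


# Made-up sample row, for illustration only.
sample_listings = [
    {
        "job_link": "https://example.com/job/1",
        "role": "Data Analyst",
        "company": "Example Co",
        "job_description": "Analyse data, build dashboards.\nSQL required.",
    },
]

# An in-memory buffer stands in for a real file here; for a file on disk,
# open it with newline="" as the csv module documentation recommends.
buffer = io.StringIO(newline="")
write_listings_csv(sample_listings, buffer)
csv_text = buffer.getvalue()
```

Job descriptions usually contain commas and line breaks, so it is safer to let the csv module handle quoting than to join fields by hand.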

Collection of unstructured data

This script collects unstructured data from the job descriptions provided in job listings.
With some programming knowledge, the script can be modified to work for job listing sites with similar layouts.
The output data can then be analysed and visualised to generate useful insights.

Disclosures

This repository is intended for people with some programming experience, who can improve on the script and/or incorporate it into their own data science pipelines.
The script has been tested and verified to work for target job sizes below 2000, spanning more than 10 pages of job listing links.

Prerequisites

Core Library: Beautiful Soup
Please refer to requirements.txt for the full list of requirements.

How it works

HTML parser (Beautiful Soup) extracts job listing links (to individual job listing pages) from result page(s).
HTML parser extracts information from individual job listing pages.
Loop conditions control the 'movement' from job listing page-to-page.
Loop conditions control the 'movement' from result page-to-page.
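
The four steps above can be sketched as two nested loops that stop once the target job size is reached. This is only an outline under stated assumptions: scrape_jobs, get_listing_links, parse_listing and max_pages are hypothetical names, and the two callables stand in for the Beautiful Soup parsing steps, which are not reproduced here.

```python
def scrape_jobs(get_listing_links, parse_listing, target_job_size, max_pages=100):
    """Collect up to target_job_size listings by walking result pages in order.

    get_listing_links(page_number) -> list of listing links on that result page
    parse_listing(link) -> dict of fields extracted from one listing page
    Both callables are hypothetical stand-ins for the HTML-parsing steps.
    """
    results = []
    page_number = 1
    # Outer loop: move from result page to result page.
    while len(results) < target_job_size and page_number <= max_pages:
        links = get_listing_links(page_number)
        if not links:
            break  # no more result pages to visit
        # Inner loop: move from job listing page to job listing page.
        for link in links:
            if len(results) >= target_job_size:
                break
            results.append(parse_listing(link))
        page_number += 1
    return results


# Fake data standing in for real pages: 3 result pages with 2 links each.
fake_pages = {
    1: ["/job/a", "/job/b"],
    2: ["/job/c", "/job/d"],
    3: ["/job/e", "/job/f"],
}

jobs = scrape_jobs(
    get_listing_links=lambda page: fake_pages.get(page, []),
    parse_listing=lambda link: {"job_link": link},
    target_job_size=5,
)
```

Passing the parsing steps in as functions keeps the loop conditions testable without any network access, and makes it easier to swap in parsers for a site with a different layout.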

Definitions

Running tests and deployment

The original config.json file has been set up to run tests.

output_sample.csv contains the expected results from tests.
Run this command to install prerequisites: pip install -r requirements.txt
Run this command to execute the script: python main.py
Verify that the resulting output file matches output_sample.csv.
Modify the config.json file as necessary for deployment.
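
Before deploying with a modified configuration, it can help to validate it up front. The sketch below loads a JSON configuration and checks two settings; the key names base_url and target_job_size, the helper load_config and the example values are assumptions for illustration, so check the configuration file in the repository for the real keys.

```python
import json

# Key names are assumptions; the repository's configuration file may differ.
REQUIRED_KEYS = ("base_url", "target_job_size")


def load_config(text):
    """Parse JSON configuration text and check the assumed required keys."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    size = config["target_job_size"]
    if not isinstance(size, int) or size <= 0:
        raise ValueError("target_job_size must be a positive integer")
    return config


# Example configuration text with made-up values.
example_text = '{"base_url": "https://example.com/jobs", "target_job_size": 50}'
config = load_config(example_text)
```

To read from a file instead, pass the file's contents, for example load_config(open(path).read()) with the path to the configuration file.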
The following gif shows how a base_url can be obtained.

Built With

Future work

There are plans to create a data processing pipeline that analyses and visualises the extracted data to generate useful insights. Feel free to collaborate and contribute to this project, or open an issue to suggest more useful features for implementation.

Author

Kelvin Tan Xuan De

