A Python-based web scraper for extracting company information from Welcome to the Jungle (WTJ).
This scraper automates the collection of company profiles from Welcome to the Jungle, including:
- Company names and locations
- Official websites and social media links
- Industry sectors
- Company descriptions and presentations
- Recruitment information
- Additional company insights
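As a rough illustration of the fields listed above, one company profile could be modelled as a small record type. This is only a sketch: the class name `CompanyProfile` and every field name are assumptions made for the example, not the keys the scraper actually emits.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class CompanyProfile:
    """One scraped company. All field names here are hypothetical."""

    name: str
    location: str = ""
    website: str = ""
    url: str = ""  # profile page on WTJ
    sectors: list[str] = field(default_factory=list)
    social_links: dict[str, str] = field(default_factory=dict)
    description: str = ""
    presentation: str = ""
    recruitment: str = ""
    insights: str = ""


# Build an example record and convert it to a plain dict,
# which is convenient for JSON serialisation later on.
example = CompanyProfile(
    name="Example Co",
    location="Paris",
    sectors=["SaaS"],
    social_links={"linkedin": "https://www.linkedin.com/company/example"},
)
record = asdict(example)
```

Fields that are missing on a given profile simply keep their empty defaults, so every record has the same shape.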
- Clone the repository:
    git clone https://github.com/zaidkx7/WTJ_Scrapper.git
    cd WTJ_Scrapper
- Create and activate a virtual environment (the activation command shown is for Windows; on macOS or Linux use source .venv/bin/activate):
    python -m venv .venv
    .venv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt

To start the scraper, run the main script:
    python main.py

The scraper will:
- Fetch company data from WTJ
- Save the results in two formats:
  - JSON file (response/data.json)
  - Excel file (response/companies_info.xlsx)
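The JSON half of the save step might look roughly like the sketch below. It uses only the standard library; the function name `save_json` and its parameters are assumptions for illustration and are not taken from the project's code.

```python
import json
from pathlib import Path


def save_json(records, out_dir="response", filename="data.json"):
    """Write a list of dict records to <out_dir>/<filename> and return the path.

    Hypothetical helper: the real scraper may structure this differently.
    """
    out_path = Path(out_dir) / filename
    # Create the output directory on first run.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented characters readable in the file.
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return out_path
```

Calling `save_json([{"name": "Example Co"}])` would create `response/data.json` relative to the current working directory.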
The scraper generates two output files in the response directory:
- data.json: Contains all scraped data in JSON format
- companies_info.xlsx: An Excel file with formatted columns containing:
  - Company name
  - Location
  - Website
  - URL
  - Sectors
  - Social media links
  - Description
  - Presentation
  - Recruitment information
  - Additional insights
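One way to produce a sheet with the columns listed above is to flatten each record into a row and append it with openpyxl. The sketch below is an assumption-laden example: the record keys in `KEYS` and the helper names `to_row` and `write_xlsx` are hypothetical, and only the column titles come from this README.

```python
from pathlib import Path

# Column titles as listed in this README.
COLUMNS = [
    "Company name", "Location", "Website", "URL", "Sectors",
    "Social media links", "Description", "Presentation",
    "Recruitment information", "Additional insights",
]

# Hypothetical mapping from column title to record key.
KEYS = dict(zip(COLUMNS, [
    "name", "location", "website", "url", "sectors",
    "social_links", "description", "presentation",
    "recruitment", "insights",
]))


def to_row(record):
    """Flatten one record dict into a list of cell values, in COLUMNS order."""
    row = []
    for column in COLUMNS:
        value = record.get(KEYS[column], "")
        if isinstance(value, dict):
            # e.g. social links: keep only the URLs
            value = ", ".join(str(v) for v in value.values())
        elif isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        row.append(value)
    return row


def write_xlsx(records, path="response/companies_info.xlsx"):
    """Write a header row plus one row per record to an .xlsx file."""
    from openpyxl import Workbook  # third-party; imported only when needed

    Path(path).parent.mkdir(parents=True, exist_ok=True)
    workbook = Workbook()
    sheet = workbook.active
    sheet.append(COLUMNS)
    for record in records:
        sheet.append(to_row(record))
    workbook.save(path)
```

Joining lists and dictionaries into comma-separated strings keeps each company on a single row, at the cost of losing the structure that data.json preserves.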
The scraper requires the following Python packages:
- beautifulsoup4
- openpyxl
- requests
This scraper is for educational purposes only. Please ensure you comply with WTJ's terms of service and robots.txt when using this tool.
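A robots.txt check can be done with Python's standard library before any page is requested. The sketch below parses an inline sample rule set so that it runs without network access; the rules, the user agent string, and the example.com URLs are placeholders, not WTJ's actual policy.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Placeholder rules for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "wtj-scraper-example"  # hypothetical user agent string

parser = build_parser(SAMPLE_RULES)
allowed = parser.can_fetch(USER_AGENT, "https://example.com/companies/acme")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/page")
```

Against a live site, the same parser can instead be pointed at the site's robots.txt with `set_url()` followed by `read()`, and each target URL checked with `can_fetch()` before it is requested.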
This project is licensed under the MIT License - see the LICENSE file for details.