This project contains a Python script to fetch Indonesian administrative division data from the official BPS (Badan Pusat Statistik) API. It traverses the administrative levels (province, regency, district, and village), saves the raw JSON responses, processes them into CSV files, and generates a final SQL dump.
- Fetches data for specified administrative levels.
- Automatically discovers the latest data period (periode).
- Normalizes and processes raw data into a clean, tabular format.
- Generates raw JSON, processed CSV, and SQL outputs.
- Supports concurrent requests for faster data extraction.
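The level-by-level traversal with concurrent requests could look roughly like the sketch below. It is not the script's actual implementation: the `traverse` function, the `Fetcher` signature, the `code` field name, and the `fake_fetch` stand-in are all assumptions made for illustration, and no real BPS endpoint is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Level names follow the README; the traversal logic itself is an assumption.
LEVELS = ["province", "regency", "district", "village"]

Row = Dict[str, str]
# Hypothetical fetcher signature: (level, parent_code) -> list of child rows.
Fetcher = Callable[[str, str], List[Row]]


def traverse(fetch: Fetcher, max_workers: int = 4) -> Dict[str, List[Row]]:
    """Walk the hierarchy one level at a time, fetching each parent's children concurrently."""
    result: Dict[str, List[Row]] = {}
    parents = [""]  # an empty code stands in for the national root
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in LEVELS:
            # pool.map yields results in input order, so the output is deterministic.
            batches = pool.map(lambda code, lvl=level: fetch(lvl, code), parents)
            rows = [row for batch in batches for row in batch]
            result[level] = rows
            # The rows just fetched become the parents of the next level.
            parents = [row["code"] for row in rows]
    return result


def fake_fetch(level: str, parent_code: str) -> List[Row]:
    """Stand-in for a real API call: returns two made-up children per parent."""
    return [
        {"code": f"{parent_code}{i}", "name": f"{level}-{parent_code}{i}"}
        for i in (1, 2)
    ]


if __name__ == "__main__":
    data = traverse(fake_fetch)
    for level in LEVELS:
        print(level, len(data[level]))
```

Passing the fetcher in as a parameter keeps the traversal testable without network access; a real fetcher would wrap the HTTP call and attach the session cookie.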
- Python 3.8+
- Astral uv for environment and package management.
- Create a virtual environment:

      uv venv

- Activate the environment:

      source .venv/bin/activate

- Install dependencies:

      uv pip install -r requirements.txt

To fetch the data, run the fetch_bps_wilayah.py script. You must provide a valid Cookie from a browser session on the BPS website.

    python fetch_bps_wilayah.py --cookie "YOUR_BPS_COOKIE_HERE"

For more options, see the script's help message:

    python fetch_bps_wilayah.py --help

A Makefile is provided for convenience.
- make install: Sets up the virtual environment and installs dependencies.
- make fetch: Fetches the data. You can pass the cookie as a variable: make fetch COOKIE="YOUR_BPS_COOKIE_HERE"
- make clean: Removes all generated data files.
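As a rough illustration of how a `--cookie` value might end up on outgoing requests, the sketch below parses the flag and attaches it as a `Cookie` header. Only the `--cookie` flag comes from this README; the helper names, the `Accept` header, and the placeholder URL are assumptions, and the real script may be structured differently.

```python
import argparse
import urllib.request


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser exposing only the --cookie flag documented above."""
    parser = argparse.ArgumentParser(description="Fetch BPS wilayah data (sketch)")
    parser.add_argument(
        "--cookie",
        required=True,
        help="Cookie header value copied from a browser session on the BPS website",
    )
    return parser


def build_request(url: str, cookie: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the session cookie."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie, "Accept": "application/json"},
    )


# Placeholder URL, not a real BPS endpoint; the request is never sent here.
args = build_parser().parse_args(["--cookie", "session=abc"])
req = build_request("https://example.invalid/wilayah", args.cookie)
print(req.get_header("Cookie"))
```

Because browser session cookies expire, a fetch that suddenly starts failing is often fixed by copying a fresh cookie from the browser.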
The script generates the following outputs in the data/ directory:
- data/raw/bps/<periode>/: Raw JSON responses from the API.
- data/processed/bps/<periode>/: Processed data in CSV format, with a manifest.json file.
- data/sql/bps_wilayah_<periode>.sql: A SQL dump of the final processed data.
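The output layout above can be sketched in a few lines, along with one plausible way to turn processed rows into SQL `INSERT` statements. The directory pattern is taken from this README; the function names, the `wilayah` table, the column names, and the `2024_1` periode value are hypothetical, and the real dump format may differ.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def output_paths(periode: str, root: Path = Path("data")) -> Dict[str, Path]:
    """Return the three output locations for a given periode."""
    return {
        "raw": root / "raw" / "bps" / periode,
        "processed": root / "processed" / "bps" / periode,
        "sql": root / "sql" / f"bps_wilayah_{periode}.sql",
    }


def sql_literal(value: str) -> str:
    """Quote a string for SQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rows_to_inserts(
    table: str, columns: List[str], rows: Iterable[Dict[str, str]]
) -> List[str]:
    """Render one INSERT statement per row (table and columns are hypothetical)."""
    col_list = ", ".join(columns)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(row[c]) for c in columns)
        statements.append(f"INSERT INTO {table} ({col_list}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    paths = output_paths("2024_1")  # made-up periode value
    print(paths["sql"].as_posix())
    for stmt in rows_to_inserts("wilayah", ["code", "name"], [{"code": "11", "name": "ACEH"}]):
        print(stmt)
```

Doubling single quotes matters because some place names contain apostrophes; without it, a single such name would break the generated dump.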