
PropertyGuru Singapore Scraper


Scrape any PropertyGuru Singapore property listing into a formatted Excel file. One command, all results.

  • Cloudflare bypass: stealth browser solves Turnstile challenges automatically
  • All search filters: property type, price, bedrooms, bathrooms, floor size, build year, tenure
  • HDB + Condo: full support for all HDB room types and non-landed property codes
  • Crash-safe: every listing is written to disk immediately; press Ctrl+C and resume any time
  • Ready-made presets: one flag to switch between search profiles

What you get

Each run produces a formatted Excel report:

Column                 Example
listing_id             12345678
title                  Spacious 3BR near Bishan MRT
full_address           123 Bishan Street 12
district_name          Bishan
tenure                 99-year Leasehold
bedrooms / bathrooms   3 / 2
floor_area_sqft        1,076
price_sgd              1,350,000
price_psf              1,254.64
build_year             2015
nearest_mrt            Bishan MRT (0.3 km)
listing_url            https://www.propertyguru.com.sg/...

Quick Start

# 1. Clone and install
git clone https://github.com/kleash/propertyguru-scraper.git
cd propertyguru-scraper
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium

# 2. Run a preset
python run.py --config configs/condo_original.py   # condos ≤ S$2.2M, 3-5 bed, built 2010+
python run.py --config configs/hdb.py              # all HDB flat types
python run.py --config configs/condo.py            # all condos, no filters

# 3. Open your Excel report
#    propertyguru_<filter-tag>.xlsx

Windows: replace source .venv/bin/activate with .venv\Scripts\activate


Installation

Requirements: Python 3.11+

git clone https://github.com/kleash/propertyguru-scraper.git
cd propertyguru-scraper

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
playwright install chromium

Preset Config Files

Three ready-made search profiles are in configs/:

File                        Searches for
configs/condo_original.py   Condos · ≤ S$2.2M · 3/4/5-bed · built 2010 or later
configs/condo.py            All condos · no filters
configs/hdb.py              All HDB flat types · no filters

Copy any preset to create your own search profile:

cp configs/condo.py configs/my_search.py
# edit configs/my_search.py
python run.py --config configs/my_search.py
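
As an illustration, an edited preset might look like the sketch below. It assumes preset files set the same constant names that config.py uses (listed under Configuring Filters); check an existing file in configs/ before relying on that. The file name and every value here are examples only.

```python
# configs/my_search.py (hypothetical example, not a file shipped with the repo)
# Assumption: a preset overrides the same constants that config.py defines.
from __future__ import annotations

PROPERTY_TYPE_GROUP: str = "N"                      # non-landed
PROPERTY_TYPE_CODES: list[str] = ["CONDO", "APT"]   # combine multiple codes

MIN_PRICE: int | None = 800_000
MAX_PRICE: int | None = 1_500_000

BEDROOMS: list[int] | None = [2, 3]
BATHROOMS: list[int] | None = None                  # any

MIN_BUILD_YEAR: int | None = 2005
MAX_BUILD_YEAR: int | None = None                   # no upper bound

TENURE: list[str] | None = ["FH", "L999"]
```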

Configuring Filters

Option A: Edit config.py

config.py holds the default values used when no --config flag is supplied.

# Property type group
# "N" = non-landed (condo/apt)  |  "L" = landed  |  "H" = HDB
PROPERTY_TYPE_GROUP: str = "N"

# Type codes: combine multiple
# Non-landed : CONDO  APT  EXECONDOMINIUM  WALK-UP  CLUSTER
# Landed     : TERR   SEMI  BUNG  DETAC  CORNER  GOOD
# HDB        : 3A 3I 4A 4I 5A 5I EA EM … (full list in config.py)
PROPERTY_TYPE_CODES: list[str] = ["CONDO"]

# Price in SGD: None removes the bound
MIN_PRICE: int | None = None
MAX_PRICE: int | None = 2_000_000

# Bedroom and bathroom counts: None means any
BEDROOMS: list[int] | None = [3, 4, 5]
BATHROOMS: list[int] | None = None

# Floor area in sqft: None means no bound
MIN_FLOOR_SIZE: int | None = None
MAX_FLOOR_SIZE: int | None = None

# TOP (completion) year: None means no bound
MIN_BUILD_YEAR: int | None = 2010
MAX_BUILD_YEAR: int | None = None

# Tenure: None means any; options: "FH"  "L99"  "L999"
TENURE: list[str] | None = None
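
The "None means no bound" convention can be pictured with a small sketch. This is not the URL builder in config.py, whose internals are not shown here; the helper name active_filters is hypothetical and it only demonstrates dropping unset values before a search is built.

```python
from __future__ import annotations


def active_filters(settings: dict[str, object]) -> dict[str, object]:
    """Return only the settings that are set; None means 'no restriction'."""
    return {name: value for name, value in settings.items() if value is not None}


# Same values as a subset of the config.py defaults shown above.
settings = {
    "MIN_PRICE": None,
    "MAX_PRICE": 2_000_000,
    "BEDROOMS": [3, 4, 5],
    "BATHROOMS": None,
    "MIN_BUILD_YEAR": 2010,
    "MAX_BUILD_YEAR": None,
    "TENURE": None,
}

print(active_filters(settings))
# {'MAX_PRICE': 2000000, 'BEDROOMS': [3, 4, 5], 'MIN_BUILD_YEAR': 2010}
```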

Option B: Pass filters on the command line

# Freehold condos, 3-bed, built 2015+, under S$1.8M
python run.py \
  --tenure FH \
  --bedrooms 3 \
  --min-build-year 2015 \
  --max-price 1800000

# HDB 4-room and 5-room, S$400k–S$700k
python run.py \
  --config configs/hdb.py \
  --property-types 4A 4I 4NG 4PA 4S 4STD 5A 5I 5PA 5S \
  --min-price 400000 \
  --max-price 700000

# Remove the upper price limit entirely
python run.py --max-price ""

# Load a preset, then override one value
python run.py --config configs/hdb.py --max-price 600000
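
One way to get this behaviour (a preset supplies values, a flag overrides it, and "" removes a bound) is sketched below with argparse. It is an assumption about how such a CLI could be written, not run.py's actual code; resolve, optional_int and the preset dictionary are hypothetical names.

```python
from __future__ import annotations

import argparse

_UNSET = object()  # sentinel meaning "this flag was not passed at all"


def optional_int(text: str) -> int | None:
    """Parse a price bound; an empty string means 'remove this bound'."""
    return None if text == "" else int(text)


def resolve(preset: dict[str, int | None], argv: list[str]) -> dict[str, int | None]:
    """Start from preset values and apply only the flags that were given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--min-price", type=optional_int, default=_UNSET)
    parser.add_argument("--max-price", type=optional_int, default=_UNSET)
    args = parser.parse_args(argv)

    merged = dict(preset)
    for key, value in (("MIN_PRICE", args.min_price), ("MAX_PRICE", args.max_price)):
        if value is not _UNSET:  # None is a real value here: "no bound"
            merged[key] = value
    return merged


preset = {"MIN_PRICE": None, "MAX_PRICE": 2_000_000}
print(resolve(preset, ["--max-price", "600000"]))  # {'MIN_PRICE': None, 'MAX_PRICE': 600000}
print(resolve(preset, ["--max-price", ""]))        # {'MIN_PRICE': None, 'MAX_PRICE': None}
print(resolve(preset, []))                         # {'MIN_PRICE': None, 'MAX_PRICE': 2000000}
```

The sentinel matters because None already has a meaning (no bound), so it cannot also stand for "flag omitted".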

All CLI Options

--config FILE           Load a preset config file (e.g. configs/hdb.py).
                        Any flag below will override the preset.

--listing-type          sale | rent                          (default: sale)
--property-group        N | L | H                            (default: N)
--property-types CODE…  One or more type codes               (default: CONDO)

--min-price SGD         Lower price bound. "" = no bound
--max-price SGD         Upper price bound. "" = no bound

--bedrooms N…           e.g. --bedrooms 3 4 5
--bathrooms N…          e.g. --bathrooms 2 3

--min-floor-size SQFT   Minimum floor area
--max-floor-size SQFT   Maximum floor area

--min-build-year YEAR   Earliest TOP / completion year
--max-build-year YEAR   Latest TOP / completion year

--tenure CODE…          FH  L99  L999  (omit = any tenure)

Pausing and Resuming

Press Ctrl+C at any time. Every listing is already saved to disk.

Re-run the exact same command to pick up where you left off:

python run.py --config configs/condo_original.py   # interrupted at page 35
python run.py --config configs/condo_original.py   # resumes from page 35
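
The general idea behind a crash-safe log can be sketched as follows: append each listing as one JSON line and flush it to disk, then on restart read the log back to see what is already saved. This is a minimal illustration, not the spider's real implementation (which also keeps a crawl_data checkpoint directory); load_seen_ids and append_listing are hypothetical helpers.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def load_seen_ids(log_path: Path) -> set:
    """Collect listing IDs already present in an append-only JSONL log."""
    seen = set()
    if log_path.exists():
        with log_path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    seen.add(json.loads(line)["listing_id"])
                except (json.JSONDecodeError, KeyError, TypeError):
                    # An interrupted write can leave a damaged line; skip it.
                    continue
    return seen


def append_listing(log_path: Path, item: dict) -> None:
    """Append one listing and force it to disk before returning."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")
        f.flush()
        os.fsync(f.fileno())
```

A resumed run would call load_seen_ids once at startup and skip any listing whose ID is already in the returned set.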

Merging Multiple Runs

For large result sets, scrape in price bands to avoid long Cloudflare sessions:

python run.py --config configs/condo_original.py --max-price 2000000
python run.py --config configs/condo_original.py --min-price 2000001 --max-price 2500000
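
For more than two bands, the commands can be generated instead of typed. The sketch below is a convenience script under stated assumptions: price_bands is a hypothetical helper, and it only prints commands using the --min-price and --max-price flags documented above.

```python
from __future__ import annotations


def price_bands(edges: list[int]) -> list[tuple[int, int]]:
    """Turn ascending upper bounds into non-overlapping (min, max) bands.

    The first band starts at 0; each later band starts 1 above the previous max,
    so no listing price can fall into two bands.
    """
    bands = []
    lower = 0
    for upper in edges:
        if upper < lower:
            raise ValueError("edges must be strictly ascending")
        bands.append((lower, upper))
        lower = upper + 1
    return bands


for lo, hi in price_bands([2_000_000, 2_500_000]):
    flags = f"--max-price {hi}" if lo == 0 else f"--min-price {lo} --max-price {hi}"
    print(f"python run.py --config configs/condo_original.py {flags}")
```

With the two edges shown, this prints the same two commands as above.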

Then merge into one report:

import json

import exporter  # the project's exporter.py (Excel generation)

files = [
    "listings_CONDO_price0-2000000_bed3_4_5_yr2010-any.jsonl",
    "listings_CONDO_price2000001-2500000_bed3_4_5_yr2010-any.jsonl",
]

# Keep the first occurrence of each listing_id across all input files.
rows, seen = [], set()
for path in files:
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            item = json.loads(line)
            if item["listing_id"] not in seen:
                rows.append(item)
                seen.add(item["listing_id"])

exporter.export(rows, "merged.xlsx")
print(f"{len(rows)} unique listings")

Output Files

Each run creates files tagged with the active filter combination:

propertyguru_CONDO_price0-2000000_bed3_4_5_yr2010-any.xlsx    ← formatted Excel
listings_CONDO_price0-2000000_bed3_4_5_yr2010-any.jsonl       ← raw data (crash-safe log)
crawl_data_CONDO_price0-2000000_bed3_4_5_yr2010-any/          ← resume checkpoint
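
The tag format can be read off the file names above. The sketch below reconstructs it from those examples only; it is a guess at the naming scheme, not the project's code. In particular, how an unset maximum price, unset bedrooms, or several type codes are rendered is an assumption, and filter_tag is a hypothetical name.

```python
from __future__ import annotations


def _bound(value: int | None, missing: str) -> str:
    """Render a bound, substituting a placeholder when it is unset."""
    return missing if value is None else str(value)


def filter_tag(
    type_codes: list[str],
    min_price: int | None,
    max_price: int | None,
    bedrooms: list[int] | None,
    min_year: int | None,
    max_year: int | None,
) -> str:
    """Build a tag such as CONDO_price0-2000000_bed3_4_5_yr2010-any."""
    parts = ["_".join(type_codes)]  # assumption: multiple codes joined by "_"
    # The examples show an unset minimum price as 0; "any" for an unset
    # maximum is assumed by analogy with the year range.
    parts.append(f"price{_bound(min_price, '0')}-{_bound(max_price, 'any')}")
    if bedrooms:  # assumption: the bed segment is omitted when unset
        parts.append("bed" + "_".join(str(n) for n in bedrooms))
    parts.append(f"yr{_bound(min_year, 'any')}-{_bound(max_year, 'any')}")
    return "_".join(parts)


tag = filter_tag(["CONDO"], None, 2_000_000, [3, 4, 5], 2010, None)
print(f"propertyguru_{tag}.xlsx")
# propertyguru_CONDO_price0-2000000_bed3_4_5_yr2010-any.xlsx
```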

Troubleshooting

"No Cloudflare challenge found" in logs: This is normal. It means the page loaded without a Turnstile widget, and the scraper continues.

Spider pauses mid-run: The spider retries automatically (15s → 30s → 60s) before pausing. Re-run the same command. If it keeps getting blocked, wait 5–10 minutes first.

Stale checkpoint after changing filters: Delete the old checkpoint before re-running with different filters:

rm -rf crawl_data_CONDO_price0-2000000_bed3_4_5_yr2010-any/

playwright install fails: Make sure you are inside the virtual environment when running playwright install chromium.
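
The retry schedule mentioned under "Spider pauses mid-run" follows a doubling backoff pattern. A generic sketch of that pattern is shown below; it is not the spider's own retry code, and fetch_with_backoff, its parameters, and the choice of one final attempt after the last wait are assumptions.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    delays: tuple[float, ...] = (15, 30, 60),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each failure wait 15s, 30s, then 60s and try again."""
    for delay in delays:
        try:
            return fetch()
        except Exception:
            sleep(delay)  # back off before the next attempt
    # Final attempt after the last wait; an exception here reaches the caller,
    # which is the point where a scraper would pause.
    return fetch()
```

Passing sleep as a parameter keeps the function testable without real waiting.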


Running Tests

# Offline tests: fast, no network needed
pytest tests/test_pipeline.py -k "excel" -v

# Full integration suite: live network, ~7 minutes
pytest tests/ -v

Project Structure

├── run.py              CLI entry point
├── spider.py           Async spider (Scrapling Spider class)
├── config.py           All filter constants + URL builder
├── parsers.py          Pure HTML/JSON parsing (no I/O)
├── exporter.py         Excel generation
├── configs/
│   ├── condo_original.py   Condos ≤2.2M · 3-5 bed · 2010+
│   ├── condo.py            All condos · no filters
│   └── hdb.py              All HDB types · no filters
├── tests/
│   ├── conftest.py         Shared browser fixture
│   ├── test_filters.py     47 live filter tests
│   └── test_pipeline.py    End-to-end pipeline + offline Excel tests
└── v1/                 Original V1 scraper (linear, kept for reference)

Tech Stack

Library                   Purpose
Scrapling                 Stealth browser with Cloudflare bypass
Playwright / Patchright   Browser automation
openpyxl                  Excel generation
pytest                    Test framework

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

  • Bug? Open an issue with the log output and blocked page HTML
  • Filter broken? PropertyGuru updates their URL parameters occasionally, so PRs are appreciated
  • New feature? Open an issue first to discuss before building

Legal

This tool is for personal research and market analysis only. Use responsibly and in accordance with PropertyGuru's Terms of Service. The authors are not responsible for how this software is used.


License

MIT © 2026 kleash

About

🏠 Scrape PropertyGuru Singapore listings into Excel. Cloudflare bypass, all filters (Condo + HDB), pause/resume, 58 integration tests.
