Find the closest bike share station to you, anywhere in the world.
Scrapes GBFS (General Bikeshare Feed Specification) feeds from 1,200+ systems across 50+ countries.
pnpm install
Downloads the master list of all GBFS systems from MobilityData/gbfs/systems.csv and writes it to data/systems.json:
pnpm fetch-systems
Output includes a summary:
Fetching GBFS systems catalog…
Found 1245 systems
Top 10 countries:
DE: 217
US: 172
FR: 139
...
Reads data/systems.json, resolves a system's auto-discovery endpoint, and fetches its station/vehicle data. This includes static info (locations, capacity) and a snapshot of current availability.
# Search for systems
npx tsx src/scripts/fetch-station.ts --list "paris"
npx tsx src/scripts/fetch-station.ts --list "us"
# Fetch by index (from --list output)
npx tsx src/scripts/fetch-station.ts 753
# Fetch by system_id
npx tsx src/scripts/fetch-station.ts dublin
# Fetch by name (substring match)
npx tsx src/scripts/fetch-station.ts "Citi Bike"
Output is written to data/stations/<system_id>.json and includes:
- station_information: station locations (lat/lon), names, capacity
- station_status: real-time availability per station
- free_bike_status: dockless vehicle locations (lat/lon)
Re-fetches only the real-time availability feeds (station_status + free_bike_status/vehicle_status) using the cached discovery URLs from a previous fetch-station run. This is much faster than a full fetch because it skips discovery and station_information.
# List systems that have been fetched
npx tsx src/scripts/fetch-availability.ts --list
npx tsx src/scripts/fetch-availability.ts --list "dub"
# Refresh by system_id
npx tsx src/scripts/fetch-availability.ts dublin
# Refresh by name
npx tsx src/scripts/fetch-availability.ts "Citi Bike"
# Refresh by index
npx tsx src/scripts/fetch-availability.ts 0
Output is written to data/availability/<system_id>.json.
pnpm fetch-systems # once: get all 1,200+ systems
npx tsx src/scripts/fetch-station.ts "Citi Bike" # once: get stations + first snapshot
npx tsx src/scripts/fetch-availability.ts "Citi Bike" # repeat: refresh availability
Reads all fetched station data from data/stations/*.json, extracts every geo point (stations + free-floating vehicles), and builds a two-tier spatial index of binary kdbush tiles.
pnpm build-tiles
# Custom target size (default 5000 points per box)
pnpm build-tiles -- --target-size 3000
Output:
- data/tiles/box-index.bin: routing index (~5 KB, KDBush of ~120 box centers)
- data/tiles/box-NNN.bin: one data tile per box (~340 KB each, ~5000 points)
The tiling groups points into bounding boxes:
- Points are grouped by GBFS system first, keeping systems together in boxes
- Large systems (>6500 points) are recursively split along the median latitude or longitude
- Small nearby systems are greedily merged into the same box (within 500 km)
- Bounding boxes are expanded by 10% for overlap, with closest-center tiebreaking
- Result: ~120 boxes covering 598K+ stations and vehicles worldwide
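The sketch below illustrates the overlap step. It pads a bounding box and tests whether a point falls inside it. The BBox field names are assumptions, as is the choice to pad each side by 10% of the box's span (the text only says boxes grow by 10%). Boxes that cross the antimeridian are not handled.

```typescript
// Hypothetical bbox shape; the project's actual field names may differ.
interface BBox {
  minLon: number;
  minLat: number;
  maxLon: number;
  maxLat: number;
}

// Pad each side by `ratio` of the box's span (ASSUMPTION: 10% per side),
// clamping latitude to the valid range. Antimeridian-crossing boxes are not handled.
function expandBbox(b: BBox, ratio = 0.1): BBox {
  const dLon = (b.maxLon - b.minLon) * ratio;
  const dLat = (b.maxLat - b.minLat) * ratio;
  return {
    minLon: b.minLon - dLon,
    minLat: Math.max(-90, b.minLat - dLat),
    maxLon: b.maxLon + dLon,
    maxLat: Math.min(90, b.maxLat + dLat),
  };
}

function contains(b: BBox, lon: number, lat: number): boolean {
  return lon >= b.minLon && lon <= b.maxLon && lat >= b.minLat && lat <= b.maxLat;
}

const original: BBox = { minLon: 2.2, minLat: 48.8, maxLon: 2.5, maxLat: 48.9 };
const padded = expandBbox(original);
console.log(contains(original, 2.19, 48.85), contains(padded, 2.19, 48.85)); // false true
```

A point just outside the original box lands inside the padded one. This is what creates the overlap zones where closest-center tiebreaking applies.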
Both the box-index and data tiles use the same self-contained binary format:
┌──────────────────────────────────────────────────────────┐
│ HEADER (12 bytes)                                        │
│   magic: 0x4742 ("GB")            [2 bytes]              │
│   version: 1                      [2 bytes]              │
│   point_count: N                  [4 bytes]              │
│   metadata_offset: M              [4 bytes]              │
├──────────────────────────────────────────────────────────┤
│ KDBUSH INDEX (variable)                                  │
│   Raw kdbush ArrayBuffer, zero-copy restore              │
│   with KDBush.from()                                     │
├──────────────────────────────────────────────────────────┤
│ METADATA (starts at byte M)                              │
│   JSON-encoded metadata (UTF-8)                          │
│                                                          │
│   Box-index tiles:                                       │
│     BoxIndexMeta[]: { box, bbox, n }                     │
│                                                          │
│   Data tiles (compact string-table format):              │
│     { systems: ["velib", ...],                           │
│       types: ["station", "vehicle"],                     │
│       points: [{ i, s, t, name, cap? }, ...] }           │
│     s/t are indices into the systems/types tables        │
│     i is a sequential integer (point identity)           │
└──────────────────────────────────────────────────────────┘
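Here is a minimal sketch of writing and reading this layout with DataView. It assumes little-endian fields, because the diagram does not state a byte order. encodeTile and decodeTile are hypothetical names. The sketch treats the index section as opaque bytes, whereas the real tiles hold a kdbush buffer there.

```typescript
// Header layout from the diagram; little-endian byte order is an ASSUMPTION.
const MAGIC = 0x4742; // "GB"
const VERSION = 1;
const HEADER_BYTES = 12; // 2 + 2 + 4 + 4

interface TileHeader {
  magic: number;
  version: number;
  pointCount: number;
  metadataOffset: number;
}

// Hypothetical encoder: header, then the index bytes, then UTF-8 JSON metadata.
function encodeTile(index: ArrayBuffer, pointCount: number, metadata: unknown): ArrayBuffer {
  const meta = new TextEncoder().encode(JSON.stringify(metadata));
  const metadataOffset = HEADER_BYTES + index.byteLength;
  const out = new ArrayBuffer(metadataOffset + meta.byteLength);
  const view = new DataView(out);
  view.setUint16(0, MAGIC, true);
  view.setUint16(2, VERSION, true);
  view.setUint32(4, pointCount, true);
  view.setUint32(8, metadataOffset, true);
  const bytes = new Uint8Array(out);
  bytes.set(new Uint8Array(index), HEADER_BYTES);
  bytes.set(meta, metadataOffset);
  return out;
}

// Hypothetical decoder: checks the magic and splits the buffer into its three sections.
function decodeTile(buf: ArrayBuffer): { header: TileHeader; index: ArrayBuffer; metadata: unknown } {
  const view = new DataView(buf);
  const header: TileHeader = {
    magic: view.getUint16(0, true),
    version: view.getUint16(2, true),
    pointCount: view.getUint32(4, true),
    metadataOffset: view.getUint32(8, true),
  };
  if (header.magic !== MAGIC) throw new Error("not a GB tile");
  // slice() copies the index bytes; this sketch does not attempt a zero-copy path.
  const index = buf.slice(HEADER_BYTES, header.metadataOffset);
  const json = new TextDecoder().decode(new Uint8Array(buf, header.metadataOffset));
  return { header, index, metadata: JSON.parse(json) };
}

const idx = new ArrayBuffer(4);
new Uint8Array(idx).set([1, 2, 3, 4]);
const tile = encodeTile(idx, 2, { systems: ["velib"] });
console.log(decodeTile(tile).header.metadataOffset); // 16
```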
Given a latitude and longitude, finds the closest bike stations and vehicles using the two-tier KDBush index and geokdbush (haversine-aware kNN).
# Find 5 nearest (default)
pnpm query-nearest -- 48.8566 2.3522
# Find 10 nearest
pnpm query-nearest -- 48.8566 2.3522 --k 10
Output (default k=5):
Querying nearest to (48.8566, 2.3522) with k=5…
Loaded box-index: 120 boxes
# Distance System Name Type Cap
──────────────────────────────────────────────────────────────────────────────────────────
1 80 m velib-paris Rue de Rivoli - Châtelet station 34
2 120 m dott-paris (dockless: #47) vehicle -
3 150 m velib-paris Place du Châtelet station 28
4 190 m lime-paris (dockless: #102) vehicle -
5 230 m velib-paris Quai de la Mégisserie station 42
The query loads exactly one data tile per request:
- Loads the box-index (~5 KB, cached) and finds the 5 nearest box centers
- Picks the box whose bounding box contains the query point (closest-center tiebreaking)
- Loads that single box tile and runs kNN within it
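The selection step can be sketched as follows. The sketch assumes the candidates arrive sorted nearest-first, which matches geokdbush's around() returning results in distance order. pickBox and BoxMeta are illustrative names. The fallback to the nearest center when no box contains the point is an assumption, since the text does not say what happens in that case.

```typescript
interface BBox { minLon: number; minLat: number; maxLon: number; maxLat: number }
// Modeled on BoxIndexMeta { box, bbox, n }; field types are assumptions.
interface BoxMeta { box: string; bbox: BBox; n: number }

// `candidates`: the nearest box centers, ASSUMED sorted nearest-first.
// The first candidate whose bbox contains the point wins, which gives the
// closest-center tiebreak when overlapping boxes both contain it.
function pickBox(candidates: BoxMeta[], lon: number, lat: number): BoxMeta | undefined {
  const inside = (b: BBox) =>
    lon >= b.minLon && lon <= b.maxLon && lat >= b.minLat && lat <= b.maxLat;
  // Fallback (ASSUMPTION): nearest center when no candidate contains the point.
  return candidates.find((c) => inside(c.bbox)) ?? candidates[0];
}

const city: BoxMeta = { box: "box-002", bbox: { minLon: 2.2, minLat: 48.8, maxLon: 2.5, maxLat: 48.95 }, n: 4800 };
const region: BoxMeta = { box: "box-001", bbox: { minLon: 1.0, minLat: 48.0, maxLon: 4.0, maxLat: 50.0 }, n: 5100 };
console.log(pickBox([city, region], 2.3522, 48.8566)?.box); // box-002
console.log(pickBox([city, region], 3.5, 49.5)?.box); // box-001
```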
pnpm fetch-systems # once: get all 1,200+ systems
npx tsx src/scripts/fetch-station.ts 0 # fetch systems one by one (or batch)
pnpm build-tiles # build geo-index from all fetched data
pnpm query-nearest -- 48.8566 2.3522 # find nearest bikes to central Paris
The ingest pipeline automates fetching, scheduling, and compacting GBFS data across all 1,200+ systems. It replaces manual per-system fetching with a continuous batch process that tracks changes and stores time-series snapshots.
Data flow:
pnpm ingest (batch fetcher)
  ├─ data/stations/{id}.json                           latest cached station data
  ├─ data/snapshots/YYYY-MM-DD/HH-MM/{id}.{type}.json  time-series snapshots
  └─ data/ingest.db                                    fetch logs + scheduling state
pnpm compact
  └─ data/parquet/availability/{date}.parquet          compressed history
pnpm rebuild-index
  └─ data/tiles/box-index.bin + box-NNN.bin            spatial index
Each system is polled on a schedule with three interval tiers: base (300 s), rush (120 s), and quiet (900 s). Feeds are hashed with SHA-256 and written only when their content changes. Failing systems are retried with exponential backoff.
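The sketch below shows the interval and change-detection logic under stated assumptions. The three tier values come from the text. The doubling backoff, its 6-hour cap, and the function names are hypothetical. The code uses Node's node:crypto; a Worker could use crypto.subtle.digest instead.

```typescript
import { createHash } from "node:crypto";

type Tier = "base" | "rush" | "quiet";

// Interval tiers from the text. How a system is assigned to a tier is not
// described here, so callers pass the tier in.
const TIER_SECONDS: Record<Tier, number> = { base: 300, rush: 120, quiet: 900 };

// ASSUMPTION: double the interval per consecutive failure, capped at 6 hours.
const MAX_BACKOFF_SECONDS = 6 * 60 * 60;

function nextIntervalSeconds(tier: Tier, consecutiveFailures: number): number {
  return Math.min(TIER_SECONDS[tier] * 2 ** consecutiveFailures, MAX_BACKOFF_SECONDS);
}

function sha256Hex(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Change detection: write only when the feed's hash differs from the stored one.
function shouldWrite(body: string, previousHash?: string): { write: boolean; hash: string } {
  const hash = sha256Hex(body);
  return { write: hash !== previousHash, hash };
}

const first = shouldWrite('{"data":{"stations":[]}}');
const again = shouldWrite('{"data":{"stations":[]}}', first.hash);
console.log(first.write, again.write); // true false
```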
See operations.md for how to run the ingest pipeline locally and monitor it.
- fetch-all-stations.bash: fetches station data for all ~1,700 systems in parallel (10 concurrent jobs via xargs)
- fetch-all-availability.bash: sequentially refreshes availability for all fetched systems
- scripts/analyze-station-density.py: reports station/vehicle counts per system, bounding boxes, and geohash cell distribution
- scripts/identify-hotspots.py: maps the top 40 densest geohash-4 cells to their contributing systems
The entire system (ingest, tile building, and serving) runs on Cloudflare Workers; no external VPS is needed.
- operations.md: running locally, monitoring, troubleshooting
- deploy.md: deployment instructions, KV layout, architecture
| Route | Description |
|---|---|
| GET / | Landing page with geolocation; redirects to /nearby |
| GET /nearby?lat=&lon= | Server-rendered HTML page showing nearest bikes |
| GET /nearest?lat=&lon=&k= | JSON API: nearest bikes (default k=5) |
| GET /systems | Return cached systems catalog from KV |
| GET /systems/refresh | Re-fetch systems.csv from GitHub, store in KV |
| GET /systems/status?format=html | Systems directory: every system with stats |
| GET /station/:system_id | Fetch full station data (cached 24h in KV) |
| GET /availability/:system_id | Refresh just availability using cached discovery |
| GET /ingest/init | Seed scheduling state for all systems in KV |
| GET /ingest/status?format=html&n=30 | Ingest dashboard: JSON (default) or HTML; n sets the number of rows |
| GET /ingest/run?sync=true&limit=N | Fetch N due systems inline (local dev) |
| GET /planner/run?sync=true | Run planner + build tiles inline (local dev) |
| GET /api | List available API routes |
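For scripted access to the JSON API, a small client along these lines should work. The base URL is a placeholder. The response shape of /nearest is not documented here, so the sketch leaves it as unknown.

```typescript
// Build a /nearest request URL. The base URL passed in is a placeholder.
function nearestUrl(base: string, lat: number, lon: number, k = 5): string {
  const url = new URL("/nearest", base);
  url.search = new URLSearchParams({ lat: String(lat), lon: String(lon), k: String(k) }).toString();
  return url.toString();
}

// Response shape is not documented in this README, so it is returned as unknown.
async function fetchNearest(base: string, lat: number, lon: number, k = 5): Promise<unknown> {
  const res = await fetch(nearestUrl(base, lat, lon, k));
  if (!res.ok) throw new Error(`GET /nearest failed with status ${res.status}`);
  return res.json();
}

console.log(nearestUrl("https://bikes.example.com", 48.8566, 2.3522, 10));
// https://bikes.example.com/nearest?lat=48.8566&lon=2.3522&k=10
```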
pnpm test # single run
pnpm test:watch # watch mode
Tests use fixture files in src/test/fixtures/ with a mock fetch, so they make no network calls.
src/
types/gbfs.ts # TypeScript types for GBFS feeds + geo-index
lib/
gbfs-fetch.ts # Shared fetch helpers (fetchJson, findFeedUrl)
fetch-systems-catalog.ts # Pure fn: systems.csv → SystemsCatalog
fetch-station-data.ts # Pure fn: system → station/vehicle data
fetch-availability.ts # Pure fn: cached discovery → fresh availability
box-assign.ts # Pure fn: assign points to bounding boxes
box-assign-meta.ts # Pure fn: metadata-based box assignment (for Workers)
geo-tile.ts # Pure fn: tile building + binary serialization
geo-query.ts # Pure fn: two-tier kNN query with box routing
content-hash.ts # Pure fn: content hashing for change detection
kv-scheduling.ts # KV-based scheduling state (for Workers)
ingest-db.ts # SQLite metadata DB for CLI ingest pipeline
ingest-scheduler.ts # Batch ingest scheduling logic
compact-parquet.ts # Parquet compaction for historical data
scripts/
fetch-systems.ts # CLI: fetches catalog → data/systems.json
fetch-station.ts # CLI: fetches station data → data/stations/
fetch-availability.ts # CLI: refreshes availability → data/availability/
build-tiles.ts # CLI: builds geo-index tiles → data/tiles/
query-nearest.ts # CLI: queries nearest bikes from tiles
ingest.ts # CLI: batch ingest with scheduling
ingest-init.ts # CLI: initialize ingest database
ingest-status.ts # CLI: show ingest status
compact.ts # CLI: compact snapshots to parquet
rebuild-index.ts # CLI: rebuild index with change detection
test/
fixtures/ # Small example JSON/CSV for tests
fetch-systems-catalog.test.ts
fetch-station-data.test.ts
fetch-availability.test.ts
box-assign.test.ts
box-assign-meta.test.ts
geo-tile.test.ts
geo-query.test.ts
content-hash.test.ts
kv-scheduling.test.ts
ingest-db.test.ts
ingest-scheduler.test.ts
compact-parquet.test.ts
worker/
index.ts # Cloudflare Worker (routes + cron + queues)
ingest.ts # Queue consumer: per-system GBFS fetches
planner.ts # Box assignment planner (metadata-based)
tile-builder.ts # Queue consumer: per-box tile building
pages.ts # Server-rendered HTML pages
index.test.ts # Worker tests with mock KV
tsconfig.json # Worker-specific TS config
wrangler.toml # Wrangler configuration (KV + Queues + crons)
deploy.md # Full deployment guide
data/ # Fetched data (gitignored)
stations/ # Per-system station data JSON files
availability/ # Per-system availability JSON files
tiles/ # Geo-index tiles (box-index.bin + box-NNN.bin)
The library functions in src/lib/ are pure: they take an optional fetch parameter and do no file I/O. This makes them callable from:
- Local CLI scripts (src/scripts/): use Node's global fetch, write to disk
- Cloudflare Workers: pass the Worker's fetch binding, store in KV/R2/D1
- Unit tests: pass a mock fetch, assert on returned objects
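The injectable-fetch pattern looks roughly like this. fetchJsonWith and FetchLike are hypothetical names chosen for illustration. The project's own helper is fetchJson in gbfs-fetch.ts, and its signature may differ. The mock serves in-memory fixtures the way a unit test would.

```typescript
// Structural subset of the Fetch API that the helper needs (hypothetical type).
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Illustrative pure helper: no file I/O, and fetch is injectable.
// Defaults to the global fetch (Node 18+ and Workers both provide one).
async function fetchJsonWith(url: string, fetchImpl: FetchLike = (u) => fetch(u)): Promise<unknown> {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

// A unit test can inject a mock that serves in-memory fixtures instead of the network.
const fixtures: Record<string, unknown> = {
  "https://example.com/gbfs.json": { data: { feeds: [] } },
};
const mockFetch: FetchLike = async (url) => ({
  ok: url in fixtures,
  status: url in fixtures ? 200 : 404,
  json: async () => fixtures[url],
});
```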
systems.csv ──→ gbfs.json ──→ station_information.json (static, fetch once)
(master list)   (per system)  station_status.json      (real-time, refresh)
                              free_bike_status.json    (real-time, refresh)
The scraper handles both GBFS v2.x (feeds nested under language keys like data.en.feeds) and v3.0 (flat data.feeds array), and the v3 rename of free_bike_status to vehicle_status.
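Here is a sketch of feed lookup that tolerates both layouts and the rename. listFeeds and findFeed are hypothetical helpers, not the project's findFeedUrl. For v2.x files, the sketch tries "en" first and then falls back to the first available language; that order is an assumption.

```typescript
interface Feed { name: string; url: string }
// Minimal discovery shape covering both layouts; real gbfs.json files carry more fields.
interface Discovery { data: { feeds?: Feed[] } & Record<string, unknown> }

// v3.0 puts feeds at data.feeds; v2.x nests them under a language key (data.en.feeds).
function listFeeds(discovery: Discovery, preferredLang = "en"): Feed[] {
  const { data } = discovery;
  if (Array.isArray(data.feeds)) return data.feeds;
  const langs = Object.keys(data);
  const lang = langs.includes(preferredLang) ? preferredLang : langs[0];
  if (lang === undefined) return [];
  const entry = data[lang] as { feeds?: Feed[] } | undefined;
  return entry?.feeds ?? [];
}

// Return the URL of the first feed matching any of the given names, so callers can
// pass both "vehicle_status" (v3) and "free_bike_status" (v2.x).
function findFeed(discovery: Discovery, names: string[]): string | undefined {
  const feeds = listFeeds(discovery);
  for (const name of names) {
    const hit = feeds.find((f) => f.name === name);
    if (hit) return hit.url;
  }
  return undefined;
}

const v2: Discovery = { data: { en: { feeds: [{ name: "free_bike_status", url: "https://a.example/fbs.json" }] } } };
const v3: Discovery = { data: { feeds: [{ name: "vehicle_status", url: "https://b.example/vs.json" }] } };
console.log(findFeed(v2, ["vehicle_status", "free_bike_status"])); // https://a.example/fbs.json
```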
- fetch-station-data: full fetch. Discovery → resolve all feed URLs → fetch station_information + station_status + free_bike_status. Caches the discovery response in the output file.
- fetch-availability: lightweight refresh. Reads the cached discovery → fetches only station_status + free_bike_status/vehicle_status. Skips the discovery round-trip entirely.
This separation means you can poll availability every few minutes without redundantly re-fetching static station info or re-doing discovery.
The geo-index is built from scraped station data and uses two key libraries:
- kdbush: static KD-tree for 2D points, using about half the memory of flatbush for point data. Serializes to and from an ArrayBuffer, with zero-copy restore via KDBush.from().
- geokdbush: haversine-aware kNN queries on kdbush indexes. around(index, lon, lat, k) returns indices sorted by distance, at ~0.025 ms per query on 500K points.
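To make "haversine-aware kNN" concrete, here is a brute-force reference that returns the same kind of result as around(): point indices sorted by great-circle distance. It assumes a spherical Earth with radius 6371 km. It is only a correctness baseline and not a substitute for geokdbush, which prunes the KD-tree instead of scanning every point.

```typescript
// Great-circle distance in km via the haversine formula (mean Earth radius 6371 km).
function haversineKm(lon1: number, lat1: number, lon2: number, lat2: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.min(1, Math.sqrt(a)));
}

// O(n log n) scan: score every point, sort by distance, keep the first k indices.
function bruteForceAround(points: [number, number][], lon: number, lat: number, k: number): number[] {
  return points
    .map((p, i) => ({ i, d: haversineKm(lon, lat, p[0], p[1]) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((x) => x.i);
}

const pts: [number, number][] = [
  [2.3522, 48.8566], // central Paris
  [2.2945, 48.8584], // Eiffel Tower
  [-0.1276, 51.5072], // London
];
console.log(bruteForceAround(pts, 2.35, 48.857, 2)); // [ 0, 1 ]
```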
The spatial index has two tiers, both using the same binary tile format and KDBush:
┌─────────────────────────────────────────────────────────────────────┐
│ Tier 1: box-index.bin (~5 KB)                                       │
│   KDBush of ~120 box center-points                                  │
│   Metadata: [{ box: "box-001", bbox: {...}, n: 4832 }, ...]         │
│                                                                     │
│ Tier 2: box-NNN.bin (~340 KB each, ~120 tiles)                      │
│   KDBush of ~5000 station/vehicle points                            │
│   Metadata: { systems: [...], types: [...], points: [...] }         │
│   Compact format: system_id and type stored as string-table refs    │
└─────────────────────────────────────────────────────────────────────┘
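Expanding a compact point back into named fields takes two array lookups. The TypeScript types and the expandPoint name are assumptions, based on the { i, s, t, name, cap? } layout of the data tiles.

```typescript
// Types ASSUMED from the { systems, types, points: [{ i, s, t, name, cap? }] } layout.
interface CompactPoint { i: number; s: number; t: number; name: string; cap?: number }
interface CompactTileMeta { systems: string[]; types: string[]; points: CompactPoint[] }
interface ExpandedPoint { id: number; system_id: string; type: string; name: string; capacity?: number }

// Resolve string-table references: s indexes `systems`, t indexes `types`.
function expandPoint(meta: CompactTileMeta, p: CompactPoint): ExpandedPoint {
  const system_id = meta.systems[p.s];
  const type = meta.types[p.t];
  if (system_id === undefined || type === undefined) {
    throw new Error(`point ${p.i}: string-table reference out of range`);
  }
  return { id: p.i, system_id, type, name: p.name, capacity: p.cap };
}

const meta: CompactTileMeta = {
  systems: ["velib-paris", "dott-paris"],
  types: ["station", "vehicle"],
  points: [
    { i: 0, s: 0, t: 0, name: "Place du Châtelet", cap: 28 },
    { i: 1, s: 1, t: 1, name: "(dockless: #47)" },
  ],
};
console.log(meta.points.map((p) => expandPoint(meta, p)));
```

Each tile stores every system_id string once instead of once per point, which is where the metadata savings come from.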
Points are grouped into boxes by the assignPointsToBoxes() algorithm:
- System-first grouping β all points from a GBFS system start in one group
- Recursive median-split β groups with >6500 points are split along the median lat or lon
- Greedy merge β small groups are merged with their nearest neighbor (within 500 km)
- Bbox expansion β bounding boxes are expanded by 10% to create overlap zones
This keeps systems together (no fragmentation across arbitrary grid cells), handles dense cities naturally via overlap + closest-center tiebreaking, and eliminates the need for hardcoded city zones or geohash cells.
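Below is a minimal sketch of the recursive median-split step. It assumes the split axis is whichever of longitude or latitude has the larger extent in degrees; the text says only "median lat or lon". System-first grouping, greedy merging, and bbox expansion are left out.

```typescript
type Pt = { lon: number; lat: number };

// While a group exceeds maxSize, sort it along its wider axis (ASSUMED rule,
// measured in raw degrees) and cut at the median index.
function medianSplit(points: Pt[], maxSize = 6500): Pt[][] {
  if (points.length <= Math.max(1, maxSize)) return [points];
  let minLon = Infinity, maxLon = -Infinity, minLat = Infinity, maxLat = -Infinity;
  for (const p of points) {
    minLon = Math.min(minLon, p.lon);
    maxLon = Math.max(maxLon, p.lon);
    minLat = Math.min(minLat, p.lat);
    maxLat = Math.max(maxLat, p.lat);
  }
  const key: keyof Pt = maxLon - minLon >= maxLat - minLat ? "lon" : "lat";
  const sorted = [...points].sort((a, b) => a[key] - b[key]);
  const mid = Math.floor(sorted.length / 2);
  // Both halves are non-empty and strictly smaller, so the recursion terminates.
  return [...medianSplit(sorted.slice(0, mid), maxSize), ...medianSplit(sorted.slice(mid), maxSize)];
}

const line: Pt[] = Array.from({ length: 20000 }, (_, i) => ({ lon: i * 0.001, lat: 48 + (i % 10) * 0.0001 }));
const groups = medianSplit(line);
console.log(groups.map((g) => g.length)); // [ 5000, 5000, 5000, 5000 ]
```

Cutting at the median keeps the halves equal in size. A 20,000-point system therefore becomes four groups of 5,000, close to the default target size.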
lat,lon → around(boxIndex, lon, lat, 5) → check bbox containment → closest center wins
        → load single box-NNN.bin → around(tileIndex, lon, lat, k) → results
A query always loads exactly one data tile. The box-index (~5 KB) is loaded once and cached, so the first request costs 2 KV reads (box-index + tile) and subsequent requests cost 1. CPU time per query is well under 5 ms, within Cloudflare Workers free tier limits.
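The read pattern can be sketched with a module-scope cache. TileStore, the key names, and the pickBoxKey callback are all hypothetical stand-ins. A real Worker would read from its KV binding and route using the box-index KDBush.

```typescript
// Hypothetical stand-in for a KV namespace (a Worker would use its KV binding).
interface TileStore {
  get(key: string): Promise<ArrayBuffer | null>;
}

// Module-scope cache: kept between requests served by the same isolate.
let boxIndexCache: ArrayBuffer | null = null;

// pickBoxKey stands in for the box-index kNN + bbox routing step.
async function loadForQuery(
  store: TileStore,
  pickBoxKey: (boxIndex: ArrayBuffer) => string,
): Promise<{ boxIndex: ArrayBuffer; tile: ArrayBuffer }> {
  if (boxIndexCache === null) {
    const buf = await store.get("box-index.bin"); // key name is an assumption
    if (buf === null) throw new Error("box-index.bin missing");
    boxIndexCache = buf;
  }
  const boxIndex = boxIndexCache;
  const key = `${pickBoxKey(boxIndex)}.bin`;
  const tile = await store.get(key);
  if (tile === null) throw new Error(`${key} missing`);
  return { boxIndex, tile };
}
```

With a store that counts reads, the first call makes 2 reads and a second call adds only 1.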