08 Jun 25
DuckLake, released under the MIT license, provides a lightweight one-stop solution for a data lake and catalog, similar to Delta Lake with Unity Catalog or Iceberg with Lakekeeper or Polaris. It includes an open table format, but it is also a data lakehouse format, meaning that it also contains a catalog that records the schema of the stored data. It needs a storage layer (both blob storage and block-based storage work) and a catalog database (any SQL-compatible database works). DuckLake data files must be stored as Parquet. Like other data lakehouse technologies, DuckLake does not support constraints, keys, or indexes. Currently, it can be exported to a DuckDB database and to vanilla Parquet files. You can also use it for a “multiplayer DuckDB” setup, with multiple DuckDB instances reading and writing the same dataset.
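As a rough illustration of the two pieces described above (a catalog database plus a storage location for Parquet files), here is a minimal sketch of attaching a DuckLake from Python. It assumes the third-party `duckdb` package is installed and that the `ducklake` extension can be downloaded. The file name `metadata.ducklake`, the directory `data_files/`, the alias `my_lake`, and the `events` table are made-up names for this example, and the helper functions are hypothetical.

```python
def ducklake_statements(catalog_file="metadata.ducklake",
                        data_path="data_files/",
                        alias="my_lake"):
    """Build the SQL needed to attach a DuckLake and write one row.

    All names here are illustrative assumptions; the catalog is a local
    DuckDB file and the storage layer is a local directory.
    """
    return [
        "INSTALL ducklake",
        # The catalog lives in `catalog_file`; Parquet data goes to `data_path`.
        f"ATTACH 'ducklake:{catalog_file}' AS {alias} (DATA_PATH '{data_path}')",
        f"USE {alias}",
        # Assumes a fresh catalog: no constraints, keys, or indexes are declared.
        "CREATE TABLE events (id INTEGER, name VARCHAR)",
        "INSERT INTO events VALUES (1, 'first')",
    ]


def run(statements):
    """Execute the statements in an in-memory DuckDB session."""
    import duckdb  # third-party; imported lazily so the builder works without it

    con = duckdb.connect()
    for statement in statements:
        con.execute(statement)
    return con.execute("SELECT count(*) FROM events").fetchone()[0]


statements = ducklake_statements()
```

Calling `run(statements)` on a fresh directory should create the catalog file and write Parquet files under `data_files/`. For a multi-instance setup, the catalog would need to live in a database that several clients can reach rather than in a local file.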
20 Apr 25
An open-source tool for reading OpenStreetMap PBF files using DuckDB.
- Scalable reader for OpenStreetMap ProtoBuffer (pbf) files.
- Built on top of DuckDB with its Spatial extension.
- Saves files in the GeoParquet file format for easier integration with modern cloud stacks.
- Uses multithreading, unlike GDAL, which works in a single thread only.
- Can filter data based on geometry without the need for ogr2ogr clipping before operation.
- Can filter data based on OSM tags.
- Uses caching to avoid repeated computations.
- Can be used as a Python module as well as a beautiful CLI based on Typer.
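To make the tag-filtering item above concrete, here is a small pure-Python sketch of what filtering features by OSM tags can look like. It does not use the tool itself. The filter format (a key mapped to `True`, a single value, or a list of values) and the function `matches_tags` are assumptions made for this illustration, not the tool's documented API.

```python
from typing import Mapping, Sequence, Union

# Hypothetical filter format: key -> True (any value), one value, or several values.
TagFilter = Mapping[str, Union[bool, str, Sequence[str]]]


def matches_tags(tags: Mapping[str, str], tag_filter: TagFilter) -> bool:
    """Return True if the feature's tags satisfy at least one filter entry."""
    for key, wanted in tag_filter.items():
        if key not in tags:
            continue
        value = tags[key]
        if wanted is True:
            return True  # any value for this key is accepted
        if isinstance(wanted, str):
            if value == wanted:
                return True
        elif wanted is not False and value in wanted:
            return True
    return False


# Made-up sample features standing in for parsed OSM elements.
features = [
    {"id": 1, "tags": {"building": "yes"}},
    {"id": 2, "tags": {"amenity": "cafe"}},
    {"id": 3, "tags": {"amenity": "bank"}},
    {"id": 4, "tags": {"highway": "residential"}},
]

tag_filter = {"building": True, "amenity": ["cafe", "bar"]}
kept = [f["id"] for f in features if matches_tags(f["tags"], tag_filter)]
print(kept)  # [1, 2]
```

A real reader would apply this kind of predicate while scanning the PBF file, so that unwanted elements never reach the output.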