@hannes hannes commented Jun 3, 2020

This PR adds basic Parquet file reading to DuckDB. The reader does not yet support nested tables, and some higher-level types remain unsupported because systems do not agree on how to represent them. Snappy is currently the only supported compression codec.

Use like so from SQL:

SELECT * FROM PARQUET_SCAN('some/parquet/file.parquet')

The Parquet reader is also integrated with the Python relational API, e.g.

import duckdb
duckdb.from_parquet('some/parquet/file.parquet').limit(10)
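The SQL form can also be issued from Python through a regular connection. The following is only a sketch: the helper names `parquet_scan_query` and `preview_parquet` are hypothetical and not part of this PR, and it assumes the `duckdb` Python package is installed with the Parquet extension loaded and that the given path points to a readable Parquet file.

```python
def parquet_scan_query(path, limit=None):
    """Build a PARQUET_SCAN query string for the given file path.

    Hypothetical helper for illustration; not part of DuckDB.
    """
    # Double any single quotes so the path is a valid SQL string literal.
    escaped = path.replace("'", "''")
    query = f"SELECT * FROM PARQUET_SCAN('{escaped}')"
    if limit is not None:
        # int() rejects anything that is not a plain number.
        query += f" LIMIT {int(limit)}"
    return query


def preview_parquet(path, limit=10):
    """Return the first `limit` rows of a Parquet file as a list of tuples.

    Assumes the duckdb package is installed; imported lazily so the
    query builder above can be used without it.
    """
    import duckdb

    con = duckdb.connect(":memory:")
    return con.execute(parquet_scan_query(path, limit)).fetchall()
```

Calling `preview_parquet('some/parquet/file.parquet')` would run the same scan as the SQL example, limited to ten rows.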

The Parquet reader is implemented as a DuckDB extension. The Python and R packages load this extension by default. From C++, load it explicitly:

#include "parquet-extension.hpp"
db.LoadExtension<ParquetExtension>(); // db is an existing DuckDB instance

@hannes hannes merged commit 439acd0 into master Jun 4, 2020
@hannes hannes deleted the powerparquet branch June 4, 2020 18:47
