duckdb's integration with polars is really nice. I'd really like to be able to use polars' pythonic notation with duckdb's superior performance on large, out-of-RAM data (including remote data, like partitioned parquet files on S3 buckets). Based on the current documentation, though, it looks like the `.pl()` method creates an in-memory dataframe, not a lazy dataframe. Is it possible to generate a lazy dataframe instead?
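For context, a minimal sketch of the behavior described above: `.pl()` executes the query and returns an eager, in-memory polars DataFrame (the parquet glob here is a placeholder):

```python
import duckdb

# .pl() materializes the full result as an in-memory polars DataFrame
df = duckdb.sql("SELECT * FROM 'part-*.parquet'").pl()
```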
Replies: 7 comments 6 replies

---

Hmm, interesting! What are the use cases you want to enable with that? At some point, DuckDB will need to execute and materialize the results in Arrow format, since we don't use Arrow internally. Or would an Arrow recordset/scanner work? We can read those, but I don't think we write them...
---

Thanks for the reply; I should have included more context, and maybe my question doesn't make sense in the context of polars. My basic use case is for students to write the more pythonic 'piped' syntax we see in …
---

Hi, it would be useful to use a LazyFrame without loading all the data into memory. For instance:
`duckdb.sql("SELECT * FROM large_data").lazy().filter(pl.col("A") == 4).select(pl.col("A") + 10).limit(10).collect()`
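Until something like that exists, a similar piped-but-deferred pipeline can be written with DuckDB's own relational API, which also defers execution until results are fetched. A sketch, reusing the hypothetical `large_data` table from the comment above:

```python
import duckdb

rel = duckdb.sql("SELECT * FROM large_data")  # lazy relation; nothing executes yet

result = (
    rel.filter("A = 4")            # still lazy
       .project("A + 10 AS A")     # still lazy
       .limit(10)                  # still lazy
       .pl()                       # executes and materializes only these 10 rows
)
```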
---

Looking for the same thing! This would open the door to a really strong duckdb+delta integration, because polars has a …
---

Following up here: with the announcement of DuckLake, there's an even stronger case for this integration. Needing polars to add a read_ducklake before adopting would be a bit unfortunate, but if duckdb supported a …
---

The attention here is misplaced; this is not a DuckDB limitation. We can create a record batch reader that lazily scans a duckdb relation just fine, but Polars does not expose any way to lazily construct a LazyFrame: all of its ingestion methods eagerly materialize the input. Please raise awareness on the Polars side if you want this feature; we'd be happy to implement it when they support it.
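A sketch of the DuckDB side this comment describes, assuming the documented `fetch_record_batch` API (`large_data` is again a placeholder): the returned `pyarrow.RecordBatchReader` pulls batches incrementally instead of materializing the whole result.

```python
import duckdb

con = duckdb.connect()
con.execute("SELECT * FROM large_data")

# pyarrow.RecordBatchReader: batches are produced as they are consumed
reader = con.fetch_record_batch(100_000)

for batch in reader:   # each item is a pyarrow.RecordBatch
    print(batch.num_rows)
```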
---

This has been implemented in #17947 and will land in v1.4.0.
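Assuming the shape of the change in that PR, usage might look like the sketch below; the exact parameter name (guessed here as `lazy=True`) and any batching options should be checked against the PR itself.

```python
import duckdb
import polars as pl

rel = duckdb.sql("SELECT * FROM large_data")

# assumed API from the PR: a polars LazyFrame backed by the relation,
# so rows are only pulled from DuckDB when .collect() runs
lf = rel.pl(lazy=True)

out = lf.filter(pl.col("A") == 4).select(pl.col("A") + 10).limit(10).collect()
```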