feat(python): support file objects for read_json method#9573
Conversation
Tishj
left a comment
There was a problem hiding this comment.
Thanks, looks good to me
Good thinking on the parametrized tests, solid way to easily add coverage for this!
|
Thanks for the PR! Could you just have a look at the failing CI? |
1f9aed9 to
dda97af
Compare
|
I've had to make several changes here to avoid files being automatically closed, and to improve size inference. If I can't infer the size, I default back to -1. @lnkuiper thoughts? |
|
@Mause if defaulting to -1 works then that seems fine, this will only affect the progress bar (I think?). The file size is used to determine the next number of bytes to read when reading files that are "seekable" (and therefore can be read in parallel). Maybe it would be safer to make it non-seekable if the file size cannot be inferred? Then there shouldn't be issues at all. |
I actually found that if I returned 0 as the file size, no rows were returned at all
We can look at this yeah, what would be the performance implication? |
|
@Mause, yes. We cannot read from non-seekable files in parallel. Threads will need to wait for each other to read raw bytes from the file object, but after they have read a buffer, they can parse/convert independently (in parallel). So there's a performance implication, but it shouldn't be big because the file object is already in memory, not on disk. I'm also doing some work on the JSON reader in #10300 currently, but I think the behavior for non-seekable files should be the same. |
…t_python_support_file_objects_for_read_json_method
|
@Mause it looks like there are still several test failures here, but I'm not sure if they are related to the actual PR. Do you want to pick this up again? If so could you resolve the merge conflicts? |
|
This has since been implemented in #13040 |
No description provided.