Skip to content

Conversation

@Tishj
Copy link
Contributor

@Tishj Tishj commented Sep 8, 2023

This PR fixes #8522

Arrow supports TIMESTAMP_TZ with any time unit (s, ms, ns, us)
We only support the time unit us for them, so we convert to us.

When a filter is applied that does a comparison against a constant, the constant is cast to the type of the column.
So the constant gets cast to TIMESTAMP_TZ (us)

When this filter gets pushed down into arrow, that results in attempting to compare TIMESTAMP_TZ(ms) against TIMESTAMP_TZ(us), which is not supported.

@github-actions github-actions bot marked this pull request as draft September 8, 2023 15:47
@Tishj Tishj marked this pull request as ready for review September 8, 2023 15:54
@Mytherin
Copy link
Collaborator

Looks like the exception type is wrong in the test, could you have another look?

@Tishj
Copy link
Contributor Author

Tishj commented Sep 14, 2023

On Mac with Python 3.9.17 I can't reproduce it


On a Linux docker container:
(all packages in requirements-dev.txt have been reinstalled prior to running this)

root@f6dc42ebe522:/app/duckdb# python3 -m pytest tools/pythonpkg/tests/fast/test_import_without_pyarrow_dataset.py 
============================= test session starts ==============================
platform linux -- Python 3.9.12, pytest-7.4.2, pluggy-1.3.0
rootdir: /app/duckdb/tools/pythonpkg/tests
configfile: pytest.ini
collected 1 item                                                               

tools/pythonpkg/tests/fast/test_import_without_pyarrow_dataset.py .      [100%]

============================== 1 passed in 0.73s ===============================
root@f6dc42ebe522:/app/duckdb# uname -a 
Linux f6dc42ebe522 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

@github-actions github-actions bot marked this pull request as draft September 14, 2023 19:18
@Mytherin Mytherin marked this pull request as ready for review September 14, 2023 20:08
@github-actions github-actions bot marked this pull request as draft September 15, 2023 06:11
@Tishj
Copy link
Contributor Author

Tishj commented Sep 15, 2023

Think I found the culprit, the test that's failing is actually underspecified, which I now fixed.
It uses arrow_dataset().IsLoaded() which will return true if it was loaded correctly by us before.

The test however monkeypatches the pyarrow dataset to not be present, so in the sys modules it's no longer present.
We don't check for this so the VerifyDatasetLoaded method succeeds, while later on where the attributes are imported this will fail.

I don't see why that's a huge deal though, in both cases it throws an exception, the one we throw is just a little more descriptive, none the less I changed and hopefully fixed it now :)


This doesn't fix it either, unless we actually make an environment where pyarrow.dataset is not available we can't really test this, because monkeypatching just makes it as if it wasn't imported

If we import pyarrow.dataset from within duckdb, I believe that just undoes all of the monkeypatching
If I add a test above this one which doesn't monkeypatch, then the second test will fail because it was already imported and probably cached.

@Tishj
Copy link
Contributor Author

Tishj commented Sep 15, 2023

I am tempted to remove the failing test
It's testing a scenario that we can not reliably, deterministically replicate

If the test is run standalone it passes, when I run it together with other tests, it doesn't raise an exception (because pyarrow is loaded before it was monkeypatched out)
And I'm guessing the order tests are run is platform dependent, which makes the exception we get on Linux different because of the sequence of tests that are run before the offending test.

@Mytherin Mytherin marked this pull request as ready for review September 15, 2023 19:57
bool PythonImportCacheItem::IsLoaded() const {
auto type = (*this)();
return type.ptr() != nullptr;
bool loaded = type.ptr() != nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the point of this change is?

Copy link
Contributor Author

@Tishj Tishj Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None really, I had initially meant to check ModuleIsLoaded if the ptr was none-null, to double verify
But even with monkeypatching this looked like it didn't make IsLoaded return false

So this is left over code that I might want to undo

@Mytherin Mytherin merged commit 35cf84a into duckdb:main Sep 16, 2023
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[python] InvalidInputException filtering pyarrow table timestamp column with timezone and unit is not 'us'

3 participants