-
Notifications
You must be signed in to change notification settings - Fork 63
feat: Implement updated execute query protocol #1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement updated execute query protocol #1096
Conversation
867cd6b
to
9686ba3
Compare
self._executor = ( | ||
concurrent.futures.ThreadPoolExecutor() if not CrossSync.is_async else None | ||
) | ||
self._crc32c = _CRC32C() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment here describing why this is needed could be helpful for the future
Edit: see my comments below on using static methods before addressing. Maybe we can avoid passing this around?
app_profile_id: Optional[str], | ||
request_body: Dict[str, Any], | ||
prepare_metadata: Metadata, | ||
crc32c_implementation: _CRC32C, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is a public class. Adding a new positional argument could be problematic, because it can break existing code. (I don't know why a user would be building this themselves, but technically it is part of the public API surface)
Instead of passing it can, can we just access client._crc32c? Maybe make this an optional argument if the user wants to override it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed but it raises an interesting question. I dont think we really want the constructor to be public here, as it will likely need to change when we add PreparedStatement.
Maybe we should only expose an ExecuteQueryIterator
interface publicly and move the details to an ExecuteQueryIteratorImpl? We can still break the interface currently bc of the preview warning but don't want to box us in after launch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the interface/implementation option could work. We could also raise a warning/exception in the init, that we suppress when called internally (maybe by passing an extra _ignore_warnings init argument?). Or even just adding a warning to the init docstring should be enough.
I think any of those options should be enough to make changes freely in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool, I added a docstring as that seems simplest and clear enough for users
"proto-plus >= 1.22.3, <2.0.0dev", | ||
"proto-plus >= 1.25.0, <2.0.0dev; python_version>='3.13'", | ||
"protobuf>=3.20.2,<6.0.0dev,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5", | ||
"google-crc32c>=1.5.0, <2.0.0dev", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to mark this as an optional dependency (move into extras?), so this doesn't break any supported platforms. I need to do a bit of research around this though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this some more, and it seems like if we want it to be optional, we have to add it to extras. But extras are only installed if explicitly requested (pip install google-cloud-bigtable[crc32c]
). There's a PEP for default extras, but it looks like it's not accepted yet.
Do you think it makes sense to make this opt-in? Maybe add a reference to google-cloud-bigtable[crc32c] in the error message? Or does that cause problems for the feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Based on discussion offline) there is a python fallback built into this. I wrapped the import in a custom warning when it uses the fallback now
""" | ||
# google_crc32c wants bytes. Use a memoryview to avoid a copy | ||
memory_view = memoryview(val) | ||
return self._google_crc32c.value(bytes(memory_view)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this class doesn't hold any state. I wonder if this could all be implemented as static/class methods? Maybe we could avoid creating a _CRC32C instance and passing it around so much, and just call _CRC32C.checksum()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this use a classmethod that warns on the first call if the import warned
if expected_checksum != checksum: | ||
raise ValueError( | ||
f"Unexpected checksum mismatch. Expected: {expected_checksum}, got: {checksum}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this (and other exceptions) should be listed in the docstrings somewhere.
Maybe in a class docstring for ExecuteQueryIterator, explaining what the generator yields and raises? Maybe in anext too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
" Disabling ExecuteQuery checksum validation. Please upgrade to a supported version" | ||
" and architecture if possible", | ||
category=RuntimeWarning, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often do you expect this feature to be used? It might be better to only show this if a checksum is encountered? Or execute_query is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
b128439
to
286e192
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a couple questions about tests, but LGTM
import google_crc32c # type: ignore | ||
|
||
|
||
class _CRC32C(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a test for this class. Especially checking if the warning is emitted as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in checksum_test
|
||
from google.cloud.bigtable.data._cross_sync import CrossSync | ||
from tests.unit.data.execute_query.sql_helpers import ( | ||
chunked_responses, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any system tests that can be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a follow on with a couple of new integration tests. W.r.t the chunked responses there isn't a great way to verify if the backend is sending multiple chunks or a single chunk.
I do have a significant number of conformance tests ready to go for this logic though, and im planning on adding the testproxy support for python as a fast follow
7893f22
to
172e602
Compare
assert ( | ||
"Using pure python implementation of `google-crc32` for ExecuteQuery response validation" | ||
in str(first_call_warning[0]) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would be good to make sure repeated calls only send one warning. And maybe also a test where no warnings are needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
b0d0176
to
5e01070
Compare
5e01070
to
d7654dc
Compare
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕