Skip to content

Conversation

jackdingilian
Copy link
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@jackdingilian jackdingilian requested review from a team as code owners March 13, 2025 16:43
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigtable Issues related to the googleapis/python-bigtable API. labels Mar 13, 2025
@jackdingilian jackdingilian force-pushed the update-protocol branch 2 times, most recently from 867cd6b to 9686ba3 Compare March 13, 2025 18:44
self._executor = (
concurrent.futures.ThreadPoolExecutor() if not CrossSync.is_async else None
)
self._crc32c = _CRC32C()
Copy link
Contributor

@daniel-sanche daniel-sanche Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment here describing why this is needed could be helpful for the future

Edit: see my comments below on using static methods before addressing. Maybe we can avoid passing this around?

app_profile_id: Optional[str],
request_body: Dict[str, Any],
prepare_metadata: Metadata,
crc32c_implementation: _CRC32C,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is a public class. Adding a new positional argument could be problematic, because it can break existing code. (I don't know why a user would be building this themselves, but technically it is part of the public API surface)

Instead of passing it can, can we just access client._crc32c? Maybe make this an optional argument if the user wants to override it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed but it raises an interesting question. I dont think we really want the constructor to be public here, as it will likely need to change when we add PreparedStatement.

Maybe we should only expose an ExecuteQueryIterator interface publicly and move the details to an ExecuteQueryIteratorImpl? We can still break the interface currently bc of the preview warning but don't want to box us in after launch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the interface/implementation option could work. We could also raise a warning/exception in the init, that we suppress when called internally (maybe by passing an extra _ignore_warnings init argument?). Or even just adding a warning to the init docstring should be enough.

I think any of those options should be enough to make changes freely in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, I added a docstring as that seems simplest and clear enough for users

"proto-plus >= 1.22.3, <2.0.0dev",
"proto-plus >= 1.25.0, <2.0.0dev; python_version>='3.13'",
"protobuf>=3.20.2,<6.0.0dev,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5",
"google-crc32c>=1.5.0, <2.0.0dev",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to mark this as an optional dependency (move into extras?), so this doesn't break any supported platforms. I need to do a bit of research around this though

Copy link
Contributor

@daniel-sanche daniel-sanche Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this some more, and it seems like if we want it to be optional, we have to add it to extras. But extras are only installed if explicitly requested (pip install google-cloud-bigtable[crc32c]). There's a PEP for default extras, but it looks like it's not accepted yet.

Do you think it makes sense to make this opt-in? Maybe add a reference to google-cloud-bigtable[crc32c] in the error message? Or does that cause problems for the feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Based on discussion offline) there is a python fallback built into this. I wrapped the import in a custom warning when it uses the fallback now

"""
# google_crc32c wants bytes. Use a memoryview to avoid a copy
memory_view = memoryview(val)
return self._google_crc32c.value(bytes(memory_view))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this class doesn't hold any state. I wonder if this could all be implemented as static/class methods? Maybe we could avoid creating a _CRC32C instance and passing it around so much, and just call _CRC32C.checksum() instead?

Copy link
Contributor Author

@jackdingilian jackdingilian Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this use a classmethod that warns on the first call if the import warned

if expected_checksum != checksum:
raise ValueError(
f"Unexpected checksum mismatch. Expected: {expected_checksum}, got: {checksum}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this (and other exceptions) should be listed in the docstrings somewhere.

Maybe in a class docstring for ExecuteQueryIterator, explaining what the generator yields and raises? Maybe in anext too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

" Disabling ExecuteQuery checksum validation. Please upgrade to a supported version"
" and architecture if possible",
category=RuntimeWarning,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often do you expect this feature to be used? It might be better to only show this if a checksum is encountered? Or execute_query is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jackdingilian jackdingilian force-pushed the update-protocol branch 4 times, most recently from b128439 to 286e192 Compare March 17, 2025 14:41
@jackdingilian jackdingilian added the kokoro:run Add this label to force Kokoro to re-run the tests. label Mar 17, 2025
Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a couple questions about tests, but LGTM

import google_crc32c # type: ignore


class _CRC32C(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a test for this class. Especially checking if the warning is emitted as expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in checksum_test


from google.cloud.bigtable.data._cross_sync import CrossSync
from tests.unit.data.execute_query.sql_helpers import (
chunked_responses,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any system tests that can be added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a follow on with a couple of new integration tests. W.r.t the chunked responses there isn't a great way to verify if the backend is sending multiple chunks or a single chunk.

I do have a significant number of conformance tests ready to go for this logic though, and im planning on adding the testproxy support for python as a fast follow

@daniel-sanche daniel-sanche added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 17, 2025
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Mar 17, 2025
@jackdingilian jackdingilian force-pushed the update-protocol branch 2 times, most recently from 7893f22 to 172e602 Compare March 17, 2025 21:00
assert (
"Using pure python implementation of `google-crc32` for ExecuteQuery response validation"
in str(first_call_warning[0])
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be good to make sure repeated calls only send one warning. And maybe also a test where no warnings are needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jackdingilian jackdingilian force-pushed the update-protocol branch 2 times, most recently from b0d0176 to 5e01070 Compare March 18, 2025 14:37
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Mar 18, 2025
@jackdingilian jackdingilian merged commit eb2cd26 into googleapis:prepare-query Mar 18, 2025
7 of 9 checks passed
jackdingilian added a commit that referenced this pull request Mar 18, 2025
* feat: update execute_query to use PrepareQuery API (#1095)

* feat: Implement updated execute query protocol (#1096)

* feat: Refactor Metadata, add system tests, remove preview warning (#1099)

* Fix  setup.py merge

* fix: skip sql tests for emulator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants