Skip to content

Conversation

@MedvedewEM
Copy link
Contributor

@MedvedewEM MedvedewEM commented Dec 15, 2025

While removing host from table's metadata we used hardcoded values for excluded_paths.
Now it is configurable in config - default values are unchanged.

Summary by Sourcery

Make Zookeeper table metadata cleanup respect configurable exclusion patterns instead of hardcoded ones.

New Features:

  • Allow configuration of Zookeeper metadata cleanup excluded paths via chadmin config.

Enhancements:

  • Wire excluded path patterns from configuration into the Zookeeper cleanup logic and add a scenario test covering configurable exclusions.

Tests:

  • Add an integration test verifying that tables under configured excluded Zookeeper paths are not cleaned up.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 15, 2025

Reviewer's Guide

Make Zookeeper table/database metadata cleanup use configurable excluded_paths from chadmin.zookeeper.clean_zk_metadata_for_hosts instead of a hardcoded list, and add a feature test validating that cleanup respects configured exclusions.

Sequence diagram for configurable excluded_paths in Zookeeper metadata cleanup

sequenceDiagram
    participant Config as clean_zk_metadata_for_hosts_config
    participant Cleanup as cleanup_tables_and_databases
    participant Collector as _collect_objects_for_cleanup
    participant Traverser as _traverse_zk_tree_and_find_objects

    Config->>Cleanup: provide workers, retry_min_wait_sec, retry_max_wait_sec, max_retries, excluded_paths
    Cleanup->>Cleanup: configure retry_decorator and max_workers
    Cleanup->>Collector: call with zk, zk_root_path, collect_tables, collect_database, excluded_paths, max_workers
    Collector->>Traverser: traverse Zookeeper tree and find replicated_objects
    Traverser-->>Collector: replicated_objects
    Collector->>Collector: filter replicated_objects using excluded_paths
    Collector-->>Cleanup: tables_for_cleanup, databases_for_cleanup
    Cleanup->>Cleanup: delete non_excluded tables and databases metadata
    Cleanup-->>Config: operation completes using configured excluded_paths
Loading

Class diagram for updated Zookeeper cleanup configuration and helper function

classDiagram
    class CleanZkMetadataForHostsConfig {
        int workers
        int retry_min_wait_sec
        int retry_max_wait_sec
        int max_retries
        List~string~ excluded_paths
    }

    class ZookeeperCleanModule {
        +cleanup_tables_and_databases(zk, zk_root_path)
        +_collect_objects_for_cleanup(zk, zk_root_path, collect_tables, collect_database, excluded_paths, max_workers) Tuple
    }

    CleanZkMetadataForHostsConfig <.. ZookeeperCleanModule : reads
    ZookeeperCleanModule ..> CleanZkMetadataForHostsConfig : uses excluded_paths
Loading

File-Level Changes

Change Details Files
Wire excluded_paths for Zookeeper metadata cleanup through configuration instead of using hardcoded patterns.
  • Add excluded_paths parameter to _collect_objects_for_cleanup so callers can supply patterns to skip when collecting replicated objects for cleanup.
  • Remove the local hardcoded excluded_paths list from _collect_objects_for_cleanup and rely on the passed-in configuration.
  • Read excluded_paths from clean_zk_metadata_for_hosts config section in cleanup_tables_and_databases and pass it down to _collect_objects_for_cleanup.
  • Keep existing default excluded path patterns by moving them into the default configuration.
ch_tools/chadmin/internal/zookeeper_clean.py
ch_tools/common/config.py
Define default excluded_paths in configuration and fix a comment typo in the Zookeeper cleanup config.
  • Extend clean_zk_metadata_for_hosts default config to include the previous excluded_paths regex list so behaviour remains unchanged without explicit configuration.
  • Fix typo in comment describing retry timings from 'wrost' to 'worst' and clarify the calculation.
ch_tools/common/config.py
Add an integration/feature test to verify table metadata cleanup honours configurable excluded_paths.
  • Introduce a new scenario that configures chadmin.zookeeper.clean_zk_metadata_for_hosts.excluded_paths to exclude a specific replicated table path.
  • Create and detach two replicated tables, one matching the excluded path and one not, then run table cleanup to validate that replicas for the excluded table are preserved while the other table's replicas are cleaned up.
tests/features/chadmin_zookeeper.feature

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@MedvedewEM MedvedewEM marked this pull request as ready for review December 16, 2025 05:35
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Accessing clean_zk_metadata_for_hosts["excluded_paths"] directly will break existing deployments that don't yet define this key; consider using .get("excluded_paths", DEFAULT_LIST) or similar to preserve backward compatibility with older configs.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Accessing `clean_zk_metadata_for_hosts["excluded_paths"]` directly will break existing deployments that don't yet define this key; consider using `.get("excluded_paths", DEFAULT_LIST)` or similar to preserve backward compatibility with older configs.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@aalexfvk aalexfvk merged commit b532d45 into main Dec 16, 2025
18 checks passed
@aalexfvk aalexfvk deleted the remove_hosts_metadata_with_excluded_paths branch December 16, 2025 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants