Skip to content

DataLake setting 'skip_non_iceberg_tables'#1815

Open
ianton-ru wants to merge 1 commit into
antalya-26.3from
feature/antalya-26.3/test-glue-non-iceberg-hive-table
Open

DataLake setting 'skip_non_iceberg_tables'#1815
ianton-ru wants to merge 1 commit into
antalya-26.3from
feature/antalya-26.3/test-glue-non-iceberg-hive-table

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

DataLake catalog setting skip_non_iceberg_tables to skip non-Iceberg tables in Glue and Unity catalogs

Documentation entry for user-facing changes

Solved #1811 .
Some DataLake catalogs can contain different type of tables, when ClickHouse can read only Iceberg tables.
New setting skip_non_iceberg_tables hides non-Iceberg tables from system.tables table and from SHOW TABLES FROM <database> query response.
Partially with AI assistance (tests, unity implementation)

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Partially with AI assistance (tests, unity implementation)
@github-actions
Copy link
Copy Markdown

Workflow [PR], commit [b4cb5aa]

@ianton-ru
Copy link
Copy Markdown
Author

Audit: PR #1815 — DataLake setting skip_non_iceberg_tables

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1815 (DataLake setting skip_non_iceberg_tables for Glue and Unity catalogs)

Diff scope: antalya-26.3feature/antalya-26.3/test-glue-non-iceberg-hive-table (b4cb5aa6125fc6b7a4f2dc007ccdcc23db54b8d6) — DatabaseDataLakeSettings.cpp, DatabaseDataLake.cpp, GlueCatalog.{h,cpp}, UnityCatalog.{h,cpp}, docs/en/engines/database-engines/datalake.md, tests/integration/test_database_glue/test.py, tests/integration/test_database_delta/test.py.

Confirmed defects

No confirmed defects in reviewed scope.

Coverage summary

  • Scope reviewed: skip_non_iceberg_tables setting declaration and wiring in DatabaseDataLake; list-time filtering in GlueCatalog::getTablesForDatabase (Iceberg table_type) and UnityCatalog::getTablesForSchema (isReadableDeltaTable); integration tests for Glue (Hive TEXT) and Unity (TEXT + Delta); documentation in datalake.md.

  • Categories failed: (none).

  • Categories passed: Glue filter aligns with existing readability rule (table_type must be ICEBERG, case-insensitive); Unity isReadableDeltaTable mirrors tryGetTableMetadata rules (data_source_format == DELTA, securable_kind in TABLE_DELTA / TABLE_DELTA_EXTERNAL); empty() with limit=1 still returns first readable table after skips; no new shared mutable state or locking; bundled fix for wrong placeholder in Unity data_source_format error message (securable_kinddata_source_format in tryGetTableMetadata).

  • Assumptions/limits: Static review only; CI not executed in this audit. Unity list responses are assumed to include data_source_format for Spark-created Delta tables (integration test encodes this). With the setting enabled, skipped tables remain reachable by explicit name (SHOW CREATE TABLE, existsTable) — intentional per tests, slightly beyond PR text that only mentions SHOW TABLES / system.tables. Setting name says “iceberg” while Unity filters non-Delta tables — documented in PR/changelog, not a code defect.

Expanded review notes (methodology snapshot)

Call graph (in scope)

  1. Config: CREATE DATABASE … SETTINGS skip_non_iceberg_tables=1DatabaseDataLake::getCatalog() passes flag into GlueCatalog / UnityCatalog constructors.
  2. Listing: getCatalog()->getTables() → Glue getTablesForDatabase / Unity getTablesForSchema → filtered names → DatabaseDataLake::getLightweightTablesIterator / getTablesIteratorSHOW TABLES / system.tables (with show_data_lake_catalogs_in_system_tables=1).
  3. Direct access (not filtered): existsTable, SHOW CREATE TABLE, getCreateTableQueryImpl still resolve skipped tables by name; SELECT fails with existing unreadable-table errors (no table_type / ICEBERG for Glue, unsupported data_source_format / DELTA for Unity).

Invariant

When skip_non_iceberg_tables=1, catalog enumeration (getTables and downstream list queries) exposes only tables that would be default-readable under existing per-catalog rules; direct metadata access by known name is unchanged.

Fault categories exercised (logical)

Category Status Outcome
Glue: missing / non-ICEBERG table_type Executed Skipped from lists; still unreadable on SELECT
Unity: non-DELTA data_source_format Executed Skipped from lists; still unreadable on SELECT
Unity: list entry missing both data_source_format and securable_kind Not Applicable OSS UC list includes data_source_format per tested Spark/Delta path
empty() / limit=1 with skips Executed Finds first readable table after filter
Concurrency / shared state Not Applicable Immutable bool set at catalog construction
existsTable vs list inconsistency Executed Intentional; covered by integration tests

Positive incidental change

UnityCatalog::tryGetTableMetadata error for unsupported data_source_format now prints the actual format value instead of securable_kind (pre-existing message bug fixed in the same diff).

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant