Description
Describe the bug
Antalya is experiencing Segmentation Faults (Signal 11) when interacting with the DataLakeCatalog engine connected to an Apache Polaris Iceberg REST catalog. This affects both metadata operations (like SHOW TABLES) and direct data access queries.
The crashes occur in:
- DB::DatabaseDataLake::getLightweightTablesIterator (when enumerating tables)
- DB::DatabaseDataLake::tryGetTableImpl (when resolving table identifiers with namespaces)
ClickHouse pod logs show segmentation fault (Signal 11) errors, and the crashes are reproducible and consistent across multiple attempts.
To Reproduce
Steps to reproduce the behavior:
- Install and configure Polaris (Apache Iceberg REST Catalog) using the official Helm chart from the Apache Polaris repository:
  - Configure Polaris with GCS as the storage backend.
  - Ensure a catalog exists with at least one namespace containing Iceberg tables.
  - The REST Catalog API endpoint should be accessible at http://polaris-service:8181/api/catalog/v1/
- Create the database in ClickHouse:
  SET allow_experimental_database_iceberg = 1;
  CREATE DATABASE IF NOT EXISTS polaris_iceberg
  ENGINE = DataLakeCatalog('http://polaris-service:8181/api/catalog/v1/demo_catalog')
  SETTINGS catalog_type = 'rest',
      catalog_credential = 'root:secret',
      oauth_server_uri = 'http://polaris-service:8181/api/catalog/v1/oauth/tokens',
      warehouse = 'demo_catalog';
- Attempt metadata listing: SHOW TABLES FROM polaris_iceberg
  - Result: Segmentation fault, pod crashes and restarts immediately
  - Stack trace: DB::DatabaseDataLake::getLightweightTablesIterator
- Attempt system table query: SELECT * FROM system.tables WHERE database='polaris_iceberg'
  - Result: Segmentation fault, pod crashes and restarts immediately
  - Stack trace: DB::DatabaseDataLake::getLightweightTablesIterator
- Attempt direct table query with quoted identifier: SELECT * FROM polaris_iceberg.`demo_ns.flights` LIMIT 5
  - Result: Segmentation fault, pod crashes and restarts immediately
  - Stack trace: DB::DatabaseDataLake::tryGetTableImpl -> DB::IdentifierResolver
Note: Before the crashes, we also observed syntax limitations:
- SELECT * FROM polaris_iceberg.demo_ns.flights results in a syntax error at .flights (the parser does not natively support 3-part names without quoting)
- SELECT * FROM polaris_iceberg.flights results in "Table cannot have empty namespace" (correctly identifies the namespace requirement)
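To make the two observations above concrete, here is a minimal sketch of how a quoted identifier such as demo_ns.flights could be split into a namespace and a table name. It is an illustration only, not ClickHouse's implementation: the function name split_namespace is hypothetical, and splitting on the last dot is an assumption chosen so that multi-level namespaces stay intact.

```python
def split_namespace(identifier: str) -> tuple[str, str]:
    """Split 'namespace.table' on the LAST dot (hypothetical helper).

    Splitting on the last dot keeps multi-level namespaces such as
    'a.b' together, so 'a.b.flights' yields ('a.b', 'flights').
    """
    namespace, _, table = identifier.rpartition(".")
    if not namespace:
        # Mirrors the message quoted in the report for polaris_iceberg.flights
        raise ValueError(f"Table cannot have empty namespace: {identifier}")
    if not table:
        raise ValueError(f"Table name is empty: {identifier}")
    return namespace, table
```

Under this model, the quoted form polaris_iceberg.`demo_ns.flights` hands the whole string demo_ns.flights to the database engine, which is why quoting is needed to reach a namespaced table at all.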
Expected behavior
The DataLakeCatalog engine should:
- Successfully enumerate tables from the REST catalog without crashing
- Resolve table identifiers with namespace prefixes (e.g., demo_ns.flights) correctly
- Allow querying of Iceberg tables without segmentation faults
The same Polaris catalog works correctly with Spark and other Iceberg clients, confirming the issue is specific to ClickHouse's DataLakeCatalog implementation.
Key information
- Project Antalya Build Version: altinity/clickhouse-server:25.8.9.20496.altinityantalya
- Cloud provider: Google Cloud Platform (GCP)
- Kubernetes provider: Google Kubernetes Engine (GKE)
- Object storage: Google Cloud Storage (GCS)
- Iceberg catalog: Apache Polaris REST Catalog
  - Catalog endpoint: http://polaris-service:8181/api/catalog/v1/
  - Authentication: Basic auth (root/secret credentials)
  - Storage configuration: GCS bucket as warehouse location (e.g., gs://bucket-name/data)
  - Catalog name: demo_catalog with namespace demo_ns containing Iceberg tables
  - Polaris metadata stored in PostgreSQL (Cloud SQL)
Additional context
Storage Access Details: Iceberg table data is stored in Google Cloud Storage (GCS). ClickHouse would need to access GCS to read table data files, but the crashes occur during metadata operations (REST catalog API calls) before any data file access is attempted. This suggests the issue is in the catalog metadata handling rather than storage access.
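For anyone triaging this, the catalog can be checked independently of ClickHouse by issuing the same kind of REST calls directly. The sketch below is a rough outline under several assumptions: the endpoint, catalog name, and credential are the values from this report; the paths follow the Iceberg REST catalog layout (oauth/tokens, {catalog}/namespaces/{namespace}/tables); and the PRINCIPAL_ROLE:ALL scope is the usual Polaris convention, which may differ in other deployments. All function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Values from the report; adjust for your deployment.
BASE = "http://polaris-service:8181/api/catalog/v1"
CATALOG = "demo_catalog"
CREDENTIAL = "root:secret"


def token_request_body(credential: str, scope: str = "PRINCIPAL_ROLE:ALL") -> str:
    """Form-encode an OAuth2 client-credentials request from 'id:secret'."""
    client_id, client_secret = credential.split(":", 1)
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # assumed Polaris default scope
    })


def tables_url(base: str, catalog: str, namespace: str) -> str:
    """Build the list-tables URL for a (possibly multi-level) namespace."""
    # The Iceberg REST spec joins namespace levels with the 0x1F separator.
    ns = "\x1f".join(namespace.split("."))
    return f"{base.rstrip('/')}/{quote(catalog, safe='')}/namespaces/{quote(ns, safe='')}/tables"


def fetch_token(base: str, credential: str) -> str:
    """POST the credential to the token endpoint and return the access token."""
    req = Request(
        f"{base.rstrip('/')}/oauth/tokens",
        data=token_request_body(credential).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)["access_token"]


def list_tables(base: str, catalog: str, namespace: str, token: str) -> list:
    """Return the table names the catalog reports for one namespace."""
    req = Request(
        tables_url(base, catalog, namespace),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(req, timeout=10) as resp:
        return [item["name"] for item in json.load(resp)["identifiers"]]
```

From a pod that can reach polaris-service, calling list_tables(BASE, CATALOG, "demo_ns", fetch_token(BASE, CREDENTIAL)) should return the table names if the catalog is healthy, which would help separate catalog-side problems from the crash in ClickHouse's metadata handling.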
Workaround: The same Polaris catalog works correctly with Spark and other Iceberg clients, so users can fall back to Spark for Iceberg interactions until this bug is resolved.