Releases: Altinity/clickhouse-operator
release-0.27.1
NOTE: This is a major update to 0.27.0. Starting from 0.27.1, clickhouse-operator is FIPS-enabled.
Added
- secure/insecure flags for CHK to control ports with a single switch
- FIPS-140 security hardening controls in operator configuration: https://github.com/Altinity/clickhouse-operator/blob/0.27.1/docs/security_hardening_fips.md
- FIPS-140 reference configuration: https://github.com/Altinity/clickhouse-operator/blob/0.27.1/docs/fips_setup.md
Fixed
- Fix permanent ArgoCD diff on resourceFieldRef.divisor by @pkieszcz in #1989
- Fix inconsistent secret rendering in autogenerated remote_servers by @realyota in #1992. Closes #1991
- Fixed skipped Delete when scale-to-0 Update fails by @dashashutosh80 in #1993. Closes #1990
- Fixed .status.usedTemplates that were not properly cleaned when templates were removed
- Bump dependent libraries to address CVEs
New Contributors
Full Changelog: release-0.27.0...release-0.27.1
release-0.27.0
Added
- Reference CHK from CHI by a name rather than a service:
  spec:
    configuration:
      zookeeper:
        keeper:
          name: my-keeper
- [EXPERIMENTAL] Pre/post hooks. Allow injecting SQL commands on reconcile events.
  - Supported events: Any, HostCreate, HostUpdate, HostStart, HostStop, HostConfigRestart, HostRollout, HostShutdown, HostDelete
  - target controls where to run SQL: FirstHost (default), AllHosts, AllShards
  - failurePolicy specifies what to do if hook execution fails with an error: Fail (default) or Ignore
  See the CRD for a detailed description. Example:
  reconcile:
    host:
      hooks:
        pre:
          - sql:
              queries:
                - "SYSTEM STOP REPLICATION QUEUES"
            events:
              - HostShutdown
- The operator now automatically restarts an aborted reconcile if the failed pod comes back online after the abort
- Restart operator on operator configuration change. #1960. Closes #1930
- Allow excluding certain metrics from export. Noisy OS/CPU metrics are now excluded by default. #1975. Closes #1876
  Exclusion rules are applied by default and can be changed in operator configuration:
  excludeMetricsRegexp:
    - "^metric\\.(OS.*CPU[0-9]+|CPUFrequencyMHz_[0-9]+)$"
- Helm: Allow readiness and liveness probes for operator containers by @janeklb in #1976
- Helm: Add security context for crdHook to enable security policy compliance by @qlevasseur-genetec in #1950
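The default metrics exclusion rule can be checked locally before overriding it. Below is a minimal Python sketch, assuming the pattern is matched against names of the form metric.<ClickHouse metric name> (as the pattern itself suggests) and that Python's re module handles this particular pattern the same way the operator's regex engine does. The sample metric names are illustrative.

```python
import re

# Default rule from the operator configuration. In YAML the backslash is
# escaped ("\\."), so the effective regular expression is ^metric\.(...)$
EXCLUDE_METRICS_REGEXP = [
    r"^metric\.(OS.*CPU[0-9]+|CPUFrequencyMHz_[0-9]+)$",
]

_EXCLUDE = [re.compile(pattern) for pattern in EXCLUDE_METRICS_REGEXP]


def is_excluded(metric_name: str) -> bool:
    """Return True if any exclusion pattern matches the metric name."""
    return any(pattern.search(metric_name) for pattern in _EXCLUDE)


if __name__ == "__main__":
    for name in [
        "metric.OSUserTimeCPU0",        # per-CPU OS metric
        "metric.CPUFrequencyMHz_3",     # per-CPU frequency metric
        "metric.OSUserTimeNormalized",  # aggregate OS metric, no CPU suffix
        "metric.Query",                 # regular metric
    ]:
        print(name, "excluded" if is_excluded(name) else "kept")
```

Only per-CPU series are dropped; aggregate OS metrics such as OSUserTimeNormalized still pass the filter because the pattern is anchored and requires a CPU index at the end.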
Changed
- Enabled async_replication and use_xid_64 in Keeper default configuration. Requires Keeper 25.3 or above.
- Whitelist Keeper four letter commands by default
- Switched Keeper probes to ruok
- Kubernetes client library has been upgraded to 0.30.14 by @dcoppa in #1970
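The ruok probe uses the ZooKeeper-style four letter command protocol: the client sends the four bytes "ruok", and a server running in a non-error state answers "imok" and closes the connection. Below is a minimal Python sketch of such a check. The host and port are hypothetical, and the operator's actual probe definition may differ. Note that imok only means the server process is healthy; it does not by itself confirm that the node has joined the quorum.

```python
import socket


def keeper_ruok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send the 'ruok' four letter command and check for the 'imok' reply.

    The server replies and then closes the connection, so we read until EOF.
    Any network error (refused, timeout, reset) is reported as "not OK".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while True:
                chunk = sock.recv(64)
                if not chunk:  # server closed the connection
                    break
                reply += chunk
    except OSError:
        return False
    return reply.strip() == b"imok"


if __name__ == "__main__":
    # Hypothetical endpoint: point this at your Keeper client port
    print(keeper_ruok("127.0.0.1", 2181))
```

For a quick manual check, the same command can be sent with any TCP client that writes "ruok" and prints the reply.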
Fixed
- Fix watching ClickHouseOperatorConfiguration in the operator namespace. #1959
- Fix CHK probe crashloop by @hananbs in #1961. Closes #1962
- Fix secret-backed env upgrade rollout. #1966. Closes #1963
- Helm: fix configmap too long by @madrisan in #1949. Closes #1911
- build(deps): bump go.opentelemetry.io/otel/sdk from 1.42.0 to 1.43.0 by @dependabot[bot] in #1953
- Fix: format string issues by @destinyoooo in #1955
- Fix: Prevent CrashLoopBackOff during image upgrade with RollingUpdate (#1926) by @dashashutosh80 in #1956. Closes #1926
- Fixed a bug where a host was not removed from the status fields hostsWithReplicaCaughtUp and hostsWithTablesCreated when it was removed from a cluster
- Prevent nil pointer panic in poll when CR Get fails by @wucm667 in #1974. Closes #1972
- Scope schema discovery to target host's cluster in multi-cluster CHI by @lukas-pfannschmidt-tr in #1965. Closes #1964
- Fix: use plain errors for clickhouse model messages in #1980 by @immanuwell
New Contributors
- @qlevasseur-genetec made their first contribution in #1950
- @destinyoooo made their first contribution in #1955
- @hananbs made their first contribution in #1961
- @wucm667 made their first contribution in #1974
- @immanuwell made their first contribution in #1980
Full Changelog: release-0.26.3...release-0.27.0
release-0.26.3
Fixed
- Prevent CrashLoopBackOff during image upgrade with RollingUpdate #1956 by @dashashutosh80. Closes #1926
- Address CVEs in dependent libraries
Full Changelog: release-0.26.2...release-0.26.3
release-0.26.2
Fixed
- Fixed a race condition when updating ClickHouse configuration and version together with a configuration setting that did not exist in the old version. Closes #1926
- Fixed a bug where some Keeper nodes could be left offline after configuration changes
- Fixed a bug where the operator did not respect watched namespaces for Keeper. Closes #1923
- Fixed potential races in configuration hash calculation. May close #1907
- Updated dependent libraries to address CVE-2026-24051
Changed
- Deleting multi-node CHI and CHK is now much faster
- Added asynchronous_metrics_keeper_metrics_only to default Keeper configuration
Full Changelog: release-0.26.1...release-0.26.2
release-0.26.1
Fixed
- Fixed Keeper startup that was slow due to missing quorum. Closes #1931 and #1856
- Fixed Keeper deletion logic that could previously leave PVCs undeleted
- Fixed hostName generation in status that might result in excessive schema propagation cycles
- Fixed FQDN normalization to prevent trailing-dot inconsistencies between internal hostname representations
- Bump stdlib version to address CVE
- Bump base image version by @Slach in #1941. Closes #1940
- Document custom service template behaviour for CHK by @realyota in #1939
Full Changelog: release-0.26.0...release-0.26.1
release-0.26.0
IMPORTANT: Due to the ClickHouse upstream regression ClickHouse/ClickHouse#89693, DDL queries may not work on newly created ClickHouse pods. It affects Kubernetes deployments only, with some newer ClickHouse versions (25.8.10 and above). The workaround is to restart the ClickHouse pods. The problem is fixed by ClickHouse/ClickHouse#92339; see its backports for the different release branches. The fix is also backported to Altinity Stable 25.8.16.10001.
Closes #1883 and #1913
Added
- Added an option to abort the reconcile if a StatefulSet needs to be recreated. It can be configured in operator configuration or CHI.
  # Reconcile StatefulSet scenario
  reconcile:
    statefulSet:
      recreate:
        # What to do in case operator is in need to recreate StatefulSet?
        # Possible options:
        # 1. abort - abort the process, do nothing with the problematic StatefulSet, leave it as it is,
        #    do not try to fix or delete or update it, just abort reconcile cycle.
        #    Do not proceed to the next StatefulSet(s) and wait for an admin to assist.
        # 2. recreate - proceed and recreate StatefulSet.
        # Triggered when StatefulSet update fails or StatefulSet is not ready
        onUpdateFailure: recreate
- Added an option to configure the system tables used for metrics scraping. The default is the system.metrics and system.custom_metrics tables, but this can be changed with a regular expression if needed:
  tablesRegexp: "^(metrics|custom_metrics)$"
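The anchors in the default tablesRegexp matter. Below is a minimal Python sketch, assuming the pattern is matched against bare table names in the system database (as the default value suggests); the table list is illustrative.

```python
import re

# Default value of tablesRegexp
TABLES_REGEXP = r"^(metrics|custom_metrics)$"


def tables_to_scrape(system_tables: list[str], pattern: str = TABLES_REGEXP) -> list[str]:
    """Return the system table names selected by the regular expression."""
    compiled = re.compile(pattern)
    return [name for name in system_tables if compiled.search(name)]


if __name__ == "__main__":
    tables = ["metrics", "custom_metrics", "asynchronous_metrics", "events"]
    print(tables_to_scrape(tables))              # anchored default pattern
    print(tables_to_scrape(tables, r"metrics"))  # unanchored: also picks asynchronous_metrics
```

Without ^ and $, a pattern such as "metrics" also selects asynchronous_metrics, because an unanchored search matches any table whose name merely contains the substring.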
Changed
- The suspend flag now immediately aborts a running reconcile. Previously, it did not affect a reconcile that was already running
- When the suspend flag is set, any reconcile attempt automatically sets CHI/CHK status to aborted.
- Add optional registry prefix for operator and metrics images in Helm chart by @lesandie in #1928
- Improve ClickHouse Keeper Grafana Dashboard by @discostur in #1872
- Add CRDHook annotations by @eyyu in #1914
- Hotfix crdhook, add imagePullSecrets by @Slach in #1917
- Fix installer to default template URL to OPERATOR_VERSION by @realyota in #1910
- sort keys in Settings.Keys() method for consistent order (fix manifest reconcile issue) by @mastercactapus in #1900
- Multiple documentation fixes
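Sorting keys matters because Go leaves map iteration order unspecified, so rendering settings straight from a map can produce different text on each pass, which looks like a change to anything that compares or hashes the output. Below is a minimal Python sketch of the idea, not the operator's Settings.Keys() code: render_settings, settings_hash and the XML-like output format are hypothetical. Python dicts preserve insertion order, so two equal dicts built in a different order stand in for Go's unordered iteration.

```python
import hashlib


def render_settings(settings: dict[str, str]) -> str:
    """Render settings as XML-like lines in sorted key order (hypothetical format)."""
    return "\n".join(f"<{key}>{settings[key]}</{key}>" for key in sorted(settings))


def settings_hash(settings: dict[str, str]) -> str:
    """Hash the rendered settings; equal settings always give the same hash."""
    return hashlib.sha256(render_settings(settings).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"max_threads": "8", "log_level": "debug"}
    b = {"log_level": "debug", "max_threads": "8"}  # same content, different order
    print(settings_hash(a) == settings_hash(b))  # True
```

Without sorted(), the two dicts would render in different orders and produce different hashes even though a == b.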
Fixed
- Fixed Keeper rolling update logic. Closes #1796 #1915
- Fixed a bug when replica was not added to monitoring until it catches up the replication lag
- Fixed version parsing for FIPS compatible builds of ClickHouse. Closes #1850
- Fixed stop and suspend attributes for CHK that were previously ignored
- Fix distributed_ddl.replicas_path mismatch that could prevent sharing (Zoo)Keeper between multiple clusters by @Elmo33 in #1922
- Fixed a bug where defaults.storageManagement.reclaimPolicy was not respected
- Fixed slow initial connectivity to newly created pods caused by DNS search list exhaustion (ndots:5). Added a trailing dot to the FQDN and increased the connect timeout
- Fixed a bug where reconcile settings specified at CHI level (e.g. spec.reconcile.statefulSet.recreate.onUpdateFailure) were not inherited by cluster-level reconcile configuration
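The ndots:5 fix relies on standard resolv.conf behaviour: a name with fewer than ndots dots is tried with every search domain before being tried as-is, while a name ending in "." is absolute and queried directly. Below is a minimal Python sketch of that lookup order. The pod FQDN is hypothetical, and the search list is the typical one for a pod in namespace "default".

```python
def lookup_candidates(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Order in which a stub resolver tries names (resolv.conf semantics).

    - A name ending with "." is absolute: only that name is queried.
    - A name with at least `ndots` dots is tried as-is first, then with search domains.
    - Otherwise every search domain is tried first and the bare name last.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    # Hypothetical pod FQDN (4 dots, below ndots:5) and a typical pod search list
    search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    fqdn = "chi-demo-demo-0-0.default.svc.cluster.local"
    # The three search-domain expansions are tried before the name itself
    print(lookup_candidates(fqdn, search))
    # With a trailing dot the name is queried directly
    print(lookup_candidates(fqdn + ".", search))
```

The cluster-internal FQDN has only four dots, so without the trailing dot each connection first pays for several failed lookups.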
Other
- stdlib has been upgraded to 1.25.6 to address CVEs
- Operator has been certified for 25.8.16.10001 Altinity.Stable.
New Contributors
- @rajdudhare1 made their first contribution in #1890
- @siggy made their first contribution in #1906
- @eyyu made their first contribution in #1914
- @Elmo33 made their first contribution in #1922
- @mastercactapus made their first contribution in #1900
- @lesandie made their first contribution in #1928
Full Changelog: release-0.25.6...release-0.26.0
release-0.25.6
Changed
- Hosts are no longer excluded from remote_servers if a restart is not needed. Previously, replicas in a replicated cluster might be removed for a short time even if a restart was not needed.
- configuration.zookeeper section changes no longer require a restart
- The last reconciliation error and the list of errors are now stored in CHI status
Fixed
- actionPlan is now optional in status. That fixes operator upgrade problems that might happen in some environments.
- Fixed excessive reconciles triggered by endpoint slices. Closes #1873
- Fixed crash in CHK that might happen sometimes. Closes #1863
- Fixed an issue with handling fractional requests/limits that would result in excessive reconcile. Closes #1849 #1821
- Fixed a bug with operator crash in Terminating namespace. Closes #1871
- stdlib was upgraded in order to address CVEs in dependent libraries
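The note does not say how fractional requests/limits caused the extra reconciles. One common source of this kind of spurious diff is that Kubernetes stores resource quantities in canonical form: a CPU request of 0.5, for example, is stored as 500m, so comparing the strings reports a change that is not there. Below is a minimal Python sketch of value-based comparison. It is an illustration, not the operator's Go code, and it handles only the suffixes listed (P and E are omitted).

```python
from decimal import Decimal

# Suffix multipliers for Kubernetes resource quantities (binary and decimal SI).
# Two-letter binary suffixes come first so they are checked before one-letter ones.
_MULTIPLIERS = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Parse a subset of Kubernetes quantity syntax into an exact Decimal."""
    quantity = quantity.strip()
    for suffix, multiplier in _MULTIPLIERS.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * multiplier
    return Decimal(quantity)


def same_quantity(a: str, b: str) -> bool:
    """Compare quantities by value instead of by their string form."""
    return parse_quantity(a) == parse_quantity(b)


if __name__ == "__main__":
    print("0.5" == "500m", same_quantity("0.5", "500m"))
    print(same_quantity("1Gi", "1024Mi"), same_quantity("1G", "1Gi"))
```

Decimal keeps the arithmetic exact, so 0.5 and 500m compare equal without floating-point rounding.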
NOTE: Due to a regression in upstream ClickHouse (ClickHouse/ClickHouse#89693), schema propagation and DDL statements do not work with ClickHouse 25.8.10 and newer until it is resolved.
Full Changelog: release-0.25.5...release-0.25.6
release-0.25.5
Added
- The latest applied ActionPlan is now stored in chi-storage ConfigMap
- volumeClaimTemplate.volumeAttributeClass attribute. Closes #1818
- Configuration for DROP REPLICA behavior:
  reconcile:
    host:
      drop:
        replicas:
          # Whether the operator during reconcile procedure should drop replicas when replica is deleted
          onDelete: yes
          # Whether the operator during reconcile procedure should drop replicas when replica volume is lost
          onLostVolume: yes
          # Whether the operator during reconcile procedure should drop active replicas when replica is deleted or recreated
          active: no
  Active replicas are now never dropped. That fixes a potential bug where a replica could be dropped on a multi-volume node if a newly added volume was not yet available.
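The three DROP REPLICA switches combine roughly as follows. Below is a minimal Python sketch that reads the YAML comments literally: DropReplicasPolicy and should_drop_replica are hypothetical names, and the decision logic is an illustration of the documented defaults, not the operator's implementation.

```python
from dataclasses import dataclass


@dataclass
class DropReplicasPolicy:
    """Mirrors reconcile.host.drop.replicas (hypothetical field names)."""
    on_delete: bool = True       # onDelete: yes
    on_lost_volume: bool = True  # onLostVolume: yes
    active: bool = False         # active: no


def should_drop_replica(policy: DropReplicasPolicy, *, deleted: bool,
                        lost_volume: bool, replica_active: bool) -> bool:
    """Decide whether to run DROP REPLICA for a host (illustrative logic only)."""
    if replica_active and not policy.active:
        # Active replicas are never dropped unless explicitly allowed
        return False
    return (deleted and policy.on_delete) or (lost_volume and policy.on_lost_volume)


if __name__ == "__main__":
    defaults = DropReplicasPolicy()
    print(should_drop_replica(defaults, deleted=True, lost_volume=False, replica_active=False))
    print(should_drop_replica(defaults, deleted=True, lost_volume=False, replica_active=True))
```

Checking the active flag first makes the safety rule independent of the other two switches: with active: no, neither a delete nor a lost volume can trigger DROP REPLICA for a replica that is still active.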
Changed
- Enabled ReadinessProbe for Keeper. Closes #1846
- Enabled query_log for all DDL statements performed by operator
- minor logging improvements by @janeklb in #1829
Fixed
- Removed excessive isUpdatedEndpointSlice():unknown logging
- Changed metrics collection query that could be broken in 0.25.4 for old ClickHouse versions
Helm updates
- fix: correctly delete existing helm release artifacts by appending asset ID to URL by @Slach in #1827
- Updated Helm repo location to https://helm.altinity.com by @DougTidwell in #1835
- Helm: add metrics.enabled conditional checks for serviceMonitor by @KJone1 in #1834
- Hotfix helm schema by @Slach in #1842
New Contributors
Full Changelog: release-0.25.4...release-0.25.5
release-0.25.4
Added
- Operator configuration 'reconcile' section is now fully supported at CHI level under both 'reconcile' and old 'reconciling' name. Previously, only selected settings were available at CHI level.
- Allow excluding namespaces from those the operator watches by @AdheipSingh in #1770
  spec:
    watch:
      namespaces:
        include: []
        exclude: [] # new
- Option to choose which probes the operator waits for during reconcile. Previously, it always waited for the pod to be ready. This can now be configured in the 'reconcile' section of the operator configuration or CHI:
  spec:
    reconcile:
      host:
        wait:
          probes:
            startup: "yes"
            readiness: "no"
- ZooKeeper compression support by @wilkermichael in #1809
- Support watching multiple namespaces in OLM by @saeedhosseini21 in #1825. Closes #1491
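The release note does not spell out how include and exclude interact. Below is a minimal Python sketch under stated assumptions: an empty include list means every namespace, entries are treated as full-match regular expressions (a plain name such as "prod" simply matches itself), and exclude takes precedence over include. Check the operator documentation before relying on these semantics.

```python
import re


def is_watched(namespace: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a namespace is watched (assumed semantics, see above)."""

    def matches(patterns: list[str]) -> bool:
        return any(re.fullmatch(pattern, namespace) for pattern in patterns)

    # Assumption: an empty include list means "watch all namespaces"
    if include and not matches(include):
        return False
    # Assumption: exclusion wins over inclusion
    return not matches(exclude)


if __name__ == "__main__":
    print(is_watched("team-a", include=["team-.*"], exclude=["team-b"]))
    print(is_watched("team-b", include=["team-.*"], exclude=["team-b"]))
```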
Changed
- The system.custom_metrics table is now scraped for monitoring in addition to metrics and asynchronous_metrics. That allows injecting custom monitoring data from the ClickHouse side.
- The deprecated Endpoints API has been replaced with EndpointSlice. Closes #1801
Fixed
- Fixed a bug with long environment variables used for secrets being truncated. Closes #1804
- Fixed a bug that operator did not respect watched namespaces for CHK
Helm updates
- Define values.schema.json by @Slach in #1815. Closes #1814
- Added clickhouse-operator deployment strategy parameters to Helm chart by @Slach in #1789
- Publish operator helm chart to helm.altinity.com in addition to artifacthub.io
Full Changelog: release-0.25.3...release-0.25.4
release-0.25.3
Added
- Added support for pdbMaxUnavailable in CHK
- Added .spec.configuration.clusters[].pdbManaged for CHI and CHK that allows setting external PDBs by @zrudzionis in #1768
Fixed
- Add ZK error handling and logging by @wilkermichael in #1762
- Fixed collision between PDBs for CHI and CHK with the same name. As a side effect, the PDB for CHI will be re-created during this upgrade
- Fixed rare panic in buildCRFromObj() when it cannot find the CR
- Fixed update of storage configmap that might block in rare cases. Closes #1781
- Fixed the check for whether a host should be included in remote_servers, which did not work correctly in some network configurations. Closes #1782
Helm updates
- feat(helm): add priority class to helm chart by @nobletooth in #1774
- feat(helm): publish as an OCI helm package as well by @ogirardot in #1779
- There is no 'helm repo upgrade', should be 'update' by @CaptTofu in #1780
- Hotfix port names, to avoid warning during helm install by @Slach in #1784
New Contributors
- @wilkermichael made their first contribution in #1762
- @zrudzionis made their first contribution in #1768
- @nobletooth made their first contribution in #1774
- @ogirardot made their first contribution in #1779
- @CaptTofu made their first contribution in #1780
Full Changelog: release-0.25.2...release-0.25.3