Releases: longhorn/longhorn
Longhorn v1.12.0
Longhorn v1.12.0 Release Notes
The Longhorn team is excited to announce the release of Longhorn v1.12.0. This feature release marks a major milestone for Longhorn: the V2 Data Engine is now officially Generally Available (GA).
With the V2 Data Engine reaching GA, Longhorn v1.12.0 strengthens the production story for modern workloads with topology-aware provisioning, dual-stack and V2 IPv6 support, improved observability and operational tooling, and clearer guidance around V1 and V2 behavior and feature parity.
For terminology and background on Longhorn releases, see Releases.
Removal
V2 Backing Image Removal
V2 Backing Images are removed in Longhorn v1.12.0. To achieve the same purpose, use the Containerized Data Importer (CDI) to import VM disk images into V2 volumes.
If you have V2 volumes that were created from backing images, you must migrate them before upgrading to v1.12.0:
- Back up and recreate (recommended): Create a backup of the V2 volume, delete the original volume, then restore it from the backup. The restored volume will not have a backing image dependency.
- Delete the volume: If the data is not needed, delete the V2 volume directly.
V2 volumes with backing image dependencies cannot be upgraded in-place. Attempting to upgrade without migration may result in volume attachment failures.
Primary Highlights
V2 Data Engine
Generally Available
We are pleased to announce that the V2 Data Engine has officially graduated to General Availability in Longhorn v1.12.0.
This milestone reflects major progress in stability, operational safety, networking support, and feature maturity. Compared with earlier releases, V2 volumes are better positioned for production use, combining GA readiness with modern networking support, more precise scheduling behavior, and clearer visibility into where V2 already matches V1 behavior and where differences still matter.
Important
V2 Live Upgrade:
V2 volumes do not support live upgrades between Longhorn v1.12 patch releases and must be detached before upgrading. Live upgrade support is planned for upgrades from a Longhorn v1.12 release to a Longhorn v1.13 release.
V2 Volume Attach Latency at Scale:
In environments with a growing number of attached V2 volumes, increased attach latency has been observed for subsequent volumes. Initial analysis suggests this may be related to NVMe-TCP connection handling at scale, though the precise layer, SPDK user-space or Linux kernel, has not yet been identified. Further investigation is in progress. For follow-up status, see Issue #13241.
ARM64 NVMe-backed Block-Type Node Disk Limitation:
On ARM64 systems, V2 volumes may experience stuck I/O when SPDK is configured with two or more CPU cores and node disks use the NVMe driver. The root cause may lie in either the Linux kernel or SPDK itself, and further investigation is required. As a workaround, use AIO-backed node disks instead of NVMe-backed node disks on ARM64 systems. For follow-up status, see Issue #13243.
For a summary of the current V1 and V2 volume behavior differences and feature parity, see V1 and V2 Volume Feature Support.
Looking ahead, the roadmap remains active: fast volume cloning for V2 data engine (#12552) and Sharding Storage (Experimental Feature) (#1061) are planned for Longhorn v1.12.1.
Smarter Provisioning and Modern Networking
Topology-Aware PV Node Affinity Control
Longhorn v1.12.0 adds the csi-allowed-topology-keys setting and strictTopology StorageClass parameter for more precise control of PV nodeAffinity. These options allow users to limit which topology keys are propagated and, with WaitForFirstConsumer, pin the PV to the selected node topology when needed.
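As a rough illustration of limiting which topology keys are propagated, the Python sketch below keeps only the allowed keys from a node's topology labels before they would be used for PV nodeAffinity. The function name, the label values, and the choice to treat an empty allow-list as "no filtering" are assumptions made for illustration; this is not Longhorn's implementation.

```python
def filter_topology(node_labels, allowed_keys):
    """Return only the topology segments whose keys are in allowed_keys.

    Assumption: an empty allow-list means "propagate everything".
    """
    if not allowed_keys:
        return dict(node_labels)
    allowed = set(allowed_keys)
    return {key: value for key, value in node_labels.items() if key in allowed}


# Hypothetical node labels.
labels = {
    "topology.kubernetes.io/region": "us-east",
    "topology.kubernetes.io/zone": "us-east-1a",
    "kubernetes.io/hostname": "worker-3",
}

# Only the zone key would reach the PV nodeAffinity in this model.
print(filter_topology(labels, ["topology.kubernetes.io/zone"]))
```

Restricting the propagated keys to the zone, for example, would keep a PV from being tied to a single hostname while still constraining it to one zone.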
IPv6 Support for V2 Volumes
V2 volumes now support single-stack IPv6 Kubernetes clusters.
Dual-Stack Cluster Support
Longhorn now supports dual-stack Kubernetes clusters when all nodes are configured with their IP families in the same order, either all IPv4-first or all IPv6-first. This applies to both the V1 and V2 data engines.
Warning: Dual-stack clusters with mixed IP family ordering across nodes are not supported and may result in connectivity failures between replicas and the engine.
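The ordering requirement can be checked before installation with something like the following Python sketch. It assumes you have already collected each node's IP addresses in the order the node reports them; the node names, addresses, and helper names are hypothetical.

```python
import ipaddress


def family_order(addresses):
    """Return the IP family sequence, such as (4, 6), for a node's addresses."""
    return tuple(ipaddress.ip_address(addr).version for addr in addresses)


def consistent_family_order(nodes):
    """Return True when every node lists its IP families in the same order."""
    orders = {family_order(addrs) for addrs in nodes.values()}
    return len(orders) <= 1


# Hypothetical clusters: the first is IPv4-first everywhere, the second is mixed.
ipv4_first = {
    "node-a": ["10.0.0.1", "fd00::1"],
    "node-b": ["10.0.0.2", "fd00::2"],
}
mixed = {
    "node-a": ["10.0.0.1", "fd00::1"],
    "node-b": ["fd00::2", "10.0.0.2"],
}

print(consistent_family_order(ipv4_first))
print(consistent_family_order(mixed))
```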
Better Operations and Observability
Default CPU Allocation
Longhorn v1.12.0 changes the default data-engine-cpu-mask from 0x1, one CPU core, to 0x3, two CPU cores. V2 Data Engine uses a busy-polling reactor model where the master reactor handles both I/O polling and management RPCs. When only a single core is assigned, heavy I/O workloads can delay or starve RPC processing, resulting in increased latency, timeout events, and operational instability.
Assigning two or more cores allows I/O and management tasks to run on separate reactors, improving responsiveness and operational stability.
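To make the mask values concrete: each set bit in the mask selects one CPU core, so 0x1 selects core 0 only and 0x3 selects cores 0 and 1. The Python sketch below converts a CPU list to a hex mask and counts the cores in a mask. The accepted list syntax (comma-separated entries with optional ranges) and the helper names are assumptions for illustration, not the exact format Longhorn parses.

```python
def cpu_list_to_mask(cpu_list):
    """Convert a CPU list such as "0,1" or "0-2,4" to a hex mask string."""
    mask = 0
    for part in cpu_list.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            for cpu in range(start, end + 1):
                mask |= 1 << cpu
        else:
            mask |= 1 << int(part)
    return hex(mask)


def core_count(mask):
    """Count the CPU cores selected by a hex mask string."""
    return bin(int(mask, 16)).count("1")


print(cpu_list_to_mask("0"))      # 0x1, the previous default
print(cpu_list_to_mask("0,1"))    # 0x3, the new default
print(core_count("0x3"))          # 2
```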
On-Demand Snapshot Checksum Calculation
Longhorn v1.12.0 adds longhornctl support for triggering on-demand snapshot checksum calculation. The command can target a specific volume, all volumes on a specific node, or all volumes in the cluster, and the checksum operation runs asynchronously in the background.
Toggle Kubernetes Metrics Server Integration
Longhorn v1.12.0 adds the Kubernetes Metrics Server Metrics Enabled setting to disable metrics-server-dependent metrics when the Kubernetes Metrics Server API is unavailable. This reduces repeated scrape warnings and unnecessary API calls while preserving other Longhorn metrics.
Longhorn Manager Memory Optimization
Longhorn v1.12.0 optimizes longhorn-manager informer caching to reduce memory usage, especially in large clusters with high pod counts. This lowers cluster-wide memory overhead caused by repeated caching of non-Longhorn pod data on every manager instance.
Configurable Engine Image Pod Liveness Probe
Longhorn v1.12.0 adds settings to configure the engine-image DaemonSet liveness probe period, timeout, and failure threshold. These settings help reduce unnecessary engine-image pod restarts on resource-constrained clusters, especially during upgrades or transient CPU spikes.
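The effect of these three values can be estimated with a simplified model of Kubernetes liveness probing, in which a container is restarted after the failure threshold is reached by consecutive failed probes. The Python sketch below is an approximation under that assumption; the function name and the numbers are hypothetical and are not Longhorn defaults.

```python
def tolerated_stall_seconds(period_s, timeout_s, failure_threshold):
    """Roughly how long an unresponsive pod is tolerated before a restart.

    Simplified model: probes start every period_s seconds, each failing
    probe takes up to timeout_s seconds to time out, and the restart is
    triggered by the failure_threshold-th consecutive failure.
    """
    return period_s * (failure_threshold - 1) + timeout_s


# Hypothetical settings: a tight probe versus a more forgiving one.
print(tolerated_stall_seconds(5, 4, 3))    # 14
print(tolerated_stall_seconds(10, 8, 6))   # 58
```

In this model, a CPU spike that stalls the pod for 30 seconds would trigger a restart under the first configuration but not under the second.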
Critical Stability Fixes
Instance Manager Stability During Replica Rebuild Storms
Longhorn v1.12.0 fixes an instance-manager panic that could occur during replica rebuild storms. In affected environments, the panic could terminate all iSCSI targets served by the instance-manager and trigger cascading volume detachments across multiple PVCs.
Replica Rebuild Progress Reporting
Longhorn v1.12.0 fixes a replica rebuild progress reporting bug that could display values greater than 100% after file-sync retries on unstable networks. Progress accounting is now reset correctly for retried files, so rebuild progress remains within the valid 0% to 100% range.
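The accounting problem can be sketched as follows: if the bytes already counted for a file are not reset when that file is retried, they are counted twice and the total exceeds 100%. The Python class below is a minimal illustration of per-file accounting with a reset on retry; the class, its method names, and the file sizes are hypothetical and do not mirror Longhorn's code.

```python
class RebuildProgress:
    """Track rebuild progress per file so retries are not double-counted."""

    def __init__(self, file_sizes):
        self.file_sizes = dict(file_sizes)
        self.synced = {name: 0 for name in self.file_sizes}

    def add(self, name, nbytes):
        # Cap at the file size so one file can never contribute more than 100%.
        self.synced[name] = min(self.synced[name] + nbytes, self.file_sizes[name])

    def retry(self, name):
        # Reset the counter: the file will be transferred again from the start.
        self.synced[name] = 0

    def percent(self):
        total = sum(self.file_sizes.values())
        if total == 0:
            return 100
        return 100 * sum(self.synced.values()) // total


progress = RebuildProgress({"snap-a.img": 100, "snap-b.img": 100})
progress.add("snap-a.img", 100)
progress.add("snap-b.img", 60)
progress.retry("snap-b.img")      # unstable network: the file sync restarts
progress.add("snap-b.img", 100)
print(progress.percent())         # 100
```

Without the reset in `retry`, the 60 bytes from the first attempt would be added to the 100 bytes from the second, giving 130%.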
Replica Auto-Balance Scheduling Loop
Longhorn v1.12.0 fixes a regression in replica auto-balance that could trigger a repeated replica create-and-delete loop when Replica Auto Balance was set to best-effort. In affected clusters, Longhorn could keep scheduling an extra replica instead of stabilizing at the configured replica count.
Replica CR Leak During Failed Local Scheduling
Longhorn v1.12.0 fixes a replica scheduling issue where large numbers of stopped Replica CRs could accumulate when dataLocality was set to best-effort and the node did not have enough eligible local disk space for another replica. In affected clusters, recurring reconciliation could keep creating placeholder Replica CRs instead of reusing a single failed-schedule placeholder.
CSI Storage Capacity Tracking
Longhorn v1.12.0 fixes a CSIStorageCapacity scheduling issue that could cause compute nodes without Longhorn disks to report zero capacity and be rejected by WaitForFirstConsumer scheduling. In affected clusters with separated compute and storage nodes, new PVCs could remain pending even though eligible storage was available on storage nodes.
Encrypted Volume Size Correction
Longhorn v1.12.0 pre-allocates the 16 MiB LUKS2 header in the replica backend file for encrypted volumes, so the dm-...
Longhorn v1.12.0-rc4
DON'T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
Resolved Issues in this release
Highlight
- [FEATURE] Decouple V2 Data Engine Initiator and Target Placement 7124 - @derekbit @shuo-wu @chriscchien
- [FEATURE] IPv6 for V2 Data Engine 10928 - @COLDTURNIP @chriscchien
- [FEATURE] Support IPv4/IPv6 Dual-Stack with IPv6 Family First or IPv4 Family First 11531 - @COLDTURNIP @c3y1huang @chriscchien
- [FEATURE] Support v2 Data Engine (GA) 6229 - @derekbit
Feature
- [FEATURE] Support on-demand snapshot checksum calculation 11442 - @yangchiu @davidcheng0922
- [FEATURE] Add --tolerations flag to longhornctl for scheduling DaemonSet pods on tainted nodes 12993 - @chriscchien @bachmanity1
Improvement
- [IMPROVEMENT] Remove v2 backing image monitoring 13181 - @COLDTURNIP @derekbit @chriscchien
- [IMPROVEMENT] Wait for spdk_tgt process to terminate during pre-stop cleanup 13179 - @derekbit @chriscchien
- [IMPROVEMENT] Restart Instance Manager pod when hugepage settings change and no instances are running 13170 - @derekbit @chriscchien
- [IMPROVEMENT] Support CPU list format for V2 Data Engine CPU Mask setting with automatic conversion to hex mask 13166 - @derekbit @chriscchien
- [IMPROVEMENT] Update Longhorn distro in chart to longhorn 13160 - @derekbit @chriscchien
- [IMPROVEMENT] Misleading storage values 12633 - @elTwingo @davidcheng0922 @houhoucoop @roger-ryao
- [IMPROVEMENT] Implement Network Reconnection for Enhancing Replica Rebuilding Resilience 9626 - @yangchiu @mschneider82
- [IMPROVEMENT] Add support of new StorageClass parameters to helm chart 9324 - @yangchiu @TheFutonEng
- [IMPROVEMENT] Make Kubernetes Metrics Server (metrics.k8s.io) integration toggleable 13011 - @yangchiu @mantissahz @hookak
- [IMPROVEMENT] Reduce longhorn-manager memory usage by optimizing cluster-wide informer caching 12771 - @hookak @roger-ryao
- [IMPROVEMENT] Topology-aware PV nodeAffinity control: allowedTopologies keys + strictTopology 12684 - @hookak @roger-ryao
- [IMPROVEMENT] Set storage class annotations using helm values 13137 - @yangchiu @Profiidev
- [IMPROVEMENT] longhorn-manager pods race on webhook TLS Secret at scale 13012 - @yangchiu @hookak
- [IMPROVEMENT] Improve Longhorn auto-salvage observability 13018 - @yangchiu @derekbit
- [IMPROVEMENT] Removing Scheduled condition check during volume expansion 12606 - @yangchiu @davidcheng0922
- [IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12396 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] Move v2 volume backup restore from replica to engine 9277 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Is there any way to have longhorn without python 12679 - @roger-ryao
- [IMPROVEMENT] sparse-tools APIs must not introduce breaking changes to existing APIs. 12967 - @yangchiu @derekbit
- [UI][IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12922 - @chriscchien @houhoucoop
- [IMPROVEMENT] Add Backup Target to volume list custom column options 12619 - @yangchiu @houhoucoop
- [IMPROVEMENT] Allow disabling creation of the default longhorn StorageClass via Helm 12906 - @hookak @roger-ryao
- [IMPROVEMENT][TEST] Add unit tests for util parsing and string conversion helpers 12898 - @archy-rock3t-cloud @chriscchien
- [IMPROVEMENT] Metrics for backups 11387 - @yangchiu @mantissahz @Copilot
- [IMPROVEMENT] chart: allow specifying spec.sampleLimit on ServiceMonitor 12671 - @grelland @yangchiu
- [IMPROVEMENT] Add metrics for non-Encrypted and encrypted volumes 12462 - @derekbit @mantissahz @chriscchien @Copilot
- [IMPROVEMENT] Clarify helm version in generate-longhorn-yaml error message 12630 - @luojiyin1987 @chriscchien
- [IMPROVEMENT][UI] Link version number to git releases 11132 - @chriscchien @houhoucoop
- [IMPROVEMENT] Record the current share manager image in the Share Manager CR status 11203 - @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Snapshot tree color explanation 12247 - @houhoucoop
- [IMPROVEMENT] Refuse to attach strict-local volume to the wrong node 8546 - @yangchiu @derekbit @mantissahz @Copilot
- [IMPROVEMENT] Ensure V2 Engine ReplicaAdd respects the fast-replica-rebuild-enabled setting 12540 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Relax endpoint-network-for-rwx-volume validation for migratable block-mode volumes 12644 - @c3y1huang @chriscchien
- [IMPROVEMENT] detailed log for the reason of node controller deleting backing image copies 12584 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] RBAC permissions for csi-resizer 12681 - @yangchiu @konstantin-kelemen
- [IMPROVEMENT] Adding a message to hint users to clean up non-existing disks in Backing Image CR 10617 - @chriscchien @Copilot
- [IMPROVEMENT] Keep workload pod in the original zone and region 12517 - @bachmanity1
- [IMPROVEMENT] Consider node storage capacity when scheduling pods with existing PVs 12398 - @bachmanity1
- [IMPROVEMENT] Volume may enter faulty state without clear reason when backing image size mismatches 11673 - @COLDTURNIP @derekbit @roger-ryao @Copilot
Bug
- [BUG] nil pointer dereference panic in instance-manager during replica rebuild. 13087 - @derekbit @shuo-wu @roger-ryao
- [BUG] v2 RWX workload IO timed out after Longhorn components are deleted and restarted 13217 - @yangchiu @derekbit
- [BUG] v2 volume gets stuck in Degraded state after instance manager is deleted and restarted 13215 - @derekbit
- [BUG] Test Encrypted Volume Upgrade: Old-engine RWO volume shows 1008 MiB after expansion to 2 GiB instead of expected 2032 MiB 13194 - @derekbit @mantissahz @roger-ryao
- [BUG] v2 volume deletion clears Spec.NodeID before delete, potentially orphaning replicas when Status.InstanceManagerName is empty 13198 - @derekbit @chriscchien
- [BUG] Backup target still shows Available after being reset to empty 13195 - @yangchiu @derekbit
- [BUG] v2 instance-manager pod stuck in create/delete loop when engine frontend recovery blocks gRPC startup 13185 - @derekbit @chriscchien
- [BUG] global.cattle.systemDefaultRegistry is not applied as the image registry prefix in 108.2.1+up1.10.2 13071 - @COLDTURNIP @yangchiu
- [BUG] Potential resource leak in longhorn-instance-manager 13143 - @derekbit @chriscchien
- [BUG] Encrypt volume provided size is 16MB shorter than the claimed size 9205 - @mantissahz @roger-ryao
- [BUG] CSIStorageCapacity reports 0 for compute nodes without Longhorn disks, breaking WaitForFirstConsumer scheduling 12807 - @bachmanity1 @roger-ryao
- [BUG] Google Cloud Storage (GCS) backup target always fails with SignatureDoesNotMatch due to AWS SDK Go v2 CRC32 checksum incompatibility 12676 - @mantissahz @chriscchien
- [BUG] Longhorn Fails to enable volume security on FIPS enabled systems 12721 - @davidcheng0922 @c...
Longhorn v1.12.0-rc3
DON'T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
Resolved Issues in this release
Highlight
- [FEATURE] Decouple V2 Data Engine Initiator and Target Placement 7124 - @derekbit @shuo-wu @chriscchien
- [FEATURE] IPv6 for V2 Data Engine 10928 - @COLDTURNIP @chriscchien
- [FEATURE] Support IPv4/IPv6 Dual-Stack with IPv6 Family First or IPv4 Family First 11531 - @COLDTURNIP @c3y1huang @chriscchien
- [FEATURE] Support v2 Data Engine (GA) 6229 - @derekbit
Feature
- [FEATURE] Support on-demand snapshot checksum calculation 11442 - @yangchiu @davidcheng0922
- [FEATURE] Add --tolerations flag to longhornctl for scheduling DaemonSet pods on tainted nodes 12993 - @chriscchien @bachmanity1
Improvement
- [IMPROVEMENT] Remove v2 backing image monitoring 13181 - @COLDTURNIP @derekbit @chriscchien
- [IMPROVEMENT] Wait for spdk_tgt process to terminate during pre-stop cleanup 13179 - @derekbit @chriscchien
- [IMPROVEMENT] Restart Instance Manager pod when hugepage settings change and no instances are running 13170 - @derekbit @chriscchien
- [IMPROVEMENT] Support CPU list format for V2 Data Engine CPU Mask setting with automatic conversion to hex mask 13166 - @derekbit @chriscchien
- [IMPROVEMENT] Update Longhorn distro in chart to longhorn 13160 - @derekbit @chriscchien
- [IMPROVEMENT] Misleading storage values 12633 - @elTwingo @davidcheng0922 @houhoucoop @roger-ryao
- [IMPROVEMENT] Implement Network Reconnection for Enhancing Replica Rebuilding Resilience 9626 - @yangchiu @mschneider82
- [IMPROVEMENT] Add support of new StorageClass parameters to helm chart 9324 - @yangchiu @TheFutonEng
- [IMPROVEMENT] Make Kubernetes Metrics Server (metrics.k8s.io) integration toggleable 13011 - @yangchiu @mantissahz @hookak
- [IMPROVEMENT] Reduce longhorn-manager memory usage by optimizing cluster-wide informer caching 12771 - @hookak @roger-ryao
- [IMPROVEMENT] Topology-aware PV nodeAffinity control: allowedTopologies keys + strictTopology 12684 - @hookak @roger-ryao
- [IMPROVEMENT] Set storage class annotations using helm values 13137 - @yangchiu @Profiidev
- [IMPROVEMENT] longhorn-manager pods race on webhook TLS Secret at scale 13012 - @yangchiu @hookak
- [IMPROVEMENT] Improve Longhorn auto-salvage observability 13018 - @yangchiu @derekbit
- [IMPROVEMENT] Removing Scheduled condition check during volume expansion 12606 - @yangchiu @davidcheng0922
- [IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12396 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] Move v2 volume backup restore from replica to engine 9277 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Is there any way to have longhorn without python 12679 - @roger-ryao
- [IMPROVEMENT] sparse-tools APIs must not introduce breaking changes to existing APIs. 12967 - @yangchiu @derekbit
- [UI][IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12922 - @chriscchien @houhoucoop
- [IMPROVEMENT] Add Backup Target to volume list custom column options 12619 - @yangchiu @houhoucoop
- [IMPROVEMENT] Allow disabling creation of the default longhorn StorageClass via Helm 12906 - @hookak @roger-ryao
- [IMPROVEMENT][TEST] Add unit tests for util parsing and string conversion helpers 12898 - @archy-rock3t-cloud @chriscchien
- [IMPROVEMENT] Metrics for backups 11387 - @yangchiu @mantissahz @Copilot
- [IMPROVEMENT] chart: allow specifying spec.sampleLimit on ServiceMonitor 12671 - @grelland @yangchiu
- [IMPROVEMENT] Add metrics for non-Encrypted and encrypted volumes 12462 - @derekbit @mantissahz @chriscchien @Copilot
- [IMPROVEMENT] Clarify helm version in generate-longhorn-yaml error message 12630 - @luojiyin1987 @chriscchien
- [IMPROVEMENT][UI] Link version number to git releases 11132 - @chriscchien @houhoucoop
- [IMPROVEMENT] Record the current share manager image in the Share Manager CR status 11203 - @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Snapshot tree color explanation 12247 - @houhoucoop
- [IMPROVEMENT] Refuse to attach strict-local volume to the wrong node 8546 - @yangchiu @derekbit @mantissahz @Copilot
- [IMPROVEMENT] Ensure V2 Engine ReplicaAdd respects the fast-replica-rebuild-enabled setting 12540 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Relax endpoint-network-for-rwx-volume validation for migratable block-mode volumes 12644 - @c3y1huang @chriscchien
- [IMPROVEMENT] detailed log for the reason of node controller deleting backing image copies 12584 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] RBAC permissions for csi-resizer 12681 - @yangchiu @konstantin-kelemen
- [IMPROVEMENT] Adding a message to hint users to clean up non-existing disks in Backing Image CR 10617 - @chriscchien @Copilot
- [IMPROVEMENT] Keep workload pod in the original zone and region 12517 - @bachmanity1
- [IMPROVEMENT] Consider node storage capacity when scheduling pods with existing PVs 12398 - @bachmanity1
- [IMPROVEMENT] Volume may enter faulty state without clear reason when backing image size mismatches 11673 - @COLDTURNIP @derekbit @roger-ryao @Copilot
Bug
- [BUG] Potential resource leak in longhorn-instance-manager 13143 - @derekbit @chriscchien
- [BUG] nil pointer dereference panic in instance-manager during replica rebuild. 13087 - @derekbit @shuo-wu @roger-ryao
- [BUG] Encrypt volume provided size is 16MB shorter than the claimed size 9205 - @mantissahz @roger-ryao
- [BUG] CSIStorageCapacity reports 0 for compute nodes without Longhorn disks, breaking WaitForFirstConsumer scheduling 12807 - @bachmanity1 @roger-ryao
- [BUG] Google Cloud Storage (GCS) backup target always fails with SignatureDoesNotMatch due to AWS SDK Go v2 CRC32 checksum incompatibility 12676 - @mantissahz @chriscchien
- [BUG] Longhorn Fails to enable volume security on FIPS enabled systems 12721 - @davidcheng0922 @chriscchien
- [BUG] Replica Auto-Balance Causes Infinite Replica Scheduling Loop 12926 - @yangchiu @shuo-wu
- [BUG] Replica rebuild progress can go over 100% 12949 - @yangchiu @mschneider82 @davidcheng0922
- [BUG] v2 backup/restore open failure paths can leak NVMe initiators and exposed bdevs 13114 - @derekbit @roger-ryao
- [BUG] Connection leak in longhorn-spdk-engine 13101 - @derekbit @roger-ryao @Copilot
- [BUG] HTTP response body leaks in support bundle status polling and webhook readiness checks 13115 - @derekbit @roger-ryao
- [BUG] Encrypted volume stuck in Attaching/Detaching loop after node reboot and instance manager deletion 11510 - @yangchiu @mantissahz
- [BUG] Test case test_cleanup_system_generated_snapshots fails on v2 volumes 13123 - @yangchiu @davidcheng0922
- [BUG] Test case test_drain_with_block_for_eviction_if_contains_last_replica_success failed on v1 volumes [13103](#13103...
Longhorn v1.12.0-rc2
DON'T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
Resolved Issues in this release
Highlight
- [FEATURE] Support IPv4/IPv6 Dual-Stack with IPv6 Family First or IPv4 Family First 11531 - @COLDTURNIP @c3y1huang @chriscchien
- [FEATURE] Decouple V2 Data Engine Initiator and Target Placement 7124 - @derekbit @shuo-wu @chriscchien
- [FEATURE] IPv6 for V2 Data Engine 10928 - @COLDTURNIP @chriscchien
- [FEATURE] V2 Data Engine Sharding - Experimental 1061 - @c3y1huang
- [FEATURE] Support v2 Data Engine (GA) 6229 - @derekbit
- [FEATURE] V2 Data Engine Fast Cloning 12552 - @shuo-wu
Feature
- [FEATURE] Add --tolerations flag to longhornctl for scheduling DaemonSet pods on tainted nodes 12993 - @chriscchien @bachmanity1
- [FEATURE] Support on-demand snapshot checksum calculation 11442 - @yangchiu @davidcheng0922
Improvement
- [IMPROVEMENT] longhorn-manager pods race on webhook TLS Secret at scale 13012 - @yangchiu @hookak
- [IMPROVEMENT] Misleading storage values 12633 - @elTwingo @davidcheng0922 @houhoucoop
- [IMPROVEMENT] Improve Longhorn auto-salvage observability 13018 - @yangchiu @derekbit
- [IMPROVEMENT] Removing Scheduled condition check during volume expansion 12606 - @yangchiu @davidcheng0922
- [IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12396 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] Move v2 volume backup restore from replica to engine 9277 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Make Kubernetes Metrics Server (metrics.k8s.io) integration toggleable 13011 - @yangchiu @mantissahz @hookak
- [IMPROVEMENT] Add support of new StorageClass parameters to helm chart 9324 - @yangchiu @TheFutonEng
- [IMPROVEMENT] Is there any way to have longhorn without python 12679 - @roger-ryao
- [IMPROVEMENT] sparse-tools APIs must not introduce breaking changes to existing APIs. 12967 - @yangchiu @derekbit
- [IMPROVEMENT] Implement Network Reconnection for Enhancing Replica Rebuilding Resilience 9626 - @yangchiu @mschneider82
- [UI][IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12922 - @chriscchien @houhoucoop
- [IMPROVEMENT] Add Backup Target to volume list custom column options 12619 - @yangchiu @houhoucoop
- [IMPROVEMENT] Allow disabling creation of the default longhorn StorageClass via Helm 12906 - @hookak @roger-ryao
- [IMPROVEMENT][TEST] Add unit tests for util parsing and string conversion helpers 12898 - @archy-rock3t-cloud @chriscchien
- [IMPROVEMENT] Metrics for backups 11387 - @yangchiu @mantissahz @Copilot
- [IMPROVEMENT] chart: allow specifying spec.sampleLimit on ServiceMonitor 12671 - @grelland @yangchiu
- [IMPROVEMENT] Topology-aware PV nodeAffinity control: allowedTopologies keys + strictTopology 12684 - @hookak @roger-ryao
- [IMPROVEMENT] Reduce longhorn-manager memory usage by optimizing cluster-wide informer caching 12771 - @hookak @roger-ryao
- [IMPROVEMENT] Add metrics for non-Encrypted and encrypted volumes 12462 - @derekbit @mantissahz @chriscchien @Copilot
- [IMPROVEMENT] Clarify helm version in generate-longhorn-yaml error message 12630 - @luojiyin1987 @chriscchien
- [IMPROVEMENT][UI] Link version number to git releases 11132 - @chriscchien @houhoucoop
- [IMPROVEMENT] Record the current share manager image in the Share Manager CR status 11203 - @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Snapshot tree color explanation 12247 - @houhoucoop
- [IMPROVEMENT] Refuse to attach strict-local volume to the wrong node 8546 - @yangchiu @derekbit @mantissahz @Copilot
- [IMPROVEMENT] Ensure V2 Engine ReplicaAdd respects the fast-replica-rebuild-enabled setting 12540 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Relax endpoint-network-for-rwx-volume validation for migratable block-mode volumes 12644 - @c3y1huang @chriscchien
- [IMPROVEMENT] detailed log for the reason of node controller deleting backing image copies 12584 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] RBAC permissions for csi-resizer 12681 - @yangchiu @konstantin-kelemen
- [IMPROVEMENT] Adding a message to hint users to clean up non-existing disks in Backing Image CR 10617 - @chriscchien @Copilot
- [IMPROVEMENT] Keep workload pod in the original zone and region 12517 - @bachmanity1
- [IMPROVEMENT] Consider node storage capacity when scheduling pods with existing PVs 12398 - @bachmanity1
- [IMPROVEMENT] Volume may enter faulty state without clear reason when backing image size mismatches 11673 - @COLDTURNIP @derekbit @roger-ryao @Copilot
Bug
- [BUG] global.cattle.systemDefaultRegistry is not applied as the image registry prefix in 108.2.1+up1.10.2 13071 - @COLDTURNIP
- [BUG] [v1.12.0-rc1] longhornctl fails on sle-micro 6.1 13048 - @COLDTURNIP @roger-ryao
- [BUG] [longhorn-engine/dataserver] Handling EOF returned by io.ReadFull robustly 12964 - @yangchiu @apoorvajagtap
- [BUG] PrometheusTimeseriesCardinality for metric longhorn_rest_client_rate_limiter_latency_seconds_bucket 13085 - @derekbit @chriscchien
- [BUG] snapshot-max-count doesn't work on v2 volumes 12921 - @davidcheng0922
- [BUG] nil pointer dereference panic in instance-manager during replica rebuild. 13087 - @derekbit @shuo-wu @roger-ryao
- [BUG][v1.12.0-rc1] RWX Volume Gets Stuck in Detaching/Attaching Loop After Reboot Replica Node While Heavy Writing And Recurring Jobs on v2 Data Engine 13062 - @mantissahz
- [BUG] After a node is rebooted and attach a v2 volume to the rebooted node, the volume gets stuck in the Attaching state 13084 - @yangchiu @derekbit
- [BUG] Encrypted volume stuck in Attaching/Detaching loop after node reboot and instance manager deletion 11510 - @yangchiu @mantissahz
- [BUG] [v1.12.0-rc1] longhornctl fails on Ubuntu 26.04 13072 - @derekbit @roger-ryao
- [BUG] Encrypt volume provided size is 16MB shorter than the claimed size 9205 - @mantissahz @roger-ryao
- [BUG] longhorn-spdk-engine verify() race overwrites replicaMap, breaking concurrent replica rebuild 13074 - @derekbit @roger-ryao
- [BUG] spdk_tgt crash during rebuild cleanup 13076 - @derekbit @roger-ryao
- [BUG] Stopping spdk_tgt during v2 volume expansion leaves the volume stuck in detaching and does not record the failure in the Engine CR 12903 - @davidcheng0922 @chriscchien
- [BUG] RWX Workload becomes Read-only after nodes shutdown and share manager is recreated on a new node 12986 - @yangchiu @davidcheng0922
- [BUG] test_support_bundle.py test cases fail on hardened cluster with IPv6 mode 13066 - @COLDTURNIP @yangchiu
- [BUG] [v1.12.0-rc1] test_delete_backup_during_restoring_volume fails on v2 volume, volume not faulted and condition Restore is True 13061 - @derekbit @chriscchien
- [BUG] Volume may get stuck when the snapshot CR deletion cannot be handled [12489](https://github.com/longhorn/lo...
Longhorn v1.12.0-rc1
DON'T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
Resolved Issues in this release
Highlight
- [FEATURE] v2 volume supports cross-node initiator and target 7124 - @derekbit @shuo-wu @chriscchien
- [FEATURE] Support IPv4/IPv6 Dual-Stack 11531 - @chriscchien
- [FEATURE] IPv6 for V2 Data Engine 10928 - @COLDTURNIP @chriscchien
- [FEATURE] V2 Data Engine Sharding - Experimental 1061 - @c3y1huang
- [FEATURE] Support v2 Data Engine (GA) 6229 - @derekbit
- [FEATURE] V2 Data Engine Fast Cloning 12552 - @shuo-wu
Feature
- [FEATURE] Add --tolerations flag to longhornctl for scheduling DaemonSet pods on tainted nodes 12993 - @chriscchien @bachmanity1
- [FEATURE] Support on-demand snapshot checksum calculation 11442 - @yangchiu @davidcheng0922
Improvement
- [IMPROVEMENT] Make Kubernetes Metrics Server (metrics.k8s.io) integration toggleable 13011 - @yangchiu @mantissahz @hookak
- [IMPROVEMENT] Add support of new StorageClass parameters to helm chart 9324 - @yangchiu @TheFutonEng
- [IMPROVEMENT] Misleading storage values 12633 - @elTwingo @carterli0407-cell
- [IMPROVEMENT] Move v2 volume backup restore from replica to engine 9277 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Improve Longhorn auto-salvage observability 13018 - @yangchiu @derekbit
- [IMPROVEMENT] longhorn-manager pods race on webhook TLS Secret at scale 13012 - @hookak
- [IMPROVEMENT] Removing Scheduled condition check during volume expansion 12606 - @yangchiu @davidcheng0922
- [IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12396 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] Is there any way to have longhorn without python 12679 - @roger-ryao
- [IMPROVEMENT] sparse-tools APIs must not introduce breaking changes to existing APIs. 12967 - @yangchiu @derekbit
- [IMPROVEMENT] Implement Network Reconnection for Enhancing Replica Rebuilding Resilience 9626 - @yangchiu @mschneider82
- [UI][IMPROVEMENT] TooManySnapshots volume condition uses a hard-coded threshold despite configurable snapshot max count 12922 - @chriscchien @houhoucoop
- [IMPROVEMENT] Add Backup Target to volume list custom column options 12619 - @yangchiu @houhoucoop
- [IMPROVEMENT] Allow disabling creation of the default longhorn StorageClass via Helm 12906 - @hookak @roger-ryao
- [IMPROVEMENT][TEST] Add unit tests for util parsing and string conversion helpers 12898 - @archy-rock3t-cloud @chriscchien
- [IMPROVEMENT] Metrics for backups 11387 - @yangchiu @mantissahz @Copilot
- [IMPROVEMENT] chart: allow specifying spec.sampleLimit on ServiceMonitor 12671 - @grelland @yangchiu
- [IMPROVEMENT] Topology-aware PV nodeAffinity control: allowedTopologies keys + strictTopology 12684 - @hookak @roger-ryao
- [IMPROVEMENT] Reduce longhorn-manager memory usage by optimizing cluster-wide informer caching 12771 - @hookak @roger-ryao
- [IMPROVEMENT] Add metrics for non-Encrypted and encrypted volumes 12462 - @derekbit @mantissahz @chriscchien @Copilot
- [IMPROVEMENT] Clarify helm version in generate-longhorn-yaml error message 12630 - @luojiyin1987 @chriscchien
- [IMPROVEMENT][UI] Link version number to git releases 11132 - @chriscchien @houhoucoop
- [IMPROVEMENT] Record the current share manager image in the Share Manager CR status 11203 - @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Snapshot tree color explanation 12247 - @houhoucoop
- [IMPROVEMENT] Refuse to attach strict-local volume to the wrong node 8546 - @yangchiu @derekbit @mantissahz @Copilot
- [IMPROVEMENT] Ensure V2 Engine ReplicaAdd respects the fast-replica-rebuild-enabled setting 12540 - @davidcheng0922 @roger-ryao
- [IMPROVEMENT] Relax endpoint-network-for-rwx-volume validation for migratable block-mode volumes 12644 - @c3y1huang @chriscchien
- [IMPROVEMENT] detailed log for the reason of node controller deleting backing image copies 12584 - @COLDTURNIP @yangchiu
- [IMPROVEMENT] RBAC permissions for csi-resizer 12681 - @yangchiu @konstantin-kelemen
- [IMPROVEMENT] Adding a message to hint users to clean up non-existing disks in Backing Image CR 10617 - @chriscchien @Copilot
- [IMPROVEMENT] Keep workload pod in the original zone and region 12517 - @bachmanity1
- [IMPROVEMENT] Consider node storage capacity when scheduling pods with existing PVs 12398 - @bachmanity1
- [IMPROVEMENT] Volume may enter faulty state without clear reason when backing image size mismatches 11673 - @COLDTURNIP @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Ensure the volume is unstaged while ControllerUnpublish 11377 - @COLDTURNIP
Bug
- [BUG] Test cases in test_engine_upgrade.py failed 13014 - @mantissahz @roger-ryao
- [BUG] v2 DR volume may get stuck in Degraded state if replica rebuilding is triggered during incremental restoration 12515 - @davidcheng0922 @chriscchien
- [BUG] Unexpected replica remains on node after all volumes have been cleaned up and causing unexpected scheduled storage 11177 - @yangchiu @c3y1huang
- [BUG] snapshot-max-count doesn't work on v2 volumes 12921 - @carterli0407-cell
- [BUG] Volume may get stuck when the snapshot CR deletion cannot be handled 12489 - @COLDTURNIP
- [BUG] RWX Workload becomes Read-only after nodes shutdown and share manager is recreated on a new node 12986 - @yangchiu @davidcheng0922
- [BUG] V2 volume created from backup become data corrupted after crashing one replica during restore 12830 - @chriscchien
- [BUG] [UI] replica shown as gray when v2 volume engine live switchover 13029 - @derekbit @chriscchien
- [Bug] test_rebuild_with_restoration is flaky on v2 volume 11447 - @derekbit @chriscchien
- [BUG] lastBackup of a v2 volume may remain empty after a backup is created 12542 - @mantissahz @roger-ryao
- [BUG] dd && sync command hangs on v2 rwx encrypted volume 12649 - @mantissahz @chriscchien
- [BUG] Stopping spdk_tgt during v2 volume expansion leaves the volume stuck in detaching and does not record the failure in the Engine CR 12903 - @davidcheng0922
- [BUG] Encrypted volume stuck in Attaching/Detaching loop after node reboot and instance manager deletion 11510 - @yangchiu @mantissahz
- [BUG] VolumeSnapshot snapshot.storage.k8s.io/v1 report stale error 11429 - @COLDTURNIP @yangchiu
- [BUG] Data integrity check fails on a volume with backing image 12989 - @COLDTURNIP
- [BUG] CSIStorageCapacity reports 0 for compute nodes without Longhorn disks, breaking WaitForFirstConsumer scheduling 12807 - @bachmanity1 @roger-ryao
- [BUG] Encrypt volume provided size is 16MB shorter than the claimed size 9205 - @mantissahz @roger-ryao
- [BUG] Test case test_storage_capacity_aware_pod_scheduling fails 13001 - @yangchiu @bachmanity1
- [BUG] Crash Single Instance Manager While RWO Encrypted Vo...
Longhorn v1.11.2
Longhorn v1.11.2 Release Notes
Longhorn v1.11.2 introduces several improvements and bug fixes intended to improve system quality, resilience, stability, and security.
We welcome feedback and contributions to help continuously improve Longhorn.
For terminology and context on Longhorn releases, see Releases.
Important Fixes
This release includes several critical stability fixes.
Replica rebuild progress fix
Resolved an issue where replica rebuild progress could exceed 100% under unstable network conditions. Progress reporting is now capped at 100%.
For more details, see #12949.
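The capping behavior described above can be illustrated with a minimal sketch. The function name and the byte-count inputs are hypothetical and are not taken from Longhorn's implementation; the sketch only shows how a raw percentage can be clamped so that it never exceeds 100.

```python
def rebuild_progress_percent(transferred_bytes: int, total_bytes: int) -> int:
    """Return rebuild progress as an integer percentage clamped to 0..100.

    Hypothetical helper: retransmissions over an unstable network can make
    the transferred byte count exceed the total, so the raw ratio is clamped.
    """
    if total_bytes <= 0:
        # Nothing to rebuild (or unknown size): report no progress.
        return 0
    raw = transferred_bytes * 100 // total_bytes
    return max(0, min(100, raw))
```

For example, `rebuild_progress_percent(130, 100)` returns 100 rather than 130.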
CSIStorageCapacity scheduling enhancement
Introduced a new setting to control CSIStorageCapacity reporting. Previously, compute nodes without Longhorn disks incorrectly reported 0 capacity, breaking WaitForFirstConsumer scheduling. With this enhancement, capacity tracking can be configured to avoid rejecting compute nodes in separated compute/storage architectures.
For more details, see #12807.
Improvement
Manager memory optimization
Optimized longhorn-manager Pod informer caching to reduce cluster-wide memory usage.
For more details, see #12771.
Installation
Important
Ensure that your cluster is running Kubernetes v1.25 or later before installing Longhorn v1.11.2.
You can install Longhorn using a variety of tools, including Rancher, Kubectl, and Helm. For more information about installation methods and requirements, see Quick Installation in the Longhorn documentation.
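The minimum-version requirement above can be checked in a script before installing. The following is a minimal sketch, assuming the server version is available as a string such as `v1.29.4` or `v1.31.2+k3s1`; the function name is hypothetical.

```python
import re


def meets_minimum_kubernetes(version: str, minimum: tuple = (1, 25)) -> bool:
    """Return True if a Kubernetes version string is at least `minimum`.

    Only the major and minor components are compared; patch numbers and
    distribution suffixes (for example "+k3s1") are ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

A pre-install script could call `meets_minimum_kubernetes(server_version)` and stop early when it returns False.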
Upgrade
Important
Ensure that your cluster is running Kubernetes v1.25 or later before upgrading from Longhorn v1.10.x or v1.11.0 to v1.11.2.
Important
Users on v1.11.0 who experienced memory leaks in longhorn-instance-manager pods (#12575) are strongly encouraged to upgrade to v1.11.1 or later to receive the permanent fix for the proxy connection leaks.
Longhorn only allows upgrades from supported versions. For more information about upgrade paths and procedures, see Upgrade in the Longhorn documentation.
Post-Release Known Issues
For information about issues identified after this release, see Release-Known-Issues.
Resolved Issues in this release
Improvement
- [BACKPORT][v1.11.2][IMPROVEMENT] Reduce longhorn-manager memory usage by optimizing cluster-wide informer caching 12819 - @hookak @roger-ryao
Bug
- [BACKPORT][v1.11.2][BUG] Test case test_storage_capacity_aware_pod_scheduling fails 13006 - @yangchiu @bachmanity1
- [BACKPORT][v1.11.2][BUG] Replica Auto-Balance Causes Infinite Replica Scheduling Loop 12928 - @yangchiu @shuo-wu
- [BACKPORT][v1.11.2][BUG] CSIStorageCapacity reports 0 for compute nodes without Longhorn disks, breaking WaitForFirstConsumer scheduling 12918 - @chriscchien @bachmanity1
- [BACKPORT][v1.11.2][BUG] Replica rebuild progress can go over 100% 12952 - @yangchiu @davidcheng0922
- [BACKPORT][v1.11.2][BUG] Node exhaustion caused by backup inspect buildup induced due to NFS latency 12945 - @COLDTURNIP @roger-ryao
- [BACKPORT][v1.11.2][BUG] Failed to collect health data for block disk (AIO) when disk path is a /dev/disk/by-id symlink 12911 - @yangchiu @hookak
- [BACKPORT][v1.11.2][BUG] "snapshot becomes not ready to use" Warning events emitted during expected auto-cleanup after backup 12856 - @EpochBoy @yangchiu
Stability
- [BACKPORT][v1.11.1][BUG] Potential NEP in Volume Metrics Collector 12733 - @derekbit @chriscchien
Contributors
Longhorn v1.11.1
Longhorn v1.11.1 Release Notes
Longhorn v1.11.1 is a patch release that focuses on critical bug fixes, security hardening, and stability improvements for both V1 and V2 data engines. Key highlights include a fix for a significant memory leak in the instance manager and improvements to backup reliability and volume scheduling.
We welcome feedback and contributions to help continuously improve Longhorn.
For terminology and context on Longhorn releases, see Releases.
Important Fixes
This release includes several critical stability fixes.
Longhorn workload pods memory leak
Fixed a critical regression where proxy connection leaks in the longhorn-instance-manager pods caused high memory consumption.
For more details, see #12575.
Backup & Restore compatibility fix
Resolved compatibility issues introduced by aws-sdk-go v2 that affected backups to S3-compatible storage (such as Storj or Google Cloud Storage). This fix ensures that large data transfers to remote backup targets complete with correct authorization.
For more details, see #12714 and #12688.
V2 Data Engine (SPDK) refinements
Several enhancements were delivered for some V2 Data Engine features, including fast replica rebuild and clone.
For more details, see #12751 and #12748.
CSI scheduling enhancement
Added support for CSI topology-aware PV nodeAffinity control.
For more details, see #12689 and #12656.
Installation
Important
Ensure that your cluster is running Kubernetes v1.25 or later before installing Longhorn v1.11.1.
You can install Longhorn using a variety of tools, including Rancher, Kubectl, and Helm. For more information about installation methods and requirements, see Quick Installation in the Longhorn documentation.
Upgrade
Important
Ensure that your cluster is running Kubernetes v1.25 or later before upgrading from Longhorn v1.10.x or v1.11.0 to v1.11.1.
Important
Users on v1.11.0 who experienced memory leaks in longhorn-instance-manager pods (#12575) are strongly encouraged to upgrade to v1.11.1 to receive the permanent fix for the proxy connection leaks.
Longhorn only allows upgrades from supported versions. For more information about upgrade paths and procedures, see Upgrade in the Longhorn documentation.
Post-Release Known Issues
For information about issues identified after this release, see Release-Known-Issues.
Resolved Issues in this release
Improvement
- [BACKPORT][v1.11.1][IMPROVEMENT] Ensure V2 Engine ReplicaAdd respects the fast-replica-rebuild-enabled setting 12751 - @davidcheng0922 @roger-ryao
- [BACKPORT][v1.11.1][IMPROVEMENT] Topology-aware PV nodeAffinity control: allowedTopologies keys + strictTopology 12689 - @hookak @roger-ryao
- [BACKPORT][v1.11.1][IMPROVEMENT] detailed log for the reason of node controller deleting backing image copies 12585 - @COLDTURNIP @yangchiu
- [BACKPORT][v1.11.1][IMPROVEMENT] Relax endpoint-network-for-rwx-volume validation for migratable block-mode volumes 12711 - @c3y1huang @chriscchien
- [BACKPORT][v1.11.1][IMPROVEMENT] RBAC permissions for csi-resizer 12694 - @yangchiu
Bug
- [BACKPORT][v1.11.1][BUG] Failed replicas accumulate during engine upgrade 12768 - @davidcheng0922
- [BACKPORT][v1.11.1][BUG] V2 Volume Clone Status is Changed Over Time 12748 - @davidcheng0922 @roger-ryao
- [BACKPORT][v1.11.1][BUG] Backup to S3 fails at 95% 12714 - @yangchiu @mantissahz
- [BACKPORT][v1.11.1][BUG] spdk_tgt encountered an assertion failure in longhorn-spdk-helper during a CI test run 12738 - @derekbit @roger-ryao
- [BACKPORT][v1.11.1][BUG] Google Cloud Storage (GCS) backup target always fails with SignatureDoesNotMatch due to AWS SDK Go v2 CRC32 checksum incompatibility 12688 - @mantissahz @chriscchien
- [BACKPORT][v1.11.1][BUG] Enable to set defaultSettings.nodeDiskHealthMonitoring 12730 - @chriscchien
- [BACKPORT][v1.11.1][BUG] stale name variable in nsmounter get_pid 12704 - @chriscchien
- [BACKPORT][v1.11.1][BUG] After upgrading to 1.11.0, new persistent volumes have nodeAffinity 12665 - @chriscchien
- [BACKPORT][v1.11.1][BUG] Incorrect storage double-counting causes scheduling failure when multiple replicas exist on the same node 12661 - @yangchiu @davidcheng0922
- [BACKPORT][v1.11.1][BUG] Recreated block disk with same name never becomes schedulable after volume and disk deletion 12641 - @davidcheng0922
- [BACKPORT][v1.11.1][BUG] Longhorn v1.10 Volume API is not compatible with the v1.8.1 manifest 12618 - @mantissahz @roger-ryao
- [BACKPORT][v1.11.1][BUG] [v2] Can't use partition as block device 12626 - @bachmanity1
- [BACKPORT][v1.11.1][BUG] Volume.Spec.CloneMode is empty after upgrading to v1.10.x and following version 12615 - @mantissahz
- [BACKPORT][v1.11.1][BUG] Longhorn validating webhook blocks k3s server node joins - flannel CNI fails to initialize 12589 - @yangchiu @mantissahz
- [BACKPORT][v1.11.1][BUG] V1.11.0 very high memory consumption for instance manager 12575 - @derekbit @roger-ryao
- [BACKPORT][v1.11.1][BUG] Backing image data source pod fails when HTTP proxy is enabled 12780 - @c3y1huang @chriscchien
- [BACKPORT][v1.11.1][BUG] orphan controller does not cleanup the instance on the corresponding instance manager on a multiple IM node 12788 - @COLDTURNIP @roger-ryao
Stability
- [BACKPORT][v1.11.1][BUG] Potential NEP in Volume Metrics Collector 12733 - @derekbit @chriscchien
Contributors
Longhorn v1.11.0
Longhorn v1.11.0 Release Notes
The Longhorn team is excited to announce the release of Longhorn v1.11.0. This release marks a major milestone, with the V2 Data Engine officially entering the Technical Preview stage following significant stability improvements.
Additionally, this version optimizes the stability of the whole system and introduces critical improvements in resource observability, scheduling, and utilization.
For terminology and background on Longhorn releases, see Releases.
Warning
Hotfix
longhorn-instance-manager Image
The longhorn-instance-manager:v1.11.0 image is affected by a regression issue introduced by the new longhorn-instance-manager Proxy service APIs. The bug causes Proxy connection leaks in the longhorn-instance-manager pods, resulting in increased memory usage. To mitigate this issue, replace longhornio/longhorn-instance-manager:v1.11.0 with the hotfixed image longhornio/longhorn-instance-manager:v1.11.0-hotfix-1.
You can apply the update by following these steps:
- Update the longhorn-instance-manager image
  - Change the longhorn-instance-manager image tag from v1.11.0 to v1.11.0-hotfix-1 in the appropriate file:
    - For Helm: Update values.yaml.
    - For manifests: Update the deployment manifest directly.
- Proceed with the installation or upgrade
  - Apply the changes using your standard Helm install/upgrade command or reapply the updated manifest.
longhorn-manager Image
The longhorn-manager:v1.11.0 image is affected by a regression issue introduced by the new Kubernetes Node validator. The bug blocks setting Kubernetes node CNI labels because it waits for the Longhorn webhook server to be running, while the Longhorn webhook server waits for CNI network to be ready. To mitigate this issue, replace longhornio/longhorn-manager:v1.11.0 with the hotfixed image longhornio/longhorn-manager:v1.11.0-hotfix-1.
You can apply the update by following these steps:
- Disable the upgrade version check
  - Helm users: Set upgradeVersionCheck to false in the values.yaml file.
  - Manifest users: Remove the --upgrade-version-check flag from the deployment manifest.
- Update the longhorn-manager image
  - Change the longhorn-manager image tag from v1.11.0 to v1.11.0-hotfix-1 in the appropriate file:
    - For Helm: Update values.yaml.
    - For manifests: Update the deployment manifest directly.
- Proceed with the installation or upgrade
  - Apply the changes using your standard Helm install/upgrade command or reapply the updated manifest.
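For manifest users, both hotfixes above amount to replacing an image tag in the deployment manifest. The following is a minimal sketch of that substitution, assuming the manifest references the image as `longhornio/<name>:v1.11.0`; the helper name is hypothetical and not part of any Longhorn tooling.

```python
import re


def apply_hotfix_tag(manifest_text: str, image: str, old_tag: str, new_tag: str) -> str:
    """Replace every `image:old_tag` reference with `image:new_tag`.

    The negative lookahead stops an already hotfixed reference such as
    `image:v1.11.0-hotfix-1` from being rewritten a second time.
    """
    pattern = re.compile(re.escape(f"{image}:{old_tag}") + r"(?![\w.-])")
    new_ref = f"{image}:{new_tag}"
    # A function replacement avoids any backslash interpretation in new_ref.
    return pattern.sub(lambda _match: new_ref, manifest_text)


if __name__ == "__main__":
    sample = "image: longhornio/longhorn-manager:v1.11.0\n"
    print(apply_hotfix_tag(sample, "longhornio/longhorn-manager",
                           "v1.11.0", "v1.11.0-hotfix-1"), end="")
```

Running the helper twice leaves the text unchanged after the first pass, so it is safe to reapply. Review the resulting manifest before reapplying it to the cluster.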
Deprecation
V2 Backing Image Deprecation
The Backing Image feature for the V2 Data Engine is now deprecated in v1.11.0 and is scheduled for removal in v1.12.0.
Users using V2 volumes for virtual machines are encouraged to adopt the Containerized Data Importer (CDI) for volume population instead.
Primary Highlights
V2 Data Engine
Now in Technical Preview Stage
We are pleased to announce that the V2 Data Engine has officially graduated to the Technical Preview stage. This indicates increased stability and feature maturity as we move toward General Availability.
Limitation: While the engine is in Technical Preview, live upgrade is not supported yet. V2 volumes must be detached (offline) before engine upgrade.
Support for ublk Frontend
Longhorn supports configuring UBLK performance parameters globally, per volume, or via StorageClass to improve I/O performance.
V1 Data Engine
Faster Replica Rebuilding from Multiple Sources
The V1 Data Engine now supports parallel rebuilding. When a replica needs to be rebuilt, the engine can stream data from multiple healthy replicas simultaneously rather than from a single source. This significantly reduces the time required to restore redundancy for volumes that contain many scattered data chunks.
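To make the idea of rebuilding from multiple sources concrete, the sketch below distributes chunk indices across healthy source replicas in round-robin order. This is an illustration only: the function name, the chunk identifiers, and the round-robin policy are assumptions for the example and do not describe how the V1 Data Engine actually partitions work.

```python
from typing import Dict, Iterable, List, Sequence


def assign_chunks(chunk_ids: Iterable[int], sources: Sequence[str]) -> Dict[str, List[int]]:
    """Spread chunks across source replicas in round-robin order.

    Hypothetical planning step: each source replica receives roughly the
    same number of chunks, so transfers can proceed in parallel.
    """
    if not sources:
        raise ValueError("at least one healthy source replica is required")
    plan: Dict[str, List[int]] = {source: [] for source in sources}
    for index, chunk in enumerate(chunk_ids):
        plan[sources[index % len(sources)]].append(chunk)
    return plan
```

With two sources and five chunks, the first source is assigned chunks 0, 2, and 4 and the second is assigned chunks 1 and 3.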
General
Balance-Aware Algorithm Disk Selection For Replica Scheduling
Longhorn improves the disk selection for the replica scheduling by introducing an intelligent balance-aware scheduling algorithm, reducing uneven storage usage across nodes and disks.
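As an illustration only, the sketch below shows one way a balance-aware choice could work: among disks with enough free space, pick the one whose usage ratio would be lowest after placing the replica. The `Disk` type, its fields, and the scoring rule are assumptions made for this example; they are not taken from Longhorn's scheduler.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Disk:
    # Hypothetical, simplified view of a schedulable disk.
    name: str
    capacity_bytes: int
    used_bytes: int


def pick_balanced_disk(disks: Sequence[Disk], replica_bytes: int) -> Optional[Disk]:
    """Return the disk with the lowest projected usage ratio, or None."""
    candidates = [
        disk for disk in disks
        if disk.capacity_bytes > 0
        and disk.used_bytes + replica_bytes <= disk.capacity_bytes
    ]
    if not candidates:
        return None
    # Lower projected ratio means the disk stays less full after placement.
    return min(
        candidates,
        key=lambda disk: (disk.used_bytes + replica_bytes) / disk.capacity_bytes,
    )
```

Scoring by the projected ratio rather than by absolute free space keeps small and large disks filling at a similar relative rate, which is one simple interpretation of "balance-aware".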
Node Disk Health Monitoring
Longhorn now actively monitors the physical health of the underlying disks used for storage by using S.M.A.R.T. data. This allows administrators to identify issues and raise alerts when abnormal SMART metrics are detected, helping prevent failed volumes.
Share Manager Networking
Users can now configure an extra network interface for the Share Manager to support complex network segmentation requirements.
ReadWriteOncePod (RWOP) Support
Full support for the Kubernetes ReadWriteOncePod access mode has been added.
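A claim requests this access mode through the standard `accessModes` field. The following minimal sketch builds such a PersistentVolumeClaim as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The claim name, the size, and the `longhorn` StorageClass name are assumptions for the example.

```python
import json


def rwop_pvc(name: str, size: str, storage_class: str = "longhorn") -> dict:
    """Build a PVC manifest that requests the ReadWriteOncePod access mode."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOncePod restricts the volume to a single pod.
            "accessModes": ["ReadWriteOncePod"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(rwop_pvc("example-rwop", "1Gi"), indent=2))
```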
StorageClass allowedTopologies Support
Administrators can now use the allowedTopologies field in Longhorn StorageClasses to restrict volume provisioning to specific zones, regions, or nodes within the cluster.
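The following minimal sketch builds a StorageClass manifest that uses the standard Kubernetes `allowedTopologies` field and prints it as JSON for `kubectl apply -f -`. The class name, the zone values, and the choice of `topology.kubernetes.io/zone` as the topology key are assumptions for the example; check the Longhorn documentation for the topology keys it supports.

```python
import json
from typing import Iterable


def longhorn_storage_class(name: str, zones: Iterable[str]) -> dict:
    """Build a StorageClass that restricts provisioning to the given zones."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "driver.longhorn.io",
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowedTopologies": [
            {
                "matchLabelExpressions": [
                    # Assumed topology key; values list the permitted zones.
                    {"key": "topology.kubernetes.io/zone", "values": list(zones)}
                ]
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(longhorn_storage_class("longhorn-zone-a", ["zone-a"]), indent=2))
```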
Installation
Important
Ensure that your cluster is running Kubernetes v1.25 or later before installing Longhorn v1.11.0.
You can install Longhorn using a variety of tools, including Rancher, Kubectl, and Helm. For more information about installation methods and requirements, see Quick Installation in the Longhorn documentation.
Upgrade
Important
Ensure that your cluster is running Kubernetes v1.25 or later before upgrading from Longhorn v1.10.x to v1.11.0.
Longhorn only allows upgrades from supported versions. For more information about upgrade paths and procedures, see Upgrade in the Longhorn documentation.
Post-Release Known Issues
For information about issues identified after this release, see Release-Known-Issues.
Resolved Issues in this release
Highlight
- [FEATURE] Add support for ReadWriteOncePod access mode 9727 - @derekbit @shikanime @chriscchien @Copilot
- [FEATURE] Scale replica rebuilding speed from multiple healthy replicas 11331 - @derekbit @shuo-wu @roger-ryao @Copilot
- [FEATURE] Support StorageClass allowedTopologies for Longhorn volumes 12261 - @yangchiu @derekbit @hookak @Copilot
- [FEATURE] Support extra network interface (not only storage network) on the share manager pod 10269 - @yangchiu @c3y1huang
- [FEATURE] Monitor Node Disk Health 12016 - @c3y1huang @roger-ryao
- [FEATURE] Replica Auto Balance Across Nodes based on Node Disk Space Consumption 10512 - @davidcheng0922 @chriscchien
Feature
- [FEATURE] Guess Linux distro from the package manager 12153 - @yangchiu @derekbit @NamrathShetty @Copilot
- [FEATURE] Provide a helm chart setting to define the managerUrl 10583 - @lexfrei @yangchiu
- [FEATURE] Add metric for last backup of a volume 6049 - @c3y1huang @roger-ryao
- [FEATURE] Real-time volume performance monitoring 368 - @derekbit @hookak
- [UI][FEATURE] Monitor Node Disk Health 12263 - @houhoucoop @roger-ryao
- [FEATURE] custom annotation/label of UI's k8s service on value.yaml of helm chart 11754 - @yangchiu @lucasl0st
- [FEATURE] Make longhornctl load ublk_drv module when kernel version is 6 or newer 11803 - @chriscchien @bachmanity1
- [BUG] Inherit namespace for longhorn-share-manager in FastFailover mode 12244 - @yangchiu @semenas
- [FEATURE] Enable CSI pod anti-affinity preset update 12100 - @yangchiu @yulken
- [FEATURE] [Dependency] aws-sdk-go v1.55.7 is EOL as of 2025-07-31 — plan to migrate to v2? 12098 - @mantissahz @roger-ryao
- [FEATURE] Change volume operation menu button behaviour from hover to click. 11408 - @yangchiu @houhoucoop
- [FEATURE] "hard" podAntiAffinity for csi-attacher/csi-provisioner/csi-resizer/csi-snapshotter 11617 - @yangchiu @yulken
- [FEATURE] node storage scheduled metrics 11949 - @yangchiu @AoRuiAC
Impro...
Longhorn v1.10.2
Longhorn v1.10.2 Release Notes
Longhorn v1.10.2 introduces several improvements and bug fixes intended to improve system quality, resilience, stability, and security.
We welcome feedback and contributions to help continuously improve Longhorn.
For terminology and context on Longhorn releases, see Releases.
Warning
HotFix
The backing-image-manager:v1.10.2 image is affected by the following regression:
- [BUG] Backing image data source pod fails when HTTP proxy is enabled, which can cause backing image uploads to fail when an HTTP proxy is enabled.
To mitigate this issue, replace backing-image-manager:v1.10.2 with the hotfixed image backing-image-manager:v1.10.2-hotfix-1.
Follow these steps to apply the update:
- Update the backing-image-manager image
  - Change the image tag from v1.10.2 to v1.10.2-hotfix-1 in the appropriate file:
    - For Helm: Update values.yaml.
    - For manifests: Update the deployment manifest directly.
- Proceed with the upgrade
  - Apply the changes using your standard Helm upgrade command or reapply the updated manifest.
Important Fixes
This release includes several critical stability fixes.
RWX Volume Unavailable After Node Drain
Fixed a race condition where ReadWriteMany (RWX) volumes could remain in the attaching state after node drains, causing workloads to become unavailable.
For more details, see Issue #12231.
Encrypted Volume Cannot Be Expanded Online
Fixed an issue where online expansion of encrypted volumes did not propagate the new size to the dm-crypt device.
For more details, see Issue #12368.
Cloned Volume Cannot Be Attached to Workload
Fixed a bug where cloned volumes could fail to reach a healthy state, preventing attachment to workloads.
For more details, see Issue #12208.
Block Mode Volume Migration Stuck
Fixed a regression in block-mode volume migrations where newly created replicas could incorrectly inherit the lastFailedAt timestamp from source replicas, causing repeated deletion and blocking migration completion.
For more details, see Issue #12312.
Replica Auto Balance Disk Pressure Threshold Stalled
Fixed an issue where replica auto-balance under disk pressure could be blocked if stopped volumes were present on the disk.
For more details, see Issue #12334.
Replicas Accumulate During Engine Upgrade
Fixed a bug where temporary replicas could accumulate during engine upgrade. High etcd latency could cause new replicas to fail verification, leading to accumulation over multiple reconciliation cycles.
For more details, see Issue #12115.
Potential Client Connection and Context Leak
Fixed potential context leaks in the instance manager client and backing image manager client, improving stability and preventing resource exhaustion.
For more details, see Issue #12200 and Issue #12195.
Replica Node Level Soft Anti-Affinity Ignored
Fixed a replica scheduling bug where replicas could be scheduled onto nodes that already host a replica, even when Replica Node-Level Soft Anti-Affinity was disabled.
For more details, see Issue #12251.
Installation
Important
Ensure that your cluster is running Kubernetes v1.25 or later before installing Longhorn v1.10.2.
You can install Longhorn using a variety of tools, including Rancher, Kubectl, and Helm. For more information about installation methods and requirements, see Quick Installation in the Longhorn documentation.
Upgrade
Important
Ensure that your cluster is running Kubernetes v1.25 or later before upgrading from Longhorn v1.9.x to v1.10.2.
Longhorn only allows upgrades from supported versions. For more information about upgrade paths and procedures, see Upgrade in the Longhorn documentation.
Post-Release Known Issues
For information about issues identified after this release, see Release-Known-Issues.
Resolved Issues
Feature
- [BACKPORT][v1.10.2][FEATURE] Inherit namespace for longhorn-share-manager in FastFailover mode 12245 - @yangchiu
- [BACKPORT][v1.10.2][FEATURE] [Dependency] aws-sdk-go v1.55.7 is EOL as of 2025-07-31 — plan to migrate to v2? 12181 - @mantissahz @roger-ryao
Improvement
- [BACKPORT][v1.10.2][IMPROVEMENT] Fix V2 Volume CSI Clone Slowness Caused by VolumeAttachment Webhook Blocking 12329 - @PhanLe1010 @roger-ryao
Bug
- [BACKPORT][v1.10.2][BUG] instance-manager on nodes that don't have hard or solid state disk DDOSing cluster DNS server with TXT query _grpc_config.localhost 12536 - @COLDTURNIP @chriscchien
- [BACKPORT] Replica rebuild, clone and restore fail, traffic being sent to HTTP proxy 12518 - @yangchiu @derekbit
- [BACKPORT][v1.10.2][BUG] Healthy replica could be deleted unexpectedly after reducing volume's number of replicas 12512 - @yangchiu @shuo-wu
- [BACKPORT][v1.10.2][BUG] Data locality enabled volume fails to remove an existing running replica after numberOfReplicas reduced 12509 - @derekbit @chriscchien
- [BACKPORT][v1.10.2][BUG] System backup may fail to be created or deleted 12479 - @yangchiu @mantissahz
- [BACKPORT][v1.10.2][BUG] Some default settings in questions.yaml are placed incorrectly. 12222 - @derekbit @roger-ryao
- [BACKPORT][v1.10.2][BUG] Auto balance feature may lead to volumes falling into a replica deletion-recreation loop 12482 - @shuo-wu @roger-ryao
- [BACKPORT][v1.10.2][BUG] Single replica volume could get stuck in attaching/detaching loop after the replica node rebooted 12494 - @COLDTURNIP @yangchiu
- [BACKPORT][v1.10.2][BUG] Potential Instance Manager Client Context Leak 12200 - @derekbit @chriscchien
- [BACKPORT][v1.10.2][BUG] SnapshotBack proxy request might be sent to incorrect instance-manager pod 12476 - @derekbit @chriscchien
- [BACKPORT][v1.10.2][BUG] unknown OS condition in node CR is not properly removed during upgrade 12451 - @COLDTURNIP @roger-ryao
- [BACKPORT][v1.10.2][BUG] RWX volume becomes unavailable after drain node 12231 - @yangchiu @mantissahz
- [BACKPORT][v1.10.2][BUG] mounting error is not properly handled during CSI node publish volume 12382 - @COLDTURNIP @yangchiu
- [BACKPORT][v1.10.2][BUG] Encrypted Volume Cannot Be Expanded Online 12368 - @yangchiu @mantissahz
- [BACKPORT][v1.10.2][BUG] The auto generated backing image pod name is complained by kubelet 12357 - @COLDTURNIP @yangchiu
- [BACKPORT][v1.10.2][BUG] tests.test_cloning.test_cloning_basic fails at master-head 12342 - @c3y1huang
- [BACKPORT][v1.10.2][Bug] A cloned volume cannot be attached to a workload 12208 - @yangchiu @PhanLe1010
- [BACKPORT][v1.10.2][BUG] Block Mode Volume Migration Stuck 12312 - @COLDTURNIP @yangchiu @shuo-wu
- [BACKPORT][v1.10.2][BUG] Replica auto balance disk pressure threshold stalled with stopped volumes 12334 - @c3y1huang @chriscchien
- [BACKPORT][v1.10.2][BUG] short name mode is enforcing, but image name longhornio/longhorn-manager:v1.10.0 returns ambiguous list 12270 - @yangchiu
- [BACKPORT][v1.10.2][BUG] Replicas accumulate during engine upgrade 12115 - @c3y1huang @chriscchien
- [BACKPORT][v1.10.2][BUG] Potential BackingImageManagerClient Connection and Context Leak 12195 - @derekbit @chriscchien
- [BACKPORT][v1.10.2][BUG] Longhorn ignores Replica Node Level Soft Anti-Affinity when auto balance is set to best-effort 12251 - @c3y1huang @chriscchien
- [BACKPORT][v1.10.2][BUG] invalid memory address or nil pointer dereference (again) 12234 - @chriscchien @bachmanity1
- [BACKPORT][v1.10.2][BUG] Request Header Or Cookie Too Large in Web UI with OIDC auth 12213 - @chriscchien @houhoucoop
Contributors
- @COLDTURNIP...
Longhorn v1.11.0-rc3
DON'T UPGRADE from/to any RC/Preview/Sprint releases because the operation is not supported.
Resolved Issues in this release
Highlight
- [FEATURE] Add support for ReadWriteOncePod access mode 9727 - @derekbit @shikanime @chriscchien @Copilot
- [FEATURE] Scale replica rebuilding speed from multiple healthy replicas 11331 - @derekbit @shuo-wu @roger-ryao @Copilot
- [FEATURE] Support StorageClass allowedTopologies for Longhorn volumes 12261 - @yangchiu @derekbit @hookak @Copilot
- [FEATURE] Support extra network interface (not only storage network) on the share manager pod 10269 - @yangchiu @c3y1huang
- [FEATURE] Monitor Node Disk Health 12016 - @c3y1huang @roger-ryao
- [FEATURE] Replica Auto Balance Across Nodes based on Node Disk Space Consumption 10512 - @davidcheng0922 @chriscchien
Feature
- [FEATURE] Guess Linux distro from the package manager 12153 - @yangchiu @derekbit @NamrathShetty @Copilot
- [FEATURE] Provide a helm chart setting to define the managerUrl 10583 - @lexfrei @yangchiu
- [FEATURE] Add metric for last backup of a volume 6049 - @c3y1huang @roger-ryao
- [FEATURE] Real-time volume performance monitoring 368 - @derekbit @hookak
- [UI][FEATURE] Monitor Node Disk Health 12263 - @houhoucoop @roger-ryao
- [FEATURE] custom annotation/label of UI's k8s service on value.yaml of helm chart 11754 - @yangchiu @lucasl0st
- [FEATURE] Make longhornctl load ublk_drv module when kernel version is 6 or newer 11803 - @chriscchien @bachmanity1
- [BUG] Inherit namespace for longhorn-share-manager in FastFailover mode 12244 - @yangchiu @semenas
- [FEATURE] Enable CSI pod anti-affinity preset update 12100 - @yangchiu @yulken
- [FEATURE] [Dependency] aws-sdk-go v1.55.7 is EOL as of 2025-07-31 — plan to migrate to v2? 12098 - @mantissahz @roger-ryao
- [FEATURE] Change volume operation menu button behaviour from hover to click. 11408 - @yangchiu @houhoucoop
- [FEATURE] "hard" podAntiAffinity for csi-attacher/csi-provisioner/csi-resizer/csi-snapshotter 11617 - @yangchiu @yulken
- [FEATURE] node storage scheduled metrics 11949 - @yangchiu @AoRuiAC
Improvement
- [IMPROVEMENT] Generalize the offline rebuilding setting for both data engines 12484 - @mantissahz @chriscchien
- [IMPROVEMENT] Introduce Concurrent Job Limit for Snapshot Operations 11635 - @yangchiu @derekbit @davidcheng0922 @Copilot
- [IMPROVEMENT] Improve disk error logging to retain errors from newDiskServiceClients() 12446 - @yangchiu @davidcheng0922
- [IMPROVEMENT] Propagate longhorn-manager's timezone to instance-manager and CSI pods 12448 - @hookak @roger-ryao
- [UI][FEATURE] Scale replica rebuilding speed from multiple healthy replicas 12461 - @houhoucoop @roger-ryao
- [IMPROVEMENT] Configure rolling update strategy for longhorn-manager and CSI deployments 12240 - @hookak @chriscchien
- [IMPROVEMENT] Improve log messages for rebuildNewReplica() in longhorn-manager 12426 - @derekbit @chriscchien
- [IMPROVEMENT] misleading message when instance manager tries to create the pod 11759 - @mantissahz @chriscchien
- [IMPROVEMENT] To improve the debugging process and UX, it would be nice that the error is recorded in the instancemanager.status.conditions. 6732 - @mantissahz @chriscchien
- [IMPROVEMENT] Add setting to disable node disk health monitoring 12300 - @derekbit @roger-ryao @Copilot
- [IMPROVEMENT] Avoid repeat engine restart when there are replica unavailable during migration 11397 - @yangchiu @shuo-wu
- [IMPROVEMENT] [Script] Minor script adjustments from PR #12177 12187 - @rauldsl @yangchiu
- [IMPROVEMENT] Check toolchain versions before generate k8s codes 12164 - @derekbit @roger-ryao
- [IMPROVEMENT] Create Volume UI improvement, Automatically Filter Data Source Based on v1 or v2 Selection 11846 - @yangchiu @houhoucoop
- [IMPROVEMENT] Disable the snapshot of v1 volume hashing while it is being deleted 10294 - @davidcheng0922 @chriscchien
- [IMPROVEMENT] Expose SPDK UBLK Parameters 11039 - @derekbit @PhanLe1010 @roger-ryao @Copilot
- [IMPROVEMENT] Check that block device is not in use before creating disk 12078 - @chriscchien @bachmanity1
- [UI][IMPROVEMENT] Awareness of when an offline replica rebuilding is triggered for an individual volume 11247 - @houhoucoop @roger-ryao
- [IMPROVEMENT] Ensure synchronized upgrades between longhorn-manager and instance-manager 12309 - @hookak @chriscchien
- [IMPROVEMENT] Add Resource Limits Configuration for Longhorn manager/instance-manager 12225 - @hookak @chriscchien
- [IMPROVEMENT] Add Validation Webhook to Volume Expansion When Node Disk Is Full 12134 - @yangchiu @davidcheng0922
- [UI][IMPROVEMENT] Expose SPDK UBLK Parameters 12166 - @houhoucoop @roger-ryao
- [IMPROVEMENT] Fix V2 Volume CSI Clone Slowness Caused by VolumeAttachment Webhook Blocking 12328 - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Use label-based state in metrics instead of numeric values 10723 - @hookak @roger-ryao
- [IMPROVEMENT] Add Resource Limits Configuration for CSI Components 12224 - @yangchiu @hookak @Copilot
- [IMPROVEMENT] Awareness of when an offline replica rebuilding is triggered for an individual volume 11246 - @yangchiu @mantissahz
- [IMPROVEMENT] Add loadBalancerClass value inside a helm chart for ui service 12273 - @ehpc @chriscchien
- [IMPROVEMENT] Add DNS round-robin load balancing to the pool of S3 addresses 12296 - @yangchiu
- [UI][IMPROVEMENT] Should Not Hide the Deleted Snapshots on UI 11620 - @yangchiu @houhoucoop
- [IMPROVEMENT] Helm chart Multiple TLS FQDNs 12127 - @yangchiu @hrabalvojta
- [IMPROVEMENT] Removing executables from mirrored-longhornio-longhorn-engine image 11254 - @derekbit @chriscchien
- [IMPROVEMENT] [DOC] Clarify replica auto-balance behavior for unhealthy and detached volumes 12002 - @roger-ryao @sushant-suse
- [IMPROVEMENT] CRD enum values 9718 - @roger-ryao @nzhan126
- [DOC] Troubleshooting KB Articles Fix Typos 12199 - @jmeza-xyz
- [IMPROVEMENT] Remove backupstore related settings 11026 - @nzhan126
- [IMPROVEMENT] Reject Trim Operation on Block Volume 12048 - @yangchiu @derekbit
- [IMPROVEMENT] Replace github.com/pkg/errors with github.com/cockroachdb/errors 11413 - @derekbit @chriscchien
- [UI][IMPROVEMENT] UI shows the backing image virtual size 11674 - @chriscchien @houhoucoop
- [IMPROVEMENT] Simplify locking in unsub and stream methods 12057 - @derekbit @NamrathShetty
- [UI][IMPROVEMENT] Show Error Message for Unschedulable Disks 11449 - @yangchiu @houhoucoop
- [IMPROVEMENT] The auto-delete-pod-when-volume-detached-unexpectedly should only focus on the kubernetes builtin workload. 12120 - @derekbit @chriscchien @sushant-suse
- [IMPROVEMENT] CSIStorageCapacity objects must show schedulable (allocatable) capacity 12014 ...