Description
Current Behaviour
When the disk pool used for creating physical volumes (via Longhorn) becomes fully allocated, new pods cannot be created. However, the Kubernetes autoscaler does not scale up the cluster. This likely happens because the affected pods remain in the ContainerCreating state rather than transitioning to Pending, which prevents the autoscaler from recognizing the need for additional resources.
This results in a deadlock: pods are stuck in an unschedulable state, and the autoscaler does not react.
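To make the mechanism concrete, here is a minimal sketch of the trigger condition the upstream cluster-autoscaler uses, as I understand it: it only reacts to pods whose `PodScheduled` condition is `False` with reason `Unschedulable`. A pod stuck in `ContainerCreating` is already bound to a node, so it never matches that filter. The function and pod dictionaries below are illustrative models, not the actual autoscaler code:

```python
# Sketch: which pods trigger a cluster-autoscaler scale-up?
# The autoscaler reacts to *unschedulable* pods (PodScheduled=False,
# reason=Unschedulable). A pod stuck in ContainerCreating is already
# bound to a node, so it never matches this filter.
# (Illustrative model only, not the real autoscaler implementation.)

def triggers_scale_up(pod: dict) -> bool:
    """Return True if the pod looks unschedulable to the autoscaler."""
    for cond in pod.get("conditions", []):
        if (cond["type"] == "PodScheduled"
                and cond["status"] == "False"
                and cond.get("reason") == "Unschedulable"):
            return True
    return False

# A pod the scheduler could not place -> autoscaler scales up.
pending_pod = {
    "phase": "Pending",
    "conditions": [{"type": "PodScheduled", "status": "False",
                    "reason": "Unschedulable"}],
}

# A pod bound to a node but waiting on its Longhorn volume
# (shown as ContainerCreating) -> autoscaler ignores it.
stuck_pod = {
    "phase": "Pending",
    "conditions": [{"type": "PodScheduled", "status": "True"}],
}

print(triggers_scale_up(pending_pod))  # True
print(triggers_scale_up(stuck_pod))    # False
```

This is why the deadlock occurs: the volume-starved pods are technically scheduled, so from the autoscaler's point of view there is nothing to scale for.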
Expected Behaviour
The autoscaler should detect that Longhorn cannot provision a persistent volume (PV) and should trigger a scale-up to accommodate the new workload.
Steps To Reproduce
- Create a Claudie cluster with autoscaling enabled and attached storage disks:

  ```yaml
  ...
  - name: computes
    providerSpec:
      name: hetzner
      region: fsn1
      zone: fsn1-dc14
    serverType: cpx21
    image: ubuntu-24.04
    autoscaler:
      min: 1
      max: 4
    storageDiskSize: 50
  ...
  ```
- Create a test workload to simulate load:

  ```yaml
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: sleeper
    namespace: default
  spec:
    serviceName: sleeper
    replicas: 20
    selector:
      matchLabels:
        app: sleeper
    template:
      metadata:
        labels:
          app: sleeper
      spec:
        containers:
          - name: stress
            image: progrium/stress
            command: ["sh", "-c"]
            args:
              - |
                echo "Filling volume with 1.7Gi. Does not support status=progress";
                dd if=/dev/urandom of=/data/fillfile bs=1M count=1700;
                echo "Starting CPU stress test";
                stress --cpu 1 --timeout 86400s
            resources:
              limits:
                cpu: "0.2"
            volumeMounts:
              - name: sleeper-volume
                mountPath: /data
    volumeClaimTemplates:
      - metadata:
          name: sleeper-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: longhorn
          resources:
            requests:
              storage: 2Gi
  ```
- Observe the status of the Kubernetes cluster:

  ```sh
  k top node
  k get pod --watch
  k events sts sleeper
  ```

  At some point, a new node will be created. Longhorn will allocate space on that node (sometimes consuming the majority of the available disk space for volume replicas, depending on replica size).
- Eventually, volume creation will fail with an error:

  ```
  20m                  Normal   Scheduled           Pod/sleeper-17  Successfully assigned default/sleeper-17 to computes-0e1x9ut-02
  2m2s (x17 over 20m)  Warning  FailedAttachVolume  Pod/sleeper-17  AttachVolume.Attach failed for volume "pvc-2ca81235-d871-49c4-992d-595b02c09714" : rpc error: code = Aborted desc = volume pvc-2ca81235-d871-49c4-992d-595b02c09714 is not ready for workloads
  ```
- In the Longhorn UI, you'll see that all nodes are fully allocated. Longhorn cannot create or attach new PVs because of the 30% reserved disk space and the lack of remaining capacity.
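The capacity arithmetic behind this can be sketched as follows. The 30% reservation and the 3-replicas-per-volume figure used here are Longhorn's defaults, and the model ignores over-provisioning settings and filesystem overhead, so check your installation's actual values:

```python
# Rough model of Longhorn's schedulable capacity under the repro
# config above. Assumptions (hypothetical, verify against your
# cluster): 30% of each disk reserved (Longhorn default) and
# 3 replicas per volume (Longhorn default). Over-provisioning
# and filesystem overhead are ignored.

GIB = 1024 ** 3

def schedulable_bytes(disk_gib: float, reserved_pct: float = 30.0) -> float:
    """Space Longhorn may allocate for replicas on one storage disk."""
    return disk_gib * GIB * (1 - reserved_pct / 100)

def max_volumes(nodes: int, disk_gib: float, volume_gib: float,
                replicas: int = 3, reserved_pct: float = 30.0) -> int:
    """Upper bound on volumes that fit across the cluster."""
    total = nodes * schedulable_bytes(disk_gib, reserved_pct)
    per_volume = replicas * volume_gib * GIB
    return int(total // per_volume)

# 50 GiB storage disks, 2 GiB volumes, 3 replicas each:
for nodes in (1, 2, 4):
    print(nodes, "node(s):", max_volumes(nodes, disk_gib=50, volume_gib=2))
```

Under these assumptions a single node fits only a handful of volumes, so the 20-replica StatefulSet cannot be satisfied until more nodes exist, yet the autoscaler never adds them because the pods are not marked unschedulable.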
Expected results
The Kubernetes autoscaler should detect that pods are unschedulable due to storage constraints and provision additional nodes to resolve the issue. This may require tuning the autoscaler to also consider disk pressure or persistent-volume provisioning failures.