Open
Labels
- FEATURE-v2: This is a feature improvement that needs to be taken into consideration for v2
- enhancement: This was triaged as an enhancement
- keep: This won't be closed by the stale bot.
Description
Hi team,
I'd like to propose the addition of a new Prometheus metric to expose which node currently holds the reboot lock, for better observability and integration with monitoring systems.
Please consider adding a metric such as:
kured_current_node_lock{node="node-name"} = 1
This metric would:
- Be exposed on all nodes, regardless of whether they currently hold the lock.
- Allow Prometheus/Grafana to surface which node is actively coordinating a reboot.
- Help cluster operators quickly identify reboot activity across nodes.
Why This Is Useful:
- Debugging coordination issues: Knowing which node is holding the lock helps diagnose stuck or long reboots.
- Auditing: Helps confirm reboots are progressing as expected across rolling updates.
- Alerting: We can alert if a lock is held for too long or is stuck on a specific node.
It could be gated behind a feature flag or config option if needed. Since the locking mechanism is already in place via annotations or leases, this metric could easily map to the local node identity.
Happy to discuss further, and open to providing a PR once a design decision is made.
Thanks!