Feature Request: Expose Prometheus metric for current lock holder node #1156

@localleon

Description

Hi team,

I'd like to propose the addition of a new Prometheus metric to expose which node currently holds the reboot lock, for better observability and integration with monitoring systems.

Please consider adding a metric such as:

kured_current_node_lock{node="node-name"} = 1

This metric would:

  • Be exposed on all nodes, regardless of whether they currently hold the lock.
  • Allow Prometheus/Grafana to surface which node is actively coordinating a reboot.
  • Help cluster operators quickly identify reboot activity across nodes.
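To make the proposal concrete, here is a minimal sketch of what each node's `/metrics` output could look like. It assumes one possible reading of the request: every node reports the gauge with its own name in the `node` label, with value 1 if it holds the lock and 0 otherwise. The function names (`renderLockMetric`, `escapeLabelValue`) and the HELP text are hypothetical and not taken from kured. The sketch uses only the Go standard library to show the exposition format, whereas a real implementation would presumably register a gauge through a Prometheus client library.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeLabelValue escapes a label value for the Prometheus text exposition
// format, where backslash, double quote, and newline must be escaped.
func escapeLabelValue(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return r.Replace(v)
}

// renderLockMetric returns the exposition-format lines for the proposed gauge.
// localNode is the node this instance runs on; lockHolder is the node that
// currently holds the reboot lock ("" when nobody holds it).
// Assumption: the gauge is 1 on the holder and 0 on every other node.
func renderLockMetric(localNode, lockHolder string) string {
	value := 0
	if lockHolder != "" && localNode == lockHolder {
		value = 1
	}
	return fmt.Sprintf(
		"# HELP kured_current_node_lock Whether this node currently holds the reboot lock.\n"+
			"# TYPE kured_current_node_lock gauge\n"+
			"kured_current_node_lock{node=\"%s\"} %d\n",
		escapeLabelValue(localNode), value)
}

func main() {
	// node-a holds the lock, so it reports 1 and node-b reports 0.
	fmt.Print(renderLockMetric("node-a", "node-a"))
	fmt.Print(renderLockMetric("node-b", "node-a"))
}
```

Reporting an explicit 0 on non-holders (rather than omitting the series) keeps the time series continuous, which tends to make "held for too long" alerts simpler to write. The alternative reading, where every node reports the current holder's name in the label, would work too but changes the label's meaning.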

Why This Is Useful:

  • Debugging coordination issues: Knowing which node is holding the lock helps diagnose stuck or long reboots.
  • Auditing: Helps confirm reboots are progressing as expected across rolling updates.
  • Alerting: We can alert if a lock is held for too long or is stuck on a specific node.

It could be gated behind a feature flag or config option if needed. Since the locking mechanism is already in place via annotations or leases, this metric could easily map to the local node identity.

Happy to discuss further, and open to providing a PR once a design decision is made.
Thanks!

Metadata

Assignees

No one assigned

    Labels

    FEATURE-v2: This is a feature improvement that needs to be taken into consideration for v2
    enhancement: This was triaged as an enhancement
    keep: This won't be closed by the stale bot.
