@jlevesy commented Jun 7, 2023

What Does This PR do?

So a couple of things were wrong in how the termination was happening. This PR addresses the following issues.

  • The context wasn't canceled on SIGTERM, which means prometheus-elector was never properly stopped and never left the election.
  • I used Fatal, which is a mistake because it does not honor scheduled defers, introducing a risk of not leaving the election. Never use Fatal, kids.
  • The notifier was retrying even when the given context was canceled.
  • Better logs
  • Couple of DX improvements (better makefile targets, preflight checks for tooling, better requirements)
  • Lowers the default lease period to 10s

This leads to much better failover time on graceful shutdowns.

How to Test This PR?

make run # then kill pods and look for the logs

Good PR Checklist

  • Addresses one issue
  • Adds/Updates unit tests
  • Adds/Updates the documentation
  • Opened against the right branch
  • Correctly Labeled

@jlevesy force-pushed the jl/quality-of-live branch from 103f006 to 8dd3dec on June 7, 2023 15:57
@jlevesy force-pushed the jl/quality-of-live branch from 8dd3dec to a2bf9fe on June 7, 2023 16:01
@jlevesy merged commit 466c337 into main on Jun 7, 2023
@jlevesy deleted the jl/quality-of-live branch on June 7, 2023 16:03