fix: Force Lease Expiration When Leader Exits #2379
base: master
Currently, when the leader exits (say, after receiving a `SIGINT`), the workers need to wait for its lease to expire before a new leader is elected. This patch mimics how the Go client implementation uses `ctx.Done()`: it captures the `SIGINT`, forces the lease expiration to a date in the past, and sets `acquire_time` to `None` to start a new leader election.
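To make the mechanism concrete, here is a minimal, self-contained sketch of the approach described above. It does not use the real client. `LeaseRecord`, `InMemoryLock`, `try_acquire`, `make_sigint_handler` and `install_sigint_handler` are hypothetical stand-ins, `force_expire_lease` is named after the patch's function but is not its code, and the single `expiration` field is a simplification of how the real record tracks lease validity. On `SIGINT`, the handler moves the holder's expiration into the past and clears `acquire_time`, so the next candidate that polls sees a free lease.

```python
import datetime
import signal
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseRecord:
    # Hypothetical, simplified election record (not the client's class).
    holder_identity: str
    acquire_time: Optional[datetime.datetime]
    expiration: datetime.datetime


class InMemoryLock:
    """Hypothetical lock holding one record in memory instead of a Kubernetes object."""

    def __init__(self) -> None:
        self.record: Optional[LeaseRecord] = None


def try_acquire(lock: InMemoryLock, identity: str, now: datetime.datetime,
                lease_duration: datetime.timedelta) -> bool:
    """Take the lease if it is unheld, expired, or has no acquire_time; renew it if it is ours."""
    rec = lock.record
    renewing = (rec is not None and rec.holder_identity == identity
                and rec.acquire_time is not None and rec.expiration > now)
    free = rec is None or rec.acquire_time is None or rec.expiration <= now
    if not (free or renewing):
        return False
    acquire_time = rec.acquire_time if renewing else now
    lock.record = LeaseRecord(identity, acquire_time, now + lease_duration)
    return True


def force_expire_lease(lock: InMemoryLock, identity: str) -> None:
    """Mimic the patch: move the expiration into the past and clear acquire_time."""
    rec = lock.record
    if rec is None or rec.holder_identity != identity:
        return  # only the current holder may expire its own lease
    rec.expiration = datetime.datetime.now() - datetime.timedelta(seconds=1)
    rec.acquire_time = None


def make_sigint_handler(lock: InMemoryLock, identity: str):
    """Build a SIGINT handler that expires our lease and then exits."""
    def handler(signum, frame):
        force_expire_lease(lock, identity)
        raise SystemExit(0)
    return handler


def install_sigint_handler(lock: InMemoryLock, identity: str) -> None:
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGINT, make_sigint_handler(lock, identity))
```

A leader would call `install_sigint_handler(lock, "node-a")` once from its main thread after winning the lease; pressing Ctrl+C then lets the next `try_acquire` from another identity succeed immediately instead of after `lease_duration`.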
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: RaghavRoy145.
Welcome @RaghavRoy145!
/assign @yliaog
Oops, I was supposed to do that after the reviews 🙃
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, when the leader exits (say, after receiving a `SIGINT`), the workers need to wait for its lease to expire before a new leader is elected. This patch mimics the behaviour of the Go client implementation, which uses `ctx.Done()`: https://github.com/kubernetes/client-go/blob/1309f64d6648411b4a36a2f7fa84dd8df31884b6/tools/leaderelection/leaderelection.go#L265-L291. It captures the `SIGINT` and forces the lease to expire by setting the expiration to a date in the past, and it also sets `acquire_time` to `None` to force a leader election.
Issue Reproduction
As described in issue #2075 ("leaderelection do not stop leading properly"), you can reproduce this issue with `leaderelection/example.py`. Run it on 2-3 nodes (or tmux screens), and once a leader is elected, press `Ctrl+C` to make the leader exit. The workers then wait for the leader's lease to expire before a new leader is elected.
Expected behavior
When the leader exits, a new leader election should start immediately, without the workers having to wait for the lease to expire.
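The cost of the current behaviour can be estimated with a toy model (not the real client): a follower polls every `retry_period` seconds, and the leader stops renewing at `leader_exit_at`. The function `takeover_time`, its parameters, and the fixed-interval polling are assumptions made for illustration only.

```python
def takeover_time(lease_expires_at: float, leader_exit_at: float,
                  retry_period: float, force_expire: bool) -> float:
    """Return the first poll time at which a follower finds the lease free.

    Toy model on a simulated clock (seconds): the follower polls at
    retry_period, 2 * retry_period, ...; the leader stops renewing at
    leader_exit_at. With force_expire=True the leader rewrites its lease on
    exit so that it expires at that instant, as this patch does.
    """
    if retry_period <= 0:
        raise ValueError("retry_period must be positive")
    expires_at = leader_exit_at if force_expire else lease_expires_at
    t = 0.0
    while True:
        t += retry_period
        # The lease is only up for grabs once the leader is gone and it has expired.
        if t >= leader_exit_at and t >= expires_at:
            return t
```

For a lease renewed at t=0 with a 15 s duration, a leader that exits at t=3, and a 2 s poll interval, `takeover_time(15.0, 3.0, 2.0, force_expire=False)` returns `16.0`, while `force_expire=True` returns `4.0`: the follower takes over at its next poll instead of waiting out the lease.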
Which issue(s) this PR fixes:
Fixes #2075
Special notes for your reviewer:
This is still not a complete fix. It is definitely hacky at the moment, and I would love any guidance here! Currently, the patch only handles `SIGINT`, but a leader may exit for various reasons, and there should be a more elegant way of handling this, probably using the thread context, but I was not able to figure that out. Further, the implementation of the `force_expire_lease()` function is not elegant: you shouldn't need to set `acquire_time` to `None`, and setting `expiration` to the past is also a code smell in my opinion. Because of this, the patch is a proof of concept.
I also had to change the imports to point to my definitions of `electionconfig.py` and `leaderelectionrecord.py` for this to work, and I'm sure there is a better way of handling this.
If it's more sensible to mark this PR as a draft, I'm happy to do so!
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: