Description
Name and Version
bitnami/redis-cluster 9.1.3
What is the problem this feature will solve?
When applying a redis-cluster configuration change or upgrade (basically anything that triggers a new rollout of the StatefulSet), Redis pods are terminated and restarted one by one, but Redis itself is not told that this is happening. The cluster state is therefore outdated until the other nodes notice that the pod went down. Tuning cluster-node-timeout shortens that delay but does not remove it. In the meantime, clients may be confused and keep trying to reach the node being shut down, especially if it is a master.
We'd like to limit the impact of these events by telling Redis that the node is about to shut down, so the cluster can update its map before the pod goes away. Of course, this won't help when a node becomes unavailable unexpectedly, but we think such events are much less frequent than maintenance updates. 🙂
What is the feature you are proposing to solve the problem?
Add a preStop hook to redis-cluster pods that does the following:
- Check whether the node being shut down is a Redis master (https://redis.io/commands/role/). If it is a replica ("slave"), do nothing and exit; otherwise, continue.
- Look up the replicas of the current node (https://redis.io/commands/cluster-replicas/, using the node ID returned by https://redis.io/commands/cluster-myid/).
- Pick a random available replica (one whose link state is "connected", so we know there is an active connection) and ask it to take over with https://redis.io/commands/cluster-failover/.
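A minimal Python sketch of the steps above, for illustration only. It assumes a hypothetical `Executor` callable that sends one Redis command and returns the raw decoded reply (ROLE as an array, CLUSTER MYID as a string, CLUSTER REPLICAS as a list of CLUSTER NODES-style lines), plus a hypothetical `connect(host, port)` factory used to reach the chosen replica. The names `is_master`, `connected_replicas` and `pre_stop` are illustrative and not part of the chart.

```python
import random
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical: sends one command to a node and returns the raw decoded reply.
Executor = Callable[..., Any]


def _text(value: Any) -> str:
    """Normalize a bytes or str reply element to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def is_master(execute: Executor) -> bool:
    """ROLE replies with an array whose first element is the role name."""
    return _text(execute("ROLE")[0]) == "master"


def connected_replicas(lines: List[Any]) -> List[Tuple[str, int]]:
    """Return (host, port) of every replica whose link-state is 'connected'.

    Each CLUSTER REPLICAS line uses the CLUSTER NODES format:
    <id> <ip:port@cport[,hostname]> <flags> <master> <ping-sent>
    <pong-recv> <config-epoch> <link-state> ...
    """
    result = []
    for raw in lines:
        fields = _text(raw).split()
        if len(fields) < 8 or fields[7] != "connected":
            continue
        address = fields[1].split("@")[0]        # drop "@cport[,hostname]"
        host, _, port = address.rpartition(":")  # rpartition also handles IPv6
        result.append((host, int(port)))
    return result


def pre_stop(execute: Executor,
             connect: Callable[[str, int], Executor],
             rng: Optional[random.Random] = None) -> Optional[Tuple[str, int]]:
    """Ask a random connected replica to take over; return it, or None."""
    if not is_master(execute):
        return None  # this node is a replica: nothing to do
    my_id = _text(execute("CLUSTER", "MYID"))
    candidates = connected_replicas(execute("CLUSTER", "REPLICAS", my_id))
    if not candidates:
        return None  # no connected replica can take over
    host, port = (rng or random).choice(candidates)
    # CLUSTER FAILOVER must be sent to the replica that should become master.
    connect(host, port)("CLUSTER", "FAILOVER")
    return host, port
```

Note that CLUSTER FAILOVER only starts the failover and replies before it has completed, so a real hook would likely poll ROLE (or CLUSTER NODES) until this node reports itself as a replica, with a timeout below the pod's termination grace period, before letting the container exit.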
What alternatives have you considered?
Tweaking cluster-node-timeout and cluster-replica-validity-factor still leaves the cluster unavailable for some time.