Each shard has a 1-1 correspondence with a Raft group. We have R replicas of each shard (including the leader), where R is the replication factor.
The leader replica for each shard is determined by which replica won the last leadership election; this is timing dependent and effectively non-deterministic.
If we have a three node cluster and start all nodes at the same time, we should have a roughly even distribution of leaders across the nodes.
However, if we start nodes one by one, we will likely find that later nodes don't end up with any leaders, as they are joining shard groups that already have leaders. Also, if we kill nodes and then restart them, it's likely the restarted nodes will simply rejoin existing Raft groups without a leader election, resulting in no leaders on the restarted node.
After #511 is merged we will pin processors to Raft leaders and they will be mobile across the cluster. If leaders are poorly distributed across the cluster then so are processors, and therefore processing load is not evenly distributed. This will likely result in reduced overall processing capacity and poor scalability.
To remedy this issue we should create a new component called Rebalancer.
It will be the job of this component to periodically monitor the distribution of leaders across nodes and, if there is a significant imbalance, to direct Dragonboat to transfer leadership of a Raft group from one node to another.
Dragonboat has a function RequestLeaderTransfer on NodeHost that looks like it can be used to initiate a transfer.
The procManager struct maintains the leader state on each node - the Rebalancer can enquire here to get the state.
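A minimal sketch of the transfer side, assuming Dragonboat v3's NodeHost API (the exact shape of the procManager query is not shown here, so only the Dragonboat call is illustrated):

```go
package rebalancer

import (
	"github.com/lni/dragonboat/v3"
)

// transferLeader asks Dragonboat to move leadership of the Raft group for
// shardID to the replica running as targetNodeID. The call only initiates
// the transfer; Raft completes it asynchronously and may not carry it out
// if the target replica is not sufficiently up to date.
func transferLeader(nh *dragonboat.NodeHost, shardID uint64, targetNodeID uint64) error {
	return nh.RequestLeaderTransfer(shardID, targetNodeID)
}
```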
We should make sure we don't trigger too many transfers.
Consider a heuristic where we initiate transfers when an imbalance threshold has been crossed. We can compute a measure of imbalance.
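One possible measure, sketched here under the assumption that the Rebalancer can obtain a per-node leader count from procManager (the leaderCounts map is illustrative), is the spread between the most and least loaded nodes; a transfer would only be considered once the spread crosses a configured threshold:

```go
// imbalance returns the difference between the highest and lowest leader
// counts across nodes. A value of 0 or 1 means leaders are already spread
// as evenly as possible.
func imbalance(leaderCounts map[string]int) int {
	if len(leaderCounts) == 0 {
		return 0
	}
	min, max := -1, 0
	for _, c := range leaderCounts {
		if min == -1 || c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	return max - min
}
```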
If there are a lot of leadership elections going on, e.g. at startup, shutdown or if a node has died or restarted, then don't initiate a transfer. A heuristic could be something like "only consider a transfer if there have been no leadership changes for X seconds"
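A rough sketch of that quiet-period check (the type and method names here are illustrative, not an existing API):

```go
import (
	"sync"
	"time"
)

// quietTracker records the time of the last observed leadership change; the
// Rebalancer declines to initiate transfers until the cluster has been
// stable for minQuietPeriod.
type quietTracker struct {
	mu               sync.Mutex
	lastLeaderChange time.Time
	minQuietPeriod   time.Duration // the "X seconds" above, e.g. 30 * time.Second
}

func (q *quietTracker) noteLeaderChange() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.lastLeaderChange = time.Now()
}

func (q *quietTracker) clusterQuiet() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	return time.Since(q.lastLeaderChange) >= q.minQuietPeriod
}
```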
The Rebalancer should implement the Service interface and be started/stopped like the other services in Server.
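A skeleton of what that could look like; the Start/Stop signatures are assumptions since the exact Service interface isn't shown in this issue:

```go
import (
	"sync"
	"time"
)

// Rebalancer periodically checks the leader distribution and, when needed,
// asks Dragonboat to transfer leadership. It runs a single background loop
// between Start and Stop.
type Rebalancer struct {
	interval time.Duration
	stopCh   chan struct{}
	done     sync.WaitGroup
}

func NewRebalancer(interval time.Duration) *Rebalancer {
	return &Rebalancer{interval: interval, stopCh: make(chan struct{})}
}

func (r *Rebalancer) Start() error {
	r.done.Add(1)
	go func() {
		defer r.done.Done()
		ticker := time.NewTicker(r.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				r.checkAndRebalance()
			case <-r.stopCh:
				return
			}
		}
	}()
	return nil
}

func (r *Rebalancer) Stop() error {
	close(r.stopCh)
	r.done.Wait()
	return nil
}

func (r *Rebalancer) checkAndRebalance() {
	// Query leader counts, check the quiet-period heuristic and the imbalance
	// threshold, then call RequestLeaderTransfer for a selected shard if needed.
}
```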