Each shard has a 1-1 correspondence with a Raft group. We have R replicas of each shard (including the leader), where R is the replication factor.
The leader replica for each shard is determined by which replica won the last leadership election; this is timing dependent and effectively non-deterministic.
If we have a three node cluster and start all nodes at the same time, we should have a roughly even distribution of leaders across the nodes.
However, if we start nodes one by one, we will likely find that later nodes don't end up with any leaders, as they are joining shard groups that already have leaders. Also, if we kill nodes and then restart them, it's likely the restarted nodes will simply rejoin existing Raft groups without a leader election, resulting in no leaders on the restarted node.
After #511 is merged we will pin processors to Raft leaders and they will be mobile across the cluster. If leaders are poorly distributed across the cluster then so are processors, and therefore processing load is not evenly distributed. This will likely result in reduced overall processing capacity and poor scalability.
To remedy this issue we should create a new component called Rebalancer.
It will be the job of this component to periodically monitor the distribution of leaders across nodes and, if there is a significant imbalance, to direct Dragonboat to transfer leadership of a Raft group from one node to another.
Dragonboat has a function RequestLeaderTransfer on NodeHost that looks like it can be used to initiate a transfer.
The procManager struct maintains the leader state on each node - the Rebalancer can enquire here to get the state.
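A minimal sketch of the transfer side, assuming Dragonboat v3's NodeHost API (the exact shape of the procManager query is not shown here, so only the Dragonboat call is illustrated):

```go
package rebalancer

import (
	"github.com/lni/dragonboat/v3"
)

// transferLeader asks Dragonboat to move leadership of the Raft group for
// shardID to the replica running as targetNodeID. The call only initiates
// the transfer; Raft completes it asynchronously and may not carry it out
// if the target replica is not sufficiently up to date.
func transferLeader(nh *dragonboat.NodeHost, shardID uint64, targetNodeID uint64) error {
	return nh.RequestLeaderTransfer(shardID, targetNodeID)
}
```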
We should make sure we don't trigger too many transfers.
Consider a heuristic where we initiate transfers when an imbalance threshold has been crossed. We can compute a measure of imbalance.
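One possible measure, sketched here under the assumption that the Rebalancer can obtain a per-node leader count from procManager (the leaderCounts map is illustrative), is the spread between the most and least loaded nodes; a transfer would only be considered once the spread crosses a configured threshold:

```go
// imbalance returns the difference between the highest and lowest leader
// counts across nodes. A value of 0 or 1 means leaders are already spread
// as evenly as possible.
func imbalance(leaderCounts map[string]int) int {
	if len(leaderCounts) == 0 {
		return 0
	}
	min, max := -1, 0
	for _, c := range leaderCounts {
		if min == -1 || c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	return max - min
}
```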
If there are a lot of leadership elections going on, e.g. at startup, shutdown or if a node has died or restarted, then don't initiate a transfer. A heuristic could be something like "only consider a transfer if there have been no leadership changes for X seconds"
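A rough sketch of that quiet-period check (the type and method names here are illustrative, not an existing API):

```go
import (
	"sync"
	"time"
)

// quietTracker records the time of the last observed leadership change; the
// Rebalancer declines to initiate transfers until the cluster has been
// stable for minQuietPeriod.
type quietTracker struct {
	mu               sync.Mutex
	lastLeaderChange time.Time
	minQuietPeriod   time.Duration // the "X seconds" above, e.g. 30 * time.Second
}

func (q *quietTracker) noteLeaderChange() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.lastLeaderChange = time.Now()
}

func (q *quietTracker) clusterQuiet() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	return time.Since(q.lastLeaderChange) >= q.minQuietPeriod
}
```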
The Rebalancer should implement the Service interface and be started/stopped like the other services in Server.
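A skeleton of what that could look like; the Start/Stop signatures are assumptions since the exact Service interface isn't shown in this issue:

```go
import (
	"sync"
	"time"
)

// Rebalancer periodically checks the leader distribution and, when needed,
// asks Dragonboat to transfer leadership. It runs a single background loop
// between Start and Stop.
type Rebalancer struct {
	interval time.Duration
	stopCh   chan struct{}
	done     sync.WaitGroup
}

func NewRebalancer(interval time.Duration) *Rebalancer {
	return &Rebalancer{interval: interval, stopCh: make(chan struct{})}
}

func (r *Rebalancer) Start() error {
	r.done.Add(1)
	go func() {
		defer r.done.Done()
		ticker := time.NewTicker(r.interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				r.checkAndRebalance()
			case <-r.stopCh:
				return
			}
		}
	}()
	return nil
}

func (r *Rebalancer) Stop() error {
	close(r.stopCh)
	r.done.Wait()
	return nil
}

func (r *Rebalancer) checkAndRebalance() {
	// Query leader counts, check the quiet-period heuristic and the imbalance
	// threshold, then call RequestLeaderTransfer for a selected shard if needed.
}
```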