Resend Replication Requests After a Timeout #143
}

// ----------------------------------------------------------------------
type TaskHeap []*TimeoutTask
If we assume that time moves only forward and not backwards, and we always add a task with the same timeout, then why do we need a heap and not a list? Is it ever possible that we add an element and it won't be the last one?
I was thinking this gives us more flexibility. If we decide to send requests with different timeouts (for example, exponential backoff on certain nodes), then a heap will order them properly.
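To illustrate the point about mixed timeouts, here is a minimal sketch using Go's `container/heap`. The `TimeoutTask` fields (`NodeID`, `Deadline`) and the `backoffOrder` helper are assumptions made for the example, not the PR's actual definitions. It shows a task added later with a shorter timeout being popped first, which an append-only list would not do.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// TimeoutTask is a hypothetical stand-in for the PR's type; only the
// fields needed to show ordering are included.
type TimeoutTask struct {
	NodeID   string
	Deadline time.Time
}

// TaskHeap is a min-heap ordered by Deadline (earliest deadline first).
type TaskHeap []*TimeoutTask

func (h TaskHeap) Len() int           { return len(h) }
func (h TaskHeap) Less(i, j int) bool { return h[i].Deadline.Before(h[j].Deadline) }
func (h TaskHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *TaskHeap) Push(x any)        { *h = append(*h, x.(*TimeoutTask)) }
func (h *TaskHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*h = old[:n-1]
	return t
}

// backoffOrder adds a task with a long (backed-off) timeout first and a
// task with a short timeout second, then returns the order in which the
// heap hands them back.
func backoffOrder() []string {
	start := time.Unix(0, 0)
	h := &TaskHeap{}
	heap.Push(h, &TimeoutTask{NodeID: "backed-off", Deadline: start.Add(4 * time.Second)})
	heap.Push(h, &TimeoutTask{NodeID: "default", Deadline: start.Add(1 * time.Second)})

	var ids []string
	for h.Len() > 0 {
		ids = append(ids, heap.Pop(h).(*TimeoutTask).NodeID)
	}
	return ids
}

func main() {
	fmt.Println(backoffOrder()) // prints "[default backed-off]"
}
```

With a single fixed timeout and a monotonic clock, insertion order and deadline order coincide, so a plain list would behave the same; the heap only matters once timeouts differ.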
}
}

// this is done from normalNode2 since the lagging node will request
Why is this normalNode2 and not the first node? We don't have a time advancement yet, so shouldn't the lagging node ask for the first sequences from the first node?
The first node has its filter set to reject replication requests, but in order for the lagging node to ask from the second node, time needs to pass in the test, no?
This is helpful to wait for normalNode2 to send out its first replication response before we get its second response. Otherwise, the test would occasionally flake.
Left some nits, LGTM otherwise.
This PR introduces a TimeoutHandler that tracks outgoing replication requests and resends them once a timeout has occurred.
The logic for handling incomplete requests is as follows:
TimeoutHandler
ReplicationResponse
we check if the response data contains all the expected sequences
This approach is nice because it keeps the logic fairly simple. However, because we remove only one TimeoutTask each time we receive a response, we could send duplicate replication requests over the network. For example, say we have two TimeoutTasks for a node: one expects seqs 5-10 and the other expects seqs 11-15. If we receive seqs 5-13 from the node, we will only remove the timeout for 5-10 from the tasks map, and resend a request for 11-15 after a timeout. To keep the logic simpler, I chose not to worry about this case, since it's highly unlikely that a node would have two separate timeouts for consecutive sequences.
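The duplicate-request case above can be sketched as follows. This is an illustration only: `seqRange`, `handler`, and the rule that a response clears the first task whose range it fully covers are assumptions made for the example, and a slice stands in for the tasks map so the result is deterministic.

```go
package main

import "fmt"

// seqRange is a hypothetical inclusive range of sequence numbers.
type seqRange struct{ Start, End uint64 }

// handler is a hypothetical stand-in for the TimeoutHandler; it keeps the
// outstanding requests in the order they were sent.
type handler struct {
	tasks []seqRange
}

// onResponse removes at most one task: the first whose expected range is
// fully covered by the received range.
func (h *handler) onResponse(got seqRange) {
	for i, want := range h.tasks {
		if got.Start <= want.Start && want.End <= got.End {
			h.tasks = append(h.tasks[:i], h.tasks[i+1:]...)
			return // only one task is removed per response
		}
	}
}

// resendOnTimeout returns the ranges that would be requested again.
func (h *handler) resendOnTimeout() []seqRange {
	return append([]seqRange(nil), h.tasks...)
}

// duplicateExample replays the scenario from the description: tasks for
// 5-10 and 11-15, followed by a single response carrying 5-13.
func duplicateExample() []seqRange {
	h := &handler{tasks: []seqRange{{5, 10}, {11, 15}}}
	h.onResponse(seqRange{5, 13})
	return h.resendOnTimeout()
}

func main() {
	fmt.Println(duplicateExample()) // prints "[{11 15}]"
}
```

In this sketch seqs 11-13 are requested a second time even though they already arrived, which is the redundancy the description accepts in exchange for simpler bookkeeping.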