Oz Heartbeating #4
I think the first version is better. It does not make sense to me to request heartbeats. We know we'll always need some, so assume the other node wants us to send some.
Why? Why not make failure detection work at the level of nodes? When a node's status changes, then change the status of all entities associated with that node. We don't need an instance of the FD for every entity, do we?
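To make the node-level idea concrete, here is a minimal sketch in Python (not Oz, and not the Mozart 2 implementation). The class `NodeFaultTable` and its method names are hypothetical; the only thing taken from the discussion is that suspicion is tracked once per node and every entity hosted on that node inherits it.

```python
from collections import defaultdict


class NodeFaultTable:
    """Hypothetical table: one suspicion state per node, shared by its entities."""

    def __init__(self):
        self.node_status = {}                      # node id -> 'ok' or 'tempFail'
        self.entities_by_node = defaultdict(set)   # node id -> entity names

    def register(self, entity, node):
        """Record that `entity` lives on `node`; new nodes start as 'ok'."""
        self.entities_by_node[node].add(entity)
        self.node_status.setdefault(node, 'ok')

    def set_node_status(self, node, status):
        """Called by the single per-node failure detector.

        Returns the set of entities whose status changed as a result.
        """
        if self.node_status.get(node) == status:
            return set()
        self.node_status[node] = status
        return set(self.entities_by_node[node])

    def entity_status(self, entity):
        """Look up an entity's status through the node that hosts it."""
        for node, entities in self.entities_by_node.items():
            if entity in entities:
                return self.node_status[node]
        raise KeyError(entity)


table = NodeFaultTable()
table.register('A', 'n1')
table.register('B', 'n1')
table.register('C', 'n2')
affected = table.set_node_status('n1', 'tempFail')
```

With this layout a single detector per node is enough: one call to `set_node_status` covers every entity on that node, and entities on other nodes are untouched.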
I think your understanding of permFail is wrong. IIRC, permFail is never set by the FD. It can only be set explicitly through the Kill operation. The FD is there only to switch back and forth between 'ok' and 'tempFail'. It should be in Raphaël Collet's thesis. I'll try and find a pointer. Oh and... One of the critical design goals of Mozart 2 was to be able to write the DSS entirely in Oz. Make sure you do ^^ If you have trouble figuring out how it can be done, let me know. But basically it is supported by the following three core mechanisms:
I understand the need for fast failure detection, but isn't this very
It is possible to skip a heartbeat if a useful message was sent not long ago, and to treat any useful message received as a heartbeat as well. I do not know how many messages are saved by doing so, but IIRC it is indeed a common optimization.
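A minimal sketch of that optimization, in Python rather than Oz. The class `HeartbeatLink`, its parameters `interval` and `timeout`, and the explicit `now` argument are all assumptions made for illustration; nothing here is taken from the actual DSS code.

```python
class HeartbeatLink:
    """Hypothetical per-peer bookkeeping for piggybacked heartbeats."""

    def __init__(self, interval, timeout, now):
        self.interval = interval      # max silence before we must send something
        self.timeout = timeout        # max silence from the peer before suspicion
        self.last_sent = now
        self.last_received = now

    def on_send(self, now):
        """Call for every outgoing message, useful or pure heartbeat."""
        self.last_sent = now

    def on_receive(self, now):
        """Call for every incoming message; any message counts as a heartbeat."""
        self.last_received = now

    def heartbeat_due(self, now):
        """True only if nothing at all was sent during the last interval."""
        return now - self.last_sent >= self.interval

    def peer_suspected(self, now):
        """True if the peer has been silent for longer than the timeout."""
        return now - self.last_received > self.timeout


link = HeartbeatLink(interval=1.0, timeout=3.0, now=0.0)
link.on_send(1.5)                     # a useful message goes out at t=1.5
skip_at_2 = not link.heartbeat_due(2.0)
due_at_2_5 = link.heartbeat_due(2.5)
```

The dedicated heartbeat is only emitted when `heartbeat_due` returns true, so a busy link sends no extra traffic at all.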
Hi Ozers, Very interesting discussion. Please find my comments below. On Fri, Feb 15, 2013 at 7:23 PM, Sébastien Doeraene <
also am I missing anything?
voilà,
I'm quite certain you can kill a language entity. I remember having read this in Raphaël's thesis. It might be the case that one can kill a node too, I don't know.
On Sat, Feb 16, 2013 at 10:16 PM, Sébastien Doeraene <
If you do {Kill A}, the permFail value will appear on the failure stream of Do I get this right? cheers
No, if you do {Kill A}, permFail will appear on the fault stream of A, but not B. I don't think there exists such a thing as the fault stream of a node. Just because the FD is node-based does not mean the fault streams are node-based too. The fault stream of an entity A is derived from two sources of information: the suspicion state of its node, and its own explicit state. Internally, we have:
Given these two sources of information, the observable fault state of A (the one appearing at the end of its fault stream) is computed as follows:
Does that make sense?
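One possible reading of how the two sources combine, sketched in Python. The exact rule is not shown in this thread, so the precedence used here is an assumption pieced together from the comments: permFail comes only from an explicit Kill and wins over everything else, otherwise the node's suspicion decides between 'tempFail' and 'ok'. The names `Entity`, `observable_fault_state`, and `kill`, and the choice to start each stream with 'ok', are hypothetical.

```python
def observable_fault_state(node_suspected, killed):
    """Assumed rule: an explicit kill dominates, else follow the node's suspicion."""
    if killed:
        return 'permFail'
    return 'tempFail' if node_suspected else 'ok'


class Entity:
    """Hypothetical entity with its own fault stream (assumed to start at 'ok')."""

    def __init__(self, name):
        self.name = name
        self.killed = False           # the entity's own explicit state
        self.fault_stream = ['ok']

    def refresh(self, node_suspected):
        """Append the new observable state only when it actually changes."""
        state = observable_fault_state(node_suspected, self.killed)
        if state != self.fault_stream[-1]:
            self.fault_stream.append(state)


def kill(entity, node_suspected=False):
    """Stand-in for {Kill A}: affects this entity only, never its neighbours."""
    entity.killed = True
    entity.refresh(node_suspected)


a, b = Entity('A'), Entity('B')       # both assumed to live on the same node
kill(a)
for e in (a, b):                      # the node-level FD now suspects their node
    e.refresh(node_suspected=True)
```

Under these assumptions A's stream reads ok, permFail and stays there, while B's stream reads ok, tempFail, which matches the point that killing A does not touch B even though both share a node-level detector.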
Overall theory of a heartbeat:
Some points I want to clarify:
Two or more approaches are available; note that each approach uses asynchronous I/O (e.g. ZeroMQ or nanomsg)
I believe version two will be slower, as it has to wait for a round trip, and heartbeats could also be lost on the wire. Version one, by contrast, operates on the data at hand and is therefore faster at detecting failure and sending messages.
Please check the logic, also am I missing anything?
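For reference, a rough Python sketch of a detector that works only on the data at hand. It assumes that version one means peers push heartbeats without being asked, which is how the replies in this thread describe it; the class `PushFailureDetector` and its methods are hypothetical, and the transport (ZeroMQ, nanomsg, or anything else) is left out entirely.

```python
class PushFailureDetector:
    """Hypothetical push-style detector: no requests, just local timestamps."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}          # peer -> time of last heartbeat
        self.status = {}              # peer -> 'ok' or 'tempFail'

    def add_peer(self, peer, now):
        self.last_heard[peer] = now
        self.status[peer] = 'ok'

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; returns True if the peer just recovered."""
        self.last_heard[peer] = now
        recovered = self.status[peer] != 'ok'
        self.status[peer] = 'ok'
        return recovered

    def check(self, now):
        """Purely local check: returns peers that just became suspected."""
        newly_suspected = []
        for peer, heard in self.last_heard.items():
            if self.status[peer] == 'ok' and now - heard > self.timeout:
                self.status[peer] = 'tempFail'
                newly_suspected.append(peer)
        return newly_suspected


fd = PushFailureDetector(timeout=2.0)
fd.add_peer('n1', now=0.0)
fd.add_peer('n2', now=0.0)
fd.on_heartbeat('n1', now=1.5)
suspects = fd.check(now=2.5)
```

Because `check` never waits on the network, the time to suspicion is bounded by the timeout alone; a request/response design would add at least one round trip on top of that. A lost heartbeat only delays the next `on_heartbeat`, and a late one flips the peer back to 'ok'.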