Fix retries and failover #320
|
Pushed this to our staging and it seems to work great with authentication failures / stepdowns etc. (and SSL enabled)! |
This reverts commit edd9eed.
|
Looks good to me. @arthurnn What do you think? |
|
+1 |
|
Found one more issue: if you have a replica set and you want to re-sync a node (because of disk usage) and the node is in Steps taken:
It will keep on retrying to authenticate on this node causing constant failures. |
|
+1 this seems to fix the issue too, #268 |
|
+1 this works for me. Anybody using it in production? |
|
@jperichon, we've been using it successfully in production for 3+ months. We added a couple of patches on top of it to fix up things it missed. Haven't seen any problems with the included commits though--they've been great. |
Pull request that fixes the failover and retry mechanism.
Changes in detail:
- Moved the `with_retry` method to `Cluster` -- it belongs there as it operates on the cluster.
- `Node#flush` was using `ensure_connected`, which involves failover. However, the processing of database messages after executing operations (and raising errors based on them) was outside of the `ensure_connected` block, so the failover mechanism wasn't exercised in most of the cases it was meant for.
- `Reconfigure` failover mechanism -- it was raising new exceptions but not retrying -- it should be good enough to just retry.
- Moved error recognition from the `Errors` class to the `Reply` class, so that it is all in one place.

The outcome of these changes is that you can kill / restart mongo replica-set nodes in whatever order and as often as you like. You can even stop all of them for a couple of seconds (how long is driven by `retry_count` and `retry_interval`) and the application will be able to recover without losing any operations or throwing errors.
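
For readers unfamiliar with the pattern, here is a minimal sketch of a cluster-level retry helper driven by `retry_count` and `retry_interval`. It is not the code from this pull request: `SketchCluster`, `ConnectionFailure`, and `refresh` are hypothetical names chosen for illustration, and the real `with_retry` may differ in which errors it rescues and how it rediscovers nodes. The key point it illustrates is that the error has to be raised inside the retried block, otherwise the retry never fires.

```ruby
# Hypothetical stand-in for a connection-level error; not Moped's real class.
class ConnectionFailure < StandardError; end

# Minimal sketch of a cluster object that owns the retry logic.
class SketchCluster
  attr_reader :refreshes

  def initialize(retry_count: 3, retry_interval: 0.01)
    @retry_count = retry_count
    @retry_interval = retry_interval
    @refreshes = 0
  end

  # Stand-in for rediscovering the primary after a failure.
  def refresh
    @refreshes += 1
  end

  # Runs the block, retrying up to retry_count times on ConnectionFailure.
  # Once the retries are used up, the last error is re-raised to the caller.
  def with_retry
    attempts_left = @retry_count
    begin
      yield
    rescue ConnectionFailure
      raise if attempts_left <= 0
      attempts_left -= 1
      sleep(@retry_interval)
      refresh
      retry
    end
  end
end

# Usage: the operation fails twice (e.g. during a stepdown), then succeeds.
cluster = SketchCluster.new(retry_count: 3, retry_interval: 0.01)
calls = 0
result = cluster.with_retry do
  calls += 1
  raise ConnectionFailure, "primary stepped down" if calls < 3
  :ok
end
```

With these assumed settings, an outage is tolerated for roughly `retry_count * retry_interval` seconds before the error reaches the application.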