Spread checks over time with max slop time #848
Helps avoid dogpiling services that are checked using the same mechanism by adding per-child and per-test random additional sleep time between checks.
I don't feel qualified to review this myself, but I don't know who might be considered maintainer for @mcnewton, you have been adding features recently - any opinion?
I don't think this patch as is has even been running. Also, did you really see bad effects in real life, If you have seen bad behaviour in real life,
@krig thanks for the include - not really adding features as much as scratching a few itches :). I haven't tested, but agree with @lge - I'm not convinced it will work as it stands. I would also be interested in what situations this is useful for. I have never seen service checks being a problem for the realservers, and we run checks much more frequently than the default. Real traffic usually drowns the service checks out entirely in our case.
A few comments inline.
Defines the maximum number of seconds to be randomly added to checkinterval.
When fork=no this option is not used.
There is no code for this that I can see?
Sorry, ignore that - it's in run_child() rather than ld_main() so it's fine.
If set in the virtual server section then the global value is overridden.
This option causes checks to spread out in time, in order to avoid
overloading services probed using the same test.
Abstracting checks from services might be nicer, but that would be much more difficult to do.
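The override rule in the documentation text above (a value in the virtual server section replaces the global one) might look like this in isolation. The hash layout and the `effective_spread` helper are assumptions made for illustration and do not reflect ldirectord's actual data structures.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the spread for one virtual service.
# $global  - value from the global section (may be undef if unset)
# $virtual - hash ref for a virtual section (illustrative layout only)
sub effective_spread {
    my ($global, $virtual) = @_;
    return $virtual->{checkintervalspread}
        if defined $virtual->{checkintervalspread};
    return defined $global ? $global : 0;
}

my $global_spread = 5;                                # made-up global setting
my %web  = (checkintervalspread => 2);                # overrides the global value
my %mail = ();                                        # inherits the global value

print effective_spread($global_spread, \%web),  "\n"; # prints 2
print effective_spread($global_spread, \%mail), "\n"; # prints 5
```

Using `defined` rather than a truth test means an explicit per-virtual value of 0 would still override a non-zero global value in this sketch.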
@@ -2740,7 +2767,7 @@ sub run_child
 _check_real($v, $r);
 }
 $0 = "ldirectord $virtual_id";
-sleep $checkinterval;
+sleep $checkinterval + (random($$) * $checkintervalspread);
As noted, should be rand(), and as $$ is the current PID you're multiplying checkintervalspread by what is often likely to be a very large integer, so this becomes something like sleep(a few hours). Which doesn't seem right?
Surely should just be
sleep $checkinterval + rand($checkintervalspread);
?
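The suggested fix can be sketched as a standalone script. This is a minimal illustration, not the patch itself: `jittered_interval` is a hypothetical helper, and the values 10 and 5 are made-up settings. It relies on Perl's built-in `rand(EXPR)`, which returns a fractional value from 0 up to but not including EXPR.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ldirectord): seconds to wait before the
# next round of checks, given the base interval and the maximum extra spread.
sub jittered_interval {
    my ($checkinterval, $checkintervalspread) = @_;

    # rand(0) is treated like rand(1), so handle "no spread" explicitly.
    return $checkinterval
        unless defined $checkintervalspread && $checkintervalspread > 0;

    # rand($n) yields a fractional number in [0, $n).
    return $checkinterval + rand($checkintervalspread);
}

# Made-up example values: check every 10 seconds, up to 5 seconds of spread.
my $delay = jittered_interval(10, 5);
printf "next check in %.2f seconds\n", $delay;
```

Note that the built-in `sleep` works in whole seconds, so the fractional part of the result is dropped unless something like `Time::HiRes::sleep` is used; whether that matters for this patch is a judgment call.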
You're right that it's not in use, but yes, it is meant to address a
There are several external services that, for various client-side reasons,
Thanks.
Mike Rylander
On Mon, Oct 24, 2016 at 4:02 PM, Lars Ellenberg [email protected]
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents-pipeline/job/PR-848/1/input