Spread checks over time with max slop time #848

Open: wants to merge 1 commit into base: main

31 changes: 29 additions & 2 deletions ldirectord/ldirectord.in
@@ -153,6 +153,23 @@ but ONLY if using forking mode (B<fork = >I<yes>).

Default: 10 seconds

B<checkintervalspread = >I<n>

Defines the maximum number of seconds to be randomly added to checkinterval.

When fork=no this option is not used.

Review comment (Contributor):

There is no code for this that I can see?

Review comment (Contributor):

Sorry, ignore that - it's in run_child() rather than ld_main() so it's fine.


When fork=yes this option defines the maximum additional amount of time
each forked child sleeps per virtual service pool before running all
realserver checks for that pool.

If set in the virtual server section then the global value is overridden.

This option causes checks to spread out in time, in order to avoid
overloading services probed using the same test.

Review comment (Contributor):

Abstracting checks from services might be nicer, but that would be much more difficult to do.


Default: 0 seconds

B<checkcount = >I<n>

This option is deprecated and slated for removal in a future version.
@@ -770,6 +787,7 @@ use vars qw(
$VERSION_STR
$AUTOCHECK
$CHECKINTERVAL
$CHECKINTERVALSPREAD
$LDIRECTORD
$LDIRLOG
$NEGOTIATETIMEOUT
@@ -858,7 +876,7 @@ set_defaults();
use Getopt::Long;
use Pod::Usage;
#use English;
-#use Time::HiRes qw( gettimeofday tv_interval );
+use Time::HiRes qw(sleep);
use Socket;
use Socket6 qw(NI_NUMERICHOST NI_NUMERICSERV NI_NAMEREQD getaddrinfo getnameinfo inet_pton inet_ntop);
# Workaround warnning messages : Three "_in6" symbols redefined.
@@ -1275,6 +1293,7 @@ sub set_defaults
$CALLBACK = undef;
$CHECKCOUNT = 1;
$CHECKINTERVAL = 10;
$CHECKINTERVALSPREAD = 0;
$CHECKTIMEOUT = -1;
$CLEANSTOP = "yes";
$DEFAULT_CHECKTIMEOUT = 5;
@@ -1455,6 +1474,9 @@ sub read_config
} elsif ($rcmd =~ /^checkinterval\s*=\s*(.*)/){
$1 =~ /(\d+)/ && $1 or &config_error($line, "invalid checkinterval");
$vsrv{checkinterval} = $1
} elsif ($rcmd =~ /^checkintervalspread\s*=\s*(.*)/){
$1 =~ /(\d+)/ && $1 or &config_error($line, "invalid checkintervalspread");
$vsrv{checkintervalspread} = $1
} elsif ($rcmd =~ /^checkport\s*=\s*(.*)/){
$1 =~ /(\d+)/ or &config_error($line, "invalid port");
( $1 > 0 && $1 < 65536 ) or &config_error($line, "checkport must be in range 1..65536");
@@ -1682,6 +1704,10 @@ sub read_config
$1 =~ /(\d+)/ && $1 or &config_error($line,
"invalid check interval value");
$CHECKINTERVAL = $1;
} elsif ($linedata =~ /^checkintervalspread\s*=\s*(.*)/) {
$1 =~ /(\d+)/ && $1 or &config_error($line,
"invalid check interval spread value");
$CHECKINTERVALSPREAD = $1;
} elsif ($linedata =~ /^checkcount\s*=\s*(.*)/) {
$1 =~ /(\d+)/ && $1 or &config_error($line,
"invalid check count value");
@@ -2721,6 +2747,7 @@ sub run_child
my $real = $$v{real};
my $virtual_id = get_virtual_id_str($v);
my $checkinterval = $$v{checkinterval} || $CHECKINTERVAL;
my $checkintervalspread = $$v{checkintervalspread} || $CHECKINTERVALSPREAD;

# delete any entries in EMAILSTATUS that don't belong to this child
my %myservices = ();
@@ -2740,7 +2767,7 @@ sub run_child
_check_real($v, $r);
}
$0 = "ldirectord $virtual_id";
-sleep $checkinterval;
+sleep $checkinterval + (random($$) * $checkintervalspread);

Review comment (Contributor):

As noted, should be rand(), and as $$ is the current PID you're multiplying checkintervalspread by what is often likely to be a very large integer, so this becomes something like sleep(a few hours). Which doesn't seem right?

Surely should just be

sleep $checkinterval + rand($checkintervalspread);

?
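
To make the difference concrete, here is a minimal Perl sketch comparing the two expressions discussed in the comment above. The helper name `jittered_delay`, the PID value 12345, and the explicit guard for a spread of 0 are assumptions for illustration and are not part of the patch. The guard is there because, as far as I know, Perl's `rand()` treats an argument of 0 like 1, so the documented default of 0 could otherwise still add up to a second.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; ldirectord has no function by this name.
# It computes the delay using the expression suggested in the review comment.
sub jittered_delay {
    my ($checkinterval, $checkintervalspread) = @_;
    # Assumption: guard the zero case explicitly, because Perl's rand()
    # treats an argument of 0 like 1 and would still add up to one second.
    my $extra = $checkintervalspread > 0 ? rand($checkintervalspread) : 0;
    return $checkinterval + $extra;
}

# Upper bounds of the extra sleep for checkintervalspread = 5, with a
# made-up PID of 12345 standing in for $$.
my ($spread, $pid) = (5, 12345);
printf "rand(\$\$) * spread: extra sleep below %d seconds\n", $pid * $spread;
printf "rand(spread):      extra sleep below %d seconds\n", $spread;
printf "sample delay:      %.3f seconds\n", jittered_delay(10, $spread);
```

Because the patch imports `sleep` from Time::HiRes, the fractional result can be passed to `sleep` without being truncated to whole seconds.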

ld_emailalert_resend();
}
}
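
The per-virtual override described in the documentation hunk relies on the `||` fallback at the top of run_child(). The sketch below isolates that lookup. `effective_spread` is a hypothetical stand-in and the values are made up. It suggests that a per-virtual value of 0 would fall through to the global setting rather than disable the spread. The `/(\d+)/ && $1` check in read_config appears to reject a literal 0 in any case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lookup at the top of run_child().
sub effective_spread {
    my ($v, $global) = @_;
    return $v->{checkintervalspread} || $global;
}

my $global = 5;
print effective_spread({ checkintervalspread => 3 }, $global), "\n";  # per-virtual value wins: 3
print effective_spread({}, $global), "\n";                            # unset: falls back to global 5
print effective_spread({ checkintervalspread => 0 }, $global), "\n";  # 0 is false: falls back to global 5
```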