-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NE-1790: Enable Dynamic Configuration Manager feature gate #1159
NE-1790: Enable Dynamic Configuration Manager feature gate #1159
Conversation
@alebedev87: This pull request references NE-1790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
||
// Set max dynamic server when DCM feature gate is on. | ||
ic.Spec.UnsupportedConfigOverrides = runtime.RawExtension{ | ||
Raw: []byte(`{"maxDynamicServers":"7"}`), | ||
} | ||
deployment, err = desiredRouterDeployment(ic, ingressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, proxyNeeded, false, nil, clusterProxyConfig, false, true) | ||
if err != nil { | ||
t.Fatalf("invalid router Deployment: %v", err) | ||
} | ||
tests = []envData{ | ||
{"ROUTER_HAPROXY_CONFIG_MANAGER", true, "true"}, | ||
{"ROUTER_MAX_DYNAMIC_SERVERS", true, "7"}, | ||
} | ||
if err := checkDeploymentEnvironment(t, deployment, tests); err != nil { | ||
t.Error(err) | ||
} | ||
|
||
// Set max dynamic server to a wrong value when DCM feature gate is on. | ||
ic.Spec.UnsupportedConfigOverrides = runtime.RawExtension{ | ||
Raw: []byte(`{"maxDynamicServers":"wrong"}`), | ||
} | ||
deployment, err = desiredRouterDeployment(ic, ingressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, proxyNeeded, false, nil, clusterProxyConfig, false, true) | ||
if err != nil { | ||
t.Fatalf("invalid router Deployment: %v", err) | ||
} | ||
tests = []envData{ | ||
{"ROUTER_HAPROXY_CONFIG_MANAGER", true, "true"}, | ||
{"ROUTER_MAX_DYNAMIC_SERVERS", true, "1"}, | ||
} | ||
if err := checkDeploymentEnvironment(t, deployment, tests); err != nil { | ||
t.Error(err) | ||
} | ||
|
||
// Opt out from DCM via unsupportedConfigOverride. | ||
ic.Spec.UnsupportedConfigOverrides = runtime.RawExtension{ | ||
Raw: []byte(`{"dynamicConfigManager":"false", "maxDynamicServers":"7"}`), | ||
} | ||
deployment, err = desiredRouterDeployment(ic, ingressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, proxyNeeded, false, nil, clusterProxyConfig, false, true) | ||
if err != nil { | ||
t.Fatalf("invalid router Deployment: %v", err) | ||
} | ||
tests = []envData{ | ||
{"ROUTER_HAPROXY_CONFIG_MANAGER", false, ""}, | ||
{"ROUTER_MAX_DYNAMIC_SERVERS", false, "0"}, | ||
} | ||
if err := checkDeploymentEnvironment(t, deployment, tests); err != nil { | ||
t.Error(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long ago, we had just one gigantic TestDesiredRouterDeployment
unit test. This test became difficult to modify when we needed to expand test coverage, and so in 2849fcf, Candace started splitting that gigantic test into multiple tests. We should continue in that direction: more, smaller, focused tests rather than fewer, larger, "all the things!" tests.
What do you think about defining a new TestDesiredRouterDeploymentDynamicConfigManager
test, perhaps using a list of test cases à la TestDesiredRouterDeploymentDefaultPlacement
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we need a dedicated test. I thought of a test for all unsupportedConfigOverrides
, will start it with DCM and maxDynamicServers stetting, the other ones can be migrated later.
Upd: This test already exists: TestUnsupportedConfigOverride
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about defining a new TestDesiredRouterDeploymentDynamicConfigManager test, perhaps using a list of test cases à la TestDesiredRouterDeploymentDefaultPlacement?
Upd: This test already exists: TestUnsupportedConfigOverride.
At the end, I think the best way would be to make it a dedicated unit test indeed. Working on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestDesiredRouterDeploymentDynamicConfigManager
unit test added.
test/e2e/operator_test.go
Outdated
{"contstats", "contStats", "ROUTER_HAPROXY_CONTSTATS"}, | ||
{"dynamic-config-manager", "dynamicConfigManager", "ROUTER_HAPROXY_CONFIG_MANAGER", "true", "false"}, | ||
{"contstats", "contStats", "ROUTER_HAPROXY_CONTSTATS", "", "true"}, | ||
{"max-dynamic-servers", "maxDynamicServers", "ROUTER_MAX_DYNAMIC_SERVERS", "2", "7"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should defaultValue
for ROUTER_MAX_DYNAMIC_SERVERS
be "1"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should. ROUTER_HAPROXY_CONFIG_MANAGER
should have false
default value too. These tests failed in CI:
--- FAIL: TestAll/parallel/TestUnsupportedConfigOverride (306.33s)
--- FAIL: TestAll/parallel/TestUnsupportedConfigOverride/dynamic-config-manager (111.43s)
--- PASS: TestAll/parallel/TestUnsupportedConfigOverride/contstats (83.49s)
--- FAIL: TestAll/parallel/TestUnsupportedConfigOverride/max-dynamic-servers (111.41s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
test/e2e/operator_test.go
Outdated
@@ -3424,10 +3424,11 @@ func TestUnsupportedConfigOverride(t *testing.T) { | |||
t.Parallel() | |||
|
|||
var tests = []struct { | |||
name, unsupportedConfigOverride, env string | |||
name, unsupportedConfigOverride, env, defaultValue, value string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would make the test easier to follow:
name, unsupportedConfigOverride, env, defaultValue, value string | |
name, unsupportedConfigOverride, env, initialValue, updatedValue string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept defaultValue
as I think it matches the idea of this variable - expected default value when no overrides are given. Also, I changed value
to updateValue
and added expectedValue
because for the max dynamic servers setting the value after the updated may be different from the update value.
7717e17
to
8f6f109
Compare
@alebedev87: This pull request references NE-1790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
9f1b7d2
to
a42eff9
Compare
@alebedev87: This pull request references NE-1790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@alebedev87: This pull request references NE-1790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/test e2e-aws-operator-techpreview |
/test e2e-aws-ovn-single-node |
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator-techpreview openshift/origin/29180 |
@alebedev87,
|
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator-techpreview openshift/origin#29180 |
/testwith e2e-aws-ovn-techpreview openshift/origin#29180 |
@alebedev87,
|
/testwith openshift/cluster-ingress-operator/master/e2e-aws-ovn-techpreview openshift/origin#29180 |
a42eff9
to
677207b
Compare
@alebedev87: This pull request references NE-1790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Added a commit for the k8s and openshift api bumps. |
677207b
to
687a892
Compare
/retest |
687a892
to
88b7758
Compare
- Enabled Dynamic Configuration Manager if corresponding feature is on. - Added a new unsupported config override option `maxDynamicServers`, with a default value of 1, which is used only if the feature gate is on. - Disabled DCM route blueprints by setting the pool size to 0. - Added a dedicated unit test for desiredRouterDeployemnt with DCM feature gate.
c9b8f39
to
9532e78
Compare
As discussed during the PR scrub, removed the upper bound limit for the |
/lgtm |
/retest |
1 similar comment
/retest |
@alebedev87: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/label acknowledge-critical-fixes-only This PR does not introduce any change to the default featureset of OpenShift. |
@@ -121,7 +126,7 @@ func (r *reconciler) ensureRouterDeployment(ci *operatorv1.IngressController, in | |||
if err != nil { | |||
return false, nil, fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %v", ci.Namespace, ci.Name, err) | |||
} | |||
desired, err := desiredRouterDeployment(ci, r.config.IngressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, proxyNeeded, haveClientCAConfigmap, clientCAConfigmap, clusterProxyConfig, r.config.RouteExternalCertificateEnabled) | |||
desired, err := desiredRouterDeployment(ci, r.config.IngressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, proxyNeeded, haveClientCAConfigmap, clientCAConfigmap, clusterProxyConfig, r.config.RouteExternalCertificateEnabled, r.config.IngressControllerDCMEnabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if there is anything else to change, I would suggest refactoring the desiredRouterDeployment
signature to pass in r.config
instead of three different parameters in r.config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can be done in a separate PR.
@@ -523,6 +528,15 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController | |||
LoadBalancingAlgorithm string `json:"loadBalancingAlgorithm"` | |||
DynamicConfigManager string `json:"dynamicConfigManager"` | |||
ContStats string `json:"contStats"` | |||
// maxDynamicServers specifies the number of dynamic server slots that will be added to each route. | |||
// This setting can only be configured if dynamicConfigManager is `true`. | |||
// The default value is 1, with a maximum of 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The default value is 1, with a maximum of 10. | |
// The default value is 1. |
// added, the fewer reloads required when scaling up. | ||
// Note, however, that dynamic servers consume memory even when not enabled. | ||
// Use this analysis of the memory usage to assess the impact of different numbers of servers: | ||
// https://gist.github.com/frobware/2b527ce3f040797909eff482a4776e0b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: When this becomes suppported, we should add a link to an OpenShift repo.
{"ROUTER_MAX_DYNAMIC_SERVERS", true, "100"}, | ||
{"ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", true, "0"}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to see what happens when unsupportedConfigOverrides.DynamicConfigManager
is an empty string? Or if it is malformatted (like "100K") or over the HAProxy limit for numbers (not sure if that applies here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
featuregate-enabled-default-values
and featuregate-enabled-max-servers
tests don't set unsupportedConfigOverrides.dynamicConfigManager
which results into an ""
after the unmarshaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant when the max dynamic servers is set to "" or something like "100K" or over the HAProxy limit for numbers.
// Note, however, that dynamic servers consume memory even when not enabled. | ||
// Use this analysis of the memory usage to assess the impact of different numbers of servers: | ||
// https://gist.github.com/frobware/2b527ce3f040797909eff482a4776e0b | ||
MaxDynamicServers string `json:"maxDynamicServers"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be an int, not a string. Users may be confused that they'd have to specify 10000 as "10000" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I just followed the same pattern as for the other overrides. Two of them (dynamicConfigManager
,contStats
) could be bool
s because bool
is a valid JSON type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I think we have precedent for string-based bools. Choosing the string type for numbers allows malformatting, like "100K", which will fail later.
{"ROUTER_MAX_DYNAMIC_SERVERS", false, ""}, | ||
{"ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", false, ""}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: these constants both have variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For unit tests, I try to abstain from using the constants which are used in the code. This avoids repeating mistakes made in a code (typo etc.) and brings an additional check.
@@ -1109,6 +1115,115 @@ func TestDesiredRouterDeploymentClientTLS(t *testing.T) { | |||
} | |||
} | |||
|
|||
// TestDesiredRouterDeploymentDynamicConfigManager verifies that desiredRouterDeployment | |||
// returns the expected deployment when DCM featuregate is involved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// returns the expected deployment when DCM featuregate is involved. | |
// returns the expected deployment when the DCM featuregate is enabled. |
@candita: I would prefer to follow up on the comment changes (1, 2) in a separate PR (along with this enhacement) unless there are some blocking remarks. |
RouterHAProxyMaxDynamicServers = "ROUTER_MAX_DYNAMIC_SERVERS" | ||
RouterHAProxyMaxDynamicServersDefaultValue = 1 | ||
|
||
RouterHAProxyBlueprintRoutePoolSize = "ROUTER_BLUEPRINT_ROUTE_POOL_SIZE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two env variables (ROUTER_MAX_DYNAMIC_SERVERS and ROUTER_BLUEPRINT_ROUTE_POOL_SIZE) are not found in our haproxy.config template yet and I don't see a PR that adds them. Does that matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not used in the config template, they are only for the configuration of the router's DCM behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but it's strange that they'd need to be env variables in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operator usually uses environment variables rather than command-line switches for openshift-router, for which reason I asked Andrey to use environment variables in this case too (#1159 (comment)).
@@ -84,6 +84,11 @@ const ( | |||
|
|||
RouterHAProxyConfigManager = "ROUTER_HAPROXY_CONFIG_MANAGER" | |||
|
|||
RouterHAProxyMaxDynamicServers = "ROUTER_MAX_DYNAMIC_SERVERS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does there need to be any change for this in the flags https://github.com/openshift/router/blob/master/pkg/cmd/infra/router/template.go#L185?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Why? This flag accepts an environment varible as an alternative way of using it. This one is limited to 1 as minimal value and defaults to 5. I set the envvar explicitly to 1 so no default value is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that you changed the minimum for BlueprintRoutePoolSize, and wondered why the default didn't need to change for MaxDynamicServers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to deliver an MVP in 4.18, we decided to turn off blueprint routes and pre-allocate 1 dynamic server per backend. openshift/enhancements#1687 explains the rationale.
{"dynamic-config-manager", "dynamicConfigManager", "ROUTER_HAPROXY_CONFIG_MANAGER", dcmDefaultValue, dcmUpdateValue, dcmExpectedValue}, | ||
{"contstats", "contStats", "ROUTER_HAPROXY_CONTSTATS", "", "true", "true"}, | ||
{"max-dynamic-servers", "maxDynamicServers", "ROUTER_MAX_DYNAMIC_SERVERS", maxServersDefaultValue, maxServersUpdateValue, maxServersExpectedValue}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We do have variable names for many of these hard-coded values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there's no test for the route blueprint default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there's no test for the route blueprint default.
Right, the test was made the way that only 1 variable is expected to be checked for a given config override. I didn't have a goal to fully cover all configuration permutations in this e2e test, I just wanted to adapt the test to work with two new config overrides. The route blueprint configurations are well covered in the unit tests (TestDesiredRouterDeploymentDynamicConfigManager
).
if dcmEnabled, err := isFeatureGateEnabled(features.FeatureGateIngressControllerDynamicConfigurationManager); err != nil { | ||
t.Fatalf("failed to get dynamic config manager feature gate: %v", err) | ||
} else if dcmEnabled { | ||
dcmDefaultValue = "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to log something here to let us know this was run on a techpreview cluster.
dcmDefaultValue = "true" | |
t.Logf("DynamicConfigurationManager feature gate is enabled for this test") | |
dcmDefaultValue = "true" |
if err := waitForDeploymentEnvVar(t, kclient, deployment, 30*time.Second, tt.env, ""); err != nil { | ||
t.Fatalf("expected initial deployment not to set %s=true: %v", tt.env, err) | ||
if err := waitForDeploymentEnvVar(t, kclient, deployment, 30*time.Second, tt.env, tt.defaultValue); err != nil { | ||
t.Fatalf("expected initial deployment not to set %s=%s: %v", tt.env, tt.defaultValue, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the feature gate is enabled, we don't expect to see the max servers set to 1, the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, ROUTER_MAX_DYNAMIC_SERVERS
is only set if the dynamic config manager is enabled:
cluster-ingress-operator/pkg/operator/controller/ingress/deployment.go
Lines 594 to 611 in 9532e78
if enableDCM { | |
env = append(env, corev1.EnvVar{ | |
Name: RouterHAProxyConfigManager, | |
Value: "true", | |
}, corev1.EnvVar{ | |
Name: RouterHAProxyBlueprintRoutePoolSize, | |
Value: "0", | |
}) | |
// Max dynamic servers override is only taken into account when DCM is enabled. | |
routerHAProxyMaxDynamicServersValue := RouterHAProxyMaxDynamicServersDefaultValue | |
if v, err := strconv.Atoi(unsupportedConfigOverrides.MaxDynamicServers); err == nil && v > 0 { | |
routerHAProxyMaxDynamicServersValue = v | |
} | |
env = append(env, corev1.EnvVar{ | |
Name: RouterHAProxyMaxDynamicServers, | |
Value: strconv.Itoa(routerHAProxyMaxDynamicServersValue), | |
}) |
@alebedev87 All of my review comments can be handled in a followup PR unless you see something that has a greater impact than I assumed. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: candita The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-ingress-operator |
maxDynamicServers
, with a default value of 1, which is used only if the feature gate is on.