Separate control plane and data plane; support multiple Gateways #3318
Draft: sjberman wants to merge 25 commits into main from change/control-data-plane-split
Removing the nginx runtime manager and deployment container since nginx will live in its own pod managed by agent. Temporarily saving the nginx deployment and service for future use. Updated the control plane liveness probe to return true once it's processed all resources, instead of after it's written config to nginx (since nginx may not be started yet in the future architecture).
Updating the nginx docker containers to build and include agent. Once agent is officially released, we can use the published binary instead of building. Added a temporary nginx deployment to the helm chart to deploy a standalone nginx pod. Added the basic gRPC server and agent API implementation to allow for the agent pod to connect to the control plane without errors.
Added the following:
- middleware to extract the IP address of the agent and store it in the gRPC context
- link the agent's hostname to its IP address when connecting and track it
- use this linkage to pause the Subscription until the agent registers itself, then proceed

This logic is subject to change as we enhance this (for example, tracking an auth token instead of the IP address).
Problem: When the control plane and data planes are split, the user will need the ability to specify data plane settings on a per-Gateway basis. To allow this, we need to support NginxProxy at the Gateway level in addition to the GatewayClass level. In practice, this means a user can reference an NginxProxy resource via the spec.infrastructure.parametersRef field on the Gateway resource. We still want to support referencing an NginxProxy at the GatewayClass level. If a Gateway and its GatewayClass reference distinct NginxProxy resources, the settings must be merged. Settings specified on a Gateway NginxProxy must override those set on the GatewayClass NginxProxy.

Solution: To support NginxProxy at the Gateway level, several changes were made to the API. As a result, the API is now at version v1alpha2.

Breaking Changes:
* Change the scope of the CRD to Namespaced. The parametersRef.namespace field on the GatewayClass is now required.
* Make DisableHTTP2 and Telemetry.Exporter.Endpoint optional.

New fields:
* Telemetry.DisabledFeatures: allows users to explicitly disable telemetry features. It is a list with one supported entry: DisableTracing. More features may be added in future releases.

Other changes:
* Remove the listType=Map kubebuilder annotation from the RewriteClientIP.TrustedAddresses field. This listType is incorrect since TrustedAddresses can have duplicate keys.

The graph now stores the NginxProxies that are referenced by the winning GatewayClass and Gateway. This will need to be updated once we support multiple Gateways. The graph is also responsible for merging the NginxProxies when necessary. The merged result is stored on the graph's Gateway object in the EffectiveNginxProxy field, which is then used to build the NGINX configuration.
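To illustrate the override rule (Gateway-level settings win over GatewayClass-level settings), here is a minimal sketch. `proxySpec` is a simplified, hypothetical stand-in for the NginxProxy spec with only three fields, and `mergeSpecs` is an assumed helper name; the real merge in the graph may differ, particularly in how lists are combined.

```go
package main

import "fmt"

// proxySpec is a simplified, hypothetical stand-in for the NginxProxy spec.
// Pointer fields distinguish "unset" from an explicit value.
type proxySpec struct {
	DisableHTTP2      *bool
	TelemetryEndpoint *string
	DisabledFeatures  []string
}

// ptr returns a pointer to a copy of v.
func ptr[T any](v T) *T { return &v }

// mergeSpecs returns the effective settings: values from the GatewayClass
// spec, overridden by any field that the Gateway spec sets explicitly.
// Neither input is modified.
func mergeSpecs(class, gw *proxySpec) *proxySpec {
	if class == nil && gw == nil {
		return nil
	}
	out := &proxySpec{}
	for _, src := range []*proxySpec{class, gw} { // later entries win
		if src == nil {
			continue
		}
		if src.DisableHTTP2 != nil {
			out.DisableHTTP2 = ptr(*src.DisableHTTP2)
		}
		if src.TelemetryEndpoint != nil {
			out.TelemetryEndpoint = ptr(*src.TelemetryEndpoint)
		}
		if src.DisabledFeatures != nil {
			// Assumption: a Gateway-level list replaces the class-level list.
			out.DisabledFeatures = append([]string(nil), src.DisabledFeatures...)
		}
	}
	return out
}

func main() {
	class := &proxySpec{DisableHTTP2: ptr(true), TelemetryEndpoint: ptr("otel:4317")}
	gw := &proxySpec{DisableHTTP2: ptr(false)}

	eff := mergeSpecs(class, gw)
	fmt.Println(*eff.DisableHTTP2, *eff.TelemetryEndpoint) // prints "false otel:4317"
}
```

Pointer fields matter here: making DisableHTTP2 optional, as this commit does, is what lets a merge tell "not set on the Gateway" apart from "explicitly set to false".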
This commit adds functionality to send nginx configuration to the agent. It also adds support for scaling the single nginx Deployment and sending configuration to all of its replicas. This requires tracking all Subscriptions for a particular Deployment, and receiving the responses from all of those replicas to determine the status to write to the Gateway.
Problem: The NGINX Plus API conf file was empty when sending using OSS, which caused an error applying config. This also revealed an issue where we received multiple messages from agent, causing some channel blocking. Solution: Don't send the empty NGINX conf file if not running N+. Ignore responses from agent about rollbacks, so we only ever process a single response as expected.
Add leader election to allow data plane pods to only connect to the lead NGF pod. If control plane is scaled, only the leader is marked as ready and the backups are Unready so the data plane doesn't connect to them. Problem: We want the NGF control plane to fail-over to another pod when the control plane pod goes down. Solution: Only the leader pod is marked as ready by Kubernetes, and all connections from data plane pods are connected to the leader pod.
This commit updates the control plane to deploy an NGINX data plane when a valid Gateway resource is created. When the Gateway is deleted or becomes invalid, the data plane is removed. The NginxProxy resource has been updated with numerous configuration options related to the k8s deployment and service configs, which the control plane will apply to the NGINX resources when set. The control plane fully owns the NGINX deployment resources, so users who want to change any configuration must do so using the NginxProxy resource. This does not yet support NGINX Plus or NGINX debug mode. Those will be added in follow-up pull requests. This also adds some basic daemonset fields, but does not yet support deploying a daemonset. That will also be added soon.
This reverts commit a5c989e.
* Add back runnables change and call to nginx provisioner enable --------- Co-authored-by: Benjamin Jee <[email protected]>
…3147) Support nginx debug mode when provisioning the Data Plane. Problem: We want to have the option to provision nginx instances in debug mode. Solution: Add a debug field to the NginxProxy CRD. Users can also enable debug mode when installing through Helm by setting the nginx.debug flag.
Continuation from the previous commit to add support for provisioning with NGINX Plus. This adds support for duplicating any NGINX Plus or docker registry secrets into the Gateway namespace. Added unit tests.
With the new deployment model, the provisioner mode for conformance tests is no longer needed. This code is removed, and at a later date the conformance tests will be updated to work with the new model. Renamed the "static-mode" to "controller". Also removed some unneeded metrics collection.
Problem: When a user updates or deletes their docker registry or NGINX Plus secrets, those changes need to be propagated to all duplicate secrets that we've provisioned for the Gateway resources. Solution: If updated, update the provisioned secret. If deleted, delete the provisioned secret.
Update functional tests for the control plane data plane split. Problem: The functional tests do not pass with the current architecture. Solution: Add updates to functional tests.
Problem: We want to ensure that the connection between the control plane and data plane is authenticated and secure.

Solution:
1. Configure agent to send the Kubernetes service token in the request. The control plane validates this token using the TokenReview API to ensure the agent is authenticated.
2. Configure TLS certificates for both the control and data planes. By default, a Job will run when installing NGF that creates self-signed certificates in the nginx-gateway namespace. The server Secret is mounted to the control plane, and the control plane copies the client Secret when deploying nginx resources. This Secret is mounted to the agent. The control plane will reset the agent connection if it detects that its own certs have changed.

For production environments, we'll recommend that users configure TLS using cert-manager instead, for better security and certificate rotation.
Problem: The data plane container was not properly handling the kill signal when the Pod was Terminated. Solution: Update the entrypoint to catch the proper signals.
Problem: Now that we have additional pods in the new architecture, we need the proper SecurityContextConstraints for running in Openshift. Solution: Create an SCC for the cert-generator and an SCC for nginx data plane pods on startup. A Role and RoleBinding are created when deploying nginx to link to the SCC.
Problem: Users want to be able to configure multiple Gateways with a single installation of NGF. Solution: Support the ability to create multiple Gateways. Routes and policies can be attached to multiple Gateways. Also fixed conformance tests. --------- Co-authored-by: Saylor Berman <[email protected]>
Update non-functional tests for the control plane data plane split. Problem: The non-functional tests do not work for the control plane data plane split changes. Solution: Update non-functional tests. Testing: Scale, Reconfiguration, Performance, and Longevity tests work. The Upgrade test doesn't work, but that is expected: the CP/DP split is a breaking change to NGF, so you can't easily upgrade with zero downtime. --------- Co-authored-by: Saylor Berman <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3318      +/-   ##
==========================================
+ Coverage   86.20%   86.77%   +0.56%
==========================================
  Files         116      128      +12
  Lines       11928    14712    +2784
  Branches       62       62
==========================================
+ Hits        10283    12766    +2483
- Misses       1580     1806     +226
- Partials       65      140      +75
…ervice (#3319) Add ability to set loadBalancerClass for load balancer Service. Problem: We would like the ability to specify the loadBalancerClass field on a load balancer Service. Solution: Add the ability to set loadBalancerClass for a load balancer Service. Testing: Manually tested that deploying NGF with the nginx.service.loadBalancerClass Helm flag correctly sets the field. Also tested that modifying the NginxProxy resource sets the loadBalancerClass when the Service is re-created (the field can only be set upon creation).
Problem: All config update events resulted in sending configuration to every Gateway, even if the change was irrelevant. Solution: Compare new config with old config to determine if a ConfigApply is necessary. Simplified the change processor and handler to no longer have to determine this.
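The old-versus-new comparison could be done by fingerprinting the generated files for each Gateway's Deployment, as in the sketch below. This is an assumption about the mechanism, not the PR's implementation: `file`, `configVersion`, and `deploymentState` are hypothetical names, and SHA-256 is chosen here only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// file is a hypothetical representation of one generated nginx config file.
type file struct {
	Path     string
	Contents []byte
}

// configVersion returns a fingerprint of the file set that does not depend
// on the order of the files.
func configVersion(files []file) string {
	sorted := append([]file(nil), files...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Path < sorted[j].Path })

	h := sha256.New()
	for _, f := range sorted {
		h.Write([]byte(f.Path))
		h.Write([]byte{0}) // separator between path and contents
		h.Write(f.Contents)
		h.Write([]byte{0}) // separator between files
	}
	return hex.EncodeToString(h.Sum(nil))
}

// deploymentState remembers the last config version sent to one Deployment.
type deploymentState struct {
	lastVersion string
}

// NeedsApply reports whether files differ from the last applied config and,
// if so, records the new version.
func (d *deploymentState) NeedsApply(files []file) bool {
	v := configVersion(files)
	if v == d.lastVersion {
		return false
	}
	d.lastVersion = v
	return true
}

func main() {
	state := &deploymentState{}
	cfg := []file{{Path: "/etc/nginx/conf.d/http.conf", Contents: []byte("server {}")}}

	fmt.Println(state.NeedsApply(cfg)) // true: first config for this Deployment
	fmt.Println(state.NeedsApply(cfg)) // false: nothing changed, skip the ConfigApply
}
```

Comparing at the point of sending, rather than classifying events upstream, is what allows the change processor and handler to be simplified as the commit describes.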
Labels
change
Pull requests that introduce a change
dependencies
Pull requests that update a dependency file
documentation
Improvements or additions to documentation
helm-chart
Relates to helm chart
release-notes
As a route to efficacy and quickly understanding the Gateway API, its implementation and alignment to NGINX as a data plane, we decided on a simplified, but rigid, deployment pattern. To improve our security posture and installation flexibility the control and data planes are being separated as semi-autonomous, distributed components. This also allows us to support multiple Gateways for a single control plane.
A general summary of the changes being made:
Note: there are still a few more implementation steps to finish the work for this feature, but this PR includes all of the main functionality and passing test pipelines. This is not 100% stable yet and is subject to a few more breakages and changes before release.
Design: https://github.com/nginx/nginx-gateway-fabric/tree/main/docs/proposals/control-data-plane-split
Epic: #1508
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.