---
title: in-cluster-dns-and-loadbalancers-on-more-platforms
authors:
  - "@mhrivnak"
  - "@eranco74"
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
  - "@cybertron"
  - "@tsorya"
  - "@zaneb"
approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
  - TBD
api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
  - TBD
creation-date: 2024-08-26
last-updated: 2024-08-26
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
see-also:
  - "/enhancements/network/baremetal-networking.md"
replaces:
superseded-by:
---

# In-cluster DNS and load balancers on more platforms

## Summary

Multiple on-prem platform types, including `baremetal` and `openstack`,
[provide in-cluster implementations of network
services](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md)
that are required in order to have a viable stand-alone cluster:

* CoreDNS for in-cluster DNS resolution
* haproxy with keepalived to provide in-cluster load balancers (ingress) for the API server and workloads

Continuing the work from that original [enhancement
proposal](https://github.com/openshift/enhancements/blob/master/enhancements/network/baremetal-networking.md),
those services should also be available for optional inclusion when installing
a cluster with the `external` or `none` platform types, which are likewise
often used in environments that lack a suitable alternative for DNS and/or load
balancers.

## Motivation

Provisioning and configuring a [DNS
system](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-dns-user-infra_installing-platform-agnostic)
and [load
balancers](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-load-balancing-user-infra_installing-platform-agnostic)
manually for an OpenShift cluster is a substantial burden on the user. In a
cloud environment that's not already supported by the OpenShift installer,
utilizing the native offerings requires additional work up-front, creates an
ongoing maintenance burden, and adds monetary cost for infrastructure. Not all
cloud environments offer suitable options, either. On-prem, suitable DNS and/or
load balancer services may not exist, nor may there be additional
infrastructure on which to run them.

Many users end up deploying a cluster with platform type `baremetal` when all
they really want is to use the in-cluster network services. Instead, it should
be possible to utilize those in-cluster network services without them being
coupled to the baremetal platform type.

For example, the assisted-installer and the agent-based installer are often
used to deploy clusters into “generic” environments where there is not an
opportunity to utilize an external DNS or load balancer solution. Thus the
assisted-installer sets the platform type as `baremetal` regardless of whether
the systems are actually running on bare metal or whether there is any intent
to use metal3 integrations. The resulting cluster has all of the appearances of
being bare metal, including a BareMetalHost resource for each Node, which can
be confusing to users. Even the web console’s Overview landing page shows
“Infrastructure provider: BareMetal” in addition to “N Nodes” and “N Bare Metal
Hosts”.

Single Node OpenShift uses platform type `none` and [requires the user to
configure DNS records
manually](https://docs.openshift.com/container-platform/4.16/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
When the assisted-installer is used, it configures dnsmasq in new SNO clusters
as a convenience. But it would be better for the internal DNS service to be a
native part of platform type `none` so that it is easily available to all
users, regardless of how they are installing SNO.

### User Stories

As a user deploying OpenShift in an environment that lacks a suitable DNS
and/or load balancer solution, and with no intent to utilize metal3-related
bare metal features, I want to utilize the in-cluster network services without
being forced to use the `baremetal` platform type.

As a user deploying OpenShift with the `external` platform type into an
environment of my choosing, I want the option to use the in-cluster network
services because they are easier to use than manually deploying, configuring
and managing the alternatives that may be natively available in the
environment.

As a user deploying Single Node OpenShift, I want the convenience of a
cluster-internal DNS solution.

As a user deploying OpenShift in a mixed environment, such as [virtualized
control plane nodes and bare metal worker
nodes](https://access.redhat.com/solutions/5376701), I am forced to select
platform type `none`, but I still want the option to use the in-cluster network
services.

As a developer enabling OpenShift on a new platform via the `external` platform
type, I want to get an OpenShift cluster up and running with as little friction
as possible so I can start adding integrations with features of the
environment.

### Goals

Enable stand-alone OpenShift clusters to be viable out-of-the-box in
environments that A) lack a suitable external DNS and/or load balancer
solution, and B) are not one of the platform types that already provide those
services in-cluster (`baremetal`, `openstack`, `vsphere`, and `ovirt`).

Allow users to opt in to in-cluster DNS and load balancer services with
platform types `none` and `external`.

Stop requiring users to select the `baremetal` platform type when all they
really want is the in-cluster DNS and load balancer services.

Make it easy for Single Node OpenShift users to deploy the cluster-internal DNS
service.

### Non-Goals

The in-cluster network infrastructure currently requires nodes to be on the
same subnet. This proposal does not seek to change or remove that limitation.

## Proposal

The install-config.yaml platform section for both `external` and `none` will
include optional fields to deploy and configure CoreDNS and/or the in-cluster
load balancers. Actual deployment and management will be handled the same way
they already are on other platforms.

### Workflow Description

A user or automation tool (such as the assisted-installer) that is editing
install-config.yaml prior to cluster installation will be able to:
* Enable internal DNS
* Provide VIPs that implicitly enable in-cluster load balancers (see the sketch below)

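For illustration, a hypothetical install-config.yaml fragment for platform
`none` might look like the following. This is a sketch only: the field names
are assumed to mirror the Go types proposed under Implementation Details
(`internalDNS`, `inClusterLoadBalancer`, `apiVIPs`, `ingressVIPs`) and are not
a finalized API.

```yaml
# Hypothetical sketch; field names follow the Go types proposed below and
# may change during API review. Addresses are illustrative.
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  none:
    internalDNS: CoreDNS        # opt in to the in-cluster DNS service
    inClusterLoadBalancer:      # providing VIPs enables the in-cluster LBs
      apiVIPs:
        - 192.0.2.5             # VIP for the API server
      ingressVIPs:
        - 192.0.2.6             # VIP for ingress traffic
```
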
### API Extensions

In the `InstallConfig`, the sections for the External and None platforms will
have new settings that:
* Enable internal DNS
* Provide VIPs that implicitly enable internal load balancers (see the example under Implementation Details)

The `Infrastructure` API will add fields in the
[`PlatformSpec`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L272)
and
[`PlatformStatus`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L389)
that mirror the corresponding fields for baremetal, including:
* `APIServerInternalIPs` in Spec and Status
* `IngressIPs` in Spec and Status
* `LoadBalancer` in Status

Those fields will be added to the `External` platform Spec and Status. For the
`None` platform, a new Spec and Status section will need to be created.

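As a rough illustration, assuming the new fields mirror the existing baremetal
ones, the resulting `Infrastructure` object for an External-platform cluster
might look something like the sketch below. The exact placement and names are
subject to API review; only the baremetal equivalents of these fields exist
today.

```yaml
# Sketch of the proposed additions for the External platform, mirroring the
# existing baremetal platform fields; not a finalized schema.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: External
    external:
      apiServerInternalIPs:
        - 192.0.2.10
      ingressIPs:
        - 192.0.2.11
status:
  platformStatus:
    type: External
    external:
      apiServerInternalIPs:
        - 192.0.2.10
      ingressIPs:
        - 192.0.2.11
      loadBalancer:
        type: OpenShiftManagedDefault   # as on baremetal; UserManaged would disable the in-cluster LB
```
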
### Topology Considerations

#### Hypershift / Hosted Control Planes

None

#### Standalone Clusters

The change is only relevant for standalone clusters.

#### Single-node Deployments or MicroShift

Single Node OpenShift benefits from this change as described above. Being a
single node, it does not need the load balancers, but it does require a DNS
solution.

The assisted-installer already deploys dnsmasq by default as a cluster-internal
DNS solution for SNO, which has been valuable and successful.

### Implementation Details/Notes/Constraints

The `InstallConfig` will gain new settings for `InClusterLoadBalancer` and
`InternalDNS`. They are shown below, added to the existing settings for the
External platform type.

```go
type InClusterLoadBalancer struct {
    // APIVIPs contains the VIP(s) to use for internal API communication. In
    // dual stack clusters it contains an IPv4 and IPv6 address, otherwise only
    // one VIP.
    //
    // +kubebuilder:validation:MaxItems=2
    // +kubebuilder:validation:UniqueItems=true
    // +kubebuilder:validation:Format=ip
    APIVIPs []string `json:"apiVIPs,omitempty"`

    // IngressVIPs contains the VIP(s) to use for ingress traffic. In dual stack
    // clusters it contains an IPv4 and IPv6 address, otherwise only one VIP.
    //
    // +kubebuilder:validation:MaxItems=2
    // +kubebuilder:validation:UniqueItems=true
    // +kubebuilder:validation:Format=ip
    IngressVIPs []string `json:"ingressVIPs,omitempty"`
}

// Platform stores configuration related to external cloud providers.
type Platform struct {
    // PlatformName holds the arbitrary string representing the infrastructure
    // provider name, expected to be set at the installation time. This field
    // is solely for informational and reporting purposes and is not expected
    // to be used for decision-making.
    // +kubebuilder:default:="Unknown"
    // +default="Unknown"
    // +kubebuilder:validation:XValidation:rule="oldSelf == 'Unknown' || self == oldSelf",message="platform name cannot be changed once set"
    // +optional
    PlatformName string `json:"platformName,omitempty"`

    // CloudControllerManager when set to external, this property will enable
    // an external cloud provider.
    // +kubebuilder:default:=""
    // +default=""
    // +kubebuilder:validation:Enum="";External
    // +optional
    CloudControllerManager CloudControllerManager `json:"cloudControllerManager,omitempty"`

    // InClusterLoadBalancer is an optional feature that uses haproxy and
    // keepalived as load balancers running in the cluster. It is useful in
    // environments where it is not possible or desirable to use load balancers
    // outside of the cluster.
    // +optional
    InClusterLoadBalancer *InClusterLoadBalancer `json:"inClusterLoadBalancer,omitempty"`

    // InternalDNS, when set, activates a DNS service running inside the cluster
    // to provide DNS resolution internally. It is useful in environments where
    // it is not possible or desirable to manage the cluster's internal DNS
    // records in an external DNS system.
    // +kubebuilder:default:=""
    // +default=""
    // +kubebuilder:validation:Enum="";CoreDNS
    // +optional
    InternalDNS InternalDNS `json:"internalDNS,omitempty"`
}

type InternalDNS string

const (
    // CoreDNS is the default service used to implement internal DNS within a cluster.
    CoreDNS InternalDNS = "CoreDNS"
)
```

Review discussion on these proposed types, from the pull request:

> **Reviewer (on the `APIVIPs`/`IngressVIPs` validation markers):** Nit: for real APIs we use CEL for both of these now; these validations don't actually work as intended.
>
> **Reviewer:** Are the existing API/ingress VIP fields going to be deprecated/moved to this type?
>
> **Author:** Hopefully that's not necessary. I proposed an API here just to have a standing point for conversation, but I'm happy for the team to weigh in on what would make the most sense. There needs to be some balance of fitting in with the existing APIs, but also making it obvious and simple to enable or disable the feature. And for this proposal, the default should continue to be "disabled" since that matches existing behavior.
>
> **Reviewer:** Following the external-lb-vips enhancement, a …
>
> **Reviewer:** There's prior art for controlling load balancer deployment based on either the existence of the VIPs or an explicit field (as in the feature @eranco74 linked). The existence of the VIPs was something of a kludge because the vSphere UPI implementation was done before we had these components and we needed to be able to differentiate UPI and IPI without breaking existing logic. Given that, I'd mildly prefer to use the explicit field (which would default to the opposite of what the existing platforms do), in part because the existing vSphere UPI logic is already a headache that semi-regularly trips us up. It's probably not a big deal as long as we pick a method and stick to it for these new platforms, but if we implicitly enable the LB when VIPs are provided, then we'll have three different sets of logic: based only on the loadBalancer property (non-vSphere on-prem platforms), based on both the loadBalancer property and the existence of VIPs (vSphere), and based only on the existence of VIPs (none and external). We're less likely to accidentally break this in the future if it works the same as the other on-prem platforms and vSphere is our only outlier.
>
> **Reviewer (on the `CoreDNS` constant):** If we go with the loadBalancer field discussed above, we probably want this to have a default value of "None" or "UserManaged" or something like that to match the loadBalancer behavior. We also may not want to use CoreDNS in the option name. There have been discussions about changing the implementation of the on-prem DNS to something lighter weight, and it would be confusing if the option CoreDNS actually deployed dnsmasq. That's also why the loadBalancer parameter names are "OpenShiftManaged" and "UserManaged" instead of "keepalived" and "none".

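To show how these settings might appear to a user, a hypothetical
install-config.yaml fragment for the `external` platform could look like the
following. Values are illustrative only, and `platformName: oci` is just an
example of an arbitrary provider name.

```yaml
# Hypothetical usage of the proposed fields; values are illustrative.
platform:
  external:
    platformName: oci                 # arbitrary, informational provider name
    cloudControllerManager: External  # existing field, unchanged
    internalDNS: CoreDNS              # enable the in-cluster DNS service
    inClusterLoadBalancer:            # enable the in-cluster haproxy/keepalived LBs
      apiVIPs:
        - 192.0.2.10
      ingressVIPs:
        - 192.0.2.11
```
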
### Risks and Mitigations

All of the components in question are already widely deployed in OpenShift
clusters.

### Drawbacks

## Open Questions [optional]

## Test Plan

**Note:** *Section not required until targeted at a release.*

## Graduation Criteria

**Note:** *Section not required until targeted at a release.*

### Dev Preview -> Tech Preview

### Tech Preview -> GA

### Removing a deprecated feature

## Upgrade / Downgrade Strategy

No change to how the components are upgraded and/or downgraded today.

## Version Skew Strategy

No change.

## Operational Aspects of API Extensions

This proposal will enable clusters installed in the future to have fewer CRDs,
since they'll be able to use in-cluster network services without having to
select the `baremetal` platform type. Thus, the unused CRDs from the
`baremetal` platform won't be present on those clusters.

## Support Procedures

No new support implications.

## Alternatives

### Move In-Cluster Network Settings Out of the Platform Spec

> **Reviewer:** FWIW, in a perfect world I'd like to see this happen, for a couple of reasons. That said, I recognize this could complicate things quite a lot, as discussed in the cons section. I guess you can consider this a vote in favor of a common config section, but not a vote against the proposal above.

Instead of embedding the in-cluster network settings within the platform
specification, these settings could be moved to a separate, dedicated section
in the install-config.yaml. This approach would completely decouple the setup
of these in-cluster network services from platform-specific settings, allowing
greater flexibility in utilizing the network services on any platform type.

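As a purely illustrative sketch of this alternative, such a decoupled
configuration might look like the fragment below. The top-level section name
`inClusterNetworkServices` is invented here for illustration and is not part
of this proposal.

```yaml
# Hypothetical alternative layout; the top-level section name is invented
# for illustration only.
platform:
  external:
    platformName: oci
inClusterNetworkServices:
  internalDNS: CoreDNS
  loadBalancer:
    apiVIPs:
      - 192.0.2.10
    ingressVIPs:
      - 192.0.2.11
```
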
Pros:
* Decouples the in-cluster network services from platform-specific settings.
* Simplifies the platform specification and provides a clear, dedicated section for network services.
* De-duplicates settings that have been mirrored into the `baremetal`, `openstack`, and `ovirt` platforms.

Cons:
* Introduces a new section in the configuration file, which may confuse users.
* May conflict with the settings for these network services that already exist on specific platforms, including baremetal and openstack.
* Would require guardrails to ensure they don’t get deployed on platforms that utilize other solutions, even if the user configures them for deployment in the install-config.

Review discussion from the pull request:

> **Reviewer:** Is this a hard limitation? In general, BYO-infrastructure installs on cloud providers using the `External` publish strategy create public and private subnets and place the nodes in private subnets. I see in the EP "provide in-cluster implementations of network services" that the load balancer service is created initially on the bootstrap node and then moved to the control plane nodes when `InClusterLoadBalancer` is set. Would it be possible to have some flexibility to configure which nodes it is deployed on? Does that make sense? I am wondering if we could advise "non-integrated" cloud providers that do not meet the LB requirements (e.g. hairpin), or that do not offer cloud-based LBs, to isolate the nodes where the load balancer is deployed in public installations, allowing them to also opt in to this feature while keeping the standard network layout: control planes in private subnets, and haproxy deployed on nodes that can be exposed to the internet, generally in public subnets.
>
> **Reviewer:** It is already possible to exercise at least some control over where the loadbalancer will run, which is how we deploy clusters with remote workers today. However, I am going to agree with @mhrivnak that this is out of scope in terms of simply enabling the DNS and LB infra on platform None. If changes to how that infra works are desired, that should be a separate discussion.