in-cluster DNS and load balancers on more platforms #1666

---
title: in-cluster-dns-and-loadbalancers-on-more-platforms
authors:
- "@mhrivnak"
- "@eranco74"
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
- "@cybertron"
- "@tsorya"
- "@zaneb"
approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
- TBD
api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
- TBD
creation-date: 2024-08-26
last-updated: 2024-08-26
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
see-also:
- "/enhancements/network/baremetal-networking.md"
replaces:
superseded-by:
---

# In-cluster DNS and load balancers on more platforms

## Summary

Multiple on-prem platform types, including `baremetal` and `openstack`,
[provide in-cluster implementations of network
services](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md)
that are required in order to have a viable stand-alone cluster:

* CoreDNS for in-cluster DNS resolution
* haproxy with keepalived to provide in-cluster load balancers for the API server and for ingress to workloads

Continuing the work from that original [enhancement
proposal](https://github.com/openshift/enhancements/blob/master/enhancements/network/baremetal-networking.md),
those services should also be available for optional inclusion when installing
a cluster with the `external` or `none` platform types, which are likewise
often used in environments that lack a suitable alternative for DNS and/or load
balancers.

## Motivation

Provisioning and configuring a [DNS
system](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-dns-user-infra_installing-platform-agnostic)
and [load
balancers](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-load-balancing-user-infra_installing-platform-agnostic)
manually for an OpenShift cluster is a substantial burden on the user. In a
cloud environment that is not already supported by the OpenShift installer,
using the native offerings requires additional up-front work, creates an
ongoing maintenance burden, and adds monetary cost for infrastructure. And not
all cloud environments offer suitable options. On-prem, there may be no
suitable DNS and/or load balancer services, nor additional infrastructure on
which to run them.

Many users end up deploying a cluster with platform type `baremetal` when all
they really want is to use the in-cluster network services. Instead, it should
be possible to utilize those in-cluster network services without them being
coupled to the baremetal platform type.

For example, the assisted-installer and the agent based installer are often
used to deploy clusters into “generic” environments where there is not an
opportunity to utilize an external DNS or load balancer solution. Thus the
assisted-installer sets the platform type as `baremetal` regardless of whether
the systems are actually running on bare metal or whether there is any intent
to use metal3 integrations. The resulting cluster has all of the appearances of
being bare metal, including a BareMetalHost resource for each Node, which can
be confusing to users. Even the web console’s Overview landing page shows
“Infrastructure provider: BareMetal” in addition to “N Nodes” and “N Bare Metal
Hosts”.

Single Node OpenShift uses platform type `none` and [requires the user to
configure DNS records
manually](https://docs.openshift.com/container-platform/4.16/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
The assisted-installer configures dnsmasq in new SNO clusters as a
convenience. But it would be better for the internal DNS service to be a
native part of platform type `none` so that it is easily available to all
users, regardless of how they install SNO.

### User Stories

As a user deploying OpenShift in an environment that lacks a suitable DNS
and/or load balancer solution, and with no intent to utilize metal3-related
bare metal features, I want to utilize the in-cluster network services without
being forced to use the `baremetal` platform type.

As a user deploying OpenShift with the `external` platform type into an
environment of my choosing, I want the option to use the in-cluster network
services because they are easier to use than manually deploying, configuring
and managing the alternatives that may be natively available in the
environment.

As a user deploying Single Node OpenShift, I want the convenience of a
cluster-internal DNS solution.

As a user deploying OpenShift in a mixed environment, such as [virtualized
control plane nodes and bare metal worker
nodes](https://access.redhat.com/solutions/5376701), I am forced to select
platform type `none`, but I still want the option to use the in-cluster network
services.

As a developer enabling OpenShift on a new platform via the `external` platform
type, I want to get an OpenShift cluster up and running with as little friction
as possible so I can start adding integrations with features of the
environment.

### Goals

Enable stand-alone OpenShift clusters to be viable out-of-the-box in
environments that A) lack a suitable external DNS and/or load balancer
solution, and B) are not one of the platform types that already provide those
services in-cluster (`baremetal`, `openstack`, and `ovirt`).

Allow users to opt-in for in-cluster DNS and load balancer services with
platform types `none` and `external`.

Stop requiring users to select the `baremetal` platform type when all they
really want is the in-cluster DNS and load balancer services.

Make it easy for Single Node OpenShift users to deploy the cluster-internal DNS
service.

### Non-Goals

The in-cluster network infrastructure has a limitation that it requires nodes
to be on the same subnet. This proposal does not seek to change or remove that
limitation.

> **Review (Contributor):** Is this a hard limitation? In general,
> BYO-infrastructure installs on cloud providers using the External publish
> strategy create public and private subnets and place nodes in the private
> subnets. I see in the EP "provide in-cluster implementations of network
> services" that the load balancer service is created initially on the
> bootstrap node and then moved to the control plane nodes when
> `InClusterLoadBalancer` is set. Would it be possible to have some flexibility
> to configure which nodes it is deployed on? Does that make sense? I am
> wondering whether we could advise "non-integrated" cloud providers that do
> not meet the LB requirements (e.g. hairpin), or that do not offer cloud-based
> LBs, to isolate the nodes where the load balancer is deployed in public
> installations, allowing them to also opt in to this feature while keeping the
> usual network layout: control planes in private subnets, and haproxy deployed
> on nodes that can be exposed to the internet, generally in public subnets.
>
> **Reply (Member):** It is already possible to exercise at least some control
> over where the loadbalancer will run, which is how we deploy clusters with
> remote workers today. However, I am going to agree with @mhrivnak that this
> is out of scope in terms of simply enabling the DNS and LB infra on platform
> None. If changes to how that infra works are desired, that should be a
> separate discussion.

## Proposal

The install-config.yaml platform section for both `external` and `none` will
include optional fields to deploy and configure CoreDNS and/or the in-cluster
load balancers. Actual deployment and management will be handled the same way
it already is on other platforms.

### Workflow Description

A user or automation tool (such as the assisted-installer) that is editing
install-config.yaml prior to cluster installation will be able to:
* Enable internal DNS
* Provide VIPs that implicitly enable in-cluster load balancers
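
A minimal sketch of what that could look like for the `none` platform is shown
below. The field names follow the Go types proposed under Implementation
Details; the exact shape is still under review, so treat this as illustrative
rather than final.

```yaml
# Illustrative install-config.yaml fragment; the new fields are proposed in
# this enhancement and are not yet part of the installer API.
apiVersion: v1
baseDomain: example.com
metadata:
  name: demo-cluster
platform:
  none:
    internalDNS: CoreDNS
    inClusterLoadBalancer:
      apiVIPs:
        - 192.0.2.5
      ingressVIPs:
        - 192.0.2.6
```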

### API Extensions

In the `InstallConfig`, the sections for External and None platforms will have
new settings that:
* Enable internal DNS
* Provide VIPs that implicitly enable internal load balancers (See example under Implementation Details)

The `Infrastructure` API will add fields in the
[`PlatformSpec`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L272)
and
[`PlatformStatus`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L389)
that mirror the corresponding fields for baremetal, including:
* `APIServerInternalIPs` in Spec and Status
* `IngressIPs` in Spec and Status
* `LoadBalancer` in Status

Those fields will be added to the `External` platform Spec and Status. For the
`None` platform, a new Spec and Status section will need to be created.
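
As a rough illustration, assuming the new fields mirror their `baremetal`
counterparts, an `Infrastructure` object for an `external`-platform cluster
that opted in to the in-cluster load balancers might look like the following.
This is a hypothetical sketch; the exact field placement, particularly for the
`None` platform, is still to be defined.

```yaml
# Hypothetical Infrastructure object mirroring the baremetal platform fields.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: External
    external:
      apiServerInternalIPs:
        - 192.0.2.5
      ingressIPs:
        - 192.0.2.6
status:
  platformStatus:
    type: External
    external:
      apiServerInternalIPs:
        - 192.0.2.5
      ingressIPs:
        - 192.0.2.6
      loadBalancer:
        type: OpenShiftManaged
```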

### Topology Considerations

#### Hypershift / Hosted Control Planes

None

#### Standalone Clusters

The change is only relevant for standalone clusters.

#### Single-node Deployments or MicroShift

Single Node OpenShift benefits from this change as described above. Being a
single node, it does not need the loadbalancers, but it does require a DNS
solution.

Assisted-installer already deploys dnsmasq by default as a cluster-internal DNS
solution for SNO, which has been valuable and successful.
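
For SNO, the opt-in could therefore be as small as enabling the internal DNS
without providing any VIPs. A hypothetical install-config.yaml fragment, using
the field names proposed in this document:

```yaml
# Illustrative only: DNS-only opt-in for Single Node OpenShift.
platform:
  none:
    internalDNS: CoreDNS
```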

### Implementation Details/Notes/Constraints

The `InstallConfig` will gain new settings for `InClusterLoadBalancer` and
`InternalDNS`. They are shown below, added to the existing settings for the
External platform type.

```go
type InClusterLoadBalancer struct {
    // APIVIPs contains the VIP(s) to use for internal API communication. In
    // dual stack clusters it contains an IPv4 and IPv6 address, otherwise only
    // one VIP.
    //
    // +kubebuilder:validation:MaxItems=2
    // +kubebuilder:validation:UniqueItems=true
    // +kubebuilder:validation:Format=ip
    APIVIPs []string `json:"apiVIPs,omitempty"`

    // IngressVIPs contains the VIP(s) to use for ingress traffic. In dual stack
    // clusters it contains an IPv4 and IPv6 address, otherwise only one VIP.
    //
    // +kubebuilder:validation:MaxItems=2
    // +kubebuilder:validation:UniqueItems=true
    // +kubebuilder:validation:Format=ip
    IngressVIPs []string `json:"ingressVIPs,omitempty"`
}

// Platform stores configuration related to external cloud providers.
type Platform struct {
    // PlatformName holds the arbitrary string representing the infrastructure
    // provider name, expected to be set at the installation time. This field
    // is solely for informational and reporting purposes and is not expected
    // to be used for decision-making.
    // +kubebuilder:default:="Unknown"
    // +default="Unknown"
    // +kubebuilder:validation:XValidation:rule="oldSelf == 'Unknown' || self == oldSelf",message="platform name cannot be changed once set"
    // +optional
    PlatformName string `json:"platformName,omitempty"`

    // CloudControllerManager when set to external, this property will enable
    // an external cloud provider.
    // +kubebuilder:default:=""
    // +default=""
    // +kubebuilder:validation:Enum="";External
    // +optional
    CloudControllerManager CloudControllerManager `json:"cloudControllerManager,omitempty"`

    // InClusterLoadBalancer is an optional feature that uses haproxy and
    // keepalived as loadbalancers running in the cluster. This is useful in
    // environments where it is not possible or desirable to use loadbalancers
    // outside of the cluster.
    // +optional
    InClusterLoadBalancer *InClusterLoadBalancer `json:"inClusterLoadBalancer,omitempty"`

    // InternalDNS, when set, activates a DNS service running inside the cluster
    // to provide DNS resolution internally. It is useful in environments where
    // it is not possible or desirable to manage the cluster's internal DNS
    // records in an external DNS system.
    // +kubebuilder:default:=""
    // +default=""
    // +kubebuilder:validation:Enum="";CoreDNS
    // +optional
    InternalDNS InternalDNS `json:"internalDNS,omitempty"`
}

type InternalDNS string

const (
    // CoreDNS is the default service used to implement internal DNS within a cluster.
    CoreDNS InternalDNS = "CoreDNS"
)
```

Reviewers raised the following points on this proposed API:

> **Review (Contributor):** Nit: for real APIs we use CEL for both of the VIP
> list validations now; the `UniqueItems` and `Format=ip` markers above don't
> actually work as intended.
>
> **Review (Contributor):** Are the existing API/ingress VIP fields going to be
> deprecated/moved to this type?
>
> **Author reply (Member):** Hopefully that's not necessary. I proposed an API
> here just to have a standing point for conversation, but I'm happy for the
> team to weigh in on what would make the most sense. There needs to be some
> balance of fitting in with the existing APIs, but also making it obvious and
> simple to enable or disable the feature. And for this proposal, the default
> should continue to be "disabled" since that matches existing behavior.
>
> **Review (@eranco74, Contributor):** Following the external-lb-vips
> enhancement, a `loadBalancer` property was added to the install config in
> [openshift/installer#6812](https://github.com/openshift/installer/pull/6812/files).
> I think this enhancement is the counterpart of the external-lb-vips
> enhancement and it could probably use the same API.
>
> **Review (Member):** There's prior art for controlling loadbalancer
> deployment based on either the existence of the VIPs or an explicit field (as
> in the feature @eranco74 linked). The existence of the VIPs was something of
> a kludge because the vSphere UPI implementation was done before we had these
> components and we needed to be able to differentiate UPI and IPI without
> breaking existing logic. Given that, I'd mildly prefer to use the explicit
> field (which would default to the opposite of what the existing platforms
> do), in part because the existing vSphere UPI logic is already a headache
> that semi-regularly trips us up. It's probably not a big deal as long as we
> pick a method and stick to it for these new platforms, but if we implicitly
> enable the LB when VIPs are provided, then we'll have three different sets of
> logic: based only on the `loadBalancer` property (non-vSphere on-prem
> platforms), based on both the `loadBalancer` property and the existence of
> VIPs (vSphere), and based only on the existence of VIPs (`none` and
> `external`). We're less likely to accidentally break this in the future if it
> works the same as the other on-prem platforms and vSphere is our only
> outlier.
>
> **Review (Member):** If we go with the `loadBalancer` field discussed above,
> we probably want `InternalDNS` to have a default value of "None" or
> "UserManaged" or something like that to match the `loadBalancer` behavior.
> We also may not want to use CoreDNS in the option name. There have been
> discussions about changing the implementation of the on-prem DNS to something
> lighter weight, and it would be confusing if the option CoreDNS actually
> deployed dnsmasq. That's also why the `loadBalancer` parameter names are
> "OpenShiftManaged" and "UserManaged" instead of "keepalived" and "none".

### Risks and Mitigations

All of the components in question are already widely deployed in OpenShift
clusters.

### Drawbacks


## Open Questions [optional]


## Test Plan

**Note:** *Section not required until targeted at a release.*

## Graduation Criteria

**Note:** *Section not required until targeted at a release.*

### Dev Preview -> Tech Preview


### Tech Preview -> GA


### Removing a deprecated feature


## Upgrade / Downgrade Strategy

No change to how the components are upgraded and/or downgraded today.

## Version Skew Strategy

No change.

## Operational Aspects of API Extensions

This proposal will enable clusters installed in the future to have fewer CRDs,
since they'll be able to use in-cluster network services without having to
select the `baremetal` platform type. Thus, the unused CRDs from the
`baremetal` platform won't be present on those clusters.

## Support Procedures

No new support implications.

## Alternatives

### Move In-Cluster Network Settings Out of the Platform Spec
> **Review (Member), commenting on this alternative:** FWIW, in a perfect
> world I'd like to see this happen, for a couple of reasons:
>
> * There's a ton of duplication in the API already, and we're probably only going to add support for more platforms over time. Each platform requires us to duplicate the entire in-cluster network API again.
> * We anticipate having more cross-platform configuration settings in the near future, and it would be nice to already have a logical place to put them.
>
> That said, I recognize this could complicate things quite a lot, as discussed
> in the cons section. I guess you can consider this a vote in favor of a
> common config section, but not a vote against the proposal above.


Instead of embedding the in-cluster network settings within the platform
specification, these settings could be moved to a separate, dedicated section
in the install-config.yaml. This approach would completely decouple the setup
of these in-cluster network services from platform-specific settings, allowing
greater flexibility in utilizing the network services on any platform type.

Pros:
* Decouples the in-cluster network services from platform-specific settings.
* Simplifies the platform specification and provides a clear, dedicated section for network services.
* De-duplicates settings that have been mirrored into `baremetal`, `openstack`, and `ovirt` platforms.

Cons:
* Introduces a new section in the configuration file, which may confuse users.
* May conflict with the settings for these network services that already exist on specific platforms, including baremetal and openstack.
* Would require guardrails to ensure the in-cluster services don’t get deployed on platforms that utilize other solutions, even if the user configures them for deployment in the install-config.
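
To make this alternative concrete, a hypothetical install-config.yaml layout is
sketched below. Neither the section name nor the field names are part of the
proposal above; they are invented here purely for illustration.

```yaml
# Hypothetical dedicated section, decoupled from the platform spec.
platform:
  none: {}
inClusterNetworkServices:
  internalDNS: CoreDNS
  loadBalancer:
    apiVIPs:
      - 192.0.2.5
    ingressVIPs:
      - 192.0.2.6
```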