in-cluster DNS and load balancers on more platforms #1666

Open · mhrivnak wants to merge 2 commits into master from in-cluster-network-services

Conversation

mhrivnak (Member)

proposes adding in-cluster DNS and load balancers as an option for more platform types than just the current baremetal and openstack.

@openshift-ci openshift-ci bot requested review from abhat and dougbtv August 26, 2024 13:02
@mhrivnak mhrivnak force-pushed the in-cluster-network-services branch from 159b7fd to 1e8dbd5 on August 26, 2024 16:27
@mhrivnak mhrivnak closed this Aug 26, 2024
@mhrivnak mhrivnak reopened this Aug 26, 2024
CloudControllerManager CloudControllerManager `json:"cloudControllerManager,omitempty"`

// InClusterLoadBalancer is an optional feature that uses haproxy and
// keepalived as loadbalancers running in the cluster. Is is useful in
Contributor

Suggested change
// keepalived as loadbalancers running in the cluster. Is is useful in
// keepalived as loadbalancers running in the cluster. This is useful in

Comment on lines +188 to +205
type InClusterLoadBalancer struct {
// APIVIPs contains the VIP(s) to use for internal API communication. In
// dual stack clusters it contains an IPv4 and IPv6 address, otherwise only
// one VIP
//
// +kubebuilder:validation:MaxItems=2
// +kubebuilder:validation:UniqueItems=true
// +kubebuilder:validation:Format=ip
APIVIPs []string `json:"apiVIPs,omitempty"`

// IngressVIPs contains the VIP(s) to use for ingress traffic. In dual stack
// clusters it contains an IPv4 and IPv6 address, otherwise only one VIP
//
// +kubebuilder:validation:MaxItems=2
// +kubebuilder:validation:UniqueItems=true
// +kubebuilder:validation:Format=ip
IngressVIPs []string `json:"ingressVIPs,omitempty"`
}
Contributor

are the existing API/ingress VIP fields going to be deprecated/moved to this type?

Member Author

Hopefully that's not necessary. I proposed an API here just to have a starting point for conversation, but I'm happy for the team to weigh in on what would make the most sense. There needs to be a balance between fitting in with the existing APIs and making it obvious and simple to enable or disable the feature. And for this proposal, the default should continue to be "disabled", since that matches existing behavior.

@eranco74 (Contributor) commented Nov 10, 2024

Following the external-lb-vips enhancement, a loadBalancer property was added to the install config [here](https://github.com/openshift/installer/pull/6812/files).
I think this enhancement is the counterpart of the external-lb-vips enhancement, and it could probably use the same API.
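
For context, the field added by that enhancement is roughly shaped like the sketch below. This is a from-memory approximation of the openshift/api types, so treat the names as illustrative rather than authoritative.

```go
// Approximate sketch of the loadBalancer API introduced by the external-lb-vips
// work. Names are modeled on openshift/api config/v1 from memory; treat them as
// illustrative, not authoritative.
package v1

type PlatformLoadBalancerType string

const (
	// The cluster deploys and manages its own haproxy/keepalived load balancer
	// (today's baremetal/openstack behavior).
	LoadBalancerTypeOpenShiftManagedDefault PlatformLoadBalancerType = "OpenShiftManagedDefault"

	// The user provides an external load balancer; nothing is deployed in-cluster.
	LoadBalancerTypeUserManaged PlatformLoadBalancerType = "UserManaged"
)

// Each on-prem platform then exposes a loadBalancer field of this shape.
type PlatformLoadBalancer struct {
	Type PlatformLoadBalancerType `json:"type,omitempty"`
}
```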

Member

There's prior art for controlling load balancer deployment based on either the existence of the VIPs or an explicit field (as in the feature @eranco74 linked). The existence of the VIPs was something of a kludge because the vSphere UPI implementation was done before we had these components and we needed to be able to differentiate UPI and IPI without breaking existing logic.

Given that, I'd mildly prefer to use the explicit field (which would default to the opposite of what the existing platforms do), in part because the existing vSphere UPI logic is already a headache that semi-regularly trips us up. It's probably not a big deal as long as we pick a method and stick to it for these new platforms, but if we implicitly enable the LB when VIPs are provided, then we'll have three different sets of logic: based only on the loadBalancer property (non-vSphere on-prem platforms), based on both the loadBalancer property and the existence of VIPs (vSphere), and based only on the existence of VIPs (none and external). We're less likely to accidentally break this in the future if it works the same as the other on-prem platforms and vSphere is our only outlier.
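
To make that maintenance concern concrete, here is a minimal sketch of the three enablement checks described above. All of the names are hypothetical; none of this is actual installer code.

```go
// Hypothetical illustration of the three enablement code paths; none of these
// types or functions exist in the installer. They only show why mixing
// enablement styles multiplies the logic that has to be maintained.
package example

type lbConfig struct {
	LoadBalancerType string   // e.g. "OpenShiftManaged" or "UserManaged" (placeholder values)
	APIVIPs          []string // VIPs supplied by the user, if any
}

// Non-vSphere on-prem platforms: the explicit field alone decides.
func lbEnabledOnPrem(c lbConfig) bool {
	return c.LoadBalancerType == "OpenShiftManaged"
}

// vSphere: the explicit field plus VIP existence, to keep old UPI clusters working.
func lbEnabledVSphere(c lbConfig) bool {
	return c.LoadBalancerType == "OpenShiftManaged" && len(c.APIVIPs) > 0
}

// none/external with implicit enablement: VIP existence alone decides.
func lbEnabledImplicit(c lbConfig) bool {
	return len(c.APIVIPs) > 0
}
```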

@rvanderp3 (Contributor)

cc @2uasimojo

you might be interested in the install-config implications here

@2uasimojo (Member)

you might be interested in the install-config implications here

Thanks @rvanderp3. I think these changes will be transparent to hive -- in fact, I'm not even sure if they're applicable to a hive-provisioned cluster -- but I'll keep an eye on this.

Co-authored-by: Richard Vanderpool <[email protected]>
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 2, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close


openshift-ci bot commented Nov 18, 2024

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot closed this Nov 18, 2024
@zaneb zaneb mentioned this pull request Jan 28, 2025
@eranco74 (Contributor)

/reopen

@openshift-ci openshift-ci bot reopened this Jan 30, 2025

openshift-ci bot commented Jan 30, 2025

@eranco74: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


openshift-ci bot commented Jan 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign dougbtv for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mhrivnak (Member, Author)

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 30, 2025

openshift-ci bot commented Jan 30, 2025

@mhrivnak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 57fc907 | link | true | /test markdownlint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@JoelSpeed (Contributor) left a comment

Is it possible for this feature to be turned off by a user day 2? Does it get uninstalled automatically? What else would be needed to remove this if it is no longer required by the end user?

Comment on lines +194 to +195
// +kubebuilder:validation:UniqueItems=true
// +kubebuilder:validation:Format=ip
Contributor

Nit: for real APIs we use CEL for both of these now; these validations don't actually work as intended.
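
For illustration, a rough sketch of CEL-based markers that could replace the two above. It assumes the Kubernetes CEL IP helper `isIP()` is available at the target API level, and the exact rules would need to be confirmed against current API conventions.

```go
// Sketch only: possible CEL-based replacements for the UniqueItems/Format
// markers quoted above. Assumes the Kubernetes CEL IP helper isIP() is
// available at the target API level.
package v1

type InClusterLoadBalancer struct {
	// APIVIPs contains the VIP(s) to use for internal API communication.
	//
	// +kubebuilder:validation:MaxItems=2
	// +listType=set
	// +kubebuilder:validation:XValidation:rule="self.all(ip, isIP(ip))",message="apiVIPs entries must be valid IP addresses"
	APIVIPs []string `json:"apiVIPs,omitempty"`
}
```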

`InternalDNS`. They are shown below, added to the existing settings for the
External platform type.

```
@mtulio (Contributor) commented Feb 10, 2025

Suggested change
```
```go

Comment on lines +125 to +126
The in-cluster network infrastructure has a limitation that it requires nodes
to be on the same subnet. This proposal does not seek to change or remove that
Contributor

limitation that it requires nodes to be on the same subnet

Is this a hard limitation? In general, BYO infrastructure on cloud providers using the External publish strategy creates public and private subnets and places nodes in the private subnets.

I see in the EP "provide in-cluster implementations of network services" that the load balancer service is created initially on the bootstrap node and then moved to the control plane nodes when InClusterLoadBalancer is set. Would it be possible to have some flexibility to configure which nodes it is deployed to? Does that make sense? I am wondering if we could advise "non-integrated" cloud providers that do not meet the LB requirements (e.g. hairpin), or that do not offer cloud-based LBs, to isolate the nodes where the load balancer is deployed in public installations. That would allow them to also opt in to this feature while keeping the standard network layout: control planes in private subnets, and haproxy deployed on nodes that can be exposed to the internet, generally in public subnets.

Member

It is already possible to exercise at least some control over where the load balancer will run, which is how we deploy clusters with remote workers today. However, I am going to agree with @mhrivnak that this is out of scope in terms of simply enabling the DNS and LB infra on platform None. If changes to how that infra works are desired, that should be a separate discussion.


## Alternatives

### Move In-Cluster Network Settings Out of the Platform Spec
Member

FWIW, in a perfect world I'd like to see this happen, for a couple of reasons:

  • There's a ton of duplication in the API already, and we're probably only going to add support for more platforms over time. Each platform requires us to duplicate the entire in-cluster network API again.
  • We anticipate having more cross-platform configuration settings in the near future and it would be nice to have a logical place to put them already present.

That said, I recognize this could complicate things quite a lot, as discussed in the cons section. I guess you can consider this a vote in favor of a common config section, but not a vote against the proposal above.
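
To make the common-section alternative concrete, one hypothetical shape (purely illustrative, not something proposed in this PR) could be a single shared struct that each platform references instead of duplicating the fields.

```go
// Hypothetical sketch of a shared, platform-independent section; this is only
// an illustration of the alternative being discussed, not part of the proposal.
package v1

// Minimal stand-ins for the types proposed elsewhere in this enhancement, so
// the sketch is self-contained.
type InternalDNS string

type InClusterLoadBalancer struct {
	APIVIPs     []string `json:"apiVIPs,omitempty"`
	IngressVIPs []string `json:"ingressVIPs,omitempty"`
}

// A single common section that every platform could reference, instead of
// duplicating the fields in each platform spec.
type InClusterNetworking struct {
	// LoadBalancer controls whether haproxy/keepalived run in the cluster.
	LoadBalancer *InClusterLoadBalancer `json:"loadBalancer,omitempty"`

	// DNS controls whether an in-cluster internal DNS service is deployed.
	DNS InternalDNS `json:"dns,omitempty"`
}
```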


const (
// CoreDNS is the default service used to implement internal DNS within a cluster.
CoreDNS InternalDNS = "CoreDNS"
Member

If we go with the loadBalancer field discussed above, we probably want this to have a default value of "None" or "UserManaged" or something like that to match the loadBalancer behavior.

We also may not want to use CoreDNS in the option name. There have been discussions about changing the implementation of the on-prem DNS to something lighter weight and it would be confusing if the option CoreDNS actually deployed dnsmasq. That's also why the loadBalancer parameter names are "OpenShiftManaged" and "UserManaged" instead of "keepalived" and "none".
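
A rough sketch of what that could look like, with implementation-agnostic value names and a user-managed default; the value names are placeholders, not a decided API.

```go
// Sketch only: implementation-agnostic option names mirroring the loadBalancer
// parameter naming, with a user-managed default. Value names are placeholders.
package v1

type InternalDNS string

const (
	// No in-cluster DNS is deployed; the user provides DNS externally.
	// This would be the default, matching existing behavior.
	InternalDNSUserManaged InternalDNS = "UserManaged"

	// The cluster deploys its own internal DNS, whatever the underlying
	// implementation (CoreDNS today, possibly something lighter weight later).
	InternalDNSOpenShiftManaged InternalDNS = "OpenShiftManaged"
)
```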
