CORS-2062: Customer configured DNS for cloud platforms AWS, Azure and GCP #1468
base: master
Conversation
https://github.com/openshift/enhancements/pull/1400/files#r1279672504 - Is the installer required to clean up the public DNS records, or, since we cannot clean up the private records, should it just leave both?
The work relates to the enhancement openshift/enhancements#1468. The AWS, Azure, and GCP platform status structs are updated to include custom DNS options. The internal and external load balancer IP addresses, as well as the types of DNS records, make up the base of the data.
Please link CORS-1874
even after cluster installation completes.

If the user successfully configures their external DNS service with api,
api-int and *.apps services, then they could optionally delete the in-cluster
Delete implies that this CoreDNS pod is unmanaged. How will the pod evolve over time, e.g. be updated as upgrades happen?
The CoreDNS pod would be managed by the MCO. Currently there is no way of knowing whether the LB addresses have changed since the Installer first created them, so the LB DNS addresses within the CoreDNS pod are also not expected to change.

A new field is being added to `platformSpec` for AWS, Azure and GCP which indicates whether the custom DNS solution has been enabled. The CoreDNS pod would be created by the MCO only when the feature is `Enabled` and the ConfigMap containing the LB config is present. When either of these conditions becomes `False`, the CoreDNS pod could be deleted. These are manual steps.

We are not recommending that the customer delete the CoreDNS pod, but pointing out that if the customer's DNS solution is configured correctly, then the cluster could function without the self-hosted CoreDNS.
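To make "configured correctly" concrete, here is a minimal pre-flight sketch (not part of the proposal) that checks whether the customer's external DNS answers for the cluster endpoints before anyone relies on it in place of the in-cluster CoreDNS. The DNS server address and hostnames below are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Address of the customer's external DNS server; placeholder value.
	const externalDNS = "10.0.0.53:53"

	r := &net.Resolver{
		PreferGo: true,
		// Route lookups to the external DNS server instead of the system resolver.
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, externalDNS)
		},
	}

	// Hypothetical cluster endpoints that must resolve via the external DNS.
	for _, host := range []string{
		"api.mycluster.example.com",
		"api-int.mycluster.example.com",
		"console-openshift-console.apps.mycluster.example.com",
	} {
		addrs, err := r.LookupHost(context.Background(), host)
		if err != nil {
			fmt.Printf("%s: not resolvable via external DNS: %v\n", host, err)
			continue
		}
		fmt.Printf("%s -> %v\n", host, addrs)
	}
}
```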
Why not have the MCO remove the CoreDNS pod if the conditions it needs to be configured change? E.g. if they decide to disable the feature, the MCO could recognise that desire and remove the CoreDNS pod, right?

I think what you're saying here makes sense; it wasn't clear to me what's lifecycling the pod, so we should make sure the context of the MCO lifecycling the pod is clear, and then maybe clarify the workflow for the user to disable or remove the CoreDNS as well? WDYT?
Yes, I think we should make sure the user-provisioned DNS is functioning before we remove the CoreDNS pod. So, if this capability is disabled day-2, which component would be responsible for checking that a functioning DNS alternative exists? Or do we assume that the customer knows what they are doing and not have any checks?

As you can tell, I have some unanswered questions in this area. I think if we decide to allow that in the future, we should be able to add something to the `Spec` to control that behavior.
If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?
> If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?
IMO, disabling the feature would mean that the customer wants to start using the cloud default DNS and discontinue using both their external DNS and the in-cluster DNS. When the feature is disabled, the customer could remove the entries from their external DNS and the MCO could delete the CoreDNS pod. How do we configure the cloud default DNS then? We could:
- Ask the customer to manually configure the cloud DNS using LB values gathered from the Infrastructure CR or the cloud CLI.
- Have a cluster component do it. The Installer is currently responsible for configuring the cloud DNS for API and API-Int; if the cloud DNS has to be configured with these values day-2, the MCO (or another appropriate component) has to take on this task.

My current understanding is that disabling this feature day-2 would be an exception. Providing just the manual option seems sufficient at this time. @JoelSpeed, @zaneb
4. After the Installer uses the cloud-specific terraform providers to create
the LBs for API and API-Int, it will add the LB DNS names of these LBs to a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/). This ConfigMap is
only created when the custom DNS feature is enabled. This ConfigMap gets
appended to the Ignition file created by the Installer. Let us call this
the `lbConfigforDNS` ConfigMap.
Why a configmap and not on the status of the infrastructure object? Doesn't the installer already populate the status of the infrastructure object?
We had explored that option in #1276 but ran into some implementation issues within the Installer.
The Installer generates all its manifests and adds them to the bootstrap ignition, which is written to an S3 bucket (in the case of AWS) before terraform is started to create and configure the cloud infrastructure. So, the Infrastructure manifest has already been written to the bootstrap ignition before the LBs are created via terraform. We found it easier to create a new ConfigMap, append it to the bootstrap ignition and re-write it to the S3 bucket than to update the Infrastructure CR that had already been written to the bootstrap ignition file.

Secondly, we don't expect the customer to interact with the ConfigMap at all, nor do we expect the LB information within it to change (no operator is monitoring the LB values today). It is meant to be a simple mechanism to pass the LB DNS information from the Installer to the MCO.

Fwiw, we haven't completely given up on finding a way to update a manifest already written to the bootstrap ignition. Also, another operator (say the MCO) could potentially read this ConfigMap and update the Infrastructure CR.
From an API perspective, I would much rather see the installer update the status of the infra object (I appreciate the challenges you've outlined) than use a ConfigMap. ConfigMaps have no validation and aren't real APIs. Having something in-cluster change that value based on the ConfigMap value creates a confused-deputy style problem.

It's not a blocker per se, but it would help me sleep at night if we could update the infrastructure status directly, or some other in-cluster API, rather than a ConfigMap.
The current Installer architecture makes it very hard to update the bootstrap ignition once it is generated, and the bootstrap ignition is generated before we know the LB IPs. Regeneration is also not possible (an Installer limitation, and regeneration would cause us to lose user edits to the bootstrap ignition that might have happened in the meantime). We did recognize that updating the Infrastructure CR is the best option, but we are going with our second-best option because appending to the bootstrap ignition, rather than updating a manifest that was already written to it, is currently our only viable path.

With the updates happening to the Installer code to remove the dependency on terraform, we hope to influence the design to make things like this easier to accomplish within the Installer.
@patrickdillon is opposed to updating the Infrastructure CR within the Installer due to the amount of surgery needed on the bootstrap ignition file. I am not sure if the Installer changes needed to remove terraform make things better; even if they do, that won't be available until a later release.

If the Infrastructure CR can be updated with the data in the ConfigMap by a component other than the Installer, I am open to that. I already explored the MCO as an option early on and that doesn't work either. Do any other options seem viable?
Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.
> Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.
@zaneb we wanted to treat it like an asset generated by the Installer, which it is.
Going back to @JoelSpeed's original question: I believe everyone is caught up on the reasons for moving away from the Infra CR, although we started there originally :-)

Yes, the Installer populates the Status of the Infra CR while generating the Infra manifest.
Another option would be for the MCO to read the ConfigMap and update the Infrastructure CR with these values. This was initially thought not to be possible because the MCO did not own the Infrastructure resource, but recent discussions seem promising.
> we wanted to treat it like an asset generated by the Installer
Oh, like you wanted the user to be able to edit it as a manifest? That makes sense if that is actually a requirement. Is it though? IIUC there's no additional detail that users can add beyond what they already provide in the install-config, so really they can only mess it up.
If that's not a requirement, there are heaps of Assets that get added to the bootstrap ignition without being manifests as such - most of the ones in this list.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: LBConfigforDNS
  namespace: openshift-aws-infra
data:
  internal-api-lb-dns-name: "abc-123"
  external-api-lb-dns-name: "xyz-456"
```
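Purely as an illustration, here is a sketch of how a consumer such as the MCO could read the LB DNS names out of this ConfigMap with client-go. The namespace, name, and keys mirror the example above; the actual MCO implementation may differ.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the consumer runs in-cluster with RBAC to read the ConfigMap.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Namespace and name mirror the example ConfigMap above.
	cm, err := client.CoreV1().ConfigMaps("openshift-aws-infra").
		Get(context.Background(), "LBConfigforDNS", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The two keys carry the DNS names of the internal and external API LBs.
	fmt.Printf("api-int LB: %s\n", cm.Data["internal-api-lb-dns-name"])
	fmt.Printf("api LB:     %s\n", cm.Data["external-api-lb-dns-name"])
}
```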
I'm not understanding why this middle man is needed here, can you expand?
Hope https://github.com/openshift/enhancements/pull/1468/files#r1373852053 answers your question.
successful cluster install, to configure their own DNS solution.
```go
type AWSPlatformStatus struct {
	// ... (remainder of the struct elided in this excerpt)
}
```
Is there an API PR for this? We can do the in-depth API review there; the general structure looks ok here, I think.
This one openshift/api#1606. @barbacbd is working on it.
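For readers without the API PR handy, here is a rough sketch of the kind of addition under discussion. The type and field names are placeholders; the authoritative shape is whatever lands in openshift/api#1606.

```go
package v1 // placeholder package for this sketch

// CloudLoadBalancerDNSNames is a placeholder type holding the DNS names of the
// API load balancers created by the Installer when the custom DNS feature is
// enabled.
type CloudLoadBalancerDNSNames struct {
	// InternalAPILoadBalancerDNSName is the DNS name of the internal (api-int) LB.
	InternalAPILoadBalancerDNSName string `json:"internalAPILoadBalancerDNSName,omitempty"`
	// ExternalAPILoadBalancerDNSName is the DNS name of the external (api) LB.
	ExternalAPILoadBalancerDNSName string `json:"externalAPILoadBalancerDNSName,omitempty"`
}

// AWSPlatformStatus (sketch only) shows where such data could surface so that
// in-cluster components, e.g. the MCO-managed CoreDNS, can serve api and
// api-int records without the cloud default DNS.
type AWSPlatformStatus struct {
	// ...existing AWS platform status fields elided...

	// +optional
	CloudLoadBalancerDNSNames *CloudLoadBalancerDNSNames `json:"cloudLoadBalancerDNSNames,omitempty"`
}
```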
3. Add a field within the `PlatformSpec` for AWS, Azure and GCP to indicate if
custom DNS is enabled. `PlatformSpec` is within the `Spec` field of the
Infrastructure CR. Here is the update for platform AWS.
Is this spec or status? Can it be changed after a cluster has been bootstrapped?
Good point. I added it to the `Spec` field because this is configuration that is provided by the user. But, as you point out, this cannot be changed after the cluster has been bootstrapped. If the `Status` field is a better place for it, please let me know.
In general, if it cannot be changed on day 2, it should be in status. If it can be changed day 2, then we tend to have a spec field that is reflected into the status once the controllers that observe the configuration have had a chance to observe and update themselves based on the new input.
I think for this case, status only is sufficient
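As a purely illustrative rendering of the "status only" suggestion, the indicator could be a discriminator recorded at install time; the names below are placeholders rather than the reviewed API.

```go
package v1 // placeholder package for this sketch

// DNSType says which solution serves the cluster's api, api-int and *.apps
// DNS records.
type DNSType string

const (
	// PlatformDefaultDNS: the cloud provider's managed DNS is used.
	PlatformDefaultDNS DNSType = "PlatformDefault"
	// ClusterHostedDNS: the customer configures DNS themselves and the cluster
	// hosts CoreDNS for in-cluster resolution of api and api-int.
	ClusterHostedDNS DNSType = "ClusterHosted"
)

// AWSDNSStatus is a placeholder for wherever the discriminator would live in
// the platform status. It is recorded at install time and, per the discussion
// above, is not expected to change after the cluster has been bootstrapped.
type AWSDNSStatus struct {
	// +optional
	DNSType DNSType `json:"dnsType,omitempty"`
}
```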
@sadasu: This pull request references CORS-1874 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.15." or "openshift-4.15.", but it targets "openshift-4.14" instead.
/jira refresh
@sadasu: This pull request references CORS-2062 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
/jira refresh
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now please do so with `/close`. /lifecycle stale

Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle rotten`. If this proposal is safe to close now please do so with `/close`. /lifecycle rotten

Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting `/reopen`. /close
@openshift-bot: Closed this PR.
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, CORS-1874, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
@sadasu I can see merged code related to this (openshift/machine-config-operator#4018 and openshift/installer#7837 at least). Is there a different EP somewhere that supersedes this one? Is this functionality going to be in OCP at some point?
/reopen
@sadasu: Reopened this PR.
@sadasu: This pull request references CORS-2062 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver for each of the affected files.
@sadasu: The following test failed. Full PR test history. Your PR dashboard.
/remove-lifecycle rotten
Enhancement proposal for CORS-1874.
Two enhancement proposals preceding this work:
#1276
#1400