CORS-2062: Customer configured DNS for cloud platforms AWS, Azure and GCP #1468
base: master
Conversation
https://github.com/openshift/enhancements/pull/1400/files#r1279672504 - Is the installer required to clean up the public DNS records, or, since we cannot clean up the private records, should it just leave both?
The work relates to the enhancement openshift/enhancements#1468. The AWS, Azure, and GCP platform status structs are updated to include custom DNS options. The internal and external load balancer IP addresses, as well as the types of DNS records, make up the base of the data.
Please link CORS-1874
even after cluster installation completes.

If the user successfully configures their external DNS service with api,
api-int and *.apps services, then they could optionally delete the in-cluster
Delete implies that this CoreDNS pod is unmanaged. How will the pod evolve over time, e.g. be updated as upgrades happen?
The CoreDNS pod would be managed by the MCO. Currently there is no way of knowing whether the LB addresses have changed since the Installer first created them, so the LB DNS addresses within the CoreDNS pod are also not expected to change.

A new field is being added to `platformSpec` for AWS, Azure and GCP which indicates whether the custom DNS solution has been enabled. The CoreDNS pod would be created by the MCO only when the feature is `Enabled` and the ConfigMap containing the LB config is present. When either of these conditions becomes `False`, the CoreDNS pod could be deleted. These are manual steps.

We are not recommending that the customer delete the CoreDNS pod, but pointing out that if the customer's DNS solution is configured correctly, then the cluster could function without the self-hosted CoreDNS.
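To make "configured correctly" concrete, here is a minimal pre-flight sketch (not part of the proposal) that checks whether the customer's external DNS answers for the cluster endpoints before anyone relies on it in place of the in-cluster CoreDNS. The DNS server address and hostnames below are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Address of the customer's external DNS server; placeholder value.
	const externalDNS = "10.0.0.53:53"

	r := &net.Resolver{
		PreferGo: true,
		// Route lookups to the external DNS server instead of the system resolver.
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, externalDNS)
		},
	}

	// Hypothetical cluster endpoints that must resolve via the external DNS.
	for _, host := range []string{
		"api.mycluster.example.com",
		"api-int.mycluster.example.com",
		"console-openshift-console.apps.mycluster.example.com",
	} {
		addrs, err := r.LookupHost(context.Background(), host)
		if err != nil {
			fmt.Printf("%s: not resolvable via external DNS: %v\n", host, err)
			continue
		}
		fmt.Printf("%s -> %v\n", host, addrs)
	}
}
```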
Why not have the MCO remove the CoreDNS pod if the conditions it needs to be configured change? E.g. if they decide to disable the feature, the MCO could recognise that desire and remove the CoreDNS pod, right?

I think what you're saying here makes sense; it wasn't clear to me what's lifecycling the pod, so we should make sure the context of the MCO lifecycling the pod is clear, and then maybe clarify the workflow for the user to disable or remove the CoreDNS as well? WDYT?
Yes, I think we should make sure the user-provisioned DNS is functioning before we remove the CoreDNS pod. So, if this capability is disabled day-2, which component would be responsible for checking that a functioning DNS alternative exists? Or do we assume that the customer knows what they are doing and not have any checks?

As you can tell, I have some unanswered questions in this area. I think if we decide to allow that in the future, we should be able to add something to the `Spec` to control that behavior.
If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?
> If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?
IMO, disabling the feature would mean that the customer wants to start using the cloud default DNS and discontinue using both their external DNS and the in-cluster DNS. When the feature is disabled, the customer could remove the entries from their external DNS and the MCO could delete the CoreDNS pod. How do we configure the cloud default DNS then? We could:
- Ask the customer to manually configure the cloud DNS using LB values gathered from the Infrastructure CR or the cloud CLI.
- Have a cluster component do it. The Installer is currently responsible for configuring the cloud DNS for API and API-Int; if the cloud DNS has to be configured with these values day-2, the MCO (or another appropriate component) has to take on this task.

My current understanding is that disabling this feature day-2 would be an exception. Providing just the manual option seems sufficient at this time. @JoelSpeed, @zaneb
4. After the Installer uses the cloud-specific terraform providers to create
the LBs for API and API-Int, it will add the LB DNS names of these LBs to a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/). This ConfigMap is
only created when the custom DNS feature is enabled. This ConfigMap gets
appended to the Ignition file created by the Installer. Let us call this
the `lbConfigforDNS` ConfigMap.
Why a configmap and not on the status of the infrastructure object? Doesn't the installer already populate the status of the infrastructure object?
We had explored that option in #1276 but ran into some implementation issues within the Installer.
The Installer generates all its manifests and adds them to the bootstrap ignition, which is written to an S3 bucket (in the case of AWS) before terraform is started to create and configure the cloud infrastructure. So, the Infrastructure manifest has already been written to the bootstrap ignition before the LBs are created via terraform. We found it easier to create a new ConfigMap, append it to the bootstrap ignition and re-write it to the S3 bucket than to update the Infrastructure CR that had already been written to the bootstrap ignition file.

Secondly, we don't expect the customer to interact with the ConfigMap at all, nor do we expect the LB information within it to change (no operator is monitoring the LB values today). It is meant to be a simple mechanism to pass the LB DNS information from the Installer to the MCO.

Fwiw, we haven't completely given up on finding a way to update a manifest already written to the bootstrap ignition. Also, another operator (say the MCO) could potentially read this ConfigMap and update the Infrastructure CR.
From an API perspective, I would much rather see the installer update the status of the infra object (I appreciate the challenges you've outlined) than use a ConfigMap. ConfigMaps have no validation and aren't real APIs. Having something in-cluster change that value based on the ConfigMap value creates a confused-deputy style problem.

It's not a blocker per se, but it would help me sleep at night if we could update the infrastructure status directly, or some other in-cluster API, rather than a ConfigMap.
The current Installer architecture makes it very hard to update the bootstrap ignition once it is generated, and the bootstrap ignition is generated before we know the LB IPs. Regeneration is also not possible (an Installer limitation, and regeneration would cause us to lose user edits to the bootstrap ignition that might have happened in the meantime). We did recognize that updating the Infrastructure CR is the best option, but we are going with our second-best option because appending to the bootstrap ignition, rather than updating a manifest that was already written to it, is currently our only viable path.

With the updates happening to the Installer code to remove the dependency on terraform, we hope to influence the design to make things like this easier to accomplish within the Installer.
@patrickdillon is opposed to updating the Infrastructure CR within the Installer due to the amount of surgery needed on the bootstrap ignition file. I am not sure if the Installer changes needed to remove terraform make things better; even if they do, that won't be available until a later release.

If the Infrastructure CR can be updated with the data in the ConfigMap by a component other than the Installer, I am open to that. I already explored the MCO as an option early on and that doesn't work either. Do any other options seem viable?
Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.
> Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.
@zaneb we wanted to treat it like an asset generated by the Installer, which it is.
Going back to @JoelSpeed's original question: I believe everyone is caught up on the reasons for moving away from the Infra CR, although we started there originally :-)

Yes, the Installer populates the Status of the Infra CR while generating the Infra manifest.
Another option would be for the MCO to read the ConfigMap and update the Infrastructure CR with these values. This was initially thought not to be possible because the MCO did not own the Infrastructure resource, but recent discussions seem promising.
> we wanted to treat it like an asset generated by the Installer
Oh, like you wanted the user to be able to edit it as a manifest? That makes sense if that is actually a requirement. Is it though? IIUC there's no additional detail that users can add beyond what they already provide in the install-config, so really they can only mess it up.
If that's not a requirement, there are heaps of Assets that get added to the bootstrap ignition without being manifests as such - most of the ones in this list.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: LBConfigforDNS
  namespace: openshift-aws-infra
data:
  internal-api-lb-dns-name: "abc-123"
  external-api-lb-dns-name: "xyz-456"
```
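Purely as an illustration, here is a sketch of how a consumer such as the MCO could read the LB DNS names out of this ConfigMap with client-go. The namespace, name, and keys mirror the example above; the actual MCO implementation may differ.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the consumer runs in-cluster with RBAC to read the ConfigMap.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Namespace and name mirror the example ConfigMap above.
	cm, err := client.CoreV1().ConfigMaps("openshift-aws-infra").
		Get(context.Background(), "LBConfigforDNS", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The two keys carry the DNS names of the internal and external API LBs.
	fmt.Printf("api-int LB: %s\n", cm.Data["internal-api-lb-dns-name"])
	fmt.Printf("api LB:     %s\n", cm.Data["external-api-lb-dns-name"])
}
```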
I'm not understanding why this middle man is needed here, can you expand?
Hope https://github.com/openshift/enhancements/pull/1468/files#r1373852053 answers your question.
successful cluster install, to configure their own DNS solution.
```go
type AWSPlatformStatus struct {
	// ... (remainder of the struct elided in this excerpt)
}
```
Is there an API PR for this? We can do the in-depth API review there; the general structure looks ok here, I think.
This one openshift/api#1606. @barbacbd is working on it.
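For readers without the API PR handy, here is a rough sketch of the kind of addition under discussion. The type and field names are placeholders; the authoritative shape is whatever lands in openshift/api#1606.

```go
package v1 // placeholder package for this sketch

// CloudLoadBalancerDNSNames is a placeholder type holding the DNS names of the
// API load balancers created by the Installer when the custom DNS feature is
// enabled.
type CloudLoadBalancerDNSNames struct {
	// InternalAPILoadBalancerDNSName is the DNS name of the internal (api-int) LB.
	InternalAPILoadBalancerDNSName string `json:"internalAPILoadBalancerDNSName,omitempty"`
	// ExternalAPILoadBalancerDNSName is the DNS name of the external (api) LB.
	ExternalAPILoadBalancerDNSName string `json:"externalAPILoadBalancerDNSName,omitempty"`
}

// AWSPlatformStatus (sketch only) shows where such data could surface so that
// in-cluster components, e.g. the MCO-managed CoreDNS, can serve api and
// api-int records without the cloud default DNS.
type AWSPlatformStatus struct {
	// ...existing AWS platform status fields elided...

	// +optional
	CloudLoadBalancerDNSNames *CloudLoadBalancerDNSNames `json:"cloudLoadBalancerDNSNames,omitempty"`
}
```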
3. Add a field within the `PlatformSpec` for AWS, Azure and GCP to indicate if
custom DNS is enabled. `PlatformSpec` is within the `Spec` field of the
Infrastructure CR. Here is the update for platform AWS.
Is this spec or status? Can it be changed after a cluster has been bootstrapped?
Good point. I added it to the `Spec` field because this is configuration that is provided by the user. But, as you point out, this cannot be changed after the cluster has been bootstrapped. If the `Status` field is a better place for it, please let me know.
In general, if it cannot be changed on day 2, it should be in status. If it can be changed day 2, then we tend to have a spec field that is reflected into the status once the controllers that observe the configuration have had a chance to observe and update themselves based on the new input.
I think for this case, status only is sufficient
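As a purely illustrative rendering of the "status only" suggestion, the indicator could be a discriminator recorded at install time; the names below are placeholders rather than the reviewed API.

```go
package v1 // placeholder package for this sketch

// DNSType says which solution serves the cluster's api, api-int and *.apps
// DNS records.
type DNSType string

const (
	// PlatformDefaultDNS: the cloud provider's managed DNS is used.
	PlatformDefaultDNS DNSType = "PlatformDefault"
	// ClusterHostedDNS: the customer configures DNS themselves and the cluster
	// hosts CoreDNS for in-cluster resolution of api and api-int.
	ClusterHostedDNS DNSType = "ClusterHosted"
)

// AWSDNSStatus is a placeholder for wherever the discriminator would live in
// the platform status. It is recorded at install time and, per the discussion
// above, is not expected to change after the cluster has been bootstrapped.
type AWSDNSStatus struct {
	// +optional
	DNSType DNSType `json:"dnsType,omitempty"`
}
```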
@sadasu: This pull request references CORS-1874 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.15." or "openshift-4.15.", but it targets "openshift-4.14" instead.
/jira refresh
@sadasu: This pull request references CORS-2062 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
/jira refresh
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now please do so with `/close`. /lifecycle stale

Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle rotten`. If this proposal is safe to close now please do so with `/close`. /lifecycle rotten

Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting `/reopen`. /close
@openshift-bot: Closed this PR.
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, CORS-1874, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
@sadasu I can see merged code related to this (openshift/machine-config-operator#4018 and openshift/installer#7837 at least). Is there a different EP somewhere that supersedes this one? Is this functionality going to be in OCP at some point?
/reopen
@sadasu: Reopened this PR.
@sadasu: This pull request references CORS-2062 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver for each of the affected files.
@sadasu: The following test failed. Full PR test history. Your PR dashboard.
/remove-lifecycle rotten
Enhancement proposal for CORS-1874.
Two enhancement proposals preceding this work:
#1276
#1400