
Conversation

Contributor

@sadasu sadasu commented Nov 2, 2022

Contributor

@patrickdillon patrickdillon left a comment

I think some of the major problems we need to work toward solving are:

  1. Figuring out the ingress load balancer story
  2. Answering the open question about configuring the installer-provisioned load balancers with dns records

I have more ideas on the latter so I will follow up with investigation on that.

Comment on lines 115 to 116
5. The Ingress Controller continues managing the Load Balancer configuration for
the Ingress service.
Contributor

We need to spell out an ingress load balancer solution. The ingress DNS solution is already specified: users will manage DNS entries externally and the ingress controller DNS will be disabled; but the ingress load balancer bits need more detail.

It says above that the users will pre-create ingress dns entries. How will those values get passed to and used by the ingress controller?

Comment on lines 127 to 123
8. The Installer adds the above API-Int Server IPs to the AWSPlatformStatus
struct within the Infrastructure Status. A new field needs to be added to
AWSPlatformStatus for this.
Contributor

We will need to carefully consider whether having IP addresses rather than the load balancer dns name will be sufficient. This comes up for both the external and internal load balancers.

Contributor Author

Generating the manifest within the MCO for starting the CoreDNS pod needs the IP address values to be present in the Infrastructure CR.
If the comment is about getting LB DNS names from the user via the install-config, then we are also choosing the path of asking the user to pre-configure their LB before calling "create cluster". So, as soon as the DNS and LB configuration workflow (as detailed in the "Design Details" section) is chosen, the install-config inputs will become clear.
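
For illustration only, a Go sketch of what such an addition to AWSPlatformStatus could look like if it recorded both the internal LB's DNS name and its IP addresses, keeping both options from the comment above open; the field and type names here are hypothetical, not the API being proposed:

```go
// Sketch only; real field and type names are subject to the API review.
package v1

// AWSPlatformStatus already exists in the openshift/api Infrastructure type;
// only the hypothetical addition is shown here.
type AWSPlatformStatus struct {
	// ...existing fields elided...

	// apiIntLoadBalancer describes the installer-provisioned internal (api-int)
	// load balancer so that in-cluster DNS (the static CoreDNS pods) can resolve it.
	// +optional
	APIIntLoadBalancer *LoadBalancerEndpoint `json:"apiIntLoadBalancer,omitempty"`
}

// LoadBalancerEndpoint records both forms of the LB address, so consumers can
// use the DNS name (stable) or the resolved IPs (what a CoreDNS Corefile needs).
type LoadBalancerEndpoint struct {
	// dnsName is the DNS name the cloud provider assigned to the load balancer.
	DNSName string `json:"dnsName"`

	// ipAddresses are the addresses the name resolves to; for an AWS NLB these
	// are stable per subnet/Availability Zone.
	// +optional
	IPAddresses []string `json:"ipAddresses,omitempty"`
}
```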

@sadasu sadasu force-pushed the custom-dns branch 3 times, most recently from f102f79 to f75f859 Compare November 9, 2022 14:44
Comment on lines 37 to 41
This external Custom DNS solution would be responsible for API and Ingress Server
resolution only. OpenShift would be responsible for creating and managing a DNS
solution for API-Int Server resolution. The cloud Load Balancer is still the
preferred solution for providing load balancing services for API, API-Int and Ingress
servers.
Contributor

I haven't seen the phrases "Ingress Server" or "API-Int Server" used anywhere else, other than an instance of "API-Int Server" that I found when I grepped a clone of the openshift/installer repository just now. Is there any reason not to write this directly in terms of DNS records?

Suggested change
This external Custom DNS solution would be responsible for API and Ingress Server
resolution only. OpenShift would be responsible for creating and managing a DNS
solution for API-Int Server resolution. The cloud Load Balancer is still the
preferred solution for providing load balancing services for API, API-Int and Ingress
servers.
This external Custom DNS solution would be responsible for DNS records for `api` and
ingress only. OpenShift would be responsible for enabling in-cluster clients to resolve
the `api-int` name. The cloud Load Balancer is still the preferred solution for providing
load balancing services for `api`, `api-int`, and ingress.

Comment on lines 279 to 282
This solution considers adding a pre-install step that configures Load Balancers
for API and Ingress services. The DNS names generated per subnet/Availability Zone
for API and Ingress services are output to the user in the form of DNS CNAME
config. Customers can use this generated output to configure their DNS servers.
Contributor

I don't understand how this approach is supposed to work for ingress. Are you assuming that it is possible to have the service for ingress use the pre-created ELB? I'm pretty sure that that is not possible today. When cluster-ingress-operator creates a service for the default ingresscontroller, kube-controller-manager will provision a new ELB. I guess you could do the following:

  1. Create an ingress ELB for bootstrap.
  2. Let cluster-ingress-operator/kube-controller-manager create another ingress ELB.
  3. Tell the cluster admin to update DNS to target the new ELB.
  4. Delete the superannuated ingress ELB.

That's rather messy though.

Contributor Author

@Miciah Does the latest update take care of your concerns?


This solution will have additional changes/implications in other components of
OpenShift that this enhancement hasn't tackled.

Member

It appears that you can put a Network Load Balancer in front of your Application Load Balancer and specify a static IPv4 address for the Network Load Balancer: https://www.infoq.com/news/2021/10/aws-alb-static-ips/
So that would be a way to do it all in one step (the user gives us the IP address their A record points to, and we use it as the static IP of the NLB). There is a cost to this, however.

Contributor

We only use NLBs. I think your suggestion is consistent with a PoC that @sadasu and I have been working through. I have tried to capture the idea here: https://issues.redhat.com/browse/CORS-2062?focusedCommentId=21236961&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21236961

@sadasu sadasu force-pushed the custom-dns branch 7 times, most recently from 6086ec3 to 568fdbb Compare November 21, 2022 17:35
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 17, 2023
@sadasu
Contributor Author

sadasu commented Jan 25, 2023

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
subnets of Private Hosted Zones for `api-int` and `*.apps` for in-cluster
access to the Ingress service.

3. The Installer also creates a ConfigMap(CloudLBConfig) to store LB API
Contributor

This (creating an in-cluster resource after terraform execution) is sort of uncharted territory for the installer, so I think it would be beneficial to think through exactly how/when this is created.

Member

Another option would be to use PlatformStatus for these addresses the same way on-prem does. I'm not sure that helps with the ordering part, but it's consistent with what the existing coredns configs expect.

Contributor Author

The CloudLBConfig ConfigMap is serving as the location for storing these IP addresses in this proposal. Yes, we could use PlatformStatus to pass this information as in the on-prem case. In the custom DNS case, we want to provide these addresses in a friendly format for the customer to configure their own DNS. If PlatformStatus can do the job, then we could just use that instead of the ConfigMap.

Comment on lines 111 to 112
the API and API-Int servers. MCO uses the information in the `CloudLBConfig`
ConfigMap to generate the CoreFile to configure the CoreDNS instance in the pod.
Contributor

What resource does the MCO typically look at to populate these files? It seems like we should be using an API object rather than a configmap.

Contributor Author

In the on-prem case, the CoreDNS pod also contains a load balancer, so there was no need to pass that information to MCO. In the case of the Custom DNS feature, load balancer configuration happens within the Installer (for API and API-Int) and Ingress (for *.apps). So, these LB IP addresses need to be provided to MCO so that the CoreDNS pods can be configured with them.
We are using the ConfigMap to serve 2 purposes:

  1. as a source of LB IP addresses to configure CoreDNS
  2. as a source of config that the customer can use to configure their own DNS when cluster creation has completed (see the sketch below).
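
As a rough sketch of what the installer side of this could look like (the ConfigMap name, namespace, and data keys here are assumptions for illustration, not settled by this proposal), using client-go types:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

// buildCloudLBConfig sketches the proposed CloudLBConfig ConfigMap: one copy of
// the LB addresses for CoreDNS to consume, and a human-readable record the
// customer can use to create their own DNS entries after install. The name,
// namespace, and key layout are illustrative only.
func buildCloudLBConfig(apiLB, apiIntLB, ingressLB string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		TypeMeta: metav1.TypeMeta{APIVersion: "v1", Kind: "ConfigMap"},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "cloud-lb-config",
			Namespace: "openshift-config",
		},
		Data: map[string]string{
			"api":     apiLB,     // external API load balancer address
			"api-int": apiIntLB,  // internal API load balancer address
			"ingress": ingressLB, // *.apps load balancer address, filled in later by Ingress
		},
	}
}

func main() {
	cm := buildCloudLBConfig("198.51.100.10", "10.0.0.10", "")
	out, _ := yaml.Marshal(cm)
	fmt.Println(string(out))
}
```

With something like this in place, the customer-facing half of purpose 2 could be as simple as reading the ConfigMap back once the cluster is up (for example `oc get configmap cloud-lb-config -n openshift-config -o yaml`, using the hypothetical name above).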

Comment on lines 131 to 133
8. MCO will also be required to update the CloudLBConfig ConfigMap with the DNS
configuration for external access to the cloud services. This takes care of all
the DNS configuration needed to bring up the cluster successfully.
Contributor

For ingress? I think API would already be handled by the Installer in step 3; no MCO intervention needed for that.

Contributor Author

We need to provide DNS resolution for *.apps and that is why we need this change.

Major milestones in the life cycle of a proposal should be tracked in `Implementation
History`.

## Alternatives
Contributor

We discussed the idea of using /etc/hosts, but IIUC we decided coreDNS was sufficient for all use cases. If my understanding is correct, I think it would be good to explain how/why we are choosing coreDNS over modifying /etc/hosts

Contributor Author

The "Alternatives" section has some information about this.

Member

@cybertron cybertron left a comment

Where did we come down on unrolling the internal DNS changes after deployment? It's not strictly necessary, but I could see some customers wanting to do that. We probably want to talk about that if we're going to support it.

ConfigResource(CR) with the configuration required to configure DNS for these
services.

7. The MCO will be updated to watch the DNSRecord CR for the DNS configuration
Member

I don't think you want MCO watching this. If it rolls out changes it will reboot the entire cluster. I think you want to add the DNSRecord to the runtimecfg monitor for coredns and let it populate the record.

MCO would just put the record template in place for coredns; runtimecfg would be responsible for actually populating it.

Contributor Author

MCO will watch DNSRecord only to update the CloudLBConfig ConfigMap. Runtimecfg would monitor the CloudLBConfig ConfigMap and update the CoreFile accordingly. These two actions are split between Steps 7 and 8. Reversing the order might help.
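
To make that split concrete, here is a minimal, hypothetical sketch of the rendering step runtimecfg would perform once MCO has laid the template down (the Corefile fragment and field names are illustrative, not the actual on-prem template):

```go
package main

import (
	"os"
	"text/template"
)

// lbAddrs holds the values runtimecfg would read from the CloudLBConfig
// ConfigMap (or from PlatformStatus); field names are illustrative.
type lbAddrs struct {
	ClusterDomain string
	APIIntIPs     []string
}

// A deliberately simplified Corefile fragment: answer api-int queries from a
// static hosts block and forward everything else upstream. The real on-prem
// template is more involved.
const corefileTmpl = `{{ .ClusterDomain }}:53 {
    hosts {
{{- range .APIIntIPs }}
        {{ . }} api-int.{{ $.ClusterDomain }}
{{- end }}
        fallthrough
    }
    forward . /etc/resolv.conf
}
`

func main() {
	t := template.Must(template.New("corefile").Parse(corefileTmpl))
	// Example values only.
	_ = t.Execute(os.Stdout, lbAddrs{
		ClusterDomain: "mycluster.example.com",
		APIIntIPs:     []string{"10.0.1.10", "10.0.2.10"},
	})
}
```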

@sadasu
Contributor Author

sadasu commented Feb 8, 2023

@JoelSpeed could you please take a look at the additions proposed to the Infrastructure CR?

type AWSPlatformStatus struct {
<snip>
// APIServerDNSConfig contains information to configure DNS for API Server.
// This field will be set only when the userConfiguredDNS feature is enabled.
Contributor

Is there an API field somewhere that tells the user that userConfiguredDNS is enabled?

Contributor Author

No. Current understanding is that having non-empty DNSConfig entries within AWSPlatformStatus will be sufficient. The user sets the userConfiguredDNS field in the install-config to Enabled to turn on this feature.

Contributor

That's implicit behaviour; I wonder if we want something more explicit/declarative?

If not, this comment should be updated, because userConfiguredDNS isn't a thing in the cluster. I'm thinking it would be more like "The presence of this field allows configuration of external DNS" or something like that: more about the long-term feature from a user perspective than a specific installer feature. WDYT?

Contributor Author

Regarding the comment, I was trying to clarify that ClusterDNSConfig will contain non-empty values only when a specific action is triggered by the user (setting the userConfiguredDNS config to Enabled) and not just as a result of the general running of the cluster. Is the API review openshift/api#1397 a better place to discuss this?

Regarding adding an explicit config item to indicate custom DNS is in use, I did consider that while designing this. I was under the impression that we would want to keep the number of net-new config items to a minimum, so if there were other ways to get this information, we should utilize those. No strong preference. Is this a good item to hash out in the implementation review?

Contributor

Let's continue the discussion on the API PR then.

Comment on lines +181 to +215
functionality. This config is not added to any platform specific section of
the config because there are plans to allow this functionality in Azure and GCP
too. The validation for this config will disallow this value being `Enabled` in
Contributor

So this option will be universal but doesn't apply to platforms other than AWS, Azure, and GCP? Better IMO to add the option to each platform's configuration; otherwise you end up with configuration that a user could set that then doesn't apply. It may not be obvious that this option doesn't apply to, say, BareMetal platforms if it's at a parent scope. Within the specific platform configuration it's much more obvious.

Contributor Author

There is precedent for both options. Here is an example of config added outside platform-specific sections: https://github.com/openshift/installer/blob/dbcb9b2930b985e0c35d8e55c8bf96be86629428/pkg/types/installconfig.go#L161. The validation code makes sure to inform the user when it is configured with an incorrect value on the wrong platform. CredentialsMode is another example.
I see this config as an attribute of the installation and not of the underlying cloud, so I see this being in the non-platform-specific section.
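
For comparison with those precedents, a hypothetical Go sketch of what a top-level install-config field could look like (names are illustrative only; as the rest of this thread notes, the field may end up platform-specific instead):

```go
// Sketch only; field and value names are illustrative, not the proposed API.
package types

// UserProvidedDNS would follow the pattern of other top-level install-config
// fields, such as credentialsMode and publish, that apply only to a subset of
// platforms and are validated per platform.
type UserProvidedDNS string

const (
	UserProvidedDNSEnabled  UserProvidedDNS = "Enabled"
	UserProvidedDNSDisabled UserProvidedDNS = "Disabled"
)

// InstallConfig (excerpt) shows where such a field could sit at the top level;
// validation would reject Enabled on platforms that do not support it.
type InstallConfig struct {
	// ...existing fields elided...

	// UserConfiguredDNS indicates that the user, not the installer, manages the
	// api and *.apps DNS records.
	// +optional
	UserConfiguredDNS UserProvidedDNS `json:"userConfiguredDNS,omitempty"`
}
```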

Contributor

I would lean towards making this platform-specific, as it's more obvious from the schema which platforms it's supported on, rather than being validated later; but if you feel strongly, since this is in the install config, not much I can do about it ;)

Contributor

I don't have a strong opinion on whether the field should be top-level or a child of platform. I think Joel's suggestion of having it as a child of platform makes sense.

Contributor

I think this paragraph needs updating since it still talks as if we are going down the route of something top-level, but the API looks like we are going for platform-specific.

annotations:
controller-gen.kubebuilder.io/version: v0.9.2
creationTimestamp: null
name: installconfigs.install.openshift.io
Contributor

Is this a new CRD to track how the cluster was installed? Question: why would the DNS configuration not live on the Infrastructure resource?

What do you see as the future of this resource?

Contributor Author

Yes, the DNS config is added to the AWSPlatformStatus struct in the PlatformStatus struct of the Infrastructure resource, as detailed first in the API Extensions section.

No new CRD is being added.

This section is pointing out the addition to the install-config provided to OpenShift during cluster creation.

Contributor

Install config is an installer thing then? The installer consumes it to configure the cluster resources?

Contributor Author

@JoelSpeed yes, that is correct.

@openshift-ci
Contributor

openshift-ci bot commented Feb 14, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sadasu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

CoreFile to now also include DNS configuration for both internal and
external access to the cluster services (the `*.apps` records).

9. Once the cluster is functional, the user can look at the contents of the new
Contributor

I think prior to this step, we should add that the installer would need to be connecting to the API using the LB DNS name, since the DNS records for the api have not been established yet.

I think we have an open question about whether this is technically feasible, specifically with the generated certs.

Contributor Author

There is no need to do that. The CoreDNS pod would be providing API server resolution during cluster installation.

Contributor Author

There is no issue with the generated certs. Inter-pod communication happens with certs generated for API-Int, which will always be resolvable by the in-cluster DNS.
The certs generated for api and *.apps use `api.<clustername>.<basedomain>` and `*.apps.<clustername>.<basedomain>`. That remains the same no matter which DNS ends up resolving the request, the in-cluster one or the external one.

Contributor

Ok, I think this is still a little hazy to me, so let me try to spell it out a bit. During installation, the installer will connect to the API using the kubeconfig. That kubeconfig will have the server set as api.<clustername>.<basedomain>. But at the time that the installer is querying the API, there will not be a DNS record for api.<clustername>.<basedomain>, right? So I presume we would need to change the installer code so that the installer queries the LB DNS name directly. That is also why I was wondering about the certificates. I hope that makes sense!

Contributor Author

Ack. This might become an exercise in figuring out what the current installer code will allow and less about what is the best solution for the customer. Will look through different options for how this issue can be averted and report back.


So I presume we would need to change the installer code so that the installer queries the LB DNS name directly. That is also why I was wondering about the certificates. I hope that makes sense!

You can pass in a dial configuration when you create a kubernetes client to dial to a different IP address than the one listed in the server field. An example of what was implemented in hive is here: https://github.com/openshift/hive/pull/1817/files

Should be possible to extend this to use the DNS label on the load balancer or an IP address.
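
A minimal sketch of that approach with client-go, assuming the installer already knows the load balancer's address; the function and parameter names are made up, and the hive PR above does something along these lines:

```go
package main

import (
	"context"
	"net"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newClientViaLB builds a client from the installer's kubeconfig but dials the
// load balancer address directly, so it works before the api.<cluster>.<domain>
// record exists. TLS verification still uses the API hostname from the
// kubeconfig URL, so the generated certs are unaffected.
func newClientViaLB(kubeconfigPath, lbAddr string) (*kubernetes.Clientset, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return nil, err
	}
	dialer := &net.Dialer{Timeout: 30 * time.Second}
	cfg.Dial = func(ctx context.Context, network, address string) (net.Conn, error) {
		// address is "api.<cluster>.<domain>:6443"; keep the port but swap the
		// host for the load balancer's DNS name or IP.
		_, port, err := net.SplitHostPort(address)
		if err != nil {
			return nil, err
		}
		return dialer.DialContext(ctx, network, net.JoinHostPort(lbAddr, port))
	}
	return kubernetes.NewForConfig(cfg)
}

func main() {
	// Hypothetical usage with a made-up kubeconfig path and NLB address.
	if _, err := newClientViaLB("auth/kubeconfig", "10.0.0.10"); err != nil {
		panic(err)
	}
}
```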

<snip>
// AWSClusterDNSConfig contains all the DNS config required to configure a custom DNS solution.
// +optional
AWSClusterDNSConfig *ClusterDNSConfig `json:"awsClusterDNSConfig,omitempty"`
Contributor

Why would this need AWS in the name, given it's already within the AWS status?


@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2023
constraints. It is important for OpenShift to provide a way for these customers
to use their own preferred DNS solution while supporting their IPI deployments.

This external Custom DNS solution would be responsible for DNS records for

It might be good to call out that the external DNS solution would also be responsible for DNS records for required OpenShift resources, including quay.io, registry.redhat.io, api.openshift.com, cloud management endpoints, etc.

Without these, the cluster would cease to work.

We've seen this issue in a managed offering where the customer brings their own DNS server which doesn't allow resolution of some necessary services.

Contributor Author

@bennerv is it OK if the entries for quay.io, registry.redhat.io, api.openshift.com, and cloud management endpoints are added to the external DNS after the initial cluster install has completed with just api, api-int and ingress?


If the user successfully configures their external DNS service with api,
api-int and *.apps services, then they could optionally delete the in-cluster
CoreDNS pod and the cluster is expected to function fully as expected. This is

How would this work with new worker nodes?

Even if you delete the coredns pods, I'm assuming they will be spun up on creation of new workers as it's vital that these resolve api-int in order to register themselves with the kube-apiserver.

Contributor Author

The CoreDNS pods would be running only on the control plane nodes. The external DNS solution would have to be configured to resolve API-Int too. Am I missing something here? Do we not want the external DNS to have entries for api-int?

@bennerv bennerv May 1, 2023

I think this comment gives some reasoning behind this.

a completely optional step with this design. If the customer does configure
their custom DNS solution and leave the in-cluster CoreDNS pod running, all
in-cluster entities will end up using the CoreDNS pod's DNS services and all
out of cluster requests will be handled by the external DNS.

What happens if the ingress service IP address changes? i.e. the customer removes the service of type load balancer, and a new one is created by the ingress operator?


3. The Installer will also update the new fields `APIServerDNSConfig` and
`InternalAPIServerDNSConfig` within the `AWSPlatformStatus`, with the LB
IP addresses and the DNS Record types for API, API-Int servers respectively.

More of an ask, not relevant to everyone.

It would be nice to make this CRD pluggable, so managed services or someone else could add additional DNS resolution to the static coredns pods. Would enable an egress lockdown or zero egress solution within managed services.

the API and API-Int servers. MCO uses the DNS config in the `AWSPlatformStatus`
to generate the CoreFile to configure the CoreDNS instance in the pod.
Note: This implementation differs from the on-prem case by not starting the
keepalived or haproxy containers for load balancing.

can we get a link to the on-prem implementation?


currently no way of knowing these LB IP addresses ahead of time, and so the
customer's DNS solution cannot be configured before cluster installation.

OpenShift will start static CoreDNS pods to provide DNS resolution for API,

In order for a node to come online, it makes a call to the machine config server via the api-int domain. This happens for workers and masters, but not for bootstrap.

How will a node go through Ignition if there's no resolution of the api-int domain it tries to reach out to? The machine config server will have to be overridden with the DNS label set on the load balancer in AWS.

In Azure, you can only set DNS labels on public IP addresses for this. Since api-int sits behind a load balancer with a dynamic frontend private IP address, that address would have to be passed to the machine config server once the bootstrap node came online.

Contributor Author

@bennerv Thanks for your comment. If the machine config operator were to start a CoreDNS pod to provide api and api-int resolution (to the LB DNS name), starting from the bootstrap node and then later on the control plane nodes, would that take care of the api-int resolution for the nodes?

I am working on another proposal where the customer would have to pre-create the api and api-int LBs and configure the custom DNS to resolve these URLs before starting cluster installation.

@bennerv bennerv May 1, 2023

@sadasu Here's the issue.

The ignitionHost parameter, as defined, always points to api-int. Since api-int must be resolvable in order for any master or worker to come online, it needs to be resolvable during cluster creation and cannot depend on customers configuring their external DNS solution. Other platforms get around this by allowing a VIP to be created in advance, similar to your suggestion where things are pre-created.

If resources are not pre-created, then this method will not work on Azure without feeding the IP address of the internal load balancer back to the ignitionHost variable.
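
To make that dependency concrete, a small illustrative sketch (not the installer's actual code) of the machine config server URL that a node's pointer Ignition config fetches from; the host is the api-int name, which is why it must resolve before any master or worker can ignite:

```go
package main

import (
	"fmt"
	"net/url"
)

// mcsURL sketches the machine config server endpoint that a node's pointer
// Ignition config fetches its real config from. The host is the api-int name,
// so that lookup must succeed before the node can even begin to ignite.
// Illustrative only; the installer assembles this inside its Ignition assets.
func mcsURL(clusterName, baseDomain, role string) string {
	u := url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("api-int.%s.%s:22623", clusterName, baseDomain),
		Path:   "/config/" + role,
	}
	return u.String()
}

func main() {
	// e.g. https://api-int.mycluster.example.com:22623/config/worker
	fmt.Println(mcsURL("mycluster", "example.com", "worker"))
}
```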

would mean that the Ingress Controller would have to append to the /etc/hosts file
after the Ingress Controllers had configured their LBs. We abandoned this
approach in favor of leveraging the prior work done for on-prem platforms and
decided to go with the static CoreDNS pods.

/etc/hosts files also do not support wildcard DNS entries.

@openshift-ci
Contributor

openshift-ci bot commented May 1, 2023

@sadasu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@patrickdillon
Contributor

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 1, 2023
@sadasu
Contributor Author

sadasu commented May 16, 2023

Closing this enhancement in favor of #1400


@sadasu
Contributor Author

sadasu commented Aug 31, 2023

/reopen

@openshift-ci
Contributor

openshift-ci bot commented Aug 31, 2023

@sadasu: Failed to re-open PR: state cannot be changed. The custom-dns branch was force-pushed or recreated.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@barbacbd barbacbd left a comment

https://github.com/openshift/enhancements/pull/1400/files#r1279672504 - Is the installer required to clean up the public DNS records, or, since we cannot clean up private records, should it just leave both?
