CORS-2818: Adding ability to render Cloud LB IPs #286

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

sadasu
Contributor

@sadasu sadasu commented Nov 5, 2023

Add the ability to render and monitor Cloud LB IPs. This capability is used only when userProvisionedDNS is enabled on some (AWS, GCP and Azure) cloud platforms via install-config.

On some cloud platforms (AWS, Azure, GCP), the user cannot use the cloud's default DNS solution. They are expected to use their own custom DNS solution that is external to the cluster. OpenShift is not allowed to configure this DNS solution. In this scenario, these same customers want to continue using the cloud-provided Load Balancers (LBs).

OpenShift is expected to continue configuring the cloud LBs for API, API-Int and Ingress access. Since the LB information is not available before cluster installation, the user cannot configure their custom DNS solution ahead of time.

So, to support this mode, OpenShift needs to start its in-cluster CoreDNS-based DNS solution for API, API-Int and Ingress resolution so that cluster installation is successful. The customer is expected to configure their DNS solution post-install.

At this time, the cloud's API and API-Int LBs are configured by the Installer and their values are not expected to change during the life of the cluster. The Ingress operator continues to handle Ingress LB configuration.

Based on enhancement: openshift/enhancements#1468

@openshift-ci openshift-ci bot requested review from cybertron and mkowalski November 5, 2023 10:06
@sadasu sadasu changed the title Adding ability to generate Corefile using LB IP addresses CORS-2818: Adding ability to generate Corefile using LB IP addresses Nov 5, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 5, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 5, 2023

@sadasu: This pull request references CORS-2818 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

A new capability is being added to cloud platforms (starting with AWS, Azure and GCP) where the cloud LBs can be used but not the cloud DNS. So, in-cluster DNS is provided by a CoreDNS pod during install. The customer can optionally bring their own DNS after install completes.

This commit adds the ability to generate a CoreDNS Corefile with entries for API and API-Int URLs when their corresponding LB IPs are provided. These LB IPs are not expected to change during the life of the cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu
Contributor Author

sadasu commented Nov 5, 2023

/jira refresh


@@ -437,7 +443,7 @@ func GetConfig(kubeconfigPath, clusterConfigPath, resolvConfPath string, apiVips
 } else {
 	ingressVip = nil
 }
-newNode, err := getNodeConfig(kubeconfigPath, clusterConfigPath, resolvConfPath, apiVip, ingressVip, apiPort, lbPort, statPort)
+newNode, err := getNodeConfig(kubeconfigPath, clusterConfigPath, resolvConfPath, apiVip, ingressVip, apiPort, lbPort, statPort, apiLBIPs[0], apiIntLBIPs[0])
Contributor

Does it always work? What if [0] does not exist? Can we get some basic unit tests please?

Contributor Author

Adding unit tests. To answer the other question, apiLBIPs[0] and apiIntLBIPs[0] would be "" by default. They will have actual values only when the userProvisionedDNS feature is enabled.

Contributor

@mkowalski mkowalski Nov 7, 2023

Sorry if I'm missing something obvious, but I don't see that... You call GetConfig passing []net.IP{} as apiLBIPs and then reference apiLBIPs[0]. This does not work; try running this code:

a := []net.IP{}
fmt.Println(a[0])

and the result is not "" but a panic: runtime error: index out of range [0] with length 0.

The initialization code is

cloudIntLBIPs, err := cmd.Flags().GetIPSlice("cloud-int-lb-ips")
if err != nil {
	cloudIntLBIPs = []net.IP{}
}

meaning that if you don't have the feature enabled, you pass an empty list, not "".

Contributor Author

Thanks @mkowalski ! I needed to update the logic in that part of the code too.

Contributor

I would really recommend starting from unit tests and then proceeding with the implementation... What I can see in the current code is that it still panics, because []net.IP{} and nil are different entities (i.e. an empty slice and nil are not the same in Go).

Look at the following code

a := []net.IP{}
if a == nil {
	fmt.Println("nil")
} else {
	fmt.Println(a[0])
}

or run it via playground, e.g. https://go.dev/play/p/UbrxBrUstTw

@sadasu
Contributor Author

sadasu commented Nov 7, 2023

/retest-required

Retesting after openshift/installer#7671 merged.

@davegord

davegord commented Nov 7, 2023

/retest

@mkowalski
Contributor

/hold

Some fundamental flaws in error handling; let's work on them first.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2023
@sadasu
Contributor Author

sadasu commented Nov 10, 2023

/retest

@sadasu
Contributor Author

sadasu commented Nov 13, 2023

/retest

@sadasu sadasu changed the title CORS-2818: Adding ability to generate Corefile using LB IP addresses WIP: CORS-2818: Adding ability to generate Corefile using LB IP addresses Nov 16, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 16, 2023
@sadasu sadasu force-pushed the custom-dns branch 2 times, most recently from 334f418 to 5c404e4 Compare November 16, 2023 21:22
@sadasu
Contributor Author

sadasu commented Nov 21, 2023

/test e2e-metal-ipi-ovn-ipv6

@sadasu
Contributor Author

sadasu commented Nov 27, 2023

/test e2e-metal-ipi-ovn-ipv6


@sadasu sadasu changed the title WIP: CORS-2818: Adding ability to generate Corefile using LB IP addresses CORS-2818: Adding ability to generate Corefile using LB IP addresses Nov 27, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 27, 2023
@sadasu sadasu changed the title CORS-2818: Adding ability to generate Corefile using LB IP addresses CORS-2818: Adding ability to render Cloud LB IPs Nov 28, 2023

@sadasu sadasu force-pushed the custom-dns branch 2 times, most recently from a671579 to d431f2d Compare December 1, 2023 21:41
@sadasu
Contributor Author

sadasu commented Dec 2, 2023

/retest-required

})
})
})
})
Contributor

This is good. I would only like to see some additional test cases, mainly for your function being called with the following:

  • empty API LB
  • empty API INT
  • empty Ingress
  • empty Node

This is to make sure it handles missing and/or incorrect input correctly and does not crash the whole application when someone calls it with insane input

var apiLBIP, apiIntLBIP, ingressIP net.IP
nodes := []Node{}
ipCount := 0
if len(clusterLBConfig.ApiIntLBIPs) > len(clusterLBConfig.IngressLBIPs) {
Contributor

When is it possible to have a different number of IPs? This is an honest question; in the IPI load balancer we do, there is always a requirement that you have the same number of IPs for API and for Ingress.

Contributor Author

In the cloud IPI case with BYO DNS, there are internal and external Ingress LBs. That is one way to have multiple entries for Ingress. For private clusters, we don't expect to have any IPs for API LB IPs, only API-Int and Ingress LBs are expected to be running. This is a way to future-proof our implementation for all combinations of LB IPs.

if i < len(clusterLBConfig.ApiIntLBIPs) {
	apiIntLBIP = clusterLBConfig.ApiIntLBIPs[i]
} else {
	apiIntLBIP = nil
Contributor

Because in L718 you wrote var apiIntLBIP net.IP, you do not need to assign nil to it explicitly, as that is the value it has from the start.

Contributor Author

apiIntLBIP and the other LB IP values are set within a loop starting at L726, so a value could have been non-nil in the previous iteration.

Contributor

Yeah, I see it now... In that case this deserves at least a comment in Simple English directly in the code or some kind of a simplification...

As I read this whole loop now (L726-L730), when you finish the loop you want apiIntLBIP to hold the last element of the clusterLBConfig.ApiIntLBIPs slice. Is that correct? If yes, in pseudocode that would be

apiIntLBIP = get_last_element(clusterLBConfig.ApiIntLBIPs)

Do I get the desired behaviour right? If yes, I'd rather do something like

apiIntLBIP = clusterLBConfig.ApiIntLBIPs[len(clusterLBConfig.ApiIntLBIPs)-1]

instead of iteration in which every time we assign a value only to ignore it afterwards and use the last assignment.

If my reasoning is wrong and your desired behaviour is different then ignore what I wrote, but that means a comment is more than desired

Contributor Author

I was trying to follow the precedent set by https://github.com/openshift/baremetal-runtimecfg/blob/master/pkg/config/node.go#L421-L447. There are definitely other ways to implement this too.
Adding a comment for clarity.

if len(clusterLBConfig.ApiLBIPs) != 0 && i < len(clusterLBConfig.ApiLBIPs) {
	apiLBIP = clusterLBConfig.ApiLBIPs[i]
} else {
	apiLBIP = nil
Contributor

As above, no need for an explicit nil-assignment

if len(clusterLBConfig.IngressLBIPs) != 0 && i < len(clusterLBConfig.IngressLBIPs) {
	ingressIP = clusterLBConfig.IngressLBIPs[i]
} else {
	ingressIP = nil
Contributor

As above, no need for an explicit nil-assignment

if err != nil {
	return err
}
// Populate cloud LB IP addresses for platforms where the cloud LBs
// have already been configured
newConfig, _ = config.PopulateCloudLBIPAddresses(clusterLBConfig, newConfig)
Contributor

Let's not ignore errors here (anywhere, in general)

Contributor Author

Fair point. Updated.
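
For readers following along, a minimal sketch of what "not ignoring the error" can look like in Go. This is not the updated code from the PR: populate is a stand-in for a call such as config.PopulateCloudLBIPAddresses, and its signature and failure mode are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// populate stands in for a call such as config.PopulateCloudLBIPAddresses;
// its signature and failure mode are assumptions for illustration.
func populate(ok bool) (string, error) {
	if !ok {
		return "", errors.New("no LB IPs available")
	}
	return "rendered", nil
}

// render propagates the error with added context instead of discarding it
// with the blank identifier.
func render(ok bool) (string, error) {
	cfg, err := populate(ok)
	if err != nil {
		return "", fmt.Errorf("populating cloud LB IPs: %w", err)
	}
	return cfg, nil
}

func main() {
	if _, err := render(false); err != nil {
		fmt.Println("error:", err) // error: populating cloud LB IPs: no LB IPs available
	}
}
```

Wrapping with %w keeps the original error available to errors.Is and errors.As further up the call chain.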

@@ -694,3 +713,80 @@ func PopulateNodeAddresses(kubeconfigPath string, node *Node) {
}
}
}

func getConfigWithCloudLBIPs(kubeconfigPath, clusterConfigPath, resolvConfPath string, clusterLBConfig ClusterLBConfig) (node Node, err error) {
Contributor

The logic of iterating over the length of structs and then accessing them using [i] deserves some test as it has a potential to explode if by any chance we access non-existing index. I see here some kind of min() inside of max() so it does feel that we access the safer way now, but a test would be a guarantee for that

@mkowalski
Contributor

Hey, reviewed the last revision. The overall design seems okay; there are a couple of language-specific nits, and more test cases are needed (mainly for error handling and edge cases of passing empty values).

@sadasu sadasu force-pushed the custom-dns branch 2 times, most recently from eb8eefe to ff22640 Compare December 5, 2023 12:36
@sadasu
Contributor Author

sadasu commented Dec 5, 2023

/retest

On some cloud platforms, the user cannot use the cloud's
default DNS solution. They are expected to use their own
custom DNS solution that is external to the cluster. OpenShift
is not allowed to configure this DNS solution. In this scenario,
these same customers want to continue using the cloud provided
Load Balancers (LBs).

OpenShift is expected to continue configuring the cloud LBs for
API, API-Int and Ingress access. Since the LB information is not
available before cluster installation, the user cannot configure
their custom DNS solution ahead of time.

So, to support this mode, OpenShift needs to start its in-cluster
CoreDNS based DNS solution for API, API-Int and Ingress resolution
so that cluster installation is successful. The customer is
expected to configure their DNS solution post-install.

At this time, the cloud's API and API-Int LBs are configured
by the Installer and their values are not expected to change during
the life of the cluster. The Ingress operator continues to handle
Ingress LB configuration.

Based on enhancement: openshift/enhancements#1468
Contributor

openshift-ci bot commented Dec 5, 2023

@sadasu: all tests passed!


@mkowalski
Contributor

/hold cancel
/lgtm

/cc @cybertron
The code looks sane, please look if the overall design is okay

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 6, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2023
Contributor

openshift-ci bot commented Dec 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mkowalski, sadasu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit 0ff1f6f into openshift:master Dec 6, 2023
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-baremetal-runtimecfg-container-v4.16.0-202312061430.p0.g0ff1f6f.assembly.stream for distgit baremetal-runtimecfg.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.