IPA on baremetal server is unable to reach ironic's endpoint - ENETUNREACH #1664

Open
agewagra opened this issue Apr 11, 2024 · 14 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/accepted: Indicates an issue is ready to be actively worked on.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@agewagra

Steps to reproduce the bug:
Follow the deployment steps for Ironic and the Bare Metal Operator. Use the Metal3 ironic-image for deploying Ironic (https://book.metal3.io/quick-start).
After deployment, create a BareMetalHost for a bare-metal server using a YAML file.

Environment:
Deployed on an OpenShift v4.13 cluster in a dual-stack environment.

What did you expect to happen:
The IPA running in the ramdisk on the bare-metal server should reach Ironic's endpoint during the pre-boot stage of the inspection phase, and the BMH status should become available.

Anything else you would like to add:
We are using a dual-stack environment on OpenShift for our deployments. The bare-metal server that we want to provision needs the virtual boot image from Ironic to be hosted on an IPv6 address, so we set an IPv6 address for ironic_external_ip. All other addresses are IPv4.

  • Baremetal Operator version: latest
  • Environment (metal3-dev-env or other): OpenShift Kubernetes cluster, 4.13

Bug:
IPA tries to look up the node in Ironic during inspection, but it is unable to reach the endpoint.
Screenshot 2024-04-11 at 2 51 44 PM

@metal3-io-bot metal3-io-bot added the needs-triage Indicates an issue lacks a `triage/foo` label and requires one. label Apr 11, 2024
@matthewei

matthewei commented Apr 13, 2024

Maybe you can try setting ironic-endpoint to "https://ip:port".

@agewagra
Author

Hi,
this is the Ironic endpoint that we are using: http://10.59.52.241:6385, which I think is the same as what you are suggesting.
Correct me if I am wrong.

@dtantsur
Member

Hi! Just to clarify: you're using OpenShift as Kubernetes, but you're not using a built-in Metal3 (that is deployed automatically if you have platform=baremetal)? You're using upstream BMO, Ironic and IPA of the latest versions? (I wonder why, but okay). Generally, we don't debug OpenShift-specific issues in the context of the upstream Metal3 community, but if so:

This is a networking problem, and we cannot debug it from the description alone. You may need to log into the ramdisk (after figuring out what its actual IP is) and check the networking. For example, if you're using a provisioning network and it's IPv6-only, that's the expected outcome.
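
A minimal sketch of the kind of check that could be run once logged into the ramdisk, assuming a Python 3 interpreter is available there (IPA itself is a Python program). It only opens a TCP connection to the endpoint and translates the most common errors into a hint. The function names `split_endpoint`, `explain` and `probe` are made up for this illustration and are not part of IPA or Ironic.

```python
import errno
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_endpoint(url: str) -> Tuple[Optional[str], int]:
    """Extract host and port from an endpoint URL (IPv6 brackets are stripped)."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def explain(exc: OSError) -> str:
    """Translate a connection error into a short, human-readable hint."""
    if exc.errno == errno.ENETUNREACH:
        return "ENETUNREACH: no usable route (missing address or route for that IP family?)"
    if exc.errno == errno.EHOSTUNREACH:
        return "EHOSTUNREACH: the host itself could not be reached"
    if exc.errno == errno.ECONNREFUSED:
        return "ECONNREFUSED: host reachable, but nothing is listening on that port"
    return f"other failure: {exc}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Try a plain TCP connection to the endpoint and report the result."""
    host, port = split_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except OSError as exc:
        return explain(exc)
```

For example, `print(probe("http://10.59.52.241:6385"))` run on the ramdisk would show whether the failure is a missing route (as in the issue title) or something further along the path. Comparing the result with the output of `ip addr` and `ip route` should indicate whether the node has any address in the same IP family as the endpoint.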

@agewagra
Author

agewagra commented Apr 26, 2024

Hi, thanks for the response.
Yeah, we are using OpenShift as Kubernetes, and yes, we are using the latest versions of BMO, Ironic and IPA.

In the ISO image used for the ramdisk, we confirmed that the network data is present in the openstack/latest/network_data.json file. Our environment does not support DHCP, so we used preprovisioningNetworkData in the BMH to add the network data. This is supposed to work in a DHCP-less environment (see https://github.com/metal3-io/metal3-docs/blob/main/design/baremetal-operator/image-builder-integration.md). But IPA still waits for DHCP to assign an IP, despite the static network data. Since no IP is assigned to the bare-metal server, it cannot reach Ironic's endpoint.

Is there any way to stop IPA from waiting for DHCP, or to build a custom IPA that does not wait for DHCP?
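
For reference, a sketch of what static network data in the OpenStack network_data.json layout can look like, and how it could be wrapped in a Secret for the BMH to reference. The MAC address, IP addresses, link ID, Secret name and namespace are all placeholders invented for this example. As far as I know, the BMH field is `spec.preprovisioningNetworkDataName` and the Secret key is `networkData`, but check this against the BMO API documentation.

```python
import json


def static_network_data(mac, ip, netmask, gateway, dns):
    """Build a single-NIC, static IPv4 config in the network_data.json layout."""
    link_id = "nic0"  # placeholder link ID; it only has to match between sections
    return {
        "links": [
            {"id": link_id, "type": "phy", "ethernet_mac_address": mac},
        ],
        "networks": [
            {
                "id": "network0",
                "type": "ipv4",  # static IPv4, as opposed to "ipv4_dhcp"
                "link": link_id,
                "ip_address": ip,
                "netmask": netmask,
                "network_id": "network0",
                "routes": [
                    # default route
                    {"network": "0.0.0.0", "netmask": "0.0.0.0", "gateway": gateway},
                ],
            },
        ],
        "services": [{"type": "dns", "address": dns}],
    }


def network_data_secret(name, namespace, network_data):
    """Wrap the network data in a Secret manifest (assumed key: networkData)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": {"networkData": json.dumps(network_data, indent=2)},
    }


# All values below are documentation placeholders, not taken from the issue.
data = static_network_data(
    "52:54:00:00:00:01", "192.0.2.10", "255.255.255.0", "192.0.2.1", "192.0.2.53"
)
secret = network_data_secret("bmh-preprov-networkdata", "metal3", data)
print(json.dumps(secret, indent=2))
```

The printed manifest is JSON, which `kubectl apply -f` accepts as well as YAML. Note that the presence of this file on the ISO is not sufficient on its own: something inside the ramdisk has to read it and configure the interfaces.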

@dtantsur
Member

Is it a duplicate of metal3-io/metal3-docs#416 then? Could you close one of them?

@Rozzii
Member

Rozzii commented May 15, 2024

@agewagra Have you built your own custom IPA or are you using the default from upstream?
The default upstream IPA doesn't contain the tooling required for network configuration, so you would most likely need "Glean" or cloud-init.
/triage needs-information

@agewagra
Author

Yeah, we were using the default, but realised we should build a custom IPA.
Can we build a custom IPA with RHCOS as the base image?

@Rozzii
Member

Rozzii commented May 22, 2024

> Yeah, we were using the default, but realised we should build a custom IPA. Can we build a custom IPA with RHCOS as the base image?

Technically, by creating custom "elements" for the IPA builder or for OpenStack diskimage-builder (DIB), you can build IPA from any base image. The IPA builder is based on DIB, and DIB is a modular tool, so you can build your own custom workflows. I would expect DIB to have very good Red Hat distro support by default, but I have not used RHCOS. Maybe @dtantsur or @elfosardo is more up to date.

@agewagra
Author

agewagra commented May 24, 2024

Yes, thank you. We added the simple-init element to get a DHCP-less ramdisk with CentOS as the base image. But even then, the networkData is not applied on the bare-metal server.

@metal3-io-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 25, 2024
@Rozzii
Member

Rozzii commented Aug 27, 2024

> Yes, thank you. We added the simple-init element to get a DHCP-less ramdisk with CentOS as the base image. But even then, the networkData is not applied on the bare-metal server.

I have some simple-init fixes that I have not pushed upstream because of a change in priorities. You could try those from https://github.com/Nordix/glean/commits/parsing_error/ (Glean is the tool that the simple-init element installs, and it takes care of the initial network configuration). You could install simple-init and export the environment variables listed in https://docs.openstack.org/diskimage-builder/latest/elements/simple-init/README.html to point to my branch, then see if it helps.
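
A rough sketch of how those overrides could be put together. The repository and branch come from the link above. The variable names (`DIB_INSTALLTYPE_simple_init`, `DIB_REPOLOCATION_glean`, `DIB_REPOREF_glean`) are how I recall them from the simple-init README, so please verify them there. The `ironic-python-agent-builder` options, the output name and the distro are assumptions for illustration only. The script prints the command rather than running it.

```python
import os
import shlex

# Taken from the branch URL mentioned above.
GLEAN_REPO = "https://github.com/Nordix/glean"
GLEAN_REF = "parsing_error"


def glean_override_env(repo, ref):
    """Environment variables asking simple-init to install Glean from a git ref."""
    return {
        "DIB_INSTALLTYPE_simple_init": "repo",  # install from source, not a package
        "DIB_REPOLOCATION_glean": repo,
        "DIB_REPOREF_glean": ref,
    }


def build_command(output, distro):
    """Assumed IPA builder invocation with the simple-init element added."""
    return ["ironic-python-agent-builder", "-o", output, "-e", "simple-init", distro]


overrides = glean_override_env(GLEAN_REPO, GLEAN_REF)
env = {**os.environ, **overrides}  # what a real build would receive as its environment
cmd = build_command("ipa-simple-init", "centos")  # hypothetical output name and distro

# Print a shell-style line for review instead of running the build.
exports = " ".join(f"{key}={shlex.quote(value)}" for key, value in overrides.items())
print(exports, shlex.join(cmd))
# To run the build for real: subprocess.run(cmd, env=env, check=True)
```

Printing first makes it easy to check the exact variables and arguments before starting a long image build.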

@Rozzii
Member

Rozzii commented Aug 27, 2024

/remove-lifecycle stale

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2024
@dtantsur
Member

/triage accepted
/kind bug
/help

We need to try bringing the fixes to Glean upstream, check the latest status of the feature in the Ironic project, and update the documentation (https://book.metal3.io/bmo/advanced_instance_customization exists but does not cover building the IPA image).

@metal3-io-bot
Contributor

@dtantsur:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Nov 20, 2024

5 participants