@hjensas hjensas commented Nov 24, 2025

The ci_nmstate role was failing during adoption deploy-infra jobs when cifmw_openshift_kubeconfig was defined but the file didn't exist yet.

Root cause: ci-framework-jobs' adoption-uni-job-base uses variable_files_dirs to scan all YAML files in the scenario directory, including 05-tests.yaml which sets cifmw_openshift_kubeconfig. This has been the case since adoption jobs were introduced in October 2024.

However, during the deploy-infra phase (before deploy-ocp), the OCP cluster and kubeconfig file don't exist yet. The issue was likely exposed by PR #3471 which changed how ansible_user_dir is evaluated, affecting how/when the kubeconfig path gets resolved.

Fix: Add "cifmw_openshift_kubeconfig is exists" check to tasks that use the kubeconfig. The existing code already handles the skipped task gracefully via default([]) safeguards, treating all hosts as "unmanaged" when no k8s cluster is available (which is correct for infra creation).
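
As a minimal sketch of the guard described above, the pattern could look like the following. The task names, the `kubernetes.core.k8s_info` query, and the `_k8s_nodes` / `_k8s_node_names` variables are illustrative assumptions, not the actual ci_nmstate tasks.

```yaml
# Hypothetical tasks, not copied from the ci_nmstate role.
- name: Query cluster nodes only when the kubeconfig file is present
  when:
    - cifmw_openshift_kubeconfig is defined
    # "exists" is evaluated against the Ansible controller's filesystem.
    - cifmw_openshift_kubeconfig is exists
  kubernetes.core.k8s_info:
    kubeconfig: "{{ cifmw_openshift_kubeconfig }}"
    kind: Node
  register: _k8s_nodes

- name: Fall back to an empty node list when the query was skipped
  ansible.builtin.set_fact:
    # A skipped task registers a result without "resources",
    # so default([]) yields an empty list.
    _k8s_node_names: >-
      {{
        _k8s_nodes.resources | default([]) |
        map(attribute='metadata.name') | list
      }}
```

With an empty list, every host ends up in the "unmanaged" group, which matches the behaviour the description calls correct for infra creation.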

Depends-On: openstack-k8s-operators/install_yamls#1110

Fixes: OSPCIX-1122
Related: #3471
Assisted-By: Claude Code/claude-4.5-sonnet

Signed-off-by: Harald Jensås <[email protected]>
openshift-ci bot commented Nov 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign brjackma for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ae35e9bb56e9400ebebde13780dd1e88

❌ openstack-k8s-operators-content-provider FAILURE in 12m 34s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 21s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 44s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 03s
✔️ build-push-container-cifmw-client SUCCESS in 19m 21s
❌ cifmw-molecule-ci_nmstate FAILURE in 29m 25s

hjensas commented Nov 25, 2025

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/058a6c579aac46759b2024308c0f4137

❌ openstack-k8s-operators-content-provider FAILURE in 15m 11s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 16s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 35s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 41s
✔️ build-push-container-cifmw-client SUCCESS in 20m 11s
❌ cifmw-molecule-ci_nmstate RETRY_LIMIT in 17m 21s

hjensas commented Nov 25, 2025

recheck

@amartyasinha amartyasinha left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 25, 2025
@hjensas hjensas enabled auto-merge (rebase) November 25, 2025 09:49
abhibongale commented Nov 25, 2025

I guess the "cifmw_openshift_kubeconfig is exists" check needs to be added in roles/openshift_adm/tasks/main.yml as well 🤔

hjensas commented Nov 25, 2025

A change that disables variable directory loading is being tested in the job configurations. If the variable file that defines cifmw_openshift_kubeconfig is not loaded, ci_nmstate skips the task because the "is defined" test returns false.

Marking this as do-not-merge for now. @rebtoor FYI.

- when: cifmw_openshift_kubeconfig is defined
+ when:
+   - cifmw_openshift_kubeconfig is defined
+   - cifmw_openshift_kubeconfig is exists
Maybe "cifmw_openshift_kubeconfig | length > 0" would be better than "exists".
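
To make the difference between the two conditions concrete, here is a small standalone playbook sketch. The play, the `/tmp/does-not-exist/kubeconfig` path, and the debug messages are illustrative assumptions; only the `exists` test and the `length` filter come from the discussion.

```yaml
- name: Compare the "exists" test with a non-empty check
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical path that is defined but has no file behind it.
    cifmw_openshift_kubeconfig: /tmp/does-not-exist/kubeconfig
  tasks:
    - name: Runs whenever the variable is a non-empty string
      when: cifmw_openshift_kubeconfig | length > 0
      ansible.builtin.debug:
        msg: "Path is set, but the file may still be missing"

    - name: Runs only when the file is present on the controller
      when: cifmw_openshift_kubeconfig is exists
      ansible.builtin.debug:
        msg: "Kubeconfig file found"
```

When the path is set but the file has not been created yet, the `length > 0` condition is true while `is exists` is false, so only the `exists` test would skip kubeconfig-dependent tasks in the deploy-infra situation described in this PR.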

@hjensas
Copy link
Contributor Author

hjensas commented Nov 27, 2025

Closing this. As I understand it, this happens because of a design issue in the downstream jobs, where variables are loaded from a directory, which results in vars being defined too early in the run. There is a discussion about rethinking this and doing it differently, so this change is not required. Ref: gitlab ci-framework-jobs MR 2622

@hjensas hjensas closed this Nov 27, 2025
auto-merge was automatically disabled November 27, 2025 10:43

Pull request was closed
