

@chrischdi
Contributor

@chrischdi chrischdi commented Oct 30, 2025

Summary by CodeRabbit

  • Improvements

    • Reconciliation now better detects infra-machine existence and creation timing to choose sync direction between providers; fetch order and error handling for infra resources improved.
  • Tests

    • Updated integration tests to assert successful ResourceSynchronized (Status=True) and infra-machine creation.
    • New unit tests cover infra-cluster/machine presence and absence scenarios for infra fetch logic.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 30, 2025
@openshift-ci-robot

@chrischdi: This pull request references Jira Issue OCPBUGS-63411, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 30, 2025
@coderabbitai

coderabbitai bot commented Oct 30, 2025

Walkthrough

Reconcile now checks whether a Cluster API infrastructure Machine exists before choosing sync direction, adds private helpers to fetch and detect the infra Machine, refactors infra resource fetching to separate infraCluster and infraMachine retrieval, and updates tests to expect a successful CAPI→MAPI synchronization path when the infra Machine is present.

Changes

Cohort / File(s) Summary
Reconcile & infra existence check
pkg/controllers/machinesync/machine_sync_controller.go
Added local capiInfraMachineExists flag, introduced doesCAPIInfraMachineExist(ctx, capiMachine, mapiMachine), and updated Reconcile branching to consider infra-machine presence and creation-time tie-breaker; adjusted nolint for cyclomatic complexity.
Infra fetch helpers & refactor
pkg/controllers/machinesync/machine_sync_controller.go
Refactored fetchCAPIInfraResources to initialize infraCluster first; added fetchCAPIInfraMachine(ctx, name, namespace) to fetch the infra Machine with clearer error handling; updated error messages and return semantics.
Controller tests updated
pkg/controllers/machinesync/machine_sync_controller_test.go
Reworked test expectations from an error-path to a success-path: asserts Synchronized True with ResourceSynchronized reason and verifies a created/retrieved CAPI InfraMachine exists.
New unit tests for infra fetching
pkg/controllers/machinesync/machine_sync_controller_unit_test.go
Added unit tests covering fetchCAPIInfraResources behavior for nil capiMachine, missing infra cluster, infra cluster present with/without infra Machine; uses fake client and AWS platform fixtures.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant R as Reconciler
    participant Exists as doesCAPIInfraMachineExist()
    participant Fetch as fetchCAPIInfraMachine()
    participant CAPI as CAPIMachine
    participant MAPI as MAPIMachine
    participant Infra as InfraMachine

    R->>Exists: check infra existence (capiMachine, mapiMachine)
    Exists->>CAPI: read infrastructureRef from CAPIMachine
    alt infraRef present
        Exists->>Infra: get InfraMachine by ref (ns/name)
    else fallback to MAPI ref
        Exists->>Infra: get InfraMachine by MAPI name/namespace
    end
    Infra-->>Exists: found / not found
    Exists-->>R: return (bool, error)

    alt infra NOT found and tie-breaker favors MAPI
        R->>Fetch: fetchCAPIInfraMachine(...) to init & retrieve infra
        Fetch->>Infra: init & get InfraMachine
        Infra-->>Fetch: infra object / error
        Fetch-->>R: infra machine returned
        R->>R: proceed MAPI → CAPIMachine reconciliation (create/sync)
    else
        R->>R: follow CAPI→MAPI authoritative path or no-op
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review Reconcile branching introduced around capiInfraMachineExists and the creation-time tie-breaker.
  • Inspect fetchCAPIInfraMachine and doesCAPIInfraMachineExist for correct namespace/name resolution and error wrapping.
  • Verify fetchCAPIInfraResources refactor preserves previous semantics and error paths.
  • Validate updated tests and the new unit test file for adequacy of coverage and fake client setup.

Poem

🐰 I sniff the fields where infra hides,

I fetch the refs and check their tides,
If none are found I hop to make,
A bridging bolt for sync's own sake,
Tiny paws, big sync — carrot cake!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: introducing logic to create the Cluster API InfraMachine when it doesn't exist but the Cluster API Machine does, which aligns with the refactoring of the reconciliation flow and the new doesCAPIInfraMachineExist helper.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from nrb and racheljpg October 30, 2025 07:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/controllers/machinesync/machine_sync_controller.go (1)

246-246: Fix typo in comment.

"existense" should be "existence".

Apply this diff:

-	// Check for existense of the Cluster API Infrastructure Machine or if it needs to get created from MAPI first.
+	// Check for existence of the Cluster API Infrastructure Machine or if it needs to get created from MAPI first.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 8207993 and ebbcf2d.

📒 Files selected for processing (2)
  • pkg/controllers/machinesync/machine_sync_controller.go (4 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (1 hunks)
🔇 Additional comments (5)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)

782-795: LGTM! Test correctly validates the new InfraMachine creation behavior.

The test expectations have been properly updated to reflect the new functionality where the Cluster API InfraMachine is now created successfully when it doesn't exist, rather than failing. The test now:

  • Expects successful synchronization (Status=True, Reason=ResourceSynchronized)
  • Verifies the InfraMachine is actually created

This aligns well with the implementation changes.

pkg/controllers/machinesync/machine_sync_controller.go (4)

258-262: Good timestamp-based precedence logic.

The condition correctly creates the InfraMachine from MAPI only when:

  1. Cluster API is authoritative
  2. CAPI Machine exists but InfraMachine doesn't
  3. MAPI machine was created first (or simultaneously)

The timestamp comparison (mapiMachine.CreationTimestamp.Equal(&capiMachine.CreationTimestamp) || mapiMachine.CreationTimestamp.Before(&capiMachine.CreationTimestamp)) lets MAPI-originated configuration take precedence only when the MAPI Machine is at least as old as the CAPI Machine, preventing it from overwriting potentially newer CAPI-originated state.
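A minimal Go sketch of the three conditions above, assuming a hypothetical helper shouldCreateInfraFromMAPI that receives the relevant facts as plain booleans. time.Time stands in for metav1.Time, and none of these names are taken from the controller itself.

```go
package main

import (
	"fmt"
	"time"
)

// shouldCreateInfraFromMAPI is a hypothetical helper mirroring the three
// conditions listed above; time.Time stands in for metav1.Time.
func shouldCreateInfraFromMAPI(capiAuthoritative, capiMachineExists, infraMachineExists bool, mapiCreated, capiCreated time.Time) bool {
	// Conditions 1 and 2: Cluster API is authoritative, the CAPI Machine
	// exists, but its InfraMachine does not.
	if !capiAuthoritative || !capiMachineExists || infraMachineExists {
		return false
	}
	// Condition 3: the MAPI Machine is not newer than the CAPI Machine.
	return mapiCreated.Equal(capiCreated) || mapiCreated.Before(capiCreated)
}

// baseTime is an arbitrary example creation timestamp.
var baseTime = time.Date(2025, 10, 31, 8, 30, 0, 0, time.UTC)

func main() {
	later := baseTime.Add(5 * time.Minute)
	fmt.Println(shouldCreateInfraFromMAPI(true, true, false, baseTime, later)) // MAPI Machine created first
	fmt.Println(shouldCreateInfraFromMAPI(true, true, false, later, baseTime)) // CAPI Machine created first
	fmt.Println(shouldCreateInfraFromMAPI(true, true, true, baseTime, later))  // InfraMachine already exists
}
```

With these example timestamps main prints true, false, false: only the MAPI-first case takes the MAPI -> CAPI path, and a CAPI Machine created first keeps CAPI as the source of truth.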


950-974: Clean refactoring that improves modularity.

The refactored fetchCAPIInfraResources now properly separates InfraCluster and InfraMachine retrieval, delegating the latter to the dedicated fetchCAPIInfraMachine helper. This improves code reusability and maintains clear separation of concerns.


976-994: Well-implemented helper function.

The fetchCAPIInfraMachine helper is cleanly designed with:

  • Focused responsibility (fetch infrastructure machine by name/namespace)
  • Proper NotFound handling (returns nil rather than error)
  • Appropriate error wrapping for clarity
  • Platform-agnostic initialization
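The NotFound contract above can be sketched in Go as follows. This is an illustration under stated assumptions: errNotFound stands in for a Kubernetes NotFound API error (real code would typically check apierrors.IsNotFound), and getFunc, mapGetter, and fetchInfraMachine are hypothetical names replacing the controller-runtime client.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// errTimeout is an arbitrary non-NotFound failure used for illustration.
var errTimeout = errors.New("request timed out")

// infraMachine is a hypothetical stand-in for an InfraMachine object.
type infraMachine struct{ Namespace, Name string }

// getFunc abstracts the client Get call (hypothetical).
type getFunc func(namespace, name string) (*infraMachine, error)

// fetchInfraMachine returns (nil, nil) when the object is absent and wraps
// every other error with context.
func fetchInfraMachine(get getFunc, namespace, name string) (*infraMachine, error) {
	m, err := get(namespace, name)
	if errors.Is(err, errNotFound) {
		return nil, nil // absence is a valid state, not a failure
	}
	if err != nil {
		return nil, fmt.Errorf("failed to get infrastructure machine %s/%s: %w", namespace, name, err)
	}
	return m, nil
}

// mapGetter builds a getFunc backed by an in-memory map keyed "namespace/name".
func mapGetter(objs map[string]*infraMachine) getFunc {
	return func(namespace, name string) (*infraMachine, error) {
		if m, ok := objs[namespace+"/"+name]; ok {
			return m, nil
		}
		return nil, errNotFound
	}
}

func main() {
	get := mapGetter(map[string]*infraMachine{
		"openshift-cluster-api/worker-a": {Namespace: "openshift-cluster-api", Name: "worker-a"},
	})
	m, err := fetchInfraMachine(get, "openshift-cluster-api", "worker-a")
	fmt.Println(m != nil, err) // true <nil>
	m, err = fetchInfraMachine(get, "openshift-cluster-api", "worker-b")
	fmt.Println(m == nil, err) // true <nil>
	failing := func(string, string) (*infraMachine, error) { return nil, errTimeout }
	_, err = fetchInfraMachine(failing, "openshift-cluster-api", "worker-a")
	fmt.Println(errors.Is(err, errTimeout)) // true
}
```

Returning (nil, nil) for NotFound pushes the "does it exist?" decision to the caller, which is what lets the existence check below turn a missing object into a plain false.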

996-1012: Solid existence check implementation with proper nil handling.

The doesCAPIInfraMachineExist helper correctly:

  • Handles nil capiMachine by falling back to mapiMachine.Name
  • Uses capiMachine.Spec.InfrastructureRef when available for accurate lookup
  • Returns a clear boolean result rather than requiring nil checks
  • Wraps errors appropriately for context
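A rough Go sketch of the name/namespace resolution described above, assuming simplified hypothetical types (objectRef, capiMachine) and an in-memory set in place of the API client; the error path is omitted because the toy lookup cannot fail.

```go
package main

import "fmt"

// objectRef is a hypothetical stand-in for Spec.InfrastructureRef.
type objectRef struct{ Namespace, Name string }

// capiMachine is a minimal stand-in for the Cluster API Machine.
type capiMachine struct{ InfrastructureRef objectRef }

// infraMachineKey uses the CAPI Machine's infrastructureRef when the Machine
// exists, otherwise it falls back to the MAPI Machine's name in the
// controller's CAPI namespace.
func infraMachineKey(capiNamespace, mapiName string, m *capiMachine) (namespace, name string) {
	if m == nil {
		// No CAPI Machine yet: look for an InfraMachine named after the MAPI Machine.
		return capiNamespace, mapiName
	}
	return m.InfrastructureRef.Namespace, m.InfrastructureRef.Name
}

// infraMachineExists reports presence as a boolean against a set of
// "namespace/name" keys, standing in for a client Get plus NotFound check.
func infraMachineExists(existing map[string]bool, capiNamespace, mapiName string, m *capiMachine) bool {
	namespace, name := infraMachineKey(capiNamespace, mapiName, m)
	return existing[namespace+"/"+name]
}

func main() {
	existing := map[string]bool{"openshift-cluster-api/worker-a-infra": true}
	m := &capiMachine{InfrastructureRef: objectRef{Namespace: "openshift-cluster-api", Name: "worker-a-infra"}}
	fmt.Println(infraMachineExists(existing, "openshift-cluster-api", "worker-a", m))   // true: found via infrastructureRef
	fmt.Println(infraMachineExists(existing, "openshift-cluster-api", "worker-a", nil)) // false: nothing named after the MAPI Machine
}
```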

@chrischdi
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 30, 2025
@openshift-ci-robot

@chrischdi: This pull request references Jira Issue OCPBUGS-63411, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from sunzhaohua2 October 30, 2025 10:02
@nrb
Contributor

nrb commented Oct 30, 2025

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2025
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-ci
Contributor

openshift-ci bot commented Oct 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2025
@sunzhaohua2
Contributor

Hi @chrischdi, I tried to test this with the following steps:

  • Create a CAPI Machine.
  • Create a MAPI Machine with the same name and authoritativeAPI: ClusterAPI.
  • Check whether an AWSMachine gets created.

Result: FailedToConvertCAPIMachineToMAPI; the AWSMachine was not created.

$ oc get machine.m zhsunaws21-zf4rb-worker-us-east-1b-machine
status:
  authoritativeAPI: ClusterAPI
  conditions:
  - lastTransitionTime: "2025-10-31T08:30:08Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2025-10-31T08:30:08Z"
    message: The AuthoritativeAPI status is set to 'ClusterAPI'
    reason: AuthoritativeAPINotMachineAPI
    status: "True"
    type: Paused
  - lastTransitionTime: "2025-10-31T08:30:08Z"
    status: "True"
    type: Terminable
  - lastTransitionTime: "2025-10-31T08:30:08Z"
    message: 'failed to convert Cluster API machine to Machine API machine: unexpected
      InfraMachine type, expected AWSMachine, got <nil>'
    reason: FailedToConvertCAPIMachineToMAPI
    severity: Error
    status: "False"
    type: Synchronized
  lastUpdated: "2025-10-31T08:30:08Z"
  phase: ""

$ oc get machine.m | grep machine                                                 
zhsunaws21-zf4rb-worker-us-east-1b-machine                                                   10m
$ oc get machine.c -n openshift-cluster-api | grep machine                           
zhsunaws21-zf4rb-worker-us-east-1b-machine   zhsunaws21-zf4rb                                                                                       Pending   15m
$ oc get awsmachine -n openshift-cluster-api | grep machine                                
$

@damdo
Member

damdo commented Oct 31, 2025

/hold

To investigate the issue.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2025
@chrischdi
Contributor Author

I think this is the expected behavior for the code:

https://github.com/openshift/cluster-capi-operator/pull/402/files#diff-a00680d231c0783b5333cc3e2c6f15e625b31572f57b7fb4b598dcb18d2418f3R258-R259

I did it this way assuming the scenarios are:

  • Create MAPI Machine with CAPI Authoritative --> create CAPI and Infra machine from MAPI Machine
    • Here, this PR fixes the case where the InfraMachine was not created.
  • Create CAPI Machine + Infra Machine --> create MAPI Machine from CAPI and Infra Machine

Not sure if it is right to "just create" the InfraMachine from the MAPI Machine in case the CAPI Machine was created first.

WDYT @damdo?

@chrischdi
Contributor Author

/retest

@theobarberbany
Contributor

Looks like Cluster API Suite: [It] [sig-cluster-lifecycle][OCPFeatureGate:MachineAPIMigration] Machine Migration CAPI Authoritative Tests Machine Migration Round Trip Tests CAPI (and no existing CAPI Machine with same name) -> MAPI -> CAPI round trip should verify mirror machines are deleted when deleting CAPI machine is consistently failing.

Have we impacted the deletion logic with this change?

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@chrischdi
Contributor Author


Fun... yeah that could actually be true. Have to check on this.

@theobarberbany
Contributor

theobarberbany commented Nov 5, 2025


Happy to pair some time this week if you fancy :)

@chrischdi chrischdi force-pushed the pr-machinesync-create-inframachine-if-not-exists branch from ebbcf2d to debacfc on November 6, 2025 07:48
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2025

New changes are detected. LGTM label has been removed.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2025
@openshift-ci-robot

@chrischdi: This pull request references Jira Issue OCPBUGS-63411, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

In response to this:

Summary by CodeRabbit

  • Improvements

    • Better detection and checks for infrastructure machine presence before syncing.

    • Reconciliation flow refined to avoid premature transitions and to prefer authoritative API when appropriate.

    • Error messages and handling improved to provide clearer context during infra resource operations.

  • Tests

    • Updated tests to assert successful infrastructure resource creation and that synchronization marks resources as synchronized.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@chrischdi chrischdi force-pushed the pr-machinesync-create-inframachine-if-not-exists branch from debacfc to 566c680 on November 6, 2025 07:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/controllers/machinesync/machine_sync_controller.go (1)

269-273: Consider simplifying the timestamp comparison.

The logic correctly handles the scenario where the MAPI Machine is created with authoritativeAPI: ClusterAPI and needs to create the missing InfraMachine.

However, the timestamp comparison could be simplified:

-    if !capiInfraMachineExists && (mapiMachine.CreationTimestamp.Equal(&capiMachine.CreationTimestamp) || mapiMachine.CreationTimestamp.Before(&capiMachine.CreationTimestamp)) {
+    if !capiInfraMachineExists && !mapiMachine.CreationTimestamp.After(&capiMachine.CreationTimestamp) {
         return r.reconcileMAPIMachinetoCAPIMachine(ctx, mapiMachine, capiMachine)
     }

The negative form (!After) is logically equivalent to (Equal || Before) and is more concise.
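A small Go check of that equivalence, written against the standard library's time.Time rather than metav1.Time. metav1.Time embeds time.Time, but its own Before and Equal take a *metav1.Time argument, so the exact call signature in the controller should be checked separately. For time.Time exactly one of Before, Equal, and After holds for any two instants, which is why the two forms agree.

```go
package main

import (
	"fmt"
	"time"
)

// notAfter is the simplified form suggested above.
func notAfter(a, b time.Time) bool { return !a.After(b) }

// equalOrBefore is the original, longer form.
func equalOrBefore(a, b time.Time) bool { return a.Equal(b) || a.Before(b) }

// refTime is an arbitrary example timestamp.
var refTime = time.Date(2025, 11, 6, 7, 50, 0, 0, time.UTC)

func main() {
	// Compare both forms for an earlier instant, the same instant (also in
	// another time zone, since Equal compares instants), and a later instant.
	others := []time.Time{
		refTime.Add(-time.Second),
		refTime,
		refTime.In(time.FixedZone("UTC+2", 2*60*60)),
		refTime.Add(time.Second),
	}
	for _, other := range others {
		fmt.Println(notAfter(refTime, other), equalOrBefore(refTime, other))
	}
}
```

This prints "false false" once and then "true true" three times: the two forms agree in every case.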

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between ebbcf2d and debacfc.

📒 Files selected for processing (2)
  • pkg/controllers/machinesync/machine_sync_controller.go (5 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (1 hunks)
🔇 Additional comments (6)
pkg/controllers/machinesync/machine_sync_controller.go (5)

194-194: LGTM! Appropriate linting suppression.

The addition of cyclop to the nolint directive is reasonable given the inherent complexity of the reconciliation logic with multiple conditional paths.


201-201: Good defensive check for InfraMachine existence.

The early existence check for the CAPI InfraMachine allows the controller to make informed decisions about reconciliation direction in the subsequent switch statement. Error handling is appropriate.

Also applies to: 257-262


996-1037: Good refactoring for improved code organization.

The refactored fetchCAPIInfraResources function now properly delegates the InfraMachine fetching to a dedicated helper (fetchCAPIInfraMachine), improving code organization and reusability. Error handling and validation logic are preserved correctly.


1039-1057: Well-designed helper function for fetching InfraMachine.

The fetchCAPIInfraMachine helper appropriately separates the concern of fetching the infrastructure machine. The decision to return nil for NotFound errors (rather than propagating the error) is correct, as the absence of an InfraMachine is a valid state that callers need to handle explicitly.


1059-1075: Robust existence check with appropriate fallback logic.

The doesCAPIInfraMachineExist helper correctly handles both scenarios:

  1. When the CAPI Machine exists - uses its infrastructureRef for the lookup
  2. When the CAPI Machine doesn't exist - falls back to using the MAPI Machine's name

This fallback is necessary because the helper is called before the switch statement that determines reconciliation paths. Error handling is appropriate.

pkg/controllers/machinesync/machine_sync_controller_test.go (1)

952-965: Test correctly validates the new InfraMachine creation behavior.

The test has been appropriately updated to verify the fix:

  1. Expects successful synchronization (ResourceSynchronized = True) instead of an error condition
  2. Explicitly verifies that the Cluster API InfraMachine is created

The test setup correctly simulates the scenario where:

  • MAPI Machine is created first with authoritativeAPI: ClusterAPI (lines 936-941)
  • CAPI Machine exists (line 948)
  • InfraMachine does NOT exist

This matches the timestamp-based condition added in the controller (line 270), where MAPI's earlier creation timestamp triggers the MAPI→CAPI reconciliation path to create the missing InfraMachine.

@chrischdi chrischdi force-pushed the pr-machinesync-create-inframachine-if-not-exists branch from 566c680 to 733a848 on November 6, 2025 07:54

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between debacfc and 733a848.

📒 Files selected for processing (2)
  • pkg/controllers/machinesync/machine_sync_controller.go (5 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (1 hunks)
🔇 Additional comments (4)
pkg/controllers/machinesync/machine_sync_controller_test.go (1)

952-965: LGTM! Test correctly validates InfraMachine creation.

The test expectations have been properly updated to validate the new behavior where the InfraMachine is created when a CAPI Machine exists but its InfraMachine does not. The test now:

  • Expects a successful synchronization condition (True with ResourceSynchronized)
  • Constructs and verifies the InfraMachine can be retrieved

This aligns well with the PR objective to ensure infra-machine creation during synchronization.

pkg/controllers/machinesync/machine_sync_controller.go (3)

997-1038: Well-structured refactoring of infra resource fetching.

The refactoring improves code organization by:

  • Separating infraCluster fetching from infraMachine fetching
  • Delegating infraMachine retrieval to the new fetchCAPIInfraMachine helper
  • Maintaining proper error handling and nil checks

This makes the code more maintainable and enables reuse of the infraMachine fetching logic.


1040-1076: New helper functions are well-designed.

The two new helper functions provide clean abstractions:

  1. fetchCAPIInfraMachine (lines 1040-1058): Fetches the InfraMachine and returns nil if not found, properly handling errors.

  2. doesCAPIInfraMachineExist (lines 1060-1076): Provides a boolean check for InfraMachine existence with appropriate fallback logic for name/namespace resolution.

The functions are appropriately scoped to the reconciler and handle edge cases well.


269-273: The timestamp comparison logic is correct and intentionally prevents stale MAPI from overriding CAPI state.

The condition !mapiMachine.CreationTimestamp.After(&capiMachine.CreationTimestamp) only creates an InfraMachine from MAPI if MAPI existed before or at the same time as CAPI. When a MAPI machine is created after CAPI (the edge case raised in the review), the condition fails and reconciliation falls through to reconcileCAPIMachinetoMAPIMachine at line 275, syncing from CAPI to MAPI instead.

This is the correct behavior: since status.authoritativeAPI = ClusterAPI designates CAPI as the source of truth, a MAPI machine created afterward should not override CAPI's state by recreating infrastructure resources. The sync flows from the authoritative source (CAPI) to the mirror (MAPI).

@chrischdi chrischdi force-pushed the pr-machinesync-create-inframachine-if-not-exists branch from 733a848 to 10f80ea on November 6, 2025 08:10
@chrischdi chrischdi force-pushed the pr-machinesync-create-inframachine-if-not-exists branch from 10f80ea to f10823d on November 6, 2025 12:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
pkg/controllers/machinesync/machine_sync_controller.go (3)

1013-1022: Be tolerant of missing infraRef.Namespace; default to Machine namespace and treat empty name as “not exists” in existence checks.

CAPI specs commonly omit InfrastructureRef.Namespace (same-namespace reference). Current validation turns this into a terminal error and the existence check can error on empty fields. Default the namespace, and in doesCAPIInfraMachineExist() return false (no error) when the ref is incomplete.

@@
-    infraMachineRef := capiMachine.Spec.InfrastructureRef
+    infraMachineRef := capiMachine.Spec.InfrastructureRef
@@
-    if infraMachineRef.Name == "" || infraMachineRef.Namespace == "" {
-        return nil, nil, fmt.Errorf("machine %s/%s: %w",
-            capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
-    }
+    if infraMachineRef.Name == "" {
+        return nil, nil, fmt.Errorf("machine %s/%s: %w",
+            capiMachine.Namespace, capiMachine.Name, errInvalidInfraMachineReference)
+    }
+    // Default to the machine namespace when infraRef namespace is omitted.
+    infraNamespace := infraMachineRef.Namespace
+    if infraNamespace == "" {
+        infraNamespace = capiMachine.Namespace
+    }
@@
-    infraMachine, err := r.fetchCAPIInfraMachine(ctx, infraMachineRef.Name, infraMachineRef.Namespace)
+    infraMachine, err := r.fetchCAPIInfraMachine(ctx, infraMachineRef.Name, infraNamespace)
@@
 func (r *MachineSyncReconciler) doesCAPIInfraMachineExist(ctx context.Context, capiMachine *clusterv1.Machine, mapiMachine *mapiv1beta1.Machine) (bool, error) {
-    namespace := r.CAPINamespace
-    name := mapiMachine.Name
-
-    if capiMachine != nil {
-        name = capiMachine.Spec.InfrastructureRef.Name
-        namespace = capiMachine.Spec.InfrastructureRef.Namespace
-    }
+    namespace := r.CAPINamespace
+    name := mapiMachine.Name
+    if capiMachine != nil {
+        ref := capiMachine.Spec.InfrastructureRef
+        if ref.Name == "" {
+            // Treat incomplete ref as “not exists” rather than error.
+            return false, nil
+        }
+        name = ref.Name
+        if ref.Namespace != "" {
+            namespace = ref.Namespace
+        } else {
+            namespace = capiMachine.Namespace
+        }
+    }

Also applies to: 1033-1039, 1061-1077
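A condensed Go sketch of the suggested defaulting rule, assuming a hypothetical objectRef type in place of Spec.InfrastructureRef and a hypothetical resolveInfraRef helper. errInvalidInfraMachineReference reuses the sentinel name from the diff, but its message text here is invented.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidInfraMachineReference reuses the sentinel name from the diff; the
// message text is an assumption.
var errInvalidInfraMachineReference = errors.New("invalid infrastructure machine reference")

// objectRef is a hypothetical stand-in for Spec.InfrastructureRef.
type objectRef struct{ Namespace, Name string }

// resolveInfraRef requires a name but defaults an omitted namespace to the
// Machine's own namespace (a same-namespace reference).
func resolveInfraRef(machineNamespace, machineName string, ref objectRef) (namespace, name string, err error) {
	if ref.Name == "" {
		return "", "", fmt.Errorf("machine %s/%s: %w", machineNamespace, machineName, errInvalidInfraMachineReference)
	}
	namespace = ref.Namespace
	if namespace == "" {
		namespace = machineNamespace
	}
	return namespace, ref.Name, nil
}

func main() {
	ns, name, err := resolveInfraRef("openshift-cluster-api", "worker-a", objectRef{Name: "worker-a"})
	fmt.Println(ns, name, err) // openshift-cluster-api worker-a <nil>

	_, _, err = resolveInfraRef("openshift-cluster-api", "worker-a", objectRef{})
	fmt.Println(errors.Is(err, errInvalidInfraMachineReference)) // true
}
```

Only an empty name is treated as an invalid reference here; in the existence check the suggestion instead maps that case to (false, nil).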


1101-1108: Unconditional error returns after RemoveFinalizer; always errors even on success.

Both branches return fmt.Errorf(...) regardless of err. This breaks deletion flows and can cause CI failures.

@@
-        _, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
-
-        return false, fmt.Errorf("failed to remove finalizer: %w", err)
+        changed, err := util.RemoveFinalizer(ctx, r.Client, mapiMachine, SyncFinalizer)
+        if err != nil {
+            return false, fmt.Errorf("failed to remove finalizer: %w", err)
+        }
+        return changed, nil
@@
-        _, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
-
-        return false, fmt.Errorf("failed to remove finalizer: %w", err)
+        changed, err := util.RemoveFinalizer(ctx, r.Client, capiMachine, SyncFinalizer)
+        if err != nil {
+            return false, fmt.Errorf("failed to remove finalizer: %w", err)
+        }
+        return changed, nil

Also applies to: 1221-1228
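The bug above comes from fmt.Errorf always returning a non-nil error, even when the wrapped err is nil. A minimal Go sketch contrasting both patterns, assuming a hypothetical removeFinalizerFunc in place of util.RemoveFinalizer; the (bool, error) shape follows the diff, but what the boolean means in the real helper is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is an arbitrary failure used for illustration.
var errConflict = errors.New("conflict")

// removeFinalizerFunc stands in for util.RemoveFinalizer (hypothetical).
type removeFinalizerFunc func() (bool, error)

// buggy reproduces the original pattern: it returns an error even when
// removal succeeded, because fmt.Errorf never returns nil.
func buggy(remove removeFinalizerFunc) (bool, error) {
	_, err := remove()
	return false, fmt.Errorf("failed to remove finalizer: %w", err)
}

// fixed only wraps and returns the error when there actually is one.
func fixed(remove removeFinalizerFunc) (bool, error) {
	changed, err := remove()
	if err != nil {
		return false, fmt.Errorf("failed to remove finalizer: %w", err)
	}
	return changed, nil
}

func main() {
	succeed := func() (bool, error) { return true, nil }
	fail := func() (bool, error) { return false, errConflict }

	_, errBuggy := buggy(succeed)
	_, errFixed := fixed(succeed)
	fmt.Println(errBuggy != nil) // true: error reported although removal succeeded
	fmt.Println(errFixed != nil) // false

	_, err := fixed(fail)
	fmt.Println(err) // failed to remove finalizer: conflict
}
```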


235-235: Minor: double colon in error message.

Typo in format string.

-        return ctrl.Result{}, fmt.Errorf("failed to get Cluster API machine:: %w", err)
+        return ctrl.Result{}, fmt.Errorf("failed to get Cluster API machine: %w", err)
🧹 Nitpick comments (1)
pkg/controllers/machinesync/machine_sync_controller_unit_test.go (1)

81-92: Add coverage for omitted infraRef.Namespace and existence check edge.

Include a case where capiMachine.Spec.InfrastructureRef.Namespace is empty (default to machine namespace) and verify fetch succeeds; also add a unit for doesCAPIInfraMachineExist handling incomplete refs.

@@
         Context("and Infrastructure is present", func() {
             var objs []client.Object
             BeforeEach(func() {
                 objs = []client.Object{
@@
             })
@@
             It("should return infrastructure machine if it is present", func() {
@@
                 Expect(infraMachine).ToNot(BeNil())
             })
+
+            It("should default infraRef namespace to machine namespace when omitted", func() {
+                // Omit namespace in infraRef
+                capiMachine.Spec.InfrastructureRef.Namespace = ""
+                objs = append(objs, &awsv1.AWSCluster{
+                    ObjectMeta: metav1.ObjectMeta{
+                        Name:      capiMachine.Spec.ClusterName,
+                        Namespace: capiMachine.Namespace,
+                    },
+                }, &awsv1.AWSMachine{
+                    ObjectMeta: metav1.ObjectMeta{
+                        Name:      capiMachine.Name,
+                        Namespace: capiMachine.Namespace, // same as machine
+                    },
+                })
+                reconciler.Client = fake.NewClientBuilder().WithScheme(testEnv.Scheme).WithObjects(objs...).Build()
+                infraCluster, infraMachine, err := reconciler.fetchCAPIInfraResources(ctx, capiMachine)
+                Expect(err).ToNot(HaveOccurred())
+                Expect(infraCluster).ToNot(BeNil())
+                Expect(infraMachine).ToNot(BeNil())
+            })

Optionally, add a focused unit for doesCAPIInfraMachineExist() where infraRef.Name is empty and expect (false, nil).

Also applies to: 94-101, 103-116

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 10f80ea and f10823d.

📒 Files selected for processing (3)
  • pkg/controllers/machinesync/machine_sync_controller.go (5 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_test.go (1 hunks)
  • pkg/controllers/machinesync/machine_sync_controller_unit_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/controllers/machinesync/machine_sync_controller_test.go

@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2025

@chrischdi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/e2e-aws-capi-techpreview (commit ebbcf2d, required: true): /test e2e-aws-capi-techpreview
  • ci/prow/e2e-openstack-ovn-techpreview (commit ebbcf2d, required: true): /test e2e-openstack-ovn-techpreview
  • ci/prow/e2e-aws-ovn-techpreview (commit ebbcf2d, required: true): /test e2e-aws-ovn-techpreview
  • ci/prow/e2e-gcp-ovn-techpreview (commit ebbcf2d, required: true): /test e2e-gcp-ovn-techpreview
  • ci/prow/okd-scos-e2e-aws-ovn (commit f10823d, required: false): /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.


7 participants