CNTRLPLANE-507: Add HCP finalizer to AWSEndpointService reconciler by hypershift-jira-solve-ci[bot] · Pull Request #8499 · openshift/hypershift

hypershift-jira-solve-ci · 2026-05-13T02:39:49Z

What this PR does / why we need it:

Adds a finalizer on the HostedControlPlane resource from the AWSEndpointService reconciler to prevent HCP deletion before AWS PrivateLink resources are cleaned up.

Problem: When the CPO restarts during deletion of a SharedVPC cluster, the clientBuilder is uninitialized and the HCP (with its cross-account role ARNs) may already be deleted. This causes the reconciler to fail creating AWS clients, and after a 10-minute grace period the hypershift-operator force-removes the CPO finalizer — orphaning VPC endpoints, security groups, and DNS records in the shared VPC account.

Solution: The new HCP finalizer (hypershift.openshift.io/aws-private-link-endpoint-cleanup) follows the same pattern used by the Azure PLS controller:

Adds the finalizer to the HCP during normal reconciliation
When HCP deletion is detected, initializes AWS clients from the still-available HCP
Cleans up each AWSEndpointService's AWS resources and removes CR finalizers
Removes the HCP finalizer only after all AWSEndpointService CRs are cleaned up
Extends the HCP watch handler (enqueueOnHCPChange) to also trigger reconciliation when an HCP is being deleted with the finalizer present
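
The ordering in the steps above can be sketched with a small stdlib-only model. This is not the controller's code: the `object` type, the `reconcile` function, and the `crFinalizer` name are hypothetical stand-ins, and only the HCP finalizer name is taken from the description. It is meant to show one point only, that the HCP finalizer is released after the last endpoint-service CR has been cleaned up.

```go
package main

import "fmt"

const (
	// Finalizer name taken from the PR description.
	hcpFinalizer = "hypershift.openshift.io/aws-private-link-endpoint-cleanup"
	// Hypothetical placeholder for the finalizer held by each AWSEndpointService CR.
	crFinalizer = "example.test/endpoint-service-cleanup"
)

// object is a simplified stand-in for a Kubernetes resource.
type object struct {
	name       string
	deleting   bool // models a non-zero DeletionTimestamp
	finalizers []string
}

func hasFinalizer(o *object, f string) bool {
	for _, x := range o.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

func addFinalizer(o *object, f string) {
	if !hasFinalizer(o, f) {
		o.finalizers = append(o.finalizers, f)
	}
}

func removeFinalizer(o *object, f string) {
	kept := o.finalizers[:0]
	for _, x := range o.finalizers {
		if x != f {
			kept = append(kept, x)
		}
	}
	o.finalizers = kept
}

// reconcile models one reconcile pass for a single CR.
// cleanup is a hypothetical callback that deletes the CR's AWS resources.
func reconcile(hcp, cr *object, all []*object, cleanup func(*object) error) error {
	if !hcp.deleting {
		// Normal path: make sure the HCP cannot disappear before cleanup.
		addFinalizer(hcp, hcpFinalizer)
		return nil
	}
	// Deletion path: clean up this CR first; on failure keep every finalizer.
	if err := cleanup(cr); err != nil {
		return err
	}
	removeFinalizer(cr, crFinalizer)
	for _, other := range all {
		if hasFinalizer(other, crFinalizer) {
			return nil // other CRs are still pending, keep the HCP finalizer
		}
	}
	removeFinalizer(hcp, hcpFinalizer)
	return nil
}

func main() {
	hcp := &object{name: "hcp"}
	a := &object{name: "kube-apiserver-private", finalizers: []string{crFinalizer}}
	b := &object{name: "private-router", finalizers: []string{crFinalizer}}
	all := []*object{a, b}
	cleanup := func(cr *object) error {
		fmt.Println("cleaning up AWS resources for", cr.name)
		return nil
	}

	_ = reconcile(hcp, a, all, cleanup) // normal reconciliation adds the finalizer
	hcp.deleting = true
	_ = reconcile(hcp, a, all, cleanup)
	fmt.Println("HCP finalizer after first CR:", hasFinalizer(hcp, hcpFinalizer))
	_ = reconcile(hcp, b, all, cleanup)
	fmt.Println("HCP finalizer after last CR:", hasFinalizer(hcp, hcpFinalizer))
}
```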

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-507

Special notes for your reviewer:

This follows the same finalizer pattern already established by the Azure PLS controller
The enqueueOnHCPChange handler (renamed from enqueueOnAccessChange) now triggers on both EndpointAccess changes and HCP deletions with the finalizer
AWS client initialization during HCP deletion reuses the existing getAWSClient helper, sourcing credentials from the still-available HCP spec

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests.

Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-507](https://redhat.atlassian.net/browse/CNTRLPLANE-507)

Note: This PR was auto-generated by the jira-agent periodic CI job in response to CNTRLPLANE-507. See the full report for token usage, cost breakdown, and detailed phase output.

Summary by CodeRabbit

Bug Fixes
- Improved AWS PrivateLink deletion cleanup by coordinating finalizers between HostedControlPlane (HCP) and related endpoint service CRs.
- Ensured HCP deletion reconciliation runs reliably across controller restarts, without racing the endpoint-service deletion path.
- Added safer requeue behavior on Kubernetes conflicts and dependency-violation scenarios to avoid premature finalizer removal.
Tests
- Added unit tests validating HCP finalizer management, deletion cleanup coordination, reconciliation request mapping/enqueue logic, and error/requeue handling for Kubernetes and AWS failures.

openshift-merge-bot · 2026-05-13T02:39:51Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-05-13T02:39:52Z

@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-507 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-05-13T02:40:02Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough

This change adds an HCP-scoped AWS PrivateLink finalizer, updates HostedControlPlane event handling to enqueue AWSEndpointService reconciliations, and splits reconciliation into normal and HCP-deletion paths. The deletion path initializes AWS clients from the HCP, cleans up AWS resources, removes the AWSEndpointService finalizer, and clears the HCP finalizer after dependent CRs are done. Tests cover finalizer patching, deletion handling, client errors, and mapping behavior.

Possibly related PRs

openshift/hypershift#7868: Also changes awsprivatelink_controller.go deletion reconciliation and AWS cleanup/retry behavior around DependencyViolation.

Suggested reviewers

devguyio
enxebre
muraee

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

Check name	Status	Explanation	Resolution
Container-Privileges	❌ Error	New YAML manifests add hostPID/hostNetwork/privileged and allowPrivilegeEscalation=true in kubelet-config, kubevirt CSI, and e2e pod files.	Remove or justify the privileged settings; use least-privilege securityContext where possible, or isolate these manifests with explicit exemption.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main change: adding an HCP finalizer to the AWSEndpointService reconciler.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Added test titles are static and descriptive; no new titles embed generated names, timestamps, UUIDs, or other run-to-run values.
Test Structure And Quality	✅ Passed	Tests are table-driven fake-client unit tests, with no Ginkgo waits or cluster resources; they follow the repo’s existing testing style.
Topology-Aware Scheduling Compatibility	✅ Passed	Only AWSEndpointService reconciliation/finalizer logic changed; no node selectors, affinity, spread constraints, replicas, or manifests were added.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Only Go unit tests were added; no new Ginkgo e2e tests or external-network dependencies were introduced.
No-Weak-Crypto	✅ Passed	No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons appear in the changed controller/test code.
No-Sensitive-Data-In-Logs	✅ Passed	PASS: The added HCP-finalizer logs are generic status messages; I found no passwords, tokens, PII, or other clearly sensitive data in the new logging.

openshift-ci · 2026-05-13T02:40:12Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

codecov · 2026-05-13T02:44:04Z

Codecov Report

❌ Patch coverage is 73.54839% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.10%. Comparing base (bc3bda9) to head (7355f0e).
⚠️ Report is 135 commits behind head on main.

Files with missing lines	Patch %	Lines
...ollers/awsprivatelink/awsprivatelink_controller.go	73.54%	39 Missing and 2 partials ⚠️
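
As a quick check of the figures in this report, the patch percentage is consistent with 41 uncovered lines (39 missing plus 2 partials) out of 155 changed lines. The total of 155 is not stated in the report; it is an assumption inferred from the other two numbers, and the helper name `patchCoverage` is illustrative only.

```go
package main

import "fmt"

// patchCoverage returns the percentage of changed lines that are covered.
func patchCoverage(covered, total int) float64 {
	return 100 * float64(covered) / float64(total)
}

func main() {
	const missing, partials = 39, 2 // from the report
	const total = 155               // assumption: inferred, not stated in the report
	covered := total - missing - partials
	fmt.Printf("%.5f%%\n", patchCoverage(covered, total)) // prints 73.54839%
}
```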

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8499      +/-   ##
==========================================
+ Coverage   42.55%   43.10%   +0.54%     
==========================================
  Files         768      773       +5     
  Lines       95297    96302    +1005     
==========================================
+ Hits        40558    41507     +949     
+ Misses      51932    51911      -21     
- Partials     2807     2884      +77

Files with missing lines	Coverage Δ
...ollers/awsprivatelink/awsprivatelink_controller.go	`47.25% <73.54%> (+5.60%)`	⬆️

... and 52 files with indirect coverage changes

Flag	Coverage Δ
cmd-support	`36.08% <ø> (+0.62%)`	⬆️
cpo-hostedcontrolplane	`45.31% <ø> (+0.47%)`	⬆️
cpo-other	`45.54% <73.54%> (+0.83%)`	⬆️
hypershift-operator	`53.58% <ø> (+0.53%)`	⬆️
other	`31.68% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown.

hypershift-jira-solve-ci · 2026-05-25T20:07:25Z

Re: #8499 (comment)

Thanks for the walkthrough summary — the description and sequence diagram accurately capture the changes.

Regarding the docstring coverage warning: noted. This is a new controller flow with several helper functions; docstrings can be added in a follow-up if the team considers it valuable for this area.

AI-assisted response via Claude Code

bryan-cox

Staff Engineer Review: Add HCP finalizer to AWSEndpointService reconciler

The core idea of this PR is correct and addresses a real operational problem — preventing AWS PrivateLink resource orphaning when the HCP is deleted before the AWSEndpointService cleanup runs. However, the implementation diverges from the Azure PLS pattern in ways that introduce correctness risks.

Blocking Issues (3)

See inline comments for details on:

Dual deletion paths compete — existing CR deletion path and new HCP deletion path both remove the CR finalizer
Multi-CR coordination under concurrency — convergent but produces unnecessary work with MaxConcurrentReconciles: 10
UpdateFunc misses HCP deletions on controller restart — defeats the purpose of the PR

Open Questions (2)

Does the hypershift-operator's force-finalizer-removal logic (10-minute grace) know about this new aws-private-link-endpoint-cleanup finalizer? If not, the HCP could get stuck indefinitely.
The finalizer is added for ALL AWS PrivateLink clusters, not just SharedVPC. Is the broader scope intentional?

Praise

Test coverage is excellent — 784 lines of well-structured table-driven tests with gomock and client interceptors covering all new paths. The context.Background() → ctx fix in the handler is a good improvement.

bryan-cox · 2026-06-17T13:49:46Z

 			MaxConcurrentReconciles: 10,
 		}).
-		Watches(&hyperv1.HostedControlPlane{}, handler.Funcs{UpdateFunc: r.enqueueOnAccessChange(mgr)}).
+		Watches(&hyperv1.HostedControlPlane{}, handler.Funcs{UpdateFunc: r.enqueueOnHCPChange(mgr)}).


[blocking] UpdateFunc misses HCP deletions on controller restart

Using handler.Funcs{UpdateFunc: ...} means only Update events trigger this handler. If the CPO restarts while an HCP is being deleted (DeletionTimestamp already set), the informer cache sync generates a Create event — not an Update — so this handler never fires.

The Azure PLS controller avoids this by using handler.EnqueueRequestsFromMapFunc(...), which receives all event types (Create, Update, Delete) from the informer. On restart, it gets a Create event for the HCP with DeletionTimestamp set and correctly enqueues the CRs.

With the current approach, if the CPO restarts mid-HCP-deletion, the new handler will NOT fire. The reconciler would fall through to the existing AWSEndpointService CR deletion path — exactly the scenario this PR is trying to fix.

Recommendation: Switch to handler.EnqueueRequestsFromMapFunc(...) to match the Azure PLS pattern.
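
The difference between the two handler styles can be illustrated with a stdlib-only model. The types and function names below are hypothetical and do not use controller-runtime; they only mimic the behaviour described in this comment, where a handler with only an update callback ignores the Create event that the informer emits for existing objects after a restart, while a map-function handler is invoked for every event type.

```go
package main

import "fmt"

type eventType int

const (
	createEvent eventType = iota
	updateEvent
	deleteEvent
)

// hcp is a simplified stand-in for a HostedControlPlane.
type hcp struct {
	namespace    string
	deleting     bool // models a non-zero DeletionTimestamp
	hasFinalizer bool
}

// mapHCP returns one namespaced key per endpoint-service CR, but only
// when the HCP carries the cleanup finalizer.
func mapHCP(h hcp, crs []string) []string {
	if !h.hasFinalizer {
		return nil
	}
	out := make([]string, 0, len(crs))
	for _, name := range crs {
		out = append(out, h.namespace+"/"+name)
	}
	return out
}

// updateOnlyHandler models a handler with only an update callback set.
func updateOnlyHandler(ev eventType, h hcp, crs []string) []string {
	if ev != updateEvent {
		return nil // Create and Delete events are silently dropped
	}
	return mapHCP(h, crs)
}

// allEventsHandler models a map-function handler called for every event.
func allEventsHandler(_ eventType, h hcp, crs []string) []string {
	return mapHCP(h, crs)
}

func main() {
	// After a restart, the already-deleting HCP arrives as a Create event.
	h := hcp{namespace: "clusters-example", deleting: true, hasFinalizer: true}
	crs := []string{"kube-apiserver-private", "private-router"}

	fmt.Println("update-only handler:", updateOnlyHandler(createEvent, h, crs))
	fmt.Println("all-events handler: ", allEventsHandler(createEvent, h, crs))
}
```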

Done. Switched from handler.Funcs{UpdateFunc: ...} to handler.EnqueueRequestsFromMapFunc(...) matching the Azure PLS pattern. The new mapHCPToAWSEndpointService() function receives all event types (Create, Update, Delete), so on controller restart the informer cache sync's Create event now correctly triggers cleanup for an HCP with DeletionTimestamp already set.

The EndpointAccess change detection (previously via old/new comparison in UpdateFunc) is dropped from the handler — those changes are picked up by the reconciler's existing 5-minute periodic requeue, which is acceptable since EndpointAccess changes are rare operational events.

AI-assisted response via Claude Code

Done. Switched from handler.Funcs{UpdateFunc: ...} to handler.EnqueueRequestsFromMapFunc(r.mapHCPToAWSEndpointService()) matching the Azure PLS pattern exactly. The new mapHCPToAWSEndpointService() MapFunc receives all event types (Create, Update, Delete), so on controller restart the informer cache sync's Create event correctly triggers cleanup for an HCP with DeletionTimestamp already set.

The MapFunc filters by finalizer presence (controllerutil.ContainsFinalizer(hcp, hcpAWSPrivateLinkFinalizerName)) to avoid unnecessary reconciliations, matching the Azure PLS approach. EndpointAccess change detection is dropped from the handler — those changes are picked up by the reconciler's existing 5-minute periodic requeue.

Tests updated: replaced TestEnqueueOnHCPChange (which tested the old UpdateFunc) with TestMapHCPToAWSEndpointService (which tests the new MapFunc directly).

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T13:49:46Z

+	// Handle HCP deletion: clean up AWS resources while HCP credentials are still valid.
+	if !hcp.DeletionTimestamp.IsZero() {
+		return r.reconcileHCPDeletion(ctx, awsEndpointService, hcp, log)
+	}


[blocking] Dual deletion paths can compete

The existing AWSEndpointService CR deletion path (lines 466-486 in the diff) runs when the CR itself has a DeletionTimestamp and also removes the CR finalizer + calls r.delete(). This new HCP deletion path at line 534 also removes the CR finalizer + calls r.delete().

These two paths can activate simultaneously during namespace deletion or HCP ownership-based cascading. Consider:

1. HCP deletion triggers enqueueOnHCPChange, enqueuing all CRs
2. Namespace/owner cascade sets DeletionTimestamp on the CRs themselves
3. A reconcile fires for a CR that has BOTH its own DeletionTimestamp AND the HCP is being deleted
4. The CR enters the existing deletion path (step 1), which initializes from HCP and cleans up
5. Another reconcile enters this HCP deletion path

The existing CR deletion path (line 466) does return early before reaching this check, so they are technically exclusive within a single reconcile call. But with MaxConcurrentReconciles: 10, two concurrent reconciles for the same CR could race.

Suggestion: Add an explicit guard here: if !awsEndpointService.DeletionTimestamp.IsZero() { return ctrl.Result{}, nil } to make the exclusion explicit and defend against concurrent reconciles.

Done. Added explicit guard at the top of reconcileHCPDeletion:

if !awsEndpointService.DeletionTimestamp.IsZero() { return ctrl.Result{}, nil }

This makes the exclusion between the two deletion paths explicit and defends against concurrent reconciles under MaxConcurrentReconciles: 10. If the CR itself is being deleted, we defer to the existing CR deletion path at the top of Reconcile.
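
A minimal sketch of the effect of this guard, using hypothetical stand-in types rather than the real reconciler: when the CR is itself being deleted, the HCP deletion path returns before touching any AWS resources, so only one path performs cleanup.

```go
package main

import "fmt"

// endpointService is a simplified stand-in for an AWSEndpointService CR.
type endpointService struct {
	name     string
	deleting bool // models a non-zero DeletionTimestamp on the CR
}

// reconcileHCPDeletion reports whether it ran the cleanup callback.
// cleanup is a hypothetical callback that deletes the CR's AWS resources.
func reconcileHCPDeletion(cr endpointService, cleanup func() error) (bool, error) {
	if cr.deleting {
		// The CR's own deletion path owns the cleanup; do nothing here.
		return false, nil
	}
	if err := cleanup(); err != nil {
		return true, err
	}
	return true, nil
}

func main() {
	calls := 0
	cleanup := func() error { calls++; return nil }

	ran, _ := reconcileHCPDeletion(endpointService{name: "private-router", deleting: true}, cleanup)
	fmt.Println("CR deleting, cleanup ran:", ran)

	ran, _ = reconcileHCPDeletion(endpointService{name: "private-router"}, cleanup)
	fmt.Println("CR live, cleanup ran:", ran)
	fmt.Println("cleanup calls:", calls)
}
```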

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T13:49:46Z

+	// our finalizer blocks HCP deletion.
+	r.awsClientBuilder.initializeWithHCP(log, hcp)
+	ec2Client, route53Client, err := r.awsClientBuilder.getClients(ctx)
+	if err != nil {


[blocking] Multi-CR coordination needs documentation or simplification

With MaxConcurrentReconciles: 10 and enqueueOnHCPChange enqueuing ALL CRs, multiple reconcilers race through reconcileHCPDeletion concurrently. Each one cleans up its own CR, then checks if all others are done. The last one to finish removes the HCP finalizer, while earlier finishers return RequeueAfter: 5s and re-enter this path only to find the HCP finalizer already removed.

This convergent pattern is functionally correct, but:

It produces unnecessary requeues and reconcile loops

It is not documented, making it hard for future maintainers to reason about

The Azure PLS controller avoids this entirely because it has MaxConcurrentReconciles: 1 and only one CR per namespace

Suggestion: At minimum, add a comment explaining the convergent behavior. Alternatively, consider having only the CR whose cleanup triggers len(pendingCRs) == 0 remove the HCP finalizer, and have all others simply return ctrl.Result{} after their own cleanup.

Done. Added comprehensive documentation on the reconcileHCPDeletion function explaining the convergent multi-CR coordination pattern:

Multiple reconcilers run concurrently (one per AWSEndpointService CR)

Each cleans up its own CR, then checks if all CRs are done

Only the last reconciler to finish removes the HCP finalizer

Earlier finishers see pending CRs, return RequeueAfter, and on re-entry find the finalizer already removed

The comment explains this produces a small number of no-op requeues but is correct and self-healing. This is functionally similar to how the Azure PLS controller works, but documented explicitly because the AWS controller has MaxConcurrentReconciles: 10 and multiple CRs per namespace (unlike Azure PLS's MaxConcurrentReconciles: 1 with one CR per namespace).

AI-assisted response via Claude Code

Done. Added comprehensive documentation on reconcileHCPDeletion explaining the convergent multi-CR coordination pattern:

Multiple reconcilers run concurrently (one per AWSEndpointService CR) under MaxConcurrentReconciles: 10

Each cleans up its own CR's AWS resources, removes the CR finalizer, then checks if all CRs are done

The last reconciler to finish (seeing len(pendingCRs) == 0) removes the HCP finalizer

Earlier finishers see pending CRs, return RequeueAfter, and on re-entry find the HCP finalizer already removed

The comment explicitly contrasts this with the Azure PLS controller (MaxConcurrentReconciles: 1, one CR per namespace) to explain why this convergent pattern is necessary for the AWS controller.
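
The convergent behaviour described here can be simulated with goroutines. This is a simplified model under stated assumptions: the `cluster` type stands in for API-server state, each goroutine stands in for one reconcile of one CR, and none of the names come from the actual controller. Whichever worker checks last always observes zero pending CRs, so the HCP finalizer is removed after one pass even though the number of requeues varies from run to run.

```go
package main

import (
	"fmt"
	"sync"
)

// cluster models the shared state that the reconcilers read and write.
type cluster struct {
	mu           sync.Mutex
	pending      map[string]bool // CRs that still hold their own finalizer
	hcpFinalizer bool
}

func (c *cluster) finishCR(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pending, name)
}

func (c *cluster) pendingCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.pending)
}

func (c *cluster) removeHCPFinalizer() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hcpFinalizer = false // idempotent: safe if two workers both get here
}

// reconcileHCPDeletion cleans up one CR and reports whether to requeue.
func reconcileHCPDeletion(c *cluster, name string) bool {
	c.finishCR(name)
	if c.pendingCount() > 0 {
		return true // other CRs are still pending
	}
	c.removeHCPFinalizer()
	return false
}

// runAll reconciles every CR concurrently and returns the requeue count.
func runAll(c *cluster, names []string) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	requeues := 0
	for _, n := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if reconcileHCPDeletion(c, n) {
				mu.Lock()
				requeues++
				mu.Unlock()
			}
		}(n)
	}
	wg.Wait()
	return requeues
}

func main() {
	names := []string{"kube-apiserver-private", "private-router", "extra"}
	c := &cluster{pending: map[string]bool{}, hcpFinalizer: true}
	for _, n := range names {
		c.pending[n] = true
	}
	requeues := runAll(c, names)
	fmt.Println("HCP finalizer still present:", c.hcpFinalizer)
	fmt.Println("no-op requeues (varies per run):", requeues)
}
```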

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T13:49:46Z

+	controllerutil.AddFinalizer(hcp, hcpAWSPrivateLinkFinalizerName)
+	if err := r.Patch(ctx, hcp, client.MergeFromWithOptions(originalHCP, client.MergeFromWithOptimisticLock{})); err != nil {
+		if apierrors.IsConflict(err) {
+			return ctrl.Result{Requeue: true}, nil


[suggestion] Use RequeueAfter: time.Second instead of Requeue: true on conflicts

The Azure PLS equivalent returns ctrl.Result{RequeueAfter: time.Second} on conflict (see controller.go line 371). Using Requeue: true risks a tight retry loop under contention when multiple AWSEndpointService reconcilers are concurrently trying to patch the same HCP.

Same applies to the conflict handling in ensureHCPFinalizer (line 558).

Done. Changed both conflict-handling sites to ctrl.Result{RequeueAfter: time.Second}:

ensureHCPFinalizer (adding finalizer)

reconcileHCPDeletion (removing finalizer)

Both now include a comment explaining the rationale: avoiding tight retry loops when multiple AWSEndpointService reconcilers concurrently try to patch the same HCP.

Note: the Azure PLS controller also uses Requeue: true for conflicts (controller.go line 371), so this change makes the AWS controller stricter than Azure PLS.

AI-assisted response via Claude Code

Done. Changed both conflict-handling sites to ctrl.Result{RequeueAfter: time.Second}:

ensureHCPFinalizer (adding finalizer)

reconcileHCPDeletion (removing finalizer)

Both now include a comment explaining the rationale: avoiding tight retry loops when multiple AWSEndpointService reconcilers concurrently try to patch the same HCP. Tests updated to assert RequeueAfter > 0 instead of Requeue == true.
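
The conflict handling can be sketched without Kubernetes dependencies. In this model `errConflict` stands in for an error that the API machinery would classify as a conflict, and `result` mimics the shape of a reconcile result; both are hypothetical, as is the `ensureFinalizer` signature. A conflict becomes a delayed retry with no error, and any other failure is returned to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	// errConflict stands in for an optimistic-lock conflict from the API server.
	errConflict = errors.New("conflict: object was modified")
	// errUnavailable is an arbitrary non-conflict failure used for the demo.
	errUnavailable = errors.New("server unavailable")
)

// result is a simplified stand-in for a reconcile result.
type result struct {
	RequeueAfter time.Duration
}

// ensureFinalizer applies patch and translates a conflict into a delayed retry.
func ensureFinalizer(patch func() error) (result, error) {
	if err := patch(); err != nil {
		if errors.Is(err, errConflict) {
			// Back off briefly instead of retrying immediately, so that
			// concurrent reconcilers patching the same HCP do not spin.
			return result{RequeueAfter: time.Second}, nil
		}
		return result{}, fmt.Errorf("failed to patch HCP finalizer: %w", err)
	}
	return result{}, nil
}

func main() {
	res, err := ensureFinalizer(func() error { return errConflict })
	fmt.Println("conflict:", res.RequeueAfter, err)

	res, err = ensureFinalizer(func() error { return errUnavailable })
	fmt.Println("other error:", res.RequeueAfter, err)
}
```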

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T13:49:46Z

-		// TestReconcileDeletionSharedVPC for details.
+		// The HCP finalizer (hcpAWSPrivateLinkFinalizerName) added during normal
+		// reconciliation ensures the HCP remains available during this cleanup.
+		// For SharedVPC clusters, this guarantees the cross-account role ARNs can


[suggestion] Comment overstates the guarantee

This comment claims the HCP finalizer "ensures the HCP remains available during this cleanup." That is only true after a successful normal reconciliation has added the finalizer. If a cluster is newly created and the controller has not yet reconciled (e.g., controller was down), the HCP can still be deleted before the AWSEndpointService cleanup runs — the old scenario.

Consider acknowledging this edge case rather than stating the guarantee unconditionally.

Done. Updated the comment to acknowledge the edge case. The new wording states that the finalizer "when present, blocks HCP deletion" and explicitly notes that it's only added after a successful normal reconciliation — if the controller hasn't reconciled yet (e.g., was down since cluster creation), the HCP may be deleted before the finalizer is placed, and the best-effort initialization is the only protection in that case.

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T13:49:46Z

+		}
+
+		// Enqueue when EndpointAccess changes (existing behavior).
 		if newHCP.Spec.Platform.AWS != nil && oldHCP.Spec.Platform.AWS != nil && newHCP.Spec.Platform.AWS.EndpointAccess != oldHCP.Spec.Platform.AWS.EndpointAccess {


[suggestion] Filter deletion trigger to transition only

Once the HCP finalizer is added, ANY HCP update with a DeletionTimestamp will re-enqueue all CRs. During HCP deletion, status updates from other controllers will repeatedly trigger this, producing unnecessary list+enqueue cycles.

Consider adding oldHCP.DeletionTimestamp.IsZero() to the condition so it only fires on the transition to deletion:

if oldHCP.DeletionTimestamp.IsZero() && !newHCP.DeletionTimestamp.IsZero() && controllerutil.ContainsFinalizer(newHCP, hcpAWSPrivateLinkFinalizerName) {
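
To make the suggested condition concrete, the following stdlib-only sketch counts how many enqueues each predicate produces over a sequence of HCP updates. The `hcpState` type and function names are hypothetical; the sequence models one deletion request followed by three status-only updates.

```go
package main

import "fmt"

// hcpState is a simplified stand-in for the fields the predicate inspects.
type hcpState struct {
	deleting     bool // models a non-zero DeletionTimestamp
	hasFinalizer bool
}

// enqueueOnAnyDeletingUpdate fires on every update of a deleting HCP.
func enqueueOnAnyDeletingUpdate(_, newHCP hcpState) bool {
	return newHCP.deleting && newHCP.hasFinalizer
}

// enqueueOnTransition fires only when deletion has just started.
func enqueueOnTransition(oldHCP, newHCP hcpState) bool {
	return !oldHCP.deleting && newHCP.deleting && newHCP.hasFinalizer
}

// countEnqueues replays consecutive (old, new) pairs through pred.
func countEnqueues(states []hcpState, pred func(oldHCP, newHCP hcpState) bool) int {
	n := 0
	for i := 1; i < len(states); i++ {
		if pred(states[i-1], states[i]) {
			n++
		}
	}
	return n
}

func main() {
	live := hcpState{hasFinalizer: true}
	deleting := hcpState{deleting: true, hasFinalizer: true}
	states := []hcpState{live, deleting, deleting, deleting, deleting}

	fmt.Println(countEnqueues(states, enqueueOnAnyDeletingUpdate)) // prints 4
	fmt.Println(countEnqueues(states, enqueueOnTransition))        // prints 1
}
```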

Addressed by switching to `EnqueueRequestsFromMapFunc` (per comment 1). With the new `mapHCPToAWSEndpointService()` MapFunc, we can no longer detect the deletion transition (old vs new DeletionTimestamp) because MapFunc only receives the current object, not old/new.

However, the filtering is achieved differently: the MapFunc only fires when the HCP has our finalizer (`controllerutil.ContainsFinalizer(hcp, hcpAWSPrivateLinkFinalizerName)`). Once all CRs are cleaned up and the HCP finalizer is removed, subsequent HCP updates no longer trigger CR enqueues. During the short deletion window, the repeated enqueues from status updates are harmless since the reconciler is idempotent — CRs that are already cleaned up return early. This matches the Azure PLS pattern exactly.

AI-assisted response via Claude Code

hypershift-jira-solve-ci · 2026-06-17T14:47:09Z

Re: #8499 (review)

@bryan-cox Thank you for the thorough staff engineer review. All blocking issues and suggestions have been addressed — see inline replies for details on each change.

Summary of changes

#	Comment	Action
1	[blocking] UpdateFunc misses HCP deletions on restart	Switched to `EnqueueRequestsFromMapFunc` matching Azure PLS pattern
2	[blocking] Dual deletion paths can compete	Added `DeletionTimestamp.IsZero()` guard in `reconcileHCPDeletion`
3	[blocking] Multi-CR coordination needs documentation	Added comprehensive convergent-pattern docs on `reconcileHCPDeletion`
4	[suggestion] Use RequeueAfter on conflicts	Changed to `RequeueAfter: time.Second` in both conflict handlers
5	[suggestion] Comment overstates guarantee	Acknowledged edge case where finalizer isn't yet placed
6	[suggestion] Filter deletion trigger	Handled via finalizer-based filtering in new MapFunc (matches Azure PLS)

Answers to open questions

Q: Does the hypershift-operator's force-finalizer-removal logic know about this new finalizer?
The HO's deleteAWSEndpointServices function handles the CPO finalizer on AWSEndpointService CRs (the 10-minute grace period), not HCP-level finalizers. The new aws-private-link-endpoint-cleanup finalizer is on the HCP itself. There is no HO-side force-removal for this finalizer — it relies on the CPO being healthy to clean up and remove it. This is the same design as the Azure PLS controller's azure-pls-endpoint-cleanup HCP finalizer, which also has no HO-side fallback. If the CPO is permanently down, the HCP would get stuck. For production safety, this could be addressed separately by adding HCP-level finalizer timeout logic to the HO, but that would be a cross-platform concern affecting both AWS and Azure.

Q: Is the broader scope (all AWS PrivateLink clusters, not just SharedVPC) intentional?
Yes, intentional. While the orphan risk is most acute for SharedVPC clusters (where AWS clients fail entirely without the HCP's role ARNs), non-SharedVPC clusters also benefit: the finalizer ensures the clientBuilder can always be re-initialized from the HCP on controller restart during deletion, rather than relying on the best-effort initialization path. The Azure PLS controller applies its HCP finalizer to all Azure PLS clusters regardless of configuration, which is the pattern we're following.

AI-assisted response via Claude Code

hypershift-jira-solve-ci · 2026-06-17T14:53:58Z

Re: #8499 (review)

Thank you for the thorough staff engineer review, @bryan-cox. All 6 inline comments have been addressed:

Blocking Issues (3) — All Fixed

UpdateFunc misses HCP deletions on controller restart — Switched from handler.Funcs{UpdateFunc: ...} to handler.EnqueueRequestsFromMapFunc(r.mapHCPToAWSEndpointService()) matching the Azure PLS pattern. Create events from informer cache sync now correctly trigger cleanup.
Dual deletion paths can compete — Added explicit DeletionTimestamp guard at the top of reconcileHCPDeletion. If the CR itself is being deleted, we defer to the existing CR deletion path.
Multi-CR coordination needs documentation — Added comprehensive comment block on reconcileHCPDeletion documenting the convergent pattern: multiple reconcilers run concurrently, each cleans up its own CR, and the last one to finish removes the HCP finalizer. Explicitly contrasts with Azure PLS's simpler model.

Suggestions (3) — All Fixed

Use RequeueAfter: time.Second on conflicts — Changed both ensureHCPFinalizer and reconcileHCPDeletion conflict handlers. Tests updated accordingly.
Comment overstates the guarantee — Updated comment to say "when present, blocks HCP deletion" and acknowledges the controller-hasn't-reconciled-yet edge case.
Filter deletion trigger to transition only — Handled by the switch to EnqueueRequestsFromMapFunc. The MapFunc filters by finalizer presence, and once the finalizer is removed, no more enqueues occur.

All tests pass. Ready for re-review.

AI-assisted response via Claude Code

bryan-cox · 2026-06-17T15:12:20Z

/test e2e-aws

bryan-cox · 2026-06-18T10:14:39Z

e2e-aws evidence that the HCP finalizer is working

Job: pull-ci-openshift-hypershift-main-e2e-aws/2067264178355179520

Finalizer added during reconciliation (CPO logs)

Both private cluster tests show the CPO logging "Adding HCP finalizer for AWS PrivateLink cleanup":

TestCreateClusterPrivate — CPO log (grep for Adding HCP finalizer):

{"level":"info","ts":"2026-06-17T17:02:21Z","msg":"Adding HCP finalizer for AWS PrivateLink cleanup","controller":"awsendpointservice","AWSEndpointService":{"name":"kube-apiserver-private","namespace":"e2e-clusters-5w84p-private-xcn67"}}

TestCreateClusterPrivateWithRouteKAS — CPO log (grep for Adding HCP finalizer):

{"level":"info","ts":"2026-06-17T17:03:11Z","msg":"Adding HCP finalizer for AWS PrivateLink cleanup","controller":"awsendpointservice","AWSEndpointService":{"name":"private-router","namespace":"e2e-clusters-5jz4x-private-22vkq"}}

Finalizer present on HCP resources (YAML dumps)

All three HCPs in the dump have hypershift.openshift.io/aws-private-link-endpoint-cleanup in their finalizers:

private-xcn67.yaml (TestCreateClusterPrivate)
private-22vkq.yaml (TestCreateClusterPrivateWithRouteKAS)
create-cluster-4wcxx.yaml (TestCreateCluster — public endpoint, confirms broad scope)

Clean teardown (destroy logs)

Both private cluster tests destroyed successfully with no errors — the finalizer did not block teardown:

TestCreateClusterPrivate/destroy.log — "Successfully destroyed cluster and infrastructure"
TestCreateClusterPrivateWithRouteKAS/destroy.log — "Successfully destroyed cluster and infrastructure"

Limitation

The reconcileHCPDeletion path ("HCP is being deleted" / "All AWS PrivateLink resources cleaned up, removing HCP finalizer") is not captured in the dump because the dump is collected pre-destroy and the CPO pod is terminated during destroy. The clean teardown is indirect evidence the finalizer was properly removed. The deletion path is covered by unit tests.

openshift-merge-bot · 2026-06-18T10:18:48Z

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

openshift-ci · 2026-06-18T10:18:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox, hypershift-jira-solve-ci[bot]

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [bryan-cox]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

bryan-cox · 2026-06-18T10:19:30Z

/lgtm cancel

bryan-cox · 2026-06-24T13:45:52Z

/rebase

github-actions · 2026-06-24T13:46:05Z

🤖 Rebasing PR onto main: workflow run

bryan-cox · 2026-06-24T14:28:02Z

lint is failing

bryan-cox · 2026-06-24T14:28:11Z

/address-review-comments

github-actions · 2026-06-24T14:28:20Z

🤖 Addressing review comments: workflow run

hypershift-jira-solve-ci · 2026-06-24T14:45:41Z

Now I have a complete understanding. Here's my analysis:

Test Failure Analysis Complete

Job Information

Prow Job: lint / Lint (GitHub Actions reusable workflow lint-reusable.yaml)
Build ID: 28105142093 / Job 83217226103
PR: CNTRLPLANE-507: Add HCP finalizer to AWSEndpointService reconciler (#8499)
Step Failed: Run make lint (golangci-lint via gocyclo linter)

Test Failure Analysis

Error

control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go:435:1:
cyclomatic complexity 31 of func `(*AWSEndpointServiceReconciler).Reconcile` is high (> 30) (gocyclo)

Summary

The golangci-lint gocyclo linter reports that the Reconcile method in awsprivatelink_controller.go has a cyclomatic complexity of 31, exceeding the configured maximum of 30. The PR added two new branches inside Reconcile: an HCP deletion check (if !hcp.DeletionTimestamp.IsZero()) and a call that ensures the HCP finalizer (if result, err := r.ensureHCPFinalizer(...)). Together they pushed a function that was already at or near the limit to 31.

Root Cause

The PR adds HCP finalizer lifecycle management directly into the Reconcile function body at line 435 of awsprivatelink_controller.go. Specifically, two new if branches were inserted:

HCP deletion check (around line 533): if !hcp.DeletionTimestamp.IsZero() { return r.reconcileHCPDeletion(...) }, which adds 1 decision point.
HCP finalizer check (around line 549): if result, err := r.ensureHCPFinalizer(...); err != nil || !result.IsZero() { return result, err }, which adds 1–2 decision points (the || counts as an additional branch).

While the new helper functions (ensureHCPFinalizer, reconcileHCPDeletion, mapHCPToAWSEndpointService) are properly extracted into separate methods, the dispatch logic inside Reconcile itself still adds branching. The Reconcile function was already at or near the complexity limit of 30 before this PR, and these additions pushed it to 31.

The project's .golangci.yml configures gocyclo with a max complexity of 30. The main (non-API) lint run processed 1,077 raw issues down to exactly 1 — this single gocyclo violation — causing golangci-lint to exit with code 1 and make lint to fail with exit code 1 (escalated to exit code 2 by the shell).

Recommendations

Extract early-return blocks from Reconcile into a helper: Move the HCP-lookup, deletion-timestamp-check, paused-check, and client-initialization block into a single setup/dispatch helper (e.g., reconcileSetup or prepareReconciliation) that returns the HCP, clients, and whether to short-circuit. This would remove 3–5 decision points from Reconcile.
Consolidate the deletion-path dispatch: The Reconcile function currently has two separate deletion code paths — one for the AWSEndpointService CR deletion (early in the function) and one for HCP deletion (newly added). Consider unifying these into a single handleDeletion dispatch at the top of Reconcile that routes to the appropriate cleanup path, reducing the top-level branching.
Simplest fix: extract the new code into an existing helper. Move the HCP deletion check and the ensureHCPFinalizer call into a reconcileNonDeletion helper that wraps the normal reconciliation path. The Reconcile function would become: handle CR deletion → call reconcileNonDeletion. This is the minimal change to stay under the complexity limit.
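The third recommendation can be illustrated with a reduced model. The sketch below assumes a hypothetical `state` struct in place of the real objects and returns the name of the path taken instead of doing any work. Only the helper name `reconcileNonDeletion` comes from the recommendation above.

```go
package main

// state is a hypothetical summary of what Reconcile inspects.
type state struct {
	CRDeleting      bool
	HCPDeleting     bool
	HasHCPFinalizer bool
}

// reconcile keeps a single branch of its own. gocyclo counts complexity per
// function, so the branches below no longer add to this function's total.
func reconcile(s state) string {
	if s.CRDeleting {
		return "cr-deletion"
	}
	return reconcileNonDeletion(s)
}

// reconcileNonDeletion owns the HCP deletion check and the finalizer check.
func reconcileNonDeletion(s state) string {
	if s.HCPDeleting {
		return "hcp-deletion"
	}
	if !s.HasHCPFinalizer {
		return "add-hcp-finalizer"
	}
	return "reconcile-endpoint"
}
```

gocyclo starts each function at 1 and adds 1 for every `if`, `for`, `case`, `&&`, and `||`, so in this model `reconcile` scores 2 and `reconcileNonDeletion` scores 3.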

Evidence

Evidence	Detail
Lint error	`cyclomatic complexity 31 of func (*AWSEndpointServiceReconciler).Reconcile is high (> 30) (gocyclo)`
File	`control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go:435:1`
Linter	`gocyclo` (via golangci-lint v2.11.4)
Threshold	Max cyclomatic complexity: 30 (configured in `.golangci.yml`)
Violation amount	Complexity 31 — exceeds limit by 1
New branches added	`!hcp.DeletionTimestamp.IsZero()` check + `ensureHCPFinalizer` error/result check with `||`
Issues processed	1,077 raw → 1 after filtering (this single violation)
Exit code	`make lint` exited with code 1 → shell exit code 2
PR files changed	`awsprivatelink_controller.go` (+219/-33), `awsprivatelink_controller_test.go` (+736/-0)

cblecker

Good progress on the HCP finalizer approach. The convergent multi-CR coordination pattern is well-designed and well-documented. One critical ordering issue below that can deadlock HCP deletion.

cblecker · 2026-06-24T17:55:08Z

+		}
+
+		controllerutil.RemoveFinalizer(awsEndpointService, finalizer)
+		if err := r.Update(ctx, awsEndpointService); err != nil {


nit: Since you're already using Patch + MergeFromWithOptimisticLock + explicit conflict handling for the HCP finalizer removal at line 706, consider using the same approach here for consistency within reconcileHCPDeletion. The awsEndpointService object is the one fetched at Reconcile entry and could be stale after the AWS cleanup operations above.

Done. Switched to Patch + MergeFromWithOptimisticLock with explicit conflict handling, consistent with the HCP finalizer removal pattern.

AI-assisted response via Claude Code

The fix was applied to the AES finalizer removal within reconcileHCPDeletion (line 683) but the original CR deletion path here still uses r.Update. This path has the same concurrent-reconciler race window — the AES object is fetched at line 448, then AWS cleanup runs before we reach this Update at line 494.

Done. Switched the original CR deletion path to use Patch + MergeFromWithOptimisticLock with explicit conflict handling, matching the pattern in reconcileHCPDeletion. Also extracted the CR deletion logic into a reconcileCRDeletion helper to reduce Reconcile cyclomatic complexity.

AI-assisted response via Claude Code
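To make the stale-object concern in this thread concrete, here is a small in-memory model of an optimistic-lock patch. It is a sketch under stated assumptions: `store`, `object`, `errConflict`, and `removeFinalizer` are hypothetical, and the version check stands in for what the API server does when a patch carries a resourceVersion precondition.

```go
package main

import (
	"errors"
	"time"
)

// errConflict is a hypothetical stand-in for an API conflict error.
var errConflict = errors.New("conflict: object was modified")

// conflictRequeueDelay mirrors the one-second requeue discussed in the review.
const conflictRequeueDelay = time.Second

type object struct {
	ResourceVersion int
	Finalizers      []string
}

// store models the API server's copy of a single object.
type store struct{ current object }

// patchFinalizers applies the new list only if the caller's copy is based on
// the stored version; otherwise it reports a conflict and changes nothing.
func (s *store) patchFinalizers(base object, finalizers []string) error {
	if base.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	s.current.Finalizers = finalizers
	s.current.ResourceVersion++
	return nil
}

type result struct{ RequeueAfter time.Duration }

// removeFinalizer drops one finalizer. A conflict is not treated as an
// error: the caller is asked to retry after a short delay.
func removeFinalizer(s *store, cached object, name string) (result, error) {
	kept := make([]string, 0, len(cached.Finalizers))
	for _, f := range cached.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	err := s.patchFinalizers(cached, kept)
	if errors.Is(err, errConflict) {
		return result{RequeueAfter: conflictRequeueDelay}, nil
	}
	return result{}, err
}
```

A copy fetched before a long-running cleanup may carry an old version. In this model that case is rejected and retried rather than overwriting newer state.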

cblecker · 2026-06-24T17:55:53Z

-		return ctrl.Result{}, fmt.Errorf("unexpected number of HostedControlPlanes in namespace, expected: 1, actual: %d", len(hcpList.Items))
+
+	// Handle HCP deletion: clean up AWS resources while HCP credentials are still valid.
+	if !hcp.DeletionTimestamp.IsZero() {


The CR-level finalizer is added unconditionally at line 502 (before the serviceName check at line 512), but this HCP deletion check is only reachable when serviceName != "" — the early return at line 512-516 blocks entry.

During HCP deletion, reconcileHCPDeletion lists all CRs in the namespace (line 693-694) and keeps the HCP finalizer if any CR still has the CR finalizer. A CR that was created but not yet populated with EndpointServiceName by hypershift-operator will have the CR finalizer (added at line 502) but can never reach this HCP deletion check — the serviceName guard returns first. This permanently blocks HCP deletion.

Consider either:

Moving this HCP fetch + deletion check before the serviceName guard (line 512), so CRs without serviceName can still enter the HCP deletion cleanup path, or

Moving the CR finalizer addition (lines 502-510) to after the serviceName check, so CRs without serviceName don't get the finalizer and don't block the HCP finalizer removal.

Done. Moved the CR finalizer addition to after the serviceName check so CRs not yet populated by hypershift-operator don't get a finalizer that would block HCP deletion.

AI-assisted response via Claude Code
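The ordering fix can be sketched as follows. The types and the finalizer string are hypothetical. The point is only that a CR without a service name never receives the CR finalizer, so it is never counted as pending when the HCP finalizer is about to be removed.

```go
package main

// crFinalizer is a hypothetical name used only in this sketch.
const crFinalizer = "example.test/endpoint-cleanup"

// endpointService is a hypothetical stand-in for the AWSEndpointService CR.
type endpointService struct {
	ServiceName string // empty until the service name is populated
	Finalizers  []string
}

func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// ensureCRFinalizer adds the finalizer only after the serviceName guard has
// passed. It reports whether reconciliation may continue.
func ensureCRFinalizer(es *endpointService) bool {
	if es.ServiceName == "" {
		return false // nothing to clean up yet, so no finalizer
	}
	if !hasFinalizer(es.Finalizers, crFinalizer) {
		es.Finalizers = append(es.Finalizers, crFinalizer)
	}
	return true
}

// pendingCRs counts CRs that still hold the CR finalizer. The HCP finalizer
// may be removed only when this reaches zero.
func pendingCRs(all []endpointService) int {
	n := 0
	for _, es := range all {
		if hasFinalizer(es.Finalizers, crFinalizer) {
			n++
		}
	}
	return n
}
```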

bryan-cox · 2026-06-24T18:04:07Z

/address-review-comments

github-actions · 2026-06-24T18:04:14Z

🤖 Addressing review comments: workflow run

cblecker

Minor consistency note (not in the diff, so noting here): Line 515 still uses Requeue: true for the CR finalizer addition conflict, while all the new HCP finalizer operations (L612, L685, L716) use RequeueAfter: time.Second. Since this line was reorganized in this PR (moved after the serviceName check), it'd be a good opportunity to align it with the rest of the conflict handling in this file.

cblecker · 2026-06-24T18:40:55Z

+				return mockBuilder
+			},
+			clientInterceptors: interceptor.Funcs{
+				Patch: func(ctx context.Context, c crclient.WithWatch, obj crclient.Object, patch crclient.Patch, opts ...crclient.PatchOption) error {


This interceptor catches all Patch calls, not just HCP patches. It works because the test AES has no CR finalizer so the AES Patch at L683 is never reached — but if someone adds a finalizer to this test fixture later, the test would pass for the wrong reason. Consider filtering by type to match the pattern at L2232:

if _, ok := obj.(*hyperv1.HostedControlPlane); ok {
	return apierrors.NewConflict(...)
}
return c.Patch(ctx, obj, patch, opts...)

Done. Added HCP type filter to the patch interceptor, consistent with the AES interceptor pattern at L2232.

AI-assisted response via Claude Code

cblecker · 2026-06-24T18:41:04Z

+	// path (which returns early at the top of Reconcile) handle cleanup. This prevents
+	// the two deletion paths from racing under MaxConcurrentReconciles > 1, where the
+	// CR deletion path and HCP deletion path could both try to clean up the same CR.
+	if !awsEndpointService.DeletionTimestamp.IsZero() {


This guard is the main defense against the dual-deletion race (HCP deletion path vs CR deletion path under concurrent reconciliation), but none of the TestReconcileHCPDeletion test cases exercise it — all AES objects are created without DeletionTimestamp. A test case with a DeletionTimestamp-bearing AES (and a CR finalizer) would protect this guard against accidental removal. The bare mock (no expectations) would also catch any leaked AWS client calls, acting as a double safety net.

Done. Added test case "When AWSEndpointService is being deleted it should return early and let the CR deletion path handle cleanup" with a DeletionTimestamp-bearing AES that has the CR finalizer. The bare mock (no expectations) catches any leaked AWS client calls.

AI-assisted response via Claude Code

bryan-cox · 2026-06-25T11:06:07Z

/address-review-comments

github-actions · 2026-06-25T11:06:15Z

🤖 Addressing review comments: workflow run

github-actions · 2026-06-25T11:24:29Z

Minor consistency note (not in the diff, so noting here): Line 515 still uses Requeue: true for the CR finalizer addition conflict, while all the new HCP finalizer operations (L612, L685, L716) use RequeueAfter: time.Second. Since this line was reorganized in this PR (moved after the serviceName check), it'd be a good opportunity to align it with the rest of the conflict handling in this file.

Done. Aligned the CR finalizer conflict handling at L515 to use RequeueAfter: time.Second, consistent with all HCP finalizer operations in this file.

AI-assisted response via Claude Code

cblecker

Pre-existing nit (F2): In setFromHCP (line ~368), the if branch sets three fields (assumeSharedVPCEndpointRoleARN, assumeSharedVPCRoute53RoleARN, localZoneID) but the else branch only clears the two role ARNs — localZoneID is left stale. Unlikely to matter in practice (SharedVPC config isn't removed from a live HCP), but asymmetric cleanup is easy to fix: add b.localZoneID = "" at line 370.

cblecker · 2026-06-25T21:53:40Z

+
+		originalAES := awsEndpointService.DeepCopy()
+		controllerutil.RemoveFinalizer(awsEndpointService, finalizer)
+		if err := r.Patch(ctx, awsEndpointService, client.MergeFromWithOptions(originalAES, client.MergeFromWithOptimisticLock{})); err != nil {


The conflict handler here (returning RequeueAfter: time.Second) doesn't have a corresponding test case. The existing "removing AWSEndpointService finalizer fails" test at line 2237 uses fmt.Errorf, which exercises the generic error path (line 687), not the conflict path. The symmetric HCP Patch conflict is tested at line 2296.

Worth adding a test case with apierrors.NewConflict for AES objects — assert RequeueAfter == time.Second and err == nil.

Done. Added a test case "When removing AWSEndpointService finalizer returns conflict error it should requeue" to TestReconcileHCPDeletionClientErrors — it returns apierrors.NewConflict for AES Patch calls and asserts RequeueAfter == time.Second with err == nil.

AI-assisted response via Claude Code

cblecker · 2026-06-25T21:53:45Z

+// and return early at the top of this function. This convergent pattern produces a small
+// number of no-op requeues but is correct and self-healing.
+func (r *AWSEndpointServiceReconciler) reconcileHCPDeletion(ctx context.Context, awsEndpointService *hyperv1.AWSEndpointService, hcp *hyperv1.HostedControlPlane, log logr.Logger) (ctrl.Result, error) {
+	if !controllerutil.ContainsFinalizer(hcp, hcpAWSPrivateLinkFinalizerName) {


Consider adding a defensive guard at the top: if hcp.DeletionTimestamp.IsZero() { return ctrl.Result{}, nil }. The caller checks this at line 530, but the function name implies the precondition without enforcing it. If a future call site skips the check, this would clean up resources on a live cluster.

Done. Added if hcp.DeletionTimestamp.IsZero() { return ctrl.Result{}, nil } guard at the top of reconcileHCPDeletion to enforce the precondition explicitly.

AI-assisted response via Claude Code

+1 on the guard. One follow-up: the diff version included a "When HCP is not being deleted it should return early" test case that exercised this guard, but it appears to have been dropped during a rebase. All current TestReconcileHCPDeletion cases construct the HCP with DeletionTimestamp: &now, so this guard path has no coverage. Worth adding back to protect it from accidental removal.

Done. Added "When HCP is not being deleted it should return early" test case to TestReconcileHCPDeletion that constructs the HCP without DeletionTimestamp, exercising the hcp.DeletionTimestamp.IsZero() guard.

AI-assisted response via Claude Code
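The two guards discussed in this review (the HCP precondition and the early return for a CR that is itself being deleted) can be modeled as below. All names are hypothetical, and `cleanup` stands in for the AWS resource deletion so that a test can count calls, much like the bare mock mentioned above.

```go
package main

// controlPlane and endpointService are hypothetical stand-ins.
type controlPlane struct {
	Deleting     bool // models a non-zero DeletionTimestamp
	HasFinalizer bool
}

type endpointService struct {
	Deleting     bool
	HasFinalizer bool
}

// reconcileHCPDeletion reports whether it performed cleanup for this CR.
func reconcileHCPDeletion(cp controlPlane, es *endpointService, cleanup func()) bool {
	// Precondition guard: never clean up resources of a live cluster.
	if !cp.Deleting {
		return false
	}
	if !cp.HasFinalizer {
		return false
	}
	// A CR with its own deletion in progress is left to the CR deletion
	// path, so the two paths never clean up the same CR.
	if es.Deleting {
		return false
	}
	if es.HasFinalizer {
		cleanup()
		es.HasFinalizer = false
		return true
	}
	return false
}
```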

cblecker · 2026-06-26T02:52:37Z

/address-review-comments

github-actions · 2026-06-26T02:52:45Z

🤖 Addressing review comments: workflow run

github-actions

@cblecker Re: the pre-existing nit in setFromHCP — Done. Added b.localZoneID = "" in the else branch to symmetrically clear all three fields.

AI-assisted response via Claude Code
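A reduced version of the symmetric cleanup looks like this. The three builder field names are taken from the review comment. The `sharedVPC` input struct and the method signature are assumptions made for the sketch, since the real method reads from the HCP.

```go
package main

// sharedVPC is a hypothetical input carrying the SharedVPC settings.
type sharedVPC struct {
	EndpointRoleARN string
	Route53RoleARN  string
	LocalZoneID     string
}

// clientBuilder keeps only the three fields named in the review comment.
type clientBuilder struct {
	assumeSharedVPCEndpointRoleARN string
	assumeSharedVPCRoute53RoleARN  string
	localZoneID                    string
}

// setFromHCP sets all three fields when SharedVPC is configured and clears
// all three when it is not, so no stale value survives.
func (b *clientBuilder) setFromHCP(cfg *sharedVPC) {
	if cfg != nil {
		b.assumeSharedVPCEndpointRoleARN = cfg.EndpointRoleARN
		b.assumeSharedVPCRoute53RoleARN = cfg.Route53RoleARN
		b.localZoneID = cfg.LocalZoneID
	} else {
		b.assumeSharedVPCEndpointRoleARN = ""
		b.assumeSharedVPCRoute53RoleARN = ""
		b.localZoneID = ""
	}
}
```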

cblecker

Good pattern overall — the convergent multi-CR coordination is well-designed and the switch to EnqueueRequestsFromMapFunc correctly fixes the controller-restart scenario. A few ordering issues in the reconcile flow and some tests from the diff that appear to have been lost during rebasing.

cblecker · 2026-07-04T03:04:33Z

+
+	// Initialize AWS clients from the HCP — guaranteed to be available because
+	// our finalizer blocks HCP deletion.
+	r.awsClientBuilder.initializeWithHCP(log, hcp)


initializeWithHCP and getClients are called unconditionally here, before the CR finalizer check at L687. If getClients fails (transient STS/credential issue), the error return at L683 prevents the pending-CRs check from ever being reached — blocking HCP finalizer removal even when all CRs are already cleaned up and no AWS API calls are needed.

Moving the client initialization inside the if controllerutil.ContainsFinalizer(awsEndpointService, finalizer) block would let already-cleaned-up CRs proceed straight to the pending-CRs check.

Done. Moved initializeWithHCP and getClients inside the if controllerutil.ContainsFinalizer(awsEndpointService, finalizer) block so already-cleaned-up CRs proceed straight to the pending-CRs check without needing AWS API calls.

AI-assisted response via Claude Code
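The effect of moving client initialization inside the finalizer check can be shown with a small model. The names here are hypothetical: `getClients` stands in for the AWS client setup, and the returned boolean says whether the HCP finalizer may be removed.

```go
package main

import "errors"

// errCreds is a hypothetical transient credential failure.
var errCreds = errors.New("credentials unavailable")

// endpointService is a hypothetical stand-in for the AWSEndpointService CR.
type endpointService struct{ HasFinalizer bool }

// reconcileHCPDeletion cleans up this CR if needed, then checks whether any
// other CR is still pending. Clients are requested only when this CR still
// holds its finalizer.
func reconcileHCPDeletion(self *endpointService, others []endpointService, getClients func() error) (bool, error) {
	if self.HasFinalizer {
		if err := getClients(); err != nil {
			return false, err
		}
		// AWS resource cleanup would run here.
		self.HasFinalizer = false
	}
	for _, o := range others {
		if o.HasFinalizer {
			return false, nil // another reconciler still has work to do
		}
	}
	return true, nil
}
```

With this ordering, a credential failure affects only CRs that still need AWS calls. A CR that is already clean reaches the pending check without touching the client builder.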

cblecker · 2026-07-04T03:04:37Z

 	return false
 }

+// reconcileCRDeletion handles the AWSEndpointService CR deletion path.


The diff includes a TestReconcileCRDeletion with cases for the Patch conflict requeue path, but it doesn't appear in the actual test file — looks like it was dropped during a rebase. The refactoring from Update to Patch+MergeFromWithOptimisticLock with the RequeueAfter: time.Second conflict handling is a meaningful behavioral change worth covering directly.

Done. Added TestReconcileCRDeletion covering: no-finalizer early return, successful cleanup with GC, AWS client init failure, Patch conflict requeue (asserts RequeueAfter == time.Second and err == nil), and Patch non-conflict error.

AI-assisted response via Claude Code

cblecker · 2026-07-04T03:04:41Z

@@ -518,6 +505,13 @@ func (r *AWSEndpointServiceReconciler) Reconcile(ctx context.Context, req ctrl.R
 		return ctrl.Result{}, err


Nit: this getClients error is returned bare while the other two call sites (L588, L683) wrap it with context identifying the reconciliation path. Wrapping here too ("failed to get AWS clients for endpoint reconciliation: %w") would help with production triage.

Done. Wrapped with "failed to get AWS clients for endpoint reconciliation: %w" to match the other call sites.

AI-assisted response via Claude Code
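As a small illustration of the wrapping suggested here, the sketch below wraps a hypothetical client error with the message quoted in the review. `errSTS`, `getClients`, and `reconcileEndpoint` are made-up names. The `%w` verb and `errors.Is` are standard library behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errSTS is a hypothetical low-level failure.
var errSTS = errors.New("sts: temporary failure")

// getClients is a hypothetical stand-in for AWS client construction.
func getClients(fail bool) error {
	if fail {
		return errSTS
	}
	return nil
}

// reconcileEndpoint wraps the error with the reconciliation path, while %w
// keeps the original error reachable for errors.Is and errors.As.
func reconcileEndpoint(fail bool) error {
	if err := getClients(fail); err != nil {
		return fmt.Errorf("failed to get AWS clients for endpoint reconciliation: %w", err)
	}
	return nil
}

// isSTSError shows that the wrapped cause can still be matched.
func isSTSError(err error) bool {
	return errors.Is(err, errSTS)
}
```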

cblecker · 2026-07-04T03:04:45Z

+	// Ensure the awsEndpointService has a finalizer for cleanup.
+	// This is placed after the serviceName check so that CRs not yet populated
+	// by hypershift-operator don't get a finalizer that would block HCP deletion.
 	if !controllerutil.ContainsFinalizer(awsEndpointService, finalizer) {


The CR finalizer is added here before the HCP deletion check at L493. During HCP deletion, after reconcileHCPDeletion removes a CR's finalizer and requeues (waiting for other CRs), the next reconcile re-adds the CR finalizer at this line before discovering the HCP is being deleted. This causes reconcileHCPDeletion to re-run r.delete on already-cleaned-up resources each cycle.

Moving the HCP fetch and deletion check before the CR finalizer addition would avoid the re-addition loop:

hcp, err := r.getHostedControlPlane(ctx, req.Namespace)
...
if hcp != nil && !hcp.DeletionTimestamp.IsZero() {
	return r.reconcileHCPDeletion(ctx, awsEndpointService, hcp, log)
}
// Only add CR finalizer during normal reconciliation
if !controllerutil.ContainsFinalizer(awsEndpointService, finalizer) {

Done. Moved the HCP fetch and deletion check before the CR finalizer addition so that during HCP deletion, reconcileHCPDeletion handles cleanup without re-adding the CR finalizer.

AI-assisted response via Claude Code
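The re-addition loop and its fix can be reproduced in a toy model. Everything below is hypothetical except the ordering itself: the HCP deletion check runs before the CR finalizer is added, so a CR cleaned up during HCP deletion is not given its finalizer back on the next pass.

```go
package main

// controlPlane and endpointService are hypothetical stand-ins.
type controlPlane struct{ Deleting bool }

type endpointService struct{ HasFinalizer bool }

// reconcile returns the name of the step it took.
func reconcile(cp *controlPlane, es *endpointService) string {
	// Check HCP deletion first, before any finalizer is added.
	if cp != nil && cp.Deleting {
		return "hcp-deletion"
	}
	// Only add the CR finalizer during normal reconciliation.
	if !es.HasFinalizer {
		es.HasFinalizer = true
		return "added-cr-finalizer"
	}
	return "reconcile-endpoint"
}
```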

cblecker · 2026-07-04T03:10:26Z

/address-review-comments

github-actions · 2026-07-04T03:10:34Z

🤖 Addressing review comments: workflow run

…reconciler

Add a finalizer on the HostedControlPlane to block HCP deletion until all AWS PrivateLink resources (VPC endpoints, security groups, DNS records) are cleaned up by the AWSEndpointService reconciler.

- Add hcpAWSPrivateLinkFinalizerName finalizer, placed after client initialization succeeds on the normal reconciliation path
- Add reconcileHCPDeletion to clean up AWS resources for each AWSEndpointService CR before removing the HCP finalizer
- Replace handler.Funcs{UpdateFunc: ...} HCP watch with EnqueueRequestsFromMapFunc so Create/Delete events also trigger reconciliation (critical for CPO restarts during HCP deletion)
- Use convergent multi-CR coordination: each reconciler cleans its own CR, only the last one to finish removes the HCP finalizer
- Move HCP deletion check before CR finalizer addition to prevent re-addition loop during HCP deletion cleanup cycles
- Move AWS client initialization inside CR finalizer check in reconcileHCPDeletion so already-cleaned-up CRs proceed to the pending-CRs check without needing AWS API calls
- Add comprehensive unit tests covering finalizer lifecycle, HCP deletion cleanup, CR deletion cleanup, concurrent reconciler coordination, and SharedVPC scenarios

Previously, if the HCP was deleted before AWSEndpointService cleanup, the controller could not construct valid AWS clients (particularly for SharedVPC clusters, where cross-account role ARNs are sourced from the HCP spec), orphaning AWS resources.

Signed-off-by: OpenShift CI Bot <ci-bot@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-07-04T03:45:36Z

@hypershift-jira-solve-ci[bot]: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 13, 2026

openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels May 13, 2026

openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform and removed do-not-merge/needs-area labels May 13, 2026

bryan-cox suggested changes Jun 17, 2026

openshift-ci Bot assigned bryan-cox Jun 17, 2026

bryan-cox approved these changes Jun 18, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2026

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2026

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from 3c58530 to 9877433 Compare June 24, 2026 14:01

openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2026

csrwng marked this pull request as ready for review June 24, 2026 14:20

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 24, 2026

openshift-ci Bot requested review from cblecker and sjenning June 24, 2026 14:21

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from 3286f0b to ebb3676 Compare June 24, 2026 15:21

cblecker suggested changes Jun 24, 2026

openshift-ci Bot assigned cblecker Jun 24, 2026

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from ebb3676 to 6e58fca Compare June 24, 2026 18:21

cblecker reviewed Jun 24, 2026

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from 6e58fca to ce69e69 Compare June 25, 2026 11:24

cblecker reviewed Jun 25, 2026

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from ce69e69 to 09f6bd8 Compare June 26, 2026 03:35

github-actions Bot reviewed Jun 26, 2026

cblecker suggested changes Jul 4, 2026

hypershift-jira-solve-ci Bot force-pushed the fix-CNTRLPLANE-507 branch from 09f6bd8 to 7355f0e Compare July 4, 2026 03:30
