Summary
A race condition was discovered in the Helm chart publishing workflow that can leave the main branch in a broken state, causing cascading failures for subsequent PRs.
Background
When PR #531 was merged to main:
- ✅ The automation chart was bumped from 0.1.1 → 0.1.2
- ❌ The openhands chart was modified but NOT version bumped (stayed at 0.4.1)
- The publish-helm-charts workflow on main failed validation because the openhands chart changes were not versioned
- This prevented automation:0.1.2 from being published to GHCR (since publishing happens after validation and uses max-parallel: 1)
- Subsequent PRs (like #543) that depend on automation:0.1.2 fail because the chart doesn't exist in the registry
Root Cause
The current workflow has a gap between PR checks and merge:
- PR checks (preview-helm-charts.yml) run validate-chart-versions with enforce_version_bump: false, so a missing bump only produces a warning
- After merge, publish-helm-charts.yml runs the validation with enforce_version_bump: true, so it is enforced, but only after the change has already landed on main
This allows PRs with incomplete version bumps to be merged, breaking main.
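The gap comes down to what the validation step does when a chart's files changed but its version did not. A minimal Python sketch of that check, assuming charts live under a charts/<name>/ directory and that the changed-file list and the Chart.yaml versions for the base and head commits are collected elsewhere (the directory layout and the names charts_missing_bump and check are hypothetical, not taken from validate-chart-versions.yml):

```python
from pathlib import PurePosixPath


def charts_missing_bump(changed_files, base_versions, head_versions, charts_root="charts"):
    """Return the sorted names of charts whose files changed but whose version did not.

    changed_files: repo-relative paths, e.g. the output of `git diff --name-only`.
    base_versions / head_versions: chart name -> Chart.yaml `version` on each side.
    """
    changed_charts = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Count only files under <charts_root>/<chart>/...; README edits are skipped,
        # matching the README exclusion the issue says already exists.
        if len(parts) >= 3 and parts[0] == charts_root and parts[-1] != "README.md":
            changed_charts.add(parts[1])
    return sorted(
        name
        for name in changed_charts
        # A chart that is new in this PR (absent on the base side) needs no bump.
        if name in base_versions and base_versions[name] == head_versions.get(name)
    )


def check(changed_files, base_versions, head_versions, enforce_version_bump):
    """Mimic the enforce_version_bump switch: report findings, return a process exit code."""
    missing = charts_missing_bump(changed_files, base_versions, head_versions)
    level = "error" if enforce_version_bump else "warning"
    for name in missing:
        print(f"{level}: chart '{name}' changed without a version bump")
    return 1 if missing and enforce_version_bump else 0


if __name__ == "__main__":
    # Hypothetical file list approximating PR #531: automation bumped, openhands not.
    changed = ["charts/automation/Chart.yaml", "charts/openhands/values.yaml"]
    base = {"automation": "0.1.1", "openhands": "0.4.1"}
    head = {"automation": "0.1.2", "openhands": "0.4.1"}
    print(check(changed, base, head, enforce_version_bump=False))  # current PR-time setting
    print(check(changed, base, head, enforce_version_bump=True))   # current post-merge setting
```

With enforce_version_bump: false the openhands case only prints a warning and exits 0, which is how the PR got through; with true it exits 1 and would have failed the PR check.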
Recommended Solutions (Ranked)
1. 🏆 Enforce version validation on PRs (Most Recommended)
Effort: Low | Impact: High | Reliability: High
Modify preview-helm-charts.yml to set enforce_version_bump: true:
```yaml
validate-chart-versions:
  uses: ./.github/workflows/validate-chart-versions.yml
  with:
    base_ref: origin/${{ github.event.pull_request.base.ref }}
    enforce_version_bump: true # Changed from false
```
Pros:
- Simple one-line change
- Catches issues before merge
- Uses existing infrastructure
Cons:
- Adds friction for documentation-only changes (though README is already excluded)
2. Add a required status check for chart version validation (Recommended)
Effort: Low | Impact: High | Reliability: High
In GitHub repository settings, add validate-chart-versions as a required status check for PRs targeting main.
Pros:
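- Blocks merging until the check passes (admins can still bypass it unless branch protection is configured to apply to administrators too)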
- Clear visibility in PR UI
Cons:
- Requires repository admin access to configure
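If the repository is managed through the API rather than the settings page, the same change can be made with GitHub's REST endpoint for updating status check protection. A Python sketch that only builds the request; it assumes branch protection is already enabled on main, and OWNER, REPO, the token, and the exact check name are placeholders (the name GitHub reports for a job that calls a reusable workflow may differ from plain validate-chart-versions):

```python
import json
import urllib.request

API = "https://api.github.com"


def required_check_request(owner, repo, branch, check_names, token, strict=True):
    """Build (but do not send) a PATCH request setting the required status checks on a branch.

    Endpoint: PATCH /repos/{owner}/{repo}/branches/{branch}/protection/required_status_checks
    """
    body = {
        # strict=True also requires the PR branch to be up to date with the base before merging.
        "strict": strict,
        "checks": [{"context": name} for name in check_names],
    }
    url = f"{API}/repos/{owner}/{repo}/branches/{branch}/protection/required_status_checks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; the context must match the check name shown on a PR.
    req = required_check_request("OWNER", "REPO", "main", ["validate-chart-versions"], token="YOUR_TOKEN")
    print(req.get_method(), req.full_url)
    # Sending it requires a token with admin rights on the repository:
    # urllib.request.urlopen(req)
```

Setting strict to true additionally forces a PR branch to be brought up to date with main before merging, so the required check has run against the latest base.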
3. Add automated version bump suggestion bot (Nice to have)
Effort: Medium | Impact: Medium | Reliability: Medium
Create a GitHub Action or bot that:
- Detects when chart files are modified without version bumps
- Posts a comment suggesting the version bump
- Optionally auto-generates a commit with the fix
Pros:
- Developer-friendly
- Educational
Cons:
- More complex to implement
- Still relies on developers to accept/apply suggestions
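The core of such a bot is turning "this chart changed without a bump" into a concrete suggestion. A minimal sketch, assuming plain MAJOR.MINOR.PATCH chart versions and defaulting to a patch bump; the function names are hypothetical, and posting the comment or pushing a fix commit is left out:

```python
import re

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def suggest_bump(version, part="patch"):
    """Return the next MAJOR.MINOR.PATCH version; pre-release/build suffixes are not handled."""
    match = _SEMVER.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unsupported chart version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def suggestion_comment(missing):
    """Build a PR comment body from {chart name: current version} for charts lacking a bump."""
    lines = ["These charts changed without a version bump:", ""]
    for name, version in sorted(missing.items()):
        lines.append(f"- {name}: {version} -> {suggest_bump(version)} (patch bump suggested)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(suggestion_comment({"openhands": "0.4.1"}))
```

Defaulting to a patch bump keeps the suggestion conservative; whether a change warrants a minor or major bump is still the author's call.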
4. Implement atomic chart publishing with rollback (Defense in depth)
Effort: High | Impact: High | Reliability: High
Modify publish-helm-charts.yml to:
- Validate ALL charts first before publishing any
- If validation fails, no charts are published
- Add retry/rollback mechanisms
Pros:
- Prevents partial publishing state
- More resilient
Cons:
- Significant workflow changes
- More complex testing
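The validate-everything-first ordering can be sketched in Python with the validation and push steps passed in as hypothetical hooks, plus a simple retry around each push. The sketch does not implement rollback: if a push still fails after its retries, charts pushed earlier in the loop stay in the registry, which is the part that would need registry-side cleanup.

```python
def publish_all_or_nothing(charts, validate, publish, attempts=3, retry_on=(OSError,)):
    """Validate every chart first; publish only if all of them pass.

    validate(chart) -> error message or None; publish(chart) pushes one chart.
    Both are hypothetical hooks standing in for the workflow's real steps.
    Returns (published_charts, failures).
    """
    failures = {}
    for chart in charts:
        error = validate(chart)
        if error:
            failures[chart] = error
    if failures:
        # Nothing has been pushed yet, so there is nothing to roll back.
        return [], failures

    published = []
    for chart in charts:
        # Retry transient errors (no backoff in this sketch).
        for attempt in range(1, attempts + 1):
            try:
                publish(chart)
                break
            except retry_on:
                if attempt == attempts:
                    # Charts pushed earlier in this loop remain in the registry;
                    # true rollback would need to delete those versions.
                    raise
        published.append(chart)
    return published, {}
```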
5. Add workflow to auto-fix broken main (Recovery mechanism)
Effort: Medium | Impact: Medium | Reliability: Medium
Create a workflow that can be manually triggered to:
- Re-run the publish workflow for any charts that failed to publish
- Detect chart versions that are present on main but missing from GHCR
Pros:
- Provides quick recovery path
- Useful for edge cases
Cons:
- Doesn't prevent the issue, only recovers from it
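The detection step is a comparison between the chart versions declared on main and the versions already pushed to the registry. A minimal sketch, assuming both sides have already been collected into dictionaries (fetching the tags from GHCR is left out, and the function name is hypothetical):

```python
def missing_from_registry(main_versions, registry_tags):
    """Return {chart: version} for charts whose version on main has no matching registry tag.

    main_versions: chart name -> Chart.yaml version on main.
    registry_tags: chart name -> set of versions already pushed to the registry.
    """
    return {
        chart: version
        for chart, version in sorted(main_versions.items())
        # A chart never pushed at all has no entry and counts as missing.
        if version not in registry_tags.get(chart, set())
    }
```

For the state described in this issue, automation at 0.1.2 would be reported as missing, which identifies it as the chart to re-publish.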
Immediate Action Required
For the current broken state:
- Option A: Re-run the publish-helm-charts workflow on main after fixing the version issue
- Option B: Bump chart versions in a new PR to trigger a fresh publish
Related PRs
This issue was created by an AI assistant (OpenHands) on behalf of @aivong-openhands.