
Document race condition in Helm chart publishing workflow and implement preventive measures #545

@aivong-openhands

Summary

A race condition was discovered in the Helm chart publishing workflow that can leave the main branch in a broken state, causing cascading failures for subsequent PRs.

Background

When PR #531 was merged to main:

  1. ✅ The automation chart was bumped from 0.1.1 to 0.1.2
  2. ❌ The openhands chart was modified but NOT version bumped (stayed at 0.4.1)
  3. The publish-helm-charts workflow on main failed validation because the openhands chart changes were not versioned
  4. This prevented automation:0.1.2 from being published to GHCR (since publishing happens after validation and uses max-parallel: 1)
  5. Subsequent PRs (like #543) that depend on automation:0.1.2 fail because the chart doesn't exist in the registry

Root Cause

The current workflow has a gap between PR checks and merge:

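  • PR checks (preview-helm-charts.yml) run validate-chart-versions with enforce_version_bump: false, so a missing version bump only produces a warning and the check still passes
  • After merge, publish-helm-charts.yml runs the same validation with enforce_version_bump: true, so a missing bump fails the run, but only once the change is already on main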

This allows PRs with incomplete version bumps to be merged, breaking main.
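
To make the gap concrete, here is a minimal sketch of the same check run in warn-only and enforced mode against the state described for PR #531. The ChartChange model and the validate function are hypothetical stand-ins for illustration, not the actual logic in validate-chart-versions.yml, and the automation base version is read as 0.1.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartChange:
    """One chart touched by a PR (hypothetical model, not the workflow's real data)."""
    name: str
    base_version: str  # version in Chart.yaml on the base branch
    head_version: str  # version in Chart.yaml on the PR branch


def unbumped_charts(changes):
    """Return names of modified charts whose version did not change."""
    return [c.name for c in changes if c.base_version == c.head_version]


def validate(changes, enforce_version_bump):
    """Return (passed, messages) for a warn-only or an enforced check."""
    missing = unbumped_charts(changes)
    messages = [f"{name}: modified without a version bump" for name in missing]
    # In warn-only mode the messages are still produced, but the check passes.
    passed = not missing or not enforce_version_bump
    return passed, messages


# The state described for PR #531 (assumed values, see the note above).
pr_531 = [
    ChartChange("automation", "0.1.1", "0.1.2"),
    ChartChange("openhands", "0.4.1", "0.4.1"),
]

print(validate(pr_531, enforce_version_bump=False))  # PR check: passes with a warning
print(validate(pr_531, enforce_version_bump=True))   # post-merge check: fails
```

The same input passes before merge and fails after it, which is the window that lets an incomplete bump reach main.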

Recommended Solutions (Ranked)

1. 🏆 Enforce version validation on PRs (Most Recommended)

Effort: Low | Impact: High | Reliability: High

Modify preview-helm-charts.yml to set enforce_version_bump: true:

validate-chart-versions:
  uses: ./.github/workflows/validate-chart-versions.yml
  with:
    base_ref: origin/${{ github.event.pull_request.base.ref }}
    enforce_version_bump: true  # Changed from false

Pros:

  • Simple one-line change
  • Catches issues before merge
  • Uses existing infrastructure

Cons:

  • Adds friction for documentation-only changes (though README is already excluded)

2. Add a required status check for chart version validation (Recommended)

Effort: Low | Impact: High | Reliability: High

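In GitHub repository settings, add validate-chart-versions as a required status check for PRs targeting main. This only blocks merges once the check actually fails on a missing bump (solution 1); while it runs in warn-only mode, the check passes regardless.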

Pros:

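  • Blocks merging until the check passes (preventing admin bypass additionally requires the branch protection rule to disallow bypassing)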
  • Clear visibility in PR UI

Cons:

  • Requires repository admin access to configure

3. Add automated version bump suggestion bot (Nice to have)

Effort: Medium | Impact: Medium | Reliability: Medium

Create a GitHub Action or bot that:

  • Detects when chart files are modified without version bumps
  • Posts a comment suggesting the version bump
  • Optionally auto-generates a commit with the fix
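
As a rough sketch of the suggestion step, the snippet below builds a comment body for charts that were modified without a bump. The function names are hypothetical, it assumes plain MAJOR.MINOR.PATCH versions with no pre-release suffix, and defaulting to a patch bump is an assumption; a real bot would need a way to pick minor or major bumps.

```python
def suggest_patch_bump(version):
    """Return the next patch version for a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def build_comment(unbumped):
    """Build a PR comment body from a {chart_name: current_version} mapping."""
    if not unbumped:
        return ""  # nothing to report, so no comment should be posted
    lines = ["The following charts were modified without a version bump:", ""]
    for name, version in sorted(unbumped.items()):
        suggestion = suggest_patch_bump(version)
        lines.append(f"- `{name}`: {version} -> suggested {suggestion}")
    return "\n".join(lines)


print(build_comment({"openhands": "0.4.1"}))
```

Posting the comment and generating the fix-up commit are left out on purpose, since both depend on how the bot is hosted.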

Pros:

  • Developer-friendly
  • Educational

Cons:

  • More complex to implement
  • Still relies on developers to accept/apply suggestions

4. Implement atomic chart publishing with rollback (Defense in depth)

Effort: High | Impact: High | Reliability: High

Modify publish-helm-charts.yml to:

  1. Validate ALL charts first before publishing any
  2. If validation fails, no charts are published
  3. Add retry/rollback mechanisms
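
The validate-everything-first ordering can be sketched as below. The validate and publish callables are hypothetical hooks, not functions from the existing workflow, and the sketch covers only steps 1 and 2; a publish call that fails midway would still need the retry/rollback handling from step 3.

```python
def publish_all_or_nothing(charts, validate, publish):
    """Publish every chart only if all of them validate first.

    charts   -- iterable of chart names
    validate -- callable(name) -> list of error strings (empty means valid)
    publish  -- callable(name) that pushes one chart (hypothetical hook)
    """
    charts = list(charts)
    errors = {}
    for name in charts:
        problems = validate(name)
        if problems:
            errors[name] = problems
    if errors:
        # Nothing has been published yet, so there is nothing to roll back.
        return [], errors
    published = []
    for name in charts:
        publish(name)
        published.append(name)
    return published, {}


pushed = []
result = publish_all_or_nothing(
    ["automation", "openhands"],
    validate=lambda name: ["version not bumped"] if name == "openhands" else [],
    publish=pushed.append,
)
print(result)  # no chart is pushed because openhands failed validation
```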

Pros:

  • Prevents partial publishing state
  • More resilient

Cons:

  • Significant workflow changes
  • More complex testing

5. Add workflow to auto-fix broken main (Recovery mechanism)

Effort: Medium | Impact: Medium | Reliability: Medium

Create a workflow that can be manually triggered to:

  1. Re-run the publish workflow for any charts that failed to publish
  2. Detect missing chart versions in GHCR vs what's in main
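
The detection half of this could look roughly like the sketch below, which compares the versions declared on main with what the registry already holds. The function name is hypothetical and the registry contents are illustrative assumptions, not values queried from GHCR; a real workflow would have to fetch both sides first.

```python
def missing_from_registry(main_versions, registry_versions):
    """Return {chart: version} for charts on main that the registry lacks.

    main_versions     -- {chart_name: version declared on main}
    registry_versions -- {chart_name: set of versions already published}
    """
    return {
        name: version
        for name, version in main_versions.items()
        if version not in registry_versions.get(name, set())
    }


# Illustrative data only: the registry contents here are assumed, not queried.
main_versions = {"automation": "0.1.2", "openhands": "0.4.1"}
registry_versions = {"automation": {"0.1.1"}, "openhands": {"0.4.1"}}

print(missing_from_registry(main_versions, registry_versions))
```

The resulting mapping is the list of charts the manually triggered workflow would then re-publish.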

Pros:

  • Provides quick recovery path
  • Useful for edge cases

Cons:

  • Doesn't prevent the issue, only recovers from it

Immediate Action Required

For the current broken state:

  1. Option A: Re-run the publish-helm-charts workflow on main after fixing the version issue
  2. Option B: Bump chart versions in a new PR to trigger a fresh publish

Related PRs


This issue was created by an AI assistant (OpenHands) on behalf of @aivong-openhands.
