@willis89pr

Pull Request: Throttle Parallel Jobs in Axom CI Matrix


Description

This PR introduces max-parallel limits to two key workflows in the Axom CI pipeline:

  • .github/workflows/ci-tests.yml
    Limits simultaneous matrix jobs to 5.
  • .github/workflows/test_windows_tpls.yml
    Caps concurrent Windows TPL builds at 2.

These changes ensure that our build runners aren’t overwhelmed by large matrices all at once, improving overall throughput and predictability.
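As a minimal sketch of where the cap lives, `max-parallel` sits under a job's `strategy` block, next to `matrix`. The job id, runner label, and matrix values below are hypothetical placeholders, not the contents of the real ci-tests.yml:

```yaml
# Illustrative excerpt only; names marked "hypothetical" are assumptions.
jobs:
  build_and_test:              # hypothetical job id
    runs-on: ubuntu-latest     # assumption; the real workflow may use other runners
    strategy:
      max-parallel: 5          # at most 5 jobs from this matrix run at the same time
      matrix:
        config: [a, b, c, d, e, f]   # hypothetical matrix axis
    steps:
      - uses: actions/checkout@v4
```

The same key with a value of 2 applies to the matrix in test_windows_tpls.yml. The cap is scoped to a single job's matrix within one workflow run, so separate runs (for example, from separate commits) are each throttled independently.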


Why This Matters

Without a cap on parallel matrix jobs, a single commit on Axom’s main branch can spin up dozens of containers simultaneously. LLNL has about 20 runners shared across all LLNL GitHub projects. With a pool that size, an uncapped matrix often leads to:

  • Queue backlogs for other important pipelines (e.g., other LLNL projects).
  • Longer overall completion times when everyone’s jobs start at the same time.
  • Resource contention that can trigger costly autoscaling or timeouts.

By throttling to a reasonable number of concurrent jobs, we:

  • Smooth out load spikes, so new work can start sooner.
  • Reduce idle wait times for high-priority builds.
  • Lower runner autoscaling events, saving shared infrastructure costs.

Example Scenario

Before:
A change to a deep dependency triggers the 8×3 Python matrix in ci-tests.yml (24 jobs) plus Windows TPL tests (4 jobs), firing 28 jobs at once. Even if our 20-runner pool (shared by all LLNL projects) is otherwise idle, 8 of those jobs queue immediately, and every runner is occupied, delaying hotfix builds and nightly jobs.

After:
With max-parallel: 5, only 5 of those 24 run at once; the rest wait in queue. Meanwhile, Windows jobs (capped at 2) and any other workflows can interleave, keeping the pool busy but not saturated. Overall turnaround for all pipelines improves.
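A rough worked version of the "After" numbers, assuming each job occupies exactly one runner, ci-tests.yml consists of a single 24-job matrix, and only one run of each workflow is active:

```math
\begin{aligned}
\text{Peak runners used by this commit: } & 5 + 2 = 7 \\
\text{Runners left for other work: } & 20 - 7 = 13 \\
\text{Minimum ci-tests batches: } & \lceil 24 / 5 \rceil = 5
\end{aligned}
```

The batch count is a lower bound on how many job-lengths the matrix takes, not a schedule: a new job starts as soon as one of the 5 slots frees up.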


Verification Steps

  1. Push a test commit to a feature branch.
  2. Observe in the Actions UI that no more than 5 matrix jobs run concurrently in ci-tests.
  3. Confirm that test_windows_tpls never exceeds 2 concurrent jobs.
  4. Ensure no unintended cancellations occur for other workflows.

By applying these caps, we’ll achieve steadier CI performance and faster feedback for all contributors.

- Add `max-parallel: 5` to `.github/workflows/ci-tests.yml` to limit simultaneous jobs in the build matrix
- Add `max-parallel: 2` to `.github/workflows/test_windows_tpls.yml` to cap Windows TPLs matrix parallelism
@rhornung67 added the CI (Issues related to continuous integration), Reviewed, and high priority labels on Jul 7, 2025