test: Refactor distributed job creation in testing pt 2 #1483

Open
itsomri wants to merge 4 commits into main from omric/unified-distributed-batch-job-helper-pt-2

itsomri (Collaborator) commented Apr 27, 2026

Description

Related Issues

Fixes #

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

Additional Notes

coderabbitai Bot (Contributor) commented Apr 27, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9dc39861-b674-468c-a77a-d7bf6f4af20f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


itsomri changed the title from "test: Refactor distributed job creation in testing" to "test: Refactor distributed job creation in testing pt 2" on Apr 27, 2026
itsomri and others added 2 commits April 27, 2026 14:26
- api/events NotReady test: collapse manual PG + pod into one helper call
  (MinMember=2, Parallelism defaults to 1).
- allocate/elastic Balance test: replace 6 raw pods + 2 manual PGs with two
  CreateDistributedBatchJob calls sharing an elasticOpts struct; delete the
  now-unused createElasticPod helper.

Signed-off-by: itsomri <[email protected]>

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
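
The refactor described in the commit above (one shared options value reused across two helper calls, with Parallelism defaulting to 1 when unset) might look roughly like the sketch below. Everything here is hypothetical: `jobOptions`, `withDefaults`, and `createJob` are illustrative stand-ins rather than the repository's actual `CreateDistributedBatchJob` API, and the 3-pods-per-job split and `MinMember: 1` for the elastic jobs are assumptions.

```go
package main

import "fmt"

// jobOptions is a hypothetical stand-in for the helper's options struct.
type jobOptions struct {
	Name        string
	MinMember   int32
	Parallelism int32 // zero means "not set"
}

// withDefaults returns a copy of o with Parallelism defaulted to 1 when unset.
func withDefaults(o jobOptions) jobOptions {
	if o.Parallelism == 0 {
		o.Parallelism = 1
	}
	return o
}

// createJob is a placeholder for the real helper. It only returns the pod
// names such a job would own, so the effect of the options is visible.
func createJob(o jobOptions) []string {
	o = withDefaults(o)
	pods := make([]string, 0, o.Parallelism)
	for i := int32(0); i < o.Parallelism; i++ {
		pods = append(pods, fmt.Sprintf("%s-%d", o.Name, i))
	}
	return pods
}

func main() {
	// Shared base options (assumed values). Structs are copied by value,
	// so per-job overrides do not leak into the shared base.
	elasticOpts := jobOptions{MinMember: 1, Parallelism: 3}
	first, second := elasticOpts, elasticOpts
	first.Name = "elastic-a"
	second.Name = "elastic-b"
	fmt.Println(len(createJob(first)) + len(createJob(second))) // prints 6

	// NotReady case from the commit message: MinMember=2, Parallelism unset.
	notReady := jobOptions{Name: "not-ready", MinMember: 2}
	fmt.Println(len(createJob(notReady))) // prints 1
}
```

Copying the base struct by value keeps each call's overrides local, which is one plausible reason to share a single options value instead of repeating the literal.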
…tion

The Job-owned pods produced by CreateDistributedBatchJob were being
recreated by the Job controller after the scheduler reclaimed/preempted
them. Tests waiting for pod count to drop (partial reclaim/preemption)
saw the count snap back to Parallelism and timed out.

Add an optional BackoffLimit field on DistributedBatchJobOptions. nil
preserves the k8s default (6 retries) so existing callers — including
single-pod scale fillers that may legitimately want pod replacement —
keep their old behavior. Reclaim/preempt elastic specs opt in with
BackoffLimit=ptr.To(int32(0)): the first scheduler-induced deletion
marks the Job Failed, the controller stops creating replacements, and
surviving pods keep running, matching the raw-pod semantics those tests
were written against.

Opt-ins applied to the elastic preemptee/reclaimee jobs in
reclaim_elastic_specs, reclaim_elastic_test, and preempt_elastic_specs;
the gang reclaimer/preemptor side keeps the default.

Signed-off-by: itsomri <[email protected]>

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
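
The optional `BackoffLimit` field described in the commit message above can be illustrated with a minimal sketch. It assumes a trimmed-down, hypothetical `DistributedBatchJobOptions` and a hypothetical `jobSpec` standing in for the Kubernetes Job spec; `ptrTo` mimics the `ptr.To` call quoted in the commit message, and `buildJobSpec` and `effectiveBackoffLimit` are illustrative names, not the repository's code. It models only the nil-versus-set behavior, not the Job controller itself.

```go
package main

import "fmt"

// defaultBackoffLimit is the Kubernetes default for a Job's backoffLimit,
// as stated in the commit message (6 retries).
const defaultBackoffLimit int32 = 6

// DistributedBatchJobOptions is a hypothetical, reduced version of the
// options struct; only the fields needed for this sketch are modeled.
type DistributedBatchJobOptions struct {
	Parallelism  int32
	BackoffLimit *int32 // nil: leave unset so the Kubernetes default applies
}

// jobSpec is a hypothetical stand-in for the Job spec the helper builds.
type jobSpec struct {
	Parallelism  int32
	BackoffLimit *int32
}

// ptrTo returns a pointer to v, mimicking ptr.To from the commit message.
func ptrTo[T any](v T) *T { return &v }

// buildJobSpec copies BackoffLimit into the spec only when the caller set it.
func buildJobSpec(opts DistributedBatchJobOptions) jobSpec {
	spec := jobSpec{Parallelism: opts.Parallelism}
	if opts.BackoffLimit != nil {
		spec.BackoffLimit = ptrTo(*opts.BackoffLimit)
	}
	return spec
}

// effectiveBackoffLimit reports the limit that would apply after defaulting.
func effectiveBackoffLimit(spec jobSpec) int32 {
	if spec.BackoffLimit == nil {
		return defaultBackoffLimit
	}
	return *spec.BackoffLimit
}

func main() {
	// Existing callers pass nothing and keep the default.
	legacy := buildJobSpec(DistributedBatchJobOptions{Parallelism: 3})
	// Reclaim/preempt elastic specs opt in with an explicit zero.
	elastic := buildJobSpec(DistributedBatchJobOptions{
		Parallelism:  3,
		BackoffLimit: ptrTo(int32(0)),
	})
	fmt.Println(effectiveBackoffLimit(legacy), effectiveBackoffLimit(elastic)) // prints: 6 0
}
```

A pointer field is what lets "not set" (nil) be told apart from an explicit zero, which a plain `int32` could not express.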
itsomri force-pushed the omric/unified-distributed-batch-job-helper-pt-2 branch from 6cdbd3b to a93ab6f on April 27, 2026 11:53