Bug Description
PR: #1823 (Add fallback for get_seqlen_balanced_partitions)
Issue: When rollout-max-response-len > max-tokens-per-gpu, a single sample's total length (prompt + response) can exceed max_tokens_per_gpu. get_minimum_num_micro_batch_size handles this correctly by isolating the oversized sample in its own micro-batch.
However, the fallback _get_capped_partitions enforces sums[i] + length <= max_tokens strictly, so the oversized sample cannot be placed in any partition and the function hits raise AssertionError("This should never happen.").
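A minimal sketch of the failing placement logic as described above (function and variable names are assumptions for illustration, not the actual slime source): a first-fit loop under a strict cap can never place a sample whose length alone exceeds max_tokens, not even in an empty partition, so it falls through to the assertion.

```python
def get_capped_partitions_sketch(lengths, max_tokens, num_partitions):
    """First-fit placement under a strict token cap (illustrative sketch).

    Each sample goes into the first partition whose running sum would
    stay within max_tokens. A sample with length > max_tokens satisfies
    the cap in NO partition, including empty ones, so the loop falls
    through to the assertion -- the bug reported here.
    """
    sums = [0] * num_partitions
    partitions = [[] for _ in range(num_partitions)]
    for length in sorted(lengths, reverse=True):
        placed = False
        for i in range(num_partitions):
            if sums[i] + length <= max_tokens:  # strict cap
                partitions[i].append(length)
                sums[i] += length
                placed = True
                break
        if not placed:
            # Oversized sample: fits nowhere under the strict check.
            raise AssertionError("This should never happen.")
    return partitions
```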
Steps to Reproduce
Repro config:
--rollout-max-response-len 8192
--max-tokens-per-gpu 4096
Any sample with prompt (~400 tokens) + response (>3696 tokens) triggers the crash.
Expected Behavior
Expected: _get_capped_partitions should match get_minimum_num_micro_batch_size's behavior — when a sample can't fit in any existing partition, place it alone in an empty partition (even if it exceeds max_tokens).
If this behavior is intentional (i.e., max-tokens-per-gpu is meant to be a hard cap), the assertion should be replaced with a meaningful error message and the limitation documented, or the constraint should be validated when the config is parsed.
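A hedged sketch of the suggested fallback (names are hypothetical, modeled on the description of get_minimum_num_micro_batch_size above): prefer a partition that keeps the cap, and only when none exists isolate the oversized sample in an empty partition rather than asserting.

```python
def place_with_fallback(lengths, max_tokens, num_partitions):
    """First-fit under the cap, with oversized samples isolated.

    An oversized sample (length > max_tokens) is placed alone in the
    first empty partition, deliberately exceeding the cap, instead of
    triggering an assertion.
    """
    sums = [0] * num_partitions
    partitions = [[] for _ in range(num_partitions)]
    for length in sorted(lengths, reverse=True):
        # Prefer a partition that stays within the cap.
        target = next((i for i in range(num_partitions)
                       if sums[i] + length <= max_tokens), None)
        if target is None:
            # Fallback: isolate the sample in an empty partition.
            target = next((i for i in range(num_partitions)
                           if sums[i] == 0), None)
        if target is None:
            raise ValueError(
                f"sample of length {length} exceeds max_tokens={max_tokens} "
                "and no empty partition remains; increase the number of "
                "micro-batches")
        partitions[target].append(length)
        sums[target] += length
    return partitions
```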
Actual Behavior
raise AssertionError("This should never happen.").
Environment
- slime version: v0.2.4 (commit 286750a)
- Python version: 3.12.3
- PyTorch version: 2.9.1+cu129
- CUDA version: 12.9
- GPU type and count: NVIDIA H200, 8 per node (4 nodes, 32 total)
- OS: Linux (Amazon Linux 2023, kernel 6.1.141)
- SGLang version: 0.5.9
- Megatron-LM version: 0.16.0
Logs
Additional Context
No response
Pre-submission Checklist