[Question] How to compare rollout accuracy using debug-rollout-only mode #1837

### Your Question

I found that the log stopped printing the `rollout raw_reward` value as of this [commit](https://github.com/THUDM/slime/commit/96118d0c261b8f3b3e58d1420a79749975a73967). `log_rollout_data` is no longer executed.

Is there any other way to compare rollout accuracy? Thanks.

### What I've Tried

Only the `rollout performance` metrics are printed:
```sh
(RolloutManager pid=1805485) [2026-04-15 07:57:53] sglang_rollout.py:338 - Abort request for ['http://90.90.97.74:15006', 'http://90.90.97.74:15000', 'http://90.90.97.74:15004', 'http://90.90.97.74:15002']
(RolloutManager pid=1805485) [2026-04-15 07:57:53] rollout.py:1193 - perf 0: {'rollout/response_len/mean': 2773.71484375, 'rollout/response_len/median': 2837.5, 'rollout/response_len/max': 4096, 'rollout/response_len/min': 593, 'rollout/zero_std/count_0': 12, 'rollout/zero_std/count_1': 14, 'rollout/repetition_frac': 0.0, 'rollout/truncated_ratio': 0.46484375, 'perf/rollout_time': 371.1796889305115, 'perf/tokens_per_gpu_per_sec': 239.12643295688667, 'perf/longest_sample_tokens_per_sec': 11.035086568992766, 'perf/effective_tokens_per_gpu_per_sec': 239.12643295688667, 'perf/longest_effective_sample_tokens_per_sec': 11.035086568992766}
(SGLangEngine pid=1807332) [2026-04-15 07:57:53] INFO:     90.90.97.74:49018 - "POST /abort_request HTTP/1.1" 200 OK
(RolloutManager pid=1805485)
Rollout generation:   0%|          | 0/256 [00:00<?, ?it/s]
(SGLangEngine pid=1807330) [2026-04-15 07:57:58 TP0] Prefill batch, #new-seq: 1, #new-token: 256, #cached-token: 0, full token usage: 0.01, mamba usage: 0.02, #running-req: 0, #queue-req: 0, npu graph: False, input throughput (token/s): 4.60
(SGLangEngine pid=1807331) [2026-04-15 07:58:03 TP0] Decode batch, #running-req: 60, #full token: 18560, full token usage: 0.06, mamba num: 60, mamba usage: 0.09, npu graph: False, gen throughput (token/s): 16.37, #queue-req: 0
(SGLangEngine pid=1807330) [2026-04-15 07:58:02 TP0] Prefill batch, #new-seq: 3, #new-token: 1536, #cached-token: 0, full token usage: 0.06, mamba usage: 0.09, #running-req: 57, #queue-req: 0, npu graph: False, input throughput (token/s): 41072.45 [repeated 24x across cluster]
```
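For comparing the metrics that are still logged across runs, the `perf` lines can be parsed directly. The following is only a minimal sketch, assuming the line format shown above (`perf <rollout_id>: {...}`); `parse_perf_line` is a hypothetical helper, not part of slime, and it does not recover `raw_reward`, which is what I actually need.

```python
import ast
import re

# Matches "perf <rollout_id>: {...}" as it appears in the log excerpt above.
# The format is an assumption taken from that excerpt, not a documented contract.
PERF_RE = re.compile(r"perf (\d+): (\{.*\})")


def parse_perf_line(line):
    """Return (rollout_id, metrics dict) for a perf log line, or None if it does not match."""
    match = PERF_RE.search(line)
    if match is None:
        return None
    # The payload is a printed Python dict literal, so literal_eval can read it safely.
    return int(match.group(1)), ast.literal_eval(match.group(2))


sample = (
    "rollout.py:1193 - perf 0: "
    "{'rollout/response_len/mean': 2773.71484375, 'rollout/truncated_ratio': 0.46484375}"
)
rollout_id, metrics = parse_perf_line(sample)
print(rollout_id, metrics["rollout/truncated_ratio"])
```

Running this over two log files and diffing the resulting dicts gives a rough comparison of response length and truncation ratio, but not accuracy.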

### Environment (if relevant)

- slime version: v0.2.4
- Python version:
- PyTorch version:
- CUDA/ROCm version:
- GPU type and count:
- OS:


### Additional Context

_No response_

### Pre-submission Checklist

- [x] I have read the [CONTRIBUTING.md](https://github.com/THUDM/slime/blob/main/CONTRIBUTING.md) and understand the collaboration scope.
- [x] I have read the [documentation](https://thudm.github.io/slime/) and [FAQ](https://thudm.github.io/slime/en/get_started/qa.html) and my question is not answered there.
- [x] I have searched for [existing issues](https://github.com/THUDM/slime/issues) and my question has not been asked before.
