
Conversation

@Daraan (Contributor) commented Sep 16, 2025

Why are these changes needed?

Currently learners/__all_modules__/num_env_steps_trained reports a strangely large value, several times higher than learners/__all_modules__/num_module_steps_trained.
This is due to a logging error: minibatch.env_steps() reports the size of the whole batch rather than the size of the minibatch.

This is confusing, but it appears to be intentional (in at least one place):
minibatch = MultiAgentBatch(minibatch, env_steps=len(whole_batch))

# Note (Kourosh): env_steps is the total number of env_steps that this
# multi-agent batch is covering. It should be simply inherited from the
# original multi-agent batch.
minibatch = MultiAgentBatch(minibatch, len(self._batch))
yield minibatch
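
The effect is easy to reproduce in isolation. Here is a minimal standalone sketch (no Ray required); FakeMultiAgentBatch is a hypothetical stand-in for MultiAgentBatch that only mimics the env_steps() behavior shown above:

class FakeMultiAgentBatch:
    # Hypothetical stand-in for ray.rllib.policy.sample_batch.MultiAgentBatch.
    def __init__(self, policy_batches, env_steps):
        self.policy_batches = policy_batches
        self._env_steps = env_steps

    def env_steps(self):
        # Returns the env-step count it was constructed with; for a
        # minibatch, that is the size of the WHOLE batch (see above).
        return self._env_steps

BATCH_SIZE, MINIBATCH_SIZE, NUM_EPOCHS = 2048, 128, 20

logged_env_steps = 0
for _ in range(NUM_EPOCHS):
    for _ in range(BATCH_SIZE // MINIBATCH_SIZE):
        # Each 128-step minibatch inherits env_steps=2048 from the full batch.
        minibatch = FakeMultiAgentBatch({}, env_steps=BATCH_SIZE)
        logged_env_steps += minibatch.env_steps()

print(logged_env_steps)         # 655360, the inflated value
print(BATCH_SIZE * NUM_EPOCHS)  # 40960, the value one would expect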

Because of this, the size of the whole batch is logged on every minibatch iteration, producing a value that is far too large.

For example:

I have a batch size and sample size of 2048 and train for 20 epochs with a minibatch size of 128. The resulting learners log is:

__all_modules__: {
  num_module_steps_trained: 40960,
  num_env_steps_trained: 655360
}

num_module_steps_trained is correct:
num_module_steps_trained = 2048 samples * 20 epochs = 128 minibatch_size * (2048 / 128 = 16 minibatch cycles) * 20 epochs = 40960.

However, num_env_steps_trained makes no sense; it is 16 times too high. It is effectively calculated as:
2048 whole-batch size * (2048 / 128 = 16 minibatch cycles) * 20 epochs = 2048 * 320 iterations = 655360.
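
A quick arithmetic check confirms both numbers (the variable names here are mine, for illustration only):

batch_size, minibatch_size, epochs = 2048, 128, 20
cycles_per_epoch = batch_size // minibatch_size  # 16 minibatches per epoch

# Module steps: each of the 16 minibatches contributes its own 128 steps.
num_module_steps_trained = minibatch_size * cycles_per_epoch * epochs
assert num_module_steps_trained == 40960

# Env steps (buggy): each of the 16 minibatches contributes the FULL 2048.
num_env_steps_trained = batch_size * cycles_per_epoch * epochs
assert num_env_steps_trained == 655360 == 16 * num_module_steps_trained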

Related issue number

https://discuss.ray.io/t/is-the-num-env-steps-trained-logged-incorrectly-if-not-how-to-interpret-it-compared-to-num-module-steps-trained/22616

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests

@Daraan requested a review from a team as a code owner September 16, 2025 22:45
@gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where num_env_steps_trained and num_env_steps_sampled metrics were being logged with excessively large values. The root cause was that the logging was performed once per minibatch using a value (batch.env_steps()) that represented the size of the entire batch, instead of the minibatch. The fix addresses this by moving the logging logic inside the per-module loop and using module_batch_size as the value. This ensures that for each minibatch, the metric is incremented by the sum of agent steps across all modules, which aligns it with how num_module_steps_trained is calculated and resolves the original issue. The changes are applied consistently across Learner, DifferentiableLearner, and OfflineEvaluationRunner, and they look correct.
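
To illustrate the shape of that change, here is a hedged sketch (not the actual Learner code; log_value, log_training_step, and the metric keys are simplified stand-ins following RLlib naming conventions):

from collections import defaultdict

metrics = defaultdict(int)

def log_value(key, value):
    # Simplified stand-in for RLlib's metrics logging with reduce="sum".
    metrics[key] += value

def log_training_step(minibatch):
    # Before the fix (conceptually): logged once per minibatch, outside
    # the module loop, with the env-step count of the WHOLE batch:
    #   log_value("num_env_steps_trained", minibatch.env_steps())

    # After the fix: log inside the per-module loop with the minibatch's
    # actual size, mirroring how num_module_steps_trained is accumulated.
    for module_id, module_batch in minibatch.policy_batches.items():
        module_batch_size = len(module_batch)
        log_value("num_module_steps_trained", module_batch_size)
        log_value("num_env_steps_trained", module_batch_size)

With this, each 128-step minibatch adds 128 steps per module to both counters, so the two metrics agree in the single-module case above.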

Signed-off-by: Daniel Sperber <[email protected]>
@ray-gardener bot added the rllib (RLlib related issues) and community-contribution (Contributed by the community) labels Sep 17, 2025
@Daraan changed the title from [rrlib] Fix incorrect log value of environment steps sampled/trained to [rllib] Fix incorrect log value of environment steps sampled/trained Sep 17, 2025