
Conversation

@Daraan (Contributor) commented Sep 16, 2025

Why are these changes needed?

Currently learners/__all_modules__/num_env_steps_trained reports a strangely large value, several times higher than learners/__all_modules__/num_module_steps_trained.
This is due to a logging error: minibatch.env_steps() reports the size of the whole batch rather than the size of the minibatch.

This is confusing, but it appears to be intentional (in at least one place):
minibatch = MultiAgentBatch(minibatch, env_steps=len(whole_batch))

# Note (Kourosh): env_steps is the total number of env_steps that this
# multi-agent batch is covering. It should be simply inherited from the
# original multi-agent batch.
minibatch = MultiAgentBatch(minibatch, len(self._batch))
yield minibatch
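
The effect is easy to reproduce in isolation. Here is a minimal standalone sketch (no Ray required); FakeMultiAgentBatch is a hypothetical stand-in for MultiAgentBatch that only mimics the env_steps() behavior shown above:

class FakeMultiAgentBatch:
    # Hypothetical stand-in for ray.rllib.policy.sample_batch.MultiAgentBatch.
    def __init__(self, policy_batches, env_steps):
        self.policy_batches = policy_batches
        self._env_steps = env_steps

    def env_steps(self):
        # Returns the env-step count it was constructed with; for a
        # minibatch, that is the size of the WHOLE batch (see above).
        return self._env_steps

BATCH_SIZE, MINIBATCH_SIZE, NUM_EPOCHS = 2048, 128, 20

logged_env_steps = 0
for _ in range(NUM_EPOCHS):
    for _ in range(BATCH_SIZE // MINIBATCH_SIZE):
        # Each 128-step minibatch inherits env_steps=2048 from the full batch.
        minibatch = FakeMultiAgentBatch({}, env_steps=BATCH_SIZE)
        logged_env_steps += minibatch.env_steps()

print(logged_env_steps)         # 655360, the inflated value
print(BATCH_SIZE * NUM_EPOCHS)  # 40960, the value one would expect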

Because of this, the size of the whole batch is logged on every minibatch iteration, producing a value that is far too large.

For example:

I have a batch size and sample size of 2048 and train for 20 epochs with a minibatch size of 128. The resulting learners log is:

__all_modules__: {
  num_module_steps_trained: 40960,
  num_env_steps_trained: 655360
}

num_module_steps_trained is correct:
num_module_steps_trained = 2048 samples * 20 epochs = 128 minibatch_size * (2048 / 128 = 16 minibatch cycles) * 20 epochs = 40960.

However, num_env_steps_trained makes no sense; it is 16 times too high. It is effectively calculated as:
2048 whole-batch size * (2048 / 128 = 16 minibatch cycles) * 20 epochs = 2048 * 320 iterations = 655360.
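
A quick arithmetic check confirms both numbers (the variable names here are mine, for illustration only):

batch_size, minibatch_size, epochs = 2048, 128, 20
cycles_per_epoch = batch_size // minibatch_size  # 16 minibatches per epoch

# Module steps: each of the 16 minibatches contributes its own 128 steps.
num_module_steps_trained = minibatch_size * cycles_per_epoch * epochs
assert num_module_steps_trained == 40960

# Env steps (buggy): each of the 16 minibatches contributes the FULL 2048.
num_env_steps_trained = batch_size * cycles_per_epoch * epochs
assert num_env_steps_trained == 655360 == 16 * num_module_steps_trained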

Related issue number

https://discuss.ray.io/t/is-the-num-env-steps-trained-logged-incorrectly-if-not-how-to-interpret-it-compared-to-num-module-steps-trained/22616

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests

@Daraan requested a review from a team as a code owner September 16, 2025 22:45
@gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where num_env_steps_trained and num_env_steps_sampled metrics were being logged with excessively large values. The root cause was that the logging was performed once per minibatch using a value (batch.env_steps()) that represented the size of the entire batch, instead of the minibatch. The fix addresses this by moving the logging logic inside the per-module loop and using module_batch_size as the value. This ensures that for each minibatch, the metric is incremented by the sum of agent steps across all modules, which aligns it with how num_module_steps_trained is calculated and resolves the original issue. The changes are applied consistently across Learner, DifferentiableLearner, and OfflineEvaluationRunner, and they look correct.
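
To illustrate the shape of that change, here is a hedged sketch (not the actual Learner code; log_value, log_training_step, and the metric keys are simplified stand-ins following RLlib naming conventions):

from collections import defaultdict

metrics = defaultdict(int)

def log_value(key, value):
    # Simplified stand-in for RLlib's metrics logging with reduce="sum".
    metrics[key] += value

def log_training_step(minibatch):
    # Before the fix (conceptually): logged once per minibatch, outside
    # the module loop, with the env-step count of the WHOLE batch:
    #   log_value("num_env_steps_trained", minibatch.env_steps())

    # After the fix: log inside the per-module loop with the minibatch's
    # actual size, mirroring how num_module_steps_trained is accumulated.
    for module_id, module_batch in minibatch.policy_batches.items():
        module_batch_size = len(module_batch)
        log_value("num_module_steps_trained", module_batch_size)
        log_value("num_env_steps_trained", module_batch_size)

With this, each 128-step minibatch adds 128 steps per module to both counters, so the two metrics agree in the single-module case above.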

Signed-off-by: Daniel Sperber <[email protected]>
@ray-gardener bot added the rllib (RLlib related issues) and community-contribution (Contributed by the community) labels Sep 17, 2025
@Daraan changed the title from [rrlib] Fix incorrect log value of environment steps sampled/trained to [rllib] Fix incorrect log value of environment steps sampled/trained Sep 17, 2025