@kevinzakka
Collaborator

Summary

  • Adds scale_rewards_by_dt: bool = True config field to ManagerBasedRlEnvCfg
  • Updates RewardManager to conditionally apply dt scaling based on the flag
  • Adds docstrings explaining the scaling behavior
  • Adds tests verifying both scaling modes

Behavior

| Setting | reward_buf | _step_reward |
| --- | --- | --- |
| scale_rewards_by_dt=True (default) | raw * weight * dt | raw * weight |
| scale_rewards_by_dt=False | raw * weight | raw * weight |

The default of True maintains backward compatibility.
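
The behavior above can be sketched in a few lines. This is an illustrative stand-in rather than the actual RewardManager code: `SketchCfg`, `step_dt`, and `compute_reward` are hypothetical names, and summing the per-term values into a single reward_buf is an assumption about how terms are combined.

```python
from dataclasses import dataclass


@dataclass
class SketchCfg:
    """Hypothetical stand-in for the env config; only the flag and dt matter here."""

    scale_rewards_by_dt: bool = True  # mirrors the default described in this PR
    step_dt: float = 0.02  # assumed environment step duration, in seconds


def compute_reward(raw_terms, weights, cfg):
    """Return (reward_buf, step_rewards) for one environment step.

    step_rewards holds raw * weight per term and is never scaled by dt.
    reward_buf is their sum (an assumption), multiplied by dt only when
    cfg.scale_rewards_by_dt is True.
    """
    step_rewards = [raw * w for raw, w in zip(raw_terms, weights)]
    scale = cfg.step_dt if cfg.scale_rewards_by_dt else 1.0
    reward_buf = sum(step_rewards) * scale
    return reward_buf, step_rewards
```

With raw terms [1.0, 2.0], weights [0.5, 1.0], and dt = 0.02, the per-term values are 0.5 and 2.0 in both modes, while reward_buf is about 0.05 with scaling and 2.5 without.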

Test plan

  • Added test_reward_scaling_enabled - verifies rewards are scaled by dt when enabled
  • Added test_reward_scaling_disabled - verifies rewards are not scaled when disabled
  • Added test_reward_scaling_default_is_enabled - verifies default is True for backward compatibility
  • All existing tests pass
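
For readers who want a feel for what these three tests check, here is a rough sketch of their shape. It runs against a hypothetical `FakeCfg` and `scaled_reward` helper defined in the block itself, not against the real ManagerBasedRlEnvCfg or RewardManager, whose constructors are not shown in this PR description.

```python
import math
from dataclasses import dataclass


@dataclass
class FakeCfg:
    """Hypothetical config stand-in; the real config has many more fields."""

    scale_rewards_by_dt: bool = True
    step_dt: float = 0.02  # assumed step duration, in seconds


def scaled_reward(raw, weight, cfg):
    """Hypothetical helper: apply the weight, then dt only if the flag is set."""
    value = raw * weight
    return value * cfg.step_dt if cfg.scale_rewards_by_dt else value


def test_reward_scaling_enabled():
    cfg = FakeCfg(scale_rewards_by_dt=True)
    assert math.isclose(scaled_reward(1.0, 2.0, cfg), 2.0 * cfg.step_dt)


def test_reward_scaling_disabled():
    cfg = FakeCfg(scale_rewards_by_dt=False)
    assert math.isclose(scaled_reward(1.0, 2.0, cfg), 2.0)


def test_reward_scaling_default_is_enabled():
    # The default must stay True so existing configs keep their behavior.
    assert FakeCfg().scale_rewards_by_dt is True
```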

Closes #405

🤖 Generated with Claude Code

Implements a configurable flag to control whether rewards are scaled by
the environment step duration (dt). The flag defaults to True to maintain
backward compatibility.

When scale_rewards_by_dt=True (default):
- reward_buf = raw_value * weight * dt
- _episode_sums and Episode_Reward/* metrics are scaled by dt

When scale_rewards_by_dt=False:
- reward_buf = raw_value * weight (no dt scaling)

The _step_reward values (used by get_active_iterable_terms) always contain
unscaled values regardless of the setting.
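
One practical consequence of the two modes, sketched under the assumption that episode sums simply accumulate reward_buf once per step (`episode_sum` and its parameters are hypothetical, not the library's API): with dt scaling, the sum for a constant reward depends on the episode duration in seconds; without it, the sum grows with the number of steps.

```python
def episode_sum(raw, weight, dt, duration_s, scale_by_dt):
    """Accumulate a constant per-step reward over an episode of duration_s seconds."""
    num_steps = round(duration_s / dt)
    per_step = raw * weight * (dt if scale_by_dt else 1.0)
    return per_step * num_steps


# Same 10-second episode at two step durations.
scaled_coarse = episode_sum(1.0, 2.0, dt=0.02, duration_s=10.0, scale_by_dt=True)
scaled_fine = episode_sum(1.0, 2.0, dt=0.01, duration_s=10.0, scale_by_dt=True)
unscaled_coarse = episode_sum(1.0, 2.0, dt=0.02, duration_s=10.0, scale_by_dt=False)
unscaled_fine = episode_sum(1.0, 2.0, dt=0.01, duration_s=10.0, scale_by_dt=False)
```

Here the two scaled sums agree (both about 20), while the unscaled sums differ by a factor of two (1000 versus 2000), so reward weights tuned in one mode generally need retuning in the other.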

Closes #405

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@kevinzakka kevinzakka merged commit 7c36ffb into main Jan 3, 2026
9 checks passed
@kevinzakka kevinzakka deleted the feat/scale-rewards-by-dt-flag branch January 3, 2026 12:26

Development

Successfully merging this pull request may close these issues.

Proposal to Remove Reward Scaling by Simulation Time Step
