[megatron, trainer] fix: respect calculate_entropy config in megatron actor update#6016
Merged
wuxibin89 merged 2 commits into verl-project:main on Apr 16, 2026
Conversation
Previously, `megatron_actor.update_actor` and `ray_trainer._update_actor` only checked `entropy_coeff != 0` to decide whether to compute entropy during training. This meant setting `calculate_entropy=True` in the actor config had no effect when `entropy_coeff=0`, unlike `dp_actor`, which already respected the `calculate_entropy` flag. This is especially problematic in bypass_mode, where `_compute_old_log_prob` is skipped entirely: there was no way to get `actor/entropy` metrics without also adding entropy to the loss.

Changes:
- `megatron_actor`: check the `calculate_entropy` config in addition to `entropy_coeff`, consistent with `dp_actor` behavior
- `megatron_actor` `loss_func`: log the `actor/entropy` metric regardless of `entropy_coeff`; only add entropy to the loss when `entropy_coeff != 0`
- `ray_trainer`: apply the same `calculate_entropy` logic fix to the `use_legacy_worker_impl=disable` path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
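The gating change described in the commit message can be sketched as follows. This is an illustrative helper, not the actual verl code; the function names are hypothetical, and the "new" variant assumes the same gating `dp_actor` already uses.

```python
# Illustrative sketch of the fix; `should_compute_entropy_*` are hypothetical
# names, not functions from the verl codebase.

def should_compute_entropy_old(calculate_entropy: bool, entropy_coeff: float) -> bool:
    # Old megatron_actor behavior: the calculate_entropy flag was ignored.
    return entropy_coeff != 0

def should_compute_entropy_new(calculate_entropy: bool, entropy_coeff: float) -> bool:
    # Fixed behavior: respect calculate_entropy, matching dp_actor.
    return calculate_entropy or entropy_coeff != 0
```

With `calculate_entropy=True` and `entropy_coeff=0`, the old check returns `False` (so no entropy metrics were ever produced), while the new check returns `True`.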
Contributor
Code Review
This pull request updates the PPO trainer and Megatron actor to allow entropy calculation and logging even when the entropy coefficient is zero, controlled by a new calculate_entropy configuration flag. The logic for the calculate_entropy variable was updated in both ray_trainer.py and megatron_actor.py, and the Megatron actor now logs entropy statistics independently of the loss calculation. I have no feedback to provide.
wuxibin89 previously approved these changes Apr 16, 2026
wuxibin89 approved these changes Apr 16, 2026
What does this PR do?
Fixes the `calculate_entropy` config being ignored in `megatron_actor.update_actor` and `ray_trainer._update_actor` (`legacy_worker_impl=disable` path).

Previously, both only checked `entropy_coeff != 0` to decide whether to compute entropy during training. This meant setting `calculate_entropy=True` had no effect when `entropy_coeff=0`, unlike `dp_actor`, which already respected the `calculate_entropy` flag (line 586). This is especially problematic in bypass_mode, where `_compute_old_log_prob` is skipped entirely: there was no way to get `actor/entropy` metrics without also adding entropy to the loss.

Not duplicating an existing PR: searched for "calculate_entropy megatron" and "entropy bypass_mode"; no related open PRs.
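The bypass_mode problem described above is that entropy logging and the entropy loss term were coupled. A minimal sketch of the decoupling, using plain floats in place of tensors (the function name and metrics layout are illustrative assumptions, not the verl implementation):

```python
# Sketch of decoupled entropy logging vs. entropy loss; plain floats stand in
# for torch tensors, and all names here are illustrative, not the verl source.

def apply_entropy(policy_loss: float, entropy: float,
                  entropy_coeff: float, metrics: dict) -> float:
    # Always record the metric once entropy has been computed...
    metrics["actor/entropy"] = entropy
    # ...but only fold entropy into the loss when the coefficient is nonzero.
    if entropy_coeff != 0:
        policy_loss = policy_loss - entropy_coeff * entropy
    return policy_loss
```

With `entropy_coeff=0`, the metric is still logged but the returned loss is untouched, which is exactly the behavior the PR wants for metrics-only entropy in bypass_mode.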
AI assistance was used (Claude) for code analysis and patch generation. All changes have been reviewed and validated by a human.
Checklist Before Starting
- Title follows the `[{modules}] {type}: {description}` format

Design & Code Changes
Three minimal changes to align megatron actor with dp_actor behavior:
- `verl/workers/actor/megatron_actor.py` (`update_actor`, line 791): check the `calculate_entropy` config in addition to `entropy_coeff`
- `verl/workers/actor/megatron_actor.py` (`loss_func`, lines 550-557): decouple entropy metric logging from the entropy loss. Always log `actor/entropy` when `calculate_entropy=True`; only add entropy to `policy_loss` when `entropy_coeff != 0`.
- `verl/trainer/ppo/ray_trainer.py` (`_update_actor`, line 1227): apply the same `calculate_entropy` logic fix

Backward Compatibility
| `calculate_entropy` | `entropy_coeff` | Behavior |
| --- | --- | --- |
| `False` | `0` (default) | unchanged |
| `False` | `!= 0` | unchanged |
| `True` | `0` | fixed: entropy is now computed and logged |
| `True` | `!= 0` | unchanged |

Test
- `pre-commit run` passed all 12 checks (ruff, ruff format, mypy, config generation, license, device API, DataProto usage, naming conventions, compile, etc.)
- Verified entropy metrics are produced with `calculate_entropy=True`

Checklist Before Submitting