[megatron, trainer] fix: respect calculate_entropy config in megatron actor update#6016

Merged
wuxibin89 merged 2 commits into verl-project:main from MaxwellJryao:fix/megatron-actor-calculate-entropy
Apr 16, 2026

Conversation

@MaxwellJryao (Contributor)

What does this PR do?

Fixes calculate_entropy config being ignored in megatron_actor.update_actor and ray_trainer._update_actor (legacy_worker_impl=disable path).

Previously, both only checked entropy_coeff != 0 to decide whether to compute entropy during training. This meant setting calculate_entropy=True had no effect when entropy_coeff=0, unlike dp_actor, which already respects the calculate_entropy flag (line 586):

# dp_actor (correct behavior)
calculate_entropy = self.config.calculate_entropy or (entropy_coeff != 0)

This is especially problematic in bypass_mode where _compute_old_log_prob is skipped entirely — there was no way to get actor/entropy metrics without also adding entropy to the loss.

Not duplicating an existing PR: searched for `calculate_entropy megatron` and `entropy bypass_mode`; no related open PRs found.

AI assistance was used (Claude) for code analysis and patch generation. All changes have been reviewed and validated by a human.

Checklist Before Starting

  • I have searched for similar PRs
  • PR title follows [{modules}] {type}: {description} format

Design & Code Changes

Three minimal changes to align megatron actor with dp_actor behavior:

  1. verl/workers/actor/megatron_actor.py (update_actor, line 791):

    # Before:
    calculate_entropy = self.config.entropy_coeff != 0
    # After:
    calculate_entropy = self.config.get("calculate_entropy", False) or (self.config.entropy_coeff != 0)
  2. verl/workers/actor/megatron_actor.py (loss_func, line 550-557): Decouple entropy metric logging from entropy loss. Always log actor/entropy when calculate_entropy=True; only add entropy to policy_loss when entropy_coeff != 0.

    # Before: unconditionally modifies loss
    policy_loss = pg_loss - entropy_coeff * entropy_loss
    # After: log metric first, only modify loss if needed
    stats["actor/entropy"] = entropy_loss.detach().item()
    if entropy_coeff != 0:
        policy_loss = pg_loss - entropy_coeff * entropy_loss
  3. verl/trainer/ppo/ray_trainer.py (_update_actor, line 1227):

    # Before:
    calculate_entropy = self.config.actor_rollout_ref.actor.entropy_coeff != 0.0
    # After:
    calculate_entropy = self.config.actor_rollout_ref.actor.get("calculate_entropy", False) or (
        self.config.actor_rollout_ref.actor.entropy_coeff != 0.0
    )
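The three changes above reduce to two small pieces of logic. A minimal sketch of both, using plain Python; the helper names `resolve_calculate_entropy` and `combine_loss` are hypothetical stand-ins for illustration, not the actual verl functions:

```python
# Sketch of the patched logic. Only the two conditions mirror the patch;
# everything else (names, plain-float types) is illustrative.

def resolve_calculate_entropy(config: dict, entropy_coeff: float) -> bool:
    # Changes 1 and 3: compute entropy if explicitly requested via the
    # calculate_entropy flag OR if the loss needs it (nonzero coefficient).
    return config.get("calculate_entropy", False) or (entropy_coeff != 0)

def combine_loss(pg_loss: float, entropy_loss: float,
                 entropy_coeff: float, stats: dict) -> float:
    # Change 2: always log the entropy metric once entropy was computed...
    stats["actor/entropy"] = entropy_loss
    # ...but only fold it into the policy loss when the coefficient is nonzero.
    policy_loss = pg_loss
    if entropy_coeff != 0:
        policy_loss = pg_loss - entropy_coeff * entropy_loss
    return policy_loss
```

The key design point is the decoupling: metric logging depends on whether entropy was computed at all, while loss modification depends only on `entropy_coeff`.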

Backward Compatibility

| User Config | Before | After |
| --- | --- | --- |
| `calculate_entropy=False`, `entropy_coeff=0` (default) | No entropy computed | No entropy computed (unchanged) |
| `calculate_entropy=False`, `entropy_coeff!=0` | Entropy computed + in loss | Entropy computed + in loss + metric logged (loss unchanged) |
| `calculate_entropy=True`, `entropy_coeff=0` | Ignored | Entropy computed, metric logged, loss unchanged ✅ |
| `calculate_entropy=True`, `entropy_coeff!=0` | Entropy computed + in loss | Entropy computed + in loss + metric logged (loss unchanged) |
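The four rows of the table can be checked mechanically. A small sketch; the condition is copied from the patch, while `behavior` and its return shape are illustrative only:

```python
from itertools import product

def behavior(calculate_entropy: bool, entropy_coeff: float):
    # Condition from the patch: entropy is computed if the flag is set
    # or the loss term needs it.
    computed = calculate_entropy or (entropy_coeff != 0)
    # Entropy enters the loss only when the coefficient is nonzero.
    in_loss = computed and entropy_coeff != 0
    return computed, in_loss

# Enumerate the four configurations from the table.
for flag, coeff in product([False, True], [0.0, 0.01]):
    print(f"calculate_entropy={flag}, entropy_coeff={coeff}: "
          f"computed={behavior(flag, coeff)[0]}, in_loss={behavior(flag, coeff)[1]}")
```

Only the `calculate_entropy=True, entropy_coeff=0` row changes behavior: entropy is now computed and logged without entering the loss.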

Test

  • pre-commit run passed all 12 checks (ruff, ruff format, mypy, config generation, license, device API, DataProto usage, naming conventions, compile, etc.)
  • This change is a minimal logic fix (3 lines of condition change + 2 lines of metric logging) that aligns megatron_actor with the existing dp_actor behavior. It does not introduce new APIs or change existing behavior for users who don't set calculate_entropy=True.

Checklist Before Submitting

  • Read the Contribute Guide
  • Pre-commit checks applied and all passed
  • CI tests (existing tests should cover this; no new API introduced)

…date

Previously, megatron_actor.update_actor and ray_trainer._update_actor
only checked entropy_coeff != 0 to decide whether to compute entropy
during training. This meant setting calculate_entropy=True in the actor
config had no effect when entropy_coeff=0, unlike dp_actor which already
respected the calculate_entropy flag.

This is especially problematic in bypass_mode where _compute_old_log_prob
is skipped entirely — there was no way to get actor/entropy metrics
without also adding entropy to the loss.

Changes:
- megatron_actor: check calculate_entropy config in addition to
  entropy_coeff, consistent with dp_actor behavior
- megatron_actor loss_func: log actor/entropy metric regardless of
  entropy_coeff; only add entropy to loss when entropy_coeff != 0
- ray_trainer: same calculate_entropy logic fix for the
  use_legacy_worker_impl=disable path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the PPO trainer and Megatron actor to allow entropy calculation and logging even when the entropy coefficient is zero, controlled by a new calculate_entropy configuration flag. The logic for the calculate_entropy variable was updated in both ray_trainer.py and megatron_actor.py, and the Megatron actor now logs entropy statistics independently of the loss calculation. I have no feedback to provide.

wuxibin89 previously approved these changes Apr 16, 2026
@wuxibin89 wuxibin89 merged commit 45bfd3c into verl-project:main Apr 16, 2026
66 of 73 checks passed