fix[v2]: localize RTensor trajectories before reading on controller by sitabulaixizawaluduo · Pull Request #1387 · areal-project/AReaL

sitabulaixizawaluduo · 2026-06-03T06:42:51Z

Description

v2 inference service's data_proxy remotizes the exported trajectory dict (areal/experimental/inference_service/data_proxy/app.py:745), so the controller side receives a dict-of-RTensor. Existing consumers read tensor values directly and crash with AttributeError / TypeError (RTensor exposes neither .flatten/.shape nor __len__/__getitem__).

This PR localizes the trajectory at the exact consumption points:

WorkflowExecutor._dump_trajectory: materialize the whole traj on entry so the subsequent reads of versions / input_ids / attention_mask / loss_mask / rewards work.
InferenceServiceWorkflow._run_online: to_local() traj["rewards"] before len() / [-1] in the tensor branch; the interactions branch is plain Python data and is left untouched.

Lazy fetch for the engine training path is preserved: only the local copy used by dump / reward-extraction is materialized, the outer trajectory dict still carries RTensors through to the training worker.

Related Issue

No tracking issue. Bug surfaced during a v2 run that enabled dump_to_file=true; the online-mode reward extraction was a second latent hit on the same root cause.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with `jb build docs`
No critical issues raised by AI reviewers (`/gemini review`)

Breaking Change Details (if applicable):

N/A

Additional Context

Root cause introduced by feat: controller v2 refactor #1354 (`feat: controller v2 refactor`), which lifted `RTensor.remotize` to the outermost trajectory dict; downstream consumers were never adapted.
v1 path (`RolloutController` + `OpenAIProxyWorkflow`) is unaffected —`data_proxy` is v2-only.
The existing `test_openclaw_online_rl` does not catch this:
`examples/openclaw/config.yaml` does not set `_version: v2`, so the default `RolloutConfig._version="v1"` keeps the test on the legacy path. A follow-up PR will either flip openclaw to v2 in CI or add focused UTs (using the `rpc_server` fixture from `tests/test_rtensor.py`) to lock in regression coverage.

v2 inference service's data_proxy remotizes the exported trajectory dict (areal/experimental/inference_service/data_proxy/app.py:745), so the controller side receives dict-of-RTensor. Existing consumers read tensor values directly and crash with AttributeError / TypeError because RTensor exposes neither .flatten/.shape nor __len__/__getitem__. Key changes: - workflow_executor._dump_trajectory: RTensor.localize(traj) at entry so versions/input_ids/attention_mask/etc. become real tensors; no-op on v1 dict-of-Tensor. - InferenceServiceWorkflow._run_online: to_local() traj["rewards"] before len()/index in the tensor branch; interactions branch untouched (already plain Python data). Lazy fetch for the engine training path is preserved: only the local copy used by dump/reward-extraction is materialized, the outer trajectory dict still carries RTensors through to the training worker. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces handling for RTensor objects by localizing them in workflow.py and workflow_executor.py. Feedback was provided to robustly handle 0-dimensional PyTorch tensors when checking the length of rewards_tensor to prevent potential TypeError exceptions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-03T06:44:18Z

+        if rewards_tensor is not None and len(rewards_tensor) > 0:
+            last_reward = float(rewards_tensor[-1])


If rewards_tensor is a 0-dimensional PyTorch tensor (e.g., a scalar tensor), calling len(rewards_tensor) will raise a TypeError: len() of a 0-d tensor. To make this check robust against 0-dimensional tensors, we can check if the tensor has ndim == 0 and handle it directly, or check ndim before calling len().

if rewards_tensor is not None: if getattr(rewards_tensor, "ndim", None) == 0: last_reward = float(rewards_tensor) elif len(rewards_tensor) > 0: last_reward = float(rewards_tensor[-1])

TaoZex

LGTM

sitabulaixizawaluduo requested review from HwVanICI, TaoZex, garrett4wade, guozhihao-224 and nuzant as code owners June 3, 2026 06:42

sitabulaixizawaluduo requested review from HwVanICI, TaoZex, fishcrap, garrett4wade, guozhihao-224, nuzant and rchardx June 3, 2026 06:42

sitabulaixizawaluduo changed the title ~~fix: localize RTensor trajectories before reading on controller~~ fix[v2]: localize RTensor trajectories before reading on controller Jun 3, 2026

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

TaoZex approved these changes Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix[v2]: localize RTensor trajectories before reading on controller#1387

fix[v2]: localize RTensor trajectories before reading on controller#1387
sitabulaixizawaluduo wants to merge 1 commit into
mainfrom
fix/trajectory-rtensor-localize

sitabulaixizawaluduo commented Jun 3, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Uh oh!

TaoZex left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if rewards_tensor is not None and len(rewards_tensor) > 0:
		last_reward = float(rewards_tensor[-1])

Conversation

sitabulaixizawaluduo commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Type of Change

Checklist

Additional Context

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

TaoZex left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sitabulaixizawaluduo commented Jun 3, 2026 •

edited

Loading