fix[v2]: localize RTensor trajectories before reading on controller #1387
sitabulaixizawaluduo wants to merge 1 commit into
v2 inference service's data_proxy remotizes the exported trajectory dict (areal/experimental/inference_service/data_proxy/app.py:745), so the controller side receives dict-of-RTensor. Existing consumers read tensor values directly and crash with AttributeError / TypeError because RTensor exposes neither .flatten/.shape nor __len__/__getitem__.

Key changes:
- workflow_executor._dump_trajectory: RTensor.localize(traj) at entry so versions/input_ids/attention_mask/etc. become real tensors; no-op on v1 dict-of-Tensor.
- InferenceServiceWorkflow._run_online: to_local() traj["rewards"] before len()/index in the tensor branch; interactions branch untouched (already plain Python data).

Lazy fetch for the engine training path is preserved: only the local copy used by dump/reward-extraction is materialized, the outer trajectory dict still carries RTensors through to the training worker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
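The failure mode described in this commit message can be reproduced without AReaL. The sketch below is illustrative only: `RemoteHandle` and `probe` are hypothetical names, and the class is a stand-in that mimics a handle offering `to_local()` but no `__len__` or `.shape`. It does not reproduce the real `RTensor` API.

```python
class RemoteHandle:
    """Hypothetical stand-in for a remote tensor reference; not AReaL's RTensor."""

    def __init__(self, payload):
        self._payload = payload  # data that would live in another process

    def to_local(self):
        # Materialize the referenced data on the caller's side.
        return list(self._payload)


def probe(obj):
    """Return the names of the exceptions raised by len(obj) and obj.shape."""
    errors = []
    try:
        len(obj)
    except TypeError as exc:  # raised because the class defines no __len__
        errors.append(type(exc).__name__)
    try:
        obj.shape
    except AttributeError as exc:  # raised because the attribute does not exist
        errors.append(type(exc).__name__)
    return errors


handle = RemoteHandle([0.0, 1.0])
print(probe(handle))            # ['TypeError', 'AttributeError']
print(len(handle.to_local()))   # 2
```

Once the handle is materialized, the same reads go through an ordinary local object, which is the effect the two changes above rely on.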
Code Review
This pull request introduces handling for RTensor objects by localizing them in workflow.py and workflow_executor.py. Feedback was provided to robustly handle 0-dimensional PyTorch tensors when checking the length of rewards_tensor to prevent potential TypeError exceptions.
    if rewards_tensor is not None and len(rewards_tensor) > 0:
        last_reward = float(rewards_tensor[-1])
If `rewards_tensor` is a 0-dimensional PyTorch tensor (e.g., a scalar tensor), calling `len(rewards_tensor)` raises `TypeError: len() of a 0-d tensor`. To make the check robust, test for `ndim == 0` first and convert the scalar directly, and call `len()` only when the tensor has at least one dimension:
    if rewards_tensor is not None:
        if getattr(rewards_tensor, "ndim", None) == 0:
            last_reward = float(rewards_tensor)
        elif len(rewards_tensor) > 0:
            last_reward = float(rewards_tensor[-1])
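The suggested branching can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper `last_reward_of` and a hypothetical `ScalarLike` class that imitates a 0-d tensor (it has `ndim == 0`, supports `float()`, and raises `TypeError` on `len()`). Neither name comes from the PR, and plain lists stand in for 1-d tensors.

```python
class ScalarLike:
    """Hypothetical stand-in for a 0-d tensor; not a real torch.Tensor."""

    ndim = 0

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)

    def __len__(self):
        # Mirrors the error a 0-d tensor raises when len() is called on it.
        raise TypeError("len() of a 0-d tensor")


def last_reward_of(rewards, default=None):
    """Return the last reward as a float, or `default` if there is none."""
    if rewards is None:
        return default
    if getattr(rewards, "ndim", None) == 0:
        return float(rewards)  # scalar: convert directly, never call len()
    if len(rewards) > 0:
        return float(rewards[-1])
    return default  # empty sequence


print(last_reward_of(ScalarLike(0.5)))  # 0.5
print(last_reward_of([0.1, 0.9]))       # 0.9
print(last_reward_of([]))               # None
```

Checking `ndim` through `getattr` keeps the helper usable for inputs that have no `ndim` attribute at all, such as the plain lists used here.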
Description
v2 inference service's data_proxy remotizes the exported trajectory dict (`areal/experimental/inference_service/data_proxy/app.py:745`), so the controller side receives a dict-of-RTensor. Existing consumers read tensor values directly and crash with `AttributeError` / `TypeError` (`RTensor` exposes neither `.flatten`/`.shape` nor `__len__`/`__getitem__`).

This PR localizes the trajectory at the exact consumption points:

- `WorkflowExecutor._dump_trajectory`: materialize the whole traj on entry so the subsequent reads of `versions` / `input_ids` / `attention_mask` / `loss_mask` / `rewards` work.
- `InferenceServiceWorkflow._run_online`: `to_local()` `traj["rewards"]` before `len()` / `[-1]` in the tensor branch; the `interactions` branch is plain Python data and is left untouched.

Lazy fetch for the engine training path is preserved: only the local copy used by dump / reward-extraction is materialized, the outer trajectory dict still carries RTensors through to the training worker.
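The "localize only at the consumption point" pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `LazyRef`, `localize`, and `dump_trajectory` are hypothetical names, and `LazyRef` is a stand-in handle with a `to_local()` method rather than the real `RTensor`.

```python
class LazyRef:
    """Hypothetical stand-in for a remote tensor handle; not AReaL's RTensor."""

    fetches = 0  # counts materializations across all handles

    def __init__(self, payload):
        self._payload = payload

    def to_local(self):
        LazyRef.fetches += 1
        return list(self._payload)


def localize(obj):
    """Return a copy of obj with every LazyRef materialized; other values pass through."""
    if isinstance(obj, LazyRef):
        return obj.to_local()
    if isinstance(obj, dict):
        return {key: localize(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(localize(item) for item in obj)
    return obj


def dump_trajectory(traj):
    """Consumer that needs real values: localize a private copy on entry."""
    local = localize(traj)
    return {
        "num_tokens": len(local["input_ids"]),
        "last_reward": local["rewards"][-1],
    }


traj = {"input_ids": LazyRef([11, 12, 13]), "rewards": LazyRef([0.0, 1.0])}
summary = dump_trajectory(traj)
print(summary)                               # {'num_tokens': 3, 'last_reward': 1.0}
print(isinstance(traj["rewards"], LazyRef))  # True: the outer dict is unchanged
```

Because `localize` builds a new dict instead of mutating its argument, the caller's trajectory keeps its lazy handles, and a dict that already holds plain values comes back with the same contents.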
Related Issue
No tracking issue. Bug surfaced during a v2 run that enabled `dump_to_file=true`; the online-mode reward extraction was a second latent hit on the same root cause.
Type of Change
Checklist
Breaking Change Details (if applicable):
N/A
Additional Context
`examples/openclaw/config.yaml` does not set `_version: v2`, so the default `RolloutConfig._version="v1"` keeps the test on the legacy path. A follow-up PR will either flip openclaw to v2 in CI or add focused UTs (using the `rpc_server` fixture from `tests/test_rtensor.py`) to lock in regression coverage.