Skip to content

fix[v2]: localize RTensor trajectories before reading on controller#1387

Open
sitabulaixizawaluduo wants to merge 1 commit into
mainfrom
fix/trajectory-rtensor-localize
Open

fix[v2]: localize RTensor trajectories before reading on controller#1387
sitabulaixizawaluduo wants to merge 1 commit into
mainfrom
fix/trajectory-rtensor-localize

Conversation

@sitabulaixizawaluduo
Copy link
Copy Markdown
Collaborator

@sitabulaixizawaluduo sitabulaixizawaluduo commented Jun 3, 2026

Description

v2 inference service's data_proxy remotizes the exported trajectory dict (areal/experimental/inference_service/data_proxy/app.py:745), so the controller side receives a dict-of-RTensor. Existing consumers read tensor values directly and crash with AttributeError / TypeError (RTensor exposes neither .flatten/.shape nor __len__/__getitem__).

This PR localizes the trajectory at the exact consumption points:

  • WorkflowExecutor._dump_trajectory: materialize the whole traj on entry so the subsequent reads of versions / input_ids / attention_mask / loss_mask / rewards work.
  • InferenceServiceWorkflow._run_online: to_local() traj["rewards"] before len() / [-1] in the tensor branch; the interactions branch is plain Python data and is left untouched.

Lazy fetch for the engine training path is preserved: only the local copy used by dump / reward-extraction is materialized, the outer trajectory dict still carries RTensors through to the training worker.

Related Issue

No tracking issue. Bug surfaced during a v2 run that enabled dump_to_file=true; the online-mode reward extraction was a second latent hit on the same root cause.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with `jb build docs`
  • No critical issues raised by AI reviewers (`/gemini review`)

Breaking Change Details (if applicable):

N/A

Additional Context

  • Root cause introduced by feat: controller v2 refactor #1354 (`feat: controller v2 refactor`), which lifted `RTensor.remotize` to the outermost trajectory dict; downstream consumers were never adapted.
  • v1 path (`RolloutController` + `OpenAIProxyWorkflow`) is unaffected —`data_proxy` is v2-only.
  • The existing `test_openclaw_online_rl` does not catch this:
    `examples/openclaw/config.yaml` does not set `_version: v2`, so the default `RolloutConfig._version="v1"` keeps the test on the legacy path. A follow-up PR will either flip openclaw to v2 in CI or add focused UTs (using the `rpc_server` fixture from `tests/test_rtensor.py`) to lock in regression coverage.

v2 inference service's data_proxy remotizes the exported trajectory
dict (areal/experimental/inference_service/data_proxy/app.py:745),
so the controller side receives dict-of-RTensor. Existing consumers
read tensor values directly and crash with AttributeError /
TypeError because RTensor exposes neither .flatten/.shape nor
__len__/__getitem__.

Key changes:
- workflow_executor._dump_trajectory: RTensor.localize(traj) at
  entry so versions/input_ids/attention_mask/etc. become real
  tensors; no-op on v1 dict-of-Tensor.
- InferenceServiceWorkflow._run_online: to_local() traj["rewards"]
  before len()/index in the tensor branch; interactions branch
  untouched (already plain Python data).

Lazy fetch for the engine training path is preserved: only the
local copy used by dump/reward-extraction is materialized, the
outer trajectory dict still carries RTensors through to the
training worker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sitabulaixizawaluduo sitabulaixizawaluduo changed the title fix: localize RTensor trajectories before reading on controller fix[v2]: localize RTensor trajectories before reading on controller Jun 3, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces handling for RTensor objects by localizing them in workflow.py and workflow_executor.py. Feedback was provided to robustly handle 0-dimensional PyTorch tensors when checking the length of rewards_tensor to prevent potential TypeError exceptions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +253 to +254
if rewards_tensor is not None and len(rewards_tensor) > 0:
last_reward = float(rewards_tensor[-1])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If rewards_tensor is a 0-dimensional PyTorch tensor (e.g., a scalar tensor), calling len(rewards_tensor) will raise a TypeError: len() of a 0-d tensor. To make this check robust against 0-dimensional tensors, we can check if the tensor has ndim == 0 and handle it directly, or check ndim before calling len().

        if rewards_tensor is not None:
            if getattr(rewards_tensor, "ndim", None) == 0:
                last_reward = float(rewards_tensor)
            elif len(rewards_tensor) > 0:
                last_reward = float(rewards_tensor[-1])

Copy link
Copy Markdown
Collaborator

@TaoZex TaoZex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants