
fix: route himalaya chat commands to device backend#818

Open
icycrystal4 wants to merge 4 commits into wecode-ai:main from icycrystal4:feat/device-sandbox-himalaya-routing

Conversation

icycrystal4 (Collaborator) commented Mar 23, 2026

Summary

  • route Himalaya-related chat exec calls through the device backend instead of always creating a cloud sandbox
  • add an internal backend bridge plus local executor sandbox exec handler for device-backed command execution
  • harden chat_shell skill ZIP loading so sandbox tool modules with shared base files load reliably

Testing

  • uv run --project /Users/crystal/dev/git/Wegent/backend/pyproject.toml pytest /Users/crystal/dev/git/Wegent/backend/tests/services/test_device_sandbox_service.py
  • uv run --project /Users/crystal/dev/git/Wegent/chat_shell/pyproject.toml pytest /Users/crystal/dev/git/Wegent/chat_shell/tests/test_skill_registry.py
  • uv run --project /Users/crystal/dev/git/Wegent/executor/pyproject.toml pytest /Users/crystal/dev/git/Wegent/executor/tests/test_local_sandbox_handler.py (did not complete cleanly in this environment; existing runner behavior appears to hang during collection/execution)

Summary by CodeRabbit

  • New Features

    • Device-backed sandbox operations: execute commands, read/list/write files, and upload/download attachments via connected devices.
    • Local runtime WebSocket support for sandbox events to handle sandbox operations directly.
  • Improvements

    • Device registration now records device capabilities for better routing.
    • Sandbox-based tools route to device backend when bound, speeding some flows.
    • Increased agent/tool call and iteration limits (defaults raised).
  • Tests

    • Added tests covering device sandbox service, local sandbox handler, and skill registry loading.

coderabbitai bot (Contributor) commented Mar 23, 2026

📝 Walkthrough

Adds device-backed sandbox capabilities: new public/internal API endpoints, a DeviceSandboxService for selecting/dispatching to online devices via Socket.IO, device-side sandbox handlers and client routing in chat-shell tools, plus extensive tests and config updates for device-routed command behavior.

Changes

Cohort / File(s): Summary

  • API routing & internal exports (backend/app/api/api.py, backend/app/api/endpoints/internal/__init__.py): Register and export the new internal devices router (/internal/devices) so internal device endpoints are mounted and discoverable.
  • Public device endpoint (backend/app/api/endpoints/devices.py): Add POST /sandbox/exec with request/response models that authenticate users and forward execution requests to the device sandbox service, mapping service errors to HTTP 400.
  • Internal device endpoints (backend/app/api/endpoints/internal/devices.py): New internal /devices APIRouter exposing exec, binding lookup, read/list/write file ops, and download/upload attachment endpoints with Pydantic models, result normalization, DB commits, and DeviceSandboxError→HTTP 400 mapping.
  • Device sandbox service (backend/app/services/device_sandbox_service.py): New DeviceSandboxService implementing device selection (capability/device_id/task sticky binding), socket call dispatch to namespace /local-executor, response validation/normalization, binding persistence, and DeviceSandboxError.
  • Device service & WS registration (backend/app/services/device_service.py, backend/app/api/ws/device_namespace.py, executor/modes/local/websocket_client.py): Persist device capabilities in device CRD upsert; propagate capabilities from the device client registration payload and accept capabilities in the upsert API.
  • Executor device-side runtime (executor/modes/local/events.py, executor/modes/local/handlers.py, executor/modes/local/runner.py, executor/modes/local/__init__.py): Add SandboxEvents, implement SandboxHandler (exec, file, and attachment handlers using subprocess/FS/HTTP), wire handlers into runner subscriptions, and export SandboxEvents.
  • Sandbox manager & chat-shell tools (chat_shell/chat_shell/tools/sandbox/_base.py, backend/init_data/skills/sandbox/*_tool.py, backend/init_data/skills/sandbox/command_tool.py): Add binding state, decision logic, an HTTP proxy to internal device endpoints, via-device helper methods, and early device-bound execution branches in multiple sandbox tools (command, read/list/write files, upload/download attachments).
  • Skill ZIP loader changes (chat_shell/chat_shell/skills/registry.py): Safer in-ZIP provider loading: pre-register modules in sys.modules, enforce execution priority, guard per-module execution, and set a package ModuleSpec for reliable relative imports.
  • Agent / config limits (chat_shell/chat_shell/agent.py, chat_shell/chat_shell/agents/graph_builder.py, chat_shell/chat_shell/api/schemas.py, chat_shell/chat_shell/core/config.py, chat_shell/.env.example): Raise default tool-call/iteration/request limits from 10/30 to 50 across agent, agent builder, chat event schema, config, and example env; refactor agent recovery helpers for tool-limit scenarios; add DEVICE_ROUTED_COMMANDS config.
  • Executor tests (executor/tests/test_local_sandbox_handler.py): Add tests for SandboxHandler helpers: command exec success/timeout, file read/list/write, path normalization, and working_dir normalization.
  • Backend tests (backend/tests/services/test_device_sandbox_service.py): Add async tests for device selection priority, task-bound routing, socket dispatch, error cases, and sticky binding persistence.
  • Chat-shell tests (chat_shell/tests/test_skill_registry.py, chat_shell/tests/test_sandbox_manager.py): Add tests for out-of-order ZIP provider loading and SandboxManager sticky binding behavior.
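The "Skill ZIP loader changes" row above can be illustrated with a stdlib-only sketch: every in-ZIP module is pre-registered in sys.modules (and on the package) before any source runs, so sibling relative imports resolve even when archive entries arrive out of order. Names, the underscore priority rule, and the function API are assumptions for illustration, not the actual registry.py code.

```python
# Illustrative sketch of pre-registering in-ZIP skill modules before execution.
import sys
import types
from importlib.machinery import ModuleSpec

def load_zip_modules(pkg_name: str, sources: dict[str, str]) -> dict[str, types.ModuleType]:
    """Execute module sources so shared base files load before dependents."""
    # Create the package with a ModuleSpec so relative imports resolve.
    pkg = types.ModuleType(pkg_name)
    pkg.__spec__ = ModuleSpec(pkg_name, loader=None, is_package=True)
    pkg.__path__ = []
    sys.modules[pkg_name] = pkg

    # Pre-register every submodule before executing any of them, so that
    # `from ._base import X` finds its sibling even mid-load.
    modules: dict[str, types.ModuleType] = {}
    for basename in sources:
        fullname = f"{pkg_name}.{basename}"
        mod = types.ModuleType(fullname)
        mod.__package__ = pkg_name
        sys.modules[fullname] = mod
        setattr(pkg, basename, mod)
        modules[basename] = mod

    # Shared base modules (leading underscore) execute first.
    for basename in sorted(sources, key=lambda n: (not n.startswith("_"), n)):
        exec(compile(sources[basename], f"<zip:{basename}>", "exec"),
             modules[basename].__dict__)
    return modules
```

With this shape, a provider module listed before its `_base` dependency in the archive still imports cleanly, because execution order is decided by the priority sort rather than by ZIP entry order.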

Sequence Diagram

sequenceDiagram
    participant Client as User / Tool
    participant API as Backend API
    participant DB as Database
    participant Service as DeviceSandboxService
    participant SIO as Socket.IO Server
    participant Device as Device Executor

    Client->>API: POST /sandbox/exec\n(command, working_dir, timeout, ...)
    API->>API: authenticate user
    API->>Service: execute_command(user_id,...)
    Service->>DB: query online devices / sticky binding
    DB-->>Service: devices / binding
    Service->>Service: filter by device_id/capability\napply priority / persist binding
    Service->>SIO: call(namespace="/local-executor", event="sandbox:exec", payload, timeout)
    SIO->>Device: deliver sandbox:exec event
    Device->>Device: perform exec / file / attachment op
    Device-->>SIO: return dict result (success, stdout, stderr, exit_code, execution_time, device_id)
    SIO-->>Service: response dict
    Service->>Service: validate & normalize response
    Service-->>API: result dict
    API-->>Client: HTTP JSON response
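The device-side "perform exec" step in the diagram presumably reduces to a subprocess call that returns the result fields shown (success, stdout, stderr, exit_code, execution_time). A stdlib-only sketch of that contract, not the actual handlers.py implementation:

```python
import subprocess
import time

def run_sandbox_command(command: str, working_dir: str = ".",
                        timeout_seconds: int = 300) -> dict:
    """Run a shell command and report the result dict shape from the diagram."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            command, shell=True, cwd=working_dir,
            capture_output=True, text=True, timeout=timeout_seconds,
        )
        success = proc.returncode == 0
        stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired as exc:
        # Partial output may be bytes depending on how far capture got.
        stdout = exc.stdout.decode() if isinstance(exc.stdout, bytes) else (exc.stdout or "")
        success = False
        stderr = f"command timed out after {timeout_seconds}s"
        exit_code = -1  # assumed sentinel for timeouts
    return {
        "success": success,
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": exit_code,
        "execution_time": time.monotonic() - start,
    }
```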

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through sockets, logs, and threads,
Routed commands where the device-bed sheds,
From API to edge the payload streams,
Small rabbit cheer for sandboxed dreams! 🥕

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'fix: route himalaya chat commands to device backend' is directly related to the main change in the changeset, enabling Himalaya-related chat commands to route through the device backend instead of the cloud sandbox.
  • Docstring Coverage (✅ Passed): Docstring coverage is 90.74%, above the required threshold of 80.00%.


Important

Merge conflicts detected (Beta)

  • Resolve merge conflict in branch feat/device-sandbox-himalaya-routing

coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/api/api.py (1)

56-64: ⚠️ Potential issue | 🟡 Minor

Fix isort import ordering.

The pipeline indicates an isort check failure. The imports in this block need to be reordered.

🔧 Run isort to fix import ordering
isort --write app/api/api.py

Or manually ensure alphabetical ordering within the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/api.py` around lines 56 - 64, Import ordering in the import
block is failing isort; reorder the names alphabetically within the tuple so the
entries are sorted (callback_router, chat_storage_router, devices_router as
internal_devices_router, services_router, skills_router, subscriptions_router,
tables_router) or run isort --write on the file to apply the fix; update the
import statement in api.py to reflect the corrected alphabetical order for those
symbols.
🧹 Nitpick comments (7)
backend/app/api/ws/device_namespace.py (1)

168-181: Docstring missing capabilities parameter.

The capabilities parameter was added to _register_device, but the docstring (lines 170-181) does not document it.

📝 Add capabilities to docstring
     Args:
         user_id: Device owner user ID
         device_id: Device unique identifier (stored in Kind.name)
         name: Device display name
         client_ip: Device's client IP address
         device_type: Device type ('local' or 'cloud')
         bind_shell: Shell runtime binding ('claudecode' or 'openclaw')
+        capabilities: Optional capability tags for the device

     Returns (success, error_message).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/ws/device_namespace.py` around lines 168 - 181, The docstring
for _register_device is missing documentation for the new capabilities
parameter; update the function _register_device's docstring to include a short
description of capabilities (type Optional[list[str]]), what it represents
(e.g., list of device capability strings or features), and its role in
registration, and add it to the "Args:" section consistent with the existing
style and formatting.
backend/app/services/device_service.py (1)

381-381: Consider the difference between create and update behavior for capabilities.

When updating an existing device (lines 325-326), capabilities is only written if not None, preserving the existing value. However, when creating a new device, capabilities is set directly even if None, resulting in "capabilities": None in the spec.

This asymmetry may be intentional, but if you want consistency, consider omitting the field entirely when None:

🔧 Optional: Omit capabilities from spec when None
                 "spec": {
                     "deviceId": device_id,
                     "displayName": name,
                     "deviceType": resolved_device_type,
                     "connectionMode": "websocket",
                     "bindShell": resolved_bind_shell,
                     "isDefault": is_first_device,
-                    "capabilities": capabilities,
                     "clientIp": client_ip,
                 },

Then conditionally add it after device_json creation:

if capabilities is not None:
    device_json["spec"]["capabilities"] = capabilities
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_service.py` at line 381, The creation path
currently unconditionally sets "capabilities": capabilities on the new device
spec which can produce "capabilities": None; make create behave like the update
path by omitting the field when capabilities is None. After building device_json
(the new device spec), only add device_json["spec"]["capabilities"] if the local
variable capabilities is not None (mirroring the update logic that writes
capabilities only when not None), so existing devices keep their value and new
specs don't contain a None capability entry.
chat_shell/chat_shell/tools/sandbox/_base.py (1)

33-35: Regex pattern may miss some valid command invocations.

The pattern HIMALAYA_COMMAND_PATTERN won't match commands like ./himalaya, /usr/bin/himalaya, or subshell invocations like $(himalaya ...). This is acceptable for the current scope but may need expansion if more invocation patterns are expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` around lines 33 - 35,
HIMALAYA_COMMAND_PATTERN currently only matches bare "himalaya" or "which
command -v himalaya" forms and will miss invocations like "./himalaya",
"/usr/bin/himalaya" and subshell forms "$(himalaya ...)" — update the
HIMALAYA_COMMAND_PATTERN regex to allow optional relative ("./") and absolute
path prefixes and to detect subshell invocations (e.g. starting with "$(") or
parentheses, ensuring you still anchor on word boundaries (use \b or equivalent)
so you don't overmatch; update tests covering Sandbox._base behavior to include
examples like "./himalaya", "/usr/bin/himalaya", and "$(himalaya args)".
backend/tests/services/test_device_sandbox_service.py (1)

69-72: Consider asserting the actual Socket.IO call arguments.

The test asserts mock_sio.call.assert_awaited_once() but doesn't verify the call arguments. Adding argument assertions would strengthen the test by ensuring the correct event name, payload, and namespace are used.

♻️ Suggested enhancement
         assert result["success"] is True
         assert result["device_id"] == "cloud-device"
         mock_online_info.assert_awaited_once_with(1, "cloud-device")
-        mock_sio.call.assert_awaited_once()
+        mock_sio.call.assert_awaited_once()
+        call_args = mock_sio.call.call_args
+        assert call_args.args[0] == "sandbox:exec"
+        assert call_args.kwargs["namespace"] == "/local-executor"
executor/tests/test_local_sandbox_handler.py (1)

36-39: Consider adding assertions for execution_time field.

Both tests verify the result dict structure but don't assert that execution_time exists and has a valid value (>= 0). This would provide more complete coverage of the return contract.

♻️ Suggested enhancement
         assert result["success"] is True
         assert result["stdout"] == "hello\n"
         assert result["stderr"] == ""
         assert result["exit_code"] == 0
+        assert "execution_time" in result
+        assert result["execution_time"] >= 0

And similarly for the timeout test:

         assert result["success"] is False
         assert result["stdout"] == "partial"
         assert "timed out" in result["stderr"]
         assert result["exit_code"] == -1
+        assert "execution_time" in result
+        assert result["execution_time"] >= 0

Also applies to: 60-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/tests/test_local_sandbox_handler.py` around lines 36 - 39, Add
assertions in both tests that the returned result dict contains an
execution_time field and that its value is a non-negative number: assert
"execution_time" in result and assert isinstance(result["execution_time"], (int,
float)) and result["execution_time"] >= 0; do this for the success case (where
result["success"] is True) and for the timeout test (where exit_code/timeout
behavior is asserted) so the contract guarantees a present, valid execution_time
in all code paths.
backend/app/services/device_sandbox_service.py (2)

27-131: Split execute_command into smaller helpers.

This method is significantly over the preferred function-size limit and mixes selection, dispatch, normalization, and logging in one block.

As per coding guidelines, "Function length: Max 50 lines per function (preferred)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` around lines 27 - 131, The
execute_command function is too large and mixes device selection, socket
dispatch, response normalization, and logging; split it into smaller helpers:
keep execute_command as the orchestrator that calls (1) _select_target_device
and device_service.get_device_online_info to resolve target_device and
socket_id, (2) a new helper (e.g., _dispatch_to_device or send_sandbox_exec)
that wraps get_sio().call and error handling, and (3) a new helper (e.g.,
_normalize_sandbox_response) that validates and normalizes response fields
(execution_time, exit_code, stdout, stderr, success). Move logging into either
the orchestrator and/or each helper where appropriate so each function stays
small and focused, and update execute_command to assemble and return the final
dict.
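As an illustration of the suggested _select_target_device split, a pure selection helper could encapsulate the priority order (explicit device_id, then sticky binding, then capability, then any online device). Field names and the exact ordering are assumptions inferred from the walkthrough, not the service's actual code.

```python
from typing import Optional

def select_device(
    devices: list[dict],
    device_id: Optional[str] = None,
    required_capability: Optional[str] = None,
    bound_device_id: Optional[str] = None,
) -> Optional[dict]:
    """Resolve the target device by descending priority."""
    online = [d for d in devices if d.get("online")]
    if device_id:
        # Explicit request: only that exact device qualifies.
        return next((d for d in online if d["device_id"] == device_id), None)
    if bound_device_id:
        bound = next((d for d in online if d["device_id"] == bound_device_id), None)
        if bound:
            return bound  # Sticky task binding still online: reuse it.
    if required_capability:
        capable = [
            d for d in online
            if required_capability in (d.get("capabilities") or [])
        ]
        if capable:
            return capable[0]
    return online[0] if online else None
```

Keeping selection pure like this also makes the priority rules directly unit-testable without mocking Socket.IO or the database.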

83-83: Extract routing and timeout literals into named constants.

Timeout buffer/minimum, fallback exit code, and priority ranks are magic numbers right now, which makes behavior tuning/error-prone.

As per coding guidelines, "Python: Extract magic numbers to constants".

Also applies to: 102-107, 157-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` at line 83, Extract the magic
numbers into module-level constants (e.g. TIMEOUT_BUFFER = 5, TIMEOUT_MIN = 30,
FALLBACK_EXIT_CODE = <value>, and PRIORITY_RANKS = {...}) and use them where the
literal values appear (replace timeout=max(timeout_seconds + 5, 30) with
timeout=max(timeout_seconds + TIMEOUT_BUFFER, TIMEOUT_MIN) and replace the
fallback exit code and hard-coded priority ranks in the same file). Add the new
constants near the top of device_sandbox_service.py and update all occurrences
referenced in the comment (the timeout computation using timeout_seconds, the
fallback exit code usage, and the priority rank values) to use the constants
instead of literals.
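The suggested extraction, using the timeout literals quoted in the comment (the fallback exit code here is a placeholder, since the real value is not shown):

```python
# Named constants for the routing/timeout literals.
TIMEOUT_BUFFER_SECONDS = 5
TIMEOUT_MIN_SECONDS = 30
FALLBACK_EXIT_CODE = -1  # placeholder; substitute the service's actual fallback

def socket_call_timeout(timeout_seconds: int) -> int:
    # Replaces the inline max(timeout_seconds + 5, 30) expression.
    return max(timeout_seconds + TIMEOUT_BUFFER_SECONDS, TIMEOUT_MIN_SECONDS)
```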
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/endpoints/internal/devices.py`:
- Around line 43-47: The async handler execute_device_sandbox_command_internal
is missing OpenTelemetry tracing; add the trace_async decorator above the
function using the project's convention: `@trace_async`(span_name, tracer_name,
extract_attributes). Import trace_async if needed and choose a clear span_name
like "execute_device_sandbox_command_internal" and the appropriate tracer_name
and extract_attributes function used elsewhere (match other endpoints' usage) so
the entire async function is traced.
- Around line 42-46: The execute_device_sandbox_command_internal endpoint
currently allows caller-controlled request.user_id; protect it by adding an
explicit internal auth dependency (e.g., add a
Depends(get_internal_service_auth) or Depends(verify_internal_service) to the
execute_device_sandbox_command_internal signature) and enforce an impersonation
check before running sandbox commands: use a policy function (e.g.,
can_impersonate(calling_service, request.user_id) or
enforce_impersonation_policy(auth_principal, request.user_id)) and return 403 if
not allowed; keep the existing db: Session = Depends(get_db) and request:
InternalDeviceSandboxExecRequest but ensure the new auth dependency supplies the
caller identity used by the impersonation check.

In `@backend/app/services/device_sandbox_service.py`:
- Around line 27-37: The async method execute_command should be instrumented
with the OpenTelemetry async tracer: add the `@trace_async` decorator above its
definition using a descriptive span name (e.g.,
"device_sandbox.execute_command"), the appropriate tracer_name, and an
extract_attributes function that captures relevant args (user_id, device_id,
command); similarly add `@trace_async` or `@trace_sync` (as appropriate) to the
other service methods referenced in the review (the methods covering the ranges
around lines 133-140 and 194-200) using span names like
"device_sandbox.<method_name>" and the same tracer_name/extract_attributes
pattern so the entire call is traced end-to-end. Ensure you import
trace_async/trace_sync and the attribute extractor and place the decorator
immediately above each function definition (preserve async vs sync choice per
method signature).
- Around line 109-126: Normalize the Socket.IO response fields before using
truthiness or len(): compute a normalized_success by checking
response.get("success") — if it's a bool use it, if it's a str compare lower()
to "true"/"1" (or "false"/"0") to derive a boolean, otherwise fall back to
exit_code == 0; compute normalized_stdout and normalized_stderr by pulling
response.get("stdout"/"stderr"), treating None as "", if the value is bytes
decode to str, if it's not a sized string/bytes convert to str(value) so len()
is safe; then use normalized_success, normalized_stdout, normalized_stderr for
logger.info and the returned dict (referencing response, exit_code, logger.info
and the DeviceSandboxService command handling code).

In `@backend/init_data/skills/sandbox/command_tool.py`:
- Around line 198-221: The device-backend branch calling
sandbox_manager.execute_command_via_device lacks exception handling and can
raise httpx.HTTPStatusError or connection errors; wrap the call in a try/except
that catches httpx.HTTPStatusError and httpx.RequestError (or Exception as a
fallback), log the exception, call self._emit_tool_status("failed", ...) with a
clear message and error details (include response-like dict with
error/message/exit_code), and return a json.dumps(...) of that error dict so
error formatting is consistent with the existing success/failure handling in the
should_use_device_backend_for_command branch.
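The normalization prompt above amounts to something like the following sketch; the field handling is inferred from the prompt text rather than taken from the service code.

```python
def _as_text(value) -> str:
    """Coerce stdout/stderr payloads to str so len() and logging are safe."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value if isinstance(value, str) else str(value)

def normalize_sandbox_response(response: dict) -> dict:
    """Normalize a Socket.IO sandbox response before truthiness checks."""
    exit_code = response.get("exit_code")
    success = response.get("success")
    if isinstance(success, str):
        # Some transports stringify booleans; accept "true"/"1".
        success = success.strip().lower() in ("true", "1")
    elif not isinstance(success, bool):
        # Fall back to the exit code when success is missing or oddly typed.
        success = exit_code == 0
    return {
        "success": success,
        "exit_code": exit_code,
        "stdout": _as_text(response.get("stdout")),
        "stderr": _as_text(response.get("stderr")),
    }
```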


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 46ff243b-45c0-4acc-97e7-e834249cf4af

📥 Commits

Reviewing files that changed from the base of the PR and between 5bffcbb and fdd85e6.

📒 Files selected for processing (18)
  • backend/app/api/api.py
  • backend/app/api/endpoints/devices.py
  • backend/app/api/endpoints/internal/__init__.py
  • backend/app/api/endpoints/internal/devices.py
  • backend/app/api/ws/device_namespace.py
  • backend/app/services/device_sandbox_service.py
  • backend/app/services/device_service.py
  • backend/init_data/skills/sandbox/command_tool.py
  • backend/tests/services/test_device_sandbox_service.py
  • chat_shell/chat_shell/skills/registry.py
  • chat_shell/chat_shell/tools/sandbox/_base.py
  • chat_shell/tests/test_skill_registry.py
  • executor/modes/local/__init__.py
  • executor/modes/local/events.py
  • executor/modes/local/handlers.py
  • executor/modes/local/runner.py
  • executor/modes/local/websocket_client.py
  • executor/tests/test_local_sandbox_handler.py

Comment on lines +42 to +46
@router.post("/sandbox/exec", response_model=DeviceSandboxExecResponse)
async def execute_device_sandbox_command_internal(
request: InternalDeviceSandboxExecRequest,
db: Session = Depends(get_db),
) -> DeviceSandboxExecResponse:

⚠️ Potential issue | 🔴 Critical

Protect this internal exec endpoint with explicit internal authz/authn.

Line 42 exposes remote command execution with caller-controlled request.user_id. Combined with backend/app/api/api.py (Lines 178-181) including this router without route-level dependencies, this is a privilege-escalation path if /internal is reachable. Add a strict internal-service auth dependency and enforce impersonation policy before executing.

🔐 Example hardening sketch
-from fastapi import APIRouter, Depends, HTTPException, status
+from fastapi import APIRouter, Depends, HTTPException, status
+# from app.api.dependencies import require_internal_service_auth
...
-router = APIRouter(prefix="/devices", tags=["internal-devices"])
+router = APIRouter(
+    prefix="/devices",
+    tags=["internal-devices"],
+    # dependencies=[Depends(require_internal_service_auth)],
+)
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 45-45: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


Comment on lines +43 to +47
async def execute_device_sandbox_command_internal(
request: InternalDeviceSandboxExecRequest,
db: Session = Depends(get_db),
) -> DeviceSandboxExecResponse:
"""Execute a command on a user's device for internal trusted services."""

🛠️ Refactor suggestion | 🟠 Major

Add OpenTelemetry tracing on the async endpoint handler.

Line 43 defines an async API handler without the required trace decorator/span metadata, which leaves this execution path under-instrumented.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".
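For illustration, a decorator with the @trace_async(span_name, tracer_name, extract_attributes) call shape cited in the guidelines can be minimally modeled as below. This is a stand-in: the project's real decorator presumably opens an OpenTelemetry span rather than recording a dict on the wrapper.

```python
import asyncio
import functools

def trace_async(span_name, tracer_name, extract_attributes=None):
    """Stand-in with the guideline's call shape; records span metadata
    on the wrapper instead of emitting a real OpenTelemetry span."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            attrs = extract_attributes(*args, **kwargs) if extract_attributes else {}
            # A real implementation would do something like:
            #   with trace.get_tracer(tracer_name).start_as_current_span(
            #       span_name, attributes=attrs):
            wrapper.last_span = {
                "name": span_name,
                "tracer": tracer_name,
                "attributes": attrs,
            }
            return await func(*args, **kwargs)
        return wrapper
    return decorator

@trace_async(
    "device_sandbox.execute_command",
    "backend",
    lambda user_id, command: {"user_id": user_id},
)
async def execute_command(user_id: int, command: str) -> dict:
    # Hypothetical handler body standing in for the real endpoint logic.
    return {"success": True}
```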

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 45-45: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


Comment on lines +27 to +37
async def execute_command(
self,
db: Session,
user_id: int,
command: str,
working_dir: str = "/home/user",
timeout_seconds: int = 300,
required_capability: Optional[str] = None,
device_id: Optional[str] = None,
) -> dict[str, Any]:
"""Execute a command on a user's online device via Socket.IO."""

🛠️ Refactor suggestion | 🟠 Major

Instrument service methods with OpenTelemetry decorators.

The new async/sync methods are not decorated for tracing, so this critical routing/dispatch path loses standard observability coverage.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry" and "Use @trace_sync(span_name, tracer_name, extract_attributes) decorator to trace entire sync functions in OpenTelemetry".

Also applies to: 133-140, 194-200

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` around lines 27 - 37, The
async method execute_command should be instrumented with the OpenTelemetry async
tracer: add the `@trace_async` decorator above its definition using a descriptive
span name (e.g., "device_sandbox.execute_command"), the appropriate tracer_name,
and an extract_attributes function that captures relevant args (user_id,
device_id, command); similarly add `@trace_async` or `@trace_sync` (as appropriate)
to the other service methods referenced in the review (the methods covering the
ranges around lines 133-140 and 194-200) using span names like
"device_sandbox.<method_name>" and the same tracer_name/extract_attributes
pattern so the entire call is traced end-to-end. Ensure you import
trace_async/trace_sync and the attribute extractor and place the decorator
immediately above each function definition (preserve async vs sync choice per
method signature).

Comment on lines +109 to +126
logger.info(
"[DeviceSandboxService] Device command completed: user_id=%s, device_id=%s, "
"socket_id=%s, success=%s, exit_code=%s, execution_time=%.2fs, "
"stdout_len=%s, stderr_len=%s",
user_id,
target_device_id,
socket_id,
bool(response.get("success", exit_code == 0)),
exit_code,
execution_time,
len(response.get("stdout", "") or ""),
len(response.get("stderr", "") or ""),
)

return {
"success": bool(response.get("success", exit_code == 0)),
"stdout": response.get("stdout", "") or "",
"stderr": response.get("stderr", "") or "",

⚠️ Potential issue | 🟠 Major

Avoid unsafe truthiness and len() on untyped Socket.IO payload fields.

Line 116 can turn "false" into True (bool("false")), and Lines 119-120 can raise TypeError if stdout/stderr are non-sized values. This can produce incorrect success reporting or a 500 on malformed device responses.

🛠️ Safer normalization before logging/return
+        success_raw = response.get("success")
+        if isinstance(success_raw, bool):
+            success = success_raw
+        elif isinstance(success_raw, str):
+            success = success_raw.strip().lower() in {"1", "true", "yes"}
+        else:
+            success = exit_code == 0
+
+        stdout_raw = response.get("stdout", "")
+        stderr_raw = response.get("stderr", "")
+        stdout = stdout_raw if isinstance(stdout_raw, str) else ("" if stdout_raw is None else str(stdout_raw))
+        stderr = stderr_raw if isinstance(stderr_raw, str) else ("" if stderr_raw is None else str(stderr_raw))
+
         logger.info(
             "[DeviceSandboxService] Device command completed: user_id=%s, device_id=%s, "
             "socket_id=%s, success=%s, exit_code=%s, execution_time=%.2fs, "
             "stdout_len=%s, stderr_len=%s",
             user_id,
             target_device_id,
             socket_id,
-            bool(response.get("success", exit_code == 0)),
+            success,
             exit_code,
             execution_time,
-            len(response.get("stdout", "") or ""),
-            len(response.get("stderr", "") or ""),
+            len(stdout),
+            len(stderr),
         )
 
         return {
-            "success": bool(response.get("success", exit_code == 0)),
-            "stdout": response.get("stdout", "") or "",
-            "stderr": response.get("stderr", "") or "",
+            "success": success,
+            "stdout": stdout,
+            "stderr": stderr,
             "exit_code": exit_code,
             "execution_time": execution_time,
             "device_id": target_device_id,
             "backend": "device",
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.info(
"[DeviceSandboxService] Device command completed: user_id=%s, device_id=%s, "
"socket_id=%s, success=%s, exit_code=%s, execution_time=%.2fs, "
"stdout_len=%s, stderr_len=%s",
user_id,
target_device_id,
socket_id,
bool(response.get("success", exit_code == 0)),
exit_code,
execution_time,
len(response.get("stdout", "") or ""),
len(response.get("stderr", "") or ""),
)
return {
"success": bool(response.get("success", exit_code == 0)),
"stdout": response.get("stdout", "") or "",
"stderr": response.get("stderr", "") or "",
success_raw = response.get("success")
if isinstance(success_raw, bool):
success = success_raw
elif isinstance(success_raw, str):
success = success_raw.strip().lower() in {"1", "true", "yes"}
else:
success = exit_code == 0
stdout_raw = response.get("stdout", "")
stderr_raw = response.get("stderr", "")
stdout = stdout_raw if isinstance(stdout_raw, str) else ("" if stdout_raw is None else str(stdout_raw))
stderr = stderr_raw if isinstance(stderr_raw, str) else ("" if stderr_raw is None else str(stderr_raw))
logger.info(
"[DeviceSandboxService] Device command completed: user_id=%s, device_id=%s, "
"socket_id=%s, success=%s, exit_code=%s, execution_time=%.2fs, "
"stdout_len=%s, stderr_len=%s",
user_id,
target_device_id,
socket_id,
success,
exit_code,
execution_time,
len(stdout),
len(stderr),
)
return {
"success": success,
"stdout": stdout,
"stderr": stderr,
"exit_code": exit_code,
"execution_time": execution_time,
"device_id": target_device_id,
"backend": "device",
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` around lines 109 - 126,
Normalize the Socket.IO response fields before using truthiness or len():
compute a normalized_success by checking response.get("success") — if it's a
bool use it, if it's a str compare lower() to "true"/"1" (or "false"/"0") to
derive a boolean, otherwise fall back to exit_code == 0; compute
normalized_stdout and normalized_stderr by pulling
response.get("stdout"/"stderr"), treating None as "", if the value is bytes
decode to str, if it's not a sized string/bytes convert to str(value) so len()
is safe; then use normalized_success, normalized_stdout, normalized_stderr for
logger.info and the returned dict (referencing response, exit_code, logger.info
and the DeviceSandboxService command handling code).

Comment on lines +198 to +221
if sandbox_manager.should_use_device_backend_for_command(raw_command):
logger.info(
"[SandboxCommandTool] Routing command to device backend: %s",
raw_command[:100],
)
response = await sandbox_manager.execute_command_via_device(
command=raw_command,
working_dir=working_dir,
timeout_seconds=effective_timeout,
required_capability="himalaya_mail",
)

if response.get("success"):
await self._emit_tool_status(
"completed", "Command executed successfully", response
)
else:
await self._emit_tool_status(
"failed",
f"Command failed with exit code {response.get('exit_code', -1)}",
response,
)

return json.dumps(response, ensure_ascii=False, indent=2)

⚠️ Potential issue | 🟠 Major

Missing exception handling for device backend HTTP calls.

The execute_command_via_device call (from _base.py) can raise httpx.HTTPStatusError or connection errors, but this block has no try/except. If the device backend is unreachable or returns a non-2xx status, the exception will propagate unhandled, bypassing the existing error formatting logic.

Consider wrapping in try/except to provide a consistent error response format.

🛡️ Proposed fix to add exception handling
             if sandbox_manager.should_use_device_backend_for_command(raw_command):
                 logger.info(
                     "[SandboxCommandTool] Routing command to device backend: %s",
                     raw_command[:100],
                 )
-                response = await sandbox_manager.execute_command_via_device(
-                    command=raw_command,
-                    working_dir=working_dir,
-                    timeout_seconds=effective_timeout,
-                    required_capability="himalaya_mail",
-                )
+                try:
+                    response = await sandbox_manager.execute_command_via_device(
+                        command=raw_command,
+                        working_dir=working_dir,
+                        timeout_seconds=effective_timeout,
+                        required_capability="himalaya_mail",
+                    )
+                except Exception as e:
+                    logger.error(
+                        "[SandboxCommandTool] Device backend execution failed: %s",
+                        e,
+                        exc_info=True,
+                    )
+                    error_msg = f"Device backend execution failed: {e}"
+                    result = self._format_error(
+                        error_message=error_msg,
+                        stdout="",
+                        stderr=str(e),
+                        exit_code=-1,
+                        execution_time=time.time() - start_time,
+                        suggestion="Check if a compatible device is online.",
+                    )
+                    await self._emit_tool_status("failed", error_msg)
+                    return result

                 if response.get("success"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/init_data/skills/sandbox/command_tool.py` around lines 198 - 221, The
device-backend branch calling sandbox_manager.execute_command_via_device lacks
exception handling and can raise httpx.HTTPStatusError or connection errors;
wrap the call in a try/except that catches httpx.HTTPStatusError and
httpx.RequestError (or Exception as a fallback), log the exception, call
self._emit_tool_status("failed", ...) with a clear message and error details
(include response-like dict with error/message/exit_code), and return a
json.dumps(...) of that error dict so error formatting is consistent with the
existing success/failure handling in the should_use_device_backend_for_command
branch.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
backend/app/api/endpoints/internal/devices.py (1)

154-183: ⚠️ Potential issue | 🟠 Major

Internal endpoint allows caller-controlled user_id without authentication.

This endpoint accepts user_id from the request body without any authentication dependency. Per context snippet from backend/app/api/api.py, the /internal router is included without route-level auth dependencies.

If /api/internal/* is exposed to untrusted networks, an attacker could execute commands on any user's device by specifying arbitrary user_id. Ensure either:

  1. Network-level isolation (internal endpoints only reachable from trusted services)
  2. Add explicit internal service authentication dependency
🔐 Add internal auth dependency
+from app.api.dependencies import get_internal_service_auth
+
-router = APIRouter(prefix="/devices", tags=["internal-devices"])
+router = APIRouter(
+    prefix="/devices",
+    tags=["internal-devices"],
+    dependencies=[Depends(get_internal_service_auth)],
+)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/devices.py` around lines 154 - 183, The
internal endpoint execute_device_sandbox_command_internal currently accepts a
caller-controlled user_id from InternalDeviceSandboxExecRequest with no auth;
add an explicit internal-auth dependency (e.g., Depends(get_internal_service) or
Depends(require_internal_token)) to the endpoint signature and verify the
dependency before using request.user_id, or alternatively remove user_id from
the request and derive the target user from the authenticated internal service
context; update the router/endpoint signature to include that dependency and
fail with HTTP 401/403 if the internal auth check fails so only trusted services
can invoke device_sandbox_service.execute_command.
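The token-verification core of such an internal-auth dependency can be sketched with the stdlib alone — the `get_internal_service_auth` name comes from the suggestion above, and the header name, settings source, and FastAPI wiring here are assumptions:

```python
import hmac
from typing import Optional

# In practice this would come from settings/environment, not a literal.
INTERNAL_TOKEN = "dev-only-token"

def verify_internal_token(header_value: Optional[str]) -> bool:
    """Return True only when the X-Internal-Token header matches the shared secret.

    hmac.compare_digest performs a constant-time comparison, avoiding
    timing side channels on the token check.
    """
    if not header_value:
        return False
    return hmac.compare_digest(header_value, INTERNAL_TOKEN)
```

A FastAPI dependency would call this and raise 401/403 on failure before `request.user_id` is ever trusted; attaching it at the router level (as in the diff above) covers every internal route at once.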
🧹 Nitpick comments (8)
chat_shell/chat_shell/agents/graph_builder.py (2)

278-278: Consider iterable unpacking for cleaner list construction.

Per static analysis (Ruff RUF005), prefer [*lc_messages, item] over list(lc_messages) + [item].

♻️ Proposed fix
-        final_messages = list(lc_messages) + [HumanMessage(content=system_notice)]
+        final_messages = [*lc_messages, HumanMessage(content=system_notice)]

Apply similarly at lines 300 and 316.

Also applies to: 300-300, 316-316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/agents/graph_builder.py` at line 278, Replace list
concatenation using list(lc_messages) + [item] with iterable unpacking for
clarity and to satisfy Ruff RUF005; specifically update the construction of
final_messages where final_messages = list(lc_messages) +
[HumanMessage(content=system_notice)] to use [*lc_messages,
HumanMessage(content=system_notice)], and apply the same change at the two other
occurrences referenced (the similar constructions at lines around 300 and 316)
so each uses [*lc_messages, ...] instead of list(...) + [...] in the
GraphBuilder/graph building code.

289-292: Inconsistent dict content extraction between helper methods.

_stream_final_response_without_tools extracts text from any dict with a "text" key (line 290), while _build_final_state_without_tools requires part.get("type") == "text" (line 312). This could cause different behavior between streaming and non-streaming recovery paths for the same tool-limit scenario.

Consider aligning both methods. The type-checked approach (matching get_final_content and on_chain_end logic) is safer:

♻️ Proposed fix to align streaming helper with non-streaming
                         elif isinstance(part, dict):
-                            text = part.get("text", "")
-                            if text:
-                                yield text
+                            if part.get("type") == "text":
+                                text = part.get("text", "")
+                                if text:
+                                    yield text

Also applies to: 312-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/agents/graph_builder.py` around lines 289 - 292, The
streaming helper _stream_final_response_without_tools currently yields any dict
with a "text" key, while _build_final_state_without_tools only accepts dicts
where part.get("type") == "text"; update _stream_final_response_without_tools to
mirror the type-checked behavior: when encountering a dict, check
part.get("type") == "text" before extracting part.get("text", "") and yielding
it so its behavior aligns with get_final_content / on_chain_end and the
non-streaming recovery path.
backend/app/services/device_sandbox_service.py (2)

231-248: Socket.IO dispatch lacks retry logic for transient failures.

The sio.call() invocation catches all exceptions and immediately raises DeviceSandboxError. For transient network issues, consider adding retry with exponential backoff before failing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` around lines 231 - 248, The
DeviceSandboxService currently calls sio.call(...) and immediately raises
DeviceSandboxError on any Exception; add retry-with-exponential-backoff around
the sio.call invocation to handle transient network errors: implement a small
retry loop (e.g., 3 attempts) around sio.call in the function containing that
code, catching only transient exceptions (network/timeouts) and sleeping with
backoff between attempts, log each retry via logger.error/warn including
user_id, target_device_id and event_name, and only raise DeviceSandboxError
after all attempts fail (preserve original exception as the cause); keep
existing timeout logic and ensure you still pass the same parameters
(event_name, payload, to=socket_id, namespace="/local-executor", timeout=...)
when retrying.

60-78: Parameter `format` shadows a Python builtin.

The format parameter name in read_file and write_file shadows Python's built-in format() function (Ruff A002).

♻️ Rename to `file_format`
     async def read_file(
         self,
         db: Session,
         user_id: int,
         file_path: str,
-        format: str = "text",
+        file_format: str = "text",
         device_id: Optional[str] = None,
         task_id: Optional[int] = None,
     ) -> dict[str, Any]:
         """Read a file from the bound device."""
         return await self._execute_device_event(
             ...
-            payload={"file_path": file_path, "format": format},
+            payload={"file_path": file_path, "format": file_format},

Also applies to write_file at line 106.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/device_sandbox_service.py` around lines 60 - 78, The
parameter name format in read_file and write_file shadows the built-in; rename
it to file_format in both function signatures and all internal references
(including type hints and default value), and update the payload construction
passed to _execute_device_event to use the original "format" key set to
file_format (e.g., payload={"file_path": file_path, "format": file_format});
also update any local variable uses and external callsites to use the new
parameter name (functions: read_file, write_file, and any tests or callers).
executor/modes/local/handlers.py (2)

769-775: HTTP download sends the Authorization header in cleartext for http:// URLs.

The requests.get() call uses default settings. If download_url starts with http://, credentials in the Authorization header are sent in cleartext. Consider logging a warning or enforcing HTTPS for production use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/modes/local/handlers.py` around lines 769 - 775, The HTTP download
call using requests.get with download_url, auth_token, timeout_seconds and
stream=True may send Authorization in cleartext for http:// URLs; update the
code around requests.get/response in executor/modes/local/handlers.py (where
download_url and auth_token are used) to enforce or require HTTPS (e.g.,
validate download_url.startswith("https://") and raise or abort for non-HTTPS),
or at minimum log a clear warning before making the request when the URL scheme
is http://; ensure any decision leaves the existing response.raise_for_status()
flow intact.

382-390: Store the task reference returned by asyncio.create_task.

The fire-and-forget task created here is not stored, which means if it raises an exception it will be silently lost. Python's asyncio logs a warning when a task is garbage-collected with an unhandled exception.

♻️ Proposed fix
+        # Track restart task to prevent GC and ensure exception visibility
+        self._restart_task: Optional[asyncio.Task] = None
         # Schedule the restart without awaiting it
-        asyncio.create_task(delayed_restart())
+        self._restart_task = asyncio.create_task(delayed_restart())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/modes/local/handlers.py` around lines 382 - 390, The fire-and-forget
task created with asyncio.create_task(delayed_restart()) should be stored and
monitored to avoid silent lost exceptions; assign the returned Task to a
persistent variable (e.g., self._restart_task or a module-level _restart_task)
where UpgradeHandler runs, and attach a done callback that logs any exception
(task.add_done_callback(lambda t: _log_exception(t))). Ensure you replace the
current bare asyncio.create_task call with storing the Task and adding the
callback so the task isn't garbage-collected silently and exceptions get logged.
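The store-plus-callback pattern can be shown end to end — the task body and callback name below are illustrative, not the handler's actual code:

```python
import asyncio

def _log_task_exception(task: asyncio.Task) -> None:
    """Done callback: surface exceptions from fire-and-forget tasks."""
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also silences the GC warning
    if exc is not None:
        print(f"background task failed: {exc!r}")

async def main():
    async def delayed_restart():
        await asyncio.sleep(0)
        raise RuntimeError("restart failed")

    # Keep a reference so the task is not garbage-collected mid-flight,
    # and attach a callback so its exception is logged rather than lost.
    restart_task = asyncio.create_task(delayed_restart())
    restart_task.add_done_callback(_log_task_exception)
    await asyncio.sleep(0.01)  # let the task run to completion
    return restart_task

task = asyncio.run(main())
```

In the handler, the reference would live on `self` (e.g. `self._restart_task`) so it survives as long as the handler does.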
chat_shell/chat_shell/tools/sandbox/_base.py (2)

469-483: Parameter `format` shadows a Python builtin.

The format parameter name shadows Python's built-in format() function. While it won't cause runtime issues here, it's flagged by static analysis (Ruff A002) and can confuse readers.

♻️ Rename to `file_format`
     async def read_file_via_device(
         self,
         file_path: str,
-        format: str = "text",
+        file_format: str = "text",
     ) -> dict[str, Any]:
         """Read a file through the backend's device sandbox bridge."""
         return await self._post_device_backend(
             endpoint=DEVICE_READ_FILE_ENDPOINT,
             payload={
                 "file_path": file_path,
-                "format": format,
+                "format": file_format,
             },

Also applies to write_file_via_device at line 505.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` around lines 469 - 483, Rename
the parameter `format` to `file_format` in the methods read_file_via_device and
write_file_via_device to avoid shadowing Python's built-in; update the function
signatures and any internal references so the payload still sends the API field
"format": use payload={"file_path": file_path, "format": file_format} (and for
write_file_via_device include its other payload fields), and adjust the
docstrings/parameter names to `file_format` so static analysis (Ruff A002) and
readers are satisfied while the external API key remains "format".

413-448: Silent failure when loading device binding.

The ensure_device_binding_loaded method catches all exceptions and only logs a warning (lines 434-441). If the backend is temporarily unavailable, the task will silently fall back to cloud sandbox instead of retrying or surfacing the error.

This may be intentional for resilience, but consider whether certain errors (e.g., network timeout vs. 404) should be handled differently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` around lines 413 - 448,
ensure_device_binding_loaded currently swallows all exceptions and marks
_binding_loaded true, causing silent fallback to cloud; change it to distinguish
transient network errors from permanent/not-found responses: catch
httpx.TimeoutException and httpx.NetworkError and retry a few times (or
re-raise) without setting self._binding_loaded so transient failures can be
retried, but for httpx.HTTPStatusError inspect response.status_code and if it's
404 set _binding_loaded=True and return, otherwise log and re-raise or propagate
the error; ensure bind_device_backend is only called on successful payload
parsing and only mark _binding_loaded=True on success or definitive 404, and
include the original exception details in logs for clarity.
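The transient-vs-definitive split described above can be sketched synchronously with stand-in exception classes (the real code would catch the corresponding `httpx` types and be async):

```python
class TimeoutException(Exception):
    """Stand-in for a transient httpx error."""

class HTTPStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError carrying a status code."""
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

class BindingLoader:
    def __init__(self, fetch):
        self._fetch = fetch      # callable returning the binding payload
        self.binding = None
        self.loaded = False      # set only on success or a definitive 404

    def ensure_loaded(self, retries=3):
        for _ in range(retries):
            try:
                self.binding = self._fetch()
                self.loaded = True
                return
            except TimeoutException:
                continue         # transient: retry, leave loaded False
            except HTTPStatusError as exc:
                if exc.status_code == 404:
                    self.loaded = True  # definitively no binding: cloud fallback
                    return
                raise            # other statuses are real errors: propagate
        raise TimeoutException("device binding fetch failed after retries")

# Transient errors twice, then success: binding loads on the third attempt.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutException()
    return {"device_id": "d1"}

loader = BindingLoader(flaky_fetch)
loader.ensure_loaded()

def fetch_404():
    raise HTTPStatusError(404)

not_bound = BindingLoader(fetch_404)
not_bound.ensure_loaded()
```

The key property: `loaded` never flips to True on a transient failure, so a later call can retry instead of silently falling back to the cloud sandbox.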
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@chat_shell/tests/test_sandbox_manager.py`:
- Around line 1-3: Replace the incorrect copyright header in the
test_sandbox_manager module: change the SPDX header text that currently says
"Weibo, Inc." to "WeCode, Inc." so it matches other files in the PR; update the
top-of-file comment block in chat_shell/tests/test_sandbox_manager.py to use
"WeCode, Inc." while keeping the SPDX-License-Identifier unchanged.

In `@executor/modes/local/handlers.py`:
- Around line 896-907: The _normalize_path function currently returns absolute
paths unchanged, allowing access outside the user's home; change it to resolve
the final filesystem path (use os.path.realpath or equivalent) after mapping
"/home/user" prefixes and ensure the resolved path is within the user's home
directory by comparing with os.path.commonpath(home_dir, resolved_path); if the
resolved path falls outside home, either raise a ValueError/PermissionError or
sanitize it to a safe location under home (pick consistent behavior for
LocalDeviceSandbox), and ensure inputs with ".." are normalized before the
containment check to prevent traversal attacks.
- Around line 913-936: The recursive walk function (walk) currently calls
entry.stat(), entry.is_symlink(), and os.readlink(entry_path) without handling
PermissionError/OSError; update walk to catch PermissionError/OSError around the
per-entry operations (stat and readlink) so inaccessible entries are skipped
instead of crashing the whole listing: wrap the entry.stat() and any
os.readlink() calls in a try/except that skips the entry on
PermissionError/OSError (optionally continue to next iterator item), and ensure
the directory recursion (entry.is_dir(follow_symlinks=False) / walk(...)) only
runs when stat succeeded; refer to symbols walk, entry.stat, entry.is_symlink,
entry.is_dir, and os.readlink to locate and modify the code.
- Around line 547-557: Add an explicit clarifying comment immediately above the
subprocess.run call (the block invoking subprocess.run with shell=True,
cwd=resolved_working_dir, timeout=timeout_seconds) stating that shell=True is
intentional for device sandbox functionality, and document the authenticated
trust boundary: that command dispatch is gated by WebSocket connection
authentication (JWT or API key) and user-scoped device access control so
commands execute with the device owner's OS privileges; mention that maintainers
must not assume untrusted input and refer to the authentication/authorization
checks enforcing this boundary.

In `@executor/tests/test_local_sandbox_handler.py`:
- Around line 1-3: Update the copyright header at the top of the
test_local_sandbox_handler.py file to match the project's standard by replacing
"Weibo, Inc." with "WeCode, Inc." so it is consistent with the other test files;
ensure both SPDX-FileCopyrightText and SPDX-License-Identifier lines remain
intact and only the organization string is changed.

---

Duplicate comments:
In `@backend/app/api/endpoints/internal/devices.py`:
- Around line 154-183: The internal endpoint
execute_device_sandbox_command_internal currently accepts a caller-controlled
user_id from InternalDeviceSandboxExecRequest with no auth; add an explicit
internal-auth dependency (e.g., Depends(get_internal_service) or
Depends(require_internal_token)) to the endpoint signature and verify the
dependency before using request.user_id, or alternatively remove user_id from
the request and derive the target user from the authenticated internal service
context; update the router/endpoint signature to include that dependency and
fail with HTTP 401/403 if the internal auth check fails so only trusted services
can invoke device_sandbox_service.execute_command.

---

Nitpick comments:
In `@backend/app/services/device_sandbox_service.py`:
- Around line 231-248: The DeviceSandboxService currently calls sio.call(...)
and immediately raises DeviceSandboxError on any Exception; add
retry-with-exponential-backoff around the sio.call invocation to handle
transient network errors: implement a small retry loop (e.g., 3 attempts) around
sio.call in the function containing that code, catching only transient
exceptions (network/timeouts) and sleeping with backoff between attempts, log
each retry via logger.error/warn including user_id, target_device_id and
event_name, and only raise DeviceSandboxError after all attempts fail (preserve
original exception as the cause); keep existing timeout logic and ensure you
still pass the same parameters (event_name, payload, to=socket_id,
namespace="/local-executor", timeout=...) when retrying.
- Around line 60-78: The parameter name format in read_file and write_file
shadows the built-in; rename it to file_format in both function signatures and
all internal references (including type hints and default value), and update the
payload construction passed to _execute_device_event to use the original
"format" key set to file_format (e.g., payload={"file_path": file_path,
"format": file_format}); also update any local variable uses and external
callsites to use the new parameter name (functions: read_file, write_file, and
any tests or callers).

In `@chat_shell/chat_shell/agents/graph_builder.py`:
- Line 278: Replace list concatenation using list(lc_messages) + [item] with
iterable unpacking for clarity and to satisfy Ruff RUF005; specifically update
the construction of final_messages where final_messages = list(lc_messages) +
[HumanMessage(content=system_notice)] to use [*lc_messages,
HumanMessage(content=system_notice)], and apply the same change at the two other
occurrences referenced (the similar constructions at lines around 300 and 316)
so each uses [*lc_messages, ...] instead of list(...) + [...] in the
GraphBuilder/graph building code.
- Around line 289-292: The streaming helper _stream_final_response_without_tools
currently yields any dict with a "text" key, while
_build_final_state_without_tools only accepts dicts where part.get("type") ==
"text"; update _stream_final_response_without_tools to mirror the type-checked
behavior: when encountering a dict, check part.get("type") == "text" before
extracting part.get("text", "") and yielding it so its behavior aligns with
get_final_content / on_chain_end and the non-streaming recovery path.

In `@chat_shell/chat_shell/tools/sandbox/_base.py`:
- Around line 469-483: Rename the parameter `format` to `file_format` in the
methods read_file_via_device and write_file_via_device to avoid shadowing
Python's built-in; update the function signatures and any internal references so
the payload still sends the API field "format": use payload={"file_path":
file_path, "format": file_format} (and for write_file_via_device include its
other payload fields), and adjust the docstrings/parameter names to
`file_format` so static analysis (Ruff A002) and readers are satisfied while the
external API key remains "format".
- Around line 413-448: ensure_device_binding_loaded currently swallows all
exceptions and marks _binding_loaded true, causing silent fallback to cloud;
change it to distinguish transient network errors from permanent/not-found
responses: catch httpx.TimeoutException and httpx.NetworkError and retry a few
times (or re-raise) without setting self._binding_loaded so transient failures
can be retried, but for httpx.HTTPStatusError inspect response.status_code and
if it's 404 set _binding_loaded=True and return, otherwise log and re-raise or
propagate the error; ensure bind_device_backend is only called on successful
payload parsing and only mark _binding_loaded=True on success or definitive 404,
and include the original exception details in logs for clarity.

In `@executor/modes/local/handlers.py`:
- Around line 769-775: The HTTP download call using requests.get with
download_url, auth_token, timeout_seconds and stream=True may send Authorization
in cleartext for http:// URLs; update the code around requests.get/response in
executor/modes/local/handlers.py (where download_url and auth_token are used) to
enforce or require HTTPS (e.g., validate download_url.startswith("https://") and
raise or abort for non-HTTPS), or at minimum log a clear warning before making
the request when the URL scheme is http://; ensure any decision leaves the
existing response.raise_for_status() flow intact.
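A minimal sketch of the scheme check suggested above — `require_https` is a hypothetical helper, not an existing function in handlers.py; the real code would call it just before `requests.get(download_url, ..., stream=True)`:

```python
from urllib.parse import urlparse

def require_https(download_url: str, *, allow_http: bool = False) -> str:
    """Reject (or warn about) non-HTTPS URLs before attaching an auth header."""
    scheme = urlparse(download_url).scheme.lower()
    if scheme == "https":
        return download_url
    if scheme == "http" and allow_http:
        # At minimum, emit a clear warning before making the request.
        print(f"WARNING: sending auth token over cleartext http: {download_url}")
        return download_url
    raise ValueError(
        f"refusing to send Authorization over {scheme or 'relative'} URL"
    )
```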
- Around line 382-390: The fire-and-forget task created with
asyncio.create_task(delayed_restart()) should be stored and monitored to avoid
silent lost exceptions; assign the returned Task to a persistent variable (e.g.,
self._restart_task or a module-level _restart_task) where UpgradeHandler runs,
and attach a done callback that logs any exception
(task.add_done_callback(lambda t: _log_exception(t))). Ensure you replace the
current bare asyncio.create_task call with storing the Task and adding the
callback so the task isn't garbage-collected silently and exceptions get logged.
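The monitored fire-and-forget pattern described above can be sketched as follows. `UpgradeHandler` and `_restart_task` follow the names in the prompt; `delayed_restart` here is a stand-in coroutine, not the project's actual restart logic:

```python
import asyncio
import logging

logger = logging.getLogger("upgrade")

class UpgradeHandler:
    def __init__(self):
        self._restart_task = None  # persistent reference prevents GC

    def schedule_restart(self, delayed_restart) -> None:
        # Store the Task and attach a done callback so exceptions are
        # logged instead of being lost silently.
        self._restart_task = asyncio.create_task(delayed_restart())
        self._restart_task.add_done_callback(self._log_task_result)

    @staticmethod
    def _log_task_result(task) -> None:
        if task.cancelled():
            return
        exc = task.exception()  # retrieving it also suppresses the
        if exc is not None:     # "exception was never retrieved" warning
            logger.error("delayed restart failed: %r", exc)

async def main():
    handler = UpgradeHandler()

    async def failing_restart():
        raise RuntimeError("boom")

    handler.schedule_restart(failing_restart)
    await asyncio.sleep(0)  # let the scheduled task run
    return handler._restart_task

task = asyncio.run(main())
```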

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d1b45e7-897d-4c2c-844b-ecc6518ecd7d

📥 Commits

Reviewing files that changed from the base of the PR and between fdd85e6 and 0f52fef.

📒 Files selected for processing (21)
  • backend/app/api/api.py
  • backend/app/api/endpoints/internal/devices.py
  • backend/app/services/device_sandbox_service.py
  • backend/init_data/skills/sandbox/command_tool.py
  • backend/init_data/skills/sandbox/download_attachment_tool.py
  • backend/init_data/skills/sandbox/list_files_tool.py
  • backend/init_data/skills/sandbox/read_file_tool.py
  • backend/init_data/skills/sandbox/upload_attachment_tool.py
  • backend/init_data/skills/sandbox/write_file_tool.py
  • backend/tests/services/test_device_sandbox_service.py
  • chat_shell/.env.example
  • chat_shell/chat_shell/agent.py
  • chat_shell/chat_shell/agents/graph_builder.py
  • chat_shell/chat_shell/api/schemas.py
  • chat_shell/chat_shell/core/config.py
  • chat_shell/chat_shell/tools/sandbox/_base.py
  • chat_shell/tests/test_sandbox_manager.py
  • executor/modes/local/events.py
  • executor/modes/local/handlers.py
  • executor/modes/local/runner.py
  • executor/tests/test_local_sandbox_handler.py
✅ Files skipped from review due to trivial changes (4)
  • chat_shell/chat_shell/core/config.py
  • chat_shell/chat_shell/agent.py
  • chat_shell/.env.example
  • chat_shell/chat_shell/api/schemas.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • executor/modes/local/runner.py
  • backend/app/api/api.py
  • executor/modes/local/events.py
  • backend/tests/services/test_device_sandbox_service.py

Comment on lines +1 to +3
# SPDX-FileCopyrightText: 2025 Weibo, Inc.
#
# SPDX-License-Identifier: Apache-2.0
⚠️ Potential issue | 🟡 Minor

Copyright header inconsistency.

The copyright header shows "Weibo, Inc." but other files in this PR use "WeCode, Inc." This should be corrected for consistency.

🔧 Proposed fix
-# SPDX-FileCopyrightText: 2025 Weibo, Inc.
+# SPDX-FileCopyrightText: 2025 WeCode, Inc.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/tests/test_sandbox_manager.py` around lines 1 - 3, Replace the
incorrect copyright header in the test_sandbox_manager module: change the SPDX
header text that currently says "Weibo, Inc." to "WeCode, Inc." so it matches
other files in the PR; update the top-of-file comment block in
chat_shell/tests/test_sandbox_manager.py to use "WeCode, Inc." while keeping the
SPDX-License-Identifier unchanged.

Comment on lines +547 to +557
try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
cwd=resolved_working_dir,
timeout=timeout_seconds,
encoding="utf-8",
errors="replace",
)
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed against wecode-ai/Wegent to verify that SandboxHandler events are only dispatched from trusted backend sources and to check how the WebSocket namespace is authenticated (an output length of 42 indicates no matches):

  • rg -n "local-executor" --type py -C 3 → output length 11955
  • cat -n backend/app/api/ws/device_namespace.py | head -100 → 4087
  • grep -n "async def on_connect" backend/app/api/ws/device_namespace.py -A 30 → 1285
  • grep -n "async def on_connect" backend/app/api/ws/device_namespace.py -A 80 | head -100 → 3492
  • grep -n "task:execute" backend/app/api/ws/device_namespace.py -B 5 -A 30 → 1343
  • grep -n "def on_task_execute\|@self.on\|\.on('task:execute'" backend/app/api/ws/device_namespace.py → 42
  • grep -n "task:execute" backend/app/api/ws/device_namespace.py → 108
  • grep -n "task:execute\|emit.*task:execute" backend/app/services/device_sandbox_service.py -B 3 -A 3 → 42
  • rg "task:execute" -A 5 -B 5 --type py → 5009
  • rg "device:.*device-1\|room.*device" backend/app/services/ -A 2 -B 2 | head -60 → 42
  • rg "emit.*task:execute\|sio.emit.*task" backend/app/ -B 3 -A 3 --type py → 42
  • rg "to=.*room\|namespace.*local-executor" backend/app/services/ -B 5 -A 5 --type py | head -100 → 42
  • grep -rn "sio.emit\|socketio.*emit" backend/app/services/device_sandbox_service.py -B 3 -A 3 → 42
  • cat -n backend/app/services/device_sandbox_service.py | grep -A 20 "def execute_on_device\|def dispatch" → 42
  • ls -la backend/app/services/device_sandbox_service.py && wc -l backend/app/services/device_sandbox_service.py → 207
  • cat -n backend/app/services/device_sandbox_service.py → 18529
  • grep -rn "sandbox:exec\|sandbox:read_file" backend/app/api/ws/device_namespace.py -A 20 | head -80 → 42

Add clarifying comment about authenticated trust boundary for shell=True usage.

Using subprocess.run(..., shell=True) is intentional for device sandbox functionality. However, the trust boundary relies on:

  1. WebSocket connection authentication (JWT token or API Key required at connect)
  2. User-scoped device access control (events dispatched only to authenticated user's devices)

While this authentication and access control properly restricts command dispatch to trusted users, adding an explicit code comment documenting the authentication/authorization boundary would improve clarity for future maintainers. Consider noting that commands run with the device owner's OS privileges and that access is gated by user authentication.

🧰 Tools
🪛 Ruff (0.15.6)

[error] 548-548: subprocess call with shell=True identified, security issue

(S602)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/modes/local/handlers.py` around lines 547 - 557, Add an explicit
clarifying comment immediately above the subprocess.run call (the block invoking
subprocess.run with shell=True, cwd=resolved_working_dir,
timeout=timeout_seconds) stating that shell=True is intentional for device
sandbox functionality, and document the authenticated trust boundary: that
command dispatch is gated by WebSocket connection authentication (JWT or API
key) and user-scoped device access control so commands execute with the device
owner's OS privileges; mention that maintainers must not assume untrusted input
and refer to the authentication/authorization checks enforcing this boundary.

Comment on lines +896 to +907
def _normalize_path(self, path: str) -> str:
"""Map sandbox-style paths onto the local device home directory."""
home_dir = os.path.expanduser("~")
normalized = os.path.expanduser(path)
if not os.path.isabs(normalized):
return os.path.join(home_dir, normalized)
if normalized == "/home/user":
return home_dir
if normalized.startswith("/home/user/"):
suffix = normalized[len("/home/user/") :]
return os.path.join(home_dir, suffix)
return normalized
⚠️ Potential issue | 🟡 Minor

Path normalization may allow path traversal outside home directory.

The _normalize_path method only translates /home/user prefixes to the actual home directory. Absolute paths that don't start with /home/user are returned unchanged (line 907), allowing access to any filesystem location (e.g., /etc/passwd, /var/log).

If this is intentional (full device access), document this trust model. If device-sandbox should be confined to the user's home directory, add validation.

🛡️ Optional: Confine paths to home directory
     def _normalize_path(self, path: str) -> str:
         """Map sandbox-style paths onto the local device home directory."""
         home_dir = os.path.expanduser("~")
         normalized = os.path.expanduser(path)
         if not os.path.isabs(normalized):
             return os.path.join(home_dir, normalized)
         if normalized == "/home/user":
             return home_dir
         if normalized.startswith("/home/user/"):
             suffix = normalized[len("/home/user/") :]
             return os.path.join(home_dir, suffix)
-        return normalized
+        # For full device access, allow any absolute path
+        # To confine: raise ValueError or return os.path.join(home_dir, normalized.lstrip('/'))
+        return normalized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/modes/local/handlers.py` around lines 896 - 907, The _normalize_path
function currently returns absolute paths unchanged, allowing access outside the
user's home; change it to resolve the final filesystem path (use
os.path.realpath or equivalent) after mapping "/home/user" prefixes and ensure
the resolved path is within the user's home directory by checking
os.path.commonpath([home_dir, resolved_path]) == home_dir; if the resolved path
falls outside home, either raise a ValueError/PermissionError or sanitize it to
a safe location under home (pick consistent behavior for LocalDeviceSandbox),
and ensure inputs with ".." are normalized before the containment check to
prevent traversal attacks.
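A hedged sketch of the confinement option described above: map the "/home/user" prefixes as the quoted _normalize_path does, then resolve and verify containment. `normalize_and_confine` is a hypothetical helper, not existing code in handlers.py:

```python
import os

def normalize_and_confine(path: str, home_dir: str) -> str:
    """Normalize a sandbox-style path and reject anything escaping home_dir."""
    normalized = os.path.expanduser(path)
    if not os.path.isabs(normalized):
        normalized = os.path.join(home_dir, normalized)
    elif normalized == "/home/user":
        normalized = home_dir
    elif normalized.startswith("/home/user/"):
        normalized = os.path.join(home_dir, normalized[len("/home/user/"):])
    # Resolve symlinks and ".." components BEFORE the containment check.
    resolved = os.path.realpath(normalized)
    home_resolved = os.path.realpath(home_dir)
    if os.path.commonpath([home_resolved, resolved]) != home_resolved:
        raise PermissionError(f"path escapes home directory: {path}")
    return resolved
```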

Comment on lines +913 to +936
def walk(current_path: str, remaining_depth: int) -> None:
with os.scandir(current_path) as iterator:
for entry in iterator:
entry_path = entry.path
stat_result = entry.stat(follow_symlinks=False)
entries.append(
{
"name": entry.name,
"path": entry_path,
"type": self._entry_type(entry),
"size": stat_result.st_size,
"permissions": oct(stat_result.st_mode & 0o777),
"owner": self._resolve_owner(stat_result.st_uid),
"group": self._resolve_group(stat_result.st_gid),
"modified_time": self._iso_mtime(entry_path),
**(
{"symlink_target": os.readlink(entry_path)}
if entry.is_symlink()
else {}
),
}
)
if remaining_depth > 1 and entry.is_dir(follow_symlinks=False):
walk(entry_path, remaining_depth - 1)
⚠️ Potential issue | 🟡 Minor

Recursive directory walk can raise PermissionError on inaccessible entries.

The walk function iterates over os.scandir() and calls entry.stat() without handling PermissionError. If a directory contains entries the user cannot stat (e.g., protected system files), this will raise an unhandled exception and fail the entire listing.

🛡️ Proposed fix to skip inaccessible entries
         def walk(current_path: str, remaining_depth: int) -> None:
-            with os.scandir(current_path) as iterator:
-                for entry in iterator:
-                    entry_path = entry.path
-                    stat_result = entry.stat(follow_symlinks=False)
+            try:
+                scanner = os.scandir(current_path)
+            except PermissionError:
+                return
+            with scanner as iterator:
+                for entry in iterator:
+                    entry_path = entry.path
+                    try:
+                        stat_result = entry.stat(follow_symlinks=False)
+                    except (PermissionError, OSError):
+                        continue  # Skip inaccessible entries
                     entries.append(
                         {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/modes/local/handlers.py` around lines 913 - 936, The recursive walk
function (walk) currently calls entry.stat(), entry.is_symlink(), and
os.readlink(entry_path) without handling PermissionError/OSError; update walk to
catch PermissionError/OSError around the per-entry operations (stat and
readlink) so inaccessible entries are skipped instead of crashing the whole
listing: wrap the entry.stat() and any os.readlink() calls in a try/except that
skips the entry on PermissionError/OSError (optionally continue to next iterator
item), and ensure the directory recursion (entry.is_dir(follow_symlinks=False) /
walk(...)) only runs when stat succeeded; refer to symbols walk, entry.stat,
entry.is_symlink, entry.is_dir, and os.readlink to locate and modify the code.

Comment on lines +1 to +3
# SPDX-FileCopyrightText: 2025 Weibo, Inc.
#
# SPDX-License-Identifier: Apache-2.0
⚠️ Potential issue | 🟡 Minor

Copyright header inconsistency.

Same as the other test file - the copyright header shows "Weibo, Inc." but should be "WeCode, Inc." for consistency.

🔧 Proposed fix
-# SPDX-FileCopyrightText: 2025 Weibo, Inc.
+# SPDX-FileCopyrightText: 2025 WeCode, Inc.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor/tests/test_local_sandbox_handler.py` around lines 1 - 3, Update the
copyright header at the top of the test_local_sandbox_handler.py file to match
the project's standard by replacing "Weibo, Inc." with "WeCode, Inc." so it is
consistent with the other test files; ensure both SPDX-FileCopyrightText and
SPDX-License-Identifier lines remain intact and only the organization string is
changed.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
chat_shell/chat_shell/tools/sandbox/_base.py (3)

19-19: Unused import: base64

The base64 module is imported but does not appear to be used anywhere in this file. Consider removing it if it's not needed.

Proposed fix
 import asyncio
-import base64
 import logging
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` at line 19, Remove the unused
top-level import "base64" from chat_shell.tools.sandbox._base (the import
statement 'import base64') since it's not referenced anywhere in this module;
delete the import line and run the linter/tests to confirm nothing else relies
on it, and if the module is needed in future usage, re-add it at the point of
actual use.

494-508: Parameter format shadows Python builtin.

The format parameter in read_file_via_device (Line 497) and write_file_via_device (Line 530) shadows the Python builtin format() function. Consider renaming to file_format or content_format for clarity.

Proposed fix for `read_file_via_device`
     async def read_file_via_device(
         self,
         file_path: str,
-        format: str = "text",
+        file_format: str = "text",
     ) -> dict[str, Any]:
         """Read a file through the backend's device sandbox bridge."""
         return await self._post_device_backend(
             endpoint=DEVICE_READ_FILE_ENDPOINT,
             payload={
                 "file_path": file_path,
-                "format": format,
+                "format": file_format,
             },
             timeout_seconds=60,
             unwrap_data=True,
         )
Proposed fix for `write_file_via_device`
     async def write_file_via_device(
         self,
         file_path: str,
         content: str,
-        format: str = "text",
+        file_format: str = "text",
         create_dirs: bool = True,
     ) -> dict[str, Any]:
         """Write a file through the backend's device sandbox bridge."""
         return await self._post_device_backend(
             endpoint=DEVICE_WRITE_FILE_ENDPOINT,
             payload={
                 "file_path": file_path,
                 "content": content,
-                "format": format,
+                "format": file_format,
                 "create_dirs": create_dirs,
             },
             timeout_seconds=60,
             unwrap_data=True,
         )

Also applies to: 526-544

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` around lines 494 - 508, The
parameter name `format` in read_file_via_device and write_file_via_device
shadows the Python builtin; rename it to file_format (or content_format) in both
function signatures, update the docstrings and type hints accordingly, replace
all internal uses of `format` with `file_format` (including the payload key
value passed to _post_device_backend) and update any local callers/tests that
call these methods to use the new parameter name; ensure the JSON payload still
sends the expected "format" key name if the backend requires that (i.e.,
payload={"file_path": file_path, "format": file_format}) so only the Python
parameter is renamed while the external contract remains unchanged.

438-473: Consider narrowing exception handling and adding tracing.

  1. The broad Exception catch (Line 459) could mask unexpected errors. Consider catching specific exceptions like httpx.HTTPError or httpx.TimeoutException.

  2. Per coding guidelines, async functions should use @trace_async decorator for OpenTelemetry tracing.

Proposed refinement
+    @trace_async("ensure_device_binding_loaded", "sandbox_manager")
     async def ensure_device_binding_loaded(self) -> None:
         """Load sticky device binding from backend once per task if present."""
         if self._binding_loaded or self.is_device_backend_bound():
             return
 
         backend_url = _get_backend_api_url()
         headers = {}
         if self.auth_token:
             headers["Authorization"] = f"Bearer {self.auth_token}"
 
         binding_url = f"{backend_url}{DEVICE_BINDING_ENDPOINT_TEMPLATE.format(task_id=self.task_id)}"
 
         try:
             async with httpx.AsyncClient(timeout=30) as client:
                 response = await client.get(
                     binding_url,
                     params={"user_id": self.user_id},
                     headers=headers,
                 )
                 response.raise_for_status()
                 payload = response.json()
-        except Exception as exc:
+        except (httpx.HTTPError, httpx.TimeoutException) as exc:
             logger.warning(
                 "[SandboxManager] Failed to load task sandbox binding: task_id=%s, error=%s",
                 self.task_id,
                 exc,
             )
             self._binding_loaded = True
             return

As per coding guidelines: "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chat_shell/chat_shell/tools/sandbox/_base.py` around lines 438 - 473, The
try/except in ensure_device_binding_loaded is too broad and the async function
lacks OpenTelemetry tracing; narrow the exception handling to httpx exceptions
(e.g., httpx.HTTPError, httpx.TimeoutException) when calling
client.get/response.raise_for_status and handle other errors separately, and add
the `@trace_async` decorator (use trace_async with an appropriate span name and
tracer name and supply extract_attributes to include task_id/user_id) above
ensure_device_binding_loaded so the whole coroutine is traced; keep the existing
behavior of setting self._binding_loaded True on handled backend-related network
failures and still call bind_device_backend(device_id) when payload indicates a
device backend.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@chat_shell/chat_shell/tools/sandbox/_base.py`:
- Line 19: Remove the unused top-level import "base64" from
chat_shell.tools.sandbox._base (the import statement 'import base64') since it's
not referenced anywhere in this module; delete the import line and run the
linter/tests to confirm nothing else relies on it, and if the module is needed
in future usage, re-add it at the point of actual use.
- Around line 494-508: The parameter name `format` in read_file_via_device and
write_file_via_device shadows the Python builtin; rename it to file_format (or
content_format) in both function signatures, update the docstrings and type
hints accordingly, replace all internal uses of `format` with `file_format`
(including the payload key value passed to _post_device_backend) and update any
local callers/tests that call these methods to use the new parameter name;
ensure the JSON payload still sends the expected "format" key name if the
backend requires that (i.e., payload={"file_path": file_path, "format":
file_format}) so only the Python parameter is renamed while the external
contract remains unchanged.
- Around line 438-473: The try/except in ensure_device_binding_loaded is too
broad and the async function lacks OpenTelemetry tracing; narrow the exception
handling to httpx exceptions (e.g., httpx.HTTPError, httpx.TimeoutException)
when calling client.get/response.raise_for_status and handle other errors
separately, and add the `@trace_async` decorator (use trace_async with an
appropriate span name and tracer name and supply extract_attributes to include
task_id/user_id) above ensure_device_binding_loaded so the whole coroutine is
traced; keep the existing behavior of setting self._binding_loaded True on
handled backend-related network failures and still call
bind_device_backend(device_id) when payload indicates a device backend.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a916e41b-0d63-45fd-a06a-ad967966ff4c

📥 Commits

Reviewing files that changed from the base of the PR and between 0f52fef and d375577.

📒 Files selected for processing (4)
  • chat_shell/.env.example
  • chat_shell/chat_shell/core/config.py
  • chat_shell/chat_shell/tools/sandbox/_base.py
  • chat_shell/tests/test_sandbox_manager.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • chat_shell/chat_shell/core/config.py
  • chat_shell/tests/test_sandbox_manager.py
  • chat_shell/.env.example
