1 change: 1 addition & 0 deletions .gitignore
@@ -237,3 +237,4 @@ demo_data/demos25

demo_data/libero_spatial_no_noops_1.0.0_lerobot
experiments/test
dev/
6 changes: 6 additions & 0 deletions .gitmodules
@@ -1,3 +1,9 @@
[submodule "experiments/7_franka/deoxys_control"]
path = experiments/7_franka/deoxys_control
url = https://github.com/UT-Austin-RPL/deoxys_control.git
[submodule "experiments/5_widowx/bridge_data_robot"]
path = experiments/5_widowx/bridge_data_robot
url = https://github.com/HaomingSong/bridge_data_robot.git
[submodule "experiments/5_widowx/edgeml"]
path = experiments/5_widowx/edgeml
url = https://github.com/youliangtan/edgeml.git
104 changes: 104 additions & 0 deletions experiments/5_widowx/README.md
@@ -0,0 +1,104 @@
# WidowX 250s with EO-1

This directory contains the implementation for controlling WidowX 250s robots with the EO-1 model, enabling real-time manipulation driven by vision-language-action inference.

## 🚀 Quick Start

### Prerequisites

**Hardware Requirements:**

- WidowX 250s robot arm
- RealSense D435 camera (or compatible RGB camera)
- Compute options:
- Single GPU workstation (runs both ROS control and model inference)
- OR: NUC + GPU workstation (NUC for arm control, workstation for model inference)

**Software Requirements:**

- Ubuntu 20.04+ with CUDA support
- Python 3.10+
- Docker (recommended for running the WidowX ROS control node on a workstation in single-machine mode)
- BridgeData WidowX controller stack properly configured

Notes on architecture:

- `Single-machine mode`: Run the WidowX ROS control node in Docker on the same GPU workstation used for EO-1 inference.
- `Dual-machine mode`: Use a NUC for robot control and a GPU workstation for model inference. For WidowX, the NUC does not require a real-time kernel in this setup.

### Installation

1. **Set up submodules:**

```bash
git submodule update --init --recursive experiments/5_widowx/bridge_data_robot
git submodule update --init --recursive experiments/5_widowx/edgeml
```

2. **Configure robot control system:**
   Follow the BridgeData WidowX controller setup in [bridge_data_robot](https://github.com/HaomingSong/bridge_data_robot?tab=readme-ov-file#setup) to configure your NUC/workstation for WidowX 250s control.

3. **Install dependencies on the workstation:**

```bash
# Create conda environment
conda create -n eo python=3.10
conda activate eo

# Install WidowX envs for workstation
pip install -e experiments/5_widowx/bridge_data_robot/widowx_envs
pip install -e experiments/5_widowx/edgeml

# Install additional requirements
pip install -r experiments/5_widowx/requirements.txt
```

**Note**: In dual-machine mode, ensure the workstation can reach the control host (robot IP/port) over the network. In single-machine mode, ensure Docker has access to USB and camera devices.
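
To check that the control service is reachable before launching a full rollout, a minimal sketch along these lines can help. It assumes the `WidowXClient` API exposed by `widowx_envs.widowx_env_service` in `bridge_data_robot`; the host, port, and image size below are just the defaults used later in this README and may differ on your setup.

```python
# Connectivity check (sketch): adjust host/port to your control host.
from widowx_envs.widowx_env_service import WidowXClient, WidowXConfigs

client = WidowXClient(host="10.6.8.122", port=5556)
client.init(WidowXConfigs.DefaultEnvParams.copy(), image_size=224)

obs = client.get_observation()  # may be None until the first camera frame arrives
print("Service reachable" if obs is not None else "No observation yet - check network/camera")
```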

## 🤖 Running Robot Control

### Basic Usage

```bash
python experiments/5_widowx/eval_widowx.py \
--model-path "path/to/your/model" \
--repo-id libero_spatial_no_noops_1.0.0_lerobot \
--default-instruction "Put the eggplant in the basket" \
--robot-ip 10.6.8.122 \
--robot-port 5556 \
--max-timesteps 120
```

### Parameters

| Parameter | Description | Default |
| ----------------------- | ----------------------------------------- | -------------------------------- |
| `--model-path` | Path to the trained EO-1 model checkpoint | Required |
| `--repo-id` | Dataset/repo ID for task specification | Required |
| `--default-instruction` | Default natural language instruction | "Put the eggplant in the basket" |
| `--roll-out-path` | Directory to save rollouts/videos | experiments/5_widowx/logs |
| `--max-timesteps` | Maximum number of control steps | 120 |
| `--im-size` | Image size for model input | 224 |
| `--action-horizon`      | Receding-horizon (RHC) execution steps (see the sketch after this table) | 2                                |
| `--blocking` | Use blocking control for step execution | False |
| `--robot-ip` | Robot/control host IP | 10.6.8.122 |
| `--robot-port` | Robot/control host port | 5556 |
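
The `--action-horizon` value controls receding-horizon execution: each model call predicts a chunk of actions, only the first `action-horizon` of them are executed on the robot, and the policy then replans from the new observation. The loop is roughly the following (a schematic sketch with hypothetical `env_reset`/`policy`/`env_step` helpers, not the actual `RHCWrapper` implementation):

```python
# Schematic receding-horizon control loop (illustration only).
action_horizon = 2
max_timesteps = 120

obs = env_reset()
for _ in range(max_timesteps // action_horizon):
    action_chunk = policy(obs)              # predicts >= action_horizon future actions
    for action in action_chunk[:action_horizon]:
        obs = env_step(action)              # execute only the first action_horizon steps
    # the remaining predicted actions are discarded and the policy replans
```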

### Camera Configuration

- Default color topic for RealSense D435 is `/D435/color/image_raw` (see `CAMERA_TOPICS` in `eval_widowx.py`).
- Mount and wire the D435 according to the hardware guide: [BridgeData V2 Hardware Setup](https://docs.google.com/document/d/1si-6cTElTWTgflwcZRPfgHU7-UwfCUkEztkH3ge5CGc/edit?tab=t.0).
- If your camera topic differs, update `CAMERA_TOPICS` or the controller configuration accordingly (see the sketch below).
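
For example, a different ROS color topic can be wired in by editing the constants at the top of `eval_widowx.py` (the topic name below is hypothetical):

```python
# In eval_widowx.py -- replace the topic with the one your camera publishes.
CAMERA_TOPICS = [{"name": "/camera/color/image_raw"}]
ENV_PARAMS = {
    "camera_topics": CAMERA_TOPICS,
    "override_workspace_boundaries": WORKSPACE_BOUNDS,
    "move_duration": STEP_DURATION,
}
```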

## 🔒 Safety Considerations

- Always ensure proper workspace setup and clear the workspace before operation.
- Monitor robot movements and be ready to use the emergency stop.
- Verify camera positioning and exposure for optimal visual coverage.

## 📝 Notes

- This setup uses a single external D435 stream by default; wrist camera is optional.
- Model performance depends on lighting, viewpoint, and calibration quality.
- Regular calibration of the robot and camera(s) is recommended.
- Rollouts and videos are saved under `--roll-out-path`.
1 change: 1 addition & 0 deletions experiments/5_widowx/bridge_data_robot
Submodule bridge_data_robot added at b84113
1 change: 1 addition & 0 deletions experiments/5_widowx/edgeml
Submodule edgeml added at b4b849
247 changes: 247 additions & 0 deletions experiments/5_widowx/eval_widowx.py
@@ -0,0 +1,247 @@
"""
This script shows how we evaluated a finetuned EO-1 model on a real WidowX robot; it is adapted from https://github.com/octo-models/octo/blob/main/examples/04_eval_finetuned_on_robot.py.
While the exact specifics may not be applicable to your use case, this script serves as a didactic example of how to use EO-1 in a real-world setting.

If you wish, you may reproduce these results by [reproducing the robot setup](https://rail-berkeley.github.io/bridgedata/)
and installing [the robot controller](https://github.com/HaomingSong/bridge_data_robot.git).
"""

import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
import dataclasses
import pathlib
import time
from datetime import datetime

import cv2
import imageio
import numpy as np
import pandas as pd
import torch
import tqdm
import tyro
from PIL import Image
from transformers import AutoModel, AutoProcessor
from widowx_env import RHCWrapper, WidowXGym
from widowx_envs.widowx_env_service import WidowXConfigs


@dataclasses.dataclass
class Args:
#################################################################################################################
# Model parameters
#################################################################################################################
im_size: int = 224
action_horizon: int = 2
model_path: str = ""
repo_id: str = ""

#################################################################################################################
# WidowX environment-specific parameters
#################################################################################################################
robot_ip: str = "10.6.8.122" # IP address of the robot
robot_port: int = 5556 # Port of the robot
initial_eep: tuple[float, float, float] = (0.3, 0.0, 0.25) # Initial position
# initial_eep: tuple[float, float, float] = (0.15, 0.0, 0.1) # Initial position
blocking: bool = False # Use the blocking controller
max_timesteps: int = 120 # Number of timesteps to run
default_instruction: str = "Put the eggplant in the basket" # Default instruction

#################################################################################################################
# Utils
#################################################################################################################
show_image: bool = False # Show image
roll_out_path: pathlib.Path = pathlib.Path("experiments/5_widowx/logs") # Path to save videos


##############################################################################
STEP_DURATION_MESSAGE = """
Bridge data was collected with non-blocking control and a step duration of 0.2s.
However, we relabel the actions to make it look like the data was collected with
blocking control and we evaluate with blocking control.
Be sure to use a step duration of 0.2 if evaluating with non-blocking control.
"""
STEP_DURATION = 0.2
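# STICKY_GRIPPER_NUM_STEPS: how many consecutive steps a gripper open/close command must
# persist before it is applied, debouncing noisy per-step predictions (assumed semantics
# of the WidowXGym wrapper).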
STICKY_GRIPPER_NUM_STEPS = 1
WORKSPACE_BOUNDS = [[0.1, -0.15, -0.01, -1.57, 0], [0.45, 0.25, 0.25, 1.57, 0]]
CAMERA_TOPICS = [{"name": "/D435/color/image_raw"}]
ENV_PARAMS = {
"camera_topics": CAMERA_TOPICS,
"override_workspace_boundaries": WORKSPACE_BOUNDS,
"move_duration": STEP_DURATION,
}

##############################################################################


def eval_bridge(args: Args) -> None:
curr_time = datetime.now().strftime("%Y_%m_%d_%H:%M:%S")
base_save_path = args.roll_out_path / pathlib.Path(args.default_instruction.replace(" ", "_")) / curr_time

# set up the widowx client
start_state = np.concatenate([args.initial_eep, (0, 0, 0, 1)])
env_params = WidowXConfigs.DefaultEnvParams.copy()
env_params.update(ENV_PARAMS)
env_params["start_state"] = list(start_state)

env = WidowXGym(
env_params,
host=args.robot_ip,
port=args.robot_port,
im_size=args.im_size,
blocking=args.blocking,
sticky_gripper_num_steps=STICKY_GRIPPER_NUM_STEPS,
)
if not args.blocking:
assert STEP_DURATION == 0.2, STEP_DURATION_MESSAGE
    results_df = pd.DataFrame(
        columns=["instruction", "success", "duration", "video_filename", "model_path", "repo_id"]
    )

model = (
AutoModel.from_pretrained(args.model_path, dtype=torch.bfloat16, trust_remote_code=True).eval().cuda()
)

processor = AutoProcessor.from_pretrained(args.model_path, trust_remote_code=True)

    # wrap the env in RHCWrapper for receding-horizon control: execute the first
    # `action_horizon` actions of each predicted chunk, then replan (swap in
    # TemporalEnsembleWrapper here if temporal ensembling is preferred)
env = RHCWrapper(env, args.action_horizon)

while True:
# reset env
obs, _ = env.reset()
time.sleep(2.0)

if input(f"Use default instruction: {args.default_instruction}? (default y) [y/n]").lower() == "n":
instruction = input("Enter instruction: ")
else:
instruction = args.default_instruction

# do rollout
images = []
images.append(obs["full_image"])
last_tstep = time.time()
bar = tqdm.tqdm(
range(args.max_timesteps),
position=0,
leave=True,
ncols=80,
desc="Rollout steps",
)

for t_step in bar:
try:
bar.set_description(f"Step {t_step}/{args.max_timesteps}")
if args.show_image:
cv2.imshow("img_view", obs["full_image"])
cv2.waitKey(1)

                # prepare observation: resize the external camera frame for the model input
                image = cv2.resize(obs["full_image"], (256, 256), interpolation=cv2.INTER_LINEAR)
                img = Image.fromarray(image)
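                # no wrist camera in this setup: reuse the external view for the wrist slot
                # (see README notes); `proprio` supplies the robot state the processor expects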
batch = {
"observation.images.image": [img],
"observation.images.wrist_image": [img],
"observation.state": [obs["proprio"]],
"task": [str(instruction)],
"repo_id": [args.repo_id],
}
ov_out = processor.select_action(model, batch)
action_chunk = ov_out.action.squeeze(0).numpy()
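                # action_chunk is a (prediction_horizon, action_dim) array; the RHC wrapper
                # executes only the first `action_horizon` steps before replanning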

assert len(action_chunk) >= args.action_horizon, (
f"We want to replan every {args.action_horizon} steps, but policy only predicts {len(action_chunk)} steps."
)

# perform environment step
obs, _, _, truncated, infos = env.step(action_chunk)

# recording history images
for history_obs in infos["observations"]:
image = history_obs["full_image"]
images.append(image)
if truncated:
break

# match the step duration
elapsed_time = time.time() - last_tstep
if elapsed_time < STEP_DURATION:
time.sleep(STEP_DURATION - elapsed_time)

except KeyboardInterrupt:
break
time.sleep(0.2)

        # logging rollouts: ask the operator to score the rollout in [0, 1]
        success: float | None = None
        while success is None:
            response = input(
                "Did the rollout succeed? (y = full success, n = failure, or a score in [0, 1] or [0, 100]): "
            )
            try:
                if response.lower() == "y":
                    success = 1.0
                elif response.lower() == "n":
                    success = 0.0
                else:
                    success = float(response)
                    if success > 1:
                        success /= 100.0  # accept percentage-style scores
            except ValueError:
                print(f"Could not parse {response!r}, please enter y, n, or a number.")

video_save_path = (
base_save_path
/ "videos"
/ f"{datetime.now().strftime('%Y_%m_%d-%H_%M_%S')}_success_{success:.2f}.mp4"
)

if not (0 <= success <= 1):
print(f"Success must be a number in [0, 1] but got: {success}")

results_df = pd.concat(
[
results_df,
pd.DataFrame(
[
{
"instruction": instruction,
"success": success,
"duration": t_step,
"video_filename": video_save_path,
"model_path": args.model_path,
"repo_id": args.repo_id,
}
]
),
],
ignore_index=True,
)

# saving video
video = np.stack(images)
video_save_path.parent.mkdir(parents=True, exist_ok=True)
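        # 1 / STEP_DURATION is the real control rate (5 Hz); the factor of 3 plays the video back at 3x speed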
imageio.mimsave(video_save_path, video, fps=1.0 / STEP_DURATION * 3)

        if (
            input(f"Evaluated {len(results_df)} rollouts so far. Run another (default y)? [y/n] ").lower()
            == "n"
        ):
            break

# save results
csv_filename = base_save_path / "results.csv"
results_df.to_csv(csv_filename, index=False)
print(f"Results saved to {csv_filename}")
# print avg
print(f"Avg success: {results_df['success'].mean()}")


if __name__ == "__main__":
import logging

logging.basicConfig(level=logging.INFO)
args: Args = tyro.cli(Args)
eval_bridge(args)
3 changes: 3 additions & 0 deletions experiments/5_widowx/requirements.txt
@@ -0,0 +1,3 @@
gym
funcsigs
numpy==1.24.3