# 📦 Add a New Dataset

This section explains how to register and use a custom dataset with the InternManip framework.
The process involves two main steps: **[preparing the dataset format](#dataset-structure)** and **[registering it in code](#implementation-steps)**.



## Dataset Structure

All datasets must follow the [LeRobotDataset format](https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
The expected structure is:

```
<your_dataset_root>                      # Root directory of your dataset
│
├── data                                 # Structured episode data in .parquet format
│   │
│   ├── chunk-000                        # Episodes 000000 - 000999
│   │   ├── episode_000000.parquet
│   │   ├── episode_000001.parquet
│   │   └── ...
│   │
│   ├── chunk-001                        # Episodes 001000 - 001999
│   │   └── ...
│   │
│   ├── ...
│   │
│   └── chunk-00n                        # Follows the same convention (1,000 episodes per chunk)
│       └── ...
│
├── meta                                 # Metadata and statistical information
│   ├── episodes.jsonl                   # Per-episode metadata (length, subtask, etc.)
│   ├── info.json                        # Dataset-level information
│   ├── tasks.jsonl                      # Task definitions
│   ├── modality.json                    # Key dimensions and mapping information for each modality
│   └── stats.json                       # Global dataset statistics (mean, std, min, max, quantiles)
│
└── videos                               # Multi-view videos for each episode
    │
    ├── chunk-000                        # Videos for episodes 000000 - 000999
    │   ├── observation.images.head     # Head (main front-view) camera
    │   │   ├── episode_000000.mp4
    │   │   └── ...
    │   ├── observation.images.hand_left    # Left hand camera
    │   └── observation.images.hand_right   # Right hand camera
    │
    ├── chunk-001                        # Videos for episodes 001000 - 001999
    │
    ├── ...
    │
    └── chunk-00n                        # Follows the same naming and structure
```

> 💡 Note: For more detailed tutorials, please refer to the [Dataset](../tutorials/dataset.md) section.

This separation of raw data, video files, and metadata makes it easier to standardize transformations and modality handling across different datasets.
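
Before touching any code, it can be worth sanity-checking that a local copy of your dataset matches this layout. The sketch below is purely illustrative and not part of the framework; it only checks for the directories and meta files named in the tree above:

```python
from pathlib import Path

def check_layout(root: str) -> None:
    """Lightweight sanity check against the layout shown above."""
    base = Path(root)
    for sub in ("data", "meta", "videos"):
        assert (base / sub).is_dir(), f"missing directory: {sub}/"
    for name in ("episodes.jsonl", "info.json", "tasks.jsonl",
                 "modality.json", "stats.json"):
        assert (base / "meta" / name).is_file(), f"missing meta/{name}"
    n_episodes = sum(1 for _ in (base / "data").rglob("episode_*.parquet"))
    print(f"OK: {n_episodes} episode files found under data/")

check_layout("/path/to/your_dataset_root")
```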


<!-- > 💡 Note: The `episodes_stats.jsonl` file under `meta/` is optional and can be omitted. -->

## Implementation Steps

### Register a Dataset Class

Create a new dataset class under `internmanip/datasets/`, inheriting from `LeRobotDataset`:

```python
from internmanip.datasets import LeRobotDataset

class CustomDataset(LeRobotDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def load_data(self):
        # Implement custom data loading logic here
        pass
```

This class defines how to read your dataset’s raw files and convert them into a standardized format for training.
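
A quick way to verify the class is usable is to instantiate it directly. The constructor arguments below follow the upstream `LeRobotDataset` convention (a repo id plus an optional local root); treat them as assumptions and match them to how InternManip's `LeRobotDataset` is actually initialized:

```python
# Illustrative only — argument names follow the upstream LeRobotDataset
# convention and may differ in InternManip's implementation.
dataset = CustomDataset(
    repo_id="your-org/custom_dataset",   # assumed HF-style dataset id
    root="/path/to/your_dataset_root",   # local copy with the layout above
)
print(len(dataset))   # number of frames in the dataset
sample = dataset[0]   # one timestep as a dict of modality values
```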

### Define a Data Configuration

Each dataset needs a data configuration class that specifies modalities, keys, and transformations.
Create a new configuration file under `internmanip/configs/data_configs/`. Here’s a minimal example:

```python
# NOTE: the class and transform names below are provided by InternManip;
# the exact import paths here are assumptions — adjust them to match
# where your checkout exposes these symbols.
from internmanip.configs.data_configs.base import BaseDataConfig, ModalityConfig
from internmanip.datasets.transforms import (
    ConcatTransform,
    StateActionToTensor,
    StateActionTransform,
    VideoResize,
    VideoToTensor,
)


class CustomDataConfig(BaseDataConfig):
    """Data configuration for the custom dataset."""
    video_keys = ["video.rgb"]
    state_keys = ["state.pos"]
    action_keys = ["action.delta_pos"]
    language_keys = ["annotation.instruction"]

    # Temporal indices
    observation_indices = [0]          # Current timestep for observations
    action_indices = list(range(16))   # Future timesteps for actions (0-15)

    def modality_config(self) -> dict[str, ModalityConfig]:
        """Define modality configurations."""
        return {
            "video": ModalityConfig(self.observation_indices, self.video_keys),
            "state": ModalityConfig(self.observation_indices, self.state_keys),
            "action": ModalityConfig(self.action_indices, self.action_keys),
            "language": ModalityConfig(self.observation_indices, self.language_keys),
        }

    def transform(self):
        """Define preprocessing pipelines."""
        return [
            # Video preprocessing
            VideoToTensor(apply_to=self.video_keys),
            VideoResize(apply_to=self.video_keys, height=224, width=224),

            # State preprocessing
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={"state.pos": "mean_std"},
            ),

            # Action preprocessing
            StateActionToTensor(apply_to=self.action_keys),
            StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={"action.delta_pos": "mean_std"},
            ),

            # Concatenate modalities
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
        ]
```
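
The two methods split the work: `modality_config()` declares which keys to read and at which timestep offsets (here, the current frame for observations and a 16-step action chunk), while `transform()` returns the preprocessing pipeline applied to each sample. As a rough sketch of how a loader might consume the config (the call pattern is assumed, for illustration only):

```python
config = CustomDataConfig()

modalities = config.modality_config()   # per-modality keys and timestep indices
pipeline = config.transform()           # ordered list of preprocessing steps

# Assumed convention: each transform maps a sample dict to a sample dict,
# so the pipeline can be applied in order.
def preprocess(sample: dict) -> dict:
    for step in pipeline:
        sample = step(sample)
    return sample
```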

### Register Your Config

Finally, register your custom config by adding it to `DATA_CONFIG_MAP`:

```python
DATA_CONFIG_MAP = {
    ...,
    "custom": CustomDataConfig(),
}
```

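Once registered, this string key is what ties the command line to your config; a minimal lookup sketch, assuming `DATA_CONFIG_MAP` is importable from the data configs package:

```python
from internmanip.configs.data_configs import DATA_CONFIG_MAP  # import path assumed

data_config = DATA_CONFIG_MAP["custom"]
print(type(data_config).__name__)  # CustomDataConfig
```
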
> 💡 Tips: Adjust the key names (`video_keys`, `state_keys`, etc.) and `normalization_modes` based on your dataset. For multi-view video or multi-joint actions, simply add more keys and update the transforms accordingly.

This config defines how each modality is loaded and preprocessed, and ensures compatibility with the training framework.

### What's Next?

After registration, you can use your dataset by passing `--dataset_path <path>` and `--data_config custom` to the training script, alongside your training YAML file.