Commit 647a60a: Merge InternManip docs to the release branch (parent: 945ff0f)

18 files changed: +3052 −278 lines
source/en/user_guide/internmanip/index.md (1 addition, 1 deletion): in the `{toctree}` directive under the `# InternManip` heading, `:maxdepth: 3` was changed to `:maxdepth: 2`.
# 🥇 Add a New Benchmark

This guide walks you through adding a custom agent and a custom evaluation benchmark to the InternManip framework.
### 1. Implement Your Model Agent

To support a new model in InternManip, define a subclass of [`BaseAgent`](../../internmanip/agent/base.py). You must implement two core methods:

- `step()`: given an observation, returns an action.
- `reset()`: resets internal state, if needed.

**Example: Define a Custom Agent**
```python
from internmanip.agent.base import BaseAgent
from internmanip.configs import AgentCfg

class MyCustomAgent(BaseAgent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # Custom model initialization here

    def step(self, obs):
        # Implement forward logic here: map the observation to an action
        return action

    def reset(self):
        # Optional: reset internal state between episodes
        pass
```

**Register Your Agent**

In `internmanip/agent/base.py`, register your agent in the `AgentRegistry`:
```python
class AgentRegistry(Enum):
    ...
    CUSTOM = "MyCustomAgent"

    @property
    def value(self):
        if self.name == "CUSTOM":
            from internmanip.agent.my_custom_agent import MyCustomAgent
            return MyCustomAgent
        ...
```
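The registry resolves the class lazily: the agent module is only imported when `.value` is first accessed, so importing the registry never pulls in every agent's dependencies. A minimal, self-contained sketch of the same pattern (hypothetical names, independent of InternManip, with the lazy import replaced by a local stand-in class):

```python
from enum import Enum

# Stand-in for the real agent class; in InternManip it would live in its
# own module and be imported inside `value` only when needed.
class MyCustomAgent:
    pass

class Registry(Enum):
    CUSTOM = "MyCustomAgent"

    @property
    def value(self):
        # Resolve the class only when it is actually requested.
        if self.name == "CUSTOM":
            return MyCustomAgent
        raise KeyError(self.name)

agent_cls = Registry.CUSTOM.value
print(agent_cls.__name__)  # MyCustomAgent
```

Overriding the `value` property this way keeps the enum member's string value purely descriptive while the property returns the class itself.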

<!---
Define a subclass of [`BaseAgent`](../../internmanip/agent/base.py) to implement two essential methods for model reset and step functionality. An [example](../../internmanip/agent/openvla_agent.py) based on the OpenVLA policy model is provided for reference.
--->

### 2. Create a New Evaluator

To add support for a new evaluation environment, inherit from the `Evaluator` base class and implement the required methods:
```python
from typing import Any, List

from internmanip.evaluator.base import Evaluator
from internmanip.configs import EvalCfg

class CustomEvaluator(Evaluator):

    def __init__(self, config: EvalCfg):
        super().__init__(config)
        # Custom initialization logic
        ...

    @classmethod
    def _get_all_episodes_setting_data(cls, episodes_config_path) -> List[Any]:
        """Get all episodes' setting data from the given path."""
        ...

    def eval(self):
        """The default entrypoint of the evaluation pipeline."""
        ...
```
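`_get_all_episodes_setting_data` is left to the subclass, since each benchmark stores episode settings differently. Assuming the settings are stored as a JSON list (an assumption for illustration; the real format depends on your benchmark), a minimal loader might look like:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List

def load_episode_settings(episodes_config_path: str) -> List[Any]:
    # Read a JSON file expected to contain a list of per-episode settings.
    with open(episodes_config_path) as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of episode settings")
    return data

# Quick demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "episodes.json"
    path.write_text(json.dumps([{"episode_id": 0, "scene": "kitchen"}]))
    print(load_episode_settings(str(path)))  # [{'episode_id': 0, 'scene': 'kitchen'}]
```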

### 3. Register the Evaluator

Register the new evaluator in the `EvaluatorRegistry` in `internmanip/evaluator/base.py`:
```python
# In internmanip/evaluator/base.py
class EvaluatorRegistry(Enum):
    ...
    CUSTOM = "CustomEvaluator"  # Add the new evaluator

    @property
    def value(self):
        if self.name == "CUSTOM":
            from internmanip.evaluator.custom_evaluator import CustomEvaluator
            return CustomEvaluator
        ...
```

### 4. Create Configuration Files

Create a configuration file for the new evaluator:
```python
# scripts/eval/configs/custom_agent_on_custom_bench.py
from pathlib import Path

from internmanip.configs import *

eval_cfg = EvalCfg(
    eval_type="custom_bench",  # Corresponds to the name registered in EvaluatorRegistry
    agent=AgentCfg(
        agent_type="custom_agent",  # Corresponds to the name registered in AgentRegistry
        model_name_or_path="path/to/model",
        model_kwargs={...},
        server_cfg=ServerCfg(  # Optional server configuration
            server_host="localhost",
            server_port=5000,
        ),
    ),
    env=EnvCfg(
        env_type="custom_env",  # Corresponds to the name registered in EnvWrapperRegistry
        config_path="path/to/env_config.yaml",
        env_settings=CustomEnvSettings(...),
    ),
    logging_dir="logs/eval/custom",
    distributed_cfg=DistributedCfg(  # Optional distributed configuration
        num_workers=4,
        ray_head_ip="auto",  # Use "auto" for a local machine
        include_dashboard=True,
        dashboard_port=8265,
    ),
)
```
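A config file like this is a plain Python module exposing an `eval_cfg` object, which the launcher imports by path. A self-contained sketch of how such a module can be loaded dynamically (a hypothetical loader for illustration; the actual mechanism inside `start_evaluator.py` may differ):

```python
import importlib.util
import tempfile
from pathlib import Path

def load_config(config_path: str, attr: str = "eval_cfg"):
    # Import a Python file by path and return the named attribute.
    spec = importlib.util.spec_from_file_location("eval_config", config_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)

# Demonstration with a throwaway config module.
with tempfile.TemporaryDirectory() as d:
    cfg_file = Path(d) / "my_cfg.py"
    cfg_file.write_text("eval_cfg = {'eval_type': 'custom_bench'}\n")
    cfg = load_config(str(cfg_file))
    print(cfg["eval_type"])  # custom_bench
```

Using Python modules as configs keeps them composable: a config can import shared defaults and override only what differs per benchmark.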

### 5. Launch the Evaluator

```bash
python scripts/eval/start_evaluator.py \
    --config scripts/eval/configs/custom_agent_on_custom_bench.py
```

Add `--distributed` for Ray-based multi-GPU evaluation, and `--server` for client-server mode.
# 📦 Add a New Dataset

This section explains how to register a custom dataset with the InternManip framework.
The process involves two main steps: **[ensuring the dataset format](#dataset-structure)** and **[registering it in code](#implementation-steps)**.

## Dataset Structure

All datasets must follow the [LeRobotDataset format](https://github.com/huggingface/lerobot) to ensure compatibility with the data loaders and training pipelines.
The expected structure is:
```
<your_dataset_root>                        # Root directory of your dataset
│
├── data                                   # Structured episode data in .parquet format
│   ├── chunk-000                          # Episodes 000000 - 000999
│   │   ├── episode_000000.parquet
│   │   ├── episode_000001.parquet
│   │   └── ...
│   ├── chunk-001                          # Episodes 001000 - 001999
│   │   └── ...
│   ├── ...
│   └── chunk-00n                          # Follows the same convention (1,000 episodes per chunk)
│       └── ...
│
├── meta                                   # Metadata and statistical information
│   ├── episodes.jsonl                     # Per-episode metadata (length, subtask, etc.)
│   ├── info.json                          # Dataset-level information
│   ├── tasks.jsonl                        # Task definitions
│   ├── modality.json                      # Key dimensions and mapping information for each modality
│   └── stats.json                         # Global dataset statistics (mean, std, min, max, quantiles)
│
└── videos                                 # Multi-view videos for each episode
    ├── chunk-000                          # Videos for episodes 000000 - 000999
    │   ├── observation.images.head        # Head (main front-view) camera
    │   │   ├── episode_000000.mp4
    │   │   └── ...
    │   ├── observation.images.hand_left   # Left hand camera
    │   └── observation.images.hand_right  # Right hand camera
    ├── chunk-001                          # Videos for episodes 001000 - 001999
    ├── ...
    └── chunk-00n                          # Follows the same naming and structure
```

> 💡 Note: For more detailed tutorials, please refer to the [Dataset](../tutorials/dataset.md) section.
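The 1,000-episodes-per-chunk convention makes every file location a pure function of the episode index. A small helper (hypothetical, not part of InternManip) that computes those paths:

```python
from pathlib import Path

EPISODES_PER_CHUNK = 1000

def episode_paths(root: str, episode_index: int,
                  camera: str = "observation.images.head") -> dict:
    # chunk-000 holds episodes 000000-000999, chunk-001 the next 1,000, etc.
    chunk = f"chunk-{episode_index // EPISODES_PER_CHUNK:03d}"
    episode = f"episode_{episode_index:06d}"
    base = Path(root)
    return {
        "parquet": base / "data" / chunk / f"{episode}.parquet",
        "video": base / "videos" / chunk / camera / f"{episode}.mp4",
    }

paths = episode_paths("/datasets/my_robot", 1234)
print(paths["parquet"])  # /datasets/my_robot/data/chunk-001/episode_001234.parquet
```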

This separation of raw data, video files, and metadata makes it easier to standardize transformations and modality handling across different datasets.

<!-- > 💡 Note: The `episodes_stats.jsonl` file under `meta/` is optional and can be omitted. -->

## Implementation Steps

### Register a Dataset Class

Create a new dataset class under `internmanip/datasets/`, inheriting from `LeRobotDataset`:
```python
from internmanip.datasets import LeRobotDataset

class CustomDataset(LeRobotDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def load_data(self):
        # Implement custom data loading logic here
        pass
```

This class defines how to read your dataset's raw files and convert them into a standardized format for training.

### Define a Data Configuration

Each dataset needs a data configuration class that specifies modalities, keys, and transformations.
Create a new configuration file under `internmanip/configs/data_configs/`. Here's a minimal example:
```python
class CustomDataConfig(BaseDataConfig):
    """Data configuration for the custom dataset."""
    video_keys = ["video.rgb"]
    state_keys = ["state.pos"]
    action_keys = ["action.delta_pos"]
    language_keys = ["annotation.instruction"]

    # Temporal indices
    observation_indices = [0]         # Current timestep for observations
    action_indices = list(range(16))  # Future timesteps for actions (0-15)

    def modality_config(self) -> dict[str, ModalityConfig]:
        """Define modality configurations."""
        return {
            "video": ModalityConfig(self.observation_indices, self.video_keys),
            "state": ModalityConfig(self.observation_indices, self.state_keys),
            "action": ModalityConfig(self.action_indices, self.action_keys),
            "language": ModalityConfig(self.observation_indices, self.language_keys),
        }

    def transform(self):
        """Define preprocessing pipelines."""
        return [
            # Video preprocessing
            VideoToTensor(apply_to=self.video_keys),
            VideoResize(apply_to=self.video_keys, height=224, width=224),

            # State preprocessing
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={"state.pos": "mean_std"},
            ),

            # Action preprocessing
            StateActionToTensor(apply_to=self.action_keys),
            StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={"action.delta_pos": "mean_std"},
            ),

            # Concatenate modalities
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
        ]
```
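The `observation_indices`/`action_indices` pair defines a temporal window relative to each sampled timestep: the observation comes from the current frame, while the action target covers the next 16 steps. A self-contained sketch of that windowing (illustrative only; the real sampling lives in the dataset and transform classes, and boundary handling may differ):

```python
def sample_window(trajectory, t,
                  observation_indices=(0,),
                  action_indices=tuple(range(16))):
    # Gather observation frames and future action steps relative to timestep t,
    # clamping out-of-range indices to the last step of the trajectory.
    last = len(trajectory) - 1
    obs = [trajectory[min(t + i, last)] for i in observation_indices]
    actions = [trajectory[min(t + i, last)] for i in action_indices]
    return obs, actions

traj = list(range(20))           # toy trajectory of 20 steps
obs, actions = sample_window(traj, 3)
print(obs)          # [3]
print(actions[:4])  # [3, 4, 5, 6]
```

With this scheme each training sample pairs one observation with a 16-step action chunk, which is why `action_indices` spans `range(16)` in the config above.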

### Register Your Config

Finally, register your custom config by adding it to `DATA_CONFIG_MAP`:

```python
DATA_CONFIG_MAP = {
    ...,
    "custom": CustomDataConfig(),
}
```

> 💡 Tips: Adjust the key names (`video_keys`, `state_keys`, etc.) and `normalization_modes` based on your dataset. For multi-view video or multi-joint actions, just add more keys and update the transforms accordingly.

This config defines how each modality is loaded and preprocessed, and ensures compatibility with the training framework.

### What's Next?

After registration, you can use your dataset by passing `--dataset_path <path>` and `--data_config custom` to the training script.