The dataset should contain three RGB camera views, but only the eye-in-hand camera images are available.
When I list all the keys in an episode, I get:
steps.action
steps.discount
steps.is_first
steps.is_last
steps.is_terminal
steps.language_embedding
steps.language_instruction
steps.observation.gripper
steps.observation.hand_image
steps.observation.joint_pos
steps.reward
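The dotted key list above is a flattened view of the nested episode structure. The following is a minimal sketch of that flattening. It assumes the structure has already been converted into a nested Python dict (with TFDS, that would start from the dataset's feature spec; the conversion is not shown). The `episode_spec` dict below is a hypothetical stand-in whose leaf values are placeholders, not real feature objects.

```python
from collections.abc import Mapping
from typing import Any


def flatten_keys(spec: Mapping[str, Any], prefix: str = "") -> list[str]:
    """Return sorted dotted key paths for every leaf in a nested mapping."""
    keys = []
    for name, value in spec.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, Mapping):
            # Recurse into nested groups such as "steps" and "observation".
            keys.extend(flatten_keys(value, path))
        else:
            keys.append(path)
    return sorted(keys)


# Hypothetical nested spec mirroring the structure reported above;
# None is used as a placeholder for each leaf feature.
episode_spec = {
    "steps": {
        "action": None,
        "discount": None,
        "is_first": None,
        "is_last": None,
        "is_terminal": None,
        "language_embedding": None,
        "language_instruction": None,
        "observation": {"gripper": None, "hand_image": None, "joint_pos": None},
        "reward": None,
    }
}

for key in flatten_keys(episode_spec):
    print(key)
```

Only one image-like leaf (`steps.observation.hand_image`) appears in the flattened list.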
The observation entries have the following shapes and dtypes:
Key: gripper, Shape: (), Dtype: <dtype: 'bool'>
Key: hand_image, Shape: (480, 640, 3), Dtype: <dtype: 'uint8'>
Key: joint_pos, Shape: (7,), Dtype: <dtype: 'float32'>
It seems that the two fixed-frame RGB camera views are missing from the dataset.
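To make the discrepancy concrete, one can count the RGB-shaped entries in the observation spec and compare them against the expected camera list. This is a minimal sketch. It assumes the spec reported above has been copied into a plain dict of `(shape, dtype-name)` pairs. The names `fixed_image_1` and `fixed_image_2` are hypothetical placeholders, not the dataset's actual key names.

```python
# Observation spec as reported above: key -> (shape, dtype name).
reported_obs = {
    "gripper": ((), "bool"),
    "hand_image": ((480, 640, 3), "uint8"),
    "joint_pos": ((7,), "float32"),
}

# HYPOTHETICAL key names for the expected camera views; the real names
# depend on the dataset's documentation.
expected_cameras = ["hand_image", "fixed_image_1", "fixed_image_2"]


def image_keys(obs_spec):
    """Return keys whose spec looks like an H x W x 3 uint8 RGB image."""
    return [
        key
        for key, (shape, dtype) in obs_spec.items()
        if len(shape) == 3 and shape[-1] == 3 and dtype == "uint8"
    ]


def missing_cameras(obs_spec, expected):
    """Return the expected camera keys that are absent from the spec."""
    return [key for key in expected if key not in obs_spec]


print("RGB streams found:", image_keys(reported_obs))
print("Missing cameras:", missing_cameras(reported_obs, expected_cameras))
```

With the reported spec, only `hand_image` qualifies as an RGB stream, so two of the three expected views are absent.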