Hi, thanks for making this code available!
I am trying to convert a dataset I collected so that I can fine-tune the RT-1-X model on it. For the transformation step, I am confused by discrepancies between this repository and the RT-1-X training example:
In the target config specified in transform.py, 'action' is defined as a tensor of shape (8,), while to my understanding RT-1 expects 'action' to contain four separate components, namely 'world_vector', 'rotation_delta', 'gripper_closedness_action', and 'terminate_episode'. Further, you specify a separate top-level field for 'language_embedding', while RT-1 expects the field 'natural_language_embedding' inside the 'observation' field.
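To make the mismatch concrete, here is a minimal sketch of the kind of remapping I mean. The layout of the flat 8-dim vector (3 + 3 + 1 + 1), the helper name split_flat_step, and the 512-dim embedding size are assumptions for illustration only, not taken from transform.py, and the exact shapes and dtypes RT-1 expects may differ.

```python
import numpy as np


def split_flat_step(step):
    """Turn a step with a flat (8,) action into an RT-1-style nested step.

    ASSUMPTION (not confirmed by transform.py): the flat action is laid out as
    [0:3] world_vector, [3:6] rotation_delta, [6] gripper, [7] terminate flag.
    """
    action = np.asarray(step["action"], dtype=np.float32)
    if action.shape != (8,):
        raise ValueError(f"expected action of shape (8,), got {action.shape}")

    # Move the top-level language embedding into the observation dict
    # under the name RT-1 looks for.
    observation = dict(step.get("observation", {}))
    observation["natural_language_embedding"] = np.asarray(
        step["language_embedding"], dtype=np.float32
    )

    return {
        "observation": observation,
        "action": {
            "world_vector": action[0:3],
            "rotation_delta": action[3:6],
            "gripper_closedness_action": action[6:7],
            "terminate_episode": action[7:8],
        },
    }


# Hypothetical example step; the sizes are placeholders.
example = {
    "observation": {"image": np.zeros((4, 4, 3), dtype=np.uint8)},
    "action": np.arange(8, dtype=np.float32),
    "language_embedding": np.zeros(512, dtype=np.float32),
}
converted = split_flat_step(example)
print({name: value.shape for name, value in converted["action"].items()})
```

If the real ordering of the 8 values is different, only the slices in the returned dict would need to change.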
For my project, I changed the transformation so that the converted dataset can be used for training in the same way as the datasets already used in the training example. However, I am curious what the reason for this discrepancy is.