1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ checkpoints
__pycache__
outputs
.vscode
~/
Owner
Is this needed?

Contributor Author
nope, forgot to remove that; I added it to work around a local git annoyance

24 changes: 24 additions & 0 deletions README.md
@@ -108,6 +108,30 @@ automatic annotations using DINO work well in many cases but can struggle with the
gripper masks. All downstream object tracking and reconstruction results are sensitive
to the segmentation quality and thus spending a bit of effort here might be worthwhile.

##### Gripper masking with fine-tuned models
Owner
I think it would make sense to also make this an option for the run_asset_generation.py pipeline. Maybe add a store_true argument to enable it.
In that case, we have to think about where to put the installation instructions. We could keep them here and refer to them when explaining that argument.
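For example, something along these lines could work (the flag name and wiring here are just a sketch, not existing code):

```python
# Hypothetical flag for run_asset_generation.py; name and wiring are a sketch,
# not part of the current script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use_gripper_finetuned_models",
    action="store_true",
    help="Use the fine-tuned SAM2/Grounding DINO checkpoints for gripper segmentation.",
)
args = parser.parse_args()
# The segmentation step would then load the fine-tuned checkpoints whenever
# args.use_gripper_finetuned_models is True.
```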

We provide fine-tuned SAM2 and Grounding DINO checkpoints for annotating and segmenting the gripper
used in our provided dataset. The checkpoints can be downloaded from [here](https://mitprod-my.sharepoint.com/personal/nepfaff_mit_edu/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fnepfaff%5Fmit%5Fedu%2FDocuments%2Fscalable%5Freal2sim%5Fmodel%5Fweights&ga=1).

Please put the downloaded checkpoint files in the `./checkpoints` directory.
Owner
:nit: That directory doesn't currently exist. It might make sense to create it and put a .gitignore file inside that ignores everything in the directory except the .gitignore itself (you can use ! to un-ignore it), so the directory stays tracked.
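For example, a `checkpoints/.gitignore` along these lines (a sketch of this suggestion):

```gitignore
# Ignore everything in checkpoints/ except this file, so the directory stays tracked.
*
!.gitignore
```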

We used mmdetection's implementation to fine-tune Grounding DINO. Please see the
[official mmdetection GitHub repository](https://github.com/open-mmlab/mmdetection/tree/main)
for installation instructions.
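For reference, a typical mmdetection installation looks roughly like the following; the exact package versions you need may differ:

```bash
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
mim install mmdet
```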
Owner
:nit: Could you add a bit more detail, such as which venv to install it in, so that it works when running our asset generation pipeline?


When `--txt_prompt` is set to `gripper`, the segmentation script will use the fine-tuned gripper
models for annotation and segmentation.
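For illustration only, an invocation could look like the following; the script name and any other arguments here are placeholders, so use the repository's actual segmentation entry point and its documented arguments:

```bash
# Hypothetical example; substitute the real segmentation script and its arguments.
python segmentation_script.py --txt_prompt gripper
```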

To fine-tune your own object detection model for your gripper, see [these instructions](https://github.com/open-mmlab/mmdetection/blob/main/configs/grounding_dino/README.md)
in the official mmdetection GitHub repository.
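With mmdetection installed, fine-tuning generally comes down to pointing its training entry point at a Grounding DINO config; the config below is the upstream fine-tuning example, not our gripper config:

```bash
# Run from the mmdetection repository root; replace the config with your own
# gripper fine-tuning config.
python tools/train.py configs/grounding_dino/grounding_dino_swin-t_finetune_16xb2_1x_coco.py
```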

To fine-tune your own segmentation model for your gripper, see [these training instructions](https://github.com/facebookresearch/sam2/blob/main/training/README.md)
in the official SAM2 GitHub repository.

An example of segmentation failure on the gripper with default models: \
Owner
Could you elaborate a bit here on why it can fail without fine-tuning? It could be worth mentioning that our particular gripper seems to be out of distribution for SAM2, and thus it loses track of it in long videos. This can also be addressed by re-prompting after a failure and segmenting the video in parts, but fine-tuning removes that extra step.

<img src="assets/mask_sam2_default.png" width="200">
Owner
:nit: Maybe "_failure" is more appropriate than "_default".


Gripper segmentation on the same image with custom models: \
<img src="assets/mask_sam2_custom.png" width="200">

### Submodules

#### robot_payload_id
Binary file added assets/mask_sam2_custom.png
Binary file added assets/mask_sam2_default.png
102 changes: 102 additions & 0 deletions configs/coco_detection.py
@@ -0,0 +1,102 @@
# This configuration file is taken from https://github.com/open-mmlab/mmdetection/tree/main/configs

# dataset settings
dataset_type = "CocoDataset"
data_root = "data/coco/"

# Example to use different file client
# Method 1: simply set the data root and let the file I/O module
# automatically infer from prefix (not support LMDB and Memcache yet)

# data_root = 's3://openmmlab/datasets/detection/coco/'

# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/': 's3://openmmlab/datasets/detection/',
#         'data/': 's3://openmmlab/datasets/detection/'
#     }))
backend_args = None

train_pipeline = [
    dict(type="LoadImageFromFile", backend_args=backend_args),
    dict(type="LoadAnnotations", with_bbox=True),
    dict(type="Resize", scale=(1333, 800), keep_ratio=True),
    dict(type="RandomFlip", prob=0.5),
    dict(type="PackDetInputs"),
]
test_pipeline = [
    dict(type="LoadImageFromFile", backend_args=backend_args),
    dict(type="Resize", scale=(1333, 800), keep_ratio=True),
    # If you don't have a gt annotation, delete the pipeline
    dict(type="LoadAnnotations", with_bbox=True),
    dict(
        type="PackDetInputs",
        meta_keys=("img_id", "img_path", "ori_shape", "img_shape", "scale_factor"),
    ),
]
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type="DefaultSampler", shuffle=True),
    batch_sampler=dict(type="AspectRatioBatchSampler"),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file="annotations/instances_train2017.json",
        data_prefix=dict(img="train2017/"),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=train_pipeline,
        backend_args=backend_args,
    ),
)
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type="DefaultSampler", shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file="annotations/instances_val2017.json",
        data_prefix=dict(img="val2017/"),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args,
    ),
)
test_dataloader = val_dataloader

val_evaluator = dict(
    type="CocoMetric",
    ann_file=data_root + "annotations/instances_val2017.json",
    metric="bbox",
    format_only=False,
    backend_args=backend_args,
)
test_evaluator = val_evaluator

# inference on test dataset and
# format the output results for submission.
# test_dataloader = dict(
#     batch_size=1,
#     num_workers=2,
#     persistent_workers=True,
#     drop_last=False,
#     sampler=dict(type='DefaultSampler', shuffle=False),
#     dataset=dict(
#         type=dataset_type,
#         data_root=data_root,
#         ann_file=data_root + 'annotations/image_info_test-dev2017.json',
#         data_prefix=dict(img='test2017/'),
#         test_mode=True,
#         pipeline=test_pipeline))
# test_evaluator = dict(
#     type='CocoMetric',
#     metric='bbox',
#     format_only=True,
#     ann_file=data_root + 'annotations/image_info_test-dev2017.json',
#     outfile_prefix='./work_dirs/coco_detection/test')
28 changes: 28 additions & 0 deletions configs/default_runtime.py
Owner
:nit: I find it a bit weird to have Python files inside a configs dir. I would normally expect yaml files or something similar there. Also, these are only for Grounding DINO fine-tuning. I'm a bit worried that people will look for general pipeline configs in this folder. Would it be possible to rename this to something more specific (e.g. finetuned_grounding_dino_utils or whatever you think is an appropriate group name for these files) and move it under scalable_real2sim/segmentation?

@@ -0,0 +1,28 @@
# This configuration file is taken from https://github.com/open-mmlab/mmdetection/tree/main/configs

default_scope = "mmdet"

default_hooks = dict(
    timer=dict(type="IterTimerHook"),
    logger=dict(type="LoggerHook", interval=50),
    param_scheduler=dict(type="ParamSchedulerHook"),
    checkpoint=dict(type="CheckpointHook", interval=1),
    sampler_seed=dict(type="DistSamplerSeedHook"),
    visualization=dict(type="DetVisualizationHook"),
)

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method="fork", opencv_num_threads=0),
    dist_cfg=dict(backend="nccl"),
)

vis_backends = [dict(type="LocalVisBackend")]
visualizer = dict(
    type="DetLocalVisualizer", vis_backends=vis_backends, name="visualizer"
)
log_processor = dict(type="LogProcessor", window_size=50, by_epoch=True)

log_level = "INFO"
load_from = None
resume = False