This package provides a ROS 2 service–client wrapper around Contact-GraspNet, using a subprocess call inside the ROS 2 server to run grasp inference in a Docker container.
This design allows us to:
- Keep ROS 2 running on the host system (e.g., Python 3.12, CUDA 12.2).
- Execute Contact-GraspNet inference in a controlled environment (Docker with Python 3.9, CUDA 11.8).
- Cleanly return grasp results (pred_grasps_cam, scores, contact_pts) to the ROS 2 ecosystem.
The same approach can be extended to other grasp planners or perception algorithms (e.g., UnseenObjectClustering) running in Docker or conda environments.
+-----------------+        +-------------------+        +------------------+
|   ROS2 Client   | -----> |    ROS2 Server    | -----> |   Docker (CGN)   |
| (grasp request) |        | (subprocess call) | <----- |   inference.py   |
+-----------------+        +-------------------+        +------------------+
         ^                           |                    (grasp planning)
         |                           v
         +----- Grasp Results <------+
Flow:
- Client sends a scene ID to the server.
- Server launches inference.py inside Docker via subprocess.
- Inference produces grasp predictions.
- Server extracts predictions (pred_grasps_cam, scores, contact_pts) from JSON.
- Results are returned to the client as a ROS 2 message.
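The flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's server code: the container name is the one given in this README, while the --scene flag, the bare inference.py path, and the idea that the script prints its JSON to standard output are hypothetical placeholders.

```python
import json
import subprocess

CONTAINER = "contact_graspnet_container"  # container name stated in this README
RESULT_KEYS = ("pred_grasps_cam", "scores", "contact_pts")


def build_docker_exec_cmd(scene_id, container=CONTAINER, script="inference.py"):
    """Build the argv list that runs the inference script inside the container.

    The script path and the --scene flag are hypothetical placeholders.
    """
    return ["docker", "exec", container, "python3", script, "--scene", str(scene_id)]


def parse_predictions(json_text):
    """Keep only the three result fields from the inference JSON."""
    data = json.loads(json_text)
    missing = [key for key in RESULT_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing fields in inference output: {missing}")
    return {key: data[key] for key in RESULT_KEYS}


def run_inference(scene_id):
    """Run inference in the container and parse its output (requires Docker).

    Assumes the script writes its JSON result to standard output.
    """
    result = subprocess.run(
        build_docker_exec_cmd(scene_id),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return parse_predictions(result.stdout)
```

Passing the command as a list rather than a shell string avoids quoting problems when the scene name contains unusual characters.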
- ROS 2 Jazzy (or compatible distro) installed on host.
- Docker with GPU runtime enabled (nvidia-docker2 or nvidia-container-toolkit).
- Built Docker image for Contact-GraspNet (see Dockerfile_CGN).
- The steps below assume the contact_graspnet_ros2 repository is placed under ~/graspnet_ws/src.
- Locate the Docker files: navigate to ~/graspnet_ws/src/contact_graspnet_ros2/contact_graspnet_docker
- Build the Docker image:
  docker build -t cuda118:contact_graspnet -f Dockerfile_CGN .
  Alternatively, pull the prebuilt Contact-GraspNet image from Docker Hub:
  docker pull zhaohuajing/cuda118:contact_graspnet
- Start the Docker container:
  ./run_docker.sh
  This script launches the Contact-GraspNet container with the proper environment and names it contact_graspnet_container.
- Note:
  - If you pulled the Docker image instead of building it, modify run_docker.sh and change its line 16 from cuda118:contact_graspnet \ to zhaohuajing/cuda118:contact_graspnet \.
  - The run_docker.sh script mounts the entire workspace (~/graspnet_ws) into the Docker container through -v ~/graspnet_ws:/root/graspnet_ws; adjust the workspace name to match your local setup as needed.
Once the container is running, simply leave that terminal open. No manual commands need to be executed inside the container. All ROS 2 server interactions are initiated from separate terminals on the host machine. These ROS 2 nodes use subprocess calls to automatically enter the running container and execute the required inference scripts internally.
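Because the host-side nodes depend on the container already running, it can help to check for it before issuing requests. The sketch below is an illustrative helper, not part of the repository; it uses the docker ps name filter and assumes the container name given above.

```python
import subprocess

CONTAINER = "contact_graspnet_container"  # container name stated in this README


def docker_ps_cmd(name=CONTAINER):
    """Build a `docker ps` command that prints the names of matching running containers."""
    return ["docker", "ps", "--filter", f"name={name}", "--format", "{{.Names}}"]


def is_listed(ps_output, name=CONTAINER):
    """Return True if `name` appears as an exact line in the `docker ps` output.

    The name filter matches substrings, so an exact comparison is done here.
    """
    return name in (line.strip() for line in ps_output.splitlines())


def container_running(name=CONTAINER):
    """Query Docker and report whether the container is up (requires Docker)."""
    result = subprocess.run(docker_ps_cmd(name), capture_output=True, text=True, check=False)
    return result.returncode == 0 and is_listed(result.stdout, name)
```

A server node could call container_running() at startup and log a clear error instead of failing later inside a docker exec call.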
- Start a new terminal on the host machine (i.e., outside of the Docker container). Assuming the contact_graspnet_ros2 repository is placed under ~/graspnet_ws/src, run the following commands:
  cd ~/graspnet_ws
  colcon build --symlink-install
  source install/setup.bash
Both server and client commands should run on the host machine (i.e., outside of the docker container).
- Run the test ROS 2 server (in one terminal):
  ros2 run contact_graspnet_ros2 grasp_executor_server
- Run the test ROS 2 client (in another terminal):
  ros2 run contact_graspnet_ros2 client_grasp_request <scene_name>
  This requests grasps for ~/graspnet_ws/src/contact_graspnet_ros2/contact_graspnet/test_data/<scene_name>.npy. For example, <scene_name> can be 0, 1, ..., 13.
- The server uses subprocess + docker exec to call inference inside the container.
- You can extend this wrapper for other perception or grasp planning modules by reusing the same server–client communication pattern.
This repository provides two complementary ROS 2 wrappers for Contact-GraspNet, enabling integration with real-world sensor inputs depending on the perception pipeline and available modalities.
We introduce a ROS 2 server, grasp_executor_rgbd_server, which enables Contact-GraspNet to operate directly on live RGB-D scenes (e.g., from Gazebo or a physical camera), instead of only static, pre-generated datasets.
Key features:
- ROS 2 service interface for grasp requests.
- Converts live RGB-D inputs (and optional instance segmentation outputs) into Contact-GraspNet-compatible scene files.
- Launches Contact-GraspNet inference inside Docker via subprocess, enabling:
  - ROS 2 on the host (modern Python, CUDA, drivers).
  - Contact-GraspNet running in a controlled container environment.
- Parses inference outputs and returns grasp poses to ROS 2 clients for planning and execution.
Both server and client commands should run on the host machine (i.e., outside of the docker container).
- Run the ROS 2 server for RGB-D inputs (in one terminal):
  ros2 run contact_graspnet_ros2 grasp_executor_rgbd_server
- Run the test ROS 2 client (in another terminal):
  ros2 run contact_graspnet_ros2 client_grasp_request <scene_name>
  This requests grasps for ~/graspnet_ws/src/contact_graspnet_ros2/contact_graspnet/test_data/sample_scene_ucn/<scene_name>.npy. For example, <scene_name> can be scene_from_ucn.
ros2 run contact_graspnet_ros2 grasp_executor_rgbd_server
The RGB-D wrapper is designed for perception pipelines that start from synchronized color and depth images. This variant:
- Accepts live RGB-D images from simulation (Gazebo) or physical cameras
- Converts RGB-D observations into Contact-GraspNet scene representations
- Applies explicit camera and gripper frame alignment for correct TF integration
- Works out of the box with our ROS 2 Unseen Object Clustering wrapper
This design supports modular integration with upstream perception modules, including:
- Unseen Object Clustering (ROS 2 wrapper): https://github.com/zhaohuajing/unseen_obj_clst_ros2
- Other RGB-D or image-based object detection and segmentation algorithms.
A full perception-to-action pipeline example using FlexBE state machines is available at:
- https://github.com/zhaohuajing/compare_flexbe (branch: feature/cgn)
This enables a full RGB-D → segmentation → grasp planning → MoveIt pipeline without requiring intermediate point cloud processing by the user. For this reason, the RGB-D interface is currently the recommended entry point for end-to-end perception-to-action workflows in both simulation and real hardware.
We also provide a point cloud–based Contact-GraspNet wrapper, intended for pipelines that operate directly on 3D geometry rather than images. This variant:
- Accepts point clouds as input
- Bypasses RGB-D image handling and segmentation
- Supports Contact-GraspNet inference on raw or preprocessed point clouds
Run the ROS 2 server with live point cloud inputs:
ros2 run contact_graspnet_ros2 grasp_executor_cloud_server
However, this point cloud interface is not directly compatible with the Unseen Object Clustering RGB-D pipeline provided in this repository, which operates on image-based segmentation. Instead, it is better suited for integration with:
- Point cloud–based object detection or segmentation models
- Scene reconstruction or multi-view fusion pipelines
- External perception systems that already output filtered or labeled point clouds
With appropriate upstream perception, the point cloud wrapper can be used as an alternative grasp planning backend, but it requires the user to manage object isolation and point cloud preparation externally.
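As a rough illustration of the external object isolation mentioned above, the sketch below keeps only the points that an upstream segmentation model has assigned to one object. The function name and the per-point label layout are hypothetical; the repository does not prescribe a specific format for labeled clouds.

```python
def isolate_object(points, labels, target_label):
    """Return the points whose per-point label equals `target_label`.

    `points` is a sequence of (x, y, z) tuples and `labels` a sequence of
    integer labels of the same length (a hypothetical upstream output format).
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    return [p for p, label in zip(points, labels) if label == target_label]


# Example: three points, two of which belong to object 2.
cloud = [(0.1, 0.0, 0.5), (0.2, 0.1, 0.6), (0.9, 0.9, 1.2)]
point_labels = [2, 2, 0]
object_points = isolate_object(cloud, point_labels, target_label=2)
```

The isolated points can then be passed to the point cloud server in whatever message or file format the user's pipeline already uses.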
A major contribution of this work is the explicit and correct alignment of frame conventions between Contact-GraspNet and standard ROS TF / URDF definitions.
Contact-GraspNet internally represents grasps in the camera optical frame, which does not match the ROS camera body frame (e.g., camera_link).
We apply a fixed rotation, R_optical → camera_link, to map Contact-GraspNet grasp poses correctly into the ROS TF tree. This resolves systematic position errors such as grasps floating above the table or shifted laterally.
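To make the fixed rotation concrete, the sketch below uses the standard ROS convention from REP 103: optical frames have x right, y down, z forward, while body frames have x forward, y left, z up. Whether camera_link in a given URDF follows exactly this body convention is an assumption; the matrix actually used in the repository may differ.

```python
# Rotation taking coordinates in the optical frame (x right, y down, z forward)
# to the ROS body convention (x forward, y left, z up).
# Assumption: camera_link follows the REP 103 body convention.
R_OPTICAL_TO_LINK = [
    [0, 0, 1],   # x_link =  z_optical
    [-1, 0, 0],  # y_link = -x_optical
    [0, -1, 0],  # z_link = -y_optical
]


def rotate(R, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[i][k] * p[k] for k in range(3)) for i in range(3)]


def optical_to_link(p_optical):
    """Express a point given in the optical frame in camera_link axes."""
    return rotate(R_OPTICAL_TO_LINK, p_optical)
```

With this matrix, a point straight ahead of the lens (along optical z) ends up along the camera_link x axis, and a point below the optical axis (positive optical y) ends up with negative z.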
Contact-GraspNet’s grasp frame does not exactly match the Panda gripper (panda_hand) convention used by ROS and MoveIt.
Based on inspection of prior implementations (e.g., SceneReplica), we introduce an additional constant gripper-frame rotation to reconcile differences in:
- Palm orientation
- End-effector X/Y axis definitions
After applying both:
- Camera optical → ROS camera frame rotation, and
- Contact-GraspNet grasp frame → Panda gripper frame rotation,
the resulting grasp poses are:
- Correctly aligned in position,
- Correctly oriented for execution,
- Directly usable by MoveIt without ad-hoc offsets.
These transformations are implemented in grasp_executor_rgbd_server.py and documented inline.
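The order in which the two corrections compose can be sketched as follows. The camera rotation changes the reference frame, so it multiplies the grasp pose from the left; the gripper rotation changes the grasp's own axes, so it multiplies from the right. The optical-to-link matrix follows the REP 103 convention, and the gripper rotation shown (90 degrees about the grasp z axis) is only an illustrative placeholder, not the constant used in grasp_executor_rgbd_server.py.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def homogeneous(R, t=(0, 0, 0)):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]


# REP 103 optical -> body rotation (assumed to apply to camera_link).
R_OPTICAL_TO_LINK = [[0, 0, 1], [-1, 0, 0], [0, -1, 0]]

# PLACEHOLDER gripper-frame rotation: 90 degrees about the grasp z axis.
# The real constant is defined in the repository and may differ.
R_GRASP_TO_HAND = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]


def to_ros_hand_pose(T_grasp_optical):
    """Map a grasp pose from the optical frame to a hand pose in camera_link.

    Left-multiply by the camera rotation (change of reference frame), then
    right-multiply by the gripper rotation (change of the grasp's local axes).
    """
    T_cam = homogeneous(R_OPTICAL_TO_LINK)
    T_offset = homogeneous(R_GRASP_TO_HAND)
    return matmul4(matmul4(T_cam, T_grasp_optical), T_offset)
```

Because the gripper correction is a pure rotation applied on the right, it changes only the orientation of the grasp and leaves its position untouched.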
To support validation and debugging, this repository includes:
- RViz marker publishers for visualizing transformed grasp poses.
- Optional saving of intermediate results (JSON / NPZ / TXT) for offline inspection.
- Improved plotting utilities for Contact-GraspNet outputs with:
- Consistent axis coloring (X=red, Y=green, Z=blue),
- Fixed camera viewpoints,
- Deterministic screenshot export.
These tools were essential for verifying correctness across:
camera → pedestal → robot base → end-effector frames.