diff --git a/README.md b/README.md index 9853b1c..66516c3 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,11 @@ This repo aims to provide flexible and reproducible visual 3D detection on KITTI dataset. We expect scripts starting from the current directory, and treat ./visualDet3D as a package that we could modify and test directly instead of a library. Several useful scripts are provided in the main directory for easy usage. We believe that visual tasks are interconnected, so we make this library extensible to more experiments. -The package uses registry to register datasets, models, processing functions and more allowing easy inserting of new tasks/models while not interfere with the existing ones. +The package uses registry to register datasets, models, processing functions and more, allowing easy inserting of new tasks/models while not interfere with the existing ones. ## Related Paper: -This repo contains the official implementation of 2021 *RAL* paper [**Ground-aware Monocular 3D Object Detection for Autonomous Driving**](https://ieeexplore.ieee.org/document/9327478). [Arxiv Page](https://arxiv.org/abs/2102.00690). Pretrained model can be found at release pages. +This repo contains the official implementation of 2021 *RAL* \& *ICRA* paper [**Ground-aware Monocular 3D Object Detection for Autonomous Driving**](https://ieeexplore.ieee.org/document/9327478). [Arxiv Page](https://arxiv.org/abs/2102.00690). Pretrained model can be found at [release pages](https://github.com/Owen-Liuyuxuan/visualDet3D/releases/tag/1.0). ``` @ARTICLE{9327478, author={Y. {Liu} and Y. {Yuan} and M. {Liu}}, @@ -16,6 +16,20 @@ This repo contains the official implementation of 2021 *RAL* paper [**Ground-awa year={2021}, doi={10.1109/LRA.2021.3052442}} ``` + +Also the official implementation of 2021 *ICRA* paper [**YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection**](https://arxiv.org/abs/2103.09422). 
Pretrained model can be found at [release pages](https://github.com/Owen-Liuyuxuan/visualDet3D/releases/tag/1.1). +``` +@inproceedings{liu2021yolostereo3d, + title={YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection}, + author={Yuxuan Liu and Lujia Wang and Ming Liu}, + booktitle={2021 International Conference on Robotics and Automation (ICRA)}, + year={2021}, + organization={IEEE} +} +``` + +We further incorporate an *Unofficial* re-implementation of **Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training** (KM3D) as a reference on how to integrate with other frameworks. (Notice that the codes are from the [originally official repo](https://github.com/Banconxuan/RTM3D), and we **DO NOT** guarantee a complete re-implementation). + ## Key Features - **SOTA Performance** State of the art result on visual 3D detection. @@ -26,7 +40,7 @@ This repo contains the official implementation of 2021 *RAL* paper [**Ground-awa - **Global Path-based IMDB** Do not need data placed inside the folder, convienient for managing data and code separately. -We provide start-up solutions for [Mono3D](docs/mono3d.md), [Depth Predictions](docs/monoDepth.md) and more (until further publication). +We provide start-up solutions for [Mono3D](docs/mono3d.md), [Stereo3D](docs/stereo3d.md), [Depth Predictions](docs/monoDepth.md) and more (until further publication). Reference: this repo borrows codes and ideas from [retinanet](https://github.com/yhenon/pytorch-retinanet), [mmdetection](https://github.com/open-mmlab/mmdetection), @@ -44,13 +58,13 @@ pip3 install -r requirement.txt or manually check dependencies. ```bash -# build ops (deform convs), We will not install operations into the system environment +# build ops (deform convs and iou3d), We will not install operations into the system environment ./make.sh ``` ## Start Training -Please check the corresponding task: [Mono3D](docs/mono3d.md), [Depth Predictions](docs/monoDepth.md). 
More demo will be available through contributions and further paper submission. +Please check the corresponding task: [Mono3D](docs/mono3d.md), [Stereo3D](docs/stereo3d.md), [Depth Predictions](docs/monoDepth.md). More demos will be available through contributions and further paper submission. ### Config and Path setup. @@ -78,7 +92,9 @@ Please check the template's comments and other comments in codes to fully exploi ## Other Resources - [RAM-LAB](https://www.ram-lab.com) -- [Collections of Papers and Readings](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/); [Collection for Mono3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/RecentCollectionForMono3D/); [Ground-Aware 3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/GroundAwareConvultion/) +- [Collections of Papers and Readings](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/); +- [Collection for Mono3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/RecentCollectionForMono3D/); [Ground-Aware 3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/GroundAwareConvultion/) +- [Collection for Stereo3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/RecentCollectionForStereo3D/); [YOLOStereo3D](https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/YOLOStereo3D/) ## Related Codes @@ -86,4 +102,5 @@ Please check the template's comments and other comments in codes to fully exploi - [M3D-RPN](https://github.com/garrickbrazil/M3D-RPN) - [Retinanet](https://github.com/yhenon/pytorch-retinanet) - [DORN](https://github.com/dontLoveBugs/SupervisedDepthPrediction) -- [det3](https://github.com/pyun-ram/FL3D) \ No newline at end of file +- [det3](https://github.com/pyun-ram/FL3D) +- [RTM3D](https://github.com/Banconxuan/RTM3D) \ No newline at end of file diff --git a/config/KM3D_example b/config/KM3D_example new file 
mode 100644 index 0000000..a96f476 --- /dev/null +++ b/config/KM3D_example @@ -0,0 +1,165 @@ +from easydict import EasyDict as edict +import os +import numpy as np + +cfg = edict() +cfg.obj_types = ['Car', 'Pedestrian', 'Cyclist'] +cfg.anchor_prior = False +## trainer +trainer = edict( + gpu = 0, + max_epochs = 200, + disp_iter = 50, + save_iter = 20, + test_iter = 20, + cudnn = True, + training_func = "train_rtm3d", + test_func = "test_mono_detection", + evaluate_func = "evaluate_kitti_obj", +) + +cfg.trainer = trainer + +## path +path = edict() +path.data_path = "/home/kitti_obj/training" +path.test_path = "/home/kitti_obj/testing" +path.visualDet3D_path = "/home/stereo_kitti/visualDet3D" +path.project_path = "/home/stereo_kitti/workdirs" + +if not os.path.isdir(path.project_path): + os.mkdir(path.project_path) +path.project_path = os.path.join(path.project_path, 'RTM3D') +if not os.path.isdir(path.project_path): + os.mkdir(path.project_path) + +path.log_path = os.path.join(path.project_path, "log") +if not os.path.isdir(path.log_path): + os.mkdir(path.log_path) + +path.checkpoint_path = os.path.join(path.project_path, "checkpoint") +if not os.path.isdir(path.checkpoint_path): + os.mkdir(path.checkpoint_path) + +path.preprocessed_path = os.path.join(path.project_path, "output") +if not os.path.isdir(path.preprocessed_path): + os.mkdir(path.preprocessed_path) + +path.train_imdb_path = os.path.join(path.preprocessed_path, "training") +if not os.path.isdir(path.train_imdb_path): + os.mkdir(path.train_imdb_path) + +path.val_imdb_path = os.path.join(path.preprocessed_path, "validation") +if not os.path.isdir(path.val_imdb_path): + os.mkdir(path.val_imdb_path) + +cfg.path = path + +## optimizer +optimizer = edict( + type_name = 'adam', + keywords = edict( + lr = 1.25e-4, + weight_decay = 0, + ), + clipped_gradient_norm = 35.0 +) +cfg.optimizer = optimizer +## scheduler +scheduler = edict( + type_name = 'MultiStepLR', + keywords = edict( + milestones = [90, 120] + ) +) 
+cfg.scheduler = scheduler + +## data +data = edict( + batch_size = 32, + num_workers = 4, + rgb_shape = (384, 1280, 3), + train_dataset = "KittiRTM3DDataset", + val_dataset = "KittiMonoDataset", + test_dataset = "KittiMonoTestDataset", + train_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'chen_split', 'train.txt'), + val_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'chen_split', 'val.txt'), + max_occlusion = 4, + min_z = 3, +) + +data.augmentation = edict( + rgb_mean = np.array([0.485, 0.456, 0.406]), + rgb_std = np.array([0.229, 0.224, 0.225]), + cropSize = (data.rgb_shape[0], data.rgb_shape[1]), +) +data.train_augmentation = [ + edict(type_name='ConvertToFloat'), + edict(type_name='RandomWarpAffine', keywords=edict(output_w=data.augmentation.cropSize[1], output_h=data.augmentation.cropSize[0])), + #edict(type_name='Resize', keywords=edict(size=data.augmentation.cropSize)), + edict(type_name="Shuffle", keywords=edict( + aug_list=[ + edict(type_name="RandomBrightness", keywords=edict(distort_prob=1.0)), + edict(type_name="RandomContrast", keywords=edict(distort_prob=1.0, lower=0.6, upper=1.4)), + edict(type_name="Compose", keywords=edict( + aug_list=[ + edict(type_name="ConvertColor", keywords=edict(transform='HSV')), + edict(type_name="RandomSaturation", keywords=edict(distort_prob=1.0, lower=0.6, upper=1.4)), + edict(type_name="ConvertColor", keywords=edict(current='HSV', transform='RGB')), + ] + )) + ] + ) + ), + edict(type_name='RandomEigenvalueNoise', keywords=edict(alphastd=0.1)), + edict(type_name='RandomMirror', keywords=edict(mirror_prob=0.5)), + edict(type_name="FilterObject"), + edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std)) +] +data.test_augmentation = [ + edict(type_name='ConvertToFloat'), + #edict(type_name='CropTop', keywords=edict(crop_top_index=data.augmentation.crop_top)), + edict(type_name='Resize', 
keywords=edict(size=data.augmentation.cropSize)), + edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std)) +] +cfg.data = data + +## networks +detector = edict() +detector.obj_types = cfg.obj_types +detector.name = 'KM3D' +detector.backbone = edict( + depth=18, + pretrained=True, + frozen_stages=-1, + num_stages=4, + out_indices=(3, ), + norm_eval=False, + dilations=(1, 1, 1, 1), +) +head_loss = edict( + gamma=2.0, + rampup_length = 100, + output_w = data.rgb_shape[1] // 4 +) +head_test = edict( + score_thr=0.3, +) + +head_layer = edict( + input_features=256, + head_features=64, + head_dict={'hm': len(cfg.obj_types), 'wh': 2, 'hps': 18, + 'rot': 8, 'dim': 3, 'prob': 1, + 'reg': 2, 'hm_hp': 9, 'hp_offset': 2} +) +detector.head = edict( + num_classes = len(cfg.obj_types), + num_joints = 9, + max_objects = 32, + layer_cfg = head_layer, + loss_cfg = head_loss, + test_cfg = head_test +) +detector.loss = head_loss +cfg.detector = detector diff --git a/config/Stereo3D_example b/config/Stereo3D_example new file mode 100644 index 0000000..99021cc --- /dev/null +++ b/config/Stereo3D_example @@ -0,0 +1,167 @@ +from easydict import EasyDict as edict +import os +import numpy as np + +cfg = edict() +cfg.obj_types = ['Car', 'Pedestrian'] + +## trainer +trainer = edict( + gpu = 0, + max_epochs = 80, # for validation epoch 50 is enough + disp_iter = 100, + save_iter = 5, + test_iter = 10, + training_func = "train_stereo_detection", + test_func = "test_stereo_detection", + evaluate_func = "evaluate_kitti_obj", +) + +cfg.trainer = trainer + +## path +path = edict() +path.data_path = "/data/kitti_obj/training" # used in visualDet3D/data/.../dataset +path.test_path = "/data/kitti_obj/testing" # used in visualDet3D/data/.../dataset +path.visualDet3D_path = "/path/to/visualDet3D/visualDet3D" # The path should point to the inner subfolder +path.project_path = "/path/to/visualDet3D/workdirs" # or other path for pickle files, 
checkpoints, tensorboard logging and output files. +if not os.path.isdir(path.project_path): + os.mkdir(path.project_path) +path.project_path = os.path.join(path.project_path, 'Stereo3D') +if not os.path.isdir(path.project_path): + os.mkdir(path.project_path) + +path.log_path = os.path.join(path.project_path, "log") +if not os.path.isdir(path.log_path): + os.mkdir(path.log_path) + +path.checkpoint_path = os.path.join(path.project_path, "checkpoint") +if not os.path.isdir(path.checkpoint_path): + os.mkdir(path.checkpoint_path) + +path.preprocessed_path = os.path.join(path.project_path, "output") +if not os.path.isdir(path.preprocessed_path): + os.mkdir(path.preprocessed_path) + +path.train_imdb_path = os.path.join(path.preprocessed_path, "training") +if not os.path.isdir(path.train_imdb_path): + os.mkdir(path.train_imdb_path) + +path.val_imdb_path = os.path.join(path.preprocessed_path, "validation") +if not os.path.isdir(path.val_imdb_path): + os.mkdir(path.val_imdb_path) + +cfg.path = path + +## optimizer +optimizer = edict( + type_name = 'adam', + keywords = edict( + lr = 1e-4, + weight_decay = 0, + ), + clipped_gradient_norm = 0.1 +) +cfg.optimizer = optimizer +## scheduler +scheduler = edict( + type_name = 'CosineAnnealingLR', + keywords = edict( + T_max = cfg.trainer.max_epochs, + eta_min = 5e-6, + ) +) +cfg.scheduler = scheduler + +## data +data = edict( + batch_size = 4, + num_workers = 4, + rgb_shape = (288, 1280, 3), + train_dataset = "KittiStereoDataset", + val_dataset = "KittiStereoDataset", + test_dataset = "KittiStereoTestDataset", + train_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'test_split', 'train.txt'), + val_split_file = os.path.join(cfg.path.visualDet3D_path, 'data', 'kitti', 'test_split', 'val.txt'), +) + +data.augmentation = edict( + rgb_mean = np.array([0.485, 0.456, 0.406]), + rgb_std = np.array([0.229, 0.224, 0.225]), + cropSize = (data.rgb_shape[0], data.rgb_shape[1]), + crop_top = 100, +) 
+data.train_augmentation = [ + edict(type_name='ConvertToFloat'), + edict(type_name='PhotometricDistort', keywords=edict(distort_prob=1.0, contrast_lower=0.5, contrast_upper=1.5, saturation_lower=0.5, saturation_upper=1.5, hue_delta=18.0, brightness_delta=32)), + edict(type_name='CropTop', keywords=edict(crop_top_index=data.augmentation.crop_top)), + edict(type_name='Resize', keywords=edict(size=data.augmentation.cropSize)), + edict(type_name='RandomMirror', keywords=edict(mirror_prob=0.5)), + edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std)) +] +data.test_augmentation = [ + edict(type_name='ConvertToFloat'), + edict(type_name='CropTop', keywords=edict(crop_top_index=data.augmentation.crop_top)), + edict(type_name='Resize', keywords=edict(size=data.augmentation.cropSize)), + edict(type_name='Normalize', keywords=edict(mean=data.augmentation.rgb_mean, stds=data.augmentation.rgb_std)) +] +cfg.data = data + +## networks +detector = edict() +detector.obj_types = cfg.obj_types +detector.name = 'Stereo3D' +detector.backbone = edict( + depth=34, + pretrained=True, + frozen_stages=-1, + num_stages=3, + out_indices=(0, 1, 2), + norm_eval=True, + dilations=(1, 1, 1), +) +head_loss = edict( + fg_iou_threshold = 0.5, + bg_iou_threshold = 0.4, + L1_regression_alpha = 5 ** 2, + focal_loss_gamma = 2.0, + balance_weight = [20.0, 40], + regression_weight = [1, 1, 1, 1, 1, 1, 12, 1, 1, 0.5, 0.5, 0.5, 1], #[x, y, w, h, cx, cy, z, sin2a, cos2a, w, h, l] +) +head_test = edict( + score_thr=0.75, + cls_agnostic = False, + nms_iou_thr=0.4, + post_optimization=True +) + +anchors = edict( + { + 'obj_types': cfg.obj_types, + 'pyramid_levels':[4], + 'strides': [2 ** 4], + 'sizes' : [24], + 'ratios': np.array([0.5, 1, 2.0]), + 'scales': np.array([2 ** (i / 4.0) for i in range(16)]), + } + ) + +head_layer = edict( + num_features_in=1408, + num_cls_output=len(cfg.obj_types)+1, + num_reg_output=12, + cls_feature_size=256, + 
reg_feature_size=1408, +) +detector.head = edict( + num_regression_loss_terms=13, + preprocessed_path=path.preprocessed_path, + num_classes = len(cfg.obj_types), + anchors_cfg = anchors, + layer_cfg = head_layer, + loss_cfg = head_loss, + test_cfg = head_test +) +detector.anchors = anchors +detector.loss = head_loss +cfg.detector = detector diff --git a/demos/visualize_test_3d_stereo.ipynb b/demos/visualize_test_3d_stereo.ipynb new file mode 100644 index 0000000..e2cb141 --- /dev/null +++ b/demos/visualize_test_3d_stereo.ipynb @@ -0,0 +1,342 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2021-03-17 16:26:11 224002facb3e numba.cuda.cudadrv.driver[8550] INFO init\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CUDA available: True\n" + ] + } + ], + "source": [ + "import sys\n", + "sys.path.append(\"../\")\n", + "import importlib\n", + "import os\n", + "import copy\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import cv2\n", + "import torch\n", + "import torch.nn as nn\n", + "from torch.utils.data import Dataset, DataLoader\n", + "from torch.utils.tensorboard import SummaryWriter\n", + "from torchvision import datasets, models, transforms\n", + "import torchvision\n", + "from visualDet3D.data.kitti.utils import write_result_to_file\n", + "from visualDet3D.utils.utils import LossLogger, cfg_from_file\n", + "from visualDet3D.networks.utils.registry import DETECTOR_DICT, DATASET_DICT, PIPELINE_DICT\n", + "from visualDet3D.networks.heads.anchors import Anchors\n", + "from visualDet3D.networks.lib.fast_utils.hill_climbing import post_opt\n", + "from visualDet3D.networks.utils import BBox3dProjector, BackProjection\n", + "from visualDet3D.utils.utils import convertAlpha2Rot, convertRot2Alpha, draw_3D_box, compound_annotation\n", + "import visualDet3D.data.kitti.dataset\n", + "from 
visualDet3D.utils.timer import Timer\n", + "from numba import jit\n", + "from tqdm import tqdm\n", + "print('CUDA available: {}'.format(torch.cuda.is_available()))\n", + "\n", + "cfg = cfg_from_file(\"../config/kitti_stereo.py\")\n", + "is_test_train = True\n", + "\n", + "checkpoint_name = \"open_Stereo3D_latest.pth\"" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "def draw_bbox2d_to_image(image, bboxes2d, color=(255, 0, 255)):\n", + " drawed_image = image.copy()\n", + " for box2d in bboxes2d:\n", + " cv2.rectangle(drawed_image, (int(box2d[0]), int(box2d[1])), (int(box2d[2]), int(box2d[3])), color, 3)\n", + " return drawed_image" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "cfg.batch_size=1\n", + "split_to_test='validation'\n", + "\n", + "is_test_train = split_to_test == 'training'\n", + "if split_to_test == 'training':\n", + " dataset_name = cfg.data.train_dataset\n", + "elif split_to_test == 'test':\n", + " dataset_name = cfg.data.test_dataset\n", + "else:\n", + " dataset_name = cfg.data.val_dataset\n", + "\n", + "dataset = DATASET_DICT[dataset_name](\n", + " cfg, split_to_test\n", + " )\n", + "\n", + "if split_to_test=='training':\n", + " dataset_val = DATASET_DICT[cfg.data.val_dataset](\n", + " cfg, 'validation'\n", + " )\n", + " dataset.transform = dataset_val.transform\n", + " dataset.collate_fn = dataset_val.collate_fn\n", + "\n", + "\n", + "\n", + "detector = DETECTOR_DICT[cfg.detector.name](cfg.detector)\n", + "detector = detector.cuda()\n", + "\n", + "weight_path = os.path.join(cfg.path.checkpoint_path, checkpoint_name)\n", + "state_dict = torch.load(weight_path, map_location='cuda:{}'.format(cfg.trainer.gpu))\n", + "new_dict = state_dict.copy()\n", + "for key in state_dict:\n", + " if 'focalLoss' in key:\n", + " new_dict.pop(key)\n", + "detector.load_state_dict(new_dict, strict=False)\n", + "detector.eval().cuda()\n", + 
"\n", + "# testing pipeline\n", + "test_func = PIPELINE_DICT[cfg.trainer.test_func]\n", + "\n", + "projector = BBox3dProjector().cuda()\n", + "backprojector = BackProjection().cuda()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "index = 0\n", + "def corner_homo2bbox(corner_homo):\n", + " \"\"\"\n", + " corner_homo: [N, 8, 3]\n", + " \"\"\"\n", + " min_xy = torch.min(corner_homo[:, :, 0:2], dim=1)[0]\n", + " max_xy = torch.max(corner_homo[:, :, 0:2], dim=1)[0]\n", + " min_xy[:, 0] = torch.clamp(min_xy[:, 0], 0, cfg.rgb_shape[1])\n", + " min_xy[:, 1] = torch.clamp(min_xy[:, 1], 0, cfg.rgb_shape[0])\n", + " max_xy[:, 0] = torch.clamp(max_xy[:, 0], 0, cfg.rgb_shape[1])\n", + " max_xy[:, 1] = torch.clamp(max_xy[:, 1], 0, cfg.rgb_shape[0])\n", + " return torch.cat([min_xy, max_xy], dim=1)\n", + "\n", + "def denorm(image):\n", + " new_image = np.array((image * cfg.data.augmentation.rgb_std + cfg.data.augmentation.rgb_mean) * 255, dtype=np.uint8)\n", + " return new_image\n", + "\n", + "@jit(cache=True, nopython=True)\n", + "def ToColorDepth(depth_image:np.ndarray)->np.ndarray: #[H, W] -> [H, W, 3] # Used to draw depth predictions\n", + " H, W = depth_image.shape\n", + " max_depth = float(np.max(depth_image))\n", + " cmap = np.array([\n", + " [0,0,0,114],[0,0,1,185],[1,0,0,114],[1,0,1,174], \n", + " [0,1,0,114],[0,1,1,185],[1,1,0,114],[1,1,1,0]\n", + " ])\n", + " _sum = 0\n", + " for i in range(8):\n", + " _sum += cmap[i, 3]\n", + " \n", + " weights = np.zeros(8)\n", + " cumsum = np.zeros(8)\n", + " for i in range(7):\n", + " weights[i] = _sum / cmap[i, 3]\n", + " cumsum[i+1] = cumsum[i] + cmap[i, 3] / _sum\n", + " \n", + " image = np.zeros((H, W, 3), dtype=np.uint8)\n", + " for i in range(H):\n", + " for j in range(W):\n", + " val = depth_image[i, j] / max_depth\n", + " for k in range(7):\n", + " if val <= cumsum[k + 1]:\n", + " break\n", + " w = 1.0- (val - cumsum[k]) * weights[k]\n", + " r = int( (w * 
cmap[k, 0] + (1 - w) * cmap[k+1, 0]) * 255 )\n", + " g = int( (w * cmap[k, 1] + (1 - w) * cmap[k+1, 1]) * 255 )\n", + " b = int( (w * cmap[k, 2] + (1 - w) * cmap[k+1, 2]) * 255 )\n", + " image[i, j] = np.array([r,g,b])\n", + " return image" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def compute_once(index, is_draw=True, is_test_train=True):\n", + " name = \"%06d\" % index\n", + " data = dataset[index]\n", + " if isinstance(data['calib'], list):\n", + " P2 = data['calib'][0]\n", + " else:\n", + " P2 = data['calib']\n", + " original_height = data['original_shape'][0]\n", + " collated_data = dataset.collate_fn([data])\n", + " height = collated_data[0].shape[2]\n", + " scale_2d = (original_height - cfg.data.augmentation.crop_top) / height\n", + " \n", + " if len(collated_data) > 6:\n", + " left_images, right_images, _, _, labels, bbox_3d, _ = collated_data\n", + " else:\n", + " left_images, right_images, _, _, labels, bbox_3d = collated_data\n", + " image = left_images\n", + "\n", + " with torch.no_grad():\n", + " \n", + " left_images, right_images, P2, P3 = collated_data[0], collated_data[1], collated_data[2], collated_data[3]\n", + " scores, bbox, obj_names = detector([left_images.cuda().float().contiguous(),\n", + " right_images.cuda().float().contiguous(),\n", + " P2.cuda().float(),\n", + " P3.cuda().float()])\n", + " \n", + " P2 = P2[0]\n", + " bbox_2d = bbox[:, 0:4]\n", + " bbox_3d_state = bbox[:, 4:] #[cx,cy,z,w,h,l,alpha]\n", + " bbox_3d_state_3d = backprojector(bbox_3d_state, P2.cuda()) #[x, y, z, w,h ,l, alpha]\n", + " abs_bbox, bbox_3d_corner_homo, thetas = projector(bbox_3d_state_3d, P2.cuda())\n", + "\n", + " \n", + " \n", + " rgb_image = denorm(image[0].cpu().numpy().transpose([1, 2, 0]))\n", + " if len(scores) > 0:\n", + " rgb_image = draw_bbox2d_to_image(rgb_image, bbox_2d.cpu().numpy())\n", + " for box in bbox_3d_corner_homo:\n", + " box = box.cpu().numpy().T\n", + " rgb_image = 
draw_3D_box(rgb_image, box)\n", + " if is_draw:\n", + " plt.imshow(np.clip(rgb_image, 0, 255))\n", + "\n", + " return np.clip(rgb_image, 0, 255)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python\n", + "centers = np.array([[1.50, 1.57, 1.625, 1.67, 1.72],\n", + " [1.42, 1.46, 1.50, 1.58, 1.66],\n", + " [3.43, 3.63, 3.89, 4.17, 4.47]]) #[3, 5]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "PSM Cos Volume takes 0.0013043880462646484 seconds at call time 1\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "../visualDet3D/networks/lib/PSM_cost_volume.py:82: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.\n", + " cost = Variable(\n", + "../visualDet3D/networks/lib/PSM_cost_volume.py:49: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.\n", + " cost = Variable(\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "index = 0\n", + "a = compute_once(index)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "PSM Cos Volume takes 0.002501249313354492 seconds at call time 2\n", + "PSM Cos Volume takes 0.0020182132720947266 seconds at call time 3\n", + "Cost Volume takes 0.001317739486694336 seconds at call time 1\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "#%matplotlib inline\n", + "fig = plt.figure(figsize=(16,9))\n", + "index += 1\n", + "a = compute_once(index, is_test_train=False, is_draw=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/stereo3d.md b/docs/stereo3d.md new file mode 100644 index 0000000..6cf5a69 --- /dev/null +++ b/docs/stereo3d.md @@ -0,0 +1,34 @@ +# Stereo3D + +## Training Schedule + +```bash +# copy Stereo 3D example config +cd config +cp Stereo3D_example $CONFIG_FILE.py + +## Modify config path +nano $CONFIG_FILE.py +cd .. 
+ +## Compute image database and anchors mean/std +# You can run ./launcher/det_precompute.sh without arguments to see helper documents +./launcher/det_precompute.sh config/$CONFIG_FILE.py train +./launcher/det_precompute.sh config/$CONFIG_FILE.py test + +## run this if disparity map is needed, can be computed with point cloud or openCV BlockMatching +# You can run ./launchers/disparity_precompute.sh without arguments to see helper documents +./launchers/disparity_precompute.sh config/$CONFIG_FILE.py $IsUsingPointCloud + +## train the model with one GPU +# You can run ./launcher/train.sh without arguments to see helper documents +./launcher/train.sh --config/$CONFIG_FILE.py 0 $experiment_name # validation goes along + +## produce validation/test result # we only support single GPU testing +# You can run ./launcher/eval.sh without arguments to see helper documents +./launcher/eval.sh --config/$CONFIG_FILE.py 0 $CHECKPOINT_PATH validation/test +``` + + +## Testing on KITTI +![stereo3d_gif](stereo_3d.gif) \ No newline at end of file diff --git a/docs/stereo_3d.gif b/docs/stereo_3d.gif new file mode 100644 index 0000000..f4fbbb6 Binary files /dev/null and b/docs/stereo_3d.gif differ diff --git a/launchers/disparity_precompute.sh b/launchers/disparity_precompute.sh new file mode 100755 index 0000000..237d3cb --- /dev/null +++ b/launchers/disparity_precompute.sh @@ -0,0 +1,15 @@ +#!/bin/bash +set -e +if [[ "$2" == "" ]];then + echo -e "--------------------Disparity Ground Truth Precompute script------------------" + echo -e "Two arguments are needed. 
Usage: \n" + echo -e " ./disparity_precompute.sh \n" + echo -e "exiting" + echo -e "------------------------------------------------------------------" + exit 1 +fi +CONFIG_PATH=$1 +IS_PC=$2 + + +python3 scripts/disparity_compute.py --config=$CONFIG_PATH --use_point_cloud=$IS_PC diff --git a/make.sh b/make.sh index 7101061..8fd381d 100755 --- a/make.sh +++ b/make.sh @@ -8,8 +8,13 @@ CUDA_VER=$(python3 -c "import torch;print(torch.version.cuda)") if [[ $CUDA_VER < "10.0" || $TORCH_VER < '1.3' ]] ; then echo "The current version of pytorch/cuda is $TORCH_VER/$CUDA_VER which could be not compatible with deformable convolution, we will not compile DCN for now. As long as you do not init DCN instance in code, the code will run fine." else - cd visualDet3D/networks/lib/ops/dcn + pushd visualDet3D/networks/lib/ops/dcn sh make.sh rm -r build - cd ../../../../.. + popd + + pushd visualDet3D/networks/lib/ops/iou3d + sh make.sh + rm -r build + popd fi diff --git a/scripts/disparity_compute.py b/scripts/disparity_compute.py new file mode 100644 index 0000000..1abf683 --- /dev/null +++ b/scripts/disparity_compute.py @@ -0,0 +1,149 @@ +from tqdm import tqdm +import numpy as np +import os +import pickle +import time +import cv2 +from fire import Fire +from typing import List, Dict, Tuple +from copy import deepcopy +import skimage.measure +import torch + +from _path_init import * +from visualDet3D.networks.heads.anchors import Anchors +from visualDet3D.networks.utils.utils import calc_iou, BBox3dProjector +from visualDet3D.data.pipeline import build_augmentator +from visualDet3D.data.kitti.kittidata import KittiData +from visualDet3D.data.kitti.utils import generate_dispariy_from_velo +from visualDet3D.utils.timer import Timer +from visualDet3D.utils.utils import cfg_from_file +def denorm(image:np.ndarray, rgb_mean:np.ndarray, rgb_std:np.ndarray)->np.ndarray: + """ + Denormalize a image. 
+ Args: + image: np.ndarray normalized [H, W, 3] + rgb_mean: np.ndarray [3] among [0, 1] image + rgb_std : np.ndarray [3] among [0, 1] image + Returns: + unnormalized image: np.ndarray (H, W, 3) [0-255] dtype=np.uint8 + """ + image = image * rgb_std + rgb_mean # + image[image > 1] = 1 + image[image < 0] = 0 + image *= 255 + return np.array(image, dtype=np.uint8) + +def process_train_val_file(cfg)-> Tuple[List[str], List[str]]: + train_file = cfg.data.train_split_file + val_file = cfg.data.val_split_file + + with open(train_file) as f: + train_lines = f.readlines() + for i in range(len(train_lines)): + train_lines[i] = train_lines[i].strip() + + with open(val_file) as f: + val_lines = f.readlines() + for i in range(len(val_lines)): + val_lines[i] = val_lines[i].strip() + + return train_lines, val_lines + +def compute_dispairity_for_split(cfg, + index_names:List[str], + data_root_dir:str, + output_dict:Dict, + data_split:str='training', + time_display_inter:int=100, + use_point_cloud:bool=True): + save_dir = os.path.join(cfg.path.preprocessed_path, data_split) + if not os.path.isdir(save_dir): + os.makedirs(save_dir) + + disp_dir = os.path.join(save_dir, 'disp') + if not os.path.isdir(disp_dir): + os.mkdir(disp_dir) + + if not use_point_cloud: + stereo_matcher = cv2.StereoBM_create(192, 25) + + N = len(index_names) + frames = [None] * N + print("start reading {} data".format(data_split)) + timer = Timer() + preprocess = build_augmentator(cfg.data.test_augmentation) + + for i, index_name in tqdm(enumerate(index_names)): + + # read data with dataloader api + data_frame = KittiData(data_root_dir, index_name, output_dict) + calib, image, right_image, label, velo = data_frame.read_data() + + original_image = image.copy() + baseline = (calib.P2[0, 3] - calib.P3[0, 3]) / calib.P2[0, 0] + image, image_3, P2, P3 = preprocess(original_image, right_image.copy(), p2=deepcopy(calib.P2), p3=deepcopy(calib.P3)) + if use_point_cloud: + ## gathering disparity with point cloud back 
projection + disparity_left = generate_dispariy_from_velo(velo[:, 0:3], image.shape[0], image.shape[1], calib.Tr_velo_to_cam, calib.R0_rect, P2, baseline=baseline) + disparity_right = generate_dispariy_from_velo(velo[:, 0:3], image.shape[0], image.shape[1], calib.Tr_velo_to_cam, calib.R0_rect, P3, baseline=baseline) + + else: + ## gathering disparity with stereoBM from opencv + left_image = denorm(image, cfg.data.augmentation.rgb_mean, cfg.data.augmentation.rgb_std) + right_image = denorm(image_3, cfg.data.augmentation.rgb_mean, cfg.data.augmentation.rgb_std) + gray_image1 = cv2.cvtColor(left_image, cv2.COLOR_BGR2GRAY) + gray_image2 = cv2.cvtColor(right_image, cv2.COLOR_BGR2GRAY) + + disparity_left = stereo_matcher.compute(gray_image1, gray_image2) + disparity_left[disparity_left < 0] = 0 + disparity_left = disparity_left.astype(np.uint16) + + disparity_right = stereo_matcher.compute(gray_image2[:, ::-1], gray_image1[:, ::-1]) + disparity_right[disparity_right < 0] = 0 + disparity_right = disparity_right.astype(np.uint16) + + + disparity_left = skimage.measure.block_reduce(disparity_left, (4,4), np.max) + file_name = os.path.join(disp_dir, "P2%06d.png" % i) + cv2.imwrite(file_name, disparity_left) + + disparity_right = skimage.measure.block_reduce(disparity_right, (4,4), np.max) + file_name = os.path.join(disp_dir, "P3%06d.png" % i) + cv2.imwrite(file_name, disparity_right) + + + + print("{} split finished precomputing disparity".format(data_split)) + + + + +def main(config:str="config/config.py",use_point_cloud:bool=False): + """Main entry point for disparity precompute + config(str): path to the config file. + use_point_cloud(bool): whether to use the point cloud (True) or OpenCV StereoBM (False) to construct disparity ground truth. 
+ """ + cfg = cfg_from_file(config) + torch.cuda.set_device(cfg.trainer.gpu) + time_display_inter = 100 # define the interval for displaying time consumed in loop + data_root_dir = cfg.path.data_path # the base directory of training dataset + calib_path = os.path.join(data_root_dir, 'calib') + list_calib = os.listdir(calib_path) + N = len(list_calib) + # no need for image, could be modified for extended use + output_dict = { + "calib": True, + "image": True, + "image_3" : True, + "label": False, + "velodyne": use_point_cloud, + } + + train_names, val_names = process_train_val_file(cfg) + compute_dispairity_for_split(cfg, train_names, data_root_dir, output_dict, 'training', time_display_inter, use_point_cloud) + + print("Preprocessing finished") + +if __name__ == '__main__': + Fire(main) diff --git a/scripts/train.py b/scripts/train.py index 50e70c0..abcd5d7 100644 --- a/scripts/train.py +++ b/scripts/train.py @@ -15,7 +15,7 @@ from _path_init import * from visualDet3D.networks.utils.registry import DETECTOR_DICT, DATASET_DICT, PIPELINE_DICT -from visualDet3D.networks.utils.utils import get_num_parameters +from visualDet3D.networks.utils.utils import BackProjection, BBox3dProjector, get_num_parameters from visualDet3D.evaluator.kitti.evaluate import evaluate import visualDet3D.data.kitti.dataset from visualDet3D.utils.timer import Timer @@ -61,7 +61,9 @@ def main(config="config/config.py", experiment_name="default", world_size=1, loc writer = None ## Set up GPU and distribution process - gpu = min(local_rank if is_distributed else cfg.trainer.gpu, torch.cuda.device_count() - 1) + if is_distributed: + cfg.trainer.gpu = local_rank # local_rank will overwrite the GPU in the config file + gpu = min(cfg.trainer.gpu, torch.cuda.device_count() - 1) torch.backends.cudnn.benchmark = getattr(cfg.trainer, 'cudnn', False) torch.cuda.set_device(gpu) if is_distributed: @@ -142,7 +144,7 @@ def main(config="config/config.py", experiment_name="default", world_size=1, loc if 
training_loss_logger: training_loss_logger.reset() for iter_num, data in enumerate(dataloader_train): - training_dection(data, detector, optimizer, writer, training_loss_logger, global_step, cfg) + training_dection(data, detector, optimizer, writer, training_loss_logger, global_step, epoch_num, cfg) global_step += 1 @@ -151,12 +153,15 @@ def main(config="config/config.py", experiment_name="default", world_size=1, loc if is_logging and global_step % cfg.trainer.disp_iter == 0: ## Log loss, print out and write to tensorboard in main process - log_str = 'Epoch: {} | Iteration: {} | Running loss: {:1.5f} | eta:{}'.format( + if 'total_loss' not in training_loss_logger.loss_stats: + print(f"\nIn epoch {epoch_num}, iteration:{iter_num}, global_step:{global_step}, total_loss not found in logger.") + else: + log_str = 'Epoch: {} | Iteration: {} | Running loss: {:1.5f} | eta:{}'.format( epoch_num, iter_num, training_loss_logger.loss_stats['total_loss'].avg, timer.compute_eta(global_step, len(dataloader_train) * cfg.trainer.max_epochs)) - print(log_str, end='\r') - writer.add_text("training_log/train", log_str, global_step) - training_loss_logger.log(global_step) + print(log_str, end='\r') + writer.add_text("training_log/train", log_str, global_step) + training_loss_logger.log(global_step) if not is_iter_based: scheduler.step() diff --git a/visualDet3D/data/kitti/dataset/KM3D_dataset.py b/visualDet3D/data/kitti/dataset/KM3D_dataset.py new file mode 100644 index 0000000..0f8d6d2 --- /dev/null +++ b/visualDet3D/data/kitti/dataset/KM3D_dataset.py @@ -0,0 +1,278 @@ +from __future__ import print_function, division +import sys +import os +import torch +import numpy as np +import random +import csv +from typing import List, Tuple +from torch.utils.data import Dataset, DataLoader + +import torch +import torch.nn as nn +import torch.utils.data +from visualDet3D.utils.utils import alpha2theta_3d, theta2alpha_3d +from visualDet3D.data.kitti.kittidata import KittiData, KittiObj, 
KittiCalib +from visualDet3D.data.kitti.dataset import KittiMonoDataset +from visualDet3D.data.pipeline import build_augmentator +from visualDet3D.utils.timer import profile +from visualDet3D.networks.utils.rtm3d_utils import gen_hm_radius, project_to_image, gaussian_radius +import os +import pickle +import numpy as np +from copy import deepcopy +from visualDet3D.networks.utils import BBox3dProjector +from visualDet3D.networks.utils.registry import DATASET_DICT +import sys +from matplotlib import pyplot as plt +ros_py_path = '/opt/ros/kinetic/lib/python2.7/dist-packages' +if sys.version_info > (3, 0) and ros_py_path in sys.path: + #Python 3, compatible with a naive ros environment + sys.path.remove(ros_py_path) + import cv2 + sys.path.append(ros_py_path) +else: + #Python 2 + import cv2 + +@DATASET_DICT.register_module +class KittiRTM3DDataset(KittiMonoDataset): + def __init__(self, cfg, split='training'): + super(KittiRTM3DDataset, self).__init__(cfg, split) + self.num_classes = len(self.obj_types) + self.num_vertexes = 9 + self.max_objects = 32 + self.projector.register_buffer('corner_matrix', torch.tensor( + [[-1, -1, -1], + [ 1, -1, -1], + [ 1, 1, -1], + [ 1, 1, 1], + [ 1, -1, 1], + [-1, -1, 1], + [-1, 1, 1], + [-1, 1, -1], + [ 0, 0, 0]] + ).float() )# 9, 3 + + def _build_target(self, image:np.ndarray, P2:np.ndarray, transformed_label:List[KittiObj], scale=4)-> dict: + """Encode Targets for RTM3D + + Args: + image (np.ndarray): augmented image [H, W, 3] + P2 (np.ndarray): Calibration matrix [3, 4] + transformed_label (List[KittiObj]): A list of kitti objects. + scale (int, optional): Downsampling scale. Defaults to 4. 
+ + Returns: + dict: label dicts + """ + num_objects = len(transformed_label) + hm_h, hm_w = image.shape[0] // scale, image.shape[1] // scale + + # setup empty targets + hm_main_center = np.zeros((self.num_classes, hm_h, hm_w), dtype=np.float32) + hm_ver = np.zeros((self.num_vertexes, hm_h, hm_w), dtype=np.float32) + + cen_offset = np.zeros((self.max_objects, 2), dtype=np.float32) + indices_center = np.zeros((self.max_objects), dtype=np.int64) + obj_mask = np.zeros((self.max_objects), dtype=np.uint8) + location = np.zeros((self.max_objects, 3), dtype=np.float32) + orientation = np.zeros((self.max_objects, 1), dtype=np.float32) + rotbin = np.zeros((self.max_objects, 2), dtype=np.int64) + rotres = np.zeros((self.max_objects, 2), dtype=np.float32) + ver_coor = np.zeros((self.max_objects, self.num_vertexes * 2), dtype=np.float32) + ver_coor_mask = np.zeros((self.max_objects, self.num_vertexes * 2), dtype=np.uint8) + ver_offset = np.zeros((self.max_objects * self.num_vertexes, 2), dtype=np.float32) + ver_offset_mask = np.zeros((self.max_objects * self.num_vertexes), dtype=np.uint8) + indices_vertexes = np.zeros((self.max_objects * self.num_vertexes), dtype=np.int64) + + dimension = np.zeros((self.max_objects, 3), dtype=np.float32) + + rots = np.zeros((self.max_objects, 2), dtype=np.float32) #[sin, cos] + + depth = np.zeros((self.max_objects, 1), dtype=np.float32) + whs = np.zeros((self.max_objects, 2), dtype=np.float32) + + # compute vertexes + bbox3d_state = np.zeros([len(transformed_label), 7]) #[camera_x, camera_y, z, w, h, l, alpha] + for obj in transformed_label: + obj.alpha = theta2alpha_3d(obj.ry, obj.x, obj.z, P2) + bbox3d_origin = torch.tensor([[obj.x, obj.y - 0.5 * obj.h, obj.z, obj.w, obj.h, obj.l, obj.alpha] for obj in transformed_label], dtype=torch.float32).reshape(-1, 7) + abs_corner, homo_corner, theta = self.projector.forward(bbox3d_origin, torch.tensor(P2, dtype=torch.float32)) + + # # For debuging and visualization, testing the correctness of 
bbox3d->bbox2d + # a = plt.figure(figsize=(16,9)) + # plt.subplot(3, 1, 1) + # image2 = np.array(np.clip(image * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406]), 0, 1) * 255, dtype=np.uint8) + # max_xy, _= homo_corner[:, :, 0:2].max(dim = 1) # [N,2] + # min_xy, _= homo_corner[:, :, 0:2].min(dim = 1) # [N,2] + + # result = torch.cat([min_xy, max_xy], dim=-1) #[:, 4] + + # bbox2d = result.cpu().numpy() + # for i in range(len(transformed_label)): + # image2 = cv2.rectangle(image2, tuple(bbox2d[i, 0:2].astype(int)), tuple(bbox2d[i, 2:4].astype(int)), (0, 255, 0) , 3) + # draw_3D_box(image2, homo_corner[i].cpu().numpy().T) + # plt.imshow(image2) + # plt.show() + + + for k in range(num_objects): + obj = transformed_label[k] + cls_id = self.obj_types.index(obj.type) + bbox = np.array([obj.bbox_l, obj.bbox_t, obj.bbox_r, obj.bbox_b]) + orientation[k] = obj.ry + dim = np.array([obj.w, obj.h, obj.l]) + ry = obj.ry + alpha= obj.alpha + + if np.sin(alpha) < 0.5: #alpha < np.pi / 6. or alpha > 5 * np.pi / 6.: + rotbin[k, 0] = 1 + rotres[k, 0] = alpha - (-0.5 * np.pi) + if np.sin(alpha) > -0.5: # alpha > -np.pi / 6. or alpha < -5 * np.pi / 6.: + rotbin[k, 1] = 1 + rotres[k, 1] = alpha - (0.5 * np.pi) + + bbox = bbox / scale # on the heatmap + bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, image.shape[1] // scale) + bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, image.shape[0] // scale) + bbox_h, bbox_w = bbox[3] - bbox[1], bbox[2] - bbox[0] + if bbox_h > 0 and bbox_w > 0: + sigma = 1. 
# Just dummy + radius = 1 # Just dummy + + location[k] = bbox3d_origin[k, 0:3].float().cpu().numpy() + + radius = gaussian_radius((np.ceil(bbox_h), np.ceil(bbox_w))) + radius = max(0, int(radius)) + # Generate heatmaps for 8 vertexes + vertexes_2d = homo_corner[k, :, 0:2].numpy() + + vertexes_2d = vertexes_2d / scale # on the heatmap + + center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2], dtype=np.float32) + center_int = center.astype(np.int32) + + if not (0 <= center_int[0] < hm_w and 0 <= center_int[1] < hm_h): + continue + + # Generate heatmaps for main center + gen_hm_radius(hm_main_center[cls_id], center, radius) + # Index of the center + indices_center[k] = center_int[1] * hm_w + center_int[0] + + for ver_idx, ver in enumerate(vertexes_2d): + ver_int = ver.astype(np.int32) + + # targets for vertexes coordinates + ver_coor[k, ver_idx * 2: (ver_idx + 1) * 2] = ver - center_int # Don't take the absolute values + ver_coor_mask[k, ver_idx * 2: (ver_idx + 1) * 2] = 1 + + if (0 <= ver_int[0] < hm_w) and (0 <= ver_int[1] < hm_h): + gen_hm_radius(hm_ver[ver_idx], ver_int, radius) + + # targets for vertexes offset + ver_offset[k * self.num_vertexes + ver_idx] = ver - ver_int + ver_offset_mask[k * self.num_vertexes + ver_idx] = 1 + # Indices of vertexes + indices_vertexes[k * self.num_vertexes + ver_idx] = ver_int[1] * hm_w + ver_int[0] + + # targets for center offset + cen_offset[k] = center - center_int + + # targets for dimension + dimension[k] = dim + + # targets for orientation + rots[k, 0] = np.sin(alpha) + rots[k, 1] = np.cos(alpha) + + # targets for depth + depth[k] = obj.z + + # targets for 2d bbox + whs[k, 0] = bbox_w + whs[k, 1] = bbox_h + + # Generate masks + obj_mask[k] = 1 + # Follow official names + targets = { + 'hm': hm_main_center, + 'hm_hp': hm_ver, + 'hps': ver_coor, + 'reg': cen_offset, + 'hp_offset': ver_offset, + 'dim': dimension, #whl + 'rots': rots, # sin cos alpha + 'rotbin': rotbin, + 'rotres': rotres, + 'dep': depth, + 
'ind': indices_center, + 'hp_ind': indices_vertexes, + 'reg_mask': obj_mask, + 'hps_mask': ver_coor_mask, + 'hp_mask': ver_offset_mask, + 'wh': whs, + 'location': location, + 'ori': orientation + } + + return targets + + def __getitem__(self, index): + kitti_data = self.imdb[index % len(self.imdb)] + # The calib and label has been preloaded to minimize the time in each indexing + if index >= len(self.imdb): + kitti_data.output_dict = { + "calib": True, + "image": False, + "image_3":True, + "label": False, + "velodyne": False + } + calib, _, image, _, _ = kitti_data.read_data() + calib.P2 = calib.P3 # a workaround to use P3 for right camera images. 3D bboxes are the same(cx, cy, z, w, h, l, alpha) + else: + kitti_data.output_dict = self.output_dict + _, image, _, _ = kitti_data.read_data() + calib = kitti_data.calib + calib.image_shape = image.shape + label = kitti_data.label # label: list of kittiObj + label = [] + for obj in kitti_data.label: + if obj.type in self.obj_types: + label.append(obj) + transformed_image, transformed_P2, transformed_label = self.transform(image, p2=deepcopy(calib.P2), labels=deepcopy(label)) + targets = self._build_target(transformed_image, transformed_P2, transformed_label) + + output_dict = {'calib': transformed_P2, + 'image': transformed_image, + 'label': targets, + 'original_shape':image.shape, + 'original_P':calib.P2.copy()} + + return output_dict + + + def __len__(self): + return len(self.imdb) + + @staticmethod + def collate_fn(batch): + rgb_images = np.array([item["image"] for item in batch])#[batch, H, W, 3] + rgb_images = rgb_images.transpose([0, 3, 1, 2]) + + calib = [item["calib"] for item in batch] + + # gather labels + label = {} + for key in batch[0]['label']: + label[key] = torch.from_numpy( + np.stack( + [ + item['label'][key] for item in batch + ], axis=0 + ) + ) + return torch.from_numpy(rgb_images).float(), torch.tensor(calib).float(), label diff --git a/visualDet3D/data/kitti/dataset/__init__.py 
b/visualDet3D/data/kitti/dataset/__init__.py index 866fe31..d41106f 100644 --- a/visualDet3D/data/kitti/dataset/__init__.py +++ b/visualDet3D/data/kitti/dataset/__init__.py @@ -1,2 +1,4 @@ from .mono_dataset import KittiMonoDataset, KittiMonoTestDataset -from .depth_mono_dataset import KittiDepthMonoDataset, KittiDepthMonoValTestDataset \ No newline at end of file +from .depth_mono_dataset import KittiDepthMonoDataset, KittiDepthMonoValTestDataset +from .stereo_dataset import KittiStereoDataset, KittiStereoTestDataset +from .KM3D_dataset import KittiRTM3DDataset \ No newline at end of file diff --git a/visualDet3D/data/kitti/dataset/mono_dataset.py b/visualDet3D/data/kitti/dataset/mono_dataset.py index 9920173..a3338f1 100644 --- a/visualDet3D/data/kitti/dataset/mono_dataset.py +++ b/visualDet3D/data/kitti/dataset/mono_dataset.py @@ -11,6 +11,7 @@ import torch import torch.nn as nn import torch.utils.data +from visualDet3D.utils.utils import alpha2theta_3d, theta2alpha_3d from visualDet3D.data.kitti.kittidata import KittiData, KittiObj, KittiCalib from visualDet3D.data.pipeline import build_augmentator import os @@ -60,6 +61,8 @@ def __init__(self, cfg, split='training'): def _reproject(self, P2:np.ndarray, transformed_label:List[KittiObj]) -> Tuple[List[KittiObj], np.ndarray]: bbox3d_state = np.zeros([len(transformed_label), 7]) #[camera_x, camera_y, z, w, h, l, alpha] + for obj in transformed_label: + obj.alpha = theta2alpha_3d(obj.ry, obj.x, obj.z, P2) bbox3d_origin = torch.tensor([[obj.x, obj.y - 0.5 * obj.h, obj.z, obj.w, obj.h, obj.l, obj.alpha] for obj in transformed_label], dtype=torch.float32) abs_corner, homo_corner, _ = self.projector(bbox3d_origin, bbox3d_origin.new(P2)) for i, obj in enumerate(transformed_label): diff --git a/visualDet3D/data/kitti/dataset/stereo_dataset.py b/visualDet3D/data/kitti/dataset/stereo_dataset.py new file mode 100644 index 0000000..b4145f0 --- /dev/null +++ b/visualDet3D/data/kitti/dataset/stereo_dataset.py @@ -0,0 +1,204 @@ 
+from __future__ import print_function, division +import sys +import os +import torch +import numpy as np +import random +import csv +from typing import List, Tuple +from torch.utils.data import Dataset, DataLoader +import torch +import torch.utils.data +from visualDet3D.data.kitti.kittidata import KittiData, KittiObj, KittiCalib +from visualDet3D.data.pipeline import build_augmentator + +import os +import pickle +import numpy as np +from copy import deepcopy +from visualDet3D.utils.utils import alpha2theta_3d, theta2alpha_3d, draw_3D_box +from visualDet3D.networks.utils import BBox3dProjector +from visualDet3D.networks.utils.registry import DATASET_DICT +import sys +from matplotlib import pyplot as plt +ros_py_path = '/opt/ros/kinetic/lib/python2.7/dist-packages' +if sys.version_info > (3, 0) and ros_py_path in sys.path: + #Python 3, compatible with a naive ros environment + sys.path.remove(ros_py_path) + import cv2 + sys.path.append(ros_py_path) +else: + #Python 2 + import cv2 + +@DATASET_DICT.register_module +class KittiStereoDataset(torch.utils.data.Dataset): + """Some Information about KittiDataset""" + def __init__(self, cfg, split='training'): + super(KittiStereoDataset, self).__init__() + preprocessed_path = cfg.path.preprocessed_path + obj_types = cfg.obj_types + aug_cfg = cfg.data.augmentation + is_train = (split == 'training') + imdb_file_path = os.path.join(preprocessed_path, split, 'imdb.pkl') + self.imdb = pickle.load(open(imdb_file_path, 'rb')) # list of kittiData + self.output_dict = { + "calib": True, + "image": True, + "image_3":True, + "label": False, + "velodyne": False + } + if is_train: + self.transform = build_augmentator(cfg.data.train_augmentation) + else: + self.transform = build_augmentator(cfg.data.test_augmentation) + self.projector = BBox3dProjector() + self.is_train = is_train + self.obj_types = obj_types + self.preprocessed_path = preprocessed_path + + def _reproject(self, P2:np.ndarray, transformed_label:List[KittiObj]) -> 
Tuple[List[KittiObj], np.ndarray]: + bbox3d_state = np.zeros([len(transformed_label), 7]) #[camera_x, camera_y, z, w, h, l, alpha] + if len(transformed_label) > 0: + #for obj in transformed_label: + # obj.alpha = theta2alpha_3d(obj.ry, obj.x, obj.z, P2) + bbox3d_origin = torch.tensor([[obj.x, obj.y - 0.5 * obj.h, obj.z, obj.w, obj.h, obj.l, obj.alpha] for obj in transformed_label], dtype=torch.float32) + try: + abs_corner, homo_corner, _ = self.projector.forward(bbox3d_origin, bbox3d_origin.new(P2)) + except Exception: + print('\n',bbox3d_origin.shape, len(transformed_label), transformed_label, bbox3d_origin) + for i, obj in enumerate(transformed_label): + extended_center = np.array([obj.x, obj.y - 0.5 * obj.h, obj.z, 1])[:, np.newaxis] #[4, 1] + extended_bottom = np.array([obj.x, obj.y, obj.z, 1])[:, np.newaxis] #[4, 1] + image_center = (P2 @ extended_center)[:, 0] #[3] + image_center[0:2] /= image_center[2] + + image_bottom = (P2 @ extended_bottom)[:, 0] #[3] + image_bottom[0:2] /= image_bottom[2] + + bbox3d_state[i] = np.concatenate([image_center, + [obj.w, obj.h, obj.l, obj.alpha]]) #[7] + + max_xy, _= homo_corner[:, :, 0:2].max(dim = 1) # [N,2] + min_xy, _= homo_corner[:, :, 0:2].min(dim = 1) # [N,2] + + result = torch.cat([min_xy, max_xy], dim=-1) #[:, 4] + + bbox2d = result.cpu().numpy() + + for i in range(len(transformed_label)): + transformed_label[i].bbox_l = bbox2d[i, 0] + transformed_label[i].bbox_t = bbox2d[i, 1] + transformed_label[i].bbox_r = bbox2d[i, 2] + transformed_label[i].bbox_b = bbox2d[i, 3] + return transformed_label, bbox3d_state + + def __getitem__(self, index): + kitti_data = self.imdb[index] + # The calib and label have been preloaded to minimize the time in each indexing + kitti_data.output_dict = self.output_dict + calib, left_image, right_image, _, _ = kitti_data.read_data() + calib.image_shape = left_image.shape + label = [] + for obj in kitti_data.label: + if obj.type in self.obj_types: + label.append(obj) + 
transformed_left_image, transformed_right_image, P2, P3, transformed_label = self.transform( + left_image, right_image, deepcopy(calib.P2),deepcopy(calib.P3), deepcopy(label) + ) + bbox3d_state = np.zeros([len(transformed_label), 7]) #[camera_x, camera_y, z, w, h, l, alpha] + + if len(transformed_label) > 0: + transformed_label, bbox3d_state = self._reproject(P2, transformed_label) + + if self.is_train: + if abs(P2[0, 3]) < abs(P3[0, 3]): # not mirrored or swapped: disparity should be based on point clouds projected through P2 + disparity = cv2.imread(os.path.join(self.preprocessed_path, 'training', 'disp', "P2%06d.png" % index), -1) + else: # mirrored and swapped: disparity should be based on point clouds projected through P3, and is also mirrored + disparity = cv2.imread(os.path.join(self.preprocessed_path, 'training', 'disp', "P3%06d.png" % index), -1) + disparity = disparity[:, ::-1] + disparity = disparity / 16.0 + else: + disparity = None + + bbox2d = np.array([[obj.bbox_l, obj.bbox_t, obj.bbox_r, obj.bbox_b] for obj in transformed_label]) + + output_dict = {'calib': [P2, P3], + 'image': [transformed_left_image, transformed_right_image], + 'label': [obj.type for obj in transformed_label], + 'bbox2d': bbox2d, #[N, 4] [x1, y1, x2, y2] + 'bbox3d': bbox3d_state, + 'original_shape': calib.image_shape, + 'disparity': disparity, + 'original_P':calib.P2.copy()} + return output_dict + + def __len__(self): + return len(self.imdb) + + @staticmethod + def collate_fn(batch): + left_images = np.array([item["image"][0] for item in batch])#[batch, H, W, 3] + left_images = left_images.transpose([0, 3, 1, 2]) + + right_images = np.array([item["image"][1] for item in batch])#[batch, H, W, 3] + right_images = right_images.transpose([0, 3, 1, 2]) + + P2 = [item['calib'][0] for item in batch] + P3 = [item['calib'][1] for item in batch] + label = [item['label'] for item in batch] + bbox2ds = [item['bbox2d'] for item in batch] + bbox3ds = [item['bbox3d'] for item in batch] + disparities = 
[item['disparity'] for item in batch] + if disparities[0] is None: + return torch.from_numpy(left_images).float(), torch.from_numpy(right_images).float(), torch.tensor(P2).float(), torch.tensor(P3).float(), label, bbox2ds, bbox3ds + else: + return torch.from_numpy(left_images).float(), torch.from_numpy(right_images).float(), torch.tensor(P2).float(), torch.tensor(P3).float(), label, bbox2ds, bbox3ds, torch.tensor(disparities).float() + +@DATASET_DICT.register_module +class KittiStereoTestDataset(KittiStereoDataset): + def __init__(self, cfg, split='test'): + preprocessed_path = cfg.path.preprocessed_path + obj_types = cfg.obj_types + aug_cfg = cfg.data.augmentation + super(KittiStereoTestDataset, self).__init__(cfg, split) + imdb_file_path = os.path.join(preprocessed_path, 'test', 'imdb.pkl') + self.imdb = pickle.load(open(imdb_file_path, 'rb')) # list of kittiData + self.output_dict = { + "calib": True, + "image": True, + "image_3":True, + "label": False, + "velodyne": False + } + + def __getitem__(self, index): + kitti_data = self.imdb[index] + # The calib and label has been preloaded to minimize the time in each indexing + kitti_data.output_dict = self.output_dict + calib, left_image, right_image, _, _ = kitti_data.read_data() + calib.image_shape = left_image.shape + + transformed_left_image, transformed_right_image, P2, P3 = self.transform( + left_image, right_image, deepcopy(calib.P2),deepcopy(calib.P3) + ) + + output_dict = {'calib': [P2, P3], + 'image': [transformed_left_image, transformed_right_image], + 'original_shape': calib.image_shape, + 'original_P':calib.P2.copy()} + return output_dict + + @staticmethod + def collate_fn(batch): + left_images = np.array([item["image"][0] for item in batch])#[batch, H, W, 3] + left_images = left_images.transpose([0, 3, 1, 2]) + + right_images = np.array([item["image"][1] for item in batch])#[batch, H, W, 3] + right_images = right_images.transpose([0, 3, 1, 2]) + + P2 = [item['calib'][0] for item in batch] + P3 = 
[item['calib'][1] for item in batch] + return torch.from_numpy(left_images).float(), torch.from_numpy(right_images).float(), P2, P3 + diff --git a/visualDet3D/data/kitti/utils.py b/visualDet3D/data/kitti/utils.py index 3a9e21e..ca5a622 100644 --- a/visualDet3D/data/kitti/utils.py +++ b/visualDet3D/data/kitti/utils.py @@ -81,6 +81,83 @@ def _leftcam2imgplane(pts, P2): pixels[:, 1] /= pixels[:, 2] + 1e-6 return pixels[:, :2] +@jit(nopython=True, cache=True) +def generate_dispariy_from_velo(pc_velo:np.ndarray, + height:int, + width:int, + Tr_velo_to_cam:np.ndarray, + R0_rect:np.ndarray, + P2:np.ndarray, + baseline:float=0.54): + """ + Generate disparity map from point clouds. + Args: + pc_velo : point clouds in lidar coordinate; np.array of shape [n, 3] -> [[x, y, z]; ...] + height, width : output disparity map shape; int + Tr_velo_to_cam : transform from lidar to camera; np.array [3, 4] -> [R | T] + R0_rect : rotation transform into camera coordinates(z forward, x towards right); np.array [3, 4] -> [R | T] + P2 : transform from P0 camera coordinates to target image plane; np.array [3, 4] -> [R | T] + baseline : baseline length in meters of the stereo setup; float + Output: + disp_map : disparity map scaled by 16; np.array of [height, width], dtype=np.uint16; pixels where disp_map==0 should be ignored + """ + #pts_2d = calib.project_velo_to_image(pc_velo) + pts_cam = _lidar2leftcam(pc_velo, Tr_velo_to_cam, R0_rect) + pts_2d = _leftcam2imgplane(pts_cam, P2) + fov_inds = (pts_2d[:, 0] < width - 1) & (pts_2d[:, 0] >= 0) & \ + (pts_2d[:, 1] < height - 1) & (pts_2d[:, 1] >= 0) + fov_inds = fov_inds & (pc_velo[:, 0] > 2) + imgfov_pts_2d = pts_2d[fov_inds, :] + imgfov_pc_rect = pts_cam[fov_inds, :] + depth_map = np.ones((height, width)) * 1e9 + imgfov_pts_2d = imgfov_pts_2d.astype(np.int32)#np.round(imgfov_pts_2d).astype(int) + for i in range(imgfov_pts_2d.shape[0]): + depth = imgfov_pc_rect[i, 2] + depth_map[int(imgfov_pts_2d[i, 1]), int(imgfov_pts_2d[i, 0])] = depth + + disp_map = (P2[0, 0] * 
baseline) / (depth_map) * 16 + disp_map = disp_map.astype(np.uint16) + return disp_map + +@jit(nopython=True, cache=True) +def generate_depth_from_velo(pc_velo:np.ndarray, + height:int, + width:int, + Tr_velo_to_cam:np.ndarray, + R0_rect:np.ndarray, + P2:np.ndarray, + base_depth:Optional[np.ndarray]=None): + """ + Generate depth map from point clouds. + Args: + pc_velo : point clouds in lidar coordinate; np.array of shape [n, 3] -> [[x, y, z]; ...] + height, width : output depth map shape; int + Tr_velo_to_cam : transform from lidar to camera; np.array [3, 4] -> [R | T] + R0_rect : rotation transform into camera coordinates(z forward, x towards right); np.array [3, 4] -> [R | T] + P2 : transform from P0 camera coordinates to target image plane; np.array [3, 4] -> [R | T] + base_depth : optional depth map of [height, width] to write into (modified in place); a zero map is created if None + Output: + depth_map : depth map; np.array of [height, width]; pixels with no projected point keep their initial value (0 if base_depth is None) + """ + #pts_2d = calib.project_velo_to_image(pc_velo) + pts_cam = _lidar2leftcam(pc_velo, Tr_velo_to_cam, R0_rect) + pts_2d = _leftcam2imgplane(pts_cam, P2) + fov_inds = (pts_2d[:, 0] < width - 1) & (pts_2d[:, 0] >= 0) & \ + (pts_2d[:, 1] < height - 1) & (pts_2d[:, 1] >= 0) + fov_inds = fov_inds & (pc_velo[:, 0] > 2) + imgfov_pts_2d = pts_2d[fov_inds, :] + imgfov_pc_rect = pts_cam[fov_inds, :] + + if base_depth is None: + depth_map = np.zeros((height, width)) + else: + depth_map = base_depth + imgfov_pts_2d = imgfov_pts_2d.astype(np.int32)#np.round(imgfov_pts_2d).astype(int) + for i in range(imgfov_pts_2d.shape[0]): + depth = imgfov_pc_rect[i, 2] + depth_map[int(imgfov_pts_2d[i, 1]), int(imgfov_pts_2d[i, 0])] = depth + + return depth_map def write_result_to_file(base_result_path:str, index:int, scores, bbox_2d, bbox_3d_state_3d=None, thetas=None, obj_types=['Car', 'Pedestrian', 'Cyclist'], threshold=0.4): diff --git a/visualDet3D/data/pipeline/augmentation_builder.py 
b/visualDet3D/data/pipeline/augmentation_builder.py index 9787da7..f984491 100644 --- a/visualDet3D/data/pipeline/augmentation_builder.py +++ b/visualDet3D/data/pipeline/augmentation_builder.py @@ -3,14 +3,33 @@ from easydict import EasyDict from visualDet3D.networks.utils.registry import AUGMENTATION_DICT from visualDet3D.data.kitti.kittidata import KittiObj + +def build_single_augmentator(cfg:EasyDict): + name:str = cfg.type_name + keywords:dict = getattr(cfg, 'keywords', dict()) + return AUGMENTATION_DICT[name](**keywords) + +@AUGMENTATION_DICT.register_module class Compose(object): """ Composes a set of functions which take in an image and an object, into a single transform """ - def __init__(self, transforms:List[Callable], is_return_all=True): - self.transforms = transforms + # def __init__(self, transforms:List[Callable], is_return_all=True): + # self.transforms = transforms + # self.is_return_all = is_return_all + + def __init__(self, aug_list:List[EasyDict], is_return_all=True): + self.transforms:List[Callable] = [] + for item in aug_list: + self.transforms.append(build_single_augmentator(item)) self.is_return_all = is_return_all + @classmethod + def from_transforms(cls, transforms:List[Callable]): + instance:Compose = cls(aug_list=[]) + instance.transforms = transforms + return instance + def __call__(self, left_image:np.ndarray, right_image:Union[None, np.ndarray]=None, p2:Union[None, np.ndarray]=None, @@ -34,12 +53,5 @@ def __call__(self, left_image:np.ndarray, return [item for item in return_list if item is not None] -def build_augmentator(aug_cfg:EasyDict)->Compose: - transformers:List[Callable] = [] - for item in aug_cfg: - name = item.type_name - keywords = getattr(item, 'keywords', dict()) - transformers.append( - AUGMENTATION_DICT[name](**keywords) - ) - return Compose(transformers, is_return_all=False) +def build_augmentator(aug_cfg:List[EasyDict])->Compose: + return Compose(aug_cfg, is_return_all=False) diff --git 
a/visualDet3D/data/pipeline/stereo_augmentator.py b/visualDet3D/data/pipeline/stereo_augmentator.py index fd5cb4f..0a2be71 100644 --- a/visualDet3D/data/pipeline/stereo_augmentator.py +++ b/visualDet3D/data/pipeline/stereo_augmentator.py @@ -16,13 +16,15 @@ import math import os import sys +from easydict import EasyDict +from typing import List from matplotlib import pyplot as plt from visualDet3D.networks.utils.utils import BBox3dProjector -from visualDet3D.utils.utils import draw_3D_box +from visualDet3D.utils.utils import draw_3D_box, theta2alpha_3d from visualDet3D.networks.utils.registry import AUGMENTATION_DICT from visualDet3D.data.kitti.kittidata import KittiObj import torch -from .augmentation_builder import Compose +from .augmentation_builder import Compose, build_single_augmentator @AUGMENTATION_DICT.register_module class ConvertToFloat(object): @@ -293,6 +295,31 @@ def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, return left_image, right_image, p2, p3, labels, image_gt, lidar +@AUGMENTATION_DICT.register_module +class FilterObject(object): + """ + Filter out objects whose 2D bounding box lies completely outside the image. + """ + def __init__(self): + pass + + def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, image_gt=None, lidar=None): + height, width = left_image.shape[0:2] + + if labels is not None: + new_labels = [] + if isinstance(labels, list): + # keep only objects whose 2D box overlaps the image + for obj in labels: + is_outside = ( + obj.bbox_b < 0 or obj.bbox_t > height or obj.bbox_r < 0 or obj.bbox_l > width + ) + if not is_outside: + new_labels.append(obj) + else: + new_labels = None + + return left_image, right_image, p2, p3, new_labels, image_gt, lidar @AUGMENTATION_DICT.register_module class RandomCropToWidth(object): @@ -402,13 +429,75 @@ def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, obj.ry = ry # alpha - obj.alpha = ry - np.arctan2(-z, obj.x) - 0.5 * np.pi - + obj.alpha = theta2alpha_3d(ry, 
obj.x, z, p2)
+
             if lidar is not None:
                 lidar[:, :, 0] = -lidar[:, :, 0]
         return left_image, right_image, p2, p3, labels, image_gt, lidar

+@AUGMENTATION_DICT.register_module
+class RandomWarpAffine(object):
+    """
+        Randomly scale and shift the image, then resize it to a fixed output size.
+    """
+    def __init__(self, scale_lower=0.6, scale_upper=1.4, shift_border=128, output_w=1280, output_h=384):
+        self.scale_lower = scale_lower
+        self.scale_upper = scale_upper
+        self.shift_border = shift_border
+        self.output_w = output_w
+        self.output_h = output_h
+
+    def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, image_gt=None, lidar=None):
+        s_original = max(left_image.shape[0], left_image.shape[1])
+        center_original = np.array([left_image.shape[1] / 2., left_image.shape[0] / 2.], dtype=np.float32)
+        scale = s_original * np.random.uniform(self.scale_lower, self.scale_upper)
+        center_w = np.random.randint(low=self.shift_border, high=left_image.shape[1] - self.shift_border)
+        center_h = np.random.randint(low=self.shift_border, high=left_image.shape[0] - self.shift_border)
+
+        final_scale = max(self.output_w, self.output_h) / scale
+        final_shift_w = self.output_w / 2 - center_w * final_scale
+        final_shift_h = self.output_h / 2 - center_h * final_scale
+        affine_transform = np.array(
+            [
+                [final_scale, 0, final_shift_w],
+                [0, final_scale, final_shift_h]
+            ], dtype=np.float32
+        )
+
+        left_image = cv2.warpAffine(left_image, affine_transform,
+                                    (self.output_w, self.output_h), flags=cv2.INTER_LINEAR)
+        if right_image is not None:
+            right_image = cv2.warpAffine(right_image, affine_transform,
+                                    (self.output_w, self.output_h), flags=cv2.INTER_LINEAR)
+
+        if image_gt is not None:
+            image_gt = cv2.warpAffine(image_gt, affine_transform,
+                                    (self.output_w, self.output_h), flags=cv2.INTER_LINEAR)
+
+        if p2 is not None:
+            p2[0:2, :] *= final_scale
+            p2[0, 2] = p2[0, 2] + final_shift_w # cx' = final_scale * cx + final_shift_w
+            p2[0, 3] = p2[0, 3] + final_shift_w * p2[2, 3] # ty' =
ty - dv * tz
+            p2[1, 2] = p2[1, 2] + final_shift_h # cy' = final_scale * cy + final_shift_h
+            p2[1, 3] = p2[1, 3] + final_shift_h * p2[2, 3] # ty' = final_scale * ty + final_shift_h * tz
+
+        if p3 is not None:
+            p3[0:2, :] *= final_scale
+            p3[0, 2] = p3[0, 2] + final_shift_w # cx' = final_scale * cx + final_shift_w
+            p3[0, 3] = p3[0, 3] + final_shift_w * p3[2, 3] # tx' = final_scale * tx + final_shift_w * tz
+            p3[1, 2] = p3[1, 2] + final_shift_h # cy' = final_scale * cy + final_shift_h
+            p3[1, 3] = p3[1, 3] + final_shift_h * p3[2, 3] # ty' = final_scale * ty + final_shift_h * tz
+
+        if labels:
+            if isinstance(labels, list):
+                for obj in labels:
+                    obj.bbox_l = obj.bbox_l * final_scale + final_shift_w
+                    obj.bbox_r = obj.bbox_r * final_scale + final_shift_w
+                    obj.bbox_t = obj.bbox_t * final_scale + final_shift_h
+                    obj.bbox_b = obj.bbox_b * final_scale + final_shift_h
+
+        return left_image, right_image, p2, p3, labels, image_gt, lidar

 @AUGMENTATION_DICT.register_module
 class RandomHue(object):
@@ -508,6 +597,35 @@ def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None,
                 right_image += delta
         return left_image, right_image, p2, p3, labels, image_gt, lidar

+@AUGMENTATION_DICT.register_module
+class RandomEigenvalueNoise(object):
+    """
+        Randomly add noise to the RGB color channels based on the eigenvalues and eigenvectors of ImageNet.
+    """
+    def __init__(self, distort_prob=1.0,
+                       alphastd=0.1,
+                       eigen_value=np.array([0.2141788, 0.01817699, 0.00341571], dtype=np.float32),
+                       eigen_vector=np.array([
+                            [-0.58752847, -0.69563484, 0.41340352],
+                            [-0.5832747, 0.00994535, -0.81221408],
+                            [-0.56089297, 0.71832671, 0.41158938]
+                        ], dtype=np.float32)
+                ):
+        self.distort_prob = distort_prob
+        self._eig_val = eigen_value
+        self._eig_vec = eigen_vector
+        self.alphastd = alphastd
+
+    def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, image_gt=None, lidar=None):
+        if random.rand() <= self.distort_prob:
+            alpha = np.random.normal(scale=self.alphastd, size=(3, ))
+            noise = np.dot(self._eig_vec, self._eig_val * alpha) * 255
+
+            left_image += noise
+            if right_image is not None:
+                right_image
+= noise + + return left_image, right_image, p2, p3, labels, image_gt, lidar @AUGMENTATION_DICT.register_module class PhotometricDistort(object): @@ -545,7 +663,7 @@ def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, distortion.insert(0, self.rand_brightness) # compose transformation - distortion = Compose(distortion) + distortion = Compose.from_transforms(distortion) return distortion(left_image.copy(), right_image if right_image is None else right_image.copy(), p2, p3, labels, image_gt, lidar) @@ -565,7 +683,7 @@ def __init__(self, cfg): self.distort_prob = cfg.distortProb if cfg.distortProb <= 0: - self.augment = Compose([ + self.augment = Compose.from_transforms([ ConvertToFloat(), CropTop(cfg.crop_top), Resize(self.size), @@ -573,7 +691,7 @@ def __init__(self, cfg): Normalize(self.mean, self.stds) ]) else: - self.augment = Compose([ + self.augment = Compose.from_transforms([ ConvertToFloat(), PhotometricDistort(self.distort_prob), CropTop(cfg.crop_top), @@ -598,7 +716,7 @@ def __init__(self, cfg): self.stds = cfg.rgb_std self.size = cfg.cropSize - self.preprocess = Compose([ + self.preprocess = Compose.from_transforms([ ConvertToFloat(), CropTop(cfg.crop_top), Resize(self.size), @@ -612,3 +730,25 @@ def __call__(self, left_image, right_image, p2=None, p3=None, labels=None, image #img = np.transpose(img, [2, 0, 1]) return left_image, right_image, p2, p3, labels, image_gt, lidar + +@AUGMENTATION_DICT.register_module +class Shuffle(object): + """ + Initialize a sequence of transformations. During function call, it will randomly shuffle the augmentation calls. + + Can be used with Compose to build complex augmentation structures. 
+ """ + def __init__(self, aug_list:List[EasyDict]): + self.transforms = [ + build_single_augmentator(aug_cfg) for aug_cfg in aug_list + ] + + def __call__(self, left_image, right_image=None, p2=None, p3=None, labels=None, image_gt=None, lidar=None): + # We aim to keep the original order of the initialized transforms in self.transforms, so we only randomize the indexes. + shuffled_indexes = np.random.permutation(len(self.transforms)) + + for index in shuffled_indexes: + left_image, right_image, p2, p3, labels, image_gt, lidar = self.transforms[index](left_image, right_image, p2, p3, labels, image_gt, lidar) + + return left_image, right_image, p2, p3, labels, image_gt, lidar + diff --git a/visualDet3D/networks/backbones/__init__.py b/visualDet3D/networks/backbones/__init__.py index 22079cb..0a05bd1 100644 --- a/visualDet3D/networks/backbones/__init__.py +++ b/visualDet3D/networks/backbones/__init__.py @@ -1 +1,13 @@ -from .resnet import resnet101, resnet152, resnet18, resnet34, resnet50, ResNet, resnet \ No newline at end of file +from .resnet import resnet101, resnet152, resnet18, resnet34, resnet50, ResNet, resnet +from .dla import dlanet +from visualDet3D.networks.utils.registry import BACKBONE_DICT + +def build_backbone(cfg): + temp_cfg = cfg.copy() + name = "" + if 'name' in temp_cfg: + name = temp_cfg.pop('name') + else: + name = 'resnet' + + return BACKBONE_DICT[name](**temp_cfg) diff --git a/visualDet3D/networks/backbones/dla.py b/visualDet3D/networks/backbones/dla.py new file mode 100644 index 0000000..60038c9 --- /dev/null +++ b/visualDet3D/networks/backbones/dla.py @@ -0,0 +1,441 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +import math +from os.path import join + +import torch +from torch import nn +import torch.utils.model_zoo as model_zoo +from typing import Tuple +from visualDet3D.networks.utils.registry import BACKBONE_DICT + +BatchNorm = nn.BatchNorm2d + +WEB_ROOT = 'http://dl.yf.io/dla/models' + + +model_hash={'dla34': 'ba72cf86', + 
'dla46_c': '2bfd52c3', + 'dla46x_c': 'd761bae7', + 'dla60x_c': 'b870c45c', + 'dla60': '24839fc4', + 'dla60x': 'd15cacda', + 'dla102': 'd94d9790', + 'dla102x': 'ad62be81', + 'dla102x2': '262837b6', + 'dla169': '0914e092'} + +def get_model_url(name): + return join(WEB_ROOT, 'imagenet', + '{}-{}.pth'.format(name, model_hash[name])) + + +def conv3x3(in_planes, out_planes, stride=1): + "3x3 convolution with padding" + return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, + padding=1, bias=False) + + +class BasicBlock(nn.Module): + def __init__(self, inplanes, planes, stride=1, dilation=1): + super(BasicBlock, self).__init__() + self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, + stride=stride, padding=dilation, + bias=False, dilation=dilation) + self.bn1 = BatchNorm(planes) + self.relu = nn.ReLU(inplace=True) + self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, + stride=1, padding=dilation, + bias=False, dilation=dilation) + self.bn2 = BatchNorm(planes) + self.stride = stride + + def forward(self, x, residual=None): + if residual is None: + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + + out += residual + out = self.relu(out) + + return out + + +class Bottleneck(nn.Module): + expansion = 2 + + def __init__(self, inplanes, planes, stride=1, dilation=1): + super(Bottleneck, self).__init__() + expansion = Bottleneck.expansion + bottle_planes = planes // expansion + self.conv1 = nn.Conv2d(inplanes, bottle_planes, + kernel_size=1, bias=False) + self.bn1 = BatchNorm(bottle_planes) + self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3, + stride=stride, padding=dilation, + bias=False, dilation=dilation) + self.bn2 = BatchNorm(bottle_planes) + self.conv3 = nn.Conv2d(bottle_planes, planes, + kernel_size=1, bias=False) + self.bn3 = BatchNorm(planes) + self.relu = nn.ReLU(inplace=True) + self.stride = stride + + def forward(self, x, residual=None): + if 
residual is None: + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + + out = self.conv3(out) + out = self.bn3(out) + + out += residual + out = self.relu(out) + + return out + + +class BottleneckX(nn.Module): + expansion = 2 + cardinality = 32 + + def __init__(self, inplanes, planes, stride=1, dilation=1): + super(BottleneckX, self).__init__() + cardinality = BottleneckX.cardinality + # dim = int(math.floor(planes * (BottleneckV5.expansion / 64.0))) + # bottle_planes = dim * cardinality + bottle_planes = planes * cardinality // 32 + self.conv1 = nn.Conv2d(inplanes, bottle_planes, + kernel_size=1, bias=False) + self.bn1 = BatchNorm(bottle_planes) + self.conv2 = nn.Conv2d(bottle_planes, bottle_planes, kernel_size=3, + stride=stride, padding=dilation, bias=False, + dilation=dilation, groups=cardinality) + self.bn2 = BatchNorm(bottle_planes) + self.conv3 = nn.Conv2d(bottle_planes, planes, + kernel_size=1, bias=False) + self.bn3 = BatchNorm(planes) + self.relu = nn.ReLU(inplace=True) + self.stride = stride + + def forward(self, x, residual=None): + if residual is None: + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + + out = self.conv3(out) + out = self.bn3(out) + + out += residual + out = self.relu(out) + + return out + + +class Root(nn.Module): + def __init__(self, in_channels, out_channels, kernel_size, residual): + super(Root, self).__init__() + self.conv = nn.Conv2d( + in_channels, out_channels, kernel_size, + stride=1, bias=False, padding=(kernel_size - 1) // 2) + self.bn = BatchNorm(out_channels) + self.relu = nn.ReLU(inplace=True) + self.residual = residual + + def forward(self, *x): + children = x + x = self.conv(torch.cat(x, 1)) + x = self.bn(x) + if self.residual: + x += children[0] + x = self.relu(x) + + return x + + +class Tree(nn.Module): + def 
__init__(self, levels, block, in_channels, out_channels, stride=1, + level_root=False, root_dim=0, root_kernel_size=1, + dilation=1, root_residual=False): + super(Tree, self).__init__() + if root_dim == 0: + root_dim = 2 * out_channels + if level_root: + root_dim += in_channels + if levels == 1: + self.tree1 = block(in_channels, out_channels, stride, + dilation=dilation) + self.tree2 = block(out_channels, out_channels, 1, + dilation=dilation) + else: + self.tree1 = Tree(levels - 1, block, in_channels, out_channels, + stride, root_dim=0, + root_kernel_size=root_kernel_size, + dilation=dilation, root_residual=root_residual) + self.tree2 = Tree(levels - 1, block, out_channels, out_channels, + root_dim=root_dim + out_channels, + root_kernel_size=root_kernel_size, + dilation=dilation, root_residual=root_residual) + if levels == 1: + self.root = Root(root_dim, out_channels, root_kernel_size, + root_residual) + self.level_root = level_root + self.root_dim = root_dim + self.downsample = None + self.project = None + self.levels = levels + if stride > 1: + self.downsample = nn.MaxPool2d(stride, stride=stride) + if in_channels != out_channels: + self.project = nn.Sequential( + nn.Conv2d(in_channels, out_channels, + kernel_size=1, stride=1, bias=False), + BatchNorm(out_channels) + ) + + def forward(self, x, residual=None, children=None): + children = [] if children is None else children + bottom = self.downsample(x) if self.downsample else x + residual = self.project(bottom) if self.project else bottom + if self.level_root: + children.append(bottom) + x1 = self.tree1(x, residual) + if self.levels == 1: + x2 = self.tree2(x1) + x = self.root(x2, x1, *children) + else: + children.append(x1) + x = self.tree2(x1, children=children) + return x + + +class DLA(nn.Module): + """ + For DLA the downscale ratio: + + -1: 1 + 0 : 1 + 1 : 2 + 2 : 4 + 3 : 8 + 4 : 16 + 5 : 32 + + DLA34: + torch.Size([1, 16, 224, 224]) + torch.Size([1, 16, 224, 224]) + torch.Size([1, 32, 112, 112]) + 
torch.Size([1, 64, 56, 56]) + torch.Size([1, 128, 28, 28]) + torch.Size([1, 256, 14, 14]) + torch.Size([1, 512, 7, 7]) + """ + def __init__(self, levels, channels, num_classes=1000, + block=BasicBlock, residual_root=False, out_indices:Tuple[int, ...]=(-1, 0, 1, 2, 3, 4, 5) + ): + super(DLA, self).__init__() + self.channels = channels + self.out_indices = out_indices + self.num_classes = num_classes + self.base_layer = nn.Sequential( + nn.Conv2d(3, channels[0], kernel_size=7, stride=1, + padding=3, bias=False), + BatchNorm(channels[0]), + nn.ReLU(inplace=True)) + self.level0 = self._make_conv_level( + channels[0], channels[0], levels[0]) + self.level1 = self._make_conv_level( + channels[0], channels[1], levels[1], stride=2) + self.level2 = Tree(levels[2], block, channels[1], channels[2], 2, + level_root=False, + root_residual=residual_root) + self.level3 = Tree(levels[3], block, channels[2], channels[3], 2, + level_root=True, root_residual=residual_root) + self.level4 = Tree(levels[4], block, channels[3], channels[4], 2, + level_root=True, root_residual=residual_root) + self.level5 = Tree(levels[5], block, channels[4], channels[5], 2, + level_root=True, root_residual=residual_root) + + for m in self.modules(): + if isinstance(m, nn.Conv2d): + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + m.weight.data.normal_(0, math.sqrt(2. 
/ n)) + elif isinstance(m, BatchNorm): + m.weight.data.fill_(1) + m.bias.data.zero_() + + def _make_level(self, block, inplanes, planes, blocks, stride=1): + downsample = None + if stride != 1 or inplanes != planes: + downsample = nn.Sequential( + nn.MaxPool2d(stride, stride=stride), + nn.Conv2d(inplanes, planes, + kernel_size=1, stride=1, bias=False), + BatchNorm(planes), + ) + + layers = [] + layers.append(block(inplanes, planes, stride, downsample=downsample)) + for i in range(1, blocks): + layers.append(block(inplanes, planes)) + + return nn.Sequential(*layers) + + def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1): + modules = [] + for i in range(convs): + modules.extend([ + nn.Conv2d(inplanes, planes, kernel_size=3, + stride=stride if i == 0 else 1, + padding=dilation, bias=False, dilation=dilation), + BatchNorm(planes), + nn.ReLU(inplace=True)]) + inplanes = planes + return nn.Sequential(*modules) + + def forward(self, x): + y = [] + x = self.base_layer(x) + if -1 in self.out_indices: + y.append(x) + for i in range(6): + x = getattr(self, 'level{}'.format(i))(x) + if i in self.out_indices: + y.append(x) + return y + + def load_pretrained_model(self, data_name, name): + model_url = get_model_url(name) + print(model_url) + self.load_state_dict(model_zoo.load_url(model_url), strict=False) + + +def dla34(pretrained=True, **kwargs): # DLA-34 + model = DLA([1, 1, 1, 2, 2, 1], + [16, 32, 64, 128, 256, 512], + block=BasicBlock, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla34') + return model + + +def dla46_c(pretrained=True, **kwargs): # DLA-46-C + Bottleneck.expansion = 2 + model = DLA([1, 1, 1, 2, 2, 1], + [16, 32, 64, 64, 128, 256], + block=Bottleneck, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla46_c') + return model + + +def dla46x_c(pretrained=True, **kwargs): # DLA-X-46-C + BottleneckX.expansion = 2 + model = DLA([1, 1, 1, 2, 2, 1], + [16, 32, 64, 64, 128, 
256], + block=BottleneckX, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla46x_c') + return model + + +def dla60x_c(pretrained=True, **kwargs): # DLA-X-60-C + BottleneckX.expansion = 2 + model = DLA([1, 1, 1, 2, 3, 1], + [16, 32, 64, 64, 128, 256], + block=BottleneckX, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla60x_c') + return model + + +def dla60(pretrained=True, **kwargs): # DLA-60 + Bottleneck.expansion = 2 + model = DLA([1, 1, 1, 2, 3, 1], + [16, 32, 128, 256, 512, 1024], + block=Bottleneck, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla60') + return model + + +def dla60x(pretrained=True, **kwargs): # DLA-X-60 + BottleneckX.expansion = 2 + model = DLA([1, 1, 1, 2, 3, 1], + [16, 32, 128, 256, 512, 1024], + block=BottleneckX, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla60x') + return model + + +def dla102(pretrained=True, **kwargs): # DLA-102 + Bottleneck.expansion = 2 + model = DLA([1, 1, 1, 3, 4, 1], [16, 32, 128, 256, 512, 1024], + block=Bottleneck, residual_root=True, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla102') + return model + + +def dla102x(pretrained=True, **kwargs): # DLA-X-102 + BottleneckX.expansion = 2 + model = DLA([1, 1, 1, 3, 4, 1], [16, 32, 128, 256, 512, 1024], + block=BottleneckX, residual_root=True, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla102x') + return model + + +def dla102x2(pretrained=True, **kwargs): # DLA-X-102 64 + BottleneckX.cardinality = 64 + model = DLA([1, 1, 1, 3, 4, 1], [16, 32, 128, 256, 512, 1024], + block=BottleneckX, residual_root=True, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla102x2') + return model + + +def dla169(pretrained=True, **kwargs): # DLA-169 + Bottleneck.expansion = 2 + model = DLA([1, 1, 2, 3, 5, 1], [16, 32, 128, 256, 512, 
1024], + block=Bottleneck, residual_root=True, **kwargs) + if pretrained is not None: + model.load_pretrained_model(pretrained, 'dla169') + return model + +@BACKBONE_DICT.register_module +def dlanet(depth, **kwargs): + if depth == 34: + model = dla34(**kwargs) + elif depth == 60: + model = dla60(**kwargs) + elif depth == 102: + model = dla102(**kwargs) + elif depth == 169: + model = dla169(**kwargs) + else: + raise ValueError( + 'Unsupported model depth, must be one of 34, 60, 102, 169') + return model \ No newline at end of file diff --git a/visualDet3D/networks/detectors/KM3D.py b/visualDet3D/networks/detectors/KM3D.py new file mode 100644 index 0000000..ab226d7 --- /dev/null +++ b/visualDet3D/networks/detectors/KM3D.py @@ -0,0 +1,88 @@ +import numpy as np +import torch.nn as nn +import torch.nn.functional as F +import torch +import math +import time +from torchvision.ops import nms +from visualDet3D.networks.utils import DETECTOR_DICT +from visualDet3D.networks.detectors.KM3D_core import KM3DCore +from visualDet3D.networks.heads.km3d_head import KM3DHead +from visualDet3D.networks.lib.blocks import AnchorFlatten +from visualDet3D.networks.lib.look_ground import LookGround +from visualDet3D.networks.lib.ops.dcn.deform_conv import DeformConv + +@DETECTOR_DICT.register_module +class KM3D(nn.Module): + """ + KM3D + """ + def __init__(self, network_cfg): + super(KM3D, self).__init__() + + self.obj_types = network_cfg.obj_types + + self.build_head(network_cfg) + + self.build_core(network_cfg) + + self.network_cfg = network_cfg + + + def build_core(self, network_cfg): + self.core = KM3DCore(network_cfg.backbone) + + def build_head(self, network_cfg): + self.bbox_head = KM3DHead( + **(network_cfg.head) + ) + + def training_forward(self, img_batch, annotations, meta): + """ + Args: + img_batch: [B, C, H, W] tensor + annotations: check visualDet3D.utils.utils compound_annotation + meta: + calib: visualDet3D.kitti.data.kitti.KittiCalib or anything with obj.P2 + epoch: 
current_epoch
+            Returns:
+                loss: loss tensor returned by the head
+                loss_dict: [key, value] pair for logging
+        """
+
+        features = self.core(dict(image=img_batch, P2=meta['P2']))
+        output_dict = self.bbox_head(features)
+
+        loss, loss_dict = self.bbox_head.loss(output_dict, annotations, meta)
+
+        return loss, loss_dict
+
+    def test_forward(self, img_batch, P2):
+        """
+            Args:
+                img_batch: [B, C, H, W] tensor
+                P2: visualDet3D.kitti.data.kitti.KittiCalib or anything with obj.P2
+            Returns:
+                scores, bboxes, cls_indexes: detection results from bbox_head.get_bboxes, where
+                    bbox = [bbox2d(length=4) , cx, cy, z, w, h, l, alpha]
+        """
+        assert img_batch.shape[0] == 1 # we recommend image batch size = 1 for testing
+
+        features = self.core(dict(image=img_batch, P2=P2))
+        output_dict = self.bbox_head(features)
+
+        scores, bboxes, cls_indexes = self.bbox_head.get_bboxes(output_dict, P2, img_batch)
+
+        return scores, bboxes, cls_indexes
+
+    def forward(self, inputs):
+
+        if isinstance(inputs, list) and len(inputs) == 3:
+            img_batch, annotations, meta = inputs
+            return self.training_forward(img_batch, annotations, meta)
+        else:
+            img_batch, calib = inputs
+            return self.test_forward(img_batch, calib)
+
diff --git a/visualDet3D/networks/detectors/KM3D_core.py b/visualDet3D/networks/detectors/KM3D_core.py
new file mode 100644
index 0000000..a8e2d8c
--- /dev/null
+++ b/visualDet3D/networks/detectors/KM3D_core.py
@@ -0,0 +1,34 @@
+import numpy as np
+import torch.nn as nn
+import torch
+import math
+import time
+from visualDet3D.networks.backbones import build_backbone
+
+class KM3DCore(nn.Module):
+    """Backbone followed by deconvolution layers, used as the KM3D feature extractor"""
+    def __init__(self, backbone_arguments=dict()):
+        super(KM3DCore, self).__init__()
+        self.backbone = build_backbone(backbone_arguments)
+        output_features = 2048 if backbone_arguments['depth'] > 34 else 512
+        feature_size = 256
+
+        self.deconv_layers = nn.Sequential(
+
nn.ConvTranspose2d(output_features, feature_size, (4, 4), stride=(2, 2), padding=(1, 1), bias=False), + nn.BatchNorm2d(feature_size), + nn.ReLU(inplace=True), + nn.ConvTranspose2d(feature_size, feature_size, (4, 4), stride=(2, 2), padding=(1, 1), bias=False), + nn.BatchNorm2d(feature_size), + nn.ReLU(inplace=True), + nn.ConvTranspose2d(feature_size, feature_size, (4, 4), stride=(2, 2), padding=(1, 1), bias=False), + nn.BatchNorm2d(feature_size), + nn.ReLU(inplace=True), + ) + for _, m in self.deconv_layers.named_modules(): + if isinstance(m, nn.ConvTranspose2d): + nn.init.normal_(m.weight, std=0.001) + + def forward(self, x): + x = self.backbone(x['image']) + x = self.deconv_layers(x[0]) + return x diff --git a/visualDet3D/networks/detectors/__init__.py b/visualDet3D/networks/detectors/__init__.py index 5566321..10d56c2 100644 --- a/visualDet3D/networks/detectors/__init__.py +++ b/visualDet3D/networks/detectors/__init__.py @@ -1,3 +1,5 @@ from .retinanet_2d import RetinaNet from .yolomono3d_detector import Yolo3D, GroundAwareYolo3D -from .unet_monodepth import MonoDepth \ No newline at end of file +from .unet_monodepth import MonoDepth +from .yolostereo3d_detector import YoloStereo3DCore +from .KM3D import KM3D \ No newline at end of file diff --git a/visualDet3D/networks/detectors/yolostereo3d_core.py b/visualDet3D/networks/detectors/yolostereo3d_core.py new file mode 100644 index 0000000..b3b9817 --- /dev/null +++ b/visualDet3D/networks/detectors/yolostereo3d_core.py @@ -0,0 +1,126 @@ +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +import math +import time +from visualDet3D.networks.lib.blocks import AnchorFlatten, ConvBnReLU +from visualDet3D.networks.lib.ghost_module import ResGhostModule, GhostModule +from visualDet3D.networks.lib.PSM_cost_volume import PSMCosineModule, CostVolume +from visualDet3D.networks.backbones import resnet +from visualDet3D.networks.backbones.resnet import BasicBlock +from 
visualDet3D.networks.lib.look_ground import LookGround + +class CostVolumePyramid(nn.Module): + """Some Information about CostVolumePyramid""" + def __init__(self, depth_channel_4, depth_channel_8, depth_channel_16): + super(CostVolumePyramid, self).__init__() + self.depth_channel_4 = depth_channel_4 # 24 + self.depth_channel_8 = depth_channel_8 # 24 + self.depth_channel_16 = depth_channel_16 # 96 + + input_features = depth_channel_4 # 24 + self.four_to_eight = nn.Sequential( + ResGhostModule(input_features, 3 * input_features, 3, ratio=3), + nn.AvgPool2d(2), + #nn.Conv2d(3 * input_features, 3 * input_features, 3, padding=1, bias=False), + #nn.BatchNorm2d(3 * input_features), + #nn.ReLU(), + BasicBlock(3 * input_features, 3 * input_features), + ) + input_features = 3 * input_features + depth_channel_8 # 3 * 24 + 24 = 96 + self.eight_to_sixteen = nn.Sequential( + ResGhostModule(input_features, 3 * input_features, 3, ratio=3), + nn.AvgPool2d(2), + BasicBlock(3 * input_features, 3 * input_features), + #nn.Conv2d(3 * input_features, 3 * input_features, 3, padding=1, bias=False), + #nn.BatchNorm2d(3 * input_features), + #nn.ReLU(), + ) + input_features = 3 * input_features + depth_channel_16 # 3 * 96 + 96 = 384 + self.depth_reason = nn.Sequential( + ResGhostModule(input_features, 3 * input_features, kernel_size=3, ratio=3), + BasicBlock(3 * input_features, 3 * input_features), + #nn.Conv2d(3 * input_features, 3 * input_features, 3, padding=1, bias=False), + #nn.BatchNorm2d(3 * input_features), + #nn.ReLU(), + ) + self.output_channel_num = 3 * input_features #1152 + + self.depth_output = nn.Sequential( + nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True), + nn.Conv2d(self.output_channel_num, int(self.output_channel_num/2), 3, padding=1), + nn.BatchNorm2d(int(self.output_channel_num/2)), + nn.ReLU(), + nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True), + nn.Conv2d(int(self.output_channel_num/2), int(self.output_channel_num/4), 3, padding=1), + 
nn.BatchNorm2d(int(self.output_channel_num/4)), + nn.ReLU(), + nn.Conv2d(int(self.output_channel_num/4), 96, 1), + ) + + + def forward(self, psv_volume_4, psv_volume_8, psv_volume_16): + psv_4_8 = self.four_to_eight(psv_volume_4) + psv_volume_8 = torch.cat([psv_4_8, psv_volume_8], dim=1) + psv_8_16 = self.eight_to_sixteen(psv_volume_8) + psv_volume_16 = torch.cat([psv_8_16, psv_volume_16], dim=1) + psv_16 = self.depth_reason(psv_volume_16) + if self.training: + return psv_16, self.depth_output(psv_16) + return psv_16, torch.zeros([psv_volume_4.shape[0], 1, psv_volume_4.shape[2], psv_volume_4.shape[3]]) + +class StereoMerging(nn.Module): + def __init__(self, base_features): + super(StereoMerging, self).__init__() + self.cost_volume_0 = PSMCosineModule(downsample_scale=4, max_disp=96, input_features=base_features) + PSV_depth_0 = self.cost_volume_0.depth_channel + + self.cost_volume_1 = PSMCosineModule(downsample_scale=8, max_disp=192, input_features=base_features * 2) + PSV_depth_1 = self.cost_volume_1.depth_channel + + self.cost_volume_2 = CostVolume(downsample_scale=16, max_disp=192, input_features=base_features * 4, PSM_features=8) + PSV_depth_2 = self.cost_volume_2.output_channel + + self.depth_reasoning = CostVolumePyramid(PSV_depth_0, PSV_depth_1, PSV_depth_2) + self.final_channel = self.depth_reasoning.output_channel_num + base_features * 4 + + def forward(self, left_x, right_x): + PSVolume_0 = self.cost_volume_0(left_x[0], right_x[0]) + PSVolume_1 = self.cost_volume_1(left_x[1], right_x[1]) + PSVolume_2 = self.cost_volume_2(left_x[2], right_x[2]) + PSV_features, depth_output = self.depth_reasoning(PSVolume_0, PSVolume_1, PSVolume_2) # c = 1152 + features = torch.cat([left_x[2], PSV_features], dim=1) # c = 1152 + 256 = 1408 + return features, depth_output + +class YoloStereo3DCore(nn.Module): + """ + Inference Structure of YoloStereo3D + Similar to YoloMono3D, + Left and Right image are fed into the backbone in batch. 
So they will affect each other with BatchNorm2d. + """ + def __init__(self, backbone_arguments): + super(YoloStereo3DCore, self).__init__() + self.backbone =resnet(**backbone_arguments) + + base_features = 256 if backbone_arguments['depth'] > 34 else 64 + self.neck = StereoMerging(base_features) + + + def forward(self, images): + + batch_size = images.shape[0] + left_images = images[:, 0:3, :, :] + right_images = images[:, 3:, :, :] + + images = torch.cat([left_images, right_images], dim=0) + + features = self.backbone(images) + + left_features = [feature[0:batch_size] for feature in features] + right_features = [feature[batch_size:] for feature in features] + + features, depth_output = self.neck(left_features, right_features) + + output_dict = dict(features=features, depth_output=depth_output) + return output_dict diff --git a/visualDet3D/networks/detectors/yolostereo3d_detector.py b/visualDet3D/networks/detectors/yolostereo3d_detector.py new file mode 100644 index 0000000..b132167 --- /dev/null +++ b/visualDet3D/networks/detectors/yolostereo3d_detector.py @@ -0,0 +1,103 @@ +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from torchvision.ops import nms +from visualDet3D.networks.utils.registry import DETECTOR_DICT +from visualDet3D.utils.timer import profile +from visualDet3D.networks.heads import losses +from visualDet3D.networks.detectors.yolostereo3d_core import YoloStereo3DCore +from visualDet3D.networks.heads.detection_3d_head import StereoHead +from visualDet3D.networks.lib.blocks import AnchorFlatten, ConvBnReLU +from visualDet3D.networks.backbones.resnet import BasicBlock + + + +@DETECTOR_DICT.register_module +class Stereo3D(nn.Module): + """ + Stereo3D + """ + def __init__(self, network_cfg): + super(Stereo3D, self).__init__() + + self.obj_types = network_cfg.obj_types + + self.build_head(network_cfg) + + self.build_core(network_cfg) + + self.network_cfg = network_cfg + + def build_core(self, network_cfg): + 
        self.core = YoloStereo3DCore(network_cfg.backbone)
+
+    def build_head(self, network_cfg):
+        self.bbox_head = StereoHead(
+            **(network_cfg.head)
+        )
+
+        self.disparity_loss = losses.DisparityLoss(maxdisp=96)
+
+    def train_forward(self, left_images, right_images, annotations, P2, P3, disparity=None):
+        """
+        Args:
+            left_images, right_images: [B, C, H, W] tensors
+            annotations: check visualDet3D.utils.utils compound_annotation
+            P2, P3: projection matrices of the left and right cameras
+            disparity: optional ground-truth disparity for the disparity loss
+        Returns:
+            cls_loss, reg_loss: tensor of losses
+            loss_dict: [key, value] pair for logging
+        """
+        output_dict = self.core(torch.cat([left_images, right_images], dim=1))
+        depth_output = output_dict['depth_output']
+
+        cls_preds, reg_preds = self.bbox_head(
+            dict(
+                features=output_dict['features'],
+                P2=P2,
+                image=left_images
+            )
+        )
+
+        anchors = self.bbox_head.get_anchor(left_images, P2)
+
+        cls_loss, reg_loss, loss_dict = self.bbox_head.loss(cls_preds, reg_preds, anchors, annotations, P2)
+
+        if reg_loss.mean() > 0 and disparity is not None and depth_output is not None:
+            disp_loss = 1.0 * self.disparity_loss(depth_output, disparity)
+            loss_dict['disparity_loss'] = disp_loss
+            reg_loss += disp_loss
+
+            self.depth_output = depth_output.detach()
+        else:
+            loss_dict['disparity_loss'] = torch.zeros_like(reg_loss)
+        return cls_loss, reg_loss, loss_dict
+
+    def test_forward(self, left_images, right_images, P2, P3):
+        assert left_images.shape[0] == 1 # we recommend image batch size = 1 for testing
+
+        output_dict = self.core(torch.cat([left_images, right_images], dim=1))
+        depth_output = output_dict['depth_output']
+
+        cls_preds, reg_preds = self.bbox_head(
+            dict(
+                features=output_dict['features'],
+                P2=P2,
+                image=left_images
+            )
+        )
+
+        anchors = self.bbox_head.get_anchor(left_images, P2)
+
+        scores, bboxes, cls_indexes = self.bbox_head.get_bboxes(cls_preds, reg_preds, anchors, P2, left_images)
+
+        return scores, bboxes, cls_indexes
+
+
+    def forward(self, inputs):
+
+        if
isinstance(inputs, list) and len(inputs) >= 5: + return self.train_forward(*inputs) + else: + return self.test_forward(*inputs) diff --git a/visualDet3D/networks/heads/detection_3d_head.py b/visualDet3D/networks/heads/detection_3d_head.py index a184fef..8e66da5 100644 --- a/visualDet3D/networks/heads/detection_3d_head.py +++ b/visualDet3D/networks/heads/detection_3d_head.py @@ -13,7 +13,8 @@ from visualDet3D.networks.utils.utils import calc_iou, BackProjection, BBox3dProjector from visualDet3D.networks.lib.fast_utils.hill_climbing import post_opt from visualDet3D.networks.utils.utils import ClipBoxes -from visualDet3D.networks.lib.blocks import AnchorFlatten +from visualDet3D.networks.lib.blocks import AnchorFlatten, ConvBnReLU +from visualDet3D.networks.backbones.resnet import BasicBlock from visualDet3D.networks.lib.ops import ModulatedDeformConvPack from visualDet3D.networks.lib.look_ground import LookGround @@ -387,7 +388,6 @@ def get_bboxes(self, cls_scores, reg_preds, anchors, P2s, img_batch=None): nms_bbox = bboxes[:, :4] + label.float().unsqueeze() * (max_coordinate) keep_inds = nms(nms_bbox, max_score, nms_iou_thr) - bboxes = bboxes[keep_inds] max_score = max_score[keep_inds] label = label[keep_inds] @@ -492,3 +492,38 @@ def loss(self, cls_scores, reg_preds, anchors, annotations, P2s): reg_loss = weighted_regression_losses.mean(dim=0, keepdim=True) return cls_loss, reg_loss, dict(cls_loss=cls_loss, reg_loss=reg_loss, total_loss=cls_loss + reg_loss) + +class StereoHead(AnchorBasedDetection3DHead): + def init_layers(self, num_features_in, + num_anchors:int, + num_cls_output:int, + num_reg_output:int, + cls_feature_size:int=1024, + reg_feature_size:int=1024, + **kwargs): + + self.cls_feature_extraction = nn.Sequential( + nn.Conv2d(num_features_in, cls_feature_size, kernel_size=3, padding=1), + nn.Dropout2d(0.3), + nn.ReLU(inplace=True), + nn.Conv2d(cls_feature_size, cls_feature_size, kernel_size=3, padding=1), + nn.Dropout2d(0.3), + nn.ReLU(inplace=True), + + 
nn.Conv2d(cls_feature_size, num_anchors*(num_cls_output), kernel_size=3, padding=1), + AnchorFlatten(num_cls_output) + ) + self.cls_feature_extraction[-2].weight.data.fill_(0) + self.cls_feature_extraction[-2].bias.data.fill_(0) + + self.reg_feature_extraction = nn.Sequential( + + ConvBnReLU(num_features_in, reg_feature_size, (3, 3)), + BasicBlock(reg_feature_size, reg_feature_size), + nn.ReLU(), + nn.Conv2d(reg_feature_size, num_anchors*num_reg_output, kernel_size=3, padding=1), + AnchorFlatten(num_reg_output) + ) + + self.reg_feature_extraction[-2].weight.data.fill_(0) + self.reg_feature_extraction[-2].bias.data.fill_(0) diff --git a/visualDet3D/networks/heads/km3d_head.py b/visualDet3D/networks/heads/km3d_head.py new file mode 100644 index 0000000..afeae35 --- /dev/null +++ b/visualDet3D/networks/heads/km3d_head.py @@ -0,0 +1,357 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +from torchvision.ops import nms +from easydict import EasyDict +import numpy as np +from typing import List, Tuple, Dict + + +from visualDet3D.networks.heads.losses import SigmoidFocalLoss, ModifiedSmoothL1Loss +from visualDet3D.networks.heads.anchors import Anchors +from visualDet3D.networks.utils.utils import calc_iou, BackProjection, BBox3dProjector +from visualDet3D.networks.lib.fast_utils.hill_climbing import post_opt +from visualDet3D.networks.utils.utils import ClipBoxes +from visualDet3D.networks.lib.blocks import AnchorFlatten +from visualDet3D.networks.lib.ops import ModulatedDeformConvPack +from visualDet3D.networks.lib.look_ground import LookGround +from visualDet3D.networks.utils.rtm3d_utils import _transpose_and_gather_feat, compute_rot_loss, gen_position, Position_loss, _nms, _topk_channel, _topk +from visualDet3D.utils.utils import convertRot2Alpha + +class KM3DHead(nn.Module): + """Some Information about KM3DHead""" + def __init__(self, num_classes:int=3, + num_joints:int=9, + max_objects:int=32, + 
layer_cfg=EasyDict(), + loss_cfg=EasyDict(), + test_cfg=EasyDict()): + super(KM3DHead, self).__init__() + self._init_layers(**layer_cfg) + self.build_loss(**loss_cfg) + self.test_cfg = test_cfg + const = torch.Tensor( + [[-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], + [-1, 0], [0, -1], [-1, 0], [0, -1]]).unsqueeze(0).unsqueeze(0) + self.register_buffer('const', const) # self.const + + self.num_classes = num_classes + self.num_joints = num_joints + self.max_objects = max_objects + self.clipper = ClipBoxes() + + def build_loss(self, + gamma=2.0, + output_w = 1280, + rampup_length = 100, + **kwargs): + pass #self.cls_hm_loss = SigmoidFocalLoss(gamma=gamma) + self.position_loss = Position_loss(output_w=output_w) + self.rampup_length = rampup_length + + def exp_rampup(self, epoch=0): + if epoch < self.rampup_length: + epoch = np.clip(epoch, 0.0, self.rampup_length) + phase = 1.0 - epoch / self.rampup_length + return float(np.exp(-5.0 * phase * phase)) + else: + return 1.0 + + @staticmethod + def _neg_loss(pred, gt): + ''' Modified focal loss. Exactly the same as CornerNet. 
+ Runs faster and costs a little bit more memory + Arguments: + pred (batch x c x h x w) + gt_regr (batch x c x h x w) + ''' + pos_inds = gt.eq(1).float() + neg_inds = gt.lt(1).float() + + neg_weights = torch.pow(1 - gt, 4) + + loss = 0 + pred_prob = torch.sigmoid(pred) + + pos_loss = nn.functional.logsigmoid(pred) * torch.pow(1 - pred_prob, 2) * pos_inds + pos_loss = torch.where( + pred_prob > 0.99, + torch.zeros_like(pos_loss), + pos_loss + ) + neg_loss = nn.functional.logsigmoid(- pred) * torch.pow(pred_prob, 2) * neg_weights * neg_inds + neg_loss = torch.where( + pred_prob < 0.01, + torch.zeros_like(neg_loss), + neg_loss + ) + + num_pos = pos_inds.float().sum() + pos_loss = pos_loss.sum() + neg_loss = neg_loss.sum() + + if num_pos == 0: + loss = loss - neg_loss + else: + loss = loss - (pos_loss + neg_loss) / num_pos + return loss + + @staticmethod + def _RegWeightedL1Loss(output, mask, ind, target, dep): + dep=dep.squeeze(2) + dep[dep<5]=dep[dep<5]*0.01 + dep[dep >= 5] = torch.log10(dep[dep >=5]-4)+0.1 + pred = _transpose_and_gather_feat(output, ind) + mask = mask.float() + # loss = F.l1_loss(pred * mask, target * mask, reduction='elementwise_mean') + #losss=torch.abs(pred * mask-target * mask) + #loss = F.l1_loss(pred * mask, target * mask, size_average=False) + loss=torch.abs(pred * mask-target * mask) + loss=torch.sum(loss,dim=2)*dep + loss=loss.sum() + loss = loss / (mask.sum() + 1e-4) + + return loss + + @staticmethod + def _RegL1Loss(output, mask, ind, target): + pred = _transpose_and_gather_feat(output, ind) + mask = mask.unsqueeze(2).expand_as(pred).float() + # loss = F.l1_loss(pred * mask, target * mask, reduction='elementwise_mean') + loss = F.l1_loss(pred * mask, target * mask, size_average=False) + loss = loss / (mask.sum() + 1e-4) + return loss + + @staticmethod + def _RotLoss(output, mask, ind, rotbin, rotres): + pred = _transpose_and_gather_feat(output, ind) + loss = compute_rot_loss(pred, rotbin, rotres, mask) + return loss + + def 
_init_layers(self, + input_features=256, + head_features=64, + head_dict=dict(), + **kwargs): + # self.head_dict = head_dict + self.head_layers = nn.ModuleDict() + for head_name, num_output in head_dict.items(): + self.head_layers[head_name] = nn.Sequential( + nn.Conv2d(input_features, head_features, 3, padding=1, bias=True), + nn.ReLU(inplace=True), + nn.Conv2d(head_features, num_output, 1) + ) + + if 'hm' in head_name: + output_layer = self.head_layers[head_name][-1] + nn.init.constant_(output_layer.bias, -2.19) + + else: + output_layer = self.head_layers[head_name][-1] + nn.init.normal_(output_layer.weight, std=0.001) + nn.init.constant_(output_layer.bias, 0) + + def _decode(self, heat, wh, kps,dim,rot, prob=None,reg=None, hm_hp=None, hp_offset=None, K=100,meta=None,const=None): + + batch, cat, height, width = heat.size() + num_joints = kps.shape[1] // 2 + # heat = torch.sigmoid(heat) + # perform nms on heatmaps + # hm_show,_=torch.max(hm_hp,1) + # hm_show=hm_show.squeeze(0) + # hm_show=hm_show.detach().cpu().numpy().copy() + # plt.imshow(hm_show, 'gray') + # plt.show() + + heat = _nms(heat) + scores, inds, clses, ys, xs = _topk(heat, K=K) + + kps = _transpose_and_gather_feat(kps, inds) + kps = kps.view(batch, K, num_joints * 2) + kps[..., ::2] += xs.view(batch, K, 1).expand(batch, K, num_joints) + kps[..., 1::2] += ys.view(batch, K, 1).expand(batch, K, num_joints) + if reg is not None: + reg = _transpose_and_gather_feat(reg, inds) + reg = reg.view(batch, K, 2) + xs = xs.view(batch, K, 1) + reg[:, :, 0:1] + ys = ys.view(batch, K, 1) + reg[:, :, 1:2] + else: + xs = xs.view(batch, K, 1) + 0.5 + ys = ys.view(batch, K, 1) + 0.5 + wh = _transpose_and_gather_feat(wh, inds) + wh = wh.view(batch, K, 2) + clses = clses.view(batch, K, 1).float() + scores = scores.view(batch, K, 1) + + bboxes = torch.cat([xs - wh[..., 0:1] / 2, + ys - wh[..., 1:2] / 2, + xs + wh[..., 0:1] / 2, + ys + wh[..., 1:2] / 2], dim=2) + dim = _transpose_and_gather_feat(dim, inds) + dim = 
dim.view(batch, K, 3) + # dim[:, :, 0] = torch.exp(dim[:, :, 0]) * 1.63 + # dim[:, :, 1] = torch.exp(dim[:, :, 1]) * 1.53 + # dim[:, :, 2] = torch.exp(dim[:, :, 2]) * 3.88 + rot = _transpose_and_gather_feat(rot, inds) + rot = rot.view(batch, K, 8) + prob = _transpose_and_gather_feat(prob, inds)[:,:,0] + prob = prob.view(batch, K, 1) + if hm_hp is not None: + hm_hp = _nms(hm_hp) + thresh = 0.1 + kps = kps.view(batch, K, num_joints, 2).permute( + 0, 2, 1, 3).contiguous() # b x J x K x 2 + reg_kps = kps.unsqueeze(3).expand(batch, num_joints, K, K, 2) + hm_score, hm_inds, hm_ys, hm_xs = _topk_channel(hm_hp, K=K) # b x J x K + if hp_offset is not None: + hp_offset = _transpose_and_gather_feat( + hp_offset, hm_inds.view(batch, -1)) + hp_offset = hp_offset.view(batch, num_joints, K, 2) + hm_xs = hm_xs + hp_offset[:, :, :, 0] + hm_ys = hm_ys + hp_offset[:, :, :, 1] + else: + hm_xs = hm_xs + 0.5 + hm_ys = hm_ys + 0.5 + mask = (hm_score > thresh).float() + hm_score = (1 - mask) * -1 + mask * hm_score + hm_ys = (1 - mask) * (-10000) + mask * hm_ys + hm_xs = (1 - mask) * (-10000) + mask * hm_xs + hm_kps = torch.stack([hm_xs, hm_ys], dim=-1).unsqueeze( + 2).expand(batch, num_joints, K, K, 2) + dist = (((reg_kps - hm_kps) ** 2).sum(dim=4) ** 0.5) + min_dist, min_ind = dist.min(dim=3) # b x J x K + hm_score = hm_score.gather(2, min_ind).unsqueeze(-1) # b x J x K x 1 + min_dist = min_dist.unsqueeze(-1) + min_ind = min_ind.view(batch, num_joints, K, 1, 1).expand( + batch, num_joints, K, 1, 2) + hm_kps = hm_kps.gather(3, min_ind) + hm_kps = hm_kps.view(batch, num_joints, K, 2) + l = bboxes[:, :, 0].view(batch, 1, K, 1).expand(batch, num_joints, K, 1) + t = bboxes[:, :, 1].view(batch, 1, K, 1).expand(batch, num_joints, K, 1) + r = bboxes[:, :, 2].view(batch, 1, K, 1).expand(batch, num_joints, K, 1) + b = bboxes[:, :, 3].view(batch, 1, K, 1).expand(batch, num_joints, K, 1) + mask = (hm_kps[..., 0:1] < l) + (hm_kps[..., 0:1] > r) + \ + (hm_kps[..., 1:2] < t) + (hm_kps[..., 1:2] > b) + 
\ + (hm_score < thresh) + (min_dist > (torch.max(b - t, r - l) * 0.3)) + mask = (mask > 0).float().expand(batch, num_joints, K, 2) + kps = (1 - mask) * hm_kps + mask * kps + kps = kps.permute(0, 2, 1, 3).contiguous().view( + batch, K, num_joints * 2) + hm_score=hm_score.permute(0, 2, 1, 3).squeeze(3).contiguous() + else: + hm_score = kps.new_zeros([1, K, 9])# dets[mask, 26:35] + + kps *= 4 # restore back to scale 1 + bboxes *= 4 # restore back to scale 1 + + position,rot_y, alpha, kps_inv=gen_position(kps,dim,rot,meta,const) + + detections = torch.cat([bboxes, scores, kps_inv, dim,hm_score,rot_y, position,prob,clses, alpha], dim=2) + + return detections + + + def get_bboxes(self, output:dict, P2, img_batch=None): + output['hm'] = torch.sigmoid(output['hm']) + output['hm_hp'] = torch.sigmoid(output['hm_hp']) + reg = output['reg'] + hm_hp = output['hm_hp'] + hp_offset = output['hp_offset'] + dets = self._decode( + output['hm'], output['wh'], output['hps'], output['dim'], output['rot'], prob=output['prob'], reg=reg, hm_hp=hm_hp, hp_offset=hp_offset, K=100, const=self.const, meta=dict(calib=P2) + )[0] + + score_threshold = getattr(self.test_cfg, 'score_thr', 0.1) + mask = dets[:, 4] > score_threshold#[K] + bbox2d = dets[mask, 0:4] + scores = dets[mask, 4:5] #[K, 1] + kps_inv = dets[mask, 5:23] #[K, 18] + dims = dets[mask, 23:26] #[w, h, l] ? 
+ hm_score = dets[mask, 26:35] + rot_y = dets[mask, 35:36] + position = dets[mask, 36:39] + prob = dets[mask, 39:40] + cls_indexes = dets[mask, 40:41].long() + alpha = dets[mask, 41:42] + + ## Project the 3D center onto the image plane for final output + p2 = P2[0] #[3, 4] + fx = p2[0, 0] + fy = p2[1, 1] + cx = p2[0, 2] + cy = p2[1, 2] + tx = p2[0, 3] + ty = p2[1, 3] + z3d = position[:, 2:3] #[N, 1] + cx3d = (position[:, 0:1] * fx + tx + cx * z3d) / z3d + cy3d = (position[:, 1:2] * fy + ty + cy * z3d) / z3d + + if img_batch is not None: + bbox2d = self.clipper(bbox2d, img_batch) + + bbox3d_3d = torch.cat( + [bbox2d, cx3d, cy3d, z3d, dims, alpha], dim=1 #cx, cy, z, w, h, l, alpha + ) + + + cls_agnostic = getattr(self.test_cfg, 'cls_agnositc', True) # True -> directly NMS; False -> NMS with offsets, different categories will not collide + nms_iou_thr = getattr(self.test_cfg, 'nms_iou_thr', 0.5) + + + if cls_agnostic: + keep_inds = nms(bbox3d_3d[:, :4], scores[:, 0], nms_iou_thr) + else: + max_coordinate = bbox3d_3d.max() + nms_bbox = bbox3d_3d[:, :4] + cls_indexes.float() * (max_coordinate) + keep_inds = nms(nms_bbox, scores[:, 0], nms_iou_thr) + + scores = scores[keep_inds, 0] + bbox3d_3d = bbox3d_3d[keep_inds] + cls_indexes = cls_indexes[keep_inds] + + + return scores, bbox3d_3d, cls_indexes + + def loss(self, output, annotations, meta): + P2 = meta['P2'] + epoch = meta['epoch'] + + #output['hm'] = torch.sigmoid(output['hm']) + #output['hm_hp'] = torch.sigmoid(output['hm_hp']) + + hm_loss = self._neg_loss(output['hm'], annotations['hm']) + hp_loss = self._RegWeightedL1Loss(output['hps'],annotations['hps_mask'], annotations['ind'], annotations['hps'],annotations['dep']) + + wh_loss = self._RegL1Loss(output['wh'], annotations['reg_mask'],annotations['ind'], annotations['wh']) + dim_loss = self._RegL1Loss(output['dim'], annotations['reg_mask'],annotations['ind'], annotations['dim']) + + rot_loss = self._RotLoss(output['rot'], annotations['reg_mask'], annotations['ind'],
annotations['rotbin'], annotations['rotres']) + off_loss = self._RegL1Loss(output['reg'], annotations['reg_mask'], annotations['ind'], annotations['reg']) + + hp_offset_loss = self._RegL1Loss(output['hp_offset'], annotations['hp_mask'], annotations['hp_ind'], annotations['hp_offset']) + hm_hp_loss = self._neg_loss(output['hm_hp'], annotations['hm_hp']) + coor_loss, prob_loss, box_score = self.position_loss(output, annotations, P2) + + loss_stats = {'loss': box_score, 'hm_loss': hm_loss, 'hp_loss': hp_loss, + 'hm_hp_loss': hm_hp_loss, 'hp_offset_loss': hp_offset_loss, + 'wh_loss': wh_loss, 'off_loss': off_loss,'dim_loss': dim_loss, + 'rot_loss':rot_loss,'prob_loss':prob_loss,'box_score':box_score,'coor_loss':coor_loss} + + weight_dict = {'hm_loss': 1, 'hp_loss': 1, + 'hm_hp_loss': 1, 'hp_offset_loss': 1, + 'wh_loss': 0.1, 'off_loss': 1, 'dim_loss': 2, + 'rot_loss': 0.2, 'prob_loss': self.exp_rampup(epoch), 'coor_loss': self.exp_rampup(epoch)} + + loss = 0 + for key, weight in weight_dict.items(): + if key in loss_stats: + loss = loss + loss_stats[key] * weight + loss_stats['total_loss'] = loss + return loss, loss_stats + + def forward(self, x): + ret = {} + for head in self.head_layers: + ret[head] = self.head_layers[head](x) + return ret diff --git a/visualDet3D/networks/heads/losses.py b/visualDet3D/networks/heads/losses.py index e68a846..d57ec05 100644 --- a/visualDet3D/networks/heads/losses.py +++ b/visualDet3D/networks/heads/losses.py @@ -4,6 +4,7 @@ import torch import torch.nn as nn from visualDet3D.networks.utils.utils import calc_iou +from visualDet3D.networks.lib.disparity_loss import stereo_focal_loss from visualDet3D.utils.timer import profile @@ -116,4 +117,19 @@ def forward(self, preds:torch.Tensor, targets:torch.Tensor, eps:float=1e-8) -> t # IoU ious = overlap / union ious = torch.clamp(ious, min=eps) - return -ious.log() \ No newline at end of file + return -ious.log() + +class DisparityLoss(nn.Module): + """Some Information about DisparityLoss""" + 
def __init__(self, maxdisp:int=64): + super(DisparityLoss, self).__init__() + #self.register_buffer("disp",torch.Tensor(np.reshape(np.array(range(maxdisp)),[1,maxdisp,1,1]))) + self.criterion = stereo_focal_loss.StereoFocalLoss(maxdisp) + + def forward(self, x:torch.Tensor, label:torch.Tensor)->torch.Tensor: + #x = torch.softmax(x, dim=1) + label = label.cuda().unsqueeze(1) + loss = self.criterion(x, label, variance=0.5) + #mask = (label > 0) * (label < 64) + #loss = nn.functional.smooth_l1_loss(disp[mask], label[mask]) + return loss \ No newline at end of file diff --git a/visualDet3D/networks/lib/PSM_cost_volume.py b/visualDet3D/networks/lib/PSM_cost_volume.py new file mode 100644 index 0000000..4831c5c --- /dev/null +++ b/visualDet3D/networks/lib/PSM_cost_volume.py @@ -0,0 +1,130 @@ +""" + This script implements cost_volume module in the PSM networks +""" + + +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +from torch.autograd import Variable +from visualDet3D.utils.timer import profile +def make_grid(grid_shape): + + #grid: (y, x, z) + grid_1ds = [torch.arange(-1, 1, 2.0/shape) for shape in grid_shape] + grids = torch.meshgrid(grid_1ds) + return grids + +class CostVolume(nn.Module): + """ + While PSV module define depth dimension similar to the depth in real world + + Cost Volume implementation in PSM network and its prior networks define this directly as disparity + """ + def __init__(self, max_disp=192, downsample_scale=4, input_features=1024, PSM_features=64): + super(CostVolume, self).__init__() + self.max_disp = max_disp + self.downsample_scale = downsample_scale + self.depth_channel = int(self.max_disp / self.downsample_scale) + self.down_sample = nn.Sequential( + nn.Conv2d(input_features, PSM_features, 1), + nn.BatchNorm2d(PSM_features), + nn.ReLU(), + ) + self.conv3d = nn.Sequential( + nn.Conv3d(2 * PSM_features, PSM_features, 3, padding=1), + nn.BatchNorm3d(PSM_features), + nn.ReLU(), + 
nn.Conv3d(PSM_features, PSM_features, 3, padding=1), + nn.BatchNorm3d(PSM_features), + nn.ReLU(), + ) + self.output_channel = PSM_features * self.depth_channel + @profile("Cost Volume", 1, 10) + def forward(self, left_features, right_features): + batch_size, _, w, h = left_features.shape + left_features = self.down_sample(left_features) + right_features = self.down_sample(right_features) + cost = Variable( + torch.FloatTensor(left_features.size()[0], + left_features.size()[1]*2, + self.depth_channel, + left_features.size()[2], + left_features.size()[3]).zero_(), + volatile= not self.training + ).cuda() + + for i in range(self.depth_channel): + if i > 0 : + cost[:, :left_features.size()[1], i, :,i:] = left_features[:,:,:,i:] + cost[:, left_features.size()[1]:, i, :,i:] = right_features[:,:,:,:-i] + else: + cost[:, :left_features.size()[1], i, :,:] = left_features + cost[:, left_features.size()[1]:, i, :,:] = right_features + cost = cost.contiguous() + cost = self.conv3d(cost) # .squeeze(1) + cost = cost.reshape(batch_size, -1, w, h).contiguous() + return cost + + +class PSMCosineModule(nn.Module): + """Some Information about PSMCosineModule""" + def __init__(self, max_disp=192, downsample_scale=4, input_features=512): + super(PSMCosineModule, self).__init__() + self.max_disp = max_disp + self.downsample_scale = downsample_scale + self.depth_channel = int(self.max_disp / self.downsample_scale) + #self.distance_function = nn.CosineSimilarity(dim=1) + + @profile("PSM Cos Volume", 1, 20) + def forward(self, left_features, right_features): + cost = Variable( + torch.FloatTensor(left_features.size()[0], + self.depth_channel, + left_features.size()[2], + left_features.size()[3]).zero_(), + volatile= not self.training + ).cuda() + + for i in range(self.depth_channel): + if i > 0 : + cost[:, i, :,i:] = (left_features[:,:,:,i:] * right_features[:,:,:,:-i]).mean(dim=1) + else: + cost[:, i, :, :] = (left_features * right_features).mean(dim=1) + cost = cost.contiguous() + return 
cost + +class DoublePSMCosineModule(PSMCosineModule): + """Some Information about DoublePSMCosineModule""" + def __init__(self, max_disp=192, downsample_scale=4): + super(DoublePSMCosineModule, self).__init__(max_disp=max_disp, downsample_scale=downsample_scale) + self.depth_channel = self.depth_channel + + def forward(self, left_features, right_features): + b, c, h, w = left_features.shape + base_grid_y, base_grid_x = make_grid(right_features.shape[2:]) #[h, w] + base_grid_x = base_grid_x - 1.0 / right_features.shape[1] + shifted_grid = torch.stack([base_grid_y, base_grid_x], dim=-1).cuda().unsqueeze(0).repeat(b, 1, 1, 1) + right_features_shifted = F.grid_sample(right_features, shifted_grid) + cost_1 = super(DoublePSMCosineModule, self).forward(left_features, right_features) + cost_2 = super(DoublePSMCosineModule, self).forward(left_features, right_features_shifted) + return torch.cat([cost_1, cost_2], dim=1) + + + +if __name__ == "__main__": + model = DoublePSMCosineModule(max_disp=192, downsample_scale=16).cuda() + left_feature = torch.randn(2, 128, 12, 56, requires_grad=True, device="cuda:0") + right_feature = torch.randn(2, 128, 12, 56, requires_grad=True, device="cuda:0") + output = model(left_feature, right_feature) # forward takes only the two feature maps + mean_1 = output.mean() + mean_1.backward() + print(left_feature.grad.std()) + print(model.depth_channel) + print(output.shape) + import time + start = time.time() + for _ in range(10): + output = model(left_feature, right_feature) + print(time.time() - start) diff --git a/visualDet3D/networks/lib/disparity_loss/__init__.py b/visualDet3D/networks/lib/disparity_loss/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/visualDet3D/networks/lib/disparity_loss/disp2prob.py b/visualDet3D/networks/lib/disparity_loss/disp2prob.py new file mode 100644 index 0000000..2e677ef --- /dev/null +++ b/visualDet3D/networks/lib/disparity_loss/disp2prob.py @@ -0,0 +1,142 @@ +import warnings + +import torch +import torch.nn.functional
as F + + +def isNaN(x): + return x != x + + +class Disp2Prob(object): + """ + Convert disparity map to matching probability volume + Args: + maxDisp, (int): the maximum of disparity + gtDisp, (torch.Tensor): in (..., Height, Width) layout + start_disp (int): the start searching disparity index, usually be 0 + dilation (int): the step between near disparity index + + Outputs: + probability, (torch.Tensor): in [BatchSize, maxDisp, Height, Width] layout + + + """ + def __init__(self, maxDisp:int, gtDisp:torch.Tensor, start_disp:int=0, dilation:int=1): + + if not isinstance(maxDisp, int): + raise TypeError('int is expected, got {}'.format(type(maxDisp))) + + if not torch.is_tensor(gtDisp): + raise TypeError('tensor is expected, got {}'.format(type(gtDisp))) + + if not isinstance(start_disp, int): + raise TypeError('int is expected, got {}'.format(type(start_disp))) + + if not isinstance(dilation, int): + raise TypeError('int is expected, got {}'.format(type(dilation))) + + if gtDisp.dim() == 2: # single image H x W + gtDisp = gtDisp.view(1, 1, gtDisp.size(0), gtDisp.size(1)) + + if gtDisp.dim() == 3: # multi image B x H x W + gtDisp = gtDisp.view(gtDisp.size(0), 1, gtDisp.size(1), gtDisp.size(2)) + + if gtDisp.dim() == 4: + if gtDisp.size(1) == 1: # mult image B x 1 x H x W + gtDisp = gtDisp + else: + raise ValueError('2nd dimension size should be 1, got {}'.format(gtDisp.size(1))) + + self.gtDisp = gtDisp + self.maxDisp = maxDisp + self.start_disp = start_disp + self.dilation = dilation + self.end_disp = start_disp + maxDisp - 1 + self.disp_sample_number = (maxDisp + dilation -1) // dilation + self.eps = 1e-40 + + def getProb(self): + # [BatchSize, 1, Height, Width] + b, c, h, w = self.gtDisp.shape + assert c == 1 + + # if start_disp = 0, dilation = 1, then generate disparity candidates as [0, 1, 2, ... 
, maxDisp-1] + index = torch.linspace(self.start_disp, self.end_disp, self.disp_sample_number) + index = index.to(self.gtDisp.device) + + # [BatchSize, maxDisp, Height, Width] + self.index = index.repeat(b, h, w, 1).permute(0, 3, 1, 2).contiguous() + + # the gtDisp must be (start_disp, end_disp), otherwise, we have to mask it out + mask = (self.gtDisp > self.start_disp) & (self.gtDisp < self.end_disp) + mask = mask.detach().type_as(self.gtDisp) + self.gtDisp = self.gtDisp * mask + + probability = self.calProb() + + # let the outliers' probability to be 0 + # in case divide or log 0, we plus a tiny constant value + probability = probability * mask + self.eps + + # in case probability is NaN + if isNaN(probability.min()) or isNaN(probability.max()): + print('Probability ==> min: {}, max: {}'.format(probability.min(), probability.max())) + print('Disparity Ground Truth after mask out ==> min: {}, max: {}'.format(self.gtDisp.min(), + self.gtDisp.max())) + raise ValueError(" \'probability contains NaN!") + + return probability + + def kick_invalid_half(self): + distance = self.gtDisp - self.index + invalid_index = distance < 0 + # after softmax, the valid index with value 1e6 will approximately get 0 + distance[invalid_index] = 1e6 + return distance + + def calProb(self): + raise NotImplementedError + + +class LaplaceDisp2Prob(Disp2Prob): + # variance is the diversity of the Laplace distribution + def __init__(self, maxDisp, gtDisp, variance=1, start_disp=0, dilation=1): + super(LaplaceDisp2Prob, self).__init__(maxDisp, gtDisp, start_disp, dilation) + self.variance = variance + + def calProb(self): + # 1/N * exp( - (d - d{gt}) / var), N is normalization factor, [BatchSize, maxDisp, Height, Width] + scaled_distance = ((-torch.abs(self.index - self.gtDisp)) / self.variance) + probability = F.softmax(scaled_distance, dim=1) + + return probability + + +class GaussianDisp2Prob(Disp2Prob): + # variance is the variance of the Gaussian distribution + def __init__(self, maxDisp, 
gtDisp, variance=1, start_disp=0, dilation=1): + super(GaussianDisp2Prob, self).__init__(maxDisp, gtDisp, start_disp, dilation) + self.variance = variance + + def calProb(self): + # 1/N * exp( - (d - d{gt})^2 / b), N is normalization factor, [BatchSize, maxDisp, Height, Width] + distance = (torch.abs(self.index - self.gtDisp)) + scaled_distance = (- distance.pow(2.0) / self.variance) + probability = F.softmax(scaled_distance, dim=1) + + return probability + +class OneHotDisp2Prob(Disp2Prob): + # variance is the variance of the OneHot distribution + def __init__(self, maxDisp, gtDisp, variance=1, start_disp=0, dilation=1): + super(OneHotDisp2Prob, self).__init__(maxDisp, gtDisp, start_disp, dilation) + self.variance = variance + + def getProb(self): + + # |d - d{gt}| < variance, [BatchSize, maxDisp, Height, Width] + probability = torch.lt(torch.abs(self.index - self.gtDisp), self.variance).type_as(self.gtDisp) + + return probability + diff --git a/visualDet3D/networks/lib/disparity_loss/stereo_focal_loss.py b/visualDet3D/networks/lib/disparity_loss/stereo_focal_loss.py new file mode 100644 index 0000000..6a11297 --- /dev/null +++ b/visualDet3D/networks/lib/disparity_loss/stereo_focal_loss.py @@ -0,0 +1,121 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +from .disp2prob import LaplaceDisp2Prob, GaussianDisp2Prob, OneHotDisp2Prob + +class StereoFocalLoss(object): + """ + Under the same start disparity and maximum disparity, calculating all estimated cost volumes' loss + Args: + max_disp, (int): the max of Disparity. default: 192 + start_disp, (int): the start searching disparity index, usually be 0 + dilation (int): the step between near disparity index, it mainly used in gt probability volume generation + weights, (list of float or None): weight for each scale of estCost. + focal_coefficient, (float): stereo focal loss coefficient, details please refer to paper. 
default: 0.0 + sparse, (bool): whether the ground-truth disparity is sparse, for example, KITTI is sparse, but SceneFlow is not. default: False + + Inputs: + estCost, (Tensor or list of Tensor): the estimated cost volume, in (BatchSize, max_disp, Height, Width) layout + gtDisp, (Tensor): the ground truth disparity map, in (BatchSize, 1, Height, Width) layout. + variance, (Tensor or list of Tensor): the variance of distribution, details please refer to paper, in (BatchSize, 1, Height, Width) layout. + + Outputs: + loss, (dict), the loss of each level + + ..Note: + Before calculate loss, the estCost shouldn't be normalized, + because we will use softmax for normalization + """ + + def __init__(self, max_disp=192, start_disp=0, dilation=1, weights=None, focal_coefficient=0.0, sparse=False): + self.max_disp = max_disp + self.start_disp = start_disp + self.dilation = dilation + self.weights = weights + self.focal_coefficient = focal_coefficient + self.sparse = sparse + if sparse: + # sparse disparity ==> max_pooling + self.scale_func = F.adaptive_max_pool2d + else: + # dense disparity ==> avg_pooling + self.scale_func = F.adaptive_avg_pool2d + + def loss_per_level(self, estCost, gtDisp, variance, dilation): + N, C, H, W = estCost.shape + scaled_gtDisp = gtDisp.clone() + scale = 1.0 + if gtDisp.shape[-2] != H or gtDisp.shape[-1] != W: + # compute scale per level and scale gtDisp + scale = gtDisp.shape[-1] / (W * 1.0) + scaled_gtDisp = gtDisp.clone() / scale + + scaled_gtDisp = self.scale_func(scaled_gtDisp, (H, W)) + + # mask for valid disparity + # (start_disp, max disparity / scale) + # Attention: the invalid disparity of KITTI is set as 0, be sure to mask it out + lower_bound = self.start_disp + upper_bound = lower_bound + int(self.max_disp/scale) + mask = (scaled_gtDisp > lower_bound) & (scaled_gtDisp < upper_bound) + mask = mask.detach_().type_as(scaled_gtDisp) + if mask.sum() < 1.0: + print('Stereo focal loss: there is no point\'s ' + 'disparity is in 
[{},{})!'.format(lower_bound, upper_bound)) + scaled_gtProb = torch.zeros_like(estCost) # let this sample have loss with 0 + else: + # transfer disparity map to probability map + mask_scaled_gtDisp = scaled_gtDisp * mask + scaled_gtProb = LaplaceDisp2Prob(int(self.max_disp/scale), mask_scaled_gtDisp, variance=variance, + start_disp=self.start_disp, dilation=dilation).getProb() + + # stereo focal loss + estProb = F.log_softmax(estCost, dim=1) + weight = (1.0 - scaled_gtProb).pow(-self.focal_coefficient).type_as(scaled_gtProb) + loss = -((scaled_gtProb * estProb) * weight * mask.float()).sum(dim=1, keepdim=True).mean() + + return loss + + def __call__(self, estCost, gtDisp, variance): + if not isinstance(estCost, (list, tuple)): + estCost = [estCost] + + if self.weights is None: + self.weights = 1.0 + + if not isinstance(self.weights, (list, tuple)): + self.weights = [self.weights] * len(estCost) + + if not isinstance(self.dilation, (list, tuple)): + self.dilation = [self.dilation] * len(estCost) + + if not isinstance(variance, (list, tuple)): + variance = [variance] * len(estCost) + + # compute loss for per level + loss_all_level = [] + for est_cost_per_lvl, var, dt in zip(estCost, variance, self.dilation): + loss_all_level.append( + self.loss_per_level(est_cost_per_lvl, gtDisp, var, dt)) + + # re-weight loss per level + loss = 0 + for i, loss_per_level in enumerate(loss_all_level): + loss += self.weights[i] * loss_per_level + + return loss + + def __repr__(self): + repr_str = '{}\n'.format(self.__class__.__name__) + repr_str += ' ' * 4 + 'Max Disparity: {}\n'.format(self.max_disp) + repr_str += ' ' * 4 + 'Start disparity: {}\n'.format(self.start_disp) + repr_str += ' ' * 4 + 'Dilation rate: {}\n'.format(self.dilation) + repr_str += ' ' * 4 + 'Loss weight: {}\n'.format(self.weights) + repr_str += ' ' * 4 + 'Focal coefficient: {}\n'.format(self.focal_coefficient) + repr_str += ' ' * 4 + 'Disparity is sparse: {}\n'.format(self.sparse) + + return repr_str + + @property 
+ def name(self): + return 'StereoFocalLoss' diff --git a/visualDet3D/networks/lib/fast_utils/hill_climbing.py b/visualDet3D/networks/lib/fast_utils/hill_climbing.py index 9d46a46..22f3d5d 100644 --- a/visualDet3D/networks/lib/fast_utils/hill_climbing.py +++ b/visualDet3D/networks/lib/fast_utils/hill_climbing.py @@ -14,12 +14,12 @@ def post_opt(bbox_2d, bbox3d_state_3d, P2, cx, cy): box_2d = bbox_2d.detach().cpu().numpy() state = bbox3d_state_3d.detach().cpu().numpy() x, y, z, w, h, l, alpha = state[0], state[1], state[2],state[3],state[4],state[5], state[6] - theta = convertAlpha2Rot([alpha], z, x)[0] + theta = convertAlpha2Rot(np.array([alpha]), cx, P2)[0] theta, ratio, w, h, l = post_optimization(p2, p2_inv, box_2d, cx, cy, z, w, h, l, theta, step_r_init=0.4, r_lim=0.01) z = z*ratio - alpha = convertRot2Alpha([theta], z, x)[0] + alpha = convertRot2Alpha(np.array([theta]), cx, P2)[0] return bbox3d_state_3d.new([cx,cy,z, w,h,l,alpha]) @jit(nopython=True, cache=True) diff --git a/visualDet3D/networks/lib/ghost_module.py b/visualDet3D/networks/lib/ghost_module.py new file mode 100644 index 0000000..c95089e --- /dev/null +++ b/visualDet3D/networks/lib/ghost_module.py @@ -0,0 +1,64 @@ +""" + This script implement ghost module from + "GhostNet: More Features from Cheap Operations" + https://arxiv.org/pdf/1911.11907.pdf + Introduction in: + https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/Building_Blocks/GhostNet/ +""" + +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +import math + + +class GhostModule(nn.Module): + """ + Ghost Module from https://github.com/iamhankai/ghostnet.pytorch. 
+ + """ + def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True): + super(GhostModule, self).__init__() + self.oup = oup + init_channels = math.ceil(oup / ratio) + new_channels = init_channels*(ratio-1) + + self.primary_conv = nn.Sequential( + nn.AvgPool2d(stride) if stride > 1 else nn.Sequential(), + nn.Conv2d(inp, init_channels, kernel_size, 1, kernel_size//2, bias=False), + nn.BatchNorm2d(init_channels), + nn.ReLU(inplace=True) if relu else nn.Sequential(), + ) + + self.cheap_operation = nn.Sequential( + nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False), + nn.BatchNorm2d(new_channels), + nn.ReLU(inplace=True) if relu else nn.Sequential(), + ) + + def forward(self, x): + x1 = self.primary_conv(x) + x2 = self.cheap_operation(x1) + out = torch.cat([x1,x2], dim=1) + return out[:,:self.oup,:,:] + +class ResGhostModule(GhostModule): + """Some Information about ResGhostModule""" + def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, relu=True, stride=1): + assert(ratio > 2) + super(ResGhostModule, self).__init__(inp, oup-inp, kernel_size, ratio-1, dw_size, relu=relu, stride=stride) + self.oup = oup + if stride > 1: + self.downsampling = nn.AvgPool2d(kernel_size=stride, stride=stride) + else: + self.downsampling = None + + def forward(self, x): + x1 = self.primary_conv(x) + x2 = self.cheap_operation(x1) + + if not self.downsampling is None: + x = self.downsampling(x) + out = torch.cat([x, x1, x2], dim=1) + return out[:,:self.oup,:,:] diff --git a/visualDet3D/networks/lib/ops/iou3d/__init__.py b/visualDet3D/networks/lib/ops/iou3d/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/visualDet3D/networks/lib/ops/iou3d/iou3d.py b/visualDet3D/networks/lib/ops/iou3d/iou3d.py new file mode 100644 index 0000000..b9bcea5 --- /dev/null +++ b/visualDet3D/networks/lib/ops/iou3d/iou3d.py @@ -0,0 +1,103 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F 
+import torch.optim as optim +from .iou3d_cuda import boxes_iou_bev_gpu, boxes_overlap_bev_gpu, nms_normal_gpu, nms_gpu + + +def boxes3d_to_bev_torch(boxes3d): + """ + :param boxes3d: (N, 7) [x, y, z, h, w, l, ry] + :return: + boxes_bev: (N, 5) [x1, y1, x2, y2, ry] + """ + boxes_bev = boxes3d.new(torch.Size((boxes3d.shape[0], 5))) + + cu, cv = boxes3d[:, 0], boxes3d[:, 2] + half_l, half_w = boxes3d[:, 5] / 2, boxes3d[:, 4] / 2 + boxes_bev[:, 0], boxes_bev[:, 1] = cu - half_l, cv - half_w + boxes_bev[:, 2], boxes_bev[:, 3] = cu + half_l, cv + half_w + boxes_bev[:, 4] = boxes3d[:, 6] + return boxes_bev + +def boxes_iou_bev(boxes_a, boxes_b): + """ + :param boxes_a: (M, 5) + :param boxes_b: (N, 5) + :return: + ans_iou: (M, N) + """ + + ans_iou = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], boxes_b.shape[0]))).zero_() + + boxes_iou_bev_gpu(boxes_a.contiguous(), boxes_b.contiguous(), ans_iou) + + return ans_iou + +def boxes_iou3d_gpu(boxes_a, boxes_b): + """ + :param boxes_a: (N, 7) [x, y, z, h, w, l, ry] + :param boxes_b: (M, 7) [x, y, z, h, w, l, ry] + :return: + ans_iou: (M, N) + """ + boxes_a_bev = boxes3d_to_bev_torch(boxes_a) + boxes_b_bev = boxes3d_to_bev_torch(boxes_b) + + # bev overlap + overlaps_bev = torch.cuda.FloatTensor(torch.Size((boxes_a.shape[0], boxes_b.shape[0]))).zero_() # (N, M) + boxes_overlap_bev_gpu(boxes_a_bev.contiguous(), boxes_b_bev.contiguous(), overlaps_bev) + + # height overlap + boxes_a_height_min = (boxes_a[:, 1] - boxes_a[:, 3]).view(-1, 1) + boxes_a_height_max = boxes_a[:, 1].view(-1, 1) + boxes_b_height_min = (boxes_b[:, 1] - boxes_b[:, 3]).view(1, -1) + boxes_b_height_max = boxes_b[:, 1].view(1, -1) + + max_of_min = torch.max(boxes_a_height_min, boxes_b_height_min) + min_of_max = torch.min(boxes_a_height_max, boxes_b_height_max) + overlaps_h = torch.clamp(min_of_max - max_of_min, min=0) + + # 3d iou + overlaps_3d = overlaps_bev * overlaps_h + + vol_a = (boxes_a[:, 3] * boxes_a[:, 4] * boxes_a[:, 5]).view(-1, 1) + vol_b = 
(boxes_b[:, 3] * boxes_b[:, 4] * boxes_b[:, 5]).view(1, -1) + + iou3d = overlaps_3d / torch.clamp(vol_a + vol_b - overlaps_3d, min=1e-7) + + return iou3d + + +def nms_gpu(boxes, scores, thresh): + """ + :param boxes: (N, 5) [x1, y1, x2, y2, ry] + :param scores: (N) + :param thresh: + :return: + """ + # areas = (x2 - x1) * (y2 - y1) + order = scores.sort(0, descending=True)[1] + + boxes = boxes[order].contiguous() + + keep = torch.LongTensor(boxes.size(0)) + num_out = nms_gpu(boxes, keep, thresh) + return order[keep[:num_out].cuda()].contiguous() + + +def nms_normal_gpu(boxes, scores, thresh): + """ + :param boxes: (N, 5) [x1, y1, x2, y2, ry] + :param scores: (N) + :param thresh: + :return: + """ + # areas = (x2 - x1) * (y2 - y1) + order = scores.sort(0, descending=True)[1] + + boxes = boxes[order].contiguous() + + keep = torch.LongTensor(boxes.size(0)) + num_out = nms_normal_gpu(boxes, keep, thresh) + return order[keep[:num_out].cuda()].contiguous() \ No newline at end of file diff --git a/visualDet3D/networks/lib/ops/iou3d/make.sh b/visualDet3D/networks/lib/ops/iou3d/make.sh new file mode 100644 index 0000000..2fe7a5c --- /dev/null +++ b/visualDet3D/networks/lib/ops/iou3d/make.sh @@ -0,0 +1,8 @@ +export CUDA_HOME=/usr/local/cuda +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda +export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME +export LD_LIBRARY_PATH="$CUDA_HOME/extras/CUPTI/lib64:$LD_LIBRARY_PATH" +export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH +export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH +export CFLAGS="-fopenmp -I$CUDA_HOME/include $CFLAGS" +python3 setup.py build_ext --inplace diff --git a/visualDet3D/networks/lib/ops/iou3d/setup.py b/visualDet3D/networks/lib/ops/iou3d/setup.py new file mode 100644 index 0000000..0b988eb --- /dev/null +++ b/visualDet3D/networks/lib/ops/iou3d/setup.py @@ -0,0 +1,197 @@ +#!/usr/bin/env python +import os +import subprocess +import time +from setuptools import find_packages, setup + +import torch +from 
torch.utils.cpp_extension import (BuildExtension, CppExtension, + CUDAExtension) + + +def readme(): + with open('README.md', encoding='utf-8') as f: + content = f.read() + return content + + +def get_git_hash(): + + def _minimal_ext_cmd(cmd): + # construct minimal environment + env = {} + for k in ['SYSTEMROOT', 'PATH', 'HOME']: + v = os.environ.get(k) + if v is not None: + env[k] = v + # LANGUAGE is used on win32 + env['LANGUAGE'] = 'C' + env['LANG'] = 'C' + env['LC_ALL'] = 'C' + out = subprocess.Popen( + cmd, stdout=subprocess.PIPE, env=env).communicate()[0] + return out + + try: + out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD']) + sha = out.strip().decode('ascii') + except OSError: + sha = 'unknown' + + return sha + + +def get_hash(): + if os.path.exists('.git'): + sha = get_git_hash()[:7] + elif os.path.exists(version_file): + try: + from mmdet.version import __version__ + sha = __version__.split('+')[-1] + except ImportError: + raise ImportError('Unable to get git version') + else: + sha = 'unknown' + + return sha + + + +def get_version(): + with open(version_file, 'r') as f: + exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +def make_cuda_ext(name, module, sources, sources_cuda=[]): + + define_macros = [] + extra_compile_args = {'cxx': []} + + if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1': + define_macros += [('WITH_CUDA', None)] + extension = CUDAExtension + extra_compile_args['nvcc'] = [ + '-D__CUDA_NO_HALF_OPERATORS__', + '-D__CUDA_NO_HALF_CONVERSIONS__', + '-D__CUDA_NO_HALF2_OPERATORS__', + ] + sources += sources_cuda + else: + print(f'Compiling {name} without CUDA') + extension = CppExtension + # raise EnvironmentError('CUDA is required to compile MMDetection!') + + return extension( + name=f'{module}.{name}', + sources=[os.path.join(*module.split('.'), p) for p in sources], + define_macros=define_macros, + extra_compile_args=extra_compile_args) + + +def 
parse_requirements(fname='requirements.txt', with_version=True): + """ + Parse the package dependencies listed in a requirements file but strips + specific versioning information. + + Args: + fname (str): path to requirements file + with_version (bool, default=False): if True include version specs + + Returns: + List[str]: list of requirements items + + CommandLine: + python -c "import setup; print(setup.parse_requirements())" + """ + import sys + from os.path import exists + import re + require_fpath = fname + + def parse_line(line): + """ + Parse information from a line in a requirements text file + """ + if line.startswith('-r '): + # Allow specifying requirements in other files + target = line.split(' ')[1] + for info in parse_require_file(target): + yield info + else: + info = {'line': line} + if line.startswith('-e '): + info['package'] = line.split('#egg=')[1] + else: + # Remove versioning from the package + pat = '(' + '|'.join(['>=', '==', '>']) + ')' + parts = re.split(pat, line, maxsplit=1) + parts = [p.strip() for p in parts] + + info['package'] = parts[0] + if len(parts) > 1: + op, rest = parts[1:] + if ';' in rest: + # Handle platform specific dependencies + # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies + version, platform_deps = map(str.strip, + rest.split(';')) + info['platform_deps'] = platform_deps + else: + version = rest # NOQA + info['version'] = (op, version) + yield info + + def parse_require_file(fpath): + with open(fpath, 'r') as f: + for line in f.readlines(): + line = line.strip() + if line and not line.startswith('#'): + for info in parse_line(line): + yield info + + def gen_packages_items(): + if exists(require_fpath): + for info in parse_require_file(require_fpath): + parts = [info['package']] + if with_version and 'version' in info: + parts.extend(info['version']) + if not sys.version.startswith('3.4'): + # apparently package_deps are broken in 3.4 + platform_deps = 
info.get('platform_deps') + if platform_deps is not None: + parts.append(';' + platform_deps) + item = ''.join(parts) + yield item + + packages = list(gen_packages_items()) + return packages + + +if __name__ == '__main__': + setup( + name='iou3d_cuda', + description='iou3d cuda', + keywords='computer vision, object detection', + packages=find_packages(exclude=('configs', 'tools', 'demo')), + classifiers=[ + 'Development Status :: 4 - Beta', + 'License :: OSI Approved :: Apache Software License', + 'Operating System :: OS Independent', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.5', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + ], + license='Apache License 2.0', + ext_modules=[ + make_cuda_ext( + name='iou3d_cuda', + module='.', + sources=['src/iou3d.cpp'], + sources_cuda=[ + 'src/iou3d_kernel.cu' + ]), + ], + cmdclass={'build_ext': BuildExtension}, + zip_safe=False) diff --git a/visualDet3D/networks/lib/ops/iou3d/src/iou3d.cpp b/visualDet3D/networks/lib/ops/iou3d/src/iou3d.cpp new file mode 100644 index 0000000..ee11ce1 --- /dev/null +++ b/visualDet3D/networks/lib/ops/iou3d/src/iou3d.cpp @@ -0,0 +1,180 @@ +#include +#include +#include +#include +#include + +#define CHECK_CUDA(x) TORCH_CHECK(x.device().is_cuda(), #x, " must be a CUDAtensor ") +#define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x, " must be contiguous ") +#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x) + +#define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) + +#define CHECK_ERROR(ans) { gpuAssert((ans), __FILE__, __LINE__); } +inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true) +{ + if (code != cudaSuccess) + { + fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line); + if (abort) exit(code); + } +} + +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; + + +void boxesoverlapLauncher(const int num_a, const float *boxes_a, const int num_b, 
const float *boxes_b, float *ans_overlap); +void boxesioubevLauncher(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_iou); +void nmsLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh); +void nmsNormalLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh); + +int boxes_overlap_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_overlap){ + // params boxes_a: (N, 5) [x1, y1, x2, y2, ry] + // params boxes_b: (M, 5) + // params ans_overlap: (N, M) + + CHECK_INPUT(boxes_a); + CHECK_INPUT(boxes_b); + CHECK_INPUT(ans_overlap); + + int num_a = boxes_a.size(0); + int num_b = boxes_b.size(0); + + const float * boxes_a_data = boxes_a.data_ptr<float>(); + const float * boxes_b_data = boxes_b.data_ptr<float>(); + float * ans_overlap_data = ans_overlap.data_ptr<float>(); + + boxesoverlapLauncher(num_a, boxes_a_data, num_b, boxes_b_data, ans_overlap_data); + + return 1; +} + +int boxes_iou_bev_gpu(at::Tensor boxes_a, at::Tensor boxes_b, at::Tensor ans_iou){ + // params boxes_a: (N, 5) [x1, y1, x2, y2, ry] + // params boxes_b: (M, 5) + // params ans_overlap: (N, M) + + CHECK_INPUT(boxes_a); + CHECK_INPUT(boxes_b); + CHECK_INPUT(ans_iou); + + int num_a = boxes_a.size(0); + int num_b = boxes_b.size(0); + + const float * boxes_a_data = boxes_a.data_ptr<float>(); + const float * boxes_b_data = boxes_b.data_ptr<float>(); + float * ans_iou_data = ans_iou.data_ptr<float>(); + + boxesioubevLauncher(num_a, boxes_a_data, num_b, boxes_b_data, ans_iou_data); + + return 1; +} + +int nms_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ + // params boxes: (N, 5) [x1, y1, x2, y2, ry] + // params keep: (N) + + CHECK_INPUT(boxes); + CHECK_CONTIGUOUS(keep); + + int boxes_num = boxes.size(0); + const float * boxes_data = boxes.data_ptr<float>(); + long * keep_data = keep.data_ptr<long>(); + + const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); + + unsigned long long *mask_data = NULL; +
CHECK_ERROR(cudaMalloc((void**)&mask_data, boxes_num * col_blocks * sizeof(unsigned long long))); + nmsLauncher(boxes_data, mask_data, boxes_num, nms_overlap_thresh); + + // unsigned long long mask_cpu[boxes_num * col_blocks]; + // unsigned long long *mask_cpu = new unsigned long long [boxes_num * col_blocks]; + std::vector<unsigned long long> mask_cpu(boxes_num * col_blocks); + +// printf("boxes_num=%d, col_blocks=%d\n", boxes_num, col_blocks); + CHECK_ERROR(cudaMemcpy(&mask_cpu[0], mask_data, boxes_num * col_blocks * sizeof(unsigned long long), + cudaMemcpyDeviceToHost)); + + cudaFree(mask_data); + + unsigned long long remv_cpu[col_blocks]; + memset(remv_cpu, 0, col_blocks * sizeof(unsigned long long)); + + int num_to_keep = 0; + + for (int i = 0; i < boxes_num; i++){ + int nblock = i / THREADS_PER_BLOCK_NMS; + int inblock = i % THREADS_PER_BLOCK_NMS; + + if (!(remv_cpu[nblock] & (1ULL << inblock))){ + keep_data[num_to_keep++] = i; + unsigned long long *p = &mask_cpu[0] + i * col_blocks; + for (int j = nblock; j < col_blocks; j++){ + remv_cpu[j] |= p[j]; + } + } + } + if ( cudaSuccess != cudaGetLastError() ) printf( "Error!\n" ); + + return num_to_keep; +} + + +int nms_normal_gpu(at::Tensor boxes, at::Tensor keep, float nms_overlap_thresh){ + // params boxes: (N, 5) [x1, y1, x2, y2, ry] + // params keep: (N) + + CHECK_INPUT(boxes); + CHECK_CONTIGUOUS(keep); + + int boxes_num = boxes.size(0); + const float * boxes_data = boxes.data_ptr<float>(); + long * keep_data = keep.data_ptr<long>(); + + const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); + + unsigned long long *mask_data = NULL; + CHECK_ERROR(cudaMalloc((void**)&mask_data, boxes_num * col_blocks * sizeof(unsigned long long))); + nmsNormalLauncher(boxes_data, mask_data, boxes_num, nms_overlap_thresh); + + // unsigned long long mask_cpu[boxes_num * col_blocks]; + // unsigned long long *mask_cpu = new unsigned long long [boxes_num * col_blocks]; + std::vector<unsigned long long> mask_cpu(boxes_num * col_blocks); + +// printf("boxes_num=%d,
col_blocks=%d\n", boxes_num, col_blocks); + CHECK_ERROR(cudaMemcpy(&mask_cpu[0], mask_data, boxes_num * col_blocks * sizeof(unsigned long long), + cudaMemcpyDeviceToHost)); + + cudaFree(mask_data); + + unsigned long long remv_cpu[col_blocks]; + memset(remv_cpu, 0, col_blocks * sizeof(unsigned long long)); + + int num_to_keep = 0; + + for (int i = 0; i < boxes_num; i++){ + int nblock = i / THREADS_PER_BLOCK_NMS; + int inblock = i % THREADS_PER_BLOCK_NMS; + + if (!(remv_cpu[nblock] & (1ULL << inblock))){ + keep_data[num_to_keep++] = i; + unsigned long long *p = &mask_cpu[0] + i * col_blocks; + for (int j = nblock; j < col_blocks; j++){ + remv_cpu[j] |= p[j]; + } + } + } + if ( cudaSuccess != cudaGetLastError() ) printf( "Error!\n" ); + + return num_to_keep; +} + + + +PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { + m.def("boxes_overlap_bev_gpu", &boxes_overlap_bev_gpu, "oriented boxes overlap"); + m.def("boxes_iou_bev_gpu", &boxes_iou_bev_gpu, "oriented boxes iou"); + m.def("nms_gpu", &nms_gpu, "oriented nms gpu"); + m.def("nms_normal_gpu", &nms_normal_gpu, "nms gpu"); +} + diff --git a/visualDet3D/networks/lib/ops/iou3d/src/iou3d_kernel.cu b/visualDet3D/networks/lib/ops/iou3d/src/iou3d_kernel.cu new file mode 100644 index 0000000..328a557 --- /dev/null +++ b/visualDet3D/networks/lib/ops/iou3d/src/iou3d_kernel.cu @@ -0,0 +1,387 @@ +/* +3D IoU Calculation and Rotated NMS(modified from 2D NMS written by others) +Written by Shaoshuai Shi +All Rights Reserved 2018. 
+*/ + +#include +#define THREADS_PER_BLOCK 16 +#define DIVUP(m, n) ((m) / (n) + ((m) % (n) > 0)) + +//#define DEBUG +const int THREADS_PER_BLOCK_NMS = sizeof(unsigned long long) * 8; +const float EPS = 1e-8; +struct Point { + float x, y; + __device__ Point() {} + __device__ Point(double _x, double _y){ + x = _x, y = _y; + } + + __device__ void set(float _x, float _y){ + x = _x; y = _y; + } + + __device__ Point operator +(const Point &b)const{ + return Point(x + b.x, y + b.y); + } + + __device__ Point operator -(const Point &b)const{ + return Point(x - b.x, y - b.y); + } +}; + +__device__ inline float cross(const Point &a, const Point &b){ + return a.x * b.y - a.y * b.x; +} + +__device__ inline float cross(const Point &p1, const Point &p2, const Point &p0){ + return (p1.x - p0.x) * (p2.y - p0.y) - (p2.x - p0.x) * (p1.y - p0.y); +} + +__device__ int check_rect_cross(const Point &p1, const Point &p2, const Point &q1, const Point &q2){ + int ret = min(p1.x,p2.x) <= max(q1.x,q2.x) && + min(q1.x,q2.x) <= max(p1.x,p2.x) && + min(p1.y,p2.y) <= max(q1.y,q2.y) && + min(q1.y,q2.y) <= max(p1.y,p2.y); + return ret; +} + +__device__ inline int check_in_box2d(const float *box, const Point &p){ + //params: box (5) [x1, y1, x2, y2, angle] + const float MARGIN = 1e-5; + + float center_x = (box[0] + box[2]) / 2; + float center_y = (box[1] + box[3]) / 2; + float angle_cos = cos(-box[4]), angle_sin = sin(-box[4]); // rotate the point in the opposite direction of box + float rot_x = (p.x - center_x) * angle_cos + (p.y - center_y) * angle_sin + center_x; + float rot_y = -(p.x - center_x) * angle_sin + (p.y - center_y) * angle_cos + center_y; +#ifdef DEBUG + printf("box: (%.3f, %.3f, %.3f, %.3f, %.3f)\n", box[0], box[1], box[2], box[3], box[4]); + printf("center: (%.3f, %.3f), cossin(%.3f, %.3f), src(%.3f, %.3f), rot(%.3f, %.3f)\n", center_x, center_y, + angle_cos, angle_sin, p.x, p.y, rot_x, rot_y); +#endif + return (rot_x > box[0] - MARGIN && rot_x < box[2] + MARGIN && rot_y > box[1] - 
MARGIN && rot_y < box[3] + MARGIN); +} + +__device__ inline int intersection(const Point &p1, const Point &p0, const Point &q1, const Point &q0, Point &ans){ + // fast exclusion + if (check_rect_cross(p0, p1, q0, q1) == 0) return 0; + + // check cross standing + float s1 = cross(q0, p1, p0); + float s2 = cross(p1, q1, p0); + float s3 = cross(p0, q1, q0); + float s4 = cross(q1, p1, q0); + + if (!(s1 * s2 > 0 && s3 * s4 > 0)) return 0; + + // calculate intersection of two lines + float s5 = cross(q1, p1, p0); + if(fabs(s5 - s1) > EPS){ + ans.x = (s5 * q0.x - s1 * q1.x) / (s5 - s1); + ans.y = (s5 * q0.y - s1 * q1.y) / (s5 - s1); + + } + else{ + float a0 = p0.y - p1.y, b0 = p1.x - p0.x, c0 = p0.x * p1.y - p1.x * p0.y; + float a1 = q0.y - q1.y, b1 = q1.x - q0.x, c1 = q0.x * q1.y - q1.x * q0.y; + float D = a0 * b1 - a1 * b0; + + ans.x = (b0 * c1 - b1 * c0) / D; + ans.y = (a1 * c0 - a0 * c1) / D; + } + + return 1; +} + +__device__ inline void rotate_around_center(const Point &center, const float angle_cos, const float angle_sin, Point &p){ + float new_x = (p.x - center.x) * angle_cos + (p.y - center.y) * angle_sin + center.x; + float new_y = -(p.x - center.x) * angle_sin + (p.y - center.y) * angle_cos + center.y; + p.set(new_x, new_y); +} + +__device__ inline int point_cmp(const Point &a, const Point &b, const Point &center){ + return atan2(a.y - center.y, a.x - center.x) > atan2(b.y - center.y, b.x - center.x); +} + +__device__ inline float box_overlap(const float *box_a, const float *box_b){ + // params: box_a (5) [x1, y1, x2, y2, angle] + // params: box_b (5) [x1, y1, x2, y2, angle] + + float a_x1 = box_a[0], a_y1 = box_a[1], a_x2 = box_a[2], a_y2 = box_a[3], a_angle = box_a[4]; + float b_x1 = box_b[0], b_y1 = box_b[1], b_x2 = box_b[2], b_y2 = box_b[3], b_angle = box_b[4]; + + Point center_a((a_x1 + a_x2) / 2, (a_y1 + a_y2) / 2); + Point center_b((b_x1 + b_x2) / 2, (b_y1 + b_y2) / 2); +#ifdef DEBUG + printf("a: (%.3f, %.3f, %.3f, %.3f, %.3f), b: (%.3f, %.3f, %.3f, %.3f,
%.3f)\n", a_x1, a_y1, a_x2, a_y2, a_angle, + b_x1, b_y1, b_x2, b_y2, b_angle); + printf("center a: (%.3f, %.3f), b: (%.3f, %.3f)\n", center_a.x, center_a.y, center_b.x, center_b.y); +#endif + + Point box_a_corners[5]; + box_a_corners[0].set(a_x1, a_y1); + box_a_corners[1].set(a_x2, a_y1); + box_a_corners[2].set(a_x2, a_y2); + box_a_corners[3].set(a_x1, a_y2); + + Point box_b_corners[5]; + box_b_corners[0].set(b_x1, b_y1); + box_b_corners[1].set(b_x2, b_y1); + box_b_corners[2].set(b_x2, b_y2); + box_b_corners[3].set(b_x1, b_y2); + + // get oriented corners + float a_angle_cos = cos(a_angle), a_angle_sin = sin(a_angle); + float b_angle_cos = cos(b_angle), b_angle_sin = sin(b_angle); + + for (int k = 0; k < 4; k++){ +#ifdef DEBUG + printf("before corner %d: a(%.3f, %.3f), b(%.3f, %.3f) \n", k, box_a_corners[k].x, box_a_corners[k].y, box_b_corners[k].x, box_b_corners[k].y); +#endif + rotate_around_center(center_a, a_angle_cos, a_angle_sin, box_a_corners[k]); + rotate_around_center(center_b, b_angle_cos, b_angle_sin, box_b_corners[k]); +#ifdef DEBUG + printf("corner %d: a(%.3f, %.3f), b(%.3f, %.3f) \n", k, box_a_corners[k].x, box_a_corners[k].y, box_b_corners[k].x, box_b_corners[k].y); +#endif + } + + box_a_corners[4] = box_a_corners[0]; + box_b_corners[4] = box_b_corners[0]; + + // get intersection of lines + Point cross_points[16]; + Point poly_center; + int cnt = 0, flag = 0; + + poly_center.set(0, 0); + for (int i = 0; i < 4; i++){ + for (int j = 0; j < 4; j++){ + flag = intersection(box_a_corners[i + 1], box_a_corners[i], box_b_corners[j + 1], box_b_corners[j], cross_points[cnt]); + if (flag){ + poly_center = poly_center + cross_points[cnt]; + cnt++; + } + } + } + + // check corners + for (int k = 0; k < 4; k++){ + if (check_in_box2d(box_a, box_b_corners[k])){ + poly_center = poly_center + box_b_corners[k]; + cross_points[cnt] = box_b_corners[k]; + cnt++; + } + if (check_in_box2d(box_b, box_a_corners[k])){ + poly_center = poly_center + box_a_corners[k]; + 
cross_points[cnt] = box_a_corners[k]; + cnt++; + } + } + + poly_center.x /= cnt; + poly_center.y /= cnt; + + // sort the points of polygon + Point temp; + for (int j = 0; j < cnt - 1; j++){ + for (int i = 0; i < cnt - j - 1; i++){ + if (point_cmp(cross_points[i], cross_points[i + 1], poly_center)){ + temp = cross_points[i]; + cross_points[i] = cross_points[i + 1]; + cross_points[i + 1] = temp; + } + } + } + +#ifdef DEBUG + printf("cnt=%d\n", cnt); + for (int i = 0; i < cnt; i++){ + printf("All cross point %d: (%.3f, %.3f)\n", i, cross_points[i].x, cross_points[i].y); + } +#endif + + // get the overlap areas + float area = 0; + for (int k = 0; k < cnt - 1; k++){ + area += cross(cross_points[k] - cross_points[0], cross_points[k + 1] - cross_points[0]); + } + + return fabs(area) / 2.0; +} + +__device__ inline float iou_bev(const float *box_a, const float *box_b){ + // params: box_a (5) [x1, y1, x2, y2, angle] + // params: box_b (5) [x1, y1, x2, y2, angle] + float sa = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1]); + float sb = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]); + float s_overlap = box_overlap(box_a, box_b); + return s_overlap / fmaxf(sa + sb - s_overlap, EPS); +} + +__global__ void boxes_overlap_kernel(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_overlap){ + const int a_idx = blockIdx.y * THREADS_PER_BLOCK + threadIdx.y; + const int b_idx = blockIdx.x * THREADS_PER_BLOCK + threadIdx.x; + + if (a_idx >= num_a || b_idx >= num_b){ + return; + } + const float * cur_box_a = boxes_a + a_idx * 5; + const float * cur_box_b = boxes_b + b_idx * 5; + float s_overlap = box_overlap(cur_box_a, cur_box_b); + ans_overlap[a_idx * num_b + b_idx] = s_overlap; +} + +__global__ void boxes_iou_bev_kernel(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_iou){ + const int a_idx = blockIdx.y * THREADS_PER_BLOCK + threadIdx.y; + const int b_idx = blockIdx.x * THREADS_PER_BLOCK + threadIdx.x; + + if 
(a_idx >= num_a || b_idx >= num_b){ + return; + } + + const float * cur_box_a = boxes_a + a_idx * 5; + const float * cur_box_b = boxes_b + b_idx * 5; + float cur_iou_bev = iou_bev(cur_box_a, cur_box_b); + ans_iou[a_idx * num_b + b_idx] = cur_iou_bev; +} + +__global__ void nms_kernel(const int boxes_num, const float nms_overlap_thresh, + const float *boxes, unsigned long long *mask){ + //params: boxes (N, 5) [x1, y1, x2, y2, ry] + //params: mask (N, N/THREADS_PER_BLOCK_NMS) + + const int row_start = blockIdx.y; + const int col_start = blockIdx.x; + + // if (row_start > col_start) return; + + const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); + const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); + + __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 5]; + + if (threadIdx.x < col_size) { + block_boxes[threadIdx.x * 5 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 0]; + block_boxes[threadIdx.x * 5 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 1]; + block_boxes[threadIdx.x * 5 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 2]; + block_boxes[threadIdx.x * 5 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 3]; + block_boxes[threadIdx.x * 5 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 4]; + } + __syncthreads(); + + if (threadIdx.x < row_size) { + const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x; + const float *cur_box = boxes + cur_box_idx * 5; + + int i = 0; + unsigned long long t = 0; + int start = 0; + if (row_start == col_start) { + start = threadIdx.x + 1; + } + for (i = start; i < col_size; i++) { + if (iou_bev(cur_box, block_boxes + i * 5) > nms_overlap_thresh){ + t |= 1ULL << i; + } + } + const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); + mask[cur_box_idx * col_blocks + col_start] = t; + } +} + + +__device__ inline float 
iou_normal(float const * const a, float const * const b) { + float left = fmaxf(a[0], b[0]), right = fminf(a[2], b[2]); + float top = fmaxf(a[1], b[1]), bottom = fminf(a[3], b[3]); + float width = fmaxf(right - left, 0.f), height = fmaxf(bottom - top, 0.f); + float interS = width * height; + float Sa = (a[2] - a[0]) * (a[3] - a[1]); + float Sb = (b[2] - b[0]) * (b[3] - b[1]); + return interS / fmaxf(Sa + Sb - interS, EPS); +} + + +__global__ void nms_normal_kernel(const int boxes_num, const float nms_overlap_thresh, + const float *boxes, unsigned long long *mask){ + //params: boxes (N, 5) [x1, y1, x2, y2, ry] + //params: mask (N, N/THREADS_PER_BLOCK_NMS) + + const int row_start = blockIdx.y; + const int col_start = blockIdx.x; + + // if (row_start > col_start) return; + + const int row_size = fminf(boxes_num - row_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); + const int col_size = fminf(boxes_num - col_start * THREADS_PER_BLOCK_NMS, THREADS_PER_BLOCK_NMS); + + __shared__ float block_boxes[THREADS_PER_BLOCK_NMS * 5]; + + if (threadIdx.x < col_size) { + block_boxes[threadIdx.x * 5 + 0] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 0]; + block_boxes[threadIdx.x * 5 + 1] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 1]; + block_boxes[threadIdx.x * 5 + 2] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 2]; + block_boxes[threadIdx.x * 5 + 3] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 3]; + block_boxes[threadIdx.x * 5 + 4] = boxes[(THREADS_PER_BLOCK_NMS * col_start + threadIdx.x) * 5 + 4]; + } + __syncthreads(); + + if (threadIdx.x < row_size) { + const int cur_box_idx = THREADS_PER_BLOCK_NMS * row_start + threadIdx.x; + const float *cur_box = boxes + cur_box_idx * 5; + + int i = 0; + unsigned long long t = 0; + int start = 0; + if (row_start == col_start) { + start = threadIdx.x + 1; + } + for (i = start; i < col_size; i++) { + if (iou_normal(cur_box, block_boxes + i * 5) > 
nms_overlap_thresh){ + t |= 1ULL << i; + } + } + const int col_blocks = DIVUP(boxes_num, THREADS_PER_BLOCK_NMS); + mask[cur_box_idx * col_blocks + col_start] = t; + } +} + + + + + +void boxesoverlapLauncher(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_overlap){ + + dim3 blocks(DIVUP(num_b, THREADS_PER_BLOCK), DIVUP(num_a, THREADS_PER_BLOCK)); // blockIdx.x(col), blockIdx.y(row) + dim3 threads(THREADS_PER_BLOCK, THREADS_PER_BLOCK); + + boxes_overlap_kernel<<<blocks, threads>>>(num_a, boxes_a, num_b, boxes_b, ans_overlap); +#ifdef DEBUG + cudaDeviceSynchronize(); // for using printf in kernel function +#endif +} + +void boxesioubevLauncher(const int num_a, const float *boxes_a, const int num_b, const float *boxes_b, float *ans_iou){ + + dim3 blocks(DIVUP(num_b, THREADS_PER_BLOCK), DIVUP(num_a, THREADS_PER_BLOCK)); // blockIdx.x(col), blockIdx.y(row) + dim3 threads(THREADS_PER_BLOCK, THREADS_PER_BLOCK); + + boxes_iou_bev_kernel<<<blocks, threads>>>(num_a, boxes_a, num_b, boxes_b, ans_iou); +} + + +void nmsLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh){ + dim3 blocks(DIVUP(boxes_num, THREADS_PER_BLOCK_NMS), + DIVUP(boxes_num, THREADS_PER_BLOCK_NMS)); + dim3 threads(THREADS_PER_BLOCK_NMS); + nms_kernel<<<blocks, threads>>>(boxes_num, nms_overlap_thresh, boxes, mask); +} + + +void nmsNormalLauncher(const float *boxes, unsigned long long * mask, int boxes_num, float nms_overlap_thresh){ + dim3 blocks(DIVUP(boxes_num, THREADS_PER_BLOCK_NMS), + DIVUP(boxes_num, THREADS_PER_BLOCK_NMS)); + dim3 threads(THREADS_PER_BLOCK_NMS); + nms_normal_kernel<<<blocks, threads>>>(boxes_num, nms_overlap_thresh, boxes, mask); +} diff --git a/visualDet3D/networks/pipelines/testers.py b/visualDet3D/networks/pipelines/testers.py index 38fb4fa..efa3925 100644 --- a/visualDet3D/networks/pipelines/testers.py +++ b/visualDet3D/networks/pipelines/testers.py @@ -27,3 +27,16 @@ def test_mono_detection(data, module:nn.Module, return scores, bbox, obj_types
+@PIPELINE_DICT.register_module +@torch.no_grad() +def test_stereo_detection(data, module:nn.Module, + writer:SummaryWriter, + loss_logger:LossLogger=None, + global_step:int=None, + cfg:EasyDict=None) -> Tuple[torch.Tensor, torch.Tensor, List[str]]: + left_images, right_images, P2, P3 = data[0], data[1], data[2], data[3] + + scores, bbox, obj_index = module([left_images.cuda().float().contiguous(), right_images.cuda().float().contiguous(), torch.tensor(P2).cuda().float(), torch.tensor(P3).cuda().float()]) + obj_types = [cfg.obj_types[i.item()] for i in obj_index] + + return scores, bbox, obj_types \ No newline at end of file diff --git a/visualDet3D/networks/pipelines/trainers.py b/visualDet3D/networks/pipelines/trainers.py index 7ddba77..7b9eb3c 100644 --- a/visualDet3D/networks/pipelines/trainers.py +++ b/visualDet3D/networks/pipelines/trainers.py @@ -19,6 +19,7 @@ def train_mono_detection(data, module:nn.Module, writer:SummaryWriter=None, loss_logger:LossLogger=None, global_step:int=None, + epoch_num:int=None, cfg:EasyDict=EasyDict()): optimizer.zero_grad() # load data @@ -58,6 +59,7 @@ def train_mono_depth(data, module:nn.Module, writer:SummaryWriter=None, loss_logger:LossLogger=None, global_step:int=None, + epoch_num:int=None, cfg:EasyDict=EasyDict()): optimizer.zero_grad() image, K, gts = data @@ -78,3 +80,82 @@ def train_mono_depth(data, module:nn.Module, torch.nn.utils.clip_grad_norm_(module.parameters(), cfg.optimizer.clipped_gradient_norm) optimizer.step() + +@PIPELINE_DICT.register_module +def train_stereo_detection(data, module:nn.Module, + optimizer:optim.Optimizer, + writer:SummaryWriter=None, + loss_logger:LossLogger=None, + global_step:int=None, + epoch_num:int=None, + cfg:EasyDict=EasyDict()): + optimizer.zero_grad() + left_images, right_images, P2, P3, labels, bbox2d, bbox_3d, disparity = data + + # create compound array of annotation + max_length = np.max([len(label) for label in labels]) + if max_length == 0: + return + annotation = 
compound_annotation(labels, max_length, bbox2d, bbox_3d, cfg.obj_types) # np.array, [batch, max_length, 4 + 1 + 7] + + # Feed to the network + classification_loss, regression_loss, loss_dict = module( + [left_images.cuda().float().contiguous(), right_images.cuda().float().contiguous(), + left_images.new(annotation).cuda(), + P2.cuda(), P3.cuda(), + disparity.cuda().contiguous()] + ) + + classification_loss = classification_loss.mean() + regression_loss = regression_loss.mean() + + if not loss_logger is None: + # Record loss in an average meter + loss_logger.update(loss_dict) + del loss_dict + + if not optimizer is None: + loss = classification_loss + regression_loss + + if bool(loss == 0): + del loss # loss_dict may already have been deleted above + return + loss.backward() + # clip gradient norm + torch.nn.utils.clip_grad_norm_(module.parameters(), cfg.optimizer.clipped_gradient_norm) + + optimizer.step() + optimizer.zero_grad() + +@PIPELINE_DICT.register_module +def train_rtm3d(data, module:nn.Module, + optimizer:optim.Optimizer, + writer:SummaryWriter=None, + loss_logger:LossLogger=None, + global_step:int=None, + epoch_num:int=None, + cfg:EasyDict=EasyDict()): + optimizer.zero_grad() + image, K, gts = data + #outs = data + + for key in gts: + gts[key] = gts[key].cuda() + + # Feed to the network + loss, loss_dict = module( + [image.cuda().float().contiguous(), gts, dict(P2=image.new(K).cuda().float(), epoch=epoch_num)] + ) + + + if not loss_logger is None and loss > 0: + # Record loss in an average meter + loss_logger.update(loss_dict) + + if bool(loss == 0): + return + loss.mean().backward() + # clip gradient norm + if 'clipped_gradient_norm' in cfg.optimizer: + torch.nn.utils.clip_grad_norm_(module.parameters(), cfg.optimizer.clipped_gradient_norm) + optimizer.step() diff --git a/visualDet3D/networks/utils/rtm3d_utils.py b/visualDet3D/networks/utils/rtm3d_utils.py new file mode 100644 index 0000000..bcdb1b6 --- /dev/null +++ b/visualDet3D/networks/utils/rtm3d_utils.py @@ -0,0 +1,400 @@ +import numpy as np 
+import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim +from visualDet3D.networks.lib.ops.iou3d.iou3d import boxes_iou3d_gpu + +def compute_res_loss(output, target): + return F.smooth_l1_loss(output, target, reduction='elementwise_mean') + +def compute_bin_loss(output, target, mask): + mask = mask.expand_as(output) + output = output * mask.float() + return F.cross_entropy(output, target, reduction='elementwise_mean') + +def compute_rot_loss(output, target_bin, target_res, mask): + # output: (B, 128, 8) [bin1_cls[0], bin1_cls[1], bin1_sin, bin1_cos, + # bin2_cls[0], bin2_cls[1], bin2_sin, bin2_cos] + # target_bin: (B, 128, 2) [bin1_cls, bin2_cls] + # target_res: (B, 128, 2) [bin1_res, bin2_res] + # mask: (B, 128, 1) + # import pdb; pdb.set_trace() + output = output.view(-1, 8) + target_bin = target_bin.view(-1, 2) + target_res = target_res.view(-1, 2) + mask = mask.view(-1, 1) + loss_bin1 = compute_bin_loss(output[:, 0:2], target_bin[:, 0], mask) + loss_bin2 = compute_bin_loss(output[:, 4:6], target_bin[:, 1], mask) + loss_res = torch.zeros_like(loss_bin1) + if target_bin[:, 0].nonzero().shape[0] > 0: + idx1 = target_bin[:, 0].nonzero()[:, 0] + valid_output1 = torch.index_select(output, 0, idx1.long()) + valid_target_res1 = torch.index_select(target_res, 0, idx1.long()) + loss_sin1 = compute_res_loss( + valid_output1[:, 2], torch.sin(valid_target_res1[:, 0])) + loss_cos1 = compute_res_loss( + valid_output1[:, 3], torch.cos(valid_target_res1[:, 0])) + loss_res += loss_sin1 + loss_cos1 + if target_bin[:, 1].nonzero().shape[0] > 0: + idx2 = target_bin[:, 1].nonzero()[:, 0] + valid_output2 = torch.index_select(output, 0, idx2.long()) + valid_target_res2 = torch.index_select(target_res, 0, idx2.long()) + loss_sin2 = compute_res_loss( + valid_output2[:, 6], torch.sin(valid_target_res2[:, 1])) + loss_cos2 = compute_res_loss( + valid_output2[:, 7], torch.cos(valid_target_res2[:, 1])) + loss_res += loss_sin2 + loss_cos2 + return 
loss_bin1 + loss_bin2 + loss_res + + +def gaussian_radius(det_size, min_overlap=0.7): + height, width = det_size + + a1 = 1 + b1 = (height + width) + c1 = width * height * (1 - min_overlap) / (1 + min_overlap) + sq1 = np.sqrt(b1 ** 2 - 4 * a1 * c1) + r1 = (b1 + sq1) / 2 + + a2 = 4 + b2 = 2 * (height + width) + c2 = (1 - min_overlap) * width * height + sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) + r2 = (b2 + sq2) / 2 + + a3 = 4 * min_overlap + b3 = -2 * min_overlap * (height + width) + c3 = (min_overlap - 1) * width * height + sq3 = np.sqrt(b3 ** 2 - 4 * a3 * c3) + r3 = (b3 + sq3) / 2 + return min(r1, r2, r3) + +def compute_radius(det_size, min_overlap=0.7): + height, width = det_size[0], det_size[1] + + a2 = 4 + b2 = 2 * (height + width) + c2 = (1 - min_overlap) * width * height + sq2 = np.sqrt(b2 ** 2 - 4 * a2 * c2) + r2 = (b2 - sq2) / (2 * a2) + + return r2 + +def gaussian2D(shape, sigma=1): + m, n = [(ss - 1.) / 2. for ss in shape] + y, x = np.ogrid[-m:m + 1, -n:n + 1] + h = np.exp(-(x * x + y * y) / (2 * sigma * sigma)) + h[h < np.finfo(h.dtype).eps * h.max()] = 0 + + return h + +def gen_hm_radius(heatmap, center, radius, k=1): + diameter = 2 * radius + 1 + gaussian = gaussian2D((diameter, diameter), sigma=diameter / 6) + + x, y = int(center[0]), int(center[1]) + + height, width = heatmap.shape[0:2] + + left, right = min(x, radius), min(width - x, radius + 1) + top, bottom = min(y, radius), min(height - y, radius + 1) + + masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right] + masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:radius + right] + if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0: # TODO debug + np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap) + + return heatmap + +def project_to_image(pts_3d, P): + # pts_3d: n x 3 + # P: 3 x 4 + # return: n x 2 + pts_3d_homo = np.concatenate([pts_3d, np.ones((pts_3d.shape[0], 1), dtype=np.float32)], axis=1) + pts_2d = np.dot(P, pts_3d_homo.transpose(1, 
0)).transpose(1, 0) + pts_2d = pts_2d[:, :2] / pts_2d[:, 2:] + + return pts_2d.astype(np.int) + + +def _nms(heat, kernel=3): + pad = (kernel - 1) // 2 + hmax = F.max_pool2d(heat, (kernel, kernel), stride=1, padding=pad) + keep = (hmax == heat).float() + + return heat * keep + + +def _gather_feat(feat, ind, mask=None): + dim = feat.size(2) + ind = ind.unsqueeze(2).expand(ind.size(0), ind.size(1), dim) + feat = feat.gather(1, ind) + if mask is not None: + mask = mask.unsqueeze(2).expand_as(feat) + feat = feat[mask] + feat = feat.view(-1, dim) + return feat + + +def _transpose_and_gather_feat(feat, ind): + feat = feat.permute(0, 2, 3, 1).contiguous() + feat = feat.view(feat.size(0), -1, feat.size(3)) + feat = _gather_feat(feat, ind) + return feat + +def _topk(scores, K=40): + batch, cat, height, width = scores.size() + + topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K) + + topk_inds = topk_inds % (height * width) + topk_ys = (topk_inds / width).int().float() + topk_xs = (topk_inds % width).int().float() + + topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K) + topk_clses = (topk_ind / K).int() + topk_inds = _gather_feat(topk_inds.view(batch, -1, 1), topk_ind).view(batch, K) + topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K) + topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K) + + return topk_score, topk_inds, topk_clses, topk_ys, topk_xs + + +def _topk_channel(scores, K=40): + batch, cat, height, width = scores.size() + + topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K) + + topk_inds = topk_inds % (height * width) + topk_ys = (topk_inds / width).int().float() + topk_xs = (topk_inds % width).int().float() + + return topk_scores, topk_inds, topk_ys, topk_xs + +class Position_loss(nn.Module): + def __init__(self, output_w): + super(Position_loss, self).__init__() + + const = torch.Tensor( + [[-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 
0], [0, -1], [-1, 0], [0, -1], + [-1, 0], [0, -1], [-1, 0], [0, -1]]) #, [-1, 0], [0, -1]]) + self.register_buffer('const', const.unsqueeze(0).unsqueeze(0)) # b,c,2 + self.output_w = output_w + + self.num_joints = 9 + + def forward(self, output, batch, calib): + dim = _transpose_and_gather_feat(output['dim'], batch['ind']) + rot = _transpose_and_gather_feat(output['rot'], batch['ind']) + prob = _transpose_and_gather_feat(output['prob'], batch['ind']) + kps = _transpose_and_gather_feat(output['hps'], batch['ind']) + rot=rot.detach()### solving............ + + b = dim.size(0) + c = dim.size(1) + + mask = batch['hps_mask'] + mask = mask.float() + + cys = (batch['ind'] / self.output_w).int().float() + cxs = (batch['ind'] % self.output_w).int().float() + kps[..., ::2] = kps[..., ::2] + cxs.view(b, c, 1).expand(b, c, self.num_joints) + kps[..., 1::2] = kps[..., 1::2] + cys.view(b, c, 1).expand(b, c, self.num_joints) + + meta = dict(calib=calib) + const = torch.Tensor( + [[-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], [-1, 0], [0, -1], + [-1, 0], [0, -1], [-1, 0], [0, -1]]).unsqueeze(0).unsqueeze(0).cuda() + pinv,rot_y,alpha_pre, _ = gen_position(kps.reshape(b, kps.shape[1], 9, 2) * 4, dim, rot, meta, const) + + + kps_mask = mask + + mask2 = torch.sum(kps_mask, dim=2) + loss_mask = mask2 > 15 + + + pinv = pinv.view(b, c, 3, 1).squeeze(3) + + dim_mask = dim<0 + dim = torch.clamp(dim, 0 , 10) + dim_mask_score_mask = torch.sum(dim_mask, dim=2) + dim_mask_score_mask = 1 - (dim_mask_score_mask > 0).int() + dim_mask_score_mask = dim_mask_score_mask.float() + + off_set = (calib[:, 0, 3]) / calib[:, 0, 0] # [B, 1] + + box_pred = torch.cat((pinv, dim, rot_y), dim=2).detach() + loss = (pinv - batch['location']) + loss_norm = torch.norm(loss, p=2, dim=2) + loss_mask = loss_mask.float() + loss = loss_norm * loss_mask + mask_num = (loss_mask != 0).sum() + + loss = loss.sum() / (mask_num + 1) + dim_gt = batch['dim'].clone() # b,c,3 + 
location_gt = batch['location'] + ori_gt = batch['ori'] + dim_gt[dim_mask] = 0 + + + + gt_box = torch.cat((location_gt, dim_gt, ori_gt), dim=2) + box_pred = box_pred.view(b * c, -1) + gt_box = gt_box.view(b * c, -1) + + box_score = boxes_iou3d_gpu(box_pred, gt_box) + box_score = torch.diag(box_score).view(b, c) + prob = prob.squeeze(2) + box_score = box_score * loss_mask * dim_mask_score_mask + loss_prob = F.binary_cross_entropy_with_logits(prob, box_score.detach(), reduce=False) + loss_prob = loss_prob * loss_mask * dim_mask_score_mask + loss_prob = torch.sum(loss_prob, dim=1) + loss_prob = loss_prob.sum() / (mask_num + 1) + box_score = box_score * loss_mask + + box_score = box_score.sum() / (mask_num + 1e-3) + return loss, loss_prob, box_score +def gen_position(kps,dim,rot,meta,const): + """ Decode rotation and generate position. Notice that + unlike the official implementation, we do not transform back to pre-augmentation images. + We also compensate for the camera offset in this function. + + We also change the order of the keypoints to the default projection order in this repo, + so the way we construct the least-squares matrices changes as well. + + Args: + kps [torch.Tensor]: [B, C, 9, 2], keypoints relative offset from the center_int in augmented scale 4. network prediction. + dim [torch.Tensor]: [B, C, 3], width/height/length, the order is different. + rot [torch.Tensor]: [B, C, 8], rotation prediction from the network. + meta [Dict]: meta['calib'].shape = [B, 3, 4] -> calibration matrix for augmented images. + const [torch.Tensor]: const.shape = [1, 1, 16, 2], constant helping parameter used in optimization. + Returns: + position [torch.Tensor]: [B, C, 3], 3D position. + rot_y [torch.Tensor]: [B, C, 1], 3D rotation theta. Decoded. + alpna_pre [torch.Tensor]: [B, C, 1], observation angle alpha decoded. The typo is consistent with the official typo. + kps [torch.Tensor]: [B, C, 18], basically the same as the input (not transformed here). 
+ """ + b=kps.size(0) + c=kps.size(1) + calib=meta['calib'] + off_set = (calib[:, 0, 3]) / calib[:, 0, 0] # [B, 1] + + #opinv = opinv.unsqueeze(1) + #opinv = opinv.expand(b, c, -1, -1).contiguous().view(-1, 2, 3).float() + kps = kps.view(b, c, -1, 2).permute(0, 1, 3, 2) + #hom = torch.ones(b, c, 1, 9).cuda() + #kps = torch.cat((kps, hom), dim=2).view(-1, 3, 9) + #kps = torch.bmm(opinv, kps).view(b, c, 2, 9) + kps = kps.permute(0, 1, 3, 2).contiguous().view(b, c, -1) # 16.32,18 + si = torch.zeros_like(kps[:, :, 0:1]) + calib[:, 0:1, 0:1] + + alpha_idx = rot[:, :, 1] > rot[:, :, 5] + alpha_idx = alpha_idx.float() + alpha1 = torch.atan(rot[:, :, 2] / rot[:, :, 3]) + (-0.5 * np.pi) + alpha2 = torch.atan(rot[:, :, 6] / rot[:, :, 7]) + (0.5 * np.pi) + alpna_pre = alpha1 * alpha_idx + alpha2 * (1 - alpha_idx) + alpna_pre = alpna_pre.unsqueeze(2) + + # alpna_pre=rot_gt + + rot_y = alpna_pre + torch.atan2(kps[:, :, 16:17] - calib[:, 0:1, 2:3], si) + rot_y[rot_y > np.pi] = rot_y[rot_y > np.pi] - 2 * np.pi + rot_y[rot_y < - np.pi] = rot_y[rot_y < - np.pi] + 2 * np.pi + + calib = calib.unsqueeze(1) + calib = calib.expand(b, c, -1, -1).contiguous() + kpoint = kps[:, :, :16] + f = calib[:, :, 0, 0].unsqueeze(2) + f = f.expand_as(kpoint) + cx, cy = calib[:, :, 0, 2].unsqueeze(2), calib[:, :, 1, 2].unsqueeze(2) + cxy = torch.cat((cx, cy), dim=2) + cxy = cxy.repeat(1, 1, 8) # b,c,16 + kp_norm = (kpoint - cxy) / f + + l = dim[:, :, 2:3] + h = dim[:, :, 1:2] + w = dim[:, :, 0:1] + cosori = torch.cos(rot_y) + sinori = torch.sin(rot_y) + + B = torch.zeros_like(kpoint) + C = torch.zeros_like(kpoint) + + kp = kp_norm.unsqueeze(3) # b,c,16,1 + const = const.expand(b, c, -1, -1) + A = torch.cat([const, kp], dim=3) + + ## The order of the point has been changed, so should the matrixes + # B[:, :, 0:1] = l * 0.5 * cosori + w * 0.5 * sinori + # B[:, :, 1:2] = h * 0.5 + # B[:, :, 2:3] = l * 0.5 * cosori - w * 0.5 * sinori + # B[:, :, 3:4] = h * 0.5 + # B[:, :, 4:5] = -l * 0.5 * cosori - w * 
0.5 * sinori + # B[:, :, 5:6] = h * 0.5 + # B[:, :, 6:7] = -l * 0.5 * cosori + w * 0.5 * sinori + # B[:, :, 7:8] = h * 0.5 + # B[:, :, 8:9] = l * 0.5 * cosori + w * 0.5 * sinori + # B[:, :, 9:10] = -h * 0.5 + # B[:, :, 10:11] = l * 0.5 * cosori - w * 0.5 * sinori + # B[:, :, 11:12] = -h * 0.5 + # B[:, :, 12:13] = -l * 0.5 * cosori - w * 0.5 * sinori + # B[:, :, 13:14] = -h * 0.5 + # B[:, :, 14:15] = -l * 0.5 * cosori + w * 0.5 * sinori + # B[:, :, 15:16] = -h * 0.5 + + B[:, :, 0:1] = - l * 0.5 * cosori - w * 0.5 * sinori + B[:, :, 1:2] = - h * 0.5 + B[:, :, 2:3] = - l * 0.5 * cosori + w * 0.5 * sinori + B[:, :, 3:4] = - h * 0.5 + B[:, :, 4:5] = - l * 0.5 * cosori + w * 0.5 * sinori + B[:, :, 5:6] = h * 0.5 + B[:, :, 6:7] = l * 0.5 * cosori + w * 0.5 * sinori + B[:, :, 7:8] = h * 0.5 + B[:, :, 8:9] = l * 0.5 * cosori + w * 0.5 * sinori + B[:, :, 9:10] = - h * 0.5 + B[:, :, 10:11] = l * 0.5 * cosori - w * 0.5 * sinori + B[:, :, 11:12] = - h * 0.5 + B[:, :, 12:13] = l * 0.5 * cosori - w * 0.5 * sinori + B[:, :, 13:14] = h * 0.5 + B[:, :, 14:15] = - l * 0.5 * cosori - w * 0.5 * sinori + B[:, :, 15:16] = h * 0.5 + + C[:, :, 0:1] = l * 0.5 * sinori - w * 0.5 * cosori # - l * 0.5 * cosori - w * 0.5 * sinori + C[:, :, 1:2] = l * 0.5 * sinori - w * 0.5 * cosori + C[:, :, 2:3] = l * 0.5 * sinori + w * 0.5 * cosori # - l * 0.5 * cosori + w * 0.5 * sinori + C[:, :, 3:4] = l * 0.5 * sinori + w * 0.5 * cosori + C[:, :, 4:5] = l * 0.5 * sinori + w * 0.5 * cosori # - l * 0.5 * cosori + w * 0.5 * sinori + C[:, :, 5:6] = l * 0.5 * sinori + w * 0.5 * cosori + C[:, :, 6:7] = - l * 0.5 * sinori + w * 0.5 * cosori # l * 0.5 * cosori + w * 0.5 * sinori + C[:, :, 7:8] = - l * 0.5 * sinori + w * 0.5 * cosori + C[:, :, 8:9] = - l * 0.5 * sinori + w * 0.5 * cosori # l * 0.5 * cosori + w * 0.5 * sinori + C[:, :, 9:10] = - l * 0.5 * sinori + w * 0.5 * cosori + C[:, :, 10:11] = - l * 0.5 * sinori - w * 0.5 * cosori # l * 0.5 * cosori - w * 0.5 * sinori + C[:, :, 11:12] = - l * 0.5 * sinori - w 
* 0.5 * cosori + C[:, :, 12:13] = - l * 0.5 * sinori - w * 0.5 * cosori # l * 0.5 * cosori - w * 0.5 * sinori + C[:, :, 13:14] = - l * 0.5 * sinori - w * 0.5 * cosori + C[:, :, 14:15] = l * 0.5 * sinori - w * 0.5 * cosori # - l * 0.5 * cosori - w * 0.5 * sinori + C[:, :, 15:16] = l * 0.5 * sinori - w * 0.5 * cosori + + B = B - kp_norm * C + + # A=A*kps_mask1 + A = A.double() # For Numerical Stability. We add this line after repeated debugging. + AT = A.permute(0, 1, 3, 2) + AT = AT.view(b * c, 3, 16) + A = A.view(b * c, 16, 3) + B = B.view(b * c, 16, 1).float() + # mask = mask.unsqueeze(2) + + pinv = torch.bmm(AT, A) + pinv = torch.inverse(pinv + torch.randn_like(pinv) * 1e-8) # b*c 3 3 + pinv = torch.bmm(pinv, AT).float() # Change back to Float + pinv = torch.bmm(pinv, B) + pinv = pinv.view(b, c, 3, 1).squeeze(3) + + #pinv[:, :, 1] = pinv[:, :, 1] + dim[:, :, 0] / 2 ## No need to transfer to bottom point. We always use the center point unless in writing to KITTI + pinv[:, :, 0] -= off_set.unsqueeze(1) + + return pinv,rot_y,alpna_pre, kps diff --git a/visualDet3D/networks/utils/utils.py b/visualDet3D/networks/utils/utils.py index a5e8e7c..b61b186 100644 --- a/visualDet3D/networks/utils/utils.py +++ b/visualDet3D/networks/utils/utils.py @@ -5,6 +5,7 @@ import numpy as np import cv2 from functools import wraps +from visualDet3D.utils.utils import alpha2theta_3d, theta2alpha_3d def get_num_parameters(model): @@ -224,13 +225,13 @@ def forward(self, bbox_3d, tensor_p2): unnormalize bbox_3d [N, 7] with x, y, z, w, h, l, alpha tensor_p2: tensor of [3, 4] output: - [N, 8, 3] with corner point in camera frame + [N, 8, 3] with corner point in camera frame # 8 is determined by the shape of self.corner_matrix [N, 8, 3] with corner point in image frame [N, ] thetas """ relative_eight_corners = 0.5 * self.corner_matrix * bbox_3d[:, 3:6].unsqueeze(1) # [N, 8, 3] # [batch, N, ] - thetas = bbox_3d[:, 6] + torch.atan2(-bbox_3d[:, 2], bbox_3d[:, 0]) + 0.5 * np.pi + thetas = 
alpha2theta_3d(bbox_3d[..., 6], bbox_3d[..., 0], bbox_3d[..., 2], tensor_p2) _cos = torch.cos(thetas).unsqueeze(1) # [N, 1] _sin = torch.sin(thetas).unsqueeze(1) # [N, 1] rotated_corners_x, rotated_corners_z = ( @@ -243,7 +244,7 @@ def forward(self, bbox_3d, tensor_p2): abs_corners = rotated_corners + \ bbox_3d[:, 0:3].unsqueeze(1) # [N, 8, 3] camera_corners = torch.cat([abs_corners, - abs_corners.new_ones([abs_corners.shape[0], 8, 1])], + abs_corners.new_ones([abs_corners.shape[0], self.corner_matrix.shape[0], 1])], dim=-1).unsqueeze(3) # [N, 8, 4, 1] camera_coord = torch.matmul(tensor_p2, camera_corners).squeeze(-1) # [N, 8, 3] diff --git a/visualDet3D/utils/utils.py b/visualDet3D/utils/utils.py index e3a087b..ceaafa1 100644 --- a/visualDet3D/utils/utils.py +++ b/visualDet3D/utils/utils.py @@ -1,3 +1,4 @@ +import torch import numpy as np import cv2 import sys @@ -26,21 +27,57 @@ def log(self, step): name = key + '/' + self.data_split self.recorder.add_scalar(name, self.loss_stats[key].avg, step) -def convertAlpha2Rot(alpha, z3d, x3d): - - ry3d = alpha + np.arctan2(-z3d, x3d) + 0.5 * np.pi +def convertAlpha2Rot(alpha, cx, P2): + cx_p2 = P2[..., 0, 2] + fx_p2 = P2[..., 0, 0] + ry3d = alpha + np.arctan2(cx - cx_p2, fx_p2) ry3d[np.where(ry3d > np.pi)] -= 2 * np.pi ry3d[np.where(ry3d <= -np.pi)] += 2 * np.pi return ry3d -def convertRot2Alpha(ry3d, z3d, x3d): - - alpha = ry3d - np.arctan2(-z3d, x3d) - 0.5 * np.pi +def convertRot2Alpha(ry3d, cx, P2): + cx_p2 = P2[..., 0, 2] + fx_p2 = P2[..., 0, 0] + alpha = ry3d - np.arctan2(cx - cx_p2, fx_p2) alpha[alpha > np.pi] -= 2 * np.pi alpha[alpha <= -np.pi] += 2 * np.pi return alpha +def alpha2theta_3d(alpha, x, z, P2): + """ Convert alpha to theta with 3D position + Args: + alpha [torch.Tensor/ float or np.ndarray]: size: [...] + x []: size: [...] + z []: size: [...] + P2 [torch.Tensor/ np.ndarray]: size: [3, 4] + Returns: + theta []: size: [...] 
+ """ + offset = P2[0, 3] / P2[0, 0] + if isinstance(alpha, torch.Tensor): + theta = alpha + torch.atan2(x + offset, z) + else: + theta = alpha + np.arctan2(x + offset, z) + return theta + +def theta2alpha_3d(theta, x, z, P2): + """ Convert theta to alpha with 3D position + Args: + theta [torch.Tensor/ float or np.ndarray]: size: [...] + x []: size: [...] + z []: size: [...] + P2 [torch.Tensor/ np.ndarray]: size: [3, 4] + Returns: + alpha []: size: [...] + """ + offset = P2[0, 3] / P2[0, 0] + if isinstance(theta, torch.Tensor): + alpha = theta - torch.atan2(x + offset, z) + else: + alpha = theta - np.arctan2(x + offset, z) + return alpha + def draw_3D_box(img, corners, color = (255, 255, 0)): """ draw 3D box in image with OpenCV,