I'm sure most of what we need is sitting [here](https://github.com/pytorch/vision/tree/main/references/detection). Might be nice to do this if we have thousands of images + augmentation.