Description
Hi guys,
I recently started working with computer vision as part of my thesis. Initially, I achieved good results using the Ultralytics YOLO model, but its license poses a challenge for future use cases. After some research, I believe this YOLO repository is the best fit for my needs. I also tried YOLOX, DAMO-YOLO, and YOLO-NAS but couldn’t get them to work.
I now have some questions and problems regarding the training process and how to correctly load a trained model for predictions. I would be very happy if someone could give me some ideas on how to solve them.
I am training on a custom dataset in the normalized YOLO labeling format (c, x, y, w, h; an example label line is shown after the configs below), with the following folder structure and configs:
#######################################
dataset/
├── images
│   ├── train
│   └── val
├── labels
│   ├── train
│   └── val
└── classes.txt
#######################################
dataset.yaml:
path: dataset_dog/
train: train
validation: val
class_num: 4
class_list: ["Bicycle","Car","Cat","Dog"]
#######################################
config.yaml:
hydra:
  run:
    dir: runs
name: v9-train-dog
defaults:
  - _self_
  - task: train
  - dataset: yolo_dog
  - model: v9-m
  - general
#######################################
general.yaml:
device: cuda
cpu_num: 8
image_size: [640, 640]
out_path: runs
exist_ok: True
lucky_number: 10
use_wandb: False
use_tensorboard: True
weight: yolo/v9-m.pt
#######################################
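For reference, each .txt file in labels/train and labels/val contains one object per line in the normalized format described above, for example (values purely illustrative):
3 0.481 0.530 0.240 0.315
i.e. class index 3 ("Dog" in my class_list), followed by the box center x, center y, width, and height relative to the image size.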
While training on my custom dataset, the training metrics look like this:

(screenshot of training metrics)
After training, the last checkpoint is saved in the runs folder. I am currently loading the model as suggested in lazy.py, but I do not get any predictions on my pictures. I am not sure whether I am doing something wrong in training or prediction, or how to validate whether the model was trained correctly.
The configs for predictions look like this:
#######################################
config_inference.yaml:
hydra:
  run:
    dir: runs
name: v9-inf
defaults:
  - _self_
  - task: inference
  - dataset: yolo_dog
  - model: v9-m
  - general_inference
#######################################
general_inference.yaml:
device: cuda
cpu_num: 8
image_size: [640, 640]
out_path: runs
exist_ok: True
lucky_number: 10
use_wandb: False
use_tensorboard: True
weight: yolo/v9-m-custom.ckpt
#######################################
What is the correct way to load the model for predictions? I want to set up a pipeline in which I can run predictions in real time, inside a loop, on camera (not webcam) images in OpenCV format, and get results I can work with (which class was predicted, etc., like in Ultralytics).
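To make the question concrete, here is a rough sketch of the loop I am aiming for. load_model and postprocess are placeholder names I made up for whatever this repository's actual checkpoint-loading and decoding/NMS functions are, and the preprocessing (BGR to RGB, resize to 640x640, normalize) is only my assumption:
#######################################
inference_sketch.py:
import cv2
import torch

def load_model(ckpt_path):
    # placeholder: however this repo expects a trained .ckpt to be restored
    raise NotImplementedError

def postprocess(raw_output):
    # placeholder: confidence filtering + NMS, yielding (class_id, score, box) tuples
    raise NotImplementedError

model = load_model("yolo/v9-m-custom.ckpt")
model.eval().to("cuda")

cap = cv2.VideoCapture(0)  # stand-in; my frames come from an industrial camera, not a webcam
while True:
    ok, frame = cap.read()  # frame is a BGR numpy array (OpenCV format)
    if not ok:
        break
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 640))
    tensor = torch.from_numpy(img).permute(2, 0, 1).float().div(255).unsqueeze(0).to("cuda")
    with torch.no_grad():
        raw = model(tensor)
    for class_id, score, box in postprocess(raw):
        print(class_id, score, box)  # e.g. draw boxes or trigger downstream logic here
#######################################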