In this documentation, we will primarily focus on training and inference of our MOTIP model on the relevant MOT benchmarks. All the configurations corresponding to our experiments are stored in the configs folder. You can also customize the configuration files according to your own requirements.
To expedite the training process, we’ll begin by pre-training the DETR component of the model. Typically, training the DETR model on a specific dataset (like DanceTrack, SportsMOT, etc.) is quite efficient, taking only a few hours.
💾 Similar to many other methods (e.g., MOTR and MeMOTR), we also use COCO pre-trained DETR weights for initialization. You can obtain them from the following links:
- Deformable DETR: [official repo] [our repo]
To accelerate the convergence, we will first pre-train DETR on the corresponding dataset (target dataset) to serve as the initialization for subsequent MOTIP training.
💾 We recommend directly using our pre-trained DETR weights, which are stored in the model zoo. If needed, you can pre-train it yourself using the script provided below.
You should put the necessary pre-trained weights into the ./pretrains/ directory by default.
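For instance, here is a minimal sketch of placing a downloaded checkpoint (the filename r50_deformable_detr_coco.pth is only an assumed example; use the actual name of the checkpoint you downloaded):
mkdir -p ./pretrains/   # create the default directory for pre-trained weights
mv /path/to/downloaded/r50_deformable_detr_coco.pth ./pretrains/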
All our pre-training scripts follow the template below. You'll need to fill in the <placeholders> according to your requirements:
accelerate launch --num_processes=8 train.py --data-root <data dir> --exp-name <exp name> --config-path <.yaml config file path>
For example, you can pre-train a Deformable-DETR model on DanceTrack as follows:
accelerate launch --num_processes=8 train.py --data-root ./datasets/ --exp-name pretrain_r50_deformable_detr_dancetrack --config-path ./configs/pretrain_r50_deformable_detr_dancetrack.yaml
Please refer to here for more information.
Once you have the DETR pre-trained weights on the corresponding dataset (target dataset), you can use the following script to train your own MOTIP model.
All our training scripts follow the template below. You'll need to fill in the <placeholders> according to your requirements:
accelerate launch --num_processes=8 train.py --data-root <DATADIR> --exp-name <exp name> --config-path <.yaml config file path>
For example, you can train the default model on DanceTrack as follows:
accelerate launch --num_processes=8 train.py --data-root ./datasets/ --exp-name r50_deformable_detr_motip_dancetrack --config-path ./configs/r50_deformable_detr_motip_dancetrack.yaml
Using this script, you can achieve 69.5 HOTA on the DanceTrack test set. There is relatively high run-to-run variance (~1.5 HOTA), which is also encountered in other work (e.g., OC-SORT, MOTRv2, MeMOTR).
If your GPUs have less than 24 GB of CUDA memory, we offer gradient checkpointing. You can set --detr-num-checkpoint-frames to 2 (< 16 GB) or 1 (< 12 GB) to reduce the CUDA memory requirements.
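For example, assuming this flag can be passed at runtime like the other arguments shown here, the DanceTrack training command above can be re-run with two checkpointed frames for GPUs under 16 GB (all other arguments unchanged):
accelerate launch --num_processes=8 train.py --data-root ./datasets/ --exp-name r50_deformable_detr_motip_dancetrack --config-path ./configs/r50_deformable_detr_motip_dancetrack.yaml --detr-num-checkpoint-frames 2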
We have two different inference modes:
- Without ground truth annotations (e.g. DanceTrack test, SportsMOT test), submission scripts can generate tracker files for submission.
- With ground truth annotations, evaluation scripts can produce tracking results and obtain evaluation results.
📌 Different inference behaviors are controlled by the runtime parameter --inference-mode.
You can obtain the tracking results (tracker files) using the following template script:
accelerate launch --num_processes=8 submit_and_evaluate.py --data-root <DATADIR> --inference-mode submit --config-path <.yaml config file path> --inference-model <checkpoint path> --outputs-dir <outputs dir> --inference-dataset <dataset name> --inference-split <split name>
For example, you can get our default results on the DanceTrack test set as follows:
accelerate launch --num_processes=8 submit_and_evaluate.py --data-root ./datasets/ --inference-mode submit --config-path ./configs/r50_deformable_detr_motip_dancetrack.yaml --inference-model ./outputs/r50_deformable_detr_motip_dancetrack/r50_deformable_detr_motip_dancetrack.pth --outputs-dir ./outputs/r50_deformable_detr_motip_dancetrack/ --inference-dataset DanceTrack --inference-split test
🏎️ You can add --inference-dtype FP16 to the script to use float16 for inference. This can improve inference speed by over 30% with only a slight impact on tracking performance (about 0.5 HOTA on the DanceTrack test set).
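For example, the DanceTrack test submission command above can be run in float16 simply by appending this flag (all other arguments unchanged):
accelerate launch --num_processes=8 submit_and_evaluate.py --data-root ./datasets/ --inference-mode submit --config-path ./configs/r50_deformable_detr_motip_dancetrack.yaml --inference-model ./outputs/r50_deformable_detr_motip_dancetrack/r50_deformable_detr_motip_dancetrack.pth --outputs-dir ./outputs/r50_deformable_detr_motip_dancetrack/ --inference-dataset DanceTrack --inference-split test --inference-dtype FP16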
You can obtain both the tracking results (tracker files) and evaluation results using the following template script:
accelerate launch --num_processes=8 submit_and_evaluate.py --data-root <DATADIR> --inference-mode evaluate --config-path <.yaml config file path> --inference-model <checkpoint path> --outputs-dir <outputs dir> --inference-dataset <dataset name> --inference-split <split name>
For example, you can get the evaluation results on the DanceTrack val set as follows:
accelerate launch --num_processes=8 submit_and_evaluate.py --data-root ./datasets/ --inference-mode evaluate --config-path ./configs/r50_deformable_detr_motip_dancetrack.yaml --inference-model ./outputs/r50_deformable_detr_motip_dancetrack/r50_deformable_detr_motip_dancetrack.pth --outputs-dir ./outputs/r50_deformable_detr_motip_dancetrack/ --inference-dataset DanceTrack --inference-split val