Dear author, could you provide more details on the model's training time? Since the pretrained checkpoint has not been released, I would like to understand the computing resources needed to reproduce the experiments. From the paper, I only know that the model was trained on 4xA100.