Hi, thanks for your great work. After reviewing issue #82, I understand the metrics follow https://github.com/Wangt-CN/DisCo. My questions are as follows:
1. Which generated videos/frames do you use for FVD, L1, LPIPS, SSIM, and PSNR? In StableAnimator, do you use inference_basic.py or inference_op.py to generate them?
2. L1, LPIPS, SSIM, and PSNR require the source and generated videos/frames to be compared at the same resolution. Are the metrics in your paper therefore computed at 576×1024?
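For context on the resolution point: per-frame metrics like PSNR are only defined for arrays of identical shape, so generated frames must be resized to the ground-truth resolution first. A minimal sketch of the standard PSNR formula (illustrative names; this is not StableAnimator's or DisCo's evaluation code):

```python
import numpy as np

def psnr(gt: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two same-shaped frames."""
    # Both frames must already share one resolution (e.g. 576x1024).
    mse = np.mean((gt.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Tiny 2x2 example: a small pixel error gives a finite PSNR in dB.
gt = np.array([[0, 255], [255, 0]], dtype=np.uint8)
gen = np.array([[0, 250], [250, 0]], dtype=np.uint8)
print(round(psnr(gt, gen), 2))  # → 37.16
```

In practice one would average this over all frames of all test videos, which is why the chosen resolution and frame count directly change the reported numbers.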
3. How many frames do you generate for the evaluation metrics: 16 frames, 37 frames, or more?
4. It seems hard to generate 576×1024 with 37 or even 16 frames using inference_op.py (512×512 works), due to CUDA out-of-memory errors even on an A100 80GB. How do you overcome this?
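One common workaround for such OOM errors (offered as a general technique, not necessarily what StableAnimator does) is temporal chunking: run inference on overlapping windows of frames so peak memory is bounded by the window size, then blend the overlapping frames. A sketch of the window computation, assuming a hypothetical chunk size of 16 and overlap of 4:

```python
def chunk_indices(num_frames: int, chunk: int, overlap: int):
    """Yield (start, end) windows covering num_frames, with `overlap`
    frames shared between consecutive windows for blending."""
    step = chunk - overlap
    start = 0
    while start < num_frames:
        end = min(start + chunk, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += step

# 37 frames in windows of 16 with 4 overlapping frames:
print(list(chunk_indices(37, 16, 4)))  # → [(0, 16), (12, 28), (24, 37)]
```

Peak memory then scales with the 16-frame window rather than the full 37-frame clip; the overlapping frames are typically averaged (or linearly cross-faded) to avoid seams between windows.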
5. Using the evaluation code from https://github.com/Wangt-CN/DisCo, under these conditions: TikTok dataset resized to 576×1024, 37 frames, inference_basic.py, I got different numbers: {'FID': 54.470088044072604, 'L1': 3.7926431617260896e-05, 'CosineSimilarity': 0.92670786, 'PSNR': 17.10885290446992, 'SSIM': 0.7296110461665567, 'LPIPS': 0.32609654036728114}, {'FVD-3DRN50': 23.856818179213583, 'FVD-3DInception': 485.0383682794837}. Is there something wrong?

Many thanks.