_posts/2024-06-20-fvmd-1.md (+6 -6)
@@ -1,6 +1,6 @@
 ---
 layout: distill
-title: Video Evaluation Metrics 1/2 - A Review of the State of the Art
+title: A Review of Video Evaluation Metrics
 description: Video generative models have been rapidly improving recently, but how do we evaluate them efficiently and effectively? In this blog post, we review the existing evaluation metrics and highlight their pros and cons.
 tags: metrics video generative-models
 giscus_comments: true
@@ -62,7 +62,7 @@ toc:
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/video-metrics.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/video-metrics.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -113,7 +113,7 @@ The models (a) to (e) are sorted based on human ratings collected through a user
@@ -142,20 +142,20 @@ We also present visualizations of video frames for one randomly selected scene t
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/fig-eval-metric-comparison-v0.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/fig-eval-metric-comparison-v0.jpg" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 
 <details>
 <summary>click here for more frames comparison</summary>
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/fig-eval-metric-comparison-v1.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/fig-eval-metric-comparison-v1.jpg" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 </details>
 
 ## Summary
 We review the video evaluation metrics used to assess video generative models. These metrics can be categorized into two types: set-to-set comparison metrics (FID, FVD, KVD, FVMD, PSNR, and SSIM) and unary metrics (VBench, CLIP score, and IS). We discuss the pros and cons of each type and provide a detailed comparison using the TikTok dataset. The results show that the **FVMD metric aligns better with human judgments than other metrics, especially for assessing motion consistency**. This suggests that FVMD is a promising metric for evaluating video generative models.
 
-Wonder why FVMD performs so much better than other metrics? Check out [the second part of our blog post](https://qiyan98.github.io/blog/2024/fvmd-2/) to find out more! We will delve into the details of the FVMD metric and explain why it is more effective in assessing video quality and motion consistency.
+Wonder why FVMD performs so much better than other metrics? Check out [the second part of our blog post](https://dsl-lab.github.io/blog/2024/fvmd-2/) to find out more! We will delve into the details of the FVMD metric and explain why it is more effective in assessing video quality and motion consistency.
_posts/2024-06-20-fvmd-2.md (+22 -22)
@@ -1,6 +1,6 @@
 ---
 layout: distill
-title: Video Evaluation Metrics 2/2 - Evaluating Motion Consistency by Fréchet Video Motion Distance (FVMD)
+title: Evaluating Motion Consistency by Fréchet Video Motion Distance (FVMD)
 description: In this blog post, we introduce a promising new metric for video generative models, Fréchet Video Motion Distance (FVMD), which focuses on the motion consistency of generated videos.
 tags: metrics video generative-models
 giscus_comments: true
@@ -67,7 +67,7 @@ toc:
 
 Recently, diffusion models have demonstrated remarkable capabilities in high-quality image generation. This advancement has been extended to the video domain, giving rise to text-to-video diffusion models, such as [Pika](https://pika.art/home), [Runway Gen-2](https://research.runwayml.com/gen2), and [Sora](https://openai.com/index/sora/) <d-cite key="videoworldsimulators2024"></d-cite>.
 
-Despite the rapid development of video generation models, research on evaluation metrics for video generation remains insufficient (see more discussion on our [blog](https://qiyan98.github.io/blog/2024/fvmd-1/)).
+Despite the rapid development of video generation models, research on evaluation metrics for video generation remains insufficient (see more discussion on our [blog](https://dsl-lab.github.io/blog/2024/fvmd-1/)).
 For example, FID-VID <d-cite key="balaji2019conditional"></d-cite> and FVD <d-cite key="unterthiner2018towards"></d-cite> are commonly used video metrics. FID-VID focuses on visual quality by comparing synthesized *frames* to real ones, ignoring motion quality. FVD adds temporal coherence by using features from a *pre-trained action recognition model*, Inflated 3D Convnet (I3D) <d-cite key="carreira2017quo"></d-cite>.
 Recently, VBench <d-cite key="huang2023vbench"></d-cite> introduces a 16-dimensional evaluation suite for text-to-video generative models. However, VBench's protocols for temporal consistency, like temporal flickering and motion smoothness, favor videos with smooth or static movement, *neglecting high-quality videos with intense motion*, such as dancing and sports videos.
 
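The context lines in this hunk mention FID-VID and FVD, which both compare the statistics of features from real and generated videos using a Fréchet distance between fitted Gaussians. A minimal sketch of that shared computation follows. It is not taken from the FVMD repository: the function name `frechet_distance` is an illustrative assumption, and the feature extractor (per-frame features for FID-VID, I3D for FVD) is left out, so any `(n_samples, dim)` array can stand in for the features.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs are arrays of shape (n_samples, dim). The feature
    extractor itself is outside this sketch.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # covariance matrices (up to numerical error, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of (numerically) zero, and shifting every feature by a constant adds the squared norm of the shift, which is a quick way to sanity-check an implementation.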
@@ -80,7 +80,7 @@ The code is available at [GitHub](https://github.com/DSL-Lab/FVMD-frechet-video-
 ## Fréchet Video Motion Distance (FVMD)
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/pipeline.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/pipeline.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -92,10 +92,10 @@ The core idea of FVMD is to measure temporal motion consistency based on **the p
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/tracking_demo_1.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/tracking_demo_1.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/tracking_demo_2.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/tracking_demo_2.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -155,24 +155,24 @@ If two videos are of very different quality, their histograms should look very *
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/gt.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/gt.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/disco.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/disco.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/anyone.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/anyone.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/gt_tracking.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/gt_tracking.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/disco_tracking.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/disco_tracking.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/anyone_tracking.gif" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/anyone_tracking.gif" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -185,13 +185,13 @@ Above, we show three pieces of video from the TikTok dataset <d-cite key="jafari
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/gt_v_1d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/gt_v_1d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/disco_v_1d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/disco_v_1d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/anyone_v_1d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/anyone_v_1d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -208,13 +208,13 @@ This is exactly what we want to observe in the motion features! These features c
 <summary>click here for 2D histogram result</summary>
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/gt_v_2d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/gt_v_2d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/disco_v_2d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/disco_v_2d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/anyone_v_2d.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/anyone_v_2d.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -243,7 +243,7 @@ To verify the efficacy of the extracted motion features in representing motion p
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/sanity_check.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/sanity_check.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -256,7 +256,7 @@ When measuring the FVMD of **two subsets from the same dataset**, it **converges
 Moreover, a sensitivity analysis is conducted to evaluate if the proposed metric can effectively detect temporal inconsistencies in generated videos, *i.e.*, being **numerically sensitive to temporal noises**. To this end, artificially-made temporal noises are injected to the TikTok dancing dataset <d-cite key="jafarian2022self"></d-cite> and FVMD scores are computed to assess its sensitivity to data corruption.
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/sensitivity_ana.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/sensitivity_ana.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
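The sensitivity analysis described in this hunk injects artificial temporal noise into videos and checks that the metric reacts to it. The lines shown here do not say which noise types the authors use, so the sketch below is only one hypothetical corruption: random swaps of adjacent frames. The function name `swap_adjacent_frames` and its parameters are illustrative and are not part of the FVMD code.

```python
import numpy as np


def swap_adjacent_frames(video, swap_prob, seed=0):
    """Return a copy of `video` with disjoint adjacent frame pairs swapped at random.

    `video` has shape (T, H, W, C). Each pair (0, 1), (2, 3), ... is swapped
    with probability `swap_prob`, so frame content is preserved and only the
    temporal order is disturbed.
    """
    rng = np.random.default_rng(seed)
    noisy = video.copy()
    for t in range(0, video.shape[0] - 1, 2):
        if rng.random() < swap_prob:
            # Fancy indexing on the right-hand side copies the two frames
            # before assignment, so the swap is safe in place.
            noisy[[t, t + 1]] = noisy[[t + 1, t]]
    return noisy
```

Sweeping `swap_prob` from 0 to 1 and recomputing a motion-aware metric on the corrupted videos would give a curve in the spirit of the sensitivity analysis, under the assumption stated above about the noise type.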
@@ -275,7 +275,7 @@ Note that the models (a) to (e) are sorted based on human ratings collected thro
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include video.liquid path="assets/video/fvmd/FVMD.mp4" class="img-fluid rounded z-depth-1" controls=true autoplay=true %}
+{% include video.liquid path="blog/2024/fvmd/FVMD.mp4" class="img-fluid rounded z-depth-1" controls=true autoplay=true %}
 </div>
 </div>
 <div class="caption">
@@ -305,7 +305,7 @@ The second setting, **One Metric Diverse**, evaluates the agreement among differ
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/human_study_eql.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/human_study_eql.png" class="img-fluid rounded z-depth-1" %}
 </div>
 </div>
 <div class="caption">
@@ -314,7 +314,7 @@ The second setting, **One Metric Diverse**, evaluates the agreement among differ
 
 <div class="row mt-3">
 <div class="col-sm mt-3 mt-md-0">
-{% include figure.liquid loading="eager" path="assets/img/fvmd/human_study_div.png" class="img-fluid rounded z-depth-1" %}
+{% include figure.liquid loading="eager" path="blog/2024/fvmd/human_study_div.png" class="img-fluid rounded z-depth-1" %}