<!DOCTYPE html>
<html lang="en">
<head>
<title>SportsHHI Dataset</title>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="format-detection" content="telephone=no">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="author" content="">
<meta name="keywords" content="">
<meta name="description" content="">
<link rel="stylesheet" type="text/css" href="css/normalize.css">
<link rel="stylesheet" type="text/css" href="fonts/icomoon/icomoon.css">
<link rel="stylesheet" type="text/css" href="css/vendor.css">
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Cormorant+SC:wght@400;700&family=Jost:wght@300;400;700&display=swap" rel="stylesheet">
<!-- script
================================================== -->
<script src="js/modernizr.js"></script>
</head>
<body>
<div id="header-wrap">
<header id="header">
<div class="container">
<div class="inner-content">
<div class="grid">
<div class="main-logo">
<a href="index.html"><img src="pics/sv.png" alt="logo"></a>
</div>
<nav id="navbar">
<div class="main-menu">
<ul class="menu-list">
<li class="menu-item"><a href="index.html" data-effect="Home">Home</a></li>
<li class="menu-item"><a href="SportsAction.html" data-effect="About">SportsAction</a></li>
<li class="menu-item"><a href="SportsMOT.html" data-effect="Services">SportsMOT</a></li>
<li class="menu-item active"><a href="SportsHHI.html" class="active" data-effect="Projects">SportsHHI</a></li>
<li class="menu-item"><a href="SportsShot.html" data-effect="Latest Blog">SportsShot</a></li>
<li class="menu-item"><a href="SportsGrounding.html" data-effect="Testimonial">SportsGrounding</a></li>
</ul>
<div class="hamburger">
<span class="bar"></span>
<span class="bar"></span>
<span class="bar"></span>
</div>
</div>
</nav>
</div>
</div>
</div>
</header>
</div><!--header-wrap-->
<section id="billboard">
<div class="main-banner pattern-overlay">
<div class="banner-content" data-aos="fade-up">
<h3 class="banner-title">SportsHHI Dataset</h3>
<h2 class="section-subtitle ">SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos</h2>
<p>✉<a href="">Tao Wu</a>   ✉<a href="">Runyu He</a></p>
<p>✉<a href="http://mcg.nju.edu.cn/member/gswu/en/index.html">Gangshan Wu</a>   ✉<a href="http://wanglimin.github.io/">Limin Wang</a></p>
<div style="height: 20px;"></div>
<p><a href="http://mcg.nju.edu.cn/en/index.html">MCG Group @ Nanjing University</a></p>
<div class="btn-wrap">
<a href="https://arxiv.org/abs/2404.04565" class="btn-accent">paper</a>
<a href="https://github.com/MCG-NJU/SportsHHI" class="btn-accent">github</a>
</div>
</div><!--banner-content-->
<figure>
<div style="height: 20px;"></div>
<img src="pics/hhi1.png" alt="banner" class="banner-image">
<!-- <div style="height: 20px;"></div> -->
<small>For comparison, the upper row shows three relation instances from the VidVRD and AG datasets; the bottom row shows interaction annotations in two sample keyframes of <i>SportsHHI</i>. Bounding boxes and interaction annotations belonging to the same instance are drawn in the same color. <i>SportsHHI</i> provides complex multi-person scenes in which various interactions between pairs of people occur concurrently, and it focuses on high-level interactions that require detailed spatio-temporal context reasoning.</small>
</figure>
</div>
</section>
<button id="scrollToTopBtn">Top</button>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<!-- <h2 class="section-subtitle liner">About Us</h2> -->
<h3 class="section-title">Abstract</h3>
</div>
<div class="detail-wrap">
<p>Video-based visual relation detection tasks, such as video scene graph generation, play an important role in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder progress in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types in existing datasets have relatively low-level semantics and can often be recognized from appearance or simple prior information, without the need for detailed spatio-temporal context reasoning. Yet comprehending high-level interactions between humans is crucial for understanding complex multi-person videos, such as sports and surveillance videos. To address this issue, we propose a new video visual relation detection task, video human-human interaction detection, and build a dataset named <i>SportsHHI</i> for it. SportsHHI contains 34 high-level interaction classes from basketball and volleyball. In total, 118,075 human bounding boxes and 50,649 interaction instances are annotated on 11,398 keyframes. To benchmark this task, we propose a two-stage baseline method and conduct extensive experiments to reveal the key factors for a successful human-human interaction detector. We hope that SportsHHI can stimulate research on human interaction understanding in videos and promote the development of spatio-temporal context modeling techniques in video visual relation detection.</p>
</div><!--description-->
</div>
</div><!--inner-content-->
</div>
</div>
</section>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<h3 class="section-title">Demo Video</h3>
</div>
<div class="detail-wrap">
<p>Please select "1080P" for a better viewing experience.</p>
</div>
<div class="video-container">
<video width="100%" height="100%" controls>
<source src="pics/demo1.mp4" type="video/mp4">
</video>
</div>
<div style="height: 20px;"></div>
<div class="video-container">
<video width="100%" height="100%" controls>
<source src="pics/demo2.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
</div>
</div>
</section>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<h3 class="section-title">Data Construction</h3>
</div>
<div class="detail-wrap">
<h2 class="section-subtitle liner">Interaction class definitions</h2>
<p>We focus on high-level interactions between athletes, which are semantically complex and require detailed spatio-temporal context reasoning to recognize. Under the guidance of professional athletes, we produced the final interaction vocabulary through iterative rounds of trial labeling and revision.</p>
</div>
<figure>
<img src="pics/hhi21.png" alt="category" style="width: 60%; height: auto;">
</figure>
<div style="height: 40px;"></div>
<div class="detail-wrap">
<h2 class="section-subtitle liner">Interaction instance formulation</h2>
<p>Following common practice in the AVA and AG datasets, we define interaction instances at the frame level, with reference to a long-term spatio-temporal context. Each interaction instance is formulated as a triplet ⟨S,I,O⟩, where S and O denote the bounding boxes of the subject and object persons, and I denotes the interaction category between them, drawn from the interaction vocabulary. When the subject or object person is out of view, we annotate S or O as “invisible”. This happens infrequently; statistics on it are provided in the appendix of the paper.</p>
</div>
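<div class="detail-wrap">
<p>To make the triplet concrete, the Python sketch below shows one possible in-memory representation. It is illustrative only: the class and field names, the (x1, y1, x2, y2) pixel box convention, and the use of <code>None</code> for an “invisible” person are assumptions rather than the released annotation format, and <code>"example_class"</code> is a placeholder, not a real SportsHHI interaction class.</p>
```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class InteractionInstance:
    """One frame-level triplet <S, I, O> (hypothetical layout, not the official schema)."""

    subject: Optional[Box]  # S: subject person box, None if "invisible"
    interaction: str        # I: class name from the interaction vocabulary
    obj: Optional[Box]      # O: object person box, None if "invisible"

    @property
    def subject_visible(self) -> bool:
        return self.subject is not None

    @property
    def object_visible(self) -> bool:
        return self.obj is not None


# A visible subject interacting with an object person who is out of view.
inst = InteractionInstance(
    subject=(100.0, 50.0, 180.0, 300.0),
    interaction="example_class",  # placeholder, not a real SportsHHI class name
    obj=None,
)
print(inst.subject_visible, inst.object_visible)  # True False
```
<p>Representing a missing person as <code>None</code> instead of a dummy box forces downstream code to handle the “invisible” case explicitly.</p>
</div>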
<div class="detail-wrap">
<h2 class="section-subtitle liner">Data preparation</h2>
<p>We carefully selected 80 basketball and 80 volleyball videos from the MultiSports dataset to cover various types of games, including men’s, women’s, national-team, and club games. The videos average 603 frames in length, have a frame rate of 25 FPS, and are all in high-resolution 720P.</p>
</div>
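<div class="detail-wrap">
<p>As a rough back-of-the-envelope check on the scale of the source footage, the average clip length converts to about 24 seconds, and the 160 selected videos together hold roughly 96,000 frames. Because 603 frames is an average, the total is an estimate rather than an exact count.</p>
```latex
\bar{T} = \frac{603\ \text{frames}}{25\ \text{frames/s}} = 24.12\ \text{s},
\qquad
N_{\text{frames}} \approx (80 + 80) \times 603 = 96{,}480\ \text{frames}
```
</div>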
</div>
</div>
</div>
</div>
</section>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<!-- <h2 class="section-subtitle liner">About Us</h2> -->
<h3 class="section-title">Dataset Statistics and Characteristics</h3>
</div>
<div class="detail-wrap">
<p>SportsHHI provides interaction instance annotations on keyframes of basketball and volleyball videos. As shown in Table 1, SportsHHI contains 34 interaction classes, and 11,398 keyframes in total are annotated with instances. Current video scene graph datasets deal with general relations between many kinds of objects, whereas SportsHHI focuses on high-level interactions between humans, so it is expected that video scene graph generation datasets are larger in scale. Nevertheless, SportsHHI is comparable in size to the popular VidVRD dataset for video scene graph generation: it has more annotated keyframes (11,398 versus 5,834) and a similar number of interaction instances (50,649 versus 55,631). An important characteristic of SportsHHI is its multi-person scenarios. The average number of people per frame in SportsHHI is much higher than in AG and VidVRD. AG contains only one person in each video, and there are virtually no multi-person scenarios in VidVRD videos, so human-human interactions are barely involved in these datasets.</p>
<h2 class="section-subtitle liner">Table 1: Comparison of statistics between video scene graph generation datasets and SportsHHI</h2>
</div>
<figure>
<img src="pics/hhi31.png" alt="category" style="width: 60%; height: auto;">
</figure>
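<div class="detail-wrap">
<p>The totals given in the abstract also yield simple per-keyframe averages, which put the multi-person character of SportsHHI in concrete terms. These are plain ratios of the reported totals; they assume each annotated human box corresponds to one person and say nothing about how the counts vary from frame to frame.</p>
```latex
\frac{118{,}075\ \text{human boxes}}{11{,}398\ \text{keyframes}} \approx 10.36\ \text{boxes per keyframe},
\qquad
\frac{50{,}649\ \text{interaction instances}}{11{,}398\ \text{keyframes}} \approx 4.44\ \text{instances per keyframe}
```
</div>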
<div style="height: 30px;"></div>
<div class="detail-wrap">
<h2 class="section-subtitle liner">Number of interaction instances per class, sorted in descending order</h2>
</div>
<figure>
<img src="pics/hhi41.png" alt="statistics" style="width: 80%; height: auto;">
</figure>
<div style="height: 30px;"></div>
<div class="detail-wrap">
<h2 class="section-subtitle liner">Statistics comparison between SportsHHI and VidVRD</h2>
</div>
<figure>
<img src="pics/hhi51.png" alt="statistics" style="width: 60%; height: auto;">
</figure>
</div>
</div>
</div>
</div>
</section>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<!-- <h2 class="section-subtitle liner">About Us</h2> -->
<h3 class="section-title">Metrics</h3>
</div>
<div class="detail-wrap">
<p>A prediction is considered a true positive if and only if its subject and object bounding boxes both have an IoU overlap higher than a preset threshold with their counterparts in a ground-truth interaction instance, and the predicted interaction class matches the ground truth. Following VidVRD, we set the IoU threshold to 0.5. Following the video scene graph generation task, we use Recall@K as an evaluation metric, where K is the number of top-scoring predictions considered. Mean average precision (mAP) is widely used to evaluate detection tasks, but many earlier visual relation detection benchmarks discarded it because their annotations are incomplete. SportsHHI does not have this issue, so we also report mAP for the interaction detection task; we argue that mAP is a more difficult and more informative metric. Two modes are used for model training and evaluation: 1) human-human interaction detection (HHIDet), which takes video frames as input and predicts both human boxes and interaction labels; and 2) human-human interaction classification (HHICls), which directly uses ground-truth human bounding boxes and predicts only the human-human interaction classes.</p>
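<p>The true-positive rule and Recall@K can be sketched in Python as follows. This is an illustrative sketch, not the official evaluation code: the (x1, y1, x2, y2) box convention, the greedy score-ordered matching in which each ground-truth instance can be claimed only once, and computing Recall@K per keyframe are assumptions. The strict <code>&gt;</code> comparison follows the wording “higher than” in the metric definition.</p>
```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) convention
Triplet = Tuple[Box, str, Box]           # (subject box, interaction class, object box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Triplet, gt: Triplet, thr: float = 0.5) -> bool:
    """True-positive test: same class, and both boxes overlap their counterparts by more than thr."""
    return pred[1] == gt[1] and iou(pred[0], gt[0]) > thr and iou(pred[2], gt[2]) > thr


def recall_at_k(preds: Sequence[Tuple[float, Triplet]], gts: Sequence[Triplet],
                k: int, thr: float = 0.5) -> float:
    """Fraction of ground-truth triplets in one keyframe recovered by the top-k predictions."""
    if not gts:
        raise ValueError("Recall is undefined for a keyframe without ground truth")
    top_k = sorted(preds, key=lambda p: p[0], reverse=True)[:k]
    claimed: List[bool] = [False] * len(gts)
    for _, pred in top_k:
        for j, gt in enumerate(gts):
            if not claimed[j] and is_match(pred, gt, thr):
                claimed[j] = True  # each ground-truth instance is matched at most once
                break
    return sum(claimed) / len(gts)


# Toy keyframe with placeholder class names "a", "b", "c".
gts = [((0, 0, 10, 10), "a", (20, 20, 30, 30)),
       ((40, 40, 50, 50), "b", (0, 0, 10, 10))]
preds = [(0.9, ((0, 0, 10, 10), "a", (20, 20, 30, 30))),   # matches the first instance
         (0.8, ((40, 40, 50, 50), "c", (0, 0, 10, 10))),   # wrong class, no match
         (0.7, ((41, 40, 51, 50), "b", (0, 0, 10, 10)))]   # subject IoU = 90/110, about 0.82
print(recall_at_k(preds, gts, k=2), recall_at_k(preds, gts, k=3))  # 0.5 1.0
```
<p>In the toy keyframe, the class names are placeholders: the top prediction recovers one of the two ground-truth instances, the second has the wrong class, and the third, whose subject box is shifted by one pixel, still clears the 0.5 IoU threshold and recovers the other instance once K reaches 3.</p>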
</div>
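<div class="detail-wrap">
<p>For mAP, a common recipe (assumed here, not taken from the official evaluation code) is to rank all predictions of one interaction class by confidence, label each one a true or false positive with the matching rule, compute average precision (AP) from the resulting precision-recall curve, and average AP over classes. The Python sketch below uses all-point interpolation of the precision-recall curve, which is only one of several interpolation schemes in use; the official SportsHHI evaluation may differ.</p>
```python
from typing import List, Sequence, Tuple


def average_precision(scored: Sequence[Tuple[float, bool]], num_gt: int) -> float:
    """AP for one interaction class.

    `scored` holds (confidence, is_true_positive) for every prediction of the class,
    with the true/false-positive flag already decided by the IoU and class matching rule.
    """
    if num_gt <= 0:
        raise ValueError("AP is undefined without ground-truth instances")
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: sum precision over each recall step.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    if not per_class_ap:
        raise ValueError("no classes to average")
    return sum(per_class_ap) / len(per_class_ap)


# Toy class with 2 ground-truth instances: ranked predictions are TP, FP, TP.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(round(ap, 4))  # 0.8333
```
<p>Unlike Recall@K, AP penalizes every false positive ranked above a true positive, which is one reason mAP is the stricter of the two metrics when annotations are complete.</p>
</div>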
</div>
</div>
</div>
</div>
</section>
<section id="about">
<div class="container">
<div class="row">
<div class="inner-content">
<div class="abstract-entry" data-aos="fade-up">
<div class="section-header">
<!-- <h2 class="section-subtitle liner">About Us</h2> -->
<h3 class="section-title">Download</h3>
</div>
<div class="detail-wrap">
<p>Please visit the Hugging Face page or the competition page to download the dataset and find more information.</p>
</div>
<div class="btn-wrap">
<a href="https://huggingface.co/datasets/MCG-NJU/SportsHHI" class="btn-accent">hugging face</a>
<a href="https://codalab.lisn.upsaclay.fr/competitions/20978" class="btn-accent">competition</a>
</div>
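<div class="detail-wrap">
<p>For scripted downloads, the sketch below calls the <code>snapshot_download</code> function of the <code>huggingface_hub</code> Python package to fetch the dataset repository linked above. It is a convenience sketch, not an official instruction: the helper name and target directory are hypothetical, and if the repository is gated you may need to accept its terms on the Hugging Face page and log in first. The file layout of the downloaded repository is not described on this page.</p>
```python
def download_sportshhi(local_dir: str = "SportsHHI") -> str:
    """Download a local copy of the MCG-NJU/SportsHHI dataset repository (hypothetical helper).

    Requires `pip install huggingface_hub`. Returns the local path of the snapshot.
    """
    # Imported inside the function so that merely defining the helper does not
    # require huggingface_hub to be installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="MCG-NJU/SportsHHI",  # taken from the Hugging Face link on this page
        repo_type="dataset",          # the repository is a dataset, not a model
        local_dir=local_dir,
    )
```
<p>Calling <code>download_sportshhi()</code> places the files in a <code>SportsHHI</code> folder under the current working directory and returns the path of the local copy.</p>
</div>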
<div style="height: 50px;"></div>
</div>
</div>
</div>
</div>
</section>
<div id="footer-bottom">
<div class="container">
<div class="grid">
<div class="copyright">
<p>© 2024 <a href="https://mcg.nju.edu.cn/">Multimedia Computing Group, Nanjing University.</a> All rights reserved.</p>
</div>
</div><!--grid-->
</div>
</div>
<script src="js/jquery-1.11.0.min.js"></script>
<script src="js/plugins.js"></script>
<script src="js/slideNav.min.js"></script>
<script src="js/slideNav.js"></script>
<script src="js/script.js"></script>
</body>
</html>