Commit 36ad399

linjintao authored and linjintao.vendor committed
first init
1 parent 5d60bb5 commit 36ad399

File tree

78 files changed (+13484 −1 lines)


.gitignore (+11)

```
*.swp
*.pyc
*.ipynb

**/__pycache__/**
cached/*
checkpoints/*
data/*
pretrained/*
slrum_scripts/*
slrum_configs/*
```

README.md (+174 −1)
# VLG: General Video Recognition with Web Textual Knowledge

## Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+, and the other required packages as follows:

```
conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
```

## Data preparation

### Kinetics-Close/Kinetics-LT

Download the Kinetics videos from [here](https://github.com/open-mmlab/mmaction2/tree/master/tools/data/kinetics).

Then download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400/
  videos_train/
    vid1.mp4
    ...
  videos_val/
    vid2.mp4
    ...
  wiki/
    desc_0.txt
    ...
  k400_LT_train_videos.txt
  k400_LT_val_videos.txt
  kinetics_video_train_list.txt
  kinetics_video_val_list.txt
  labels.txt
```
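The list files above (e.g. `kinetics_video_train_list.txt`) pair each video with its label. Their exact format is not documented in this README, so the sketch below assumes a simple `<video_path> <label_index>` line format; the repo's real lists may differ (e.g. they may carry a frame-count column):

```python
# Minimal sketch of parsing a video list file, ASSUMING each line is
# "<relative_video_path> <label_index>". This is an illustration, not
# the repo's actual loader.
from pathlib import Path
import tempfile

def parse_video_list(path):
    samples = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        video, label = line.rsplit(maxsplit=1)
        samples.append((video, int(label)))
    return samples

# Self-contained demo with a synthetic list file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("videos_train/vid1.mp4 0\nvideos_train/vid2.mp4 3\n")
    list_path = f.name

print(parse_video_list(list_path))
# → [('videos_train/vid1.mp4', 0), ('videos_train/vid2.mp4', 3)]
```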
### Kinetics-Fewshot

We used the split from [CMN](https://github.com/ffmpbgrnn/CMN/tree/master/kinetics-100) for Kinetics-Fewshot.

Download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics100_base
  wiki/
    desc_0.txt
    ...
  k100_base_train_list.txt
  labels.txt
./data/kinetics100_test
  wiki/
    desc_0.txt
    ...
  k100_support_query_list.txt
  labels.txt
```

### Kinetics-Fewshot-C-way

We used the split from [Efficient-Prompt](https://github.com/ju-chen/Efficient-Prompt) for Kinetics-Fewshot-C-way.

Download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400_fewshot_C
  wiki/
    desc_0.txt
    ...
  k400_fewshot_c_train_split_0.txt
  k400_fewshot_c_train_split_1.txt
  ...
  k400_fewshot_c_train_split_9.txt
  kinetics_video_val_list.txt
  labels.txt
```

### Kinetics-Openset

Download the split from [here]() for Kinetics-Openset.

Then download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400_openset
  wiki/
    desc_0.txt
    ...
  k400_openset_train_list.txt
  k400_openset_val_list.txt
  labels.txt
```
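In the open-set setting, the model must recognize the training classes while rejecting videos from unseen ones. As a generic illustration of that idea (not this repo's actual method, which builds on OpenMax), a classifier can reject a clip when its maximum softmax probability falls below a threshold:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_open_set(logits, threshold=0.5):
    """Return a known-class index, or -1 ("unknown") when the max
    softmax probability is below the rejection threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1

print(predict_open_set([4.0, 0.1, 0.2]))   # confident → class 0
print(predict_open_set([0.5, 0.4, 0.45]))  # ambiguous → -1 (unknown)
```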
## Evaluation

To evaluate VLG, you can run:

- Pre-training stage:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain
  ```

- Fine-tuning stage:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval
  ```

For few-shot cases, you can run:

```
bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8
```

For open-set cases, you can run:

```
bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval
```

`${CONFIG_PATH}` is the relative path of the corresponding configuration file in the `config` directory.

## Training

To train VLG on a single node with 8 GPUs:

- Pre-training stage, run:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8
  ```

- Fine-tuning stage:

  - First, select the salient sentences by running:

    ```
    bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select
    ```

  - Then run:

    ```
    bash dist_train_arun.sh ${CONFIG_PATH} 8
    ```

`${CONFIG_PATH}` is the relative path of the corresponding configuration file in the `config` directory.
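The configuration files in this commit are plain Python modules that define a `cfg` dict. A minimal sketch of loading such a file could look like this (the `load_cfg` helper is hypothetical, not part of the repo's API):

```python
# Sketch: load a "cfg = dict(...)"-style Python config file.
# load_cfg is a hypothetical helper for illustration only.
import runpy
import tempfile

def load_cfg(config_path):
    # Execute the module and pull out its top-level `cfg` dict.
    return runpy.run_path(config_path)["cfg"]

# Demo with a tiny synthetic config file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("cfg = dict(epochs=50, batch_size=16)\n")
    path = f.name

cfg = load_cfg(path)
print(cfg["epochs"], cfg["batch_size"])  # → 50 16
```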
## Pretrained Models

The pretrained models are provided on [Baidu Netdisk](); the extraction code is xxx.

## Citation

If you are interested in our work, please cite it as follows:

```
@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}
```

## Acknowledgements

This repo contains modified code from [VL-LTR](https://github.com/ChangyaoTian/VL-LTR), [ActionCLIP](https://github.com/sallymmx/ActionCLIP), and [OpenMax](https://github.com/ma-xu/Open-Set-Recognition/tree/master/OSR/OpenMax).
New file (+55 lines):

```
cfg = dict(
    pretrain_model='CVLP_r50',
    finetune_model='LGR_r50_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/RN50.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_r50/',
    vis_backbone_path='checkpoints/k100_base_pretrain_r50/checkpoint.pth',

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,
    val_interval=10,
    save_interval=50,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    dropout=0.,
    emb_dropout=0.,
    consider_fusion_module=False,

    alpha=0.1,
    C=0.316,
    max_iter=1000,
    verbose=0
)
```
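The `n_way`/`n_support`/`n_query`/`n_eposide` entries above describe episodic few-shot evaluation: each episode samples 5 classes, with 5 support and 20 query clips per class. A minimal sketch of that sampling over synthetic data (not the repo's actual loader):

```python
import random

def sample_episode(samples_by_class, n_way=5, n_support=5, n_query=20):
    """Sample one few-shot episode: n_way classes, each contributing
    n_support support items and n_query query items."""
    classes = random.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for c in classes:
        items = random.sample(samples_by_class[c], n_support + n_query)
        support += [(x, c) for x in items[:n_support]]
        query += [(x, c) for x in items[n_support:]]
    return support, query

# Synthetic dataset: 24 classes with 30 clips each.
data = {c: [f"clip_{c}_{i}" for i in range(30)] for c in range(24)}
support, query = sample_episode(data)
print(len(support), len(query))  # → 25 100
```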
New file (+50 lines):

```
cfg = dict(
    pretrain_model='CVLP_vit16',
    finetune_model='LGR_vit16_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/ViT-B-16.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_vit16/',
    vis_backbone_path='checkpoints/k100_base_pretrain_vit16/checkpoint.pth',
    op_type='cosine',
    with_param=False,

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    consider_fusion_module=False,

    alpha=0.0
)
```
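The `op_type='cosine'` entry above suggests matching is done by cosine similarity between embeddings; as a generic illustration of that operation (the repo's actual operator may differ):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```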
New file (+55 lines):

```
cfg = dict(
    pretrain_model='CVLP_r50',
    finetune_model='LGR_r50_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/RN50.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_r50/',
    vis_backbone_path='checkpoints/k100_base_pretrain_r50/checkpoint.pth',

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,
    val_interval=10,
    save_interval=50,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    dropout=0.,
    emb_dropout=0.,
    consider_fusion_module=False,

    alpha=0.1,
    C=0.316,
    max_iter=1000,
    verbose=0
)
```
