Commit 36ad399

linjintao authored and linjintao.vendor committed
first init
1 parent 5d60bb5 commit 36ad399

File tree

78 files changed (+13484 −1 lines)


.gitignore (+11)

```
*.swp
*.pyc
*.ipynb

**/__pycache__/**
cached/*
checkpoints/*
data/*
pretrained/*
slrum_scripts/*
slrum_configs/*
```

README.md (+174 −1)
# VLG: General Video Recognition with Web Textual Knowledge

## Usage

First, install PyTorch 1.7.1+, torchvision 0.8.2+, and the other required packages as follows:

```
conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install mmcv==1.3.14
pip install decord
pip install git+https://github.com/ildoonet/pytorch-randaugment
```

## Data preparation

### Kinetics-Close/Kinetics-LT

Download the Kinetics videos from [here](https://github.com/open-mmlab/mmaction2/tree/master/tools/data/kinetics).

Then download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400/
  videos_train/
    vid1.mp4
    ...
  videos_val/
    vid2.mp4
    ...
  wiki/
    desc_0.txt
    ...
  k400_LT_train_videos.txt
  k400_LT_val_videos.txt
  kinetics_video_train_list.txt
  kinetics_video_val_list.txt
  labels.txt
```
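The list files above (e.g. `kinetics_video_train_list.txt`) pair each video with its label. Their exact format is not documented in this README, so the sketch below assumes a simple `<video_path> <label_index>` line format; the repo's real lists may differ (e.g. they may carry a frame-count column):

```python
# Minimal sketch of parsing a video list file, ASSUMING each line is
# "<relative_video_path> <label_index>". This is an illustration, not
# the repo's actual loader.
from pathlib import Path
import tempfile

def parse_video_list(path):
    samples = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        video, label = line.rsplit(maxsplit=1)
        samples.append((video, int(label)))
    return samples

# Self-contained demo with a synthetic list file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("videos_train/vid1.mp4 0\nvideos_train/vid2.mp4 3\n")
    list_path = f.name

print(parse_video_list(list_path))
# → [('videos_train/vid1.mp4', 0), ('videos_train/vid2.mp4', 3)]
```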
### Kinetics-Fewshot

We used the split from [CMN](https://github.com/ffmpbgrnn/CMN/tree/master/kinetics-100) for Kinetics-Fewshot.

Download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics100_base
  wiki/
    desc_0.txt
    ...
  k100_base_train_list.txt
  labels.txt
./data/kinetics100_test
  wiki/
    desc_0.txt
    ...
  k100_support_query_list.txt
  labels.txt
```

### Kinetics-Fewshot-C-way

We used the split from [Efficient-Prompt](https://github.com/ju-chen/Efficient-Prompt) for Kinetics-Fewshot-C-way.

Download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400_fewshot_C
  wiki/
    desc_0.txt
    ...
  k400_fewshot_c_train_split_0.txt
  k400_fewshot_c_train_split_1.txt
  ...
  k400_fewshot_c_train_split_9.txt
  kinetics_video_val_list.txt
  labels.txt
```

### Kinetics-Openset

Download the split from [here]() for Kinetics-Openset.

Then download and extract the [wiki text]() into the same directory. The directory tree of the data is expected to look like this:

```
./data/kinetics400_openset
  wiki/
    desc_0.txt
    ...
  k400_openset_train_list.txt
  k400_openset_val_list.txt
  labels.txt
```
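In the open-set setting, the model must recognize the training classes while rejecting videos from unseen ones. As a generic illustration of that idea (not this repo's actual method, which builds on OpenMax), a classifier can reject a clip when its maximum softmax probability falls below a threshold:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_open_set(logits, threshold=0.5):
    """Return a known-class index, or -1 ("unknown") when the max
    softmax probability is below the rejection threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1

print(predict_open_set([4.0, 0.1, 0.2]))   # confident → class 0
print(predict_open_set([0.5, 0.4, 0.45]))  # ambiguous → -1 (unknown)
```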
## Evaluation

To evaluate VLG, you can run:

- Pre-training stage:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --eval-pretrain
  ```

- Fine-tuning stage:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval
  ```

For few-shot cases, you can run:

```
bash dist_train_arun_fewshot.sh ${CONFIG_PATH} 8
```

For open-set cases, you can run:

```
bash dist_train_arun_openset.sh ${CONFIG_PATH} 8 --test --dist-eval --eval
```

`${CONFIG_PATH}` is the relative path of the corresponding configuration file in the `config` directory.

## Training

To train VLG on a single node with 8 GPUs:

- Pre-training stage, run:

  ```
  bash dist_train_arun.sh ${CONFIG_PATH} 8
  ```

- Fine-tuning stage:

  - First, select the salient sentences by running:

    ```
    bash dist_train_arun.sh ${CONFIG_PATH} 8 --eval --select
    ```

  - Then run:

    ```
    bash dist_train_arun.sh ${CONFIG_PATH} 8
    ```

`${CONFIG_PATH}` is the relative path of the corresponding configuration file in the `config` directory.
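The configuration files in this commit are plain Python modules that define a `cfg` dict. A minimal sketch of loading such a file could look like this (the `load_cfg` helper is hypothetical, not part of the repo's API):

```python
# Sketch: load a "cfg = dict(...)"-style Python config file.
# load_cfg is a hypothetical helper for illustration only.
import runpy
import tempfile

def load_cfg(config_path):
    # Execute the module and pull out its top-level `cfg` dict.
    return runpy.run_path(config_path)["cfg"]

# Demo with a tiny synthetic config file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("cfg = dict(epochs=50, batch_size=16)\n")
    path = f.name

cfg = load_cfg(path)
print(cfg["epochs"], cfg["batch_size"])  # → 50 16
```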
## Pretrained Models

The pretrained models are provided on [Baidu Netdisk](); the extraction code is xxx.

## Citation

If you are interested in our work, please cite it as follows:

```
@article{lin2022vlg,
  title={VLG: General Video Recognition with Web Textual Knowledge},
  author={Lin, Jintao and Liu, Zhaoyang and Wang, Wenhai and Wu, Wayne and Wang, Limin},
  journal={arXiv preprint arXiv:2212.01638},
  year={2022}
}
```

## Acknowledgements

This repo contains modified code from [VL-LTR](https://github.com/ChangyaoTian/VL-LTR), [ActionCLIP](https://github.com/sallymmx/ActionCLIP), and [OpenMax](https://github.com/ma-xu/Open-Set-Recognition/tree/master/OSR/OpenMax).
New file (+55 lines):

```
cfg = dict(
    pretrain_model='CVLP_r50',
    finetune_model='LGR_r50_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/RN50.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_r50/',
    vis_backbone_path='checkpoints/k100_base_pretrain_r50/checkpoint.pth',

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,
    val_interval=10,
    save_interval=50,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    dropout=0.,
    emb_dropout=0.,
    consider_fusion_module=False,

    alpha=0.1,
    C=0.316,
    max_iter=1000,
    verbose=0
)
```
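The `n_way`/`n_support`/`n_query`/`n_eposide` entries above describe episodic few-shot evaluation: each episode samples 5 classes, with 5 support and 20 query clips per class. A minimal sketch of that sampling over synthetic data (not the repo's actual loader):

```python
import random

def sample_episode(samples_by_class, n_way=5, n_support=5, n_query=20):
    """Sample one few-shot episode: n_way classes, each contributing
    n_support support items and n_query query items."""
    classes = random.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for c in classes:
        items = random.sample(samples_by_class[c], n_support + n_query)
        support += [(x, c) for x in items[:n_support]]
        query += [(x, c) for x in items[n_support:]]
    return support, query

# Synthetic dataset: 24 classes with 30 clips each.
data = {c: [f"clip_{c}_{i}" for i in range(30)] for c in range(24)}
support, query = sample_episode(data)
print(len(support), len(query))  # → 25 100
```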
New file (+50 lines):

```
cfg = dict(
    pretrain_model='CVLP_vit16',
    finetune_model='LGR_vit16_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/ViT-B-16.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_vit16/',
    vis_backbone_path='checkpoints/k100_base_pretrain_vit16/checkpoint.pth',
    op_type='cosine',
    with_param=False,

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    consider_fusion_module=False,

    alpha=0.0
)
```
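The `op_type='cosine'` entry above suggests matching is done by cosine similarity between embeddings; as a generic illustration of that operation (the repo's actual operator may differ):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```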
New file (+55 lines):

```
cfg = dict(
    pretrain_model='CVLP_r50',
    finetune_model='LGR_r50_no_init',
    desc_path='data/kinetics100_test',
    data_root_train='data/kinetics400/',
    data_root_val='data/kinetics400/',
    pretrained_clip='pretrained/RN50.pt',
    context_length=75,
    pretrain_cvlp_path='checkpoints/k100_base_pretrain_r50/',
    vis_backbone_path='checkpoints/k100_base_pretrain_r50/checkpoint.pth',

    use_mcloader=True,
    data_set='k100_support_query',
    dataset='k100_support_query',
    drop_last=True,
    index_bias=0,
    nb_classes=5,

    train_mode=False,
    train_list_file='data/kinetics100_test/k100_support_query_list.txt',
    val_list_file='data/kinetics100_test/k100_support_query_list.txt',

    epochs=50,
    batch_size=16,
    use_res=True,
    val_interval=10,
    save_interval=50,

    repeated_aug=False,
    mixup=0.,
    cutmix=0.,
    clip_ms=True,
    num_segments=16,
    new_length=1,
    is_video=True,
    select_num=50,

    io_backend='disk',
    only_video=False,
    num_classes=24,

    n_way=5,
    n_support=5,
    n_query=20,
    n_eposide=200,

    dropout=0.,
    emb_dropout=0.,
    consider_fusion_module=False,

    alpha=0.1,
    C=0.316,
    max_iter=1000,
    verbose=0
)
```
