Authors' code for the paper "MMT: Image-guided Story Ending Generation with Multimodal Memory Transformer", ACM MM 2022.
- Python == 3.9
- PyTorch == 1.12.1
- stanfordcorenlp == 3.9.1.1 with stanford-corenlp-4.2.2
- transformers == 4.12.5
- pycocoevalcap (https://github.com/sks3i/pycocoevalcap)
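The requirements above can be set up, for example, as sketched below. This is a minimal sketch, not the authors' exact procedure: the environment name is made up, and the plain pip install of torch may need to be adapted to your CUDA version.

```bash
# Hypothetical environment setup; versions are taken from the requirements list above.
conda create -n mmt python=3.9
conda activate mmt
pip install torch==1.12.1 transformers==4.12.5 stanfordcorenlp==3.9.1.1
# pycocoevalcap: see https://github.com/sks3i/pycocoevalcap for installation.
# The Stanford CoreNLP 4.2.2 Java package is downloaded separately
# (https://stanfordnlp.github.io/CoreNLP/) and its path is passed to stanfordcorenlp.
```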
VIST-E: download SIS-with-labels.tar.gz (https://visionandlanguage.net/VIST/dataset.html) and the image features (https://vist-arel.s3.amazonaws.com/resnet_features.zip), and put them in `data/VIST-E`.
LSMDC-E: download the LSMDC 2021 version (task1_2021.zip, resnet152_200.zip) (https://sites.google.com/site/describingmovies/home) and put them in `data/LSMDC-E`. NOTE: Due to the LSMDC agreement, we cannot share the data with any third party.
We utilize GloVe embeddings; please download `glove.6B.300d.txt` and put it in `data/`.
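Of these downloads, the VIST ResNet features and the GloVe vectors have direct links; a possible fetch sketch follows. The GloVe URL is an assumption (the standard Stanford NLP mirror), since the README only names the file.

```bash
# Hypothetical download commands; the GloVe URL is not from this README.
mkdir -p data/VIST-E data/LSMDC-E
wget https://vist-arel.s3.amazonaws.com/resnet_features.zip -P data/VIST-E/
wget https://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip glove.6B.300d.txt -d data/   # extract only the 300d vectors
```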
VIST-E (see the command sketch after this list):
- Unzip SIS-with-labels.tar.gz to `data/VIST-E`.
- Unzip the conv features in resnet_features.zip to a folder `data/VIST-E/image_features` without any subfolders.
- Run `data/VIST-E/annotations.py`.
- Run `data/VIST-E/img_feat_path.py`.
- Run `data/VIST-E/pro_label.py`.
- Run `data/embed_vocab.py` and make sure the parameter `dataset` is set to VIST-E.
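A minimal command-line sketch of the steps above, assuming the archives have been placed in `data/VIST-E` and that the scripts can be run from the repository root (they may instead need to be run from their own directory if they use relative paths internally):

```bash
# Hypothetical preprocessing sequence for VIST-E.
tar -xzf data/VIST-E/SIS-with-labels.tar.gz -C data/VIST-E
unzip -j data/VIST-E/resnet_features.zip -d data/VIST-E/image_features   # -j keeps the folder flat
python data/VIST-E/annotations.py
python data/VIST-E/img_feat_path.py
python data/VIST-E/pro_label.py
python data/embed_vocab.py   # set its `dataset` parameter to VIST-E first
```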
LSMDC-E (see the command sketch after this list):
- Unzip task1_2021.zip to `data/LSMDC-E`.
- Unzip all resnet features in resnet152_200.zip to a folder `data/LSMDC-E/image_features` without any subfolders.
- Run `data/LSMDC-E/prepro_vocab.py`.
- Run `data/embed_vocab.py` and make sure the parameter `dataset` is set to LSMDC-E.
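Analogously for LSMDC-E, under the same assumptions as the VIST-E sketch:

```bash
# Hypothetical preprocessing sequence for LSMDC-E.
unzip data/LSMDC-E/task1_2021.zip -d data/LSMDC-E
unzip -j data/LSMDC-E/resnet152_200.zip -d data/LSMDC-E/image_features   # -j keeps the folder flat
python data/LSMDC-E/prepro_vocab.py
python data/embed_vocab.py   # set its `dataset` parameter to LSMDC-E first
```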
- Set parameters in `utils/opts.py`.
- Run `train.py` to train a model.
- Run `eval.py` to evaluate a model (an example invocation is sketched after the recommended settings).

Recommended Settings
VIST-E w BERT:
python train.py --dataset VIST-E --use_bert True --num_head 4 --weight_decay 0 --grad_clip_value 0
VIST-E w/o BERT:
python train.py --dataset VIST-E --use_bert False --num_head 4 --weight_decay 1e-5 --grad_clip_value 0
LSMDC-E w BERT:
python train.py --dataset LSMDC-E --use_bert True --num_head 8 --weight_decay 1e-5 --grad_clip_value 0.1
LSMDC-E w/o BERT:
python train.py --dataset LSMDC-E --use_bert False --num_head 8 --weight_decay 1e-5 --grad_clip_value 0.1
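Evaluation presumably reads the same options from `utils/opts.py`; the flags below simply mirror the training commands and are an assumption, and an option pointing to the trained checkpoint may also be needed (check `utils/opts.py`).

```bash
# Hypothetical evaluation command; the exact flags depend on utils/opts.py.
python eval.py --dataset VIST-E --use_bert True --num_head 4
```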
If you find our work or the code useful, please consider citing our paper:
@inproceedings{10.1145/3503161.3548022,
author = {Xue, Dizhan and Qian, Shengsheng and Fang, Quan and Xu, Changsheng},
title = {MMT: Image-Guided Story Ending Generation with Multimodal Memory Transformer},
year = {2022},
doi = {10.1145/3503161.3548022},
booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
pages = {750–758},
numpages = {9},
}