This repo collects works on video-language models from the past few years that I found interesting. Hope this helps!
- Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval [CVPR 2023]
  Code: https://github.com/XudongLinthu/upgradable-multimodal-intelligence
- Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners (VidIL) [NeurIPS 2022]
  Code: https://github.com/MikeWangWZHL/VidIL
- Invariant Grounding for Video Question Answering (IGV) [CVPR 2022]
  Code: https://github.com/yl3800/IGV
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (HQGA) [AAAI 2022]
  Code: https://github.com/doc-doc/HQGA
- ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos [EMNLP 2023]
  Code: https://github.com/PlusLabNLP/acquired (NOT RELEASED YET)
- From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering (CausalVidQA) [CVPR 2022]
  Code: https://github.com/bcmi/Causal-VidQA
- ComPhy: Compositional Physical Reasoning of Objects and Events from Videos (ComPhy) [ICLR 2022]
  Code: https://github.com/zfchenUnique/compositional_physics_learner
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (NExT-QA) [CVPR 2021]
  Code: https://github.com/doc-doc/NExT-QA
- CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions (CRAFT) [ACL 2022]
  Code: https://github.com/hucvl/craft
- CLEVRER: CoLlision Events for Video REpresentation and Reasoning (CLEVRER) [ICLR 2020]
  Code: https://github.com/chuangg/CLEVRER