- VQA_image_features.h5 - pretrained CNN features (you don't need actual images)
- glove.6B.300d.txt - word embeddings
- v2_Annotations_Train_mscoco.zip - answers data & more
- v2_Questions_Train_mscoco.zip - questions data, image IDs & more
.
├── Data
│   ├── Additional
│   │   ├── VQA_image_features.h5 (absent)
│   │   ├── image_ids_vqa.json
│   │   ├── imgid2imginfo.json
│   │   └── VQA_img_features2id.json
│   ├── GloVe
│   │   └── glove.6B.300d.txt (absent)
│   ├── Subset
│   │   └── (empty)
│   └── VQA_Train
│       ├── v2_Annotations_Train_mscoco.zip (absent)
│       └── v2_Questions_Train_mscoco.zip (absent)
├── Gallery
│   └── (some pics)
├── Notebooks
│   ├── scripts
│   │   └── (empty)
│   ├── DataPreparation.ipynb
│   ├── EmbeddingPreparation.ipynb
│   ├── Preprocessing.ipynb
│   └── TrainBOWIMG.ipynb
├── README.md
└── Weights
    └── 443700_emb_ep_3_34.0.pt
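The files marked (absent) in the tree are not shipped with the repository and have to be put in place by hand. A minimal sketch of a check for them is shown here. The helper name `find_missing_files` is hypothetical (it is not part of this repo), and the paths are assumed to be relative to the repository root exactly as they appear in the tree.

```python
from pathlib import Path

# Paths copied from the directory tree; assumed relative to the repo root.
REQUIRED_FILES = [
    "Data/Additional/VQA_image_features.h5",
    "Data/GloVe/glove.6B.300d.txt",
    "Data/VQA_Train/v2_Annotations_Train_mscoco.zip",
    "Data/VQA_Train/v2_Questions_Train_mscoco.zip",
]


def find_missing_files(root=".", required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing data files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All required data files are in place.")
```

Running this from the repository root before opening the notebooks should list whatever still needs to be downloaded.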
(More examples: see the Demo section)
- [Done] Pure BoW vs W2V comparison
- [Done] Linear Layer matrix splitting into textual and visual parts
- [Done] Different metrics instead of plain accuracy
- Data statistics
- Make sure the same picture IDs don't end up in different train/valid/test splits
- Different questions for the same picture, and vice versa
- Split questions by type, score for each type (yes/no, number, other)
- Good README with references
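One possible way to handle the split item from the list above is to split on image IDs rather than on individual questions, so that every image lands in exactly one of train/valid/test. The sketch below is an illustration under assumptions, not the code used in the notebooks: `split_by_image_id` is a hypothetical helper, each sample is assumed to be a dict with an `"image_id"` key, and the split fractions are placeholders.

```python
import random
from collections import defaultdict


def split_by_image_id(samples, valid_frac=0.1, test_frac=0.1, seed=0):
    """Split samples so that each image ID appears in exactly one split.

    `samples` is assumed to be a list of dicts with an "image_id" key.
    """
    # Group all questions that refer to the same image.
    by_image = defaultdict(list)
    for sample in samples:
        by_image[sample["image_id"]].append(sample)

    # Shuffle image IDs (not questions) with a fixed seed for reproducibility.
    image_ids = sorted(by_image)
    random.Random(seed).shuffle(image_ids)

    n_valid = int(len(image_ids) * valid_frac)
    n_test = int(len(image_ids) * test_frac)
    groups = {
        "valid": image_ids[:n_valid],
        "test": image_ids[n_valid:n_valid + n_test],
        "train": image_ids[n_valid + n_test:],
    }
    # Expand each group of image IDs back into its samples.
    return {
        name: [s for image_id in ids for s in by_image[image_id]]
        for name, ids in groups.items()
    }
```

Because the fractions apply to images rather than questions, the resulting split sizes in questions are only approximately proportional when images have different numbers of questions.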