Code and model details for the CS480/680 Kaggle competition, Fall 2020
For this task, I was inspired by the Transformer and proposed a model called SimpleTransformer. The model architecture is shown below.
BOP stands for Beginning of Prediction.
The other type of model also combines the noisy image and the text. The following figure displays the structure of this type of model.
For pretrained image classification, I used VGG and MobileNet. I implemented two text classification models: TextCNN and RCNN.
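To make the image-plus-text idea more concrete, here is a minimal PyTorch sketch of a TextCNN text encoder whose output is concatenated with precomputed image features before a linear classifier. It is not the code from the notebook: the framework choice, the class names `TextCNN` and `ImageTextClassifier`, the hyperparameters, the fusion by concatenation, and the assumption that the image features come from a frozen backbone such as VGG or MobileNet are all illustrative.

```python
import torch
import torch.nn as nn


class TextCNN(nn.Module):
    """TextCNN encoder: embedding -> parallel 1D convolutions -> max-over-time pooling."""

    def __init__(self, vocab_size, embed_dim=128, num_filters=100,
                 kernel_sizes=(3, 4, 5), padding_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_idx)
        # One convolution per kernel size; each one scans n-grams of that width.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.out_dim = num_filters * len(kernel_sizes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integers; seq_len must be >= max(kernel_sizes).
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max over the time axis keeps the strongest response of each filter.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)  # (batch, out_dim)


class ImageTextClassifier(nn.Module):
    """Hypothetical fusion head: concatenate image and text features, then classify."""

    def __init__(self, vocab_size, image_feat_dim, num_classes, dropout=0.5):
        super().__init__()
        self.text_encoder = TextCNN(vocab_size)
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(image_feat_dim + self.text_encoder.out_dim, num_classes),
        )

    def forward(self, image_feats, token_ids):
        # image_feats: (batch, image_feat_dim), assumed to be precomputed
        # by a pretrained backbone (an assumption, not the notebook's pipeline).
        text_feats = self.text_encoder(token_ids)
        return self.head(torch.cat([image_feats, text_feats], dim=1))


if __name__ == "__main__":
    # Placeholder sizes chosen only for the demonstration.
    model = ImageTextClassifier(vocab_size=1000, image_feat_dim=512, num_classes=27)
    model.eval()
    logits = model(torch.randn(4, 512), torch.randint(0, 1000, (4, 20)))
    print(logits.shape)
```

Concatenation is the simplest way to fuse the two modalities. Swapping `TextCNN` for an RCNN encoder would only require that the replacement expose the same `out_dim` attribute.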
All three of these models achieved over 97.8% on my validation set. More details can be found in the notebook.