Image Captioning

Dataset: Flickr8k

The Flickr8k dataset contains about 8,000 images, each paired with five descriptive captions. It is widely used in image captioning research because every image comes with several varied reference captions.
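
The image-to-captions pairing can be sketched as a small loader that groups captions by image. This is only an illustration: it assumes the captions file is a CSV with an `image,caption` header row, and the function name `load_captions` and the sample file names are made up for the example.

```python
import csv
import io
from collections import defaultdict


def load_captions(lines):
    """Group captions by image name from 'image,caption' CSV rows."""
    captions = defaultdict(list)
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed 'image,caption' header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        # Re-join in case an unquoted caption itself contained commas.
        image_name, caption = row[0], ",".join(row[1:])
        captions[image_name].append(caption.strip())
    return dict(captions)


# Hypothetical sample in the assumed layout (real files list five captions per image).
sample = io.StringIO(
    "image,caption\n"
    "dog.jpg,A dog runs on grass .\n"
    'dog.jpg,"A brown dog, running outside ."\n'
    "cat.jpg,A cat sleeps on a sofa .\n"
)
captions_by_image = load_captions(sample)
print(captions_by_image["dog.jpg"])
```

With the real file, passing an open file handle instead of the `StringIO` object should yield one list of five captions per image, provided the layout assumption holds.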

For the encoding phase, we used ResNet50, a 50-layer convolutional neural network (CNN) with residual connections, pre-trained on ImageNet. Its depth lets it capture intricate patterns and features in images. Starting from pre-trained weights gives us the benefit of transfer learning: the encoder already produces useful image features, which improves performance on the image captioning task.

For the decoding phase, we implemented a Transformer-based model. Because self-attention has no built-in notion of token order, we added positional embeddings that tell the model the position of each word in the sequence.
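
To illustrate how position information can be injected, here is a dependency-free sketch of fixed sinusoidal position vectors added to token embeddings. This is an assumption made for illustration: the model described here may instead use learned positional embeddings, and the function names are hypothetical.

```python
import math


def sinusoidal_positions(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(token_embeddings):
    """Add a position vector element-wise to each token embedding."""
    seq_len, d_model = len(token_embeddings), len(token_embeddings[0])
    positions = sinusoidal_positions(seq_len, d_model)
    return [
        [tok + pos for tok, pos in zip(tok_row, pos_row)]
        for tok_row, pos_row in zip(token_embeddings, positions)
    ]
```

Because every position gets a distinct vector, two identical words at different positions reach the decoder with different inputs, which is what lets attention layers take word order into account.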

Repository files:

.ipynb_checkpoints
SCs
.gitattributes
Caption_Generator.ipynb
Embedded_Images_data.pkl
ImageCaptioning_Model.pkl
Image_Features_Embed_ResNet_Train.pkl
Image_Features_Embed_ResNet_Valid.pkl
README.md
captions.txt
image_caption.ipynb