
victorgearhead/Image_Captioning


Image Captioning

LIVE MODEL

Dataset: Flickr8k

  • Flickr8k Dataset: This dataset contains 8,000 images, each paired with five descriptive captions. It is widely used in image captioning research because every image comes with several independent reference captions.
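As an illustration of the one-image-to-five-captions structure, here is a minimal sketch that groups caption lines by image. It assumes the tab-separated `image.jpg#index<TAB>caption` layout of the original Flickr8k token file; the file format this repository actually reads is not stated, and `group_captions` and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group caption lines by image file name.

    Assumed line format (not confirmed by this repository):
        <image name>#<caption index>\t<caption text>
    """
    captions = defaultdict(list)
    for line in lines:
        key, sep, caption = line.strip().partition("\t")
        if not sep:
            # Skip blank or malformed lines that have no tab separator.
            continue
        # Drop the "#<index>" suffix to recover the image file name.
        image_name = key.split("#", 1)[0]
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines, for illustration only.
sample = [
    "img_001.jpg#0\tA dog runs across a field .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]
grouped = group_captions(sample)
```

With the real dataset, each key in `grouped` would be expected to map to five captions.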

Encoder: ResNet50

  • For the encoding phase, we employed ResNet50, a convolutional neural network (CNN) pre-trained on ImageNet. Its deep residual architecture allows it to capture intricate patterns and features in images. By starting from a pre-trained model, we benefit from transfer learning, which improves the model's performance on the image captioning task.

Decoder: Transformer

  • For the decoding phase, we implemented a Transformer-based model. Because self-attention has no built-in sense of word order, we incorporated positional embeddings to provide the model with information about the position of each word in the sequence.
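To make the positional information concrete, here is a framework-free sketch. The README does not say whether the positional embeddings are learned or fixed; this example assumes the fixed sinusoidal form from the original Transformer paper, and `sinusoidal_positions` and `add_positions` are hypothetical names.

```python
import math
from typing import List


def sinusoidal_positions(max_len: int, d_model: int) -> List[List[float]]:
    """Build a (max_len x d_model) table of sinusoidal position vectors."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 10000^(-2k/d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(token_embeddings: List[List[float]],
                  table: List[List[float]]) -> List[List[float]]:
    """Add the position vector for index t to the t-th token embedding."""
    return [
        [e + p for e, p in zip(token, table[pos])]
        for pos, token in enumerate(token_embeddings)
    ]


# Toy usage: three "word" embeddings of width 4, all zeros for clarity.
table = sinusoidal_positions(max_len=3, d_model=4)
with_positions = add_positions([[0.0] * 4 for _ in range(3)], table)
```

Because the toy embeddings are all zeros, `with_positions` equals the position table itself, which shows that identical words at different positions reach the decoder as different vectors.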

Related

Here are some readings

ResNet

Transformers

Dataset

Flickr8k

Results

Screenshot

Screenshot

Screenshot

Screenshot

Author
