
Text-to-Image-Synthesis

This project is part of the course Topics in Deep Learning (UE17CS338) taken at PES University during my 6th semester (Spring 2020).

Introduction to GANs

GANs, short for Generative Adversarial Networks, are a class of deep learning models built from two neural networks that compete with each other; the competition drives both networks to improve simultaneously.

GANs are an approach to generative modeling. Conventional generative modeling is unsupervised: a network discovers and learns the patterns and regularities in a dataset so that it can generate outputs that appear to come from the original data space. A GAN pursues the same goal, but frames training as a supervised learning problem.

A GAN consists of two networks: the Generator, which produces new outputs, and the Discriminator, which tries to classify each output as real (drawn from the training data) or fake (generated). The generator improves by moving from producing pure noise toward producing samples close to the real dataset; the discriminator improves by becoming better at telling real samples apart from generated ones.
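
Formally, this competition is the minimax game from the original GAN paper (Goodfellow et al., 2014), where G is the generator, D is the discriminator, x is a real sample, and z is the noise input:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```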

Once both networks have improved sufficiently and the discriminator can no longer differentiate the real samples from the fake ones, training is complete. The generator can then be used independently to generate output.
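
As a concrete (if simplified) illustration, here is a minimal sketch of one GAN training step in TensorFlow 2. The tiny dense networks, image size, and hyperparameters are placeholders chosen for brevity, not the architecture used in this repository:

```python
import tensorflow as tf

# Toy generator and discriminator for flattened 28x28 images (illustrative only).
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(28 * 28,)),
    tf.keras.layers.Dense(1),  # real/fake logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], 100])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: label real samples 1 and generated samples 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits) +
                  bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator: fool the discriminator into labelling fakes as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return d_loss, g_loss
```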


Project Abstract

This project is an implementation of the paper Generative Adversarial Text to Image Synthesis. The implementation is TensorFlow-based, and the text descriptions are encoded using Skip-Thought Vectors. The image below represents the model architecture.
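
To make the text conditioning concrete, below is a rough sketch of a generator that takes a caption embedding alongside the noise vector, in the spirit of the paper's GAN-CLS architecture. The dimensions follow the common setup (4800-d combine-skip skip-thought vectors compressed to 128-d, plus a 100-d noise vector); the layer stack itself is illustrative and is not copied from this repository's code:

```python
import tensorflow as tf

def build_conditioned_generator(embed_dim=4800, compressed_dim=128, noise_dim=100):
    """Illustrative text-conditioned generator producing 64x64 RGB images."""
    caption = tf.keras.Input(shape=(embed_dim,))  # skip-thought vector
    noise = tf.keras.Input(shape=(noise_dim,))

    # Compress the caption embedding, then concatenate it with the noise.
    c = tf.keras.layers.Dense(compressed_dim)(caption)
    c = tf.keras.layers.LeakyReLU(0.2)(c)
    z = tf.keras.layers.Concatenate()([noise, c])

    # Upsample 4x4 -> 8 -> 16 -> 32 -> 64 with transposed convolutions.
    x = tf.keras.layers.Dense(4 * 4 * 512)(z)
    x = tf.keras.layers.Reshape((4, 4, 512))(x)
    for filters in (256, 128, 64):
        x = tf.keras.layers.Conv2DTranspose(filters, 5, strides=2, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
    image = tf.keras.layers.Conv2DTranspose(3, 5, strides=2, padding="same",
                                            activation="tanh")(x)
    return tf.keras.Model([caption, noise], image)
```

On the discriminator side, GAN-CLS scores (image, text) pairs and additionally treats a real image paired with a mismatched caption as fake; this extra term pushes the generator to respect the caption rather than merely producing realistic images.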

Results

Training:

After 120 epochs - img_after_120

After 150 epochs - img_after_150

After 210 epochs - img_after_210

After 240 epochs - img_after_240

After 270 epochs - img_after_270

After 330 epochs - img_after_330

After 450 epochs - img_after_450

After 900 epochs - img_after_900

After 1300 epochs - img_after_1300

Testing:

"A Red Flower" - Red_flower_960

"A Blue Flower" - Blue_flower_960

"A Purple Flower" - Purple_flower_960


Improvements:

  • Train for more epochs to obtain better results.
  • Train the model on the MS-COCO dataset to generate more generic images.
  • Try different embedding options for the captions (other than Skip-Thought Vectors), and try training the caption-embedding RNN jointly with the GAN-CLS model.
