This project is part of the course Topics in Deep Learning (UE17CS338) taken at PES University during my 6th semester (Spring 2020).
GANs, short for Generative Adversarial Networks, are a class of deep learning models. They consist of two neural networks that compete with each other, and this competition improves both networks simultaneously.
GANs are an approach to generative modeling. Conventional generative modeling is unsupervised: a network discovers and learns the patterns and regularities in the data so that it can generate outputs that appear to have been drawn from the original data space. A GAN instead frames this generation task as a supervised learning problem.
A GAN consists of two networks - the Generator, which produces new output, and the Discriminator, which tries to classify output as real (drawn from the training data) or fake (generated). The generator improves by moving from producing pure noise toward producing samples close to the real data distribution. The discriminator improves by becoming better at telling real samples apart from generated ones.
Once the generator and discriminator are sufficiently trained and the discriminator can no longer differentiate the real from the fake, the training process is complete. The generator can then be used independently to generate output.
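The adversarial objective described above can be sketched as the standard GAN losses: the discriminator tries to score real samples near 1 and generated samples near 0, while the generator tries to push the discriminator's score on its samples toward 1 (the non-saturating form). This is a minimal NumPy sketch of the loss computation only, not of the full training loop:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that quantity.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

# A confident, correct discriminator has low loss:
print(d_loss(np.array([0.9]), np.array([0.1])))  # ~0.211
# A generator that fools the discriminator half the time:
print(g_loss(np.array([0.5])))                   # ~0.693
```

In practice each loss is minimized in alternation with an optimizer such as Adam, updating only the corresponding network's weights at each step.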
This project is an implementation of the paper Generative Adversarial Text to Image Synthesis. It is a TensorFlow-based implementation, and the text descriptions are encoded using Skip-Thought Vectors. The image below is a representation of the model architecture.
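In this architecture the caption embedding conditions the generator: the skip-thought vector (4800-dimensional for the combine-skip model) is projected down to a smaller learned representation and concatenated with the noise vector before being fed to the generator. The sketch below shows only the shape bookkeeping of that conditioning step; the dimensions (`z_dim=100`, `proj_dim=128`) follow the paper's setup, and the random projection matrix stands in for a learned layer:

```python
import numpy as np

batch, z_dim, emb_dim, proj_dim = 8, 100, 4800, 128

z = np.random.randn(batch, z_dim)            # noise vector
caption_emb = np.random.randn(batch, emb_dim)  # skip-thought caption embedding
W = np.random.randn(emb_dim, proj_dim) * 0.01  # stand-in for a learned projection

# Project the 4800-d caption embedding down, then concatenate with the noise
projected = np.maximum(0.0, caption_emb @ W)   # nonlinearity applied after projection
g_input = np.concatenate([z, projected], axis=1)
print(g_input.shape)  # (8, 228)
```

The discriminator conditions on the text in a similar way, concatenating the projected embedding with its convolutional image features before the final classification layer.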
Training:
Testing:
Improvements:
- Train for a greater number of epochs to obtain better results.
- Train the model on the MS-COCO dataset to generate more generic images.
- Try different embedding options for captions (other than skip-thought vectors). Also try training the caption-embedding RNN jointly with the GAN-CLS model.
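The GAN-CLS variant mentioned above extends the plain discriminator loss with a matching-aware term: besides {real image, matching text} and {fake image, matching text}, the discriminator also scores {real image, mismatched text} as fake, which forces it to check image-text correspondence rather than image realism alone. A minimal NumPy sketch of that three-term loss, following Algorithm 1 of the paper:

```python
import numpy as np

def gan_cls_d_loss(s_real, s_wrong, s_fake):
    # s_real: scores for real images with matching captions
    # s_wrong: scores for real images with mismatched captions
    # s_fake: scores for generated images with matching captions
    # The two "fake" terms (mismatched text, generated image) are averaged.
    return -np.mean(np.log(s_real)
                    + 0.5 * (np.log(1.0 - s_wrong) + np.log(1.0 - s_fake)))

# A discriminator that correctly rejects both failure modes has low loss:
print(gan_cls_d_loss(np.array([0.9]), np.array([0.1]), np.array([0.1])))  # ~0.211
```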