
Commit e30db7a

Added project files and results
1 parent aa5dbbb commit e30db7a

16 files changed: +1761 -2 lines

.gitignore

+22
@@ -0,0 +1,22 @@
.DS_Store
.idea*
*.pdf
*.jpg
*.png
*.pyc
*.py.bak
sample.py
vggtest.py
*.pem
amazon_ssh.sh
awstransfer.sh
localtrain.py
vislstm.png
sample_aws.sh
eval_trec.py
theanotest.py
data_loader_old.py
Utils/word_embeddings_old.py
gen_backup.py
data_loader_test.py
downloadModels.sh

Data/.gitignore

+3
@@ -0,0 +1,3 @@
*
*/
!.gitignore

README.md

+117-2
@@ -1,2 +1,117 @@
-# Text-to-Image-Synthesis
-Text to Image Synthesis using GANs and Skipthought Vectors

# Text To Image Synthesis Using Thought Vectors

[![Join the chat at https://gitter.im/text-to-image/Lobby](https://badges.gitter.im/text-to-image/Lobby.svg)](https://gitter.im/text-to-image/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

This is an experimental TensorFlow implementation of synthesizing images from captions using [Skip Thought Vectors][1]. The images are synthesized with the GAN-CLS algorithm from the paper [Generative Adversarial Text-to-Image Synthesis][2]. The implementation is built on top of the excellent [DCGAN in Tensorflow][3]. The model architecture is shown below; the blue bars represent the Skip Thought Vectors for the captions.

![Model architecture](http://i.imgur.com/dNl2HkZ.jpg)

Image Source: [Generative Adversarial Text-to-Image Synthesis][2] Paper
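
GAN-CLS trains the discriminator not just to tell real images from generated ones, but also to reject real images paired with mismatching captions. The snippet below is only a minimal sketch of that objective, not the ```model.py``` in this commit: ```d_real```, ```d_wrong``` and ```d_fake``` are assumed to be the discriminator's sigmoid outputs for (real image, right text), (real image, wrong text) and (fake image, right text) pairs.

```
import tensorflow as tf

def gan_cls_losses(d_real, d_wrong, d_fake, eps=1e-12):
    # Discriminator: push real/right pairs towards 1, wrong-text and fake pairs towards 0
    d_loss = -tf.reduce_mean(tf.log(d_real + eps)) \
             - 0.5 * tf.reduce_mean(tf.log(1. - d_wrong + eps)) \
             - 0.5 * tf.reduce_mean(tf.log(1. - d_fake + eps))
    # Generator: make the (fake image, right text) pair look real to the discriminator
    g_loss = -tf.reduce_mean(tf.log(d_fake + eps))
    return d_loss, g_loss
```
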
## Requirements
- Python 2.7.6
- [Tensorflow][4]
- [h5py][5]
- [Theano][6] : for skip thought vectors
- [scikit-learn][7] : for skip thought vectors
- [NLTK][8] : for skip thought vectors

## Datasets
- All the download steps below for the datasets and models can be performed automatically by running `python download_datasets.py`. Several gigabytes of files will be downloaded and extracted.
- The model is currently trained on the [flowers dataset][9]. Download the images from [this link][9] and save them in ```Data/flowers/jpg```. Also download the captions from [this link][10]. Extract the archive, copy the ```text_c10``` folder and paste it in ```Data/flowers```.
- Download the pretrained models and vocabulary for skip thought vectors as per the instructions given [here][13]. Save the downloaded files in ```Data/skipthoughts```.
- Create the empty directories ```Data/samples```, ```Data/val_samples``` and ```Data/Models```. They will be used for sampling the generated images and saving the trained models; a small helper for this step is sketched below.
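
If you prefer to create these working directories from Python rather than by hand, a small helper like the one below is enough (paths exactly as listed above; this is not part of the repository's scripts):

```
import os

# Directories expected by the training and sampling scripts (see the list above)
for d in ['Data/samples', 'Data/val_samples', 'Data/Models']:
    if not os.path.exists(d):
        os.makedirs(d)
```
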
## Usage
- <b>Data Processing</b> : Extract the skip thought vectors for the flowers data set using:
```
python data_loader.py --data_set="flowers"
```
- <b>Training</b>
  * Basic usage `python train.py --data_set="flowers"`
  * Options
    - `z_dim`: Noise dimension. Default is 100.
    - `t_dim`: Text feature dimension. Default is 256.
    - `batch_size`: Batch size. Default is 64.
    - `image_size`: Image dimension. Default is 64.
    - `gf_dim`: Number of convolutional filters in the first layer of the generator. Default is 64.
    - `df_dim`: Number of convolutional filters in the first layer of the discriminator. Default is 64.
    - `gfc_dim`: Dimension of the generator units for the fully connected layer. Default is 1024.
    - `caption_vector_length`: Length of the caption vector. Default is 1024.
    - `data_dir`: Data directory. Default is `Data/`.
    - `learning_rate`: Learning rate. Default is 0.0002.
    - `beta1`: Momentum term for the Adam optimizer. Default is 0.5.
    - `epochs`: Maximum number of epochs. Default is 600.
    - `resume_model`: Resume training from a pretrained model path.
    - `data_set`: Data set to train on. Default is flowers.

- <b>Generating Images from Captions</b>
  * Write the captions in a text file and save it as ```Data/sample_captions.txt```. Generate the skip thought vectors for these captions using:
```
python generate_thought_vectors.py --caption_file="Data/sample_captions.txt"
```
  * Generate the images for the thought vectors using:
```
python generate_images.py --model_path=<path to the trained model> --n_images=8
```
```n_images``` specifies the number of images to be generated per caption. The generated images will be saved in ```Data/val_samples/```. Run ```python generate_images.py --help``` for more options. A sketch of the caption-encoding call that these scripts rely on is shown below.
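
For reference, the caption-encoding step that ```generate_thought_vectors.py``` and ```data_loader.py``` build on looks roughly like this with the [skip-thoughts][12] library; the output file name below is illustrative only, so check the scripts for the paths they actually write.

```
import h5py
import skipthoughts

captions = [line.strip() for line in open('Data/sample_captions.txt') if line.strip()]
model = skipthoughts.load_model()               # load the pretrained skip-thought models
vectors = skipthoughts.encode(model, captions)  # one skip-thought vector per caption
with h5py.File('Data/sample_caption_vectors.hdf5', 'w') as f:   # illustrative file name
    f.create_dataset('vectors', data=vectors)
```
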
## Sample Images Generated
Following are the images generated by the model from the captions.

| Caption | Generated Images |
| ------------- | -----:|
| the flower shown has yellow anther red pistil and bright red petals | ![](http://i.imgur.com/SknZ3Sg.jpg) |
| this flower has petals that are yellow, white and purple and has dark lines | ![](http://i.imgur.com/8zsv9Nc.jpg) |
| the petals on this flower are white with a yellow center | ![](http://i.imgur.com/vvzv1cE.jpg) |
| this flower has a lot of small round pink petals. | ![](http://i.imgur.com/w0zK1DC.jpg) |
| this flower is orange in color, and has petals that are ruffled and rounded. | ![](http://i.imgur.com/VfBbRP1.jpg) |
| the flower has yellow petals and the center of it is brown | ![](http://i.imgur.com/IAuOGZY.jpg) |

## Implementation Details
- Only the uni-skip vectors from the skip thought vectors are used. I have not tried training the model with combine-skip vectors.
- The model was trained for around 200 epochs on a GPU. This took roughly 2-3 days.
- The generated images are 64 x 64 pixels.
- While processing the batches before training, the images are flipped horizontally with a probability of 0.5.
- The train-val split is 0.75.

## Pre-trained Models
- Download the pretrained model from [here][14] and save it in ```Data/Models```. Use this path when generating the images.

## TODO
- Train the model on the MS-COCO data set, and generate more generic images.
- Try different embedding options for captions (other than skip thought vectors). Also try to train the caption embedding RNN along with the GAN-CLS model.

## References
- [Generative Adversarial Text-to-Image Synthesis][2] Paper
- [Generative Adversarial Text-to-Image Synthesis][11] Code
- [Skip Thought Vectors][1] Paper
- [Skip Thought Vectors][12] Code
- [DCGAN in Tensorflow][3]
- [DCGAN in Tensorlayer][15]

## Alternate Implementations
- [Text to Image in Torch by Scott Reed][11]
- [Text to Image in Tensorlayer by Dong Hao][16]

## License
MIT


[1]:http://arxiv.org/abs/1506.06726
[2]:http://arxiv.org/abs/1605.05396
[3]:https://github.com/carpedm20/DCGAN-tensorflow
[4]:https://github.com/tensorflow/tensorflow
[5]:http://www.h5py.org/
[6]:https://github.com/Theano/Theano
[7]:http://scikit-learn.org/stable/index.html
[8]:http://www.nltk.org/
[9]:http://www.robots.ox.ac.uk/~vgg/data/flowers/102/
[10]:https://drive.google.com/file/d/0B0ywwgffWnLLcms2WWJQRFNSWXM/view
[11]:https://github.com/reedscot/icml2016
[12]:https://github.com/ryankiros/skip-thoughts
[13]:https://github.com/ryankiros/skip-thoughts#getting-started
[14]:https://bitbucket.org/paarth_neekhara/texttomimagemodel/raw/74a4bbaeee26fe31e148a54c4f495694680e2c31/latest_model_flowers_temp.ckpt
[15]:https://github.com/zsdonghao/dcgan
[16]:https://github.com/zsdonghao/text-to-image

Utils/__init__.py

Whitespace-only changes.

Utils/image_processing.py

+33
@@ -0,0 +1,33 @@
import numpy as np
import random
import skimage
import skimage.io
import skimage.transform
import imageio


def load_image_array(image_file, image_size):
    img = skimage.io.imread(image_file)
    # GRAYSCALE: replicate the single channel so the image has 3 channels
    if len(img.shape) == 2:
        img_new = np.ndarray((img.shape[0], img.shape[1], 3), dtype='uint8')
        img_new[:, :, 0] = img
        img_new[:, :, 1] = img
        img_new[:, :, 2] = img
        img = img_new

    img_resized = skimage.transform.resize(img, (image_size, image_size))

    # FLIP HORIZONTAL WITH A PROBABILITY 0.5
    if random.random() > 0.5:
        img_resized = np.fliplr(img_resized)

    return img_resized.astype('float32')


if __name__ == '__main__':
    # TEST>>>
    arr = load_image_array('sample.jpg', 64)
    print(arr.mean())
    # rev = np.fliplr(arr)
    imageio.imwrite('rev.jpg', arr)
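
A quick, illustrative example of assembling an image batch with load_image_array (the file names below are placeholders; the real batching happens in the training code):

import numpy as np
from Utils.image_processing import load_image_array

# Placeholder file names; in practice these come from Data/flowers/jpg
files = ['Data/flowers/jpg/image_00001.jpg', 'Data/flowers/jpg/image_00002.jpg']
batch = np.stack([load_image_array(f, 64) for f in files])  # shape (2, 64, 64, 3)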

Utils/ops.py

+133
@@ -0,0 +1,133 @@
# REUSED CODE FROM https://github.com/carpedm20/DCGAN-tensorflow/blob/master/ops.py
import math
import numpy as np
import tensorflow as tf

from tensorflow.python.framework import ops


class batch_norm(object):

    # Initializes a batch_norm layer when the class is instantiated.
    # Code modification of http://stackoverflow.com/a/33950177
    def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):

        with tf.variable_scope(name):

            self.epsilon = epsilon
            self.momentum = momentum
            self.ema = tf.train.ExponentialMovingAverage(decay=self.momentum)
            self.name = name

    def __call__(self, x, train=True):
        shape = x.get_shape().as_list()

        if train:
            with tf.variable_scope(self.name) as scope:
                self.beta = tf.get_variable("beta", [shape[-1]],
                                            initializer=tf.constant_initializer(0.))
                self.gamma = tf.get_variable("gamma", [shape[-1]],
                                             initializer=tf.random_normal_initializer(1., 0.02))

                try:
                    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
                except:
                    batch_mean, batch_var = tf.nn.moments(x, [0, 1], name='moments')

                with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
                    ema_apply_op = self.ema.apply([batch_mean, batch_var])
                    self.ema_mean, self.ema_var = self.ema.average(batch_mean), self.ema.average(batch_var)

                with tf.control_dependencies([ema_apply_op]):
                    mean, var = tf.identity(batch_mean), tf.identity(batch_var)
        else:
            mean, var = self.ema_mean, self.ema_var

        normed = tf.nn.batch_norm_with_global_normalization(
            x, mean, var, self.beta, self.gamma, self.epsilon, scale_after_normalization=True)

        return normed


def binary_cross_entropy(preds, targets, name=None):
    """Computes binary cross entropy given `preds`.

    For brevity, let `x = preds`, `z = targets`. The logistic loss is

        loss(x, z) = - sum_i (z[i] * log(x[i]) + (1 - z[i]) * log(1 - x[i]))

    Args:
        preds: A `Tensor` of type `float32` or `float64`.
        targets: A `Tensor` of the same type and shape as `preds`.
    """
    eps = 1e-12
    with ops.op_scope([preds, targets], name, "bce_loss") as name:
        preds = ops.convert_to_tensor(preds, name="preds")
        targets = ops.convert_to_tensor(targets, name="targets")
        return tf.reduce_mean(-(targets * tf.log(preds + eps) + (1.0 - targets) * tf.log(1.0 - preds + eps)))


def conv_cond_concat(x, y):
    """Concatenate conditioning vector on feature map axis."""
    x_shapes = x.get_shape()
    y_shapes = y.get_shape()
    return tf.concat(3, [x, y * tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])])


def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))

        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

        return conv


def deconv2d(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="deconv2d", with_w=False):

    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))

        try:
            deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape, strides=[1, d_h, d_w, 1])

        # Support for versions of TensorFlow before 0.7.0
        except AttributeError:
            deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape, strides=[1, d_h, d_w, 1])

        biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())

        if with_w:
            return deconv, w, biases
        else:
            return deconv


# Leaky ReLU activation
def lrelu(x, leak=0.2, name="lrelu"):
    return tf.maximum(x, leak * x)


def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):

    # input_ is the text embedding being passed from model.py
    shape = input_.get_shape().as_list()

    # variable_scope lets variables be created once and then shared.
    # See https://www.tensorflow.org/api_docs/python/tf/compat/v1/variable_scope
    with tf.variable_scope(scope or "Linear"):

        # get_variable returns an existing variable with these parameters or creates a new one.
        # Input arguments are: name, shape, dtype and initializer.

        # Weight matrix
        matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
                                 tf.random_normal_initializer(stddev=stddev))

        # Bias vector
        bias = tf.get_variable("bias", [output_size], initializer=tf.constant_initializer(bias_start))

        # Return input_ * matrix + bias (optionally together with the variables)
        if with_w:
            return tf.matmul(input_, matrix) + bias, matrix, bias
        else:
            return tf.matmul(input_, matrix) + bias
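
To show how these helpers fit together, below is a minimal sketch of a single DCGAN-style generator stage built from linear, deconv2d and batch_norm. It is not the model.py from this commit; the shapes and variable names are illustrative only.

import tensorflow as tf
from Utils.ops import linear, deconv2d, batch_norm

def toy_generator_stage(z, batch_size=64):
    # Project the (noise + text) code and reshape it into a 4x4 feature map
    h0 = linear(z, 64 * 8 * 4 * 4, 'g_h0_lin')
    h0 = tf.reshape(h0, [-1, 4, 4, 64 * 8])
    h0 = tf.nn.relu(batch_norm(name='g_bn0')(h0))

    # Upsample 4x4 -> 8x8 with a transposed convolution
    h1 = deconv2d(h0, [batch_size, 8, 8, 64 * 4], name='g_h1')
    return tf.nn.relu(batch_norm(name='g_bn1')(h1))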
