Hi, thanks for the great tutorial. I have trouble understanding the math. What is the reason for passing encode3 to logsd before the nonlinearity is applied? Why not feed encode3neur to both mu and logsd? I would suspect a typo, but when I run the reference prototxt, I can get it to converge.
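To make the question concrete, here is a rough pure-Python sketch (no Caffe) of the two wirings I mean. It assumes encode3 is a fully connected layer, encode3neur is its activation output, and mu and logsd are separate fully connected heads. The helper names and the toy weights are my own illustrative choices, not taken from the reference prototxt.

```python
import math

def linear(x, weights, bias):
    # Fully connected layer (like Caffe's InnerProduct): y[j] = sum_i W[j][i] * x[i] + b[j]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def encoder_heads(x, params, logsd_from_preactivation, activation=relu):
    """Return (mu, logsd) for one input vector x.

    Hypothetical wiring named after the layers in the question:
    encode3 is the pre-activation and encode3neur = activation(encode3).
    mu always reads encode3neur; logsd reads encode3 or encode3neur.
    """
    encode3 = linear(x, params["W3"], params["b3"])
    encode3neur = activation(encode3)
    mu = linear(encode3neur, params["W_mu"], params["b_mu"])
    logsd_input = encode3 if logsd_from_preactivation else encode3neur
    logsd = linear(logsd_input, params["W_logsd"], params["b_logsd"])
    return mu, logsd

# Toy weights chosen by hand for illustration only (not from the reference prototxt).
EXAMPLE_PARAMS = {
    "W3": [[1.0, 0.0], [0.0, 1.0]], "b3": [0.0, 0.0],
    "W_mu": [[1.0, 1.0]], "b_mu": [0.0],
    "W_logsd": [[0.5, 0.5]], "b_logsd": [0.0],
}

if __name__ == "__main__":
    x = [1.0, -2.0]  # second unit of encode3 is negative, so ReLU zeroes it
    print(encoder_heads(x, EXAMPLE_PARAMS, logsd_from_preactivation=True))   # ([1.0], [-0.5])
    print(encoder_heads(x, EXAMPLE_PARAMS, logsd_from_preactivation=False))  # ([1.0], [0.5])
```

With ReLU, both wirings give logsd the same input whenever every entry of encode3 is nonnegative; they only diverge where ReLU clips a negative pre-activation to zero. With sigmoid, encode3neur generally differs from encode3 everywhere.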

I have combined the VAE layers with convolution and deconvolution layers and am having trouble training on MNIST with this new architecture (I'm using sigmoid neurons instead of ReLU, if that matters).