TODO

gradient descent

rename segmented-trainer to segmented-gd-trainer

conjugate gradient

clarify conjugate gradient license

  • State “WAITING” [2008-07-21 Mon 21:53]
    Contacting Carl Rasmussen by email

portably implement float NaN and Inf handling

There is no way to enable floating point traps in allegro, so let’s just catch ARITHMETIC-ERROR at some strategically chosen places.
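
A minimal sketch of the idea, with hypothetical names (SAFE-SIGMOID is illustration only, not part of the codebase): wrap the numerically risky computation in HANDLER-CASE so overflow is caught portably wherever the implementation signals it, with no trap setup needed.

  ;; Hypothetical sketch. FLOATING-POINT-OVERFLOW is a subtype of
  ;; ARITHMETIC-ERROR, so HANDLER-CASE catches the overflow in EXP
  ;; without any implementation specific trap configuration.
  (defun safe-sigmoid (x)
    (handler-case (/ 1d0 (+ 1d0 (exp (- x))))
      (arithmetic-error ()
        ;; (EXP (- X)) can overflow only for very negative X, where
        ;; the sigmoid's limit is 0 anyway.
        0d0)))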

failed line searches can still change the weights

documented

boltzmann machines

conditioning chunks are really neither visible nor hidden

Does it make sense to store them separately?

factored RBM: weight matrix is A*B
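
For reference, the point of the factorization (a standard observation, not from this file): with a small inner dimension f, the parameter count drops from n_v n_h to f(n_v + n_h).

  W = A B, \qquad A \in \mathbb{R}^{n_v \times f}, \quad B \in \mathbb{R}^{f \times n_h}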

semi-restricted (connections between visibles)

higher order rbm

training with conjugate gradient

Is it possible to calculate the cost function? Not quite, but there are two ways out:

  1. estimate the partition function as in the RBM importance sampling paper
  2. constrain features to be sparse and don’t use a partition function at all
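
For context, the standard reason the cost is intractable (usual binary RBM notation, not anything from this codebase: b and c are the visible and hidden biases, W_j is row j of the weight matrix): the log-likelihood needs the partition function Z, a sum over all configurations.

  \log p(v) = -F(v) - \log Z, \qquad Z = \sum_{v'} e^{-F(v')}, \qquad
  F(v) = -b^\top v - \sum_j \log\bigl(1 + e^{c_j + W_j v}\bigr)

The first way estimates \log Z; the second sidesteps Z entirely.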

training with SMD

sparsity constraint

possible with WEIGHT-PENALTY

generic support for exponential family distributions

http://www.ics.uci.edu/~michal/GenHarm3.pdf

cache inputs optionally

This should make higher level RBMs a lot faster and can be implemented trivially with a samples -> node arrays hash table.
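
A hedged sketch of that (all names hypothetical, not the actual API): memoize computed node arrays per sample in an EQ hash table, so higher level RBMs reuse the layers below instead of recomputing them.

  ;; Hypothetical illustration only.
  (defvar *node-array-cache* (make-hash-table :test #'eq))

  (defun cached-nodes (sample compute-fn)
    "Return the node array for SAMPLE, computing it only on first use."
    (or (gethash sample *node-array-cache*)
        (setf (gethash sample *node-array-cache*)
              (funcall compute-fn sample))))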

fix gradient accumulation

  • CLOSING NOTE [2013-03-05 Tue 16:32]
    When the accumulator’s start is not zero, there are two ways to
    fail: a) take the matlisp branch and write the accumulated gradient
    to the wrong place, as if start were 0, or b) take the lisp branch
    and fail when there are multiple stripes. Currently, a non-zero
    start with several stripes runs into an assert.

normalized-chunk: scale should be per stripe

positive phase: if sample-hidden-p, then is it (* visible hidden-mean) or (* visible hidden-sample)?

HIDDEN-SAMPLING parameter in RBM-TRAINER

temporal rbm

calculate bias+conditioning activation only once per training example

  • State “DONE” [2008-09-10 Wed 15:00]
    :CACHE-STATIC-ACTIVATIONS-P initarg to CHUNK

persistent contrastive divergence

general boltzmann machines

deep boltzmann machines

initialize DBM from DBN

implement annealed importance sampling

unbreak temporal chunks

backprop

VALIDATE-LUMP: set and check size if possible, check inputs

ADD-LUMP: set MAX-N-STRIPES to something saner

redo ->CROSS-ENTROPY

  • CLOSING NOTE [2014-01-20 Mon 13:13]
    removed

what to do with INDICES-TO-CALCULATE?

  • CLOSING NOTE [2014-01-20 Mon 13:13]
    removed

normalized-lump: scale should be per stripe

  • CLOSING NOTE [2015-01-21 Wed 21:08]
  • CLOSING NOTE [2015-01-21 Wed 21:09]

Doesn’t work well with dropout or RBMs. Why?

  • lagging-average-gradients is too imprecise
  • lagging-average-gradients covers the whole network, not the
    particular dropout subnetworks that correspond to the input
    examples in a batch

unroll

unroll only part of the network

example for unrolling with missing values

fix or remove missing value support

removed

unroll factored clouds

flip chunk

When an input chunk is to be reconstructed, it should go above the layer in the BPN instead of below it, where it sits in the BM.

gaussian processes

implement gaussian processes

  • CLOSING NOTE [2013-03-05 Tue 16:33]

misc

Doesn’t seem to speed things up.

remove dependency on BLAS: implement some of matlisp in Lisp

Ripped matrix.lisp and various bits from Matlisp.

what to do with USE-BLAS?

  • CLOSING NOTE [2013-03-05 Tue 16:31]
    The wrappers should calculate cost and call matlisp if it is
    available.
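
A sketch of that dispatch under stated assumptions (M*, LISP-M*, MATLISP-M* and the threshold are all hypothetical; the real bridge to matlisp’s matrix types is elided):

  ;; Hypothetical sketch, not the library's API. Assumes double-float
  ;; arrays. The wrapper estimates the cost of the multiplication and
  ;; only takes the BLAS path when matlisp is loaded and the problem
  ;; is large enough to amortize the call overhead.
  (defvar *use-blas* t)
  (defvar *blas-cost-threshold* (expt 2 14))

  (defun lisp-m* (a b)
    "Naive pure lisp fallback for matrix multiplication."
    (let* ((m (array-dimension a 0))
           (k (array-dimension a 1))
           (n (array-dimension b 1))
           (c (make-array (list m n) :element-type 'double-float
                                     :initial-element 0d0)))
      (dotimes (i m c)
        (dotimes (j n)
          (dotimes (l k)
            (incf (aref c i j) (* (aref a i l) (aref b l j))))))))

  (defun matlisp-m* (a b)
    ;; Stub so the sketch is self-contained; a real version would
    ;; convert to matlisp matrices and call its GEMM.
    (lisp-m* a b))

  (defun m* (a b)
    (let ((cost (* (array-dimension a 0) (array-dimension a 1)
                   (array-dimension b 1))))
      (if (and *use-blas*
               (find-package '#:matlisp) ; is matlisp loaded?
               (> cost *blas-cost-threshold*))
          (matlisp-m* a b)
          (lisp-m* a b))))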

float vector I/O for cmucl

  • CLOSING NOTE [2013-03-05 Tue 19:39]

support more lisps

  • CLOSING NOTE [2013-03-05 Tue 16:32] Matlisp only supports cmucl, sbcl and allegro.

optimize the missing value case (SSE?)

lookup table based exp/sigmoid

investigate SPARTNS

  • CLOSING NOTE [2013-03-07 Thu 21:42]

investigate LISP-MATRIX

  • CLOSING NOTE [2013-03-07 Thu 21:42]

investigate LLA

  • CLOSING NOTE [2013-03-05 Tue 16:31]

factor out most frequent and log-likelihood-ratio based feature selection

parallelize with threads and/or across different images

add high level interface (scikit?)

use cuda

examples

movie review example

netflix example

  • CLOSING NOTE [2015-01-21 Wed 21:09]