TODO

gradient descent

rename segmented-trainer to segmented-gd-trainer

conjugate gradient

clarify conjugate gradient license

  • State “WAITING” [2008-07-21 Mon 21:53]
    Contacting Carl Rasmussen by email

portably implement float NaN and Inf handling

There is no way to enable floating point traps in allegro, so let’s just catch ARITHMETIC-ERROR at some strategically chosen places.
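
A minimal sketch of the idea, with hypothetical names (SAFE-SIGMOID is illustration only, not part of the codebase): wrap the numerically risky computation in HANDLER-CASE so overflow is caught portably wherever the implementation signals it, with no trap setup needed.

  ;; Hypothetical sketch. FLOATING-POINT-OVERFLOW is a subtype of
  ;; ARITHMETIC-ERROR, so HANDLER-CASE catches the overflow in EXP
  ;; without any implementation specific trap configuration.
  (defun safe-sigmoid (x)
    (handler-case (/ 1d0 (+ 1d0 (exp (- x))))
      (arithmetic-error ()
        ;; (EXP (- X)) can overflow only for very negative X, where
        ;; the sigmoid's limit is 0 anyway.
        0d0)))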

failed line searches can still change the weights

documented

boltzmann machines

conditioning chunks are really neither visible nor hidden

Does it make sense to store them separately?

factored RBM: weight matrix is A*B
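
For reference, the point of the factorization (a standard observation, not from this file): with a small inner dimension f, the parameter count drops from n_v n_h to f(n_v + n_h).

  W = A B, \qquad A \in \mathbb{R}^{n_v \times f}, \quad B \in \mathbb{R}^{f \times n_h}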

semi-restricted (connections between visibles)

higher order rbm

training with conjugate gradient

Is it possible to calculate the cost function? Not quite, but there are two ways out:

  1. estimate the partition function as in the RBM importance sampling paper
  2. constrain features to be sparse and don’t use a partition function at all
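
For context, the standard reason the cost is intractable (usual binary RBM notation, not anything from this codebase: b and c are the visible and hidden biases, W_j is row j of the weight matrix): the log-likelihood needs the partition function Z, a sum over all configurations.

  \log p(v) = -F(v) - \log Z, \qquad Z = \sum_{v'} e^{-F(v')}, \qquad
  F(v) = -b^\top v - \sum_j \log\bigl(1 + e^{c_j + W_j v}\bigr)

The first way estimates \log Z; the second sidesteps Z entirely.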

training with SMD

sparsity constraint

possible with WEIGHT-PENALTY

generic support for exponential family distributions

http://www.ics.uci.edu/~michal/GenHarm3.pdf

cache inputs optionally

This should make higher level RBMs a lot faster and can be implemented trivially with a samples -> node arrays hash table.
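
A hedged sketch of that (all names hypothetical, not the actual API): memoize computed node arrays per sample in an EQ hash table, so higher level RBMs reuse the layers below instead of recomputing them.

  ;; Hypothetical illustration only.
  (defvar *node-array-cache* (make-hash-table :test #'eq))

  (defun cached-nodes (sample compute-fn)
    "Return the node array for SAMPLE, computing it only on first use."
    (or (gethash sample *node-array-cache*)
        (setf (gethash sample *node-array-cache*)
              (funcall compute-fn sample))))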

fix gradient accumulation

  • CLOSING NOTE [2013-03-05 Tue 16:32]
    When the accumulator’s start is not zero, there are two ways to
    fail: a) take the matlisp branch and write the accumulated gradient
    to the wrong place, as if start were 0, or b) take the lisp branch
    and fail when there are multiple stripes. Currently, a non-zero
    start with several stripes runs into an assert.

normalized-chunk: scale should be per stripe

positive phase: if sample-hidden-p, then is it (* visible hidden-mean) or (* visible hidden-sample)?

HIDDEN-SAMPLING parameter in RBM-TRAINER

temporal rbm

calculate bias+conditioning activation only once per training example

  • State “DONE” [2008-09-10 Wed 15:00]
    :CACHE-STATIC-ACTIVATIONS-P initarg to CHUNK

persistent contrastive divergence

general boltzmann machines

deep boltzmann machines

initialize DBM from DBN

implement annealed importance sampling

unbreak temporal chunks

backprop

VALIDATE-LUMP: set and check size if possible, check inputs

ADD-LUMP: set MAX-N-STRIPES to something saner

redo ->CROSS-ENTROPY

  • CLOSING NOTE [2014-01-20 Mon 13:13]
    removed

what to do with INDICES-TO-CALCULATE?

  • CLOSING NOTE [2014-01-20 Mon 13:13]
    removed

normalized-lump: scale should be per stripe

  • CLOSING NOTE [2015-01-21 Wed 21:08]
  • CLOSING NOTE [2015-01-21 Wed 21:09]

Doesn’t work well with dropout or RBMs. Why?

  • lagging-average-gradients is too imprecise
  • lagging-average-gradients covers the whole network, not the
    particular dropout subnetworks that correspond to the input
    examples in a batch

unroll

unroll only part of the network

example for unrolling with missing values

fix or remove missing value support

removed

unroll factored clouds

flip chunk

When an input chunk is to be reconstructed, it should go above the layer in the BPN instead of below it, where it sits in the BM.

gaussian processes

implement gaussian processes

  • CLOSING NOTE [2013-03-05 Tue 16:33]

misc

Doesn’t seem to speed things up.

remove dependency on BLAS: implement some of matlisp in Lisp

Ripped matrix.lisp and various bits from Matlisp.

what to do with USE-BLAS?

  • CLOSING NOTE [2013-03-05 Tue 16:31]
    The wrappers should calculate cost and call matlisp if it is
    available.
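
A sketch of that dispatch under stated assumptions (M*, LISP-M*, MATLISP-M* and the threshold are all hypothetical; the real bridge to matlisp’s matrix types is elided):

  ;; Hypothetical sketch, not the library's API. Assumes double-float
  ;; arrays. The wrapper estimates the cost of the multiplication and
  ;; only takes the BLAS path when matlisp is loaded and the problem
  ;; is large enough to amortize the call overhead.
  (defvar *use-blas* t)
  (defvar *blas-cost-threshold* (expt 2 14))

  (defun lisp-m* (a b)
    "Naive pure lisp fallback for matrix multiplication."
    (let* ((m (array-dimension a 0))
           (k (array-dimension a 1))
           (n (array-dimension b 1))
           (c (make-array (list m n) :element-type 'double-float
                                     :initial-element 0d0)))
      (dotimes (i m c)
        (dotimes (j n)
          (dotimes (l k)
            (incf (aref c i j) (* (aref a i l) (aref b l j))))))))

  (defun matlisp-m* (a b)
    ;; Stub so the sketch is self-contained; a real version would
    ;; convert to matlisp matrices and call its GEMM.
    (lisp-m* a b))

  (defun m* (a b)
    (let ((cost (* (array-dimension a 0) (array-dimension a 1)
                   (array-dimension b 1))))
      (if (and *use-blas*
               (find-package '#:matlisp) ; is matlisp loaded?
               (> cost *blas-cost-threshold*))
          (matlisp-m* a b)
          (lisp-m* a b))))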

float vector I/O for cmucl

  • CLOSING NOTE [2013-03-05 Tue 19:39]

support more lisps

  • CLOSING NOTE [2013-03-05 Tue 16:32] Matlisp only supports cmucl, sbcl and allegro.

optimize the missing value case (SSE?)

lookup table based exp/sigmoid

investigate SPARTNS

  • CLOSING NOTE [2013-03-07 Thu 21:42]

investigate LISP-MATRIX

  • CLOSING NOTE [2013-03-07 Thu 21:42]

investigate LLA

  • CLOSING NOTE [2013-03-05 Tue 16:31]

factor out most frequent and log-likelihood-ratio based feature selection

parallelize with threads and/or across different images

add high level interface (scikit?)

use cuda

examples

movie review example

netflix example

  • CLOSING NOTE [2015-01-21 Wed 21:09]