Fully Trainable SSMs

Causal 1D SSMs

A causal 1D SSM is defined as

$$ x_t = Ax_{t-1} + B u_t, \quad y_t = C x_t + D u_t $$

where $u$, $x$ and $y$ are the input, state and output vectors, respectively. In many works, the state matrix $A$ is fixed to a certain constant matrix, say the HiPPO matrix. Yet $A$ is the most important matrix in an SSM: it accounts for the model's long-term memory, and we should learn it from data. There are two modal decompositions suited to highly efficient, fully trainable SSM learning.
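For concreteness, here is a minimal, plain-PyTorch sketch of the recurrence above (the function name and tensor shapes are illustrative assumptions, not this repository's API):

```python
import torch

def ssm_scan(A, B, C, D, u):
    """Run x_t = A x_{t-1} + B u_t, y_t = C x_t + D u_t over a sequence.

    Shapes: A (n, n), B (n, m), C (p, n), D (p, m), u (T, m) -> y (T, p).
    """
    x = torch.zeros(A.shape[0], dtype=u.dtype)
    ys = []
    for u_t in u:
        x = A @ x + B @ u_t          # state update
        ys.append(C @ x + D @ u_t)   # readout
    return torch.stack(ys)
```

A dense $A$ makes this scan expensive and the parameterization redundant, which is what motivates the two modal decompositions below.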

Modal decomposition I

The eigenvalue decomposition (EVD) form. With the EVD $A = V\Lambda V^{-1}$, we can introduce a (generally) complex state vector $V^{-1}x$ to diagonalize the state matrix. This is the easiest way to learn $A$, and training is fast. However, it may waste some degrees of freedom.
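A minimal sketch of such a diagonal, complex-mode SSM in PyTorch follows; the class name, parameter names and initialization are assumptions for illustration only, not the repository's code.

```python
import torch

class ComplexDiagonalSSM(torch.nn.Module):
    """EVD-form SSM: the state matrix is a learnable complex diagonal,
    so the state update is elementwise, z_t = lambda * z_{t-1} + B u_t."""

    def __init__(self, n_state, n_in, n_out):
        super().__init__()
        # poles initialized inside the unit disc with random phases (an assumed init)
        self.lam = torch.nn.Parameter(torch.polar(torch.full((n_state,), 0.9),
                                                  3.14 * torch.rand(n_state)))
        self.B = torch.nn.Parameter(torch.randn(n_state, n_in, dtype=torch.cfloat) / n_in**0.5)
        self.C = torch.nn.Parameter(torch.randn(n_out, n_state, dtype=torch.cfloat) / n_state**0.5)
        self.D = torch.nn.Parameter(torch.zeros(n_out, n_in))

    def forward(self, u):                       # u: (T, n_in), real-valued
        z = torch.zeros(self.lam.shape[0], dtype=torch.cfloat)
        ys = []
        for u_t in u:
            z = self.lam * z + self.B @ u_t.to(torch.cfloat)   # elementwise state update
            ys.append((self.C @ z).real + self.D @ u_t)        # real part of the readout
        return torch.stack(ys)
```

The possible waste of degrees of freedom arises because a real $A$ has complex eigenvalues in conjugate pairs, so a generic complex parameterization carries redundant parameters unless conjugate symmetry is enforced.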

Modal decomposition II

Another modal decomposition sticks strictly to the domain of real numbers. Suppose $A$ has a pair of complex-conjugate eigenvalues and eigenvectors: $A (v_R \pm j v_I) = (\lambda_R \pm j\lambda_I)( v_R \pm jv_I)$, where $j=\sqrt{-1}$. We can rewrite this as

$$ A[v_R, v_I]= [v_R, v_I] \begin{bmatrix} \lambda_R & \lambda_I \\ -\lambda_I & \lambda_R \end{bmatrix} $$

This suggests that we can always block-diagonalize $A$, and those $2\times 2$ blocks will take either of the following forms

$${\rm real \, modes:}\; \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}, \quad {\rm complex \, modes:}\; \begin{bmatrix} \lambda_R & \lambda_I \\ -\lambda_I & \lambda_R \end{bmatrix}$$
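As an illustration (my own sketch, not the repository's code), the complex-mode blocks of the block-diagonal state matrix can be assembled from per-mode $(\lambda_R, \lambda_I)$ pairs as follows; real modes are simply scalar diagonal entries and are omitted for brevity:

```python
import torch

def blockdiag_state_matrix(lam_R, lam_I):
    """Assemble A from k learnable 2x2 blocks [[lam_R, lam_I], [-lam_I, lam_R]].

    lam_R, lam_I: tensors of shape (k,); returns A of shape (2k, 2k).
    """
    k = lam_R.shape[0]
    A = torch.zeros(2 * k, 2 * k)
    for i in range(k):
        A[2 * i,     2 * i]     = lam_R[i]
        A[2 * i,     2 * i + 1] = lam_I[i]
        A[2 * i + 1, 2 * i]     = -lam_I[i]
        A[2 * i + 1, 2 * i + 1] = lam_R[i]
    return A
```

In practice the dense matrix never needs to be materialized: each $2\times 2$ block acts only on its own pair of state variables, so the state update remains cheap while all parameters and states stay real.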

There are some rough notes here.

Some features

  1. It supports sample rate conversion (see the resample_up and resample_down settings).

  2. It can enforce stability by pulling the poles into the unit disc via

    $$\lambda \rightarrow \lambda/\sqrt{|\lambda|^2 + 1}$$

    By default, stability is not enforced, so the model could become unstable on long sequences. A sketch of this transform follows the list.

  3. The scales of the matrices $B$ and $C$ may need manual tuning for optimal performance; their default scales could be too large for long sequences.
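A one-line sketch of that stability map, written for the real-mode parameterization where $|\lambda|^2 = \lambda_R^2 + \lambda_I^2$ (the function name is illustrative):

```python
import torch

def pull_into_unit_disc(lam_R, lam_I):
    """Map lambda -> lambda / sqrt(|lambda|^2 + 1); the result always has
    magnitude strictly below 1, so the recurrence cannot blow up."""
    scale = torch.rsqrt(lam_R**2 + lam_I**2 + 1.0)
    return lam_R * scale, lam_I * scale
```

Applying this map to the raw, unconstrained parameters at every forward pass keeps the learned poles inside the unit disc while leaving them freely trainable.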

Comparison against Mamba and Attention

On a simple language modeling problem, we compare the training loss perplexities of our complex-mode SSM, our real-mode SSM, Mamba, and attention:

[Figure: training perplexity curves comparing the complex SSM, real SSM, Mamba, and attention]
