RETURNN layers with hidden state should make it explicit #31
Comments
Implemented partly now. All recurrent layers do return a tuple `(output, state)` now.
This might need to have special implementations for each case. Not sure.
Another issue: some layers (e.g. `window` or `cumsum`) can be used both in a recurrent way (per step, inside a loop) and in a non-recurrent way (on a whole sequence). Maybe we can make two versions of the module, one which is not recurrent and one which is recurrent. It is also unclear whether this should be automatically generated, or whether we explicitly handle these cases.
We should be a bit more specific and list the layers or modules which need custom hidden state. We should also list the layers where we might consider non-recurrent variants. This should be collected by editing the main post.
It's not totally obvious. You could say state = output (or h = output), but that is the case only inside the loop.
I introduced the explicit state now. By convention, I suggest that the return in such a case should be a tuple, and the last item should always be this state.
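A minimal runnable illustration of that convention (all names here are hypothetical, with toy arithmetic standing in for a real LSTM, not the actual implementation): the state can be structured, and it is always the last item of the returned tuple, so callers can unpack it uniformly.

```python
from collections import namedtuple

# Structured state for an LSTM-like module: hidden output h and cell state c.
LstmState = namedtuple("LstmState", ["h", "c"])


def toy_lstm_step(x, prev_state):
    # Stand-in arithmetic; a real LSTM would compute gates etc.
    c = 0.5 * prev_state.c + x
    h = 0.5 * c
    return h, LstmState(h=h, c=c)  # output first, state always last


state = LstmState(h=0.0, c=0.0)
for x in [1.0, 2.0, 3.0]:
    out, state = toy_lstm_step(x, state)
```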
Passing the state explicitly per frame is currently not possible for such layers. So we need a new way on the RETURNN side to allow passing the state per frame on such layers, via a new layer option. But we also want to be able to make use of the RETURNN rec layer automatic optimization, which is tricky as well. We need to figure out whether the layer uses its own prev state as input. However, at the time we call the module (layer maker), it cannot know this; only when it returns, and when we see how the returned state is actually used. So this means we need to delay the layer dict creation, or post-edit it to apply the optimization and remove the explicit state again. This is not only for the LSTM but for all layers with rec hidden state, and we can recursively go through the created network dict to do such a post-edit.
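A hedged sketch of that post-edit idea, assuming a hypothetical per-layer `"state"` option (no such option exists in RETURNN as-is; the comment above is about introducing something like it): if the explicit state only refers to the layer's own previous state, drop it again so the default behavior applies and the automatic optimization is not blocked.

```python
def postedit_remove_default_state(net_dict: dict) -> dict:
    """Recursively drop a hypothetical explicit "state" option when it just
    points to the layer's own previous state ("prev:<name>"), i.e. when it
    matches the default behavior anyway."""
    for name, layer in net_dict.items():
        if not isinstance(layer, dict):
            continue
        if layer.get("state") == "prev:" + name:
            del layer["state"]  # equivalent to the default; keep it optimizable
        unit = layer.get("unit")
        if isinstance(unit, dict):  # rec layer with an explicit subnetwork
            postedit_remove_default_state(unit)
    return net_dict


example = {
    "lstm": {"class": "rec", "unit": "lstm", "from": "data", "state": "prev:lstm"},
    "output": {"class": "softmax", "from": "lstm"},
}
postedit_remove_default_state(example)
```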
Fix `test_rec_inner_lstm`. Also see #31.
Note that commit 15d67a9 implements the mentioned optimization now, more or less just as outlined. We still cannot pass the state explicitly per frame on the RETURNN side, though.
Another problem: the behavior of such modules (e.g. `window` or `cum_concat`) differs depending on whether they operate on a single step (inside a loop) or on a whole sequence.

So, how do we figure out which case we have, single step or sequence? We could simply check whether there is an outer loop context. However, that means that we could never use the sequence-level behavior inside a loop. Note that RETURNN determines this based on the input: if the input has a time axis, it will operate on the sequence, otherwise it will do a single step. However, to be able to replicate such logic here, we need the shape information (#47). Or we make it more explicit by having two variants here (just like PyTorch, with e.g. `LSTM` vs `LSTMCell`).
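For comparison, PyTorch already makes exactly this split: `torch.nn.LSTM` operates on a whole sequence, while `torch.nn.LSTMCell` does one step with explicit state (this is the real PyTorch API; only the toy sizes below are made up).

```python
import torch

# Sequence variant: consumes the whole (time, batch, feature) tensor at once.
lstm = torch.nn.LSTM(input_size=10, hidden_size=20)
seq = torch.randn(7, 3, 10)
seq_out, (h_n, c_n) = lstm(seq)  # default zero initial state

# Per-step variant: explicit state in, explicit state out, one frame at a time.
cell = torch.nn.LSTMCell(input_size=10, hidden_size=20)
h_t = torch.zeros(3, 20)
c_t = torch.zeros(3, 20)
for x_t in seq:
    h_t, c_t = cell(x_t, (h_t, c_t))
```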
I tend to the solution of just having separate modules for the two cases. The question is then also what to do about the other recurrent layers.
Note that commit 5f590e2 now implements a distinction between the relevant options.
If we go with two variants for rec modules, one for the per-step operation (inside a loop), another one for operating on a sequence (outside a loop, although you could also use it inside if there is another separate time axis maybe), then we need concrete suggestions for the two variants; one possible shape is sketched below.
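A hedged sketch of one way the two variants could relate (all names hypothetical, toy arithmetic instead of a real LSTM, not the actual API): the sequence variant wraps the per-step variant and threads the state through it, so the parameters live in one place.

```python
from typing import List, Optional, Tuple


class LstmStepSketch:
    """Hypothetical per-step variant: explicit state in, (output, state) out."""

    def default_initial_state(self) -> float:
        return 0.0

    def __call__(self, frame: float, *, prev_state: float) -> Tuple[float, float]:
        state = 0.5 * prev_state + frame  # toy recurrence, not a real LSTM
        return state, state


class LstmSketch:
    """Hypothetical sequence variant: wraps the step variant, shares its parameters."""

    def __init__(self) -> None:
        self.step = LstmStepSketch()

    def __call__(self, seq: List[float], *, initial_state: Optional[float] = None):
        state = self.step.default_initial_state() if initial_state is None else initial_state
        outputs = []
        for frame in seq:
            out, state = self.step(frame, prev_state=state)
            outputs.append(out)
        return outputs, state  # final state is again the last item


outputs, last_state = LstmSketch()([1.0, 2.0, 3.0])
```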
OK, this is mostly done now.
As it was discussed in #16, RETURNN layers with (hidden) state (e.g. `RecLayer` with `unit="lstm"`) should make the state explicit in the API. E.g. the `Rec` module should get two arguments `input` and `prev_state` and return `output` and `state`. So the usage would look like the first part of the sketch below when used in a loop, and like the second part outside a loop (using the default initial state, ignoring the last state).
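A hedged sketch of those two usage patterns (the names `input`/`prev_state`/`output`/`state` come from this issue; the `Lstm` class below is only a toy stand-in so the example runs, and its sequence-vs-step dispatch is exactly the open question discussed in the comments above):

```python
class Lstm:
    """Toy stand-in for a rec module with explicit state (not the real module)."""

    def default_initial_state(self) -> float:
        return 0.0

    def __call__(self, source, *, prev_state=None):
        state = self.default_initial_state() if prev_state is None else prev_state
        if isinstance(source, list):  # whole sequence (outside a loop)
            outputs = []
            for frame in source:
                out, state = self(frame, prev_state=state)
                outputs.append(out)
            return outputs, state
        state = 0.9 * state + source  # single step: toy recurrence, not a real LSTM
        return state, state


lstm = Lstm()
seq = [1.0, 2.0, 3.0]

# Inside a loop: the state is threaded through explicitly.
state = lstm.default_initial_state()
for frame in seq:
    output, state = lstm(frame, prev_state=state)

# Outside a loop: default initial state, last state ignored.
output, _ = lstm(seq)
```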
This applies for all RETURNN layers with rec hidden state, and further modules like `Lstm`. See RETURNN layers with rec hidden state.
Relevant modules here:
- `_Rec` based, e.g. `Lstm` (only one so far)
- `window`
- `cumsum`
- `ken_lm_state`
- `edit_distance_table`
- `unmask`
- `_TwoDLSTM`
- `cum_concat`