added zoneout to rec.py #86
Conversation
I wonder whether we should rather make this an option to the `LSTM` module? Also relevant is the handling of the train flag (#18).
I don't really have a strong opinion on this; adding it as an option would be fine with me too.
From my understanding, this right now is just how it is in RETURNN.
This is just a RETURNN-specific detail. Actually, on the RETURNN side, we maybe should change that as well, and make it an option to e.g. NativeLstm2 or so. I'm a bit afraid that we end up with different LSTMs which are basically all the same, except that the underlying implementation supports a different feature set (zoneout, layer norm, recurrent dropout, weight dropout, whatever other things), and at some later point we would extend some of the implementations (e.g. NativeLstm to include zoneout and layer norm). I'm not sure...
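For context: zoneout just stochastically keeps units of the previous recurrent state instead of computing fresh ones, so it really is "the same LSTM" plus one extra state update. A minimal NumPy sketch of that update, purely illustrative and not RETURNN's implementation:

```python
import numpy as np

def zoneout_update(prev_state, new_state, rate, train):
    """Zoneout state update (Krueger et al., 2016), illustrative sketch.

    With probability `rate`, each unit keeps its previous value;
    at inference time this becomes a deterministic interpolation.
    """
    if train:
        keep = np.random.random_sample(prev_state.shape) < rate
        return np.where(keep, prev_state, new_state)
    return rate * prev_state + (1.0 - rate) * new_state
```

Note how the train/inference distinction shows up directly in the update, which is exactly why the train-flag handling (#18) is relevant here.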
Well, yes, that's what I mean, that is the problem. You just ignore this here. This could be wrong, depending on how we handle #18.
This is actually a really good point. If you want to track this, I can open an issue in the RETURNN repo so it doesn't get lost. Do you have a suggestion for how to do this as explicitly as possible? Passing the unit string through is something I am not a fan of. Maybe adding flags for what to include (which might be mutually exclusive due to the nature of the concepts) and for now choosing the unit based on that? That should then be easy to adapt later when the RETURNN side changes. But maybe there is a better way.
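A rough sketch of that explicit-flags idea (the helper name and signature are hypothetical, not code from this PR):

```python
def _choose_unit(*, zoneout: bool = False) -> str:
    """Hypothetical helper: derive the RETURNN unit string from explicit flags.

    Keeping the flags explicit means the public interface could stay stable
    even if RETURNN later merges these features into one native implementation.
    """
    return "zoneoutlstm" if zoneout else "nativelstm2"
```

`"nativelstm2"` and `"zoneoutlstm"` are existing RETURNN unit names; further (mutually exclusive) flags would extend the mapping in the same way.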
I was assuming that the […]
Yes, that was what I was thinking. We can directly try to do it in a clean way here, wrap the current RETURNN logic, and adapt when RETURNN is extended. (On the RETURNN side, I have wanted to implement a new extended NativeLstm for a while, which includes optional zoneout + layer norm + recurrent dropout.)

This is the same line of thought for many things here in returnn-common. It always boils down to the question: what is actually the best and cleanest way or interface? So is it cleaner to have one module with options, or separate modules?

Note that in other situations, we argued that having many options is also not good (e.g. the […]).

What we also want to avoid is that this gets out of hand. E.g. there will be further LSTM variants or options in the future, and we would keep adding more and more options.
Yea, well, I don't know. E.g. if we decide in #18 that we want to be explicit about the train flag, […]. But maybe we can also do that later here in returnn-common. I just wanted to say that #18 can have an influence on this code.
I agree. I pushed an updated version which for now has a […] flag.

For #18, I agree that this will influence this layer, but I think we can then also change this layer after that decision is made, in a separate PR.
So you agree that one single module ([…])?

Yes, in principle, conceptually, many of the things I listed above could be combined in whatever way you like (zoneout, layer norm, recurrent dropout, etc.). That's one argument why it would make sense to have a single module, because we cannot have separate modules for all possible combinations. However, effectively they are currently partly mutually exclusive, because we just lack an implementation which is generic enough to cover all cases. But this could be extended later. Or maybe it's cleaner if different LSTM modeling types (layer norm, zoneout) really get a different module. I don't know.
I should stop working late... I read "Note that in other situations, we argued that having many options is also not good" with "options" as in different classes, not options for one class. My bad. I am not sure if this is a problem here though, because usually you would only call the […]
One benefit of having them in a special class would be that the unit opts could be passed explicitly. Right now I would pass something like […] to the class; having zoneout in a separate class would allow these parameters to be explicit. Writing them all into one […]
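To make that explicitness argument concrete, a sketch (not this PR's actual code; how the surrounding returnn-common wrapper looks is an assumption here):

```python
from typing import Any, Dict

class ZoneoutLSTM:
    """Sketch: a dedicated class surfaces the zoneout opts as named arguments.

    zoneout_factor_cell / zoneout_factor_output mirror the options of
    RETURNN's ZoneoutLSTM cell; the default values are only placeholders.
    """

    def __init__(self, n_out: int,
                 zoneout_factor_cell: float = 0.15,
                 zoneout_factor_output: float = 0.05):
        self.n_out = n_out
        self.unit = "zoneoutlstm"
        # Named constructor arguments instead of an opaque dict at the call site:
        self.unit_opts: Dict[str, Any] = {
            "zoneout_factor_cell": zoneout_factor_cell,
            "zoneout_factor_output": zoneout_factor_output,
        }
```

With one generic class instead, the caller would pass something like `unit_opts={"zoneout_factor_cell": 0.15, ...}` as an opaque dict, which is exactly the implicitness criticized here.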
No, I'm not referring just to reading users' code. Sure, for the user's code it doesn't make much of a difference whether it is `LSTM(..., zoneout=...)` vs `ZoneoutLSTM(...)`.
But I'm speaking about the maintainability of the returnn-common code. I think any functions, classes, modules or layers which get too complex by combining too much functionality are bad and get ugly. And experience tells us that over time they get more and more options. After a while we would have an […]. Having separate modules makes that cleaner. Even separate modules like your initially proposed `ZoneoutLSTM` […]. Maybe […].

In other parts, e.g. also for the Transformer, and in RETURNN in general, our solution to still give the user full flexibility while avoiding complex layers or functions with too many options is to break things down into simple building blocks, such that the user can put them together in whatever way he wants. This sometimes makes it difficult to still provide good efficiency. Our solution to that is automatic optimization in RETURNN, e.g. optimizing layers out of a loop, etc. Sometimes this is not so obvious though. See the long discussion on generalizing and extending self attention in RETURNN (rwth-i6/returnn#391), although I'm very happy now with the result ([…]).

Here in the case of LSTMs, this is tricky again. Sure, the user could already just write down the formula for a zoneout LSTM or whatever other LSTM explicitly, and that would work, so the building blocks are there. But when the user writes down the formulas for a standard LSTM, it would be tricky to automatically convert that into a NativeLSTM.
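To illustrate that last point (plain NumPy, not returnn-common code): writing the cell formulas out explicitly is easy for the user; the hard part is recognizing such hand-written formulas and swapping in the efficient NativeLSTM kernel automatically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, R, b):
    """One standard LSTM step, written out from the textbook formulas.

    x: input (n_in,); h, c: previous hidden/cell state (n_out,);
    W: (n_in, 4*n_out); R: (n_out, 4*n_out); b: (4*n_out,).
    """
    gates = x @ W + h @ R + b
    i, f, g, o = np.split(gates, 4, axis=-1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```

A zoneout variant would only wrap `h_new`/`c_new` in the zoneout update sketched earlier, which is what makes the building-block route workable in principle.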
Yes exactly, that's another thing. In a generic […], we could not make use of the efficient native implementation.

So, to conclude, I would avoid putting all functionality into one single module. Maybe at some point, I will write a new extended NativeLstm.
force-pushed from f1ba359 to d265d2b
Okay, so this is in the PR now. If I understood you right, we will deal with the rest later when (or if) it becomes relevant, so this should be ready now?
Can you rebase (due to conflicts)? Unfortunately the master is currently broken, but I'm working on it.
force-pushed from 91806f6 to 1816e3a
Should be done.
I merged now.
We can just do that in a follow-up PR or commit. It's really work in progress, and some parts of this are not fully decided yet.
See #17 as a starting point. Dim tags are used to distinguish dims explicitly; the dim value (some int) alone would not distinguish them.
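A tiny illustration of the dim-tag idea (assuming returnn-common re-exports RETURNN's FeatureDim/SpatialDim; treat names and signatures as assumptions, not quotes from #17):

```python
from returnn_common import nn

# Each dim is identified by a tag object, not by its size alone,
# so two dims of equal size remain distinguishable.
time_dim = nn.SpatialDim("time")
in_dim = nn.FeatureDim("in", 512)
out_dim = nn.FeatureDim("out", 512)
assert in_dim != out_dim  # same size, still different dims
```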
Similar to `LSTM`, adding `ZoneoutLSTM` to `rec.py`.