Make training loop and stages explicit? #96
Why would you actually want that? I get the feeling that returnn-common should now do much more than originally intended (being a nice interface for network construction & sharing modules). It feels like we would end up with 2 completely distinct interfaces, where one has to be kept compatible so that the other one can be translated into it (translating from returnn-common into a config file). I do not think this is a good idea at all.
For the case that the user wants to define a custom optimizer, or any other custom update rule. Sure, the user could implement that again in pure TF and then import it. But one main idea of returnn-common is that we do not want to mix in custom TF code (with custom layers, or custom ...).

So, again, this is now about having a custom optimizer. This is an optional thing. When you just want to use some of the existing ones, it is all fine. This is only one small part of the issue here. This issue is also about allowing for more custom training loops, or just making the training loop explicit. Again it is about explicit vs implicit.
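To make concrete what "implement that again in pure TF and then import it" would mean, here is a rough, purely hypothetical sketch of a custom update rule written directly against TF1-style graph ops (none of these names come from RETURNN or returnn-common):

```python
import tensorflow as tf

# Hypothetical custom update rule (gradient-norm-scaled SGD), written in plain
# TF1-style graph code, just to illustrate the kind of thing a user would
# otherwise have to implement outside of returnn-common and import.
def custom_update_op(params, grads, lr=0.1):
    update_ops = []
    for param, grad in zip(params, grads):
        scale = lr / (tf.norm(grad) + 1e-8)  # scale the step by the inverse grad norm
        update_ops.append(tf.compat.v1.assign_sub(param, scale * grad))
    return tf.group(*update_ops)
```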
I don't exactly understand. This issue here is about having the training loop or epoch loop explicit, and also maybe having the param update explicit. These are all things which are not really possible in RETURNN itself currently. So these are strict extensions.
Maybe this issue here is just formulated very unclearly. I do not see where the difference to #59 and #93 is. Having epoch-based training stages is already possible (it is very easy to map returnn_common outputs to `get_network`; our current pipeline already does that), and for step-based changes (besides LR stuff) huge changes in RETURNN would be needed, and I would not be in favor of focusing on this, as this is not really a priority. Maybe you can clarify here what additional logic you want to have, I do not understand that yet.
But this issue here is about making the epoch loop explicit, and then at some later point allow other schemes as well. Currently the high-level logic in RETURNN always looks like this:

```python
for epoch in range(num_epochs):
    # (optional: new network, new step function)
    for batch in dataset.get_batches():
        session.run(step)
```

This is fine for many of the simpler applications and training schemes, but not everything can fit into this scheme, or it would be quite unnatural or inefficient this way. For example (really just a random example):

```python
for epoch in range(num_epochs):
    # get alignments for the whole dataset (while not updating the model)
    for batch in dataset.get_batches():
        alignments.update(session.run(get_alignments()))
    # now update the model using the alignments
    for batch in dataset.get_batches():
        session.run(step)
```

I'm also thinking about reinforcement learning (#19, #20) (see sequence training as a special case). Or maybe GAN-style training. Or some meta learning. There are many other custom training schemes as well. And while we have not really used them much, they are definitely also used for ASR/NLP applications. The reason we did not use them much is more that it was not easy for us to do so.

Or maybe you don't really want the outer loop over epochs at all, and just have some dataset (maybe also using ...):

```python
for step_idx, batch in enumerate(train_dataset.get_batches()):
    session.run(train_step)
    if step_idx % N == 0:  # always after N steps
        eval_model(dev_dataset)
        learning_rate_update()
        store_model()
```

Or anything else like this. To be able to be flexible on this, it needs to be explicit. If the epoch loop or step loop is not explicit, it becomes either impossible or very difficult to have custom logic there. Our current approach is to try to fit everything into the scheme above, and the epoch loop and step loop are implicit (not visible in the user config). Note that this is all technically quite simple, esp. when it is explicit.
I'm not sure if you mean changes for a single step (within the step loop), or changes to the step-loop logic itself. In the first case, within the step loop, I don't understand the concern; nothing needs to be changed there. In the case of changing the step-loop logic, yes, this needs an extension on the RETURNN side, but this is a very simple extension. Definitely no huge changes are needed.
Yes, maybe not for the first release (#32), and maybe also not for any intermediate usage (#98). However, when we later want to extend this to make this part flexible and explicit, I wonder if this makes it unnatural then. Because later we do want the current code to still work and not break it anymore. So somehow, when no explicit loop is defined, it must fall back to the standard epoch/step loop, and otherwise use the explicit loop. But maybe this is not a problem.
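For the fallback behavior, something along these lines would probably be enough on the RETURNN side (a rough sketch; the config key `custom_training_loop` is made up here, not existing RETURNN API):

```python
# Hypothetical dispatch inside RETURNN: if the config defines an explicit
# training loop, run that; otherwise fall back to the standard epoch/step loop.
custom_loop = config.typed_value("custom_training_loop", None)  # hypothetical config key
if custom_loop is not None:
    custom_loop(engine)   # user-defined explicit loop gets full control
else:
    engine.train()        # existing implicit epoch/step training loop
```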
> ```python
> for epoch in range(num_epochs):
>     # get alignments for the whole dataset (while not updating the model)
>     for batch in dataset.get_batches():
>         alignments.update(session.run(get_alignments()))
>     # now update the model using the alignments
>     for batch in dataset.get_batches():
>         session.run(step)
> ```

But this is a really good example why your approach is maybe a little bit of a dead end here, at least from the perspective of how we structure experiments at i6. Our current pipelines already support alternating between alignment extraction and training, but of course with separate RETURNN calls, and I think it should stay like this. We already have the problem that ...

When I understand correctly what you imagine, then it seems to me that directly switching to e.g. PyTorch would be more comfortable, because then you start designing your whole training process as you like with a fully tested and widely used framework, and just control it via some fixed parameters (I actually have this scenario in my daily work with two different PyTorch based repositories). The big advantage of RETURNN was the easily machine-readable and -writable config format, which can be very flexibly controlled by the Sisyphus pipeline for whatever needs (at least in theory), and now we are going to break this concept with ... But now you want ... So if we would go with your approach, this would mean that the usage of ...
Then just imagine any other example. It was just an example. This is really not the point here. Imagine some reinforcement learning example, or whatever. The point is, there are cases where the standard structure (epoch/step training loop) is not optimal, and it can be useful that the user has flexibility over this. I'm sure you agree on this, right?
But even for this example, I would say it would be very suboptimal to do it like this. So suboptimal that you really would not do it like this in practice. You do not want separate RETURNN calls here. Recompiling the computation graph, reloading the model into GPU memory, even waiting for a new slot in the cluster: you don't want to have this after every single (sub)epoch. This is not realistic, or would be very slow with lots of wasted computation time. But anyway, let's not focus too much on this example. It was just an example to show that there are cases where you might want to have some more custom training loop.
Discussions regarding Sisyphus hashes should be moved to #51. I don't think this is really a big issue. For any other issues with Sisyphus integration, please open separate issues. I don't think there are any issues. I'm not aware of any. If you think there are, as said, please open separate issues. Whatever there is, I think we can easily solve it without much effort. This is somewhat off-topic here in this issue.
Why is this one aspect of having custom training loops suddenly a reason to move over to a separate framework? This is just a single aspect, which should be a really minor change only from the user perspective, to make the training loop somewhat more custom. The usual case is probably that the user started with the standard training loop, has a working setup, and then wants to do some experiment which requires a custom training loop (e.g. some sort of sequence training).
I don't understand. How? This issue here is just about the possibility to define a custom training loop. So, yes, there is the question on having that explicit or allow for the implicit default of using our standard training loop. This was one of the open questions here. So I understand that you argue that you want to have the implicit default? Sure, we can do that. So then this is just optional.
Yes? How is this related to this issue? Where is the problem?
How does this become problematic? RETURNN was from the very beginning intended to be a flexible generic framework. And in many aspects, it is that. Having a custom training loop is not there yet, but really this is not such a big thing as you maybe imagine currently. And again, as said, this can be an optional thing. In RETURNN itself, it has to be optional because we don't want to break anything.
No, not really. First of all, returnn_common is not really so complex. I'm not sure what you mean. If you think something can be simplified, please make a separate issue, and be more specific. Then, they also don't need to be in sync. RETURNN has the strict principle to stay backward compatible. So for a fixed returnn-common version, you can always update RETURNN to the latest version. Once we declare the returnn-common API as stable (e.g. for the first release), we will have the same guarantee here as well. If there are any issues about this, also make a separate issue. There should not be the requirement that they are in sync.
I don't understand what you mean here. What approach are you referring to? Is this related to this issue? If this is not about this issue here, please open a separate issue. We should not mix up things.
I just did not want to open 10 different issues that are somewhat related to this issue here, but put down my thoughts that came up when reading this issue here before I forget about them. I agree that most things are somewhat off-topic. My replies to some of your questions would be far more off-topic now, so I guess it is best if we take this offline (I think @mmz33 already organized a meeting for this).
As we want this to be optional, it can be extended later, so it is not critical for the first release.
As another source of inspiration, see PyTorch Lightning (example). There, your root module is derived from a special class.
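Roughly, the Lightning pattern looks like this (a minimal sketch from memory of the PyTorch Lightning API; details may differ between versions):

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        # Defines only what happens per step; the epoch/step loops are owned
        # by the Trainer, not by the user code.
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x.flatten(1)), y)

    def configure_optimizers(self):
        # The optimizer / update rule is also part of the module definition.
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(LitModel(), train_dataloader)
```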
All the iterative-looking logic currently being implemented (like ...) defines what is being calculated per-step. So it assumes one outer loop over the steps.
Actually, you can also see this more as a model definition. What actually is done per-step is the model update in training via the optimizer, which is separated here. Actually the optimizer is also not really part of returnn-common yet.
In any case, now mixing this with logic which is done at initialization (also depending on whether this is a new run, starting in epoch 1, or a continued run, loading a previous checkpoint), and with logic done per-epoch, can maybe cause confusion.
Maybe we should make this logic more explicit, i.e. define which part of the calculation is executed in which context. Maybe similar to our `Loop` (#16) logic with a context manager. At initialization, we would also handle parameter initialization (#59), or maybe also custom checkpoint loading (#93).
Some suggestion:
Old suggestion:
This however is not so optimal, because param init would naturally probably already happen inside the `Model` constructor. Basically everything before the training loop would be part of the init context, just like we also have it for `Loop`. Also, having the epoch loop behind the step loop does not follow the natural logic; the step loop should be inside the epoch loop.
(This was suggested as part of #93 on the model checkpoint load and store logic.)
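As a purely hypothetical illustration of the structure this critique points towards (step loop nested inside the epoch loop, param init happening implicitly in the `Model` constructor); the names `nn.EpochLoop` and `nn.StepLoop` are made up here and not actual returnn-common API:

```python
from returnn_common import nn

# Hypothetical API sketch, not actual returnn-common code.
model = Model()  # param init (#59) / checkpoint loading (#93) happen here, so
                 # everything before the loops is implicitly the init context

with nn.EpochLoop() as epoch_loop:        # explicit epoch loop (hypothetical)
    with nn.StepLoop() as step_loop:      # step loop nested inside the epoch loop
        loss = model(step_loop.batch)
        loss.mark_as_loss()
    # per-epoch logic (LR schedule, checkpointing, ...) would go here
```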