How to have custom updates for parameters #90
Note on parameters: trainable parameters are usually updated by the optimizer for the defined losses. This usually happens in a loop over mini-batches of some dataset (or an infinite data source), where the optimizer performs an update on the parameters at each step. The custom update discussed here would likewise be applied per step (i.e. per mini-batch). This might only make sense for non-trainable parameters.

Parameters (variables) are created via `nn.Parameter`. Due to possible transformations on parameters (e.g. weight norm #91), other code might not always really get a `nn.Parameter`. So maybe the parameter itself should provide some API for such an update.

The custom update might be conditional, e.g. only be done in training (#18).

In the computation graph, when should the update be done? E.g. when other code reads the variable, should we make sure that it was updated before or after? Or would this be defined by the ordering of code / execution, as in the sketch below?
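A minimal sketch in plain (eager) TensorFlow, not the returnn-common API under discussion, of what such a per-step custom update could look like next to the regular optimizer step; the names `running_scale`, `decay` and the update rule itself are made up purely for illustration:

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal([16, 8]), trainable=True)  # updated by the optimizer
running_scale = tf.Variable(1.0, trainable=False)            # updated by a custom rule per step
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
decay = 0.99  # illustrative constant


def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = tf.matmul(x, w) * running_scale               # reads the auxiliary parameter
        loss = tf.reduce_mean(tf.square(pred - y))
    # regular optimizer update of the trainable parameter
    opt.apply_gradients([(tape.gradient(loss, w), w)])
    # custom per-step (per mini-batch) update of the non-trainable parameter
    running_scale.assign(decay * running_scale + (1.0 - decay) * tf.reduce_mean(tf.abs(w)))
```

In eager mode the read in the forward pass happens before the assign simply because of statement order; in a manually built graph the same ordering would have to be enforced with control dependencies.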
Is this also well defined when done inside a loop or cond (#24)? As mentioned before, it is probably common to make this dependent on the train flag (#18), but it could also depend on other things (see the sketch below). Should we allow multiple assigns? This might get tricky together with loop or cond. Can an auxiliary parameter also be trainable, i.e. also be updated by some optimizer? Again the question would be about the order.
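As a sketch of the conditional case (again plain TensorFlow rather than the returnn-common API; `aux`, `train_flag` and `new_value` are made-up names), the update could be wrapped in a `tf.cond` on the train flag:

```python
import tensorflow as tf

aux = tf.Variable(0.0, trainable=False)


@tf.function
def maybe_update(train_flag: tf.Tensor, new_value: tf.Tensor) -> tf.Tensor:
    def do_update():
        aux.assign(new_value)  # custom update, only performed in training
        return aux.read_value()

    def no_update():
        return aux.read_value()

    return tf.cond(train_flag, do_update, no_update)
```

Here `train_flag` would be a scalar bool tensor; whether several such assigns inside a loop or cond keep a well-defined order is exactly the open question raised above.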
What would be the return value? Maybe just …
In case this is used in addition to the loss, the question is in what order we apply both updates, and, however the custom update is calculated, whether it should depend on the grad update or not. When this is not enforced, it will be non-deterministic, which is maybe not a good idea.

For comparison, the TF optimizer code handles variable constraints like this:

```python
class _DenseResourceVariableProcessor(_OptimizableVariable):
  ...

  def update_op(self, optimizer, g):
    ...
    update_op = optimizer._resource_apply_dense(g, self._v)
    if self._v.constraint is not None:
      with ops.control_dependencies([update_op]):
        return self._v.assign(self._v.constraint(self._v))
    else:
      return update_op
```

This is for the case that the optimizer updates the variable. Otherwise we need to take care of it explicitly. This is maybe still a bit too restrictive; an explicit assign might be more general.

Also, there are probably different use cases, where the custom update might first have to wait for some other variable update, or must be executed before some other variable is updated. When we don't handle this explicitly, it will be arbitrary and non-deterministic, which is maybe not a good idea, unless the update is totally independent from all other variables, which might be the case for many use cases (e.g. applying weight decay).
This issue on the RETURNN side would allow for something like this here: rwth-i6/returnn#1214
Common use cases: