
How to have custom updates for parameters #90

Open · albertz opened this issue Jan 5, 2022 · 4 comments

albertz commented Jan 5, 2022

Common use cases:

  • running statistics, for example for batch norm (How to implement batch norm #89); see the sketch after this list
  • weight decay (executed before the gradient update)
  • constraints, renormalization of variables (executed after the gradient update)
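
For the running-statistics case, a minimal sketch in plain TensorFlow (eager mode, just to illustrate what such a custom per-step update means; the shapes and the momentum value are made-up examples, not part of the proposed API):

import tensorflow as tf

# Non-trainable parameter holding a running mean (e.g. for batch norm).
running_mean = tf.Variable(tf.zeros([64]), trainable=False, name="running_mean")
momentum = 0.99  # arbitrary example value

def update_running_mean(x):
    # x: current mini-batch of shape [batch, 64].
    batch_mean = tf.reduce_mean(x, axis=0)
    # Custom per-step update: exponential moving average, not driven by any loss.
    running_mean.assign(momentum * running_mean + (1.0 - momentum) * batch_mean)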

albertz commented Jan 5, 2022

Note on parameters:

Trainable parameters are usually updated by the optimizer for the defined losses. This usually happens in a loop over mini-batches from some dataset (or an infinite data source), where the optimizer performs an update on the parameters each step.

The custom update discussed here would be a custom update per step (i.e. per mini-batch). This might only make sense for non-trainable parameters.
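
To make the distinction concrete, a rough sketch of such a training loop in plain TensorFlow (eager mode; the model, dataset and momentum here are made up for illustration): the optimizer updates the trainable parameter from the loss gradient, while the non-trainable parameter gets its own custom update in the same step.

import tensorflow as tf

w = tf.Variable(tf.random.normal([8, 8]))            # trainable, updated by the optimizer
stat = tf.Variable(tf.zeros([8]), trainable=False)   # non-trainable, gets a custom update per step
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for x, y in dataset:  # dataset of mini-batches, assumed to exist
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((tf.matmul(x, w) - y) ** 2)
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))              # optimizer step for the trainable parameter
    stat.assign(0.99 * stat + 0.01 * tf.reduce_mean(x, axis=0))  # custom per-step update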

Parameters (variables) are created via nn.Parameter and behave just like layers or layer refs.

Due to possible transformations on parameters (e.g. weight norm #91), other code might not always get an actual nn.Parameter instance but some other nn.Tensor. However, when you want to do custom updates on a parameter, you likely marked it as auxiliary, and then it should not have been transformed (I assume...).

So maybe the auxiliary flag means exactly that: we (might) have a custom update.
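
As a rough sketch of how such an auxiliary parameter could be created with the nn API (the exact nn.Parameter signature and the auxiliary argument here are assumptions, not confirmed in this thread):

from returnn_common import nn

feat_dim = nn.FeatureDim("feat", 64)
# Auxiliary (non-trainable) parameter, a candidate for a custom per-step update.
running_mean = nn.Parameter([feat_dim], auxiliary=True)  # exact signature is an assumption here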

The custom update might be conditional, e.g. only done in training (#18).

In the computation graph, when should the update be done? E.g. when other code reads the variable, should we make sure that it happens before or after the update? Or would this be defined by the ordering of the code / execution, i.e.:

p = nn.Parameter(...)
nn.print(p)  # old
p.assign(...)   # custom update (draft API)
nn.print(p)  # new

Is this also well defined when done in a loop or cond (#24)? As mentioned before, it is probably common to make this dependent on the train flag (#18) but it could also depend on other things.
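
One way to look at the ordering question: in TF graph mode, such read-before/read-after semantics would have to be pinned down with control dependencies. A minimal sketch of the underlying TF mechanism (TF1-style graph code, not the proposed nn API):

import tensorflow as tf

tf.compat.v1.disable_eager_execution()
v = tf.compat.v1.get_variable("v", shape=[], initializer=tf.compat.v1.zeros_initializer())
read_old = tf.identity(v)                      # read, forced to happen before the update
with tf.compat.v1.control_dependencies([read_old]):
    update = tf.compat.v1.assign_add(v, 1.0)   # the custom update
with tf.compat.v1.control_dependencies([update]):
    read_new = tf.identity(v)                  # read, forced to happen after the update

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run([read_old, read_new]))      # e.g. [0.0, 1.0]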

Should we allow multiple assigns? This might get tricky together with loop or cond.

Can an auxiliary parameter also be trainable, i.e. also be updated by some optimizer? Again the question would be about the order.


albertz commented Jan 5, 2022

Parameter.assign would wrap tf.assign. There should also be Parameter.assign_add, assign_sub, etc.

What would be the return value? Maybe just None. This is basically related to the question of when this is actually executed, i.e. how the order is defined.
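
A hypothetical sketch of such a wrapper, just to illustrate the idea of delegating to the TF assign ops and returning None (this is not the actual returnn_common implementation; the class name is made up):

import tensorflow as tf

class _ParamSketch:
    """Hypothetical wrapper around a tf.Variable for the draft assign API."""

    def __init__(self, var: tf.Variable):
        self._var = var

    def assign(self, value) -> None:
        self._var.assign(value)       # wraps tf.Variable.assign (tf.assign in TF1)

    def assign_add(self, value) -> None:
        self._var.assign_add(value)

    def assign_sub(self, value) -> None:
        self._var.assign_sub(value)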


albertz commented Mar 21, 2022

In case this is used in addition to the loss, the question is in what order we apply both updates, and, however the custom update is calculated, whether it should depend on the gradient update or not. When this is not enforced, the order will be non-deterministic, which is maybe not a good idea.

The constraint of a tf.Variable might be a nice and clean generic way. It is also used by the optimizer, with the right control dependencies, after the variable was updated by the optimizer. The constraint is formulated as a transformation old -> new and executed by this TF code:

class _DenseResourceVariableProcessor(_OptimizableVariable):
  ...
  def update_op(self, optimizer, g):
    ...
    update_op = optimizer._resource_apply_dense(g, self._v)
    if self._v.constraint is not None:
      with ops.control_dependencies([update_op]):
        return self._v.assign(self._v.constraint(self._v))
    else:
      return update_op

This is for the case that the optimizer updates the variable. Otherwise we need to take care of it explicitly.
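
For that explicit case, a minimal sketch in plain TF of how the same constraint mechanism could be applied by hand (the renormalization constraint here is just an example):

import tensorflow as tf

# Constraint: renormalize the variable to unit L2 norm after each update.
v = tf.Variable(tf.random.normal([8]), constraint=lambda x: tf.math.l2_normalize(x))

v.assign_add(tf.random.normal([8]))  # some custom update
if v.constraint is not None:
    v.assign(v.constraint(v))        # apply the constraint explicitly, mirroring the optimizer code above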

This is maybe still a bit too restrictive. An explicit assign_add or so might be better.

Also, there are probably different use cases where the custom update must first wait for some other variable update, or must be executed before some other variable is updated. When we don't handle this explicitly, the order will be arbitrary and non-deterministic, which is maybe not a good idea, unless the update is totally independent from all other variables, which might be the case for many use cases (e.g. applying weight decay).


albertz commented Nov 14, 2022

This issue on the RETURNN side would allow for something like this: rwth-i6/returnn#1214
