Description
Currently our implemented constraints are:
- L2 on weights (`L2` option on a layer)
- Some exotic things on activations (`darc1`, `spatial_smoothing`)
We already have the possibility to decouple the constraints from the normal loss computation, via `decouple_constraints`. In #1206, this behavior will change a bit: it will then decouple only the data-independent constraints, i.e. currently only L2.
L2 is equivalent to weight decay when SGD is used. With the new decoupled constraints code (#1206), it explicitly does:
return var.assign_sub(var * (l2 * 2.), use_locking=self.use_locking, read_value=False)
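
To illustrate the equivalence claimed above, here is a minimal pure-Python sketch (not RETURNN code). It assumes the L2 term is `l2 * sum(w**2)`, so its gradient is `2 * l2 * w`, and that the decay is scaled by the learning rate and computed from the pre-update values; how the learning rate enters in #1206 is not visible from the quoted line. The function names are made up for this example.

```python
import math


def sgd_step_coupled(w, grad, lr, l2):
    """One SGD step where the L2 term l2 * sum(w**2) is part of the loss."""
    # d/dw (l2 * w**2) = 2 * l2 * w is added to the data gradient.
    return [wi - lr * (gi + 2.0 * l2 * wi) for wi, gi in zip(w, grad)]


def sgd_step_decoupled(w, grad, lr, l2):
    """Same step, with the decay applied as a separate update on the parameters."""
    # Decay is computed from the pre-update values (assumption of this sketch).
    decay = [wi * (2.0 * l2 * lr) for wi in w]
    w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return [wi - di for wi, di in zip(w, decay)]


w = [0.5, -1.0, 2.0]
grad = [0.1, 0.2, -0.3]
coupled = sgd_step_coupled(w, grad, lr=0.1, l2=0.01)
decoupled = sgd_step_decoupled(w, grad, lr=0.1, l2=0.01)
```

With plain SGD both variants give the same result up to floating-point rounding. With adaptive optimizers they would in general differ, because the coupled L2 gradient also passes through the optimizer's scaling.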
We can generalize such updates, and allow the user to perform some generic post updates on parameters.
For example, in rwth-i6/returnn_common#241 it was suggested to extend L2 with a `decay_center` option. But instead of adding such an L2-specific option, we can allow the user to perform any custom post updates, similar to the code above. Then the user could easily implement such `decay_center` logic, and many other things as well.
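
As a rough illustration of what such user-defined post updates could look like, here is a pure-Python sketch. It is not an existing RETURNN API: `decay_towards`, `apply_post_updates` and the per-parameter dict are hypothetical names chosen for this example, and the decay-center formula is only one possible reading of the `decay_center` suggestion.

```python
from typing import Callable, Dict, List

Param = List[float]
PostUpdate = Callable[[Param], Param]


def decay_towards(center: float, rate: float) -> PostUpdate:
    """Post update that pulls each value towards `center` (hypothetical helper)."""

    def update(var: Param) -> Param:
        # With center=0 this is plain weight decay: var - var * rate.
        return [v - (v - center) * rate for v in var]

    return update


def apply_post_updates(
    params: Dict[str, Param], post_updates: Dict[str, PostUpdate]
) -> Dict[str, Param]:
    """Run after the optimizer step; params without a post update stay unchanged."""
    return {
        name: post_updates[name](var) if name in post_updates else var
        for name, var in params.items()
    }


params = {"scale": [1.5, 0.5], "bias": [0.2, -0.2]}
post_updates = {"scale": decay_towards(center=1.0, rate=0.1)}
new_params = apply_post_updates(params, post_updates)
```

Here `scale` is pulled towards 1.0 while `bias` is untouched. Setting `center=0.0` and `rate=2 * l2` would reproduce the form of the `assign_sub` update quoted above.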
Also related: rwth-i6/returnn_common#90
What would the API look like on the RETURNN side? It's maybe also OK to only do this for the `VariableLayer`.