
Generalized constraints: post update hooks #1214

@albertz

Currently our implemented constraints are:

  • L2 on weights (L2 option on a layer)
  • Some exotic things on activations (darc1, spatial_smoothing)

We can already decouple the constraints from the normal loss computation via decouple_constraints. In #1206, this behavior will change a bit: it will then decouple only the data-independent constraints, which currently means only L2.

L2 is equivalent to weight decay when SGD is used. With the new decoupled constraints code (#1206), it explicitly does:

                return var.assign_sub(var * (l2 * 2.), use_locking=self.use_locking, read_value=False)
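To make the equivalence concrete, here is a small plain-Python check (no TensorFlow) that one vanilla SGD step with L2 in the loss gives the same result as a plain SGD step followed by a separate decay step. The function names are only illustrative. Two assumptions go beyond the line quoted above: the decay is scaled by the learning rate, and it reads the parameter value from before the gradient step. If the decay reads the already updated value instead, the two variants differ by a second-order term in the learning rate.

```python
def sgd_with_l2_in_loss(w, grad, lr, l2):
    """One SGD step on data_loss + l2 * sum(w ** 2).

    The L2 term contributes 2 * l2 * w to the gradient.
    """
    return [wi - lr * (gi + 2.0 * l2 * wi) for wi, gi in zip(w, grad)]


def sgd_then_decoupled_decay(w, grad, lr, l2):
    """Plain SGD step on the data loss, then a separate decay step."""
    # Step 1: gradient step on the data loss only.
    updated = [wi - lr * gi for wi, gi in zip(w, grad)]
    # Step 2: decoupled decay, analogous to var.assign_sub(var * (l2 * 2.)).
    # Assumption: scaled by lr and computed from the pre-update value w.
    return [ui - wi * (lr * l2 * 2.0) for ui, wi in zip(updated, w)]


w = [1.0, -2.0, 0.5]
grad = [0.1, 0.2, -0.3]
coupled = sgd_with_l2_in_loss(w, grad, lr=0.1, l2=0.01)
decoupled = sgd_then_decoupled_decay(w, grad, lr=0.1, l2=0.01)
# Both variants agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(coupled, decoupled))
```

This only holds for plain SGD. With momentum or adaptive optimizers, the L2 gradient term goes through the optimizer state, while the decoupled decay does not.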

We can generalize such updates, and allow the user to perform some generic post updates on parameters.

For example, in rwth-i6/returnn_common#241 it was suggested to extend L2 with a decay_center option. But instead of adding such an L2-specific option, we can allow the user to perform any custom post update, similar to the code above. The user could then easily implement such decay_center logic, and many other things as well.
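A minimal sketch of what such generic post-update hooks could look like, in plain Python on lists of floats rather than on TensorFlow variables. Everything here is hypothetical and not an existing RETURNN API: the names PostUpdateHook, make_centered_decay and apply_post_update_hooks are made up for illustration. It also assumes that decay_center means decaying toward a given center value instead of toward zero.

```python
from typing import Callable, Dict, List

Param = List[float]
# Hypothetical hook type: takes the current parameter value, returns the new one.
PostUpdateHook = Callable[[Param], Param]


def make_centered_decay(rate: float, center: Param) -> PostUpdateHook:
    """Hook that pulls a parameter toward `center`.

    With center = zeros and rate = 2 * l2, this matches var - var * (l2 * 2.).
    """
    def hook(w: Param) -> Param:
        return [wi - rate * (wi - ci) for wi, ci in zip(w, center)]
    return hook


def apply_post_update_hooks(
    params: Dict[str, Param], hooks: Dict[str, List[PostUpdateHook]]
) -> None:
    """Run after the optimizer step; applies the hooks per parameter, in order."""
    for name, param_hooks in hooks.items():
        for hook in param_hooks:
            params[name] = hook(params[name])


params = {"W": [1.0, -2.0]}
hooks = {"W": [make_centered_decay(rate=0.5, center=[1.0, 0.0])]}
apply_post_update_hooks(params, hooks)
# params["W"] is now [1.0, -1.0]: the first entry already sits at its center,
# the second one moved halfway toward 0.0.
```

Because a hook is just a function of the parameter value, other post updates such as clipping or renormalization would fit the same interface without any L2-specific options.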

Also related: rwth-i6/returnn_common#90

What would the API look like on the RETURNN side? Maybe it's also OK to only do this for the VariableLayer.
