How to implement batch norm #89

Closed
albertz opened this issue Jan 5, 2022 · 2 comments

albertz commented Jan 5, 2022

There are a couple of open questions regarding how to implement batch norm using the building blocks of returnn-common. Of course, we could also wrap the existing BatchNormLayer in RETURNN (which needs rwth-i6/returnn#891, though), but even if that were the implementation of BatchNorm in returnn-common, the question would still remain how to implement it from scratch using the building blocks of returnn-common. In any case, this should be possible, and preferably in a straightforward way.

One question is how to handle the train flag. This is #18.

Another question is how to do custom updates for the running-statistics variables. This is #90.

Another question is how to make use of the TF fused op, which would be important for efficiency; specifically, tf.compat.v1.nn.fused_batch_norm with data_format="NCHW" (see the call sketch below).

Related are also the batch norm defaults (#83), although they are not too relevant to the question of how to implement this.
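
For reference, a minimal sketch of how the fused TF op mentioned above would be called directly (plain TensorFlow, not returnn-common; the tensor shapes here are made-up examples):

import tensorflow as tf

# x in NCHW layout: [batch, channels, height, width]
x = tf.random.normal([8, 32, 64, 64])
gamma = tf.ones([32])    # scale, shape [channels]
beta = tf.zeros([32])    # offset, shape [channels]
running_mean = tf.zeros([32])
running_var = tf.ones([32])

# Training: statistics are computed from the current batch and also returned.
y, batch_mean, batch_var = tf.compat.v1.nn.fused_batch_norm(
    x, gamma, beta, epsilon=1e-5, data_format="NCHW", is_training=True)

# Inference: the stored running statistics must be passed in.
y_eval, _, _ = tf.compat.v1.nn.fused_batch_norm(
    x, gamma, beta, mean=running_mean, variance=running_var,
    epsilon=1e-5, data_format="NCHW", is_training=False)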

albertz commented Jan 5, 2022

Demo implementation:

from typing import Optional
from returnn_common import nn  # assumed import convention


class BatchNorm(nn.Module):
  """
  Batch normalization, normalizing over all axes except the feature axis.

  Note: this is a design sketch. The nn.Cond usage below follows a proposed API:
  cond.else(...) closes the true (training) branch with its result values,
  cond.end(...) closes the else (inference) branch and returns the selected values.
  """

  def __init__(self, in_dim: Optional[nn.Dim] = None, *, affine: bool = True):
    """
    :param in_dim: the feature dimension of the input
    :param affine: whether to use the learnable parameters gamma and beta
    """
    super().__init__()
    self.in_dim = in_dim
    self.mean = None  # type: Optional[nn.Parameter]
    self.var = None  # type: Optional[nn.Parameter]
    self.affine = affine
    self.gamma = None  # type: Optional[nn.Parameter]
    self.beta = None  # type: Optional[nn.Parameter]
    if in_dim:
      self._lazy_init(in_dim)

  def _lazy_init(self, in_dim: nn.Dim):
    self.in_dim = in_dim
    # Auxiliary parameters: running statistics, not updated by the optimizer.
    self.mean = nn.Parameter([in_dim], auxiliary=True)
    self.var = nn.Parameter([in_dim], auxiliary=True)
    if self.affine:
      self.gamma = nn.Parameter([in_dim])
      self.beta = nn.Parameter([in_dim])

  def __call__(self, source: nn.LayerRef, *, epsilon=1e-5, momentum=0.1) -> nn.Layer:
    source = nn.check_in_feature_dim_lazy_init(source, self.in_dim, self._lazy_init)
    # Reduce over all axes except the feature axis.
    reduce_dims = [d for d in source.data.dim_tags if d != self.in_dim]
    with nn.Cond(nn.get_train_flag()) as cond:
      # Training branch: statistics of the current batch.
      mean_cur_batch, var_cur_batch = nn.moments(source, reduce_dims)
      cond.else((mean_cur_batch, var_cur_batch))
      # Else branch (inference): the stored running statistics.
      mean, var = cond.end((self.mean, self.var))
    with nn.Cond(nn.get_train_flag()) as cond:  # separate Cond such that this can be delayed
      # Exponential-moving-average update of the running statistics (see #90).
      self.mean.assign_add((mean - self.mean) * momentum)
      self.var.assign_add((var - self.var) * momentum)
    norm = (source - mean) * nn.rsqrt(var + epsilon)
    if self.affine:
      norm = norm * self.gamma + self.beta
    return norm
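
A hypothetical usage sketch (feat_dim and x are made-up names for illustration: feat_dim is the feature nn.Dim of the input, x is some tensor which has that feature dim):

batch_norm = BatchNorm(feat_dim)  # or BatchNorm() to lazily init on the first call
y = batch_norm(x)                 # same dims as x, normalized over all non-feature axes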

albertz added a commit that referenced this issue Jan 6, 2022

albertz commented Jan 6, 2022

Because we want the fused op to be used when possible, we need to wrap some RETURNN layer in any case. Because of this, I have now just wrapped the whole module around the RETURNN layer.

Some of these questions (how to handle/implement custom auxiliary-variable updates) still remain, but they are no longer needed for this case. Also, the case here is basically clear and just needs to be implemented, which can be done once it is needed (maybe for something else). But that is #90, and maybe also #18.
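
For illustration only, a rough sketch of what such wrapping could look like (not the actual returnn-common implementation; nn.make_layer and the exact option names of RETURNN's "batch_norm" layer used here are assumptions):

class WrappedBatchNorm(nn.Module):  # hypothetical name
  """
  Batch normalization by delegating to RETURNN's BatchNormLayer ("class": "batch_norm"),
  which can use the fused TF op internally.
  """

  def __init__(self, *, momentum: float = 0.1, epsilon: float = 1e-5):
    super().__init__()
    self.momentum = momentum
    self.epsilon = epsilon

  def __call__(self, source: nn.LayerRef) -> nn.Layer:
    # The parameters (gamma/beta and the running statistics) live inside the RETURNN layer.
    return nn.make_layer(
      {"class": "batch_norm", "from": source,
       "momentum": self.momentum, "epsilon": self.epsilon},
      name="batch_norm")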

So I will close this now.

albertz closed this as completed Jan 6, 2022