merge_dims does not define its own output dimension #117

@JackTemaki

I have the following code:

        print([out_spatial_dims[-1], self.last_conv_out_dim])
        out, out_feature_dim = merge_dims(x, axes=[out_spatial_dims[-1], self.last_conv_out_dim])

The print gives:

[Dim{'encoder/conv_block:out-spatial-dim1'(10)}, Dim{F'conv_channel'(32)}]

This crashes with:

  File "/u/rossenbach/experiments/tts_asr_2021/recipe/returnn_common/nn/base.py", line 551, in make_layer
    line: return make_layer(
            layer_dict=layer_dict, predefined_out_data=predefined_out_data, add_out_shape_info=add_out_shape_info,
            name=name_ctx)
    locals:
      make_layer = <global> <function make_layer at 0x7fad6bbd1040>
      layer_dict = <local> {'class': 'merge_dims', 'from': <Tensor /'encoder'/'conv_block'/'pool_0' [B(-1),F|F'conv_channel'(32),T|'audio_features_time'[?],'encoder/conv_block:out-spatial-dim1'(10)] via 'pool'>,
 'axes': [Dim{'encoder/conv_block:out-spatial-dim1'(10)}, Dim{F'conv_channel'(32)}], 'out_dim': Dim{F'encoder/con...
      predefined_out_data = <local> None
      add_out_shape_info = <local> True
      name = <local> <NameCtx /'encoder'/'conv_block'/'merge_dims'>
      name_ctx = <local> <NameCtx /'encoder'/'conv_block'/'merge_dims'>
  File "/u/rossenbach/experiments/tts_asr_2021/recipe/returnn_common/nn/base.py", line 560, in make_layer
    line: layer = Tensor(
            layer_dict=layer_dict, name_ctx=name_ctx,
            data=predefined_out_data, add_out_shape_info=add_out_shape_info)
    locals:
      layer = <not found>
      Tensor = <global> <class 'returnn_common.nn.base.Tensor'>
      layer_dict = <local> {'class': 'merge_dims', 'from': <Tensor /'encoder'/'conv_block'/'pool_0' [B(-1),F|F'conv_channel'(32),T|'audio_features_time'[?],'encoder/conv_block:out-spatial-dim1'(10)] via 'pool'>,
 'axes': [Dim{'encoder/conv_block:out-spatial-dim1'(10)}, Dim{F'conv_channel'(32)}], 'out_dim': Dim{F'encoder/con...
      name_ctx = <local> <NameCtx /'encoder'/'conv_block'/'merge_dims'>
      data = <not found>
      predefined_out_data = <local> None
      add_out_shape_info = <local> True
  File "/u/rossenbach/experiments/tts_asr_2021/recipe/returnn_common/nn/base.py", line 104, in Tensor.__init__
    line: data = _data_from_layer_dict(layer_dict)
    locals:
      data = <local> None
      _data_from_layer_dict = <global> <function _data_from_layer_dict at 0x7fad4173f9d0>
      layer_dict = <local> {'class': 'merge_dims', 'from': <Tensor /'encoder'/'conv_block'/'pool_0' [B(-1),F|F'conv_channel'(32),T|'audio_features_time'[?],'encoder/conv_block:out-spatial-dim1'(10)] via 'pool'>,
 'axes': [Dim{'encoder/conv_block:out-spatial-dim1'(10)}, Dim{F'conv_channel'(32)}], 'out_dim': Dim{F'encoder/con...
  File "/u/rossenbach/experiments/tts_asr_2021/recipe/returnn_common/nn/base.py", line 684, in _data_from_layer_dict
    line: out_data = layer_class.get_out_data_from_opts(**layer_desc)
    locals:
      out_data = <not found>
      layer_class = <local> <class 'returnn.tf.layers.basic.MergeDimsLayer'>
      layer_class.get_out_data_from_opts = <local> <bound method MergeDimsLayer.get_out_data_from_opts of <class 'returnn.tf.layers.basic.MergeDimsLayer'>>
      layer_desc = <local> {'axes': [Dim{'encoder/conv_block:out-spatial-dim1'(10)}, Dim{F'conv_channel'(32)}], 'out_dim': Dim{F'encoder/conv_block:out_dim'[?]}, '_network': <TFNetwork 'dummy_net' train=False>, 
'_name': 'output', 'sources': [<InternalLayer 'pool_0' out_type=Data{[B(-1),F|F'conv_channel'(32),T|'audio_featur..., len = 7
  File "/work/asr4/rossenbach/env/python38_sisyphus/lib/python3.8/site-packages/returnn/tf/layers/basic.py", line 3507, in MergeDimsLayer.get_out_data_from_opts
    line: assert out_dim.dimension == res_dim
    locals:
      out_dim = <local> Dim{F'encoder/conv_block:out_dim'[?]}
      out_dim.dimension = <local> None
      res_dim = <local> 320
AssertionError

This happens because merge_dims does not try to set the output dimension, but just does:

    out_dim = nn.Dim(kind=kind, description=f'{nn.NameCtx.current_ctx().get_abs_name()}:out_dim')
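For illustration, the check that fails at the bottom of the traceback (assert out_dim.dimension == res_dim) can be mimicked in isolation. This is a minimal sketch, assuming all merged axes have static sizes so that res_dim is simply their product; merged_dim_is_consistent is a hypothetical helper, not RETURNN code.

```python
from math import prod
from typing import Optional, Sequence


def merged_dim_is_consistent(out_dimension: Optional[int], axis_dimensions: Sequence[int]) -> bool:
    """Hypothetical helper mimicking `assert out_dim.dimension == res_dim` (static axes only)."""
    res_dim = prod(axis_dimensions)  # e.g. 10 * 32 = 320 for the dims in this issue
    return out_dimension == res_dim


print(merged_dim_is_consistent(None, [10, 32]))  # False: out_dim was created without a dimension
print(merged_dim_is_consistent(320, [10, 32]))   # True
```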

I know I could manually pass a new output dimension of 320, but I do not know this value in advance. The spatial dimension is the result of two convolution operations, so as the user writing the code you do not yet know which dimension you will end up with, while the layer already knows from the input dims that it will be 320 (10 × 32). So it would be good if the dimension of the output Dim could also be inferred automatically.

The fix would be fairly easy, e.g.

    import numpy
    # Infer the merged dimension when all axes have a static size; otherwise keep it dynamic (None).
    dimension = None
    if all(axis.dimension is not None for axis in axes):
      dimension = int(numpy.prod([axis.dimension for axis in axes]))
    out_dim = nn.Dim(kind=kind, description=f'{nn.NameCtx.current_ctx().get_abs_name()}:out_dim', dimension=dimension)

However, merge_dims is auto-generated code, so I am not sure how to handle this...
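The proposed inference logic can also be checked on its own. The sketch below is only an illustration under stated assumptions: StaticDim is a hypothetical stand-in for nn.Dim (only its dimension attribute matters here), infer_merged_dimension is a hypothetical helper, and math.prod is used in place of numpy.prod. It is not the returnn_common implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Sequence


@dataclass(frozen=True)
class StaticDim:
    """Hypothetical stand-in for nn.Dim: a description plus an optional static size."""
    description: str
    dimension: Optional[int] = None  # None means dynamic / unknown size


def infer_merged_dimension(axes: Sequence[StaticDim]) -> Optional[int]:
    """Return the product of all axis sizes if every size is known, else None (dynamic)."""
    if all(axis.dimension is not None for axis in axes):
        return prod(axis.dimension for axis in axes)
    return None


spatial = StaticDim("encoder/conv_block:out-spatial-dim1", 10)
channel = StaticDim("conv_channel", 32)
print(infer_merged_dimension([spatial, channel]))  # 320

time = StaticDim("audio_features_time")  # dynamic length, no static size
print(infer_merged_dimension([spatial, time]))  # None
```

With the dims from the print above, this yields the 320 the layer expects, while merging a dynamic axis such as the time dim would still leave the output dimension as None.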