
SplitDimsLayer fix feature_dim_axis on feature-dim split #705


Merged
albertz merged 4 commits into master from albert-split-dims-feature-704 on Oct 1, 2021

Conversation

@albertz (Member) commented on Oct 1, 2021

Fix #704.

WARNING: This potentially changes the behavior of existing configs,
e.g. the one reported in #703,
and most attention and transducer encoder setups which use initial convolutional layers.

Related: #596. The fix for that in #683 may actually have introduced this new bug.

However, this bug probably did not exist for a long time.
@albertz albertz requested a review from a team as a code owner October 1, 2021 09:52
@albertz albertz force-pushed the albert-split-dims-feature-704 branch from 8f1d12c to ad95110 on October 1, 2021 10:04
@albertz (Member, Author) commented on Oct 1, 2021

So, Git bisecting:

It is bad in current master (ea3e91f).
It was still good in 6f7f622 (10 days ago).
2ff056d is the first bad commit.

commit 2ff056de31838b6425333a882759df0093b5bb63
Author: Albert Zeyer <[email protected]>
Date:   Wed Sep 22 14:45:18 2021 +0200

    SplitDimsLayer, fix feature_dim_axis for batch-feature-major input
    
    Fix #596

 returnn/tf/layers/basic.py | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

So yes, as expected #683 introduced this bug.
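The bisect above can be reproduced mechanically with `git bisect run`. Below is a minimal, self-contained sketch of that workflow in a throwaway repository; the repo, file name, probe script, and commit messages are illustrative stand-ins, not RETURNN's (the actual hashes ea3e91f, 6f7f622, 2ff056d come from the comment above).

```shell
# Illustrative git-bisect workflow in a throwaway repo (not the RETURNN repo).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

# History: two good commits, then a commit that "introduces the bug", then "master".
echo good > behavior;  git add behavior; git commit -qm "good baseline"
echo good >> behavior; git commit -qam "still good"
echo bad > behavior;   git commit -qam "introduces the bug"
echo bad >> behavior;  git commit -qam "current master"

# Mark the endpoints (bad commit first, then the known-good one),
# then let git drive the binary search automatically.
git bisect start HEAD HEAD~3
# The probe script exits 0 (good) iff the first line of the file is still "good";
# git bisect run uses that exit code to mark each candidate commit.
git bisect run sh -c 'head -1 behavior | grep -q good'
```

At the end, `git bisect run` reports the first bad commit ("introduces the bug" here), and `git bisect reset` returns to the original branch. In the real session above, the probe would instead run the failing RETURNN test case at each step.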

@albertz albertz merged commit 4d7908b into master Oct 1, 2021
@albertz albertz deleted the albert-split-dims-feature-704 branch October 1, 2021 10:22
@albertz (Member, Author) commented on Oct 1, 2021

Note on the potentially wrong behavior

This did not cause an error, just wrong behavior.
When you used SplitDimsLayer like this:

"source0": {"class": "split_dims", "axis": "F", "dims": (-1, 1), "from": "data"},  # (B,T,40,1)

If a 2D convolution then followed, it would operate on the two expected spatial dims. The spatial dims are those which are neither the feature dim nor the batch dim. So it matters which of the split dims 40 -> (40, 1) actually becomes the feature dim.

Earlier, the last axis (with dim 1) became the new feature dim.
#683 changed this: the axis with dim 40 here remained the feature dim, as you can see in this output:

layer root/'source0' output: Data{'source0_output', [B,T|'time'[B],F|F'feature:data'(40),'source0_split_dims1'(1)]}

In a follow-up 2D conv with these options:

"conv0": {
  "class": "conv", "from": "source0",
  "padding": "same", "filter_size": (3, 3),
  "n_out": 32, "activation": None, "with_bias": True},  # (B,T,40,32)

you then see:

layer root/'conv0' output: Data{'conv0_output', [B,T|'time'[B],'source0_split_dims1'(1),F|F'conv0:channel'(32)]}

So the second spatial dim the convolution operated on was source0_split_dims1 from above.
This was neither the intended nor the old behavior.
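To make the axis bookkeeping concrete, here is a small standalone Python sketch (not RETURNN code; `spatial_axes` is a hypothetical helper) of the rule that a 2D conv's spatial dims are all axes except batch and feature:

```python
def spatial_axes(axes, feature_axis):
    """Return the axes a 2D conv would convolve over: every axis that is
    neither the batch axis ("B") nor the given feature axis."""
    return [a for i, a in enumerate(axes) if a != "B" and i != feature_axis]

# Shape after split_dims 40 -> (40, 1): (B, T, 40, 1)
axes = ["B", "T", "40", "1"]

# Old / intended behavior: the trailing dim-1 axis is the new feature dim,
# so the conv sees (T, 40) as its two spatial dims.
print(spatial_axes(axes, feature_axis=3))  # ['T', '40']

# Buggy behavior after #683: the dim-40 axis stayed the feature dim,
# so the conv convolved over (T, 1) instead.
print(spatial_axes(axes, feature_axis=2))  # ['T', '1']
```

Either choice yields a valid tensor layout, which is why this produced no error, only silently different convolution behavior.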

@albertz (Member, Author) commented on Oct 1, 2021

Note that with #706 (out_shape for verification), we could have detected such a problem more directly.

And the proposed options in #597 (e.g. in_spatial_dims for ConvLayer) also would have avoided the problem.

Probably any solution for #594 would also have avoided the problem.


Merging this pull request closed the issue: SplitDimsLayer wrong out feature axis when splitting feature axis (#704)