🐛 Bug
Getting the following error when running any of these three methods in test/test_mp_sync_batch_norm.py (a minimal sketch of the failing pattern follows the list below). It's strange to see this issue because the expected shape and the calculated shape reported in the error are identical, yet the backward pass still fails.
sync_bn1d_multi_channel_test(index)
sync_bn2d_test(index)
sync_bn3d_test(index)
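For reference, here is a minimal, hedged sketch of the pattern these tests exercise, assuming a single XLA device and the input shape [16, 32, 16, 32, 32] that appears in the traceback. It uses plain torch.nn.BatchNorm3d as a stand-in for the SyncBatchNorm implementation under test, so the helper names and shapes are illustrative only, not the actual test code:

```python
# Hypothetical repro sketch -- not the actual test file. Assumes one XLA device
# and substitutes torch.nn.BatchNorm3d for the SyncBatchNorm implementation
# exercised by test_mp_sync_batch_norm.py.
import torch
import torch_xla.core.xla_model as xm

def run_step(model, inputs):
    # Mirrors the run_step seen in the traceback: forward, sum, backward.
    out = model(inputs)
    loss = out.sum()  # SumBackward0 is the node named in the RuntimeError
    loss.backward()
    return out

device = xm.xla_device()
bn = torch.nn.BatchNorm3d(32).to(device)
t = torch.randn(16, 32, 16, 32, 32, device=device, requires_grad=True)
run_step(bn, t)
xm.mark_step()  # force XLA to execute the traced graph
```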
ERROR:
$ python test/test_mp_sync_batch_norm.py
INFO:torch_xla:Letting libtpu.so load fail during _XLAC import. libtpu.so will be loaded from `libtpu` Python package when the ComputationClient is created.
2022-08-09 03:46:00.339314: I 1911080 tensorflow/core/tpu/tpu_initializer_helper.cc:253] Libtpu path is: /dev/null
2022-08-09 03:46:00.339558: I 1911080 tensorflow/compiler/xla/xla_client/xrt_local_service.cc:55] libtpu status: OK
2022-08-09 03:46:00.339598: I 1911080 tensorflow/compiler/xla/xla_client/xrt_local_service.cc:41] Peer localservice 1 {localhost:51011}
2022-08-09 03:46:00.339700: I 1911080 tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-09 03:46:00.351539: W 1911080 tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-08-09 03:46:00.351582: W 1911080 tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-08-09 03:46:00.351614: I 1911080 tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (5fc32f4fda5a): /proc/driver/nvidia/version does not exist
2022-08-09 03:46:00.395516: I 1911080 tensorflow/compiler/xla/service/service.cc:174] XLA service 0x555ade94c210 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-08-09 03:46:00.395576: I 1911080 tensorflow/compiler/xla/service/service.cc:182] StreamExecutor device (0): Host, Default Version
2022-08-09 03:46:00.440707: I 1911080 tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:272] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:51011}
2022-08-09 03:46:00.441636: I 1911080 tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:438] Started server with target: grpc://localhost:51011
2022-08-09 03:46:00.498331: I 1911833 tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-08-09 03:46:00.515291: I 1911454 tensorflow/compiler/jit/xla_device.cc:429] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(jit_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
Traceback (most recent call last):
  File "test/test_mp_sync_batch_norm.py", line 146, in <module>
    xmp.spawn(_mp_fn, args=())
  File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.13-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 383, in spawn
    return _run_direct(fn, args, nprocs, join, daemon, start_method)
  File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.13-py3.7-linux-x86_64.egg/torch_xla/distributed/xla_multiprocessing.py", line 344, in _run_direct
    fn(0, *args)
  File "test/test_mp_sync_batch_norm.py", line 142, in _mp_fn
    sync_bn3d_test(index)
  File "test/test_mp_sync_batch_norm.py", line 124, in sync_bn3d_test
    result = run_step(sbn_xla, t_xla)
  File "test/test_mp_sync_batch_norm.py", line 19, in run_step
    loss.backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 485, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 193, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Function SumBackward0 returned an invalid gradient at index 0 - got [16, 32, 16, 32, 32] but expected shape compatible with [16, 32, 16, 32, 32]