
Accuracy issues in test_transformers.py cases #761

@PenghuiCheng

Description


🚀 The feature, motivation and pitch

  1. NotImplementedError: Could not run 'aten::_to_copy' with arguments from the 'NestedTensorXPU' backend
    cases:
    test_transformers.py::TestTransformersXPU::test_with_nested_tensor_input_xpu
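The failing op can be illustrated on CPU, where the kernel does exist. A `.to(dtype)` call on a nested tensor lowers to `aten::_to_copy`; on XPU the same dispatch finds no `NestedTensorXPU` kernel and raises `NotImplementedError`. A minimal sketch (CPU shown; the XPU behavior is what the test hits):

```python
import torch

# Build a nested tensor from ragged sequences. On CPU this works;
# on XPU the .to() call below dispatches aten::_to_copy, which has
# no NestedTensorXPU kernel and raises NotImplementedError.
nt = torch.nested.nested_tensor([torch.randn(2, 4), torch.randn(3, 4)])

# .to(dtype) lowers to aten::_to_copy under the hood.
converted = nt.to(torch.float64)
assert converted.dtype == torch.float64
```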
  2. There is no mechanism to handle SDPBackend::ERROR yet. Full support will be added once all SDPBackends are supported.
    cases:
    "test_dispatch_fails_no_backend_xpu",
  3. AssertionError: False is not true
    CPU fallback failure; aten::transformer_encoder_layer_forward needs to be supported with the proper dispatch priority.
    cases:
    "test_disable_fastpath_xpu",
  4. Double- and complex-dtype matmul are not supported in oneDNN
    # [Evaluated] OneDNN issue in TestMathBits of test_ops.py #253
    "test_sdp_math_gradcheck_contiguous_inputs_False_xpu",
    "test_sdp_math_gradcheck_contiguous_inputs_True_xpu",
    "test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_xpu",
    "test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_xpu",
    "test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_xpu",
    "test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_xpu",
    "test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_xpu",
    "test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_xpu",
    "test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_xpu",
    "test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_no_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_4D_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_4D_input_dim_2D_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_no_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_3D_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_5_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_2_xpu",
    "test_scaled_dot_product_attention_3D_input_dim_2D_attn_mask_dropout_p_0_0_xpu",

Reproduce steps:
pytest -vs test_transformers_xpu.py -k xxx (substitute a case name from above)

Alternatives

No response

Additional context

No response
