
Casting error in simple model with a constant tensor<f32> during --EmitLib #335

Open

agostini01 opened this issue Oct 5, 2020 · 10 comments

I got an error compiling a (Tensor + Tensor) + Constant graph.

This is the onnx.mlir code:

module {
  func @main_graph(%arg0: tensor<4x5xf32>, %arg1: tensor<4x5xf32>) -> tensor<4x5xf32> attributes {input_names = ["input_y:0", "input_x:0"], output_names = ["output:0"]} {
    %0 = "onnx.Add"(%arg1, %arg0) {onnx_node_name = "added"} : (tensor<4x5xf32>, tensor<4x5xf32>) -> tensor<4x5xf32>
    %1 = "onnx.Constant"() {value = dense<4.200000e+01> : tensor<f32>} : () -> tensor<f32>
    %2 = "onnx.Add"(%0, %1) {onnx_node_name = "add"} : (tensor<4x5xf32>, tensor<f32>) -> tensor<4x5xf32>
    return %2 : tensor<4x5xf32>
  }
  "onnx.EntryPoint"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32} : () -> ()
}

And this is the command executed and the resulting error:

$ /working_dir/onnx-mlir/build/bin/onnx-mlir --EmitLib /working_dir/examples/model/custom_add_plus_cte/onnx_mlir_generated/model.onnx.mlir

onnx-mlir: /working_dir/llvm-project/llvm/include/llvm/Support/Casting.h:269: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::FixedVectorType, Y = llvm::Type]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

Note that the same example works when the internal constant is removed and the value is passed instead as a tensor<1xf32> argument:

module {
  func @main_graph(%arg0: tensor<4x5xf32>, %arg1: tensor<4x5xf32>, %arg2: tensor<1xf32>) -> tensor<4x5xf32> attributes {input_names = ["input_y:0", "input_x:0", "cte:0"], output_names = ["output:0"]} {
    %0 = "onnx.Add"(%arg1, %arg0) {onnx_node_name = "added"} : (tensor<4x5xf32>, tensor<4x5xf32>) -> tensor<4x5xf32>
    %1 = "onnx.Add"(%0, %arg2) {onnx_node_name = "add"} : (tensor<4x5xf32>, tensor<1xf32>) -> tensor<4x5xf32>
    return %1 : tensor<4x5xf32>
  }
  "onnx.EntryPoint"() {func = @main_graph, numInputs = 3 : i32, numOutputs = 1 : i32} : () -> ()
}
agostini01 changed the title from "Casting error when using a model with constants during --EmitLib" to "Casting error in simple model with a constant during --EmitLib" on Oct 5, 2020

agostini01 commented Oct 5, 2020

Looking further into the problem, I found that replacing the type declaration tensor<f32> with tensor<1xf32> makes the compilation succeed.

Compilation works if the MLIR generated from the ONNX model is the following:

module {
  func @main_graph(%arg0: tensor<4x5xf32>, %arg1: tensor<4x5xf32>) -> tensor<4x5xf32> attributes {input_names = ["input_y:0", "input_x:0"], output_names = ["output:0"]} {
    %0 = "onnx.Add"(%arg1, %arg0) {onnx_node_name = "added"} : (tensor<4x5xf32>, tensor<4x5xf32>) -> tensor<4x5xf32>
    %1 = "onnx.Constant"() {value = dense<4.200000e+01> : tensor<1xf32>} : () -> tensor<1xf32>
    %2 = "onnx.Add"(%0, %1) {onnx_node_name = "add"} : (tensor<4x5xf32>, tensor<1xf32>) -> tensor<4x5xf32>
    return %2 : tensor<4x5xf32>
  }
  "onnx.EntryPoint"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32} : () -> ()
}
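For reference, the manual fix amounts to a one-line text substitution on the emitted .mlir file. A minimal sketch in Python (the file path is illustrative; a plain string replacement is safe here because "tensor<f32>" does not occur inside shaped types such as tensor<4x5xf32>):

from pathlib import Path

# Rewrite the scalar constant type so the lowering no longer hits the cast assertion.
path = Path("model.onnx.mlir")  # illustrative path
text = path.read_text()
path.write_text(text.replace("tensor<f32>", "tensor<1xf32>"))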

However,

/working_dir/onnx-mlir/build/bin/onnx-mlir --EmitONNXIR /working_dir/examples/model/custom_add_plus_cte/model.onnx -o /working_dir/examples/model/custom_add_plus_cte/onnx_mlir_generated/model

emits the incorrect IR shown at the top of this issue. I will try to attach the model in the next comment.

agostini01 commented Oct 5, 2020

I could not attach the model.onnx file, but this is how it was generated:

#!/usr/bin/env python

# To execute this, you need to have the following installed
# pip install tensorflow onnx tf2onnx

import tensorflow as tf
import tf2onnx


# Declare a custom function that represents the graph with all its inputs
def add_plus_cte(x, y, cte):
    added = tf.math.add(
        x, y, name='added'
    )
    added_plus_cte = added + cte
    return added_plus_cte

# Create a `Function` object that contains a graph
fun_obj = tf.function(add_plus_cte)

# Make some tensors to test it
x1 = tf.constant([[1.0, 2.0]])
y1 = tf.constant([[2.0, 3.0]])
b1 = tf.constant(4.0)

# It works!
print(fun_obj(x1, y1, b1).numpy())

# Wrap everything in a session to be read by tf2onnx tool
with tf.compat.v1.Session() as sess:

    # Declare graph input arguments (these have larger dimensions than the test tensors)
    x = tf.compat.v1.placeholder(tf.float32, [4, 5], name="input_x")
    y = tf.compat.v1.placeholder(tf.float32, [4, 5], name="input_y")
    cte = tf.constant(42.0)  # The constant is not a variable; it is embedded in the graph.
    result = add_plus_cte(x, y, cte)
    _ = tf.identity(result, name="output")

    # Create the onnx graph
    onnx_graph = tf2onnx.tfonnx.process_tf_graph(sess.graph, input_names=["input_x:0","input_y:0"], output_names=["output:0"])
    model_proto = onnx_graph.make_model("custom_add_plus_cte")

    # Save to file
    with open("model.onnx", "wb") as f:
        f.write(model_proto.SerializeToString())
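
Note that tf.constant(42.0) above produces a rank-0 (scalar) tensor, whereas tf.constant([42.0]) would produce a rank-1 tensor with one element; this distinction is presumably where the tensor<f32> versus tensor<1xf32> difference originates. A quick shape check:

import tensorflow as tf

print(tf.constant(42.0).shape)    # () -- rank 0, a scalar
print(tf.constant([42.0]).shape)  # (1,) -- rank 1, one element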

agostini01 changed the title from "Casting error in simple model with a constant during --EmitLib" to "Casting error in simple model with a constant tensor<f32> during --EmitLib" on Oct 5, 2020

chentong319 (Collaborator) commented

Thanks @agostini01. Could you send the model (.onnx) file to me by email ([email protected])?

agostini01 commented Oct 5, 2020

@chentong319, I just sent you an email.

The manual fix allows LLVM IR to be generated, but the pipeline breaks downstream when compiling the .so binary.

chentong319 (Collaborator) commented

@agostini01 I have not seen your email yet. Is the model file too large?

agostini01 commented Oct 6, 2020

@chentong319 my email must have landed in a spam folder.

I have created a github repo and included the example: https://github.com/agostini01/failing-onnx-models/tree/main/custom_tensor_add_plus_cte

chentong319 (Collaborator) commented

I downloaded the model and tried it. So far I have found that the onnx::TensorProto behind this DenseElementsAttr has dims().size() == 0, while it should be 1. That is why the importer generated tensor<f32> rather than tensor<1xf32>. I need to investigate further to identify the source.
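
One way to confirm the rank-0 tensor from the Python side is to inspect the TensorProto dims in the serialized model. A minimal sketch using the onnx package (the file name is illustrative; whether the constant shows up in graph.initializer or as the value attribute of a Constant node depends on how tf2onnx exported it):

import onnx

model = onnx.load("model.onnx")  # illustrative file name

# Constants may be stored as graph initializers...
for init in model.graph.initializer:
    print("initializer:", init.name, "dims:", list(init.dims))

# ...or as the `value` attribute of a Constant node.
for node in model.graph.node:
    if node.op_type == "Constant":
        for attr in node.attribute:
            if attr.name == "value":
                print("Constant node:", node.name, "dims:", list(attr.t.dims))

A rank-0 tensor prints dims: [], matching the dims().size() == 0 seen in the importer.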

agostini01 commented

It is good that you found one of the sources of the problem.

I was playing with the onnx-mlir source code but could not figure out how to debug this error.
Do you set breakpoints in specific files?
How/where did you determine that tensor<f32> was generated instead of tensor<1xf32>?

chentong319 (Collaborator) commented

I set a breakpoint in onnx-mlir/src/Builder/FrontendDialectHelper.cpp, in mlir::DenseElementsAttr onnxTensorProtoToDenseElmAttr. This is the procedure that constructs the attribute for ConstantOp. I dumped the type and also printed out initializer.dims().size().
By the way, I tried to use your model-generation Python code and got an error (the same error as with other example code). I think it is caused by my TensorFlow version being newer than what the tf2onnx tool (installed from the package, not from source) requires. Did you install tf2onnx from a package or from source?
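
A rough sketch of that debugging session with gdb (rbreak sets breakpoints on every function matching the regex, which avoids spelling out the mangled C++ symbol; paths follow those used earlier in this thread):

$ gdb --args /working_dir/onnx-mlir/build/bin/onnx-mlir --EmitLib model.onnx.mlir
(gdb) rbreak onnxTensorProtoToDenseElmAttr
(gdb) run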

doru1004 (Collaborator) commented Nov 6, 2020

@chentong319 any updates on whether this error has been fixed?
