Extend instrumentSignature to print data #3078

Merged
chentong319 merged 9 commits into onnx:main from print-onnx-op on Feb 20, 2025

chentong319
Collaborator

Extend instrumentSignature to be a tool that can be used to print the data of a particular onnx op at runtime.

  1. Change in the op selection. If the instrumentSignature option starts with "onnx_node_name:", only the rest of the string is used as the regular expression pattern, and the "onnx_node_name" attribute, instead of op->getName(), is matched against it. This change allows the user to target a particular onnx op.
  2. When the op is matched through onnx_node_name, the data of the tensors will be printed. Therefore, one attribute is added to the onnx.PrintSignature op.
  3. Consequently, the lowering of the PrintSignature op is modified.
  4. Another change, which may not be needed, is that the instrumentSignature pass is moved from the "onnx to mlir" group to the onnx transformation group, right after the other onnx instrumentation. I feel it is better to keep all the instrumentation passes for onnx together. A side benefit is that we can easily check the instrumentation with "--EmitONNXIR".
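
The selection rule in item 1 can be sketched roughly as follows. This is a standalone illustration and not the code in the PR: `OpInfo`, `Selector`, `parseSelector` and `isSelected` are hypothetical names, the two strings stand in for `op->getName()` and the "onnx_node_name" attribute, and the use of `std::regex_match` on a single pattern is an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for an operation: only the two strings that the
// selection logic looks at.
struct OpInfo {
  std::string opName;   // what op->getName() would give, e.g. "onnx.Add"
  std::string nodeName; // value of the "onnx_node_name" attribute, may be empty
};

// Result of splitting the option string.
struct Selector {
  bool useNodeName;    // true when the option started with "onnx_node_name:"
  std::string pattern; // regular expression applied to the chosen string
};

Selector parseSelector(const std::string &option) {
  const std::string header = "onnx_node_name:";
  // rfind(header, 0) == 0 holds only when the header is a prefix of option.
  if (option.rfind(header, 0) == 0)
    return {true, option.substr(header.size())};
  return {false, option};
}

bool isSelected(const OpInfo &op, const Selector &sel) {
  // Assumption: the whole string must match the pattern.
  const std::string &subject = sel.useNodeName ? op.nodeName : op.opName;
  return std::regex_match(subject, std::regex(sel.pattern));
}
```

With this sketch, `parseSelector("onnx_node_name:add_1")` selects only ops whose node name is `add_1`, while `parseSelector("onnx.*")` keeps matching on the op name.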

Future work:
If we want to debug a crash inside the execution of an onnx op, two changes are needed:

  1. Move the print input before the onnx op.
  2. The output should be easily used as input to run that op alone.

@@ -214,7 +214,7 @@ void addONNXToKrnlPasses(mlir::PassManager &pm, int optLevel, bool enableCSE,

void addKrnlToAffinePasses(mlir::PassManager &pm) {
  pm.addNestedPass<func::FuncOp>(
      onnx_mlir::krnl::createConvertKrnlToAffinePass(enableParallel));
Collaborator

You lost some changes from dev main. Please add them back.

Collaborator Author

Fixed.

"Asterisk is also available.\n"
"e.g. \"onnx.*\" for all onnx operations.\n"
"If this option is started with \"onnx_node_name\"\n"
"the attribute of \"onnx_node_name\", instead of the op name\n"
Collaborator

The ":" is missing from the pattern in this description; the code looks for this:

std::string header = "onnx_node_name:";

Also you state that "the data values" will be printed. Is that the input data values, the output data values? Please specify.

Collaborator Author

Fixed.

std::string header = "onnx_node_name:";
std::cout << signaturePattern << "\n";
bool useNodeName = false;
if (signaturePattern.rfind(header, 0) == 0) {
Collaborator

Can you explain the behavior here? If there is one match of "onnx_node_name", do we then use the node name for all ops?

Collaborator Author

When the option starts with onnx_node_name:, we use attribute("onnx_node_name"), instead of op->getName(), to check every op.
I am not sure what you mean by node name. I modified CompilerOption.cpp for this.
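
For readers unfamiliar with the idiom, `rfind(header, 0) == 0` is a prefix test: searching backwards from index 0 can only report index 0, so a header that appears later in the string (for example after a comma) is not detected. A minimal standalone sketch, where the function name `startsWithNodeNameHeader` is made up for illustration:

```cpp
#include <string>

// True only when `option` begins with "onnx_node_name:".
// rfind(header, 0) searches backwards starting at index 0, so the only
// position it can report is 0; otherwise it returns std::string::npos.
bool startsWithNodeNameHeader(const std::string &option) {
  const std::string header = "onnx_node_name:";
  return option.rfind(header, 0) == 0;
}
```

Under this check, `"onnx_node_name:conv1"` is treated as a node-name pattern, while `"onnx.Add, onnx_node_name:one_specific_op"` and `"onnx_node_name"` (no colon) are not.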

Collaborator

I believe the code is not doing what you think it is doing.

If the pattern passed is "onnx.Add, onnx_node_name:one_specific_op", then the rfind will return true and thus useNodeName is set to true. Then for every op in the walk, we will fetch the data from the onnx_node_name for all.

I think what you need to do is:

  1. Use the old code to match with an op. If there is a match, then do the old print.
  2. Else, look up the onnx_node_name attribute; if it is non-null, prefix the attribute with "onnx_node_name:" and search. If it hits, then do the new print including the data.
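
The two steps suggested above could look roughly like this. It is only a sketch of the proposal, not code from the PR: `PrintKind` and `classify` are hypothetical names, the strings stand in for `op->getName()` and the "onnx_node_name" attribute, and the option is assumed to have already been turned into a single regular expression (here an alternation instead of a comma-separated list).

```cpp
#include <regex>
#include <string>

enum class PrintKind { None, Signature, SignatureAndData };

// Hypothetical helper following the two suggested steps.
PrintKind classify(const std::string &opName, const std::string &nodeName,
                   const std::regex &pattern) {
  // Step 1: the old behaviour, match on the operation name.
  if (std::regex_match(opName, pattern))
    return PrintKind::Signature;
  // Step 2: only ops that carry the attribute are considered; the attribute
  // is prefixed with the header before matching.
  if (!nodeName.empty()) {
    const std::string prefixed = "onnx_node_name:" + nodeName;
    if (std::regex_match(prefixed, pattern))
      return PrintKind::SignatureAndData;
  }
  return PrintKind::None;
}
```

With the pattern `onnx\.Add|onnx_node_name:one_specific_op`, every `onnx.Add` gets the old print, the op named `one_specific_op` gets the new print with data, and all other ops are left alone.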

Collaborator Author

@chentong319 chentong319 Feb 16, 2025

> Then for every op in the walk, we will fetch the data from the onnx_node_name for all.

Yes, that's what I want: every op is matched either by onnx_node_name or by op->getName(). We could separate the patterns for onnx_node_name and getName, and match each op accordingly. I did not do that, just to keep the pattern description simple.

Signed-off-by: Chen Tong <[email protected]>
@chentong319
Collaborator Author

About the changes:

  1. Introduce a new option for instrumentation according to the onnx_node_name. The original one, for opName, is unchanged.
  2. The two instrumentations are implemented in the same pass. Users can use them at the same time if they like.
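
A rough sketch of how one pass could serve both options at once. The names `InstrumentConfig`, `Decision` and `decide` are hypothetical, the real option names are not shown here, and how the actual pass treats an op matched by both patterns is an assumption of this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the two patterns; an empty string means the
// corresponding instrumentation is switched off.
struct InstrumentConfig {
  std::string opNamePattern;   // original option, matched against the op name
  std::string nodeNamePattern; // new option, matched against onnx_node_name
};

struct Decision {
  bool printSignature = false; // print the signature only
  bool printData = false;      // additionally print the tensor data
};

Decision decide(const InstrumentConfig &cfg, const std::string &opName,
                const std::string &nodeName) {
  Decision d;
  if (!cfg.opNamePattern.empty() &&
      std::regex_match(opName, std::regex(cfg.opNamePattern)))
    d.printSignature = true;
  // Assumption: a node-name match implies the signature print plus the data.
  if (!cfg.nodeNamePattern.empty() && !nodeName.empty() &&
      std::regex_match(nodeName, std::regex(cfg.nodeNamePattern))) {
    d.printSignature = true;
    d.printData = true;
  }
  return d;
}
```

Keeping the two patterns separate means neither option changes how the other one is interpreted.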

Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

LGTM
Maybe you could just print the output of a very small example as comments just to show folks what the output looks like.

auto dialect = op->getDialect();
Location loc = op->getLoc();
Collaborator

I did not check the logic in great detail; I assume that you used a small example mlir file, tried both the signature and the new functionality, and were happy with the outputs.

Signed-off-by: Chen Tong <[email protected]>
@chentong319 chentong319 merged commit 1c862a8 into onnx:main Feb 20, 2025
7 checks passed
@chentong319 chentong319 deleted the print-onnx-op branch February 20, 2025 01:27
@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #16330 [push] Extend instrumentSignatu... started at 19:28

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #16332 [push] Extend instrumentSignatu... started at 20:28

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #15313 [push] Extend instrumentSignatu... started at 20:38

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #16332 [push] Extend instrumentSignatu... failed after 1 hr 4 min

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #16330 [push] Extend instrumentSignatu... passed after 1 hr 24 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #15313 [push] Extend instrumentSignatu... passed after 2 hr 33 min

@chentong319
Collaborator Author

@jenkins-droid publish this please
