
[WIP] Build C++ & Python tracing support for PyTorch/XLA ops #3348


Open: miladm wants to merge 2 commits into base: master

miladm
Collaborator

@miladm miladm commented Feb 5, 2022

This PR adds a tool for tracing PyTorch/XLA ops at the Python and C++ levels together.

As an immediate use case, the size op is traced to showcase the approach. Understanding in detail how size is used is important for supporting dynamic shapes in PyTorch/XLA and in Lazy Tensor Core (LTC).

@miladm miladm self-assigned this Feb 5, 2022
@miladm miladm changed the title from "Build C++ and Python tracing support for PyTorch/XLA ops" to "Build C++ & Python tracing support for PyTorch/XLA ops" on Feb 5, 2022
Collaborator Author

miladm commented Feb 6, 2022

Sample output of >> python test_op_tracing.py

========Python BEGIN==========
  File "/usr/local/google/home/miladmo/sw/dupe/pytorch/xla/test/test_op_tracing.py", line 13, in print_stacktrace
    traceback.print_stack()
========Python END==========
=========CPP BEGIN=========
frame #0: <unknown function> + 0x7ec406 (0x7f2ee22a6406 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x7ec33d (0x7f2ee22a633d in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x7ec2ed (0x7f2ee22a62ed in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x7ec17d (0x7f2ee22a617d in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #4: std::function<void ()>::operator()() const + 0x3e (0x7f2ee205a6be in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #5: torch_xla::XLATensorImpl::SetupSizeProperties() + 0x4f (0x7f2ee25628ef in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #6: torch_xla::XLATensorImpl::sizes() const + 0x19 (0x7f2ee2562879 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0xfbfc29 (0x7f2eeb7acc29 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x4ecadf4 (0x7f2eef6b7df4 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x4ecc76b (0x7f2eef6b976b in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #10: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x452 (0x7f2eef6b56d2 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x690 (0x7f2eef6b4db0 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x9e (0x7f2eef6b434e in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x63 (0x7f2efbe98d73 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #14: <unknown function> + 0x4eea889 (0x7f2eef6d7889 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x4eea6c1 (0x7f2eef6d76c1 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x4eea63d (0x7f2eef6d763d in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x4eea5a5 (0x7f2eef6d75a5 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #18: <unknown function> + 0x4eea129 (0x7f2eef6d7129 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #19: <unknown function> + 0xd38f4 (0x7f2efd06c8f4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #20: <unknown function> + 0x8d80 (0x7f2f06b42d80 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x3f (0x7f2f06a6db6f in /lib/x86_64-linux-gnu/libc.so.6)
=========CPP END=========
==========Lower Ops BEGIN=======

==========Lower Ops END=======
========Python BEGIN==========
  File "/usr/local/google/home/miladmo/sw/dupe/pytorch/xla/test/test_op_tracing.py", line 30, in <module>
    print(a.grad.to(device="cpu").sum())
  File "/usr/local/google/home/miladmo/sw/dupe/pytorch/xla/test/test_op_tracing.py", line 13, in print_stacktrace
    traceback.print_stack()
========Python END==========
=========CPP BEGIN=========
frame #0: <unknown function> + 0x7ec406 (0x7f2ee22a6406 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x7ec33d (0x7f2ee22a633d in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x7ec2ed (0x7f2ee22a62ed in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x7ec17d (0x7f2ee22a617d in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #4: std::function<void ()>::operator()() const + 0x3e (0x7f2ee205a6be in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #5: torch_xla::XLATensorImpl::SetupSizeProperties() + 0x4f (0x7f2ee25628ef in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #6: torch_xla::XLATensorImpl::sizes() const + 0x19 (0x7f2ee2562879 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch_xla-1.11-py3.9-linux-x86_64.egg/_XLAC.cpython-39-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0xfbfc29 (0x7f2eeb7acc29 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::native::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x7b6 (0x7f2eebfc44f6 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x28cf175 (0x7f2eed0bc175 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x29781eb (0x7f2eed1651eb in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x1d9b882 (0x7f2eec588882 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x1d9c1b6 (0x7f2eec5891b6 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x3f2 (0x7f2eec406352 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x25f9328 (0x7f2eecde6328 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x25f990b (0x7f2eecde690b in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x1d9b882 (0x7f2eec588882 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0x1d9c1b6 (0x7f2eec5891b6 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #18: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x3f2 (0x7f2eec406352 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #19: at::redispatch::_to_copy(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x14d (0x7f2eee8286bd in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #20: <unknown function> + 0x3f3e901 (0x7f2eee72b901 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #21: <unknown function> + 0x3f3de83 (0x7f2eee72ae83 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #22: <unknown function> + 0x3f3ee8e (0x7f2eee72be8e in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #23: <unknown function> + 0x1d9b882 (0x7f2eec588882 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #24: at::_ops::_to_copy::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x9a1 (0x7f2eec405c31 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #25: at::_to_copy(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, c10::optional<c10::MemoryFormat>) + 0x125 (0x7f2eebfc7ba5 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #26: <unknown function> + 0x17d7e39 (0x7f2eebfc4e39 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #27: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x1d1 (0x7f2eebfc5251 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #28: <unknown function> + 0x2a5bb13 (0x7f2eed248b13 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #29: <unknown function> + 0x2b81b55 (0x7f2eed36eb55 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #30: <unknown function> + 0x1fc097a (0x7f2eec7ad97a in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #31: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0xa67 (0x7f2eec695f97 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #32: <unknown function> + 0x62cde3 (0x7f2efbaecde3 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #33: <unknown function> + 0x60964f (0x7f2efbac964f in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #34: <unknown function> + 0x54c9f0 (0x7f2efba0c9f0 in /usr/local/google/home/miladmo/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #47: __libc_start_main + 0xcd (0x7f2f069987ed in /lib/x86_64-linux-gnu/libc.so.6)
=========CPP END=========
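
The Python BEGIN/END sections in the output above come from a callback in test_op_tracing.py that calls traceback.print_stack(). A minimal sketch of such a callback follows, using only the standard library. The names `format_python_trace`, `caller`, and the `skip` parameter are hypothetical, and the registration of the callback with torch_xla is left out because the binding name is not visible in this thread.

```python
import traceback

BEGIN = "========Python BEGIN=========="
END = "========Python END=========="


def format_python_trace(skip=1):
    """Return the current Python stack as text, wrapped in banner lines.

    `skip` drops the innermost frames (by default this helper itself)
    so that the trace ends at the caller.
    """
    frames = traceback.extract_stack()
    if skip:
        frames = frames[:-skip]
    body = "".join(traceback.format_list(frames))
    return f"{BEGIN}\n{body}{END}"


def caller():
    # Stand-in for user code that triggers the traced op.
    return format_python_trace()


if __name__ == "__main__":
    print(caller())
```

Returning the text instead of printing it directly lets the C++ side decide where the Python section appears relative to the C++ frames.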

@@ -0,0 +1,36 @@
import torch
import torch_xla
#torch_xla._XLAC._ltc_init_ts_backend() #TOREMOVE
Collaborator

please remove 😄

stack = traceback.format_list(stack)
if (len(stack) > 1):
  second_frame = str(stack[-2]).replace(
      "/usr/local/google/home/miladmo/anaconda3/envs/pytorch1/bin/ipython",
Collaborator

If you have to hardcode the path, maybe use something like /tmp instead of your home dir.
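
One way to avoid a hardcoded home-directory path, sketched here on the assumption that the goal is only to shorten file paths in the printed frames, is to rewrite each frame's filename relative to a base directory before formatting. The helper name `shorten_frames` and its `base` parameter are hypothetical and not part of this PR.

```python
import os
import traceback


def shorten_frames(frames, base=None):
    """Format stack frames with filenames made relative to `base`.

    `base` defaults to the current working directory, so no
    user-specific path has to appear in the source.
    """
    base = os.path.abspath(base or os.getcwd())
    entries = []
    for frame in frames:
        filename = frame.filename
        if os.path.isabs(filename):
            try:
                rel = os.path.relpath(filename, base)
            except ValueError:
                # relpath can fail, e.g. across drives on Windows.
                rel = filename
            # Keep the absolute path when the file lives outside `base`.
            if not rel.startswith(os.pardir):
                filename = rel
        entries.append((filename, frame.lineno, frame.name, frame.line))
    # format_list accepts (filename, lineno, name, line) tuples.
    return traceback.format_list(entries)


if __name__ == "__main__":
    print("".join(shorten_frames(traceback.extract_stack())))
```

Files outside the base directory keep their absolute paths, so frames from installed packages remain unambiguous.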

@@ -128,6 +130,15 @@ int64_t XLATensorImpl::size(int64_t d) const {
}

void XLATensorImpl::SetupSizeProperties() {
  if (getPythonPrinter() && !disablePrinter()) {
Collaborator

If I understand correctly, this ensures that whenever tensor.size() is called, we log the Python and C++ frames if the printer is set. I think it is a useful tool, but I am not sure it is a good idea to merge the change in this file into master. It makes a lot of sense to add the binding and the documentation, but this part feels to me like a debugging feature for developers.

Unless we are at a stage where we expect users to dump this info and debug with it, it might be better to leave this in a separate branch?
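
The behaviour described in this comment (run the printer on every size query, but only when one is set and not disabled) can be illustrated with a small pure-Python analogy. This is a sketch, not the PR's C++ implementation: `TracedTensor` and `set_printer` are hypothetical stand-ins for XLATensorImpl and the Python binding, and `_disabled` stands in for disablePrinter(), whose exact semantics are not shown in this thread. Here it is assumed to prevent re-entrant tracing.

```python
import traceback
from typing import Callable, Optional

_printer: Optional[Callable[[], None]] = None
_disabled = False  # assumed analogue of disablePrinter()


def set_printer(printer):
    """Register (or clear, with None) the callback run on each size query."""
    global _printer
    _printer = printer


class TracedTensor:
    def __init__(self, shape):
        self._shape = tuple(shape)

    def size(self):
        global _disabled
        if _printer is not None and not _disabled:
            _disabled = True  # the printer may itself query sizes
            try:
                _printer()
            finally:
                _disabled = False
        return self._shape


if __name__ == "__main__":
    t = TracedTensor((2, 3))
    set_printer(traceback.print_stack)  # trace every size query
    t.size()
    set_printer(None)  # tracing off: size() is silent again
    t.size()
```

With no printer registered, the only cost on the size path is a single check, which is the property that matters if the hook ever ships in a release build.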

};
getPythonPrinter() = printer;
});
m.def("_print_text",
Collaborator

What's the use of this function?

@miladm miladm added this to the Dynamic Shape milestone Feb 10, 2022
@miladm miladm changed the title from "Build C++ & Python tracing support for PyTorch/XLA ops" to "[WIP] Build C++ & Python tracing support for PyTorch/XLA ops" on Feb 10, 2022
@ysiraichi ysiraichi added the DO_NOT_MERGE (Not for merging.) label and removed the DO_NOT_MERGE_YET label on Mar 5, 2025
Labels
DO_NOT_MERGE Not for merging.
3 participants