Description
When I change the GPU script to run on CPU but keep ipex imported, chatglm2-6b throws the error below. After removing the ipex import, everything works fine. Llama2 is OK on CPU even with ipex imported.
This issue is low priority; I am just recording it here in case others hit the same problem.
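For reference, a minimal sketch of the failing setup (hedged: it assumes the BigDL-LLM 4-bit loading API used by the example script; the checkpoint path, prompt, and generation length are placeholders):

```python
import torch
import intel_extension_for_pytorch as ipex  # the import alone is enough to trigger the failure

from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "THUDM/chatglm2-6b"  # placeholder checkpoint path

# Load in 4-bit as in the GPU example, but keep the model and inputs on CPU.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer.encode("hello", return_tensors="pt")  # CPU tensor

# Fails inside apply_rotary_pos_emb with:
#   RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend.
output = model.generate(input_ids, max_new_tokens=32)
```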
(Triggered internally at /build/intel-pytorch-extension/csrc/gpu/jit/fusion_pass.cpp:826.)
query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
Traceback (most recent call last):
File "/home/arda/kai/BigDL/python/llm/example/gpu/hf-transformers-models/chatglm2/./generate.py", line 67, in <module>
output = model.generate(input_ids,
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/arda/kai/BigDL/python/llm/example/gpu/hf-transformers-models/chatglm2/benchmark_util.py", line 1564, in generate
return self.greedy_search(
File "/home/arda/kai/BigDL/python/llm/example/gpu/hf-transformers-models/chatglm2/benchmark_util.py", line 2385, in greedy_search
outputs = self(
File "/home/arda/kai/BigDL/python/llm/example/gpu/hf-transformers-models/chatglm2/benchmark_util.py", line 531, in __call__
return self.model(*args, **kwargs)
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 845, in forward
transformer_outputs = self.transformer(
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 741, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 588, in forward
hidden_states, kv_cache = layer(
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 510, in forward
attention_output, kv_cache = self.self_attention(
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/arda/anaconda3/envs/kai-llm-pip-nf4/lib/python3.9/site-packages/bigdl/llm/transformers/models/chatglm2.py", line 124, in chatglm2_attention_forward_8eb45c
query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torch_ipex::mul_add' is only available for these backends: [XPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
XPU: registered at /build/intel-pytorch-extension/csrc/gpu/aten/operators/TripleOps.cpp:510 [kernel]
BackendSelect: fallthrough registered at /build/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /build/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at /build/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /build/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /build/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /build/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:30 [backend fallback]
AutogradCPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:34 [backend fallback]
AutogradCUDA: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:42 [backend fallback]
AutogradXLA: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:46 [backend fallback]
AutogradMPS: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:54 [backend fallback]
AutogradXPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:38 [backend fallback]
AutogradHPU: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:67 [backend fallback]
AutogradLazy: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:50 [backend fallback]
AutogradMeta: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:58 [backend fallback]
Tracer: registered at /build/pytorch/torch/csrc/autograd/TraceTypeManual.cpp:294 [backend fallback]
AutocastCPU: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastXPU: registered at /build/intel-pytorch-extension/csrc/gpu/aten/operators/TripleOps.cpp:510 [kernel]
AutocastCUDA: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at /build/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /build/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /build/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
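Judging from the traceback, importing ipex registers a JIT fusion pass (csrc/gpu/jit/fusion_pass.cpp) that rewrites the scripted apply_rotary_pos_emb graph to use torch_ipex::mul_add, which only has XPU kernels, so the fused graph cannot run on the CPU backend. A possible workaround, consistent with the observation that removing the import fixes the CPU run, is to import ipex only when actually targeting XPU (a sketch; the device switch is hypothetical):

```python
import torch

device = "cpu"  # hypothetical switch; set to "xpu" for the GPU path

if device == "xpu":
    # Importing ipex registers XPU kernels and JIT fusion passes
    # (including the mul+add fusion into torch_ipex::mul_add seen above),
    # so keep the import out of the pure-CPU path.
    import intel_extension_for_pytorch as ipex  # noqa: F401
```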