
"CUDA error: No kernel image" still exists after reinstalling torch-points-kernels #100

@maosuli

Description

Hi,

I have to compile the "torch-points-kernels" library on my workstation and then run the code on a remote server using the same conda environment.

The "CUDA error" appeared after I submitted the job to the remote server, although the code ran fine on my workstation.

Following your solution, I uninstalled the library, cleared the cache, and reinstalled it on my workstation after setting TORCH_CUDA_ARCH_LIST.

But the same error still occurred.

I checked the two GPU cards: a Quadro RTX 6000 (Turing, SM 7.5) and a Tesla V100 (Volta, SM 7.0). Accordingly, I set 'export TORCH_CUDA_ARCH_LIST="7.0;7.5"' before reinstalling the library.
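Concretely, the reinstall sequence was roughly the following (a sketch, not my exact shell history; FORCE_CUDA is included on the assumption that it forces a CUDA build even when no GPU is visible at compile time, and the exact pip flags may differ):

```shell
# Rough sketch of the rebuild for both target architectures.
export TORCH_CUDA_ARCH_LIST="7.0;7.5"   # Volta (V100) + Turing (RTX 6000)
export FORCE_CUDA=1                     # assumption: force a CUDA build without a visible GPU

pip uninstall -y torch-points-kernels
pip cache purge                         # make sure no cached wheel is reused
pip install --no-cache-dir torch-points-kernels
```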

The error details are as follows:

```
Traceback (most recent call last):
  File "train_s_stransformer.py", line 613, in <module>
    main()
  File "train_s_stransformer.py", line 92, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "train_s_stransformer.py", line 327, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler)
  File "train_s_stransformer.py", line 426, in train
    output = model(feat, coord, offset, batch, neighbor_idx)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 453, in forward
    feats, xyz, offset, feats_down, xyz_down, offset_down = layer(feats, xyz, offset)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 281, in forward
    v2p_map, p2v_map, counts = grid_sample(xyz, batch, window_size, start=None)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 59, in grid_sample
    unique, cluster, counts = torch.unique(cluster, sorted=True, return_inverse=True, return_counts=True)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/_jit_internal.py", line 421, in fn
    return if_true(*args, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/_jit_internal.py", line 421, in fn
    return if_true(*args, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/functional.py", line 769, in _unique_impl
    return_counts=return_counts,
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
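One thing I noticed in the trace: the failing call is torch.unique, which is implemented in core PyTorch rather than in torch-points-kernels, so perhaps the torch binary itself also needs kernels for both architectures. A small diagnostic I plan to run on the remote server (assuming torch.cuda.get_arch_list is available in this torch version):

```python
import torch

# Compare the compute capability of the GPU assigned to the job with the
# architectures this torch binary ships kernels for.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU 0 compute capability: sm_{major}{minor}")
    # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75']
    print("torch built for:", torch.cuda.get_arch_list())
else:
    print("CUDA is not available in this environment")
```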

Please give me some advice on how to resolve this.

Best,

Eric.
