Open
Description
In thread #4677, @larshg mentioned that the revised GPU clustering runs faster than before but is still slower than the CPU version. The CPU version is based on a KD-Tree, while the GPU version relies on an Octree. I looked for a fast GPU implementation of a KD-Tree but did not find a convincing one, and I am not sure that one exists. I therefore wanted to ask whether somebody with more experience with the clustering algorithms knows if fast GPU implementations exist and whether we could leverage them here. Are the nn implementations in the cuda module perhaps just that? I am grateful for any tips or suggestions!