.. _gpu:

GPU Support
===========

Overview
--------
PyLops-mpi supports computations on GPUs, leveraging the GPU backend of PyLops. Under the hood,
`CuPy <https://cupy.dev/>`_ (``cupy-cudaXX>=v13.0.0``) is used to perform all of the operations.
This library must be installed *before* PyLops-mpi is installed.

.. note::

   Set the environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
   This can also be used if a previous (or faulty) version of ``cupy`` is installed in your system;
   otherwise, you will get an error when importing PyLops.
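
For instance, a minimal sketch of disabling the ``cupy`` backend from within a Python script
(assuming, as is common for import-time flags, that the variable is set before ``pylops`` is
first imported):

.. code-block:: python

   import os

   # force PyLops to ignore the cupy backend; this must happen before pylops is imported
   os.environ["CUPY_PYLOPS"] = "0"

   import pylops
   import pylops_mpi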

The :class:`pylops_mpi.DistributedArray` and :class:`pylops_mpi.StackedDistributedArray` objects can be
created from both ``numpy``- and ``cupy``-based local arrays, and all of the operators and solvers in PyLops-mpi
can handle both scenarios. Note that, since most operators in PyLops-mpi are thin wrappers around PyLops operators,
any PyLops operator that lacks a GPU implementation cannot be used in PyLops-mpi either when working with
cupy arrays.
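
As a quick sanity check, one can verify whether PyLops has detected a usable ``cupy`` installation
before building operators. This is a minimal sketch, assuming the ``cupy_enabled`` flag exposed in
``pylops.utils.deps``:

.. code-block:: python

   from pylops.utils import deps

   # True only if cupy is installed, importable, and not disabled via CUPY_PYLOPS=0
   print("cupy backend enabled:", deps.cupy_enabled)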

Example
-------

Finally, let's briefly look at an example. First, we write a code snippet using
``numpy`` arrays, which PyLops-mpi will run on your CPU:

.. code-block:: python

   import numpy as np
   import pylops
   import pylops_mpi
   from mpi4py import MPI

   # MPI helpers
   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()
   size = comm.Get_size()

   # Create distributed data (broadcast)
   nxl, nt = 20, 20
   dtype = np.float32
   d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                        partition=pylops_mpi.Partition.BROADCAST,
                                        engine="numpy", dtype=dtype)
   d_dist[:] = np.ones(d_dist.local_shape, dtype=dtype)

   # Create and apply VStack operator
   Sop = pylops.MatrixMult(np.ones((nxl, nxl)), otherdims=(nt, ))
   HOp = pylops_mpi.MPIVStack(ops=[Sop, ])
   y_dist = HOp @ d_dist

Now we write a code snippet using ``cupy`` arrays, which PyLops-mpi will run on
your GPU:

.. code-block:: python

   import numpy as np
   import cupy as cp
   import pylops
   import pylops_mpi
   from mpi4py import MPI

   # MPI helpers
   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()
   size = comm.Get_size()

   # Define gpu to use (one GPU per MPI rank)
   cp.cuda.Device(device=rank).use()

   # Create distributed data (broadcast)
   nxl, nt = 20, 20
   dtype = np.float32
   d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                        partition=pylops_mpi.Partition.BROADCAST,
                                        engine="cupy", dtype=dtype)
   d_dist[:] = cp.ones(d_dist.local_shape, dtype=dtype)

   # Create and apply VStack operator
   Sop = pylops.MatrixMult(cp.ones((nxl, nxl)), otherdims=(nt, ))
   HOp = pylops_mpi.MPIVStack(ops=[Sop, ])
   y_dist = HOp @ d_dist

The code is almost unchanged apart from the fact that we now use ``cupy`` arrays;
PyLops-mpi will figure this out!
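
If you then want to inspect the result on the host, a minimal sketch (assuming the ``local_array``
property of :class:`pylops_mpi.DistributedArray`, which returns the rank-local array) is to
convert the ``cupy`` output back to ``numpy`` explicitly:

.. code-block:: python

   # bring the rank-local portion of the result back to the host (CPU)
   y_local = cp.asnumpy(y_dist.local_array)
   print(f"Rank {rank}: local result shape is {y_local.shape}")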

.. note::

   The CuPy backend is in active development, and many examples are not yet included in the documentation.
   You can find `additional examples <https://github.com/PyLops/pylops_notebooks/tree/master/developement-mpi/Cupy_MPI>`_ in the `PyLops Notebooks repository <https://github.com/PyLops/pylops_notebooks>`_.