
Commit 076868d

committed
doc: added gpu section to doc
1 parent 41583fb commit 076868d

File tree

3 files changed: +88 −2 lines changed

docs/source/gpu.rst

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
.. _gpu:

GPU Support
===========

Overview
--------
PyLops-mpi supports computations on GPUs by leveraging the GPU backend of PyLops. Under the hood,
`CuPy <https://cupy.dev/>`_ (``cupy-cudaXX>=v13.0.0``) is used to perform all of the operations.
This library must be installed *before* PyLops-mpi is installed.

.. note::

    Set the environment variable ``CUPY_PYLOPS=0`` to force PyLops to ignore the ``cupy`` backend.
    This can also be used if a previous (or faulty) version of ``cupy`` is installed on your system;
    otherwise, you will get an error when importing PyLops.
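
For instance, a minimal sketch of one way to disable the backend from Python rather than from the shell
(this only has an effect if the variable is set before ``pylops`` is imported for the first time):

.. code-block:: python

    import os

    # must be set before pylops / pylops_mpi are imported for the first time
    os.environ["CUPY_PYLOPS"] = "0"

    import pylops
    import pylops_mpi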

The :class:`pylops_mpi.DistributedArray` and :class:`pylops_mpi.StackedDistributedArray` objects can be
created from both ``numpy``- and ``cupy``-based local arrays, and all of the operators and solvers in PyLops-mpi
can handle both scenarios. Note that, since most operators in PyLops-mpi are thin wrappers around PyLops operators,
PyLops operators that lack a GPU implementation cannot be used in PyLops-mpi either when working with
``cupy`` arrays.
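
As a minimal illustration (assuming ``cupy`` is installed and a GPU is available; the variable names are
purely illustrative), the type of the local array simply follows the chosen ``engine``:

.. code-block:: python

    import numpy as np
    import cupy as cp
    import pylops_mpi

    x = pylops_mpi.DistributedArray(global_shape=100, engine="cupy", dtype=np.float32)
    x[:] = cp.ones(x.local_shape, dtype=np.float32)

    # the local portion of the distributed array is a cupy.ndarray
    print(type(x.local_array))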

Example
-------

Finally, let's briefly look at an example. First, we write a code snippet using
``numpy`` arrays, which PyLops-mpi will run on your CPU:

.. code-block:: python

    import numpy as np
    from mpi4py import MPI

    import pylops
    import pylops_mpi

    # MPI helpers
    comm = MPI.COMM_WORLD
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()

    # Create distributed data (broadcast)
    nxl, nt = 20, 20
    dtype = np.float32
    d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                         partition=pylops_mpi.Partition.BROADCAST,
                                         engine="numpy", dtype=dtype)
    d_dist[:] = np.ones(d_dist.local_shape, dtype=dtype)

    # Create and apply VStack operator
    Sop = pylops.MatrixMult(np.ones((nxl, nxl)), otherdims=(nt, ))
    HOp = pylops_mpi.MPIVStack(ops=[Sop, ])
    y_dist = HOp @ d_dist
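
If the full result is needed locally, the distributed array can be gathered back into a single
``numpy`` array (a minimal sketch, continuing from the snippet above):

.. code-block:: python

    # gather the distributed result on every rank as a single array
    y = y_dist.asarray()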

Now we write a code snippet using ``cupy`` arrays, which PyLops-mpi will run on
your GPU:

.. code-block:: python

    import numpy as np
    import cupy as cp
    from mpi4py import MPI

    import pylops
    import pylops_mpi

    # MPI helpers
    comm = MPI.COMM_WORLD
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()

    # Define the GPU to use for this rank
    cp.cuda.Device(device=rank).use()

    # Create distributed data (broadcast)
    nxl, nt = 20, 20
    dtype = np.float32
    d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                         partition=pylops_mpi.Partition.BROADCAST,
                                         engine="cupy", dtype=dtype)
    d_dist[:] = cp.ones(d_dist.local_shape, dtype=dtype)

    # Create and apply VStack operator
    Sop = pylops.MatrixMult(cp.ones((nxl, nxl)), otherdims=(nt, ))
    HOp = pylops_mpi.MPIVStack(ops=[Sop, ])
    y_dist = HOp @ d_dist

The code is almost unchanged apart from the fact that we now use ``cupy`` arrays:
PyLops-mpi will figure this out and run all computations on your GPUs!
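
A minimal sketch of how the two snippets above could be merged into a single, backend-agnostic script
(the ``engine`` flag and the ``xp`` alias are illustrative choices, not part of the PyLops-mpi API):

.. code-block:: python

    import numpy as np
    from mpi4py import MPI

    import pylops
    import pylops_mpi

    engine = "cupy"  # switch to "numpy" to run the very same script on CPUs
    if engine == "cupy":
        import cupy as xp
        xp.cuda.Device(device=MPI.COMM_WORLD.Get_rank()).use()
    else:
        xp = np

    # Create distributed data (broadcast)
    nxl, nt = 20, 20
    d_dist = pylops_mpi.DistributedArray(global_shape=nxl * nt,
                                         partition=pylops_mpi.Partition.BROADCAST,
                                         engine=engine, dtype=np.float32)
    d_dist[:] = xp.ones(d_dist.local_shape, dtype=np.float32)

    # Create and apply VStack operator
    Sop = pylops.MatrixMult(xp.ones((nxl, nxl)), otherdims=(nt, ))
    y_dist = pylops_mpi.MPIVStack(ops=[Sop, ]) @ d_dist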

.. note::

    The CuPy backend is in active development, and many examples are not yet included in the documentation.
    You can find several `other examples <https://github.com/PyLops/pylops_notebooks/tree/master/developement-mpi/Cupy_MPI>`_
    in the `PyLops Notebooks repository <https://github.com/PyLops/pylops_notebooks>`_.

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,7 @@ class and implementing the ``_matvec`` and ``_rmatvec``.
7171

7272
self
7373
installation.rst
74+
gpu.rst
7475

7576
.. toctree::
7677
:maxdepth: 2

pylops_mpi/basicoperators/BlockDiag.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -121,7 +121,7 @@ def _matvec(self, x: DistributedArray) -> DistributedArray:
121121
for iop, oper in enumerate(self.ops):
122122
y1.append(oper.matvec(x.local_array[self.mmops[iop]:
123123
self.mmops[iop + 1]]))
124-
y[:] = ncp.concatenate(ncp.asarray(y1))
124+
y[:] = ncp.concatenate(y1)
125125
return y
126126

127127
@reshaped(forward=False, stacking=True)
@@ -133,7 +133,7 @@ def _rmatvec(self, x: DistributedArray) -> DistributedArray:
133133
for iop, oper in enumerate(self.ops):
134134
y1.append(oper.rmatvec(x.local_array[self.nnops[iop]:
135135
self.nnops[iop + 1]]))
136-
y[:] = ncp.concatenate(ncp.asarray(y1))
136+
y[:] = ncp.concatenate(y1)
137137
return y
138138

139139
