This repository has been archived by the owner on Feb 5, 2024. It is now read-only.

Releases: PennyLaneAI/pennylane-lightning-gpu

Release 0.32.0

28 Aug 15:38

New features since last release

  • Add sparse Hamiltonian support to multi-node/multi-GPU adjoint methods. (#128)

  • Add sparse Hamiltonian support for expectation value calculation (a short sketch covering both items follows this list). (#127)
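
A minimal sketch combining both items (not taken from the release notes; the sparse observable, wire count, and parameters are illustrative, and an MPI-enabled build is assumed for mpi=True):

from mpi4py import MPI
import pennylane as qml
from pennylane import numpy as np
from scipy.sparse import diags

n_wires = 4
dev = qml.device("lightning.gpu", wires=n_wires, mpi=True)

# A diagonal sparse matrix stands in for a problem Hamiltonian.
H_sparse_matrix = diags(np.arange(2**n_wires), 0, format="csr", dtype=complex)
SpH = qml.SparseHamiltonian(H_sparse_matrix, wires=range(n_wires))

@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    for i in range(n_wires):
        qml.RX(params[i], wires=i)
    return qml.expval(SpH)

params = np.ones(n_wires, requires_grad=True)
jac = qml.jacobian(circuit)(params)  # each MPI process returns the overall Jacobian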

Breaking changes

  • Rename QubitStateVector to StatePrep in the LightningGPU class (see the sketch after this list). (#134)

  • Deprecate Python 3.8. (#134)

  • Update PennyLane-Lightning imports following the (#472) refactoring. (#134)
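
As a minimal sketch of the rename (illustrative state and wires, not from the release notes), state preparation now uses qml.StatePrep where qml.QubitStateVector was used previously:

import numpy as np
import pennylane as qml

dev = qml.device("lightning.gpu", wires=1)

@qml.qnode(dev)
def circuit():
    # Previously: qml.QubitStateVector(state, wires=0)
    qml.StatePrep(np.array([1.0, 1.0]) / np.sqrt(2), wires=0)
    return qml.expval(qml.PauliX(0))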

Improvements

  • Optimize the single-qubit rotation gate by using a single cuStateVec API call instead of separate Pauli gate applications. (#132)

Bug fixes

  • apply no longer mutates the input list of operations, and the missing _dp is now added to the LightningGPU class with the single-GPU backend. (#133)

  • Ensure active return check doesn't break CI. (#136)

Contributors

This release contains contributions from (in alphabetical order):

David Clark (NVIDIA), Vincent Michaud-Rioux, Shuli Shu

Release v0.31.0

26 Jun 14:35
Compare
Choose a tag to compare

New features since last release

  • Add multi-node/multi-GPU support to adjoint methods. (#119)

Note that each MPI process returns the overall result of the adjoint method. The MPI adjoint method has two options:

  1. The default method is faster when the problem fits into GPU memory and is enabled simply by passing the mpi=True device argument. With this method, a separate bra is created for each observable and the ket is updated only once per operation, regardless of the number of observables. It may consume more memory due to the up-front creation of multiple bras.
  2. The memory-optimized method requires less memory but is slower due to the serialization of the execution. It reuses a single bra object for all observables, so for each operation the ket must be updated n times, where n is the number of observables. Only one bra is created, which reduces memory consumption, but the repeated ket updates can slow execution.

The workflow for the default adjoint method with MPI support is as follows:

 from mpi4py import MPI
 import pennylane as qml
 from pennylane import numpy as np

 comm = MPI.COMM_WORLD
 rank = comm.Get_rank()
 n_wires = 20
 n_layers = 2

 # mpi=True distributes the state vector across the MPI processes.
 dev = qml.device('lightning.gpu', wires=n_wires, mpi=True)

 @qml.qnode(dev, diff_method="adjoint")
 def circuit_adj(weights):
     qml.StronglyEntanglingLayers(weights, wires=list(range(n_wires)))
     return qml.math.hstack([qml.expval(qml.PauliZ(i)) for i in range(n_wires)])

 # Create the parameters on rank 0 and broadcast them to all processes.
 if rank == 0:
     params = np.random.random(qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_wires))
 else:
     params = None

 params = comm.bcast(params, root=0)
 jac = qml.jacobian(circuit_adj)(params)

To enable the memory-optimized method, batch_obs should be set to True. Only the device construction changes relative to the default workflow above:

dev = qml.device('lightning.gpu', wires=n_wires, mpi=True, batch_obs=True)
  • Add multi-node/multi-GPU support to measurement methods, including expval, generate_samples and probability. (#116)

Note that each MPI process will return the overall result for expectation values and sample generation (a short sketch follows below). However, probability will return local probability results, and users are responsible for collecting the probability results across the MPI processes.
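
A minimal sketch of this behaviour for sampling (not from the release notes; the circuit and shot count are illustrative) is:

from mpi4py import MPI
import pennylane as qml

dev = qml.device('lightning.gpu', wires=8, mpi=True, shots=100)

@qml.qnode(dev)
def sample_circuit():
    qml.Hadamard(wires=0)
    return qml.sample(qml.PauliZ(0))

# Every MPI rank receives the full set of samples; no manual gather is needed.
samples = sample_circuit()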

The workflow for collecting probability results across the MPI processes is as follows:

from mpi4py import MPI
import pennylane as qml
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
dev = qml.device('lightning.gpu', wires=8, mpi=True)
prob_wires = [0, 1]

@qml.qnode(dev)
def mpi_circuit():
    qml.Hadamard(wires=1)
    return qml.probs(wires=prob_wires)

local_probs = mpi_circuit()

# Gather the local probability results onto rank 0.
recv_counts = comm.gather(len(local_probs), root=0)
if rank == 0:
    probs = np.zeros(2**len(prob_wires))
else:
    probs = None

comm.Gatherv(local_probs, [probs, recv_counts], root=0)
if rank == 0:
    print(probs)
  • Add multi-node/multi-GPU support to gate operations. (#112)

    This new feature lets users leverage the computational power of multiple nodes and multiple GPUs to run large-scale applications. It requires both the overall number of MPI processes and the number of MPI processes per node to be powers of 2, with the same process count on each node, and each MPI process currently manages exactly one GPU.
    To enable this feature, set mpi=True. The performance of MPI operations can be fine-tuned with the mpi_buf_size parameter, which allocates mpi_buf_size MiB (mebibytes, 2^20 bytes) of GPU memory for MPI operations. mpi_buf_size should also be a power of 2, and a runtime warning is issued if the GPU memory buffer for MPI operations is larger than the GPU memory allocated for the local state vector. By default (mpi_buf_size=0), the GPU memory allocated for MPI operations matches the size of the local state vector, with an upper limit of 64 MiB. Note that MiB (2^20 bytes) differs from MB (megabytes, 10^6 bytes).
    The workflow for the new feature is as follows:

    from mpi4py import MPI
    import pennylane as qml
    dev = qml.device('lightning.gpu', wires=8, mpi=True, mpi_buf_size=1)
    @qml.qnode(dev)
    def circuit_mpi():
        qml.PauliX(wires=[0])
        return qml.state()
    local_state_vector = circuit_mpi()
    print(local_state_vector)

    Note that each MPI process will return its local state vector with qml.state() here.

Breaking changes

  • Update tests to be compliant with PennyLane v0.31.0 development changes and deprecations. (#114)

Improvements

  • Use Operator.name instead of Operation.base_name. (#115)

  • Updated runs-on label for self-hosted runner workflows. (#117)

  • Update workflow to support multi-gpu self-hosted runner. (#118)

  • Add compat workflows. (#121)

Documentation

  • Update README.rst and CHANGELOG.md for the MPI backend. (#122)

Contributors

This release contains contributions from (in alphabetical order):

Christina Lee, Rashid N H M, Shuli Shu

Release v0.30.0

01 May 19:10

New features since last release

Improvements

  • Wheels are now checked with twine check post-creation for PyPI compatibility. (#103)

Bug fixes

  • Fix CUDA version to 11 for cuquantum dependency in CI. (#107)

  • Fix the controlled-gate generators, which are now fully used in the adjoint pipeline following PennyLane PR (#3874). (#101)

  • Updates to use the new call signature for QuantumScript.get_operation. (#104)

Contributors

Vincent Michaud-Rioux, Romain Moyard, Lee James O'Riordan

Release v0.29.1

10 Mar 20:42

Improvements

  • Optimization updates to the cuStateVec integration, e.g., fewer cuBLAS, cuSPARSE, and cuStateVec handles are created, and fewer small data transfers are made between host and device. (#73)

Contributors

Ania Brown (NVIDIA), Andreas Hehn (NVIDIA)

Release v0.29.0

28 Feb 12:41

Improvements

  • Update inv() to qml.adjoint() in Python tests following recent changes in PennyLane. (#88)

  • Remove explicit Numpy requirement. (#90)

Bug fixes

  • Ensure early failure rather than the return of incorrect results from out-of-order probs wires. (#94)

Contributors

This release contains contributions from (in alphabetical order):

Amintor Dusko, Lee James O'Riordan, Shuli Shu

Release 0.28.1

12 Jan 17:04

Bug fixes

  • Downgrade CUDA compiler for wheels to avoid compatibility issues with older runtimes. (#87)

  • Add header unordered_map to util/cuda_helpers.hpp. (#86)

Contributors

This release contains contributions from (in alphabetical order):

Lee James O'Riordan, Feng Wang

Release 0.28.0

19 Dec 14:52

New features since last release

  • Add customized CUDA kernels for statevector initialization to the C++ layer. (#70)

Breaking changes

  • Deprecate _state and _pre_rotated_state and refactor syncH2D and syncD2H. (#70)

The refactoring of syncH2D and syncD2H allows users to explicitly access and update the statevector data on the device when needed, and can reduce unnecessary memory allocation on the host.

The workflow for syncH2D is:

import pennylane as qml
import numpy as np

dev = qml.device('lightning.gpu', wires=3)
obs = qml.Identity(0) @ qml.PauliX(1) @ qml.PauliY(2)
obs1 = qml.Identity(1)
H = qml.Hamiltonian([1.0, 1.0], [obs1, obs])
state_vector = np.array([0.0 + 0.0j, 0.0 + 0.1j, 0.1 + 0.1j, 0.1 + 0.2j,
                         0.2 + 0.2j, 0.3 + 0.3j, 0.3 + 0.4j, 0.4 + 0.5j], dtype=np.complex64)
# Copy the host statevector data onto the device.
dev.syncH2D(state_vector)
res = dev.expval(H)

The workflow for syncD2H is:

import pennylane as qml
import numpy as np

num_wires = 2  # example wire count
dev = qml.device('lightning.gpu', wires=num_wires)
dev.apply([qml.PauliX(wires=[0])])
state_vector = np.zeros(2**dev.num_wires).astype(dev.C_DTYPE)
dev.syncD2H(state_vector)  # copy the device statevector into the host array
  • Deprecate Python 3.7 wheels. (#75)

  • Change the signature of the DefaultQubit.signature method. (#78)

Improvements

  • lightning.gpu is decoupled from the NumPy layer during initialization and execution, and now inherits from QubitDevice instead of LightningQubit. (#70)

  • Add support for CI checks. (#76)

  • Implement improved stopping_condition method, and make Linux wheel builds more performant. (#77)

Bug fixes

  • Fix wheel-builder to pin CUDA version to 11.8 instead of latest. (#83)

  • Pin CMake to 3.24.x in wheel-builder to avoid Python not found error in CMake 3.25. (#75)

  • Fix data copy method in the state() method. (#82)

Contributors

This release contains contributions from (in alphabetical order):

Amintor Dusko, Lee J. O'Riordan, Shuli Shu

Release v0.27.0

14 Nov 19:32

New features since last release

  • Explicit support for qml.SparseHamiltonian using the adjoint gradient method. (#72)

    This support allows users to explicitly make use of qml.SparseHamiltonian in expectation value calculations, and ensures the gradients can be taken efficiently.
    A user can now explicitly decide whether to decompose the Hamiltonian into separate Pauli words, with evaluations happening over multiple GPUs, or to convert the Hamiltonian directly to a sparse representation for evaluation on a single GPU. Depending on the Hamiltonian structure, one method or the other may be more beneficial.

    The workflow for decomposing a Hamiltonian is as follows:

    import pennylane as qml
    from pennylane import numpy as np

    num_wires = 6  # example wire count
    obs_per_gpu = 1
    dev = qml.device("lightning.gpu", wires=num_wires, batch_obs=obs_per_gpu)

    H = sum([0.5*(i+1)*(qml.PauliZ(i)@qml.PauliZ(i+1)) for i in range(0, num_wires-1, 2)])

    @qml.qnode(dev, diff_method="adjoint")
    def circuit(params):
        for i in range(num_wires):
            qml.RX(params[i], i)
        return qml.expval(H)

    For the new qml.SparseHamiltonian support, the above script becomes:

    dev = qml.device("lightning.gpu", wires=num_wires)
    H = sum([0.5*(i+1)*(qml.PauliZ(i)@qml.PauliZ(i+1)) for i in range(0, num_wires-1, 2)])
    H_sparse_matrix = qml.utils.sparse_hamiltonian(H, wires=range(num_wires))
    
    SpH = qml.SparseHamiltonian(H_sparse_matrix, wires=range(num_wires))
    
    @qml.qnode(dev, diff_method="adjoint")
    def circuit(params):
        for i in range(num_wires):
            qml.RX(params[i], i)
        return qml.expval(SpH)
  • Enable building of Python 3.11 wheels and upgrade Python on CI/CD workflows to 3.8. (#71)

Improvements

  • Update LightningGPU device following changes in LightningQubit inheritance from DefaultQubit to QubitDevice. (#74)

Bug fixes

  • Ensure device fallback successfully carries through for 0 CUDA devices. (#67)

  • Fix void data type used in SparseSpMV. (#69)

Contributors

Amintor Dusko, Lee J. O'Riordan, Shuli Shu

Release v0.26.2

20 Oct 13:51

Bug fixes

  • Fix reduction over batched and decomposed Hamiltonians in the adjoint pipeline. (#64)

Contributors

Lee J. O'Riordan

Release v0.26.1

17 Oct 16:49

This is a minor release with an update to how qml.Hamiltonian objects are handled at the C++ layer.

Bug fixes

  • Ensure qml.Hamiltonian is auto-decomposed for the adjoint differentiation pipeline to avoid OOM errors (a short sketch follows). (#62)
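
A minimal illustration of the behaviour (the Hamiltonian and circuit are illustrative, not from the release notes): adjoint differentiation of a qml.Hamiltonian expectation value now proceeds with the device decomposing the Hamiltonian into its terms internally.

import pennylane as qml
from pennylane import numpy as np

n_wires = 4
dev = qml.device("lightning.gpu", wires=n_wires)

# The device decomposes this qml.Hamiltonian into its Pauli terms for the
# adjoint pipeline, avoiding the memory overhead noted above.
H = qml.Hamiltonian([0.5, 1.5], [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(2)])

@qml.qnode(dev, diff_method="adjoint")
def circuit(params):
    for i in range(n_wires):
        qml.RY(params[i], wires=i)
    return qml.expval(H)

grad = qml.grad(circuit)(np.ones(n_wires, requires_grad=True))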

Contributors

Lee J. O'Riordan