-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #302 from Exabyte-io/docs/SOF-7534
SOF-7534: QE GPU tutorial
- Loading branch information
Showing
9 changed files
with
308 additions
and
36 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Git LFS file not shown
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
{ | ||
"descriptionLinks": [ | ||
"Accelerate Quantum ESPRESSO simulation with GPUs: https://docs.mat3ra.com/tutorials/jobs-cli/qe-gpu/" | ||
], | ||
"description": "We walk through a step-by-step example of running a Quantum ESPRESSO job on a GPU enabled node. We see significant performance improvement by using CUDA/GPU-enabled version of Quantum ESPRESSO.", | ||
"tags": [ | ||
{ | ||
"...": "../../metadata/general.json#/tags" | ||
}, | ||
{ | ||
"...": "../../models-directory/dft.json#/tags" | ||
}, | ||
{ | ||
"...": "../../software-directory/modeling/quantum-espresso.json#/tags" | ||
}, | ||
"CUDA", | ||
"GPU", | ||
"NVIDIA" | ||
], | ||
"title": "Mat3ra Tutorial: Accelerate Quantum ESPRESSO simulation with GPUs", | ||
"youTubeCaptions": [ | ||
{ | ||
"text": "Hello, and welcome to the matera tutorial series.", | ||
"startTime": "00:00:00.000", | ||
"endTime": "00:00:03.000" | ||
}, | ||
{ | ||
"text": "In today's tutorial, we will go through a step-by-step example of running a Quantum ESPRESSO simulation on one of our GPU enabled compute nodes.", | ||
"startTime": "00:00:04.000", | ||
"endTime": "00:00:14.000" | ||
}, | ||
{ | ||
"text": "We will see how we can dramatically improve the performance of our simulation using GPUs.", | ||
"startTime": "00:00:15.000", | ||
"endTime": "00:00:20.000" | ||
}, | ||
{ | ||
"text": "At the moment, GPU build of Quantum ESPRESSO is only available via our command line interface, and soon it will be made available in the web interface.", | ||
"startTime": "00:00:21.000", | ||
"endTime": "00:00:30.000" | ||
}, | ||
{ | ||
"text": "Let's connect to the login node using SSH.", | ||
"startTime": "00:00:31.000", | ||
"endTime": "00:00:34.000" | ||
}, | ||
{ | ||
"text": "You can use your terminal application and type S S H, your username at login dot matera dot com and press enter.", | ||
"startTime": "00:00:35.000", | ||
"endTime": "00:00:41.000" | ||
}, | ||
{ | ||
"text": "If you need help on how to set up S S H, please visit our documentation site at docs dot matera dot com, and search S S H.", | ||
"startTime": "00:00:42.000", | ||
"endTime": "00:00:51.000" | ||
}, | ||
{ | ||
"text": "Here you will find step by step guide to setup S S H key for seamless authentication.", | ||
"startTime": "00:00:52.000", | ||
"endTime": "00:00:57.000" | ||
}, | ||
{ | ||
"text": "Note that it is also possible to connect to the login node from our web platform using the web terminal.", | ||
"startTime": "00:00:58.000", | ||
"endTime": "00:01:04.000" | ||
}, | ||
{ | ||
"text": "Besides, <break time='0.5'/> it is also possible to run a command line job via bash workflow in our web platform.", | ||
"startTime": "00:01:05.000", | ||
"endTime": "00:01:12.000" | ||
}, | ||
{ | ||
"text": "Create a new workflow. Select shell script as application.", | ||
"startTime": "00:01:13.000", | ||
"endTime": "00:01:16.000" | ||
}, | ||
{ | ||
"text": "Add an execution unit and write your job script.", | ||
"startTime": "00:01:17.000", | ||
"endTime": "00:01:20.000" | ||
}, | ||
{ | ||
"text": "For now, let's focus on the command line part.", | ||
"startTime": "00:01:22.000", | ||
"endTime": "00:01:24.000" | ||
}, | ||
{ | ||
"text": "The example calculation we are going to demonstrate is available in our github repository C L I job examples.", | ||
"startTime": "00:01:25.000", | ||
"endTime": "00:01:33.000" | ||
}, | ||
{ | ||
"text": "Please browse under espresso, then gpu, where you will find required input and reference output files.", | ||
"startTime": "00:01:34.000", | ||
"endTime": "00:01:39.000" | ||
}, | ||
{ | ||
"text": "Once connected to the login node, let's navigate to your working directory, and clone our example repository.", | ||
"startTime": "00:01:40.000", | ||
"endTime": "00:01:47.000" | ||
}, | ||
{ | ||
"text": "After cloning the repository, we also need to sync the L F S objects with git L F S pull.", | ||
"startTime": "00:01:50.000", | ||
"endTime": "00:01:56.000" | ||
}, | ||
{ | ||
"text": "Let's navigate to our GPU example.", | ||
"startTime": "00:01:57.000", | ||
"endTime": "00:02:00.000" | ||
}, | ||
{ | ||
"text": "Let's examine the P B S job script.", | ||
"startTime": "00:02:03.000", | ||
"endTime": "00:02:05.000" | ||
}, | ||
{ | ||
"text": "We will run our job in GPU enabled G O F queue, we will request one node which has eight CPUs.", | ||
"startTime": "00:02:07.000", | ||
"endTime": "00:02:13.000" | ||
}, | ||
{ | ||
"text": "To run quantum espresso jobs in GPUs, we need to load the CUDA build of quantum espresso.", | ||
"startTime": "00:02:14.000", | ||
"endTime": "00:02:19.000" | ||
}, | ||
{ | ||
"text": "We set eight open M P threads and 1 M P I per GPU.", | ||
"startTime": "00:02:20.000", | ||
"endTime": "00:02:24.000" | ||
}, | ||
{ | ||
"text": "We can also set parallelization options for k point and matrix diagonalization.", | ||
"startTime": "00:02:25.000", | ||
"endTime": "00:02:30.000" | ||
}, | ||
{ | ||
"text": "Finally, we can submit our job with Q sub command. We can find the status of job with Q stat.", | ||
"startTime": "00:02:31.000", | ||
"endTime": "00:02:37.000" | ||
}, | ||
{ | ||
"text": "Once the job is completed, we can examine the output file.", | ||
"startTime": "00:02:38.000", | ||
"endTime": "00:02:41.000" | ||
}, | ||
{ | ||
"text": "We will see that the GPU acceleration was enabled for the calculation.", | ||
"startTime": "00:02:44.000", | ||
"endTime": "00:02:49.000" | ||
}, | ||
{ | ||
"text": "If we scroll to the bottom of the file, we will see the total time taken by the program. The wall time for this job was slightly less than a minute.", | ||
"startTime": "00:02:50.000", | ||
"endTime": "00:02:58.000" | ||
}, | ||
{ | ||
"text": "For comparison, we ran the same job using eight CPUs but without GPU acceleration, <break time='0.5'/> it took about 20 times longer.", | ||
"startTime": "00:03:02.000", | ||
"endTime": "00:03:10.000" | ||
}, | ||
{ | ||
"text": "Now you may test different combination of M P I and open M P threads, different parallelization option, and see what gives you the best performance.", | ||
"startTime": "00:03:11.000", | ||
"endTime": "00:03:20.000" | ||
}, | ||
{ | ||
"text": "Thank you for watching this tutorial and using our platform.", | ||
"startTime": "00:03:21.000", | ||
"endTime": "00:03:24.000" | ||
} | ||
], | ||
"youTubeId": "trLDEwWc3ho" | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
--- | ||
tags: | ||
- GPU | ||
- CUDA | ||
hide: | ||
- tags | ||
--- | ||
# Accelerate Quantum ESPRESSO simulation with GPUs | ||
|
||
We will walk through a step-by-step example of running a Quantum ESPRESSO job on | ||
GPUs. As of the time of writing, the GPU (CUDA) build of Quantum ESPRESSO is | ||
only available via the Command Line Interface (CLI). We will see that we can | ||
dramatically speedup our Quantum ESPRESSO simulation by using GPUs. | ||
|
||
1. First connect to login node via [SSH client](../../remote-connection/ssh.md), | ||
or [web terminal](../../remote-connection/web-terminal.md). Note that it is also | ||
possible to run CLI jobs by creating a [bash workflow]( | ||
../../software-directory/scripting/shell/overview.md). | ||
|
||
 | ||
|
||
2. Example job that we are going to run is available in git repository | ||
[exabyte-io/cli-job-examples](https://github.com/exabyte-io/cli-job-examples). | ||
You may clone the repository to your working directory: | ||
```bash | ||
git clone https://github.com/exabyte-io/cli-job-examples | ||
cd cli-job-examples | ||
git lfs pull | ||
cd espresso/gpu | ||
``` | ||
|
||
3. You will find all required input files and job script under `espresso/gpu`. | ||
Please review the input files and PBS job script, update the project name, and | ||
other parameters as necessary. | ||
|
||
4. We will use [GOF](../../infrastructure/clusters/aws.md#hardware-specifications) | ||
queue, which comprises 8 CPUs and 1 NVIDIA V100 GPU per node. | ||
|
||
5. Since our compute node contains 8 CPUs with 1 GPU, we will run 1 MPI process | ||
with 8 OpenMP threads. | ||
```bash | ||
module load espresso/7.4-cuda-12.4-cc-70 | ||
export OMP_NUM_THREADS=8 | ||
mpirun -np 1 pw.x -npool 1 -ndiag 1 -in pw.cuo.scf.in > pw.cuo.gpu.scf.out | ||
``` | ||
|
||
6. Finally, we can submit our job using: | ||
```bash | ||
qsub job.gpu.pbs | ||
``` | ||
|
||
7. Once, the job is completed, we can inspect the output file `pw.cuo.gpu.scf.out`. | ||
We will see that GPU was used, and the job took about 1 minute wall time. | ||
``` | ||
Parallel version (MPI & OpenMP), running on 8 processor cores | ||
Number of MPI processes: 1 | ||
Threads/MPI process: 8 | ||
... | ||
GPU acceleration is ACTIVE. 1 visible GPUs per MPI rank | ||
GPU-aware MPI enabled | ||
... | ||
Parallel routines | ||
PWSCF : 37.94s CPU 50.77s WALL | ||
``` | ||
|
||
8. For comparison, we ran the same calculation using only CPUs, and it took | ||
about 20 times longer. | ||
``` | ||
Parallel version (MPI), running on 8 processors | ||
MPI processes distributed on 1 nodes | ||
... | ||
Parallel routines | ||
PWSCF : 18m 0.56s CPU 18m25.33s WALL | ||
``` | ||
|
||
You may experiment different combinations of MPI and OpenMP, various | ||
[parallelization options](https://www.quantum-espresso.org/Doc/user_guide/node20.html), | ||
and find what gives you the best performance. | ||
|
||
## Step-by-step screenshare video | ||
|
||
<div class="video-wrapper"> | ||
<iframe class="gifffer" width="100%" height="100%" src="https://www.youtube.com/embed/trLDEwWc3ho" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> | ||
</div> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,5 +3,5 @@ | |
publish = "site/" | ||
|
||
[build.environment] | ||
PYTHON_VERSION = "3.8" | ||
PYTHON_VERSION = "3.10" | ||
NODE_VERSION = "20" |
Oops, something went wrong.