Fix GPU backend bugs, Python analysis scripts, and test inputs for GPU CI #6801
base: development
Changes from all commits: 559a102, 8bea0f6, f6f3750, 78312b6, 062ddba, e0fe780, db18dcf, 662e18a, 7bb079e, 8229901, b37c23a, 900cd6e
```diff
@@ -76,6 +76,7 @@ def main(args):
             OpenPMDTimeSeries(args.path)
         except Exception:
             print("Could not open the file as a plotfile or an openPMD time series")
+            args.format = "openpmd"
```
Member: Why this assignment?

Member: Moreover, `analysis_default_regression.py` here should be just a link to https://github.com/BLAST-WarpX/warpx/blob/development/Examples/analysis_default_regression.py. Fixing the code within the test directory won't work. If it is not a link but a hard copy, that should be fixed in the first place.

Member: Looks like the script here changes `args.format`. Would you like to simplify the format detection logic? @EZoni

Member: The link issue is fixed in #6810. Further fixes to the underlying code, if needed, should be done in the original file.

Member: #6810 and #6811 fix the link issues, but coming back to the original suggestion, @ax3l and @lucafedeli88, I think I don't understand it. The current code is warpx/Examples/analysis_default_regression.py, lines 69 to 80 at ad2c5b9, so an error (…)

Member: What is it that does not work with this and/or what is it that you would like to change? Simply adding a (…)

Member: Here's my attempt at an interpretation, let me know if this is what you had in mind: #6812.

Contributor (Author): Without this change, `args.format` remains unset if the inner exception is thrown, correct? Subsequent usage of `args.format` could be problematic. Something that exits before `main()` is invoked would avoid passing down an unset `args.format` (e.g., remove the extra `else:` in the inner `try` block and replace it with `sys.exit(1)` as suggested). If all the code is robust against, or agnostic to, an unset value, then it doesn't matter and the code could stay as it was before this commit. @EZoni, your fix in #6812 handles it the best way.
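For context, a minimal sketch of the exit-early variant the author describes, assuming the nested try/except probing shown in analysis_default_regression.py; the function name `detect_format` is illustrative, not from the PR:

```python
import sys

import yt
from openpmd_viewer import OpenPMDTimeSeries


def detect_format(path):
    """Return 'plotfile' or 'openpmd'; exit before main() sees an unset format."""
    try:
        yt.load(path)  # probe: does it open as an AMReX plotfile?
    except Exception:
        try:
            OpenPMDTimeSeries(path)  # probe: does it open as an openPMD series?
        except Exception:
            print("Could not open the file as a plotfile or an openPMD time series")
            sys.exit(1)  # fail fast instead of leaving args.format unset
        return "openpmd"
    return "plotfile"
```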
```diff
         else:
             args.format = "openpmd"
     else:
```
```diff
@@ -26,15 +26,15 @@
 def find_first_non_zero_from_bottom_left(matrix):
     for i in range(matrix.shape[0]):
         for j in range(matrix.shape[1]):
-            if (matrix[i][j] != 0) and (matrix[i][j] != np.nan):
+            if (matrix[i][j] != 0) and (not np.isnan(matrix[i][j])):
                 return (i, j)
```
Comment on lines +29 to 30

Member: Good fix.

Member: Isolated for faster review & merge into #6807.
```diff
     return i, j


 def find_first_non_zero_from_upper_right(matrix):
     for i in range(matrix.shape[0] - 1, -1, -1):
         for j in range(matrix.shape[1] - 1, -1, -1):
-            if (matrix[i][j] != 0) and (matrix[i][j] != np.nan):
+            if (matrix[i][j] != 0) and (not np.isnan(matrix[i][j])):
                 return (i, j)
     return i, j
```
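The bug fixed here is the classic NaN comparison pitfall: under IEEE 754, NaN compares unequal to everything, including itself, so `matrix[i][j] != np.nan` is always True and never filters anything. A standalone illustration (my own example, not from the PR):

```python
import numpy as np

x = float("nan")
print(x != np.nan)   # True: NaN is unequal to everything, even NaN
print(x == np.nan)   # False, for the same reason
print(np.isnan(x))   # True: the correct NaN test, as used in the fix
```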
```diff
@@ -15,7 +15,7 @@ my_constants.dt = 0.1/wpe # s
 #################################
 max_step = 20
 amr.n_cell = 40 40
-amr.max_grid_size = 8
+amr.max_grid_size = 16
```
Member: Looks a bit random: why did you increase this but not the blocking factor, too?

Member: It was explained here: #6801 (comment).

Member: Thanks, saw the PR description now as well. I think we probably want to increase max_grid_size and blocking_factor for the tests that throw warnings.
```diff
 amr.blocking_factor = 8
 amr.max_level = 0
 geometry.dims = 2
```
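For context on the change, a rough model of AMReX's grid chopping (my own arithmetic, not from the PR): each direction is split into pieces no larger than amr.max_grid_size, with each piece a multiple of amr.blocking_factor, so with amr.n_cell = 40 40 the old value yields 25 small boxes while the new one yields 9 larger ones (40 = 16 + 16 + 8, each piece still a multiple of 8), reducing per-box overhead on GPUs:

```python
import math

n_cell = 40  # amr.n_cell per direction
for max_grid_size in (8, 16):
    # Each direction is chopped into at most max_grid_size-cell pieces
    boxes_per_dim = math.ceil(n_cell / max_grid_size)
    print(f"max_grid_size={max_grid_size}: "
          f"{boxes_per_dim} x {boxes_per_dim} = {boxes_per_dim**2} boxes in 2D")
# max_grid_size=8:  5 x 5 = 25 boxes
# max_grid_size=16: 3 x 3 = 9 boxes
```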
```diff
@@ -29,7 +29,7 @@ algo.particle_shape = 1
 algo.current_deposition = direct

 particles.species_names = beam
-particles.do_tiling = 1
+particles.do_tiling = 0
```
Member: Looks a bit random: why changed?

Contributor (Author): Without this change, test_3d_magnetostatic_eb_run fails on the AMREX_ALWAYS_ASSERT below when run on a GPU. From the run without changing the input parameter: (…)

Member: Ouch, yes! Fix in #6809.
```diff
 beam.mass = m_e
 beam.charge = -q_e
 beam.injection_style = nuniformpercell
```
```diff
@@ -49,18 +49,54 @@ def check_restart(filename, tolerance=1e-12):
         dims=ds_benchmark.domain_dimensions,
     )

-    # Loop over all fields (all particle species, all particle attributes, all grid fields)
-    # and compare output data generated from initial run with output data generated after restart
+    # Separate grid fields from particle fields. Particle fields use the
+    # species name as field type; grid fields use 'boxlib'.
+    particle_species = set()
+    grid_fields = []
+    for field in ds_benchmark.field_list:
+        ftype, fname = field
+        if ftype == "boxlib":
+            grid_fields.append(field)
+        elif ftype != "all":
+            particle_species.add(ftype)

     print(f"\ntolerance = {tolerance}")
     print()
-    for field in ds_benchmark.field_list:
+    # Compare grid fields directly (order is deterministic)
+    for field in grid_fields:
         dr = ad_restart[field].squeeze().v
         db = ad_benchmark[field].squeeze().v
         error = np.amax(np.abs(dr - db))
         if np.amax(np.abs(db)) != 0.0:
             error /= np.amax(np.abs(db))
         print(f"field: {field}; error = {error}")
         assert error < tolerance

+    # Compare particle fields sorted by (particle_cpu, particle_id), since
+    # Redistribute() after checkpoint-restart may reorder particles across
+    # tiles/ranks. The (cpu, id) pair is the unique particle key in AMReX.
+    for species in sorted(particle_species):
+        species_fields = [f for f in ds_benchmark.field_list if f[0] == species]
```
Comment on lines +76 to +81

Member: Trying to understand what this patch improves: the stability of checksums?

Contributor (Author): Yes.
```diff
+        id_r = np.atleast_1d(ad_restart[(species, "particle_id")].squeeze().v)
+        id_b = np.atleast_1d(ad_benchmark[(species, "particle_id")].squeeze().v)
+        cpu_r = np.atleast_1d(ad_restart[(species, "particle_cpu")].squeeze().v)
+        cpu_b = np.atleast_1d(ad_benchmark[(species, "particle_cpu")].squeeze().v)
+
+        sort_r = np.lexsort((id_r, cpu_r))
+        sort_b = np.lexsort((id_b, cpu_b))
+
+        for field in species_fields:
+            if field[1] in ("particle_id", "particle_cpu"):
+                continue
+            dr = np.atleast_1d(ad_restart[field].squeeze().v)[sort_r]
+            db = np.atleast_1d(ad_benchmark[field].squeeze().v)[sort_b]
+            error = np.amax(np.abs(dr - db))
+            if np.amax(np.abs(db)) != 0.0:
+                error /= np.amax(np.abs(db))
+            print(f"field: {field}; error = {error}")
+            assert error < tolerance
+        print()
```
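The ordering trick above relies on np.lexsort, which sorts by its last key first; passing `(id, cpu)` therefore sorts primarily by particle_cpu and secondarily by particle_id, putting both datasets into the same canonical order no matter how Redistribute() shuffled them. A tiny standalone check (made-up values):

```python
import numpy as np

# The same three particles, stored in different orders by the two runs
cpu_a, id_a = np.array([1, 0, 0]), np.array([7, 3, 5])
cpu_b, id_b = np.array([0, 0, 1]), np.array([5, 3, 7])

order_a = np.lexsort((id_a, cpu_a))  # primary key: cpu, secondary: id
order_b = np.lexsort((id_b, cpu_b))

# After sorting, the (cpu, id) pairs line up element by element
assert (cpu_a[order_a] == cpu_b[order_b]).all()
assert (id_a[order_a] == id_b[order_b]).all()
pairs = [(int(c), int(i)) for c, i in zip(cpu_a[order_a], id_a[order_a])]
print(pairs)  # [(0, 3), (0, 5), (1, 7)]
```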
```diff
@@ -71,6 +71,17 @@ def load_library(self):
                 "Please write separate scripts for each geometry."
             )

+        # Import mpi4py before the pyAMReX (amrex.space*d) shared library is
+        # loaded so that mpi4py calls MPI_Init_thread first. With the Cray
+        # MPICH on Polaris/Sirius, if the AMReX shared library is loaded before
```
Comment on lines +74 to +76

Member: We need to talk about this. Generally we support both: either AMReX-initialized MPI or externally initialized MPI (e.g., …). We need to see your reproducer where you had issues loading, so we can understand this better. Your patch here disables one of the two modes we support.

Contributor (Author): The issue with ordering of (…)
```diff
+        # mpi4py initializes MPI, mpi4py's later MPI_Init_thread conflicts with
+        # the already-loaded MPI symbols and causes hangs or unbounded memory
+        # allocation during amrex::Initialize().
+        try:
+            from mpi4py import MPI  # noqa: F811,F401
+        except ImportError:
+            pass  # mpi4py is optional; MPI_Init handled by AMReX if absent

         # --- Use geometry to determine whether to import the 1D, 2D, 3D or RZ version.
         # --- The geometry must be setup before the lib warpx shared object can be loaded.
         try:
```
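A minimal standalone sketch of the import-order rule this hunk enforces (the module names are real pyAMReX/mpi4py ones; whether the reverse order actually hangs depends on the MPI stack, per the Cray MPICH report above):

```python
# Import mpi4py first so MPI_Init_thread runs before any MPI-linked
# shared library (here pyAMReX) is loaded.
from mpi4py import MPI       # initializes MPI on import

import amrex.space3d as amr  # the AMReX shared library loads afterwards

amr.initialize([])
print(f"rank {MPI.COMM_WORLD.rank} of {MPI.COMM_WORLD.size} initialized")
amr.finalize()
```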
```diff
@@ -147,7 +158,7 @@ def load_library(self):
         register_warpx_WarpXParticleContainer_extension(self.libwarpx_so)

     def amrex_init(self, argv, mpi_comm=None):
-        if mpi_comm is None:  # or MPI is None:
+        if mpi_comm is None:
             self.libwarpx_so.amrex_init(argv)
         else:
             raise Exception("mpi_comm argument not yet supported")
```
```diff
@@ -172,9 +183,28 @@ def finalize(self, finalize_mpi=1):
         """
         # TODO: simplify, part of pyAMReX already
         if self.initialized:
+            # GPU finalization workaround: on GPU backends, destroying the C++
+            # WarpX object (del self.warpx) or calling amrex_finalize() can
+            # crash in SYCL/CUDA/HIP Device::Finalize() when it tries to
+            # synchronize streams whose backing static objects are already gone.
```
Comment on lines +187 to +189

Member: @zippylab, reproducer/example needed, please. Not clear when this happens. This patch looks like a bandage. cc @WeiqunZhang in case this general (…)

Member: The PR description points to issues in: (…). This could be from external streams, maybe initialized by either external MPI or by another lib not shown in the reproducer, e.g., cupy/dpnp…? A reproducer would be super helpful, because clearly the patch here cannot be merged. Moved upstream to #5386.

Contributor (Author): Thanks for putting in the AMReX issue, @ax3l. I'll find one of the example reproducers from my notes.
```diff
+            # Exit immediately via os._exit(0) BEFORE any C++ destructors run,
+            # and let the job launcher (PALS/mpiexec) handle cleanup.
+            try:
+                if self.libwarpx_so.Config.gpu_backend in ("SYCL", "CUDA", "HIP"):
+                    try:
+                        from mpi4py import MPI
+
+                        MPI.COMM_WORLD.Barrier()
```
Member: So your strategy here is to call MPI Barrier, see that it crashes and (…)

Member: Codex comment: (…)
```diff
+                    except Exception:
+                        pass  # best-effort barrier; proceeding to _exit
+                    os._exit(0)
+            except Exception:
+                pass  # Config may not be available; fall through to normal cleanup

             del self.warpx
             # The call to warpx_finalize causes a crash - don't know why
             # self.libwarpx_so.warpx_finalize()

             self.libwarpx_so.amrex_finalize()

             from pywarpx import callbacks
```
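Since the reviewers ask for a reproducer that is not yet attached, here is a guess at its likely shape (a GPU build of WarpX with mpi4py present and an ordinary teardown); this is illustrative only, not the author's actual script, and the PICMI parameters are arbitrary:

```python
from mpi4py import MPI  # noqa: F401  -- ensure mpi4py initializes MPI first

from pywarpx import picmi

grid = picmi.Cartesian2DGrid(
    number_of_cells=[32, 32],
    lower_bound=[0.0, 0.0],
    upper_bound=[1.0, 1.0],
    lower_boundary_conditions=["periodic", "periodic"],
    upper_boundary_conditions=["periodic", "periodic"],
)
sim = picmi.Simulation(
    solver=picmi.ElectromagneticSolver(grid=grid, cfl=0.9),
    max_steps=10,
)
sim.step()
# If the diagnosis above is right, the crash would appear after this point,
# during cleanup in amrex::Gpu::Device::Finalize() on SYCL/CUDA/HIP builds.
```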
```diff
@@ -593,6 +593,10 @@ for (const auto & particle_diag : particle_diags) {
         particlesConvertUnits(ConvertDirection::SI_to_WarpX, pc, mass);
     }

+    // On GPU backends, copyParticles may be asynchronous; synchronize before
+    // passing pinned-memory pointers to openPMD's storeChunkRaw.
+    amrex::Gpu::streamSynchronize();
```
Comment on lines +596 to +598

Member: Sorry Claude, it may not be asynchronous; if you had taken the step to check the header instead of guessing, please see: (…)

Member: Can you please provide a minimal reproducer that failed? All the ops above/below are ParIter loops that always auto-synchronize, so I cannot spot an async race condition here. But there might be something we overlook.
```diff
     // Gather the electrostatic potential (phi) on the macroparticles
     if ( particle_diag.m_plot_phi ) {
         storePhiOnParticles( tmp, WarpX::electrostatic_solver_id, !use_pinned_pc );
```
```diff
@@ -14,13 +14,31 @@
 #   include "Particles/ElementaryProcess/QEDInternals/QuantumSyncEngineWrapper.H"
 #endif

+#include <AMReX_GpuAllocators.H>
 #include <AMReX_GpuContainers.H>
 #include <AMReX_REAL.H>

 #include <cmath>
 #include <map>
 #include <string>

+namespace ParticleCreation {
+    /** Whether particle data with allocator Alloc lives on the GPU.
+     *  amrex::RunOnGpu is not specialised for PolymorphicArenaAllocator,
+     *  so we extend the check here. PolymorphicArenaAllocator in WarpX
+     *  always wraps a device arena, so it is safe to treat it as GPU. */
+    template <typename Alloc>
+    constexpr bool particles_on_gpu ()
+    {
+#ifdef AMREX_USE_GPU
+        return amrex::RunOnGpu<Alloc>::value
+            || amrex::IsPolymorphicArenaAllocator<Alloc>::value;
+#else
+        return false;
+#endif
+    }
+} // namespace ParticleCreation

 /**
  * \brief This set of initialization policies describes what happens
  * when we need to create a new particle due to an elementary process.
```
```diff
@@ -158,7 +176,7 @@ void DefaultInitializeRuntimeAttributes (PTile& ptile,
             const QuantumSynchrotronGetOpticalDepth quantum_sync_get_opt =
                 p_qs_engine->build_optical_depth_functor();
             // If the particle tile was allocated in a memory pool that can run on GPU, launch GPU kernel
-            if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<amrex::Real>>::value) {
+            if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<amrex::Real>>()) {
```
Member: @atmyers @WeiqunZhang, this could be a legit bug from the transition to the polymorphic PC, do I see that right? But generally speaking, this cannot be an `if constexpr`: the patch now treats every (…). This pattern is very likely used more often in the code base, both in WarpX (5x in init) and in AMReX.

Member: Right. If it is polymorphic, the check can only be done at runtime with (…)

Member: Fix isolated in #6808.
```diff
                 amrex::ParallelForRNG(stop - start,
                 [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept {
                     const int ip = i + start;

@@ -185,7 +203,7 @@ void DefaultInitializeRuntimeAttributes (PTile& ptile,
             const BreitWheelerGetOpticalDepth breit_wheeler_get_opt =
                 p_bw_engine->build_optical_depth_functor();;
             // If the particle tile was allocated in a memory pool that can run on GPU, launch GPU kernel
-            if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<amrex::Real>>::value) {
+            if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<amrex::Real>>()) {
                 amrex::ParallelForRNG(stop - start,
                 [=] AMREX_GPU_DEVICE (int i, amrex::RandomEngine const& engine) noexcept {
                     const int ip = i + start;

@@ -214,7 +232,7 @@ void DefaultInitializeRuntimeAttributes (PTile& ptile,
             const amrex::ParserExecutor<7> user_real_attrib_parserexec =
                 user_real_attrib_parser[ia]->compile<7>();
             // If the particle tile was allocated in a memory pool that can run on GPU, launch GPU kernel
-            if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<amrex::Real>>::value) {
+            if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<amrex::Real>>()) {
                 amrex::ParallelFor(stop - start,
                 [=] AMREX_GPU_DEVICE (int i) noexcept {
                     const int ip = i + start;

@@ -246,7 +264,7 @@ void DefaultInitializeRuntimeAttributes (PTile& ptile,
             if (it_ioniz != particle_icomps.end() &&
                 std::distance(particle_icomps.begin(), it_ioniz) == j)
             {
-                if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<int>>::value) {
+                if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<int>>()) {
                     amrex::ParallelFor(stop - start,
                     [=] AMREX_GPU_DEVICE (int i) noexcept {
                         const int ip = i + start;

@@ -268,7 +286,7 @@ void DefaultInitializeRuntimeAttributes (PTile& ptile,
             {
                 const amrex::ParserExecutor<7> user_int_attrib_parserexec =
                     user_int_attrib_parser[ia]->compile<7>();
-                if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<int>>::value) {
+                if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<int>>()) {
                     amrex::ParallelFor(stop - start,
                     [=] AMREX_GPU_DEVICE (int i) noexcept {
                         const int ip = i + start;
```
Good-ish, but should be generalized for both tests to use AMREX_DEFAULT_INIT (AMReX-Codes/amrex#4947).