Fix GPU backend bugs, Python analysis scripts, and test inputs for GPU CI #6801

zippylab wants to merge 12 commits into BLAST-WarpX:development from
Conversation
amrex::RunOnGpu<PolymorphicArenaAllocator> is false_type because AMReX does not specialize RunOnGpu for this allocator. This causes DefaultInitializeRuntimeAttributes to take the CPU code path even when particle data lives on device, leading to segfaults when ionization or user-defined runtime attributes are used with SYCL (and potentially other GPU backends that don't have UVA). Add ParticleCreation::particles_on_gpu<Alloc>() helper that checks both RunOnGpu and IsPolymorphicArenaAllocator, and use it at all 5 dispatch sites. This fixes test_2d_ionization_picmi and any other test that creates particles via ionization or with parser-evaluated attributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for running PICMI (Python) tests on SYCL/Intel PVC:

1. CMakeLists: Set AMREX_THE_ARENA_INIT_SIZE=0 for Python tests so GPU memory is allocated on-demand, matching native test behavior.
2. WarpXOpenPMD: Add streamSynchronize() after copyParticles and before openPMD storeChunkRaw, since the copy may be asynchronous on GPU backends.
3. _libwarpx.py: On SYCL, call os._exit(0) before amrex_finalize() to avoid a crash in Device::Finalize() when C++ static destructors have already run before the Python atexit handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set args.format = "openpmd" in the exception handler so downstream code knows the intended format even when OpenPMDTimeSeries() fails to open the file. Without this, args.format is unset and the analysis script crashes with an AttributeError. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Increase amr.max_grid_size from 8 to 16 in the theta-implicit JFNK Van der Burg tests, and set amr.max_grid_size = 128 in the multi-level PSATD langmuir test. These avoid grid decomposition issues observed on Aurora with SYCL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _libwarpx.py: import mpi4py before amrex_init so MPI_Init_thread runs first; with the Cray MPICH version on Polaris, AMReX calling MPI_Init before mpi4py causes mpi4py COMM_WORLD collectives to hang
- _libwarpx.py: extend os._exit(0) finalization workaround to all GPU backends (CUDA/HIP/SYCL), not just SYCL; remove unnecessary Barrier
- analysis_default_restart.py: sort particles by (cpu, id) before comparing restart vs benchmark, since Redistribute after checkpoint restart may reorder particles across tiles/ranks
- magnetostatic_eb: disable particle tiling to avoid ParticleTile arena allocation issue on GPU

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
np.squeeze() on a single-element array produces a 0-d scalar, which can't be indexed with sort indices. Wrap with np.atleast_1d() to keep particle data as 1-d arrays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
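For context, a quick demonstration of the 0-d pitfall (standalone snippet, not from the script):

```python
import numpy as np

z = np.squeeze(np.array([4.2]))  # single-element input -> 0-d array, shape ()
idx = np.argsort(np.array([0]))  # sort indices for a one-particle species
# z[idx] would raise IndexError: a 0-d array cannot be indexed
z = np.atleast_1d(z)             # shape (1,); indexing with idx now works
print(z[idx])                    # [4.2]
```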
…explosion

The previous mpi4py fix (693bc3f) imported mpi4py in amrex_init() so that MPI_Init_thread would be called before amrex::Initialize(). This fixed Cray MPICH hangs in the langmuir tests, but caused unbounded memory allocation (~340 MB/s) in any PICMI test that uses Python callbacks (e.g. LoadInitialFieldFromPython).

Root cause: in the PICMI flow, load_library() is triggered early during initialize_inputs() (via callback registration accessing libwarpx_so), which imports the pyAMReX shared library (amrex.space*d). When amrex_init() later imports mpi4py, its MPI_Init_thread conflicts with MPI symbols already loaded by pyAMReX, causing amrex::Initialize() to enter an infinite allocation loop.

Fix: move the mpi4py import to the top of load_library(), before the amrex.space*d import. Since amrex_init() accesses self.libwarpx_so (which triggers load_library() via __getattr__), mpi4py is always imported before any pyAMReX code loads, regardless of whether the PICMI flow or the direct flow is used.

Verified on Sirius (A100, Cray MPICH 8.1.28):
- test_rz_psatd_eigenmode (was hanging/OOM): passes
- test_rz_langmuir_multi_picmi (original hang fix): still passes
- Full 975-test CI suite: no new run failures

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The find_first_non_zero_from_bottom_left/upper_right helper functions used ``matrix[i][j] != np.nan`` to skip NaN values. Per IEEE 754, NaN != NaN is always True, so this check never filtered out NaN cells. On GPU platforms where openpmd outputs NaN for cells outside the refined patch (rather than zero), the analysis selected NaN regions as valid data, producing ``nan`` errors for level 1 and failing the assertion. Fix: replace ``!= np.nan`` with ``not np.isnan()``. With the fix, level 1 errors on A100 GPU are 0.029% (phi) and 0.083% (Er), well within the 0.4% tolerance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move os._exit(0) before del self.warpx so the Python atexit handler exits immediately on GPU backends (SYCL/CUDA/HIP) without triggering any C++ destructors. The previous ordering called del self.warpx first, which could crash in Device::Finalize() when static objects backing the SYCL/CUDA streams were already destroyed. Restore MPI Barrier before os._exit(0) to ensure coordinated multi-rank shutdown — without it, the job launcher may kill straggler ranks with a signal before they finish flushing output. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t should go away with an upstream AMReX fix
There are a few tests where I changed input parameters, so the checksums are failing. A representative example from the complete set of failing tests:

82 - test_2d_theta_implicit_jfnk_vandb.checksum (Failed)

I wasn't sure whether I should update the checksums. Willing to help with that if appropriate.
Hi Timothy @zippylab,

Thank you very much for your PR! Since this is your first PR and it does a bunch of unrelated updates, please consider our contribution guidelines and our LLM-assisted development guidelines.

Am I correct that this is an "all the things I/Codex/Claude/Cursor found that are buggy when I tried running WarpX on Aurora" PR? (We would usually split totally separate fixes into separate PRs, so we can more easily review and patch/port/cherry-pick them...)
```diff
             OpenPMDTimeSeries(args.path)
         except Exception:
             print("Could not open the file as a plotfile or an openPMD time series")
+            args.format = "openpmd"
```
Why this assignment?
All file format checks have failed at this point, no?
Moreover, analysis_default_regression.py here should be just a link to https://github.com/BLAST-WarpX/warpx/blob/development/Examples/analysis_default_regression.py. Fixing the code within the test directory won't work. If it is not a link, but a hard copy, that should be fixed in the first place.
looks like the script here changes the args.rtol though for restart... Needs some generalization.
Would you like to simplify the format detection logic? @EZoni
@lucafedeli88 suggests:
I would simply change the logic and exit with an error message if the script can't read the output file.
The link issue is fixed in #6810. Further fixes to the underlying code, if needed, should be done in the original file.
#6810 and #6811 fix the link issues, but coming back to the original suggestion, @ax3l and @lucafedeli88,
Would you like to simplify the format detection logic? @EZoni

@lucafedeli88 suggests: I would simply change the logic and exit with an error message if the script can't read the output file.
I think I don't understand it.

The current code is

warpx/Examples/analysis_default_regression.py, lines 69 to 80 in ad2c5b9

so an error ("Could not open the file as a plotfile or an openPMD time series") is already raised as an Exception if the code can't read the output file:

- It first tries to open `args.path` using `yt.load(args.path)`.
- If that succeeds, it sets `args.format = "plotfile"` via the `else` clause.
- If that fails, it falls into the outer `except` and tries `OpenPMDTimeSeries(args.path)` instead.
- If that succeeds, it sets `args.format = "openpmd"`.
- If that also fails, it prints an error message and `args.format` is never set.
What is it that does not work with this and/or what is it that you would like to change? Simply adding a sys.exit(1) after the error message?
Here's my attempt of interpretation, let me know if this is what you had in mind: #6812.
Without this change, args.format remains unset if the inner exception is thrown, correct? Subsequent usage of args.format could be problematic. Something that exits before main() is invoked would avoid passing down an unset args.format. (E.g., remove the extra else: in the inner try block and replace it with sys.exit(1) as suggested.) If all the code is robust against/agnostic to an unset value, then it doesn't matter and the code could stay as it was before this commit. @EZoni, your fix in #6812 handles it the best way.
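For reference, here is a minimal sketch of the fail-fast variant discussed in this thread (the structure follows the snippet quoted above; `detect_format` is an illustrative name, not the script's actual function):

```python
import sys

def detect_format(args):
    """Set args.format, or exit instead of leaving it unset."""
    try:
        import yt
        yt.load(args.path)                    # try plotfile first
    except Exception:
        try:
            from openpmd_viewer import OpenPMDTimeSeries
            OpenPMDTimeSeries(args.path)      # then try openPMD
        except Exception:
            print("Could not open the file as a plotfile or an openPMD time series")
            sys.exit(1)                       # fail fast; args.format is never left unset
        else:
            args.format = "openpmd"
    else:
        args.format = "plotfile"
```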
```python
                if (matrix[i][j] != 0) and (not np.isnan(matrix[i][j])):
                    return (i, j)
```
Good fix. np.nan != np.nan is always True, so the old code was buggy. (NaN is never equal to anything, including itself.)
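A quick interpreter check of the IEEE 754 behavior in question:

```python
import numpy as np

print(np.nan != np.nan)   # True: NaN compares unequal to everything, itself included
print(np.nan == np.nan)   # False
print(np.isnan(np.nan))   # True: the reliable way to test for NaN
```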
```diff
 max_step = 20
 amr.n_cell = 40 40
-amr.max_grid_size = 8
+amr.max_grid_size = 16
```
Looks a bit random: Why did you increase this but not blocking factor, too?
Thanks, saw also PR description now.
Adjust test inputs for GPU compatibility (tiling assertion, grid decomposition warnings)
I think we probably want to increase max_grid_size and blocking factor for the tests that throw warnings.
```diff
 particles.species_names = beam
-particles.do_tiling = 1
+particles.do_tiling = 0
```
Without this change, test_3d_magnetostatic_eb.run fails on the AMREX_ALWAYS_ASSERT below when run on a GPU:
```cpp
1358 // The GPU implementation of Redistribute
1359 //
1360 template <typename ParticleType, int NArrayReal, int NArrayInt,
1361           template<class> class Allocator, class CellAssignor>
1362 void
1363 ParticleContainer_impl<ParticleType, NArrayReal, NArrayInt, Allocator, CellAssignor>
1364 ::RedistributeGPU (int lev_min, int lev_max, int nGrow, int local, bool remove_negative)
1365 {
1366 #ifdef AMREX_USE_GPU
1367
1368     if (local) { AMREX_ASSERT(numParticlesOutOfRange(*this, lev_min, lev_max, local) == 0); }
1369
1370     // sanity check
1371     AMREX_ALWAYS_ASSERT(do_tiling == false);
```
From the run without changing the input parameter:
```
 8/55 Test #503: test_3d_magnetostatic_eb.run .....................................***Failed    2.35 sec
Initializing AMReX (26.04-68-g7e9ce72d229c)...
MPI initialized with 1 MPI processes
MPI initialized with thread support level 3
Initializing SYCL...
Multiple GPUs are visible to each MPI rank. This is usually not an issue. But this may lead to incorrect or suboptimal rank-to-GPU mapping.!
SYCL initialized with 1 device.
AMReX (26.04-68-g7e9ce72d229c) initialized
PICSAR (25.06)
WarpX (26.03-105-g18012c92d2bb)
[WarpX ASCII-art banner]
Level 0: dt = 1e-12 ; dx = 0.0078125 ; dy = 0.0078125 ; dz = 0.015625
terminate called after throwing an instance of 'std::runtime_error'
  what():  Assertion `do_tiling == false' failed, file "/home/zippy/src/warpx/build_RelWithDebInfo/_deps/fetchedamrex-src/Src/Particle/AMReX_ParticleContainerI.H", line 1371
SIGABRT
See Backtrace.0 file for details
Abort(6) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 6) - process 0
x1922c6s0b0n0: rank 0 exited with code 6
```
```diff
 # FIXME Use helper function to handle Windows exceptions
 set_property(TEST ${name}.run APPEND PROPERTY ENVIRONMENT "PYTHONPATH=$ENV{PYTHONPATH}:${CMAKE_PYTHON_OUTPUT_DIRECTORY}")
+# allocate GPU memory on-demand for Python tests too (same reason as native tests below)
+set_property(TEST ${name}.run APPEND PROPERTY ENVIRONMENT "AMREX_THE_ARENA_INIT_SIZE=0")
```
Good-ish, but should be generalized for both tests to use: AMREX_DEFAULT_INIT
```python
    # Import mpi4py before the pyAMReX (amrex.space*d) shared library is
    # loaded so that mpi4py calls MPI_Init_thread first. With the Cray
    # MPICH on Polaris/Sirius, if the AMReX shared library is loaded before
```
We need to talk about this.
Generally we support both modes: AMReX-initialized MPI and externally initialized MPI (e.g., via mpi4py).
We need to see your reproducer where you had issues loading, so we can understand this better.
The problem does not appear for us on other Cray MPICH systems like Perlmutter (NERSC) or Frontier (OLCF) or Tuolumne (LLNL) or Adastra.
Your patch here disables one of the two modes we support.
The issue with the ordering of import mpi4py is Polaris-specific. Polaris has a different Cray MPICH version and other differences in main software versions w.r.t. Perlmutter, and we've had multiple user issues reported that required re-ordering import statements like this. Since it breaks the other mode, though, it clearly needs a different approach. I'll break this out into an issue, with the test that demonstrates it identified.
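For illustration, the ordering the patch enforces boils down to this (a minimal sketch; `amrex.space3d` stands in for whichever `amrex.space*d` module is loaded):

```python
# Order matters on the affected Cray MPICH stacks:
try:
    from mpi4py import MPI  # noqa: F401 -- runs MPI_Init_thread first
except ImportError:
    pass                    # no-op when mpi4py is not installed

import amrex.space3d        # pyAMReX shared library loads after MPI is up
```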
```python
    # WarpX object (del self.warpx) or calling amrex_finalize() can
    # crash in SYCL/CUDA/HIP Device::Finalize() when it tries to
    # synchronize streams whose backing static objects are already gone.
```
@zippylab reproducer/example needed please. Not clear when this happens. This patch looks like a bandage.
cc @WeiqunZhang in case this general MPI_Barrier attempt here appears useful for amrex::Finalize()...
PR description points to issues in:
On all GPU backends, the Python atexit finalization can crash in amrex::Gpu::Device::Finalize() when it synchronizes streams whose backing C++ static objects (external_stream_stack) have already been destroyed.
This could be from external streams maybe initialized by either external MPI or by another lib not shown as reproducer, e.g., cupy/dpnp ...? Reproducer would be super helpful, because clearly the patch here cannot be merged.
This is an aggressive workaround — it bypasses all C++ cleanup and relies on the job launcher for process cleanup. The proper fix requires changes in AMReX to address the static destructor ordering in Gpu::Device. Without this workaround, PICMI tests with large particle counts (e.g., test_2d_ohm_solver_landau_damping_picmi) are marked FAILED despite completing the simulation correctly, because ctest sees the crash exit code.
Moved upstream to #5386
Thanks for putting in the AMReX Issue, @ax3l . I'll find one of the example reproducers from my notes.
```cpp
    // On GPU backends, copyParticles may be asynchronous; synchronize before
    // passing pinned-memory pointers to openPMD's storeChunkRaw.
    amrex::Gpu::streamSynchronize();
```
Sorry Claude, it may not be asynchronous, if you had taken the step to check the header instead of guessing:
https://github.com/AMReX-Codes/amrex/blob/26.04/Src/Particle/AMReX_ParticleTransformation.H#L148-L198
Please see:
https://warpx.readthedocs.io/en/latest/developers/how_to_develop_with_llms.html
Can you please provide a minimal reproducer that failed? All the ops above/below are ParIter loops that always auto-synchronize, so I cannot spot an async race condition here. But there might be something we overlook.
```diff
             p_qs_engine->build_optical_depth_functor();
             // If the particle tile was allocated in a memory pool that can run on GPU, launch GPU kernel
-            if constexpr (amrex::RunOnGpu<typename PTile::template AllocatorType<amrex::Real>>::value) {
+            if constexpr (ParticleCreation::particles_on_gpu<typename PTile::template AllocatorType<amrex::Real>>()) {
```
@atmyers @WeiqunZhang this could be a legit bug from the transition to polymorphic PC, do I see that right? But generally speaking, this cannot be a constexpr at all anymore (I think the flagging of this line is right; the patch I would write as an arena runtime check...). Do you agree?

The patch now treats every PolymorphicArenaAllocator as GPU-resident, which is wrong. (The development code treats every PolymorphicArenaAllocator as CPU-resident, which is wrong, too.)

This pattern is very likely used more often in the code base, in both WarpX (5x in init) and AMReX.
Right. If it is polymorphic, the check can only be done at runtime with arena()->isManaged() || arena()->isDevice().
```python
        try:
            from mpi4py import MPI

            MPI.COMM_WORLD.Barrier()
```
So your strategy here is to call MPI Barrier, see that it crashes and os._exit(0), i.e., w/o an error code? This looks like a massive hack, no? :)
Codex comment:

High: Python/pywarpx/_libwarpx.py:200 calls os._exit(0) from finalize(), which is both an atexit handler and public API via pywarpx.warpx.finalize() / PICMI Simulation.finalize(). On GPU backends this can turn an uncaught Python exception after WarpX initialization into a successful process exit, hiding failed tests. It also makes explicit finalize() terminate the interpreter instead of returning. This should be split from normal/manual finalization, and it should not force exit code 0 on error paths.
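The exit-code concern is easy to reproduce in isolation; this standalone snippet (illustrative, not from the patch) exits with status 0 despite the uncaught exception:

```python
import atexit
import os

def finalize():
    # Mimics the workaround: hard-exit during interpreter shutdown.
    os._exit(0)  # always reports success, even on an error path

atexit.register(finalize)
raise RuntimeError("simulation failed")  # process still exits with code 0
```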
```python
    # Compare particle fields sorted by (particle_cpu, particle_id), since
    # Redistribute() after checkpoint-restart may reorder particles across
    # tiles/ranks. The (cpu, id) pair is the unique particle key in AMReX.
    for species in sorted(particle_species):
        species_fields = [f for f in ds_benchmark.field_list if f[0] == species]
```
Trying to understand what this patch improves: the stability of checksums?
Yes. Redistribute after the restart from a checkpoint can reorder particles across ranks. Sorting them by ID orders the compared checksum calculations so they agree more closely. Some tests were failing that checksum comparison without it.
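A minimal sketch of the key-based sort (array names are illustrative; note that `np.lexsort` treats its last key as the primary one):

```python
import numpy as np

def sort_by_particle_key(cpus, ids, values):
    """Order particle data by the unique AMReX key (cpu, id)."""
    order = np.lexsort((ids, cpus))  # primary key: cpu, secondary key: id
    return values[order]

# Apply the same ordering to restart and benchmark data before comparing.
```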
Hi @ax3l
That's basically correct. I was working on MKL for SYCL support of FFTs on Aurora (separate draft PR), and started running the CI test set to see what it might've broken. That showed a bunch of unrelated test failures, such as tests failing on shutdown. Then, running on Polaris to verify another platform showed some other failures there. I could break this up into smaller pieces as needed, and will recover reproducers for some of your specific asks.
Thanks @zippylab! I started pulling out a few bug fixes in separate PRs and added inline comments where the patch looks a bit unclear and needs reproducers 🙏 Thanks for your help! 👍
…on (#6808)

Follow-up from transitioning to polymorphic particle containers #6374. Isolated from #6801.

`amrex::RunOnGpu<PolymorphicArenaAllocator>` is `false_type` because AMReX does not specialize `RunOnGpu` for this allocator. This causes `DefaultInitializeRuntimeAttributes` to take the CPU code path even when particle data lives on device, leading to segfaults when ionization or user-defined runtime attributes are used with SYCL (and potentially other GPU backends that don't have UVA).

This affected users that used runtime attributes on hardware that did not successfully fall back to unified memory (and we try to avoid that anyway).

Co-authored-by: Tim Williams <zippy@anl.gov>
Particle tiling is set by default: on for CPU, off for GPU. If set to on (1) on GPU runs, this asserts. Isolated from #6801
## Summary

In IEEE floats, `np.nan != np.nan` is _always_ `True`, so the old code was buggy: NaN is never equal to anything, _including itself_.

## Original Description

- Spotted by @zippylab

The find_first_non_zero_from_bottom_left/upper_right helper functions used ``matrix[i][j] != np.nan`` to skip NaN values. Per IEEE 754, NaN != NaN is always True, so this check never filtered out NaN cells. On GPU platforms where openpmd outputs NaN for cells outside the refined patch (rather than zero), the analysis selected NaN regions as valid data, producing ``nan`` errors for level 1 and failing the assertion.

Fix: replace ``!= np.nan`` with ``not np.isnan()``. With the fix, level 1 errors on A100 GPU are 0.029% (phi) and 0.083% (Er), well within the 0.4% tolerance.

## X-ref

Isolated from #6801

Co-authored-by: Tim Williams <tjwilliams@anl.gov>
## Overview

This fixes a silent failure on format detection in analysis_default_regression.py. If both `yt` and `openPMD` attempts fail, the code printed a message but continued execution. Depending on what follows, this could lead to a confusing downstream error or silent misbehavior.

The issue came up in #6801 (comment) as part of #6801.

## Notes

I added a few empty lines to improve readability as well.
## Overview

The file `analysis_default_regression.py` in the directory of the pierce diode test, originally added in #5999, is a hard copy but should be a link, as described in our documentation at [warpx.readthedocs.io/en/latest/developers/how_to_test.html#how-to-add-automated-tests](https://warpx.readthedocs.io/en/latest/developers/how_to_test.html#how-to-add-automated-tests):

> 6. If the test directory is new, make a symbolic link to the default regression analysis script analysis_default_regression.py from [Examples/analysis_default_regression.py](https://github.com/BLAST-WarpX/warpx/blob/development/Examples/analysis_default_regression.py), by running `ln -s ../../analysis_default_regression.py analysis_default_regression.py` from the test directory.

This was observed in #6801 (comment).

## Related PRs

- #6811
## Summary

- `PolymorphicArenaAllocator` GPU dispatch in particle initialization — affects all GPU backends but was masked on CUDA by UVA
- `mpi4py` import ordering that caused hangs or unbounded memory allocation on Cray MPICH systems (Polaris, Sirius, Frontier)

## 1. GPU Backend Bug Fixes

### 1a. `PolymorphicArenaAllocator` GPU dispatch

File: `Source/Particles/ParticleCreation/DefaultInitialization.H`

`DefaultInitializeRuntimeAttributes` uses `amrex::RunOnGpu<Alloc>::value` to choose between GPU kernel launch and host loop. AMReX does not specialize `RunOnGpu` for `PolymorphicArenaAllocator`, so the check returns `false` even when the allocator wraps a device arena. This causes the host code path to dereference device pointers — a segfault on SYCL (strict device memory), silently slow on CUDA/HIP (UVA page faults).

Added a `ParticleCreation::particles_on_gpu<Alloc>()` helper that checks both `RunOnGpu<Alloc>::value` and `IsPolymorphicArenaAllocator<Alloc>::value`. Updated all 5 `if constexpr` dispatch sites (QED optical depth, user-defined real/int attribute parsers, ionization level).

Reproducer: `test_2d_ionization_picmi` crashes at the first ionization event because the electron product species has runtime real attributes (`orig_z`). Even with `orig_z = 0.0` (constant parser), the crash occurs — the issue is the kernel launch path, not the parser content.

The proper upstream fix would be adding a `RunOnGpu<PolymorphicArenaAllocator<T>>` specialization in AMReX.

### 1b. openPMD particle write stream synchronization
File: `Source/Diagnostics/WarpXOpenPMD.cpp`

`copyParticles()` from device to pinned memory may be asynchronous on GPU backends. The subsequent `storeChunkRaw()` calls in openPMD immediately read from the pinned buffer, potentially before the copy completes. Added `amrex::Gpu::streamSynchronize()` after `copyParticles` and before the openPMD write.

### 1c. On-demand GPU arena for Python tests

File: `Examples/CMakeLists.txt`

Set `AMREX_THE_ARENA_INIT_SIZE=0` for Python/PICMI tests so AMReX allocates GPU memory on demand instead of pre-allocating 75% of device memory. This prevents OOM when a prior crashed test left GPU memory unreleased.

### 1d. `mpi4py` import ordering for Cray MPICH
File: `Python/pywarpx/_libwarpx.py`

On Cray MPICH (Polaris, Sirius, and likely Frontier/Perlmutter), loading the pyAMReX shared library (`amrex.space2d`, etc.) before `mpi4py` calls `MPI_Init_thread` causes a conflict with already-loaded MPI symbols. The result is either a hang or unbounded memory allocation (~340 MB/s) during `amrex::Initialize()`.

Fix: import `mpi4py` at the top of `load_library()`, before any `amrex.space*d` import. This ensures Python's `MPI_Init_thread` runs before any MPI symbols are loaded via shared library. The import is wrapped in `try/except ImportError` so it's a no-op when mpi4py isn't installed.

This was diagnosed on Sirius (A100, Cray MPICH 8.1.28) and verified on Sunspot (PVC, ANL MPICH). It is relevant to all Cray MPICH systems.
### 1e. GPU finalize crash workaround

File: `Python/pywarpx/_libwarpx.py`

On all GPU backends, the Python `atexit` finalization can crash in `amrex::Gpu::Device::Finalize()` when it synchronizes streams whose backing C++ static objects (`external_stream_stack`) have already been destroyed.

Workaround: call `os._exit(0)` before any C++ destructors (`del self.warpx`, `amrex_finalize()`) on GPU backends, with a coordinated `MPI.COMM_WORLD.Barrier()` to ensure all ranks complete before exit.

This is an aggressive workaround — it bypasses all C++ cleanup and relies on the job launcher for process cleanup. The proper fix requires changes in AMReX to address the static destructor ordering in `Gpu::Device`. Without this workaround, PICMI tests with large particle counts (e.g., `test_2d_ohm_solver_landau_damping_picmi`) are marked FAILED despite completing the simulation correctly, because `ctest` sees the crash exit code.

## 2. Analysis Script Fixes
### 2a. NaN handling in electrostatic_sphere_eb MR analysis

File: `Examples/Tests/electrostatic_sphere_eb/analysis_rz_mr.py`

The `find_first_non_zero_from_bottom_left` and `find_first_non_zero_from_upper_right` functions used `matrix[i][j] != np.nan` to filter NaN values. Per IEEE 754, `NaN != NaN` is always `True`, so this check never filtered anything. On GPU backends where openPMD writes NaN for out-of-domain embedded-boundary cells, this caused the analysis to select NaN regions and fail.

Fixed to `not np.isnan(matrix[i][j])`.

### 2b. Restart analysis: single-particle IndexError and particle sort
File: `Examples/analysis_default_restart.py`

Two fixes:

- `np.squeeze()` on a single-element array returns a 0-d scalar that can't be indexed. Wrapped with `np.atleast_1d()`.
- Sort particles by `(cpu, id)` before comparing restart vs benchmark data, since `Redistribute()` after checkpoint-restart reorders particles across tiles/ranks.

### 2c. openPMD format detection fallback
Files: `Examples/analysis_default_regression.py`, `Examples/Physics_applications/pierce_diode/analysis_default_regression.py`

When the openPMD file cannot be opened (e.g., missing HDF5 backend), `args.format` was left unset, causing `AttributeError` downstream. Set `args.format = "openpmd"` in the exception handler.

## 3. Test Input Adjustments

- `inputs_test_3d_magnetostatic_eb`: Disabled particle tiling (`do_tiling = 0`). `RedistributeGPU()` asserts `do_tiling == false`; the assertion only fires in GPU builds.
- `inputs_test_2d_theta_implicit_jfnk_vandb` and `_filtered`: Increased `amr.max_grid_size` from 8 to 16 to avoid excessive box decomposition that triggers `abort_on_warning_threshold` on GPU.
- `inputs_test_2d_langmuir_multi_mr_psatd`: Added `amr.max_grid_size = 128` for the same reason.

## Upstream Dependencies (Not in This PR)
These issues were discovered during GPU CI validation and require separate upstream PRs:

- AMReX `ParticleTile::define()` missing arena on pointer-of-pointer vectors — `m_runtime_r_ptrs`, `m_runtime_i_ptrs`, `m_runtime_r_cptrs`, `m_runtime_i_cptrs` default to device arena even when the tile uses a pinned arena. Causes segfault in `packIODataCpu` when writing plotfiles with runtime particle attributes on GPU. Locally patched in the build tree's `AMReX_ParticleTile.H` but not included in this PR.
- AMReX `Device::Finalize()` static destructor ordering — Root cause of the `os._exit(0)` workaround in 1e.
- pyAMReX `to_xp()` for non-CUDA GPU backends — `Array4.to_xp()` and `MultiFab.to_xp()` wrap device pointers as numpy arrays on SYCL (segfault). Needs dpnp support for SYCL.

## CI Validation

Full 975-test CI suite on Sirius (NVIDIA A100, Cray MPICH) with all fixes applied.

Tests directly fixed by this PR:

- `test_rz_electrostatic_sphere_eb_mr`
- `test_*_restart` (single-particle species)
- `test_3d_magnetostatic_eb`
- `test_rz_psatd_eigenmode`

I used Claude Code (carefully) in developing and testing these changes, and in drafting this PR description.