Fix performance regression in libMesh unstructured mesh tallies #3577

nuclearkevin · 2025-09-21T18:19:47Z

Description

#3185 added support for adaptive libMesh tallies initialized by an external C++ driver application (e.g. Cardinal) to support a project investigating adaptive mesh refinement applied to unstructured mesh tallies in Monte Carlo particle transport. Unfortunately, this addition also introduced a bug which yielded a massive performance penalty during tallying for users of both OpenMC and Cardinal. The crux of the issue is the function

int LibMesh::n_bins() const
{
  return m_->n_elem();
}

was changed to

int LibMesh::n_bins() const
{
  return m_->n_active_elem();
}

Without getting too deep into the weeds of how libMesh is designed and the process behind adaptive mesh refinement, the return value of n_elem() is cached when the libMesh mesh is constructed, while n_active_elem() (required for AMR) is not. This necessitates what is essentially a loop over all elements every time n_active_elem() is called, which in the case of transport is every time a libMesh element accumulates a hit due to this function:

int LibMesh::get_bin_from_element(const libMesh::Elem* elem) const
{
  int bin =
    adaptive_ ? elem_to_bin_map_[elem->id()] : elem->id() - first_element_id_;
  if (bin >= n_bins() || bin < 0) {
    fatal_error(fmt::format("Invalid bin: {}", bin));
  }
  return bin;
}

This Implementation

To fix this performance regression, I've reverted the LibMesh class back to it's previous state. All of the adaptivity code necessary for mesh tally AMR in Cardinal has been moved to a new class called AdaptiveLibMesh which inherits from LibMesh and overrides the few functions required to get adaptivity working. n_active_elem() is cached on construction to avoid any performance hits in transport. This should insulate OpenMC users from any other changes made to support the AMR project in Cardinal, such as #3553.

Simpler Implementation

If a simpler fix is desired, I can revert these changes and simply implement the cache for n_active_elem() in LibMesh. I decided to go all the way in an attempt to guard against future performance regressions.

FYI @pshriwise @aprilnovak

Checklist

I have performed a self-review of my own code
I have run clang-format (version 15) on any C++ source files (if applicable)
~~I have followed the style guidelines for Python source files (if applicable)~~
~~I have made corresponding changes to the documentation (if applicable)~~
~~I have added tests that prove my fix is effective or that my feature works (if applicable)~~

nuclearkevin · 2025-09-22T15:51:34Z

Here are some more quantitative performance metrics taken on my personal laptop (Intel Core i7 11800H) using /tests/regression_tests/unstructured_mesh/ which tallies on a ~12,000 element tetrahedral mesh:

Mesh Tally Implementation	1 OpenMP Thread	14 OpenMP Threads
`develop`	321.02 particles per second	1563.55 particles per second
Cache `n_active_elem()`	12019 particles per second	59212.3 particles per second
Splitting the AMR code into an `AdaptiveLibMesh` and a `LibMesh` (this PR)	14227.6 particles per second	64610.3 particles per second

There is a severe per-thread performance penalty to libMesh tallies when n_active_elem() is not cached with no hit to threaded scalability. After caching, performance jumps back up by two orders of magnitude. Separating out the AMR code into it's own class further improves performance, likely due to improved cache alignment as the mapping data structures no longer get fetched in get_bin_from_element(...). This is just a theory at the moment, I haven't profiled this aspect of the change as my main goal was to get unstructured mesh tallies back to the same order of magnitude prior to the addition of AMR support.

pshriwise

Thanks for creating this fix promptly, @nuclearkevin! I agree with the approach to insulate standard LibMesh mesh tallies from adaptive capabilities.

paulromano

Looks good to me too. Thanks @nuclearkevin!

nuclearkevin added 5 commits September 21, 2025 10:58

AdaptiveLibMesh.

b0656c1

Mark as virtual and swap to protected.

9363104

Formatting.

7a0e4dc

Protected.

ecec2e0

Formatting.

c2523ff

pshriwise approved these changes Sep 22, 2025

View reviewed changes

paulromano approved these changes Sep 23, 2025

View reviewed changes

pshriwise merged commit ed433fe into openmc-dev:develop Sep 23, 2025
14 checks passed

nuclearkevin deleted the libmesh_perf_fix branch September 23, 2025 17:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix performance regression in libMesh unstructured mesh tallies #3577

Fix performance regression in libMesh unstructured mesh tallies #3577

Uh oh!

nuclearkevin commented Sep 21, 2025 •

edited

Loading

Uh oh!

nuclearkevin commented Sep 22, 2025 •

edited

Loading

Uh oh!

pshriwise left a comment

Uh oh!

paulromano left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix performance regression in libMesh unstructured mesh tallies #3577

Fix performance regression in libMesh unstructured mesh tallies #3577

Uh oh!

Conversation

nuclearkevin commented Sep 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

This Implementation

Simpler Implementation

Checklist

Uh oh!

nuclearkevin commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pshriwise left a comment

Choose a reason for hiding this comment

Uh oh!

paulromano left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nuclearkevin commented Sep 21, 2025 •

edited

Loading

nuclearkevin commented Sep 22, 2025 •

edited

Loading