Skip to content

Conversation

nuclearkevin
Copy link
Contributor

@nuclearkevin nuclearkevin commented Sep 21, 2025

Description

#3185 added support for adaptive libMesh tallies initialized by an external C++ driver application (e.g. Cardinal) to support a project investigating adaptive mesh refinement applied to unstructured mesh tallies in Monte Carlo particle transport. Unfortunately, this addition also introduced a bug which yielded a massive performance penalty during tallying for users of both OpenMC and Cardinal. The crux of the issue is the function

int LibMesh::n_bins() const
{
  return m_->n_elem();
}

was changed to

int LibMesh::n_bins() const
{
  return m_->n_active_elem();
}

Without getting too deep into the weeds of how libMesh is designed and the process behind adaptive mesh refinement, the return value of n_elem() is cached when the libMesh mesh is constructed, while n_active_elem() (required for AMR) is not. This necessitates what is essentially a loop over all elements every time n_active_elem() is called, which in the case of transport is every time a libMesh element accumulates a hit due to this function:

int LibMesh::get_bin_from_element(const libMesh::Elem* elem) const
{
  int bin =
    adaptive_ ? elem_to_bin_map_[elem->id()] : elem->id() - first_element_id_;
  if (bin >= n_bins() || bin < 0) {
    fatal_error(fmt::format("Invalid bin: {}", bin));
  }
  return bin;
}

This Implementation

To fix this performance regression, I've reverted the LibMesh class back to it's previous state. All of the adaptivity code necessary for mesh tally AMR in Cardinal has been moved to a new class called AdaptiveLibMesh which inherits from LibMesh and overrides the few functions required to get adaptivity working. n_active_elem() is cached on construction to avoid any performance hits in transport. This should insulate OpenMC users from any other changes made to support the AMR project in Cardinal, such as #3553.

Simpler Implementation

If a simpler fix is desired, I can revert these changes and simply implement the cache for n_active_elem() in LibMesh. I decided to go all the way in an attempt to guard against future performance regressions.

FYI @pshriwise @aprilnovak

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 15) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

@nuclearkevin
Copy link
Contributor Author

nuclearkevin commented Sep 22, 2025

Here are some more quantitative performance metrics taken on my personal laptop (Intel Core i7 11800H) using /tests/regression_tests/unstructured_mesh/ which tallies on a ~12,000 element tetrahedral mesh:

Mesh Tally Implementation 1 OpenMP Thread 14 OpenMP Threads
develop 321.02 particles per second 1563.55 particles per second
Cache n_active_elem() 12019 particles per second 59212.3 particles per second
Splitting the AMR code into an AdaptiveLibMesh and a LibMesh (this PR) 14227.6 particles per second 64610.3 particles per second

There is a severe per-thread performance penalty to libMesh tallies when n_active_elem() is not cached with no hit to threaded scalability. After caching, performance jumps back up by two orders of magnitude. Separating out the AMR code into it's own class further improves performance, likely due to improved cache alignment as the mapping data structures no longer get fetched in get_bin_from_element(...). This is just a theory at the moment, I haven't profiled this aspect of the change as my main goal was to get unstructured mesh tallies back to the same order of magnitude prior to the addition of AMR support.

Copy link
Contributor

@pshriwise pshriwise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this fix promptly, @nuclearkevin! I agree with the approach to insulate standard LibMesh mesh tallies from adaptive capabilities.

Copy link
Contributor

@paulromano paulromano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too. Thanks @nuclearkevin!

@pshriwise pshriwise merged commit ed433fe into openmc-dev:develop Sep 23, 2025
14 checks passed
@nuclearkevin nuclearkevin deleted the libmesh_perf_fix branch September 23, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants