Skip to content

Conversation

@jcs15c
Copy link
Contributor

@jcs15c jcs15c commented Oct 3, 2025

Summary

This PR is a feature/enhancement that adds to core/numerics a method of directly computing Gauss-Legendre quadrature nodes and weights without relying on the analogous methods in MFEM. These methods are used in winding_number.hpp and evaluate_integral.hpp. This significantly reduces the number of #ifdef guards that need to be used to check for an MFEM dependency, particularly relevant for PR #1672.

Quadrature nodes are calculated in calculate_gauss_legendre(npts) through a Newton's method search which finds the roots of the Legendre polynomial of degree npts. Although the Newton iterations converge rapidly from the initial guess (which correctly assumes the nodes are nearly equally distributed in $$\arccos(x_i)$$), an additional interface is provided through get_gauss_legendre(npts) which stores the resulting quadrature rules in a static map so that repeated calls do not have to recompute the rule each time. General quadrature rules are stored in a numerics::QuadratureRule object which stores one axom::Array<double> of nodes and one of weights. This framework is readily extensible to definitions of other types of more specialized quadrature rules.

The accuracy and stability of the quadrature nodes is tested against the MFEM rules for even large values of npts, up to around ~10,000, but a rigorous implementation would use asymptotic formulas for evaluation of high-degree rules instead of Newton's method, as they could be evaluated significantly faster for such large orders. Because the new implementation is essentially identical to the original MFEM implementation, the two methods take about the same time to evaluate.

@jcs15c jcs15c self-assigned this Oct 3, 2025
@jcs15c jcs15c added enhancement New feature or request Core Issues related to Axom's 'core' component Primal Issues related to Axom's 'primal component labels Oct 3, 2025
Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jcs15c -- this is a really nice contribution!

It should resolve #1529

Please also update the RELEASE-NOTES.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

const QuadratureRule& get_gauss_legendre(int npts)
{
// Define a static map that stores the GL quadrature rule for a given order
static std::map<int, QuadratureRule> rule_library;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should this be a std::map or a hashmap (e.g. std::unordered_map or axom::FlatMap)?

I'm not sure if it matters too much for this PR, but we might want to revisit this in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason it can't be an axom::FlatMap, defaulting to std::map is a bad habit of mine. I've changed it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the simplest reason it shouldn't be a hashmap is because there's no built-in way to hash a pair of ints. I've accidentally avoided the same issue in the GWN memoization code, which also has maps keyed by pair<RefinementLevel, RefinementIndex> by (in retrospect incorrectly) using std::map there too.

It's easy enough to make and use a simple custom hash for a pair of ints, but ideally that implementation would be shared by this and the memoization code, and be more general to arbitrary pairs. For now, I think it's simplest to use std::map here, and figure out later where that shared implementation should go. Maybe in FlatTable.hpp?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jcs15c -- can you please create an issue for this?

Tag: @publixsubfan

@jcs15c
Copy link
Contributor Author

jcs15c commented Oct 6, 2025

Testing directly against the MFEM implementation reveals that the new method agrees with those results for significantly larger degrees than I originally anticipated, upwards of 10,000 points. As such, I've reverted the changes to evaluate_integral so that it just uses the new methods entirely separate from MFEM. I've updated the comment to reflect these changes, and re-requested any approved reviews I've gotten. Thanks for the patience!

@jcs15c jcs15c requested a review from kennyweiss October 6, 2025 18:42
// Compute the Newton method step size
dz = P_n / Pp_n;

if(std::fabs(dz) < 1e-16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the termination condition. I feel hesitant to hard-wire the error tolerance to 1e-16. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think replacing it with a axom::numerics::floating_point_limits<double>::epsilon() is better? I personally don't think it needs to be/should be an exposed tolerance, since its ambiguous (to me at least) how using approximate quadrature nodes impacts the accuracy of the downstream numerical integration.

As it is, the algorithm already converges very quickly, and is very insensitive to the numerical stopping criteria. If you replace it with 10^-10, you get essentially the same number of iterations, since Newton's method converges quadratically.

//! \brief Private constructor for use in compute_<rule>() methods. Only allocates memory
QuadratureRule(int npts) : m_nodes(npts), m_weights(npts), m_npts(npts) { };

axom::Array<double> m_nodes;
Copy link
Member

@BradWhitlock BradWhitlock Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see QuadratureRule use axom::ArrayView<double> instead. The array ownership could be still done in your friend function. That function should probably also take an optional allocatorID. Then I'd like to see QuadratureRule decorated with AXOM_HOST_DEVICE and tested in some GPU kernels.

Or maybe instead of doing that, provide a method for returning a view of the QuadratureRule.

struct QuadratureRule
{
  struct View
  { AXOM_HOST_DEVICE double node(axom::IndexType index) const { return m_nodes[index]; }
    AXOM_HOST_DEVICE double weight(axom::IndexType index) const { return m_weights[index]; }
    AXOM_HOST_DEVICE axom::IndexType size() const { return m_nodes.size(); }
    axom::ArrayView<double> m_nodes, m_weights;
  };

  View view() const
  {
    return View{m_nodes.view(), m_weights.view()};
  }
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing - since the axom::Array contains a size, why also have a member to keep track of the number of points?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you answered my npts question above. n/m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're totally right, QuadratureRule::size() (or .getNumPoints()) should just call m_nodes.size() (which is always equal to m_weigths.size(), regardless of the method) instead of storing the extra integer. If it was an actual .getOrder() method, then it would have to be separate like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BradWhitlock To your earlier point about views, just to make sure I understand correctly, that would that require that the static unordered_map in get_gauss_legendre instead have values of a pair of axom::Arrays (or two maps of one array each), so that the QuadratureRule object can contain only views of those arrays, right? That would work, but I think I'd have to make calculate_gauss_legendre return the arrays themselves instead of a QuadratureRule object, or populate a const reference to the map, or just move its entire functionality into get_gauss_legendre.

My original idea was to let calculate_gauss_legendre bypass the caching altogether and just compute the nodes on the fly, but I'm not sure what the utility of that would really be. Let me implement this view version and see if it's closer to what you have in mind.

Copy link
Member

@BradWhitlock BradWhitlock Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jcs15c , I think you get what I mean but in case this helps clear it up...

// template it in case we wanted to call from a kernel and pass axom::StackArray or axom::StaticArray.
template <typename Container>
AXOM_HOST_DEVICE
void calculate_gauss_legendre(int npts, Container &nodes, Container &weights)
{
   // existing logic using these output arrays.
}

QuadratureRule get_gauss_legendre(int npts, int allocatorID = <some default value>)
{
   // Assumes std::map<std::pair<int,int>, std::pair<axom::Array<double>,axom::Array<double>>>;
   const auto key = std::make_pair<int,int>(npts, allocatorID);
   auto it = cache.find(key);
   if(it == cache.end())
   {
     it = cache[key];

     auto nodes = axom::Array<double>(npts);
     auto weights = axom::Array<double>(npts);
     calculate_gauss_legendre(npts, nodes, weights);

     // copy nodes, weights to arrays in target memory space
     it->second.first = axom::Array<double>(npts, npts, allocatorID);
     it->second.second = axom::Array<double>(npts, npts, allocatorID);
     axom::copy(it->second.first.data(), nodes.data(), npts * sizeof(double));
     axom::copy(it->second.second.data(), weights.data(), npts * sizeof(double));
   }
   return QuadratureRule{it->second.first.view(), it->second.second.view()};
}

Then later in other code, we could do
template <typename ExecSpace>
void func()
{
   // Get the rule for the right memory space, creating it if needed.
   // I was assuming that the function would be called prior to the axom::for_all. If we need to
   // call it for different numbers of points from inside the lambda then that is a problem with
   // what I proposed.
   const auto rule = get_gauss_legendre(5, axom::execution_space<ExecSpace>::allocatorID());

   // Use the rule in a lambda
   axom::for_all<ExecSpace>(n, AXOM_LAMBDA(axom::IndexType i)
   {
     double sum = 0.;
     for(int j = 0; j < rule.numPoints(); j++)
     {
        // These QuadratureRule methods would be AXOM_HOST_DEVICE to use them here
        sum += rule.weight(j) * f(rule.node(j));

        // Maybe we can do this now too...
        axom::StackArray<double, MAXNODES> nodes, weights;
        int nnodes = // something dynamic
        calculate_gauss_legendre(nnodes, nodes, weights);
        QuadratureRule newQR{axom::ArrayView<double>(nodes.data(), nnodes),
                                         axom::ArrayView<double>(weights.data(), nnodes)};
        // use newQR
     }
     someArray[i] = sum;
   });
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay great, I think I got it pretty close then, aside from a handful of redundant cache lookups. Thank you!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BradWhitlock Would you mind taking another look at my implementation of the ArrayView version, and the test I'm using in numerics_quadrature.hpp (reproduced below) just to make sure this looks practical for future applications? In particular, I don't think it's necessary to be able generate rules from within the lambda, since the generation procedure isn't especially expensive, and by definition only needs to be done once for a given rule.

On the other hand, the node computation procedure within a certain rule is pretty well parallelizeable, and could hypothetically benefit from being ported to the GPU, but I don't think there's many contexts in which anyone could use a rule with an order high enough to make it worth it. Even generating ~100,000 points only takes about half a minute, and that was on a debug build.

template <typename ExecSpace>
struct test_device_quadrature
{
  static void test()
  {
    const int npts = 15;

    // Create the rule with the proper allocator
    const auto rule =
      axom::numerics::get_gauss_legendre(npts, axom::execution_space<ExecSpace>::allocatorID());

    // Use the rule in a lambda to integrate the volume under std::sin(pi * x) on [0, 1]
    axom::ReduceSum<ExecSpace, double> quadrature_sum(0.0);
    axom::for_all<ExecSpace>(
      npts,
      AXOM_LAMBDA(axom::IndexType i) { quadrature_sum += rule.weight(i) * sin(rule.node(i)); });

    EXPECT_NEAR(quadrature_sum.get(), 0.459697694132, 1e-6);
  }
};

TEST(numerics_quadrature, get_nodes_seq) { test_device_quadrature<axom::SEQ_EXEC>::test(); }

#if defined(AXOM_RUNTIME_POLICY_USE_OPENMP)
TEST(numerics_quadrature, get_nodes_omp) { test_device_quadrature<axom::OMP_EXEC>::test(); }
#endif

#if defined(AXOM_RUNTIME_POLICY_USE_CUDA)
TEST(numerics_quadrature, get_nodes_cuda) { test_device_quadrature<axom::CUDA_EXEC<256>>::test(); }
#endif

#if defined(AXOM_RUNTIME_POLICY_USE_HIP)
TEST(numerics_quadrature, get_nodes_hip) { test_device_quadrature<axom::HIP_EXEC<256>>::test(); }
#endif

Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jcs15c !

Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another pass through the latest changes -- looks great to me.

Comment on lines 126 to 127
// Evaluate the weight as w_i = 2 / (1 - z^2) / P'_n(z)^2, with z \in [-1, 1]
weights[i - 1] = weights[n - i] = 1.0 / (4.0 * xi * (1.0 - xi) * Pp_n * Pp_n);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't (appear to) match the code -- presumably it's in how $z^2$ and $xi$ are defined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. The lines of code above say that xi = (1 - z) / 2, but it was a bit confusing where the extra factor of 1/2 came from. I've updated the comment to be more clear.

Copy link
Member

@BradWhitlock BradWhitlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now with the ArrayViews. Thanks!

@jcs15c jcs15c merged commit adc75e2 into develop Oct 27, 2025
15 checks passed
@jcs15c jcs15c deleted the feature/spainhour/mfemless_gauss_legendre branch October 27, 2025 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Core Issues related to Axom's 'core' component enhancement New feature or request Primal Issues related to Axom's 'primal component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants