-
Notifications
You must be signed in to change notification settings - Fork 30
Add Gauss-Legendre quadrature detached from MFEM include #1683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ithub.com/LLNL/axom into feature/spainhour/mfemless_gauss_legendre
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| const QuadratureRule& get_gauss_legendre(int npts) | ||
| { | ||
| // Define a static map that stores the GL quadrature rule for a given order | ||
| static std::map<int, QuadratureRule> rule_library; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Should this be a std::map or a hashmap (e.g. std::unordered_map or axom::FlatMap)?
I'm not sure if it matters too much for this PR, but we might want to revisit this in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason it can't be an axom::FlatMap, defaulting to std::map is a bad habit of mine. I've changed it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the simplest reason it shouldn't be a hashmap is because there's no built-in way to hash a pair of ints. I've accidentally avoided the same issue in the GWN memoization code, which also has maps keyed by pair<RefinementLevel, RefinementIndex> by (in retrospect incorrectly) using std::map there too.
It's easy enough to make and use a simple custom hash for a pair of ints, but ideally that implementation would be shared by this and the memoization code, and be more general to arbitrary pairs. For now, I think it's simplest to use std::map here, and figure out later where that shared implementation should go. Maybe in FlatTable.hpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jcs15c -- can you please create an issue for this?
Tag: @publixsubfan
|
Testing directly against the MFEM implementation reveals that the new method agrees with those results for significantly larger degrees than I originally anticipated, upwards of 10,000 points. As such, I've reverted the changes to |
| // Compute the Newton method step size | ||
| dz = P_n / Pp_n; | ||
|
|
||
| if(std::fabs(dz) < 1e-16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the termination condition. I feel hesitant to hard-wire the error tolerance to 1e-16. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think replacing it with a axom::numerics::floating_point_limits<double>::epsilon() is better? I personally don't think it needs to be/should be an exposed tolerance, since its ambiguous (to me at least) how using approximate quadrature nodes impacts the accuracy of the downstream numerical integration.
As it is, the algorithm already converges very quickly, and is very insensitive to the numerical stopping criteria. If you replace it with 10^-10, you get essentially the same number of iterations, since Newton's method converges quadratically.
| //! \brief Private constructor for use in compute_<rule>() methods. Only allocates memory | ||
| QuadratureRule(int npts) : m_nodes(npts), m_weights(npts), m_npts(npts) { }; | ||
|
|
||
| axom::Array<double> m_nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see QuadratureRule use axom::ArrayView<double> instead. The array ownership could be still done in your friend function. That function should probably also take an optional allocatorID. Then I'd like to see QuadratureRule decorated with AXOM_HOST_DEVICE and tested in some GPU kernels.
Or maybe instead of doing that, provide a method for returning a view of the QuadratureRule.
struct QuadratureRule
{
struct View
{ AXOM_HOST_DEVICE double node(axom::IndexType index) const { return m_nodes[index]; }
AXOM_HOST_DEVICE double weight(axom::IndexType index) const { return m_weights[index]; }
AXOM_HOST_DEVICE axom::IndexType size() const { return m_nodes.size(); }
axom::ArrayView<double> m_nodes, m_weights;
};
View view() const
{
return View{m_nodes.view(), m_weights.view()};
}
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing - since the axom::Array contains a size, why also have a member to keep track of the number of points?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you answered my npts question above. n/m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're totally right, QuadratureRule::size() (or .getNumPoints()) should just call m_nodes.size() (which is always equal to m_weigths.size(), regardless of the method) instead of storing the extra integer. If it was an actual .getOrder() method, then it would have to be separate like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BradWhitlock To your earlier point about views, just to make sure I understand correctly, that would that require that the static unordered_map in get_gauss_legendre instead have values of a pair of axom::Arrays (or two maps of one array each), so that the QuadratureRule object can contain only views of those arrays, right? That would work, but I think I'd have to make calculate_gauss_legendre return the arrays themselves instead of a QuadratureRule object, or populate a const reference to the map, or just move its entire functionality into get_gauss_legendre.
My original idea was to let calculate_gauss_legendre bypass the caching altogether and just compute the nodes on the fly, but I'm not sure what the utility of that would really be. Let me implement this view version and see if it's closer to what you have in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jcs15c , I think you get what I mean but in case this helps clear it up...
// template it in case we wanted to call from a kernel and pass axom::StackArray or axom::StaticArray.
template <typename Container>
AXOM_HOST_DEVICE
void calculate_gauss_legendre(int npts, Container &nodes, Container &weights)
{
// existing logic using these output arrays.
}
QuadratureRule get_gauss_legendre(int npts, int allocatorID = <some default value>)
{
// Assumes std::map<std::pair<int,int>, std::pair<axom::Array<double>,axom::Array<double>>>;
const auto key = std::make_pair<int,int>(npts, allocatorID);
auto it = cache.find(key);
if(it == cache.end())
{
it = cache[key];
auto nodes = axom::Array<double>(npts);
auto weights = axom::Array<double>(npts);
calculate_gauss_legendre(npts, nodes, weights);
// copy nodes, weights to arrays in target memory space
it->second.first = axom::Array<double>(npts, npts, allocatorID);
it->second.second = axom::Array<double>(npts, npts, allocatorID);
axom::copy(it->second.first.data(), nodes.data(), npts * sizeof(double));
axom::copy(it->second.second.data(), weights.data(), npts * sizeof(double));
}
return QuadratureRule{it->second.first.view(), it->second.second.view()};
}
Then later in other code, we could do
template <typename ExecSpace>
void func()
{
// Get the rule for the right memory space, creating it if needed.
// I was assuming that the function would be called prior to the axom::for_all. If we need to
// call it for different numbers of points from inside the lambda then that is a problem with
// what I proposed.
const auto rule = get_gauss_legendre(5, axom::execution_space<ExecSpace>::allocatorID());
// Use the rule in a lambda
axom::for_all<ExecSpace>(n, AXOM_LAMBDA(axom::IndexType i)
{
double sum = 0.;
for(int j = 0; j < rule.numPoints(); j++)
{
// These QuadratureRule methods would be AXOM_HOST_DEVICE to use them here
sum += rule.weight(j) * f(rule.node(j));
// Maybe we can do this now too...
axom::StackArray<double, MAXNODES> nodes, weights;
int nnodes = // something dynamic
calculate_gauss_legendre(nnodes, nodes, weights);
QuadratureRule newQR{axom::ArrayView<double>(nodes.data(), nnodes),
axom::ArrayView<double>(weights.data(), nnodes)};
// use newQR
}
someArray[i] = sum;
});
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay great, I think I got it pretty close then, aside from a handful of redundant cache lookups. Thank you!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BradWhitlock Would you mind taking another look at my implementation of the ArrayView version, and the test I'm using in numerics_quadrature.hpp (reproduced below) just to make sure this looks practical for future applications? In particular, I don't think it's necessary to be able generate rules from within the lambda, since the generation procedure isn't especially expensive, and by definition only needs to be done once for a given rule.
On the other hand, the node computation procedure within a certain rule is pretty well parallelizeable, and could hypothetically benefit from being ported to the GPU, but I don't think there's many contexts in which anyone could use a rule with an order high enough to make it worth it. Even generating ~100,000 points only takes about half a minute, and that was on a debug build.
template <typename ExecSpace>
struct test_device_quadrature
{
static void test()
{
const int npts = 15;
// Create the rule with the proper allocator
const auto rule =
axom::numerics::get_gauss_legendre(npts, axom::execution_space<ExecSpace>::allocatorID());
// Use the rule in a lambda to integrate the volume under std::sin(pi * x) on [0, 1]
axom::ReduceSum<ExecSpace, double> quadrature_sum(0.0);
axom::for_all<ExecSpace>(
npts,
AXOM_LAMBDA(axom::IndexType i) { quadrature_sum += rule.weight(i) * sin(rule.node(i)); });
EXPECT_NEAR(quadrature_sum.get(), 0.459697694132, 1e-6);
}
};
TEST(numerics_quadrature, get_nodes_seq) { test_device_quadrature<axom::SEQ_EXEC>::test(); }
#if defined(AXOM_RUNTIME_POLICY_USE_OPENMP)
TEST(numerics_quadrature, get_nodes_omp) { test_device_quadrature<axom::OMP_EXEC>::test(); }
#endif
#if defined(AXOM_RUNTIME_POLICY_USE_CUDA)
TEST(numerics_quadrature, get_nodes_cuda) { test_device_quadrature<axom::CUDA_EXEC<256>>::test(); }
#endif
#if defined(AXOM_RUNTIME_POLICY_USE_HIP)
TEST(numerics_quadrature, get_nodes_hip) { test_device_quadrature<axom::HIP_EXEC<256>>::test(); }
#endifThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jcs15c !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another pass through the latest changes -- looks great to me.
| // Evaluate the weight as w_i = 2 / (1 - z^2) / P'_n(z)^2, with z \in [-1, 1] | ||
| weights[i - 1] = weights[n - i] = 1.0 / (4.0 * xi * (1.0 - xi) * Pp_n * Pp_n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment doesn't (appear to) match the code -- presumably it's in how
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. The lines of code above say that xi = (1 - z) / 2, but it was a bit confusing where the extra factor of 1/2 came from. I've updated the comment to be more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now with the ArrayViews. Thanks!
Summary
This PR is a feature/enhancement that adds to
core/numericsa method of directly computing Gauss-Legendre quadrature nodes and weights without relying on the analogous methods in MFEM. These methods are used inwinding_number.hppandevaluate_integral.hpp. This significantly reduces the number of#ifdefguards that need to be used to check for an MFEM dependency, particularly relevant for PR #1672.Quadrature nodes are calculated in$$\arccos(x_i)$$ ), an additional interface is provided through
calculate_gauss_legendre(npts)through a Newton's method search which finds the roots of the Legendre polynomial of degreenpts. Although the Newton iterations converge rapidly from the initial guess (which correctly assumes the nodes are nearly equally distributed inget_gauss_legendre(npts)which stores the resulting quadrature rules in a static map so that repeated calls do not have to recompute the rule each time. General quadrature rules are stored in anumerics::QuadratureRuleobject which stores oneaxom::Array<double>of nodes and one of weights. This framework is readily extensible to definitions of other types of more specialized quadrature rules.The accuracy and stability of the quadrature nodes is tested against the MFEM rules for even large values of
npts, up to around ~10,000, but a rigorous implementation would use asymptotic formulas for evaluation of high-degree rules instead of Newton's method, as they could be evaluated significantly faster for such large orders. Because the new implementation is essentially identical to the original MFEM implementation, the two methods take about the same time to evaluate.