Parallel periodic coupling #71

Open · pjaap wants to merge 1 commit into master from feature/parallel-coupling

Conversation

@pjaap (Member) commented Jul 4, 2025

This seems to work well now.

I timed the parts of the function in `Example312_PeriodicBoundary3D.main(h = 5e-6, order = 2, threads = XX)`
and get:

| threads | prepare dof-maps | box search | main loop | merging |
|---------|------------------|------------|-----------|---------|
| 1       | 6.5              | 0.6        | 26        | 0.08    |
| 2       | 6.5              | 0.4        | 14        | 0.18    |
| 4       | 6.3              | 0.26       | 9.6       | 0.4     |
| 8       | 6.4              | 0.16       | 9.6       | 0.6     |

My computer has 6 physical cores, so it saturates after 4 threads.
On our clusters the parallel efficiency is even better.

```julia
chunks = Iterators.partition(bfaces_of_interest, chunk_length)

# loop over boundary face indices in a chunk: we need this index for dofs_on_boundary
function compute_chunk_result(chunk)
```

Maybe the need for `local` comes from the fact that you define this function inside another one.
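
For illustration, a minimal sketch of the chunk-per-task pattern, assuming `compute_chunk_result` builds and returns one local result per chunk (as in the snippet above). Passing it and the face list in as arguments, instead of closing over variables of the enclosing function, is one way to sidestep the `local` annotations:

```julia
using Base.Threads

# Sketch only: `compute_chunk_result` stands in for the chunk worker from the PR.
# Taking it and the face list as arguments avoids capturing (and possibly boxing)
# variables from the enclosing function, i.e. the reason for `local`.
function run_chunks(compute_chunk_result, bfaces_of_interest, chunk_length)
    chunks = Iterators.partition(bfaces_of_interest, chunk_length)

    # one task per chunk of boundary faces
    tasks = [Threads.@spawn compute_chunk_result(chunk) for chunk in chunks]

    # wait for all tasks and collect the per-chunk results for a later merge
    return fetch.(tasks)
end
```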

@j-fu (Member) commented Jul 4, 2025

For the local matrices, you could assemble into instances of `SparseMatrixLNK`
and then use the overloaded `+` operator:

https://github.com/WIAS-PDELib/ExtendableSparse.jl/blob/a07808002f0fe2c875d2533291edd497d01e4157/src/matrix/sparsematrixlnk.jl#L297

This avoids lots of intermediate transformations in ExtendableSparse and the `sparse` call during the merge, and should be faster.
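
If I read this right, the pattern would look roughly like the sketch below. The `SparseMatrixLNK{Tv, Ti}(m, n)` constructor and the assumption that the linked `+` method merges an LNK matrix into a `SparseMatrixCSC` are taken from my reading of that file, so the signatures may need adjusting:

```julia
using ExtendableSparse, SparseArrays

n = 100

# each thread/chunk assembles into its own LNK matrix (constructor signature assumed)
local_matrices = [ExtendableSparse.SparseMatrixLNK{Float64, Int64}(n, n) for _ in 1:4]
for (t, A) in enumerate(local_matrices)
    A[t, t] = 1.0        # dummy values standing in for the coupling entries
    A[t, t + 1] = 0.5
end

# merge via the overloaded `+`, assumed to add an LNK matrix into a CSC matrix,
# creating missing entries and summing coinciding ones
merged = reduce((csc, lnk) -> lnk + csc, local_matrices; init = spzeros(n, n))
```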

@j-fu (Member) commented Jul 4, 2025

There is also the newer https://github.com/WIAS-PDELib/ExtendableSparse.jl/blob/master/src/matrix/sparsematrixdilnkc.jl with the corresponding `+` overload,

which avoids the need for a fully sized colind array in the sparse matrix format by replacing it with a dict.

@pjaap (Member, Author) commented Jul 4, 2025

Regarding the local matrices: the merging at the end is currently barely measurable in terms of time.
Can you elaborate on how to use the new ExtendableSparse types? I do not see how the `+` operator merges entries instead of summing them.

@j-fu (Member) commented Jul 4, 2025

But I think the main culprit is interpolate!. The number of allocations in this call grows proportionally to the number of nodes in the grid, so it seems to allocate for every node in the grid while we only work on two parts of the boundary. Somehow each interpolate! call must run over all elements; I think this should be fixed in the first place.

@pjaap (Member, Author) commented Jul 4, 2025

> But I think the main culprit is interpolate!. The number of allocations in this call grows proportionally to the number of nodes in the grid, so it seems to allocate for every node in the grid while we only work on two parts of the boundary. Somehow each interpolate! call must run over all elements; I think this should be fixed in the first place.

Yes, we are fighting on two different fronts here 😃

@j-fu (Member) commented Jul 4, 2025

> Regarding the local matrices: the merging at the end is currently barely measurable in terms of time. Can you elaborate on how to use the new ExtendableSparse types? I do not see how the `+` operator merges entries instead of summing them.

It essentially does both: it adds to existing entries and creates new ones if they are missing.

@pjaap (Member, Author) commented Jul 4, 2025

> It essentially does both: it adds to existing entries and creates new ones if they are missing.

Here we explicitly need no adding. Some entries are calculated multiple times, since the precomputed "searchareas" overlap, so adding would create a wrong result. This could be avoided with a Bool vector "allow list" that blocks inserting the same index twice... then a simple reduction with `+` should be possible.
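
A minimal sketch of that allow-list idea (all names are placeholders, not the PR's actual variables): each worker keeps a Bool vector marking the boundary dofs it has already written, so an entry produced a second time by an overlapping search area is skipped, and a later reduction with `+` cannot double-count it.

```julia
# Sketch only: `A` is a worker-local matrix, `written` marks boundary dofs that
# have already received their coupling entries, so duplicates coming from
# overlapping search areas are skipped instead of added twice.
function insert_once!(A, written::AbstractVector{Bool}, dofs, values)
    for (dof, val) in zip(dofs, values)
        written[dof] && continue   # already handled by another (overlapping) search area
        A[dof, dof] = val          # stand-in for the actual coupling entries
        written[dof] = true
    end
    return A
end

# usage sketch: written = falses(ndofs); insert_once!(A_local, written, dofs, vals)
```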

@j-fu (Member) commented Jul 4, 2025

Sitting with Christian: searchareas is empty and therefore the find loop goes over all cells...

@pjaap (Member, Author) commented Jul 4, 2025

oh what? This is definitely a regression.

@chmerdon (Member) commented Jul 4, 2025

Shouldn't we couple bregions 3 and 5 with the g function above?

@pjaap (Member, Author) commented Jul 4, 2025

Maybe, but then this should also trigger an error.

@chmerdon (Member) commented Jul 4, 2025

I thought so, too, but I didn't get an error with 3 and 5, and the gridplot also suggests these numbers.

@pjaap (Member, Author) commented Jul 4, 2025

Then you are right. I started with a 2D grid and reused the numbers since there was no error. Do we get meaningful search areas then?

I suggest throwing an error if the areas are empty.

@chmerdon (Member) commented Jul 4, 2025

I think that's why the searchareas are empty, which triggers the NodalInterpolator to evaluate at every node.

@chmerdon (Member) commented Jul 4, 2025

> I suggest throwing an error if the areas are empty.

Yes, that is a good idea.

@j-fu (Member) commented Jul 6, 2025

As for timing: in order to verify correct complexity for the single-threaded case, I would propose to have a scaling test (not necessarily in CI). I think we should have complexity O(number_of_surface_nodes_to_be_coupled). This means that execution time should increase by a factor of approximately 4 when going from h to h/2 (increasing nref by one). At the moment it seems to be much larger. Maybe things then become fast enough even without parallelization.
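
A rough sketch of such a scaling test, using the `main` keyword arguments mentioned in this thread; the concrete h values and the tolerance are placeholders, and each run is preceded by a warm-up call so compilation is not measured:

```julia
using Test

include("examples/Example312_PeriodicBoundary3D.jl")

# time one single-threaded run, after a warm-up run to exclude compilation
function coupling_time(h; order = 2)
    Example312_PeriodicBoundary3D.main(; order, h, threads = 1)
    return @elapsed Example312_PeriodicBoundary3D.main(; order, h, threads = 1)
end

# halving h roughly quadruples the number of coupled surface nodes, so an
# O(number_of_surface_nodes_to_be_coupled) implementation should take ~4x longer
h = 1e-4                      # placeholder mesh size
t_coarse = coupling_time(h)
t_fine = coupling_time(h / 2)
@test t_fine / t_coarse < 6   # generous bound around the expected factor of 4
```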

@chmerdon (Member) commented Jul 7, 2025

Right, so the number of calls of __eval_point scales with a factor of 4 (when bregions 3 and 5 are coupled), but the overall runtime does not...

@chmerdon (Member) commented Jul 7, 2025

Also, the duration and allocations of each interpolate! call stay constant when I refine, so maybe something else in the loop causes the bad scaling?

@chmerdon (Member) commented Jul 7, 2025

Aha, the loop below `# set entries` scales with a factor of 8.

@pjaap (Member, Author) commented Jul 7, 2025

The bad scaling of the interpolation loop was caused by using a dense vector for fe_vector_target. We experimented with a sparse vector implementation, and this solves the problem.

The interpolation loop now scales optimally.

The bottleneck (with a much smaller factor) is now the offline searchareas construction.

This PR needs WIAS-PDELib/ExtendableFEMBase.jl#43.
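
Not the PR's actual code, but the effect of the sparse target can be illustrated with a plain SparseVector: only the stored entries are visited afterwards, so the per-interpolation work no longer grows with the total number of dofs.

```julia
using SparseArrays

ndofs = 10^6                        # total number of dofs in the grid
target = spzeros(Float64, ndofs)    # sparse stand-in for fe_vector_target

# the interpolation writes only the handful of boundary dofs it touches ...
target[123] = 1.0
target[4567] = 0.25

# ... and the follow-up "set entries" loop visits only the stored entries
# instead of iterating over all ndofs entries of a dense vector
dofs, vals = findnz(target)
for (dof, val) in zip(dofs, vals)
    # set the corresponding coupling-matrix entries here
end
```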

@pjaap (Member, Author) commented Jul 21, 2025

I updated the test script to Example312:

```julia
using TestEnv; TestEnv.activate()
include("examples/Example312_PeriodicBoundary3D.jl")
Example312_PeriodicBoundary3D.main(order = 2, h = 1e-5, threads = xxx) # I added the kwarg 'threads'
```

and now get the following results (I have 6 cores / 12 threads):

| threads | total time | matrix assembly | matrix merging |
|---------|------------|-----------------|----------------|
| 1       | 18.6       | 15.4            | 0.0            |
| 2       | 12.2       | 9.0             | 0.05           |
| 3       | 10.4       | 7.1             | 0.1            |
| 6       | 8.4        | 5.1             | 0.2            |
| 12      | 10.0       | 6.2             | 0.6            |

But I'll open a separate PR for the sparse vector stuff; we are discussing different issues here.

pjaap force-pushed the feature/parallel-coupling branch from 7fe8c9c to 882efc1 on August 15, 2025
pjaap changed the title from "Start implementing parallel periodic coupling" to "Parallel periodic coupling" on Aug 15, 2025
@pjaap (Member, Author) commented Aug 15, 2025

All updated and rebased.

Benchmark in the description.

  • The box search is now also parallel (inner loop).
  • The matrix merging at the end is very cheap.
  • For 1 thread or parallel = false, no threads are spawned at all (see the sketch below).
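
A sketch of that last point (names are placeholders, not the PR's actual functions): the parallel path is only taken when it can actually help, otherwise the chunks are processed in a plain serial loop.

```julia
using Base.Threads

# Sketch: spawn tasks only if parallel execution is requested and more than one
# thread is available; otherwise fall back to a plain serial map.
function process_chunks(f, chunks; parallel::Bool = true)
    if !parallel || nthreads() == 1
        return map(f, chunks)                          # no tasks spawned at all
    else
        tasks = [Threads.@spawn f(chunk) for chunk in chunks]
        return fetch.(tasks)
    end
end
```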
