Relocation variants by isidorostsa · Pull Request #6364 · TheHPXProject/hpx

isidorostsa · 2023-10-12T18:15:43Z

This PR introduces additional features related to relocation, aimed at both internal and public use. The specifics are:

The algorithm uninitialized_relocate_backward and its underlying lower-level primitives have been implemented. Corresponding tests for these algorithms have also been added.
The function uninitialized_relocate has been refactored: it now uses a sentinel-based for loop instead of calculating the number of iterations and internally calling uninitialized_relocate_n.

The primitives of uninitialized_relocate_backward will be utilized internally for the hpx::small_vector::insert() method.

Quuxplusone

Some comments. I didn't look closely from the point of my last comment onward: reviewer silence doesn't necessarily mean endorsement. :)

Quuxplusone · 2023-10-12T20:01:25Z

+                auto dest_first = std::prev(dest_last, count);
+
+                return parallel_uninitialized_relocate_n(
+                    HPX_FORWARD(ExPolicy, policy), first, count, dest_first);


Whether this is OK depends on the preconditions of your parallel algorithms. I expect that this is not OK. The important thing IMO is that we want this to work:

Widget a[10] = {1,2,3,4,5,6,7,8,9,destroyed};
std::uninitialized_relocate_backward(a, a+9, a+10);
assert(a == {destroyed,1,2,3,4,5,6,7,8,9});

(modulo syntax errors and language-lawyering over the lifetimes of those Widget objects). If you copy them in the forward direction, the output will be something like

assert(a == {destroyed,1,1,1,1,1,1,1,1,1});

So the idea of having uninitialized_relocate_backward dispatch to uninitialized_relocate_n smells wrong to me.
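To make the ordering problem concrete, here is a minimal sketch that uses plain ints in place of Widget, so nothing is ever "destroyed" and the stale source value simply stays in slot 0. The helper names (copy_left_to_right, copy_right_to_left, shift_right_forward, shift_right_backward) are hypothetical and exist only for this illustration; they are not HPX or P1144 functions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: copies [first, last) to the range starting at dest,
// walking left to right.
inline void copy_left_to_right(const int* first, const int* last, int* dest)
{
    for (; first != last; ++first, ++dest)
        *dest = *first;
}

// Hypothetical helper: copies [first, last) to the range ending at dest_last,
// walking right to left.
inline void copy_right_to_left(
    const int* first, const int* last, int* dest_last)
{
    while (last != first)
        *--dest_last = *--last;
}

// Shift the first nine elements one slot to the right, in the wrong order.
inline std::array<int, 10> shift_right_forward()
{
    std::array<int, 10> a{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
    copy_left_to_right(a.data(), a.data() + 9, a.data() + 1);
    // Each source slot was overwritten before it was read.
    return a;
}

// Same shift, walking from the back so no unread source is overwritten.
inline std::array<int, 10> shift_right_backward()
{
    std::array<int, 10> a{1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
    copy_right_to_left(a.data(), a.data() + 9, a.data() + 10);
    // Slots 1..9 now hold 1..9; slot 0 keeps its old value.
    return a;
}
```

With these inputs, shift_right_forward() yields ten copies of 1, while shift_right_backward() yields 1,1,2,3,4,5,6,7,8,9, which matches the two outcomes described above.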

You are correct in pointing this out. However, I should look into how parallelizable this would be (if at all) while retaining the correct ordering.

Naïvely, it's not parallelizable at all unless you can prove (by pointer comparisons) that the two ranges don't overlap. If they don't overlap, then this implementation is fine. It occurs to me that HPX already has a std::copy{,_backward} implementation; you should look at what it does, and copy that strategy.
(It might be that HPX's std::copy{,_backward} also completely ignore the overlapping issue. If so, then I think it would be perfectly fine for you to copy that ignorance here. uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.)

From #6364 (comment):

In standard C++20 we have

the non-parallel std::copy_backward, which forbids overlap

the ExecutionPolicy overload of std::copy_backward, which also forbids overlap

and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes

a non-parallel std::uninitialized_relocate_backward which permits overlap

an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap

And noting that:

uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.

Are you proposing permitting overlapping ranges with parallel copy, or breaking the symmetry between copy and relocation?

I can experiment with implementing and benchmarking a parallel relocation algorithm for overlapping ranges.

I'm proposing breaking the symmetry between copy and uninitialized_relocate. (But I admit that's bad; and I also admit I still don't know why copy forbids overlap in the first place.)

Okay, for now I will structure the code in a way to accept overlaps from the expected direction in each of the forward/backward algorithms.

Do you think they should permit overlaps from both sides?

My view is:

uninitialized_relocate{,_n} should permit "shift left" overlap, as needed by vector::erase.

uninitialized_relocate_backward should permit "shift right" overlap, as needed by vector::insert.

My proposed wording reflects this since P1144R6: the behavior of uninitialized_relocate is "as if" by a plain old for-loop relocating each element in sequence from first to last, left to right.
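As a rough illustration of the "as if by a plain for-loop" behavior, the sketch below relocates elements one at a time with move-construction followed by destruction. It assumes C++17, raw pointers, and a move constructor that does not throw; relocate_forward_sketch, relocate_backward_sketch, and insert_then_erase_demo are hypothetical names, and this is not the HPX or P1144 implementation.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical sketch: relocates [first, last) into uninitialized storage
// starting at dest, left to right. Tolerates "shift left" overlap.
template <typename T>
T* relocate_forward_sketch(T* first, T* last, T* dest)
{
    for (; first != last; ++first, ++dest)
    {
        ::new (static_cast<void*>(dest)) T(std::move(*first));
        std::destroy_at(first);    // the source slot is now uninitialized
    }
    return dest;
}

// Hypothetical sketch: relocates [first, last) into uninitialized storage
// ending at dest_last, right to left. Tolerates "shift right" overlap.
template <typename T>
T* relocate_backward_sketch(T* first, T* last, T* dest_last)
{
    while (last != first)
    {
        --last;
        --dest_last;
        ::new (static_cast<void*>(dest_last)) T(std::move(*last));
        std::destroy_at(last);
    }
    return dest_last;
}

// Mimics an insert at the front followed by an erase of the front.
inline std::string insert_then_erase_demo()
{
    std::allocator<std::string> alloc;
    std::string* buf = alloc.allocate(4);
    // Slots 0..2 are live; slot 3 is uninitialized.
    ::new (static_cast<void*>(buf + 0)) std::string("a");
    ::new (static_cast<void*>(buf + 1)) std::string("b");
    ::new (static_cast<void*>(buf + 2)) std::string("c");

    // "Shift right" to open a hole at slot 0, then fill it.
    relocate_backward_sketch(buf, buf + 3, buf + 4);
    ::new (static_cast<void*>(buf)) std::string("x");
    std::string after_insert = buf[0] + buf[1] + buf[2] + buf[3];

    // Destroy slot 0, then "shift left" to close the hole.
    std::destroy_at(buf);
    relocate_forward_sketch(buf + 1, buf + 4, buf);
    std::string after_erase = buf[0] + buf[1] + buf[2];

    std::destroy(buf, buf + 3);
    alloc.deallocate(buf, 4);
    return after_insert + "|" + after_erase;
}
```

In this sketch the forward loop is the one an erase-like operation needs and the backward loop is the one an insert-like operation needs, which is the asymmetry described in the two bullet points above.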

Quuxplusone · 2023-10-12T20:04:11Z

+            // if count is representing a negative value, we do nothing
+            if (hpx::parallel::detail::is_negative(count))
+            {
+                return parallel::util::detail::algorithm_result<ExPolicy,
+                    BiIter2>::get(HPX_MOVE(dest_last));
+            }


This would be "library UB" according to P1144. You're welcome to treat it as a no-op (documented or undocumented), but it might be more appropriate to assert-fail or something, I don't know.

codacy-production · 2023-10-12T22:09:01Z

Coverage summary from Codacy

Coverage variation	Diff coverage
-0.13%	80.38%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`103a7b8`)	190583	162311	85.17%
Head commit (`74155d9`)	191899 (+1316)	163188 (+877)	85.04% (-0.13%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#6364)	943	758	80.38%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

hkaiser · 2023-10-19T15:47:52Z

@isidorostsa could you please rebase this and resolve the conflicts while you do?

isidorostsa · 2023-10-19T16:34:11Z

@isidorostsa could you please rebase this and resolve the conflicts while you do?

@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:

Relocation variants #6364 (comment): should negative input sizes be treated as a no-op, similar to our move/copy algorithms? Arthur suggests using an assertion failure, and I'm leaning towards that as well. What do you think?
Relocation variants #6364 (comment): Does it make sense for the backward algorithm not to perform its operations in reverse order when executed in parallel? In that case, the only difference between the default and the backward algorithm in parallel mode would be the calling parameters.
Codacy false alarm: There are some warnings here that are inaccurate. Do you know how we can disable those? The code it is shouting about is essentially:

void foo() noexcept(!(mode == throwing)){
    if constexpr(mode == throwing) { throw; }
}
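For reference, a self-contained version of that pattern might look like the sketch below. The names exception_mode and foo are hypothetical (the PR's actual identifiers are not shown in this thread), and the bare `throw;` is replaced by throwing a std::runtime_error so the example can be exercised on its own. Presumably an analyzer that does not evaluate the `if constexpr` condition sees a throw inside a function it considers noexcept.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the mode switch used in the PR.
enum class exception_mode
{
    throwing,
    non_throwing
};

// The function is noexcept exactly when the throwing branch is compiled out.
template <exception_mode Mode>
void foo() noexcept(Mode != exception_mode::throwing)
{
    if constexpr (Mode == exception_mode::throwing)
    {
        // Discarded at compile time for exception_mode::non_throwing.
        throw std::runtime_error("thrown only by the throwing instantiation");
    }
}

// The exception specification follows the template argument.
static_assert(noexcept(foo<exception_mode::non_throwing>()));
static_assert(!noexcept(foo<exception_mode::throwing>()));
```

Both static_asserts hold under C++17, so the throw statement is never part of a noexcept instantiation.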

hkaiser · 2023-10-19T17:40:25Z

@isidorostsa could you please rebase this and resolve the conflicts while you do?

@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:

Relocation variants #6364 (comment): should negative input sizes be treated as a no-op, similar to our move/copy algorithms? Arthur suggests using an assertion failure, and I'm leaning towards that as well. What do you think?

Yes, sounds good.

Relocation variants #6364 (comment): Does it make sense for the backward algorithm not to perform its operations in reverse order when executed in parallel? In that case, the only difference between the default and the backward algorithm in parallel mode would be the calling parameters.

Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should start cheating here ;-)

Codacy false alarm: There are some warnings here that are inaccurate. Do you know how we can disable those? The code it is shouting about is essentially:
void foo() noexcept(!(mode == throwing)){
    if constexpr(mode == throwing) { throw; }
}

I can take care of those.

isidorostsa · 2023-10-19T20:17:07Z

you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way

@hkaiser In case we want to preserve parallelization while not overwriting objects I have two techniques in mind:

A. Execute the overlapping portion of the relocation sequentially and do what is left in parallel. However, I suspect that in most common use cases, the ranges will largely overlap, rendering the operation mostly sequential.
B. Break down the overlapping relocation into two non-overlapping ones, using an intermediate buffer. But if memory speed is a limiting factor, this could actually reduce performance. Additionally, the unexpected allocation of the buffer could be an issue.

So it may be more practical to opt for a purely sequential implementation. Is there an alternative solution that I'm overlooking?

hkaiser · 2023-10-23T14:13:00Z

retest lsu

codacy-production · 2023-10-23T17:49:32Z

Coverage summary from Codacy

Coverage variation	Diff coverage
-0.12%	80.38%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`103a7b8`)	190583	162311	85.17%
Head commit (`1b08b70`)	191899 (+1316)	163194 (+883)	85.04% (-0.12%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#6364)	943	758	80.38%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

hkaiser · 2023-11-08T14:50:20Z

@isidorostsa what's the status for this PR?

isidorostsa · 2023-11-08T14:53:58Z

@isidorostsa what's the status for this PR?

I think last time I pushed the hpx tests were broken, which is why we have not merged yet

But locally the tests related to this are passing.

hkaiser · 2023-11-08T19:22:36Z

@isidorostsa what's the status for this PR?

I think last time I pushed the hpx tests were broken, which is why we have not merged yet

But locally the tests related to this are passing.

The broken tests are unrelated. So this is good to go, then?

Pansysk75 · 2023-11-08T21:37:05Z

The broken tests are unrelated. So this is good to go, then?

@hkaiser
Oh, the tests aren't really running. Take a look here: https://cdash.cscs.ch/build/118631
It's related to this: #6377
I apologize I didn't make that very clear.
I have put fixing them on the top of my queue.

isidorostsa · 2023-11-09T14:47:29Z

I have put fixing them on the top of my queue.

P1144R9 is out already, so we do need to rush merging this!

codacy-production · 2023-11-14T16:10:21Z

Coverage summary from Codacy

Coverage variation	Diff coverage
✅ -0.30%	✅ 89.19%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`103a7b8`)	0	0	84.75%
Head commit (`323904c`)	197388 (+197388)	166693 (+166693)	84.45% (-0.30%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#6364)	925	825	89.19%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation.

isidorostsa · 2023-12-03T11:43:00Z

+                "Relocating from this source type to this destination type is "
+                "ill-formed");
+
+            auto count = std::distance(first, last);


@hkaiser @Pansysk75 @gonidelis Considering #6364 (comment), it may be sensible to include a warning here such as:
"Calling uninitialized_relocate_backward with a non-sequential execution policy does not guarantee the order of execution of the relocations to be backward. Use on overlapping ranges could yield undefined behavior."

However, I am not sure how we would issue such a warning.

There is no need to do that. The same warning applies to std::copy and std::move. There, it is up to the user to ensure these preconditions.

There is no need to do that. [...] it is up to the user to ensure these preconditions.

Does that contradict your earlier #6364 (comment) ?

Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should start cheating here ;-)

Either "no overlap" is a precondition that you don't need to check, or else we expect overlap to be the common case for the backwards algorithm (because doing it forwards would overwrite values in the overlapping portion). But we can't have both!

FYI, I'm very interested in feedback into the design of these algorithms in P1144.

In standard C++20 we have

the non-parallel std::copy_backward, which forbids overlap

the ExecutionPolicy overload of std::copy_backward, which also forbids overlap

and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes

a non-parallel std::uninitialized_relocate_backward which permits overlap

an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap

so that {parallel_,}vector::erase will be allowed to use std::uninitialized_relocate_backward (because I think that's an important feature). But this means inconsistency between copy_backward (which forbids overlap) and std::uninitialized_relocate_backward (which permits it); and also between std::uninitialized_relocate_backward (which permits overlap) and hpx::experimental::uninitialized_relocate_backward (which currently forbids it).

I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap (that is, check whether we're going to end up using memcpy, so we surely have contiguous ranges, and then check whether those contiguous ranges overlap) and if so, then don't call util::partitioner_with_cleanup — just do the whole range sequentially-not-in-parallel (or, for trivially relocatable types, use memmove instead of memcpy). (Or to put it a different way that implies more major surgery: implement a parallel_memmove primitive and then make the ExecutionPolicy overloads of uninitialized_relocate{,_n,_backward} dispatch straight to parallel_memmove. But then you get a parallel speedup only for trivially relocatable types, because for non-trivial types and/or non-pointer iterators you can't tell whether the ranges overlap or not? Yuck.)

OTOH, maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap for consistency with std::copy_backward. But if so, then I'd like to know what's your plan for implementing things like {parallel_,}vector::erase. I think the Standard Library needs some primitive algorithm that fits that use-case.
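The overlap check suggested above could be sketched roughly as follows for the contiguous, trivially relocatable case. This assumes raw pointers and uses std::is_trivially_copyable as a stand-in for a trivially-relocatable trait. The names contiguous_ranges_overlap, relocation_path, and relocate_trivially_sketch are hypothetical and are not part of HPX. std::less is used because it provides a total order over pointers even when they point into different arrays.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <type_traits>

// Hypothetical helper: true if [first, first + n) and [dest, dest + n)
// share at least one element.
template <typename T>
bool contiguous_ranges_overlap(const T* first, std::size_t n, const T* dest)
{
    std::less<const T*> before;
    return before(first, dest + n) && before(dest, first + n);
}

// Reports which strategy the sketch picked, so callers can observe it.
enum class relocation_path
{
    parallel_memcpy_allowed,
    sequential_memmove
};

template <typename T>
relocation_path relocate_trivially_sketch(T* first, std::size_t n, T* dest)
{
    static_assert(std::is_trivially_copyable_v<T>,
        "stand-in for a trivially-relocatable check");

    if (contiguous_ranges_overlap(first, n, dest))
    {
        // Overlap: fall back to one sequential memmove.
        std::memmove(dest, first, n * sizeof(T));
        return relocation_path::sequential_memmove;
    }

    // Disjoint: a real implementation could split this into chunks and
    // hand them to separate workers; a single memcpy stands in for that.
    std::memcpy(dest, first, n * sizeof(T));
    return relocation_path::parallel_memcpy_allowed;
}
```

For non-pointer iterators or non-trivial types no such check is available in general, which is the limitation raised in the comment above.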

@Quuxplusone Thank you for the detailed write-up, and sorry for my late response.

I think my previous comment: #6364 (comment), is incorrect and there does exist a way to parallelize the relocation of overlapping ranges, maybe like in the following image: (example for 2 threads and an overlapping offset of 1)

Supposing this is true, I see no reason to avoid offering this as a feature. Furthermore, I don't see why we would need non-overlapping ranges to be a precondition for relocation!

I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap

I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?

maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap .... If so, I'd like to know what's your plan for implementing things like {parallel_,}vector::erase

For the time being we do not implement parallel data structures, but there is a draft PR for using relocation inside hpx::detail::small_vector (which is like static_vector except it grows to dynamic storage after running out of static space) (isidorostsa#9).

isidorostsa · 2023-12-14T16:03:32Z

I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?

@hkaiser this is okay to merge if you agree with this

isidorostsa · 2024-01-09T21:38:10Z

@hkaiser I noticed that after the merge some tests are failing and I will get onto fixing that as soon as possible.

hkaiser · 2024-01-10T15:28:45Z

@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that?

hkaiser · 2024-01-10T15:34:11Z

@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that?

I take this back. The issues are unrelated to your changes. I will fix them separately.

isidorostsa · 2024-01-10T15:44:47Z

I take this back. The issues are unrelated to your changes. I will fix them separately.

Thanks for letting me know.

isidorostsa requested a review from hkaiser as a code owner October 12, 2023 18:15

hkaiser added category: core type: enhancement type: compatibility issue labels Oct 12, 2023

isidorostsa force-pushed the relocation_variants branch from 2f6aedd to 087b01b Compare October 12, 2023 18:21

hkaiser reviewed Oct 12, 2023

Comment thread libs/core/type_support/include/hpx/type_support/uninitialized_relocate_n_primitive.hpp Outdated

isidorostsa force-pushed the relocation_variants branch from 087b01b to 3916299 Compare October 12, 2023 19:02

Quuxplusone reviewed Oct 12, 2023

isidorostsa force-pushed the relocation_variants branch from 0d18168 to bf8425d Compare October 17, 2023 18:03

isidorostsa added 6 commits October 19, 2023 19:10

relocation backward, n primitives

7b9e036

relocation backward, n + tests

027902e

fix returned value of in_out results

579c5af

Utilize P1144's library functions if present

ef39778

Heavily refactor, consolidate relocation tests

32a3f98

change relocation primitives filename

659e0ea

isidorostsa force-pushed the relocation_variants branch from bf8425d to 74155d9 Compare October 19, 2023 16:12

fix typos, inspect

1b08b70

isidorostsa force-pushed the relocation_variants branch from 74155d9 to 1b08b70 Compare October 23, 2023 14:29

isidorostsa force-pushed the relocation_variants branch 2 times, most recently from 961680e to 0f91e09 Compare November 14, 2023 11:53

isidorostsa commented Dec 3, 2023

uninitialized_relocate_backward docs

84afafa

isidorostsa force-pushed the relocation_variants branch from 0f91e09 to 84afafa Compare December 3, 2023 11:44

isidorostsa added 2 commits December 8, 2023 13:55

permit (expected side) overlaps on relocation

4fa59fd

tidying up tests

323904c

hkaiser merged commit fcac367 into TheHPXProject:master Jan 9, 2024

hkaiser added this to the 1.10.0 milestone Jan 15, 2024
