Relocation variants#6364
Conversation
2f6aedd to
087b01b
Compare
087b01b to
3916299
Compare
Quuxplusone
left a comment
There was a problem hiding this comment.
Some comments. I didn't look closely from the point of my last comment onward: reviewer silence doesn't necessarily mean endorsement. :)
| auto dest_first = std::prev(dest_last, count); | ||
|
|
||
| return parallel_uninitialized_relocate_n( | ||
| HPX_FORWARD(ExPolicy, policy), first, count, dest_first); |
There was a problem hiding this comment.
Whether this is OK depends on the preconditions of your parallel algorithms. I expect that this is not OK. The important thing IMO is that we want this to work:
Widget a[10] = {1,2,3,4,5,6,7,8,9,destroyed};
std::uninitialized_relocate_backward(a, a+9, a+10);
assert(a == {destroyed,1,2,3,4,5,6,7,8,9});
(modulo syntax errors and language-lawyering over the lifetimes of those Widget objects). If you copy them in the forward direction, the output will be something like
assert(a == {destroyed,1,1,1,1,1,1,1,1,1});
So the idea of having uninitialized_relocate_backward dispatch to uninitialized_relocate_n smells wrong to me.
There was a problem hiding this comment.
You are correct in pointing this out, however I should look into how (if at all) parallelizable this would be, while retaining the correct ordering.
There was a problem hiding this comment.
Naïvely, it's not parallelizable at all unless you can prove (by pointer comparisons) that the two ranges don't overlap. If they don't overlap, then this implementation is fine. It occurs to me that HPX already has a std::copy{,_backward} implementation; you should look at what it does, and copy that strategy.
(It might be that HPX's std::copy{,_backward} also completely ignore the overlapping issue. If so, then I think it would be perfectly fine for you to copy that ignorance here. uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.)
There was a problem hiding this comment.
From #6364 (comment):
In standard C++20 we have
- the non-parallel std::copy_backward, which forbids overlap
- the ExecutionPolicy overload of std::copy_backward, which also forbids overlap
and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes
- a non-parallel std::uninitialized_relocate_backward which permits overlap
- an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap
And noting that:
uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.
Are you proposing permitting overlapping ranges with parallel copy, or breaking the symmetry between copy and relocation?
I can experiment with implementing and benchmarking a parallel relocation algorithm for overlapping ranges.
There was a problem hiding this comment.
I'm proposing breaking the symmetry between copy and uninitialized_relocate. (But I admit that's bad; and I also admit I still don't know why copy forbids overlap in the first place.)
There was a problem hiding this comment.
Okay, for now I will structure the code in a way to accept overlaps from the expected direction in each of the forward/backward algorithms.
Do you think they should permit overlaps from both sides?
There was a problem hiding this comment.
My view is:
uninitialized_relocate{,_n}should permit "shift left" overlap, as needed byvector::erase.uninitialized_relocate_backwardshould permit "shift right" overlap, as needed byvector::insert.
My proposed wording reflects this since P1144R6: the behavior of uninitialized_relocate is "as if" by a plain old for-loop relocating each element in sequence from first to last, left to right.
| // if count is representing a negative value, we do nothing | ||
| if (hpx::parallel::detail::is_negative(count)) | ||
| { | ||
| return parallel::util::detail::algorithm_result<ExPolicy, | ||
| BiIter2>::get(HPX_MOVE(dest_last)); | ||
| } |
There was a problem hiding this comment.
This would be "library UB" according to P1144. You're welcome to treat it as a no-op (documented or undocumented), but it might be more appropriate to assert-fail or something, I don't know.
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
0d18168 to
bf8425d
Compare
|
@isidorostsa could you please rebase this and resolve the conflicts while you do? |
bf8425d to
74155d9
Compare
@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:
void foo() noexcept(!(mode == throwing)){
if constexpr(mode == throwing) { throw; }
} |
Yes, sounds good.
Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should strat cheating here ;-)
I can take care of those. |
@hkaiser In case we want to preserve parallelization while not overwriting objects I have two techniques in mind: A. Execute the overlapping portion of the relocation sequentially and do what is left in parallel. However, I suspect that in most common use cases, the ranges will largely overlap, rendering the operation mostly sequential. So it may be more practical to opt for a purely sequential implementation. Is there an alternative solution that I'm overlooking? |
|
retest lsu |
74155d9 to
1b08b70
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
|
@isidorostsa what's the status for this PR? |
I think last time I pushed the hpx tests were broken, which is why we have not merged yet But locally the tests related to this are passing. |
The broken tests are unrelated. So this is good to go, then? |
@hkaiser |
P1144R9 is out already, so we do need to rush merging this! |
961680e to
0f91e09
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesYou may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation |
| "Relocating from this source type to this destination type is " | ||
| "ill-formed"); | ||
|
|
||
| auto count = std::distance(first, last); |
There was a problem hiding this comment.
@hkaiser @Pansysk75 @gonidelis Considering #6364 (comment), it may be sensible to include a warning here such as:
"Calling uninitialized_relocate_backward with a non-sequential execution policy does not guarantee the order of execution of the relocations to be backward. Use on overlapping ranges could yield undefined behavior."
However, I am not sure how we would issue such a warning.
There was a problem hiding this comment.
There is no need to do that. The same warning applies to std::copy and std::move. There, it is up to the user to ensure these preconditions.
There was a problem hiding this comment.
There is no need to do that. [...] it is up to the user to ensure these preconditions.
Does that contradict your earlier #6364 (comment) ?
Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should start cheating here ;-)
Either "no overlap" is a precondition that you don't need to check, or else we expect overlap to be the common case for the backwards algorithm (because doing it forwards would overwrite values in the overlapping portion). But we can't have both!
FYI, I'm very interested in feedback into the design of these algorithms in P1144.
In standard C++20 we have
- the non-parallel
std::copy_backward, which forbids overlap - the
ExecutionPolicyoverload ofstd::copy_backward, which also forbids overlap
and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes
- a non-parallel
std::uninitialized_relocate_backwardwhich permits overlap - an
ExecutionPolicyoverload ofstd::uninitialized_relocate_backward, which also permits overlap
so that {parallel_,}vector::erase will be allowed to use std::uninitialized_relocate_backward (because I think that's an important feature). But this means inconsistency between copy_backward (which forbids overlap) and std::uninitialized_relocate_backward (which permits it); and also between std::uninitialized_relocate_backward (which permits overlap) and hpx::experimental::uninitialized_relocate_backward (which currently forbids it).
I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap (that is, check whether we're going to end up using memcpy, so we surely have contiguous ranges, and then check whether those contiguous ranges overlap) and if so, then don't call util::partitioner_with_cleanup — just do the whole range sequentially-not-in-parallel (or, for trivially relocatable types, use memmove instead of memcpy). (Or to put it a different way that implies more major surgery: implement a parallel_memmove primitive and then make the ExecutionPolicy overloads of uninitialized_relocate{,_n,_backward} dispatch straight to parallel_memmove. But then you get a parallel speedup only for trivially relocatable types, because for non-trivial types and/or non-pointer iterators you can't tell whether the ranges overlap or not? Yuck.)
OTOH, maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap for consistency with std::copy_backward. But if so, then I'd like to know what's your plan for implementing things like {parallel_,}vector::erase. I think the Standard Library needs some primitive algorithm that fits that use-case.
There was a problem hiding this comment.
@Quuxplusone Thank you for the detailed write-up, and sorry for my late response.
I think my previous comment: #6364 (comment), is incorrect and there does exist a way to parallelize the relocation of overlapping ranges, maybe like in the following image: (example for 2 threads and an overlapping offset of 1)
Supposing this is true, I see no reason to avoid offering this as a feature. Furthermore, I don't see why we would need non-overlapping ranges to be a precondition for relocation!
I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap
I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?
maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap .... If so, I'd like to know what's your plan for implementing things like {parallel_,}vector::erase
For the time being we do not implement parallel data structures, but there is a draft PR for using relocation inside hpx::detail::small_vector (which is like static_vector except it grows to dynamic storage after running out of static space) (isidorostsa#9).
0f91e09 to
84afafa
Compare
@hkaiser this is okay to merge if you agree with this |
|
@hkaiser I noticed that after the merge some tests are failing and I will get onto fixing that as soon as possible. |
|
@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that? |
I take this back. The issues are unrelated to your changes. I will fix them separately. |
Thanks for letting me know. |

This PR introduces additional features related to relocation, aimed at both internal and public use. The specifics are:
The algorithm
uninitialized_relocate_backwardand its underlying lower-level primitives have been implemented. Corresponding tests for these algorithms have also been added.The function
uninitialized_relocatehas been refactored and it now uses a sentinel based for loop, instead of calculating the number of iterations and internally callinguninitialized_relocate_n.The primitives of
uninitialized_relocate_backwardswill be utilized internally for thehpx::small_vector::insert()method.