-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort results on replica, merge on envd #30558
Conversation
e27537d
to
d991c6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo! Love to see this happening. IMO the biggest feedback I have is enforcing the invariant that peek results must sorted at the type level, and maybe at the same time reducing the repetitiveness of creating DatumVec
s and calling .sort_by(...)
.
It seems like everywhere we currently sort a Vec<Row>
we're immediately passing the results into RowCollection::new
. What if we push the sorting into RowCollection::new
? i.e.
impl RowCollection {
pub fn new(mut rows: Vec<Row>, finishing: &RowSetFinishing) -> Self { ... }
}
At which point RowCollection
is sorted so what's the point of SortedRowCollection
? It kind of feels like RowCollection
could naturally become a SortedRowRun
and then SortedRowCollection
becomes a collection of SortedRowRun
s? e.g.
struct SortedRowRun {
encoded: Bytes,
metadata: Arc<[EncodedRowMetadata]>,
}
struct SortedRowCollection {
runs: Vec<SortedRowRun>,
}
This is a much larger change, and I think only part one (pushing the sort into RowCollection
) is enough to get this across the line because it mostly solves the invariant that a RowCollection
must be sorted. But I think we can still do part 2 without having to touch the code related to result finishing since most of that should use a Box<dyn RowIterator>
IIRC.
src/repr/src/row/collection.rs
Outdated
while let Some(Reverse(mut finger)) = heap.pop() { | ||
view.push(finger.start); | ||
finger.start += 1; | ||
if finger.start < finger.end { | ||
heap.push(Reverse(finger)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there is a great opportunity to push this logic into SortedRowCollection
or SortedRowCollectionIter
maybe? i.e. as folks iterate through a row collection is when we do this streaming merge sort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I assume we're going to iterate through the result rows only once when we send them over the wire, so we can avoid having the extra view
buffer around and decoding the rows twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we need to iterate multiple times to determine the size of the whole result. I agree we shouldn't (and there is no deep reason we have to), but it requires more changes.
src/repr/src/row/collection.rs
Outdated
@@ -33,6 +35,8 @@ pub struct RowCollection { | |||
encoded: Bytes, | |||
/// Metadata about an individual Row in the blob. | |||
metadata: Vec<EncodedRowMetadata>, | |||
/// Start of sorted runs of rows in rows. | |||
fingers: Vec<usize>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation here confused me. This field actually stores the indexes of the ends of sorted runs, right? Is there a reason for that? It does feel like storing the start indexes would be more natural.
I know you said the PR lacks documentation, so if you still planned to adjust it here then nvm!
src/repr/src/row/collection.rs
Outdated
while let Some(Reverse(mut finger)) = heap.pop() { | ||
view.push(finger.start); | ||
finger.start += 1; | ||
if finger.start < finger.end { | ||
heap.push(Reverse(finger)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I assume we're going to iterate through the result rows only once when we send them over the wire, so we can avoid having the extra view
buffer around and decoding the rows twice.
We discussed a bit offline already, an my understanding is that this PR is meant to stop the bleeding with an as-small-as-possible diff and do improvements as a follow-up. That plan is fine with me.
I agree with that. Once concern is that we add some place where we create Another reason for wanting such a type is that we sometimes don't have to sort! Specifically, if the But if it's true that we don't need to sort if the struct RowRuns {
runs: Vec<RowCollection>,
order_by: Vec<ColumnOrder>,
}
impl RowRuns {
fn push(&mut self, mut rows: Vec<(Row, NonZeroU64)>) {
if !self.order_by.is_empty() {
sort(&mut rows, &self.order_by);
}
self.runs.push(RowCollection::new(&rows));
}
} |
I agree that strictly speaking there are cases where we don't have to sort, but I'm not comfortable changing the invariant as part of this PR. We might have downstream code that relies on a certain row order, as well as our tests, so I'd like to separate this from the current effort. |
d991c6c
to
0e6ee9d
Compare
I agree it could! It's a non-trivial departure to what we currently have: At the moment, we allow to index into a |
568a580
to
dab4bcd
Compare
Yes, I'm very much in favor of taking small steps! Just wanted to record my thoughts for follow-ups we can/should do. Also partly to check my thinking around whether or not sorting is necessary. |
MitigationsCompleting required mitigations increases Resilience Coverage.
Risk Summary:The pull request has a high risk score of 80, driven by predictors such as the "Sum Bug Reports Of Files" and the "Delta of Executable Lines." Historically, PRs with these predictors are 116% more likely to cause a bug than the repository baseline. The observed bug trend in the repository is steady. Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
e470aac
to
562c3d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Sorry I didn't realize pushing the sort into RowCollection
would require moving the struct to the expr
crate 🙈 thanks for making that change!
Thanks for the reviews! |
Sort results on replica, merge on environmentd.
Previously, we'd sort data only on evironmentd, which would cause it to consume more CPU than necessary. This change moves some of the sorting to clusterd, and only leaves the last merge step on environmentd.
The PR selects a minimal approach, and leaves most of the code related to result finishing untouched. It introduces an invariant that peek results must always be sorted according to the finishing, anything else will lead to undefined results. However, there's nothing that enforces the results to be sorted with the same ordering, which is potentially bad. Inside environmentd, it uses a simple heap to combine$k$ sorted runs into a single permutation map.
The interfaces to
RowCollection
(new
,sorted_view
) now take a&[ColumnOrder]
, and internally the implementation picks the right comparison function. If the column order slice is empty, it'll skip decoding the rows and directly defer to the tiebreaker.The PR moves the
RowCollection
type intomz-expr
, which isn't ideal. This is required because theColumnOrder
type is defined here, and we'd like to pass it to the constructor of the type. Alternatives would be to have a function here that passes the correct comparison function toRowCollection
, but that seems to be strictly worse than moving the type.I considered moving the type to
compute-types
, which seems a better fit, but not all uses ofRowCollection
depend oncompute-types
. If this is upsetting, I can think about alternatives.This complexity for sorting on the cluster is roughly$\frac{n}{k}\cdot\log \frac{n}{k}$ , where $n$ is the total number of result records, and $k$ the number of workers. The last merge step then has a time complexity of $n\cdot\log k$ to combine $k$ sorted runs into one.
Follow-up items include:
Bytes
allocation for all rows, and instead keep the individual allocations.RowCollections
are sorted equally.Tips to the reviewer
Don't look at individual commits.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.