design: lift max result size limitation #32211


Open · wants to merge 1 commit into `main`

Conversation

happening in `environmentd`, but could happen on the cluster side. We think it
is easier to lift result-size limitations for queries that don't require
post-processing. We should be able to just "stream them through", without
materializing in `environmentd`.
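The distinction between buffering and streaming can be sketched as below. This is a minimal illustration, not Materialize's actual code: the `Row` alias and function names are hypothetical, and a channel stands in for the compute protocol.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical row type; the real system uses Materialize's `Row`.
type Row = Vec<i64>;

/// Buffering approach: materialize the full result in memory before
/// replying. This is what the max-result-size limit guards against.
fn collect_all(rx: mpsc::Receiver<Row>) -> Vec<Row> {
    rx.into_iter().collect()
}

/// Streaming approach: forward each row to the client as it arrives,
/// so memory use is bounded regardless of result size. Only viable
/// for queries that need no post-processing in `environmentd`.
fn stream_through(rx: mpsc::Receiver<Row>, mut send_to_client: impl FnMut(Row)) {
    for row in rx {
        send_to_client(row);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..5 {
            tx.send(vec![i]).unwrap();
        }
    });
    let mut forwarded = 0usize;
    stream_through(rx, |_row| forwarded += 1);
    println!("forwarded {forwarded} rows"); // forwarded 5 rows
}
```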
Contributor:

FWIW, I think it wouldn't be super hard to support these cases using the Consolidator machinery we have now. Happy to chat about this if you're interested!

Member:

Nice! I was thinking this might be possible, glad to know you think the same!

`environmentd`) stream those results from persist back to the client.

We would use a (configurable) threshold for determining whether to send a
result inline in the compute protocol or out of band via the blob store.
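The threshold decision might look like the following sketch. The type and function names here are illustrative assumptions, not Materialize's actual compute-protocol types; in the real system the "stashed" arm would upload the batch to the blob store before returning a handle.

```rust
/// Hypothetical response shapes for a peek result (names are
/// illustrative, not the actual protocol types).
#[derive(Debug, PartialEq)]
enum PeekResult {
    /// Small results travel inline in the compute protocol response.
    Inline(Vec<u8>),
    /// Large results go out of band: the payload is written to the
    /// blob store and only a handle is sent over the protocol.
    Stashed { blob_key: String },
}

/// Pick the transport based on a configurable size threshold, in bytes.
fn encode_result(payload: Vec<u8>, threshold_bytes: usize, blob_key: String) -> PeekResult {
    if payload.len() <= threshold_bytes {
        PeekResult::Inline(payload)
    } else {
        // Assumed upload-to-blob-store step happens here in the real system.
        PeekResult::Stashed { blob_key }
    }
}

fn main() {
    let small = encode_result(vec![0u8; 100], 1024, "key-a".into());
    let large = encode_result(vec![0u8; 4096], 1024, "key-b".into());
    println!("{small:?}");
    println!("{large:?}");
}
```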
Contributor:

Sounds like this could use the same machinery as @ParkMyCar worked up for large insert / update queries, but streaming the resulting batch into a client instead of linking it into a shard?

Member:

I was thinking the same! I started to build an abstraction for this in #31189: I added a new crate there called `row-stash`, a response type called `PeekResponseUnary::Batches`, and finally the rows get returned to the user via `ExecuteResponse::SendingRowsStreaming`.

FWIW, some things in that PR are a bit hacked together, so the code might not be perfect 🙈

3 participants