[SEDONA-723] Add write format for (Geo)Arrow #1863

Draft: wants to merge 3 commits into master

paleolimbot (Member)

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [SEDONA-723] my subject.

What changes were proposed in this PR?

This PR is intended to add df.write.format("arrows") once complete (it is currently just an exploration of this idea).

How was this patch tested?

It will be covered by tests in Java (if this change seems worth it!).

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API (and will update docs if this idea is accepted!)

In SEDONA-660, SEDONA-714, and SEDONA-717, we wired up the ArrowSerializer from SparkConnect to accelerate transfer between the JVM and Python on the driver. For queries whose results are arbitrarily large, or whose size is unknown when the query is issued, collecting the result on the driver can cause out-of-memory errors, so it would be helpful to have an escape hatch. Writing results to Arrow files is also a useful way for Sedona users to build services on top of Sedona (e.g., by returning the URLs of the written Arrow files as described in https://arrow.apache.org/blog/2025/01/10/arrow-result-transfer/ ).
