[SEDONA-723] Add write format for (Geo)Arrow #1863

paleolimbot · 2025-03-18T19:08:07Z

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide

Is this PR related to a ticket?

Yes, and the PR name follows the format [SEDONA-723] my subject.

What changes were proposed in this PR?

When complete, this PR will add df.write.format("arrows"); for now, it is just an exploration of the idea.

How was this patch tested?

It will be tested with Java tests (if this change seems worth pursuing!)

Did this PR include necessary documentation updates?

Yes, I am adding a new API (and will update docs if this idea is accepted!)

In SEDONA-660, SEDONA-714, and SEDONA-717, we wired up the ArrowSerializer from Spark Connect to accelerate transfer between the JVM and Python on the driver. For queries whose results are arbitrarily large, or whose size is unknown when the query is issued, collecting everything on the driver can run out of memory, so it would be helpful to have an escape hatch. Writing results as Arrow files is also a useful way for Sedona users to build services on top of Sedona (e.g., by returning the URLs of the written Arrow files, as described in https://arrow.apache.org/blog/2025/01/10/arrow-result-transfer/).

paleolimbot added 2 commits March 17, 2025 15:49

arrow spark writer stub

6f093c9

one more

87a695c

github-actions bot added the sedona-spark label Mar 18, 2025

maybe build on more than one spark/scala combo

a2795f8


