Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
PyArrow is a massive dependency. Unpacked, it tends to be >100MB in size and, until the latest versions (I think?), it also required numpy as a non-optional dependency of its own.
It's also, in effect, the only current dependency:
datafusion-python/pyproject.toml, line 46 in f0bbad7:
dependencies = ["pyarrow>=11.0.0", "typing-extensions;python_version<'3.13'"]
It would be great if we could remove it, since that would greatly reduce the minimal environment size for datafusion-python.
Many other Python Arrow libraries implement the PyCapsule Interface, so users could pick nanoarrow, arro3, Polars, DuckDB, etc., or pyarrow, whichever is best for them.
Describe the solution you'd like
The Arrow PyCapsule Interface is a lightweight, decentralized protocol for sharing Arrow data between Python libraries. We already implement the PyCapsule Interface, so it's just a matter of removing places where we hard-code use of pyarrow.
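To illustrate what "not hard-coding pyarrow" could look like, here is a minimal sketch of duck-typed detection. The dunder method names (`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`) come from the Arrow PyCapsule Interface specification; the helper `arrow_export_kind` and the `FakeTable` class are hypothetical names made up for this example and are not part of datafusion-python.

```python
from typing import Any


def arrow_export_kind(obj: Any) -> str:
    """Classify an object by which Arrow PyCapsule export method it offers.

    Hypothetical helper: it only checks for the protocol methods and never
    imports pyarrow or any other Arrow library.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"
    if hasattr(obj, "__arrow_c_array__"):
        return "array"
    if hasattr(obj, "__arrow_c_schema__"):
        return "schema"
    return "none"


class FakeTable:
    """Hypothetical stand-in for a table object from any Arrow library."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer would return a PyCapsule named "arrow_array_stream".
        raise NotImplementedError("illustration only")


print(arrow_export_kind(FakeTable()))  # stream
print(arrow_export_kind([1, 2, 3]))    # none
```

Code written this way accepts input from any library that implements the protocol, instead of checking `isinstance` against pyarrow types.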
Describe alternatives you've considered
Keep the pyarrow dependency.
Additional context