fix: better import exception for numpy #2397
Open
Related Issues
Fixes #2380
Context
When converting arbitrary row data to `pyarrow`, we try to use `pandas` to transpose row-wise data to column-wise data. If it is unavailable, we fall back to `numpy`.
Since `pyarrow >= 18`, numpy is no longer a dependency of pyarrow. Our pyarrow code assumed that numpy was available. There's no error or change related to our recent refactoring.
Problem
A user reported (#2380) that a pipeline with a SQL source fails with an import error when using `pyarrow >= 18` because of the missing numpy dependency.
Note: the reported config `pyarrow==18, python==3.12` doesn't match the `pyproject.toml` constraints `pyarrow < 18` for `python < 3.13`. I would expect the package manager to enforce the constraints and prevent this error.
Solution
Pyarrow is not a dependency of `dlt[sql_database]`, so numpy probably shouldn't be either. Also, apart from this code, only some LanceDB-related functions seem to assume that numpy is available.
Exception handling now raises a message similar to the one shown when pyarrow is missing.
Unfortunately, it's hard to raise earlier than `DBApiCursor`'s arrow-related methods.
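For reviewers, here is a minimal sketch of the fallback described above: try `pandas` first, then `numpy`, and raise an import error with an actionable message if neither is installed. This is not the code in this PR; `MissingTransposeBackendError`, `pick_transpose_backend` and the message wording are hypothetical names used only for illustration, and the injectable `import_module` argument exists only so the behaviour can be exercised without uninstalling packages.

```python
import importlib


class MissingTransposeBackendError(ImportError):
    """Hypothetical error type: neither pandas nor numpy can be imported."""


def pick_transpose_backend(import_module=importlib.import_module):
    """Return (name, module) for the first importable backend.

    pandas is preferred; numpy is the fallback, mirroring the order
    described in the PR text. `import_module` is injectable for testing.
    """
    for name in ("pandas", "numpy"):
        try:
            return name, import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError,
            # so a missing package lands here and we try the next one.
            continue
    raise MissingTransposeBackendError(
        "Converting rows to arrow tables requires pandas or numpy. "
        "pyarrow >= 18 no longer installs numpy, so install one of them "
        "explicitly, for example `pip install numpy`."
    )
```

Because the import is only attempted when the function is called, the error surfaces at the first arrow conversion rather than at module import time, which matches the limitation noted above.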