
Fix collect() failure when pandas columns contain NULL values, date type mapping and tune-grid pipe placeholder compatibility #178

Merged
edgararuiz merged 13 commits into mlverse:main from tobiasdut:bugfix/date-datatype
Mar 8, 2026

@tobiasdut
Contributor

Summary

This PR fixes three issues: two in Spark → pandas → R collection, and one parse error in tune-grid.R.


1) NULL columns broke tibble conversion

Before

Collecting tables with NULL-heavy columns failed:

sparklyr::sdf_sql(sc, "SELECT * FROM ...") %>% head()

Error:

Error in `dplyr::as_tibble()`:
! All columns in a tibble must be vectors.
✖ Column `X` is NULL.
✖ Column `Y` is NULL.
...

Fix

to_pandas_cleaned() now flattens fallback output into proper R vectors while preserving row count, preventing literal NULL columns.

After

head() collects successfully without tibble errors.
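
The flattening idea can be sketched in base R as follows. This is a minimal illustration, not the code from this PR: the helper name `flatten_column()` and its `n_rows` argument are hypothetical, and the real to_pandas_cleaned() may handle more cases.

```r
# Hypothetical helper (an assumption for illustration, not the PR's code):
# turn a degraded column into something a tibble accepts, keeping one value per row.
flatten_column <- function(col, n_rows) {
  # A column that collapsed to a literal NULL becomes an all-NA vector
  if (is.null(col)) {
    return(rep(NA, n_rows))
  }
  # Atomic vectors are already valid columns
  if (!is.list(col)) {
    return(col)
  }
  # Replace NULL / zero-length elements with NA so no row is dropped
  is_empty <- vapply(col, function(el) length(el) == 0, logical(1))
  col[is_empty] <- list(NA)
  # Flatten only when every element is a scalar; otherwise keep the list column
  if (all(lengths(col) == 1)) unlist(col, use.names = FALSE) else col
}

x <- flatten_column(list(1, NULL, 3), n_rows = 3)
y <- flatten_column(NULL, n_rows = 3)
df <- data.frame(X = x, Y = y)
```

Replacing empty elements with NA before calling unlist() matters because unlist() silently drops NULL entries, which would shorten the column and break the row count.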


2) Spark date returned as <dbl>

Before

Date columns sometimes arrived as numeric (days since epoch):

sparklyr::sdf_sql(sc, "...") %>% select(modification_date)
modification_date
<dbl>
17680
17301
...

Fix

If py_type == "date", values are explicitly converted via:

as.Date(x, origin = "1970-01-01")

across degraded R-side types (numeric, integer, character, list, logical).

After

modification_date
<date>
2018-05-29
2017-05-15
...
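
A rough sketch of this kind of conversion, using the two values shown above. The helper name `coerce_to_date()` and its branches are assumptions made for illustration; the PR's actual handling of each degraded type may differ.

```r
# Hypothetical helper (not the PR's code): coerce a degraded column to Date.
coerce_to_date <- function(x) {
  # Already a Date: nothing to do
  if (inherits(x, "Date")) {
    return(x)
  }
  if (is.list(x)) {
    # NULL elements become NA so the result keeps one value per row
    x <- lapply(x, function(el) if (length(el) == 0) NA else el)
    x <- unlist(x, use.names = FALSE)
  }
  # ISO-formatted strings are parsed directly
  if (is.character(x)) {
    return(as.Date(x))
  }
  # numeric, integer and all-NA logical are treated as days since the epoch
  as.Date(as.numeric(x), origin = "1970-01-01")
}

dates <- coerce_to_date(c(17680, 17301))
format(dates)
# "2018-05-29" "2017-05-15"
```

Converting through as.numeric() first lets integer and all-NA logical columns share the same days-since-epoch path as doubles.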

3) tune-grid.R pipe placeholder parse error

Before

... |> _$val

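On R versions before 4.3, where the placeholder cannot be used in an extraction call, this could fail to parse with: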

pipe placeholder can only be used as a named argument

Fix

Replaced with parser-safe equivalent:

... |> (\(x) x$val)()

Behavior unchanged; compatibility improved.
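
As a small illustration of why the replacement is safer, the snippet below uses a made-up list `res` (not taken from tune-grid.R). It checks at run time whether the placeholder form parses on the current R version, then extracts the element with the anonymous-function form, which needs only R 4.1 or later.

```r
# `res` is a hypothetical stand-in for whatever object the pipeline produces
res <- list(val = 42)

# parse() lets us test the placeholder syntax without breaking this script
# on R versions that reject it at parse time
placeholder_ok <- tryCatch(
  {
    parse(text = "res |> _$val")
    TRUE
  },
  error = function(e) FALSE
)

# Parser-safe equivalent: pipe into an immediately invoked anonymous function
out <- res |> (\(x) x$val)()
out
# 42
```

Because the failure is a parse error, it affects the whole file when it is loaded, not just the line that uses the placeholder.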

@tobiasdut
Contributor Author

Related to #177

@dabruehl

dabruehl commented Mar 5, 2026

@edgararuiz Could you verify the fix and merge it please?

Thx @tobiasdut

@edgararuiz edgararuiz merged commit e6ea364 into mlverse:main Mar 8, 2026
4 checks passed
@tobiasdut tobiasdut deleted the bugfix/date-datatype branch March 8, 2026 22:14
