The duckplyr package aims to be a drop-in replacement for dplyr, with full behavior compatibility. To verify that, I'm running checks with a rigged version of dplyr. This package fails its checks in this scenario.
From the error message, I can't immediately tell what causes the failure. I'd appreciate your help: could you please help distill a reproducible example that shows how duckplyr behaves differently from dplyr in your use case?
The modified dplyr version can be installed with any of:
I'm not sure exactly what's happening behind the scenes, but I was able to narrow the problem down to an interaction between duckplyr and a tidymodels workflow using xgboost. It appears when step_rename() or step_mutate() is included in the recipe.
The example below doesn't rely on offsetreg, so I plan to close this issue, assuming we agree that the problem lies elsewhere (in recipes or somewhere else in the tidymodels ecosystem).
Running this example multiple times produces volatile results. That shouldn't be possible: a single tree is fit on one predictor across the full training set, so repeated fits should be identical. Sometimes an error is returned instead:
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
[08:08:35] src/data/data.cc:461: Check failed: valid: Label contains NaN, infinity or a value too large.
If we remove the call to methods_overwrite(), the last line returns the same result every time.
If the recipe is simplified to rec <- recipe(mpg ~ wt, mtcars) then everything works fine.
I experimented with adding other recipe steps like step_log() and didn't notice any issues.
If either step_rename() or step_mutate() is added to the recipe, results become volatile. Interestingly, this happens even when these steps are called with no arguments.
I haven't been able to reproduce this issue with any model or engine other than xgboost.
Details: https://github.com/krlmlr/dplyr/blob/f-revdep-duckplyr/revdep/problems.md .
Learn more about duckplyr: https://duckplyr.tidyverse.org/ .
Thanks a lot for your help! Please let me know if you have any questions.
Tracker: tidyverse/duckplyr#297.