If the parquet file has factor/dictionary columns, read_parquet_duckdb() runs without errors. However, operations such as filtering on the factor column do not work. Interestingly, other verbs like count() do work.
This is clearly related to #88, but I did not find any documentation of the factor-variable limitation on the duckplyr website. The issue may be an interaction between filter() and read_parquet_duckdb(), since count() works.
Reprex (apologies for the file size; I didn't have time to trim it further):
county.parquet.zip
library(duckplyr)
d = read_parquet_duckdb("county.parquet")
count(d, contest)
#> # A duckplyr data frame: 2 variables
#> contest n
#> <chr> <int>
#> 1 house 194040
#> 2 president 181106
#> 3 senate 101726
nrow(filter(d, contest == "president"))
#> [1] 0
d2 = as_duckdb_tibble(collect(d))
count(d2, contest)
#> # A duckplyr data frame: 2 variables
#> contest n
#> <chr> <int>
#> 1 house 194040
#> 2 president 181106
#> 3 senate 101726
nrow(filter(d2, contest == "president"))
#> [1] 181106