
unnest_longer/unnest inconsistency with list columns of more lists? #1584

@oliverbeagley-pgg


I have some code that, depending on previous processes, can produce either a zero-row data frame with the correct columns and types, or rows where some columns are lists of further lists. For example, it could be something like:

foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

df_empty <- tibble::tibble(x = integer()) |>
  dplyr::mutate(y = lapply(x, foo))

df_empty
# # A tibble: 0 × 2
# # ℹ 2 variables: x <int>, y <list>

df_valued <- tibble::tibble(x = 1:3) |>
  dplyr::mutate(y = lapply(x, foo))

df_valued
# # A tibble: 3 × 2
#       x y         
#   <int> <list>    
# 1     1 <list [2]>
# 2     2 <list [2]>
# 3     3 <list [2]>

I'm trying to write code that will happily work with either and still produce proper columns, but I'm having issues getting unnest_longer to play well with the empty data frame compared to unnest (I'd prefer to use unnest_longer, as I believe it is clearer about what it is doing).

With unnest it is:

df_empty |> tidyr::unnest("y", ptype = list())
# # A tibble: 0 × 2
# # ℹ 2 variables: x <int>, y <list>

df_valued |> tidyr::unnest("y", ptype = list())
# # A tibble: 6 × 2
#       x y               
#   <int> <list>          
# 1     1 <named list [1]>
# 2     1 <named list [1]>
# 3     2 <named list [1]>
# 4     2 <named list [1]>
# 5     3 <named list [1]>
# 6     3 <named list [1]>

But trying something similar with unnest_longer fails on the empty data frame:

df_empty |> tidyr::unnest_longer("y", ptype = list())
# Error in `tidyr::unnest_longer()`:
# ! Can't convert `x` <logical> to <list>.
# Run `rlang::last_trace()` to see where the error occurred.

df_valued |> tidyr::unnest_longer("y", ptype = list())
# # A tibble: 6 × 2
#       x            y
#   <int> <list<list>>
# 1     1          [1]
# 2     1          [1]
# 3     2          [1]
# 4     2          [1]
# 5     3          [1]
# 6     3          [1]

I can see from the tibble info that the outputs differ slightly (a plain <list> column versus <list<list>>), but the downstream operations I'm using don't seem to care about this, e.g. tacking on |> tidyr::hoist("y", "a", .ptype = list(a = integer())) works with either for df_valued.

I've tried a bunch of the arguments of unnest_longer without much success. Is there something I'm missing, or is this a limitation of unnest_longer? As mentioned, I can use unnest, so I have a workaround, but it would be nice to use unnest_longer instead.
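
For anyone hitting the same error, here is a minimal sketch of one possible guard. It assumes that returning the zero-row input unchanged is acceptable, which matches what unnest returns for df_empty above. The helper name unnest_longer_safe is hypothetical and not part of tidyr, and the column type still differs between the two branches (a plain list for the empty case, <list<list>> otherwise).

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same helper as in the report: each x maps to a list of two named lists.
foo <- function(x) {
  list(
    list(a = x * 10),
    list(a = x * 100)
  )
}

df_empty <- tibble::tibble(x = integer()) |>
  dplyr::mutate(y = lapply(x, foo))

df_valued <- tibble::tibble(x = 1:3) |>
  dplyr::mutate(y = lapply(x, foo))

# Hypothetical wrapper (not a tidyr function): skip unnest_longer() when
# there are no rows, since that is the case that errors in the report.
unnest_longer_safe <- function(df, col) {
  if (nrow(df) == 0L) {
    # Assumption: a zero-row input can be passed through untouched.
    return(df)
  }
  tidyr::unnest_longer(df, tidyselect::all_of(col), ptype = list())
}

res_empty <- unnest_longer_safe(df_empty, "y")
res_valued <- unnest_longer_safe(df_valued, "y")
```

This only sidesteps the error; it does not explain why unnest_longer treats the empty list column differently from unnest.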
