How to treat nested arrays? #48

Open
jeremystan opened this issue Sep 1, 2016 · 3 comments

@jeremystan
Collaborator

Nested arrays are difficult to work with. For example,

library(tidyjson)
library(dplyr)  # for filter(), used below

x <- '[[1, 2], 1]' %>% gather_array %>% json_types
x
#>   document.id array.index   type
#> 1           1           1  array
#> 2           1           2 number

At this point, there is no way to gather the next array unless we filter on type == 'array'.

x %>% gather_array("level2")
#> Error in gather_array(., "level2") : 1 records are not arrays
x %>% filter(type == "array") %>% gather_array("level2")
#>   document.id array.index  type level2
#> 1           1           1 array      1
#> 2           1           1 array      2

append_values_number works, but it returns NA for the array element, and recursive = TRUE does not reach into the second-level array. Further, the types within an array may be mixed.
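To make the filter-then-gather workaround concrete, here is a minimal base R sketch on an already-parsed list. It does not use tidyjson; `element_type` is a hypothetical helper that only mimics what json_types reports, and the column names are copied from the output above.

```r
# Parsed form of '[[1, 2], 1]' as a plain R list (assumption: no tidyjson here).
doc <- list(list(1, 2), 1)

# Hypothetical helper: label an element roughly the way json_types would.
element_type <- function(x) {
  if (is.list(x) && is.null(names(x))) {
    "array"
  } else if (is.list(x)) {
    "object"
  } else if (is.numeric(x)) {
    "number"
  } else {
    "other"
  }
}

types <- vapply(doc, element_type, character(1))

# Keep only the array elements, then emit one row per child element.
array_rows <- which(types == "array")
level2 <- do.call(rbind, lapply(array_rows, function(i) {
  data.frame(
    array.index = rep(i, length(doc[[i]])),
    level2 = seq_along(doc[[i]])
  )
}))
level2
```

The number element at index 2 simply drops out, which is the same information loss the filter workaround has.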

@colearendt
Owner

colearendt commented May 2, 2017

A similar-ish case that may be worth considering here is an array that has been improperly serialized as a single bare value, rather than a one-element array, when it has only one element. For example, JSON like:

x <- '[{"id": 1, "list":[1,2,3]}, {"id": 2, "list": 4}]'
x %>% gather_array() %>% 
  spread_values(id=jnumber('id')) %>%
  enter_object('list') %>%
  json_types()

While this is technically not a valid serialization of the data (the JSON itself still parses), it would still be nice to have a way to work with it. The workaround here is the same: filtering on type == 'array'.

I also posted the workaround in an actual question someone had here

@colearendt
Owner

colearendt commented Jun 6, 2017

Honestly, it seems all that is really needed here is a way to bypass the type-checking. The function itself already handles these cases fairly nicely when the type-check is removed. Not sure whether the better behavior is a parameter in the function or a global setting (an option or environment variable) like tidyjson.typesafety or something like that.
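As a rough sketch of the global-setting variant, the check could consult an R option before raising. Everything here is an assumption for illustration: tidyjson does not define a `tidyjson.typesafety` option, and `type_safety_enabled` and `check_all_arrays` are hypothetical names, not functions from the package.

```r
# Hypothetical: read a global option, defaulting to strict checking.
type_safety_enabled <- function() {
  isTRUE(getOption("tidyjson.typesafety", default = TRUE))
}

# Hypothetical stand-in for the type check inside the gather step.
# `types` is a character vector such as json_types would produce.
check_all_arrays <- function(types) {
  bad <- sum(types != "array")
  if (type_safety_enabled() && bad > 0) {
    stop(sprintf("%s records are not arrays", bad))
  }
  invisible(bad)
}

# With the option switched off, mixed types no longer raise an error.
old <- options(tidyjson.typesafety = FALSE)
check_all_arrays(c("array", "number"))
options(old)  # restore the previous setting
```

A function parameter would make the relaxed behavior visible at each call site, while an option changes it for a whole session; which is preferable is exactly the open question above.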

With the type-checking lines in gather_factory commented out:

x <- "[{\"id\": 1, \"list\":[1,2,3]}, {\"id\": 2, \"list\": 4}]"
x %>% gather_array() %>% enter_object("list") %>% json_types() %>%
  gather_array("array.index2") %>%
  json_types("type2")
#> # A tbl_json: 4 x 5 tibble with a "JSON" attribute
#>   `attr(., "JSON")` document.id array.index   type array.index2  type2
#>               <chr>       <int>       <int> <fctr>        <int> <fctr>
#> 1                 1           1           1  array            1 number
#> 2                 2           1           1  array            2 number
#> 3                 3           1           1  array            3 number
#> 4                 4           1           2 number            1 number

x <- "[[1, 2], 1]" %>% gather_array %>% json_types
x %>% gather_array("array.index2") %>% json_types("type2")
#> # A tbl_json: 3 x 5 tibble with a "JSON" attribute
#>   `attr(., "JSON")` document.id array.index   type array.index2  type2
#>               <chr>       <int>       <int> <fctr>        <int> <fctr>
#> 1                 1           1           1  array            1 number
#> 2                 2           1           1  array            2 number
#> 3                 1           1           2 number            1 number

Although perhaps it would be preferable for the array.index2 to be NA and thereby illustrate that it was not an array? Not sure which behavior is more consistent and desirable.
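For comparison, the NA alternative could look roughly like the following base R sketch on a parsed list. It is not tidyjson code; `gather_level_na` is a hypothetical name, and only the column names are borrowed from the output above.

```r
# Parsed form of '[[1, 2], 1]' as a plain R list (assumption: no tidyjson here).
doc <- list(list(1, 2), 1)

# Hypothetical: expand array elements into one row per child, and keep any
# non-array element as a single row whose second-level index is NA.
gather_level_na <- function(elements) {
  rows <- lapply(seq_along(elements), function(i) {
    el <- elements[[i]]
    if (is.list(el) && is.null(names(el))) {
      data.frame(
        array.index = rep(i, length(el)),
        array.index2 = seq_along(el)
      )
    } else {
      data.frame(array.index = i, array.index2 = NA_integer_)
    }
  })
  do.call(rbind, rows)
}

res <- gather_level_na(doc)
res
```

Here the scalar at index 2 survives as a row, but its NA index makes it clear that it was never an array.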

@colearendt
Owner

colearendt commented Jun 11, 2017

The change above is very problematic for objects: their keys are silently thrown away. A better proposal is required. Maybe a way to leave bad_types untouched and preserve them as NA?

'{"a":"one","b":"two","c":"three"}' %>% 
  gather_array() %>% 
  append_values_string()
#> # A tbl_json: 3 x 3 tibble with a "JSON" attribute
#>   `attr(., "JSON")` document.id array.index string
#>               <chr>       <int>       <int>  <chr>
#> 1         "\"one\""           1           1    one
#> 2         "\"two\""           1           2    two
#> 3       "\"three\""           1           3  three
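The key loss is easy to reproduce outside tidyjson. The base R sketch below (an illustration on a parsed list, not the package's code) gathers the same object once by position, as an unchecked array gather would, and once by key.

```r
# Parsed form of '{"a":"one","b":"two","c":"three"}' as a named R list.
obj <- list(a = "one", b = "two", c = "three")

# Positional gather: only the index survives, so the keys a, b, c are gone.
as_array <- data.frame(
  array.index = seq_along(obj),
  string = unlist(obj, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Key-aware gather: the keys are kept alongside the values.
as_object <- data.frame(
  name = names(obj),
  string = unlist(obj, use.names = FALSE),
  stringsAsFactors = FALSE
)

as_array
as_object
```

Nothing in `as_array` signals that the input was an object, which is why leaving such records untouched and marking them NA looks like the safer default.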
