---
title: "Configuration Options for Parsing from JSON"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Configuration Options for Parsing from JSON}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>"
)
```

```{r setup}
suppressPackageStartupMessages({
  library(yyjsonr)
})
```

Overview
-----------------------------------------------------------------------------

This vignette:

* introduces the `opts` argument for reading JSON with the `read_json_x()`
  family of functions
* outlines the creation of default options with `opts_read_json()`
* provides extended examples of how these options control parsing of JSON


The `opts` argument - Specifying options when reading JSON
-----------------------------------------------------------------------------

All `read_json_x()` functions have an `opts` argument. `opts` takes a named
list of options used to configure the way `yyjsonr` parses JSON into R objects.

The default argument for `opts` is an empty list, which internally sets the
default options for parsing.

The default options for parsing can also be viewed by running `opts_read_json()`.

The following three function calls are all equivalent ways of calling
`read_json_str()` using the default options:

```{r eval=FALSE}
read_json_str(str)
read_json_str(str, opts = list())
read_json_str(str, opts = opts_read_json())
```


Setting arguments to override the default options
-----------------------------------------------------------------------------

Setting a single option (and keeping all other options at their default value)
can be done in a number of ways.
The following three function calls are all equivalent:

```{r eval=FALSE}
read_json_str(str, opts = list(str_specials = 'string'))
read_json_str(str, opts = opts_read_json(str_specials = 'string'))
read_json_str(str, str_specials = 'string')
```


Option `promote_num_to_string` - mixtures of numeric and string types
-----------------------------------------------------------------------------

By default, `yyjsonr` does not promote numeric values to strings,
i.e. `promote_num_to_string = FALSE`. If an array contains mixed types, then an
R *list* will be returned, so that all JSON values retain their original type.

```{r}
json <- '[1,2,3.1,"apple", null]'
read_json_str(json)
```

If `promote_num_to_string` is set to `TRUE`, then `yyjsonr` will promote
numeric types to strings if the following conditions are met:

* values are stored in a JSON array
* the JSON array only contains numerics, strings or the JSON `null` value

```{r}
yyjsonr::read_json_str(json, promote_num_to_string = TRUE)
```


Option `df_missing_list_elem` - Missing list elements (when parsing data.frames)
-----------------------------------------------------------------------------

When JSON data is being parsed into an R data.frame, some columns become
*list-columns* if there are mixed types in the original JSON.

It is possible that some values are completely missing in the JSON
representation, and the `df_missing_list_elem` option specifies the replacement
for this missing value in the R data.frame. The default value is
`df_missing_list_elem = NULL`.


### JSON to data.frame (no *list-columns* needed)

```{r}
str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)
```

### JSON to data.frame - *list-columns* required

```{r}
str <- '[{"a":1, "b":[1,2]}, {"a":3, "b":2}]'
read_json_str(str)
```

```{r}
str <- '[{"a":1, "b":[1,2]}, {"a":2}]'
read_json_str(str)
read_json_str(str, df_missing_list_elem = NA)
```


Option `obj_of_arrs_to_df` - Reading JSON as a data.frame
-----------------------------------------------------------------------------

By default, if JSON looks like it represents a data.frame it will be loaded as
such. That is, a JSON `{}` object which contains only `[]` arrays (all of equal
length) will be treated as a data.frame. This is the default
i.e. `obj_of_arrs_to_df = TRUE`.

If `obj_of_arrs_to_df = FALSE` then this data will be read in as a named list.
In addition, if the `[]` arrays are not all the same length, then the data will
also be read in as a named list, as no inference of missing values will be done.

```{r}
str <- '{"a":[1,2],"b":["apple", "banana"]}'
read_json_str(str)
read_json_str(str, obj_of_arrs_to_df = FALSE)
```

```{r}
str_unequal <- '{"a":[1,2],"b":["apple", "banana", "carrot"]}'
read_json_str(str_unequal)
```


Option `arr_of_objs_to_df` - Reading JSON as a data.frame
-----------------------------------------------------------------------------

```{r}
str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)
read_json_str(str, arr_of_objs_to_df = FALSE)
```

```{r}
str <- '[{"a":1, "b":2}, {"a":3, "b":4, "c":99}]'
read_json_str(str)
```


Option `str_specials` - Reading string `"NA"` from JSON
-----------------------------------------------------------------------------

JSON only really has the value `null` for representing special missing values,
and this is converted to an R `NA_character_` value when it is encountered in a
string-ish context.

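As a minimal sketch of the `null` handling just described (the input string is
illustrative only), a `null` inside an array of strings is expected to come
back as a missing value in a character vector:

```{r}
library(yyjsonr)

# Illustrative input: a JSON array of strings that contains a null
x <- read_json_str('["apple", null]')

typeof(x)  # the result is expected to be a character vector
is.na(x)   # the position that held `null` is expected to be NA
```
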
When `yyjsonr` encounters a literal `"NA"` value in a string-ish context, its
conversion to an R value is controlled by the `str_specials` option.

The possible values for the `str_specials` argument are:

* `string` read in as the literal character string `"NA"` (the default behaviour)
* `special` read in as `NA_character_`

```{r}
str <- '["hello", "NA", null]'
read_json_str(str) # default: str_specials = 'string'
read_json_str(str, str_specials = 'special')
```


Option `num_specials` - Reading numeric `"NA"`, `"NaN"` and `"Inf"`
-----------------------------------------------------------------------------

JSON only really has the value `null` for representing special missing values,
and this is converted to an R `NA_integer_` or `NA_real_` value when it is
encountered in a number-ish context.

When `yyjsonr` encounters a literal `"NA"`, `"NaN"` or `"Inf"` value in a
number-ish context, its conversion to an R value is controlled by the
`num_specials` option.

The possible values for the `num_specials` argument are:

* `special` read in as an actual numeric `NA`, `NaN` or `Inf` value (the default behaviour)
* `string` read in as the literal character string `"NA"` etc

```{r}
str <- '[1.23, "NA", "NaN", "Inf", "-Inf", null]'
read_json_str(str) # default: num_specials = 'special'
read_json_str(str, num_specials = 'string')
```


Option `int64` - large integer support
-----------------------------------------------------------------------------

JSON supports large integers outside the range of R's 32-bit integer type.

When such a large value is encountered in JSON, the `int64` option controls
the value's representation in R.

The possible values for the `int64` option are:

* `string` store the JSON integer as a string in R
* `double` store the JSON integer as a double precision numeric. If the
  integer is outside the range +/- 2^53, then it may not be stored exactly in
  the double.
* `bit64` convert to a 64-bit integer supported by the
  [`{bit64}`](https://cran.r-project.org/package=bit64) package.

```{r echo=FALSE}
suppressPackageStartupMessages(
  library(bit64)
)
```

```{r}
str <- '[1, 274877906944]'

# default: int64 = 'string'
# Since result is a mix of types, a list is returned
read_json_str(str)

# Read large integer as double
robj <- read_json_str(str, int64 = 'double')
class(robj)
robj

# Read large integer as 'bit64::integer64' type
library(bit64)
read_json_str(str, int64 = 'bit64')
```


Option `length1_array_asis` - distinguishing scalars from length-1 vectors
-----------------------------------------------------------------------------

JSON supports the concept of both scalar and vector values i.e. in JSON the
scalar `67` is different from an array of length 1 `[67]`. R, however, does not
make this distinction between scalars and vectors of length 1. The
`length1_array_asis` option is for situations where it is important to
preserve this distinction in R.

To assist in translating objects from JSON to R and back to JSON, setting
`length1_array_asis = TRUE` will mark JSON arrays of length 1 with the class
`AsIs`. This option defaults to `FALSE`.

```{r}
read_json_str('67') |> str()
read_json_str('[67]') |> str()
read_json_str('67' , length1_array_asis = TRUE) |> str()
read_json_str('[67]', length1_array_asis = TRUE) |> str() # Has 'AsIs' class
```

This option is then used with the option `auto_unbox` when writing JSON in
order to control how length-1 R vectors are written. As shown below, if a
length-1 vector is marked with the `AsIs` class when reading, then writing it
out to JSON with `auto_unbox = TRUE` still produces a JSON array rather than a
scalar.

In the following example, only the second value (`[67]`) is affected by the
option `length1_array_asis`. When the option is `TRUE` the value is tagged with
a class of `AsIs`.
Then when the created R object is subsequently written out to a JSON string,
its structure is determined by `auto_unbox` which understands how to handle
this class.

```{r}
str <- '{"a":67, "b":[67], "c":[1,2]}'

# Length-1 vectors output as JSON arrays
read_json_str(str) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()

# Length-1 vectors output as JSON scalars
read_json_str(str) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()

# Length-1 vectors output as JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()

# !!!!
# Those values marked with 'AsIs' class when reading are output
# as length-1 JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()
```


Option `yyjson_read_flag` - internal `YYJSON` C library options
-----------------------------------------------------------------------------

The `yyjson` C library supports a number of internal options for reading JSON.

These options are considered advanced, and the user is referred to the
[`yyjson` documentation](https://ibireme.github.io/yyjson/doc/doxygen/html/md_doc__a_p_i.html#autotoc_md36)
for further explanation on what they control.

**Warning**: some of these advanced options do not make sense for interfacing
with R, or otherwise conflict with how this package converts JSON to R objects.

```{r}
# A reference list of all the possible YYJSON options
yyjsonr::yyjson_read_flag

read_json_str(
  "[1, 2, 3, ] // A JSON comment not allowed by the standard",
  opts = opts_read_json(yyjson_read_flag = c(
    yyjson_read_flag$YYJSON_READ_ALLOW_TRAILING_COMMAS,
    yyjson_read_flag$YYJSON_READ_ALLOW_COMMENTS
  ))
)
```
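
To see what such a flag changes, the following sketch parses the same
non-standard input with and without it. It assumes that `read_json_str()`
signals an ordinary R error when parsing fails under the default (strict)
flags, which is why the first call is wrapped in `tryCatch()`; the input
string is illustrative only.

```{r}
library(yyjsonr)

# Illustrative non-standard JSON: note the trailing comma
json <- "[1, 2, 3, ]"

# Default flags: strict parsing. Assumption: invalid JSON raises an R error,
# which is captured here instead of stopping the vignette.
strict <- tryCatch(read_json_str(json), error = function(e) e)
inherits(strict, "error")

# With the trailing-comma flag set, the same input is accepted
relaxed <- read_json_str(
  json,
  opts = opts_read_json(
    yyjson_read_flag = yyjson_read_flag$YYJSON_READ_ALLOW_TRAILING_COMMAS
  )
)
relaxed
```

Capturing the error rather than letting it propagate keeps the comparison in a
single chunk; in ordinary code it is usually simpler to enable only the flags
that the input is known to need.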