Definition of a gtsummary Object

This vignette is meant for those who wish to contribute to {gtsummary}, or for users who wish to understand the inner workings of a {gtsummary} object so they can more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction to the package’s functions and for tutorials on advanced use.

Introduction

Every {gtsummary} table has a few characteristics common to all tables created with the package. Here, we review those characteristics and provide instructions on how to construct a {gtsummary} object.

library(gtsummary)
library(dplyr) # provides select(), used in the examples below

tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5)

tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)

Structure of a {gtsummary} object

Every {gtsummary} object is a list comprising, at minimum, these two elements: .$table_body and .$table_styling.

table_body

The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.
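As a concrete illustration, the sketch below hand-builds a small table_body in base R, copying a few rows (and only the stat_1 column) from the tbl_summary() output below. The helper has_required_cols() is hypothetical and not part of {gtsummary}; it only checks that the three required columns are present.

```r
# Hypothetical helper (not a {gtsummary} function):
# does a data frame contain the three columns a table_body must have?
has_required_cols <- function(df) {
  all(c("label", "row_type", "variable") %in% names(df))
}

# A small table_body, copied from the tbl_summary() example (stat_1 only)
table_body <- data.frame(
  variable = c("age", "age", "grade", "grade", "grade", "grade"),
  row_type = c("label", "missing", "label", "level", "level", "level"),
  label    = c("Age", "Unknown", "Grade", "I", "II", "III"),
  stat_1   = c("46 (37, 60)", "7", NA, "35 (36%)", "32 (33%)", "31 (32%)"),
  stringsAsFactors = FALSE
)

has_required_cols(table_body)                          # TRUE
has_required_cols(table_body[, c("label", "stat_1")])  # FALSE: row_type and variable dropped
```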

tbl_summary_ex$table_body
#> # A tibble: 8 × 7
#>   variable var_type    row_type var_label      label          stat_1      stat_2
#>   <chr>    <chr>       <chr>    <chr>          <chr>          <chr>       <chr> 
#> 1 age      continuous  label    Age            Age            46 (37, 60) 48 (3…
#> 2 age      continuous  missing  Age            Unknown        7           4     
#> 3 grade    categorical label    Grade          Grade          <NA>        <NA>  
#> 4 grade    categorical level    Grade          I              35 (36%)    33 (3…
#> 5 grade    categorical level    Grade          II             32 (33%)    36 (3…
#> 6 grade    categorical level    Grade          III            31 (32%)    33 (3…
#> 7 response dichotomous label    Tumor Response Tumor Response 28 (29%)    33 (3…
#> 8 response dichotomous missing  Tumor Response Unknown        3           4

table_styling

The .$table_styling object is a list of data frames containing information about how .$table_body is printed, formatted, and styled.
The list contains the following data frames: header, footnote, footnote_abbrev, fmt_fun, text_format, indent, fmt_missing, and cols_merge; and the following objects: source_note, caption, and horizontal_line_above.

header

The header table has one row per column found in .$table_body and the following columns. It contains styling information that applies to an entire column or to the column headers.

Column: Description
column: Column name from .$table_body
hide: Logical indicating whether the column is hidden in the output. This column is also available in modify_header() (and friends) for selecting columns
align: Alignment/justification of the column, e.g. ‘center’ or ‘left’
label: Label that will be displayed (if the column is displayed in the output)
interpret_label: The {gt} function used to interpret the column label, gt::md() or gt::html()
spanning_header: Text printed above columns as a spanning header
interpret_spanning_header: The {gt} function used to interpret the column spanning headers, gt::md() or gt::html()
modify_stat_{*}: Any column beginning with modify_stat_ is a statistic available to report in modify_header() (and friends)
modify_selector_{*}: Any column beginning with modify_selector_ is available in modify_header() (and friends) for selecting columns
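The relationship between header and table_body can be sketched in base R. This is a simplified model, assuming a header with only the column, hide, align, and label columns; the labels are illustrative, and in practice you would change them with modify_header() rather than editing the header table by hand.

```r
# A tiny table_body (rows taken from the tbl_summary() example)
table_body <- data.frame(
  variable = c("grade", "grade"),
  row_type = c("label", "level"),
  label    = c("Grade", "I"),
  stat_1   = c(NA, "35 (36%)"),
  stringsAsFactors = FALSE
)

# Simplified header: exactly one row per column of table_body
header <- data.frame(
  column = names(table_body),
  hide   = names(table_body) %in% c("variable", "row_type"),  # internal columns are hidden
  align  = ifelse(names(table_body) == "label", "left", "center"),
  label  = c("variable", "row_type", "**Characteristic**", "**Drug A**"),
  stringsAsFactors = FALSE
)

# Only the visible columns reach the printed table, under their header labels
visible <- header$column[!header$hide]
printed <- table_body[, visible]
names(printed) <- header$label[!header$hide]
print(printed)
```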

footnote & footnote_abbrev

Each {gtsummary} table may contain a single footnote per header and per cell within the table. Footnotes and footnote abbreviations are handled separately, in the footnote and footnote_abbrev tables. Updates/changes to footnotes are appended to the bottom of the tibble, and a footnote of NA_character_ deletes an existing footnote.

Column: Description
column: Column name from .$table_body
rows: Expression selecting rows in .$table_body; NA indicates the footnote is added to the column header
text_interpret: The {gt} function used to interpret the footnote text, e.g. gt::md
footnote: String containing the footnote to add to the column/row
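The append-then-override behavior can be modeled with a small base-R function. This sketch assumes header-level footnotes only (the rows column is omitted) and that the last instruction for a column wins; resolve_footnotes() is a hypothetical helper, not a {gtsummary} function, and the footnote text is illustrative.

```r
# Simplified header footnotes: instructions are appended over time
footnote <- data.frame(
  column   = c("stat_1", "stat_2", "stat_1"),
  footnote = c("Median (Q1, Q3); n (%)", "Median (Q1, Q3); n (%)", NA_character_),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the last instruction per column, then drop deletions (NA)
resolve_footnotes <- function(footnote) {
  last <- footnote[!duplicated(footnote$column, fromLast = TRUE), , drop = FALSE]
  last[!is.na(last$footnote), , drop = FALSE]
}

resolve_footnotes(footnote)
# only the stat_2 footnote remains; the NA row deleted the stat_1 footnote
```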

fmt_fun

Numeric columns/rows are styled with the functions stored in fmt_fun. Updates/changes to styling functions are appended to the bottom of the tibble.

Column: Description
column: Column name from .$table_body
rows: Expression selecting rows in .$table_body
fmt_fun: List of formatting/styling functions
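To show how a row expression and a formatting function work together, here is a base-R sketch. Each instruction stores an unevaluated expression created with quote() (standing in for the quosures {gtsummary} stores) that is evaluated against table_body to pick the rows; apply_fmt_fun() is hypothetical. The proportions use the Drug A grade counts shown elsewhere in this vignette.

```r
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "grade"),
  row_type = c("label", "level", "level", "level"),
  label    = c("Grade", "I", "II", "III"),
  p_1      = c(NA, 35 / 98, 32 / 98, 31 / 98),
  stringsAsFactors = FALSE
)

# One instruction: a column, an unevaluated row expression, and a function
fmt_fun <- list(
  list(column = "p_1",
       rows   = quote(row_type == "level"),
       fun    = function(x) paste0(round(x * 100), "%"))
)

# Hypothetical helper: apply instructions in order, so later ones overwrite earlier ones
apply_fmt_fun <- function(table_body, fmt_fun) {
  out <- table_body
  for (instr in fmt_fun) {
    col  <- instr$column
    rows <- eval(instr$rows, envir = table_body) & !is.na(table_body[[col]])
    formatted <- as.character(out[[col]])
    formatted[rows] <- instr$fun(table_body[[col]][rows])  # format the raw numbers
    out[[col]] <- formatted
  }
  out
}

apply_fmt_fun(table_body, fmt_fun)$p_1
# NA "36%" "33%" "32%"
```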

text_format

Columns/rows are styled with bold or italic text according to the instructions stored in text_format. Updates/changes to styling functions are appended to the bottom of the tibble. (In the example output below, indentation is stored separately, in the indent table.)

Column: Description
column: Column name from .$table_body
rows: Expression selecting rows in .$table_body
format_type: One of c('bold', 'italic')
undo_text_format: Logical indicating whether the formatting indicated should be undone/removed
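Because instructions are appended, a later instruction with undo_text_format = TRUE can remove formatting applied by an earlier one. The base-R sketch below models this, assuming instructions are processed top to bottom and rendering bold/italic as markdown markers purely for illustration (the real print engines apply their own styling); apply_text_format() is hypothetical.

```r
table_body <- data.frame(
  variable = c("grade", "grade", "marker"),
  row_type = c("label", "level", "label"),
  label    = c("Grade", "I", "Marker Level (ng/mL)"),
  stringsAsFactors = FALSE
)

# Read top to bottom: bold every label row, then undo it for the marker row
text_format <- list(
  list(column = "label", rows = quote(row_type == "label"),
       format_type = "bold", undo_text_format = FALSE),
  list(column = "label", rows = quote(variable == "marker"),
       format_type = "bold", undo_text_format = TRUE)
)

# Hypothetical helper: resolve the instructions, then mark up the cells
apply_text_format <- function(table_body, text_format) {
  markup <- c(bold = "**", italic = "_")
  flags <- list()  # one logical vector per "column:format_type" pair
  for (instr in text_format) {
    key  <- paste(instr$column, instr$format_type, sep = ":")
    rows <- eval(instr$rows, envir = table_body)
    if (is.null(flags[[key]])) flags[[key]] <- rep(FALSE, nrow(table_body))
    flags[[key]][rows] <- !instr$undo_text_format
  }
  out <- table_body
  for (key in names(flags)) {
    parts <- strsplit(key, ":", fixed = TRUE)[[1]]
    hit   <- flags[[key]]
    mk    <- markup[[parts[2]]]
    out[[parts[1]]][hit] <- paste0(mk, out[[parts[1]]][hit], mk)
  }
  out
}

apply_text_format(table_body, text_format)$label
# "**Grade**" "I" "Marker Level (ng/mL)"
```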

fmt_missing

By default, all NA values are shown as blanks. Missing values in specified columns/rows can instead be replaced with a symbol; for example, reference rows in tbl_regression() are shown with an em-dash. Updates/changes to these instructions are appended to the bottom of the tibble.

Column: Description
column: Column name from .$table_body
rows: Expression selecting rows in .$table_body
symbol: String to replace missing values with, e.g. an em-dash
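The base-R sketch below models this behavior, assuming NA cells are first rendered as blanks and that each fmt_missing instruction then places its symbol only in the missing cells of the selected rows. apply_fmt_missing() is hypothetical, and the non-missing estimates are made-up values used only for illustration.

```r
table_body <- data.frame(
  variable      = c("grade", "grade", "grade", "grade"),
  row_type      = c("label", "level", "level", "level"),
  label         = c("Grade", "I", "II", "III"),
  reference_row = c(NA, TRUE, FALSE, FALSE),
  estimate      = c(NA, NA, 1.5, -0.5),  # made-up values for illustration only
  stringsAsFactors = FALSE
)

# Show an em-dash (written as "\u2014") in the estimate column of the reference row
fmt_missing <- list(
  list(column = "estimate", rows = quote(reference_row %in% TRUE), symbol = "\u2014")
)

# Hypothetical helper: blank out every NA, then apply the symbol instructions
apply_fmt_missing <- function(table_body, fmt_missing) {
  out <- as.data.frame(
    lapply(table_body, function(x) ifelse(is.na(x), "", as.character(x))),
    stringsAsFactors = FALSE
  )
  for (instr in fmt_missing) {
    rows <- eval(instr$rows, envir = table_body) & is.na(table_body[[instr$column]])
    out[[instr$column]][rows] <- instr$symbol
  }
  out
}

apply_fmt_missing(table_body, fmt_missing)$estimate
```

The label row stays blank even though its estimate is also NA, because its row expression does not select it.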

cols_merge

This object is experimental and may change in the future. This tibble gives instructions for merging columns into a single column. The implementation in as_gt() will be updated after gt::cols_merge() gains a rows= argument.

Column: Description
column: Column name from .$table_body
rows: Expression selecting rows in .$table_body
pattern: glue pattern directing how to combine/merge columns. The merged columns replace the column indicated in ‘column’.
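The merge can be sketched in base R using the pattern from the tbl_regression() example below, "{conf.low}, {conf.high}". To stay dependency-free, this sketch substitutes the {name} placeholders with sub() instead of calling {glue}; merge_cols() is hypothetical and the interval bounds are made-up values.

```r
table_body <- data.frame(
  label     = c("Grade", "I", "II", "III"),
  conf.low  = c(NA, NA, "-1.2", "0.3"),  # made-up, already formatted values
  conf.high = c(NA, NA, "2.1", "1.9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill the pattern for the selected rows and store the
# result in `column` (the other merged columns would then be hidden)
merge_cols <- function(table_body, column, rows, pattern) {
  # names referenced in the pattern, e.g. c("conf.low", "conf.high")
  vars <- regmatches(pattern, gregexpr("(?<=\\{)[^}]+(?=\\})", pattern, perl = TRUE))[[1]]
  out <- table_body
  for (i in which(eval(rows, envir = table_body))) {
    s <- pattern
    for (v in vars) s <- sub(paste0("{", v, "}"), table_body[[v]][i], s, fixed = TRUE)
    out[[column]][i] <- s
  }
  out
}

res <- merge_cols(table_body, column = "conf.low",
                  rows = quote(!is.na(conf.low)),
                  pattern = "{conf.low}, {conf.high}")
res$conf.low
# NA NA "-1.2, 2.1" "0.3, 1.9"
```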

source_note

String that is made into the table source note. Its "text_interpret" attribute is either "md" or "html".

caption

String that is made into the table caption. Its "text_interpret" attribute is either "md" or "html".

horizontal_line_above

Expression identifying the row above which a horizontal line is placed in the table.
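These three objects are simple enough to model directly in base R. The sketch assumes the string-plus-attribute shape described above; the caption and source-note text are illustrative, and in practice you would set a caption with modify_caption() rather than building it by hand.

```r
# A caption/source note: a plain string carrying a "text_interpret" attribute
caption     <- structure("**Table 1.** Patient characteristics", text_interpret = "md")
source_note <- structure("Data source: <em>trial</em>", text_interpret = "html")

# horizontal_line_above: an unevaluated expression, evaluated against table_body
table_body <- data.frame(
  variable = c("age", "grade", "grade"),
  row_type = c("label", "label", "level"),
  label    = c("Age", "Grade", "I"),
  stringsAsFactors = FALSE
)
horizontal_line_above <- quote(variable == "grade" & row_type == "label")

attr(caption, "text_interpret")                         # "md"
which(eval(horizontal_line_above, envir = table_body))  # 2: the line goes above "Grade"
```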

Example from tbl_regression()

tbl_regression_ex$table_styling
#> $header
#> # A tibble: 24 × 8
#>    column             hide  align  interpret_label label  interpret_spanning_h…¹
#>    <chr>              <lgl> <chr>  <chr>           <chr>  <chr>                 
#>  1 variable           TRUE  center gt::md          varia… gt::md                
#>  2 var_label          TRUE  center gt::md          var_l… gt::md                
#>  3 var_type           TRUE  center gt::md          var_t… gt::md                
#>  4 reference_row      TRUE  center gt::md          refer… gt::md                
#>  5 row_type           TRUE  center gt::md          row_t… gt::md                
#>  6 header_row         TRUE  center gt::md          heade… gt::md                
#>  7 N_obs              TRUE  center gt::md          N_obs  gt::md                
#>  8 N                  TRUE  center gt::md          **N**  gt::md                
#>  9 coefficients_type  TRUE  center gt::md          coeff… gt::md                
#> 10 coefficients_label TRUE  center gt::md          coeff… gt::md                
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹​interpret_spanning_header
#> # ℹ 2 more variables: spanning_header <chr>, modify_stat_N <int>
#> 
#> $footnote
#> # A tibble: 0 × 4
#> # ℹ 4 variables: column <chr>, rows <list>, text_interpret <chr>,
#> #   footnote <chr>
#> 
#> $footnote_abbrev
#> # A tibble: 2 × 4
#>   column    rows      text_interpret footnote                
#>   <chr>     <list>    <chr>          <chr>                   
#> 1 conf.low  <quosure> gt::md         CI = Confidence Interval
#> 2 std.error <quosure> gt::md         SE = Standard Error     
#> 
#> $text_format
#> # A tibble: 1 × 4
#>   column  rows      format_type undo_text_format
#>   <chr>   <list>    <chr>       <lgl>           
#> 1 p.value <quosure> bold        FALSE           
#> 
#> $indent
#> # A tibble: 2 × 3
#>   column rows      n_spaces
#>   <chr>  <list>       <int>
#> 1 label  <lgl [1]>        0
#> 2 label  <quosure>        4
#> 
#> $fmt_missing
#> # A tibble: 4 × 3
#>   column    rows      symbol
#>   <chr>     <list>    <chr> 
#> 1 estimate  <quosure> —     
#> 2 conf.low  <quosure> —     
#> 3 std.error <quosure> —     
#> 4 statistic <quosure> —     
#> 
#> $fmt_fun
#> # A tibble: 10 × 3
#>    column      rows      fmt_fun
#>    <chr>       <list>    <list> 
#>  1 estimate    <quosure> <fn>   
#>  2 N           <quosure> <fn>   
#>  3 N_obs       <quosure> <fn>   
#>  4 n_obs       <quosure> <fn>   
#>  5 conf.low    <quosure> <fn>   
#>  6 conf.high   <quosure> <fn>   
#>  7 p.value     <quosure> <fn>   
#>  8 std.error   <quosure> <fn>   
#>  9 statistic   <quosure> <fn>   
#> 10 var_nlevels <quosure> <fn>   
#> 
#> $cols_merge
#> # A tibble: 1 × 3
#>   column   rows      pattern                
#>   <chr>    <list>    <chr>                  
#> 1 conf.low <quosure> {conf.low}, {conf.high}

Constructing a {gtsummary} object

table_body

When constructing a {gtsummary} object, the author will begin with the .$table_body object. Recall the .$table_body data frame must include columns "label", "row_type", and "variable". Of these columns, only the "label" column will be printed with the final results. The "row_type" column typically controls whether the label is indented. The "variable" column is often used by the inline_text() family of functions and when merging {gtsummary} tables with tbl_merge().

tbl_regression_ex %>%
  getElement("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 × 3
#>   variable row_type label               
#>   <chr>    <chr>    <chr>               
#> 1 grade    label    Grade               
#> 2 grade    level    I                   
#> 3 grade    level    II                  
#> 4 grade    level    III                 
#> 5 marker   label    Marker Level (ng/mL)

The other columns in .$table_body are created by the user and are usually printed in the output. Formatting and printing instructions for these columns are stored in .$table_styling.
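Here is a sketch of building such a table_body from scratch for one categorical variable, using the built-in iris data so it runs without extra packages. categorical_rows() is a hypothetical helper, and the 0/4-space indentation by row_type mirrors the n_spaces values in the indent table of the tbl_regression() example above.

```r
# Hypothetical helper: one "label" row plus one "level" row per category
categorical_rows <- function(data, variable, label = variable) {
  counts <- table(data[[variable]])
  pct <- round(100 * as.vector(counts) / sum(counts))
  data.frame(
    variable = variable,
    row_type = c("label", rep("level", length(counts))),
    label    = c(label, names(counts)),
    stat_0   = c(NA, paste0(as.vector(counts), " (", pct, "%)")),
    stringsAsFactors = FALSE
  )
}

table_body <- categorical_rows(iris, "Species", label = "Iris species")

# row_type drives indentation: label rows flush left, level rows indented 4 spaces
n_spaces <- ifelse(table_body$row_type == "label", 0, 4)
printed_label <- paste0(strrep(" ", n_spaces), table_body$label)
printed_label
# "Iris species" "    setosa" "    versicolor" "    virginica"
```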

table_styling

There are a few {gtsummary} functions, some internal and some exported, to assist in constructing and modifying the .$table_styling list.

  1. .create_gtsummary_object(table_body): After a user creates a table_body, pass it to this function; the skeleton of a gtsummary object is created and returned (including the full table_styling list of tables).

  2. .update_table_styling(): After columns are added to or removed from table_body, run this function to add or remove the corresponding styling instructions in .$table_styling. By default, each new column is hidden.

  3. modify_table_styling(): This exported function modifies the printing instructions for a single column or a group of columns.

  4. modify_table_body(): This exported function helps users make changes to .$table_body. It runs .update_table_styling() internally so that the printing instructions stay consistent with the table body.
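The bookkeeping described in step 2 can be sketched in base R on a header table reduced to the column and hide columns. sync_header() is a hypothetical stand-in, not the actual .update_table_styling() implementation; with {gtsummary} itself you would call modify_table_body() and then unhide a new column with modify_table_styling(columns = ..., hide = FALSE).

```r
# Hypothetical stand-in for the header bookkeeping done by .update_table_styling()
sync_header <- function(table_body, header) {
  # drop styling rows for columns that no longer exist in table_body
  kept <- header[header$column %in% names(table_body), c("column", "hide"), drop = FALSE]
  # add a hidden row for every column that is new to table_body
  new_cols <- setdiff(names(table_body), header$column)
  added <- data.frame(column = new_cols, hide = rep(TRUE, length(new_cols)),
                      stringsAsFactors = FALSE)
  out <- rbind(kept, added)
  rownames(out) <- NULL
  out
}

header <- data.frame(
  column = c("variable", "row_type", "label", "stat_1"),
  hide   = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
table_body <- data.frame(variable = "age", row_type = "label", label = "Age",
                         stat_1 = "46 (37, 60)", stringsAsFactors = FALSE)

table_body$stat_1 <- NULL             # remove a column
table_body$new_stat <- NA_character_  # add a (hypothetical) column

sync_header(table_body, header)
# stat_1 is gone; new_stat is appended with hide = TRUE
```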

Printing a {gtsummary} object

All {gtsummary} objects are printed with print.gtsummary(). Before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its input, and uses the information in .$table_styling to construct a list of {gt} calls that will be executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is then printed as any other {gt} object.

In some cases, a {gtsummary} object is printed with another engine instead, such as flextable (as_flex_table()), huxtable (as_hux_table()), kableExtra (as_kable_extra()), or kable (as_kable()). The default print engine is set with the theme element "pkgwide-str:print_engine".

While the actual print function is slightly more involved, it is basically this:

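print.gtsummary <- function(x, ...) {
  # engine chosen by the active theme; {gt} is used when no theme sets one
  print_engine <- get_theme_element("pkgwide-str:print_engine") %||% "gt"

  # convert to the chosen engine's table object, then print that object
  switch(
    print_engine,
    "gt" = as_gt(x),
    "flextable" = as_flex_table(x),
    "huxtable" = as_hux_table(x),
    "kable_extra" = as_kable_extra(x),
    "kable" = as_kable(x)
  ) %>%
    print()
}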

The .$cards object

When a gtsummary function that calculates new statistics is called, the results are stored as a tibble in the .$cards list (for example, .$cards[["tbl_summary"]] below). These tibbles are often calculated with functions from the {cards} and {cardx} packages.

These structured tibbles store the statistics along with their labels, the functions used to format them, and more. See the {cards} package documentation for details.

tbl_summary_ex$cards[["tbl_summary"]]
#> {cards} data frame: 76 x 12
#>    group1 group1_level variable variable_level stat_name stat_label  stat
#> 1     trt       Drug A    grade              I         n          n    35
#> 2     trt       Drug A    grade              I         N          N    98
#> 3     trt       Drug A    grade              I         p          % 0.357
#> 4     trt       Drug B    grade              I         n          n    33
#> 5     trt       Drug B    grade              I         N          N   102
#> 6     trt       Drug B    grade              I         p          % 0.324
#> 7     trt       Drug A    grade             II         n          n    32
#> 8     trt       Drug A    grade             II         N          N    98
#> 9     trt       Drug A    grade             II         p          % 0.327
#> 10    trt       Drug B    grade             II         n          n    36
#>    gts_column
#> 1      stat_1
#> 2      stat_1
#> 3      stat_1
#> 4      stat_2
#> 5      stat_2
#> 6      stat_2
#> 7      stat_1
#> 8      stat_1
#> 9      stat_1
#> 10     stat_2
#> ℹ 66 more rows
#> ℹ Use `print(n = ...)` to see more rows
#> ℹ 4 more variables: context, fmt_fn, warning, error
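The link between these unformatted statistics and the formatted cells of .$table_body can be checked by hand. The sketch below copies the n and N values for grades I and II from the output above into a plain data frame (in {cards}, stat is a list column and formatting functions live in fmt_fn; both are simplified away here), computes p, formats "n (p%)", and uses gts_column to place each value in its table_body column.

```r
# n and N copied from the {cards} output above (grade by trt)
ard <- data.frame(
  gts_column     = c("stat_1", "stat_2", "stat_1", "stat_2"),
  variable_level = c("I", "I", "II", "II"),
  n              = c(35L, 33L, 32L, 36L),
  N              = c(98L, 102L, 98L, 102L),
  stringsAsFactors = FALSE
)

ard$p    <- ard$n / ard$N                                  # 0.357, 0.324, 0.327, 0.353
ard$stat <- sprintf("%d (%.0f%%)", ard$n, 100 * ard$p)     # "n (p%)" cells

# gts_column says which table_body column each formatted value lands in
table_rows <- data.frame(
  label  = unique(ard$variable_level),
  stat_1 = ard$stat[ard$gts_column == "stat_1"],
  stat_2 = ard$stat[ard$gts_column == "stat_2"],
  stringsAsFactors = FALSE
)
table_rows
```

The stat_1 values, "35 (36%)" and "32 (33%)", match the grade I and II cells of tbl_summary_ex$table_body shown earlier.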