--- title: "Getting started" author: "Ewen Harrison" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The `finafit` package brings together the day-to-day functions we use to generate final results tables and plots when modelling. I spent many years repeatedly manually copying results from R analyses and built these functions to automate our standard healthcare data workflow. It is particularly useful when undertaking a large study involving multiple different regression analyses. When combined with RMarkdown, the reporting becomes entirely automated. Its design follows Hadley Wickham's tidy tool manifesto. ## Installation and Documentation Development lives on GitHub. You can install the `finalfit` development version from CRAN with: ```{r eval=FALSE} install.packages("finalfit") ``` It is recommended that this package is used together with dplyr, which is a dependent. Some of the functions require `rstan` and `boot`. These have been left as `Suggests` rather than `Depends` to avoid unnecessary installation. If needed, they can be installed in the normal way: ```{r eval=FALSE} install.packages("rstan") install.packages("boot") ``` To install off-line (or in a Safe Haven), download the zip file and use `devtools::install_local()`. ## Main Features ### 1. Summarise variables/factors by a categorical variable `summary_factorlist()` is a wrapper used to aggregate any number of explanatory variables by a single variable of interest. This is often "Table 1" of a published study. When categorical, the variable of interest can have a maximum of five levels. It uses `Hmisc::summary.formula()`. ```{r, warning=FALSE, message=FALSE} library(finalfit) library(dplyr) # Load example dataset, modified version of survival::colon data(colon_s) # Table 1 - Patient demographics by variable of interest ---- explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor") dependent = "perfor.factor" # Bowel perforation colon_s %>% summary_factorlist(dependent, explanatory, p=TRUE, add_dependent_label=TRUE) -> t1 knitr::kable(t1, row.names=FALSE, align=c("l", "l", "r", "r", "r")) ``` When exported to PDF: See other options relating to inclusion of missing data, mean vs. median for continuous variables, column vs. row proportions, include a total column etc. `summary_factorlist()` is also commonly used to summarise any number of variables by an outcome variable (say dead yes/no). ```{r, warning=FALSE, message=FALSE} # Table 2 - 5 yr mortality ---- explanatory = c("age.factor", "sex.factor", "obstruct.factor") dependent = 'mort_5yr' colon_s %>% summary_factorlist(dependent, explanatory, p=TRUE, add_dependent_label=TRUE) -> t2 knitr::kable(t2, row.names=FALSE, align=c("l", "l", "r", "r", "r")) ``` Tables can be knitted to PDF, Word or html documents. We do this in RStudio from a .Rmd document. ### 2. Summarise regression model results in final table format The second main feature is the ability to create final tables for linear `lm()`, logistic `glm()`, hierarchical logistic `lme4::glmer()` and Cox proportional hazards `survival::coxph()` regression models. The `finalfit()` "all-in-one" function takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a final table for publication including summary statistics, univariable and multivariable regression analyses. The first columns are those produced by `summary_factorist()`. The appropriate regression model is chosen on the basis of the dependent variable type and other arguments passed. #### Logistic regression: glm() Of the form: `glm(depdendent ~ explanatory, family="binomial")` ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% finalfit(dependent, explanatory) -> t3 knitr::kable(t3, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r")) ``` When exported to PDF: #### Logistic regression with reduced model: glm() Where a multivariable model contains a subset of the variables included specified in the full univariable set, this can be specified. ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory_multi = c("age.factor", "obstruct.factor") dependent = 'mort_5yr' colon_s %>% finalfit(dependent, explanatory, explanatory_multi) -> t4 knitr::kable(t4, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r")) ``` When exported to PDF: #### Mixed effects logistic regression: lme4::glmer() Of the form: `lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")` Hierarchical/mixed effects/multilevel logistic regression models can be specified using the argument `random_effect`. At the moment it is just set up for random intercepts (i.e. `(1 | random_effect)`, but in the future I'll adjust this to accommodate random gradients if needed (i.e. `(variable1 | variable2)`. ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory_multi = c("age.factor", "obstruct.factor") random_effect = "hospital" dependent = 'mort_5yr' colon_s %>% finalfit(dependent, explanatory, explanatory_multi, random_effect) -> t5 knitr::kable(t5, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r")) ``` When exported to PDF: #### Cox proportional hazards: survival::coxph() Of the form: `survival::coxph(dependent ~ explanatory)` ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = "Surv(time, status)" colon_s %>% finalfit(dependent, explanatory) -> t6 knitr::kable(t6, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r")) ``` When exported to PDF: #### Add common model metrics to output `metrics=TRUE` provides common model metrics. The output is a list of two dataframes. Note chunk specification for output below. ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% finalfit(dependent, explanatory, metrics=TRUE) -> t7 knitr::kable(t7[[1]], row.names=FALSE, align=c("l", "l", "r", "r", "r", "r")) knitr::kable(t7[[2]], row.names=FALSE, col.names="") ``` When exported to PDF: #### Combine multiple models into single table Rather than going all-in-one, any number of subset models can be manually added on to a `summary_factorlist()` table using `finalfit_merge()`. This is particularly useful when models take a long-time to run or are complicated. Note the requirement for `fit_id=TRUE` in `summary_factorlist()`. `fit2df` extracts, condenses, and add metrics to supported models. ```{r, warning=FALSE, message=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") explanatory_multi = c("age.factor", "obstruct.factor") random_effect = "hospital" dependent = 'mort_5yr' # Separate tables colon_s %>% summary_factorlist(dependent, explanatory, fit_id=TRUE) -> example.summary colon_s %>% glmuni(dependent, explanatory) %>% fit2df(estimate_suffix=" (univariable)") -> example.univariable colon_s %>% glmmulti(dependent, explanatory) %>% fit2df(estimate_suffix=" (multivariable)") -> example.multivariable colon_s %>% glmmixed(dependent, explanatory, random_effect) %>% fit2df(estimate_suffix=" (multilevel)") -> example.multilevel # Pipe together example.summary %>% finalfit_merge(example.univariable) %>% finalfit_merge(example.multivariable) %>% finalfit_merge(example.multilevel, last_merge = TRUE) %>% dependent_label(colon_s, dependent, prefix="") -> t8 # place dependent variable label knitr::kable(t8, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r", "r")) ``` When exported to PDF: #### Bayesian logistic regression: with `stan` Our own particular `rstan` models are supported and will be documented in the future. Broadly, if you are running (hierarchical) logistic regression models in [Stan](https://mc-stan.org/users/interfaces/rstan) with coefficients specified as a vector labelled `beta`, then `fit2df()` will work directly on the `stanfit` object in a similar manner to if it was a `glm` or `glmerMod` object. ### 3. Summarise regression model results in plot Models can be summarized with odds ratio/hazard ratio plots using `or_plot`, `hr_plot` and `surv_plot`. #### OR plot ```{r, eval=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = 'mort_5yr' colon_s %>% or_plot(dependent, explanatory) # Previously fitted models (`glmmulti()` or # `glmmixed()`) can be provided directly to `glmfit` ``` #### HR plot ```{r, eval=FALSE} explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor") dependent = "Surv(time, status)" colon_s %>% hr_plot(dependent, explanatory, dependent_label = "Survival") # Previously fitted models (`coxphmulti`) can be provided directly using `coxfit` ``` #### Kaplan-Meier survival plots KM plots can be produced using the `library(survminer)` ```{r, eval=FALSE} explanatory = c("perfor.factor") dependent = "Surv(time, status)" colon_s %>% surv_plot(dependent, explanatory, xlab="Time (days)", pval=TRUE, legend="none") ``` ## Notes Use `ff_label()` to assign labels to variables for tables and plots. ```{r, eval=FALSE} colon_s %>% mutate( ff_label(age.factor, "Age (years)") ) ``` Export dataframe tables directly or to R Markdown `knitr::kable()`. Note wrapper `missing_pattern()` is also useful. Wraps `mice::md.pattern`. ```{r, warning=FALSE, message=FALSE} colon_s %>% missing_pattern(dependent, explanatory) ``` Development will be on-going, but any input appreciated.