---
title: "IPD meta-analysis-with-missing-data"
author: "Michael Seo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{IPD meta-analysis-with-missing-data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(comment = "#>", message=FALSE, tidy.opts=list(width.cutoff=60), tidy=TRUE, collapse = TRUE, warning = FALSE) 
```

## Fitting IPD meta-analysis model with missing data

In this vignette, we go over how to run IPD meta-analysis using models from this package when there is missing data. This would be a simple example showcasing a way of handling missing data. We will use multiple imputation using the mice package. First let's generate a dataset with some missingness.

```{r}
#install.packages("bipd")
#or devtools::install_github("MikeJSeo/bipd")
library(bipd)
set.seed(1)
simulated_dataset <- generate_sysmiss_ipdma_example(Nstudies = 10, Ncov = 5, sys_missing_prob = 0, magnitude = 0.5, heterogeneity = 0.1)

simulated_dataset_missing <- simulated_dataset
randomindex <- sample(c(TRUE,FALSE), dim(simulated_dataset_missing)[1], replace = TRUE, prob = c(0.2, 0.8))
randomindex2 <- sample(c(TRUE,FALSE), dim(simulated_dataset_missing)[1], replace = TRUE, prob = c(0.1, 0.9))
simulated_dataset_missing[randomindex,c("x1")] <- NA
simulated_dataset_missing[randomindex2,c("x3")] <- NA
head(simulated_dataset_missing)
```

Now we would like to create multiply imputed datasets. You can use mice package to create your own imputations or you can use pre-built imputation tools in this package. For demonstration, we will use the imputation tool in this package using '2l.pmm', which is predictive mean matching accounting for multilevel via mixed effects modelling. See miceadds package for details.

```{r, warning = FALSE, message = FALSE, results = 'hide', comment = FALSE}
library(miceadds) #for multilevel datasets without systematically missing predictors
imputation <- ipdma.impute(simulated_dataset_missing, covariates = c("x1", "x2", "x3", "x4", "x5"), typeofvar = c("continuous", "binary", "binary", "continuous", "continuous"), interaction = TRUE, studyname = "study", treatmentname = "treat", outcomename = "y", m = 5)  
```

Okay, so we have obtained 5 sets of imputed datasets with the call above. One convenient aspect of utilizing Bayesian methods under missing data is that to do a proper analysis, we can simply fit the IPD-MA model for each imputed dataset and merge the mcmc results. We do not need to use Rubin's rule to combine multiply imputed datasets. Just simple merging of mcmc.list would work.

```{r}
multiple.imputations <- imputation$imp.list
for(ii in 1:length(multiple.imputations)){
  
  current.data <- multiple.imputations[[ii]]
  
  X <- with(current.data, apply(current.data[,c("x1", "x2", "x3", "x4", "x5")],2, function(x) as.numeric(x)))
  
  ipd <- with(current.data, ipdma.model.onestage(y = y, study = study, treat = treat, X = X, response = "normal", shrinkage = "none"))
  
  #Run only 100 iterations for demonstration
  samples <- ipd.run(ipd, pars.save = c("beta", "gamma", "delta"), n.chains = 3, n.burnin = 100, n.iter = 100) 

  if(ii == 1){
    final.result <- samples
  } else{
    final.result <- add.mcmc(final.result, samples)
  }
}
```

We have written a convenient function 'mcmc.add' which combines mcmc.list. Having run the code above, we have mcmc.list containing all the results from each multiply imputed datasets. Now we can use this to summarize the findings.

```{r}
summary(final.result)
```

Also another important function in this package is finding the individual treatment effect. One additional precaution we need to take is determining what to use for the overall mean and standard deviation (sd). Instead of mean and sd of the imputed dataset, we need to calculate mean and sd of the original dataset. We have allowed the users to specify the overall mean (scale_mean) and overall sd (scale_sd) parameters in the treatment.effect function.

```{r}
X <- as.matrix(apply(simulated_dataset[, c("x1", "x2", "x3", "x4", "x5")], 2, as.numeric))
# calculate overall mean
overall_mean <- apply(X, 2, mean, na.rm = TRUE)
overall_sd <- apply(X, 2, sd)

treatment.effect(ipd, samples, newpatient = c(0.5, 1, 1, -0.5, 0.5), scale_mean = overall_mean, scale_sd = overall_sd)
```