Multiple Realization Methods

library(Colossus)
library(data.table)

Available Methods

There are many instances in which a user may want to use data that is known up to a distribution, in these cases it may not be sufficient to use a single realization. Colossus provides a method to run multiple regressions in a row, each with a different realization. This method was designed to assist in circumstances in which combinations of shared and unshared error are present, and a two-dimensional Monte Carlo method is used to generate realizations for at least one covariate column. Note that this method does not apply to the event/cases or interval/duration columns.

Specifics of Use

Let us suppose that we are interested in performing a regression with three covariates: an age bin, amount of exposure to a radioisotope during an interval, and average sleep during an interval. Age bin is known, exposure is known up to a distribution which could be shared among individuals, and sleep is randomly distributed and independent each interval.

$$ \begin{aligned} \lambda(a, r, s) = \exp{(\beta_a \times a + \beta_a \times r + \beta_a \times s)} \end{aligned} $$

names <- c("a", "r", "s")
term_n <- c(0, 0, 0)
tform <- c("loglin", "loglin", "loglin")
modelform <- "M"
fir <- 0

a_n <- c(0.1, 0.1, 0.1)

Colossus requires two items to apply multiple realizations, a list of columns to replace and a matrix of columns for each realization. Suppose the user generates 5 realizations. Then there are two distributed columns and five realizations.

dose_index <- c("r", "s") # the two columns in the model to replace are the radiation and sleeping covariates
dose_realizations <- matrix(c("r0", "r1", "r2", "r3", "r4", "s0", "s1", "s2", "s3", "s4"), nrow = 2) # columns to be used for realizations 0-4, rows for each column being replaced

Once Colossus finishes regressions for each realization, it returns matrices for the final parameter estimates, standard deviations, and final log-likelihoods of each regression. These results can be used to create likelihood-weighted parameter distributions, to account for uncertainty in the final parameter estimates.

Note that there is no strict requirement for the realization columns to actually be realizations of a distributed column. A user could use this method to compare models with the same formula but different covariates. The only difficulty is that Colossus does not refresh the parameter estimates between realization regressions, so the user may need to keep in mind feasible parameter space.