Title: | Linear Model Fitting with LOD Covariates |
---|---|
Description: | Tools to fit a linear regression model to data while taking into account covariates with a lower limit of detection (LOD). |
Authors: | Kevin Donovan |
Maintainer: | Kevin Donovan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0 |
Built: | 2024-12-05 06:51:19 UTC |
Source: | CRAN |
Extracts estimated regression coefficients from an object of class "lod_lm".
## S3 method for class 'lod_lm'
coef(object, ...)
object |
An object of class "lod_lm". |
... |
further arguments passed to or from other methods. |
Coefficients extracted from object as a named numeric vector.
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
fitted.lod_lm and residuals.lod_lm for related methods; lod_lm for model fitting.
The generic functions fitted and residuals.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

## nSamples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended. Same for boots=0; results in NAs returned for standard errors
fit <- lod_lm(data=lod_data_ex, frmla=y~x1+x2+x3, lod=c(0,0),
              var_LOD=c("x2", "x3"), nSamples=100, boots=0)
coef(fit)
Extracts fitted values from an object of class "lod_lm".
## S3 method for class 'lod_lm'
fitted(object, ...)
object |
An object of class "lod_lm". |
... |
further arguments passed to or from other methods. |
For subjects with covariates outside of their limits of detection, the values of these covariates used to compute fitted values are set according to the method specified by the argument fill_in_method in the call to lod_lm.
Fitted values extracted from object as a named numeric vector.
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
coef.lod_lm and residuals.lod_lm for related methods; lod_lm for model fitting.
The generic functions coef and residuals.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

## nSamples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended. Same for boots=0; results in NAs returned for standard errors
fit <- lod_lm(data=lod_data_ex, frmla=y~x1+x2+x3, lod=c(0,0),
              var_LOD=c("x2", "x3"), nSamples=100, boots=0)
fitted(fit)
LOD_bootstrap_fit calls Rcpp code to compute linear model regression parameter standard errors in C++, taking into account covariates with limits of detection per the method detailed in May et al. (2011).
LOD_bootstrap_fit(num_of_boots, y_data, x_data, no_of_samples,
                  threshold, max_iterations, LOD_u_l)
num_of_boots |
number denoting the number of bootstrap resamples to use to compute the regression parameter standard errors. |
y_data |
numeric vector consisting of data of the model's outcome variable. |
x_data |
column-named matrix consisting of data of the model's covariates with each column representing one covariate, with values outside of the limit(s) of detection marked as NA. |
no_of_samples |
an integer specifying the number of samples to generate for each subject with covariate values outside of their limits of detection. For more details, see May et al. (2011). |
threshold |
number denoting the minimum difference in the regression parameter estimates needed for convergence of the model fitting procedure. |
max_iterations |
number denoting the maximum number of iterations allowed in the model fitting procedure. |
LOD_u_l |
numeric matrix consisting of the lower and upper limits of detection for all covariates in the model as the columns, with one row per covariate, in the same order as the covariates in x_data. |
This function is used to complete the standard error computations done when fitting a linear model by calling lod_lm; the standard error computations are done in C++ to minimize computation time.
LOD_bootstrap_fit returns a list in which each component is a numeric vector consisting of the last iteration's regression parameter estimates when fitting the model on a bootstrap resample of the input data.
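The way such a list of per-resample estimates turns into standard errors can be illustrated with a minimal base R sketch. It assumes a simulated toy dataset without detection limits and uses lm() as a stand-in for the fitting step, so it is not the package's C++ procedure; the names toy, boot_fits and boot_mat are hypothetical.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Toy complete-data example (simulated here; not the package dataset)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + toy$x1 + toy$x2 + rnorm(n)

num_of_boots <- 25

# One coefficient vector per bootstrap resample, stored as a list,
# mirroring the shape of the value described above
boot_fits <- lapply(seq_len(num_of_boots), function(b) {
  idx <- sample.int(n, size = n, replace = TRUE)  # resample rows with replacement
  coef(lm(y ~ x1 + x2, data = toy[idx, ]))        # lm() is a stand-in fitting step
})

# Stack the vectors row-wise and take the column-wise standard deviation
boot_mat <- do.call(rbind, boot_fits)
boot_SEs <- apply(boot_mat, 2, sd)
boot_SEs
```

Each row of boot_mat holds the estimates from one resample, so the standard deviation of a column is the bootstrap standard error of the corresponding coefficient.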
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
lod_lm is the recommended function for fitting a linear model with covariates subject to limits of detection, which uses LOD_fit. LOD_fit is used to compute the regression parameter estimates.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

# Replace values marked as under limit of detection using 0 with NA,
# add column of ones for intercept
lod_data_with_int <-
  as.matrix(cbind("Intercept"=rep(1, dim(lod_data_ex)[1]), lod_data_ex))
lod_data_ex_edit <-
  apply(lod_data_with_int, MARGIN = 2, FUN=function(x){ifelse(x==0, NA, x)})

# Fit model with bootstrap procedure, report regression parameter estimate standard errors
LOD_matrix <- cbind(c(NA, NA, -100, -100), c(NA, NA, 0, 0))

## no_of_samples set to 50 for computational speed/illustration purposes only.
## At least 250 is recommended.
## Same for num_of_boots=5; at least 25 is recommended
bootstrap_fit_object <-
  LOD_bootstrap_fit(num_of_boots=5, y_data=lod_data_ex_edit[,2],
                    x_data=lod_data_ex_edit[,-2],
                    no_of_samples=50, threshold=0.001, max_iterations=100,
                    LOD_u_l=LOD_matrix)

boot_SEs <- apply(do.call("rbind", bootstrap_fit_object), 2, sd)
# Use colnames(): lod_data_with_int is a matrix, so names() would return NULL
names(boot_SEs) <- colnames(lod_data_with_int[,-2])
boot_SEs
A simulated dataset containing a generic outcome variable and three covariates, two of which are subject to a lower limit of detection of 0, with a sample size of 100. See Details for information on how these data were generated.
lod_data_ex
A data frame with 100 rows and 4 variables:
y |
Outcome |
x1 |
First covariate, no limits of detection |
x2 |
Second covariate, lower limit of detection of 0 |
x3 |
Third covariate, lower limit of detection of 0 |
Each of the covariates was generated independently as 100 independent draws from the standard normal distribution. The outcome variable was generated from a linear model with these three covariates, along with an intercept of 1, a residual variance of 1, and a regression coefficient of 1 for each covariate. Then, for two of the covariates, values below 0 were set to 0 to reflect a lower limit of detection of 0. This results in a 50 percent probability of being below the limit of detection for each of the two corresponding covariates.
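The generating process described above can be sketched in base R as follows. This is an illustrative reconstruction from the description only: the seed, the object name sim_data, and the use of pmax() are assumptions, so the output will not reproduce lod_data_ex itself.

```r
set.seed(2024)  # arbitrary seed; the seed used for lod_data_ex is not documented here
n <- 100

# Three covariates drawn independently from the standard normal distribution
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# Outcome: intercept 1, coefficient 1 per covariate, residual variance 1
y <- 1 + x1 + x2 + x3 + rnorm(n, mean = 0, sd = 1)

# Apply a lower limit of detection of 0 to x2 and x3:
# values below the limit are recorded as the limit itself
lod <- 0
sim_data <- data.frame(y = y, x1 = x1, x2 = pmax(x2, lod), x3 = pmax(x3, lod))

# Proportion of values recorded at the limit; roughly 0.5 is expected for each
colMeans(sim_data[, c("x2", "x3")] == lod)
```

Note that the outcome is generated from the uncensored covariate values, and the limit of detection is applied only afterwards, matching the order of the steps in the description.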
LOD_fit calls Rcpp code to compute linear model regression parameter estimates in C++, taking into account covariates with limits of detection per the method detailed in May et al. (2011).
LOD_fit(y_data, x_data, mean_x_preds, beta, sigma_2_y, sigma_x_preds,
        no_of_samples, threshold, max_iterations, LOD_u_l)
y_data |
numeric vector consisting of data of the model's outcome variable |
x_data |
column-named matrix consisting of data of the model's covariates with each column representing one covariate, with values outside of the limit(s) of detection marked as NA. |
mean_x_preds |
numeric vector consisting of initial estimates of the means for each covariate, in the same order as the covariates in x_data. |
beta |
numeric vector consisting of initial estimates of the regression parameters for each covariate, in the same order as the covariates in x_data. |
sigma_2_y |
an initial estimate of the variance of the outcome variable |
sigma_x_preds |
numeric matrix consisting of an initial estimate of the covariance matrix for the model's covariates, in the same order as the covariates in x_data. |
no_of_samples |
an integer specifying the number of samples to generate for each subject with covariate values outside of their limits of detection. For more details, see May et al. (2011). |
threshold |
number denoting the minimum difference in the regression parameter estimates needed for convergence of the model fitting procedure. |
max_iterations |
number denoting the maximum number of iterations allowed in the model fitting procedure. |
LOD_u_l |
numeric matrix consisting of the lower and upper limits of detection for all covariates in the model as the columns, with one row per covariate, in the same order as the covariates in x_data. |
This function is used to complete the model fitting computations done when calling lod_lm; the fitting computations are done in C++ to minimize computation time.
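The roles of threshold and max_iterations can be illustrated with a toy stopping rule in base R. This is only a sketch: update_beta is a hypothetical update step that moves halfway toward the least-squares solution, not the algorithm of May et al. (2011), and measuring the change by the largest absolute difference between successive estimates is an assumption.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 50
X <- cbind(Intercept = 1, x1 = rnorm(n))
y <- as.vector(X %*% c(1, 2) + rnorm(n))

# Target of the toy iteration: the ordinary least-squares solution
beta_target <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Hypothetical update step (NOT the package's algorithm): move halfway to the target
update_beta <- function(beta) beta + 0.5 * (beta_target - beta)

threshold <- 0.001
max_iterations <- 100

beta <- c(0, 0)                           # initial estimates
beta_estimates <- matrix(beta, nrow = 1)  # one row per iteration

for (iter in seq_len(max_iterations)) {
  beta_new <- update_beta(beta)
  beta_estimates <- rbind(beta_estimates, beta_new)
  # Assumed criterion: largest absolute change between successive estimates
  converged <- max(abs(beta_new - beta)) < threshold
  beta <- beta_new
  if (converged) break
}

iter                                    # number of iterations used
beta_estimates[nrow(beta_estimates), ]  # estimates from the last iteration
```

The loop stops as soon as the change falls below threshold, or after max_iterations updates if that never happens, and it keeps every intermediate estimate in a matrix alongside the final one.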
LOD_fit returns a list containing the following components:
y_expand_last_int |
a numeric vector consisting of the outcome data with duplicate entries for subjects with covariates outside of their limits of detection per the corresponding resampling procedure, from the last iteration of the model fitting procedure. |
x_data_return_last_int |
a numeric matrix consisting of the covariate data with sampled values for covariates of subjects with covariates outside of their limits of detection, from the last iteration of the model fitting procedure. |
beta_estimates |
a numeric matrix consisting of the regression parameter estimates from each iteration of the model fitting procedure. |
beta_estimate_last_iteration |
a numeric vector consisting of the regression parameter estimates from the last iteration of the model fitting procedure. |
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
lod_lm is the recommended function for fitting a linear model with covariates subject to limits of detection, which uses LOD_fit. LOD_bootstrap_fit is used to compute regression parameter estimate standard errors using bootstrap resampling.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

# Replace values marked as under limit of detection using 0 with NA,
# add column of ones for intercept
lod_data_with_int <-
  as.matrix(cbind("Intercept"=rep(1, dim(lod_data_ex)[1]), lod_data_ex))
lod_data_ex_edit <-
  data.frame(apply(lod_data_with_int, MARGIN = 2,
                   FUN=function(x){ifelse(x==0, NA, x)}))

# Fit linear model to dataset with only subjects without covariates under
# limit of detection to get initial estimate for the regression parameters.
beta_inital_est <- coef(lm(y~x1+x2+x3, data=lod_data_ex_edit))

# Get initial estimates of mean vector and covariance matrix for covariates and variance of outcome,
# again using data from subjects without covariates under limit of detection
mean_x_inital <- colMeans(lod_data_ex_edit[,c(-1,-2)], na.rm = TRUE)
sigma_x_inital <- cov(lod_data_ex_edit[,c(-1,-2)], use="pairwise.complete.obs")
sigma_2_y_inital <- sigma(lm(y~x1+x2+x3, data=lod_data_ex_edit))^2

# Fit model, report regression parameter estimates from last iteration
LOD_matrix <- cbind(c(NA, NA, -100, -100), c(NA, NA, 0, 0))

## no_of_samples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended.
fit_object <- LOD_fit(y_data=lod_data_ex_edit[,2],
                      x_data=as.matrix(lod_data_ex_edit[,-2]),
                      mean_x_preds=mean_x_inital, beta=beta_inital_est,
                      sigma_2_y=sigma_2_y_inital, sigma_x_preds=sigma_x_inital,
                      no_of_samples=100, threshold=0.001, max_iterations=100,
                      LOD_u_l=LOD_matrix)

fit_object$beta_estimate_last_iteration
lod_lm is used to fit linear models while taking into account limits of detection for corresponding covariates. It carries out the method detailed in May et al. (2011) with regression coefficient standard errors calculated using bootstrap resampling.
lod_lm(data, frmla, lod=NULL, var_LOD=NULL, nSamples = 250,
       fill_in_method="mean", convergenceCriterion = 0.001, boots = 25)

## S3 method for class 'lod_lm'
print(x, ...)
data |
a required data frame (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not specified, a corresponding error is returned. |
x |
An object of class "lod_lm". |
frmla |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'. |
lod |
a numeric vector (or object coercible by as.numeric) specifying the limit of detection for each covariate specified in var_LOD. |
var_LOD |
a character vector specifying which covariates in the model (frmla) are subject to limits of detection. |
nSamples |
an integer specifying the number of samples to generate for each subject with covariate values outside of their limits of detection. For more details, see May et al. (2011). The default is 250. |
fill_in_method |
a string specifying how values outside of the limits of detection should be handled when calculating residuals and fitted values. Default is "mean", which uses the mean covariate value. Another choice is "LOD" which uses the lower limit of detection. |
convergenceCriterion |
a number specifying the smallest difference between iterations required for the regression coefficient estimation process to complete. The default is 0.001. |
boots |
a number specifying the number of bootstrap resamples used for the standard error estimation process for the regression coefficient estimates. The default is 25. |
... |
further arguments passed to or from other methods. |
Models for lod_lm are specified the same as models for lm. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms separated by + which specifies a linear predictor for response. A formula has an implied intercept term.
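As a small base R illustration of the implied intercept, the sketch below expands a formula into its design matrix with model.matrix(). The data frame toy and its values are made up for the example.

```r
# Made-up data, used only to show how a formula is expanded
toy <- data.frame(y  = c(1.2, 2.3, 2.9, 4.1),
                  x1 = c(0.1, 0.9, 1.4, 2.2),
                  x2 = c(1.0, 0.5, 0.7, 0.2))

frmla <- y ~ x1 + x2

# The design matrix gains an "(Intercept)" column although none was written
X <- model.matrix(frmla, data = toy)
colnames(X)

# The terms object records the implied intercept as 1
attr(terms(frmla), "intercept")
```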
In the dataset used with lod_lm, values outside of the limits of detection need to be denoted by the value of the lower limit of detection. Observations with values marked as missing by NA are removed by the model fit procedure, as is done with lm.
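A minimal base R sketch of this data convention follows. The data frame raw, its values, and the limit lod_x2 are hypothetical; the negative x2 entries stand in for measurements that fell below the limit.

```r
# Hypothetical raw measurements; NA marks a genuinely missing value
raw <- data.frame(y  = c(2.1, 0.4, 3.3, 1.7, 2.8),
                  x1 = c(0.5, -1.2, 0.3, NA, 1.1),
                  x2 = c(0.8, -0.4, 1.9, 0.2, -1.5))
lod_x2 <- 0  # lower limit of detection assumed for x2

# Values below the limit are recorded as the limit itself
prepared <- raw
prepared$x2 <- pmax(raw$x2, lod_x2)

prepared$x2                 # below-limit entries now equal lod_x2
sum(prepared$x2 == lod_x2)  # number of observations at the limit

# lm() drops the row containing NA under the default na.action (na.omit)
nobs(lm(y ~ x1 + x2, data = prepared))
```

The distinction matters because a value equal to the limit is treated as censored, while an NA causes the whole observation to be dropped.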
lod_lm returns an object of class "lod_lm" if arguments lod and var_LOD are not NULL; otherwise it returns an object of class "lm". The function summary prints a summary of the results in the same format as for an object of class "lm". The generic accessor functions coef, fitted and residuals extract various useful features of the value returned by lod_lm.
An object of class "lod_lm" is a list containing the following components:
coefficients |
a named vector of regression coefficient estimates. |
boot_SE |
a named vector of regression coefficient estimate bootstrap standard error estimates. |
fitted.values |
the fitted mean values for subjects with covariates within their limits of detection. |
rank |
the numeric rank of the fitted linear model |
residuals |
the residuals, that is response minus fitted values, for subjects with covariates within their limits of detection. |
df.residual |
the residual degrees of freedom. |
model |
the model frame used. |
call |
the matched call. |
terms |
the terms object used. |
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
summary.lod_lm for summaries of the results from lod_lm.
The generic functions coef, fitted and residuals.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

## nSamples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended. Same for boots=0; results in NAs returned for standard errors
fit <- lod_lm(data=lod_data_ex, frmla=y~x1+x2+x3, lod=c(0,0),
              var_LOD=c("x2", "x3"), nSamples=100, boots=0)
summary(fit)
Extracts residuals from an object of class "lod_lm".
## S3 method for class 'lod_lm'
residuals(object, ...)
object |
An object of class "lod_lm". |
... |
further arguments passed to or from other methods. |
For subjects with covariates outside of their limits of detection, the values of these covariates used to compute residuals are set according to the method specified by the argument fill_in_method in the call to lod_lm.
Residuals extracted from object as a named numeric vector.
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
fitted.lod_lm and coef.lod_lm for related methods; lod_lm for model fitting.
The generic functions coef and fitted.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

## nSamples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended. Same for boots=0; results in NAs returned for standard errors
fit <- lod_lm(data=lod_data_ex, frmla=y~x1+x2+x3, lod=c(0,0),
              var_LOD=c("x2", "x3"), nSamples=100, boots=0)
residuals(fit)
summary method for class "lod_lm"
## S3 method for class 'lod_lm'
summary(object, ...)

## S3 method for class 'summary.lod_lm'
print(x, ...)
object |
An object of class "lod_lm". |
x |
An object of class "summary.lod_lm". |
... |
further arguments passed to or from other methods. |
print.summary.lod_lm prints a table containing the coefficient estimates, standard errors, etc. from the lod_lm fit.
The function summary.lod_lm returns a list of summary statistics of the fitted linear model given in object, using the components (list elements) "call" and "terms" from its argument, plus
residuals |
residuals computed by |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a vector |
Kevin Donovan, [email protected].
Maintainer: Kevin Donovan <[email protected]>
May RC, Ibrahim JG, Chu H (2011). “Maximum likelihood estimation in generalized linear models with multiple covariates subject to detection limits.” Statistics in medicine, 30(20), 2551–2561.
The model fitting function lod_lm, summary.
library(lodr)
## Using example dataset provided in lodr package: lod_data_ex
## 3 covariates: x1, x2, x3 with x2 and x3 subject to a lower limit of
## detection of 0

## nSamples set to 100 for computational speed/illustration purposes only.
## At least 250 is recommended. Same for boots=0; results in NAs returned for standard errors
fit <- lod_lm(data=lod_data_ex, frmla=y~x1+x2+x3, lod=c(0,0),
              var_LOD=c("x2", "x3"), nSamples=100, boots=0)
summary(fit)