Title: | Measurement Error Modelling using MCEM |
---|---|
Description: | Fits measurement error models using Monte Carlo Expectation Maximization (MCEM). For specific details on the methodology, see: Greg C. G. Wei & Martin A. Tanner (1990) A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms, Journal of the American Statistical Association, 85:411, 699-704 <doi:10.1080/01621459.1990.10474930> For more examples on measurement error modelling using MCEM, see the 'RMarkdown' vignette: "'refitME' R-package tutorial". |
Authors: | Jakub Stoklosa [aut, cre] , Wenhan Hwang [aut, ctb], David Warton [aut, ctb] |
Maintainer: | Jakub Stoklosa <[email protected]> |
License: | GPL-2 |
Version: | 1.2.2 |
Built: | 2024-12-22 06:48:49 UTC |
Source: | CRAN |
MCEMfit_glm
objectsAn ANOVA function for fitted MCEMfit_glm
objects.
anova_MCEMfit_glm(object, ..., dispersion = NULL, test = NULL)
anova_MCEMfit_glm(object, ..., dispersion = NULL, test = NULL)
object |
: fitted model objects of class |
... |
: further arguments passed through to |
dispersion |
: the dispersion parameter for the fitting family. By default it is obtained from the object(s). |
test |
: a character string, (partially) matching one of " |
anova_MCEMfit_glm
produces output identical to anova.glm
.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
refitME
objectsAn ANOVA function for fitted refitME
objects.
## S3 method for class 'refitME' anova(object, ..., dispersion = NULL, test = NULL)
## S3 method for class 'refitME' anova(object, ..., dispersion = NULL, test = NULL)
object |
: fitted model objects of class |
... |
: further arguments passed through to |
dispersion |
: the dispersion parameter for the fitting family. By default it is obtained from the object(s). |
test |
: a character string, (partially) matching one of " |
anova.refitME
produces output identical to anova.lm
, anova.glm
or anova.gam
.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
Data set consisting of presence-only records for the plant species Corymbia eximia, site coordinates 5 covariates for each site.
Corymbiaeximiadata
Corymbiaeximiadata
A data set that contains: 8 columns with 86,316 observations (or sites). The columns are defined as follows:
X
Longitude coordinate.
Y
Latitude coordinate.
FC
Recorded number of fire counts for each site.
MNT
Recorded minimum temperatures for each site.
MXT
Recorded maximum temperature for each site.
Rain
Recorded rainfall for each site.
D.Main
Recorded distance from nearest major road.
Y.obs
Presences for the plant species Corymbia eximia for each site.
See Renner and Warton (2013) for full details on the data and study.
Renner, I. W. and Warton, D. I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics, 69, 274–281.
# Load the data. data(Corymbiaeximiadata)
# Load the data. data(Corymbiaeximiadata)
Data set consisting of records of male patients with coronary heart disease collected from the Framingham heart study. The Framinghamdata
data consists of binary responses and four predictor variables collected on 'n = 1615' patients.
Framinghamdata
Framinghamdata
A data set that contains: 5 columns with 1,615 observations. The columns are defined as follows:
Y
Response indicator (binary variable) of first evidence of CHD status of patient.
z1
Serum cholesterol level of patient.
z2
Age of patient.
z3
Smoking indicator - whether the patient smokes.
w1
Systolic blood pressure (SBP) of patient - this is the error contaminated variable, calculated from mean scores. The measurement error is 0.00630, see pp. 112 of Carroll et al. (2006).
See Carroll et al. (2006) for full details on the data and study. Also, see https://github.com/JakubStats/refitME for an RMarkdown vignette of an example that uses the data.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Ed. London: Chapman & Hall/CRC.
# Load the data. data(Framinghamdata)
# Load the data. data(Framinghamdata)
MCEMfit_lm
model objectsExtract log-Likelihoods for MCEMfit_lm
model objects. This function subtracts the entropy term from the observed likelihood.
logLik_MCEMfit_lm(object, REML = FALSE, ...)
logLik_MCEMfit_lm(object, REML = FALSE, ...)
object |
: fitted model objects of class |
REML |
: an optional logical value. If |
... |
: further arguments passed through to |
logLik_MCEMfit_lm
produces output identical to logLik
.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
refitME
model objectsExtract log-Likelihoods for refitME
model objects. This function subtracts the entropy term from the observed likelihood.
## S3 method for class 'refitME' logLik(object, ...)
## S3 method for class 'refitME' logLik(object, ...)
object |
: fitted model objects of class |
... |
: further arguments passed through to |
logLik.refitME
produces identical output to logLik
but for refitME
model objects.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
VGAM
capture-recapture (CR) model using the MCEM algorithmFunction for fitting VGAM
capture-recapture (CR) model using the MCEM algorithm where covariates have measurement error.
MCEMfit_CR(mod, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE)
MCEMfit_CR(mod, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE)
mod |
: a |
sigma.sq.u |
: measurement error (ME) variance. A scalar if there is only one error-contaminated predictor variable, otherwise this must be stored as a vector (of ME variances) or a matrix if the ME covariance matrix is known. |
B |
: the number of Monte Carlo replication values (default is set to 50). |
epsilon |
: a set convergence threshold (default is set to 0.00001). |
silent |
: if |
MCEMfit_CR
returns model coefficient and population size estimates with standard errors and the effective sample size.
This function is still under development. Currently the function can only fit the CR model used in the manuscript. IT DOES NOT SUPPORT ALL VGAM
families.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
See https://github.com/JakubStats/refitME for an RMarkdown vignette with examples.
Stoklosa, J., Hwang, W-H., and Warton, D.I. refitME: Measurement Error Modelling using Monte Carlo Expectation Maximization in R.
# A VGAM example using the Prinia flaviventris capture-recapture data. library(refitME) library(VGAM) data(Priniadata) tau <- 17 # No. of capture occasions. w1 <- Priniadata$w1 # Bird wing length predictor. CR_naiv <- vglm(cbind(cap, noncap) ~ w1, VGAM::posbinomial(omit.constant = TRUE, parallel = TRUE ~ w1), data = Priniadata, trace = FALSE) sigma.sq.u <- 0.37 # ME variance. CR_MCEM <- refitME(CR_naiv, sigma.sq.u) detach(package:VGAM)
# A VGAM example using the Prinia flaviventris capture-recapture data. library(refitME) library(VGAM) data(Priniadata) tau <- 17 # No. of capture occasions. w1 <- Priniadata$w1 # Bird wing length predictor. CR_naiv <- vglm(cbind(cap, noncap) ~ w1, VGAM::posbinomial(omit.constant = TRUE, parallel = TRUE ~ w1), data = Priniadata, trace = FALSE) sigma.sq.u <- 0.37 # ME variance. CR_MCEM <- refitME(CR_naiv, sigma.sq.u) detach(package:VGAM)
gam
objectsFunction for wrapping the MCEM algorithm on GAMs where predictors are subject to measurement error/error-in-variables.
MCEMfit_gam( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ... )
MCEMfit_gam( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ... )
mod |
: a |
family |
: a specified family/distribution. |
sigma.sq.u |
: measurement error (ME) variance. A scalar if there is only one error-contaminated predictor variable, otherwise this must be stored as a vector (of ME variances) or a matrix if the ME covariance matrix is known. |
B |
: the number of Monte Carlo replication values (default is set to 50). |
epsilon |
: convergence threshold (default is set to 0.00001). |
silent |
: if |
... |
: further arguments passed to |
MCEMfit_gam
returns the original naive fitted model object but coefficient estimates and the covariance matrix have been replaced with the final MCEM model fit. Standard errors and the effective sample size (which diagnose how closely the proposal distribution matches the posterior, see equation (2) of Stoklosa, Hwang and Warton) have also been included as outputs.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
See https://github.com/JakubStats/refitME for an RMarkdown vignette with examples.
Ganguli, B, Staudenmayer, J., and Wand, M. P. (2005). Additive models with predictors subject to measurement error. Australian & New Zealand Journal of Statistics, 47, 193–202.
Wand, M. P. (2018). SemiPar: Semiparametic Regression. R package version 1.0-4.2., URL https://CRAN.R-project.org/package=SemiPar.
Stoklosa, J., Hwang, W-H., and Warton, D.I. refitME: Measurement Error Modelling using Monte Carlo Expectation Maximization in R.
# A GAM example using the air pollution data set from the SemiPar package. library(refitME) library(SemiPar) library(mgcv) library(dplyr) data(milan.mort) dat.air <- sample_n(milan.mort, 100) # Takes a random sample of size 100. Y <- dat.air[, 6] # Mortality counts. n <- length(Y) z1 <- (dat.air[, 1]) z2 <- (dat.air[, 4]) z3 <- (dat.air[, 5]) w1 <- log(dat.air[, 9]) # The error-contaminated predictor (total suspended particles). dat <- data.frame(cbind(Y, w1, z1, z2, z3)) gam_naiv <- gam(Y ~ s(w1), family = "poisson", data = dat) sigma.sq.u <- 0.0915 # Measurement error variance. B <- 10 # Consider increasing this if you want a more accurate answer. gam_MCEM <- refitME(gam_naiv, sigma.sq.u, B) plot(gam_MCEM, select = 1) detach(package:mgcv)
# A GAM example using the air pollution data set from the SemiPar package. library(refitME) library(SemiPar) library(mgcv) library(dplyr) data(milan.mort) dat.air <- sample_n(milan.mort, 100) # Takes a random sample of size 100. Y <- dat.air[, 6] # Mortality counts. n <- length(Y) z1 <- (dat.air[, 1]) z2 <- (dat.air[, 4]) z3 <- (dat.air[, 5]) w1 <- log(dat.air[, 9]) # The error-contaminated predictor (total suspended particles). dat <- data.frame(cbind(Y, w1, z1, z2, z3)) gam_naiv <- gam(Y ~ s(w1), family = "poisson", data = dat) sigma.sq.u <- 0.0915 # Measurement error variance. B <- 10 # Consider increasing this if you want a more accurate answer. gam_MCEM <- refitME(gam_naiv, sigma.sq.u, B) plot(gam_MCEM, select = 1) detach(package:mgcv)
Function for wrapping the MCEM algorithm on any likelihood-based model where predictors are subject to measurement error/error-in-variables.
MCEMfit_gen( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, theta.est = 1, shape.est = 1, ... )
MCEMfit_gen( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, theta.est = 1, shape.est = 1, ... )
mod |
: a model object (this is the naive fitted model). Make sure the first |
family |
: a specified family/distribution. |
sigma.sq.u |
: measurement error (ME) variance. A scalar if there is only one error-contaminated predictor variable, otherwise this must be stored as a vector (of ME variances) or a matrix if the ME covariance matrix is known. |
B |
: the number of Monte Carlo replication values (default is set to 50). |
epsilon |
: a set convergence threshold (default is set to 0.00001). |
silent |
: if |
theta.est |
: an initial value for the dispersion parameter (this is required for fitting negative binomial models). |
shape.est |
: an initial value for the shape parameter (this is required for fitting gamma models). |
... |
: further arguments passed through to the function that was used to fit |
MCEMfit_gen
returns the original naive fitted model object but coefficient estimates and residuals have been replaced with the final MCEM model fit. Standard errors are included and returned, if mod
is a class of object accepted by the sandwich package (such as glm
, gam
, survreg
and many more).
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Ed. London: Chapman & Hall/CRC.
Stoklosa, J., Hwang, W-H., and Warton, D.I. refitME: Measurement Error Modelling using Monte Carlo Expectation Maximization in R.
lm
or glm
objectsFunction for wrapping the MCEM algorithm on GLMs where predictors are subject to measurement error/error-in-variables.
MCEMfit_glm( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ... )
MCEMfit_glm( mod, family, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ... )
mod |
: a |
family |
: a specified family/distribution. |
sigma.sq.u |
: measurement error (ME) variance. A scalar if there is only one error-contaminated predictor variable, otherwise this must be stored as a vector (of ME variances) or a matrix if the ME covariance matrix is known. |
B |
: the number of Monte Carlo replication values (default is set to 50). |
epsilon |
: a set convergence threshold (default is set to 0.00001). |
silent |
: if |
... |
: further arguments passed to |
MCEMfit_glm
returns the naive fitted model object where coefficient estimates, the covariance matrix, fitted values, the log-likelihood, and residuals have been replaced with the final MCEM model fit. Standard errors and the effective sample size (which diagnose how closely the proposal distribution matches the posterior, see equation (2) of Stoklosa, Hwang and Warton) have also been included as outputs.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
See https://github.com/JakubStats/refitME for an RMarkdown vignette with examples.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Ed. London: Chapman & Hall/CRC.
Stoklosa, J., Hwang, W-H., and Warton, D.I. refitME: Measurement Error Modelling using Monte Carlo Expectation Maximization in R.
# A GLM example I - binary response data. library(refitME) data(Framinghamdata) glm_naiv <- glm(Y ~ w1 + z1 + z2 + z3, x = TRUE, family = binomial, data = Framinghamdata) # The error-contaminated predictor in this example is systolic blood pressure (w1). sigma.sq.u <- 0.006295 # ME variance, as obtained from Carroll et al. (2006) monograph. B <- 50 # The number of Monte Carlo replication values. glm_MCEM <- refitME(glm_naiv, sigma.sq.u, B)
# A GLM example I - binary response data. library(refitME) data(Framinghamdata) glm_naiv <- glm(Y ~ w1 + z1 + z2 + z3, x = TRUE, family = binomial, data = Framinghamdata) # The error-contaminated predictor in this example is systolic blood pressure (w1). sigma.sq.u <- 0.006295 # ME variance, as obtained from Carroll et al. (2006) monograph. B <- 50 # The number of Monte Carlo replication values. glm_MCEM <- refitME(glm_naiv, sigma.sq.u, B)
Data set consisting of capture-recapture histories 164 uniquely captured birds across 17 weekly capture occasions. Bird wing lengths were also measured in the study.
Priniadata
Priniadata
A data set that contains: 3 columns with 164 observations. The columns are defined as follows:
w1
Bird wing lengths.
cap
Number of times the individual was captured.
noncap
Number of times the individual was not captured.
See Hwang, Huang and Wang (2007) for full details on the data and study.
Hwang, W. H., Huang, S. Y. H., and Wang, C. (2007). Effects of measurement error and conditional score estimation in capture–recapture models. Statistica Sinica, 17, 301-316.
# Load the data. data(Priniadata)
# Load the data. data(Priniadata)
Function that extracts the fitted (naive) model object and wraps the MCEM algorithm to correct for measurement error/error-in-variables in predictors.
refitME(mod, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ...)
refitME(mod, sigma.sq.u, B = 50, epsilon = 1e-05, silent = FALSE, ...)
mod |
: any (S3 class) fitted object that responds to the generic functions |
sigma.sq.u |
: measurement error (ME) variance. A scalar if there is only one error-contaminated predictor variable, otherwise this must be stored as a vector (of known ME variances) or a matrix if the ME covariance matrix is known. |
B |
: the number of Monte Carlo replication values (default is set 50). |
epsilon |
: convergence threshold (default is set to 0.00001). |
silent |
: if |
... |
: further arguments passed through to the function that was used to fit |
refitME
returns the naive fitted model object where coefficient estimates, the covariance matrix, fitted values, the log-likelihood, and residuals have been replaced with the final MCEM model fit. Standard errors are included and returned, if mod
is a class of object accepted by the sandwich package (such as glm
, gam
, survreg
and many more). The effective sample size (which diagnose how closely the proposal distribution matches the posterior, see equation (2) of Stoklosa, Hwang and Warton) have also been included as outputs.
Jakub Stoklosa, Wen-Han Hwang and David I. Warton.
See https://github.com/JakubStats/refitME for an RMarkdown vignette with examples.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Ed. London: Chapman & Hall/CRC.
Stoklosa, J., Hwang, W-H., and Warton, D.I. refitME: Measurement Error Modelling using Monte Carlo Expectation Maximization in R.
MCEMfit_glm
, MCEMfit_gam
and MCEMfit_gen
# A GLM example I - binary response data. library(refitME) data(Framinghamdata) glm_naiv <- glm(Y ~ w1 + z1 + z2 + z3, x = TRUE, family = binomial, data = Framinghamdata) # The error-contaminated predictor variable in this example is systolic blood pressure (w1). sigma.sq.u <- 0.01259/2 # ME variance, as obtained from Carroll et al. (2006) monograph. B <- 50 # The number of Monte Carlo replication values. glm_MCEM <- refitME(glm_naiv, sigma.sq.u, B)
# A GLM example I - binary response data. library(refitME) data(Framinghamdata) glm_naiv <- glm(Y ~ w1 + z1 + z2 + z3, x = TRUE, family = binomial, data = Framinghamdata) # The error-contaminated predictor variable in this example is systolic blood pressure (w1). sigma.sq.u <- 0.01259/2 # ME variance, as obtained from Carroll et al. (2006) monograph. B <- 50 # The number of Monte Carlo replication values. glm_MCEM <- refitME(glm_naiv, sigma.sq.u, B)
This function replaces NA with zero for a matrix.
## S3 method for class 'na' sqrt(x)
## S3 method for class 'na' sqrt(x)
x |
: a matrix |
sqrt.na
returns a matrix.
Jakub Stoklosa
This function that calculates a weighted variance for a given vector.
wt.var(x, w)
wt.var(x, w)
x |
: a vector of numerical data. |
w |
: a vector of equal length to |
wt.var
returns a single value from analysis requested.
The developer of this function is Jeremy VanDerWal. See https://rdrr.io/cran/SDMTools/src/R/wt.mean.R
# Define simple data x = 1:25 # Set of numbers. wt = runif(25) # Some arbitrary weights. # Display variances (unweighted and then weighted). var(x) wt.var(x, wt)
# Define simple data x = 1:25 # Set of numbers. wt = runif(25) # Some arbitrary weights. # Display variances (unweighted and then weighted). var(x) wt.var(x, wt)