Title: | Generalized Linear Mixed Model Analysis via Expectation Propagation |
---|---|
Description: | Approximate frequentist inference for generalized linear mixed model analysis with expectation propagation used to circumvent the need for multivariate integration. In this version, the random effects can be any reasonable dimension. However, only probit mixed models with one level of nesting are supported. The methodology is described in Hall, Johnstone, Ormerod, Wand and Yu (2018) <arXiv:1805.08423v1>. |
Authors: | Matt P. Wand [aut, cre], James C.F. Yu [aut] |
Maintainer: | Matt P. Wand <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-3.1 |
Built: | 2024-12-14 06:41:37 UTC |
Source: | CRAN |
The machine learning paradigm known as expectation propagation, typically used for approximation inference in Bayesian graphical models contexts, is adapted to the problem of frequentist inference in generalized linear mixed model analysis. The essence is rapid approximation evaluation of the log-likelihood surface by using expectation propagation projections onto the Multivariate Normal family that circumvent the need for numerical integration. The approach applies to any reasonable dimension of the random effects vector, but currently is limited to probit mixed models with one level of nesting. The reference below provides full details.
glmmEP(y,Xfixed=NULL,Xrandom,idNum,control)
glmmEP(y,Xfixed=NULL,Xrandom,idNum,control)
y |
Vector containing the (0/1) response data. |
Xfixed |
Matrix with columns consisting of predictors that enter the model as fixed effects. Each entry of the first column equal the number 1 corresponding to the fixed effects intercept. |
Xrandom |
Matrix with columns consisting of predictors that enter the model as random effects. Each entry of the first column must equal the number 1 corresponding to the random effects intercept. |
idNum |
Vector containing group membership identification numbers. |
control |
Function for controlling convergence and other specifications. |
Matt Wand[email protected] and James Yu[email protected]
Hall, P., Johnstone, I.M., Ormerod, J.T., Wand, M.P. and Yu, J. (2018). Fast and accurate binary response mixed model analysis via expectation propagation. <arXiv:1805.08423v1>.
library(glmmEP) # Simulate data corresponding to a # a simple random intercept model: set.seed(1) ; m <- 100 ; n <- 10 beta0True <- 0.37 ; beta1True <- 0.93 ; sigmaTrue <- 0.73 uTrue <- rnorm(m)*sigmaTrue x <- runif(m*n) y <- rbinom(m*n,1,pnorm(beta0True + beta1True*x + crossprod(t(kronecker(diag(m),rep(1,n))),uTrue))) idNum <- rep(1:m,each=n) Xfixed <- cbind(1,x) Xrandom <- matrix(1,length(y),1) colnames(Xfixed) <- c("intercept","x") # Obtain and summarise glmmEP() fit: fit <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fit) # Obtain simulated data corresponding to the # simulation study in Section 4.1.2. of # Hall et al. (2018): dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum # Obtain the probit mixed model fit and summaries: fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitSimData) plot(fitSimData$randomEffects) ; abline(h=0) ; abline(v=0) if (require("mlmRev")) { # Example involving the data frame `Contraception' # in the package `mlmRev', starting with # setting up the input vectors and matrices # for a random intercepts model: data(Contraception) y <- as.numeric(as.character(Contraception$use)=="Y") Xfixed <- cbind(1,as.numeric(as.character(Contraception$urban)=="Y"), Contraception$age, as.numeric(as.character(Contraception$livch)=="1"), as.numeric(as.character(Contraception$livch)=="2"), as.numeric(as.character(Contraception$livch)=="3+")) colnames(Xfixed) <- c("intercept","isUrban","age","livChEq1", "livChEq2","livChGe3") Xrandom <- as.matrix(rep(1,length(y))) colnames(Xrandom) <- "intercept" idNum <- as.numeric(as.character(Contraception$district)) # Obtain random intercepts probit mixed model fit and summaries: fitContracRanInt <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitContracRanInt) hist(as.numeric(fitContracRanInt$randomEffects), xlab="random intercepts predicted values",probability=TRUE, col="dodgerblue",breaks=15,main="") abline(v=0,col="slateblue",lwd=2) # Change `Xrandom' to correspond to a random intercepts and # slopes model: Xrandom <- cbind(1,as.numeric(as.character(Contraception$urban)=="Y")) colnames(Xrandom) <- c("intercept","isUrban") # Obtain random intercepts and slopes probit mixed model fit and summaries: fitContracRanIntAndSlp <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitContracRanIntAndSlp) plot(fitContracRanIntAndSlp$randomEffects, col="dodgerblue",xlab="random intercepts predicted values", ylab="random slopes predicted values",bty="l") abline(v=0,col="slateblue",lwd=2); abline(h=0,col="slateblue",lwd=2) }
library(glmmEP) # Simulate data corresponding to a # a simple random intercept model: set.seed(1) ; m <- 100 ; n <- 10 beta0True <- 0.37 ; beta1True <- 0.93 ; sigmaTrue <- 0.73 uTrue <- rnorm(m)*sigmaTrue x <- runif(m*n) y <- rbinom(m*n,1,pnorm(beta0True + beta1True*x + crossprod(t(kronecker(diag(m),rep(1,n))),uTrue))) idNum <- rep(1:m,each=n) Xfixed <- cbind(1,x) Xrandom <- matrix(1,length(y),1) colnames(Xfixed) <- c("intercept","x") # Obtain and summarise glmmEP() fit: fit <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fit) # Obtain simulated data corresponding to the # simulation study in Section 4.1.2. of # Hall et al. (2018): dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum # Obtain the probit mixed model fit and summaries: fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitSimData) plot(fitSimData$randomEffects) ; abline(h=0) ; abline(v=0) if (require("mlmRev")) { # Example involving the data frame `Contraception' # in the package `mlmRev', starting with # setting up the input vectors and matrices # for a random intercepts model: data(Contraception) y <- as.numeric(as.character(Contraception$use)=="Y") Xfixed <- cbind(1,as.numeric(as.character(Contraception$urban)=="Y"), Contraception$age, as.numeric(as.character(Contraception$livch)=="1"), as.numeric(as.character(Contraception$livch)=="2"), as.numeric(as.character(Contraception$livch)=="3+")) colnames(Xfixed) <- c("intercept","isUrban","age","livChEq1", "livChEq2","livChGe3") Xrandom <- as.matrix(rep(1,length(y))) colnames(Xrandom) <- "intercept" idNum <- as.numeric(as.character(Contraception$district)) # Obtain random intercepts probit mixed model fit and summaries: fitContracRanInt <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitContracRanInt) hist(as.numeric(fitContracRanInt$randomEffects), xlab="random intercepts predicted values",probability=TRUE, col="dodgerblue",breaks=15,main="") abline(v=0,col="slateblue",lwd=2) # Change `Xrandom' to correspond to a random intercepts and # slopes model: Xrandom <- cbind(1,as.numeric(as.character(Contraception$urban)=="Y")) colnames(Xrandom) <- c("intercept","isUrban") # Obtain random intercepts and slopes probit mixed model fit and summaries: fitContracRanIntAndSlp <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitContracRanIntAndSlp) plot(fitContracRanIntAndSlp$randomEffects, col="dodgerblue",xlab="random intercepts predicted values", ylab="random slopes predicted values",bty="l") abline(v=0,col="slateblue",lwd=2); abline(h=0,col="slateblue",lwd=2) }
Function for optional use in calls to glmmEP()
to control convergence values and other specifications for expectation propagation-based fitting of generalized linear mixed models.
glmmEP.control(confLev=0.95,BFGSmaxit=500,BFGSreltol=1e-10, EPmaxit=100,EPreltol=1e-5,NMmaxit=100,NMreltol=1e-10, quiet=FALSE,preTransfData=TRUE)
glmmEP.control(confLev=0.95,BFGSmaxit=500,BFGSreltol=1e-10, EPmaxit=100,EPreltol=1e-5,NMmaxit=100,NMreltol=1e-10, quiet=FALSE,preTransfData=TRUE)
confLev |
Confidence level of confidence intervals expressed as a proportion (i.e. a number between 0 and 1). The default is 0.95 corresponding to 95% confidence intervals. |
BFGSmaxit |
Positive integer specifying the maximum number of iterations in the Broyden-Fletcher-Goldfarb-Shanno optimization phase. The default is 500. |
BFGSreltol |
Positive number specifying the relative tolerance value as defined in the R function optim() in the Broyden-Fletcher-Goldfarb-Shanno optimization phase. The default is 1e-10. |
EPmaxit |
Positive integer specifying the maximum number of iterations in the expectation propagation message passing iterations. The default is 100. |
EPreltol |
Positive number specifying the relative tolerance value for the expectation propagation message passing iterations. The default is 1e-5. |
NMmaxit |
Positive integer specifying the maximum number of iterations in the Nelder-Mead optimization phase. The default is 100. |
NMreltol |
Positive number specifying the relative tolerance value as defined in the R function optim() in the Nelder-Mead optimization phase. The default is 1e-10. |
quiet |
Flag for specifying whether or not glmmEP() runs ‘quietly’ or with progress reports printed to the screen. The default is FALSE. |
preTransfData |
Flag for specifying whether or not the predictor data are pre-transformed to the unit interval for fitting, with estimates, predictions and confidence intervals transformed to match the units of the original data before. The default is TRUE. |
A list containing values of each of the fifteen control parameters, packaged to supply the control
argument to glmmEP
. The values for glmmEP.control
can be specified in the call to glmmEP
.
Matt Wand[email protected] and James Yu[email protected]
Hall, P., Johnstone, I.M., Ormerod, J.T., Wand, M.P. and Yu, J.C.F. (2018). Fast and accurate binary response mixed model analysis via expectation propagation. Submitted.
library(glmmEP) # Obtain simulated data corresponding to the simulation study in Section 4.1.2. of # Hall et al. (2018): dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum # Obtain and summarise probit mixed model fit with user control # of some of the parameters in glmmEP.control(): myNMmaxit <- 500 ; myBFGSreltol <- 0.001 fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum, control=glmmEP.control(NMmaxit=myNMmaxit,BFGSreltol=myBFGSreltol,quiet=TRUE)) summary(fitSimData)
library(glmmEP) # Obtain simulated data corresponding to the simulation study in Section 4.1.2. of # Hall et al. (2018): dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum # Obtain and summarise probit mixed model fit with user control # of some of the parameters in glmmEP.control(): myNMmaxit <- 500 ; myBFGSreltol <- 0.001 fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum, control=glmmEP.control(NMmaxit=myNMmaxit,BFGSreltol=myBFGSreltol,quiet=TRUE)) summary(fitSimData)
The vignette of the glmmEP package is displayed using the default PDF file browser. It provides a detailed detailed description of use of the package for binary mixed model analysis.
glmmEPvignette()
glmmEPvignette()
Matt Wand[email protected] and James Yu[email protected]
glmmEPvignette()
glmmEPvignette()
Section 4.1.2 of the refence below descries a simulation study with data generated from a probit mixed model with six fixed effects parameters and a bivariate random effects vector having a 2 by 2 symmetric positive definite covariance matrix. The function simulates a data set from this model with 2500 groups and the number of observation in each group being a random draw from 20,21,...,30.
glmmSimData(seed=12345)
glmmSimData(seed=12345)
seed |
A positive integer which acts the seed for random data generation. |
Matt Wand[email protected] and James Yu[email protected]
Hall, P.,Johnstone, I.M., Ormerod, J.T., Wand, M.P. and Yu, J. (2018). Fast and accurate binary response mixed model analysis via expectation propagation. <arXiv:1805.08423v1>.
# Obtain simulated data corresponding to the simulation study in Section 4.1.2. of # Hall et al. (2018): library(glmmEP) dataObj <- glmmSimData(seed=54321) print(names(dataObj))
# Obtain simulated data corresponding to the simulation study in Section 4.1.2. of # Hall et al. (2018): library(glmmEP) dataObj <- glmmSimData(seed=54321) print(names(dataObj))
Produces a table containing approximated maximum likelihood estimates and corresponding confidence interval limits for the fixed effect parameters and random effects standard deviation and correlation parameters.
## S3 method for class 'glmmEP' summary(object,...)
## S3 method for class 'glmmEP' summary(object,...)
object |
A |
... |
Place-holder for additional arguments. |
Matt Wand[email protected] and James Yu[email protected]
dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitSimData)
dataObj <- glmmSimData(seed=54321) y <- dataObj$y Xfixed <- dataObj$Xfixed Xrandom <- dataObj$Xrandom idNum <- dataObj$idNum fitSimData <- glmmEP(y,Xfixed,Xrandom,idNum) summary(fitSimData)