Title: | Composite Likelihood Inference and Diagnostics for Replicated Spatial Ordinal Data |
---|---|
Description: | Composite likelihood parameter estimate and asymptotic covariance matrix are calculated for the spatial ordinal data with replications, where spatial ordinal response with covariate and both spatial exponential covariance within subject and independent and identically distributed measurement error. Parameter estimation can be performed by either solving the gradient function or maximizing composite log-likelihood. Parametric bootstrapping is used to estimate the Godambe information matrix and hence the asymptotic standard error and covariance matrix with parallel processing option. Moreover, the proposed surrogate residual, which extends the results of Liu and Zhang (2017) <doi: 10.1080/01621459.2017.1292915>, can act as a useful tool for model diagnostics. |
Authors: | Ting Fung (Ralph) Ma [cre, aut], Pingping Wang [aut], Jun Zhu [aut], Dipankar Bandyopadhyay [ctb], Yincai Tang [ctb] |
Maintainer: | Ting Fung (Ralph) Ma <[email protected]> |
License: | GPL-2 |
Version: | 1.7.0 |
Built: | 2024-11-29 08:47:28 UTC |
Source: | CRAN |
cl
Calculate the negative composite log-likelihood value and score function for a particular subject given parameter value and other input variables.
cl(theta, y, X, dwdv, cmwdv, lt, wn, base, J, p)
cl(theta, y, X, dwdv, cmwdv, lt, wn, base, J, p)
theta |
a vector of parameter value. |
y |
a vector of observation for the subject. |
X |
covariate for the particular subject. |
dwdv |
corresponding distance of selected pair. |
cmwdv |
combination of the pairs included into the composite likelihood. |
lt |
number of parameter (i.e. length of theta). |
wn |
number of pairs with distance. |
base |
identity matrix with dimension |
J |
number of category among (ALL) observed response. |
p |
number of covariate (i.e. number of column of |
cl
returns a list: composite log-likelihood value and a vector of first-order partial derivatives for theta
.
cl_l
Calculate the negative composite log-likelihood value for a particular subject given parameter value and other input variables
cl_l(theta, y, X, dwdv, cmwdv, lt, wn, base, J, p)
cl_l(theta, y, X, dwdv, cmwdv, lt, wn, base, J, p)
theta |
a vector of parameter value |
y |
a vector of observation for the subject. |
X |
covariate for the particular subject |
dwdv |
corresponding distance of selected pair |
cmwdv |
combination of the pairs included into the composite likelihood |
lt |
number of parameter (i.e. length of theta) |
wn |
number of pairs |
base |
identity matrix with dimension |
J |
number of category among (ALL) observed response. |
p |
number of covariate (i.e. number of column of |
cl
returns a list: composite log-likelihood value and a vector of first-order partial derivatives for theta
.
cl.rord
Calculate the negative composite log-likelihood value for replications of spatial ordinal data at given value of parameter value.
Note that this function is not directly used in cle.rord
but illustration only.
cl.rord(theta, response, covar, location, radius = 4)
cl.rord(theta, response, covar, location, radius = 4)
theta |
a vector of parameter value |
response |
a matrix of observation (row: spatial site and column: subject). |
covar |
regression (design) matrix, including intercepts. |
location |
a matrix contains spatial location of sites within each subject |
radius |
radius for selecting pairs for the composite likelihood estimation. |
cl.rord
returns a list: negative composite log-likelihood, a vector of first-order partial derivatives for theta
.
set.seed(1203) n.subject <- 10 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; sigma2 <- 0.7 ; phi <- 0.8 true = c(midalpha,beta,sigma2,phi) Xi = rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location = cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, sigma2, phi, covar=VV, location) cl.rord(theta=true,response=sim.data[[1]], covar=VV, location, radius = 4)
set.seed(1203) n.subject <- 10 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; sigma2 <- 0.7 ; phi <- 0.8 true = c(midalpha,beta,sigma2,phi) Xi = rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location = cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, sigma2, phi, covar=VV, location) cl.rord(theta=true,response=sim.data[[1]], covar=VV, location, radius = 4)
cle.rord
Estimate parameters (including regression coefficient and cutoff) for replications of spatial ordinal data using pairwise likelihood approach.
cle.rord( response, covar, location, radius = 4, n.sim = 100, output = TRUE, SE = TRUE, parallel = FALSE, n.core = max(detectCores()/2, 1), ini.sp = c(0.5, 0.5), est.method = TRUE, maxiter = 100, rtol = 1e-06, factr = 1e+07 )
cle.rord( response, covar, location, radius = 4, n.sim = 100, output = TRUE, SE = TRUE, parallel = FALSE, n.core = max(detectCores()/2, 1), ini.sp = c(0.5, 0.5), est.method = TRUE, maxiter = 100, rtol = 1e-06, factr = 1e+07 )
response |
a matrix of observation (row: spatial site and column: subject). |
covar |
regression (design) matrix, including intercepts. |
location |
a matrix contains spatial location of sites within each subject. |
radius |
radius for selecting pairs for the composite likelihood estimation. |
n.sim |
number of simulation used for parametric bootstrapping (and hence used for asymptotic variance and standard error). |
output |
logical flag indicates whether printing out result (default: |
SE |
logical flag for detailed output. |
parallel |
logical flag indicates using parallel processing (default: |
n.core |
number of physical cores used for parallel processing (when |
ini.sp |
initial estimate for spatial parameter, |
est.method |
logical flag (default) |
maxiter |
maximum number of iterations in the root solving of gradient function (dafault: 100). |
rtol |
relative error tolerrance in the root solving of gradient function (default: 1e-6). |
factr |
reduction in the objective (-logCL) within this factor of the machine tolerance for L-BFGS-B (default: 1e7). |
Given vector of ordinal responses, the design matrix, spatial location for sites, weight radius (for pair selection), and the prespecified number of simulation used for estimating the Godambe information matrix. Initial estimate is obtained by fitting model without spatial dependence (using MASS::polr()
) and optional guess of spatial parameters. The function first estimates parameters of interest by either solving the gradient of composite log-likelihood using rootSolve::multiroot()
or maximize the composite log-likelihood by optim(..., method="L-BFGS-B")
. The asymptotic covariance matrix and standard error of parameters are then estimated by parametric boostrapping. Although the default root solving option is typically more efficient, it may encounter runtime error if negative value of is evaluated (and L-BFGS-B approach should be used).
cle.rord
returns a list contains:
vec.par
: a vector of estimator for ;
vec.se
: a vector of standard error for the estimator;
mat.asyvar
: estimated asymptotic covariance matrix for the estimator;
mat.Hessian
: Hessian matrix at the parameter estimate;
mat.J
: Sensitivity matrix estimated by parametric boostrapping; and
CLIC
: Composite likelihood information criterion (see help manual of clic()
for detail).
set.seed(1228) n.subject <- 20 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.6 ; sigma2 <- 0.7 true <- c(midalpha,beta,phi,sigma2) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, phi, sigma2, covar=VV, location) options(digits=3) result <- cle.rord(response=sim.data[[1]], covar=VV, location = location ,radius = 4, n.sim = 100, output = TRUE, parallel=TRUE, n.core =2) result$vec.par # alpha2 alpha3 beta0 beta1 beta2 phi sigma^2 # 1.249 2.319 1.169 1.990 -1.000 0.668 0.678 result$vec.se # alpha2 alpha3 beta0 beta1 beta2 phi sigma^2 # 0.0704 0.1201 0.1370 0.2272 0.0767 0.0346 0.1050
set.seed(1228) n.subject <- 20 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.6 ; sigma2 <- 0.7 true <- c(midalpha,beta,phi,sigma2) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, phi, sigma2, covar=VV, location) options(digits=3) result <- cle.rord(response=sim.data[[1]], covar=VV, location = location ,radius = 4, n.sim = 100, output = TRUE, parallel=TRUE, n.core =2) result$vec.par # alpha2 alpha3 beta0 beta1 beta2 phi sigma^2 # 1.249 2.319 1.169 1.990 -1.000 0.668 0.678 result$vec.se # alpha2 alpha3 beta0 beta1 beta2 phi sigma^2 # 0.0704 0.1201 0.1370 0.2272 0.0767 0.0346 0.1050
clic
Calculating the Composite likelihood information criterion proposed by Varin and Vidoni (2005)
clic(logCL, mat.hessian, mat.J)
clic(logCL, mat.hessian, mat.J)
logCL |
value of composite log-likelihood. |
mat.hessian |
hessian matrix. |
mat.J |
Sensitivity matrix |
Varin and Vidoni (2005) proposed the information criterion in the form:
CLIC
: Composite likelihood information criterion proposed by Varin and Vidoni (2005)
clic
: Composite likelihood information criterion proposed by Varin and Vidoni (2005)
Varin, C. and Vidoni, P. (2005) A note on composite likelihood inference and model selection. Biometrika 92: 519–528.
Composite Likelihood Calculation for Spatial Ordinal Data without Replications (for implmentation)
## S3 method for class 'list' merge(x, y = NULL, mergeUnnamed = TRUE)
## S3 method for class 'list' merge(x, y = NULL, mergeUnnamed = TRUE)
x |
an object to be merged into list of object. |
y |
an object to be merged into list. |
mergeUnnamed |
select an element if it has a.) an empty name in y and mergeUnnamed is true or b.) a name _not_ contained in x |
merge.list
returns a list: a merged list
sim.rord
Simulate replications of spatial ordinal data
sim.rord( n.subject, n.site, n.rep = 100, midalpha, beta, phi, sigma2, covar, location )
sim.rord( n.subject, n.site, n.rep = 100, midalpha, beta, phi, sigma2, covar, location )
n.subject |
number of subjects. |
n.site |
number of spatial sites for each subject. |
n.rep |
number of simulation. Parameter inputs include: |
midalpha |
cutoff parameter (excluding -Inf and +Inf); |
beta |
regression coefficient; |
phi |
dependence parameter for spatial dependence; and |
sigma2 |
sigma^2 (variance) for the spatial dependence. |
covar |
regression (design) matrix, including intercepts. |
location |
a matrix contains spatial location of sites within each subject. |
sim.rord
returns a list (length n.rep
) of matrix (n.subject*n.site
) with the underlying parameter as inputs.
set.seed(1203) n.subject <- 100 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.8 ; sigma2 <- 0.7 true <- c(midalpha,beta,sigma2,phi) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, phi, sigma2, covar=VV, location) length(sim.data) head(sim.data[[1]]) dim(sim.data[[1]]) hist(sim.data[[1]])
set.seed(1203) n.subject <- 100 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.8 ; sigma2 <- 0.7 true <- c(midalpha,beta,sigma2,phi) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) sim.data <- sim.rord(n.subject, n.site, n.rep = 2, midalpha, beta, phi, sigma2, covar=VV, location) length(sim.data) head(sim.data[[1]]) dim(sim.data[[1]]) hist(sim.data[[1]])
surrogate.residual
simulate the surrogate residual with the the given parameter value and covariate for model diagnostics.
surrogate.residual( response, covar, location, seed = NULL, midalpha, beta, sigma2, phi, burn.in = 20, output = TRUE )
surrogate.residual( response, covar, location, seed = NULL, midalpha, beta, sigma2, phi, burn.in = 20, output = TRUE )
response |
a matrix of observation (row: spatial site and column: subject). |
covar |
regression (design) matrix, including intercepts. |
location |
a matrix contains spatial location of sites within each subject. |
seed |
seed input for simulation (default = |
midalpha |
cutoff for latent ordinal response. |
beta |
regression coefficient for |
sigma2 |
|
phi |
spatial correlation for exponential covariance. |
burn.in |
burn-in length (i.e. declaring the initial sample). |
output |
logical flag indicates whether printing out result (default: |
Given vector of observed responses, the design matrix, spatial location for sites and parameter value, raw surrogate residuals are simulated using an efficient Gibbs sampling, which can be used for model diagnostics. When the fitted model is correct, the raw surrogate residuals among subjects should follow multivariate normal with mean 0 and covariance Sigma. If the model is correct, residual plot should be close to a null plot or random scatter. For example, it can be used to check the potential missing in covariate, non-linearity of covariate and outliers. In particular for the example below, the residual plot shows that linearity of Xi is adequate for the model.
surrogate.residual
returns a (no. spatial site * no. subject) matrix contains
raw surrogate residuals with element corresponds to the response matrix.
set.seed(1228) n.subject <- 50 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.6 ; sigma2 <- 0.7 true <- c(midalpha,beta,phi,sigma2) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) response <- sim.rord(n.subject, n.site, n.rep = 1, midalpha, beta, phi, sigma2, covar=VV, location)[[1]] # Example for linearity of covariate sur.resid <- surrogate.residual(response, covar=VV, location, seed =1, midalpha, beta, sigma2, phi, burn.in=20, output = TRUE) scatter.smooth(rep(Xi,each=n.site),c(sur.resid), main="Surrogate residual against Xi", xlab="Xi", ylab="Surrogate residual", lpars = list(col = "red", lwd = 3, lty = 2)) abline(h=0, col="blue")
set.seed(1228) n.subject <- 50 n.lat <- n.lon <- 10 n.site <- n.lat*n.lon beta <- c(1,2,-1) # First 1 here is the intercept midalpha <- c(1.15, 2.18) ; phi <- 0.6 ; sigma2 <- 0.7 true <- c(midalpha,beta,phi,sigma2) Xi <- rnorm(n.subject,0,1) ; Xj <- rbinom(n.site,1,0.6) VV <- matrix(NA, nrow = n.subject*n.site, ncol = 3) for(i in 1:n.subject){ for(j in 1:n.site){ VV[(i-1)*n.site+j,] <- c(1,Xi[i],Xj[j]) } } location <- cbind(rep(seq(1,n.lat,length=n.lat),n.lat),rep(1:n.lon, each=n.lon)) response <- sim.rord(n.subject, n.site, n.rep = 1, midalpha, beta, phi, sigma2, covar=VV, location)[[1]] # Example for linearity of covariate sur.resid <- surrogate.residual(response, covar=VV, location, seed =1, midalpha, beta, sigma2, phi, burn.in=20, output = TRUE) scatter.smooth(rep(Xi,each=n.site),c(sur.resid), main="Surrogate residual against Xi", xlab="Xi", ylab="Surrogate residual", lpars = list(col = "red", lwd = 3, lty = 2)) abline(h=0, col="blue")