Title: Factor, Bi-Factor, Second-Order and Factor Tree Copula Models
Description: Estimation, model selection and goodness-of-fit of (1) factor copula models for mixed continuous and discrete data in Kadhem and Nikoloulopoulos (2021) <doi:10.1111/bmsp.12231>; (2) bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023) <doi:10.1007/s11336-022-09894-2>; (3) factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022) <arXiv:2201.00339>.
Authors: Sayed H. Kadhem [aut], Aristidis K. Nikoloulopoulos [aut, cre]
Maintainer: Aristidis K. Nikoloulopoulos <[email protected]>
License: GPL (>= 2)
Version: 0.9.3
Built: 2024-12-10 06:42:54 UTC
Source: CRAN
Estimation, model selection and goodness-of-fit of (1) factor copula models for mixed continuous and discrete data in Kadhem and Nikoloulopoulos (2021); (2) bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023); (3) factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022).
This package contains R functions for:
diagnostics based on semi-correlations (Kadhem and Nikoloulopoulos, 2021, 2023; Joe, 2014) to detect tail dependence or tail asymmetry;
diagnostics to show that a dataset has a factor structure based on linear factor analysis (Kadhem and Nikoloulopoulos, 2021, 2023; Joe, 2014);
estimation of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015), and Kadhem and Nikoloulopoulos (2021);
estimation of the bi-factor and second-order copula models for item response data in Kadhem and Nikoloulopoulos (2023);
estimation of the factor tree copula models for item response data in Kadhem and Nikoloulopoulos (2022);
model selection of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015);
model selection of the bi-factor and second-order copula models in Kadhem and Nikoloulopoulos (2023);
model selection of the factor tree copula models in Kadhem and Nikoloulopoulos (2022);
goodness-of-fit of the factor copula models in Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015), and Kadhem and Nikoloulopoulos (2021) using the M2 statistic (Maydeu-Olivares and Joe, 2006). Note that the continuous and count data have to be transformed to ordinal;
goodness-of-fit of the bi-factor and second-order copula models in Kadhem and Nikoloulopoulos (2023) using the M2 statistic (Maydeu-Olivares and Joe, 2006).
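As a quick orientation, the sketch below strings together these steps for the PE data shipped with the package; it only recombines calls that appear, with full commentary, in the examples on the individual help pages that follow.
# Minimal workflow sketch with the PE data (see the individual help pages below)
data(PE)
continuous.PE <- cbind(-PE[, 1], PE[, 2])    # continuous responses
categorical.PE <- PE[, 3:5]                  # ordinal responses
gl <- gauss.quad.prob(25)                    # Gauss-Legendre quadrature nodes and weights
# (a) diagnostics: discrepancy measures for a linear factor structure
nPE <- cbind(qnorm(apply(continuous.PE, 2, rank)/(nrow(PE) + 1)),
             data.frame(apply(categorical.PE, 2, factor)))
cormat.PE <- as.matrix(polycor::hetcor(nPE, std.err = FALSE))
discrepancy(cormat.PE, n = nrow(nPE), f3 = FALSE)
# (b) estimation of a one-factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
fit.PE <- mle1factor(continuous.PE, categorical.PE, count = NULL,
                     copF1 = cop1f.PE, gl, hessian = TRUE)
# (c) goodness-of-fit: transform the continuous responses to ordinal, then M2
ncontinuous.PE <- continuous2ordinal(continuous.PE, 5)
M2.1F(ncontinuous.PE, categorical.PE, tcount = NULL,
      cpar = fit.PE$cpar, copF1 = cop1f.PE, gl)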
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. ArXiv e-prints, arXiv:2201.00339. https://arxiv.org/abs/2201.00339.
Krupskii, P. and Joe, H. (2013) Factor copula models for multivariate data. Journal of Multivariate Analysis, 120, 85–101. doi:10.1016/j.jmva.2013.05.001.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
The diagnostic method in Joe (2014, pages 245-246) shows whether a dataset has a factor structure based on linear factor analysis. The correlation matrix is obtained from the sample correlations of the bivariate pairs of observed variables. These are the linear (when both variables are continuous), polychoric (when both variables are ordinal), and polyserial (when one variable is continuous and the other is ordinal) sample correlations among the observed variables.
The resulting observed correlation matrix is generally positive definite if the sample size is sufficiently large; if not, it has to be converted to a positive definite matrix. We calculate various measures of discrepancy between the observed correlation matrix and the correlation matrix implied by the linear factor analysis, such as the maximum absolute correlation difference, the average absolute correlation difference, and a correlation matrix discrepancy measure.
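For concreteness, measures of this kind are commonly written as follows, with R_obs the observed correlation matrix, R_mod the correlation matrix implied by the fitted linear factor model, and d the number of observed variables; this is a sketch in the spirit of Joe (2014), not necessarily the package's exact notation:
\[
D_{\max} = \max_{j<k}\,\bigl|R_{\mathrm{obs},jk} - R_{\mathrm{mod},jk}\bigr|, \qquad
D_{\mathrm{avg}} = \operatorname{avg}_{j<k}\,\bigl|R_{\mathrm{obs},jk} - R_{\mathrm{mod},jk}\bigr|,
\]
\[
D_{\mathrm{cor}} = \log\det R_{\mathrm{mod}} - \log\det R_{\mathrm{obs}}
  + \operatorname{tr}\bigl(R_{\mathrm{mod}}^{-1} R_{\mathrm{obs}}\bigr) - d .
\]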
discrepancy(cormat, n, f3)
cormat: Correlation matrix of the observed variables (linear, polychoric, and polyserial sample correlations; see Details).
n: Sample size.
f3: If TRUE, then the linear 3-factor analysis is fitted.
A matrix with the calculated discrepancy measures for different numbers of factors.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
#correlation
continuous.PE1 <- -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
u.PE <- apply(continuous.PE, 2, rank)/(nrow(PE)+1)
z.PE <- qnorm(u.PE)
categorical.PE <- data.frame(apply(PE[, 3:5], 2, factor))
nPE <- cbind(z.PE, categorical.PE)
#-------------------------------------------------
# Discrepancy measures
#-------------------------------------------------
#correlation matrix for mixed data
cormat.PE <- as.matrix(polycor::hetcor(nPE, std.err=FALSE))
#discrepancy measures
out.PE = discrepancy(cormat.PE, n = nrow(nPE), f3 = FALSE)
#------------------------------------------------
# GSS Data
#------------------------------------------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME,AGE)
continuous.GSS <- apply(continuous.GSS, 2, rank)/(nrow(GSS)+1)
z.GSS <- qnorm(continuous.GSS)
ordinal.GSS <- cbind(DEGREE,PINCOME,PDEGREE)
count.GSS <- cbind(CHILDREN,PCHILDREN)
# Transforming the count variables to ordinal
# count1: CHILDREN
count1 = count.GSS[,1]
count1[count1 > 3] = 3
# count2: PCHILDREN
count2 = count.GSS[,2]
count2[count2 > 7] = 7
# Combining both transformed count variables
ncount.GSS = cbind(count1, count2)
# Combining ordinal and transformed count variables
categorical.GSS <- cbind(ordinal.GSS, ncount.GSS)
categorical.GSS <- data.frame(apply(categorical.GSS, 2, factor))
# combining continuous and categorical variables
nGSS = cbind(z.GSS, categorical.GSS)
#-------------------------------------------------
# Discrepancy measures
#-------------------------------------------------
#correlation matrix for mixed data
cormat.GSS <- as.matrix(polycor::hetcor(nGSS, std.err=FALSE))
#discrepancy measures
out.GSS = discrepancy(cormat.GSS, n = nrow(nGSS), f3 = TRUE)
Hoff (2007) analysed seven demographic variables of 464 male respondents to the 1994 General Social Survey. Of these seven, two were continuous (income and age of the respondents), three were ordinal with 5 categories (highest degree of the survey respondent, income and highest degree of respondent's parents), and two were count variables (number of children of the survey respondent and respondent's parents).
data(GSS)
A data frame with 464 observations on the following 7 variables:
INCOME
Income of the respondent in 1000s of dollars, binned into 21 ordered categories.
DEGREE
Highest degree ever obtained (0:None, 1:HS, 2:Associates, 3:Bachelors, 4:Graduate).
CHILDREN
Number of children of the survey respondent.
PINCOME
Financial status of respondent's parents when respondent was 16 (on a 5-point scale).
PDEGREE
Highest degree of the survey respondent's parents (0:None, 1:HS, 2:Associates, 3:Bachelors, 4:Graduate).
PCHILDREN
Number of children of the survey respondent's parents, minus 1.
AGE
Age of the respondents in years.
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1, 265–283.
The limited-information M2 statistic (Maydeu-Olivares and Joe, 2006) of factor copula models for mixed continuous and discrete data.
M2.1F(tcontinuous, ordinal, tcount, cpar, copF1, gl)
M2.2F(tcontinuous, ordinal, tcount, cpar, copF1, copF2, gl, SpC)
tcontinuous: Continuous responses transformed to ordinal (see Details); NULL if there are none.
ordinal: Ordinal responses.
tcount: Count responses transformed to ordinal (see Details); NULL if there are none.
cpar: A list of estimated copula parameters.
copF1: Names of the bivariate copulas that link each of the observed variables with the first factor.
copF2: Names of the bivariate copulas that link each of the observed variables with the second factor (M2.2F only).
gl: Gauss-Legendre quadrature nodes and weights.
SpC: Special case for the 2-factor copula model with BVN copulas. Select a bivariate copula at the 2nd factor to be fixed to independence; e.g., SpC = 1 sets the first copula at the 2nd factor to independence.
The M2 statistic has been developed for goodness-of-fit testing in multidimensional contingency tables by Maydeu-Olivares and Joe (2006). Nikoloulopoulos and Joe (2015) have used the M2 statistic to assess the goodness-of-fit of factor copula models for ordinal data. We build on the aforementioned papers and propose a methodology to assess the overall goodness-of-fit of factor copula models for mixed continuous and discrete responses. Since the M2 statistic has been developed for multivariate ordinal data, we propose to first transform the continuous and count variables to ordinal and then calculate the M2 statistic at the maximum likelihood estimate before transformation.
A list containing the following components:
M2: The M2 statistic.
df: The degrees of freedom.
p-value: The resultant p-value.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
#------------------------------------------------
# Setting quadrature points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
#------------------------------------------------
# Estimation
#------------------------------------------------
#------------------ One-factor ------------------
# one-factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
est1factor.PE <- mle1factor(continuous.PE, categorical.PE, count=NULL,
                            copF1=cop1f.PE, gl, hessian = TRUE)
#------------------------------------------------
# M2
#------------------------------------------------
#Transforming the continuous to ordinal data:
ncontinuous.PE = continuous2ordinal(continuous.PE, 5)
# M2 statistic for the one-factor copula model:
m2.1f.PE <- M2.1F(ncontinuous.PE, categorical.PE, tcount=NULL,
                  cpar=est1factor.PE$cpar, copF1=cop1f.PE, gl)
The limited-information M2 statistic (Maydeu-Olivares and Joe, 2006) of bi-factor and second-order copula models for item response data.
M2Bifactor(y, cpar, copnames1, copnames2, gl, ngrp, grpsize)
M2Second_order(y, cpar, copnames1, copnames2, gl, ngrp, grpsize)
y: Data matrix of ordinal item responses.
cpar: A list of estimated copula parameters.
copnames1: For the bi-factor copula model: the names of the bivariate copulas that link the items with the common factor. For the second-order copula model: the names of the bivariate copulas that link the group-specific factors with the common factor.
copnames2: For the bi-factor copula model: the names of the bivariate copulas that link the items with the group-specific factors. For the second-order copula model: the names of the bivariate copulas that link the items with the group-specific factors.
gl: Gauss-Legendre quadrature nodes and weights.
ngrp: Number of non-overlapping groups.
grpsize: Vector indicating the size of each group, e.g., c(4,4,4) indicating four items in each of three groups.
The M2 statistic has been developed for goodness-of-fit testing in multidimensional contingency tables by Maydeu-Olivares and Joe (2006). We use the M2 statistic to assess the overall fit of the bi-factor and second-order copula models for item response data (Kadhem and Nikoloulopoulos, 2023).
A list containing the following components:
M2: The M2 statistic.
df: The degrees of freedom.
p-value: The resultant p-value.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Maydeu-Olivares, A. and Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71, 713–732. doi:10.1007/s11336-005-1295-9.
#------------------------------------------------
# Setting quadrature points
nq <- 15
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------------------------------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3,6)
#group2: 2,4,11,12,17
grp2=c(2,4,11)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8,10)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)#group size
#number of groups
ngrp=length(grpsize)
#------------------------------------------------
# M2
#------------------------------------------------
#BI-FACTOR
tauX0 = c(0.49,0.16,0.29,#0.09,0.47,0.49,0.30,
          0.46,0.41,0.33,#0.29,0.24,
          0.10,0.16,0.14)#,0.12,0.03,0.03,0.10,0.10)
tauXg = c(0.09,0.37,0.23,#0.53,0.24,0.32,0.27,
          0.53,0.58,0.20,#0.23,0.25,0.34,0.33,
          0.30,0.19,0.24)#,0.29,0.43,0.26)
copX0 = rep("bvt2", d)
copXg = c(rep("rgum", g1), rep("bvt3", g2+g3))
#converting taus to cpars
cparX0=mapply(function(x,y) tau2par(x,y),x=copX0,y=tauX0)
cparXg=mapply(function(x,y) tau2par(x,y),x=copXg,y=tauXg)
cpar=c(cparX0,cparXg)
m2_Bifactor = M2Bifactor(y=ydat, cpar, copX0, copXg, gl, ngrp, grpsize)
Bivariate copulas: mapping of Kendall's tau and copula parameter.
par2tau(copulaname, cpar)
tau2par(copulaname, tau)
copulaname: Name of the bivariate copula family. Choices include "bvn" for BVN, "bvt" followed by the degrees of freedom (e.g., "bvt3") for bivariate Student t, "frk" for Frank, "gum" for Gumbel, "joe" for Joe, their reflected/rotated versions ("rgum", "rjoe", "1rgum", "2rgum", "1rjoe", "2rjoe"), and the two-parameter families "bb1", "bb7", "bb8" and "bb10" (see the examples).
cpar: Copula parameter(s).
tau: Kendall's tau.
Kendall's tau or copula parameter.
Joe H (1997) Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Joe H (2014) Dependence Modeling with Copulas. Chapman & Hall, London.
Joe H (2014) CopulaModel: Dependence Modeling with Copulas. Software for book: Dependence Modeling with Copulas, Chapman & Hall, London 2014.
# 1-param copulas
#BVN copula
cpar.bvn = tau2par("bvn", 0.5)
tau.bvn = par2tau("bvn", cpar.bvn)
#Frank copula
cpar.frk = tau2par("frk", 0.5)
tau.frk = par2tau("frk", cpar.frk)
#Gumbel copula
cpar.gum = tau2par("gum", 0.5)
tau.gum = par2tau("gum", cpar.gum)
#Joe copula
cpar.joe = tau2par("joe", 0.5)
tau.joe = par2tau("joe", cpar.joe)
# 2-param copulas
#BB1 copula
tau.bb1 = par2tau("bb1", c(0.5,1.5))
#BB7 copula
tau.bb7 = par2tau("bb7", c(1.5,1))
#BB8 copula
tau.bb8 = par2tau("bb8", c(3,0.8))
#BB10 copula
tau.bb10 = par2tau("bb10", c(3,0.8))
We use a two-stage estimation approach for the factor copula models for mixed continuous and discrete data.
mle1factor(continuous, ordinal, count, copF1, gl, hessian, print.level)
mle2factor(continuous, ordinal, count, copF1, copF2, gl, hessian, print.level)
mle2factor.bvn(continuous, ordinal, count, copF1, copF2, gl, SpC, print.level)
continuous: Data frame/matrix of continuous responses; NULL if there are none.
ordinal: Data frame/matrix of ordinal responses; NULL if there are none.
count: Data frame/matrix of count responses; NULL if there are none.
copF1: A vector with the names of the bivariate copulas that link each of the observed variables with the first factor.
copF2: A vector with the names of the bivariate copulas that link each of the observed variables with the second factor (2-factor models only).
gl: Gauss-Legendre quadrature nodes and weights.
SpC: Special case for the 2-factor copula model with BVN copulas. Select a bivariate copula at the 2nd factor to be fixed to independence; e.g., SpC = 1 sets the first copula at the 2nd factor to independence.
hessian: If TRUE, the Hessian of the negative log-likelihood is calculated during the minimization process.
print.level: Determines the level of printing during the minimization process; same as in nlm.
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate parameters/distributions fixed as estimated at the first step of the proposed two-step estimation approach.
A list containing the following components:
cutpoints: The estimated univariate cutpoints.
negbinest: The estimated univariate parameters for the count responses (fitting the negative binomial distribution).
loglik: The maximized joint log-likelihood.
cpar: Estimated copula parameters in a list form.
taus: The estimated copula parameters in Kendall's tau scale.
SEs: The SEs of the Kendall's tau estimates.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Krupskii, P. and Joe, H. (2013) Factor copula models for multivariate data. Journal of Multivariate Analysis, 120, 85–101. doi:10.1016/j.jmva.2013.05.001.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
#------------------------------------------------
# Setting quadrature points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
#------------------------------------------------
# Estimation
#------------------------------------------------
#------------------ One-factor ------------------
# one-factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
est1factor.PE <- mle1factor(continuous.PE, categorical.PE, count=NULL,
                            copF1=cop1f.PE, gl, hessian = TRUE)
est1factor.PE
#------------------------------------------------
# GSS Data
#------------------------------------------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME, AGE)
ordinal.GSS <- cbind(DEGREE, PINCOME, PDEGREE)
count.GSS <- cbind(CHILDREN, PCHILDREN)
#------------------------------------------------
# Estimation
#------------------------------------------------
#------------------ One-factor ------------------
# one-factor copula model
cop1f.GSS <- c("joe","2rjoe","bvt3","bvt3",
               "rgum","2rjoe","2rgum")
est1factor.GSS <- mle1factor(continuous.GSS, ordinal.GSS, count.GSS,
                             copF1 = cop1f.GSS, gl, hessian = TRUE)
We use a two-stage estimation approach for the factor tree copula models for item response data.
mle1FactorTree(y, A, cop, gl, hessian, print.level)
mle2FactorTree(y, A, cop, gl, hessian, print.level)
y: Data matrix of ordinal item responses.
A: d x d vine array with the 1-truncated vine tree structure, e.g., as selected by selectFactorTrVine.
cop: A vector of copula names: the first d (1-factor tree) or 2d (2-factor tree) entries link the items with the factor(s), and the remaining d-1 entries are the copulas of the 1-truncated vine tree (see the examples).
gl: Gauss-Legendre quadrature nodes and weights.
hessian: If TRUE, the Hessian of the negative log-likelihood is calculated during the minimization process.
print.level: Determines the level of printing during the minimization process; same as in nlm.
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate cutpoints fixed as estimated at the first step of the proposed two-step estimation approach.
A list containing the following components:
cutpoints: The estimated univariate cutpoints.
loglik: The maximized joint log-likelihood.
taus: The estimated copula parameters in Kendall's tau scale.
SEs: The SEs of the Kendall's tau estimates.
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. ArXiv e-prints, arXiv:2201.00339. https://arxiv.org/abs/2201.00339.
#------------------------------------------------
# Setting quadrature points
nq <- 5
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PTSD Data
#------------------------------------------------
data(PTSD)
ydat=PTSD
n=nrow(ydat)
d=ncol(ydat)
#------------------------------------------------
# Estimation
#------------------------------------------------
#selecting vine tree based on polychoric
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
#---------------- 1-factor tree ----------------
# 1-factor tree copula model
copf1 <- rep("frk",d)
coptree <- rep("frk",d-1)
cop <- c(copf1,coptree)
est1factortree <- mle1FactorTree(y=ydat, A=A.polychoric$VineTreeA, cop, gl,
                                 hessian=FALSE, print.level=2)
#---------------- 2-factor tree ----------------
# 2-factor tree copula model
copf1 <- rep("frk",d)
copf2 <- rep("frk",d)
coptree <- rep("frk",d-1)
cop <- c(copf1,copf2,coptree)
est2factortree <- mle2FactorTree(y=ydat, A=A.polychoric$VineTreeA, cop, gl,
                                 hessian=FALSE, print.level=2)
We approach the estimation of the bi-factor and second-order copula models for item response data with the IFM method of Joe (2005).
mleBifactor(y, copnames1, copnames2, gl, ngrp, grpsize, hessian, print.level)
mleSecond_order(y, copnames1, copnames2, gl, ngrp, grpsize, hessian, print.level)
y: Data matrix of ordinal item responses.
copnames1: For the bi-factor copula model: the names of the bivariate copulas that link the items with the common factor. For the second-order copula model: the names of the bivariate copulas that link the group-specific factors with the common factor.
copnames2: For the bi-factor copula model: the names of the bivariate copulas that link the items with the group-specific factors. For the second-order copula model: the names of the bivariate copulas that link the items with the group-specific factors.
gl: Gauss-Legendre quadrature nodes and weights.
ngrp: Number of non-overlapping groups.
grpsize: Vector indicating the size of each group, e.g., c(4,4,4) indicating four items in each of three groups.
hessian: If TRUE, the Hessian of the negative log-likelihood is calculated during the minimization process.
print.level: Determines the level of printing during the minimization process; same as in nlm.
Estimation is achieved by maximizing the joint log-likelihood over the copula parameters with the univariate cutpoints fixed as estimated at the first step of the proposed two-step estimation approach.
A list containing the following components:
cutpoints: The estimated univariate cutpoints.
taus: The estimated copula parameters in Kendall's tau scale.
SEs: The SEs of the Kendall's tau estimates.
loglik: The maximized joint log-likelihood.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2005) Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94, 401–419. doi:10.1016/j.jmva.2004.06.003.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
#------------------------------------------------
# Setting quadrature points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------------------------------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3,6)
#group2: 2,4,11,12,17
grp2=c(2,4,11)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8,10)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)#group size
#number of groups
ngrp=length(grpsize)
#BI-FACTOR
copX0 = rep("bvt2", d)
copXg = c(rep("rgum", g1), rep("bvt3", g2+g3))
mle_Bifactor = mleBifactor(y = ydat, copX0, copXg, gl, ngrp, grpsize,
                           hessian=FALSE, print.level=2)
Bivariate normal and Student cdfs with vectorized inputs
pbnorm(z1, z2, rho, icheck=FALSE)
pbvt(z1, z2, param, icheck=FALSE)
z1: Scalar or vector of reals.
z2: Scalar or vector of reals.
rho: Scalar or vector parameter in (-1,1); each of z1, z2, rho has either length 1 or a common length n greater than 1 (vectors of different lengths greater than 1 are not allowed).
param: Vector of length 2, or matrix with 2 columns (first column rho, second column the degrees of freedom); the vector length or number of matrix rows must be 1 or match the common length of z1, z2.
icheck: If TRUE, checks are made for proper inputs; default is FALSE.
Donnelly's code can be inaccurate in the tail when the tail probability is 2.e-9 or less (it sometimes returns 0). In that case the exchmvn code is used with dimension 2. Alternatively, a user can use the vectorized function pbivnorm() in the library pbivnorm, and write a function pbvncop based on it.
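A minimal sketch of such a pbvncop, built on pbivnorm::pbivnorm as suggested above (this helper is illustrative and not part of the package):
library(pbivnorm)
# BVN copula cdf: C(u, v; rho) = Phi_2(qnorm(u), qnorm(v); rho)
pbvncop <- function(u, v, rho) pbivnorm(qnorm(u), qnorm(v), rho)
pbvncop(0.3, 0.6, 0.5)   # compare with pbnorm(qnorm(0.3), qnorm(0.6), 0.5)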
cdf value(s)
Donnelly TG (1973). Algorithm 462: bivariate normal distribution. Communications of the Association for Computing Machinery, 16, 638.
Joe H (2014) Dependence Modeling with Copulas. Chapman & Hall/CRC.
Joe H (2014) CopulaModel: Dependence Modeling with Copulas. Software for book: Dependence Modeling with Copulas, Chapman & Hall/CRC, 2014.
cat("\n pbnorm rho changing\n")
z1=.3; z2=.4; rho=seq(-1,1,.1)
out1=pbnorm(z1,z2,rho)
print(cbind(rho,out1))
cat("\n pbnorm matrix inputs for z1, z2\n")
rho=.4
z1=c(-.5,.5,10.)
z2=c(-.4,.6,10.)
z1=matrix(z1,3,3)
z2=matrix(z2,3,3,byrow=TRUE)
out3=pbnorm(z1,z2,rho)
print(out3)
cdf2=rbind(rep(0,3),out3)
cdf2=cbind(rep(0,4),cdf2)
pmf=apply(cdf2,2,diff)
pmf2=apply(t(pmf),2,diff)
pmf2=t(pmf2) # rectangle probabilities
print(pmf2)
cat("\n pbvt rho changing\n")
z1=.3; z2=.4; rho=seq(-.9,.9,.1); nu=2
param=cbind(rho,rep(nu,length(rho)))
out1=pbvt(z1,z2,param)
print(cbind(rho,out1))
cat("\n pbvt z1 changing\n")
z1=seq(-2,2,.4)
z2=.4; rho=.5; nu=2
out2=pbvt(z1,z2,c(rho,nu))
print(cbind(z1,out2))
Quinn (2004) used 5 mixed variables, namely the continuous variable black-market premium in each country (used as a proxy for illegal economic activity), the continuous variable productivity as measured by real gross domestic product per worker in 1985 international prices, the binary variable independence of the national judiciary (1 if the judiciary is judged to be independent and 0 otherwise), and the ordinal variables measuring the lack of expropriation risk and lack of corruption.
data(PE)
A data frame with 62 observations (countries) on the following 5 variables:
BM
Black-market premium.
GDP
Gross domestic product.
IJ
Independent judiciary.
XPR
Lack of expropriation risk.
CPR
Lack of corruption.
Quinn, K. M. (2004). Bayesian factor analysis for mixed ordinal and continuous responses. Political Analysis, 12, 338–353.
Polychoric correlation
polychoric0(odat,iprint=FALSE,prlevel=0)
odat: n x d matrix of ordinal responses, coded 0,...,(ncat-1) or 1,...,ncat.
iprint: Flag for printing of intermediate results, including bivariate tables for observed versus expected assuming discretized bivariate Gaussian.
prlevel: print.level for nlm for numerical optimization.
Polychoric correlation for ordinal random variables. The number of categories can vary.
$polych is the polychoric correlation matrix based on the two-stage estimate; $iposdef is an indicator of whether the two-stage correlation matrix estimate is positive definite.
data(PTSD)
ydat=PTSD
rmat=polychoric0(ydat)$p
The measure consists of 20 categorical items divided into four domains: (1) intrusions (e.g., repeated, disturbing, and unwanted memories), (2) avoidance (e.g., avoiding external reminders of the stressful experience), (3) cognition and mood alterations (e.g., trouble remembering important parts of the stressful experience) and (4) reactivity alterations (e.g., taking too many risks or doing things that could cause you harm). Each item is answered on a five-point ordinal scale: “0 = Not at all”, “1 = A little bit”, “2 = Moderately”, “3 = Quite a bit”, and “4 = Extremely”.
data(PTSD)
A data frame with 221 observations and 20 items.
Williams, D. and Mulder, J. (2020). BGGM: Bayesian Gaussian Graphical Models. R package version 1.0.0.
Simulating standard uniform and ordinal response data from factor copula models.
r1factor(n, d1, d2, categ, theta, copF1)
r2factor(n, d1, d2, categ, theta, delta, copF1, copF2)
n: Sample size.
d1: Number of standard uniform variables.
d2: Number of ordinal variables.
categ: A vector of categories for the ordinal variables.
theta: Copula parameters for the 1st factor.
delta: Copula parameters for the 2nd factor.
copF1: A vector with the names of the bivariate copulas that link each of the observed variables with the 1st factor.
copF2: A vector with the names of the bivariate copulas that link each of the observed variables with the 2nd factor (r2factor only).
Data matrix of dimension n x d, where n is the sample size and d = d1 + d2 is the total number of variables.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
# ---------------------------------------------------
# One-factor copula model
# ---------------------------------------------------
#Sample size ----------------------------------------
n = 100
#Continuous Variables ------------------------------
d1 = 5
#Ordinal Variables ---------------------------------
d2 = 3
#Categories for ordinal ----------------------------
categ = c(3,4,5)
#Copula parameters ---------------------------------
theta = rep(2, d1+d2)
#Copula names --------------------------------------
copnamesF1 = rep("gum", d1+d2)
#----------------- Simulating data ------------------
datF1 = r1factor(n, d1=d1, d2=d2, categ, theta, copnamesF1)
#------------ Plotting continuous data -------------
pairs(qnorm(datF1[, 1:d1]))
# ---------------------------------------------------
# Two-factor copula model
# ---------------------------------------------------
#Sample size ----------------------------------------
n = 100
#Continuous Variables ------------------------------
d1 = 5
#Ordinal Variables ---------------------------------
d2 = 3
#Categories for ordinal ----------------------------
categ = c(3,4,5)
#Copula parameters ---------------------------------
theta = rep(2.5, d1+d2)
delta = rep(1.5, d1+d2)
#Copula names --------------------------------------
copnamesF1 = rep("gum", d1+d2)
copnamesF2 = rep("gum", d1+d2)
#----------------- Simulating data ------------------
datF2 = r2factor(n, d1=d1, d2=d2, categ, theta, delta, copnamesF1, copnamesF2)
#----------------- Plotting data ------------------
pairs(qnorm(datF2[,1:d1]))
Simulating item response data from the 1- and 2-factor tree copula models.
r1factortree(n, d, A, copname1, copnametree, theta1, delta, K)
r2factortree(n, d, A, copname1, copname2, copnametree, theta1, theta2, delta, K)
n: Sample size.
d: Number of observed variables/items.
A: d x d vine array with the 1-truncated vine tree structure.
theta1: Copula parameter vector of size d for the copulas that link the items with the first factor.
theta2: Copula parameter vector of size d for the copulas that link the items with the second factor (r2factortree only).
delta: Copula parameter vector of size d-1 for the copulas of the 1-truncated vine tree.
copname1: The name of a bivariate copula that links each of the observed variables with the first factor (note that only a single copula family is used for all items with this factor). Choices include "bvn" for BVN, "bvt" followed by the degrees of freedom (e.g., "bvt3") for bivariate Student t, "frk" for Frank, "gum" for Gumbel, "rgum" for reflected Gumbel, "joe" for Joe, and "rjoe" for reflected Joe.
copname2: The name of a bivariate copula that links each of the observed variables with the second factor (note that only a single copula family is used for all items with this factor). Choices are as for copname1.
copnametree: The name of a bivariate copula that links the observed variables with one another given the factors in the 1-truncated vine (note that only a single copula family is used for the whole tree). Choices are as for copname1.
K: Number of categories for the observed variables/items.
Data matrix of dimension n x d, where n is the sample size and d is the total number of observed variables/items.
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman & Hall, London.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. ArXiv e-prints, arXiv:2201.00339. https://arxiv.org/abs/2201.00339.
# ---------------------------------------------------
#Sample size
n = 500
#Ordinal Variables ---------------------------------
d = 5
#Categories for ordinal ----------------------------
K = 5
# ---------------------------------------------------
# 1-2-factor tree copula model
# ---------------------------------------------------
#Copula parameters
theta1 = rep(3, d)
theta2 = rep(2, d)
delta = rep(1.5, d-1)
#Copula names
copulaname_1f = "gum"
copulaname_2f = "gum"
copulaname_vine = "gum"
#vine array
#Dvine
d=5
A=matrix(0,d,d)
A[1,]=c(1,c(1:(d-1)))
diag(A)=1:d
#----------------- Simulating data ------------------
#1-factor tree copula
data_1ft = r1factortree(n, d, A, copulaname_1f, copulaname_vine, theta1, delta, K)
#2-factor tree copula
data_2ft = r2factortree(n, d, A, copulaname_1f, copulaname_2f, copulaname_vine,
                        theta1, theta2, delta, K)
Simulating item response data from the bi-factor and second-order copula models.
rBifactor(n, d, grpsize, categ, copnames1, copnames2, theta1, theta2)
rSecond_order(n, d, grpsize, categ, copnames1, copnames2, theta1, theta2)
n: Sample size.
d: Number of observed variables/items.
grpsize: Vector indicating the size of each group, e.g., c(4,4,4) indicating four items in each of three groups.
categ: A vector of categories for the observed variables/items.
theta1: For the bi-factor model: copula parameter vector of size d for the copulas that link the items with the common factor. For the second-order model: copula parameter vector with one parameter per group-specific factor, for the copulas that link the group-specific factors with the common factor.
theta2: For the bi-factor model: copula parameter vector of size d for the copulas that link the items with the group-specific factors. For the second-order model: copula parameter vector of size d for the copulas that link the items with the group-specific factors.
copnames1: For the bi-factor model: names of the bivariate copulas that link the items with the common factor. For the second-order model: names of the bivariate copulas that link the group-specific factors with the common factor.
copnames2: For the bi-factor model: names of the bivariate copulas that link the items with the group-specific factors. For the second-order model: names of the bivariate copulas that link the items with the group-specific factors.
Data matrix of dimension n x d, where n is the sample size and d is the total number of observed variables/items.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
# ---------------------------------------------------
#Sample size
n = 500
#Ordinal Variables ---------------------------------
d = 9
grpsize=c(3,3,3)
ngrp=length(grpsize)
#Categories for ordinal ----------------------------
categ = rep(3,d)
# ---------------------------------------------------
# Bi-factor copula model
# ---------------------------------------------------
#Copula parameters
theta = rep(2.5, d)
delta = rep(1.5, d)
#Copula names
copulanames1 = rep("gum", d)
copulanames2 = rep("gum", d)
#----------------- Simulating data ------------------
data_Bifactor = rBifactor(n, d, grpsize, categ, copulanames1, copulanames2,
                          theta, delta)
# ---------------------------------------------------
# Second-order copula model
# ---------------------------------------------------
#Copula parameters
theta = rep(1.5, ngrp)
delta = rep(2.5, d)
#Copula names
copulanames1 = rep("gum", ngrp)
copulanames2 = rep("gum", d)
data_Second_order = rSecond_order(n, d, grpsize, categ, copulanames1, copulanames2,
                                  theta, delta)
A heuristic algorithm that automatically selects the bivariate parametric copula families that link the observed to the latent variables.
select1F(continuous, ordinal, count, copnamesF1, gl)
select2F(continuous, ordinal, count, copnamesF1, copnamesF2, gl)
continuous: Data frame/matrix of continuous responses; NULL if there are none.
ordinal: Data frame/matrix of ordinal responses; NULL if there are none.
count: Data frame/matrix of count responses; NULL if there are none.
copnamesF1: The names of possible candidate bivariate copulas that link each of the observed variables with the 1st factor (see the examples). Choices include "bvn" for BVN, "bvt" followed by the degrees of freedom (e.g., "bvt3") for bivariate Student t, "frk" for Frank, "gum" for Gumbel, "rgum" for reflected Gumbel, "joe" for Joe, and "rjoe" for reflected Joe.
copnamesF2: A list with the names of possible candidate bivariate copulas that link each of the observed variables with the 1st and 2nd factors (select2F only). Choices are as for copnamesF1.
gl: Gauss-Legendre quadrature nodes and weights.
The linking copulas at each factor are selected with a sequential algorithm under the initial assumption that the linking copulas are Frank, and then sequentially copulas with non-tail quadrant independence are assigned to any of the pairs where necessary to account for tail asymmetry (discrete data) or tail dependence (continuous data).
A list containing the following components:
"1st factor": The selected bivariate linking copulas for the 1st factor.
"2nd factor": The selected bivariate linking copulas for the 2nd factor.
AIC: Akaike information criterion.
taus: The estimated copula parameters in Kendall's tau scale.
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
#------------------------------------------------
# Estimation
#------------------------------------------------
# Setting quadrature points
nq<-25
gl<-gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
categorical.PE <- PE[, 3:5]
#------------------ One-factor -----------------
# listing the possible copula candidates:
d <- ncol(PE)
copulasF1 <- rep(list(c("bvn", "bvt3", "bvt5", "frk", "gum", "rgum",
                        "rjoe","joe", "1rjoe","2rjoe", "1rgum","2rgum")), d)
out1F.PE <- select1F(continuous.PE, categorical.PE, count=NULL, copulasF1, gl)
Heuristic algorithms that automatically select the bivariate parametric copula families and 1-truncated vine tree structure for the 1- and 2-factor tree copula models for item response data.
selectFactorTrVine(y, rmat, alg)
copselect1FactorTree(y, A, copnames, gl)
copselect2FactorTree(y, A, copnames, gl)
y: Data matrix of ordinal item responses.
rmat: Polychoric correlation matrix of y.
alg: 1-truncated vine selection using partial or polychoric correlations. Choose a number from (1,2,3). 1: partial correlation selection algorithm for the 1-factor tree copula. 2: partial correlation selection algorithm for the 2-factor tree copula. 3: polychoric correlation selection algorithm for the 1-factor/2-factor tree copulas (here any other correlation matrix rmat can also be used).
A: d x d vine array with the 1-truncated vine tree structure, e.g., the VineTreeA component returned by selectFactorTrVine.
copnames: A vector with the names of possible candidate bivariate copulas to be selected for the 1-factor/2-factor tree copula models for item response data. Choices include "bvn" for BVN, "bvt" followed by the degrees of freedom (e.g., "bvt3") for bivariate Student t, "frk" for Frank, "gum" for Gumbel, "rgum" for reflected Gumbel, "joe" for Joe, and "rjoe" for reflected Joe.
gl: Gauss-Legendre quadrature nodes and weights.
The model selection involves two separate steps. At the first step we select the 1-truncated vine or Markov tree structure assuming that the models are constructed with BVN copulas. We find the maximum spanning tree, that is, a tree on all nodes that maximizes the pairwise dependencies, using the well-known algorithm of Prim (1957); equivalently, we find the spanning tree that minimizes the total edge weight, where each edge weight is a decreasing function of the pairwise dependence between the two nodes it joins. The minimum spanning tree algorithm of Prim (1957) is guaranteed to find the optimal solution for edge weights of this form.
We use two different measures of pairwise dependence. The first is the estimated polychoric correlation (Olsson, 1979). The second is the partial correlation, which is based on the factor copula models in Nikoloulopoulos and Joe (2015) with BVN copulas; for more details see Kadhem and Nikoloulopoulos (2022). We refer to the polychoric and the partial correlation selection algorithm when the pairwise dependence is the polychoric and the partial correlation, respectively.
At the second step, we sequentially select suitable bivariate copulas to account for any tail dependence/asymmetry. We start from the first tree, which includes the copulas connecting the observed variables to the first factor, and then iterate over a set of copula candidates and select suitable bivariate copulas in the subsequent trees. We select only one copula family for all the edges in each tree to ease interpretation.
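To build intuition for the first step only, the short illustration below computes a maximum spanning tree on the polychoric correlations of the PTSD data with igraph; the edge weight log(1 - r^2) is just one possible decreasing function of pairwise dependence chosen for this sketch, and the package's own structure selection is performed by selectFactorTrVine() (see the Examples).
library(igraph)
data(PTSD)
rmat <- polychoric0(PTSD)$p             # pairwise polychoric correlations
w <- log(1 - rmat^2)                    # more negative = stronger dependence
diag(w) <- 0
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE, diag = FALSE)
tree <- mst(g)                          # minimum total weight = maximum total dependence
as_edgelist(tree)                       # edges of the selected Markov tree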
A list containing the following components:
"1Factor copulas": The selected bivariate linking copulas for the first factor.
"2Factor copulas": The selected bivariate linking copulas for the second factor.
"Vine tree copulas": The selected bivariate linking copulas for the 1-truncated vine tree.
"log-likelihood": The maximized joint log-likelihood.
"estimated taus": The estimated copula parameters in Kendall's tau scale.
F1treeA: Vine array for the 1-factor tree copula.
F2treeA: Vine array for the 2-factor tree copula.
pmat_f1: Partial correlation matrix for the 1-factor tree copula.
pmat_f2: Partial correlation matrix for the 2-factor tree copula.
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos [email protected]
Fontenla, M. (2014). optrees: Optimal Trees in Weighted Graphs. R package version 1.0.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. ArXiv e-prints, arXiv:2201.00339. https://arxiv.org/abs/2201.00339.
Nikoloulopoulos, A.K. and Joe, H. (2015) Factor copula models with item response data. Psychometrika, 80, 126–150. doi:10.1007/s11336-013-9387-4.
Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443–460.
Prim, R. C. (1957) Shortest connection networks and some generalizations. The Bell System Technical Journal, 36, 1389–1401.
#------------------------------------------------
# Setting quadrature points
nq <- 5
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PTSD Data
#------------------------------------------------
data(PTSD)
ydat=PTSD
#------------------------------------------------
#vine tree structure
#selecting vine tree based on polychoric corr
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
#selecting bivariate copulas for the 1-2-factor tree
# listing the possible copula candidates:
copnames=c("gum", "rgum")
f1tree_model = copselect1FactorTree(ydat, A.polychoric$VineTreeA, copnames, gl)
f2tree_model = copselect2FactorTree(ydat, A.polychoric$VineTreeA, copnames, gl)
A heuristic algorithm that automatically selects the bivariate parametric copula families for the bi-factor and second-order copula models for item response data.
selectBifactor(y, grpsize, copnames, gl) selectSecond_order(y, grpsize, copnames, gl)
y |
Matrix of ordinal item response data. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
copnames |
A vector with the names of possible candidates of bivariate copulas to be selected for the bi-factor and second-order copula models for item response data. Choices are “bvn” for BVN, “bvt2”, “bvt3”, ... for the t copula with the corresponding degrees of freedom, “gum” for Gumbel, and “rgum” for reflected Gumbel. |
gl |
Gauss-Legendre quadrature nodes and weights. |
The linking copulas at each factor are selected with a sequential algorithm: under the initial assumption that all linking copulas are BVN, copulas with tail dependence are then sequentially assigned to the pairs where necessary to account for tail dependence/asymmetry.
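As a schematic sketch of this sequential idea (illustration only; fit_loglik() is a hypothetical wrapper returning the maximized joint log-likelihood for a given vector of linking-copula family names, and is not part of the package):

# Schematic sketch of sequential copula selection (illustration only;
# fit_loglik() is a hypothetical fitting wrapper, not a package function).
select_families <- function(y, grpsize, candidates = c("bvn", "bvt3", "gum", "rgum")) {
  d <- ncol(y)
  chosen <- rep("bvn", d)               # start with BVN for every item
  for (j in seq_len(d)) {
    # try each candidate family at position j and keep the best-fitting one
    lls <- sapply(candidates, function(fam) {
      trial <- chosen
      trial[j] <- fam
      fit_loglik(y, grpsize, trial)     # hypothetical model fit
    })
    chosen[j] <- candidates[which.max(lls)]
  }
  chosen                                # selected linking copula per item
}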
A list containing the following components:
“common factor” |
The selected bivariate linking copulas for the common factor (Bi-factor: copulas link items with the common factor. Second-order: copulas link group-specific factors with the common factor). |
“group-specific factor” |
The selected bivariate linking copulas for the items with group-specific factors. |
log-likelihood |
The maximized joint log-likelihood. |
taus |
The estimated copula parameters in Kendall's tau scale. |
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
#------------------------------------------------
# Setting quadrature points
nq <- 15
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------------------------------------
data(TAS)
#using a subset of the data
#group1: 1,3,6,7,9,13,14
grp1=c(1,3)
#group2: 2,4,11,12,17
grp2=c(2,4)
#group3: 5,8,10,15,16,18,19,20
grp3=c(5,8)
#Rearrange items within testlets
set.seed(123)
i=sample(1:nrow(TAS),500)
ydat=TAS[i,c(grp1,grp2,grp3)]
#size of each group
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3)
# listing possible copula candidates:
copnames=c("bvt2","bvt3")
Bifactor_model = selectBifactor(ydat, grpsize, copnames, gl)
The sample versions of the correlation, the upper semi-correlation (correlation in the joint upper quadrant), and the lower semi-correlation (correlation in the joint lower quadrant). These are sample linear (when both variables are continuous), polychoric (when both variables are ordinal), and polyserial (when one variable is continuous and the other is ordinal) correlations.
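For two continuous variables, the idea can be sketched as follows (a minimal illustration based on normal scores, not the semicorr() implementation):

# Minimal illustration for two continuous variables (not the semicorr() code):
# correlations of normal scores overall and in the joint upper/lower quadrants.
set.seed(1)
x <- rnorm(1000)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(1000)
zx <- qnorm(rank(x) / (length(x) + 1))     # normal scores of x
zy <- qnorm(rank(y) / (length(y) + 1))     # normal scores of y
up <- zx > 0 & zy > 0                      # joint upper quadrant
lo <- zx < 0 & zy < 0                      # joint lower quadrant
c(rho = cor(zx, zy), lrho = cor(zx[lo], zy[lo]), urho = cor(zx[up], zy[up]))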
semicorr(dat,type)
dat |
Data frame of mixed continuous and ordinal data. |
type |
A vector with 1's for the location of continuous data and 2's for the location of ordinal data. |
A matrix containing the following components for semicorr():
rho |
The correlation. |
lrho |
The lower semi-correlation. |
urho |
The upper semi-correlation. |
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
#correlation
continuous.PE1 <- -PE[,1]
continuous.PE <- cbind(continuous.PE1, PE[,2])
categorical.PE <- data.frame(apply(PE[, 3:5], 2, factor))
nPE <- cbind(continuous.PE, categorical.PE)
#-------------------------------------------------
# Semi-correlations
#-------------------------------------------------
# Exclude the dichotomous variable
sem.PE = nPE[,-3]
semicorr.PE = semicorr(dat = sem.PE, type = c(1,1,2,2))
#------------------------------------------------
# GSS Data
#------------------------------------------------
data(GSS)
attach(GSS)
continuous.GSS <- cbind(INCOME,AGE)
ordinal.GSS <- cbind(DEGREE,PINCOME,PDEGREE)
count.GSS <- cbind(CHILDREN,PCHILDREN)
# Transforming the COUNT variables to ordinal
# count1: CHILDREN
count1 = count.GSS[,1]
count1[count1 > 3] = 3
# count2: PCHILDREN
count2 = count.GSS[,2]
count2[count2 > 7] = 7
# Combining both transformed count variables
ncount.GSS = cbind(count1, count2)
# Combining ordinal and transformed count variables
categorical.GSS <- cbind(ordinal.GSS, ncount.GSS)
categorical.GSS <- data.frame(apply(categorical.GSS, 2, factor))
# combining continuous and categorical variables
nGSS = cbind(continuous.GSS, categorical.GSS)
#-------------------------------------------------
# Semi-correlations
#-------------------------------------------------
semicorr.GSS = semicorr(dat = nGSS, type = c(1, 1, rep(2,5)))
The Toronto Alexithymia Scale is the most widely used measure of alexithymia in empirical research. It is composed of 20 items that can be subdivided into three non-overlapping groups:
7 items to assess difficulty identifying feelings (DIF),
5 items to assess difficulty describing feelings (DDF), and
8 items to assess externally oriented thinking (EOT).
Students were 17 to 25 years old; 58% were female and 42% male. They were asked to respond to each item using one of K = 5 categories: “1 = completely disagree”, “2 = disagree”, “3 = neutral”, “4 = agree”, “5 = completely agree”.
data(TAS)
A data frame with 1925 observations on the following 20 items:
DIF
items: 1,3,6,7,9,13,14.
DDF
items: 2,4,11,12,17.
EOT
items: 5,8,10,15,16,18,19,20.
Briganti, G. and Linkowski, P. (2020). Network approach to items and domains from the Toronto Alexithymia Scale. Psychological Reports, 123, 2038–2052.
Williams, D. and Mulder, J. (2020). BGGM: Bayesian Gaussian Graphical Models. R package version 1.0.0.
Transforming a continuous or count variable to an ordinal variable with a given number of categories.
continuous2ordinal(continuous, categ) count2ordinal(count, categ)
continuous |
Matrix of continuous data. |
count |
Matrix of count data. |
categ |
The number of ordinal categories. |
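The underlying idea can be sketched with quantile-based cut points (an illustration only; the exact cut points used by continuous2ordinal() and count2ordinal() may differ):

# Sketch of quantile-based discretization into `categ` ordinal categories
# (illustration of the idea; not necessarily the cut points used internally).
to_ordinal <- function(x, categ) {
  cuts <- quantile(x, probs = seq(0, 1, length.out = categ + 1))
  as.integer(cut(x, breaks = unique(cuts), include.lowest = TRUE))
}
set.seed(1)
table(to_ordinal(rnorm(500), categ = 5))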
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
continuous.PE <- PE[, 1:2]
#Transforming the continuous to ordinal data:
tcontinuous = continuous2ordinal(continuous.PE, categ=5)
table(tcontinuous)
#Transforming the count to ordinal data:
set.seed(12345)
count.data = rpois(1000, 3)
tcount = count2ordinal(count.data, 5)
table(tcount)
Vuong's (1989) test for the comparison of non-nested factor copula models for mixed data. We compute Vuong's test between the factor copula model with BVN copulas (that is, the standard factor model) and a competing factor copula model to reveal whether the latter provides a better fit than the standard factor model.
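For reference, the Vuong statistic and its 95% confidence interval are functions of the per-observation log-likelihood differences between the two models; a minimal sketch (assuming vectors llik1 and llik2 of per-observation log-likelihoods, not the internals of vuong.1f()/vuong.2f()) is:

# Sketch of Vuong's statistic from per-observation log-likelihoods llik1
# (Model 1, e.g. the BVN factor copula) and llik2 (Model 2); illustration only.
vuong_sketch <- function(llik1, llik2) {
  di <- llik2 - llik1                            # pointwise log-likelihood differences
  n  <- length(di)
  se <- sd(di) / sqrt(n)
  z  <- mean(di) / se                            # Vuong (1989) test statistic
  ci <- mean(di) + c(-1, 1) * qnorm(0.975) * se  # 95% CI (Joe, 2014, p. 258)
  c(z = z, p.value = 2 * pnorm(-abs(z)), CI.left = ci[1], CI.right = ci[2])
}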
vuong.1f(cpar.bvn, cpar, copF1, continuous, ordinal, count, gl, param) vuong.2f(cpar.bvn, cpar, copF1, copF2, continuous, ordinal, count, gl, param)
cpar.bvn |
copula parameters of the factor copula model with BVN copulas. |
cpar |
copula parameters of the competing factor copula model. |
copF1 |
copula names for the first factor of the competing factor copula model. |
copF2 |
copula names for the second factor of the competing factor copula model. |
continuous |
matrix of continuous data. |
ordinal |
matrix of ordinal data. |
count |
matrix of count data. |
gl |
Gauss-Legendre quadrature nodes and weights. |
param |
parameterization of estimated copula parameters. If FALSE, then cpar are the actual copula parameters without any transformation/reparameterization. |
A vector containing the following components:
z |
the test statistic. |
p.value |
the p-value of the test. |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Kadhem, S.H. and Nikoloulopoulos, A.K. (2021) Factor copula models for mixed data. British Journal of Mathematical and Statistical Psychology, 74, 365–403. doi:10.1111/bmsp.12231.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
#------------------------------------------------
# Setting quadrature points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# PE Data
#------------------------------------------------
data(PE)
continuous.PE1 = -PE[,1]
continuous.PE2 = PE[,2]
continuous.PE <- cbind(continuous.PE1, continuous.PE2)
categorical.PE <- PE[, 3:5]
d <- ncol(PE)
#------------------------------------------------
# Estimation
#------------------------------------------------
# factor copula model with BVN copulas
cop1f.PE.bvn <- rep("bvn", d)
PE.bvn1f <- mle1factor(continuous.PE, categorical.PE, count=NULL,
                       copF1=cop1f.PE.bvn, gl, hessian = TRUE)
# Selected factor copula model
cop1f.PE <- c("joe", "joe", "rjoe", "joe", "gum")
PE.selected1f <- mle1factor(continuous.PE, categorical.PE, count=NULL,
                            copF1=cop1f.PE, gl, hessian = TRUE)
#------------------------------------------------
# Vuong's test
#------------------------------------------------
v1f.PE.selected <- vuong.1f(PE.bvn1f$cpar$f1, PE.selected1f$cpar$f1, cop1f.PE,
                            continuous.PE, categorical.PE, count=NULL, gl, param=FALSE)
Vuong's test (Vuong, 1989) is the sample version of the difference in Kullback-Leibler divergence between two models and can be used to compare two parametric models that may be non-nested. We provide the 95% confidence interval of the Vuong test statistic (Joe, 2014, page 258). If the interval is completely above 0, the model that is best according to the AICs provides a significantly better fit; if the interval contains 0, the two models are not significantly different.
vuong_FactorTree(models, y, A.m1, tau.m1, copnames.m1, tau.m2, copnames.m2,A.m2,nq)
models |
choose a number from (1,2,3,4,5,6,7). 1: Model 1 is 1-factor tree copula, Model 2 is 1-factor tree copula. 2: Model 1 is 2-factor tree copula, Model 2 is 2-factor tree copula. 3: Model 1 is 1-factor tree copula, Model 2 is 2-factor tree copula. 4: Model 1 is 1-factor copula, Model 2 is 1-factor tree copula. 5: Model 1 is 1-factor copula, Model 2 is 2-factor tree copula. 6: Model 1 is 2-factor copula, Model 2 is 1-factor tree copula. 7: Model 1 is 2-factor copula, Model 2 is 2-factor tree copula. |
y |
Matrix of ordinal item response data. |
A.m1 |
Vine array for model 1. |
A.m2 |
Vine array for model 2. |
tau.m1 |
vector of copula parameters in Kendall's tau scale for model 1. |
tau.m2 |
vector of copula parameters in Kendall's tau scale for model 2. |
copnames.m1 |
vector of names of copula families for model 1. |
copnames.m2 |
vector of names of copula families for model 2. |
nq |
Number of Gauss-Legendre quadrature points. |
A vector containing the following components:
z |
the test statistic. |
p.value |
the p-value of the test. |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Sayed H. Kadhem
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2022) Factor tree copula models for item response data. Arxiv e-prints, <arXiv: 2201.00339>. https://arxiv.org/abs/2201.00339.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
#------------------------------------------------
# Setting quadrature points
nq <- 25
#------------------------------------------------
# PTSD Data
#------------------------------------------------
data(PTSD)
ydat=PTSD
d=ncol(ydat);d
n=nrow(ydat);n
#vine tree structure
#selecting vine tree based on polychoric corr
rmat=polychoric0(ydat)$p
A.polychoric=selectFactorTrVine(y=ydat,rmat,alg=3)
A.polychoric=A.polychoric$VineTreeA
#------------------------------------------------
# M1 1-factor tree - M2 1-factor tree
tau.m1 = rep(0.4,d*2-1)
copnames.m1 = rep("bvn",d*2-1)
tau.m2 = rep(0.4,d*2-1)
copnames.m2 = rep("rgum",d*2-1)
vuong.1factortree = vuong_FactorTree(models=1, ydat, A.m1=A.polychoric, tau.m1,
                                     copnames.m1, tau.m2, copnames.m2, A.m2=A.polychoric, nq)
#------------------------------------------------
# M1 2-factor tree - M2 2-factor tree
tau.m1 = rep(0.4,d*3-1)
copnames.m1 = rep("bvn",d*3-1)
tau.m2 = rep(0.4,d*3-1)
copnames.m2 = rep("rgum",d*3-1)
vuong.2factortree = vuong_FactorTree(models=2, ydat, A.m1=A.polychoric, tau.m1,
                                     copnames.m1, tau.m2, copnames.m2, A.m2=A.polychoric, nq)
#------------------------------------------------
# M1 1-factor tree - M2 2-factor tree
tau.m1 = rep(0.4,d*2-1)
copnames.m1 = rep("bvn",d*2-1)
tau.m2 = rep(0.4,d*3-1)
copnames.m2 = rep("rgum",d*3-1)
vuong.12factortree = vuong_FactorTree(models=3, ydat, A.m1=A.polychoric, tau.m1,
                                      copnames.m1, tau.m2, copnames.m2, A.m2=A.polychoric, nq)
Vuong's test (Vuong, 1989) is the sample version of the difference in Kullback-Leibler divergence between two models and can be used to compare two parametric models that may be non-nested. We provide the 95% confidence interval of the Vuong test statistic (Joe, 2014, page 258). If the interval is completely above 0, the model that is best according to the AICs provides a significantly better fit; if the interval contains 0, the two models are not significantly different.
vuong_structured(models, cpar.m1, copnames.m1, cpar.m2, copnames.m2, y, grpsize)
models |
choose a number from (1,2,3,4). 1: Model 1 is bifactor, Model 2 is bifactor. 2: Model 1 is second-order, Model 2 is second-order. 3: Model 1 is second-order, Model 2 is bifactor. 4: Model 1 is bifactor, Model 2 is second-order. |
cpar.m1 |
vector of copula parameters for model 1, starting with copula parameters that link the items with the common factor (bifactor), or the group-specific factors with the common factor (second-order). |
cpar.m2 |
vector of copula parameters for model 2, starting with copula parameters that link the items with the common factor (bifactor), or the group-specific factors with the common factor (second-order). |
copnames.m1 |
vector of names of copula families for model 1, starting with copulas that link the items with common factor (bifactor), or group factors with common factor (second-order). |
copnames.m2 |
vector of names of copula families for model 2, starting with copulas that link the items with common factor (bifactor), or group factors with common factor (second-order). |
y |
matrix of ordinal data. |
grpsize |
vector indicating the size for each group, e.g., c(4,4,4) indicating four items in all three groups. |
A vector containing the following components:
z |
the test statistic. |
p.value |
the p-value of the test. |
CI.left |
lower/left endpoint of 95% confidence interval. |
CI.right |
upper/right endpoint of 95% confidence interval. |
Sayed H. Kadhem [email protected]
Aristidis K. Nikoloulopoulos [email protected]
Joe, H. (2014). Dependence Modelling with Copulas. Chapman and Hall/CRC.
Kadhem, S.H. and Nikoloulopoulos, A.K. (2023) Bi-factor and second-order copula models for item response data. Psychometrika, 88, 132–157. doi:10.1007/s11336-022-09894-2.
Vuong, Q.H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
#------------------------------------------------
# Setting quadrature points
nq <- 25
gl <- gauss.quad.prob(nq)
#------------------------------------------------
# TAS Data
#------------------------------------------------
data(TAS)
grp1=c(1,3,6,7,9,13,14)
grp2=c(2,4,11,12,17)
grp3=c(5,8,10,15,16,18,19,20)
ydat=TAS[,c(grp1,grp2,grp3)]
d=ncol(ydat);d
n=nrow(ydat);n
#Rearrange items within testlets
g1=length(grp1)
g2=length(grp2)
g3=length(grp3)
grpsize=c(g1,g2,g3) #group size
#number of groups
ngrp=length(grpsize)
#------------------------------------------------
# M1 bifactor - M2 bifactor
cpar.m1 = rep(0.6,d*2)
copnames.m1 = rep("bvn",d*2)
cpar.m2 = rep(1.6,d*2)
copnames.m2 = rep("rgum",d*2)
vuong.bifactor = vuong_structured(models=1, cpar.m1, copnames.m1,
                                  cpar.m2, copnames.m2, y=ydat, grpsize)
# M1 second-order - M2 second-order
cpar.m1 = rep(0.6,d+ngrp)
copnames.m1 = rep("bvn",d+ngrp)
cpar.m2 = rep(1.6,d+ngrp)
copnames.m2 = rep("rgum",d+ngrp)
vuong.second_order = vuong_structured(models=2, cpar.m1, copnames.m1,
                                      cpar.m2, copnames.m2, y=ydat, grpsize)
# M1 second-order - M2 bifactor
cpar.m1 = rep(0.6,d+ngrp)
copnames.m1 = rep("bvn",d+ngrp)
cpar.m2 = rep(1.6,d*2)
copnames.m2 = rep("rgum",d*2)
vuong.2ndO_bif = vuong_structured(models=3, cpar.m1, copnames.m1,
                                  cpar.m2, copnames.m2, y=ydat, grpsize)
# M1 bifactor - M2 second-order
cpar.m1 = rep(0.6,d*2)
copnames.m1 = rep("bvn",d*2)
cpar.m2 = rep(1.6,d+ngrp)
copnames.m2 = rep("rgum",d+ngrp)
vuong.bif_2ndO = vuong_structured(models=4, cpar.m1, copnames.m1,
                                  cpar.m2, copnames.m2, y=ydat, grpsize)