Title: | Generalized Factor Model |
---|---|
Description: | Implements the generalized factor model for ultra-high dimensional data with mixed-type variables. Two algorithms, variational EM and alternate maximization, are provided to fit the model. The factor matrix, the loading matrix, and the number of factors can all be estimated. The model can be applied in the social and behavioral sciences, economics and finance, and genomics to extract interpretable nonlinear factors. For details, see Wei Liu, Huazhen Lin, Shurong Zheng and Jin Liu (2021) <doi:10.1080/01621459.2021.1999818>. |
Authors: | Wei Liu [aut, cre], Huazhen Lin [aut], Shurong Zheng [aut], Jin Liu [aut], Jinyu Nie [aut] |
Maintainer: | Wei Liu <[email protected]> |
License: | GPL-3 |
Version: | 1.2.1 |
Built: | 2024-12-24 06:48:24 UTC |
Source: | CRAN |
This function is designed to choose the number of factors for a generalized factor model.
chooseFacNumber(XList, types, q_set = 2: 10, select_method = c("SVR", "IC"),offset=FALSE, dc_eps=1e-4, maxIter=30, verbose = TRUE, parallelList=NULL)
XList |
a list of matrices with the same number of rows n and different numbers of columns (p_1, p_2, ..., p_d); the observed mixed-type data matrices, where d is the number of variable types and p_j is the dimension of the variables of the j-th type. |
types |
a d-dimensional character vector, specify the type of variables. For example, |
q_set |
a vector of positive integers specifying the candidate values for the number of factors q. Optional, defaults to 2:10. |
select_method |
a string, specify the method to choose the number of factors. Two methods are supported: the singular value ratio (SVR) and information criterion (IC) based methods, default as 'SVR'. Empirically, 'SVR' is much faster than 'IC', especially for high-dimensional large-scale data. |
offset |
a logical value indicating whether to add an offset term (the total counts for each row in the count component of XList) when there are Poisson variables. |
dc_eps |
a positive real number specifying the tolerance for the change in the objective function between iterations. Optional, defaults to 1e-4. |
maxIter |
a positive integer specifying the maximum number of iterations. Optional, defaults to 30. |
verbose |
a logical value specifying whether to output information during the iteration process. Optional, defaults to TRUE. |
parallelList |
a list with two components: (1) parallel: a logical value (TRUE or FALSE) indicating whether to use parallel computing; optional, defaults to FALSE. (2) ncores: a positive integer specifying the number of cores when parallel computing is used. This argument plays its role if only |
return an integer value, the estimated number of factors.
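The singular value ratio idea behind select_method = "SVR" can be illustrated in base R. The sketch below is a generic version under stated assumptions: the helper `svr_choose` is hypothetical, it works on the singular values of a raw data matrix rather than on fitted model quantities, and it is not the package's internal criterion.

```r
# Hypothetical helper: choose the number of factors as the position of the
# largest ratio between consecutive singular values (illustration only).
svr_choose <- function(M, q_max = 10) {
  d <- svd(M)$d                             # singular values, in decreasing order
  q_max <- min(q_max, length(d) - 1)
  ratios <- d[1:q_max] / d[2:(q_max + 1)]   # d_k / d_{k+1} for k = 1, ..., q_max
  which.max(ratios)
}

set.seed(1)
n <- 100; p <- 40; q <- 3
H <- matrix(rnorm(n * q), n, q)             # latent factors
B <- matrix(rnorm(p * q), p, q)             # loadings
X <- H %*% t(B) + 0.1 * matrix(rnorm(n * p), n, p)   # low-noise linear factor data
q_hat <- svr_choose(X)
q_hat
```

In this low-noise simulation the largest gap should sit right after the third singular value, which matches the three factors used to generate `X`.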
Liu Wei
## mix of normal and Poisson
dat <- gendata(seed=1, n=60, p=60, type='norm_pois', q=2, rho=2)
## we set maxIter=2 for example.
hq <- chooseFacNumber(dat$XList, dat$types, verbose = FALSE, maxIter=2)
Factor analysis to extract latent linear factor and estimate loadings.
Factorm(X, q=NULL)
X |
a |
q |
an integer between 1 and |
return a list with class named fac, including the following components:
hH |
a |
hB |
a |
q |
an integer between 1 and |
sigma2vec |
a p-dimensional vector, the estimated variance for each error term in model. |
propvar |
a positive number between 0 and 1, the explained proportion of cumulative variance by the |
egvalues |
a n-dimensional(n<=p) or p-dimensional(p<n) vector, the eigenvalues of sample covariance matrix. |
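The components listed above can be mimicked with a principal-component estimator in base R. This is a minimal sketch, assuming the linear model X = H B' + E and the common normalization H'H/n = I; the function `pca_factor` is hypothetical and is not the source code of Factorm.

```r
# Hypothetical PCA-based estimator for a linear factor model (illustration only).
pca_factor <- function(X, q) {
  n <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)        # remove column means
  eig <- eigen(tcrossprod(Xc) / n, symmetric = TRUE)  # n x n eigenproblem
  hH <- sqrt(n) * eig$vectors[, 1:q, drop = FALSE]    # n x q factors, hH'hH / n = I
  hB <- crossprod(Xc, hH) / n                         # p x q loadings
  resid <- Xc - tcrossprod(hH, hB)                    # residual matrix
  list(hH = hH, hB = hB, q = q,
       sigma2vec = colMeans(resid^2),                 # error variance per variable
       propvar = sum(eig$values[1:q]) / sum(eig$values))
}

set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3) %*% matrix(rnorm(3 * 20), 3, 20) +
  matrix(rnorm(50 * 20), 50, 20)
fit <- pca_factor(X, q = 3)
str(fit)
```

The eigenproblem is solved on the n x n matrix here, which is convenient when n is smaller than p; for p < n the p x p covariance matrix would be the cheaper choice.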
Liu Wei
Fan, J., Xue, L., and Yao, J. (2017). Sufficient forecasting using factor models. Journal of Econometrics.
gfm.
dat <- gendata(n = 300, p = 500)
res <- Factorm(dat$X)
measurefun(res$hH, dat$H0) # the smallest canonical correlation
Generate simulated data from a high-dimensional generalized nonlinear factor model.
gendata(seed = 1, n = 300, p = 50, type = c('homonorm', 'heternorm', 'pois', 'bino', 'norm_pois', 'pois_bino', 'npb'), q = 6, rho = 1, n_bin=1)
seed |
a nonnegative integer, the random seed, default as 1. |
n |
a positive integer, the sample size. |
p |
a positive integer, the variable dimension. |
type |
a character string specifying the variable types of the generated data; defaults to 'homonorm', representing homogeneous Gaussian variables. |
q |
a positive integer, the number of factors. |
rho |
a positive number, controlling the magnitude of loading matrix. |
n_bin |
a positive integer, specify the number of trials for the binomial variables when |
This function provides a variety of mixes of different variable types: 'homonorm' generates data with only homogeneous normal variables; 'heternorm' generates data with only heterogeneous normal variables; 'pois' generates only Poisson variables; 'bino' generates only binomial variables; 'norm_pois' is a mix of normal and Poisson variables; 'pois_bino' is a mix of Poisson and binomial variables; and 'npb' is the most complex mix of normal, Poisson and binomial variables.
return a list including the following components:
X |
a |
XList |
a list of the above observed data matrices, each with the same n rows (observations) and with different numbers of columns (p_1, p_2, ..., p_d), p columns in total, where d is the number of variable types and p_j is the dimension of the variables of the j-th type. |
H0 |
a |
B0 |
a |
mu0 |
a p-dimensional vector, the true intercept terms. |
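To make the structure of the returned objects concrete, the following base R sketch simulates a normal and Poisson mix from a factor model. It is only an illustration under assumptions: the distributions chosen for the factors, loadings and intercepts, and the way rho scales the loadings, are guesses for the example and not the scheme used inside gendata.

```r
set.seed(1)
n <- 60; p1 <- 30; p2 <- 30; q <- 2; rho <- 2
p <- p1 + p2
H0  <- matrix(rnorm(n * q), n, q)                    # true factors, n x q
B0  <- rho * matrix(runif(p * q, -0.5, 0.5), p, q)   # true loadings, p x q (assumed scaling)
mu0 <- rnorm(p, sd = 0.3)                            # true intercepts, length p
eta <- tcrossprod(H0, B0) + matrix(mu0, n, p, byrow = TRUE)   # linear predictor, n x p

# Normal block: identity link plus unit-variance noise.
X_norm <- eta[, 1:p1] + matrix(rnorm(n * p1), n, p1)
# Poisson block: log link, so the mean is exp(eta).
X_pois <- matrix(rpois(n * p2, exp(eta[, (p1 + 1):p])), n, p2)

XList <- list(X_norm, X_pois)   # one matrix per variable type, all with n rows
X <- cbind(X_norm, X_pois)      # the same data as a single n x p matrix
str(XList)
```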
Wei Liu
dat <- gendata(n=300, p = 500)
str(dat)
This function implements the generalized factor model.
gfm(XList, types, q=10, offset=FALSE, dc_eps=1e-4, maxIter=30, verbose = TRUE, algorithm=c("VEM", "AM"))
XList |
a list of matrices with the same number of rows n and different numbers of columns (p_1, p_2, ..., p_d); the observed mixed-type data matrices, where d is the number of variable types and p_j is the dimension of the variables of the j-th type. |
types |
a d-dimensional character vector, specify the type of variables. For example, |
q |
a positive integer or empty, specifying the number of factors; defaults to 10. |
offset |
a logical value indicating whether to add an offset term (the total counts for each row in the count component of XList) when there are Poisson variables. |
dc_eps |
a positive real number specifying the relative tolerance of the objective function in the algorithm. Optional, defaults to 1e-4. |
maxIter |
a positive integer specifying the maximum number of iterations. Optional, defaults to 30. |
verbose |
a logical value (TRUE or FALSE) specifying whether to output information during the iteration process. Optional, defaults to TRUE. |
algorithm |
a string specifying the algorithm used to fit the model. Two algorithms are supported: variational EM (VEM) and alternate maximization (AM); the default is VEM. Empirically, we observed that VEM is more robust than AM to high-noise data. |
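The offset argument refers to the total counts per row of the count component. The base R sketch below shows what that quantity looks like on toy data. The names `depth`, `counts` and `log_offset` are hypothetical, and the use of log(total) as the offset in a log-link Poisson model is the usual GLM convention, assumed here rather than taken from the package source.

```r
set.seed(1)
n <- 200; p <- 30
depth <- runif(n, 0.5, 5)                                 # row-specific scale, e.g. sampling depth
counts <- matrix(rpois(n * p, lambda = depth * 4), n, p)  # cell (i, j) has mean 4 * depth[i]
total <- rowSums(counts)                                  # total counts for each row
log_offset <- log(total)                                  # assumed form of the offset under a log link

# Dividing by the row totals puts rows with different depths on a common scale.
prop <- counts / total
summary(total)
```

Without such a term, differences in row totals would have to be absorbed by the factors themselves, which is usually not the structure of interest.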
This function also has the MATLAB version at https://github.com/feiyoung/MGFM/blob/master/gfm.m.
return a list of class 'gfm' with the following components:
hH |
a n*q matrix, the estimated factor matrix. |
hB |
a p*q matrix, the estimated loading matrix. |
hmu |
a p-dimensional vector, the estimated intercept terms. |
obj |
a real number, the value of the objective function at convergence. |
q |
an integer, the used or estimated factor number. |
history |
a list including the following 7 components: (1) dB: the change in B at each iteration; (2) dH: the change in H at each iteration; (3) dc: the change in the objective function at each iteration; (4) c: the objective value at each iteration; (5) realIter: the number of iterations actually run until convergence; (6) maxIter: the maximum number of iterations allowed; (7) elapsedTime: the elapsed time. |
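The roles of dc_eps, maxIter and the history list can be illustrated with a toy alternating scheme. The sketch below assumes a purely Gaussian model, where alternate maximization reduces to alternating least squares; the function `am_gauss` is hypothetical and much simpler than the algorithms in gfm, which also handle Poisson and binomial variables.

```r
# Hypothetical alternating least squares for X ~ H B' (Gaussian case only).
am_gauss <- function(X, q, dc_eps = 1e-4, maxIter = 30) {
  n <- nrow(X)
  H <- matrix(rnorm(n * q), n, q)                     # random starting factors
  obj <- numeric(0)
  dc <- numeric(0)
  for (iter in seq_len(maxIter)) {
    B <- t(solve(crossprod(H), crossprod(H, X)))      # p x q: best loadings given H
    H <- t(solve(crossprod(B), crossprod(B, t(X))))   # n x q: best factors given B
    obj[iter] <- -sum((X - tcrossprod(H, B))^2)       # objective to be maximized
    dc[iter] <- if (iter > 1) {
      abs(obj[iter] - obj[iter - 1]) / abs(obj[iter - 1])   # relative change
    } else {
      Inf
    }
    if (dc[iter] < dc_eps) break                      # stop once the change is small
  }
  list(hH = H, hB = B, obj = obj[iter],
       history = list(dc = dc, c = obj, realIter = iter, maxIter = maxIter))
}

set.seed(1)
X <- matrix(rnorm(60 * 2), 60, 2) %*% matrix(rnorm(2 * 40), 2, 40) +
  matrix(rnorm(60 * 40), 60, 40)
fit <- am_gauss(X, q = 2)
fit$history$realIter
```

Each update can only increase the objective, so `history$c` is nondecreasing, and the loop ends either when the relative change falls below `dc_eps` or when `maxIter` is reached.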
Liu Wei
Bai, J. and Liao, Y. (2013). Statistical inferences using large estimated covariances for panel data and factor models.
## mix of normal and Poisson
dat <- gendata(seed=1, n=60, p=60, type='norm_pois', q=2, rho=2)
## we set maxIter=2 for example.
gfm2 <- gfm(dat$XList, dat$types, q=2, verbose = FALSE, maxIter=2)
measurefun(gfm2$hH, dat$H0, type='ccor')
measurefun(gfm2$hB, dat$B0, type='ccor')
Evaluate the smallest canonical correlation (ccor) coefficient or the trace statistic between two matrices; a larger ccor or trace statistic is better.
measurefun(hH, H, type=c('trace_statistic','ccor'))
hH |
a |
H |
a |
type |
a character taking value within |
return a real number.
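Both measures can be written in a few lines of base R. This is a sketch under assumptions: `smallest_ccor` and `trace_stat` are hypothetical helpers, the trace statistic is taken to be tr(H' P H) / tr(H' H) with P the projection onto the column space of the estimate, and measurefun may differ in details such as centering.

```r
# Smallest canonical correlation between the columns of hH and H,
# using stats::cancor (which centers both matrices by default).
smallest_ccor <- function(hH, H) min(cancor(hH, H)$cor)

# Trace statistic: share of H explained by projecting onto the columns of hH.
trace_stat <- function(hH, H) {
  P <- hH %*% solve(crossprod(hH), t(hH))   # projection onto col space of hH
  sum(diag(t(H) %*% P %*% H)) / sum(diag(crossprod(H)))
}

set.seed(1)
H  <- matrix(rnorm(100 * 2), 100, 2)        # "true" factors
hH <- H %*% matrix(c(2, 1, 0, 3), 2, 2)     # estimate spanning the same column space
smallest_ccor(hH, H)
trace_stat(hH, H)
```

Both measures are invariant to invertible linear transformations of the estimate, which suits factor models because factors are identified only up to such a transformation. In this example both values equal 1 up to rounding error.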
Liu Wei
dat <- gendata(n = 100, p = 200, q=2, rho=3)
res <- Factorm(dat$XList[[1]])
measurefun(res$hB, dat$B0)
This function implements the overdispersed generalized factor model.
overdispersedGFM(XList, types, q, offset=FALSE, epsELBO=1e-5, maxIter=30, verbose=TRUE)
XList |
a list of matrices with the same number of rows n and different numbers of columns (p_1, p_2, ..., p_d); the observed mixed-type data matrices, where d is the number of variable types and p_j is the dimension of the variables of the j-th type. |
types |
a d-dimensional character vector, specify the type of variables. For example, |
q |
a positive integer or empty, specify the number of factors. |
offset |
a logical value indicating whether to add an offset term (the total counts for each row in the count component of XList) when there are Poisson variables. |
epsELBO |
a positive real number specifying the relative tolerance of the ELBO function in the algorithm. Optional, defaults to 1e-5. |
maxIter |
a positive integer specifying the maximum number of iterations. Optional, defaults to 30. |
verbose |
a logical value (TRUE or FALSE) specifying whether to output information during the iteration process. Optional, defaults to TRUE. |
Overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data.
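A small base R simulation shows what overdispersion means for count data. The mechanism used here, extra Gaussian noise added to the log-mean of a Poisson variable, is one common way to generate overdispersed counts and is an assumption of this sketch, not a description of how OverGFM is parameterized.

```r
set.seed(1)
n <- 10000
eta <- 1                                        # common log-mean
y_pois <- rpois(n, exp(eta))                    # plain Poisson: variance close to the mean
y_over <- rpois(n, exp(eta + rnorm(n)))         # noisy log-mean: variance exceeds the mean

# Variance-to-mean ratio: about 1 for Poisson data, clearly above 1 when overdispersed.
ratio_pois <- var(y_pois) / mean(y_pois)
ratio_over <- var(y_over) / mean(y_over)
c(pois = ratio_pois, over = ratio_over)
```

A model that assumes variance equal to the mean would understate the noise in the second sample, which is the practical issue the overdispersed model is meant to address.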
return a list of class 'overdispersedGFM' with the following components:
hH |
a n*q matrix, the estimated factor matrix. |
hB |
a p*q matrix, the estimated loading matrix. |
hmu |
a p-dimensional vector, the estimated intercept terms. |
obj |
a real number, the value of the objective function at convergence. |
q |
an integer, the used or estimated factor number. |
history |
a list including the following 7 components: (1) dB: the change in B at each iteration; (2) dH: the change in H at each iteration; (3) dc: the change in the objective function at each iteration; (4) c: the objective value at each iteration; (5) realIter: the number of iterations actually run until convergence; (6) maxIter: the maximum number of iterations allowed; (7) elapsedTime: the elapsed time. |
Liu Wei
## mix of normal and Poisson
dat <- gendata(seed=1, n=60, p=60, type='norm_pois', q=2, rho=2)
## we set maxIter=2 for example.
gfm2 <- overdispersedGFM(dat$XList, dat$types, q=2, verbose = FALSE, maxIter=2)
measurefun(gfm2$hH, dat$H0, type='ccor')
measurefun(gfm2$hB, dat$B0, type='ccor')
This function is designed to choose the number of factors for the overdispersed generalized factor model by using the singular value ratio (SVR) based method.
OverGFMchooseFacNumber(XList, types, q_max=15,offset=FALSE, epsELBO=1e-4, maxIter=30, verbose = TRUE, threshold= 1e-2)
XList |
a list of matrices with the same number of rows n and different numbers of columns (p_1, p_2, ..., p_d); the observed mixed-type data matrices, where d is the number of variable types and p_j is the dimension of the variables of the j-th type. |
types |
a d-dimensional character vector, specify the type of variables. For example, |
q_max |
a positive integer specifying the upper bound on the number of factors; defaults to 15. |
offset |
a logical value indicating whether to add an offset term (the total counts for each row in the count component of XList) when there are Poisson variables. |
epsELBO |
a positive real number specifying the relative tolerance of the ELBO function in the algorithm. Optional, defaults to 1e-4. |
maxIter |
a positive integer specifying the maximum number of iterations. Optional, defaults to 30. |
verbose |
a logical value (TRUE or FALSE) specifying whether to output information during the iteration process. Optional, defaults to TRUE. |
threshold |
a positive real number, the threshold used to filter out small singular values in the SVR criterion. |
return an integer value, the estimated number of factors.
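The reason for filtering small singular values before taking ratios can be seen on a toy vector. The sketch below is an assumption about how such a threshold typically acts; the helper `svr_threshold` is hypothetical and is not the code used by OverGFMchooseFacNumber.

```r
# Hypothetical helper: drop singular values at or below `threshold`,
# then return the position of the largest ratio of consecutive values.
svr_threshold <- function(d, threshold = 1e-2) {
  d <- sort(d, decreasing = TRUE)
  d <- d[d > threshold]                 # filter out near-zero singular values
  if (length(d) < 2) return(length(d))
  which.max(d[-length(d)] / d[-1])      # d_k / d_{k+1} over the remaining values
}

d <- c(9, 8, 7, 0.5, 0.4, 1e-3, 1e-9)   # three strong values, then a decaying tail
q_filtered   <- svr_threshold(d)                  # ratios use 9, 8, 7, 0.5, 0.4 only
q_unfiltered <- which.max(d[-length(d)] / d[-1])  # tiny tail values dominate the ratios
c(filtered = q_filtered, unfiltered = q_unfiltered)
```

With filtering the largest ratio is 7 / 0.5 at position 3, whereas without it the ratio 1e-3 / 1e-9 at position 6 wins, even though those values carry essentially no signal.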
Liu Wei
## mix of normal and Poisson
dat <- gendata(seed=1, n=60, p=60, type='norm_pois', q=2, rho=2)
## we set maxIter=2 for example.
hq <- OverGFMchooseFacNumber(dat$XList, dat$types, verbose = FALSE, maxIter=2)