Title: | Partial Profile Score Feature Selection in High-Dimensional Generalized Linear Interaction Models |
---|---|
Description: | This is an implementation of the partial profile score feature selection (PPSFS) approach to generalized linear (interaction) models. The PPSFS is highly scalable even for ultra-high-dimensional feature space. See the paper by Xu, Luo and Chen (2021, <doi:10.4310/21-SII706>). |
Authors: | Zengchao Xu [aut, cre], Shan Luo [aut], Zehua Chen [aut] |
Maintainer: | Zengchao Xu <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-28 06:27:52 UTC |
Source: | CRAN |
ppsfs
: PPSFS for main effects.
ppsfsi
: PPSFS for interaction effects.
    ppsfs(
      x, y, family,
      keep = NULL, I0 = NULL, ...,
      ebicFlag = 1,
      maxK = min(NROW(x) - 1, NCOL(x) + length(I0)),
      verbose = FALSE
    )

    ppsfsi(
      x, y, family,
      keep = NULL, ...,
      ebicFlag = 1,
      maxK = min(NROW(x) - 1, choose(NCOL(x), 2)),
      verbose = FALSE
    )
x | Matrix. |
---|---|
y | Vector. |
family | |
keep | Initial set of features that are included in model fitting. |
I0 | Index set of interaction effects to be identified. |
... | Additional parameters for glm.fit. |
ebicFlag | The procedure stops when the EBIC increases after |
maxK | Maximum number of identified features. |
verbose | Print the procedure path? |
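To make the defaults concrete, the sketch below evaluates the default `maxK` expressions from the usage for the dimensions used in the examples, and computes an EBIC value for two Gaussian models. The `ebic()` and `rss()` helpers are hypothetical and not part of the package. The sketch assumes the extended BIC takes the form n log(RSS/n) + k log(n) + 2 gamma log choose(p, k) with gamma = 1, which may differ from what the package computes internally.

```r
# Default maxK for ppsfs() with n = 300, p = 1000 and I0 = NULL
n <- 300; p <- 1000; I0 <- NULL
maxK_main <- min(n - 1, p + length(I0))   # min(299, 1000)

# Default maxK for ppsfsi() with n = 300, p = 150
p2 <- 150
maxK_int <- min(n - 1, choose(p2, 2))     # min(299, 11175)

# Hypothetical helper (assumption, not the package's code): EBIC of a
# Gaussian linear model with k selected features out of p, up to a constant.
ebic <- function(rss, n, k, p, gamma = 1) {
  n * log(rss / n) + k * log(n) + 2 * gamma * lchoose(p, k)
}

# Small simulated data set: only the first two columns carry signal
set.seed(1)
x <- matrix(rnorm(n * 20), n)
y <- drop(x[, 1:2] %*% c(1.2, 1.4)) + rnorm(n)

# Residual sum of squares of the least-squares fit on columns idx (plus intercept)
rss <- function(idx) {
  sum(lm.fit(cbind(1, x[, idx, drop = FALSE]), y)$residuals^2)
}

ebic_true  <- ebic(rss(1:2), n, k = 2, p = 20)  # both signal features
ebic_under <- ebic(rss(1),   n, k = 1, p = 20)  # one signal feature missing
```

In this setup the model containing both signal features has the smaller EBIC, which is the kind of comparison an EBIC-based stopping rule relies on.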
Note that ppsfs(x, y, family="gaussian") is an implementation of the sequential lasso method proposed by Luo and Chen (doi:10/f6kfr6).
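For intuition about the Gaussian case, here is a minimal base-R sketch of a sequential, residual-driven forward selection. It is an illustration under stated assumptions, not the package's code: `seq_select()` is a hypothetical helper, each step is assumed to add the feature whose inner product with the current least-squares residual is largest in absolute value, and the path is assumed to stop at the first EBIC increase (EBIC taken as n log(RSS/n) + k log(n) + 2 gamma log choose(p, k)).

```r
# Hypothetical helper, not part of the ppsfs package.
seq_select <- function(x, y, maxK = min(nrow(x) - 1, ncol(x)), gamma = 1) {
  n <- nrow(x); p <- ncol(x)
  x <- scale(x)                      # center and scale columns so scores are comparable
  y <- y - mean(y)                   # centering y replaces an explicit intercept
  active <- integer(0)
  res <- y                           # residual of the null model
  best <- n * log(sum(res^2) / n)    # EBIC of the null model (up to a constant)
  for (k in seq_len(maxK)) {
    score <- abs(drop(crossprod(x, res)))
    score[active] <- -Inf            # never reselect an active feature
    cand <- c(active, which.max(score))
    r <- lm.fit(x[, cand, drop = FALSE], y)$residuals
    crit <- n * log(sum(r^2) / n) + k * log(n) + 2 * gamma * lchoose(p, k)
    if (crit >= best) break          # stop at the first EBIC increase
    active <- cand; res <- r; best <- crit
  }
  active
}

# Same simulation design as the main-effects example
set.seed(2022)
n <- 300; p <- 1000
x <- matrix(rnorm(n * p), n)
eta <- drop(x[, 1:3] %*% runif(3, 1.0, 1.5))
y <- eta + rnorm(n, sd = sd(eta) / 5)
selected <- seq_select(x, y)
print(selected)
```

The residual is recomputed after every step, so a feature that is correlated with an already selected one is scored only on what it adds beyond the current model.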
Index set of identified features.
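A typical next step is to refit the reduced model on the selected columns. The sketch below assumes the returned value is a vector of column indices of x (the main-effects case). The index set `A` is set by hand as a stand-in for the output of ppsfs(), so the snippet runs without the package.

```r
set.seed(2022)
n <- 300; p <- 50
x <- matrix(rnorm(n * p), n)
y <- drop(x[, 1:3] %*% c(1.0, 1.2, 1.5)) + rnorm(n, sd = 0.5)

# Hypothetical index set standing in for the value returned by ppsfs()
A <- c(1, 2, 3)

# Refit the reduced model (intercept plus selected columns) with stats::glm.fit
fit <- glm.fit(cbind(Intercept = 1, x[, A, drop = FALSE]), y, family = gaussian())
coefs <- fit$coefficients
print(coefs)
```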
Z. Xu, S. Luo and Z. Chen (2022). Partial profile score feature selection in high-dimensional generalized linear interaction models. Statistics and Its Interface. doi:10.4310/21-SII706
    ## ***************************************************
    ## Identify main-effect features
    ## ***************************************************
    set.seed(2022)
    n <- 300
    p <- 1000
    x <- matrix(rnorm(n*p), n)
    eta <- drop( x[, 1:3] %*% runif(3, 1.0, 1.5) )
    y <- eta + rnorm(n, sd=sd(eta)/5)
    print( A <- ppsfs(x, y, 'gaussian', verbose=TRUE) )

    ## ***************************************************
    ## Identify interaction effects
    ## ***************************************************
    set.seed(2022)
    n <- 300
    p <- 150
    x <- matrix(rnorm(n*p), n)
    eta <- drop( cbind(x[, 1:3], x[, 4:6]*x[, 7:9]) %*% runif(6, 1.0, 1.5) )
    y <- eta + rnorm(n, sd=sd(eta)/5)
    print( group <- ppsfsi(x, y, 'gaussian', verbose=TRUE) )
    print( A <- ppsfs(x, y, "gaussian", I0=group, verbose=TRUE) )
    print( A <- ppsfs(x, y, "gaussian", keep=c(1, "5:8"), I0=group, verbose=TRUE) )