Title: | Nonnegative Garrote Method Incorporating Hierarchical Relationships |
---|---|
Description: | An implementation of the nonnegative garrote method that incorporates hierarchical relationships among variables. The core function, HiGarrote(), offers an automated approach for analyzing experiments while respecting hierarchical structures among effects. For methodological details, refer to Yu and Joseph (2024) <doi:10.48550/arXiv.2411.01383>. This work is supported by U.S. National Science Foundation grant DMS-2310637. |
Authors: | Wei-Yang Yu [aut, cre], V. Roshan Joseph [aut] |
Maintainer: | Wei-Yang Yu <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2024-11-15 08:48:44 UTC |
Source: | CRAN |
Hamada and Wu (1992) analyzed an 18-run experiment designed to study blood glucose readings of a clinical testing device. The experiment contains one two-level factor and seven three-level quantitative factors, which are denoted by A through H.
data(blood_glucose)
A data frame with 18 rows and 9 columns.
Hamada, M. and Wu, C. F. J. (1992) "Analysis of Designed Experiments with Complex Aliasing," Journal of Quality Technology, 24, 130–137.
Hunter et al. (1982) used a 12-run Plackett-Burman design to investigate the effects of seven two-level factors on the fatigue life of weld-repaired castings. The seven factors are denoted by capital letters A through G.
data(cast_fatigue)
A data frame with 12 rows and 8 columns.
Hunter, G. B., Hodi, F. S., and Eagar, T. W. (1982) "High Cycle Fatigue of Weld Repaired Cast Ti-6Al-4V," Metallurgical Transactions A, 13, 1589–1594.
'HiGarrote()' provides an automatic method for analyzing experimental data. The function applies the nonnegative garrote method to select important effects while preserving their hierarchical structure. It first estimates the regression parameters using generalized ridge regression, where the ridge parameters are derived from a Gaussian process prior placed on the input-output relationship. These initial estimates are then used in the nonnegative garrote for effect selection.
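The first step described above, generalized ridge regression, can be illustrated with a minimal base-R sketch. The helper `gen_ridge()` and the ridge parameters in `lambda` are assumptions made for illustration only: in the package the ridge parameters are derived from a Gaussian process prior, whereas here they are simply fixed by hand.

```r
# Hypothetical helper (not part of HiGarrote): generalized ridge estimate
# with one ridge parameter per column of the model matrix U.
gen_ridge <- function(U, y, lambda) {
  U <- as.matrix(U)
  yc <- y - mean(y)  # center the response because U has no intercept column
  penalty <- diag(lambda, nrow = length(lambda))
  beta <- solve(crossprod(U) + penalty, crossprod(U, yc))
  setNames(drop(beta), colnames(U))
}

# Full 2^3 factorial with main effects and two-factor interactions
X <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
U <- model.matrix(~ .^2, X)[, -1]  # columns A, B, C, A:B, A:C, B:C

set.seed(1)
y <- 10 + 3 * U[, "A"] + 2 * U[, "A:B"] + rnorm(8, sd = 0.1)

# Assumed ridge parameters: interactions are penalized more than main effects
lambda <- c(rep(0.1, 3), rep(1, 3))
beta_init <- gen_ridge(U, y, lambda)
beta_init
```

Because this design is orthogonal, each estimate is the least squares estimate shrunk by the factor 8 / (8 + lambda_j), so a larger ridge parameter pulls the corresponding effect harder toward zero.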
HiGarrote( D, y, quali_id = NULL, quanti_id = NULL, heredity = "weak", U = NULL, me_num = NULL, quali_contr = NULL )
D |
An |
y |
A vector for the responses corresponding to |
quali_id |
A vector indexing qualitative factors. |
quanti_id |
A vector indexing quantitative factors. |
heredity |
Choice of heredity principles: weak or strong. The default is weak. |
U |
Optional. An |
me_num |
Optional. A |
quali_contr |
Optional. A list specifying the contrasts of factors.
|
A vector for the nonnegative garrote estimates of the identified effects.
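The two heredity principles named in the arguments can be made concrete with a small check. This is a sketch under the usual definitions (weak heredity: an interaction may be present if at least one parent main effect is present; strong heredity: both parents must be present). The helper `obeys_heredity()` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of HiGarrote): check whether a set of selected
# effects obeys weak or strong heredity. Interactions are written as "A:B".
obeys_heredity <- function(effects, heredity = c("weak", "strong")) {
  heredity <- match.arg(heredity)
  parts <- strsplit(effects, ":", fixed = TRUE)
  is_interaction <- lengths(parts) == 2
  mains <- effects[!is_interaction]
  ok <- vapply(parts[is_interaction], function(parents) {
    present <- parents %in% mains
    if (heredity == "weak") any(present) else all(present)
  }, logical(1))
  all(ok)
}

obeys_heredity(c("A", "A:B"), "weak")         # TRUE: parent A is present
obeys_heredity(c("A", "A:B"), "strong")       # FALSE: parent B is missing
obeys_heredity(c("A", "B", "A:B"), "strong")  # TRUE: both parents are present
```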
Yu, W. Y. and Joseph, V. R. (2024) "Automated Analysis of Experiments using Hierarchical Garrote," arXiv preprint arXiv:2411.01383.
# Cast fatigue experiment
data(cast_fatigue)
X <- cast_fatigue[,1:7]
y <- cast_fatigue[,8]
HiGarrote::HiGarrote(X, y)

# Blood glucose experiment
data(blood_glucose)
X <- blood_glucose[,1:8]
y <- blood_glucose[,9]
HiGarrote::HiGarrote(X, y, quanti_id = 2:8)

# Router bit experiment
data(router_bit)
X <- router_bit[, 1:9]
y <- router_bit[,10]
for(i in c(4,5)){
  my.contrasts <- matrix(c(-1,-1,1,1,1,-1,-1,1,-1,1,-1,1), ncol = 3)
  X[,i] <- as.factor(X[,i])
  contrasts(X[,i]) <- my.contrasts
  colnames(contrasts(X[,i])) <- paste0(".",1:(4-1))
}
U <- model.matrix(~.^2, X)
U <- U[, -1] # remove the unnecessary intercept terms from the model matrix
me_num = c(rep(1,3), rep(3,2), rep(1, 4))
quali_contr <- list(NULL, NULL, NULL,
                    matrix(c(-1,-1,1,1,1,-1,-1,1,-1,1,-1,1), ncol = 3),
                    matrix(c(-1,-1,1,1,1,-1,-1,1,-1,1,-1,1), ncol = 3),
                    NULL, NULL, NULL, NULL)
HiGarrote::HiGarrote(X, y, quali_id = c(4,5), U = U,
                     me_num = me_num, quali_contr = quali_contr)

# Experiments with replicates
# Generate simulated data
data(cast_fatigue)
X <- cast_fatigue[,1:7]
U <- data.frame(model.matrix(~.^2, X)[,-1])
error <- matrix(rnorm(24), ncol = 2) # two replicates for each run
y <- 20*U$A + 10*U$A.B + 5*U$A.C + error
HiGarrote::HiGarrote(X, y)
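The router bit example assigns the same 4 x 3 contrast matrix to both four-level factors. The base-R sketch below inspects that matrix on a toy factor; the names `contr4`, `f`, and `mm` are illustrative only. It shows that the columns sum to zero and are mutually orthogonal, and that a four-level factor contributes three main-effect columns to the model matrix, which appears to be what the entries of 3 in me_num correspond to.

```r
# Contrast matrix from the router bit example (filled column by column)
contr4 <- matrix(c(-1, -1,  1, 1,
                    1, -1, -1, 1,
                   -1,  1, -1, 1), ncol = 3)

colSums(contr4)    # each column sums to zero
crossprod(contr4)  # 4 times the identity: the columns are orthogonal

# Toy four-level factor coded with these contrasts
f <- factor(c(1, 2, 3, 4))
contrasts(f) <- contr4
colnames(contrasts(f)) <- paste0(".", 1:3)

mm <- model.matrix(~ f)
colnames(mm)       # intercept plus three main-effect columns
```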
'nnGarrote()' implements the nonnegative garrote method, as described in Yuan et al. (2009), for selecting important variables while preserving hierarchical structures. The method begins by obtaining the least squares estimates of the regression parameters under a linear model. These initial estimates are then used in the nonnegative garrote to perform variable selection. This function supports prediction based on the linear model fitted with the selected variables and their nonnegative garrote estimates. Note that this method is suitable only when the number of observations is much larger than the number of variables, ensuring that the least squares estimation remains reliable.
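To make the two steps concrete, here is a minimal base-R sketch of a plain nonnegative garrote: least squares estimates are computed first, then each one is multiplied by a shrinkage factor found by minimizing (1/2)||y - Z c||^2 + lambda * sum(c) subject to c >= 0, where column j of Z is the j-th predictor times its least squares estimate. The helper `nn_garrote_sketch()`, the coordinate descent solver, and the value of `lambda` are all assumptions for illustration. Unlike nnGarrote(), this sketch imposes no heredity constraints and does not choose the penalty automatically.

```r
# Hypothetical helper (not the package's nnGarrote): plain nonnegative garrote
# without heredity constraints, solved by coordinate descent.
nn_garrote_sketch <- function(U, y, lambda, n_iter = 200) {
  U <- scale(as.matrix(U), center = TRUE, scale = FALSE)  # center the columns
  yc <- y - mean(y)
  beta_ls <- qr.solve(U, yc)      # least squares initial estimates
  Z <- sweep(U, 2, beta_ls, `*`)  # z_j = u_j * beta_ls_j
  zz <- colSums(Z^2)
  shrink <- rep(1, ncol(Z))       # shrinkage factors, started at 1
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(ncol(Z))) {
      # partial residual with the j-th term left out
      r_j <- yc - Z[, -j, drop = FALSE] %*% shrink[-j]
      shrink[j] <- max(0, (sum(Z[, j] * r_j) - lambda) / zz[j])
    }
  }
  setNames(beta_ls * shrink, colnames(U))
}

set.seed(1)
n <- 200
U <- cbind(x1 = runif(n), x2 = runif(n), x3 = runif(n))
y <- 20 * U[, "x1"] + rnorm(n)  # only x1 is active

est <- nn_garrote_sketch(U, y, lambda = 20)
est
```

With many more observations than variables, as here, the least squares estimates are stable, so the estimate for x1 stays close to 20 while the small estimates for the inactive variables are typically shrunk all the way to zero.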
nnGarrote(U, y, new_U = NULL, heredity = "weak")
U |
An |
y |
A vector for the responses. |
new_U |
Optional. A matrix or data frame of the new model matrix for prediction. |
heredity |
Choice of heredity principles: weak or strong. The default is weak. |
If new_U is NULL, the function returns a vector for the nonnegative garrote estimates of the identified variables.
If new_U is not NULL, the function returns a list with:
beta_nng: a vector for the nonnegative garrote estimates of the identified variables.
pred: predictions for the output corresponding to new_U.
Yuan, M., Joseph, V. R., and Zou, H. (2009) "Structured Variable Selection and Estimation," The Annals of Applied Statistics, 3(4), 1738–1757.
x1 <- runif(1000)
x2 <- runif(1000)
x3 <- runif(1000)
error <- rnorm(1000)
X <- data.frame(x1, x2, x3)
U_all <- data.frame(model.matrix(~. + x1:x2 + x1:x3 + x2:x3 + I(x1^2) + I(x2^2) + I(x3^2), X))
# ":" is required for detecting the parent variables of a second-order interaction.
colnames(U_all) <- c("X.Intercept.", "x1", "x2", "x3", "x1:x1", "x2:x2", "x3:x3",
                     "x1:x2", "x1:x3", "x2:x3")
new_idx <- sample(1:1000, 800)
new_U <- U_all[new_idx,]
U_idx <- setdiff(1:1000, new_idx)
U <- U_all[U_idx,]
y_all <- 20*U_all$x1 + 15*U_all$`x1:x1` + 10*U_all$`x1:x2` + error
y <- y_all[U_idx]
nnGarrote(U, y, new_U)
Phadke (1986) described a 32-run experiment aimed at increasing the lifespan of router bits used in a routing process to cut printed wiring boards from a panel. The experiment contains seven two-level factors and two four-level qualitative factors, which are denoted by A through J, excluding I.
data(router_bit)
A data frame with 32 rows and 10 columns.
Phadke, M. S. (1986) "Design Optimization Case Studies," AT&T Technical Journal, 65, 51–68.