Title: | OGA+HDIC+Trim and High-Dimensional Linear Regression Models |
---|---|
Description: | Ing and Lai (2011) <doi:10.5705/ss.2010.081> proposed a high-dimensional model selection procedure comprising three steps: the orthogonal greedy algorithm (OGA), the high-dimensional information criterion (HDIC), and Trim. OGA sequentially selects input variables, and HDIC determines when to stop the OGA path. Trim then deletes irrelevant variables that remain after the second step. This package fits a high-dimensional linear regression model via OGA+HDIC+Trim. |
Authors: | Hai-Tang Chiou, Ching-Kang Ing, Tze Leung Lai |
Maintainer: | Hai-Tang Chiou |
License: | GPL-2 |
Version: | 1.0.0 |
Built: | 2024-11-19 06:32:43 UTC |
Source: | CRAN |
Select variables via the orthogonal greedy algorithm (OGA).
OGA(X, y, Kn = NULL, c1 = 5)
Argument | Description |
---|---|
X | Input matrix of n rows and p columns. |
y | Response vector of length n. |
Kn | The number of OGA iterations. |
c1 | The tuning parameter for the number of OGA iterations, used when Kn = NULL. Default is c1 = 5. |
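How Kn is chosen when it is left as NULL is not spelled out above. Ing and Lai (2011) let the number of OGA iterations grow like (n / log p)^{1/2}, and the sketch below assumes a rule of that form scaled by c1 and capped at p. The exact formula and the helper name default_Kn are assumptions for illustration, not confirmed by this page.

```r
# Assumed rule for the default number of OGA iterations when Kn = NULL:
#   Kn = max(1, min(floor(c1 * sqrt(n / log(p))), p))
# default_Kn is a hypothetical helper, not a function exported by Ohit.
default_Kn <- function(n, p, c1 = 5) {
  max(1, min(floor(c1 * sqrt(n / log(p))), p))
}

# Dimensions of Example 3 (n = 400, p = 4000):
# 5 * sqrt(400 / log(4000)) is about 34.7, so the assumed rule gives 34.
default_Kn(400, 4000)
```

Under this assumed rule, the cap at p only binds when p is small relative to n, and the lower bound of 1 guarantees at least one OGA iteration.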
Component | Description |
---|---|
n | The number of observations. |
p | The number of input variables. |
Kn | The number of OGA iterations. |
J_OGA | The index set of Kn variables sequentially selected by OGA. |
Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.
Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.
library(Ohit)
# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 400
p = 4000
q = 10
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
b = sqrt(3/(4 * q))
# The first q columns are the relevant predictors
x_relevant = matrix(rnorm(n * q), n, q)
# The other p - q columns are noise plus b times the row sum of the relevant predictors
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)
# Select variables via OGA
OGA(X, y)
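The greedy selection that OGA performs can be illustrated with a short base-R sketch. It is our own minimal version, not the package's implementation: it centers y and the columns of X, picks at each step the column with the largest |<u, x_j>| / ||x_j|| against the current residual u, and updates u by Gram-Schmidt orthogonalization so that u is always the residual of y on the selected columns. The name oga_path is hypothetical, and degenerate cases (zero-variance or exactly collinear columns) are not handled.

```r
# Minimal base-R sketch of the OGA selection path (illustration only;
# oga_path is a hypothetical helper, not the package's implementation).
oga_path <- function(X, y, Kn) {
  n <- nrow(X)
  # Center y and every column of X, as for a model with an intercept.
  u <- y - mean(y)                  # current residual, starts at centered y
  dX <- sweep(X, 2, colMeans(X))
  xnorms <- sqrt(colSums(dX^2))
  J <- integer(0)                   # selected indices, in selection order
  Q <- matrix(0, n, 0)              # orthonormal basis of selected columns
  for (k in seq_len(Kn)) {
    # Score each column by |<u, x_j>| / ||x_j|| and exclude chosen ones.
    score <- abs(drop(crossprod(dX, u))) / xnorms
    score[J] <- -Inf
    j <- which.max(score)
    # Gram-Schmidt: orthogonalize the new column against the basis.
    q <- dX[, j] - drop(Q %*% crossprod(Q, dX[, j]))
    q <- q / sqrt(sum(q^2))
    Q <- cbind(Q, q)
    J <- c(J, j)
    # Remove the component along q, so u stays the residual of the
    # least-squares fit of centered y on all selected columns.
    u <- u - q * sum(q * u)
  }
  J
}
```

On the example data above, oga_path(X, y, Kn = 10) can be compared with the first ten entries of the J_OGA component returned by OGA(X, y); agreement is expected only if the package centers and scores columns the same way.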
The first step sequentially selects input variables via the orthogonal greedy algorithm (OGA). The second step determines the number of OGA iterations using the high-dimensional information criterion (HDIC). The third step uses HDIC to remove irrelevant variables that remain after the second step.
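For reference, the three steps can be written out as follows. This is our reading of Ing and Lai (2011), assuming y and the columns of X are centered, with $\hat J_0 = \emptyset$ and $U^{(0)} = y$; $H_J$ denotes the projection onto the span of $\{x_j : j \in J\}$ and $\hat\sigma_J^2$ the mean squared residual of the least-squares fit on $J$. The penalty weight $w_n$ is $\log n$ for HDBIC, $c \log\log n$ for HDHQ, and a constant $c$ for HDAIC.

```latex
% Step 1 (OGA): greedy selection and residual update
\hat j_{k+1} = \arg\max_{1 \le j \le p,\; j \notin \hat J_k}
  \frac{\left| \sum_{t=1}^{n} U_t^{(k)} x_{tj} \right|}
       {\left( \sum_{t=1}^{n} x_{tj}^2 \right)^{1/2}},
\qquad \hat J_{k+1} = \hat J_k \cup \{\hat j_{k+1}\},
\qquad U^{(k+1)} = \left(I - H_{\hat J_{k+1}}\right) y

% Step 2 (HDIC): criterion and stopping point along the OGA path
\mathrm{HDIC}(J) = n \log \hat\sigma_J^2 + \#(J)\, w_n \log p,
\qquad \hat k_n = \arg\min_{1 \le k \le K_n} \mathrm{HDIC}(\hat J_k)

% Step 3 (Trim), for \hat k_n > 1; if \hat k_n = 1 the selected set is \{\hat j_1\}
\hat N_n = \left\{ \hat j_l :
  \mathrm{HDIC}\left(\hat J_{\hat k_n} \setminus \{\hat j_l\}\right)
  > \mathrm{HDIC}\left(\hat J_{\hat k_n}\right),\; 1 \le l \le \hat k_n \right\}
```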
Ohit(X, y, Kn = NULL, c1 = 5, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01, intercept = TRUE)
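Argument | Description |
---|---|
X | Input matrix of n rows and p columns. |
y | Response vector of length n. |
Kn | The number of OGA iterations. |
c1 | The tuning parameter for the number of OGA iterations, used when Kn = NULL. Default is c1 = 5. |
HDIC_Type | High-dimensional information criterion. The value must be "HDAIC", "HDBIC", or "HDHQ". Default is HDIC_Type = "HDBIC". |
c2 | The tuning parameter for HDIC_Type = "HDAIC". Default is c2 = 2. |
c3 | The tuning parameter for HDIC_Type = "HDHQ". Default is c3 = 2.01. |
intercept | Should an intercept be fitted? Default is intercept = TRUE. |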
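To make the role of HDIC_Type, c2, and c3 concrete, here is a base-R sketch that evaluates HDIC along a nested OGA path. It assumes HDIC(J) = n log(sigma2_J) + #(J) w_n log(p), with w_n = log(n) for "HDBIC", c3 log(log(n)) for "HDHQ", and c2 for "HDAIC"; this mapping of c2 and c3 is an assumption. hdic_path is a hypothetical helper, not part of Ohit, and it refits every nested model with lm.fit (always with an intercept) instead of reusing the OGA residuals.

```r
# Hypothetical helper: HDIC values along a nested selection path J,
# assuming HDIC(J) = n * log(sigma2_J) + #(J) * w_n * log(p).
hdic_path <- function(X, y, J, HDIC_Type = c("HDBIC", "HDHQ", "HDAIC"),
                      c2 = 2, c3 = 2.01) {
  HDIC_Type <- match.arg(HDIC_Type)
  n <- nrow(X)
  p <- ncol(X)
  # Penalty weight w_n for each criterion (assumed forms).
  wn <- switch(HDIC_Type,
               HDBIC = log(n),
               HDHQ  = c3 * log(log(n)),
               HDAIC = c2)
  vapply(seq_along(J), function(k) {
    # Least-squares fit with an intercept on the first k selected columns.
    fit <- lm.fit(cbind(1, X[, J[seq_len(k)], drop = FALSE]), y)
    sigma2 <- mean(fit$residuals^2)
    n * log(sigma2) + k * wn * log(p)
  }, numeric(1))
}
```

With h <- hdic_path(X, y, J), which.min(h) picks the stopping point, and J[seq_len(which.min(h))] plays the role of J_HDIC under these assumptions.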
Component | Description |
---|---|
n | The number of observations. |
p | The number of input variables. |
Kn | The number of OGA iterations. |
J_OGA | The index set of Kn variables sequentially selected by OGA. |
HDIC | The HDIC values along the OGA path. |
J_HDIC | The index set of variables determined by OGA+HDIC. |
J_Trim | The index set of variables determined by OGA+HDIC+Trim. |
betahat_HDIC | The estimated regression coefficients of the model determined by OGA+HDIC. |
betahat_Trim | The estimated regression coefficients of the model determined by OGA+HDIC+Trim. |
Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.
Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.
library(Ohit)
# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 400
p = 4000
q = 10
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
b = sqrt(3/(4 * q))
x_relevant = matrix(rnorm(n * q), n, q)
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)
# Fit a high-dimensional linear regression model via OGA+HDIC+Trim
Ohit(X, y, intercept = FALSE)
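The Trim step can be sketched in base R as well: starting from the OGA+HDIC set, each variable is deleted in turn and kept only if deleting it raises HDIC, following the rule in Ing and Lai (2011). hdic_value and trim are hypothetical helpers, not part of Ohit. They assume HDIC(J) = n log(sigma2_J) + #(J) w_n log(p), use the HDBIC weight w_n = log(n) by default, and always include an intercept.

```r
# Hypothetical helpers sketching the Trim step; an intercept is always
# included and HDIC(J) = n * log(sigma2_J) + #(J) * w_n * log(p) is assumed.
hdic_value <- function(X, y, J, wn) {
  fit <- lm.fit(cbind(1, X[, J, drop = FALSE]), y)
  nrow(X) * log(mean(fit$residuals^2)) + length(J) * wn * log(ncol(X))
}

trim <- function(X, y, J_HDIC, wn = log(nrow(X))) {
  # With a single selected variable, Trim keeps it.
  if (length(J_HDIC) == 1) return(J_HDIC)
  full <- hdic_value(X, y, J_HDIC, wn)
  # Keep the l-th variable only if deleting it from J_HDIC raises HDIC.
  keep <- vapply(seq_along(J_HDIC), function(l) {
    hdic_value(X, y, J_HDIC[-l], wn) > full
  }, logical(1))
  J_HDIC[keep]
}
```

Because each deletion is compared against the same full set, the checks are independent of one another, which keeps Trim to one extra least-squares fit per selected variable.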
This function returns predictions from a fitted "Ohit" object.
predict_Ohit(object, newX)
Argument | Description |
---|---|
object | Fitted "Ohit" model object. |
newX | Matrix of new values for X at which predictions are made. |
Component | Description |
---|---|
pred_HDIC | The predicted values based on the model determined by OGA+HDIC. |
pred_Trim | The predicted values based on the model determined by OGA+HDIC+Trim. |
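Given a selected index set, prediction amounts to a least-squares refit on those columns followed by a matrix product with the same columns of newX. The sketch below assumes that this is what pred_HDIC and pred_Trim represent (using J_HDIC and J_Trim, respectively); refit_predict is a hypothetical helper, not part of Ohit.

```r
# Hypothetical helper: refit least squares on the selected columns J of X
# and predict at the rows of newX (which must have the same p columns).
refit_predict <- function(X, y, J, newX, intercept = TRUE) {
  XJ <- X[, J, drop = FALSE]
  if (intercept) XJ <- cbind(1, XJ)
  beta <- lm.fit(XJ, y)$coefficients
  # drop = FALSE keeps a one-row newX as a matrix.
  newXJ <- newX[, J, drop = FALSE]
  if (intercept) newXJ <- cbind(1, newXJ)
  drop(newXJ %*% beta)
}
```

A single new observation should be passed as a one-row matrix, for example newX[1, , drop = FALSE], so that column indexing still works.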
Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.
Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.
library(Ohit)
# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 410
p = 4000
q = 10
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)
b = sqrt(3/(4 * q))
x_relevant = matrix(rnorm(n * q), n, q)
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)
# Fit on the first 400 observations and predict the remaining 10
# with intercept
fit1 = Ohit(X[1:400, ], y[1:400])
predict_Ohit(fit1, X[401, , drop = FALSE])  # one new observation as a one-row matrix
predict_Ohit(fit1, X[401:410, ])
# without intercept
fit2 = Ohit(X[1:400, ], y[1:400], intercept = FALSE)
predict_Ohit(fit2, X[401, , drop = FALSE])
predict_Ohit(fit2, X[401:410, ])