| Title: | Feature Screening via Adaptive Iterative Ridge (Air-HOLP and Air-OLS) |
|---|---|
| Description: | Implements two complementary high-dimensional feature screening methods, Adaptive Iterative Ridge High-dimensional Ordinary Least-squares Projection (Air-HOLP, suitable when the number of predictors p is greater than or equal to the sample size n) and Adaptive Iterative Ridge Ordinary Least Squares (Air-OLS, for n greater than p). Also provides helper functions to generate compound-symmetry and AR(1) correlated data, plus a unified Air() front end and a summary method. For methodological details see Joudah, Muller and Zhu (2025) <doi:10.1007/s11222-025-10599-6>. |
| Authors: | Ibrahim Joudah [aut, cre] (ORCID: <https://orcid.org/0000-0002-0397-6266>), Samuel Muller [aut] (ORCID: <https://orcid.org/0000-0002-3087-8127>), Houying Zhu [aut] (ORCID: <https://orcid.org/0000-0002-2515-7413>) |
| Maintainer: | Ibrahim Joudah <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-18 06:19:06 UTC |
| Source: | https://github.com/cran/AirScreen |
AirHOLP and AirOLS
Air is a high-level wrapper that applies either the
AirHOLP or AirOLS methods based on data
dimensions. It returns an AirResult object ready for inspection with
summary.
Air( X, y, m = NULL, screening_threshold = NULL, penalty = 10, penalty_type = c("adaptive", "fixed", "both"), method = c("auto", "AirHOLP", "AirOLS") )Air( X, y, m = NULL, screening_threshold = NULL, penalty = 10, penalty_type = c("adaptive", "fixed", "both"), method = c("auto", "AirHOLP", "AirOLS") )
X |
Numeric predictor matrix ( |
y |
Numeric response vector of length |
m |
Integer specifying the number of coefficients retained at each
adaptive-penalty iteration (default |
screening_threshold |
Integer specifying the number of screened features to display in summary. |
penalty |
Numeric scalar or vector with ridge penalty value(s)
(default |
penalty_type |
One of |
method |
|
When method = "auto" (default), AirHOLP is used
if ; otherwise AirOLS is chosen.
penalty_type chooses whether the ridge penalty
is selected adaptively, fixed at the supplied value(s), or
both (returning two sets of ranks).
Air checks the validity of inputs and substitute by defaults when invalid.
An object of class "AirResult", a named list that may
contain the following elements (depending on penalty_type):
order, order_adaptive, order_fixed
Integer vector of feature indices sorted by absolute Air-HOLP or Air-OLS score, from largest to smallest.
rank, rank_adaptive, rank_fixed
Integer
vector of feature ranks matching order.
Beta, Beta_adaptive, Beta_fixed
Numeric vector of Air-HOLP or Air-OLS coefficient estimates.
penalty, penalty_adaptive, penalty_fixed
Final ridge penalty value(s) used.
The helper summary.AirResult prints a concise summary.
Joudah, I., Muller, S., and Zhu, H. (2025). "Air-HOLP: Adaptive Regularized Feature Screening for High-Dimensional Data." Statistics and Computing. doi:10.1007/s11222-025-10599-6
## simple example (p > n -> AirHOLP) set.seed(314) X <- matrix(rnorm(100000), nrow = 200, ncol = 500) y <- X[, 1] + X[, 2] + X[, 3] + X[, 4] + 3*rnorm(200) result <- Air(X, y, penalty_type = "both") summary(result) ## multiple fixed penalty values result2 <- Air(X, y, penalty_type = "fixed", penalty = c(1, 100)) summary(result2)## simple example (p > n -> AirHOLP) set.seed(314) X <- matrix(rnorm(100000), nrow = 200, ncol = 500) y <- X[, 1] + X[, 2] + X[, 3] + X[, 4] + 3*rnorm(200) result <- Air(X, y, penalty_type = "both") summary(result) ## multiple fixed penalty values result2 <- Air(X, y, penalty_type = "fixed", penalty = c(1, 100)) summary(result2)
This function ranks features with the Adaptive Iterative Ridge
High-dimensional Ordinary Least-squares Projection (Air-HOLP) method of
Joudah et al. (2025) and returns both the per-feature ranks and the
ordered feature indices. AirHOLP is intended for the high-dimensional
case . When , use AirOLS instead.
AirHOLP( X, y, Threshold = min(ncol(X) - 1, ceiling(nrow(X)/log(nrow(X)))), r0 = 10, adapt = TRUE, iter = 10, Lambda, Un, XUn )AirHOLP( X, y, Threshold = min(ncol(X) - 1, ceiling(nrow(X)/log(nrow(X)))), r0 = 10, adapt = TRUE, iter = 10, Lambda, Un, XUn )
X |
Numeric predictor matrix of dimension |
y |
Numeric response vector of length |
Threshold |
Integer specifying the number of coefficients retained at
each adaptive-penalty iteration (default |
r0 |
Numeric initial ridge penalty (default |
adapt |
Logical; set to |
iter |
Integer; maximum number of iterations for adaptive-penalty
selection (default |
Lambda |
Eigenvalues of |
Un |
Eigenvectors of |
XUn |
|
The Threshold parameter controls how many coefficients are kept at each
iteration of the adaptive-penalty procedure.
The default value performs well in most
settings; changing it can reduce stability, so we recommend keeping the
default unless you have a specific reason to adjust it.
The parameters Lambda, Un, and XUn are helpful to run
AirHOLP on or more different y vectors for the
same X (to avoid repeated heavy computations).
An object of class AirResult containing
order_rInteger vector of feature indices sorted by absolute Air-HOLP score, from largest to smallest.
index_rInteger vector of feature ranks matching
order_r.
Beta_rNumeric vector of Air-HOLP coefficient estimates.
rFinal ridge-penalty value used.
iter_lastNumber of iterations performed for adaptive penalty selection.
Joudah, I., Muller, S., and Zhu, H. (2025). "Air-HOLP: Adaptive Regularized Feature Screening for High-Dimensional Data." Statistics and Computing. doi:10.1007/s11222-025-10599-6
# Example 1 (default parameters) set.seed(314) X <- matrix(rnorm(10000), nrow = 50, ncol = 200) y <- X[, 1] + X[, 10] + rnorm(50) result <- AirHOLP(X, y) str(result) result$order_r[1:7] # the top 7 features result$index_r[c(1, 10),] # ranks of the true features (x1, and x10) # Example 2 (multiple responses, same X) set.seed(314) X <- matrix(rnorm(2000000), nrow = 1000, ncol = 2000) y1 <- X[, 1] + X[, 2] + 6*rnorm(1000) y2 <- X[, 1] - X[, 2] + 12*rnorm(1000) y3 <- X[, 1] + X[, 2] - X[, 3] + 3*rnorm(1000) y4 <- X[, 1] - X[, 2] + X[, 3] + 9*rnorm(1000) XXT <- tcrossprod(X) eXXT <- eigen(XXT) Lambda <- eXXT$values Un <- eXXT$vectors XUn <- crossprod(X,Un) result1 <- AirHOLP(X, y1, Lambda = Lambda, Un = Un, XUn = XUn) result1$order_r[1:7] # the top 7 features result1$index_r[1:2,] # ranks of the true features (x1 and x2) result2 <- AirHOLP(X, y2, Lambda = Lambda, Un = Un, XUn = XUn) result2$order_r[1:7] # the top 7 features result2$index_r[1:2,] # ranks of the true features (x1 and x2) result3 <- AirHOLP(X, y3, Lambda = Lambda, Un = Un, XUn = XUn) result3$order_r[1:7] # the top 7 features result3$index_r[1:3,] # ranks of the true features (x1, x2, and x3) result4 <- AirHOLP(X, y4, Lambda = Lambda, Un = Un, XUn = XUn) result4$order_r[1:7] # the top 7 features result4$index_r[1:3,] # ranks of the true features (x1, x2, and x3) # Example 3 (multiple fixed penalties) set.seed(314) X <- matrix(rnorm(10000), nrow = 100, ncol = 200) y <- X[, 1] - X[, 2] + X[, 3] + 2*rnorm(100) result <- AirHOLP(X, y, r0 = c(1, 100, 10000), adapt = FALSE) str(result) result$order_r0[1:7,] # the top 7 features (for each penalty) result$index_r0[1:3,] # ranks of the true features (x1, x2, and x3)# Example 1 (default parameters) set.seed(314) X <- matrix(rnorm(10000), nrow = 50, ncol = 200) y <- X[, 1] + X[, 10] + rnorm(50) result <- AirHOLP(X, y) str(result) result$order_r[1:7] # the top 7 features result$index_r[c(1, 10),] # ranks of the true features (x1, and x10) # Example 2 (multiple responses, same X) set.seed(314) X <- matrix(rnorm(2000000), nrow = 1000, ncol = 2000) y1 <- X[, 1] + X[, 2] + 6*rnorm(1000) y2 <- X[, 1] - X[, 2] + 12*rnorm(1000) y3 <- X[, 1] + X[, 2] - X[, 3] + 3*rnorm(1000) y4 <- X[, 1] - X[, 2] + X[, 3] + 9*rnorm(1000) XXT <- tcrossprod(X) eXXT <- eigen(XXT) Lambda <- eXXT$values Un <- eXXT$vectors XUn <- crossprod(X,Un) result1 <- AirHOLP(X, y1, Lambda = Lambda, Un = Un, XUn = XUn) result1$order_r[1:7] # the top 7 features result1$index_r[1:2,] # ranks of the true features (x1 and x2) result2 <- AirHOLP(X, y2, Lambda = Lambda, Un = Un, XUn = XUn) result2$order_r[1:7] # the top 7 features result2$index_r[1:2,] # ranks of the true features (x1 and x2) result3 <- AirHOLP(X, y3, Lambda = Lambda, Un = Un, XUn = XUn) result3$order_r[1:7] # the top 7 features result3$index_r[1:3,] # ranks of the true features (x1, x2, and x3) result4 <- AirHOLP(X, y4, Lambda = Lambda, Un = Un, XUn = XUn) result4$order_r[1:7] # the top 7 features result4$index_r[1:3,] # ranks of the true features (x1, x2, and x3) # Example 3 (multiple fixed penalties) set.seed(314) X <- matrix(rnorm(10000), nrow = 100, ncol = 200) y <- X[, 1] - X[, 2] + X[, 3] + 2*rnorm(100) result <- AirHOLP(X, y, r0 = c(1, 100, 10000), adapt = FALSE) str(result) result$order_r0[1:7,] # the top 7 features (for each penalty) result$index_r0[1:3,] # ranks of the true features (x1, x2, and x3)
This function ranks features with the Adaptive Iterative Ridge Ordinary
Least Squares (Air-OLS) method of Joudah et al. (2025) and returns both
the per-feature ranks and the ordered feature indices. AirHOLP is intended
for the high-dimensional case . When , use
AirHOLP instead.
AirOLS( X, y, Threshold = min(ncol(X) - 1, ceiling(nrow(X)/log(nrow(X)))), r0 = 10, adapt = TRUE, iter = 10, Lambda, Up, XUp )AirOLS( X, y, Threshold = min(ncol(X) - 1, ceiling(nrow(X)/log(nrow(X)))), r0 = 10, adapt = TRUE, iter = 10, Lambda, Up, XUp )
X |
Numeric predictor matrix of dimension |
y |
Numeric response vector of length |
Threshold |
Integer specifying the number of coefficients retained at
each adaptive-penalty iteration (default |
r0 |
Numeric initial ridge penalty (default |
adapt |
Logical; set to |
iter |
Integer; maximum number of iterations for adaptive-penalty
selection (default |
Lambda |
Eigenvalues of |
Up |
Eigenvectors of |
XUp |
|
The Threshold parameter controls how many coefficients are kept at each
iteration of the adaptive-penalty procedure.
The default value performs well in most
settings; changing it can reduce stability, so we recommend keeping the
default unless you have a specific reason to adjust it.
The parameters Lambda, Up, and XUp are helpful to run
AirOLS on or more different y vectors for the
same X (to avoid repeated heavy computations).
An object of class AirResult containing
order_rInteger vector of feature indices sorted by absolute Air-OLS score, from largest to smallest.
index_rInteger vector of feature ranks matching
order_r.
Beta_rNumeric vector of Air-OLS coefficient estimates.
rFinal ridge-penalty value used.
iter_lastNumber of iterations performed for adaptive penalty selection.
Joudah, I., Muller, S., and Zhu, H. (2025). "Air-HOLP: Adaptive Regularized Feature Screening for High-Dimensional Data." Statistics and Computing. doi:10.1007/s11222-025-10599-6
# Example 1 (default parameters) set.seed(314) X <- matrix(rnorm(10000), nrow = 200, ncol = 50) y <- X[, 1] + X[, 10] + 2*rnorm(200) result <- AirOLS(X, y) str(result) result$order_r[1:7] # the top 7 features result$index_r[c(1, 10),] # ranks of the true features (x1, and x10) # Example 2 (multiple responses, same X) set.seed(314) X <- matrix(rnorm(2000000), nrow = 2000, ncol = 1000) y1 <- X[, 1] + X[, 2] + 6*rnorm(2000) y2 <- X[, 1] - X[, 2] + 12*rnorm(2000) y3 <- X[, 1] + X[, 2] - X[, 3] + 5*rnorm(2000) y4 <- X[, 1] - X[, 2] + X[, 3] + 10*rnorm(2000) XTX <- crossprod(X) eXTX <- eigen(XTX) Lambda <- eXTX$values Up <- eXTX$vectors XUp <- X%*%Up result1 <- AirOLS(X, y1, Lambda = Lambda, Up = Up, XUp = XUp) result1$order_r[1:7] # the top 7 features result1$index_r[1:2,] # ranks of the true features (x1 and x2) result2 <- AirOLS(X, y2, Lambda = Lambda, Up = Up, XUp = XUp) result2$order_r[1:7] # the top 7 features result2$index_r[1:2,] # ranks of the true features (x1 and x2) result3 <- AirOLS(X, y3, Lambda = Lambda, Up = Up, XUp = XUp) result3$order_r[1:7] # the top 7 features result3$index_r[1:3,] # ranks of the true features (x1, x2, and x3) result4 <- AirOLS(X, y4, Lambda = Lambda, Up = Up, XUp = XUp) result4$order_r[1:7] # the top 7 features result4$index_r[1:3,] # ranks of the true features (x1, x2, and x3) # Example 3 (multiple fixed penalties) set.seed(314) X <- matrix(rnorm(10000), nrow = 200, ncol = 100) y <- X[, 1] - X[, 2] + X[, 3] + 3*rnorm(200) result <- AirOLS(X, y, r0 = c(1, 100, 10000), adapt = FALSE) str(result) result$order_r0[1:7,] # the top 7 features for each penalty result$index_r0[1:3,] # ranks of the true features (x1, x2, and x3)# Example 1 (default parameters) set.seed(314) X <- matrix(rnorm(10000), nrow = 200, ncol = 50) y <- X[, 1] + X[, 10] + 2*rnorm(200) result <- AirOLS(X, y) str(result) result$order_r[1:7] # the top 7 features result$index_r[c(1, 10),] # ranks of the true features (x1, and x10) # Example 2 (multiple responses, same X) set.seed(314) X <- matrix(rnorm(2000000), nrow = 2000, ncol = 1000) y1 <- X[, 1] + X[, 2] + 6*rnorm(2000) y2 <- X[, 1] - X[, 2] + 12*rnorm(2000) y3 <- X[, 1] + X[, 2] - X[, 3] + 5*rnorm(2000) y4 <- X[, 1] - X[, 2] + X[, 3] + 10*rnorm(2000) XTX <- crossprod(X) eXTX <- eigen(XTX) Lambda <- eXTX$values Up <- eXTX$vectors XUp <- X%*%Up result1 <- AirOLS(X, y1, Lambda = Lambda, Up = Up, XUp = XUp) result1$order_r[1:7] # the top 7 features result1$index_r[1:2,] # ranks of the true features (x1 and x2) result2 <- AirOLS(X, y2, Lambda = Lambda, Up = Up, XUp = XUp) result2$order_r[1:7] # the top 7 features result2$index_r[1:2,] # ranks of the true features (x1 and x2) result3 <- AirOLS(X, y3, Lambda = Lambda, Up = Up, XUp = XUp) result3$order_r[1:7] # the top 7 features result3$index_r[1:3,] # ranks of the true features (x1, x2, and x3) result4 <- AirOLS(X, y4, Lambda = Lambda, Up = Up, XUp = XUp) result4$order_r[1:7] # the top 7 features result4$index_r[1:3,] # ranks of the true features (x1, x2, and x3) # Example 3 (multiple fixed penalties) set.seed(314) X <- matrix(rnorm(10000), nrow = 200, ncol = 100) y <- X[, 1] - X[, 2] + X[, 3] + 3*rnorm(200) result <- AirOLS(X, y, r0 = c(1, 100, 10000), adapt = FALSE) str(result) result$order_r0[1:7,] # the top 7 features for each penalty result$index_r0[1:3,] # ranks of the true features (x1, x2, and x3)
Newton-Raphson root finder
newton(f, fp, x, tol = 0.001, m = 100)newton(f, fp, x, tol = 0.001, m = 100)
f |
Function whose root is sought. |
fp |
Derivative function of |
x |
Numeric starting value. |
tol |
Convergence tolerance (default |
m |
Maximum number of iterations (default |
Iterates until the change is below tol
or m iterations are reached (then issues a warning).
The estimated root.
# Solve x^2 - 2 = 0 newton(function(x) x^2 - 2, function(x) 2*x, 1)# Solve x^2 - 2 = 0 newton(function(x) x^2 - 2, function(x) 2*x, 1)
rnormAR1 efficiently generates samples from a multivariate
normal distribution with AR(1)
correlation .
rnormAR1(n, p, rho = 0.5, means = 0, variances = 1)rnormAR1(n, p, rho = 0.5, means = 0, variances = 1)
n |
Number of observations. |
p |
Number of variables. |
rho |
Autoregressive correlation coefficient. |
means |
Numeric vector of feature means (length |
variances |
Numeric vector of feature variances (length |
A numeric matrix with the specified correlation,
means, and variances.
X1 <- rnormAR1(10, 5) X2 <- rnormAR1(10, 5, rho = 0.3, means = 2, variances = 4) X3 <- rnormAR1(10, 5, rho = 0.4, means = 1:5, variances = 3:7)X1 <- rnormAR1(10, 5) X2 <- rnormAR1(10, 5, rho = 0.3, means = 2, variances = 4) X3 <- rnormAR1(10, 5, rho = 0.4, means = 1:5, variances = 3:7)
rnormCS efficiently generates samples from a multivariate
normal distribution with compound symmetry correlation structure (all
features equally correlated).
rnormCS(n, p, rho = 0.5, means = 0, variances = 1)rnormCS(n, p, rho = 0.5, means = 0, variances = 1)
n |
Number of observations. |
p |
Number of features. |
rho |
Common correlation coefficient. |
means |
Numeric vector of feature means (length |
variances |
Numeric vector of feature variances (length |
A numeric matrix with the specified correlation,
means, and variances.
X1 <- rnormCS(10, 5) X2 <- rnormCS(10, 5, rho = 0.3, means = 2, variances = 4) X3 <- rnormCS(10, 5, rho = 0.4, means = 1:5, variances = 3:7)X1 <- rnormCS(10, 5) X2 <- rnormCS(10, 5, rho = 0.3, means = 2, variances = 4) X3 <- rnormCS(10, 5, rho = 0.4, means = 1:5, variances = 3:7)
AirResult ObjectProduces a compact, human-readable summary of the ranking results
returned by Air.
## S3 method for class 'AirResult' summary(object, ...)## S3 method for class 'AirResult' summary(object, ...)
object |
An object of class |
... |
Additional arguments (ignored; included for S3 compatibility). |
Invisibly returns object. The function is invoked for
its printing side-effect.
set.seed(314) X <- matrix(rnorm(500), 50, 100) y <- X[, 1] - 2*X[, 3] + rnorm(50) res <- Air(X, y, penalty_type = "both") summary(res)set.seed(314) X <- matrix(rnorm(500), 50, 100) y <- X[, 1] - 2*X[, 3] + rnorm(50) res <- Air(X, y, penalty_type = "both") summary(res)