Title: | Variable Selection in Nonparametric Models using B-Splines |
---|---|
Description: | A variable selection method using B-Splines in multivariate nOnparametric Regression models Based on partial dErivatives Regularization (ABSORBER) implements a novel variable selection method in a nonlinear multivariate model using B-splines. For further details we refer the reader to the paper Savino, M. E. and Lévy-Leduc, C. (2024), <https://hal.science/hal-04434820>. |
Authors: | Mary E. Savino [aut, cre], Celine Levy-Leduc [ctb] |
Maintainer: | Mary E. Savino <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-12-13 06:49:29 UTC |
Source: | CRAN |
The absorber package consists of two functions: absorber() and plot_selection(). For further information on how to use these functions, we refer the reader to the vignette of the package.
Two datasets, x_obs and y_obs, are also provided with the package and are used in the examples of this manual and in the vignette.
Mary E. Savino
Maintainer: Mary E. Savino <[email protected]>
Savino, M. E. and Lévy-Leduc, C. (2024) A novel variable selection method in nonlinear multivariate models using B-splines with an application to geoscience. <https://hal.science/hal-04434820>.
This function implements the method described in Savino, M. E. and Lévy-Leduc, C. (2024) for variable selection in nonlinear multivariate settings where observations are assumed to satisfy a nonparametric regression model. Each observation point should belong to .
absorber(x, y, M = 3, K = 1, all.variables = NULL, parallel = FALSE, nbCore = 1)
x |
matrix of |
y |
vector containing the corresponding response variable associated to the input values |
M |
order of the B-spline basis used in the regression model. Default is 3 (quadratic B-splines). |
K |
number of evenly spaced knots to use in the B-spline basis. Default value is 1. |
all.variables |
list of characters or integers, labels of the variables. Default is NULL. |
parallel |
logical, if TRUE then a parallelized version of the code is used. Default is FALSE. |
nbCore |
numeric, number of cores used for parallelization when parallel is set to TRUE. Default is 1. |
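To make the roles of M and K concrete, the following sketch builds a B-spline basis of order M with K evenly spaced interior knots using the splines package that ships with R. It is only an illustration: the interval [0, 1], the knot placement and the use of splineDesign() are assumptions made here, not a description of how absorber() constructs its basis internally.

```r
library(splines)  # distributed with base R

M <- 3  # order of the B-splines (3 = quadratic), the default of absorber()
K <- 1  # number of evenly spaced interior knots, the default of absorber()

# Assumption: inputs lie in [0, 1] and the interior knots are evenly spaced.
interior <- seq(0, 1, length.out = K + 2)[-c(1, K + 2)]
# Boundary knots are repeated M times so the basis is complete on [0, 1].
knots <- c(rep(0, M), interior, rep(1, M))

# Evaluate every basis function on a grid of points inside (0, 1).
x_grid <- seq(0.01, 0.99, by = 0.01)
B <- splineDesign(knots = knots, x = x_grid, ord = M)

ncol(B)            # one column per basis function, i.e. M + K columns
range(rowSums(B))  # the basis functions sum to one at every grid point
```

Under these assumptions, increasing K or M adds columns to the basis and therefore gives each variable a more flexible fit, at the cost of more coefficients to estimate.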
selec.var |
list of vectors of the selected variables, one vector for each penalization parameter. |
aic.var |
vector of variables selected using AIC. |
    # --- Loading values of x --- #
    data('x_obs')
    # --- Loading values of the corresponding y --- #
    data('y_obs')
    x_trunc = x_obs[1:70,,drop=FALSE]
    y_trunc = y_obs[1:70]
    # --- Variable selection of f1 --- #
    absorber(x=x_trunc, y=y_trunc, M = 3)
    # --- Parallel computing --- #
    absorber(x=x_trunc, y=y_trunc, M = 3, parallel = TRUE, nbCore = 2)
This function produces a histogram of the variable selection percentage for each variable on which the regression function depends. It also displays the results obtained with the AIC.
plot_selection(object)
object |
output obtained with absorber(). |
This function produces a ggplot2::ggplot() plot to visualize the variables selected with absorber().
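As a rough illustration of what a "selection percentage" can look like, the sketch below computes, for each variable, the share of penalization parameters for which it is selected. The list selec_var is a hypothetical stand-in for the selec.var element described in the documentation of absorber(), and the computation is an assumption about how such a percentage may be defined; plot_selection() itself may proceed differently.

```r
# Hypothetical stand-in for selec.var: one vector of selected variables
# per penalization parameter (made-up values, for illustration only).
selec_var <- list(c(1, 2), c(1, 2, 4), c(2), c(1, 2))
all_variables <- 1:5

# Count, for each variable, how many of the vectors contain it.
counts <- vapply(
  all_variables,
  function(v) sum(vapply(selec_var, function(s) v %in% s, logical(1))),
  numeric(1)
)

# Convert the counts to percentages of the penalization parameters.
selection_pct <- 100 * counts / length(selec_var)
names(selection_pct) <- all_variables
print(selection_pct)
```

Variables with a percentage close to 100 are selected for almost every penalization parameter, which is the kind of information the histogram is meant to convey.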
    # --- Loading values of x --- #
    data('x_obs')
    # --- Loading values of the corresponding y --- #
    data('y_obs')
    x_trunc = x_obs[1:70,,drop=FALSE]
    y_trunc = y_obs[1:70]
    # --- Variable selection of f1 --- #
    res = absorber(x=x_trunc, y=y_trunc, M = 3)
    plot_selection(res)
An example of 700 observations of five input variables, used for the variable selection of the function f1 (see Savino and Lévy-Leduc (2024) for more details).
data("x_obs")
Numeric matrix of 700 rows and 5 columns.
An example of noisy observations obtained by adding Gaussian noise to the values of f1 at the input values contained in x_obs.rda. See Savino and Lévy-Leduc (2024) for the expression of f1.
data("y_obs")
Numeric vector of 700 values.
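For readers who want to try the method on their own data, the sketch below simulates a dataset with the same shape as x_obs and y_obs. Everything in it is hypothetical: the function f_toy, the uniform design on [0, 1] and the noise level are chosen for illustration and are not the f1 of Savino and Lévy-Leduc (2024). The call to absorber() is guarded so that the block also runs when the package is not installed.

```r
set.seed(1)
n <- 700  # number of observations, as in x_obs
p <- 5    # number of input variables, as in x_obs

# Assumption: inputs drawn uniformly on [0, 1] in each coordinate.
x_sim <- matrix(runif(n * p), nrow = n, ncol = p)

# Hypothetical regression function that depends on variables 1 and 3 only.
f_toy <- function(x) sin(2 * pi * x[, 1]) + 4 * (x[, 3] - 0.5)^2

# Noisy responses: function values plus Gaussian noise.
y_sim <- f_toy(x_sim) + rnorm(n, sd = 0.1)

# Run the selection on the first 70 rows, mirroring the manual's examples.
if (requireNamespace("absorber", quietly = TRUE)) {
  res_sim <- absorber::absorber(x = x_sim[1:70, , drop = FALSE],
                                y = y_sim[1:70], M = 3)
  str(res_sim)
}
```

With data of this kind, one would hope to see variables 1 and 3 among those selected, although nothing here guarantees that outcome.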