Title: Estimation, Variable Selection and Prediction for Functional Semiparametric Models
Description: Routines for the estimation or simultaneous estimation and variable selection in several functional semiparametric models with scalar responses are provided. These models include the functional single-index model, the semi-functional partial linear model, and the semi-functional partial linear single-index model. Additionally, the package offers algorithms for handling scalar covariates with linear effects that originate from the discretisation of a curve. This functionality is applicable in the context of the linear model, the multi-functional partial linear model, and the multi-functional partial linear single-index model.
Authors: German Aneiros [aut], Silvia Novo [aut, cre]
Maintainer: Silvia Novo <[email protected]>
License: GPL (>= 2)
Version: 1.1.1
Built: 2024-12-01 08:41:38 UTC
Source: CRAN
This package provides routines for estimation, and for simultaneous estimation and variable selection, in several functional semiparametric models with scalar response. These include the functional single-index model, the semi-functional partial linear model, and the semi-functional partial linear single-index model. Additionally, it includes algorithms for estimation and variable selection in linear models and bi-functional partial linear models when the scalar covariates with linear effects are derived from the discretisation of a curve. The package also offers routines for kernel- and kNN-based estimation using Nadaraya-Watson weights in models with a nonparametric or semiparametric component, as well as S3 methods (predict, plot, print, summary) to facilitate statistical analysis across all the considered models and estimation procedures.
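As orientation, the workflow is the same across all these models: fit, inspect via the S3 methods, and predict. The following minimal sketch uses the Tecator data and the fsim.kernel.fit and fsim.kernel.test routines documented later in this manual; the argument values are illustrative only.

library(fsemipar)
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2
train <- 1:160
test <- 161:215

# Fit a functional single-index model (class 'fsim.kernel').
fit <- fsim.kernel.fit(x = X[train, ], y = y[train], max.q.h = 0.35,
  nknot.theta = 4, nknot = 20, range.grid = c(850, 1050))

# S3 methods shared across the fitted classes listed below.
summary(fit)
plot(fit)

# Out-of-sample prediction via the matching *.test routine.
pred <- fsim.kernel.test(x = X[train, ], y = y[train], x.test = X[test, ],
  y.test = y[test], theta = fit$theta.est, h = fit$h.opt,
  nknot.theta = 4, nknot = 20, range.grid = c(850, 1050))
pred$MSE.test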
The package can be divided into several thematic sections:
- Estimation of the functional single-index model: predict, plot, summary and print methods for the fsim.kernel and fsim.kNN classes.
- Simultaneous estimation and variable selection in linear and semi-functional partial linear models:
  - Linear model: predict, summary, plot and print methods for the lm.pels class.
  - Semi-functional partial linear model: predict, summary, plot and print methods for the sfpl.kernel and sfpl.kNN classes.
  - Semi-functional partial linear single-index model: predict, summary, plot and print methods for the sfplsim.kernel and sfplsim.kNN classes.
- Algorithms for impact point selection in models with covariates derived from the discretisation of a curve:
  - Linear model: predict, summary, plot and print methods for the PVS class.
  - Bi-functional partial linear model: predict, summary, plot and print methods for the PVS.kernel and PVS.kNN classes.
  - Bi-functional partial linear single-index model: predict, summary, plot and print methods for the FASSMR.kernel, FASSMR.kNN, IASSMR.kernel and IASSMR.kNN classes.
German Aneiros [aut], Silvia Novo [aut, cre]
Maintainer: Silvia Novo <[email protected]>
Aneiros, G., and Vieu, P. (2014) Variable selection in infinite-dimensional problems. Statistics and Probability Letters, 94, 12–20, doi:10.1016/j.spl.2014.06.025.
Aneiros, G., Ferraty, F., and Vieu, P. (2015) Variable selection in partial linear regression with functional covariate. Statistics, 49, 1322–1347, doi:10.1080/02331888.2014.998675.
Aneiros, G., and Vieu, P. (2015) Partial linear modelling with multi-functional covariates. Computational Statistics, 30, 647–671, doi:10.1007/s00180-015-0568-8.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
Novo, S., Aneiros, G., and Vieu, P. (2021) Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables. TEST, 30, 481–504, doi:10.1007/s11749-020-00728-w.
Novo, S., Aneiros, G., and Vieu, P. (2021) A kNN procedure in semiparametric functional data analysis. Statistics and Probability Letters, 171, 109028, doi:10.1016/j.spl.2020.109028.
Novo, S., Vieu, P., and Aneiros, G. (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
This function implements the Fast Algorithm for Sparse Semiparametric Multi-functional Regression (FASSMR) with kernel estimation. This algorithm is specifically designed for estimating multi-functional partial linear single-index models, which incorporate multiple scalar variables and a functional covariate as predictors. These scalar variables are derived from the discretisation of a curve and have a linear effect, while the functional covariate exhibits a single-index effect.
FASSMR selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kernel estimation using Nadaraya-Watson weights. It uses B-spline expansions to represent curves and eligible functional indexes. Additionally, it utilises an objective criterion (criterion) to determine the initial number of covariates in the reduced model (w.opt), the bandwidth (h.opt), and the penalisation parameter (lambda.opt).
FASSMR.kernel.fit(x, z, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3,
  nknot.theta = 3, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100,
  vn = ncol(z), nfolds = 10, seed = 123, wn = c(10, 15, 20), criterion = "GCV",
  penalty = "grSCAD", max.iter = 1000, n.core = NULL)
x |
Matrix containing the observations of the functional covariate collected by row (functional single-index component). |
z |
Matrix containing the observations of the functional covariate that is discretised collected by row (linear component). |
y |
Vector containing the scalar response. |
seed.coeff |
Vector of initial values used to build the set |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range determined by min.q.h and max.q.h. |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i. e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn=ncol(z), which leads to the individual penalisation of each scalar covariate. |
nfolds |
Positive integer indicating the number of cross-validation folds (used when criterion="k-fold-CV"). The default is 10. |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when criterion="k-fold-CV"). The default is 123. |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section Details. The default is c(10,15,20). |
criterion |
The criterion used to select the tuning and regularisation parameters: "GCV", "BIC", "AIC" or "k-fold-CV". The default is "GCV". |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The multi-functional partial linear single-index model (MFPLSIM) is given by the expression
$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+r\left(\langle\theta_0,X_i\rangle\right)+\varepsilon_i,\quad i=1,\dots,n,$$
where:
- $Y_i$ is a real random response and $X_i$ denotes a random element belonging to some separable Hilbert space $\mathcal{H}$ with inner product denoted by $\langle\cdot,\cdot\rangle$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on some interval $[a,b]$, which is observed at the points $a\le t_1<\dots<t_{p_n}\le b$.
- $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients and $r(\cdot)$ denotes a smooth unknown link function. In addition, $\theta_0$ is an unknown functional direction in $\mathcal{H}$.
- $\varepsilon_i$ denotes the random error.
In the MFPLSIM, we assume that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ form part of the model. Therefore, we must select the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) and estimate the model.
In this function, the MFPLSIM is fitted using the FASSMR algorithm. The main idea of this algorithm is to consider a reduced model, with only a few linear covariates (but covering the entire discretisation interval of $\zeta$), directly discarding the remaining linear covariates (since they are expected to contain very similar information about the response).
To explain the algorithm, we assume, without loss of generality, that the number $p_n$ of linear covariates can be expressed as follows: $p_n=q_nw_n$, with $q_n$ and $w_n$ integers.
This consideration allows us to build a subset of the initial $p_n$ linear covariates containing only $w_n$ equally spaced discretised observations of $\zeta$ covering the entire interval $[a,b]$. This subset is the following:
$$\mathcal{R}_n^{\mathbf{1}}=\left\{\zeta\left(t_k^{\mathbf{1}}\right),\ k=1,\dots,w_n\right\},$$
where $t_k^{\mathbf{1}}=t_{[(2k-1)q_n/2]}$ and $[z]$ denotes the smallest integer not less than the real number $z$.
We consider the following reduced model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{1}}$:
$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{\mathbf{1}}\zeta_i\left(t_k^{\mathbf{1}}\right)+r^{\mathbf{1}}\left(\left\langle\theta_0^{\mathbf{1}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{1}}.$$
The program receives the eligible numbers of linear covariates for building the reduced model through the argument wn.
Then, the penalised least-squares variable selection procedure, with kernel estimation, is applied to the reduced model. This is done using the function sfplsim.kernel.fit, which requires the remaining arguments (for details, see the documentation of sfplsim.kernel.fit). The estimates obtained are the outputs of the FASSMR algorithm. For further details on this algorithm, see Novo et al. (2021).
Remark: If the condition $p_n=w_nq_n$ is not met (so $q_n=p_n/w_n$ is not an integer), the function considers variable $q_n=q_{n,k}$ values, $k=1,\dots,w_n$. Specifically:
$$q_{n,k}=\left\{\begin{array}{ll}\left[p_n/w_n\right]+1, & k\in\{1,\dots,p_n-w_n\left[p_n/w_n\right]\},\\ \left[p_n/w_n\right], & k\in\{p_n-w_n\left[p_n/w_n\right]+1,\dots,w_n\},\end{array}\right.$$
where $[z]$ denotes the integer part of the real number $z$.
The function supports parallel computation. To avoid it, set n.core=1.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
beta.est |
Estimate of $\beta_0$ when the optimal tuning parameters (w.opt, h.opt, lambda.opt and vn.opt) are used. |
beta.red |
Estimate of $\beta_0^{\mathbf{1}}$ in the reduced model when the optimal tuning parameters (w.opt, h.opt, lambda.opt and vn.opt) are used. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis, when the optimal tuning parameters are used: a vector of length order.Bspline+nknot.theta. |
indexes.beta.nonnull |
Indexes of the non-zero linear coefficients $\hat{\beta}_j$. |
h.opt |
Selected bandwidth (when w.opt is considered). |
w.opt |
Selected size for the reduced model, i.e. the selected value from the sequence wn. |
lambda.opt |
Selected value for the penalisation parameter (when w.opt is considered). |
IC |
Value of the criterion function used to select w.opt, h.opt, lambda.opt and vn.opt. |
vn.opt |
Selected value of vn (when w.opt is considered). |
beta.w |
Estimate of $\beta_0^{\mathbf{1}}$ for each value of the sequence wn. |
theta.w |
Estimate of $\theta_0^{\mathbf{1}}$ for each value of the sequence wn (i.e. its coefficients in the B-spline basis). |
IC.w |
Value of the criterion function for each value of the sequence wn. |
indexes.beta.nonnull.w |
Indexes of the non-zero linear coefficients for each value of the sequence wn. |
lambda.w |
Selected value of the penalisation parameter for each value of the sequence wn. |
h.w |
Selected bandwidth for each value of the sequence wn. |
index01 |
Indexes of the covariates (in the entire set of $p_n$) used to build the reduced model for each value of the sequence wn. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Vieu, P., and Aneiros, G. (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
See also sfplsim.kernel.fit, predict.FASSMR.kernel, plot.FASSMR.kernel and IASSMR.kernel.fit.
Alternative method: FASSMR.kNN.fit.
data(Sugar)
y <- Sugar$ash
x <- Sugar$wave.290
z <- Sugar$wave.240

# Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

# Dataset to model
x.sug <- x[!index.atip, ]
z.sug <- z[!index.atip, ]
y.sug <- y[!index.atip]

train <- 1:216

ptm <- proc.time()
fit <- FASSMR.kernel.fit(x = x.sug[train, ], z = z.sug[train, ], y = y.sug[train],
  nknot.theta = 2, lambda.min.l = 0.03, max.q.h = 0.35, nknot = 20,
  criterion = "BIC", max.iter = 5000)
proc.time() - ptm
This function implements the Fast Algorithm for Sparse Semiparametric Multi-functional Regression (FASSMR) with kNN estimation. This algorithm is specifically designed for estimating multi-functional partial linear single-index models, which incorporate multiple scalar variables and a functional covariate as predictors. These scalar variables are derived from the discretisation of a curve and have a linear effect, while the functional covariate exhibits a single-index effect.
FASSMR selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kNN estimation using Nadaraya-Watson weights. It uses B-spline expansions to represent curves and eligible functional indexes. Additionally, it utilises an objective criterion (criterion) to determine the initial number of covariates in the reduced model (w.opt), the number of neighbours (k.opt), and the penalisation parameter (lambda.opt).
FASSMR.kNN.fit(x, z, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3,
  nknot.theta = 3, knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100,
  vn = ncol(z), nfolds = 10, seed = 123, wn = c(10, 15, 20), criterion = "GCV",
  penalty = "grSCAD", max.iter = 1000, n.core = NULL)
x |
Matrix containing the observations of the functional covariate collected by row (functional single-index component). |
z |
Matrix containing the observations of the functional covariate that is discretised collected by row (linear component). |
y |
Vector containing the scalar response. |
seed.coeff |
Vector of initial values used to build the set |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
knearest |
Vector of positive integers containing the sequence in which the number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i. e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn=ncol(z), which leads to the individual penalisation of each scalar covariate. |
nfolds |
Positive integer indicating the number of cross-validation folds (used when criterion="k-fold-CV"). The default is 10. |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when criterion="k-fold-CV"). The default is 123. |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section Details. The default is c(10,15,20). |
criterion |
The criterion used to select the tuning and regularisation parameters: "GCV", "BIC", "AIC" or "k-fold-CV". The default is "GCV". |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The multi-functional partial linear single-index model (MFPLSIM) is given by the expression
$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+r\left(\langle\theta_0,X_i\rangle\right)+\varepsilon_i,\quad i=1,\dots,n,$$
where:
- $Y_i$ is a real random response and $X_i$ denotes a random element belonging to some separable Hilbert space $\mathcal{H}$ with inner product denoted by $\langle\cdot,\cdot\rangle$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on some interval $[a,b]$, which is observed at the points $a\le t_1<\dots<t_{p_n}\le b$.
- $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients and $r(\cdot)$ denotes a smooth unknown link function. In addition, $\theta_0$ is an unknown functional direction in $\mathcal{H}$.
- $\varepsilon_i$ denotes the random error.
In the MFPLSIM, we assume that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ form part of the model. Therefore, we must select the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) and estimate the model.
In this function, the MFPLSIM is fitted using the FASSMR algorithm. The main idea of this algorithm is to consider a reduced model, with only a few linear covariates (but covering the entire discretisation interval of $\zeta$), directly discarding the remaining linear covariates (since they are expected to contain very similar information about the response).
To explain the algorithm, we assume, without loss of generality, that the number $p_n$ of linear covariates can be expressed as follows: $p_n=q_nw_n$, with $q_n$ and $w_n$ integers.
This consideration allows us to build a subset of the initial $p_n$ linear covariates containing only $w_n$ equally spaced discretised observations of $\zeta$ covering the entire interval $[a,b]$. This subset is the following:
$$\mathcal{R}_n^{\mathbf{1}}=\left\{\zeta\left(t_k^{\mathbf{1}}\right),\ k=1,\dots,w_n\right\},$$
where $t_k^{\mathbf{1}}=t_{[(2k-1)q_n/2]}$ and $[z]$ denotes the smallest integer not less than the real number $z$.
We consider the following reduced model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{1}}$:
$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{\mathbf{1}}\zeta_i\left(t_k^{\mathbf{1}}\right)+r^{\mathbf{1}}\left(\left\langle\theta_0^{\mathbf{1}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{1}}.$$
The program receives the eligible numbers of linear covariates for building the reduced model through the argument wn.
Then, the penalised least-squares variable selection procedure, with kNN estimation, is applied to the reduced model. This is done using the function sfplsim.kNN.fit, which requires the remaining arguments (for details, see the documentation of sfplsim.kNN.fit). The estimates obtained are the outputs of the FASSMR algorithm. For further details on this algorithm, see Novo et al. (2021).
Remark: If the condition $p_n=w_nq_n$ is not met (so $q_n=p_n/w_n$ is not an integer), the function considers variable $q_n=q_{n,k}$ values, $k=1,\dots,w_n$. Specifically:
$$q_{n,k}=\left\{\begin{array}{ll}\left[p_n/w_n\right]+1, & k\in\{1,\dots,p_n-w_n\left[p_n/w_n\right]\},\\ \left[p_n/w_n\right], & k\in\{p_n-w_n\left[p_n/w_n\right]+1,\dots,w_n\},\end{array}\right.$$
where $[z]$ denotes the integer part of the real number $z$.
The function supports parallel computation. To avoid it, set n.core=1.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
beta.est |
Estimate of $\beta_0$ when the optimal tuning parameters (w.opt, k.opt, lambda.opt and vn.opt) are used. |
beta.red |
Estimate of $\beta_0^{\mathbf{1}}$ in the reduced model when the optimal tuning parameters (w.opt, k.opt, lambda.opt and vn.opt) are used. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis, when the optimal tuning parameters are used: a vector of length order.Bspline+nknot.theta. |
indexes.beta.nonnull |
Indexes of the non-zero linear coefficients $\hat{\beta}_j$. |
k.opt |
Selected number of nearest neighbours (when w.opt is considered). |
w.opt |
Selected size for the reduced model, i.e. the selected value from the sequence wn. |
lambda.opt |
Selected value for the penalisation parameter (when w.opt is considered). |
IC |
Value of the criterion function used to select w.opt, k.opt, lambda.opt and vn.opt. |
vn.opt |
Selected value of vn (when w.opt is considered). |
beta.w |
Estimate of $\beta_0^{\mathbf{1}}$ for each value of the sequence wn. |
theta.w |
Estimate of $\theta_0^{\mathbf{1}}$ for each value of the sequence wn (i.e. its coefficients in the B-spline basis). |
IC.w |
Value of the criterion function for each value of the sequence wn. |
indexes.beta.nonnull.w |
Indexes of the non-zero linear coefficients for each value of the sequence wn. |
lambda.w |
Selected value of the penalisation parameter for each value of the sequence wn. |
k.w |
Selected number of neighbours for each value of the sequence wn. |
index01 |
Indexes of the covariates (in the entire set of $p_n$) used to build the reduced model for each value of the sequence wn. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Vieu, P., and Aneiros, G. (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
See also sfplsim.kNN.fit, predict.FASSMR.kNN, plot.FASSMR.kNN and IASSMR.kNN.fit.
Alternative method: FASSMR.kernel.fit.
data(Sugar)
y <- Sugar$ash
x <- Sugar$wave.290
z <- Sugar$wave.240

# Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

# Dataset to model
x.sug <- x[!index.atip, ]
z.sug <- z[!index.atip, ]
y.sug <- y[!index.atip]

train <- 1:216

ptm <- proc.time()
fit <- FASSMR.kNN.fit(x = x.sug[train, ], z = z.sug[train, ], y = y.sug[train],
  nknot.theta = 2, lambda.min.l = 0.03, max.knn = 20, nknot = 20,
  criterion = "BIC", max.iter = 5000)
proc.time() - ptm
fit
names(fit)
The package includes the following internal functions, based on the code by F. Ferraty, which is available on his website at https://www.math.univ-toulouse.fr/~ferraty/SOFTWARES/NPFDA/index.html.
approx.spline.deriv
Bspline.ini
fnp.kernel.fit
fnp.kernel.fit.test
fnp.kernel.test
fnp.kNN.fit
fnp.kNN.fit.test
fnp.kNN.fit.test.loc
fnp.kNN.GCV
fnp.kNN.test
fsim.kernel.fit.fixedtheta
fsim.kNN.fit.fixedtheta
fun.kernel
fun.kernel.fixedtheta
fun.kNN
fun.kNN.fixedtheta
funopare.kNN
H.fnp.kernel
H.fnp.kNN
H.fsim.kernel
H.fsim.kNN
interp.spline.deriv
quad
semimetric.deriv
semimetric.interv
semimetric.pca
sfplsim.kernel.fit.fixedtheta
sfplsim.kNN.fit.fixedtheta
Splinemlf
symsolve
This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kernel estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.
The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the bandwidth (h.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs a joint minimisation of the LOOCV objective function in both the bandwidth and the functional index.
fsim.kernel.fit(x, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3,
  nknot.theta = 3, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, n.core = NULL)
x |
Matrix containing the observations of the functional covariate (i.e. curves) collected by row. |
y |
Vector containing the scalar response. |
seed.coeff |
Vector of initial values used to build the set $\Theta_n$ of eligible functional indexes (see section Details). The default is c(-1,0,1). |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range determined by min.q.h and max.q.h. |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.
The FSIM is fitted using the kernel estimator
$$\widehat{r}_{h,\hat{\theta}}(x)=\sum_{i=1}^{n}w_{n,h,\hat{\theta}}(x,X_i)Y_i,$$
with Nadaraya-Watson weights
$$w_{n,h,\hat{\theta}}(x,X_i)=\frac{K\left(h^{-1}d_{\hat{\theta}}(X_i,x)\right)}{\sum_{i'=1}^{n}K\left(h^{-1}d_{\hat{\theta}}(X_{i'},x)\right)},$$
where:
- the real positive number $h$ is the bandwidth;
- $K$ is a kernel function (see the argument kind.of.kernel);
- $d_{\hat{\theta}}(\chi_1,\chi_2)=|\langle\hat{\theta},\chi_1-\chi_2\rangle|$ is the projection semi-metric, and $\hat{\theta}$ is an estimate of $\theta_0$.
The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). Then, we build a set $\Theta_n$ of eligible functional indexes by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The larger this set is, the greater the size of $\Theta_n$. Since our approach requires intensive computation, a trade-off between the size of $\Theta_n$ and the performance of the estimator is necessary. For that, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of $\Theta_n$, see Novo et al. (2019).
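A minimal sketch of how such a candidate set can be built from seed.coeff, assuming the identifiability calibration amounts to fixing the sign of the first coefficient and the norm of each coefficient vector (the exact calibration used by the package follows Novo et al. (2019) and may differ in detail):

# All coefficient vectors of length d = nknot.theta + order.Bspline with
# entries in seed.coeff, minus the null vector, sign- and norm-calibrated.
build.Theta.n <- function(seed.coeff = c(-1, 0, 1), d = 6) {
  cand <- as.matrix(expand.grid(rep(list(seed.coeff), d)))
  cand <- cand[rowSums(cand != 0) > 0, ]   # drop the null vector
  cand <- cand[cand[, 1] >= 0, ]           # sign calibration (assumed)
  cand / sqrt(rowSums(cand^2))             # norm calibration (assumed)
}
nrow(build.Theta.n())  # 3^6 = 729 candidates, reduced by the calibration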
We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected bandwidth (h.opt) by minimising the LOOCV criterion. This function performs a joint minimisation in both parameters, the bandwidth and the functional index, and supports parallel computation. To avoid parallel computation, set n.core=1.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta. |
h.opt |
Selected bandwidth. |
r.squared |
Coefficient of determination. |
var.res |
Residual variance. |
df |
Residual degrees of freedom. |
yhat.cv |
Predicted values for the scalar response using leave-one-out samples. |
CV.opt |
Minimum value of the CV function, i.e. the value of CV for theta.est and h.opt. |
CV.values |
Vector containing the CV values for each functional index in $\Theta_n$. |
H |
Hat matrix. |
m.opt |
Index of $\hat{\theta}$ in the set $\Theta_n$. |
theta.seq.norm |
Matrix containing the normalised coefficients (in the B-spline basis) of the eligible functional indexes in $\Theta_n$, collected by row. |
h.seq |
Sequence of eligible values for the bandwidth. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also fsim.kernel.test, predict.fsim.kernel and plot.fsim.kernel.
Alternative procedure: fsim.kNN.fit.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kernel.fit(y = y[1:160], x = X[1:160, ], max.q.h = 0.35, nknot = 20,
  range.grid = c(850, 1050), nknot.theta = 4)
proc.time() - ptm
fit
names(fit)
This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kernel estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.
The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the bandwidth (h.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs an iterative minimisation of the LOOCV objective function, starting from an initial set of coefficients (gamma) for the functional index.
fsim.kernel.fit.optim(x, y, nknot.theta = 3, order.Bspline = 3, gamma = NULL,
  min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, threshold = 0.005)
x |
Matrix containing the observations of the functional covariate (i.e. curves) collected by row. |
y |
Vector containing the scalar response. |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
gamma |
Vector indicating the initial coefficients for the functional index used in the iterative procedure. By default, it is a vector of ones. The size of the vector is determined by the sum nknot.theta+order.Bspline. |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range determined by min.q.h and max.q.h. |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is |
threshold |
The convergence threshold for the LOOCV function (scaled by the variance of the response). The default is 0.005. |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.
The FSIM is fitted using the kernel estimator
$$\widehat{r}_{h,\hat{\theta}}(x)=\sum_{i=1}^{n}w_{n,h,\hat{\theta}}(x,X_i)Y_i,$$
with Nadaraya-Watson weights
$$w_{n,h,\hat{\theta}}(x,X_i)=\frac{K\left(h^{-1}d_{\hat{\theta}}(X_i,x)\right)}{\sum_{i'=1}^{n}K\left(h^{-1}d_{\hat{\theta}}(X_{i'},x)\right)},$$
where:
- the real positive number $h$ is the bandwidth;
- $K$ is a kernel function (see the argument kind.of.kernel);
- $d_{\hat{\theta}}(\chi_1,\chi_2)=|\langle\hat{\theta},\chi_1-\chi_2\rangle|$ is the projection semi-metric, and $\hat{\theta}$ is an estimate of $\theta_0$.
The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline).
We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected bandwidth (h.opt) by minimising the LOOCV criterion. This function performs an iterative minimisation procedure, starting from an initial set of coefficients (gamma) for the functional index. Given a functional index, the optimal bandwidth according to the LOOCV criterion is selected. For a given bandwidth, the minimisation in the functional index is performed using the R function optim. The procedure is iterated until convergence. For details, see Ferraty et al. (2013).
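The alternation can be summarised as below, assuming a hypothetical helper loocv(theta, h) that returns the leave-one-out error of the kernel estimator (the package computes this internally):

# Alternating minimisation of LOOCV(theta, h): given theta, pick the best h
# on a grid; given h, refine theta with optim(); stop when the decrease in
# LOOCV, scaled by var(y), falls below 'threshold'.
fit.optim.sketch <- function(y, gamma, h.seq, loocv, threshold = 0.005) {
  theta <- gamma
  cv.old <- Inf
  repeat {
    cv.h <- sapply(h.seq, function(h) loocv(theta, h))  # step 1: bandwidth
    h <- h.seq[which.min(cv.h)]
    opt <- optim(theta, function(th) loocv(th, h))      # step 2: index
    theta <- opt$par
    if ((cv.old - opt$value) / var(y) < threshold) break
    cv.old <- opt$value
  }
  list(theta.est = theta, h.opt = h, CV.opt = opt$value)
}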
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta. |
h.opt |
Selected bandwidth. |
r.squared |
Coefficient of determination. |
var.res |
Residual variance. |
df |
Residual degrees of freedom. |
CV.opt |
Minimum value of the LOOCV function, i.e. the value of LOOCV for theta.est and h.opt. |
err |
Value of the LOOCV function divided by the variance of the response in each iteration of the procedure. |
H |
Hat matrix. |
h.seq |
Sequence of eligible values for the bandwidth. |
CV.hseq |
CV values for each bandwidth in the sequence h.seq. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2013) Functional projection pursuit regression. Test, 22, 293–320, doi:10.1007/s11749-012-0306-2.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also predict.fsim.kernel and plot.fsim.kernel.
Alternative procedures: fsim.kNN.fit.optim, fsim.kernel.fit and fsim.kNN.fit.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kernel.fit.optim(y = y[1:160], x = X[1:160, ], max.q.h = 0.35,
  nknot = 20, range.grid = c(850, 1050), nknot.theta = 4)
proc.time() - ptm
fit
names(fit)
This function computes predictions for a functional single-index model (FSIM) with a scalar response, which is estimated using the Nadaraya-Watson kernel estimator. It requires a functional index ($\theta$), a global bandwidth (h), and the new observations of the functional covariate (x.test) as inputs.
fsim.kernel.test(x, y, x.test, y.test = NULL, theta, nknot.theta = 3,
  order.Bspline = 3, h = 0.5, kind.of.kernel = "quad", range.grid = NULL,
  nknot = NULL)
x |
Matrix containing the observations of the functional covariate in the training sample, collected by row. |
y |
Vector containing the scalar responses in the training sample. |
x.test |
Matrix containing the observations of the functional covariate in the testing sample, collected by row. |
y.test |
(optional) Vector or matrix containing the scalar responses in the testing sample. |
theta |
Vector containing the coefficients of $\theta$ in a B-spline basis. |
nknot.theta |
Number of regularly spaced interior knots in the B-spline expansion of $\theta$. The default is 3. |
order.Bspline |
Order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
h |
The global bandwidth. The default is 0.5. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function; $n$ is the training sample size.
Given $\theta\in\mathcal{H}$, $h>0$ and a testing sample $\{X_j,\ j=1,\dots,n_{test}\}$, the predicted responses (see the value y.estimated.test) can be computed using the kernel procedure by means of
$$\widehat{r}_{h,\theta}(X_j)=\sum_{i=1}^{n}w_{n,h,\theta}(X_j,X_i)Y_i,\quad j=1,\dots,n_{test},$$
with Nadaraya-Watson weights
$$w_{n,h,\theta}(X_j,X_i)=\frac{K\left(h^{-1}d_{\theta}(X_i,X_j)\right)}{\sum_{i'=1}^{n}K\left(h^{-1}d_{\theta}(X_{i'},X_j)\right)},$$
where $K$ is a kernel function (see the argument kind.of.kernel) and $d_{\theta}(\chi_1,\chi_2)=|\langle\theta,\chi_1-\chi_2\rangle|$ for $\chi_1,\chi_2\in\mathcal{H}$ is the projection semi-metric.
If the argument y.test is provided to the program (i.e. if(!is.null(y.test))), the function calculates the mean squared error of prediction (see the value MSE.test). This is computed as mean((y.test-y.estimated.test)^2).
y.estimated.test |
Predicted responses. |
MSE.test |
Mean squared error between predicted and observed responses in the testing sample. |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also fsim.kernel.fit, fsim.kernel.fit.optim and predict.fsim.kernel.
Alternative procedure: fsim.kNN.test.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2
train <- 1:160
test <- 161:215

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kernel.fit(y = y[train], x = X[train, ], max.q.h = 0.35, nknot = 20,
  range.grid = c(850, 1050), nknot.theta = 4)
proc.time() - ptm
fit

# FSIM prediction
test <- fsim.kernel.test(y = y[train], x = X[train, ], x.test = X[test, ],
  y.test = y[test], theta = fit$theta.est, h = fit$h.opt, nknot.theta = 4,
  nknot = 20, range.grid = c(850, 1050))

# MSEP
test$MSE.test
This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kNN estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.
The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the number of neighbours (k.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs a joint minimisation of the LOOCV objective function in both the number of neighbours and the functional index.
fsim.kNN.fit(x, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3, nknot.theta = 3,
  knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, n.core = NULL)
x |
Matrix containing the observations of the functional covariate (i.e. curves) collected by row. |
y |
Vector containing the scalar response. |
seed.coeff |
Vector of initial values used to build the set $\Theta_n$ of eligible functional indexes (see section Details). The default is c(-1,0,1). |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
knearest |
Vector of positive integers that defines the sequence within which the optimal number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.
The FSIM is fitted using the kNN estimator
$$\widehat{r}_{k,\hat{\theta}}(x)=\sum_{i=1}^{n}w_{n,k,\hat{\theta}}(x,X_i)Y_i,$$
with Nadaraya-Watson weights
$$w_{n,k,\hat{\theta}}(x,X_i)=\frac{K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}(X_i,x)\right)}{\sum_{i'=1}^{n}K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}(X_{i'},x)\right)},$$
where:
- the positive integer $k$ is a smoothing factor, representing the number of nearest neighbours;
- $K$ is a kernel function (see the argument kind.of.kernel);
- $d_{\hat{\theta}}(\chi_1,\chi_2)=|\langle\hat{\theta},\chi_1-\chi_2\rangle|$ is the projection semi-metric, computed using semimetric.projec, and $\hat{\theta}$ is an estimate of $\theta_0$;
- $H_{k,x,\hat{\theta}}=\min\left\{h\in\mathbb{R}^{+}:\sum_{i=1}^{n}1_{B_{\hat{\theta}}(x,h)}(X_i)=k\right\}$, where $1_{B_{\hat{\theta}}(x,h)}(\cdot)$ is the indicator function of the open ball defined by the projection semi-metric, with centre $x$ and radius $h$.
The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). Then, we build a set $\Theta_n$ of eligible functional indexes by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The larger this set is, the greater the size of $\Theta_n$. Since our approach requires intensive computation, a trade-off between the size of $\Theta_n$ and the performance of the estimator is necessary. For that, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of $\Theta_n$, see Novo et al. (2019).
We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected number of neighbours (k.opt) by minimising the LOOCV criterion. This function performs a joint minimisation in both parameters, the number of neighbours and the functional index, and supports parallel computation. To avoid parallel computation, set n.core=1.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta. |
k.opt |
Selected number of nearest neighbours. |
r.squared |
Coefficient of determination. |
var.res |
Residual variance. |
df |
Residual degrees of freedom. |
yhat.cv |
Predicted values for the scalar response using leave-one-out samples. |
CV.opt |
Minimum value of the CV function, i.e. the value of CV for theta.est and k.opt. |
CV.values |
Vector containing the CV values for each functional index in $\Theta_n$. |
H |
Hat matrix. |
m.opt |
Index of $\hat{\theta}$ in the set $\Theta_n$. |
theta.seq.norm |
Matrix containing the normalised coefficients (in the B-spline basis) of the eligible functional indexes in $\Theta_n$, collected by row. |
k.seq |
Sequence of eligible values for the number of nearest neighbours. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also fsim.kNN.test, predict.fsim.kNN and plot.fsim.kNN.
Alternative procedures: fsim.kernel.fit, fsim.kNN.fit.optim and fsim.kernel.fit.optim.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kNN.fit(y = y[1:160], x = X[1:160, ], max.knn = 20, nknot.theta = 4,
  nknot = 20, range.grid = c(850, 1050))
proc.time() - ptm
fit
names(fit)
This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kNN estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.
The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the number of neighbours (k.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs an iterative minimisation of the LOOCV objective function, starting from an initial set of coefficients (gamma) for the functional index.
fsim.kNN.fit.optim(x, y, order.Bspline = 3, nknot.theta = 3, gamma = NULL,
  knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, threshold = 0.005)
x |
Matrix containing the observations of the functional covariate (i.e. curves) collected by row. |
y |
Vector containing the scalar response. |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
gamma |
Vector indicating the initial coefficients for the functional index used in the iterative procedure. By default, it is a vector of ones. The size of the vector is determined by the sum nknot.theta+order.Bspline. |
knearest |
Vector of positive integers that defines the sequence within which the optimal number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Positive integer indicating the number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is |
threshold |
The convergence threshold for the LOOCV function (scaled by the variance of the response). The default is 0.005. |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.
The FSIM is fitted using the kNN estimator
$$\widehat{r}_{k,\hat{\theta}}(x)=\sum_{i=1}^{n}w_{n,k,\hat{\theta}}(x,X_i)Y_i,$$
with Nadaraya-Watson weights
$$w_{n,k,\hat{\theta}}(x,X_i)=\frac{K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}(X_i,x)\right)}{\sum_{i'=1}^{n}K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}(X_{i'},x)\right)},$$
where:
- the positive integer $k$ is a smoothing factor, representing the number of nearest neighbours;
- $K$ is a kernel function (see the argument kind.of.kernel);
- $d_{\hat{\theta}}(\chi_1,\chi_2)=|\langle\hat{\theta},\chi_1-\chi_2\rangle|$ is the projection semi-metric and $\hat{\theta}$ is an estimate of $\theta_0$;
- $H_{k,x,\hat{\theta}}=\min\left\{h\in\mathbb{R}^{+}:\sum_{i=1}^{n}1_{B_{\hat{\theta}}(x,h)}(X_i)=k\right\}$, where $1_{B_{\hat{\theta}}(x,h)}(\cdot)$ is the indicator function of the open ball defined by the projection semi-metric, with centre $x$ and radius $h$.
The procedure requires the estimation of the function-parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline).
We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected number of neighbours (k.opt) by minimising the LOOCV criterion. This function performs an iterative minimisation procedure, starting from an initial set of coefficients (gamma) for the functional index. Given a functional index, the optimal number of neighbours according to the LOOCV criterion is selected. For a given number of neighbours, the minimisation in the functional index is performed using the R function optim. The procedure is iterated until convergence. For details, see Ferraty et al. (2013).
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and fitted.values. |
theta.est |
Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta. |
k.opt |
Selected number of neighbours. |
r.squared |
Coefficient of determination. |
var.res |
Residual variance. |
df |
Residual degrees of freedom. |
CV.opt |
Minimum value of the LOOCV function, i.e. the value of LOOCV for theta.est and k.opt. |
err |
Value of the LOOCV function divided by the variance of the response in each iteration of the procedure. |
H |
Hat matrix. |
k.seq |
Sequence of eligible values for the number of nearest neighbours. |
CV.hseq |
CV values for each number of neighbours considered. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2013) Functional projection pursuit regression. Test, 22, 293–320, doi:10.1007/s11749-012-0306-2.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also predict.fsim.kNN and plot.fsim.kNN.
Alternative procedures: fsim.kernel.fit.optim, fsim.kernel.fit and fsim.kNN.fit.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kNN.fit.optim(y = y[1:160], x = X[1:160, ], max.knn = 20,
  nknot.theta = 4, nknot = 20, range.grid = c(850, 1050))
proc.time() - ptm
fit
names(fit)
This function computes predictions for a functional single-index model (FSIM) with a scalar response, which is estimated using the Nadaraya-Watson kNN estimator. It requires a functional index ($\theta$), the number of nearest neighbours (k), and the new observations of the functional covariate (x.test) as inputs.
fsim.kNN.test(x, y, x.test, y.test = NULL, theta, order.Bspline = 3,
  nknot.theta = 3, k = 4, kind.of.kernel = "quad", range.grid = NULL,
  nknot = NULL)
x |
Matrix containing the observations of the functional covariate in the training sample, collected by row. |
y |
Vector containing the scalar responses in the training sample. |
x.test |
Matrix containing the observations of the functional covariate in the testing sample, collected by row. |
y.test |
(optional) Vector or matrix containing the scalar responses in the testing sample. |
theta |
Vector containing the coefficients of $\theta$ in a B-spline basis. |
nknot.theta |
Number of regularly spaced interior knots in the B-spline expansion of $\theta$. The default is 3. |
order.Bspline |
Order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
k |
The number of nearest neighbours. The default is 4. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
nknot |
Number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is |
The functional single-index model (FSIM) is given by the expression:
$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i,\quad i=1,\dots,n,$$
where $Y_i$ denotes a scalar response, $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle\cdot,\cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function; $n$ is the training sample size.
Given $\theta\in\mathcal{H}$, $1<k<n$ and a testing sample $\{X_j,\ j=1,\dots,n_{test}\}$, the predicted responses (see the value y.estimated.test) can be computed using the kNN procedure by means of
$$\widehat{r}_{k,\theta}(X_j)=\sum_{i=1}^{n}w_{n,k,\theta}(X_j,X_i)Y_i,\quad j=1,\dots,n_{test},$$
with Nadaraya-Watson weights
$$w_{n,k,\theta}(X_j,X_i)=\frac{K\left(H_{k,X_j,\theta}^{-1}d_{\theta}(X_i,X_j)\right)}{\sum_{i'=1}^{n}K\left(H_{k,X_j,\theta}^{-1}d_{\theta}(X_{i'},X_j)\right)},$$
where:
- $K$ is a kernel function (see the argument kind.of.kernel);
- $d_{\theta}(\chi_1,\chi_2)=|\langle\theta,\chi_1-\chi_2\rangle|$ for $\chi_1,\chi_2\in\mathcal{H}$ is the projection semi-metric;
- $H_{k,X_j,\theta}=\min\left\{h\in\mathbb{R}^{+}:\sum_{i=1}^{n}1_{B_{\theta}(X_j,h)}(X_i)=k\right\}$, where $1_{B_{\theta}(X_j,h)}(\cdot)$ is the indicator function of the open ball defined by the projection semi-metric, with centre $X_j$ and radius $h$.
If the argument y.test is provided to the program (i.e. if(!is.null(y.test))), the function calculates the mean squared error of prediction (see the value MSE.test). This is computed as mean((y.test-y.estimated.test)^2).
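Putting the pieces together, a hedged end-to-end sketch of the prediction step for a single test curve (proj.semimetric and knn.bandwidth are the illustrative helpers sketched earlier in this manual, not package exports):

# Predict the response for one new curve x.new from training data (X, y).
knn.predict.one <- function(X, y, x.new, theta.curve, grid, k) {
  d <- apply(X, 1, proj.semimetric, x2 = x.new, theta.curve = theta.curve,
             grid = grid)
  H <- knn.bandwidth(d, k)
  K <- function(u) 1.5 * (1 - u^2) * (u >= 0 & u <= 1)  # quadratic kernel
  w <- K(d / H)
  sum(w * y) / sum(w)  # Nadaraya-Watson prediction
}
# MSE.test is then mean((y.test - predictions)^2) over the testing sample.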
y.estimated.test |
Predicted responses. |
MSE.test |
Mean squared error between predicted and observed responses in the testing sample. |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also fsim.kNN.fit, fsim.kNN.fit.optim and predict.fsim.kNN.
Alternative procedure: fsim.kernel.test.
data(Tecator)
y <- Tecator$fat
X <- Tecator$absor.spectra2
train <- 1:160
test <- 161:215

# FSIM fit.
ptm <- proc.time()
fit <- fsim.kNN.fit(y = y[train], x = X[train, ], max.knn = 20, nknot.theta = 4,
  nknot = 20, range.grid = c(850, 1050))
proc.time() - ptm
fit

# FSIM prediction
test <- fsim.kNN.test(y = y[train], x = X[train, ], x.test = X[test, ],
  y.test = y[test], theta = fit$theta.est, k = fit$k.opt, nknot.theta = 4,
  nknot = 20, range.grid = c(850, 1050))

# MSEP
test$MSE.test
This function implements the Improved Algorithm for Sparse Semiparametric Multi-functional Regression (IASSMR) with kernel estimation. This algorithm is specifically designed for estimating multi-functional partial linear single-index models, which incorporate multiple scalar variables and a functional covariate as predictors. These scalar variables are derived from the discretisation of a curve and have linear effects, while the functional covariate exhibits a single-index effect.
IASSMR is a two-stage procedure that selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kernel estimation using Nadaraya-Watson weights. It uses B-spline expansions to represent curves and eligible functional indexes. Additionally, it utilises an objective criterion (criterion) to determine the initial number of covariates in the reduced model (w.opt), the bandwidth (h.opt), and the penalisation parameter (lambda.opt).
IASSMR.kernel.fit(x, z, y, train.1 = NULL, train.2 = NULL,
  seed.coeff = c(-1, 0, 1), order.Bspline = 3, nknot.theta = 3, min.q.h = 0.05,
  max.q.h = 0.5, h.seq = NULL, num.h = 10, range.grid = NULL,
  kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL,
  lambda.min.l = NULL, factor.pn = 1, nlambda = 100, vn = ncol(z), nfolds = 10,
  seed = 123, wn = c(10, 15, 20), criterion = "GCV", penalty = "grSCAD",
  max.iter = 1000, n.core = NULL)
x |
Matrix containing the observations of the functional covariate (functional single-index component), collected by row. |
z |
Matrix containing the observations of the functional covariate that is discretised (linear component), collected by row. |
y |
Vector containing the scalar response. |
train.1 |
Positions of the data that are used as the training sample in the 1st step. The default setting is |
train.2 |
Positions of the data that are used as the training sample in the 2nd step. The default setting is |
seed.coeff |
Vector of initial values used to build the set |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section |
criterion |
The criterion used to select the tuning and regularisation parameters: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The multi-functional partial linear single-index model (MFPLSIM) is given by the expression

$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+r\left(\left\langle\theta_0,X_i\right\rangle\right)+\varepsilon_i,\quad i=1,\dots,n,$$

where:

$Y_i$ represents a real random response and $X_i$ denotes a random element belonging to some separable Hilbert space $\mathcal{H}$ with inner product denoted by $\langle\cdot,\cdot\rangle$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on the interval $[a,b]$, observed at the points $a\le t_1<\dots<t_{p_n}\le b$.

$\mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients, and $r(\cdot)$ denotes a smooth unknown link function. In addition, $\theta_0$ is an unknown functional direction in $\mathcal{H}$.

$\varepsilon_i$ denotes the random error.
In the MFPLSIM, it is assumed that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ are part of the model. Therefore, the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) must be selected, and the model estimated.
In this function, the MFPLSIM is fitted using the IASSMR. The IASSMR is a two-step procedure. For this, we divide the sample into two independent subsamples, each asymptotically half the size of the original sample ($n_1\sim n_2\sim n/2$). One subsample is used in the first stage of the method, and the other in the second stage. The subsamples are defined as follows:

$$\mathcal{E}^{\mathbf{1}}=\{(\zeta_i,X_i,Y_i),\ i=1,\dots,n_1\},$$
$$\mathcal{E}^{\mathbf{2}}=\{(\zeta_i,X_i,Y_i),\ i=n_1+1,\dots,n_1+n_2=n\}.$$
Note that these two subsamples are specified to the program through the arguments `train.1` and `train.2`. The superscript $\mathbf{s}$, where $\mathbf{s}=\mathbf{1},\mathbf{2}$, indicates the stage of the method in which the sample, function, variable, or parameter is involved.
To explain the algorithm, we assume that the number $p_n$ of linear covariates can be expressed as follows: $p_n=q_nw_n$, with $q_n$ and $w_n$ being integers.
First step. The FASSMR (see `FASSMR.kernel.fit`) combined with kernel estimation is applied using only the subsample $\mathcal{E}^{\mathbf{1}}$. Specifically:
Consider a subset of the initial $p_n$ linear covariates, which contains only $w_n$ equally spaced discretised observations of $\zeta$ covering the interval $[a,b]$. This subset is the following:

$$\mathcal{R}_n^{\mathbf{1}}=\left\{\zeta\left(t_k^{\mathbf{1}}\right),\ k=1,\dots,w_n\right\},$$

where $t_k^{\mathbf{1}}=t_{\left[(2k-1)q_n/2\right]}$ and $\left[z\right]$ denotes the smallest integer not less than the real number $z$. The size (cardinality) of this subset is provided to the program in the argument `wn` (which contains a sequence of eligible sizes).
Consider the following reduced model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{1}}$:

$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{\mathbf{1}}\zeta_i\left(t_k^{\mathbf{1}}\right)+r^{\mathbf{1}}\left(\left\langle\theta_0^{\mathbf{1}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{1}}.$$

The penalised least-squares variable selection procedure, with kernel estimation, is applied to the reduced model. This is done using the function `sfplsim.kernel.fit`, which requires the remaining arguments (see `sfplsim.kernel.fit`). The estimates obtained after that are the outputs of the first step of the algorithm.
Second step. The variables selected in the first step, along with those in their neighbourhood, are included. The penalised least-squares procedure, combined with kernel estimation, is carried out again, considering only the subsample $\mathcal{E}^{\mathbf{2}}$. Specifically:

Consider a new set of variables, $\mathcal{R}_n^{\mathbf{2}}$, formed by the variables selected in the first step together with the discretised observations of $\zeta$ in their neighbourhood. Denoting by $r_n=\sharp(\mathcal{R}_n^{\mathbf{2}})$ the size of this set, the variables in $\mathcal{R}_n^{\mathbf{2}}$ can be renamed as follows:

$$\mathcal{R}_n^{\mathbf{2}}=\left\{\zeta\left(t_1^{\mathbf{2}}\right),\dots,\zeta\left(t_{r_n}^{\mathbf{2}}\right)\right\}.$$

Consider the following model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{2}}$:

$$Y_i=\sum_{k=1}^{r_n}\beta_{0k}^{\mathbf{2}}\zeta_i\left(t_k^{\mathbf{2}}\right)+r^{\mathbf{2}}\left(\left\langle\theta_0^{\mathbf{2}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{2}}.$$

The penalised least-squares variable selection procedure, with kernel estimation, is applied to this model using the function `sfplsim.kernel.fit`.
The outputs of the second step are the estimates of the MFPLSIM. For further details on this algorithm, see Novo et al. (2021).
Remark: If the condition $p_n=q_nw_n$ is not met (so that $p_n/w_n$ is not an integer), the function considers variable $q_{n,k}$ values, $k=1,\dots,w_n$. Specifically, $q_{n,k}=\left[p_n/w_n\right]+1$ for $k\in\{1,\dots,p_n-w_n\left[p_n/w_n\right]\}$ and $q_{n,k}=\left[p_n/w_n\right]$ otherwise, where $\left[z\right]$ denotes the integer part of the real number $z$.
The function supports parallel computation. To avoid it, we can set `n.core=1`.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
|
theta.est |
Coefficients of |
indexes.beta.nonnull |
Indexes of the non-zero |
h.opt |
Selected bandwidth (when |
w.opt |
Selected size for |
lambda.opt |
Selected value of the penalisation parameter |
IC |
Value of the criterion function considered to select |
vn.opt |
Selected value of |
beta2 |
Estimate of |
theta2 |
Estimate of |
indexes.beta.nonnull2 |
Indexes of the non-zero linear coefficients after the step 2 of the method for each value of the sequence |
h2 |
Selected bandwidth in the second step of the algorithm for each value of the sequence |
IC2 |
Optimal value of the criterion function in the second step for each value of the sequence |
lambda2 |
Selected value of penalisation parameter in the second step for each value of the sequence |
index02 |
Indexes of the covariates (in the entire set of |
beta1 |
Estimate of |
theta1 |
Estimate of |
h1 |
Selected bandwidth in the first step of the algorithm for each value of the sequence |
IC1 |
Optimal value of the criterion function in the first step for each value of the sequence |
lambda1 |
Selected value of penalisation parameter in the first step for each value of the sequence |
index01 |
Indexes of the covariates (in the whole set of |
index1 |
Indexes of the non-zero linear coefficients after the step 1 of the method for each value of the sequence |
... |
Further outputs to apply S3 methods. |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Vieu, P., and Aneiros, G., (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
See also `sfplsim.kernel.fit`, `predict.IASSMR.kernel`, `plot.IASSMR.kernel` and `FASSMR.kernel.fit`.
Alternative methods `IASSMR.kNN.fit`, `FASSMR.kernel.fit` and `FASSMR.kNN.fit`.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240

#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

#Dataset to model
x.sug <- x[!index.atip,]
z.sug<- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216

ptm=proc.time()
fit<- IASSMR.kernel.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train],
  train.1=1:108,train.2=109:216,nknot.theta=2,lambda.min.h=0.03,
  lambda.min.l=0.03, max.q.h=0.35, nknot=20, criterion="BIC",
  max.iter=5000)
proc.time()-ptm
fit
names(fit)
This function implements the Improved Algorithm for Sparse Semiparametric Multi-functional Regression (IASSMR) with kNN estimation. This algorithm is specifically designed for estimating multi-functional partial linear single-index models, which incorporate multiple scalar variables and a functional covariate as predictors. These scalar variables are derived from the discretisation of a curve and have linear effects while the functional covariate exhibits a single-index effect.
IASSMR is a two-stage procedure that selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kNN estimation using Nadaraya-Watson weights. It uses B-spline expansions to represent curves and eligible functional indexes. Additionally, it utilises an objective criterion (`criterion`) to determine the initial number of covariates in the reduced model (`w.opt`), the number of neighbours (`k.opt`), and the penalisation parameter (`lambda.opt`).
IASSMR.kNN.fit(x, z, y, train.1 = NULL, train.2 = NULL,
  seed.coeff = c(-1, 0, 1), order.Bspline = 3, nknot.theta = 3,
  knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  range.grid = NULL, kind.of.kernel = "quad", nknot = NULL,
  lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL,
  factor.pn = 1, nlambda = 100, vn = ncol(z), nfolds = 10, seed = 123,
  wn = c(10, 15, 20), criterion = "GCV", penalty = "grSCAD",
  max.iter = 1000, n.core = NULL)
x |
Matrix containing the observations of the functional covariate collected by row (functional single-index component). |
z |
Matrix containing the observations of the functional covariate that is discretised, collected by row (linear component). |
y |
Vector containing the scalar response. |
train.1 |
Positions of the data that are used as the training sample in the 1st step. The default setting is |
train.2 |
Positions of the data that are used as the training sample in the 2nd step. The default setting is |
seed.coeff |
Vector of initial values used to build the set |
order.Bspline |
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta |
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of |
knearest |
Vector of positive integers containing the sequence in which the number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section |
criterion |
The criterion used to select the tuning and regularisation parameters: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core |
Number of CPU cores designated for parallel execution. The default is |
The multi-functional partial linear single-index model (MFPLSIM) is given by the expression

$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+r\left(\left\langle\theta_0,X_i\right\rangle\right)+\varepsilon_i,\quad i=1,\dots,n,$$

where:

$Y_i$ represents a real random response and $X_i$ denotes a random element belonging to some separable Hilbert space $\mathcal{H}$ with inner product denoted by $\langle\cdot,\cdot\rangle$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on the interval $[a,b]$, observed at the points $a\le t_1<\dots<t_{p_n}\le b$.

$\mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients, and $r(\cdot)$ denotes a smooth unknown link function. In addition, $\theta_0$ is an unknown functional direction in $\mathcal{H}$.

$\varepsilon_i$ denotes the random error.
In the MFPLSIM, it is assumed that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ are part of the model. Therefore, the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) must be selected, and the model estimated.
In this function, the MFPLSIM is fitted using the IASSMR. The IASSMR is a two-step procedure. For this, we divide the sample into two independent subsamples, each asymptotically half the size of the original sample ($n_1\sim n_2\sim n/2$). One subsample is used in the first stage of the method, and the other in the second stage. The subsamples are defined as follows:

$$\mathcal{E}^{\mathbf{1}}=\{(\zeta_i,X_i,Y_i),\ i=1,\dots,n_1\},$$
$$\mathcal{E}^{\mathbf{2}}=\{(\zeta_i,X_i,Y_i),\ i=n_1+1,\dots,n_1+n_2=n\}.$$
Note that these two subsamples are specified to the program through the arguments `train.1` and `train.2`. The superscript $\mathbf{s}$, where $\mathbf{s}=\mathbf{1},\mathbf{2}$, indicates the stage of the method in which the sample, function, variable, or parameter is involved.
To explain the algorithm, we assume that the number $p_n$ of linear covariates can be expressed as follows: $p_n=q_nw_n$, with $q_n$ and $w_n$ being integers.
First step. The FASSMR (see `FASSMR.kNN.fit`) combined with kNN estimation is applied using only the subsample $\mathcal{E}^{\mathbf{1}}$. Specifically:
Consider a subset of the initial $p_n$ linear covariates, which contains only $w_n$ equally spaced discretised observations of $\zeta$ covering the entire interval $[a,b]$. This subset is the following:

$$\mathcal{R}_n^{\mathbf{1}}=\left\{\zeta\left(t_k^{\mathbf{1}}\right),\ k=1,\dots,w_n\right\},$$

where $t_k^{\mathbf{1}}=t_{\left[(2k-1)q_n/2\right]}$ and $\left[z\right]$ denotes the smallest integer not less than the real number $z$. The size (cardinality) of this subset is provided to the program in the argument `wn` (which contains a sequence of eligible sizes).
Consider the following reduced model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{1}}$:

$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{\mathbf{1}}\zeta_i\left(t_k^{\mathbf{1}}\right)+r^{\mathbf{1}}\left(\left\langle\theta_0^{\mathbf{1}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{1}}.$$

The penalised least-squares variable selection procedure, with kNN estimation, is applied to the reduced model. This is done using the function `sfplsim.kNN.fit`, which requires the remaining arguments (see `sfplsim.kNN.fit`). The estimates obtained after that are the outputs of the first step of the algorithm.
Second step. The variables selected in the first step, along with those in their neighbourhood, are included. The penalised least-squares procedure, combined with kNN estimation, is carried out again, considering only the subsample $\mathcal{E}^{\mathbf{2}}$. Specifically:

Consider a new set of variables, $\mathcal{R}_n^{\mathbf{2}}$, formed by the variables selected in the first step together with the discretised observations of $\zeta$ in their neighbourhood. Denoting by $r_n=\sharp(\mathcal{R}_n^{\mathbf{2}})$ the size of this set, the variables in $\mathcal{R}_n^{\mathbf{2}}$ can be renamed as follows:

$$\mathcal{R}_n^{\mathbf{2}}=\left\{\zeta\left(t_1^{\mathbf{2}}\right),\dots,\zeta\left(t_{r_n}^{\mathbf{2}}\right)\right\}.$$

Consider the following model, which involves only the linear covariates belonging to $\mathcal{R}_n^{\mathbf{2}}$:

$$Y_i=\sum_{k=1}^{r_n}\beta_{0k}^{\mathbf{2}}\zeta_i\left(t_k^{\mathbf{2}}\right)+r^{\mathbf{2}}\left(\left\langle\theta_0^{\mathbf{2}},X_i\right\rangle\right)+\varepsilon_i^{\mathbf{2}}.$$

The penalised least-squares variable selection procedure, with kNN estimation, is applied to this model using the function `sfplsim.kNN.fit`.
The outputs of the second step are the estimates of the MFPLSIM. For further details on this algorithm, see Novo et al. (2021).
Remark: If the condition $p_n=q_nw_n$ is not met (so that $p_n/w_n$ is not an integer), the function considers variable $q_{n,k}$ values, $k=1,\dots,w_n$. Specifically, $q_{n,k}=\left[p_n/w_n\right]+1$ for $k\in\{1,\dots,p_n-w_n\left[p_n/w_n\right]\}$ and $q_{n,k}=\left[p_n/w_n\right]$ otherwise, where $\left[z\right]$ denotes the integer part of the real number $z$.
The function supports parallel computation. To avoid it, we can set `n.core=1`.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
|
theta.est |
Coefficients of |
indexes.beta.nonnull |
Indexes of the non-zero |
k.opt |
Selected number of nearest neighbours (when |
w.opt |
Selected initial number of covariates in the reduced model. |
lambda.opt |
Selected value of the penalisation parameter |
IC |
Value of the criterion function considered to select |
vn.opt |
Selected value of |
beta2 |
Estimate of |
theta2 |
Estimate of |
indexes.beta.nonnull2 |
Indexes of the non-zero linear coefficients after the step 2 of the method for each value of the sequence |
knn2 |
Selected number of neighbours in the second step of the algorithm for each value of the sequence |
IC2 |
Optimal value of the criterion function in the second step for each value of the sequence |
lambda2 |
Selected value of penalisation parameter in the second step for each value of the sequence |
index02 |
Indexes of the covariates (in the entire set of |
beta1 |
Estimate of |
theta1 |
Estimate of |
knn1 |
Selected number of neighbours in the first step of the algorithm for each value of the sequence |
IC1 |
Optimal value of the criterion function in the first step for each value of the sequence |
lambda1 |
Selected value of penalisation parameter in the first step for each value of the sequence |
index01 |
Indexes of the covariates (in the whole set of |
index1 |
Indexes of the non-zero linear coefficients after the step 1 of the method for each value of the sequence |
... |
Further outputs to apply S3 methods. |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo, S., Vieu, P., and Aneiros, G., (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
See also `sfplsim.kNN.fit`, `predict.IASSMR.kNN`, `plot.IASSMR.kNN` and `FASSMR.kNN.fit`.
Alternative method `IASSMR.kernel.fit`.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240

#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

#Dataset to model
x.sug <- x[!index.atip,]
z.sug<- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216

ptm=proc.time()
fit<- IASSMR.kNN.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train],
  train.1=1:108,train.2=109:216,nknot.theta=2,lambda.min.h=0.07,
  lambda.min.l=0.07, max.knn=20, nknot=20,criterion="BIC",
  max.iter=5000)
proc.time()-ptm
fit
names(fit)
This function fits a sparse linear model between a scalar response and a vector of scalar covariates. It employs a penalised least-squares regularisation procedure, with either (group)SCAD or (group)LASSO penalties. The method utilises an objective criterion (`criterion`) to select the optimal regularisation parameter (`lambda.opt`).
lm.pels.fit(z, y, lambda.min = NULL, lambda.min.h = NULL,
  lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL,
  vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV",
  penalty = "grSCAD", max.iter = 1000)
z |
Matrix containing the observations of the covariates collected by row. |
y |
Vector containing the scalar response. |
lambda.min |
The smallest value for lambda (i.e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
lambda.seq |
Sequence of values in which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
criterion |
The criterion used to select the regularisation parameter |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The sparse linear model (SLM) is given by the expression:

$$Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+\varepsilon_i,\quad i=1,\dots,n,$$

where $Y_i$ denotes a scalar response and $Z_{i1},\dots,Z_{ip_n}$ are real covariates. In this equation, $\mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real parameters and $\varepsilon_i$ represents the random error.
In this function, the SLM is fitted using a penalised least-squares (PeLS) approach by minimising

$$\mathcal{Q}\left(\mathbf{\beta}\right)=\frac{1}{2}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)^{\top}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right),$$

where $\mathcal{P}_{\lambda_{j_n}}(\cdot)$ is a penalty function (specified in the argument `penalty`) and $\lambda_{j_n}>0$ is a tuning parameter. To reduce the number of tuning parameters, $\lambda_{j_n}$, to be selected for each sample, we consider $\lambda_{j_n}=\lambda\widehat{\sigma}_{\beta_{0j}^{OLS}}$, where $\beta_{0j}^{OLS}$ denotes the OLS estimate of $\beta_{0j}$ and $\widehat{\sigma}_{\beta_{0j}^{OLS}}$ is its estimated standard deviation. The parameter $\lambda$ is selected using the objective criterion specified in the argument `criterion`.
For further details on the estimation procedure of the SLM, see e.g. Fan and Li (2001). The PeLS objective function is minimised using the R function `grpreg` of the package `grpreg` (Breheny and Huang, 2015).
Remark: It should be noted that if we set `lambda.seq` to $0$, we obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using `lambda.seq` with a value $\not=0$ is advisable when suspecting the presence of irrelevant variables.
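As a quick check of this remark, a minimal sketch (using the Tecator covariates from the example below) in which the non-penalised fit should essentially reproduce the OLS coefficients returned by `lm`:

data("Tecator")
y <- Tecator$fat
z <- cbind(Tecator$protein, Tecator$moisture)

# lambda.seq=0 removes the penalty, so PeLS reduces to least squares
fit.ols <- lm.pels.fit(z=z, y=y, lambda.seq=0)
fit.ols$beta.est

# Reference OLS fit with base R
coef(lm(y ~ z))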
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
Estimate of |
indexes.beta.nonnull |
Indexes of the non-zero |
lambda.opt |
Selected value of lambda. |
IC |
Value of the criterion function considered to select |
vn.opt |
Selected value of |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Breheny, P., and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173–187, doi:10.1007/s11222-013-9424-2.
Fan, J., and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360, doi:10.1198/016214501753382273.
See also PVS.fit
.
data("Tecator") y<-Tecator$fat z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #LM fit ptm=proc.time() fit<-lm.pels.fit(z=z.com[train,], y=y[train],lambda.min.h=0.02, lambda.min.l=0.01,factor.pn=2, max.iter=5000, criterion="BIC") proc.time()-ptm #Results fit names(fit)
data("Tecator") y<-Tecator$fat z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #LM fit ptm=proc.time() fit<-lm.pels.fit(z=z.com[train,], y=y[train],lambda.min.h=0.02, lambda.min.l=0.01,factor.pn=2, max.iter=5000, criterion="BIC") proc.time()-ptm #Results fit names(fit)
`plot` functions to generate visual representations for the outputs of several fitting functions: `FASSMR.kernel.fit`, `FASSMR.kNN.fit`, `fsim.kernel.fit`, `fsim.kernel.fit.optim`, `fsim.kNN.fit`, `fsim.kNN.fit.optim`, `IASSMR.kernel.fit`, `IASSMR.kNN.fit`, `lm.pels.fit`, `PVS.fit`, `PVS.kernel.fit`, `PVS.kNN.fit`, `sfpl.kernel.fit`, `sfpl.kNN.fit`, `sfplsim.kernel.fit` and `sfplsim.kNN.fit`.
## S3 method for class 'FASSMR.kernel'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'FASSMR.kNN'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'fsim.kernel'
plot(x, size=15, col1=1, col2=2, ...)
## S3 method for class 'fsim.kNN'
plot(x, size=15, col1=1, col2=2, ...)
## S3 method for class 'IASSMR.kernel'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'IASSMR.kNN'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'lm.pels'
plot(x, size=15, col1=1, col2=2, col3=4, ...)
## S3 method for class 'PVS'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'PVS.kernel'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'PVS.kNN'
plot(x, ind=1:10, size=15, col1=1, col2=2, col3=4, option=0, ...)
## S3 method for class 'sfpl.kernel'
plot(x, size=15, col1=1, col2=2, col3=4, ...)
## S3 method for class 'sfpl.kNN'
plot(x, size=15, col1=1, col2=2, col3=4, ...)
## S3 method for class 'sfplsim.kernel'
plot(x, size=15, col1=1, col2=2, col3=4, ...)
## S3 method for class 'sfplsim.kNN'
plot(x, size=15, col1=1, col2=2, col3=4, ...)
x |
Output of the functions mentioned in the |
ind |
Indexes of the colors for the curves in the chart of estimated impact points. The default is |
size |
The size for title and axis labels in pts. The default is 15. |
col1 |
Color of the points in the charts. Also, color of the estimated functional index representation. The default is black. |
col2 |
Color of the nonparametric fit representation in FSIM functions, and of the straight line in 'Response vs Fitted Values' charts. The default is red. |
col3 |
Color of the nonparametric fit of the residuals in 'Residuals vs Fitted Values' charts. The default is blue. |
option |
Selection of charts to be plotted. The default, |
... |
Further arguments passed to or from other methods. |
The functions return different graphical representations.
For the classes `fsim.kNN` and `fsim.kernel`:
The estimated functional index: $\widehat{\theta}$.
The regression fit.
For the classes `lm.pels`, `sfpl.kernel` and `sfpl.kNN`:
The response over the `fitted.values`.
The `residuals` over the `fitted.values`.
For the classes `sfplsim.kernel` and `sfplsim.kNN`:
The estimated functional index: $\widehat{\theta}$.
The response over the `fitted.values`.
The `residuals` over the `fitted.values`.
For the classes `FASSMR.kernel`, `FASSMR.kNN`, `IASSMR.kernel`, `IASSMR.kNN`, `sfplsim.kernel` and `sfplsim.kNN`:
If `option=1`: The curves with the estimated impact points (in dashed vertical lines).
If `option=2`: The estimated functional index: $\widehat{\theta}$.
If `option=3`: The response over the `fitted.values`, and the `residuals` over the `fitted.values`.
If `option=0`: All charts are plotted.
For the classes `PVS`, `PVS.kNN` and `PVS.kernel`:
If `option=1`: The curves with the estimated impact points (in dashed vertical lines).
If `option=2`: The response over the `fitted.values`, and the `residuals` over the `fitted.values`.
If `option=0`: All charts are plotted.
All the routines implementing the plot S3 method use internally the R package `ggplot2` to produce elegant and high-quality charts.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`FASSMR.kernel.fit`, `FASSMR.kNN.fit`, `fsim.kernel.fit`, `fsim.kNN.fit`, `IASSMR.kernel.fit`, `IASSMR.kNN.fit`, `lm.pels.fit`, `PVS.fit`, `PVS.kernel.fit`, `PVS.kNN.fit`, `sfpl.kernel.fit`, `sfpl.kNN.fit`, `sfplsim.kernel.fit` and `sfplsim.kNN.fit`.
`predict` method for the functional single-index model (FSIM) fitted using `fsim.kernel.fit`, `fsim.kernel.fit.optim`, `fsim.kNN.fit` and `fsim.kNN.fit.optim`.
## S3 method for class 'fsim.kernel'
predict(object, newdata = NULL, y.test = NULL, ...)
## S3 method for class 'fsim.kNN'
predict(object, newdata = NULL, y.test = NULL, ...)
object |
Output of the |
newdata |
A matrix containing new observations of the functional covariate collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
... |
Further arguments passed to or from other methods. |
The prediction is computed using the functions `fsim.kernel.test` and `fsim.kNN.test`, respectively.
The function returns the predicted values of the response (`y`) for `newdata`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `is.null(newdata)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`fsim.kernel.fit` and `fsim.kernel.test`, or `fsim.kNN.fit` and `fsim.kNN.test`.
data(Tecator)
y<-Tecator$fat
X<-Tecator$absor.spectra2
train<-1:160
test<-161:215

#FSIM fit.
fit.kernel<-fsim.kernel.fit(y[train],x=X[train,],max.q.h=0.35, nknot=20,
  range.grid=c(850,1050),nknot.theta=4)
fit.kNN<-fsim.kNN.fit(y=y[train],x=X[train,],max.knn=20,nknot=20,
  nknot.theta=4, range.grid=c(850,1050))

pred.kernel<-predict(fit.kernel,newdata=X[test,],y.test=y[test])
pred.kernel$MSEP
pred.kNN<-predict(fit.kNN,newdata=X[test,],y.test=y[test])
pred.kNN$MSEP
`predict` method for the multi-functional partial linear single-index model (MFPLSIM) fitted using `IASSMR.kernel.fit` or `IASSMR.kNN.fit`.
## S3 method for class 'IASSMR.kernel'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'IASSMR.kNN'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, knearest.n = object$knearest, min.knn.n = object$min.knn,
  max.knn.n = object$max.knn.n, step.n = object$step, ...)
object |
Output of the functions mentioned in the |
newdata.x |
A matrix containing new observations of the functional covariate in the functional single-index component, collected by row. |
newdata.z |
Matrix containing the new observations of the scalar covariates derived from the discretisation of a curve, collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
option |
Allows the choice between 1, 2 and 3. The default is 1. See the section |
... |
Further arguments. |
knearest.n |
Only used for objects |
min.knn.n |
Only used for objects |
max.knn.n |
Only used for objects |
step.n |
Only used for objects |
Three options are provided to obtain the predictions of the response for `newdata.x` and `newdata.z`:

If `option=1`, we maintain all the estimates (`k.opt` or `h.opt`, `theta.est` and `beta.est`) to predict the functional single-index component of the model. As we use the estimates of the second step of the algorithm, only `train.2` is used as the training sample to predict. Then, it should be noted that `k.opt` or `h.opt` may not be suitable to predict the functional single-index component of the model.
If `option=2`, we maintain `theta.est` and `beta.est`, while the tuning parameter ($k$ or $h$) is selected again to predict the functional single-index component of the model. This selection is performed using the leave-one-out cross-validation criterion in the associated functional single-index model and the complete training sample (i.e. `train=c(train.1,train.2)`). As we use the entire training sample (not just a subsample of it), the sample size is modified and, as a consequence, the parameters `knearest`, `min.knn`, `max.knn` and `step` given to the function `IASSMR.kNN.fit` may need to be provided again to compute predictions. For that, we add the arguments `knearest.n`, `min.knn.n`, `max.knn.n` and `step.n`.
If `option=3`, we maintain only the indexes of the relevant variables selected by the IASSMR. We estimate again the linear coefficients and the functional index by means of `sfplsim.kernel.fit` or `sfplsim.kNN.fit`, respectively, without penalisation (setting `lambda.seq=0`) and using the whole training sample (`train=c(train.1,train.2)`). The method provides two predictions (and MSEPs):

a) The prediction associated with `option=1` for the `sfplsim.kernel` or `sfplsim.kNN` class.

b) The prediction associated with `option=2` for the `sfplsim.kernel` or `sfplsim.kNN` class.

(See the documentation of the functions `predict.sfplsim.kernel` and `predict.sfplsim.kNN`.)
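A hedged sketch contrasting the three options (`fit.kernel` and the Sugar data as in the examples below):

# option=1: reuse h.opt from the second step (train.2 as training sample)
p1 <- predict(fit.kernel, newdata.x=x.sug[test,], newdata.z=z.sug[test,],
  y.test=y.sug[test], option=1)
# option=2: re-select the bandwidth on the complete training sample
p2 <- predict(fit.kernel, newdata.x=x.sug[test,], newdata.z=z.sug[test,],
  y.test=y.sug[test], option=2)
# option=3: refit without penalisation; returns two predictions and two MSEPs
p3 <- predict(fit.kernel, newdata.x=x.sug[test,], newdata.z=z.sug[test,],
  y.test=y.sug[test], option=3)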
The function returns the predicted values of the response (`y`) for `newdata.x` and `newdata.z`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `option=3`, two sets of predictions (and two MSEPs) are provided, corresponding to the items a) and b) mentioned in the section Details.
If `is.null(newdata.x)` or `is.null(newdata.z)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`sfplsim.kernel.fit`, `sfplsim.kNN.fit`, `IASSMR.kernel.fit` and `IASSMR.kNN.fit`.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240

#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

#Dataset to model
x.sug <- x[!index.atip,]
z.sug<- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216
test<-217:266

#Fit
fit.kernel<-IASSMR.kernel.fit(x=x.sug[train,],z=z.sug[train,],
  y=y.sug[train], train.1=1:108,train.2=109:216,nknot.theta=2,
  lambda.min.h=0.03, lambda.min.l=0.03, max.q.h=0.35, nknot=20,
  criterion="BIC", max.iter=5000)
fit.kNN<- IASSMR.kNN.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train],
  train.1=1:108,train.2=109:216,nknot.theta=2,lambda.min.h=0.07,
  lambda.min.l=0.07, max.knn=20, nknot=20,criterion="BIC",
  max.iter=5000)

#Predictions
predict(fit.kernel,newdata.x=x.sug[test,],newdata.z=z.sug[test,],
  y.test=y.sug[test],option=2)
predict(fit.kNN,newdata.x=x.sug[test,],newdata.z=z.sug[test,],
  y.test=y.sug[test],option=2)
`predict` method for:
The linear model (LM) fitted using `lm.pels.fit`.
The linear model with covariates derived from the discretisation of a curve, fitted using `PVS.fit`.
## S3 method for class 'lm.pels'
predict(object, newdata = NULL, y.test = NULL, ...)
## S3 method for class 'PVS'
predict(object, newdata = NULL, y.test = NULL, ...)
object |
Output of the |
newdata |
Matrix containing the new observations of the scalar covariates (LM), or the scalar covariates resulting from the discretisation of a curve. Observations are collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
... |
Further arguments passed to or from other methods. |
The function returns the predicted values of the response (`y`) for `newdata`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `is.null(newdata)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`lm.pels.fit` and `PVS.fit`.
data("Tecator") y<-Tecator$fat z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #LM fit. fit<-lm.pels.fit(z=z.com[train,], y=y[train],lambda.min.l=0.01, factor.pn=2, max.iter=5000, criterion="BIC") #Predictions predict(fit,newdata=z.com[test,],y.test=y[test]) data(Sugar) y<-Sugar$ash z<-Sugar$wave.240 #Outliers index.y.25 <- y > 25 index.atip <- index.y.25 (1:268)[index.atip] #Dataset to model z.sug<- z[!index.atip,] y.sug <- y[!index.atip] train<-1:216 test<-217:266 #Fit fit.pvs<-PVS.fit(z=z.sug[train,], y=y.sug[train],train.1=1:108,train.2=109:216, lambda.min.h=0.2,criterion="BIC", max.iter=5000) #Predictions predict(fit.pvs,newdata=z.sug[test,],y.test=y.sug[test])
data("Tecator") y<-Tecator$fat z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #LM fit. fit<-lm.pels.fit(z=z.com[train,], y=y[train],lambda.min.l=0.01, factor.pn=2, max.iter=5000, criterion="BIC") #Predictions predict(fit,newdata=z.com[test,],y.test=y[test]) data(Sugar) y<-Sugar$ash z<-Sugar$wave.240 #Outliers index.y.25 <- y > 25 index.atip <- index.y.25 (1:268)[index.atip] #Dataset to model z.sug<- z[!index.atip,] y.sug <- y[!index.atip] train<-1:216 test<-217:266 #Fit fit.pvs<-PVS.fit(z=z.sug[train,], y=y.sug[train],train.1=1:108,train.2=109:216, lambda.min.h=0.2,criterion="BIC", max.iter=5000) #Predictions predict(fit.pvs,newdata=z.sug[test,],y.test=y.sug[test])
`predict` method for the multi-functional partial linear model (MFPLM) fitted using `PVS.kernel.fit` or `PVS.kNN.fit`.
## S3 method for class 'PVS.kernel'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'PVS.kNN'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, knearest.n = object$knearest, min.knn.n = object$min.knn,
  max.knn.n = object$max.knn.n, step.n = object$step, ...)
object |
Output of the functions mentioned in the |
newdata.x |
A matrix containing new observations of the functional covariate in the functional nonparametric component, collected by row. |
newdata.z |
Matrix containing the new observations of the scalar covariates derived from the discretisation of a curve, collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
option |
Allows the selection among the choices 1, 2 and 3 for |
... |
Further arguments. |
knearest.n |
Only used for objects |
min.knn.n |
Only used for objects |
max.knn.n |
Only used for objects |
step.n |
Only used for objects |
To obtain the predictions of the response for `newdata.x` and `newdata.z`, the following options are provided:

If `option=1`, we maintain all the estimates (`k.opt` or `h.opt` and `beta.est`) to predict the functional nonparametric component of the model. As we use the estimates of the second step of the algorithm, only `train.2` is used as the training sample to predict. Then, it should be noted that `k.opt` or `h.opt` may not be suitable to predict the functional nonparametric component of the model.
If `option=2`, we maintain `beta.est`, while the tuning parameter ($k$ or $h$) is selected again to predict the functional nonparametric component of the model. This selection is performed using the leave-one-out cross-validation (LOOCV) criterion in the associated functional nonparametric model and the complete training sample (i.e. `train=c(train.1,train.2)`), obtaining a global selection for $k$ or $h$. As we use the entire training sample (not just a subsample of it), the sample size is modified and, as a consequence, the parameters `knearest`, `min.knn`, `max.knn`, and `step` given to the function `PVS.kNN.fit` may need to be provided again to compute predictions. For that, we add the arguments `knearest.n`, `min.knn.n`, `max.knn.n` and `step.n`.
If `option=3`, we maintain only the indexes of the relevant variables selected by the PVS. We estimate again the linear coefficients using `sfpl.kernel.fit` or `sfpl.kNN.fit`, respectively, without penalisation (setting `lambda.seq=0`) and using the entire training sample (`train=c(train.1,train.2)`). The method provides two predictions (and MSEPs):

a) The prediction associated with `option=1` for the `sfpl.kernel` or `sfpl.kNN` class.

b) The prediction associated with `option=2` for the `sfpl.kernel` or `sfpl.kNN` class.

(See the documentation of the functions `predict.sfpl.kernel` and `predict.sfpl.kNN`.)
If `option=4` (an option only available for the class `PVS.kNN`), we maintain `beta.est`, while the tuning parameter $k$ is selected again to predict the functional nonparametric component of the model. This selection is performed using the LOOCV criterion in the associated functional nonparametric model and the complete training sample (i.e. `train=c(train.1,train.2)`), obtaining a local selection for $k$.
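A hedged sketch comparing the global and local re-selection of $k$ for a `PVS.kNN` object (`fit.kNN` and the Sugar data as in the examples below):

# Global re-selection of k (option=2) vs local re-selection (option=4)
p.global <- predict(fit.kNN, newdata.x=x.sug[test,], newdata.z=z.sug[test,],
  y.test=y.sug[test], option=2)
p.local <- predict(fit.kNN, newdata.x=x.sug[test,], newdata.z=z.sug[test,],
  y.test=y.sug[test], option=4)  # only available for PVS.kNN objects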
The function returns the predicted values of the response (`y`) for `newdata.x` and `newdata.z`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `option=3`, two sets of predictions (and two MSEPs) are provided, corresponding to the items a) and b) mentioned in the section Details.
If `is.null(newdata.x)` or `is.null(newdata.z)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`PVS.kernel.fit`, `sfpl.kernel.fit` and `predict.sfpl.kernel`, or `PVS.kNN.fit`, `sfpl.kNN.fit` and `predict.sfpl.kNN`.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240

#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]

#Dataset to model
x.sug <- x[!index.atip,]
z.sug<- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216
test<-217:266

#Fit
fit.kernel<- PVS.kernel.fit(x=x.sug[train,],z=z.sug[train,],
  y=y.sug[train],train.1=1:108,train.2=109:216,
  lambda.min.h=0.03,lambda.min.l=0.03, max.q.h=0.35, nknot=20,
  criterion="BIC", max.iter=5000)
fit.kNN<- PVS.kNN.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train],
  train.1=1:108,train.2=109:216,lambda.min.h=0.07, lambda.min.l=0.07,
  nknot=20,criterion="BIC", max.iter=5000)

#Predictions
predict(fit.kernel,newdata.x=x.sug[test,],newdata.z=z.sug[test,],
  y.test=y.sug[test],option=2)
predict(fit.kNN,newdata.x=x.sug[test,],newdata.z=z.sug[test,],
  y.test=y.sug[test],option=2)
`predict` method for the semi-functional partial linear model (SFPLM) fitted using `sfpl.kernel.fit` or `sfpl.kNN.fit`.
## S3 method for class 'sfpl.kernel'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'sfpl.kNN'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
object |
Output of the functions mentioned in the |
newdata.x |
Matrix containing new observations of the functional covariate collected by row. |
newdata.z |
Matrix containing the new observations of the scalar covariate collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
option |
Allows the selection among the choices 1 and 2 for |
... |
Further arguments passed to or from other methods. |
The following options are provided to obtain the predictions of the response for `newdata.x` and `newdata.z`:

If `option=1`, we maintain all the estimates (`k.opt` or `h.opt` and `beta.est`) to predict the functional nonparametric component of the model.
If `option=2`, we maintain `beta.est`, while the tuning parameter ($k$ or $h$) is selected again to predict the functional nonparametric component of the model. This selection is performed using the leave-one-out cross-validation (LOOCV) criterion in the associated functional nonparametric model, obtaining a global selection for $k$ or $h$.
In the case of `sfpl.kNN` objects, if `option=3`, we maintain `beta.est`, while the tuning parameter $k$ is selected again to predict the functional nonparametric component of the model. This selection is performed using the LOOCV criterion in the associated functional nonparametric model, performing a local selection for $k$.
The function returns the predicted values of the response (`y`) for `newdata.x` and `newdata.z`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `is.null(newdata.x)` or `is.null(newdata.z)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`sfpl.kernel.fit` and `sfpl.kNN.fit`.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #Fit fit.kernel<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],q=2, max.q.h=0.35,lambda.min.l=0.01, factor.pn=2, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) fit.kNN<-sfpl.kNN.fit(y=y[train],x=X[train,], z=z.com[train,],q=2, max.knn=20,lambda.min.l=0.01, factor.pn=2, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) #Predictions predict(fit.kernel,newdata.x=X[test,],newdata.z=z.com[test,],y.test=y[test], option=2) predict(fit.kNN,newdata.x=X[test,],newdata.z=z.com[test,],y.test=y[test], option=2)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #Fit fit.kernel<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],q=2, max.q.h=0.35,lambda.min.l=0.01, factor.pn=2, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) fit.kNN<-sfpl.kNN.fit(y=y[train],x=X[train,], z=z.com[train,],q=2, max.knn=20,lambda.min.l=0.01, factor.pn=2, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) #Predictions predict(fit.kernel,newdata.x=X[test,],newdata.z=z.com[test,],y.test=y[test], option=2) predict(fit.kNN,newdata.x=X[test,],newdata.z=z.com[test,],y.test=y[test], option=2)
`predict` S3 method for:
The semi-functional partial linear single-index model (SFPLSIM) fitted using `sfplsim.kernel.fit` or `sfplsim.kNN.fit`.
The multi-functional partial linear single-index model (MFPLSIM) fitted using `FASSMR.kernel.fit` or `FASSMR.kNN.fit`.
## S3 method for class 'sfplsim.kernel'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'sfplsim.kNN'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'FASSMR.kernel'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
## S3 method for class 'FASSMR.kNN'
predict(object, newdata.x = NULL, newdata.z = NULL, y.test = NULL,
  option = NULL, ...)
object |
Output of the functions mentioned in the |
newdata.x |
A matrix containing new observations of the functional covariate in the functional-single index component collected by row. |
newdata.z |
Matrix containing the new observations of the scalar covariates (SFPLSIM) or of the scalar covariates coming from the discretisation of a curve (MFPLSIM), collected by row. |
y.test |
(optional) A vector containing the new observations of the response. |
option |
Allows the choice between 1 and 2. The default is 1. See the section |
... |
Further arguments passed to or from other methods. |
Two options are provided to obtain the predictions of the response for `newdata.x` and `newdata.z`:

If `option=1`, we maintain all the estimates (`k.opt` or `h.opt`, `theta.est` and `beta.est`) to predict the functional single-index component of the model.

If `option=2`, we maintain `theta.est` and `beta.est`, while the tuning parameter ($k$ or $h$) is selected again to predict the functional single-index component of the model. This selection is performed using the leave-one-out cross-validation criterion in the associated functional single-index model.
The function returns the predicted values of the response (`y`) for `newdata.x` and `newdata.z`. If `!is.null(y.test)`, it also provides the mean squared error of prediction (`MSEP`) computed as `mean((y-y.test)^2)`.
If `is.null(newdata.x)` or `is.null(newdata.z)`, the function returns the fitted values.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
`sfplsim.kernel.fit`, `sfplsim.kNN.fit`, `FASSMR.kernel.fit` or `FASSMR.kNN.fit`.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #SFPLSIM fit. Convergence errors for some theta are obtained. s.fit.kernel<-sfplsim.kernel.fit(x=X[train,], z=z.com[train,], y=y[train], max.q.h=0.35,lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) s.fit.kNN<-sfplsim.kNN.fit(y=y[train],x=X[train,], z=z.com[train,], max.knn=20,lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) predict(s.fit.kernel,newdata.x=X[test,],newdata.z=z.com[test,], y.test=y[test],option=2) predict(s.fit.kNN,newdata.x=X[test,],newdata.z=z.com[test,], y.test=y[test],option=2) data(Sugar) y<-Sugar$ash x<-Sugar$wave.290 z<-Sugar$wave.240 #Outliers index.y.25 <- y > 25 index.atip <- index.y.25 (1:268)[index.atip] #Dataset to model x.sug <- x[!index.atip,] z.sug<- z[!index.atip,] y.sug <- y[!index.atip] train<-1:216 test<-217:266 m.fit.kernel <- FASSMR.kernel.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train], nknot.theta=2, lambda.min.l=0.03, max.q.h=0.35,num.h = 10, nknot=20,criterion="BIC", max.iter=5000) m.fit.kNN<- FASSMR.kNN.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train], nknot.theta=2, lambda.min.l=0.03, max.knn=20,nknot=20,criterion="BIC",max.iter=5000) predict(m.fit.kernel,newdata.x=x.sug[test,],newdata.z=z.sug[test,], y.test=y.sug[test],option=2) predict(m.fit.kNN,newdata.x=x.sug[test,],newdata.z=z.sug[test,], y.test=y.sug[test],option=2)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 test<-161:215 #SFPLSIM fit. Convergence errors for some theta are obtained. s.fit.kernel<-sfplsim.kernel.fit(x=X[train,], z=z.com[train,], y=y[train], max.q.h=0.35,lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) s.fit.kNN<-sfplsim.kNN.fit(y=y[train],x=X[train,], z=z.com[train,], max.knn=20,lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) predict(s.fit.kernel,newdata.x=X[test,],newdata.z=z.com[test,], y.test=y[test],option=2) predict(s.fit.kNN,newdata.x=X[test,],newdata.z=z.com[test,], y.test=y[test],option=2) data(Sugar) y<-Sugar$ash x<-Sugar$wave.290 z<-Sugar$wave.240 #Outliers index.y.25 <- y > 25 index.atip <- index.y.25 (1:268)[index.atip] #Dataset to model x.sug <- x[!index.atip,] z.sug<- z[!index.atip,] y.sug <- y[!index.atip] train<-1:216 test<-217:266 m.fit.kernel <- FASSMR.kernel.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train], nknot.theta=2, lambda.min.l=0.03, max.q.h=0.35,num.h = 10, nknot=20,criterion="BIC", max.iter=5000) m.fit.kNN<- FASSMR.kNN.fit(x=x.sug[train,],z=z.sug[train,], y=y.sug[train], nknot.theta=2, lambda.min.l=0.03, max.knn=20,nknot=20,criterion="BIC",max.iter=5000) predict(m.fit.kernel,newdata.x=x.sug[test,],newdata.z=z.sug[test,], y.test=y.sug[test],option=2) predict(m.fit.kNN,newdata.x=x.sug[test,],newdata.z=z.sug[test,], y.test=y.sug[test],option=2)
`summary` and `print` functions for `fsim.kNN.fit`, `fsim.kNN.fit.optim`, `fsim.kernel.fit` and `fsim.kernel.fit.optim`.
## S3 method for class 'fsim.kernel'
print(x, ...)
## S3 method for class 'fsim.kNN'
print(x, ...)
## S3 method for class 'fsim.kernel'
summary(object, ...)
## S3 method for class 'fsim.kNN'
summary(object, ...)
x |
Output of the |
... |
Further arguments. |
object |
Output of the |
The matched call.
The optimal value of the tuning parameter (h.opt or k.opt).
Coefficients of $\hat{\theta}$ in the B-spline basis (theta.est): a vector of length order.Bspline+nknot.theta.
Minimum value of the CV function, i.e. the value of CV for theta.est
and h.opt
/k.opt
.
R squared.
Residual variance.
Residual degrees of freedom.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
fsim.kernel.fit and fsim.kNN.fit.
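For illustration, a minimal sketch of these methods follows (the fsim.kernel.fit arguments and tuning values are illustrative, in the style of the Tecator examples elsewhere in this manual):

data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra
#FSIM fit (illustrative arguments).
fit<-fsim.kernel.fit(x=X[1:160,], y=y[1:160], nknot.theta=4,
  nknot=20, range.grid=c(850,1050))
print(fit)
summary(fit)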
summary and print functions for lm.pels.fit and PVS.fit.
## S3 method for class 'lm.pels'
print(x, ...)

## S3 method for class 'PVS'
print(x, ...)

## S3 method for class 'lm.pels'
summary(object, ...)

## S3 method for class 'PVS'
summary(object, ...)
x |
Output of the |
... |
Further arguments. |
object |
Output of the |
The matched call.
The estimated intercept of the model.
The estimated vector of linear coefficients (beta.est
).
The number of non-zero components in beta.est
.
The indexes of the non-zero components in beta.est
.
The optimal value of the penalisation parameter (lambda.opt
).
The optimal value of the criterion function, i.e. the value obtained with lambda.opt
and vn.opt
(and w.opt
in the case of the PVS).
Minimum value of the penalised least-squares function. That is, the value obtained using beta.est
and lambda.opt
.
The penalty function used.
The criterion used to select the penalisation parameter and vn
.
The optimal value of vn
in the case of the lm.pels
object.
In the case of the PVS
objects, these functions also return
the optimal number of covariates required to construct the reduced model in the first step of the algorithm (w.opt
). This value is selected using the same criterion employed for selecting the penalisation parameter.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
lm.pels.fit and PVS.fit.
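A brief sketch of these methods (reusing the Sugar-based PVS.fit call documented later in this manual; the outlier filter mirrors that example):

data(Sugar)
y<-Sugar$ash
z<-Sugar$wave.240
z.sug<-z[!(y>25),]
y.sug<-y[!(y>25)]
fit<-PVS.fit(z=z.sug[1:216,], y=y.sug[1:216], train.1=1:108,
  train.2=109:216, lambda.min.h=0.2, criterion="BIC", max.iter=5000)
print(fit)
summary(fit)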
summary and print functions for PVS.kernel.fit and PVS.kNN.fit.
## S3 method for class 'PVS.kernel'
print(x, ...)

## S3 method for class 'PVS.kNN'
print(x, ...)

## S3 method for class 'PVS.kernel'
summary(object, ...)

## S3 method for class 'PVS.kNN'
summary(object, ...)
x |
Output of the |
... |
Further arguments. |
object |
Output of the |
The matched call.
The optimal value of the tuning parameter (h.opt
or k.opt
).
The optimal initial number of covariates to build the reduced model (w.opt
).
The estimated vector of linear coefficients (beta.est
).
The number of non-zero components in beta.est
.
The indexes of the non-zero components in beta.est
.
The optimal value of the penalisation parameter (lambda.opt
).
The optimal value of the criterion function, i.e. the value obtained with w.opt
, lambda.opt
, vn.opt
and h.opt
/k.opt.
Minimum value of the penalised least-squares function. That is, the value obtained using beta.est
and lambda.opt
.
The penalty function used.
The criterion used to select the number of covariates employed to construct the reduced model, the tuning parameter, the penalisation parameter and vn
.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
PVS.kernel.fit
and PVS.kNN.fit
.
summary and print functions for FASSMR.kernel.fit, FASSMR.kNN.fit, IASSMR.kernel.fit and IASSMR.kNN.fit.
## S3 method for class 'FASSMR.kernel'
print(x, ...)

## S3 method for class 'FASSMR.kNN'
print(x, ...)

## S3 method for class 'IASSMR.kernel'
print(x, ...)

## S3 method for class 'IASSMR.kNN'
print(x, ...)

## S3 method for class 'FASSMR.kernel'
summary(object, ...)

## S3 method for class 'FASSMR.kNN'
summary(object, ...)

## S3 method for class 'IASSMR.kernel'
summary(object, ...)

## S3 method for class 'IASSMR.kNN'
summary(object, ...)
x |
Output of the |
... |
Further arguments passed to or from other methods. |
object |
Output of the |
The matched call.
The optimal value of the tuning parameter (h.opt
or k.opt
).
The optimal initial number of covariates to build the reduced model (w.opt
).
Coefficients of $\hat{\theta}$ in the B-spline basis (theta.est): a vector of length order.Bspline+nknot.theta.
The estimated vector of linear coefficients (beta.est
).
The number of non-zero components in beta.est
.
The indexes of the non-zero components in beta.est
.
The optimal value of the penalisation parameter (lambda.opt
).
The optimal value of the criterion function, i.e. the value obtained with w.opt
, lambda.opt
, vn.opt
and h.opt
/k.opt.
Minimum value of the penalised least-squares function. That is, the value obtained using theta.est
, beta.est
and lambda.opt
.
The penalty function used.
The criterion used to select the number of covariates employed to construct the reduced model, the tuning parameter, the penalisation parameter and vn
.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
FASSMR.kernel.fit, FASSMR.kNN.fit, IASSMR.kernel.fit and IASSMR.kNN.fit.
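A brief sketch of these methods (reusing the Sugar-based FASSMR.kNN.fit call from the examples earlier in this manual):

data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240
x.sug<-x[!(y>25),]
z.sug<-z[!(y>25),]
y.sug<-y[!(y>25)]
m.fit.kNN<-FASSMR.kNN.fit(x=x.sug[1:216,], z=z.sug[1:216,],
  y=y.sug[1:216], nknot.theta=2, lambda.min.l=0.03, max.knn=20,
  nknot=20, criterion="BIC", max.iter=5000)
print(m.fit.kNN)
summary(m.fit.kNN)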
summary and print functions for sfpl.kNN.fit and sfpl.kernel.fit.
## S3 method for class 'sfpl.kernel'
print(x, ...)

## S3 method for class 'sfpl.kNN'
print(x, ...)

## S3 method for class 'sfpl.kernel'
summary(object, ...)

## S3 method for class 'sfpl.kNN'
summary(object, ...)
x |
Output of the |
... |
Further arguments. |
object |
Output of the |
The matched call.
The optimal value of the tuning parameter (h.opt
or k.opt
).
The estimated vector of linear coefficients (beta.est
).
The number of non-zero components in beta.est
.
The indexes of the non-zero components in beta.est
.
The optimal value of the penalisation parameter (lambda.opt
).
The optimal value of the criterion function, i.e. the value obtained with lambda.opt
, vn.opt
and h.opt
/k.opt.
Minimum value of the penalised least-squares function. That is, the value obtained using beta.est
and lambda.opt
.
The penalty function used.
The criterion used to select the tuning parameter, the penalisation parameter and vn
.
The optimal value of vn
.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
sfpl.kernel.fit and sfpl.kNN.fit.
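A brief sketch of these methods (reusing the Tecator-based sfpl.kernel.fit call documented later in this manual):

data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra
z1<-Tecator$protein
z2<-Tecator$moisture
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
fit<-sfpl.kernel.fit(x=X[1:160,], z=z.com[1:160,], y=y[1:160], q=2,
  max.q.h=0.35, lambda.min.l=0.01, max.iter=5000, criterion="BIC",
  nknot=20)
print(fit)
summary(fit)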
summary and print functions for sfplsim.kNN.fit and sfplsim.kernel.fit.
## S3 method for class 'sfplsim.kernel'
print(x, ...)

## S3 method for class 'sfplsim.kNN'
print(x, ...)

## S3 method for class 'sfplsim.kernel'
summary(object, ...)

## S3 method for class 'sfplsim.kNN'
summary(object, ...)
x |
Output of the |
... |
Further arguments. |
object |
Output of the |
The matched call.
The optimal value of the tuning parameter (h.opt
or k.opt
).
Coefficients of $\hat{\theta}$ in the B-spline basis (theta.est): a vector of length order.Bspline+nknot.theta.
The estimated vector of linear coefficients (beta.est
).
The number of non-zero components in beta.est
.
The indexes of the non-zero components in beta.est
.
The optimal value of the penalisation parameter (lambda.opt
).
The optimal value of the criterion function, i.e. the value obtained with lambda.opt
, vn.opt
and h.opt
/k.opt.
Minimum value of the penalised least-squares function. That is, the value obtained using theta.est
, beta.est
and lambda.opt
.
The penalty function used.
The criterion used to select the tuning parameter, the penalisation parameter and vn
.
The optimal value of vn
.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
sfplsim.kernel.fit and sfplsim.kNN.fit.
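A brief sketch of these methods (reusing the Tecator-based sfplsim.kernel.fit call from the examples earlier in this manual):

data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra2
z1<-Tecator$protein
z2<-Tecator$moisture
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
s.fit.kernel<-sfplsim.kernel.fit(x=X[1:160,], z=z.com[1:160,],
  y=y[1:160], max.q.h=0.35, lambda.min.l=0.01, factor.pn=2,
  nknot.theta=4, criterion="BIC", range.grid=c(850,1050),
  nknot=20, max.iter=5000)
print(s.fit.kernel)
summary(s.fit.kernel)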
Computes the inner product between each curve collected in data
and a particular curve $\theta$.
projec(data, theta, order.Bspline = 3, nknot.theta = 3,
  range.grid = NULL, nknot = NULL)
data |
Matrix containing functional data collected by row |
theta |
Vector containing the coefficients of $\theta$ in the B-spline basis, so that length(theta)=order.Bspline+nknot.theta. |
order.Bspline |
Order of the B-spline basis functions for the B-spline representation of $\theta$. The default is 3. |
nknot.theta |
Number of regularly spaced interior knots of the B-spline basis. The default is 3. |
range.grid |
Vector of length 2 containing the range of the discretisation of the functional data. If |
nknot |
Number of regularly spaced interior knots for the B-spline representation of the functional data. The default value is |
A matrix containing the inner products.
The construction of this code is based on that by Frederic Ferraty, which is available on his website https://www.math.univ-toulouse.fr/~ferraty/SOFTWARES/NPFDA/index.html.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo S., Aneiros, G., and Vieu, P., (2019) Automatic and location-adaptive estimation in functional single–index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also semimetric.projec
.
data("Tecator") names(Tecator) y<-Tecator$fat X<-Tecator$absor.spectra #length(theta)=6=order.Bspline+nknot.theta projec(X,theta=c(1,0,0,1,1,-1),nknot.theta=3,nknot=20,range.grid=c(850,1050))
data("Tecator") names(Tecator) y<-Tecator$fat X<-Tecator$absor.spectra #length(theta)=6=order.Bspline+nknot.theta projec(X,theta=c(1,0,0,1,1,-1),nknot.theta=3,nknot=20,range.grid=c(850,1050))
This function implements the Partitioning Variable Selection (PVS) algorithm. This algorithm is specifically designed for estimating multivariate linear models whose scalar covariates are derived from the discretisation of a curve.
PVS is a two-stage procedure that selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure. Additionally, it utilises an objective criterion (criterion
) to determine the initial number of covariates in the reduced model (w.opt
) of the first stage, and the penalisation parameter (lambda.opt
).
PVS.fit(z, y, train.1 = NULL, train.2 = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1,
  nlambda = 100, vn = ncol(z), nfolds = 10, seed = 123,
  wn = c(10, 15, 20), range.grid = NULL, criterion = "GCV",
  penalty = "grSCAD", max.iter = 1000)
z |
Matrix containing the observations of the functional covariate collected by row (linear component). |
y |
Vector containing the scalar response. |
train.1 |
Positions of the data that are used as the training sample in the 1st step. The default setting is |
train.2 |
Positions of the data that are used as the training sample in the 2nd step. The default setting is |
lambda.min |
The smallest value for lambda (i. e., the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Number of values in the sequence from which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
criterion |
The criterion used to select the tuning and regularisation parameters: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The sparse linear model with covariates coming from the discretisation of a curve is given by the expression

$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+\varepsilon_i,\quad i=1,\dots,n,$$

where $Y_i$ is a real random response and $\zeta_i$ is assumed to be a random curve defined on some interval $[a,b]$, which is observed at the points $a\le t_1<\dots<t_{p_n}\le b$. $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients and $\varepsilon_i$ denotes the random error.

In this model, it is assumed that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ are part of the model. Therefore, the relevant variables (the impact points of the curve $\zeta$ on the response) must be selected, and the model estimated.
In this function, this model is fitted using the PVS. The PVS is a two-step procedure, so we divide the sample into two independent subsamples, each asymptotically half the size of the original sample ($n_1\approx n_2\approx n/2$). One subsample is used in the first stage of the method, and the other in the second stage. The subsamples are defined as follows:

$\mathcal{E}^{1}=\{(\zeta_i,Y_i),\ i=1,\dots,n_1\}$ and $\mathcal{E}^{2}=\{(\zeta_i,Y_i),\ i=n_1+1,\dots,n_1+n_2=n\}$.

Note that these two subsamples are specified to the program through the arguments train.1 and train.2. The superscript $s$, where $s=1,2$, indicates the stage of the method in which the sample, function, variable, or parameter is involved.

To explain the algorithm, we assume that the number $p_n$ of linear covariates can be expressed as $p_n=q_nw_n$, with $q_n$ and $w_n$ being integers.
First step. A reduced model is considered, discarding many linear covariates. The penalised least-squares procedure is applied to the reduced model using only the subsample $\mathcal{E}^{1}$. Specifically:

Consider a subset of the initial $p_n$ linear covariates, containing only $w_n$ equally spaced discretised observations of $\zeta$ covering the interval $[a,b]$. This subset is the following:

$$\mathcal{R}_n^{1}=\left\{\zeta\left(t_k^{1}\right),\ k=1,\dots,w_n\right\},$$

where $t_k^{1}=t_{\lceil(2k-1)q_n/2\rceil}$ and $\lceil z\rceil$ denotes the smallest integer not less than the real number $z$. The size (cardinality) of this subset is provided to the program in the argument wn (which contains a sequence of eligible sizes).

Consider the following reduced model, involving only the linear covariates from $\mathcal{R}_n^{1}$:

$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{1}\zeta_i\left(t_k^{1}\right)+\varepsilon_i^{1}.$$

The penalised least-squares variable selection procedure is applied to the reduced model using the function lm.pels.fit, which requires the remaining arguments (for details, see the documentation of the function lm.pels.fit). The estimates obtained are the outputs of the first step of the algorithm.
Second step. The variables selected in the first step, along with the variables in their neighbourhood, are included. Then the penalised least-squares procedure is carried out again, considering only the subsample $\mathcal{E}^{2}$. Specifically:

Consider a new set of variables, $\mathcal{R}_n^{2}$, containing the covariates selected in the first step together with the covariates in their neighbourhood. Denoting $r_n=\#\mathcal{R}_n^{2}$, we can rename the variables in $\mathcal{R}_n^{2}$ as follows: $\mathcal{R}_n^{2}=\{\zeta(t_1^{2}),\dots,\zeta(t_{r_n}^{2})\}$.

Consider the following model, which involves only the linear covariates belonging to $\mathcal{R}_n^{2}$:

$$Y_i=\sum_{k=1}^{r_n}\beta_{0k}^{2}\zeta_i\left(t_k^{2}\right)+\varepsilon_i^{2}.$$

The penalised least-squares variable selection procedure is applied to this model using lm.pels.fit.
The outputs of the second step are the estimates of the model. For further details on this algorithm, see Aneiros and Vieu (2014).
Remark: If the condition $p_n=q_nw_n$ is not met (so that $q_n=p_n/w_n$ is not an integer), the function considers variable $q_{n,k}$ values, $k=1,\dots,w_n$. Specifically:

$$q_{n,k}=\left\{\begin{array}{ll} [p_n/w_n]+1 & \mbox{if } k\in\{1,\dots,p_n-w_n[p_n/w_n]\},\\ \left[p_n/w_n\right] & \mbox{if } k\in\{p_n-w_n[p_n/w_n]+1,\dots,w_n\},\end{array}\right.$$

where $[z]$ denotes the integer part of the real number $z$.
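To fix ideas, a small sketch of the first-step partitioning follows (illustration only, not package code; it assumes the index formula $t_k^{1}=t_{\lceil(2k-1)q_n/2\rceil}$ stated above and that $p_n=q_nw_n$ holds exactly):

#Sketch: indexes of the w_n covariates kept in the first step of PVS,
#assuming p_n = q_n*w_n with q_n integer (see the Remark otherwise).
first.step.indexes <- function(pn, wn) {
  qn <- pn/wn
  ceiling((2*seq_len(wn) - 1)*qn/2)
}
first.step.indexes(pn=100, wn=10)
#5 15 25 35 45 55 65 75 85 95: one discretisation point per block of q_n=10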
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
|
indexes.beta.nonnull |
Indexes of the non-zero |
w.opt |
Selected size for |
lambda.opt |
Selected value of the penalisation parameter |
IC |
Value of the criterion function considered to select |
beta2 |
Estimate of |
indexes.beta.nonnull2 |
Indexes of the non-zero linear coefficients after the step 2 of the method for each value of the sequence |
IC2 |
Optimal value of the criterion function in the second step for each value of the sequence |
lambda2 |
Selected value of penalisation parameter in the second step for each value of the sequence |
index02 |
Indexes of the covariates (in the entire set of |
beta1 |
Estimate of |
IC1 |
Optimal value of the criterion function in the first step for each value of the sequence |
lambda1 |
Selected value of penalisation parameter in the first step for each value of the sequence |
index01 |
Indexes of the covariates (in the entire set of |
index1 |
Indexes of the non-zero linear coefficients after the step 1 of the method for each value of the sequence |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Aneiros, G. and Vieu, P. (2014) Variable selection in infinite-dimensional problems. Statistics & Probability Letters, 94, 12–20, doi:10.1016/j.spl.2014.06.025.
See also lm.pels.fit
.
data(Sugar)
y<-Sugar$ash
z<-Sugar$wave.240
#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]
#Dataset to model
z.sug <- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216
ptm=proc.time()
fit<- PVS.fit(z=z.sug[train,], y=y.sug[train], train.1=1:108,
  train.2=109:216, lambda.min.h=0.2, criterion="BIC", max.iter=5000)
proc.time()-ptm
fit
names(fit)
This function computes the partitioning variable selection (PVS) algorithm for multi-functional partial linear models (MFPLM).
PVS is a two-stage procedure that selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kernel estimation with Nadaraya-Watson weights.
Additionally, it utilises an objective criterion (criterion
) to select the number of covariates in the reduced model (w.opt
), the bandwidth (h.opt
) and the penalisation parameter (lambda.opt
).
PVS.kernel.fit(x, z, y, train.1 = NULL, train.2 = NULL,
  semimetric = "deriv", q = NULL, min.q.h = 0.05, max.q.h = 0.5,
  h.seq = NULL, num.h = 10, range.grid = NULL,
  kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1,
  nlambda = 100, vn = ncol(z), nfolds = 10, seed = 123,
  wn = c(10, 15, 20), criterion = "GCV", penalty = "grSCAD",
  max.iter = 1000)
x |
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row. |
z |
Matrix containing the observations of the functional covariate that is discretised (linear component), collected by row. |
y |
Vector containing the scalar response. |
train.1 |
Positions of the data that are used as the training sample in the 1st step. The default setting is |
train.2 |
Positions of the data that are used as the training sample in the 2nd step. The default setting is |
semimetric |
Semi-metric function. Only |
q |
Order of the derivative (if |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e. the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section |
criterion |
The criterion used to select the tuning and regularisation parameters: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The multi-functional partial linear model (MFPLM) is given by the expression

$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+m(X_i)+\varepsilon_i,\quad i=1,\dots,n,$$

where:

$Y_i$ is a real random response and $X_i$ denotes a random element belonging to some semi-metric space $\mathcal{H}$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on some interval $[a,b]$, observed at the points $a\le t_1<\dots<t_{p_n}\le b$.

$\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients and $m(\cdot)$ represents a smooth unknown real-valued link function.

$\varepsilon_i$ denotes the random error.

In the MFPLM, it is assumed that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ are part of the model. Therefore, the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) must be selected, and the model estimated.
In this function, the MFPLM is fitted using the PVS procedure, a two-step algorithm. For this, we divide the sample into two independent subsamples (asymptotically of the same size, $n_1\approx n_2\approx n/2$). One subsample is used in the first stage of the method, and the other in the second stage. The subsamples are defined as follows:

$\mathcal{E}^{1}=\{(\zeta_i,X_i,Y_i),\ i=1,\dots,n_1\}$ and $\mathcal{E}^{2}=\{(\zeta_i,X_i,Y_i),\ i=n_1+1,\dots,n_1+n_2=n\}$.

Note that these two subsamples are specified to the program through the arguments train.1 and train.2. The superscript $s$, where $s=1,2$, indicates the stage of the method in which the sample, function, variable, or parameter is involved.

To explain the algorithm, let's assume that the number $p_n$ of linear covariates can be expressed as $p_n=q_nw_n$, with $q_n$ and $w_n$ being integers.
First step. A reduced model is considered, discarding many linear covariates. The penalised least-squares procedure is applied to the reduced model using only the subsample $\mathcal{E}^{1}$. Specifically:

Consider a subset of the initial $p_n$ linear covariates containing only $w_n$ equally spaced discretised observations of $\zeta$ covering the interval $[a,b]$. This subset is the following:

$$\mathcal{R}_n^{1}=\left\{\zeta\left(t_k^{1}\right),\ k=1,\dots,w_n\right\},$$

where $t_k^{1}=t_{\lceil(2k-1)q_n/2\rceil}$ and $\lceil z\rceil$ denotes the smallest integer not less than the real number $z$. The size (cardinality) of this subset is provided to the program through the argument wn, which contains the sequence of eligible sizes.

Consider the following reduced model, involving only the linear covariates from $\mathcal{R}_n^{1}$:

$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{1}\zeta_i\left(t_k^{1}\right)+m^{1}\left(X_i\right)+\varepsilon_i^{1}.$$

The penalised least-squares variable selection procedure, with kernel estimation, is applied to the reduced model using the function sfpl.kernel.fit, which requires the remaining arguments (for details, see the documentation of the function sfpl.kernel.fit). The estimates obtained after that are the outputs of the first step of the algorithm.
Second step. The variables selected in the first step, along with those in their neighbourhood, are included. Then the penalised least-squares procedure, combined with kernel estimation, is carried out again, considering only the subsample $\mathcal{E}^{2}$. Specifically:

Consider a new set of variables, $\mathcal{R}_n^{2}$, containing the covariates selected in the first step together with the covariates in their neighbourhood. Denoting $r_n=\#\mathcal{R}_n^{2}$, we can rename the variables in $\mathcal{R}_n^{2}$ as follows: $\mathcal{R}_n^{2}=\{\zeta(t_1^{2}),\dots,\zeta(t_{r_n}^{2})\}$.

Consider the following model, which involves only the linear covariates belonging to $\mathcal{R}_n^{2}$:

$$Y_i=\sum_{k=1}^{r_n}\beta_{0k}^{2}\zeta_i\left(t_k^{2}\right)+m^{2}\left(X_i\right)+\varepsilon_i^{2}.$$

The penalised least-squares variable selection procedure, with kernel estimation, is applied to this model using sfpl.kernel.fit.
The outputs of the second step are the estimates of the MFPLM. For further details on this algorithm, see Aneiros and Vieu (2015).
Remark: If the condition $p_n=q_nw_n$ is not met (so that $q_n=p_n/w_n$ is not an integer), the function considers variable $q_{n,k}$ values, $k=1,\dots,w_n$. Specifically:

$$q_{n,k}=\left\{\begin{array}{ll} [p_n/w_n]+1 & \mbox{if } k\in\{1,\dots,p_n-w_n[p_n/w_n]\},\\ \left[p_n/w_n\right] & \mbox{if } k\in\{p_n-w_n[p_n/w_n]+1,\dots,w_n\},\end{array}\right.$$

where $[z]$ denotes the integer part of the real number $z$.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
|
indexes.beta.nonnull |
Indexes of the non-zero |
h.opt |
Selected bandwidth (when |
w.opt |
Selected size for |
lambda.opt |
Selected value of the penalisation parameter |
IC |
Value of the criterion function considered to select |
vn.opt |
Selected value of |
beta2 |
Estimate of |
indexes.beta.nonnull2 |
Indexes of the non-zero linear coefficients after the step 2 of the method for each value of the sequence |
h2 |
Selected bandwidth in the second step of the algorithm for each value of the sequence |
IC2 |
Optimal value of the criterion function in the second step for each value of the sequence |
lambda2 |
Selected value of penalisation parameter in the second step for each value of the sequence |
index02 |
Indexes of the covariates (in the entire set of |
beta1 |
Estimate of |
h1 |
Selected bandwidth in the first step of the algorithm for each value of the sequence |
IC1 |
Optimal value of the criterion function in the first step for each value of the sequence |
lambda1 |
Selected value of penalisation parameter in the first step for each value of the sequence |
index01 |
Indexes of the covariates (in the entire set of |
index1 |
Indexes of the non-zero linear coefficients after the step 1 of the method for each value of the sequence |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Aneiros, G., and Vieu, P. (2015) Partial linear modelling with multi-functional covariates. Computational Statistics, 30, 647–671, doi:10.1007/s00180-015-0568-8.
See also sfpl.kernel.fit, predict.PVS.kernel
and plot.PVS.kernel
.
Alternative method PVS.kNN.fit
.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240
#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]
#Dataset to model
x.sug <- x[!index.atip,]
z.sug <- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216
ptm=proc.time()
fit<- PVS.kernel.fit(x=x.sug[train,], z=z.sug[train,], y=y.sug[train],
  train.1=1:108, train.2=109:216, lambda.min.h=0.03, lambda.min.l=0.03,
  max.q.h=0.35, nknot=20, criterion="BIC", max.iter=5000)
proc.time()-ptm
fit
names(fit)
This function computes the partitioning variable selection (PVS) algorithm for multi-functional partial linear models (MFPLM).
PVS is a two-stage procedure that selects the impact points of the discretised curve and estimates the model. The algorithm employs a penalised least-squares regularisation procedure, integrated with kNN estimation using Nadaraya-Watson weights.
Additionally, it utilises an objective criterion (criterion
) to select the number of covariates in the reduced model (w.opt
), the number of neighbours (k.opt
) and the penalisation parameter (lambda.opt
).
PVS.kNN.fit(x, z, y, train.1 = NULL, train.2 = NULL,
  semimetric = "deriv", q = NULL, knearest = NULL, min.knn = 2,
  max.knn = NULL, step = NULL, range.grid = NULL,
  kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1,
  nlambda = 100, vn = ncol(z), nfolds = 10, seed = 123,
  wn = c(10, 15, 20), criterion = "GCV", penalty = "grSCAD",
  max.iter = 1000)
x |
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row. |
z |
Matrix containing the observations of the functional covariate that is discretised (linear component), collected by row. |
y |
Vector containing the scalar response. |
train.1 |
Positions of the data that are used as the training sample in the 1st step. The default setting is |
train.2 |
Positions of the data that are used as the training sample in the 2nd step. The default setting is |
semimetric |
Semi-metric function. Currently, only |
q |
Order of the derivative (if |
knearest |
Vector of positive integers containing the sequence in which the number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e. the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
wn |
A vector of positive integers indicating the eligible number of covariates in the reduced model. For more information, refer to the section |
criterion |
The criterion used to select the tuning and regularisation parameters: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The multi-functional partial linear model (MFPLM) is given by the expression

$$Y_i=\sum_{j=1}^{p_n}\beta_{0j}\zeta_i(t_j)+m(X_i)+\varepsilon_i,\quad i=1,\dots,n,$$

where:

$Y_i$ is a real random response and $X_i$ denotes a random element belonging to some semi-metric space $\mathcal{H}$. The second functional predictor $\zeta_i$ is assumed to be a curve defined on some interval $[a,b]$, observed at the points $a\le t_1<\dots<t_{p_n}\le b$.

$\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ is a vector of unknown real coefficients and $m(\cdot)$ represents a smooth unknown real-valued link function.

$\varepsilon_i$ denotes the random error.

In the MFPLM, it is assumed that only a few scalar variables from the set $\{\zeta(t_1),\dots,\zeta(t_{p_n})\}$ are part of the model. Therefore, the relevant variables in the linear component (the impact points of the curve $\zeta$ on the response) must be selected, and the model estimated.
In this function, the MFPLM is fitted using the PVS procedure, a two-step algorithm. For this, we divide the sample into two independent subsamples (asymptotically of the same size, $n_1\approx n_2\approx n/2$). One subsample is used in the first stage of the method, and the other in the second stage. The subsamples are defined as follows:

$\mathcal{E}^{1}=\{(\zeta_i,X_i,Y_i),\ i=1,\dots,n_1\}$ and $\mathcal{E}^{2}=\{(\zeta_i,X_i,Y_i),\ i=n_1+1,\dots,n_1+n_2=n\}$.

Note that these two subsamples are specified to the program through the arguments train.1 and train.2. The superscript $s$, where $s=1,2$, indicates the stage of the method in which the sample, function, variable, or parameter is involved.

To explain the algorithm, let's assume that the number $p_n$ of linear covariates can be expressed as $p_n=q_nw_n$, with $q_n$ and $w_n$ being integers.
First step. A reduced model is considered, discarding many linear covariates. The penalised least-squares procedure is applied to the reduced model using only the subsample $\mathcal{E}^{1}$. Specifically:

Consider a subset of the initial $p_n$ linear covariates containing only $w_n$ equally spaced discretised observations of $\zeta$ covering the interval $[a,b]$. This subset is the following:

$$\mathcal{R}_n^{1}=\left\{\zeta\left(t_k^{1}\right),\ k=1,\dots,w_n\right\},$$

where $t_k^{1}=t_{\lceil(2k-1)q_n/2\rceil}$ and $\lceil z\rceil$ denotes the smallest integer not less than the real number $z$. The size (cardinality) of this subset is provided to the program through the argument wn, which contains the sequence of eligible sizes.

Consider the following reduced model, involving only the linear covariates from $\mathcal{R}_n^{1}$:

$$Y_i=\sum_{k=1}^{w_n}\beta_{0k}^{1}\zeta_i\left(t_k^{1}\right)+m^{1}\left(X_i\right)+\varepsilon_i^{1}.$$

The penalised least-squares variable selection procedure, with kNN estimation, is applied to the reduced model using the function sfpl.kNN.fit, which requires the remaining arguments (for details, see the documentation of the function sfpl.kNN.fit). The estimates obtained after that are the outputs of the first step of the algorithm.
Second step. The variables selected in the first step, along with those in their neighbourhood, are included. Then the penalised least-squares procedure, combined with kNN estimation, is carried out again, considering only the subsample $\mathcal{E}^{2}$. Specifically:

Consider a new set of variables, $\mathcal{R}_n^{2}$, containing the covariates selected in the first step together with the covariates in their neighbourhood. Denoting $r_n=\#\mathcal{R}_n^{2}$, we can rename the variables in $\mathcal{R}_n^{2}$ as follows: $\mathcal{R}_n^{2}=\{\zeta(t_1^{2}),\dots,\zeta(t_{r_n}^{2})\}$.

Consider the following model, which involves only the linear covariates belonging to $\mathcal{R}_n^{2}$:

$$Y_i=\sum_{k=1}^{r_n}\beta_{0k}^{2}\zeta_i\left(t_k^{2}\right)+m^{2}\left(X_i\right)+\varepsilon_i^{2}.$$

The penalised least-squares variable selection procedure, with kNN estimation, is applied to this model using sfpl.kNN.fit.
The outputs of the second step are the estimates of the MFPLM. For further details on this algorithm, see Aneiros and Vieu (2015).
Remark: If the condition $p_n=q_nw_n$ is not met (so that $q_n=p_n/w_n$ is not an integer), the function considers variable $q_{n,k}$ values, $k=1,\dots,w_n$. Specifically:

$$q_{n,k}=\left\{\begin{array}{ll} [p_n/w_n]+1 & \mbox{if } k\in\{1,\dots,p_n-w_n[p_n/w_n]\},\\ \left[p_n/w_n\right] & \mbox{if } k\in\{p_n-w_n[p_n/w_n]+1,\dots,w_n\},\end{array}\right.$$

where $[z]$ denotes the integer part of the real number $z$.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
|
indexes.beta.nonnull |
Indexes of the non-zero |
k.opt |
Selected number of nearest neighbours (when |
w.opt |
Selected initial number of covariates in the reduced model. |
lambda.opt |
Selected value of the penalisation parameter |
IC |
Value of the criterion function considered to select |
vn.opt |
Selected value of |
beta2 |
Estimate of |
indexes.beta.nonnull2 |
Indexes of the non-zero linear coefficients after the step 2 of the method for each value of the sequence |
knn2 |
Selected number of neighbours in the second step of the algorithm for each value of the sequence |
IC2 |
Optimal value of the criterion function in the second step for each value of the sequence |
lambda2 |
Selected value of penalisation parameter in the second step for each value of the sequence |
index02 |
Indexes of the covariates (in the entire set of |
beta1 |
Estimate of |
knn1 |
Selected number of neighbours in the first step of the algorithm for each value of the sequence |
IC1 |
Optimal value of the criterion function in the first step for each value of the sequence |
lambda1 |
Selected value of penalisation parameter in the first step for each value of the sequence |
index01 |
Indexes of the covariates (in the entire set of |
index1 |
Indexes of the non-zero linear coefficients after the step 1 of the method for each value of the sequence |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Aneiros, G., and Vieu, P. (2015) Partial linear modelling with multi-functional covariates. Computational Statistics, 30, 647–671, doi:10.1007/s00180-015-0568-8.
See also sfpl.kNN.fit, predict.PVS.kNN
and plot.PVS.kNN
.
Alternative method PVS.kernel.fit
.
data(Sugar)
y<-Sugar$ash
x<-Sugar$wave.290
z<-Sugar$wave.240
#Outliers
index.y.25 <- y > 25
index.atip <- index.y.25
(1:268)[index.atip]
#Dataset to model
x.sug <- x[!index.atip,]
z.sug <- z[!index.atip,]
y.sug <- y[!index.atip]
train<-1:216
ptm=proc.time()
fit<- PVS.kNN.fit(x=x.sug[train,], z=z.sug[train,], y=y.sug[train],
  train.1=1:108, train.2=109:216, lambda.min.h=0.07, lambda.min.l=0.07,
  nknot=20, criterion="BIC", max.iter=5000)
proc.time()-ptm
fit
names(fit)
Computes the projection semi-metric between each curve in data1
and each curve in data2
, given a functional index $\theta$.
semimetric.projec(data1, data2, theta, order.Bspline = 3,
  nknot.theta = 3, range.grid = NULL, nknot = NULL)
data1 |
Matrix containing functional data collected by row. |
data2 |
Matrix containing functional data collected by row. |
theta |
Vector containing the coefficients of $\theta$ in the B-spline basis, so that length(theta)=order.Bspline+nknot.theta. |
order.Bspline |
Order of the B-spline basis functions for the B-spline representation of $\theta$. The default is 3. |
nknot.theta |
Number of regularly spaced interior knots of the B-spline basis. The default is 3. |
range.grid |
Vector of length 2 containing the range of the discretisation of the functional data. If |
nknot |
Number of regularly spaced interior knots for the B-spline representation of the functional data. The default value is |
For $x_1,x_2\in\mathcal{H}$, where $\mathcal{H}$ is a separable Hilbert space, the projection semi-metric in the direction $\theta\in\mathcal{H}$ is defined as

$$d_{\theta}(x_1,x_2)=\left|\left\langle\theta,x_1-x_2\right\rangle\right|.$$

The function semimetric.projec computes this projection semi-metric using the B-spline representation of the curves and of $\theta$. The dimension of the B-spline basis for $\theta$ is determined by order.Bspline+nknot.theta.
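A small numerical sketch of this definition follows (illustration only: the grid, the curves and the direction are invented, and the inner product is approximated by a Riemann sum rather than the package's B-spline machinery):

#Sketch: projection semi-metric |<theta,x1-x2>| for curves on a grid.
grid <- seq(850, 1050, length.out=100)
x1 <- sin(grid/50)
x2 <- cos(grid/50)
theta.curve <- rep(1/sqrt(200), 100)  #an (approximately) unit-norm direction
dt <- diff(range(grid))/(length(grid) - 1)
abs(sum(theta.curve*(x1 - x2))*dt)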
A matrix containing the projection semi-metrics for each pair of curves.
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Novo S., Aneiros, G., and Vieu, P., (2019) Automatic and location-adaptive estimation in functional single–index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See also projec
.
data("Tecator") names(Tecator) y<-Tecator$fat X<-Tecator$absor.spectra #length(theta)=6=order.Bspline+nknot.theta semimetric.projec(data1=X[1:5,], data2=X[5:10,],theta=c(1,0,0,1,1,-1), nknot.theta=3,nknot=20,range.grid=c(850,1050))
data("Tecator") names(Tecator) y<-Tecator$fat X<-Tecator$absor.spectra #length(theta)=6=order.Bspline+nknot.theta semimetric.projec(data1=X[1:5,], data2=X[5:10,],theta=c(1,0,0,1,1,-1), nknot.theta=3,nknot=20,range.grid=c(850,1050))
This function fits a sparse semi-functional partial linear model (SFPLM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kernel estimation using Nadaraya-Watson weights.
The procedure utilises an objective criterion (criterion
) to select both the bandwidth (h.opt
) and the regularisation parameter (lambda.opt
).
sfpl.kernel.fit(x, z, y, semimetric = "deriv", q = NULL, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10, range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
sfpl.kernel.fit(x, z, y, semimetric = "deriv", q = NULL, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10, range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
x |
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row. |
z |
Matrix containing the observations of the scalar covariates (linear component), collected by row. |
y |
Vector containing the scalar response. |
semimetric |
Semi-metric function. Only |
q |
Order of the derivative (if |
min.q.h |
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h |
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq |
Vector containing the sequence of bandwidths. The default is a sequence of |
num.h |
Positive integer indicating the number of bandwidths in the grid. The default is 10. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e. the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
lambda.seq |
Sequence of values in which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
criterion |
The criterion used to select the tuning and regularisation parameter: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The sparse semi-functional partial linear model (SFPLM) is given by the expression:

$$Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+m(X_i)+\varepsilon_i,\quad i=1,\dots,n,$$

where $Y_i$ denotes a scalar response, $Z_{i1},\dots,Z_{ip_n}$ are real random covariates, and $X_i$ is a functional random covariate valued in a semi-metric space $\mathcal{H}$. In this equation, $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ and $m(\cdot)$ represent a vector of unknown real parameters and an unknown smooth real-valued function, respectively. Additionally, $\varepsilon_i$ is the random error.
In this function, the SFPLM is fitted using a penalised least-squares approach. The approach involves transforming the SFPLM into a linear model by extracting from $Y_i$ and $Z_{ij}$ ($j=1,\dots,p_n$) the effect of the functional covariate $X_i$ using functional nonparametric regression (for details, see Ferraty and Vieu, 2006). This transformation is achieved using kernel estimation with Nadaraya-Watson weights.

An approximate linear model is then obtained:

$$\widetilde{\mathbf{Y}}\approx\widetilde{\mathbf{Z}}\beta_0+\mathbf{\varepsilon},$$

and the penalised least-squares procedure is applied to this model by minimising

$$\mathcal{Q}\left(\beta\right)=\frac{1}{2}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\beta\right)^{\top}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\beta\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j}}\left(|\beta_j|\right),\quad (1)$$

where $\mathcal{P}_{\lambda_{j}}(\cdot)$ is a penalty function (specified in the argument penalty) and $\lambda_{j}>0$ is a tuning parameter. To reduce the number of tuning parameters, $\lambda_j$, to be selected for each sample, we consider $\lambda_j=\lambda\widehat{\sigma}_{\beta_{0,j}^{OLS}}$, where $\widehat{\beta}_{0,j}^{OLS}$ denotes the OLS estimate of $\beta_{0,j}$ and $\widehat{\sigma}_{\beta_{0,j}^{OLS}}$ is the estimated standard deviation. Both $\lambda$ and $h$ (in the kernel estimation) are selected using the objective criterion specified in the argument criterion.

Finally, after estimating $\beta_0$ by minimising (1), we address the estimation of the nonlinear function $m(\cdot)$. For this, we again employ the kernel procedure with Nadaraya-Watson weights to smooth the partial residuals $Y_i-\mathbf{Z}_i^{\top}\widehat{\beta}$.

For further details on the estimation procedure of the sparse SFPLM, see Aneiros et al. (2015).
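The objective (1) can be written down directly; a short sketch follows (illustration only: Y.tilde and Z.tilde are placeholder names for the response and covariates after removing the kernel-estimated functional effect, and the SCAD penalty with the usual choice a=3.7 stands in for penalty="grSCAD"):

#Sketch: penalised least-squares objective (1) for a candidate beta.
scad <- function(b, lambda, a=3.7) {
  #SCAD penalty evaluated at b >= 0.
  ifelse(b <= lambda, lambda*b,
         ifelse(b <= a*lambda,
                (2*a*lambda*b - b^2 - lambda^2)/(2*(a - 1)),
                lambda^2*(a + 1)/2))
}
pls.objective <- function(beta, Y.tilde, Z.tilde, lambda) {
  rss <- 0.5*sum((Y.tilde - Z.tilde %*% beta)^2)
  rss + length(Y.tilde)*sum(scad(abs(beta), lambda))
}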
Remark: It should be noted that if we set lambda.seq to 0, we can obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using lambda.seq with a value $\not=0$ is advisable when suspecting the presence of irrelevant variables.
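Following the Remark, a sketch of the non-penalised fit (tuning values are illustrative, mirroring the example at the end of this page):

data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra
z.com<-cbind(Tecator$protein, Tecator$moisture)
#lambda.seq=0: no penalisation, OLS-type estimate of the linear part.
fit.ols<-sfpl.kernel.fit(x=X[1:160,], z=z.com[1:160,], y=y[1:160], q=2,
  lambda.seq=0, criterion="BIC", nknot=20, max.iter=5000)
fit.ols$beta.est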
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
Estimate of |
indexes.beta.nonnull |
Indexes of the non-zero |
h.opt |
Selected bandwidth. |
lambda.opt |
Selected value of lambda. |
IC |
Value of the criterion function considered to select |
h.min.opt.max.mopt |
|
vn.opt |
Selected value of |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Aneiros, G., Ferraty, F., Vieu, P. (2015) Variable selection in partial linear regression with functional covariate. Statistics, 49, 1322–1347, doi:10.1080/02331888.2014.998675.
Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis. Springer Series in Statistics, New York.
See also predict.sfpl.kernel
and plot.sfpl.kernel
.
Alternative method sfpl.kNN.fit
.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SFPLM fit. ptm=proc.time() fit<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],q=2, max.q.h=0.35, lambda.min.l=0.01, max.iter=5000, criterion="BIC", nknot=20) proc.time()-ptm #Results fit names(fit)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SFPLM fit. ptm=proc.time() fit<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],q=2, max.q.h=0.35, lambda.min.l=0.01, max.iter=5000, criterion="BIC", nknot=20) proc.time()-ptm #Results fit names(fit)
This function fits a sparse semi-functional partial linear model (SFPLM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kNN estimation using Nadaraya-Watson weights.
The procedure utilises an objective criterion (criterion
) to select both the number of nearest neighbours (k.opt
) and the regularisation parameter (lambda.opt
).
sfpl.kNN.fit(x, z, y, semimetric = "deriv", q = NULL, knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL, range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
sfpl.kNN.fit(x, z, y, semimetric = "deriv", q = NULL, knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL, range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
x |
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row. |
z |
Matrix containing the observations of the scalar covariates (linear component), collected by row. |
y |
Vector containing the scalar response. |
semimetric |
Semi-metric function. Only |
q |
Order of the derivative (if |
knearest |
Vector of positive integers containing the sequence in which the number of nearest neighbours |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate |
kind.of.kernel |
The type of kernel function used. Currently, only Epanechnikov kernel ( |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is |
lambda.min |
The smallest value for lambda (i.e. the lower endpoint of the sequence in which |
lambda.min.h |
The lower endpoint of the sequence in which |
lambda.min.l |
The lower endpoint of the sequence in which |
factor.pn |
Positive integer used to set |
nlambda |
Positive integer indicating the number of values in the sequence from which |
lambda.seq |
Sequence of values in which |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is |
nfolds |
Number of cross-validation folds (used when |
seed |
You may set the seed for the random number generator to ensure reproducible results (applicable when |
criterion |
The criterion used to select the tuning and regularisation parameter: |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
The sparse semi-functional partial linear model (SFPLM) is given by the expression:

$$Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+m(X_i)+\varepsilon_i,\quad i=1,\dots,n,$$

where $Y_i$ denotes a scalar response, $Z_{i1},\dots,Z_{ip_n}$ are real random covariates, and $X_i$ is a functional random covariate valued in a semi-metric space $\mathcal{H}$. In this equation, $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$ and $m(\cdot)$ represent a vector of unknown real parameters and an unknown smooth real-valued function, respectively. Additionally, $\varepsilon_i$ is the random error.
In this function, the SFPLM is fitted using a penalised least-squares approach. The approach involves transforming the SFPLM into a linear model by extracting from $Y_i$ and $Z_{ij}$ ($j=1,\dots,p_n$) the effect of the functional covariate $X_i$ using functional nonparametric regression (for details, see Ferraty and Vieu, 2006). This transformation is achieved using kNN estimation with Nadaraya-Watson weights.

An approximate linear model is then obtained:

$$\widetilde{\mathbf{Y}}\approx\widetilde{\mathbf{Z}}\beta_0+\mathbf{\varepsilon},$$

and the penalised least-squares procedure is applied to this model by minimising

$$\mathcal{Q}\left(\beta\right)=\frac{1}{2}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\beta\right)^{\top}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\beta\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j}}\left(|\beta_j|\right),\quad (1)$$

where $\mathcal{P}_{\lambda_{j}}(\cdot)$ is a penalty function (specified in the argument penalty) and $\lambda_{j}>0$ is a tuning parameter. To reduce the number of tuning parameters, $\lambda_j$, to be selected for each sample, we consider $\lambda_j=\lambda\widehat{\sigma}_{\beta_{0,j}^{OLS}}$, where $\widehat{\beta}_{0,j}^{OLS}$ denotes the OLS estimate of $\beta_{0,j}$ and $\widehat{\sigma}_{\beta_{0,j}^{OLS}}$ is the estimated standard deviation. Both $\lambda$ and $k$ (in the kNN estimation) are selected using the objective criterion specified in the argument criterion.

Finally, after estimating $\beta_0$ by minimising (1), we address the estimation of the nonlinear function $m(\cdot)$. For this, we again employ the kNN procedure with Nadaraya-Watson weights to smooth the partial residuals $Y_i-\mathbf{Z}_i^{\top}\widehat{\beta}$.

For further details on the estimation procedure of the sparse SFPLM, see Aneiros et al. (2015).
Remark: It should be noted that if we set lambda.seq to 0, we can obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using lambda.seq with a value $\not=0$ is advisable when suspecting the presence of irrelevant variables.
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between |
beta.est |
Estimate of |
indexes.beta.nonnull |
Indexes of the non-zero |
k.opt |
Selected number of nearest neighbours. |
lambda.opt |
Selected value of lambda. |
IC |
Value of the criterion function considered to select both |
vn.opt |
Selected value of |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Aneiros, G., Ferraty, F., Vieu, P. (2015) Variable selection in partial linear regression with functional covariate. Statistics, 49, 1322–1347, doi:10.1080/02331888.2014.998675.
See also predict.sfpl.kNN
and plot.sfpl.kNN
.
Alternative method sfpl.kernel.fit
.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SFPLM fit. ptm=proc.time() fit<-sfpl.kNN.fit(y=y[train],x=X[train,], z=z.com[train,],q=2, max.knn=20, lambda.min.l=0.01, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) proc.time()-ptm #Results fit names(fit)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SFPLM fit. ptm=proc.time() fit<-sfpl.kNN.fit(y=y[train],x=X[train,], z=z.com[train,],q=2, max.knn=20, lambda.min.l=0.01, criterion="BIC", range.grid=c(850,1050), nknot=20, max.iter=5000) proc.time()-ptm #Results fit names(fit)
This function fits a sparse semi-functional partial linear single-index model (SFPLSIM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kernel estimation using Nadaraya-Watson weights.
The function uses B-spline expansions to represent curves and eligible functional indexes. It also utilises an objective criterion (criterion
) to select both the bandwidth (h.opt
) and the regularisation parameter (lambda.opt
).
sfplsim.kernel.fit(x, z, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3,
  nknot.theta = 3, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL,
  num.h = 10, range.grid = NULL, kind.of.kernel = "quad", nknot = NULL,
  lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL,
  factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z),
  nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD",
  max.iter = 1000, n.core = NULL)
x | Matrix containing the observations of the functional covariate (functional single-index component), collected by row. |
z | Matrix containing the observations of the scalar covariates (linear component), collected by row. |
y | Vector containing the scalar response. |
seed.coeff | Vector of initial values used to build the set $\Theta_n$ of eligible functional indexes (see section Details). The default is c(-1,0,1). |
order.Bspline | Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta | Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
min.q.h | Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05. |
max.q.h | Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5. |
h.seq | Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths between the quantiles of order min.q.h and max.q.h of the distances between curves. |
num.h | Positive integer indicating the number of bandwidths in the grid. The default is 10. |
range.grid | Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)). |
kind.of.kernel | The type of kernel function used. Currently, only Epanechnikov kernel ("quad") is available. |
nknot | Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is (p-order.Bspline-1)%/%2. |
lambda.min | The smallest value for lambda (i.e., the lower endpoint of the sequence in which lambda.opt is selected), as a fraction of lambda.max. The default is lambda.min.l if the sample size is larger than factor.pn times the number of linear covariates, and lambda.min.h otherwise. |
lambda.min.h | The lower endpoint of the sequence in which lambda.opt is selected if the sample size is smaller than factor.pn times the number of linear covariates. The default is 0.05. |
lambda.min.l | The lower endpoint of the sequence in which lambda.opt is selected if the sample size is larger than factor.pn times the number of linear covariates. The default is 0.0001. |
factor.pn | Positive integer used to set lambda.min. The default value is 1. |
nlambda | Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
lambda.seq | Sequence of values in which lambda.opt is selected. If lambda.seq=NULL, then the programme builds the sequence automatically using lambda.min and nlambda. |
vn | Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn=ncol(z), which leads to the individual penalisation of each scalar covariate. |
nfolds | Number of cross-validation folds (used when criterion="k-fold-CV"). The default is 10. |
seed | You may set the seed for the random number generator to ensure reproducible results (applicable when criterion="k-fold-CV" is used). The default is 123. |
criterion | The criterion used to select the tuning and regularisation parameters: h.opt and lambda.opt (and vn.opt if needed). Options include "GCV", "BIC", "AIC" or "k-fold-CV". The default setting is "GCV". |
penalty | The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter | Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core | Number of CPU cores designated for parallel execution. The default is n.core<-availableCores(omit=1). |
The sparse semi-functional partial linear single-index model (SFPLSIM) is given by the expression

$$Y_i = Z_{i1}\beta_{01} + \dots + Z_{ip_n}\beta_{0p_n} + r(\langle\theta_0, X_i\rangle) + \varepsilon_i, \quad i = 1,\dots,n,$$

where $Y_i$ denotes a scalar response, $Z_{i1},\dots,Z_{ip_n}$ are real random covariates and $X_i$ is a functional random covariate valued in a separable Hilbert space $\mathcal{H}$ with inner product $\langle\cdot,\cdot\rangle$. In this equation, $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$, $\theta_0$ and $r(\cdot)$ are a vector of unknown real parameters, an unknown functional direction and an unknown smooth real-valued function, respectively. In addition, $\varepsilon_i$ is the random error.
The sparse SFPLSIM is fitted using the penalised least-squares approach. The first step is to transform the sparse SFPLSIM into a linear model by extracting from $Y_i$ and $Z_{ij}$ ($j=1,\dots,p_n$) the effect of the functional covariate $X_i$ using functional single-index regression. This transformation is achieved using nonparametric kernel estimation (see, for details, the documentation of the function `fsim.kernel.fit`).
An approximate linear model is then obtained:

$$\widetilde{Y}_{\theta_0} \approx \widetilde{Z}_{\theta_0}\beta_0 + \widetilde{\varepsilon},$$

and the penalised least-squares procedure is applied to this model by minimising over the pair $(\beta,\theta)$

$$\mathcal{Q}(\beta,\theta) = \frac{1}{2}\left(\widetilde{Y}_{\theta} - \widetilde{Z}_{\theta}\beta\right)^{\top}\left(\widetilde{Y}_{\theta} - \widetilde{Z}_{\theta}\beta\right) + n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{jn}}(|\beta_j|), \qquad (1)$$

where $\beta=(\beta_1,\dots,\beta_{p_n})^{\top}$, $\mathcal{P}_{\lambda_{jn}}(\cdot)$ is a penalty function (specified in the argument `penalty`) and $\lambda_{jn}>0$ is a tuning parameter. To reduce the number of tuning parameters, $\lambda_{jn}$, to be selected for each sample, we consider $\lambda_{jn} = \lambda\,\widehat{\sigma}_{\beta_{0,j,\mathrm{OLS}}}$, where $\beta_{0,j,\mathrm{OLS}}$ denotes the OLS estimate of $\beta_{0j}$ and $\widehat{\sigma}_{\beta_{0,j,\mathrm{OLS}}}$ is its estimated standard deviation. Both $\lambda$ and $h$ (in the kernel estimation) are selected using the objective criterion specified in the argument `criterion`.
In addition, the function uses a B-spline representation to construct a set $\Theta_n$ of eligible functional indexes $\theta$. The dimension of the B-spline basis is `order.Bspline`+`nknot.theta`, and the set of eligible coefficients is obtained by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in `seed.coeff`. The larger this set is, the greater the size of $\Theta_n$. Due to the intensive computation required by our approach, a balance between the size of $\Theta_n$ and the performance of the estimator is necessary. For that, Ait-Saidi et al. (2008) suggested considering `order.Bspline=3` and `seed.coeff=c(-1,0,1)`. For details on the construction of $\Theta_n$, see Novo et al. (2019). A rough illustration of the size of this candidate set is sketched below.
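The following sketch (ours, not part of the package) counts the raw candidate coefficient vectors generated by `seed.coeff=c(-1,0,1)` on a basis of the suggested dimension `order.Bspline`+`nknot.theta` = 3+3 = 6; the package's actual calibration step (sign and scale normalisation for identifiability) is deliberately omitted.

```r
# Rough illustration only: raw candidate coefficient vectors built from
# seed.coeff on a B-spline basis of dimension order.Bspline + nknot.theta.
# The package's calibration for identifiability is NOT reproduced here.
seed.coeff <- c(-1, 0, 1)
dim.basis <- 3 + 3  # order.Bspline + nknot.theta
candidates <- as.matrix(expand.grid(rep(list(seed.coeff), dim.basis)))
candidates <- candidates[rowSums(abs(candidates)) > 0, ]  # drop the null vector
nrow(candidates)  # 3^6 - 1 = 728 raw candidates before calibration
```

This makes the computational trade-off concrete: each additional value in `seed.coeff` or each extra knot multiplies the number of candidate indexes over which (1) must be minimised.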
Finally, after estimating $\widehat{\beta}$ and $\widehat{\theta}$ by minimising (1), we proceed to estimate the nonlinear link function $r_{\theta_0}(\cdot)\equiv r(\langle\theta_0,\cdot\rangle)$. For this purpose, we again apply the kernel procedure with Nadaraya-Watson weights to smooth the partial residuals $Y_i - Z_i^{\top}\widehat{\beta}$.
For further details on the estimation procedure of the sparse SFPLSIM, see Novo et al. (2021).
Remark: It should be noted that if we set `lambda.seq` to $0$, we obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using `lambda.seq` with a value different from $0$ is advisable when suspecting the presence of irrelevant variables.
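For instance, reusing the objects built in the Examples section below (`y`, `X`, `z.com` and `train`), a non-penalised fit could be obtained as in this minimal sketch:

```r
# Sketch: non-penalised (OLS-type) fit obtained by fixing lambda.seq = 0.
# Assumes y, X, z.com and train as built in the Examples section below.
fit.ols <- sfplsim.kernel.fit(x = X[train, ], z = z.com[train, ], y = y[train],
                              lambda.seq = 0, max.q.h = 0.35, nknot.theta = 4,
                              criterion = "BIC", nknot = 20, max.iter = 5000)
# Without penalisation no coefficient is shrunk to zero, so all scalar
# covariates should be retained:
fit.ols$indexes.beta.nonnull
```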
call | The matched call. |
fitted.values | Estimated scalar response. |
residuals | Differences between y and the fitted.values. |
beta.est | Estimate of $\beta_0$ when the optimal tuning parameters (lambda.opt, h.opt and vn.opt) are used. |
theta.est | Coefficients of $\widehat{\theta}$ in the B-spline basis (when the optimal tuning parameters are used): a vector of length order.Bspline+nknot.theta. |
indexes.beta.nonnull | Indexes of the non-zero components of $\widehat{\beta}$. |
h.opt | Selected bandwidth. |
lambda.opt | Selected value of the penalisation parameter $\lambda$. |
IC | Value of the criterion function considered to select lambda.opt, h.opt and vn.opt. |
Q.opt | Minimum value of the penalised criterion used to estimate $\beta_0$ and $\theta_0$. |
Q | Vector of dimension equal to the cardinal of $\Theta_n$, containing the values of the penalised criterion for each functional index in $\Theta_n$. |
m.opt | Index of $\widehat{\theta}$ in the set $\Theta_n$. |
lambda.min.opt.max.mopt | A grid of values in [lambda.min.opt.max.mopt[1], lambda.min.opt.max.mopt[3]] is considered to seek lambda.opt (with lambda.opt=lambda.min.opt.max.mopt[2]). |
lambda.min.opt.max.m | A grid of values in [lambda.min.opt.max.m[m,1], lambda.min.opt.max.m[m,3]] is considered to seek the optimal $\lambda$ (lambda.min.opt.max.m[m,2]) for the mth functional index in $\Theta_n$. |
h.min.opt.max.mopt | h.opt is sought in the grid of values in [h.min.opt.max.mopt[1], h.min.opt.max.mopt[3]] (with h.opt=h.min.opt.max.mopt[2]). |
h.min.opt.max.m | For each m, the optimal bandwidth (h.min.opt.max.m[m,2]) for the mth functional index in $\Theta_n$ is sought in the grid of values in [h.min.opt.max.m[m,1], h.min.opt.max.m[m,3]]. |
h.seq.opt | Sequence of eligible values for $h$ considered to seek h.opt. |
theta.seq.norm | The vector theta.seq.norm[j,] contains the coefficients, in the B-spline basis, of the jth functional index in $\Theta_n$. |
vn.opt | Selected value of vn. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo S., Aneiros, G., and Vieu, P., (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
Novo, S., Aneiros, G., and Vieu, P., (2021) Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables. TEST, 30, 481–504, doi:10.1007/s11749-020-00728-w.
Novo, S., Aneiros, G., and Vieu, P., (2021) A kNN procedure in semiparametric functional data analysis. Statistics and Probability Letters, 171, 109028, doi:10.1016/j.spl.2020.109028.
See also `fsim.kernel.fit`, `predict.sfplsim.kernel` and `plot.sfplsim.kernel`.

Alternative procedure: `sfplsim.kNN.fit`.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SSFPLSIM fit. Convergence errors for some theta are obtained. ptm=proc.time() fit<-sfplsim.kernel.fit(x=X[train,], z=z.com[train,], y=y[train], max.q.h=0.35,lambda.min.l=0.01, max.iter=5000, nknot.theta=4,criterion="BIC",nknot=20) proc.time()-ptm #Results fit names(fit)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SSFPLSIM fit. Convergence errors for some theta are obtained. ptm=proc.time() fit<-sfplsim.kernel.fit(x=X[train,], z=z.com[train,], y=y[train], max.q.h=0.35,lambda.min.l=0.01, max.iter=5000, nknot.theta=4,criterion="BIC",nknot=20) proc.time()-ptm #Results fit names(fit)
This function fits a sparse semi-functional partial linear single-index model (SFPLSIM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kNN estimation using Nadaraya-Watson weights.

The function uses B-spline expansions to represent curves and eligible functional indexes. It also utilises an objective criterion (`criterion`) to select both the number of neighbours (`k.opt`) and the regularisation parameter (`lambda.opt`).
```r
sfplsim.kNN.fit(x, z, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3,
  nknot.theta = 3, knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL,
  lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100,
  lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV",
  penalty = "grSCAD", max.iter = 1000, n.core = NULL)
```
x | Matrix containing the observations of the functional covariate (functional single-index component), collected by row. |
z | Matrix containing the observations of the scalar covariates (linear component), collected by row. |
y | Vector containing the scalar response. |
seed.coeff | Vector of initial values used to build the set $\Theta_n$ of eligible functional indexes (see section Details). The default is c(-1,0,1). |
order.Bspline | Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3. |
nknot.theta | Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3. |
knearest | Vector of positive integers containing the sequence in which the number of nearest neighbours k.opt is selected. If knearest=NULL, then knearest <- seq(from=min.knn, to=max.knn, by=step). |
min.knn | A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours k.opt. This value should be less than the sample size. The default is 2. |
max.knn | A positive integer that represents the maximum value in the sequence for selecting the number of nearest neighbours k.opt. This value should be less than the sample size and larger than min.knn. The default is max.knn <- n%/%5, where n is the sample size. |
step | A positive integer used to construct the sequence of k-nearest neighbours as follows: min.knn, min.knn+step, min.knn+2*step, ... The default is step <- ceiling(n/100). |
range.grid | Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)). |
kind.of.kernel | The type of kernel function used. Currently, only Epanechnikov kernel ("quad") is available. |
nknot | Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is (p-order.Bspline-1)%/%2. |
lambda.min | The smallest value for lambda (i.e., the lower endpoint of the sequence in which lambda.opt is selected), as a fraction of lambda.max. The default is lambda.min.l if the sample size is larger than factor.pn times the number of linear covariates, and lambda.min.h otherwise. |
lambda.min.h | The lower endpoint of the sequence in which lambda.opt is selected if the sample size is smaller than factor.pn times the number of linear covariates. The default is 0.05. |
lambda.min.l | The lower endpoint of the sequence in which lambda.opt is selected if the sample size is larger than factor.pn times the number of linear covariates. The default is 0.0001. |
factor.pn | Positive integer used to set lambda.min. The default value is 1. |
nlambda | Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
lambda.seq | Sequence of values in which lambda.opt is selected. If lambda.seq=NULL, then the programme builds the sequence automatically using lambda.min and nlambda. |
vn | Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn=ncol(z), which leads to the individual penalisation of each scalar covariate. |
nfolds | Number of cross-validation folds (used when criterion="k-fold-CV"). The default is 10. |
seed | You may set the seed for the random number generator to ensure reproducible results (applicable when criterion="k-fold-CV" is used). The default is 123. |
criterion | The criterion used to select the tuning and regularisation parameters: k.opt and lambda.opt (and vn.opt if needed). Options include "GCV", "BIC", "AIC" or "k-fold-CV". The default setting is "GCV". |
penalty | The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter | Maximum number of iterations allowed across the entire path. The default value is 1000. |
n.core | Number of CPU cores designated for parallel execution. The default is n.core<-availableCores(omit=1). |
The sparse semi-functional partial linear single-index model (SFPLSIM) is given by the expression

$$Y_i = Z_{i1}\beta_{01} + \dots + Z_{ip_n}\beta_{0p_n} + r(\langle\theta_0, X_i\rangle) + \varepsilon_i, \quad i = 1,\dots,n,$$

where $Y_i$ denotes a scalar response, $Z_{i1},\dots,Z_{ip_n}$ are real random covariates and $X_i$ is a functional random covariate valued in a separable Hilbert space $\mathcal{H}$ with inner product $\langle\cdot,\cdot\rangle$. In this equation, $\beta_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}$, $\theta_0$ and $r(\cdot)$ are a vector of unknown real parameters, an unknown functional direction and an unknown smooth real-valued function, respectively. In addition, $\varepsilon_i$ is the random error.
The sparse SFPLSIM is fitted using the penalised least-squares approach. The first step is to transform the sparse SFPLSIM into a linear model by extracting from $Y_i$ and $Z_{ij}$ ($j=1,\dots,p_n$) the effect of the functional covariate $X_i$ using functional single-index regression. This transformation is achieved using nonparametric kNN estimation (see, for details, the documentation of the function `fsim.kNN.fit`).
An approximate linear model is then obtained:

$$\widetilde{Y}_{\theta_0} \approx \widetilde{Z}_{\theta_0}\beta_0 + \widetilde{\varepsilon},$$

and the penalised least-squares procedure is applied to this model by minimising over the pair $(\beta,\theta)$

$$\mathcal{Q}(\beta,\theta) = \frac{1}{2}\left(\widetilde{Y}_{\theta} - \widetilde{Z}_{\theta}\beta\right)^{\top}\left(\widetilde{Y}_{\theta} - \widetilde{Z}_{\theta}\beta\right) + n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{jn}}(|\beta_j|), \qquad (1)$$

where $\beta=(\beta_1,\dots,\beta_{p_n})^{\top}$, $\mathcal{P}_{\lambda_{jn}}(\cdot)$ is a penalty function (specified in the argument `penalty`) and $\lambda_{jn}>0$ is a tuning parameter. To reduce the number of tuning parameters, $\lambda_{jn}$, to be selected for each sample, we consider $\lambda_{jn} = \lambda\,\widehat{\sigma}_{\beta_{0,j,\mathrm{OLS}}}$, where $\beta_{0,j,\mathrm{OLS}}$ denotes the OLS estimate of $\beta_{0j}$ and $\widehat{\sigma}_{\beta_{0,j,\mathrm{OLS}}}$ is its estimated standard deviation. Both $\lambda$ and $k$ (in the kNN estimation) are selected using the objective criterion specified in the argument `criterion`.
In addition, the function uses a B-spline representation to construct a set $\Theta_n$ of eligible functional indexes $\theta$. The dimension of the B-spline basis is `order.Bspline`+`nknot.theta`, and the set of eligible coefficients is obtained by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in `seed.coeff`. The larger this set is, the greater the size of $\Theta_n$. Due to the intensive computation required by our approach, a balance between the size of $\Theta_n$ and the performance of the estimator is necessary. For that, Ait-Saidi et al. (2008) suggested considering `order.Bspline=3` and `seed.coeff=c(-1,0,1)`. For details on the construction of $\Theta_n$, see Novo et al. (2019).
Finally, after estimating $\widehat{\beta}$ and $\widehat{\theta}$ by minimising (1), we proceed to estimate the nonlinear link function $r_{\theta_0}(\cdot)\equiv r(\langle\theta_0,\cdot\rangle)$. For this purpose, we again apply the kNN procedure with Nadaraya-Watson weights to smooth the partial residuals $Y_i - Z_i^{\top}\widehat{\beta}$.
For further details on the estimation procedure of the sparse SFPLSIM, see Novo et al. (2021).
Remark: It should be noted that if we set `lambda.seq` to $0$, we obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using `lambda.seq` with a value different from $0$ is advisable when suspecting the presence of irrelevant variables.
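In the same spirit, the grid of neighbours can be controlled explicitly instead of relying on the automatic sequence built from `min.knn`, `max.knn` and `step`. A minimal sketch, reusing the objects built in the Examples section below (`y`, `X`, `z.com` and `train`):

```r
# Sketch: supplying an explicit grid of nearest neighbours via knearest.
# Assumes y, X, z.com and train as built in the Examples section below.
fit.grid <- sfplsim.kNN.fit(x = X[train, ], z = z.com[train, ], y = y[train],
                            knearest = seq(2, 20, by = 2), lambda.min.l = 0.01,
                            nknot.theta = 4, criterion = "BIC",
                            range.grid = c(850, 1050), nknot = 20,
                            max.iter = 5000)
fit.grid$k.opt  # selected number of neighbours, taken from the supplied grid
```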
call | The matched call. |
fitted.values | Estimated scalar response. |
residuals | Differences between y and the fitted.values. |
beta.est | Estimate of $\beta_0$ when the optimal tuning parameters (lambda.opt, k.opt and vn.opt) are used. |
theta.est | Coefficients of $\widehat{\theta}$ in the B-spline basis (when the optimal tuning parameters are used): a vector of length order.Bspline+nknot.theta. |
indexes.beta.nonnull | Indexes of the non-zero components of $\widehat{\beta}$. |
k.opt | Selected number of nearest neighbours. |
lambda.opt | Selected value of the penalisation parameter $\lambda$. |
IC | Value of the criterion function considered to select lambda.opt, k.opt and vn.opt. |
Q.opt | Minimum value of the penalised criterion used to estimate $\beta_0$ and $\theta_0$. |
Q | Vector of dimension equal to the cardinal of $\Theta_n$, containing the values of the penalised criterion for each functional index in $\Theta_n$. |
m.opt | Index of $\widehat{\theta}$ in the set $\Theta_n$. |
lambda.min.opt.max.mopt | A grid of values in [lambda.min.opt.max.mopt[1], lambda.min.opt.max.mopt[3]] is considered to seek lambda.opt (with lambda.opt=lambda.min.opt.max.mopt[2]). |
lambda.min.opt.max.m | A grid of values in [lambda.min.opt.max.m[m,1], lambda.min.opt.max.m[m,3]] is considered to seek the optimal $\lambda$ (lambda.min.opt.max.m[m,2]) for the mth functional index in $\Theta_n$. |
knn.min.opt.max.mopt | k.opt is sought in the grid of values in [knn.min.opt.max.mopt[1], knn.min.opt.max.mopt[3]] (with k.opt=knn.min.opt.max.mopt[2]). |
knn.min.opt.max.m | For each m, the optimal number of neighbours (knn.min.opt.max.m[m,2]) for the mth functional index in $\Theta_n$ is sought in the grid of values in [knn.min.opt.max.m[m,1], knn.min.opt.max.m[m,3]]. |
knearest | Sequence of eligible values for $k$ considered to seek k.opt. |
theta.seq.norm | The vector theta.seq.norm[j,] contains the coefficients, in the B-spline basis, of the jth functional index in $\Theta_n$. |
vn.opt | Selected value of vn. |
... |
German Aneiros Perez [email protected]
Silvia Novo Diaz [email protected]
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P., (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo S., Aneiros, G., and Vieu, P., (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
Novo, S., Aneiros, G., and Vieu, P., (2021) Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables. TEST, 30, 481–504, doi:10.1007/s11749-020-00728-w.
Novo, S., Aneiros, G., and Vieu, P., (2021) A kNN procedure in semiparametric functional data analysis. Statistics and Probability Letters, 171, 109028, doi:10.1016/j.spl.2020.109028.
See also `fsim.kNN.fit`, `predict.sfplsim.kNN` and `plot.sfplsim.kNN`.

Alternative procedure: `sfplsim.kernel.fit`.
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SSFPLSIM fit. Convergence errors for some theta are obtained. ptm=proc.time() fit<-sfplsim.kNN.fit(y=y[train],x=X[train,], z=z.com[train,], max.knn=20, lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) proc.time()-ptm #Results fit names(fit)
data("Tecator") y<-Tecator$fat X<-Tecator$absor.spectra2 z1<-Tecator$protein z2<-Tecator$moisture #Quadratic, cubic and interaction effects of the scalar covariates. z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2) train<-1:160 #SSFPLSIM fit. Convergence errors for some theta are obtained. ptm=proc.time() fit<-sfplsim.kNN.fit(y=y[train],x=X[train,], z=z.com[train,], max.knn=20, lambda.min.l=0.01, factor.pn=2, nknot.theta=4, criterion="BIC",range.grid=c(850,1050), nknot=20, max.iter=5000) proc.time()-ptm #Results fit names(fit)
Ash content and absorbance spectra, at two different excitation wavelengths, of 268 sugar samples. Detailed information about this dataset can be found at https://ucphchemometrics.com/datasets/.
```r
data(Sugar)
```
A `list` containing:

`ash`: A vector with the ash content.

`wave.290`: A matrix containing the absorbance spectra observed at 571 equally spaced wavelengths in the range 275-560 nm, at an excitation wavelength of 290 nm.

`wave.240`: A matrix containing the absorbance spectra observed at 571 equally spaced wavelengths in the range 275-560 nm, at an excitation wavelength of 240 nm.
Aneiros, G., and Vieu, P. (2015) Partial linear modelling with multi-functional covariates. Computational Statistics, 30, 647–671, doi:10.1007/s00180-015-0568-8.
Novo, S., Vieu, P., and Aneiros, G., (2021) Fast and efficient algorithms for sparse semiparametric bi-functional regression. Australian and New Zealand Journal of Statistics, 63, 606–638, doi:10.1111/anzs.12355.
```r
data(Sugar)
names(Sugar)
Sugar$ash
dim(Sugar$wave.290)
dim(Sugar$wave.240)
```
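A quick visual inspection of the curves is often useful before fitting any model. The following base-R sketch reconstructs the wavelength grid from the description above (571 equally spaced points in 275-560 nm); the grid itself is an assumption derived from that description, not a component of the dataset:

```r
data(Sugar)
# Wavelength grid reconstructed from the dataset description above.
wavelengths <- seq(275, 560, length.out = 571)
matplot(wavelengths, t(Sugar$wave.290), type = "l", lty = 1,
        xlab = "Wavelength (nm)", ylab = "Absorbance",
        main = "Sugar spectra (excitation at 290 nm)")
```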
Fat, protein, and moisture content, along with absorbance spectra (including the first and second derivatives), of 215 meat samples. A detailed description of the data can be found at http://lib.stat.cmu.edu/datasets/tecator.
```r
data(Tecator)
```
A `list` containing:

`fat`: A vector with the fat content.

`protein`: A vector with the protein content.

`moisture`: A vector with the moisture content.

`absor.spectra`: A matrix containing the near-infrared absorbance spectra observed at 100 equally spaced wavelengths in the range 850-1050 nm.

`absor.spectra1`: First derivative of the absorbance spectra (computed using a B-spline representation of the curves).

`absor.spectra2`: Second derivative of the absorbance spectra (computed using a B-spline representation of the curves).
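The derivative curves in `absor.spectra1` and `absor.spectra2` were obtained from a B-spline representation of the spectra. A rough way to produce curves of this kind with base R, using `smooth.spline` as a stand-in rather than the package's exact B-spline construction, is sketched below:

```r
data(Tecator)
# Wavelength grid: 100 equally spaced points in 850-1050 nm.
wavelengths <- seq(850, 1050, length.out = ncol(Tecator$absor.spectra))
# Approximate second derivative of the first spectrum via spline smoothing;
# this mimics, but does not reproduce exactly, absor.spectra2.
ss <- smooth.spline(wavelengths, Tecator$absor.spectra[1, ])
d2 <- predict(ss, wavelengths, deriv = 2)$y
plot(wavelengths, d2, type = "l", xlab = "Wavelength (nm)",
     ylab = "Second derivative of absorbance")
```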
Ferraty, F. and Vieu, P. (2006) Nonparametric functional data analysis, Springer Series in Statistics, New York.
```r
data(Tecator)
names(Tecator)
Tecator$fat
Tecator$protein
Tecator$moisture
dim(Tecator$absor.spectra)
```