Title: | Cross-Validation Model Averaging for Partial Linear Functional Additive Models |
---|---|
Description: | Produce an averaging estimate/prediction by combining all candidate models for partial linear functional additive models, using a multi-fold cross-validation criterion. For details, see Shishi Liu and Jingxiao Zhang (2021) <arXiv:2105.00966>. |
Authors: | Shishi Liu [aut, cre], Jingxiao Zhang [aut] |
Maintainer: | Shishi Liu <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2024-12-09 06:51:44 UTC |
Source: | CRAN |
Randomly split the data indices into nfolds folds.
cvfolds(nfolds, datasize)
nfolds | The number of folds used in cross-validation. |
datasize | The sample size. |
A list. Each element contains the index vector of sample data included in this fold.
# Given sample size 20, generate 5 folds
set.seed(1212)
cvfolds(5, 20)
#[[1]]
# [1] 6 11 14 16
#[[2]]
# [1] 3 5 10 18
#[[3]]
# [1] 4 7 8 19
#[[4]]
# [1] 2 9 12 15
#[[5]]
# [1] 1 13 17 20
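The returned fold index vectors can be plugged into a hand-rolled cross-validation loop. Below is a minimal sketch of that idea; the simulated response, the linear model fitted with lm(), and the error measure are illustrative only and are not part of the package.

# Minimal sketch: using cvfolds() output in a manual cross-validation loop.
# The data and the linear model below are illustrative only.
set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20)
y <- x %*% c(1, -1, 0.5) + rnorm(20)
folds <- cvfolds(5, 20)
cv.err <- sapply(folds, function(idx) {
  fit <- lm(y[-idx] ~ x[-idx, ])                 # fit on the other folds
  pred <- cbind(1, x[idx, ]) %*% coef(fit)       # predict the held-out fold
  mean((y[idx] - pred)^2)
})
mean(cv.err)   # cross-validated mean squared error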
Summarize the estimated weights for averaging across all candidate models for partial linear functional additive models (PLFAMs), obtained with a multi-fold cross-validation criterion, together with the corresponding mean squared prediction error risk. The results of AIC, BIC, SAIC and SBIC are delivered as well.
cvmaPLFAM(
  Y,
  scalars,
  functional,
  Y.test = NULL,
  scalars.test = NULL,
  functional.test = NULL,
  tt,
  nump,
  numfpcs,
  nbasis,
  nfolds,
  ratio.train = NULL
)
Y | The vector of the scalar response variable. |
scalars | The design matrix of scalar predictors. |
functional | The matrix including records/measurements of the functional predictor. |
Y.test | Test data: the vector of the scalar response variable. |
scalars.test | Test data: the design matrix of scalar predictors. |
functional.test | Test data: the matrix including records/measurements of the functional predictor. |
tt | The vector of recording/measurement points for the functional predictor. |
nump | The number of scalar predictors in candidate models. |
numfpcs | The number of functional principal components (FPCs) for the functional predictor in candidate models. |
nbasis | The number of basis functions used for spline approximation. |
nfolds | The number of folds used in cross-validation. |
ratio.train | The ratio of data used for training, if test data are not provided. |
A list of:
aic | Mean squared error risk in training data set, produced by AIC model selection method. |
bic | Mean squared error risk in training data set, produced by BIC model selection method. |
saic | Mean squared error risk in training data set, produced by SAIC model averaging method. |
sbic | Mean squared error risk in training data set, produced by SBIC model averaging method. |
cv | Mean squared error risk in training data set, produced by CVMA method. |
waic | The selected candidate model by AIC model selection method. |
wbic | The selected candidate model by BIC model selection method. |
wsaic | The weights for each candidate model by SAIC model averaging method. |
wsbic | The weights for each candidate model by SBIC model averaging method. |
wcv | The weights for each candidate model by CVMA method. |
predaic | Mean squared prediction error risk in test data set, produced by AIC model selection method. |
predbic | Mean squared prediction error risk in test data set, produced by BIC model selection method. |
predsaic | Mean squared prediction error risk in test data set, produced by SAIC model averaging method. |
predsbic | Mean squared prediction error risk in test data set, produced by SBIC model averaging method. |
predcv | Mean squared prediction error risk in test data set, produced by CVMA method. |
# Generate simulated data
simdata = data_gen(R = 0.7, K = 1, n = 50, M0 = 20, typ = 1, design = 3)
dat1 = simdata[[1]]
scalars = dat1[, 1:20]
fd = dat1[, 21:120]
Y = dat1[, 122]
tps = seq(0, 1, length.out = 100)

# Estimation
est_res = cvmaPLFAM(Y = Y, scalars = scalars, functional = fd, tt = tps,
    nump = 2, numfpcs = 3, nbasis = 50, nfolds = 5, ratio.train = 0.8)

# Weights estimated by CVMA method
est_res$wcv
# Prediction error risk on test data set
est_res$predcv
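Because the prediction error risks of all five methods are returned in one list, they can be compared directly. The following line simply collects the documented components from est_res in the example above.

# Compare prediction error risks of the five methods on the test set
unlist(est_res[c("predaic", "predbic", "predsaic", "predsbic", "predcv")])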
Simulate sample data for illustration, including an M0-column design matrix of scalar predictors, a 100-column matrix of the functional predictor, a one-column vector of mu, a one-column vector of Y, and a one-column vector of testY.
data_gen(R, K, n, M0 = 50, typ, design)
R | A scalar of value ranging from 0 to 1 (the examples below use 0.3, 0.6, 0.7 and 0.9). |
K | A scalar. The number of replications. |
n | A scalar. The sample size of simulated data. |
M0 | A scalar. True dimension of scalar predictors. |
typ | A scalar of value 1 or 2, specifying the type of data-generating setting (see the examples below). |
design | A scalar of value 1, 2 or 3, corresponding to Design 1, 2 or 3 of the simulation study (see the examples below). |
A list of K simulated data sets. Each data set is a matrix whose first M0 columns correspond to the design matrix of scalar predictors, followed by the recording/measurement matrix of the functional predictor and the vectors mu, Y, and testY.
library(MASS)

# Example: Design 1 in simulation study
data_gen(R = 0.6, K = 2, n = 10, typ = 1, design = 1)

# Example: Design 2 in simulation study
data_gen(R = 0.3, K = 3, n = 10, typ = 2, design = 2)

# Example: Design 3 in simulation study
data_gen(R = 0.9, K = 5, n = 20, typ = 1, design = 3)
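The column layout described above makes it straightforward to split a generated data set back into its parts. Below is a minimal sketch, assuming the 100 measurement points of the functional predictor stated above; the variable names are illustrative.

library(MASS)
# Unpack one simulated data set using the column layout described above:
# M0 scalar columns, then 100 functional columns, then mu, Y, testY.
M0 <- 20
simdata <- data_gen(R = 0.7, K = 1, n = 50, M0 = M0, typ = 1, design = 3)
dat <- simdata[[1]]
scalars    <- dat[, 1:M0]                   # design matrix of scalar predictors
functional <- dat[, (M0 + 1):(M0 + 100)]    # functional predictor measurements
mu         <- dat[, M0 + 101]               # mu
Y          <- dat[, M0 + 102]               # response
testY      <- dat[, M0 + 103]               # test response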
Conduct functional principal component analysis (FPCA) on the observation matrix of the functional predictor.
fpcscore(Z, nbasis, tt)
Z | The observation matrix of the functional predictor, with one row per sample curve and one column per measurement point in tt. |
nbasis | The number of basis functions used for spline approximation. |
tt | The vector of recording/measurement points for the functional predictor. |
A list of:
score | The matrix of estimated functional principal component scores, with one row per sample curve. |
eigv | A vector of estimated eigenvalues related to FPCA. |
varp | A vector of the percentages of variance explained, related to FPCA. |
# Generate a recording/measurement matrix of the functional predictor
fddata = matrix(rnorm(1000), nrow = 10, ncol = 100)
tpoints = seq(0, 1, length.out = 100)

library(fda)
# Using 20 basis functions for spline approximation
fpcscore(fddata, nbasis = 20, tt = tpoints)

# Generate simulated data
simdata = data_gen(R = 0.7, K = 1, n = 20, M0 = 20, typ = 1, design = 1)
# Extract functional data from 'simdata', columns (M0+1):(M0+100)
simfd = simdata[[1]][, 21:120]
# Calculate FPC scores
fpcres = fpcscore(simfd, nbasis = 50, tt = seq(0, 1, length.out = 100))
fpcres$score
fpcres$eigv
cumsum(fpcres$varp)
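The cumulative explained variance in varp can guide the choice of numfpcs passed to cvmaPLFAM. A minimal sketch, continuing from fpcres in the example above; the 95% cutoff and the normalisation are illustrative assumptions.

# Pick the number of FPCs explaining at least 95% of the variance.
# varp is normalised by its total so the code works whether it is
# reported as proportions or as percentages (an assumption).
cumvar <- cumsum(fpcres$varp) / sum(fpcres$varp)
numfpcs <- which(cumvar >= 0.95)[1]
numfpcs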
Specify non-nested candidate models, according to the prescribed number of scalar predictors and the number of functional principal components (FPCs).
Each candidate model comprises at least one scalar predictor and one FPC, leading to a total of (2^nump - 1)*(2^numq - 1) candidate models.
modelspec(nump, numq)
nump | The number of scalar predictors used in candidate models. |
numq | The number of functional principal components (FPCs) used in candidate models. |
A list of:
a1 | The number of scalar predictors in each candidate model. |
a2 | The number of FPCs in each candidate model. |
a3 | The index for each component in each candidate model. |
# Given nump = 2 and numq = 2, resulting in 9 candidate models
modelspec(2, 2)
#$a1
#[1] 2 2 2 1 1 1 1 1 1
#$a2
#[1] 2 1 1 2 1 1 2 1 1
#$a3
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    1    2    3    0
# [3,]    1    2    0    4
# [4,]    1    0    3    4
# [5,]    1    0    3    0
# [6,]    1    0    0    4
# [7,]    0    2    3    4
# [8,]    0    2    3    0
# [9,]    0    2    0    4
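In the output above, each row of a3 describes one candidate model: the first nump columns index the scalar predictors, the remaining numq columns index the FPC scores, and 0 marks a component that is excluded. This reading is inferred from the example output, as is the sketch below.

# Read off which components enter candidate model m (here nump = numq = 2);
# the meaning of the 0 entries is inferred from the example output above.
spec <- modelspec(2, 2)
m <- 5                                    # the fifth candidate model
idx <- spec$a3[m, ]
scalar.idx <- idx[1:2][idx[1:2] > 0]      # included scalar predictors: 1
fpc.idx    <- idx[3:4][idx[3:4] > 0] - 2  # included FPCs: 1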
Calculate the estimated weights for averaging across all candidate models and the corresponding mean squared prediction error risk. The methods include AIC, BIC, SAIC, SBIC, and CVMA for PLFAMs.
predRisk(M, nump, numq, a2, a3, nfolds, XX.train, Y.train, XX.pred, Y.pred)
M | The number of candidate models. |
nump | The number of scalar predictors in candidate models. |
numq | The number of functional principal components (FPCs) in candidate models. |
a2 | The number of FPCs in each candidate model. See modelspec. |
a3 | The index for each component in each candidate model. See modelspec. |
nfolds | The number of folds used in cross-validation. |
XX.train | The processed predictor matrix for the training data. |
Y.train | The response vector for the training data. |
XX.pred | The processed predictor matrix for the test data. |
Y.pred | The response vector for the test data. |
A list of:
aic | Mean squared error risk in training data set, produced by AIC model selection method. |
bic | Mean squared error risk in training data set, produced by BIC model selection method. |
saic | Mean squared error risk in training data set, produced by SAIC model averaging method. |
sbic | Mean squared error risk in training data set, produced by SBIC model averaging method. |
cv | Mean squared error risk in training data set, produced by CVMA method. |
ws | A matrix of estimated weights for the candidate models. |
predaic | Mean squared prediction error risk in test data set, produced by AIC model selection method. |
predbic | Mean squared prediction error risk in test data set, produced by BIC model selection method. |
predsaic | Mean squared prediction error risk in test data set, produced by SAIC model averaging method. |
predsbic | Mean squared prediction error risk in test data set, produced by SBIC model averaging method. |
predcv | Mean squared prediction error risk in test data set, produced by CVMA method. |