Title: | Extensible Classes and Methods for Penalized-Regression-Based Integration of Base Learners |
---|---|
Description: | Extending the base classes and methods of EnsembleBase package for Penalized-Regression-based (Ridge and Lasso) integration of base learners. Default implementation uses cross-validation error to choose the optimal lambda (shrinkage parameter) for the final predictor. The package takes advantage of the file method provided in EnsembleBase package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in EnsembleBase package as well as this package. |
Authors: | Mansour T.A. Sharabiani, Alireza S. Mahani |
Maintainer: | Alireza S. Mahani <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7 |
Built: | 2024-12-28 06:26:49 UTC |
Source: | CRAN |
This function applies Penalized Regression (Lasso and Ridge) to predictions from regression base learners to produce an ensemble prediction. The shrinkage parameter (lambda) is determined by minimizing the cross-validation error. The data partition for the integration phase does not have to be the same as the partition(s) used to generate the base learners. Functions from EnsembleBase are used for training and prediction of base learners. In addition, base classes and generic methods of the same package are extended to support PenReg integration.
epenreg(formula, data
  , baselearner.control=epenreg.baselearner.control()
  , integrator.control=epenreg.integrator.control()
  , ncores=1, filemethod=FALSE, print.level=1
  , preschedule = TRUE
  , schedule.method = c("random", "as.is", "task.length")
  , task.length )
formula |
Formula expressing response variable and covariates. |
data |
Data frame containing the response variable and covariates. |
baselearner.control |
Control structure determining the base learners, their configurations, and data partitioning details. See |
integrator.control |
Control structure governing integrator behavior. See |
ncores |
Number of cores used for parallel training of base learners. |
filemethod |
Boolean flag indicating whether to save estimation objects to disk. Using |
print.level |
Controlling verbosity level. |
preschedule |
Boolean flag, indicating whether base learner training jobs must be scheduled statically ( |
schedule.method |
Method used for scheduling tasks on threads. In "as.is" tasks are assigned to threads in a round-robin fashion for static scheduling. In dynamic scheduling, tasks form a queue without any re-ordering. In "random", tasks are first randomly shuffled, and the rest is similar to "as.is". In "task.length", a heuristic algorithm is used in static scheduling for assigning tasks to threads to minimize load imbalance, i.e. make total task lengths in threads roughly equal. In dynamic scheduling, tasks are sorted in descending order of expected length to form the task queue. |
task.length |
Vector of estimated task lengths, to be used in the "task.length" method of scheduling. |
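The difference between round-robin assignment ("as.is") and length-aware assignment ("task.length") in static scheduling can be illustrated with a small sketch. The heuristic actually used by the package is not specified here; the code below assumes a common greedy rule (longest task first, always onto the least-loaded thread), and both helper functions are hypothetical names introduced only for illustration.

```r
# Hypothetical helper (not part of the package): greedy assignment of tasks
# to threads, longest task first, always onto the least-loaded thread.
assign_tasks_lpt <- function(task.length, nthreads) {
  ord <- order(task.length, decreasing = TRUE)  # visit longest tasks first
  load <- numeric(nthreads)                     # running total per thread
  thread <- integer(length(task.length))        # thread index for each task
  for (i in ord) {
    k <- which.min(load)                        # least-loaded thread so far
    thread[i] <- k
    load[k] <- load[k] + task.length[i]
  }
  list(thread = thread, load = load)
}

# Hypothetical helper: round-robin assignment that ignores task lengths.
assign_tasks_round_robin <- function(ntasks, nthreads) {
  rep_len(seq_len(nthreads), ntasks)
}

lens <- c(8, 7, 6, 5, 4)                        # estimated task lengths
lpt <- assign_tasks_lpt(lens, nthreads = 2)
rr <- assign_tasks_round_robin(length(lens), nthreads = 2)
rr_load <- as.numeric(tapply(lens, rr, sum))    # per-thread load, round-robin

max(lpt$load)  # 17: busiest thread under the greedy rule
max(rr_load)   # 18: busiest thread under round-robin
```

With these lengths the greedy rule leaves the busiest thread with 17 units of work versus 18 for round-robin. The gap typically widens as task lengths become more uneven.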
An object of class epenreg (if filemethod==TRUE, also of class epenreg.file), a list with the following elements:
call |
Copy of function call. |
formula |
Copy of formula argument in function call. |
instance.list |
An object of class |
integrator.config |
Copy of configuration object passed to the integrator. Object of class |
method |
Integration method. Currently, only "default" is supported. |
est |
A list with these elements: 1) |
y |
Copy of response variable vector. |
pred |
Within-sample prediction of the ensemble model. |
filemethod |
Copy of passed-in |
Mansour T.A. Sharabiani, Alireza S. Mahani
epenreg.baselearner.control
, epenreg.integrator.control
, Instance.List
, Regression.Integrator.PenReg.SelMin.Config
, Regression.CV.Batch.FitObj
, Regression.Batch.FitObj
, Regression.Integrator.PenReg.SelMin.FitObj
data(servo)
myformula <- class~motor+screw+pgain+vgain
perc.train <- 0.7
index.train <- sample(1:nrow(servo), size = round(perc.train*nrow(servo)))
data.train <- servo[index.train,]
data.predict <- servo[-index.train,]
## to run longer test using all 5 default regression base learners
## try: est <- epenreg(myformula, data.train, ncores=2)
est <- epenreg(myformula, data.train, ncores=2
  , baselearner.control=epenreg.baselearner.control(baselearners="knn"))
newpred <- predict(est, data.predict)
Function epenreg.baselearner.control sets up the base learners used in the epenreg call. Function epenreg.integrator.control sets up the PenReg integrator.
epenreg.baselearner.control(
  baselearners = c("nnet","rf","svm","gbm","knn")
  , baselearner.configs = make.configs(baselearners, type = "regression")
  , npart = 1, nfold = 5 )
epenreg.integrator.control(errfun=rmse.error, alpha=1.0
  , n=100, nfold=5, method=c("default") )
baselearners |
Names of base learners used. Currently, the available regression options are Neural Network ("nnet"), Random Forest ("rf"), Support Vector Machine ("svm"), Gradient Boosting Machine ("gbm"), K-Nearest Neighbors ("knn"), Penalized Regression ("penreg") and Bayesian Additive Regression Trees ("bart"). The last two learners are not included in the default list: "penreg" tends to produce highly correlated, and generally imprecise, predictions and skews the integration stage towards itself. "bart", on the other hand, is quite time- and memory-consuming to train, despite generally having superior predictive performance. Users with more CPU and memory resources can add "bart" to achieve higher predictive accuracy. |
baselearner.configs |
List of base learner configurations. Default is to call |
npart |
Number of partitions to train each base learner configuration in a CV scheme. |
nfold |
Number of folds within each data partition. |
errfun |
Error function used to compare performance of base learner configurations. Default is to use |
alpha |
Relative strength of L1 (alpha=1.0) vs. L2 (alpha=0.0) penalty in penalized regression. |
n |
Suggested number of |
method |
Integrator method. Currently, the only option is "default", where PenReg is performed on all base learner instances, and CV error is used to find the optimal shrinkage parameter. The same CV-based PenReg output is used to make the final prediction. |
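The CV-based selection of the shrinkage parameter can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: it uses a closed-form ridge fit (the alpha=0 case) in place of whatever penalized-regression solver the package calls, simulated data in place of real base learner predictions, and hypothetical helper names (`ridge_fit`, `ridge_predict`, `rmse`).

```r
set.seed(1)

# Simulated stand-in for base learner predictions: three noisy versions of
# the response, one column per (hypothetical) base learner.
n <- 120
y <- rnorm(n)
X <- cbind(y + rnorm(n, sd = 0.5), y + rnorm(n, sd = 0.7), y + rnorm(n, sd = 1.0))

# Error function playing the role of errfun.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Closed-form ridge fit with an unpenalized intercept (simplified scaling).
ridge_fit <- function(X, y, lambda) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)                         # center the predictors
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - ym))
  list(beta = drop(beta), intercept = ym - sum(xm * beta))
}
ridge_predict <- function(fit, X) drop(fit$intercept + X %*% fit$beta)

lambdas <- 10^seq(2, -3, length.out = 20)       # one grid shared by all folds
nfold <- 5
fold <- sample(rep_len(seq_len(nfold), n))      # fold label per observation

# Out-of-fold predictions: one row per observation, one column per lambda.
cvpred <- matrix(NA_real_, n, length(lambdas))
for (k in seq_len(nfold)) {
  test <- fold == k
  for (j in seq_along(lambdas)) {
    fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambdas[j])
    cvpred[test, j] <- ridge_predict(fit, X[test, , drop = FALSE])
  }
}

cverr <- apply(cvpred, 2, rmse, obs = y)        # CV error for each lambda
lambda.opt <- lambdas[which.min(cverr)]         # lambda with minimum CV error
pred.cv <- cvpred[, which.min(cverr)]           # CV predictions at that lambda
```

Using a single lambda grid for every fold keeps the columns of the prediction matrix comparable across folds, so that a CV error can be computed per lambda.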
Both functions return lists with the same element names as the function arguments.
Mansour T.A. Sharabiani, Alireza S. Mahani
These functions can be used whether the filemethod flag is set to TRUE or FALSE during the epenreg call. Note that epenreg.load returns the estimation object (in contrast to the standard load method).
epenreg.save(obj, file)
epenreg.load(file)
obj |
Object of classes |
file |
Filepath to where |
Function epenreg.load returns the saved obj, with estimation files automatically copied to the R temporary directory, and filepaths inside the obj fields updated to point to these new filepaths.
Mansour T.A. Sharabiani, Alireza S. Mahani
## Not run:
data(servo)
myformula <- class~motor+screw+pgain+vgain
perc.train <- 0.7
index.train <- sample(1:nrow(servo), size = round(perc.train*nrow(servo)))
data.train <- servo[index.train,]
data.predict <- servo[-index.train,]
est <- epenreg(myformula, data.train, ncores=2, filemethod=TRUE
  , baselearner.control=epenreg.baselearner.control(baselearners="knn"))
epenreg.save(est, "somefile")
rm(est) # alternatively, exit and re-launch R session
est.loaded <- epenreg.load("somefile")
newpred <- predict(est.loaded, data.predict)

# can also be used with filemethod set to FALSE
est <- epenreg(myformula, data.train, ncores=2, filemethod=FALSE
  , baselearner.control=epenreg.baselearner.control(baselearners="knn"))
epenreg.save(est, "somefile")
rm(est) # alternatively, exit and re-launch R session
est.loaded <- epenreg.load("somefile")
newpred <- predict(est.loaded, data.predict)
## End(Not run)
epenreg
model
Function for generating diagnostic plots for a trained epenreg model.
## S3 method for class 'epenreg'
plot(x, ...)
x |
Object of class |
... |
Arguments passed to/from other methods. |
Function plot.epenreg creates two sub-plots in a figure:
1) a plot of base learner CV errors, with one data point per base learner configuration. The horizontal dotted line indicates the CV error corresponding to the chosen base learner configuration. For the "default" method, this is the same as the minimum error of points on this plot;
2) a plot of CV error as a function of the value of the shrinkage parameter (x-axis in log scale). The minimum point of this plot is chosen as the optimal lambda and subsequently used for prediction.
Mansour T.A. Sharabiani, Alireza S. Mahani
"epenreg"
Obtain model predictions from training or new data for an epenreg model.
## S3 method for class 'epenreg'
predict(object, newdata=NULL, ncores=1, ...)
object |
Object of class |
newdata |
New data frame to make predictions for. If |
ncores |
Number of cores to use for parallel prediction. |
... |
Arguments passed to/from other methods. |
A vector of length nrow(newdata) (or of the length of the training data if newdata==NULL).
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.PenReg.SelMin.Config"
Configuration class for PenReg-based integration, where the optimal shrinkage parameter is selected to minimize the cross-validation error of the integrator.
Objects can be created by calls of the form new("Regression.Integrator.PenReg.SelMin.Config", ...)
.
partition: Object of class "integer", data partition to use for cross-validation selection of the optimal shrinkage parameter in PenReg integration. This can be the output of generate.partition.
n: Object of class "OptionalNumeric", indicating, in this derived class, the maximum number of lambda values to produce predictions for.
alpha: Object of class "numeric", indicating the relative strength of L1 (alpha=1.0) vs. L2 (alpha=0.0) penalty in penalized regression.
errfun: Object of class "function", error function to use for selecting the optimal shrinkage parameter.
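For orientation, the roles of alpha and lambda can be written out as an elastic-net objective. The form below assumes the parameterization used by the glmnet package. The text here states only that alpha=1.0 corresponds to L1 and alpha=0.0 to L2, so the exact scaling of each term is an assumption. Here $x_i$ denotes the vector of base learner predictions for observation $i$, and $N$ the number of observations.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
+ \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
+ \alpha\,\lVert\beta\rVert_1\Bigr]
```

Setting $\alpha = 1$ leaves only the $\lVert\beta\rVert_1$ term (Lasso), and $\alpha = 0$ leaves only the $\lVert\beta\rVert_2^2$ term (Ridge); $\lambda$ scales the overall penalty.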
Class "Regression.Integrator.Config"
, directly.
signature(object = "Regression.Integrator.PenReg.SelMin.Config")
: ...
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.PenReg.SelMin.FitObj"
Class containing the output of fitting a PenReg-based integrator with CV-error minimization method for selecting the optimal shrinkage parameter.
Objects can be created by calls of the form new("Regression.Integrator.PenReg.SelMin.FitObj", ...)
.
config: Object of class "Regression.Integrator.Config", containing the error function and the partition to use for training the PenReg integrator.
est: Object of class "ANY", estimation object that is used for prediction.
pred: Object of class "numeric", prediction for training set.
Class "Regression.Integrator.FitObj"
, directly.
No methods defined with class "Regression.Integrator.PenReg.SelMin.FitObj" in the signature.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.FitObj"
Perform the same sweep operation on data partitions and assemble the pieces into a complete set.
Regression.Sweep.CV.Fit(config, X, y, partition, print.level = 1)
config |
Object of class |
X |
Matrix of predictors to perform penalized regression on. |
y |
Vector of response to use during penalized regression. |
partition |
Data partition used for CV sweep, typically the output of |
print.level |
Determining verbosity level during function execution. |
An object of class Regression.Sweep.CV.FitObj
.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.CV.FitObj"
Class containing output of Regression.Sweep.CV.Fit
function.
Objects can be created by calls of the form new("Regression.Sweep.CV.FitObj", ...)
.
sweep.list: Object of class "list", list of length equal to the number of folds in partition. Each element of the list contains the output of Regression.Sweep.Fit and has class Regression.Sweep.FitObj.
pred: Object of class "matrix", containing the matrix of predictions from this operation.
partition: Object of class "OptionalInteger", data partition used to perform CV sweep.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.PenReg.Config"
Configuration class for PenReg sweep operation
Objects can be created by calls of the form new("Regression.Sweep.PenReg.Config", ...)
.
n: Object of class "OptionalNumeric", indicating, in this derived class, the maximum number of lambda values to produce predictions for.
alpha: Object of class "numeric", indicating the relative strength of L1 (alpha=1.0) vs. L2 (alpha=0.0) penalty in penalized regression.
lambda: Object of class "numeric", containing the values of the shrinkage parameter to generate predictions for. During CV sweep, this parameter is determined in the first fold and passed on to the remaining folds.
Class "Regression.Sweep.Config"
, directly.
signature(object = "Regression.Sweep.PenReg.Config")
: ...
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.PenReg.FitObj"
Class containing the output of performing (fitting) a PenReg sweep operation.
Objects can be created by calls of the form new("Regression.Sweep.PenReg.FitObj", ...)
.
config: Object of class "Regression.Sweep.Config".
est: Object of class "ANY", the estimation object needed for prediction.
pred: Object of class "matrix", matrix of predictions for training data. Column n corresponds to the prediction using the n-th value of lambda.
Class "Regression.Sweep.FitObj"
, directly.
No methods defined with class "Regression.Sweep.PenReg.FitObj" in the signature.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.FitObj"