Title: | Extensible Package for Principal-Component-Regression-Based Heterogeneous Ensemble Meta-Learning |
---|---|
Description: | Extends the base classes and methods of 'EnsembleBase' package for Principal-Components-Regression-based (PCR) integration of base learners. Default implementation uses cross-validation error to choose the optimal number of PC components for the final predictor. The package takes advantage of the file method provided in 'EnsembleBase' package for writing estimation objects to disk in order to circumvent RAM bottleneck. Special save and load methods are provided to allow estimation objects to be saved to permanent files on disk, and to be loaded again into temporary files in a later R session. Users and developers can extend the package by extending the generic methods and classes provided in 'EnsembleBase' package as well as this package. |
Authors: | Mansour T.A. Sharabiani, Alireza S. Mahani |
Maintainer: | Alireza S. Mahani <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.4 |
Built: | 2024-12-17 06:53:16 UTC |
Source: | CRAN |
This function applies PCR to predictions from regression base learners to produce an ensemble prediction. The number of PC's used in the PCR algorithm is determined by minimizing the cross-validation error. The data partition for the integration phase does not have to be the same as the partition(s) used to generate the base learners. Functions from EnsembleBase are used for training and prediction of base learners. Base classes and generic methods of the same package are also extended to support PCR integration.
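The integration step described above can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the package's implementation: the matrix `P` of base learner predictions is simulated, and the helpers `pcr_fit_predict` and `cv_select_npc` are hypothetical names introduced only for this example.

```r
# Minimal sketch of PCR integration with a CV-selected number of PC's.
# All names are hypothetical; this is not the EnsemblePCReg code.

# Fit PCR on (P_train, y_train) using the first k PC's; predict for P_test.
pcr_fit_predict <- function(P_train, y_train, P_test, k) {
  pca <- prcomp(P_train, center = TRUE, scale. = FALSE)
  rot <- pca$rotation[, seq_len(k), drop = FALSE]
  Z_train <- pca$x[, seq_len(k), drop = FALSE]
  Z_test <- sweep(P_test, 2, pca$center) %*% rot   # project test rows
  fit <- lm.fit(cbind(1, Z_train), y_train)
  drop(cbind(1, Z_test) %*% fit$coefficients)
}

# Out-of-fold predictions for every candidate k, then pick the k with
# the smallest cross-validated RMSE.
cv_select_npc <- function(P, y, nfold = 5, maxpc = ncol(P)) {
  fold <- sample(rep(seq_len(nfold), length.out = nrow(P)))
  pred <- matrix(NA_real_, nrow(P), maxpc)
  for (f in seq_len(nfold)) {
    test <- fold == f
    for (k in seq_len(maxpc)) {
      pred[test, k] <- pcr_fit_predict(P[!test, , drop = FALSE], y[!test],
                                       P[test, , drop = FALSE], k)
    }
  }
  cv_error <- apply(pred, 2, function(p) sqrt(mean((y - p)^2)))
  list(cv_error = cv_error, npc = which.min(cv_error))
}

set.seed(1)
n <- 200
signal <- rnorm(n)
y <- signal + rnorm(n, sd = 0.3)
# Simulated predictions from four correlated "base learners".
P <- sapply(c(0.2, 0.4, 0.6, 0.8), function(s) signal + rnorm(n, sd = s))

sel <- cv_select_npc(P, y)
final_pred <- pcr_fit_predict(P, y, P, sel$npc)   # within-sample prediction
```

Because base learner predictions are usually highly correlated, regressing on a few leading PC's rather than on the raw predictions is one way to keep the integrator stable.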
epcreg(formula, data
  , baselearner.control=epcreg.baselearner.control()
  , integrator.control=epcreg.integrator.control()
  , ncores=1, filemethod=FALSE, print.level=1
  , preschedule = TRUE
  , schedule.method = c("random", "as.is", "task.length")
  , task.length
)
formula |
Formula expressing response variable and covariates. |
data |
Data frame containing the response variable and covariates. |
baselearner.control |
Control structure determining the base learners, their configurations, and data partitioning details. See |
integrator.control |
Control structure governing integrator behavior. See |
ncores |
Number of cores used for parallel training of base learners. |
filemethod |
Boolean flag indicating whether to save estimation objects to disk. Using |
print.level |
Controlling verbosity level. |
preschedule |
Boolean flag, indicating whether base learner training jobs must be scheduled statically ( |
schedule.method |
Method used for scheduling tasks on threads. In "as.is" tasks are assigned to threads in a round-robin fashion for static scheduling. In dynamic scheduling, tasks form a queue without any re-ordering. In "random", tasks are first randomly shuffled, and the rest is similar to "as.is". In "task.length", a heuristic algorithm is used in static scheduling for assigning tasks to threads to minimize load imbalance, i.e. make total task lengths in threads roughly equal. In dynamic scheduling, tasks are sorted in descending order of expected length to form the task queue. |
task.length |
Vector of estimated task lengths, to be used in the "task.length" method of scheduling. |
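The static scheduling options described for schedule.method can be sketched in base R. This is an illustration only: the helper names are hypothetical, and the "task.length" heuristic shown here (longest task first, placed on the least-loaded thread) is an assumption, since the exact algorithm used by the package is not stated above.

```r
# Hypothetical helpers illustrating static scheduling; not the package's code.

# "as.is": round-robin assignment of tasks to threads, in the given order.
assign_round_robin <- function(ntask, nthread) {
  (seq_len(ntask) - 1) %% nthread + 1
}

# "task.length"-style heuristic (assumed): take tasks from longest to
# shortest and place each on the currently least-loaded thread.
assign_by_length <- function(task.length, nthread) {
  load <- numeric(nthread)
  thread <- integer(length(task.length))
  for (i in order(task.length, decreasing = TRUE)) {
    j <- which.min(load)
    thread[i] <- j
    load[j] <- load[j] + task.length[i]
  }
  thread
}

task.length <- c(9, 1, 1, 1, 8, 1, 1, 2)
rr  <- assign_round_robin(length(task.length), 2)
lpt <- assign_by_length(task.length, 2)

# Total work per thread under each scheme.
load_rr  <- tapply(task.length, rr, sum)
load_lpt <- tapply(task.length, lpt, sum)
```

Under the same assumptions, the "random" option would correspond to shuffling the task order (for example with `sample()`) before the round-robin assignment.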
An object of class epcreg (if filemethod==TRUE, also of class epcreg.file), a list with the following elements:
call |
Copy of function call. |
formula |
Copy of formula argument in function call. |
instance.list |
An object of class |
integrator.config |
Copy of configuration object passed to the integrator. Object of class |
method |
Integration method. Currently, only "default" is supported. |
est |
A list with these elements: 1) |
y |
Copy of response variable vector. |
pred |
Within-sample prediction of the ensemble model. |
filemethod |
Copy of passed-in |
Mansour T.A. Sharabiani, Alireza S. Mahani
epcreg.baselearner.control, epcreg.integrator.control, Instance.List, Regression.Integrator.PCR.SelMin.Config, Regression.CV.Batch.FitObj, Regression.Batch.FitObj, Regression.Integrator.PCR.SelMin.FitObj
data(servo)
myformula <- class~motor+screw+pgain+vgain
perc.train <- 0.7
index.train <- sample(1:nrow(servo), size = round(perc.train*nrow(servo)))
data.train <- servo[index.train,]
data.predict <- servo[-index.train,]
## to run longer test using all 5 default regression base learners
## try: est <- epcreg(myformula, data.train, ncores=2)
est <- epcreg(myformula, data.train, ncores=2
  , baselearner.control=epcreg.baselearner.control(
    baselearners="knn"
    , baselearner.configs = make.configs("knn"
      , config.df = expand.grid(kernel = "rectangular", k = c(5, 10)))))
newpred <- predict(est, data.predict)
Function epcreg.baselearner.control sets up the base learners used in the epcreg call. Function epcreg.integrator.control sets up the PCR integrator.
epcreg.baselearner.control(
  baselearners = c("nnet","rf","svm","gbm","knn")
  , baselearner.configs = make.configs(baselearners, type = "regression")
  , npart = 1, nfold = 5
)
epcreg.integrator.control(errfun=rmse.error, nfold=5, method=c("default"))
baselearners |
Names of base learners used. Currently, regression options available are Neural Network ("nnet"), Random Forest ("rf"), Support Vector Machine ("svm"), Gradient Boosting Machine ("gbm"), and K-Nearest Neighbors ("knn"). |
baselearner.configs |
List of base learner configurations. Default is to call |
npart |
Number of partitions to train each base learner configuration in a CV scheme. |
nfold |
Number of folds within each data partition. |
errfun |
Error function used to compare performance of base learner configurations. Default is to use |
method |
Integrator method. Currently, only option is "default", where PCR is performed on all base learner instances, and CV error is used to find the optimal number of PC's. Same CV-based PCR output is used to make final prediction. |
Both functions return lists with same element names as function arguments.
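To make the npart, nfold and errfun arguments more concrete, the sketch below builds fold assignments and an RMSE error function in base R. Both helpers are hypothetical stand-ins written for illustration; whether the package's own partition generator and rmse.error use the same layout and argument order is an assumption.

```r
# Hypothetical stand-ins, written in base R for illustration only.

# One column of fold labels (1..nfold) per partition: npart columns in total.
make_partitions <- function(n, nfold = 5, npart = 1) {
  vapply(seq_len(npart),
         function(i) sample(rep(seq_len(nfold), length.out = n)),
         integer(n))
}

# Root-mean-square error between observed and predicted values.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

set.seed(42)
parts <- make_partitions(n = 20, nfold = 5, npart = 2)
fold_sizes <- table(parts[, 1])   # 20 rows over 5 folds: 4 rows per fold
err <- rmse(c(1, 2, 3), c(1, 2, 5))
```

Each column of `parts` is an independent random split of the same rows, so with npart greater than 1 every base learner configuration would be trained on several different fold layouts.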
Mansour T.A. Sharabiani, Alireza S. Mahani
These functions can be used whether the filemethod flag is set to TRUE or FALSE during the epcreg call. Note that epcreg.load ‘returns’ the estimation object (in contrast to the standard load method).
epcreg.save(obj, file)
epcreg.load(file)
obj |
Object of classes |
file |
Filepath to where |
Function epcreg.load returns the saved obj, with estimation files automatically copied to the R temporary directory, and filepaths inside the obj fields updated to point to these new filepaths (if filemethod was set to TRUE in the call to epcreg).
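The idea of restoring on-disk estimation files into fresh temporary files and updating the stored paths can be sketched in base R. This is an assumption-laden illustration, not the epcreg.save / epcreg.load code: the helpers `save_with_files` and `load_with_files`, the `paths` field, and the choice to bundle file contents into a single file are all hypothetical.

```r
# Hypothetical sketch of "relocate on load"; not the package's implementation.
# The object keeps paths to estimation files. Saving bundles the file contents
# with the object; loading writes them to new temporary files.

save_with_files <- function(obj, file) {
  payload <- lapply(obj$paths, function(p) readBin(p, "raw", n = file.size(p)))
  saveRDS(list(obj = obj, payload = payload), file)
  invisible(file)
}

load_with_files <- function(file) {
  bundle <- readRDS(file)
  obj <- bundle$obj
  # Re-create each estimation file in tempdir() and point the object at it.
  obj$paths <- vapply(bundle$payload, function(bytes) {
    p <- tempfile(fileext = ".rds")
    writeBin(bytes, p)
    p
  }, character(1))
  obj   # returned to the caller, unlike base load()
}

# Usage: one estimation object stored on disk and referenced by path.
est_path <- tempfile(fileext = ".rds")
saveRDS(list(coef = c(1, 2)), est_path)
obj <- list(paths = est_path)

bundle_file <- tempfile()
save_with_files(obj, bundle_file)
file.remove(est_path)                     # the original temporary file is gone
restored <- load_with_files(bundle_file)
restored_coef <- readRDS(restored$paths[1])$coef
```

Returning the object from the load function, rather than assigning it into the calling environment, lets the caller choose the variable name, which matches the behavior noted above for epcreg.load.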
Mansour T.A. Sharabiani, Alireza S. Mahani
data(servo)
myformula <- class~motor+screw+pgain+vgain
perc.train <- 0.7
index.train <- sample(1:nrow(servo), size = round(perc.train*nrow(servo)))
data.train <- servo[index.train,]
data.predict <- servo[-index.train,]
est <- epcreg(myformula, data.train, ncores=2
  , baselearner.control=epcreg.baselearner.control(
    baselearners="knn"
    , baselearner.configs = make.configs("knn"
      , config.df = expand.grid(kernel = "rectangular", k = c(5, 10))))
  , filemethod = TRUE)
epcreg.save(est, "somefile")
rm(est) # alternatively, exit and re-launch R session
est.loaded <- epcreg.load("somefile")
newpred <- predict(est.loaded, data.predict)
file.remove("somefile")
epcreg
model
Function for generating a diagnostic plot for a trained epcreg model.
## S3 method for class 'epcreg'
plot(x, ...)
x |
Object of class |
... |
Arguments passed to/from other methods. |
Function plot.epcreg creates two sub-plots in a figure: 1) a plot of base learner CV errors, with one data point per base learner configuration. The horizontal dotted line indicates the CV error corresponding to the chosen base learner configuration. For the "default" method, this is the same as the minimum error of points on this plot; 2) a plot of CV error as a function of the number of PC's used in PCR-based integration. The minimum point of this plot is chosen as the optimal number of PC's and subsequently used for prediction.
Mansour T.A. Sharabiani, Alireza S. Mahani
"epcreg"
Obtain model predictions from training or new data for an epcreg model.
## S3 method for class 'epcreg'
predict(object, newdata=NULL, ncores=1
  , preschedule = TRUE, ...)
object |
Object of class |
newdata |
New data frame to make predictions for. If |
ncores |
Number of cores to use for parallel prediction. |
preschedule |
Boolean flag, indicating whether base learner training jobs must be scheduled statically ( |
... |
Arguments passed to/from other methods. |
A vector of length nrow(newdata) (or of the length of the training data if newdata==NULL).
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.PCR.SelMin.Config"
Configuration class for PCR-based integration, where the number of PC's is selected to minimize the cross-validation error of the integrator.
Objects can be created by calls of the form new("Regression.Integrator.PCR.SelMin.Config", ...).
partition: Object of class "integer", data partition to use for cross-validation selection of optimal PC's in PCR integration. This can be the output of generate.partition.
errfun: Object of class "function", error function to use for selecting best number of PC's.
Class "Regression.Integrator.Config", directly.
signature(object = "Regression.Integrator.PCR.SelMin.Config"): ...
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.PCR.SelMin.FitObj"
Class containing the output of fitting a PCR-based integrator with CV-error minimization method for selecting the number of PC's.
Objects can be created by calls of the form new("Regression.Integrator.PCR.SelMin.FitObj", ...).
config: Object of class "Regression.Integrator.Config", containing the error function and the partition to use for training the PCR integrator.
est: Object of class "ANY", estimation object that is used for prediction.
pred: Object of class "numeric", prediction for training set.
Class "Regression.Integrator.FitObj", directly.
No methods defined with class "Regression.Integrator.PCR.SelMin.FitObj" in the signature.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Integrator.FitObj"
Perform the same sweep operation on data partitions and assemble the pieces into a complete set.
Regression.Sweep.CV.Fit(config, X, y, partition, print.level = 1)
config |
Object of class |
X |
Matrix of predictors to perform PCR on. |
y |
Vector of response to use during PCR. |
partition |
Data partition used for CV sweep, typically the output of |
print.level |
Determining verbosity level during function execution. |
An object of class Regression.Sweep.CV.FitObj.
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.CV.FitObj"
Class containing the output of the Regression.Sweep.CV.Fit function.
Objects can be created by calls of the form new("Regression.Sweep.CV.FitObj", ...).
sweep.list: Object of class "list", list of length equal to the number of folds in partition. Each element of the list contains the output of Regression.Sweep.Fit and has class Regression.Sweep.FitObj.
pred: Object of class "matrix", containing the matrix of predictions from this operation.
partition: Object of class "OptionalInteger", data partition used to perform CV sweep.
Mansour T.A. Sharabiani, Alireza S. Mahani
Methods for function Regression.Sweep.Fit in package EnsemblePCReg
signature(object = "Regression.Sweep.PCR.Config")
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.PCR.Config"
Configuration class for PCR sweep operation
Objects can be created by calls of the form new("Regression.Sweep.PCR.Config", ...).
n: Object of class "OptionalNumeric", indicating, in this derived class, the maximum number of PC's to perform the PCR sweep for.
Class "Regression.Sweep.Config", directly.
signature(object = "Regression.Sweep.PCR.Config"): ...
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.PCR.FitObj"
Class containing the output of performing, or fitting, a PCR sweep operation.
Objects can be created by calls of the form new("Regression.Sweep.PCR.FitObj", ...).
config: Object of class "Regression.Sweep.Config".
est: Object of class "ANY", the estimation object needed for prediction.
pred: Object of class "matrix", matrix of predictions for training data. Column n corresponds to the prediction using PC's from 1 to n.
Class "Regression.Sweep.FitObj", directly.
No methods defined with class "Regression.Sweep.PCR.FitObj" in the signature.
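The layout of the pred slot described above, where column n holds the prediction using PC's 1 to n, can be reproduced with a short base R sketch. The function `pcr_sweep` is a hypothetical name and this is not the package's code; it relies only on the fact that PC scores are centered and mutually orthogonal.

```r
# Hypothetical sketch of a PCR sweep in base R; not the package's code.
pcr_sweep <- function(X, y, maxpc = ncol(X)) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  Z <- pca$x[, seq_len(maxpc), drop = FALSE]
  # Scores are centered and orthogonal, so each PC's regression
  # coefficient can be computed independently of the others.
  beta <- drop(crossprod(Z, y - mean(y))) / colSums(Z^2)
  contrib <- sweep(Z, 2, beta, `*`)
  # Upper-triangular matrix of ones: column n sums contributions of PC's 1..n.
  U <- outer(seq_len(maxpc), seq_len(maxpc), "<=") * 1
  mean(y) + contrib %*% U
}

set.seed(7)
X <- matrix(rnorm(300), ncol = 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(100, sd = 0.1)

pred <- pcr_sweep(X, y)   # 100 rows, one column per number of PC's
train_rmse <- apply(pred, 2, function(p) sqrt(mean((y - p)^2)))
```

With all PC's included, the last column coincides with the ordinary least-squares fit on `X`, and the training error can only decrease as more PC's are added, which is why the number of PC's is chosen on cross-validated rather than within-sample error.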
Mansour T.A. Sharabiani, Alireza S. Mahani
"Regression.Sweep.FitObj"
epcreg
model
Summary function for an epcreg model.
## S3 method for class 'epcreg'
summary(object, ...)
object |
Object of class |
... |
Arguments passed to/from other functions. |
A list with the following elements:
n.instance |
Number of base learner instances used in training the model. |
maxpc |
Maximum number of PC's considered in PCR-based integration of base learners. |
index.min |
Optimal number of PC's, i.e. what minimizes the CV error. |
error.min |
Minimum CV error in PCR-based integration, corresponding to |
tvec |
Vector of task lengths for each base learner instance. This can be passed to |
Mansour T.A. Sharabiani, Alireza S. Mahani