Title: | Distance Weighted Discrimination (DWD) and Kernel Methods |
---|---|
Description: | A novel implementation that solves the linear distance weighted discrimination and the kernel distance weighted discrimination. Reference: Wang and Zou (2018) <doi:10.1111/rssb.12244>. |
Authors: | Boxiang Wang <[email protected]>, Hui Zou <[email protected]> |
Maintainer: | Boxiang Wang <[email protected]> |
License: | GPL-2 |
Version: | 2.0.3 |
Built: | 2024-11-01 11:16:22 UTC |
Source: | CRAN |
Efficient procedures for solving the linear generalized DWD and the kernel generalized DWD in reproducing kernel Hilbert spaces for classification. The algorithm is based on the majorization-minimization (MM) principle and computes the entire solution path on a given fine grid of regularization parameters.
Suppose x is the predictor matrix and y is a binary response. The package computes the entire solution path over a grid of lambda values.
The main functions of the package kerndwd include:
kerndwd
cv.kerndwd
tunedwd
predict.kerndwd
plot.kerndwd
plot.cv.kerndwd
Boxiang Wang and Hui Zou
Maintainer: Boxiang Wang [email protected]
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004)
“kernlab – An S4 Package for Kernel Methods in R,”
Journal of Statistical Software, 11(9), 1–20.
https://www.jstatsoft.org/v11/i09/paper
Marron, J.S., Todd, M.J., and Ahn, J. (2007)
“Distance-Weighted Discrimination,”
Journal of the American Statistical Association, 102(480), 1267–1271.
https://www.tandfonline.com/doi/abs/10.1198/016214507000001120
BUPA's liver disorders data: blood test results and liver disorder status of 345 male individuals.
data(BUPA)
This data set consists of 345 observations and 6 predictors representing the blood test results and liver disorder status of 345 patients. The six predictors are mean corpuscular volume (MCV), alkaline phosphatase (ALKPHOS), alanine aminotransferase (SGPT), aspartate aminotransferase (SGOT), gamma-glutamyl transpeptidase (GAMMAGT), and the number of alcoholic beverage drinks per day (DRINKS).
A list with the following elements:
X |
A numerical matrix for predictors: 345 rows and 6 columns; each row corresponds to a patient. |
y |
A numeric vector of length 345 representing the liver disorder status. |
The data set is available for download from the UCI Machine Learning Repository.
    library(kerndwd)
    # load data set
    data(BUPA)
    # the number of samples and predictors
    dim(BUPA$X)
    # the number of samples for each class
    sum(BUPA$y == -1)
    sum(BUPA$y == 1)
Carry out a cross-validation for kerndwd to find optimal values of the tuning parameter lambda.
cv.kerndwd(x, y, kern, lambda, nfolds=5, foldid, wt, ...)
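x |
A matrix of predictors, i.e., the matrix x used in kerndwd. |
y |
A vector of binary class labels, i.e., the y used in kerndwd. |
kern |
A kernel function. |
lambda |
A user specified lambda sequence for cross-validation. |
nfolds |
The number of folds. Default value is 5. The allowable range is from 3 to the sample size. |
foldid |
An optional vector with values between 1 and nfolds, giving the fold index of each observation. |
wt |
A vector of length n for weight factors, where n is the number of observations. |
... |
Other arguments being passed to kerndwd. |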
This function computes the mean cross-validation error and its standard error by fitting kerndwd with each fold excluded in turn. This function is modified based on the cv function from the glmnet package.
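The fold-wise procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: make_foldid and cv_summary are hypothetical helper names, the standard error is taken as the fold-wise standard deviation divided by the square root of the number of folds, and the error matrix holds made-up numbers.

```r
# Hypothetical helper: assign each of n observations to one of nfolds folds.
make_foldid <- function(n, nfolds = 5) {
  sample(rep(seq_len(nfolds), length.out = n))
}

# Hypothetical helper: summarize an (nfolds x nlambda) matrix of fold-wise
# misclassification rates into a mean curve and its standard error.
cv_summary <- function(err) {
  cvm  <- colMeans(err)
  cvsd <- apply(err, 2, sd) / sqrt(nrow(err))
  list(cvm = cvm, cvsd = cvsd, cvupper = cvm + cvsd, cvlower = cvm - cvsd)
}

set.seed(1)
foldid <- make_foldid(345, nfolds = 5)
table(foldid)  # balanced folds of 69 observations each

# Toy error matrix (made-up numbers): 5 folds (rows), 3 lambda values (columns)
err <- matrix(c(0.30, 0.28, 0.35,
                0.32, 0.27, 0.36,
                0.29, 0.30, 0.33,
                0.31, 0.26, 0.37,
                0.33, 0.29, 0.34), nrow = 5, byrow = TRUE)
res <- cv_summary(err)
res$cvm
```

A vector built this way could be supplied as foldid so that repeated runs use the same partition.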
A cv.kerndwd object including the cross-validation results is returned.
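lambda |
The lambda sequence used in the cross-validation. |
cvm |
A vector of length length(lambda): the mean cross-validation error. |
cvsd |
A vector of length length(lambda): the estimated standard error of cvm. |
cvupper |
The upper curve: cvm + cvsd. |
cvlower |
The lower curve: cvm - cvsd. |
lambda.min |
The lambda value that gives the minimum cross-validation error cvm. |
lambda.1se |
The largest value of lambda such that the error is within one standard error of the minimum. |
cvm.min |
The cross-validation error corresponding to lambda.min. |
cvm.1se |
The cross-validation error corresponding to lambda.1se. |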
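As a rough illustration of how lambda.min and lambda.1se relate to cvm and cvsd, the sketch below applies the usual one-standard-error rule in base R. The helper select_lambda is hypothetical and the numbers are toy values; the package's own selection code may differ in detail.

```r
# Hypothetical helper implementing the one-standard-error rule.
select_lambda <- function(lambda, cvm, cvsd) {
  i.min <- which.min(cvm)
  # candidates whose error is within one standard error of the minimum
  within <- cvm <= cvm[i.min] + cvsd[i.min]
  lambda.1se <- max(lambda[within])
  list(lambda.min = lambda[i.min],
       lambda.1se = lambda.1se,
       cvm.min = cvm[i.min],
       cvm.1se = cvm[which(lambda == lambda.1se)])
}

# Toy values (made up for illustration)
lambda <- c(10, 1, 0.1, 0.01)
cvm    <- c(0.40, 0.31, 0.30, 0.33)
cvsd   <- c(0.02, 0.02, 0.02, 0.02)
sel <- select_lambda(lambda, cvm, cvsd)
sel$lambda.min  # 0.1
sel$lambda.1se  # 1
```

Choosing lambda.1se favors the more heavily regularized model among those that are statistically indistinguishable from the best one.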
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Friedman, J., Hastie, T., and Tibshirani, R. (2010)
“Regularization Paths for Generalized Linear Models via Coordinate Descent,”
Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper
    library(kerndwd)
    set.seed(1)
    data(BUPA)
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    lambda = 10^(seq(3, -3, length.out=10))
    kern = rbfdot(sigma=sigest(BUPA$X))
    m.cv = cv.kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    m.cv$lambda.min
Fit the linear generalized distance weighted discrimination (DWD) model and the generalized DWD on a reproducing kernel Hilbert space. The solution path is computed at a grid of values of the tuning parameter lambda.
kerndwd(x, y, kern, lambda, qval=1, wt, eps=1e-05, maxit=1e+05)
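x |
A numerical matrix with N rows and p columns for predictors. |
y |
A vector of length N for binary responses, coded as -1 or 1. |
kern |
A kernel function; see the kernel functions adapted from kernlab. |
lambda |
A user supplied lambda sequence. |
qval |
The exponent index of the generalized DWD. Default value is 1. |
wt |
A vector of length N for weight factors. |
eps |
The algorithm stops when the change in the coefficients between successive iterations is less than eps. Default value is 1e-5. |
maxit |
The maximum number of iterations allowed. Default is 1e5. |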
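Suppose that the generalized DWD loss is $V_q(u) = 1 - u$ if $u \le q/(q+1)$ and $V_q(u) = \frac{1}{u^q}\frac{q^q}{(q+1)^{q+1}}$ if $u > q/(q+1)$. The value of $\lambda$, i.e., lambda, is user-specified.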
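For readers who want to evaluate the loss numerically, here is a minimal base R sketch. It assumes the generalized DWD loss of Wang and Zou (2018), namely 1 - u for u <= q/(q+1) and (1/u^q) q^q/(q+1)^(q+1) otherwise; dwd_loss is a hypothetical helper name and is not exported by the package.

```r
# Generalized DWD loss with index q (assumed form, see the text above):
#   V_q(u) = 1 - u                              if u <= q / (q + 1)
#   V_q(u) = (1 / u^q) * q^q / (q + 1)^(q + 1)  if u >  q / (q + 1)
dwd_loss <- function(u, q = 1) {
  thr <- q / (q + 1)
  # pmax() keeps the second branch finite where it is not selected
  tail_part <- (1 / pmax(u, thr)^q) * q^q / (q + 1)^(q + 1)
  ifelse(u <= thr, 1 - u, tail_part)
}

# q = 1 corresponds to the default qval = 1
dwd_loss(c(-1, 0, 0.5, 1, 2), q = 1)  # values: 2, 1, 0.5, 0.25, 0.125
```

The two branches agree at u = q/(q+1), so the loss is continuous; larger q makes the tail decay faster for well-classified points.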
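In the linear case (kern is the inner product and N > p), kerndwd fits a linear DWD by minimizing the L2 penalized DWD loss function,
$$\frac{1}{N}\sum_{i=1}^{N} V_q\big(y_i(\beta_0 + X_i'\beta)\big) + \lambda \beta'\beta,$$
where $V_q$ denotes the generalized DWD loss. If a linear DWD is fitted when N < p, a kernel DWD with the linear kernel is actually solved. In such a case, the coefficient $\beta$ can be obtained from $\beta = X'\alpha$.
In the kernel case, kerndwd fits a kernel DWD by minimizing
$$\frac{1}{N}\sum_{i=1}^{N} V_q\big(y_i(\beta_0 + K_i'\alpha)\big) + \lambda \alpha' K \alpha,$$
where $K$ is the kernel matrix and $K_i$ is the ith row.
The weighted linear DWD and the weighted kernel DWD are formulated as follows,
$$\frac{1}{N}\sum_{i=1}^{N} w_i V_q\big(y_i(\beta_0 + X_i'\beta)\big) + \lambda \beta'\beta,$$
$$\frac{1}{N}\sum_{i=1}^{N} w_i V_q\big(y_i(\beta_0 + K_i'\alpha)\big) + \lambda \alpha' K \alpha,$$
where $w_i$ is the ith element of wt. The choice of weight factors is discussed in the reference below.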
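The relation between the linear and kernel formulations can be checked numerically. The sketch below is an illustration under assumptions: it takes the objectives to be the weighted mean of the generalized DWD loss of the margins plus lambda times beta'beta (linear) or alpha'K alpha (kernel), and it uses hypothetical helpers dwd_loss, linear_obj and kernel_obj on simulated toy data. With the linear kernel K = XX' and beta = X'alpha, the two objectives should coincide.

```r
# Generalized DWD loss (assumed form from Wang and Zou, 2018)
dwd_loss <- function(u, q = 1) {
  thr <- q / (q + 1)
  ifelse(u <= thr, 1 - u, (1 / pmax(u, thr)^q) * q^q / (q + 1)^(q + 1))
}

# Weighted linear objective: mean(w_i V_q(y_i (b0 + x_i' beta))) + lambda beta' beta
linear_obj <- function(b0, beta, X, y, lambda, q = 1, wt = rep(1, length(y))) {
  margin <- y * (b0 + as.vector(X %*% beta))
  mean(wt * dwd_loss(margin, q)) + lambda * sum(beta^2)
}

# Weighted kernel objective: mean(w_i V_q(y_i (b0 + K_i' alpha))) + lambda alpha' K alpha
kernel_obj <- function(b0, alpha, K, y, lambda, q = 1, wt = rep(1, length(y))) {
  Ka <- as.vector(K %*% alpha)
  margin <- y * (b0 + Ka)
  mean(wt * dwd_loss(margin, q)) + lambda * sum(alpha * Ka)
}

set.seed(1)
X <- matrix(rnorm(6 * 10), nrow = 6)   # toy data with N = 6 < p = 10
y <- rep(c(-1, 1), each = 3)
alpha <- rnorm(6)
wt <- c(1, 2)[factor(y)]               # class weights: 1 for y = -1, 2 for y = 1
K <- X %*% t(X)                        # linear kernel matrix
beta <- as.vector(t(X) %*% alpha)      # beta = X' alpha

all.equal(linear_obj(0.1, beta, X, y, lambda = 0.5, wt = wt),
          kernel_obj(0.1, alpha, K, y, lambda = 0.5, wt = wt))
```

This is why solving the kernel problem with a linear kernel is enough when N < p: the N-dimensional alpha carries the same information as the p-dimensional beta.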
An object with S3 class kerndwd.
alpha |
A matrix of DWD coefficients at each lambda value. |
lambda |
The lambda sequence that was actually used. |
npass |
Total number of MM iterations for all lambda values. |
jerr |
Warnings and errors; 0 if none. |
info |
A list including parameters of the loss function, |
call |
The call that produced this object. |
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004)
“kernlab – An S4 Package for Kernel Methods in R,”
Journal of Statistical Software, 11(9), 1–20.
https://www.jstatsoft.org/v11/i09/paper
Friedman, J., Hastie, T., and Tibshirani, R. (2010)
“Regularization Paths for Generalized Linear Models via Coordinate Descent,”
Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper
Marron, J.S., Todd, M.J., and Ahn, J. (2007)
“Distance-Weighted Discrimination,”
Journal of the American Statistical Association, 102(480), 1267–1271.
https://www.tandfonline.com/doi/abs/10.1198/016214507000001120
Qiao, X., Zhang, H., Liu, Y., Todd, M., and Marron, J.S. (2010)
“Weighted Distance Weighted Discrimination and Its Asymptotic Properties,”
Journal of the American Statistical Association, 105(489), 401–414.
https://www.tandfonline.com/doi/abs/10.1198/jasa.2010.tm08487
predict.kerndwd, plot.kerndwd, and cv.kerndwd.
    library(kerndwd)
    data(BUPA)
    # standardize the predictors
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    # a grid of tuning parameters
    lambda = 10^(seq(3, -3, length.out=10))
    # fit a linear DWD
    kern = vanilladot()
    DWD_linear = kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    # fit a DWD using Gaussian kernel
    kern = rbfdot(sigma=1)
    DWD_Gaussian = kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    # fit a weighted kernel DWD
    kern = rbfdot(sigma=1)
    weights = c(1, 2)[factor(BUPA$y)]
    DWD_wtGaussian = kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, wt = weights, eps=1e-5, maxit=1e5)
Kernel functions provided in the R package kernlab. Details can be seen in the reference below.
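The Gaussian RBF kernel $k(x, x') = \exp(-\sigma \|x - x'\|^2)$
The Polynomial kernel $k(x, x') = (\mathrm{scale} \cdot \langle x, x' \rangle + \mathrm{offset})^{\mathrm{degree}}$
The Linear kernel $k(x, x') = \langle x, x' \rangle$
The Laplacian kernel $k(x, x') = \exp(-\sigma \|x - x'\|)$
The Bessel kernel
The ANOVA RBF kernel where k(x, x) is a Gaussian RBF kernel.
The Spline kernel.
The parameter sigma used in rbfdot can be selected by sigest().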
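To make the role of sigma concrete, the following base R sketch builds a Gaussian RBF kernel matrix by hand. It assumes the kernlab parameterization exp(-sigma * ||x - x'||^2), in which sigma acts as an inverse width; rbf_kernel_matrix is a hypothetical helper and the three points are toy data.

```r
# Hypothetical helper: Gaussian RBF kernel matrix for the rows of X,
# assuming k(x, x') = exp(-sigma * ||x - x'||^2).
rbf_kernel_matrix <- function(X, sigma = 1) {
  sq_dist <- as.matrix(dist(X))^2   # pairwise squared Euclidean distances
  exp(-sigma * sq_dist)
}

X <- rbind(c(0, 0), c(1, 0), c(0, 2))   # three toy points in the plane
K <- rbf_kernel_matrix(X, sigma = 1)
K[1, 2]   # exp(-1), squared distance 1
K[1, 3]   # exp(-4), squared distance 4
```

A larger sigma makes the kernel more local, since similarities decay faster with distance.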
    rbfdot(sigma = 1)
    polydot(degree = 1, scale = 1, offset = 1)
    vanilladot()
    laplacedot(sigma = 1)
    besseldot(sigma = 1, order = 1, degree = 1)
    anovadot(sigma = 1, degree = 1)
    splinedot()
    sigest(x)
sigma |
The inverse kernel width used by the Gaussian, the Laplacian, the Bessel, and the ANOVA kernel. |
degree |
The degree of the polynomial, Bessel, or ANOVA kernel function. This has to be a positive integer. |
scale |
The scaling parameter of the polynomial kernel function. |
offset |
The offset used in a polynomial kernel. |
order |
The order of the Bessel function to be used as a kernel. |
x |
The design matrix used in kerndwd when sigest is called to estimate sigma. |
These R functions and descriptions are directly duplicated and/or adapted from the R package kernlab.
Return an S4 object of class kernel, which can be used as the kern argument when fitting a kerndwd model.
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004)
“kernlab – An S4 Package for Kernel Methods in R,”
Journal of Statistical Software, 11(9), 1–20.
https://www.jstatsoft.org/v11/i09/paper
    library(kerndwd)
    data(BUPA)
    # standardize the predictors before estimating sigma
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    # generate a linear kernel
    kfun = vanilladot()
    # generate a Laplacian kernel function with sigma = 1
    kfun = laplacedot(sigma=1)
    # generate a Gaussian kernel function with sigma estimated by sigest()
    kfun = rbfdot(sigma=sigest(BUPA$X))
    # set kern=kfun when fitting a kerndwd object
    lambda = 10^(seq(-3, 3, length.out=10))
    m1 = kerndwd(BUPA$X, BUPA$y, kern=kfun, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
Plot cross-validation error curves with the upper and lower standard deviations versus log lambda values.
    ## S3 method for class 'cv.kerndwd'
    plot(x, sign.lambda, ...)
x |
A fitted cv.kerndwd object. |
sign.lambda |
Against the log of lambda (default) or its negative if sign.lambda=-1. |
... |
Other graphical parameters being passed to plot. |
This function plots the cross-validation error curves. This function is modified based on the plot.cv function of the glmnet package.
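The kind of figure this produces can be approximated with base graphics. The sketch below is not the package's plotting code: plot_cv_curve is a hypothetical helper, the inputs are toy values, and the treatment of sign.lambda (multiplying log(lambda) by 1 or -1) is an assumption borrowed from the glmnet convention.

```r
# Hypothetical helper: mean CV error with one-standard-error bars
# against sign.lambda * log(lambda).
plot_cv_curve <- function(lambda, cvm, cvsd, sign.lambda = 1) {
  xval <- sign.lambda * log(lambda)
  plot(xval, cvm, pch = 20, col = "red",
       ylim = range(cvm - cvsd, cvm + cvsd),
       xlab = if (sign.lambda < 0) "-log(lambda)" else "log(lambda)",
       ylab = "Misclassification error")
  # error bars from cvm - cvsd (lower curve) to cvm + cvsd (upper curve)
  arrows(xval, cvm - cvsd, xval, cvm + cvsd,
         angle = 90, code = 3, length = 0.05, col = "darkgrey")
  invisible(xval)
}

# Toy values (made up for illustration)
lambda <- c(10, 1, 0.1, 0.01)
cvm    <- c(0.40, 0.31, 0.30, 0.33)
cvsd   <- c(0.02, 0.02, 0.02, 0.02)
xv <- plot_cv_curve(lambda, cvm, cvsd)
```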
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Friedman, J., Hastie, T., and Tibshirani, R. (2010)
“Regularization Paths for Generalized Linear Models via Coordinate Descent,”
Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper
    library(kerndwd)
    set.seed(1)
    data(BUPA)
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    lambda = 10^(seq(-3, 3, length.out=10))
    kern = rbfdot(sigma=sigest(BUPA$X))
    m.cv = cv.kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    m.cv
Plot the solution paths for a fitted kerndwd object.
    ## S3 method for class 'kerndwd'
    plot(x, color=FALSE, ...)
x |
A fitted kerndwd object. |
color |
If TRUE, the curves are plotted in color. Default is FALSE. |
... |
Other graphical parameters to plot. |
Plots the solution paths as a coefficient profile plot. This function is modified based on the plot function from the glmnet package.
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Friedman, J., Hastie, T., and Tibshirani, R. (2010)
“Regularization Paths for Generalized Linear Models via Coordinate Descent,”
Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper
kerndwd, predict.kerndwd, coef.kerndwd, plot.kerndwd, and cv.kerndwd.
    library(kerndwd)
    data(BUPA)
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    lambda = 10^(seq(-3, 3, length.out=10))
    kern = rbfdot(sigma=sigest(BUPA$X))
    m1 = kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    plot(m1, color=TRUE)
Predict the binary class labels or the fitted values of a kerndwd object.
    ## S3 method for class 'kerndwd'
    predict(object, kern, x, newx, type=c("class", "link"), ...)
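object |
A fitted kerndwd object. |
kern |
The kernel function used when fitting the kerndwd object. |
x |
The predictor matrix, i.e., the x matrix used when fitting the kerndwd object. |
newx |
A matrix of new values for x at which predictions are to be made. |
type |
Either "class" or "link"; see the details below. |
... |
Not used. Other arguments to predict. |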
If type is "class", the function returns the predicted class labels. If type is "link", the result is $\beta_0 + X_i'\beta$ for the linear case and $\beta_0 + K_i'\alpha$ for the kernel case.
Returns either the predicted class labels or the fitted values, depending on the choice of type.
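The difference between the two prediction types can be sketched in base R. This is an illustration only: predict_sketch is a hypothetical helper, the coefficient vector is assumed to store the intercept first followed by the alpha coefficients, the link is assumed to be the intercept plus the kernel evaluations times alpha, and the class is taken as the sign of the link. All numbers are toy values.

```r
# Hypothetical helper: compute the link for new observations from kernel
# evaluations K_new (one row per new observation, one column per training
# observation), then optionally convert it to a class label in {-1, 1}.
predict_sketch <- function(coefs, K_new, type = c("class", "link")) {
  type <- match.arg(type)
  link <- coefs[1] + as.vector(K_new %*% coefs[-1])
  if (type == "link") link else ifelse(link > 0, 1, -1)
}

coefs <- c(0.5, 1, -1)             # toy values: intercept 0.5, alpha = (1, -1)
K_new <- rbind(c(1, 0), c(0, 1))   # toy kernel evaluations for two new points
predict_sketch(coefs, K_new, "link")    # 1.5 -0.5
predict_sketch(coefs, K_new, "class")   # 1 -1
```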
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
    library(kerndwd)
    data(BUPA)
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    lambda = 10^(seq(-3, 3, length.out=10))
    kern = rbfdot(sigma=sigest(BUPA$X))
    m1 = kerndwd(BUPA$X, BUPA$y, kern, qval=1, lambda=lambda, eps=1e-5, maxit=1e5)
    predict(m1, kern, BUPA$X, tail(BUPA$X))
A fast implementation of cross-validation for kerndwd to find the optimal values of the tuning parameter lambda.
tunedwd(x, y, kern, lambda, qvals=1, eps=1e-5, maxit=1e+5, nfolds=5, foldid=NULL)
x |
A matrix of predictors, i.e., the matrix x used in kerndwd. |
y |
A vector of binary class labels, i.e., the y used in kerndwd. |
kern |
A kernel function. |
lambda |
A user specified lambda sequence for cross-validation. |
qvals |
A vector containing the index of the generalized DWD. Default value is 1. |
eps |
The algorithm stops when the change in the coefficients between successive iterations is less than eps. Default value is 1e-5. |
maxit |
The maximum number of iterations allowed. Default is 1e5. |
nfolds |
The number of folds. Default value is 5. The allowable range is from 3 to the sample size. |
foldid |
An optional vector with values between 1 and nfolds, giving the fold index of each observation. |
This function returns the best tuning parameters q and lambda by cross-validation. An efficient tuning method is employed to accelerate the algorithm.
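Conceptually, the result is the (q, lambda) pair with the smallest cross-validation error over the grid. The sketch below shows only that final selection step on a made-up error matrix; it does not reproduce the accelerated tuning method, and all names and values in it are illustrative.

```r
qvals  <- c(1, 2, 10)
lambda <- c(1, 0.1, 0.01)

# Toy cross-validation errors (made up): one row per q, one column per lambda
cv_err <- matrix(c(0.35, 0.31, 0.33,
                   0.34, 0.29, 0.32,
                   0.36, 0.30, 0.34),
                 nrow = length(qvals), byrow = TRUE,
                 dimnames = list(paste0("q=", qvals), paste0("lambda=", lambda)))

# Position of the smallest error (first one in case of ties)
best <- which(cv_err == min(cv_err), arr.ind = TRUE)[1, ]
q.tune   <- qvals[best["row"]]
lam.tune <- lambda[best["col"]]
c(q.tune = q.tune, lam.tune = lam.tune)   # q = 2, lambda = 0.1
```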
A tunedwd.kerndwd object including the cross-validation results is returned.
lam.tune |
The optimal lambda value. |
q.tune |
The optimal q value. |
Wang, B. and Zou, H. (2018)
“Another Look at Distance Weighted Discrimination,”
Journal of the Royal Statistical Society, Series B, 80(1), 177–198.
https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12244
Friedman, J., Hastie, T., and Tibshirani, R. (2010)
“Regularization Paths for Generalized Linear Models via Coordinate Descent,”
Journal of Statistical Software, 33(1), 1–22.
https://www.jstatsoft.org/v33/i01/paper
    library(kerndwd)
    set.seed(1)
    data(BUPA)
    BUPA$X = scale(BUPA$X, center=TRUE, scale=TRUE)
    lambda = 10^(seq(-3, 3, length.out=10))
    kern = rbfdot(sigma=sigest(BUPA$X))
    ret = tunedwd(BUPA$X, BUPA$y, kern, qvals=c(1,2,10), lambda=lambda, eps=1e-5, maxit=1e5)
    ret