| Title: | Probabilistic Support Vector Machines |
|---|---|
| Description: | Implements kernel-based classification Support Vector Machines with reliable estimated probabilities of class membership. Theoretical support for the functions in this package can be found in Duarte Silva (2025) <doi:10.1016/j.cor.2025.107203>. |
| Authors: | A. Pedro Duarte Silva [aut, cre] |
| Maintainer: | A. Pedro Duarte Silva <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.0 |
| Built: | 2026-06-25 19:57:23 UTC |
| Source: | https://github.com/cran/ProbSVMs |
Implements 'all-in-one' kernel-based multiclass Support Vector Machines for supervised classification and class probability estimation.
| Package: | ProbSVMs |
| Type: | Package |
| Version: | 0.1-0 |
| Date: | 2026-06-22 |
| License: | GPL-2 |
| LazyLoad: | yes |
Package ProbSVMs implements 'all-in-one' kernel-based multiclass Support Vector Machines for supervised classification and class probability estimation. The main novelty of ProbSVMs is the Probabilistic Vector Machines (PVMs) proposed by Duarte Silva in reference [3]. PVMs are distribution-free estimators of class probabilities based on the predictions given by a sequence of kernel-based weighted Support Vector Machines (SVMs).
Currently there are two variants of multiclass PVMs implemented in ProbSVMs: (i) machines that use the SVM loss proposed by Lee, Lin and Wahba (see reference [5]), and (ii) machines that use the loss proposed by Weston and Watkins (see reference [6]). The former variant has better asymptotic properties, but there is some empirical evidence suggesting that the latter often gives more reliable results in applications (see, e.g., reference [2]). For two-class problems these two variants are equivalent.
In problems with more than two classes, ProbSVMs currently implements only PVMs based on classification rules derived from weighted kernel-based SVMs without bias terms. In two-class problems, rules both with (default) and without bias terms can be used. In problems with more than two classes, dropping the bias terms has some computational advantages, and it has been argued that it should not have a noticeable impact on statistical performance (see references [2] and [3] for further discussion).
In addition, ProbSVMs also implements Duarte Silva's adaptation (see reference [3]) of Dogan, Glasmachers and Igel's efficient training algorithm (see reference [1]) for 'all-in-one' multiclass kernel-based SVMs without bias terms. While an original implementation of this algorithm is available in the Shark machine learning library (see reference [4]), unlike ProbSVMs that implementation does not seem to cover weighted SVM versions, nor does it seem to include any interface to the R environment.
In ProbSVMs, PVMs can be trained by the function trainPVM, which generates an object of class PVM that has a predict method for computing
reliable class probability estimates. Similarly, kernel-based SVMs can be trained by the function trainSVM, which generates an object of class
kernelSVM (single trained SVM) or kernelSVMs (multiple trained SVMs); both have predict methods for finding class predictions.
Antonio Pedro Duarte Silva <[email protected]>
Maintainer: Antonio Pedro Duarte Silva <[email protected]>
[1] Dogan, U.; Glasmachers, T. and Igel, C. (2011) Fast training of multi-class support vector machines. Technical report. Department of Computer Sciences, University of Copenhagen.
[2] Dogan, U.; Glasmachers, T. and Igel, C. (2016) A unified view on multi-class support vector classification. Journal of Machine Learning Research, Vol. 17 (45), 1–32.
[3] Duarte Silva, A.P. (2025) Probabilistic Vector Machines. Computers and Operations Research, Vol. 183, 107203. <doi:10.1016/j.cor.2025.107203>
[4] Igel, C.; Heidrich-Meisner, V. and Glasmachers, T. (2008) Shark. Journal of Machine Learning Research, Vol. 9 (6), 993–996.
[5] Lee, Y.; Lin, Y. and Wahba, G. (2004) Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, Vol. 99, 67–81. <doi:10.1198/016214504000000098>
[6] Weston, J. and Watkins, C. (1999) Support vector machines for multi-class pattern recognition. In Proceedings of the 7th European Symposium on Artificial Neural Networks, 219–224.
trainPVM, trainSVM, predict.PVM, predict.kernelSVM, plot.ClassProb

    # Train the Weston and Watkins SVM on the Iris data
    data(iris)
    WWsvm <- trainSVM(iris[,1:4],iris$Species,loss="WW")

    # Get in-sample classification results
    WWpred <- predict(WWsvm,newdata=iris[,1:4])
    WWpred

    # Compare classifications with true assignments
    cat("Original classes:\n")
    print(iris$Species)
    print(WWpred==iris$Species)

    # Estimate class probabilities based on the WW loss
    WWpvm <- trainPVM(iris[,1:4],iris$Species,loss="WW")
    WWprob <- predict(WWpvm,newdata=iris[,1:4],trndt="newdata")

    # Display the probabilities of the predicted classes
    # on the space of the petal measurements
    plot(WWprob, projecton="OrigDt", trdata=iris, axis=c(3,4))
ClassProb object
Conversion function to create ClassProb objects from matrices and data frames.
as.ClassProb(x, ...)
x |
a matrix or data frame (with observations in rows and classes in columns) containing class probability estimates in a supervised classification problem. |
... |
further arguments passed to or from other methods. |
an object of class ClassProb containing class probability estimates, for which there is a specialized plot method.
The internal structure of objects inheriting from class ClassProb is simply the original matrix or data frame plus its
class attribute.
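Because the internal structure is described as the original matrix or data frame plus a class attribute, the effect of the conversion can be pictured in base R. The following is a minimal sketch based on that description only: `to_classprob_like` is a hypothetical helper, not the package's own code, and the real `as.ClassProb` may perform checks that are not shown here.

```r
# Hypothetical stand-in for as.ClassProb (assumption, not the package code):
# keep the data untouched and prepend "ClassProb" to the class attribute.
to_classprob_like <- function(x) {
  stopifnot(is.matrix(x) || is.data.frame(x))
  class(x) <- c("ClassProb", oldClass(x))
  x
}

# Toy class probability estimates: observations in rows, classes in columns
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("obs1", "obs2"), c("a", "b", "c")))
cp <- to_classprob_like(probs)

inherits(cp, "ClassProb")          # the new class is present
identical(unclass(cp), probs)      # the underlying data are unchanged
```

With the package installed, the same toy matrix could be passed to `as.ClassProb` and then to the plot method described later in this manual.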
GetrbfdotSigPar returns sensible values for the sigma parameter of the rbfdot (“Gaussian”) kernel function.
GetrbfdotSigPar(x, kpar=c("d2median","d2q01q09mean","d2q01q09hmean"))
x |
a matrix or data frame of training data (observations in rows, variables in columns). |
kpar |
a string specifying how sigma should be computed. Valid alternatives are:
|
GetrbfdotSigPar returns a sensible value for the sigma parameter of the Radial Basis (“Gaussian”) kernel function.
According to Caputo et al. (reference [1]), any sigma value between the 10th and 90th percentiles of the distribution of the
Euclidean distances between observation pairs leads to reasonable Support Vector Machine
classification results. GetrbfdotSigPar returns one of three alternatives for the sigma value: (i) the median of the
distribution (default); (ii) the arithmetic mean of the 10th and 90th percentiles
of the distribution; (iii) the harmonic mean of the 10th and 90th percentiles of the distribution.
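The three alternatives can be illustrated with base R alone. This is a sketch that follows the wording above literally, not the package's implementation: `sigma_candidates` is a hypothetical helper, and it assumes plain (unsquared) Euclidean distances with no further rescaling, which may differ from what `GetrbfdotSigPar` actually computes.

```r
# Sketch only: candidate sigma values from pairwise Euclidean distances.
# Assumption: unsquared distances and no rescaling (the package may differ).
sigma_candidates <- function(x) {
  d <- as.vector(dist(as.matrix(x)))                 # all pairwise distances
  q <- quantile(d, probs = c(0.1, 0.9), names = FALSE)
  c(median = median(d),                              # alternative (i)
    amean  = mean(q),                                # alternative (ii)
    hmean  = 2 / sum(1 / q))                         # alternative (iii)
}

sig <- sigma_candidates(iris[, 1:4])
sig
```

The harmonic mean is never larger than the arithmetic mean, so alternative (iii) gives the smaller of the two percentile-based values.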
a scalar with a sensible value for the sigma hyper-parameter of the Radial Basis (“Gaussian”) kernel function.
[1] Caputo, B.; Sim, K.; Furesjo, F. and Smola, A. (2002). Appearance-based object recognition using SVMs: which kernel should I use? In Proceedings of NIPS workshop on Statistical methods for computational experiments in visual processing and computer vision, Whistler.
makeKMat creates kernel matrices from raw data. It can be used to create the Gram matrix from a single data set, or a general kernel matrix from two different data sets.
makeKMat(dt1, dt2=NULL, kernel=c("rbfdot","vanilladot","polydot"), kpar=list(sigma="d2median"))
dt1 |
a data matrix or data frame, with observations in rows and variables in columns. |
dt2 |
a data matrix or data frame, with observations in rows and variables in columns.
If equal to NULL the kernel matrix to be created will consist of kernel values between
all pairs of the |
kernel |
the kernel function used for training and prediction. Currently this argument can be set to one of the following strings:
|
kpar |
the list of kernel hyper-parameters. This is a list which contains the parameters to be used with the kernel function. Valid parameters for existing kernels are:
|
A kernel matrix. If argument dt2 is NULL, a symmetric matrix consisting of the kernel values
between all pairs of dt1 elements. If argument dt2 is not NULL, a matrix with the kernel
values between the elements of dt1 (in rows) and dt2 (in columns).
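The shape of the returned matrix in both cases can be reproduced with a small base R sketch. `rbf_kernel_matrix` is a hypothetical helper, not the package's code; it assumes the kernlab convention for the "rbfdot" kernel, k(x, y) = exp(-sigma * ||x - y||^2), and takes sigma as a plain number.

```r
# Sketch only: Gaussian kernel matrix between the rows of dt1 and dt2.
# Assumption: kernlab-style parameterisation exp(-sigma * ||x - y||^2).
rbf_kernel_matrix <- function(dt1, dt2 = NULL, sigma = 1) {
  a <- as.matrix(dt1)
  b <- if (is.null(dt2)) a else as.matrix(dt2)
  # Squared distances via ||x||^2 + ||y||^2 - 2 x'y
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  exp(-sigma * pmax(d2, 0))        # pmax guards against tiny negative values
}

a <- matrix(c(0, 0,
              1, 0), ncol = 2, byrow = TRUE)
b <- matrix(0, nrow = 3, ncol = 2)

K_gram  <- rbf_kernel_matrix(a)      # 2 x 2 symmetric Gram matrix
K_cross <- rbf_kernel_matrix(a, b)   # 2 x 3: rows from a, columns from b
```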
ClassProb objects
Method for displaying the class probabilities contained in ClassProb objects.

    ## S3 method for class 'ClassProb'
    plot(x, ..., type=c("scatterplt","stckdbarplt"),
         projecton=c("DiscFact","PCs","OrigDt"), trdata=NULL, grouping=NULL,
         newdata="trdata", ShownCl="PredCl", axis=c(1,2), obs=1:nrow(x),
         withxlabels=length(obs)<=30, title=NULL, pntsize = 2.5, threecolors=TRUE,
         clgrdparl = list(low="white",mid="blue",high="darkred",midpoint=0.7))
x |
an object of class inheriting from “ClassProb”. |
... |
further arguments passed to or from other methods. |
type |
type of plot to be displayed. Currently this argument can be set to one of the following strings:
|
projecton |
Type of projection used to display the class probabilities. The value of this argument is ignored when
argument
|
trdata |
either a matrix or data frame with the training data (with observations in rows and variables in columns)
based on which the scatter plot projections are to be found. The value of this argument is ignored when argument |
grouping |
a factor describing the response vector, with one label per observation, in the training data. This argument
is only required for projections onto canonical Linear Discriminant spaces, and its value is ignored when argument |
newdata |
either a matrix or data frame with the data (with observations in rows and variables in columns) for which class probabilities are to be displayed. |
ShownCl |
a description of the class whose probabilities are to be displayed by a colour grid in the scatter plot projection
displays. This could be either the string “PredCl” (default), if for each observation the probabilities of the predicted
class are to be displayed, or the position or name of a particular class. The value of this argument is ignored when argument |
axis |
the set of LDFs or PCs (when argument |
obs |
the set of |
withxlabels |
a logical flag indicating if, in stacked bar plot displays, the observation names are to be shown on the x axis.
The value of this argument is ignored when argument |
title |
an overall title for the plot. |
pntsize |
a numerical value giving the amount by which, in scatter plots, plotting symbols should be magnified relative
to the |
threecolors |
logical flag indicating if, in scatter plot displays, the colour scheme used for representing class probabilities
should be based on a three-colour scale (TRUE), or a more conventional (but somewhat less discriminatory) gradient colour scale (FALSE).
The value of this argument is ignored when argument |
clgrdparl |
a list of arguments used to define the colour scale used for representing class probabilities in scatter plot projections.
The value of this argument is ignored when argument
|
No return value, called for side effects
trainPVM, predict.PVM, as.ClassProb

    # Train the Weston and Watkins PVM on the Iris data
    # and obtain the training sample class probabilities
    data(iris)
    WWpvm <- trainPVM(iris[,1:4],iris$Species)
    WWprob <- predict(WWpvm,newdata=iris[,1:4],trndt="newdata")

    # Display the probabilities of the predicted classes
    # on the two-dimensional canonical discriminant space
    plot(WWprob, trdata=iris[,1:4], grouping=iris$Species)

    # Display the same probabilities on the space of the petal measurements
    plot(WWprob, projecton="OrigDt", trdata=iris, axis=c(3,4))

    # Show all class probabilities by a stacked bar plot
    plot(WWprob, type="stckdbarplt")

    # Display the scatter plots, focusing now on the virginica probabilities
    plot(WWprob, trdata=iris[,1:4], grouping=iris$Species, ShownCl="virginica")
    plot(WWprob, projecton="OrigDt", trdata=iris, axis=c(3,4), ShownCl="virginica")

    # Repeat the previous analysis, using the first 40 observations in each class
    # for training, and the last 10 for probability estimation
    trdata <- iris[c(1:40,51:90,101:140),1:4]
    trSpecies <- iris$Species[c(1:40,51:90,101:140)]
    evaldata <- iris[c(41:50,91:100,141:150),1:4]
    WWpvm1 <- trainPVM(trdata,trSpecies)
    WWprob1 <- predict(WWpvm1,newdata=evaldata,trndt=trdata)

    plot(WWprob1, projecton="DiscFact", trdata=trdata, newdata=evaldata, grouping=trSpecies)
    plot(WWprob1, projecton="OrigDt", newdata=evaldata)
    plot(WWprob1, type="stckdbarplt", title = "Estimated class probabilities")
    plot(WWprob1, projecton="DiscFact", trdata=trdata, newdata=evaldata, grouping=trSpecies, ShownCl="virginica")
    plot(WWprob1, projecton="OrigDt", newdata=evaldata, ShownCl="virginica")
Predict classes based on objects of class 'kernelSVM'.

    ## S3 method for class 'kernelSVM'
    predict(object, ..., newdata=NULL, KMat=NULL, trndt=NULL)
object |
Object of class inheriting from “kernelSVM”.
This could either contain a single trained SVM (if the |
... |
further arguments passed to or from other methods. |
newdata |
a matrix or data frame with the new data
(with observations in rows and variables in columns)
to be predicted by the SVM(s) given by |
KMat |
a Kernel matrix with the values resulting from applying
the kernel function to all pairs between newdata and training data
observations. The |
trndt |
either a matrix or data frame with the training data (with observations in rows and variables in columns) or the string “newdata”, if (and only if) SVM predictions are to be found for the same data as the one used for training. |
For predictions based on single SVM objects (class “kernelSVM”), a factor with the class predictions for each new data observation. For predictions based on multiple SVM objects (class “kernelSVMs”), a data frame of factors in which each column corresponds to a different SVM and each row corresponds to a new data observation.

    ## The following examples are based on the MASS data set "crabs".
    ## This data set records physical measurements on 200 specimens of
    ## Leptograpsus variegatus crabs observed on the shores
    ## of Western Australia. The crabs are classified by two factors, sex and sp
    ## (crab species as defined by its colour: blue or orange), with two levels
    ## each. The measurement variables include the natural logarithms of carapace length (CL),
    ## the carapace width (CW), the size of the frontal lobe (FL) and the size of
    ## the rear width (RW). In the analysis provided, we created four classes
    ## by crossing the sex and sp levels.

    library(MASS)
    data(crabs)
    crabs$grp <- interaction(crabs$sex,crabs$sp)
    crabs$lnFL <- log(crabs$FL)
    crabs$lnRW <- log(crabs$RW)
    crabs$lnCL <- log(crabs$CL)
    crabs$lnCW <- log(crabs$CW)
    crabs$lnBD <- log(crabs$BD)

    # Train the WW SVM, with its default settings, on the crabs data
    WWsvm <- trainSVM(crabs[,10:14],crabs$grp)

    # Get in-sample classification results
    WWpred <- predict(WWsvm,newdata=crabs[,10:14])
    WWpred

    # Compare classifications with true assignments
    cat("Original classes:\n")
    print(crabs$grp)
    print(WWpred==crabs$grp)
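Since the predict method returns a factor, the element-wise comparison printed in the example can be condensed into a confusion table and an accuracy figure with base R. The sketch below uses made-up label vectors in place of real SVM output, so the numbers carry no meaning beyond illustration; `truth` and `pred` are hypothetical names.

```r
# Sketch only: summarising factor predictions against the true classes.
# 'truth' and 'pred' are toy vectors standing in for crabs$grp and WWpred.
truth <- factor(c("a", "a", "b", "b", "c", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "a"), levels = levels(truth))

# Rows: predicted classes; columns: true classes
confusion <- table(Predicted = pred, True = truth)
confusion

# Proportion of observations classified correctly
accuracy <- mean(pred == truth)
accuracy
```

Giving `pred` the same levels as `truth` keeps the table square even when some class is never predicted.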
Predict class probabilities based on objects of class 'PVM'.

    ## S3 method for class 'PVM'
    predict(object, ..., newdata, eta=15., probepsilon="adjtogrid", trndt=NULL,
            retallprd=FALSE, newdataasKmat=FALSE)
object |
Object of class inheriting from “PVM”. |
... |
further arguments passed to or from other methods. |
newdata |
either a matrix or data frame with the data
(with observations in rows and variables in columns)
for which class probabilities are to be computed, or a Kernel matrix
with the values resulting from applying the kernel function to all pairs
between newdata and training data observations. In the latter case argument
|
eta |
value of the eta hyper-parameter used in the conversion of SVM(s)
predictions to class probabilities. It gives the relative importance of
undesirable versus desirable |
probepsilon |
the minimum possible value for the class probabilities estimated
by |
trndt |
either a matrix or data frame with the training data (with observations in rows and variables in columns) or the string “newdata”, if (and only if) class probabilities are to be computed for the same data as the one used for training. |
retallprd |
a boolean flag stating if all weighted SVM predictions should be returned together with the estimated class probabilities. |
newdataasKmat |
a boolean flag stating if the new data is given as a pre-computed Kernel matrix. |
When retallprd is set to FALSE, a ClassProb object with the
probability estimates for the new data. ClassProb objects are matrices
or data frames of class probabilities (observations in rows, classes in columns),
plus a class attribute, and have a plot method for relevant displays of these
probabilities.
When retallprd is set to TRUE, a list with two components named (i) Pprob:
a ClassProb object; and (ii) classpred: a matrix with observations in
rows and class-weight specifications in columns, containing the weighted SVM
predictions.
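Given the layout described above (observations in rows, classes in columns), the most probable class of each observation can be read off the probability matrix with base R. The sketch below works on a toy matrix rather than on real PVM output, and `toy_prob` and `most_probable` are hypothetical names.

```r
# Sketch only: toy class probabilities standing in for a ClassProb object
toy_prob <- matrix(c(0.80, 0.15, 0.05,
                     0.10, 0.25, 0.65,
                     0.30, 0.45, 0.25),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("B.F", "B.M", "O.F")))

# Each row of this toy matrix is a probability vector
rowSums(toy_prob)

# Column index of the largest probability in each row, mapped to class labels
most_probable <- factor(colnames(toy_prob)[max.col(toy_prob, ties.method = "first")],
                        levels = colnames(toy_prob))
most_probable
```

With retallprd set to TRUE the same steps would apply to the Pprob component of the returned list.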
[1] Duarte Silva, A.P. (2025) Probabilistic Vector Machines. Computers and Operations Research, Vol. 183, 107203. <doi:10.1016/j.cor.2025.107203>

    ## The following examples are based on the MASS data set "crabs".
    ## This data set records physical measurements on 200 specimens of
    ## Leptograpsus variegatus crabs observed on the shores
    ## of Western Australia. The crabs are classified by two factors, sex and sp
    ## (crab species as defined by its colour: blue or orange), with two levels
    ## each. The measurement variables include the natural logarithms of carapace length (CL),
    ## the carapace width (CW), the size of the frontal lobe (FL) and the size of
    ## the rear width (RW). In the analysis provided, we created four classes
    ## by crossing the sex and sp levels.

    library(MASS)
    data(crabs)
    crabs$grp <- interaction(crabs$sex,crabs$sp)
    crabs$lnFL <- log(crabs$FL)
    crabs$lnRW <- log(crabs$RW)
    crabs$lnCL <- log(crabs$CL)
    crabs$lnCW <- log(crabs$CW)
    crabs$lnBD <- log(crabs$BD)

    WWpvm <- trainPVM(crabs[,10:14],crabs$grp)
    WWprob <- predict(WWpvm,newdata=crabs[,10:14],trndt="newdata")

    # Display the probabilities of the predicted classes
    plot(WWprob, trdata=crabs[,10:14], grouping=crabs$grp)
    plot(WWprob, trdata=crabs[,10:14], grouping=crabs$grp, type="stckdbarplt")
    WWprob

    # Repeat the analysis, using the first 45 observations in each class for training,
    # and the last 5 for probability estimation
    trdata <- crabs[c(1:45,51:95,101:145,151:195),10:14]
    trgrp <- crabs$grp[c(1:45,51:95,101:145,151:195)]
    evaldata <- crabs[c(46:50,96:100,146:150,196:200),10:14]
    WWpvm1 <- trainPVM(trdata,trgrp)
    WWprob1 <- predict(WWpvm1,newdata=evaldata,trndt=trdata)
    plot(WWprob1, type="stckdbarplt")
    WWprob1
SetUpOptPar sets up several control parameters for the optimization of the mathematical
programming models used for SVM training.
SetUpOptPar(start=c("allub","alllb","compubwithlb","warmstarts"), alpha0=NULL, epsilon=1e-12, maxiter=100000, tol=1.5e-12 )
start |
string describing the initial solution for the optimization algorithm. It can have one of the following values:
|
alpha0 |
vector with initial values for the optimization algorithm. Only used
when argument |
epsilon |
stopping criterion for the optimization algorithm. When gradient values, or criteria increments,
are below |
maxiter |
maximum number of iterations for the optimization algorithm. |
tol |
variable numerical tolerance. Variables closer to their lower or upper bounds than |
a list with components start, alpha0, epsilon, maxiter and tol.
SetUpTunPar sets up several control parameters in the
procedure used for tuning SVM regularization hyper-parameters.
SetUpTunPar(tunex=NULL, tuney=NULL, tuneK=NULL, Csrchpar=list(Cpowerbase=2.,Cgridinlev=7,Cnloops=2), crossval=FALSE, crossvalpar=list(Strfolds=TRUE,kfold=10,CVrep=3))
tunex |
a matrix or data frame containing the data (observations in rows
and variables in columns) used for tuning SVM hyper-parameters. It is ignored
if argument |
tuney |
a factor describing the response vector (with one label per observation) used for tuning SVM hyper-parameters. |
tuneK |
a kernel matrix based on the training data (columns), and the data used for tuning
SVM hyper-parameters (rows). When not NULL, overrides the value of argument |
Csrchpar |
list of control parameters used for the search of the tuning parameters. It is formed by the following components:
|
crossval |
a logical flag indicating if the tuning procedure should be based on a (statistically more sound, but more time-intensive) cross-validation strategy. |
crossvalpar |
list of control parameters used for the cross-validation strategy. It is ignored when crossval is set to FALSE (default), and is formed by the following components:
|
SetUpTunPar sets up several control parameters in the procedure used for tuning SVM regularization hyper-parameters.
The tuning procedure is based on a search for the optimal value of a performance criterion over a finite grid of
Csrchpar$Cgridinlev different powers of Csrchpar$Cpowerbase, centered at zero.
When argument Csrchpar$Cnloops is set to one, these powers are consecutive integers. When Csrchpar$Cnloops is higher
than one, there is an initial search over a wider grid, which is then refined around the best value found in the previous loop.
This procedure is repeated Csrchpar$Cnloops times, in such a way that in the last loop there are Csrchpar$Cgridinlev
consecutive integers.
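The grid search described above can be pictured with a short base R sketch. `c_grid` is a hypothetical helper, not part of the package. The single-loop grid follows the text directly; the two-loop refinement uses a coarse spacing of 3 and a toy criterion, both of which are assumptions made only to show the coarse-to-fine idea, since the exact spacing used by the package is not stated here.

```r
# Sketch only: grid of candidate C values as powers of a common base.
# Assumes an odd number of levels so that the grid is centred on an integer.
c_grid <- function(base = 2, nlev = 7, center = 0, step = 1) {
  half <- (nlev - 1) / 2
  exponents <- center + step * seq(-half, half)
  list(exponents = exponents, C = base^exponents)
}

# One loop with the defaults: exponents -3, ..., 3, i.e. C from 1/8 to 8
single <- c_grid()

# Two loops (spacing of 3 in the first loop is an assumption)
coarse <- c_grid(step = 3)                    # exponents -9, -6, ..., 9
toy_error <- (coarse$exponents - 4)^2         # toy criterion, minimal near 4
best <- coarse$exponents[which.min(toy_error)]
fine <- c_grid(center = best)                 # consecutive integers around best
```

In the last loop the exponents are again consecutive integers, matching the behaviour described for the final refinement.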
The evaluation criterion to be optimized can be an SVM error rate on its training data (if arguments tunex and tuneK are both
NULL), on a tuning data set defined by arguments tunex and tuney or tuneK (if argument crossval is FALSE), or a
cross-validated estimate of the SVM error rate (if argument crossval is TRUE) as defined by argument crossvalpar.
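For the cross-validated criterion, the fold assignment can be sketched in base R. This assumes that the Strfolds component of crossvalpar requests folds stratified by class, which the name suggests but this text does not confirm; `stratified_folds` is a hypothetical helper, not the package's code.

```r
# Sketch only: assign each observation to one of 'kfold' folds so that
# every class is spread as evenly as possible over the folds.
stratified_folds <- function(y, kfold = 10) {
  folds <- integer(length(y))
  for (idx in split(seq_along(y), y)) {
    # Cycle through the fold labels within the class, then shuffle them
    folds[idx] <- sample(rep_len(seq_len(kfold), length(idx)))
  }
  folds
}

y <- factor(rep(c("a", "b", "c"), each = 20))
folds <- stratified_folds(y, kfold = 10)

# Class counts per fold: with 20 observations per class, 2 in every fold
table(y, folds)
```

Repeating the assignment with different random shuffles would correspond to the repeated cross-validation controlled by CVrep.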
a list with components tunex, tuney, tuneK, Csrchpar, crossval and crossvalpar.
trainPVM creates objects of class 'PVM' (Probabilistic Vector Machines) by training a sequence of kernel-based weighted Supported Vector Machines with different class-weight specifications.
trainPVM(x=NULL, y, scaled=TRUE, K=NULL, loss=c("WW","LLW"), withbias=(length(levels(y))==2), kernel=c("rbfdot","vanilladot","polydot"), kpar=list(sigma="d2median"), C=0.25, lambda=NULL, tunex=x, tuney=y, tuneK=K, grid=NULL, dpiinv=ceiling(sqrt(length(y))/0.2), keepdt=TRUE, ... )
x |
a matrix or data frame containing the training data, with observations in rows and variables in columns.
If x is equal to NULL, a kernel matrix should be provided directly in argument |
y |
a factor describing the response vector, with one label by observation. |
scaled |
A logical flag indicating whether or not the variables should be scaled to unit variance. |
K |
A symmetric kernel matrix. When equal to NULL, the training data should be given by argument |
loss |
a string specifying the multiclass large margin loss to be employed. Currently the following alternatives are implemented: “LLW” for the Lee, Lin and Wahba loss (see reference [3]), and “WW” for the Weston and Watkins loss (see reference [4]). In two-class problems these two losses are equivalent, and also equivalent to the classical hinge loss. |
withbias |
a logical flag indicating if a bias term (intercept) should be included in the classification rule. For problems with more than two classes, currently only rules without bias terms are implemented. However, in two-class problems the default is to use rules with a bias term. |
kernel |
the kernel function used for training and prediction. Currently this argument can be set to one of the following strings:
|
kpar |
the list of kernel hyper-parameters. This is a list which contains the parameters to be used with the kernel function. Valid parameters for existing kernels are:
|
C |
cost of constraints violation in the base SVMs (default: 0.25). This is the "C"-constant of the regularization term in the Lagrange
formulation. When set to the string “tuneit” the value of "C" is tuned to the data provided
in arguments |
lambda |
regularization parameter in the functional analysis formulation of SVMs. When set to a non-NULL value, overrides
the value of argument |
tunex |
a matrix or data frame containing the data (observations in rows and variables in columns) used to tune the hyper-parameters
|
tuney |
a factor describing the response vector (with one label by observation) used to tune the hyper-parameters
|
tuneK |
a kernel matrix based on the training data (columns), and the data used (rows) for tuning the hyper-parameters |
grid |
A matrix with the class-weight specifications used to train the PVM. The rows correspond to different specifications, and the columns
to the corresponding class weights. All rows should have non-negative elements adding up to one. When set to NULL (default) the grid matrix
is automatically created based on the value of argument |
dpiinv |
the distance between two consecutive weights in the (one-dimensional) specifications of class-weights used to construct a grid matrix.
Note that weights are uniformly distributed only along one dimension, while the remaining components of the grid are set at random. However, all grid
dimensions (classes) are chosen in turn as the one forced to have uniform weights. See reference [2] for further details.
This argument is ignored when argument |
keepdt |
a logical flag indicating if the original training data should be returned together with the trained PVM |
... |
other arguments to be passed to trainSVM |
trainPVM trains the Probabilistic Vector Machines (PVMs) proposed by Duarte Silva in reference [2]. PVMs are distribution-free estimators of class
probabilities based on the predictions given by a sequence of kernel-based weighted Support Vector Machines (SVMs) with different class-weight specifications.
A grid matrix with these specifications is usually created automatically, with its resolution level controlled by the argument dpiinv, but can also
be directly provided by argument grid.
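To make the constraints on a user-supplied grid concrete, the sketch below builds a valid grid for a two-class problem in base R. It is not the package's grid construction: it assumes that dpiinv acts as the number of equal subdivisions of the unit interval (the inverse of the spacing between consecutive weights), an interpretation suggested by the argument name only, and `toy_grid` is a hypothetical object.

```r
# Default resolution from the trainPVM signature, for n = 150 observations
n <- 150
dpiinv <- ceiling(sqrt(n) / 0.2)    # 62

# Assumption: weights of the first class are spaced 1/dpiinv apart,
# excluding the degenerate values 0 and 1
w1 <- seq_len(dpiinv - 1) / dpiinv
toy_grid <- cbind(class1 = w1, class2 = 1 - w1)

# Requirements stated for the grid argument:
# non-negative elements and rows adding up to one
all(toy_grid >= 0)
all(abs(rowSums(toy_grid) - 1) < 1e-12)
```

A matrix of this form could be supplied through the grid argument, in which case dpiinv would no longer determine the resolution.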
Currently there are two variants of multiclass PVMs implemented in trainPVM: (i) machines that use the SVM loss proposed by Lee, Lin and Wahba (see reference [3]), and (ii) machines that use the loss proposed by Weston and Watkins (see reference [4]). The former variant has better asymptotic properties, but there is some empirical evidence suggesting that the latter often gives more reliable results in applications (see, e.g., reference [1]). For two-class problems these two variants are equivalent.
In problems with more than two classes, trainPVM currently implements only PVMs based on classification rules derived from weighted kernel-based SVMs without
bias terms. In two-class problems, rules with (default) or without bias terms can be used, with this choice controlled by the argument withbias. In problems
with more than two classes, dropping the bias terms has some computational advantages, and it has been argued that it should not have a noticeable impact on
statistical performance (see references [1] and [2] for further discussion).
The amount of regularization provided by the underlying SVMs is controlled by the value of the hyper-parameters C or lambda. While C (the
cost of constraint violations) is the traditional regularization hyper-parameter used in the majority of the SVM literature, there are also alternative formulations
in which SVM training is presented as a particular case of regularized function estimation in functional analysis. In this perspective,
the SVM constraint violations are viewed as unit-cost losses, and the margin component of the traditional SVM criterion is understood as a complexity penalty
that is weighted by the regularization hyper-parameter lambda. The two formulations are mathematically equivalent if lambda is set to 1/(2 n C),
or if C is set to 1/(2 n lambda), with n being the number of observations in the training data. Note that, as C and lambda are inversely
related, higher values of lambda and lower values of C correspond to higher regularization and smoother classification rules, while lower lambda
values and higher C values lead to more complex (and flexible) rules with better training sample performance, but possibly with worse generalization potential.
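The conversion between the two hyper-parameters stated above can be written as a pair of one-line functions. The helper names C_to_lambda and lambda_to_C are hypothetical and not part of ProbSVMs; the sketch only restates the relations lambda = 1/(2 n C) and C = 1/(2 n lambda).

```r
# Hypothetical helpers (not part of ProbSVMs) for the stated equivalence
# lambda = 1 / (2 n C)  and  C = 1 / (2 n lambda).
C_to_lambda <- function(C, n) 1 / (2 * n * C)
lambda_to_C <- function(lambda, n) 1 / (2 * n * lambda)

n <- 200                      # number of training observations (example value)
C <- c(0.1, 1, 10)
lambda <- C_to_lambda(C, n)
lambda                        # decreases as C increases
lambda_to_C(lambda, n)        # recovers the original C values
```

Because the conversion depends on n, the same value of C corresponds to a different lambda whenever the size of the training sample changes.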
In trainPVM the values of C and lambda can be set by the user, or tuned automatically to the training data or to additional data provided by arguments tunex,
tuney, or tuneK. In that case, C and lambda are found by optimizing the log-likelihood of the class probability estimates on the tuning data.
However, the tuning procedure is computationally intensive and can be very time consuming. See reference [2] for further details.
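The tuning criterion mentioned above, the log-likelihood of the class probability estimates on the tuning data, can be sketched in base R as follows. The helper name class_loglik, the made-up probability matrix, and the small floor eps are assumptions of this illustration; the exact computation inside trainPVM may differ.

```r
# Hypothetical helper: log-likelihood of estimated class probabilities.
# prob: matrix with observations in rows and classes in columns
# y:    factor with the true classes, levels in the column order of prob
# eps:  floor that avoids log(0) (an assumption of this sketch)
class_loglik <- function(prob, y, eps = 1e-12) {
  # Pick, for each observation, the estimated probability of its true class
  p_true <- prob[cbind(seq_along(y), as.integer(y))]
  sum(log(pmax(p_true, eps)))
}

y <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
prob <- rbind(c(0.7, 0.2, 0.1),
              c(0.1, 0.8, 0.1),
              c(0.2, 0.3, 0.5),
              c(0.4, 0.4, 0.2))
class_loglik(prob, y)   # log(0.7) + log(0.8) + log(0.5) + log(0.4)
```

Higher (less negative) values indicate probability estimates that put more mass on the observed classes.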
An object of class “PVM” containing the trained Probabilistic Vector Machine. Class “PVM” has a predict method that estimates class probabilities from “PVM” objects and data frames or kernel matrices of new (or training) data.
An object inheriting from class “PVM” is a list containing at least the following components:
the number of different classes.
the value of argument kernel in the call to trainPVM.
the value of argument kpar in the call to trainPVM.
the grid matrix with the class-weight specifications used to train the PVM.
a list with as many components as the number of different weighted SVMs trained.
Each component of this list is an object of class kernelSVM (if argument withbias is FALSE) or class
kernelksvms (if argument withbias is TRUE) containing the trained SVM for a particular weight specification.
[1] Dogan, U.; Glasmachers, T. and Igel, C. (2016) A unified view on multi-class support vector classification. Journal of Machine Learning Research, Vol. 17 (45), 1–32.
[2] Duarte Silva, A.P. (2025) Probabilistic Vector Machines. Computers and Operations Research, Vol. 183, 107203. <doi:10.1016/j.cor.2025.107203>
[3] Lee, Y.; Lin, Y. and Wahba, G. (2004) Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, Vol. 99, 67–81. <doi:10.1198/016214504000000098>
[4] Weston, J. and Watkins, C. (1999) Support vector machines for multi-class pattern recognition. In Proceedings of the 7th European Symposium on Artificial Neural Networks, 219–224.
predict.PVM, trainSVM, SetUpOptPar, GetrbfdotSigPar, plot.ClassProb
## The following examples are based on the MASS data set "crabs".
## This data records physical measurements on 200 specimens of
## Leptograpsus variegatus crabs observed on the shores
## of Western Australia. The crabs are classified by two factors, sex and sp
## (crab species as defined by its colour: blue or orange), with two levels
## each. The measurement variables include the natural logarithms of carapace length (CL),
## the carapace width (CW), the size of the frontal lobe (FL) and the size of
## the rear width (RW). In the analysis provided, we created four classes
## by crossing the sex and sp levels.

library(ProbSVMs)
library(MASS)
data(crabs)
crabs$grp <- interaction(crabs$sex,crabs$sp)
crabs$lnFL <- log(crabs$FL)
crabs$lnRW <- log(crabs$RW)
crabs$lnCL <- log(crabs$CL)
crabs$lnCW <- log(crabs$CW)
crabs$lnBD <- log(crabs$BD)

# Estimate class probabilities based on the WW loss
WWpvm <- trainPVM(crabs[,10:14],crabs$grp)
WWprob <- predict(WWpvm,newdata=crabs[,10:14],trndt="newdata")

# Display the probabilities of the predicted classes
plot(WWprob, , trdata=crabs[,10:14], grouping=crabs$grp)
plot(WWprob, , trdata=crabs[,10:14], grouping=crabs$grp, type="stckdbarplt")
WWprob

# Repeat the analysis, using the first 45 observations in each class for training,
# and the last 5 for probability estimation
trind <- c(1:45,51:95,101:145,151:195)
evalind <- c(46:50,96:100,146:150,196:200)
trdata <- crabs[trind,10:14]
trgrp <- crabs$grp[trind]
evaldata <- crabs[evalind,10:14]
WWpvm1 <- trainPVM(trdata,trgrp)
WWprob1 <- predict(WWpvm1,newdata=evaldata,trndt=trdata)
plot(WWprob1, type="stckdbarplt")
WWprob1
trainSVM creates objects of class kernelSVM or class kernelSVMs
by training one or several kernel-based (possibly weighted) Support Vector Machines
for general multiclass problems, using an 'all-in-one' global model approach.
trainSVM(x=NULL, y, class.weights=rep(1.,length(levels(y))), scaled=TRUE, K=NULL,
  loss=c("WW","LLW"), kernel=c("rbfdot","vanilladot","polydot"),
  kpar=list(sigma="d2median"), C=1., lambda=NULL, keepdt=TRUE, retotpst=FALSE,
  OptCntrl=SetUpOptPar(), TunCntrl=SetUpTunPar() )
x |
a matrix or data frame containing the training data, with observations in rows and variables in columns.
If x is equal to NULL, a kernel matrix should be provided directly in argument |
y |
a factor describing the response vector, with one label per observation. |
class.weights |
a vector or a matrix of class weights.
When class.weights is a vector, its length should equal the number of classes, and the i-th component represents the weight given
to the i-th class in the SVM loss function. For classical (non-weighted) SVMs all its components are equal to one (default).
In weighted SVM variants, the class.weights components may be different and should add up to one.
When |
scaled |
A logical flag indicating whether or not the variables should be scaled to unit variance. |
K |
A symmetric kernel matrix. When equal to NULL, the training data should be given by argument |
loss |
a string specifying the multiclass large margin loss to be employed. Currently the following alternatives are implemented: “WW” for the Weston and Watkins loss (see reference [6]), and “LLW” for the Lee, Lin and Wahba loss (see reference [4]). In two-class problems these two losses are equivalent to each other and to the classical hinge loss. In problems with more than two classes these losses differ: the “LLW” loss has better asymptotic properties, but the “WW” loss has been found empirically to give more reliable results in many practical applications. |
kernel |
the kernel function used for training and prediction. Currently this argument can be set to one of the following strings:
|
kpar |
the list of kernel hyper-parameters, that is, the parameters to be used with the kernel function. Valid parameters for existing kernels are:
|
C |
cost of constraints violation (default: 1). This is the "C"-constant of the regularization term in the Lagrange
formulation. When set to the string “tuneit” the value of "C" is tuned according
to the procedure defined by the value of argument |
lambda |
regularization parameter in the functional analysis formulation of SVMs. When set to a non-NULL value, overrides
the value of argument |
keepdt |
a logical flag indicating if the original training data should be returned together with the trained SVM(s) |
retotpst |
a logical flag indicating if internal optimization results and statistics should be returned together with the trained SVM(s) |
OptCntrl |
a list with several control arguments for the optimization algorithm used in the SVM(s) training. See |
TunCntrl |
a list with several arguments controlling the hyperparameter ( |
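Scaling to unit variance, as controlled by argument scaled, can be reproduced by hand, for instance before computing a kernel matrix to pass through K. The sketch below divides each column by its standard deviation without centering; whether trainSVM also centers the variables is not stated here, so that choice is an assumption. Note that base R's scale(x, center = FALSE) divides by the root mean square rather than the standard deviation, so the two are not interchangeable. The data are made up.

```r
set.seed(42)
# Three made-up variables with very different spreads
x <- cbind(rnorm(50, mean = 5, sd = 1),
           rnorm(50, mean = 5, sd = 10),
           rnorm(50, mean = 5, sd = 100))

# Unit-variance scaling without centering (assumed convention)
sds <- apply(x, 2, sd)
x_unit <- sweep(x, 2, sds, "/")
apply(x_unit, 2, sd)              # every column now has standard deviation one

# For comparison: scale() without centering uses the root mean square
x_rms <- scale(x, center = FALSE)
attr(x_rms, "scaled:scale")       # sqrt(sum(x^2) / (n - 1)) per column
sds                               # differs whenever the column mean is not zero
```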
trainSVM trains multiclass kernel-based (possibly weighted) Support Vector Machines (SVMs) without bias terms.
It implements Duarte Silva's adaptation (see reference [3]) of Dogan, Glasmachers and Igel's efficient training
algorithm (see reference [1]) for 'all-in-one' multiclass SVMs. In SVM problems with more than two classes, dropping
bias terms has considerable computational advantages, and it has been argued that the resulting rules have essentially
the same statistical properties (at least asymptotically for universal kernels) as the corresponding
rules with bias terms. See references [2], [3] and [5] for further discussion.
Currently there are two variants of multiclass SVMs implemented in trainSVM: (i) machines that use the SVM loss proposed by Weston and Watkins (see reference [6]), and (ii) machines that use the loss proposed by Lee, Lin and Wahba (see reference [4]). The latter variant has better asymptotic properties, but there is some empirical evidence suggesting that the former often gives more reliable results in applications (see, e.g., reference [2]). For two-class problems these two variants are equivalent to each other and to the classical SVM hinge loss.
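The two-class equivalence can be checked numerically with commonly used textbook forms of the two losses. The function names, the unit margin, and the sum-to-zero convention for the class scores are assumptions of this sketch; the scaling used inside trainSVM may differ, and class weights are left out.

```r
# Textbook forms of the two losses for a single observation (assumed scalings).
# f: vector of class scores, y: index of the true class.
ww_loss  <- function(f, y) sum(pmax(0, 1 - (f[y] - f[-y])))   # Weston and Watkins
llw_loss <- function(f, y) sum(pmax(0, 1 + f[-y]))            # Lee, Lin and Wahba; assumes sum(f) == 0
hinge    <- function(margin) pmax(0, 1 - margin)              # classical hinge loss

# Two classes with sum-to-zero scores f = (g, -g), true class 1
g <- seq(-2, 2, by = 0.5)
ww  <- sapply(g, function(v) ww_loss(c(v, -v), 1))
llw <- sapply(g, function(v) llw_loss(c(v, -v), 1))

all.equal(ww,  hinge(2 * g))   # WW is the hinge loss applied to the score difference
all.equal(llw, hinge(g))       # LLW is the hinge loss applied to the class-1 score
```

Under these assumptions both losses are hinge losses of a single decision function and differ only by a rescaling of that function, which is why they lead to the same family of two-class rules.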
The amount of regularization of the resulting SVMs is controlled by the value of the hyper-parameters C or lambda.
While C (the cost of constraint violations) is the traditional regularization hyper-parameter used in the majority of the SVM literature,
there are also alternative formulations in which SVM training is presented as a particular case of regularized function estimation
in functional analysis. Under this perspective, the SVM constraint violations are viewed as unit-cost losses, and the margin component
of the traditional SVM criterion is understood as a complexity penalty that is weighted by the regularization hyper-parameter lambda.
The two formulations are mathematically equivalent if lambda is set to 1/(2 n C), or if C is set to 1/(2 n lambda),
with n being the number of observations in the training data. Note that, as C and lambda are inversely
related, higher values of lambda and lower values of C correspond to higher regularization and smoother classification rules,
while lower lambda values and higher C values lead to more complex (and flexible) rules, with better training sample performance,
but potentially worse generalization ability.
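The stated equivalence can be verified by writing the two criteria side by side. In the sketch below, f is the decision function, ‖f‖ its norm in the space induced by the kernel, and L_i(f) the (possibly weighted) loss of observation i; this notation is assumed for illustration and is not taken from the package documentation.

```latex
\begin{align*}
\text{cost formulation:} \quad
  & \min_{f} \; \tfrac{1}{2}\,\lVert f \rVert^{2} + C \sum_{i=1}^{n} L_i(f) \\
\text{regularization formulation:} \quad
  & \min_{f} \; \frac{1}{n} \sum_{i=1}^{n} L_i(f) + \lambda\,\lVert f \rVert^{2} \\
\text{dividing the first criterion by } nC: \quad
  & \frac{1}{n} \sum_{i=1}^{n} L_i(f) + \frac{1}{2\,n\,C}\,\lVert f \rVert^{2}
  \;\;\Longrightarrow\;\; \lambda = \frac{1}{2\,n\,C}
\end{align*}
```

Dividing by the positive constant nC does not change the minimizer, so both criteria select the same rule when lambda and C are linked in this way.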
In trainSVM the values of C and lambda can be set by the user, or tuned automatically to the training or tuning data according
to the procedure defined by the value of argument TunCntrl. See the documentation of SetUpTunPar for further details.
An object inheriting from class “kernelSVM” containing the trained Support Vector Machine(s). It is
either a “kernelSVM” object, if argument class.weights is a vector or a single-row
matrix, or a “kernelSVMs” object otherwise. Class “kernelSVM” has a predict method
that gives class predictions from “kernelSVM” objects and data frames or kernel matrices of new (or training) data.
An object inheriting from class “kernelSVM” is a list containing, at least, the following components:
in “kernelSVM” classes, a matrix (observations in rows, classes in columns) containing the resulting support vectors. In “kernelSVMs” classes, a three-dimensional array, in which each level of the third dimension corresponds to a different trained SVM, and the first two dimensions contain the corresponding support vectors, with observations in rows and classes in columns.
the levels of the factor representing the different classes.
the value of the lambda regularization hyperparameter used in the SVM(s) training.
the value of the C regularization hyperparameter used in the SVM(s) training.
when argument keepdt is TRUE, a matrix or data frame containing the training data, with
observations in rows and variables in columns. Otherwise, the value NULL.
when argument scaled is TRUE, a vector with the variables' standard deviations that were used for
data scaling. Otherwise, the value NULL.
the value of argument kernel in the call to trainSVM.
the value of argument kpar in the call to trainSVM.
this component is only non-NULL when argument retotpst is set to TRUE. In that case it is a list with the following components:
the returned value of the optimization model criterion used for SVM training.
the number of iterations used by the SVM training optimization algorithm.
the classification hit rate (when known) of the trained SVM. Typically this value is only available when arguments C
or lambda are set to the string “tuneit”, and refers to the optimal hit rate obtained in the tuning data.
When C or lambda is not tuned, this component is set to NULL.
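A classification hit rate is the proportion of correctly classified observations, and it can be computed for any vector of predictions, such as those returned by the predict method. The sketch below uses made-up labels and base R only; it is not the internal computation of trainSVM.

```r
# Made-up true and predicted labels for six observations
truth <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
pred  <- factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))

hit_rate <- mean(pred == truth)    # proportion of correct classifications
hit_rate                           # 4 of the 6 labels agree

# Cross-tabulation of predicted against true classes
confusion <- table(predicted = pred, true = truth)
confusion
```

The diagonal of the cross-tabulation counts the hits per class, which shows where the misclassifications occur.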
[1] Dogan, U.; Glasmachers, T. and Igel, C. (2011) Fast training of multi-class support vector machines. Technical report. Department of Computer Sciences, University of Copenhagen.
[2] Dogan, U.; Glasmachers, T. and Igel, C. (2016) A unified view on multi-class support vector classification. Journal of Machine Learning Research, Vol. 17 (45), 1–32.
[3] Duarte Silva, A.P. (2025) Probabilistic Vector Machines. Computers and Operations Research, Vol. 183, 107203. <doi:10.1016/j.cor.2025.107203>
[4] Lee, Y.; Lin, Y. and Wahba, G. (2004) Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, Vol. 99, 67–81. <doi:10.1198/016214504000000098>
[5] Poggio, T.; Mukherjee, S.; Rifkin, R.; Rakhlin, A. and Verri, A. (2002). B. In Uncertainty in Geometric Computations, Springer US.
[6] Weston, J. and Watkins, C. (1999) Support vector machines for multi-class pattern recognition. In Proceedings of the 7th European Symposium on Artificial Neural Networks, 219–224.
predict.kernelSVM, SetUpOptPar, SetUpTunPar, GetrbfdotSigPar.
## The following examples are based on the MASS data set "crabs".
## This data records physical measurements on 200 specimens of
## Leptograpsus variegatus crabs observed on the shores
## of Western Australia. The crabs are classified by two factors, sex and sp
## (crab species as defined by its colour: blue or orange), with two levels
## each. The measurement variables include the natural logarithms of carapace length (CL),
## the carapace width (CW), the size of the frontal lobe (FL) and the size of
## the rear width (RW). In the analysis provided, we created four classes
## by crossing the sex and sp levels.

library(ProbSVMs)
library(MASS)
data(crabs)
crabs$grp <- interaction(crabs$sex,crabs$sp)
crabs$lnFL <- log(crabs$FL)
crabs$lnRW <- log(crabs$RW)
crabs$lnCL <- log(crabs$CL)
crabs$lnCW <- log(crabs$CW)
crabs$lnBD <- log(crabs$BD)

# Train the WW and LLW SVMs, with their default settings, on the crabs data
WWsvm <- trainSVM(crabs[,10:14],crabs$grp,loss="WW")
LLWsvm <- trainSVM(crabs[,10:14],crabs$grp,loss="LLW")

# Get in-sample classification results
WWpred <- predict(WWsvm,newdata=crabs[,10:14])
WWpred
LLWpred <- predict(LLWsvm,newdata=crabs[,10:14])
LLWpred

# Compare classifications with true assignments
print(WWpred==crabs$grp)
print(LLWpred==crabs$grp)

# Repeat the analysis, by tuning
# the regularization hyper-parameter to the training data
WWsvm1 <- trainSVM(crabs[,10:14],crabs$grp,C="tuneit")
WWpred1 <- predict(WWsvm1,newdata=crabs[,10:14])
WWpred1
print(WWpred1==crabs$grp)
LLWsvm1 <- trainSVM(crabs[,10:14],crabs$grp,C="tuneit",loss="LLW")
LLWpred1 <- predict(LLWsvm1,newdata=crabs[,10:14])
LLWpred1
print(LLWpred1==crabs$grp)

# Repeat the analysis, for the WW loss only, using the first 45 observations
# in each class for training, and the last 5 for prediction
trind <- c(1:45,51:95,101:145,151:195)
evalind <- c(46:50,96:100,146:150,196:200)
trdata <- crabs[trind,10:14]
trgrp <- crabs$grp[trind]
evaldata <- crabs[evalind,10:14]
WWsvm2 <- trainSVM(trdata,trgrp,C="tuneit")
WWpred2 <- predict(WWsvm2,newdata=evaldata)
WWpred2
print(WWpred2==crabs$grp[evalind])

# Now, tune C by a cross-validated estimate of the error rate in the training data
WWsvm3 <- trainSVM(trdata,trgrp,C="tuneit",TunCntrl=list(crossval=TRUE))
WWpred3 <- predict(WWsvm3,newdata=evaldata)
WWpred3
print(WWpred3==crabs$grp[evalind])

## Repeat the analysis by using a pre-computed kernel matrix given by the K argument
## (note: this is recommended in multiple analyses based on the same data, in order
## to avoid unnecessary repeated evaluations of the kernel function)

# Gram matrix, with sigma automatically adjusted to the scaled training data
GramMat <- makeKMat(scale(crabs[,10:14],center=FALSE),kernel="rbfdot",kpar=list(sigma="d2median"))

# All data as training data
WWsvm4 <- trainSVM(K=GramMat,y=crabs$grp,C="tuneit")
WWpred4 <- predict(WWsvm4,KMat=GramMat)
WWpred4
print(WWpred4==crabs$grp)

# Split 45 observations per group for training and 5 observations per group for prediction
WWsvm5 <- trainSVM(K=GramMat[trind,trind],y=crabs$grp[trind],C="tuneit")
WWpred5 <- predict(WWsvm5,KMat=GramMat[evalind,trind])
WWpred5
print(WWpred5==crabs$grp[evalind])