Title: | Rating Scale Reduction Procedure |
---|---|
Description: | Describes a new procedure of reducing items in a rating scale called Rating Scale Reduction (RSR). The new stop criterion in RSR procedure is added (stop global max). The function order is replaced by sort.list. |
Authors: | Waldemar W. Koczkodaj, Feng Li, Alicja Wolny-Dominiak |
Maintainer: | Alicja Wolny-Dominiak <[email protected]> |
License: | GPL-2 |
Version: | 1.4 |
Built: | 2024-10-31 20:35:22 UTC |
Source: | CRAN |
This package describes a procedure for reducing the number of items in a rating scale. It was published in the reference included in this description. The method was proposed by Waldemar W. Koczkodaj and published by a sizable collaboration coordinated by him.
1. W. W. Koczkodaj, T. Kakiashvili, A. Szymanska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, D. Strzalka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 111(2):581-593 (open access), 2017.
https://link.springer.com/article/10.1007/s11192-017-2283-4
2. T. Kakiashvili, W. W. Koczkodaj, and M. Woodbury-Smith. Improving the medical scale predictability by the pairwise comparisons method: Evidence from a clinical data study. Computer Methods and Programs in Biomedicine, 105(3), 2012.
https://www.sciencedirect.com/science/article/abs/pii/S0169260711002586
3. X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Muller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 2011.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
4. E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, pages 837-845, 1988.
The attribute is checked for AUC before it is added to the running total. The running total is used with the class (decision attribute) to compute AUC. The next attribute is added to the sequence of attributes having the maximum total AUC.
    CheckAttr4Inclusion(attribute, D, plotCheck = FALSE,
                        method = c("delong", "bootstrap", "venkatraman", "sensitivity", "specificity"),
                        boot.n, alternative = c("two.sided", "less", "greater"))
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|
D | the decision vector |
plotCheck | if TRUE, a plot with two ROC curves is created |
method | the method to use, as in the function roc.test from the package pROC |
boot.n | the number of bootstrap replications |
alternative | the alternative hypothesis |

Value:

test | the result of roc.test, as in the function roc.test from the package pROC |
---|---|
Waldemar W. Koczkodaj, Feng Li, Alicja Wolny-Dominiak
1. E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, pages 837-845, 1988.
2. W. W. Koczkodaj, T. Kakiashvili, A. Szymanska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, D. Strzalka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 111(2):581-593 (open access), 2017.
https://link.springer.com/article/10.1007/s11192-017-2283-4
    # Create the attribute data frame and the decision vector;
    # all values must be numeric (as.numeric())
    library(pROC)                  # provides the aSAH data set
    library(RatingScaleReduction)
    data(aSAH)
    attribute <- data.frame(
      a1 = as.numeric(aSAH$gender),
      a2 = as.numeric(aSAH$age),
      a3 = as.numeric(aSAH$wfns),
      a4 = as.numeric(aSAH$s100b),
      a5 = as.numeric(aSAH$ndka)
    )
    decision <- as.numeric(aSAH$outcome)

    # DeLong test, two-sided alternative hypothesis
    CheckAttr4Inclusion(attribute, decision, method = "delong", alternative = "two.sided")

    # Bootstrap test, two-sided alternative hypothesis
    # CheckAttr4Inclusion(attribute, decision, method = "bootstrap", boot.n = 500)
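The comparison behind this check can be sketched directly with pROC. The snippet below is only an illustration and not the package's implementation: it assumes that the running total is a plain sum of attribute values, and the choice of wfns as the current total and s100b as the candidate attribute is arbitrary.

```r
library(pROC)
data(aSAH)

d <- as.numeric(aSAH$outcome)

# Assumed running total before and after adding a candidate attribute
total_before <- as.numeric(aSAH$wfns)
total_after  <- total_before + as.numeric(aSAH$s100b)

# ROC curves of both totals against the same decision vector
roc_before <- roc(d, total_before, quiet = TRUE)
roc_after  <- roc(d, total_after, quiet = TRUE)

# DeLong test for the difference between the two correlated AUCs
res <- roc.test(roc_before, roc_after, method = "delong", alternative = "two.sided")
res$p.value
```

A small p-value would suggest that adding the candidate attribute changes the AUC of the total by more than chance alone would explain.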
Datasets often contain replications. In the extreme case, one example may be replicated n times, where n is the total number of examples, so that there are no other examples. Such a situation would distort the computations and should be detected early. Ideally, no example should be replicated, but if the replication rate is small, we can proceed to computing AUC.
    diffExamples(attribute)
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|

Value:

total.examples | the number of examples in the data |
---|---|
diff.examples | the number of different examples in the data |
dup.exapmles | the number of duplicate examples in the data |
Waldemar W. Koczkodaj, Feng Li, Alicja Wolny-Dominiak
    # Create the attribute data frame; all values must be numeric (as.numeric())
    library(pROC)                  # provides the aSAH data set
    library(RatingScaleReduction)
    data(aSAH)
    attribute <- data.frame(
      a1 = as.numeric(aSAH$gender),
      a2 = as.numeric(aSAH$age),
      a3 = as.numeric(aSAH$wfns),
      a4 = as.numeric(aSAH$s100b),
      a5 = as.numeric(aSAH$ndka)
    )

    # Show the number of different examples
    diffExamples(attribute)
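The kind of count described above can be reproduced in base R. This is a minimal sketch on a made-up table; whether diffExamples counts duplicates in exactly this way is an assumption, and the variable names are illustrative only.

```r
# Toy attribute table: row 3 repeats row 1, and row 5 repeats row 2
attribute <- data.frame(
  a1 = c(1, 2, 1, 3, 2),
  a2 = c(0, 1, 0, 1, 1)
)

total_examples <- nrow(attribute)
# duplicated() flags every row that repeats an earlier row
dup_examples <- sum(duplicated(attribute))
diff_examples <- total_examples - dup_examples

# Here: 5 examples in total, 3 different, 2 duplicates
c(total = total_examples, different = diff_examples, duplicate = dup_examples)
```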
A subset of the data consisting of examples that have identical values on all attributes. The class attribute (also called the decision attribute), which has two permitted values, positive and negative, is excluded from the comparison.
    grayExamples(attribute, D)
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|
D | the decision vector |

Value:

1 | a list of pairs of examples that are identical on all attributes |
---|---|
Waldemar W. Koczkodaj, Alicja Wolny-Dominiak
    # Generate data: three attributes with values 1-3 and a binary decision
    library(RatingScaleReduction)
    attribute <- data.frame(
      a1 = sample(c(1, 2, 3), 100, replace = TRUE),
      a2 = sample(c(1, 2, 3), 100, replace = TRUE),
      a3 = sample(c(1, 2, 3), 100, replace = TRUE)
    )
    names(attribute)
    decision <- sample(c(0, 1), 100, replace = TRUE)

    # Check for examples that are identical on all attributes
    grayExamples(attribute, decision)
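To make the idea concrete, groups of examples that agree on every attribute can be found in base R as follows. This is a hedged sketch, not the code of grayExamples: the key construction and the extra check for disagreeing decisions are assumptions made for illustration.

```r
# Toy data: rows 1 and 3 are identical on all attributes, as are rows 2 and 5
attribute <- data.frame(
  a1 = c(1, 2, 1, 3, 2),
  a2 = c(0, 1, 0, 1, 1)
)
decision <- c(0, 1, 1, 0, 1)

# One text key per example, built from all of its attribute values
key <- do.call(paste, c(attribute, sep = "_"))

# Keep only the examples whose key occurs more than once
repeated <- key %in% key[duplicated(key)]

# Row indices of identical examples, grouped by key
groups <- split(which(repeated), key[repeated])

# Flag the groups whose members disagree on the decision
conflicting <- vapply(groups,
                      function(idx) length(unique(decision[idx])) > 1,
                      logical(1))
groups
conflicting
```

In this toy table, rows 1 and 3 share their attribute values but carry different decisions, while rows 2 and 5 agree on both.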
This package implements a rather sophisticated method published in (Koczkodaj et al., 2017). In essence, it is a stepwise method for maximizing the area under the curve (AUC) of the receiver operating characteristic (ROC). In this description, data mining terminology will be used:
examples (observations in statistics),
attributes (variables in statistics),
class or decision attribute (the term decision variable may be used in statistics).
The implemented algorithm (when reduced to its minimum) amounts to looping over all attributes (with the class excluded) to compute the AUC of each. Subsequently, attributes are sorted in descending order by AUC. The attribute with the largest AUC is added to a subset S of attributes (evidently, S cannot be empty since it is supposed to be the minimum subset of all attributes with the maximum AUC). We keep adding the next attribute in line (according to AUC) to the subset S and checking the AUC of S. If it decreases, we stop the procedure. The above procedure can be described by the following algorithm.
Algorithm:
1. compute the AUC of all attributes excluding the class
2. sort the attributes by their AUC in descending order
3. add the attribute with the largest AUC to subset S
4. add the next attribute A with the largest AUC to subset S
5. if the AUC of subset S is larger than its former AUC, then go to step 4; otherwise stop
A number of checks (e.g., whether the dataset is empty or full of replications) are also involved.
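The stepwise idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: the helpers `auc_rank` and `rsr_sketch` are hypothetical, the decision is assumed to be coded 0/1 with larger scores indicating the positive class, the AUC is computed with the Mann-Whitney rank formula instead of pROC, and the subset is scored by the plain sum of its attributes.

```r
# AUC via the Mann-Whitney rank statistic; tied scores receive average ranks
auc_rank <- function(score, d) {
  n_pos <- sum(d == 1)
  n_neg <- sum(d == 0)
  (sum(rank(score)[d == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: add attributes by decreasing AUC, stop when AUC stops growing
rsr_sketch <- function(attribute, d) {
  single <- sapply(attribute, auc_rank, d = d)   # AUC of every single attribute
  ord <- order(single, decreasing = TRUE)        # largest AUC first
  kept <- ord[1]                                 # start with the best attribute
  total <- attribute[[ord[1]]]
  best <- single[[ord[1]]]
  for (j in ord[-1]) {
    candidate <- total + attribute[[j]]          # running total with the next attribute
    auc_new <- auc_rank(candidate, d)
    if (auc_new <= best) break                   # no improvement: stop
    kept <- c(kept, j)
    total <- candidate
    best <- auc_new
  }
  list(items = names(attribute)[kept], auc = best)
}

# Made-up data: a1 and a2 are informative, a3 is noise
d <- c(0, 0, 0, 0, 1, 1, 1, 1)
attribute <- data.frame(
  a1 = c(1, 2, 1, 2, 2, 3, 3, 3),
  a2 = c(1, 1, 2, 1, 2, 1, 2, 2),
  a3 = c(2, 1, 2, 1, 1, 2, 1, 2)
)
res <- rsr_sketch(attribute, d)
res$items   # "a1" "a2"
res$auc     # 1
```

On this toy data the single-attribute AUC values are 0.9375, 0.75 and 0.5; adding a2 to a1 raises the AUC of the total to 1, and adding a3 would lower it, so the sketch stops with two items.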
    rsr(attribute, D, plotRSR = FALSE, method = c('Stop1Max', 'StopGlobalMax'))
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|
D | the decision vector |
plotRSR | if TRUE, the ROC curve is plotted |
method | the stop criterion for the reduction: first maximum of AUC ('Stop1Max') or global maximum of AUC ('StopGlobalMax'); default: 'Stop1Max' |

Value:

rsr.auc | total AUC of attributes |
---|---|
rsr.label | attribute labels |
summary | a summary table |
Waldemar W. Koczkodaj, Alicja Wolny-Dominiak
1. W. W. Koczkodaj, T. Kakiashvili, A. Szymanska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, D. Strzalka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 111(2):581-593 (open access), 2017.
https://link.springer.com/article/10.1007/s11192-017-2283-4
2. T. Kakiashvili, W. W. Koczkodaj, and M. Woodbury-Smith. Improving the medical scale predictability by the pairwise comparisons method: Evidence from a clinical data study. Computer Methods and Programs in Biomedicine, 105(3), 2012.
https://www.sciencedirect.com/science/article/abs/pii/S0169260711002586
3. X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Muller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 2011.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
    # Create the attribute data frame and the decision vector;
    # all values must be numeric (as.numeric())
    library(pROC)                  # provides the aSAH data set
    library(RatingScaleReduction)
    data(aSAH)
    attribute <- data.frame(
      a1 = as.numeric(aSAH$gender),
      a2 = as.numeric(aSAH$age),
      a3 = as.numeric(aSAH$wfns),
      a4 = as.numeric(aSAH$s100b),
      a5 = as.numeric(aSAH$ndka)
    )
    decision <- as.numeric(aSAH$outcome)

    # Rating scale reduction procedure
    rsred <- rsr(attribute, decision, plotRSR = TRUE)
    rsred
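The difference between the two stop criteria can be illustrated on a vector of running-total AUC values. The numbers below are made up and are not output of rsr; the sketch assumes that 'Stop1Max' stops just before the first decrease of the total AUC and that 'StopGlobalMax' takes the overall maximum.

```r
# Hypothetical total AUC after adding the 1st, 2nd, ... attribute
total_auc <- c(0.70, 0.78, 0.76, 0.81, 0.79)

# First maximum: the position just before the first decrease
first_drop <- which(diff(total_auc) < 0)[1]
stop_first_max <- if (is.na(first_drop)) length(total_auc) else first_drop

# Global maximum: the position of the largest total AUC overall
stop_global_max <- which.max(total_auc)

c(first_max = stop_first_max, global_max = stop_global_max)
```

With these values the first criterion keeps two attributes, while the global criterion keeps four, because the total AUC recovers after a temporary dip.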
Computes the AUC of every single attribute.
    startAuc(attribute, D)
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|
D | the decision vector |

Value:

auc | AUC of a single attribute |
---|---|
item | attribute labels |
summary | a summary table |
Waldemar W. Koczkodaj, Alicja Wolny-Dominiak
1. W. W. Koczkodaj, T. Kakiashvili, A. Szymanska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, D. Strzalka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 111(2):581-593 (open access), 2017.
https://link.springer.com/article/10.1007/s11192-017-2283-4
2. X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Muller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 2011.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
    # Create the attribute data frame and the decision vector;
    # all values must be numeric (as.numeric())
    library(pROC)                  # provides the aSAH data set
    library(RatingScaleReduction)
    data(aSAH)
    attribute <- data.frame(
      a1 = as.numeric(aSAH$gender),
      a2 = as.numeric(aSAH$age),
      a3 = as.numeric(aSAH$wfns),
      a4 = as.numeric(aSAH$s100b),
      a5 = as.numeric(aSAH$ndka)
    )
    decision <- as.numeric(aSAH$outcome)

    # Compute the AUC of all attributes
    start <- startAuc(attribute, decision)
    start$summary
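For reference, the empirical AUC of a single attribute can be written as a Mann-Whitney statistic. This is the standard textbook definition rather than a formula taken from the package documentation, and the symbols are introduced here only for illustration: P and N are the sets of positive and negative examples, n_+ and n_- are their sizes, and a_i is the value of the attribute for example i. The formula assumes that larger attribute values indicate the positive class; pROC's roc() by default selects the comparison direction automatically, so a value reported in practice may correspond to the reversed comparison.

```latex
\mathrm{AUC}(a) = \frac{1}{n_{+}\, n_{-}}
  \sum_{i \in P} \sum_{j \in N}
  \left[ \mathbf{1}(a_i > a_j) + \tfrac{1}{2}\, \mathbf{1}(a_i = a_j) \right]
```

Each positive-negative pair contributes 1 when the positive example has the larger value and 1/2 when the two values are tied, so an attribute that separates the classes perfectly has an AUC of 1.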
AUC values are computed for all individual attributes and sorted in ascending order. We begin with the attribute having the largest AUC and add to it the second, third, ... attribute until the AUC of their total decreases.
    totalAuc(attribute, D, plotT = FALSE)
Arguments:

attribute | a matrix or data.frame containing attributes |
---|---|
D | the decision vector |
plotT | if TRUE, a plot is created with attribute labels on the x axis and total AUC in ascending order on the y axis |

Value:

ordered.attribute | ordered attribute matrix |
---|---|
total.auc | total AUC |
item | ordered attribute labels |
summary | a summary table |
Waldemar W. Koczkodaj, Alicja Wolny-Dominiak
1. W. W. Koczkodaj, T. Kakiashvili, A. Szymanska, J. Montero-Marin, R. Araya, J. Garcia-Campayo, K. Rutkowski, D. Strzalka. How to reduce the number of rating scale items without predictability loss? Scientometrics, 111(2):581-593 (open access), 2017.
https://link.springer.com/article/10.1007/s11192-017-2283-4
2. T. Kakiashvili, W. W. Koczkodaj, and M. Woodbury-Smith. Improving the medical scale predictability by the pairwise comparisons method: Evidence from a clinical data study. Computer Methods and Programs in Biomedicine, 105(3), 2012.
https://www.sciencedirect.com/science/article/abs/pii/S0169260711002586
3. X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, and M. Muller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 2011.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-77
    # Create the attribute data frame and the decision vector;
    # all values must be numeric (as.numeric())
    library(pROC)                  # provides the aSAH data set
    library(RatingScaleReduction)
    data(aSAH)
    attribute <- data.frame(
      a1 = as.numeric(aSAH$gender),
      a2 = as.numeric(aSAH$age),
      a3 = as.numeric(aSAH$wfns),
      a4 = as.numeric(aSAH$s100b),
      a5 = as.numeric(aSAH$ndka)
    )
    decision <- as.numeric(aSAH$outcome)

    # Arrange the start AUC in ascending order and compute the total AUC
    # according to the Rating Scale Reduction procedure
    tot <- totalAuc(attribute, decision, plotT = TRUE)
    tot$summary