Package 'ConsRankClass' reference manual

Title:	Classification and Clustering of Preference Rankings
Description:	Tree-based classification and soft-clustering method for preference rankings, with tools for external validation of fuzzy clustering. It contains the recursive partitioning algorithm for preference rankings, non-parametric tree-based method for a matrix of preference rankings as a response variable. It contains also the distribution-free soft clustering method for preference rankings, namely the K-median cluster component analysis (CCA). The package depends on the 'ConsRank' R package. Options for validate the tree-based method are both test-set procedure and V-fold cross validation. The package contains the routines to compute the adjusted concordance index (a fuzzy version of the adjusted rand index) and the normalized degree of concordance (the corresponding fuzzy version of the rand index). Essential references: D'Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G., and Siciliano, R. (2021) <doi:10.1007/s00357-020-09367-0> D'Ambrosio, A., and Heiser, W.J. (2019) <doi:10.1007/s41237-018-0069-5>; D'Ambrosio, A., and Heiser W.J. (2016) <doi:10.1007/s11336-016-9505-1>; Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012) <doi:10.1109/TFUZZ.2011.2179303>.
Authors:	Antonio D'Ambrosio [aut, cre]
Maintainer:	Antonio D'Ambrosio <[email protected]>
License:	GPL-3
Version:	1.0.1
Built:	2025-01-24 06:42:31 UTC
Source:	CRAN

K-Median Cluster Component Analysis

Description

K-Median Cluster Component Analysis, a distribution-free soft-clustering method for preference rankings.

Usage

cca(X, k, control = ccacontrol(...), ...)
cca(X, k, control = ccacontrol(...), ...)

Arguments

`X`	A n by m data matrix containing preference rankings, in which there are n judges and m objects to be judged. Each row is a ranking of the objects which are represented by the columns.
`k`	The number of cluster components
`control`	a list of options that control details of the `cca` algorithm governed by the function `ccacontrol`. The options govern maximum number of iterations of `cca` (itercca=1 is the default), the algorithm chosen to compute the median ranking (default, "quick"), and other options related to the consrank algorithm, which is called by `cca`
`...`	arguments passed bypassing `ccacontrol`

Details

The user can use any algorithm implemented in the consrank function from the ConsRank package. All algorithms allow the user to set the option 'full=TRUE' if the median ranking(s) must be searched in the restricted space of permutations instead of in the unconstrained universe of rankings of n items including all possible ties. There are two classification uncertainty measures: Us and Uprods. "Us" is the geometric mean of the membership probabilities of each individual, normalized in such a way that in the case of maximum uncertainty Us=1. "Ucca" is the average of all the "Us". "Uprods" is the product of the membership probabilities of each individual, normalized in such a way that in the case of maximum uncertainty Uprods=1. "Uprodscca" is the average of all the "Uprods".

Value

An object of the class "cca". It contains:

pk		the membership probability matrix
clc		cluster centers
oclc		cluster centers in terms of orderings
idc		crisp partition: id of the cluster component associated with the highest membership probability
Hcca		Global homogeneity measure (tau_X rank correlation coefficient)
hk		Homogeneity within cluster
props		estimated proportion of cases within cluster
Us		Uncertainty measure per-individual (see details)
Ucca		Global uncertainty measure
Uprods		Uncertainty measure per-individual (see details)
Uprodscca		Global uncertainty measure
consrankout		complete output of rank aggregation algorithm, containing eventually multiple median rankings

Author(s)

Antonio D'Ambrosio [email protected]

References

D'Ambrosio, A. and Heiser, W.J. (2019). A Distribution-free Soft Clustering Method for Preference Rankings. Behaviormetrika , vol. 46(2), pp. 333–351, DOI: 10.1007/s41237-018-0069-5

Heiser W.J., and D'Ambrosio A. (2013). Clustering and Prediction of Rankings within a Kemeny Distance Framework. In Berthold, L., Van den Poel, D, Ultsch, A. (eds). Algorithms from and for Nature and Life.pp-19-31. Springer international. DOI: 10.1007/978-3-319-00035-0_2.

Ben-Israel, A., and Iyigun, C. (2008). Probabilistic d-clustering. Journal of Classification, 25(1), pp.5-26. DOI: 10.1007/s00357-008-9002-z

Examples


data(Irish)
set.seed(135) #for reproducibility
# CCA with four components
ccares <- cca(Irish$rankings, 4, itercca=10)
summary(ccares)



data(Irish)
set.seed(135) #for reproducibility
# CCA with four components
ccares <- cca(Irish$rankings, 4, itercca=10)
summary(ccares)

Utility function

Description

Utility function to use to set the control arguments of cca

Usage

ccacontrol(
  algorithm = "quick",
  full = FALSE,
  itercca = 1,
  consrankitermax = 10,
  np = 15,
  gl = 100,
  ff = 0.4,
  cr = 0.9,
  proc = FALSE,
  ps = FALSE
)
ccacontrol(
  algorithm = "quick",
  full = FALSE,
  itercca = 1,
  consrankitermax = 10,
  np = 15,
  gl = 100,
  ff = 0.4,
  cr = 0.9,
  proc = FALSE,
  ps = FALSE
)

Arguments

`algorithm`	The algorithm used to compute the median ranking. One among"BB", "quick" (default), "fast" and "decor"
`full`	Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE
`itercca`	Number of iterations of cca
`consrankitermax`	Number of iterations for "fast" and "decor" algorithms. itermax=10 is the default option.
`np`	(for "decor" only) the number of population individuals. np=15 is the default option.
`gl`	(for"decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option.
`ff`	(for"decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option.
`cr`	(for"decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option.
`proc`	(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option
`ps`	If PS=TRUE, on the screen some information about how many branches are processed are displayed. Default value: FALSE

Value

A list containing all the control parameters

Author(s)

Antonio D'Ambrosio [email protected]

European Values Studies (EVS) data

Description

Random sub-sample of 3584 cases of the survey conducted in 1999 in 32 countries analyzed by Vermunt (2003).

Usage

data("EVS")data("EVS")

Format

The format is: List of 3

$ data:'data.frame': 1911 obs. of 11 variables:

country, gender ,yearbird, mstatus (marital status), eduage (age of education completion), employment (Employment status: ordinal scale 1-8), householdinc (Household income: ordinal scale 1-10), A (Maintain order in Nation),Give people more say in Government decisions, (C) Fight rising prices, (D) Protect freedom of speech.

$ predictors:'data.frame' with all the predictors

$ rankings : matrix with the preferencres for "A" (Maintain order in Nation), "B" (Give people more say in Government decisions), "C" (Fight rising prices), "D" (Protect freedom of speech).

Details

Rankings were obtained by applying the post-materialism scale developed by Inglehart (1977). The scale is based upon an experiment of the type “pick 2 out of 4” most important political goals for your Governments. For this reason, replace the 'NA's with 3 before using the rankings with codes 'ranktree' or 'cca' (see D'Ambrosio and Heiser, 2016). About the predictors, the coding of the Countries are: G1 (Austria, Denmark, Netherlands, Sweden), G2 (Belgium, Croatia, France, Greece, Ireland, Northern Ireland, Spain), G3 (Bulgaria, Czechnia, East, Germany, Finland, Iceland, Luxembourg, Malta, Portugal, Romania, Slovenia, West Germany), G4 (Belarus, Estonia, Hungary, Latvia, Lithuania, Poland, Russia, Slovakia, Ukraine). Coding of predictor "mstatus" are: mar (married), wid (widowed), div (divorced), sep (separated), nevm (never married).

Source

http://statisticalinnovations.com/technicalsupport/choice_datasets.html

References

Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33(1), 213–239.

Inglehart, R. (1977). The silent revolution: Changing values and political styles among Western Publics. Princeton, NJ: Princeton University Press.

D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.

Examples

data(EVS)

# EVS$rankings[is.na(EVS$rankings)] <- 3 #place unranked objects in a tie to the third position
# ccares <- cca(EVS$rankings,4) #solution with 4 components

data(EVS)

# EVS$rankings[is.na(EVS$rankings)] <- 3 #place unranked objects in a tie to the third position
# ccares <- cca(EVS$rankings,4) #solution with 4 components

Normalized Degree of Concordance (NDC) and Adjusted Concordance Index (ACI)

Description

Given two fuzzy (Ruspini) partitions, it compute the NDC and the ACI. NDC is the fuzzy version of the Rand Index, as well as ACI is the fuzzy version of the Adjusted Rand Index

Usage

fuzzyconcordance(P, Q, nperms = 1000)
fuzzyconcordance(P, Q, nperms = 1000)

Arguments

`P`	A fuzzy partition. It has to be a matrix with n rows and k columns. Each column is expression of the degree of membership of the i-th row over the k partitions (see details).
`Q`	A fuzzy partition. It has to be a matrix with n rows and h columns. Each column is expression of the degree of membership of the i-th row over the h partitions (see details).
`nperms`	number of permutations necessary to compute ACI. Default: 1000

Details

Both P and Q, or only one of those, can be crisp (or hard) partitions. In this case, each row must contain either 0 or 1, and the sum of the i-th row must be 1. In other words, either P or Q (or both) are expressed in terms of dummy coding. If both partitions are crisp, then NDC is equal to Rand Index and ACI is equal to Adjusted Rand Index. This function can be used to externally validate the output of any fuzzy clustering method

Value

A list containing:

ACI		the Adjusted Concordance Index
NDC		the Normalized Degree of Concordance

Author(s)

Antonio D'Ambrosio [email protected]

References

D’Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G. and Siciliano, R. (2021). Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions. Journal of Classification vol. 38(1), pp. 112–128 (2021). DOI: 10.1007/s00357-020-09367-0

Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012). Comparing fuzzy partitions: a generalization of the Rand index and related measures. IEEE Transactions on Fuzzy Systems, 20(3), 546–556. DOI: 10.1109/TFUZZ.2011.2179303

Examples

#two random fuzzy partitions
P = rbind(c(0.5259,  0.1656,    0.3085),
c(0.5623,    0.1036,    0.3341),
c(0.2508,    0.1849,    0.5643),
c(0.5654,    0.1934,    0.2413),
c(0.4529,    0.1679,    0.3792),
c(0.2390,    0.1758,    0.5852),
c(0.3114,    0.1743,    0.5143),
c(0.4188,    0.1392,    0.4420),
c(0.5830,    0.1655,    0.2514),
c(0.5860,    0.1171,    0.2969),
c(0.2630,    0.1706,    0.5664),
c(0.5882,    0.1032,    0.3086),
c(0.5829,    0.1277,    0.2894),
c(0.3942,    0.1046,    0.5012),
c(0.5201,    0.1097,    0.3702),
c(0.2568,    0.1823,    0.5609),
c(0.3687,    0.1695,    0.4618),
c(0.5663,    0.1317,    0.3020),
c(0.5169,    0.1950,    0.2881),
c(0.5838,    0.1034,    0.3128))

Q = rbind(c(0.4494,    0.3755,    0.1751),
c(0.5219,    0.3526,    0.1255),
c(0.3432,    0.5062,    0.1506),
c(0.3120,    0.5181,    0.1699),
c(0.5362,    0.2747,    0.1891),
c(0.4082,    0.3959,    0.1959),
c(0.4670,    0.3782,    0.1547),
c(0.4276,    0.4585,    0.1139),
c(0.4013,    0.4837,    0.1149),
c(0.3724,    0.5019,    0.1258),
c(0.5055,    0.3104,    0.1841),
c(0.4027,    0.4719,    0.1254),
c(0.3565,    0.4620,    0.1814),
c(0.6106,    0.2650,    0.1244),
c(0.5595,    0.2476,    0.1929),
c(0.4657,    0.3993,    0.1350),
c(0.2964,    0.5839,    0.1197),
c(0.5387,    0.3362,    0.1251),
c(0.4043,    0.4341,    0.1616),
c(0.5631,    0.2895,    0.1473))
         
ci <- fuzzyconcordance(P,Q) 

#generate a random fuzzy partition with two components (clusters)
Q2 <- matrix(runif(20),ncol=1)
Q2 <- cbind(Q2,1-Q2)

ci2 <- fuzzyconcordance(P,Q2)

#generate a random crisp partition
P2 <- t(rmultinom(20,1,c(0.3,0.3,0.4)))

ci3 <- fuzzyconcordance(P2,Q)
#--------------------
## Not run: 
# install.packages("Rankcluster")
library("Rankcluster") # model-based clustering algorithm for
                     #  ranking data by Biernacki and Jacques (2013)
                     #  <doi:10.1016/j.csda.2012.08.008>
data(APA)
set.seed(136) #for reproducibility
rcres <- rankclust(APA$data,K=3)  # solution with 3 centers, it takes about 75 seconds
##
ccares <- cca(APA$data,k=3) #solution with 3 components, it takes about 7 seconds
##
ci <- fuzzyconcordance(rcres[3]@tik,ccares$pk)
ci$ACI  # 0.0226 means that the two partitions are similar (see NDC below), 
        # but their similarity is mainly due to chance
ci$NDC

## End(Not run)
 
  
    
#two random fuzzy partitions
P = rbind(c(0.5259,  0.1656,    0.3085),
c(0.5623,    0.1036,    0.3341),
c(0.2508,    0.1849,    0.5643),
c(0.5654,    0.1934,    0.2413),
c(0.4529,    0.1679,    0.3792),
c(0.2390,    0.1758,    0.5852),
c(0.3114,    0.1743,    0.5143),
c(0.4188,    0.1392,    0.4420),
c(0.5830,    0.1655,    0.2514),
c(0.5860,    0.1171,    0.2969),
c(0.2630,    0.1706,    0.5664),
c(0.5882,    0.1032,    0.3086),
c(0.5829,    0.1277,    0.2894),
c(0.3942,    0.1046,    0.5012),
c(0.5201,    0.1097,    0.3702),
c(0.2568,    0.1823,    0.5609),
c(0.3687,    0.1695,    0.4618),
c(0.5663,    0.1317,    0.3020),
c(0.5169,    0.1950,    0.2881),
c(0.5838,    0.1034,    0.3128))

Q = rbind(c(0.4494,    0.3755,    0.1751),
c(0.5219,    0.3526,    0.1255),
c(0.3432,    0.5062,    0.1506),
c(0.3120,    0.5181,    0.1699),
c(0.5362,    0.2747,    0.1891),
c(0.4082,    0.3959,    0.1959),
c(0.4670,    0.3782,    0.1547),
c(0.4276,    0.4585,    0.1139),
c(0.4013,    0.4837,    0.1149),
c(0.3724,    0.5019,    0.1258),
c(0.5055,    0.3104,    0.1841),
c(0.4027,    0.4719,    0.1254),
c(0.3565,    0.4620,    0.1814),
c(0.6106,    0.2650,    0.1244),
c(0.5595,    0.2476,    0.1929),
c(0.4657,    0.3993,    0.1350),
c(0.2964,    0.5839,    0.1197),
c(0.5387,    0.3362,    0.1251),
c(0.4043,    0.4341,    0.1616),
c(0.5631,    0.2895,    0.1473))
         
ci <- fuzzyconcordance(P,Q) 

#generate a random fuzzy partition with two components (clusters)
Q2 <- matrix(runif(20),ncol=1)
Q2 <- cbind(Q2,1-Q2)

ci2 <- fuzzyconcordance(P,Q2)

#generate a random crisp partition
P2 <- t(rmultinom(20,1,c(0.3,0.3,0.4)))

ci3 <- fuzzyconcordance(P2,Q)
#--------------------
## Not run: 
# install.packages("Rankcluster")
library("Rankcluster") # model-based clustering algorithm for
                     #  ranking data by Biernacki and Jacques (2013)
                     #  <doi:10.1016/j.csda.2012.08.008>
data(APA)
set.seed(136) #for reproducibility
rcres <- rankclust(APA$data,K=3)  # solution with 3 centers, it takes about 75 seconds
##
ccares <- cca(APA$data,k=3) #solution with 3 components, it takes about 7 seconds
##
ci <- fuzzyconcordance(rcres[3]@tik,ccares$pk)
ci$ACI  # 0.0226 means that the two partitions are similar (see NDC below), 
        # but their similarity is mainly due to chance
ci$NDC

## End(Not run)

Determine a tree from the main tree-based structure

Description

Given a tree belonging to the class "ranktree", determine a subtree with a given number of terminal nodes

Usage

getsubtree(Tree, cut, tokeep = NULL)
getsubtree(Tree, cut, tokeep = NULL)

Arguments

`Tree`	An object of the class "ranktree" coming form te function `ranktree`
`cut`	The maximum number of terminal nodes that the Tree must have
`tokeep`	parameter invoked by other internal functions

Details

If the pruning sequence returns a series of subtrees with, say, 1,2,4,7,9 terminal nodes and the user set cut=8, the function extract the subtree with 7 terminal nodes.

Value

An object of the class "ranktree", containing the same information of the output of the function ranktree

Author(s)

Antonio D'Ambrosio [email protected]

Examples


data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
#see how many terminal nodes have the trees compomimg the nested sequence of subtrees
infoprun <- tree$pruneinfo$termnodes
#select the tree with, say, 6 terminal nodes
tree6 <- getsubtree(tree,6)


data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
#see how many terminal nodes have the trees compomimg the nested sequence of subtrees
infoprun <- tree$pruneinfo$termnodes
#select the tree with, say, 6 terminal nodes
tree6 <- getsubtree(tree,6)

Irish Election data set

Description

An opinion poll conducted by Irish Marketing Surveys one month prior to the election in 1997. Interviews were conducted on about 1100 respondents, drawn from 100 sampling areas. Interviews took place at randomly located homes, with respondents selected according to a socioeconomic quota. A range of sociological questions was asked of each respondent, as was their voting preference, if any, for each of the candidates.

Usage

data("Irish")data("Irish")

Format

The format is: List of 3

$ IrishElection: 'data.frame': 1083 obs. of 11 variables: Gender (male, housewife, nonhousewife), marital status (single, married, separated), age, socialclass (five unordered categories), Area (rural, city, town), government satisfaction (no opinion,m satisfied, dissatisfied), Bano , Roch, McAl, Nall, Scal

$ predictors :'data.frame' with all the predictors

$ rankings : matrix with the preferencres for "Bano" "Roch" "McAl" "Nall"

Details

In the original version of the data, the ranking matrix contains NAs. Here, NAs are replaced with the number 7, to indicate that all the non-stated preferences are in a tie at the last position (see D'Ambrosio and Heiser, 2016). For details about the data set see Gormley and Murphy, 2008.

Source

https://projecteuclid.org/journals/annals-of-applied-statistics/volume-2/issue-4/A-mixture-of-experts-model-for-rank-data-with/10.1214/08-AOAS178.full?tab=ArticleLinkSupplemental

References

Gormley, I.C., and Murphy, T.B. (2008). A mixture of experts model for rank data withapplications in election studies. Annals of Applied Statistics 2(4): 1452-1477. DOI: 10.1214/08-AOAS178

D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94. DOI: 10.1007/s11336-016-9505-1.

Examples

data(Irish)
data(Irish)

Utility function

Description

A utility function completing the output of the function ranktree.

Usage

layouttree(Tree)
layouttree(Tree)

Arguments

Tree

an object of the class "ranktree"

Value

an object of the class "ranktree" completing the output of the function ranktree

Author(s)

Antonio D'Ambrosio [email protected]

Path of a terminal node

Description

Given an object of the class "ranktree", it visualize the path leading to the terminal node

Usage

nodepath(termnode, Tree)
nodepath(termnode, Tree)

Arguments

`termnode`	The terminal node of which the path has to be extracted
`Tree`	An object of the class "ranktree"

Value

The path leading to the terminal node

Author(s)

Antonio D'Ambrosio [email protected]

Examples


data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#see the path for terminal node number 8
nodepath(termnode=8,tree)

 
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#see the path for terminal node number 8
nodepath(termnode=8,tree)

Plot tree-based structure or pruning sequence of ranktree

Description

Plot the tree coming from the ranktree or the pruning sequence of the ranktree

Usage

## S3 method for class 'ranktree'
plot(
  x,
  plot.type = "tree",
  dispclass = FALSE,
  valtree = NULL,
  taos = TRUE,
  ...
)
## S3 method for class 'ranktree'
plot(
  x,
  plot.type = "tree",
  dispclass = FALSE,
  valtree = NULL,
  taos = TRUE,
  ...
)

Arguments

`x`	An object of the class "ranktree"
`plot.type`	One among "tree" or "pruningseq"
`dispclass`	Display the median ranking above terminal nodes. Default option: FALSE
`valtree`	If plot.type="pruningseq", it shows the Tau_x rank correlation coefficient or the error along the pruning sequence on the training set. If valtree is the output of the function `validatetree`, it shows either the Tau_x rank correlation coefficient or the error along the pruning sequence of also the decision tree (validated by wither test set or cross-validation)
`taos`	If plot.type="pruningseq", it plots the Tau_x rank correlation coefficient along the pruning sequence. If taos=FALSE, it plots the error.
`...`	System reserved (No specific usage)

Value

the plot of either the tree or the pruning sequence

Author(s)

Antonio D'Ambrosio [email protected]

Examples

data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
plot(tree,dispclass=TRUE)
  

data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
plot(tree,dispclass=TRUE)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,]) 
dtree <- getsubtree(tree,vtreetest$best_tau)
plot(dtree,dispclass=TRUE)
#see the global weigthted tau_X rank correlation coefficients
plot(tree,plot.type="pruningseq",valtree=vtreetest)
#see the error rates
plot(tree,plot.type="pruningseq",valtree=vtreetest, taos=FALSE)


data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
plot(tree,dispclass=TRUE)
  

data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
plot(tree,dispclass=TRUE)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,]) 
dtree <- getsubtree(tree,vtreetest$best_tau)
plot(dtree,dispclass=TRUE)
#see the global weigthted tau_X rank correlation coefficients
plot(tree,plot.type="pruningseq",valtree=vtreetest)
#see the error rates
plot(tree,plot.type="pruningseq",valtree=vtreetest, taos=FALSE)

Predict the median rankings for new observations

Description

Predict the median rankings in a tree-based structure built with ranktree for new observations

Usage

## S3 method for class 'ranktree'
predict(object, newx, ...)
## S3 method for class 'ranktree'
predict(object, newx, ...)

Arguments

`object`	An object of the class "ranktree"
`newx`	A dataframe of the same nature of the predictor dataframe with which the tree has been built
`...`	System reserved (No specific usage)

Value

A list containing:

rankings		the fit in terms of rankings
orderings		the fit in terms of orderings
info		dataframe containing the terminal nodes in which the new x fall down, then the new x and the fit (in terms of rankings)

Author(s)

Antonio D'Ambrosio [email protected]

Examples


data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#use the function predict ro predict rankings for new predictors
rankfit <- predict(tree,newx=EVS$predictors[-training,])
#fit in terms of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
# information about the fit (terminal node, predictor and fit (in terms of rankings))
rankfit$info




data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#use the function predict ro predict rankings for new predictors
rankfit <- predict(tree,newx=EVS$predictors[-training,])
#fit in terms of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
# information about the fit (terminal node, predictor and fit (in terms of rankings))
rankfit$info

S3 methods for cca

Description

Print methods for objects of class cca

Usage

## S3 method for class 'cca'
print(x, ...)
## S3 method for class 'cca'
print(x, ...)

Arguments

`x`	An object of the class "cca"
`...`	not used

Value

print a brief summary of the CCA

S3 methods for ranktree

Description

Print methods for objects of class ranktree

Usage

## S3 method for class 'ranktree'
print(x, ...)
## S3 method for class 'ranktree'
print(x, ...)

Arguments

`x`	An object of the class "ranktree"
`...`	not used

Value

print a brief summary of the prediction tree

Examples


data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
tree



data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
tree

Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances

Description

Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances.

Usage

ranktree(Y, X, prunplot = FALSE, control = ranktreecontrol(...), ...)
ranktree(Y, X, prunplot = FALSE, control = ranktreecontrol(...), ...)

Arguments

`Y`	A n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects which are represented by the columns.
`X`	A dataframe containing the predictor, that must have n rows.
`prunplot`	prunplot=TRUE returns the plot of the pruning sequence. Default value: FALSE
`control`	a list of options that control details of the `ranktree` algorithm governed by the function `ranktreecontrol`. The options govern the minimum size within node to split (the default value is 0.1*n, where n is the total sample size), the bound on the decrease in impurity, (default, 0.01), the algorithm chosen to compute the median ranking (default, "quick"), and other options related to the consrank algorithm, which is called by `ranktree`
`...`	arguments passed bypassing `ranktreecontrol`

Details

X		the predictors: it must be a dataframe
Y		the response variable: the matrix of the rankings
node		a list containing teh tree-based structure:
	number	node number
	terminal	logical: TRUE is terminal node
	father	father node number of the current node
	idfather	id of the father node of the current node
	size	sample size within node
	impur	impurity at node
	wimpur	weighted impurity at node
	idatnode	id of the observations within node
	class	median ranking within node in terms of orderings
	nclass	median ranking within node in terms of rankings
	mclass	eventual multiple median rankings
	tau	Tau_x rank correlation coefficient at node
	wtau	weighted Tau_x rank correlation coefficient at node
	error	error at node
	werror	weighted error at node
	varsplit	variables generating split
	varsplitid	id of variables generating split
	cutspli	splitting point
	children	children nodes generated by current node
	idchildren	id of children nodes generated by current node
	...	other info about node
control		parameters used to build the tree
numnodes		number of nodes of the tree
tsynt		list containing the synthesis of the tree:
	children	list containing all information about leaves
	parents	list containing all information about parent nodes
geneaoly		data frame containing information about all nodes
idgenealogy		data frame containing information about all nodes in terms of nodes id
idparents		id of the parents of all the nodes
goodness		goodness -and badness- of fit measures of the tree: Tau_X, error, impurity
nomin		information about nature of the predictors
alpha		alpha parameter for pruning sequence
pruneinfo		list containing information about the pruning sequence:
	prunelist	information about the pruning
	tau	tau_X rank correlation coefficient of each subtree
	error	error of each subtree
	termnodes	number of terminal nodes of each subtree
subtrees		list of each subtree created with the cost-complexity pruning procedure

Value

An object of the class ranktree. See details for detailed information.

Author(s)

Antonio D'Ambrosio [email protected]

References

D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.

Examples

data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)


data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)

#plot the tree
plot(tree,dispclass=TRUE)

#visualize information
summary(tree)

#get information about the paths leading to terminal nodes (all the paths)
infopaths <- treepaths(tree)

#the terminal nodes
infopaths$leaves

#sample size within each terminal node
infopaths$size

#visualize the path of the second leave (terminal node number 8)
infopaths$paths[[2]]

#alternatively
nodepath(termnode=8,tree)


set.seed(132) #for reproducibility
#validation of the tree via v-fold cross-validation (default value of V=5)
vtree <- validatetree(tree,method="cv")

#extract the "best" tree
dtree <- getsubtree(tree,vtree$best_tau)

summary(dtree)

#plot the validated tree
plot(dtree,dispclass=TRUE)

#predicted rankings
rankfit <- predict(dtree,newx=Irish$predictors)

#fit of rankings
rankfit$rankings

#fit in terms of orderings
rankfit$orderings

#all info about the fit (id og the leaf, predictor values, and fit)
rankfit$orderings



data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)


data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)

#plot the tree
plot(tree,dispclass=TRUE)

#visualize information
summary(tree)

#get information about the paths leading to terminal nodes (all the paths)
infopaths <- treepaths(tree)

#the terminal nodes
infopaths$leaves

#sample size within each terminal node
infopaths$size

#visualize the path of the second leave (terminal node number 8)
infopaths$paths[[2]]

#alternatively
nodepath(termnode=8,tree)


set.seed(132) #for reproducibility
#validation of the tree via v-fold cross-validation (default value of V=5)
vtree <- validatetree(tree,method="cv")

#extract the "best" tree
dtree <- getsubtree(tree,vtree$best_tau)

summary(dtree)

#plot the validated tree
plot(dtree,dispclass=TRUE)

#predicted rankings
rankfit <- predict(dtree,newx=Irish$predictors)

#fit of rankings
rankfit$rankings

#fit in terms of orderings
rankfit$orderings

#all info about the fit (id og the leaf, predictor values, and fit)
rankfit$orderings

Utility function

Description

Utility function to use to set the control arguments of ranktree

Usage

ranktreecontrol(
  num = NULL,
  decrmin = 0.01,
  algorithm = "quick",
  full = FALSE,
  itermax = 10,
  np = 15,
  gl = 100,
  ff = 0.4,
  cr = 0.9,
  proc = FALSE,
  ps = FALSE
)
ranktreecontrol(
  num = NULL,
  decrmin = 0.01,
  algorithm = "quick",
  full = FALSE,
  itermax = 10,
  np = 15,
  gl = 100,
  ff = 0.4,
  cr = 0.9,
  proc = FALSE,
  ps = FALSE
)

Arguments

`num`	The maximum number of observations in a node to be split: default, 10% of the sample size
`decrmin`	Minimum decrease in impurity
`algorithm`	The algorithm used to compute the median ranking. One among"BB", "quick" (default), "fast" and "decor"
`full`	Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE
`itermax`	Number of iterations for "fast" and "decor" algorithms. itermax=10 is the default option.
`np`	(for "decor" only) the number of population individuals. np=15 is the default option.
`gl`	(for"decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option.
`ff`	(for"decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option.
`cr`	(for"decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option.
`proc`	(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option
`ps`	If PS=TRUE, on the screen some information about how many branches are processed are displayed. Default value: FALSE

Value

A list containing all the control parameters

Author(s)

Antonio D'Ambrosio [email protected]

S3 methods for ranktree

Description

Summary methods for objects of class cca

Usage

## S3 method for class 'cca'
summary(object, ...)
## S3 method for class 'cca'
summary(object, ...)

Arguments

`object`	An object of the class "cca"
`...`	not used

Value

it shows the summary of the prediction tree

S3 methods for ranktree

Description

Summary methods for objects of class ranktree

Usage

## S3 method for class 'ranktree'
summary(object, ...)
## S3 method for class 'ranktree'
summary(object, ...)

Arguments

`object`	An object of the class "ranktree"
`...`	not used

Value

it shows the summary of the prediction tree

Examples


data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
summary(tree)

data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
summary(tree)

Path of a terminal node

Description

Given an object of the class "ranktree", it extracts the paths of all terminal nodes

Usage

treepaths(Tree)
treepaths(Tree)

Arguments

Tree

An object of the class "ranktree"

Value

A list containing:

leaves		the number of the terminal nodes
size		the sample size within each terminal nodes
paths		a list containing all the paths

Author(s)

Antonio D'Ambrosio [email protected]

Examples


data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#
#the terminal nodes
paths$leaves
#
#sample size within each terminal node
paths$size
#
#visualize the path of the second leave (terminal node number 8)
paths$paths[[2]]



data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#
#the terminal nodes
paths$leaves
#
#sample size within each terminal node
paths$size
#
#visualize the path of the second leave (terminal node number 8)
paths$paths[[2]]

University rankings dataset.

Description

University rankings dataset was analysed by Dittrich, Hatzinger and Katzenbeisser (1998) to investigate paired comparison data concerning European universities and student's characteristics with the goal to show that university rankings are different for different groups of students. Here both raw data (with paired comparisons) and the version with rankings are preesented (see details). A survey of 303 students studying at the Vienna University of Economics was carried out to examine the student's preference of six universities, namely London, Paris, Milan, St. Gallen, Barcelona and Stockholm. The data set contains 23 variables. The first 15 digits in each row indicate the preferences of a student. For a given comparison, responses were coded by 1 if the first preference was preferred, by 2 if the second university was preferred, and by 3 if universities are tied. All rows containing missing ranked Universities were skipped.

Usage

data("Univranks")data("Univranks")

Format

The format is: List of 3

$ rawdata: 'data.frame': 212 obs. of 23 variables: the first 15 are the paired comparisons coded as follows: (1: the first is preferred to the second; 2: the second is preferred to the fisrt; 3 tied)

$ LP : comparison of London to Paris

$ LM : comparison of London to Milan

$ PM : comparison of London to Milan

$ LSg : comparison of London to St. Gallen

$ PSg : comparison of Paris to St. Gallen

$ MSg : comparison of Milan to St. Gallen

$ LB : comparison of London to Barcelona

$ PB : comparison of Paris to Barcelona

$ MB : comparison of Milan to Barcelona

$ SgB : comparison of St. Gallen to Barcelona

$ LSt : comparison of London to Stockholm

$ PSt : comparison of Paris to Stockholm

$ MSt : comparison of Milan to Stockholm

$ SgSt: comparison of St. Gallen to Stockholm

$ BSt : comparison of Barcelona to Stockholm

$ Stud: Factor w/ 2 levels "commerce","other"

$ Eng : Factor w/ 2 levels "good","poor""

$ Fra : Factor w/ 2 levels "good","poor"

$ Spa : Factor w/ 2 levels "good","poor"

$ Ita : Factor w/ 2 levels "good","poor"

$ Wor : Factor w/ 2 levels "no","yes"

$ Deg : Factor w/ 2 levels "no","yes"

$ Sex : Factor w/ 2 levels "female","male"

$ predictors:'data.frame': 212 obs. of 8 variables( the last 8 variables of the "rawdata" dataframe

$ rankings : matrix of preference rankings. The columns are: "L" (London), "P" (Paris), "M" (Milan), "Sg" (St. Gallen), "B" (Barcerlona), "St" (Stockholm)

Details

To obtain the preference rankings from the paired comparisons the procedure has been the following: the first row of the raw data is [1 3 2 1 2 1 1 2 1 1 1 2 1 1 2]. London is preferred to Paris, St. Gallen, Barcelona Stockholm (LP, LM, LSg, LB and LSt are always equal to 1), and there is no preference between London and Milan (they are tied); Milan is preferred to Paris (PM = 2), St. Gallen, Barcelona and Stockholm; and so on. The first ordering is then <L M Sg St B P> corresponding to a ranking [1,5,1,2,4,3], where the columns indicate L P M Sg B St.

Source

http://www.blackwellpublishers.co.uk/rss

References

Dittrich, R., Hatzinger, R., and Katzenbeisser, W. (1998). Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(4), 511-525. DOI: 10.1111/1467-9876.00125

D'Ambrosio, A. (2008). Tree based methods for data editing and preference rankings. Ph.D. thesis, University of Naples Federico II. https://www.doi.org/10.6092/UNINA/FEDOA/2746

Examples

data(Univranks)
data(Univranks)

Validation of the tree for preference rankings

Description

Validation of the tree either with a test set procedure or with v-fold cross validation

Usage

validatetree(
  Tree,
  testX = NULL,
  testY = NULL,
  method = "test",
  V = 5,
  plotting = TRUE
)
validatetree(
  Tree,
  testX = NULL,
  testY = NULL,
  method = "test",
  V = 5,
  plotting = TRUE
)

Arguments

`Tree`	An object of the class "ranktree" coming form te function `ranktree`
`testX`	The data frame containing the test set (predictors)
`testY`	The matrix ontaining the test set (response)
`method`	One between "test" (default) or "cv"
`V`	The cross-validation parameter. Default V=5
`plotting`	With the defaul option plotting=TRUE, the pruning sequence plot is visualized

Value

A list containing:

tau		the Tau_x rank correlation coefficient of the sequence of the trees
error		the error of the sequence of the trees
termnodes		the number of terminal nodes of the sequence of the trees
best_tau		the best tree in terms of Tau_x rank correlation coefficient
best_error		the best tree in terms of error (it is the same)
validation		information about the validation procedure

Author(s)

Antonio D'Ambrosio [email protected]

Examples


data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,]) 
#cross-validation
vtreecv <- validatetree(tree,method="cv",V=10) 


data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,]) 
#cross-validation
vtreecv <- validatetree(tree,method="cv",V=10)

Package 'ConsRankClass'

Help Index

K-Median Cluster Component Analysis

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Utility function

Description

Usage

Arguments

Value

Author(s)

See Also

European Values Studies (EVS) data

Description

Usage

Format

Details

Source

References

Examples

Normalized Degree of Concordance (NDC) and Adjusted Concordance Index (ACI)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Determine a tree from the main tree-based structure

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Irish Election data set

Description

Usage

Format

Details

Source

References

Examples

Utility function

Description

Usage

Arguments

Value

Author(s)

Path of a terminal node

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Plot tree-based structure or pruning sequence of ranktree

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Predict the median rankings for new observations

Description

Usage

Arguments

Value