Title: | Utilities to Fit Paired Comparison Models for Preferences |
---|---|
Description: | Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach. |
Authors: | Reinhold Hatzinger [aut], Marco Johannes Maier [cre] |
Maintainer: | Marco Johannes Maier <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.8-36 |
Built: | 2024-11-23 06:33:38 UTC |
Source: | CRAN |
Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert-type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits pattern mixture models using a non-parametric ML approach.
Package: | prefmod |
Type: | Package |
Version: | 0.8-36 |
Date: | 2023-09-28 |
License: | GPL (>= 2) |
Reinhold Hatzinger, Marco J. Maier
Maintainer: Marco J. Maier ([email protected])
Hatzinger, R., & Dittrich, R. (2012). prefmod: An R Package for Modeling Preferences Based on Paired Comparisons, Rankings, or Ratings. Journal of Statistical Software, 48(10), 1–31. https://www.jstatsoft.org/v48/i10/
# mini example with three Likert items and two subject covariates # using example data "xmpl" in the package dsgnmat <- patt.design(xmpl, nitems = 3, resptype = "rating", ia = TRUE, cov.sel = "ALL") head(dsgnmat) # fit of Critchlov & Fligner (1991) Salad Dressings Data pattR.fit(salad, nitems = 4) # alternatively use glm() with patt.design() sal <- patt.design(salad, nitems = 4, resptype = "ranking") glm(y ~ A+B+C+D, data = sal, family = poisson)
# mini example with three Likert items and two subject covariates # using example data "xmpl" in the package dsgnmat <- patt.design(xmpl, nitems = 3, resptype = "rating", ia = TRUE, cov.sel = "ALL") head(dsgnmat) # fit of Critchlov & Fligner (1991) Salad Dressings Data pattR.fit(salad, nitems = 4) # alternatively use glm() with patt.design() sal <- patt.design(salad, nitems = 4, resptype = "ranking") glm(y ~ A+B+C+D, data = sal, family = poisson)
The result of the 1987 season for seven baseball teams in the Eastern Division of the American League according to the (home team, away team) classification are shown.
baseball
baseball
Baseball is a numeric vector with the results for the season according to the (home team, away team) classification.
The results of the seven teams Milwaukee, Detroit, Toronto, New York, Boston, Cleveland and Baltimore, that play 13 games each. There is no possibility of ending in a draw.
Alan Agresti, Categorical Data Analysis (Second Edition), 2002 pages 437 and 438
R. Dittrich, R. Hatzinger, and W. Katzenbeisser, Fitting paired comparison models in GLIM. GLIM newsletter 1997
# baseball example (Agresti, 2002, p. 437) # pseudodata for generating a design matrix d1 <- c(rep(0, 21), 1) d2 <- c(1, rep(0, 20), 2) d <- data.frame(rbind(d1, d2)) names(d) <- c(paste0("v", 1:21), "cov") # design matrix des5 <- llbt.design(d, nitems = 7, objnames = c("MIL", "DET", "TOR", "NY", "BOS", "CLE", "BAL"), cat.scov = "cov") des5$y <- baseball des5$mu <- gl(42, 2) pos <- c(rep(1:0, 21), rep(0:1, 21)) # fit model and display results res5 <- gnm(y ~ MIL+DET+TOR+NY+BOS+CLE+BAL + pos, eliminate = mu, data = des5, family = poisson) w5 <- llbt.worth(res5) plot(w5)
# baseball example (Agresti, 2002, p. 437) # pseudodata for generating a design matrix d1 <- c(rep(0, 21), 1) d2 <- c(1, rep(0, 20), 2) d <- data.frame(rbind(d1, d2)) names(d) <- c(paste0("v", 1:21), "cov") # design matrix des5 <- llbt.design(d, nitems = 7, objnames = c("MIL", "DET", "TOR", "NY", "BOS", "CLE", "BAL"), cat.scov = "cov") des5$y <- baseball des5$mu <- gl(42, 2) pos <- c(rep(1:0, 21), rep(0:1, 21)) # fit model and display results res5 <- gnm(y ~ MIL+DET+TOR+NY+BOS+CLE+BAL + pos, eliminate = mu, data = des5, family = poisson) w5 <- llbt.worth(res5) plot(w5)
Online configuration systems allow customers to actively participate in the creation of products and become increasingly important in various market sectors. Dabic and Hatzinger report a study on car configurators that aimed at investigating the effects of certain person characteristics (such as gender) on the configuration process. Subjects were asked to configure a car according to their preferences. They could choose freely from several modules: such as exterior and interior design, technical equipment, brand, price, and producing country. The order of module choice was recorded as ranks. Since not all modules had to be chosen the response format was partial rankings.
carconf
carconf
A data frame with 435 observations on the following 9 variables.
price
rank (1 highest preference)
exterior
rank
brand
rank
tech.equip
rank
country
rank
interior
rank
sex
1 female 2 male
age
1 17–29 years, 2 30–49 years, 3 50+ years
segment
preferred car type: 1 premium-class, 2 medium-class, 3 low-budget
Dabic, M., & Hatzinger, R. (2009). Zielgruppenadäquate Abläufe in Konfigurationssystemen – eine empirische Studie im Automobilmarkt: Das Paarvergleichs-Pattern-Modell für Partial Rankings. In R. Hatzinger, R. Dittrich, & T. Salzberger (Eds.), Präferenzanalyse mit R: Anwendungen aus Marketing, Behavioural Finance und Human Resource Management (pp. 119–150). Wien: Facultas.
head(carconf)
head(carconf)
A survey of 303 students was carried out to examine the students' preferences of 6 universities (London, Paris, Milano, St. Gallen, Barcelona and Stockholm) with a 17 items questionnaire.
The first 15 variables indicate the subjects' preferences.
For a given comparison the responses are coded as 0 if the first university was preferred, 2 if the second university was preferred and 1 if no decision was made.
The variable ENG
contains the knowledge of English and the variable SEX
contains the gender.
cemspc
cemspc
A data frame with 303 rows for the subjects containing the outcome of the 15 comparisons and the two covariates
V1
London vs. Paris
V2
London vs. Milano
V3
Paris vs. Milano
V4
London vs. St. Gallen
V5
Paris vs. St. Gallen
V6
Milano vs. St. Gallen
V7
London vs. Barcelona
V8
Paris vs. Barcelona
V9
Milano vs. Barcelona
V10
St. Gallen vs. Barcelona
V11
London vs. Stockholm
V12
Paris vs. Stockholm
V13
Milano vs. Stockholm
V14
St. Gallen vs. Stockholm
V15
Barcelona vs. Stockholm
ENG
Knowledge of English: (1) good, (2) poor
SEX
Gender: (1) female, (2) male
Dittrich, R., Hatzinger, R., & Katzenbeisser, W. (1998). Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Applied Statistics, 47(4), 511–525.
old_par <- par(mfrow = c(4, 4)) for(i in 1:15){ barplot(table(cemspc[, i])) } par(old_par)
old_par <- par(mfrow = c(4, 4)) for(i in 1:15){ barplot(table(cemspc[, i])) } par(old_par)
For a given paired comparisons data set the function calculates and prints the number of missing comparisons and the number of times objects are missing. It can also be used to avoid failure of nonresponse-parameter for nonresponse models in
checkMIS(obj, nitems, MISmodel = "obj", obj.names = NULL, verbose = FALSE)
checkMIS(obj, nitems, MISmodel = "obj", obj.names = NULL, verbose = FALSE)
obj |
dataframe or datafile path/name (like |
nitems |
the number of compared objects, not the number of comparisons (like |
MISmodel |
specifies the nonresponse model, either |
obj.names |
character vector with names for objects. |
verbose |
if |
a logical vector (returned invisibly) specifying for which object/comparison there are NA
responses in the data (obj)
.
# no missing NAs in dataset dat4 checkMIS(dat4, nitems = 4, verbose = TRUE) # generates data set with three items and some missing values in # comparison (23), column 3, then there are no NAs for object 1 data3 <- dat4[, 1:3] idx3 <- sample(1:100, 10) data3[idx3, 3] <- NA checkMIS(data3, nitems = 3, verbose = TRUE) # estimate MCAR PC pattern model for data3 with NA indicators alpha1 # cannot be estimated being accommodated by using checkMIS pattPC.fit(data3, nitems = 3, MISalpha = checkMIS(data3, nitems = 3))
# no missing NAs in dataset dat4 checkMIS(dat4, nitems = 4, verbose = TRUE) # generates data set with three items and some missing values in # comparison (23), column 3, then there are no NAs for object 1 data3 <- dat4[, 1:3] idx3 <- sample(1:100, 10) data3[idx3, 3] <- NA checkMIS(data3, nitems = 3, verbose = TRUE) # estimate MCAR PC pattern model for data3 with NA indicators alpha1 # cannot be estimated being accommodated by using checkMIS pattPC.fit(data3, nitems = 3, MISalpha = checkMIS(data3, nitems = 3))
A fictitious dataset with 100 observations on 6 paired comparisons.
The responses get the value if the first object in a comparison is preferred and
otherwise.
For the arrangement of objects and comparisons see llbt.design
.
dat4
dat4
A data frame with 100 observations on 6 comparisons (comp1
to comp6
)
str(dat4) # to get a general idea we use the histogram plot old_par <- par(mfrow = c(2, 3)) for(i in 1:6){ barplot(table(dat4[, i])) } par(old_par)
str(dat4) # to get a general idea we use the histogram plot old_par <- par(mfrow = c(2, 3)) for(i in 1:6){ barplot(table(dat4[, i])) } par(old_par)
Eurobarometer public opinion surveys have been carried out in all member states of the European Union since 1973. Eurobarometer 55.2 was a special survey collected in 2001 and designed to elicit information on European experience and perception of science and technology. One question, of identical form in all states, asked respondents about their sources of information about science, and requested them to rank them in order of importance. The items were: Television, Radio, Press, Scientific Magazines, Internet, School and University.
euro55.2.des
euro55.2.des
The data is a design data frame generated from the original data using patt.design
.
Each row represents a (derived) paired comparison response pattern crossclassified by the categorical subject covariates SEX
and AGE4
.
The columns are:
y
counts
TV
design vector for Television
RAD
design vector for Radio
NEWSP
design vector for Newspapers and magazine
SCIMAG
design vector for Scientific magazines
WWW
design vector for The internet
EDINST
design vector for School/University
SEX
Gender: Factor with 2 levels: (1) male, (2) female
AGE4
Age: Factor with 4 levels: (1) 15–24, (2) 25–39, (3) 40–54, (4) 55+
The original data contained 16 029 cases, after removal of cases with missing values and improper rankings this design data frame is based on 12216 observations.
https://data.europa.eu/data/datasets/s209_55_2_ebs154
Christensen, T. (2001). Eurobarometer 55.2: Europeans, science and technology. Technical report, European Opinion Research Group, Commission of the European Communities, Brussels.
str(euro55.2.des)
str(euro55.2.des)
The function expands aggregated data into casewise data. For instance, for a contingency table given in the form of a design matrix and corresponding counts the function sets up a matrix where each design row is repeated according to the frequencies for that row.
expand.mat(mat, freq)
expand.mat(mat, freq)
mat |
a matrix (or column vector) or data frame to be expanded |
freq |
a vector of counts |
the expanded matrix.
This utility allows to generate input data for the design generating and model fitting functions of the prefmod package from aggregated data.
tdat <- expand.mat(tennis[, -1], tennis[, 1]) head(tdat)
tdat <- expand.mat(tennis[, -1], tennis[, 1]) head(tdat)
NA
s): Negative Attitudes towards ImmigrantsA survey of 98 students was carried out to examine student's (negative) attitudes towards immigrants. Four statements had to be compared with regard to higher acceptance. The four statements were
Foreigners increase crime rates (crimRate
)
Foreigners take apprenticeship training position away (position
)
Foreigners are a strain for the social welfare system (socBurd
)
Foreigners threaten our culture (culture
)
The first 6 variables in the dataset indicate the preferences of the students.
For a given comparison the responses are coded by if the first item was preferred,
if the second university was preferred and
if no decision was made.
The variable
ENG
characterises the knowledge of English and the variable SEX
characterises the gender.
immig
immig
A data frame with 98 observations on the following 9 variables.
V12
crimRate
vs. position
V13
crimRate
vs. socBurd
V23
position
vs. socBurd
V14
crimRate
vs. culture
V24
position
vs. culture
V34
socBurd
vs. culture
SEX
Gender: (1) male, (2) female
AGE
Age (continuous)
NAT
Nationality (Factor). Cannot directly be used in prefmod
Weber, D., & Hatzinger, R. (2011). A novel approach for modelling paired comparisons data with non-ignorable missing values on students' attitudes towards foreigners. Data Analysis Bulletin, 12, 11–22.
summary(immig)
summary(immig)
In 2000 the International Social Survey Programme (ISSP) has addressed the topic of attitudes to environmental protection and preferred government measures for environmental protection. This dataset focusses on six items (with a 5-point rating scale (Likert type)) where respondents from Austria and Great Britain were asked about their perception of environmental dangers.
issp2000
issp2000
A data frame with 1595 observations on the following 11 variables. The first six variables are items to be answered on a 5-point rating scale (Likert type) with response categories: (1) extremely dangerous for the environment to (5) not dangerous at all for the environment.
CAR
air pollution caused by cars
IND
air pollution caused by industry
FARM
pesticides and chemicals used in farming
WATER
pollution of country's rivers, lakes and streams
TEMP
a rise in the world's temperature
GENE
modifying the genes of certain crops
SEX
gender: (1) male, (2) female
URB
location of residence: (1) urban area, (2) suburbs of large cities, small town, county seat (3) rural area
AGE
age: (1) < 40 years, (2) 41–59 years, (3) 60+ years
CNTRY
country: (1) Great Britain, (2) Austria
EDU
education: (1) below A-level/matrice, (2) A-level/matrice or higher
ISSP Research Group (2003). International Social Survey Programme: Environment II – ISSP 2000. GESIS Data Archive, Cologne. ZA3440 Data file Version 1.0.0, doi:10.4232/1.3440
https://www.gesis.org/issp/modules/issp-modules-by-topic/environment/2000
(Usage regulations)
Dittrich, R., Francis, B.J., Hatzinger R., Katzenbeisser, W. (2007). A Paired Comparison Approach for the Analysis of Sets of Likert Scale Responses. Statistical Modelling, Vol. 7, No. 1, 3–28.
str(issp2000)
str(issp2000)
The function llbt.design
returns a data frame containing the design matrix for a loglinear paired comparison model.
Additionally, the frequencies of the pairwise comparisons are computed and are stored in the first column of the data frame.
llbt.design(data, nitems = NULL, objnames = "", objcovs = NULL, cat.scovs = NULL, num.scovs = NULL, casewise = FALSE, ...)
llbt.design(data, nitems = NULL, objnames = "", objcovs = NULL, cat.scovs = NULL, num.scovs = NULL, casewise = FALSE, ...)
data |
either a data frame or a data file name. |
nitems |
number of items (objects). |
objnames |
an optional character vector with names for the objects.
These names are the columns names in the output data frame.
If |
objcovs |
an optional data frame with object specific covariates.
The rows correspond to the objects, the columns define the covariates.
The column names of this data frame are later used to fit the covariates.
Factors are not allowed.
In that case dummy variables have to be set up manually (favourably using |
cat.scovs |
a character vector with the names of the categorical subject covariates in the data file to be included into the design matrix.
(Example: |
num.scovs |
analogous to |
casewise |
If |
... |
deprecated options to allow for backwards compatibility (see Deprecated below) |
The function llbt.design
allows for different scenarios mainly concerning
Paired comparison data. Responses can be either simply preferred – not preferred or ordinal (strongly preferred – ... – not at all preferred). In both cases an undecided category may or may not occur. If there are more than three categories a they are reduced to two or three response categories.
Item covariates. The design matrix for the basic model has columns for the items (objects) and for each response category.
Object specific covariates.
For modelling certain characteristics of objects a reparameterisation can be included in the design.
This is sometimes called conjoint analysis.
The object specific covariates can be continuous or dummy variables.
For the specification see Argument objcovs
above.
Subject covariates.
For modelling different preference scales for the items according to characteristics of the respondents categorical and/or continuous subject covariates can be included in the design.
Categorical subject covariates: The corresponding variables are defined as numerical vectors where the levels are specified with consecutive integers starting with 1.
This format must be used in the input data file.
These variables are factor(s)
in the output data frame.
Continuous subject covariates: also defined as numerical vectors in the input data frame.
If present, the resulting design structure is automatically expanded, i.e., there are as many design blocks as there are subjects.
Object specific covariates. The objects (items) can be reparameterised using an object specific design matrix. This allows for scenarios such as conjoint analysis or for modelling some characteristics shared by the objects. The number of such characteristics must not exceed the number of objects minus one.
The output is a dataframe of class llbtdes
.
Each row represents a decision in a certain comparison.
Dependent on the number of response categories, comparisons are made up of two or three rows in the design matrix.
If subject covariates are specified, the design matrix is duplicated as many times as there are combinations of the levels of each categorical covariate or, if casewise = TRUE
, as there are subjects in the data.
Each individual design matrix consists of rows for all comparisons.
The first column contains the counts for the paired comparison response patterns and is labelled with y
.
The next columns are the covariates for the categories (labelled as g0
, g1
, etc.).
In case of an odd number of categories, g1
can be used to model an undecided effect.
The subsequent columns are covariates for the items.
If specified, subject covariates and then object specific covariates follow.
Responses have to be coded as consecutive integers (e.g., (0, 1), or (1, 2, 3, ...), where the smallest value corresponds to (highest) preference for the first object in a comparison.
For (ordinal) paired comparison data (resptype = "paircomp"
) the codings ,
,
,
, etc. can also be used.
Then negative numbers correspond to not preferred, 0 to undecided.
Missing responses (for paired comparisons but not for subject covariates) are allowed under a missing at random assumption and specified via
NA
.
Input data (via the first argument obj
in the function call) is specified either through a dataframe or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
The leftmost columns must be the responses to the paired comparisons (where the mandatory order of comparisons is (12) (13) (23) (14) (24) (34) (15) (25) etc.), optionally followed by columns for subject covariates. If categorical, these have to be specified such that the categories are represented by consecutive integers starting with 1. Missing values for subject covariates are not allowed and treated such that rows with NAs are removed from the resulting design structure and a message is printed.
For an example see xmpl
.
The following options are for backwards compatibility and should no longer be used.
same as casewise
.
same as cat.scovs
.
Options for requesting GLIM commands and data structures are no longer supported.
Specifying the input to llbt.design
via a control list is also deprecated.
If you want to use these features you have to install prefmod <= 0.8-22.
Reinhold Hatzinger
Dittrich, R., Hatzinger, R., & Katzenbeisser, W. (1998). Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(4), 511–252. doi:10.1111/1467-9876.00125
patt.design
,
llbt.worth
,
llbtPC.fit
# cems universities example des <- llbt.design(cemspc, nitems = 6, cat.scovs = "ENG") res0 <- gnm(y ~ o1+o2+o3+o4+o5+o6 + ENG:(o1+o2+o3+o4+o5+o6), eliminate = mu:ENG, data = des, family = poisson) summary(res0) # inclusion of g1 allows for an undecided effect res <- gnm(y ~ o1+o2+o3+o4+o5+o6 + ENG:(o1+o2+o3+o4+o5+o6) + g1, eliminate = mu:ENG, data = des, family = poisson) summary(res) # calculating and plotting worth parameters wmat <- llbt.worth(res) plot(wmat) # object specific covariates LAT <- c(0, 1, 1, 0, 1, 0) # latin cities EC <- c(1, 0, 1, 0, 0, 1) OBJ <- data.frame(LAT, EC) des2 <- llbt.design(cemspc, nitems = 6, objcovs = OBJ, cat.scovs = "ENG") res2 <- gnm(y ~ LAT + EC + LAT:ENG + g1, eliminate = mu:ENG, data = des2, family = poisson) # calculating and plotting worth parameters wmat2 <- llbt.worth(res2) wmat2 plot(wmat2)
# cems universities example des <- llbt.design(cemspc, nitems = 6, cat.scovs = "ENG") res0 <- gnm(y ~ o1+o2+o3+o4+o5+o6 + ENG:(o1+o2+o3+o4+o5+o6), eliminate = mu:ENG, data = des, family = poisson) summary(res0) # inclusion of g1 allows for an undecided effect res <- gnm(y ~ o1+o2+o3+o4+o5+o6 + ENG:(o1+o2+o3+o4+o5+o6) + g1, eliminate = mu:ENG, data = des, family = poisson) summary(res) # calculating and plotting worth parameters wmat <- llbt.worth(res) plot(wmat) # object specific covariates LAT <- c(0, 1, 1, 0, 1, 0) # latin cities EC <- c(1, 0, 1, 0, 0, 1) OBJ <- data.frame(LAT, EC) des2 <- llbt.design(cemspc, nitems = 6, objcovs = OBJ, cat.scovs = "ENG") res2 <- gnm(y ~ LAT + EC + LAT:ENG + g1, eliminate = mu:ENG, data = des2, family = poisson) # calculating and plotting worth parameters wmat2 <- llbt.worth(res2) wmat2 plot(wmat2)
Function to fit an LLBT using an ELIMINATE feature
llbt.fit(y, Xmodel, q, ncat, maxiter = 100)
llbt.fit(y, Xmodel, q, ncat, maxiter = 100)
y |
response, usually counts |
Xmodel |
design matrix |
q |
number of parameters to eliminate (usually number of comparisons times number of subject covariate levels |
ncat |
number of response categories |
maxiter |
maximum number of iterations (default 100) |
Be careful when specifying the design matrix. Since there is no extrinsic aliasing the matrix must have full rank. Usually, one of the design columns for object must be left out.
Reinhold Hatzinger
Hatzinger, R., & Francis, B. (2004). Fitting paired comparison models in R. https://epub.wu.ac.at/id/eprint/740
# fit basic model casewise mfr <- llbt.design(cemspc, nitems = 6, objnames = c("lo", "pa", "mi", "sg", "ba", "st"), casewise = TRUE) mm <- model.matrix(~ lo+pa+mi+sg+ba + g1, data = mfr) X <- mm[, -1] p <- ncol(X) ncat <- 3 q <- length(levels(mfr$mu)) * length(levels(mfr$CASE)) llbt.fit(mfr$y, X, q, ncat) # fit the (aggregated) model with one subject covariate mfr <- llbt.design(cemspc, nitems = 6, objnames = c("lo", "pa", "mi", "sg", "ba", "st"), cov.sel = "ENG") eng <- mfr$ENG eng <- factor(eng) mm <- model.matrix(~ lo+pa+mi+sg+ba + g1 + (lo+pa+mi+sg+ba):eng, data = mfr) X <- mm[, -1] q <- length(levels(mfr$mu)) * length(levels(eng)) ncat <- 3 llbt.fit(mfr$y, X, q, ncat)
# fit basic model casewise mfr <- llbt.design(cemspc, nitems = 6, objnames = c("lo", "pa", "mi", "sg", "ba", "st"), casewise = TRUE) mm <- model.matrix(~ lo+pa+mi+sg+ba + g1, data = mfr) X <- mm[, -1] p <- ncol(X) ncat <- 3 q <- length(levels(mfr$mu)) * length(levels(mfr$CASE)) llbt.fit(mfr$y, X, q, ncat) # fit the (aggregated) model with one subject covariate mfr <- llbt.design(cemspc, nitems = 6, objnames = c("lo", "pa", "mi", "sg", "ba", "st"), cov.sel = "ENG") eng <- mfr$ENG eng <- factor(eng) mm <- model.matrix(~ lo+pa+mi+sg+ba + g1 + (lo+pa+mi+sg+ba):eng, data = mfr) X <- mm[, -1] q <- length(levels(mfr$mu)) * length(levels(eng)) ncat <- 3 llbt.fit(mfr$y, X, q, ncat)
Worth parameters are calculated from the results of an LLBT model fit, i.e., from llbtPC.fit
or from a gnm
-fit, respectively.
For the latter, the function only works if the design matrix had been generated using llbt.design
.
llbt.worth(fitobj, outmat = "worth")
llbt.worth(fitobj, outmat = "worth")
fitobj |
result of an LLBT model fit using either |
outmat |
a matrix of estimated worth parameters ( |
If the LLBT model includes categorical subject covariates, the function provides estimates for all groups formed by the full crossclassification. Numerical subject covariates are not implemented (yet)(see Warning below).
llbt.worth
returns a matrix of worth or model parameters.
If subject covariates have been specified, each column represents a group defined by the crossclassification of the subject covariates.
In case of object-specific covariates (gnm
-fit only) the rows are collapsed to the number of different combinations of object-specific covariate values and labelled accordingly.
Additionally, there is an attribute objtable
containing a summary of original objects (items) and their reparameterisation with object-specific covariates.
This is a list or a matrix.
The function plot
gives a plot of the estimates.
If the LLBT model has been fitted including numerical subject covariates, they are ignored. However, estimates for the remaining predictors are calculated for convenience. Please note, that these cannot be interpreted as standard estimates but are intercepts of the regression model where the objects (or reparameterised objects) are explained by one or more numerical subject covariates.
If a position effect has been fitted (for details see Dittrich, et. al., 1998), the corresponding variable must have been named pos
.
Reinhold Hatzinger
# fit only first three objects with SEX effect mod <- llbtPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and print worth parameters mw <- llbt.worth(mod) mw # the same using llbt.design and gnm des <- llbt.design(cemspc, nitems = 3, cat.scovs = "SEX") m2 <- gnm(y ~ o1+o2+o3 + SEX:(o1+o2+o3) + g1, elim = SEX:mu, data = des, family = poisson) # calculate and plot worth parameters w2 <- llbt.worth(m2) plot(w2) # model with object specific covariates latin <- c(0, 1, 1, 0, 1, 0) # object-specific covariate LAT <- data.frame(LAT = latin) # objcovs must be data frame with named columns onames <- c("LO", "PA", "MI", "SG", "BA", "ST") des <- llbt.design(cemspc, nitems = 6, objnames = onames, objcovs = LAT) m3 <- gnm(y ~ LAT + g1, eliminate = mu, data = des, family = poisson) w3 <- llbt.worth(m3) w3 attr(w3, "objtable")
# fit only first three objects with SEX effect mod <- llbtPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and print worth parameters mw <- llbt.worth(mod) mw # the same using llbt.design and gnm des <- llbt.design(cemspc, nitems = 3, cat.scovs = "SEX") m2 <- gnm(y ~ o1+o2+o3 + SEX:(o1+o2+o3) + g1, elim = SEX:mu, data = des, family = poisson) # calculate and plot worth parameters w2 <- llbt.worth(m2) plot(w2) # model with object specific covariates latin <- c(0, 1, 1, 0, 1, 0) # object-specific covariate LAT <- data.frame(LAT = latin) # objcovs must be data frame with named columns onames <- c("LO", "PA", "MI", "SG", "BA", "ST") des <- llbt.design(cemspc, nitems = 6, objnames = onames, objcovs = LAT) m3 <- gnm(y ~ LAT + g1, eliminate = mu, data = des, family = poisson) w3 <- llbt.worth(m3) w3 attr(w3, "objtable")
Function to fit a loglinear Bradley-Terry for paired comparisons allowing subject covariates and undecided response categories.
llbtPC.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "paircomp", obj.names = NULL, undec = TRUE)
llbtPC.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "paircomp", obj.names = NULL, undec = TRUE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of compared objects, not the number of comparisons |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed. If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
undec |
for paired comparisons with a undecided/neutral category, a common parameter will be estimated if |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed.
See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the successive fitting of nested models to enable the use of deviance differences for model selection (see example below).
llbtPC.fit
returns an object of class llbtMod
.
This object is basically a gnm
object with an additional element envList
.
This is a list with further details like the subject covariates design structure covdesmat
, the model specification (formel
and elim
), the object names (obj.names
), the number of items (nobj
) and comparisons (ncomp
), etc.
The function llbt.worth
can be used to produce a matrix of estimated worth parameters.
The responses have to be coded as 0/1 for paired comparisons without undecided category (0 means first object in a comparison preferred) or 0/1/2 for paired comparisons with an undecided category (where 1 is the undecided category). Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. The leftmost columns in the data must be the responses to the paired comparisons (where the mandatory order of comparisons is (12) (13) (23) (14) (24) (34) (15) (25) etc.), optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
For an example see cemspc
.
The function llbtPC.fit
is a wrapper function for gnm
and was designed to facilitate fitting of LLBTs with subject covariates and undecided categories.
More specialised setups (e.g., object-specific covariates) can be obtained using llbt.design
and then calling gnm
(or glm
) directly (see Examples for llbt.design
).
Reinhold Hatzinger
# cems universities example res0 <- llbtPC.fit(cemspc, nitems = 6, formel = ~1, elim = ~ENG, undec = TRUE) res1 <- llbtPC.fit(cemspc, nitems = 6, formel = ~ENG, elim = ~ENG, undec = TRUE) anova(res1, res0) llbt.worth(res1)
# cems universities example res0 <- llbtPC.fit(cemspc, nitems = 6, formel = ~1, elim = ~ENG, undec = TRUE) res1 <- llbtPC.fit(cemspc, nitems = 6, formel = ~ENG, elim = ~ENG, undec = TRUE) anova(res1, res0) llbt.worth(res1)
The dataset contains data on 18 items referring to the liking of musical styles (ratings on a 5-point Likert type response scale) and three subject covariates. The data is an excerpt from the US General Social Survey 1993. Missing values have been removed from the subject variables.
music
music
A data frame with 1597 observations.
The variables bigb
through hvym
are responses to items asking for liking/disliking of type of music:
Do you like it very much (1), like it (2), have mixed feelings (3), dislike it (4), dislike it very much (5).
bigb
like or dislike bigband music
blug
like or dislike bluegrass music
coun
like or dislike country western music
blue
like or dislike blues or r and b music
musi
like or dislike broadway musicals
clas
like or dislike classical music
folk
like or dislike folk music
gosp
like or dislike gospel music
jazz
like or dislike jazz
lati
like or dislike latin music
mood
like or dislike easy listening music
newa
like or dislike new age music
oper
like or dislike opera
rap
like or dislike rap music
regg
like or dislike reggae music
conr
like or dislike contemporary rock music
oldi
like or dislike oldies rock music
hvym
like or dislike heavy metal music
age
age in years
educ
highest year of school completed
sex
1 male, 2 female
Davis, J. A., Smith, T. W., & Marsden, P. V. (2016) General Social Survey, 1993, 1998, 2000, 2002 with Cultural, Information Security, and Freedom Modules [United States]. Inter-university Consortium for Political and Social Research [distributor]. doi:10.3886/ICPSR35536.v2
summary(music)
summary(music)
The function patt.design
converts (i) real paired comparison responses, or (ii) a set of ratings (or Likert-type responses measured on a common scale), or (iii) full rankings into paired comparison patterns, returning a new data frame containing the design matrix for a loglinear paired comparison model.
Additionally, the frequencies of these patterns are computed and are stored in the first column of the data frame.
patt.design(obj, nitems = NULL, objnames = "", objcovs = NULL, cat.scovs = NULL, num.scovs = NULL, resptype = "paircomp", reverse = FALSE, ia = FALSE, casewise = FALSE, ...)
patt.design(obj, nitems = NULL, objnames = "", objcovs = NULL, cat.scovs = NULL, num.scovs = NULL, resptype = "paircomp", reverse = FALSE, ia = FALSE, casewise = FALSE, ...)
obj |
either a data frame or a data file name. |
nitems |
number of items (objects). |
objnames |
an optional character vector with names for the objects.
These names are the columns names in the output data frame.
If |
objcovs |
an optional data frame with object specific covariates.
The rows correspond to the objects, the columns define the covariates.
The column names of this data frame are later used to fit the covariates.
Factors are not allowed.
In that case dummy variables have to be set up manually (favourably using |
cat.scovs |
a character vector with the names of the categorical subject covariates in the data file to be included into the design matrix.
(example: |
num.scovs |
analogous to |
resptype |
one of |
reverse |
If the responses are such that low values correspond to high preference (or agreement or rank) and high values to low preference (or agreement or ranks) (e.g., (1) I strongly agree ... (5) I strongly disagree) then |
ia |
generates covariates for interactions between comparisons if |
casewise |
If |
... |
deprecated options to allow for backwards compatibility (see Deprecated below). |
The function patt.design
allows for different scenarios mainly concerning
responses. Currently, three types of responses can be specified.
paired comparison data. Responses can be either simply preferred – not preferred or ordinal (strongly preferred – ... – not at all preferred). In both cases an undecided category may or may not occur. If there are more than three categories a they are reduced to two or three response categories. The set of paired comparison responses represents a response pattern.
ratings/Likert type responses.
The responses to Likert type items are transformed to paired comparison responses by calculating the difference between each pair of the Likert items.
This leads to an ordinal (adjacent categories) paired comparison model with 2-1 response categories where
is the number of the (original) Likert categories.
Again, the transformed ratings are reduced to three response categories (preferred – undecided – not preferred).
rankings. Currently only full rankings are allowed, i.e., a (consecutive) integer must uniquely be assigned to each object in a list according to the (subjective) ordering. Ties are not allowed. As for ratings, the rankings are transformed to paired comparison responses by calculating the difference between each pair of the ranks. Again a category reduction (as described above) is automatically performed.
comparison covariates.
The design matrix for the basic model has columns for the items (objects) and (depending on the type of responses) for undecided comparisons.
For ratings (Likert type) undecided comparisons occur if any subject has responded to two items in the same category.
For paired comparisons it depends on the design.
For rankings there are no undecided categories.
If undecided categories occur there is one dummy variable for each comparison.
Additionally, covariates for two way interaction between comparisons (i.e., for effects resulting from the dependence between two comparisons that have one item in common) can be obtained by setting ia = TRUE
.
object specific covariates.
For modelling certain characteristics of objects a reparameterisation can be included in the design.
This is sometimes called conjoint analysis.
The object specific covariates can be continuous or dummy variables.
For the specification see Argument objcovs
above.
subject covariates. For modelling different preference scales for the items according to characteristics of the respondents categorical subject covariates can be included in the design. The corresponding variables are defined as numerical vectors where the levels are specified with consecutive integers starting with 1. This format must be used in the input data file and is also used in all outputs.
The output is a dataframe. Each row represents a unique response pattern. If subject covariates are specified, each row instead represents a particular combination of a unique covariate combination with a response pattern. All possible combinations are generated.
The first column contains the counts for the paired comparison response patterns and is labelled with Y
.
The next columns are the covariates for the items and the undecided category effects (one for each comparison).
These are labelled as u12
, u13
, etc., where 12
denotes the comparison between items 1
and 2
.
Optionally, covariates for dependencies between comparisons follow.
The columns are labelled Ia.bc
denoting the interaction of the comparisons between items (a, b)
and (a, c)
where the common item is a
.
If subject covariates are present they are in the rightmost columns and defined to be factors.
Responses have to be coded as consecutive integers (e.g., (0, 1), or (1, 2, 3, ...), where the smallest value corresponds to (highest) preference for the first object in a comparison.
For (ordinal) paired comparison data (resptype = "paircomp"
) the codings ,
,
,
etc. can also be used.
Then negative numbers correspond to not preferred, 0 to undecided.
Missing responses are not allowed (use functions
pattPC.fit
, pattL.fit
, or pattR.fit
instead).
Input data (via the first argument obj
in the function call) is specified either through a dataframe or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
The leftmost columns must be the responses to the paired comparisons, ratings (Likert items), or rankings.
For paired comparisons the mandatory order is of comparisons is (12) (13) (23) (14) (24) (34) (15) (25) etc. For rankings, the lowest value means highest rank according to the underlying scale.
Each column in the data file corresponds to one of the ranked objects.
For example, if we have 3 objects denoted by A
, B
, and C
, with corresponding columns in the data matrix, the response pattern (3, 1, 2)
represents: object B
ranked highest, C
ranked second, and A
ranked lowest.
For ratings.
again the lowest value means highest ‘endorsement’ (agreement) according to the underlying scale.
All items are assumed to have the same number of response category.
The columns for responses are optionally followed by columns for subject covariates.
If categorical, they have to be specified such that the categories are represented by consecutive integers starting with 1.
Missing values are not allowed and treated such that rows with NA
s are removed from the resulting design structure and a message is printed.
For an example see xmpl
.
(Besides supplying data via a dataframe or a datafile name, obj
can also be specified as a control list with the same elements as the arguments in the function call.
The data must then be specified as a path/filename using the element datafile = "filename"
.
The control list feature is deprecated.
An example is given below.)
The following options are for backwards compatibility and should no longer be used.
same as casewise
.
same as ia
.
same as reverse
.
same as cat.scovs
.
Options for requesting GLIM commands and data structures are no longer supported.
Specifying the input to llbt.design
via a control list is also deprecated.
If you want to use these features you have to install prefmod <= 0.8-22.
Reinhold Hatzinger
Dittrich, R., Francis, B.J., Hatzinger R., Katzenbeisser, W. (2007), A Paired Comparison Approach for the Analysis of Sets of Likert Scale Responses. Statistical Modelling, Vol. 7, No. 1, 3–28.
llbt.design
,
pattPC.fit
, pattL.fit
, pattR.fit
# mini example with three Likert items and two subject covariates dsgnmat <- patt.design(xmpl, nitems = 3, resptype = "rating", ia = TRUE, cov.sel = "ALL") head(dsgnmat) # ILLUSTRATING THE ISSP2000 EXAMPLE # simplified version of the analysis as given in Dittrich et. al (2007). design <- patt.design(issp2000, nitems = 6, resptype = "rating", cov.sel = c("SEX", "EDU")) # - fit null multinomial model (basic model for items without subject # covariates) through Poisson distribution. # - SEX:EDU parameters are nuisance parameters # - the last item (GENE) becomes a reference item in the model and is aliased; # all other items are compared to this last item # item parameters with undecided effects and no covariate effects. summary(glm(y ~ SEX*EDU + CAR+IND+FARM+WATER+TEMP+GENE + u12+u13+u23+u14+u24+u34+u15+u25+u35+u45+u16+u26+u36+u46+u56, data = design, family = poisson)) # now add main effect of SEX on items summary(glm(y ~ SEX:EDU + CAR+IND+FARM+WATER+TEMP+GENE + (CAR+IND+FARM+WATER+TEMP+GENE):SEX + u12+u13+u23+u14+u24+u34+u15+u25+u35+u45+u16+u26+u36+u46+u56, data = design, family = poisson))
# mini example with three Likert items and two subject covariates dsgnmat <- patt.design(xmpl, nitems = 3, resptype = "rating", ia = TRUE, cov.sel = "ALL") head(dsgnmat) # ILLUSTRATING THE ISSP2000 EXAMPLE # simplified version of the analysis as given in Dittrich et. al (2007). design <- patt.design(issp2000, nitems = 6, resptype = "rating", cov.sel = c("SEX", "EDU")) # - fit null multinomial model (basic model for items without subject # covariates) through Poisson distribution. # - SEX:EDU parameters are nuisance parameters # - the last item (GENE) becomes a reference item in the model and is aliased; # all other items are compared to this last item # item parameters with undecided effects and no covariate effects. summary(glm(y ~ SEX*EDU + CAR+IND+FARM+WATER+TEMP+GENE + u12+u13+u23+u14+u24+u34+u15+u25+u35+u45+u16+u26+u36+u46+u56, data = design, family = poisson)) # now add main effect of SEX on items summary(glm(y ~ SEX:EDU + CAR+IND+FARM+WATER+TEMP+GENE + (CAR+IND+FARM+WATER+TEMP+GENE):SEX + u12+u13+u23+u14+u24+u34+u15+u25+u35+u45+u16+u26+u36+u46+u56, data = design, family = poisson))
Worth parameter are calculated from the results of a pattern model fit, i.e., from pattPC.fit
, pattR.fit
, pattL.fit
, and pattLrep.fit
or from a gnm
-fit, respectively.
For the latter, the function only works if the design matrix had been generated using patt.design
.
patt.worth(fitobj, obj.names = NULL, outmat = "worth")
patt.worth(fitobj, obj.names = NULL, outmat = "worth")
fitobj |
Object of class |
obj.names |
names for the objects, for repeated measurement models just the names of objects for the first time point |
outmat |
a matrix of estimated worth parameters ( |
If the pattern model includes categorical subject covariates, the function provides estimates for all groups formed by the full crossclassification. Numerical subject covariates are not implemented (yet)(see Warning below).
patt.worth
returns a matrix of worth or model parameters.
If subject covariates have been specified, each column represents a groups defined by the crossclassification of the subject covariates.
The function plot
gives a plot of the estimates.
If the pattern model has been fitted including numerical subject covariates, they are ignored. However, estimates for the remaining predictors are calculated for convenience. Please note, that these cannot be interpreted as standard estimates but are intercepts of the regression model where the objects (or reparameterised objects) are explained by one or more numerical subject covariates.
Reinhold Hatzinger
pattPC.fit
,
pattR.fit
,
pattL.fit
,
pattLrep.fit
,
plot
# fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and print worth parameters m2worth <- patt.worth(m2) m2worth
# fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and print worth parameters m2worth <- patt.worth(m2) m2worth
Function to fit a pattern model for ratings/Likert items (transformed to paired comparisons) allowing for missing values using a CL approach.
pattL.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "rating", obj.names = NULL, undec = TRUE, ia = FALSE, NItest = FALSE, pr.it = FALSE)
pattL.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "rating", obj.names = NULL, undec = TRUE, ia = FALSE, NItest = FALSE, pr.it = FALSE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of items |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed.
If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
undec |
for paired comparisons with a undecided/neutral category, a common parameter will be estimated if |
ia |
interaction parameters between comparisons that have one object in common if |
NItest |
separate estimation of object parameters for complete and incomplete patterns if |
pr.it |
a dot is printed at each iteration cycle if set to |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed.
See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the succesive fitting of nested models to enable the use of deviance differences for model selection (see example below).
pattL.fit
returns an object of class pattMod
.
The function print
(i.e., print.pattMod
) can be used to print the results and the function patt.worth
to produce a matrix of worth parameters.
An object of class pattMod
is a list containing the following components:
coefficients |
estimates |
ll |
log-likelihood of the model |
fl |
log-likelihood of the saturated model |
call |
function call |
result |
a list of results from the fitting routine (see Value of |
envList |
a list with further fit details like subject covariates design structure |
partsList |
a list of the basic data structures for each subgroup defined by crossing all covariate levels and different missing value patterns.
Each element of |
The responses have to be coded as consecutive integers starting with 1 (or 0).
The value of 1 (0) means highest ‘endorsement’ (agreement) according to the underlying scale.
Missing values are coded as NA
, rows with less than 2 valid responses are removed from the fit and a message is printed.
Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. The leftmost columns in the data must be the rankings, optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
For an example see issp2000
.
The size of the table to be analysed increases dramatically with the number of items.
For ratings (Likert items) the number of paired comparison response categories is always three.
The number of rows of the table to set up the design matrix is initially , e.g., for six items with 5 response categories each this is 531441.
A reasonable maximum number of items with five response categories to be analysed with pattern models is 7.
Reinhold Hatzinger
patt.design
,
pattPC.fit
,
pattR.fit
# fit only four items music4 <- music[, c("jazz", "blue", "folk", "rap")] pattL.fit(music4, nitems = 4) # fit additional undecided effect pattL.fit(music4, nitems = 4, undec = TRUE) # fit dependence parameters ## Not run: pattL.fit(music4, nitems = 4, undec = TRUE, ia = TRUE) # check for ignorable missing pattL.fit(music4, nitems = 4, undec = TRUE, NItest = TRUE)
# fit only four items music4 <- music[, c("jazz", "blue", "folk", "rap")] pattL.fit(music4, nitems = 4) # fit additional undecided effect pattL.fit(music4, nitems = 4, undec = TRUE) # fit dependence parameters ## Not run: pattL.fit(music4, nitems = 4, undec = TRUE, ia = TRUE) # check for ignorable missing pattL.fit(music4, nitems = 4, undec = TRUE, NItest = TRUE)
Function to fit a pattern model for repeated ratings/Likert items (transformed to paired comparisons) allowing for missing values using a CL approach.
pattLrep.fit(obj, nitems, tpoints = 1, formel = ~1, elim = ~1, resptype = "ratingT", obj.names = NULL, undec = TRUE, ia = FALSE, iaT = FALSE, NItest = FALSE, pr.it = FALSE)
pattLrep.fit(obj, nitems, tpoints = 1, formel = ~1, elim = ~1, resptype = "ratingT", obj.names = NULL, undec = TRUE, ia = FALSE, iaT = FALSE, NItest = FALSE, pr.it = FALSE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of items at one time point. |
tpoints |
the number of time points. |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed.
If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
undec |
for paired comparisons with a undecided/neutral category, a common parameter will be estimated if |
ia |
for each time point interaction parameters between comparisons that have one object in common if |
iaT |
if |
NItest |
separate estimation of object parameters for complete and incomplete patterns if |
pr.it |
a dot is printed at each iteration cycle if set to |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed (redundant terms are removed automatically).
See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the successive fitting of nested models to enable the use of deviance differences for model selection (see example below).
pattLrep.fit
returns an object of class pattMod
.
The function print
(i.e., print.pattMod
) can be used to print the results and the function patt.worth
to produce a matrix of worth parameters.
An object of class pattMod
is a list containing the following components:
coefficients |
estimates |
ll |
log-likelihood of the model |
fl |
log-likelihood of the saturated model |
call |
function call |
result |
a list of results from the fitting routine (see Value of |
envList |
a list with further fit details like subject covariates design structure |
partsList |
a list of the basic data structures for each subgroup defined by crossing all covariate levels and different missing value patterns.
Each element of |
The input data must have the following order (from left to right): all items at first time point, all items at second time point (with the same order as before), etc. for the other time points, optional subject covariates.
The responses have to be coded as consecutive integers starting with 1 (or 0).
The value of 1 (0) means highest ‘endorsement’ (agreement) according to the underlying scale.
Missing values are coded as NA
, rows with less than 2 valid responses are removed from the fit and a message is printed.
Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. Again, the leftmost columns in the data must be the ratings, optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
The size of the table to be analysed increases dramatically with the number of items and time points.
For ratings (Likert items) the number of paired comparison response categories is always three.
For each time point the number of rows of the table to set up the design matrix is initially .
After reducing to three categories the number of patterns are 13, 75, 541 for 3 to 5 items, respectively.
Generally, the number of rows in the design matrix is
.
The number of covariate levels and the number of missing value patterns have effects only on the run time.
A (reasonable) maximum number of items for two time points is 5, for three timepoints 4, and for four to five timepoints 3.
The number of timepoints can also be regarded as different response dimensions.
Reinhold Hatzinger
pattL.fit
,
patt.design
,
pattPC.fit
,
pattR.fit
,
pattRrep.fit
# simulated data: 3 items, 2 timepoints dat <- as.data.frame(matrix(sample(1:5, 300, replace = TRUE), ncol = 6)) res <- pattLrep.fit(dat, nitems = 3, tpoints = 2, iaT = TRUE) res patt.worth(res, obj.names = LETTERS[1:3])
# simulated data: 3 items, 2 timepoints dat <- as.data.frame(matrix(sample(1:5, 300, replace = TRUE), ncol = 6)) res <- pattLrep.fit(dat, nitems = 3, tpoints = 2, iaT = TRUE) res patt.worth(res, obj.names = LETTERS[1:3])
Fits a mixture model to overdispersed paired comparison data using non-parametric maximum likelihood (Aitkin, 1996a).
pattnpml.fit(formula, random = ~1, k = 1, design, tol = 0.5, startp = NULL, EMmaxit = 500, EMdev.change = 0.001, seed = NULL, pr.it = FALSE)
pattnpml.fit(formula, random = ~1, k = 1, design, tol = 0.5, startp = NULL, EMmaxit = 500, EMdev.change = 0.001, seed = NULL, pr.it = FALSE)
formula |
A formula defining the response (the count of the number of cases of each pattern) and the fixed effects (e.g. |
random |
A formula defining the random model. If there are three objects labelled o1, o2, o3, set |
k |
The number of mass points (latent classes). Up to 21 mass points are supported. |
design |
The design data frame for paired comparison data as generated using |
tol |
The |
startp |
Optional numerical vector of length |
EMmaxit |
The maximum number of EM iterations. |
EMdev.change |
Stops EM algorithm when deviance change falls below this value. |
seed |
Seed for random weights. If |
pr.it |
A dot is printed at each iteration cycle of the EM algorithm if set to |
The function pattnpml.fit
is a wrapper function for alldistPC
which in turn is a modified version of the function alldist
from the npmlreg package.
The non-parametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalised linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximised using a standard EM algorithm.
This function extends the NPML approach to allow fitting of overdispersed paired comparison models. It assumes that overdispersion arises because of dependence in the patterns. Fitting a non-parametric random effects term is equivalent to specifying distinct latent classes of response patterns.
The number of components k
of the finite mixture has to be specified beforehand.
The EM algorithm used by the function takes the Gauss-Hermite masses and mass points as starting points.
The position of the starting points can be concentrated or extended by setting tol
smaller or larger, respectively; the initial mass point probabilities of the starting points can also be specified through startp
.
Fitting models for overdispersion can be achieved by specifying the paired comparison items as additive terms in the random part of the model formula. A separate estimate for each item and for each mass point is produced.
Fitting subject covariate models with the same effect for each mass point component is achieved by specifying as part of the formula
a) a subject factor giving a different estimate for each covariate combination b) an interaction of the chosen subject covariates with the objects.
For models with subject factor covariates only, the first term is simply the interaction of all of the factor covariates.
Fitting subject covariate models with a different effect for each mass point component (sometimes called random coefficient models, see Aitkin, Francis, Hinde and Darnell, 2009, pp. 497) is possible by specifying an interaction of the subject covariates with the items in the random
term, and also in the formula
part.
Thus the setting random = ~x:(o1+o2+o3
gives a model with a set of random slopes (one set for each mass point) and a set of random intercepts, one set for each mass point.
The AIC
and BIC
functions from the stats-package can be used.
The function produces an object of class pattNPML
.
The object contains the following 29 components:
coefficients |
a named vector of coefficients (including the mass points). In case of Gaussian quadrature, the coefficient given at |
residuals |
the difference between the true response and the empirical Bayes predictions. |
fitted.values |
the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses. |
family |
the ‘family’ object used. |
linear.predictors |
the extended linear predictors |
disparity |
the disparity ( |
deviance |
the deviance of the fitted mixture regression model. |
null.deviance |
The deviance for the null model (just containing an intercept), comparable with ‘deviance.’ |
df.residual |
the residual degrees of freedom of the fitted model (including the random part). |
df.null |
the residual degrees of freedom for the null model. |
y |
the (extended) response vector. |
call |
the matched call. |
formula |
the formula supplied. |
random |
the random term of the model formula. |
data |
the data argument. |
model |
the (extended) design matrix. |
weights |
the case weights initially supplied. |
offset |
the offset initially supplied. |
mass.points |
the fitted mass points. |
masses |
the mass point probabilities corresponding to the patterns. |
sdev |
a list of the two elements |
shape |
a list of the two elements |
rsdev |
estimated random effect standard deviation. |
post.prob |
a matrix of posteriori probabilities. |
post.int |
a vector of ‘posteriori intercepts’ (as in Sofroniou et al. (2006)). |
ebp |
the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions. |
EMiter |
gives the number of iterations of the EM algorithm. |
EMconverged |
logical value indicating if the EM algorithm converged. |
lastglm |
the fitted |
Misc |
contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories. |
For further details see the help file for function alldist
in package npmlreg.
The mass point probabilities given in the output are the proportion of patterns estimated to contribute to each mass point. To estimate the proportion of cases contributing to each mass point the posterior probabilities need to be averaged over patterns with observed counts as weights (see example below).
Originally translated from the GLIM 4 functions alldist
and allvc
(Aitkin & Francis, 1995) to R by Ross Darnell (2002).
Modified, extended, and prepared for publication by Jochen Einbeck and John Hinde (2006).
Adapted for paired comparison modelling by Reinhold Hatzinger and Brian Francis (2009).
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6(3), 251–262. doi:10.1007/BF00140869
Aitkin, M., Francis, B., Hinde, J., & Darnell, R. (2009). Statistical Modelling in R. Oxford: Oxford University Press.
Einbeck, J., & Hinde, J. (2006). A Note on NPML Estimation for Exponential Family Regression Models with Unspecified Dispersion Parameter. Austrian Journal of Statistics, 35(2&3), 233–243.
Sofroniou, N., Einbeck, J., & Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.
# two latent classes for paired comparison data dfr <- patt.design(dat4, 4) modPC <- pattnpml.fit(y ~ 1, random = ~o1 + o2 + o3, k = 2, design = dfr) modPC # estimated proportion of cases in each mixture component apply(modPC$post.prob, 2, function(x){ sum(x * dfr$y / sum(dfr$y)) }) ## Not run: # fitting a model for two latent classes and fixed categorical subject # covariates to the Eurobarometer 55.2 data (see help("euro55.2.des")) # on rankings of sources of information on scientific developments model2cl <- pattnpml.fit( y ~ SEX:AGE4 + (SEX + AGE4):(TV + RAD + NEWSP + SCIMAG + WWW + EDINST) - 1, random = ~ TV + RAD + NEWSP + SCIMAG + WWW + EDINST, k = 2, design = euro55.2.des, pr.it = TRUE) summary(model2cl) BIC(model2cl) ## End(Not run)
# two latent classes for paired comparison data dfr <- patt.design(dat4, 4) modPC <- pattnpml.fit(y ~ 1, random = ~o1 + o2 + o3, k = 2, design = dfr) modPC # estimated proportion of cases in each mixture component apply(modPC$post.prob, 2, function(x){ sum(x * dfr$y / sum(dfr$y)) }) ## Not run: # fitting a model for two latent classes and fixed categorical subject # covariates to the Eurobarometer 55.2 data (see help("euro55.2.des")) # on rankings of sources of information on scientific developments model2cl <- pattnpml.fit( y ~ SEX:AGE4 + (SEX + AGE4):(TV + RAD + NEWSP + SCIMAG + WWW + EDINST) - 1, random = ~ TV + RAD + NEWSP + SCIMAG + WWW + EDINST, k = 2, design = euro55.2.des, pr.it = TRUE) summary(model2cl) BIC(model2cl) ## End(Not run)
Function to fit a pattern model for paired comparisons allowing for missing values using a CL approach.
pattPC.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "paircomp", obj.names = NULL, undec = TRUE, ia = FALSE, NItest = FALSE, NI = FALSE, MIScommon = FALSE, MISmodel = "obj", MISalpha = NULL, MISbeta = NULL, pr.it = FALSE)
pattPC.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "paircomp", obj.names = NULL, undec = TRUE, ia = FALSE, NItest = FALSE, NI = FALSE, MIScommon = FALSE, MISmodel = "obj", MISalpha = NULL, MISbeta = NULL, pr.it = FALSE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of compared objects, not the number of comparisons |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed.
If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
undec |
for paired comparisons with a undecided/neutral category, a common parameter will be estimated if |
ia |
interaction parameters between comparisons that have one object in common if |
NItest |
separate estimation of object parameters for complete and incomplete patterns if |
NI |
if |
MIScommon |
if |
MISmodel |
either |
MISalpha |
if not |
MISbeta |
if not |
pr.it |
a dot is printed at each iteration cycle if set to |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed. See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the successive fitting of nested models to enable the use of deviance differences for model selection (see example below).
pattPC.fit
returns an object of class pattMod
.
The function print
(i.e., print.pattMod
) can be used to print the results and the function patt.worth
to produce a matrix of the estimated worth parameters.
An object of class pattMod
is a list containing the following components:
coefficients |
estimates |
ll |
log-likelihood of the model |
fl |
log-likelihood of the saturated model |
call |
function call |
result |
a list of results from the fitting routine (see Value of |
envList |
a list with further fit details like subject covariates design structure |
partsList |
a list of the basic data structures for each subgroup defined by crossing all covariate levels and different missing value patterns.
Each element of |
The responses have to be coded as 0/1 for paired comparisons without undecided category (0 means first object in a comparison preferred) or 0/1/2 for paired comparisons with an undecided category (where 1 is the undecided category). Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. The leftmost columns in the data must be the responses to the paired comparisons (where the mandatory order of comparisons is (12) (13) (23) (14) (24) (34) (15) (25) etc.), optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
For an example see cemspc
.
The size of the table to be analysed increases dramatically with the number of objects.
For paired comparisons with two response categories the number of rows of the table is , e.g., with six objects this is 32768, for three response categories this is 14348907.
A reasonable maximum number of objects to be analysed with pattern models is 6 in the case of two response categories and 5 when an additional undecided/neutral category has been observed).
Reinhold Hatzinger
patt.design
,
checkMIS
,
pattL.fit
,
pattR.fit
# fit only first three objects with undecided parameter pattPC.fit(cemspc, nitems = 3, undec = TRUE) # check for ignorable missing pattPC.fit(cemspc, nitems = 3, undec = TRUE, NItest = TRUE) # check if SEX has an effect m1 <- pattPC.fit(cemspc, nitems = 3, formel = ~1, elim = ~SEX, undec = TRUE) m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate LR test for SEX ll1 <- m1$result$minimum ll2 <- m2$result$minimum df1 <- length(m1$result$estimate) df2 <- length(m2$result$estimate) lr <- 2*(ll1 - ll2) df <- df2 - df1 cat("LR test = ", lr, " on df = ", df, " (p = ", round(pchisq(lr, df, lower.tail = FALSE), digits = 5), ")\n", sep = "") # generates data set with three items and some missing values in # comparison (23), column 3, then there are no NAs for object 1 data3 <- dat4[, 1:3] idx3 <- sample(1:100, 10) data3[idx3, 3] <- NA checkMIS(data3, nitems = 3, verbose = TRUE) # estimate MNAR PC pattern model for data3 without alpha1 and beta1 pattPC.fit(data3, nitems = 3, MISalpha = c(FALSE, TRUE, TRUE), MISbeta = c(FALSE, TRUE, TRUE))
# fit only first three objects with undecided parameter pattPC.fit(cemspc, nitems = 3, undec = TRUE) # check for ignorable missing pattPC.fit(cemspc, nitems = 3, undec = TRUE, NItest = TRUE) # check if SEX has an effect m1 <- pattPC.fit(cemspc, nitems = 3, formel = ~1, elim = ~SEX, undec = TRUE) m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate LR test for SEX ll1 <- m1$result$minimum ll2 <- m2$result$minimum df1 <- length(m1$result$estimate) df2 <- length(m2$result$estimate) lr <- 2*(ll1 - ll2) df <- df2 - df1 cat("LR test = ", lr, " on df = ", df, " (p = ", round(pchisq(lr, df, lower.tail = FALSE), digits = 5), ")\n", sep = "") # generates data set with three items and some missing values in # comparison (23), column 3, then there are no NAs for object 1 data3 <- dat4[, 1:3] idx3 <- sample(1:100, 10) data3[idx3, 3] <- NA checkMIS(data3, nitems = 3, verbose = TRUE) # estimate MNAR PC pattern model for data3 without alpha1 and beta1 pattPC.fit(data3, nitems = 3, MISalpha = c(FALSE, TRUE, TRUE), MISbeta = c(FALSE, TRUE, TRUE))
Function to fit a pattern model for (partial) rankings (transformed to paired comparisons) allowing for missing values using a CL approach.
pattR.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "ranking", obj.names = NULL, ia = FALSE, NItest = FALSE, pr.it = FALSE)
pattR.fit(obj, nitems, formel = ~1, elim = ~1, resptype = "ranking", obj.names = NULL, ia = FALSE, NItest = FALSE, pr.it = FALSE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of items |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed.
If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
ia |
interaction parameters between comparisons that have one object in common if |
NItest |
separate estimation of object parameters for complete and incomplete patterns if |
pr.it |
a dot is printed at each iteration cycle if set to |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed.
See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the successive fitting of nested models to enable the use of deviance differences for model selection (see example below).
pattR.fit
returns an object of class pattMod
.
The function print
(i.e., print.pattMod
) can be used to print the results and the function patt.worth
to produce a matrix of worth parameters.
An object of class pattMod
is a list containing the following components:
coefficients |
estimates |
ll |
log-likelihood of the model |
fl |
log-likelihood of the saturated model |
call |
function call |
result |
a list of results from the fitting routine (see Value of |
envList |
a list with further fit details like subject covariates design structure |
partsList |
a list of the basic data structures for each subgroup defined by crossing all covariate levels and different missing value patterns.
Each element of |
The responses have to be coded as consecutive integers starting with 1.
The value of 1 means highest rank according to the underlying scale.
Each column in the data file corresponds to one of the ranked objects.
For example, if we have 3 objects denoted by A
, B
, and C
, with corresponding columns in the data matrix, the response pattern (3, 1, 2)
represents: object B
ranked highest, C
ranked second, and A
ranked lowest.
Missing values are coded as NA
, ties are not allowed (in that case use pattL.fit
.
Rows with less than 2 ranked objects are removed from the fit and a message is printed.
Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. The leftmost columns in the data must be the rankings, optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
For an example without covariates and no missing values see salad
.
The size of the table to be analysed increases dramatically with the number of items.
For rankings the number of paired comparison response categories is always two.
The number of rows of the table used to set up the design matrix is factorial(number of items)
.
For instance, for nine objects this is 362880.
A reasonable maximum number of items is 8.
The option NItest = TRUE
has to be used with care.
The meaning of missing responses is not obvious with partial rankings.
Are the corresponding values really missing or just not chosen.
Reinhold Hatzinger
patt.design
,
pattL.fit
,
pattPC.fit
# fit of Critchlov & Fligner (1991) Salad Dressings Data pattR.fit(salad, nitems = 4) # alternatively use glm() with patt.design() sal <- patt.design(salad, nitems = 4, resptype = "ranking") glm(y ~ A+B+C+D, data = sal, family = poisson)
# fit of Critchlov & Fligner (1991) Salad Dressings Data pattR.fit(salad, nitems = 4) # alternatively use glm() with patt.design() sal <- patt.design(salad, nitems = 4, resptype = "ranking") glm(y ~ A+B+C+D, data = sal, family = poisson)
Function to fit a pattern model for repeated (partial) rankings (transformed to paired comparisons) allowing for missing values using a CL approach.
pattRrep.fit(obj, nitems, tpoints = 1, formel = ~1, elim = ~1, resptype = "rankingT", obj.names = NULL, ia = FALSE, iaT = FALSE, NItest = FALSE, pr.it = FALSE)
pattRrep.fit(obj, nitems, tpoints = 1, formel = ~1, elim = ~1, resptype = "rankingT", obj.names = NULL, ia = FALSE, iaT = FALSE, NItest = FALSE, pr.it = FALSE)
obj |
either a dataframe or the path/name of the datafile to be read. |
nitems |
the number of items at one time point. |
tpoints |
the number of time points (must be > 1). |
formel |
the formula for subject covariates to fit different preference scales for the objects (see below). |
elim |
the formula for the subject covariates that specify the table to be analysed.
If omitted and |
resptype |
is |
obj.names |
character vector with names for objects. |
ia |
|
iaT |
if |
NItest |
separate estimation of object parameters for complete and incomplete patterns if |
pr.it |
a dot is printed at each iteration cycle if set to |
Models including categorical subject covariates can be fitted using the formel
and elim
arguments.
formel
specifies the actual model to be fitted.
For instance, if specified as formel = ~SEX
different preference scale for the objects will be estimated for males and females.
For two or more covariates, the operators +
or *
can be used to model main or interaction effects, respectively.
The operator :
is not allowed (redundant terms are removed automatically).
See also formula
.
The specification for elim
follows the same rules as for formel
.
However, elim
specifies the basic contingency table to be set up but does not specify any covariates to be fitted.
This is done using formel
.
If, e.g., elim = ~SEX
but formel = ~1
, then the table is set up as if SEX
would be fitted but only one global preference scale is computed.
This feature allows for the successive fitting of nested models to enable the use of deviance differences for model selection (see example below).
pattRrep.fit
returns an object of class pattMod
.
The function print
(i.e., print.pattMod
) can be used to print the results and the function patt.worth
to produce a matrix of worth parameters.
An object of class pattMod
is a list containing the following components:
coefficients |
estimates |
ll |
log-likelihood of the model |
fl |
log-likelihood of the saturated model |
call |
function call |
result |
a list of results from the fitting routine (see Value of |
envList |
a list with further fit details like subject covariates design structure |
partsList |
a list of the basic data structures for each subgroup defined by crossing all covariate levels and different missing value patterns.
Each element of |
The input data must have the following order (from left to right): all items at first time point, all items at second time point (with the same order as before), etc. for the other time points, optional subject covariates.
The responses have to be coded as consecutive integers starting with 1 (or 0).
The value of 1 (0) means highest ‘endorsement’ (agreement) according to the underlying scale.
Missing values are coded as NA
, rows with less than 1 valid response are removed from the fit and a message is printed.
Optional subject covariates have to be specified such that the categories are represented by consecutive integers starting with 1. Rows with missing values for subject covariates are removed from the data and a message is printed. Again, the leftmost columns in the data must be the rankings, optionally followed by columns for categorical subject covariates.
The data specified via obj
are supplied using either a data frame or a datafile in which case obj
is a path/filename.
The input data file if specified must be a plain text file with variable names in the first row as readable via the command read.table(datafilename, header = TRUE)
.
The size of the table to be analysed increases dramatically with the number of items and time points
.
For rankings the number of paired comparison response categories is always two.
For each time point the number of rows of the table to set up the design matrix is initially
.
Thus, the number of rows in the design matrix is
.
The number of combined covariate levels and the number of missing value patterns have effects only on the run time.
A (reasonable) maximum number of items for two time points is 5 or 6, for three timepoints 4, and for four to seven timepoints 3.
The number of timepoints can also be regarded as different response dimensions.
Reinhold Hatzinger
pattL.fit
,
patt.design
,
pattPC.fit
,
pattR.fit
,
pattLrep.fit
# simulated data: 3 items, 2 timepoints dat1 <- simR(3, 100, c(.2, .7, .1)) dat2 <- simR(3, 100, c(.5, .4, .1)) dat <- data.frame(dat1, dat2) res <- pattLrep.fit(dat, nitems = 3, tpoints = 2, iaT = TRUE) res patt.worth(res, obj.names = LETTERS[1:3])
# simulated data: 3 items, 2 timepoints dat1 <- simR(3, 100, c(.2, .7, .1)) dat2 <- simR(3, 100, c(.5, .4, .1)) dat <- data.frame(dat1, dat2) res <- pattLrep.fit(dat, nitems = 3, tpoints = 2, iaT = TRUE) res patt.worth(res, obj.names = LETTERS[1:3])
A plot of the worth or model parameter matrix obtained from the fit of an LLBT or pattern model is produced.
This matrix is obtained from llbt.worth
or patt.worth
and is an object of class wmat
.
## S3 method for class 'wmat' plot(x, main = "Preferences", ylab = "Estimate", psymb = NULL, pcol = NULL, ylim = range(worthmat), log = "", ...)
## S3 method for class 'wmat' plot(x, main = "Preferences", ylab = "Estimate", psymb = NULL, pcol = NULL, ylim = range(worthmat), log = "", ...)
x |
worth or parameter matrix as generated from |
main |
main title of the plot. |
ylab |
y-axis label |
psymb |
plotsymbols for objects, see Details below |
pcol |
colours for objects, see Details below |
ylim |
limits for y-axis |
log |
if specified as |
... |
further graphical parameters, use e.g. |
Plotsymbols can be defined as an integer vector of length equal to the number of objects, e.g., psymb = c(15, 22, 18)
.
They specify the graphical option pch
as used in the points
function.
The default (psymb = NULL
) uses the symbols 15 through 18 and 21 through 25.
The number of symbols is determined from the number of rows in worthmat
.
A display of some plotsymbols may be obtained from the corresponding example below.
If pcol = NULL
, the colours for objects are defined from the rainbow_hcl
palette using the colorspace package.
Other specifications include "heat"
, "terrain"
(see rainbow_hcl
), and "gray"
(see grDevices
).
The number of different colours is automatically determined via the number of objects.
Alternatively, pcol
can be specified as a character vector containing user defined RGB colour values for all objects (as hexadecimal strings in the form "#rrggbb"
), e.g., for blue "#0000FF"
).
These are usually set up using standard colour palettes (see rainbow
or, e.g., the RColorBrewer package (see Examples below).
The old plot function, plotworth()
, is defunct (see prefmod-defunct
) and will generate errors.
If you are still using it, please update your code!
Reinhold Hatzinger
# fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and plot worth parameters m2worth <- patt.worth(m2) plot(m2worth) plot(m2worth, pcol = "terrain") # display of some plotsymbols (pch) plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5) text(0:25, rep(0.95, 26), 0:25) # usage of the "RColorBrewer" package ## Not run: library("RColorBrewer") mypalette <- brewer.pal(3, "Set1") plot(m2worth, pcol = mypalette) ## End(Not run)
# fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and plot worth parameters m2worth <- patt.worth(m2) plot(m2worth) plot(m2worth, pcol = "terrain") # display of some plotsymbols (pch) plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5) text(0:25, rep(0.95, 26), 0:25) # usage of the "RColorBrewer" package ## Not run: library("RColorBrewer") mypalette <- brewer.pal(3, "Set1") plot(m2worth, pcol = mypalette) ## End(Not run)
A list of functions that are no longer part of prefmod.
plotworth(worthmat, main = "Preferences", ylab = "Estimate", psymb = NULL, pcol = NULL, ylim = range(worthmat), ...)
plotworth(worthmat, main = "Preferences", ylab = "Estimate", psymb = NULL, pcol = NULL, ylim = range(worthmat), ...)
worthmat |
parameter matrix as generated from |
main |
main title of the plot. |
ylab |
y-axis label |
psymb |
plotsymbols for objects, see Details below |
pcol |
colours for objects, see Details below |
ylim |
limits for y-axis |
... |
further graphical parameters, use e.g. |
plotworth()
was initially used to plot worth or model parameters from LLBT or pattern models (in a matrix created by llbt.worth()
or patt.worth()
).
Now, the generic plot
(i.e., plot.wmat()
) has to be used.
## Not run: ################### ### plotworth() ### ################### # fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and plot worth parameters m2worth <- patt.worth(m2) plot.wmat(m2worth) plot.wmat(m2worth, pcol = "terrain") # display of some plotsymbols (pch) plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5) text(0:25, rep(0.95, 26), 0:25) # usage of the "RColorBrewer" package library("RColorBrewer") mypalette <- brewer.pal(3, "Set1") plot.wmat(m2worth, pcol = mypalette) ## End(Not run)
## Not run: ################### ### plotworth() ### ################### # fit only first three objects with SEX effect m2 <- pattPC.fit(cemspc, nitems = 3, formel = ~SEX, elim = ~SEX, undec = TRUE) # calculate and plot worth parameters m2worth <- patt.worth(m2) plot.wmat(m2worth) plot.wmat(m2worth, pcol = "terrain") # display of some plotsymbols (pch) plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5) text(0:25, rep(0.95, 26), 0:25) # usage of the "RColorBrewer" package library("RColorBrewer") mypalette <- brewer.pal(3, "Set1") plot.wmat(m2worth, pcol = mypalette) ## End(Not run)
Print method for objects of class pattMod
.
## S3 method for class 'pattMod' print(x, ...)
## S3 method for class 'pattMod' print(x, ...)
x |
Object of class |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
This print method generates output for fitted pattern models, i.e., for models of class pattMod
.
The functions pattPC.fit
, pattR.fit
, pattL.fit
, and pattLrep.fit
produce such objects.
Reinhold Hatzinger
res <- pattR.fit(salad, nitems = 4) res
res <- pattR.fit(salad, nitems = 4) res
The dataset contains the rankings of four salad dressings concerning tartness by 32 judges, with values ranging from 1 (most tart) to 4 (least tart).
salad
salad
A data frame with 32 observations on 4 variables (A
, B
, C
, D
) each representing a different salad dressing.
Critchlow, D. E. & Fligner, M. A. (1991). Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on GLIM. Psychometrika 56(3), 517–533.
# Example for object covariates # fit object covariates: # salads A - D have varying concentrations of acetic and gluconic acid. # The four pairs of concentrations are # A = (.5, 0), B = (.5, 10.0), C = (1.0, 0), and D = (0, 10.0), conc <- matrix(c(.5, 0, .5, 10, 1, 0, 0, 10), ncol = 2, byrow = TRUE) sal <- patt.design(salad, nitems = 4, resptype = "ranking") X <- as.matrix(sal[, 2:5]) glm(y ~ X, data = sal, family = poisson)
# Example for object covariates # fit object covariates: # salads A - D have varying concentrations of acetic and gluconic acid. # The four pairs of concentrations are # A = (.5, 0), B = (.5, 10.0), C = (1.0, 0), and D = (0, 10.0), conc <- matrix(c(.5, 0, .5, 10, 1, 0, 0, 10), ncol = 2, byrow = TRUE) sal <- patt.design(salad, nitems = 4, resptype = "ranking") X <- as.matrix(sal[, 2:5]) glm(y ~ X, data = sal, family = poisson)
The function generates a random paired comparison data matrix (two response categories, no undecided) or a rankings data matrix optionally based on user specified worth parameters.
simPC(nobj, nobs, worth = NULL, seed = NULL, pr = FALSE) simR(nobj, nobs, worth = NULL, seed = NULL, pr = FALSE)
simPC(nobj, nobs, worth = NULL, seed = NULL, pr = FALSE) simR(nobj, nobs, worth = NULL, seed = NULL, pr = FALSE)
nobj |
Number of objects. |
nobs |
Number of cases. |
worth |
If |
seed |
Starting value for the random number generator. |
pr |
If |
The random data matrix as a data frame.
Reinhold Hatzinger
data <- simPC(4, 10, worth = 1:4, seed = 123456) data
data <- simPC(4, 10, worth = 1:4, seed = 123456) data
These functions are the summary
, print
, and BIC
methods for objects of type pattNPML
.
## S3 method for class 'pattNPML' summary(object, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'pattNPML' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'pattNPML' summary(object, digits = max(3, getOption("digits") - 3), ...) ## S3 method for class 'pattNPML' print(x, digits = max(3, getOption("digits") - 3), ...)
object |
a fitted object of class |
x |
a fitted object of class |
digits |
number of digits; applied on various displayed quantities. |
... |
further arguments, which will mostly be ignored. |
The summary
and print
methods are adapted versions from the npmlreg package.
The data describes results from a paired comparison study where 68 male and 96 female students were asked whom they would prefer to interview. The potential interview partners were Bonnie Blair, Jackie Joyner, and Jennifer Capriati.
tennis
tennis
A data frame with 16 observations on the following 5 variables.
n
counts of response pattern (C1, C2, C3
)
C1
Blair vs. Joyner: (1) Blair preferred, () Joyner preferred
C2
Blair vs. Capriati: (1) Blair preferred, () Capriati preferred)
C3
Joyner vs. Capriati: (1) Joyner preferred, () Capriati preferred)
SEX
a numeric vector: (1) male, (2) female
Böckenholt, U., & Dillon, W. R., (1997). Modeling within-subject dependencies in ordinal paired comparison data. Psychometrika, 62(3), 411–434.
tdat <- expand.mat(tennis[, -1], tennis[, 1]) head(tdat)
tdat <- expand.mat(tennis[, -1], tennis[, 1]) head(tdat)
The dataset trdel
contains data from a paired comparison study to investigate which of five training delivery modes trainees prefer (Schoell and Veith, 2011).
The modes were computer-based (CO
), TV-based (TV
), paper-based (PA
), audio-based (AU
) and classroom-based (CL
) training.
Study participants were unemployed persons in the labour market training of the Austrian labour market service (AMS).
To account for trainee characteristics that might affect the preference order the variables gender, age, and learning personality type were recorded.
These variables were coded as sex
(1 male, 2 female), age
(numeric in years), ltype
(1 accomodator, 2 diverger, 3 converger, 4 assimilator).
The learning personality types were identified from a questionnaire.
trdel
trdel
A data frame with 198 observations on the following 14 variables.
V1
, V2
, V3
, V4
, V5
, V6
, V7
, V8
, V9
, V10
paired comparisons in standard order: CO:TV
, CO:PA
, etc.1
first object preferred, 2
second object preferred.
ltype
learning types: (1) accomodator, (2) diverger, (3) converger, (4) assimilator
age
numeric in years
sex
(1) male, (2) female
Schöll, B., Veith, S. (2011). Learning style evaluation and preferred training delivery modes in labour market training (in German). Master's thesis, Vienna University of Economics and Business.
head(trdel)
head(trdel)
Data to illustrate the usage of patt.design for rating scale (Likert type) items
.
xmpl
xmpl
A data frame with 100 observations on 5 numeric variables.
The first three variables (I1
, I2
, I3
) are the rating scale (Likert type) items with 5 response categories, ranging from 1 (strong agreement) to 5 (strong disagreement).
I1
response to item 1
I2
response to item 2
I3
response to item 3
SEX
(1) male, (2) female
EDU
(1) low education, (2) high education
Datasets in data files or Data frames used in patt.design
require the following structure:
All values must be numeric.
The item responses must be in the leftmost columns (such as I1
to I3
above).
Categorical subject covariates follow the item responses (rightmost columns) and their levels must be specified as consecutive integers.
If in a used datafile or dataframe these are defined as R factors they will be converted to integers.
This is not possible if characters are used as factor levels and, consequently, patt.design
will produce an error.
des <- patt.design(xmpl, nitems = 3, resptype = "rating", cov.sel = "SEX") head(des)
des <- patt.design(xmpl, nitems = 3, resptype = "rating", cov.sel = "SEX") head(des)