Title: | Correct and Cluster Response Style Biased Data |
---|---|
Description: | Functions for correcting and clustering response-style-biased preference data (CCRS). The main functions are correct.rs() for correcting for response styles, and ccrs() for simultaneously correcting and content-based clustering. The procedure begins by creating rank-ordered boundary data from the given preference matrix with create.ccrsdata(). Then, in correct.rs(), the response style is corrected as follows: the rank-ordered boundary data are smoothed by I-spline functions, and the given preference data are transformed by the smoothed functions. The resulting data matrix, which is regarded as bias-corrected data, can be used with any data analysis method. To cluster respondents based on their indicated preferences (content-based clustering), apply ccrs() to the given (response-style-biased) preference data; it simultaneously corrects for response styles and clusters respondents based on content. The correction result can be checked with the plot.crs() function. |
Authors: | Mariko Takagishi [aut, cre] |
Maintainer: | Mariko Takagishi <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-12-14 06:24:40 UTC |
Source: | CRAN |
Corrects and clusters response-style-biased data.
Mariko Takagishi
Takagishi, M., Velden, M. van de and Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
Applies CCRS to ccrsdata.list.
ccrs(ccrsdata.list,K=K,lam=lam, tandem.initial=FALSE, tol = 1e-5, maxit = 50, trace = 1, nstart = 3, parallel=F,verbose=T)
ccrsdata.list |
A list generated by |
K |
An integer indicating the number of content-based clusters used for CCRS estimation. |
lam |
A numeric value indicating |
tandem.initial |
A logical value indicating whether the 1st initial value is generated by CCRS tandem initialization. See Section 3.3 in the paper for the detail. |
tol |
A numeric value indicating the absolute convergence tolerance |
maxit |
An integer indicating the maximum number of iterations |
trace |
A non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values produce more tracing information. |
nstart |
An integer indicating the number of random initial values. |
parallel |
A logical value indicating whether parallelization over starts is used. |
verbose |
A logical value indicating whether the progress is printed during the iteration (only when |
Returns a list with the following elements.
G |
A K by m matrix of content-based cluster centroids. |
cls.cont.vec |
A vector of integers (from 1:K) indicating the content-based cluster to which each respondent is allocated. |
opt.obval |
The optimal value of the objective function. |
crs.list |
A list of class |
Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
###CCRS
lam <- 0.8 ; K <- 2
ccrs.list <- ccrs(ccrsdata.list,K=K,lam=lam)
###check content-based clustering result
ccrs.list$cls.cont.vec
###check correction result
plot(ccrs.list$crs.list)
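When the data are simulated, the estimated content-based clusters can be compared with the true ones by cross-tabulation. The following is a minimal base R sketch, not part of the package: the vectors est and truth are made-up stand-ins for ccrs.list$cls.cont.vec and datagene$cls.cont.vec, and the agreement measure shown only handles K = 2, where cluster labels can be swapped in exactly one way.

```r
# Made-up stand-ins for ccrs.list$cls.cont.vec and datagene$cls.cont.vec
est   <- c(1, 1, 2, 2, 2, 1)
truth <- c(2, 2, 1, 1, 1, 1)

# Cross-tabulate estimated against true cluster labels
tab <- table(estimated = est, true = truth)
print(tab)

# Cluster labels are arbitrary, so for K = 2 take the better of the two
# possible labelings (diagonal versus anti-diagonal of the 2 x 2 table)
matched   <- sum(diag(tab))
agreement <- max(matched, sum(tab) - matched) / sum(tab)
print(agreement)
```

For K greater than 2, a label-invariant measure such as the adjusted Rand index would be a more suitable choice than this simple swap.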
Converts data matrix to rank-ordered boundary data.
convert.X2F(X,q=q)
X |
An n by m categorical data matrix. |
q |
An integer indicating the maximum rating. |
An n by q-1 matrix of scaled rank-ordered boundary data.
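To give an intuition for what boundary data with q-1 columns can look like, the sketch below computes, for each respondent, the cumulative proportion of ratings at or below each category 1, ..., q-1. This is an assumption made for illustration only: boundary_sketch is a hypothetical helper, not a function of the package, and the scaling used by convert.X2F may differ.

```r
# Hypothetical helper (not part of the package): cumulative proportion of
# ratings <= c for c = 1, ..., q-1, computed separately for each respondent
boundary_sketch <- function(X, q) {
  t(apply(X, 1, function(x) {
    counts <- tabulate(x, nbins = q)   # frequency of each rating 1..q
    cumsum(counts)[-q] / length(x)     # drop the last value, which is always 1
  }))
}

# Two made-up respondents rating m = 5 items on a 5-point scale
X <- matrix(c(1, 2, 3, 4, 5,
              5, 5, 5, 4, 4), nrow = 2, byrow = TRUE)
Fmat <- boundary_sketch(X, q = 5)
print(dim(Fmat))   # n by q-1
print(Fmat)
```

The second respondent uses only the top categories, so the first three boundaries collapse to zero, which is the kind of respondent-specific pattern a response style produces.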
Corrects response-style-biased data, given ccrsdata.list created by create.ccrsdata.
correct.rs(ccrsdata.list)
ccrsdata.list |
A list generated by |
Returns an object of class crs with the following elements.
Beta |
An n by q-1 matrix of coefficients for response functions. |
Y.hat |
An n by m corrected data matrix. |
MB |
An n by q matrix of values of response functions evaluated at the midpoint between boundaries. |
Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
crs.list <- correct.rs(ccrsdata.list)
Creates a dataset for CCRS from a preference data matrix.
create.ccrsdata(X,q=q)
X |
An n by m categorical data matrix. |
q |
An integer indicating the maximum rating. |
For the difference between Mmat.q and Mmat.q1 in the resulting list, see Section 3.2 in the reference paper.
Returns a list with the following elements.
Fmat |
An n by q-1 matrix of scaled rank-ordered boundary data. |
Mmat.q1 |
A q-1 by 3+1 matrix of I-spline basis functions, evaluated at the boundaries. +1 indicates all 0 intercepts. |
Mmat.q |
A q by 3+1 matrix of I-spline basis functions, evaluated at the midpoints between boundaries. |
X |
The n by m categorical data matrix, identical to the input. |
Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
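Mmat.q1 has q-1 rows and Mmat.q has q rows because q-1 boundaries separate q rating categories, and each category has exactly one midpoint. The base R sketch below only illustrates this count. It assumes equally spaced boundaries on [0, 1], which is an illustrative choice and not necessarily the evaluation points the package uses, and it does not compute the I-spline basis itself.

```r
q <- 5

# Assumption for illustration: equally spaced boundaries on [0, 1]
boundaries <- seq_len(q - 1) / q

# Add the end points 0 and 1, then average neighbouring edges
edges     <- c(0, boundaries, 1)
midpoints <- (head(edges, -1) + tail(edges, -1)) / 2

print(length(boundaries))   # q - 1 evaluation points (rows of Mmat.q1)
print(length(midpoints))    # q evaluation points (rows of Mmat.q)
print(midpoints)
```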
Simulates artificial preference data containing content-based (and response-style-based) clusters.
generate.rsdata(n=n,m=m,q=q,K.true=K.true,H.true=NULL,clustered.rs=FALSE, cls.cont.vec=NULL,cls.rs.vec=NULL,savedata=FALSE)
n |
An integer indicating the number of respondents. |
m |
An integer indicating the number of items. |
q |
An integer indicating the maximum rating. |
K.true |
An integer indicating the true number of content-based clusters for n respondents. |
H.true |
An integer indicating the true number of response-style-based clusters for n respondents. This is needed when |
clustered.rs |
A logical value indicating whether response-style-based cluster structure exists in generated data. If |
cls.cont.vec |
A vector of integers (from 1:K.true) of length n indicating the content-based cluster to which each respondent is allocated in artificial data. If it's |
cls.rs.vec |
A vector of integers (from 1:H.true) of length n indicating the response-style-based clusters. If it's |
savedata |
A logical value indicating whether artificial data are saved as csv files. The default is |
A list with the following elements:
X |
An n by m matrix of categorical variables. |
X.star |
An n by m matrix of true preference data |
X.nors |
An n by m matrix of categorical variables transformed by reference boundaries. |
cls.cont.vec |
A vector of integers (from 1:K.true) indicating content-based clusters used to generate artificial data. |
cls.rs.vec |
A vector of integers (from 1:H.true) indicating response-style-based clusters used to generate artificial data. |
Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
#data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
#obtain n x m data matrix
X <- datagene$X
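The relation between X.star, X.nors and X can be pictured as cutting the same latent preferences with different sets of boundaries. The base R sketch below is an illustration under stated assumptions, not the package's generator: the latent values, the equally spaced reference boundaries, and the boundaries of the hypothetical acquiescent respondent (acq.bounds) are all made up.

```r
set.seed(1)
q <- 5

# Made-up latent preferences in [0, 1] (playing the role of X.star)
x.star <- runif(8)

# Assumed reference boundaries: equally spaced, 0.2, 0.4, 0.6, 0.8
ref.bounds <- seq_len(q - 1) / q

# Hypothetical boundaries of an acquiescent respondent: shifted downwards,
# so the same latent value is mapped to a higher rating
acq.bounds <- c(0.05, 0.10, 0.20, 0.40)

# Ratings without response style (role of X.nors) and with it (role of X)
x.nors <- findInterval(x.star, ref.bounds) + 1
x.acq  <- findInterval(x.star, acq.bounds) + 1

print(rbind(x.nors, x.acq))
```

Because every acquiescent boundary lies at or below the corresponding reference boundary, the biased rating is never lower than the unbiased one, even though the underlying preferences are identical.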
Plots results of correction for crs objects (1st plot: estimated response functions, 2nd plot: coefficient plot. See Appendix A of the reference paper for the 2nd plot).
## S3 method for class 'crs'
plot(x, H = NULL, cls.rs.vec = NULL, ...)
x |
An object of class |
H |
An integer indicating the number of response-style-based clusters to display the correction result. If |
cls.rs.vec |
An integer vector of length n indicating response-style-based clusters for n respondents. If |
... |
Additional arguments passed to |
Correction results for each respondent are displayed. If either the response-style-based clusters or the number of response-style-based clusters is specified, the correction results of the response-style-based clusters are displayed.
Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
crs.list <- correct.rs(ccrsdata.list)
###You can check the correction result using the plot method for crs objects.
plot(crs.list)
#####You can also check the correction result obtained
#####by a simultaneous analysis of correction and content-based clustering.
###CCRS
lam <- 0.8 ; K <- 2
ccrs.list <- ccrs(ccrsdata.list,K=K,lam=lam)
###check the correction result using the plot method for crs objects.
plot(ccrs.list$crs.list)
Transforms data matrix by estimated response functions.
transformRSdata(X,Beta=Beta,Mmat.q=Mmat.q)
X |
An n by m categorical data matrix. |
Beta |
An n by q-1 matrix of coefficients for response functions. |
Mmat.q |
A q by 3+1 matrix of I-spline basis functions, evaluated at the midpoints between boundaries. |
Returns a list with the following elements.
Y.hat |
An n by m corrected data matrix. |
MB |
An n by q matrix of values of response functions evaluated at the midpoint between boundaries. |
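One way to read the relation between MB and Y.hat is as a per-respondent lookup: each rating X[i, j] is replaced by respondent i's response function value for that category. The base R sketch below shows such a lookup with made-up numbers. It is an assumption about the idea behind the transformation, not the package's implementation, and it takes MB as given rather than computing it from Beta and Mmat.q.

```r
n <- 3; q <- 5

# Made-up n by q matrix standing in for MB: row i holds respondent i's
# response function values for categories 1..q (increasing within each row)
MB <- rbind(c(0.10, 0.30, 0.50, 0.70, 0.90),
            c(0.02, 0.05, 0.15, 0.30, 0.70),
            c(0.30, 0.70, 0.85, 0.95, 0.98))

# Made-up n by m rating matrix
X <- rbind(c(1, 3, 5, 2),
           c(5, 5, 4, 3),
           c(1, 1, 2, 3))

# Look up MB[i, X[i, j]] for every cell, using (row, column) index pairs
idx   <- cbind(as.vector(row(X)), as.vector(X))
Y.hat <- matrix(MB[idx], nrow = n)

print(Y.hat)
```

In this made-up example, the rating 5 maps to 0.90 for the first respondent but only to 0.70 for the second, so identical ratings can yield different corrected values depending on the respondent's response function.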