Package 'ccrs' reference manual

Title:	Correct and Cluster Response Style Biased Data
Description:	Functions for correcting and clustering response-style-biased preference data (CCRS). The main functions are correct.rs() for correcting for response styles, and ccrs() for simultaneously correcting and content-based clustering. The procedure begins by making rank-ordered boundary data from the given preference matrix using create.ccrsdata(). Then, in correct.rs(), the response style is corrected as follows: the rank-ordered boundary data are smoothed by I-spline functions, and the given preference data are transformed by the smoothed functions. The resulting data matrix, which is regarded as bias-corrected data, can be used with any data analysis method. To cluster respondents based on their indicated preferences (content-based clustering), ccrs() can be applied to the given (response-style-biased) preference data; it simultaneously corrects for response styles and clusters respondents based on content. The correction result can be checked with the plot.crs() function.
Authors:	Mariko Takagishi [aut, cre]
Maintainer:	Mariko Takagishi <[email protected]>
License:	GPL (>= 2)
Version:	0.1.0
Built:	2025-02-12 06:31:23 UTC
Source:	CRAN

Correcting and Clustering preference data in the presence of response style bias.

Description

Corrects and clusters response-style-biased data.

Author(s)

Mariko Takagishi

References

Takagishi, M., Velden, M. van de and Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.

Correcting and Clustering response style biased data

Description

Applies CCRS to ccrsdata.list.

Usage

ccrs(ccrsdata.list,K=K,lam=lam, tandem.initial=FALSE,
            tol = 1e-5, maxit = 50, trace = 1, nstart = 3, parallel=F,verbose=T)

Arguments

`ccrsdata.list`	A list generated by `create.ccrsdata`.
`K`	An integer indicating the number of content-based clusters used for CCRS estimation.
`lam`	A numeric value indicating `lambda` used for CCRS estimation.
`tandem.initial`	A logical value indicating whether the first initial value is generated by CCRS tandem initialization. See Section 3.3 of the reference paper for details.
`tol`	A numeric value indicating the absolute convergence tolerance.
`maxit`	An integer indicating the maximum number of iterations.
`trace`	A non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values produce more tracing information.
`nstart`	An integer indicating the number of random initial values.
`parallel`	A logical value indicating whether parallelization over starts is used.
`verbose`	A logical value indicating whether progress is printed during the iterations (only when `parallel==FALSE`).

Value

Returns a list with the following elements.

`G`	A K by m matrix of content-based cluster centroids.
`cls.cont.vec`	A vector of integers (from 1:K) indicating the content-based cluster to which each respondent is allocated.
`opt.obval`	The optimal value of the objective function.
`crs.list`	An object of class `crs`, the same as the one returned by `correct.rs`.

References

Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.

Examples

###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
###CCRS
lam <- 0.8 ; K <- 2
ccrs.list <- ccrs(ccrsdata.list,K=K,lam=lam)
###check content-based clustering result
ccrs.list$cls.cont.vec
###check correction result
plot(ccrs.list$crs.list)
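
Because the data above are simulated, the estimated content-based clusters can also be compared with the clusters used to generate the data. The following is a minimal sketch, assuming the ccrs package is installed; the seed is arbitrary, and the comparison by cross-tabulation is an illustration rather than part of the package. Cluster labels are arbitrary, so a good recovery shows up as one dominant cell per row, not necessarily on the diagonal.

```r
library(ccrs)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 30 ; m <- 10 ; q <- 5
datagene <- generate.rsdata(n=n, m=m, q=q, K.true=2, H.true=2, clustered.rs=TRUE)
ccrsdata.list <- create.ccrsdata(datagene$X, q=q)

# simultaneous correction and content-based clustering
ccrs.list <- ccrs(ccrsdata.list, K=2, lam=0.8)

# cross-tabulate estimated clusters against the clusters used in the simulation
tab <- table(estimated = ccrs.list$cls.cont.vec, true = datagene$cls.cont.vec)
tab
```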

Convert data matrix to rank-ordered boundary data

Description

Converts data matrix to rank-ordered boundary data.

Usage

convert.X2F(X,q=q)

Arguments

`X`	An n by m categorical data matrix.
`q`	An integer indicating the maximum rating.

Value

An n by q-1 matrix of scaled rank-ordered boundary data.
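
Examples

The conversion can be tried on simulated data. This is a minimal sketch, assuming the ccrs package is installed and reusing the settings from the other examples in this manual; the seed is arbitrary.

```r
library(ccrs)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 30 ; m <- 10 ; q <- 5
datagene <- generate.rsdata(n=n, m=m, q=q, K.true=2, H.true=2, clustered.rs=TRUE)
X <- datagene$X

# convert the n x m rating matrix to rank-ordered boundary data
Fmat <- convert.X2F(X, q=q)
dim(Fmat)  # documented as n by q-1
```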

Correct response-style-biased data

Description

Corrects response-style-biased data, given ccrsdata.list created by create.ccrsdata.

Usage

correct.rs(ccrsdata.list)

Arguments

`ccrsdata.list`	A list generated by `create.ccrsdata`, which contains `Fmat`, `Mmat.q1`, `Mmat.q` and `X`.

Value

Returns an object of class `crs` with the following elements.

`Beta`	An n by q-1 matrix of coefficients for response functions.
`Y.hat`	An n by m corrected data matrix.
`MB`	An n by q matrix of values of response functions evaluated at the midpoint between boundaries.

References

Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.

Examples

###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
crs.list <- correct.rs(ccrsdata.list)

Create a dataset for CCRS

Description

Creates a dataset for CCRS from a preference data matrix.

Usage

create.ccrsdata(X,q=q)

Arguments

`X`	An n by m categorical data matrix.
`q`	An integer indicating the maximum rating.

Details

For the difference between `Mmat.q` and `Mmat.q1` in the resulting list, see Section 3.2 of the reference paper.

Value

Returns a list with the following elements.

`Fmat`	An n by q-1 matrix of scaled rank-ordered boundary data.
`Mmat.q1`	A q-1 by 3+1 matrix of I-spline basis functions, evaluated at the boundaries. +1 indicates all 0 intercepts.
`Mmat.q`	A q by 3+1 matrix of I-spline basis functions, evaluated at the midpoints between boundaries.
`X`	An n by m categorical data matrix same as the input `X`.

References

Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.
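
Examples

The elements of the returned list can be inspected directly. This is a minimal sketch, assuming the ccrs package is installed; the simulation settings are taken from the other examples in this manual and the seed is arbitrary.

```r
library(ccrs)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 30 ; m <- 10 ; q <- 5
datagene <- generate.rsdata(n=n, m=m, q=q, K.true=2, H.true=2, clustered.rs=TRUE)

ccrsdata.list <- create.ccrsdata(datagene$X, q=q)

# element names and dimensions, to compare with the Value section above
names(ccrsdata.list)
lapply(ccrsdata.list, dim)
```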

Simulate preference data to apply CCRS

Description

Simulates artificial preference data containing content-based (and response-style-based) clusters.

Usage

generate.rsdata(n=n,m=m,q=q,K.true=K.true,H.true=NULL,clustered.rs=FALSE,
              cls.cont.vec=NULL,cls.rs.vec=NULL,savedata=FALSE)

Arguments

`n`	An integer indicating the number of respondents.
`m`	An integer indicating the number of items.
`q`	An integer indicating the maximum rating.
`K.true`	An integer indicating the true number of content-based clusters for n respondents.
`H.true`	An integer indicating the true number of response-style-based clusters for n respondents. This is needed when `clustered.rs=TRUE`.
`clustered.rs`	A logical value indicating whether response-style-based cluster structure exists in generated data. If `TRUE`, coefficients of I-spline are generated by response-style-based clusters. The default is `clustered.rs=FALSE`.
`cls.cont.vec`	A vector of integers (from 1:K.true) of length n indicating the content-based cluster to which each respondent is allocated in artificial data. If it's `NULL`, it is generated automatically.
`cls.rs.vec`	A vector of integers (from 1:H.true) of length n indicating the response-style-based clusters. If it is `NULL` and `clustered.rs=TRUE`, it is generated randomly.
`savedata`	A logical value indicating whether artificial data are saved as csv files. The default is `savedata=FALSE`.

Value

A list with the following elements:

`X`	An n by m matrix of categorical variables.
`X.star`	An n by m matrix of true preference data `X^*`.
`X.nors`	An n by m matrix of categorical variables transformed by reference boundaries.
`cls.cont.vec`	A vector of integers (from 1:K.true) indicating content-based clusters used to generate artificial data.
`cls.rs.vec`	A vector of integers (from 1:H.true) indicating response-style-based clusters used to generate artificial data.

References

Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.

Examples

#data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
#obtain n x m data matrix
X <- datagene$X

Plot `crs` objects

Description

Plots results of correction (1st plot: estimated response functions, 2nd plot: coefficient plot. See Appendix A of the reference paper for the 2nd plot).

Usage

## S3 method for class 'crs'
plot(x, H = NULL, cls.rs.vec = NULL, ...)

Arguments

`x`	An object of class `crs`.
`H`	An integer indicating the number of response-style-based clusters to display the correction result. If `H=NULL` and `cls.rs.vec=NULL`, `H` is set as `H=n`. If `H=NULL` but `cls.rs.vec!=NULL`, `H` is set as `H=max(cls.rs.vec)`. The default is `H=NULL`.
`cls.rs.vec`	An integer vector of length n indicating response-style-based clusters for n respondents. If `cls.rs.vec=NULL` and `H!=NULL`, clusters are determined by k-means clustering of Beta. The default is `cls.rs.vec=NULL`.
`...`	Additional arguments passed to `plot`.

Details

Correction results for each respondent are displayed. If either response-style-based clusters or the number of response-style-based clusters are specified, the correction results of response-style-based clusters are displayed.
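
The Arguments section states that, when only `H` is given, clusters are determined by k-means clustering of `Beta`. The following base R sketch illustrates that idea on a simulated stand-in for `Beta`. It is not the package's internal code; the simulated matrix and the `nstart` setting are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 30 ; q <- 5 ; H <- 2

# simulated stand-in for the n by q-1 coefficient matrix `Beta` of a crs object:
# two groups of 15 respondents with clearly different coefficients
Beta <- rbind(
  matrix(rnorm(15 * (q - 1), mean = 0.2, sd = 0.05), nrow = 15),
  matrix(rnorm(15 * (q - 1), mean = 0.8, sd = 0.05), nrow = 15)
)

# k-means on the rows of Beta gives one response-style cluster label per respondent
km <- kmeans(Beta, centers = H, nstart = 10)
cls.rs.vec <- km$cluster
table(cls.rs.vec)
```

A vector obtained this way has the form expected by the `cls.rs.vec` argument: an integer vector of length n with values in 1:H.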

References

Takagishi, M., Velden, M. van de & Yadohisa, H. (2019). Clustering preference data in the presence of response style bias, to appear in British Journal of Mathematical and Statistical Psychology.

Examples

###data setting
n <- 30 ; m <- 10 ; H.true <- 2 ; K.true <- 2 ; q <- 5
datagene <- generate.rsdata(n=n,m=m,K.true=K.true,H.true=H.true,q=q,clustered.rs = TRUE)
###obtain n x m data matrix
X <- datagene$X
ccrsdata.list <- create.ccrsdata(X,q=q)
crs.list <- correct.rs(ccrsdata.list)
###You can check the correction result using the plot method for crs objects.
plot(crs.list)

#####You can also check correction result obtained
#####by a simultaneous analysis of correction and content-based clustering.
###CCRS
lam <- 0.8 ; K <- 2
ccrs.list <- ccrs(ccrsdata.list,K=K,lam=lam)
###check the correction result using the plot method for crs objects.
plot(ccrs.list$crs.list)

Transform data by the estimated response function

Description

Transforms data matrix by estimated response functions.

Usage

transformRSdata(X,Beta=Beta,Mmat.q=Mmat.q)

Arguments

`X`	An n by m categorical data matrix.
`Beta`	An n by q-1 matrix of coefficients for response functions.
`Mmat.q`	A q by 3+1 matrix of I-spline basis functions, evaluated at the midpoints between boundaries.

Value

Returns a list with the following elements.

`Y.hat`	An n by m corrected data matrix.
`MB`	An n by q matrix of values of response functions evaluated at the midpoint between boundaries.
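
Examples

The transformation can be applied by hand to the output of the other functions. This is a minimal sketch, assuming the ccrs package is installed and assuming that the `Beta` element returned by `correct.rs` and the `Mmat.q` element returned by `create.ccrsdata` can be passed directly as the `Beta` and `Mmat.q` arguments; the seed is arbitrary.

```r
library(ccrs)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 30 ; m <- 10 ; q <- 5
datagene <- generate.rsdata(n=n, m=m, q=q, K.true=2, H.true=2, clustered.rs=TRUE)
X <- datagene$X

ccrsdata.list <- create.ccrsdata(X, q=q)
crs.list <- correct.rs(ccrsdata.list)

# transform the observed ratings with the estimated response functions
res <- transformRSdata(X, Beta=crs.list$Beta, Mmat.q=ccrsdata.list$Mmat.q)
dim(res$Y.hat)  # documented as n by m
dim(res$MB)     # documented as n by q
```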
