Title: | Sample Generation by Replacement |
---|---|
Description: | Sample Generation by Replacement simulations (SGR; Lombardi & Pastore, 2014; Pastore & Lombardi, 2014). The package can be used to perform fake data analysis according to the sample generation by replacement approach. It includes functions for making simple inferences about discrete/ordinal fake data. The package allows users to study the implications of fake data for empirical results. |
Authors: | Massimiliano Pastore & Luigi Lombardi |
Maintainer: | Massimiliano Pastore <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.1 |
Built: | 2025-01-24 06:38:57 UTC |
Source: | CRAN |
Average root mean square error (AMSE).
amse(Bpar, B0)
Bpar | Matrix with dimension |
B0 | Vector of true parameter values. |
Let $b_{ij}$ be the estimated parameter value for the $i$-th parameter in the $j$-th sample (replicate), and let $\beta_i$ be the corresponding true parameter value. The average root mean square error is defined as follows:
Gives the AMSE value.
If the true parameter value is zero, the ratio is modified as follows:
Massimiliano Pastore & Luigi Lombardi
Yang-Wallentin, F., Joreskog, K. G., Luo, H. (2010). Confirmatory Factor Analysis of Ordinal Variables With Misspecified Models, Structural Equation Modeling: A Multidisciplinary Journal, 17, 392-423.
Average relative bias (ARB).
arb(Bpar, B0)
Bpar | Matrix with dimension |
B0 | Vector of true parameter values. |
Let $b_{ij}$ be the estimated parameter value for the $i$-th parameter in the $j$-th sample (replicate), and let $\beta_i$ be the corresponding true parameter value. The average relative bias is defined as follows:
Gives the ARB value.
If the true parameter value is zero, the ratio is modified as follows:
Massimiliano Pastore & Luigi Lombardi
Yang-Wallentin, F., Joreskog, K. G., Luo, H. (2010). Confirmatory Factor Analysis of Ordinal Variables With Misspecified Models, Structural Equation Modeling: A Multidisciplinary Journal, 17, 392-423.
The generalized beta distribution extends the classical beta distribution beyond the [0,1] range (Whitby, 1971).
dgBeta(x, a = min(x), b = max(x), gam = 1, del = 1)
x | Vector of quantiles. |
a | Minimum of range of r.v. |
b | Maximum of range of r.v. |
gam | Gamma parameter. |
del | Delta parameter. |
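The Generalized Beta Distribution is defined as follows:

$$G(x; a, b, \gamma, \delta) = \frac{1}{B(\gamma, \delta)\,(b - a)^{\gamma + \delta - 1}} (x - a)^{\gamma - 1} (b - x)^{\delta - 1}$$

where $B(\gamma, \delta)$ is the beta function. The parameters $a$ and $b$ (with $a < b$) are the left and right end points, respectively. The parameters $\gamma$ and $\delta$ are the governing shape parameters for $a$ and $b$, respectively. For all the values of the r.v. $X$ that fall outside the interval $[a, b]$, $G$ simply takes value 0. The generalized beta distribution reduces to the beta distribution when $a = 0$ and $b = 1$.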
Gives the density.
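As a rough illustration of the density described above, the following base R sketch evaluates a generalized beta density on an arbitrary interval. The function name `gbeta_density` is hypothetical and is not part of the package; the formula used is the standard four-parameter beta density, which is assumed here to match what dgBeta computes.

```r
# Sketch only: four-parameter (generalized) beta density on [a, b].
# `gbeta_density` is a hypothetical helper, not the package's dgBeta().
gbeta_density <- function(x, a, b, gam = 1, del = 1) {
  stopifnot(a < b, gam > 0, del > 0)
  inside <- x >= a & x <= b          # the density is 0 outside [a, b]
  dens <- numeric(length(x))
  dens[inside] <- (x[inside] - a)^(gam - 1) * (b - x[inside])^(del - 1) /
    (beta(gam, del) * (b - a)^(gam + del - 1))
  dens
}

# With a = 0 and b = 1 the sketch should agree with the classical beta density.
xs <- c(0.2, 0.5, 0.9)
all.equal(gbeta_density(xs, 0, 1, gam = 2, del = 3), dbeta(xs, 2, 3))
```

Rescaling the interval changes the support but not the total mass, so the density should still integrate to one on any `[a, b]`.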
Massimiliano Pastore & Luigi Lombardi
Whitby, O. (1971). Estimation of parameters in the generalized beta distribution (Technical Report No. 29). Department of Statistics: Stanford University.
curve(dgBeta(x))
curve(dgBeta(x,gam=3,del=3))
curve(dgBeta(x,gam=1.5,del=2.5))

x <- 1:7
GA <- c(1,3,1.5,8); DE <- c(1,3,4,2.5)
par(mfrow=c(2,2))
for (j in 1:4) {
  plot(x,dgBeta(x,gam=GA[j],del=DE[j]),type="h",
       panel.first=points(x,dgBeta(x,gam=GA[j],del=DE[j]),pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.6),
       ylab="dgBeta(x)")
}
Generalized Beta distribution for discrete variables.
dgBetaD(x, a = min(x), b = max(x), gam = 1, del = 1, ct = 1)
x | Vector of quantiles. |
a | Minimum of range of r.v. |
b | Maximum of range of r.v. |
gam | Gamma parameter. |
del | Delta parameter. |
ct | Correction term, default value: 1. |
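Let $X$ be a discrete r.v. with range $\{a, \ldots, b\}$, where $a$ and $b$ are integers. The Generalized Discrete Beta Distribution for the r.v. $X$ is defined as follows:

$$DG(x; a, b, \gamma, \delta) = \frac{G^{*}(x; a, b, \gamma, \delta)}{\sum_{y=a}^{b} G^{*}(y; a, b, \gamma, \delta)}$$

where $G^{*}$ is a modified version of the generalized beta distribution (dgBeta) defined as

$$G^{*}(x; a, b, \gamma, \delta) = \frac{1}{B(\gamma, \delta)\,(b - a + 2c)^{\gamma + \delta - 1}} (x - a + c)^{\gamma - 1} (b - x + c)^{\delta - 1}$$

with $c$ denoting the correction term ct.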
Gives the density.
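The normalization step above can be sketched in base R. This is only an illustration under an assumption: the correction term `ct` is taken to widen the support by `ct` on each side, so that the end points `a` and `b` keep non-zero mass. The name `dgbeta_discrete` is hypothetical and is not the package's dgBetaD.

```r
# Sketch only, assuming ct widens the support by ct on each side.
# `dgbeta_discrete` is a hypothetical helper, not the package's dgBetaD().
dgbeta_discrete <- function(x, a = min(x), b = max(x), gam = 1, del = 1, ct = 1) {
  support <- a:b
  # Unnormalized kernel; constant factors cancel in the normalization below.
  kernel <- function(v) (v - a + ct)^(gam - 1) * (b - v + ct)^(del - 1)
  ifelse(x %in% support, kernel(x) / sum(kernel(support)), 0)
}

x <- 1:7
probs <- dgbeta_discrete(x, gam = 3, del = 4)
sum(probs)                            # the probabilities sum to one
dgbeta_discrete(x, gam = 1, del = 1)  # gam = del = 1 gives a uniform distribution
```

Under this assumption a degenerate range (`a` equal to `b`) still yields a proper distribution with all mass on the single value.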
Massimiliano Pastore & Luigi Lombardi
Lombardi, L., Pastore, M. (2014). sgr: A Package for Simulating Conditional Fake Ordinal Data. The R Journal, 6, 164-177.
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
x <- 1:7
GA <- c(1,3,1.5,8); DE <- c(1,3,4,2.5)
par(mfrow=c(2,2))
for (j in 1:4) {
  plot(x,dgBetaD(x,gam=GA[j],del=DE[j]),type="h",
       panel.first=points(x,dgBetaD(x,gam=GA[j],del=DE[j]),pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.6),
       ylab="dgBetaD(x)")
}
Set different instances of the conditional replacement distribution.
model.fake.par(fake.model = c("uninformative", "average", "slight", "extreme"))
fake.model | A character string indicating the model for the conditional replacement distribution. The options are: "uninformative", "average", "slight", "extreme". |
Gives a list with gamma and delta parameters.
Massimiliano Pastore
Lombardi, L., Pastore, M. (2014). sgr: A Package for Simulating Conditional Fake Ordinal Data. The R Journal, 6, 164-177.
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
model.fake.par() # default
model.fake.par("average")
This function sets different replacement distributions for different subsets of cells in the data matrix.
partition.replacement(Dx, PM, Q = NULL, Pparm = NULL, fake.model = NULL, p = NULL)
Dx | Data frame or matrix to be replaced. |
PM | Partition matrix with size dim(Dx). |
Q | Max value in the discrete r.v. range: $1, \ldots, Q$. |
Pparm | List of replacement parameters for each class in the replacement partition. See details. |
fake.model | A character string indicating the model for the conditional replacement distribution, see |
p | Overall probability of replacement. Must be a matrix with |
PM has size dim(Dx) and contains a numeric code for each distinct class in the partition. If a cell of the partition matrix PM contains 0, then the corresponding Dx cell value is not modified (no-replacement class).
Pparm is a list containing three elements. Each element is a $C \times 2$ matrix, where $C$ is the total number of classes in the partition (see examples for further details).
p: Overall probability of replacement: p[,1] indicates the faking good probability, p[,2] indicates the faking bad probability.
gam: Gamma parameter: gam[,1] and gam[,2] indicate the faking good and the faking bad parameters for the lower bound a.
del: Delta parameter: del[,1] and del[,2] indicate the faking good and the faking bad parameters for the upper bound b.
Note that it is possible to define a faking model using the fake.model argument. In such cases the user must also specify the overall probability of replacement using the parameter p.
Returns the fake data matrix.
Massimiliano Pastore
require(MASS)
set.seed(20130207)
R <- matrix(c(1,.3,.3,1),2,2)
Dx <- rdatagen(n=20,R=R,Q=5)$data

## partition matrix
PM <- matrix(0,nrow(Dx),ncol(Dx))
PM[3:10,2] <- 1
PM[3:10,1] <- 1
partition.replacement(Dx,PM) # warning! zero replacements

## using fake.model
partition.replacement(Dx,PM,fake.model="uninformative",p=matrix(c(.3,.2),ncol=2))

###
p <- c(.5,0)
gam <- c(1,1)
del <- c(1,1)
Pparm <- list(p=p,gam=gam,del=del)
partition.replacement(Dx,PM,Pparm=Pparm)

### another partition
PM[11:15,2] <- 2
(Pparm <- list(p=matrix(c(0,.5,.5,0),2,2),
               gam=matrix(c(1,4,1,4),2,2),del=matrix(c(1,2,1,2),2,2)))
partition.replacement(Dx,PM,Pparm=Pparm)
The function gives the conditional probability of replacement for discrete values in the range $1, \ldots, Q$.
pfake(k, h = k, p = c(0,0), Q = 5, gam = c(1,1), del = c(1,1), fake.model = c("uninformative", "average", "slight", "extreme"))
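k | A fake value. |
h | An observed original value. |
p | Overall probability of replacement: p[1] indicates the faking good probability, p[2] indicates the faking bad probability. |
Q | Max value in the discrete r.v. range: $1, \ldots, Q$. |
gam | Gamma parameter: gam[1] and gam[2] indicate the faking good and the faking bad parameters. |
del | Delta parameter: del[1] and del[2] indicate the faking good and the faking bad parameters. |
fake.model | A character string indicating the model for the conditional replacement distribution. The options are: "uninformative", "average", "slight", "extreme". |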
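Gives the conditional probability distribution based on the following equation

$$P(f = k \mid d = h, \theta) = \begin{cases} \pi_G \, DG(k; h + 1, Q, \gamma_G, \delta_G), & 1 \le h < k \le Q \\ \pi_B \, DG(k; 1, h - 1, \gamma_B, \delta_B), & 1 \le k < h \le Q \\ 1 - (\pi_G + \pi_B), & 1 < h = k < Q \\ 1 - \pi_G, & h = k = 1 \\ 1 - \pi_B, & h = k = Q \end{cases}$$

with $\theta$ and $DG$ being the parameter vector and the generalized Beta distribution for discrete variables (dgBetaD) with bounds $a = h + 1$ (resp. $a = 1$) and $b = Q$ (resp. $b = h - 1$). The parameter $\pi_G$ denotes the probability of faking good, $\pi_B$ indicates the probability of faking bad.
Note that the faking probabilities must satisfy the following condition: $\pi_G + \pi_B \le 1$.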
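To make the structure of the conditional replacement distribution concrete, here is a minimal base R sketch of the simplest case, in which replacement values are drawn uniformly (this is assumed to correspond to gam = del = 1). The function `p_uniform_fake` is hypothetical and is not the package's pfake; it only illustrates how the faking good and faking bad probabilities split the mass above and below the observed value.

```r
# Sketch only: conditional replacement probability P(k | h) when replacement
# values are uniform over the admissible range (assumed gam = del = 1 case).
# `p_uniform_fake` is a hypothetical helper, not the package's pfake().
p_uniform_fake <- function(k, h, Q = 5, p = c(0, 0)) {
  stopifnot(sum(p) <= 1, k %in% 1:Q, h %in% 1:Q)
  if (k > h) return(p[1] / (Q - h))   # faking good: k in (h + 1):Q
  if (k < h) return(p[2] / (h - 1))   # faking bad: k in 1:(h - 1)
  # k == h: probability that the observed value is kept
  1 - (p[1] * (h < Q) + p[2] * (h > 1))
}

# For every observed value h, the probabilities over k = 1..Q sum to one.
Q <- 7
sapply(1:Q, function(h) {
  sum(sapply(1:Q, function(k) p_uniform_fake(k, h, Q = Q, p = c(.4, .4))))
})
```

At the extremes only one direction of faking is possible, which is why the kept-value probability drops a single term when `h` is 1 or `Q`.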
Massimiliano Pastore & Luigi Lombardi
Lombardi, L., Pastore, M. (2014). sgr: A Package for Simulating Conditional Fake Ordinal Data. The R Journal, 6, 164-177.
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
x <- 1:7
GA <- c(1,3,1.5,8); DE <- c(1,3,4,2.5)

### fake good
par(mfrow=c(2,2))
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfake(x[i],h=4,Q=7,
      gam=c(GA[j],GA[j]),del=c(DE[j],DE[j]),p=c(.4,0)))
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.7),
       ylab="Replacement probability")
}

### fake bad
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfake(x[i],h=4,Q=7,
      gam=c(GA[j],GA[j]),del=c(DE[j],DE[j]),p=c(0,.4)))
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.7),
       ylab="Replacement probability")
}

### fake good and fake bad
P <- c(.4,.4)
par(mfrow=c(2,4))
for (j in x) {
  y <- NULL
  for (i in x) {
    y <- c(y,pfake(x[i],h=x[j],Q=max(x),gam=c(GA[1],GA[1]),del=c(DE[1],DE[1]),p=P))
  }
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste("h=",x[j],sep=""),ylim=c(0,1),
       ylab="Replacement probability")
  print(sum(y,na.rm=TRUE))
}

### using the fake.model argument
x <- 1:5
models <- c("uninformative","average","slight","extreme")

par(mfrow=c(2,2))
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfake(x[i],h=2,Q=max(x),
      fake.model=models[j],p=c(.45,0))) # fake good
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste(models[j],"model"),ylim=c(0,1),
       ylab="Replacement probability")
}

par(mfrow=c(2,2))
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfake(x[i],h=4,Q=max(x),
      fake.model=models[j],p=c(0,.45))) # fake bad
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste(models[j],"model"),ylim=c(0,1),
       ylab="Replacement probability")
}
The function gives the conditional probability of replacement for discrete values in the range $1, \ldots, Q$.
pfakebad(k, h = k, p = 0, Q = 5, gam = 1, del = 1)
k | A fake value. |
h | An observed original value. |
p | Overall probability of replacement. |
Q | Max value in the discrete r.v. range: $1, \ldots, Q$. |
gam | Gamma parameter. |
del | Delta parameter. |
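Gives the conditional probability based on the following equation

$$P(f = k \mid d = h, \theta) = \begin{cases} 1, & h = k = 1 \\ \pi \, DG(k; 1, h - 1, \gamma, \delta), & 1 \le k < h \le Q \\ 1 - \pi, & 1 < h = k \le Q \\ 0, & 1 \le h < k \le Q \end{cases}$$

with $\theta$ and $DG$ being the parameter vector and the generalized Beta distribution for discrete variables (dgBetaD) with bounds $a = 1$ and $b = h - 1$. The parameter $\pi$ denotes the probability of faking bad.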
Massimiliano Pastore & Luigi Lombardi
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
x <- 1:7
GA <- c(1,3,1.5,8); DE <- c(1,3,4,2.5)
par(mfrow=c(2,2))
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfakebad(x[i],h=5,Q=7,gam=GA[j],del=DE[j],p=.4))
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.7),
       ylab="Replacement probability")
}
The function gives the conditional probability of replacement for discrete values in the range $1, \ldots, Q$.
pfakegood(k, h = k, p = 0, Q = 5, gam = 1, del = 1)
k | A fake value. |
h | An observed original value. |
p | Overall probability of replacement. |
Q | Max value in the discrete r.v. range: $1, \ldots, Q$. |
gam | Gamma parameter. |
del | Delta parameter. |
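Gives the conditional probability based on the following equation

$$P(f = k \mid d = h, \theta) = \begin{cases} 1, & h = k = Q \\ \pi \, DG(k; h + 1, Q, \gamma, \delta), & 1 \le h < k \le Q \\ 1 - \pi, & 1 \le h = k < Q \\ 0, & 1 \le k < h \le Q \end{cases}$$

with $\theta$ and $DG$ being the parameter vector and the generalized Beta distribution for discrete variables (dgBetaD) with bounds $a = h + 1$ and $b = Q$. The parameter $\pi$ denotes the probability of faking good.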
Massimiliano Pastore & Luigi Lombardi
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
x <- 1:7
GA <- c(1,3,1.5,8); DE <- c(1,3,4,2.5)
par(mfrow=c(2,2))
for (j in 1:4) {
  y <- NULL
  for (i in x) y <- c(y,pfakegood(x[i],h=3,Q=7,gam=GA[j],del=DE[j],p=.4))
  plot(x,y,type="h",panel.first=points(x,y,pch=19),
       main=paste("gamma=",GA[j]," delta=",DE[j],sep=""),ylim=c(0,.7),
       ylab="Replacement probability")
}
The psydata data frame has 744 rows (observations) and 22 columns (variables).
data(psydata)
This data frame contains the following variables:
nsogg: int, subject number.
vers: Factor, questionnaire version: V1 fake-motivating version, V3 honest-motivating version, and V4 neutral version.
sex: Factor, gender.
eta: int, age.
resid: Factor, residence.
dipl: Factor, education.
voto: int, high school final score.
votomax: int, maximum value for voto.
cdl: Factor, a character string indicating the type of undergraduate program.
aep..: int, 12 items of the AEP/A scale.
tot: int, total score.
Andrea Bobbio, Massimo Nucci, Massimiliano Pastore
Simulate discrete data from either a correlation matrix or thresholds or probabilities.
rdatagen(n = 100, R = diag(1,2), Q = NULL, th = NULL, probs = NULL)
n | Number of observations. |
R | Correlation matrix. |
Q | Number of discrete values in the random variables. It is a single value or a vector. If |
th | List of thresholds; each element contains |
probs | List of probabilities; each element contains |
Returns a list with four elements:
data | The simulated data matrix. |
R | Correlation matrix. |
thresholds | The list of thresholds. |
probs | The list of probabilities. |
Defaults work as in the mvrnorm function of the MASS package.
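The general idea of simulating correlated discrete data from a correlation matrix and category probabilities can be sketched as follows. This is not the package's rdatagen; `ordinal_sketch` is a hypothetical helper that assumes thresholds are placed on a latent standard normal scale and that the same probabilities apply to every variable.

```r
library(MASS)  # provides mvrnorm()

# Sketch only: draw correlated standard normals and cut them into ordered
# categories 1..Q. `ordinal_sketch` is hypothetical, not part of the package.
ordinal_sketch <- function(n, R, probs) {
  z <- mvrnorm(n, mu = rep(0, ncol(R)), Sigma = R)
  # Thresholds on the latent normal scale: -Inf, Q - 1 inner cuts, Inf.
  th <- c(-Inf, qnorm(cumsum(probs))[-length(probs)], Inf)
  apply(z, 2, function(col) findInterval(col, th))
}

set.seed(1)
R <- matrix(c(1, .4, .4, 1), 2, 2)
Dx <- ordinal_sketch(2000, R, probs = c(.125, .375, .375, .125))
table(Dx[, 1]) / nrow(Dx)   # observed proportions, close to probs
cor(Dx)                     # positive, attenuated by the discretization
```

Discretizing the latent variables generally attenuates their correlation, so the observed correlation is expected to be somewhat below the value set in `R`.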
Massimiliano Pastore, Luigi Lombardi & Marco Bressan
Lombardi, L., Pastore, M. (2014). sgr: A Package for Simulating Conditional Fake Ordinal Data. The R Journal, 6, 164-177.
Pastore, M., Lombardi, L. (2014). The impact of faking on Cronbach's Alpha for dichotomous and ordered rating scores. Quality & Quantity, 48, 1191-1211.
require(MASS)

## only default
rdatagen()

## set correlations only
R <- matrix(c(1,.4,.4,1),2,2)
Dx <- rdatagen(n=10000,R=R)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) hist(Dx[,j])

## set correlations and Q
Dx <- rdatagen(n=10000,R=R,Q=2)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j])/nrow(Dx))

## set correlations and thresholds
th <- list(c(-Inf,qchisq(pbinom(0:3,4,.5),1),Inf),
           c(-Inf,qnorm(pbinom(0:2,3,.5)),Inf))
Dx <- rdatagen(n=10000,R=R,th=th)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j])/nrow(Dx))

## set correlations and probabilities [1]
probs <- list(c(.125,.375,.375,.125),c(.125,.375,.375,.125))
Dx <- rdatagen(n=10000,R=R,probs=probs)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j])/nrow(Dx))

## set correlations and probabilities [2]
probs <- c(.125,.375,.375,.125)
Dx <- rdatagen(n=10000,R=R,probs=probs)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j])/nrow(Dx))

## set different values for Q
Dx <- rdatagen(n=1000,Q=c(2,4),R=R)$data
par(mfrow=c(1,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j])/nrow(Dx))
Replaces data in the original data matrix using a specified replacement matrix.
rdatarepl(Dx, RM, printfp = TRUE)
Dx | Data frame or matrix to be replaced. |
RM | Replacement matrix. |
printfp | Logical, if |
Replacement matrices can be obtained from the replacement.matrix function.
Returns a list with two elements:
Fx | The replaced (fake) data matrix. |
Fperc | Percentage of replaced data. |
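One way to picture how a replacement matrix acts on data is the base R sketch below. It is not the package's rdatarepl; `apply_replacement` is a hypothetical helper that assumes each row `h` of `RM` holds the probabilities of replacing an observed value `h` with each value in `1..Q`, and it mirrors the two returned elements only for illustration.

```r
# Sketch only: replace every observed value h with a draw from row h of a
# row-stochastic replacement matrix RM (rows sum to one).
# `apply_replacement` is a hypothetical helper, not the package's rdatarepl().
apply_replacement <- function(Dx, RM) {
  Dx <- as.matrix(Dx)
  Q <- nrow(RM)
  Fx <- Dx
  for (idx in seq_along(Dx)) {
    Fx[idx] <- sample.int(Q, size = 1, prob = RM[Dx[idx], ])
  }
  list(Fx = Fx, Fperc = 100 * mean(Fx != Dx))
}

Dx <- matrix(c(1, 2, 3, 4, 5, 3), ncol = 2)

# An identity matrix keeps every value, so nothing is replaced.
apply_replacement(Dx, diag(5))$Fperc

# Putting all mass on the last column pushes every value to Q.
RM_up <- matrix(0, 5, 5); RM_up[, 5] <- 1
apply_replacement(Dx, RM_up)$Fx
```

With a non-degenerate matrix the result is random, so a call to `set.seed()` beforehand is needed for reproducible output.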
Massimiliano Pastore
require(MASS)
set.seed(20130207)
Dx <- rdatagen(R=matrix(c(1,.3,.3,1),2,2),Q=5)$data
RM <- replacement.matrix(p=c(.6,0))
Fx <- rdatarepl(Dx,RM)$Fx

par(mfrow=c(2,2))
for (j in 1:ncol(Dx)) barplot(table(Dx[,j]),main="original data")
for (j in 1:ncol(Fx)) barplot(table(Fx[,j]),main="replaced data")
Builds the replacement matrix.
replacement.matrix(Q = 5, p = c(0,0), gam = c(1,1), del = c(1,1), fake.model = c("uninformative", "average", "slight", "extreme"))
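Q | Max value in the discrete r.v. range: $1, \ldots, Q$. |
p | Overall probability of replacement: p[1] indicates the faking good probability, p[2] indicates the faking bad probability. |
gam | Gamma parameter: gam[1] and gam[2] indicate the faking good and the faking bad parameters. |
del | Delta parameter: del[1] and del[2] indicate the faking good and the faking bad parameters. |
fake.model | A character string indicating the model for the conditional replacement distribution. The options are: "uninformative", "average", "slight", "extreme". |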
Gives a matrix with replacement probabilities. Each row ($h$) in the matrix indicates the conditional probability distribution of the fake value given the observed value $h$; p denotes the overall replacement probability.
Massimiliano Pastore
dgBetaD, pfake, pfakegood, pfakebad
## no replacements
replacement.matrix(Q=7)

## faking good
replacement.matrix(Q=7,p=c(.5,0))
replacement.matrix(Q=7,p=c(.5,0),gam=8,del=2.5)

## faking bad
replacement.matrix(Q=7,p=c(0,.5))
replacement.matrix(Q=7,p=c(0,.5),gam=8,del=2.5)

## faking good and faking bad
replacement.matrix(Q=7,p=c(.3,.5),gam=c(8,8),del=c(2.5,2.5))

## using the fake.model argument
replacement.matrix(Q=7,p=c(0,.4),fake.model="extreme")
replacement.matrix(Q=7,p=c(.4,0),fake.model="extreme")
replacement.matrix(Q=7,p=c(.4,.4),fake.model="slight")
Data about smoking and drug consumption among young people.
data(smokers)
This data frame contains the following columns:
age: int, 1 = adults, 2 = minors.
smoking: int, 1 = yes, 2 = no.
drug: int, drug addiction, 1 = yes, 2 = no.
druguse: int, drug consumption, 1 = never, 2 = once, 3 = sometimes, 4 = often.
Pastore, M., Lombardi, L., Mereu, F. (2007). Effects of malingering in self-report measures: A scenario analysis approach; in C. H. Skiadas (Ed.), Recent Advances in Stochastic Modeling and Data Analysis, pp. 483-491, World Scientific Publishing.