An important aspect of research is planning a forthcoming empirical study. Some key issues that scientists need to consider are: the nature of the population to be studied, the measure or measures that will be collected per sampled unit, the design of the study, and the planned sample size. The first two of those topics are not really statistical; rather, they involve the nature of the research question and the availability of measurement tools in the scientific field. But the last two issues – the research design and the sample size – are statistical topics. Statistical precision, and therefore power, grows with the sample size. Determining the statistical power of studies has important ethical and practical implications. When the research subjects are animals or patients, and more subjects were tested than necessary to answer the research question, then some subjects were needlessly exposed to participation risks. But if too few were tested, then the risks undertaken by the subjects might be wasted because there is not enough evidence to answer the research question. Time and financial resources are limited, so a careful researcher plans future studies so that there is a good chance that the research question is answered. In fact, funding agencies usually require scientists to provide a statistical rationale showing that their research design is feasible. Too few samples compromise the likelihood that the research question can be answered, but too many samples sink more time and financial resources than needed into one experiment.
The DFBA
package has three functions that are designed
to assist researchers in their planning of a forthcoming study where
there are two conditions and where there is a univariate continuous
measure for each observation. The dfba_bayes_vs_t_power()
function and the dfba_power_curve()
function deal with
frequentist and Bayesian statistical power. These two functions are
discussed together in a separate vignette (see the
dfba_power_functions
vignette). The third function is the
dfba_sim_data()
function.
The dfba_sim_data()
function creates two data sets from
one of nine different probability models. The
dfba_sim_data()
function has an argument called
model
for stipulating the model for data generation. The
model
argument requires a text string from the following
list:
"normal"
"weibull"
"cauchy"
"lognormal"
"chisquare"
"logistic"
"exponential"
"gumbel"
"pareto"
The dfba_sim_data()
function is called many times in the
Monte Carlo sampling that is used by the
dfba_bayes_vs_t_power()
and dfba_power_curve()
functions. The output from the dfba_sim_data()
function has
the frequentist p-value from
the appropriate t-test and the
posterior probability from the corresponding Bayesian distribution-free
test for a difference between the two conditions. If the research design
has paired scores for the two variates, then the frequentist
t-test is the paired
t-test, and the Bayesian
analysis is the Bayesian Wilcoxon signed-rank test. If the design has two
independent conditions, then the frequentist test is the two-sample
t-test, and the Bayesian
analysis is the Bayesian Mann-Whitney U
test. Thus, from each call of the dfba_sim_data()
function,
there are two data sets generated along with the primary results from a
frequentist t-test and the
corresponding results from a Bayesian distribution-free statistical
assessment. However, power cannot be estimated from a single simulated sample.
Power estimates are calculated by either the
dfba_bayes_vs_t_power()
function or the
dfba_power_curve()
function from the Monte Carlo
simulations that repeatedly call the dfba_sim_data()
function.
Besides generating random scores for each of two conditions, the
dfba_sim_data()
function can also include a block effect –
for example, variation among individuals with regard to a variable –
specified by the block.max
argument. The
dfba_sim_data()
function generates random block-effect values from a
uniform distribution over the interval [0, block.max], where
block.max is a non-negative number. The default value for
block.max is 0. A block effect
is considered true variability rather than random
measurement error. If an experiment is designed such that each
participant is tested in both the control condition and in the
experimental condition, then the experimental design is a
paired (also known by other names including
within-block, repeated-measures,
matched-samples, and randomized block) design. For a
paired design, the variation of blocks cancels out because the same
block effect is added to the scores for each condition; thus, the
difference score for each block removes the effect of block variation. This
feature of paired-design experiments is a key advantage. For many
research topics, though, it is either impractical or impossible to use
the same block for testing performance in both conditions. If an
experiment is designed such that participants are tested in only one of
two conditions, then the experimental design is an independent
groups (also known by other names including between-block,
completely randomized, and between-participants)
design. With an independent-groups design, block variation does not
cancel out; instead, block variability adds to the statistical noise
and thereby reduces statistical power.
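To make the contrast concrete, the following is a minimal conceptual sketch in R (not the package's internal code; the values for n, the condition difference, and the block range are arbitrary illustrations) of how a shared block effect cancels in paired difference scores but inflates the noise in independent groups:

set.seed(123)
n <- 20
block <- runif(n, 0, 3)          # one block effect per participant
C <- rnorm(n, 0, 1) + block      # control scores plus block effect
E <- rnorm(n, 0.5, 1) + block    # experimental scores plus the same block effect
d <- E - C                       # paired differences: the block term cancels exactly
var(d)
# Independent groups instead draw separate block effects,
# which inflates the variance within each condition:
C_ind <- rnorm(n, 0, 1) + runif(n, 0, 3)
E_ind <- rnorm(n, 0.5, 1) + runif(n, 0, 3)
var(E_ind - C_ind)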
Researchers might be wary of using distribution-free methods because
of a presumed lack of statistical power. Those concerns are largely
unjustified. In fact, in many cases distribution-free analyses such
as the Bayesian Wilcoxon signed-rank test and the Bayesian Mann-Whitney
U test outperform the
frequentist t-test (Chechile,
2020). The DFBA
power functions provide power estimates
across a wide range of data types, so researchers can see for themselves
what the relative power may be for a wide variety of data models. Since
most researchers cannot be sure that the data in a forthcoming
experiment will be from Gaussian distributions, power studies using
functions in the DFBA
package are prudent experimental
design tools. The DFBA
functions provide the user with the
opportunity to explore Gaussian data as well as eight other probability
models to see the relative benefits of the frequentist parametric t-test versus a distribution-free
Bayesian analysis. The dfba_sim_data()
function is the
essential function that interfaces with the other functions in the
DFBA
package for doing power analyses (even though the
dfba_sim_data()
function does not itself compute
power estimates).
Since the dfba_sim_data()
function is used primarily by
the two DFBA
power functions, it is not expected that most
users will ever need to implement the dfba_sim_data()
function directly. Consequently, for those users it may not be necessary
to read this vignette further. Such readers can skip to the
dfba_power_functions
vignette that describes both the
dfba_bayes_vs_t_power()
function and the
dfba_power_curve()
function. The rest of the current
vignette is primarily reference material. The section Nine
Probability Models for Data Generation describes each
of the nine probability models. The section Using the
dfba_sim_data() Function describes the arguments for
the function. The Examples section provides selected
examples of calling the dfba_sim_data() function
to get two vectors of random scores along with the Bayesian posterior
probability for a difference between the conditions and the
frequentist p-value from a
t-test.
Nine Probability Models for Data Generation

The normal or Gaussian distribution is the familiar
symmetric probability distribution that is the basis of parametric
statistics. The normal distribution has two population parameters: the
mean and the standard deviation. Random values from any normal
distribution can be sampled using the stats
package
via the command rnorm(n, mean = 0, sd = 1)
. For
example, rnorm(100, 5, 2.2)
produces 100 random values from a normal distribution
with the mean of 5 and the standard
deviation of 2.2. The normal
distribution is important when all values of a continuous variate are
composed of some true score plus measurement error, where the
measurement errors are composed of the sum of latent independent
random perturbations. The normal has the density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
where μ is the mean and
σ is the standard deviation.
The support for x is the real
line (i.e., x ∈ (−∞, ∞)). The normal does not
have a closed-form expression for the cumulative probability F(x) = P(y ≤ x);
however, estimates are easily obtained with the
pnorm(x, mean = 0, sd = 1)
command.
In the dfba_sim_data()
function with the argument
model = "normal"
, the control condition variate has a
normal distribution with a mean equal to 0 and a standard deviation equal to the
shape1
argument for the function; in the experimental
condition, the variate has a normal distribution with a mean equal to
the delta
argument with a standard deviation equal to the
shape2
input for the dfba_sim_data()
function.
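The following is a minimal sketch of the assumed sampling (based on the description above, not the package's internal code), with arbitrary illustrative values for n, delta, shape1, and shape2:

n <- 20; delta <- 0.4; shape1 <- 1; shape2 <- 2
C <- rnorm(n, mean = 0, sd = shape1)      # control variate
E <- rnorm(n, mean = delta, sd = shape2)  # experimental variate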
The Weibull distribution plays an important role for variates where the underlying process depends on a maximum or a minimum of a latent factor. For example, a system with many components, which are all required for proper functioning, fails when the weakest component breaks down (i.e., the component with the minimum lifetime). The Weibull probability density function is:
$$f(x) = \frac{k}{s}\left(\frac{x}{s}\right)^{k-1} e^{-(x/s)^{k}},$$
where s is a scale factor and k is a shape parameter and where x ≥ 0. Both the k and s parameters must be positive. When the shape parameter k equals 1, the Weibull distribution is equivalent to the exponential distribution.
From the stats
package, n random values from a Weibull
distribution can be sampled with the command
rweibull(n, shape, scale = 1)
. Consequently, the command
rweibull(30, .7)
randomly samples 30 scores from a Weibull distribution with a
shape parameter of .7 and with a scale
factor of 1. In the
dfba_sim_data()
function, the variate for the first
(control) condition C
has a Weibull distribution with a
scale factor s = 1 and a shape
parameter k equal to the value
of the shape1 input for the function. The variate for the second (experimental)
condition E
has an offset equal to the delta
input for the function plus a value from a Weibull distribution with a
scale factor s = 1 and a shape
parameter k equal to the
shape2
input. The value for the offset is added to the
Weibull component of the scores in the experimental condition. The
offset parameter and both shape parameter inputs must be positive
values.
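As a minimal sketch of the assumed offset mechanism (based on the description above, not the package's internal code), with arbitrary illustrative values:

n <- 20; delta <- 0.5; shape1 <- 0.8; shape2 <- 0.8
C <- rweibull(n, shape = shape1)          # scale factor s = 1
E <- delta + rweibull(n, shape = shape2)  # offset added to the Weibull component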
The Cauchy distribution occurs when there is a latent process that is the ratio of two independent normal distributions each with a mean of 0 (Johnson, Kotz, & Balakrishnan, 1995). The density function is similar in shape to – but has heavier tails than – a normal distribution. This distribution is unusual because it does not have a defined expected value or variance. The density function is
$$f(x) = \frac{1}{\pi s\left[1 + \left(\frac{x - l}{s}\right)^{2}\right]},$$
where s is the scale factor and l is the location parameter (note: when s = 1 and l = 0, the distribution is equivalent to a t-distribution with 1 degree of freedom). The support for the Cauchy distribution is x ∈ (−∞, ∞), and the scale factor must be positive.
The Cauchy is one of the distributions included in the
stats
package: for example, the command
rcauchy(50, location=0, scale=1)
generates 50 random scores from a Cauchy distribution
where the location parameter l
equals 0 and where the scale factor
s is 1. In the dfba_sim_data()
function, the first (control) condition variate has a Cauchy
distribution with location l
equal to 0 and scale factor s equal to the value of the
shape1
argument for the function; in the second
(experimental) condition, the variate has a Cauchy distribution with
location l equal to the value
of the delta
input and with a scale factor s equal to the shape2
argument for the dfba_sim_data()
function. The
dfba_sim_data()
function thus enables power studies where
the variates are Cauchy-distributed and where the location parameters
are separated by the amount stipulated with the delta
argument and where the scale factor for the experimental condition can
be varied to be different from the scale factor for the control
condition.
The lognormal distribution is a continuous density function that arises when the variate is the multiplicative product of a number of latent independent positive components (Johnson, Kotz, & Balakrishnan, 1995). The probability density is
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}},$$
where μ and σ are the respective mean and standard deviation on the log scale. The lognormal has support for x on the positive real numbers.
The lognormal is one of the distributions included in the
stats
package, for example, the command
rlnorm(40, meanlog = 0, sdlog = 2)
generates 40 random scores from a lognormal
distribution that has the mean and standard deviation on the log scale
of, respectively, 0 and 2. In the dfba_sim_data()
function, the first (control) condition variate C
has a
lognormal distribution with a mean and standard deviation on the log
scale of respectively 0 and the
shape1
argument; the second (experimental) condition
variate E
has a lognormal distribution with a mean and
standard deviation on the log scale of, respectively, the
delta
and shape2
arguments.
The χ2 distribution is a continuous density function for a variate that is the sum of squares of independent latent normal variables (Johnson, Kotz, & Balakrishnan, 1995). The probability density is
$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2 - 1} e^{-x/2},$$
where k is the degrees-of-freedom (df) parameter and where Γ is the gamma function. The k parameter must be non-negative, but it need not be an integer. The support for x is on the non-negative real numbers.
The χ2
distribution is included in the stats
package, for example:
the command rchisq(35, df = 5)
generates 35 scores from a χ2 distribution with
k = 5. For the
dfba_sim_data()
function, the first (control) condition
variable C
has a χ2 distribution where the
degrees of freedom are equal to the shape1
argument; the
second (experimental) condition variate E
has a positive
offset equal to the delta
argument plus a χ2 component with degrees
of freedom equal to the shape2
argument.
The logistic distribution is often used as an approximation to a normal, although the logistic has heavier tails than that of the normal (Johnson, Kotz, & Balakrishnan, 1995). Unlike the normal, the logistic has a closed-form equation for the cumulative probability. The probability density function f(x) and the cumulative probability F(x) are
$$f(x) = \frac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^{2}}, \qquad F(x) = \frac{1}{1 + e^{-(x-\mu)/s}},$$
where μ and s are, respectively, the mean and scale factor of the logistic distribution. The support for x is the whole real line. The logistic distribution is a good approximation of a normal with mean μ and standard deviation σ when the mean of the logistic is μ and its scale factor $s=\frac{\sqrt{3}\sigma}{\pi}$.
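As a quick numerical check of that approximation (sigma = 1 is an arbitrary choice), matching quantiles of the two distributions are close:

sigma <- 1
s <- sqrt(3) * sigma / pi
qlogis(c(0.25, 0.50, 0.75), location = 0, scale = s)  # logistic quantiles
qnorm(c(0.25, 0.50, 0.75), mean = 0, sd = sigma)      # corresponding normal quantiles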
The logistic distribution is included in the stats
package, for example: the command
rlogis(50, location = 0, scale = 3)
generates 50 random scores from a logistic distribution
with a mean of 0 and a scale factor of
3. In the dfba_sim_data()
function, the first (control) variate C
is sampled from a
logistic distribution with a mean of 0
and a scale factor equal to the shape1
argument; the second
(experimental) variate E
is sampled from a logistic
distribution with a mean and scale factor that are, respectively, the
values of the delta
and shape2
arguments.
The exponential distribution is a special continuous density function that has the property of constant hazard (Chechile, 2003), that is: the probability for a critical event occurring given that it did not occur earlier (the hazard) is a constant. Because of this characteristic, the exponential is described as a memoryless distribution. The probability density function is:
$$f(x) = k\,e^{-kx},$$
where k is a positive constant that is equal to the hazard rate. The support for the distribution is x ∈ [0, ∞).
The exponential distribution is included in the stats
package, for example: the command rexp(60, rate = 5)
generates
60 random scores from an exponential
distribution with a rate parameter equal to 5. In the dfba_sim_data()
function, the first (control) variate C
is sampled from an
exponential distribution with a rate parameter equal to the
shape1
argument; the second (experimental) variate
E
is sampled from an exponential distribution with a rate
parameter equal to the shape2
argument, plus an added offset
value that is equal to the delta
argument.
The Gumbel distribution, like the Weibull distribution, is a probability model of a system where the process is controlled by the maximum or the minimum of latent factors (Gumbel, 1958). The probability density f(x) and the cumulative probability F(x) are
$$f(x) = \frac{1}{s}\,e^{-(x-\mu)/s}\,e^{-e^{-(x-\mu)/s}}, \qquad F(x) = e^{-e^{-(x-\mu)/s}},$$
where μ is the mode and s is a scale factor. The support for the distribution is x ∈ (−∞, ∞). The scale factor must be a positive value.
The Gumbel distribution is not included in the
stats
package. However, since the Gumbel distribution has a
closed form cumulative probability, random samples from any Gumbel can
be obtained via the inverse transform method (Fishman, 1996).
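For example, the following is a minimal sketch of inverse-transform sampling for a Gumbel distribution (rgumbel_sketch is a hypothetical helper, not a stats function):

rgumbel_sketch <- function(n, mu = 0, s = 1) {
  u <- runif(n)           # uniform draws on (0, 1)
  mu - s * log(-log(u))   # inverse of F(x) = exp(-exp(-(x - mu)/s))
}
set.seed(77)
rgumbel_sketch(5, mu = 0, s = 2)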
For the dfba_sim_data()
function, the first (control)
variate C
is a Gumbel with a mode μ equal to 0 and scale factor s equal to the shape1
argument; the second (experimental) variate E
is a Gumbel
with a mode and scale factor that are, respectively, the values of the
delta
and the shape2
arguments.
The Pareto distribution arose as a probability model of incomes in economics (Pareto, 1897). Harris (1968) observed that a Pareto distribution can model cases of a mixture of exponentially distributed variates where the component rate parameters k⁻¹ have a gamma distribution. The cumulative function for a Pareto distribution is
$$F(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha},$$
where x ≥ xm and where xm is the mode (Arnold, 1983). Pareto observed that the value of α is typically near the value of 1.5. However, when α = 1.161, the distribution represents an 80-20 law, which stipulates that 20 percent of people receive 80 percent of the income (Hardy, 2010).
Although the Pareto distribution is not included in the
stats
package, random values can be easily obtained by the
inverse transform method of Monte Carlo sampling (Fishman, 1996). In the
dfba_sim_data()
function, the first (control) variate
C
is sampled from a Pareto distribution with xm = 1; the
second (experimental) variate E
is sampled from a Pareto
with xm
equal to the value of the delta
argument plus 1. The α parameters for the control and
experimental conditions are 1.16 times the respective
shape1
and shape2
arguments. Since the default
value for both the shape1
and shape2
arguments
is 1, the default condition results in
random data samples that approximately satisfy the 80-20 law.
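For example, the following is a minimal sketch of inverse-transform sampling for a Pareto distribution (rpareto_sketch is a hypothetical helper, not a stats function):

rpareto_sketch <- function(n, x_m = 1, alpha = 1.16) {
  u <- runif(n)           # uniform draws on (0, 1)
  x_m * u^(-1 / alpha)    # since 1 - u is also uniform, this inverts F(x) = 1 - (x_m/x)^alpha
}
set.seed(42)
rpareto_sketch(5)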
Using the dfba_sim_data() Function

There are three required arguments for the
dfba_sim_data()
function: model, design, and delta. The model
argument is a character string from the following list:
"normal"
"weibull"
"cauchy"
"lognormal"
"chisquare"
"logistic"
"exponential"
"gumbel"
"pareto"
The design argument must be either the character string
"paired" or the character string "independent".
The delta argument must be a non-negative value for the
separation between the experimental variate E
and the control variate C.
The dfba_sim_data()
function also has six optional
arguments; listed with their corresponding default values, they are:
n = 20
, a0 = 1
, b0 = 1
,
shape1 = 1
, shape2 = 1
, and
block.max = 0
. The value of the n
argument
must be an integer greater than or equal to 20. This constraint on
n assures that the sample size is
large enough to justify the large-n approximation method for either
the Bayesian Mann-Whitney analysis or the Bayesian Wilcoxon signed-rank
analysis. The a0
and b0
arguments represent
the shape parameters for the prior beta distribution used for the
large-n approximation for the
two Bayesian methods; the default value of 1 for each represents a uniform
prior. The shape1
and shape2
arguments are
associated with, respectively, the C
and E
variates. These arguments refer to different distribution shape
parameters depending on the model
input, for example: given
the argument model = "normal"
, shape1
and
shape2
define the values of the standard deviations for the
normal distributions from which the C and the
E
variates, respectively, are sampled (see above for more details). The last optional
argument is block.max
. As described above, the dfba_sim_data()
function has the feature of enabling two other DFBA
functions – dfba_bayes_vs_t_power()
and
dfba_power_curve()
– to show the effect of block variation
on power. With the default value block.max = 0, there is no
blocking effect. As the value of the block.max
argument
increases, there can be reduced power for studies that have independent
groups. But as mentioned previously, the dfba_sim_data()
function itself does not compute power. Variation of the
block.max
argument from the default value is an option best
employed by way of either the dfba_bayes_vs_t_power()
or
the dfba_power_curve()
function rather than with the
dfba_sim_data()
function.
Examples

As an example of the dfba_sim_data()
function, consider the
following commands:
set.seed(1)
example_A1 <- dfba_sim_data(n = 80,
model = "normal",
design = "paired",
delta = 0.4,
shape2 = 2)
example_A1
#> Frequentist p-value
#> 0.3475369
#> Bayesian posterior probability
#> 0.5473132
example_A1$E
#> [1] -0.73733747 0.12964277 2.75617399 -2.64713360 1.58789238 1.06590074
#> [7] 2.52619967 -0.20836785 1.14003762 0.93419758 -0.68504006 2.81573561
#> [13] 2.72080523 1.80042730 3.57366691 1.51697285 -2.15318442 -0.74653083
#> [19] -2.04922523 -0.54680127 -0.84073335 0.48423175 -1.42184330 0.71605754
#> [25] -0.90916929 3.93457454 1.83341495 2.22034846 1.16837072 3.76435216
#> [31] -0.87147291 -0.52328946 3.26456448 -0.90139271 -0.01476149 -0.38561586
#> [37] -0.23998574 -0.15822661 1.38837666 0.04533904 -0.61191492 3.08607765
#> [43] -0.02915882 0.04088694 0.19961852 1.82533261 0.25287119 0.32473166
#> [49] -0.96332096 -0.24854054 0.52032088 -0.77778897 1.46299239 -2.63678816
#> [55] 1.01311572 -2.67289965 -0.20195225 -0.65655981 -0.90418956 0.28620644
#> [61] -3.42871885 2.75316662 -2.92994487 -0.52706080 -1.83184021 -1.10163800
#> [67] 4.57433309 0.43479124 -2.17260106 -2.88121107 1.30037420 0.36288033
#> [73] -0.23613675 -1.45872429 -2.57492062 -1.75038459 2.40005761 -0.84253339
#> [79] -2.36885369 4.13858124
example_A1$C
#> [1] -0.626453811 0.183643324 -0.835628612 1.595280802 0.329507772
#> [6] -0.820468384 0.487429052 0.738324705 0.575781352 -0.305388387
#> [11] 1.511781168 0.389843236 -0.621240581 -2.214699887 1.124930918
#> [16] -0.044933609 -0.016190263 0.943836211 0.821221195 0.593901321
#> [21] 0.918977372 0.782136301 0.074564983 -1.989351696 0.619825748
#> [26] -0.056128740 -0.155795507 -1.470752384 -0.478150055 0.417941560
#> [31] 1.358679552 -0.102787727 0.387671612 -0.053805041 -1.377059557
#> [36] -0.414994563 -0.394289954 -0.059313397 1.100025372 0.763175748
#> [41] -0.164523596 -0.253361680 0.696963375 0.556663199 -0.688755695
#> [46] -0.707495157 0.364581962 0.768532925 -0.112346212 0.881107726
#> [51] 0.398105880 -0.612026393 0.341119691 -1.129363096 1.433023702
#> [56] 1.980399899 -0.367221476 -1.044134626 0.569719627 -0.135054604
#> [61] 2.401617761 -0.039240003 0.689739362 0.028002159 -0.743273209
#> [66] 0.188792300 -1.804958629 1.465554862 0.153253338 2.172611670
#> [71] 0.475509529 -0.709946431 0.610726353 -0.934097632 -1.253633400
#> [76] 0.291446236 -0.443291873 0.001105352 0.074341324 -0.589520946
Note that the above commands generate 80 random samples from the standard normal
for the C
variate and 80
random samples for the E
variate from a normal distribution
with a mean equal to 0.4 and a standard
deviation of 2. Repeating the above
commands with a different seed draws a second set of 80 scores for each condition:
set.seed(2)
example_A2 <- dfba_sim_data(n = 80,
model = "normal",
design = "paired",
delta = 0.4,
shape2 = 2)
example_A2$E
#> [1] 2.39196918 -2.99152981 -0.66674429 -2.34453890 -4.01583956 4.04424504
#> [7] -0.90678682 -0.16936244 -0.37389921 1.17338995 3.60078170 3.76230991
#> [13] -1.96721278 -2.31691451 -2.62534159 -2.10620980 4.31871415 0.41529174
#> [19] -1.28523040 -0.80232021 2.54891881 0.92119567 -0.22854396 -1.09926019
#> [25] -1.32439666 4.49608061 2.27984016 4.41737423 -0.44274714 -0.30166885
#> [31] -1.65476120 -0.10103825 1.34371893 3.11787964 1.52833721 1.31196018
#> [37] 2.86190733 2.69427370 0.61319608 -1.16663331 2.88239965 0.67771684
#> [43] 3.82126318 -0.46128195 -1.68845916 1.47515905 -0.93917197 1.67761122
#> [49] -3.04797967 -3.08486016 1.77960835 1.06192635 2.14213542 -3.63249116
#> [55] 2.82515821 2.80098940 2.46413665 1.97282051 4.62014703 -2.50761969
#> [61] -0.76620770 1.21944797 -1.21396327 0.57110088 1.89248634 -0.90734612
#> [67] 1.71421197 1.49981847 -1.21345872 -1.59475943 2.35178128 0.06115364
#> [73] 1.84438356 -1.28883721 2.95458737 -2.28622110 1.93068134 1.32840514
#> [79] 0.93598656 1.73504537
example_A2$C
#> [1] -0.896914547 0.184849185 1.587845331 -1.130375674 -0.080251757
#> [6] 0.132420284 0.707954729 -0.239698024 1.984473937 -0.138787012
#> [11] 0.417650751 0.981752777 -0.392695356 -1.039668977 1.782228960
#> [16] -2.311069085 0.878604581 0.035806718 1.012828692 0.432265155
#> [21] 2.090819205 -1.199925820 1.589638200 1.954651642 0.004937777
#> [26] -2.451706388 0.477237303 -0.596558169 0.792203270 0.289636710
#> [31] 0.738938604 0.318960401 1.076164354 -0.284157720 -0.776675274
#> [36] -0.595660499 -1.725979779 -0.902584480 -0.559061915 -0.246512567
#> [41] -0.383586228 -1.959103175 -0.841705060 1.903547467 0.622493930
#> [46] 1.990920436 -0.305483725 -0.090844235 -0.184161452 -1.198767765
#> [51] -0.838287148 2.066301356 -0.562247053 1.275715512 -1.047572627
#> [56] -1.965878241 -0.322971094 0.935862527 1.139229803 1.671618767
#> [61] -1.788242207 2.031242519 -0.703144333 0.158164763 0.506234797
#> [66] -0.819995106 -1.998846995 -0.479292591 0.084179904 -0.895486611
#> [71] -0.921275666 0.330449503 -0.141660809 0.434847762 -0.053722626
#> [76] -0.907110376 1.303512232 0.771789776 1.052525595 -1.410038341
As an example with a distribution very different from the normal, consider the following commands:
set.seed(1)
example_B1 <- dfba_sim_data(n = 100,
model = "cauchy",
design = "paired",
delta = 0.5)
example_B1
#> Frequentist p-value
#> 0.9103605
#> Bayesian posterior probability
#> 0.2911831
example_B1$E
#> [1] -1.39263889 2.51232669 1.63614785 0.47701225 -1.74300312
#> [6] 1.29195003 0.93039956 15.02375500 0.25684386 -2.61894118
#> [11] 0.42499787 -0.62148240 2.56959890 5.07309228 1.00246836
#> [16] 0.54110755 -0.74366507 0.83601581 6.36947328 -1.62335831
#> [21] 0.47435465 72.73310490 20.82227994 1.10608966 -0.47015879
#> [26] 7.35574602 -27.98571014 1.26340017 1.37415416 -2.72486647
#> [31] -3.67267954 0.74694880 0.61212033 -1.57755063 0.27190120
#> [36] -2.64162132 -4.66276783 -11.70237908 0.45314101 -41.14565941
#> [41] -1.04565497 -2.52772284 1.43239638 1.55267080 -0.63924892
#> [46] 7.16152714 1.11334873 -0.52096358 0.84233315 0.04676054
#> [51] -2.15537644 -5.00881091 2.17617954 7.24239304 -721.33547226
#> [56] 1.13844383 -10.21155879 0.74099516 1.69158751 1.28935329
#> [61] 1.74652167 0.15795373 6.36401818 -0.32735191 0.10631988
#> [66] 4.07253422 0.70318925 2.25944225 -0.68038392 2.28714708
#> [71] -1.80263989 -0.04724313 0.01450588 3.31526997 3.03720625
#> [76] 0.15918611 -1.55240928 -0.55768640 -2.41170041 0.18575031
#> [81] 1.82086740 1.18540206 0.12733169 -94.81354023 0.09333932
#> [86] 1.17590232 -0.45034045 -0.67458889 0.32134102 -6.13067393
#> [91] -0.77471045 3.24791285 0.82795500 0.26755719 1.73409762
#> [96] -2.91903140 0.86130239 -0.04768216 2.05368834 -0.31229431
example_B1$C
#> [1] 1.10251990 2.35383060 -4.29262624 -0.29664256 0.73464686
#> [6] -0.33052199 -0.17557937 -1.80824308 -2.32862439 0.19658244
#> [11] 0.75561999 0.61954832 -1.50147326 2.62405381 -0.88250440
#> [16] 138.34760311 -1.22737450 -0.02543323 2.52652746 -0.84088155
#> [21] -0.20805599 0.78651704 -1.93735849 0.41625813 1.11450727
#> [26] 2.67469787 0.04209180 2.58214078 -0.43389224 1.82372681
#> [31] 17.74417330 -3.09202775 49.27718408 0.66236678 -0.60259133
#> [36] -1.70964962 -0.75456222 0.35274134 -1.18049671 3.49417974
#> [41] -0.63045697 -2.00824906 -0.81186981 -5.94609241 -10.67930318
#> [46] -0.77892393 0.07342868 13.95554098 -1.11779219 -1.44463156
#> [51] 14.19927731 -0.46593149 5.07709560 0.96783322 0.22576743
#> [56] 0.32306654 1.53568997 -17.06244971 -1.79215906 3.31831965
#> [61] -0.28075465 1.31977477 7.73320790 1.72031518 -1.94941320
#> [66] 1.05168352 14.81386093 -0.90243119 0.27102773 -0.41303137
#> [71] 1.80651036 -0.55204644 1.91306982 1.73761346 13.43512638
#> [76] -0.35223994 -0.45401782 2.77732059 -0.84154958 -0.12435738
#> [81] 4.80293716 -1.26837364 3.07749829 1.63591139 -0.95643295
#> [86] 0.73954537 -1.27985783 0.40208959 0.97204780 0.48330667
#> [91] 0.93687401 0.18729283 -2.08605029 -0.40954977 -0.83303134
#> [96] -0.73954011 7.07006090 3.44541936 -0.67561006 -2.92275964
As with the previous example, repeating these commands with a different seed draws a different set of random scores for the paired Cauchy variates.
set.seed(2)
example_B2 <- dfba_sim_data(n = 100,
model = "cauchy",
design = "paired",
delta = 0.5)
example_B2$E
#> [1] 1.231039e+00 4.822881e+00 4.389775e-01 -9.597813e-02 1.764190e+00
#> [6] -2.717543e+00 1.715058e-01 7.278421e+00 9.993489e-01 9.278128e-01
#> [11] 5.776155e-01 -5.899310e-01 2.879434e+00 -3.701513e+00 -1.113842e-01
#> [16] -1.627588e-01 7.728193e-02 8.619926e-01 3.502999e-01 -4.040561e+00
#> [21] 6.163465e-01 1.470841e+00 4.335672e-01 1.247760e-01 1.444890e+00
#> [26] -4.556848e-01 -4.499709e+00 1.923836e+00 -9.357222e-01 2.265352e+00
#> [31] 1.256286e+00 2.408186e-01 5.717904e-01 3.856534e-01 2.031410e+00
#> [36] -1.245426e+00 -8.954369e+00 -1.443718e-01 1.158063e+00 3.561885e+00
#> [41] 1.127824e+00 1.751699e+00 -1.821464e+00 1.877294e+00 6.092095e+00
#> [46] -6.328349e-01 -1.213392e+00 1.988059e+00 1.533664e+01 1.801053e+00
#> [51] 1.144243e+00 2.638391e+00 1.882284e-01 3.392492e+00 -3.279043e-01
#> [56] 1.741467e+00 5.111861e-03 1.100947e+00 7.542938e-01 1.806607e+00
#> [61] -4.823711e-02 3.263398e-01 6.422095e-01 -4.486686e-01 1.848459e+00
#> [66] -1.446642e+00 7.735345e-01 4.228188e-01 5.428301e-01 -7.648460e+00
#> [71] 3.920843e-01 8.286522e-01 1.543346e+00 1.581401e-01 3.222311e+00
#> [76] -2.543375e-01 2.453450e+00 9.657548e-01 -1.454877e+00 -6.687467e+00
#> [81] 3.262620e-01 -3.861250e+00 3.532966e-01 -3.259340e-01 8.897068e-01
#> [86] -6.163003e-02 7.808741e-01 -5.088356e-01 7.076853e-01 7.490537e-01
#> [91] 8.426706e-01 3.812900e+00 4.211861e-01 1.069697e+00 -1.038526e+02
#> [96] 1.234029e+00 1.225208e+00 1.126431e+00 1.662443e+00 9.350540e-01
example_B2$C
#> [1] 0.65634786 -1.35501351 -4.26394344 0.58316746 -0.17828774
#> [6] -0.17946910 0.42960529 -0.57686664 9.91942932 -6.31583937
#> [11] -5.98774837 0.93254957 -0.93603240 0.63823928 3.26083375
#> [16] -0.49556224 -0.07428251 0.85857558 5.70953592 0.24001040
#> [21] -1.79356668 2.71191838 -0.56255042 0.51151135 1.92171947
#> [26] 28.34100330 0.50654872 2.07518260 -0.11789894 0.44161475
#> [31] 0.03272988 0.56890129 -0.67871723 -0.43699346 -22.27289880
#> [36] -2.36786977 -0.53177166 1.24716447 -1.72504774 0.51138575
#> [41] -0.05746675 1.34954628 0.37817039 0.56292291 -0.17763152
#> [46] -0.75149043 -0.07968836 1.94879322 -161.58632951 -0.67777639
#> [51] 0.02233742 0.04619510 -1.53912291 -0.22444953 1.17384226
#> [56] -0.67109223 -0.79662763 -0.03487904 -2.67298247 -1.28764014
#> [61] -0.88146307 -0.37076171 -2.41160461 1.06690534 -0.47412584
#> [66] 5.02634933 2.72762030 8.22767128 0.82028557 0.21015696
#> [71] 1.17608458 1.47525153 0.13327918 0.65540973 0.64958580
#> [76] -0.96625441 1.27309215 -0.44080158 3.16690936 -4.30292473
#> [81] 1.97244393 -1.66692710 0.07886089 3.11429671 0.72642988
#> [86] -0.48396807 -0.08972656 1.61724576 -1.11160867 1.81991733
#> [91] -0.07315583 2.98227544 2.52568571 -5.20772070 8.75714527
#> [96] 0.71118400 4.28025666 0.30086180 0.37897874 5.24501729
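The examples above all use a paired design; a call with an independent-groups design is analogous. The following sketch uses arbitrary illustrative argument values; with design = "independent", the reported p-value comes from the two-sample t-test and the posterior probability from the Bayesian Mann-Whitney analysis:

set.seed(3)
example_C <- dfba_sim_data(n = 40,
                           model = "weibull",
                           design = "independent",
                           delta = 0.3,
                           shape1 = 0.8,
                           shape2 = 0.8)
example_C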
References

Arnold, B. C. (1983). Pareto Distribution. Fairland, MD: International Cooperative Publishing House.
Chechile, R. A. (2003). Mathematical tools for hazard function analysis. Journal of Mathematical Psychology, 47, 478-494.
Chechile, R. A. (2020). Bayesian Statistics for Experimental Scientists: A General Introduction Using Distribution-Free Methods. Cambridge, MA: MIT Press.
Fishman, G. S. (1996). Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer.
Gumbel, E. J. (1958). Statistics of Extremes. New York: Columbia University Press.
Hardy, M. (2010). Pareto’s Law. Mathematical Intelligencer, 32, 38-43.
Harris, C. M. (1968). The Pareto distribution as a queue service discipline. Operations Research, 16, 307-313.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions. New York: Wiley.
Pareto, V. (1897). Cours d'Economie Politique, Vol. 2. Lausanne: F. Rouge.
The letter l, not to be confused with the numeral 1.