This vignette documents the use of the `GFD` function for the analysis of general factorial designs. The `GFD` function calculates the Wald-type statistic (WTS), the ANOVA-type statistic (ATS) and a permuted Wald-type statistic (WTPS). These test statistics can be used for general factorial designs (crossed or nested) with an arbitrary number of factors, unequal covariance matrices among groups and unbalanced data, even for small sample sizes.
For illustration purposes, we will use the data set `pizza`, which is included in the `GFD` package. We first load the `GFD` package and the data set.
The objective of the study was to see how the delivery time in minutes would be affected by three different factors: whether thick or thin crust was ordered (factor A), whether Coke was ordered with the pizza or not (factor B), and whether or not garlic bread was ordered as a side (factor C).
##   Crust Coke Bread Driver  Hour Delivery
## 1  thin  yes   yes      M 20.87       14
## 2 thick  yes    no      M 20.78       21
## 3  thin   no    no      M 20.75       18
## 4  thin   no   yes      F 20.60       17
## 5 thick   no    no      M 20.70       19
## 6 thick   no   yes      M 20.95       17
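The preview above corresponds to the first six rows of `pizza`. A minimal sketch of the loading step follows. It assumes the `GFD` package is installed, and the `requireNamespace()` guard is only there so that the snippet also runs (without output) when it is not.

```r
# Assumption: the GFD package is available, e.g. after install.packages("GFD").
if (requireNamespace("GFD", quietly = TRUE)) {
  library(GFD)
  data("pizza", package = "GFD")  # loads the data frame `pizza`
  print(head(pizza))              # first six rows, as in the preview above
}
```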
This is a three-way crossed design, where each factor has two levels.
We will now analyze this design with the `GFD` function. The `GFD` function takes as arguments:

* `formula`: A formula consisting of the outcome variable on the left hand side of a `~` operator and the factor variables of interest on the right hand side. An interaction term must be specified.
* `data`: A data.frame, list or environment containing the variables in `formula`.
* `nperm`: The number of permutations. Default value is 10000.
* `alpha`: The significance level, default is 0.05.
* `CI.method`: Specifies the method used for calculating the CIs, either `t-quantile` (default) or `perm`.
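As a small illustration of the `formula` argument, base R's `terms()` shows which effects the `*` operator requests for the three-way pizza model. This sketch only inspects the formula and does not call `GFD`. The variable name `effect_labels` is ours, and the order in which `GFD` prints the effects may differ from the order shown here.

```r
# Expand the model formula with base R only (no GFD call involved).
f <- Delivery ~ Crust * Coke * Bread
effect_labels <- attr(terms(f), "term.labels")
print(effect_labels)

# Three main effects, three two-way interactions and one three-way interaction.
print(length(effect_labels))
```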
set.seed(1234)
model1 <- GFD(Delivery ~ Crust * Coke * Bread, data = pizza, nperm = 1000, alpha = 0.05)
summary(model1)
## Call:
## Delivery ~ Crust * Coke * Bread
##
## Descriptive:
##   Crust Coke Bread n Means Variances Lower 95 % CI Upper 95 % CI
## 1  thin   no    no 2  19.0       2.0      6.293795      31.70620
## 5  thin   no   yes 2  17.5       0.5     11.146898      23.85310
## 3  thin  yes    no 2  17.5       4.5     -1.559307      36.55931
## 7  thin  yes   yes 2  15.0       2.0      2.293795      27.70620
## 2 thick   no    no 2  19.5       0.5     13.146898      25.85310
## 6 thick   no   yes 2  18.0       2.0      5.293795      30.70620
## 4 thick  yes    no 2  21.5       0.5     15.146898      27.85310
## 8 thick  yes   yes 2  18.5       0.5     12.146898      24.85310
##
## Wald-Type Statistic (WTS):
##                  Test statistic df      p-value p-value WTPS
## Crust                     11.56  1 0.0006738585        0.008
## Coke                       0.36  1 0.5485062355        0.570
## Crust:Coke                 6.76  1 0.0093223760        0.026
## Bread                     11.56  1 0.0006738585        0.015
## Crust:Bread                0.04  1 0.8414805811        0.818
## Coke:Bread                 1.00  1 0.3173105079        0.379
## Crust:Coke:Bread           0.04  1 0.8414805811        0.814
##
## ANOVA-Type Statistic (ATS):
##                  Test statistic df1      df2    p-value
## Crust                     11.56   1 4.699248 0.02121110
## Coke                       0.36   1 4.699248 0.57625702
## Crust:Coke                 6.76   1 4.699248 0.05122842
## Bread                     11.56   1 4.699248 0.02121110
## Crust:Bread                0.04   1 4.699248 0.84984482
## Coke:Bread                 1.00   1 4.699248 0.36598284
## Crust:Coke:Bread           0.04   1 4.699248 0.84984482
The output consists of three parts: `model1$Descriptive` gives an overview of the descriptive statistics: the number of observations, mean and variance as well as confidence intervals (based on quantiles of the t-distribution or the permutation distribution) are displayed for each factor level combination. `model1$WTS` contains the results for the Wald-type test: the test statistic, degrees of freedom and p-values based on the asymptotic χ² distribution and the permutation procedure, respectively. Note that the χ² approximation is very liberal for small sample sizes, so the WTPS is recommended in such situations. Finally, `model1$ATS` contains the corresponding results based on the ATS. This test statistic tends to yield rather conservative decisions for small sample sizes and is only an approximation even asymptotically, so it does not provide an asymptotic level α test.
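To make the printed numbers more tangible, the following base R sketch recomputes one confidence interval and the two asymptotic p-values for `Crust` from the values shown in the output. It assumes that the t-quantile interval is mean ± t-quantile × standard error and that the p-values are upper-tail probabilities of the χ² and F distributions. These formulas are inferred from the printed output rather than taken from the package source, and all variable names are ours.

```r
alpha <- 0.05

# Cell "thin / no / no" of the Descriptive table: n = 2, mean = 19.0, variance = 2.0.
n <- 2
cell_mean <- 19.0
cell_var <- 2.0
half_width <- qt(1 - alpha / 2, df = n - 1) * sqrt(cell_var / n)
ci <- c(lower = cell_mean - half_width, upper = cell_mean + half_width)
print(ci)  # compare with the Lower/Upper 95 % CI columns

# Crust effect: statistic 11.56 with df = 1 (WTS) and df1 = 1, df2 = 4.699248 (ATS).
p_wts <- pchisq(11.56, df = 1, lower.tail = FALSE)
p_ats <- pf(11.56, df1 = 1, df2 = 4.699248, lower.tail = FALSE)
print(c(WTS = p_wts, ATS = p_ats))  # compare with the p-value columns
```

With only two observations per cell, the t-quantile with one degree of freedom is large, which explains the very wide intervals in the descriptive table.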
We find a significant influence of the factors Crust and Bread. The WTS and WTPS also suggest a significant interaction between the factors Crust and Coke at the 5% level, whereas the ATS shows only borderline significance for this interaction (p ≈ 0.051).
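The comparison of the three procedures can be tabulated directly. The sketch below transcribes the p-values from the `summary(model1)` output into a data frame (the object names `pvals` and `decisions` are ours) and flags which effects fall below α = 0.05 under each procedure.

```r
alpha <- 0.05

# p-values transcribed from the summary(model1) output.
pvals <- data.frame(
  effect = c("Crust", "Coke", "Crust:Coke", "Bread",
             "Crust:Bread", "Coke:Bread", "Crust:Coke:Bread"),
  WTS  = c(0.0006738585, 0.5485062355, 0.0093223760, 0.0006738585,
           0.8414805811, 0.3173105079, 0.8414805811),
  WTPS = c(0.008, 0.570, 0.026, 0.015, 0.818, 0.379, 0.814),
  ATS  = c(0.02121110, 0.57625702, 0.05122842, 0.02121110,
           0.84984482, 0.36598284, 0.84984482)
)

# TRUE marks a rejection at level alpha for the respective procedure.
decisions <- cbind(pvals["effect"], pvals[c("WTS", "WTPS", "ATS")] < alpha)
print(decisions)
```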
Nested designs can also be analyzed using the `GFD` function. Note that in nested designs, the levels of the nested factor usually have the same labels for all levels of the main factor, i.e., for each level i = 1, ..., a of the main factor A the nested factor levels are labeled as j = 1, ..., b_i. If the levels of the nested factor are named uniquely, this has to be specified by setting the parameter `nested.levels.unique` to `TRUE`. This package only supports the analysis of balanced nested designs, that is, designs with the same number of levels of the nested factor for each level of the main factor.
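The two labeling conventions and the balance requirement can be illustrated with base R alone. The data frame `toy` below is a hypothetical layout (labels only, no measurements) with uniquely numbered sites. The sketch checks that each season contains the same number of sites and derives the within-season labels j = 1, ..., b.

```r
# Hypothetical toy layout: 2 seasons, 3 uniquely numbered sites per season,
# 2 observations per site.
toy <- data.frame(
  season = rep(c("SUMMER", "WINTER"), each = 6),
  site   = rep(c(4, 5, 6, 1, 2, 3), each = 2)
)

# Balance check: every level of the main factor needs the same number of nested levels.
sites_per_season <- tapply(toy$site, toy$season, function(s) length(unique(s)))
print(sites_per_season)
is_balanced <- length(unique(sites_per_season)) == 1

# Relabel the nested factor as j = 1, ..., b within each season.
toy$site_within <- ave(toy$site, toy$season,
                       FUN = function(s) match(s, sort(unique(s))))
print(unique(toy[c("season", "site", "site_within")]))
```

With labels like `site`, the call to `GFD` needs `nested.levels.unique = TRUE`, as described above. Labels like `site_within` follow the usual convention of repeating the same labels within each level of the main factor.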
We consider the data set `curdies` from the `GFD` package:
data("curdies")
set.seed(987)
nested <- GFD(dugesia ~ season + season:site, data = curdies, nested.levels.unique = TRUE)
summary(nested)
## Call:
## dugesia ~ season + season:site
##
## Descriptive:
##   season site n     Means  Variances Lower 95 % CI Upper 95 % CI
## 1 SUMMER    4 6 0.4190947  0.4615290   -0.29384911     1.1320385
## 2 SUMMER    5 6 0.2290862  0.3148830   -0.35979868     0.8179711
## 3 SUMMER    6 6 0.1942443  0.0729142   -0.08913091     0.4776195
## 4 WINTER    1 6 2.0494375  4.0647606   -0.06635610     4.1652311
## 5 WINTER    2 6 4.1819078 35.6801853   -2.08667494    10.4504905
## 6 WINTER    3 6 0.6782063  0.1910970    0.21944919     1.1369633
##
## Wald-Type Statistic (WTS):
##             Test statistic df    p-value p-value WTPS
## season            5.415180  1 0.01996239       0.0000
## season:site       5.200991  4 0.26728919       0.3247
##
## ANOVA-Type Statistic (ATS):
##             Test statistic      df1      df2    p-value
## season            5.415180 1.000000 6.447707 0.05593278
## season:site       1.382224 1.217424 6.447707 0.29278958
The aim of the study was to describe basic patterns of variation in a small flatworm, Dugesia, in the Curdies River, Western Victoria. To this end, worms were sampled in two different seasons and at three different sites within each season. For our analyses we consider both factors as fixed (e.g., some sites may only be accessed in summer). In this setting, both WTS and WTPS detect a significant influence of the season, whereas the ATS, again, shows only borderline significance at the 5% level. The effect of the site is not significant.
The `GFD` package is equipped with a plotting function, displaying the calculated means along with (1 − α) confidence intervals. The `plot` function takes a `GFD` object as an argument. In addition, the factor of interest may be specified. If this argument is omitted in a two- or higher-way layout, the user is asked to specify the factor for plotting. Furthermore, additional graphical parameters can be used to customize the plots. The optional argument `legendpos` specifies the position of the legend in higher-way layouts.
plot(model1, factor = "Crust:Coke:Bread", legendpos = "center", main = "Delivery time of pizza", xlab = "Bread")
The `GFD` package is equipped with an optional graphical user interface, which is based on `RGtk2`. The GUI may be started in R (if `RGtk2` is installed) using the command `calculateGUI()`.
The user can specify the data location (either directly or via the “load data” button), the formula, the number of permutations and the significance level. Additionally, one can specify whether or not headers are included in the data file, and which separator (e.g., ‘,’ for *.csv files) and character symbols are used for decimals in the data file. The GUI also provides a plotting option, which generates a new window for specifying the factors to be plotted (in higher-way layouts) along with a few plotting parameters.