Title: | Generalized Estimating Equation Package |
---|---|
Description: | Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses. See e.g. Halekoh and Højsgaard, (2005, <doi:10.18637/jss.v015.i02>), for details. |
Authors: | Søren Højsgaard [aut, cre, cph], Ulrich Halekoh [aut, cph], Jun Yan [aut, cph], Claus Thorn Ekstrøm [ctb] |
Maintainer: | Søren Højsgaard <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.3.12 |
Built: | 2024-12-23 06:47:30 UTC |
Source: | CRAN |
Comparing regression coefficients between models when one model is nested within another for clustered data.
compCoef(fit0, fit1)
compCoef(fit0, fit1)
fit0 |
a fitted object of class |
fit1 |
another fitted object of class |
a list of two components:
delta |
estimated difference
in the coefficients of common covariates from |
variance |
estimated variance matrix of delta |
Jun Yan [email protected]
Allison, P. D. (1995). The impact of random predictors on comparisons of coefficients between models: Comment on Clogg, Petkova, and Haritou. American Journal of Sociology, 100(5), 1294–1305.
Clogg, C. C., Petkova, E., and Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261–1293.
Yan, J., Aseltine, R., and Harel, O. (2011). Comparing Regression Coefficients Between Nested Linear Models for Clustered Data with Generalized Estimating Equations. Journal of Educational and Behaviorial Statistics, Forthcoming.
## generate clustered data gendat <- function(ncl, clsz) { ## ncl: number of clusters ## clsz: cluster size (all equal) id <- rep(1:ncl, each = clsz) visit <- rep(1:clsz, ncl) n <- ncl * clsz x1 <- rbinom(n, 1, 0.5) ## within cluster varying binary covariate x2 <- runif(n, 0, 1) ## within cluster varying continuous covariate ## the true correlation coefficient rho for an ar(1) ## correlation structure is 2/3 rho <- 2/3 rhomat <- rho ^ outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:ncl, function(x) chol.u %*% rnorm(clsz))) y <- 1 + 3 * x1 - 2 * x2 + noise dat <- data.frame(y, id, visit, x1, x2) dat } simdat <- gendat(100, 4) fit0 <- geese(y ~ x1, id = id, data = simdat, corstr = "un") fit1 <- geese(y ~ x1 + x2, id = id, data = simdat, corstr = "un") compCoef(fit0, fit1)
## generate clustered data gendat <- function(ncl, clsz) { ## ncl: number of clusters ## clsz: cluster size (all equal) id <- rep(1:ncl, each = clsz) visit <- rep(1:clsz, ncl) n <- ncl * clsz x1 <- rbinom(n, 1, 0.5) ## within cluster varying binary covariate x2 <- runif(n, 0, 1) ## within cluster varying continuous covariate ## the true correlation coefficient rho for an ar(1) ## correlation structure is 2/3 rho <- 2/3 rhomat <- rho ^ outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:ncl, function(x) chol.u %*% rnorm(clsz))) y <- 1 + 3 * x1 - 2 * x2 + noise dat <- data.frame(y, id, visit, x1, x2) dat } simdat <- gendat(100, 4) fit0 <- geese(y ~ x1, id = id, data = simdat, corstr = "un") fit1 <- geese(y ~ x1 + x2, id = id, data = simdat, corstr = "un") compCoef(fit0, fit1)
The dietox
data frame has 861 rows and 7 columns.
dietox
dietox
This data frame contains the following columns:
Weight in Kg
Cumulated feed intake in Kg
Time (in weeks) in the experiment
Factor; id of each pig
Factor; vitamin E dose; see 'details'.
Factor, copper dose; see 'details'
Start weight in experiment, i.e. weight at week 1.
Factor, id of litter of each pig
Data contains weight of slaughter pigs measured weekly for 12 weeks. Data also contains the startweight (i.e. the weight at week 1). The treatments are 3 different levels of Evit = vitamin E (dose: 0, 100, 200 mg dl-alpha-tocopheryl acetat /kg feed) in combination with 3 different levels of Cu=copper (dose: 0, 35, 175 mg/kg feed) in the feed. The cumulated feed intake is also recorded. The pigs are littermates.
Lauridsen, C., Højsgaard, S.,Sørensen, M.T. C. (1999) Influence of Dietary Rapeseed Oli, Vitamin E, and Copper on Performance and Antioxidant and Oxidative Status of Pigs. J. Anim. Sci.77:906-916
data(dietox) head(dietox) ## Not run: if (require(ggplot2)){ qplot(Time, Weight, data=dietox, col=Pig) + geom_line() + theme(legend.position = "none") + facet_grid(Evit~Cu) } else { coplot(Weight ~ Time | Evit * Cu, data=dietox) } ## End(Not run)
data(dietox) head(dietox) ## Not run: if (require(ggplot2)){ qplot(Time, Weight, data=dietox, col=Pig) + geom_line() + theme(legend.position = "none") + facet_grid(Evit~Cu) } else { coplot(Weight ~ Time | Evit * Cu, data=dietox) } ## End(Not run)
Construct zcor vector (of fixed correlations) from a fixed working correlation matrix, a specification of clusters and a specifcation of waves.
fixed2Zcor(cor.fixed, id, waves)
fixed2Zcor(cor.fixed, id, waves)
cor.fixed |
Matrix |
id |
Clusters |
waves |
Vector giving the ordering of observations within clusters. |
A vector which can be passed as the zcor argument to geeglm.
Søren Højsgaard, [email protected]
timeorder <- rep(1:5, 6) tvar <- timeorder + rnorm(length(timeorder)) idvar <- rep(1:6, each=5) uuu <- rep(rnorm(6), each=5) yvar <- 1 + 2*tvar + uuu + rnorm(length(tvar)) simdat <- data.frame(idvar, timeorder, tvar, yvar) head(simdat,12) simdatPerm <- simdat[sample(nrow(simdat)),] simdatPerm <- simdatPerm[order(simdatPerm$idvar),] head(simdatPerm) cor.fixed <- matrix(c(1 , 0.5 , 0.25, 0.125, 0.125, 0.5 , 1 , 0.25, 0.125, 0.125, 0.25 , 0.25 , 1 , 0.5 , 0.125, 0.125, 0.125, 0.5 , 1 , 0.125, 0.125, 0.125, 0.125, 0.125, 1 ), nrow=5, ncol=5) cor.fixed zcor <- fixed2Zcor(cor.fixed, id=simdatPerm$idvar, waves=simdatPerm$timeorder) zcor mod4 <- geeglm(yvar~tvar, id=idvar, data=simdatPerm, corstr="fixed", zcor=zcor) mod4
timeorder <- rep(1:5, 6) tvar <- timeorder + rnorm(length(timeorder)) idvar <- rep(1:6, each=5) uuu <- rep(rnorm(6), each=5) yvar <- 1 + 2*tvar + uuu + rnorm(length(tvar)) simdat <- data.frame(idvar, timeorder, tvar, yvar) head(simdat,12) simdatPerm <- simdat[sample(nrow(simdat)),] simdatPerm <- simdatPerm[order(simdatPerm$idvar),] head(simdatPerm) cor.fixed <- matrix(c(1 , 0.5 , 0.25, 0.125, 0.125, 0.5 , 1 , 0.25, 0.125, 0.125, 0.25 , 0.25 , 1 , 0.5 , 0.125, 0.125, 0.125, 0.5 , 1 , 0.125, 0.125, 0.125, 0.125, 0.125, 1 ), nrow=5, ncol=5) cor.fixed zcor <- fixed2Zcor(cor.fixed, id=simdatPerm$idvar, waves=simdatPerm$timeorder) zcor mod4 <- geeglm(yvar~tvar, id=idvar, data=simdatPerm, corstr="fixed", zcor=zcor) mod4
The geeglm function fits generalized estimating equations using the 'geese.fit' function of the 'geepack' package for doing the actual computations. geeglm has a syntax similar to glm and returns an object similar to a glm object. An important feature of geeglm, is that an anova method exists for these models.
geeglm( formula, family = gaussian, data = parent.frame(), weights, subset, na.action, start = NULL, etastart, mustart, offset, control = geese.control(...), method = "glm.fit", contrasts = NULL, id, waves = NULL, zcor = NULL, corstr = "independence", scale.fix = FALSE, scale.value = 1, std.err = "san.se", ... )
geeglm( formula, family = gaussian, data = parent.frame(), weights, subset, na.action, start = NULL, etastart, mustart, offset, control = geese.control(...), method = "glm.fit", contrasts = NULL, id, waves = NULL, zcor = NULL, corstr = "independence", scale.fix = FALSE, scale.value = 1, std.err = "san.se", ... )
formula |
See corresponding documentation to |
family |
See corresponding documentation to |
data |
See corresponding documentation to |
weights |
See corresponding documentation to |
subset |
See corresponding documentation to |
na.action |
No action is taken. Indeed geeglm only works on complete data. |
start |
See corresponding documentation to |
etastart |
See corresponding documentation to |
mustart |
See corresponding documentation to |
offset |
See corresponding documentation to |
control |
See corresponding documentation to |
method |
See corresponding documentation to |
contrasts |
See corresponding documentation to |
id |
a vector which identifies the clusters. The length of ‘id’ should be the same as the number of observations. Data are assumed to be sorted so that observations on each cluster appear as contiguous rows in data. If data is not sorted this way, the function will not identify the clusters correctly. If data is not sorted this way, a warning will be issued. Please consult the package vignette for details. |
waves |
Wariable specifying the ordering of repeated mesurements on the same unit. Also used in connection with missing values. Please consult the package vignette for details. |
zcor |
Used for entering a user defined working correlation structure. |
corstr |
a character string specifying the correlation structure. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"' and '"userdefined"' |
scale.fix |
a logical variable; if true, the scale parameter is fixed at the value of 'scale.value'. |
scale.value |
numeric variable giving the value to which the scale parameter should be fixed; used only if 'scale.fix = TRUE'. |
std.err |
Type of standard error to be calculated. Defualt 'san.se' is the usual robust estimate. Other options are 'jack': if approximate jackknife variance estimate should be computed. 'j1s': if 1-step jackknife variance estimate should be computed. 'fij': logical indicating if fully iterated jackknife variance estimate should be computed. |
... |
further arguments passed to or from other methods. |
In the case of corstr="fixed" one must provide the zcor vector if the clusters have unequal sizes. Clusters with size one must not be represented in zcor.
An object of type 'geeglm'
Use "unstructured" correlation structure only with great care. (It may cause R to crash).
See the documentation for the 'geese' function for additional information. geeglm only works for complete data. Thus if there are NA's in data you can specify data=na.omit(mydata).
Søren Højsgaard, [email protected]
Halekoh, U.; Højsgaard, S. and Yan, J (2006) The R Package geepack for Generalized Estimating Equations. Journal of Statistical Software, 15, 2, 1-11"
Liang, K.Y. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73 13-22.
Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics, 47 825-839.
data(dietox) dietox$Cu <- as.factor(dietox$Cu) mf <- formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3))) gee1 <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1") gee1 coef(gee1) vcov(gee1) summary(gee1) coef(summary(gee1)) mf2 <- formula(Weight ~ Cu * Time + I(Time^2) + I(Time^3)) gee2 <- geeglm(mf2, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1") anova(gee2) # Notice the difference here: Clusters of observations must # appear as chunks in data. set.seed(1) chick1 <- ChickWeight chick2 <- chick1[sample(nrow(chick1)),] chick3 <- chick2[order(chick2$Chick),] fit1 <- geeglm(weight~Time, id=Chick, data=chick1) fit2 <- geeglm(weight~Time, id=Chick, data=chick2) fit3 <- geeglm(weight~Time, id=Chick, data=chick3) vcov(fit1) vcov(fit2) vcov(fit3)
data(dietox) dietox$Cu <- as.factor(dietox$Cu) mf <- formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3))) gee1 <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1") gee1 coef(gee1) vcov(gee1) summary(gee1) coef(summary(gee1)) mf2 <- formula(Weight ~ Cu * Time + I(Time^2) + I(Time^3)) gee2 <- geeglm(mf2, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1") anova(gee2) # Notice the difference here: Clusters of observations must # appear as chunks in data. set.seed(1) chick1 <- ChickWeight chick2 <- chick1[sample(nrow(chick1)),] chick3 <- chick2[order(chick2$Chick),] fit1 <- geeglm(weight~Time, id=Chick, data=chick1) fit2 <- geeglm(weight~Time, id=Chick, data=chick2) fit3 <- geeglm(weight~Time, id=Chick, data=chick3) vcov(fit1) vcov(fit2) vcov(fit3)
Produces an object of class ‘geese’ which is a Generalized Estimating Equation fit of the data.
geese( formula = formula(data), sformula = ~1, id, waves = NULL, data = parent.frame(), subset = NULL, na.action = na.omit, contrasts = NULL, weights = NULL, zcor = NULL, corp = NULL, control = geese.control(...), b = NULL, alpha = NULL, gm = NULL, family = gaussian(), mean.link = NULL, variance = NULL, cor.link = "identity", sca.link = "identity", link.same = TRUE, scale.fix = FALSE, scale.value = 1, corstr = "independence", ... )
geese( formula = formula(data), sformula = ~1, id, waves = NULL, data = parent.frame(), subset = NULL, na.action = na.omit, contrasts = NULL, weights = NULL, zcor = NULL, corp = NULL, control = geese.control(...), b = NULL, alpha = NULL, gm = NULL, family = gaussian(), mean.link = NULL, variance = NULL, cor.link = "identity", sca.link = "identity", link.same = TRUE, scale.fix = FALSE, scale.value = 1, corstr = "independence", ... )
formula |
a formula expression as for |
sformula |
a formula expression of the form |
id |
a vector which identifies the clusters. The length of ‘id’ should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula. |
waves |
an integer vector which identifies components in
clusters. The length of |
data |
an optional data frame in which to interpret the
variables occurring in the |
subset |
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default. |
na.action |
a function to filter missing data. For |
contrasts |
a list giving contrasts for some or all of the factors appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels. |
weights |
an optional vector of weights to be used in the
fitting process. The length of |
zcor |
a design matrix for correlation parameters. |
corp |
known parameters such as coordinates used for correlation coefficients. |
control |
a list of iteration and algorithmic constants. See
|
b |
an initial estimate for the mean parameters. |
alpha |
an initial estimate for the correlation parameters. |
gm |
an initial estimate for the scale parameters. |
family |
a description of the error distribution and link
function to be used in the model, as for |
mean.link |
a character string specifying the link function
for the means. The following are allowed: |
variance |
a character string specifying the variance function
in terms of the mean. The following are allowed:
|
cor.link |
a character string specifying the link function for
the correlation coefficients. The following are allowed:
|
sca.link |
a character string specifying the link function for
the scales. The following are allowed: |
link.same |
a logical indicating if all the components in a cluster should use the same link. |
scale.fix |
a logical variable; if true, the scale parameter
is fixed at the value of |
scale.value |
numeric variable giving the value to which the
scale parameter should be fixed; used only if |
corstr |
a character string specifying the correlation
structure. The following are permitted: |
... |
further arguments passed to or from other methods. |
when the correlation structure is fixed
, the specification of
Zcor
should be a vector of length sum(clusz * (clusz - 1)) /
2.
An object of class "geese"
representing the fit.
Jun Yan [email protected]
Yan, J. and J.P. Fine (2004) Estimating Equations for Association Structures. Statistics in Medicine, 23, 859–880.
data(seizure) ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10 seiz.l <- reshape(seizure, varying=list(c("base","y1", "y2", "y3", "y4")), v.names="y", times=0:4, direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2) seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1) m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data=seiz.l, corstr="exch", family=poisson) summary(m1) m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, subset = id!=49, corstr = "exch", family=poisson) summary(m2) ## Using fixed correlation matrix cor.fixed <- matrix(c(1, 0.5, 0.25, 0.125, 0.125, 0.5, 1, 0.25, 0.125, 0.125, 0.25, 0.25, 1, 0.5, 0.125, 0.125, 0.125, 0.5, 1, 0.125, 0.125, 0.125, 0.125, 0.125, 1), 5, 5) cor.fixed zcor <- rep(cor.fixed[lower.tri(cor.fixed)], 59) m3 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, family = poisson, corstr = "fixed", zcor = zcor) summary(m3) data(ohio) fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) summary(fit) fit.ar1 <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) summary(fit.ar1) ###### simulated data ## a function to generate a dataset gendat <- function() { id <- gl(50, 4, 200) visit <- rep(1:4, 50) x1 <- rbinom(200, 1, 0.6) ## within cluster varying binary covariate x2 <- runif(200, 0, 1) ## within cluster varying continuous covariate phi <- 1 + 2 * x1 ## true scale model ## the true correlation coefficient rho for an ar(1) ## correlation structure is 0.667. rhomat <- 0.667 ^ outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4))) e <- sqrt(phi) * noise y <- 1 + 3 * x1 - 2 * x2 + e dat <- data.frame(y, id, visit, x1, x2) dat } dat <- gendat() fit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE) summary(fit) #### create user-defined design matrix of unstrctured correlation. #### in this case, zcor has 4*3/2 = 6 columns, and 50 * 6 = 300 rows zcor <- genZcor(clusz = rep(4, 50), waves = dat$visit, "unstr") zfit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "userdefined", zcor = zcor, jack = TRUE, j1s = TRUE, fij = TRUE) summary(zfit) #### Now, suppose that we want the correlation of 1-2, 2-3, and 3-4 #### to be the same. Then zcor should have 4 columns. z2 <- matrix(NA, 300, 4) z2[,1] <- zcor[,1] + zcor[,4] + zcor[,6] z2[,2:4] <- zcor[, c(2, 3, 5)] summary(geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "userdefined", zcor = z2, jack = TRUE, j1s = TRUE, fij = TRUE)) #### Next, we introduce non-constant cluster sizes by #### randomly selecting 60 percent of the data good <- sort(sample(1:nrow(dat), .6 * nrow(dat))) mdat <- dat[good,] summary(geese(y ~ x1 + x2, id = id, data = mdat, waves = visit, sformula = ~ x1, corstr="ar1", jack = TRUE, j1s = TRUE, fij = TRUE))
data(seizure) ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10 seiz.l <- reshape(seizure, varying=list(c("base","y1", "y2", "y3", "y4")), v.names="y", times=0:4, direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2) seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1) m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data=seiz.l, corstr="exch", family=poisson) summary(m1) m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, subset = id!=49, corstr = "exch", family=poisson) summary(m2) ## Using fixed correlation matrix cor.fixed <- matrix(c(1, 0.5, 0.25, 0.125, 0.125, 0.5, 1, 0.25, 0.125, 0.125, 0.25, 0.25, 1, 0.5, 0.125, 0.125, 0.125, 0.5, 1, 0.125, 0.125, 0.125, 0.125, 0.125, 1), 5, 5) cor.fixed zcor <- rep(cor.fixed[lower.tri(cor.fixed)], 59) m3 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, family = poisson, corstr = "fixed", zcor = zcor) summary(m3) data(ohio) fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) summary(fit) fit.ar1 <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) summary(fit.ar1) ###### simulated data ## a function to generate a dataset gendat <- function() { id <- gl(50, 4, 200) visit <- rep(1:4, 50) x1 <- rbinom(200, 1, 0.6) ## within cluster varying binary covariate x2 <- runif(200, 0, 1) ## within cluster varying continuous covariate phi <- 1 + 2 * x1 ## true scale model ## the true correlation coefficient rho for an ar(1) ## correlation structure is 0.667. rhomat <- 0.667 ^ outer(1:4, 1:4, function(x, y) abs(x - y)) chol.u <- chol(rhomat) noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4))) e <- sqrt(phi) * noise y <- 1 + 3 * x1 - 2 * x2 + e dat <- data.frame(y, id, visit, x1, x2) dat } dat <- gendat() fit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE) summary(fit) #### create user-defined design matrix of unstrctured correlation. #### in this case, zcor has 4*3/2 = 6 columns, and 50 * 6 = 300 rows zcor <- genZcor(clusz = rep(4, 50), waves = dat$visit, "unstr") zfit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "userdefined", zcor = zcor, jack = TRUE, j1s = TRUE, fij = TRUE) summary(zfit) #### Now, suppose that we want the correlation of 1-2, 2-3, and 3-4 #### to be the same. Then zcor should have 4 columns. z2 <- matrix(NA, 300, 4) z2[,1] <- zcor[,1] + zcor[,4] + zcor[,6] z2[,2:4] <- zcor[, c(2, 3, 5)] summary(geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1, corstr = "userdefined", zcor = z2, jack = TRUE, j1s = TRUE, fij = TRUE)) #### Next, we introduce non-constant cluster sizes by #### randomly selecting 60 percent of the data good <- sort(sample(1:nrow(dat), .6 * nrow(dat))) mdat <- dat[good,] summary(geese(y ~ x1 + x2, id = id, data = mdat, waves = visit, sformula = ~ x1, corstr="ar1", jack = TRUE, j1s = TRUE, fij = TRUE))
Auxiliary function as user interface for gee' fitting. Only used when calling
geese' or ‘geese.fit’.
geese.control( epsilon = 1e-04, maxit = 25, trace = FALSE, scale.fix = FALSE, jack = FALSE, j1s = FALSE, fij = FALSE )
geese.control( epsilon = 1e-04, maxit = 25, trace = FALSE, scale.fix = FALSE, jack = FALSE, j1s = FALSE, fij = FALSE )
epsilon |
positive convergence tolerance epsilon; the
iterations converge when the absolute value of the difference
in parameter estimate is below |
maxit |
integer giving the maximal number of Fisher Scoring iteration. |
trace |
logical indicating if output should be produced for each iteration. |
scale.fix |
logical indicating if the scale should be fixed. |
jack |
logical indicating if approximate jackknife variance estimate should be computed. |
j1s |
logical indicating if 1-step jackknife variance estimate should be computed. |
fij |
logical indicating if fully iterated jackknife variance estimate should be computed. |
When trace' is true, output for each iteration is printed to the screen by the c++ code. Hence,
options(digits = *)' does not control the precision.
A list with the arguments as components.
Jun Yan [email protected]
geese.fit', the fitting procedure used by
geese'.
constructs the design matrix for the correlation structures: independence, echangeable, ar1 and unstructured The user will need this function only as a basis to construct a user defined correlation structure: use genZcor to get the design matrix Z for the unstructured correlation and define the specific correlation structure by linear combinations of the columns of Z.
genZcor(clusz, waves, corstrv)
genZcor(clusz, waves, corstrv)
clusz |
integer vector giving the number of observations in each cluster. |
waves |
integer vector, obervations in the same cluster with
values of wave i and j have the correlation
|
corstrv |
correlation structures: 1=independence, 2=exchangeable, 3=ar1, 4=unstructured. |
The design matrix for the correlation structure.
Jun Yan [email protected]
# example to construct a Toeplitz correlation structure # sigma_ij=sigma_|i-j| # data set with 5 clusters and maximally 4 observations (visits) per cluster gendat <- function() { id <- gl(5, 4, 20) visit <- rep(1:4, 5) y <- rnorm(id) dat <- data.frame(y, id, visit)[c(-2,-9),] } set.seed(88) dat <- gendat() # generating the design matrix for the unstructured correlation zcor <- genZcor(clusz = table(dat$id), waves = dat$visit, corstrv=4) # defining the Toeplitz structure zcor.toep <- matrix(NA, nrow(zcor), 3) zcor.toep[,1] <- apply(zcor[,c(1, 4, 6)], 1, sum) zcor.toep[,2] <- apply(zcor[,c(2, 5)], 1, sum) zcor.toep[,3] <- zcor[,3] zfit1 <- geese(y ~ 1,id = id, data = dat, corstr = "userdefined", zcor = zcor.toep) zfit2 <- geeglm(y ~ 1,id = id, data = dat, corstr = "userdefined", zcor = zcor.toep)
# example to construct a Toeplitz correlation structure # sigma_ij=sigma_|i-j| # data set with 5 clusters and maximally 4 observations (visits) per cluster gendat <- function() { id <- gl(5, 4, 20) visit <- rep(1:4, 5) y <- rnorm(id) dat <- data.frame(y, id, visit)[c(-2,-9),] } set.seed(88) dat <- gendat() # generating the design matrix for the unstructured correlation zcor <- genZcor(clusz = table(dat$id), waves = dat$visit, corstrv=4) # defining the Toeplitz structure zcor.toep <- matrix(NA, nrow(zcor), 3) zcor.toep[,1] <- apply(zcor[,c(1, 4, 6)], 1, sum) zcor.toep[,2] <- apply(zcor[,c(2, 5)], 1, sum) zcor.toep[,3] <- zcor[,3] zfit1 <- geese(y ~ 1,id = id, data = dat, corstr = "userdefined", zcor = zcor.toep) zfit2 <- geeglm(y ~ 1,id = id, data = dat, corstr = "userdefined", zcor = zcor.toep)
The koch
data frame has 288 rows and 4 columns.
koch
koch
This data frame contains the following columns:
a numeric vector
a numeric vector
an ordered factor with levels: 1
< 2
< 3
a numeric vector
data(koch) fit <- ordgee(ordered(y) ~ trt + as.factor(day), id=id, data=koch, corstr="exch") summary(fit)
data(koch) fit <- ordgee(ordered(y) ~ trt + as.factor(day), id=id, data=koch, corstr="exch") summary(fit)
The data are from the Muscatine Coronary Risk Factor (MCRF) study, a longitudinal survey of school-age children in Muscatine, Iowa. The MCRF study had the goal of examining the development and persistence of risk factors for coronary disease in children. In the MCRF study, weight and height measurements of five cohorts of children, initially aged 5-7, 7-9, 9-11, 11-13, and 13-15 years, were obtained biennially from 1977 to 1981. Data were collected on 4856 boys and girls. On the basis of a comparison of their weight to age-gender specific norms, children were classified as obese or not obese.
muscatine
muscatine
A dataframe with 14568 rows and 7 variables:
identifier of child.
gender of child
baseline age
current age
identifier of occasion of recording
'yes' or 'no'
obese in numerical form: 1 corresponds to 'yes' and 0 corresponds to 'no'.
https://content.sph.harvard.edu/fitzmaur/ala2e/muscatine.txt
Woolson, R.F. and Clarke, W.R. (1984). Analysis of categorical incompletel longitudinal data. Journal of the Royal Statistical Society, Series A, 147, 87-99.
muscatine$cage <- muscatine$age - 12 muscatine$cage2 <- muscatine$cage^2 f1 <- numobese ~ gender f2 <- numobese ~ gender + cage + cage2 + gender:cage + gender:cage2 gee1 <- geeglm(formula = f1, id = id, waves = occasion, data = muscatine, family = binomial(), corstr = "independence") gee2 <- geeglm(formula = f2, id = id, waves = occasion, data = muscatine, family = binomial(), corstr = "independence") tidy(gee1) tidy(gee2) QIC(gee1) QIC(gee2)
muscatine$cage <- muscatine$age - 12 muscatine$cage2 <- muscatine$cage^2 f1 <- numobese ~ gender f2 <- numobese ~ gender + cage + cage2 + gender:cage + gender:cage2 gee1 <- geeglm(formula = f1, id = id, waves = occasion, data = muscatine, family = binomial(), corstr = "independence") gee2 <- geeglm(formula = f2, id = id, waves = occasion, data = muscatine, family = binomial(), corstr = "independence") tidy(gee1) tidy(gee2) QIC(gee1) QIC(gee2)
The ohio
data frame has 2148 rows and 4 columns. The dataset is a
subset of the six-city study, a longitudinal study of the health effects of
air pollution.
ohio
ohio
This data frame contains the following columns:
an indicator of wheeze status (1=yes, 0=no)
a numeric vector for subject id
a numeric vector of age, 0 is 9 years old
an indicator of maternal smoking at the first year of the study
Fitzmaurice, G.M. and Laird, N.M. (1993) A likelihood-based method for analyzing longitudinal binary responses, Biometrika 80: 141–151.
data(ohio) fit.ex <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) QIC(fit.ex) fit.ar <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) QIC(fit.ex)
data(ohio) fit.ex <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) QIC(fit.ex) fit.ar <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) QIC(fit.ex)
Produces an object of class ‘geese’ which is a Generalized Estimating Equation fit of the clustered ordinal data.
ordgee( formula = formula(data), ooffset = NULL, id, waves = NULL, data = parent.frame, subset = NULL, na.action = na.omit, contrasts = NULL, weights = NULL, z = NULL, mean.link = "logit", corstr = "independence", control = geese.control(...), b = NA, alpha = NA, scale.fix = TRUE, scale.val = 1, int.const = TRUE, rev = FALSE, ... )
ordgee( formula = formula(data), ooffset = NULL, id, waves = NULL, data = parent.frame, subset = NULL, na.action = na.omit, contrasts = NULL, weights = NULL, z = NULL, mean.link = "logit", corstr = "independence", control = geese.control(...), b = NA, alpha = NA, scale.fix = TRUE, scale.val = 1, int.const = TRUE, rev = FALSE, ... )
formula |
a formula expression as for |
ooffset |
vector of offset for the odds ratio model. |
id |
a vector which identifies the clusters. The length of ‘id’ should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula. |
waves |
an integer vector which identifies components in
clusters. The length of |
data |
an optional data frame in which to interpret the
variables occurring in the |
subset |
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default. |
na.action |
a function to filter missing data. For |
contrasts |
a list giving contrasts for some or all of the factors appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels. |
weights |
an optional vector of weights to be used in the
fitting process. The length of |
z |
a design matrix for the odds ratio model. The number of rows of z is
where |
mean.link |
a character string specifying the link function
for the means. The following are allowed: |
corstr |
a character string specifying the log odds. The
following are allowed: |
control |
a list of iteration and algorithmic constants. See
|
b |
an initial estimate for the mean parameters. |
alpha |
an initial estimate for the odds ratio parameters. |
scale.fix |
a logical variable indicating if scale is fixed; it is set at TRUE currently (it can not be FALSE yet!). |
scale.val |
this argument is ignored currently. |
int.const |
a logical variable; if true, the intercepts are constant, and if false, the intercepts are different for different components in the response. |
rev |
a logical variable. For example, for a three level ordered response Y = 2, the accumulated indicator is coded as (1, 0, 0) if true and (0, 1, 1) if false. |
... |
further arguments passed to or from other methods. |
An object of class "geese"
representing the fit.
Jun Yan [email protected]
Heagerty, P.J. and Zeger, S.L. (1996) Marginal regression models for clustered ordinal measurements. JASA, 91 1024–1036.
data(respdis) resp.l <- reshape(respdis, varying =list(c("y1", "y2", "y3", "y4")), v.names = "resp", direction = "long") resp.l <- resp.l[order(resp.l$id, resp.l$time),] fit <- ordgee(ordered(resp) ~ trt, id=id, data=resp.l, int.const=FALSE) summary(fit) data(ohio) ohio$resp <- ordered(as.factor(ohio$resp)) fit <- ordgee(resp ~ age + smoke + age:smoke, id = id, data=ohio) summary(fit)
data(respdis) resp.l <- reshape(respdis, varying =list(c("y1", "y2", "y3", "y4")), v.names = "resp", direction = "long") resp.l <- resp.l[order(resp.l$id, resp.l$time),] fit <- ordgee(ordered(resp) ~ trt, id=id, data=resp.l, int.const=FALSE) summary(fit) data(ohio) ohio$resp <- ordered(as.factor(ohio$resp)) fit <- ordgee(resp ~ age + smoke + age:smoke, id = id, data=ohio) summary(fit)
Function for calculating the quasi-likelihood under the independence model information criterion (QIC), quasi-likelihood, correlation information criterion (CIC), and corrected QIC for one or several fitted geeglm model object from the geepack package.
## S3 method for class 'geeglm' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) ## S3 method for class 'ordgee' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) ## S3 method for class 'geekin' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) QIC(object, ..., tol = .Machine$double.eps, env = parent.frame())
## S3 method for class 'geeglm' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) ## S3 method for class 'ordgee' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) ## S3 method for class 'geekin' QIC(object, ..., tol = .Machine$double.eps, env = parent.frame()) QIC(object, ..., tol = .Machine$double.eps, env = parent.frame())
object |
a fitted GEE model from the geepack package. Currently only works on geeglm objects. |
... |
optionally more fitted geeglm model objects. |
tol |
the tolerance used for matrix inversion. |
env |
environment. |
QIC is used to select a correlation structure. The QICu is used to compare models that have the same working correlation matrix and the same quasi-likelihood form but different mean specifications. CIC has been suggested as a more robust alternative to QIC when the model for the mean may not fit the data very well and when models with different correlation structures are compared.
Models with smaller values of QIC, CIC, QICu, or QICC are preferred.
If the MASS package is loaded then the ginv
function is used
for matrix inversion. Otherwise the standard solve
function is
used.
A vector or matrix with the QIC, QICu, quasi likelihood, CIC, the number of mean effect parameters, and the corrected QIC for each GEE object
Claus Ekstrom [email protected], Brian McLoone [email protected] and Steven Orzack [email protected]
Pan, W. (2001). Akaike's information criterion in
generalized estimating equations. Biometrics, 57, 120-125.
Hardin, J.W. and Hilbe, J.M. (2012). Generalized
Estimating Equations, 2nd Edition, Chapman and Hall/CRC: New
York.
Hin, L.-Y. and Wang, Y-G. (2009). \emph{Working-correlation-structure identification in generalized estimating equations}, Statistics in Medicine 28: generalized estimating equations}, Statistics in Medicine 28: 642-658. \cr Thall, P.F. and Vail, S.C. (1990). \emph{Some Covariance Models for Longitudinal Count Data with Overdispersion}. Biometrics, 46, 657-671.
geeglm
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) fit2 <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) QIC(fit, fit2)
library(geepack) data(ohio) fit <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="exch", scale.fix=TRUE) fit2 <- geeglm(resp ~ age + smoke + age:smoke, id=id, data=ohio, family=binomial, corstr="ar1", scale.fix=TRUE) QIC(fit, fit2)
Fit a Relative Risk Model for Binary data with Log Link using the COPY method.
relRisk( formula, id, waves = NULL, data = parent.frame(), subset = NULL, contrasts = NULL, na.action = na.omit, corstr = "indep", ncopy = 1000, control = geese.control(), b = NULL, alpha = NULL )
relRisk( formula, id, waves = NULL, data = parent.frame(), subset = NULL, contrasts = NULL, na.action = na.omit, corstr = "indep", ncopy = 1000, control = geese.control(), b = NULL, alpha = NULL )
formula |
same as in |
id |
same as in |
waves |
same as in |
data |
same as in |
subset |
same as in |
contrasts |
same as in |
na.action |
same as in |
corstr |
same as in |
ncopy |
the number of copies of the original data in constructing weight. |
control |
same as in |
b |
initial values for regression coefficients as in
|
alpha |
same as in |
An object of class "geese"
representing the fit.
Jun Yan [email protected]
Lumley, T., Kornmal, R. and Ma, S. (2006). Relative risk regression in medical research: models, contrasts, estimators, and algorithms. UW Biostatistics Working Paper Series 293, University of Washington.
## this example was used in Yu and Yan (2010, techreport) data(respiratory) respiratory$treat <- relevel(respiratory$treat, ref = "P") respiratory$sex <- relevel(respiratory$sex, ref = "M") respiratory$center <- as.factor(respiratory$center) ## 1 will be the reference level fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, id = id, corstr = "ar1", data = respiratory, ncopy=10000) summary(fit) ## fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, ## id = id, corstr = "ex", data = respiratory) ## summary(fit) ## fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, ## id = id, corstr = "indep", data = respiratory) ## summary(fit)
## this example was used in Yu and Yan (2010, techreport) data(respiratory) respiratory$treat <- relevel(respiratory$treat, ref = "P") respiratory$sex <- relevel(respiratory$sex, ref = "M") respiratory$center <- as.factor(respiratory$center) ## 1 will be the reference level fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, id = id, corstr = "ar1", data = respiratory, ncopy=10000) summary(fit) ## fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, ## id = id, corstr = "ex", data = respiratory) ## summary(fit) ## fit <- relRisk(outcome ~ treat + center + sex + age + baseline + visit, ## id = id, corstr = "indep", data = respiratory) ## summary(fit)
The respdis
data frame has 111 rows and 3 columns. The study described
in Miller et. al. (1993) is a randomized clinical trial of a new treatment of
respiratory disorder. The study was conducted in 111 patients who were
randomly assigned to one of two treatments (active, placebo). At each of four
visits during the follow-up period, the response status of each patients was
classified on an ordinal scale.
respdis
respdis
This data frame contains the following columns:
ordered factor measured at 4 visits for the response with
levels, 1
< 2
< 3
, 1 = poor, 2 = good, and 3 =
excellent
a factor for treatment with levels, 1 = active, 0 = placebo.
Miller, M.E., David, C.S., and Landis, R.J. (1993) The analysis of longitudinal polytomous data: Generalized estimating equation and connections with weighted least squares, Biometrics 49: 1033-1048.
data(respdis) resp.l <- reshape(respdis, varying = list(c("y1", "y2", "y3", "y4")), v.names = "resp", direction = "long") resp.l <- resp.l[order(resp.l$id, resp.l$time),] fit <- ordgee(ordered(resp) ~ trt, id = id, data = resp.l, int.const = FALSE) summary(fit) z <- model.matrix( ~ trt - 1, data = respdis) ind <- rep(1:111, 4*3/2 * 2^2) zmat <- z[ind,,drop=FALSE] fit <- ordgee(ordered(resp) ~ trt, id = id, data = resp.l, int.const = FALSE, z = zmat, corstr = "exchangeable") summary(fit)
data(respdis) resp.l <- reshape(respdis, varying = list(c("y1", "y2", "y3", "y4")), v.names = "resp", direction = "long") resp.l <- resp.l[order(resp.l$id, resp.l$time),] fit <- ordgee(ordered(resp) ~ trt, id = id, data = resp.l, int.const = FALSE) summary(fit) z <- model.matrix( ~ trt - 1, data = respdis) ind <- rep(1:111, 4*3/2 * 2^2) zmat <- z[ind,,drop=FALSE] fit <- ordgee(ordered(resp) ~ trt, id = id, data = resp.l, int.const = FALSE, z = zmat, corstr = "exchangeable") summary(fit)
The data are from a clinical trial of patients with respiratory illness, where 111 patients from two different clinics were randomized to receive either placebo or an active treatment. Patients were examined at baseline and at four visits during treatment. The respiratory status (categorized as 1 = good, 0 = poor) was determined at each visit.
respiratory
respiratory
A data frame with 444 observations on the following 8 variables.
a numeric vector
a numeric vector
treatment or placebo
M or F
in years at baseline
resporatory status at baseline
id of each of four visits
respiratory status at each visit
data(respiratory) data(respiratory, package="geepack") respiratory$center <- factor(respiratory$center) head(respiratory) m1 <- glm(outcome ~ center + treat + age + baseline, data=respiratory, family=binomial()) gee.ind <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="independence") gee.exc <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="exchangeable") gee.uns <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="unstructured") gee.ar1 <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="ar1") mlist <- list(gee.ind, gee.exc, gee.uns, gee.ar1) do.call(rbind, lapply(mlist, QIC)) lapply(mlist, tidy)
data(respiratory) data(respiratory, package="geepack") respiratory$center <- factor(respiratory$center) head(respiratory) m1 <- glm(outcome ~ center + treat + age + baseline, data=respiratory, family=binomial()) gee.ind <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="independence") gee.exc <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="exchangeable") gee.uns <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="unstructured") gee.ar1 <- geeglm(outcome ~ center + treat + age + baseline, data=respiratory, id=id, family=binomial(), corstr="ar1") mlist <- list(gee.ind, gee.exc, gee.uns, gee.ar1) do.call(rbind, lapply(mlist, QIC)) lapply(mlist, tidy)
The seizure
data frame has 59 rows and 7 columns. The dataset has the
number of epiliptic seizures in each of four two-week intervals, and in a
baseline eight-week inverval, for treatment and control groups with a total
of 59 individuals.
seizure
seizure
This data frame contains the following columns:
the number of epiliptic seizures in the 1st 2-week interval
the number of epiliptic seizures in the 2nd 2-week interval
the number of epiliptic seizures in the 3rd 2-week interval
the number of epiliptic seizures in the 4th 2-week interval
an indicator of treatment
the number of epilitic seizures in a baseline 8-week interval
a numeric vector of subject age
Thall, P.F. and Vail S.C. (1990) Some covariance models for longitudinal count data with overdispersion. Biometrics 46: 657–671.
Diggle, P.J., Liang, K.Y., and Zeger, S.L. (1994) Analysis of Longitudinal Data. Clarendon Press.
data(seizure) ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10 seiz.l <- reshape(seizure, varying=list(c("base","y1", "y2", "y3", "y4")), v.names="y", times=0:4, direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2) seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1) m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data=seiz.l, corstr="exch", family=poisson) summary(m1) m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, subset = id!=49, corstr = "exch", family=poisson) summary(m2) ## Thall and Vail (1990) seiz.l <- reshape(seizure, varying=list(c("y1","y2","y3","y4")), v.names="y", direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$lbase <- log(seiz.l$base / 4) seiz.l$lage <- log(seiz.l$age) seiz.l$v4 <- ifelse(seiz.l$time == 4, 1, 0) m3 <- geese(y ~ lbase + trt + lbase:trt + lage + v4, sformula = ~ as.factor(time) - 1, id = id, data = seiz.l, corstr = "exchangeable", family=poisson) ## compare to Model 13 in Table 4, noticeable difference summary(m3) ## set up a design matrix for the correlation z <- model.matrix(~ age, data = seizure) # data is not seiz.l ## just to illustrate the scale link and correlation link m4 <- geese(y ~ lbase + trt + lbase:trt + lage + v4, sformula = ~ as.factor(time)-1, id = id, data = seiz.l, corstr = "ar1", family = poisson, zcor = z, cor.link = "fisherz", sca.link = "log") summary(m4)
data(seizure) ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10 seiz.l <- reshape(seizure, varying=list(c("base","y1", "y2", "y3", "y4")), v.names="y", times=0:4, direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2) seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1) m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data=seiz.l, corstr="exch", family=poisson) summary(m1) m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id, data = seiz.l, subset = id!=49, corstr = "exch", family=poisson) summary(m2) ## Thall and Vail (1990) seiz.l <- reshape(seizure, varying=list(c("y1","y2","y3","y4")), v.names="y", direction="long") seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),] seiz.l$lbase <- log(seiz.l$base / 4) seiz.l$lage <- log(seiz.l$age) seiz.l$v4 <- ifelse(seiz.l$time == 4, 1, 0) m3 <- geese(y ~ lbase + trt + lbase:trt + lage + v4, sformula = ~ as.factor(time) - 1, id = id, data = seiz.l, corstr = "exchangeable", family=poisson) ## compare to Model 13 in Table 4, noticeable difference summary(m3) ## set up a design matrix for the correlation z <- model.matrix(~ age, data = seizure) # data is not seiz.l ## just to illustrate the scale link and correlation link m4 <- geese(y ~ lbase + trt + lbase:trt + lage + v4, sformula = ~ as.factor(time)-1, id = id, data = seiz.l, corstr = "ar1", family = poisson, zcor = z, cor.link = "fisherz", sca.link = "log") summary(m4)
Impact of ozone on the growth of sitka spruce trees.
sitka89
sitka89
A dataframe
size of the tree measured in
days after the 1st january, 1988
id number of a tree
ozone: grown under ozone environment, control: ozone free
data(sitka89)
data(sitka89)
The spruce
data frame has 1027 rows and 6 columns. The data consists
of measurements on 79 sitka spruce trees over two growing seasons. The trees
were grown in four controlled environment chambers, of which the first two,
containing 27 trees each, were treated with introduced ozone at 70 ppb whilst
the remaining two, containing 12 and 13 trees, were controls.
spruce
spruce
This data frame contains the following columns:
a numeric vector of chamber numbers
a factor with levels enriched
and normal
a numeric vector of tree id
a numeric vector of the time when the measurements were taken, measured in days since Jan. 1, 1988
a numeric vector of the measurement number
a numeric vector of the log-size
Diggle, P.J., Liang, K.Y., and Zeger, S.L. (1994) Analysis of Longitudinal Data, Clarendon Press.
data(spruce) spruce$contr <- ifelse(spruce$ozone=="enriched", 0, 1) sitka88 <- spruce[spruce$wave <= 5,] sitka89 <- spruce[spruce$wave > 5,] fit.88 <- geese(logsize ~ as.factor(wave) + contr + I(time/100*contr) - 1, id=id, data=sitka88, corstr="ar1") summary(fit.88) fit.89 <- geese(logsize ~ as.factor(wave) + contr - 1, id=id, data=sitka89, corstr="ar1") summary(fit.89)
data(spruce) spruce$contr <- ifelse(spruce$ozone=="enriched", 0, 1) sitka88 <- spruce[spruce$wave <= 5,] sitka89 <- spruce[spruce$wave > 5,] fit.88 <- geese(logsize ~ as.factor(wave) + contr + I(time/100*contr) - 1, id=id, data=sitka88, corstr="ar1") summary(fit.88) fit.89 <- geese(logsize ~ as.factor(wave) + contr - 1, id=id, data=sitka89, corstr="ar1") summary(fit.89)