Title: | Data from the Book "Multivariate Statistical Modelling Based on Generalized Linear Models", First Edition, by Ludwig Fahrmeir and Gerhard Tutz |
---|---|
Description: | Data and functions for the book "Multivariate Statistical Modelling Based on Generalized Linear Models", first edition, by Ludwig Fahrmeir and Gerhard Tutz. Intended as a companion when working through the book. |
Authors: | compiled by Kjetil B Halvorsen |
Maintainer: | Kjetil B Halvorsen |
License: | GPL (>= 2) |
Version: | 2016.5.31 |
Built: | 2024-11-04 21:43:17 UTC |
Source: | CRAN |
Effects of age and smoking status on breathing test results for workers in industrial plants in Texas.
data(breath)
A data frame with 18 observations on the following 4 variables.
Age: a factor with levels <40, 40-59
n: number of workers in group
Smoking.status: a factor with levels Current.smoker, Former.smoker, Never.smoked
Breathing.test: a factor with levels Abnormal, Borderline, Normal
We consider the effects of age and smoking status on breathing test results for workers in industrial plants in Texas. The test results are given on an ordered scale with categories "Abnormal", "Borderline" and "Normal". The question of interest is how age and smoking status are related to the breathing test results.
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
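str(breath)
# polr() needs an ordered response; the levels Abnormal < Borderline < Normal are already in scale order
breath$Breathing.test <- ordered(breath$Breathing.test)
library(MASS)
# Cumulative logit model (the default link), weighted by group size
breath.polr1 <- polr(Breathing.test ~ Age*Smoking.status, weights = n, data = breath, Hess = TRUE)
# Same model with the complementary log-log link
breath.polr2 <- polr(Breathing.test ~ Age*Smoking.status, weights = n, data = breath, method = "cloglog", Hess = TRUE)
summary(breath.polr1)
summary(breath.polr2)
# Continuation ratio models (as on page 89) might be fitted with the
# rms (formerly Design) or VGAM package.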
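Since the question is how age and smoking status are related to the test result, one natural follow-up is a likelihood-ratio test of the Age:Smoking.status interaction in the cumulative logit model. The sketch below assumes the package is installed under its CRAN name, Fahrmeir; it compares the main-effects and interaction fits through their deviances. The names fit_main, fit_int, lr_stat, lr_df and p_value are illustrative only.

```r
library(Fahrmeir)  # package providing the breath data
library(MASS)      # polr()

data(breath)
breath$Breathing.test <- ordered(breath$Breathing.test)

# Main-effects and interaction cumulative logit models, weighted by group size
fit_main <- polr(Breathing.test ~ Age + Smoking.status,
                 weights = n, data = breath)
fit_int  <- polr(Breathing.test ~ Age * Smoking.status,
                 weights = n, data = breath)

# Likelihood-ratio test: difference in deviances, referred to a chi-squared
# distribution with df equal to the number of extra parameters
lr_stat <- deviance(fit_main) - deviance(fit_int)
lr_df   <- fit_int$edf - fit_main$edf
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(LR = lr_stat, df = lr_df, p = p_value)
```

With two age groups and three smoking categories the interaction adds two parameters, so the test has 2 degrees of freedom.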
Data on infections following births by Caesarean section
data(caesar)
A data frame with 24 observations on the following 7 variables.
y: a factor with levels 1, 2, 3; the response
w: number of patients in group
noplan: a factor with levels not, planned; was the Caesarean planned?
factor: a factor with levels risk factors, without; were there risk factors?
antib: a factor with levels antibiotics, without; were antibiotics given as prophylaxis?
yl: logistic (binary) response, 0 = no infection
covariate pattern number
Infection from birth by Caesarean section. The response variable, y, has levels 1 = type I infection, 2 = type II infection and 3 = no infection. Were risk factors (diabetes, overweight, others) present? Were antibiotics used as prophylaxis? The aim is to analyse the effects of the covariates on the response.
Kjetil Halvorsen
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
summary(caesar)
# Binary infection indicator yl, weighted by the group sizes w
caesar.glm1 <- glm(yl ~ noplan + factor + antib, data = caesar, weights = w, family = binomial(link = "logit"))
caesar.glm2 <- glm(yl ~ noplan + factor + antib, data = caesar, weights = w, family = binomial(link = "probit"))
summary(caesar.glm1)
summary(caesar.glm2)
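The glm() fits above collapse the response to infection versus no infection (yl). Because y also distinguishes type I from type II infection, a multinomial (baseline-category) logit model for all three categories may be of interest. A sketch, assuming the Fahrmeir package is installed and using nnet::multinom() from the recommended package nnet; taking "no infection" (level 3) as the reference category is our choice, not necessarily the book's.

```r
library(Fahrmeir)  # caesar data
library(nnet)      # multinom()

data(caesar)
# Use "no infection" (level 3) as the baseline category
caesar$y <- relevel(caesar$y, ref = "3")

# Baseline-category logit model: one coefficient vector per infection type
caesar.mult <- multinom(y ~ noplan + factor + antib, data = caesar,
                        weights = w, Hess = TRUE, trace = FALSE)
summary(caesar.mult)
coef(caesar.mult)  # rows: infection types 1 and 2, each contrasted with 3
```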
The effect of two agents of immuno-activating ability that may induce cell differentiation was investigated.
data(cells)
A data frame with 16 observations on the following 3 variables.
y: number of cells differentiating
TNF: dose of TNF, U/ml
IFN: dose of IFN, U/ml
The effect of two agents of immuno-activating ability that may induce cell differentiation was investigated. The response is the number of cells that exhibited markers after exposure. It is of interest whether the agents TNF (tumor necrosis factor) and IFN (interferon) stimulate cell differentiation independently or whether there is a synergistic effect. At each dose combination, 200 cells were examined.
Kjetil Halvorsen
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(cells)
cells.poisson <- glm(y ~ TNF + IFN + TNF:IFN, data = cells, family = poisson)
summary(cells.poisson)
confint(cells.poisson)
# Now we follow the book, example 2.6, page 51:
# is there overdispersion?
cells.quasi <- glm(y ~ TNF + IFN + TNF:IFN, data = cells, family = quasipoisson)
summary(cells.quasi)
anova(cells.quasi)
confint(cells.quasi)
# We follow the book, example 2.7, page 56:
# mean and variance of y at each TNF dose
with(cells, tapply(y, factor(TNF), function(x) c(mean(x), var(x))))
# which might indicate the use of a negative binomial model
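The closing comment suggests a negative binomial model. A sketch, assuming the Fahrmeir package is installed: it first computes the Pearson-based dispersion estimate under the Poisson fit (values well above 1 point to overdispersion), then fits MASS::glm.nb() with the same TNF, IFN and TNF:IFN terms. This follows the comment's suggestion and is not claimed to reproduce a fit from the book.

```r
library(Fahrmeir)  # cells data
library(MASS)      # glm.nb()

data(cells)
cells.poisson <- glm(y ~ TNF * IFN, data = cells, family = poisson)

# Pearson dispersion estimate: sum of squared Pearson residuals / residual df
phi_hat <- sum(residuals(cells.poisson, type = "pearson")^2) /
  df.residual(cells.poisson)
phi_hat

# Negative binomial model with the same linear predictor
cells.nb <- glm.nb(y ~ TNF * IFN, data = cells)
summary(cells.nb)
cells.nb$theta  # shape parameter; Var(y) = mu + mu^2 / theta
```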
The credit data frame has 1000 rows and 8 columns. These are data for 1000 clients of a South German bank, 700 good payers and 300 bad payers. They are used to construct a credit scoring method.
data(credit)
This data frame contains the following columns:
a factor with levels buen, mal; the response variable. buen (good) denotes the good payers, mal (bad) the bad payers.
a factor with levels no, good running, bad running; the quality of the client's bank account.
a numeric vector; the duration of the loan in months.
a factor with levels pre buen pagador, pre mal pagador; whether the client has previously been a good or a bad payer.
a factor with levels privado, profesional; the use to which the loan is put (private or professional).
a numeric vector; the size of the loan in German marks.
a factor with levels mujer, hombre; the sex of the client (female, male).
a factor with levels no vive solo, vive solo; the civil status of the client (does not live alone, lives alone).
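Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg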
summary(credit)
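Since the data are meant for building a credit-scoring rule, a minimal starting point is a logistic regression of the response on all other columns. The column names are not listed above, so this sketch assumes the Fahrmeir package is installed and that the response is the first column, as in the variable list. With the levels in the documented order (buen, mal), glm() models the probability of the second level, that is, of being a bad payer. The 0.5 cut-off is purely illustrative.

```r
library(Fahrmeir)  # credit data

data(credit)
# Assumption: the first column is the response (levels buen, mal)
response   <- names(credit)[1]
predictors <- names(credit)[-1]
credit.logit <- glm(reformulate(predictors, response = response),
                    data = credit, family = binomial)
summary(credit.logit)

# Estimated probability of being a bad payer ("mal") for each client
p_bad <- fitted(credit.logit)
# Crude classification with a 0.5 cut-off (illustrative only)
table(observed = credit[[response]], predicted_bad = p_bad > 0.5)
```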
Relationship between sex, years in school, and reported happiness.
data(happy)
A data frame with 24 observations on the following 4 variables.
an ordered factor with levels Not to happy < Pretty happy < Very happy
a factor with levels <12, >16, 12, 13-16; years in school
a factor with levels Females, Males
number of persons in group
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(happy)
table(happy)
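The years-in-school factor has its levels in alphabetical order (<12, >16, 12, 13-16) rather than their natural order. A sketch that reorders them and fits a cumulative logit model with MASS::polr(), assuming the Fahrmeir package is installed. The column names happiness, school, sex and n are hypothetical: the code assigns them on the assumption that the columns appear in the order of the variable list.

```r
library(Fahrmeir)  # happy data
library(MASS)      # polr()

data(happy)
# Hypothetical names, assuming the column order of the variable list
names(happy) <- c("happiness", "school", "sex", "n")

# Put years in school into their natural order
happy$school <- factor(happy$school, levels = c("<12", "12", "13-16", ">16"))

# Cumulative logit model for reported happiness, weighted by group size
happy.polr <- polr(happiness ~ school + sex, weights = n, data = happy,
                   Hess = TRUE)
summary(happy.polr)
```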
Data from a head and neck cancer study where time was discretized by one-month intervals.
data(headneck)
A data frame with 47 observations on the following 4 variables.
a numeric vector
a numeric vector, number at risk
a numeric vector
a numeric vector
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(headneck)
summary(headneck)
with(headneck, {
  plot(month, atrisk, type = "s")
  lines(month, deaths, type = "s", col = "red")
  lines(month, withdrawals, type = "S", col = "green")
})
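With time grouped into one-month intervals, a simple nonparametric summary is the life-table (actuarial) estimate of the discrete hazard, h(t) = d(t) / (r(t) - w(t)/2), with survival S(t) as the product of 1 - h(s) over months s up to t. A sketch, assuming the Fahrmeir package is installed, that atrisk counts the patients at risk at the start of each month, and that withdrawals occur on average halfway through the month; the last two are the usual actuarial conventions, not statements from this documentation.

```r
library(Fahrmeir)  # headneck data

data(headneck)
# Actuarial adjustment: each withdrawal counts as half a patient at risk
eff_risk <- with(headneck, atrisk - withdrawals / 2)
# Discrete hazard: deaths relative to the effective number at risk
hazard <- with(headneck, ifelse(eff_risk > 0, deaths / eff_risk, NA_real_))
# Survival after each month: running product of (1 - hazard);
# months with no one at risk contribute a factor of 1
surv <- cumprod(1 - ifelse(is.na(hazard), 0, hazard))

plot(headneck$month, surv, type = "s", ylim = c(0, 1),
     xlab = "month", ylab = "estimated survival")
```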
Air pollution and health: annual data on children aged 7 to 10 in Ohio.
data(ohio)
A data frame with 32 observations on the following 6 variables.
Presence (1) or absence (0) of respiratory infection
Presence (1) or absence (0) of respiratory infection
Presence (1) or absence (0) of respiratory infection
Presence (1) or absence (0) of respiratory infection
a factor with levels no, yes; the mother's smoking status at the start of the study
number of children
Within the Harvard Study of Air Pollution and Health, 537 children were examined annually from age 7 to 10 for the presence or absence of respiratory infection. There are therefore four repeated measurements on each child, a "short time series". The only available covariate is the mother's smoking status at the start of the study.
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(ohio)
summary(ohio)
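Because each row is a response pattern with a count, the marginal infection rate at each age can be computed as a weighted mean. A sketch, assuming the Fahrmeir package is installed. The column names are hypothetical and rest on the assumption that the four response columns come first (ages 7 to 10, in that order), followed by the smoking factor and the count; the responses are converted to 0/1 numbers in case they are stored as factors.

```r
library(Fahrmeir)  # ohio data

data(ohio)
# Hypothetical names; assumes the column order of the variable list
names(ohio) <- c("age7", "age8", "age9", "age10", "smoking", "n")
resp <- c("age7", "age8", "age9", "age10")

# Convert a 0/1 response to numeric (works for numeric or factor storage)
to01 <- function(x) as.numeric(as.character(x))

# Proportion of children with infection at each age, by mother's smoking status
prev <- sapply(resp, function(v) {
  sapply(split(ohio, ohio$smoking),
         function(d) weighted.mean(to01(d[[v]]), d$n))
})
prev  # rows: smoking status; columns: age
```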
A sample of psychology students was asked whether they expected to find adequate employment after graduation.
data(Regensburg)
A data frame with 30 observations on the following 4 variables.
response categories
number of students with this response in group
age in years
natural log of age
In a study on the perspectives of students, psychology students at the University of Regensburg were asked whether they expected to find adequate employment after getting their degree. The response categories were ordered with respect to their expectation: "don't expect adequate employment" (1), "not sure" (2), "immediately after the degree" (3).
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(Regensburg)
summary(Regensburg)
# Example 3.5 page 83 in book:
library(MASS)
Regensburg$y <- ordered(Regensburg$y)
Regensburg.polr <- polr(y ~ lage, data = Regensburg, weights = n)
summary(Regensburg.polr)
class(Regensburg.polr)
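Once the cumulative logit model is fitted, the estimated probabilities of the three response categories can be computed at chosen ages with predict(..., type = "probs"). A sketch, assuming the Fahrmeir package is installed; the ages 20, 25 and 30 are arbitrary illustration values, and they are log-transformed because the model uses lage.

```r
library(Fahrmeir)  # Regensburg data
library(MASS)      # polr()

data(Regensburg)
Regensburg$y <- ordered(Regensburg$y)
Regensburg.polr <- polr(y ~ lage, data = Regensburg, weights = n)

# Illustrative ages in years; the model uses the natural log of age
new_ages <- data.frame(age = c(20, 25, 30))
new_ages$lage <- log(new_ages$age)

# One row per age, one column per response category; each row sums to 1
probs <- predict(Regensburg.polr, newdata = new_ages, type = "probs")
cbind(new_ages, probs)
```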
Data from patients with acute rheumatoid arthritis. A new agent was compared with an active control, and each patient was evaluated on a five-point assessment scale.
data(rheuma)
A data frame with 10 observations on the following 3 variables.
a factor with levels Active.control, New.agent
an ordered factor with levels Much.worse < Worse < No.change < Improved < Much.improved
number of patients in group
The global assessment in this example may be subdivided into the coarse responses "improvement", "no change" and "worse". At a finer level, "improvement" is split into "much improved" and "improved", while "worse" is split into "worse" and "much worse".
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(rheuma)
summary(rheuma)
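The two-level structure of the assessment can be made explicit by mapping the five levels onto the three coarse responses. A minimal sketch; coarsen_assessment() is a hypothetical helper, not part of the package, and it relies only on the level labels listed above.

```r
# Hypothetical helper: collapse the five-point global assessment into the
# coarse responses "worse", "no change" and "improvement"
coarsen_assessment <- function(x) {
  fine   <- c("Much.worse", "Worse", "No.change", "Improved", "Much.improved")
  coarse <- c("worse", "worse", "no change", "improvement", "improvement")
  # Look up each observed label; labels not in 'fine' become NA
  out <- coarse[match(as.character(x), fine)]
  factor(out, levels = c("worse", "no change", "improvement"), ordered = TRUE)
}

# Example with some of the level labels themselves
coarsen_assessment(c("Much.worse", "No.change", "Improved", "Much.improved"))
```

Keeping the coarse factor ordered means it can be used directly as the response of a cumulative model such as MASS::polr().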
Children have been classified according to their relative tonsil size and whether or not they are carriers of Streptococcus pyogenes.
data(tonsil)
A data frame with 6 observations on the following 3 variables.
a factor with levels carriers, noncarriers
numeric, 1, 2 or 3, tonsil size
number of children in group
It may be assumed that tonsil size always starts in the normal state "present but not enlarged" (category 1). If the tonsils grow abnormally, they may become "enlarged" (category 2); if the process does not stop, they may become "greatly enlarged" (category 3).
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(tonsil)
summary(tonsil)
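The growth process described above is step by step, which is the situation a sequential (continuation ratio) model for ordinal data is built for: in one common parameterization it models P(Y = r | Y >= r, x) = F(theta_r + x'gamma) for each step r = 1, ..., k - 1. One way to fit such a model with glm() is to expand each grouped observation into binary records, one per step it passes through, with stop = 1 at the step where it ends. A sketch; sequential_expand() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the Fahrmeir package): expand grouped
# ordinal data into the binary records of a sequential model.
#   data:     data frame, one row per group
#   response: name of a numeric column with categories 1, ..., k
#   k:        number of ordered categories
# Adds columns 'step' (factor) and 'stop' (0/1); other columns are copied.
sequential_expand <- function(data, response, k) {
  y <- data[[response]]
  pieces <- lapply(seq_len(nrow(data)), function(i) {
    # A row in category y passes through steps 1, ..., min(y, k - 1)
    steps <- seq_len(min(y[i], k - 1))
    out <- data[rep(i, length(steps)), , drop = FALSE]
    out$step <- steps
    # stop = 1 only at the step where the process ends in category y
    out$stop <- as.integer(y[i] == steps)
    out
  })
  expanded <- do.call(rbind, pieces)
  expanded$step <- factor(expanded$step)
  rownames(expanded) <- NULL
  expanded
}
```

For the tonsil data one could, for example, rename the columns to the hypothetical names carrier, size and n (assuming the order of the variable list) and fit glm(stop ~ step + carrier, weights = n, family = binomial, data = sequential_expand(tonsil, "size", k = 3)); the step intercepts play the role of the theta_r, and the carrier coefficient is a common effect across steps.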
For 5199 individuals, bivariate binary responses were observed, indicating whether or not each eye was visually impaired, together with covariates. The main objective is to analyze the influence of age and race on visual impairment, controlling for education, a surrogate for socioeconomic status. The data are given only separately for the right and the left eye, so the bivariate response is lost.
data(visual)
The format is:
List of 2
 $ left :‘data.frame’: 16 obs. of 4 variables:
  ..$ left: Factor w/ 2 levels "no","yes": 2 1 2 1 2 1 2 1 2 1 ...
  ..$ race: Factor w/ 2 levels "black","white": 2 2 2 2 2 2 2 2 1 1 ...
  ..$ age : Factor w/ 4 levels "40-50","51-60",..: 1 1 2 2 3 3 4 4 1 1 ...
  ..$ n : int [1:16] 15 617 24 557 42 789 139 673 29 750 ...
 $ right:‘data.frame’: 16 obs. of 4 variables:
  ..$ right: Factor w/ 2 levels "no","yes": 2 1 2 1 2 1 2 1 2 1 ...
  ..$ race : Factor w/ 2 levels "black","white": 2 2 2 2 2 2 2 2 1 1 ...
  ..$ age : Factor w/ 4 levels "40-50","51-60",..: 1 1 2 2 3 3 4 4 1 1 ...
  ..$ n : int [1:16] 19 613 25 556 48 783 146 666 31 748 ...
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(visual)
summary(visual)
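Since only the margins for each eye are available, a simple analysis fits a separate logistic regression for each eye, with race and age as covariates. These fits necessarily ignore the correlation between the two eyes of the same person, which cannot be recovered from these data. A sketch, assuming the Fahrmeir package is installed and using the column names shown in the str() output:

```r
library(Fahrmeir)  # visual data

data(visual)
# Logistic regressions of impairment ("yes") on race and age, one per eye,
# weighted by the group counts n
fit_left  <- glm(left ~ race + age, weights = n, family = binomial,
                 data = visual$left)
fit_right <- glm(right ~ race + age, weights = n, family = binomial,
                 data = visual$right)

# Side-by-side coefficients for the two eyes
cbind(left = coef(fit_left), right = coef(fit_right))
```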
In a study on the bitterness of white wine, it is of interest whether treatments that can be controlled while the grapes are pressed influence the bitterness of the wine. The two factors considered are the temperature and whether contact with the skin is allowed when the grapes are pressed.
data(wine)
A data frame with 72 observations on the following 5 variables.
a factor, temperature, with levels high, low
a factor with levels no, yes; contact with the skin when pressing the grapes
a factor with levels 1, 2, 3, 4, 5, 6, 7, 8
a factor with levels 1, 2, 3, 4, 5, 6, 7, 8, 9
numeric, ordinal score, from '1'=nonbitter to '5'=very bitter
Ludwig Fahrmeir, Gerhard Tutz (1994): Multivariate Statistical Modelling Based on Generalized Linear Models. Springer Series in Statistics. Springer Verlag. New York Berlin Heidelberg
str(wine)
summary(wine)
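Because bitterness is recorded on an ordinal scale, a cumulative logit model with temperature and skin contact as covariates is a natural starting point. A sketch using MASS::polr(), assuming the Fahrmeir package is installed. The column names are hypothetical and assume the columns appear in the order of the variable list; the 8- and 9-level factors get neutral names because their meaning is not stated above.

```r
library(Fahrmeir)  # wine data
library(MASS)      # polr()

data(wine)
# Hypothetical names, assuming the column order of the variable list;
# columns 3 and 4 get neutral names since their meaning is not documented here
names(wine) <- c("temp", "contact", "col3", "col4", "rating")

# Treat the 1-5 bitterness score as an ordered response
wine$rating <- ordered(wine$rating)
wine.polr <- polr(rating ~ temp + contact, data = wine, Hess = TRUE)
summary(wine.polr)
```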