Title:  Applied Econometrics with R 

Description:  Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, SpringerVerlag, New York. ISBN 9780387773162. <doi:10.1007/9780387773186> (See the vignette "AER" for a package overview.) 
Authors:  Christian Kleiber [aut] , Achim Zeileis [aut, cre] 
Maintainer:  Achim Zeileis <[email protected]> 
License:  GPL2  GPL3 
Version:  1.212 
Built:  20240603 06:46:57 UTC 
Source:  CRAN 
Infidelity data, known as Fair's Affairs. Crosssection data from a survey conducted by Psychology Today in 1969.
data("Affairs")
A data frame containing 601 observations on 9 variables.
numeric. How often engaged in extramarital sexual intercourse
during the past year? 0
= none, 1
= once, 2
= twice,
3
= 3 times, 7
= 4–10 times, 12
= monthly,
12
= weekly, 12
= daily.
factor indicating gender.
numeric variable coding age in years: 17.5
= under 20, 22
= 20–24,
27
= 25–29, 32
= 30–34, 37
= 35–39, 42
= 40–44,
47
= 45–49, 52
= 50–54, 57
= 55 or over.
numeric variable coding number of years married: 0.125
= 3 months or less,
0.417
= 4–6 months, 0.75
= 6 months–1 year, 1.5
= 1–2 years,
4
= 3–5 years, 7
= 6–8 years, 10
= 9–11 years, 15
= 12 or more years.
factor. Are there children in the marriage?
numeric variable coding religiousness: 1
= anti, 2
= not at all,
3
= slightly, 4
= somewhat, 5
= very.
numeric variable coding level of education: 9
= grade school,
12
= high school graduate, 14
= some college, 16
= college graduate,
17
= some graduate work, 18
= master's degree, 20
= Ph.D., M.D., or
other advanced degree.
numeric variable coding occupation according to Hollingshead classification (reverse numbering).
numeric variable coding self rating of marriage: 1
= very unhappy,
2
= somewhat unhappy, 3
= average, 4
= happier than average,
5
= very happy.
Online complements to Greene (2003). Table F22.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Fair, R.C. (1978). A Theory of Extramarital Affairs. Journal of Political Economy, 86, 45–61.
data("Affairs")
## Greene (2003)
## Tab. 22.3 and 22.4
fm_ols < lm(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
fm_probit < glm(I(affairs > 0) ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs, family = binomial(link = "probit"))
fm_tobit < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
fm_tobit2 < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
right = 4, data = Affairs)
fm_pois < glm(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs, family = poisson)
library("MASS")
fm_nb < glm.nb(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
## Tab. 22.6
library("pscl")
fm_zip < zeroinfl(affairs ~ age + yearsmarried + religiousness + occupation + rating  age +
yearsmarried + religiousness + occupation + rating, data = Affairs)
Time series of consumer price index (CPI) in Argentina (index with 1969(4) = 1).
data("ArgentinaCPI")
A quarterly univariate time series from 1970(1) to 1989(4).
Online complements to Franses (1998).
De Ruyter van Steveninck, M.A. (1996). The Impact of Capital Imports; Argentina 1970–1989. Amsterdam: Thesis Publishers.
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("ArgentinaCPI")
plot(ArgentinaCPI)
plot(log(ArgentinaCPI))
library("dynlm")
## estimation sample 1970.31988.4 means
acpi < window(ArgentinaCPI, start = c(1970,1), end = c(1988,4))
## eq. (3.90), p.54
acpi_ols < dynlm(d(log(acpi)) ~ L(d(log(acpi))))
summary(acpi_ols)
## alternatively
ar(diff(log(acpi)), order.max = 1, method = "ols")
This manual page collects a list of examples from the book. Some solutions might not be exact and the list is certainly not complete. If you have suggestions for improvement (preferably in the form of code), please contact the package maintainer.
Baltagi, B.H. (2002). Econometrics, 3rd ed., Berlin: SpringerVerlag.
BenderlyZwick
, CigarettesB
, EuroEnergy
,
Grunfeld
, Mortgage
, NaturalGas
,
OECDGas
, OrangeCounty
, PSID1982
,
TradeCredit
, USConsump1993
, USCrudes
,
USGasB
, USMacroB
################################
## Cigarette consumption data ##
################################
## data
data("CigarettesB", package = "AER")
## Table 3.3
cig_lm < lm(packs ~ price, data = CigarettesB)
summary(cig_lm)
## Figure 3.9
plot(residuals(cig_lm) ~ price, data = CigarettesB)
abline(h = 0, lty = 2)
## Figure 3.10
cig_pred < with(CigarettesB,
data.frame(price = seq(from = min(price), to = max(price), length = 30)))
cig_pred < cbind(cig_pred, predict(cig_lm, newdata = cig_pred, interval = "confidence"))
plot(packs ~ price, data = CigarettesB)
lines(fit ~ price, data = cig_pred)
lines(lwr ~ price, data = cig_pred, lty = 2)
lines(upr ~ price, data = cig_pred, lty = 2)
## Chapter 5: diagnostic tests (p. 111115)
cig_lm2 < lm(packs ~ price + income, data = CigarettesB)
summary(cig_lm2)
## Glejser tests (p. 112)
ares < abs(residuals(cig_lm2))
summary(lm(ares ~ income, data = CigarettesB))
summary(lm(ares ~ I(1/income), data = CigarettesB))
summary(lm(ares ~ I(1/sqrt(income)), data = CigarettesB))
summary(lm(ares ~ sqrt(income), data = CigarettesB))
## GoldfeldQuandt test (p. 112)
gqtest(cig_lm2, order.by = ~ income, data = CigarettesB, fraction = 12, alternative = "less")
## NOTE: Baltagi computes the test statistic as mss1/mss2,
## i.e., tries to find decreasing variances. gqtest() always uses
## mss2/mss1 and has an "alternative" argument.
## Spearman rank correlation test (p. 113)
cor.test(~ ares + income, data = CigarettesB, method = "spearman")
## BreuschPagan test (p. 113)
bptest(cig_lm2, varformula = ~ income, data = CigarettesB, student = FALSE)
## White test (Table 5.1, p. 113)
bptest(cig_lm2, ~ income * price + I(income^2) + I(price^2), data = CigarettesB)
## White HC standard errors (Table 5.2, p. 114)
coeftest(cig_lm2, vcov = vcovHC(cig_lm2, type = "HC1"))
## JarqueBera test (Figure 5.2, p. 115)
hist(residuals(cig_lm2), breaks = 16, ylim = c(0, 10), col = "lightgray")
library("tseries")
jarque.bera.test(residuals(cig_lm2))
## Tables 8.1 and 8.2
influence.measures(cig_lm2)
#####################################
## US consumption data (19501993) ##
#####################################
## data
data("USConsump1993", package = "AER")
plot(USConsump1993, plot.type = "single", col = 1:2)
## Chapter 5 (p. 122125)
fm < lm(expenditure ~ income, data = USConsump1993)
summary(fm)
## DurbinWatson test (p. 122)
dwtest(fm)
## BreuschGodfrey test (Table 5.4, p. 124)
bgtest(fm)
## NeweyWest standard errors (Table 5.5, p. 125)
coeftest(fm, vcov = NeweyWest(fm, lag = 3, prewhite = FALSE, adjust = TRUE))
## Chapter 8
library("strucchange")
## Recursive residuals
rr < recresid(fm)
rr
## Recursive CUSUM test
rcus < efp(expenditure ~ income, data = USConsump1993)
plot(rcus)
sctest(rcus)
## HarveyCollier test
harvtest(fm)
## NOTE" Mistake in Baltagi (2002) who computes
## the tstatistic incorrectly as 0.0733 via
mean(rr)/sd(rr)/sqrt(length(rr))
## whereas it should be (as in harvtest)
mean(rr)/sd(rr) * sqrt(length(rr))
## Rainbow test
raintest(fm, center = 23)
## J test for nonnested models
library("dynlm")
fm1 < dynlm(expenditure ~ income + L(income), data = USConsump1993)
fm2 < dynlm(expenditure ~ income + L(expenditure), data = USConsump1993)
jtest(fm1, fm2)
## Chapter 11
## Table 11.1 Instrumentalvariables regression
usc < as.data.frame(USConsump1993)
usc$investment < usc$income  usc$expenditure
fm_ols < lm(expenditure ~ income, data = usc)
fm_iv < ivreg(expenditure ~ income  investment, data = usc)
## Hausman test
cf_diff < coef(fm_iv)  coef(fm_ols)
vc_diff < vcov(fm_iv)  vcov(fm_ols)
x2_diff < as.vector(t(cf_diff) %*% solve(vc_diff) %*% cf_diff)
pchisq(x2_diff, df = 2, lower.tail = FALSE)
## Chapter 14
## ACF and PACF for expenditures and first differences
exps < USConsump1993[, "expenditure"]
(acf(exps))
(pacf(exps))
(acf(diff(exps)))
(pacf(diff(exps)))
## dynamic regressions, eq. (14.8)
fm < dynlm(d(exps) ~ I(time(exps)  1949) + L(exps))
summary(fm)
################################
## Grunfeld's investment data ##
################################
## select the first three companies (as panel data)
data("Grunfeld", package = "AER")
pgr < subset(Grunfeld, firm %in% levels(Grunfeld$firm)[1:3])
library("plm")
pgr < pdata.frame(pgr, c("firm", "year"))
## Ex. 10.9
library("systemfit")
gr_ols < systemfit(invest ~ value + capital, method = "OLS", data = pgr)
gr_sur < systemfit(invest ~ value + capital, method = "SUR", data = pgr)
#########################################
## Panel study on income dynamics 1982 ##
#########################################
## data
data("PSID1982", package = "AER")
## Table 4.1
earn_lm < lm(log(wage) ~ . + I(experience^2), data = PSID1982)
summary(earn_lm)
## Table 13.1
union_lpm < lm(I(as.numeric(union)  1) ~ .  wage, data = PSID1982)
union_probit < glm(union ~ .  wage, data = PSID1982, family = binomial(link = "probit"))
union_logit < glm(union ~ .  wage, data = PSID1982, family = binomial)
## probit OK, logit and LPM rather different.
Wages of employees of a US bank.
data("BankWages")
A data frame containing 474 observations on 4 variables.
Ordered factor indicating job category, with levels "custodial"
,
"admin"
and "manage"
.
Education in years.
Factor indicating gender.
Factor. Is the employee member of a minority?
Online complements to Heij, de Boer, Franses, Kloek, and van Dijk (2004).
https://global.oup.com/booksites/content/0199268010/datasets/ch6/xr614bwa.asc
Heij, C., de Boer, P.M.C., Franses, P.H., Kloek, T. and van Dijk, H.K. (2004). Econometric Methods with Applications in Business and Economics. Oxford: Oxford University Press.
data("BankWages")
## exploratory analysis of job ~ education
## (tables and spine plots, some education levels merged)
xtabs(~ education + job, data = BankWages)
edcat < factor(BankWages$education)
levels(edcat)[3:10] < rep(c("1415", "1618", "1921"), c(2, 3, 3))
tab < xtabs(~ edcat + job, data = BankWages)
prop.table(tab, 1)
spineplot(tab, off = 0)
plot(job ~ edcat, data = BankWages, off = 0)
## fit multinomial model for male employees
library("nnet")
fm_mnl < multinom(job ~ education + minority, data = BankWages,
subset = gender == "male", trace = FALSE)
summary(fm_mnl)
confint(fm_mnl)
## same with mlogit package
library("mlogit")
fm_mlogit < mlogit(job ~ 1  education + minority, data = BankWages,
subset = gender == "male", shape = "wide", reflevel = "custodial")
summary(fm_mlogit)
Time series data, 1952–1982.
data("BenderlyZwick")
An annual multiple time series from 1952 to 1982 with 5 variables.
real annual returns on stocks, measured using the IbbotsonSinquefeld data base.
annual growth rate of output, measured by real GNP (from the given year to the next year).
inflation rate, measured as growth of price rate (from December of the previous year to December of the present year).
annual growth rate of real GNP as given by Baltagi.
inflation rate as given by Baltagi
The first three columns of the data are from Table 1 in Benderly and Zwick (1985). The remaining columns are taken from the online complements of Baltagi (2002). The first column is identical in both sources, the other two variables differ in their numeric values and additionally the growth series seems to be lagged differently. Baltagi (2002) states Lott and Ray (1992) as the source for his version of the data set.
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Benderly, J., and Zwick, B. (1985). Inflation, Real Balances, Output and Real Stock Returns. American Economic Review, 75, 1115–1123.
Lott, W.F., and Ray, S.C. (1992). Applied Econometrics: Problems with Data Sets. New York: The Dryden Press.
Zaman, A., Rousseeuw, P.J., and Orhan, M. (2001). Econometric Applications of HighBreakdown Robust Regression Techniques. Economics Letters, 71, 1–8.
data("BenderlyZwick")
plot(BenderlyZwick)
## Benderly and Zwick (1985), p. 1116
library("dynlm")
bz_ols < dynlm(returns ~ growth + inflation,
data = BenderlyZwick/100, start = 1956, end = 1981)
summary(bz_ols)
## Zaman, Rousseeuw and Orhan (2001)
## use larger period, without scaling
bz_ols2 < dynlm(returns ~ growth + inflation,
data = BenderlyZwick, start = 1954, end = 1981)
summary(bz_ols2)
Monthly averages of the yield on a Moody's Aaa rated corporate bond (in percent/year).
data("BondYield")
A monthly univariate time series from 1990(1) to 1994(12).
Online complements to Greene (2003), Table F20.1.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
data("BondYield")
plot(BondYield)
This manual page collects a list of examples from the book. Some solutions might not be exact and the list is certainly not complete. If you have suggestions for improvement (preferably in the form of code), please contact the package maintainer.
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
DoctorVisits
, NMES1988
, RecreationDemand
library("MASS")
library("pscl")
###########################################
## Australian health service utilization ##
###########################################
## data
data("DoctorVisits", package = "AER")
## Poisson regression
dv_pois < glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
dv_qpois < glm(visits ~ . + I(age^2), data = DoctorVisits, family = quasipoisson)
## Table 3.3
round(cbind(
Coef = coef(dv_pois),
MLH = sqrt(diag(vcov(dv_pois))),
MLOP = sqrt(diag(vcovOPG(dv_pois))),
NB1 = sqrt(diag(vcov(dv_qpois))),
RS = sqrt(diag(sandwich(dv_pois)))
), digits = 3)
## Table 3.4
## NM2ML
dv_nb < glm.nb(visits ~ . + I(age^2), data = DoctorVisits)
summary(dv_nb)
## NB1GLM = quasipoisson
summary(dv_qpois)
## overdispersion tests (page 79)
lrtest(dv_pois, dv_nb) ## pvalue would need to be halved
dispersiontest(dv_pois, trafo = 1)
dispersiontest(dv_pois, trafo = 2)
##########################################
## Demand for medical care in NMES 1988 ##
##########################################
## select variables for analysis
data("NMES1988", package = "AER")
nmes < NMES1988[,(2:6)]
## dependent variable
## Table 6.1
table(cut(nmes$visits, c(0:13, 100)0.5, labels = 0:13))
## NegBin regression
nmes_nb < glm.nb(visits ~ ., data = nmes)
## NegBin hurdle
nmes_h < hurdle(visits ~ ., data = nmes, dist = "negbin")
## from Table 6.3
lrtest(nmes_nb, nmes_h)
## from Table 6.4
AIC(nmes_nb)
AIC(nmes_nb, k = log(nrow(nmes)))
AIC(nmes_h)
AIC(nmes_h, k = log(nrow(nmes)))
## Table 6.8
coeftest(nmes_h, vcov = sandwich)
logLik(nmes_h)
1/nmes_h$theta
###################################################
## Recreational boating trips to Lake Somerville ##
###################################################
## data
data("RecreationDemand", package = "AER")
## Poisson model:
## Cameron and Trivedi (1998), Table 6.11
## Ozuna and Gomez (1995), Table 2, col. 3
fm_pois < glm(trips ~ ., data = RecreationDemand, family = poisson)
summary(fm_pois)
logLik(fm_pois)
coeftest(fm_pois, vcov = sandwich)
## Negbin model:
## Cameron and Trivedi (1998), Table 6.11
## Ozuna and Gomez (1995), Table 2, col. 5
library("MASS")
fm_nb < glm.nb(trips ~ ., data = RecreationDemand)
coeftest(fm_nb, vcov = vcovOPG)
logLik(fm_nb)
## ZIP model:
## Cameron and Trivedi (1998), Table 6.11
fm_zip < zeroinfl(trips ~ .  quality + income, data = RecreationDemand)
summary(fm_zip)
logLik(fm_zip)
## Hurdle models
## Cameron and Trivedi (1998), Table 6.13
## poissonpoisson
sval < list(count = c(2.15, 0.044, .467, .097, .601, .002, .036, .024),
zero = c(1.88, 0.815, .403, .01, 2.95, 0.006, .052, .046))
fm_hp0 < hurdle(trips ~ ., data = RecreationDemand, dist = "poisson",
zero = "poisson", start = sval, maxit = 0)
fm_hp1 < hurdle(trips ~ ., data = RecreationDemand, dist = "poisson",
zero = "poisson", start = sval)
fm_hp2 < hurdle(trips ~ ., data = RecreationDemand, dist = "poisson",
zero = "poisson")
sapply(list(fm_hp0, fm_hp1, fm_hp2), logLik)
## negbinnegbin
fm_hnb < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin", zero = "negbin")
summary(fm_hnb)
logLik(fm_hnb)
sval < list(count = c(0.841, 0.172, .622, .057, .576, .057, .078, .012),
zero = c(3.046, 4.638, .025, .026, 16.203, 0.030, .156, .117),
theta = c(count = 1/1.7, zero = 1/5.609))
fm_hnb2 < try(hurdle(trips ~ ., data = RecreationDemand,
dist = "negbin", zero = "negbin", start = sval))
if(!inherits(fm_hnb2, "tryerror")) {
summary(fm_hnb2)
logLik(fm_hnb2)
}
## geonegbin
sval98 < list(count = c(0.841, 0.172, .622, .057, .576, .057, .078, .012),
zero = c(2.88, 1.44, .4, .03, 9.43, 0.01, .08, .071),
theta = c(count = 1/1.7))
sval96 < list(count = c(0.841, 0.172, .622, .057, .576, .057, .078, .012),
zero = c(2.882, 1.437, .406, .026, 11.936, 0.008, .081, .071),
theta = c(count = 1/1.7))
fm_hgnb < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin", zero = "geometric")
summary(fm_hgnb)
logLik(fm_hgnb)
## logLik with starting values from Gurmu + Trivedi 1996
fm_hgnb96 < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin", zero = "geometric",
start = sval96, maxit = 0)
logLik(fm_hgnb96)
## logitnegbin
fm_hgnb2 < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin")
summary(fm_hgnb2)
logLik(fm_hgnb2)
## Note: quasicomplete separation
with(RecreationDemand, table(trips > 0, userfee))
Weekly observations on prices and other factors from 1880–1886, for a total of 326 weeks.
data("CartelStability")
A data frame containing 328 observations on 5 variables.
weekly index of price of shipping a ton of grain by rail.
factor. Is a railroad cartel operative?
total tonnage of grain shipped in the week.
factor indicating season of year. To match the weekly data, the calendar has been divided into 13 periods, each approximately 4 weeks long.
factor. Are the Great Lakes innavigable because of ice?
Online complements to Stock and Watson (2007).
Porter, R. H. (1983). A Study of Cartel Stability: The Joint Executive Committee, 1880–1886. The Bell Journal of Economics, 14, 301–314.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("CartelStability")
summary(CartelStability)
The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.
data("CASchools")
A data frame containing 420 observations on 14 variables.
character. District code.
character. School name.
factor indicating county.
factor indicating grade span of district.
Total enrollment.
Number of teachers.
Percent qualifying for CalWorks (income assistance).
Percent qualifying for reducedprice lunch.
Number of computers.
Expenditure per student.
District average income (in USD 1,000).
Percent of English learners.
Average reading score.
Average math score.
The data used here are from all 420 K6 and K8 districts in California with data available for 1998 and 1999. Test scores are on the Stanford 9 standardized test administered to 5th grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as “fulltime equivalents”, number of computers per classroom, and expenditures per student. Demographic variables for the students are averaged across the district. The demographic variables include the percentage of students in the public assistance program CalWorks (formerly AFDC), the percentage of students that qualify for a reduced price lunch, and the percentage of students that are English learners (that is, students for whom English is a second language).
Online complements to Stock and Watson (2007).
Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## data and transformations
data("CASchools")
CASchools$stratio < with(CASchools, students/teachers)
CASchools$score < with(CASchools, (math + read)/2)
## Stock and Watson (2007)
## p. 152
fm1 < lm(score ~ stratio, data = CASchools)
coeftest(fm1, vcov = sandwich)
## p. 159
fm2 < lm(score ~ I(stratio < 20), data = CASchools)
## p. 199
fm3 < lm(score ~ stratio + english, data = CASchools)
## p. 224
fm4 < lm(score ~ stratio + expenditure + english, data = CASchools)
## Table 7.1, p. 242 (numbers refer to columns)
fmc3 < lm(score ~ stratio + english + lunch, data = CASchools)
fmc4 < lm(score ~ stratio + english + calworks, data = CASchools)
fmc5 < lm(score ~ stratio + english + lunch + calworks, data = CASchools)
## More examples can be found in:
## help("StockWatson2007")
Time series of real national income in China per section (index with 1952 = 100).
data("ChinaIncome")
An annual multiple time series from 1952 to 1988 with 5 variables.
Real national income in agriculture sector.
Real national income in industry sector.
Real national income in construction sector.
Real national income in transport sector.
Real national income in commerce sector.
Online complements to Franses (1998).
Chow, G.C. (1993). Capital Formation and Economic Growth in China. Quarterly Journal of Economics, 103, 809–842.
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("ChinaIncome")
plot(ChinaIncome)
Crosssection data on cigarette consumption for 46 US States, for the year 1992.
data("CigarettesB")
A data frame containing 46 observations on 3 variables.
Logarithm of cigarette consumption (in packs) per person of smoking age (> 16 years).
Logarithm of real price of cigarette in each state.
Logarithm of real disposable income (per capita) in each state.
The data are from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Baltagi, B.H. and Levin, D. (1992). Cigarette Taxation: Raising Revenues and Reducing Consumption. Structural Change and Economic Dynamics, 3, 321–335.
data("CigarettesB")
## Baltagi (2002)
## Table 3.3
cig_lm < lm(packs ~ price, data = CigarettesB)
summary(cig_lm)
## Chapter 5: diagnostic tests (p. 111115)
cig_lm2 < lm(packs ~ price + income, data = CigarettesB)
summary(cig_lm2)
## Glejser tests (p. 112)
ares < abs(residuals(cig_lm2))
summary(lm(ares ~ income, data = CigarettesB))
summary(lm(ares ~ I(1/income), data = CigarettesB))
summary(lm(ares ~ I(1/sqrt(income)), data = CigarettesB))
summary(lm(ares ~ sqrt(income), data = CigarettesB))
## GoldfeldQuandt test (p. 112)
gqtest(cig_lm2, order.by = ~ income, data = CigarettesB, fraction = 12, alternative = "less")
## NOTE: Baltagi computes the test statistic as mss1/mss2,
## i.e., tries to find decreasing variances. gqtest() always uses
## mss2/mss1 and has an "alternative" argument.
## Spearman rank correlation test (p. 113)
cor.test(~ ares + income, data = CigarettesB, method = "spearman")
## BreuschPagan test (p. 113)
bptest(cig_lm2, varformula = ~ income, data = CigarettesB, student = FALSE)
## White test (Table 5.1, p. 113)
bptest(cig_lm2, ~ income * price + I(income^2) + I(price^2), data = CigarettesB)
## White HC standard errors (Table 5.2, p. 114)
coeftest(cig_lm2, vcov = vcovHC(cig_lm2, type = "HC1"))
## JarqueBera test (Figure 5.2, p. 115)
hist(residuals(cig_lm2), breaks = 16, ylim = c(0, 10), col = "lightgray")
library("tseries")
jarque.bera.test(residuals(cig_lm2))
## Tables 8.1 and 8.2
influence.measures(cig_lm2)
## More examples can be found in:
## help("Baltagi2002")
Panel data on cigarette consumption for the 48 continental US States from 1985–1995.
data("CigarettesSW")
A data frame containing 48 observations on 7 variables for 2 periods.
Factor indicating state.
Factor indicating year.
Consumer price index.
State population.
Number of packs per capita.
State personal income (total, nominal).
Average state, federal and average local excise taxes for fiscal year.
Average price during fiscal year, including sales tax.
Average excise taxes for fiscal year, including sales tax.
Online complements to Stock and Watson (2007).
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## Stock and Watson (2007)
## data and transformations
data("CigarettesSW")
CigarettesSW < transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
rtax = tax/cpi,
rtdiff = (taxs  tax)/cpi
)
c1985 < subset(CigarettesSW, year == "1985")
c1995 < subset(CigarettesSW, year == "1995")
## convenience function: HC1 covariances
hc1 < function(x) vcovHC(x, type = "HC1")
## Equations 12.912.11
fm_s1 < lm(log(rprice) ~ rtdiff, data = c1995)
coeftest(fm_s1, vcov = hc1)
fm_s2 < lm(log(packs) ~ fitted(fm_s1), data = c1995)
fm_ivreg < ivreg(log(packs) ~ log(rprice)  rtdiff, data = c1995)
coeftest(fm_ivreg, vcov = hc1)
## Equation 12.15
fm_ivreg2 < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + rtdiff, data = c1995)
coeftest(fm_ivreg2, vcov = hc1)
## Equation 12.16
fm_ivreg3 < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + rtdiff + rtax,
data = c1995)
coeftest(fm_ivreg3, vcov = hc1)
## More examples can be found in:
## help("StockWatson2007")
Crosssection data from the High School and Beyond survey conducted by the Department of Education in 1980, with a followup in 1986. The survey included students from approximately 1,100 high schools.
data("CollegeDistance")
A data frame containing 4,739 observations on 14 variables.
factor indicating gender.
factor indicating ethnicity (AfricanAmerican, Hispanic or other).
base year composite test score. These are achievement tests given to high school seniors in the sample.
factor. Is the father a college graduate?
factor. Is the mother a college graduate?
factor. Does the family own their home?
factor. Is the school in an urban area?
county unemployment rate in 1980.
state hourly wage in manufacturing in 1980.
distance from 4year college (in 10 miles).
average state 4year college tuition (in 1000 USD).
number of years of education.
factor. Is the family income above USD 25,000 per year?
factor indicating region (West or other).
Rouse (1995) computed years of education by assigning 12 years to all members of the senior class. Each additional year of secondary education counted as a one year. Students with vocational degrees were assigned 13 years, AA degrees were assigned 14 years, BA degrees were assigned 16 years, those with some graduate education were assigned 17 years, and those with a graduate degree were assigned 18 years.
Stock and Watson (2007) provide separate data files for the students from
Western states and the remaining students. CollegeDistance
includes
both data sets, subsets are easily obtained (see also examples).
Online complements to Stock and Watson (2007).
Rouse, C.E. (1995). Democratization or Diversion? The Effect of Community Colleges on Educational Attainment. Journal of Business & Economic Statistics, 12, 217–224.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## exclude students from Western states
data("CollegeDistance")
cd < subset(CollegeDistance, region != "west")
summary(cd)
Time series of distribution, market share and price of a fastmoving consumer good.
data("ConsumerGood")
A weekly multiple time series from 1989(11) to 1991(9) with 3 variables.
Distribution.
Market share.
Price.
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("ConsumerGood")
plot(ConsumerGood)
Crosssection data originating from the May 1985 Current Population Survey by the US Census Bureau (random sample drawn for Berndt 1991).
data("CPS1985")
A data frame containing 534 observations on 11 variables.
Wage (in dollars per hour).
Number of years of education.
Number of years of potential work experience
(age  education  6
).
Age in years.
Factor with levels "cauc"
, "hispanic"
,
"other"
.
Factor. Does the individual live in the South?
Factor indicating gender.
Factor with levels "worker"
(tradesperson or assembly line worker),
"technical"
(technical or professional worker), "services"
(service
worker), "office"
(office and clerical worker), "sales"
(sales worker),
"management"
(management and administration).
Factor with levels "manufacturing"
(manufacturing or mining),
"construction"
, "other"
.
Factor. Does the individual work on a union job?
Factor. Is the individual married?
StatLib.
http://lib.stat.cmu.edu/datasets/CPS_85_Wages
Berndt, E.R. (1991). The Practice of Econometrics. New York: AddisonWesley.
data("CPS1985")
## Berndt (1991)
## Exercise 2, p. 196
cps_2b < lm(log(wage) ~ union + education, data = CPS1985)
cps_2c < lm(log(wage) ~ 1 + union + education, data = CPS1985)
## Exercise 3, p. 198/199
cps_3a < lm(log(wage) ~ education + experience + I(experience^2),
data = CPS1985)
cps_3b < lm(log(wage) ~ gender + education + experience + I(experience^2),
data = CPS1985)
cps_3c < lm(log(wage) ~ gender + married + education + experience + I(experience^2),
data = CPS1985)
cps_3e < lm(log(wage) ~ gender*married + education + experience + I(experience^2),
data = CPS1985)
## Exercise 4, p. 199/200
cps_4a < lm(log(wage) ~ gender + union + ethnicity + education + experience + I(experience^2),
data = CPS1985)
cps_4c < lm(log(wage) ~ gender + union + ethnicity + education * experience + I(experience^2),
data = CPS1985)
## Exercise 6, p. 203
cps_6a < lm(log(wage) ~ gender + union + ethnicity + education + experience + I(experience^2),
data = CPS1985)
cps_6a_noeth < lm(log(wage) ~ gender + union + education + experience + I(experience^2),
data = CPS1985)
anova(cps_6a_noeth, cps_6a)
## Exercise 8, p. 208
cps_8a < lm(log(wage) ~ gender + union + ethnicity + education + experience + I(experience^2),
data = CPS1985)
summary(cps_8a)
coeftest(cps_8a, vcov = vcovHC(cps_8a, type = "HC0"))
Crosssection data originating from the March 1988 Current Population Survey by the US Census Bureau.
data("CPS1988")
A data frame containing 28,155 observations on 7 variables.
Wage (in dollars per week).
Number of years of education.
Number of years of potential work experience.
Factor with levels "cauc"
and "afam"
(AfricanAmerican).
Factor. Does the individual reside in a Standard Metropolitan Statistical Area (SMSA)?
Factor with levels "northeast"
, "midwest"
, "south"
, "west"
.
Factor. Does the individual work parttime?
A sample of men aged 18 to 70 with positive annual income greater than USD 50 in 1992, who are not selfemployed nor working without pay. Wages are deflated by the deflator of Personal Consumption Expenditure for 1992.
A problem with CPS data is that it does not provide actual work experience.
It is therefore customary to compute experience as age  education  6
(as was done by Bierens and Ginther, 2001), this may be considered potential experience.
As a result, some respondents have negative experience.
Personal web page of Herman J. Bierens.
Bierens, H.J., and Ginther, D. (2001). Integrated Conditional Moment Testing of Quantile Regression Models. Empirical Economics, 26, 307–324.
Buchinsky, M. (1998). Recent Advances in Quantile Regression Models: A Practical Guide for Empirical Research. Journal of Human Resources, 33, 88–126.
## data and packages
library("quantreg")
data("CPS1988")
CPS1988$region < relevel(CPS1988$region, ref = "south")
## Model equations: Mincertype, quartic, Buchinskytype
mincer < log(wage) ~ ethnicity + education + experience + I(experience^2)
quart < log(wage) ~ ethnicity + education + experience + I(experience^2) +
I(experience^3) + I(experience^4)
buchinsky < log(wage) ~ ethnicity * (education + experience + parttime) +
region*smsa + I(experience^2) + I(education^2) + I(education*experience)
## OLS and LAD fits (for LAD see Bierens and Ginter, Tables 13.A.)
mincer_ols < lm(mincer, data = CPS1988)
quart_ols < lm(quart, data = CPS1988)
buchinsky_ols < lm(buchinsky, data = CPS1988)
quart_lad < rq(quart, data = CPS1988)
mincer_lad < rq(mincer, data = CPS1988)
buchinsky_lad < rq(buchinsky, data = CPS1988)
Stock and Watson (2007) provide several subsets created from March Current Population Surveys (CPS) with data on the relationship of earnings and education over several year.
data("CPSSW9204")
data("CPSSW9298")
data("CPSSW04")
data("CPSSW3")
data("CPSSW8")
data("CPSSWEducation")
CPSSW9298
: A data frame containing 13,501 observations on 5 variables.
CPSSW9204
: A data frame containing 15,588 observations on 5 variables.
CPSSW04
: A data frame containing 7,986 observations on 4 variables.
CPSSW3
: A data frame containing 20,999 observations on 3 variables.
CPSSW8
: A data frame containing 61,395 observations on 5 variables.
CPSSWEducation
: A data frame containing 2,950 observations on 4 variables.
factor indicating year.
average hourly earnings (sum of annual pretax wages, salaries, tips, and bonuses, divided by the number of hours worked annually).
number of years of education.
factor indicating highest educational degree ("bachelor"
or"highschool"
).
factor indicating gender.
age in years.
factor indicating region of residence
("Northeast"
, "Midwest"
, "South"
, "West"
).
Each month the Bureau of Labor Statistics in the US Department of Labor conducts the Current Population Survey (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. Approximately 65,000 randomly selected US households are surveyed each month. The sample is chosen by randomly selecting addresses from a database. Details can be found in the Handbook of Labor Statistics and is described on the Bureau of Labor Statistics website (https://www.bls.gov/).
The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The data sets contain data for 2004 (from the March 2005 survey), and some also for earlier years (up to 1992).
If education is given, it is for fulltime workers, defined as workers employed more than 35 hours per week for at least 48 weeks in the previous year. Data are provided for workers whose highest educational achievement is a high school diploma and a bachelor's degree.
Earnings for years earlier than 2004 were adjusted for inflation by putting them in 2004 USD using the Consumer Price Index (CPI). From 1992 to 2004, the price of the CPI market basket rose by 34.6%. To make earnings in 1992 and 2004 comparable, 1992 earnings are inflated by the amount of overall CPI price inflation, by multiplying 1992 earnings by 1.346 to put them into 2004 dollars.
CPSSW9204
provides the distribution of earnings in the US in 1992 and 2004
for collegeeducated fulltime workers aged 25–34.
CPSSW04
is a subset of CPSSW9204
and provides the distribution of
earnings in the US in 2004 for collegeeducated fulltime workers aged 25–34.
CPSSWEducation
is similar (but not a true subset) and contains the
distribution of earnings in the US in 2004 for collegeeducated fulltime workers
aged 29–30.
CPSSW8
contains a larger sample with workers aged 21–64, additionally
providing information about the region of residence.
CPSSW9298
is similar to CPSSW9204
providing data from 1992 and 1998
(with the 1992 subsets not being exactly identical).
CPSSW3
provides trends (from 1992 to 2004) in hourly earnings in the US of
working college graduates aged 25–34 (in 2004 USD).
Online complements to Stock and Watson (2007).
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
StockWatson2007
, CPS1985
, CPS1988
data("CPSSW3")
with(CPSSW3, interaction.plot(year, gender, earnings))
## Stock and Watson, p. 165
data("CPSSWEducation")
plot(earnings ~ education, data = CPSSWEducation)
fm < lm(earnings ~ education, data = CPSSWEducation)
coeftest(fm, vcov = sandwich)
abline(fm)
Crosssection data on the credit history for a sample of applicants for a type of credit card.
data("CreditCard")
A data frame containing 1,319 observations on 12 variables.
Factor. Was the application for a credit card accepted?
Number of major derogatory reports.
Age in years plus twelfths of a year.
Yearly income (in USD 10,000).
Ratio of monthly credit card expenditure to yearly income.
Average monthly credit card expenditure.
Factor. Does the individual own their home?
Factor. Is the individual selfemployed?
Number of dependents.
Months living at current address.
Number of major credit cards held.
Number of active credit accounts.
According to Greene (2003, p. 952) dependents
equals 1 + number of dependents
,
our calculations suggest that it equals number of dependents
.
Greene (2003) provides this data set twice in Table F21.4 and F9.1, respectively.
Table F9.1 has just the observations, rounded to two digits. Here, we give the
F21.4 version, see the examples for the F9.1 version. Note that age
has some
suspiciously low values (below one year) for some applicants. One of these differs
between the F9.1 and F21.4 version.
Online complements to Greene (2003). Table F21.4.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
data("CreditCard")
## Greene (2003)
## extract data set F9.1
ccard < CreditCard[1:100,]
ccard$income < round(ccard$income, digits = 2)
ccard$expenditure < round(ccard$expenditure, digits = 2)
ccard$age < round(ccard$age + .01)
## suspicious:
CreditCard$age[CreditCard$age < 1]
## the first of these is also in TableF9.1 with 36 instead of 0.5:
ccard$age[79] < 36
## Example 11.1
ccard < ccard[order(ccard$income),]
ccard0 < subset(ccard, expenditure > 0)
cc_ols < lm(expenditure ~ age + owner + income + I(income^2), data = ccard0)
## Figure 11.1
plot(residuals(cc_ols) ~ income, data = ccard0, pch = 19)
## Table 11.1
mean(ccard$age)
prop.table(table(ccard$owner))
mean(ccard$income)
summary(cc_ols)
sqrt(diag(vcovHC(cc_ols, type = "HC0")))
sqrt(diag(vcovHC(cc_ols, type = "HC2")))
sqrt(diag(vcovHC(cc_ols, type = "HC1")))
bptest(cc_ols, ~ (age + income + I(income^2) + owner)^2 + I(age^2) + I(income^4), data = ccard0)
gqtest(cc_ols)
bptest(cc_ols, ~ income + I(income^2), data = ccard0, studentize = FALSE)
bptest(cc_ols, ~ income + I(income^2), data = ccard0)
## More examples can be found in:
## help("Greene2003")
Tests the null hypothesis of equidispersion in Poisson GLMs against the alternative of overdispersion and/or underdispersion.
dispersiontest(object, trafo = NULL, alternative = c("greater", "two.sided", "less"))
object 
a fitted Poisson GLM of class 
trafo 
a specification of the alternative (see also details),
can be numeric or a (positive) function or 
alternative 
a character string specifying the alternative hypothesis:

The standard Poisson GLM models the (conditional) mean
$\mathsf{E}[y] = \mu$
which is assumed to be equal to the
variance $\mathsf{VAR}[y] = \mu$
. dispersiontest
assesses the hypothesis that this assumption holds (equidispersion) against
the alternative that the variance is of the form:
$\mathsf{VAR}[y] \quad = \quad \mu \; + \; \alpha \cdot \mathrm{trafo}(\mu).$
Overdispersion corresponds to $\alpha > 0$
and underdispersion to
$\alpha < 0$
. The coefficient $\alpha$
can be estimated
by an auxiliary OLS regression and tested with the corresponding t (or z) statistic
which is asymptotically standard normal under the null hypothesis.
Common specifications of the transformation function $\mathrm{trafo}$
are
$\mathrm{trafo}(\mu) = \mu^2$
or $\mathrm{trafo}(\mu) = \mu$
.
The former corresponds to a negative binomial (NB) model with quadratic variance function
(called NB2 by Cameron and Trivedi, 2005), the latter to a NB model with linear variance
function (called NB1 by Cameron and Trivedi, 2005) or quasiPoisson model with dispersion
parameter, i.e.,
$\mathsf{VAR}[y] \quad = \quad (1 + \alpha) \cdot \mu = \mathrm{dispersion} \cdot \mu.$
By default, for trafo = NULL
, the latter dispersion formulation is used in
dispersiontest
. Otherwise, if trafo
is specified, the test is formulated
in terms of the parameter $\alpha$
. The transformation trafo
can either
be specified as a function or an integer corresponding to the function function(x) x^trafo
,
such that trafo = 1
and trafo = 2
yield the linear and quadratic formulations
respectively.
An object of class "htest"
.
Cameron, A.C. and Trivedi, P.K. (1990). Regressionbased Tests for Overdispersion in the Poisson Model. Journal of Econometrics, 46, 347–364.
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Cameron, A.C. and Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
data("RecreationDemand")
rd < glm(trips ~ ., data = RecreationDemand, family = poisson)
## linear specification (in terms of dispersion)
dispersiontest(rd)
## linear specification (in terms of alpha)
dispersiontest(rd, trafo = 1)
## quadratic specification (in terms of alpha)
dispersiontest(rd, trafo = 2)
dispersiontest(rd, trafo = function(x) x^2)
## further examples
data("DoctorVisits")
dv < glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
dispersiontest(dv)
data("NMES1988")
nmes < glm(visits ~ health + age + gender + married + income + insurance,
data = NMES1988, family = poisson)
dispersiontest(nmes)
Dow Jones index time series computed at the end of the week where week is assumed to run from Thursday to Wednesday.
data("DJFranses")
A weekly univariate time series from 1980(1) to 1994(42).
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("DJFranses")
plot(DJFranses)
Time series of the Dow Jones Industrial Average (DJIA) index.
data("DJIA8012")
A daily univariate time series from 19800101 to 20121231 (of class "zoo"
with "Date"
index).
Online complements to Franses, van Dijk and Opschoor (2014).
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
data("DJIA8012")
plot(DJIA8012)
# p.26, Figure 2.18
dldjia < diff(log(DJIA8012))
plot(dldjia)
# p.141, Figure 6.4
plot(window(dldjia, start = "19870901", end = "19871231"))
# p.167, Figure 7.1
dldjia9005 < window(dldjia, start = "19900101", end = "20051231")
qqnorm(dldjia9005)
qqline(dldjia9005, lty = 2)
# p.170, Figure 7.4
acf(dldjia9005, na.action = na.exclude, lag.max = 250, ylim = c(0.1, 0.25))
acf(dldjia9005^2, na.action = na.exclude, lag.max = 250, ylim = c(0.1, 0.25))
acf(abs(dldjia9005), na.action = na.exclude, lag.max = 250, ylim = c(0.1, 0.25))
Crosssection data originating from the 1977–1978 Australian Health Survey.
data("DoctorVisits")
A data frame containing 5,190 observations on 12 variables.
Number of doctor visits in past 2 weeks.
Factor indicating gender.
Age in years divided by 100.
Annual income in tens of thousands of dollars.
Number of illnesses in past 2 weeks.
Number of days of reduced activity in past 2 weeks due to illness or injury.
General health questionnaire score using Goldberg's method.
Factor. Does the individual have private health insurance?
Factor. Does the individual have free government health insurance due to low income?
Factor. Does the individual have free government health insurance due to old age, disability or veteran status?
Factor. Is there a chronic condition not limiting activity?
Factor. Is there a chronic condition limiting activity?
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1997v12.3/mullahy/
Cameron, A.C. and Trivedi, P.K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1, 29–53.
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Mullahy, J. (1997). Heterogeneity, Excess Zeros, and the Structure of Count Data Models. Journal of Applied Econometrics, 12, 337–350.
data("DoctorVisits", package = "AER")
library("MASS")
## Cameron and Trivedi (1986), Table III, col. (1)
dv_lm < lm(visits ~ . + I(age^2), data = DoctorVisits)
summary(dv_lm)
## Cameron and Trivedi (1998), Table 3.3
dv_pois < glm(visits ~ . + I(age^2), data = DoctorVisits, family = poisson)
summary(dv_pois) ## MLH standard errors
coeftest(dv_pois, vcov = vcovOPG) ## MLOP standard errors
logLik(dv_pois)
## standard errors denoted RS ("unspecified omega robust sandwich estimate")
coeftest(dv_pois, vcov = sandwich)
## Cameron and Trivedi (1986), Table III, col. (4)
dv_nb < glm.nb(visits ~ . + I(age^2), data = DoctorVisits)
summary(dv_nb)
logLik(dv_nb)
Time series of television and radio advertising expenditures (in real terms) in The Netherlands.
data("DutchAdvert")
A fourweekly multiple time series from 1978(1) to 1994(13) with 2 variables.
Television advertising expenditures.
Radio advertising expenditures.
Originally available as an online supplement to Franses (1998). Now available via online complements to Franses, van Dijk and Opschoor (2014).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
data("DutchAdvert")
plot(DutchAdvert)
## EACF tables (Franses 1998, Sec. 5.1, p. 99)
ctrafo < function(x) residuals(lm(x ~ factor(cycle(x))))
ddiff < function(x) diff(diff(x, frequency(x)), 1)
eacf < function(y, lag = 12) {
stopifnot(all(lag > 0))
if(length(lag) < 2) lag < 1:lag
rval < sapply(
list(y = y, dy = diff(y), cdy = ctrafo(diff(y)),
Dy = diff(y, frequency(y)), dDy = ddiff(y)),
function(x) acf(x, plot = FALSE, lag.max = max(lag))$acf[lag + 1])
rownames(rval) < lag
return(rval)
}
## Franses (1998, p. 103), Table 5.4
round(eacf(log(DutchAdvert[,"tv"]), lag = c(1:19, 26, 39)), digits = 3)
Time series of retail sales index in The Netherlands.
data("DutchSales")
A monthly univariate time series from 1960(5) to 1995(9).
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("DutchSales")
plot(DutchSales)
## EACF tables (Franses 1998, p. 99)
ctrafo < function(x) residuals(lm(x ~ factor(cycle(x))))
ddiff < function(x) diff(diff(x, frequency(x)), 1)
eacf < function(y, lag = 12) {
stopifnot(all(lag > 0))
if(length(lag) < 2) lag < 1:lag
rval < sapply(
list(y = y, dy = diff(y), cdy = ctrafo(diff(y)),
Dy = diff(y, frequency(y)), dDy = ddiff(y)),
function(x) acf(x, plot = FALSE, lag.max = max(lag))$acf[lag + 1])
rownames(rval) < lag
return(rval)
}
## Franses (1998), Table 5.3
round(eacf(log(DutchSales), lag = c(1:18, 24, 36)), digits = 3)
Cost function data for 145 (+14) US electricity producers in 1955.
data("Electricity1955")
A data frame containing 159 observations on 8 variables.
total cost.
total output.
wage rate.
cost share for labor.
capital price index.
cost share for capital.
fuel price.
cost share for fuel.
The data contains several extra observations that are aggregates of commonly owned firms. Only the first 145 observations should be used for analysis.
Online complements to Greene (2003). Table F14.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Nerlove, M. (1963) “Returns to Scale in Electricity Supply.” In C. Christ (ed.), Measurement in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld. Stanford University Press, 1963.
data("Electricity1955")
Electricity < Electricity1955[1:145,]
## Greene (2003)
## Example 7.3
## CobbDouglas cost function
fm_all < lm(log(cost/fuel) ~ log(output) + log(labor/fuel) + log(capital/fuel),
data = Electricity)
summary(fm_all)
## hypothesis of constant returns to scale
linearHypothesis(fm_all, "log(output) = 1")
## Table 7.4
## log quadratic cost function
fm_all2 < lm(log(cost/fuel) ~ log(output) + I(log(output)^2) + log(labor/fuel) + log(capital/fuel),
data = Electricity)
summary(fm_all2)
## More examples can be found in:
## help("Greene2003")
Crosssection data, at the firm level, on electric power generation.
data("Electricity1970")
A data frame containing 158 crosssection observations on 9 variables.
total cost.
total output.
wage rate.
cost share for labor.
capital price index.
cost share for capital.
fuel price.
cost share for fuel.
The data are from Christensen and Greene (1976) and pertain to the year 1970. However, the file contains some extra observations, the holding companies. Only the first 123 observations are needed to replicate Christensen and Greene (1976).
Online complements to Greene (2003), Table F5.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Christensen, L. and Greene, W.H. (1976). Economies of Scale in U.S. Electric Power Generation. Journal of Political Economy, 84, 655–676.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
data("Electricity1970")
## Greene (2003), Ex. 5.6: a generalized CobbDouglas cost function
fm < lm(log(cost/fuel) ~ log(output) + I(log(output)^2/2) +
log(capital/fuel) + log(labor/fuel), data=Electricity1970[1:123,])
Analysis of citations of evolutionary biology papers published in 1998 in the top three journals (as judged by their 5year impact factors in the Thomson Reuters Journal Citation Reports 2010).
data("EquationCitations")
A data frame containing 649 observations on 13 variables.
Factor. Journal in which the paper was published (The American Naturalist, Evolution, Proceedings of the Royal Society of London B: Biological Sciences).
Character. Names of authors.
Volume in which the paper was published.
Starting page of publication.
Number of pages.
Number of equations in total.
Number of equations in main text.
Number of equations in appendix.
Number of citations in total.
Number of citations by the authors themselves.
Number of citations by other authors.
Number of citations by theoretical papers.
Number of citations by nontheoretical papers.
Fawcett and Higginson (2012) investigate the relationship between the number of citations evolutionary biology papers receive, depending on the number of equations per page in the cited paper. Overall it can be shown that papers with many mathematical equations significantly lower the number of citations they receive, in particular from nontheoretical papers.
Online supplements to Fawcett and Higginson (2012).
https://www.pnas.org/doi/suppl/10.1073/pnas.1205259109/suppl_file/sd01.xlsx
Fawcett, T.W. and Higginson, A.D. (2012). Heavy Use of Equations Impedes Communication among Biologists. PNAS – Proceedings of the National Academy of Sciences of the United States of America, 109, 11735–11739. doi:10.1073/pnas.1205259109
## load data and MASS package
data("EquationCitations", package = "AER")
library("MASS")
## convenience function for summarizing NB models
nbtable < function(obj, digits = 3) round(cbind(
"OR" = exp(coef(obj)),
"CI" = exp(confint.default(obj)),
"Wald z" = coeftest(obj)[,3],
"p" = coeftest(obj)[, 4]), digits = digits)
#################
## Replication ##
#################
## Table 1
m1a < glm.nb(othercites ~ I(equations/pages) * pages + journal,
data = EquationCitations)
m1b < update(m1a, nontheocites ~ .)
m1c < update(m1a, theocites ~ .)
nbtable(m1a)
nbtable(m1b)
nbtable(m1c)
## Table 2
m2a < glm.nb(
othercites ~ (I(mainequations/pages) + I(appequations/pages)) * pages + journal,
data = EquationCitations)
m2b < update(m2a, nontheocites ~ .)
m2c < update(m2a, theocites ~ .)
nbtable(m2a)
nbtable(m2b)
nbtable(m2c)
###############
## Extension ##
###############
## nonlinear page effect: use log(pages) instead of pages+interaction
m3a < glm.nb(othercites ~ I(equations/pages) + log(pages) + journal,
data = EquationCitations)
m3b < update(m3a, nontheocites ~ .)
m3c < update(m3a, theocites ~ .)
## nested models: allow different equation effects over journals
m4a < glm.nb(othercites ~ journal / I(equations/pages) + log(pages),
data = EquationCitations)
m4b < update(m4a, nontheocites ~ .)
m4c < update(m4a, theocites ~ .)
## nested model best (wrt AIC) for all responses
AIC(m1a, m2a, m3a, m4a)
nbtable(m4a)
AIC(m1b, m2b, m3b, m4b)
nbtable(m4b)
AIC(m1c, m2c, m3c, m4c)
nbtable(m4c)
## equation effect by journal/response
## comb nontheo theo
## AmNat =/  +
## Evolution =/+ = +
## ProcB   =/+
Statewide data on transportation equipment manufacturing for 25 US states.
data("Equipment")
A data frame containing 25 observations on 4 variables.
Aggregate output, in millions of 1957 dollars.
Capital input, in millions of 1957 dollars.
Aggregate labor input, in millions of man hours.
Number of firms.
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1998v13.2/zellnerryu/
Online complements to Greene (2003), Table F9.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Zellner, A. and Revankar, N. (1969). Generalized Production Functions. Review of Economic Studies, 36, 241–250.
Zellner, A. and Ryu, H. (1998). Alternative Functional Forms for Production, Cost and Returns to Scale Functions. Journal of Applied Econometrics, 13, 101–127.
## Greene (2003), Example 17.5
data("Equipment")
## CobbDouglas
fm_cd < lm(log(valueadded/firms) ~ log(capital/firms) + log(labor/firms), data = Equipment)
## generalized CobbDouglas with ZellnerRevankar trafo
GCobbDouglas < function(theta)
lm(I(log(valueadded/firms) + theta * valueadded/firms) ~ log(capital/firms) + log(labor/firms),
data = Equipment)
## yields classical CobbDouglas for theta = 0
fm_cd0 < GCobbDouglas(0)
## ML estimation of generalized model
## choose starting values from classical model
par0 < as.vector(c(coef(fm_cd0), 0, mean(residuals(fm_cd0)^2)))
## set up likelihood function
nlogL < function(par) {
beta < par[1:3]
theta < par[4]
sigma2 < par[5]
Y < with(Equipment, valueadded/firms)
K < with(Equipment, capital/firms)
L < with(Equipment, labor/firms)
rhs < beta[1] + beta[2] * log(K) + beta[3] * log(L)
lhs < log(Y) + theta * Y
rval < sum(log(1 + theta * Y)  log(Y) +
dnorm(lhs, mean = rhs, sd = sqrt(sigma2), log = TRUE))
return(rval)
}
## optimization
opt < optim(par0, nlogL, hessian = TRUE)
## Table 17.2
opt$par
sqrt(diag(solve(opt$hessian)))[1:4]
opt$value
## refit ML model
fm_ml < GCobbDouglas(opt$par[4])
deviance(fm_ml)
sqrt(diag(vcov(fm_ml)))
## fit NLS model
rss < function(theta) deviance(GCobbDouglas(theta))
optim(0, rss)
opt2 < optimize(rss, c(1, 1))
fm_nls < GCobbDouglas(opt2$minimum)
nlogL(c(coef(fm_nls), opt2$minimum, mean(residuals(fm_nls)^2)))
Crosssection data on energy consumption for 20 European countries, for the year 1980.
data("EuroEnergy")
A data frame containing 20 observations on 2 variables.
Real gross domestic product for the year 1980 (in million 1975 US dollars).
Aggregate energy consumption (in million kilograms coal equivalence).
The data are from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
data("EuroEnergy")
energy_lm < lm(log(energy) ~ log(gdp), data = EuroEnergy)
influence.measures(energy_lm)
US traffic fatalities panel data for the “lower 48” US states (i.e., excluding Alaska and Hawaii), annually for 1982 through 1988.
data("Fatalities")
A data frame containing 336 observations on 34 variables.
factor indicating state.
factor indicating year.
numeric. Spirits consumption.
numeric. Unemployment rate.
numeric. Per capita personal income in 1987 dollars.
numeric. Employment/population ratio.
numeric. Tax on case of beer.
numeric. Percent of southern baptist.
numeric. Percent of mormon.
numeric. Minimum legal drinking age.
numeric. Percent residing in “dry” countries.
numeric. Percent of drivers aged 15–24.
numeric. Average miles per driver.
factor. Preliminary breath test law?
factor. Mandatory jail sentence?
factor. Mandatory community service?
numeric. Number of vehicle fatalities.
numeric. Number of nighttime vehicle fatalities.
numeric. Number of single vehicle fatalities.
numeric. Number of vehicle fatalities, 15–17 year olds.
numeric. Number of nighttime vehicle fatalities, 15–17 year olds.
numeric. Number of vehicle fatalities, 18–20 year olds.
numeric. Number of nighttime vehicle fatalities, 18–20 year olds.
numeric. Number of vehicle fatalities, 21–24 year olds.
numeric. Number of nighttime vehicle fatalities, 21–24 year olds.
numeric. Number of alcoholinvolved vehicle fatalities.
numeric. Population.
numeric. Population, 15–17 year olds.
numeric. Population, 18–20 year olds.
numeric. Population, 21–24 year olds.
numeric. Total vehicle miles (millions).
numeric. US unemployment rate.
numeric. US employment/population ratio.
numeric. GSP rate of change.
Traffic fatalities are from the US Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is an available measure of state alcohol taxes more generally. The drinking age variable is a factor indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables describe the state's minimum sentencing requirements for an initial drunk driving conviction.
Total vehicle miles traveled annually by state was obtained from the Department of Transportation. Personal income was obtained from the US Bureau of Economic Analysis, and the unemployment rate was obtained from the US Bureau of Labor Statistics.
Online complements to Stock and Watson (2007).
Ruhm, C. J. (1996). Alcohol Policies and Highway Vehicle Fatalities. Journal of Health Economics, 15, 435–454.
Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## data from Stock and Watson (2007)
data("Fatalities", package = "AER")
## add fatality rate (number of traffic deaths
## per 10,000 people living in that state in that year)
Fatalities$frate < with(Fatalities, fatal/pop * 10000)
## add discretized version of minimum legal drinking age
Fatalities$drinkagec < cut(Fatalities$drinkage,
breaks = 18:22, include.lowest = TRUE, right = FALSE)
Fatalities$drinkagec < relevel(Fatalities$drinkagec, ref = 4)
## any punishment?
Fatalities$punish < with(Fatalities,
factor(jail == "yes"  service == "yes", labels = c("no", "yes")))
## plm package
library("plm")
## for comparability with Stata we use HC1 below
## p. 351, Eq. (10.2)
f1982 < subset(Fatalities, year == "1982")
fm_1982 < lm(frate ~ beertax, data = f1982)
coeftest(fm_1982, vcov = vcovHC(fm_1982, type = "HC1"))
## p. 353, Eq. (10.3)
f1988 < subset(Fatalities, year == "1988")
fm_1988 < lm(frate ~ beertax, data = f1988)
coeftest(fm_1988, vcov = vcovHC(fm_1988, type = "HC1"))
## pp. 355, Eq. (10.8)
fm_diff < lm(I(f1988$frate  f1982$frate) ~ I(f1988$beertax  f1982$beertax))
coeftest(fm_diff, vcov = vcovHC(fm_diff, type = "HC1"))
## pp. 360, Eq. (10.15)
## (1) via formula
fm_sfe < lm(frate ~ beertax + state  1, data = Fatalities)
## (2) by hand
fat < with(Fatalities,
data.frame(frates = frate  ave(frate, state),
beertaxs = beertax  ave(beertax, state)))
fm_sfe2 < lm(frates ~ beertaxs  1, data = fat)
## (3) via plm()
fm_sfe3 < plm(frate ~ beertax, data = Fatalities,
index = c("state", "year"), model = "within")
coeftest(fm_sfe, vcov = vcovHC(fm_sfe, type = "HC1"))[1,]
## uses different df in sd and pvalue
coeftest(fm_sfe2, vcov = vcovHC(fm_sfe2, type = "HC1"))[1,]
## uses different df in pvalue
coeftest(fm_sfe3, vcov = vcovHC(fm_sfe3, type = "HC1", method = "white1"))[1,]
## pp. 363, Eq. (10.21)
## via lm()
fm_stfe < lm(frate ~ beertax + state + year  1, data = Fatalities)
coeftest(fm_stfe, vcov = vcovHC(fm_stfe, type = "HC1"))[1,]
## via plm()
fm_stfe2 < plm(frate ~ beertax, data = Fatalities,
index = c("state", "year"), model = "within", effect = "twoways")
coeftest(fm_stfe2, vcov = vcovHC) ## different
## p. 368, Table 10.1, numbers refer to cols.
fm1 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"), model = "pooling")
fm2 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"), model = "within")
fm3 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"), model = "within",
effect = "twoways")
fm4 < plm(frate ~ beertax + drinkagec + jail + service + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm5 < plm(frate ~ beertax + drinkagec + jail + service + miles,
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm6 < plm(frate ~ beertax + drinkage + punish + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm7 < plm(frate ~ beertax + drinkagec + jail + service + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
## summaries not too close, s.e.s generally too small
coeftest(fm1, vcov = vcovHC)
coeftest(fm2, vcov = vcovHC)
coeftest(fm3, vcov = vcovHC)
coeftest(fm4, vcov = vcovHC)
coeftest(fm5, vcov = vcovHC)
coeftest(fm6, vcov = vcovHC)
coeftest(fm7, vcov = vcovHC)
## TODO: Testing exclusion restrictions
Crosssection data from the 1980 US Census on married women aged 21–35 with two or more children.
data("Fertility")
data("Fertility2")
A data frame containing 254,654 (and 30,000, respectively) observations on 8 variables.
factor. Does the mother have more than 2 children?
factor indicating gender of first child.
factor indicating gender of second child.
age of mother at census.
factor. Is the mother AfricanAmerican?
factor. Is the mother Hispanic?
factor. Is the mother's ethnicity neither AfricanAmerican nor Hispanic, nor Caucasian? (see below)
number of weeks in which the mother worked in 1979.
Fertility2
is a random subset of Fertility
with 30,000 observations.
There are conflicts in the ethnicity coding (see also examples). Hence, it was not possible to create a single factor and the original three indicator variables have been retained.
Not all variables from Angrist and Evans (1998) have been included.
Online complements to Stock and Watson (2007).
Angrist, J.D., and Evans, W.N. (1998). Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size American Economic Review, 88, 450–477.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("Fertility2")
## conflicts in ethnicity coding
ftable(xtabs(~ afam + hispanic + other, data = Fertility2))
## create convenience variables
Fertility2$mkids < with(Fertility2, as.numeric(morekids)  1)
Fertility2$samegender < with(Fertility2, factor(gender1 == gender2))
Fertility2$twoboys < with(Fertility2, factor(gender1 == "male" & gender2 == "male"))
Fertility2$twogirls < with(Fertility2, factor(gender1 == "female" & gender2 == "female"))
## similar to Angrist and Evans, p. 462
fm1 < lm(mkids ~ samegender, data = Fertility2)
summary(fm1)
fm2 < lm(mkids ~ gender1 + gender2 + samegender + age + afam + hispanic + other, data = Fertility2)
summary(fm2)
fm3 < lm(mkids ~ gender1 + twoboys + twogirls + age + afam + hispanic + other, data = Fertility2)
summary(fm3)
This manual page collects a list of examples from the book. Some solutions might not be exact and the list is certainly not complete. If you have suggestions for improvement (preferably in the form of code), please contact the package maintainer.
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
ArgentinaCPI
, ChinaIncome
, ConsumerGood
,
DJFranses
, DutchAdvert
, DutchSales
,
GermanUnemployment
, MotorCycles
, OlympicTV
,
PepperPrice
, UKNonDurables
, USProdIndex
###########################
## Convenience functions ##
###########################
## EACF tables (Franses 1998, p. 99)
ctrafo < function(x) residuals(lm(x ~ factor(cycle(x))))
ddiff < function(x) diff(diff(x, frequency(x)), 1)
eacf < function(y, lag = 12) {
stopifnot(all(lag > 0))
if(length(lag) < 2) lag < 1:lag
rval < sapply(
list(y = y, dy = diff(y), cdy = ctrafo(diff(y)),
Dy = diff(y, frequency(y)), dDy = ddiff(y)),
function(x) acf(x, plot = FALSE, lag.max = max(lag))$acf[lag + 1])
rownames(rval) < lag
return(rval)
}
#######################################
## Index of US industrial production ##
#######################################
data("USProdIndex", package = "AER")
plot(USProdIndex, plot.type = "single", col = 1:2)
## Franses (1998), Table 5.1
round(eacf(log(USProdIndex[,1])), digits = 3)
## Franses (1998), Equation 5.6: Unrestricted airline model
## (Franses: ma1 = 0.388 (0.063), ma4 = 0.739 (0.060), ma5 = 0.452 (0.069))
arima(log(USProdIndex[,1]), c(0, 1, 5), c(0, 1, 0), fixed = c(NA, 0, 0, NA, NA))
###########################################
## Consumption of nondurables in the UK ##
###########################################
data("UKNonDurables", package = "AER")
plot(UKNonDurables)
## Franses (1998), Table 5.2
round(eacf(log(UKNonDurables)), digits = 3)
## Franses (1998), Equation 5.51
## (Franses: sma1 = 0.632 (0.069))
arima(log(UKNonDurables), c(0, 1, 0), c(0, 1, 1))
##############################
## Dutch retail sales index ##
##############################
data("DutchSales", package = "AER")
plot(DutchSales)
## Franses (1998), Table 5.3
round(eacf(log(DutchSales), lag = c(1:18, 24, 36)), digits = 3)
###########################################
## TV and radio advertising expenditures ##
###########################################
data("DutchAdvert", package = "AER")
plot(DutchAdvert)
## Franses (1998), Table 5.4
round(eacf(log(DutchAdvert[,"tv"]), lag = c(1:19, 26, 39)), digits = 3)
Monthly data on the price of frozen orange juice concentrate and temperature in the orangegrowing region of Florida.
data("FrozenJuice")
A monthly multiple time series from 1950(1) to 2000(12) with 3 variables.
Average producer price for frozen orange juice.
Producer price index for finished goods. Used to deflate the overall producer price index for finished goods to eliminate the effects of overall price inflation.
Number of freezing degree days at the Orlando, Florida, airport.
Calculated as the sum of the number of degrees Fahrenheit that the
minimum temperature falls below freezing (32 degrees Fahrenheit = about 0 degrees Celsius)
in a given day over all days in the month: fdd
= sum(max(0, 32  minimum daily temperature)),
e.g. for February fdd
is the number of freezing degree days from January 11
to February 10.
The orange juice price data are the frozen orange juice component of processed foods and feeds group of the Producer Price Index (PPI), collected by the US Bureau of Labor Statistics (BLS series wpu02420301). The orange juice price series was divided by the overall PPI for finished goods to adjust for general price inflation. The freezing degree days series was constructed from daily minimum temperatures recorded at Orlando area airports, obtained from the National Oceanic and Atmospheric Administration (NOAA) of the US Department of Commerce.
Online complements to Stock and Watson (2007).
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## load data
data("FrozenJuice")
## Stock and Watson, p. 594
library("dynlm")
fm_dyn < dynlm(d(100 * log(price/ppi)) ~ fdd, data = FrozenJuice)
coeftest(fm_dyn, vcov = vcovHC(fm_dyn, type = "HC1"))
## equivalently, returns can be computed 'by hand'
## (reducing the complexity of the formula notation)
fj < ts.union(fdd = FrozenJuice[, "fdd"],
ret = 100 * diff(log(FrozenJuice[,"price"]/FrozenJuice[,"ppi"])))
fm_dyn < dynlm(ret ~ fdd, data = fj)
## Stock and Watson, p. 595
fm_dl < dynlm(ret ~ L(fdd, 0:6), data = fj)
coeftest(fm_dl, vcov = vcovHC(fm_dl, type = "HC1"))
## Stock and Watson, Table 15.1, p. 620, numbers refer to columns
## (1) Dynamic Multipliers
fm1 < dynlm(ret ~ L(fdd, 0:18), data = fj)
coeftest(fm1, vcov = NeweyWest(fm1, lag = 7, prewhite = FALSE))
## (2) Cumulative Multipliers
fm2 < dynlm(ret ~ L(d(fdd), 0:17) + L(fdd, 18), data = fj)
coeftest(fm2, vcov = NeweyWest(fm2, lag = 7, prewhite = FALSE))
## (3) Cumulative Multipliers, more lags in NW
coeftest(fm2, vcov = NeweyWest(fm2, lag = 14, prewhite = FALSE))
## (4) Cumulative Multipliers with monthly indicators
fm4 < dynlm(ret ~ L(d(fdd), 0:17) + L(fdd, 18) + season(fdd), data = fj)
coeftest(fm4, vcov = NeweyWest(fm4, lag = 7, prewhite = FALSE))
## monthly indicators needed?
fm4r < update(fm4, . ~ .  season(fdd))
waldtest(fm4, fm4r, vcov= NeweyWest(fm4, lag = 7, prewhite = FALSE)) ## close ...
Time series of unemployment rate (in percent) in Germany.
data("GermanUnemployment")
A quarterly multiple time series from 1962(1) to 1991(4) with 2 variables.
Raw unemployment rate,
Seasonally adjusted rate.
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("GermanUnemployment")
plot(GermanUnemployment, plot.type = "single", col = 1:2)
Time series of gold and silver prices.
data("GoldSilver")
A daily multiple time series from 19771230 to 20121231 (of class "zoo"
with "Date"
index).
spot price for gold,
spot price for silver.
Online complements to Franses, van Dijk and Opschoor (2014).
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
data("GoldSilver", package = "AER")
## p.31, daily returns
lgs < log(GoldSilver)
plot(lgs[, c("silver", "gold")])
dlgs < 100 * diff(lgs)
plot(dlgs[, c("silver", "gold")])
## p.31, monthly log prices
lgs7812 < window(lgs, start = as.Date("19780101"))
lgs7812m < aggregate(lgs7812, as.Date(as.yearmon(time(lgs7812))), mean)
plot(lgs7812m, plot.type = "single", lty = 1:2, lwd = 2)
## p.93, empirical ACF of absolute daily gold returns, 19780101  20121231
absgret < abs(100 * diff(lgs7812[, "gold"]))
sacf < acf(absgret, lag.max = 200, na.action = na.exclude, plot = FALSE)
plot(1:201, sacf$acf, ylim = c(0.04, 0.28), type = "l", xaxs = "i", yaxs = "i", las = 1)
## ARFIMA(0,1,1) model, eq. (4.44)
library("longmemo")
WhittleEst(absgret, model = "fARIMA", p = 0, q = 1, start = list(H = 0.3, MA = .25))
library("forecast")
arfima(as.vector(absgret), max.p = 0, max.q = 1)
## p.254: VAR(2), monthly data for 1986.1  2012.12
library("vars")
lgs8612 < window(lgs, start = as.Date("19860101"))
dim(lgs8612)
lgs8612m < aggregate(lgs8612, as.Date(as.yearmon(time(lgs8612))), mean)
plot(lgs8612m)
dim(lgs8612m)
VARselect(lgs8612m, 5)
gs2 < VAR(lgs8612m, 2)
summary(gs2)
summary(gs2)$covres
## ACF of residuals, p.256
acf(resid(gs2), 2, plot = FALSE)
## Figure 9.1, p.260 (somewhat different)
plot(irf(gs2, impulse = "gold", n.ahead = 50), ylim = c(0.02, 0.1))
plot(irf(gs2, impulse = "silver", n.ahead = 50), ylim = c(0.02, 0.1))
## Table 9.2, p.261
fevd(gs2)
## p.266
ls < lgs8612[, "silver"]
lg < lgs8612[, "gold"]
gsreg < lm(lg ~ ls)
summary(gsreg)
sgreg < lm(ls ~ lg)
summary(sgreg)
library("tseries")
adf.test(resid(gsreg), k = 0)
adf.test(resid(sgreg), k = 0)
This manual page collects a list of examples from the book. Some solutions might not be exact and the list is certainly not complete. If you have suggestions for improvement (preferably in the form of code), please contact the package maintainer.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall. URL https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm.
Affairs
, BondYield
, CreditCard
,
Electricity1955
, Electricity1970
, Equipment
,
Grunfeld
, KleinI
, Longley
,
ManufactCosts
, MarkPound
, Municipalities
,
ProgramEffectiveness
, PSID1976
, SIC33
,
ShipAccidents
, StrikeDuration
, TechChange
,
TravelMode
, UKInflation
, USConsump1950
,
USConsump1979
, USGasG
, USAirlines
,
USInvest
, USMacroG
, USMoney
#####################################
## US consumption data (19701979) ##
#####################################
## Example 1.1
data("USConsump1979", package = "AER")
plot(expenditure ~ income, data = as.data.frame(USConsump1979), pch = 19)
fm < lm(expenditure ~ income, data = as.data.frame(USConsump1979))
summary(fm)
abline(fm)
#####################################
## US consumption data (19401950) ##
#####################################
## data
data("USConsump1950", package = "AER")
usc < as.data.frame(USConsump1950)
usc$war < factor(usc$war, labels = c("no", "yes"))
## Example 2.1
plot(expenditure ~ income, data = usc, type = "n", xlim = c(225, 375), ylim = c(225, 350))
with(usc, text(income, expenditure, time(USConsump1950)))
## single model
fm < lm(expenditure ~ income, data = usc)
summary(fm)
## different intercepts for war yes/no
fm2 < lm(expenditure ~ income + war, data = usc)
summary(fm2)
## compare
anova(fm, fm2)
## visualize
abline(fm, lty = 3)
abline(coef(fm2)[1:2])
abline(sum(coef(fm2)[c(1, 3)]), coef(fm2)[2], lty = 2)
## Example 3.2
summary(fm)$r.squared
summary(lm(expenditure ~ income, data = usc, subset = war == "no"))$r.squared
summary(fm2)$r.squared
########################
## US investment data ##
########################
data("USInvest", package = "AER")
## Chapter 3 in Greene (2003)
## transform (and round) data to match Table 3.1
us < as.data.frame(USInvest)
us$invest < round(0.1 * us$invest/us$price, digits = 3)
us$gnp < round(0.1 * us$gnp/us$price, digits = 3)
us$inflation < c(4.4, round(100 * diff(us$price)/us$price[15], digits = 2))
us$trend < 1:15
us < us[, c(2, 6, 1, 4, 5)]
## p. 2224
coef(lm(invest ~ trend + gnp, data = us))
coef(lm(invest ~ gnp, data = us))
## Example 3.1, Table 3.2
cor(us)[1,1]
pcor < solve(cor(us))
dcor < 1/sqrt(diag(pcor))
pcor < (pcor * (dcor %o% dcor))[1,1]
## Table 3.4
fm < lm(invest ~ trend + gnp + interest + inflation, data = us)
fm1 < lm(invest ~ 1, data = us)
anova(fm1, fm)
## Example 4.1
set.seed(123)
w < rnorm(10000)
x < rnorm(10000)
eps < 0.5 * w
y < 0.5 + 0.5 * x + eps
b < rep(0, 500)
for(i in 1:500) {
ix < sample(1:10000, 100)
b[i] < lm.fit(cbind(1, x[ix]), y[ix])$coef[2]
}
hist(b, breaks = 20, col = "lightgray")
###############################
## Longley's regression data ##
###############################
## package and data
data("Longley", package = "AER")
library("dynlm")
## Example 4.6
fm1 < dynlm(employment ~ time(employment) + price + gnp + armedforces,
data = Longley)
fm2 < update(fm1, end = 1961)
cbind(coef(fm2), coef(fm1))
## Figure 4.3
plot(rstandard(fm2), type = "b", ylim = c(3, 3))
abline(h = c(2, 2), lty = 2)
#########################################
## US gasoline market data (19601995) ##
#########################################
## data
data("USGasG", package = "AER")
## Greene (2003)
## Example 2.3
fm < lm(log(gas/population) ~ log(price) + log(income) + log(newcar) + log(usedcar),
data = as.data.frame(USGasG))
summary(fm)
## Example 4.4
## estimates and standard errors (note different offset for intercept)
coef(fm)
sqrt(diag(vcov(fm)))
## confidence interval
confint(fm, parm = "log(income)")
## test linear hypothesis
linearHypothesis(fm, "log(income) = 1")
## Figure 7.5
plot(price ~ gas, data = as.data.frame(USGasG), pch = 19,
col = (time(USGasG) > 1973) + 1)
legend("topleft", legend = c("after 1973", "up to 1973"), pch = 19, col = 2:1, bty = "n")
## Example 7.6
## reused in Example 8.3
## linear time trend
ltrend < 1:nrow(USGasG)
## shock factor
shock < factor(time(USGasG) > 1973, levels = c(FALSE, TRUE), labels = c("before", "after"))
## 19601995
fm1 < lm(log(gas/population) ~ log(income) + log(price) + log(newcar) + log(usedcar) + ltrend,
data = as.data.frame(USGasG))
summary(fm1)
## pooled
fm2 < lm(
log(gas/population) ~ shock + log(income) + log(price) + log(newcar) + log(usedcar) + ltrend,
data = as.data.frame(USGasG))
summary(fm2)
## segmented
fm3 < lm(
log(gas/population) ~ shock/(log(income) + log(price) + log(newcar) + log(usedcar) + ltrend),
data = as.data.frame(USGasG))
summary(fm3)
## Chow test
anova(fm3, fm1)
library("strucchange")
sctest(log(gas/population) ~ log(income) + log(price) + log(newcar) + log(usedcar) + ltrend,
data = USGasG, point = c(1973, 1), type = "Chow")
## Recursive CUSUM test
rcus < efp(log(gas/population) ~ log(income) + log(price) + log(newcar) + log(usedcar) + ltrend,
data = USGasG, type = "RecCUSUM")
plot(rcus)
sctest(rcus)
## Note: Greene's remark that the break is in 1984 (where the process crosses its boundary)
## is wrong. The break appears to be no later than 1976.
## Example 12.2
library("dynlm")
resplot < function(obj, bound = TRUE) {
res < residuals(obj)
sigma < summary(obj)$sigma
plot(res, ylab = "Residuals", xlab = "Year")
grid()
abline(h = 0)
if(bound) abline(h = c(2, 2) * sigma, col = "red")
lines(res)
}
resplot(dynlm(log(gas/population) ~ log(price), data = USGasG))
resplot(dynlm(log(gas/population) ~ log(price) + log(income), data = USGasG))
resplot(dynlm(log(gas/population) ~ log(price) + log(income) + log(newcar) + log(usedcar) +
log(transport) + log(nondurable) + log(durable) +log(service) + ltrend, data = USGasG))
## different shock variable than in 7.6
shock < factor(time(USGasG) > 1974, levels = c(FALSE, TRUE), labels = c("before", "after"))
resplot(dynlm(log(gas/population) ~ shock/(log(price) + log(income) + log(newcar) + log(usedcar) +
log(transport) + log(nondurable) + log(durable) + log(service) + ltrend), data = USGasG))
## NOTE: something seems to be wrong with the sigma estimates in the `full' models
## Table 12.4, OLS
fm < dynlm(log(gas/population) ~ log(price) + log(income) + log(newcar) + log(usedcar),
data = USGasG)
summary(fm)
resplot(fm, bound = FALSE)
dwtest(fm)
## ML
g < as.data.frame(USGasG)
y < log(g$gas/g$population)
X < as.matrix(cbind(log(g$price), log(g$income), log(g$newcar), log(g$usedcar)))
arima(y, order = c(1, 0, 0), xreg = X)
#######################################
## US macroeconomic data (19502000) ##
#######################################
## data and trend
data("USMacroG", package = "AER")
ltrend < 0:(nrow(USMacroG)  1)
## Example 5.3
## OLS and IV regression
library("dynlm")
fm_ols < dynlm(consumption ~ gdp, data = USMacroG)
fm_iv < dynlm(consumption ~ gdp  L(consumption) + L(gdp), data = USMacroG)
## Hausman statistic
library("MASS")
b_diff < coef(fm_iv)  coef(fm_ols)
v_diff < summary(fm_iv)$cov.unscaled  summary(fm_ols)$cov.unscaled
(t(b_diff) %*% ginv(v_diff) %*% b_diff) / summary(fm_ols)$sigma^2
## Wu statistic
auxreg < dynlm(gdp ~ L(consumption) + L(gdp), data = USMacroG)
coeftest(dynlm(consumption ~ gdp + fitted(auxreg), data = USMacroG))[3,3]
## agrees with Greene (but not with errata)
## Example 6.1
## Table 6.1
fm6.1 < dynlm(log(invest) ~ tbill + inflation + log(gdp) + ltrend, data = USMacroG)
fm6.3 < dynlm(log(invest) ~ I(tbill  inflation) + log(gdp) + ltrend, data = USMacroG)
summary(fm6.1)
summary(fm6.3)
deviance(fm6.1)
deviance(fm6.3)
vcov(fm6.1)[2,3]
## F test
linearHypothesis(fm6.1, "tbill + inflation = 0")
## alternatively
anova(fm6.1, fm6.3)
## t statistic
sqrt(anova(fm6.1, fm6.3)[2,5])
## Example 6.3
## Distributed lag model:
## log(Ct) = b0 + b1 * log(Yt) + b2 * log(C(t1)) + u
us < log(USMacroG[, c(2, 5)])
fm_distlag < dynlm(log(consumption) ~ log(dpi) + L(log(consumption)),
data = USMacroG)
summary(fm_distlag)
## estimate and test longrun MPC
coef(fm_distlag)[2]/(1coef(fm_distlag)[3])
linearHypothesis(fm_distlag, "log(dpi) + L(log(consumption)) = 1")
## correct, see errata
## Example 6.4
## predict investiment in 2001(1)
predict(fm6.1, interval = "prediction",
newdata = data.frame(tbill = 4.48, inflation = 5.262, gdp = 9316.8, ltrend = 204))
## Example 7.7
## no GMM available in "strucchange"
## using OLS instead yields
fs < Fstats(log(m1/cpi) ~ log(gdp) + tbill, data = USMacroG,
vcov = NeweyWest, from = c(1957, 3), to = c(1991, 3))
plot(fs)
## which looks somewhat similar ...
## Example 8.2
## Ct = b0 + b1*Yt + b2*Y(t1) + v
fm1 < dynlm(consumption ~ dpi + L(dpi), data = USMacroG)
## Ct = a0 + a1*Yt + a2*C(t1) + u
fm2 < dynlm(consumption ~ dpi + L(consumption), data = USMacroG)
## Cox test in both directions:
coxtest(fm1, fm2)
## ... and do the same for jtest() and encomptest().
## Notice that in this particular case two of them are coincident.
jtest(fm1, fm2)
encomptest(fm1, fm2)
## encomptest could also be performed `by hand' via
fmE < dynlm(consumption ~ dpi + L(dpi) + L(consumption), data = USMacroG)
waldtest(fm1, fmE, fm2)
## Table 9.1
fm_ols < lm(consumption ~ dpi, data = as.data.frame(USMacroG))
fm_nls < nls(consumption ~ alpha + beta * dpi^gamma,
start = list(alpha = coef(fm_ols)[1], beta = coef(fm_ols)[2], gamma = 1),
control = nls.control(maxiter = 100), data = as.data.frame(USMacroG))
summary(fm_ols)
summary(fm_nls)
deviance(fm_ols)
deviance(fm_nls)
vcov(fm_nls)
## Example 9.7
## F test
fm_nls2 < nls(consumption ~ alpha + beta * dpi,
start = list(alpha = coef(fm_ols)[1], beta = coef(fm_ols)[2]),
control = nls.control(maxiter = 100), data = as.data.frame(USMacroG))
anova(fm_nls, fm_nls2)
## Wald test
linearHypothesis(fm_nls, "gamma = 1")
## Example 9.8, Table 9.2
usm < USMacroG[, c("m1", "tbill", "gdp")]
fm_lin < lm(m1 ~ tbill + gdp, data = usm)
fm_log < lm(m1 ~ tbill + gdp, data = log(usm))
## PE auxiliary regressions
aux_lin < lm(m1 ~ tbill + gdp + I(fitted(fm_log)  log(fitted(fm_lin))), data = usm)
aux_log < lm(m1 ~ tbill + gdp + I(fitted(fm_lin)  exp(fitted(fm_log))), data = log(usm))
coeftest(aux_lin)[4,]
coeftest(aux_log)[4,]
## matches results from errata
## With lmtest >= 0.924:
## petest(fm_lin, fm_log)
## Example 12.1
fm_m1 < dynlm(log(m1) ~ log(gdp) + log(cpi), data = USMacroG)
summary(fm_m1)
## Figure 12.1
par(las = 1)
plot(0, 0, type = "n", axes = FALSE,
xlim = c(1950, 2002), ylim = c(0.3, 0.225),
xaxs = "i", yaxs = "i",
xlab = "Quarter", ylab = "", main = "Least Squares Residuals")
box()
axis(1, at = c(1950, 1963, 1976, 1989, 2002))
axis(2, seq(0.3, 0.225, by = 0.075))
grid(4, 7, col = grey(0.6))
abline(0, 0)
lines(residuals(fm_m1), lwd = 2)
## Example 12.3
fm_pc < dynlm(d(inflation) ~ unemp, data = USMacroG)
summary(fm_pc)
## Figure 12.3
plot(residuals(fm_pc))
## natural unemployment rate
coef(fm_pc)[1]/coef(fm_pc)[2]
## autocorrelation
res < residuals(fm_pc)
summary(dynlm(res ~ L(res)))
## Example 12.4
coeftest(fm_m1)
coeftest(fm_m1, vcov = NeweyWest(fm_m1, lag = 5))
summary(fm_m1)$r.squared
dwtest(fm_m1)
as.vector(acf(residuals(fm_m1), plot = FALSE)$acf)[2]
## matches Tab. 12.1 errata and Greene 6e, apart from NeweyWest SE
#################################################
## Cost function of electricity producers 1870 ##
#################################################
## Example 5.6: a generalized CobbDouglas cost function
data("Electricity1970", package = "AER")
fm < lm(log(cost/fuel) ~ log(output) + I(log(output)^2/2) +
log(capital/fuel) + log(labor/fuel), data=Electricity1970[1:123,])
####################################################
## SIC 33: Production for primary metals industry ##
####################################################
## data
data("SIC33", package = "AER")
## Example 6.2
## Translog model
fm_tl < lm(
output ~ labor + capital + I(0.5 * labor^2) + I(0.5 * capital^2) + I(labor * capital),
data = log(SIC33))
## CobbDouglas model
fm_cb < lm(output ~ labor + capital, data = log(SIC33))
## Table 6.2 in Greene (2003)
deviance(fm_tl)
deviance(fm_cb)
summary(fm_tl)
summary(fm_cb)
vcov(fm_tl)
vcov(fm_cb)
## CobbDouglas vs. Translog model
anova(fm_cb, fm_tl)
## hypothesis of constant returns
linearHypothesis(fm_cb, "labor + capital = 1")
###############################
## Cost data for US airlines ##
###############################
## data
data("USAirlines", package = "AER")
## Example 7.2
fm_full < lm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load + year + firm,
data = USAirlines)
fm_time < lm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load + year,
data = USAirlines)
fm_firm < lm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load + firm,
data = USAirlines)
fm_no < lm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load, data = USAirlines)
## full fitted model
coef(fm_full)[1:5]
plot(1970:1984, c(coef(fm_full)[6:19], 0), type = "n",
xlab = "Year", ylab = expression(delta(Year)),
main = "Estimated Year Specific Effects")
grid()
points(1970:1984, c(coef(fm_full)[6:19], 0), pch = 19)
## Table 7.2
anova(fm_full, fm_time)
anova(fm_full, fm_firm)
anova(fm_full, fm_no)
## alternatively, use plm()
library("plm")
usair < pdata.frame(USAirlines, c("firm", "year"))
fm_full2 < plm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load,
data = usair, model = "within", effect = "twoways")
fm_time2 < plm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load,
data = usair, model = "within", effect = "time")
fm_firm2 < plm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load,
data = usair, model = "within", effect = "individual")
fm_no2 < plm(log(cost) ~ log(output) + I(log(output)^2) + log(price) + load,
data = usair, model = "pooling")
pFtest(fm_full2, fm_time2)
pFtest(fm_full2, fm_firm2)
pFtest(fm_full2, fm_no2)
## Example 13.1, Table 13.1
fm_no < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "pooling")
fm_gm < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "between")
fm_firm < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "within")
fm_time < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "within",
effect = "time")
fm_ft < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "within",
effect = "twoways")
summary(fm_no)
summary(fm_gm)
summary(fm_firm)
fixef(fm_firm)
summary(fm_time)
fixef(fm_time)
summary(fm_ft)
fixef(fm_ft, effect = "individual")
fixef(fm_ft, effect = "time")
## Table 13.2
fm_rfirm < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "random")
fm_rft < plm(log(cost) ~ log(output) + log(price) + load, data = usair, model = "random",
effect = "twoways")
summary(fm_rfirm)
summary(fm_rft)
#################################################
## Cost function of electricity producers 1955 ##
#################################################
## Nerlove data
data("Electricity1955", package = "AER")
Electricity < Electricity1955[1:145,]
## Example 7.3
## CobbDouglas cost function
fm_all < lm(log(cost/fuel) ~ log(output) + log(labor/fuel) + log(capital/fuel),
data = Electricity)
summary(fm_all)
## hypothesis of constant returns to scale
linearHypothesis(fm_all, "log(output) = 1")
## Figure 7.4
plot(residuals(fm_all) ~ log(output), data = Electricity)
## scaling seems to be different in Greene (2003) with logQ > 10?
## grouped functions
Electricity$group < with(Electricity, cut(log(output), quantile(log(output), 0:5/5),
include.lowest = TRUE, labels = 1:5))
fm_group < lm(
log(cost/fuel) ~ group/(log(output) + log(labor/fuel) + log(capital/fuel))  1,
data = Electricity)
## Table 7.3 (close, but not quite)
round(rbind(coef(fm_all)[1], matrix(coef(fm_group), nrow = 5)[,1]), digits = 3)
## Table 7.4
## log quadratic cost function
fm_all2 < lm(
log(cost/fuel) ~ log(output) + I(log(output)^2) + log(labor/fuel) + log(capital/fuel),
data = Electricity)
summary(fm_all2)
##########################
## Technological change ##
##########################
## Exercise 7.1
data("TechChange", package = "AER")
fm1 < lm(I(output/technology) ~ log(clr), data = TechChange)
fm2 < lm(I(output/technology) ~ I(1/clr), data = TechChange)
fm3 < lm(log(output/technology) ~ log(clr), data = TechChange)
fm4 < lm(log(output/technology) ~ I(1/clr), data = TechChange)
## Exercise 7.2 (a) and (c)
plot(I(output/technology) ~ clr, data = TechChange)
sctest(I(output/technology) ~ log(clr), data = TechChange,
type = "Chow", point = c(1942, 1))
##################################
## Expenditure and default data ##
##################################
## full data set (F21.4)
data("CreditCard", package = "AER")
## extract data set F9.1
ccard < CreditCard[1:100,]
ccard$income < round(ccard$income, digits = 2)
ccard$expenditure < round(ccard$expenditure, digits = 2)
ccard$age < round(ccard$age + .01)
## suspicious:
CreditCard$age[CreditCard$age < 1]
## the first of these is also in TableF9.1 with 36 instead of 0.5:
ccard$age[79] < 36
## Example 11.1
ccard < ccard[order(ccard$income),]
ccard0 < subset(ccard, expenditure > 0)
cc_ols < lm(expenditure ~ age + owner + income + I(income^2), data = ccard0)
## Figure 11.1
plot(residuals(cc_ols) ~ income, data = ccard0, pch = 19)
## Table 11.1
mean(ccard$age)
prop.table(table(ccard$owner))
mean(ccard$income)
summary(cc_ols)
sqrt(diag(vcovHC(cc_ols, type = "HC0")))
sqrt(diag(vcovHC(cc_ols, type = "HC2")))
sqrt(diag(vcovHC(cc_ols, type = "HC1")))
bptest(cc_ols, ~ (age + income + I(income^2) + owner)^2 + I(age^2) + I(income^4),
data = ccard0)
gqtest(cc_ols)
bptest(cc_ols, ~ income + I(income^2), data = ccard0, studentize = FALSE)
bptest(cc_ols, ~ income + I(income^2), data = ccard0)
## Table 11.2, WLS and FGLS
cc_wls1 < lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income,
data = ccard0)
cc_wls2 < lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income^2,
data = ccard0)
auxreg1 < lm(log(residuals(cc_ols)^2) ~ log(income), data = ccard0)
cc_fgls1 < lm(expenditure ~ age + owner + income + I(income^2),
weights = 1/exp(fitted(auxreg1)), data = ccard0)
auxreg2 < lm(log(residuals(cc_ols)^2) ~ income + I(income^2), data = ccard0)
cc_fgls2 < lm(expenditure ~ age + owner + income + I(income^2),
weights = 1/exp(fitted(auxreg2)), data = ccard0)
alphai < coef(lm(log(residuals(cc_ols)^2) ~ log(income), data = ccard0))[2]
alpha < 0
while(abs((alphai  alpha)/alpha) > 1e7) {
alpha < alphai
cc_fgls3 < lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income^alpha,
data = ccard0)
alphai < coef(lm(log(residuals(cc_fgls3)^2) ~ log(income), data = ccard0))[2]
}
alpha ## 1.7623 for Greene
cc_fgls3 < lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income^alpha,
data = ccard0)
llik < function(alpha)
logLik(lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income^alpha,
data = ccard0))
plot(0:100/20, sapply(0:100/20, llik), type = "l", xlab = "alpha", ylab = "logLik")
alpha < optimize(llik, interval = c(0, 5))$minimum
cc_fgls4 < lm(expenditure ~ age + owner + income + I(income^2), weights = 1/income^alpha,
data = ccard0)
## Table 11.2
cc_fit < list(cc_ols, cc_wls1, cc_wls2, cc_fgls2, cc_fgls1, cc_fgls3, cc_fgls4)
t(sapply(cc_fit, coef))
t(sapply(cc_fit, function(obj) sqrt(diag(vcov(obj)))))
## Table 21.21, Poisson and logit models
cc_pois < glm(reports ~ age + income + expenditure, data = CreditCard, family = poisson)
summary(cc_pois)
logLik(cc_pois)
xhat < colMeans(CreditCard[, c("age", "income", "expenditure")])
xhat < as.data.frame(t(xhat))
lambda < predict(cc_pois, newdata = xhat, type = "response")
ppois(0, lambda) * nrow(CreditCard)
cc_logit < glm(factor(reports > 0) ~ age + income + owner,
data = CreditCard, family = binomial)
summary(cc_logit)
logLik(cc_logit)
## Table 21.21, "split population model"
library("pscl")
cc_zip < zeroinfl(reports ~ age + income + expenditure  age + income + owner,
data = CreditCard)
summary(cc_zip)
sum(predict(cc_zip, type = "prob")[,1])
###################################
## DEM/GBP exchange rate returns ##
###################################
## data as given by Greene (2003)
data("MarkPound")
mp < round(MarkPound, digits = 6)
## Figure 11.3 in Greene (2003)
plot(mp)
## Example 11.8 in Greene (2003), Table 11.5
library("tseries")
mp_garch < garch(mp, grad = "numerical")
summary(mp_garch)
logLik(mp_garch)
## Greene (2003) also includes a constant and uses different
## standard errors (presumably computed from Hessian), here
## OPG standard errors are used. garchFit() in "fGarch"
## implements the approach used by Greene (2003).
## compare Errata to Greene (2003)
library("dynlm")
res < residuals(dynlm(mp ~ 1))^2
mp_ols < dynlm(res ~ L(res, 1:10))
summary(mp_ols)
logLik(mp_ols)
summary(mp_ols)$r.squared * length(residuals(mp_ols))
################################
## Grunfeld's investment data ##
################################
## subset of data with mistakes
data("Grunfeld", package = "AER")
ggr < subset(Grunfeld, firm %in% c("General Motors", "US Steel",
"General Electric", "Chrysler", "Westinghouse"))
ggr[c(26, 38), 1] < c(261.6, 645.2)
ggr[32, 3] < 232.6
## Tab. 13.4
fm_pool < lm(invest ~ value + capital, data = ggr)
summary(fm_pool)
logLik(fm_pool)
## White correction
sqrt(diag(vcovHC(fm_pool, type = "HC0")))
## heteroskedastic FGLS
auxreg1 < lm(residuals(fm_pool)^2 ~ firm  1, data = ggr)
fm_pfgls < lm(invest ~ value + capital, data = ggr, weights = 1/fitted(auxreg1))
summary(fm_pfgls)
## ML, computed as iterated FGLS
sigmasi < fitted(lm(residuals(fm_pfgls)^2 ~ firm  1 , data = ggr))
sigmas < 0
while(any(abs((sigmasi  sigmas)/sigmas) > 1e7)) {
sigmas < sigmasi
fm_pfgls_i < lm(invest ~ value + capital, data = ggr, weights = 1/sigmas)
sigmasi < fitted(lm(residuals(fm_pfgls_i)^2 ~ firm  1 , data = ggr))
}
fm_pmlh < lm(invest ~ value + capital, data = ggr, weights = 1/sigmas)
summary(fm_pmlh)
logLik(fm_pmlh)
## Tab. 13.5
auxreg2 < lm(residuals(fm_pfgls)^2 ~ firm  1, data = ggr)
auxreg3 < lm(residuals(fm_pmlh)^2 ~ firm  1, data = ggr)
rbind(
"OLS" = coef(auxreg1),
"Het. FGLS" = coef(auxreg2),
"Het. ML" = coef(auxreg3))
## Chapter 14: explicitly treat as panel data
library("plm")
pggr < pdata.frame(ggr, c("firm", "year"))
## Tab. 14.1
library("systemfit")
fm_sur < systemfit(invest ~ value + capital, data = pggr, method = "SUR",
methodResidCov = "noDfCor")
fm_psur < systemfit(invest ~ value + capital, data = pggr, method = "SUR", pooled = TRUE,
methodResidCov = "noDfCor", residCovWeighted = TRUE)
## Tab 14.2
fm_ols < systemfit(invest ~ value + capital, data = pggr, method = "OLS")
fm_pols < systemfit(invest ~ value + capital, data = pggr, method = "OLS", pooled = TRUE)
## or "by hand"
fm_gm < lm(invest ~ value + capital, data = ggr, subset = firm == "General Motors")
mean(residuals(fm_gm)^2) ## Greene uses MLE
## etc.
fm_pool < lm(invest ~ value + capital, data = ggr)
## Tab. 14.3 (and Tab 13.4, crosssection ML)
## (not run due to long computation time)
## Not run:
fm_ml < systemfit(invest ~ value + capital, data = pggr, method = "SUR",
methodResidCov = "noDfCor", maxiter = 1000, tol = 1e10)
fm_pml < systemfit(invest ~ value + capital, data = pggr, method = "SUR", pooled = TRUE,
methodResidCov = "noDfCor", residCovWeighted = TRUE, maxiter = 1000, tol = 1e10)
## End(Not run)
## Fig. 14.2
plot(unlist(residuals(fm_sur)[, c(3, 1, 2, 5, 4)]),
type = "l", ylab = "SUR residuals", ylim = c(400, 400), xaxs = "i", yaxs = "i")
abline(v = c(20,40,60,80), h = 0, lty = 2)
###################
## Klein model I ##
###################
## data
data("KleinI", package = "AER")
## Tab. 15.3, OLS
library("dynlm")
fm_cons < dynlm(consumption ~ cprofits + L(cprofits) + I(pwage + gwage), data = KleinI)
fm_inv < dynlm(invest ~ cprofits + L(cprofits) + capital, data = KleinI)
fm_pwage < dynlm(pwage ~ gnp + L(gnp) + I(time(gnp)  1931), data = KleinI)
summary(fm_cons)
summary(fm_inv)
summary(fm_pwage)
## Notes:
##  capital refers to previous year's capital stock > no lag needed!
##  trend used by Greene (p. 381, "time trend measured as years from 1931")
## Maddala uses years since 1919
## preparation of data frame for systemfit
KI < ts.intersect(KleinI, lag(KleinI, k = 1), dframe = TRUE)
names(KI) < c(colnames(KleinI), paste("L", colnames(KleinI), sep = ""))
KI$trend < (1921:1941)  1931
library("systemfit")
system < list(
consumption = consumption ~ cprofits + Lcprofits + I(pwage + gwage),
invest = invest ~ cprofits + Lcprofits + capital,
pwage = pwage ~ gnp + Lgnp + trend)
## Tab. 15.3 OLS again
fm_ols < systemfit(system, method = "OLS", data = KI)
summary(fm_ols)
## Tab. 15.3 2SLS, 3SLS, I3SLS
inst < ~ Lcprofits + capital + Lgnp + gexpenditure + taxes + trend + gwage
fm_2sls < systemfit(system, method = "2SLS", inst = inst,
methodResidCov = "noDfCor", data = KI)
fm_3sls < systemfit(system, method = "3SLS", inst = inst,
methodResidCov = "noDfCor", data = KI)
fm_i3sls < systemfit(system, method = "3SLS", inst = inst,
methodResidCov = "noDfCor", maxiter = 100, data = KI)
############################################
## Transportation equipment manufacturing ##
############################################
## data
data("Equipment", package = "AER")
## Example 17.5
## CobbDouglas
fm_cd < lm(log(valueadded/firms) ~ log(capital/firms) + log(labor/firms),
data = Equipment)
## generalized CobbDouglas with ZellnerRevankar trafo
GCobbDouglas < function(theta)
lm(I(log(valueadded/firms) + theta * valueadded/firms) ~ log(capital/firms) + log(labor/firms),
data = Equipment)
## yields classical CobbDouglas for theta = 0
fm_cd0 < GCobbDouglas(0)
## ML estimation of generalized model
## choose starting values from classical model
par0 < as.vector(c(coef(fm_cd0), 0, mean(residuals(fm_cd0)^2)))
## set up likelihood function
nlogL < function(par) {
beta < par[1:3]
theta < par[4]
sigma2 < par[5]
Y < with(Equipment, valueadded/firms)
K < with(Equipment, capital/firms)
L < with(Equipment, labor/firms)
rhs < beta[1] + beta[2] * log(K) + beta[3] * log(L)
lhs < log(Y) + theta * Y
rval < sum(log(1 + theta * Y)  log(Y) +
dnorm(lhs, mean = rhs, sd = sqrt(sigma2), log = TRUE))
return(rval)
}
## optimization
opt < optim(par0, nlogL, hessian = TRUE)
## Table 17.2
opt$par
sqrt(diag(solve(opt$hessian)))[1:4]
opt$value
## refit ML model
fm_ml < GCobbDouglas(opt$par[4])
deviance(fm_ml)
sqrt(diag(vcov(fm_ml)))
## fit NLS model
rss < function(theta) deviance(GCobbDouglas(theta))
optim(0, rss)
opt2 < optimize(rss, c(1, 1))
fm_nls < GCobbDouglas(opt2$minimum)
nlogL(c(coef(fm_nls), opt2$minimum, mean(residuals(fm_nls)^2)))
############################
## Municipal expenditures ##
############################
## Table 18.2
data("Municipalities", package = "AER")
summary(Municipalities)
###########################
## Program effectiveness ##
###########################
## Table 21.1, col. "Probit"
data("ProgramEffectiveness", package = "AER")
fm_probit < glm(grade ~ average + testscore + participation,
data = ProgramEffectiveness, family = binomial(link = "probit"))
summary(fm_probit)
####################################
## Labor force participation data ##
####################################
## data and transformations
data("PSID1976", package = "AER")
PSID1976$kids < with(PSID1976, factor((youngkids + oldkids) > 0,
levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$nwincome < with(PSID1976, (fincome  hours * wage)/1000)
## Example 4.1, Table 4.2
## (reproduced in Example 7.1, Table 7.1)
gr_lm < lm(log(hours * wage) ~ age + I(age^2) + education + kids,
data = PSID1976, subset = participation == "yes")
summary(gr_lm)
vcov(gr_lm)
## Example 4.5
summary(gr_lm)
## or equivalently
gr_lm1 < lm(log(hours * wage) ~ 1, data = PSID1976, subset = participation == "yes")
anova(gr_lm1, gr_lm)
## Example 21.4, p. 681, and Tab. 21.3, p. 682
gr_probit1 < glm(participation ~ age + I(age^2) + I(fincome/10000) + education + kids,
data = PSID1976, family = binomial(link = "probit") )
gr_probit2 < glm(participation ~ age + I(age^2) + I(fincome/10000) + education,
data = PSID1976, family = binomial(link = "probit"))
gr_probit3 < glm(participation ~ kids/(age + I(age^2) + I(fincome/10000) + education),
data = PSID1976, family = binomial(link = "probit"))
## LR test of all coefficients
lrtest(gr_probit1)
## Chowtype test
lrtest(gr_probit2, gr_probit3)
## equivalently:
anova(gr_probit2, gr_probit3, test = "Chisq")
## Table 21.3
summary(gr_probit1)
## Example 22.8, Table 22.7, p. 786
library("sampleSelection")
gr_2step < selection(participation ~ age + I(age^2) + fincome + education + kids,
wage ~ experience + I(experience^2) + education + city,
data = PSID1976, method = "2step")
gr_ml < selection(participation ~ age + I(age^2) + fincome + education + kids,
wage ~ experience + I(experience^2) + education + city,
data = PSID1976, method = "ml")
gr_ols < lm(wage ~ experience + I(experience^2) + education + city,
data = PSID1976, subset = participation == "yes")
## NOTE: ML estimates agree with Greene, 5e errata.
## Standard errors are based on the Hessian (here), while Greene has BHHH/OPG.
####################
## Ship accidents ##
####################
## subset data
data("ShipAccidents", package = "AER")
sa < subset(ShipAccidents, service > 0)
## Table 21.20
sa_full < glm(incidents ~ type + construction + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_full)
sa_notype < glm(incidents ~ construction + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_notype)
sa_noperiod < glm(incidents ~ type + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_noperiod)
## model comparison
anova(sa_full, sa_notype, test = "Chisq")
anova(sa_full, sa_noperiod, test = "Chisq")
## test for overdispersion
dispersiontest(sa_full)
dispersiontest(sa_full, trafo = 2)
######################################
## Fair's extramarital affairs data ##
######################################
## data
data("Affairs", package = "AER")
## Tab. 22.3 and 22.4
fm_ols < lm(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
fm_probit < glm(I(affairs > 0) ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs, family = binomial(link = "probit"))
fm_tobit < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
fm_tobit2 < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
right = 4, data = Affairs)
fm_pois < glm(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs, family = poisson)
library("MASS")
fm_nb < glm.nb(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
## Tab. 22.6
library("pscl")
fm_zip < zeroinfl(affairs ~ age + yearsmarried + religiousness + occupation + rating  age +
yearsmarried + religiousness + occupation + rating, data = Affairs)
######################
## Strike durations ##
######################
## data and package
data("StrikeDuration", package = "AER")
library("MASS")
## Table 22.10
fit_exp < fitdistr(StrikeDuration$duration, "exponential")
fit_wei < fitdistr(StrikeDuration$duration, "weibull")
fit_wei$estimate[2]^(1)
fit_lnorm < fitdistr(StrikeDuration$duration, "lognormal")
1/fit_lnorm$estimate[2]
exp(fit_lnorm$estimate[1])
## Weibull and lognormal distribution have
## different parameterizations, see Greene p. 794
## Example 22.10
library("survival")
fm_wei < survreg(Surv(duration) ~ uoutput, dist = "weibull", data = StrikeDuration)
summary(fm_wei)
Growth regression data as provided by Durlauf & Johnson (1995).
data("GrowthDJ")
A data frame containing 121 observations on 10 variables.
factor. Is the country an oilproducing country?
factor. Does the country have better quality data?
factor. Is the country a member of the OECD?
Per capita GDP in 1960.
Per capita GDP in 1985.
Average growth rate of per capita GDP from 1960 to 1985 (in percent).
Average growth rate of workingage population 1960 to 1985 (in percent).
Average ratio of investment (including Government Investment) to GDP from 1960 to 1985 (in percent).
Average fraction of workingage population enrolled in secondary school from 1960 to 1985 (in percent).
Fraction of the population over 15 years old that is able to read and write in 1960 (in percent).
The data are derived from the Penn World Table 4.0 and are given in Mankiw, Romer and Weil (1992),
except literacy60
that is from the World Bank's World Development Report.
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1995v10.4/durlaufjohnson/
Durlauf, S.N., and Johnson, P.A. (1995). Multiple Regimes and CrossCountry Growth Behavior. Journal of Applied Econometrics, 10, 365–384.
Koenker, R., and Zeileis, A. (2009). On Reproducible Econometric Research. Journal of Applied Econometrics, 24(5), 833–847.
Mankiw, N.G, Romer, D., and Weil, D.N. (1992). A Contribution to the Empirics of Economic Growth. Quarterly Journal of Economics, 107, 407–437.
Masanjala, W.H., and Papageorgiou, C. (2004). The Solow Model with CES Technology: Nonlinearities and Parameter Heterogeneity. Journal of Applied Econometrics, 19, 171–201.
## data for nonoilproducing countries
data("GrowthDJ")
dj < subset(GrowthDJ, oil == "no")
## Different scalings have been used by different authors,
## different types of standard errors, etc.,
## see Koenker & Zeileis (2009) for an overview
## Durlauf & Johnson (1995), Table II
mrw_model < I(log(gdp85)  log(gdp60)) ~ log(gdp60) +
log(invest/100) + log(popgrowth/100 + 0.05) + log(school/100)
dj_mrw < lm(mrw_model, data = dj)
coeftest(dj_mrw)
dj_model < I(log(gdp85)  log(gdp60)) ~ log(gdp60) +
log(invest) + log(popgrowth/100 + 0.05) + log(school)
dj_sub1 < lm(dj_model, data = dj, subset = gdp60 < 1800 & literacy60 < 50)
coeftest(dj_sub1, vcov = sandwich)
dj_sub2 < lm(dj_model, data = dj, subset = gdp60 >= 1800 & literacy60 >= 50)
coeftest(dj_sub2, vcov = sandwich)
Data on average growth rates over 1960–1995 for 65 countries, along with variables that are potentially related to growth.
data("GrowthSW")
A data frame containing 65 observations on 6 variables.
average annual percentage growth of real GDP from 1960 to 1995.
value of GDP per capita in 1960, converted to 1960 US dollars.
average share of trade in the economy from 1960 to 1995, measured as the sum of exports (X) plus imports (M), divided by GDP; that is, the average value of (X + M)/GDP from 1960 to 1995.
average number of years of schooling of adult residents in that country in 1960.
average annual number of revolutions, insurrections (successful or not) and coup d'etats in that country from 1960 to 1995.
average annual number of political assassinations in that country from 1960 to 1995 (in per million population).
Online complements to Stock and Watson (2007).
Beck, T., Levine, R., and Loayza, N. (2000). Finance and the Sources of Growth. Journal of Financial Economics, 58, 261–300.
Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
StockWatson2007
, GrowthDJ
, OECDGrowth
data("GrowthSW")
summary(GrowthSW)
Panel data on 11 large US manufacturing firms over 20 years, for the years 1935–1954.
data("Grunfeld")
A data frame containing 20 annual observations on 3 variables for 11 firms.
Gross investment, defined as additions to plant and equipment plus maintenance and repairs in millions of dollars deflated by the implicit price deflator of producers' durable equipment (base 1947).
Market value of the firm, defined as the price of common shares at December 31 (or, for WH, IBM and CH, the average price of December 31 and January 31 of the following year) times the number of common shares outstanding plus price of preferred shares at December 31 (or average price of December 31 and January 31 of the following year) times number of preferred shares plus total book value of debt at December 31 in millions of dollars deflated by the implicit GNP price deflator (base 1947).
Stock of plant and equipment, defined as the accumulated sum of net additions to plant and equipment deflated by the implicit price deflator for producers' durable equipment (base 1947) minus depreciation allowance deflated by depreciation expense deflator (10 years moving average of wholesale price index of metals and metal products, base 1947).
factor with 11 levels: "General Motors"
, "US Steel"
,
"General Electric"
, "Chrysler"
, "Atlantic Refining"
, "IBM"
,
"Union Oil"
, "Westinghouse"
, "Goodyear"
, "Diamond Match"
,
"American Steel"
.
Year.
This is a popular data set for teaching purposes. Unfortunately, there exist several
different versions (see Kleiber and Zeileis, 2010, for a detailed discussion).
In particular, the version provided by Greene (2003) has a couple of errors
for "US Steel"
(firm 2):
investment in 1940 is 261.6 (instead of the correct 361.6),
investment in 1952 is 645.2 (instead of the correct 645.5),
capital in 1946 is 132.6 (instead of the correct 232.6).
Here, we provide the original data from Grunfeld (1958). The data for the first 10 firms are identical to those of Baltagi (2002) or Baltagi (2005), now also used by Greene (2008).
The data are taken from Grunfeld (1958, Appendix, Tables 2–9 and 11–13).
Baltagi, B.H. (2002). Econometrics, 3rd ed., Berlin: SpringerVerlag.
Baltagi, B.H. (2005). Econometric Analysis of Panel Data, 3rd ed. Chichester, UK: John Wiley.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Greene, W.H. (2008). Econometric Analysis, 6th edition. Upper Saddle River, NJ: Prentice Hall.
Grunfeld, Y. (1958). The Determinants of Corporate Investment. Unpublished Ph.D. Dissertation, University of Chicago.
Kleiber, C., and Zeileis, A. (2010). “The Grunfeld Data at 50.” German Economic Review, 11(4), 404–417. doi:10.1111/j.14680475.2010.00513.x
data("Grunfeld", package = "AER")
## Greene (2003)
## subset of data with mistakes
ggr < subset(Grunfeld, firm %in% c("General Motors", "US Steel",
"General Electric", "Chrysler", "Westinghouse"))
ggr[c(26, 38), 1] < c(261.6, 645.2)
ggr[32, 3] < 232.6
## Tab. 14.2, col. "GM"
fm_gm < lm(invest ~ value + capital, data = ggr, subset = firm == "General Motors")
mean(residuals(fm_gm)^2) ## Greene uses MLE
## Tab. 14.2, col. "Pooled"
fm_pool < lm(invest ~ value + capital, data = ggr)
## equivalently
library("plm")
pggr < pdata.frame(ggr, c("firm", "year"))
library("systemfit")
fm_ols < systemfit(invest ~ value + capital, data = pggr, method = "OLS")
fm_pols < systemfit(invest ~ value + capital, data = pggr, method = "OLS",
pooled = TRUE)
## Tab. 14.1
fm_sur < systemfit(invest ~ value + capital, data = pggr, method = "SUR",
methodResidCov = "noDfCor")
fm_psur < systemfit(invest ~ value + capital, data = pggr, method = "SUR", pooled = TRUE,
methodResidCov = "noDfCor", residCovWeighted = TRUE)
## Further examples:
## help("Greene2003")
## Panel models
library("plm")
pg < pdata.frame(subset(Grunfeld, firm != "American Steel"), c("firm", "year"))
fm_fe < plm(invest ~ value + capital, model = "within", data = pg)
summary(fm_fe)
coeftest(fm_fe, vcov = vcovHC)
fm_reswar < plm(invest ~ value + capital, data = pg,
model = "random", random.method = "swar")
summary(fm_reswar)
## testing for random effects
fm_ols < plm(invest ~ value + capital, data = pg, model = "pooling")
plmtest(fm_ols, type = "bp")
plmtest(fm_ols, type = "honda")
## Random effects models
fm_ream < plm(invest ~ value + capital, data = pg, model = "random",
random.method = "amemiya")
fm_rewh < plm(invest ~ value + capital, data = pg, model = "random",
random.method = "walhus")
fm_rener < plm(invest ~ value + capital, data = pg, model = "random",
random.method = "nerlove")
## Baltagi (2005), Tab. 2.1
rbind(
"OLS(pooled)" = coef(fm_ols),
"FE" = c(NA, coef(fm_fe)),
"RESwAr" = coef(fm_reswar),
"REAmemiya" = coef(fm_ream),
"REWalHus" = coef(fm_rewh),
"RENerlove" = coef(fm_rener))
## Hausman test
phtest(fm_fe, fm_reswar)
## Further examples:
## help("Baltagi2002")
## help("Greene2003")
Crosssection data for 675 14year old children born between 1980 and 1988. The sample is taken from the German SocioEconomic Panel (GSOEP) for the years 1994 to 2002 to investigate the determinants of secondary school choice.
data("GSOEP9402")
A data frame containing 675 observations on 12 variables.
factor. Child's secondary school level.
Year of child's birth.
factor indicating child's gender.
Total number of kids living in household.
Birth order.
Household income.
Household size
factor indicating German federal state.
factor indicating mother's marital status.
Mother's educational level in years.
factor indicating mother's employment level: fulltime, parttime, or not working.
Year of GSOEP wave.
This sample from the German SocioEconomic Panel (GSOEP) for the years between 1994 and 2002 has been selected by Winkelmann and Boes (2009) to investigate the determinants of secondary school choice.
In the German schooling system, students are separated relatively early into
different school types, depending on their ability as perceived by the teachers
after four years of primary school. After that, around the age of ten, students are placed
into one of three types of secondary school: "Hauptschule"
(lower secondary school), "Realschule"
(middle secondary school), or
"Gymnasium"
(upper secondary school). Only a degree from the latter
type of school (called Abitur) provides direct access to universities.
A frequent criticism of this system is that the tracking takes place too early, and that it cements inequalities in education across generations. Although the secondary school choice is based on the teachers' recommendations, it is typically also influenced by the parents; both indirectly through their own educational level and directly through influence on the teachers.
Online complements to Winkelmann and Boes (2009).
Winkelmann, R., and Boes, S. (2009). Analysis of Microdata, 2nd ed. Berlin and Heidelberg: SpringerVerlag.
## data
data("GSOEP9402", package = "AER")
## some convenience data transformations
gsoep < GSOEP9402
gsoep$year2 < factor(gsoep$year)
## visualization
plot(school ~ meducation, data = gsoep, breaks = c(7, 9, 10.5, 11.5, 12.5, 15, 18))
## Chapter 5, Table 5.1
library("nnet")
gsoep_mnl < multinom(
school ~ meducation + memployment + log(income) + log(size) + parity + year2,
data = gsoep)
coeftest(gsoep_mnl)[c(1:6, 1:6 + 14),]
## alternatively
library("mlogit")
gsoep_mnl2 < mlogit(
school ~ 0  meducation + memployment + log(income) + log(size) + parity + year2,
data = gsoep, shape = "wide", reflevel = "Hauptschule")
coeftest(gsoep_mnl2)[1:12,]
## Table 5.2
library("effects")
gsoep_eff < effect("meducation", gsoep_mnl,
xlevels = list(meducation = sort(unique(gsoep$meducation))))
gsoep_eff$prob
plot(gsoep_eff, confint = FALSE)
## omit year
gsoep_mnl1 < multinom(
school ~ meducation + memployment + log(income) + log(size) + parity,
data = gsoep)
lrtest(gsoep_mnl, gsoep_mnl1)
## Chapter 6
## Table 6.1
library("MASS")
gsoep_pop < polr(
school ~ meducation + I(memployment != "none") + log(income) + log(size) + parity + year2,
data = gsoep, method = "probit", Hess = TRUE)
gsoep_pol < polr(
school ~ meducation + I(memployment != "none") + log(income) + log(size) + parity + year2,
data = gsoep, Hess = TRUE)
## compare polr and multinom via AIC
gsoep_pol1 < polr(
school ~ meducation + memployment + log(income) + log(size) + parity,
data = gsoep, Hess = TRUE)
AIC(gsoep_pol1, gsoep_mnl)
## effects
eff_pol1 < allEffects(gsoep_pol1)
plot(eff_pol1, ask = FALSE, confint = FALSE)
## More examples can be found in:
## help("WinkelmannBoes2009")
Crosssection data for 9120 women taken from every fourth year of the US General Social Survey between 1974 and 2002 to investigate the determinants of fertility.
data("GSS7402")
A data frame containing 9120 observations on 10 variables.
Number of children. This is coded as a numerical variable
but note that the value 8
actually encompasses 8 or more children.
Age of respondent.
Highest year of school completed.
GSS year for respondent.
Number of brothers and sisters.
Woman's age at birth of first child.
factor indicating ethnicity.
Is the individual Caucasian ("cauc"
) or not ("other"
)?
factor. Did the respondent live in a city (with population > 50,000) at age 16?
factor. Was the income below average at age 16?
factor. Was the respondent (or both parents) born abroad?
This subset of the US General Social Survey (GSS) for every fourth year between 1974 and 2002 has been selected by Winkelmann and Boes (2009) to investigate the determinants of fertility. To do so they typically restrict their empirical analysis to the women for which the completed fertility is (assumed to be) known, employing the common cutoff of 40 years. Both, the average number of children borne to a woman and the probability of being childless, are of interest.
Online complements to Winkelmann and Boes (2009).
Winkelmann, R., and Boes, S. (2009). Analysis of Microdata, 2nd ed. Berlin and Heidelberg: SpringerVerlag.
## completed fertility subset
data("GSS7402", package = "AER")
gss40 < subset(GSS7402, age >= 40)
## Chapter 1
## exploratory statistics
gss_kids < prop.table(table(gss40$kids))
names(gss_kids)[9] < "8+"
gss_zoo < as.matrix(with(gss40, cbind(
tapply(kids, year, mean),
tapply(kids, year, function(x) mean(x <= 0)),
tapply(education, year, mean))))
colnames(gss_zoo) < c("Number of children",
"Proportion childless", "Years of schooling")
gss_zoo < zoo(gss_zoo, sort(unique(gss40$year)))
## visualizations instead of tables
barplot(gss_kids,
xlab = "Number of children ever borne to women (age 40+)",
ylab = "Relative frequencies")
library("lattice")
trellis.par.set(theme = canonical.theme(color = FALSE))
print(xyplot(gss_zoo[,3:1], type = "b", xlab = "Year"))
## Chapter 3, Example 3.14
## Table 3.1
gss40$nokids < factor(gss40$kids <= 0, levels = c(FALSE, TRUE), labels = c("no", "yes"))
gss40$trend < gss40$year  1974
nokids_p1 < glm(nokids ~ 1, data = gss40, family = binomial(link = "probit"))
nokids_p2 < glm(nokids ~ trend, data = gss40, family = binomial(link = "probit"))
nokids_p3 < glm(nokids ~ trend + education + ethnicity + siblings,
data = gss40, family = binomial(link = "probit"))
lrtest(nokids_p1, nokids_p2, nokids_p3)
## Chapter 4, Figure 4.4
library("effects")
nokids_p3_ef < effect("education", nokids_p3, xlevels = list(education = 0:20))
plot(nokids_p3_ef, rescale.axis = FALSE, ylim = c(0, 0.3))
## Chapter 8, Example 8.11
kids_pois < glm(kids ~ education + trend + ethnicity + immigrant + lowincome16 + city16,
data = gss40, family = poisson)
library("MASS")
kids_nb < glm.nb(kids ~ education + trend + ethnicity + immigrant + lowincome16 + city16,
data = gss40)
lrtest(kids_pois, kids_nb)
## More examples can be found in:
## help("WinkelmannBoes2009")
Guns is a balanced panel of data on 50 US states, plus the District of Columbia (for a total of 51 states), by year for 1977–1999.
data("Guns")
A data frame containing 1,173 observations on 13 variables.
factor indicating state.
factor indicating year.
violent crime rate (incidents per 100,000 members of the population).
murder rate (incidents per 100,000).
robbery rate (incidents per 100,000).
incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents; value for the previous year).
percent of state population that is AfricanAmerican, ages 10 to 64.
percent of state population that is Caucasian, ages 10 to 64.
percent of state population that is male, ages 10 to 29.
state population, in millions of people.
real per capita personal income in the state (US dollars).
population per square mile of land area, divided by 1,000.
factor. Does the state have a shall carry law in effect in that year?
Each observation is a given state in a given year. There are a total of 51 states times 23 years = 1,173 observations.
Online complements to Stock and Watson (2007).
Ayres, I., and Donohue, J.J. (2003). Shooting Down the ‘More Guns Less Crime’ Hypothesis. Stanford Law Review, 55, 1193–1312.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## data
data("Guns")
## visualization
library("lattice")
xyplot(log(violent) ~ as.numeric(as.character(year))  state, data = Guns, type = "l")
## Stock & Watson (2007), Empirical Exercise 10.1, pp. 376377
fm1 < lm(log(violent) ~ law, data = Guns)
coeftest(fm1, vcov = sandwich)
fm2 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male, data = Guns)
coeftest(fm2, vcov = sandwich)
fm3 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male + state, data = Guns)
printCoefmat(coeftest(fm3, vcov = sandwich)[1:9,])
fm4 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male + state + year, data = Guns)
printCoefmat(coeftest(fm4, vcov = sandwich)[1:9,])
Crosssection data originating from the Medical Expenditure Panel Survey survey conducted in 1996.
data("HealthInsurance")
A data frame containing 8,802 observations on 11 variables.
factor. Is the selfreported health status “healthy”?.
age in years.
factor. Is there any limitation?
factor indicating gender.
factor. Does the individual have a health insurance?
factor. Is the individual married?
factor. Is the individual selfemployed?
family size.
factor indicating region.
factor indicating ethnicity: AfricanAmerican, Caucasian, other.
factor indicating highest degree attained: no degree, GED (high school equivalent), high school, bachelor, master, PhD, other.
This is a subset of the data used in Perry and Rosen (2004).
Online complements to Stock and Watson (2007).
Perry, C. and Rosen, H.S. (2004). “The SelfEmployed are Less Likely than WageEarners to Have Health Insurance. So What?” in HoltzEakin, D. and Rosen, H.S. (eds.), Entrepeneurship and Public Policy, MIT Press.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("HealthInsurance")
summary(HealthInsurance)
prop.table(xtabs(~ selfemp + insurance, data = HealthInsurance), 1)
Crosssection data on the Home Mortgage Disclosure Act (HMDA).
data("HMDA")
A data frame containing 2,380 observations on 14 variables.
Factor. Was the mortgage denied?
Payments to income ratio.
Housing expense to income ratio.
Loan to value ratio.
Factor. Credit history: consumer payments.
Factor. Credit history: mortgage payments.
Factor. Public bad credit record?
1989 Massachusetts unemployment rate in applicant's industry.
Factor. Is the individual selfemployed?
Factor. Was the individual denied mortgage insurance?
Factor. Is the unit a condominium?
Factor. Is the individual AfricanAmerican?
Factor. Is the individual single?
Factor. Does the individual have a highschool diploma?
Only includes variables used by Stock and Watson (2007), some of which had to be generated from the raw data.
Online complements to Stock and Watson (2007).
Munnell, A. H., Tootell, G. M. B., Browne, L. E. and McEneaney, J. (1996). Mortgage Lending in Boston: Interpreting HMDA Data. American Economic Review, 86, 25–53.
Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("HMDA")
## Stock and Watson (2007)
## Equations 11.1, 11.3, 11.7, 11.8 and 11.10, pp. 387395
fm1 < lm(I(as.numeric(deny)  1) ~ pirat, data = HMDA)
fm2 < lm(I(as.numeric(deny)  1) ~ pirat + afam, data = HMDA)
fm3 < glm(deny ~ pirat, family = binomial(link = "probit"), data = HMDA)
fm4 < glm(deny ~ pirat + afam, family = binomial(link = "probit"), data = HMDA)
fm5 < glm(deny ~ pirat + afam, family = binomial(link = "logit"), data = HMDA)
## More examples can be found in:
## help("StockWatson2007")
Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
data("HousePrices")
A data frame containing 546 observations on 12 variables.
Sale price of a house.
Lot size of a property in square feet.
Number of bedrooms.
Number of full bathrooms.
Number of stories excluding basement.
Factor. Does the house have a driveway?
Factor. Does the house have a recreational room?
Factor. Does the house have a full finished basement?
Factor. Does the house use gas for hot water heating?
Factor. Is there central air conditioning?
Number of garage places.
Factor. Is the house located in the preferred neighborhood of the city?
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1996v11.6/anglingencay/
Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.
Verbeek, M. (2004). A Guide to Modern Econometrics, 2nd ed. Chichester, UK: John Wiley.
data("HousePrices")
### Anglin + Gencay (1996), Table II
fm_ag < lm(log(price) ~ driveway + recreation + fullbase + gasheat +
aircon + garage + prefer + log(lotsize) + log(bedrooms) +
log(bathrooms) + log(stories), data = HousePrices)
### Anglin + Gencay (1996), Table III
fm_ag2 < lm(log(price) ~ driveway + recreation + fullbase + gasheat +
aircon + garage + prefer + log(lotsize) + bedrooms +
bathrooms + stories, data = HousePrices)
### Verbeek (2004), Table 3.1
fm < lm(log(price) ~ log(lotsize) + bedrooms + bathrooms + aircon, data = HousePrices)
summary(fm)
### Verbeek (2004), Table 3.2
fm_ext < lm(log(price) ~ .  lotsize + log(lotsize), data = HousePrices)
summary(fm_ext)
### Verbeek (2004), Table 3.3
fm_lin < lm(price ~ . , data = HousePrices)
summary(fm_lin)
Fit instrumentalvariable regression by twostage least squares. This is equivalent to direct instrumentalvariables estimation when the number of instruments is equal to the number of predictors.
ivreg(formula, instruments, data, subset, na.action, weights, offset,
contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, ...)
formula , instruments

formula specification(s) of the regression
relationship and the instruments. Either 
data 
an optional data frame containing the variables in the model.
By default the variables are taken from the environment of the 
subset 
an optional vector specifying a subset of observations to be used in fitting the model. 
na.action 
a function that indicates what should happen when the
data contain 
weights 
an optional vector of weights to be used in the fitting process. 
offset 
an optional offset that can be used to specify an a priori known component to be included during fitting. 
contrasts 
an optional list. See the 
model , x , y

logicals. If 
... 
further arguments passed to 
ivreg
is the highlevel interface to the workhorse function ivreg.fit
,
a set of standard methods (including print
, summary
, vcov
, anova
,
hatvalues
, predict
, terms
, model.matrix
, bread
,
estfun
) is available and described on summary.ivreg
.
Regressors and instruments for ivreg
are most easily specified in a formula
with two parts on the righthand side, e.g., y ~ x1 + x2  z1 + z2 + z3
,
where x1
and x2
are the regressors and z1
,
z2
, and z3
are the instruments. Note that exogenous
regressors have to be included as instruments for themselves. For
example, if there is one exogenous regressor ex
and one endogenous
regressor en
with instrument in
, the appropriate formula
would be y ~ ex + en  ex + in
. Equivalently, this can be specified as
y ~ ex + en  .  en + in
, i.e., by providing an update formula with a
.
in the second part of the formula. The latter is typically more convenient,
if there is a large number of exogenous regressors.
ivreg
returns an object of class "ivreg"
, with the following components:
coefficients 
parameter estimates. 
residuals 
a vector of residuals. 
fitted.values 
a vector of predicted means. 
weights 
either the vector of weights used (if any) or 
offset 
either the offset used (if any) or 
n 
number of observations. 
nobs 
number of observations with nonzero weights. 
rank 
the numeric rank of the fitted linear model. 
df.residual 
residual degrees of freedom for fitted model. 
cov.unscaled 
unscaled covariance matrix for the coefficients. 
sigma 
residual standard error. 
call 
the original function call. 
formula 
the model formula. 
terms 
a list with elements 
levels 
levels of the categorical regressors. 
contrasts 
the contrasts used for categorical regressors. 
model 
the full model frame (if 
y 
the response vector (if 
x 
a list with elements 
Greene, W. H. (1993) Econometric Analysis, 2nd ed., Macmillan.
## data
data("CigarettesSW", package = "AER")
CigarettesSW < transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
tdiff = (taxs  tax)/cpi
)
## model
fm < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + tdiff + I(tax/cpi),
data = CigarettesSW, subset = year == "1995")
summary(fm)
summary(fm, vcov = sandwich, df = Inf, diagnostics = TRUE)
## ANOVA
fm2 < ivreg(log(packs) ~ log(rprice)  tdiff, data = CigarettesSW, subset = year == "1995")
anova(fm, fm2)
Fit instrumentalvariable regression by twostage least squares. This is equivalent to direct instrumentalvariables estimation when the number of instruments is equal to the number of predictors.
ivreg.fit(x, y, z, weights, offset, ...)
x 
regressor matrix. 
y 
vector with dependent variable. 
z 
instruments matrix. 
weights 
an optional vector of weights to be used in the fitting process. 
offset 
an optional offset that can be used to specify an a priori known component to be included during fitting. 
... 
further arguments passed to 
ivreg
is the highlevel interface to the workhorse function ivreg.fit
,
a set of standard methods (including summary
, vcov
, anova
,
hatvalues
, predict
, terms
, model.matrix
, bread
,
estfun
) is available and described on summary.ivreg
.
ivreg.fit
is a convenience interface to lm.fit
(or lm.wfit
)
for first projecting x
onto the image of z
and the running
a regression of y
onto the projected x
.
ivreg.fit
returns an unclassed list with the following components:
coefficients 
parameter estimates. 
residuals 
a vector of residuals. 
fitted.values 
a vector of predicted means. 
weights 
either the vector of weights used (if any) or 
offset 
either the offset used (if any) or 
estfun 
a matrix containing the empirical estimating functions. 
n 
number of observations. 
nobs 
number of observations with nonzero weights. 
rank 
the numeric rank of the fitted linear model. 
df.residual 
residual degrees of freedom for fitted model. 
cov.unscaled 
unscaled covariance matrix for the coefficients. 
sigma 
residual standard error. 
## data
data("CigarettesSW")
CigarettesSW < transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
tdiff = (taxs  tax)/cpi
)
## highlevel interface
fm < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + tdiff + I(tax/cpi),
data = CigarettesSW, subset = year == "1995")
## lowlevel interface
y < fm$y
x < model.matrix(fm, component = "regressors")
z < model.matrix(fm, component = "instruments")
ivreg.fit(x, y, z)$coefficients
Subscriptions to economics journals at US libraries, for the year 2000.
data("Journals")
A data frame containing 180 observations on 10 variables.
Journal title.
factor with publisher name.
factor. Is the journal published by a scholarly society?
Library subscription price.
Number of pages.
Characters per page.
Total number of citations.
Year journal was founded.
Number of library subscriptions.
factor with field description.
Data on 180 economic journals, collected in particular for analyzing journal pricing. See also https://econ.ucsb.edu/~tedb/Journals/jpricing.html for general information on this topic as well as a more uptodate version of the data set. This version is taken from Stock and Watson (2007).
The data as obtained from the online complements for Stock and Watson (2007) contained two journals with title “World Development”. One of these (observation 80) seemed to be an error and was changed to “The World Economy”.
Online complements to Stock and Watson (2007).
Bergstrom, T. (2001). Free Labor for Costly Journals? Journal of Economic Perspectives, 15, 183–198.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## data and transformed variables
data("Journals")
journals < Journals[, c("subs", "price")]
journals$citeprice < Journals$price/Journals$citations
journals$age < 2000  Journals$foundingyear
journals$chars < Journals$charpp*Journals$pages/10^6
## Stock and Watson (2007)
## Figure 8.9 (a) and (b)
plot(subs ~ citeprice, data = journals, pch = 19)
plot(log(subs) ~ log(citeprice), data = journals, pch = 19)
fm1 < lm(log(subs) ~ log(citeprice), data = journals)
abline(fm1)
## Table 8.2, use HC1 for comparability with Stata
fm2 < lm(subs ~ citeprice + age + chars, data = log(journals))
fm3 < lm(subs ~ citeprice + I(citeprice^2) + I(citeprice^3) +
age + I(age * citeprice) + chars, data = log(journals))
fm4 < lm(subs ~ citeprice + age + I(age * citeprice) + chars, data = log(journals))
coeftest(fm1, vcov = vcovHC(fm1, type = "HC1"))
coeftest(fm2, vcov = vcovHC(fm2, type = "HC1"))
coeftest(fm3, vcov = vcovHC(fm3, type = "HC1"))
coeftest(fm4, vcov = vcovHC(fm4, type = "HC1"))
waldtest(fm3, fm4, vcov = vcovHC(fm3, type = "HC1"))
## changes with respect to age
library("strucchange")
## NyblomHansen test
scus < gefp(subs ~ citeprice, data = log(journals), fit = lm, order.by = ~ age)
plot(scus, functional = meanL2BB)
## estimate breakpoint(s)
journals < journals[order(journals$age),]
bp < breakpoints(subs ~ citeprice, data = log(journals), h = 20)
plot(bp)
bp.age < journals$age[bp$breakpoints]
## visualization
plot(subs ~ citeprice, data = log(journals), pch = 19, col = (age > log(bp.age)) + 1)
abline(coef(bp)[1,], col = 1)
abline(coef(bp)[2,], col = 2)
legend("bottomleft", legend = c("age > 18", "age < 18"), lty = 1, col = 2:1, bty = "n")
Klein's Model I for the US economy.
data("KleinI")
An annual multiple time series from 1920 to 1941 with 9 variables.
Consumption.
Corporate profits.
Private wage bill.
Investment.
Previous year's capital stock.
Gross national product.
Government wage bill.
Government spending.
Taxes.
Online complements to Greene (2003). Table F15.1.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Klein, L. (1950). Economic Fluctuations in the United States, 1921–1941. New York: John Wiley.
Maddala, G.S. (1977). Econometrics. New York: McGrawHill.
data("KleinI", package = "AER")
plot(KleinI)
## Greene (2003), Tab. 15.3, OLS
library("dynlm")
fm_cons < dynlm(consumption ~ cprofits + L(cprofits) + I(pwage + gwage), data = KleinI)
fm_inv < dynlm(invest ~ cprofits + L(cprofits) + capital, data = KleinI)
fm_pwage < dynlm(pwage ~ gnp + L(gnp) + I(time(gnp)  1931), data = KleinI)
summary(fm_cons)
summary(fm_inv)
summary(fm_pwage)
## More examples can be found in:
## help("Greene2003")
US macroeconomic time series, 1947–1962.
data("Longley")
An annual multiple time series from 1947 to 1962 with 4 variables.
Number of people employed (in 1000s).
GNP deflator.
Gross national product.
Number of people in the armed forces.
An extended version of this data set, formatted as a "data.frame"
is available as longley
in base R.
Online complements to Greene (2003). Table F4.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Longley, J.W. (1967). An Appraisal of LeastSquares Programs from the Point of View of the User. Journal of the American Statistical Association, 62, 819–841.
data("Longley")
library("dynlm")
## Example 4.6 in Greene (2003)
fm1 < dynlm(employment ~ time(employment) + price + gnp + armedforces,
data = Longley)
fm2 < update(fm1, end = 1961)
cbind(coef(fm2), coef(fm1))
## Figure 4.3 in Greene (2003)
plot(rstandard(fm2), type = "b", ylim = c(3, 3))
abline(h = c(2, 2), lty = 2)
US time series data on prices and cost shares in manufacturing, 1947–1971.
data("ManufactCosts")
An annual multiple time series from 1947 to 1971 with 9 variables.
Cost index.
Capital cost share.
Labor cost share.
Energy cost share.
Materials cost share.
Capital price.
Labor price.
Energy price.
Materials price.
Online complements to Greene (2003).
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Berndt, E. and Wood, D. (1975). Technology, Prices, and the Derived Demand for Energy. Review of Economics and Statistics, 57, 376–384.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
data("ManufactCosts")
plot(ManufactCosts)
A time series of intraday percentage returns of Deutsche mark/US dollar (DEM/USD) exchange rates, consisting of two observations per day from 19921001 through 19930929.
data("MarkDollar")
A univariate time series of 518 returns (exact dates unknown) for the DEM/USD exchange rate.
Journal of Business & Economic Statistics Data Archive.
http://www.amstat.org/publications/jbes/upload/index.cfm?fuseaction=ViewArticles&pub=JBES&issue=962APR
Bollerslev, T., and Ghysels, E. (1996). Periodic Autoregressive Conditional Heteroskedasticity. Journal of Business & Economic Statistics, 14, 139–151.
library("tseries")
data("MarkDollar")
## GARCH(1,1)
fm < garch(MarkDollar, grad = "numerical")
summary(fm)
logLik(fm)
A daily time series of percentage returns of Deutsche mark/British pound (DEM/GBP) exchange rates from 19840103 through 19911231.
data("MarkPound")
A univariate time series of 1974 returns (exact dates unknown) for the DEM/GBP exchange rate.
Greene (2003, Table F11.1) rounded the series to six digits while eight digits are given in
Bollerslev and Ghysels (1996). Here, we provide the original data. Using round
a series can be produced that is virtually identical to that of Greene (2003) (except for
eight observations where a slightly different rounding arithmetic was used).
Journal of Business & Economic Statistics Data Archive.
http://www.amstat.org/publications/jbes/upload/index.cfm?fuseaction=ViewArticles&pub=JBES&issue=962APR
Bollerslev, T., and Ghysels, E. (1996). Periodic Autoregressive Conditional Heteroskedasticity. Journal of Business & Economic Statistics, 14, 139–151.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
## data as given by Greene (2003)
data("MarkPound")
mp < round(MarkPound, digits = 6)
## Figure 11.3 in Greene (2003)
plot(mp)
## Example 11.8 in Greene (2003), Table 11.5
library("tseries")
mp_garch < garch(mp, grad = "numerical")
summary(mp_garch)
logLik(mp_garch)
## Greene (2003) also includes a constant and uses different
## standard errors (presumably computed from Hessian), here
## OPG standard errors are used. garchFit() in "fGarch"
## implements the approach used by Greene (2003).
## compare Errata to Greene (2003)
library("dynlm")
res < residuals(dynlm(mp ~ 1))^2
mp_ols < dynlm(res ~ L(res, 1:10))
summary(mp_ols)
logLik(mp_ols)
summary(mp_ols)$r.squared * length(residuals(mp_ols))
The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in Massachusetts.
data("MASchools")
A data frame containing 220 observations on 16 variables.
character. District code.
character. Municipality name.
Expenditures per pupil, regular.
Expenditures per pupil, special needs.
Expenditures per pupil, bilingual.
Expenditures per pupil, occupational.
Expenditures per pupil, total.
Students per computer.
Special education students (per cent).
Percent qualifying for reducedprice lunch.
Studentteacher ratio.
Per capita income.
4th grade score (math + English + science).
8th grade score (math + English + science).
Average teacher salary.
Percent of English learners.
The Massachusetts data are districtwide averages for public elementary school districts in 1998. The test score is taken from the Massachusetts Comprehensive Assessment System (MCAS) test, administered to all fourth graders in Massachusetts public schools in the spring of 1998. The test is sponsored by the Massachusetts Department of Education and is mandatory for all public schools. The data analyzed here are the overall total score, which is the sum of the scores on the English, Math, and Science portions of the test. Data on the studentteacher ratio, the percent of students receiving a subsidized lunch and on the percent of students still learning english are averages for each elementary school district for the 1997–1998 school year and were obtained from the Massachusetts department of education. Data on average district income are from the 1990 US Census.
Online complements to Stock and Watson (2007).
Stock, J. H. and Watson, M. W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## Massachusetts
data("MASchools")
## compare with California
data("CASchools")
CASchools$stratio < with(CASchools, students/teachers)
CASchools$score4 < with(CASchools, (math + read)/2)
## Stock and Watson, parts of Table 9.1, p. 330
vars < c("score4", "stratio", "english", "lunch", "income")
cbind(
CA_mean = sapply(CASchools[, vars], mean),
CA_sd = sapply(CASchools[, vars], sd),
MA_mean = sapply(MASchools[, vars], mean),
MA_sd = sapply(MASchools[, vars], sd))
## Stock and Watson, Table 9.2, p. 332, col. (1)
fm1 < lm(score4 ~ stratio, data = MASchools)
coeftest(fm1, vcov = vcovHC(fm1, type = "HC1"))
## More examples, notably the entire Table 9.2, can be found in:
## help("StockWatson2007")
Crosssection data originating from the 1986 Medicaid Consumer Survey. The data comprise two groups of Medicaid eligibles at two sites in California (Santa Barbara and Ventura counties): a group enrolled in a managed care demonstration program and a feeforservice comparison group of nonenrollees.
data("Medicaid1986")
A data frame containing 996 observations on 14 variables.
Number of doctor visits.
Length of observation period for ambulatory care (days).
Total number of children in the household.
Age of the respondent.
Annual household income (average of income range in million USD).
The first principal component (divided by 1000) of three healthstatus variables: functional limitations, acute conditions, and chronic conditions.
The second principal component (divided by 1000) of three healthstatus variables: functional limitations, acute conditions, and chronic conditions.
Availability of health services (0 = low access, 1 = high access).
Factor. Is the individual married?
Factor indicating gender.
Factor indicating ethnicity ("cauc"
or "other"
).
Number of years completed in school.
Factor. Is the individual enrolled in a demonstration program?
Factor indicating the managed care demonstration program:
Aid to Families with Dependent Children ("afdc"
) or
noninstitutionalized Supplementary Security Income ("ssi"
).
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1997v12.3/gurmu/
Gurmu, S. (1997). SemiParametric Estimation of Hurdle Regression Models with an Application to Medicaid Utilization. Journal of Applied Econometrics, 12, 225–242.
## data and packages
data("Medicaid1986")
library("MASS")
library("pscl")
## scale regressors
Medicaid1986$age2 < Medicaid1986$age^2 / 100
Medicaid1986$school < Medicaid1986$school / 10
Medicaid1986$income < Medicaid1986$income / 10
## subsets
afdc < subset(Medicaid1986, program == "afdc")[, c(1, 3:4, 15, 5:9, 11:13)]
ssi < subset(Medicaid1986, program == "ssi")[, c(1, 3:4, 15, 5:13)]
## Gurmu (1997):
## Table VI., Poisson and negbin models
afdc_pois < glm(visits ~ ., data = afdc, family = poisson)
summary(afdc_pois)
coeftest(afdc_pois, vcov = sandwich)
afdc_nb < glm.nb(visits ~ ., data = afdc)
ssi_pois < glm(visits ~ ., data = ssi, family = poisson)
ssi_nb < glm.nb(visits ~ ., data = ssi)
## Table VII., Hurdle models (without semiparametric effects)
afdc_hurdle < hurdle(visits ~ .  .  access, data = afdc, dist = "negbin")
ssi_hurdle < hurdle(visits ~ .  .  access, data = ssi, dist = "negbin")
## Table VIII., Observed and expected frequencies
round(cbind(
Observed = table(afdc$visits)[1:8],
Poisson = sapply(0:7, function(x) sum(dpois(x, fitted(afdc_pois)))),
Negbin = sapply(0:7, function(x) sum(dnbinom(x, mu = fitted(afdc_nb), size = afdc_nb$theta))),
Hurdle = colSums(predict(afdc_hurdle, type = "prob")[,1:8])
)/nrow(afdc), digits = 3) * 100
round(cbind(
Observed = table(ssi$visits)[1:8],
Poisson = sapply(0:7, function(x) sum(dpois(x, fitted(ssi_pois)))),
Negbin = sapply(0:7, function(x) sum(dnbinom(x, mu = fitted(ssi_nb), size = ssi_nb$theta))),
Hurdle = colSums(predict(ssi_hurdle, type = "prob")[,1:8])
)/nrow(ssi), digits = 3) * 100
Crosssection data about fixed versus adjustable mortgages for 78 households.
data("Mortgage")
A data frame containing 78 observations on 16 variables.
Factor with levels "fixed"
and "adjustable"
.
Age of the borrower.
Years of schooling for the borrower.
Net worth of the borrower.
Fixed interest rate.
Ratio of points paid on adjustable to fixed rate mortgages.
Ratio of maturities on adjustable to fixed rate mortgages.
Years at the present address.
Factor. Is the borrower married?
Factor. Is the borrower a firsttime home buyer?
Factor. Is the borrower selfemployed?
The difference between the 10year treasury rate less the 1year treasury rate.
The margin on the adjustable rate mortgage.
Factor. Is there a coborrower?
Shortterm liabilities.
Liquid assets.
The data is from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Dhillon, U.S., Shilling, J.D. and Sirmans, C.F. (1987). Choosing Between Fixed and Adjustable Rate Mortgages. Journal of Money, Credit and Banking, 19, 260–267.
data("Mortgage")
plot(rate ~ interest, data = Mortgage, breaks = fivenum(Mortgage$interest))
plot(rate ~ margin, data = Mortgage, breaks = fivenum(Mortgage$margin))
plot(rate ~ coborrower, data = Mortgage)
Time series of stock of motor cycles (two wheels) in The Netherlands (in thousands).
data("MotorCycles")
An annual univariate time series from 1946 to 1993.
An updated version is available under the name MotorCycles2
. However, the values for the years 1992 and 1993 differ there.
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("MotorCycles")
plot(MotorCycles)
Time series of stock of motor cycles (two wheels) in The Netherlands (in thousands).
data("MotorCycles2")
An annual univariate time series from 1946 to 2012.
This is an update of the series that was available with Franses (1998). However, the values for the years 1992 and 1993 differ.
Online complements to Franses, van Dijk and Opschoor (2014).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
data("MotorCycles2")
plot(MotorCycles2)
Time series of the MSCI Switzerland index.
data("MSCISwitzerland")
A daily univariate time series from 19941230 to 20121231 (of class "zoo"
with "Date"
index).
Online complements to Franses, van Dijk and Opschoor (2014).
Ding, Z., Granger, C. W. J. and Engle, R. F. (1993). A Long Memory Property of Stock Market Returns and a New Model. Journal of Empirical Finance, 1(1), 83–106.
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
data("MSCISwitzerland", package = "AER")
## p.190, Fig. 7.6
dlmsci < 100 * diff(log(MSCISwitzerland))
plot(dlmsci)
dlmsci9501 < window(dlmsci, end = as.Date("20011231"))
## Figure 7.7
plot(acf(dlmsci9501^2, lag.max = 200, na.action = na.exclude),
ylim = c(0.1, 0.3), type = "l")
## GARCH(1,1) model, p.190, eq. (7.60)
## standard errors using first derivatives (as apparently used by Franses et al.)
library("tseries")
msci9501_g11 < garch(zooreg(dlmsci9501), trace = FALSE)
summary(msci9501_g11)
## standard errors using second derivatives
library("fGarch")
msci9501_g11a < garchFit( ~ garch(1,1), include.mean = FALSE,
data = dlmsci9501, trace = FALSE)
summary(msci9501_g11a)
round(msci9501_g11a@fit$coef, 3)
round(msci9501_g11a@fit$se.coef, 3)
## Fig. 7.8, p.192
plot(msci9501_g11a, which = 2)
abline(h = sd(dlmsci9501))
## TGARCH model (also known as GJRGARCH model), p. 191, eq. (7.61)
msci9501_tg11 < garchFit( ~ aparch(1,1), include.mean = FALSE,
include.delta = FALSE, delta = 2, data = dlmsci9501, trace = FALSE)
summary(msci9501_tg11)
## GJR form using reparameterization as given by Ding et al. (1993, pp. 100101)
coef(msci9501_tg11)["alpha1"] * (1  coef(msci9501_tg11)["gamma1"])^2 ## alpha*
4 * coef(msci9501_tg11)["alpha1"] * coef(msci9501_tg11)["gamma1"] ## gamma*
## GARCH and GJRGARCH with rugarch
library("rugarch")
spec_g11 < ugarchspec(variance.model = list(model = "sGARCH"),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE))
msci9501_g11b < ugarchfit(spec_g11, data = dlmsci9501)
msci9501_g11b
spec_gjrg11 < ugarchspec(variance.model = list(model = "gjrGARCH", garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0, 0), include.mean = FALSE))
msci9501_gjrg11 < ugarchfit(spec_gjrg11, data = dlmsci9501)
msci9501_gjrg11
round(coef(msci9501_gjrg11), 3)
Panel data set for 265 Swedish municipalities covering 9 years (19791987).
data("Municipalities")
A data frame containing 2,385 observations on 5 variables.
factor with ID number for municipality.
factor coding year.
total expenditures.
total ownsource revenues.
intergovernmental grants received by the municipality.
Total expenditures contains both capital and current expenditures.
Expenditures, revenues, and grants are expressed in million SEK. The series are deflated and in per capita form. The implicit deflator is a municipalityspecific price index obtained by dividing total local consumption expenditures at current prices by total local consumption expenditures at fixed (1985) prices.
The data are gathered by Statistics Sweden and obtained from Financial Accounts for the Municipalities (Kommunernas Finanser).
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/2000v15.4/dahlbergjohansson/
Dahlberg, M., and Johansson, E. (2000). An Examination of the Dynamic Behavior of Local Governments Using GMM Bootstrapping Methods. Journal of Applied Econometrics, 15, 401–416.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
## Greene (2003), Table 18.2
data("Municipalities")
summary(Municipalities)
Crosssection data on states in 1950.
data("MurderRates")
A data frame containing 44 observations on 8 variables.
Murder rate per 100,000 (FBI estimate, 1950).
Number of convictions divided by number of murders in 1950.
Average number of executions during 1946–1950 divided by convictions in 1950.
Median time served (in months) of convicted murderers released in 1951.
Median family income in 1949 (in 1,000 USD).
Labor force participation rate in 1950 (in percent).
Proportion of population that is nonCaucasian in 1950.
Factor indicating region.
Maddala (2001), Table 8.4, p. 330
Maddala, G.S. (2001). Introduction to Econometrics, 3rd ed. New York: John Wiley.
McManus, W.S. (1985). Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher's Prior Beliefs. Journal of Political Economy, 93, 417–425.
Stokes, H. (2004). On the Advantage of Using Two or More Econometric Software Systems to Solve the Same Problem. Journal of Economic and Social Measurement, 29, 307–320.
data("MurderRates")
## Maddala (2001, pp. 331)
fm_lm < lm(rate ~ . + I(executions > 0), data = MurderRates)
summary(fm_lm)
model < I(executions > 0) ~ time + income + noncauc + lfp + southern
fm_lpm < lm(model, data = MurderRates)
summary(fm_lpm)
## Binomial models. Note: southern coefficient
fm_logit < glm(model, data = MurderRates, family = binomial)
summary(fm_logit)
fm_logit2 < glm(model, data = MurderRates, family = binomial,
control = list(epsilon = 1e15, maxit = 50, trace = FALSE))
summary(fm_logit2)
fm_probit < glm(model, data = MurderRates, family = binomial(link = "probit"))
summary(fm_probit)
fm_probit2 < glm(model, data = MurderRates , family = binomial(link = "probit"),
control = list(epsilon = 1e15, maxit = 50, trace = FALSE))
summary(fm_probit2)
## Explanation: quasicomplete separation
with(MurderRates, table(executions > 0, southern))
Panel data originating from 6 US states over the period 1967–1989.
data("NaturalGas")
A data frame containing 138 observations on 10 variables.
factor. State abbreviation.
factor. State Code.
factor coding year.
Consumption of natural gas by the residential sector.
Price of natural gas
Price of electricity.
Price of distillate fuel oil.
Price of liquefied petroleum gas.
Heating degree days.
Real percapita personal income.
The data are from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
data("NaturalGas")
summary(NaturalGas)
Crosssection data originating from the US National Medical Expenditure Survey (NMES) conducted in 1987 and 1988. The NMES is based upon a representative, national probability sample of the civilian noninstitutionalized population and individuals admitted to longterm care facilities during 1987. The data are a subsample of individuals ages 66 and over all of whom are covered by Medicare (a public insurance program providing substantial protection against healthcare costs).
data("NMES1988")
A data frame containing 4,406 observations on 19 variables.
Number of physician office visits.
Number of nonphysician office visits.
Number of physician hospital outpatient visits.
Number of nonphysician hospital outpatient visits.
Emergency room visits.
Number of hospital stays.
Factor indicating selfperceived health status, levels are
"poor"
, "average"
(reference category), "excellent"
.
Number of chronic conditions.
Factor indicating whether the individual has a condition that
limits activities of daily living ("limited"
) or not ("normal"
).
Factor indicating region, levels are northeast
,
midwest
, west
, other
(reference category).
Age in years (divided by 10).
Factor. Is the individual AfricanAmerican?
Factor indicating gender.
Factor. is the individual married?
Number of years of education.
Family income in USD 10,000.
Factor. Is the individual employed?
Factor. Is the individual covered by private insurance?
Factor. Is the individual covered by Medicaid?
Journal of Applied Econometrics Data Archive for Deb and Trivedi (1997).
http://qed.econ.queensu.ca/jae/1997v12.3/debtrivedi/
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Deb, P., and Trivedi, P.K. (1997). Demand for Medical Care by the Elderly: A Finite Mixture Approach. Journal of Applied Econometrics, 12, 313–336.
Zeileis, A., Kleiber, C., and Jackman, S. (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8). doi:10.18637/jss.v027.i08.
## packages
library("MASS")
library("pscl")
## select variables for analysis
data("NMES1988")
nmes < NMES1988[, c(1, 7:8, 13, 15, 18)]
## dependent variable
hist(nmes$visits, breaks = 0:(max(nmes$visits)+1)  0.5)
plot(table(nmes$visits))
## convenience transformations for exploratory graphics
clog < function(x) log(x + 0.5)
cfac < function(x, breaks = NULL) {
if(is.null(breaks)) breaks < unique(quantile(x, 0:10/10))
x < cut(x, breaks, include.lowest = TRUE, right = FALSE)
levels(x) < paste(breaks[length(breaks)], ifelse(diff(breaks) > 1,
c(paste("", breaks[c(1, length(breaks))]  1, sep = ""), "+"), ""), sep = "")
return(x)
}
## bivariate visualization
par(mfrow = c(3, 2))
plot(clog(visits) ~ health, data = nmes, varwidth = TRUE)
plot(clog(visits) ~ cfac(chronic), data = nmes)
plot(clog(visits) ~ insurance, data = nmes, varwidth = TRUE)
plot(clog(visits) ~ gender, data = nmes, varwidth = TRUE)
plot(cfac(visits, c(0:2, 4, 6, 10, 100)) ~ school, data = nmes, breaks = 9)
par(mfrow = c(1, 1))
## Poisson regression
nmes_pois < glm(visits ~ ., data = nmes, family = poisson)
summary(nmes_pois)
## LM test for overdispersion
dispersiontest(nmes_pois)
dispersiontest(nmes_pois, trafo = 2)
## sandwich covariance matrix
coeftest(nmes_pois, vcov = sandwich)
## quasipoisson model
nmes_qpois < glm(visits ~ ., data = nmes, family = quasipoisson)
## NegBin regression
nmes_nb < glm.nb(visits ~ ., data = nmes)
## hurdle regression
nmes_hurdle < hurdle(visits ~ .  chronic + insurance + school + gender,
data = nmes, dist = "negbin")
## zeroinflated regression model
nmes_zinb < zeroinfl(visits ~ .  chronic + insurance + school + gender,
data = nmes, dist = "negbin")
## compare estimated coefficients
fm < list("MLPois" = nmes_pois, "QuasiPois" = nmes_qpois, "NB" = nmes_nb,
"HurdleNB" = nmes_hurdle, "ZINB" = nmes_zinb)
round(sapply(fm, function(x) coef(x)[1:7]), digits = 3)
## associated standard errors
round(cbind("MLPois" = sqrt(diag(vcov(nmes_pois))),
"AdjPois" = sqrt(diag(sandwich(nmes_pois))),
sapply(fm[1], function(x) sqrt(diag(vcov(x)))[1:7])),
digits = 3)
## loglikelihoods and number of estimated parameters
rbind(logLik = sapply(fm, function(x) round(logLik(x), digits = 0)),
Df = sapply(fm, function(x) attr(logLik(x), "df")))
## predicted number of zeros
round(c("Obs" = sum(nmes$visits < 1),
"MLPois" = sum(dpois(0, fitted(nmes_pois))),
"AdjPois" = NA,
"QuasiPois" = NA,
"NB" = sum(dnbinom(0, mu = fitted(nmes_nb), size = nmes_nb$theta)),
"NBHurdle" = sum(predict(nmes_hurdle, type = "prob")[,1]),
"ZINB" = sum(predict(nmes_zinb, type = "prob")[,1])))
## coefficients of zeroaugmentation models
t(sapply(fm[4:5], function(x) round(x$coefficients$zero, digits = 3)))
A daily time series from 1990 to 2005 of the New York Stock Exchange composite index.
data("NYSESW")
A daily univariate time series from 19900102 to 20051111 (of class
"zoo"
with "Date"
index).
Online complements to Stock and Watson (2007).
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
## returns
data("NYSESW")
ret < 100 * diff(log(NYSESW))
plot(ret)
## Stock and Watson (2007), p. 667, GARCH(1,1) model
library("tseries")
fm < garch(coredata(ret))
summary(fm)
Panel data on gasoline consumption in 18 OECD countries over 19 years, 1960–1978.
data("OECDGas")
A data frame containing 342 observations on 6 variables.
Factor indicating country.
Year.
Logarithm of motor gasoline consumption per car.
Logarithm of real percapita income.
Logarithm of real motor gasoline price.
Logarithm of the stock of cars percapita.
The data is from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Baltagi, B.H. and Griffin, J.M. (1983). Gasoline Demand in the OECD: An Application of Pooling and Testing Procedures. European Economic Review, 22, 117–137.
data("OECDGas")
library("lattice")
xyplot(exp(cars) ~ year  country, data = OECDGas, type = "l")
xyplot(exp(gas) ~ year  country, data = OECDGas, type = "l")
Crosssection data on OECD countries, used for growth regressions.
data("OECDGrowth")
A data frame with 22 observations on the following 6 variables.
real GDP in 1985 (per person of working age, i.e., age 15 to 65), in 1985 international prices.
real GDP in 1960 (per person of working age, i.e., age 15 to 65), in 1985 international prices.
average of annual ratios of real domestic investment to real GDP (1960–1985).
percentage of the workingage population that is in secondary school.
average of annual ratios of gross domestic expenditure on research and development to nominal GDP (of available observations during 1960–1985).
annual population growth 1960–1985, computed as log(pop85/pop60)/25
.
Appendix 1 Nonneman and Vanhoudt (1996), except for one bad misprint:
The value of school
for Norway is given as 0.01, the correct value is 0.1
(see Mankiw, Romer and Weil, 1992). OECDGrowth
contains the corrected data.
Mankiw, N.G., Romer, D., and Weil, D.N. (1992). A Contribution to the Empirics of Economic Growth. Quarterly Journal of Economics, 107, 407–437.
Nonneman, W., and Vanhoudt, P. (1996). A Further Augmentation of the Solow Model and the Empirics of Economic Growth. Quarterly Journal of Economics, 111, 943–953.
Zaman, A., Rousseeuw, P.J., and Orhan, M. (2001). Econometric Applications of HighBreakdown Robust Regression Techniques. Economics Letters, 71, 1–8.
data("OECDGrowth")
## Nonneman and Vanhoudt (1996), Table II
cor(OECDGrowth[, 3:6])
cor(log(OECDGrowth[, 3:6]))
## textbook Solow model
## Nonneman and Vanhoudt (1996), Table IV, and
## Zaman, Rousseeuw and Orhan (2001), Table 2
so_ols < lm(log(gdp85/gdp60) ~ log(gdp60) + log(invest) + log(popgrowth+.05),
data = OECDGrowth)
summary(so_ols)
## augmented and extended Solow growth model
## Nonneman and Vanhoudt (1996), Table IV
aso_ols < lm(log(gdp85/gdp60) ~ log(gdp60) + log(invest) +
log(school) + log(popgrowth+.05), data = OECDGrowth)
eso_ols < lm(log(gdp85/gdp60) ~ log(gdp60) + log(invest) +
log(school) + log(randd) + log(popgrowth+.05), data = OECDGrowth)
## determine unusual observations using LTS
library("MASS")
so_lts < lqs(log(gdp85/gdp60) ~ log(gdp60) + log(invest) + log(popgrowth+.05),
data = OECDGrowth, psamp = 13, nsamp = "exact")
## large residuals
nok1 < abs(residuals(so_lts))/so_lts$scale[2] > 2.5
residuals(so_lts)[nok1]/so_lts$scale[2]
## high leverage
X < model.matrix(so_ols)[,1]
cv < cov.rob(X, nsamp = "exact")
mh < sqrt(mahalanobis(X, cv$center, cv$cov))
nok2 < mh > 2.5
mh[nok2]
## bad leverage
nok < which(nok1 & nok2)
nok
## robust results without bad leverage points
so_rob < update(so_ols, subset = nok)
summary(so_rob)
## This is similar to Zaman, Rousseeuw and Orhan (2001), Table 2
## but uses exact computations (and not suboptimal results
## for the robust functions lqs and cov.rob)
Television rights for Olympic Games for US networks (in millions USD).
data("OlympicTV")
A data frame with 10 observations and 2 variables.
time series of television rights (in million USD),
factor coding television network.
Online complements to Franses (1998).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
data("OlympicTV")
plot(OlympicTV$rights)
Quarterly time series data on employment in Orange county, 1965–1983.
data("OrangeCounty")
A quarterly multiple time series from 1965 to 1983 with 2 variables.
Quarterly employment in Orange county.
Quarterly real GNP.
The data is from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
data("OrangeCounty")
plot(OrangeCounty)
US earnings data, as provided in an annual survey of Parade (here from 2005), the Sunday newspaper magazine supplementing the Sunday (or Weekend) edition of many daily newspapers in the USA.
data("Parade2005")
A data frame containing 130 observations on 5 variables.
Annual personal earnings.
Age in years.
Factor indicating gender.
Factor indicating state.
Factor. Is the individual a celebrity?
In addition to the four variables provided by Parade (earnings, age, gender, and state), a fifth variable was introduced, the “celebrity factor” (here actors, athletes, TV personalities, politicians, and CEOs are considered celebrities). The data are quite far from a simple random sample, there being substantial oversampling of celebrities.
Parade (2005). What People Earn. Issue March 13, 2005.
## data
data("Parade2005")
attach(Parade2005)
summary(Parade2005)
## bivariate visualizations
plot(density(log(earnings), bw = "SJ"), type = "l", main = "log(earnings)")
rug(log(earnings))
plot(log(earnings) ~ gender, main = "log(earnings)")
## celebrity vs. noncelebrity earnings
noncel < subset(Parade2005, celebrity == "no")
cel < subset(Parade2005, celebrity == "yes")
library("ineq")
plot(Lc(noncel$earnings), main = "log(earnings)")
lines(Lc(cel$earnings), lty = 2)
lines(Lc(earnings), lty = 3)
Gini(noncel$earnings)
Gini(cel$earnings)
Gini(earnings)
## detach data
detach(Parade2005)
Time series of average monthly European spot prices for black and white pepper (fair average quality) in US dollars per ton.
data("PepperPrice")
A monthly multiple time series from 1973(10) to 1996(4) with 2 variables.
spot price for black pepper,
spot price for white pepper.
Originally available as an online supplement to Franses (1998). Now available via online complements to Franses, van Dijk and Opschoor (2014).
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. Cambridge, UK: Cambridge University Press.
Franses, P.H., van Dijk, D. and Opschoor, A. (2014). Time Series Models for Business and Economic Forecasting, 2nd ed. Cambridge, UK: Cambridge University Press.
## data
data("PepperPrice", package = "AER")
plot(PepperPrice, plot.type = "single", col = 1:2)
## package
library("tseries")
library("urca")
## unit root tests
adf.test(log(PepperPrice[, "white"]))
adf.test(diff(log(PepperPrice[, "white"])))
pp.test(log(PepperPrice[, "white"]), type = "Z(t_alpha)")
pepper_ers < ur.ers(log(PepperPrice[, "white"]),
type = "DFGLS", model = "const", lag.max = 4)
summary(pepper_ers)
## stationarity tests
kpss.test(log(PepperPrice[, "white"]))
## cointegration
po.test(log(PepperPrice))
pepper_jo < ca.jo(log(PepperPrice), ecdet = "const", type = "trace")
summary(pepper_jo)
pepper_jo2 < ca.jo(log(PepperPrice), ecdet = "const", type = "eigen")
summary(pepper_jo2)
Crosssection data on the scientific productivity of PhD students in biochemistry.
data("PhDPublications")
A data frame containing 915 observations on 6 variables.
Number of articles published during last 3 years of PhD.
factor indicating gender.
factor. Is the PhD student married?
Number of children less than 6 years old.
Prestige of the graduate program.
Number of articles published by student's mentor.
Online complements to Long (1997).
Long, J.S. (1990). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage Publications.
Long, J.S. (1997). The Origin of Sex Differences in Science. Social Forces, 68, 1297–1315.
## from Long (1997)
data("PhDPublications")
## Table 8.1, p. 227
summary(PhDPublications)
## Figure 8.2, p. 220
plot(0:10, dpois(0:10, mean(PhDPublications$articles)), type = "b", col = 2,
xlab = "Number of articles", ylab = "Probability")
lines(0:10, prop.table(table(PhDPublications$articles))[1:11], type = "b")
legend("topright", c("observed", "predicted"), col = 1:2, lty = rep(1, 2), bty = "n")
## Table 8.2, p. 228
fm_lrm < lm(log(articles + 0.5) ~ ., data = PhDPublications)
summary(fm_lrm)
2 * logLik(fm_lrm)
fm_prm < glm(articles ~ ., data = PhDPublications, family = poisson)
library("MASS")
fm_nbrm < glm.nb(articles ~ ., data = PhDPublications)
## Table 8.3, p. 246
library("pscl")
fm_zip < zeroinfl(articles ~ .  ., data = PhDPublications)
fm_zinb < zeroinfl(articles ~ .  ., data = PhDPublications, dist = "negbin")
Data used to study the effectiveness of a program.
data("ProgramEffectiveness")
A data frame containing 32 crosssection observations on 4 variables.
Factor with levels "increase"
and "decrease"
.
Gradepoint average.
Test score on economics test.
Factor. Did the individual participate in the program?
The data are taken form Spencer and Mazzeo (1980) who examined whether a new method of teaching economics significantly influenced performance in later economics courses.
Online complements to Greene (2003).
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Spector, L. and Mazzeo, M. (1980). Probit Analysis and Economic Education. Journal of Economic Education, 11, 37–44.
data("ProgramEffectiveness")
## Greene (2003), Table 21.1, col. "Probit"
fm_probit < glm(grade ~ average + testscore + participation,
data = ProgramEffectiveness, family = binomial(link = "probit"))
summary(fm_probit)
Crosssection data originating from the 1976 Panel Study of Income Dynamics (PSID), based on data for the previous year, 1975.
data("PSID1976")
A data frame containing 753 observations on 21 variables.
Factor. Did the individual participate in the labor force in 1975?
(This is essentially wage > 0
or hours > 0
.)
Wife's hours of work in 1975.
Number of children less than 6 years old in household.
Number of children between ages 6 and 18 in household.
Wife's age in years.
Wife's education in years.
Wife's average hourly wage, in 1975 dollars.
Wife's wage reported at the time of the 1976 interview
(not the same as the 1975 estimated wage). To use the subsample with this wage,
one needs to select 1975 workers with participation == "yes"
, then select only those
women with nonzero wage. Only 325 women work in 1975 and have a nonzero wage in 1976.
Husband's hours worked in 1975.
Husband's age in years.
Husband's education in years.
Husband's wage, in 1975 dollars.
Family income, in 1975 dollars. (This variable is used to construct the property income variable.)
Marginal tax rate facing the wife, and is taken from published federal tax tables (state and local income taxes are excluded). The taxable income on which this tax rate is calculated includes Social Security, if applicable to wife.
Wife's mother's educational attainment, in years.
Wife's father's educational attainment, in years.
Unemployment rate in county of residence, in percentage points. (This is taken from bracketed ranges.)
Factor. Does the individual live in a large city?
Actual years of wife's previous labor market experience.
Factor. Did the individual attend college?
Factor. Did the individual's husband attend college?
This data set is also known as the Mroz (1987) data.
Warning: Typical applications using these data employ the variable
wage
(aka earnings
in previous versions of the data) as the dependent variable.
The variable repwage
is the reported wage in a 1976 interview, named RPWG by Greene (2003).
Online complements to Greene (2003). Table F4.1.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
McCullough, B.D. (2004). Some Details of Nonlinear Estimation. In: Altman, M., Gill, J., and McDonald, M.P.: Numerical Issues in Statistical Computing for the Social Scientist. Hoboken, NJ: John Wiley, Ch. 8, 199–218.
Mroz, T.A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. Econometrica, 55, 765–799.
Winkelmann, R., and Boes, S. (2009). Analysis of Microdata, 2nd ed. Berlin and Heidelberg: SpringerVerlag.
Wooldridge, J.M. (2002). Econometric Analysis of CrossSection and Panel Data. Cambridge, MA: MIT Press.
Greene2003
, WinkelmannBoes2009
## data and transformations
data("PSID1976")
PSID1976$kids < with(PSID1976, factor((youngkids + oldkids) > 0,
levels = c(FALSE, TRUE), labels = c("no", "yes")))
PSID1976$nwincome < with(PSID1976, (fincome  hours * wage)/1000)
PSID1976$partnum < as.numeric(PSID1976$participation)  1
###################
## Greene (2003) ##
###################
## Example 4.1, Table 4.2
## (reproduced in Example 7.1, Table 7.1)
gr_lm < lm(log(hours * wage) ~ age + I(age^2) + education + kids,
data = PSID1976, subset = participation == "yes")
summary(gr_lm)
vcov(gr_lm)
## Example 4.5
summary(gr_lm)
## or equivalently
gr_lm1 < lm(log(hours * wage) ~ 1, data = PSID1976, subset = participation == "yes")
anova(gr_lm1, gr_lm)
## Example 21.4, p. 681, and Tab. 21.3, p. 682
gr_probit1 < glm(participation ~ age + I(age^2) + I(fincome/10000) + education + kids,
data = PSID1976, family = binomial(link = "probit") )
gr_probit2 < glm(participation ~ age + I(age^2) + I(fincome/10000) + education,
data = PSID1976, family = binomial(link = "probit"))
gr_probit3 < glm(participation ~ kids/(age + I(age^2) + I(fincome/10000) + education),
data = PSID1976, family = binomial(link = "probit"))
## LR test of all coefficients
lrtest(gr_probit1)
## Chowtype test
lrtest(gr_probit2, gr_probit3)
## equivalently:
anova(gr_probit2, gr_probit3, test = "Chisq")
## Table 21.3
summary(gr_probit1)
## Example 22.8, Table 22.7, p. 786
library("sampleSelection")
gr_2step < selection(participation ~ age + I(age^2) + fincome + education + kids,
wage ~ experience + I(experience^2) + education + city,
data = PSID1976, method = "2step")
gr_ml < selection(participation ~ age + I(age^2) + fincome + education + kids,
wage ~ experience + I(experience^2) + education + city,
data = PSID1976, method = "ml")
gr_ols < lm(wage ~ experience + I(experience^2) + education + city,
data = PSID1976, subset = participation == "yes")
## NOTE: ML estimates agree with Greene, 5e errata.
## Standard errors are based on the Hessian (here), while Greene has BHHH/OPG.
#######################
## Wooldridge (2002) ##
#######################
## Table 15.1, p. 468
wl_lpm < lm(partnum ~ nwincome + education + experience + I(experience^2) +
age + youngkids + oldkids, data = PSID1976)
wl_logit < glm(participation ~ nwincome + education + experience + I(experience^2) +
age + youngkids + oldkids, family = binomial, data = PSID1976)
wl_probit < glm(participation ~ nwincome + education + experience + I(experience^2) +
age + youngkids + oldkids, family = binomial(link = "probit"), data = PSID1976)
## (same as Altman et al.)
## convenience functions
pseudoR2 < function(obj) 1  as.vector(logLik(obj)/logLik(update(obj, . ~ 1)))
misclass < function(obj) 1  sum(diag(prop.table(table(
model.response(model.frame(obj)), round(fitted(obj))))))
coeftest(wl_logit)
logLik(wl_logit)
misclass(wl_logit)
pseudoR2(wl_logit)
coeftest(wl_probit)
logLik(wl_probit)
misclass(wl_probit)
pseudoR2(wl_probit)
## Table 16.2, p. 528
form < hours ~ nwincome + education + experience + I(experience^2) + age + youngkids + oldkids
wl_ols < lm(form, data = PSID1976)
wl_tobit < tobit(form, data = PSID1976)
summary(wl_ols)
summary(wl_tobit)
#######################
## McCullough (2004) ##
#######################
## p. 203
mc_probit < glm(participation ~ nwincome + education + experience + I(experience^2) +
age + youngkids + oldkids, family = binomial(link = "probit"), data = PSID1976)
mc_tobit < tobit(hours ~ nwincome + education + experience + I(experience^2) + age +
youngkids + oldkids, data = PSID1976)
coeftest(mc_probit)
coeftest(mc_tobit)
coeftest(mc_tobit, vcov = vcovOPG)
Crosssection data originating from the Panel Study on Income Dynamics, 1982.
data("PSID1982")
A data frame containing 595 observations on 12 variables.
Years of fulltime work experience.
Weeks worked.
factor. Is the individual a whitecollar ("white"
) or bluecollar ("blue"
) worker?
factor. Does the individual work in a manufacturing industry?
factor. Does the individual reside in the South?
factor. Does the individual reside in a SMSA (standard metropolitan statistical area)?
factor. Is the individual married?
factor indicating gender.
factor. Is the individual's wage set by a union contract?
Years of education.
factor indicating ethnicity.
Is the individual AfricanAmerican ("afam"
) or not ("other"
)?
Wage.
PSID1982
is the crosssection for the year 1982 taken from a larger panel data set
PSID7682
for the years 1976–1982, originating from Cornwell and Rupert (1988).
Baltagi (2002) just uses the 1982 crosssection; hence PSID1982
is available as a
standalone data set because it was included in AER prior to the availability of the
full PSID7682
panel version.
The data is from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Cornwell, C., and Rupert, P. (1988). Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics, 3, 149–155.
data("PSID1982")
plot(density(PSID1982$wage, bw = "SJ"))
## Baltagi (2002), Table 4.1
earn_lm < lm(log(wage) ~ . + I(experience^2), data = PSID1982)
summary(earn_lm)
## Baltagi (2002), Table 13.1
union_lpm < lm(I(as.numeric(union)  1) ~ .  wage, data = PSID1982)
union_probit < glm(union ~ .  wage, data = PSID1982, family = binomial(link = "probit"))
union_logit < glm(union ~ .  wage, data = PSID1982, family = binomial)
## probit OK, logit and LPM rather different.
Panel data on earnings of 595 individuals for the years 1976–1982, originating from the Panel Study of Income Dynamics.
data("PSID7682")
A data frame containing 7 annual observations on 12 variables for 595 individuals.
Years of fulltime work experience.
Weeks worked.
factor. Is the individual a whitecollar ("white"
)
or bluecollar ("blue"
) worker?
factor. Does the individual work in a manufacturing industry?
factor. Does the individual reside in the South?
factor. Does the individual reside in a SMSA (standard metropolitan statistical area)?
factor. Is the individual married?
factor indicating gender.
factor. Is the individual's wage set by a union contract?
Years of education.
factor indicating ethnicity.
Is the individual AfricanAmerican ("afam"
) or not ("other"
)?
Wage.
factor indicating year.
factor indicating individual subject ID.
The data were originally analyzed by Cornwell and Rupert (1988) and employed for assessing various instrumentalvariable estimators for panel models (including the HausmanTaylor model). Baltagi and KhantiAkom (1990) reanalyzed the data, made corrections to the data and also suggest modeling with a different set of instruments.
PSID7682
is the version of the data as provided by Baltagi (2005),
or Greene (2008).
Baltagi (2002) just uses the crosssection for the year 1982,
i.e., subset(PSID7682, year == "1982")
. This is also available as
a standalone data set PSID1982
because it was included
in AER prior to the availability of the full PSID7682
panel
version.
Online complements to Baltagi (2005).
http://www.wiley.com/legacy/wileychi/baltagi3e/data_sets.html
Also provided in the online complements to Greene (2008), Table F9.1.
https://pages.stern.nyu.edu/~wgreene/Text/Edition6/tablelist6.htm
Baltagi, B.H., and KhantiAkom, S. (1990). On Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics, 5, 401–406.
Baltagi, B.H. (2001). Econometric Analysis of Panel Data, 2nd ed. Chichester, UK: John Wiley.
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Baltagi, B.H. (2005). Econometric Analysis of Panel Data, 3rd ed. Chichester, UK: John Wiley.
Cornwell, C., and Rupert, P. (1988). Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators. Journal of Applied Econometrics, 3, 149–155.
Greene, W.H. (2008). Econometric Analysis, 6th ed. Upper Saddle River, NJ: Prentice Hall.
data("PSID7682")
library("plm")
psid < pdata.frame(PSID7682, c("id", "year"))
## Baltagi & KhantiAkom, Table I, column "HT"
## original Cornwell & Rupert choice of exogenous variables
psid_ht1 < plm(log(wage) ~ weeks + south + smsa + married +
experience + I(experience^2) + occupation + industry + union + gender + ethnicity + education 
weeks + south + smsa + married + gender + ethnicity,
data = psid, model = "ht")
## Baltagi & KhantiAkom, Table II, column "HT"
## alternative choice of exogenous variables
psid_ht2 < plm(log(wage) ~ occupation + south + smsa + industry +
experience + I(experience^2) + weeks + married + union + gender + ethnicity + education 
occupation + south + smsa + industry + gender + ethnicity,
data = psid, model = "ht")
## Baltagi & KhantiAkom, Table III, column "HT"
## original choice of exogenous variables + time dummies
## (see also Baltagi, 2001, Table 7.1)
psid$time < psid$year
psid_ht3 < plm(log(wage) ~ weeks + south + smsa + married + experience + I(experience^2) +
occupation + industry + union + gender + ethnicity + education + time 
weeks + south + smsa + married + gender + ethnicity + time,
data = psid, model = "ht")
Crosssection data on the number of recreational boating trips to Lake Somerville, Texas, in 1980, based on a survey administered to 2,000 registered leisure boat owners in 23 counties in eastern Texas.
data("RecreationDemand")
A data frame containing 659 observations on 8 variables.
Number of recreational boating trips.
Facility's subjective quality ranking on a scale of 1 to 5.
factor. Was the individual engaged in waterskiing at the lake?
Annual household income of the respondent (in 1,000 USD).
factor. Did the individual pay an annual user fee at Lake Somerville?
Expenditure when visiting Lake Conroe (in USD).
Expenditure when visiting Lake Somerville (in USD).
Expenditure when visiting Lake Houston (in USD).
According to the original source (Seller, Stoll and Chavas, 1985, p. 168), the quality rating is on a scale from 1 to 5 and gives 0 for those who had not visited the lake. This explains the remarkably low mean for this variable, but also suggests that its treatment in various more recent publications is far from ideal. For consistency with other sources we handle the variable as a numerical variable, including the zeros.
Journal of Business & Economic Statistics Data Archive.
http://www.amstat.org/publications/jbes/upload/index.cfm?fuseaction=ViewArticles&pub=JBES&issue=964OCT
Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Gurmu, S. and Trivedi, P.K. (1996). Excess Zeros in Count Models for Recreational Trips. Journal of Business & Economic Statistics, 14, 469–477.
Ozuna, T. and Gomez, I.A. (1995). Specification and Testing of Count Data Recreation Demand Functions. Empirical Economics, 20, 543–550.
Seller, C., Stoll, J.R. and Chavas, J.P. (1985). Validation of Empirical Measures of Welfare Change: A Comparison of Nonmarket Techniques. Land Economics, 61, 156–175.
data("RecreationDemand")
## Poisson model:
## Cameron and Trivedi (1998), Table 6.11
## Ozuna and Gomez (1995), Table 2, col. 3
fm_pois < glm(trips ~ ., data = RecreationDemand, family = poisson)
summary(fm_pois)
logLik(fm_pois)
coeftest(fm_pois, vcov = sandwich)
## Negbin model:
## Cameron and Trivedi (1998), Table 6.11
## Ozuna and Gomez (1995), Table 2, col. 5
library("MASS")
fm_nb < glm.nb(trips ~ ., data = RecreationDemand)
coeftest(fm_nb, vcov = vcovOPG)
## ZIP model:
## Cameron and Trivedi (1998), Table 6.11
library("pscl")
fm_zip < zeroinfl(trips ~ .  quality + income, data = RecreationDemand)
summary(fm_zip)
## Hurdle models
## Cameron and Trivedi (1998), Table 6.13
## poissonpoisson
fm_hp < hurdle(trips ~ ., data = RecreationDemand, dist = "poisson", zero = "poisson")
## negbinnegbin
fm_hnb < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin", zero = "negbin")
## binomnegbin == geonegbin
fm_hgnb < hurdle(trips ~ ., data = RecreationDemand, dist = "negbin")
## Note: quasicomplete separation
with(RecreationDemand, table(trips > 0, userfee))
Crosssection data about resume, callback and employer information for 4,870 fictitious resumes.
data("ResumeNames")
A data frame containing 4,870 observations on 27 variables.
factor indicating applicant's first name.
factor indicating gender.
factor indicating ethnicity (i.e., Caucasiansounding vs. AfricanAmerican sounding first name).
factor indicating quality of resume.
factor. Was the applicant called back?
factor indicating city: Boston or Chicago.
number of jobs listed on resume.
number of years of work experience on the resume.
factor. Did the resume mention some honors?
factor. Did the resume mention some volunteering experience?
factor. Does the applicant have military experience?
factor. Does the resume have some employment holes?
factor. Does the resume mention some work experience while at school?
factor. Was the email address on the applicant's resume?
factor. Does the resume mention some computer skills?
factor. Does the resume mention some special skills?
factor. Does the applicant have a college degree or more?
factor indicating minimum experience requirement of the employer.
factor. Is the employer EOE (equal opportunity employment)?
factor indicating type of position wanted by employer.
factor. Does the ad mention some requirement for the job?
factor. Does the ad mention some experience requirement?
factor. Does the ad mention some communication skills requirement?
factor. Does the ad mention some educational requirement?
factor. Does the ad mention some computer skills requirement?
factor. Does the ad mention some organizational skills requirement?
factor indicating type of employer industry.
Crosssection data about resume, callback and employer information for 4,870 fictitious resumes sent in response to employment advertisements in Chicago and Boston in 2001, in a randomized controlled experiment conducted by Bertrand and Mullainathan (2004). The resumes contained information concerning the ethnicity of the applicant. Because ethnicity is not typically included on a resume, resumes were differentiated on the basis of socalled “Caucasian sounding names” (such as Emily Walsh or Gregory Baker) and “African American sounding names” (such as Lakisha Washington or Jamal Jones). A large collection of fictitious resumes were created and the presupposed ethnicity (based on the sound of the name) was randomly assigned to each resume. These resumes were sent to prospective employers to see which resumes generated a phone call from the prospective employer.
Online complements to Stock and Watson (2007).
Bertrand, M. and Mullainathan, S. (2004). Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review, 94, 991–1013.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("ResumeNames")
summary(ResumeNames)
prop.table(xtabs(~ ethnicity + call, data = ResumeNames), 1)
Data on ship accidents.
data("ShipAccidents")
A data frame containing 40 observations on 5 ship types in 4 vintages and 2 service periods.
factor with levels "A"
to "E"
for the different ship types,
factor with levels "196064"
, "196569"
, "197074"
,
"197579"
for the periods of construction,
factor with levels "196074"
, "197579"
for the periods of
operation,
aggregate months of service,
number of damage incidents.
The data are from McCullagh and Nelder (1989, p. 205, Table 6.2) and were also used by Greene (2003, Ch. 21), see below.
There are five ships (observations 7, 15, 23, 31, 39) with an operation period
before the construction period, hence the variables service
and
incidents
are necessarily 0. An additional observation (34) has entries
representing accidentally empty cells (see McCullagh and Nelder, 1989, p. 205).
It is a bit unclear what exactly the above means. In any case, the models are fit
only to those observations with service > 0
.
Online complements to Greene (2003).
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd edition. London: Chapman & Hall.
data("ShipAccidents")
sa < subset(ShipAccidents, service > 0)
## Greene (2003), Table 21.20
## (see also McCullagh and Nelder, 1989, Table 6.3)
sa_full < glm(incidents ~ type + construction + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_full)
sa_notype < glm(incidents ~ construction + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_notype)
sa_noperiod < glm(incidents ~ type + operation, family = poisson,
data = sa, offset = log(service))
summary(sa_noperiod)
## model comparison
anova(sa_full, sa_notype, test = "Chisq")
anova(sa_full, sa_noperiod, test = "Chisq")
## test for overdispersion
dispersiontest(sa_full)
dispersiontest(sa_full, trafo = 2)
Statewide production data for primary metals industry (SIC 33).
data("SIC33")
A data frame containing 27 observations on 3 variables.
Value added.
Labor input.
Capital stock.
Online complements to Greene (2003). Table F6.1.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
data("SIC33", package = "AER")
## Example 6.2 in Greene (2003)
## Translog model
fm_tl < lm(output ~ labor + capital + I(0.5 * labor^2) + I(0.5 * capital^2) + I(labor * capital),
data = log(SIC33))
## CobbDouglas model
fm_cb < lm(output ~ labor + capital, data = log(SIC33))
## Table 6.2 in Greene (2003)
deviance(fm_tl)
deviance(fm_cb)
summary(fm_tl)
summary(fm_cb)
vcov(fm_tl)
vcov(fm_cb)
## CobbDouglas vs. Translog model
anova(fm_cb, fm_tl)
## hypothesis of constant returns
linearHypothesis(fm_cb, "labor + capital = 1")
## 3D Visualization
library("scatterplot3d")
s3d < scatterplot3d(log(SIC33)[,c(2, 3, 1)], pch = 16)
s3d$plane3d(fm_cb, lty.box = "solid", col = 4)
## Interactive 3D Visualization
if(require("rgl")) {
x < log(SIC33)[,2]
y < log(SIC33)[,3]
z < log(SIC33)[,1]
plot3d(x, y, z, type = "s", col = "gray", radius = 0.1)
x < seq(4.5, 7.5, by = 0.5)
y < seq(5.5, 10, by = 0.5)
z < outer(x, y, function(x, y) predict(fm_cb, data.frame(labor = x, capital = y)))
surface3d(x, y, z, color = "blue", alpha = 0.5, shininess = 128)
}
Estimation of the effect of workplace smoking bans on smoking of indoor workers.
data("SmokeBan")
A data frame containing 10,000 observations on 7 variables.
factor. Is the individual a current smoker?
factor. Is there a work area smoking ban?
age in years.
factor indicating highest education level attained: high school (hs) drop out, high school graduate, some college, college graduate, master's degree (or higher).
factor. Is the individual AfricanAmerican?
factor. Is the individual Hispanic?
factor indicating gender.
SmokeBank
is a crosssectional data set with observations on 10,000 indoor workers, which
is a subset of a 18,090observation data set collected as part of the National Health
Interview Survey in 1991 and then again (with different respondents) in 1993.
The data set contains information on whether individuals were, or were not, subject to a workplace
smoking ban, whether or not the individuals smoked and other individual characteristics.
Online complements to Stock and Watson (2007).
Evans, W. N., Farrelly, M.C., and Montgomery, E. (1999). Do Workplace Smoking Bans Reduce Smoking? American Economic Review, 89, 728–747.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("SmokeBan")
## proportion of nonsmokers increases with education
plot(smoker ~ education, data = SmokeBan)
## proportion of nonsmokers constant over age
plot(smoker ~ age, data = SmokeBan)
Trading sports cards: Does ownership increase the value of goods to consumers?
data("SportsCards")
A data frame containing 148 observations on 9 variables.
factor. Was the individual given good A or B (see below)?
factor. Was the individual a dealer?
number of trades per month reported by the individual.
number of years that the individual has been trading.
factor indicating income group (in 1000 USD).
factor indicating gender.
factor indicating highest level of education (8th grade or less, high school, 2year college, other posthigh school, 4year college or graduate school).
age in years.
factor. Did the individual trade the good he was given for the other good?
SportsCards
contains data from 148 randomly selected traders who attended
a trading card show in Orlando, Florida, in 1998. Traders were randomly given one
of two sports collectables, say good A or good B, that had approximately equal market
value. Those receiving good A were then given the option of trading good A for good B
with the experimenter; those receiving good B were given the option of trading good B
for good A with the experimenter. Good A was a ticket stub from the game that Cal Ripken Jr.
set the record for consecutive games played, and Good B was a souvenir
from the game that Nolan Ryan won his 300th game.
Online complements to Stock and Watson (2007).
List, J.A. (2003). Does Market Experience Eliminate Market Anomalies? Quarterly Journal of Economcis, 118, 41–71.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("SportsCards")
summary(SportsCards)
plot(trade ~ permonth, data = SportsCards, breaks = c(0, 5, 10, 20, 30, 70))
plot(trade ~ years, data = SportsCards, breaks = c(0, 5, 10, 20, 60))
The Project STAR public access data set, assessing the effect of reducing class size on test scores in the early grades.
data("STAR")
A data frame containing 11,598 observations on 47 variables.
factor indicating student's gender.
factor indicating student's ethnicity with levels
"cauc"
(Caucasian), "afam"
(AfricanAmerican), "asian"
(Asian),
"hispanic"
(Hispanic), "amindian"
(AmericanIndian) or "other"
.
student's birth quarter (of class yearqtr
).
factor indicating the STAR class type in kindergarten:
regular, small, or regularwithaide. NA
indicates that no STAR class was attended.
factor indicating the STAR class type in 1st grade:
regular, small, or regularwithaide. NA
indicates that no STAR class was attended.
factor indicating the STAR class type in 2nd grade:
regular, small, or regularwithaide. NA
indicates that no STAR class was attended.
factor indicating the STAR class type in 3rd grade:
regular, small, or regularwithaide. NA
indicates that no STAR class was attended.
total reading scaled score in kindergarten.
total reading scaled score in 1st grade.
total reading scaled score in 2nd grade.
total reading scaled score in 3rd grade.
total math scaled score in kindergarten.
total math scaled score in 1st grade.
total math scaled score in 2nd grade.
total math scaled score in 3rd grade.
factor indicating whether the student qualified for free lunch in kindergarten.
factor indicating whether the student qualified for free lunch in 1st grade.
factor indicating whether the student qualified for free lunch in 2nd grade.
factor indicating whether the student qualified for free lunch in 3rd grade.
factor indicating school type in kindergarten:
"innercity"
, "suburban"
, "rural"
or "urban"
.
factor indicating school type in 1st grade:
"innercity"
, "suburban"
, "rural"
or "urban"
.
factor indicating school type in 2nd grade:
"innercity"
, "suburban"
, "rural"
or "urban"
.
factor indicating school type in 3rd grade:
"innercity"
, "suburban"
, "rural"
or "urban"
.
factor indicating highest degree of kindergarten teacher:
"bachelor"
, "master"
, "specialist"
, or "master+"
.
factor indicating highest degree of 1st grade teacher:
"bachelor"
, "master"
, "specialist"
, or "phd"
.
factor indicating highest degree of 2nd grade teacher:
"bachelor"
, "master"
, "specialist"
, or "phd"
.
factor indicating highest degree of 3rd grade teacher:
"bachelor"
, "master"
, "specialist"
, or "phd"
.
factor indicating teacher's career ladder level in kindergarten: "level1"
,
"level2"
, "level3"
, "apprentice"
, "probation"
or "pending"
.
factor indicating teacher's career ladder level in 1st grade: "level1"
,
"level2"
, "level3"
, "apprentice"
, "probation"
or "noladder"
.
factor indicating teacher's career ladder level in 2nd grade: "level1"
,
"level2"
, "level3"
, "apprentice"
, "probation"
or "noladder"
.
factor indicating teacher's career ladder level in 3rd grade: "level1"
,
"level2"
, "level3"
, "apprentice"
, "probation"
or "noladder"
.
years of teacher's total teaching experience in kindergarten.
years of teacher's total teaching experience in 1st grade.
years of teacher's total teaching experience in 2nd grade.
years of teacher's total teaching experience in 3rd grade.
factor indicating teacher's ethnicity in kindergarten with levels
"cauc"
(Caucasian) or "afam"
(AfricanAmerican).
factor indicating teacher's ethnicity in 1st grade with levels
"cauc"
(Caucasian) or "afam"
(AfricanAmerican).
factor indicating teacher's ethnicity in 2nd grade with levels
"cauc"
(Caucasian) or "afam"
(AfricanAmerican).
factor indicating teacher's ethnicity in 3rd grade with levels
"cauc"
(Caucasian), "afam"
(AfricanAmerican), or "asian"
(Asian).
factor indicating school system ID in kindergarten.
factor indicating school system ID in 1st grade.
factor indicating school system ID in 2nd grade.
factor indicating school system ID in 3rd grade.
factor indicating school ID in kindergarten.
factor indicating school ID in 1st grade.
factor indicating school ID in 2nd grade.
factor indicating school ID in 3rd grade.
Project STAR (Student/Teacher Achievement Ratio) was a fouryear longitudinal classsize study funded by the Tennessee General Assembly and conducted in the late 1980s by the State Department of Education. Over 7,000 students in 79 schools were randomly assigned into one of three interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regularwithaide class (22 to 25 students with a fulltime teacher's aide). Classroom teachers were also randomly assigned to the classes they would teach. The interventions were initiated as the students entered school in kindergarten and continued through third grade.
The Project STAR public access data set contains data on test scores, treatment groups, and student and teacher characteristics for the four years of the experiment, from academic year 1985–1986 to academic year 1988–1989. The test score data analyzed in this chapter are the sum of the scores on the math and reading portion of the Stanford Achievement Test.
Stock and Watson (2007) obtained the data set from the Project STAR Web site.
The data is provided in wide format. Reshaping it into long format
is illustrated below. Note that the levels of the degree
, ladder
and tethnicity
variables differ slightly between kindergarten
and higher grades.
Online complements to Stock and Watson (2007).
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("STAR")
## Stock and Watson, p. 488
fmk < lm(I(readk + mathk) ~ stark, data = STAR)
fm1 < lm(I(read1 + math1) ~ star1, data = STAR)
fm2 < lm(I(read2 + math2) ~ star2, data = STAR)
fm3 < lm(I(read3 + math3) ~ star3, data = STAR)
coeftest(fm3, vcov = sandwich)
plot(I(read3 + math3) ~ star3, data = STAR)
## Stock and Watson, p. 489
fmke < lm(I(readk + mathk) ~ stark + experiencek, data = STAR)
coeftest(fmke, vcov = sandwich)
## reshape data from wide into long format
## 1. variables and their levels
nam < c("star", "read", "math", "lunch", "school", "degree", "ladder",
"experience", "tethnicity", "system", "schoolid")
lev < c("k", "1", "2", "3")
## 2. reshaping
star < reshape(STAR, idvar = "id", ids = row.names(STAR),
times = lev, timevar = "grade", direction = "long",
varying = lapply(nam, function(x) paste(x, lev, sep = "")))
## 3. improve variable names and type
names(star)[5:15] < nam
star$id < factor(star$id)
star$grade < factor(star$grade, levels = lev, labels = c("kindergarten", "1st", "2nd", "3rd"))
rm(nam, lev)
## fit a single model nested in grade (equivalent to fmk, fm1, fm2, fmk)
fm < lm(I(read + math) ~ 0 + grade/star, data = star)
coeftest(fm, vcov = sandwich)
## visualization
library("lattice")
bwplot(I(read + math) ~ star  grade, data = star)
This manual page collects a list of examples from the book. Some solutions might not be exact and the list is certainly not complete. If you have suggestions for improvement (preferably in the form of code), please contact the package maintainer.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
CartelStability
, CASchools
, CigarettesSW
,
CollegeDistance
, CPSSW04
, CPSSW3
, CPSSW8
,
CPSSW9298
, CPSSW9204
, CPSSWEducation
,
Fatalities
, Fertility
, Fertility2
, FrozenJuice
,
GrowthSW
, Guns
, HealthInsurance
, HMDA
,
Journals
, MASchools
, NYSESW
, ResumeNames
,
SmokeBan
, SportsCards
, STAR
, TeachingRatings
,
USMacroSW
, USMacroSWM
, USMacroSWQ
, USSeatBelts
,
USStocksSW
, WeakInstrument
###############################
## Current Population Survey ##
###############################
## p. 165
data("CPSSWEducation", package = "AER")
plot(earnings ~ education, data = CPSSWEducation)
fm < lm(earnings ~ education, data = CPSSWEducation)
coeftest(fm, vcov = sandwich)
abline(fm)
############################
## California test scores ##
############################
## data and transformations
data("CASchools", package = "AER")
CASchools < transform(CASchools,
stratio = students/teachers,
score = (math + read)/2
)
## p. 152
fm1 < lm(score ~ stratio, data = CASchools)
coeftest(fm1, vcov = sandwich)
## p. 159
fm2 < lm(score ~ I(stratio < 20), data = CASchools)
## p. 199
fm3 < lm(score ~ stratio + english, data = CASchools)
## p. 224
fm4 < lm(score ~ stratio + expenditure + english, data = CASchools)
## Table 7.1, p. 242 (numbers refer to columns)
fmc3 < lm(score ~ stratio + english + lunch, data = CASchools)
fmc4 < lm(score ~ stratio + english + calworks, data = CASchools)
fmc5 < lm(score ~ stratio + english + lunch + calworks, data = CASchools)
## Equation 8.2, p. 258
fmquad < lm(score ~ income + I(income^2), data = CASchools)
## Equation 8.11, p. 266
fmcub < lm(score ~ income + I(income^2) + I(income^3), data = CASchools)
## Equation 8.23, p. 272
fmloglog < lm(log(score) ~ log(income), data = CASchools)
## Equation 8.24, p. 274
fmloglin < lm(log(score) ~ income, data = CASchools)
## Equation 8.26, p. 275
fmlinlogcub < lm(score ~ log(income) + I(log(income)^2) + I(log(income)^3),
data = CASchools)
## Table 8.3, p. 292 (numbers refer to columns)
fmc2 < lm(score ~ stratio + english + lunch + log(income), data = CASchools)
fmc7 < lm(score ~ stratio + I(stratio^2) + I(stratio^3) + english + lunch + log(income),
data = CASchools)
#####################################
## Economics journal Subscriptions ##
#####################################
## data and transformed variables
data("Journals", package = "AER")
journals < Journals[, c("subs", "price")]
journals$citeprice < Journals$price/Journals$citations
journals$age < 2000  Journals$foundingyear
journals$chars < Journals$charpp*Journals$pages/10^6
## Figure 8.9 (a) and (b)
plot(subs ~ citeprice, data = journals, pch = 19)
plot(log(subs) ~ log(citeprice), data = journals, pch = 19)
fm1 < lm(log(subs) ~ log(citeprice), data = journals)
abline(fm1)
## Table 8.2, use HC1 for comparability with Stata
fm1 < lm(subs ~ citeprice, data = log(journals))
fm2 < lm(subs ~ citeprice + age + chars, data = log(journals))
fm3 < lm(subs ~ citeprice + I(citeprice^2) + I(citeprice^3) +
age + I(age * citeprice) + chars, data = log(journals))
fm4 < lm(subs ~ citeprice + age + I(age * citeprice) + chars, data = log(journals))
coeftest(fm1, vcov = vcovHC(fm1, type = "HC1"))
coeftest(fm2, vcov = vcovHC(fm2, type = "HC1"))
coeftest(fm3, vcov = vcovHC(fm3, type = "HC1"))
coeftest(fm4, vcov = vcovHC(fm4, type = "HC1"))
waldtest(fm3, fm4, vcov = vcovHC(fm3, type = "HC1"))
###############################
## Massachusetts test scores ##
###############################
## compare Massachusetts with California
data("MASchools", package = "AER")
data("CASchools", package = "AER")
CASchools < transform(CASchools,
stratio = students/teachers,
score4 = (math + read)/2
)
## parts of Table 9.1, p. 330
vars < c("score4", "stratio", "english", "lunch", "income")
cbind(
CA_mean = sapply(CASchools[, vars], mean),
CA_sd = sapply(CASchools[, vars], sd),
MA_mean = sapply(MASchools[, vars], mean),
MA_sd = sapply(MASchools[, vars], sd))
## Table 9.2, pp. 332333, numbers refer to columns
MASchools < transform(MASchools, higheng = english > median(english))
fm1 < lm(score4 ~ stratio, data = MASchools)
fm2 < lm(score4 ~ stratio + english + lunch + log(income), data = MASchools)
fm3 < lm(score4 ~ stratio + english + lunch + income + I(income^2) + I(income^3),
data = MASchools)
fm4 < lm(score4 ~ stratio + I(stratio^2) + I(stratio^3) + english + lunch +
income + I(income^2) + I(income^3), data = MASchools)
fm5 < lm(score4 ~ stratio + higheng + I(higheng * stratio) + lunch +
income + I(income^2) + I(income^3), data = MASchools)
fm6 < lm(score4 ~ stratio + lunch + income + I(income^2) + I(income^3),
data = MASchools)
## for comparability with Stata use HC1 below
coeftest(fm1, vcov = vcovHC(fm1, type = "HC1"))
coeftest(fm2, vcov = vcovHC(fm2, type = "HC1"))
coeftest(fm3, vcov = vcovHC(fm3, type = "HC1"))
coeftest(fm4, vcov = vcovHC(fm4, type = "HC1"))
coeftest(fm5, vcov = vcovHC(fm5, type = "HC1"))
coeftest(fm6, vcov = vcovHC(fm6, type = "HC1"))
## Testing exclusion of groups of variables
fm3r < update(fm3, . ~ .  I(income^2)  I(income^3))
waldtest(fm3, fm3r, vcov = vcovHC(fm3, type = "HC1"))
fm4r_str1 < update(fm4, . ~ .  stratio  I(stratio^2)  I(stratio^3))
waldtest(fm4, fm4r_str1, vcov = vcovHC(fm4, type = "HC1"))
fm4r_str2 < update(fm4, . ~ .  I(stratio^2)  I(stratio^3))
waldtest(fm4, fm4r_str2, vcov = vcovHC(fm4, type = "HC1"))
fm4r_inc < update(fm4, . ~ .  I(income^2)  I(income^3))
waldtest(fm4, fm4r_inc, vcov = vcovHC(fm4, type = "HC1"))
fm5r_str < update(fm5, . ~ .  stratio  I(higheng * stratio))
waldtest(fm5, fm5r_str, vcov = vcovHC(fm5, type = "HC1"))
fm5r_inc < update(fm5, . ~ .  I(income^2)  I(income^3))
waldtest(fm5, fm5r_inc, vcov = vcovHC(fm5, type = "HC1"))
fm5r_high < update(fm5, . ~ .  higheng  I(higheng * stratio))
waldtest(fm5, fm5r_high, vcov = vcovHC(fm5, type = "HC1"))
fm6r_inc < update(fm6, . ~ .  I(income^2)  I(income^3))
waldtest(fm6, fm6r_inc, vcov = vcovHC(fm6, type = "HC1"))
##################################
## Home mortgage disclosure act ##
##################################
## data
data("HMDA", package = "AER")
## 11.1, 11.3, 11.7, 11.8 and 11.10, pp. 387395
fm1 < lm(I(as.numeric(deny)  1) ~ pirat, data = HMDA)
fm2 < lm(I(as.numeric(deny)  1) ~ pirat + afam, data = HMDA)
fm3 < glm(deny ~ pirat, family = binomial(link = "probit"), data = HMDA)
fm4 < glm(deny ~ pirat + afam, family = binomial(link = "probit"), data = HMDA)
fm5 < glm(deny ~ pirat + afam, family = binomial(link = "logit"), data = HMDA)
## Table 11.1, p. 401
mean(HMDA$pirat)
mean(HMDA$hirat)
mean(HMDA$lvrat)
mean(as.numeric(HMDA$chist))
mean(as.numeric(HMDA$mhist))
mean(as.numeric(HMDA$phist)1)
prop.table(table(HMDA$insurance))
prop.table(table(HMDA$selfemp))
prop.table(table(HMDA$single))
prop.table(table(HMDA$hschool))
mean(HMDA$unemp)
prop.table(table(HMDA$condomin))
prop.table(table(HMDA$afam))
prop.table(table(HMDA$deny))
## Table 11.2, pp. 403404, numbers refer to columns
HMDA$lvrat < factor(ifelse(HMDA$lvrat < 0.8, "low",
ifelse(HMDA$lvrat >= 0.8 & HMDA$lvrat <= 0.95, "medium", "high")),
levels = c("low", "medium", "high"))
HMDA$mhist < as.numeric(HMDA$mhist)
HMDA$chist < as.numeric(HMDA$chist)
fm1 < lm(I(as.numeric(deny)  1) ~ afam + pirat + hirat + lvrat + chist + mhist +
phist + insurance + selfemp, data = HMDA)
fm2 < glm(deny ~ afam + pirat + hirat + lvrat + chist + mhist + phist + insurance +
selfemp, family = binomial, data = HMDA)
fm3 < glm(deny ~ afam + pirat + hirat + lvrat + chist + mhist + phist + insurance +
selfemp, family = binomial(link = "probit"), data = HMDA)
fm4 < glm(deny ~ afam + pirat + hirat + lvrat + chist + mhist + phist + insurance +
selfemp + single + hschool + unemp, family = binomial(link = "probit"), data = HMDA)
fm5 < glm(deny ~ afam + pirat + hirat + lvrat + chist + mhist + phist + insurance +
selfemp + single + hschool + unemp + condomin +
I(mhist==3) + I(mhist==4) + I(chist==3) + I(chist==4) + I(chist==5) + I(chist==6),
family = binomial(link = "probit"), data = HMDA)
fm6 < glm(deny ~ afam * (pirat + hirat) + lvrat + chist + mhist + phist + insurance +
selfemp + single + hschool + unemp, family = binomial(link = "probit"), data = HMDA)
coeftest(fm1, vcov = sandwich)
fm4r < update(fm4, . ~ .  single  hschool  unemp)
waldtest(fm4, fm4r, vcov = sandwich)
fm5r < update(fm5, . ~ .  single  hschool  unemp)
waldtest(fm5, fm5r, vcov = sandwich)
fm6r < update(fm6, . ~ .  single  hschool  unemp)
waldtest(fm6, fm6r, vcov = sandwich)
fm5r2 < update(fm5, . ~ .  I(mhist==3)  I(mhist==4)  I(chist==3)  I(chist==4) 
I(chist==5)  I(chist==6))
waldtest(fm5, fm5r2, vcov = sandwich)
fm6r2 < update(fm6, . ~ .  afam * (pirat + hirat) + pirat + hirat)
waldtest(fm6, fm6r2, vcov = sandwich)
fm6r3 < update(fm6, . ~ .  afam * (pirat + hirat) + pirat + hirat + afam)
waldtest(fm6, fm6r3, vcov = sandwich)
#########################################################
## Shooting down the "More Guns Less Crime" hypothesis ##
#########################################################
## data
data("Guns", package = "AER")
## Empirical Exercise 10.1
fm1 < lm(log(violent) ~ law, data = Guns)
fm2 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male, data = Guns)
fm3 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male + state, data = Guns)
fm4 < lm(log(violent) ~ law + prisoners + density + income +
population + afam + cauc + male + state + year, data = Guns)
coeftest(fm1, vcov = sandwich)
coeftest(fm2, vcov = sandwich)
printCoefmat(coeftest(fm3, vcov = sandwich)[1:9,])
printCoefmat(coeftest(fm4, vcov = sandwich)[1:9,])
###########################
## US traffic fatalities ##
###########################
## data from Stock and Watson (2007)
data("Fatalities", package = "AER")
Fatalities < transform(Fatalities,
## fatality rate (number of traffic deaths per 10,000 people living in that state in that year)
frate = fatal/pop * 10000,
## add discretized version of minimum legal drinking age
drinkagec = relevel(cut(drinkage, breaks = 18:22, include.lowest = TRUE, right = FALSE), ref = 4),
## any punishment?
punish = factor(jail == "yes"  service == "yes", labels = c("no", "yes"))
)
## plm package
library("plm")
## for comparability with Stata we use HC1 below
## p. 351, Eq. (10.2)
f1982 < subset(Fatalities, year == "1982")
fm_1982 < lm(frate ~ beertax, data = f1982)
coeftest(fm_1982, vcov = vcovHC(fm_1982, type = "HC1"))
## p. 353, Eq. (10.3)
f1988 < subset(Fatalities, year == "1988")
fm_1988 < lm(frate ~ beertax, data = f1988)
coeftest(fm_1988, vcov = vcovHC(fm_1988, type = "HC1"))
## pp. 355, Eq. (10.8)
fm_diff < lm(I(f1988$frate  f1982$frate) ~ I(f1988$beertax  f1982$beertax))
coeftest(fm_diff, vcov = vcovHC(fm_diff, type = "HC1"))
## pp. 360, Eq. (10.15)
## (1) via formula
fm_sfe < lm(frate ~ beertax + state  1, data = Fatalities)
## (2) by hand
fat < with(Fatalities,
data.frame(frates = frate  ave(frate, state),
beertaxs = beertax  ave(beertax, state)))
fm_sfe2 < lm(frates ~ beertaxs  1, data = fat)
## (3) via plm()
fm_sfe3 < plm(frate ~ beertax, data = Fatalities,
index = c("state", "year"), model = "within")
coeftest(fm_sfe, vcov = vcovHC(fm_sfe, type = "HC1"))[1,]
## uses different df in sd and pvalue
coeftest(fm_sfe2, vcov = vcovHC(fm_sfe2, type = "HC1"))[1,]
## uses different df in pvalue
coeftest(fm_sfe3, vcov = vcovHC(fm_sfe3, type = "HC1", method = "white1"))[1,]
## pp. 363, Eq. (10.21)
## via lm()
fm_stfe < lm(frate ~ beertax + state + year  1, data = Fatalities)
coeftest(fm_stfe, vcov = vcovHC(fm_stfe, type = "HC1"))[1,]
## via plm()
fm_stfe2 < plm(frate ~ beertax, data = Fatalities,
index = c("state", "year"), model = "within", effect = "twoways")
coeftest(fm_stfe2, vcov = vcovHC) ## different
## p. 368, Table 10.1, numbers refer to cols.
fm1 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"),
model = "pooling")
fm2 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"),
model = "within")
fm3 < plm(frate ~ beertax, data = Fatalities, index = c("state", "year"),
model = "within", effect = "twoways")
fm4 < plm(frate ~ beertax + drinkagec + jail + service + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm5 < plm(frate ~ beertax + drinkagec + jail + service + miles,
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm6 < plm(frate ~ beertax + drinkage + punish + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
fm7 < plm(frate ~ beertax + drinkagec + jail + service + miles + unemp + log(income),
data = Fatalities, index = c("state", "year"), model = "within", effect = "twoways")
## summaries not too close, s.e.s generally too small
coeftest(fm1, vcov = vcovHC)
coeftest(fm2, vcov = vcovHC)
coeftest(fm3, vcov = vcovHC)
coeftest(fm4, vcov = vcovHC)
coeftest(fm5, vcov = vcovHC)
coeftest(fm6, vcov = vcovHC)
coeftest(fm7, vcov = vcovHC)
######################################
## Cigarette consumption panel data ##
######################################
## data and transformations
data("CigarettesSW", package = "AER")
CigarettesSW < transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
rtax = tax/cpi,
rtdiff = (taxs  tax)/cpi
)
c1985 < subset(CigarettesSW, year == "1985")
c1995 < subset(CigarettesSW, year == "1995")
## convenience function: HC1 covariances
hc1 < function(x) vcovHC(x, type = "HC1")
## Equations 12.912.11
fm_s1 < lm(log(rprice) ~ rtdiff, data = c1995)
coeftest(fm_s1, vcov = hc1)
fm_s2 < lm(log(packs) ~ fitted(fm_s1), data = c1995)
fm_ivreg < ivreg(log(packs) ~ log(rprice)  rtdiff, data = c1995)
coeftest(fm_ivreg, vcov = hc1)
## Equation 12.15
fm_ivreg2 < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + rtdiff, data = c1995)
coeftest(fm_ivreg2, vcov = hc1)
## Equation 12.16
fm_ivreg3 < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + rtdiff + rtax,
data = c1995)
coeftest(fm_ivreg3, vcov = hc1)
## Table 12.1, p. 448
ydiff < log(c1995$packs)  log(c1985$packs)
pricediff < log(c1995$price/c1995$cpi)  log(c1985$price/c1985$cpi)
incdiff < log(c1995$income/c1995$population/c1995$cpi) 
log(c1985$income/c1985$population/c1985$cpi)
taxsdiff < (c1995$taxs  c1995$tax)/c1995$cpi  (c1985$taxs  c1985$tax)/c1985$cpi
taxdiff < c1995$tax/c1995$cpi  c1985$tax/c1985$cpi
fm_diff1 < ivreg(ydiff ~ pricediff + incdiff  incdiff + taxsdiff)
fm_diff2 < ivreg(ydiff ~ pricediff + incdiff  incdiff + taxdiff)
fm_diff3 < ivreg(ydiff ~ pricediff + incdiff  incdiff + taxsdiff + taxdiff)
coeftest(fm_diff1, vcov = hc1)
coeftest(fm_diff2, vcov = hc1)
coeftest(fm_diff3, vcov = hc1)
## checking instrument relevance
fm_rel1 < lm(pricediff ~ taxsdiff + incdiff)
fm_rel2 < lm(pricediff ~ taxdiff + incdiff)
fm_rel3 < lm(pricediff ~ incdiff + taxsdiff + taxdiff)
linearHypothesis(fm_rel1, "taxsdiff = 0", vcov = hc1)
linearHypothesis(fm_rel2, "taxdiff = 0", vcov = hc1)
linearHypothesis(fm_rel3, c("taxsdiff = 0", "taxdiff = 0"), vcov = hc1)
## testing overidentifying restrictions (J test)
fm_or < lm(residuals(fm_diff3) ~ incdiff + taxsdiff + taxdiff)
(fm_or_test < linearHypothesis(fm_or, c("taxsdiff = 0", "taxdiff = 0"), test = "Chisq"))
## warning: df (and hence pvalue) invalid above.
## correct df: # instruments  # endogenous variables
pchisq(fm_or_test[2,5], df.residual(fm_diff3)  df.residual(fm_or), lower.tail = FALSE)
#####################################################
## Project STAR: Studentteacher achievement ratio ##
#####################################################
## data
data("STAR", package = "AER")
## p. 488
fmk < lm(I(readk + mathk) ~ stark, data = STAR)
fm1 < lm(I(read1 + math1) ~ star1, data = STAR)
fm2 < lm(I(read2 + math2) ~ star2, data = STAR)
fm3 < lm(I(read3 + math3) ~ star3, data = STAR)
coeftest(fm3, vcov = sandwich)
## p. 489
fmke < lm(I(readk + mathk) ~ stark + experiencek, data = STAR)
coeftest(fmke, vcov = sandwich)
## equivalently:
##  reshape data from wide into long format
##  fit a single model nested in grade
## (a) variables and their levels
nam < c("star", "read", "math", "lunch", "school", "degree", "ladder",
"experience", "tethnicity", "system", "schoolid")
lev < c("k", "1", "2", "3")
## (b) reshaping
star < reshape(STAR, idvar = "id", ids = row.names(STAR),
times = lev, timevar = "grade", direction = "long",
varying = lapply(nam, function(x) paste(x, lev, sep = "")))
## (c) improve variable names and type
names(star)[5:15] < nam
star$id < factor(star$id)
star$grade < factor(star$grade, levels = lev,
labels = c("kindergarten", "1st", "2nd", "3rd"))
rm(nam, lev)
## (d) model fitting
fm < lm(I(read + math) ~ 0 + grade/star, data = star)
#################################################
## Quarterly US macroeconomic data (19572005) ##
#################################################
## data
data("USMacroSW", package = "AER")
library("dynlm")
usm < ts.intersect(USMacroSW, 4 * 100 * diff(log(USMacroSW[, "cpi"])))
colnames(usm) < c(colnames(USMacroSW), "infl")
## Equation 14.7, p. 536
fm_ar1 < dynlm(d(infl) ~ L(d(infl)),
data = usm, start = c(1962,1), end = c(2004,4))
coeftest(fm_ar1, vcov = sandwich)
## Equation 14.13, p. 538
fm_ar4 < dynlm(d(infl) ~ L(d(infl), 1:4),
data = usm, start = c(1962,1), end = c(2004,4))
coeftest(fm_ar4, vcov = sandwich)
## Equation 14.16, p. 542
fm_adl41 < dynlm(d(infl) ~ L(d(infl), 1:4) + L(unemp),
data = usm, start = c(1962,1), end = c(2004,4))
coeftest(fm_adl41, vcov = sandwich)
## Equation 14.17, p. 542
fm_adl44 < dynlm(d(infl) ~ L(d(infl), 1:4) + L(unemp, 1:4),
data = usm, start = c(1962,1), end = c(2004,4))
coeftest(fm_adl44, vcov = sandwich)
## Granger causality test mentioned on p. 547
waldtest(fm_ar4, fm_adl44, vcov = sandwich)
## Equation 14.28, p. 559
fm_sp1 < dynlm(infl ~ log(gdpjp), start = c(1965,1), end = c(1981,4), data = usm)
coeftest(fm_sp1, vcov = sandwich)
## Equation 14.29, p. 559
fm_sp2 < dynlm(infl ~ log(gdpjp), start = c(1982,1), end = c(2004,4), data = usm)
coeftest(fm_sp2, vcov = sandwich)
## Equation 14.34, p. 563: ADF by hand
fm_adf < dynlm(d(infl) ~ L(infl) + L(d(infl), 1:4),
data = usm, start = c(1962,1), end = c(2004,4))
coeftest(fm_adf)
## Figure 14.5, p. 570
## SW perform partial break test of unemp coefs
## here full model is used
library("strucchange")
infl < usm[, "infl"]
unemp < usm[, "unemp"]
usm < ts.intersect(diff(infl), lag(diff(infl), k = 1), lag(diff(infl), k = 2),
lag(diff(infl), k = 3), lag(diff(infl), k = 4), lag(unemp, k = 1),
lag(unemp, k = 2), lag(unemp, k = 3), lag(unemp, k = 4))
colnames(usm) < c("dinfl", paste("dinfl", 1:4, sep = ""), paste("unemp", 1:4, sep = ""))
usm < window(usm, start = c(1962, 1), end = c(2004, 4))
fs < Fstats(dinfl ~ ., data = usm)
sctest(fs, type = "supF")
plot(fs)
## alternatively: reuse fm_adl44
mf < model.frame(fm_adl44)
mf < ts(as.matrix(mf), start = c(1962, 1), freq = 4)
colnames(mf) < c("y", paste("x", 1:8, sep = ""))
ff < as.formula(paste("y", "~", paste("x", 1:8, sep = "", collapse = " + ")))
fs < Fstats(ff, data = mf, from = 0.1)
plot(fs)
lines(boundary(fs, alpha = 0.01), lty = 2, col = 2)
lines(boundary(fs, alpha = 0.1), lty = 3, col = 2)
##########################################
## Monthly US stock returns (19312002) ##
##########################################
## package and data
library("dynlm")
data("USStocksSW", package = "AER")
## Table 14.3, p. 540
fm1 < dynlm(returns ~ L(returns), data = USStocksSW, start = c(1960,1))
coeftest(fm1, vcov = sandwich)
fm2 < dynlm(returns ~ L(returns, 1:2), data = USStocksSW, start = c(1960,1))
waldtest(fm2, vcov = sandwich)
fm3 < dynlm(returns ~ L(returns, 1:4), data = USStocksSW, start = c(1960,1))
waldtest(fm3, vcov = sandwich)
## Table 14.7, p. 574
fm4 < dynlm(returns ~ L(returns) + L(d(dividend)),
data = USStocksSW, start = c(1960, 1))
fm5 < dynlm(returns ~ L(returns, 1:2) + L(d(dividend), 1:2),
data = USStocksSW, start = c(1960, 1))
fm6 < dynlm(returns ~ L(returns) + L(dividend),
data = USStocksSW, start = c(1960, 1))
##################################
## Price of frozen orange juice ##
##################################
## load data
data("FrozenJuice")
## Stock and Watson, p. 594
library("dynlm")
fm_dyn < dynlm(d(100 * log(price/ppi)) ~ fdd, data = FrozenJuice)
coeftest(fm_dyn, vcov = vcovHC(fm_dyn, type = "HC1"))
## equivalently, returns can be computed 'by hand'
## (reducing the complexity of the formula notation)
fj < ts.union(fdd = FrozenJuice[, "fdd"],
ret = 100 * diff(log(FrozenJuice[,"price"]/FrozenJuice[,"ppi"])))
fm_dyn < dynlm(ret ~ fdd, data = fj)
## Stock and Watson, p. 595
fm_dl < dynlm(ret ~ L(fdd, 0:6), data = fj)
coeftest(fm_dl, vcov = vcovHC(fm_dl, type = "HC1"))
## Stock and Watson, Table 15.1, p. 620, numbers refer to columns
## (1) Dynamic Multipliers
fm1 < dynlm(ret ~ L(fdd, 0:18), data = fj)
coeftest(fm1, vcov = NeweyWest(fm1, lag = 7, prewhite = FALSE))
## (2) Cumulative Multipliers
fm2 < dynlm(ret ~ L(d(fdd), 0:17) + L(fdd, 18), data = fj)
coeftest(fm2, vcov = NeweyWest(fm2, lag = 7, prewhite = FALSE))
## (3) Cumulative Multipliers, more lags in NW
coeftest(fm2, vcov = NeweyWest(fm2, lag = 14, prewhite = FALSE))
## (4) Cumulative Multipliers with monthly indicators
fm4 < dynlm(ret ~ L(d(fdd), 0:17) + L(fdd, 18) + season(fdd), data = fj)
coeftest(fm4, vcov = NeweyWest(fm4, lag = 7, prewhite = FALSE))
## monthly indicators needed?
fm4r < update(fm4, . ~ .  season(fdd))
waldtest(fm4, fm4r, vcov= NeweyWest(fm4, lag = 7, prewhite = FALSE)) ## close ...
#############################################
## New York Stock Exchange composite index ##
#############################################
## returns
data("NYSESW", package = "AER")
ret < 100 * diff(log(NYSESW))
plot(ret)
## fit GARCH(1,1)
library("tseries")
fm < garch(coredata(ret))
Data on the duration of strikes in US manufacturing industries, 1968–1976.
data("StrikeDuration")
A data frame containing 62 observations on 2 variables for the period 1968–1976.
strike duration in days.
unanticipated output (a measure of unanticipated aggregate industrial production net of seasonal and trend components).
The original data provided by Kennan (1985) are on a monthly basis, for the period 1968(1) through 1976(12). Greene (2003) only provides the June data for each year. Also, the duration for observation 36 is given as 3 by Greene while Kennan has 2. Here we use Greene's version.
uoutput
is the residual from a regression of the logarithm of industrial production in manufacturing on time, time squared, and monthly dummy variables.
Online complements to Greene (2003).
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Kennan, J. (1985). The Duration of Contract Strikes in US Manufacturing. Journal of Econometrics, 28, 5–28.
data("StrikeDuration")
library("MASS")
## Greene (2003), Table 22.10
fit_exp < fitdistr(StrikeDuration$duration, "exponential")
fit_wei < fitdistr(StrikeDuration$duration, "weibull")
fit_wei$estimate[2]^(1)
fit_lnorm < fitdistr(StrikeDuration$duration, "lognormal")
1/fit_lnorm$estimate[2]
exp(fit_lnorm$estimate[1])
## Weibull and lognormal distribution have
## different parameterizations, see Greene p. 794
## Greene (2003), Example 22.10
library("survival")
fm_wei < survreg(Surv(duration) ~ uoutput, dist = "weibull", data = StrikeDuration)
summary(fm_wei)
Methods to standard generics for instrumentalvariable regressions
fitted by ivreg
.
## S3 method for class 'ivreg'
summary(object, vcov. = NULL, df = NULL, diagnostics = FALSE, ...)
## S3 method for class 'ivreg'
anova(object, object2, test = "F", vcov = NULL, ...)
## S3 method for class 'ivreg'
terms(x, component = c("regressors", "instruments"), ...)
## S3 method for class 'ivreg'
model.matrix(object, component = c("projected", "regressors", "instruments"), ...)
object , object2 , x

an object of class 
vcov. , vcov

a specification of the covariance matrix of the estimated
coefficients. This can be specified as a matrix or as a function yielding a matrix
when applied to the fitted model. If it is a function it is also employed in the two
diagnostic F tests (if 
df 
the degrees of freedom to be used. By default this is set to
residual degrees of freedom for which a t or F test is computed. Alternatively,
it can be set to 
diagnostics 
logical. Should diagnostic tests for the instrumentalvariable regression be carried out? These encompass an F test of the first stage regression for weak instruments, a WuHausman test for endogeneity, and a Sargan test of overidentifying restrictions (only if there are more instruments than regressors). 
test 
character specifying whether to compute the large sample Chisquared statistic (with asymptotic Chisquared distribution) or the finite sample F statistic (with approximate F distribution). 
component 
character specifying for which component of the
terms or model matrix should be extracted. 
... 
currently not used. 
ivreg
is the highlevel interface to the workhorse function ivreg.fit
,
a set of standard methods (including summary
, vcov
, anova
,
hatvalues
, predict
, terms
, model.matrix
, update
, bread
,
estfun
) is available.
## data
data("CigarettesSW")
CigarettesSW < transform(CigarettesSW,
rprice = price/cpi,
rincome = income/population/cpi,
tdiff = (taxs  tax)/cpi
)
## model
fm < ivreg(log(packs) ~ log(rprice) + log(rincome)  log(rincome) + tdiff + I(tax/cpi),
data = CigarettesSW, subset = year == "1995")
summary(fm)
summary(fm, vcov = sandwich, df = Inf, diagnostics = TRUE)
## ANOVA
fm2 < ivreg(log(packs) ~ log(rprice)  tdiff, data = CigarettesSW, subset = year == "1995")
anova(fm, fm2, vcov = sandwich, test = "Chisq")
Crosssection data originating from the health survey SOMIPOPS for Switzerland in 1981.
data("SwissLabor")
A data frame containing 872 observations on 7 variables.
Factor. Did the individual participate in the labor force?
Logarithm of nonlabor income.
Age in decades (years divided by 10).
Years of formal education.
Number of young children (under 7 years of age).
Number of older children (over 7 years of age).
Factor. Is the individual a foreigner (i.e., not Swiss)?
Journal of Applied Econometrics Data Archive.
http://qed.econ.queensu.ca/jae/1996v11.3/gerfin/
Gerfin, M. (1996). Parametric and SemiParametric Estimation of the Binary Response Model of Labour Market Participation. Journal of Applied Econometrics, 11, 321–339.
data("SwissLabor")
### Gerfin (1996), Table I.
fm_probit < glm(participation ~ . + I(age^2), data = SwissLabor,
family = binomial(link = "probit"))
summary(fm_probit)
### alternatively
fm_logit < glm(participation ~ . + I(age^2), data = SwissLabor,
family = binomial)
summary(fm_logit)
Data on course evaluations, course characteristics, and professor characteristics for 463 courses for the academic years 2000–2002 at the University of Texas at Austin.
data("TeachingRatings")
A data frame containing 463 observations on 13 variables.
factor. Does the instructor belong to a minority (nonCaucasian)?
the professor's age.
factor indicating instructor's gender.
factor. Is the course a singlecredit elective (e.g., yoga, aerobics, dance)?
rating of the instructor's physical appearance by a panel of six students, averaged across the six panelists, shifted to have a mean of zero.
course overall teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent).
factor. Is the course an upper or lower division course? (Lower division courses are mainly large freshman and sophomore courses)?
factor. Is the instructor a native English speaker?
factor. Is the instructor on tenure track?
number of students that participated in the evaluation.
number of students enrolled in the course.
factor indicating instructor identifier.
A sample of student instructional ratings for a group of university teachers along with beauty rating (average from six independent judges) and a number of other characteristics.
The data were provided by Prof. Hamermesh. The first 8 variables are also available in the online complements to Stock and Watson (2007) at
Hamermesh, D.S., and Parker, A. (2005). Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity. Economics of Education Review, 24, 369–376.
Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.
data("TeachingRatings", package = "AER")
## evaluation score vs. beauty
plot(eval ~ beauty, data = TeachingRatings)
fm < lm(eval ~ beauty, data = TeachingRatings)
abline(fm)
summary(fm)
## prediction of Stock & Watson's evaluation score
sw < with(TeachingRatings, mean(beauty) + c(0, 1) * sd(beauty))
names(sw) < c("Watson", "Stock")
predict(fm, newdata = data.frame(beauty = sw))
## Hamermesh and Parker, 2005, Table 3
fmw < lm(eval ~ beauty + gender + minority + native + tenure + division + credits,
weights = students, data = TeachingRatings)
coeftest(fmw, vcov = vcovCL, cluster = TeachingRatings$prof)
US time series data, 1909–1949.
data("TechChange")
An annual multiple time series from 1909 to 1949 with 3 variables.
Output.
Capital/labor ratio.
Index of technology.
Online complements to Greene (2003), Table F7.2.
https://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Solow, R. (1957). Technical Change and the Aggregate Production Function. Review of Economics and Statistics, 39, 312–320.
data("TechChange")
## Greene (2003)
## Exercise 7.1
fm1 < lm(I(output/technology) ~ log(clr), data = TechChange)
fm2 < lm(I(output/technology) ~ I(1/clr), data = TechChange)
fm3 < lm(log(output/technology) ~ log(clr), data = TechChange)
fm4 < lm(log(output/technology) ~ I(1/clr), data = TechChange)
## Exercise 7.2 (a) and (c)
plot(I(output/technology) ~ clr, data = TechChange)
library("strucchange")
sctest(I(output/technology) ~ log(clr), data = TechChange, type = "Chow", point = c(1942, 1))
Fitting and testing tobit regression models for censored data.
tobit(formula, left = 0, right = Inf, dist = "gaussian",
subset = NULL, data = list(), ...)
formula 
a symbolic description of a regression model of type

left 
left limit for the censored dependent variable 
right 
right limit for the censored dependent variable 
dist 
assumed distribution for the dependent variable 
subset 
a specification of the rows to be used. 
data 
a data frame containing the variables in the model. 
... 
further arguments passed to 
The function tobit
is a convenience interface to survreg
(for survival regression, including censored regression) setting different
defaults and providing a more convenient interface for specification
of the censoring information.
The default is the classical tobit model (Tobin 1958, Greene 2003) assuming a normal distribution for the dependent variable with leftcensoring at 0.
Technically, the formula of type y ~ x1 + x2 + ...
passed to tobit
is simply transformed into a formula suitable for survreg
: This means
the dependent variable is first censored and then wrapped into a Surv
object containing the censoring information which is subsequently passed to
survreg
, e.g., Surv(ifelse(y <= 0, 0, y), y > 0, type = "left") ~ x1 + x2 + ...
for the default settings.
An object of class "tobit"
inheriting from class "survreg"
.
Greene, W.H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.
Tobin, J. (1958). Estimation of Relationships for Limited Dependent Variables. Econometrica, 26, 24–36.
data("Affairs")
## from Table 22.4 in Greene (2003)
fm.tobit < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
data = Affairs)
fm.tobit2 < tobit(affairs ~ age + yearsmarried + religiousness + occupation + rating,
right = 4, data = Affairs)
summary(fm.tobit)
summary(fm.tobit2)
Macroeconomic time series data from 1946 to 1966 on trade credit and the money market.
data("TradeCredit")
An annual multiple time series from 1946 to 1966 on 7 variables.
Nominal total trade money.
Nominal effective reserve money.
GNP in current dollars.
Degree of market utilization.
Shortterm rate of interest.
Mean real size of the representative economic unit (1939 = 100).
GNP price deflator (1958 = 100).
The data are from Baltagi (2002).
Baltagi, B.H. (2002). Econometrics, 3rd ed. Berlin, Springer.
Laffer, A.B. (1970). Trade Credit and the Money Market. Journal of Political Economy, 78, 239–267.
data("TradeCredit")
plot(TradeCredit)
Data on travel mode choice for travel between Sydney and Melbourne, Australia.
data