Title: | Approximate Marginal Inference for Regression-Scale Models |
---|---|
Description: | Likelihood inference based on higher order approximations for linear nonnormal regression models. |
Authors: | S original by Alessandra R. Brazzale <[email protected]>. R port by Alessandra R. Brazzale <[email protected]>, following earlier work by Douglas Bates. |
Maintainer: | Alessandra R. Brazzale <[email protected]> |
License: | GPL (>= 2) | file LICENCE |
Version: | 1.2-2.1 |
Built: | 2024-11-12 06:34:14 UTC |
Source: | CRAN |
Likelihood inference based on higher order approximations for linear nonnormal regression models
Package: | marg |
Version: | 1.2-0 |
Date: | 2009-10-03 |
Depends: | R (>= 2.6.0), statmod, survival |
Suggests: | boot, cond, csampling, nlreg |
License: | GPL (>= 2) |
URL: | http://www.r-project.org, http://statwww.epfl.ch/AA/ |
LazyLoad: | yes |
LazyData: | yes |
Index:
Functions: ========= cond Approximate Conditional Inference - Generic Function cond.rsm Approximate Conditional Inference in Regression-Scale Models dHuber Huber's Least Favourable Distribution family.rsm Use family() on a "rsm" object family.rsm.object Family Object for Regression-Scale Models logLik.rsm Compute the Log Likelihood for Regression-Scale Models marg.object Approximate Marginal Inference Object plot.marg Generate Plots for an Approximate Marginal Inference Object print.summary.marg Use print() on a "summary.marg" object residuals.rsm Compute Residuals for Regression-Scale Models rsm Fit a Regression-Scale Model rsm.diag Diagnostics for Regression-Scale Models rsm.diag.plots Diagnostic Plots for Regression-Scale Models rsm.families Generate a RSM Family Object rsm.fit Fit a Regression-Scale Model Without Computing the Model Matrix rsm.null Fit an Empty Regression-Scale Model rsm.object Regression-Scale Model Object rsm.surv Fit a Regression-Scale Model Without Computing the Model Matrix summary.marg Summary Method for Objects of Class "marg" summary.rsm Summary Method for Regression-Scale Models update.rsm Update and Re-fit a RSM Model Call vcov.rsm Calculate Variance-Covariance Matrix for a Fitted RSM Model Datasets: ======== darwin Darwin's Data on Growth Rates of Plants houses House Price Data nuclear Nuclear Power Station Data venice Sea Level Data
Further information is available in the following vignettes:
Rnews-paper |
hoa: An R Package Bundle for Higher Order Likelihood Inference (source, pdf) |
S original by Alessandra R. Brazzale <[email protected]>. R port by Alessandra R. Brazzale <[email protected]>, following earlier work by Douglas Bates.
Maintainer: Alessandra R. Brazzale <[email protected]>
Performs approximate conditional inference.
cond(object, offset, ...)
cond(object, offset, ...)
object |
a fitted model object. Families supported are binomial and
Poisson with canonical link function (class |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest. May be numerical or a
two-level factor. In case of a two-level factor, it must be
coded by contrasts and not appear as two dummy variables in the
model. Can also be a call to a mathematical function (such as
|
... |
absorbs any additional arguments. See |
This function is generic (see methods
); method
functions can be written to handle specific classes of data.
Classes which already have methods for this function include:
glm
and rsm
.
The returned value is an approximate conditional inference
object. Classes already supported are cond
and
marg
depending on whether the fitted model object passed
through the object
argument has class glm
or
rsm
. See cond.object
or
marg.object
for more details.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne. Chapter 6.
cond.glm
, cond.rsm
,
cond.object
, marg.object
## Urine Data ## Not run: data(urine) urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + log(calc), family = binomial, data = urine) ## ## function call as offset variable labels(coef(urine.glm)) cond(urine.glm, log(calc)) ## ## large estimate of regression coefficient urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.glm <- glm(r ~ I(gravity * 100) + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.cond <- cond(urine.glm, I(gravity * 100)) plot(urine.cond, which = 4) ## End(Not run) ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale) plot(houses.marg, which = 2)
## Urine Data ## Not run: data(urine) urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + log(calc), family = binomial, data = urine) ## ## function call as offset variable labels(coef(urine.glm)) cond(urine.glm, log(calc)) ## ## large estimate of regression coefficient urine.glm <- glm(r ~ gravity + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.glm <- glm(r ~ I(gravity * 100) + ph + osmo + cond + urea + calc, family = binomial, data = urine) coef(urine.glm) urine.cond <- cond(urine.glm, I(gravity * 100)) plot(urine.cond, which = 4) ## End(Not run) ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale) plot(houses.marg, which = 2)
Performs approximate conditional inference on a scalar parameter of
interest in regression-scale models. The output is stored in an
object of class marg
.
## S3 method for class 'rsm' cond(object, offset, formula = NULL, family = NULL, dispersion = NULL, data = sys.frame(sys.parent()), pts = 20, n = max(100, 2*pts), tms = 0.6, from = NULL, to = NULL, control = glm.control(...), trace = FALSE, ...)
## S3 method for class 'rsm' cond(object, offset, formula = NULL, family = NULL, dispersion = NULL, data = sys.frame(sys.parent()), pts = 20, n = max(100, 2*pts), tms = 0.6, from = NULL, to = NULL, control = glm.control(...), trace = FALSE, ...)
object |
a |
offset |
either the covariate occurring in the model formula whose
coefficient represents the parameter of interest or |
formula |
a formula expression (only if no |
family |
a |
dispersion |
argument only to be used if no |
data |
an optional data frame in which to interpret the variables
occurring in the formula (only if no |
pts |
number of output points (minimum 10) that are calculated exactly; the default is 20. |
n |
approximate number of output points (minimum 50) produced by the
spline interpolation. The default is the maximum between 100 and
twice |
tms |
defines the range MLE +/- |
from |
starting value of the sequence that contains the values of the parameter of interest for which output points are calculated exactly. The default is MLE - 3.5 * s.e. |
to |
ending value of the sequence that contains the values of the parameter of interest for which output points are calculated exactly. The default is MLE + 3.5 * s.e. |
control |
a list of iteration and algorithmic constants that control the
|
trace |
if |
... |
additional arguments, such as |
This function is a method for the generic function cond
for class rsm
. It can be invoked by calling cond
for
an object of the appropriate class, or directly by calling
cond.rsm
regardless of the class of the object.
cond.rsm
has also to be used if the rsm
object is not
provided throught the object
argument but specified by
formula
and family
.
The function cond.rsm
implements several small sample
asymptotic methods for approximate conditional inference in
regression-scale models. Approximations for both the modified/marginal
log likelihood function and approximate conditional/marginal tail
probabilities are
available (see marg.object
for details). Attention is
restricted to a scalar parameter of interest, either a regression
coefficient or the scale parameter. In the first case, the
associated covariate may be either numerical or a two-level factor.
Approximate conditional (or equivalently marginal) inference is performed
by either updating a
fitted regression-scale model or defining the model formula and
family. All approximations are calculated exactly for pts
equally spaced points ranging from from
to to
. A
spline interpolation is used to extend them over the whole interval
of interest, except for the range of values defined by MLE
+/- tms
* s.e. where the spline interpolation is
replaced by a higher order polynomial interpolation. This is done
in order to avoid numerical instabilities which are likely to occur
for values of the parameter of interest close to the MLE.
Results
are stored in an object of class marg
. Method functions
like print
, summary
and
plot
can be used to examine the output or
represent it graphically. Components can be extracted using
coef
, formula
and
family
.
Main references for the methods considered are the papers by Barndorff-Nielsen (1991), DiCiccio, Field and Fraser (1990) and DiCiccio and Field (1991). The theory and statistics used are summarized in Brazzale (2000, Chapters 2 and 3). More details of the implementation are given in Brazzale (1999; 2000, Section 6.3.1).
The returned value is an object of class marg
; see
marg.object
for details.
If the parameter of interest is the scale parameter, all calculations are performed on the logarithmic scale, though most results are reported on the original scale.
In rare occasions, cond.rsm
dumps because of non-convergence
of the function rsm
which is used to refit the model
for a fixed value of the parameter of interest. This happens for
instance if this value is too extreme. The arguments from
and to
may then be used to limit the default range of
MLE +/- 3.5 * s.e. A further possibility is to
fine-tuning the constants (number of iterations, convergence
threshold) that control the rsm
fit through the
control
argument.
cond.rsm
may also dump if the estimate of the parameter of
interest is large (tipically > 400) in absolute value. This may be
avoided by reparametrizing the model.
Barndorff-Nielsen, O. E. (1991) Modified signed log likelihood ratio. Biometrika, 78, 557–564.
Brazzale, A. R. (1999) Approximate conditional inference for logistic and loglinear models. J. Comput. Graph. Statist., 8, 653–661.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
DiCiccio, T. J., Field, C. A. and Fraser, D. A. S. (1990) Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika, 77, 77–95.
DiCiccio, T. J. and Field, C. A. (1991) An accurate method for approximate conditional and Bayesian inference about linear regression models from censored data. Biometrika, 78, 903–910.
marg.object
, summary.marg
,
plot.marg
, rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) ## ## quadratic model fitted to the sea level, includes 18.62-year ## astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) names(coef(venice.rsm)) ## "(Intercept)" "Year" "I(Year^2)" "c11" "s11" "c19" "s19" ## ## variable of interest: quadratic term venice.marg <- cond(venice.rsm, I(Year^2)) ## detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale)
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) ## ## quadratic model fitted to the sea level, includes 18.62-year ## astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) names(coef(venice.rsm)) ## "(Intercept)" "Year" "I(Year^2)" "c11" "s11" "c19" "s19" ## ## variable of interest: quadratic term venice.marg <- cond(venice.rsm, I(Year^2)) ## detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## ## parameter of interest: scale parameter houses.marg <- cond(houses.rsm, scale)
The darwin
data frame has 15 rows and 3 columns.
Charles Darwin conducted an experiment to examine the superiority of cross-fertilized plants over self-fertilized plants. 15 pairs of plants were used. Each pair consisted of one cross-fertilized plant and one self-fertilized plant which germinated at the same time and grew in the same pot. The heights of the plants were measured at a fixed time after planting. Four different pots were used.
data(darwin)
data(darwin)
This data frame contains the following columns:
cross
the heights of the cross-fertilized plants (in inches);
self
the heights of the self-fertilized plants (in inches);
pot
a factor variable for the pot number.
The data were obtained from
Andrews, D. F. and Herzberg, A. M. (1985) Data: A Collection of Problems From Many Fields for the Student and Research Worker (Chapter 2). New York: Springer-Verlag.
Darwin, C. (1878) The Effects of Cross and Self Fertilisation in the Vegetable Kingdom (2nd ed.). London: John Murray.
data(darwin) plot(cross - self ~ pot, data = darwin)
data(darwin) plot(cross - self ~ pot, data = darwin)
This is a method for the function family()
for objects
from which a family.rsm
object can be extracted. Typically
a fitted rsm
model object. See family
for
the general behaviour of this function.
## S3 method for class 'rsm' family(object, ...)
## S3 method for class 'rsm' family(object, ...)
object |
any object from which a |
... |
absorbs any additional argument. |
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) family(venice.rsm) detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) family(houses.rsm)
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) family(venice.rsm) detach() ## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) family(houses.rsm)
Class of objects that characterize the error distribution of a regression-scale model.
This class of objects is returned by a call to a family.rsm
generator function. See rsm.families
for the
distributions which are supported by the marg
package. The
object includes a list of functions and expressions that
characterize the error distribution of a regression-scale model.
These are used by the IRLS algorithm implemented in the
rsm
fitting routine. New families can be added to the
ones already supported. See the demonstration file
‘margdemo.R’ that ships with the package. There is a
print
method for family.rsm
objects which
produces a simple summary without any detail; use
unclass(family.rsm.object)
to see the contents.
The following components, with the corresponding functionality,
are required for a family.rsm
object:
family
a character vector giving the family name.
g0
a function that yields minus the log density of the error distribution in the regression-scale model.
g1
a function that yields the first derivative of minus the log density.
g2
a function that yields the second derivative of minus the log density.
df
argument with NULL
value; must be included to guarantee
compatibility with the existing code.
k
argument with NULL
value; must be included to guarantee
compatibility with the existing code.
For the sake of compatibility, the g0
, g1
and
g2
functions of a user-defined family can only take two
arguments: y
representing an observation and the
...
argument which absorbes any additional arguments.
The houses
data frame has 26 rows and 5 columns.
Ms. Terry Tasch of Long-Kogan Realty, Chicago, provides data on the selling prices of houses and on different variables describing their status. This data frame contains the prices and a subset of the covariates.
data(houses)
data(houses)
This data frame contains the following columns:
price
selling price (in thousands of dollars);
bdroom
number of bedrooms;
floor
floor space (in square feet);
rooms
total number of rooms;
front
front footage of lot (in feet).
The data were obtained from
Sen, A. and Srivastava, M. (1990) Regression Analysis: Theory, Methods and Applications (Exhibit 2.2, page 32). New York: Springer-Verlag.
data(houses) summary(houses) pairs(houses)
data(houses) summary(houses) pairs(houses)
Density, cumulative distribution, quantiles and random number generator for Huber's least favourable distribution.
dHuber(x, k = 1.345) pHuber(q, k = 1.345) qHuber(p, k = 1.345) rHuber(n, k = 1.345)
dHuber(x, k = 1.345) pHuber(q, k = 1.345) qHuber(p, k = 1.345) rHuber(n, k = 1.345)
x |
vector of quantiles. Missing values ( |
q |
vector of quantiles. Missing values ( |
p |
vector of probabilities. Missing values ( |
n |
sample size. If |
k |
tuning constant. Values should preferably lie between 1 and 1.5. The default is 1.345, which gives 95% efficiency at the Normal. |
Inversion of the cumulative distribution function is used to generate deviates from Huber's least favourable distribution.
Density (dHuber
), probability (pHuber
),
quantile (qHuber
), or random sample (rHuber
)
for Huber's least favourable distribution with tuning constant
k
. If values are missing, NA
s will be returned.
The function rHuber
causes creation of the dataset
.Random.seed
if it does not already exist; otherwise its
value is updated.
Huber's least favourable distribution is a compound distribution
with gaussian behaviour in the interval (-k
,k
) and
double exponential tails. It is strongly related to Huber's
M-estimator, which represents the maximum likelihood estimator of
the location parameter.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986) Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.
pHuber(0.5) ## 0.680374 pHuber(0.5, k = 1.5) ## 0.6842623
pHuber(0.5) ## 0.680374 pHuber(0.5, k = 1.5) ## 0.6842623
Computes the log likelihood for regression-scale models.
## S3 method for class 'rsm' logLik(object, ...)
## S3 method for class 'rsm' logLik(object, ...)
object |
an object inheriting from class |
... |
absorbs any additional argument. |
This is a method for the function logLik()
for objects
inheriting from class rsm
.
Returns an object class logLik
which is a number
with attributes, attr(r, "df")
(degrees of freedom)
giving the number of parameters (regression coefficients plus
scale parameter, if not fixed) in the model.
The default
print
method for logLik
objects is used.
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## logLik(venice.rsm) detach()
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## logLik(venice.rsm) detach()
Class of objects returned when performing approximate conditional inference for regression-scale models.
Objects of class marg
are implemented as a list. The
following components are included:
workspace |
a list whose elements are the spline interpolations of several first order and higher order statistics. They are used to implement the following likelihood quantities: - the profile and modified profile/approximate marginal log likelihoods; - the Wald pivots from the unconditional and conditional/approximate marginal MLEs; - the profile and modified/approximate marginal likelihood roots; - the conditional/approximate marginal Lugannani-Rice tail area approximation; - the correction term used in the higher order statistics; - the conditional/marginal information and nuisance parameter aspects. Method functions work mainly on this part of the object. In order to avoid errors in the calculation of confidence intervals and tail probabilities, this part of the object should not be modified. |
coefficients |
a |
call |
the function call that created the |
formula |
the model formula. |
family |
the name of the error distribution. |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest or |
diagnostics |
diagnostics related to the decomposition of the higher order adjustments into an information and a nuisance parameters term. |
n.approx |
the number of output points for which the statistics have been calculated exactly. |
omitted.val |
the range of values omitted in the spline interpolation of some of the higher order statistics considered. The aim is to avoid numerical instabilities around the maximum likelihood estimate. |
is.scalar |
a logical value indicating whether there are any nuisance
parameters. If |
Main references for the methods considered are the papers by Barndorff-Nielsen (1991), DiCiccio, Field and Fraser (1990) and DiCiccio and Field (1991). The theory and statistics used are summarized in Brazzale (2000, Chapters 2 and 3). More details of the implementation and of the methods considered are given in Brazzale (1999; 2000, Section 6.3.1).
This class of objects is returned from calls to the function
cond.rsm
.
The class marg
has methods for summary
,
plot
, print
,
coef
and family
, among
others.
If the parameter of interest is the scale parameter, all calculations are performed on the logarithmic scale, though most results are reported on the original scale.
Barndorff-Nielsen, O. E. (1991) Modified signed log likelihood ratio. Biometrika, 78, 557–564.
Brazzale, A. R. (1999) Approximate conditional inference for logistic and loglinear models. J. Comput. Graph. Statist., 8, 653–661.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
DiCiccio, T. J., Field, C. A. and Fraser, D. A. S. (1990) Approximations of marginal tail probabilities and inference for scalar parameters. Biometrika, 77, 77–95.
DiCiccio, T. J. and Field, C. A. (1991) An accurate method for approximate conditional and Bayesian inference about linear regression models from censored data. Biometrika, 78, 903–910.
cond.rsm
, summary.marg
,
plot.marg
This data frame contains data on the construction of 32 light water reactors constructed in the USA.
data(nuclear)
data(nuclear)
A data frame with 32 observations on the following 11 variables.
cost
cost of construction (in billions of dollars adjusted to a 1976 base)
date
date at which the construction permit was issued
T1
time between application for and issue of permit
T2
time between issue of operating license and construction permit
cap
power plant capacity (in MWe)
PR
1
if light water reactor already present on site
NE
1
if constructed in north-east region of USA
CT
1
if cooling tower used
BW
1
if nuclear steam supply system manufactured by
Babcock-Wilcox
N
cumulative number of power plants constructed by each architect-engineer
PT
1
if partial turnkey plant
The data were obtained from
Cox, D.R. and Snell, E.J. (1981). Applied Statistics (page 81, Example G). Chapman and Hall, London.
data(nuclear)
data(nuclear)
Creates a set of plots for an object of class marg
.
## S3 method for class 'marg' plot(x = stop("nothing to plot"), from = x.axis[1], to = x.axis[n], which = NULL, alpha = 0.05, add.leg = TRUE, loc.leg = FALSE, add.labs = TRUE, cex = 0.7, cex.lab = 1, cex.axis = 1, cex.main = 1, lwd1 = 1, lwd2 = 2, lty1 = "solid", lty2 = "dashed", col1 = "black", col2 = "blue", tck = 0.02, las = 1, adj = 0.5, lab = c(15, 15, 5), ...)
## S3 method for class 'marg' plot(x = stop("nothing to plot"), from = x.axis[1], to = x.axis[n], which = NULL, alpha = 0.05, add.leg = TRUE, loc.leg = FALSE, add.labs = TRUE, cex = 0.7, cex.lab = 1, cex.axis = 1, cex.main = 1, lwd1 = 1, lwd2 = 2, lty1 = "solid", lty2 = "dashed", col1 = "black", col2 = "blue", tck = 0.02, las = 1, adj = 0.5, lab = c(15, 15, 5), ...)
x |
a |
from |
the starting value for the x-axis range. The default value has
been set by |
to |
the ending value for the x-axis range. The default value has been
set by |
which |
which plot to print. Admissible values are |
alpha |
the level used to read off confidence intervals; the default is 5%. |
add.leg |
if |
loc.leg |
if |
add.labs |
if |
cex , cex.lab , cex.axis , cex.main
|
the character expansions relative to the standard size of the
device to be used for printing text, labels, axes and main title.
See |
lwd1 , lwd2
|
the line widths used to compare different curves in the same plot;
default is |
lty1 , lty2
|
line type used to compare different curves in the same plot;
default is |
col1 , col2
|
colors used to compare different curves in the same plot; default
is |
tck , las , adj , lab
|
further graphical parameters. See |
... |
optional graphical parameters; see |
Several plots are produced for an object of class marg
. A
menu lists all the plots that can be produced. They may be one or
all of the following ones:
Make a plot selection (or 0 to exit) 1: All 2: Profile and modified profile log likelihoods 3: Profile and modified profile likelihood ratios 4: Profile and modified likelihood root 5: Lugannani-Rice approximation 6: Confidence intervals 7: Diagnostics based on INF/NP decomposition Selection:
If no nuisance parameters are presented, a subset of the above pictures is produced. A message is printed if this is the case. More details and examples are given in Brazzale (2000, Sections 6.5 and 5.3.2).
This function is a method for the generic function plot()
for
class marg
. It can be invoked by calling plot
or
directly plot.marg
for an object of the appropriate class.
A plot is created on the current graphics device.
The current device is cleared. When add.leg
is
TRUE
, a legend is added to each plot. Furthermore, if
loc.leg
is TRUE
, the location of the legend can
be set by the user. All screens are closed, but not cleared, on
termination of the function.
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
Four diagnostic plots are provided. The two panels on the right trace the information and nuisance correction terms, INF and NP, against the likelihood root statistic. These are generally smooth functions and used to approximate the information and nuisance parameter aspects as a function of the parameter of interest, as shown in the two panels on the left. This procedure has the advantage of largely eliminating the numerical instabilities that affect the statistics around the MLE. All four pictures are intended to give an idea of the order of magnitude of the two correction terms while trying to deal with the numerical problems that likely occur for these kinds of data.
More details can be found in Brazzale (2000, Appendix B.2).
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
cond.rsm
, marg.object
,
summary.marg
# Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) # # quadratic model fitted to the sea level, includes 18.62-year # astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.marg <- cond(venice.rsm, I(Year^2)) plot(venice.marg, which = 4) ## detach()
# Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) # # quadratic model fitted to the sea level, includes 18.62-year # astronomical tidal cycle and 11-year sunspot cycle venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.marg <- cond(venice.rsm, I(Year^2)) plot(venice.marg, which = 4) ## detach()
This is a method for the function print()
for objects of
class summary.marg
. See print
and print.default
for the general behaviour of
this function and for the interpretation of digits
.
## S3 method for class 'summary.marg' print(x, all = x$all, Coef = x$cf, int = x$int, test = x$hyp, digits = if(!is.null(x$digits)) x$digits else max(3, getOption("digits")-3), ...) ## S3 method for class 'summary.marg' print(x, all, Coef, int, test, digits, ...)
## S3 method for class 'summary.marg' print(x, all = x$all, Coef = x$cf, int = x$int, test = x$hyp, digits = if(!is.null(x$digits)) x$digits else max(3, getOption("digits")-3), ...) ## S3 method for class 'summary.marg' print(x, all, Coef, int, test, digits, ...)
x |
a |
all |
if |
Coef |
if |
int |
if |
test |
if |
digits |
the number of significant digits to be printed. The default
depends on the value of |
... |
additional arguments. |
Changing the default values of all
, Coef
, int
and test
allows only a subset of the information in the
summary.marg
object to be printed. With all = FALSE
,
one-sided confidence intervals and the Lugannani-Rice tail area
approximation are omitted. See summary.marg
for more
details.
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
The amount of information printed may vary depending on whether there are any nuisance parameters. A message is printed if there are none.
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.cond <- cond(houses.rsm, front) print(summary(houses.cond), digits = 4) print(summary(houses.cond), Coef = FALSE)
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.cond <- cond(houses.rsm, front) print(summary(houses.cond), digits = 4) print(summary(houses.cond), Coef = FALSE)
Computes one of the six types of residuals available for regression-scale models.
## S3 method for class 'rsm' residuals(object, type = c("deviance", "pearson", "response", "r.star", "prob", "deletion"), weighting = "observed", ...)
## S3 method for class 'rsm' residuals(object, type = c("deviance", "pearson", "response", "r.star", "prob", "deletion"), weighting = "observed", ...)
object |
an object inheriting from class |
type |
character string; defines the type of residuals, with choices
|
weighting |
character string; defines the weight matrix that should be used in
the calculation of the residuals and diagnostics. Possible
choices are |
... |
absorbs any additional argument. |
This is a method for the function residuals()
for objects
inheriting from class rsm
. As several types of residuals are
available for rsm
objects, there is an additional optional
argument type
. The "deviance"
, "pearson"
,
"r.star"
, "prob"
and "deletion"
residuals are
derived from the final IRLS fit. The "response"
residuals
are standardized residuals on the scale of the response, the
"prob"
residuals are on the scale,
whereas the remaining ones follow approximately the standard normal
distribution.
The default weighting scheme used is "observed"
. The weights
used are the values stored in the q2
component of the
rsm
object. Some of the IRLS weights
returned by rsm
may be negative if the error distribution
is Student's t or user-defined. In order to avoid missing values
in the residuals, the default weighting scheme used is then
"score"
unless otherwise specified. The "score"
weights are also used by default if Huber's least favourable error
distribution is used.
More details, in particular of the use of these residuals, are given in Brazzale (2000, Section 6.3.1).
A numeric vector of residuals. See Davison and Snell (1991) for detailed definitions of each type of residual.
The summary
method for rsm
objects produces
response residuals. The residuals
component of a rsm
object contains the response residuals.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D.V. Hinkley, N. Reid, and E.J. Snell), 83–106. London: Chapman \& Hall.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
Jorgensen, B. (1984). The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## residuals(venice.rsm) ## deviance residuals with observed weights residuals(venice.rsm, type = "r.star", weighting = "score") ## r* residuals with score weights detach()
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## residuals(venice.rsm) ## deviance residuals with observed weights residuals(venice.rsm, type = "r.star", weighting = "score") ## r* residuals with score weights detach()
Produces an object of class rsm
which is a regression-scale
model fit of the data.
rsm(formula = formula(data), family = gaussian, data = sys.frame(sys.parent()), dispersion = NULL, weights = NULL, subset = NULL, na.action = na.fail, offset = NULL, method = "rsm.surv", control = glm.control(maxit=100, trace=FALSE), model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)
rsm(formula = formula(data), family = gaussian, data = sys.frame(sys.parent()), dispersion = NULL, weights = NULL, subset = NULL, na.action = na.fail, offset = NULL, method = "rsm.surv", control = glm.control(maxit=100, trace=FALSE), model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)
formula |
a formula expression as for other linear regression models, of the
form |
family |
a |
data |
an optional data frame in which to interpret the variables
occurring in the model formula, or in the |
dispersion |
if |
weights |
the optional weights for the fitting criterion. If supplied, the
response variable and the covariates are multiplied by the weights
in the IRLS algorithm. The length of the |
subset |
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default. |
na.action |
a function to filter missing data. This is applied to the model
frame after any |
offset |
this can be used to specify an a priori known component to
be included in the linear predictor during fitting. An
|
method |
the fitting method to be used; the default is |
control |
a list of iteration and algorithmic constants. See
|
model |
if |
x |
if |
y |
if |
contrasts |
a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables, and the elements should either be contrast-type matrices (matrices with as many rows as levels of the factor and with columns linearly independent of each other and of a column of one's), or else they should be functions that compute such contrast matrices. |
... |
absorbs any additional argument. |
The model is fitted using Iteratively Reweighted Least
Squares, IRLS for short (Green, 1984,
Jorgensen, 1984). The working response and iterative
weights are computed using the functions contained in the
family.rsm
object.
The two workhorses of rsm
are rsm.fit
and
rsm.surv
, which expect an X
and Y
argument rather then a formula. The first function is used for the
families student
with df
3 and
Huber
;
the second one, based on the survreg.fit
routine for fitting parametric survival models, is used in case of
extreme
, logistic
, logWeibull
,
logExponential
, logRayleigh
and student
(with
df
> 2) error distributions. In the presence of a
user-defined error distribution the rsm.fit
routine is used.
The rsm.null
function is invoked to fit an empty (null)
model.
The details are given in Brazzale (2000, Section 6.3.1).
an object of class rsm
is returned which inherits from
glm
and lm
. See rsm.object
for details.
The output can be examined by print
,
summary
, rsm.diag.plots
and
anova
. Components can be extracted using
fitted
, residuals
,
formula
and family
. It can
be modified using update
. It has most of the
components of a glm
object, with a few more. Use
rsm.object
for further details.
In case of extreme
, logistic
, logWeibull
,
logExponential
, logRayleigh
and student
(with
df
> 2) error distributions, both methods,
rsm.fit
(default choice) and
rsm.surv
, can be used to fit the model.
There are, however, examples where one of the two algorithms (most
likely the one invoked by rsm.surv
) breaks
down. If this is the case, try and refit the model with the
alternative choice.
The message "negative iterative weights returned!"
is
returned if some of the iterative weights (q2
component of
the fitted rsm
object) are negative. These would be used by
default by the rsm.diag
routine for the definition of
residuals and regression diagnostics. In order to avoid missing
values (NA
s), the default weighting scheme "observed"
automatically switches to "score"
unless otherwise specified.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Green, P. J. (1984) Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with Discussion). J. R. Statist. Soc. B, 46, 149–192.
Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
rsm.object
, rsm.fit
,
rsm.surv
, rsm.null
,
rsm.families
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates houses.rsm <- rsm(price ~ ., family = student(5), data = houses, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient ## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme) ## quadratic model fitted to sea level data venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2)) ## linear model fit ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme) venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme) ## includes either astronomical cycle ## ## comparison of linear, quadratic and periodic (11-year, 19-year) models plot(year, sea, ylab = "sea level") lines(year, fitted(venice.1.rsm)) lines(year, fitted(venice.2.rsm), col="red") lines(year, fitted(venice.11.rsm), col="blue") lines(year, fitted(venice.19.rsm), col="green") ## detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross - self ~ pot - 1, family = student(3), data = darwin) ## Maximum likelihood estimates darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin) ## M-estimates
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates houses.rsm <- rsm(price ~ ., family = student(5), data = houses, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient ## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme) ## quadratic model fitted to sea level data venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2)) ## linear model fit ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme) venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme) ## includes either astronomical cycle ## ## comparison of linear, quadratic and periodic (11-year, 19-year) models plot(year, sea, ylab = "sea level") lines(year, fitted(venice.1.rsm)) lines(year, fitted(venice.2.rsm), col="red") lines(year, fitted(venice.11.rsm), col="blue") lines(year, fitted(venice.19.rsm), col="green") ## detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross - self ~ pot - 1, family = student(3), data = darwin) ## Maximum likelihood estimates darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin) ## M-estimates
Calculates different types of residuals, Cook's distance and the leverages for a regression-scale model.
rsm.diag(rsmfit, weighting = "observed")
rsm.diag(rsmfit, weighting = "observed")
rsmfit |
an |
weighting |
character string; defines the weight matrix that should be used
in the calculation of the residuals and diagnostics. Possible
choices are |
If the weighting scheme is "observed"
, the weights used are
the values stored in the q2
component of the rsm
object rsmfit
. Otherwise, they are calculated by
rsm.diag
. Some of the IRLS weights returned by
rsm
may be negative if the error distribution is Student's
t or user-defined. In order to avoid missing values in the
residuals and regression diagnostics, the default weighting scheme
used in rsm.diag
switches automatically from
"observed"
to "score"
unless otherwise specified. The
"score"
weights are also used by default if Huber's least
favourable error distribution is used.
There are three types of residuals. The response residuals are
taken on the response scale, whereas the probability transform
residuals are on the scale. The remaining
ones follow approximately the standard normal distribution.
More details and in particular the definitions of the above residuals and diagnostics can be found in Brazzale (2000, Section 6.3.1).
Returns a list with the following components:
resid |
the response residuals on the response scale. |
rd |
the standardized deviance residuals from the IRLS fit. |
rp |
the standardized Pearson residuals from the IRLS fit. |
rg |
the deletion residuals from the IRLS fit. |
rs |
the |
rcs |
the probability transform residuals from the IRLS fit. |
cook |
Cook's distance. |
h |
the leverages of the observations. |
dispersion |
the value of the scale parameter. |
This function is based on A.J. Canty's function glm.diag
contained in the package boot.
Huber's least favourable distribution represents a special case.
The regression diagnostics are only meaningful if the errors
truly follow a Huber-type distribution. This no longer holds
if the option family = Huber
in rsm
is used to
obtain the M-estimates of the parameters in place or the maximum
likelihood estimates.
Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.
Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D. V. Hinkley, N. Reid, and E. J. Snell), 83–106. London: Chapman & Hall.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
rsm.diag.plots
, rsm.object
,
summary.rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.diag <- rsm.diag(venice.rsm) ## observed weights detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross-self ~ pot - 1, family = Huber, data = darwin) darwin.diag <- rsm.diag(darwin.rsm) ## score weights
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) venice.diag <- rsm.diag(venice.rsm) ## observed weights detach() ## Darwin's Data on Growth Rates of Plants data(darwin) darwin.rsm <- rsm(cross-self ~ pot - 1, family = Huber, data = darwin) darwin.diag <- rsm.diag(darwin.rsm) ## score weights
Generates diagnostic plots for a regression-scale model using different types of residuals, Cook's distance and the leverages.
rsm.diag.plots(rsmfit, rsmdiag = NULL, weighting = NULL, which = NULL, subset = NULL, iden = FALSE, labels = NULL, ret = FALSE, ...) ## S3 method for class 'rsm' plot(x, ...)
rsm.diag.plots(rsmfit, rsmdiag = NULL, weighting = NULL, which = NULL, subset = NULL, iden = FALSE, labels = NULL, ret = FALSE, ...) ## S3 method for class 'rsm' plot(x, ...)
rsmfit , x
|
a |
rsmdiag |
the object returned by a call to |
weighting |
character string; defines the weight matrix that should be used in
the calculation of the residuals and diagnostics. Possible
choices are |
which |
which plot to print. Admissible values are |
subset |
subset of data used in the original |
iden |
logical argument. If |
labels |
a vector of labels for use with |
ret |
logical argument indicating if |
... |
additional arguments such as graphical parameters. |
The diagnostics required for the plots are calculated by
rsm.diag
. These are then used to produce the plots
on the current graphics device.
A menu lists all the plots that can be produced. They may be one or all of the following:
Make a plot selection (or 0 to exit) 1: All 2: Response residuals against fitted values 3: Deviance residuals against fitted values 4: QQ-plot of deviance residuals 5: Normal QQ-plot of r* residuals 6: Cook statistic against h/(1-h) 7: Cook statistic against observation number Selection:
In the normal scores plots, the dotted line represents the expected line if the residuals are normally distributed, i.e. it is the line with intercept 0 and slope 1.
In general, when plotting Cook's distance against the standardized
leverages, there will be two dotted lines on the plot. The
horizontal line is at , where
is
the number of observations and
is the number of
estimated parameters. Points above this line may be points with
high influence on the model. The vertical line is at
and points to the right of this line have
high leverage compared to the variance of the raw residual at that
point. If all points are below the horizontal line or to the left
of the vertical line then the line is not shown.
Use of iden = TRUE
is encouraged for proper exploration of
these plots as a guide to how well the model fits the data and
whether certain observations have an unduly large effect on
parameter estimates.
If ret
is TRUE
then the value of rsmdiag
is returned, otherwise there is no returned value.
The current device is cleared. If iden = TRUE
, interactive
identification of points is enabled. All screens are closed, but
not cleared, on termination of the function.
This function is based on A. J. Canty's function
glm.diag.plots
contained in the package boot.
Davison, A. C. and Snell, E. J. (1991) Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox (eds. D. V. Hinkley, N. Reid, and E. J. Snell), 83–106. London: Chapman & Hall, London.
Davison, A. C. and Tsai, C.-L. (1992) Regression model diagnostics. Int. Stat. Rev., 60, 337–353.
Jorgensen, B. (1984) The Delta Algorithm and GLIM. Int. Stat. Rev., 52, 283–300.
rsm.diag
, rsm.object
,
identify
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## Not run: rsm.diag.plots(venice.rsm, which = 3) ## End(Not run) ## or ## Not run: plot(venice.rsm) ## End(Not run) ## menu-driven ## rsm.diag.plots(venice.rsm, which = 5, las = 1) ## normal QQ-plot of r* residuals ## Not run: rsm.diag.plots(venice.rsm, which = 7, iden = T, labels = paste(1931:1981)) ## End(Not run) ## year 1932 highly influential detach()
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## Not run: rsm.diag.plots(venice.rsm, which = 3) ## End(Not run) ## or ## Not run: plot(venice.rsm) ## End(Not run) ## menu-driven ## rsm.diag.plots(venice.rsm, which = 5, las = 1) ## normal QQ-plot of r* residuals ## Not run: rsm.diag.plots(venice.rsm, which = 7, iden = T, labels = paste(1931:1981)) ## End(Not run) ## year 1932 highly influential detach()
Generates a family.rsm
object containing a list of functions
and expressions used by rsm
.
extreme() Huber(k = 1.345) logistic() logWeibull() student(df = stop("Argument \"df\" is missing, with no default"))
extreme() Huber(k = 1.345) logistic() logWeibull() student(df = stop("Argument \"df\" is missing, with no default"))
k |
the tuning constant in Huber's least favourable distribution. |
df |
the degrees of freedom in Student's t distribution. |
Each of the names are associated with a member of the class of error
distributions for regression-scale models. Users can construct
their own families, as long as they have components compatible with
those given in rsm.distributions
. The demonstration file
‘margdemo.R’ that accompanies the package shows how to
create a new generator function. When passed as an argument to
rsm
with the default setting, the empty parentheses
()
can be omitted. There is a print
method for the
class family.rsm
.
A family.rsm
object, which is a list of functions and
expressions used by rsm
in the iteratively reweighted
least-squares algorithm. See family.rsm.object
for
details.
family.rsm.object
, family.rsm
,
rsm
, Huber
student(df = 3) ## generates Student's t error distribution with 3 d.f. ## Not run: rsm(formula = value, data = value, family = extreme) ## End(Not run)
student(df = 3) ## generates Student's t error distribution with 3 d.f. ## Not run: rsm(formula = value, data = value, family = extreme) ## End(Not run)
Fits a rsm
model without computing the model matrix of the
response vector.
rsm.fit(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
rsm.fit(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
the model matrix (design matrix). |
Y |
the response vector. |
dispersion |
if |
score.dispersion |
must default to |
offset |
optional offset added to the linear predictor. |
family |
a |
maxit |
maximum number of iterations allowed. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.fit
function is called internally by the
rsm
routine to do the actual model fitting. Although
it is not intended to be used directly by the user, it may be useful
when the same data frame is used over and over again. It might save
computational time, since the model matrix is not created. No
formula needs to be specified as an argument. As no weights
argument is available, the response Y
and the model matrix
X
must already include the weights if weighting is desired.
an object which is a subset of a rsm
object.
The rsm.fit
function is the workhorse of the rsm
fitting routine for the student
(with df
2),
Huber
and user-defined error
distributions. It receives X
and Y
data rather than a
formula, but still uses the family.rsm
object to define the
IRLS steps. Users can write
their own versions of rsm.fit
, and pass the name of their
function via the method
argument to rsm
. Care should
be taken to include as many of the arguments as feasible, but
definitely the ...
argument, which will absorb any
additional argument given in the call from rsm
.
rsm
, rsm.surv
, rsm.null
,
rsm.object
, rsm.families
Fits a rsm
model with empty model matrix.
rsm.null(X = NULL, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
rsm.null(X = NULL, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
defaults to |
Y |
the response vector. |
dispersion |
either |
score.dispersion |
must default to |
offset |
optional offset added to the linear predictor. |
family |
a |
maxit |
maximum number of iterations allowed. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.null
function is called internally by the
rsm
routine to do the actual model fitting in case of an
empty model. It is not intended to be used directly by the user. As
no weights
argument is available, the response Y
and
the model matrix X
must already include the weights if
weighting is desired.
an object which is a subset of a rsm
object.
rsm
, rsm.surv
, rsm.fit
,
rsm.object
, rsm.families
Class of objects returned when fitting a regression-scale model.
The following components must be included in a rsm
object:
coefficients |
the coefficients of the linear predictor, which multiply the columns of the model matrix. The names of the coefficients are the names of the single-degree-of-freedom effects (the columns of the model matrix). If the model is over-determined there will be missing values in the coefficients corresponding to inestimable coefficients. |
dispersion |
the (estimated or known) value of the scale parameter. |
fixed |
a logical value. If |
residuals |
the response residuals from the fit. If weights were used, they
are not taken into account. If you need other kinds of
residuals, use the |
fitted.values |
the fitted values from the fit. If weights were used, the fitted values are not adjusted for the weights. |
loglik |
the log likelihood from the fit. |
q1 |
the value of the first derivative of minus the log density for each observation. |
q2 |
the value of the second derivative of minus the log density for each observation. |
rank |
the computed rank (number of linearly independent columns in the model matrix). |
R |
the unscaled observed information matrix. |
score.dispersion |
a list containing the value of the objective function, its gradient and the convergence diagnostic, that result from estimating the scale parameter. |
iter |
the number of IRLS iterations used to compute the estimates. |
weights |
the (optional) weights used for the fit. |
assign |
the list of assignments of coefficients (and effects) to the terms in the model. The names of this list are the names of the terms. The ith element of the list is the vector saying which coefficients correspond to the ith term. It may be of length 0 if there were no estimable effects for the term. |
df.residuals |
the number of degrees of freedom for residuals. |
family |
the entire |
user.def |
a logical value. If |
dist |
a character string representing the name of the error distribution. |
formula |
the model formula. |
data |
the data frame in which to interpret the variables occurring in
the model formula, or in the |
terms |
an object of mode |
contrasts |
a list containing sufficient information to construct the contrasts used to fit any factors occurring in the model. The list contains entries that are either matrices or character vectors. When a factor is coded by contrasts, the corresponding contrast matrix is stored in this list. Factors that appear only as dummy variables and variables in the model that are matrices correspond to character vectors in the list. The character vector has the level names for a factor or the column labels for a matrix. |
control |
a list of iteration and algorithmic constants used in |
call |
an image of the call that produced the object, but with the arguments all named and with the actual formula included as the formula argument. |
y |
optionally the response, if |
x |
optionally the model matrix, if |
model |
optionally the model frame, if |
This class of objects is returned by the rsm
function
to represent a fitted regression-scale model. Class rsm
inherits from classes glm
and
lm
, since it is fitted by iteratively reweighted
least squares. The object returned has all the components of a
weighted least squares object.
Objects of this class have methods for the functions
print
, summary
,
anova
and fitted
among
others.
The residuals, fitted values and coefficients should be extracted by
the generic functions of the same name, rather than by the $
operator.
Fits a rsm
model without computing the model matrix of the
response vector.
rsm.surv(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
rsm.surv(X, Y, offset, family, dispersion, score.dispersion, maxit, epsilon, trace, ...)
X |
the model matrix (design matrix). |
Y |
the response vector. |
offset |
optional offset added to the linear predictor. |
family |
a |
dispersion |
if |
score.dispersion |
must default to |
maxit |
maximum number of iterations. |
epsilon |
convergence threshold. |
trace |
if |
... |
not used, but do absorb any redundant argument. |
The rsm.surv
function is called internally by the
rsm
routine to do the actual model fitting. Although
it is not intended to be used directly by the user, it may be useful
when the same data frame is used over and over again. It might save
computational time, since the model matrix is not created. No
formula needs to be specified as an argument. As no weights
argument is available, the response Y
and the model matrix
X
must already include the weights if weighting is desired.
an object, which is a subset of a rsm
object.
The rsm.surv
function is the default option for
rsm
for the extreme
, logistic
,
logWeibull
, logExponential
, logRayleigh
and
student
(with df
larger than 2) error distributions.
It makes use of the survreg.fit
routine to
estimate parametric survival models. It receives X
and
Y
data rather than a formula, but still uses the
family.rsm
object to define the IRLS steps. The
rsm.surv
routine cannot be used for Huber-type and
user-defined error distributions.
rsm
, rsm.fit
, rsm.null
,
rsm.object
, rsm.families
Returns a summary list for objects of class marg
.
## S3 method for class 'marg' summary(object, alpha = 0.05, test = NULL, all = FALSE, coef = TRUE, int = ifelse((is.null(test) || all), TRUE, FALSE), digits = NULL, ...)
## S3 method for class 'marg' summary(object, alpha = 0.05, test = NULL, all = FALSE, coef = TRUE, int = ifelse((is.null(test) || all), TRUE, FALSE), digits = NULL, ...)
object |
a |
alpha |
a vector of levels for confidence intervals; the default is 5%. |
test |
a vector of values of the parameter of interest one wants to test
for. If |
all |
logical value; if |
coef |
logical value; if |
int |
logical value; if |
digits |
the number of significant digits to be printed. The default depends
on the value of |
... |
absorbs any additional argument. |
This function is a method for the generic function summary()
for objects of class marg
. It can be invoked by calling
summary
or directly summary.marg
for an object of the
appropriate class.
A list is returned with the following components:
coefficients |
a |
conf.int |
a matrix containing, for each level given in |
signif.tests |
a list with two elements. The first ( |
call |
the function call that created the |
formula |
the model formula. |
family |
the name of the error distribution. |
offset |
the covariate occurring in the model formula whose coefficient
represents the parameter of interest or |
alpha |
the vector of levels used to compute the confidence intervals. |
hypotheses |
the values for the parameter of interest that have been tested for. |
diagnostics |
the information and nuisance parameters aspects; see
|
n.approx |
the number of output points that have been calculated exactly. |
all |
logical value; if |
cf |
logical value; if |
int |
logical value; if |
is.scalar |
a logical value indicating whether there are any nuisance
parameters. If |
digits |
the number of significant digits to be printed. |
If the parameter of interest is the scale parameter, all calculations are performed on the log scale, though most results are reported on the original scale.
The amount of information calculated may vary depending on whether there are any nuisance parameters. A message is printed if there are none.
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.marg <- cond(houses.rsm, floor) summary(houses.marg, test = 0, coef = FALSE)
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) houses.marg <- cond(houses.rsm, floor) summary(houses.marg, test = 0, coef = FALSE)
Returns a summary list for a fitted regression-scale model.
## S3 method for class 'rsm' summary(object, correlation = FALSE, digits = NULL, ...)
## S3 method for class 'rsm' summary(object, correlation = FALSE, digits = NULL, ...)
object |
a fitted |
correlation |
logical argument. If |
digits |
a non-null value specifies the minimum number of significant
digits to be printed in values. If |
... |
absorbs any additional argument. |
This function is a method for the generic function
summary()
for class rsm
. It can be invoked by
calling summary
for an object of the appropriate class, or
directly by calling summary.rsm
regardless of the class of
the object.
A list is returned with the following components:
coefficients |
a matrix with four columns, containing the coefficients, their
standard errors, the |
dispersion |
the value of the scale parameter used in the computations. |
fixed |
a logical value. If |
residuals |
the response residuals. |
cov.unscaled |
the unscaled covariance matrix, i.e, a matrix such that multiplying it by the squared scale parameter, or an estimate thereof, produces an estimated (asymptotic) covariance matrix for the coefficients. |
correlation |
the computed correlation matrix for the coefficients in the model. |
family |
the entire |
loglik |
the computed log likelihood. |
terms |
an object of mode |
df |
the number of degrees of freedom for the model and for the residuals. |
iter |
the number of IRLS iterations used to compute the estimates. |
nas |
a logical vector indicating which, if any, coefficients are missing. |
call |
an image of the call that produced the |
digits |
the value of the |
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) summary(houses.rsm)
## House Price Data data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) summary(houses.rsm)
update.rsm
is used to update a rsm
model
formulae. This typically involves adding or dropping terms, but
updates can be more general.
## S3 method for class 'rsm' update(object, formula., ..., evaluate = TRUE)
## S3 method for class 'rsm' update(object, formula., ..., evaluate = TRUE)
object |
a model of class |
formula. |
changes to the formula – see |
... |
additional arguments to the call, or arguments with changed
values. Use |
evaluate |
if |
If evaluate = TRUE
the fitted object, otherwise the updated
call.
Based upon update.default
.
update
, update.default
,
update.formula
data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates ## houses.rsm <- update(houses.rsm, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration ## update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient
data(houses) houses.rsm <- rsm(price ~ ., family = student(5), data = houses) ## model fit including all covariates ## houses.rsm <- update(houses.rsm, method = "rsm.fit", control = glm.control(trace = TRUE)) ## prints information about the iterative procedure at each iteration ## update(houses.rsm, ~ . - bdroom + offset(7 * bdroom)) ## "bdroom" is included as offset variable with fixed (= 7) coefficient
Returns the variance-covariance matrix of the parameters of a
fitted rsm
model object.
## S3 method for class 'rsm' vcov(object, correlation = FALSE, ...)
## S3 method for class 'rsm' vcov(object, correlation = FALSE, ...)
object |
a fitted model object of class |
correlation |
if |
... |
absobs any additional argument. |
This is a method for function vcov
for objects
of class rsm
.
A matrix of the estimated covariances between the parameter
estimates of a fitted regression-scale model, or, if
dispersion = TRUE
the correlation matrix.
vcov
, rsm.object
,
rsm
, summary.rsm
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## vcov(venice.rsm) vcov(venice.rsm, corr = TRUE) ## detach()
## Sea Level Data data(venice) attach(venice) Year <- 1:51/51 c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) ## vcov(venice.rsm) vcov(venice.rsm, corr = TRUE) ## detach()
The venice
data frame has 51 rows and 2 columns.
Pirazzoli (1982) collected the ten largest values of sea
levels in Venice (with a few exceptions) for the years 1887–1981.
The venice
data frame contains the maxima for the years
1931–1981.
data(venice)
data(venice)
This data frame contains the following columns:
year
the years;
sea
the sea levels (in cm).
The data were obtained from
Smith, R. L. (1986) Extreme value theory based on the
-largest annual events. Journal of Hydrology ,
86, 27–43.
Pirazzoli, P. (1982) Maree estreme a Venezia (periodo 1872–1981). Acqua Aria, 10, 1023–1039.
data(venice) attach(venice) # plot(sea ~ year, ylab = "sea level") ## Year <- 1:51/51 venice.l <- rsm(sea ~ Year + I(Year^2), family = extreme) lines(year, fitted(venice.l)) ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.p <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) lines(year, fitted(venice.p), col = "red") ## detach()
data(venice) attach(venice) # plot(sea ~ year, ylab = "sea level") ## Year <- 1:51/51 venice.l <- rsm(sea ~ Year + I(Year^2), family = extreme) lines(year, fitted(venice.l)) ## c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11) c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62) venice.p <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, family = extreme) lines(year, fitted(venice.p), col = "red") ## detach()