Title: | Global Validation of Linear Models Assumptions |
---|---|
Description: | Methods from the paper: Pena, EA and Slate, EH, "Global Validation of Linear Model Assumptions," J. American Statistical Association, 101(473):341-354, 2006. |
Authors: | Edsel A. Pena and Elizabeth H. Slate |
Maintainer: | Elizabeth Slate |
License: | GPL |
Version: | 1.0.0.3 |
Built: | 2024-10-14 06:30:52 UTC |
Source: | CRAN |
Perform a single global test to assess the linear model assumptions, as well as perform specific directional tests designed to detect skewness, kurtosis, a nonlinear link function, and heteroscedasticity.
Package: | gvlma |
Type: | Package |
Version: | 1.0 |
Date: | 2006-06-07 |
License: | GPL |
The function gvlma will take either a linear models object or a formula and data set for a linear model (single response) and compute the global and directional tests for assessing modeling assumptions as described in the reference listed below. The function deletion.gvlma will compute the deletion (“leave-one-out”) global statistics described in that paper.
Slate, EH and Pena, EA
Maintainer: Slate, EH
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
x1 <- rnorm(100, 0, 2)
x2 <- runif(100)
y <- 3*x1 - x2 + rnorm(100)
gvmodel <- gvlma(lm(y ~ x1 + x2))
plot(gvmodel)
summary(gvmodel)
gvmodel.del <- deletion.gvlma(gvmodel)
summary(gvmodel.del)
plot(gvmodel.del)
Data on automobile gas mileage performance recorded at each gasoline fill-up from Oct. 20, 1996 through January 27, 1999.
data(CarMileageData)
A data frame with 205 observations on the following 7 variables.
Date | Date of gasoline fill-up |
Lag1Date | Lagged gasoline fill-up date |
NumDaysBetw | Number of days since last gasoline fill-up |
TotalMiles | Current odometer reading |
NumGallons | Number of gallons to fill tank |
MilesLastFill | Miles driven since last fill-up |
AveMilesGal | Average miles per gallon achieved since last fill-up |
Many people routinely record data on automobile mileage performance at each gasoline fill-up. Prof. E. Pena generously contributed his data for this time period.
These data were used in Example 1 of the publication “Global Validation of Linear Model Assumptions” by E. Pena and E. Slate, Journal of the American Statistical Association, 101(473):341-354, 2006. The data were recorded by Prof. E. Pena.
data(CarMileageData)
plot(CarMileageData)
Computes the deletion statistics (leave-one-out) for assessing unusual observations in a linear model.
deletion.gvlma(gvlmaobj)
gvlmaobj |
A |
Given a gvlma object, which contains in the component GlobalTest the test statistics and p-values for the global and directional tests to assess linear models assumptions, deletion.gvlma computes the leave-one-out global and directional statistics. The deletion statistics are reported as percent relative change from the corresponding statistic value based on the full data set.
A dataframe is returned with variables DeltaGlobalStat, GStatpvalue, DeltaStat1, Stat1pvalue, DeltaStat2, Stat2pvalue, DeltaStat3, Stat3pvalue, DeltaStat4, and Stat4pvalue.
Each “Delta” variable is the percent relative change in the
statistic when the corresponding observation (row of the data
frame) is dropped. Each “pvalue” variable is the p-value
associated with the deletion statistic. (Note the p-value is
NOT a change in the p-values for the full and leave-one-out
statistic values.)
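The leave-one-out idea can be sketched in base R. This is only an illustration under stated assumptions: the statistic used here (the residual sum of squares) is a stand-in and is not one of the package's test statistics, the helper name stat_fun and the simulated data are hypothetical, and the formula 100 * (leave-one-out value - full value) / full value is one reading of "percent relative change" rather than a confirmed copy of the package's computation.

```r
# Stand-in statistic (NOT one of the package's test statistics):
# the residual sum of squares of a fitted simple linear model.
stat_fun <- function(d) sum(residuals(lm(y ~ x, data = d))^2)

# Simulated data with one planted unusual observation (row 20).
set.seed(2)
d <- data.frame(x = 1:20)
d$y <- 1 + 2 * d$x + rnorm(20)
d$y[20] <- d$y[20] + 15

full_stat <- stat_fun(d)

# Refit with each row dropped in turn and record the percent relative change.
# Assumed formula: 100 * (leave-one-out value - full value) / full value.
delta <- vapply(seq_len(nrow(d)), function(i) {
  100 * (stat_fun(d[-i, , drop = FALSE]) - full_stat) / full_stat
}, numeric(1))

# Row whose removal changes the stand-in statistic the most.
which.min(delta)
```

A large negative entry means that dropping that single row shrinks the statistic substantially, which is the kind of observation the deletion statistics are meant to surface.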
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill, data = CarMileageData)
CarModelDel <- deletion.gvlma(CarModelAssess)
CarModelDel
Creates a graph of the p-values associated with the deletion statistics versus the deletion statistics with unusual observations highlighted. This function is called by plot.gvlmaDel.
display.delstats(deletedStatvals, deletedpvals, nsd = 3, TukeyStyle = TRUE, statname = "G", pointlabels)
deletedStatvals |
The vector of deletion statistics, with i-th entry defined as the percent relative change in the global test statistic when the i-th observation is removed from the analysis. |
deletedpvals |
The vector of p-values associated with the global test statistics, with i-th entry being the p-value for the global test statistic with observation i removed. |
nsd |
Parameter that governs which observations are deemed
unusual. When |
TukeyStyle |
Controls how unusual observations are determined.
If |
statname |
A string used to label the |
pointlabels |
Character vector of same length as |
Generally display.delstats is not called directly, but rather by the function plot.gvlmaDel.
Plots the deletedpvals versus the deletedStatvals and adds “control limits” determined by the parameters nsd and TukeyStyle. Points outside the “control limits” (in either the deletedStatval or deletedpval) are labeled as unusual.
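To illustrate why a robust rule can behave differently from a standard-deviation rule, here is a small base R sketch. The exact limits used by display.delstats are not spelled out in this section, so both rules below are assumptions: sd_limits uses mean plus or minus nsd standard deviations, and iqr_limits uses quartiles plus or minus k inter-quartile ranges with k = 3 chosen arbitrarily. The helper names and the data are hypothetical.

```r
# Hypothetical helpers; neither is claimed to match display.delstats internally.
sd_limits <- function(v, nsd = 3) mean(v) + c(-1, 1) * nsd * sd(v)
iqr_limits <- function(v, k = 3) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  c(q[1] - k * (q[2] - q[1]), q[2] + k * (q[2] - q[1]))
}
outside <- function(v, lim) v < lim[1] | v > lim[2]

# Made-up deletion statistics with two large values (positions 9 and 10).
v <- c(-2, -1, -0.5, 0, 0.3, 0.5, 1, 1.5, 30, 40)

sd_flagged  <- which(outside(v, sd_limits(v)))    # sd-based rule
iqr_flagged <- which(outside(v, iqr_limits(v)))   # IQR-based rule
sd_flagged
iqr_flagged
```

In this made-up example the two large values inflate the standard deviation enough that the sd-based limits flag nothing, while the quartile-based limits flag both. This masking effect is a general motivation for robust limits, not a statement about the package's defaults.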
A dataframe consisting of the unusual observations with variables deletedStatval and deletedpval.
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarMileageAssess <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData)
CarMileageDel <- deletion.gvlma(CarMileageAssess)
plot(CarMileageDel)
display.delstats(CarMileageDel$DeltaGlobalStat, CarMileageDel$GStatpvalue)
display.delstats(CarMileageDel$DeltaStat1, CarMileageDel$Stat1pvalue)
Top-level function for Global Validation of Linear Models Assumptions.
gvlma(x, data, alphalevel = 0.05, timeseq, ...)
gvlma.form(formula, data, alphalevel = 0.05, timeseq = 1:nrow(data), ...)
gvlma.lm(lmobj, alphalevel = 0.05, timeseq)
x |
Either a formula, in which case |
formula |
A linear models formula interpretable within the
dataframe |
lmobj |
An object resulting from a call to |
data |
Required if |
alphalevel |
Level of significance at which to perform the global and directional tests for linear models assumptions. |
timeseq |
A vector of length the number of observations in the linear model that gives a "time ordering" for the observations. This time sequence is used in the heteroscedasticity test statistic. Defaults to 1:n where n is the number of observations in the linear model. |
... |
Additional arguments such as |
gvlma is the top-level function to create a gvlma object for assessment of linear models assumptions.
A gvlma object is returned. This is a list of class “gvlma” that contains all of the components returned by the call to lm for fitting the linear model, plus an additional component entitled “GlobalTest.” This new GlobalTest component is a list with the following components:
LevelOfSignificance |
The level of significance at which the decisions reported for the global and directional tests were made. |
GlobalStat4 |
A list consisting of the components |
DirectionalStat1 |
A list consisting of the |
DirectionalStat2 |
A list consisting of the |
DirectionalStat3 |
A list consisting of the |
DirectionalStat4 |
A list consisting of the |
timeseq |
The ordering of the observations used when computing the heteroscedasticity directional statistic. |
call |
The call used to invoke |
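As a hedged illustration of inspecting the returned object, the sketch below fits a model on the package's CarMileageData and looks at the GlobalTest component. It assumes the gvlma package is installed (the code is skipped otherwise) and only reads the components named in the table above. The variable name fit is illustrative, and no output is shown because it depends on the installed package version.

```r
# Runs only if the gvlma package is available.
if (requireNamespace("gvlma", quietly = TRUE)) {
  library(gvlma)
  data(CarMileageData)

  # Formula interface with a non-default significance level.
  fit <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw,
               data = CarMileageData, alphalevel = 0.01)

  # Names of the entries in the added GlobalTest list.
  print(names(fit$GlobalTest))

  # Significance level and time ordering stored with the tests.
  print(fit$GlobalTest$LevelOfSignificance)
  print(head(fit$GlobalTest$timeseq))
}
```

Because the object also carries the usual lm components, generic accessors such as coef(fit) should continue to work on it.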
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
plot.gvlma, deletion.gvlma, update.gvlma, lm
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData)
CarModelAssess
summary(CarModelAssess)
CarModel2 <- gvlma(lm(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData))
CarModel2
summary(CarModel2)
plot(CarModel2)
Diagnostic plots for a single-response gvlma linear model.
## S3 method for class 'gvlma'
plot(x, onepage = TRUE,
     ask = !onepage && prod(par("mfcol")) < ncol(model.matrix(x)) + 4 && dev.interactive(),
     ...)
x |
A |
onepage |
If TRUE, all plots will be displayed in one page of graphs. |
ask |
If TRUE, user will be prompted before plots begin a new page. |
... |
Additional arguments that are ignored. |
A series of plots is generated for diagnostic assessment of a linear model for a single response variable. The plots are similar to those generated by plot.lm. The plots are (a) the response versus each of the predictors in the model, (b) the response versus the time sequence in the gvlma object (gvlmaobj$GlobalTest$timeseq), which is the time sequence used for computing the heteroscedasticity directional test statistic, (c) the standardized residuals vs the fitted values, (d) a histogram of the standardized residuals, (e) a normal probability plot of the standardized residuals, and (f) a plot of the standardized residuals versus the time sequence.
Note that the standardized residuals here are computed as the raw residuals divided by the MLE of the error standard deviation (i.e. sqrt(SSE/n)).
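The scaling described in this note can be reproduced in base R. The sketch below uses simulated data (variable names and coefficients are illustrative only) and compares the MLE-based scale sqrt(SSE/n) with the unbiased-variance scale reported by summary.lm. It is an illustration of the definition, not a copy of the package's plotting code.

```r
# Simulated data; names and coefficients are illustrative only.
set.seed(1)
n  <- 100
x1 <- rnorm(n, 0, 2)
x2 <- runif(n)
y  <- 3 * x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
res <- residuals(fit)

# MLE of the error standard deviation: divide SSE by n, not by n - p.
sigma_mle <- sqrt(sum(res^2) / n)
std_res   <- res / sigma_mle

# For comparison: summary.lm() reports sqrt(SSE / (n - p)).
sigma_unbiased <- summary(fit)$sigma

# By construction the mean square of std_res equals 1.
mean(std_res^2)
# The two scale estimates differ by the factor sqrt((n - p) / n), with p = 3 here.
sigma_mle / sigma_unbiased
```

These residuals therefore differ from the output of rstandard(), which also adjusts each residual for its leverage.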
No value is returned.
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData)
plot(CarModelAssess)
par(mfrow = c(2, 2))
plot(CarModelAssess, onepage = FALSE)
Plots to display the behavior of the deletion statistics stored in a gvlmaDel object.
## S3 method for class 'gvlmaDel'
plot(x, which = 1:2, TukeyStyle = TRUE,
     ask = prod(par("mfcol")) < max(c(10, 5)[which]) && dev.interactive(),
     pointlabels, ...)
x |
A |
which |
Vector indicating which, or both, of two types of plots to show. |
TukeyStyle |
If TRUE, determine unusual observations in a robust way based on inter-quartile ranges, else based on standard deviations. |
ask |
If TRUE, prompt the user before beginning a new page of graphs. |
pointlabels |
A vector of length the number of observations in
the linear model fit in the |
... |
Additional arguments that are ignored. |
If which = 1, each of the 5 deletion statistics (deletion global statistic and each of the 4 directional statistics) is plotted against the time sequence used for the 4th directional statistic (assessing heteroscedasticity).
If which = 2, the function display.delstats is called for each of the 5 deletion statistics. The argument TukeyStyle is passed directly to display.delstats. See the help for display.delstats for details.
If which = c(1,2), the default, then all 10 plots are generated.
The deletion statistics in the gvlmaDel object are the percent relative change when each observation, in turn, is omitted from the model fitting.
No value is returned.
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData)
CarModelDel <- deletion.gvlma(CarModelAssess)
par(mfrow = c(1, 1))
plot(CarModelDel)
par(mfrow = c(2, 2))
plot(CarModelDel)
plot(CarModelDel, TukeyStyle = FALSE)
plot(CarModelDel, which = 2)
Prints the basic information for a gvlma object, which is the output object from the function gvlma.
## S3 method for class 'gvlma'
summary(object, ...)
## S3 method for class 'gvlma'
print(x, ...)
display.gvlmatests(gvlmaobj)
x, object, gvlmaobj | An object resulting from a call to gvlma. It is a list containing the components of a call to lm plus an item with the name GlobalTest. |
... |
Additional arguments that are passed to |
print.gvlma invokes print on the lm object and then calls display.gvlmatests.
summary.gvlma invokes summary on the lm object with the additional ... arguments and then calls display.gvlmatests.
display.gvlmatests provides the test statistics, p-values and decision (whether linear models assumptions are satisfied) for the global and directional tests associated with the gvlma object. The decision is reported at the level of significance used when the gvlma object was created. See the argument alphalevel to gvlma.
The value returned invisibly is a dataframe with row names indicating the global test and the 4 directional tests. Variables are
Value |
Value of the test statistic. |
p-value |
p-value associated with the test. |
Decision |
Text string indicating whether the test statistic is
significant at the significance level specified in the original call
to |
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
gvlma, display.gvlmatests, summary.lm
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill, data = CarMileageData)
CarModelAssess
summary(CarModelAssess)
Summarize the test statistic values and p-values for assessing unusual observations using the global and directional test statistics that were computed in a gvlmaDel object resulting from a call to deletion.gvlma.
## S3 method for class 'gvlmaDel'
summary(object, allstats = TRUE, ...)
## S3 method for class 'gvlmaDel'
print(x, ...)
object , x
|
Object resulting from a call to
|
allstats |
For |
... |
Additional arguments that are ignored. |
The summary values are the min, first quartile, median, average, 3rd quartile and maximum of the deletion test statistic values and p-values. Additionally, observations and the corresponding deletion test statistic values and p-values for which the deletion test statistic value or its p-value is outside the outer fences (Q1 - 3*IQR, Q3 + 3*IQR) of the set of deletion statistics are reported.
print.gvlmaDel simply invokes summary.gvlmaDel with allstats = TRUE.
A dataframe of dimension nobs x 5 is returned invisibly, where nobs is the number of observations in the linear model fit. The 5 columns are named DeltaGlobalStat, DeltaStat1, DeltaStat2, DeltaStat3, and DeltaStat4, indicating the deletion global test and the four deletion directional test statistics. Each entry in the dataframe is TRUE/FALSE, indicating whether the corresponding test statistic was unusual (i.e. beyond the outer fences) with respect to either its value or its p-value.
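The TRUE/FALSE flagging described above can be sketched in base R. This is a minimal illustration with invented numbers and a hypothetical helper, beyond_outer_fences. It applies the outer-fence rule (Q1 - 3*IQR, Q3 + 3*IQR) separately to a column of deletion statistics and to its p-values, using R's default quantile definition, which may or may not match what summary.gvlmaDel uses internally.

```r
# Hypothetical helper: TRUE where a value lies beyond the outer fences
# (Q1 - 3*IQR, Q3 + 3*IQR) of its own column.
beyond_outer_fences <- function(v) {
  q   <- quantile(v, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  v < q[1] - 3 * iqr | v > q[2] + 3 * iqr
}

# Invented deletion results for 10 observations. The column names follow the
# value section of deletion.gvlma, but the numbers are made up.
del <- data.frame(
  DeltaGlobalStat = c(-2, -1, -0.5, 0, 0.3, 0.5, 1, 1.5, 2, 40),
  GStatpvalue     = c(0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.001, 0.53, 0.47, 0.50)
)

# An observation is flagged if either its statistic or its p-value is unusual.
flags <- data.frame(
  DeltaGlobalStat = beyond_outer_fences(del$DeltaGlobalStat) |
                    beyond_outer_fences(del$GStatpvalue)
)
which(flags$DeltaGlobalStat)
```

With these invented numbers, observation 10 is flagged through its statistic and observation 7 through its p-value, which shows why both columns are checked.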
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill, data = CarMileageData)
CarModelAssess
CarModelDel <- deletion.gvlma(CarModelAssess)
CarModelDel
summary(CarModelDel)
summary(CarModelDel, allstats = FALSE)
Update a gvlma object with changes to the linear model, the level of significance for global tests, or the time sequence used for the heteroscedasticity directional test.
## S3 method for class 'gvlma'
update(object, formula, ...)
object |
A gvlma object resulting from a call to |
formula |
(optional) A new formula describing the underlying linear model. |
... |
Additional arguments to be changed from the original call
to gvlma. These may include arguments to the |
All arguments other than alphalevel and timeseq (and warn) are passed on to a call to update for the underlying linear model. If alphalevel is specified, then subsequent displays of the global and directional test statistic decisions will be based on the new level of significance. If timeseq is specified, then the heteroscedasticity directional test will be updated to use the new time sequence.
A new gvlma object is returned.
Slate, EH and Pena, EA.
Pena, EA and Slate, EH (2006). “Global validation of linear model assumptions,” J. Amer. Statist. Assoc., 101(473):341-354.
data(CarMileageData)
CarModelAssess <- gvlma(NumGallons ~ MilesLastFill + NumDaysBetw, data = CarMileageData)
CarModelAssess
summary(CarModelAssess)
CarModelNew <- update(CarModelAssess, alphalevel = 0.01)
CarModelNew
CarModelNew <- update(CarModelAssess, subset = -(1:10))
CarModelNew
summary(CarModelNew)