Title: | Local Regression, Likelihood and Density Estimation |
---|---|
Description: | Local regression, likelihood and density estimation methods as described in the 1999 book by Loader. |
Authors: | Catherine Loader [aut], Jiayang Sun [ctb], Lucent Technologies [cph], Andy Liaw [cre] |
Maintainer: | Andy Liaw <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5-9.10 |
Built: | 2024-11-22 06:26:04 UTC |
Source: | CRAN |
The calling sequence for aic matches that of the locfit or locfit.raw functions. The fit is not returned; instead, the returned object contains Akaike's information criterion for the fit.
The definition of AIC used here is -2*log-likelihood + pen*(fitted d.f.). For quasi-likelihood, and local regression, this assumes the scale parameter is one. Other scale parameters can effectively be used by changing the penalty.
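The definition above can be written out directly. This is a minimal sketch in base R; `aic_score` is a hypothetical helper, not a locfit function, and the input values are made up for illustration.

```r
# Hypothetical helper (not part of locfit): AIC as defined in the text,
# -2 * log-likelihood + pen * (fitted degrees of freedom).
aic_score <- function(loglik, df, pen = 2) {
  -2 * loglik + pen * df
}

# Illustrative values only.
aic_score(loglik = -120.5, df = 4.2)           # default penalty of 2
aic_score(loglik = -120.5, df = 4.2, pen = 3)  # heavier penalty on the d.f. term
```

Raising pen penalizes the fitted degrees of freedom more heavily, which is how the text suggests accommodating other scale parameters.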
The AIC score is exact (up to numerical roundoff) if the ev="data" argument is provided. Otherwise, the residual sum-of-squares and degrees of freedom are computed using locfit's standard interpolation based approximations.
aic(x, ..., pen=2)
x | model formula |
... | other arguments to locfit |
pen | penalty for the degrees of freedom term |
The aicplot function loops through calls to the aic function (and hence to locfit), using a different smoothing parameter for each call. The returned structure contains the AIC statistic for each fit, and can be used to produce an AIC plot.
aicplot(..., alpha)
... | |
alpha | Matrix of smoothing parameters. The |
An object with class "gcvplot", containing the smoothing parameters and AIC scores. The actual plot is produced using plot.gcvplot.
locfit, locfit.raw, gcv, aic, plot.gcvplot
data(morths)
plot(aicplot(deaths~age, weights=n, data=morths, family="binomial",
             alpha=seq(0.2,1.0,by=0.05)))
The first two columns are the gender of the athlete and their sport. The remaining 11 columns are various measurements made on the athletes.
data(ais)
A dataframe.
Cook and Weisberg (1994).
Cook and Weisberg (1994). An Introduction to Regression Graphics. Wiley, New York.
The ang() function is used in a locfit model formula to specify that a variable should be treated as an angular or periodic term. The scale argument is used to set the period.

ang(x) is equivalent to lp(x,style="ang").
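One way to read the scale argument: with scale set to period/(2*pi), dividing x by scale turns it into an angle, so points a whole period apart coincide. The sketch below uses base R only; `to_angle` is a hypothetical helper and the period of 0.2 is an assumed value, not something locfit computes for you.

```r
# Assumed setup: a variable with period 0.2.
period <- 0.2
scale  <- period / (2 * pi)

# Hypothetical helper: position of x on the circle, as an angle in [0, 2*pi).
to_angle <- function(x, scale) (x / scale) %% (2 * pi)

x <- c(0.05, 0.25, 0.45)      # points exactly one period apart
angles <- to_angle(x, scale)  # all three land on the same angle
```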
ang(x,...)
x | numeric variable to be treated periodically. |
... | Other arguments to |
Loader, C. (1999). Local Regression and Likelihood. Springer, NY (Section 6.2).
# generate an x variable, and a response with period 0.2
x <- seq(0,1,length=200)
y <- sin(10*pi*x)+rnorm(200)/5
# compute the periodic local fit. Note the scale argument is period/(2pi)
fit <- locfit(y~ang(x,scale=0.2/(2*pi)))
# plot the fit over a single period
plot(fit)
# plot the fit over the full range of the data
plot(fit,xlim=c(0,1))
Example dataset from Loader (1999).
data(bad)
Data Frame with x and y variables.
Loader, C. (1999). Bandwidth Selection: Classical or Plug-in? Annals of Statistics 27.
Scores in 265 innings for Australian batsman Allan Border.
data(border)
A data frame with day (decimalized), not out indicator and score. The not out indicator should be used as a censoring variable.
Compiled from the Cricinfo archives.
CricInfo: The Home of Cricket on the Internet. https://www.espncricinfo.com/
Numeric variables are rw, fpg, ga, ina and sspg. Classifier cc is the Diabetic type.
data(chemdiab)
Data frame with five numeric measurements and categorical response.
Reaven and Miller (1979).
Reaven, G. M. and Miller, R. G. (1979). An attempt to define the nature of chemical diabetes using a multidimensional analysis. Diabetologia 16, 17-24.
A random sample of size 54 from the claw density of Marron and Wand (1992), as used in Figure 10.5 of Loader (1999).
data(claw54)
Numeric vector with length 54.
Randomly generated.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Annals of Statistics 20, 712-736.
Observations from Figure 8.7 of Loader (1999).
data(cldem)
Data Frame with x and y variables.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
200 observations from a 2 population model. Under population 0,
has a standard normal distribution, and
, where
is also standard normal.
Under population 1,
.
The optimal classification regions form a checkerboard pattern,
with horizontal boundary at
, vertical boundaries at
.
This is the same model as the cltrain dataset.
data(cltest)
Data Frame. Three variables x1, x2 and y. The latter indicates class membership.
200 observations from a 2 population model. Under population 0,
has a standard normal distribution, and
, where
is also standard normal.
Under population 1,
.
The optimal classification regions form a checkerboard pattern,
with horizontal boundary at
, vertical boundaries at
.
This is the same model as the cltest dataset.
data(cltrain)
Data Frame. Three variables x1, x2 and y. The latter indicates class membership.
Monthly time series of carbon dioxide measurements at Mauna Loa, Hawaii from 1959 to 1990.
data(co2)
Data frame with year, month and co2 variables.
Boden, Sepanski and Stoss (1992).
Boden, Sepanski and Stoss (1992). Trends '91: A compendium of data on global change - Highlights. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory.
The calling sequence for cp matches that of the locfit or locfit.raw functions. The fit is not returned; instead, the returned object contains the Cp criterion for the fit.
Cp is usually computed using a variance estimate from the largest model under consideration, rather than the default sig2=1. This will be done automatically when the cpplot function is used.
The Cp score is exact (up to numerical roundoff) if the ev="data" argument is provided. Otherwise, the residual sum-of-squares and degrees of freedom are computed using locfit's standard interpolation based approximations.
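To show how the residual sum-of-squares, degrees of freedom and sig2 interact, here is a sketch assuming the usual Mallows form of Cp. The exact expression locfit uses is not given here, so treat `cp_score` as a hypothetical helper with made-up inputs.

```r
# Hypothetical helper (assumes the standard Mallows form, which may differ in
# detail from locfit's internal computation):
#   Cp = RSS / sig2 - n + 2 * (fitted degrees of freedom)
cp_score <- function(rss, n, df, sig2 = 1) {
  rss / sig2 - n + 2 * df
}

# Illustrative values only: the same fit scored with sig2 = 1 and with a
# variance estimate taken from a larger model.
cp_score(rss = 50, n = 40, df = 5)
cp_score(rss = 50, n = 40, df = 5, sig2 = 1.25)
```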
cp(x, ..., sig2=1)
x | model formula or numeric vector of the independent variable. |
... | other arguments to |
sig2 | residual variance estimate. |
A term entered in a locfit model formula using cpar will result in a fit that is conditionally parametric. Equivalent to lp(x,style="cpar").

This function is presently almost deprecated. Specifying a conditionally parametric fit as y~x1+cpar(x2) will no longer work; instead, the model is specified as y~lp(x1,x2,style=c("n","cpar")).
cpar(x,...)
x | numeric variable. |
... | Other arguments to |
data(ethanol, package="locfit")
# fit a conditionally parametric model
fit <- locfit(NOx ~ lp(E, C, style=c("n","cpar")), data=ethanol)
plot(fit)
# one way to force a parametric fit with locfit
fit <- locfit(NOx ~ cpar(E), data=ethanol)
The cpplot function loops through calls to the cp function (and hence to locfit), using a different smoothing parameter for each call. The returned structure contains the Cp statistic for each fit, and can be used to produce a Cp plot.
cpplot(..., alpha, sig2)
... | |
alpha | Matrix of smoothing parameters. The |
sig2 | Residual variance. If not specified, the residual variance is computed using the fitted model with the fewest residual degrees of freedom. |
An object with class "gcvplot", containing the smoothing parameters and Cp scores. The actual plot is produced using plot.gcvplot.
locfit, locfit.raw, gcv, aic, plot.gcvplot
data(ethanol)
plot(cpplot(NOx~E,data=ethanol,alpha=seq(0.2,1.0,by=0.05)))
Every "locfit" object contains a critical value object to be used in computing and plotting confidence intervals. By default, a 95% pointwise confidence level is used. To change the confidence level, the critical value object must be substituted using crit and crit<-.
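For intuition about what a pointwise critical value is, the sketch below computes the two-sided quantile for a given coverage in base R, switching to Student's t when residual degrees of freedom are supplied. `pointwise_crit` is a hypothetical helper; it does not reproduce the critical value object locfit stores, and it ignores the tube-formula constants used for simultaneous bands.

```r
# Hypothetical helper, not locfit's implementation: two-sided pointwise
# critical value for coverage `cov`; normal when rdf is 0, Student's t otherwise.
pointwise_crit <- function(cov = 0.95, rdf = 0) {
  p <- 1 - (1 - cov) / 2
  if (rdf == 0) qnorm(p) else qt(p, df = rdf)
}

pointwise_crit()                 # 95% pointwise, normal quantile
pointwise_crit(cov = 0.99)       # wider intervals for 99% coverage
pointwise_crit(rdf = 10)         # t-based value with 10 residual d.f.
```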
crit(fit, const=c(0, 1), d=1, cov=0.95, rdf=0)
crit(fit) <- value
fit | |
const | Tube formula constants for simultaneous bands (the default, |
d | Dimension of the fit. Again, users shouldn't usually provide it. |
cov | Coverage Probability for critical values. |
rdf | Residual degrees of freedom. If non-zero, the critical values are based on the Student's t distribution. When |
value | Critical value object. |
locfit, plot.locfit, kappa0, crit<-.
# compute and plot 99% confidence intervals, with local variance estimate.
data(ethanol)
fit <- locfit(NOx~E,data=ethanol)
crit(fit) <- crit(fit,cov=0.99)
plot(fit,band="local")
# compute and plot 99% simultaneous bands
crit(fit) <- kappa0(NOx~E,data=ethanol,cov=0.99)
plot(fit,band="local")
dat is used to specify evaluation on the given data points for locfit.raw().
dat(cv=FALSE)
cv | Whether cross-validation should be done. |
This function provides an interface to Locfit, in the syntax of (a now old version of) the S-Plus density function. This can reproduce density results, but allows additional locfit.raw arguments, such as the degree of fit, to be given.

It also works in double precision, whereas density only works in single precision.
density.lf(x, n = 50, window = "gaussian", width, from, to, cut = if(iwindow == 4.) 0.75 else 0.5, ev = lfgrid(mg = n, ll = from, ur = to), deg = 0, family = "density", link = "ident", ...)
x | numeric vector of observations whose density is to be estimated. |
n | number of evaluation points. Equivalent to the |
window | Window type to use for estimation. Equivalent to the |
width | Window width. Following |
from | Lower limit for estimation domain. |
to | Upper limit for estimation domain. |
cut | Controls default expansion of the domain. |
ev | Locfit evaluation structure – default |
deg | Fitting degree – default 0 for kernel estimation. |
family | Fitting family – default is |
link | Link function – default is the |
... | Additional arguments to |
A list with components x (evaluation points) and y (estimated density).
density, locfit, locfit.raw
data(geyser)
density.lf(geyser, window="tria")
# the same result with density, except less precision.
density(geyser, window="tria")
NOx exhaust emissions from a single cylinder engine. Two predictor variables are E (the engine's equivalence ratio) and C (Compression ratio).
data(ethanol)
Data frame with NOx, E and C variables.
Brinkman (1981). Also studied extensively by Cleveland (1993).
Brinkman, N. D. (1981). Ethanol fuel - a single-cylinder engine study of efficiency and exhaust emissions. SAE transactions 90, 1414-1424.
Cleveland, W. S. (1993). Visualizing data. Hobart Press, Summit, NJ.
Computes exp(x)/(1+exp(x)). This is the inverse of the logistic link function, log(x/(1-x)).
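The inverse relationship between expit and the logistic (logit) link is easy to check numerically. The sketch below re-implements both in base R under hypothetical names (`expit_sketch`, `logit_sketch`) so it runs without locfit attached.

```r
# Hypothetical re-implementations for illustration; locfit exports its own expit().
expit_sketch <- function(x) exp(x) / (1 + exp(x))
logit_sketch <- function(p) log(p / (1 - p))

p <- c(0.1, 0.5, 0.9)
x <- logit_sketch(p)        # map probabilities to the real line
back <- expit_sketch(x)     # recovers p, up to rounding error
# Base R offers the same pair as plogis() and qlogis().
```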
expit(x)
x | numeric vector |
Evaluates the fitted values (i.e. evaluates the surface at the original data points) for a Locfit object. This function works by reconstructing the model matrix from the original formula, and predicting at those points. The function may be fooled; for example, if the original data frame has changed since the fit, or if the model formula includes calls to random number generators.
## S3 method for class 'locfit'
fitted(object, data=NULL, what="coef", cv=FALSE, studentize=FALSE, type="fit", tr, ...)
object | |
data | The data frame for the original fit. Usually, this shouldn't be needed, especially when the function is called directly. It may be needed when called inside another function. |
what | What to compute fitted values of. The default, |
cv | If |
studentize | If |
type | Type of fit or residuals to compute. The default is |
tr | Back transformation for likelihood models. |
... | arguments passed to and from methods. |
A numeric vector of the fitted values.
locfit, predict.locfit, residuals.locfit
Extract the model formula from a locfit object.
## S3 method for class 'locfit'
formula(x, ...)
x | |
... | Arguments passed to and from other methods. |
Returns the formula from the locfit object.
This is a locfit calling function used by lf() terms in additive models. It is not normally called directly by users.
gam.lf(x, y, w, xeval, ...)
x | numeric predictor |
y | numeric response |
w | prior weights |
xeval | evaluation points |
... | other arguments to |
locfit, locfit.raw, lf, gam
This vector adds "lf" to the default vector of special terms recognized by a gam() model formula. To ensure this is recognized, attach the Locfit library with library(locfit,first=T).
Character vector.
lf, gam
The calling sequence for gcv matches that of the locfit or locfit.raw functions. The fit is not returned; instead, the returned object contains Wahba's generalized cross-validation score for the fit.
The GCV score is exact (up to numerical roundoff) if the ev="data" argument is provided. Otherwise, the residual sum-of-squares and degrees of freedom are computed using locfit's standard interpolation based approximations.
For likelihood models, GCV is computed using the deviance in place of the residual sum of squares. This produces useful results but I do not know of any theory validating this extension.
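As a rough illustration of how the residual sum of squares and the fitted degrees of freedom combine, the sketch below assumes the standard form of generalized cross-validation, n*RSS/(n - df)^2. `gcv_score` is a hypothetical helper with made-up inputs; for a likelihood model the deviance would be passed in place of rss.

```r
# Hypothetical helper (assumes the usual GCV form; not locfit's internal code):
#   GCV = n * RSS / (n - df)^2
gcv_score <- function(rss, n, df) {
  n * rss / (n - df)^2
}

# Illustrative values only: more fitted d.f. must buy a large enough drop
# in RSS to lower the score.
gcv_score(rss = 50, n = 100, df = 10)
gcv_score(rss = 48, n = 100, df = 20)
```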
gcv(x, ...)
x, ... | Arguments passed on to |
The gcvplot function loops through calls to the gcv function (and hence to locfit), using a different smoothing parameter for each call. The returned structure contains the GCV statistic for each fit, and can be used to produce a GCV plot.
gcvplot(..., alpha, df=2)
... | |
alpha | Matrix of smoothing parameters. The |
df | Degrees of freedom to use as the x-axis. 2=trace(L), 3=trace(L'L). |
An object with class "gcvplot", containing the smoothing parameters and GCV scores. The actual plot is produced using plot.gcvplot.
locfit, locfit.raw, gcv, plot.gcvplot, summary.gcvplot
data(ethanol)
plot(gcvplot(NOx~E,data=ethanol,alpha=seq(0.2,1.0,by=0.05)))
The durations of 107 eruptions of the Old Faithful Geyser.
data(geyser)
A numeric vector of length 107.
Scott (1992). Note that several different Old Faithful Geyser datasets (including the faithful dataset in R's base library) have been used in various places in the statistics literature. The version provided here has been used in density estimation and bandwidth selection work.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley.
This is a variant of the geyser dataset, where each observation is rounded to the nearest 0.05 minutes, and the counts tallied.
data(geyser.round)
Data Frame with variables duration and count.
Scott (1992). Note that several different Old Faithful Geyser datasets (including the faithful dataset in R's base library) have been used in various places in the statistics literature. The version provided here has been used in density estimation and bandwidth selection work.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. Wiley.
hatmatrix() computes the weight diagrams (also known as equivalent or effective kernels) for a local regression smooth. Essentially, hatmatrix() is a front-end to locfit(), setting a flag to compute and return weight diagrams, rather than the fit.
hatmatrix(formula, dc=TRUE, ...)
formula | model formula. |
dc | derivative adjustment (see |
... | Other arguments to |
A matrix with n rows and p columns; each column being the weight diagram for the corresponding locfit fit point. If ev="data", this is the transpose of the hat matrix.
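To make the relationship between weight diagrams and the hat matrix concrete, here is a base R sketch for a simple local constant (kernel-weighted average) smoother evaluated at the data points. It is not what hatmatrix() computes internally; the Gaussian kernel, the bandwidth h and the simulated data are all assumptions for illustration.

```r
set.seed(1)
x <- sort(runif(30))
y <- sin(2 * pi * x) + rnorm(30, sd = 0.2)
h <- 0.1  # assumed bandwidth

# Row i of L holds the weights the smoother at x[i] puts on each observation.
K <- outer(x, x, function(a, b) dnorm((a - b) / h))
L <- K / rowSums(K)

fitted_vals <- drop(L %*% y)   # a linear smoother: fitted values are L y
weight_diagrams <- t(L)        # column j is the weight diagram for fit point x[j]
fitted_df <- sum(diag(L))      # trace(L), one common "fitted d.f." measure
```

Storing one weight diagram per column, as in the value described above, gives the transpose of the hat matrix L when the fit points are the data points.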
locfit, plot.locfit.1d, plot.locfit.2d, plot.locfit.3d, lines.locfit, predict.locfit
The survival times of 184 participants in the Stanford heart transplant program.
data(heart)
Data frame with surv, cens and age variables.
Miller and Halperin (1982). The original dataset includes information on additional patients who never received a transplant. Other authors reported earlier versions of the data.
Miller, R. G. and Halperin, J. (1982). Regression with censored data. Biometrika 69, 521-531.
An experiment measuring death rates for insects, with 30 insects at each of five treatment levels.
data(insect)
Data frame with lconc (dosage), deaths (number of deaths) and nins (number of insects) variables.
Bliss (1935).
Bliss (1935). The calculation of the dosage-mortality curve. Annals of Applied Biology 22, 134-167.
Four measurements on each of fifty flowers of two species of iris (Versicolor and Virginica) – a classification dataset. Fisher's original dataset contained a third species (Setosa) which is trivially separable.
data(iris)
Data frame with species, petal.wid, petal.len, sepal.wid, sepal.len.
Fisher (1936). Reproduced in Andrews and Herzberg (1985) Chapter 1.
Andrews, D. F. and Herzberg, A. M. (1985). Data. Springer-Verlag.
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, Part II. 179-188.
Variables are sex (m/f), spec (giganteus, melanops, fuliginosus) and 18 numeric measurements.
data(kangaroo)
Data frame with measurements on the skulls of 101 kangaroos.
Andrews and Herzberg (1985) Chapter 53.
Andrews, D. F. and Herzberg, A. M. (1985). Data. Springer-Verlag, New York.
The geometric constants for simultaneous confidence bands are computed, as described in Sun and Loader (1994) (bias adjustment is not implemented here). These are then passed to the crit function, which computes the critical value for the confidence bands.
The method requires the weight diagrams l(x), the derivatives l'(x) and (in 2 or more dimensions) the second derivatives l''(x). These are implemented exactly for a constant bandwidth. For nearest neighbor bandwidths, the computations are approximate and a warning is produced.
The theoretical justification for the bands uses normality of the random errors in the regression model, and in particular the spherical symmetry of the error vector. For non-normal distributions, and likelihood models, one relies on central limit and related theorems.
Computation uses the product Simpson's rule to evaluate the multidimensional integrals (the domain of integration, and hence the region of simultaneous coverage, is determined by the flim argument). Expect the integration to be slow in more than one dimension. The mint argument controls the precision.
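A product Simpson's rule applies the one-dimensional composite Simpson weights along each axis of a grid, which is why the cost grows quickly with dimension. The sketch below is a generic two-dimensional version in base R; `simpson_weights` and `product_simpson` are hypothetical helpers, not locfit internals, and the integrand is an arbitrary test function.

```r
# Composite Simpson weights on m equally spaced points over [a, b] (m odd).
simpson_weights <- function(m, a, b) {
  stopifnot(m >= 3, m %% 2 == 1)
  w <- ifelse(seq_len(m) %% 2 == 0, 4, 2)  # interior pattern 4, 2, 4, ...
  w[c(1, m)] <- 1                          # end points get weight 1
  w * (b - a) / (3 * (m - 1))
}

# Product rule on an m x m grid; f must be vectorized in both arguments.
product_simpson <- function(f, lower, upper, m = 21) {
  gx <- seq(lower[1], upper[1], length.out = m)
  gy <- seq(lower[2], upper[2], length.out = m)
  wx <- simpson_weights(m, lower[1], upper[1])
  wy <- simpson_weights(m, lower[2], upper[2])
  sum(outer(wx, wy) * outer(gx, gy, f))
}

# Integral of x^2 * y over [0,1] x [0,2]; the exact value is 2/3.
approx_int <- product_simpson(function(x, y) x^2 * y, c(0, 0), c(1, 2))
```

The grid has m^2 points in two dimensions and m^d in d dimensions, so a finer grid improves precision at a steep cost.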
kappa0(formula, cov=0.95, ev=lfgrid(20), ...)
formula | Local regression model formula. A |
cov | Coverage Probability for critical values. |
ev | Locfit evaluation structure. Should usually be a grid – this specifies the integration rule. |
... | Other arguments to |
A list with components for the critical value, geometric constants, etc. Can be passed directly to plot.locfit as the crit argument.
Sun, J. and Loader, C. (1994). Simultaneous confidence bands for linear regression and smoothing. Annals of Statistics 22, 1328-1345.
locfit, plot.locfit, crit, crit<-.
# compute and plot simultaneous confidence bands
data(ethanol)
fit <- locfit(NOx~E,data=ethanol)
crit(fit) <- kappa0(NOx~E,data=ethanol)
plot(fit,crit=crit,band="local")
Function to compute kernel density estimate bandwidths, as used in the simulation results in Chapter 10 of Loader (1999).
This function is included for comparative purposes only. Plug-in selectors are based on flawed logic, make unreasonable and restrictive assumptions and do not use the full power of the estimates available in Locfit. Any relation between the results produced by this function and desirable estimates is entirely coincidental.
kdeb(x, h0 = 0.01 * sd, h1 = sd, meth = c("AIC", "LCV", "LSCV", "BCV", "SJPI", "GKK"), kern = "gauss", gf = 2.5)
x | One dimensional data vector. |
h0 | Lower limit for bandwidth selection. Can be fairly small, but h0=0 would cause problems. |
h1 | Upper limit. |
meth | Required selection method(s). |
kern | Kernel. Most methods require |
gf | Standard deviation for the gaussian kernel. Default 2.5, as Locfit's standard. Most papers use 1. |
Vector of selected bandwidths.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
This function computes the mean residual life for censored data using the Kaplan-Meier estimate of the survival function. If S(t) is the K-M estimate, the MRL for a censored observation t is computed as (integral from t to infinity of S(u) du)/S(t). We take S(t)=0 when t is greater than the largest observation, regardless of whether that observation was censored.
When there are ties between censored and uncensored observations, for definiteness our ordering places the censored observations before uncensored.
This function is used by locfit.censor to compute censored regression estimates.
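The mean residual life calculation can be sketched in base R for the simplest case. `km_mrl_sketch` is a hypothetical helper, not locfit's km.mrl(): it assumes there are no tied times, takes cens as TRUE for censored observations (an assumed coding), and treats the Kaplan-Meier estimate as a right-continuous step function that is zero beyond the largest observation.

```r
# Hypothetical sketch; assumes no tied times and cens = TRUE for censored.
km_mrl_sketch <- function(times, cens) {
  ord <- order(times)
  t_s <- times[ord]
  c_s <- as.logical(cens[ord])
  n <- length(t_s)

  # Kaplan-Meier estimate just after each ordered time: only uncensored
  # observations produce a drop, with n, n-1, ..., 1 subjects at risk.
  surv <- cumprod(ifelse(c_s, 1, 1 - 1 / (n:1)))

  # Area under S(u) from each ordered time onward; S is taken as 0 beyond
  # the largest observation, so the last interval contributes nothing.
  gaps <- c(diff(t_s), 0)
  area <- rev(cumsum(rev(surv * gaps)))

  # MRL is area / S(t) for censored observations and 0 for uncensored ones.
  mrl <- ifelse(c_s, area / surv, 0)
  out <- numeric(n)
  out[ord] <- mrl
  out
}

# The censored observation at time 2 has one remaining subject, who fails
# at time 4, so its mean residual life is 2.
km_mrl_sketch(times = c(1, 2, 4), cens = c(FALSE, TRUE, FALSE))
```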
km.mrl(times, cens)
times | Observed survival times. |
cens | Logical variable indicating censoring. The coding is |
A vector of the estimated mean residual life. For uncensored observations, the corresponding estimate is 0.
Buckley, J. and James, I. (1979). Linear Regression with censored data. Biometrika 66, 429-436.
Loader, C. (1999). Local Regression and Likelihood. Springer, NY (Section 7.2).
# censored regression using the Kaplan-Meier estimate.
data(heart, package="locfit")
fit <- locfit.censor(log10(surv+0.5)~age, cens=cens, data=heart, km=TRUE)
plotbyfactor(heart$age, 0.5+heart$surv, heart$cens, ylim=c(0.5,16000), log="y")
lines(fit, tr=function(x)10^x)
The calling sequence for lcv matches that of the locfit or locfit.raw functions. The fit is not returned; instead, the returned object contains the likelihood cross validation score for the fit.
The LCV score is exact (up to numerical roundoff) if the ev="cross" argument is provided. Otherwise, the influence and cross validated residuals are computed using locfit's standard interpolation based approximations.
lcv(x, ...)
x | model formula |
... | other arguments to locfit |
The lcvplot function loops through calls to the lcv function (and hence to locfit), using a different smoothing parameter for each call. The returned structure contains the likelihood cross validation statistic for each fit, and can be used to produce an LCV plot.
lcvplot(..., alpha)
... | |
alpha | Matrix of smoothing parameters. The |
An object with class "gcvplot", containing the smoothing parameters and LCV scores. The actual plot is produced using plot.gcvplot.
locfit, locfit.raw, gcv, lcv, plot.gcvplot
data(ethanol)
plot(lcvplot(NOx~E,data=ethanol,alpha=seq(0.2,1.0,by=0.05)))
The left() function is used in a locfit model formula to specify a one-sided smooth: when fitting at a point x, only data points with x_i <= x should be used. This can be useful in estimating points of discontinuity, and in cross-validation for forecasting a time series.

left(x) is equivalent to lp(x,style="left").
When using this function, it will usually be necessary to specify an evaluation structure, since the fit is not smooth and locfit's interpolation methods are unreliable. Also, it is usually best to use deg=0 or deg=1, otherwise the fits may be too variable. If nearest neighbor bandwidth specification is used, it does not recognize left().
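The idea of a one-sided smooth can be illustrated without locfit. The sketch below is a local constant (deg=0 style) fit that only uses observations at or to the left of the fitting point; `left_smooth`, the tricube weights, the bandwidth and the step-function data are all assumptions made for illustration.

```r
# Hypothetical helper: weighted mean of y over points with x0 - h < x_i <= x0,
# using tricube weights in the distance to x0.
left_smooth <- function(x0, x, y, h) {
  u <- (x0 - x) / h
  w <- ifelse(u >= 0 & u < 1, (1 - u^3)^3, 0)
  if (sum(w) == 0) return(NA_real_)
  sum(w * y) / sum(w)
}

x <- 1:20
y <- c(rep(0, 10), rep(1, 10))   # a jump between x = 10 and x = 11

# Evaluate at explicit points rather than relying on interpolation.
xev <- x + 0.5
left_fit <- sapply(xev, left_smooth, x = x, y = y, h = 5)
```

Just before the jump the left smooth still sees only the lower level, which is why comparing left and right smooths highlights change points.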
left(x,...)
x | numeric variable. |
... | Other arguments to |
# compute left and right smooths
data(penny)
xev <- (1945:1988)+0.5
fitl <- locfit(thickness~left(year,h=10,deg=1), ev=xev, data=penny)
fitr <- locfit(thickness~right(year,h=10,deg=1),ev=xev, data=penny)
# plot the squared difference, to show the change points.
plot( xev, (predict(fitr,where="ev") - predict(fitl,where="ev"))^2 )
This function is used to specify a smooth term in a gam() model formula.

This function is designed to be used with the S-Plus gam() function. For R users, there are at least two different gam() functions available. Most current distributions of R will include the mgcv library by Simon Wood; lf() is not compatible with this function.

On CRAN, there is a gam package by Trevor Hastie, similar to the S-Plus version. lf() should be compatible with this, although it's untested.
lf(..., alpha=0.7, deg=2, scale=1, kern="tcub", ev=rbox(), maxk=100)
... | numeric predictor variable(s) |
alpha, deg, scale, kern, ev, maxk | these are as in |
locfit, locfit.raw, gam.lf, gam
## Not run:
# fit an additive semiparametric model to the ethanol data.
stopifnot(require(gam))
# The `gam' package must be attached _before_ `locfit', otherwise
# the following will not work.
data(ethanol, package = "lattice")
fit <- gam(NOx ~ lf(E) + C, data=ethanol)
op <- par(mfrow=c(2, 1))
plot(fit)
par(op)
## End(Not run)
Extracts the evaluation structure from a "locfit" object. This object has the class "lfeval", and has its own set of methods for plotting etc.
lfeval(object)
object | |
"lfeval" object.
locfit, plot.lfeval, print.lfeval
lfgrid() is used to specify evaluation on a grid of points for locfit.raw(). The structure computes a bounding box for the data, and divides that into a grid with specified margins.
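The grid described above can be pictured with a small base R sketch: given lower left and upper right corners and a number of points per margin, form every combination of the margin values. `grid_points` is a hypothetical helper and only mimics the idea; lfgrid() itself returns an evaluation structure for locfit.raw(), not a matrix of points.

```r
# Hypothetical helper, not locfit's lfgrid(): regular grid over a bounding box.
grid_points <- function(ll, ur, mg = 10) {
  mg <- rep(mg, length.out = length(ll))      # one count per dimension
  margins <- Map(function(lo, hi, m) seq(lo, hi, length.out = m), ll, ur, mg)
  as.matrix(expand.grid(margins))
}

# Assumed bounding box: [0, 1] x [10, 20], with 3 and 5 points per margin.
pts <- grid_points(ll = c(0, 10), ur = c(1, 20), mg = c(3, 5))
dim(pts)   # 15 grid points in 2 dimensions
```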
lfgrid(mg=10, ll, ur)
mg | Number of grid points along each margin. Can be a single number (which is applied in each dimension), or a vector specifying a value for each dimension. |
ll | Lower left limits for the grid. Length should be the number of dimensions of the data provided to |
ur | Upper right limits for the grid. By default, |
data(ethanol, package="locfit")
plot.eval(locfit(NOx ~ lp(E, C, scale=TRUE), data=ethanol, ev=lfgrid()))
Extracts information, such as fitted values and influence functions, from a "locfit" object.
lfknots(x, tr, what = c("x", "coef", "h", "nlx"), delete.pv = TRUE)
x | Fitted object from |
tr | Back transformation. Default is the inverse link function from the Locfit object. |
what | What to return; default is |
delete.pv | If |
A matrix with one row for each fit point. Columns correspond to the specified what vector; some fields contribute multiple columns.
This function is used internally to interpret xlim and flim arguments. It should not be called directly.
lflim(limits, nm, ret)
limits | Limit argument. |
nm | Variable names. |
ret | Initial return vector. |
Vector with length 2*dim.
This function is usually called by plot.locfit.
lfmarg(xlim, m = 40)
xlim | Vector of limits for the grid. Should be of length 2*d; the first d components represent the lower left corner, and the next d components the upper right corner. Can also be a |
m | Number of points for each grid margin. Can be a vector of length d. |
A list, whose components are the d grid margins.
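The layout of xlim (first d entries for the lower left corner, next d for the upper right) and the returned list of margins can be sketched in base R. `grid_margins` is a hypothetical stand-in for illustration; it handles only the numeric-vector form of xlim.

```r
# Hypothetical helper, not locfit's lfmarg(): one equally spaced margin per
# dimension, from xlim[i] (lower) to xlim[d + i] (upper).
grid_margins <- function(xlim, m = 40) {
  d <- length(xlim) / 2
  m <- rep(m, length.out = d)
  lapply(seq_len(d), function(i) seq(xlim[i], xlim[d + i], length.out = m[i]))
}

# Two dimensions: lower left corner (0, 10), upper right corner (1, 20).
mar <- grid_margins(c(0, 10, 1, 20), m = c(3, 5))
```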
Adds a Locfit line to an existing plot. llines is for use within a panel function for Lattice.
## S3 method for class 'locfit'
lines(x, m=100, tr=x$trans, ...)
## S3 method for class 'locfit'
llines(x, m=100, tr=x$trans, ...)
x | |
m | Number of points to evaluate the line at. |
tr | Transformation function to use for plotting. Default is the inverse link function, or the identity function if derivatives are required. |
... | Other arguments to the default |
Survival times for 622 patients diagnosed with Liver Metastases.

Beware, the censoring variable is coded as 1 = uncensored, so use cens=1-z in locfit() calls.
data(livmet)
Data frame with survival times (t), censoring indicator (z) and a number of covariates.
Haupt and Mansmann (1995)
Haupt, G. and Mansmann, U. (1995) CART for Survival Data. Statlib Archive.
locfit is the model formula-based interface to the Locfit library for fitting local regression and likelihood models.

locfit is implemented as a front-end to locfit.raw. See that function for options to control smoothing parameters, fitting family and other aspects of the fit.
locfit(formula, data=sys.frame(sys.parent()), weights=1, cens=0, base=0, subset, geth=FALSE, ..., lfproc=locfit.raw)
formula | Model Formula; e.g. |
data | Data Frame. |
weights | Prior weights (or sample sizes) for individual observations. This is typically used where observations have unequal variance. |
cens | Censoring indicator. |
base | Baseline for local fitting. For local regression models, specifying a |
subset | Subset observations in the data frame. |
geth | Don't use. |
... | Other arguments to |
lfproc | A processing function to compute the local fit. Default is |
An object with class "locfit"
. A standard set of methods for printing,
plotting, etc. for these objects is provided.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
# fit and plot a univariate local regression
data(ethanol, package="locfit")
fit <- locfit(NOx ~ E, data=ethanol)
plot(fit, get.data=TRUE)
# a bivariate local regression with smaller smoothing parameter
fit <- locfit(NOx~lp(E,C,nn=0.5,scale=0), data=ethanol)
plot(fit)
# density estimation
data(geyser, package="locfit")
fit <- locfit( ~ lp(geyser, nn=0.1, h=0.8))
plot(fit,get.data=TRUE)
locfit.censor
produces local regression estimates for censored
data. The basic idea is to use an EM style algorithm, where one
alternates between estimating the regression and the true values
of censored observations.
locfit.censor
is designed as a front end
to locfit.raw
with data vectors, or as an intermediary
between locfit
and locfit.raw
with a
model formula. If you can stand the syntax, the second calling
sequence above will be slightly more efficient than the third.
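The alternation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `lm()` stands in for the local regression step, the errors are assumed normal, censoring is right-censoring with `TRUE` meaning censored, and the names `cpt` and `yimp` are hypothetical.

```r
# Sketch of an EM-style loop for right-censored regression (base R only).
# Assumptions: normal errors, lm() in place of a local fit, TRUE = censored.
set.seed(3)
n <- 200
x <- runif(n)
ytrue <- 1 + 2 * x + rnorm(n, sd = 0.5)
cpt <- 2.5                      # hypothetical fixed censoring point
cens <- ytrue > cpt             # censoring indicator
y <- pmin(ytrue, cpt)           # observed response

yimp <- y                       # working response, updated each iteration
for (i in 1:10) {
  fit <- lm(yimp ~ x)           # step 1: estimate the regression
  mu <- fitted(fit)
  s <- summary(fit)$sigma
  z <- (y - mu) / s
  # step 2: replace censored values by E[Y | Y > c] under the normal model
  haz <- dnorm(z) / pnorm(z, lower.tail = FALSE)
  yimp <- ifelse(cens, mu + s * haz, y)
}
coef(fit)
```

Each imputed value lies above its censoring point, since the conditional mean of a normal variable given that it exceeds `c` is always larger than `c`.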
locfit.censor(x, y, cens, ..., iter=3, km=FALSE)
x |
Either a |
y |
If |
cens |
Logical variable indicating censoring. The coding is |
... |
Other arguments to |
iter |
Number of EM iterations to perform |
km |
If |
locfit
object.
Buckley, J. and James, I. (1979). Linear Regression with censored data. Biometrika 66, 429-436.
Loader, C. (1999). Local Regression and Likelihood. Springer, NY (Section 7.2).
Schmee, J. and Hahn, G. J. (1979). A simple method for linear regression analysis with censored data (with discussion). Technometrics 21, 417-434.
data(heart, package="locfit")
fit <- locfit.censor(log10(surv+0.5) ~ age, cens=cens, data=heart)
## Can also be written as:
## Not run: fit <- locfit(log10(surv + 0.5) ~ age, cens=cens, data=heart, lfproc=locfit.censor)
with(heart, plotbyfactor(age, 0.5 + surv, cens, ylim=c(0.5, 16000), log="y"))
lines(fit, tr=function(x) 10^x)
Reconstructs the model matrix, and associated variables such as
the response, prior weights and censoring indicators, from a
locfit
object. This is used by functions such as
fitted.locfit
; it is not normally called directly.
The function will only work properly if the data frame has not been
changed since the fit was constructed.
locfit.matrix(fit, data)
fit |
Locfit object |
data |
Data Frame. |
A list with variables x
(the model matrix); y
(the response);
w
(prior weights); sc
(scales); ce
(censoring indicator)
and base
(baseline fit).
locfit
, fitted.locfit
, residuals.locfit
locfit.quasi
assumes a specified mean-variance relation,
and performs iteratively reweighted local regression under this
assumption. This is appropriate for local quasi-likelihood models,
and is an alternative to specifying a family such as "qpoisson"
.
locfit.quasi
is designed as a front end
to locfit.raw
with data vectors, or as an intemediary
between locfit
and locfit.raw
with a
model formula. If you can stand the syntax, the second calling
sequence above will be slightly more efficient than the third.
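A minimal base R sketch of iteratively reweighted fitting under an assumed mean-variance relation follows. It is an illustration only: `lm()` stands in for the local regression step, and taking the weights as the reciprocal of the assumed variance is an assumption about the scheme, not a description of the package's code. The name `varfun` is hypothetical and plays the role of the `var` argument.

```r
# Sketch of iteratively reweighted regression with variance = varfun(mean).
# Assumptions: lm() in place of a local fit; weights = 1 / assumed variance.
set.seed(4)
x <- runif(200, 1, 3)
mu_true <- 2 + 3 * x
y <- mu_true + rnorm(200, sd = sqrt(mu_true))  # variance grows with the mean

varfun <- abs                   # assumed mean-variance relation
w <- rep(1, length(y))          # start from equal weights
for (i in 1:3) {
  fit <- lm(y ~ x, weights = w)
  w <- 1 / varfun(fitted(fit))  # reweight using the current fitted means
}
coef(fit)
```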
locfit.quasi(x, y, weights, ..., iter=3, var=abs)
x |
Either a |
y |
If |
weights |
Case weights to use in the fitting. |
... |
Other arguments to |
iter |
Number of EM iterations to perform |
var |
Function specifying the assumed relation between the mean and variance. |
"locfit"
object.
locfit.raw
is an interface to Locfit using numeric vectors
(for a model-formula based interface, use locfit
).
Although this function has a large number of arguments, most users
are likely to need only a small subset.
The first set of arguments (x
, y
, weights
,
cens
, and base
) specify the regression
variables and associated quantities.
Another set (scale
, alpha
, deg
, kern
,
kt
, acri
and basis
) control the amount of smoothing:
bandwidth, smoothing weights and the local model. Most of these arguments
are deprecated - they'll currently still work, but should be provided through
the lp()
model term instead.
deriv
and dc
relate to derivative (or local slope)
estimation.
family
and link
specify the likelihood family.
xlim
and renorm
may be used in density estimation.
ev
specifies the evaluation structure or set of evaluation points.
maxk
, itype
, mint
, maxit
and debug
control the Locfit algorithms, and will be rarely used.
geth
and sty
are used by other functions calling
locfit.raw
, and should not be used directly.
locfit.raw(x, y, weights=1, cens=0, base=0,
  scale=FALSE, alpha=0.7, deg=2, kern="tricube", kt="sph",
  acri="none", basis=list(NULL),
  deriv=numeric(0), dc=FALSE,
  family, link="default",
  xlim, renorm=FALSE,
  ev=rbox(),
  maxk=100, itype="default", mint=20, maxit=20, debug=0,
  geth=FALSE, sty="none")
x |
Vector (or matrix) of the independent variable(s). Can be constructed using the
|
y |
Response variable for regression models. For density families,
|
weights |
Prior weights for observations (reciprocal of variance, or sample size). |
cens |
Censoring indicators for hazard rate or censored regression. The coding
is |
base |
Baseline parameter estimate. If provided, the local regression model is
fitted as |
scale |
Deprecated - see |
alpha |
Deprecated - see |
deg |
Degree of local polynomial. Deprecated - see |
kern |
Weight function, default = |
kt |
Kernel type, |
acri |
Deprecated - see |
basis |
User-specified basis functions. |
deriv |
Derivative estimation. If |
dc |
Derivative adjustment. |
family |
Local likelihood family; |
link |
Link function for local likelihood fitting. Depending on the family,
choices may be |
xlim |
For density estimation, Locfit allows the density to be supported on
a bounded interval (or rectangle, in more than one dimension).
The format should be |
renorm |
Local likelihood density estimates may not integrate
exactly to 1. If |
ev |
The evaluation structure,
|
maxk |
Controls space assignment for evaluation structures.
For the adaptive evaluation structures, it is impossible to be sure
in advance how many vertices will be generated. If you get
warnings about ‘Insufficient vertex space’, Locfit's default assignment
can be increased by increasing |
itype |
Integration type for density estimation. Available methods include
|
mint |
Points for numerical integration rules. Default 20. |
maxit |
Maximum iterations for local likelihood estimation. Default 20. |
debug |
If > 0; prints out some debugging information. |
geth |
Don't use! |
sty |
Deprecated - see |
An object with class "locfit". A standard set of methods for printing, plotting, etc. for these objects is provided.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
locfit.robust
implements a robust local regression where
outliers are iteratively identified and downweighted, similarly
to the lowess method (Cleveland, 1979). The iterations and scale
estimation are performed on a global basis.
The scale estimate is 6 times the median absolute residual, while the robust downweighting uses the bisquare function. Both are implemented in the S code, so they are easily changed.
This can be interpreted as an extension of M estimation to local
regression. An alternative extension (implemented in locfit via
family="qrgauss"
) performs the iteration and scale estimation
on a local basis.
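The global iteration described above can be sketched in base R as follows. This is a hedged illustration rather than the package's code: a cubic `lm()` fit stands in for the local regression, and the helper names `bisquare` and `robust_fit` are hypothetical.

```r
# Sketch of globally iterated robust reweighting (base R only).
# Assumption: a cubic lm() fit stands in for the local regression step.
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.1)
y[c(10, 30)] <- y[c(10, 30)] + 3        # two artificial outliers

bisquare <- function(u) ifelse(abs(u) < 1, (1 - u^2)^2, 0)

robust_fit <- function(x, y, iter = 3) {
  w <- rep(1, length(y))
  for (i in seq_len(iter)) {
    fit <- lm(y ~ x + I(x^2) + I(x^3), weights = w)
    r <- y - fitted(fit)
    s <- 6 * median(abs(r))             # global scale estimate
    w <- bisquare(r / s)                # downweight large residuals
  }
  list(fit = fit, weights = w)
}

out <- robust_fit(x, y)
round(out$weights[c(10, 30)], 3)        # weights given to the two outliers
```

Because the scale is computed once per iteration from all residuals, every observation is judged against the same yardstick. That is the sense in which the iteration is global.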
locfit.robust(x, y, weights, ..., iter=3)
x |
Either a |
y |
If |
weights |
weights to use in the fitting. |
... |
Other arguments to |
iter |
Number of iterations to perform |
"locfit"
object.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assn. 74, 829-836.
lp
is a local polynomial model term for Locfit models.
Usually, it will be the only term on the RHS of the model formula.
Smoothing parameters should be provided as arguments to lp()
,
rather than to locfit()
.
lp(..., nn, h, adpen, deg, acri, scale, style)
... |
Predictor variables for the local regression model. |
nn |
Nearest neighbor component of the smoothing parameter.
Default value is 0.7, unless either |
h |
The constant component of the smoothing parameter. Default: 0. |
adpen |
Penalty parameter for adaptive fitting. |
deg |
Degree of polynomial to use. |
acri |
Criterion for adaptive bandwidth selection. |
style |
Style for special terms ( |
scale |
A scale to apply to each variable. This is especially important for
multivariate fitting, where variables may be measured in
non-comparable units. It is also used to specify the frequency
for |
data(ethanol, package="locfit")
# fit with 50% nearest neighbor bandwidth.
fit <- locfit(NOx~lp(E,nn=0.5),data=ethanol)
# bivariate fit.
fit <- locfit(NOx~lp(E,C,scale=TRUE),data=ethanol)
# density estimation
data(geyser, package="locfit")
fit <- locfit.raw(lp(geyser,nn=0.1,h=0.8))
The calling sequence for lscv
matches those for the
locfit
or locfit.raw
functions.
Note that this function is only designed for density estimation
in one dimension. The returned object contains the
least squares cross validation score for the fit.
The computation of $\int \hat f(x)^2 \, dx$ is performed numerically.
For kernel density estimation, this is unlikely to agree exactly
with other LSCV routines, which may perform the integration analytically.
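To illustrate why a numerical and an analytic LSCV can differ slightly, here is a base R sketch for a Gaussian kernel density estimate. It is not Locfit's routine. In this sketch `h` is taken directly as the kernel standard deviation, which is an assumption and differs from Locfit's bandwidth convention, and the function names are hypothetical.

```r
# Sketch only: LSCV = integral(fhat^2) - 2 * mean(leave-one-out density).
# Assumption: h is the Gaussian kernel standard deviation.
loo_term <- function(x, h) {
  n <- length(x)
  K <- dnorm(outer(x, x, "-"), sd = h)
  mean((rowSums(K) - diag(K)) / (n - 1))   # average leave-one-out density
}

lscv_numeric <- function(x, h, m = 4000) {
  grid <- seq(min(x) - 6 * h, max(x) + 6 * h, length.out = m)
  fhat <- colMeans(dnorm(outer(x, grid, "-"), sd = h))  # density on the grid
  f2 <- fhat^2
  int_f2 <- sum((f2[-1] + f2[-m]) / 2) * (grid[2] - grid[1])  # trapezoid rule
  int_f2 - 2 * loo_term(x, h)
}

lscv_analytic <- function(x, h) {
  # closed form: average of N(0, 2 h^2) densities over all pairs x_i - x_j
  int_f2 <- mean(dnorm(outer(x, x, "-"), sd = sqrt(2) * h))
  int_f2 - 2 * loo_term(x, h)
}

set.seed(1)
x <- rnorm(100)
c(numeric = lscv_numeric(x, 0.3), analytic = lscv_analytic(x, 0.3))
```

The two values should agree closely on a fine grid; a coarser grid or a different integration rule would widen the gap.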
lscv(x, ..., exact=FALSE)
x |
model formula (or numeric vector, if |
... |
other arguments to |
exact |
By default, the computation is approximate.
If |
A vector consisting of the LSCV statistic and fitted degrees of freedom.
locfit
,
locfit.raw
,
lscv.exact
lscvplot
# approximate calculation for a kernel density estimate
data(geyser, package="locfit")
lscv(~lp(geyser,h=1,deg=0), ev=lfgrid(100,ll=1,ur=6), kern="gauss")
# same computation, exact
lscv(lp(geyser,h=1),exact=TRUE)
This function performs the exact computation of the least squares cross validation statistic for one-dimensional kernel density estimation and a constant bandwidth.
At the time of writing, it is implemented only for the Gaussian kernel (with the standard deviation of 0.4; Locfit's standard).
lscv.exact(x, h=0)
x |
Numeric data vector. |
h |
The bandwidth. If |
A vector of the LSCV statistic and the fitted degrees of freedom.
data(geyser, package="locfit")
lscv.exact(lp(geyser,h=0.25))
# equivalent form using lscv
lscv(lp(geyser, h=0.25), exact=TRUE)
The lscvplot
function loops through calls to the lscv
function (and hence to locfit
), using a different
smoothing parameter for each call.
The returned structure contains the LSCV statistic for each density
estimate, and can be used to produce an LSCV plot.
lscvplot(..., alpha)
... |
|
alpha |
Matrix of smoothing parameters. The |
An object with class "gcvplot"
, containing the smoothing
parameters and LSCV scores. The actual plot is produced using
plot.gcvplot
.
locfit
,
locfit.raw
,
gcv
,
lscv
,
plot.gcvplot
Measurements of the acceleration of a motorcycle as it hits a wall. Actually, rumored to be a concatenation of several such datasets.
data(mcyc)
Data frame with time and accel variables.
Härdle (1990).
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press.
The number of fractures in the upper seam of coal mines, and four predictor variables. This dataset can be modeled using Poisson regression.
data(mine)
A dataframe with the response frac, and predictor variables extrp, time, seamh and inb.
Myers (1990).
Myers, R. H. (1990). Classical and Modern Regression with Applications (Second edition). PWS-Kent Publishing, Boston.
50 observations, as used in Figure 13.1 of Loader (1999).
data(cltest)
Data Frame with x and y variables.
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Observed mortality for ages 55 to 99.
data(morths)
Data frame with age, n and number of deaths.
Henderson and Sheppard (1919).
Henderson, R. and Sheppard, H. N. (1919). Graduation of mortality and other tables. Actuarial Society of America, New York.
none()
is an evaluation structure for locfit.raw()
,
specifying no evaluation points. Only the initial parametric fit is
computed - this is the easiest and most efficient way to coerce
Locfit into producing a parametric regression fit.
none()
data(ethanol, package="locfit")
# fit a fourth degree polynomial using locfit
fit <- locfit(NOx~E,data=ethanol,deg=4,ev=none())
plot(fit,get.data=TRUE)
For each year, 1945 to 1989, the thickness of two U.S. pennies was recorded.
data(penny)
A dataframe.
Scott (1992).
Scott (1992). Multivariate Density Estimation. Wiley, New York.
This function is used to plot the evaluation structure generated by
Locfit for a two dimensional fit. Vertices of the tree structure are
displayed as O
; pseudo-vertices as *
.
plot.eval(x, add=FALSE, text=FALSE, ...)
x |
|
add |
If |
text |
If |
... |
Arguments passed to and from other methods. |
data(ethanol, package="locfit")
fit <- locfit(NOx ~ E + C, data=ethanol, scale=0)
plot.eval(fit)
Plots the value of the GCV (or other statistic) in a gcvplot
object
against the degrees of freedom of the fit.
## S3 method for class 'gcvplot'
plot(x, xlab = "Fitted DF", ylab = x$cri, ...)
x |
|
xlab |
Text label for the x axis. |
ylab |
Text label for the y axis. |
... |
Other arguments to |
locfit
,
locfit.raw
,
gcv
,
aicplot
,
cpplot
,
gcvplot
,
lcvplot
data(ethanol)
plot(gcvplot(NOx~E,data=ethanol,alpha=seq(0.2,1.0,by=0.05)))
Plots the evaluation points from a locfit
or lfeval
structure, for one- or two-dimensional fits.
## S3 method for class 'lfeval'
plot(x, add=FALSE, txt=FALSE, ...)
x |
A |
add |
If |
txt |
If |
... |
Additional graphical parameters. |
"lfeval"
object.
The plot.locfit
function generates grids of plotting points, followed
by a call to preplot.locfit
. The returned object is then
passed to plot.locfit.1d
, plot.locfit.2d
or
plot.locfit.3d
as appropriate.
## S3 method for class 'locfit'
plot(x, xlim, pv, tv, m, mtv=6, band="none", tr=NULL,
  what = "coef", get.data=FALSE, f3d=(d == 2) && (length(tv) > 0), ...)
x |
locfit object. |
xlim |
Plotting limits. Eg. |
pv |
Panel variables, to be varied within each panel of a plot. May be specified as a character vector of variable names, or as variable numbers. There must be one or two panel variables; the default is all variables in one or two dimensions, and variable 1 in three or more dimensions. |
tv |
Trellis variables, to be varied from panel to panel of the plot. |
m |
Controls the plot resolution (within panels, for trellis displays). Default is 100 points in one dimension; 40 points (per dimension) in two or more dimensions. |
mtv |
Number of points for trellis variables; default 6. |
band |
Type of confidence bands to add to the plot. Default is |
tr |
Transformation function to use for plotting. Default is the inverse link function, or the identity function if derivatives are requested. |
what |
What to plot. See |
get.data |
If |
f3d |
Force the |
... |
Other arguments to |
locfit
, plot.locfit.1d
,
plot.locfit.2d
, plot.locfit.3d
,
lines.locfit
, predict.locfit
,
preplot.locfit
x <- rnorm(100)
y <- dnorm(x) + rnorm(100) / 5
plot(locfit(y~x), band="global")
x <- cbind(rnorm(100), rnorm(100))
plot(locfit(~x), type="persp")
This function is not usually called directly. It will be called automatically
when plotting a one-dimensional locfit
or preplot.locfit
object.
## S3 method for class 'locfit.1d'
plot(x, add=FALSE, main="", xlab="default", ylab=x$yname,
  type="l", ylim, lty=1, col=1, ...)
x |
One dimensional |
add |
If |
main , xlab , ylab , type , ylim , lty , col
|
Graphical parameters
passed on to |
... |
Additional graphical parameters to the |
locfit
, plot.locfit
, preplot.locfit
This function is not usually called directly. It will be called automatically
when plotting two-dimensional locfit
or preplot.locfit
objects.
## S3 method for class 'locfit.2d'
plot(x, type="contour", main, xlab, ylab, zlab=x$yname, ...)
x |
Two dimensional |
type |
one of |
main |
title for the plot. |
xlab , ylab
|
text labels for the x- and y-axes. |
zlab |
if |
... |
Additional arguments to the |
locfit
, plot.locfit
, preplot.locfit
This function plots cross-sections of a Locfit model (usually in three
or more dimensions) using trellis displays. It is not usually called
directly, but is invoked by plot.locfit
.
The R libraries lattice
and grid
provide a partial
(at time of writing) implementation of trellis. Currently, this works
with one panel variable.
## S3 method for class 'locfit.3d'
plot(x, main="", pv, tv, type = "level", pred.lab = x$vnames,
  resp.lab=x$yname, crit = 1.96, ...)
x |
|
main |
title for the plot. |
pv |
Panel variables. These are the variables (either one or two) that are varied within each panel of the display. |
tv |
Trellis variables. These are varied from panel to panel of the display. |
type |
Type of display. When there are two panel variables,
the choices are |
pred.lab |
label for the predictor variable. |
resp.lab |
label for the response variable. |
crit |
critical value for the confidence level. |
... |
graphical parameters passed to |
plot.locfit
,
preplot.locfit
The plot.locfit()
function is implemented, roughly, as
a call to preplot.locfit()
, followed by a call to
plot.locfitpred()
. For most users, there will be little
need to call plot.locfitpred()
directly.
## S3 method for class 'preplot.locfit'
plot(x, pv, tv, ...)
x |
A |
pv , tv , ...
|
Other arguments to |
locfit
, plot.locfit
,
preplot.locfit
, plot.locfit.1d
,
plot.locfit.2d
, plot.locfit.3d
.
Plot method for simultaneous confidence bands created by the
scb
function.
## S3 method for class 'scb'
plot(x, add=FALSE, ...)
x |
|
add |
If |
... |
Arguments passed to and from other methods. |
# corrected confidence bands for a linear logistic model
data(insect)
fit <- scb(deaths ~ lconc, type=4, w=nins, data=insect,
  deg=1, family="binomial", kern="parm")
plot(fit)
Produces a scatter plot of x-y data, with different classes given by a factor f. The different classes are identified by different colours and/or symbols.
plotbyfactor(x, y, f, data, col = 1:10, pch = "O", add = FALSE, lg,
  xlab = deparse(substitute(x)), ylab = deparse(substitute(y)), log = "", ...)
x |
Variable for x axis. |
y |
Variable for y axis. |
f |
Factor (or variable for which as.factor() works). |
data |
data frame for variables x, y, f. Default: sys.parent(). |
col |
Color numbers to use in plot. Will be replicated if shorter than the number of levels of the factor f. Default: 1:10. |
pch |
Vector of plot characters. Replicated if necessary. Default: "O". |
add |
If |
lg |
Coordinates to place a legend. Default: Missing (no legend). |
xlab , ylab
|
Axes labels. |
log |
Should the axes be in log scale? Use |
... |
Other graphical parameters, labels, titles, etc. |
data(iris)
plotbyfactor(petal.wid, petal.len, species, data=iris)
This function shows the points at which the local fit was computed directly, rather than being interpolated. This can be useful if one is unsure of the validity of interpolation.
## S3 method for class 'locfit'
points(x, tr, ...)
x |
|
tr |
Back transformation. |
... |
Other arguments to the default |
The locfit
function computes a local fit at a selected set
of points (as defined by the ev
argument). The predict.locfit
function is used to interpolate from these points to any other points.
The method is based on cubic Hermite polynomial interpolation, using the
estimates and local slopes at each fit point.
The motivation for this two-step procedure is computational speed.
Depending on the sample size, dimension and fitting procedure, the
local fitting method can be expensive, and it is desirable to keep the
number of points at which the direct fit is computed to a minimum.
The interpolation method used by predict.locfit()
is usually
much faster, and can be computed at larger numbers of points.
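The idea of interpolating from values and slopes at a few fit points can be sketched with `splinefunH()` from base R's stats package. This is an illustration of cubic Hermite interpolation in one dimension only, not Locfit's internal routine; `sin()` and `cos()` stand in for the estimates and local slopes, and the variable names are hypothetical.

```r
# Sketch: cubic Hermite interpolation from values and slopes at fit points.
# Assumption: sin() plays the role of the fit, cos() its local slope.
xev <- seq(0, pi, length.out = 6)   # a small set of "fit points"
val <- sin(xev)                     # estimates at the fit points
slp <- cos(xev)                     # local slopes at the fit points

interp <- splinefunH(xev, val, m = slp)   # cubic Hermite interpolant

xnew <- seq(0, pi, length.out = 101)      # many prediction points
err <- max(abs(interp(xnew) - sin(xnew))) # worst-case interpolation error
err
```

Only six direct evaluations are needed here, while predictions at 101 points come from the interpolant. That is the speed trade-off the text describes.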
## S3 method for class 'locfit'
predict(object, newdata=NULL, where = "fitp",
  se.fit=FALSE, band="none", what="coef", ...)
object |
Fitted object from |
newdata |
Points to predict at. Can be given in several forms: vector/matrix; list, data frame. |
se.fit |
If |
where , what , band
|
arguments passed on to
|
... |
Additional arguments to |
If se.fit=F
, a numeric vector of predictions.
If se.fit=T
, a list with components fit
, se.fit
and
residual.scale
.
data(ethanol, package="locfit")
fit <- locfit(NOx ~ E, data=ethanol)
predict(fit,c(0.6,0.8,1.0))
preplot.locfit
can be called directly, although it is more usual
to call plot.locfit
or predict.locfit
.
The advantage of preplot.locfit
is in S-Plus 5, where arithmetic
and transformations can be performed on the "preplot.locfit"
object.
plot(preplot(fit))
is essentially synonymous with plot(fit)
.
## S3 method for class 'locfit'
preplot(object, newdata=NULL, where, tr=NULL, what="coef",
  band="none", get.data=FALSE, f3d=FALSE, ...)
object |
Fitted object from |
newdata |
Points to predict at. Can be given in several forms: vector/matrix; list, data frame. |
where |
An alternative to |
tr |
Transformation for likelihood models. Default is the inverse of the link function. |
what |
What to compute predicted values of. The default,
|
band |
Compute standard errors for the fit and include confidence
bands on the returned object. Default is |
get.data |
If |
f3d |
If |
... |
arguments passed to and from other methods. |
An object with class "preplot.locfit"
, containing the predicted
values and additional information used to construct the plot.
locfit
, predict.locfit
, plot.locfit
.
preplot.locfit.raw
is an internal function used by
predict.locfit
and preplot.locfit
.
It should not normally be called directly.
## S3 method for class 'locfit.raw'
preplot(object, newdata, where, what, band, ...)
object |
Fitted object from |
newdata |
New data points. |
where |
Type of data provided in |
what |
What to compute predicted values of. |
band |
Compute standard errors for the fit and include confidence bands on the returned object. |
... |
Arguments passed to and from other methods. |
A list containing raw output from the internal prediction routines.
locfit
, predict.locfit
, preplot.locfit
.
Print method for "gcvplot"
objects. Actually, equivalent to
plot.gcvplot()
.
## S3 method for class 'gcvplot'
print(x, ...)
x |
|
... |
Arguments passed to and from other methods. |
gcvplot
,
plot.gcvplot
summary.gcvplot
Prints a matrix of the evaluation points from a locfit
or lfeval
structure.
## S3 method for class 'lfeval'
print(x, ...)
x |
A |
... |
Arguments passed to and from other methods. |
Matrix of the fit points.
Prints a short summary of a "locfit"
object.
## S3 method for class 'locfit'
print(x, ...)
x |
|
... |
Arguments passed to and from other methods. |
Print method for objects created by the
preplot.locfit
function.
## S3 method for class 'preplot.locfit'
print(x, ...)
x |
|
... |
Arguments passed to and from other methods. |
preplot.locfit
,
predict.locfit
Print method for simultaneous confidence bands created by the
scb
function.
## S3 method for class 'scb'
print(x, ...)
x |
|
... |
Arguments passed to and from other methods. |
Print method for "summary.locfit"
objects.
## S3 method for class 'summary.locfit'
print(x, ...)
x |
Object from |
... |
Arguments passed to and from methods. |
rbox()
is used to specify a rectangular box evaluation
structure for locfit.raw()
. The structure begins
by generating a bounding box for the data, then recursively divides
the box to a desired precision.
rbox(cut=0.8, type="tree", ll, ur)
type |
If |
cut |
Precision of the tree; a smaller value of |
ll |
Lower left corner of the initial cell. Length should be the number
of dimensions of the data provided to |
ur |
Upper right corner of the initial cell. By default, |
Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
Cleveland, W. and Grosse, E. (1991). Computational Methods for Local Regression. Statistics and Computing 1.
data(ethanol, package="locfit")
plot.eval(locfit(NOx~E+C,data=ethanol,scale=0,ev=rbox(cut=0.8)))
plot.eval(locfit(NOx~E+C,data=ethanol,scale=0,ev=rbox(cut=0.3)))
Function to compute local regression bandwidths for local linear regression,
implemented as a front end to locfit()
.
This function is included for comparative purposes only. Plug-in selectors are based on flawed logic, make unreasonable and restrictive assumptions and do not use the full power of the estimates available in Locfit. Any relation between the results produced by this function and desirable estimates is entirely coincidental.
regband(formula, what = c("CP", "GCV", "GKK", "RSW"), deg=1, ...)
formula |
Model Formula (one predictor). |
what |
Methods to use. |
deg |
Degree of fit. |
... |
Other Locfit options. |
Vector of selected bandwidths.
residuals.locfit
is implemented as a front-end to
fitted.locfit
, with the type
argument set.
## S3 method for class 'locfit'
residuals(object, data=NULL, type="deviance", ...)
object |
|
data |
The data frame for the original fit. Usually, shouldn't be needed. |
type |
Type of fit or residuals to compute. The default is
|
... |
arguments passed to and from other methods. |
A numeric vector of the residuals.
The right()
function is used in a locfit model formula
to specify a one-sided smooth: when fitting at a point $x$,
only data points with $x_i \ge x$
should be used.
This can be useful in estimating points of discontinuity,
and in cross-validation for forecasting a time series.
right(x)
is equivalent to lp(x,style="right")
.
When using this function, it will usually be necessary to specify an
evaluation structure, since the fit is not smooth and locfit's
interpolation methods are unreliable. Also, it is usually best
to use deg=0
or deg=1
, otherwise the fits may be too
variable. If nearest neighbor bandwidth specification is used,
it does not recognize right()
.
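A one-sided smooth and its use for locating a discontinuity can be sketched in base R. This is an assumption-laden illustration, not the package's method: it uses a local constant (degree 0) fit with tricube weights and a fixed bandwidth, and the helper names `tricube` and `one_sided` are hypothetical.

```r
# Sketch: one-sided local constant smooths with tricube weights.
tricube <- function(u) ifelse(abs(u) < 1, (1 - abs(u)^3)^3, 0)

# side = +1 uses only x_i >= t (right); side = -1 uses only x_i <= t (left)
one_sided <- function(t, x, y, h, side = 1) {
  sapply(t, function(t0) {
    u <- (x - t0) / h
    w <- tricube(u) * (side * u >= 0)
    if (sum(w) > 0) sum(w * y) / sum(w) else NA_real_
  })
}

set.seed(5)
x <- 1:100
y <- as.numeric(x > 50) + rnorm(100, sd = 0.05)  # jump between 50 and 51
xev <- seq(10.5, 90.5, by = 1)                   # explicit evaluation points

d2 <- (one_sided(xev, x, y, h = 10, side = 1) -
       one_sided(xev, x, y, h = 10, side = -1))^2
xev[which.max(d2)]                               # location of largest gap
```

The squared difference between the right and left smooths is largest where the two sides disagree most, which is how a change point shows up.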
right(x,...)
x |
numeric variable. |
... |
Other arguments to |
# compute left and right smooths
data(penny)
xev <- (1945:1988)+0.5
fitl <- locfit(thickness~left(year,h=10,deg=1), ev=xev, data=penny)
fitr <- locfit(thickness~right(year, h=10, deg=1), ev=xev, data=penny)
# plot the squared difference, to show the change points.
plot( xev, (predict(fitr, where="ev") - predict(fitl, where="ev"))^2 )
As part of the locfit
fitting procedure, an estimate
of the residual variance is computed; the rv
function extracts
the variance from the "locfit"
object.
The estimate used is the residual sum of squares
(or residual deviance, for quasi-likelihood models),
divided by the residual degrees of freedom.
For likelihood (not quasi-likelihood) models, the estimate is 1.0.
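As a parametric analogue of this estimate, the following base R sketch divides the residual sum of squares by the residual degrees of freedom for an ordinary linear model. It only illustrates the ratio itself; how Locfit defines residual degrees of freedom for a local fit is not shown here, and the variable names are hypothetical.

```r
# Sketch: residual variance = residual sum of squares / residual d.f.
# Assumption: an ordinary lm() fit is used in place of a local regression.
set.seed(2)
x <- runif(80)
y <- 1 + 2 * x + rnorm(80, sd = 0.5)

fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)    # residual sum of squares
rdf <- df.residual(fit)         # residual degrees of freedom (n - 2 here)
rv_hat <- rss / rdf
rv_hat
```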
rv(fit)
fit |
|
Returns the residual variance estimate from the "locfit"
object.
data(ethanol)
fit <- locfit(NOx~E,data=ethanol)
rv(fit)
By default, Locfit uses the normalized residual sum of squares as the variance estimate when constructing confidence intervals. In some cases, the user may like to use alternative variance estimates; this function allows the default value to be changed.
rv(fit) <- value
fit | |
value | numeric replacement value. |
locfit(), rv(), plot.locfit()
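As a brief illustration of the replacement form, the sketch below fits the ethanol model used in the rv() example, records the default variance estimate, and then substitutes a different value. The replacement value 0.1 is an arbitrary number chosen for illustration only, not a recommended estimate, and the variable name rv.default is hypothetical.

```r
library(locfit)

data(ethanol)
fit <- locfit(NOx~E, data=ethanol)

# default estimate stored with the fit
rv.default <- rv(fit)

# substitute an alternative variance estimate
# (0.1 is arbitrary, for illustration only)
rv(fit) <- 0.1
rv(fit)
```

Later calls that rely on the stored variance, such as confidence intervals drawn by plot.locfit(), would then use the substituted value.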
scb is implemented as a front-end to locfit, to compute simultaneous confidence bands using the tube formula method and extensions, based on Sun and Loader (1994).
scb(x, ..., ev = lfgrid(20), simul = TRUE, type = 1)
x | A numeric vector or matrix of predictors (as in |
... | Additional arguments to |
ev | The evaluation structure to use. See |
simul | Should the coverage be simultaneous or pointwise? |
type | Type of confidence bands. |
A list containing the evaluation points, fit, standard deviations and upper and lower confidence bounds. The class is "scb"; methods for printing and plotting are provided.
Sun J. and Loader, C. (1994). Simultaneous confidence bands in linear regression and smoothing. The Annals of Statistics 22, 1328-1345.
Sun, J., Loader, C. and McCormick, W. (2000). Confidence bands in generalized linear models. The Annals of Statistics 28, 429-460.
# corrected confidence bands for a linear logistic model
data(insect)
fit <- scb(deaths~lp(lconc,deg=1), type=4, w=nins,
           data=insect,family="binomial",kern="parm")
plot(fit)
Given a dataset and set of pilot bandwidths, this function computes a bandwidth via the plug-in method, and the assumed ‘pilot’ relationship of Sheather and Jones (1991). The S-J method chooses the bandwidth at which the two intersect.
The purpose of this function is to demonstrate the sensitivity of plug-in methods to pilot bandwidths and assumptions. This function does not provide a reliable method of bandwidth selection.
sjpi(x, a)
x | data vector |
a | vector of pilot bandwidths |
A matrix with four columns; the number of rows equals the length of a.
The first column is the plug-in selected bandwidth. The second column is the pilot bandwidths a. The third column is the pilot bandwidth according to the assumed relationship of Sheather and Jones. The fourth column is an intermediate calculation.
Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. JRSS-B 53, 683-690.
# Fig 10.2 (S-J parts) from Loader (1999).
data(geyser, package="locfit")
gf <- 2.5
a <- seq(0.05, 0.7, length=100)
z <- sjpi(geyser, a)

# the plug-in curve. Multiplying by gf=2.5 corresponds to Locfit's standard
# scaling for the Gaussian kernel.
plot(gf*z[, 2], gf*z[, 1], type = "l", xlab = "Pilot Bandwidth k",
     ylab = "Bandwidth h")

# Add the assumed curve.
lines(gf * z[, 3], gf * z[, 1], lty = 2)
legend(gf*0.05, gf*0.4, lty = 1:2, legend = c("Plug-in", "SJ assumed"))
smooth.lf is a simple interface to the Locfit library.
The input consists of a predictor vector (or matrix) and response.
The output is a list with vectors of fitting points and fitted values.
Most locfit.raw options are valid.
smooth.lf(x, y, xev=x, direct=FALSE, ...)
x | Vector (or matrix) of the independent variable(s). |
y | Response variable. If omitted, |
xev | Fitting Points. Default is the data vector |
direct | Logical variable. If |
... | Other arguments to |
A list with components x (fitting points) and y (fitted values).
Also has a call component, so update() will work.
locfit(), locfit.raw(), density.lf().
# using smooth.lf() to fit a local likelihood model.
data(morths)
fit <- smooth.lf(morths$age, morths$deaths, weights=morths$n,
                 family="binomial")
plot(fit,type="l")

# update with the direct fit
fit1 <- update(fit, direct=TRUE)
lines(fit1,col=2)
print(max(abs(fit$y-fit1$y)))
Spencer's 15 point rule is a weighted moving average operation for a sequence of observations equally spaced in time. The average at time t depends on the observations at times t-7,...,t+7.
Except for boundary effects, the function will reproduce polynomials up to degree 3.
spence.15(y)
y | Data vector of observations at equally spaced points. |
A vector with the same length as the input vector, representing the graduated (smoothed) values.
Spencer, J. (1904). On the graduation of rates of sickness and mortality. Journal of the Institute of Actuaries 38, 334-343.
data(spencer)
yy <- spence.15(spencer$mortality)
plot(spencer$age, spencer$mortality)
lines(spencer$age, yy)
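The moving average itself can also be written out directly. The following is a minimal base R sketch, not the package's implementation: it assumes the classical Spencer 15-point weights (-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3)/320, and it leaves the first and last seven values as NA instead of applying any boundary correction. The names w15 and spencer15.sketch are hypothetical.

```r
# Classical Spencer 15-point weights (assumed here; they sum to 1).
w15 <- c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320

# Hypothetical helper: centered weighted moving average, NA at the boundaries.
spencer15.sketch <- function(y) {
  as.numeric(stats::filter(y, w15, method = "convolution", sides = 2))
}

# A cubic should be reproduced away from the boundaries.
tt <- 1:40
y <- 0.001*tt^3 - 0.05*tt^2 + tt + 2
ys <- spencer15.sketch(y)
interior <- 8:33
max(abs(ys[interior] - y[interior]))   # zero up to rounding error
```

The cubic check works because the weights are symmetric and their second moment is zero, so constant, linear, quadratic and cubic terms all pass through unchanged.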
Spencer's 21 point rule is a weighted moving average operation for a sequence of observations equally spaced in time. The average at time t depends on the observations at times t-10,...,t+10.
Except for boundary effects, the function will reproduce polynomials up to degree 3.
spence.21(y)
y | Data vector of observations at equally spaced points. |
A vector with the same length as the input vector, representing the graduated (smoothed) values.
Spencer, J. (1904). On the graduation of rates of sickness and mortality. Journal of the Institute of Actuaries 38, 334-343.
data(spencer)
yy <- spence.21(spencer$mortality)
plot(spencer$age, spencer$mortality)
lines(spencer$age, yy)
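For comparison, the 21 point rule can be sketched in base R as well. This is an illustration only, not the package's code: it assumes the classical Spencer 21-point weights (-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1)/350, and it returns NA for the first and last ten values instead of handling the boundaries. The names w21 and spencer21.sketch are hypothetical.

```r
# Classical Spencer 21-point weights (assumed here; they sum to 1).
w21 <- c(-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
         57, 47, 33, 18, 6, -2, -5, -5, -3, -1) / 350

# Hypothetical helper: centered weighted moving average, NA at the boundaries.
spencer21.sketch <- function(y) {
  as.numeric(stats::filter(y, w21, method = "convolution", sides = 2))
}

# Check cubic reproduction on the interior points.
tt <- 1:50
y <- 0.002*tt^3 - 0.1*tt^2 + 3*tt - 1
ys <- spencer21.sketch(y)
interior <- 11:40
max(abs(ys[interior] - y[interior]))   # zero up to rounding error
```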
Observed mortality rates for ages 20 to 45.
data(spencer)
Data frame with age and mortality variables.
Spencer (1904).
Spencer, J. (1904). On the graduation of rates of sickness and mortality. Journal of the Institute of Actuaries 38, 334-343.
Thicknesses of 482 postage stamps of the 1872 Hidalgo issue of Mexico.
data(stamp)
Data frame with thick (stamp thickness) and count (number of stamps) variables.
Izenman and Sommer (1988).
Izenman, A. J. and Sommer, C. J. (1988). Philatelic mixtures and multimodal densities. Journal of the American Statistical Association 83, 941-953.
I've gotta keep track of this mess somehow!
store(data=FALSE, grand=FALSE)
data | whether data objects are to be saved. |
grand | whether everything is to be saved. |
Computes a short summary for a generalized cross-validation plot structure.
## S3 method for class 'gcvplot'
summary(object, ...)
object | A |
... | arguments to and from other methods. |
A matrix with two columns; one row for each fit computed in the gcvplot call.
The first column is the fitted degrees of freedom; the second is the GCV or other criterion computed.
data(ethanol)
summary(gcvplot(NOx~E,data=ethanol,alpha=seq(0.2,1.0,by=0.05)))
Prints a short summary of a "locfit" object.
## S3 method for class 'locfit'
summary(object, ...)
object | |
... | arguments passed to and from methods. |
A summary.locfit object, containing a short summary of the locfit object.
Prints a short summary of a "preplot.locfit" object.
## S3 method for class 'preplot.locfit'
summary(object, ...)
object | |
... | arguments passed to and from methods. |
The fitted values from a preplot.locfit object.
This is a random sample from a mixture of three bivariate standard normal components; the sample was used for the examples in Loader (1996).
Data frame with 225 observations and variables x0, x1.
Randomly generated in S.
Loader, C. R. (1996). Local Likelihood Density Estimation. Annals of Statistics 24, 1602-1618.
xbar() is an evaluation structure for locfit.raw(), evaluating the fit at a single point, namely, the average of each predictor variable.
xbar()