Title: | Routines for L1 Estimation |
---|---|
Description: | L1 estimation for linear regression using Barrodale and Roberts' method <doi:10.1145/355616.361024> and the EM algorithm <doi:10.1023/A:1020759012226>. Estimation of mean and covariance matrix using the multivariate Laplace distribution, density, distribution function, quantile function and random number generation for univariate and multivariate Laplace distribution <doi:10.1080/03610929808832115>. Implementation of Naik and Plungpongpun <doi:10.1007/0-8176-4487-3_7> for the Generalized spatial median estimator is included. |
Authors: | Felipe Osorio [aut, cre] , Tymoteusz Wolodzko [aut] |
Maintainer: | Felipe Osorio <[email protected]> |
License: | GPL-3 |
Version: | 0.50 |
Built: | 2025-01-07 06:38:07 UTC |
Source: | CRAN |
lad
modelsComputes confidence intervals for one or more parameters in a fitted
model associated to a lad
object.
## S3 method for class 'lad' confint(object, parm, level = 0.95, ...)
## S3 method for class 'lad' confint(object, parm, level = 0.95, ...)
object |
a fitted model object. |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
... |
additional argument(s) for related methods. |
confint
is a generic function. Confidence intervals associated to lad
objects are asymptotic, and needs suitable coef
and vcov
methods to be available.
A matrix (or vector) with columns giving lower and upper confidence limits for
each parameter. These will be labelled as (1-level)/2
and 1 - (1-level)/2
in % (by default 2.5% and 97.5%).
confint.glm
and
confint.nls
in package MASS.
fm <- lad(stack.loss ~ ., data = stackloss, method = "BR") confint(fm) # based on asymptotic normality
fm <- lad(stack.loss ~ ., data = stackloss, method = "BR") confint(fm) # based on asymptotic normality
Data from the Martin Marietta and American Can companies collected over a period of 5 years on a monthly basis.
data(ereturns)
data(ereturns)
A data frame with 60 observations on the following 4 variables.
the month in which the observations were collected.
excess returns from the American Can company.
excess returns from the Martin Marietta company.
an index for the excess rate returns for the New York stock exchange.
Butler, R.J., McDonald, J.B., Nelson, R.D., and White, S.B. (1990). Robust and partially adaptive estimation of regression models. The Review of Economics and Statistics 72, 321-327.
Performs an L1 regression on a matrix of explanatory variables and a vector of responses.
l1fit(x, y, intercept = TRUE, tolerance = 1e-07, print.it = TRUE)
l1fit(x, y, intercept = TRUE, tolerance = 1e-07, print.it = TRUE)
x |
vector or matrix of explanatory variables. Each row corresponds to an
observation and each column to a variable. The number of rows of |
y |
numeric vector containing the response. Missing values are not allowed. |
intercept |
logical flag. If |
tolerance |
numerical value used to test for singularity in the regression. |
print.it |
logical flag. If |
The Barrodale-Roberts algorithm, which is a specialized linear programming algorithm, is used.
list defining the regression (compare with function lsfit
).
coefficients |
vector of coefficients. |
residuals |
residuals from the fit. |
message |
character strings stating whether a non-unique solution is possible,
or if the |
Barrodale, I., Roberts, F.D.K. (1973). An improved algorithm for discrete L1 linear approximations. SIAM Journal of Numerical Analysis 10, 839-848.
Barrodale, I., Roberts, F.D.K. (1974). Solution of an overdetermined system of equations in the L1 norm. Communications of the ACM 17, 319-320.
Bloomfield, P., Steiger, W.L. (1983). Least Absolute Deviations: Theory, Applications, and Algorithms. Birkhauser, Boston, Mass.
l1fit(stack.x, stack.loss)
l1fit(stack.x, stack.loss)
This function is used to fit linear models considering Laplace errors.
lad(formula, data, subset, na.action, method = "BR", tol = 1e-7, maxiter = 200, x = FALSE, y = FALSE, contrasts = NULL)
lad(formula, data, subset, na.action, method = "BR", tol = 1e-7, maxiter = 200, x = FALSE, y = FALSE, contrasts = NULL)
formula |
an object of class |
data |
an optional data frame containing the variables in the model. If
not found in |
subset |
an optional expression indicating the subset of the rows of data that should be used in the fit. |
na.action |
a function that indicates what should happen when the data contain NAs. |
method |
character string specifying the fitting method to be used; the options
are |
tol |
the relative tolerance for the iterative algorithm. Default is |
maxiter |
The maximum number of iterations for the |
x , y
|
logicals. If |
contrasts |
an optional list. See the |
An object of class lad
representing the linear model fit. Generic
function print
, show the results of the fit.
The functions print
and summary
are used to obtain and print a summary
of the results. The generic accessor functions coefficients
, fitted.values
and residuals
extract various useful features of the value returned by lad
.
The design was inspired by the R function lm
.
Barrodale, I., Roberts, F.D.K. (1974). Solution of an overdetermined system of equations in the L1 norm. Communications of the ACM 17, 319-320.
Phillips, R.F. (2002). Least absolute deviations estimation via the EM algorithm. Statistics and Computing 12, 281-285.
fm <- lad(stack.loss ~ ., data = stackloss, method = "BR") summary(fm) data(ereturns) fm <- lad(m.marietta ~ CRSP, data = ereturns, method = "EM") summary(fm) # basic observations fm$basic
fm <- lad(stack.loss ~ ., data = stackloss, method = "BR") summary(fm) data(ereturns) fm <- lad(m.marietta ~ CRSP, data = ereturns, method = "EM") summary(fm) # basic observations fm$basic
This function is a switcher among various numerical fitting functions
(lad.fit.BR
, and lad.fit.EM
). The argument method
does the switching: "BR"
for lad.fit.BR
, etc. This should
usually not be used directly unless by experienced users.
lad.fit(x, y, method = "BR", tol = 1e-7, maxiter = 200)
lad.fit(x, y, method = "BR", tol = 1e-7, maxiter = 200)
x |
design matrix of dimension |
y |
vector of observations of length |
method |
currently, methods |
tol |
the relative tolerance for the iterative algorithm. Default is |
maxiter |
The maximum number of iterations for the |
a list
with components:
coefficients |
a named vector of coefficients. |
scale |
final scale estimate of the random error. |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
the fitted values. |
SAD |
the sum of absolute deviations. |
weights |
estimated |
basic |
basic observations, that is observations with zero residuals. |
logLik |
the log-likelihood at convergence. |
x <- cbind(1, stack.x) fm <- lad.fit(x, stack.loss, method = "BR") fm
x <- cbind(1, stack.x) fm <- lad.fit(x, stack.loss, method = "BR") fm
Fits a linear model using LAD methods, returning the bare minimum computations.
lad.fit.BR(x, y, tol = 1e-7) lad.fit.EM(x, y, tol = 1e-7, maxiter = 200)
lad.fit.BR(x, y, tol = 1e-7) lad.fit.EM(x, y, tol = 1e-7, maxiter = 200)
x , y
|
numeric vectors or matrices for the predictors and the response in
a linear model. Typically, but not necessarily, |
tol |
the relative tolerance for the iterative algorithm. Default is |
maxiter |
The maximum number of iterations for the |
The bare bones of a lad
object: the coefficients, residuals, fitted values,
and some information used by summary.lad
.
x <- cbind(1, stack.x) z <- lad.fit.BR(x, stack.loss) z
x <- cbind(1, stack.x) z <- lad.fit.BR(x, stack.loss) z
Density, distribution function, quantile function and random generation for the
Laplace distribution with location parameter location
and scale parameter
scale
.
dlaplace(x, location = 0, scale = 1, log = FALSE) plaplace(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) qlaplace(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) rlaplace(n, location = 0, scale = 1)
dlaplace(x, location = 0, scale = 1, log = FALSE) plaplace(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) qlaplace(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) rlaplace(n, location = 0, scale = 1)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. |
location |
location parameter |
scale |
scale parameter |
log , log.p
|
logical; if TRUE, probabilities |
lower.tail |
logical; if TRUE (default), probabilities are |
If location
or scale
are not specified, they assume the default
values of 0
and 1
respectively.
The Laplace distribution with location and scale
has density
where ,
and
.
The mean is
and the variance is
.
The cumulative distribution function, assumes the form
The quantile function, is given by
dlaplace
, plaplace
, and qlaplace
are respectively the density,
distribution function and quantile function of the Laplace distribution. rlaplace
generates random deviates drawn from the Laplace distribution, the length of the result
is determined by n
.
Felipe Osorio and Tymoteusz Wolodzko
Kotz, S., Kozubowski, T.J., Podgorski, K. (2001). The Laplace Distributions and Generalizations. Birkhauser, Boston.
Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications, 2nd Ed. Chapman & Hall, Boca Raton.
Distributions for other standard distributions and rmLaplace
for the random generation from the multivariate Laplace distribution.
x <- rlaplace(1000) ## QQ-plot for Laplace data against true theoretical distribution: qqplot(qlaplace(ppoints(1000)), x, main = "Laplace QQ-plot", xlab = "Theoretical quantiles", ylab = "Sample quantiles") abline(c(0,1), col = "red", lwd = 2)
x <- rlaplace(1000) ## QQ-plot for Laplace data against true theoretical distribution: qqplot(qlaplace(ppoints(1000)), x, main = "Laplace QQ-plot", xlab = "Theoretical quantiles", ylab = "Sample quantiles") abline(c(0,1), col = "red", lwd = 2)
Estimates the location vector and scatter matrix assuming the data came from a multivariate Laplace distribution.
LaplaceFit(x, data, subset, na.action, tol = 1e-6, maxiter = 200)
LaplaceFit(x, data, subset, na.action, tol = 1e-6, maxiter = 200)
x |
a formula or a numeric matrix or an object that can be coerced to a numeric matrix. |
data |
an optional data frame (or similar: see |
subset |
an optional expression indicating the subset of the rows of data that should be used in the fitting process. |
na.action |
a function that indicates what should happen when the data contain NAs. |
tol |
the relative tolerance in the iterative algorithm. |
maxiter |
maximum number of iterations. The default is 200. |
A list with class 'LaplaceFit'
containing the following components:
call |
a list containing an image of the |
center |
final estimate of the location vector. |
Scatter |
final estimate of the scale matrix. |
logLik |
the log-likelihood at convergence. |
numIter |
the number of iterations used in the iterative algorithm. |
weights |
estimated weights corresponding to the Laplace distribution. |
distances |
estimated squared Mahalanobis distances. |
Generic function print
show the results of the fit.
Yavuz, F.G., Arslan, O. (2018). Linear mixed model with Laplace distribution (LLMM). Statistical Papers 59, 271-289.
fit <- LaplaceFit(stack.x) fit # covariance matrix p <- fit$dims[2] Sigma <- (4 * (p + 1)) * fit$Scatter Sigma
fit <- LaplaceFit(stack.x) fit # covariance matrix p <- fit$dims[2] Sigma <- (4 * (p + 1)) * fit$Scatter Sigma
These functions provide the density and random number generation from the multivariate Laplace distribution.
dmLaplace(x, center = rep(0, nrow(Scatter)), Scatter = diag(length(center)), log = FALSE) rmLaplace(n = 1, center = rep(0, nrow(Scatter)), Scatter = diag(length(center)))
dmLaplace(x, center = rep(0, nrow(Scatter)), Scatter = diag(length(center)), log = FALSE) rmLaplace(n = 1, center = rep(0, nrow(Scatter)), Scatter = diag(length(center)))
x |
vector or matrix of data. |
n |
the number of samples requested. |
center |
a |
Scatter |
a |
log |
logical; if TRUE, the logarithm of the density function is returned. |
The multivariate Laplace distribution is a multidimensional extension of the univariate symmetric Laplace distribution. There are multiple forms of the multivariate Laplace distribution. Here, a particular case of the multivarite power exponential distribution introduced by Gomez et al. (1998) is considered.
The multivariate Laplace distribution with location =
center
and =
Scatter
has density
The function rmLaplace
is an interface to C routines, which make calls to
subroutines from LAPACK. The matrix decomposition is internally done using
the Cholesky decomposition. If Scatter
is not non-negative definite then
there will be a warning message.
If x
is a matrix with rows, then
dmLaplace
returns a
vector considering each row of
x
as a copy from
the multivariate Laplace.
If n = 1
, then rmLaplace
returns a vector of the same length as
center
, otherwise a matrix of n
rows of random vectors.
Fang, K.T., Kotz, S., Ng, K.W. (1990). Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Gomez, E., Gomez-Villegas, M.A., Marin, J.M. (1998). A multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods 27, 589-600.
# dispersion parameters Scatter <- matrix(c(1,.5,.5,1), ncol = 2) Scatter # generate the sample y <- rmLaplace(n = 2000, Scatter = Scatter) # scatterplot of a random bivariate Laplace sample with center # vector zero and scale matrix 'Scatter' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate Laplace sample", font.main = 1)
# dispersion parameters Scatter <- matrix(c(1,.5,.5,1), ncol = 2) Scatter # generate the sample y <- rmLaplace(n = 2000, Scatter = Scatter) # scatterplot of a random bivariate Laplace sample with center # vector zero and scale matrix 'Scatter' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate Laplace sample", font.main = 1)
lad
modelsSimulate one or more responses from the distribution corresponding to a
fitted lad
object.
## S3 method for class 'lad' simulate(object, nsim = 1, seed = NULL, ...)
## S3 method for class 'lad' simulate(object, nsim = 1, seed = NULL, ...)
object |
an object representing a fitted model. |
nsim |
number of response vectors to simulate. Defaults to 1. |
seed |
an object specifying if and how the random number generator should
be initialized (‘seeded’). For the "lad" method, either |
... |
additional optional arguments. |
For the "lad"
method, the result is a data frame with an attribute
"seed"
. If argument seed
is NULL
, the attribute is the
value of .Random.seed
before the simulation was started.
Tymoteusz Wolodzko and Felipe Osorio
fm <- lad(stack.loss ~ ., data = stackloss) sm <- simulate(fm, nsim = 4)
fm <- lad(stack.loss ~ ., data = stackloss) sm <- simulate(fm, nsim = 4)
Computation of the generalized spatial median estimator as defined by Rao (1988).
spatial.median(x, data, subset, na.action, tol = 1e-6, maxiter = 200)
spatial.median(x, data, subset, na.action, tol = 1e-6, maxiter = 200)
x |
a formula or a numeric matrix or an object that can be coerced to a numeric matrix. |
data |
an optional data frame (or similar: see |
subset |
an optional expression indicating the subset of the rows of data that should be used in the fitting process. |
na.action |
a function that indicates what should happen when the data contain NAs. |
tol |
the relative tolerance in the iterative algorithm. |
maxiter |
maximum number of iterations. The default is 200. |
An interesting fact is that the generalized spatial median estimator proposed by Rao (1988) is
the maximum likelihood estimator under the Kotz-type distribution discussed by Naik and Plungpongpun (2006).
The generalized spatial median estimators are defined as and
which minimize
simultaneously with respect to and
.
The function spatial.median
follows the iterative reweighting algorithm of Naik and Plungpongpun (2006).
A list with class 'spatial.median'
containing the following components:
call |
a list containing an image of the |
median |
final estimate of the location vector. |
Scatter |
final estimate of the scale matrix. |
logLik |
the log-likelihood at convergence. |
numIter |
the number of iterations used in the iterative algorithm. |
innerIter |
the total number of iterations used in the inner iterative algorithm. |
weights |
estimated weights corresponding to the Kotz distribution. |
distances |
estimated squared Mahalanobis distances. |
Generic function print
show the results of the fit.
Naik, D.N., Plungpongpun, K. (2006). A Kotz-type distribution for multivariate statistical inference. In: Balakrishnan, N., Sarabia, J.M., Castillo, E. (Eds) Advances in Distribution Theory, Order Statistics, and Inference. Birkhauser Boston, pp. 111-124.
Rao, C.R. (1988). Methodology based on the L1-norm in statistical inference. Sankhya, Series A 50, 289-313.
z <- spatial.median(stack.x) z
z <- spatial.median(stack.x) z
lad
modelsReturns the variance-covariance matrix of the main parameters of a fitted model
for lad
objects. The “main” parameters of model correspond to those
returned by coef
, and typically do not contain the nuisance scale
parameter.
## S3 method for class 'lad' vcov(object, ...)
## S3 method for class 'lad' vcov(object, ...)
object |
an object representing a fitted model. |
... |
additional arguments for method functions. |
A matrix of the estimated covariances between the parameter estimates in the
linear regression model. This should have row and column names corresponding
to the parameter names given by the coef
method.
Returns the Wilson-Hilferty transformation for multivariate Laplace deviates.
WH.Laplace(x, center, Scatter)
WH.Laplace(x, center, Scatter)
x |
object of class |
center |
mean vector of the distribution or data vector of length |
Scatter |
Scatter matrix ( |
Let be a Gamma distributed random variable, where
denotes the squared Mahalanobis distance defined as
Thus, the Wilson-Hilferty transformation is given by
and is approximately distributed as a standard normal distribution. This is useful,
for instance, in the construction of QQ-plots.
Osorio, F., Galea, M., Henriquez, C., Arellano-Valle, R. (2023). Addressing non-normality in multivariate analysis using the t-distribution. AStA Advances in Statistical Analysis 107, 785-813.
Terrell, G.R. (2003). The Wilson-Hilferty transformation is locally saddlepoint. Biometrika 90, 445-453.
Wilson, E.B., Hilferty, M.M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America 17, 684-688.
Scatter <- matrix(c(1,.5,.5,1), ncol = 2) Scatter # generate the sample y <- rmLaplace(n = 500, Scatter = Scatter) fit <- LaplaceFit(y) z <- WH.Laplace(fit) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2)
Scatter <- matrix(c(1,.5,.5,1), ncol = 2) Scatter # generate the sample y <- rmLaplace(n = 500, Scatter = Scatter) fit <- LaplaceFit(y) z <- WH.Laplace(fit) par(pty = "s") qqnorm(z, main = "Transformed distances QQ-plot") abline(c(0,1), col = "red", lwd = 2)