Title: | Pena-Yohai Initial Estimator for Robust S-Regression |
---|---|
Description: | Deterministic Pena-Yohai initial estimator for robust S estimators of regression. The procedure is described in detail in Pena, D., & Yohai, V. (1999) <doi:10.2307/2670164>. |
Authors: | David Kepplinger [aut, cre], Matias Salibian-Barrera [aut], Gabriela Cohen Freue [aut] |
Maintainer: | David Kepplinger <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.3 |
Built: | 2024-12-22 06:36:34 UTC |
Source: | CRAN |
Compute the M-estimate of scale using the MAD as initial estimate.
mscale( x, delta = 0.5, rho = c("bisquare", "huber", "gauss"), cc, eps = 1e-08, maxit = 200 )
mscale( x, delta = 0.5, rho = c("bisquare", "huber", "gauss"), cc, eps = 1e-08, maxit = 200 )
x |
numeric vector. |
delta |
desired value for the right-hand side of the M-estimation equation. |
rho |
rho function to use in the M-estimation equation. Valid options
are |
cc |
non-negative constant for the chosen rho function. If missing, it will be
chosen such that the expected value of the rho function under the normal model
is equal to |
eps |
threshold for convergence. Defaults to |
maxit |
maximum number of iterations. Defaults to |
This solves the M-estimation equation given by
All NA
values in x
are removed before calculating the scale.
Numeric vector of length one containing the solution s_n
to
the equation above.
Computes the PY initial estimates for S-estimates of regression.
pyinit( x, y, intercept = TRUE, delta = 0.5, cc, maxit = 10, psc_keep, resid_keep_method = c("threshold", "proportion"), resid_keep_prop, resid_keep_thresh, eps = 1e-08, mscale_maxit = 200, mscale_tol = eps, mscale_rho_fun = c("bisquare", "huber", "gauss") )
pyinit( x, y, intercept = TRUE, delta = 0.5, cc, maxit = 10, psc_keep, resid_keep_method = c("threshold", "proportion"), resid_keep_prop, resid_keep_thresh, eps = 1e-08, mscale_maxit = 200, mscale_tol = eps, mscale_rho_fun = c("bisquare", "huber", "gauss") )
x |
a matrix with the data, each observation in a row. |
y |
the response vector. |
intercept |
logical, should an intercept be included in the model? Defaults to
|
delta , cc
|
parameters for the M-scale estimator equation. If |
maxit |
the maximum number of iterations to perform. |
psc_keep |
proportion of observations to keep based on PSCs. |
resid_keep_method |
how to clean the data based on large residuals.
If |
resid_keep_prop , resid_keep_thresh
|
see parameter
|
eps |
the relative tolerance for convergence. Defaults to |
mscale_maxit |
maximum number of iterations allowed for the M-scale
algorithm. Defaults to |
mscale_tol |
convergence threshold for the m-scale |
mscale_rho_fun |
A string containing the name of the rho
function to use for the M-scale. Valid options
are |
coefficients |
numeric matrix with coefficient vectors in columns. These
are regression estimators based on "cleaned" subsets of the data. The
M-scales of the corresponding residuals are returned in the entry
|
objective |
vector of values of the M-scale estimate of the residuals
associated with each vector of regression coefficients in the columns of
|
Pena, D., & Yohai, V.. (1999). A Fast Procedure for Outlier Diagnostics in Large Regression Problems. Journal of the American Statistical Association, 94(446), 434-445. <doi:10.2307/2670164>
# generate a simple synthetic data set for a linear regression model # with true regression coefficients all equal to one "(1, 1, 1, 1, 1)" set.seed(123) x <- matrix(rnorm(100*4), 100, 4) y <- rnorm(100) + rowSums(x) + 1 # add masked outliers a <- svd(var(x))$v[,4] x <- rbind(x, t(outer(a, rnorm(20, mean=4, sd=1)))) y <- c(y, rnorm(20, mean=-2, sd=.2)) # these outliers are difficult to find plot(lm(y~x), ask=FALSE) # use pyinit to obtain estimated regression coefficients tmp <- pyinit(x=x, y=y, resid_keep_method='proportion', psc_keep = .5, resid_keep_prop=.5) # the vector of regression coefficients with smallest residuals scale # is returned in the first column of the "coefficients" element tmp$coefficients[,1] # compare that with the LS estimator on the clean data coef(lm(y~x, subset=1:100)) # compare it with the LS estimator on the full data coef(lm(y~x))
# generate a simple synthetic data set for a linear regression model # with true regression coefficients all equal to one "(1, 1, 1, 1, 1)" set.seed(123) x <- matrix(rnorm(100*4), 100, 4) y <- rnorm(100) + rowSums(x) + 1 # add masked outliers a <- svd(var(x))$v[,4] x <- rbind(x, t(outer(a, rnorm(20, mean=4, sd=1)))) y <- c(y, rnorm(20, mean=-2, sd=.2)) # these outliers are difficult to find plot(lm(y~x), ask=FALSE) # use pyinit to obtain estimated regression coefficients tmp <- pyinit(x=x, y=y, resid_keep_method='proportion', psc_keep = .5, resid_keep_prop=.5) # the vector of regression coefficients with smallest residuals scale # is returned in the first column of the "coefficients" element tmp$coefficients[,1] # compare that with the LS estimator on the clean data coef(lm(y~x, subset=1:100)) # compare it with the LS estimator on the full data coef(lm(y~x))