Title: Robust Regression with Data Driven Tuning Parameter
Description: Data-driven approach for robust regression estimation in homoscedastic and heteroscedastic contexts. See Wang et al. (2007) <doi:10.1198/106186007X180156> for the homoscedastic framework.
Authors: You-Gan Wang, Benoit Liquet, Aurelien Callens and Na Wang
Maintainer: You-Gan Wang <[email protected]>
License: GPL (>= 2.0)
Version: 0.4.0
Built: 2024-12-13 06:46:43 UTC
Source: CRAN
The data are collected from 273 female patients.

plasma

This data frame contains the following columns:

y | Plasma beta-carotene
calories | Calories (in kJ)
dietary | Dietary beta-carotene
Jiang, Y., Wang, Y-G., Fu, L., & Wang, X. (2018). Robust Estimation Using Modified Huber's Functions With New Tails. Technometrics, in press, see <doi:10.1080/00401706.2018.1470037>.
Robust estimation often relies on a dispersion function that varies more slowly at large values than the squared function. However, the choice of tuning constant in the dispersion function may greatly affect the estimation efficiency. For a given family of dispersion functions, we suggest obtaining the 'best' tuning constant from the data so that the asymptotic efficiency is maximized.

This package provides robust linear regression with a tuning parameter chosen automatically from the data to provide the necessary resistance against outliers. The available (loss) functions are the Huber, Tukey bisquare and exponential losses.
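The idea can be illustrated with a few lines of plain R (this is not the package's internal code). For Huber's loss, the asymptotic variance of the M-estimator is proportional to E[psi_c(e)^2] / (E[psi_c'(e)])^2, so the sketch below estimates this factor from scaled residuals over a grid of tuning constants and keeps the constant giving the smallest value, i.e. the highest estimated efficiency. The helpers huber_psi and huber_dpsi are ad hoc definitions introduced here for illustration only.

library(MASS)
data(stackloss)

## Pilot robust fit; its scaled residuals stand in for the errors
fit0 <- rlm(stack.loss ~ stack.x, psi = psi.huber, k = 1.345)
e <- residuals(fit0) / fit0$s

huber_psi  <- function(r, k) pmin(pmax(r, -k), k)    # psi_k(r)
huber_dpsi <- function(r, k) as.numeric(abs(r) <= k) # psi_k'(r)

## Estimated asymptotic variance factor over a grid of tuning constants
ks <- seq(0.5, 3, by = 0.05)
var_factor <- sapply(ks, function(k)
  mean(huber_psi(e, k)^2) / mean(huber_dpsi(e, k))^2)

ks[which.min(var_factor)]  # data-driven tuning constant for this sample

rlmDD automates this kind of search for the Huber, bisquare and exponential losses.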
rlmDD(yy, xx, beta0, betaR, method, plot)
yy | Vector representing the response variable
xx | Design matrix of the covariates, without an intercept column
beta0 | Initial parameter estimate, typically the least-squares estimate from lm
betaR | Robust estimate of beta with a fixed tuning constant, typically obtained with rlm
method | Loss function to use: "Huber", "Bisquare" or "Exponential"
plot | "Y" produces a plot of the efficiency factor against a range of tuning parameter values
The function returns a list including:

esti | Value of the robust estimate
Std.Error | Standard error of the robust estimate
tunning | Optimum tuning parameter
R2 | R-squared value
You-Gan Wang, Na Wang
Wang, Y-G., Lin, X., Zhu, M., & Bai, Z. (2007). Robust estimation using the Huber function with a data-dependent tuning constant. Journal of Computational and Graphical Statistics, 16(2), 468-481.
Wang, X., Jiang, Y., Huang, M., & Zhang, H. (2013). Robust variable selection with exponential squared loss. Journal of the American Statistical Association, 108, 632-643.
Wang, N., Wang, Y-G., Hu, S., Hu, Z. H., Xu, J., Tang, H., & Jin, G. (2018). Robust Regression with Data-Dependent Regularization Parameters and Autoregressive Temporal Correlations. Environmental Modeling & Assessment, in press.
rlm function from package MASS
library(MASS)
data(stackloss)

## Huber loss
LS <- lm(stack.loss ~ stack.x)
RB <- rlm(stack.loss ~ stack.x, psi = psi.huber, k = 1.345)
DD1 <- rlmDD(stack.loss, stack.x, LS$coef, RB$coef, method = "Huber", plot = "Y")

## Tukey bisquare loss
LS <- lm(stack.loss ~ stack.x)
RB <- rlm(stack.loss ~ stack.x, psi = psi.bisquare, c = 4.685)
DD2 <- rlmDD(stack.loss, stack.x, LS$coef, RB$coef, method = "Bisquare", plot = "Y")

## Exponential loss
LS <- lm(stack.loss ~ stack.x)
RB <- rlm(stack.loss ~ stack.x, psi = psi.huber, k = 1.345)
DD3 <- rlmDD(stack.loss, stack.x, LS$coef, RB$coef, method = "Exponential", plot = "Y")

## Plasma dataset
data(plasma)
y <- plasma$y
x <- cbind(plasma$calories, plasma$dietary)

LS <- lm(y ~ x)
RB <- rlm(y ~ x, psi = psi.huber, k = 1.345)
DD.h <- rlmDD(y, x, LS$coef, RB$coef, method = "Huber", plot = "Y")

LS <- lm(y ~ x)
RB <- rlm(y ~ x, psi = psi.bisquare, c = 4.685)
DD.b <- rlmDD(y, x, LS$coef, RB$coef, method = "Bisquare", plot = "Y")

LS <- lm(y ~ x)
RB <- rlm(y ~ x, psi = psi.huber, k = 1.345)
DD.e <- rlmDD(y, x, LS$coef, RB$coef, method = "Exponential", plot = "Y")
Performs robust regression for autoregressive models with heterogeneity.
First, an M-estimation is performed on the data assuming that the variance is constant. The residuals of this model are used to robustly estimate the variance parameter. Then a weighted M-estimation, with weights derived from the estimated variances, is used to update the regression parameters. These steps are repeated for different values of the tuning parameter; the best tuning parameter is the one that minimizes the variance of the estimator.

Finally, lagged terms are built and added to the regression model, thereby accounting for temporal correlations. The loss function used is Huber's function.
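A rough sketch of this tuning-parameter search is given below, using the whm function documented later on this page and ignoring the lagged terms: the weighted fit is repeated over a grid of tuning constants and the value giving the smallest total estimated variance of the coefficients is retained. It assumes that whm returns its standard errors in a component named Std.Error, as listed in the whm section below; the criterion and grid used internally by rlmDD_het may differ.

data(stackloss)
yy <- stack.loss
xx <- model.matrix(stack.loss ~ stack.x)

## Candidate tuning constants for the Huber loss
grid <- seq(0.8, 3, by = 0.1)

## Total estimated variance of the coefficients at each candidate value
tot_var <- sapply(grid, function(k) {
  fit <- whm(yy, xx, var.function = "power", tuning.para = k)
  sum(fit$Std.Error^2)
})

grid[which.min(tot_var)]  # tuning parameter minimising the estimator variance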
rlmDD_het(yy, xx, var.function = c("power", "exponential"), phi.par = TRUE,
          tuning.para = NULL, step = 0.1, n.lag = NULL, print.summary = TRUE)
yy | Vector representing the response variable
xx | Design matrix of the covariates (including the intercept)
var.function | Assumed function for the variance: "power" or "exponential"
phi.par | Logical; defaults to TRUE
tuning.para | If NULL (the default), the tuning parameter is chosen by the data-driven procedure; otherwise the supplied value is used
step | Only used when tuning.para = NULL; step size of the grid of tuning parameter values explored
n.lag | If NULL, a PACF plot of the residuals is displayed to help choose the number of lagged terms, which must then be entered in the console; otherwise the supplied number of lags is used
print.summary | If TRUE, a summary of the fitted model is printed
The function returns a list including:

coefficients | Values of the robust estimates
residuals | Residuals of the model
p_residuals | Pearson residuals of the model
r_residuals | Robust Pearson residuals of the model, computed from the Pearson residuals using the derivative of the loss function with c the chosen tuning parameter
fitted values | Fitted values obtained with the robust method
vcov | Variance-covariance matrix of the estimates
summary | Summary of the model, including the values, standard errors and z-values of the estimates
model | Design matrix of the model
tuningpara | When tuning.para = NULL, the optimal tuning parameter selected by the procedure
varpara | Estimates of the variance parameters
Aurelien Callens, You-Gan Wang, Benoit Liquet
Callens, A., Wang, Y-G., Fu, L. & Liquet, B. (2018). Robust estimation for autoregressive models with heterogeneity. Submitted.
rlm function from package MASS
library(tseries)
data(ice.river)

xx <- model.matrix(flow.vat ~ prec + temp, data = ice.river)
yy <- as.numeric(ice.river[, "flow.vat"])

least_square <- lm(flow.vat ~ prec + temp, data = ice.river)
pacf(least_square$residuals)
qqnorm(least_square$residuals)
qqline(least_square$residuals, col = "red", lwd = 2)

## With data-driven choice of the tuning parameter and 2 lags.
## Note that if n.lag = NULL, a PACF plot will appear to help you choose
## the number of lags; you will need to enter this number in the console.
model_1 <- rlmDD_het(yy, xx, var.function = "exponential",
                     tuning.para = NULL, n.lag = 2)
pacf(model_1$p_residuals)
qqnorm(model_1$r_residuals)
qqline(model_1$r_residuals, col = "red", lwd = 2)

## For a fixed number of lags and a fixed tuning parameter
model_2 <- rlmDD_het(yy, xx, var.function = "exponential",
                     tuning.para = 1.345, n.lag = 2)
This function performs a weighted M-estimation as described by Carroll and Ruppert (1982), with the Huber loss function. First, an M-estimation is performed on the data assuming that the variance is constant. The residuals of this model are used to robustly estimate the variance parameter. Then a weighted M-estimation, with weights derived from the estimated variances, is used to update the regression parameters. These steps are iterated (the number of iterations is controlled by ite).
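The sketch below outlines this type of iteration using MASS::rlm only; it is an illustration rather than the package's implementation, and it assumes a power variance function Var(e_i) proportional to |mu_i|^(2*theta) together with a simple (non-robust) log-linear fit for theta, whereas whm estimates the variance parameter robustly.

library(MASS)
data(stackloss)
yy <- stack.loss
xx <- model.matrix(stack.loss ~ stack.x)

## Step 1: M-estimation assuming constant variance
fit <- rlm(xx, yy, psi = psi.huber, k = 1.345)

for (i in 1:5) {                       # plays the role of the 'ite' argument
  mu <- drop(xx %*% coef(fit))
  ## Step 2: estimate the variance parameter from the absolute residuals
  theta <- coef(lm(log(abs(yy - mu)) ~ log(abs(mu))))[2]
  ## Step 3: weighted M-estimation with the inverse estimated variances as weights
  w   <- 1 / abs(mu)^(2 * theta)
  fit <- rlm(xx, yy, weights = w, wt.method = "inv.var",
             psi = psi.huber, k = 1.345)
}
coef(fit)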
whm(yy, xx, var.function = "power", tuning.para = 1.345, ite = 5)
yy | Vector representing the response variable
xx | Design matrix of the covariates, with the intercept in the first column (e.g. from model.matrix)
var.function | Assumed function for the variance: "power" or "exponential"
tuning.para | Value of the tuning parameter associated with the loss function
ite | Number of iterations for the estimation procedure
The function returns a list including:

esti | Value of the robust estimate
Std.Error | Standard error of the robust estimate
tunning | Tuning parameter used
R2 | R-squared value
Aurelien Callens, You-Gan Wang, Benoit Liquet.
Carroll, R. J., & Ruppert, D. (1982). Robust estimation in heteroscedastic linear models. The Annals of Statistics, 429-441.
rlm function from package MASS
library(MASS)
data(stackloss)

LS <- lm(stack.loss ~ stack.x)
RB <- rlm(stack.loss ~ stack.x, psi = psi.huber, k = 1.345)

yy <- stack.loss
xx <- model.matrix(stack.loss ~ stack.x)

## With the power function as variance function
WHM_p <- whm(yy, xx, var.function = "power", tuning.para = 1.345)

## With the exponential function as variance function
WHM_e <- whm(yy, xx, var.function = "exponential", tuning.para = 1.345)