Title: | Fast and Robust Bootstrap |
---|---|
Description: | Perform robust inference by applying the Fast and Robust Bootstrap to robust estimators (Van Aelst and Willems (2013) <doi:10.18637/jss.v053.i03>). This method constitutes an alternative to ordinary bootstrap or asymptotic inference procedures when using robust estimators such as S-, MM- or GS-estimators. The available methods are multivariate regression, principal component analysis, and one-sample and two-sample Hotelling tests. The package provides both the robust point estimates and uncertainty measures based on the fast and robust bootstrap. |
Authors: | Ella Roelant [aut], Stefan Van Aelst [aut], Gert Willems [aut], Valentin Todorov [cre] |
Maintainer: | Valentin Todorov <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0-1 |
Built: | 2024-12-07 06:26:32 UTC |
Source: | CRAN |
Diagnostic plots for objects of class FRBmultireg, FRBpca and FRBhot. The plots show robust distances and allow detection of multivariate outliers.
## S3 method for class 'FRBmultireg'
diagplot(x, Xdist = TRUE, ...)

## S3 method for class 'FRBpca'
diagplot(x, EIF = TRUE, ...)

## S3 method for class 'FRBhot'
diagplot(x, ...)
x |
an R object of class FRBmultireg, FRBpca or FRBhot, respectively. |
Xdist |
logical: if TRUE, the plot shows the robust distance versus the distance in the space of the explanatory variables; if FALSE, it plots the robust distance versus the index of the observation |
EIF |
logical: if TRUE, the plot shows the robust distance versus an influence measure for each point; if FALSE, it plots the robust distance versus the index of the observation |
... |
potentially more arguments to be passed |
The diagnostic plots are based on the robust distances of the observations. In a multivariate sample $y_1, \ldots, y_n$, the robust distance of observation $i$ is given by $d_i = \sqrt{(y_i - \hat{\mu})^t \, \hat{\Sigma}^{-1} (y_i - \hat{\mu})}$, where $\hat{\mu}$ and $\hat{\Sigma}$ are robust estimates of location and covariance. Observations with a large robust distance are considered outlying.
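As a minimal sketch (not part of the package), these distances can be computed directly from given robust location and covariance estimates; the objects mu.hat and Sigma.hat below are hypothetical placeholders, since diagplot() extracts the corresponding estimates from the fitted object itself:

```r
## Sketch: robust distances from supplied robust location/covariance estimates.
## 'mu.hat' and 'Sigma.hat' are placeholders for robust estimates.
robust.distances <- function(Y, mu.hat, Sigma.hat) {
  Y <- as.matrix(Y)
  sqrt(mahalanobis(Y, center = mu.hat, cov = Sigma.hat))
}
## cutoff used in the diagnostic plots: square root of the .975 chi-square quantile
cutoff <- function(p) sqrt(qchisq(0.975, df = p))
```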
The default diagnostic plot in the multivariate regression setting (i.e. for objects of type FRBmultireg and Xdist=TRUE) shows the residual distances (i.e. the robust distances of the multivariate residuals) based on the estimates in x, versus the distances within the space of the explanatory variables. The latter are based on robust estimates of location and scatter for the data matrix x$X (without intercept). Computing these robust estimates may take an appreciable amount of time. The estimator used corresponds to the one that was used in obtaining x (with the same breakdown point and the same control parameters, for example).
On the vertical axis a cutoff line is drawn at the square root of the .975 quantile of the chi-squared distribution with degrees of freedom equal to the number of response variables. On the horizontal axis the same quantile is drawn, but now with degrees of freedom equal to the number of covariates (not including the intercept). Points to the right of the vertical cutoff line can be viewed as high-leverage points. These can be classified into so-called 'bad' or 'good' leverage points depending on whether they lie above or below the horizontal cutoff line. Points above the horizontal cutoff line but to the left of the vertical cutoff line are sometimes called vertical outliers. See, for example, Van Aelst and Willems (2005).
To avoid the additional computation time, one can choose Xdist=FALSE
, in which case the residual distances are simply plotted
versus the index of the observation.
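For instance, along the lines of the Examples section below (a sketch; the MM regression fit is used only for illustration):

```r
## Sketch: default diagnostic plot vs. the cheaper index plot for a regression fit.
data(schooldata)
fit <- FRBmultiregMM(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata, R = 199)
diagplot(fit)                 # residual distances vs. robust distances in the X-space
diagplot(fit, Xdist = FALSE)  # residual distances vs. observation index (no extra estimation)
```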
The default plot in the context of PCA (i.e. for objects of type FRBpca and EIF=TRUE)
is a plot proposed by Pison and Van Aelst (2004). It shows the robust distance versus a measure of the overall empirical influence
of the observation on the (classical) principal components. The empirical influences are obtained by using the influence function of
the eigenvectors of the empirical or classical shape estimator at the normal model, and by
substituting therein the robust estimates for the population parameters.
The overall influence value is then defined by averaging the squared influence
over all coefficients in the eigenvectors.
The vertical line on the plot is an indicative cutoff value, obtained through simulation. This last part takes
a few moments of computation time.
Again, to avoid the additional computation time, one can choose EIF=FALSE
, in which case the robust distances are simply plotted
versus the index of the observation.
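A corresponding sketch for the PCA case:

```r
## Sketch: influence-based diagnostic plot vs. the plain index plot for a PCA fit.
data(ForgedBankNotes)
pcafit <- FRBpcaMM(ForgedBankNotes, R = 199)
diagplot(pcafit)               # robust distance vs. overall empirical influence
diagplot(pcafit, EIF = FALSE)  # robust distance vs. observation index (faster)
```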
For the result of the robust Hotelling test (i.e. for objects of type FRBhot
), the method plots the robust
distance versus the index. In case of a two-sample test, the indices are within-sample and a vertical line separates
the two groups. In the two-sample case, each group has its own location estimate, and both groups share a common covariance estimate.
Returns invisibly the first argument.
Gert Willems and Ella Roelant
G. Pison and S. Van Aelst (2004). Diagnostic Plots for Robust Multivariate Methods. Journal of Computational and Graphical Statistics, 13, 310–329.
S. Van Aelst and G. Willems (2005). Multivariate Regression S-Estimators for Robust Estimation and Inference. Statistica Sinica, 15, 981–1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregS
, FRBmultiregMM
, FRBmultiregGS
, FRBpcaS
, FRBpcaMM
, FRBhotellingS
, FRBhotellingMM
## for multivariate regression:
data(schooldata)
MMres <- MMest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata)
diagplot(MMres)
## a large 'bad leverage' outlier should be noticeable (observation 59)

## for PCA:
data(ForgedBankNotes)
MMres <- FRBpcaMM(ForgedBankNotes)
diagplot(MMres)
## a group of 15 fairly strong outliers can be seen which apparently would have
## a large general influence on a classical PCA analysis

## for Hotelling tests (two-sample)
data(hemophilia, package="rrcov")
MMres <- FRBhotellingMM(cbind(AHFactivity, AHFantigen) ~ gr, data=hemophilia)
diagplot(MMres)
## the data seem practically outlier-free
Six measurements made on 100 forged Swiss bank notes.
data(ForgedBankNotes)
The data frame contains the following columns:
Length: metric length of the bill
Left: height of the bill, measured on the left
Right: height of the bill, measured on the right
Bottom: distance of inner frame to the lower border
Top: distance of inner frame to the upper border
Diagonal: length of the diagonal
The original data set in Flury and Riedwyl (1988) additionally contained 100 genuine bank notes, but these are not included here.
B. Flury and H. Riedwyl (1988) Multivariate Statistics: A practical approach. London: Chapman & Hall.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198-1211.
data(ForgedBankNotes)
pairs(ForgedBankNotes)
Robust one-sample and two-sample Hotelling test using the MM-estimator and the Fast and Robust Bootstrap.
## S3 method for class 'formula'
FRBhotellingMM(formula, data=NULL, ...)

## Default S3 method:
FRBhotellingMM(X, Y=NULL, mu0 = 0, R = 999, conf = 0.95,
               method = c("HeFung", "pool"), control=MMcontrol(...),
               na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data-frame |
Y |
an optional matrix or data-frame in case of a two-sample test |
mu0 |
an optional vector of data values (or a single number which will be repeated p times) indicating the true value of the mean (does not apply in case of the two-sample test). Default is the null vector. |
R |
number of bootstrap samples. Default is R = 999. |
conf |
confidence level for the simultaneous confidence intervals. Default is conf = 0.95. |
method |
for the two-sample Hotelling test, indicates how the common covariance matrix is estimated: either by the two-sample estimator of He and Fung ("HeFung", the default) or by the pooled covariance matrix ("pool"). |
control |
a list with control parameters for tuning the MM-estimate and its computing algorithm,
see MMcontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
The classical Hotelling test for testing if the mean equals a certain value or if two means
are equal is modified into a robust one through substitution of the empirical estimates
by the MM-estimates of location and scatter. The MM-estimator, using Tukey's biweight function, is tuned by default to have
a breakdown point of 50% and 95% location efficiency. This could be changed through the control
argument if desired.
The MM-estimates are obtained by first computing the S-estimates with the fast-S algorithm followed by the M-part through reweighted least squares (RWLS) iteration. See MMcontrol
for some adjustable tuning parameters regarding the algorithm.
The fast and robust bootstrap is used to mimic the distribution of the test statistic under the null hypothesis. For instance, the 5% critical value for the test is given by the 95% quantile of the recalculated statistics.
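As an illustrative sketch, the bootstrap null distribution returned in teststat.boot can be compared with the observed statistic (the one-sample call mirrors the Examples section below):

```r
## Sketch: observed statistic vs. its bootstrap null distribution.
data(ForgedBankNotes)
res <- FRBhotellingMM(ForgedBankNotes, mu0 = colMeans(ForgedBankNotes), R = 199)
res$statistic                      # observed robust Hotelling statistic
quantile(res$teststat.boot, 0.95)  # bootstrap 5% critical value
res$pvalue                         # bootstrap p-value reported by the test
```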
Robust simultaneous confidence intervals for linear combinations of the mean (or difference in means) are developed similarly to the classical case
(Johnson and Wichern, 1988, page 239). The value CI
is a matrix with the confidence intervals for each element
of the mean (or difference in means), with level conf
. It consists of two rows, the first being the lower bound and the second the upper bound.
Note that these intervals are rather conservative in the sense that the simultaneous confidence level holds for all linear combinations, while here only p of them are considered (with p the dimension of the data).
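For example, in the two-sample setting used in the Examples below (a sketch):

```r
## Sketch: reading the simultaneous confidence intervals from the fitted object.
data(hemophilia, package = "rrcov")
res <- FRBhotellingMM(cbind(AHFactivity, AHFantigen) ~ gr, data = hemophilia, R = 199)
res$CI    # row 1 = lower bounds, row 2 = upper bounds, one column per variable
res$conf  # confidence level that was used
```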
For the two-sample Hotelling test we assume that the samples have an underlying distribution with the same covariance matrix.
This covariance matrix can be estimated in two different ways using the pooled covariance matrix or the two-sample
estimator of He and Fung (He and Fung 2000), and argument method
defaults to the second option.
For more details see Roelant et al. (2008).
In the two-sample version, the null hypothesis always states that the two means are equal. For the one-sample version, the default
null hypothesis is that the mean equals zero, but the hypothesized value can be changed and specified through argument mu0
.
Bootstrap samples are discarded if the fast and robust covariance estimate is not positive definite, such that the actual number
of recalculations used can be lower than R
. This number is returned as ROK
.
An object of class FRBhot
which extends class htest
and contains at least the following components:
statistic |
the value of the robust test statistic. |
pvalue |
p-value of the robust one or two-sample Hotelling test, determined by the fast and robust bootstrap |
estimate |
the estimated mean vector or vectors depending on whether it was a one-sample test or a two-sample test. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating what type of Hotelling test was performed. |
data.name |
a character string giving the name(s) of the data. |
teststat.boot |
the bootstrap recalculated values of the robust test statistic. |
CI |
bootstrap simultaneous confidence intervals for each component of the center |
conf |
a copy of the conf argument. |
Sigma |
covariance matrix estimate for the one sample, or the common covariance matrix estimate in the case of two samples |
w |
implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure) |
outFlag |
outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of)
the chi-square distribution with degrees of freedom equal to the dimension of the data; 0 otherwise |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to a non-positive definite covariance estimate) |
Ella Roelant, Stefan Van Aelst and Gert Willems
X. He and W.K. Fung (2000) High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis, 72, 151–162.
R.A. Johnson, D.W. Wichern (1988) Applied Multivariate Statistical Analysis, 2nd Edition, Prentice-Hall.
E. Roelant, S. Van Aelst and G. Willems, (2008) Fast Bootstrap for Robust Hotelling Tests, COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.) Heidelberg: Physika-Verlag, 709–719.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
plot.FRBhot
, summary.FRBhot
, FRBhotellingS
, MMcontrol
## One sample robust Hotelling test
data(delivery, package="robustbase")
delivery.x <- delivery[, 1:2]
FRBhotellingMM(delivery.x, R=199)

## One sample robust Hotelling test
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res <- FRBhotellingMM(ForgedBankNotes, mu0=samplemean, R=199)
res
# Note that the test rejects the hypothesis that the true mean equals the
# sample mean; this is due to outliers in the data (i.e. the robustly estimated
# mean apparently significantly differs from the non-robust sample mean).

# Graphical display of the results:
plot(res)
# It is clear from the (scaled) simultaneous confidence limits that the rejection
# of the hypothesis is due to the differences in variables Bottom and Diagonal

## Two sample robust Hotelling test
data(hemophilia, package="rrcov")
grp <- as.factor(hemophilia[,3])
x <- hemophilia[which(grp==levels(grp)[1]), 1:2]
y <- hemophilia[which(grp==levels(grp)[2]), 1:2]
# using the pooled covariance matrix to estimate the common covariance matrix
res <- FRBhotellingMM(x, y, method="pool")
# using the estimator of He and Fung to estimate the common covariance matrix
res <- FRBhotellingMM(x, y, method="HeFung", R=199)
# or using the formula interface
res <- FRBhotellingMM(as.matrix(hemophilia[,-3]) ~ hemophilia[,3], method="HeFung")

# From the confidence limits it can be seen that the significant difference
# is mainly caused by the AHFactivity variable. The graphical display helps too:
plot(res)
# the red line on the histogram indicates the test statistic value in the original
# sample (it is omitted if the statistic exceeds 100)
Robust one-sample and two-sample Hotelling test using the S-estimator and the Fast and Robust Bootstrap.
## S3 method for class 'formula'
FRBhotellingS(formula, data=NULL, ...)

## Default S3 method:
FRBhotellingS(X, Y=NULL, mu0 = 0, R = 999, bdp = 0.5, conf = 0.95,
              method = c("HeFung", "pool"), control=Scontrol(...),
              na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data-frame |
Y |
an optional matrix or data-frame in case of a two-sample test |
mu0 |
an optional vector of data values (or a single number which will be repeated p times) indicating the
true value of the mean (does not apply in case of the two-sample test). Default is the null vector |
R |
number of bootstrap samples. Default is R = 999. |
bdp |
required breakdown point. Should have 0 < bdp <= 0.5; the default is 0.5. |
conf |
confidence level for the simultaneous confidence intervals. Default is conf = 0.95. |
method |
for the two-sample Hotelling test, indicates how the common covariance matrix is estimated: either by the two-sample estimator of He and Fung ("HeFung", the default) or by the pooled covariance matrix ("pool"). |
control |
a list with control parameters for tuning the computing algorithm, see Scontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
The classical Hotelling test for testing if the mean equals a certain center or if two means
are equal is modified into a robust one through substitution of the empirical estimates
by the S-estimates of location and scatter. The S-estimator uses Tukey's biweight function where the constant is chosen
to obtain the desired breakdown point as specified by bdp
.
One-sample S-estimates are computed by a call to the implementation of the fast-S algorithm in the rrcov package of Todorov and Filzmoser (2009). For two-sample S-estimates an adaptation of the fast-S algorithm is used.
The tuning parameters of the algorithm can be changed via control
.
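A sketch of changing the breakdown point, reusing the two-sample data from the Examples section below:

```r
## Sketch: a two-sample robust Hotelling test with a 25% breakdown point S-estimator.
data(hemophilia, package = "rrcov")
res <- FRBhotellingS(as.matrix(hemophilia[, -3]) ~ hemophilia[, 3],
                     R = 199, bdp = 0.25, method = "HeFung")
res
```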
The fast and robust bootstrap is used to mimic the distribution of the test statistic under the null hypothesis. For instance, the 5% critical value for the test is given by the 95% quantile of the recalculated statistics.
Robust simultaneous confidence intervals for linear combinations of the mean (or difference in means) are developed similarly to the classical case
(Johnson and Wichern, 1988, page 239). The value CI
is a matrix with the confidence intervals for each element
of the mean (or difference in means), with level conf
. It consists of two rows, the first being the lower bound and the second the upper bound.
Note that these intervals are rather conservative in the sense that the simultaneous confidence level holds for all linear combinations, while here only p of them are considered (with p the dimension of the data).
For the two-sample Hotelling test we assume that the samples have an underlying distribution with the same covariance matrix.
This covariance matrix can be estimated in two different ways using the pooled covariance matrix or the two-sample
estimator of He and Fung (He and Fung 2000), and argument method
defaults to the second option.
For more details see Roelant et al. (2008).
In the two-sample version, the null hypothesis always states that the two means are equal. For the one-sample version, the default
null hypothesis is that the mean equals zero, but the hypothesized value can be changed and specified through argument mu0
.
Bootstrap samples are discarded if the fast and robust covariance estimate is not positive definite, such that the actual number
of recalculations used can be lower than R
. This number is returned as ROK
.
An object of class FRBhot
which extends class htest
and contains at least the following components:
statistic |
the value of the robust test statistic. |
pvalue |
p-value of the robust one or two-sample Hotelling test, determined by the fast and robust bootstrap |
estimate |
the estimated mean vector or vectors depending on whether it was a one-sample test or a two-sample test. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating what type of Hotelling test was performed. |
data.name |
a character string giving the name(s) of the data. |
teststat.boot |
the bootstrap recalculated values of the robust test statistic. |
CI |
bootstrap simultaneous confidence intervals for each component of the center |
conf |
a copy of the conf argument. |
Sigma |
covariance matrix estimate for the one sample, or the common covariance matrix estimate in the case of two samples |
w |
implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm) |
outFlag |
outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of)
the chi-square distribution with degrees of freedom equal to the dimension of the data; 0 otherwise |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to a non-positive definite covariance estimate) |
Ella Roelant, Stefan Van Aelst and Gert Willems
X. He and W.K. Fung (2000) High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis, 72, 151–162.
R.A. Johnson, D.W. Wichern (1988) Applied Multivariate Statistical Analysis, 2nd Edition, Prentice-Hall.
E. Roelant, S. Van Aelst and G. Willems, (2008) Fast Bootstrap for Robust Hotelling Tests, COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.) Heidelberg: Physika-Verlag, 709–719.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. doi:10.18637/jss.v032.i03.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
plot.FRBhot
,summary.FRBhot
, FRBhotellingMM
, Scontrol
## One sample robust Hotelling test
data(delivery, package="robustbase")
delivery.x <- delivery[, 1:2]
FRBhotellingS(delivery.x, R=199)

## One sample robust Hotelling test
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res <- FRBhotellingS(ForgedBankNotes, mu0=samplemean, R=199)
res
# Note that the test rejects the hypothesis that the true mean equals the
# sample mean; this is due to outliers in the data (i.e. the robustly estimated
# mean apparently significantly differs from the non-robust sample mean).

# Graphical display of the results:
plot(res)
# It is clear from the (scaled) simultaneous confidence limits that the rejection
# of the hypothesis is due to the differences in variables Bottom and Diagonal

## Two sample robust Hotelling test
data(hemophilia, package="rrcov")
grp <- as.factor(hemophilia[,3])
x <- hemophilia[which(grp==levels(grp)[1]), 1:2]
y <- hemophilia[which(grp==levels(grp)[2]), 1:2]
# using the pooled covariance matrix to estimate the common covariance matrix
res <- FRBhotellingS(x, y, method="pool")
# using the estimator of He and Fung to estimate the common covariance matrix
res <- FRBhotellingS(x, y, method="HeFung", R=199)
# or using the formula interface
res <- FRBhotellingS(as.matrix(hemophilia[,-3]) ~ hemophilia[,3], method="HeFung", R=99)

# From the confidence limits it can be seen that the significant difference
# is mainly caused by the AHFactivity variable. The graphical display helps too:
plot(res)
# the red line on the histogram indicates the test statistic value in the original
# sample (it is omitted if the statistic exceeds 100)
Computes GS-estimates for multivariate regression together with standard errors, confidence intervals and p-values based on the Fast and Robust Bootstrap.
## S3 method for class 'formula'
FRBmultiregGS(formula, data=NULL, ...)

## Default S3 method:
FRBmultiregGS(X, Y, int = TRUE, R = 999, bdp = 0.5, conf = 0.95,
              control=GScontrol(...), na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables. |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if TRUE, an intercept term is included in the regression model. Default is TRUE. |
R |
number of bootstrap samples. Default is R = 999. |
bdp |
required breakdown point. Should have 0 < bdp <= 0.5; the default is 0.5. |
conf |
confidence level of the bootstrap confidence intervals. Default is conf = 0.95. |
control |
a list with control parameters for tuning the computing algorithm, see GScontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
Generalized S-estimators are defined by minimizing the determinant of a robust estimator of the scatter matrix of the differences of the residuals (Roelant et al. 2009). Hence, this procedure is intercept free and only gives an estimate for the slope matrix. To estimate the intercept, the M-type estimator of location of Lopuhaa (1992) is applied to the residuals, with the scatter matrix estimate of the residuals as a preliminary estimate. This computation is carried out by a call to GSest_multireg(), which uses a fast-S-type algorithm (its tuning parameters can be changed via the control argument). The result of this call is also returned as the value est.
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate so-called
basic bootstrap confidence intervals and bias corrected and accelerated (BCa)
confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively).
Apart from the intervals with the requested confidence level, the function also returns p-values for each coefficient
corresponding to the hypothesis that the actual coefficient is zero. The p-values are computed as
1 minus the smallest level for which the confidence intervals would include zero. Both BCa and basic bootstrap p-values in this sense are given.
The bootstrap calculation is carried out by a call to GSboot_multireg
(), the result
of which is returned as the value bootest
. Bootstrap standard errors are returned as well.
Note: Bootstrap samples which contain too few distinct observations with positive weights are discarded
(a warning is given if this happens). The number of samples actually used is returned via ROK
.
In the formula
-interface, a multivariate response is produced via cbind
. For example cbind(x4,x5) ~ x1+x2+x3
.
All arguments from the default method can also be passed to the formula
method.
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
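For example (a sketch of the standard extractor functions; computing GS-estimates may take some time):

```r
## Sketch: standard 'mlm' extractors applied to an FRBmultireg fit.
data(schooldata)
GSres <- FRBmultiregGS(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata,
                       R = 99, bdp = 0.25)
coef(GSres)                                  # estimated coefficient matrix
head(residuals(GSres))                       # multivariate residuals
head(fitted(GSres))                          # fitted values
predict(GSres, newdata = schooldata[1:3, ])  # predictions for the first three observations
```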
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
GS-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Sigma |
GS-estimate of the error covariance matrix |
scale |
GS-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the GS-estimates (i.e. final weights in the RWLS procedure for the intercept estimate) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
SE |
bootstrap standard errors corresponding to the regression coefficients |
cov |
bootstrap covariance matrix corresponding to the regression coefficients (in vectorized form) |
CI.bca.lower |
a matrix containing the lower bound of the bias corrected and accelerated confidence intervals for the regression coefficients |
CI.bca.upper |
a matrix containing the upper bound of the bias corrected and accelerated confidence intervals for the regression coefficients |
CI.basic.lower |
a matrix containing the lower bound of basic bootstrap intervals for the regression coefficients |
CI.basic.upper |
a matrix containing the upper bound of basic bootstrap intervals for the regression coefficients |
p.bca |
a matrix containing the p-values based on the BCa confidence intervals for the regression coefficients |
p.basic |
a matrix containing the p-values based on the basic bootstrap intervals for the regression coefficients |
est |
GS-estimates as returned by the call to GSest_multireg. |
bootest |
bootstrap results for the GS-estimates as returned by the call to GSboot_multireg. |
conf |
a copy of the conf argument. |
method |
a list with following components: |
control |
a copy of the control argument. |
X , Y
|
either copies of the respective arguments or the corresponding matrices produced from the formula and data arguments. |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Ella Roelant, Stefan Van Aelst and Gert Willems
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
H.P. Lopuhaa (1992) Highly efficient estimators of multivariate location with high breakdown point. The Annals of Statistics, 20, 398-413.
E. Roelant, S. Van Aelst and C. Croux (2009) Multivariate Generalized S-estimators. Journal of Multivariate Analysis, 100, 876–887.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
summary.FRBmultireg
, plot.FRBmultireg
, GSboot_multireg
, GSest_multireg
,
FRBmultiregMM
, FRBmultiregS
, GScontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

# computes 25% breakdown point GS-estimate and 80% confidence intervals
# based on 99 bootstrap samples:
GSres <- FRBmultiregGS(school.x, school.y, R=99, bdp = 0.25, conf = 0.8, nsamp=50)
# or using the formula interface
GSres <- FRBmultiregGS(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                       bdp = 0.25, conf = 0.8, R=99)

# the print method just displays the coefficient estimates
GSres
# the summary function additionally displays the bootstrap standard errors and p-values
# ("BCA" method by default)
summary(GSres)
summary(GSres, confmethod="basic")

# ask explicitly for the coefficient matrix:
GSres$coefficients
# or equivalently,
coef(GSres)
# For the error covariance matrix:
GSres$Sigma

# plot some bootstrap histograms for the coefficient estimates
# (with "BCA" intervals by default)
plot(GSres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))
# plot bootstrap histograms for all coefficient estimates
plot(GSres)
# possibly the plot-function has made a selection of coefficients to plot here,
# since 'all' may have been too many to fit on one page, see help(plot.FRBmultireg);
# this is platform-dependent
Computes MM-estimates for multivariate regression together with standard errors, confidence intervals and p-values based on the Fast and Robust Bootstrap.
## S3 method for class 'formula'
FRBmultiregMM(formula, data=NULL, ...)

## Default S3 method:
FRBmultiregMM(X, Y, int = TRUE, R = 999, conf = 0.95,
              control=MMcontrol(...), na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables. |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if TRUE, an intercept term is included in the regression model. Default is TRUE. |
R |
number of bootstrap samples. Default is R = 999. |
conf |
confidence level of the bootstrap confidence intervals. Default is conf = 0.95. |
control |
a list with control parameters for tuning the MM-estimate and its computing algorithm,
see MMcontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
Multivariate MM-estimates combine high breakdown point and high Gaussian efficiency. They are defined by first computing an S-estimate of regression, then fixing the scale component of the error covariance estimate, and finally re-estimating the regression coefficients and the shape part of the error covariance by a more efficient M-estimate (see Tatsuoka and Tyler (2000) for MM-estimates in the special case of location/scatter estimation, and Van Aelst and Willems (2005) for S-estimates of multivariate regression).
Tukey's biweight is used for the loss functions. By default, the first loss function (in the S-estimate) is tuned in order to obtain 50% breakdown point.
The default tuning of the second loss function (M-estimate) ensures 95% efficiency at the normal model for the coefficient estimates.
The desired efficiency can be changed through argument control
.
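As a sketch, the Gaussian efficiency of the M-step could be lowered via the control argument; the argument name eff of MMcontrol() used below is an assumption, so consult help(MMcontrol) for the exact tuning parameters:

```r
## Sketch: requesting a different efficiency for the MM-estimates.
## NOTE: the MMcontrol() argument name 'eff' is assumed here, not verified.
data(schooldata)
MMres85 <- FRBmultiregMM(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata,
                         R = 199, control = MMcontrol(eff = 0.85))
```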
The computation is carried out by a call to MMest_multireg
(), which first performs the fast-S algorithm
(see Sest_multireg
) and does the M-part by reweighted least squares (RWLS) iteration.
See MMcontrol
for some adjustable tuning parameters regarding the algorithm.
The result of this call is also returned as the value est
.
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate so-called
basic bootstrap confidence intervals and bias corrected and accelerated (BCa)
confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively).
Apart from the intervals with the requested confidence level, the function also returns p-values for each coefficient
corresponding to the hypothesis that the actual coefficient is zero. The p-values are computed as
1 minus the smallest level for which the confidence intervals would include zero. Both BCa and basic bootstrap p-values in this sense are given.
The bootstrap calculation is carried out by a call to MMboot_multireg
(), the result
of which is returned as the value bootest
. Bootstrap standard errors are returned as well.
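The bootstrap inference results can also be inspected directly from the returned components (a sketch):

```r
## Sketch: bootstrap standard errors, BCa limits and p-values from the fit.
data(schooldata)
MMres <- FRBmultiregMM(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata, R = 199)
MMres$SE             # bootstrap standard errors of the coefficients
MMres$CI.bca.lower   # lower BCa confidence limits
MMres$CI.bca.upper   # upper BCa confidence limits
MMres$p.bca          # p-values based on the BCa intervals
```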
Note: Bootstrap samples which contain too few distinct observations with positive weights are discarded
(a warning is given if this happens). The number of samples actually used is returned via ROK
.
In the formula
-interface, a multivariate response is produced via cbind
. For example cbind(x4,x5) ~ x1+x2+x3
.
All arguments from the default method can also be passed to the formula
method except for int
(passing int
explicitly
will produce an error; the inclusion of an intercept term is determined by formula
).
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
MM-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Sigma |
MM-estimate of the error covariance matrix |
scale |
MM-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
SE |
bootstrap standard errors corresponding to the regression coefficients |
cov |
bootstrap covariance matrix corresponding to the regression coefficients (in vectorized form) |
CI.bca.lower |
a matrix containing the lower bounds of the bias corrected and accelerated confidence intervals for the regression coefficients. |
CI.bca.upper |
a matrix containing the upper bounds of the bias corrected and accelerated confidence intervals for the regression coefficients. |
CI.basic.lower |
a matrix containing the lower bounds of basic bootstrap intervals for the regression coefficients. |
CI.basic.upper |
a matrix containing the upper bounds of basic bootstrap intervals for the regression coefficients. |
p.bca |
a matrix containing the p-values based on the BCa confidence intervals for the regression coefficients. |
p.basic |
a matrix containing the p-values based on the basic bootstrap intervals for the regression coefficients. |
est |
MM-estimates as returned by the call to MMest_multireg. |
bootest |
bootstrap results for the MM-estimates as returned by the call to MMboot_multireg. |
conf |
a copy of the conf argument. |
method |
a list with following components: |
control |
a copy of the control argument. |
X , Y
|
either copies of the respective arguments or the corresponding matrices produced from the formula and data arguments. |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Gert Willems, Stefan Van Aelst and Ella Roelant
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556-582.
K.S. Tatsuoka and D.E. Tyler (2000) The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, 1219-1243.
S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981-1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
summary.FRBmultireg
, plot.FRBmultireg
, MMboot_multireg
,
MMest_multireg
, FRBmultiregS
, FRBmultiregGS
, MMcontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

# computes MM-estimate and 95% confidence intervals
# based on 999 bootstrap samples:
MMres <- FRBmultiregMM(school.x, school.y, R=999, conf = 0.95)
# or, equivalently using the formula interface
MMres <- FRBmultiregMM(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                       R=999, conf = 0.95)

# the print method displays the coefficient estimates
MMres
# the summary function additionally displays the bootstrap standard errors and p-values
# ("BCA" method by default)
summary(MMres)
summary(MMres, confmethod="basic")

# ask explicitly for the coefficient matrix:
MMres$coefficients
# or equivalently,
coef(MMres)
# For the error covariance matrix:
MMres$Sigma

# plot some bootstrap histograms for the coefficient estimates
# (with "BCA" intervals by default)
plot(MMres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))
# plot bootstrap histograms for all coefficient estimates
plot(MMres)
# probably the plot-function has made a selection of coefficients to plot here,
# since 'all' was too many to fit on one page, see help(plot.FRBmultireg);
# this is platform-dependent
Computes S-estimates for multivariate regression together with standard errors, confidence intervals and p-values based on the Fast and Robust Bootstrap.
## S3 method for class 'formula'
FRBmultiregS(formula, data=NULL, ...)

## Default S3 method:
FRBmultiregS(X, Y, int = TRUE, R = 999, bdp = 0.5, conf = 0.95,
             control=Scontrol(...), na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables. |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if TRUE, an intercept term is included in the regression model. Default is TRUE. |
R |
number of bootstrap samples. Default is R = 999. |
bdp |
required breakdown point for the S-estimates. Should have 0 < bdp <= 0.5; the default is 0.5. |
conf |
confidence level of the bootstrap confidence intervals. Default is conf = 0.95. |
control |
a list with control parameters for tuning the computing algorithm, see Scontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
Multivariate S-estimates were introduced by Davies (1987) and can be highly robust while enjoying a reasonable Gaussian efficiency.
Their use in the multivariate regression setting was discussed in Van Aelst and Willems (2005).
The loss function used here is Tukey's biweight. It is tuned in order to achieve the required breakdown point bdp
(any value between 0 and 0.5).
The computation is carried out by a call to Sest_multireg
(), which performs the fast-S algorithm
(Salibian-Barrera and Yohai 2006), see Scontrol
for its tuning parameters.
The result of this call is also returned as the value est
.
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate so-called
basic bootstrap confidence intervals and bias corrected and accelerated (BCa)
confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively).
Apart from the intervals with the requested confidence level, the function also returns p-values for each coefficient
corresponding to the hypothesis that the actual coefficient is zero. The p-values are computed as
1 minus the smallest level for which the confidence intervals would include zero. Both BCa and basic bootstrap p-values in this sense are given.
The bootstrap calculation is carried out by a call to Sboot_multireg
(), the result
of which is returned as the value bootest
. Bootstrap standard errors are returned as well.
Note: Bootstrap samples which contain too few distinct observations with positive weights are discarded
(a warning is given if this happens). The number of samples actually used is returned via ROK
.
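For instance, the number of retained bootstrap samples can be checked as follows (a sketch):

```r
## Sketch: verifying how many bootstrap samples were actually used.
data(schooldata)
Sres <- FRBmultiregS(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata, R = 199)
Sres$ROK   # <= 199; samples with too few distinct observations were discarded
```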
In the formula
-interface, a multivariate response is produced via cbind
. For example cbind(x4,x5) ~ x1+x2+x3
.
All arguments from the default method can also be passed to the formula
method except for int
(passing int
explicitly
will produce an error; the inclusion of an intercept term is determined by formula
).
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
S-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Sigma |
S-estimate of the error covariance matrix |
scale |
S-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
SE |
bootstrap standard errors corresponding to the regression coefficients. |
cov |
bootstrap covariance matrix corresponding to the regression coefficients (in vectorized form) |
CI.bca.lower |
a matrix containing the lower bounds of the bias corrected and accelerated confidence intervals for the regression coefficients. |
CI.bca.upper |
a matrix containing the upper bounds of the bias corrected and accelerated confidence intervals for the regression coefficients. |
CI.basic.lower |
a matrix containing the lower bounds of basic bootstrap intervals for the regression coefficients. |
CI.basic.upper |
a matrix containing the upper bounds of basic bootstrap intervals for the regression coefficients. |
p.bca |
a matrix containing the p-values based on the BCa confidence intervals for the regression coefficients. |
p.basic |
a matrix containing the p-values based on the basic bootstrap intervals for the regression coefficients. |
est |
S-estimates as returned by the call to Sest_multireg. |
bootest |
bootstrap results for the S-estimates as returned by the call to Sboot_multireg. |
conf |
a copy of the conf argument. |
method |
a list with following components: |
control |
a copy of the control argument. |
X , Y
|
either copies of the respective arguments or the corresponding matrices produced from the formula and data arguments. |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Gert Willems, Stefan Van Aelst and Ella Roelant
P.L. Davies (1987) Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices. The Annals of Statistics, 15, 1269-1292.
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.
M. Salibian-Barrera and V. Yohai (2006) A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414-427.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556-582.
S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981-1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
summary.FRBmultireg
, plot.FRBmultireg
, Sboot_multireg
,
Sest_multireg
, FRBmultiregMM
, FRBmultiregGS
, Scontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

## computes 25% breakdown point S-estimate and 99% confidence intervals
## based on 999 bootstrap samples:
Sres <- FRBmultiregS(school.x, school.y, R=999, bdp = 0.25, conf = 0.99)
## or, equivalently using the formula interface
Sres <- FRBmultiregS(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                     R=999, bdp = 0.25, conf = 0.99)

## the print method displays the coefficient estimates
Sres
## the summary function additionally displays the bootstrap standard errors and p-values
## ("BCA" method by default)
summary(Sres)
summary(Sres, confmethod="basic")

## ask explicitly for the coefficient matrix:
Sres$coefficients
## or equivalently,
coef(Sres)
## For the error covariance matrix:
Sres$Sigma

## plot some bootstrap histograms for the coefficient estimates
## (with "BCA" intervals by default)
plot(Sres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))
## plot bootstrap histograms for all coefficient estimates
plot(Sres)
## probably the plot-function has made a selection of coefficients to plot here,
## since 'all' was too many to fit on one page, see help(plot.FRBmultireg);
## this is platform-dependent
Performs principal components analysis based on the robust MM-estimate of the shape matrix. Additionally uses the Fast and Robust Bootstrap method to compute inference measures such as standard errors and confidence intervals.
## S3 method for class 'formula'
FRBpcaMM(formula, data=NULL, ...)

## Default S3 method:
FRBpcaMM(Y, R = 999, conf = 0.95, control=MMcontrol(...),
         na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fit. |
data |
data frame from which variables specified in formula are to be taken. |
Y |
matrix or data frame. |
R |
number of bootstrap samples. Default is R = 999. |
conf |
confidence level of the bootstrap confidence intervals. Default is conf = 0.95. |
control |
a list with control parameters for tuning the MM-estimate and its computing algorithm,
see MMcontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via the control argument. |
Multivariate MM-estimates are defined by first computing an S-estimate of location and covariance, then fixing its scale component and
re-estimating the location and the shape by a more efficient M-estimate, see Tatsuoka and Tyler (2000). Tukey's biweight is used for
the loss functions. By default, the first loss function (in the S-estimate) is tuned in order to obtain 50% breakdown point.
The default tuning of the second loss function (M-estimate) ensures 95% efficiency for the shape matrix estimate at the normal model.
The desired efficiency can be changed through argument control
. (However, control parameter
shapeEff
will always be considered as TRUE
by this function, whichever value is specified.) The MM-estimates are computed by a call to the implementation in the rrcov package of Todorov and Filzmoser (2009).
The result of this call is also returned as the value est
.
PCA is performed by computing the eigenvalues (eigval
) and eigenvectors (eigvec
) of the MM-estimate of shape, which is a rescaled version of the
MM-estimate of covariance (rescaled to have determinant equal to 1). With pvar
the function also provides the estimates for the percentage of variance explained by the first k principal components, which are simply the cumulative proportions of the sum of the eigenvalues. Here, k ranges from 1 to q-1 (with q the number of variables in Y).
The eigenvectors are always given in the order of descending eigenvalues.
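A sketch of extracting these quantities:

```r
## Sketch: PCA output from an MM-based FRB fit.
data(ForgedBankNotes)
pcares <- FRBpcaMM(ForgedBankNotes, R = 199)
pcares$eigval   # eigenvalues of the MM shape estimate (in descending order)
pcares$eigvec   # corresponding eigenvectors
pcares$pvar     # cumulative percentages of variance for the first k components
```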
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate standard errors, and also so-called
basic bootstrap confidence intervals and bias corrected and accelerated (BCa) confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively) corresponding
to the estimates eigval
, eigvec
and pvar
.
The bootstrap is also used to estimate the average angles between true and estimated eigenvectors, returned as avgangle
.
See Salibian-Barrera, Van Aelst and Willems (2006).
The fast and robust bootstrap computations for the MM-estimates are performed by MMboot_loccov
() and its raw result can be found in bootest
.
The actual bootstrap recalculations for the PCA-related quantities can be found in eigval.boot
, eigvec.boot
and pvar.boot
, where each column
represents a bootstrap sample. For eigvec.boot
, the eigenvectors are stacked on top of each other and the same goes for
eigvec.CI.bca
and eigvec.CI.basic
which hold the confidence limits.
The two columns in the confidence limits always respectively represent the lower and upper limits. For the percentage of variance the function also provides one-sided confidence intervals ([-infty upper]), which can be used to test the hypothesis that the true percentage at least equals a certain value.
Bootstrap samples are discarded if the fast and robust shape estimate is not positive definite, such that the actual number
of recalculations used can be lower than R
. This actual number equals R
- failedsamples
.
However, if more than 0.75R
of the bootstrap shape estimates is non-positive definite, all bootstrap samples will be
used anyway, and the negative eigenvalues are simply set to zero (which may impact the confidence limits and standard errors
for the smallest eigenvalues in eigval
and pvar
).
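For example, the number of recalculations that were effectively used can be checked as follows (a minimal sketch; R.used is a local variable for illustration, not a component of the returned object):

data(ForgedBankNotes)
res <- FRBpcaMM(ForgedBankNotes, R = 999)
R.used <- 999 - res$failedsamples   # recalculations actually used for the inference
R.used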
An object of class FRBpca
, which contains the following components:
shape |
(p x p) MM-estimate of the shape matrix of |
eigval |
(p x 1) eigenvalues of MM shape |
eigvec |
(p x p) eigenvectors of MM-shape |
pvar |
(p-1 x 1) percentages of variance for MM eigenvalues |
eigval.boot |
(p x R) eigenvalues of MM shape |
eigvec.boot |
(p*p x R) eigenvectors of MM-shape (vectorized) |
pvar.boot |
(p-1 x R) percentages of variance for MM eigenvalues |
eigval.SE |
(p x 1) bootstrap standard error for MM eigenvalues |
eigvec.SE |
(p x p) bootstrap standard error for MM eigenvectors |
pvar.SE |
(p-1 x 1) bootstrap standard error for percentage of variance for MM-eigenvalues |
angles |
(p x R) angles between bootstrap eigenvectors and original MM eigenvectors (in radians; in [0 pi/2]) |
avgangle |
(p x 1) average angles between bootstrap eigenvectors and original MM eigenvectors (in radians; in [0 pi/2]) |
eigval.CI.bca |
(p x 2) BCa intervals for MM eigenvalues |
eigvec.CI.bca |
(p*p x 2) BCa intervals for MM eigenvectors (vectorized) |
pvar.CI.bca |
(p-1 x 2) BCa intervals for percentage of variance for MM-eigenvalues |
pvar.CIone.bca |
(p-1 x 1) one-sided BCa intervals for percentage of variance for MM-eigenvalues ([-infty upper]) |
eigval.CI.basic |
(p x 2) basic bootstrap intervals for MM eigenvalues |
eigvec.CI.basic |
(p*p x 2) basic bootstrap intervals for MM eigenvectors (vectorized) |
pvar.CI.basic |
(p-1 x 2) basic bootstrap intervals for percentage of variance for MM-eigenvalues |
pvar.CIone.basic |
(p-1 x 1) one-sided basic bootstrap intervals for percentage of variance for MM-eigenvalues ([-infty upper]) |
est |
list containing the MM-estimates of location and scatter |
bootest |
(list) result of |
failedsamples |
number of bootstrap samples with non-positive definiteness of shape |
conf |
a copy of the |
method |
a character string giving the robust PCA method that was used |
w |
implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure) |
outFlag |
outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of)
the chi-square distribution with degrees of freedom equal to the dimension of |
Y |
copy of the data argument as a matrix |
Gert Willems, Stefan Van Aelst and Ella Roelant
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198-1211.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556-582.
K.S. Tatsuoka and D.E. Tyler (2000) The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, 1219-1243
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. doi:10.18637/jss.v032.i03.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
plot.FRBpca
, summary.FRBpca
, print.FRBpca
, FRBpcaS
,
MMboot_loccov
, MMcontrol
data(ForgedBankNotes)
MMpcares <- FRBpcaMM(ForgedBankNotes, R=999, conf=0.95)
# or using the formula interface
MMpcares <- FRBpcaMM(~., data=ForgedBankNotes, R=999, conf=0.95)
# the simple print method shows the standard deviations with confidence limits:
MMpcares
# the summary function shows a lot more (see help(summary.FRBpca)):
summary(MMpcares)
# ask for the eigenvalues:
MMpcares$eigval
# or, in prettier format, with confidence limits:
summary(MMpcares)$eigvals
# note that the standard deviations of the print output can also be obtained by:
sqrt( summary(MMpcares)$eigvals )
# the eigenvectors and their standard errors:
MMpcares$eigvec
# or prettier:
summary(MMpcares)$eigvecs
MMpcares$eigvec.SE
# take a look at the bootstrap distribution of the first eigenvalue
hist(MMpcares$eigval.boot[1,])
# that bootstrap distribution is used to compute confidence limits, as depicted
# by the screeplot function:
plotFRBvars(MMpcares, cumul=0)
# all plots for the FRB-PCA result:
plot(MMpcares)
Performs principal components analysis based on the robust S-estimate of the shape matrix. Additionally uses the Fast and Robust Bootstrap method to compute inference measures such as standard errors and confidence intervals.
## S3 method for class 'formula'
FRBpcaS(formula, data=NULL, ...)

## Default S3 method:
FRBpcaS(Y, R = 999, bdp = 0.5, conf = 0.95, control=Scontrol(...), na.action=na.omit, ...)
formula |
an object of class |
data |
data frame from which variables specified in formula are to be taken. |
Y |
matrix or data frame. |
R |
number of bootstrap samples. Default is |
bdp |
required breakdown point for the S-estimates. Should have |
conf |
level of the bootstrap confidence intervals. Default is |
control |
a list with control parameters for tuning the computing algorithm, see |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to |
... |
allows for specifying control parameters directly instead of via |
Multivariate S-estimates were introduced by Davies (1987) and can be highly robust while enjoying a reasonable Gaussian efficiency.
The loss function used here is Tukey's biweight. It will be tuned in order to achieve the required breakdown point bdp
(any value between 0 and 0.5).
The S-estimates are computed by a call to the implementation of the fast-S algorithm (Salibian-Barrera and Yohai 2006) in the rrcov package of Todorov and Filzmoser (2009). Scontrol provides some adjustable tuning parameters for this algorithm. The result of this call is also returned as the value est.
PCA is performed by computing the eigenvalues (eigval
) and eigenvectors (eigvec
) of the S-estimate of shape, which is a rescaled version of the
S-estimate of covariance (rescaled to have determinant equal to 1). With pvar
the function also provides the estimates for the percentage of
variance explained by the first k principal components, which are simply the cumulative proportions of the eigenvalues relative to their total sum.
Here, k ranges from 1 to p-1, with p the number of variables in Y.
The eigenvectors are always given in the order of descending eigenvalues.
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate standard errors, and also so-called
basic bootstrap confidence intervals and bias corrected and accelerated (BCa) confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively) corresponding
to the estimates eigval
, eigvec
and pvar
.
The bootstrap is also used to estimate the average angles between true and estimated eigenvectors, returned as avgangle
.
See Salibian-Barrera, Van Aelst and Willems (2006).
The fast and robust bootstrap computations for the S-estimates are performed by Sboot_loccov
() and its raw result can be found in bootest
.
The actual bootstrap values of the PCA-related quantities can be found in eigval.boot
, eigvec.boot
and pvar.boot
, where each column
represents a bootstrap sample. For eigvec.boot
, the eigenvectors are stacked on top of each other and the same goes for
eigvec.CI.bca
and eigvec.CI.basic
which hold the confidence limits.
The two columns in the confidence limits always respectively represent the lower and upper limits. For the percentage of variance the function also provides one-sided confidence intervals ([-infty upper]), which can be used to test the hypothesis that the true percentage at least equals a certain value.
Bootstrap samples are discarded if the fast and robust covariance estimate is not positive definite, such that the actual number
of recalculations used can be lower than R
. This actual number equals R
- failedsamples
. However, if
more than 0.75R
of the bootstrap shape estimates is non-positive definite, the failed bootstrap samples are recovered
by applying the make.positive.definite
function (from package corpcor
). If this also fails, the corresponding bootstrap sample
is discarded after all, but such situation should be rare.
This recovery may have an impact on the confidence limits and standard errors of especially the smallest eigenvalues in
eigval
and pvar
.
An object of class FRBpca
, which contains the following components:
shape |
(p x p) S-estimate of the shape matrix of |
eigval |
(p x 1) eigenvalues of S shape |
eigvec |
(p x p) eigenvectors of S-shape |
pvar |
(p-1 x 1) percentages of variance for S eigenvalues |
eigval.boot |
(p x R) eigenvalues of S shape |
eigvec.boot |
(p*p x R) eigenvectors of S-shape (vectorized) |
pvar.boot |
(p-1 x R) percentages of variance for S eigenvalues |
eigval.SE |
(p x 1) bootstrap standard error for S eigenvalues |
eigvec.SE |
(p x p) bootstrap standard error for S eigenvectors |
pvar.SE |
(p-1 x 1) bootstrap standard error for percentage of variance for S eigenvalues |
angles |
(p x R) angles between bootstrap eigenvectors and original S eigenvectors (in radians; in [0 pi/2]) |
avgangle |
(p x 1) average angles between bootstrap eigenvectors and original S eigenvectors (in radians; in [0 pi/2]) |
eigval.CI.bca |
(p x 2) BCa intervals for S eigenvalues |
eigvec.CI.bca |
(p*p x 2) BCa intervals for S eigenvectors (vectorized) |
pvar.CI.bca |
(p-1 x 2) BCa intervals for percentage of variance for S-eigenvalues |
pvar.CIone.bca |
(p-1 x 1) one-sided BCa intervals for percentage of variance for S-eigenvalues ([-infty upper]) |
eigval.CI.basic |
(p x 2) basic bootstrap intervals for S eigenvalues |
eigvec.CI.basic |
(p*p x 2) basic bootstrap intervals for S eigenvectors (vectorized) |
pvar.CI.basic |
(p-1 x 2) basic bootstrap intervals for percentage of variance for S-eigenvalues |
pvar.CIone.basic |
(p-1 x 1) one-sided basic bootstrap intervals for percentage of variance for S-eigenvalues ([-infty upper]) |
est |
list containing the S-estimates of location and scatter |
bootest |
(list) result of |
failedsamples |
number of bootstrap samples with non-positive definiteness of shape |
conf |
a copy of the |
method |
a character string giving the robust PCA method that was used |
w |
implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm) |
outFlag |
outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of)
the chi-square distribution with degrees of freedom equal to the dimension of |
Y |
copy of the data argument as a matrix |
Gert Willems, Stefan Van Aelst and Ella Roelant
P.L. Davies (1987) Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices. The Annals of Statistics, 15, 1269-1292.
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198-1211.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556-582.
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. doi:10.18637/jss.v032.i03.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
plot.FRBpca
, summary.FRBpca
, print.FRBpca
, FRBpcaMM
,
Sboot_loccov
, Scontrol
data(ForgedBankNotes)
Spcares <- FRBpcaS(ForgedBankNotes, R=999, bdp=0.25, conf=0.95)
## or using the formula interface
Spcares <- FRBpcaS(~., data=ForgedBankNotes, R=999, bdp=0.25, conf=0.95)
## the simple print method shows the standard deviations with confidence limits:
Spcares
## the summary function shows a lot more (see help(summary.FRBpca)):
summary(Spcares)
## ask for the eigenvalues:
Spcares$eigval
## or, in prettier format, with confidence limits:
summary(Spcares)$eigvals
## note that the standard deviations of the print output can also be obtained by:
sqrt(summary(Spcares)$eigvals)
## the eigenvectors and their standard errors:
Spcares$eigvec
## or prettier:
summary(Spcares)$eigvecs
Spcares$eigvec.SE
## take a look at the bootstrap distribution of the first eigenvalue
hist(Spcares$eigval.boot[1,])
## that bootstrap distribution is used to compute confidence limits, as depicted
## by the screeplot function:
plotFRBvars(Spcares, cumul=0)
## all plots for the FRB-PCA result:
plot(Spcares)
Calculates bootstrapped GS-estimates and bootstrap confidence intervals using the Fast and Robust Bootstrap method.
GSboot_multireg(X, Y, R = 999, conf=0.95, ests = GSest_multireg(X, Y))
X |
a matrix or data frame containing the explanatory variables (possibly including intercept). |
Y |
a matrix or data frame containing the response variables. |
R |
number of bootstrap samples. Default is |
conf |
confidence level of the bootstrap confidence intervals. Default is |
ests |
GS-estimates as returned by |
Called by FRBmultiregGS
and typically not to
be used on its own. If no original GS-estimates are provided the function calls GSest_multireg
with its default settings.
The fast and robust bootstrap was first introduced by Salibian-Barrera and Zamar (2002) for univariate regression MM-estimators and developed for GS-estimates by Roelant et al. (2009).
The value centered gives a matrix with R columns and p*q + q*q rows (where p is the number of explanatory variables and q is the number of response variables), containing the recalculated GS-estimates. Each column represents a different bootstrap sample.
The first p*q rows are the recalculated coefficient estimates and the next q*q rows are the covariance estimates (the estimates are vectorized, i.e. columns stacked on top of each other). These bootstrap estimates are centered by the original estimates, which are also returned through vecest in vectorized form.
The output list further contains bootstrap standard errors, as well as so-called basic bootstrap confidence intervals and bias corrected and accelerated confidence intervals (Davison and Hinkley, 1997, p.194 and p.204 respectively). Also in the output are p-values defined as 1 minus the smallest confidence level for which the confidence intervals would include the (hypothesised) value of zero. Both BCa and basic bootstrap p-values are given. These are only useful for the regression coefficient estimates (not really for the covariance estimates).
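The sketch below (not package code) illustrates this p-value definition for the basic bootstrap intervals of a single coefficient: given the original estimate and the centered bootstrap replicates (replicate minus original estimate), it searches for the smallest confidence level whose basic interval still covers zero. The function name basic.pvalue and the grid of levels are illustrative choices only.

## p-value = 1 minus the smallest confidence level for which the basic bootstrap
## interval [est - q_{1-a/2}(delta), est - q_{a/2}(delta)] contains zero
basic.pvalue <- function(est, delta) {
  lev <- seq(0.001, 0.999, by = 0.001)
  covers <- sapply(lev, function(conf) {
    a  <- 1 - conf
    lo <- est - quantile(delta, 1 - a/2, names = FALSE)
    hi <- est - quantile(delta, a/2, names = FALSE)
    lo <= 0 && 0 <= hi
  })
  if (!any(covers)) return(0)   # effectively below the resolution of the grid
  1 - min(lev[covers])
}
## toy usage with simulated centered replicates
set.seed(1)
basic.pvalue(est = 0.8, delta = rnorm(999, sd = 0.3))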
Bootstrap samples which contain too few distinct observations with positive weights are discarded
(a warning is given if this happens). The number of samples actually used is returned via ROK
.
A list containing the following components:
centered |
a matrix of all fast and robust bootstrap recalculations where the recalculations are centered by the original estimates (see Details) |
vecest |
a vector containing the original estimates stacked on top of each other |
SE |
bootstrap standard errors for the estimates in |
cov |
bootstrap covariance matrix for the estimates in |
CI.bca |
a matrix containing bias corrected and accelerated confidence intervals, corresponding to the
estimates in |
CI.basic |
a matrix containing basic bootstrap intervals, corresponding to the
estimates in |
p.bca |
a vector containing p-values based on the bias corrected and accelerated confidence intervals (corresponding to the
estimates in |
p.basic |
a vector containing p-values based on the basic bootstrap intervals (corresponding to the
estimates in |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Ella Roelant, Stefan Van Aelst and Gert Willems
A.C. Davison, D.V. Hinkley (1997) Bootstrap methods and their application. Cambridge University Press.
E. Roelant, S. Van Aelst and C. Croux (2009) Multivariate Generalized S-estimators. Journal of Multivariate Analysis, 100, 876–887.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556–582.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
data(schooldata)
school.x1 <- data.matrix(schooldata[,1:2])
school.y <- data.matrix(schooldata[,6:8])
## computes 5 bootstrap recalculations starting from the GS-estimate
## obtained from GSest_multireg
bootres <- GSboot_multireg(school.x1, school.y, R=5)
Computes GS-Estimates of multivariate regression based on Tukey's biweight function.
## S3 method for class 'formula'
GSest_multireg(formula, data=NULL, ...)

## Default S3 method:
GSest_multireg(X, Y, int = TRUE, bdp = 0.5, control=GScontrol(...), na.action=na.omit, ...)
formula |
an object of class |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables. |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if |
bdp |
required breakdown point. Should have |
control |
a list with control parameters for tuning the computing algorithm, see |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to |
... |
allows for specifying control parameters directly instead of via |
Generalized S-estimators are defined by minimizing the determinant of a robust estimator of the scatter matrix of
the differences of the residuals. Hence, this procedure is intercept free and only gives an estimate for the slope matrix. To estimate
the intercept, we use the M-type estimator of location of Lopuhaa (1992) on the residuals, with the scatter matrix
estimate of the residuals as a preliminary estimate. We use a fast algorithm similar to the one proposed by Salibian-Barrera
and Yohai (2006) for the regression case. See GScontrol
for the adjustable tuning parameters of this algorithm.
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
GS-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Sigma |
GS-estimate of the error covariance matrix |
Gamma |
GS-estimate of the error shape matrix |
scale |
GS-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the GS-estimates (i.e. final weights in the RWLS procedure for the intercept estimate) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
b , c
|
tuning parameters used in Tukey biweight loss function, as determined by |
method |
a list with following components: |
control |
a copy of the |
Ella Roelant, Gert Willems and Stefan Van Aelst
H.P. Lopuhaa (1992) Highly efficient estimators of multivariate location with high breakdown point. The Annals of Statistics, 20, 398-413.
E. Roelant, S. Van Aelst and C. Croux (2009) Multivariate Generalized S-estimators. Journal of Multivariate Analysis, 100, 876–887.
M. Salibian-Barrera and V. Yohai (2006) A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414-427.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregGS
, GSboot_multireg
, Sest_multireg
, GScontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])
GSest <- GSest_multireg(school.x, school.y, nsamp=50)
## or using the formula interface
GSests <- GSest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata)
Calculates bootstrapped MM-estimates of multivariate location and scatter using the Fast and Robust Bootstrap method.
MMboot_loccov(Y, R = 999, ests = MMest_loccov(Y))
Y |
matrix or data frame. |
R |
number of bootstrap samples. Default is |
ests |
original MM-estimates as returned by |
This function is called by FRBpcaMM
and FRBhotellingMM
, it is typically not to be used on its own.
It requires the MM-estimates of multivariate location and scatter/shape
(the result of MMest_loccov
applied on Y
), supplied through the argument ests
.
If ests
is not provided, MMest_loccov
calls the implementation of the multivariate MM-estimates in package rrcov of Todorov and Filzmoser (2009) with default arguments.
For multivariate data the fast and robust bootstrap was developed by Salibian-Barrera, Van Aelst and Willems (2006).
The value centered gives a matrix with R columns and 2*p + 2*p*p rows (where p is the number of variables in Y), containing the recalculated estimates of the MM-location, MM-shape, S-covariance and S-location. Each column represents a different bootstrap sample.
The first p rows are the MM-location estimates, the next p*p rows are the MM-shape estimates (vectorized), the next p*p rows are the S-covariance estimates (vectorized) and the final p rows are the S-location estimates.
The estimates are centered by the original estimates, which are also returned through MMest in vectorized form.
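As a small illustration of this layout (a sketch under the row ordering described above, not package code), bootstrap standard errors for the MM-location estimates can be obtained directly from the raw centered recalculations, since centering does not affect a standard error:

Y <- matrix(rnorm(50*5), ncol=5)
p <- ncol(Y)
bootres <- MMboot_loccov(Y, R = 200)
## the first p rows of 'centered' hold the (centered) MM-location recalculations
loc.SE <- apply(bootres$centered[1:p, ], 1, sd)
loc.SE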
A list containing:
centered |
recalculated MM- and S-estimates of location and scatter (centered by original estimates), see Details |
MMest |
original MM- and S-estimates of location and scatter, see Details |
Gert Willems, Ella Roelant and Stefan Van Aelst
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198–1211.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. doi:10.18637/jss.v032.i03.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBpcaMM
, FRBhotellingMM
, Sboot_loccov
Y <- matrix(rnorm(50*5), ncol=5)
MMests <- MMest_loccov(Y)
bootresult <- MMboot_loccov(Y, R = 1000, ests = MMests)
Calculates bootstrapped MM-estimates of multivariate regression and corresponding bootstrap confidence intervals using the Fast and Robust Bootstrap method.
MMboot_multireg(X, Y, R = 999, conf=0.95, ests = MMest_multireg(X, Y))
X |
a matrix or data frame containing the explanatory variables (possibly including intercept). |
Y |
a matrix or data frame containing the response variables. |
R |
number of bootstrap samples. Default is |
conf |
level of the bootstrap confidence intervals. Default is |
ests |
MM-estimates as returned by |
Called by FRBmultiregMM
and typically not to
be used on its own. It requires the result of MMest_multireg
applied on X
and Y
,
supplied through the argument ests
. If ests
is not provided, MMest_multireg
will be called with default arguments.
The fast and robust bootstrap was first developed by Salibian-Barrera and Zamar (2002) for univariate regression MM-estimators and extended to multivariate regression by Van Aelst and Willems (2005).
The value centered gives a matrix with R columns and 2*p*q + 2*q*q rows (where p is the number of explanatory variables and q is the number of response variables), containing the recalculated MM-estimates and initial S-estimates. Each column represents a different bootstrap sample.
The first p*q rows are the MM-coefficient estimates, the next q*q rows represent the MM-estimate of the error shape matrix (having determinant 1), the next q*q rows are the S-estimate of the error covariance and the final p*q rows are the S-estimates of the regression coefficients (all estimates are vectorized, i.e. columns stacked on top of each other).
These estimates are centered by the original estimates, which are also returned through vecest in vectorized form.
The output list further contains bootstrap standard errors, as well as so-called basic bootstrap confidence intervals and bias corrected and accelerated confidence intervals (Davison and Hinkley, 1997, p.194 and p.204 respectively). Also in the output are p-values defined as 1 minus the smallest confidence level for which the confidence intervals would include the (hypothesised) value of zero. Both BCa and basic bootstrap p-values are given. These are only useful for the regression coefficient estimates (not really for the covariance estimates).
Bootstrap samples which contain too few distinct observations with positive weights are discarded (a warning is given if this happens). The number of samples actually used is returned via ROK.
A list containing the following components:
centered |
a matrix of all fast/robust bootstrap recalculations where the recalculations are centered by original estimates (see Details) |
vecest |
a vector containing the original estimates (see Details) |
SE |
bootstrap standard errors for the estimates in |
cov |
bootstrap covariance matrix for the estimates in |
CI.bca |
a matrix containing 95% bias corrected and accelerated confidence intervals corresponding to the
estimates in |
CI.basic |
a matrix containing 95% basic bootstrap intervals corresponding to the
estimates in |
p.bca |
a vector containing p-values based on the bias corrected and accelerated confidence intervals (corresponding to the
estimates in |
p.basic |
a vector containing p-values based on the basic bootstrap intervals (corresponding to the
estimates in |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Gert Willems, Ella Roelant and Stefan Van Aelst
A.C. Davison, D.V. Hinkley (1997) Bootstrap methods and their application. Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556–582.
S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981–1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregMM
, MMest_multireg
, Sboot_multireg
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])
## computes 1000 bootstrap recalculations starting from the MM-estimator
## obtained from MMest_multireg()
bootres <- MMboot_multireg(school.x, school.y, R=1000)
Calculates bootstrapped two sample MM-estimates using the Fast and Robust Bootstrap method.
MMboot_twosample(X, groups, R = 999, ests = MMest_twosample(X, groups))
X |
matrix or data frame. |
groups |
vector of 1's and 2's, indicating group numbers. |
R |
number of bootstrap samples. Default is |
ests |
original MM-estimates as returned by |
This function is called by FRBhotellingMM
, it is typically not to be used on its own.
It requires the result of MMest_twosample
applied on X
, supplied through the argument ests
.
If ests
is not provided, MMest_twosample
will be called with default arguments.
The fast and robust bootstrap was first developed by Salibian-Barrera and Zamar (2002) for univariate regression MM-estimators and extended to the two sample setting by Roelant et al. (2008).
The value centered gives a matrix with R columns and 4*p + 2*p*p rows (where p is the number of variables in X), containing the recalculated estimates of the MM-locations, MM-shape, S-covariance and S-locations. Each column represents a different bootstrap sample.
The first p rows are the MM-location estimates of the first sample, the next p rows are the MM-location estimates of the second sample, the next p*p rows are the common MM-shape estimates (vectorized), the next p*p rows are the common S-covariance estimates (vectorized), the next p rows are the S-location estimates of the first sample and the final p rows are the S-location estimates of the second sample.
The estimates are centered by the original estimates, which are also returned through MMest in vectorized form.
A list containing:
centered |
recalculated two sample MM- and S-estimates of location and scatter (centered by original estimates), see Details |
MMest |
original two sample MM- and S-estimates of location and scatter, see Details |
Ella Roelant, Gert Willems and Stefan Van Aelst
E. Roelant, S. Van Aelst and G. Willems (2008) Fast Bootstrap for Robust Hotelling Tests. COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.), Heidelberg: Physika-Verlag, 709–719.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556–582.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBhotellingMM
, Sboot_twosample
Y1 <- matrix(rnorm(50*5), ncol=5)
Y2 <- matrix(rnorm(50*5), ncol=5)
Ybig <- rbind(Y1,Y2)
grp <- c(rep(1,50),rep(2,50))
MMests <- MMest_twosample(Ybig, grp)
bootresult <- MMboot_twosample(Ybig, grp, R=500, ests=MMests)
Compute S- and MM-Estimates of multivariate location and covariance matrix
MMest_loccov(Y, control=MMcontrol(...), ...)
Sest_loccov(Y, bdp=.5, control=Scontrol(...), ...)

MMest_twosample(X, groups, control=MMcontrol(...), ...)
Sest_twosample(X, groups, bdp=0.5, control=Scontrol(...), ...)
Y |
input matrix or data frame |
X |
input matrix or data frame |
bdp |
breakdown point, defaults to 0.5 |
groups |
grouping variable |
control |
a list with control parameters for tuning the S- or MM-estimate
and its computing algorithm, see |
... |
further arguments to be passed to |
These functions are internal wrappers around the functions Sest() and CovMMest(). They return lists with the following components:
Mu |
location |
Gamma |
shape |
scale |
scale=det^(1/(2*m)) |
Sigma |
covariance matrix |
c1 |
tuning parameter of the loss function for MM-estimation |
SMu |
location of the initial S-estimate |
SGamma |
shape of the initial S-estimate |
SSigma |
covariance matrix of the initial S-estimate |
b |
tuning parameters used in Tukey biweight loss function for S-estimation, as determined by bdp |
w |
scaled weights |
outflag |
outlier flags |
Y <- matrix(rnorm(50*5), ncol=5)
(MMests <- MMest_loccov(Y))
(Sests <- Sest_loccov(Y, bdp = 0.25))

Y1 <- matrix(rnorm(50*5), ncol=5)
Y2 <- matrix(rnorm(50*5), ncol=5)
Ybig <- rbind(Y1,Y2)
grp <- c(rep(1,50),rep(2,50))
(MMests <- MMest_twosample(Ybig, grp))
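The sketch below illustrates how the returned components are presumed to fit together, namely Sigma = scale^2 * Gamma with det(Gamma) = 1; this follows from the definitions of shape and scale given above, but should be read as an assumption rather than a documented guarantee:

Y <- matrix(rnorm(50*5), ncol=5)
ests <- MMest_loccov(Y)
det(ests$Gamma)                                 # assumed to be (close to) 1
range(ests$Sigma - ests$scale^2 * ests$Gamma)   # assumed to be (close to) zero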
Computes MM-Estimates of multivariate regression, using initial S-estimates
## S3 method for class 'formula'
MMest_multireg(formula, data=NULL, ...)

## Default S3 method:
MMest_multireg(X, Y, int = TRUE, control=MMcontrol(...), na.action=na.omit, ...)
formula |
an object of class |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables (possibly including intercept). |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if |
control |
a list with control parameters for tuning the MM-estimate and its computing algorithm,
see |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to |
... |
allows for specifying control parameters directly instead of via |
This function is called by FRBmultiregMM
.
The MM-estimates are defined by first computing S-estimates of regression, then fixing the scale component of the error covariance
estimate, and finally re-estimating the regression coefficients and the shape part of the error covariance by more efficient
M-estimates (see Tatsuoka and Tyler (2000) for MM-estimates in the special case of location/scatter estimation, and Van Aelst and
Willems (2005) for S-estimates of multivariate regression). Tukey's biweight is used for
the loss functions. By default, the first loss function (in the S-estimates) is tuned in order to obtain 50% breakdown point.
The default tuning of the second loss function (M-estimates) ensures 95% efficiency at the normal model for the coefficient estimates.
The desired efficiency can be changed via argument control
.
The computation of the S-estimates is performed by a call to Sest_multireg
, which uses the fast-S algorithm.
See MMcontrol
() to see or change the tuning parameters for this algorithm. The M-estimate part is computed
through iteratively reweighted least squares (RWLS).
Apart from the MM-estimate of the regression coefficients, the function returns both the MM-estimate of the error
covariance Sigma
and the corresponding shape estimate Gamma
(which has determinant equal to 1).
Additionally, the initial S-estimates are returned as well (their Gaussian efficiency is usually lower than the MM-estimates but they may
have a lower bias).
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
MM-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Sigma |
MM-estimate of the error covariance matrix |
Gamma |
MM-estimate of the error shape matrix |
scale |
S-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
c0 , b , c1
|
tuning parameters of the loss functions (depend on control parameters |
method |
a list with following components: |
control |
a copy of the |
SBeta |
S-estimate of the regression coefficient matrix |
SSigma |
S-estimate of the error covariance matrix |
SGamma |
S-estimate of the error shape matrix |
Gert Willems, Stefan Van Aelst and Ella Roelant
K.S. Tatsuoka and D.E. Tyler (2000), The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28, 1219–1243.
S. Van Aelst and G. Willems (2005), Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981–1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregMM
, MMboot_multireg
, Sest_multireg
, MMcontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])
## compute 95% efficient MM-estimates
MMres <- MMest_multireg(school.x, school.y)
## or using the formula interface
MMres <- MMest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata)
## the MM-estimate of the regression coefficient matrix:
MMres$coefficients
## or alternatively
coef(MMres)
## Do plots
n <- nrow(schooldata)
oldpar <- par(mfrow=c(2,1))
## the estimates can be considered as weighted least squares estimates with the
## following implicit weights
plot(1:n, MMres$weights)
## MMres$outFlag tells which points are outliers based on whether or not their
## robust distance exceeds the .975 chi-square cut-off:
plot(1:n, MMres$outFlag)
## (see also the diagnostic plot in plotDiag())
par(oldpar)
Plot function for FRBhot
objects: plots the bootstrap histogram of the null distribution, and
the simultaneous confidence limits (scaled)
## S3 method for class 'FRBhot'
plot(x, ...)
x |
an R object of class |
... |
potentially more arguments |
This generic plot
function presents two graphs. The first (top panel) is a histogram representing the test statistics
in the bootstrap samples, which estimate the null distribution.
A red line indicates the test statistic in the original sample (but is not shown when this value exceeds 100).
The second (bottom panel) displays the simultaneous confidence intervals based on the same bootstrap result.
The intervals are scaled such that they all have the same length. Furthermore, in case of the one-sample test
the intervals are shown relative to the hypothesized value mu0. This visualization makes it easy to recognize the extent to which each variable is responsible for the overall deviation from the hypothesized value.
Returns invisibly the first argument.
Gert Willems, Ella Roelant and Stefan Van Aelst
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
## One sample robust Hotelling test
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res <- FRBhotellingS(ForgedBankNotes, mu0=samplemean, R=99)
plot(res)
## Note that the test rejects the hypothesis that the true mean equals the
## sample mean; this is due to outliers in the data (i.e. the robustly estimated
## center apparently significantly differs from the non-robust sample mean).
## It is clear from the scaled simultaneous confidence limits that the rejection
## of the hypothesis is due to the differences in variables Bottom and Diagonal.
## For comparison, the hypothesis would be accepted if only the first three
## variables were considered:
res <- FRBhotellingS(ForgedBankNotes[,1:3], mu0=samplemean[1:3], R=99)
plot(res)

## Two sample robust Hotelling test
data(hemophilia, package="rrcov")
res <- FRBhotellingMM(cbind(AHFactivity, AHFantigen) ~ gr, data=hemophilia, R=99)
plot(res)
## From the confidence limits it can be seen that the significant difference
## is mainly caused by the AHFactivity variable.
## The red line on the histogram indicates the test statistic value in the original
## sample (it is omitted if the statistic exceeds 100)
Plot function for objects of class FRBmultireg
. It produces histograms for the bootstrap estimates for all
(or a selection) of the regression coefficients, based on Fast and Robust Bootstrap and with visualization of bootstrap confidence limits.
## S3 method for class 'FRBmultireg'
plot(x, expl, resp, confmethod = c("BCA","basic"), onepage = TRUE, ...)
x |
an R object of class |
expl |
optional; vector specifying the explanatory variables to be shown (either by index or by variable name) |
resp |
optional; vector specifying the response variables to be shown (either by index or by variable name) |
confmethod |
which kind of bootstrap confidence intervals to be displayed: 'BCA'= bias corrected and accelerated method, 'basic'= basic bootstrap method |
onepage |
logical: if TRUE, all requested histograms are plotted on one page; if FALSE, separate pages are used for each response variable |
... |
potentially more arguments to be passed |
With p and q denoting the number of explanatory and response variables specified, the function by default (i.e. if onepage=TRUE) plots a matrix of histograms, one for each of the p x q entries of the regression coefficient matrix provided in x, showing the bootstrap recalculations of the corresponding coefficient. The original estimates for the coefficients are indicated by dotted lines, while the solid lines are the bootstrap confidence limits. In case the interval does not contain zero, the plot title is printed in red and a star is added, indicating significance.
However, if p and/or q are large, the histograms may not fit on the page and an attempt to do it may result in an error. Therefore, the function first tries whether it fits (the outcome is platform-dependent), and if not it reduces p and/or q until all plots do fit on the page. Hence, only a selection may be shown and the user is given a warning in that case.
If onepage=FALSE, separate pages are used for each response variable and the user is prompted for page change.
In case the number (p) of explanatory variables is very large, the function again may show only a selection.
Returns invisibly the first argument.
Gert Willems and Ella Roelant
S. Van Aelst and G. Willems (2005). Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981-1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregS
, FRBmultiregMM
, FRBmultiregGS
, summary.FRBmultireg
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])
Sres <- FRBmultiregS(school.x, school.y, R=999, bdp = 0.25, conf = 0.99)
plot(Sres)
## the plot command above selected a subset, since otherwise an error may occur;
## as may happen when you explicitly ask for all coefficients to be plotted on one page:
plot(Sres, expl=1:6, resp=1:3)
## use separate pages for each response in case of many covariates:
plot(Sres, onepage=FALSE)
## perhaps specify some specific variables of interest:
plot(Sres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))
## or (the same):
plot(Sres, expl=2:3, resp=c(3,1))
Plot functions for FRBpca
objects: plots PC variances, PC angles and PC loadings, with bootstrap inference
## S3 method for class 'FRBpca'
plot(x, which = 1:3, pcs.loadings = 1:min(5, length(x$eigval)),
     confmethod = c("BCA","basic"), ...)

plotFRBvars(x, cumul = 2, confmethod = c("BCA","basic"),
            npcs = min(10, length(x$eigval)))
plotFRBangles(x, pcs = 1:min(12, length(x$eigval)))
plotFRBloadings(x, confmethod = c("BCA","basic"), pcs = 1:min(5, length(x$eigval)),
                nvars = min(10, length(x$eigval)))
x |
an R object of class |
which |
integer number(s) between 1 and 3 to specify which plot is desired (1 = variances; 2 = angles; 3 = loadings) |
pcs.loadings |
integer number(s) indicating for which of the PCs the loadings should be shown (in case the
|
cumul |
integer between 0 and 2: 0 = screeplot, i.e. the variances of the PCs are shown; 1 = the cumulative variances (percentage) of the PCs are shown; 2 = (default) both plots are shown on the same page |
confmethod |
which kind of bootstrap confidence intervals to be displayed: 'BCA'= bias corrected and accelerated method, 'basic'= basic bootstrap method |
npcs |
number of PCs to be included in screeplot/cumulative variances plot |
pcs |
PCs to consider in plot; defaults to first 12 (maximally) for |
nvars |
number of variables for which loadings should be shown in each PC; the loadings are shown in decreasing order in each PC |
... |
potentially more arguments |
The generic plot
function calls plotFRBvars
, plotFRBangles
and
plotFRBloadings
, according to which
of these are respectively specified in argument which
, and
displays the plots on separate pages (the user is prompted for each new page). The PCs for which the loadings should be plotted
can be specified through the pcs.loadings
argument. The other arguments are set to their default values by plot
.
The solid curves displayed by plotFRBvars
indicate the actual estimates of the variances (or percentages), while the dashed curves represent the confidence
limits as computed by FRBpcaS
or FRBpcaMM
.
plotFRBangles
plots, for each PC, histograms of the angles between the bootstrapped PC and the original PC estimate.
The angles are in radians, between 0 and pi/2. These limits are indicated by the red vertical lines. Angles close to zero correspond
to bootstrapped PCs closely aligned with the original PC, while an angle close to pi/2 means the bootstrapped PC is roughly
perpendicular to the original estimate (hence a large number of angles close to pi/2 implies high variability).
If the number of PCs specified in pcs
is very large (usually larger than the default settings), the histograms may not fit
on one page and a selection will be made (the user will be given a warning in that case).
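For reference, one common way to define such an angle (a sketch, not necessarily the exact computation used by the package) is the arc cosine of the absolute inner product of the normalized vectors, which always lies in [0, pi/2] because the sign of an eigenvector is arbitrary:

## angle between an original eigenvector v and a bootstrapped one v.boot
angle <- function(v, v.boot) {
  cosab <- abs(sum(v * v.boot)) / (sqrt(sum(v^2)) * sqrt(sum(v.boot^2)))
  acos(min(1, cosab))   # guard against rounding slightly above 1
}
angle(c(1, 0), c(1, 0.1))   # small angle: well aligned
angle(c(1, 0), c(0, 1))     # pi/2: perpendicular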
In plotFRBloadings
, the red dots represent the loadings, which are between -1 and 1. The square brackets indicate the confidence
limits as computed by FRBpcaS
or FRBpcaMM
. Only the loadings of the first nvars
variables
are shown, where the variables were ordered according to the absolute value of the loading (i.e. only the nvars
most important
variables for that particular PC are shown).
Returns invisibly the first argument.
Gert Willems and Ella Roelant
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBpcaS
, FRBpcaMM
, summary.FRBpca
data(ForgedBankNotes)
MMpcares <- FRBpcaMM(ForgedBankNotes, R=999, conf=0.95)
plot(MMpcares)
## a closer look at the screeplot, specifying basic bootstrap intervals
plotFRBvars(MMpcares, cumul=0, confmethod="basic")
## plots the bootstrap angles for the first PC only
plotFRBangles(MMpcares, pcs=1)
## plots the loadings, with basic bootstrap intervals, for *all* the PCs
plotFRBloadings(MMpcares, confmethod="basic", pcs=1:ncol(ForgedBankNotes))
Calculates bootstrapped S-estimates using the Fast and Robust Bootstrap method.
Sboot_loccov(Y, R = 999, ests = Sest_loccov(Y))
Y |
matrix or data frame. |
R |
number of bootstrap samples. Default is |
ests |
original S-estimates as returned by |
This function is called by FRBpcaS
and FRBhotellingS
, it is typically not to be used on its own.
It requires the S-estimates of multivariate location and scatter/shape
(the result of Sest_loccov
applied on Y
), supplied through the argument ests
.
If ests
is not provided, Sest_loccov
calls the implementation of the multivariate S-estimates in package rrcov of Todorov and Filzmoser (2009) with default arguments.
For multivariate data the fast and robust bootstrap was developed by Salibian-Barrera, Van Aelst and Willems (2006).
The value centered gives a matrix with R columns and p + p*p rows (where p is the number of variables in Y), containing the recalculated estimates of the S-location and S-covariance. Each column represents a different bootstrap sample.
The first p rows are the location estimates and the next p*p rows are the covariance estimates (vectorized). The estimates are centered by the original estimates, which are also returned through Sest.
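As a small illustration of this layout (a sketch under the row ordering described above, not package code), the vectorized covariance recalculations can be reshaped back into p x p matrices, one per bootstrap sample:

Y <- matrix(rnorm(50*5), ncol=5)
p <- ncol(Y)
bootres <- Sboot_loccov(Y, R = 100, ests = Sest_loccov(Y, bdp = 0.25))
## rows p+1, ..., p + p*p hold the (centered) vectorized covariance recalculations
cov.rows <- bootres$centered[(p + 1):(p + p*p), ]
cov.list <- lapply(seq_len(ncol(cov.rows)), function(b) matrix(cov.rows[, b], nrow = p))
length(cov.list)
dim(cov.list[[1]])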
A list containing:
centered |
recalculated estimates of location and covariance (centered by original estimates) |
Sest |
original estimates of location and covariance |
Gert Willems, Ella Roelant and Stefan Van Aelst
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101, 1198–1211.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. doi:10.18637/jss.v032.i03.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBpcaS
, FRBhotellingS
, MMboot_loccov
Y <- matrix(rnorm(50*5), ncol=5)
Sests <- Sest_loccov(Y, bdp = 0.25)
bootresult <- Sboot_loccov(Y, R = 1000, ests = Sests)
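Continuing this example, a minimal sketch of how the centered matrix described above can be split into its location and covariance parts (here p = 5 variables):

p <- ncol(Y)
## rows 1..p: centered bootstrap recalculations of the S-location
loc.boot <- bootresult$centered[1:p, ]
## rows (p+1)..(p+p^2): centered recalculations of the vectorized S-covariance
cov.boot <- bootresult$centered[(p+1):(p + p^2), ]
## e.g. simple bootstrap standard errors for the location components
apply(loc.boot, 1, sd)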
Calculates bootstrapped S-estimates of multivariate regression and corresponding bootstrap confidence intervals using the Fast and Robust Bootstrap method.
Sboot_multireg(X, Y, R = 999, conf=0.95, ests = Sest_multireg(X, Y))
X |
a matrix or data frame containing the explanatory variables (possibly including intercept). |
Y |
a matrix or data frame containing the response variables. |
R |
number of bootstrap samples. Default is R=999. |
conf |
level of the bootstrap confidence intervals. Default is 0.95. |
ests |
S-estimates as returned by Sest_multireg. |
This function is called by FRBmultiregS
and is typically not used on its own. It requires the result of Sest_multireg
applied to X
and Y
,
supplied through the argument ests
. If ests
is not provided, Sest_multireg
will be called with default arguments.
The fast and robust bootstrap was first developed by Salibian-Barrera and Zamar (2002) for univariate regression MM-estimators and extended to multivariate regression by Van Aelst and Willems (2005).
The value centered
gives a matrix with R
columns and p*q + q^2 rows, where p is the number of explanatory variables
and q the number of response variables,
containing the recalculated S-estimates of the regression coefficients and the error covariance matrix.
Each column represents a different bootstrap sample.
The first p*q rows are the coefficient estimates and the next q^2 rows are the covariance estimate
(the estimates are vectorized, i.e. columns stacked on top of each other).
The estimates are centered by the original estimates, which are also returned through
vecest
in vectorized form.
The output list further contains bootstrap standard errors, as well as so-called basic bootstrap confidence intervals and bias corrected and accelerated (BCa) confidence intervals (Davison and Hinkley, 1997, p.194 and p.204 respectively). Also in the output are p-values defined as 1 minus the smallest confidence level for which the confidence intervals would include the (hypothesised) value of zero. Both BCa and basic bootstrap p-values are given. These are only useful for the regression coefficient estimates (not really for the covariance estimates).
Bootstrap samples that contain too few distinct observations with positive weights are discarded
(a warning is given if this happens). The number of samples actually used is returned via
ROK
.
A list containing the following components:
centered |
a matrix of all fast/robust bootstrap recalculations where the recalculations are centered by original estimates (see Details) |
vecest |
a vector containing the original estimates (see Details) |
SE |
bootstrap standard errors for the estimates in vecest |
cov |
bootstrap covariance matrix for the estimates in vecest |
CI.bca |
a matrix containing bias corrected and accelerated confidence intervals corresponding to the estimates in vecest |
CI.basic |
a matrix containing basic bootstrap intervals corresponding to the estimates in vecest |
p.bca |
a vector containing p-values based on the bias corrected and accelerated confidence intervals (corresponding to the estimates in vecest) |
p.basic |
a vector containing p-values based on the basic bootstrap intervals (corresponding to the estimates in vecest) |
ROK |
number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight) |
Gert Willems, Ella Roelant and Stefan Van Aelst
A.C. Davison, D.V. Hinkley (1997) Bootstrap methods and their application. Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556–582.
S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981–1001.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregS
, Sest_multireg
, MMboot_multireg
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

# computes 1000 bootstrap recalculations starting from the S-estimator
# obtained from Sest_multireg()
bootres <- Sboot_multireg(school.x, school.y, R=1000)
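Continuing the example, a brief sketch of inspecting the components listed under Value:

## bootstrap standard errors and BCa intervals for the (vectorized) estimates
bootres$SE
bootres$CI.bca
## p-values based on the BCa and on the basic bootstrap intervals
bootres$p.bca
bootres$p.basic
## number of bootstrap samples actually used
bootres$ROK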
Calculates bootstrapped two-sample S-estimates using the Fast and Robust Bootstrap method.
Sboot_twosample(X, groups, R = 999, ests = Sest_twosample(X, groups))
X |
matrix or data frame. |
groups |
vector of 1's and 2's, indicating group numbers. |
R |
number of bootstrap samples. Default is R=999. |
ests |
original two-sample S-estimates as returned by Sest_twosample. |
This function is called by FRBhotellingS
and is typically not used on its own.
It requires the result of Sest_twosample
applied to X
, supplied through the argument ests
.
If ests
is not provided, Sest_twosample
will be called with default arguments.
The fast and robust bootstrap was first developed by Salibian-Barrera and Zamar (2002) for univariate regression MM-estimators and extended to the two-sample setting by Roelant et al. (2008).
The value centered
gives a matrix with R
columns and 2p + p^2 rows, where p is the number of variables in
X
,
containing the recalculated estimates of the S-location for the first and second center and of the common S-covariance. Each column represents
a different bootstrap sample.
The first p rows are the location estimates for the first center, the next p rows are the location
estimates for the second center and the last p^2 rows are the common covariance estimates (vectorized). The estimates
are centered by the original estimates, which are also returned through
Sest
.
A list containing:
centered |
recalculated estimates of location of first and second center and covariance (centered by original estimates) |
Sest |
original estimates of first and second center and common covariance |
Ella Roelant, Gert Willems and Stefan Van Aelst
E. Roelant, S. Van Aelst and G. Willems (2008) Fast Bootstrap for Robust Hotelling Tests, COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.) Heidelberg: Physica-Verlag, 709–719.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556–582.
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
Y1 <- matrix(rnorm(50*5), ncol=5)
Y2 <- matrix(rnorm(50*5), ncol=5)
Ybig <- rbind(Y1, Y2)
grp <- c(rep(1,50), rep(2,50))
Sests <- Sest_twosample(Ybig, grp, bdp=0.25)
bootresult <- Sboot_twosample(Ybig, grp, R=1000, ests=Sests)
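Continuing this example, a minimal sketch of how the centered matrix described above can be split into the two location parts and the common covariance part (here p = 5 variables):

p <- ncol(Ybig)
## rows 1..p: centered recalculations of the S-location of the first sample
loc1.boot <- bootresult$centered[1:p, ]
## rows (p+1)..2p: centered recalculations of the S-location of the second sample
loc2.boot <- bootresult$centered[(p+1):(2*p), ]
## last p^2 rows: centered recalculations of the vectorized common S-covariance
cov.boot <- bootresult$centered[(2*p+1):(2*p + p^2), ]
## e.g. simple bootstrap standard errors for the difference of the two locations
apply(loc1.boot - loc2.boot, 1, sd)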
School Data, from Charnes et al. (1981). The aim is to explain scores on 3 different tests from 70 school sites by means of 5 explanatory variables.
data(schooldata)
A data frame with 70 observations on the following 8 variables.
education
education level of mother as measured in terms of percentage of high school graduates among female parents
occupation
highest occupation of a family member according to a pre-arranged rating scale
visit
parental visits index representing the number of visits to the school site
counseling
parent counseling index calculated from data on time spent with child on school-related topics such as reading together, etc.
teacher
number of teachers at a given site
reading
total reading score as measured by the Metropolitan Achievement Test
mathematics
total mathematics score as measured by the Metropolitan Achievement Test
selfesteem
Coopersmith Self-Esteem Inventory, intended as a measure of self-esteem
Charnes et al. (1981)
A. Charnes, W.W. Cooper and E. Rhodes (1981) Evaluating Program and Managerial Efficiency: An Application of Data Envelopment Analysis to Program Follow Through. Management Science, 27, 668-697.
data(schooldata)
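For orientation, a small sketch of the typical split into explanatory and response variables, as used in the regression examples elsewhere in this documentation:

data(schooldata)
dim(schooldata)                               # 70 observations on 8 variables
school.x <- data.matrix(schooldata[, 1:5])    # the 5 explanatory variables
school.y <- data.matrix(schooldata[, 6:8])    # the 3 test scores
summary(school.y)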
Tuning parameters for multivariate S-, MM- and GS-estimates as used in the FRB functions for multivariate regression, PCA and Hotelling tests. They mainly concern the fast-(G)S algorithm.
Scontrol(nsamp = 500, k = 3, bestr = 5, convTol = 1e-10, maxIt = 50)

MMcontrol(bdp = 0.5, eff = 0.95, shapeEff = FALSE, convTol.MM = 1e-07,
          maxIt.MM = 50, fastScontrols = Scontrol(...), ...)

GScontrol(nsamp = 100, k = 3, bestr = 5, convTol = 1e-10, maxIt = 50)
nsamp |
number of random subsamples to be used in the fast-(G)S algorithm |
k |
number of initial concentration steps performed on each subsample candidate |
bestr |
number of best candidates to keep for full iteration (i.e. concentration steps until convergence) |
convTol |
relative convergence tolerance for estimates used in (G)S-concentration iteration |
maxIt |
maximal number of steps in (G)S-concentration iteration |
bdp |
breakdown point of the MM-estimates; usually equals 0.5 |
eff |
Gaussian efficiency of the MM-estimates; usually set at 0.95 |
shapeEff |
logical; if TRUE, the Gaussian efficiency eff refers to shape efficiency rather than location efficiency. Default is FALSE. |
convTol.MM |
relative convergence tolerance for estimates used in MM-iteration |
maxIt.MM |
maximal number of steps in MM-iteration |
fastScontrols |
the tuning parameters of the initial S-estimate |
... |
allows for any individual parameter from Scontrol to be specified directly; these are passed on to the initial S-estimate via fastScontrols. |
The default number of random samples is lower for GS-estimates than for S-estimates, because computations regarding the former are more demanding.
A list with the tuning parameters as set by the arguments.
Gert Willems and Ella Roelant
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
GSest_multireg
, Sest_multireg
,
MMest_multireg
, Sest_twosample
, MMest_twosample
, FRBpcaS
, ...
## Show the default settings:
str(Scontrol())
str(MMcontrol())
str(GScontrol())
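These control lists are intended to be passed to the estimation functions through their control argument; a small illustration (assuming the Sest_multireg interface shown below):

data(schooldata)
## use more random subsamples and a stricter tolerance in the fast-S algorithm
Sres <- Sest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                      bdp=0.25, control=Scontrol(nsamp=1000, convTol=1e-12))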
Computes S-Estimates of multivariate regression based on Tukey's biweight function using the fast-S algorithm.
## S3 method for class 'formula'
Sest_multireg(formula, data=NULL, ...)

## Default S3 method:
Sest_multireg(X, Y, int = TRUE, bdp = 0.5, control=Scontrol(...),
              na.action=na.omit, ...)
formula |
an object of class formula; a symbolic description of the model to be fitted. |
data |
data frame from which variables specified in formula are to be taken. |
X |
a matrix or data frame containing the explanatory variables (possibly including intercept). |
Y |
a matrix or data frame containing the response variables. |
int |
logical: if TRUE, an intercept term is added to the model (unless X already contains one). Default is TRUE. |
bdp |
required breakdown point. Should have 0 < bdp <= 0.5; the default is 0.5. |
control |
a list with control parameters for tuning the computing algorithm, see Scontrol. |
na.action |
a function which indicates what should happen when the data contain NAs. Defaults to na.omit. |
... |
allows for specifying control parameters directly instead of via control. |
This function is called by FRBmultiregS
.
S-estimates for multivariate regression were discussed in Van Aelst and Willems (2005). The algorithm used here is a multivariate
version of the fast-S algorithm introduced by Salibian-Barrera and Yohai (2006).
See Scontrol
for the adjustable tuning parameters of this algorithm.
Apart from the regression coefficients, the function returns both the error covariance matrix estimate Sigma
and
the corresponding shape estimate Gamma
(which has determinant equal to 1).
The scale
is given by det(Sigma)^(1/(2q)), with q
the number of response variables, so that Sigma equals the squared scale times Gamma.
The returned object inherits from class mlm
such that the standard coef
, residuals
, fitted
and predict
functions can be used.
An object of class FRBmultireg
which extends class mlm
and contains at least the following components:
coefficients |
S-estimates of the regression coefficients |
residuals |
the residuals, that is response minus fitted values |
fitted.values |
the fitted values. |
Gamma |
S-estimate of the error shape matrix |
Sigma |
S-estimate of the error covariance matrix |
scale |
S-estimate of the size of the multivariate errors |
weights |
implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm) |
outFlag |
outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise |
b, c |
tuning parameters used in the Tukey biweight loss function, as determined by bdp |
method |
a list with the following components: |
control |
a copy of the control argument that was used |
Gert Willems, Stefan Van Aelst and Ella Roelant
M. Salibian-Barrera and V. Yohai (2006) A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414–427.
S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981–1001
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregS
, Sboot_multireg
, MMest_multireg
, Scontrol
data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

## compute 25% breakdown S-estimates
Sres <- Sest_multireg(school.x, school.y, bdp=0.25)

## or using the formula interface
Sres <- Sest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata, bdp=0.25)

## the regression coefficients:
Sres$coefficients
## or alternatively
coef(Sres)

n <- nrow(schooldata)
oldpar <- par(mfrow=c(2,1))
## the estimates can be considered as weighted least squares estimates with the
## following implicit weights
plot(1:n, Sres$weights)
## Sres$outFlag tells which points are outliers based on whether or not their
## robust distance exceeds the .975 chi-square cut-off:
plot(1:n, Sres$outFlag)
## (see also the diagnostic plot in plotDiag())
par(oldpar)
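Because the returned object extends class mlm (see Details above), the standard accessor functions also apply; continuing the example:

## robust residuals and fitted values via the standard accessors
head(residuals(Sres))
head(fitted(Sres))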
Summary method for objects of class FRBhot
, and print method of the summary object.
## S3 method for class 'FRBhot'
summary(object, digits = 5, ...)

## S3 method for class 'summary.FRBhot'
print(x, ...)
object |
an R object of class FRBhot. |
digits |
number of digits for printing (default is 5) |
x |
an R object of class summary.FRBhot. |
... |
potentially more arguments to be passed to methods |
The print
method here displays the value of the test statistic and the corresponding bootstrap p-value. It also presents the simultaneous confidence intervals for the components of the location vector (or difference
between the two location vectors), and the robust estimates for the location vector(s) and covariance matrix.
summary.FRBhot
simply returns its two arguments in a list.
Gert Willems, Ella Roelant and Stefan Van Aelst
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBhotellingS
, FRBhotellingMM
, plot.FRBhot
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res <- FRBhotellingS(ForgedBankNotes, mu0=samplemean)
summary(res)  # -> print.summary.FRBhot() method
Summary method for objects of class FRBmultireg
, and print method of the summary object.
## S3 method for class 'FRBmultireg'
summary(object, confmethod = c("BCA", "basic", "both"), digits = 3,
        print.CI=FALSE, sep="", ...)

## S3 method for class 'summary.FRBmultireg'
print(x, ...)
object |
an R object of class FRBmultireg. |
confmethod |
which kind of bootstrap confidence intervals to be displayed: 'BCA'= bias corrected and accelerated method, 'basic'= basic bootstrap method, 'both'=both kinds of confidence intervals |
digits |
number of digits for printing (default is 3) |
print.CI |
logical: should confidence intervals be printed? Default is FALSE. |
sep |
symbol used to separate columns in the output. Default is the empty string "". |
x |
an R object of class summary.FRBmultireg. |
... |
potentially more arguments to be passed to methods |
The print
method displays in a “familiar way” the components of the summary
object, which are listed in the Value section.
summary
returns an object of class summary.FRBmultireg
, which contains the following components:
responses |
the names of the response variables in the fitted model |
covariates |
the names of the covariates (predictors) in the fitted model |
Betawstd |
a data frame containing the coefficient estimates and their bootstrap standard errors |
Sigma |
estimate for the error covariance matrix |
table.bca |
a list with, for each response variable, a matrix containing the estimates, standard errors, lower and upper limits of the
BCa confidence intervals, p-values and a significance code (only present when confmethod is "BCA" or "both") |
table.basic |
a list with, for each response variable, a matrix containing the estimates, standard errors, lower and upper limits of the
basic bootstrap confidence intervals, p-values and a significance code (only present when confmethod is "basic" or "both") |
method |
multivariate regression method that was used |
conf |
confidence level that was used |
digits |
number of digits for printing |
Gert Willems, Ella Roelant and Stefan Van Aelst
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBmultiregS
, FRBmultiregMM
, FRBmultiregGS
, print.FRBmultireg
, plot.FRBmultireg
data(schooldata)
MMres <- FRBmultiregMM(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                       R=199, conf=0.99, nsamp=200)
summary(MMres)  # -> print.summary.FRBmultireg() method

GSres <- FRBmultiregGS(cbind(reading,mathematics,selfesteem)~., data=schooldata,
                       bdp=0.25, R=199, nsamp=50)
summary(GSres, confmethod="both")  # -> print.summary.FRBmultireg() method
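The print.CI and sep arguments only affect how the summary is displayed; for instance, continuing the example above:

## also print the bootstrap confidence intervals, separating columns by a space
summary(MMres, print.CI=TRUE, sep=" ")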
Summary method for objects of class FRBpca
, and print method of the summary object.
## S3 method for class 'FRBpca'
summary(object, confmethod = c("BCA", "basic", "both"), digits = 3, ...)

## S3 method for class 'summary.FRBpca'
print(x, ...)
object |
an R object of class FRBpca. |
confmethod |
which kind of bootstrap confidence intervals to be displayed: 'BCA'= bias corrected and accelerated method, 'basic'= basic bootstrap method, 'both'= both kinds of confidence intervals |
digits |
number of digits for printing (default is 3) |
x |
an R object of class summary.FRBpca. |
... |
potentially more arguments to be passed to methods |
The print
method displays mostly the components of the summary
object as listed in the Value section.
summary
returns an object of class summary.FRBpca
, which contains the following components:
eigvals |
eigenvalues of the shape estimate (variances of the principal components) with confidence limits |
eigvecs |
eigenvectors of the shape estimate (loadings of the principal components) |
avgangle |
bootstrap estimates of average angles between true and estimated eigenvectors |
pvars |
cumulative percentages of variance explained by the first k principal components (k = 1, 2, ...), with confidence limits |
method |
PCA method that was used |
digits |
number of digits for printing |
Gert Willems, Ella Roelant and Stefan Van Aelst
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. doi:10.18637/jss.v053.i03.
FRBpcaS
, FRBpcaMM
, print.FRBpca
, plot.FRBpca
data(ForgedBankNotes)
MMpcares <- FRBpcaMM(ForgedBankNotes, R=999, conf=0.95)
summary(MMpcares)  # -> print.summary.FRBpca() method
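The components listed under Value can also be extracted directly from the summary object; a small sketch continuing the example:

sumres <- summary(MMpcares)
sumres$eigvals   # eigenvalues (PC variances) with confidence limits
sumres$pvars     # cumulative percentages of explained variance with confidence limits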