Title: | Rank-Based Estimation for Linear Models |
---|---|
Description: | Rank-based (R) estimation and inference for linear models. Estimation is for general scores and a library of commonly used score functions is included. |
Authors: | John Kloke, Joseph McKean |
Maintainer: | John Kloke <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.27.0 |
Built: | 2024-10-31 22:25:32 UTC |
Source: | CRAN |
Package provides functions for rank-based analyses of linear models. Rank-based estimation and inference offers a robust alternative to least squares.
Package: | Rfit |
Type: | Package |
Version: | 0.27.0 |
Date: | 2024-05-25 |
License: | GPL (version 2 or later) |
LazyLoad: | yes |
John Kloke, Joesph McKean
Maintainer: John Kloke <[email protected]>
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of residuals. Annal s of Mathematical Statistics, 43, 1449 - 1458.
Jureckova, J. (1971). Nonparametric estimate of regression coefficients. Annals of Mathematical Statistics , 42, 1328 - 1338.
data(baseball) data(wscores) fit<-rfit(weight~height,data=baseball) summary(fit) plot(fitted(fit),rstudent(fit)) ### Example of the Reduction (Drop) in dispersion test ### y<-rnorm(47) x1<-rnorm(47) x2<-rnorm(47) fitF<-rfit(y~x1+x2) fitR<-rfit(y~x1) drop.test(fitF,fitR)
data(baseball) data(wscores) fit<-rfit(weight~height,data=baseball) summary(fit) plot(fitted(fit),rstudent(fit)) ### Example of the Reduction (Drop) in dispersion test ### y<-rnorm(47) x1<-rnorm(47) x2<-rnorm(47) fitF<-rfit(y~x1+x2) fitR<-rfit(y~x1) drop.test(fitF,fitR)
An object of class scores which includes the score function and it's derivative for rank-based regression inference.
data(wscores)
data(wscores)
The format is: Formal class 'scores' [package ".GlobalEnv"] with 2 slots ..@ phi :function (u) ..@ Dphi:function (u)
Using Wilcoxon (linear) scores leads to inference which has ARE of 0.955 to least squares (ML) when the data are normal. Wilcoxon scores are optimal when the underlying error distribution is logistic. Normal scores are optimal when the data are normally distributed. Log-rank scores are optimal when the data are from an exponential distribution, e.g. in a proportional hazards model. Log-Generalized F scores can also be used in the analysis of survival data (see Hettmansperger and McKean p. 233).
bentscores1 are recommended for right-skewed distributions. bentscores2 are recommended for light-tailed distributions. bentscores3 are recommended for left-skewed distributions. bentscores4 are recommended for heavy-tailed distributions.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
u <- seq(0.01,0.99,by=0.01) plot(u,getScores(wscores,u),type='l',main='Wilcoxon Scores') plot(u,getScores(nscores,u),type='l',main='Normal Scores') data(wscores) x<-runif(50) y<-rlogis(50) rfit(y~x,scores=wscores) x<-rnorm(50) y<-rnorm(50) rfit(y~x,scores=nscores)
u <- seq(0.01,0.99,by=0.01) plot(u,getScores(wscores,u),type='l',main='Wilcoxon Scores') plot(u,getScores(nscores,u),type='l',main='Normal Scores') data(wscores) x<-runif(50) y<-rlogis(50) rfit(y~x,scores=wscores) x<-rnorm(50) y<-rnorm(50) rfit(y~x,scores=nscores)
These data come from the back-side of 59 baseball cards that Carrie had.
data(baseball)
data(baseball)
A data frame with 59 observations on the following 6 variables.
height
Height in inches
weight
Weight in pounds
bat
a factor with levels L
R
S
throw
a factor with levels L
R
field
a factor with levels 0
1
average
ERA if the player is a pitcher and his batting average if the player is a fielder
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(baseball) wilcox.test(height~field,data=baseball) rfit(weight~height,data=baseball)
data(baseball) wilcox.test(height~field,data=baseball) rfit(weight~height,data=baseball)
Salaries of 176 professional baseball players for the 1987 season.
data(bbsalaries)
data(bbsalaries)
A data frame with 176 observations on the following 8 variables.
logYears
Log of the number of years experience
aveWins
Average wins per year
aveLosses
Average losses per year
era
Earned Run Average
aveGames
Average games pitched in per year
aveInnings
Average number of innings pitched per year
aveSaves
Average number of saves per year
logSalary
Log of the base salary in dollars
http://lib.stat.cmu.edu/datasets/baseball.data
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(bbsalaries) summary(rfit(logSalary~logYears+aveWins+aveLosses+era+aveGames+aveInnings+aveSaves,data=bbsalaries))
data(bbsalaries) summary(rfit(logSalary~logYears+aveWins+aveLosses+era+aveGames+aveInnings+aveSaves,data=bbsalaries))
The data are the results of a 3 * 4 two-way design, where forty-eight animals were exposed to three different poisons and four different treatments. The design is balanced with four replications per cell. The response was the log survival time of the animal.
data(BoxCox)
data(BoxCox)
A data frame with 48 observations on the following 3 variables.
logSurv
log Survival Time
Poison
a factor indicating poison level
Treatment
a factor indicating treatment level
Box, G.E.P. and Cox, D.R. (1964), An analysis of transformations, Journal of the Royal Statistical Society, Series B, Methodological, 26, 211-252.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(BoxCox) with(BoxCox,interaction.plot(Treatment,Poison,logSurv,median)) raov(logSurv~Poison+Treatment,data=BoxCox)
data(BoxCox) with(BoxCox,interaction.plot(Treatment,Poison,logSurv,median)) raov(logSurv~Poison+Treatment,data=BoxCox)
Data from a study to investigate assocation between uric acid and various cardiovascular risk factors in developing countries (Heritier et. al. 2009). There are 474 men and 524 women aged 25-64.
data(CardioRiskFactors)
data(CardioRiskFactors)
A data frame with 998 observations on the following 14 variables.
age
Age of subject
bmi
Body Mass Index
waisthip
waist/hip ratio(?)
smok
indicator for regular smoker
choles
total cholesterol
trig
triglycerides level in body fat
hdl
high-density lipoprotien(?)
ldl
low-density lipoprotein
sys
systolic blood pressure
dia
diastolic blood pressure(?)
Uric
serum uric
sex
indicator for male
alco
alcohol intake (mL/day)
apoa
apoprotein A
Data set and description taken from Heritier et. al. (2009) (c.f. Conen et. al. 2004).
Heritier, S., Cantoni, E., Copt, S., and Victoria-Feser, M. (2009), Robust Methods in Biostatistics, New York: John Wiley and Sons.
Conen, D., Wietlisbach, V., Bovet, P., Shamlaye, C., Riesen, W., Paccaud, F., and Burnier, M. (2004), Prevalence of hyperuricemia and relation of serum uric acid with cardiovascular risk factors in a developing country. BMC Public Health.
data(CardioRiskFactors) fitF<-rfit(Uric~bmi+sys+choles+ldl+sex+smok+alco+apoa+trig+age,data=CardioRiskFactors) fitR<-rfit(Uric~bmi+sys+choles+ldl+sex,data=CardioRiskFactors) drop.test(fitF,fitR) summary(fitR)
data(CardioRiskFactors) fitF<-rfit(Uric~bmi+sys+choles+ldl+sex+smok+alco+apoa+trig+age,data=CardioRiskFactors) fitR<-rfit(Uric~bmi+sys+choles+ldl+sex,data=CardioRiskFactors) drop.test(fitF,fitR) summary(fitR)
Returns the critical value to be used in calculating adjusted confidence intervals. Currently provides methods for Boneferroni and Tukey for confidence interval adjustment methods as well as no adjustment.
confintadjust(n, k, alpha = 0.05, method = confintadjust.methods, ...)
confintadjust(n, k, alpha = 0.05, method = confintadjust.methods, ...)
n |
sample size |
k |
number of comparisons |
alpha |
overall (experimentwise) type I error rate |
method |
one of confintadjust.methods |
... |
Additonal arguments. Currently not used. |
Returns critial value based on one of the adjustment methods.
cv |
critical value |
method |
the method used |
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Returns the value of Jaeckel's dispersion function for given values of the regression coefficents.
disp(beta, x, y, scores)
disp(beta, x, y, scores)
beta |
p by 1 vector of regression coefficents |
x |
n by p design matrix |
y |
n by 1 response vector |
scores |
an object of class scores |
Returns the value of Jaeckel's disperion function evaluated
at the value of the parameters in the function call.
That is,
where
R denotes rank
and a(1) <= a(2) <= ... <= a(n) are the scores.
The residuals (e_i i=1,...n) are calculated y - x beta.
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of residuals. Annals of Mathematical Statistics, 43, 1449 - 1458.
Given two full model fits, this function performs a reduction in dispersion test.
drop.test(fitF, fitR = NULL)
drop.test(fitF, fitR = NULL)
fitF |
An object of class rfit. The full model fit. |
fitR |
An object of class rfit. The reduced model fit. |
Rank-based inference procedure analogous to the traditional (LS) reduced model test.
The full and reduced model dispersions are calculated. The reduction in dispersion test, or drop test for short, has an asymptotic chi-sq distribution. Simulation studies suggest using F critical values. The p-value returned is based on a F-distribution with df1 and df2 degrees of freedom where df1 is the difference in the number of parameters in the fits of fitF and fitR and df2 is the residual degrees of freedom in the fit fitF.
Both fits are based on a minimization routine. It is possible that resulting solutions are such that the fitF$disp > fitRdisp. We recommend starting the full model at the reduced model fit as a way to avoid this situation. See examples.
Checks to see if models appear to be proper subsets. The space spanned by the columns of the reduced model design matrix should be a subset of the space spanned by the columns of the full model design matrix.
F |
Value of the F test statistic |
p.value |
The observed significance level of the test (using an F quantile) |
RD |
Reduced model dispersion minus Full model dispersion |
tauhat |
Estimate of the scale parameter (using the full model residuals) |
df1 |
numerator degrees of freedom |
df2 |
denominator degrees of freedom |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
y<-rnorm(47) x1<-rnorm(47) x2<-rnorm(47) fitF<-rfit(y~x1+x2) fitR<-rfit(y~x1) drop.test(fitF,fitR) ## try starting the full model at the reduced model fit ## fitF<-rfit(y~x1+x2,yhat0=fitR$fitted) drop.test(fitF,fitR)
y<-rnorm(47) x1<-rnorm(47) x2<-rnorm(47) fitF<-rfit(y~x1+x2) fitR<-rfit(y~x1) drop.test(fitF,fitR) ## try starting the full model at the reduced model fit ## fitF<-rfit(y~x1+x2,yhat0=fitR$fitted) drop.test(fitF,fitR)
The response variable is level of free fatty acid in a sample of prepubescent boys. The explanatory variables are age (in months), weight (in lbs), and skin fold thickness.
data(ffa)
data(ffa)
A data frame with 41 rows and 4 columns.
age
age in years
weight
weight in lbs
skin
skin fold thinkness
ffa
free fatty acid
Morrison, D.F. (1983), Applied Linear Statistical Models, Englewood Cliffs, NJ:Prentice Hall.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(ffa) summary(rfit(ffa~age+weight+skin,data=ffa)) #using the default (Wilcoxon scores) summary(rfit(ffa~age+weight+skin,data=ffa,scores=bentscores1))
data(ffa) summary(rfit(ffa~age+weight+skin,data=ffa)) #using the default (Wilcoxon scores) summary(rfit(ffa~age+weight+skin,data=ffa,scores=bentscores1))
~~ Methods for function getScores
~~
Calculates the centered and scaled scores as used in rank-based analysis.
signature(object = "scores")
~~ Methods for function getScoresDeriv
~~
This derivative is used in the estimate of the scale parameter tau.
signature(object = "scores")
An estimate of the scale parameter tau may be used for the standard errors of the coefficients in rank-based regression.
gettau(ehat, p, scores = Rfit::wscores, delta = 0.8, hparm = 2, ...) gettauF0(ehat, p, scores = Rfit::wscores, delta = 0.8, hparm = 2, ...)
gettau(ehat, p, scores = Rfit::wscores, delta = 0.8, hparm = 2, ...) gettauF0(ehat, p, scores = Rfit::wscores, delta = 0.8, hparm = 2, ...)
ehat |
vector of length n: full model residuals |
p |
scalar: number of regression coefficients (excluding the intercept); see Details |
scores |
object of class scores, defaults to Wilcoxon scores |
delta |
confidence level; see Details |
hparm |
used in Huber's degrees of freedom correction; see Details |
... |
additional arguments. currently unused |
For rank-based analyses of linear models, the estimator of the scale parameter
plays a standardizing role in the standard errors (SE) of the rank-based estimators of the regression coefficients and in the denominator of Wald-type and the drop-in-dispersion test statistics of linear hypotheses.
rfit
currently implements the KSM (Koul, Sievers, and McKean 1987) estimator of tau.
The functions gettau
and gettauF0
are both available to compute the KSM estimate and may be call from rfit
and used for inference. The default is to use the faster FORTRAN version gettauF0
via the to option TAU='F0'
.
The R version, gettau
, may be much slower especially when sample sizes are large; this version may be called from rfit
using the option TAU='R'
.
The KSM estimator tauhat is a density type estimator that has the bandwidth given by ,
where
is the
quantile of the cdf
given in expression (3.7.2) of Hettmansperger and McKean (2011), with the corresponding estimator
, given in expression (3.7.7) of Hettmansperger and McKean (2011).
Based on simulation studies, most situations where (n/p >= 6), the default delta = 0.80 provides a valid rank-based analysis (McKean and Sheather, 1991). For situations with n/p < 6, caution is needed as the KSM estimate is sensitive to choice of bandwidth. McKean and Sheather (1991) recommend using a value of 0.95 for delta in such situations.
To correct for heavy-tailed random errors, Huber (1973) proposed a degree of freedom correction for the M-estimate scale parameter. The correction is given by where
is the proportion of standardized residuals in absolute value less than the parameter
hparm
. This correction is used as a multiplicative factor to tauhat. The default value of hparm is set at 2.
The usual degrees of freedom correction, , is also used as a multiplicative factor to tauhat.
Length one numeric object.
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Huber, P.J. (1973), Robust regression: Asymptotics, conjectures and Monte Carlo, Annals of Statistics, 1, 799–821.
Koul, H.L., Sievers, G.L., and McKean, J.W. (1987), An estimator of the scale parameter for the rank analysis of linear models under general score functions, Scandinavian Journal of Statistics, 14, 131–141.
McKean, J. W. and Sheather, S. J. (1991), Small Sample Properties of Robust Analyses of Linear Models Based on R-Estimates: A Survey, in Directions in Robust Statistics and Diagnostics, Part II, Editors: W.\ Stahel and S.\ Weisberg, Springer-Verlag: New York, 1–19.
# For a standard normal distribution the parameter tau has the value 1.023327 (sqrt(pi/3)). set.seed(283643659) n <- 12; p <- 6; y <- rnorm(n); x <- matrix(rnorm(n*p),ncol=p) tau1 <- rfit(y~x)$tauhat; tau2 <- rfit(y~x,delta=0.95)$tauhat c(tau1,tau2) # 0.5516708 1.0138415 n <- 120; p <- 6; y <- rnorm(n); x <- matrix(rnorm(n*p),ncol=p) tau3 <- rfit(y~x)$tauhat; tau4 <- rfit(y~x,delta=0.95)$tauhat c(tau3,tau4) # 1.053974 1.041783
# For a standard normal distribution the parameter tau has the value 1.023327 (sqrt(pi/3)). set.seed(283643659) n <- 12; p <- 6; y <- rnorm(n); x <- matrix(rnorm(n*p),ncol=p) tau1 <- rfit(y~x)$tauhat; tau2 <- rfit(y~x,delta=0.95)$tauhat c(tau1,tau2) # 0.5516708 1.0138415 n <- 120; p <- 6; y <- rnorm(n); x <- matrix(rnorm(n*p),ncol=p) tau3 <- rfit(y~x)$tauhat; tau4 <- rfit(y~x,delta=0.95)$tauhat c(tau3,tau4) # 1.053974 1.041783
Calculate the Gradiant of Jaeckel's Dispersion Function
grad(x, y, beta, scores)
grad(x, y, beta, scores)
x |
n by p design matrix |
y |
n by 1 response vector |
beta |
p by 1 vector of regression coefficients |
scores |
an object of class scores |
The gradiant evaluated at beta.
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of residuals. Annals of Mathematical Statistics, 43, 1449 - 1458.
Jureckova, J. (1971). Nonparametric estimate of regression coefficients. Annals of Mathematical Statistics, 42, 1328 - 1338.
## The function is currently defined as function (x, y, beta, scores) { x <- as.matrix(x) e <- y - x %*% beta r <- rank(e, ties.method = "first")/(length(e) + 1) -t(x) %*% scores@phi(r) }
## The function is currently defined as function (x, y, beta, scores) { x <- as.matrix(x) e <- y - x %*% beta r <- rank(e, ties.method = "first")/(length(e) + 1) -t(x) %*% scores@phi(r) }
Uses the built-in function optim
to minimize Jaeckel's dispersion function with respect to beta.
jaeckel(x, y, beta0 = lm(y ~ x)$coef[2:(ncol(x) + 1)], scores = Rfit::wscores, control = NULL,...)
jaeckel(x, y, beta0 = lm(y ~ x)$coef[2:(ncol(x) + 1)], scores = Rfit::wscores, control = NULL,...)
x |
n by p design matrix |
y |
n by 1 response vector |
beta0 |
initial estimate of beta |
scores |
object of class 'scores' |
control |
control passed to fitting routine |
... |
addtional arguments to be passed to fitting routine |
Jaeckel's dispersion function (Jaeckel 1972) is a convex function which measures the distance between the observed responses and the fitted values
.
The dispersion function is a sum of the products of the residuals,
, and the scored ranks of the residuals. A rank-based fit minimizes the dispersion function; see McKean and Schrader (1980) and Kloke and McKean (2012) for discussion.
jaeckel
uses optim
with the method set to BFGS to minimize Jaeckel's dispersion function.
If control is not specified at the function call, the relative tolerance (reltol) is set to .Machine$double.eps^(3/4)
and maximum number of iterations is set to 200.
jaeckel
is intended to be an internal function. See rfit
for a general purpose function.
Results of optim
are returned.
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Jaeckel, L. A. (1972), Estimating regression coefficients by minimizing the dispersion of residuals. Annals of Mathematical Statistics, 43, 1449 - 1458.
Kapenga, J. A., McKean, J. W., and Vidmar, T. J. (1988), RGLM: Users Manual, Statist. Assoc. Short Course on Robust Statistical Procedures for the Analysis of Linear and Nonlinear Models, New Orleans.
## This is a internal function. See rfit for user-level examples.
## This is a internal function. See rfit for user-level examples.
These are internal functions used to construct the robust anova table.
The function raov
is the main program.
kwayr(levs, data,...) cellx(X) khmat(levsind,permh) pasteColsRfit(x,sep="") redmod(xmat,amat) subsets(k)
kwayr(levs, data,...) cellx(X) khmat(levsind,permh) pasteColsRfit(x,sep="") redmod(xmat,amat) subsets(k)
levs |
vector of levels corresponding to each of the factors |
data |
data matrix in the form y, factor 1,..., factor k |
X |
n x k matrix where the columns represent the levels of the k factors. |
levsind |
Internal parameter. |
permh |
Internal parameter. |
x |
n x k matrix where the columns represent the levels of the k factors. |
xmat |
n x p full model design matrix |
amat |
Internal parameter. |
k |
Internal parameter. |
sep |
Seperator used in pasteColsRfit |
... |
additional arguments |
Renamed pasteCols of library plotrix written by Jim Lemon et. al. June 2011 under GPL 2
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hocking, R. R. (1985), The Analysis of Linear Models, Monterey, California: Brooks/Cole.
Carries out a robust analysis of variance for a one factor design. Analysis is based on the R estimates.
oneway.rfit(y, g, scores = Rfit::wscores, p.adjust = "none",...)
oneway.rfit(y, g, scores = Rfit::wscores, p.adjust = "none",...)
y |
n by 1 response vector |
g |
n by 1 vector representing group membership |
scores |
an object of class 'scores' |
p.adjust |
adjustment to the p-values, argument passed to p.adjust |
... |
additional arguments |
Carries out a robust one-way analysis of variance based on full model r fit.
fit |
full model fit from rfit |
est |
Estimates |
se |
Standard Errors |
I |
First Index |
J |
Second Index |
p.value |
p-values |
y |
response vector |
g |
vector denoting group membership |
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(quail) oneway.rfit(quail$ldl,quail$treat)
data(quail) oneway.rfit(quail$ldl,quail$treat)
Internal class for use with score functions.
A virtual Class: No objects may be created from it.
No methods defined with class "param" in the signature.
John Kloke
showClass("param")
showClass("param")
These functions print the output in a user-friendly manner using the internal R function print
.
## S3 method for class 'rfit' print(x, ...) ## S3 method for class 'summary.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'drop.test' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'oneway.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'summary.oneway.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'raov' print(x, digits = max(5, .Options$digits - 2), ...)
## S3 method for class 'rfit' print(x, ...) ## S3 method for class 'summary.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'drop.test' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'oneway.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'summary.oneway.rfit' print(x, digits = max(5, .Options$digits - 2), ...) ## S3 method for class 'raov' print(x, digits = max(5, .Options$digits - 2), ...)
x |
An object to be printed |
digits |
number of digits to display |
... |
additional arguments to be passed to |
John Kloke
Thirty-nine quail were randomized to one of for treatments for lowering cholesterol.
data(quail)
data(quail)
A data frame with 39 observations on the following 2 variables.
treat
a factor with levels 1
2
3
4
ldl
a numeric vector
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(quail) boxplot(ldl~treat,data=quail)
data(quail) boxplot(ldl~treat,data=quail)
Returns full model fit and robust ANOVA table for all main effects and interactions.
raov(f, data = list(), ...)
raov(f, data = list(), ...)
f |
an object of class formula |
data |
an optional data frame |
... |
additional arguments |
Based on reduction in dispersion tests for testing main effects and interaction. Uses an algorithm described in Hocking (1985).
table |
Description of 'comp1' |
fit |
full model fit returned from rfit |
residuals |
the residuals, i.e. y-yhat |
fitted.values |
yhat = x betahat |
call |
Call to the function |
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hocking, R. R. (1985), The Analysis of Linear Models, Monterey, California: Brooks/Cole.
raov(logSurv~Poison+Treatment,data=BoxCox)
raov(logSurv~Poison+Treatment,data=BoxCox)
Minimizes Jaeckel's dispersion function to obtain a rank-based solution for linear models.
rfit(formula, data = list(), ...) ## Default S3 method: rfit(formula, data, subset, yhat0 = NULL, scores = Rfit::wscores, symmetric = FALSE, TAU = "F0", betahat0 = NULL, ...)
rfit(formula, data = list(), ...) ## Default S3 method: rfit(formula, data, subset, yhat0 = NULL, scores = Rfit::wscores, symmetric = FALSE, TAU = "F0", betahat0 = NULL, ...)
formula |
an object of class formula |
data |
an optional data frame |
subset |
an optional argument specifying the subset of observations to be used |
yhat0 |
an n by 1 vector of initial fitted values, default is NULL |
scores |
an object of class 'scores' |
symmetric |
logical. If 'FALSE' uses median of residuals as estimate of intercept |
TAU |
version of estimation routine for scale parameter. F0 for Fortran, R for (slower) R, N for none |
betahat0 |
a p by 1 vector of initial parameter estimates, default is NULL |
... |
additional arguments to be passed to fitting routines |
Rank-based estimation involves replacing the L2 norm of least squares estimation with a pseudo-norm which is a function of the residuals and the scored ranks of the residuals. That is, in rank-based estimation, the usual notion of Euclidean distance is replaced with another measure of distance which is referred to as Jaeckel's (1972) dispersion function. Jaeckel's dispersion function depends on a score function and a library of commonly used score functions is included; eg., linear (Wilcoxon) and normal (Gaussian) scores. If an inital fit is not supplied (i.e. yhat0 = NULL and betahat0 = NULL) then inital fit is based on a LS fit.
Esimation of scale parameter tau is provided which may be used for inference.
coefficients |
estimated regression coefficents with intercept |
residuals |
the residuals, i.e. y-yhat |
fitted.values |
yhat = x betahat |
xc |
centered design matrix |
tauhat |
estimated value of the scale parameter tau |
taushat |
estimated value of the scale parameter tau_s |
betahat |
estimated regression coefficents |
call |
Call to the function |
John Kloke, Joesph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of residuals. Annals of Mathematical Statistics, 43, 1449 - 1458.
Jureckova, J. (1971). Nonparametric estimate of regression coefficients. Annals of Mathematical Statistics, 42, 1328 - 1338.
summary.rfit
drop.test
rstudent.rfit
data(baseball) data(wscores) fit<-rfit(weight~height,data=baseball) summary(fit) ### set the starting value x1 <- runif(47); x2 <- runif(47); y <- 1 + 0.5*x1 + rnorm(47) # based on a fit to a sub-model rfit(y~x1+x2,yhat0=fitted.values(rfit(y~x1))) ### set value of delta used in estimation of tau ### w <- factor(rep(1:3,each=3)) y <- rt(9,9) rfit(y~w)$tauhat rfit(y~w,delta=0.95)$tauhat # recommended when n/p < 5
data(baseball) data(wscores) fit<-rfit(weight~height,data=baseball) summary(fit) ### set the starting value x1 <- runif(47); x2 <- runif(47); y <- 1 + 0.5*x1 + rnorm(47) # based on a fit to a sub-model rfit(y~x1+x2,yhat0=fitted.values(rfit(y~x1))) ### set value of delta used in estimation of tau ### w <- factor(rep(1:3,each=3)) y <- rt(9,9) rfit(y~w)$tauhat rfit(y~w,delta=0.95)$tauhat # recommended when n/p < 5
Returns the Studentized residuals based on rank-based estimation.
## S3 method for class 'rfit' rstudent(model,...)
## S3 method for class 'rfit' rstudent(model,...)
model |
an object of class rfit |
... |
additional arguments. currently not used. |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
x<-runif(47) y<-rcauchy(47) qqnorm(rstudent(fit<-rfit(y~x))) plot(x,rstudent(fit)) ; abline(h=c(-2,2))
x<-runif(47) y<-rcauchy(47) qqnorm(rstudent(fit<-rfit(y~x))) plot(x,rstudent(fit)) ; abline(h=c(-2,2))
A score function and it's corresponding derivative is required for rank-based estimation. This object puts them together.
Objects can be created by calls of the form new("scores", ...)
.
phi
:Object of class "function"
the score function
Dphi
:Object of class "function"
the first derivative of the score function
param
:Object of class "param"
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
showClass("scores")
showClass("scores")
Hollander and Wolfe (1999) discuss a 2 by 5 factorial design for a study to determine the effect of light on the release of luteinizing hormone (LH). The factors in the design are: light regimes at two levels (constant light and 14 hours of light followed by 10 hours of darkness) and a luteinizing release factor (LRF) at 5 different dosage levels. The response is the level of luteinizing hormone (LH), nanograms per ml of serum in blood samples. Sixty rats were put on test under these 10 treatment combinations, six rats per combination.
data(serumLH)
data(serumLH)
A data frame with 60 observations on the following 3 variables.
serum
a numeric vector
light.regime
a factor with levels Constant
Intermittent
LRF.dose
a factor with levels 0
10
1250
250
50
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
data(serumLH) raov(serum~light.regime + LRF.dose + light.regime*LRF.dose, data = serumLH)
data(serumLH) raov(serum~light.regime + LRF.dose + light.regime*LRF.dose, data = serumLH)
Returns the signed-rank estimate of intercept with is equivalent to the Hodges-Lehmann estimate of the residuals.
signedrank(x)
signedrank(x)
x |
numeric vector |
Returns the median of the Walsh averages.
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
## The function is currently defined as function (x) median(walsh(x))
## The function is currently defined as function (x) median(walsh(x))
Provides a summary for the oneway anova based on an R fit including a test for main effects as tests for pairwise comparisons.
## S3 method for class 'oneway.rfit' summary(object, alpha=0.05,method=confintadjust.methods,...)
## S3 method for class 'oneway.rfit' summary(object, alpha=0.05,method=confintadjust.methods,...)
object |
an object of class 'oneway.rfit', usually, a result of a call to 'oneway.rfit' |
alpha |
Experimentwise Error Rate |
method |
method used in confidence interval adjustment |
... |
additional arguments |
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(quail) oneway.rfit(quail$ldl,quail$treat)
data(quail) oneway.rfit(quail$ldl,quail$treat)
Provides a summary similar to the traditional least squares fit.
## S3 method for class 'rfit' summary(object,overall.test,...)
## S3 method for class 'rfit' summary(object,overall.test,...)
object |
an object of class 'rfit', usually, a result of a call to 'rfit' |
overall.test |
either 'wald' or 'drop' |
... |
additional arguments |
Provides summary statistics based on a rank-based fit. A table of estimates, standard errors, t-ratios, and p-values are provided. An overall test of the explantory variables is provided; the default is to use a Wald test. A drop in dispersion test is also availble in which case a robust R^2 is provided as well.
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(baseball) fit<-rfit(weight~height,data=baseball) summary(fit) summary(fit,overall.test='drop')
data(baseball) fit<-rfit(weight~height,data=baseball) summary(fit) summary(fit,overall.test='drop')
These are internal functions used for calculating the scale parameter tau necessary for estimating the standard errors of coefficients for rank-regression.
hstarreadyscr(ehat,asc,ascpr) hstar(abdord, wtord, const, n, y) looptau(delta, abdord, wtord, const, n) pairup(x,type="less")
hstarreadyscr(ehat,asc,ascpr) hstar(abdord, wtord, const, n, y) looptau(delta, abdord, wtord, const, n) pairup(x,type="less")
ehat |
Full model residals |
delta |
Window parameter (proportion) used in the Koul et al. estimator of tau. Default value is 0.80. If the ratio of sample size to number of regression parameters (n to p) is less than 5, larger values such as 0.90 to 0.95 are more approporiate. |
y |
Argument of function hstar |
abdord |
Ordered absolute differences of residuals |
wtord |
Standardized (by const) ordered absolute differences of residuals |
const |
Range of score function |
n |
Sample size |
x |
Argument for pairup |
type |
Argument for the function pairup |
asc |
scores |
ascpr |
derivative of the scores |
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Koul, H.L., Sievers, G.L., and McKean, J.W. (1987) An esimator of the scale parameter for the rank analysis of linear models under general score functions, Scandinavian Journal of Statistics, 14, 131-141.
An estimate of the scale parameter taustar = 1/(2*f(0)) is needed for the standard error of the intercept in rank-based regression.
taustar(e, p, conf = 0.95)
taustar(e, p, conf = 0.95)
e |
n x 1 vector of full model residuals |
p |
is the number of regression coefficients (without the intercept) |
conf |
confidence level of CI used |
Confidence interval estimate of taustar. See, for example, Hettmansperger and McKean (1998) p.7-8 and p.25-26.
Length-one numeric object containing the estimated scale parameter taustar.
Joseph McKean, John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
## This is an internal function. See rfit for user-level examples.
## This is an internal function. See rfit for user-level examples.
The number of telephone calls (in tens of millions) made in Belgium from 1950-1973.
data(telephone)
data(telephone)
A data frame with 24 observations on the following 2 variables.
year
years since 1950 AD
calls
number of telephone calls in tens of millions
Rousseeuw, P.J. and Leroy, A.M. (1987), Robust Regression and Outlier Detection, New York: Wiley.
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
data(telephone) plot(telephone) abline(rfit(calls~year,data=telephone))
data(telephone) plot(telephone) abline(rfit(calls~year,data=telephone))
Returns the variance-covariance matrix of the regression estimates from an object of type rfit.
## S3 method for class 'rfit' vcov(object, intercept = NULL,...)
## S3 method for class 'rfit' vcov(object, intercept = NULL,...)
object |
an object of type rfit |
intercept |
logical. If TRUE include the variance-covariance estimates corresponding to the intercept |
... |
additional arguments |
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Conducts a Wald test of all regression parameters are zero
wald.test.overall(fit)
wald.test.overall(fit)
fit |
result from a rfit |
John Kloke
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
x <- rnorm(47) y <- rnorm(47) wald.test.overall(rfit(y~x))
x <- rnorm(47) y <- rnorm(47) wald.test.overall(rfit(y~x))
Given a list of n numbers, the Walsh averages are the pairwise averages.
walsh(x)
walsh(x)
x |
A numeric vector |
The Walsh averages.
John Kloke, Joseph McKean
Hettmansperger, T.P. and McKean J.W. (2011), Robust Nonparametric Statistical Methods, 2nd ed., New York: Chapman-Hall.
Hollander, M. and Wolfe, D.A. (1999), Nonparametric Statistical Methods, New York: Wiley.
median(walsh(rnorm(100))) # Hodges-Lehmann estimate of location ## The function is currently defined as function (x) { n <- length(x) w <- vector(n * (n + 1)/2, mode = "numeric") ind <- 0 for (i in 1:n) { for (j in i:n) { ind <- ind + 1 w[ind] <- 0.5 * (x[i] + x[j]) } } return(w) }
median(walsh(rnorm(100))) # Hodges-Lehmann estimate of location ## The function is currently defined as function (x) { n <- length(x) w <- vector(n * (n + 1)/2, mode = "numeric") ind <- 0 for (i in 1:n) { for (j in i:n) { ind <- ind + 1 w[ind] <- 0.5 * (x[i] + x[j]) } } return(w) }