Title: | A Double Bootstrap Method for Analyzing Linear Models with Autoregressive Errors |
---|---|
Description: | Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method provides a better fit for a linear model with autoregressive errors than ARIMA when the sample size is small. |
Authors: | Joseph W. McKean and Shaofeng Zhang |
Maintainer: | Shaofeng Zhang <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0 |
Built: | 2024-12-11 07:16:21 UTC |
Source: | CRAN |
Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method provides a better fit for a linear model with autoregressive errors than ARIMA when the sample size is small.
The DESCRIPTION file:
Package: | DBfit |
Type: | Package |
Title: | A Double Bootstrap Method for Analyzing Linear Models with Autoregressive Errors |
Version: | 2.0 |
Date: | 2021-04-30 |
Author: | Joseph W. McKean and Shaofeng Zhang |
Maintainer: | Shaofeng Zhang <[email protected]> |
Description: | Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000) <doi:10.1037/1082-989X.5.1.87>. The double bootstrap method provides a better fit for a linear model with autoregressive errors than ARIMA when the sample size is small. |
License: | GPL (>= 2) |
Depends: | Rfit |
NeedsCompilation: | no |
Packaged: | 2021-04-30 20:11:09 UTC; zsf |
Repository: | CRAN |
Date/Publication: | 2021-04-30 20:30:02 UTC |
Index of help topics:
DBfit-package | A Double Bootstrap Method for Analyzing Linear Models With Autoregressive Errors |
boot1 | First Bootstrap Procedure for Parameter Estimation |
boot2 | Second Bootstrap Procedure for Inference on the Regression Coefficients |
dbfit | The main function for the double bootstrap method |
durbin1fit | Durbin stage 1 fit |
durbin1xy | Creating New X and Y for Durbin Stage 1 |
durbin2fit | Durbin stage 2 fit |
fullr | QR decomposition for non-full rank design matrix for Rfit |
hmdesign2 | The Two-Phase Design Matrix |
hmmat | K-Phase Design Matrix |
hypothmat | General Linear Tests of the regression coefficients |
lagx | Lag Functions |
nurho | Creating a new response variable for Durbin stage 2 |
print.dbfit | DBfit Internal Print Functions |
rhoci2 | A Fisher type CI of the autoregressive parameter rho |
simpgen1hm2 | Simulation Data Generating Function |
simula | Work Horse Function to Implement the Double Bootstrap Method |
simulacorrection | Work Horse Function to Implement the Double Bootstrap Method For .99 Cases |
summary.dbfit | Summarize the double bootstrap (DB) fit |
testdata | testdata |
wrho | Creating a new design matrix for Durbin stage 2 |
Joseph W. McKean and Shaofeng Zhang
Maintainer: Shaofeng Zhang <[email protected]>
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
Function performing the first bootstrap procedure to yield the parameter estimates
boot1(y, phi1, arp, nbs, x, allb, method, scores)
y |
the response variable |
phi1 |
the Durbin two-stage estimate of the autoregressive parameter rho |
arp |
the order of autoregressive errors |
nbs |
the bootstrap size |
x |
the original design matrix (including intercept), without centering |
allb |
all the Durbin two-stage estimates of the regression coefficients |
method |
If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit |
scores |
Default is Wilcoxon scores |
An estimate of the bias is returned
This function is for internal use. The main function for users is dbfit.
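To make the idea of a bootstrap bias estimate concrete, here is a minimal base-R sketch for a lag-1 autoregressive coefficient. It is not the code of boot1: the helper names ar1_ols and boot_bias_ar1 are hypothetical, and the resampling scheme (a plain residual bootstrap of the estimated innovations) is an assumption made for illustration.

```r
# Hypothetical helpers for illustration only; not DBfit's boot1().
ar1_ols <- function(e) {
  n <- length(e)
  # Least-squares slope of e[t] on e[t-1], without an intercept
  sum(e[-1] * e[-n]) / sum(e[-n]^2)
}

boot_bias_ar1 <- function(e, nbs = 200) {
  n <- length(e)
  rho_hat <- ar1_ols(e)
  innov <- e[-1] - rho_hat * e[-n]   # estimated white-noise innovations
  innov <- innov - mean(innov)       # center before resampling
  rho_star <- numeric(nbs)
  for (b in seq_len(nbs)) {
    a <- sample(innov, n, replace = TRUE)
    # Rebuild an AR(1) series from the resampled innovations
    e_star <- as.numeric(stats::filter(a, rho_hat, method = "recursive"))
    rho_star[b] <- ar1_ols(e_star)
  }
  mean(rho_star) - rho_hat           # bootstrap estimate of the bias
}

set.seed(1)
e <- as.numeric(arima.sim(list(ar = 0.6), n = 40))
bias <- boot_bias_ar1(e)
rho_adj <- ar1_ols(e) - bias         # bias-adjusted estimate of rho
```

Subtracting the estimated bias from the original estimate gives a bias-adjusted value; in practice users would call dbfit rather than code like this.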
Function performing the second bootstrap procedure to yield the inference of the regression coefficients
boot2(y, xcopy, phi1, beta, nbs, method, scores)
y |
the response variable |
xcopy |
the original design matrix (including intercept), without centering |
phi1 |
the estimate of the autoregressive parameter rho from the first bootstrap procedure |
beta |
the estimates of the regression coefficients from the first bootstrap procedure |
nbs |
the bootstrap size |
method |
If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit |
scores |
Default is Wilcoxon scores |
betacov |
the estimate of var-cov matrix of betas |
allbeta |
the estimates of betas inside of the second bootstrap, not the final estimates of betas. The final estimates of betas are still from |
rhostar |
the estimates of rho inside of the second bootstrap, not the final estimates of rho. The final estimate(s) of rho are still from |
MSEstar |
MSE used inside of the second bootstrap. |
This function is for internal use. The main function for users is dbfit.
This function implements the double bootstrap method. It yields estimates of both the regression coefficients and the autoregressive parameters (rho), together with inference for them.
## Default S3 method:
dbfit(x, y, arp, nbs = 500, nbscov = 500, conf = 0.95, correction = TRUE, method = "OLS", scores, ...)
x |
the design matrix, including intercept, i.e. the first column being ones. |
y |
the response variable. |
arp |
the order of autoregressive errors. |
nbs |
the bootstrap size for the first bootstrap procedure. Default is 500. |
nbscov |
the bootstrap size for the second bootstrap procedure. Default is 500. |
conf |
the confidence level of CI for rho, default is 0.95. |
correction |
logical. If TRUE (the default), applies the correction for cases in which the estimate of rho is 0.99. The correction currently works ONLY for order 1; for order > 1 it is not applied. |
method |
the method to be used for fitting. If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit. |
scores |
Default is Wilcoxon scores |
... |
additional arguments to be passed to fitting routines |
Computes the double bootstrap as discussed in McKnight, McKean, and Huitema (2000). For details, see the references.
coefficients |
the estimates of regression coefficients based on the first bootstrap procedure |
rho1 |
the Durbin two-stage estimate of the autoregressive parameter rho |
adjar |
the DB (final) estimate of the autoregressive parameter rho |
mse |
the mean square error |
rho_CI_1 |
the first type of CI for rho, see the second reference for details. |
rho_CI_2 |
the second type of CI for rho, see the second reference for details. |
rho_CI_3 |
the third type of CI for rho, see the second reference for details. |
betacov |
the estimate of the variance-covariance matrix of betas |
tabbeta |
a table of point estimates, SE's, test statistics and p-values. |
flag99 |
an indicator; if 1, the original fit yielded an estimate of rho of 0.99. When the correction is requested (the default), the correction procedure is applied and the final estimate of rho is corrected. Only valid if order 1 is specified. |
residuals |
the residuals, that is response minus fitted values. |
fitted.values |
the fitted mean values. |
Joseph W. McKean and Shaofeng Zhang
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
dbfit.formula
# make sure the dependent package Rfit is installed
# To save users time, we set both bootstrap sizes to be 100 in this example.
# Defaults are both 500.
# data(testdata)
# This data is generated by a two-phase design, with autoregressive order being one,
# autoregressive coefficient being 0.6 and all regression coefficients being 0.
# Both the first and second phase have 20 observations.
# y <- testdata[,5]
# x <- testdata[,1:4]
# fit1 <- dbfit(x,y,1, nbs = 100, nbscov = 100) # OLS fit, default
# summary(fit1)
# Note that the CI's of autoregressive coef are not shown in the summary.
# Instead, they are attributes of model fit.
# fit1$rho_CI_1
# fit2 <- dbfit(x,y,1, nbs = 100, nbscov = 100 ,method="RANK") # rank-based fit
# When fitting with autoregressive order 2,
# the estimate of the second order autoregressive coefficient should not be significant,
# since this data is generated with order 1.
# fit3 <- dbfit(x,y,2, nbs = 100, nbscov = 100)
# fit3$rho_CI_1 # The first row is lower bounds, and second row is upper bounds
Function implements the Durbin stage 1 fit
durbin1fit(y, x, arp, method, scores)
y |
the response variable in stage 1, not the original response variable |
x |
the model matrix in stage 1, not the original design matrix |
arp |
the order of autoregressive errors. |
method |
the method to be used for fitting. If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit. |
scores |
Default is Wilcoxon scores |
This function is for internal use. The main function for users is dbfit.
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
Function provides the transformed response variable and model matrix for the Durbin stage 1 fit. For details of the transformation, see the reference.
durbin1xy(y, x, arp)
y |
the original response variable |
x |
the original design matrix with a first column of all ones (corresponding to the intercept) |
arp |
the order of autoregressive errors. |
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
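As a rough illustration of what a Durbin stage 1 regression looks like for order 1, the base-R sketch below regresses y_t on y_{t-1}, x_t and x_{t-1} and takes the coefficient of y_{t-1} as the initial estimate of rho. This is the textbook form of the first stage; the helper name durbin_stage1_ar1 is hypothetical, and the exact transformation used by durbin1xy (for example any centering) is not given here, so it may differ.

```r
# Hypothetical helper, AR(1) only; durbin1xy()/durbin1fit() may differ in detail.
durbin_stage1_ar1 <- function(y, x) {
  n <- length(y)
  y_now <- y[-1]                      # y_t for t = 2, ..., n
  y_lag <- y[-n]                      # y_{t-1}
  x_now <- x[-1, -1, drop = FALSE]    # x_t without the intercept column
  x_lag <- x[-n, -1, drop = FALSE]    # x_{t-1} without the intercept column
  fit <- lm(y_now ~ y_lag + x_now + x_lag)
  unname(coef(fit)["y_lag"])          # stage 1 estimate of rho
}

set.seed(2)
n <- 500
x <- cbind(1, rnorm(n))                             # intercept plus one covariate
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))   # AR(1) errors, rho = 0.6
y <- drop(x %*% c(1, 2)) + e
rho1 <- durbin_stage1_ar1(y, x)
```

A random covariate is used here on purpose: with a pure time-trend column, x_t and x_{t-1} are collinear with the intercept, and lm would report an NA coefficient for the redundant term.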
Function implements the Durbin stage 2 fit
durbin2fit(yc, xc, adjphi, method, scores)
yc |
a transformed response variable |
xc |
a transformed design matrix |
adjphi |
the Durbin stage 1 estimate(s) of the autoregressive parameters rho |
method |
the method to be used for fitting. If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit. |
scores |
Default is Wilcoxon scores |
beta |
the estimates of regression coefficients |
sigma |
the estimate of standard deviation of the white noise |
This function is for internal use. The main function for users is dbfit.
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
Since a recent update, Rfit can no longer return partial results for a singular design matrix (unlike lm). This function uses a QR decomposition to resolve the issue, so that dbfit can run its robust (rank-based) version.
fullr(x, p1)
x |
design matrix, including intercept, i.e. the first column being ones. |
p1 |
number of leading columns of x that are linearly independent. |
This function is for internal use.
Returns the design matrix for a two-phase intervention model.
hmdesign2(n1, n2)
n1 |
number of obs in phase 1 |
n2 |
number of obs in phase 2 |
It returns a matrix of 4 columns. As discussed in Huitema, McKean, & McKnight (1999), in a two-phase design: beta0 = intercept, beta1 = slope for Phase 1, beta2 = level change from Phase 1 to Phase 2, and beta3 = slope change from Phase 1 to Phase 2.
Huitema, B. E., McKean, J. W., & McKnight, S. (1999). Autocorrelation effects on least-squares intervention analysis of short time series. Educational and Psychological Measurement, 59(5), 767-786.
n1 <- 15
n2 <- 15
hmdesign2(n1, n2)
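For readers who want to see how a four-column two-phase design of this kind can be built by hand, here is a minimal base-R sketch. The helper two_phase_design and its column names are hypothetical, and the coding of the slope-change column (time measured from the first observation of Phase 2) is an assumption based on the usual intervention-model parameterization; hmdesign2 may code its columns differently.

```r
# Hypothetical helper; the column coding is an assumption, not hmdesign2's source.
two_phase_design <- function(n1, n2) {
  n <- n1 + n2
  time <- seq_len(n)
  phase <- as.numeric(time > n1)       # 0 in Phase 1, 1 in Phase 2
  cbind(intercept    = 1,
        slope1       = time,                          # beta1: Phase 1 slope
        level_change = phase,                         # beta2: level change
        slope_change = phase * (time - (n1 + 1)))     # beta3: slope change
}

X <- two_phase_design(15, 15)
X[15:17, ]   # rows around the phase boundary
```

With this coding the slope-change column is zero at the first Phase 2 observation, so beta2 can be read as the level change at the start of Phase 2.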
Returns the design matrix for a general k-phase intervention model
hmmat(vecss, k)
vecss |
a vector of length k with each element being the number of observations in each phase |
k |
number of phases |
It returns a matrix of 2*k columns. The design can be unbalanced, i.e. the phases may have different numbers of observations.
Huitema, B. E., McKean, J. W., & McKnight, S. (1999). Autocorrelation effects on least-squares intervention analysis of short time series. Educational and Psychological Measurement, 59(5), 767-786.
# a three-phase design matrix
hmmat(c(10,10,10),3)
Performs general linear tests of the regression coefficients.
hypothmat(sfit, mmat, n, p)
sfit |
the result of a call to dbfit. |
mmat |
a full row rank q*(p+1) matrix, where q is the row number of the matrix and p is number of independent variables. |
n |
total number of observations. |
p |
number of independent variables. |
This function performs the general linear F-test of the form H0: Mb = 0 vs HA: Mb != 0.
tst |
the test statistic |
pvf |
the p-value of the F-test |
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
# data(testdata)
# y<-testdata[,5]
# x<-testdata[,1:4]
# fit1<-dbfit(x,y,1) # OLS fit, default
# a test that H0: b1 = b3 vs HA: b1 != b3
# mat<-matrix(c(1,0,0,-1),nrow=1)
# hypothmat(sfit=fit1,mmat=mat,n=40,p=4)
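The general linear test H0: Mb = 0 can be sketched in base R from any coefficient vector and its covariance matrix. The example below uses an ordinary lm fit rather than a dbfit object; the helper general_linear_test is hypothetical, and the choice of denominator degrees of freedom (the residual degrees of freedom of the lm fit) is an assumption, since the internals of hypothmat are not described here.

```r
# Hypothetical helper: Wald-type F statistic for H0: M b = 0.
general_linear_test <- function(b, V, M, df_resid) {
  q <- nrow(M)                               # number of restrictions
  Mb <- M %*% b
  Fstat <- drop(t(Mb) %*% solve(M %*% V %*% t(M)) %*% Mb) / q
  c(tst = Fstat,
    pvf = pf(Fstat, q, df_resid, lower.tail = FALSE))
}

set.seed(3)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)

M <- matrix(c(0, 1, -1), nrow = 1)           # H0: b1 = b2
res <- general_linear_test(coef(fit), vcov(fit), M, df.residual(fit))
```

When M picks out a single coefficient, the statistic reduces to the square of that coefficient's t-ratio, which is a convenient sanity check.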
For preparing the transformed x and y in the Durbin stage 1 fit
lagx(x, s1, s2)
lagmat(x, p)
x |
a vector or the design matrix, including intercept, i.e. the first column being ones. |
s1 |
starting index of the slice. |
s2 |
end index of the slice. |
p |
the order of autoregressive errors. |
These functions are for internal use.
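A lag matrix of the kind needed for an autoregressive fit of order p can be built in base R with embed(). The helper lag_matrix below is hypothetical and only illustrates the idea; the layout returned by lagmat itself is not documented here and may differ.

```r
# Hypothetical helper: column j holds the series lagged by j, for j = 1, ..., p.
lag_matrix <- function(x, p) {
  # embed(x, p + 1) has rows (x[t], x[t-1], ..., x[t-p]) for t = p+1, ..., n;
  # dropping the first column keeps only the lagged values.
  embed(x, p + 1)[, -1, drop = FALSE]
}

lag_matrix(1:5, 2)   # rows are (2, 1), (3, 2), (4, 3)
```

The first p observations are lost because their lags are not available, so the result has length(x) - p rows.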
It returns a new response variable (vector) for Durbin stage 2.
nurho(yc, adjphi)
yc |
the centered response variable y |
adjphi |
(initial) estimate of rho in Durbin stage 1 |
see reference.
This function is for internal use. The main function for users is dbfit.
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
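For order 1, the second Durbin stage is commonly described as a quasi-differencing step: each series z_t is replaced by z_t - rho * z_{t-1} before refitting. The base-R sketch below shows that transformation under this assumption; quasi_diff is a hypothetical helper, and the exact form used by nurho (and by wrho for the design matrix) should be taken from the reference.

```r
# Hypothetical helper: quasi-difference a vector or the columns of a matrix.
quasi_diff <- function(z, rho) {
  z <- as.matrix(z)
  n <- nrow(z)
  z[-1, , drop = FALSE] - rho * z[-n, , drop = FALSE]
}

yc <- c(1, 2, 4, 8)
quasi_diff(yc, 0.5)                 # 2 - 0.5*1, 4 - 0.5*2, 8 - 0.5*4

# The same step applied to a (centered) design matrix, followed by a refit:
set.seed(5)
xc <- scale(matrix(rnorm(60), ncol = 2), scale = FALSE)
yv <- drop(xc %*% c(1, -1)) + rnorm(30)
stage2 <- lm(quasi_diff(yv, 0.3) ~ quasi_diff(xc, 0.3) - 1)
```

The refit on the transformed data is what yields the stage 2 regression coefficients in this simplified picture.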
These functions print the output in a user-friendly manner using the internal R function print.
## S3 method for class 'dbfit'
print(x, ...)
## S3 method for class 'summary.dbfit'
print(x, ...)
x |
An object to be printed |
... |
additional arguments to be passed to |
This function returns a Fisher type CI for rho, which is then used to correct the .99 cases.
rhoci2(n, rho, cv)
n |
total number of observations |
rho |
final estimate of rho, usually .99. |
cv |
critical value for CI |
see reference.
This function is for internal use.
Shaofeng Zhang (2017). Ph.D. Dissertation.
Rao, C. R. (1952). Advanced statistical methods in biometric research. p. 231
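A Fisher type interval is usually built on the z-transformed scale and mapped back. The sketch below uses the classical Fisher z interval for a correlation, with standard error 1 / sqrt(n - 3); the helper fisher_ci is hypothetical, and the standard error used inside rhoci2 is an assumption here, so the numbers need not match the package.

```r
# Hypothetical helper: classical Fisher z interval, not rhoci2's source.
fisher_ci <- function(n, rho, cv = qnorm(0.975)) {
  z <- atanh(rho)                  # Fisher z transform
  se <- 1 / sqrt(n - 3)            # assumed standard error on the z scale
  tanh(z + c(-1, 1) * cv * se)     # back-transform the two endpoints
}

ci <- fisher_ci(n = 40, rho = 0.99)
ci
mean(ci)   # midpoint of the interval
```

Because the back-transform is nonlinear, the interval is asymmetric around rho and its midpoint falls below 0.99, which is what makes the midpoint usable as a corrected estimate.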
Generates the simulation data for a two-phase intervention model.
simpgen1hm2(n1, n2, rho, beta = c(0, 0, 0, 0))
n1 |
number of obs in phase 1 |
n2 |
number of obs in phase 2 |
rho |
pre-defined autoregressive parameter(s) |
beta |
pre-defined regression coefficients |
This function was used for simulations when developing the package. Given pre-defined sample sizes for both phases and pre-defined parameters, it returns a simulated data set.
mat |
a matrix containing the simulation data. The last column is the response variable. All other columns make up the design matrix. |
n1 <- 15
n2 <- 15
rho <- 0.6
beta <- c(0,0,0,0)
dat <- simpgen1hm2(n1, n2, rho, beta)
dat
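A comparable data set can be simulated in base R, which may help when checking results without the package. The sketch below assumes order 1, standard normal innovations, and the same hypothetical design coding used for two-phase models (slope change measured from the start of Phase 2); simulate_two_phase is not part of DBfit, and simpgen1hm2 may generate its errors differently.

```r
# Hypothetical helper; design coding and error distribution are assumptions.
simulate_two_phase <- function(n1, n2, rho, beta = c(0, 0, 0, 0)) {
  n <- n1 + n2
  time <- seq_len(n)
  phase <- as.numeric(time > n1)
  X <- cbind(1, time, phase, phase * (time - (n1 + 1)))
  # Stationary AR(1) errors; arima.sim requires |rho| < 1
  e <- as.numeric(arima.sim(list(ar = rho), n = n))
  cbind(X, y = drop(X %*% beta) + e)   # response in the last column
}

set.seed(4)
dat <- simulate_two_phase(15, 15, rho = 0.6)
dim(dat)
```

As in the documented return value, the last column is the response and the remaining columns form the design matrix.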
simula is the original work horse function to implement the DB method. However, when this function returns an estimate of rho of .99, another work horse function, simulacorrection, kicks in.
simula(x, y, arp, nbs, nbscov, conf, method, scores)
x |
the design matrix, including intercept, i.e. the first column being ones. |
y |
the response variable. |
arp |
the order of autoregressive errors. |
nbs |
the bootstrap size for the first bootstrap procedure. Default is 500. |
nbscov |
the bootstrap size for the second bootstrap procedure. Default is 500. |
conf |
the confidence level of CI for rho, default is 0.95. |
method |
the method to be used for fitting. If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit. |
scores |
Default is Wilcoxon scores |
See dbfit.
Users should use dbfit to perform the analysis.
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.
When function simula returns an estimate of rho of .99, this function kicks in and outputs a corrected estimate of rho. Currently, this works only for order 1; for order > 1 the correction is not applied.
simulacorrection(x, y, arp, nbs, nbscov, method, scores)
x |
the design matrix, including intercept, i.e. the first column being ones. |
y |
the response variable. |
arp |
the order of autoregressive errors. |
nbs |
the bootstrap size for the first bootstrap procedure. Default is 500. |
nbscov |
the bootstrap size for the second bootstrap procedure. Default is 500. |
method |
the method to be used for fitting. If "OLS", uses ordinary least squares; if "RANK", uses the rank-based fit. |
scores |
Default is Wilcoxon scores |
If the 0.99 problem is detected, a Fisher CI is constructed for both the initial estimate (from Durbin stage 1) and the first bias-corrected estimate (obtained from a single bootstrap rather than a loop). If the midpoint of the latter CI is smaller than 0.95, that midpoint is the final estimate of rho; otherwise, the midpoint of the former CI is the final estimate.
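The selection rule described above can be written as a few lines of R. The helper choose_rho and its argument names are hypothetical; the sketch takes the two Fisher intervals as given and only illustrates the midpoint comparison, not how the intervals are computed inside simulacorrection.

```r
# Hypothetical helper illustrating the midpoint rule described above.
choose_rho <- function(ci_initial, ci_corrected, cutoff = 0.95) {
  mid_corrected <- mean(ci_corrected)
  if (mid_corrected < cutoff) {
    mid_corrected            # midpoint of the bias-corrected CI
  } else {
    mean(ci_initial)         # fall back to the midpoint of the initial CI
  }
}

choose_rho(c(0.70, 0.90), c(0.80, 0.96))   # corrected midpoint 0.88 is used
choose_rho(c(0.70, 0.90), c(0.94, 0.99))   # corrected midpoint 0.965 is too large,
                                           # so the initial midpoint 0.80 is used
```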
By default, when function simula returns an estimate of rho of .99, this function kicks in and outputs a corrected estimate of rho. However, users can turn the automatic correction off by setting correction = FALSE in dbfit. Users are encouraged to investigate why the stationarity assumption is violated, based on their experience with time series analysis and their knowledge of the data.
Users should use dbfit to perform the analysis.
Shaofeng Zhang (2017). Ph.D. Dissertation.
It summarizes the DB fit in a way that is similar to OLS lm.
## S3 method for class 'dbfit'
summary(object, ...)
object |
a result of the call to |
... |
additional arguments to be passed |
call |
the call to |
tab |
a table of point estimates, standard errors, t-ratios and p-values |
rho1 |
the Durbin two-stage estimate of rho |
adjar |
the DB (final) estimate of rho |
flag99 |
an indicator; if 1, it indicates the original fit yields an estimate of rho to be 0.99. Only valid if order 1 is specified. |
# data(testdata)
# y<-testdata[,5]
# x<-testdata[,1:4]
# fit1<-dbfit(x,y,1) # OLS fit, default
# summary(fit1)
This data set serves as test data.
data("testdata")
A data frame with 40 observations. First 4 columns make up the design matrix, while the last column is the response variable. This data is generated by a two-phase design, with autoregressive order being one, autoregressive coefficient being 0.6 and all regression coefficients being 0. Both the first and second phase have 20 observations.
data(testdata)
It returns a new design matrix for Durbin stage 2.
wrho(xc, adjphi)
xc |
centered design matrix, no column of ones |
adjphi |
(initial) estimate of rho in Durbin stage 1 |
see reference.
This function is for internal use. The main function for users is dbfit.
McKnight, S. D., McKean, J. W., and Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87.
Shaofeng Zhang (2017). Ph.D. Dissertation.