| Title: | Panel Cointegration Tests Based on Westerlund (2007) |
|---|---|
| Description: | Implements a functional approximation of the four panel cointegration tests developed by Westerlund (2007) <doi:10.1111/j.1468-0084.2007.00477.x>. The tests are based on structural rather than residual dynamics and allow for heterogeneity in both the long-run cointegrating relationship and the short-run dynamics. The package includes logic for automated lag and lead selection via AIC/BIC, Bartlett kernel long-run variance estimation, and a bootstrap procedure to handle cross-sectional dependence. It also includes a bootstrapping distribution visualization function for diagnostic purposes. |
| Authors: | Bosco Hung [aut, cre] (ORCID: <https://orcid.org/0000-0003-3073-303X>) |
| Maintainer: | Bosco Hung <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.3 |
| Built: | 2026-05-12 09:26:41 UTC |
| Source: | https://github.com/cran/Westerlund |
Computes a Bartlett-kernel (Newey–West style) long-run variance estimate for a
univariate series using Stata-like conventions: missing values are removed
(na.omit), autocovariances are scaled by (not ),
and optional centering is controlled by nodemean.
calc_lrvar_bartlett(x, maxlag, nodemean = FALSE)calc_lrvar_bartlett(x, maxlag, nodemean = FALSE)
x |
A numeric vector. Missing values are removed prior to computation. |
maxlag |
Non-negative integer. Maximum lag order |
nodemean |
Logical. If |
Let be the input series after removing missing values. If
nodemean=FALSE, the function replaces with
.
Define the lag- autocovariance using the Stata-style scaling:
where is the length of the cleaned series and .
Note that the scaling uses for all (rather than ).
The Bartlett kernel weight at lag is:
The long-run variance estimate is then:
If the cleaned series has zero length, the function returns NA.
A single numeric value:
The Bartlett-kernel long-run variance estimate (scalar),
or NA if x contains no non-missing values.
This section illustrates how calc_lrvar_bartlett() computes a Bartlett-kernel
long-run variance (LRV) estimate and how its options map to common time-series
preprocessing choices.
This helper function is designed to match Stata-style calculations:
Missing values are dropped: na.omit(x) is applied first.
Scaling by : autocovariances at all lags divide by , not by .
Optional de-meaning: controlled by nodemean.
By default (nodemean=FALSE), the series is centered: .
Set nodemean=TRUE when you have already centered the series elsewhere.
maxlag sets the truncation point . Larger captures more
serial correlation but increases estimation noise.
## Example 1: Basic usage x <- rnorm(200) calc_lrvar_bartlett(x, maxlag = 4) ## Example 2: maxlag = 0 returns gamma_0 calc_lrvar_bartlett(x, maxlag = 0) ## Example 3: Handle missing values (they are removed) x_na <- x x_na[c(5, 10, 50)] <- NA calc_lrvar_bartlett(x_na, maxlag = 4) ## Example 4: Compare centering choices ## Default: de-mean internally lr1 <- calc_lrvar_bartlett(x, maxlag = 4, nodemean = FALSE) ## Pre-center and skip de-meaning x_centered <- x - mean(x) lr2 <- calc_lrvar_bartlett(x_centered, maxlag = 4, nodemean = TRUE)## Example 1: Basic usage x <- rnorm(200) calc_lrvar_bartlett(x, maxlag = 4) ## Example 2: maxlag = 0 returns gamma_0 calc_lrvar_bartlett(x, maxlag = 0) ## Example 3: Handle missing values (they are removed) x_na <- x x_na[c(5, 10, 50)] <- NA calc_lrvar_bartlett(x_na, maxlag = 4) ## Example 4: Compare centering choices ## Default: de-mean internally lr1 <- calc_lrvar_bartlett(x, maxlag = 4, nodemean = FALSE) ## Pre-center and skip de-meaning x_centered <- x - mean(x) lr2 <- calc_lrvar_bartlett(x_centered, maxlag = 4, nodemean = TRUE)
Formats, standardizes, and prints the Westerlund (2007) ECM-based panel
cointegration test results for the null hypothesis of no cointegration. Given
raw statistics , , , and , the function
computes standardized Z-statistics using tabulated (or hard-coded) asymptotic
moments and reports left-tail asymptotic p-values. If a bootstrap distribution
is supplied, it also computes bootstrap (robust) p-values for the raw statistics.
DisplayWesterlund( stats, bootstats = NULL, nobs, nox, constant = FALSE, trend = FALSE, meanlag = -1, meanlead = -1, realmeanlag = -1, realmeanlead = -1, auto = 0, westerlund = FALSE, aic = TRUE, verbose = FALSE )DisplayWesterlund( stats, bootstats = NULL, nobs, nox, constant = FALSE, trend = FALSE, meanlag = -1, meanlead = -1, realmeanlag = -1, realmeanlead = -1, auto = 0, westerlund = FALSE, aic = TRUE, verbose = FALSE )
stats |
A numeric vector of length 4 or a |
bootstats |
Optional. A numeric matrix with bootstrap replications of the raw statistics, with 4 columns in the order |
nobs |
Integer. Number of valid cross-sectional units (series) used in the test. |
nox |
Integer. Number of covariates in the long-run relationship (length of |
constant |
Logical. Indicates whether a constant was included in the cointegrating relationship. Affects the deterministic-case row used for asymptotic moment lookup. |
trend |
Logical. Indicates whether a trend was included. If |
meanlag |
Integer. Mean selected lag length (rounded down to integer) reported in the header when |
meanlead |
Integer. Mean selected lead length (rounded down to integer) reported in the header when |
realmeanlag |
Numeric. Unrounded mean selected lag length (for display when |
realmeanlead |
Numeric. Unrounded mean selected lead length (for display when |
auto |
Logical/integer. If non-zero, the header prints the average AIC-selected lag and lead lengths based on |
westerlund |
Logical. If |
aic |
Logical. If |
verbose |
Logical. If |
What this function does.
DisplayWesterlund() takes raw Westerlund ECM test statistics and produces:
standardized Z-statistics for , , , ,
left-tail asymptotic p-values using pnorm(),
optionally, bootstrap (robust) p-values when bootstats is provided,
a console table summarizing the results.
Standardization and deterministic cases. The function selects a deterministic-case index:
Row 1: no constant, no trend,
Row 2: constant, no trend,
Row 3: constant and trend,
implemented as row_idx = (as.integer(constant) + as.integer(trend)) + 1.
In the non-westerlund case, asymptotic means and variances are taken from
hard-coded lookup matrices indexed by row_idx and nox. Z-statistics are computed using:
for mean-group statistics and analogous formulas for pooled statistics as
implemented in the code, where nobs and is the raw statistic.
In the westerlund=TRUE case, the function uses a separate set of
hard-coded mean/variance constants, with different values depending on whether
trend is included.
Asymptotic p-values.
All asymptotic p-values are computed as left-tail probabilities:
pnorm(Z).
Bootstrap p-values (robust p-values).
If bootstats is supplied, bootstrap p-values are computed for each raw
statistic using a left-tail empirical rule:
after dropping non-finite bootstrap draws. This is a common finite-sample correction used in Stata-style bootstrap code.
Return values. In addition to printing, the function returns a list containing the raw statistics, Z-statistics, asymptotic p-values, and (if applicable) bootstrap p-values.
A list containing at minimum:
gt, ga, pt, pa: raw test statistics,
gt_z, ga_z, pt_z, pa_z: standardized Z-statistics,
gt_pval, ga_pval, pt_pval, pa_pval: left-tail asymptotic p-values.
If bootstats is provided, the list also includes:
gt_pvalboot, ga_pvalboot, pt_pvalboot, pa_pvalboot: bootstrap (robust) p-values for the raw statistics.
This section explains how to use DisplayWesterlund() and how its output
connects to the broader testing workflow.
DisplayWesterlund() fit?Typically, the workflow is:
Compute observed raw statistics via WesterlundPlain.
Optionally compute a bootstrap distribution via WesterlundBootstrap.
Call DisplayWesterlund() to standardize, print, and return p-values.
The user-facing westerlund_test wraps these steps and collects the
returned scalars.
stats
stats can be either a numeric vector c(Gt, Ga, Pt, Pa) or a 1x4 matrix with [1,] corresponding to Gt, Ga, Pt, Pa.
The function converts raw statistics to Z-statistics using hard-coded asymptotic
means and variances. P-values are computed as pnorm(Z), i.e., a left-tail test.
If bootstats is provided, robust p-values are computed by comparing the
observed raw statistic to its bootstrap distribution, using a finite-sample
correction:
where is the number of bootstrap draws less than or
equal to the observed statistic.
## Example 1: Asymptotic-only display (no bootstrap) stats <- c(Gt = -2.1, Ga = -9.5, Pt = -1.8, Pa = -6.2) res1 <- DisplayWesterlund( stats = stats, bootstats = NULL, nobs = 18, nox = 2, constant = TRUE, trend = FALSE, auto = 0, westerlund = FALSE ) ## Example 2: With bootstrap distribution (robust p-values) set.seed(123) bootstats <- cbind( rnorm(399, mean = -1.8, sd = 0.9), rnorm(399, mean = -8.0, sd = 3.0), rnorm(399, mean = -1.5, sd = 1.0), rnorm(399, mean = -5.0, sd = 3.5) ) res2 <- DisplayWesterlund( stats = stats, bootstats = bootstats, nobs = 18, nox = 2, constant = TRUE, trend = FALSE, auto = 1, realmeanlag = 1.35, realmeanlead = 0.40, westerlund = FALSE )
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
WesterlundPlain,
WesterlundBootstrap
Computes the first difference of a time-indexed series using strict time-based
lagging. The function respects gaps in the time index and returns NA when
the previous time period does not exist, mirroring Stata’s D. operator.
get_diff(vec, tvec)get_diff(vec, tvec)
vec |
A numeric (or atomic) vector of observations. |
tvec |
A vector of time indices corresponding one-to-one with |
This helper function computes first differences as:
where the lagged value is obtained using get_lag,
which performs strict time-based lookup.
Internally, the function calls:
val_t_minus_1 <- get_lag(vec, tvec, 1)
and then subtracts this lagged vector from vec. If the time index contains
gaps, or if the previous time period does not exist for a given observation,
the lagged value is NA and the corresponding difference is also
NA.
No interpolation or implicit shifting is performed; missing time periods propagate as missing differences.
A vector of the same length as vec, containing the first differences
aligned by the time index. Elements are NA when the previous time period
does not exist.
This section explains how get_diff() computes first differences and why
strict time indexing matters in the presence of gaps.
The function replicates the behaviour of Stata’s first-difference operator
D.x. When time periods are missing, Stata returns missing values rather
than differencing across gaps. Because get_diff() relies on
get_lag, it follows the same rule.
The base R function diff() computes differences based on vector positions.
This implicitly assumes a complete and regularly spaced time index. When time
periods are missing, diff() can produce misleading results by differencing
across gaps. get_diff() avoids this by differencing only when the
previous time period exists.
get_lag,
get_ts_val,
westerlund_test
## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) get_diff(x, t) # [1] NA 10 10 10 10 ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) get_diff(x_gap, t_gap) # [1] NA 10 NA 10 ## Explanation: ## At t = 4, the previous period t-1 = 3 does not exist, so the difference is NA. ## Example 3: Comparison with diff() diff(x_gap) # [1] 10 20 10## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) get_diff(x, t) # [1] NA 10 10 10 10 ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) get_diff(x_gap, t_gap) # [1] NA 10 NA 10 ## Explanation: ## At t = 4, the previous period t-1 = 3 does not exist, so the difference is NA. ## Example 3: Comparison with diff() diff(x_gap) # [1] 10 20 10
Extracts lagged (or led) values from a time-indexed vector using a strict
time-based lookup. The function respects gaps in the time index and returns
NA when the requested lagged time does not exist, mirroring the behaviour
of Stata’s time-series lag/lead operators.
get_lag(vec, tvec, k)get_lag(vec, tvec, k)
vec |
A numeric (or atomic) vector of observations. |
tvec |
A vector of time indices corresponding one-to-one with |
k |
Integer. The lag order. Positive values correspond to lags
(e.g., |
This helper function performs strict time-based lagging rather than position-based shifting. Internally, it constructs a mapping from time indices to observed values:
val_map <- setNames(vec, tvec)
For each observation at time , the function retrieves the value associated
with time . If that time value is not present in tvec, the
result is NA.
This behaviour is particularly important when working with irregular time
series or panel data with gaps. In such cases, simple vector shifting can
incorrectly carry values across missing time periods, while get_lag()
preserves the correct alignment.
No interpolation, padding, or reordering is performed; missing time periods
propagate as NA in the lagged (or led) series.
A vector of the same length as vec, containing the lagged (or led) values
aligned by the time index. Elements are NA when the requested time period
does not exist.
This section illustrates the logic of get_lag() and explains how
it differs from standard position-based lagging.
In many applications, lags are computed by shifting vectors by positions.
This implicitly assumes a complete and regular time index. When time periods
are missing, such shifting produces incorrect lag values. get_lag()
avoids this by explicitly matching on time values.
The function mirrors the behaviour of Stata’s time-series operators:
L.x: lagged value at ,
L2.x: lagged value at ,
F.x: lead value at .
As in Stata, gaps in the time index result in missing values in the lagged series.
get_ts_val,
westerlund_test,
calc_lrvar_bartlett
## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) get_lag(x, t, k = 1) # [1] NA 10 20 30 40 get_lag(x, t, k = -1) # [1] 20 30 40 50 NA ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) get_lag(x_gap, t_gap, k = 1) # [1] NA 10 NA 40 ## Explanation: ## At t = 4, the requested time t-1 = 3 does not exist, so NA is returned. ## Example 3: Higher-order lags get_lag(x_gap, t_gap, k = 2) # [1] NA NA NA 20## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) get_lag(x, t, k = 1) # [1] NA 10 20 30 40 get_lag(x, t, k = -1) # [1] 20 30 40 50 NA ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) get_lag(x_gap, t_gap, k = 1) # [1] NA 10 NA 40 ## Explanation: ## At t = 4, the requested time t-1 = 3 does not exist, so NA is returned. ## Example 3: Higher-order lags get_lag(x_gap, t_gap, k = 2) # [1] NA NA NA 20
Retrieves lagged (or led) values from a time-indexed vector using a strict time-based mapping.
get_ts_val(vec, tvec, lag)get_ts_val(vec, tvec, lag)
vec |
A numeric (or atomic) vector of observations. |
tvec |
A vector of time indices corresponding one-to-one with |
lag |
Integer. The lag order. Positive values correspond to lags
(e.g., |
This helper function implements a strict time-based lookup rather than position-based indexing. Internally, it constructs a named mapping from time indices to observed values:
val_map <- setNames(vec, tvec)
For each observation at time , the function computes the target time
and retrieves the corresponding value from the map.
If the target time does not exist in tvec, the function returns
NA for that observation. This behaviour exactly mirrors Stata’s
lag/lead operators in the presence of gaps.
Importantly, the function does not interpolate or shift values when
time periods are missing. A gap in the time index propagates as NA in the
lagged (or led) series.
A vector of the same length as vec, containing the lagged (or led)
values aligned by time index. Elements are NA when the requested
time period does not exist.
This section explains why get_ts_val() is useful and how it differs from
standard lagging approaches based on vector positions.
In panel and time-series econometrics, lagged variables should be defined with
respect to the time index, not the row position. When data contain gaps in time,
simple shifting (e.g., c(NA, x[-length(x)])) can produce incorrect values.
get_ts_val() avoids this by explicitly matching on time values.
This function replicates the behaviour of Stata’s time-series operators:
L.x: lagged value at ,
L2.x: lagged value at ,
F.x: lead value at .
When time periods are missing, Stata returns missing values rather than shifting across gaps.
westerlund_test,
calc_lrvar_bartlett
## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) ## Lag by one period get_ts_val(x, t, lag = 1) # [1] NA 10 20 30 40 ## Lead by one period get_ts_val(x, t, lag = -1) # [1] 20 30 40 50 NA ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) ## Lag by one period: note the NA at time 4 get_ts_val(x_gap, t_gap, lag = 1) # [1] NA 10 NA 40 ## Explanation: ## - At t = 4, t - 1 = 3 does not exist in t_gap, so NA is returned. ## Example 3: Higher-order lags get_ts_val(x_gap, t_gap, lag = 2) # [1] NA NA NA 20## Example 1: Regular time series t <- 1:5 x <- c(10, 20, 30, 40, 50) ## Lag by one period get_ts_val(x, t, lag = 1) # [1] NA 10 20 30 40 ## Lead by one period get_ts_val(x, t, lag = -1) # [1] 20 30 40 50 NA ## Example 2: Time series with a gap t_gap <- c(1, 2, 4, 5) x_gap <- c(10, 20, 40, 50) ## Lag by one period: note the NA at time 4 get_ts_val(x_gap, t_gap, lag = 1) # [1] NA 10 NA 40 ## Explanation: ## - At t = 4, t - 1 = 3 does not exist in t_gap, so NA is returned. ## Example 3: Higher-order lags get_ts_val(x_gap, t_gap, lag = 2) # [1] NA NA NA 20
This function provides an S3 method for visualizing objects of class westerlund_test.
It creates a faceted ggplot2 visualization of the bootstrap distributions for
the four Westerlund (2007) panel cointegration statistics: , ,
, and . The plot displays the kernel density of bootstrap
replications, the observed test statistic as a solid line, and the left-tail
bootstrap critical value as a dashed line.
## S3 method for class 'westerlund_test' plot( x, title = "Westerlund Test: Bootstrap Distributions", conf_level = 0.05, colors = list( obs = "#D55E00", crit = "#0072B2", fill = "grey80", density = "grey30" ), lwd = list(obs = 1, crit = 0.8, density = 0.5), show_grid = TRUE, ... )## S3 method for class 'westerlund_test' plot( x, title = "Westerlund Test: Bootstrap Distributions", conf_level = 0.05, colors = list( obs = "#D55E00", crit = "#0072B2", fill = "grey80", density = "grey30" ), lwd = list(obs = 1, crit = 0.8, density = 0.5), show_grid = TRUE, ... )
x |
An object of class |
title |
Character. The main title of the plot. |
conf_level |
Numeric in (0,1). Left-tail quantile used as the critical value.
For example, |
colors |
A named list of colors for plot elements. Expected elements include
|
lwd |
A named list of line widths or sizes for the |
show_grid |
Logical. If |
... |
Additional arguments passed to or from other methods. |
The function converts the bootstrap distribution matrix into a long-format data
frame to utilize the faceting capabilities of the ggplot2 package. This
implementation uses ggplot2::geom_density() for empirical distribution
estimation and arranges the four subplots into a 2x2 grid. By using
facet_wrap(~Statistic, scales = "free"), the plots maintain independent
x-axis scales to better visualize the specific range of each statistic.
Text annotations are automatically added to each facet to display the exact
numerical values of the Observed Stat and the Critical Value for quick reference.
The dashed line represents the bootstrap critical value for a left-tailed test.
If the observed statistic, represented by the solid line, is located to the left
of the dashed line, it indicates a rejection of the null hypothesis of no
cointegration at the specified conf_level significance level. This
visual approach allows for an immediate assessment of the test results relative
to the empirically derived distribution.
The function requires the ggplot2, tidyr, and scales packages
to be installed. Because it is an S3 method, it is typically invoked by calling
plot() directly on the result object of a Westerlund test.
Returns a ggplot object. This allows the user to apply further
customizations, such as changing themes with + theme_bw() or adding
additional layers and labels.
Westerlund's pooled statistics can be sensitive to nuisance parameters and cross-sectional dependence in finite samples. Visualizing the bootstrap density provides a more robust reference than asymptotic normal curves, offering a clearer picture of the distribution under the null hypothesis.
Because the function returns a ggplot object, users can modify the
output easily. For example, the theme can be changed by adding a theme
layer, such as plot(res) + ggplot2::theme_dark().
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
WesterlundBootstrap,
DisplayWesterlund
## Example: generating a mock westerlund_test object set.seed(123) mock_res <- list( bootstrap_distributions = cbind( Gt = rnorm(399, mean = -2, sd = 1), Ga = rnorm(399, mean = -8, sd = 3), Pt = rnorm(399, mean = -2, sd = 1), Pa = rnorm(399, mean = -6, sd = 3) ), test_stats = list(Gt = -3.1, Ga = -10.2, Pt = -2.7, Pa = -7.8) ) class(mock_res) <- "westerlund_test" ## Use the S3 plot method p <- plot(mock_res) ## Display plot if (interactive()) print(p)## Example: generating a mock westerlund_test object set.seed(123) mock_res <- list( bootstrap_distributions = cbind( Gt = rnorm(399, mean = -2, sd = 1), Ga = rnorm(399, mean = -8, sd = 3), Pt = rnorm(399, mean = -2, sd = 1), Pa = rnorm(399, mean = -6, sd = 3) ), test_stats = list(Gt = -3.1, Ga = -10.2, Pt = -2.7, Pa = -7.8) ) class(mock_res) <- "westerlund_test" ## Use the S3 plot method p <- plot(mock_res) ## Display plot if (interactive()) print(p)
This function provides a clean and concise console output for objects of class
westerlund_test. Instead of displaying the raw list structure, it
summarizes the key results of the Westerlund (2007) panel cointegration test,
including the four observed test statistics and information regarding the
bootstrap replications if they were performed.
## S3 method for class 'westerlund_test' print(x, ...)## S3 method for class 'westerlund_test' print(x, ...)
x |
An object of class |
... |
Further arguments passed to or from other methods. Currently unused but maintained for S3 generic compatibility. |
The print method is designed to enhance readability by presenting the test
results in a formatted table-like structure. It first identifies the test type
and then displays the observed values for the group-mean statistics (
and ) and the pooled panel statistics ( and ). This
summary provides an immediate overview of the test's findings without
requiring the user to navigate the internal list structure of the object.
If the test was conducted with a bootstrap distribution, the method also
reports the total number of successful replications. This is particularly
useful for verifying that the requested number of bootstrap draws was
attained after accounting for any non-finite results. Finally, the output
includes a brief instructional note suggesting the use of the plot()
method for visual inference, which is the recommended way to interpret the
bootstrap results relative to the observed statistics.
The function returns the input object x invisibly (via invisible(x)).
Its primary purpose is the side effect of printing a formatted summary to the
R console.
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
plot.westerlund_test
## Example: generating a mock westerlund_test object set.seed(123) mock_res <- list( test_stats = list(Gt = -3.14, Ga = -10.22, Pt = -2.71, Pa = -7.89), bootstrap_distributions = matrix(rnorm(400), ncol = 4) ) class(mock_res) <- "westerlund_test" ## Calling the object directly invokes print.westerlund_test mock_res ## Or call it explicitly print(mock_res)## Example: generating a mock westerlund_test object set.seed(123) mock_res <- list( test_stats = list(Gt = -3.14, Ga = -10.22, Pt = -2.71, Pa = -7.89), bootstrap_distributions = matrix(rnorm(400), ncol = 4) ) class(mock_res) <- "westerlund_test" ## Calling the object directly invokes print.westerlund_test mock_res ## Or call it explicitly print(mock_res)
Shifts a vector forward or backward by a specified number of positions and
fills out-of-range values with NA. This helper is designed for
position-based transformations where missing values should be explicitly
propagated as NA rather than assumed to be zero.
shiftNA(v, k)shiftNA(v, k)
v |
A vector (numeric, character, etc.). |
k |
Integer. Shift order. Positive values correspond to lags
(shifting the series downward, i.e.\ |
This function performs a position-based shift of the input vector v.
Unlike strict time-indexed helpers (such as get_lag), shiftNA()
does not rely on an explicit time index and does not propagate gaps based on
timestamps. Instead, it performs a simple index shift, padding the
resulting empty slots with NA.
Let denote the length of v. The behaviour is:
k = 0: return v unchanged,
k > 0: lag by k positions; the first k elements are NA,
k < 0: lead by positions; the last elements are NA.
Formally, for k > 0,
and for k < 0,
A vector of the same length as v, containing the shifted values
with NA inserted where the shift would otherwise go out of range.
This section explains the implications of using NA padding in
recursive or difference-based calculations.
Using NA is the standard behavior in R for out-of-bounds operations.
It ensures that subsequent calculations (like v - shiftNA(v, 1))
correctly result in NA for boundary cases, preventing the
accidental use of arbitrary values (like zero) in statistical estimations
unless explicitly intended.
Unlike get_lag, shiftNA() is position-based and ignores
time indices. Use shiftNA() when you need a fast, simple shift on
vectors that are already correctly ordered.
WesterlundBootstrap,
get_lag,
get_diff
v <- c(1, 2, 3, 4, 5) ## Lag by one (k = 1) shiftNA(v, 1) # [1] NA 1 2 3 4 ## Lead by one (k = -1) shiftNA(v, -1) # [1] 2 3 4 5 NA ## Larger shifts shiftNA(v, 2) # [1] NA NA 1 2 3 shiftNA(v, -2) # [1] 3 4 5 NA NAv <- c(1, 2, 3, 4, 5) ## Lag by one (k = 1) shiftNA(v, 1) # [1] NA 1 2 3 4 ## Lead by one (k = -1) shiftNA(v, -1) # [1] 2 3 4 5 NA ## Larger shifts shiftNA(v, 2) # [1] NA NA 1 2 3 shiftNA(v, -2) # [1] 3 4 5 NA NA
Provides a detailed summary of the Westerlund (2007) panel cointegration test
results. This method extends the basic print output by including
standardized Z-scores, asymptotic p-values, bootstrap p-values (if applicable),
and the Mean Group (MG) parameter estimates.
## S3 method for class 'westerlund_test' summary(object, ...) ## S3 method for class 'summary.westerlund_test' print(x, ...)## S3 method for class 'westerlund_test' summary(object, ...) ## S3 method for class 'summary.westerlund_test' print(x, ...)
object |
An object of class |
x |
An object of class |
... |
Additional arguments passed to or from other methods. |
The summary method acts as a bridge between the raw test results and
a human-readable report. While the standard print method focuses on
the raw statistics, summary organizes the data into tables that
compare the observed statistics against their standardized counterparts and
associated p-values.
Specifically, the output includes a summary of the model settings (such as
the inclusion of constants or trends), the main test statistics table, and the
Mean Group results which highlight the average error correction speed ()
and long-run relationships (). If a bootstrap was performed, the
bootstrap p-values are appended to the main statistics table for direct
comparison with asymptotic values.
The summary method returns a list of class summary.westerlund_test
containing:
stats_table: A data frame combining raw stats, Z-scores, and p-values.
settings: A list of the parameters used in the test.
mg_results: A data frame of the Mean Group estimation results.
n_units: The number of cross-sectional units in the panel.
westerlund_test, plot.westerlund_test
## Assuming 'res' is the output from westerlund_test() # res <- westerlund_test(data, ...) # summary(res)## Assuming 'res' is the output from westerlund_test() # res <- westerlund_test(data, ...) # summary(res)
Implements the error-correction-based panel cointegration tests proposed by
Westerlund (2007). The function computes both mean-group (, )
and pooled (, ) statistics based on unit-specific ECMs, with
Stata-like input validation, time-continuity checks, and optional bootstrap
p-values for the raw statistics.
westerlund_test( data, yvar, xvars, idvar, timevar, constant = FALSE, trend = FALSE, lags, leads = NULL, westerlund = FALSE, aic = TRUE, bootstrap = -1, indiv.ecm = FALSE, lrwindow = 2, verbose = FALSE )westerlund_test( data, yvar, xvars, idvar, timevar, constant = FALSE, trend = FALSE, lags, leads = NULL, westerlund = FALSE, aic = TRUE, bootstrap = -1, indiv.ecm = FALSE, lrwindow = 2, verbose = FALSE )
data |
A |
yvar |
String. Name of the dependent variable. |
xvars |
Character vector. Names of regressors entering the long-run relationship. A maximum of 6 regressors is allowed. If |
idvar |
String. Column identifying cross-sectional units. Missing values are not allowed. |
timevar |
String. Column identifying time. Within each unit, the time index must be strictly increasing and continuous (differences of 1). |
constant |
Logical. Include an intercept in the cointegrating relationship. |
trend |
Logical. Include a linear trend in the cointegrating relationship. Setting |
lags |
Integer or length-2 integer vector. Fixed lag order or range |
leads |
Integer or length-2 integer vector, or |
westerlund |
Logical. If |
aic |
Logical. If |
bootstrap |
Integer. If |
indiv.ecm |
Logical. If |
lrwindow |
Integer. Window parameter used for long-run variance estimation in internal routines. |
verbose |
Logical. If |
The test is based on estimating unit-specific error-correction models (ECMs) and testing the null hypothesis of no error correction for all cross-sectional units. Four statistics are reported:
: mean of individual t-ratios of the error-correction coefficient,
: mean of individually scaled error-correction coefficients,
: pooled t-type statistic based on a common error-correction coefficient,
: pooled statistic based on the scaled common coefficient.
Input validation and data checks. The function enforces several Stata-like guardrails:
A maximum of 6 regressors can be specified in xvars.
If trend=TRUE, then constant=TRUE is required.
If westerlund=TRUE, at least a constant must be included and xvars must contain at most one regressor.
The panel must be declared by specifying idvar and timevar, and idvar may not contain missing values.
Rows with missing values are excluded, and continuity checks are applied to the resulting usable sample.
Inference.
If bootstrap <= 0, the function returns standardized Z-scores and
asymptotic (left-tail) p-values. If bootstrap > 0,
it additionally produces bootstrap p-values for the raw statistics.
An object of class westerlund_test. This is a list containing the following components:
test_stats: A numeric vector containing:
gt, ga, pt, pa: Raw test statistics.
gt_z, ga_z, pt_z, pa_z: Standardized Z-scores.
gt_pval, ga_pval, pt_pval, pa_pval: Asymptotic left-tail p-values.
boot_pvals: A list containing gt_pvalboot, ga_pvalboot, pt_pvalboot, pa_pvalboot
(only populated if bootstrap > 0).
bootstrap_distributions: A matrix of the bootstrap replicates for each statistic.
unit_data: A data.frame with unit-specific ECM results (alpha and beta estimates).
indiv_data: A list of length equal to the number of cross-sectional units storing unit-specific results.
mean_group: A list containing the Mean Group estimates (mg_alpha, mg_betas)
and their respective standard errors.
settings: A list of the lag/lead specifications and internal parameters used.
mg_results: A data.frame summarizing the Mean Group estimation results.
This section demonstrates how to use westerlund_test() and interprets common outputs.
The test evaluates the null hypothesis of no error correction for all cross-sectional units. Rejection suggests that at least some units exhibit error-correcting behaviour.
Unbalanced panels are permitted as long as each unit's time index is continuous after removing rows with missing values. If any unit has gaps (e.g., time jumps from 2001 to 2003), the function stops and reports the unit identifier.
## Plain (asymptotic) test
res_plain <- westerlund_test(
data = df,
yvar = "y",
xvars = c("x1","x2"),
idvar = "id",
timevar = "t",
constant = TRUE,
trend = FALSE,
lags = 1,
leads = 0,
westerlund = FALSE,
bootstrap = -1,
indiv.ecm = FALSE,
lrwindow = 2
)
Continuous time-series are required: at least one unit has holes in the time index. Fix by ensuring the usable sample is continuous.
At least ... observations are required: units are too short for the requested lag/lead orders.
If a trend is included, a constant must be included:
set constant=TRUE when trend=TRUE.
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
WesterlundPlain,
WesterlundBootstrap,
DisplayWesterlund
Internal reporting helper that prints two Stata-style coefficient tables:
(1) the mean-group error-correction model (short-run ECM output) and
(2) the estimated long-run relationship with short-run adjustment.
This function delegates all table computation and formatting to
westerlund_test_reg.
westerlund_test_mg(b, V, b2, V2, auto, verbose = FALSE)westerlund_test_mg(b, V, b2, V2, auto, verbose = FALSE)
b |
A numeric vector of coefficients for the long-run relationship and short-run adjustment terms (as reported in the second table). |
V |
A numeric variance-covariance matrix corresponding to |
b2 |
A numeric vector of coefficients for the mean-group error-correction model (as reported in the first table). |
V2 |
A numeric variance-covariance matrix corresponding to |
auto |
Logical/integer. If non-zero, prints a note indicating that short-run coefficients (apart from the error-correction term) are omitted because selected lag/lead lengths may differ across units. |
verbose |
Logical. If |
What it prints. The function prints two sections:
Mean-group error-correction model: This corresponds to reporting the mean-group ECM coefficients (often denoted b2 with variance V2). If auto is non-zero, a note is printed to remind that short-run coefficients may be omitted due to unit-specific lag/lead selection.
Estimated long-run relationship and short run adjustment: This corresponds to reporting the long-run relationship and the adjustment dynamics (b with variance V).
In each section, westerlund_test_reg is called to compute standard
errors, z-statistics, two-sided p-values, and 95% confidence intervals, and to
print the formatted coefficient table using stats::printCoefmat().
Intended use. This is an internal display helper used in Stata-aligned reporting flows. It is not required for computing the Westerlund test statistics themselves.
A named list containing two matrices:
mg_model |
A numeric matrix containing the Mean-group ECM results (coefficients, SE, z, p-values, and CI). |
long_run |
A numeric matrix containing the long-run relationship and adjustment results. |
This section shows the expected calling pattern for westerlund_test_mg()
and explains the role of the auto flag.
In mean-group implementations, it is common to show:
a short-run ECM table (possibly with omitted coefficients when lag/lead lengths differ by unit),
a long-run relationship table with the short-run adjustment term.
westerlund_test_mg() prints these tables sequentially.
## Long-run relationship table inputs b <- c(alpha = -0.20, beta = 1.05) V <- diag(c(0.03^2, 0.10^2)) rownames(V) <- colnames(V) <- names(b) ## Mean-group ECM table inputs b2 <- c(ec = -0.18) V2 <- matrix(0.04^2, nrow = 1, ncol = 1) rownames(V2) <- colnames(V2) <- names(b2) ## Print both sections (auto = TRUE prints the omission note) westerlund_test_mg(b, V, b2, V2, auto = TRUE, verbose = FALSE)
westerlund_test_reg,
westerlund_test
Internal helper that calculates standard errors, z-statistics, p-values, and confidence intervals for a given coefficient vector and covariance matrix, then prints a Stata-style regression table.
westerlund_test_reg(b, V, verbose = FALSE)westerlund_test_reg(b, V, verbose = FALSE)
b |
A named numeric vector of coefficients. |
V |
A numeric variance-covariance matrix corresponding to |
verbose |
Logical. If |
Calculation Logic.
The function replicates the behavior of Stata's ereturn post when no
degrees of freedom are specified, using the standard normal distribution (Z)
for all inference:
Standard Errors: Derived as the square root of the diagonal of V.
z-statistics: Calculated as .
P-values: Two-tailed probabilities from the standard normal distribution.
Confidence Intervals: Calculated as .
Formatting.
The table is printed using stats::printCoefmat() to ensure clean
alignment and decimal consistency. It includes columns for the Coefficient,
Standard Error, z-statistic, P-value, and the 95% Confidence Interval.
Intended use.
This is an internal utility called by westerlund_test_mg. It
provides a standardized way to report results across different parts of the
Westerlund cointegration test output.
This function returns a numeric matrix with rows corresponding to the
coefficients in b and columns for "Coef.", "Std. Err.", "z", "P>|z|",
and the 95% confidence interval bounds.
This section describes the alignment with econometric software reporting standards.
The output format (column headers and statistical assumptions) is designed to
match the output seen in Stata's xtcpmg or xtwest routines. This
ensures that users transitioning from or comparing results with Stata find the
output familiar.
westerlund_test_mg,
westerlund_test
## Define simple coefficient vector and VCV b <- c(alpha = -0.20, beta = 1.05) V <- diag(c(0.03^2, 0.10^2)) names(b) <- rownames(V) <- colnames(V) <- c("alpha", "beta") ## Print the formatted table westerlund_test_reg(b, V, verbose = TRUE)## Define simple coefficient vector and VCV b <- c(alpha = -0.20, beta = 1.05) V <- diag(c(0.03^2, 0.10^2)) names(b) <- rownames(V) <- colnames(V) <- c("alpha", "beta") ## Print the formatted table westerlund_test_reg(b, V, verbose = TRUE)
Internal bootstrap routine for generating a bootstrap distribution of the
Westerlund (2007) ECM-based panel cointegration statistics under the null
hypothesis of no error correction. The routine estimates short-run dynamics for
each unit, constructs residual-based bootstrap samples in a Stata-aligned manner,
and re-computes , , , and by calling
WesterlundPlain on each bootstrap sample.
WesterlundBootstrap( data, touse, idvar, timevar, yvar, xvars, constant = FALSE, trend = FALSE, lags, leads = NULL, westerlund = FALSE, aic = TRUE, bootstrap = 100, indiv.ecm = FALSE, lrwindow = 2, verbose = FALSE )WesterlundBootstrap( data, touse, idvar, timevar, yvar, xvars, constant = FALSE, trend = FALSE, lags, leads = NULL, westerlund = FALSE, aic = TRUE, bootstrap = 100, indiv.ecm = FALSE, lrwindow = 2, verbose = FALSE )
data |
A |
touse |
Logical vector of length |
idvar |
String. Column identifying cross-sectional units. |
timevar |
String. Column identifying time. Within-unit time gaps are handled by Stata-style differencing in the bootstrap setup. |
yvar |
String. Name of the dependent variable (levels) in the original data. |
xvars |
Character vector. Names of regressors (levels) in the original data. |
constant |
Logical. Include a constant term in the short-run regression used to obtain bootstrap residuals and coefficients. |
trend |
Logical. Include a trend term in the short-run regression used to obtain bootstrap residuals and coefficients. |
lags |
Integer or length-2 integer vector. Fixed lag order or range |
leads |
Integer or length-2 integer vector, or |
westerlund |
Logical. If |
aic |
Logical. If |
bootstrap |
Integer. Number of bootstrap replications. |
indiv.ecm |
Logical. If |
lrwindow |
Integer. Bartlett kernel window (maximum lag) forwarded to |
verbose |
Logical. If |
Purpose and status.
WesterlundBootstrap() is an internal engine typically called by
westerlund_test when bootstrap inference is requested.
It returns a matrix of bootstrap test statistics that can be used to compute
bootstrap p-values for the raw statistics.
High-level algorithm. For each bootstrap replication, the routine:
Filters the original data using touse and removes missing yvar/xvars.
For each unit, estimates a short-run model for using Stata-style differencing that respects time gaps:
if is not consecutive, the corresponding difference is set to missing.
Stores the estimated coefficients for lagged and leads/lags/current , and constructs residuals .
Demeans residuals within each unit and constructs within-unit centered differenced regressors on the residual-support rows.
Resamples residual-support clusters to form bootstrap innovations, recursively generates bootstrap with an AR recursion implied by the estimated coefficients, integrates to levels to obtain bootstrap booty and bootx series, and enforces Stata-style truncation rules.
Calls WesterlundPlain on the constructed bootstrap panel to compute , , , and .
The resulting statistics are stored row-by-row in BOOTSTATS.
Time indexing in bootstrap setup.
Unlike the main estimation routines (which use strict time-indexed helpers), this bootstrap routine uses a local differencing helper diff_ts() that mimics Stata's D. operator under gaps: differences are marked NA when diff(timevec) != 1.
Lag/lead selection in the bootstrap setup.
If lags and/or leads are supplied as ranges, the routine performs an information-criterion search over candidate short-run specifications for each unit. In westerlund=TRUE mode, a Westerlund-style criterion is used; otherwise, a standard AIC-based criterion is applied.
Random number generation.
The routine sets the RNG kind to Mersenne-Twister and uses a deterministic sequence given the current RNG state. Users should set set.seed() upstream if reproducibility is desired.
A list containing:
BOOTSTATS: a numeric matrix with bootstrap rows and 4 columns,
containing bootstrap replications of , , , and
(in that order).
This section describes how WesterlundBootstrap() constructs bootstrap
samples and how it is typically used in the overall testing workflow.
westerlund_test()?When the user-facing westerlund_test is called with bootstrap > 0, it typically:
obtains a bootstrap distribution via WesterlundBootstrap(),
computes the observed statistics via WesterlundPlain,
derives bootstrap p-values by comparing observed raw statistics to the bootstrap distribution.
The routine first estimates a short-run model for within each
unit and extracts residuals. It then resamples these residuals (clustered on the
residual-support rows) and uses the stored short-run coefficients to propagate
bootstrap dynamics.
To mirror Stata semantics in differencing, the bootstrap setup uses a local difference rule: if two adjacent observations are not exactly one time unit apart, the difference is set to missing for that location.
## Upstream: set a seed for reproducibility
set.seed(123)
boot_res <- WesterlundBootstrap(
data = df,
touse = touse,
idvar = "id",
timevar = "t",
yvar = "y",
xvars = c("x1","x2"),
constant = TRUE,
trend = FALSE,
lags = 1,
leads = 0,
westerlund = FALSE,
bootstrap = 399,
indiv.ecm = TRUE,
lrwindow = 2,
verbose = FALSE
)
## Example left-tail p-value for Gt (assuming 'obs' exists)
# mean(boot_res$BOOTSTATS[,1] <= obs$Gt, na.rm = TRUE)
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
WesterlundPlain
Internal plain (non-bootstrap) routine for computing the four Westerlund (2007)
ECM-based panel cointegration test statistics , , ,
and . The function estimates unit-specific ECM regressions to form the
mean-group statistics and then constructs pooled (panel) statistics using
cross-unit aggregation and partialling-out steps. Time indexing is handled
strictly via gap-aware lag/difference helpers.
WesterlundPlain( data, touse, idvar, timevar, yvar, xvars, constant = FALSE, trend = FALSE, lags, leads = NULL, lrwindow = 2, westerlund = FALSE, aic = TRUE, bootno = FALSE, indiv.ecm = FALSE, verbose = FALSE )WesterlundPlain( data, touse, idvar, timevar, yvar, xvars, constant = FALSE, trend = FALSE, lags, leads = NULL, lrwindow = 2, westerlund = FALSE, aic = TRUE, bootno = FALSE, indiv.ecm = FALSE, verbose = FALSE )
data |
A |
touse |
Logical vector of length |
idvar |
String. Column identifying cross-sectional units. |
timevar |
String. Column identifying time. |
yvar |
String. Name of the dependent variable (levels). |
xvars |
Character vector. Names of regressors in the long-run relationship (levels). |
constant |
Logical. If |
trend |
Logical. If |
lags |
Integer or length-2 integer vector. Fixed lag order or range |
leads |
Integer or length-2 integer vector, or |
lrwindow |
Integer. Bartlett kernel window (maximum lag) used in long-run variance calculations via |
westerlund |
Logical. If |
aic |
Logical. If |
bootno |
Logical. If |
indiv.ecm |
Logical. If |
verbose |
Logical. If |
Purpose and status.
WesterlundPlain() is typically called internally by westerlund_test.
It returns the four raw test statistics and lag/lead diagnostics needed
for printing and standardization.
Workflow overview. The routine proceeds in two main stages:
Unit-specific ECM regressions (Loop 1): For each cross-sectional unit, it constructs an ECM with
as the dependent variable and includes deterministic terms (optional), ,
, lagged , and leads/lags of . Lags and leads are computed using
strict time-indexed helpers (get_lag, get_diff), which respect gaps in the time index.
If lags and/or leads are provided as ranges, an information-criterion search selects the
lag/lead orders for each unit. The routine stores the unit-level error-correction estimate
and its standard error.
Pooled (panel) aggregation (Loop 2): Using the mean of selected lag/lead orders across units,
the routine constructs pooled quantities needed for and via partialling-out regressions
and cross-unit aggregation of residual products.
Long-run variance calculations.
Long-run variances are computed using calc_lrvar_bartlett with
maxlag = lrwindow. In westerlund=TRUE mode, the routine applies
Stata-like trimming at the start/end of the differenced series based on selected
lags/leads prior to long-run variance estimation.
Returned statistics.
Let denote the unit-specific error-correction coefficient
on (as constructed in the ECM), with standard error .
The routine computes:
: the mean of the individual t-ratios ,
: a scaled mean-group statistic using a unit-specific normalization factor derived from long-run variances,
: a pooled t-type statistic based on a pooled and its pooled standard error,
: a pooled scaled statistic using an average effective time dimension.
A nested list containing:
stats: A list of the four raw Westerlund test statistics:
Gt: Mean-group tau statistic.
Ga: Mean-group alpha statistic.
Pt: Pooled tau statistic.
Pa: Pooled alpha statistic.
indiv_data: A named list where each element corresponds to a cross-sectional unit (ID), containing:
ai: The estimated speed of adjustment (alpha).
seai: The standard error of alpha (adjusted for degrees of freedom).
betai: Vector of long-run coefficients ().
blag, blead: The lags and leads selected for that specific unit.
ti: Raw observation count for the unit.
tnorm: Degrees of freedom used for normalization.
reg_coef: If indiv.ecm = TRUE, the full coefficient matrix from westerlund_test_reg.
results_df: A summary data.frame containing all unit-level results in vectorized format.
settings: A list of routine metadata:
constant: Logical indicating if a constant was included.
trend: Logical indicating if a trend was included.
meanlag, meanlead: Integer averages of the selected unit lags/leads.
realmeanlag, realmeanlead: Numeric averages of the selected unit lags/leads.
auto: Logical; TRUE if automatic selection (ranges) was used.
Loop 1 (mean-group) estimates unit-specific ECMs. Each unit produces an
estimated error-correction coefficient on and an associated standard
error. These are aggregated into and .
Loop 2 (pooled) fixes a common short-run structure based on the average
selected lag/lead orders and constructs pooled residual products to obtain and .
All lags and differences are computed using strict time-based helpers
(get_lag, get_diff). This ensures that gaps in the
time index propagate as missing values rather than shifting across gaps.
Westerlund, J. (2007). Testing for error correction in panel data. Oxford Bulletin of Economics and Statistics, 69(6), 709–748.
westerlund_test,
WesterlundBootstrap,
get_lag,
get_diff,
calc_lrvar_bartlett
set.seed(123) N <- 5 T <- 20 df <- data.frame( id = rep(1:N, each = T), t = rep(1:T, N), y = rnorm(N * T), x1 = rnorm(N * T), x2 = rnorm(N * T) ) touse <- rep(TRUE, nrow(df)) plain_res <- WesterlundPlain( data = df, touse = touse, idvar = "id", timevar = "t", yvar = "y", xvars = c("x1","x2"), lags = 1, leads = 0 ) # Accessing results from the nested structure: stats <- plain_res$stats print(c(Gt = stats$Gt, Ga = stats$Ga, Pt = stats$Pt, Pa = stats$Pa)) # Checking unit-specific coefficients for ID '101' unit_101 <- plain_res$indiv_data[["101"]] print(unit_101$ai)set.seed(123) N <- 5 T <- 20 df <- data.frame( id = rep(1:N, each = T), t = rep(1:T, N), y = rnorm(N * T), x1 = rnorm(N * T), x2 = rnorm(N * T) ) touse <- rep(TRUE, nrow(df)) plain_res <- WesterlundPlain( data = df, touse = touse, idvar = "id", timevar = "t", yvar = "y", xvars = c("x1","x2"), lags = 1, leads = 0 ) # Accessing results from the nested structure: stats <- plain_res$stats print(c(Gt = stats$Gt, Ga = stats$Ga, Pt = stats$Pt, Pa = stats$Pa)) # Checking unit-specific coefficients for ID '101' unit_101 <- plain_res$indiv_data[["101"]] print(unit_101$ai)