| Title: | Sequential Threshold-Spatial-Quantile Panel Estimation |
|---|---|
| Description: | Implements a sequential panel estimation protocol for regional economic panels that combines three estimation layers in a fixed order. The first layer applies a two-way fixed effects baseline. The second layer applies the panel threshold regression method of Hansen (1999) <doi:10.1016/S0304-4076(99)00025-1> to identify structural breaks at an unknown threshold of a moderating variable, with bootstrap inference following Hansen (2000) <doi:10.1111/1468-0262.00124>. The third layer applies a spatial Durbin model with an impact decomposition following LeSage and Pace (2009, ISBN:978-1-4200-6424-7) to quantify direct and indirect spillover effects. The fourth layer applies the two-step panel quantile estimator of Canay (2011) <doi:10.1111/j.1368-423X.2011.00349.x> to document distributional heterogeneity in the outcome. The threshold identified in the second layer defines a subsample used as structured input to the fourth layer, and a consistency check evaluates whether the three sets of results are jointly compatible with a common underlying structural relationship. An illustrative panel of 33 districts of the state of Maharashtra, India, observed over 10 agricultural years, is included with the package. |
| Authors: | Prakash Vhankade [aut, cre] (ORCID: <https://orcid.org/0009-0005-2436-0603>) |
| Maintainer: | Prakash Vhankade <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.2 |
| Built: | 2026-06-22 19:40:36 UTC |
| Source: | https://github.com/cran/tsqr |
Implements the Canay (2011) two-step panel quantile estimator. In the first step, unit and time fixed effects are estimated from a standard two-way fixed effects model and subtracted from the outcome. In the second step, standard quantile regression is applied to the corrected outcome at each requested quantile.
canay_quantile( formula, data, index, taus = c(0.1, 0.25, 0.5, 0.75, 0.9), n_boot = 500, seed = 42, ptr_result = NULL, threshold_var = NULL, verbose = TRUE )canay_quantile( formula, data, index, taus = c(0.1, 0.25, 0.5, 0.75, 0.9), n_boot = 500, seed = 42, ptr_result = NULL, threshold_var = NULL, verbose = TRUE )
formula |
A formula of the form |
data |
A data frame. |
index |
A character vector of length 2 giving unit and time identifiers. |
taus |
Numeric vector of quantile levels. Default is
|
n_boot |
Integer. Number of bootstrap replications for standard
errors. Default is |
seed |
Integer. Random seed for reproducibility. Default is |
ptr_result |
Optional object of class |
threshold_var |
Character string naming the threshold variable
used to define the high-regime subsample. Required if |
verbose |
Logical. Default is |
The returned list contains coefs (a data frame of quantile-specific
coefficients for all variables and all taus), se (a matching data
frame of bootstrap standard errors), slope_tests (a data frame of
pairwise slope equality z-tests for the key variable), gradient (a
named numeric vector giving the coefficient on the key variable at
each quantile), gradient_highstress (as gradient for the
high-regime subsample, or NULL if ptr_result was not supplied),
fe_corrected (the fixed-effect-corrected outcome vector), and call
(the matched function call).
A list of class canay_quantile containing the quantile
coefficients, bootstrap standard errors, pairwise slope equality
tests, and (optionally) the high-stress subsample results. See
Details.
Canay IA (2011). "A Simple Approach to Quantile Regression for Panel Data." Econometrics Journal, 14(3), 368-386.
data(maharashtra_panel) result <- canay_quantile( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), taus = c(0.25, 0.50, 0.75), n_boot = 50, seed = 42, verbose = FALSE ) print(result)data(maharashtra_panel) result <- canay_quantile( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), taus = c(0.25, 0.50, 0.75), n_boot = 50, seed = 42, verbose = FALSE ) print(result)
A balanced panel dataset covering 33 districts of the state of Maharashtra, India, over 10 agricultural years from 2014-15 to 2023-24 (330 observations). This dataset is the empirical illustration used throughout this package.
data(maharashtra_panel)data(maharashtra_panel)
A data frame with 330 rows and 30 variables:
District name.
District numeric identifier, 1 to 33.
Calendar year, 2014 to 2023.
Agricultural year label, e.g. "2014-15".
Gross District Value Added from Agriculture and Allied Sectors, constant prices, in INR crore.
Natural log of GDVA_agri_crore.
Farm household income proxy, INR thousands.
Area under high-yielding crop varieties, thousand hectares.
Natural log of HYV_area_000ha.
Annual rainfall, millimetres.
Natural log of Rainfall_mm.
Mean annual temperature, degrees Celsius.
Standardised within-district temperature anomaly.
Maximum Consecutive Dry Spell, days.
Standardised within-district rainfall anomaly.
Pesticide consumption, metric tonnes.
NPK fertiliser consumption, metric tonnes.
Share of gross cropped area under irrigation.
Variable input cost per hectare, INR.
Agricultural training intensity.
Agricultural training expenditure, INR lakhs.
Extension worker density per 1000 farm households.
Number of primary agricultural credit societies.
Number of agricultural market yards.
Rural road density.
Average operational holding size, hectares.
Share of holdings below 2 hectares.
Cropping intensity index.
Crop concentration index, 0 to 1.
Share of gross cropped area that is rainfed.
Compiled from public secondary government and research institution sources for the state of Maharashtra, India.
data(maharashtra_panel) head(maharashtra_panel)data(maharashtra_panel) head(maharashtra_panel)
Estimates a two-regime panel threshold regression model following Hansen (1999, 2000), with two-way fixed effects, grid search for the optimal threshold, bootstrap likelihood ratio test, and confidence interval construction by likelihood ratio inversion.
ptr_hansen( formula, data, threshold_var, index, trim = c(0.15, 0.85), n_boot = 300, seed = 42, verbose = TRUE )ptr_hansen( formula, data, threshold_var, index, trim = c(0.15, 0.85), n_boot = 300, seed = 42, verbose = TRUE )
formula |
A formula of the form |
data |
A data frame containing all variables in |
threshold_var |
Character string naming the threshold variable (the moderating variable hypothesised to govern the structural break). |
index |
A character vector of length 2 giving the unit and time
identifiers, e.g. |
trim |
A numeric vector of length 2 giving the lower and upper
percentile trimming bounds for the threshold grid. Default is
|
n_boot |
Integer. Number of bootstrap replications for the likelihood
ratio test. Default is |
seed |
Integer. Random seed for reproducibility. Default is |
verbose |
Logical. If |
The returned list contains the following elements: threshold (the
optimal threshold estimate), ci (a named numeric vector with lower
and upper bounds of the 95 percent confidence interval), LR_stat
(the likelihood ratio statistic), p_value (the bootstrap p-value for
the null of no threshold), cv (a named vector of bootstrap critical
values at the 10, 5, and 1 percent levels), coef_regime1 and
coef_regime2 (the coefficient on the key variable in each regime),
regime_shift (the difference between the two regime coefficients),
ssr_grid (a data frame of threshold candidates and their sum of
squared residuals), LR_boot (the vector of bootstrap likelihood ratio
statistics), fit_regime1 and fit_regime2 (the fitted regime models),
and call (the matched function call).
A list of class ptr_hansen containing the threshold estimate,
confidence interval, likelihood ratio statistic, bootstrap p-value,
regime-specific coefficients, and the fitted regime models. See Details.
Hansen BE (1999). "Threshold Effects in Non-Dynamic Panels: Estimation, Testing, and Inference." Journal of Econometrics, 93(2), 345-368.
Hansen BE (2000). "Sample Splitting and Threshold Estimation." Econometrica, 68(3), 575-603.
data(maharashtra_panel) result <- ptr_hansen( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, threshold_var = "MCDS_days", index = c("district", "year"), n_boot = 50, seed = 42, verbose = FALSE ) print(result)data(maharashtra_panel) result <- ptr_hansen( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, threshold_var = "MCDS_days", index = c("district", "year"), n_boot = 50, seed = 42, verbose = FALSE ) print(result)
Constructs a k-nearest neighbour spatial weights matrix, runs pre-tests for spatial dependence, estimates a panel Spatial Durbin Model (SDM), and computes the direct, indirect, and total impact decomposition following LeSage and Pace (2009).
sdm_impact(formula, data, index, coords, k = 4, verbose = TRUE)sdm_impact(formula, data, index, coords, k = 4, verbose = TRUE)
formula |
A formula of the form |
data |
A data frame. |
index |
A character vector of length 2 giving unit and time identifiers. |
coords |
A matrix or data frame with two columns giving the
longitude and latitude (or x and y) coordinates of each unique
cross-sectional unit, in the same order as the sorted unique values
of |
k |
Integer. Number of nearest neighbours for the spatial weights
matrix. Default is |
verbose |
Logical. Default is |
The returned list contains W (the row-standardised spatial weights
matrix), impacts (a data frame with Direct, Indirect, and Total
columns for each variable), spillover_ratio (the Indirect over Total
ratio for the key variable), rho (the spatial autoregressive
coefficient), moran_y and moran_x (Moran's I test results on the
cross-sectional means of the outcome and key variable), fit (the
fitted spatial model object), and call (the matched function call).
A list of class sdm_impact containing the spatial weights
matrix, the impact decomposition table, the spillover ratio, the
spatial autoregressive coefficient, Moran's I test results, and the
fitted model object. See Details.
LeSage J, Pace RK (2009). Introduction to Spatial Econometrics. CRC Press, Boca Raton.
## Not run: # sdm_impact() requires coordinate columns not included in the # bundled maharashtra_panel dataset. Supply your own coordinates: data(maharashtra_panel) units <- sort(unique(maharashtra_panel$district)) coords <- data.frame( longitude = stats::runif(length(units), 73, 80), latitude = stats::runif(length(units), 16, 22) ) result <- sdm_impact( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), coords = coords, k = 4 ) print(result) ## End(Not run)## Not run: # sdm_impact() requires coordinate columns not included in the # bundled maharashtra_panel dataset. Supply your own coordinates: data(maharashtra_panel) units <- sort(unique(maharashtra_panel$district)) coords <- data.frame( longitude = stats::runif(length(units), 73, 80), latitude = stats::runif(length(units), 16, 22) ) result <- sdm_impact( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), coords = coords, k = 4 ) print(result) ## End(Not run)
Evaluates whether the regime shift identified by ptr_hansen(), the
spillover ratio identified by sdm_impact(), and the distributional
gradient identified by canay_quantile() are mutually consistent with
a common underlying structural relationship.
tsq_consistency_check(ptr_result, sdm_result, qr_result, verbose = TRUE)tsq_consistency_check(ptr_result, sdm_result, qr_result, verbose = TRUE)
ptr_result |
Object of class |
sdm_result |
Object of class |
qr_result |
Object of class |
verbose |
Logical. Default is |
Criterion 1 checks whether the regime shift and the spillover ratio have the same sign. Criterion 2 checks whether the distributional gradient is steeper in the high-stress subsample than in the full sample. Criterion 3 checks whether the full-sample gradient is monotonically declining from the lowest to the highest quantile.
A list of class tsq_consistency with logical elements
criterion1, criterion2, criterion3, and all_pass, together
with a summary_table data frame. See Details.
ptr_mock <- structure( list(threshold = 23, regime_shift = 0.30, coef_regime1 = 0.346, coef_regime2 = 0.646), class = "ptr_hansen" ) sdm_mock <- structure( list(spillover_ratio = 0.467, rho = 0.31), class = "sdm_impact" ) qr_mock <- structure( list(gradient = c(Q10 = 0.57, Q50 = 0.55, Q90 = 0.52), gradient_highstress = c(Q10 = 0.62, Q50 = 0.56, Q90 = 0.50), taus = c(0.10, 0.50, 0.90), key_var = "ln_HYV"), class = "canay_quantile" ) check <- tsq_consistency_check(ptr_mock, sdm_mock, qr_mock, verbose = FALSE) print(check)ptr_mock <- structure( list(threshold = 23, regime_shift = 0.30, coef_regime1 = 0.346, coef_regime2 = 0.646), class = "ptr_hansen" ) sdm_mock <- structure( list(spillover_ratio = 0.467, rho = 0.31), class = "sdm_impact" ) qr_mock <- structure( list(gradient = c(Q10 = 0.57, Q50 = 0.55, Q90 = 0.52), gradient_highstress = c(Q10 = 0.62, Q50 = 0.56, Q90 = 0.50), taus = c(0.10, 0.50, 0.90), key_var = "ln_HYV"), class = "canay_quantile" ) check <- tsq_consistency_check(ptr_mock, sdm_mock, qr_mock, verbose = FALSE) print(check)
Runs the full four-step sequential panel estimation protocol: a
two-way fixed effects baseline, ptr_hansen(), sdm_impact(), and
canay_quantile(), followed by tsq_consistency_check().
tsq_panel( formula, data, index, threshold_var, coords, k = 4, taus = c(0.1, 0.25, 0.5, 0.75, 0.9), trim = c(0.15, 0.85), n_boot_ptr = 300, n_boot_qr = 500, seed = 42, verbose = TRUE )tsq_panel( formula, data, index, threshold_var, coords, k = 4, taus = c(0.1, 0.25, 0.5, 0.75, 0.9), trim = c(0.15, 0.85), n_boot_ptr = 300, n_boot_qr = 500, seed = 42, verbose = TRUE )
formula |
A formula of the form |
data |
A data frame. |
index |
A character vector of length 2: |
threshold_var |
Character string naming the threshold variable
for |
coords |
A matrix or data frame with coordinate columns, one row per unique cross-sectional unit in sorted order. Required for the spatial step. |
k |
Integer. Nearest neighbours for the spatial weights matrix.
Default is |
taus |
Numeric vector of quantile levels. Default is
|
trim |
Numeric vector of length 2 for threshold grid trimming.
Default is |
n_boot_ptr |
Integer. Bootstrap replications for the threshold
test. Default is |
n_boot_qr |
Integer. Bootstrap replications for quantile standard
errors. Default is |
seed |
Integer. Random seed. Default is |
verbose |
Logical. Default is |
A list of class tsq_panel containing step1_baseline,
step2_ptr, step3_sdm, step4_qr, consistency, and call.
## Not run: # This example requires coordinate data not included in the bundled # maharashtra_panel dataset; see ?sdm_impact for a worked example. data(maharashtra_panel) units <- sort(unique(maharashtra_panel$district)) coords <- data.frame( longitude = stats::runif(length(units), 73, 80), latitude = stats::runif(length(units), 16, 22) ) result <- tsq_panel( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), threshold_var = "MCDS_days", coords = coords, n_boot_ptr = 50, n_boot_qr = 50 ) print(result) ## End(Not run)## Not run: # This example requires coordinate data not included in the bundled # maharashtra_panel dataset; see ?sdm_impact for a worked example. data(maharashtra_panel) units <- sort(unique(maharashtra_panel$district)) coords <- data.frame( longitude = stats::runif(length(units), 73, 80), latitude = stats::runif(length(units), 16, 22) ) result <- tsq_panel( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, index = c("district", "year"), threshold_var = "MCDS_days", coords = coords, n_boot_ptr = 50, n_boot_qr = 50 ) print(result) ## End(Not run)
Produces diagnostic plots for TSQ protocol results: a likelihood ratio profile for threshold identification, a spillover decomposition bar chart, or a distributional gradient plot across quantiles.
tsq_plot(x, type = "all", ...)tsq_plot(x, type = "all", ...)
x |
An object of class |
type |
Character. One of |
... |
Currently unused. |
A ggplot object, or a named list of ggplot objects when
type = "all" and x is of class tsq_panel.
data(maharashtra_panel) ptr <- ptr_hansen( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, threshold_var = "MCDS_days", index = c("district", "year"), n_boot = 50, verbose = FALSE ) p <- tsq_plot(ptr, type = "threshold")data(maharashtra_panel) ptr <- ptr_hansen( formula = ln_GDVA ~ ln_HYV + MCDS_days + Irrigation_pct, data = maharashtra_panel, threshold_var = "MCDS_days", index = c("district", "year"), n_boot = 50, verbose = FALSE ) p <- tsq_plot(ptr, type = "threshold")