Title: | Joint Estimation of Latent Groups and Group-Specific Coefficients in Panel Data Models |
---|---|
Description: | Latent group structures are a common challenge in panel data analysis. Disregarding group-level heterogeneity can introduce bias. Conversely, estimating individual coefficients for each cross-sectional unit is inefficient and may lead to high uncertainty. This package addresses the issue of unobservable group structures by implementing the pairwise adaptive group fused Lasso (PAGFL) by Mehrabani (2023) <doi:10.1016/j.jeconom.2022.12.002>. PAGFL identifies latent group structures and group-specific coefficients in a single step. On top of that, we extend the PAGFL to time-varying coefficient functions. |
Authors: | Paul Haimerl [aut, cre] , Stephan Smeekes [ctb] , Ines Wilms [ctb] , Ali Mehrabani [ctb] |
Maintainer: | Paul Haimerl <[email protected]> |
License: | AGPL (>= 3) |
Version: | 1.1.2 |
Built: | 2024-12-11 06:43:48 UTC |
Source: | CRAN |
Estimate a grouped panel data model given an observed group structure. Slope parameters are homogeneous within groups but heterogeneous across groups. This function supports both static and dynamic panel data models, with or without endogenous regressors.
    grouped_plm(
      formula, data, groups, index = NULL, n_periods = NULL, method = "PLS",
      Z = NULL, bias_correc = FALSE,
      rho = 0.07 * log(N * n_periods) / sqrt(N * n_periods),
      verbose = TRUE, parallel = TRUE, ...
    )

    ## S3 method for class 'gplm'
    print(x, ...)

    ## S3 method for class 'gplm'
    formula(x, ...)

    ## S3 method for class 'gplm'
    df.residual(object, ...)

    ## S3 method for class 'gplm'
    summary(object, ...)

    ## S3 method for class 'gplm'
    coef(object, ...)

    ## S3 method for class 'gplm'
    residuals(object, ...)

    ## S3 method for class 'gplm'
    fitted(object, ...)
formula |
a formula object describing the model to be estimated. |
data |
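a `data.frame` or `matrix` holding the panel data set. |
groups |
a numerical or character vector of length $N$ that indicates the group membership of each cross-sectional unit $i$. |
index |
a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string denotes the name of the variable declaring the time period $t$. |
n_periods |
the number of observed time periods $T$. Can be omitted when `index` is supplied. |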
method |
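the estimation method. Options are `"PLS"` (penalized least squares) and `"PGMM"` (penalized generalized method of moments, which requires instruments `Z`). Default is `"PLS"`. |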
Z |
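a `matrix` or `data.frame` of exogenous instruments. Only required when `method = "PGMM"`. Default is `NULL`. |
bias_correc |
logical. If `TRUE`, a split-panel jackknife bias correction following Dhaene and Jochmans (2015) is applied to the slope parameters. Default is `FALSE`. |
rho |
a tuning parameter balancing the fitness and penalty terms in the IC. If left unspecified, the heuristic `rho = 0.07 * log(N * n_periods) / sqrt(N * n_periods)` is used. |
verbose |
logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`. |
parallel |
logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`. |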
... |
ellipsis |
x |
of class `gplm`. |
object |
of class `gplm`. |
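To make the default `rho` heuristic from the usage signature concrete, the following base R sketch evaluates it for a hypothetical balanced panel. The dimensions `N = 20` and `n_periods = 80` are assumptions chosen only for illustration.

```r
# Hypothetical panel dimensions (assumed for illustration)
N <- 20
n_periods <- 80

# Default heuristic for the IC tuning parameter, as written in the usage signature
rho <- 0.07 * log(N * n_periods) / sqrt(N * n_periods)
rho
```

The value shrinks as the panel grows, so the penalty term in the IC carries less weight in larger samples.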
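Consider the grouped panel data model

$$y_{it} = \gamma_i + \beta_i' x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $y_{it}$ is the scalar dependent variable, $\gamma_i$ is an individual fixed effect, $x_{it}$ is a $p \times 1$ vector of explanatory variables, and $\epsilon_{it}$ is a zero mean error. The coefficient vector $\beta_i$ is subject to the observed group pattern

$$\beta_i = \sum_{k=1}^{K} \alpha_k \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$.

Using PLS, the group-specific coefficients for group $k$ are obtained via OLS

$$\hat{\alpha}_k = \left( \sum_{i \in G_k} \sum_{t=1}^{T} \tilde{x}_{it} \tilde{x}_{it}' \right)^{-1} \sum_{i \in G_k} \sum_{t=1}^{T} \tilde{x}_{it} \tilde{y}_{it},$$

where $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t=1}^{T} a_{it}$, $a = \{y, x\}$, to concentrate out the individual fixed effects $\gamma_i$ (within-transformation).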
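The PLS step above can be sketched in base R: demean each unit's data over time, then run OLS separately for each group. This is a minimal illustration on simulated data, not the package's implementation; all object names (`alpha`, `i_index`, `demean`) and the data-generating values are assumptions.

```r
set.seed(1)
N <- 6; n_periods <- 30; p <- 2
groups <- rep(1:2, each = 3)                  # observed group of each unit
alpha <- rbind(c(1, -1), c(-2, 0.5))          # K x p group-specific slopes (assumed)
i_index <- rep(seq_len(N), each = n_periods)  # long format, ordered by unit

X <- matrix(rnorm(N * n_periods * p), ncol = p)
gamma <- rnorm(N)                             # individual fixed effects
y <- gamma[i_index] + rowSums(X * alpha[groups[i_index], ]) +
  rnorm(N * n_periods, sd = 0.1)

# Within-transformation: subtract each unit's time average
demean <- function(v) v - ave(v, i_index)
y_tilde <- demean(y)
X_tilde <- apply(X, 2, demean)

# Group-wise OLS on the demeaned data; row k holds the estimate for group k
alpha_hat <- t(sapply(sort(unique(groups)), function(k) {
  rows <- groups[i_index] == k
  Xk <- X_tilde[rows, , drop = FALSE]
  drop(solve(crossprod(Xk), crossprod(Xk, y_tilde[rows])))
}))
alpha_hat
```

Because the fixed effects are removed by demeaning, `alpha_hat` should lie close to the assumed `alpha` even though `gamma` is never estimated.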
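In case of PGMM, the slope coefficients are derived as

$$\hat{\alpha}_k = \left( \left[ \sum_{i \in G_k} T^{-1} \sum_{t=1}^{T} z_{it} \Delta x_{it}' \right]' W_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t=1}^{T} z_{it} \Delta x_{it}' \right] \right)^{-1} \left[ \sum_{i \in G_k} T^{-1} \sum_{t=1}^{T} z_{it} \Delta x_{it}' \right]' W_k \left[ \sum_{i \in G_k} T^{-1} \sum_{t=1}^{T} z_{it} \Delta y_{it} \right],$$

where $z_{it}$ is the $q \times 1$ vector of instruments supplied via `Z`, $W_k$ is a $q \times q$ p.d. symmetric weight matrix and $\Delta$ denotes the first difference operator $\Delta x_{it} = x_{it} - x_{i,t-1}$ (first-difference transformation).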
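The first-difference GMM idea can be illustrated for a single group in base R. The sketch below is an assumption-laden toy: it uses one endogenous regressor, two instruments, an identity weight matrix, and takes the differenced exogenous variables as instruments. None of these choices are taken from the package.

```r
set.seed(42)
N <- 4; n_periods <- 200; beta <- 1.5          # one group, p = 1 (assumed)
n <- N * n_periods
i_index <- rep(seq_len(N), each = n_periods)

Z_lvl <- matrix(rnorm(n * 2), ncol = 2)        # exogenous variables, q = 2
eps <- rnorm(n)
x <- rowSums(Z_lvl) + 0.5 * eps + rnorm(n)     # correlated with eps: endogenous
y <- rnorm(N)[i_index] + beta * x + eps        # fixed effect + signal + error

# First differences within each unit (drops the first period of every unit)
fd <- function(v) unlist(tapply(v, i_index, diff), use.names = FALSE)
dy <- fd(y)
dx <- fd(x)
dZ <- apply(Z_lvl, 2, fd)                      # instruments for the differenced model

# GMM estimator (A' W A)^{-1} A' W b with an identity weight matrix
A <- crossprod(dZ, dx) / n_periods             # q x p
b <- crossprod(dZ, dy) / n_periods             # q x 1
W <- diag(ncol(dZ))
alpha_hat <- drop(solve(t(A) %*% W %*% A, t(A) %*% W %*% b))

# OLS in first differences ignores the endogeneity and is biased here
ols_fd <- sum(dx * dy) / sum(dx^2)
c(gmm = alpha_hat, ols = ols_fd)
```

Differencing removes the fixed effect, and the instruments break the correlation between the regressor and the error, which plain OLS on the differenced data cannot do.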
An object of class `gplm` holding
model |
a |
coefficients |
a |
groups |
a |
residuals |
a vector of residuals of the demeaned model, |
fitted |
a vector of fitted values of the demeaned model, |
args |
a |
IC |
a |
call |
the function call. |
A `gplm` object has `print`, `summary`, `fitted`, `residuals`, `formula`, `df.residual`, and `coef` S3 methods.
Paul Haimerl
Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
    # Simulate a panel with a group structure
    sim <- sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
    y <- sim$y
    X <- sim$X
    groups <- sim$groups
    df <- cbind(y = c(y), X)

    # Estimate the grouped panel data model
    estim <- grouped_plm(y ~ ., data = df, groups = groups, n_periods = 80, method = "PLS")
    summary(estim)

    # Let's pass a panel data set with explicit cross-sectional and time indicators
    i_index <- rep(1:20, each = 80)
    t_index <- rep(1:80, 20)
    df <- data.frame(y = c(y), X, i_index = i_index, t_index = t_index)
    estim <- grouped_plm(
      y ~ ., data = df, index = c("i_index", "t_index"), groups = groups, method = "PLS"
    )
    summary(estim)
Estimate a grouped time-varying panel data model given an observed group structure. Coefficient functions are homogeneous within groups but heterogeneous across groups. The time-varying coefficients are modeled as polynomial B-splines. The function supports both static and dynamic panel data models.
    grouped_tv_plm(
      formula, data, groups, index = NULL, n_periods = NULL, d = 3,
      M = floor(length(y)^(1/7) - log(p)), const_coef = NULL,
      rho = 0.04 * log(N * n_periods) / sqrt(N * n_periods),
      verbose = TRUE, parallel = TRUE, ...
    )

    ## S3 method for class 'tv_gplm'
    summary(object, ...)

    ## S3 method for class 'tv_gplm'
    formula(x, ...)

    ## S3 method for class 'tv_gplm'
    df.residual(object, ...)

    ## S3 method for class 'tv_gplm'
    print(x, ...)

    ## S3 method for class 'tv_gplm'
    coef(object, ...)

    ## S3 method for class 'tv_gplm'
    residuals(object, ...)

    ## S3 method for class 'tv_gplm'
    fitted(object, ...)
formula |
a formula object describing the model to be estimated. |
data |
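a `data.frame` or `matrix` holding the panel data set. |
groups |
a numerical or character vector of length $N$ that indicates the group membership of each cross-sectional unit $i$. |
index |
a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string denotes the name of the variable declaring the time period $t$. |
n_periods |
the number of observed time periods $T$. Can be omitted when `index` is supplied. |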
d |
the polynomial degree of the B-splines. Default is 3. |
M |
the number of interior knots of the B-splines. If left unspecified, the default heuristic `M = floor(length(y)^(1/7) - log(p))` is used. |
const_coef |
a character vector containing the variable names of explanatory variables that enter with time-constant coefficients. |
rho |
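the tuning parameter balancing the fitness and penalty terms in the IC. If left unspecified, the heuristic `rho = 0.04 * log(N * n_periods) / sqrt(N * n_periods)` is used. |
verbose |
logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`. |
parallel |
logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`. |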
... |
ellipsis |
object |
of class `tv_gplm`. |
x |
of class `tv_gplm`. |
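Consider the grouped time-varying panel data model

$$y_{it} = \gamma_i + \beta_i'(t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $y_{it}$ is the scalar dependent variable, $\gamma_i$ is an individual fixed effect, $x_{it}$ is a $p \times 1$ vector of explanatory variables, and $\epsilon_{it}$ is a zero mean error. The coefficient vector $\beta_i(t/T)$ is subject to the observed group pattern

$$\beta_i(t/T) = \sum_{k=1}^{K} \alpha_k(t/T) \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$.

$\alpha_k(t/T)$ and, in turn, $\beta_i(t/T)$ is estimated as polynomial B-splines using the penalized sieve-technique. To this end, let $B(v)$ denote a $(M + d + 1) \times 1$ vector of polynomial spline basis functions, where $d$ represents the polynomial degree and $M$ gives the number of interior knots of the B-spline. $\alpha_k(t/T)$ is approximated by forming a linear combination of the basis functions $\alpha_k(t/T) \approx \xi_k' B(t/T)$, where $\xi_k$ is a $(M + d + 1) \times p$ coefficient matrix.

The explanatory variables are projected onto the spline basis system, which results in the $(M + d + 1) p \times 1$ vector $z_{it} = x_{it} \otimes B(t/T)$. Subsequently, the DGP can be reformulated as

$$y_{it} = \gamma_i + z_{it}' \text{vec}(\pi_i) + u_{it},$$

where $\pi_i = \xi_k$ if $i \in G_k$, $u_{it} = \epsilon_{it} + \eta_{it}$, and $\eta_{it}$ reflects a sieve approximation error. We refer to Su et al. (2019, sec. 2) for more details on the sieve technique.

Finally, $\hat{\alpha}_k(t/T)$ is obtained as $\hat{\alpha}_k(t/T) = \hat{\xi}_k' B(t/T)$, where the vector of control points $\text{vec}(\xi_k)$ is estimated using OLS

$$\text{vec}(\hat{\xi}_k) = \left( \sum_{i \in G_k} \sum_{t=1}^{T} \tilde{z}_{it} \tilde{z}_{it}' \right)^{-1} \sum_{i \in G_k} \sum_{t=1}^{T} \tilde{z}_{it} \tilde{y}_{it},$$

and $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t=1}^{T} a_{it}$, $a = \{y, z\}$, to concentrate out the fixed effect $\gamma_i$ (within-transformation).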
In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end-points of the interval on which the group-specific time-varying coefficients are defined.
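The sieve construction described above can be sketched with the `splines` package that ships with R. The example builds a B-spline basis on the rescaled time index and projects a set of regressors onto it. Equidistant interior knots and all dimensions are assumptions made for illustration, and the package may place its knots differently.

```r
library(splines)

n_periods <- 50; d <- 3; M <- 2; p <- 2          # assumed dimensions
v <- seq_len(n_periods) / n_periods               # rescaled time t/T

# M equidistant interior knots on (0, 1); knot placement is an assumption
knots <- seq(0, 1, length.out = M + 2)[-c(1, M + 2)]
B <- bs(v, knots = knots, degree = d, intercept = TRUE, Boundary.knots = c(0, 1))
dim(B)                                            # n_periods x (M + d + 1)

# Project the regressors onto the basis: row t holds x_t (kronecker) B(t/T)
set.seed(1)
X <- matrix(rnorm(n_periods * p), ncol = p)
Z <- do.call(cbind, lapply(seq_len(p), function(j) X[, j] * B))
dim(Z)                                            # n_periods x (M + d + 1) * p

# A time-varying coefficient is a linear combination of the basis columns
xi <- matrix(rnorm((M + d + 1) * p), ncol = p)    # hypothetical control points
alpha_path <- B %*% xi                            # n_periods x p coefficient paths
```

With `intercept = TRUE`, the basis has `M + d + 1` columns, which matches the dimension of the basis vector in the text.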
An object of class `tv_gplm` holding
model |
a |
coefficients |
let |
groups |
a |
residuals |
a vector of residuals of the demeaned model, |
fitted |
a vector of fitted values of the demeaned model, |
args |
a |
IC |
a |
call |
the function call. |
An object of class `tv_gplm` has `print`, `summary`, `fitted`, `residuals`, `formula`, `df.residual`, and `coef` S3 methods.
Paul Haimerl
Su, L., Wang, X., & Jin, S. (2019). Sieve estimation of time-varying panel data models with latent structures. Journal of Business & Economic Statistics, 37(2), 334-349. doi:10.1080/07350015.2017.1340299.
    # Simulate a time-varying panel with a trend and a group pattern
    set.seed(1)
    sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 2)
    df <- data.frame(y = c(sim$y))
    groups <- sim$groups

    # Estimate the time-varying grouped panel data model
    estim <- grouped_tv_plm(y ~ ., data = df, n_periods = 50, groups = groups)
    summary(estim)
Estimate panel data models with a latent group structure using the pairwise adaptive group fused Lasso (PAGFL) by Mehrabani (2023). The PAGFL jointly identifies the group structure and group-specific slope parameters. The function supports both static and dynamic panels, with or without endogenous regressors.
    pagfl(
      formula, data, index = NULL, n_periods = NULL, lambda, method = "PLS",
      Z = NULL, min_group_frac = 0.05, bias_correc = FALSE, kappa = 2,
      max_iter = 5000, tol_convergence = 1e-08, tol_group = 0.001,
      rho = 0.07 * log(N * n_periods) / sqrt(N * n_periods),
      varrho = max(sqrt(5 * N * n_periods * p) / log(N * n_periods * p) - 7, 1),
      verbose = TRUE, parallel = TRUE, ...
    )

    ## S3 method for class 'pagfl'
    print(x, ...)

    ## S3 method for class 'pagfl'
    formula(x, ...)

    ## S3 method for class 'pagfl'
    df.residual(object, ...)

    ## S3 method for class 'pagfl'
    summary(object, ...)

    ## S3 method for class 'pagfl'
    coef(object, ...)

    ## S3 method for class 'pagfl'
    residuals(object, ...)

    ## S3 method for class 'pagfl'
    fitted(object, ...)
formula |
a formula object describing the model to be estimated. |
data |
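a `data.frame` or `matrix` holding the panel data set. |
index |
a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string denotes the name of the variable declaring the time period $t$. |
n_periods |
the number of observed time periods $T$. Can be omitted when `index` is supplied. |
lambda |
the tuning parameter determining the strength of the penalty term. Either a single $\lambda$ or a vector of candidate values can be passed. If a vector is supplied, a BIC-type IC selects the best fitting $\lambda$. |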
method |
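the estimation method. Options are `"PLS"` (penalized least squares) and `"PGMM"` (penalized generalized method of moments, which requires instruments `Z`). Default is `"PLS"`. |
Z |
a `matrix` or `data.frame` of exogenous instruments. Only required when `method = "PGMM"`. Default is `NULL`. |
min_group_frac |
the minimum group cardinality as a fraction of the total number of individuals $N$. Default is 0.05. |
bias_correc |
logical. If `TRUE`, a split-panel jackknife bias correction following Dhaene and Jochmans (2015) is applied to the slope parameters. Default is `FALSE`. |
kappa |
a non-negative weight used to obtain the adaptive penalty weights. Default is 2. |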
max_iter |
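the maximum number of iterations for the ADMM estimation algorithm. Default is 5000. |
tol_convergence |
the tolerance limit for the stopping criterion of the iterative ADMM estimation algorithm. Default is `1e-08`. |
tol_group |
the tolerance limit for within-group differences. Two individuals are assigned to the same group if the Frobenius norm of their coefficient vector difference is below this threshold. Default is 0.001. |
rho |
the tuning parameter balancing the fitness and penalty terms in the IC that determines the penalty parameter $\lambda$. If left unspecified, the heuristic `rho = 0.07 * log(N * n_periods) / sqrt(N * n_periods)` is used. |
varrho |
the non-negative Lagrangian ADMM penalty parameter. If left unspecified, the default heuristic `varrho = max(sqrt(5 * N * n_periods * p) / log(N * n_periods * p) - 7, 1)` is used. |
verbose |
logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`. |
parallel |
logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`. |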
... |
ellipsis |
x |
of class `pagfl`. |
object |
of class `pagfl`. |
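Consider the grouped panel data model

$$y_{it} = \gamma_i + \beta_i' x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $y_{it}$ is the scalar dependent variable, $\gamma_i$ is an individual fixed effect, $x_{it}$ is a $p \times 1$ vector of weakly exogenous explanatory variables, and $\epsilon_{it}$ is a zero mean error. The coefficient vector $\beta_i$ is subject to the latent group pattern

$$\beta_i = \sum_{k=1}^{K} \alpha_k \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$.

The PLS method jointly estimates the latent group structure and group-specific coefficients by minimizing the criterion

$$Q_{NT}(\beta, \lambda) = \frac{1}{T} \sum_{i=1}^{N} \sum_{t=1}^{T} (\tilde{y}_{it} - \beta_i' \tilde{x}_{it})^2 + \frac{\lambda}{N} \sum_{i=1}^{N-1} \sum_{j>i}^{N} \dot{\omega}_{ij} \| \beta_i - \beta_j \|$$

with respect to $\beta = (\beta_1', \dots, \beta_N')'$. Here $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t=1}^{T} a_{it}$, $a = \{y, x\}$, to concentrate out the individual fixed effects $\gamma_i$. $\lambda$ is the penalty tuning parameter and $\dot{\omega}_{ij}$ reflects adaptive penalty weights (see Mehrabani, 2023, eq. 2.6). $\| \cdot \|$ denotes the Frobenius norm. The adaptive weights $\dot{\omega}_{ij}$ are obtained by a preliminary individual least squares estimation. The criterion function is minimized via an iterative alternating direction method of multipliers (ADMM) algorithm (see Mehrabani, 2023, sec. 5.1).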
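The role of `kappa` in the adaptive penalty weights can be sketched as follows. The snippet assumes the usual adaptive-Lasso form, in which the weight for a pair of units is the norm of the difference between their preliminary estimates raised to the power `-kappa`. The preliminary estimates below are made up, and the exact form used by the package should be checked against Mehrabani (2023, eq. 2.6).

```r
kappa <- 2

# Hypothetical preliminary unit-wise estimates (N = 5 units, p = 2 coefficients)
beta_prelim <- rbind(
  c(1.00, 1.00),
  c(1.05, 0.95),
  c(-1.00, 2.00),
  c(-0.90, 2.10),
  c(1.00, 1.10)
)

# Pairwise Euclidean norms of beta_i - beta_j
pair_dist <- as.matrix(dist(beta_prelim))

# Assumed adaptive weights: small preliminary differences receive large weights
omega <- pair_dist^(-kappa)
omega[!upper.tri(omega)] <- NA   # keep each pair i < j once
round(omega, 2)
```

Pairs that already look alike are penalized heavily and therefore fused, while clearly different pairs are left almost unpenalized.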
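PGMM employs a set of instruments $Z$ to control for endogenous regressors. Using PGMM, $\beta$ is estimated by minimizing

$$Q_{NT}(\beta, \lambda) = \sum_{i=1}^{N} \left[ \frac{1}{T} \sum_{t=1}^{T} z_{it} (\Delta y_{it} - \beta_i' \Delta x_{it}) \right]' W_i \left[ \frac{1}{T} \sum_{t=1}^{T} z_{it} (\Delta y_{it} - \beta_i' \Delta x_{it}) \right] + \frac{\lambda}{N} \sum_{i=1}^{N-1} \sum_{j>i}^{N} \ddot{\omega}_{ij} \| \beta_i - \beta_j \|.$$

The adaptive weights $\ddot{\omega}_{ij}$ are obtained by an initial GMM estimation. $\Delta$ gives the first differences operator $\Delta y_{it} = y_{it} - y_{i,t-1}$. $W_i$ represents a data-driven $q \times q$ weight matrix. We refer to Mehrabani (2023, eq. 2.10) for more details. Again, the criterion function is minimized using an efficient ADMM algorithm (Mehrabani, 2023, sec. 5.2).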
Two individuals $i$ and $j$ are assigned to the same group if $\| \hat{\beta}_i - \hat{\beta}_j \| < \epsilon_{\text{tol}}$, where $\epsilon_{\text{tol}}$ is determined by `tol_group`. Subsequently, the number of groups follows as the number of distinct elements in $\hat{\beta}$. Given an estimated group structure, it is straightforward to obtain post-Lasso estimates using group-wise least squares or GMM (see `grouped_plm`).
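The grouping rule above can be illustrated with a small base R sketch. The coefficient estimates are invented, and the single-pass labeling is a simplification for illustration. It is not the package's routine and does not resolve chains of pairwise-close units.

```r
tol_group <- 0.001

# Hypothetical penalized estimates for N = 5 units (nearly fused within groups)
beta_hat <- rbind(
  c(1.0000, 1.0000),
  c(1.0002, 1.0000),
  c(-1.0000, 2.0000),
  c(-1.0000, 2.0003),
  c(1.0000, 0.9999)
)

# Units i and j are "the same" if the norm of their difference is below the tolerance
is_close <- as.matrix(dist(beta_hat)) < tol_group

# Label each unit by the first unit it is close to, then relabel as 1, 2, ...
first_match <- apply(is_close, 1, function(r) which(r)[1])
groups_hat <- match(first_match, unique(first_match))

groups_hat                   # estimated group memberships
length(unique(groups_hat))   # estimated number of groups
```

The number of groups then follows directly from the number of distinct labels.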
We recommend identifying a suitable $\lambda$ parameter by passing a logarithmically spaced grid of candidate values with a lower limit close to 0 and an upper limit that leads to a fully homogeneous panel. A BIC-type information criterion then selects the best fitting $\lambda$ value.
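A logarithmically spaced grid as recommended above could be built as follows. The limits `1e-3` and `50` and the grid length are assumptions. A suitable upper limit depends on the data and has to be large enough to fuse all units into one group. The estimation call is guarded so that the snippet also runs without the package installed.

```r
# Logarithmically spaced candidate values (limits are assumptions)
lambda_grid <- exp(seq(log(1e-3), log(50), length.out = 10))
lambda_grid

# Pass the whole grid; the IC then picks among the candidates
if (requireNamespace("PAGFL", quietly = TRUE)) {
  sim <- PAGFL::sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
  df <- cbind(y = c(sim$y), sim$X)
  estim <- PAGFL::pagfl(
    y ~ ., data = df, n_periods = 80, lambda = lambda_grid,
    method = "PLS", verbose = FALSE
  )
  print(summary(estim))
}
```

Spacing the grid on the log scale puts more candidates near zero, where small changes in the penalty tend to alter the estimated grouping the most.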
An object of class `pagfl` holding
model |
a |
coefficients |
a |
groups |
a |
residuals |
a vector of residuals of the demeaned model, |
fitted |
a vector of fitted values of the demeaned model, |
args |
a |
IC |
a |
convergence |
a |
call |
the function call. |
A `pagfl` object has `print`, `summary`, `fitted`, `residuals`, `formula`, `df.residual`, and `coef` S3 methods.
Paul Haimerl
Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030. doi:10.1093/restud/rdv007.
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
    # Simulate a panel with a group structure
    sim <- sim_DGP(N = 20, n_periods = 80, p = 2, n_groups = 3)
    y <- sim$y
    X <- sim$X
    df <- cbind(y = c(y), X)

    # Run the PAGFL procedure
    estim <- pagfl(y ~ ., data = df, n_periods = 80, lambda = 0.5, method = "PLS")
    summary(estim)

    # Let's pass a panel data set with explicit cross-sectional and time indicators
    i_index <- rep(1:20, each = 80)
    t_index <- rep(1:80, 20)
    df <- data.frame(y = c(y), X, i_index = i_index, t_index = t_index)
    estim <- pagfl(
      y ~ ., data = df, index = c("i_index", "t_index"), lambda = 0.5, method = "PLS"
    )
    summary(estim)
Construct a static or dynamic, exogenous or endogenous panel data set subject to a group structure in the slope coefficients with optional AR(1) or GARCH(1,1) innovations.
    sim_DGP(
      N = 50, n_periods = 40, p = 2, n_groups = 3, group_proportions = NULL,
      error_spec = "iid", dynamic = FALSE, dyn_panel = lifecycle::deprecated(),
      q = NULL, alpha_0 = NULL
    )
N |
the number of cross-sectional units. Default is 50. |
n_periods |
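the number of simulated time periods $T$. Default is 40. |
p |
the number of explanatory variables. Default is 2. |
n_groups |
the number of groups $K$. Default is 3. |
group_proportions |
a numeric vector of length `n_groups` indicating the size of each group as a fraction of $N$. Default is `NULL`. |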
error_spec |
the error specification. Options include `"iid"`, `"AR"`, and `"GARCH"`. Default is `"iid"`. |
dynamic |
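logical. If `TRUE`, a dynamic panel with an autoregressive dependency is simulated. Default is `FALSE`. |
dyn_panel |
deprecated; superseded by `dynamic`. |
q |
the number of exogenous instruments when a panel with endogenous regressors is to be simulated. If a panel data set with exogenous regressors is to be generated, pass `NULL`. Default is `NULL`. |
alpha_0 |
a $K \times p$ matrix of group-specific coefficients. If `NULL`, the coefficients are drawn randomly. Default is `NULL`. |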
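The scalar dependent variable $y_{it}$ is generated according to the following grouped panel data model

$$y_{it} = \gamma_i + \beta_i' x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T.$$

$\gamma_i$ represents individual fixed effects and $x_{it}$ a $p \times 1$ vector of regressors. The individual slope coefficient vectors $\beta_i$ are subject to a group structure

$$\beta_i = \sum_{k=1}^{K} \alpha_k \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$. The total number of groups $K$ is determined by `n_groups`.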
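The group structure above maps each unit to the coefficient vector of its group. The following base R sketch shows that mapping with the `alpha_0` values and group proportions used in the example of this help page. The ordered assignment of units to groups is an assumption, and the function itself may allocate units differently.

```r
# K = 3 groups, p = 2 coefficients; row k holds alpha_k
alpha_0 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)

N <- 10
group_proportions <- c(0.4, 0.3, 0.3)

# Assumed allocation: the first 40% of units form group 1, and so on
groups <- rep(seq_len(nrow(alpha_0)), times = round(group_proportions * N))

# beta_i = alpha_k for every unit i in group k
beta <- alpha_0[groups, ]
cbind(unit = seq_len(N), group = groups, beta)
```

Indexing the rows of `alpha_0` by the membership vector is the matrix version of the indicator sum in the formula.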
If a panel data set with exogenous regressors is generated (set q = NULL
), the explanatory variables are simulated according to
where denotes a series of innovations.
and
are independent of each other.
In case alpha_0 = NULL
, the group-level slope parameters are drawn from
.
If a dynamic panel is specified (dynamic = TRUE
), the coefficients
are drawn from a uniform distribution with support
and
.
Moreover, the individual fixed effects enter the dependent variable via
to account for the autoregressive dependency.
We refer to Mehrabani (2023, sec 6) for details.
When specifying an endogenous panel (set q
to ), the
correlate with the cross-sectional innovations
by a magnitude of 0.5 to produce endogenous regressors (
). However, the endogenous regressors can be accounted for by exploiting the
instruments in
, for which
holds.
The instruments and the first stage coefficients are generated in the same fashion as
and
when
q = NULL
.
The function nests, among others, the DGPs employed in the simulation study of Mehrabani (2023, sec. 6).
A list holding
alpha |
the |
groups |
a vector indicating the group memberships |
y |
a |
X |
a |
Z |
a |
data |
a |
Paul Haimerl
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
    # Simulate DGP 1 from Mehrabani (2023, sec. 6)
    alpha_0_DGP1 <- matrix(c(0.4, 1, 1.6, 1.6, 1, 0.4), ncol = 2)
    DGP1 <- sim_DGP(
      N = 50, n_periods = 20, p = 2, n_groups = 3,
      group_proportions = c(.4, .3, .3), alpha_0 = alpha_0_DGP1
    )
Construct a time-varying panel data set subject to a group structure in the slope coefficients with optional innovations.
    sim_tv_DGP(
      N = 50, n_periods = 40, intercept = TRUE, p = 1, n_groups = 3, d = 3,
      dynamic = FALSE, group_proportions = NULL, error_spec = "iid",
      locations = NULL, scales = NULL, polynomial_coef = NULL, sd_error = 1,
      DGP = lifecycle::deprecated()
    )
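The scalar dependent variable $y_{it}$ is generated according to the following time-varying grouped panel data model

$$y_{it} = \gamma_i + \beta_i'(t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $\gamma_i$ is an individual fixed effect and $x_{it}$ is a $p \times 1$ vector of explanatory variables. The coefficient vector $\beta_i(t/T)$ is subject to the group pattern

$$\beta_i(t/T) = \sum_{k=1}^{K} \alpha_k(t/T) \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$. The total number of groups $K$ is determined by `n_groups`.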
The predictors are simulated as:
where denotes a series of innovations.
and
are independent of each other.
The errors feature a
standard normal distribution.
In case locations = NULL
, the location parameters are drawn from .
In case
scales = NULL
, the scale parameters are drawn from .
In case
polynomial_coef = NULL
, the polynomial coefficients are drawn from and normalized so that all coefficients of one polynomial sum up to 1.
The final coefficient function follows as
, where
denotes a cumulative logistic distribution function and
reflects a polynomial coefficient.
A list holding
alpha |
a |
beta |
a |
groups |
a vector indicating the group memberships |
y |
a |
X |
a |
data |
a |
Paul Haimerl
    # Simulate a time-varying panel subject to a time trend and a group structure
    sim <- sim_tv_DGP(N = 20, n_periods = 50, intercept = TRUE, p = 1)
    y <- sim$y
Estimate a time-varying panel data model with a latent group structure using the pairwise adaptive group fused lasso (time-varying PAGFL). The time-varying PAGFL jointly identifies the latent group structure and group-specific time-varying functional coefficients. The time-varying coefficients are modeled as polynomial B-splines. The function supports both static and dynamic panel data models.
    tv_pagfl(
      formula, data, index = NULL, n_periods = NULL, lambda, d = 3,
      M = floor(length(y)^(1/7) - log(p)), min_group_frac = 0.05,
      const_coef = NULL, kappa = 2, max_iter = 50000, tol_convergence = 1e-10,
      tol_group = 0.001,
      rho = 0.04 * log(N * n_periods) / sqrt(N * n_periods),
      varrho = 1, verbose = TRUE, parallel = TRUE, ...
    )

    ## S3 method for class 'tvpagfl'
    summary(object, ...)

    ## S3 method for class 'tvpagfl'
    formula(x, ...)

    ## S3 method for class 'tvpagfl'
    df.residual(object, ...)

    ## S3 method for class 'tvpagfl'
    print(x, ...)

    ## S3 method for class 'tvpagfl'
    coef(object, ...)

    ## S3 method for class 'tvpagfl'
    residuals(object, ...)

    ## S3 method for class 'tvpagfl'
    fitted(object, ...)
formula |
a formula object describing the model to be estimated. |
data |
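a `data.frame` or `matrix` holding the panel data set. |
index |
a character vector holding two strings. The first string denotes the name of the index variable identifying the cross-sectional unit $i$, and the second string denotes the name of the variable declaring the time period $t$. |
n_periods |
the number of observed time periods $T$. Can be omitted when `index` is supplied. |
lambda |
the tuning parameter determining the strength of the penalty term. Either a single $\lambda$ or a vector of candidate values can be passed. If a vector is supplied, a BIC-type IC selects the best fitting $\lambda$. |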
d |
the polynomial degree of the B-splines. Default is 3. |
M |
the number of interior knots of the B-splines. If left unspecified, the default heuristic `M = floor(length(y)^(1/7) - log(p))` is used. |
min_group_frac |
the minimum group cardinality as a fraction of the total number of individuals $N$. Default is 0.05. |
const_coef |
a character vector containing the variable names of explanatory variables that enter with time-constant coefficients. |
kappa |
a non-negative weight used to obtain the adaptive penalty weights. Default is 2. |
max_iter |
the maximum number of iterations for the ADMM estimation algorithm. Default is 50000. |
tol_convergence |
the tolerance limit for the stopping criterion of the iterative ADMM estimation algorithm. Default is `1e-10`. |
tol_group |
the tolerance limit for within-group differences. Two individuals are assigned to the same group if the Frobenius norm of their coefficient vector difference is below this threshold. Default is 0.001. |
rho |
the tuning parameter balancing the fitness and penalty terms in the IC that determines the penalty parameter $\lambda$. If left unspecified, the heuristic `rho = 0.04 * log(N * n_periods) / sqrt(N * n_periods)` is used. |
varrho |
the non-negative Lagrangian ADMM penalty parameter for the employed penalized sieve estimation (PSE). Default is 1. |
verbose |
logical. If `TRUE`, helpful warning messages are shown. Default is `TRUE`. |
parallel |
logical. If `TRUE`, certain operations are parallelized across multiple cores. Default is `TRUE`. |
... |
ellipsis |
object |
of class `tvpagfl`. |
x |
of class `tvpagfl`. |
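The default heuristic for `M` in the usage signature can be evaluated by hand. The sketch below assumes a balanced panel, so that `length(y)` equals `N * n_periods`, and uses hypothetical dimensions. The resulting basis size uses the standard B-spline count of `M + d + 1` functions.

```r
# Hypothetical balanced panel (assumed dimensions)
N <- 10; n_periods <- 50; p <- 1; d <- 3

n_obs <- N * n_periods                 # stands in for length(y)
M <- floor(n_obs^(1 / 7) - log(p))     # default number of interior knots
n_basis <- M + d + 1                   # B-spline basis functions per coefficient

c(M = M, n_basis = n_basis)
```

Each time-varying coefficient is thus described by `n_basis` control points, so the number of parameters per unit grows with both `M` and `d`.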
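Consider the grouped time-varying panel data model

$$y_{it} = \gamma_i + \beta_i'(t/T) x_{it} + \epsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T,$$

where $y_{it}$ is the scalar dependent variable, $\gamma_i$ is an individual fixed effect, $x_{it}$ is a $p \times 1$ vector of explanatory variables, and $\epsilon_{it}$ is a zero mean error. The coefficient vector $\beta_i(t/T)$ is subject to the latent group pattern

$$\beta_i(t/T) = \sum_{k=1}^{K} \alpha_k(t/T) \mathbf{1}\{i \in G_k\},$$

with $\cup_{k=1}^{K} G_k = \{1, \dots, N\}$, $G_k \cap G_j = \emptyset$ and $\| \alpha_k - \alpha_j \| \neq 0$ for any $k \neq j$, $k, j = 1, \dots, K$.

The time-varying coefficient functions are estimated as polynomial B-splines using the penalized sieve-technique. To this end, let $B(v)$ denote a $(M + d + 1) \times 1$ vector of basis functions, where $d$ denotes the polynomial degree and $M$ the number of interior knots. Then, $\beta_i(t/T)$ and $\alpha_k(t/T)$ are approximated by forming linear combinations of the basis functions $\beta_i(t/T) \approx \pi_i' B(t/T)$ and $\alpha_k(t/T) \approx \xi_k' B(t/T)$, where $\pi_i$ and $\xi_k$ are $(M + d + 1) \times p$ coefficient matrices.

The explanatory variables are projected onto the spline basis system, which results in the $(M + d + 1) p \times 1$ vector $z_{it} = x_{it} \otimes B(t/T)$. Subsequently, the DGP can be reformulated as

$$y_{it} = \gamma_i + z_{it}' \text{vec}(\pi_i) + u_{it},$$

where $u_{it} = \epsilon_{it} + \eta_{it}$ and $\eta_{it}$ reflects a sieve approximation error. We refer to Su et al. (2019, sec. 2) for more details on the sieve technique.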
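Inspired by Su et al. (2019) and Mehrabani (2023), the time-varying PAGFL jointly estimates the functional coefficients and the group structure by minimizing the criterion

$$Q_{NT}(\pi, \lambda) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (\tilde{y}_{it} - \tilde{z}_{it}' \text{vec}(\pi_i))^2 + \frac{\lambda}{N} \sum_{i=1}^{N-1} \sum_{j>i}^{N} \dot{\omega}_{ij} \| \pi_i - \pi_j \|$$

with respect to $\pi = (\text{vec}(\pi_1)', \dots, \text{vec}(\pi_N)')'$. Here $\tilde{a}_{it} = a_{it} - T^{-1} \sum_{t=1}^{T} a_{it}$, $a = \{y, z\}$, to concentrate out the individual fixed effects $\gamma_i$. $\lambda$ is the penalty tuning parameter and $\dot{\omega}_{ij}$ denotes adaptive penalty weights which are obtained by a preliminary non-penalized estimation. $\| \cdot \|$ represents the Frobenius norm. The criterion function is minimized via the iterative alternating direction method of multipliers (ADMM) algorithm proposed by Mehrabani (2023, sec. 5.1).

Two individuals $i$ and $j$ are assigned to the same group if $\| \text{vec}(\hat{\pi}_i - \hat{\pi}_j) \| < \epsilon_{\text{tol}}$, where $\epsilon_{\text{tol}}$ is determined by `tol_group`. Subsequently, the number of groups follows as the number of distinct elements in $\hat{\pi}$. Given an estimated group structure, it is straightforward to obtain post-Lasso estimates $\hat{\xi}_k$ using group-wise least squares (see `grouped_tv_plm`).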
We recommend identifying a suitable $\lambda$ parameter by passing a logarithmically spaced grid of candidate values with a lower limit close to 0 and an upper limit that leads to a fully homogeneous panel. A BIC-type information criterion then selects the best fitting $\lambda$ value.
In case of an unbalanced panel data set, the earliest and latest available observations per group define the start and end-points of the interval on which the group-specific time-varying coefficients are defined.
An object of class `tvpagfl` holding
model |
a |
coefficients |
let |
groups |
a |
residuals |
a vector of residuals of the demeaned model, |
fitted |
a vector of fitted values of the demeaned model, |
args |
a |
IC |
a |
convergence |
a |
call |
the function call. |
An object of class `tvpagfl` has `print`, `summary`, `fitted`, `residuals`, `formula`, `df.residual`, and `coef` S3 methods.
Paul Haimerl
Mehrabani, A. (2023). Estimation and identification of latent group structures in panel data. Journal of Econometrics, 235(2), 1464-1482. doi:10.1016/j.jeconom.2022.12.002.
Su, L., Wang, X., & Jin, S. (2019). Sieve estimation of time-varying panel data models with latent structures. Journal of Business & Economic Statistics, 37(2), 334-349. doi:10.1080/07350015.2017.1340299.
    # Simulate a time-varying panel with a trend and a group pattern
    set.seed(1)
    sim <- sim_tv_DGP(N = 10, n_periods = 50, intercept = TRUE, p = 1)
    df <- data.frame(y = c(sim$y))

    # Run the time-varying PAGFL
    estim <- tv_pagfl(y ~ ., data = df, n_periods = 50, lambda = 10, parallel = FALSE)
    summary(estim)