| Title: | Triple-Difference Estimators |
|---|---|
| Description: | Implements triple-difference (DDD) estimators for both average treatment effects and event-study parameters. Methods include regression adjustment, inverse-probability weighting, and doubly-robust estimators, all of which rely on a conditional DDD parallel-trends assumption and allow covariate adjustment across multiple pre- and post-treatment periods. The methodology is detailed in Ortiz-Villavicencio and Sant'Anna (2025) <doi:10.48550/arXiv.2505.09942>. |
| Authors: | Marcelo Ortiz-Villavicencio [aut, cre], Pedro H. C. Sant'Anna [aut] |
| Maintainer: | Marcelo Ortiz-Villavicencio <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-05-16 05:17:02 UTC |
| Source: | https://github.com/cran/triplediff |
agg_ddd is a function that takes group-time average treatment effects
and aggregates them into a smaller number of summary parameters in staggered triple-differences designs.
Several aggregations are available: "simple", "eventstudy", "group",
and "calendar". The default is "eventstudy".
    agg_ddd(
      ddd_obj,
      type = "eventstudy",
      balance_e = NULL,
      min_e = -Inf,
      max_e = Inf,
      na.rm = FALSE,
      boot = NULL,
      nboot = NULL,
      cband = NULL,
      alpha = 0.05
    )
ddd_obj |
a |
type |
Which type of aggregated treatment effect parameter to compute.
|
balance_e |
If set (and if one computes event study), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, min_e = -Inf. |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, max_e = Inf. |
na.rm |
Logical. Whether to remove missing values from the analysis. Default is FALSE. |
boot |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
nboot |
The number of bootstrap iterations to use. The default is the value set in the ddd object,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alpha |
The significance level for the confidence intervals, so that coverage is 1 - alpha. The default is 0.05; otherwise, the value set in the ddd object is used. |
An object (list) of class agg_ddd that holds the results from the
aggregation step.
    #----------------------------------------------------------
    # Triple Diff with multiple time periods
    #----------------------------------------------------------
    data <- gen_dgp_mult_periods(size = 500, dgp_type = 1)[["data"]]

    out <- ddd(yname = "y", tname = "time", idname = "id",
               gname = "state", pname = "partition",
               xformla = ~cov1 + cov2 + cov3 + cov4,
               data = data, control_group = "nevertreated",
               base_period = "varying", est_method = "dr")

    # Simple aggregation
    agg_ddd(out, type = "simple", alpha = 0.10)

    # Event study aggregation
    agg_ddd(out, type = "eventstudy", alpha = 0.10)

    # Group aggregation
    agg_ddd(out, type = "group", alpha = 0.10)

    # Calendar aggregation
    agg_ddd(out, type = "calendar", alpha = 0.10)
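The four aggregation types can be illustrated without the package. The following is a minimal base-R sketch, assuming a hypothetical table `att_gt` of group-time estimates with made-up numbers and equal weights across groups. It is not the computation agg_ddd performs internally (which may weight by group size and also handles inference); it only shows which cells each aggregation pools.

```r
# Hypothetical group-time estimates (illustrative numbers, not package output)
att_gt <- data.frame(
  group = c(2, 2, 2, 3, 3, 3),   # first treatment period of each group
  time  = c(1, 2, 3, 1, 2, 3),   # calendar period
  att   = c(0.0, 1.0, 2.0, 0.1, -0.1, 1.5)
)

# Event time: periods elapsed since the group was first treated
att_gt$event_time <- att_gt$time - att_gt$group

# "eventstudy": pool cells that share the same event time
# (equal weights are an assumption of this sketch)
es <- aggregate(att ~ event_time, data = att_gt, FUN = mean)

# The remaining aggregations use post-treatment cells only
post <- att_gt[att_gt$event_time >= 0, ]

# "group": average effect for each treatment cohort
by_group <- aggregate(att ~ group, data = post, FUN = mean)

# "calendar": average effect in each calendar period
by_time <- aggregate(att ~ time, data = post, FUN = mean)

# "simple": a single average over all post-treatment cells
simple <- mean(post$att)

print(es)
print(by_group)
print(by_time)
print(simple)
```

Negative event times in `es` correspond to pre-treatment cells, which is why event-study output is commonly used to inspect pre-trends.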
ddd is the main function for computing the Doubly Robust DDD estimators for the ATT, with balanced panel data.
It can be used with covariates and/or under multiple time periods. At its core, triplediff employs
the doubly robust estimator for the ATT, which combines propensity score weighting with outcome regression.
Furthermore, this package supports the application of machine learning methods for the estimation of the nuisance parameters.
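The triple-difference contrast underlying these estimators can be sketched in base R for the simplest case: two periods and no covariates. The simulated data, the variable names (`state`, `partition`, `dy`), and the effect size below are all assumptions made for illustration. The package's estimators additionally adjust for covariates through outcome regressions and propensity scores, which this sketch does not attempt.

```r
set.seed(1)
n <- 4000

# Hypothetical indicators: treated state and eligible partition
state     <- rbinom(n, 1, 0.5)
partition <- rbinom(n, 1, 0.5)

# Pre-period outcome with level differences across cells
y_pre <- 1 + state + partition + rnorm(n)

# Post-period outcome: common trend, state-specific trend,
# partition-specific trend, and an effect only for eligible
# units in treated states
true_att <- 2
y_post <- y_pre + 0.5 + 0.3 * state + 0.2 * partition +
  true_att * state * partition + rnorm(n, sd = 0.1)

# Unit-level outcome change
dy <- y_post - y_pre

# Mean change within a (state, partition) cell
cell_mean <- function(s, q) mean(dy[state == s & partition == q])

# DDD: eligible-vs-ineligible gap in treated states,
# minus the same gap in untreated states
ddd_hat <- (cell_mean(1, 1) - cell_mean(1, 0)) -
  (cell_mean(0, 1) - cell_mean(0, 0))

# Same number from a saturated regression on the outcome change
ddd_reg <- unname(coef(lm(dy ~ state * partition))["state:partition"])

print(c(cell_means = ddd_hat, regression = ddd_reg))
```

The state-specific and partition-specific trends cancel in the double contrast, so only the interaction effect remains.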
    ddd(
      yname,
      tname,
      idname = NULL,
      gname,
      pname,
      xformla,
      data,
      control_group = NULL,
      base_period = NULL,
      est_method = "dr",
      panel = TRUE,
      allow_unbalanced_panel = FALSE,
      weightsname = NULL,
      boot = FALSE,
      nboot = NULL,
      cluster = NULL,
      cband = FALSE,
      alpha = 0.05,
      use_parallel = FALSE,
      cores = 1,
      inffunc = FALSE,
      skip_data_checks = FALSE
    )
yname |
The name of the outcome variable. |
tname |
The name of the column containing the time periods. |
idname |
The name of the column containing the unit id. |
gname |
The name of the column containing the first period when a particular observation is treated. It is a positive number for treated units and defines which group the unit belongs to. It takes value 0 or Inf for untreated units. |
pname |
The name of the column containing the partition variable (e.g., the subgroup identifier). This is an indicator variable that is 1 for the units eligible for treatment and 0 otherwise. |
xformla |
The formula for the covariates to be included in the model. It should be of the form |
data |
A data frame or data table containing the data. |
control_group |
Valid for multiple periods only. The control group to be used in the estimation. Default is |
base_period |
Valid for multiple periods. Choose between a "varying" or "universal" base period. Both yield the same post-treatment ATT(g,t) estimates. Varying base period: Computes pseudo-ATT in pre-treatment periods by comparing outcome changes for a group to its comparison group from t-1 to t, repeatedly changing t. Universal base period: Fixes the base period to (g-1), reporting average changes from t to (g-1) for a group relative to its comparison group, similar to event study regressions. Varying base period reports ATT(g,t) right before treatment. Universal base period normalizes the estimate before treatment to be 0, adding one extra estimate in an earlier period. |
est_method |
The estimation method to be used. Default is "dr". |
panel |
Logical. If |
allow_unbalanced_panel |
Logical. If |
weightsname |
The name of the column containing the weights. Default is |
boot |
Logical. If |
nboot |
The number of bootstrap samples to be used. Default is |
cluster |
The name of the variable to be used for clustering. The maximum number of cluster variables is 1. Default is |
cband |
Logical. If |
alpha |
The level of significance for the confidence intervals. Default is 0.05. |
use_parallel |
Logical. If |
cores |
The number of cores to be used in the parallel processing. Default is 1. |
inffunc |
Logical. If |
skip_data_checks |
Logical. If |
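The difference between the "varying" and "universal" settings of base_period described above can be made concrete with a small base-R sketch. The vector `gap` is hypothetical: it stands for the contrast between a group and its comparison group at each period, with the group first treated at `g = 4`. The helper `effect()` is also an assumption of this sketch and not a package function.

```r
g <- 4                     # hypothetical first treatment period
periods <- 1:5

# Hypothetical contrast between the group and its comparison
# group at each period (made-up numbers)
gap <- c(0.2, 0.3, 0.1, 1.1, 1.6)

# Varying: pre-treatment periods use t - 1 as the base;
# post-treatment periods use g - 1
base_varying <- ifelse(periods < g, periods - 1, g - 1)

# Universal: every period uses g - 1 as the base
base_universal <- rep(g - 1, length(periods))

# Change in the contrast relative to the chosen base period;
# NA when the base period is not observed
effect <- function(base) {
  out <- rep(NA_real_, length(periods))
  ok <- base >= min(periods)
  out[ok] <- gap[periods[ok]] - gap[base[ok]]
  out
}

varying   <- effect(base_varying)
universal <- effect(base_universal)

print(data.frame(period = periods, varying, universal))
```

Both columns agree from period 4 onward. Under the universal base period the estimate at period 3 is 0 by construction and period 1 gains an estimate, while the varying base period has no estimate for period 1 because period 0 is not observed.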
A ddd object with the following basic elements:
ATT |
The average treatment effect on the treated. |
se |
The standard error of the ATT. |
uci |
The upper bound of the confidence interval for the ATT. |
lci |
The lower bound of the confidence interval for the ATT. |
inf_func |
The estimate of the influence function. |
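The relationship between an influence function, a standard error, and confidence bounds can be sketched in base R as follows. Everything here is an assumption for illustration: the point estimate and the influence-function values are simulated rather than taken from a ddd object, and the bootstrap uses Rademacher weights, which may differ from the weights and critical values the package uses.

```r
set.seed(42)
n <- 2000
alpha <- 0.05

att_hat  <- 1.2                 # hypothetical point estimate
inf_func <- rnorm(n, sd = 3)    # hypothetical influence-function values
inf_func <- inf_func - mean(inf_func)

# Analytical standard error and pointwise confidence bounds
se_analytic <- sd(inf_func) / sqrt(n)
crit <- qnorm(1 - alpha / 2)
lci <- att_hat - crit * se_analytic
uci <- att_hat + crit * se_analytic

# Multiplier bootstrap: perturb the influence function with
# random weights of mean 0 and variance 1 (Rademacher here)
nboot <- 999
draws <- replicate(nboot, {
  v <- sample(c(-1, 1), n, replace = TRUE)
  mean(v * inf_func)
})
se_boot <- sd(draws)

print(c(se_analytic = se_analytic, se_boot = se_boot, lci = lci, uci = uci))
```

Because the bootstrap only reweights stored influence-function values, no model is re-estimated in each iteration, which keeps the procedure cheap even with many draws.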
    #----------------------------------------------------------
    # Triple Diff with covariates and 2 time periods
    #----------------------------------------------------------
    set.seed(1234) # Set seed for reproducibility

    # Simulate data for a two-period DDD setup
    df <- gen_dgp_2periods(size = 5000, dgp_type = 1)$data
    head(df)

    att_22 <- ddd(yname = "y", tname = "time", idname = "id",
                  gname = "state", pname = "partition",
                  xformla = ~cov1 + cov2 + cov3 + cov4,
                  data = df, control_group = "nevertreated",
                  est_method = "dr")
    summary(att_22)

    # Clustered standard errors with the multiplier bootstrap
    att_cluster <- ddd(yname = "y", tname = "time", idname = "id",
                       gname = "state", pname = "partition",
                       xformla = ~cov1 + cov2 + cov3 + cov4,
                       data = df, control_group = "nevertreated",
                       base_period = "universal", est_method = "dr",
                       boot = TRUE, nboot = 500, cband = TRUE,
                       cluster = "cluster")
    summary(att_cluster)

    #----------------------------------------------------------
    # Triple Diff with multiple time periods
    #----------------------------------------------------------
    data <- gen_dgp_mult_periods(size = 1000, dgp_type = 1)[["data"]]

    ddd(yname = "y", tname = "time", idname = "id",
        gname = "state", pname = "partition",
        xformla = ~cov1 + cov2 + cov3 + cov4,
        data = data, control_group = "nevertreated",
        base_period = "varying", est_method = "dr")
Generate panel data with a single treatment date and two periods
    gen_dgp_2periods(size, dgp_type)
size |
Integer. Number of units. |
dgp_type |
Integer in {1,2,3,4}. 1 = both nuisance functions correct; 2 = only the outcome model correct; 3 = only the propensity score correct; 4 = both nuisance functions incorrect. |
A list with the following elements:
A data.table in long format with columns:
id: unit identifier
state: state variable
time: time variable
partition: partition assignment
x1, x2, x3, x4: covariates
y: outcome variable
cluster: cluster ID (no within-cluster correlation)
True average treatment effect on the treated (ATT), set to 0.
Oracle ATT computed under the unfeasible specification.
Theoretical efficiency bound for the estimator.
Generate panel data where units adopt treatment at different times across three periods.
    gen_dgp_mult_periods(size, dgp_type = 1, include_covariates = TRUE)
size |
Integer. Number of units to simulate. |
dgp_type |
Integer in {1,2,3,4}. Only used when |
include_covariates |
Logical. If |
A named list with components:
A data.table in long format with columns:
id: unit identifier
cohort: first period when treatment is assigned
partition: partition indicator
x1, x2, x3, x4: covariates
cluster: cluster identifier (no within-cluster correlation)
time: time period index
y: observed outcome
A data.table in wide format (one row per id) with columns:
id, cohort, partition, x1, x2, x3, x4, cluster
y_t0, y_t1, y_t2: outcomes in periods 0, 1, and 2
(Only if include_covariates = TRUE) Unfeasible (oracle) event-study parameter at time 0.
(Only if include_covariates = TRUE) Proportion of units with cohort == 2 and eligibility in period 1.
(Only if include_covariates = TRUE) Proportion of units with cohort == 3 and eligibility in period 1.
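The relationship between the long and wide layouts described above can be sketched with base R's `reshape()`. The small data frame below is hypothetical: it uses a plain data.frame rather than a data.table, contains made-up values, and omits the covariate and cluster columns for brevity.

```r
# Hypothetical long-format panel: three units, periods 0, 1, 2
long <- data.frame(
  id        = rep(1:3, each = 3),
  cohort    = rep(c(2, 3, 0), each = 3),
  partition = rep(c(1, 0, 1), each = 3),
  time      = rep(0:2, times = 3),
  y         = c(1.0, 1.5, 3.0, 0.8, 0.9, 1.1, 2.0, 2.2, 2.4)
)

# One row per id; outcome columns are named y_t0, y_t1, y_t2
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "time",
  v.names   = "y",
  direction = "wide",
  sep       = "_t"
)
rownames(wide) <- NULL

print(wide)
```

Time-invariant columns such as `cohort` and `partition` are carried over unchanged, so each unit keeps a single value for them in the wide layout.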