| Title: | Identification and Estimation of Child Penalties |
|---|---|
| Description: | Tools to simulate child-penalty data and estimate DID, TD, and NTD identification frameworks from Leventer (2025), "Identification of Child Penalties" <doi:10.48550/arXiv.2602.07486>. |
| Authors: | Dor Leventer [aut, cre] |
| Maintainer: | Dor Leventer <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.3 |
| Built: | 2026-06-02 18:55:02 UTC |
| Source: | https://github.com/cran/childpen |
Takes the stacked output of multiple_treatment_group_analysis() and
computes three aggregate estimands across treatment groups for each event
time:
aggregate_estimands(
  results,
  weights = "sample",
  methods = c("DID_Female", "DID_Male", "TD", "NTD_Conv", "NTD_New"),
  include_pre = FALSE
)
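| results | A data.frame holding the stacked output of multiple_treatment_group_analysis(). |
|---|---|
| weights | How to weight treatment groups. One of "sample" (the default), NULL, or a named vector of fixed weights (see Details). |
| methods | Character vector of methods to aggregate. Defaults to all five main methods. |
| include_pre | Logical (default FALSE). If TRUE, pre-treatment event times are also aggregated. |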
avg_of_ratios – Weighted average of the group-specific normalised effects $\theta_d$ across treatment groups $d$, $\sum_d w_d \theta_d$. This is the preferred estimand because it averages effects that are already scaled by each group's baseline.
ratio_of_avgs – Ratio of the weighted-average ATE to the weighted-average APO, $\sum_d w_d \mathrm{ATE}_d / \sum_d w_d \mathrm{APO}_d$. The implicit weight on each group is $w_d \mathrm{APO}_d / \sum_{d'} w_{d'} \mathrm{APO}_{d'}$, giving higher-earning groups more influence.
gender_ineq – Weighted average of NTD_New (estimand == "Delta_rho") across treatment groups; this is the aggregate gender-inequality estimand.
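To make the difference between the first two aggregates concrete, here is a minimal base-R sketch on made-up numbers. It assumes that the normalised effect of each group is the ratio ATE / APO; the vectors ate, apo, and w are hypothetical inputs, not output of the package.

```r
# Hypothetical group-level inputs (not package output): three treatment groups.
ate <- c(-2, -3, -6)        # group-specific ATEs
apo <- c(10, 20, 30)        # group-specific APOs (baseline earnings)
w   <- c(0.5, 0.3, 0.2)     # group weights, summing to 1

# Assumption: the normalised effect is the ratio ATE / APO.
theta <- ate / apo

# avg_of_ratios: weighted average of the group-specific ratios.
avg_of_ratios <- sum(w * theta)

# ratio_of_avgs: ratio of the weighted-average ATE to the weighted-average APO.
ratio_of_avgs <- sum(w * ate) / sum(w * apo)

# The same number, written as a weighted average of theta with implicit
# weights proportional to w * apo, which favour high-baseline groups.
implicit_w <- w * apo / sum(w * apo)
stopifnot(isTRUE(all.equal(ratio_of_avgs, sum(implicit_w * theta))))

print(c(avg_of_ratios = avg_of_ratios, ratio_of_avgs = ratio_of_avgs))
```

The two aggregates coincide only when the implicit weights equal w, for example when all groups share the same APO.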
Standard errors. When the results object carries
influence-function (IF) data from multiple_treatment_group_analysis(),
aggregate SEs account for dependence across treatment groups caused by shared
control individuals.
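With weights = "sample", the IF additionally accounts for estimation of the weights, following the formula in Leventer (2025, Appendix G): on top of the weighted sum of the group-specific IFs, it adds a second term built from the IF of each group proportion.
With fixed weights (NULL or a named vector), the second term drops out and the IF reduces to the weighted sum of the group-specific IFs, $\sum_d w_d \psi_d$, where $w_d$ is the weight and $\psi_d$ the IF of treatment group $d$.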
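For intuition only, a two-term IF of this kind arises from a standard delta-method expansion. The sketch below assumes weights built from group proportions $p_d$; the symbols $\psi_{\theta_d}$ and $\psi_{p_d}$ are our notation, and the expression in Appendix G of the paper may be arranged differently.

```latex
\bar{\theta} = \sum_d w_d \, \theta_d ,
\qquad
w_d = \frac{p_d}{\sum_{d'} p_{d'}} ,
\qquad
\frac{\partial \bar{\theta}}{\partial p_d} = \frac{\theta_d - \bar{\theta}}{\sum_{d'} p_{d'}} ,

\psi_{\bar{\theta}}
  = \sum_d w_d \, \psi_{\theta_d}
  + \sum_d \frac{\theta_d - \bar{\theta}}{\sum_{d'} p_{d'}} \, \psi_{p_d} .
```

When the weights are fixed, $\psi_{p_d} = 0$ for every group and only the first sum remains.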
For ratio_of_avgs, the delta method is applied to the ratio of the weighted-average ATE to the weighted-average APO, using the aggregate IFs for the numerator and denominator.
If IF data are not available (e.g., when the user supplies a manually constructed results table), SEs are computed under an independence approximation and a warning is issued.
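The gap between the IF-based SE and the independence approximation can be illustrated with a small base-R sketch for the fixed-weights case. Everything here is an assumption made for illustration: the IF matrix psi is simulated rather than taken from the package, and the scaling convention (estimate minus truth is approximately the mean of the IF) may differ from the one used internally.

```r
set.seed(1)
n <- 400

# Hypothetical individual-level IFs for two treatment groups (columns).
# A common component mimics control individuals shared by both groups.
shared <- rnorm(n)
psi <- cbind(g24 = shared + rnorm(n), g25 = shared + rnorm(n))
w <- c(g24 = 0.6, g25 = 0.4)   # fixed weights

# Convention assumed here: estimate - truth is approximately mean(psi),
# so the standard error is sqrt(sum(psi^2)) / n.
se_from_if <- function(x) sqrt(sum(x^2)) / length(x)

# IF-based aggregate: weight the IFs first, then take the SE.
psi_agg <- as.vector(psi %*% w)
se_if <- se_from_if(psi_agg)

# Independence approximation: combine group SEs as if they were uncorrelated.
se_groups <- apply(psi, 2, se_from_if)
se_indep <- sqrt(sum(w^2 * se_groups^2))

print(c(se_if = se_if, se_indep = se_indep))
```

The two numbers differ by the cross-group covariance term, which the independence approximation sets to zero.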
Handling missing cells. Not every treatment group produces an
estimate for every event time (due to max_age / min_age
bounds). The function operates on whichever groups are present for each
cell and reports how many via n_groups. If weights is
supplied as a named vector, only the entries whose names appear in the
observed treatment groups are used; the remaining weights are dropped and
the retained weights are renormalised.
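The drop-and-renormalise rule for a named weight vector can be sketched in a few lines of base R. The weights and the set of observed groups below are hypothetical, and matching on names converted to character is an assumption about how the lookup might be done.

```r
# Hypothetical user-supplied weights, named by treatment group.
weights <- c("24" = 0.2, "25" = 0.3, "26" = 0.5)

# Suppose only groups 24 and 25 produced an estimate for this cell.
observed_groups <- c(24, 25)

# Keep the entries whose names match an observed group, then rescale to sum to 1.
kept <- weights[names(weights) %in% as.character(observed_groups)]
kept <- kept / sum(kept)

# Number of contributing groups, analogous to the n_groups column.
n_groups <- length(kept)

print(kept)
print(n_groups)
```

Here the retained weights become 0.4 and 0.6, so their relative sizes are preserved while the weight for group 26 is discarded.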
A data.frame with one row per
event_time by estimand by method by agg_type
combination, containing:
event_time – event time
estimand – "APO", "ATE", "theta", or "Delta_rho"
method – method name
agg_type – one of "avg_of_ratios",
"ratio_of_avgs", "gender_ineq"
est – aggregate estimate
se – standard error (see Details)
ci_l, ci_h – 95% confidence interval bounds
n_groups – number of treatment groups contributing
set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- multiple_treatment_group_analysis(sim, treatment_groups = 24:25, periods_post = 2, verbose = FALSE)
agg <- aggregate_estimands(res)
head(agg)
Child penalty analysis over multiple treatment groups
multiple_treatment_group_analysis(
  data,
  treatment_groups,
  periods_post,
  periods_pre = 4,
  max_age = 999,
  min_age = 0,
  pre = 1,
  Y_name = "Y",
  age_name = "age",
  D_name = "D",
  id_name = "id",
  female_name = "female",
  verbose = TRUE
)
| data | A data.frame or data.table with the needed columns. Names can be mapped via the Y_name, age_name, D_name, id_name, and female_name arguments. |
|---|---|
| treatment_groups | Integer vector of treatment groups (e.g., 24:34). |
| periods_post | Integer H >= 0. Number of post-treatment horizons; evaluates event times e = 0, 1, ..., H with target age a = d + e and control group dp = a + 1. |
| periods_pre | Integer K >= 0 (default 4). Number of pre-treatment horizons. Evaluates e = -K, ..., -pre with a = d + e. For each pre period, tests the same control groups used post, i.e., dp = d + 1, d + 2, ..., d + H + 1. |
| max_age | Integer (default 999). Upper bound; cells with dp > max_age are skipped. |
| min_age | Integer (default 0). Lower bound; cells with a < min_age are skipped. |
| pre | Integer (default 1). Pre-treatment anchor used in APO (uses d - pre). |
| Y_name, age_name, D_name, id_name, female_name | Column name mappings passed to single_treatment_group_analysis(). |
| verbose | Logical (default TRUE). |
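The cell layout described by periods_post, periods_pre, pre, max_age, and min_age can be sketched as follows. The helper enumerate_cells() is hypothetical and not part of the package; it only mirrors the rules stated in the argument descriptions, and treating K < pre as "no pre-treatment cells" is an assumption.

```r
# Enumerate (d, e, a, dp) cells for one treatment group d.
# enumerate_cells() is a hypothetical helper, not part of the package.
enumerate_cells <- function(d, periods_post, periods_pre = 4, pre = 1,
                            max_age = 999, min_age = 0) {
  # Post-treatment: e = 0..H, target age a = d + e, control group dp = a + 1.
  e_post <- 0:periods_post
  post <- data.frame(d = d, e = e_post, a = d + e_post, dp = d + e_post + 1)

  # Pre-treatment: e = -K..-pre, each crossed with the control groups used post.
  pre_cells <- NULL
  if (periods_pre >= pre) {
    e_pre <- seq(-periods_pre, -pre)
    grid <- expand.grid(e = e_pre, dp = d + 1 + 0:periods_post)
    pre_cells <- data.frame(d = d, e = grid$e, a = d + grid$e, dp = grid$dp)
  }

  cells <- rbind(pre_cells, post)
  # Skip cells outside the age bounds.
  cells[cells$dp <= max_age & cells$a >= min_age, ]
}

cells <- enumerate_cells(d = 25, periods_post = 2, periods_pre = 2, max_age = 27)
print(cells)
```

With max_age = 27, every cell that would need control group 28 is dropped, which is why some treatment groups contribute no estimate at later event times.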
A data.frame stacking results from single_treatment_group_analysis().
set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- multiple_treatment_group_analysis(sim, treatment_groups = 24:25, periods_post = 2, verbose = FALSE)
head(res)
Generates a balanced panel with lifecycle earnings, a gender gap, selection on treatment timing, and gendered treatment effects.
simulate_data(n_individuals = 10000, treatment_groups = 24:28, seed = 42)
| n_individuals | Integer. Number of individuals (default 10000). |
|---|---|
| treatment_groups | Integer vector. Treatment groups to include (default 24:28). |
| seed | Integer (default 42). Seed for the random number generator. |
The data-generating process (DGP) includes a permanent individual effect and a transitory shock. A selection term generates positive selection on treatment timing: individuals who have children later earn more, on average, than those who have children earlier.
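As a purely illustrative sketch of a DGP with these ingredients, the base-R function below builds a small balanced panel. It is not the equation used by simulate_data(): the helper name toy_dgp(), the functional form, and every coefficient are made up for the example.

```r
# toy_dgp() is a hypothetical stand-in, NOT the package's simulate_data().
toy_dgp <- function(n = 200, groups = 24:28, ages = 20:30, seed = 1) {
  set.seed(seed)
  id     <- seq_len(n)
  female <- rbinom(n, 1, 0.5)
  D      <- sample(groups, n, replace = TRUE)   # age at first childbirth
  alpha  <- rnorm(n)                            # permanent individual effect

  # Balanced panel: every individual is observed at every age
  # (merge() with no shared columns returns the Cartesian product).
  panel <- merge(data.frame(id, female, D, alpha), data.frame(age = ages))

  treated <- as.numeric(panel$age >= panel$D)
  panel$Y <- 10 + 0.5 * (panel$age - min(ages)) -   # lifecycle profile
    1.0 * panel$female +                            # gender gap
    0.3 * (panel$D - min(groups)) +                 # later parents earn more
    treated * ifelse(panel$female == 1, -3, -0.5) + # gendered treatment effect
    panel$alpha + rnorm(nrow(panel), sd = 0.5)      # permanent + transitory

  panel <- panel[order(panel$id, panel$age), c("id", "female", "age", "D", "Y")]
  rownames(panel) <- NULL
  panel
}

sim_toy <- toy_dgp()
head(sim_toy)
```

The term that rises with D is what creates positive selection on timing: before any treatment effect applies, later parents already have higher average earnings.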
A data.frame with columns id, female,
age, D, Y.
sim <- simulate_data(n_individuals = 2000)
head(sim)
Estimates 15 descriptive estimands for a triplet (treatment group, control group, and target age). Standard errors are calculated from influence functions (IFs), clustered by id.
single_treatment_group_analysis(
  data,
  d,
  dp,
  a,
  pre = 1,
  Y_name = "Y",
  age_name = "age",
  D_name = "D",
  id_name = "id",
  female_name = "female"
)
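| data | A data.frame or data.table. |
|---|---|
| d | Integer. Treatment group (age at first childbirth). |
| dp | Integer. Control group (closest not-yet-treated group). |
| a | Integer. Target age. |
| pre | Integer, default 1. Pre-treatment anchor (uses d - pre). |
| Y_name, age_name, D_name, id_name, female_name | Names of the outcome, age, treatment-group, id, and female-indicator columns in data. |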
The building block is the mean outcome at age $a$ for gender $g$ ($g = 1$ for female) when assigned to group $d$.
The core components are:
APO(g; d, d', a)
ATE(g; d, d', a)
$\theta$(g)
From these, the cross-gender contrasts are formed:
TD
NTD_Conv
NTD_New
TD_Null and NTD_Conv_Null variants are defined analogously under a null-effect-for-fathers bias correction.
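The exact definitions are given in Leventer (2025). As a rough sketch of how such components fit together, the base-R example below assumes textbook difference-in-differences forms: the APO is the treated group's baseline plus the control group's trend, the ATE is the observed mean minus the APO, and the normalised effect is their ratio. The cell means, the helper did_components(), and the female-minus-male sign convention for TD are all assumptions; the NTD variants are left out because their normalisation is not spelled out here.

```r
# Hypothetical cell means of the outcome, indexed by gender, group, and age.
# Treatment group d = 25, control group dp = 26, target age a = 25, pre = 1.
means <- data.frame(
  female = c(1, 1, 1, 1, 0, 0, 0, 0),
  group  = c(25, 25, 26, 26, 25, 25, 26, 26),
  age    = c(24, 25, 24, 25, 24, 25, 24, 25),
  Y      = c(20, 16, 21, 23, 24, 25, 25, 27)
)

# did_components() is a hypothetical helper, not a package function.
did_components <- function(means, g, d, dp, a, pre = 1) {
  m <- function(group, age) {
    means$Y[means$female == g & means$group == group & means$age == age]
  }
  # Assumed DID form: treated group's baseline plus the control group's trend.
  apo <- m(d, d - pre) + (m(dp, a) - m(dp, d - pre))
  ate <- m(d, a) - apo
  c(APO = apo, ATE = ate, theta = ate / apo)
}

fem  <- did_components(means, g = 1, d = 25, dp = 26, a = 25)
male <- did_components(means, g = 0, d = 25, dp = 26, a = 25)

# Assumed sign convention: female minus male.
td <- fem[["ATE"]] - male[["ATE"]]

print(rbind(female = fem, male = male))
print(td)
```

In this toy example the female ATE is -6 against an APO of 22 and the male ATE is -1 against an APO of 26, so the triple difference is -5.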
Internally, influence functions for all pieces are written into temporary
columns of a data.table via compute_mean_if(), and cluster-robust
standard errors are computed by summing the IFs at the id level via se_cluster().
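The idea of summing IFs within id before computing the SE can be sketched in base R. The functions if_mean() and cluster_se() below are hypothetical stand-ins written for this example; they do not reproduce the signatures or internals of compute_mean_if() and se_cluster(), and the scaling convention is an assumption.

```r
# if_mean() and cluster_se() are hypothetical stand-ins, not package functions.

# IF of a cell mean: (y - cell mean) / (share of rows in the cell) inside the
# cell and 0 elsewhere, so IFs of different cells share one row index.
if_mean <- function(y, in_cell) {
  p <- mean(in_cell)
  ifelse(in_cell, (y - mean(y[in_cell])) / p, 0)
}

# Cluster-robust SE: sum the IF within id, then treat the sums as independent.
cluster_se <- function(psi, id) {
  psi_id <- tapply(psi, id, sum)
  sqrt(sum(psi_id^2)) / length(psi)
}

set.seed(1)
id <- rep(1:50, each = 4)                    # 50 individuals, 4 rows each
y  <- rnorm(50)[id] + rnorm(200, sd = 0.5)   # outcomes correlated within id
in_cell <- rep(c(TRUE, FALSE), each = 100)   # first 25 individuals form the cell

psi <- if_mean(y, in_cell)
se_clustered <- cluster_se(psi, id)
se_naive     <- sqrt(sum(psi^2)) / length(psi)

print(c(clustered = se_clustered, naive = se_naive))
```

Because repeated observations of the same person are positively correlated here, the clustered SE exceeds the naive one that treats every row as independent.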
A data.frame with one row per estimand/method combination:
estimand – one of "APO", "ATE", "theta", "Delta_rho"
method – one of "DID_Female", "DID_Male", "TD", "NTD_Conv", "NTD_New", "TD_Null", "NTD_Conv_Null"
est – estimate
se – cluster-robust standard error
n_female_treat, n_female_control, n_male_treat, n_male_control – sample counts
Requires helper functions compute_mean_if() and se_cluster().
set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- single_treatment_group_analysis(sim, d = 25, dp = 26, a = 26, pre = 1)
head(res)