Package 'childpen'

Title: Identification and Estimation of Child Penalties
Description: Tools to simulate child-penalty data and estimate DID, TD, and NTD identification frameworks from Leventer (2025), "Identification of Child Penalties" <doi:10.48550/arXiv.2602.07486>.
Authors: Dor Leventer [aut, cre]
Maintainer: Dor Leventer <[email protected]>
License: MIT + file LICENSE
Version: 0.2.3
Built: 2026-06-02 18:55:02 UTC
Source: https://github.com/cran/childpen

Help Index


Aggregate estimands across treatment groups

Description

Takes the stacked output of multiple_treatment_group_analysis() and computes three aggregate estimands across treatment groups for each event time:

Usage

aggregate_estimands(
  results,
  weights = "sample",
  methods = c("DID_Female", "DID_Male", "TD", "NTD_Conv", "NTD_New"),
  include_pre = FALSE
)

Arguments

results

A data.frame as returned by multiple_treatment_group_analysis(), with at minimum the columns d, event_time, estimand, method, est, and se. If results carries influence-function data (attached automatically by multiple_treatment_group_analysis()), standard errors account for shared control groups across treatment groups.

weights

How to weight treatment groups. One of:

  • "sample" (default): use sample-proportion weights as in Leventer (2025). Within-gender weights wg,d=ng,d/ng,d~w_{g,d} = n_{g,d}/\sum n_{g,\tilde d} are used for DID_Female and DID_Male; cross-gender weights wd=nd/nd~w_d = n_d / \sum n_{\tilde d} for TD, NTD_Conv, and NTD_New. Standard errors account for estimation of the weights.

  • NULL: uniform weights (equal weight per group).

  • A named numeric vector: custom fixed weights (names = treatment groups as characters). Values are renormalised to sum to 1.

methods

Character vector of methods to aggregate. Defaults to all five main methods.

include_pre

Logical. If TRUE, also aggregate pre-treatment event times (event_time < 0). Default FALSE.

Details

avg_of_ratios (θAgg,1\theta_{\text{Agg},1})

Weighted average of the group-specific normalised effects θ(g,d,d+e)\theta(g, d, d+e) across treatment groups dd. This is the preferred estimand because it averages effects that are already scaled by each group's baseline.

ratio_of_avgs (θAgg,2\theta_{\text{Agg},2})

Ratio of the weighted-average ATE to the weighted-average APO. The implicit weight on each group is pdAPOdp_d \cdot \text{APO}_d, giving higher-earning groups more influence.

gender_ineq (ΔρAgg\Delta\rho_{\text{Agg}})

Weighted average of NTD_New (estimand == "Delta_rho") across treatment groups – the aggregate gender-inequality estimand.

Standard errors. When the results object carries influence-function (IF) data from multiple_treatment_group_analysis(), aggregate SEs account for dependence across treatment groups caused by shared control individuals.

With weights = "sample", the IF additionally accounts for estimation of the weights, following the formula in Leventer (2025, Appendix G):

ψA(e)=d[wdψBd+BdA(e)Mψpd]\psi_{A(e)} = \sum_d \left[ w_d\,\psi_{B_d} + \frac{B_d - A(e)}{M}\,\psi_{p_d} \right]

where M=dpdM = \sum_d p_d and ψpd\psi_{p_d} is the IF of the group proportion.

With fixed weights (NULL or a named vector), the second term drops out and the IF reduces to dwdψBd\sum_d w_d\,\psi_{B_d}.

For ratio_of_avgs, the delta method is applied to the ratio μˉATE/μˉAPO\bar\mu_{\text{ATE}} / \bar\mu_{\text{APO}} using the aggregate IFs for the numerator and denominator.

If IF data is not available (e.g., when the user supplies a manually constructed results table), SEs are computed under an independence approximation with a warning.

Handling missing cells. Not every treatment group produces an estimate for every event time (due to max_age / min_age bounds). The function operates on whichever groups are present for each cell and reports how many via n_groups. If weights is supplied as a named vector, only the entries whose names appear in the observed treatment groups are used; the remaining weights are dropped and the retained weights are renormalised.

Value

A data.frame with one row per event_time by estimand by method by agg_type combination, containing:

  • event_time – event time

  • estimand"APO", "ATE", "theta", or "Delta_rho"

  • method – method name

  • agg_type – one of "avg_of_ratios", "ratio_of_avgs", "gender_ineq"

  • est – aggregate estimate

  • se – standard error (see Details)

  • ci_l, ci_h – 95 \

  • n_groups – number of treatment groups contributing

Examples

set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- multiple_treatment_group_analysis(sim, treatment_groups = 24:25,
                                         periods_post = 2, verbose = FALSE)
agg <- aggregate_estimands(res)
head(agg)

Child penalty analysis over multiple treatment groups

Description

Child penalty analysis over multiple treatment groups

Usage

multiple_treatment_group_analysis(
  data,
  treatment_groups,
  periods_post,
  periods_pre = 4,
  max_age = 999,
  min_age = 0,
  pre = 1,
  Y_name = "Y",
  age_name = "age",
  D_name = "D",
  id_name = "id",
  female_name = "female",
  verbose = TRUE
)

Arguments

data

A data.frame or data.table with the needed columns. Names can be mapped via Y_name, age_name, D_name, id_name, female_name.

treatment_groups

Integer vector of treatment groups (e.g., 24:34).

periods_post

Integer H >= 0. Post-treatment horizons; evaluates event times e = 0, 1, ..., H with target age a = d + e and control dp = a + 1.

periods_pre

Integer K >= 0 (default 4). Number of pre-treatment horizons. Evaluates e = -K, ..., -pre with a = d + e. For each pre period, tests the same control offsets used post, i.e., dp = d + 1, 2, ..., H + 1. Set NULL to skip pre-trends.

max_age

Integer (default 999). Upper bound; cells with dp > max_age are skipped.

min_age

Integer (default 0). Lower bound; cells with a < min_age are skipped.

pre

Integer (default 1). Pre-treatment anchor used in APO (uses d - pre).

Y_name, age_name, D_name, id_name, female_name

Column name mappings passed to prep_data_table().

verbose

Logical (default TRUE). Print progress messages.

Value

A data.frame stacking results from single_treatment_group_analysis().

Examples

set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- multiple_treatment_group_analysis(sim, treatment_groups = 24:25, periods_post = 2,
                                         verbose = FALSE)
head(res)

Simulate panel data for child-penalty estimation

Description

Generates a balanced panel with lifecycle earnings, a gender gap, selection on treatment timing, and gendered treatment effects. The DGP is:

Usage

simulate_data(n_individuals = 10000, treatment_groups = 24:28, seed = 42)

Arguments

n_individuals

Integer. Number of individuals (default 10 000).

treatment_groups

Integer vector. Treatment groups to include (default 24:28).

seed

Integer or NULL. RNG seed (default 42). The caller's RNG state is saved and restored on exit, so calling this function does not alter the global random stream. Set to NULL to draw from the current RNG state without reseeding.

Details

logYit=μ0+λDi+αi+β1(a20)+β2(a20)2+γ1[f]+θf1[f,aD]+θm1[m,aD]+εit\log Y_{it} = \mu_0 + \lambda D_i + \alpha_i + \beta_1 (a - 20) + \beta_2 (a - 20)^2 + \gamma \cdot \mathbf{1}[f] + \theta_f \cdot \mathbf{1}[f, a \ge D] + \theta_m \cdot \mathbf{1}[m, a \ge D] + \varepsilon_{it}

where αiN(0,σα2)\alpha_i \sim N(0,\sigma_\alpha^2) is a permanent individual effect and εitN(0,σε2)\varepsilon_{it} \sim N(0,\sigma_\varepsilon^2) is a transitory shock. The term λDi\lambda D_i generates positive selection on treatment timing: individuals who have children later earn more, on average, than those who have children earlier.

Value

A data.frame with columns id, female, age, D, Y.

Examples

sim <- simulate_data(n_individuals = 2000)
head(sim)

Unconditional estimands for a single treatment group

Description

Estimates 15 descriptive estimands for triplet (treatment group, control group and target age). SEs are calcualted using influence-function (IF) calculations with clustering within id.

Usage

single_treatment_group_analysis(
  data,
  d,
  dp,
  a,
  pre = 1,
  Y_name = "Y",
  age_name = "age",
  D_name = "D",
  id_name = "id",
  female_name = "female"
)

Arguments

data

A data.table with columns:

  • id — cluster identifier (i.e., person)

  • age — integer age

  • female — 0/1 indicator (1 = females)

  • D — treatment group

  • Y — numeric outcome

d

Integer. Treatment group (age at first childbirth)

dp

Integer. Control group (closest not-yet-treated group)

a

Integer. Target age.

pre

Integer, default 1. Offset used for the pre-treatment anchor.

Y_name, age_name, D_name, id_name, female_name

Column name mappings passed to prep_data_table().

Details

Let Y(a,g,d)Y(a, g, d^\star) denote the mean outcome at age aa for gender g{0,1}g \in \{0,1\} (1 = female) when assigned to group dd^\star. The core components are:

  • APO(g; d, d', a) =Y(dpre,g,d)+Y(a,g,d)Y(dpre,g,d)= Y(d-\mathrm{pre}, g, d) + Y(a, g, d') - Y(d-\mathrm{pre}, g, d')

  • ATE(g; d, d', a) =Y(a,g,d)APO(g;d,d,a)= Y(a, g, d) - \mathrm{APO}(g; d, d', a)

  • θ\theta(g) =ATE(g)/APO(g)= \mathrm{ATE}(g) / \mathrm{APO}(g)

From these, the cross-gender contrasts are formed:

  • TD =ATE(F)ATE(M)= \mathrm{ATE}(F) - \mathrm{ATE}(M)

  • NTD_Conv =θ(F)θ(M)= \theta(F) - \theta(M)

  • NTD_New =Y(a,F,d)Y(a,M,d)APO(F)APO(M)= \frac{Y(a,F,d)}{Y(a,M,d)} - \frac{\mathrm{APO}(F)}{\mathrm{APO}(M)}

  • TD Null and NTD_Conv_Null variants are defined analogously under a null-effect-for-fathers bias-correction.

Internally, influence functions for all pieces are written into temporary columns of a data.table via compute_mean_if(), and cluster-robust standard errors are computed by summing the IFs at the id level via se_cluster().

Value

A data.frame with one row per estimand/method combination:

  • estimand — one of "APO", "ATE", "theta", "Delta_rho"

  • method — one of "DID_Female", "DID_Male", "TD", "NTD_Conv", "NTD_New", "TD_Null", "NTD_Conv_Null"

  • est — estimate

  • se — cluster-robust standard error

  • n_female_treat, n_female_control, n_male_treat, n_male_control — sample counts

Note

Requires helper functions compute_mean_if() and se_cluster().

Examples

set.seed(1)
sim <- simulate_data(n_individuals = 500)
res <- single_treatment_group_analysis(sim, d = 25, dp = 26, a = 26, pre = 1)
head(res)