---
title: "Asymmetric Poisson Pseudo-Maximum Likelihood (APPML)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Asymmetric Poisson Pseudo-Maximum Likelihood (APPML)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```r
library(capybara)
```

## Overview

Standard Poisson Pseudo-Maximum Likelihood (PPML) estimates the conditional mean of the outcome
variable. `fepoisson_asymmetric()` extends this by fitting conditional expectiles, a
generalization that lets you examine different parts of the conditional distribution while
keeping the efficiency gains of high-dimensional fixed effects.

The implementation follows Bergstrand et al. (2025): instead of minimizing a symmetric loss, the
estimator minimizes an *asymmetric* weighted loss controlled by a single parameter `expectile`
($\tau \in (0, 1)$):

- $\tau = 0.5$ — standard PPML (symmetric, estimates the conditional mean).
- $\tau < 0.5$ — places more weight on observations with *negative* residuals, pulling estimates
  toward the lower part of the distribution.
- $\tau > 0.5$ — places more weight on observations with *positive* residuals, shifting estimates
  toward the upper part of the distribution.
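
To make the role of $\tau$ concrete, the following base R sketch computes an *unconditional* expectile by repeatedly re-weighting a mean. It is only an illustration of the asymmetric weights described above, not the code used by `fepoisson_asymmetric()`; the helper `expectile_sketch()` and the toy vector `y` are hypothetical names introduced here.

```r
# Hypothetical helper: unconditional expectile via an iteratively re-weighted mean
expectile_sketch <- function(y, tau, tol = 1e-10, max_iter = 100) {
  mu <- mean(y) # start from the ordinary mean
  for (i in seq_len(max_iter)) {
    # Negative residuals (y < mu) get weight 1 - tau, all others get weight tau
    w <- ifelse(y < mu, 1 - tau, tau)
    mu_new <- weighted.mean(y, w)
    if (abs(mu_new - mu) < tol) break
    mu <- mu_new
  }
  mu_new
}

y <- c(1, 2, 3, 4, 10) # toy data, for illustration only
sapply(c(tau10 = 0.1, tau50 = 0.5, tau90 = 0.9), function(tau) expectile_sketch(y, tau))
```

With $\tau = 0.5$ all weights are equal and the result is the ordinary mean; smaller and larger values of $\tau$ move the result below and above it.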

The `expectile` is set through `fit_control()` and defaults to `0.5`, so if you don't specify it,
`fepoisson_asymmetric()` will behave like standard PPML.

## Estimating lower and upper expectiles

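Shifting `expectile` away from 0.5 lets you trace how different parts of the conditional
distribution of `trade` vary with the gravity regressors (`ldist`, `border`, `comlang`, and
`colony`), after absorbing the exporter-year (`exp_year`) and importer-year (`imp_year`) fixed
effects.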

```r
ross2004_subset <- ross2004[ross2004$year %in% seq(1989, 1999, 5), ]
ross2004_subset$trade <- exp(ross2004_subset$ltrade)
ross2004_subset$exp_year <- paste0(ross2004_subset$ctry1, ross2004_subset$year)
ross2004_subset$imp_year <- paste0(ross2004_subset$ctry2, ross2004_subset$year)

# 10th expectile: more sensitive to small values of trade
fit10 <- fepoisson_asymmetric(
  trade ~ ldist + border + comlang + colony | exp_year + imp_year,
  data    = ross2004_subset,
  control = fit_control(expectile = 0.1)
)

summary(fit10)
```

```r
Formula: trade ~ ldist + border + comlang + colony | exp_year + imp_year

Family: Poisson

Estimates:

|         | Estimate | Std. Error | z value     | Pr(>|z|)  |
|---------|----------|------------|-------------|-----------|
| ldist   |  -1.0916 |     0.0000 | -39358.0438 | 0.0000 ** |
| border  |   0.3419 |     0.0001 |   5729.9222 | 0.0000 ** |
| comlang |   0.3352 |     0.0001 |   5700.9311 | 0.0000 ** |
| colony  |   0.3942 |     0.0001 |   5416.8894 | 0.0000 ** |

Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10

Fixed effects:
  exp_year: 457
  imp_year: 457

Number of observations: Full 21450; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 17
```

```r
# 90th expectile: more sensitive to large values of trade
fit90 <- fepoisson_asymmetric(
  trade ~ ldist + border + comlang + colony | exp_year + imp_year,
  data    = ross2004_subset,
  control = fit_control(expectile = 0.9)
)

summary(fit90)
```

```r
Formula: trade ~ ldist + border + comlang + colony | exp_year + imp_year

Family: Poisson

Estimates:

|         | Estimate | Std. Error | z value     | Pr(>|z|)  |
|---------|----------|------------|-------------|-----------|
| ldist   |  -0.9096 |     0.0000 | -45409.0443 | 0.0000 ** |
| border  |   0.3160 |     0.0000 |   7382.8698 | 0.0000 ** |
| comlang |   0.1897 |     0.0000 |   4789.5546 | 0.0000 ** |
| colony  |   0.4657 |     0.0001 |   7910.0068 | 0.0000 ** |

Significance codes: ** p < 0.01; * p < 0.05; + p < 0.10

Fixed effects:
  exp_year: 457
  imp_year: 457

Number of observations: Full 21450; Missing 0; Perfect classification 0 

Number of Fisher Scoring iterations: 18
```

## Comparing coefficients across expectiles

A natural use case is comparing how the effect of a regressor changes across the distribution.
The table below collects the coefficients at the two expectile levels:

```r
summary_table(
  fit10, fit90,
  model_names = c("10th expectile", "90th expectile")
)
```

```r
|     Variable     |   10th expectile   |   90th expectile   |
|------------------|--------------------|--------------------|
| ldist            |           -1.092** |           -0.910** |
|                  |            (0.000) |            (0.000) |
| border           |            0.342** |            0.316** |
|                  |            (0.000) |            (0.000) |
| comlang          |            0.335** |            0.190** |
|                  |            (0.000) |            (0.000) |
| colony           |            0.394** |            0.466** |
|                  |            (0.000) |            (0.000) |
|                  |                    |                    |
| Fixed effects    |                    |                    |
| exp_year         |                Yes |                Yes |
| imp_year         |                Yes |                Yes |
|                  |                    |                    |
| N                |             21,450 |             21,450 |

Standard errors in parenthesis
Significance levels: ** p < 0.01; * p < 0.05; + p < 0.10
```

The log-distance coefficient shrinks in magnitude from `fit10` to `fit90` (-1.092 versus
-0.910), which indicates that distance is a weaker deterrent to trade at the top of the
conditional distribution than at the bottom.
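
Because the model uses a log link, the coefficients can be converted into approximate percentage effects. The sketch below does this arithmetic in base R with the estimates copied from the table above: `exp(b) - 1` for the dummy regressors, and the implied change in trade for a 10% increase in distance. The object names (`dummy_coefs`, `pct_effect`, `dist_effect`) are hypothetical, and the numbers are point estimates only, so the calculation ignores sampling uncertainty.

```r
# Coefficients on the dummy regressors, copied from the comparison table
dummy_coefs <- rbind(
  border  = c(tau10 = 0.342, tau90 = 0.316),
  comlang = c(tau10 = 0.335, tau90 = 0.190),
  colony  = c(tau10 = 0.394, tau90 = 0.466)
)

# Percentage change in the fitted expectile when a dummy switches from 0 to 1
pct_effect <- 100 * (exp(dummy_coefs) - 1)
round(pct_effect, 1)

# Percentage change implied by a 10% increase in distance (ldist is in logs)
dist_effect <- 100 * (1.1^c(tau10 = -1.092, tau90 = -0.910) - 1)
round(dist_effect, 1)
```

Read this way, the gap between the two columns is largest for `comlang`, whose implied effect is noticeably smaller at the 90th expectile than at the 10th.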

## Convergence diagnostics

The outer APPML loop updates observation weights until the coefficient vector stops changing.
You can inspect convergence through the returned list elements:

```r
cat("Outer loop converged:", fit10$conv_outer, "\n")
cat("Outer iterations:    ", fit10$iter_outer, "\n")
cat("Expectile used:      ", fit10$expectile, "\n")
```

```r
Outer loop converged: TRUE 
Outer iterations:     4 
Expectile used:       0.1
```
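
The re-weighting idea behind the outer loop can be sketched in base R without fixed effects. This is a simplified illustration under stated assumptions, not capybara's implementation: the data are simulated, `appml_sketch()` is a hypothetical helper, and the inner weighted Poisson fit uses `glm.fit()` from the stats package with a `quasipoisson()` family.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
X <- cbind(intercept = 1, x = x)

# Hypothetical helper: asymmetric re-weighting around a weighted Poisson fit
appml_sketch <- function(y, X, tau, tol = 1e-8, max_iter = 50) {
  w <- rep(1, length(y)) # first pass is ordinary PPML
  beta <- rep(NA_real_, ncol(X))
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    fit <- glm.fit(X, y, weights = w, family = quasipoisson())
    beta_new <- fit$coefficients
    mu <- fit$fitted.values
    # Outer-loop check: stop once the coefficient vector stops changing
    if (iter > 1 && max(abs(beta_new - beta)) < tol) {
      converged <- TRUE
      beta <- beta_new
      break
    }
    beta <- beta_new
    # Negative residuals get weight 1 - tau, all others get weight tau
    w <- ifelse(y < mu, 1 - tau, tau)
  }
  list(coefficients = beta, fitted = mu, converged = converged, iterations = iter)
}

sk10 <- appml_sketch(y, X, tau = 0.1)
sk50 <- appml_sketch(y, X, tau = 0.5)
sk90 <- appml_sketch(y, X, tau = 0.9)
rbind(tau10 = sk10$coefficients, tau50 = sk50$coefficients, tau90 = sk90$coefficients)
```

In this sketch, `converged` and `iterations` play the same role as `conv_outer` and `iter_outer` in the fitted object, and `tau = 0.5` reproduces an ordinary Poisson fit because all observations receive the same weight.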

## References

Bergstrand, Jeffrey H., Matthew W. Clance, and J. M. C. Santos Silva. "The tails of gravity: Using
expectiles to quantify the trade-margins effects of economic integration agreements."
*Journal of International Economics* (2025): 104145.