Panel count data are ubiquitous; examples include monthly product sales and daily video views. There are two common issues with modeling panel count data:
The PanelCount package implements multiple models to address both issues. Specifically, it supports the estimation of the following models: PoissonRE, PLN_RE, ProbitRE, ProbitRE_PoissonRE, and ProbitRE_PLNRE.
Let i and t index individual and time, respectively. The conditional mean of the PoissonRE model is specified as follows:
$$E[y_{it} \mid x_{it}, v_i] = \exp(\beta x_{it}' + \sigma v_i)$$
where $x_{it}$ represents the set of covariates influencing the outcome $y_{it}$, and $v_i$ denotes the individual-level random effect, which is assumed to follow the standard normal distribution. The scaled random effect $\sigma v_i$ therefore has variance $\sigma^2$.
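Because $v_i$ is standard normal, the random effect can be integrated out of the conditional mean in closed form using the lognormal moment $E[\exp(\sigma v_i)] = \exp(\sigma^2/2)$. The short derivation below is a reading aid rather than something taken from the package documentation, and it assumes that $v_i$ is independent of $x_{it}$:

$$E[y_{it} \mid x_{it}] = E_{v_i}\left[\exp(\beta x_{it}' + \sigma v_i)\right] = \exp\left(\beta x_{it}' + \frac{\sigma^2}{2}\right).$$

Under that assumption, unobserved heterogeneity scales the marginal mean by $\exp(\sigma^2/2)$, which acts like an intercept shift and leaves the slope coefficients in $\beta$ unchanged.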
The conditional mean of the PLN_RE model is specified as follows:
$$E[y_{it} \mid x_{it}, v_i, \epsilon_{it}] = \exp(\beta x_{it}' + \sigma v_i + \gamma \epsilon_{it})$$
where $v_i$ represents the individual-level random effect and $\epsilon_{it}$ represents the individual-time level random effect. Both are assumed to follow a standard normal distribution, so $\sigma^2$ and $\gamma^2$ are the variances of the scaled individual and individual-time level random effects, respectively.
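The individual-time effect is what separates PLN_RE from PoissonRE. One consequence, a standard property of Poisson-lognormal mixtures that is not stated in the text above, is overdispersion: the variance of the counts exceeds their mean. The sketch below illustrates this with base R only; the baseline mean, the value of `gamma`, and all variable names are illustrative assumptions, and no PanelCount function is called.

```r
# Minimal sketch: compare the dispersion of a plain Poisson draw with a
# Poisson-lognormal draw. All names and values here are illustrative.
set.seed(123)
n     <- 100000
mu    <- exp(0.5)   # baseline conditional mean (assumed)
gamma <- 1          # assumed scale of the individual-time effect

eps    <- rnorm(n)                          # individual-time random effect
y_pois <- rpois(n, mu)                      # no extra heterogeneity
y_pln  <- rpois(n, mu * exp(gamma * eps))   # Poisson-lognormal mixture

# Variance-to-mean ratio: close to 1 for Poisson, above 1 for the mixture
disp_pois <- var(y_pois) / mean(y_pois)
disp_pln  <- var(y_pln) / mean(y_pln)
print(c(poisson = disp_pois, pln = disp_pln))
```

A variance-to-mean ratio well above 1 in observed counts is one informal sign that the extra $\gamma \epsilon_{it}$ term may be worth including.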
The specification of the ProbitRE model is given by
$$z_{it} = 1(\alpha w_{it}' + \delta u_i + \xi_{it} > 0)$$
where $w_{it}$ represents the set of covariates influencing individual $i$'s decision in period $t$, and $u_i$ represents the individual-level random effect, which follows the standard normal distribution. The variance of the scaled random effect $\delta u_i$ is captured by $\delta^2$. The variance of the individual-time level random shock $\xi_{it}$ is normalized to 1 to ensure unique identification.
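Two implications of this specification may help when reading the estimates. They are stated here as a reading aid, assuming $\xi_{it}$ is standard normal (as the probit label suggests) and independent of $u_i$ and across periods. First, conditional on the random effect, the choice probability is a standard probit:

$$P(z_{it} = 1 \mid w_{it}, u_i) = \Phi(\alpha w_{it}' + \delta u_i).$$

Second, the composite latent errors of the same individual in two different periods $s \neq t$ share the term $\delta u_i$, so their correlation is

$$\mathrm{Corr}(\delta u_i + \xi_{it},\ \delta u_i + \xi_{is}) = \frac{\delta^2}{\delta^2 + 1}.$$

For example, $\delta = 1$ implies a within-individual latent correlation of 0.5.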
This model estimates the following selection and outcome equations jointly, allowing the random effects at the individual level to be correlated.
Selection Equation (ProbitRE): $$z_{it} = 1(\alpha w_{it}' + \delta u_i + \xi_{it} > 0)$$
Outcome Equation (PoissonRE): $$E[y_{it} \mid x_{it}, v_i] = \exp(\beta x_{it}' + \sigma v_i)$$
Sample Selection at individual level: $$\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right). $$
This model estimates the following selection and outcome equations jointly, allowing the random effects at the individual level, and the error terms at the individual-time level, to be correlated across the two equations.
Selection Equation (ProbitRE): $$z_{it} = 1(\alpha w_{it}' + \delta u_i + \xi_{it} > 0)$$
Outcome Equation (PLN_RE): $$E[y_{it} \mid x_{it}, v_i, \epsilon_{it}] = \exp(\beta x_{it}' + \sigma v_i + \gamma \epsilon_{it})$$
Sample Selection at individual level: $$\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right). $$
Sample Selection at individual-time level: $$\begin{pmatrix} \xi_{it} \\ \epsilon_{it} \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & \tau \\ \tau & 1 \end{pmatrix}\right). $$
We begin by simulating a dataset with 200 individuals and 10 periods using the following data generating process (DGP):
$$z_{it} = 1(1 + x_{it} + w_{it} + u_i + \xi_{it} > 0)$$
$$E[y_{it} \mid x_{it}, v_i, \epsilon_{it}] = \exp(-1 + x_{it} + v_i + \epsilon_{it})$$
$$\begin{pmatrix} u_i \\ v_i \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & 0.25 \\ 0.25 & 1 \end{pmatrix}\right). $$
$$\begin{pmatrix} \xi_{it} \\ \epsilon_{it} \end{pmatrix}\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right). $$
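Matching this DGP to the notation of the model specifications gives the true parameter values against which the estimates can be read. The mapping below is inferred from the equations rather than quoted from the package, and it treats the constant as the first covariate in both equations:

$$\alpha = (1, 1, 1),\quad \delta = 1,\quad \beta = (-1, 1),\quad \sigma = 1,\quad \gamma = 1,\quad \rho = 0.25,\quad \tau = 0.5.$$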
library(MASS)
library(PanelCount)
set.seed(1)
N = 200
periods = 10
rho = 0.25
tau = 0.5
id = rep(1:N, each=periods)
time = rep(1:periods, N)
x = rnorm(N*periods)
w = rnorm(N*periods)
# correlated random effects at the individual level
r = mvrnorm(N, mu=c(0,0), Sigma=matrix(c(1,rho,rho,1), nrow=2))
r1 = rep(r[,1], each=periods)
r2 = rep(r[,2], each=periods)
# correlated error terms at the individual-time level
e = mvrnorm(N*periods, mu=c(0,0), Sigma=matrix(c(1,tau,tau,1), nrow=2))
e1 = e[,1]
e2 = e[,2]
# selection
z = as.numeric(1+x+w+r1+e1>0)
# outcome
y = rpois(N*periods, exp(-1+x+r2+e2))
y[z==0] = NA
sim = data.frame(id,time,x,w,z,y)
head(sim)
#>   id time          x           w z  y
#> 1  1    1 -0.6264538 -0.88614959 0 NA
#> 2  1    2  0.1836433 -1.92225490 0 NA
#> 3  1    3 -0.8356286  1.61970074 1  0
#> 4  1    4  1.5952808  0.51926990 1  0
#> 5  1    5  0.3295078 -0.05584993 1  0
#> 6  1    6 -0.8204684  0.69641761 1  0
Next, we estimate the true parameters in the above DGP using various models. In particular, we examine whether we can recover the true value of x’s coefficient in the second stage.
m1 = PoissonRE(y~x, data=sim[!is.na(sim$y), ], id.name='id', verbose=-1)
round(m1$estimates, digits=3)
#> estimate se z p lci uci
#> (Intercept) -0.498 0.091 -5.496 0 -0.675 -0.320
#> x 0.887 0.024 36.800 0 0.840 0.934
#> sigma 1.125 0.066 17.065 0 1.003 1.262
The estimate of x is biased because the above model fails to consider the individual-time level random effects and the sample selection issue in the true DGP.
m2 = PLN_RE(y~x, data=sim[!is.na(sim$y), ], id.name='id', verbose=-1)
round(m2$estimates, digits=3)
#> estimate se z p lci uci
#> (Intercept) -0.921 0.100 -9.204 0 -1.117 -0.725
#> x 0.932 0.052 17.964 0 0.830 1.034
#> sigma 1.056 0.078 13.519 0 0.914 1.221
#> gamma 0.951 0.048 19.721 0 0.861 1.050
The estimate of x is still biased because the above model fails to consider the sample selection issue in the true DGP.
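To see why ignoring selection matters, the following sketch isolates the individual-time level mechanism. It draws correlated selection and outcome errors with the DGP's value of `tau`, applies a simplified, hypothetical selection rule with a fixed index of 1 (no covariates or random effects), and compares the mean of the outcome error before and after selection. The variable names and the fixed index are assumptions made for illustration; only `MASS::mvrnorm` and base R are used.

```r
library(MASS)  # for mvrnorm

set.seed(2024)
n   <- 100000
tau <- 0.5     # correlation between selection and outcome errors, as in the DGP

# Draw correlated (xi, eps) pairs from a bivariate standard normal
e   <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, tau, tau, 1), nrow = 2))
xi  <- e[, 1]
eps <- e[, 2]

# Hypothetical selection rule with a fixed index of 1: observed if 1 + xi > 0
selected <- (1 + xi) > 0

# Mean of the outcome error in the full sample and in the selected sample
mean_all      <- mean(eps)
mean_selected <- mean(eps[selected])

# Truncated-normal benchmark: tau * dnorm(index) / pnorm(index)
mean_theory <- tau * dnorm(1) / pnorm(1)

print(round(c(all = mean_all, selected = mean_selected, theory = mean_theory), 3))
```

The outcome error has mean zero in the full sample but a positive mean among selected observations. Because the selection index in the DGP rises with x, this shift shrinks as x increases, which is one way to understand why models that ignore selection tend to understate the coefficient of x in this simulation.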
m3 = ProbitRE(z~x+w, data=sim, id.name='id', verbose=-1)
round(m3$estimates, digits=3)
#> estimate se z p lci uci
#> (Intercept) 0.985 0.086 11.401 0 0.816 1.154
#> x 0.991 0.058 17.013 0 0.877 1.105
#> w 1.041 0.060 17.220 0 0.923 1.160
#> delta 0.937 0.084 11.130 0 0.786 1.117
The specification of this model is consistent with the DGP of the first stage. Therefore, it can produce consistent estimates of the parameters in the first stage.
m4 = ProbitRE_PoissonRE(z~x+w, y~x, data=sim, id.name='id', verbose=-1)
round(m4$estimates, digits=3)
#> estimate se z p lci uci
#> (Intercept) 0.962 0.090 10.683 0.000 0.786 1.139
#> x 0.994 0.060 16.683 0.000 0.877 1.110
#> w 1.042 0.062 16.896 0.000 0.921 1.163
#> (Intercept) -0.542 0.041 -13.185 0.000 -0.623 -0.462
#> x 0.884 0.010 92.126 0.000 0.865 0.903
#> delta 0.923 0.080 11.511 0.000 0.778 1.094
#> sigma 0.741 0.015 48.099 0.000 0.711 0.771
#> rho 0.213 0.071 2.993 0.003 0.070 0.347
The results above the second “(Intercept)” are for the first stage. After accounting for self-selection at the individual level, the estimate of x in the second stage is still biased because the true DGP also includes self-selection at the individual-time level.
# The estimation may take up to 1 minute
m5 = ProbitRE_PLNRE(z~x+w, y~x, data=sim, id.name='id', verbose=-1)
round(m5$estimates, digits=3)
#> estimate se z p lci uci
#> (Intercept) 1.013 0.092 10.986 0 0.832 1.193
#> x 0.995 0.058 17.111 0 0.881 1.109
#> w 1.049 0.061 17.240 0 0.930 1.169
#> (Intercept) -1.021 0.116 -8.776 0 -1.249 -0.793
#> x 1.028 0.043 23.678 0 0.943 1.113
#> delta 0.947 0.083 11.458 0 0.798 1.123
#> sigma 1.170 0.055 21.271 0 1.067 1.283
#> gamma 0.995 0.037 26.556 0 0.924 1.071
#> rho 0.338 0.094 3.609 0 0.144 0.507
#> tau 0.598 0.139 4.318 0 0.261 0.805
The results above the second “(Intercept)” are for the first stage. The specification of this model is consistent with the true DGP and hence the estimate of x is very close to its true value 1.
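The second-stage estimates of x from the four outcome models can be placed side by side. The sketch below copies the point estimates and interval bounds by hand from the outputs shown above (nothing is re-estimated), then reports each estimate's distance from the true value of 1 and whether the reported interval contains it. The data frame and column names are illustrative.

```r
# Point estimates and interval bounds for x in the outcome equation,
# copied by hand from the outputs shown above (not recomputed here)
res <- data.frame(
  model    = c("PoissonRE", "PLN_RE", "ProbitRE_PoissonRE", "ProbitRE_PLNRE"),
  estimate = c(0.887, 0.932, 0.884, 1.028),
  lci      = c(0.840, 0.830, 0.865, 0.943),
  uci      = c(0.934, 1.034, 0.903, 1.113)
)
true_beta <- 1  # coefficient of x in the outcome equation of the DGP

# Distance from the truth and whether the reported interval contains it
res$abs_error   <- abs(res$estimate - true_beta)
res$covers_true <- res$lci <= true_beta & true_beta <= res$uci
print(res)
```

In this single simulated dataset, ProbitRE_PLNRE has the smallest absolute error. One draw cannot establish bias on its own, so a repeated-simulation study would give a firmer comparison.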
The estimation of ProbitRE_PoissonRE and ProbitRE_PLNRE does not require a variable like w that exclusively influences the first-stage outcome, but the identification is stronger with such a variable.
Peng, J., & Van den Bulte, C. (2023). Participation vs. Effectiveness in Sponsored Tweet Campaigns: A Quality-Quantity Conundrum. Management Science (forthcoming). Available at SSRN: https://www.ssrn.com/abstract=2702053
Peng, J., & Van den Bulte, C. (2015). How to Better Target and Incent Paid Endorsers in Social Advertising Campaigns: A Field Experiment. 2015 International Conference on Information Systems. https://aisel.aisnet.org/icis2015/proceedings/SocialMedia/24/