| Title: | Heckman Selection Models Based on Bayesian Analysis |
|---|---|
| Description: | Implements Heckman selection models using a Bayesian approach via 'Stan' and compares the performance of normal, Student’s t, and contaminated normal distributions in addressing complexities and selection bias (Heeju Lim, Victor E. Lachos, and Victor H. Lachos, Bayesian analysis of flexible Heckman selection models using Hamiltonian Monte Carlo, 2025, under submission). |
| Authors: | Heeju Lim [aut, cre], Victor E. Lachos [aut], Victor H. Lachos [aut] |
| Maintainer: | Heeju Lim <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-28 08:53:41 UTC |
| Source: | https://github.com/cran/HeckmanStan |
'geraHeckman()' generates a random sample from the Heckman selection model (Normal, Student-t or CN).
geraHeckman(x, w, beta, gamma, sigma2, rho, nu, family = "T")geraHeckman(x, w, beta, gamma, sigma2, rho, nu, family = "T")
x |
A covariate matrix for the response y. |
w |
A covariate matrix for the missing indicator cc. |
beta |
Values for the beta vector. |
gamma |
Values for the gamma vector. |
sigma2 |
Value for the variance. |
rho |
Value for the dependence between the response and missing value. |
nu |
When using the t- distribution, the initial value for the degrees of freedom. |
family |
The distribution family to be used (Normal, T, or CN). |
Return an object with the response (y) and missing values (cc).
Lachos, V. H., Prates, M. O., & Dey, D. K. (2021). Heckman selection-t model: Parameter estimation via the EM-algorithm. Journal of Multivariate Analysis, 184, 104737.
n <- 100 rho <- .6 cens <- 0.25 nu <- 4 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) family <- "T" c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma<- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) data <- geraHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family)n <- 100 rho <- .6 cens <- 0.25 nu <- 4 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) family <- "T" c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma<- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) data <- geraHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family)
'HeckmanStan()' fits the Heckman selection model using a Bayesian approach to address sample selection bias.
HeckmanStan( y, x, w, cc, family = "CN", init = "random", thin = 5, chains = 1, iter = 10, warmup = 5 )HeckmanStan( y, x, w, cc, family = "CN", init = "random", thin = 5, chains = 1, iter = 10, warmup = 5 )
y |
A response vector. |
x |
A covariate matrix for the response y. |
w |
A covariate matrix for the missing indicator cc. |
cc |
A missing indicator vector (1=observed, 0=missing) . |
family |
The distribution family to be used (Normal, T, or CN). |
init |
Parameters specifies the initial values for model parameters. |
thin |
An Interval at which samples are retained from the MCMC process to reduce autocorrelation. |
chains |
The number of chains to run during the MCMC sampling. Running multiple chains is useful for checking convergence. |
iter |
The total number of iterations for the MCMC sampling, determining how many samples will be drawn. |
warmup |
The number of initial iterations that will be discarded as the algorithm stabilizes before collecting samples. |
An object of class HeckmanStan, which is a list containing two elements:
list[[1]]: Includes inference results from the Stan model, along with EAIC and EBIC.
list[[2]]: Includes the HPC confidence intervals, along with LOOIC, WAIC, and CPO.
################################################################################ # Simulation ################################################################################ library(mvtnorm) n<- 100 w<- cbind(1,rnorm(n),rnorm(n)) x<- cbind(w[,1:2]) family="CN" sigma2<- 1 rho<-0.7 beta<- c(1,0.5) gamma<- c(1,0.3,-.5) nu=c(0.1,0.1) data<-geraHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y<-data$y cc<-data$cc # Fit Heckman Normal Stan model fit.n_stan <- HeckmanStan(y, x, w, cc, family="Normal" , thin = 5, chains = 1, iter = 10000, warmup = 1000) qoi=c("beta","gamma","sigma_e","sigma2", "rho","EAIC","EBIC") print(fit.n_stan[[1]],par=qoi) print(fit.n_stan[[2]]) require(rstan) plot(fit.n_stan[[1]], pars=qoi) plot(fit.n_stan[[1]], plotfun="hist", pars=qoi) plot(fit.n_stan[[1]], plotfun="trace", pars=qoi) plot(fit.n_stan[[1]], plotfun = "rhat")################################################################################ # Simulation ################################################################################ library(mvtnorm) n<- 100 w<- cbind(1,rnorm(n),rnorm(n)) x<- cbind(w[,1:2]) family="CN" sigma2<- 1 rho<-0.7 beta<- c(1,0.5) gamma<- c(1,0.3,-.5) nu=c(0.1,0.1) data<-geraHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y<-data$y cc<-data$cc # Fit Heckman Normal Stan model fit.n_stan <- HeckmanStan(y, x, w, cc, family="Normal" , thin = 5, chains = 1, iter = 10000, warmup = 1000) qoi=c("beta","gamma","sigma_e","sigma2", "rho","EAIC","EBIC") print(fit.n_stan[[1]],par=qoi) print(fit.n_stan[[2]]) require(rstan) plot(fit.n_stan[[1]], pars=qoi) plot(fit.n_stan[[1]], plotfun="hist", pars=qoi) plot(fit.n_stan[[1]], plotfun="trace", pars=qoi) plot(fit.n_stan[[1]], plotfun = "rhat")
This dataset is an extract from the 2001 Medical Expenditure Panel Survey (MEPS), providing information on ambulatory expenditures and various demographic and health-related variables. It has been used for illustrative examples by Cameron and Trivedi (2009, Chapter 16).
data(MEPS2001)data(MEPS2001)
A data frame with 3,328 observations on the following 22 variables.
Education status
Age
Income
Gender
Self-reported health status, very good
Self-reported health status, good
Hospital expenditures
Total number of chronic diseases
Family support
Dummy variable for hospital expenditures
Age squared
Interaction between age and gender
Self-reported health status, fair or poor
Year of survey
Type of insurance
Ambulatory expenditures
Log of ambulatory expenditures
Ethnicity
Insurance type, version 1
Dummy variable for ambulatory expenditures
Log-transformed ambulatory expenditures
Insurance status
2001 Medical Expenditure Panel Survey by the Agency for Healthcare Research and Quality.
Cameron, C.A. and Trivedi, P.K. (2009). *Microeconometrics Using Stata*. College Station, TX: Stata Press.
data(MEPS2001) head(MEPS2001)data(MEPS2001) head(MEPS2001)
Cross-section data originating from the 1976 Panel Study of Income Dynamics (PSID). The dataset includes demographic and economic characteristics of married women and their husbands, and is commonly used for analyzing female labor force participation.
data(PSID1976)data(PSID1976)
A data frame with 753 observations on the following 22 variables.
age of the woman
dummy for living in a city
dummy for college education (woman)
years of education (woman)
years of labor market experience
father's years of education
family income in 1,000s
husband's age
dummy for husband's college education
husband's years of education
husband's weekly working hours
woman's weekly working hours
husband's log hourly wage
mother's years of education
number of children older than 6
dummy for woman's labor force participation
replacement wage (predicted wage if not employed)
marginal tax rate
state unemployment rate
log hourly wage of the woman
number of children 6 or younger
Mroz, T. A. (1987). The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions. *Econometrica*, 55(4), 765–799.
data(PSID1976) head(PSID1976)data(PSID1976) head(PSID1976)