Title: | Fit Normal, Student-t or Contaminated Normal Heckman Selection Models |
---|---|
Description: | It performs maximum likelihood estimation for the Heckman selection model (Normal, Student-t or Contaminated normal) using an EM-algorithm <doi:10.1016/j.jmva.2021.104737>. It also performs influence diagnostic through global and local influence for four possible perturbation schema. |
Authors: | Marcos Prates [aut, cre, trl] , Victor Lachos [aut] , Dipak Dey [aut], Marcos Oliveira [aut, ctb], Christian Galarza [ctb], Katherine Loor [ctb], Alejandro Ordonez [ctb] |
Maintainer: | Marcos Prates <[email protected]> |
License: | GPL-2 |
Version: | 0.2-1 |
Built: | 2024-12-04 07:17:01 UTC |
Source: | CRAN |
This function performs case deletion analysis based on a HeckmanEM object (not available for the contaminated normal model).
CaseDeletion(object)
CaseDeletion(object)
object |
A HeckmanEM object. |
This function uses the case deletion approach to study the impact of deleting one or more observations from the dataset on the parameters estimates, using the ideas of Cook (1977) and Zhu et.al. (2001). The GD vector contains the generalized Cook distances
where is the gradient vector after dropping the
th observation, and
is the Hessian matrix. The benchmark was adapted using the suggestion of Barros et al. (2010). We use
as the benchmark for the
, with
representing the number of estimated model parameters.
A list of class HeckmanEM.deletion
with a vector GD of dimension (see details), and a benchmark value.
M. Barros, M. Galea, M. González, V. Leiva, Influence diagnostics in the Tobit censored response model, Statistical Methods & Applications 19 (2010) 379–397.
R. D. Cook, Detection of influential observation in linear regression, Technometrics 19 (1977) 15–18.
H. Zhu, S. Lee, B. Wei, J. Zhou, Case-deletion measures for models with incomplete data, Biometrika 88 (2001) 727–737.
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1, runif(n, -1, 1), rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df = nu) sigma2 <- 1 beta <- c(1, 0.5) gamma <- c(1, 0.3, -.5) gamma[1] <- -c * sqrt(sigma2) datas <- rHeckman(x, w, beta, gamma, sigma2, rho = 0.6, nu, family = "T") y <- datas$y cc <- datas$cc heckmodel <- HeckmanEM(y, x, w, cc, family = "Normal", iter.max = 50) global <- CaseDeletion(heckmodel) plot(global)
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1, runif(n, -1, 1), rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df = nu) sigma2 <- 1 beta <- c(1, 0.5) gamma <- c(1, 0.3, -.5) gamma[1] <- -c * sqrt(sigma2) datas <- rHeckman(x, w, beta, gamma, sigma2, rho = 0.6, nu, family = "T") y <- datas$y cc <- datas$cc heckmodel <- HeckmanEM(y, x, w, cc, family = "Normal", iter.max = 50) global <- CaseDeletion(heckmodel) plot(global)
'HeckmanEM()' fits the Heckman selection model.
HeckmanEM( y, x, w, cc, nu = 4, family = "T", error = 1e-05, iter.max = 500, im = TRUE, criteria = TRUE, verbose = TRUE )
HeckmanEM( y, x, w, cc, nu = 4, family = "T", error = 1e-05, iter.max = 500, im = TRUE, criteria = TRUE, verbose = TRUE )
y |
A response vector. |
x |
A covariate matrix for the response y. |
w |
A covariate matrix for the missing indicator cc. |
cc |
A missing indicator vector (1=observed, 0=missing) . |
nu |
When using the t- distribution, the initial value for the degrees of freedom. When using the CN distribution, the initial values for the proportion of bad observations and the degree of contamination. |
family |
The family to be used (Normal, T or CN). |
error |
The absolute convergence error for the EM stopping rule. |
iter.max |
The maximum number of iterations for the EM algorithm. |
im |
TRUE/FALSE, boolean to decide if the standard errors of the parameters should be computed. |
criteria |
TRUE/FALSE, boolean to decide if the model selection criteria should be computed. |
verbose |
TRUE/FALSE, boolean to decide if the progress should be printed in the screen. |
An object of the class HeckmanEM with all the outputs provided from the function.
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho = 0.6,nu,family="T") y <- datas$y cc <- datas$cc # Normal EM res.N <- HeckmanEM(y, x, w, cc, family="Normal",iter.max = 50) # Student-t EM res.T <- HeckmanEM(y, x, w, cc, nu = 4, family="T", iter.max = 50)
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho = 0.6,nu,family="T") y <- datas$y cc <- datas$cc # Normal EM res.N <- HeckmanEM(y, x, w, cc, family="Normal",iter.max = 50) # Student-t EM res.T <- HeckmanEM(y, x, w, cc, nu = 4, family="T", iter.max = 50)
'HeckmanEM.criteria()' calculates the AIC, AICc, BIC selection criteria for the fitted Heckman selection model.
HeckmanEM.criteria(obj)
HeckmanEM.criteria(obj)
obj |
An object of the class HeckmanEM. |
The calculated AIC, AICc, and BIC for the parameters of the fitted model.
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = TRUE, criteria = FALSE) cr <- HeckmanEM.criteria(res)
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = TRUE, criteria = FALSE) cr <- HeckmanEM.criteria(res)
'HeckmanEM.envelope()' plots the envelope for the fitted Heckman selection model.
HeckmanEM.envelope(obj, envelope = 0.95, ...)
HeckmanEM.envelope(obj, envelope = 0.95, ...)
obj |
An object of the class HeckmanEM. |
envelope |
The envelope coverage percentage. |
... |
Other option for chart.QQPlot from PerformanceAnalytics package. |
A residual plot of the fitted data and its envelope.
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = TRUE, criteria = TRUE) HeckmanEM.envelope(res, ylab="Normalized Quantile Residuals",xlab="Standard normal quantile", line="quartile", col=c(20,1), pch=19, ylim = c(-5,4))
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = TRUE, criteria = TRUE) HeckmanEM.envelope(res, ylab="Normalized Quantile Residuals",xlab="Standard normal quantile", line="quartile", col=c(20,1), pch=19, ylim = c(-5,4))
'HeckmanEM.infomat()' estimates the standard errors for the parameters for the fitted Heckman selection model.
HeckmanEM.infomat(obj)
HeckmanEM.infomat(obj)
obj |
An object of the class HeckmanEM. |
The estimated standard errors for the parameters of the fitted model.
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = FALSE, criteria = TRUE) im <- HeckmanEM.infomat(res)
n <- 100 family <- "T" nu <- 4 rho <- .6 cens <- .25 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma <- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) set.seed(1) datas <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family) y <- datas$y cc <- datas$cc res <- HeckmanEM(y, x, w, cc, nu = 4, family = "Normal", error = 1e-05, iter.max = 500, im = FALSE, criteria = TRUE) im <- HeckmanEM.infomat(res)
This function conducts influence analysis for a given 'HeckmanEM' object. The influence analysis can be conducted using several types of perturbations (not available for the contaminated Normal model).
Influence(object, type, colx = NULL, k = 3.5)
Influence(object, type, colx = NULL, k = 3.5)
object |
A 'HeckmanEM' object to perform the analysis on. |
type |
A character string indicating the type of perturbation to perform. The types can be one of "case-weight","scale","response" and"exploratory". |
colx |
Optional integer specifying the position of the column in the
object's matrix |
k |
A positive real constant to be used in the benchmark calculation: |
Returns a list of class HeckmanEM.influence
with the following elements:
M0 |
A vector of length |
benchmark |
|
influent |
A vector with the influential observations' positions. |
type |
The perturbation type. |
Marcos Oliveira
Insert any relevant references here.
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1, runif(n, -1, 1), rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df = nu) sigma2 <- 1 beta <- c(1, 0.5) gamma <- c(1, 0.3, -.5) gamma[1] <- -c * sqrt(sigma2) datas <- rHeckman(x, w, beta, gamma, sigma2, rho = 0.6, nu, family = "T") y <- datas$y cc <- datas$cc heckmodel <- HeckmanEM(y, x, w, cc, family = "Normal", iter.max = 50) global <- CaseDeletion(heckmodel) plot(global) local_case <- Influence(heckmodel, type = "case-weight") local_case$influent # influential values here! plot(local_case) local_scale <- Influence(heckmodel, type = "scale") local_scale$influent # influential values here! plot(local_scale) local_response <- Influence(heckmodel, type = "response") local_response$influent # influential values here! plot(local_response) local_explore <- Influence(heckmodel, type = "exploratory", colx = 2) local_explore$influent # influential values here! plot(local_explore)
n <- 100 nu <- 3 cens <- 0.25 set.seed(13) w <- cbind(1, runif(n, -1, 1), rnorm(n)) x <- cbind(w[,1:2]) c <- qt(cens, df = nu) sigma2 <- 1 beta <- c(1, 0.5) gamma <- c(1, 0.3, -.5) gamma[1] <- -c * sqrt(sigma2) datas <- rHeckman(x, w, beta, gamma, sigma2, rho = 0.6, nu, family = "T") y <- datas$y cc <- datas$cc heckmodel <- HeckmanEM(y, x, w, cc, family = "Normal", iter.max = 50) global <- CaseDeletion(heckmodel) plot(global) local_case <- Influence(heckmodel, type = "case-weight") local_case$influent # influential values here! plot(local_case) local_scale <- Influence(heckmodel, type = "scale") local_scale$influent # influential values here! plot(local_scale) local_response <- Influence(heckmodel, type = "response") local_response$influent # influential values here! plot(local_response) local_explore <- Influence(heckmodel, type = "exploratory", colx = 2) local_explore$influent # influential values here! plot(local_explore)
'rHeckman()' generates a random sample from the Heckman selection model (Normal, Student-t or CN).
rHeckman(x, w, beta, gamma, sigma2, rho, nu = 4, family = "T")
rHeckman(x, w, beta, gamma, sigma2, rho, nu = 4, family = "T")
x |
A covariate matrix for the response y. |
w |
A covariate matrix for the missing indicator cc. |
beta |
Values for the beta vector. |
gamma |
Values for the gamma vector. |
sigma2 |
Value for the variance. |
rho |
Value for the dependence between the response and missing value. |
nu |
When using the t- distribution, the initial value for the degrees of freedom. When using the CN distribution, the initial values for the proportion of bad observations and the degree of contamination. |
family |
The family to be used (Normal, T, or CN). |
Return an object with the response (y) and missing values (cc).
n <- 100 rho <- .6 cens <- 0.25 nu <- 4 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) family <- "T" c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma<- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) data <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family)
n <- 100 rho <- .6 cens <- 0.25 nu <- 4 set.seed(20200527) w <- cbind(1,runif(n,-1,1),rnorm(n)) x <- cbind(w[,1:2]) family <- "T" c <- qt(cens, df=nu) sigma2 <- 1 beta <- c(1,0.5) gamma<- c(1,0.3,-.5) gamma[1] <- -c*sqrt(sigma2) data <- rHeckman(x,w,beta,gamma,sigma2,rho,nu,family=family)