Title: | Ordinal Probit Switching Regression |
---|---|
Description: | Estimates ordinal probit switching regression models - a Heckman type selection model with an ordinal selection and continuous outcomes. Different model specifications are allowed for each treatment/regime. For more details on the method, see Wang & Mokhtarian (2024) <doi:10.1016/j.tra.2024.104072> or Chiburis & Lokshin (2007) <doi:10.1177/1536867X0700700202>. |
Authors: | Daniel Heimgartner [aut, cre, cph] , Xinyi Wang [aut] |
Maintainer: | Daniel Heimgartner <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.2 |
Built: | 2024-11-03 01:03:39 UTC |
Source: | CRAN |
Estimates ordinal probit switching regression models - a Heckman type selection model with an ordinal selection and continuous outcomes. Different model specifications are allowed for each treatment/regime. For more details on the method, see Wang & Mokhtarian (2024) doi:10.1016/j.tra.2024.104072 or Chiburis & Lokshin (2007) doi:10.1177/1536867X0700700202.
Maintainer: Daniel Heimgartner [email protected] (ORCID) [copyright holder]
Authors:
Xinyi Wang [email protected] (ORCID)
Useful links:
Conducts likelihood ratio tests for one or more OPSR model fits.
## S3 method for class 'opsr' anova(object, ...)
## S3 method for class 'opsr' anova(object, ...)
object |
an object of class |
... |
additional objects of class |
If only a single object is passed then the model is compared to the null model
(opsr_null_model
). If more than one object is specified, a likelihood ratio
test is conducted for each pair of neighboring models. It is conventional to
list the models from smallest to largest, but this is up to the user.
An object of class "anova.opsr"
.
stats::anova
, print.anova.opsr
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) fit_intercept <- update(fit, ~ . | 1) anova(fit) anova(fit_null, fit_intercept, fit)
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) fit_intercept <- update(fit, ~ . | 1) anova(fit) anova(fit_null, fit_intercept, fit)
This is the main method called when using functions from the texreg-package
.
## S4 method for signature 'opsr' extract( model, beside = FALSE, include.structural = TRUE, include.selection = TRUE, include.outcome = TRUE, include.pseudoR2 = FALSE, include.R2 = FALSE, ... )
## S4 method for signature 'opsr' extract( model, beside = FALSE, include.structural = TRUE, include.selection = TRUE, include.outcome = TRUE, include.pseudoR2 = FALSE, include.R2 = FALSE, ... )
model |
an object of class |
beside |
if |
include.structural |
whether or not structural coefficients should be printed. |
include.selection |
whether or not selection coefficients should be printed. |
include.outcome |
whether or not outcome coefficients should be printed. |
include.pseudoR2 |
whether or not the pseudo R2 statistic for the selection component should be printed. See also the 'Details' section. |
include.R2 |
whether or not the R2 statistic for the outcome component should be printed. |
... |
additional arguments passed to |
The extract
method is called internally. Higher-level functions from the
texreg-package
pass arguments via ...
to extract
.
include.pseudoR2
reports both the "equally likely" (EL) and "market share" (MS)
pseudo R2.
A texreg-class
object representing the statistical model.
texreg-package
, texreg::texreg
, texreg::screenreg
and related functions.
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) fit_intercept <- update(fit, ~ . | 1) texreg::screenreg(fit) texreg::screenreg(fit, beside = TRUE) texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE) texreg::screenreg(list(fit_null, fit_intercept, fit))
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) fit_intercept <- update(fit, ~ . | 1) texreg::screenreg(fit) texreg::screenreg(fit, beside = TRUE) texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE) texreg::screenreg(list(fit_null, fit_intercept, fit))
This is the main computation engine wrapped by opsr.fit
.
loglik_cpp(theta, W, X, Y, weights, nReg, nThreads)
loglik_cpp(theta, W, X, Y, weights, nReg, nThreads)
theta |
named coefficient vector as parsed from formula interface |
W |
list of matrices with explanatory variables for selection process for each regime. |
X |
list of matrices with expalanatory varialbes for outcome process for each regime. |
Y |
list of vectors with continuous outcomes for each regime. |
weights |
vector of weights. See also |
nReg |
integer number of regimes. |
nThreads |
number of threads to be used by |
Numeric vector of (weighted) log-likelihood contributions.
Extracting the Model Frame from OPSR Model Fits
## S3 method for class 'opsr' model.frame(formula, ...)
## S3 method for class 'opsr' model.frame(formula, ...)
formula |
an object of class |
... |
a mix of further arguments such as |
A data.frame
containing the variables used in formula$formula
.
Construct Design Matrices for OPSR Model Fits
## S3 method for class 'opsr' model.matrix(object, data, .filter = NULL, ...)
## S3 method for class 'opsr' model.matrix(object, data, .filter = NULL, ...)
object |
an object of class |
data |
a data frame containing the terms from |
.filter |
used internally in |
... |
further arguments passed to or from other methods. |
A list of lists with the design matrices W
(selection process) and
X
(outcome process). Both of these lists have object$nReg
elements (a
separate design matrix for each regime).
model.frame.opsr
, stats::model.matrix
High-level formula interface to the workhorse opsr.fit
.
opsr( formula, data, subset, weights, na.action, start = NULL, fixed = NULL, method = "BFGS", iterlim = 1000, printLevel = 2, nThreads = 1, .get2step = FALSE, .useR = FALSE, .censorRho = TRUE, ... )
opsr( formula, data, subset, weights, na.action, start = NULL, fixed = NULL, method = "BFGS", iterlim = 1000, printLevel = 2, nThreads = 1, .get2step = FALSE, .useR = FALSE, .censorRho = TRUE, ... )
formula |
an object of class |
data |
an optional data frame, list or environment (or object coercible by
|
subset |
an optional vector specifying a subset of observations to be used
in the fitting process. (See additional details in the 'Details' section of
the |
weights |
an optional vector of weights to be used in the fitting process.
Should be |
na.action |
a function which indicates what should happen when the data
contain |
start |
a numeric vector with the starting values (passed to |
fixed |
parameters to be treated as constants at their |
method |
maximzation method (passed to |
iterlim |
maximum number of iterations (passed to |
printLevel |
larger number prints more working information (passed to |
nThreads |
number of threads to be used. Do not pass higher number than
number of ordinal outcomes. See also |
.get2step |
if |
.useR |
if |
.censorRho |
if |
... |
further arguments passed to |
Models for opsr
are specified symbolically. A typical model has the form
ys | yo ~ terms_s | terms_o1 | terms_o2 | ...
. ys
is the ordered (numeric)
response vector (starting from 1, in integer-increasing fashion). For the terms
specification the rules of the regular formula interface apply (see also stats::lm).
The intercept in the terms_s
(selection process) is excluded automatically
(no need to specify -1
). If the user wants to specify the same process for
all continuous outcomes, two processes are enough (ys | yo ~ terms_s | terms_o
).
Note that the model is poorly identifiable if terms_s == terms_o
(same regressors
are used in selection and outcome processes).
An object of class "opsr" "maxLik" "maxim"
.
## simulated data sim_dat <- opsr_simulate() dat <- sim_dat$data # 1000 observations sim_dat$sigma # cov matrix of errors sim_dat$params # ground truth ## specify a model model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 | xo1 + xo2 | xo1 + xo2 model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 # since we use the same specification... ## estimate fit <- opsr(model, dat) ## inference summary(fit) ## using update and model comparison fit_updated <- update(fit, ~ . | 1) # only intercepts for the continuous outcomes ## null model fit_null <- opsr_null_model(fit) ## likelihood ratio test anova(fit_null, fit_updated, fit) ## predict p1 <- predict(fit, group = 1, type = "response") p2 <- predict(fit, group = 1, counterfact = 2, type = "response") plot(p1, p2) abline(a = 0, b = 1, col = "red") ## produce formatted tables texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE)
## simulated data sim_dat <- opsr_simulate() dat <- sim_dat$data # 1000 observations sim_dat$sigma # cov matrix of errors sim_dat$params # ground truth ## specify a model model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 | xo1 + xo2 | xo1 + xo2 model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 # since we use the same specification... ## estimate fit <- opsr(model, dat) ## inference summary(fit) ## using update and model comparison fit_updated <- update(fit, ~ . | 1) # only intercepts for the continuous outcomes ## null model fit_null <- opsr_null_model(fit) ## likelihood ratio test anova(fit_null, fit_updated, fit) ## predict p1 <- predict(fit, group = 1, type = "response") p2 <- predict(fit, group = 1, counterfact = 2, type = "response") plot(p1, p2) abline(a = 0, b = 1, col = "red") ## produce formatted tables texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE)
This is a utility function, used in opsr
and should not be used directly.
Tow-step estimation procedure to generate reasonable starting values.
opsr_2step(W, Xs, Z, Ys)
opsr_2step(W, Xs, Z, Ys)
W |
matrix with explanatory variables for selection process. |
Xs |
list of matrices with expalanatory varialbes for outcome process for each regime. |
Z |
vector with ordinal outcomes (in integer increasing fashion). |
Ys |
list of vectors with continuous outcomes for each regime. |
These estimates can be retrieved by specifying .get2step = TRUE
in opsr
.
Named vector with starting values passed to opsr.fit
.
Since the Heckman two-step estimator includes an estimate in the second step regression, the resulting OLS standard errors and heteroskedasticity-robust standard errors are incorrect (Greene 2002).
Greene WH (2002). LIMDEP Version 8.0 Econometric Modeling Guide, vol. 2.. Econometric Software, Plainview, New York.
OpenMP
is AvailableCheck Whether OpenMP
is Available
opsr_check_omp()
opsr_check_omp()
boolean
This is a utility function, used in opsr
and should not be used directly.
It is included here to document the expected structure of opsr
's start
argument.
Makes sure, the start vector conforms to the expected structure. Adds the
expected parameter names to the numeric vector. Therefore the user has to
conform to the expected order. See 'Details' for further explanation.
opsr_check_start(start, W, Xs)
opsr_check_start(start, W, Xs)
start |
vector of starting values. |
W |
matrix with explanatory variables for selection process. |
Xs |
list of matrices with expalanatory varialbes for outcome process for each regime. |
Expected order: 1. kappa threshold parameters (for ordinal probit model),
2. parameters of the selection process (names starting with s_
), 3. parameters
of the outcome processes (names starting with o[0-9]_
), 4. sigma, 5. rho.
If the same outcome process specification is used in the formula
, the starting
values have to be repeated (i.e., the length of the start
vector has to
correspond to the total number of estimated parameters in the model).
Named numeric vector conforming to the expected structure.
Check Maximum Number of Threads Available
opsr_max_threads()
opsr_max_threads()
integer
Intercept-only model with no error correlation.
opsr_null_model(object, ...)
opsr_null_model(object, ...)
object |
an object of class |
... |
further arguments passed to |
An object of class "opsr.null" "opsr"
.
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) summary(fit_null)
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) fit_null <- opsr_null_model(fit) summary(fit_null)
Extracts the coefficients for each regime
opsr_prepare_coefs(theta, nReg)
opsr_prepare_coefs(theta, nReg)
theta |
named coefficient vector as parsed from formula interface |
nReg |
integer number of regimes. |
Named list of length nReg
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 start <- opsr(model, dat, .get2step = TRUE) opsr_prepare_coefs(start, 3)
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 start <- opsr(model, dat, .get2step = TRUE) opsr_prepare_coefs(start, 3)
Simulates data from an ordinal probit process and separate (for each regime) OLS process where the errors follow a multivariate normal distribution.
opsr_simulate(nobs = 1000, sigma = NULL)
opsr_simulate(nobs = 1000, sigma = NULL)
nobs |
number of observations to simulate. |
sigma |
the covariance matrix of the multivariate normal. |
Three ordinal outcomes are simulated and the distinct design matrices (W
and
X
) are used (if W == X
the model is poorely identified). Variables ys
and
xs
in data
correspond to the selection process and yo
, xo
to the outcome
process.
Named list:
params |
ground truth parameters. |
data |
simulated data (as observed by the researcher). See also 'Details' section. |
errors |
error draws from the multivariate normal (as used in the latent process). |
sigma |
assumed covariance matrix (to generate |
This is the basic computing engine called by opsr
used to fit ordinal
probit switching regression models. Should usually not be used directly.
The log-likelihood function is implemented in C++ which yields a considerable
speed-up. Parallel computation is implemented using OpenMP
.
opsr.fit( Ws, Xs, Ys, start, fixed, weights, method, iterlim, printLevel, nThreads, .useR = FALSE, ... )
opsr.fit( Ws, Xs, Ys, start, fixed, weights, method, iterlim, printLevel, nThreads, .useR = FALSE, ... )
Ws |
list of matrices with explanatory variables for selection process for each regime. |
Xs |
list of matrices with expalanatory varialbes for outcome process for each regime. |
Ys |
list of vectors with continuous outcomes for each regime. |
start |
a numeric vector with the starting values (passed to |
fixed |
parameters to be treated as constants at their |
weights |
a vector of weights to be used in the fitting process. Has to
conform with order ( |
method |
maximzation method (passed to |
iterlim |
maximum number of iterations (passed to |
printLevel |
larger number prints more working information (passed to |
nThreads |
number of threads to be used. Do not pass higher number than
number of ordinal outcomes. See also |
.useR |
if |
... |
further arguments passed to |
object of class "maxLik" "maxim"
.
maxLik::maxLik
, loglik_cpp
, opsr
Obtains predictions for the selection process (probabilities), the outcome process, or returns the inverse mills ratio. Handles also log-transformed outcomes.
## S3 method for class 'opsr' predict( object, newdata, group, counterfact = NULL, type = c("response", "unlog-response", "prob", "mills"), ... )
## S3 method for class 'opsr' predict( object, newdata, group, counterfact = NULL, type = c("response", "unlog-response", "prob", "mills"), ... )
object |
an object of class |
newdata |
an optional data frame in which to look for variables used in
|
group |
predict outcome of this group (regime). |
counterfact |
counterfactual group. |
type |
type of prediction. Can be abbreviated. See 'Details' section for more information. |
... |
further arguments passed to or from other methods. |
Elements are NA_real_
if the group
does not correspond to the observed
regime (selection outcome). This ensures consistent output length.
If the type
argument is "response"
then the continuous outcome is predicted.
Use "unlog-response"
if the outcome response was log-transformed during estimation.
"prob"
returns the probability vector of belonging to group
and "mills"
returns the inverse mills ratio.
a vector of length nrow(newdata)
(or data used during estimation).
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) p <- predict(fit, group = 1, type = "response") fit_log <- update(fit, . | log(yo) ~ .) p_unlog <- predict(fit, group = 1, type = "unlog-response")
sim_dat <- opsr_simulate() dat <- sim_dat$data model <- ys | yo ~ xs1 + xs2 | xo1 + xo2 fit <- opsr(model, dat) p <- predict(fit, group = 1, type = "response") fit_log <- update(fit, . | log(yo) ~ .) p_unlog <- predict(fit, group = 1, type = "unlog-response")
Print Method for ANOVA OPSR Objects
## S3 method for class 'anova.opsr' print( x, digits = max(getOption("digits") - 2L, 3L), signif.stars = getOption("show.signif.stars"), ... )
## S3 method for class 'anova.opsr' print( x, digits = max(getOption("digits") - 2L, 3L), signif.stars = getOption("show.signif.stars"), ... )
x |
an object of class |
digits |
minimal number of significant digits, see |
signif.stars |
if |
... |
further arguments passed to |
Prints tables in a 'pretty' form and returns x
invisibly.
stats::printCoefmat
, anova.opsr
Print Method for Summary OPSR Objects
## S3 method for class 'summary.opsr' print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'summary.opsr' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
and object of class |
digits |
minimum number of significant digits to be used for most numbers (passed to |
... |
further arguments passed to or from other methods. |
Prints summary in 'pretty' form and returns x
invisibly.
stats::printCoefmat
, summary.opsr
Follows the convention that opsr
does the bare minimum model fitting and
inference is performed in summary
.
## S3 method for class 'opsr' summary(object, rob = TRUE, ...)
## S3 method for class 'opsr' summary(object, rob = TRUE, ...)
object |
an object of class |
rob |
if |
... |
further arguments passed to or from other methods. |
An object of class "summary.opsr"
.
In particular the elements GOF
, GOFcomponents
and wald
require further
explanation:
GOF |
Contains the conventional goodness of fit indicators for the full
model. |
GOFcomponents |
Contains the goodness of fit for the model components.
|
wald |
Contains the results of two Wald-tests as conducted with help
of |
Telework data as used in Wang and Mokhtarian (2024).
telework_data
telework_data
Data frame with numeric columns
Respondent ID
Sample weight
Weekly vehicle-miles traveled
Log-transformed VMD, the dependent variable of the outcome model
Teleworking status: 1=Non-TWer, 2=Non-usual TWer, 3=Usual TWer
Sex: female
Mean-centered age
Sqaure of mean-centered age
Race: white only
Race: black only
Race: other
Education: high school or lower
Education: some college
Education: BA or higher
Household income: less than $50,000
Household income: $50,000 to $99,999
Household income: $100,000 or more
Flexible work schedule
Full-time worker
Teleworking feasibility (days/month)
Number of household vehicles
Number of children
Residential location: urban
Residential location: suburban
Residential location: small town
Residential location: rural
Attitude: pro-large-house
Attitude: pro-active-mode
Attitude: pro-car-owning
Attitude: work-interferes-with-family
Attitude: pro-teamwork
Attitude: TW effective teamwork
Attitude: TW enthusiasm
Attitude: TW location flexibility
Region indicator: respondents from WAA MSA
Wang X, Mokhtarian PL (2024). “Examining the Treatment Effect of Teleworking on Vehicle-Miles Driven: Applying an Ordered Probit Selection Model and Incorporating the Role of Travel Stress.” Transportatikon Research Part A, 186, 104072. doi:10.1016/j.tra.2024.104072.
## model as in Xinyi & Mokhtarian (2024) f <- ## ordinal and continuous outcome twing_status | vmd_ln ~ ## selection model edu_2 + edu_3 + hhincome_2 + hhincome_3 + flex_work + work_fulltime + twing_feasibility + att_proactivemode + att_procarowning + att_wif + att_proteamwork + att_tw_effective_teamwork + att_tw_enthusiasm + att_tw_location_flex | ## outcome model NTW female + age_mean + age_mean_sq + race_black + race_other + vehicle + suburban + smalltown + rural + work_fulltime + att_prolargehouse + att_procarowning + region_waa | ## outcome model NUTW edu_2 + edu_3 + suburban + smalltown + rural + work_fulltime + att_prolargehouse + att_proactivemode + att_procarowning | ## outcome model UTW female + hhincome_2 + hhincome_3 + child + suburban + smalltown + rural + att_procarowning + region_waa fit <- opsr(f, telework_data) texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE)
## model as in Xinyi & Mokhtarian (2024) f <- ## ordinal and continuous outcome twing_status | vmd_ln ~ ## selection model edu_2 + edu_3 + hhincome_2 + hhincome_3 + flex_work + work_fulltime + twing_feasibility + att_proactivemode + att_procarowning + att_wif + att_proteamwork + att_tw_effective_teamwork + att_tw_enthusiasm + att_tw_location_flex | ## outcome model NTW female + age_mean + age_mean_sq + race_black + race_other + vehicle + suburban + smalltown + rural + work_fulltime + att_prolargehouse + att_procarowning + region_waa | ## outcome model NUTW edu_2 + edu_3 + suburban + smalltown + rural + work_fulltime + att_prolargehouse + att_proactivemode + att_procarowning | ## outcome model UTW female + hhincome_2 + hhincome_3 + child + suburban + smalltown + rural + att_procarowning + region_waa fit <- opsr(f, telework_data) texreg::screenreg(fit, beside = TRUE, include.pseudoR2 = TRUE, include.R2 = TRUE)