Package 'ivgets' reference manual

Title:	General to Specific Modeling and Indicator Saturation in 2SLS Models
Description:	Provides facilities for general-to-specific model selection over the exogenous regressors in 2SLS models. Furthermore, indicator saturation methods can be used to detect outliers and structural breaks in the sample.
Authors:	Kurle Jonas [aut, cre]
Maintainer:	Kurle Jonas <[email protected]>
License:	GPL (>= 3)
Version:	0.1.2
Built:	2024-10-14 06:45:49 UTC
Source:	CRAN

Artificial data set for illustration.

Description

A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice.

Usage

artificial2sls

Format

A data frame with 100 observations (rows) and 16 variables (columns):

name	variable description
y	dependent variable
x1	intercept
x2	relevant exogenous regressor
x3	irrelevant exogenous regressor
x4	irrelevant exogenous regressor
x5	irrelevant exogenous regressor
x6	irrelevant exogenous regressor
x7	irrelevant exogenous regressor
x8	irrelevant exogenous regressor
x9	irrelevant exogenous regressor
x10	irrelevant exogenous regressor
x11	relevant endogenous regressor
u	structural error (in practice unobserved)
z11	excluded instrument
z12	excluded instrument
id	unique observation identifier

Artificial data set with outliers for illustration.

Description

A data set containing the dependent variable, endogenous and exogenous regressors, and excluded instruments for 2SLS models. The structural error is also stored, even though it is not observed in practice. The errors of some observations are contaminated, which makes these observations outliers.

Usage

artificial2sls_contaminated

Format

A data frame with 100 observations (rows) and 16 variables (columns):

name	variable description
y	dependent variable
x1	intercept
x2	relevant exogenous regressor
x3	irrelevant exogenous regressor
x4	irrelevant exogenous regressor
x5	irrelevant exogenous regressor
x6	irrelevant exogenous regressor
x7	irrelevant exogenous regressor
x8	irrelevant exogenous regressor
x9	irrelevant exogenous regressor
x10	irrelevant exogenous regressor
x11	relevant endogenous regressor
u	structural error (in practice unobserved)
z11	excluded instrument
z12	excluded instrument
id	unique observation identifier

Details

The data frame has two additional attributes: "outliers" stores the indices of the outliers and "magnitude" stores their magnitudes.
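
As a minimal sketch of how such attributes can be read, the following base R code builds a small stand-in data frame and attaches attributes with the two documented names. The data frame `toy` and all values in it are hypothetical and serve only to show the access pattern with `attr()`.

```r
# Hypothetical stand-in for artificial2sls_contaminated (values are made up)
toy <- data.frame(id = 1:5, u = c(0.1, -0.3, 4.2, 0.2, -3.9))

# Attach attributes with the names documented above
attr(toy, "outliers") <- c(3L, 5L)    # row indices of the contaminated errors
attr(toy, "magnitude") <- c(4, -4)    # size of each contamination

# Read the attributes back and look up the affected rows
idx <- attr(toy, "outliers")
mag <- attr(toy, "magnitude")
contaminated_rows <- toy[idx, ]
print(contaminated_rows)
```

The same `attr(x, "outliers")` call should apply to the packaged data set, assuming the attributes are stored as described.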

Artificial data set without outliers prepared for shiny application.

Description

Artificial data set without outliers prepared for shiny application.

Usage

artificial2sls_shiny

Format

A data frame with 100 observations (rows) and 17 variables (columns):

name	variable description
y	dependent variable
x1	intercept
x2	relevant exogenous regressor
x3	irrelevant exogenous regressor
x4	irrelevant exogenous regressor
x5	irrelevant exogenous regressor
x6	irrelevant exogenous regressor
x7	irrelevant exogenous regressor
x8	irrelevant exogenous regressor
x9	irrelevant exogenous regressor
x10	irrelevant exogenous regressor
x11	relevant endogenous regressor
u	structural error (in practice unobserved)
z11	excluded instrument
z12	excluded instrument
id	unique observation identifier
is.outlier	factor variable indicating whether the observation is an outlier (`1`) or not (`0`)

Extract the first and second stage regressors of ivreg formula

Description

extract_variables takes a formula object for ivreg::ivreg(), i.e. in the format y ~ x1 + x2 | x1 + z2, and extracts its elements into a list.

Usage

extract_variables(formula)

Arguments

`formula`	A formula for the `ivreg::ivreg()` function, i.e. in the format `y ~ x1 + x2 | z1 + z2`.

Value

extract_variables returns a list with three components: $yvar stores the name of the dependent variable, $first the names of the regressors of the first stage and $second the names of the second stage regressors.
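
To illustrate the idea, here is a base R sketch that splits a two-part formula into the same three components. The helper `split_iv_formula` is hypothetical and is not the package's implementation. It also assumes that "first stage regressors" refers to the variables to the right of `|` and "second stage regressors" to those to the left.

```r
# Hypothetical helper: split y ~ x1 + x2 | x1 + z2 into its parts.
# R parses this formula as `~`(y, `|`(x1 + x2, x1 + z2)).
split_iv_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  rhs <- formula[[3]]
  # the right-hand side must be a call to the `|` operator
  stopifnot(is.call(rhs), identical(rhs[[1]], as.name("|")))
  list(
    yvar   = all.vars(formula[[2]]),  # dependent variable
    first  = all.vars(rhs[[3]]),      # assumed: right of `|` (first stage)
    second = all.vars(rhs[[2]])       # assumed: left of `|` (second stage)
  )
}

parts <- split_iv_formula(y ~ x1 + x2 | x1 + z2)
str(parts)
```

The sketch only handles plain variable names joined by `+`; transformations or interactions inside the formula would need extra care.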

Function factory for creating indicators from their names

Description

factory_indicators creates a function that takes the name of an indicator and returns the corresponding indicator to be used in a regression. For user-specified indicators, it extracts the corresponding column from the uis matrix.

Usage

factory_indicators(n)

Arguments

`n`	An integer specifying the length of the indicators.

Details

Argument n should equal the number of observations in the data set which will be augmented with the indicators.

The created function takes a name of an indicator and the original uis argument that was used in indicator saturation and returns the indicator.

Value

factory_indicators returns a function called creator().
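
The function-factory pattern can be sketched in base R as follows. This is not the package's code: `make_indicator_factory` is a hypothetical name, and the naming convention (`iis<k>` for an impulse at observation k, `sis<k>` for a step starting at observation k) is an assumption made for the example. Trend indicators are left out for brevity.

```r
# Hypothetical factory: returns a closure that remembers n
make_indicator_factory <- function(n) {
  force(n)
  function(name, uis = NULL) {
    type <- sub("[0-9]+$", "", name)                 # e.g. "iis" from "iis3"
    pos <- as.integer(sub("^[a-z]+", "", name))      # e.g. 3 from "iis3"
    if (type == "iis") return(as.numeric(seq_len(n) == pos))  # impulse
    if (type == "sis") return(as.numeric(seq_len(n) >= pos))  # step
    # user-specified indicator: look up the column by name
    if (!is.null(uis) && name %in% colnames(uis)) return(uis[, name])
    stop("unknown indicator name: ", name)
  }
}

creator <- make_indicator_factory(5)
impulse <- creator("iis3")
step <- creator("sis3")

# A user-specified indicator is taken from the supplied matrix
user_mat <- matrix(c(1, 0, 0, 0, 1), ncol = 1, dimnames = list(NULL, "brk"))
user_ind <- creator("brk", uis = user_mat)
```

Because `n` is captured when the factory is called, the returned function only needs the indicator name (and `uis` for user-specified indicators) at the point of use.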

Gets modeling on an ivreg object

Description

gets.ivreg conducts general-to-specific model selection on an ivreg object returned by ivreg::ivreg().

Usage

## S3 method for class 'ivreg'
gets(
  x,
  gum.result = NULL,
  t.pval = 0.05,
  wald.pval = t.pval,
  do.pet = TRUE,
  ar.LjungB = NULL,
  arch.LjungB = NULL,
  normality.JarqueB = NULL,
  include.gum = FALSE,
  include.1cut = FALSE,
  include.empty = FALSE,
  max.paths = NULL,
  turbo = FALSE,
  tol = 1e-07,
  max.regs = NULL,
  print.searchinfo = TRUE,
  alarm = FALSE,
  keep_exog = NULL,
  overid = NULL,
  weak = NULL,
  ...
)

Arguments

`x`	An object of class `"ivreg"`, as returned by `ivreg::ivreg()`.
`gum.result`	a `list` with the estimation results of the General Unrestricted Model (GUM), or `NULL` (default). If the estimation results of the GUM are already available, they can be provided via this argument, in which case re-estimation of the GUM is skipped
`t.pval`	`numeric` value between 0 and 1. The significance level used for the two-sided regressor significance t-tests
`wald.pval`	`numeric` value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs)
`do.pet`	`logical`. If `TRUE` (default), then a Parsimonious Encompassing Test (PET) against the GUM is undertaken at each regressor removal for the joint significance of all the deleted regressors along the current path. If `FALSE`, then a PET is not undertaken at each regressor removal
`ar.LjungB`	a two element `vector` or `NULL` (default). In the former case, the first element contains the AR-order, the second element the significance level. If `NULL`, then a test for autocorrelation is not conducted
`arch.LjungB`	a two element `vector` or `NULL` (default). In the former case, the first element contains the ARCH-order, the second element the significance level. If `NULL`, then a test for ARCH is not conducted
`normality.JarqueB`	`NULL` or a `numeric` value between 0 and 1. In the latter case, a test for non-normality is conducted using a significance level equal to `normality.JarqueB`. If `NULL`, then no test for non-normality is conducted
`include.gum`	`logical`. If `TRUE`, then the GUM (i.e. the starting model) is included among the terminal models. If `FALSE` (default), then the GUM is not included
`include.1cut`	`logical`. If `TRUE`, then the 1-cut model is added to the list of terminal models. If `FALSE` (default), then the 1-cut is not added, unless it is a terminal model in one of the paths
`include.empty`	`logical`. If `TRUE`, then the empty model is added to the list of terminal models. If `FALSE` (default), then the empty model is not added, unless it is a terminal model in one of the paths
`max.paths`	`NULL` (default) or an integer equal to or greater than 0. If `NULL`, then there is no limit to the number of paths. If an integer, for example 1, then this integer constitutes the maximum number of paths searched
`turbo`	`logical`. If `TRUE`, then (parts of) paths are not searched twice (or more) unnecessarily, which can yield a significant speed gain. However, checking whether the search has arrived at a point it has already visited comes with a slight computational overhead. Accordingly, if `turbo=TRUE`, the total search time might in fact be higher than if `turbo=FALSE`. This happens if estimation is very fast, say, less than a quarter of a second. Hence the default is `FALSE`
`tol`	numeric value (`default = 1e-07`). The tolerance for detecting linear dependencies in the columns of the variance-covariance matrix when computing the Wald-statistic used in the Parsimonious Encompassing Tests (PETs), see the `qr.solve` function
`max.regs`	`integer`. The maximum number of regressions along a deletion path. Do not alter unless you know what you are doing!
`print.searchinfo`	`logical`. If `TRUE` (default), then a message is printed whenever simplification along a new path is started
`alarm`	`logical`. If `TRUE`, then a sound or beep is emitted (in order to alert the user) when the model selection ends
`keep_exog`	A numeric vector of indices or a character vector of names corresponding to the exogenous regressors in the `data` that should not be selected over. Default `NULL` means that selection is over all exogenous regressors. If an intercept has been specified in the `formula` but is not already included in the `data`, then it can be kept by either including the index `0` or the character `"Intercept"`, respectively, as an element in `keep_exog`.
`overid`	`NULL` if no Sargan test of overidentifying restrictions should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`weak`	`NULL` if no weak instrument F-test on the first stage should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`...`	Further arguments passed to or from other methods.

Value

Returns a list of class "ivgets" with three named elements. $selection stores the selection results from getsFun (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the GUM does not pass all diagnostics. $keep stores the names of the regressors that were not selected over, including the endogenous regressors, which are always kept.

Indicator saturation modeling on an ivreg object

Description

isat.ivreg conducts indicator saturation model selection on an ivreg object returned by ivreg::ivreg().

Usage

## S3 method for class 'ivreg'
isat(
  y,
  iis = TRUE,
  sis = FALSE,
  tis = FALSE,
  uis = FALSE,
  blocks = NULL,
  ratio.threshold = 0.8,
  max.block.size = 30,
  t.pval = 1/NROW(data),
  wald.pval = t.pval,
  do.pet = FALSE,
  ar.LjungB = NULL,
  arch.LjungB = NULL,
  normality.JarqueB = NULL,
  info.method = c("sc", "aic", "hq"),
  include.1cut = FALSE,
  include.empty = FALSE,
  max.paths = NULL,
  parallel.options = NULL,
  turbo = FALSE,
  tol = 1e-07,
  max.regs = NULL,
  print.searchinfo = TRUE,
  plot = NULL,
  alarm = FALSE,
  overid = NULL,
  weak = NULL,
  fast = FALSE,
  ...
)

Arguments

`y`	An object of class `"ivreg"`, as returned by `ivreg::ivreg()`.
`iis`	logical. If `TRUE`, impulse indicator saturation is performed.
`sis`	logical. If `TRUE`, step indicator saturation is performed.
`tis`	logical. If `TRUE`, trend indicator saturation is performed.
`uis`	a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list.
`blocks`	`NULL` (default), an integer (the number of blocks) or a user-specified `list` that indicates how blocks should be put together. If `NULL`, then the number of blocks is determined automatically
`ratio.threshold`	Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = `NULL`
`max.block.size`	Maximum size of a block of variables to be selected over, default=30. The block size used is the maximum given by either `ratio.threshold` or `max.block.size`
`t.pval`	numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests
`wald.pval`	numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs). By default, the numeric value is the same as that of `t.pval`
`do.pet`	logical. If `TRUE`, then a Parsimonious Encompassing Test (PET) against the GUM is undertaken at each regressor removal for the joint significance of all the deleted regressors along the current path. If `FALSE` (default), then a PET is not undertaken at each regressor removal
`ar.LjungB`	a two-item list with names `lag` and `pval`, or `NULL` (default). In the former case, `lag` contains the order of the Ljung and Box (1979) test for serial correlation in the standardised residuals, and `pval` contains the significance level. If `lag=NULL` (default), then the order used is that of the estimated 'arx' object. If `ar.LjungB=NULL`, then the standardised residuals are not checked for serial correlation
`arch.LjungB`	a two-item list with names `lag` and `pval`, or `NULL` (default). In the former case, `lag` contains the order of the Ljung and Box (1979) test for serial correlation in the squared standardised residuals, and `pval` contains the significance level. If `lag=NULL` (default), then the order used is that of the estimated 'arx' object. If `arch.LjungB=NULL`, then the standardised residuals are not checked for ARCH
`normality.JarqueB`	`NULL` (the default) or a value between 0 and 1. In the latter case, a test for non-normality is conducted using a significance level equal to `normality.JarqueB`. If `NULL`, then no test for non-normality is conducted
`info.method`	character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion
`include.1cut`	logical. If `TRUE`, then the 1-cut model is included among the terminal models, if it passes the diagnostic tests, even if it is not equal to one of the terminals. If FALSE (default), then the 1-cut model is not included (unless it is one of the terminals)
`include.empty`	logical. If `TRUE`, then an empty model is included among the terminal models, if it passes the diagnostic tests, even if it is not equal to one of the terminals. If FALSE (default), then the empty model is not included (unless it is one of the terminals)
`max.paths`	`NULL` (default) or an integer indicating the maximum number of paths to search
`parallel.options`	`NULL` or an integer, i.e. the number of cores/threads to be used for parallel computing (implemented w/`makeCluster` and `parLapply`)
`turbo`	logical. If `TRUE`, then (parts of) paths are not searched twice (or more) unnecessarily, which can yield a significant speed gain. However, checking whether the search has arrived at a point it has already visited comes with a slight computational overhead. Accordingly, if `turbo=TRUE`, the total search time might in fact be higher than if `turbo=FALSE`. This happens if estimation is very fast, say, less than a quarter of a second. Hence the default is `FALSE`
`tol`	numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see `qr` function). Only used if LAPACK is FALSE (default)
`max.regs`	integer. The maximum number of regressions along a deletion path. It is not recommended that this is altered
`print.searchinfo`	logical. If `TRUE` (default), then a message is printed whenever simplification along a new path is started, and whenever regressors are dropped due to exact multicollinearity
`plot`	NULL or logical. If `TRUE`, then the fitted values and the residuals of the final model are plotted after model selection. If NULL (default), then the value set by `options` determines whether a plot is produced or not.
`alarm`	logical. If `TRUE`, then a sound is emitted (in order to alert the user) when the model selection ends
`overid`	`NULL` if no Sargan test of overidentifying restrictions should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`weak`	`NULL` if no weak instrument F-test on the first stage should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`fast`	A logical value indicating whether to speed up the 2SLS estimation at the cost of providing fewer details. Requires `overid == NULL` and `weak == NULL`.
`...`	Further arguments passed to or from other methods.
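
The default `t.pval = 1/NROW(data)` ties the significance level to the sample size. A common rationale, given here as background and not as a description of this package's internals, is that impulse indicator saturation adds one candidate indicator per observation, so roughly `t.pval * n` irrelevant indicators are expected to be retained by chance. The sketch below works through that arithmetic for a hypothetical data frame `toy` with 100 rows.

```r
# Hypothetical data frame with 100 observations
toy <- data.frame(y = rnorm(100))

n <- NROW(toy)
t_pval_default <- 1 / n                 # the documented default
expected_default <- n * t_pval_default  # about one indicator kept by chance

# A looser level keeps more irrelevant indicators on average
t_pval_loose <- 0.05
expected_loose <- n * t_pval_loose

cat("default level:", t_pval_default, "expected false retentions:", expected_default, "\n")
cat("loose level:  ", t_pval_loose, "expected false retentions:", expected_loose, "\n")
```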

Value

Returns a list of class "ivisat" with two named elements. $selection stores the selection results from isat (including paths, terminal models, and best specification). $final stores the ivreg model object of the best specification or NULL if the GUM does not pass all diagnostics.

User diagnostics for getsFun() and isat()

Description

ivDiag provides several diagnostic tests for 2SLS models that can be used during model selection. Currently, a weak instrument F-test of the first stage(s) and the Sargan test of overidentifying restrictions on the validity of the instruments are implemented.

Usage

ivDiag(x, weak = FALSE, overid = FALSE)

Arguments

`x`	A list containing the estimation results of the 2SLS model. Must contain an entry `$diag` that contains the diagnostics provided by the `ivreg::ivreg()` command.
`weak`	A logical value indicating whether to conduct weak instrument tests.
`overid`	A logical value indicating whether to conduct the Sargan test of overidentifying restrictions.

Details

The resulting matrix also has an attribute named "is.reject.bad", which is a logical vector of length m. Each entry records whether a rejection of the test means that the diagnostics have failed or vice versa. The first entry refers to the first row, the second entry to the second row etc. However, this attribute is not used in the following estimations. Instead, the decision rule is specified inside the user.fun argument of gets::diagnostics(), which allows for a named entry $is.reject.bad.

Value

Returns a matrix with three columns named "statistic", "df", and "p-value" and m rows. Each row records these results for one of the tests, so the number of rows varies with the arguments specified and the model (e.g. how many first-stage equations there are).
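
The shape of such a result, and the meaning of the "is.reject.bad" attribute, can be sketched in base R. The matrix below is built by hand with made-up numbers and hypothetical row names, so it is not output from ivDiag. It assumes one weak instrument test, where rejecting the null of weak instruments is desirable, and one Sargan test, where rejecting instrument validity counts as a failure.

```r
# Hand-built stand-in for a diagnostics matrix (all numbers are made up)
diag_toy <- matrix(
  c(25.3, 1.8,       # statistic
    2, 1,            # df
    0.0001, 0.18),   # p-value
  nrow = 2,
  dimnames = list(
    c("weak (x11)", "Sargan"),           # hypothetical row names
    c("statistic", "df", "p-value")
  )
)
# Rejection is good for the weak instrument test, bad for the Sargan test
attr(diag_toy, "is.reject.bad") <- c(FALSE, TRUE)

alpha <- 0.05
rejected <- diag_toy[, "p-value"] < alpha
# A test is passed when the rejection outcome is the desirable one
passed <- rejected != attr(diag_toy, "is.reject.bad")
print(passed)
```

In this toy case both diagnostics pass: weak instruments are rejected and the overidentifying restrictions are not.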

General-to-specific modeling for 2SLS models

Description

General-to-specific modeling for 2SLS models

Usage

ivgets(
  formula,
  data,
  gum.result = NULL,
  t.pval = 0.05,
  wald.pval = t.pval,
  do.pet = TRUE,
  ar.LjungB = NULL,
  arch.LjungB = NULL,
  normality.JarqueB = NULL,
  include.gum = FALSE,
  include.1cut = FALSE,
  include.empty = FALSE,
  max.paths = NULL,
  turbo = FALSE,
  tol = 1e-07,
  max.regs = NULL,
  print.searchinfo = TRUE,
  alarm = FALSE,
  keep_exog = NULL,
  overid = NULL,
  weak = NULL
)

Arguments

`formula`	A formula in the format `y ~ x1 + x2 | z1 + z2`.
`data`	A data frame with all necessary variables y, x, and z.
`gum.result`	a `list` with the estimation results of the General Unrestricted Model (GUM), or `NULL` (default). If the estimation results of the GUM are already available, they can be provided via this argument, in which case re-estimation of the GUM is skipped
`t.pval`	`numeric` value between 0 and 1. The significance level used for the two-sided regressor significance t-tests
`wald.pval`	`numeric` value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs)
`do.pet`	`logical`. If `TRUE` (default), then a Parsimonious Encompassing Test (PET) against the GUM is undertaken at each regressor removal for the joint significance of all the deleted regressors along the current path. If `FALSE`, then a PET is not undertaken at each regressor removal
`ar.LjungB`	a two element `vector` or `NULL` (default). In the former case, the first element contains the AR-order, the second element the significance level. If `NULL`, then a test for autocorrelation is not conducted
`arch.LjungB`	a two element `vector` or `NULL` (default). In the former case, the first element contains the ARCH-order, the second element the significance level. If `NULL`, then a test for ARCH is not conducted
`normality.JarqueB`	`NULL` or a `numeric` value between 0 and 1. In the latter case, a test for non-normality is conducted using a significance level equal to `normality.JarqueB`. If `NULL`, then no test for non-normality is conducted
`include.gum`	`logical`. If `TRUE`, then the GUM (i.e. the starting model) is included among the terminal models. If `FALSE` (default), then the GUM is not included
`include.1cut`	`logical`. If `TRUE`, then the 1-cut model is added to the list of terminal models. If `FALSE` (default), then the 1-cut is not added, unless it is a terminal model in one of the paths
`include.empty`	`logical`. If `TRUE`, then the empty model is added to the list of terminal models. If `FALSE` (default), then the empty model is not added, unless it is a terminal model in one of the paths
`max.paths`	`NULL` (default) or an integer equal to or greater than 0. If `NULL`, then there is no limit to the number of paths. If an integer, for example 1, then this integer constitutes the maximum number of paths searched
`turbo`	`logical`. If `TRUE`, then (parts of) paths are not searched twice (or more) unnecessarily, which can yield a significant speed gain. However, checking whether the search has arrived at a point it has already visited comes with a slight computational overhead. Accordingly, if `turbo=TRUE`, the total search time might in fact be higher than if `turbo=FALSE`. This happens if estimation is very fast, say, less than a quarter of a second. Hence the default is `FALSE`
`tol`	numeric value (`default = 1e-07`). The tolerance for detecting linear dependencies in the columns of the variance-covariance matrix when computing the Wald-statistic used in the Parsimonious Encompassing Tests (PETs), see the `qr.solve` function
`max.regs`	`integer`. The maximum number of regressions along a deletion path. Do not alter unless you know what you are doing!
`print.searchinfo`	`logical`. If `TRUE` (default), then a message is printed whenever simplification along a new path is started
`alarm`	`logical`. If `TRUE`, then a sound or beep is emitted (in order to alert the user) when the model selection ends
`keep_exog`	A numeric vector of indices or a character vector of names corresponding to the exogenous regressors in the `data` that should not be selected over. Default `NULL` means that selection is over all exogenous regressors. If an intercept has been specified in the `formula` but is not already included in the `data`, then it can be kept by either including the index `0` or the character `"Intercept"`, respectively, as an element in `keep_exog`.
`overid`	`NULL` if no Sargan test of overidentifying restrictions should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`weak`	`NULL` if no weak instrument F-test on the first stage should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
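
A possible call, using only the arguments documented above and the packaged data set artificial2sls, might look as follows. Several points are assumptions: that the data set can be accessed as `ivgets::artificial2sls`, that a formula with `-1` is accepted because `x1` already is the intercept column, and that the result has a `$final` element as described for the "ivgets" class. The call is guarded so that the snippet still runs when the package is not installed.

```r
# x1 is the intercept column, so the automatic intercept is switched off.
# x11 is endogenous; z11 and z12 are the excluded instruments.
fml <- y ~ -1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 |
  -1 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + z11 + z12

if (requireNamespace("ivgets", quietly = TRUE)) {
  sel <- ivgets::ivgets(
    formula = fml,
    data = ivgets::artificial2sls,
    t.pval = 0.05,
    keep_exog = "x1",           # do not select over the intercept column
    print.searchinfo = FALSE
  )
  print(sel$final)              # assumed element, see lead-in
}
```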

Value

Indicator saturation modeling for 2SLS models

Description

Indicator saturation modeling for 2SLS models

Usage

ivisat(
  formula,
  data,
  iis = TRUE,
  sis = FALSE,
  tis = FALSE,
  uis = FALSE,
  blocks = NULL,
  ratio.threshold = 0.8,
  max.block.size = 30,
  t.pval = 1/NROW(data),
  wald.pval = t.pval,
  do.pet = FALSE,
  ar.LjungB = NULL,
  arch.LjungB = NULL,
  normality.JarqueB = NULL,
  info.method = c("sc", "aic", "hq"),
  include.1cut = FALSE,
  include.empty = FALSE,
  max.paths = NULL,
  parallel.options = NULL,
  turbo = FALSE,
  tol = 1e-07,
  max.regs = NULL,
  print.searchinfo = TRUE,
  plot = NULL,
  alarm = FALSE,
  overid = NULL,
  weak = NULL,
  fast = FALSE
)

Arguments

`formula`	A formula in the format `y ~ x1 + x2 | z1 + z2`.
`data`	A data frame with all necessary variables y, x, and z.
`iis`	logical. If `TRUE`, impulse indicator saturation is performed.
`sis`	logical. If `TRUE`, step indicator saturation is performed.
`tis`	logical. If `TRUE`, trend indicator saturation is performed.
`uis`	a matrix of regressors, or a list of matrices. If a list, the matrices must have named columns that should not overlap with column names of any other matrices in the list.
`blocks`	`NULL` (default), an integer (the number of blocks) or a user-specified `list` that indicates how blocks should be put together. If `NULL`, then the number of blocks is determined automatically
`ratio.threshold`	Minimum ratio of variables in each block to total observations to determine the block size, default=0.8. Only relevant if blocks = `NULL`
`max.block.size`	Maximum size of a block of variables to be selected over, default=30. The block size used is the maximum given by either `ratio.threshold` or `max.block.size`
`t.pval`	numeric value between 0 and 1. The significance level used for the two-sided regressor significance t-tests
`wald.pval`	numeric value between 0 and 1. The significance level used for the Parsimonious Encompassing Tests (PETs). By default, the numeric value is the same as that of `t.pval`
`do.pet`	logical. If `TRUE`, then a Parsimonious Encompassing Test (PET) against the GUM is undertaken at each regressor removal for the joint significance of all the deleted regressors along the current path. If `FALSE` (default), then a PET is not undertaken at each regressor removal
`ar.LjungB`	a two-item list with names `lag` and `pval`, or `NULL` (default). In the former case, `lag` contains the order of the Ljung and Box (1979) test for serial correlation in the standardised residuals, and `pval` contains the significance level. If `lag=NULL` (default), then the order used is that of the estimated 'arx' object. If `ar.LjungB=NULL`, then the standardised residuals are not checked for serial correlation
`arch.LjungB`	a two-item list with names `lag` and `pval`, or `NULL` (default). In the former case, `lag` contains the order of the Ljung and Box (1979) test for serial correlation in the squared standardised residuals, and `pval` contains the significance level. If `lag=NULL` (default), then the order used is that of the estimated 'arx' object. If `arch.LjungB=NULL`, then the standardised residuals are not checked for ARCH
`normality.JarqueB`	`NULL` (the default) or a value between 0 and 1. In the latter case, a test for non-normality is conducted using a significance level equal to `normality.JarqueB`. If `NULL`, then no test for non-normality is conducted
`info.method`	character string, "sc" (default), "aic" or "hq", which determines the information criterion to be used when selecting among terminal models. The abbreviations are short for the Schwarz or Bayesian information criterion (sc), the Akaike information criterion (aic) and the Hannan-Quinn (hq) information criterion
`include.1cut`	logical. If `TRUE`, then the 1-cut model is included among the terminal models, if it passes the diagnostic tests, even if it is not equal to one of the terminals. If FALSE (default), then the 1-cut model is not included (unless it is one of the terminals)
`include.empty`	logical. If `TRUE`, then an empty model is included among the terminal models, if it passes the diagnostic tests, even if it is not equal to one of the terminals. If FALSE (default), then the empty model is not included (unless it is one of the terminals)
`max.paths`	`NULL` (default) or an integer indicating the maximum number of paths to search
`parallel.options`	`NULL` or an integer, i.e. the number of cores/threads to be used for parallel computing (implemented w/`makeCluster` and `parLapply`)
`turbo`	logical. If `TRUE`, then (parts of) paths are not searched twice (or more) unnecessarily, which can yield a significant speed gain. However, checking whether the search has arrived at a point it has already visited comes with a slight computational overhead. Accordingly, if `turbo=TRUE`, the total search time might in fact be higher than if `turbo=FALSE`. This happens if estimation is very fast, say, less than a quarter of a second. Hence the default is `FALSE`
`tol`	numeric value (default = 1e-07). The tolerance for detecting linear dependencies in the columns of the regressors (see `qr` function). Only used if LAPACK is FALSE (default)
`max.regs`	integer. The maximum number of regressions along a deletion path. It is not recommended that this is altered
`print.searchinfo`	logical. If `TRUE` (default), then a message is printed whenever simplification along a new path is started, and whenever regressors are dropped due to exact multicollinearity
`plot`	NULL or logical. If `TRUE`, then the fitted values and the residuals of the final model are plotted after model selection. If NULL (default), then the value set by `options` determines whether a plot is produced or not.
`alarm`	logical. If `TRUE`, then a sound is emitted (in order to alert the user) when the model selection ends
`overid`	`NULL` if no Sargan test of overidentifying restrictions should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`weak`	`NULL` if no weak instrument F-test on the first stage should be used as a diagnostic check for model selection or a numeric value between 0 and 1. In the latter case, the test is conducted using this value as the significance level.
`fast`	A logical value indicating whether to speed up the 2SLS estimation at the cost of returning fewer details. Requires `overid == NULL` and `weak == NULL`.
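
As a rough illustration of how the diagnostic arguments above fit together, the sketch below runs indicator saturation with both the Sargan test and the weak instrument test switched on. It assumes that this argument table belongs to ivisat(), that its first two arguments are formula and data, and that the ivgets package with its bundled artificial2sls_contaminated data is installed. The 5% significance levels are arbitrary choices for the example.

```r
# Sketch only: assumes the 'ivgets' package and its dependencies are installed
library(ivgets)

# x1 is the intercept column, x11 the endogenous regressor,
# z11 and z12 the excluded instruments (one overidentifying restriction)
fml <- y ~ -1 + x1 + x2 + x11 | -1 + x1 + x2 + z11 + z12

res <- ivisat(
  formula = fml,
  data = artificial2sls_contaminated,
  overid = 0.05,            # Sargan test at the 5% level (assumed choice)
  weak = 0.05,              # weak instrument F-test at the 5% level (assumed choice)
  fast = FALSE,             # must stay FALSE because overid and weak are not NULL
  print.searchinfo = FALSE, # suppress progress messages
  plot = FALSE              # suppress the plot of the final model
)
```

Setting both `overid` and `weak` back to `NULL` would be the precondition for trying `fast = TRUE`.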

Value

User estimator ivreg for getsFun() and isat()

Description

ivregFun calls ivreg::ivreg() in a format that is suitable for the model selection function gets::getsFun() and for the indicator saturation function gets::isat().

Usage

ivregFun(y, x, z, formula, tests, fast = FALSE)

Arguments

`y`	A numeric vector with no missing values.
`x`	A matrix or `NULL`.
`z`	A numeric vector or matrix.
`formula`	A formula in the format `y ~ x1 + x2 | z1 + z2`.
`tests`	A logical value indicating whether to calculate the `ivreg::summary.ivreg()` diagnostics.
`fast`	A logical value indicating whether to speed up the 2SLS estimation at the cost of returning fewer details. Requires `tests == FALSE`.

Details

For the required outputs of user-specified estimators, see the article "User-Specified General-to-Specific and Indicator Saturation Methods" by Genaro Sucarrat, published in the R Journal: https://journal.r-project.org/archive/2021/RJ-2021-024/index.html

Value

A list with entries needed for model selection via gets::getsFun() or gets::isat().
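
To make the idea of such a list concrete, here is a minimal base R sketch of a 2SLS estimator that returns the six entries which, as far as I understand the gets documentation, a user-specified estimator must supply: n, k, df, coefficients, vcov, and logl. The function tsls_sketch is a hypothetical stand-in written for this example. It is not ivregFun, does not call ivreg::ivreg(), and ignores the formula, tests, and fast arguments. The simulated data are likewise made up.

```r
# Hand-rolled 2SLS returning the entries expected from a user-specified
# estimator. tsls_sketch is a hypothetical name, not part of ivgets or gets.
tsls_sketch <- function(y, x, z) {
  x <- as.matrix(x)  # structural regressors (exogenous and endogenous)
  z <- as.matrix(z)  # all instruments (included and excluded)
  n <- length(y)
  k <- NCOL(x)

  # First stage: fitted values of the regressors given the instruments
  xhat <- z %*% solve(crossprod(z), crossprod(z, x))
  # Second stage: regress y on the first-stage fitted values
  beta <- drop(solve(crossprod(xhat), crossprod(xhat, y)))
  names(beta) <- colnames(x)

  # Residuals are computed with the original regressors, not the fitted ones
  res <- drop(y - x %*% beta)
  sigma2 <- sum(res^2) / (n - k)

  list(
    n = n, k = k, df = n - k,
    coefficients = beta,
    vcov = sigma2 * solve(crossprod(xhat)),
    logl = sum(dnorm(res, sd = sqrt(mean(res^2)), log = TRUE))
  )
}

# Simulated data: x2 is endogenous, z1 and z2 are excluded instruments
set.seed(1)
n <- 200
z1 <- rnorm(n); z2 <- rnorm(n); u <- rnorm(n)
x2 <- z1 + z2 + 0.5 * u + rnorm(n)
y <- 1 + 2 * x2 + u
x <- cbind(const = 1, x2 = x2)
z <- cbind(const = 1, z1 = z1, z2 = z2)

fit <- tsls_sketch(y, x, z)
names(fit)
```

When the instrument matrix equals the regressor matrix, the first stage reproduces the regressors and the sketch collapses to ordinary least squares, which gives a quick sanity check.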

Takes ivreg formula and returns formula compatible with model selection

Description

new_formula takes a formula object for ivreg::ivreg(), i.e. in the format y ~ x1 + x2 | x1 + z2, and returns a list with elements suitable for model selection. For example, it updates the data by creating an intercept if specified in the formula, checks for collinearity among the regressors, and updates the formula accordingly.

Usage

new_formula(formula, data, keep_exog)

Arguments

`formula`	A formula for the ivreg::ivreg function, i.e. in the format `y ~ x1 + x2 | z1 + z2`.
`data`	A data frame.
`keep_exog`	A numeric vector of indices or a character vector of names corresponding to the exogenous regressors in the `data` that should not be selected over. Default `NULL` means that selection is over all exogenous regressors. If an intercept has been specified in the `formula` but is not already included in the `data`, then it can be kept by either including the index `0` or the character `"Intercept"`, respectively, as an element in `keep_exog`.

Value

A list with several named elements. Component $fml stores the new baseline formula that will be used for model selection. Components y, x, and z store the data of the dependent variable, structural regressors, and excluded instruments. The entries $depvar, $x1, $x2, $z1, and $z2 contain the names of the dependent variable, endogenous and exogenous regressors, included and excluded instruments. $dx1, $dx2, $dz1, $dz2 store the dimensions of the respective variables. Finally, $keep and $keep.names contain the indices and names of the regressors that will not be selected over.
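
The two-part formula convention that new_formula works with can be explored in base R. The sketch below is only an illustration of how the parts of such a formula can be separated and classified. It is not the implementation used by new_formula, and all variable names in it are chosen for the example.

```r
# Two-part formula in the ivreg style: regressors | instruments
fml <- y ~ x1 + x2 | x1 + z2

# The right-hand side is a call to `|` with two operands
rhs <- fml[[3]]
stopifnot(identical(rhs[[1]], as.name("|")))

depvar      <- all.vars(fml[[2]])  # dependent variable
regressors  <- all.vars(rhs[[2]])  # structural regressors
instruments <- all.vars(rhs[[3]])  # included and excluded instruments

# Regressors that instrument themselves are exogenous; the rest are endogenous
exogenous  <- intersect(regressors, instruments)
endogenous <- setdiff(regressors, instruments)
# Instruments that do not appear as regressors are the excluded instruments
excluded   <- setdiff(instruments, regressors)
```

In this example x1 appears on both sides of the bar and is therefore exogenous, x2 is endogenous, and z2 is the only excluded instrument.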
