Title: | ROSE Random Forests for Robust Semiparametric Efficient Estimation |
---|---|
Description: | ROSE (RObust Semiparametric Efficient) random forests for robust semiparametric efficient estimation in partially parametric models (containing generalised partially linear models). Details can be found in the paper by Young and Shah (2024) <doi:10.48550/arXiv.2410.03471>. |
Authors: | Elliot H. Young [aut, cre], Rajen D. Shah [aut] |
Maintainer: | Elliot H. Young <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-11 06:48:01 UTC |
Source: | CRAN |
This is a method that prints a useful summary of aspects of a roseRF object fitted by the functions roseRF_... in roseRF.
## S3 method for class 'roseforest'
print(x, ...)
x | a fitted roseRF object of class 'roseforest'. |
---|---|
... | additional arguments. |
Prints output for a roseRF object.
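Estimates the parameter of interest $\theta_0$ in the generalised partially linear model $g(\mathbb{E}[Y \mid X, Z]) = X\theta_0 + f_0(Z)$ for some (strictly increasing, differentiable) link function $g$, which can be reposed in terms of the ‘nuisance functions’ $\mathbb{E}[Y \mid X, Z]$, $\mathbb{E}[g(\mathbb{E}[Y \mid X, Z]) \mid Z]$ and $\mathbb{E}[X \mid Z]$ as $g(\mathbb{E}[Y \mid X, Z]) - \mathbb{E}[g(\mathbb{E}[Y \mid X, Z]) \mid Z] = (X - \mathbb{E}[X \mid Z])\theta_0$.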
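The reposed form follows by taking conditional expectations given $Z$ on both sides of the model and subtracting. The short derivation below is a sketch that assumes the usual generalised partially linear model notation: $\theta_0$ for the target parameter, $f_0$ for the nonparametric component and $\mu(X,Z)$ for $\mathbb{E}[Y \mid X, Z]$. These symbols are assumptions wherever they are not spelled out in the surrounding text.

```latex
\begin{align*}
g(\mu(X,Z)) &= X\theta_0 + f_0(Z),
  \qquad \mu(X,Z) := \mathbb{E}[Y \mid X, Z], \\
\mathbb{E}[g(\mu(X,Z)) \mid Z] &= \mathbb{E}[X \mid Z]\,\theta_0 + f_0(Z), \\
g(\mu(X,Z)) - \mathbb{E}[g(\mu(X,Z)) \mid Z]
  &= (X - \mathbb{E}[X \mid Z])\,\theta_0 .
\end{align*}
```

The unknown function $f_0$ cancels in the last line, which is why only conditional-mean regressions need to be supplied as learners.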
roseRF_gplm(
  y_on_xz_formula, y_on_xz_learner, y_on_xz_pars = list(),
  Gy_on_z_formula, Gy_on_z_learner, Gy_on_z_pars = list(),
  x_formula, x_learner, x_pars = list(),
  M1_formula = x_formula, M1_learner = x_learner, M1_pars = x_pars,
  M2_formula = NA, M2_learner = NA, M2_pars = list(),
  M3_formula = NA, M3_learner = NA, M3_pars = list(),
  M4_formula = NA, M4_learner = NA, M4_pars = list(),
  M5_formula = NA, M5_learner = NA, M5_pars = list(),
  link = "identity", data, K = 5, S = 1,
  max.depth = 10, num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE, sample.fraction = 0.8
)
y_on_xz_formula | a two-sided formula object describing the model for $\mathbb{E}[Y \mid X, Z]$. |
---|---|
y_on_xz_learner | a string specifying the regression method to fit the regression as given by y_on_xz_formula. |
y_on_xz_pars | a list containing hyperparameters for the y_on_xz_learner. |
Gy_on_z_formula | a two-sided formula object describing the model for $\mathbb{E}[g(\mathbb{E}[Y \mid X, Z]) \mid Z]$. |
Gy_on_z_learner | a string specifying the regression method to fit the regression as given by Gy_on_z_formula. |
Gy_on_z_pars | a list containing hyperparameters for the Gy_on_z_learner. |
x_formula | a two-sided formula object describing the model for $\mathbb{E}[X \mid Z]$. |
x_learner | a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula. |
x_pars | a list containing hyperparameters for the x_learner. |
M1_formula | a two-sided formula object for the model $\mathbb{E}[M_1(X) \mid Z]$. |
M1_learner | a string specifying the regression method for $\mathbb{E}[M_1(X) \mid Z]$. |
M1_pars | a list containing hyperparameters for the M1_learner. |
M2_formula | a two-sided formula object for the model $\mathbb{E}[M_2(X) \mid Z]$. |
M2_learner | a string specifying the regression method for $\mathbb{E}[M_2(X) \mid Z]$. |
M2_pars | a list containing hyperparameters for the M2_learner. |
M3_formula | a two-sided formula object for the model $\mathbb{E}[M_3(X) \mid Z]$. |
M3_learner | a string specifying the regression method for $\mathbb{E}[M_3(X) \mid Z]$. |
M3_pars | a list containing hyperparameters for the M3_learner. |
M4_formula | a two-sided formula object for the model $\mathbb{E}[M_4(X) \mid Z]$. |
M4_learner | a string specifying the regression method for $\mathbb{E}[M_4(X) \mid Z]$. |
M4_pars | a list containing hyperparameters for the M4_learner. |
M5_formula | a two-sided formula object for the model $\mathbb{E}[M_5(X) \mid Z]$. |
M5_learner | a string specifying the regression method for $\mathbb{E}[M_5(X) \mid Z]$. |
M5_pars | a list containing hyperparameters for the M5_learner. |
link | link function ($g$). Default is "identity". |
data | a data frame containing the variables for the partially linear model. |
K | the number of folds used for $K$-fold cross-fitting. |
S | the number of repeats to mitigate the randomness in the estimator on the sample splits used for $K$-fold cross-fitting. |
max.depth | Maximum depth parameter used for ROSE random forests. Default is 10. |
num.trees | Number of trees used for a single ROSE random forest. Default is 500. |
min.node.size | Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))). |
replace | Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE. |
sample.fraction | Proportion of data used for each random tree. Default is 0.8. |
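The default for min.node.size in the usage signature depends on both the sample size and K. The sketch below simply evaluates that expression for a few hypothetical sample sizes; the helper name `default_min_node_size` and the sample sizes are illustrative assumptions, not part of the package.

```r
# Default minimum node size, copied from the usage signature:
#   max(10, ceiling(0.01 * (K - 1)/K * nrow(data)))
# `default_min_node_size` is a hypothetical helper for illustration only.
default_min_node_size <- function(n, K = 5) {
  max(10, ceiling(0.01 * (K - 1) / K * n))
}

# Hypothetical sample sizes
sizes <- c(200, 1000, 5000, 20000)
defaults <- vapply(sizes, default_min_node_size, numeric(1), K = 5)

print(data.frame(n = sizes, min.node.size = defaults))
```

Since (K - 1)/K * nrow(data) is the number of rows left after holding out one of K folds, the default amounts to about 1% of that count, with a floor of 10.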
The estimator of interest $\hat{\theta}$ solves an estimating equation built from $J$ terms, where $M_1(X), \ldots, M_J(X)$ denote user-chosen functions of $X$ and $w_1(Z), \ldots, w_J(Z)$ denote weights estimated via ROSE random forests. The default takes $J = 1$ and $M_1(X) = X$; if taking $J \geq 2$ we recommend care in checking the applicability and appropriateness of any additional user-chosen regression tasks.
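The parameter of interest $\theta_0$ is estimated using a DML2 / $K$-fold cross-fitting framework, to allow for arbitrary (faster than $n^{1/4}$-consistent) learners for the nuisance functions, i.e. solving the estimating equation
$$\sum_{k=1}^{K} \sum_{i \in I_k} \psi\big(Y_i, X_i, Z_i; \theta, \hat{\eta}^{(k)}, \hat{w}^{(k)}\big) = 0,$$
where $\psi$ is the score function defining the estimating equation above, $I_1, \ldots, I_K$ denotes a partition of the index set for the datapoints $(Y_i, X_i, Z_i)$, $\hat{\eta}^{(k)}$ denotes an estimator for the nuisance functions $\eta_0$ trained on the data indexed by $I_k^c$, and $\hat{w}^{(k)}$ denotes a ROSE random forest (again trained on the data indexed by $I_k^c$).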
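To make the fold notation concrete, the following base R sketch builds a random partition of the index set into K folds together with the complementary training sets. How the package itself assigns observations to folds is not stated here, so the random assignment, the sample size `n` and the object names are assumptions for illustration.

```r
set.seed(1)   # reproducible illustration
n <- 23       # hypothetical sample size
K <- 5        # number of folds, matching the default K = 5

# Randomly assign each index 1..n to one of K folds of near-equal size
fold_id <- sample(rep(seq_len(K), length.out = n))
folds <- split(seq_len(n), fold_id)   # the index sets I_1, ..., I_K

# Nuisance learners and forests for fold k are trained on the complement of I_k
train_idx <- lapply(folds, function(I_k) setdiff(seq_len(n), I_k))

print(lengths(folds))
```

Every observation is held out exactly once, so each fitted nuisance function is only ever evaluated on data it was not trained on.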
A list containing:
theta
The estimator of $\theta_0$.
stderror
Huber robust estimate of the standard error of the $\theta_0$-estimator.
coefficients
Table of the coefficient estimate, standard error, z-value and p-value.
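The entries of the coefficients table are related in the usual way for a Wald-type test. The sketch below assumes a two-sided normal test of the null hypothesis that the parameter is zero, and uses made-up values for `theta` and `stderror`; the column labels are also assumptions, since the package's own labels are not shown here.

```r
# Hypothetical values standing in for the `theta` and `stderror` entries
theta <- 0.5
stderror <- 0.2

z_value <- theta / stderror
p_value <- 2 * pnorm(-abs(z_value))   # two-sided normal p-value

# Column labels are illustrative, not taken from the package
coef_table <- data.frame(
  Estimate = theta,
  `Std. Error` = stderror,
  `z value` = z_value,
  `Pr(>|z|)` = p_value,
  check.names = FALSE
)
print(coef_table)
```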
ROSE random forest estimator for the partially linear instrumental variable model
roseRF_pliv(
  y_formula, y_learner, y_pars = list(),
  x_formula, x_learner, x_pars = list(),
  IV1_formula = NA, IV1_learner = NA, IV1_pars = list(),
  IV2_formula = NA, IV2_learner = NA, IV2_pars = list(),
  IV3_formula = NA, IV3_learner = NA, IV3_pars = list(),
  IV4_formula = NA, IV4_learner = NA, IV4_pars = list(),
  IV5_formula = NA, IV5_learner = NA, IV5_pars = list(),
  data, K = 5, S = 1,
  max.depth = 10, num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE, sample.fraction = 0.8
)
y_formula | a two-sided formula object describing the regression model for $\mathbb{E}[Y \mid Z]$. |
---|---|
y_learner | a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula. |
y_pars | a list containing hyperparameters for the y_learner. |
x_formula | a two-sided formula object describing the regression model for $\mathbb{E}[X \mid Z]$. |
x_learner | a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula. |
x_pars | a list containing hyperparameters for the x_learner. |
IV1_formula | a two-sided formula object for the regression model of the first instrument (IV1). |
IV1_learner | a string specifying the regression method for the IV1 regression. |
IV1_pars | a list containing hyperparameters for the IV1_learner. |
IV2_formula | a two-sided formula object for the regression model of the second instrument (IV2). |
IV2_learner | a string specifying the regression method for the IV2 regression. |
IV2_pars | a list containing hyperparameters for the IV2_learner. |
IV3_formula | a two-sided formula object for the regression model of the third instrument (IV3). |
IV3_learner | a string specifying the regression method for the IV3 regression. |
IV3_pars | a list containing hyperparameters for the IV3_learner. |
IV4_formula | a two-sided formula object for the regression model of the fourth instrument (IV4). |
IV4_learner | a string specifying the regression method for the IV4 regression. |
IV4_pars | a list containing hyperparameters for the IV4_learner. |
IV5_formula | a two-sided formula object for the regression model of the fifth instrument (IV5). |
IV5_learner | a string specifying the regression method for the IV5 regression. |
IV5_pars | a list containing hyperparameters for the IV5_learner. |
data | a data frame containing the variables for the partially linear model. |
K | the number of folds used for $K$-fold cross-fitting. |
S | the number of repeats to mitigate the randomness in the estimator on the sample splits used for $K$-fold cross-fitting. |
max.depth | Maximum depth parameter used for ROSE random forests. Default is 10. |
num.trees | Number of trees used for a single ROSE random forest. Default is 500. |
min.node.size | Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))). |
replace | Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE. |
sample.fraction | Proportion of data used for each random tree. Default is 0.8. |
A list containing:
theta
The estimator of $\theta_0$.
stderror
Huber robust estimate of the standard error of the $\theta_0$-estimator.
coefficients
Table of the coefficient estimate, standard error, z-value and p-value.
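Estimates the parameter of interest $\theta_0$ in the partially linear model $Y = X\theta_0 + f_0(Z) + \varepsilon$, which can be reposed in terms of the ‘nuisance functions’ $(\mathbb{E}[Y \mid Z], \mathbb{E}[X \mid Z])$ as $Y - \mathbb{E}[Y \mid Z] = (X - \mathbb{E}[X \mid Z])\theta_0 + \varepsilon$.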
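The step from the model to its reposed form is the standard partialling-out argument. The derivation below is a sketch that assumes the usual partially linear model notation ($\theta_0$ for the target parameter, $f_0$ for the nonparametric component, and an error $\varepsilon$ with $\mathbb{E}[\varepsilon \mid X, Z] = 0$); these symbols are assumptions wherever they are not spelled out in the surrounding text.

```latex
\begin{align*}
Y &= X\theta_0 + f_0(Z) + \varepsilon,
  \qquad \mathbb{E}[\varepsilon \mid X, Z] = 0, \\
\mathbb{E}[Y \mid Z] &= \mathbb{E}[X \mid Z]\,\theta_0 + f_0(Z), \\
Y - \mathbb{E}[Y \mid Z] &= (X - \mathbb{E}[X \mid Z])\,\theta_0 + \varepsilon .
\end{align*}
```

Subtracting the second line from the first removes $f_0$, leaving a linear relationship between the two residuals.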
roseRF_plm(
  y_formula, y_learner, y_pars = list(),
  x_formula, x_learner, x_pars = list(),
  M1_formula = x_formula, M1_learner = x_learner, M1_pars = x_pars,
  M2_formula = NA, M2_learner = NA, M2_pars = list(),
  M3_formula = NA, M3_learner = NA, M3_pars = list(),
  M4_formula = NA, M4_learner = NA, M4_pars = list(),
  M5_formula = NA, M5_learner = NA, M5_pars = list(),
  data, K = 5, S = 1,
  max.depth = 10, num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE, sample.fraction = 0.8
)
y_formula | a two-sided formula object describing the model for $\mathbb{E}[Y \mid Z]$. |
---|---|
y_learner | a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula. |
y_pars | a list containing hyperparameters for the y_learner. |
x_formula | a two-sided formula object describing the model for $\mathbb{E}[X \mid Z]$. |
x_learner | a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula. |
x_pars | a list containing hyperparameters for the x_learner. |
M1_formula | a two-sided formula object for the model $\mathbb{E}[M_1(X) \mid Z]$. |
M1_learner | a string specifying the regression method for $\mathbb{E}[M_1(X) \mid Z]$. |
M1_pars | a list containing hyperparameters for the M1_learner. |
M2_formula | a two-sided formula object for the model $\mathbb{E}[M_2(X) \mid Z]$. |
M2_learner | a string specifying the regression method for $\mathbb{E}[M_2(X) \mid Z]$. |
M2_pars | a list containing hyperparameters for the M2_learner. |
M3_formula | a two-sided formula object for the model $\mathbb{E}[M_3(X) \mid Z]$. |
M3_learner | a string specifying the regression method for $\mathbb{E}[M_3(X) \mid Z]$. |
M3_pars | a list containing hyperparameters for the M3_learner. |
M4_formula | a two-sided formula object for the model $\mathbb{E}[M_4(X) \mid Z]$. |
M4_learner | a string specifying the regression method for $\mathbb{E}[M_4(X) \mid Z]$. |
M4_pars | a list containing hyperparameters for the M4_learner. |
M5_formula | a two-sided formula object for the model $\mathbb{E}[M_5(X) \mid Z]$. |
M5_learner | a string specifying the regression method for $\mathbb{E}[M_5(X) \mid Z]$. |
M5_pars | a list containing hyperparameters for the M5_learner. |
data | a data frame containing the variables for the partially linear model. |
K | the number of folds used for $K$-fold cross-fitting. |
S | the number of repeats to mitigate the randomness in the estimator on the sample splits used for $K$-fold cross-fitting. |
max.depth | Maximum depth parameter used for ROSE random forests. Default is 10. |
num.trees | Number of trees used for a single ROSE random forest. Default is 500. |
min.node.size | Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))). |
replace | Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE. |
sample.fraction | Proportion of data used for each random tree. Default is 0.8. |
The estimator of interest $\hat{\theta}$ solves an estimating equation built from $J$ terms, where $M_1(X), \ldots, M_J(X)$ denote user-chosen functions of $X$ and $w_1(Z), \ldots, w_J(Z)$ denote weights estimated via ROSE random forests. The default takes $J = 1$ and $M_1(X) = X$; if taking $J \geq 2$ we recommend care in checking the applicability and appropriateness of any additional user-chosen regression tasks.
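The parameter of interest $\theta_0$ is estimated using a DML2 / $K$-fold cross-fitting framework, to allow for arbitrary (faster than $n^{1/4}$-consistent) learners for the nuisance functions, i.e. solving the estimating equation
$$\sum_{k=1}^{K} \sum_{i \in I_k} \psi\big(Y_i, X_i, Z_i; \theta, \hat{\eta}^{(k)}, \hat{w}^{(k)}\big) = 0,$$
where $\psi$ is the score function defining the estimating equation above, $I_1, \ldots, I_K$ denotes a partition of the index set for the datapoints $(Y_i, X_i, Z_i)$, $\hat{\eta}^{(k)}$ denotes an estimator for the nuisance functions $\eta_0$ trained on the data indexed by $I_k^c$, and $\hat{w}^{(k)}$ denotes a ROSE random forest (again trained on the data indexed by $I_k^c$).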
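As a rough illustration of the cross-fitting scheme in the default case (a single term with the regressor itself as the user-chosen function), the base R sketch below simulates data, fits the two nuisance regressions on training folds with `lm`, and solves a weighted linear estimating equation over all folds at once. It assumes the score takes the weighted partialling-out form w(Z) (X - E[X|Z]) (Y - E[Y|Z] - (X - E[X|Z]) theta); the simulated data, the polynomial learners and the constant placeholder weights are all assumptions, and no ROSE random forest is fitted here.

```r
set.seed(42)
n <- 500; K <- 5

# Simulated partially linear model with theta0 = 1 (illustrative data only)
Z <- runif(n, -1, 1)
X <- sin(pi * Z) + rnorm(n)
theta0 <- 1
Y <- theta0 * X + Z^2 + rnorm(n)
dat <- data.frame(Y, X, Z)

fold_id <- sample(rep(seq_len(K), length.out = n))
res_Y <- res_X <- numeric(n)

for (k in seq_len(K)) {
  test <- fold_id == k
  train <- dat[!test, ]
  # Stand-in nuisance learners: cubic polynomial regressions on Z
  fit_y <- lm(Y ~ poly(Z, 3), data = train)
  fit_x <- lm(X ~ poly(Z, 3), data = train)
  # Out-of-fold residuals Y - E[Y|Z] and X - E[X|Z]
  res_Y[test] <- dat$Y[test] - predict(fit_y, newdata = dat[test, ])
  res_X[test] <- dat$X[test] - predict(fit_x, newdata = dat[test, ])
}

# Placeholder weights; a ROSE random forest would instead estimate w(Z)
# on each training fold
w <- rep(1, n)

# DML2: solve sum_i w_i * res_X_i * (res_Y_i - res_X_i * theta) = 0 once,
# pooling all folds
theta_hat <- sum(w * res_X * res_Y) / sum(w * res_X^2)
print(theta_hat)
```

Pooling the folds before solving (rather than averaging K separate fold-wise estimates) is what distinguishes DML2 from DML1.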
A list containing:
theta
The estimator of $\theta_0$.
stderror
Huber robust estimate of the standard error of the $\theta_0$-estimator.
coefficients
Table of the coefficient estimate, standard error, z-value and p-value.
Prints a roseRF object fitted by the functions roseRF_... in roseRF.
## S3 method for class 'roseforest'
summary(object, ...)
object | a fitted roseRF object of class 'roseforest'. |
---|---|
... | additional arguments. |
Prints summary output for a roseRF object.
Estimates the parameter of interest $\theta_0$ in the generalised partially linear regression model as in roseRF_gplm but without any weights, i.e. $J = 1$, $M_1(X) = X$ and $w_1 \equiv 1$.
unweighted_gplm(
  y_on_xz_formula, y_on_xz_learner, y_on_xz_pars = list(),
  Gy_on_z_formula, Gy_on_z_learner, Gy_on_z_pars = list(),
  x_formula, x_learner, x_pars = list(),
  link = "identity", data, K = 5, S = 1
)
y_on_xz_formula | a two-sided formula object describing the regression model for $\mathbb{E}[Y \mid X, Z]$. |
---|---|
y_on_xz_learner | a string specifying the regression method to fit the regression as given by y_on_xz_formula. |
y_on_xz_pars | a list containing hyperparameters for the y_on_xz_learner. |
Gy_on_z_formula | a two-sided formula object describing the regression model for $\mathbb{E}[g(\mathbb{E}[Y \mid X, Z]) \mid Z]$. |
Gy_on_z_learner | a string specifying the regression method to fit the regression as given by Gy_on_z_formula. |
Gy_on_z_pars | a list containing hyperparameters for the Gy_on_z_learner. |
x_formula | a two-sided formula object describing the regression model for $\mathbb{E}[X \mid Z]$. |
x_learner | a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula. |
x_pars | a list containing hyperparameters for the x_learner. |
link | link function ($g$). Default is "identity". |
data | a data frame containing the variables for the partially linear model. |
K | the number of folds used for $K$-fold cross-fitting. |
S | the number of repeats to mitigate the randomness in the estimator on the sample splits used for $K$-fold cross-fitting. |
A list containing:
theta
The estimator of $\theta_0$.
stderror
Huber robust estimate of the standard error of the $\theta_0$-estimator.
coefficients
Table of the coefficient estimate, standard error, z-value and p-value.
Estimates the parameter of interest $\theta_0$ in the partially linear regression model as in roseRF_plm but without any weights, i.e. $J = 1$, $M_1(X) = X$ and $w_1 \equiv 1$.
unweighted_plm(
  y_formula, y_learner, y_pars = list(),
  x_formula, x_learner, x_pars = list(),
  data, K = 5, S = 1
)
y_formula | a two-sided formula object describing the regression model for $\mathbb{E}[Y \mid Z]$. |
---|---|
y_learner | a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula. |
y_pars | a list containing hyperparameters for the y_learner. |
x_formula | a two-sided formula object describing the regression model for $\mathbb{E}[X \mid Z]$. |
x_learner | a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula. |
x_pars | a list containing hyperparameters for the x_learner. |
data | a data frame containing the variables for the partially linear model. |
K | the number of folds used for $K$-fold cross-fitting. |
S | the number of repeats to mitigate the randomness in the estimator on the sample splits used for $K$-fold cross-fitting. |
A list containing:
theta
The estimator of $\theta_0$.
stderror
Huber robust estimate of the standard error of the $\theta_0$-estimator.
coefficients
Table of the coefficient estimate, standard error, z-value and p-value.
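A Huber robust (sandwich) standard error for the unweighted estimator can be sketched from the two residual vectors alone. The example below assumes the unweighted score (X - E[X|Z]) (Y - E[Y|Z] - (X - E[X|Z]) theta) and uses simulated stand-ins for the residuals rather than fitted nuisance functions; the exact formula used inside the package is not shown in this documentation, so treat this as an illustration of the general technique.

```r
set.seed(7)
n <- 2000

# Simulated stand-ins for the cross-fitted residuals (illustrative only)
res_X <- rnorm(n)                          # plays the role of X - E[X|Z]
eps   <- rnorm(n, sd = 0.5 + abs(res_X))   # heteroscedastic noise
res_Y <- 2 * res_X + eps                   # plays the role of Y - E[Y|Z], theta0 = 2

# Unweighted estimator: root of sum_i res_X_i * (res_Y_i - res_X_i * theta)
theta_hat <- sum(res_X * res_Y) / sum(res_X^2)

# Sandwich variance: "meat" sum(psi^2) divided by squared "bread" sum(res_X^2)
psi <- res_X * (res_Y - res_X * theta_hat)
stderror <- sqrt(sum(psi^2)) / sum(res_X^2)

# For comparison, the standard error that assumes constant error variance
naive_se <- sqrt(sum((res_Y - res_X * theta_hat)^2) / (n - 1) / sum(res_X^2))

print(c(theta = theta_hat, robust_se = stderror, naive_se = naive_se))
```

Because the noise variance here grows with the size of the regressor residual, the constant-variance formula is not valid for these data, which is the situation the robust estimate is meant to handle.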