Package 'roseRF'

Title: ROSE Random Forests for Robust Semiparametric Efficient Estimation
Description: ROSE (RObust Semiparametric Efficient) random forests for robust semiparametric efficient estimation in partially parametric models (containing generalised partially linear models). Details can be found in the paper by Young and Shah (2024) <doi:10.48550/arXiv.2410.03471>.
Authors: Elliot H. Young [aut, cre], Rajen D. Shah [aut]
Maintainer: Elliot H. Young <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2024-12-11 06:48:01 UTC
Source: CRAN

Help Index


Print for a rose random forest fitted object

Description

This is a method that prints a useful summary of aspects of a roseRF object fitted by the functions roseRF_... in roseRF.

Usage

## S3 method for class 'roseforest'
print(x, ...)

Arguments

x

a fitted roseRF object fitted by roseRF_....

...

additional arguments

Value

Prints output for roseRF object


ROSE random forest estimator for the generalised partially linear model

Description

Estimates the parameter of interest $\theta_0$ in the generalised partially linear model

$$g(\mathbb{E}[Y|X,Z]) = X\theta_0 + f_0(Z),$$

for some (strictly increasing, differentiable) link function $g$, which can be re-expressed in terms of the ‘nuisance functions’ $(\mathbb{E}[X|Z], \mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z])$ as

$$g(\mathbb{E}[Y|X,Z]) - \mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z] = (X-\mathbb{E}[X|Z])\theta_0.$$

Usage

roseRF_gplm(
  y_on_xz_formula,
  y_on_xz_learner,
  y_on_xz_pars = list(),
  Gy_on_z_formula,
  Gy_on_z_learner,
  Gy_on_z_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  M1_formula = x_formula,
  M1_learner = x_learner,
  M1_pars = x_pars,
  M2_formula = NA,
  M2_learner = NA,
  M2_pars = list(),
  M3_formula = NA,
  M3_learner = NA,
  M3_pars = list(),
  M4_formula = NA,
  M4_learner = NA,
  M4_pars = list(),
  M5_formula = NA,
  M5_learner = NA,
  M5_pars = list(),
  link = "identity",
  data,
  K = 5,
  S = 1,
  max.depth = 10,
  num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE,
  sample.fraction = 0.8
)

Arguments

y_on_xz_formula

a two-sided formula object describing the model for $\mathbb{E}[Y|X,Z]$ (regressing $Y$ on $(X,Z)$).

y_on_xz_learner

a string specifying the regression method to fit the regression as given by y_on_xz_formula (e.g. randomforest, xgboost, neuralnet, gam).

y_on_xz_pars

a list containing hyperparameters for the y_on_xz_learner chosen. Default is an empty list, which performs hyperparameter tuning.

Gy_on_z_formula

a two-sided formula object describing the model for $\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z]$ (regressing $g(\hat{E}[Y|X,Z])$ on $Z$).

Gy_on_z_learner

a string specifying the regression method to fit the regression as given by Gy_on_z_formula (e.g. randomforest, xgboost, neuralnet, gam).

Gy_on_z_pars

a list containing hyperparameters for the Gy_on_z_learner chosen. Default is an empty list, which performs hyperparameter tuning.

x_formula

a two-sided formula object describing the model for $\mathbb{E}[X|Z]$.

x_learner

a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).

x_pars

a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.

M1_formula

a two-sided formula object for the model $\mathbb{E}[M_1(X)|Z]$. Default is $M_1(X)=X$.

M1_learner

a string specifying the regression method for $\mathbb{E}[M_1(X)|Z]$ estimation.

M1_pars

a list containing hyperparameters for the M1_learner chosen.

M2_formula

a two-sided formula object for the model $\mathbb{E}[M_2(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M2_learner

a string specifying the regression method for $\mathbb{E}[M_2(X)|Z]$ estimation.

M2_pars

a list containing hyperparameters for the M2_learner chosen.

M3_formula

a two-sided formula object for the model $\mathbb{E}[M_3(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M3_learner

a string specifying the regression method for $\mathbb{E}[M_3(X)|Z]$ estimation.

M3_pars

a list containing hyperparameters for the M3_learner chosen.

M4_formula

a two-sided formula object for the model $\mathbb{E}[M_4(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M4_learner

a string specifying the regression method for $\mathbb{E}[M_4(X)|Z]$ estimation.

M4_pars

a list containing hyperparameters for the M4_learner chosen.

M5_formula

a two-sided formula object for the model $\mathbb{E}[M_5(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M5_learner

a string specifying the regression method for $\mathbb{E}[M_5(X)|Z]$ estimation.

M5_pars

a list containing hyperparameters for the M5_learner chosen.

link

link function ($g$). Options include identity, log, sqrt, logit, probit. Default is identity.

data

a data frame containing the variables for the partially linear model.

K

the number of folds used for $K$-fold cross-fitting. Default is 5.

S

the number of times the $K$-fold cross-fitting is repeated, to mitigate the dependence of the estimator on the random sample splits. Default is 1.

max.depth

Maximum depth parameter used for ROSE random forests. Default is 10.

num.trees

Number of trees used for a single ROSE random forest. Default is 500.

min.node.size

Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))).

replace

Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE (i.e. bootstrap).

sample.fraction

Proportion of data used for each random tree. Default is 0.8.
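
As a small illustration of the min.node.size default, the sketch below evaluates the expression from the Usage block for a given sample size. The helper name `default_min_node_size` and the argument `n` (standing in for nrow(data)) are hypothetical and not part of the package. One plausible reading, which is an assumption here, is that (K - 1)/K * nrow(data) is the number of rows outside a single fold, so the default is 1% of that count with a floor of 10.

```r
# Hypothetical helper reproducing the documented default expression;
# `n` stands in for nrow(data).
default_min_node_size <- function(n, K = 5) {
  max(10, ceiling(0.01 * (K - 1) / K * n))
}

# Small samples fall back to the floor of 10.
small_n <- default_min_node_size(1000)

# Larger samples use 1% of the (K - 1)/K share of rows, rounded up.
large_n <- default_min_node_size(3210)

c(small_n = small_n, large_n = large_n)
```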

Details

The estimator of $\theta_0$ solves the estimating equation

$$\sum_{i}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}(Z),\hat{w}(Z)) = 0,$$

$$\psi(Y,X,Z;\theta,\eta_0,w) := \sum_{j=1}^J w_j(Z) \big( M_j(X) - \mathbb{E}[M_j(X)|Z] \big) g'(\mu(X,Z;\theta,\eta_0)) \big(Y-\mu(X,Z;\theta,\eta_0)\big),$$

$$\mu(X,Z;\theta,\eta_0) := g^{-1}\big(\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z] + (X-\mathbb{E}[X|Z])\theta\big),$$

$$\eta_0 := \big(\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z=\cdot], \mathbb{E}[X|Z=\cdot]\big),$$

where $M_1(X),\ldots,M_J(X)$ denote user-chosen functions of $X$ and $w(Z)=\big(w_1(Z),\ldots,w_J(Z)\big)$ denotes weights estimated via ROSE random forests. The default takes $J=1$ and $M_1(X)=X$; if taking $J\geq 2$ we recommend care in checking the applicability and appropriateness of any additional user-chosen regression tasks.

The parameter of interest $\theta_0$ is estimated using a DML2 / $K$-fold cross-fitting framework, to allow for arbitrary (faster than $n^{1/4}$-consistent) learners for $\hat{\eta}$, i.e. solving the estimating equation

$$\sum_{k \in [K]}\sum_{i \in I_k}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}^{(k)}(Z),\hat{w}^{(k)}(Z)) = 0,$$

where $I_1,\ldots,I_K$ denotes a partition of the index set for the datapoints $(Y_i,X_i,Z_i)$, $\hat{\eta}^{(k)}$ denotes an estimator for $\eta_0$ trained on the data indexed by $I_k^c$, and $\hat{w}^{(k)}$ denotes a ROSE random forest (again trained on the data indexed by $I_k^c$).
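
To make the estimating equation concrete, the following base R sketch solves it numerically for the log link in a simulated Poisson example. It is an illustration under strong simplifying assumptions, not the package's implementation: it takes J = 1, M_1(X) = X and w_1 = 1 (no ROSE forest), plugs in the true (oracle) nuisance functions instead of cross-fitted learners, and uses a hypothetical data-generating process with a made-up nuisance function `f0`.

```r
set.seed(1)
n      <- 2000
theta0 <- 0.5
z      <- runif(n, -1, 1)
x      <- 0.5 * z + rnorm(n)               # so E[X | Z] = 0.5 * z
f0     <- function(z) sin(pi * z)          # hypothetical nuisance function f_0
y      <- rpois(n, exp(x * theta0 + f0(z)))

# Oracle nuisance functions (assumed known here; in practice they are estimated).
m_x  <- 0.5 * z                            # E[X | Z]
m_gy <- m_x * theta0 + f0(z)               # E[g(E[Y | X, Z]) | Z] for g = log

g_inv   <- exp                             # inverse of the log link
g_prime <- function(mu) 1 / mu             # derivative of the log link

# Sum of psi over the sample with J = 1, M_1(X) = X and w_1 = 1.
score <- function(theta) {
  mu <- g_inv(m_gy + (x - m_x) * theta)
  sum((x - m_x) * g_prime(mu) * (y - mu))
}

# For non-negative responses this score is decreasing in theta,
# so a bracketing root search is sufficient.
theta_hat <- uniroot(score, interval = c(-3, 3))$root
theta_hat
```

With estimated nuisance functions, the same root search would be applied to the score pooled over the K folds, with each observation's nuisance values predicted from a fit that excluded its fold.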

Value

A list containing:

theta

The estimator of $\theta_0$.

stderror

Huber robust estimate of the standard error of the $\theta_0$-estimator.

coefficients

Table of $\theta_0$ coefficient estimator, standard error, z-value and p-value.
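
The relationship between the entries of the coefficients table can be sketched as follows. This assumes the conventional two-sided normal (Wald) construction, which the documentation does not state explicitly; the helper `wald_row` and its column labels are hypothetical.

```r
# Hypothetical helper: build one coefficient row from an estimate and its
# standard error, assuming a two-sided normal (Wald) test of theta_0 = 0.
wald_row <- function(theta, stderror) {
  z <- theta / stderror
  p <- 2 * pnorm(-abs(z))
  c(Estimate = theta, `Std. Error` = stderror, `z value` = z, `Pr(>|z|)` = p)
}

row <- wald_row(theta = 0.98, stderror = 0.5)
row
```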


ROSE random forest estimator for the partially linear instrumental variable model

Description

ROSE random forest estimator for the partially linear instrumental variable model

Usage

roseRF_pliv(
  y_formula,
  y_learner,
  y_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  IV1_formula = NA,
  IV1_learner = NA,
  IV1_pars = list(),
  IV2_formula = NA,
  IV2_learner = NA,
  IV2_pars = list(),
  IV3_formula = NA,
  IV3_learner = NA,
  IV3_pars = list(),
  IV4_formula = NA,
  IV4_learner = NA,
  IV4_pars = list(),
  IV5_formula = NA,
  IV5_learner = NA,
  IV5_pars = list(),
  data,
  K = 5,
  S = 1,
  max.depth = 10,
  num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE,
  sample.fraction = 0.8
)

Arguments

y_formula

a two-sided formula object describing the regression model for $\mathbb{E}[Y|Z]$.

y_learner

a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula (e.g. randomforest, xgboost, neuralnet, gam).

y_pars

a list containing hyperparameters for the y_learner chosen. Default is an empty list, which performs hyperparameter tuning.

x_formula

a two-sided formula object describing the regression model for $\mathbb{E}[X|Z]$.

x_learner

a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).

x_pars

a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.

IV1_formula

a two-sided formula object for the model $\mathbb{E}[V_1(X)|Z]$.

IV1_learner

a string specifying the regression method for $\mathbb{E}[V_1(X)|Z]$ estimation.

IV1_pars

a list containing hyperparameters for the IV1_learner chosen.

IV2_formula

a two-sided formula object for the model $\mathbb{E}[V_2|Z]$. Default is no formula / regression (i.e. $J=1$).

IV2_learner

a string specifying the regression method for $\mathbb{E}[V_2(X)|Z]$ estimation.

IV2_pars

a list containing hyperparameters for the IV2_learner chosen.

IV3_formula

a two-sided formula object for the model $\mathbb{E}[V_3(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

IV3_learner

a string specifying the regression method for $\mathbb{E}[V_3(X)|Z]$ estimation.

IV3_pars

a list containing hyperparameters for the IV3_learner chosen.

IV4_formula

a two-sided formula object for the model $\mathbb{E}[V_4(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

IV4_learner

a string specifying the regression method for $\mathbb{E}[V_4(X)|Z]$ estimation.

IV4_pars

a list containing hyperparameters for the IV4_learner chosen.

IV5_formula

a two-sided formula object for the model $\mathbb{E}[V_5(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

IV5_learner

a string specifying the regression method for $\mathbb{E}[V_5(X)|Z]$ estimation.

IV5_pars

a list containing hyperparameters for the IV5_learner chosen.

data

a data frame containing the variables for the partially linear model.

K

the number of folds used for $K$-fold cross-fitting. Default is 5.

S

the number of times the $K$-fold cross-fitting is repeated, to mitigate the dependence of the estimator on the random sample splits. Default is 1.

max.depth

Maximum depth parameter used for ROSE random forests. Default is 10.

num.trees

Number of trees used for a single ROSE random forest. Default is 500.

min.node.size

Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))).

replace

Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE (i.e. bootstrap).

sample.fraction

Proportion of data used for each random tree. Default is 0.8.

Value

A list containing:

theta

The estimator of $\theta_0$.

stderror

Huber robust estimate of the standard error of the $\theta_0$-estimator.

coefficients

Table of $\theta_0$ coefficient estimator, standard error, z-value and p-value.


ROSE random forest estimator for the partially linear model

Description

Estimates the parameter of interest $\theta_0$ in the partially linear model

$$\mathbb{E}[Y|X,Z] = X\theta_0 + f_0(Z),$$

which can be re-expressed in terms of the ‘nuisance functions’ $(\mathbb{E}[Y|Z], \mathbb{E}[X|Z])$ as

$$\mathbb{E}[Y|X,Z]-\mathbb{E}[Y|Z] = (X-\mathbb{E}[X|Z])\theta_0.$$

Usage

roseRF_plm(
  y_formula,
  y_learner,
  y_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  M1_formula = x_formula,
  M1_learner = x_learner,
  M1_pars = x_pars,
  M2_formula = NA,
  M2_learner = NA,
  M2_pars = list(),
  M3_formula = NA,
  M3_learner = NA,
  M3_pars = list(),
  M4_formula = NA,
  M4_learner = NA,
  M4_pars = list(),
  M5_formula = NA,
  M5_learner = NA,
  M5_pars = list(),
  data,
  K = 5,
  S = 1,
  max.depth = 10,
  num.trees = 500,
  min.node.size = max(10, ceiling(0.01 * (K - 1)/K * nrow(data))),
  replace = TRUE,
  sample.fraction = 0.8
)

Arguments

y_formula

a two-sided formula object describing the model for $\mathbb{E}[Y|Z]$.

y_learner

a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula (e.g. randomforest, xgboost, neuralnet, gam).

y_pars

a list containing hyperparameters for the y_learner chosen. Default is an empty list, which performs hyperparameter tuning.

x_formula

a two-sided formula object describing the model for $\mathbb{E}[X|Z]$.

x_learner

a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).

x_pars

a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.

M1_formula

a two-sided formula object for the model $\mathbb{E}[M_1(X)|Z]$. Default is $M_1(X)=X$.

M1_learner

a string specifying the regression method for $\mathbb{E}[M_1(X)|Z]$ estimation.

M1_pars

a list containing hyperparameters for the M1_learner chosen.

M2_formula

a two-sided formula object for the model $\mathbb{E}[M_2(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M2_learner

a string specifying the regression method for $\mathbb{E}[M_2(X)|Z]$ estimation.

M2_pars

a list containing hyperparameters for the M2_learner chosen.

M3_formula

a two-sided formula object for the model $\mathbb{E}[M_3(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M3_learner

a string specifying the regression method for $\mathbb{E}[M_3(X)|Z]$ estimation.

M3_pars

a list containing hyperparameters for the M3_learner chosen.

M4_formula

a two-sided formula object for the model $\mathbb{E}[M_4(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M4_learner

a string specifying the regression method for $\mathbb{E}[M_4(X)|Z]$ estimation.

M4_pars

a list containing hyperparameters for the M4_learner chosen.

M5_formula

a two-sided formula object for the model $\mathbb{E}[M_5(X)|Z]$. Default is no formula / regression (i.e. $J=1$).

M5_learner

a string specifying the regression method for $\mathbb{E}[M_5(X)|Z]$ estimation.

M5_pars

a list containing hyperparameters for the M5_learner chosen.

data

a data frame containing the variables for the partially linear model.

K

the number of folds used for $K$-fold cross-fitting. Default is 5.

S

the number of times the $K$-fold cross-fitting is repeated, to mitigate the dependence of the estimator on the random sample splits. Default is 1.

max.depth

Maximum depth parameter used for ROSE random forests. Default is 10.

num.trees

Number of trees used for a single ROSE random forest. Default is 500.

min.node.size

Minimum node size of a leaf in each tree. Default is max(10, ceiling(0.01 * (K - 1)/K * nrow(data))).

replace

Whether sampling for a single random tree is performed with (bootstrap) or without replacement. Default is TRUE (i.e. bootstrap).

sample.fraction

Proportion of data used for each random tree. Default is 0.8.

Details

The estimator of $\theta_0$ solves the estimating equation

$$\sum_{i}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}(Z),\hat{w}(Z)) = 0,$$

$$\psi(Y,X,Z;\theta,\eta_0,w) := \sum_{j=1}^J w_j(Z) \big( M_j(X) - \mathbb{E}[M_j(X)|Z]\big) \Big( \big(Y-\mathbb{E}[Y|Z]\big)-\big(X-\mathbb{E}[X|Z]\big)\theta \Big),$$

$$\eta_0 := \big(\mathbb{E}[Y|Z=\cdot], \mathbb{E}[X|Z=\cdot]\big),$$

where $M_1(X),\ldots,M_J(X)$ denote user-chosen functions of $X$ and $w(Z)=\big(w_1(Z),\ldots,w_J(Z)\big)$ denotes weights estimated via ROSE random forests. The default takes $J=1$ and $M_1(X)=X$; if taking $J\geq 2$ we recommend care in checking the applicability and appropriateness of any additional user-chosen regression tasks.

The parameter of interest $\theta_0$ is estimated using a DML2 / $K$-fold cross-fitting framework, to allow for arbitrary (faster than $n^{1/4}$-consistent) learners for $\hat{\eta}$, i.e. solving the estimating equation

$$\sum_{k \in [K]}\sum_{i \in I_k}\psi(Y_i,X_i,Z_i; \theta,\hat{\eta}^{(k)}(Z),\hat{w}^{(k)}(Z)) = 0,$$

where $I_1,\ldots,I_K$ denotes a partition of the index set for the datapoints $(Y_i,X_i,Z_i)$, $\hat{\eta}^{(k)}$ denotes an estimator for $\eta_0$ trained on the data indexed by $I_k^c$, and $\hat{w}^{(k)}$ denotes a ROSE random forest (again trained on the data indexed by $I_k^c$).
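
The role of the weights $w(Z)$ can be illustrated with a small simulation. For $J=1$ and $M_1(X)=X$ the estimating equation is linear in $\theta$ and has a closed-form root, so different weight functions can be compared directly. The sketch below is an illustration only: it uses oracle nuisance functions, a hypothetical heteroscedastic noise model, and known inverse-variance weights as a stand-in for the weights that a ROSE random forest would estimate.

```r
set.seed(2)
theta0 <- 1

one_draw <- function(n = 500) {
  z     <- rnorm(n)
  x     <- z + rnorm(n)                      # E[X | Z] = z
  sigma <- 0.2 + 2 * abs(z)                  # hypothetical noise sd, varying with z
  y     <- x * theta0 + cos(z) + sigma * rnorm(n)

  rx <- x - z                                # X - E[X | Z]  (oracle)
  ry <- y - (z * theta0 + cos(z))            # Y - E[Y | Z]  (oracle)

  # Closed-form root of sum_i w(Z_i) * rx_i * (ry_i - rx_i * theta) = 0.
  solve_theta <- function(w) sum(w * rx * ry) / sum(w * rx^2)

  c(unweighted = solve_theta(rep(1, n)),     # w = 1
    weighted   = solve_theta(1 / sigma^2))   # known inverse-variance weights
}

draws <- t(replicate(200, one_draw()))
est_var <- apply(draws, 2, var)              # Monte Carlo variance of each estimator
est_var
```

Both choices of weights give estimators centred near the true value; the point of the comparison is the difference in sampling variance when the noise level depends on $Z$.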

Value

A list containing:

theta

The estimator of $\theta_0$.

stderror

Huber robust estimate of the standard error of the $\theta_0$-estimator.

coefficients

Table of $\theta_0$ coefficient estimator, standard error, z-value and p-value.
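
As a rough guide to what a Huber robust (sandwich) standard error looks like for this estimating equation, the sketch below computes one for the partially linear model with $J=1$, $M_1(X)=X$ and $w_1\equiv 1$. It uses the textbook sandwich formula with oracle nuisance functions on simulated data; whether the package uses exactly this expression (for example, how it combines folds and repeats) is not stated in this documentation, so treat it as an assumption.

```r
set.seed(3)
n      <- 1000
theta0 <- 1
z <- rnorm(n)
x <- z + rnorm(n)                            # E[X | Z] = z
y <- x * theta0 + cos(z) + (0.5 + abs(z)) * rnorm(n)

rx <- x - z                                  # X - E[X | Z]  (oracle)
ry <- y - (z * theta0 + cos(z))              # Y - E[Y | Z]  (oracle)

# Root of sum_i rx_i * (ry_i - rx_i * theta) = 0.
theta_hat <- sum(rx * ry) / sum(rx^2)

psi  <- rx * (ry - rx * theta_hat)           # score contributions at theta_hat
dpsi <- -rx^2                                # derivative of each contribution in theta

# Sandwich form: sqrt(sum of squared scores) / |sum of score derivatives|.
stderror <- sqrt(sum(psi^2)) / abs(sum(dpsi))

c(theta = theta_hat, stderror = stderror)
```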


Summary for a rose random forest fitted object

Description

Summarises a roseRF object fitted by the functions roseRF_... in roseRF.

Usage

## S3 method for class 'roseforest'
summary(object, ...)

Arguments

object

a fitted roseRF object fitted by roseRF_....

...

additional arguments

Value

Prints summary output for roseRF object


Unweighted (baseline) estimator for the generalised partially linear model

Description

Estimates the parameter of interest $\theta_0$ in the generalised partially linear regression model

$$g(\mathbb{E}[Y|X,Z]) = X\theta_0 + f_0(Z),$$

as in roseRF_gplm but without any weights, i.e. $J=1$, $M_1(X)=X$ and $w_1\equiv 1$.

Usage

unweighted_gplm(
  y_on_xz_formula,
  y_on_xz_learner,
  y_on_xz_pars = list(),
  Gy_on_z_formula,
  Gy_on_z_learner,
  Gy_on_z_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  link = "identity",
  data,
  K = 5,
  S = 1
)

Arguments

y_on_xz_formula

a two-sided formula object describing the regression model for $\mathbb{E}[Y|X,Z]$ (regressing $Y$ on $(X,Z)$).

y_on_xz_learner

a string specifying the regression method to fit the regression as given by y_on_xz_formula (e.g. randomforest, xgboost, neuralnet, gam).

y_on_xz_pars

a list containing hyperparameters for the y_on_xz_learner chosen. Default is an empty list, which performs hyperparameter tuning.

Gy_on_z_formula

a two-sided formula object describing the regression model for $\mathbb{E}[g(\mathbb{E}[Y|X,Z])|Z]$ (regressing $g(\hat{E}[Y|X,Z])$ on $Z$).

Gy_on_z_learner

a string specifying the regression method to fit the regression as given by Gy_on_z_formula (e.g. randomforest, xgboost, neuralnet, gam).

Gy_on_z_pars

a list containing hyperparameters for the Gy_on_z_learner chosen. Default is an empty list, which performs hyperparameter tuning.

x_formula

a two-sided formula object describing the regression model for $\mathbb{E}[X|Z]$.

x_learner

a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).

x_pars

a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.

link

link function ($g$). Options include identity, log, sqrt, logit, probit. Default is identity.

data

a data frame containing the variables for the partially linear model.

K

the number of folds used for $K$-fold cross-fitting. Default is 5.

S

the number of times the $K$-fold cross-fitting is repeated, to mitigate the dependence of the estimator on the random sample splits. Default is 1.

Value

A list containing:

theta

The estimator of $\theta_0$.

stderror

Huber robust estimate of the standard error of the $\theta_0$-estimator.

coefficients

Table of $\theta_0$ coefficient estimator, standard error, z-value and p-value.


Unweighted (baseline) estimator for the partially linear model

Description

Estimates the parameter of interest $\theta_0$ in the partially linear regression model

$$\mathbb{E}[Y|X,Z] = X\theta_0 + f_0(Z),$$

as in roseRF_plm but without any weights, i.e. $J=1$, $M_1(X)=X$ and $w_1\equiv 1$.
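
The unweighted estimator combined with $K$-fold cross-fitting can be sketched in base R as below. This is a minimal illustration rather than the package's code: the simulated data are hypothetical, and cubic polynomial regressions via `lm` stand in for the learners named in the Arguments (randomforest, xgboost, neuralnet, gam).

```r
set.seed(4)
n      <- 1000
K      <- 5
theta0 <- 1
dat   <- data.frame(z = runif(n, -2, 2))
dat$x <- sin(dat$z) + rnorm(n)
dat$y <- dat$x * theta0 + dat$z^2 + rnorm(n)

# Random partition I_1, ..., I_K of the row indices.
fold <- sample(rep(seq_len(K), length.out = n))

rx <- ry <- numeric(n)
for (k in seq_len(K)) {
  test  <- fold == k
  train <- !test
  # Stand-in learners (an assumption): polynomial regressions fitted on I_k^c.
  fit_y <- lm(y ~ poly(z, 3), data = dat[train, ])
  fit_x <- lm(x ~ poly(z, 3), data = dat[train, ])
  # Out-of-fold residuals Y - E[Y | Z] and X - E[X | Z] on I_k.
  ry[test] <- dat$y[test] - predict(fit_y, newdata = dat[test, ])
  rx[test] <- dat$x[test] - predict(fit_x, newdata = dat[test, ])
}

# DML2: pool all folds and solve sum_i rx_i * (ry_i - rx_i * theta) = 0.
theta_hat <- sum(rx * ry) / sum(rx^2)
theta_hat
```

Pooling the residuals across folds before solving for $\theta$ is what distinguishes DML2 from averaging $K$ separate per-fold estimates.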

Usage

unweighted_plm(
  y_formula,
  y_learner,
  y_pars = list(),
  x_formula,
  x_learner,
  x_pars = list(),
  data,
  K = 5,
  S = 1
)

Arguments

y_formula

a two-sided formula object describing the regression model for $\mathbb{E}[Y|Z]$.

y_learner

a string specifying the regression method to fit the regression of $Y$ on $Z$ as given by y_formula (e.g. randomforest, xgboost, neuralnet, gam).

y_pars

a list containing hyperparameters for the y_learner chosen. Default is an empty list, which performs hyperparameter tuning.

x_formula

a two-sided formula object describing the regression model for $\mathbb{E}[X|Z]$.

x_learner

a string specifying the regression method to fit the regression of $X$ on $Z$ as given by x_formula (e.g. randomforest, xgboost, neuralnet, gam).

x_pars

a list containing hyperparameters for the x_learner chosen. Default is an empty list, which performs hyperparameter tuning.

data

a data frame containing the variables for the partially linear model.

K

the number of folds used for $K$-fold cross-fitting. Default is 5.

S

the number of times the $K$-fold cross-fitting is repeated, to mitigate the dependence of the estimator on the random sample splits. Default is 1.

Value

A list containing:

theta

The estimator of $\theta_0$.

stderror

Huber robust estimate of the standard error of the $\theta_0$-estimator.

coefficients

Table of $\theta_0$ coefficient estimator, standard error, z-value and p-value.