Package 'sparseR'

Title: Variable Selection under Ranked Sparsity Principles for Interactions and Polynomials
Description: An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection in the presence of prior informational asymmetry, which occurs in the context of trying to perform variable selection in the presence of interactions and/or polynomials. Ultimately, this package attempts to facilitate dealing with cumbersome interactions and polynomials while not avoiding them entirely. Typically, models selected under ranked sparsity principles will also be more transparent, having fewer falsely selected interactions and polynomials than other methods.
Authors: Ryan Andrew Peterson [aut, cre]
Maintainer: Ryan Andrew Peterson <[email protected]>
License: GPL-3
Version: 0.3.1
Built: 2024-09-16 06:23:16 UTC
Source: CRAN

Help Index


sparseR: Implement ranked sparsity for selecting interactions and polynomials

Description

The sparseR package implements various techniques for selecting from a set of interaction and polynomial terms under ranked sparsity. Additional tools for data pre-processing, post-selection inference, and visualization are also included.

Author(s)

Maintainer: Ryan Andrew Peterson [email protected] (ORCID)



Data sets

Description

Detrano data sets (cleveland, hungarian, switzerland, va); the Iowa Radon Lung Cancer Study (irlcs_radon_syn): data simulated to resemble the IRLCS study; Shedden survival data (Z: clinical covariates, S: survival outcome)

Usage

cleveland

hungarian

switzerland

va

irlcs_radon_syn

Z

S

Format

An object of class data.frame with 303 rows and 14 columns.

An object of class data.frame with 294 rows and 14 columns.

An object of class data.frame with 123 rows and 14 columns.

An object of class data.frame with 200 rows and 14 columns.

An object of class data.frame with 1027 rows and 16 columns.

An object of class data.frame with 442 rows and 6 columns.

An object of class Surv with 442 rows and 2 columns.

Source

Detrano data

https://archive.ics.uci.edu/ml/datasets/heart+disease

IRLCS data sets

https://cheec.uiowa.edu/research/residential-radon-and-lung-cancer-case-control-study

Shedden

https://www.gsea-msigdb.org/gsea/msigdb/cards/SHEDDEN_LUNG_CANCER_POOR_SURVIVAL_A6

References

Detrano

Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64, 304–310.

IRLCS

Field, R., Smith, B., Steck, D., et al. (2002). Residential radon exposure and lung cancer: Variation in risk estimates using alternative exposure scenarios. J Expo Sci Environ Epidemiol, 12, 197–203. https://www.nature.com/articles/7500215

Shedden

Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma, Shedden, K., Taylor, J. M., Enkemann, S. A., Tsao, M. S., Yeatman, T. J., Gerald, W. L., Eschrich, S., Jurisica, I., Giordano, T. J., Misek, D. E., Chang, A. C., Zhu, C. Q., Strumpf, D., Hanash, S., Shepherd, F. A., Ding, K., Seymour, L., Naoki, K., Pennell, N., … Beer, D. G. (2008). Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nature medicine, 14(8), 822–827. https://www.nature.com/articles/nm.1790
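
Examples

A brief sketch (not from the package documentation): the data sets are assumed to be lazy-loaded with the package, so they can be referenced directly once sparseR is attached.

library(sparseR)

dim(cleveland)          # 303 x 14
head(irlcs_radon_syn)
dim(Z)                  # clinical covariates for the Shedden data
S[1:5]                  # the corresponding survival outcome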


Custom IC functions for stepwise models

Description

Custom IC functions for stepwise models

Usage

EBIC(fit, ...)

## Default S3 method:
EBIC(fit, varnames, pen_info, gammafn = NULL, return_df = TRUE, ...)

RBIC(fit, ...)

## Default S3 method:
RBIC(fit, varnames, pen_info, gammafn = NULL, return_df = TRUE, ...)

RAIC(fit, ...)

## Default S3 method:
RAIC(fit, varnames, pen_info, gammafn = NULL, return_df = TRUE, ...)

Arguments

...

additional arguments

fit

a fitted object

varnames

names of variables

pen_info

penalty information

gammafn

what to use for gamma in the information criterion's formula

return_df

should the degrees of freedom be returned?

Value

A vector of values for the requested criterion, with the degrees of freedom appended to the front of the vector if return_df == TRUE.
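
Examples

A hedged sketch (not from the package documentation): these criteria are most easily used indirectly, through the ic argument of sparseRBIC_step. Calling them directly requires a fitted model, its term names, and penalty information such as that produced by get_penalties().

library(sparseR)

# Stepwise selection guided by RBIC; the iris data is used only for illustration
srbic <- sparseRBIC_step(Sepal.Length ~ ., data = iris, k = 1,
                         ic = "RBIC", message = FALSE)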


Plot relevant effects of a sparseR object

Description

Plot relevant effects of a sparseR object

Usage

effect_plot(fit, ...)

## S3 method for class 'sparseR'
effect_plot(
  fit,
  coef_name,
  at = c("cvmin", "cv1se"),
  by = NULL,
  by_levels,
  nn = 101,
  plot.args = list(),
  resids = TRUE,
  legend_location = "bottomright",
  ...
)

## S3 method for class 'sparseRBIC'
effect_plot(
  fit,
  coef_name,
  by = NULL,
  by_levels,
  nn = 101,
  plot.args = list(),
  resids = TRUE,
  legend_location = "bottomright",
  ...
)

Arguments

fit

a 'sparseR' object

...

additional arguments

coef_name

The name of the coefficient to plot along the x-axis

at

which "smart" choice of lambda to use: "cvmin" or "cv1se"

by

the variable(s) involved in the (possible) interaction

by_levels

values at which to cut a continuous 'by' variable (defaults to three quantiles)

nn

number of points to plot along prediction line

plot.args

list of arguments passed to the plot itself

resids

should residuals be plotted or not?

legend_location

location for legend passed to 'legend'

Value

Nothing is returned (invisibly).
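
Examples

A hedged sketch (not from the package documentation), using the iris data only to illustrate the call pattern; the choice of coefficient and 'by' variable is arbitrary.

library(sparseR)

fit <- sparseR(Sepal.Length ~ ., data = iris, k = 1, poly = 2)

effect_plot(fit, coef_name = "Sepal.Width")
effect_plot(fit, coef_name = "Sepal.Width", by = "Petal.Length")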


Helper function to help set up penalties

Description

Helper function to help set up penalties

Usage

get_penalties(
  varnames,
  poly,
  poly_prefix = "poly_",
  int_sep = "\\:",
  pool = FALSE,
  gamma = 0.5,
  cumulative_k = FALSE,
  cumulative_poly = TRUE
)

Arguments

varnames

names of the covariates in the model matrix

poly

max polynomial considered

poly_prefix

what comes before the polynomial specification in these varnames?

int_sep

What denotes the multiplication for interactions?

pool

Should polynomials and interactions be pooled?

gamma

How much should the penalty increase with group size (0.5 assumes equal contribution of prior information)

cumulative_k

Should penalties be increased cumulatively as the interaction order increases? (only used if !pool)

cumulative_poly

Should penalties be increased cumulatively as the polynomial order increases? (only used if !pool)

Details

This is primarily a helper function for sparseR, but it may be useful if doing the model matrix set up by hand.

Value

a list of relevant information for the variables, including:

penalties

the numeric value of the penalties

vartype

Variable type (main effect, order k interaction, etc)

varname

names of variables
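
Examples

A hedged sketch (not taken from the package documentation): deriving ranked-sparsity penalty multipliers for a hand-built set of term names. The term names below are hypothetical and assume sparseR's default naming conventions ("_poly_" preceding the polynomial degree and ":" separating interaction components).

library(sparseR)

vn <- c("x1", "x2", "x3",                # main effects
        "x1:x2", "x1:x3", "x2:x3",       # pairwise interactions
        "x1_poly_2", "x2_poly_2")        # quadratic terms

pens <- get_penalties(vn, poly = 2, poly_prefix = "_poly_")
pens$penalties   # numeric penalty multipliers for each term
pens$vartype     # main effect / interaction / polynomial labels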


Plot relevant properties of sparseR objects

Description

Plot relevant properties of sparseR objects

Usage

## S3 method for class 'sparseR'
plot(x, plot_type = c("both", "cv", "path"), cols = NULL, log.l = TRUE, ...)

Arguments

x

a 'sparseR' object

plot_type

should the solution path, CV results, or both be plotted?

cols

option to specify color of groups

log.l

should the x-axis (lambda) be logged?

...

extra plotting options

Value

Nothing is returned; the plot is produced as a side effect.
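
Examples

A hedged sketch (not from the package documentation) showing the available plot types for a fitted sparseR object.

library(sparseR)

fit <- sparseR(Sepal.Length ~ ., data = iris)

plot(fit)                                      # CV results and solution path
plot(fit, plot_type = "cv")                    # cross-validation results only
plot(fit, plot_type = "path", log.l = FALSE)   # path on the raw lambda scale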


Predict coefficients or responses for sparseR object

Description

Predict coefficients or responses for sparseR object

Usage

## S3 method for class 'sparseR'
predict(object, newdata, lambda, at = c("cvmin", "cv1se"), ...)

## S3 method for class 'sparseR'
coef(object, lambda, at = c("cvmin", "cv1se"), ...)

Arguments

object

sparseR object

newdata

new data on which to make predictions

lambda

a particular value of lambda to predict with

at

which "smart" choice of lambda to use when 'lambda' is not specified: "cvmin" or "cv1se"

...

additional arguments passed to predict.ncvreg

Value

predicted outcomes for 'newdata' (or coefficients) at specified (or smart) lambda value
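
Examples

A hedged sketch (not from the package documentation): predictions and coefficients at the two "smart" lambda values.

library(sparseR)

fit <- sparseR(Sepal.Length ~ ., data = iris)

predict(fit, newdata = head(iris), at = "cvmin")
predict(fit, newdata = head(iris), at = "cv1se")
head(coef(fit, at = "cv1se"))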


Print sparseR object

Description

Print sparseR object

Usage

## S3 method for class 'sparseR'
print(x, prep = FALSE, ...)

Arguments

x

a sparseR object

prep

Should the SR set-up information be printed as well?

...

additional arguments passed to print.ncvreg

Value

returns x invisibly


Fit a ranked-sparsity model with regularized regression

Description

Fit a ranked-sparsity model with regularized regression

Usage

sparseR(
  formula,
  data,
  family = c("gaussian", "binomial", "poisson", "coxph"),
  penalty = c("lasso", "MCP", "SCAD"),
  alpha = 1,
  ncvgamma = 3,
  lambda.min = 0.005,
  k = 1,
  poly = 2,
  gamma = 0.5,
  cumulative_k = FALSE,
  cumulative_poly = TRUE,
  pool = FALSE,
  ia_formula = NULL,
  pre_process = TRUE,
  model_matrix = NULL,
  y = NULL,
  poly_prefix = "_poly_",
  int_sep = "\\:",
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  filter = c("nzv", "zv"),
  extra_opts = list(),
  ...
)

Arguments

formula

Model formula specifying the outcome and the main-effect terms

data

A data frame containing the variables in the formula

family

The family of the model

penalty

What penalty should be used (lasso, MCP, or SCAD)

alpha

The mix of L1 penalty (lower values introduce more L2 ridge penalty)

ncvgamma

The tuning parameter for ncvreg (for MCP or SCAD)

lambda.min

The minimum value to be used for lambda (as ratio of max, see ?ncvreg)

k

The maximum order of interactions to consider (default: 1; all pairwise)

poly

The maximum order of polynomials to consider (default: 2)

gamma

The degree of extremity of sparsity rankings (see details)

cumulative_k

Should penalties be increased cumulatively as the interaction order increases?

cumulative_poly

Should penalties be increased cumulatively as the polynomial order increases?

pool

Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?

ia_formula

formula to be passed to step_interact (for interactions, see details)

pre_process

Should the data be preprocessed (if FALSE, must provide model_matrix)

model_matrix

A data frame or matrix specifying the full model matrix (used if !pre_process)

y

A vector of responses (used if !pre_process)

poly_prefix

If model_matrix is specified, what is the prefix for polynomial terms?

int_sep

If model_matrix is specified, what is the separator for interaction terms?

pre_proc_opts

List of preprocessing steps (see details)

filter

The type of filter applied to main effects + interactions

extra_opts

A list of options for all preprocess steps (see details)

...

Additional arguments (passed to fitting function)

Details

Selecting gamma: higher values of gamma will penalize "group" size more. By default, this is set to 0.5, which yields equal contribution of prior information across orders of interactions/polynomials (this is a good default for most settings).

Additionally, setting cumulative_poly or cumulative_k to TRUE increases the penalty cumulatively based on the order of either polynomial or interaction.

The options that can be passed to pre_proc_opts are:

  • knnImpute (should missing data be imputed?)

  • scale (should data be standardized?)

  • center (should data be centered to the mean or another value?)

  • otherbin (should factors with low prevalence be combined?)

  • none (should no preprocessing be done? can also specify a null object)

The options that can be passed to extra_opts are:

  • centers (named numeric vector which denotes where each covariate should be centered)

  • center_fn (alternatively, a function can be specified to calculate center such as min or median)

  • freq_cut, unique_cut (see ?step_nzv; these get used by the filtering steps)

  • neighbors (the number of neighbors for knnImpute)

  • one_hot (see ?step_dummy); this defaults to cell-means coding, which is permissible in regularized regression (change at your own risk)

  • raw (should raw rather than orthogonal polynomials be used? Defaults to TRUE because, by default, variables have already been centered and scaled by this point)

ia_formula will by default interact all variables with each other up to order k. If specified, ia_formula will be passed as the terms argument to recipes::step_interact, so the help documentation for that function can be investigated for further assistance in specifying specific interactions.

Value

an object of class sparseR containing the following:

fit

the fit object returned by ncvreg

srprep

a recipes object used to prep the data

pen_factors

the factor multiple on penalties for ranked sparsity

results

all coefficients and penalty factors at minimum CV lambda

results_summary

a tibble of summary results at minimum CV lambda

results1se

all coefficients and penalty factors at lambda_1se

results1se_summary

a tibble of summary results at lambda_1se

data

the (unprocessed) data

family

the family argument (for non-normal models, e.g. poisson)

info

a list containing meta-info about the procedure

References

For fitting functionality, the ncvreg package is used; see Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.
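
Examples

A hedged sketch (not from the package documentation): a default sparsity-ranked lasso fit, followed by a call with customized preprocessing that centers covariates at their medians via extra_opts. The data sets are arbitrary.

library(sparseR)

# defaults: k = 1 (pairwise interactions), poly = 2, lasso penalty
fit <- sparseR(Sepal.Length ~ ., data = iris)
fit
plot(fit)

# customized preprocessing: scale, then center numeric covariates at medians
fit2 <- sparseR(mpg ~ ., data = mtcars, k = 1, poly = 2,
                pre_proc_opts = c("scale", "center"),
                extra_opts = list(center_fn = median))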


Preprocess & create a model matrix with interactions + polynomials

Description

Preprocess & create a model matrix with interactions + polynomials

Usage

sparseR_prep(
  formula,
  data,
  k = 1,
  poly = 1,
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  ia_formula = NULL,
  filter = c("nzv", "zv"),
  extra_opts = list(),
  family = "gaussian"
)

Arguments

formula

A formula of the main effects + outcome of the model

data

A required data frame or tibble containing the variables in formula

k

The maximum order of interactions to consider among numeric variables

poly

the maximum order of polynomials to consider

pre_proc_opts

A character vector specifying methods for preprocessing (see details)

ia_formula

formula to be passed to step_interact (for interactions, see details)

filter

which methods should be used to filter out variables with (near) zero variance? (see details)

extra_opts

extra options to be used for preprocessing

family

family passed from sparseR

Details

The pre_proc_opts argument acts as a wrapper for the corresponding procedures in the recipes package. The currently supported options that can be passed to pre_proc_opts are:

  • knnImpute: should k-nearest-neighbors imputation be performed (if necessary)?

  • scale: should variables be scaled prior to creating interactions? (does not scale factor variables or dummy variables)

  • center: should variables be centered? (will not center factor variables or dummy variables)

  • otherbin: should factors with low prevalence be combined?

ia_formula will by default interact all variables with each other up to order k. If specified, ia_formula will be passed as the terms argument to recipes::step_interact, so the help documentation for that function can be investigated for further assistance in specifying specific interactions.

The methods specified in filter are important; filtering is necessary to cut down on extraneous polynomials and interactions (in cases where they really don't make sense). This is true, for instance, when using dummy variables in polynomials, or when using interactions of dummy variables that relate to the same categorical variable.

Value

an object of class recipe; see recipes::recipe()
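
Examples

A hedged sketch (not from the package documentation): building the preprocessing recipe by hand. Whether the returned recipe still requires recipes::prep() before baking is an assumption, so that step is left commented.

library(sparseR)

rec <- sparseR_prep(Sepal.Length ~ ., data = iris, k = 1, poly = 2)
rec   # prints the sequence of preprocessing steps

# Assumed workflow for obtaining the model matrix:
# X <- recipes::bake(recipes::prep(rec, training = iris), new_data = iris)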


Bootstrap procedure for stepwise regression

Description

Runs a bootstrap of the model selection procedure using RBIC to obtain bootstrapped standard errors (smoothed; see Efron 2014) as well as the selection percentage for each candidate variable. (experimental)

Usage

sparseRBIC_bootstrap(srbic_fit, B = 100, quiet = FALSE)

Arguments

srbic_fit

An object fitted by sparseRBIC_step

B

Number of bootstrap samples

quiet

Should the display of a progress bar be silenced?

Value

a list containing:

results

a tibble containing coefficients, p-values, selection pct

bootstraps

a tibble of bootstrapped coefficients
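
Examples

A hedged sketch (not from the package documentation); B is kept small only so the example runs quickly.

library(sparseR)

srbic <- sparseRBIC_step(Sepal.Length ~ ., data = iris, k = 1, message = FALSE)
boot  <- sparseRBIC_bootstrap(srbic, B = 25, quiet = TRUE)
boot$results      # coefficients, p-values, and selection percentages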


Sample split procedure for stepwise regression

Description

Repeatedly runs the RBIC-based model selection procedure on random sample splits to achieve valid post-selection inferential results

Usage

sparseRBIC_sampsplit(srbic_fit, S = 100, quiet = FALSE)

Arguments

srbic_fit

An object fitted by sparseRBIC_step

S

Number of splitting iterations

quiet

Should the display of a progress bar be silenced?

Value

a list containing:

results

a tibble containing coefficients, p-values, selection pct

splits

a tibble of different split-based coefficients
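
Examples

A hedged sketch (not from the package documentation); S is kept small only so the example runs quickly.

library(sparseR)

srbic <- sparseRBIC_step(Sepal.Length ~ ., data = iris, k = 1, message = FALSE)
ss <- sparseRBIC_sampsplit(srbic, S = 25, quiet = TRUE)
ss$results      # coefficients, p-values, and selection percentages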


Fit a ranked-sparsity model with forward stepwise RBIC (experimental)

Description

Fit a ranked-sparsity model with forward stepwise RBIC (experimental)

Usage

sparseRBIC_step(
  formula,
  data,
  family = c("gaussian", "binomial", "poisson"),
  k = 1,
  poly = 1,
  ic = c("RBIC", "RAIC", "BIC", "AIC", "EBIC"),
  hier = c("strong", "weak", "none"),
  sequential = (hier[1] != "none"),
  cumulative_k = FALSE,
  cumulative_poly = TRUE,
  pool = FALSE,
  ia_formula = NULL,
  pre_process = TRUE,
  model_matrix = NULL,
  y = NULL,
  poly_prefix = "_poly_",
  int_sep = "\\:",
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  filter = c("nzv", "zv"),
  extra_opts = list(),
  trace = 0,
  message = TRUE,
  ...
)

Arguments

formula

Model formula specifying the outcome and the main-effect terms

data

A data frame containing the variables in the formula

family

The family of the model

k

The maximum order of interactions to consider

poly

The maximum order of polynomials to consider

ic

The information criterion to use

hier

Should hierarchy be enforced (weak or strong)? Must be set with sequential == TRUE (see details)

sequential

Should the main effects be considered first, with higher orders sequentially added/considered?

cumulative_k

Should penalties be increased cumulatively as order interaction increases?

cumulative_poly

Should penalties be increased cumulatively as order polynomial increases?

pool

Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?

ia_formula

formula to be passed to step_interact via terms argument

pre_process

Should the data be preprocessed (if FALSE, must provide model_matrix)

model_matrix

A data frame or matrix specifying the full model matrix (used if !pre_process)

y

A vector of responses (used if !pre_process)

poly_prefix

If model_matrix is specified, what is the prefix for polynomial terms?

int_sep

If model_matrix is specified, what is the separator for interaction terms?

pre_proc_opts

List of preprocessing steps (see details)

filter

The type of filter applied to main effects + interactions

extra_opts

A list of options for all preprocess steps (see details)

trace

Should intermediate results of the model selection process be output?

message

should the message noting that this function is experimental be suppressed?

...

additional arguments for running stepwise selection

Details

This function mirrors sparseR but uses stepwise selection guided by RBIC.

Additionally, setting cumulative_poly or cumulative_k to TRUE increases the penalty cumulatively based on the order of either polynomial or interaction.

The hier hierarchy enforcement will only work if sequential == TRUE, and notably will only consider the "first gen" hierarchy, that is, that all main effects which make up an interaction are already in the model. It is therefore possible for a third order interaction (x1:x2:x3) to enter a model without x1:x2 or x2:x3, so long as x1, x2, and x3 are all in the model.

The options that can be passed to pre_proc_opts are:

  • knnImpute (should missing data be imputed?)

  • scale (should data be standardized?)

  • center (should data be centered to the mean or another value?)

  • otherbin (should factors with low prevalence be combined?)

  • none (should no preprocessing be done? can also specify a null object)

The options that can be passed to extra_opts are:

  • centers (named numeric vector which denotes where each covariate should be centered)

  • center_fn (alternatively, a function can be specified to calculate center such as min or median)

  • freq_cut, unique_cut (see ?step_nzv - these get used by the filtering steps)

  • neighbors (the number of neighbors for knnImpute)

  • one_hot (see ?step_dummy); this defaults to cell-means coding, which is permissible in regularized regression (change at your own risk)

  • raw (should raw rather than orthogonal polynomials be used? Defaults to TRUE because, by default, variables have already been centered and scaled by this point)

Value

an object of class sparseRBIC containing the following:

fit

the final fit object

srprep

a recipes object used to prep the data

pen_info

coefficient-level variable counts, types + names

data

the (unprocessed) data

family

the family argument (for non-normal models, e.g. poisson)

info

a list containing meta-info about the procedure

stats

the IC for each fit and respective terms included
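
Examples

A hedged sketch (not from the package documentation): forward stepwise selection under RBIC with the default (strong) hierarchy enforcement.

library(sparseR)

srbic <- sparseRBIC_step(Sepal.Length ~ ., data = iris, k = 1, poly = 2,
                         ic = "RBIC", message = FALSE)
srbic$stats    # the information criterion at each step and the terms included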


Centering numeric data to a value besides their mean

Description

'step_center_to' generalizes 'step_center' to allow for a different function than the 'mean' function to calculate centers. It creates a *specification* of a recipe step that will normalize numeric data to have a 'center' of zero.

Usage

step_center_to(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  centers = NULL,
  center_fn = mean,
  na_rm = TRUE,
  skip = FALSE,
  id = rand_id("center_to")
)

## S3 method for class 'step_center_to'
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose which variables are affected by the step. See [selections()] for more details. For the 'tidy' method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

centers

A named numeric vector of centers. This is 'NULL' until computed by [prep.recipe()], or it can be specified directly as a named numeric vector.

center_fn

a function to be used to calculate where the center should be

na_rm

A logical value indicating whether 'NA' values should be removed during computations.

skip

A logical. Should the step be skipped when the recipe is baked by [bake.recipe()]? While all operations are baked when [prep.recipe()] is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using 'skip = TRUE' as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A 'step_center_to' object.

Details

Centering data means that a variable's center (by default its mean, or the value computed by 'center_fn') is subtracted from the data. 'step_center_to' estimates the variable centers from the data used in the 'training' argument of 'prep.recipe'. 'bake.recipe' then applies the centering to new data sets using these centers.

Value

An updated version of 'recipe' with the new step added to the sequence of existing steps (if any). For the 'tidy' method, a tibble with columns 'terms' (the selectors or variables selected) and 'value' (the centers).

See Also

[recipe()] [prep.recipe()] [bake.recipe()]

Examples

library(sparseR)
library(recipes)

data(biomass, package = "modeldata")

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipes::recipe(
 HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
 data = biomass_tr)

center_trans <- rec %>%
  step_center_to(carbon, contains("gen"), -hydrogen)

center_obj <- recipes::prep(center_trans, training = biomass_tr)

transformed_te <- recipes::bake(center_obj, biomass_te)

biomass_te[1:10, names(transformed_te)]
transformed_te

recipes::tidy(center_trans)
recipes::tidy(center_obj)

Summary of sparseR model coefficients

Description

Summary of sparseR model coefficients

Usage

## S3 method for class 'sparseR'
summary(object, lambda, at = c("cvmin", "cv1se"), ...)

Arguments

object

a sparseR object

lambda

a particular value of lambda to predict with

at

which "smart" choice of lambda to use when 'lambda' is not specified: "cvmin" or "cv1se"

...

additional arguments to be passed to summary.ncvreg

Value

an object of class 'summary.ncvreg' at the specified (or "smart") value of lambda.
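
Examples

A hedged sketch (not from the package documentation): summaries at both "smart" choices of lambda.

library(sparseR)

fit <- sparseR(Sepal.Length ~ ., data = iris)
summary(fit, at = "cvmin")
summary(fit, at = "cv1se")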