Package 'BranchGLM'

Title: Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Description: Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Authors: Jacob Seedorff [aut, cre]
Maintainer: Jacob Seedorff <[email protected]>
License: Apache License (>= 2)
Version: 3.0.1
Built: 2024-12-28 06:39:50 UTC
Source: CRAN

Help Index


Bar Plot Method for BranchGLMVI Objects

Description

Creates a bar plot with the L0-penalization based variable importance values.

Usage

## S3 method for class 'BranchGLMVI'
barplot(
  height,
  modified = FALSE,
  horiz = TRUE,
  decreasing = FALSE,
  which = "all",
  las = ifelse(horiz, 1, 2),
  main = NULL,
  lab = NULL,
  ...
)

Arguments

height

a BranchGLMVI object.

modified

a logical value indicating if the modified variable importance values should be plotted.

horiz

a logical value to indicate whether bars should be horizontal.

decreasing

a logical value to indicate whether variables should be sorted in decreasing order. Can use NA if no ordering is desired.

which

which variable importance values to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables.

las

the style of axis labels, see par for more details.

main

the overall title for the plot.

lab

the title for the axis corresponding to the variable importance values.

...

further arguments passed to barplot.default.

Value

This only produces a plot, nothing is returned.

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI

# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)
par(mar = oldmar)

Box Plot Method for BranchGLMVI.boot Objects

Description

Creates box-and-whisker plots of approximate null distributions for the modified variable importance values.

Usage

## S3 method for class 'BranchGLMVI.boot'
boxplot(
  x,
  which = "all",
  linecol = "red",
  linelwd = 2,
  horizontal = TRUE,
  lim = NULL,
  show.names = TRUE,
  lab = "Modified Variable Importance",
  main = NULL,
  las = ifelse(horizontal, 1, 2),
  ...
)

Arguments

x

a BranchGLMVI.boot object.

which

which approximate null distributions to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables. The default is to create box-and-whisker plots for each set of variables that are not kept in each model.

linecol

the color of the line which indicates the observed modified variable importance values.

linelwd

the width of the line which indicates the observed modified variable importance values.

horizontal

a logical value indicating if the boxplots should be horizontal.

lim

a numeric vector of length 2, giving the coordinates range.

show.names

set to TRUE or FALSE to override the defaults on whether an axis label is printed for each group.

lab

a label for the axis corresponding to the modified variable importance values.

main

a main title for the plot.

las

the style of axis labels, see more at par.

...

further arguments passed to boxplot.default.

Value

This only produces a plot, nothing is returned.

See Also

hist.BranchGLMVI.boot

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress =  FALSE)

# Plotting boxplots of selected sets of variables
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)
par(mar = oldmar)

# Plotting boxplots of selected sets of variables
boxplot(myBoot, las = 1, cex.axis = 0.55)

Fits GLMs

Description

Fits generalized linear models (GLMs) via RcppArmadillo with the ability to perform some computation in parallel with OpenMP.

Usage

BranchGLM(
  formula,
  data = NULL,
  family,
  link,
  offset = NULL,
  method = "Fisher",
  grads = 10,
  parallel = FALSE,
  nthreads = 8,
  tol = 1e-06,
  maxit = NULL,
  init = NULL,
  fit = TRUE,
  contrasts = NULL,
  keepData = TRUE,
  keepY = TRUE
)

BranchGLM.fit(
  x,
  y,
  family,
  link,
  offset = NULL,
  method = "Fisher",
  grads = 10,
  parallel = FALSE,
  nthreads = 8,
  init = NULL,
  maxit = NULL,
  tol = 1e-06
)

Arguments

formula

a formula for the model.

data

an optional data.frame, list or environment (or object coercible by as.data.frame to a data.frame), containing the variables in formula. Neither a matrix nor an array will be accepted. If not found in data, the variables are taken from environment(formula), typically the environment from which BranchGLM is called.

family

the distribution used to model the data, one of "gaussian", "gamma", "binomial", or "poisson". A family object may also be supplied for one of the accepted distributions.

link

the link used to link the mean structure to the linear predictors. One of "identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log". The accepted links depend on the specified family, see more in details. This only needs to be supplied if a string is supplied for family. If a family object is supplied for the family argument, then the link function is taken from that family object.

offset

the offset vector, by default the zero vector is used.

method

one of "Fisher", "BFGS", or "LBFGS". BFGS and L-BFGS are quasi-newton methods which are typically faster than Fisher's scoring when there are many covariates (at least 50).

grads

a positive integer to denote the number of gradients used to approximate the inverse information with, only for method = "LBFGS".

parallel

a logical value to indicate if parallelization should be used.

nthreads

a positive integer to denote the number of threads used with OpenMP, only used if parallel = TRUE.

tol

a positive number to denote the tolerance used to determine model convergence.

maxit

a positive integer to denote the maximum number of iterations performed. The default for Fisher's scoring is 50 and for the other methods the default is 200.

init

a numeric vector of initial values for the betas, if not specified then they are automatically selected via linear regression with the transformation specified by the link function. This is ignored for linear regression models.

fit

a logical value to indicate whether to fit the model or not.

contrasts

see contrasts.arg of model.matrix.default.

keepData

a logical value to indicate whether or not to store a copy of data and the design matrix, the default is TRUE. If this is FALSE, then the results from this cannot be used inside of VariableSelection.

keepY

a logical value to indicate whether or not to store a copy of y, the default is TRUE. If this is FALSE, then the binomial GLM helper functions may not work and this cannot be used inside of VariableSelection.

x

design matrix used for the fit, must be numeric.

y

outcome vector, must be numeric.

Details

Fitting

Can use BFGS, L-BFGS, or Fisher's scoring to fit the GLM. BFGS and L-BFGS are typically faster than Fisher's scoring when there are at least 50 covariates and Fisher's scoring is typically best when there are fewer than 50 covariates. This function does not currently support the use of weights. In the special case of gaussian regression with identity link the method argument is ignored and the normal equations are solved directly.

The models are fit in C++ by using Rcpp and RcppArmadillo. In order to help convergence, each of the methods makes use of a backtracking line-search using the strong Wolfe conditions to find an adequate step size. There are three conditions used to determine convergence, the first is whether there is a sufficient decrease in the negative log-likelihood, the second is whether the l2-norm of the score is sufficiently small, and the last condition is whether the change in each of the beta coefficients is sufficiently small. The tol argument controls all of these criteria. If the algorithm fails to converge, then iterations will be -1.

All observations with any missing values are removed before model fitting.

BranchGLM.fit can be faster than calling BranchGLM if the x matrix and y vector are already available, but doesn't return as much information. The object returned by BranchGLM.fit is not of class BranchGLM, so all of the methods for BranchGLM objects such as predict or VariableSelection cannot be used.

Dispersion Parameter

The dispersion parameter for binomial and Poisson regression is always fixed to be 1. For gaussian and gamma regression, the MLE of the dispersion parameter is used for the calculation of the log-likelihood and the Pearson estimator of the dispersion parameter is used for the calculation of standard errors for the coefficient estimates.

Families and Links

The binomial family accepts "cloglog", "log", "logit", and "probit" as possible link functions. The gamma and gaussian families accept "identity", "inverse", "log", and "sqrt" as possible link functions. The Poisson family accepts "identity", "log", and "sqrt" as possible link functions.

Value

BranchGLM returns a BranchGLM object which is a list with the following components

coefficients

a data.frame with the coefficient estimates, SEs, Wald test statistics, and p-values

iterations

number of iterations it took the algorithm to converge, if the algorithm failed to converge then this is -1

dispersion

a vector of length 2 with the MLE of the dispersion parameter first and the Pearson estimator of the dispersion parameter second

logLik

the log-likelihood of the fitted model

vcov

the variance-covariance matrix of the fitted model

resDev

the residual deviance of the fitted model

AIC

the AIC of the fitted model

preds

predictions from the fitted model

linpreds

linear predictors from the fitted model

residuals

a numeric vector with the Pearson residuals

variance

a numeric vector with the variance evaluated at the final coefficient estimates

tol

tolerance used to fit the model

maxit

maximum number of iterations used to fit the model

formula

formula used to fit the model

method

iterative method used to fit the model

grads

number of gradients used to approximate inverse information for L-BFGS

y

y vector used in the model, not included if keepY = FALSE

x

design matrix used to fit the model, not included if keepData = FALSE

rownames

rownames taken from the design matrix

offset

offset vector in the model, not included if keepData = FALSE

fulloffset

supplied offset vector, not included if keepData = FALSE

data

original data argument supplied to the function, not included if keepData = FALSE

mf

the model frame, not included if keepData = FALSE

numobs

number of observations in the design matrix

names

names of the predictor variables

yname

name of y variable

parallel

whether parallelization was employed to speed up model fitting process

missing

number of missing values removed from the original dataset

link

link function used to model the data

family

family used to model the data

ylevel

the levels of y, only included for binomial glms

xlev

the levels of the factors in the dataset

terms

the terms object used

BranchGLM.fit returns a list with the following components

coefficients

a data.frame with the coefficients estimates, SEs, Wald test statistics, and p-values

iterations

number of iterations it took the algorithm to converge, if the algorithm failed to converge then this is -1

dispersion

a vector of length 2 with the MLE of the dispersion parameter first and the Pearson estimator of the dispersion parameter second

logLik

the log-likelihood of the fitted model

vcov

the variance-covariance matrix of the fitted model

resDev

the residual deviance of the fitted model

AIC

the AIC of the fitted model

preds

predictions from the fitted model

linpreds

linear predictors from the fitted model

residuals

a numeric vector with the Pearson residuals

variance

a numeric vector with the variance evaluated at the final coefficient estimates

tol

tolerance used to fit the model

maxit

maximum number of iterations used to fit the model

References

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.

See Also

predict.BranchGLM, coef.BranchGLM, VariableSelection, confint.BranchGLM, logLik.BranchGLM

Examples

Data <- iris

# Linear regression
## Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

## Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gaussian", link = "identity")

# Gamma regression
## Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log")

### init
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log", 
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL)

### method
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log", 
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL, method = "LBFGS")

### offset
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log", 
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL, 
offset = Data$Sepal.Width)

## Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gamma", link = "log", init = rep(0, 6), 
maxit = 50, tol = 1e-6, offset = Data$Sepal.Width)

Cindex/AUC

Description

Calculates the c-index/AUC.

Usage

Cindex(object, ...)

AUC(object, ...)

## S3 method for class 'numeric'
Cindex(object, y, ...)

## S3 method for class 'BranchGLM'
Cindex(object, ...)

## S3 method for class 'BranchGLMROC'
Cindex(object, ...)

Arguments

object

a BranchGLM object, a BranchGLMROC object, or a numeric vector.

...

further arguments passed to other methods.

y

Observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector.

Details

Uses trapezoidal rule to calculate AUC when given a BranchGLMROC object and uses Mann-Whitney U to calculate it otherwise. The trapezoidal rule method is less accurate, so the two methods may give different results.

Value

A number corresponding to the c-index/AUC.

Examples

Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
Cindex(Fit)
AUC(Fit)

Extract Coefficients from BranchGLM Objects

Description

Extracts beta coefficients from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
coef(object, type = "estimates", ...)

Arguments

object

a BranchGLM object.

type

"estimates" to only return the coefficient estimates or "all" to return the estimates along with SEs, test statistics, and p-values.

...

further arguments passed to or from other methods.

Value

A named vector with the corresponding coefficient estimates or a data.frame with the coefficient estimates along with SEs, test statistics, and p-values.

Examples

Data <- iris

# Linear regression model
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
Fit

# Getting only coefficient estimates
coef(Fit, type = "estimates")

# Getting coefficient estimates along with SEs, tests, and p-values
coef(Fit, type = "all")

Extract Coefficients from BranchGLMVS or summary.BranchGLMVS Objects

Description

Extracts beta coefficients from BranchGLMVS or summary.BranchGLMVS objects.

Usage

## S3 method for class 'BranchGLMVS'
coef(object, which = 1, ...)

## S3 method for class 'summary.BranchGLMVS'
coef(object, which = 1, ...)

Arguments

object

a BranchGLMVS or summary.BranchGLMVS object.

which

a numeric vector of indices or "all" to indicate which models to get coefficients from, the default is 1 which is used for the best model. For the branch and bound algorithms the number k is used for the kth best model and for the stepwise algorithms the number k is used for the model that is k - 1 steps away from the final model.

...

ignored.

Value

A numeric matrix with the corresponding coefficient estimates.

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, 
family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
bestmodels = 10, showprogress = FALSE)

## Getting coefficients from best model
coef(VS, which = 1)

## Getting coefficients from all best models
coef(VS, which = "all")

Likelihood Ratio Confidence Intervals for Beta Coefficients for BranchGLM Objects

Description

Finds profile likelihood ratio confidence intervals for beta coefficients with the ability to calculate the intervals in parallel.

Usage

## S3 method for class 'BranchGLM'
confint(object, parm, level = 0.95, parallel = FALSE, nthreads = 8, ...)

Arguments

object

a BranchGLM object.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

parallel

a logical value to indicate if parallelization should be used.

nthreads

a positive integer to denote the number of threads used with OpenMP, only used if parallel = TRUE.

...

further arguments passed from other methods.

Details

Endpoints of the confidence intervals that couldn't be found by the algorithm are filled in with NA. When there is a lot of multicollinearity in the data the algorithm may have problems finding many of the intervals.

Value

An object of class BranchGLMCIs which is a list with the following components.

CIs

a numeric matrix with the confidence intervals

level

the supplied level

MLE

a numeric vector of the MLEs of the coefficients

See Also

plot.BranchGLMCIs, plotCI

Examples

Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

### Getting confidence intervals
CIs <- confint(mymodel, level = 0.95)
CIs

### Plotting CIs
plot(CIs, mary = 7, cex.y = 0.9)

Extract the Deviance

Description

Extracts the deviance from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
deviance(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A single number denoting the deviance.


Extract AIC from BranchGLM Objects

Description

Computes the (generalized) Akaike An Information Criterion for BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
extractAIC(fit, scale, k = 2, ...)

Arguments

fit

a BranchGLM object.

scale

ignored.

k

a non-negative number specifying the ‘weight’ of the equivalent degrees of freedom (= edf) part in the AIC formula.

...

further arguments passed to or from other methods.

Value

A numeric vector of length 2, with first and second elements giving

edf

the ‘equivalent degrees of freedom’ for fit

AIC

the (generalized) Akaike Information Criterion for fit

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
extractAIC(Fit)

Extract Family from BranchGLM Objects

Description

Extracts family from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
family(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

An object of class family.

Note

Only the family and link components of family objects are used by the BranchGLM package. The other components of the family object may not be the same as what is used in the fitting process for BranchGLM.


Extract Model Formula from BranchGLM Objects

Description

Extracts model formula from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
formula(x, ...)

Arguments

x

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A formula representing the model used to obtain x.


Histogram Method for BranchGLMVI.boot Objects

Description

Creates histograms of approximate null distributions for the modified variable importance values.

Usage

## S3 method for class 'BranchGLMVI.boot'
hist(
  x,
  which = "all",
  linecol = "red",
  linelwd = 2,
  xlim = NULL,
  xlab = "Modified Variable Importance",
  main = NULL,
  ...
)

Arguments

x

a BranchGLMVI.boot object.

which

which approximate null distributions to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables. The default is to create histograms for each set of variables that are not kept in each model.

linecol

the color of the line which indicates the observed modified variable importance values.

linelwd

the width of the line which indicates the observed modified variable importance values.

xlim

a numeric vector of length 2, giving the x coordinates range.

xlab

a label for the x axis.

main

a main title for the plot.

...

further arguments passed to hist.default.

Value

This only produces a plot, nothing is returned.

See Also

boxplot.BranchGLMVI.boot

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress = FALSE)

# Plotting histograms of second set of variables
hist(myBoot, which = 2)

# Plotting histograms of third set of variables
hist(myBoot, which = 3, linecol = "blue", linelwd = 5)

Extract Log-Likelihood from BranchGLM Objects

Description

Extracts log-likelihood from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
logLik(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

An object of class logLik which is a number corresponding to the log-likelihood with the following attributes: "df" (degrees of freedom) and "nobs" (number of observations).


Extract Model Frame from a BranchGLM Object

Description

Extracts model frame from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
model.frame(formula, ...)

Arguments

formula

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A data.frame used for the fitted model.


Plotting Multiple ROC Curves

Description

Plotting Multiple ROC Curves

Usage

MultipleROCCurves(
  ...,
  legendpos = "bottomright",
  title = "ROC Curves",
  colors = NULL,
  names = NULL,
  lty = 1,
  lwd = 1
)

Arguments

...

any number of BranchGLMROC objects.

legendpos

a keyword to describe where to place the legend, such as "bottomright". The default is "bottomright"

title

title for the plot.

colors

vector of colors to be used on the ROC curves.

names

vector of names used to create a legend for the ROC curves.

lty

vector of linetypes used to create the ROC curves or a single linetype to be used for all ROC curves.

lwd

vector of linewidths used to create the ROC curves or a single linewidth to be used for all ROC curves.

Value

This only produces a plot, nothing is returned.

Examples

Data <- ToothGrowth

### Logistic ROC
LogisticFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
LogisticROC <- ROC(LogisticFit)

### Probit ROC
ProbitFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "probit")
ProbitROC <- ROC(ProbitFit)

### Cloglog ROC
CloglogFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "cloglog")
CloglogROC <- ROC(CloglogFit)

### Plotting ROC curves

MultipleROCCurves(LogisticROC, ProbitROC, CloglogROC, 
                  names = c("Logistic ROC", "Probit ROC", "Cloglog ROC"))

Extract Number of Observations from BranchGLM Objects

Description

Extracts number of observations from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
nobs(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A single number indicating the number of observations used to fit the model.


Plot Method for BranchGLM Objects

Description

Creates a plot to help visualize fitted values from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
plot(x, ...)

Arguments

x

a BranchGLM object.

...

further arguments passed to plot.default.

Value

This only produces a plot, nothing is returned.

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
plot(Fit)

Plot Method for BranchGLMCIs Objects

Description

Creates a plot to visualize confidence intervals from BranchGLMCIs objects.

Usage

## S3 method for class 'BranchGLMCIs'
plot(x, which = "all", mary = 5, ...)

Arguments

x

a BranchGLMCIs object.

which

which intervals to plot, can use a numeric vector of indices, a character vector of names of desired variables, or "all" to plot all intervals.

mary

a numeric value used to determine how large to make margin of y-axis. If variable names are cut-off, consider increasing this from the default value of 5.

...

further arguments passed to plotCI.

Value

This only produces a plot, nothing is returned.

See Also

plotCI

Examples

Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

### Getting confidence intervals
CIs <- confint(mymodel, level = 0.95)
CIs

### Plotting CIs
plot(CIs, mary = 7, cex.y = 0.9)

Plot Method for BranchGLMROC Objects

Description

This plots a ROC curve.

Usage

## S3 method for class 'BranchGLMROC'
plot(x, xlab = "1 - Specificity", ylab = "Sensitivity", type = "l", ...)

Arguments

x

a BranchGLMROC object.

xlab

label for the x-axis.

ylab

label for the y-axis.

type

what type of plot to draw, see more details at plot.default.

...

further arguments passed to plot.default.

Value

This only produces a plot, nothing is returned.

Examples

Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
MyROC <- ROC(Fit)
plot(MyROC)

Plot Method for summary.BranchGLMVS and BranchGLMVS Objects

Description

Creates plots to help visualize variable selection results from BranchGLMVS or summary.BranchGLMVS objects.

Usage

## S3 method for class 'BranchGLMVS'
plot(
  x,
  ptype = "both",
  marnames = 7,
  addLines = TRUE,
  type = "b",
  horiz = FALSE,
  cex.names = 1,
  cex.lab = 1,
  cex.axis = 1,
  cex.legend = 1,
  cols = c("deepskyblue", "indianred", "forestgreen"),
  ...
)

## S3 method for class 'summary.BranchGLMVS'
plot(
  x,
  ptype = "both",
  marnames = 7,
  addLines = TRUE,
  type = "b",
  horiz = FALSE,
  cex.names = 1,
  cex.lab = 1,
  cex.axis = 1,
  cex.legend = 1,
  cols = c("deepskyblue", "indianred", "forestgreen"),
  ...
)

Arguments

x

a summary.BranchGLMVS or BranchGLMVS object.

ptype

the type of plot to produce, look at details for more explanation.

marnames

a numeric value used to determine how large to make margin of axis with variable names, this is only for the "variables" plot. If variable names are cut-off, consider increasing this from the default value of 7.

addLines

a logical value to indicate whether or not to add black lines to separate the models for the "variables" plot. This is typically useful for smaller amounts of models, but can be annoying if there are many models.

type

what type of plot to draw for the "metrics" plot, see more details at plot.default.

horiz

a logical value to indicate whether models should be displayed horizontally for the "variables" plot.

cex.names

how big to make variable names in the "variables" plot.

cex.lab

how big to make axis labels.

cex.axis

how big to make axis annotation.

cex.legend

how big to make legend labels.

cols

the colors used to create the "variables" plot. Should be a character vector of length 3, the first color will be used for included variables, the second color will be used for excluded variables, and the third color will be used for kept variables.

...

further arguments passed to plot.default for the "metrics" plot and image.default for the "variables" plot.

Details

The different values for ptype are as follows

  • "metrics" for a plot that displays the metric values ordered by rank for the branch and bound algorithms or a plot which displays the metric values in the path taken by the stepwise algorithms

  • "variables" for a plot that displays which variables are in each of the top models for the branch and bound algorithms or a plot which displays the path taken by the stepwise algorithms

  • "both" for both plots

If there are so many models that the "variables" plot appears to be entirely black, then set addLines to FALSE.

Value

This only produces plots, nothing is returned.

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", bestmodels = 10, 
showprogress = FALSE)
VS

## Getting summary of the process
Summ <- summary(VS)
Summ

## Plotting the BIC of best models
plot(Summ, type = "b", ptype = "metrics")

## Plotting the BIC of the best models 
plot(Summ, ptype = "variables")

### Alternative colors
plot(Summ, ptype = "variables", 
cols = c("yellowgreen", "purple1", "grey50"))

### Smaller text size for names
plot(Summ, ptype = "variables", cex.names = 0.75)

Plot Confidence Intervals

Description

Creates a plot to display confidence intervals.

Usage

plotCI(
  CIs,
  points = NULL,
  ylab = "",
  ylas = 2,
  cex.y = 1,
  decreasing = FALSE,
  ...
)

Arguments

CIs

a numeric matrix of confidence intervals, must have exactly 2 columns. The variable names displayed in the plot are taken from the column names.

points

points to be plotted in the middle of the CIs, typically means or medians. The default is to plot the midpoints of the intervals.

ylab

a label for the y-axis.

ylas

the style of the y-axis label, see more about this at las in par.

cex.y

font size used for variable names on y-axis.

decreasing

a logical value indicating if confidence intervals should be displayed in decreasing or increasing order according to points. Can use NA if no ordering is desired.

...

further arguments passed to plot.default.

Value

This only produces a plot, nothing is returned.

Examples

Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

### Getting confidence intervals
CIs <- confint.default(mymodel, level = 0.95)
xlim <- c(min(CIs), max(CIs))

### Plotting CIs
par(mar = c(5, 7, 3, 1) + 0.1)
plotCI(CIs, main = "95% Confidence Intervals", xlim = xlim, cex.y = 0.9, 
xlab = "Beta Coefficients")
abline(v = 0)

Predict Method for BranchGLM Objects

Description

Obtains predictions from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
predict(
  object,
  newdata = NULL,
  offset = NULL,
  type = "response",
  na.action = na.pass,
  ...
)

Arguments

object

a BranchGLM object.

newdata

a data.frame, if not specified then the data the model was fit on is used.

offset

a numeric vector containing the offset variable, this is ignored if newdata is not supplied.

type

one of "linpreds" which is on the scale of the linear predictors or "response" which is on the scale of the response variable. If not specified, then "response" is used.

na.action

a function which indicates what should happen when the data contains NAs. The default is na.pass. This is ignored if newdata is not supplied and data isn't included in the supplied BranchGLM object.

...

further arguments passed to or from other methods.

Value

A numeric vector of predictions.

Examples

Data <- airquality

# Example without offset
Fit <- BranchGLM(Temp ~ ., data = Data, family = "gaussian", link = "identity")

## Using default na.action
predict(Fit)

## Using na.omit
predict(Fit, na.action = na.omit)

## Using new data
predict(Fit, newdata = Data[1:20, ], na.action = na.pass)

# Using offset
FitOffset <- BranchGLM(Temp ~ . - Day, data = Data, family = "gaussian", 
link = "identity", offset = Data$Day * -0.1)

## Getting predictions for data used to fit model
### Don't need to supply offset vector
predict(FitOffset)

## Getting predictions for new dataset
### Need to include new offset vector since we are 
### getting predictions for new dataset
predict(FitOffset, newdata = Data[1:20, ], offset = Data$Day[1:20] * -0.1)

Predict Method for BranchGLMVS or summary.BranchGLMVS Objects

Description

Obtains predictions from BranchGLMVS or summary.BranchGLMVS objects.

Usage

## S3 method for class 'BranchGLMVS'
predict(object, which = 1, ...)

## S3 method for class 'summary.BranchGLMVS'
predict(object, which = 1, ...)

Arguments

object

a BranchGLMVS or summary.BranchGLMVS object.

which

a positive integer to indicate which model to get predictions from, the default is 1 which is used for the best model. For the branch and bound algorithms the number k is used for the kth best model and for the stepwise algorithms the number k is used for the model that is k - 1 steps away from the final model.

...

further arguments passed to predict.BranchGLM.

Value

A numeric vector of predictions.

See Also

predict.BranchGLM

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, 
family = "gamma", link = "log")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
bestmodels = 10, showprogress = FALSE)

## Getting predictions from best model
predict(VS, which = 1)

## Getting linear predictors from 5th best model
predict(VS, which = 5, type = "linpreds")

Print Method for BranchGLM Objects

Description

Print method for BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
print(x, coefdigits = 4, digits = 4, ...)

Arguments

x

a BranchGLM object.

coefdigits

number of digits to display for coefficients table.

digits

number of significant digits to display for information after table.

...

further arguments passed to or from other methods.

Value

The supplied BranchGLM object.


Print Method for BranchGLMCIs Objects

Description

Print method for BranchGLMCIs objects.

Usage

## S3 method for class 'BranchGLMCIs'
print(x, digits = 4, ...)

Arguments

x

a BranchGLMCIs object.

digits

number of significant digits to display.

...

further arguments passed from other methods.

Value

The supplied BranchGLMCIs object.


Print Method for BranchGLMROC Objects

Description

Print method for BranchGLMROC objects.

Usage

## S3 method for class 'BranchGLMROC'
print(x, ...)

Arguments

x

a BranchGLMROC object.

...

further arguments passed to other methods.

Value

The supplied BranchGLMROC object.


Print Method for BranchGLMTable Objects

Description

Print method for BranchGLMTable objects.

Usage

## S3 method for class 'BranchGLMTable'
print(x, digits = 4, ...)

Arguments

x

a BranchGLMTable object.

digits

number of digits to display.

...

further arguments passed to other methods.

Value

The supplied BranchGLMTable object.


Print Method for BranchGLMVI Objects

Description

Print method for BranchGLMVI objects.

Usage

## S3 method for class 'BranchGLMVI'
print(x, digits = 3, ...)

Arguments

x

a BranchGLMVI object.

digits

number of significant digits to display.

...

further arguments passed to other methods.

Value

The supplied BranchGLMVI object.


Print Method for BranchGLMVI.boot Objects

Description

Print method for BranchGLMVI.boot objects.

Usage

## S3 method for class 'BranchGLMVI.boot'
print(x, digits = 3, ...)

Arguments

x

a BranchGLMVI.boot object.

digits

number of significant digits to display.

...

further arguments passed to other methods.

Value

The supplied BranchGLMVI.boot object.


Print Method for BranchGLMVS Objects

Description

Print method for BranchGLMVS objects.

Usage

## S3 method for class 'BranchGLMVS'
print(x, digits = 2, ...)

Arguments

x

a BranchGLMVS object.

digits

number of digits to display.

...

further arguments passed to other methods.

Value

The supplied BranchGLMVS object.


Print Method for summary.BranchGLMVS Objects

Description

Print method for summary.BranchGLMVS objects.

Usage

## S3 method for class 'summary.BranchGLMVS'
print(x, digits = 2, ...)

Arguments

x

a summary.BranchGLMVS object.

digits

number of digits to display.

...

further arguments passed to other methods.

Value

The supplied summary.BranchGLMVS object.


Extract the Pearson Residuals from BranchGLM Objects

Description

Extracts the Pearson residuals from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
residuals(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A vector of the Pearson residuals from the supplied BranchGLM object.


ROC Curve

Description

Creates an ROC curve.

Usage

ROC(object, ...)

## S3 method for class 'numeric'
ROC(object, y, ...)

## S3 method for class 'BranchGLM'
ROC(object, ...)

Arguments

object

a BranchGLM object or a numeric vector.

...

further arguments passed to other methods.

y

observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector.

Value

A BranchGLMROC object which can be plotted with plot(). The AUC can also be calculated using AUC().

Examples

Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
MyROC <- ROC(Fit)
plot(MyROC)

Extract Square Root of the Dispersion Parameter Estimates

Description

Extracts the square root of the dispersion parameter estimates from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
sigma(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A numeric vector of length 2 with first and second elements giving

mle

the MLE of the dispersion parameter

pearson

the Pearson estimator of the dispersion parameter

Note

The dispersion parameter for binomial and Poisson regression is always fixed to be 1. The MLE of the dispersion parameter is used in the calculation of the log-likelihood while the Pearson estimator of the dispersion parameter is used to calculate standard errors for the coefficient estimates.

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
sigma(Fit)

Summary Method for BranchGLMVS Objects

Description

Summary method for BranchGLMVS objects.

Usage

## S3 method for class 'BranchGLMVS'
summary(object, ...)

Arguments

object

a BranchGLMVS object.

...

further arguments passed to or from other methods.

Value

An object of class summary.BranchGLMVS which is a list with the following components

results

a data.frame which has the metric values for the best models along with the sets of variables included in each model

VS

the supplied BranchGLMVS object

formulas

a list containing the formulas of the best models

metric

the metric used to perform variable selection

See Also

plot.summary.BranchGLMVS, coef.summary.BranchGLMVS, predict.summary.BranchGLMVS

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
bestmodels = 10, showprogress = FALSE)
VS

## Getting summary of the process
Summ <- summary(VS)
Summ

## Plotting the BIC of the best models
plot(Summ, type = "b")

## Plotting the variables in the best models
plot(Summ, ptype = "variables")

## Getting coefficients
coef(Summ)

Confusion Matrix

Description

Creates a confusion matrix and calculates related measures.

Usage

Table(object, ...)

## S3 method for class 'numeric'
Table(object, y, cutoff = 0.5, ...)

## S3 method for class 'BranchGLM'
Table(object, cutoff = 0.5, ...)

Arguments

object

a BranchGLM object or a numeric vector.

...

further arguments passed to other methods.

y

observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector.

cutoff

cutoff for predicted values, the default is 0.5.

Value

A BranchGLMTable object which is a list with the following components

table

a matrix corresponding to the confusion matrix

accuracy

a number corresponding to the accuracy

sensitivity

a number corresponding to the sensitivity

specificity

a number corresponding to the specificity

PPV

a number corresponding to the positive predictive value

levels

a vector corresponding to the levels of the response variable

Examples

Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
Table(Fit)

Computes Exact or Approximate L0-penalization based Variable Importance for GLMs

Description

Gets exact or approximate L0-penalization based variable importance values for generalized linear models (GLMs). More details about what the variable importance values are can be found in the details section.

Usage

VariableImportance(
  object,
  VIMethod = "simultaneous",
  parallel = FALSE,
  nthreads = 8,
  showprogress = TRUE
)

Arguments

object

an object of class BranchGLMVS.

VIMethod

one of "separate" or "simultaneous" to denote the method used to find the variable importance values. This is ignored if the type of variable selection employed in object was a heuristic method.

parallel

a logical value to indicate if parallelization should be used.

nthreads

number of threads used with OpenMP, only used if parallel = TRUE.

showprogress

a logical value to indicate whether or not to show progress updates.

Details

Note that variable importance values can only be found for sets of variables that are not kept through the model selection process. More details about the variable importance values will be made available in an upcoming paper.

When a branch and bound algorithm is used in object, then the exact variable importance values are computed. When a heuristic method is used, then approximate variable importance values are computed based on the specified heuristic method.

Value

A BranchGLMVI object which is a list with the following components

results

a data.frame with the variable importance values and degrees of freedom

metric

metric used to select the best models

numchecked

number of models fit

VS

the supplied BranchGLMVS object

with

a numeric matrix with the best models that include each set of variables

withmetrics

a numeric vector with the metric values for the best models with each set of variables

without

a numeric matrix with the best models that exclude each set of variables

withoutmetrics

a numeric vector with the metric values for the best models without each set of variables

References

Seedorff J, Cavanaugh JE. Assessing Variable Importance for Best Subset Selection. Entropy. 2024; 26(9):801. doi:10.3390/e26090801

See Also

VariableImportance.boot, barplot.BranchGLMVI

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI

# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)
par(mar = oldmar)

Performs Parametric Bootstrap for Modified Variable Importance

Description

Performs a version of the parametric bootstrap to create an approximate null distribution for the modified variable importance values in order to get approximate p-values.

Usage

VariableImportance.boot(object, ...)

## S3 method for class 'BranchGLMVS'
VariableImportance.boot(
  object,
  nboot = 100,
  parallel = FALSE,
  nthreads = 8,
  showprogress = TRUE,
  ...
)

## S3 method for class 'BranchGLMVI'
VariableImportance.boot(
  object,
  nboot = 100,
  parallel = FALSE,
  nthreads = 8,
  showprogress = TRUE,
  ...
)

Arguments

object

a BranchGLMVS or BranchGLMVI object.

...

further arguments to VariableImportance when object is of class BranchGLMVS.

nboot

the number of bootstrap replications to perform.

parallel

a logical value to indicate if parallelization should be used.

nthreads

number of threads used with OpenMP, only used if parallel = TRUE.

showprogress

a logical value to indicate if a progress bar should be displayed.

Details

This performs a version of the parametric bootstrap with the modified variable importance values to generate approximate p-values for the sets of variables. We are currently working on a paper that describes this function in further detail.

Value

a BranchGLMVI.boot object which is a list with the following components

summary

a data.frame with the observed modified variable importance values and approximate p-values

results

a numeric matrix with the modified variable importance values for each set of bootstrap replications

pvals

a numeric vector with the approximate p-values based on modified variable importance

nboot

the number of bootstrap replications performed

metric

the metric used to calculate the modified variable importance values

VI

the supplied BranchGLMVI object

References

Seedorff J, Cavanaugh JE. Assessing Variable Importance for Best Subset Selection. Entropy. 2024; 26(9):801. doi:10.3390/e26090801

See Also

hist.BranchGLMVI.boot, boxplot.BranchGLMVI.boot, VariableImportance

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
showprogress = FALSE)

# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress = FALSE)
myBoot

# Plotting histogram of results for second set of variables
hist(myBoot, which = 2)

# Plotting boxplots of results
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)
par(mar = oldmar)

Variable Selection for GLMs

Description

Performs forward selection, multiple different variants of backward elimination, and efficient best subset variable selection with information criterion for generalized linear models (GLMs). Best subset selection is performed with branch and bound algorithms to greatly speed up the process and backward elimination can be performed with bounding algorithms to speed it up.

Usage

VariableSelection(object, ...)

## S3 method for class 'formula'
VariableSelection(
  object,
  data = NULL,
  family,
  link,
  offset = NULL,
  method = "Fisher",
  type = "switch branch and bound",
  metric = "AIC",
  bestmodels = NULL,
  cutoff = NULL,
  keep = NULL,
  keepintercept = TRUE,
  maxsize = NULL,
  grads = 10,
  parallel = FALSE,
  nthreads = 8,
  tol = 1e-06,
  maxit = NULL,
  contrasts = NULL,
  showprogress = TRUE,
  ...
)

## S3 method for class 'BranchGLM'
VariableSelection(
  object,
  type = "switch branch and bound",
  metric = "AIC",
  bestmodels = NULL,
  cutoff = NULL,
  keep = NULL,
  keepintercept = TRUE,
  maxsize = NULL,
  parallel = FALSE,
  nthreads = 8,
  showprogress = TRUE,
  ...
)

Arguments

object

a formula or a BranchGLM object.

...

further arguments.

data

a data.frame, list or environment (or object coercible by as.data.frame to a data.frame), containing the variables in formula. Neither a matrix nor an array will be accepted. If not found in data, the variables are taken from environment(formula), typically the environment from which VariableSelection is called.

family

the distribution used to model the data, one of "gaussian", "gamma", "binomial", or "poisson". A family object may also be supplied for one of the accepted distributions.

link

the link used to link the mean structure to the linear predictors. One of "identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log". This only needs to be supplied if a string is supplied for family. If a family object is supplied for the family argument, then the link function is taken from that family object.

offset

the offset vector, by default the zero vector is used.

method

one of "Fisher", "BFGS", or "LBFGS". Fisher's scoring is recommended for forward selection and the branch and bound algorithms since they will typically fit many models with a small number of covariates.

type

one of "forward", "backward", "fast backward", "double backward", "fast double backward", "branch and bound", "backward branch and bound", or "switch branch and bound" to indicate the type of variable selection to perform. The default value is "switch branch and bound". See more about these algorithms in details

metric

the metric used to choose the best models, the default is "AIC", but "BIC" and "HQIC" are also available. AIC is the Akaike information criterion, BIC is the Bayesian information criterion, and HQIC is the Hannan-Quinn information criterion.

bestmodels

a positive integer to indicate the number of the best models to find according to the chosen metric or NULL. If this is NULL, then cutoff is used instead. This is only used for the branch and bound algorithms.

cutoff

a non-negative number which indicates that the function should return all models that have a metric value within cutoff of the best metric value or NULL. Only one of this or bestmodels should be specified and when both are NULL a cutoff of 0 is used. This is only used for the branch and bound algorithms.

keep

a character vector of names to denote variables that must be in the models.

keepintercept

a logical value to indicate whether to keep the intercept in all models, only used if an intercept is included in the formula.

maxsize

a positive integer to denote the maximum number of variables to consider in a single model, the default is the total number of variables. This number adds onto any variables specified in keep. This argument only works for type = "forward" and type = "branch and bound". This argument is now deprecated.

grads

a positive integer to denote the number of gradients used to approximate the inverse information with, only for method = "LBFGS".

parallel

a logical value to indicate if parallelization should be used.

nthreads

a positive integer to denote the number of threads used with OpenMP, only used if parallel = TRUE.

tol

a positive number to denote the tolerance used to determine model convergence.

maxit

a positive integer to denote the maximum number of iterations performed. The default for Fisher's scoring is 50 and for the other methods the default is 200.

contrasts

see contrasts.arg of model.matrix.default.

showprogress

a logical value to indicate whether to show progress updates for branch and bound algorithms.

Details

Variable Selection Details

The supplied formula or the formula from the fitted model is treated as the upper model. The variables specified in keep along with an intercept (if included in formula and keepintercept = TRUE) is the lower model. Factor variables are either kept in their entirety or entirely removed and interaction terms are properly handled. All observations that have any missing values in the upper model are removed.

Stepwise Methods

There are 5 different stepwise variable selection algorithms that are available. These are forward selection, backward elimination, fast backward elimination, double backward elimination, and fast double backward elimination. All of these are heuristic algorithms, so the best model found by them may not be the optimal model.

Backward Elimination

Fast backward elimination should give the same results as backward elimination, but it makes use of the bounding techniques used by the branch and bound algorithms to make it faster. Fast backward elimination can give slightly different results than backward elimination if the GLM solver has difficulties fitting some of the larger models.

Double Backward Elimination

Double backward elimination and fast double backward elimination are a variant of backward elimination where up to 2 variables can be removed in one step instead of just 1. This typically results in higher quality models, but can also be much slower. The bounding algorithm used in fast double backward elimination makes it much faster.

Branch and Bound Methods

The branch and bound algorithm is an efficient algorithm used to find the optimal models. The backward branch and bound algorithm is very similar to the branch and bound algorithm, except it tends to be faster when the best models contain most of the variables. The switch branch and bound algorithm is a combination of the two algorithms and is typically the fastest of the 3 branch and bound algorithms. All of the branch and bound algorithms are guaranteed to find the optimal models (up to numerical precision).

GLM Fitting

Fisher's scoring is recommended for branch and bound selection and forward selection. L-BFGS may be faster for the backward elimination and double backward elimination algorithms, especially when there are many variables.

Value

A BranchGLMVS object which is a list with the following components

initmodel

the BranchGLM object corresponding to the upper model

numchecked

number of models fit

names

character vector of the names of the predictor variables

order

the order the variables were added to the model or removed from the model, this is only included for the stepwise algorithms

type

type of variable selection employed

optType

whether the type specified used a heuristic or exact algorithm

metric

metric used to select the best models

bestmodels

numeric matrix used to describe the best models for the branch and bound algorithms or a numeric matrix describing the models along the path taken for stepwise algorithms

bestmetrics

numeric vector with the best metrics found in the search for the branch and bound algorithms or a numeric vector with the metric values along the path taken for stepwise algorithms

beta

numeric matrix of beta coefficients for the models in bestmodels

cutoff

the cutoff that was used, this is set to -1 if bestmodels was used instead or if a stepwise algorithm was used

keep

vector of which variables were kept through the selection process

keepintercept

a boolean value denoting whether to keep the intercept through the selection process or not

See Also

plot.BranchGLMVS, coef.BranchGLMVS, predict.BranchGLMVS, summary.BranchGLMVS

Examples

Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", 
link = "identity")

# Doing branch and bound selection 
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", 
bestmodels = 10, showprogress = FALSE)
VS

## Plotting the BIC of the best models
plot(VS, type = "b")

## Getting the coefficients of the best model according to BIC
FinalModel <- coef(VS, which = 1)
FinalModel

# Now doing it in parallel (although it isn't necessary for this dataset)
parVS <- VariableSelection(Fit, type = "branch and bound", parallel = TRUE, 
metric = "BIC", bestmodels = 10, showprogress = FALSE)

## Getting the coefficients of the best model according to BIC
FinalModel <- coef(parVS, which = 1)
FinalModel

# Using a formula
formVS <- VariableSelection(Sepal.Length ~ ., data = Data, family = "gaussian", 
link = "identity", metric = "BIC", type = "branch and bound", bestmodels = 10, 
showprogress = FALSE)

## Getting the coefficients of the best model according to BIC
FinalModel <- coef(formVS, which = 1)
FinalModel

# Using the keep argument
keepVS <- VariableSelection(Fit, type = "branch and bound", 
keep = c("Species", "Petal.Width"), metric = "BIC", bestmodels = 4, 
showprogress = FALSE)
keepVS

## Getting the coefficients from the fourth best model according to BIC when 
## keeping Petal.Width and Species in every model
FinalModel <- coef(keepVS, which = 4)
FinalModel

# Treating categorical variable beta parameters separately
## This function automatically groups together parameters from a categorical variable
## to avoid this, you need to create the indicator variables yourself
x <- model.matrix(Sepal.Length ~ ., data = iris)
Sepal.Length <- iris$Sepal.Length
Data <- cbind.data.frame(Sepal.Length, x[, -1])
VSCat <- VariableSelection(Sepal.Length ~ ., data = Data, family = "gaussian", 
link = "identity", metric = "BIC", bestmodels = 10, showprogress = FALSE)
VSCat

## Plotting results
plot(VSCat, cex.names = 0.75)

Extract covariance matrix from BranchGLM Objects

Description

Extracts covariance matrix of beta coefficients from BranchGLM objects.

Usage

## S3 method for class 'BranchGLM'
vcov(object, ...)

Arguments

object

a BranchGLM object.

...

further arguments passed to or from other methods.

Value

A numeric matrix which is the covariance matrix of the beta coefficients.