Title: | Best Subset Selection in Linear, Logistic and CoxPH Models |
---|---|
Description: | An implementation of best subset selection in generalized linear model and Cox proportional hazard model via the primal dual active set algorithm proposed by Wen, C., Zhang, A., Quan, S. and Wang, X. (2020) <doi:10.18637/jss.v094.i04>. The algorithm formulates coefficient parameters and residuals as primal and dual variables and utilizes efficient active set selection strategies based on the complementarity of the primal and dual variables. |
Authors: | Canhong Wen [aut, cre], Aijun Zhang [aut], Shijie Quan [aut], Xueqin Wang [aut] |
Maintainer: | Canhong Wen <[email protected]> |
License: | GPL-3 |
Version: | 2.0.4 |
Built: | 2024-12-07 06:35:14 UTC |
Source: | CRAN |
These functions are used by bess to compute Information Criteria from a fitted model object.
aic(object, ...)
bic(object, ...)
ebic(object, ...)
object |
Output from the bess function. |
... |
Additional arguments affecting the predictions produced. |
The value of the information criterion extracted from the "bess" object.
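As a rough illustration of how such criteria relate to the deviance, the sketch below assumes the commonly used definitions AIC = deviance + 2k, BIC = deviance + log(n) k and EBIC = deviance + (log(n) + 2 log(p)) k, where k is the number of nonzero coefficients. The helper `info_criteria` is hypothetical and not part of the package, and the package's own values may be computed differently.

```r
# Hypothetical helper (not part of the package): information criteria
# computed from a deviance value under the assumed textbook definitions.
# dev: -2 * log-likelihood of the fitted sub-model
# k:   number of nonzero coefficients
# n:   number of observations; p: number of candidate predictors
info_criteria <- function(dev, k, n, p) {
  c(AIC  = dev + 2 * k,
    BIC  = dev + log(n) * k,
    EBIC = dev + (log(n) + 2 * log(p)) * k)
}

ic <- info_criteria(dev = 120, k = 5, n = 500, p = 20)
print(ic)
```

Because the EBIC penalty also grows with log(p), it favours smaller models than BIC when many candidate predictors are available.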
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
aic(fit)
bic(fit)
ebic(fit)
Best subset selection for generalized linear models and Cox's proportional hazards model.
bess(x, y, family = c("gaussian", "binomial", "cox"),
     method = "gsection", s.min = 1, s.max, s.list,
     K.max = 20, max.steps = 15, glm.max = 1e6, cox.max = 20,
     factor = NULL, epsilon = 1e-4, weights = rep(1, nrow(x)))
x |
Input matrix, of dimension n x p; each row is an observation vector. |
y |
Response variable, of length n. For family = "binomial", y should be a factor with two levels. For family = "cox", y should be a two-column matrix with columns named 'time' and 'status'. |
family |
One of the GLM or Cox models. Either "gaussian", "binomial", or "cox", depending on the response. |
method |
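Method to be used to select the optimal model size. For method = "sequential", the best subset selection problem is solved for each model size in s.list. For method = "gsection", a golden section search is run over the range of model sizes from s.min to s.max. |
s.min |
The minimum value of model sizes. Only used for method = "gsection". Default is 1. |
s.max |
The maximum value of model sizes. Only used for method = "gsection". |
s.list |
A list of sequential values representing the model sizes. Only used for method = "sequential". |
K.max |
The maximum number of iterations used for method = "gsection". Default is 20. |
max.steps |
The maximum number of iterations in the primal dual active set algorithm. Default is 15. |
glm.max |
The maximum number of iterations for solving the maximum likelihood problem on the active set at each step in the primal dual active set algorithm. Only used in the logistic regression for family = "binomial". Default is 1e6. |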
cox.max |
The maximum number of iterations for solving the maximum partial likelihood problem on the active set at each step in the primal dual active set algorithm. Only used in Cox's model for family="cox". Default is 20. |
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
epsilon |
The tolerance for an early stopping rule in the method "sequential". The early stopping rule is defined as $\|Y - X\beta\|/n \leq \epsilon$. Default is 1e-4. |
weights |
Observation weights. Default is 1 for each observation. |
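The best subset selection problem with model size $s$ is

$$\min_{\beta} \; -2 \log L(\beta) \quad \text{s.t.} \quad \|\beta\|_0 \leq s.$$

In the GLM case, $\log L(\beta)$ is the log-likelihood function; in the Cox model, $\log L(\beta)$ is the log partial likelihood function.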
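To make the cardinality-constrained problem concrete, the following base-R sketch solves it by brute force for a small Gaussian example, where minimising the negative log-likelihood is equivalent to minimising the residual sum of squares. The helper `best_subset_exhaustive` is hypothetical and is not how the package works; it only shows the problem being solved, and why enumeration becomes infeasible as p grows.

```r
# Hypothetical helper for illustration only: exhaustive best subset search
# for a linear model. Feasible only when choose(p, s) is small.
best_subset_exhaustive <- function(x, y, s) {
  subsets <- combn(ncol(x), s)            # each column is one candidate support
  rss <- apply(subsets, 2, function(idx) {
    # Least squares fit with an intercept on the candidate support.
    fit <- lm.fit(cbind(1, x[, idx, drop = FALSE]), y)
    sum(fit$residuals^2)
  })
  list(subset = subsets[, which.min(rss)],
       rss = min(rss),
       n.candidates = ncol(subsets))
}

set.seed(1)
n <- 100
p <- 8
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, c(2, 5)] %*% c(2, -3)) + rnorm(n)   # true support: columns 2 and 5

res <- best_subset_exhaustive(x, y, s = 2)
res$subset        # indices of the selected predictors
res$n.candidates  # number of least squares fits that were needed
```

The number of candidate supports is choose(p, s), so this approach is only practical for very small problems.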
For each candidate model size, the best subset selection problem is solved by the primal dual active set (PDAS) algorithm; see Wen et al. (2017) for details. This algorithm utilizes an active set updating strategy via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlapping and complementary. For the case of method = "sequential", we run the PDAS algorithm for a list of sequential model sizes and use the estimate from the last iteration as a warm start. For the case of method = "gsection", a golden section search technique is adopted to efficiently determine the optimal model size.
A list with class attribute 'bess' and named components:
family |
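Type of the fitted model, determined by family. |
beta |
The best fitting coefficients for each model size $s$ considered. |
lambda |
The lambda value in the Lagrangian form of the best subset selection problem with model size $s$. |
bestmodel |
The best fitted model, the class of which is "lm", "glm" or "coxph". |
deviance |
The value of $-2 \log L$. |
nulldeviance |
The value of $-2 \log L$ for the null model. |
AIC |
The value of $-2 \log L + 2 \|\beta\|_0$. |
BIC |
The value of $-2 \log L + \log(n) \|\beta\|_0$. |
EBIC |
The value of $-2 \log L + (\log(n) + 2 \log(p)) \|\beta\|_0$. |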
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
bess.one, plot.bess, predict.bess.
#--------------linear model--------------#
# Generate simulated data
n <- 500
p <- 20
K <- 10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection
fit1 <- bess(data$x, data$y, family = "gaussian")
print(fit1)
#coef(fit1, sparse=TRUE)  # The estimated coefficients
bestmodel <- fit1$bestmodel
#summary(bestmodel)

# Plot solution path and the loss function
plot(fit1, type = "both", breaks = TRUE)

## Not run:
#--------------logistic model--------------#
# Generate simulated data
data <- gen.data(n, p, family = "binomial", 5, rho, sigma)

# Best subset selection
fit2 <- bess(data$x, data$y, s.list = 1:10, method = "sequential",
             family = "binomial", epsilon = 0)
print(fit2)
#coef(fit2, sparse = TRUE)
bestmodel <- fit2$bestmodel
#summary(bestmodel)

# Plot solution path and the loss function
plot(fit2, type = "both", breaks = TRUE, K = 5)

#--------------cox model--------------#
# Generate simulated data
data <- gen.data(n, p, 5, rho, sigma, c = 10, family = "cox", scal = 10)

# Best subset selection
fit3 <- bess(data$x, data$y, s.list = 1:10, method = "sequential",
             family = "cox")
print(fit3)
#coef(fit3, sparse = TRUE)
bestmodel <- fit3$bestmodel
#summary(bestmodel)

# Plot solution path and the loss function
plot(fit3, type = "both", breaks = TRUE, K = 5)

#----------------------High dimensional linear models--------------------#
p <- 1000
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection
fit <- bess(data$x, data$y, method = "sequential", family = "gaussian",
            epsilon = 1e-12)

# Plot solution path
plot(fit, type = "both", breaks = TRUE, K = 10)

data("prostate")
x = prostate[, -9]
y = prostate[, 9]
fit.group = bess(x, y, s.list = 1:ncol(x), factor = c("gleason"))

#---------------SAheart---------------#
data("SAheart")
y = SAheart[, 5]
x = SAheart[, -5]
x$ldl[x$ldl < 5] = 1
x$ldl[x$ldl >= 5 & x$ldl < 10] = 2
x$ldl[x$ldl >= 10] = 3
fit.group = bess(x, y, s.list = 1:ncol(x), factor = c("ldl"), family = "binomial")
## End(Not run)
Best subset selection with a specified model size for generalized linear models and Cox's proportional hazard model.
bess.one(x, y, family = c("gaussian", "binomial", "cox"),
         s = 1, max.steps = 15, glm.max = 1e6, cox.max = 20,
         factor = NULL, weights = rep(1, nrow(x)), normalize = TRUE)
x |
Input matrix, of dimension n x p; each row is an observation vector. |
y |
Response variable, of length n. For family = "binomial", y should be a factor with two levels. For family = "cox", y should be a two-column matrix with columns named 'time' and 'status'. |
s |
Size of the selected model. It controls the number of nonzero coefficients allowed in the model. |
family |
One of the distribution functions for GLM or Cox models. Either "gaussian", "binomial", or "cox", depending on the response. |
max.steps |
The maximum number of iterations in the primal dual active set algorithm. In most cases, only a few steps are needed for convergence. Default is 15. |
glm.max |
The maximum number of iterations for solving the maximum likelihood problem on the active set. It occurs at each step in the primal dual active set algorithm. Only used in the logistic regression for family = "binomial". Default is 1e6. |
cox.max |
The maximum number of iterations for solving the maximum partial likelihood problem on the active set. It occurs at each step in the primal dual active set algorithm. Only used in Cox's model for family = "cox". Default is 20. |
weights |
Observation weights. Default is 1 for each observation. |
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
normalize |
Whether to normalize x. Default is TRUE. |
Given a model size $s$, we consider the following best subset selection problem:

$$\min_{\beta} \; -2 \log L(\beta) \quad \text{s.t.} \quad \|\beta\|_0 = s.$$

In the GLM case, $\log L(\beta)$ is the log-likelihood function; in the Cox model, $\log L(\beta)$ is the log partial likelihood function.

The best subset selection problem is solved by the primal dual active set algorithm; see Wen et al. (2017) for details. This algorithm utilizes an active set updating strategy via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlapping and complementary.
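The active set updating idea can be sketched for the least squares case in base R. This is a simplified illustration under stated assumptions (centered response, columns rescaled to squared norm n, active set chosen as the s largest entries of |beta + d|), written as a hypothetical helper `pdas_ls`; it is not the package's implementation, and the returned coefficients are on the standardized scale.

```r
# Hypothetical helper (not the package's bess.one): a minimal primal dual
# active set iteration for least squares with a fixed model size s.
pdas_ls <- function(x, y, s, max.steps = 15) {
  n <- nrow(x)
  p <- ncol(x)
  # Center the columns and rescale them so that each has squared norm n.
  xs <- scale(x) * sqrt(n / (n - 1))
  yc <- y - mean(y)
  beta <- rep(0, p)                      # primal variable
  d <- drop(crossprod(xs, yc)) / n       # dual variable at beta = 0
  active <- integer(0)
  for (step in seq_len(max.steps)) {
    # Active set: indices of the s largest entries of |beta + d|.
    new_active <- sort(order(abs(beta + d), decreasing = TRUE)[seq_len(s)])
    if (identical(new_active, active)) break   # support unchanged: stop
    active <- new_active
    # Primal update: least squares on the active set, zero elsewhere.
    xa <- xs[, active, drop = FALSE]
    beta <- rep(0, p)
    beta[active] <- solve(crossprod(xa), crossprod(xa, yc))
    # Dual update: scaled correlation with the residual, zero on the active set.
    d <- drop(crossprod(xs, yc - xs %*% beta)) / n
    d[active] <- 0
  }
  list(active = active, beta = beta)
}

set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, -2, 2)) + rnorm(n)   # true support: columns 1 to 3

fit <- pdas_ls(x, y, s = 3)
fit$active
```

At every step the primal variable is nonzero only on the active set and the dual variable only off it, which is the complementarity the description refers to.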
A list with class attribute 'bess.one' and named components:
type |
Type of the fitted model, determined by family. |
beta |
The best fitting coefficients with the smallest loss function given the model size $s$. |
lambda |
The estimated lambda value in the Lagrangian form of the best subset selection problem with model size $s$. |
bestmodel |
The best fitted model, the class of which is "lm", "glm" or "coxph". |
deviance |
The value of $-2 \log L$. |
nulldeviance |
The value of $-2 \log L$ for the null model. |
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
bess, plot.bess, predict.bess.
#--------------linear model--------------#
# Generate simulated data
n <- 500
p <- 20
K <- 10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection
fit1 <- bess.one(data$x, data$y, s = 10, family = "gaussian", normalize = TRUE)
#coef(fit1,sparse=TRUE)
bestmodel <- fit1$bestmodel
#summary(bestmodel)

## Not run:
#--------------logistic model--------------#
# Generate simulated data
data <- gen.data(n, p, family = "binomial", K, rho, sigma)

# Best subset selection
fit2 <- bess.one(data$x, data$y, family = "binomial", s = 10, normalize = TRUE)
bestmodel <- fit2$bestmodel
#summary(bestmodel)

#--------------cox model--------------#
# Generate simulated data
data <- gen.data(n, p, K, rho, sigma, c = 10, family = "cox", scal = 10)

# Best subset selection
fit3 <- bess.one(data$x, data$y, s = 10, family = "cox", normalize = TRUE)
bestmodel <- fit3$bestmodel
#summary(bestmodel)

#----------------------High dimensional linear models--------------------#
p <- 1000
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian", normalize = TRUE)

#---------------prostate---------------#
data("prostate")
x = prostate[, -9]
y = prostate[, 9]
fit.ungroup = bess.one(x, y, s = 5)
fit.group = bess.one(x, y, s = 5, factor = c("gleason"))

#---------------SAheart---------------#
data(SAheart)
y = SAheart[, 5]
x = SAheart[, -5]
x$ldl[x$ldl < 5] = 1
x$ldl[x$ldl >= 5 & x$ldl < 10] = 2
x$ldl[x$ldl >= 10] = 3
fit.ungroup = bess.one(x, y, s = 5, family = "binomial")
fit.group = bess.one(x, y, s = 5, factor = c("ldl"), family = "binomial")
## End(Not run)
Similar to other prediction methods, this function provides estimated coefficients from a fitted "bess" object.
## S3 method for class 'bess'
coef(object, sparse = TRUE, type = c("ALL", "AIC", "BIC", "EBIC"), ...)
object |
A "bess" object. |
sparse |
Logical or NULL, specifying whether the coefficients should be presented as sparse matrix or not. |
type |
Types of coefficients returned. |
... |
Other arguments. |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
coef(fit, sparse = TRUE)  # The estimated coefficients
Similar to other prediction methods, this function provides estimated coefficients from a fitted "bess.one" object.
## S3 method for class 'bess.one'
coef(object, sparse = TRUE, ...)
object |
A "bess.one" object. |
sparse |
Logical or NULL, specifying whether the coefficients should be presented as sparse matrix or not. |
... |
Other arguments. |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
coef(fit, sparse = TRUE)  # The estimated coefficients
Similar to other deviance methods, this function returns the deviance from a fitted "bess" object.
## S3 method for class 'bess'
deviance(object, ...)
object |
Output from the bess function. |
... |
Additional arguments affecting the predictions produced. |
The value of the deviance extracted from the "bess" object.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
deviance(fit)
Similar to other deviance methods, this function returns the deviance from a fitted "bess.one" object.
## S3 method for class 'bess.one'
deviance(object, ...)
object |
Output from the bess.one function. |
... |
Additional arguments affecting the predictions produced. |
The value of the deviance extracted from the "bess.one" object.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
deviance(fit)
Generate data for simulations under the generalized linear model and Cox model.
gen.data(n, p, family, K, rho = 0, sigma = 1, beta = NULL, censoring = TRUE, c = 1, scal)
n |
The number of observations. |
p |
The number of predictors of interest. |
family |
The distribution of the simulated data. One of "gaussian", "binomial" or "cox". |
K |
The number of nonzero coefficients in the underlying regression model. |
rho |
A parameter used to characterize the pairwise correlation in predictors. Default is 0. |
sigma |
A parameter used to control the signal-to-noise ratio. For linear regression, it is the error variance $\sigma^2$. Default is 1. |
beta |
The coefficient values in the underlying regression model. |
censoring |
Whether data is censored or not. Default is TRUE. |
c |
The censoring rate. Default is 1. |
scal |
A parameter in generating survival time based on the Weibull distribution. Only used for the "cox" family. |
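For the design matrix $X$, we first generate an n x p random Gaussian matrix $\bar{X}$ whose entries are i.i.d. $N(0, 1)$ and then normalize its columns to the $\sqrt{n}$ length. Then the design matrix $X$ is generated with $X_j = \bar{X}_j + \rho(\bar{X}_{j+1} + \bar{X}_{j-1})$ for $j = 2, \dots, p-1$.

For "gaussian" family, the data model is

$$Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2).$$

The underlying regression coefficient $\beta$ has uniform distribution [m, 100m], $m = 5\sqrt{2\log(p)/n}$.

For "binomial" family, the data model is

$$\mathrm{Prob}(Y = 1) = \frac{\exp(X\beta)}{1 + \exp(X\beta)}.$$

The underlying regression coefficient $\beta$ has uniform distribution [2m, 10m], $m = 5\sigma\sqrt{2\log(p)/n}$.

For "cox" family, the data model is

$$T = \left(\frac{-\log S(t)}{\exp(X\beta)}\right)^{1/\mathrm{scal}}.$$

The censoring time C is generated from uniform distribution [0, c], then we define the censor status as $\delta = I\{T \leq C\}$ and the observed time as $R = \min\{T, C\}$.
The underlying regression coefficient $\beta$ has uniform distribution [2m, 10m], $m = 5\sigma\sqrt{2\log(p)/n}$.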
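The design matrix construction can be sketched in base R. This is a minimal illustration and not the package's gen.data: it assumes that interior columns are mixed with their neighbours as $X_j = \bar{X}_j + \rho(\bar{X}_{j+1} + \bar{X}_{j-1})$ and, as a further assumption, that the first and last columns are left unchanged. The helper name `make_design` is hypothetical.

```r
# Hypothetical helper (not the package's gen.data): correlated design matrix
# built from a column-normalized Gaussian matrix.
make_design <- function(n, p, rho = 0) {
  xbar <- matrix(rnorm(n * p), n, p)
  # Rescale every column to Euclidean length sqrt(n).
  xbar <- sweep(xbar, 2, sqrt(colSums(xbar^2) / n), "/")
  x <- xbar
  if (p >= 3) {
    j <- 2:(p - 1)
    # Mix each interior column with its two neighbours.
    x[, j] <- xbar[, j] + rho * (xbar[, j + 1] + xbar[, j - 1])
  }
  x
}

set.seed(1)
x0 <- make_design(n = 100, p = 5, rho = 0)     # no mixing: columns have norm sqrt(n)
x  <- make_design(n = 500, p = 20, rho = 0.2)  # neighbouring columns are correlated
round(cor(x)[1:4, 1:4], 2)
```

With rho = 0 the predictors are simply the normalized Gaussian columns; a positive rho induces positive correlation between nearby columns.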
A list with the following components: x, y, Tbeta.
x |
Design matrix of predictors. |
y |
Response variable |
Tbeta |
The coefficients used in the underlying regression model. |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
# Generate simulated data
n <- 500
p <- 20
K <- 10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)

# Best subset selection
fit <- bess(data$x, data$y, family = "gaussian")
Gravier et al. (2010) have considered small, invasive ductal carcinomas without axillary lymph node involvement (T1T2N0) to predict metastasis of small node-negative breast carcinoma. Using comparative genomic hybridization arrays, they examined 168 patients over a five-year period. The 111 patients with no event after diagnosis were labelled good, and the 57 patients with early metastasis were labelled poor.
data(gravier)
A list containing the design matrix X and response matrix y
Eleonore Gravier, Gaelle Pierron, and Anne Vincent-Salomon (2010). A prognostic DNA signature for T1T2 node-negative breast cancer patients.
Similar to other logLik methods, this function returns the log-likelihood from a fitted "bess" object.
## S3 method for class 'bess'
logLik(object, ...)
object |
Output from the bess function. |
... |
Additional arguments affecting the predictions produced. |
The value of the loglikelihood extracted from the "bess" object.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
logLik(fit)
Similar to other logLik methods, this function returns the log-likelihood from a fitted "bess.one" object.
## S3 method for class 'bess.one'
logLik(object, ...)
object |
Output from the bess.one function. |
... |
Additional arguments affecting the predictions produced. |
The value of the loglikelihood extracted from the "bess.one" object.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
logLik(fit)
Produces a coefficient profile plot of the coefficient or loss paths for a fitted "bess" object.
## S3 method for class 'bess'
plot(x, type = c("loss", "coefficients", "both"), breaks = TRUE, K = NULL, ...)
x |
A "bess" object. |
type |
One of "loss", "coefficients" or "both". |
breaks |
If TRUE, then vertical lines are drawn at each break point in the coefficient paths. |
K |
The break point at which the vertical line should be drawn. |
... |
Other graphical parameters to plot |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
#--------------linear model--------------#
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
plot(fit, type = "both")
Similar to other predict methods, this function returns predictions from a fitted "bess" object.
## S3 method for class 'bess'
predict(object, newdata, type = c("ALL", "opt", "AIC", "BIC", "EBIC"), ...)
object |
Output from the bess function. |
newdata |
New data used for prediction. |
type |
Types of coefficients returned. |
... |
Additional arguments affecting the predictions produced. |
The object returned depends on the types of family.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
pred <- predict(fit, newdata = data$x)
Similar to other predict methods, this function returns predictions from a fitted "bess.one" object.
## S3 method for class 'bess.one'
predict(object, newdata, ...)
object |
Output from the bess.one function. |
newdata |
New data used for prediction. |
... |
Additional arguments affecting the predictions produced. |
The object returned depends on the types of family.
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
pred <- predict(fit, newdata = data$x)
Print the primary elements of the "bess" object.
## S3 method for class 'bess'
print(x, ...)
x |
A "bess" object. |
... |
additional print arguments |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
print(fit)
Print the primary elements of the "bess.one" object.
## S3 method for class 'bess.one'
print(x, ...)
x |
A "bess.one" object. |
... |
additional print arguments |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
print(fit)
Data from a study by Stamey et al. (1989) to examine the association between prostate specific antigen (PSA) and several clinical measures that are potentially associated with PSA in men who were about to receive a radical prostatectomy. The variables are as follows:
lcavol: Log cancer volume
lweight: Log prostate weight
age: The man's age
lbph: Log of the amount of benign hyperplasia
svi: Seminal vesicle invasion; 1=Yes, 0=No
lcp: Log of capsular penetration
gleason: Gleason score
pgg45: Percent of Gleason scores 4 or 5
lpsa: Log PSA
data(prostate)
A data frame with 97 observations on 9 variables
Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E. and Yang, N. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II. Radical prostatectomy treated patients, Journal of Urology 16: 1076-1083.
Data from a subset of the Coronary Risk-Factor Study baseline survey, carried out in rural South Africa. The variables are as follows:
sbp: Systolic blood pressure
tobacco: Cumulative tobacco consumption, in kg
ldl: Low-density lipoprotein cholesterol
adiposity: Adipose tissue concentration
famhist: Family history of heart disease (1=Present, 0=Absent)
typea: Score on test designed to measure type-A behavior
obesity: Obesity
alcohol: Current consumption of alcohol
age: Age of subject
chd: Coronary heart disease at baseline; 1=Yes 0=No
data(SAheart)
A data frame with 462 observations on 10 variables
Rousseauw, J., du Plessis, J., Benade, A., Jordaan, P., Kotze, J. and Ferreira, J. (1983). Coronary risk factor screening in three rural communities. South African Medical Journal 64: 430-436.
Print a summary of the "bess" object.
## S3 method for class 'bess'
summary(object, ...)
object |
A "bess" object. |
... |
additional print arguments |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess(data$x, data$y, family = "gaussian")
summary(fit)
Print a summary of the "bess.one" object.
## S3 method for class 'bess.one'
summary(object, ...)
object |
A "bess.one" object. |
... |
additional print arguments |
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
data <- gen.data(500, 20, family = "gaussian", 10, 0.2, 1)
fit <- bess.one(data$x, data$y, s = 10, family = "gaussian")
summary(fit)