Title: | Multivariate Cluster Elastic Net |
---|---|
Description: | Fits the Multivariate Cluster Elastic Net (MCEN) presented in Price & Sherwood (2018) <arXiv:1707.03530>. The MCEN model simultaneously estimates regression coefficients and a clustering of the responses for a multivariate response model. Currently accommodates the Gaussian and binomial likelihood. |
Authors: | Ben Sherwood [aut, cre], Brad Price [aut] |
Maintainer: | Ben Sherwood <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.1 |
Built: | 2024-12-25 06:46:52 UTC |
Source: | CRAN |
Adjusts the value of the coefficients to account for the scaling of x and y.
beta_adjust(beta, sigma_x, sigma_y, mean_x, mean_y)
beta |
The estimate of beta from the scaled data. |
sigma_x |
Sample standard deviations of the original predictors. |
sigma_y |
Sample standard deviations of the original responses. |
mean_x |
Sample means of the original predictors. |
mean_y |
Sample means of the original responses. |
Returns the adjusted coefficients
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
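The rescaling described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes the coefficients were fit by least squares, without an intercept, on predictors and responses standardized by their sample means and standard deviations. The names b_scaled, b_orig, and intercept are illustrative only.

```r
# Sketch only: back-transform coefficients fit on standardized data.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x %*% matrix(c(1, -1, 0.5, 2), ncol = 2) + matrix(rnorm(200), ncol = 2)

sigma_x <- apply(x, 2, sd)
mean_x <- colMeans(x)
sigma_y <- apply(y, 2, sd)
mean_y <- colMeans(y)

# Least-squares fit on the standardized data (no intercept after centering).
x_s <- scale(x)
y_s <- scale(y)
b_scaled <- solve(crossprod(x_s), crossprod(x_s, y_s))

# Undo the scaling: slope (j, k) is multiplied by sigma_y[k] / sigma_x[j].
b_orig <- diag(1 / sigma_x) %*% b_scaled %*% diag(sigma_y)
# The intercept restores the response means.
intercept <- mean_y - drop(mean_x %*% b_orig)

# Reference fit on the original scale, with an intercept, for comparison.
b_direct <- coef(lm(y ~ x))
```

Under these assumptions, rbind(intercept, b_orig) agrees with b_direct up to numerical error.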
Adjusts the value of the binomial coefficients to account for the scaling of x.
beta_adjust_bin(beta, sigma_x)
beta |
The estimate of beta from the scaled data. |
sigma_x |
Sample standard deviations of the original predictors. |
Returns the adjusted coefficients
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
The workhorse function for the binomial updates in mcen. It uses IRWLS glmnet updates to solve the regression problem.
bin_horse(Y, X, delta, gamma_y, y_clusters, set_length, eps, maxiter)
Y |
the matrix of responses |
X |
the matrix of predictors with the intercept included |
delta |
the tuning parameter for the lasso penalty |
gamma_y |
the tuning parameter for the ridge fusion penalty |
y_clusters |
the cluster assignments from the provided clustering algorithm |
set_length |
the size of the cluster containing each response: a vector of length r whose elements give the size of that response's cluster |
eps |
the convergence tolerance, typically 1e-5 |
maxiter |
the maximum number of iterations |
Returns a matrix of coefficients
Brad Price <[email protected]>
Creates the working response for all responses for the glmnet binomial family.
CalcHorseBin(Y, X, Beta)
Y |
the matrix of responses |
X |
the matrix of predictors. |
Beta |
current iteration of the regression coefficients |
Returns the list of vectors needed for the working responses in glmnet.
Brad Price <[email protected]>
Creates the probabilities and working response for the glmnet update for a given response with a binomial family
CalcHorseEBin(X, Beta, Y, r)
X |
the matrix of predictors. |
Beta |
current iteration of the regression coefficients |
Y |
the matrix of responses |
r |
the response of interest |
Returns a list of the quantities needed for the working response in glmnet.
Brad Price <[email protected]>
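For orientation, the probabilities and working response used in an IRWLS step for a single binary response can be sketched in base R as follows. This is a generic textbook logistic-regression update, not the package's code: irwls_working is a hypothetical helper, and plain weighted least squares stands in for the penalized glmnet update.

```r
# Sketch only: one response, unpenalized IRWLS for logistic regression.
irwls_working <- function(X, beta, y) {
  eta <- drop(X %*% beta)        # linear predictor
  p <- 1 / (1 + exp(-eta))       # fitted probabilities
  w <- p * (1 - p)               # IRWLS weights
  z <- eta + (y - p) / w         # working response
  list(prob = p, weight = w, z = z)
}

set.seed(2)
X <- cbind(1, matrix(rnorm(200), ncol = 2))   # intercept column included
y <- rbinom(100, 1, plogis(drop(X %*% c(-0.5, 1, -1))))

beta <- rep(0, 3)
for (i in 1:25) {
  wr <- irwls_working(X, beta, y)
  # Weighted least squares of the working response on X.
  beta <- drop(solve(crossprod(X, wr$weight * X),
                     crossprod(X, wr$weight * wr$z)))
}

# Reference: the same model fit by glm().
beta_glm <- coef(glm(y ~ X - 1, family = binomial))
```

In this unpenalized setting the iterations converge to the same estimate as glm(); the package's update additionally applies the lasso and ridge fusion penalties.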
Wrapper function for different clustering methods
cluster(x, cNum, clusterMethod = "kmeans", clusterIterations = 100, clusterStartNum = 30)
x |
data to be clustered. Clustering will be done on the columns. |
cNum |
number of cluster centers |
clusterMethod |
"kmeans" for the kmeans function, "kmeanspp" for the kcca implementation of kmeans++ |
clusterIterations |
maximum number of iterations for clustering |
clusterStartNum |
number of random starting points used |
Returns cluster assignments
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
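Because clustering is done on the columns, the data must be transposed before it reaches a row-based routine such as stats::kmeans. The following base-R sketch shows that idea for the plain k-means case only; it is an illustration under that assumption, not the wrapper's actual code, and the toy data are made up.

```r
# Sketch only: cluster the columns of x with stats::kmeans.
set.seed(3)
base1 <- rnorm(50)
base2 <- rnorm(50) + 10
# Columns 1-2 are near copies of base1, columns 3-4 near copies of base2.
x <- unname(cbind(base1, base1 + rnorm(50, sd = 0.01),
                  base2, base2 + rnorm(50, sd = 0.01)))

# kmeans() clusters rows, so transpose to cluster columns.
fit <- kmeans(t(x), centers = 2, iter.max = 100, nstart = 30)
assignments <- fit$cluster   # one cluster label per column of x
```

Here iter.max and nstart play the roles that clusterIterations and clusterStartNum are described as playing above.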
Returns the cluster values from a cv.mcen object.
cluster.vals(obj)
obj |
The cv.mcen object. |
Returns the clusters from the model with the smallest cross-validation error.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- cv.mcen(x, y, ky = 2, gamma_y = 3)
mcen_cluster <- cluster.vals(mcen_fit)
Returns the coefficients from the cv.mcen object with the smallest cross-validation error.
## S3 method for class 'cv.mcen'
coef(object, ...)
object |
The cv.mcen object. |
... |
Additional values to be passed. |
The matrix of coefficients for the best MCEN model as determined by cross-validation.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- cv.mcen(x, y, ky = 2, gamma_y = 3)
best_coef <- coefficients(mcen_fit)
Returns the coefficients from an mcen object.
## S3 method for class 'mcen'
coef(object, delta = NULL, ...)
object |
The mcen object. |
delta |
The L1 tuning parameter |
... |
Additional values to pass on. |
The matrix of coefficients.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- mcen(x, y, ky = 2, gamma_y = 3, delta = c(1, 2))
best_coef <- coefficients(mcen_fit, delta = 1)
Cross validation for mcen function
cv.mcen(x, y, family = "mgaussian", ky = seq(2, 4), gamma_y = seq(0.1, 5.1, 0.5), nfolds = 10, folds = NULL, cluster_y = NULL, delta=NULL, n.cores = 1, ...)
x |
Matrix set of predictors. |
y |
Matrix set of responses. |
family |
The exponential family the response corresponds to. |
ky |
A vector with the number of possible clusters for y. |
gamma_y |
Set of tuning parameters for the clustering penalty on the responses. |
nfolds |
Number of folds used in the cross-validation. |
folds |
A vector of length n identifying the cross-validation fold to which each observation belongs. |
cluster_y |
a priori definition of clusters. If clusters are provided they will remain fixed and are not estimated. Objective function is then convex. |
delta |
Tuning parameter for the L1 penalty |
n.cores |
Number of cores used for parallel processing. |
... |
Additional arguments passed to mcen. |
Returns a cv.mcen object.
models |
A list of mcen objects. |
cv |
Cross validation results. |
ky |
The same value as the input ky. |
gamma_y |
The same value as the input gamma_y. |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Price, B.S. and Sherwood, B. (2018). A Cluster Elastic Net for Multivariate Regression. arXiv preprint arXiv:1707.03530. http://arxiv-export-lb.library.cornell.edu/abs/1707.03530.
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
cv_fit <- cv.mcen(x, y, ky = 2)
Gets the index position for the model with the smallest cross-validation error.
get_best_cvm(model)
model |
The cv.mcen object. |
Returns the index for the model with the smallest cross-validation error.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- cv.mcen(x, y, ky = 2, gamma_y = 3)
get_best_cvm(mcen_fit)
matrix multiply
matrix_multiply(beta, x)
beta |
Matrix of coefficients. |
x |
Design matrix. |
Returns x times beta
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Fits an MCEN model
mcen(x, y, family = "mgaussian", ky = NULL, delta = NULL, gamma_y = 1, ndelta = 25, delta.min.ratio = NULL, eps = 1e-05, scale_x = TRUE, scale_y = TRUE, clusterMethod = "kmeans", clusterStartNum = 30, clusterIterations = 10, cluster_y = NULL, max_iter = 10, init_beta = NULL, n.cores = 1)
x |
Matrix of predictors. |
y |
Matrix of responses. |
family |
Type of likelihood used; the two options are "mgaussian" and "mbinomial". |
ky |
Number of clusters for the response. |
delta |
L1 penalty. |
gamma_y |
Penalty on the differences in predicted values within y clusters. |
ndelta |
Number of delta parameters. |
delta.min.ratio |
Ratio between smallest and largest delta. |
eps |
Convergence criteria. |
scale_x |
Whether the x matrix should be scaled; default is TRUE. |
scale_y |
Whether the y matrix should be scaled; default is TRUE. |
clusterMethod |
K-means function used, either kmeans or kmeanspp. |
clusterStartNum |
Number of random starting points for clustering. |
clusterIterations |
Number of iterations for cluster convergence. |
cluster_y |
An a priori definition of clusters. If clusters are provided they will remain fixed and are not estimated. Objective function is then convex. |
max_iter |
Maximum number of iterations for coefficient estimates. |
init_beta |
Clustering step requires an initial estimate, default is to use elastic net solution. |
n.cores |
Number of cores used for calculation; default is 1. |
Returns an mcen object.
beta |
List of the coefficient estimates. |
delta |
Value of delta. |
gamma_y |
Value of gamma_y. |
ky |
Value of ky. |
y_clusters |
List of the clusters of y. |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Price, B.S. and Sherwood, B. (2018). A Cluster Elastic Net for Multivariate Regression. arXiv preprint arXiv:1707.03530. http://arxiv-export-lb.library.cornell.edu/abs/1707.03530.
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- mcen(x, y, ky = 2, delta = 1)
Calculates cluster assignment and coefficient estimates for a binomial mcen.
mcen_bin_workhorse(beta, delta = NULL, y, x, family = "mbinomial", ky = NULL, gamma_y = 1, eps = 1e-05, clusterMethod = "kmeans", clusterIterations = 100, clusterStartNum = 30, cluster_y = NULL, max_iter = 10)
beta |
Initial estimate of coefficients. |
delta |
Tuning parameter for L1 penalty. |
y |
Matrix of responses. |
x |
Matrix of predictors. |
family |
Type of likelihood used; the two options are "mgaussian" and "mbinomial". |
ky |
Number of clusters used for grouping response variables. |
gamma_y |
Tuning parameter for the penalty between fitted values for responses in the same group. |
eps |
Convergence criteria |
clusterMethod |
Clustering method to use; kmeans and kmeanspp are currently supported. |
clusterIterations |
Number of iterations for cluster convergence |
clusterStartNum |
Number of random starting points for clustering |
cluster_y |
An a priori definition of clusters. If clusters are provided they will remain fixed and are not estimated. Objective function is then convex. |
max_iter |
The maximum number of iterations for estimating the coefficients |
Brad Price <[email protected]>
Estimates the clusters and provides the coefficients for an mcen object
mcen_workhorse(beta, delta = NULL, xx, xy, family = "mgaussian", ky = NULL, gamma_y = 0.5, eps = 1e-05, clusterMethod = "kmeans", clusterIterations = 100, clusterStartNum = 30, cluster_y = NULL, max_iter = 10, x = x)
beta |
The initial value of the coefficients |
delta |
The sparsity (L1) tuning parameter |
xx |
Matrix of transpose of x times x. |
xy |
Matrix of transpose of x times y. |
family |
Type of likelihood used; the two options are "mgaussian" and "mbinomial". |
ky |
Number of clusters for the response |
gamma_y |
Penalty on the differences in predicted values within y clusters. |
eps |
Convergence criteria |
clusterMethod |
Clustering method to use; kmeans and kmeanspp are currently supported. |
clusterIterations |
Number of iterations for cluster convergence |
clusterStartNum |
Number of random starting points for clustering |
cluster_y |
An a priori definition of clusters. If clusters are provided they will remain fixed and are not estimated. Objective function is then convex. |
max_iter |
The maximum number of iterations for estimating the coefficients |
x |
The design matrix |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Provides initial estimates for the mcen function.
mcen.init(x, y, family = "mgaussian", delta = NULL, gamma_y = 1, intercept = FALSE)
x |
the n x p design matrix |
y |
the n x r matrix of responses, where r is the number of responses |
family |
type of likelihood used; the two options are "mgaussian" and "mbinomial" |
delta |
sparsity tuning parameter |
gamma_y |
tuning parameter for clustering responses |
intercept |
whether an intercept should be included in the model |
Returns a matrix of coefficients.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Calculates the out of sample likelihood for an mcen object
pred_eval(obj, test_x, test_y)
obj |
The mcen object. |
test_x |
The matrix of test predictors. |
test_y |
The matrix of test responses. |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Evaluates prediction error for multiple binomial responses.
## S3 method for class 'mbinom_mcen'
pred_eval(obj, test_x, test_y)
obj |
The mbinom_mcen object. |
test_x |
A matrix of the test predictors. |
test_y |
A matrix of the test responses. |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Calculates the prediction error for a mgauss_mcen object.
## S3 method for class 'mgauss_mcen'
pred_eval(obj, test_x, test_y)
obj |
The mgauss_mcen object. |
test_x |
The matrix of test predictors. |
test_y |
The matrix of test responses. |
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Makes predictions from the model with the smallest cross-validation error.
## S3 method for class 'cv.mcen'
predict(object, newx, ...)
object |
The cv.mcen object. |
newx |
The X matrix of predictors. |
... |
Additional parameters to be sent to predict. |
Returns the predicted values from the model with the smallest cross-validation error.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- cv.mcen(x, y, ky = 2, gamma_y = 3)
new_x <- matrix(rnorm(12), ncol = 4)
mcen_preds <- predict(mcen_fit, new_x)
Makes predictions from an mcen model.
## S3 method for class 'mcen'
predict(object, newx, ...)
object |
The mcen object. |
newx |
A matrix of new observations. |
... |
Additional variables to be sent to predict. |
Returns predictions for each beta of an mcen object
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
x <- matrix(rnorm(400), ncol = 4)
beta <- matrix(c(1,1,0,0,0,0,-1,-1,0,0,-1,-1,1,1,0,0), ncol = 4)
y <- x %*% beta + rnorm(400)
mcen_fit <- mcen(x, y, ky = 2, delta = 1)
new_x <- matrix(rnorm(12), ncol = 4)
mcen_preds <- predict(mcen_fit, new_x)
Prints nice output for a cv.mcen object.
## S3 method for class 'cv.mcen'
print(x, ...)
x |
The cv.mcen object. |
... |
Additional parameters. |
Prints out information about where the cv.mcen object was minimized.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Prints nice output for an mcen object.
## S3 method for class 'mcen'
print(x, ...)
x |
The mcen object. |
... |
Additional parameters. |
Prints out some basic information about the mcen object.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
Randomly assigns n samples to k groups.
randomly_assign(n, k)
n |
number of samples |
k |
number of groups |
Returns assignments of n into k groups
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
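One common way to build such an assignment, for example for cross-validation folds, is to repeat the labels 1 to k and shuffle them. The sketch below takes that approach; assign_groups is a hypothetical helper, and whether the package balances group sizes in the same way is an assumption.

```r
# Sketch only: balanced random assignment of n samples to k groups.
assign_groups <- function(n, k) {
  # Repeat labels 1..k until there are n of them, then shuffle.
  sample(rep(seq_len(k), length.out = n))
}

set.seed(4)
folds <- assign_groups(10, 3)
group_sizes <- table(folds)   # sizes differ by at most one
```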
Tests set equivalence of two clustering sets.
SetEq(set1, set2)
set1 |
is the cluster assignments of the previous iteration |
set2 |
is the cluster assignments of the current clusters |
Returns a logical saying if the two clusterings are equal
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
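Such a check is typically used to stop iterating once the clusters no longer change between iterations. The sketch below compares two label vectors as partitions, so a pure relabeling counts as equal. Treating relabelings as equal is an assumption about the intended behavior, and same_partition is a hypothetical name.

```r
# Sketch only: do two label vectors describe the same partition?
same_partition <- function(a, b) {
  tab <- table(a, b)
  # Equal partitions map each label in a to exactly one label in b and back.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

relabelled <- same_partition(c(1, 1, 2, 2), c(2, 2, 1, 1))  # same groups, new labels
regrouped  <- same_partition(c(1, 1, 2, 2), c(1, 2, 1, 2))  # different groups
```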
Calculates sum of squared error between two vectors or matrices
squared_error(pred, test_y)
pred |
the predictions |
test_y |
the testing response values |
Returns the sum of the squared differences between pred and test_y.
Ben Sherwood <[email protected]>, Brad Price <[email protected]>
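The quantity described is simple enough to state directly. The following is a stand-alone sketch with a hypothetical name (sse), not the package source; it works for vectors or matrices of matching dimensions.

```r
# Sketch only: sum of squared differences between predictions and test values.
sse <- function(pred, test_y) {
  sum((pred - test_y)^2)
}

pred <- matrix(1:4, nrow = 2)
test_y <- matrix(0, nrow = 2, ncol = 2)
err <- sse(pred, test_y)   # 1 + 4 + 9 + 16
```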
Calculates out of sample error on the binomial likelihood
vl_binom(pred, test_y)
pred |
The predicted values. |
test_y |
The test response values. |
Brad Price <[email protected]>
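A typical out-of-sample error based on the binomial likelihood is the negative log-likelihood of the test responses. The sketch below assumes pred holds predicted probabilities strictly between 0 and 1 and test_y holds 0/1 outcomes; the function's exact scaling and input convention are not documented here, so binom_nll is a hypothetical illustration only.

```r
# Sketch only: negative binomial log-likelihood of 0/1 outcomes.
binom_nll <- function(pred, test_y) {
  # pred: probabilities in (0, 1); test_y: observed 0/1 responses.
  -sum(test_y * log(pred) + (1 - test_y) * log(1 - pred))
}

uninformative <- binom_nll(c(0.5, 0.5), c(1, 0))   # equals 2 * log(2)
confident     <- binom_nll(c(0.9, 0.1), c(1, 0))   # smaller error
```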