Title: | Multiple-Instance Logistic Regression with LASSO Penalty |
---|---|
Description: | A multiple-instance data set consists of many independent subjects (called bags), and each subject is composed of several components (called instances). The outcomes of such a data set are binary or categorical responses, and only the subject-level outcomes can be observed. For example, in manufacturing processes, a subject is labeled "defective" if at least one of its components is defective, and "non-defective" otherwise. The 'milr' package focuses on the predictive model for multiple-instance data sets with binary outcomes and performs maximum likelihood estimation with the Expectation-Maximization algorithm under the framework of logistic regression. Moreover, a LASSO penalty is added to the likelihood function for simultaneous parameter estimation and variable selection. |
Authors: | Ping-Yang Chen [aut, cre], ChingChuan Chen [aut], Chun-Hao Yang [aut], Sheng-Mao Chang [aut] |
Maintainer: | Ping-Yang Chen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.1 |
Built: | 2025-01-01 07:06:48 UTC |
Source: | CRAN |
A multiple-instance data set consists of many independent subjects (called bags), and each subject is composed of several components (called instances). The outcomes of such a data set are binary or multinomial, and only the subject-level outcomes can be observed. For example, in manufacturing processes, a subject is labeled "defective" if at least one of its components is defective, and "non-defective" otherwise. The milr package focuses on the predictive model for multiple-instance data sets with binary outcomes and performs maximum likelihood estimation with the Expectation-Maximization algorithm under the framework of logistic regression. Moreover, a LASSO penalty is added to the likelihood function for simultaneous parameter estimation and variable selection.
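The "at least one defective component" rule above can be written as a short formula. The following is a sketch under stated assumptions rather than the package's exact parameterization: instances within a bag are taken to be independent, and the symbols (p_{ij} for the instance-level probability, m_i for the number of instances in bag i, \beta_0 for an intercept, \lambda for the penalty weight) are notation introduced here for illustration. The scaling of the penalty term may differ in the actual implementation.

```latex
% Instance-level probability under logistic regression (notation assumed)
p_{ij} = \frac{1}{1 + \exp\{-(\beta_0 + x_{ij}^{\top}\beta)\}}

% A bag is positive if at least one instance is positive
\Pr(Z_i = 1) = 1 - \prod_{j=1}^{m_i} \left(1 - p_{ij}\right)

% LASSO-penalized log-likelihood over n bags
\ell_{\lambda}(\beta_0, \beta) =
  \sum_{i=1}^{n} \Big[ Z_i \log \Pr(Z_i = 1) + (1 - Z_i) \log\{1 - \Pr(Z_i = 1)\} \Big]
  - \lambda \sum_{k} |\beta_k|
```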
Chen, R.-B., Cheng, K.-H., Chang, S.-M., Jeng, S.-L., Chen, P.-Y., Yang, C.-H., and Hsia, C.-C. (2016). Multiple-Instance Logistic Regression with LASSO Penalty. arXiv:1607.03615 [stat.ML].
Generating the multiple-instance data set.
DGP(n, m, beta)
n | an integer. The number of bags. |
m | an integer or vector of length |
beta | a vector. The true regression coefficients. |
A list including (1) bag-level labels, Z, (2) the design matrix, X, and (3) bag ID of each instance, ID.
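To make the shape of the returned list concrete, here is a minimal base-R sketch of how such data could be simulated. `simulate_mi_data` is a hypothetical helper written for this illustration and is not the package's `DGP()`. The standard normal covariates, the absence of an intercept, and the choice to repeat each bag label once per instance are all assumptions.

```r
# Hypothetical helper, NOT the package's DGP(): simulate multiple-instance data.
simulate_mi_data <- function(n, m, beta) {
  # Recycle a scalar m so that every bag has the same number of instances.
  if (length(m) == 1L) m <- rep(m, n)
  stopifnot(length(m) == n)
  p <- length(beta)
  N <- sum(m)                                   # total number of instances
  # Assumption: standard normal covariates and no intercept column.
  X <- matrix(rnorm(N * p), nrow = N, ncol = p)
  ID <- rep(seq_len(n), times = m)              # bag ID of each instance
  # Instance-level labels drawn from a logistic model.
  Y <- rbinom(N, size = 1L, prob = plogis(drop(X %*% beta)))
  # A bag is positive if at least one of its instances is positive.
  bag_label <- as.integer(tapply(Y, ID, max))
  # Assumption: the bag label is repeated once per instance.
  list(Z = bag_label[ID], X = X, ID = ID)
}

set.seed(1)
sim <- simulate_mi_data(50, sample(3:5, 50, TRUE), runif(10, -5, 5))
str(sim)
```

Only the bag-level labels are returned; the instance labels `Y` stay hidden, which mirrors the setting in which only subject-level outcomes are observed.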
data1 <- DGP(50, 3, runif(10, -5, 5))
data2 <- DGP(50, sample(3:5, 50, TRUE), runif(10, -5, 5))
Fitted Response of milr Fits
## S3 method for class 'milr'
fitted(object, type = "bag", ...)
object | A fitted object of class inheriting from |
type | The type of fitted response required. Default is |
... | further arguments passed to or from other methods. |
Fitted Response of softmax Fits
## S3 method for class 'softmax'
fitted(object, type = "bag", ...)
object | A fitted object of class inheriting from |
type | The type of fitted response required. Default is |
... | further arguments passed to or from other methods. |
Calculate the values of the logit link.
logit(X, beta)
X | A matrix, the design matrix. |
beta | A vector, the coefficients. |
A vector of the values of the logit link.
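The two quantities usually associated with a logit link can be sketched in base R as follows. The function names `linear_predictor` and `inverse_logit` are hypothetical and introduced only for this illustration. Whether the package's `logit()` returns the linear predictor or the corresponding probability is not stated here, so the sketch shows both.

```r
# Hypothetical stand-ins for illustration; NOT the package's logit().
linear_predictor <- function(X, beta) drop(X %*% beta)   # eta = X %*% beta
inverse_logit <- function(eta) 1 / (1 + exp(-eta))       # p = 1 / (1 + exp(-eta))

X <- cbind(1, c(-1, 0, 2))   # assumed layout: intercept column plus one covariate
beta <- c(0.5, 1)

eta <- linear_predictor(X, beta)   # values on the logit (log-odds) scale
p <- inverse_logit(eta)            # corresponding probabilities

# The logit link maps the probabilities back to the linear predictor.
all.equal(log(p / (1 - p)), eta)
```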
Please refer to milr-package.
milr(
  y,
  x,
  bag,
  lambda = 0,
  numLambda = 20L,
  lambdaCriterion = "BIC",
  nfold = 10L,
  maxit = 500L
)
y | a vector. Bag-level binary labels. |
x | the design matrix. The number of rows of |
bag | a vector, bag id. |
lambda | the tuning parameter for LASSO-penalty. If |
numLambda | An integer, the maximum length of the LASSO-penalty sequence in auto-tuning mode ( |
lambdaCriterion | a string, the used optimality criterion for tuning the |
nfold | an integer, the number of folds for cross-validation to choose the optimal |
maxit | an integer, the maximum number of iterations for the EM algorithm. The default is 500. |
An object with S3 class "milr".
lambda | a vector of candidate lambda values. |
cv | a vector of predictive deviance via nfold-fold cross validation when lambdaCriterion = "deviance". |
deviance | a vector of deviance of the candidate model for each candidate lambda value. |
BIC | a vector of BIC of the candidate model for each candidate lambda value. |
best_index | an integer, the index of the best model among candidate lambda values. |
best_model | a list of the information for the best model including deviance (not cv deviance), BIC, chosen lambda, coefficients, fitted values, log-likelihood and variances of coefficients. |
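The relationship between the candidate `lambda` vector, a per-candidate criterion, and `best_index` can be sketched in base R. The log-spaced grid is the one used in the package example. The criterion values in `toy_bic` are made up purely for illustration and are not output from `milr()`. Choosing the candidate that minimizes the criterion is an assumption about how the best index is picked.

```r
# Log-spaced candidate penalties, as in the package example.
lambda_grid <- exp(seq(log(0.01), log(50), length.out = 30))

# Toy criterion values (made up for illustration only): a curve in
# log(lambda) whose minimum lies near lambda = 1.
toy_bic <- log(lambda_grid)^2 + 100

# Assumption: the best model is the one that minimizes the criterion.
best_index <- which.min(toy_bic)
best_lambda <- lambda_grid[best_index]
```

A log-spaced grid places as many candidates between 0.01 and 0.1 as between 5 and 50, which is a common choice when the useful penalty range spans several orders of magnitude.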
set.seed(100)
beta <- runif(5, -5, 5)
trainData <- DGP(40, 3, beta)
testData <- DGP(5, 3, beta)

# default (not use LASSO)
milr_result <- milr(trainData$Z, trainData$X, trainData$ID)
coef(milr_result) # coefficients
fitted(milr_result) # fitted bag labels
fitted(milr_result, type = "instance") # fitted instance labels
summary(milr_result) # summary milr
predict(milr_result, testData$X, testData$ID) # predicted bag labels
predict(milr_result, testData$X, testData$ID, type = "instance") # predicted instance labels

# use BIC to choose penalty (not run)
#milr_result <- milr(trainData$Z, trainData$X, trainData$ID,
#  exp(seq(log(0.01), log(50), length = 30)))
#coef(milr_result) # coefficients
#fitted(milr_result) # fitted bag labels
#fitted(milr_result, type = "instance") # fitted instance labels
#summary(milr_result) # summary milr
#predict(milr_result, testData$X, testData$ID) # predicted bag labels
#predict(milr_result, testData$X, testData$ID, type = "instance") # predicted instance labels

# use auto-tuning (not run)
#milr_result <- milr(trainData$Z, trainData$X, trainData$ID, lambda = -1, numLambda = 20)
#coef(milr_result) # coefficients
#fitted(milr_result) # fitted bag labels
#fitted(milr_result, type = "instance") # fitted instance labels
#summary(milr_result) # summary milr
#predict(milr_result, testData$X, testData$ID) # predicted bag labels
#predict(milr_result, testData$X, testData$ID, type = "instance") # predicted instance labels

# use cv in auto-tuning (not run)
#milr_result <- milr(trainData$Z, trainData$X, trainData$ID,
#  lambda = -1, numLambda = 20, lambdaCriterion = "deviance")
#coef(milr_result) # coefficients
#fitted(milr_result) # fitted bag labels
#fitted(milr_result, type = "instance") # fitted instance labels
#summary(milr_result) # summary milr
#predict(milr_result, testData$X, testData$ID) # predicted bag labels
#predict(milr_result, testData$X, testData$ID, type = "instance") # predicted instance labels
Predict Method for milr Fits
## S3 method for class 'milr'
predict(object, newdata = NULL, bag_newdata = NULL, type = "bag", ...)
object | A fitted object of class inheriting from |
newdata | Default is |
bag_newdata | Default is |
type | The type of prediction required. Default is |
... | further arguments passed to or from other methods. |
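How bag-level and instance-level predictions relate can be sketched in base R. `predict_bags` is a hypothetical helper and not the package's `predict` method. The leading intercept coefficient, the 0.5 cutoff, and the independence of instances within a bag are all assumptions made for this illustration.

```r
# Hypothetical helper; NOT the package's predict method.
predict_bags <- function(beta, newdata, bag_newdata, type = c("bag", "instance")) {
  type <- match.arg(type)
  # Assumption: the first coefficient is an intercept.
  p_inst <- plogis(drop(cbind(1, newdata) %*% beta))
  # Assumption: labels are obtained by thresholding probabilities at 0.5.
  if (type == "instance") return(as.integer(p_inst > 0.5))
  # Bag is positive if at least one instance is positive (independent instances).
  p_bag <- tapply(p_inst, bag_newdata, function(p) 1 - prod(1 - p))
  out <- as.integer(p_bag > 0.5)
  names(out) <- names(p_bag)   # keep the bag IDs as names
  out
}

beta <- c(0, 1)                                  # intercept 0, slope 1
newdata <- matrix(c(-2, -3, 3, -1), ncol = 1)    # four instances, one covariate
bag <- c(1, 1, 2, 2)                             # two bags of two instances

predict_bags(beta, newdata, bag)                     # one label per bag
predict_bags(beta, newdata, bag, type = "instance")  # one label per instance
```

Under this rule a bag can be predicted positive even when every instance probability is below 0.5. Two instances at 0.4 each give a bag probability of 1 - 0.6 * 0.6 = 0.64.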
Predict Method for softmax Fits
## S3 method for class 'softmax'
predict(object, newdata = NULL, bag_newdata = NULL, type = "bag", ...)
object | A fitted object of class inheriting from |
newdata | Default is |
bag_newdata | Default is |
type | The type of prediction required. Default is |
... | further arguments passed to or from other methods. |
This function calculates the alternative maximum likelihood estimation for multiple-instance logistic regression through a softmax function (Xu and Frank, 2004; Ray and Craven, 2005).
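The softmax combining function from the cited literature can be sketched in a few lines of base R. `softmax_bag_prob` is a hypothetical helper for illustration and not the package's `softmax()`. It assumes the bag probability is the exp(alpha * p)-weighted average of the instance probabilities, which reduces to the arithmetic mean at alpha = 0 and approaches the maximum as alpha grows.

```r
# Hypothetical helper; NOT the package's softmax().
# Combines instance probabilities p into one bag probability:
#   S_alpha(p) = sum(p * exp(alpha * p)) / sum(exp(alpha * p))
softmax_bag_prob <- function(p, alpha = 0) {
  stopifnot(alpha >= 0)
  w <- exp(alpha * p)   # larger probabilities receive larger weights
  sum(p * w) / sum(w)
}

p <- c(0.1, 0.2, 0.9)              # toy instance probabilities for one bag
softmax_bag_prob(p, alpha = 0)     # equals mean(p)
softmax_bag_prob(p, alpha = 3)     # between mean(p) and max(p)
softmax_bag_prob(p, alpha = 50)    # very close to max(p)
```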
softmax(y, x, bag, alpha = 0, ...)
y | a vector. Bag-level binary labels. |
x | the design matrix. The number of rows of |
bag | a vector, bag id. |
alpha | A non-negative real number, the softmax parameter. |
... | arguments to be passed to the |
a list including coefficients and fitted values.
S. Ray and M. Craven (2005). Supervised versus multiple instance learning: An empirical comparison. In Proceedings of the 22nd International Conference on Machine Learning, ACM, 697–704.
X. Xu and E. Frank (2004). Logistic regression and boosting for labeled bags of instances. In Advances in Knowledge Discovery and Data Mining, Springer, 272–281.
set.seed(100)
beta <- runif(10, -5, 5)
trainData <- DGP(40, 3, beta)
testData <- DGP(5, 3, beta)

# Fit softmax-MILR model S(0)
softmax_result <- softmax(trainData$Z, trainData$X, trainData$ID, alpha = 0)
coef(softmax_result) # coefficients
fitted(softmax_result) # fitted bag labels
fitted(softmax_result, type = "instance") # fitted instance labels
predict(softmax_result, testData$X, testData$ID) # predicted bag labels
predict(softmax_result, testData$X, testData$ID, type = "instance") # predicted instance labels

# Fit softmax-MILR model S(3) (not run)
# softmax_result <- softmax(trainData$Z, trainData$X, trainData$ID, alpha = 3)