Multiple-instance learning (MIL) is used to model class labels that are associated with bags of observations rather than with individual observations. This technique has been widely used to solve many real-world problems. In an early application of MIL, Dietterich, Lathrop, and Lozano-Pérez (1997) studied the drug-activity prediction problem. A molecule is classified as a good drug if it binds strongly to a binding site on the target molecule. The difficulty is that one molecule can adopt multiple shapes, called conformations, and only one or a few conformations can bind the target molecule well. They therefore described a molecule by a bag of its many possible conformations, whose individual binding strengths remain unknown. An important application of MIL is image and text categorization, as in Maron and Ratan (1998), Andrews, Tsochantaridis, and Hofmann (2003), J. Zhang et al. (2007), Zhou, Sun, and Li (2009), Li et al. (2011), and Kotzias et al. (2015), to name a few. An image (bag) possessing at least one particular pattern (instance) is categorized into one class; otherwise, it is categorized into another class. For example, Maron and Ratan (1998) treated natural scene images as bags, and each bag is categorized as a waterfall scene if at least one of its subimages shows a waterfall. Zhou, Sun, and Li (2009) studied the categorization of collections (bags) of posts (instances) from different newsgroup corpora: a collection is a positive bag if it contains 3% of posts from a target corpus category, while the remaining 97% of its posts, as well as all posts in the negative bags, belong to other corpus categories. MIL is also used in medical research. The UCSB breast cancer study (Kandemir, Zhang, and A. (2014)) is such a case: patients (bags) were diagnosed by doctors as having or not having cancer, but the computer initially had no knowledge of which patterns (instances) were associated with the disease. Furthermore, in manufacturing processes (Chen et al. (2016)), a product (bag) is defective as long as one or more of its components (instances) are defective. In practice, at the initial stage, we only know that a product is defective and have no idea which component is responsible for the defect.
Several approaches have been proposed to analyze datasets with multiple instances, e.g., Maron (1998), Ray and Craven (2005), Xu and Frank (2004), and Q. Zhang and Goldman (2002). From our point of view, the statuses of these components are missing variables, and thus the Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin (1977)) can play a role in multiple-instance learning. The toolboxes and libraries currently available for MIL methods are implemented in other software. For example, Yang (2008) and Tax and Cheplygina (2016) are implemented in MATLAB, but neither of them offers methods based on the logistic regression model. Settles, Craven, and Ray (2008) provided Java code that includes the method introduced in Ray and Craven (2005). Thus, for R users, we are the first to develop a MIL-related package based on logistic regression modelling, called multiple-instance logistic regression (MILR). In this package, we first apply the logistic regression defined in Ray and Craven (2005) and Xu and Frank (2004), and then use the EM algorithm to obtain the maximum likelihood estimates of the regression coefficients. In addition, the popular lasso penalty (Tibshirani (1996)) is applied to the likelihood function so that parameter estimation and variable selection can be performed simultaneously. This feature is especially desirable when the number of covariates is relatively large.
To fix ideas, we first define the notation and introduce the construction of the likelihood function. Suppose that the dataset consists of n bags and that there are mi instances in the ith bag for i = 1, …, n. Let Zi denote the status of the ith bag, and let Yij be the status of the jth instance in the ith bag, with xij ∈ ℜp as the corresponding covariates. We assume that the Yij follow independent Bernoulli distributions with defect rates pij, where pij = g(β0 + xijTβ) and g(x) = 1/(1 + e−x). We also assume that the Zi follow independent Bernoulli distributions with defect rates πi. Therefore, the bag-level likelihood function is $$L\left(\beta_0,\beta\right)=\prod_{i=1}^n\pi_i^{z_i}\left(1-\pi_i\right)^{1-z_i}~.$$
To associate the bag-level defect rate πi with the instance-level defect rates pij, several methods have been proposed. The bag-level status is defined as $Z_i=I\left(\sum_{j=1}^{m_i}Y_{ij}>0\right)$. If the independence assumption among the Yij holds, the bag-level defect rate is $\pi_i=1-\prod_{j=1}^{m_i}(1-p_{ij})$. On the other hand, if the independence assumption may not hold, Xu and Frank (2004) and Ray and Craven (2005) proposed the softmax function to associate πi with the pij, as follows: $$s_i(\alpha)=\frac{\sum_{j=1}^{m_i}p_{ij}\exp\left(\alpha p_{ij}\right)}{\sum_{j=1}^{m_i}\exp\left(\alpha p_{ij}\right)}~,$$
where α is a pre-specified nonnegative value. Xu and Frank (2004) used α = 0, thereby modeling πi as the average of pij, j = 1, …, mi, whereas Ray and Craven (2005) suggested α = 3. We observe that the likelihood (), whether it applies the πi function or the si(α) function, results in effective estimators.
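As a concrete illustration, both bag-level links can be computed with a few lines of R. The helper names below are ours for exposition only (they are not part of milr), and the softmax form follows the definition above.

# instance-level defect rates p_ij of one bag (a numeric vector)
bagProbIndep <- function(p) 1 - prod(1 - p)   # independence link: 1 - prod(1 - p_ij)
bagProbSoftmax <- function(p, alpha = 0) sum(p * exp(alpha * p)) / sum(exp(alpha * p))
p <- c(0.1, 0.6, 0.2)
bagProbIndep(p)        # 0.712
bagProbSoftmax(p, 0)   # 0.3, the simple average of the p_ij
bagProbSoftmax(p, 3)   # moves toward max(p_ij) as alpha grows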
Below, we begin by establishing the E-steps and M-steps required for the EM algorithm and then attach the lasso penalty for estimation and feature selection. Several of the computational strategies we apply are the same as those addressed in Friedman, Hastie, and Tibshirani (2010). Finally, we demonstrate the functions provided in the milr package via simulations and a real dataset.
If the instance-level statuses, yij, were observable, the complete data likelihood would be $$\prod_{i=1}^n\prod_{j=1}^{m_i}p_{ij}^{y_{ij}}q_{ij}^{1-y_{ij}}~,$$ where qij = 1 − pij. An ordinary approach, such as the Newton method, could then be used to obtain the maximum likelihood estimate (MLE). However, for multiple-instance data, we can only observe the statuses of the bags, $Z_i=I\left(\sum_{j=1}^{m_i}Y_{ij}>0\right)$, and not the statuses of the instances Yij. As a result, we apply the EM algorithm to obtain the MLEs of the parameters by treating the instance-level labels as missing data.
In the E-step, the two conditional distributions of the missing data given the bag-level statuses Zi are Pr(Yi1 = 0, …, Yimi = 0 ∣ Zi = 0) = 1 and $$ Pr\left(Y_{ij}=y_{ij}, \quad j=1,\dots, m_i \mid Z_i=1\right) = \frac{ \prod_{j=1}^{m_i}p_{ij}^{y_{ij}}q_{ij}^{1-y_{ij}}\times I\left(\sum_{j=1}^{m_i}y_{ij}>0\right) }{1-\prod_{l=1}^{m_i}q_{il}}. $$ Thus, the conditional expectations are $$ E\left(Y_{ij}\mid Z_i=0\right)=0 \quad\text{and}\quad \gamma_{ij}\equiv E\left(Y_{ij}\mid Z_i=1\right)=\frac{p_{ij}}{1-\prod_{l=1}^{m_i}q_{il}}~. $$
The Q function at step t is $Q\left(\beta_0,\beta\mid\beta_0^t,\beta^t\right) = \sum_{i=1}^nQ_i\left(\beta_0,\beta\mid\beta_0^t,\beta^t\right)$, where Qi is the conditional expectation of the complete log-likelihood for the ith bag given Zi, which is defined as $$ Q_i\left(\beta_0,\beta\mid\beta_0^t,\beta^t\right) = \sum_{j=1}^{m_i}\left[z_i\gamma_{ij}^t\log p_{ij} + \left(1-z_i\gamma_{ij}^t\right)\log q_{ij}\right]~. $$
Note that all the pij, qij, and γij are functions of β0 and β, and thus, we define these functions by substituting β0 and β by their current estimates β0t and βt to obtain pijt, qijt, and γijt, respectively.
In the M-step, we maximize this Q function with respect to (β0, β). Since the maximization of the nonlinear Q function is computationally expensive, following Friedman, Hastie, and Tibshirani (2010), the quadratic approximation to Q is applied. Taking the second-order Taylor expansion about β0t and βt, we have Q(β0, β ∣ β0t, βt) = QQ(β0, β ∣ β0t, βt) + C + R2(β0, β ∣ β0t, βt), where C is a constant in terms of β0 and β, R2(β0, β ∣ β0t, βt) is the remainder term of the expansion, and $$ Q_Q\left(\beta_0,\beta\mid \beta_0^t,\beta^t\right) = -\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^{m_i}w_{ij}^t\left[u_{ij}^t-\beta_0-x_{ij}^T\beta\right]^2, $$ where uijt = β0t + xijTβt + (ziγijt − pijt)/(pijtqijt) and wijt = pijtqijt. In the milr package, instead of maximizing Q(β0, β ∣ β0t, βt), we maximize its quadratic approximation, QQ(β0, β ∣ β0t, βt). Since the objective function is quadratic, the roots of ∂QQ/∂β0 and ∂QQ/∂β have closed-form representations.
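To make the E- and M-steps concrete, the following sketch performs a single EM update for the unpenalized model. It is only an illustration of the formulas above under our own naming; em_step is not the routine used inside milr, which additionally handles the penalty and convergence checks.

expit <- function(eta) 1 / (1 + exp(-eta))
# one EM update; x: N x p design matrix, bag: bag index per instance,
# z: bag-level label repeated for every instance of the bag
em_step <- function(beta0, beta, x, bag, z) {
  eta <- beta0 + drop(x %*% beta)
  p <- expit(eta); q <- 1 - p
  # E-step: gamma_ij = E(Y_ij | Z_i = 1) = p_ij / (1 - prod_l q_il)
  gamma <- p / (1 - tapply(q, bag, prod)[as.character(bag)])
  # M-step via the quadratic approximation: weighted least squares with
  # working responses u_ij and weights w_ij = p_ij * q_ij
  w <- p * q
  u <- eta + (z * gamma - p) / w
  xmat <- cbind(1, x)
  est <- solve(t(xmat) %*% (w * xmat), t(xmat) %*% (w * u))
  list(beta0 = est[1], beta = est[-1])
}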
We adopt the lasso method (Tibshirani (1996)) to identify active features in this MILR framework. The key is to add the L1 penalty to the objective function in the M-step so that the EM algorithm is capable of performing estimation and variable selection simultaneously. To this end, we rewrite the objective function as $$ \max_{\beta_0,\beta}\; Q_Q\left(\beta_0,\beta\mid\beta_0^t,\beta^t\right) - \lambda\sum_{k=1}^{p}\left|\beta_k\right|~. $$
Note that the intercept term β0 is always kept in the model; thus, we do not place a penalty on β0. In addition, λ is the tuning parameter; we will discuss how to determine this parameter later. We apply the shooting algorithm (Fu (1998); Chen et al. (2016)) to update (β0t, βt).
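Each coordinate-wise update in the penalized M-step has the familiar soft-thresholding form. The sketch below is our own schematic of such an update in the spirit of the shooting algorithm; the actual milr code and its scaling of λ may differ.

# soft-thresholding operator
softThreshold <- function(a, lambda) sign(a) * pmax(abs(a) - lambda, 0)
# update of the k-th coefficient given weights w, working responses u,
# the current linear predictor eta, and the k-th design column xk
updateBetaK <- function(beta_k, xk, w, u, eta, lambda) {
  r <- u - (eta - xk * beta_k)   # partial residual excluding the k-th term
  softThreshold(sum(w * xk * r), lambda) / sum(w * xk^2)
}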
The milr package contains a data generator, DGP, which is used to generate multiple-instance data for the simulation studies, and two estimation approaches, milr and softmax, which are the main tools for modeling multiple-instance data. In this section, we introduce the usage and default setups of these functions.
The function DGP is the generator of multiple-instance data under the MILR framework. To use the DGP function, the user needs to specify an integer n as the number of bags, a vector m of length n as the numbers of instances in each bag, and a vector beta containing the intercept β0 and the regression coefficients β, as in DGP(n, m, beta). Note that one can set m to a single integer to generate data with an equal instance size m for each bag. Thus, the total number of observations is $N=\sum_{i=1}^n m_i$. The DGP function then simulates the labels of the bags according to the MILR model defined above.
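For instance, a small call of the following form generates five bags with three instances each; here we assume, as in the examples below, that the first element of beta is the intercept and that the returned list contains the labels Z, the design matrix X, and the bag indices ID.

library(milr)
set.seed(1)
toy <- DGP(5, 3, c(-1, 1, -1))   # intercept -1 and two covariates
str(toy)                         # components Z, X, and ID
table(toy$ID)                    # three instances per bag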
In the milr package, we provide two approaches to model multiple-instance data: the proposed milr (Chen et al. (2016)) and the softmax approach (Xu and Frank (2004)). To implement these two approaches, we assume that the numbers of observations and covariates are N and p, respectively. The input data for both milr and softmax are separated into three parts: the bag-level statuses, y, as a vector of length N; the N × p design matrix, x; and bag, the vector of indices of length N representing the bag to which each instance belongs.
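Schematically, the three inputs are aligned instance by instance; the toy values below are hypothetical and only illustrate the expected layout.

# three bags with 2, 3, and 2 instances (N = 7)
bag <- c(1, 1, 2, 2, 2, 3, 3)           # bag membership of each instance
y   <- c(1, 1, 0, 0, 0, 1, 1)           # bag-level status repeated within each bag
x   <- matrix(rnorm(7 * 2), ncol = 2)   # N x p design matrix
# fit <- milr(y, x, bag)                # basic call; lambda = 0 by default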
For the milr function, specifying lambda in different ways controls whether and how the lasso penalty participates in parameter estimation. The default value of lambda is 0. With this value, the ordinary MLE is applied, i.e., no penalty term is considered. This is the suggested choice when the number of covariates p is small. When p is large or when variable selection is desired, users can specify a λ vector of length κ; otherwise, by letting lambda = -1, the program automatically provides a λ vector of length κ = numLambda as the tuning set. Following Friedman, Hastie, and Tibshirani (2010), the theoretical maximal value of λ in (), λmax, is the smallest λ under which all penalized coefficients are shrunk to zero. The automatically specified sequence of λ values ranges from λmin = λmax/1000 to λmax in ascending order.
The default setting for choosing the optimal λ among these λ values is the Bayesian information criterion (BIC), −2log(likelihood) + p* × log(n), where p* is the number of nonzero regression coefficients. Alternatively, the user can use the options lambdaCriterion = "deviance" and nfold = K with an integer K to obtain the best λ that minimizes the predictive deviance through ‘bag-wise’ K-fold cross validation. The last option, maxit, indicates the maximal number of iterations of the EM algorithm; its default value is 500.
For the softmax function, the option alpha is a nonnegative real number for the α value in (). The maximum likelihood estimators of the regression coefficients are obtained by the generic function optim. Note that no variable selection approach is implemented for this method.
Two generic accessor functions, coef and fitted, can be used to extract the regression coefficients and the fitted bag-level labels returned by milr and softmax. We also provide a significance test based on Wald's test for the milr estimates without the lasso penalty through the summary function. In addition, to predict the bag-level statuses for a new dataset, the predict function can be used by assigning three items: object is the fitted model obtained by milr or softmax, newdata is the covariate matrix, and bag_newdata is the vector of bag indices of the new dataset.
Finally, the MIL model can be used to predict both the bag-level and the instance-level labels. The option type in the fitted and predict functions controls the type of output labels. The default is type = "bag", which returns the bag-level predictions. Otherwise, by setting type = "instance", the instance-level labels are returned.
We illustrate the usage of the milr package via simulated and real examples.
We demonstrate how to apply the milr function for model estimation and variable selection. We first simulate data with n = 30 bags, each containing m = 3 instances, and with true coefficients (β0, β1, β2, β3) = (−2, −1, 2, 0.5), so that all three covariates are active.
library(milr)
library(pipeR)
set.seed(99)
# set the size of dataset
numOfBag <- 30
numOfInstsInBag <- 3
# set true coefficients: beta_0, beta_1, beta_2, beta_3
trueCoefs <- c(-2, -1, 2, 0.5)
trainData <- DGP(numOfBag, numOfInstsInBag, trueCoefs)
colnames(trainData$X) <- paste0("X", 1:ncol(trainData$X))
(instanceResponse <- as.numeric(with(trainData, tapply(Z, ID, any))))
## [1] 0 1 0 1 0 1 1 1 0 0 1 1 1 0 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1
Since the number of covariates is small, we use the milr function to estimate the model parameters with lambda = 0. One can apply summary to produce results including the estimates of the regression coefficients and their corresponding standard errors, test statistics, and P-values under Wald's test. The regression coefficients are returned by the function coef.
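The fitted object milrFit_EST used in the following chunks can be obtained with a call of this form; its summary and coefficients correspond to the output shown below.

milrFit_EST <- milr(trainData$Z, trainData$X, trainData$ID, lambda = 0)
summary(milrFit_EST)
coef(milrFit_EST)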
## Use the user-defined lambda vector.
## Log-Likelihood: -15.550.
## Chosen Penalty: 0.000.
## Estimates:
## Estimate
## intercept -1.4016
## X1 1.4028
## X2 0.3787
## X3 1.0886
## intercept X1 X2 X3
## -1.4015728 1.4027912 0.3787156 1.0886233
The generic function table builds a contingency table of counts comparing the true bag-level statuses with the fitted bag-level statuses (obtained with the option type = "bag"), and the predict function is used to predict the labels of each bag from the corresponding covariates. The fitted and predicted instance-level statuses can also be obtained by setting type = "instance" in the fitted and predict functions.
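For example, the bag-level fitted labels shown next can be obtained with:

fitted(milrFit_EST, type = "bag")   # bag-level fitted labels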
## [1] 1 0 1 1 0 0 1 1 1 0 1 0 1 0 0 0 1 1 0 1 1 0 1 1 1 1 0 1 1 1
# fitted(milrFit_EST, type = "instance") # instance-level fitted labels
table(DATA = instanceResponse, FITTED = fitted(milrFit_EST, type = "bag"))
## FITTED
## DATA 0 1
## 0 6 5
## 1 5 14
# predict for testing data
testData <- DGP(numOfBag, numOfInstsInBag, trueCoefs)
colnames(testData$X) <- paste0("X", 1:ncol(testData$X))
(instanceResponseTest <- as.numeric(with(testData, tapply(Z, ID, any))))
## [1] 0 1 0 1 0 1 1 1 0 0 1 1 1 0 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1
pred_EST <- with(testData, predict(milrFit_EST, X, ID, type = "bag"))
# predict(milrFit_EST, testData$X, testData$ID,
# type = "instance") # instance-level prediction
table(DATA = instanceResponseTest, PRED = pred_EST)
## PRED
## DATA 0 1
## 0 4 7
## 1 7 12
Next, the n < p case is considered. We generate a dataset with n = 30 bags, each with 3 instances and p = 50 covariates. Among these covariates, only the first five, X1, …, X5, are active, with nonzero coefficients (−2, −1, 1, 2, 0.5). First, we manually specify 20 λ values ranging from 0.01 to 20. The milr function chooses the best tuning parameter, i.e., the one resulting in the smallest BIC. For this dataset, the chosen model is a constant model.
set.seed(99)
# Set the new coefficient vector (large p)
trueCoefs_Lp <- c(-2, -2, -1, 1, 2, 0.5, rep(0, 45))
# Generate the new training data with large p
trainData_Lp <- DGP(numOfBag, numOfInstsInBag, trueCoefs_Lp)
colnames(trainData_Lp$X) <- paste0("X", 1:ncol(trainData_Lp$X))
# variable selection by user-defined tuning set
lambdaSet <- exp(seq(log(0.01), log(20), length = 20))
milrFit_VS <- with(trainData_Lp, milr(Z, X, ID, lambda = lambdaSet))
# grep the active factors and their corresponding coefficients
coef(milrFit_VS) %>>% `[`(abs(.) > 0)
## intercept
## -0.7056211
Second, we try the auto-tuning feature implemented in milr by assigning lambda = -1. The total number of tuning λ values is indicated by setting numLambda. The following example shows the result of the best model chosen among 5 λ values. The component $lambda stores the auto-tuned λ candidates, and the component $BIC returns the corresponding BIC value for each candidate λ. Again, the chosen model is a constant model.
# variable selection using auto-tuning
milrFit_auto_VS <- milr(trainData_Lp$Z, trainData_Lp$X, trainData_Lp$ID,
lambda = -1, numLambda = 5)
# the auto-selected lambda values
milrFit_auto_VS$lambda
## [1] 0.04516636 0.25398910 1.42828569 8.03184065 45.16635916
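# the corresponding BIC value for each candidate lambda
milrFit_auto_VS$BIC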
## [1] 109.27500 66.61280 49.09209 40.05306 40.05306
# grep the active factors and their corresponding coefficients
coef(milrFit_auto_VS) %>>% `[`(abs(.) > 0)
## intercept
## -0.7056231
Instead of using BIC, a better way to choose the proper λ is cross validation, requested by setting lambdaCriterion = "deviance". The following example shows the best model chosen by minimizing the predictive deviance via ‘bag-wise’ 3-fold cross validation. The predictive deviance for every candidate λ can be found in the component $cv. For this simulated dataset, the chosen model is again the constant model.
# variable selection using auto-tuning with cross validation
milrFit_auto_CV <- milr(trainData_Lp$Z, trainData_Lp$X, trainData_Lp$ID,
lambda = -1, numLambda = 5,
lambdaCriterion = "deviance", nfold = 3)
# the values of predictive deviance under each lambda value
milrFit_auto_CV$cv
## [1] 16.07815 17.03320 13.58422 12.21729 12.21729
# grep the active factors and their corresponding coefficients
coef(milrFit_auto_CV) %>>% `[`(abs(.) > 0)
## intercept
## -0.7056231
According to another simulation study not shown here, BIC does not perform as well as cross-validation for variable selection in multiple-instance logistic regression. However, it can be an alternative when cross-validation is too time consuming.
Hereafter, we denote the proposed method with the lasso penalty by MILR-LASSO for brevity. In the following, we demonstrate the usage of MILR-LASSO and the softmax approach on a real dataset called MUSK1. The MUSK1 dataset consists of 92 molecules (bags), of which 47 are classified as having a musky smell and 45 are classified as non-musk. A molecule is musky if at least one of its conformers (instances) is responsible for the musky smell; however, which conformers are responsible is unknown. There are 166 features that describe the shape, or conformation, of the molecules. The goal is to predict whether a new molecule is musk or non-musk. This dataset is one of the popular benchmark datasets in the field of multiple-instance learning, and it can be downloaded from the following link.
dataName <- "MIL-Data-2002-Musk-Corel-Trec9.tgz"
dataUrl <- "http://www.cs.columbia.edu/~andrews/mil/data/"
Here is the code that uses the untar function to decompress the downloaded file and extract the MUSK1 dataset. Then, with the following data preprocessing, we reassemble the MUSK1 dataset in a "data.frame" format. The first 2 columns of the MUSK1 dataset are the bag indices and the bag-level labels of each observation. Starting with the third column, there are p = 166 covariates in the MUSK1 dataset.
filePath <- file.path(getwd(), dataName)
# Download MIL data sets from the url (not run)
# if (!file.exists(filePath))
# download.file(paste0(dataUrl, dataName), filePath)
# Extract MUSK1 data file (not run)
# if (!dir.exists("MilData"))
# untar(filePath, files = "musk1norm.svm")
# Read and Preprocess MUSK1
library(data.table)
MUSK1 <- fread("musk1norm.svm", header = FALSE) %>>%
`[`(j = lapply(.SD, function(x) gsub("\\d+:(.*)", "\\1", x))) %>>%
`[`(j = c("bag", "label") := tstrsplit(V1, ":")) %>>%
`[`(j = V1 := NULL) %>>% `[`(j = lapply(.SD, as.numeric)) %>>%
`[`(j = `:=`(bag = bag + 1, label = (label + 1)/2)) %>>%
setnames(paste0("V", 2:(ncol(.)-1)), paste0("V", 1:(ncol(.)-2))) %>>%
`[`(j = paste0("V", 1:(ncol(.)-2)) := lapply(.SD, scale),
.SDcols = paste0("V", 1:(ncol(.)-2)))
X <- paste0("V", 1:(ncol(MUSK1) - 2), collapse = "+") %>>%
(paste("~", .)) %>>% as.formula %>>% model.matrix(MUSK1) %>>% `[`( , -1L)
Y <- as.numeric(with(MUSK1, tapply(label, bag, function(x) sum(x) > 0)))
To fit an MIL model without variable selection, the milr package provides two functions. The first is the milr function with lambda = 0. The second is the softmax function with a specific value of alpha. Here, we apply the approaches introduced in Xu and Frank (2004) and Ray and Craven (2005), called the s(0) (alpha = 0) and s(3) (alpha = 3) methods, respectively. The optimization method in softmax is the default setting of the generic function optim, that is, the Nelder-Mead method.
# softmax with alpha = 0
softmaxFit_0 <- softmax(MUSK1$label, X, MUSK1$bag, alpha = 0,
control = list(maxit = 5000))
# softmax with alpha = 3
softmaxFit_3 <- softmax(MUSK1$label, X, MUSK1$bag, alpha = 3,
control = list(maxit = 5000))
# use a very small lambda so that milr does the estimation
# without evaluating the Hessian matrix
milrFit <- milr(MUSK1$label, X, MUSK1$bag, lambda = 1e-7, maxit = 5000)
For variable selection, we apply the MILR-LASSO approach. First, the tuning parameter set is chosen automatically by setting lambda = -1, and the best λ value is obtained by minimizing the predictive deviance with 3-fold cross validation among numLambda = 20 candidates.
# MILR-LASSO
milrSV <- milr(MUSK1$label, X, MUSK1$bag, lambda = -1, numLambda = 20,
nfold = 3, lambdaCriterion = "deviance", maxit = 5000)
# show the detected active covariates
sv_ind <- names(which(coef(milrSV)[-1L] != 0)) %>>%
(~ print(.)) %>>% match(colnames(X))
## [1] "V31" "V36" "V37" "V76" "V83" "V105" "V106" "V108" "V109" "V116"
## [11] "V118" "V124" "V126" "V129" "V136" "V147" "V162" "V163"
# use a very small lambda so that milr does the estimation
# without evaluating the Hessian matrix
milrREFit <- milr(MUSK1$label, X[ , sv_ind], MUSK1$bag,
lambda = 1e-7, maxit = 5000)
# Confusion matrix of the fitted model
table(DATA = Y, FIT_MILR = fitted(milrREFit, type = "bag"))
## FIT_MILR
## DATA 0 1
## 0 39 6
## 1 3 44
We use 3-fold cross validation to compare the prediction accuracy of four MIL models: s(0), s(3), the MILR model with all covariates, and the MILR model fitted with the covariates selected by MILR-LASSO. Their prediction accuracy is summarized in the following confusion matrices.
set.seed(99)
predY <- matrix(0, length(Y), 4L) %>>%
`colnames<-`(c("s0","s3","milr","milr_sv"))
folds <- 3
foldBag <- rep(1:folds, floor(length(Y) / folds) + 1,
length = length(Y)) %>>% sample(length(.))
foldIns <- rep(foldBag, table(MUSK1$bag))
for (i in 1:folds) {
# prepare training and testing sets
ind <- which(foldIns == i)
# train models
fit_s0 <- softmax(MUSK1[-ind, ]$label, X[-ind, ], MUSK1[-ind, ]$bag,
alpha = 0, control = list(maxit = 5000))
fit_s3 <- softmax(MUSK1[-ind, ]$label, X[-ind, ], MUSK1[-ind, ]$bag,
alpha = 3, control = list(maxit = 5000))
# milr, use a very small lambda so that milr does the estimation
# without evaluating the Hessian matrix
fit_milr <- milr(MUSK1[-ind, ]$label, X[-ind, ], MUSK1[-ind, ]$bag,
lambda = 1e-7, maxit = 5000)
fit_milr_sv <- milr(MUSK1[-ind, ]$label, X[-ind, sv_ind], MUSK1[-ind, ]$bag,
lambda = 1e-7, maxit = 5000)
# store the predicted labels
ind2 <- which(foldBag == i)
# predict function returns bag response in default
predY[ind2, 1L] <- predict(fit_s0, X[ind, ], MUSK1[ind, ]$bag)
predY[ind2, 2L] <- predict(fit_s3, X[ind, ], MUSK1[ind, ]$bag)
predY[ind2, 3L] <- predict(fit_milr, X[ind, ], MUSK1[ind, ]$bag)
predY[ind2, 4L] <- predict(fit_milr_sv, X[ind, sv_ind], MUSK1[ind, ]$bag)
}
table(DATA = Y, PRED_s0 = predY[ , 1L])
## PRED_s0
## DATA 0 1
## 0 30 15
## 1 4 43
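table(DATA = Y, PRED_s3 = predY[ , 2L])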
## PRED_s3
## DATA 0 1
## 0 26 19
## 1 2 45
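table(DATA = Y, PRED_MILR = predY[ , 3L])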
## PRED_MILR
## DATA 0 1
## 0 31 14
## 1 10 37
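table(DATA = Y, PRED_MILR_SV = predY[ , 4L])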
## PRED_MILR_SV
## DATA 0 1
## 0 34 11
## 1 8 39
This vignette introduces the usage of the R package milr for analyzing multiple-instance data under the framework of logistic regression. In particular, the package contains two approaches: summarizing the mean responses within each bag using the softmax function (Xu and Frank (2004), Ray and Craven (2005)), and treating the instance-level statuses as hidden information and applying the EM algorithm for estimation (Chen et al. (2016)). In addition, to estimate the MILR model, a lasso-type variable selection technique is incorporated into the latter approach. The limitations of the developed approaches are as follows. First, we ignore the potential dependency among instance statuses within a bag; random effects could be incorporated into the proposed logistic regression to capture such dependency. Second, according to a preliminary simulation study not shown here, the maximum likelihood estimator might be biased when the number of instances in a bag is large, say, mi = 100 or more. Bias reduction methods, such as those of Firth (1993) and Quenouille (1956), can be applied to alleviate this bias. These extensions are deferred to future work.