Main functions
Functions are coded for easy read and easy understand. * lnmbiclust * lnmfa * plnmfa
1 lnmbiclust is the main function that perform our algorithm, which includes default initial values, main estimations as well as model selection. For illustration, we will generate a simulation data from model “GUU” as follows:
set.seed(123)
n <- 40
simp <- rmultinom(n,1,c(0.6,0.4))
lab <- as.factor(apply(t(simp),1,which.max))
#parameter comes from multinomial
p <- 11
mu1 <- c(-2.8,-1.3,-1.6,-3.9,-2.6,-2.9,-2.5,-2.7,-3.1,-2.9)
B1 <- matrix(c(1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,0,1,0,0,0,0),nrow = p-1)
T1 <- diag(c(2.9,0.5,1))
D1 <- diag(c(0.52, 1.53, 0.56, 0.19, 1.32, 1.77, 0.6, 0.53, 0.37, 0.4))
cov1 <- B1%*%T1%*%t(B1)+D1
mu2 <- c(1.5,-2.7,-1.1,-0.4,-1.4,-2.6,-3,-3.9,-2.7,-3)
B2 <- matrix(c(1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,0,1,0,0,0,0),nrow = p-1)
T2 <- diag(c(0.2,0.003,0.15))
D2 <- diag(c(0.01, 0.62, 0.45, 0.01, 0.37, 0.42, 0.08, 0.16, 0.23, 0.27))
cov2 <- B2%*%T2%*%t(B2)+D2
df <- matrix(0,nrow=n,ncol=p-1)
for (i in 1:n) {
if(lab[i]==1){df[i,] <- rmvnorm(1,mu1,sigma = cov1)}
else if(lab[i]==2){df[i,] <- rmvnorm(1,mu2,sigma = cov2)}
}
f_df <- cbind(df,0)
z <- exp(f_df)/rowSums(exp(f_df))
W_count <- matrix(0,nrow=n,ncol=p)
for (i in 1:n) {
W_count[i,] <- rmultinom(1,runif(1,10000,20000),z[i,])
}
After generated data, we can strat to try to fit one model:
range_G <- 2 #define the number of components
range_Q <- 2 #define the possible number of bicluster.
cov_str <- "GUU" #select the model you want to fit
#It will fit GUU model with G=2, Q=c(2,2)
res <- lnmbiclust(W_count=W_count, range_G=range_G, range_Q=range_Q, model=cov_str)
#where res will be a list contain all parameters.
Notice the default setting is to run all 16 models if parameter model is missing. There are 3 criteria you can choose: AIC, BIC(default) and ICL.
If you don’t want to fit a model with specific G or Q, the function can do model selection based on criteria you choose. The output will contain two lists in res, one is the paramters of the best model selected by BIC(AIC or ICL), the other one is a dataframe of model names along with AIC, BIC and ICL values for all models that have ran.
range_G <- c(2:3)
range_Q <- c(2:3)
cov_str <- c("UUU", "GGC")
res <- lnmbiclust(W_count=W_count, range_G=range_G, range_Q=range_Q, model=cov_str, criteria="BIC")
best_model=res$best_model
it will run G=2, Q_g=c(2,2); G=3, Q_g=c(2,2,2); G=2, Q_g=c(3,3);G=3, Q_g=c(3,3,3). In total 4 models for each UUU and GGC, then select the best one based on BIC. If you want to include permutations in UUU, UUG, UUD or UUC:
range_G <- 2
range_Q <- c(2:3)
cov_str <- "UUU"
res <- lnmbiclust(W_count=W_count, range_G=range_G, range_Q=range_Q, model=cov_str, criteria="BIC",permutation=TRUE)
res$best_model
it will run G=2, Q_g=c(2,2); G=2, Q_g=c(3,3); G=2, Q_g=c(2,3);G=2, Q_g=c(3,2). In total 4 models for UUU.
Sometimes you may want to know more detail about models that ran. For example, if the BIC values are very close between two models, then they may equally good. Only choose the best one with highest BIC is not fair. Here we have output all_fitted_model under lnmbiclust, which gives the output of all models you have ran and model selection criteria.
It will return a dataframe with all combinations of G and Q for all models you have included, decreasing ordered as the criteria you specified, default is ordered by BIC.
2 lnmfa The usage is exactly the same as lnmbiclust. Except it doesn’t have parameter permutation. The only difference would be the model name. Since in this model, the T is fix as identity matrix, the middle position will also fixed as U. So all the 8 models will be: UUU, UUG, UUD, UUC, GUU, GUG, GUD, GUC.
range_G <- c(2:3)
range_Q <- c(2:3)
cov_str <- c("UUU", "GUC")
res <- lnmfa(W_count=W_count, range_G=range_G, range_Q=range_Q, model=cov_str, criteria="BIC")
best_model=res$best_model
model_output=res$all_fitted_model
3 plnmfa The usage is exactly the same as lnmfa, with additional tunning parameters. In here, range_Q need to be specified by a number instead of a range. The range_tuning could be a range of number which is between 0 and 1. And it doesn’t allow model selections between the two model, so you have to specify the model name between UUU and GUU.