Title: | Boolean Matrix Factorization |
---|---|
Description: | Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>. |
Authors: | Abdelmoneim Amer Desouki |
Maintainer: | Abdelmoneim Amer Desouki <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2024-12-02 06:34:14 UTC |
Source: | CRAN |
Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>.
The DESCRIPTION file:
Package: | rBMF |
Type: | Package |
Title: | Boolean Matrix Factorization |
Version: | 1.1 |
Date: | 2021-1-13 |
Author: | Abdelmoneim Amer Desouki |
Maintainer: | Abdelmoneim Amer Desouki <[email protected]> |
Depends: | R (>= 3.2.0), Matrix, methods, Rcpp |
Description: | Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>. |
License: | GPL-3 |
NeedsCompilation: | no |
Packaged: | 2021-01-13 14:57:05 UTC; abdel |
Repository: | CRAN |
Date/Publication: | 2021-01-13 16:40:02 UTC |
Index of help topics:
Asso_approximate Asso: Boolean Matrix Factorization Chess Chess dataset DBLP DBLP dataset GreConD GreConD Boolean matrix factorization GreConDPlus GreConDPlus Boolean Matrix Factorization rBMF-package Boolean Matrix Factorization topFiberM topFiberM
Abdelmoneim Amer Desouki
topFiberM -Desouki, A. A., Röder, M., & Ngomo, A. C. N. (2019). topFiberM: Scalable and Efficient Boolean Matrix Factorization. arXiv preprint arXiv:1903.10326.
Asso -Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE transactions on knowledge and data engineering, 20(10), 1348-1362.
GreConD, GreConDPlus -Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20
topFiberM
Asso_approximate
GreConD
GreConDPlus
data(DBLP) X=DBLP r=7 Xb=X==1#Convert to boolean tempX=as(X,'TsparseMatrix') stats=NULL for(tP in c(0.2,0.3,0.4,0.5,0.6,0.7,0.8,1)){ Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') #Calculate metrics li=tempX@i[tempX@x==1]+1 lj=tempX@j[tempX@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) stats=rbind(stats,cbind(tP,tp,fn,fp,cv,P=tp*1.0/(tp+fp),R=tp*1.0/(tp+fn))) } plot(stats[,'tP'],stats[,'R'],type='b',col='red',lwd=2, main=sprintf('topFiberM, dataset: %s, #Known facts:%d','DBLP',sum(X)),ylab="",xlab='tP', xlim=c(0,1),ylim=c(0,1)) HM=apply(stats,1,function(x){2/(1/x['P']+1/x['R'])}) points(stats[,'tP'],stats[,'P'],col='blue',lwd=2,type='b') points(stats[,'tP'],HM,col='green',lwd=2,type='b') grid(nx=10, lty = "dotted", lwd = 2) legend(legend=c('Recall','Precision','Harmonic mean'),col=c('red','blue','green'), x=0.6,y=0.2,pch=1,cex=0.75,lwd=2)
data(DBLP) X=DBLP r=7 Xb=X==1#Convert to boolean tempX=as(X,'TsparseMatrix') stats=NULL for(tP in c(0.2,0.3,0.4,0.5,0.6,0.7,0.8,1)){ Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') #Calculate metrics li=tempX@i[tempX@x==1]+1 lj=tempX@j[tempX@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) stats=rbind(stats,cbind(tP,tp,fn,fp,cv,P=tp*1.0/(tp+fp),R=tp*1.0/(tp+fn))) } plot(stats[,'tP'],stats[,'R'],type='b',col='red',lwd=2, main=sprintf('topFiberM, dataset: %s, #Known facts:%d','DBLP',sum(X)),ylab="",xlab='tP', xlim=c(0,1),ylim=c(0,1)) HM=apply(stats,1,function(x){2/(1/x['P']+1/x['R'])}) points(stats[,'tP'],stats[,'P'],col='blue',lwd=2,type='b') points(stats[,'tP'],HM,col='green',lwd=2,type='b') grid(nx=10, lty = "dotted", lwd = 2) legend(legend=c('Recall','Precision','Harmonic mean'),col=c('red','blue','green'), x=0.6,y=0.2,pch=1,cex=0.75,lwd=2)
Given binary matrix S (m x n), and a scalar k (number of factors), Asso finds two matrices A (m x r), B (r x n) such taht S~= A * B
Asso_approximate(S, k, opti)
Asso_approximate(S, k, opti)
S |
input matrix to be factorized |
k |
number of factors (rank) |
opti |
options : list containing components verbose:control of showing messages , threshold:precision limit,penalty_overcovered,bonus_covered. |
LIST of three components
B |
k x n factor matrix |
O |
n x k factor matrix |
D |
n x n matrix, the result from calculate_association |
Abdelmoneim Amer Desouki
Miettinen, P., Mielikaeinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE transactions on knowledge and data engineering, 20(10), 1348-1362.
See Also topFiberM
data(DBLP) X=DBLP Xb=X==1 Res=Asso_approximate(Xb,7,list(threshold=0.5,penalty_overcovered=1,bonus_covered=1,verbose=0)) X_=Res$O %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
data(DBLP) X=DBLP Xb=X==1 Res=Asso_approximate(Xb,7,list(threshold=0.5,penalty_overcovered=1,bonus_covered=1,verbose=0)) X_=Res$O %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
The Chess data describes chess games of certain kind described by attributes representing the positions of pieces on the board. More info: http://fimi.ua.ac.be/data/ dimension: 3196 objects, 76 attributes, number of values in the scale of grades: 2 Downloaded from: http://datasets.inf.upol.cz/
data(Chess)
data(Chess)
Binary matrix, name Chess
Lichman, M: UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science, 2013
The DBLP dataset contains records of in which of the 19 conferences the 6980 authors had published. The dataset is collected from the DBLP database . dimension: 6980 objects, 19 attributes, number of values in the scale of grades: 2 Downloaded from: http://datasets.inf.upol.cz/
data(DBLP)
data(DBLP)
Binary matrix, name DBLP
Miettinen, P.: Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms, PhD thesis, University of Helsinki, 2009
implements GreConD algorithm for Boolean matrix factorization. returns A . B = X (if the no. of factors is not limited).
GreConD(X, no_of_factors = NULL)
GreConD(X, no_of_factors = NULL)
X |
input binary matrix to be factorized. |
no_of_factors |
number of factors of the result (optional). |
List of the following four components:
A |
Factor matrix A |
B |
Factor matrix B |
Abdelmoneim Amer Desouki
Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20
data(DBLP) X=DBLP Xb=X==1 Res=GreConD(Xb,7) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
data(DBLP) X=DBLP Xb=X==1 Res=GreConD(Xb,7) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
implements GreConDPlus Boolean Matrix Factorization Algorithm.
GreConDPlus(X, no_of_factors = NULL, w, verbose = 2)
GreConDPlus(X, no_of_factors = NULL, w, verbose = 2)
X |
input binary matrix to be factorized. |
no_of_factors |
number of factors of the result (optional). |
w |
wieght parameter for the algorithm : represents the weight of type1 and type2 errors |
verbose |
integer value to control the appearance of messages. 0 minimal messages will be showed. Default 2 |
List of the following four components:
A |
Factor matrix A |
B |
Factor matrix B |
Abdelmoneim Amer Desouki
Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20
topFiberM
,
GreConD
,
Asso_approximate
data(DBLP) X=DBLP Xb=X==1 Res=GreConDPlus(Xb,7,w=4,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
data(DBLP) X=DBLP Xb=X==1 Res=GreConDPlus(Xb,7,w=4,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') X=as(X,'TsparseMatrix') li=X@i[X@x==1]+1 lj=X@j[X@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
implements topFiberM Boolean matrix factorization algorithm. topFiberM chooses in a greedy way the fibers (rows or columns) to represent the entire matrix. Fibers are extended to rectangles according to a threshold on precision. The search for these" top fibers" can continue beyond the required rank and according to an optional parameter that defines the limit for this search.
topFiberM(X, r = 2, tP = 0.5, verbose = 2, SR = NULL)
topFiberM(X, r = 2, tP = 0.5, verbose = 2, SR = NULL)
X |
the input boolean sparse matrix |
r |
rank (number of factors) required. |
tP |
parameter to put threshold on precision |
verbose |
integer value to control the appearance of messages. 0 minimal messages will be showed. Default 2 |
SR |
search limit which defines the number iterations, minimum value is rank and maximum value is minimum number of columns and number of rows |
List of the following four components:
A |
Factor matrix A |
B |
Factor matrix B |
X1 |
remaining uncovered ones, (False negatives) |
tf |
dataframe logging of steps giving description of each factor, contains index, based on column (2) / row (1), nnz, TP, FP |
Abdelmoneim Amer Desouki
Desouki, A. A., Roeder, M., & Ngomo, A. C. N. (2019). topFiberM: Scalable and Efficient Boolean Matrix Factorization. arXiv preprint arXiv:1903.10326.
data(DBLP) r=7 tP=0.6 X=DBLP Xb=X==1#Convert to boolean Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') #Calculate metrics tempX=as(X,'TsparseMatrix') li=tempX@i[tempX@x==1]+1 lj=tempX@j[tempX@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
data(DBLP) r=7 tP=0.6 X=DBLP Xb=X==1#Convert to boolean Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1) X_=Res$A %*% Res$B X_=as(X_,'TsparseMatrix') #Calculate metrics tempX=as(X,'TsparseMatrix') li=tempX@i[tempX@x==1]+1 lj=tempX@j[tempX@x==1]+1 tp=sum(X_[cbind(li,lj)]>0) fn=sum(X)-tp#sum(!X_[cbind(li,lj)]) fp=sum(X_@x>0)-tp cv=1-(fp+fn)/(tp+fn) print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))