Package 'rBMF' reference manual

Title:	Boolean Matrix Factorization
Description:	Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>.
Authors:	Abdelmoneim Amer Desouki
Maintainer:	Abdelmoneim Amer Desouki <desouki@hhu.de>
License:	GPL-3
Version:	1.1
Built:	2025-03-02 06:40:07 UTC
Source:	CRAN

Boolean Matrix Factorization

Description

Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>.

Details

The DESCRIPTION file:

Package:	rBMF
Type:	Package
Title:	Boolean Matrix Factorization
Version:	1.1
Date:	2021-1-13
Author:	Abdelmoneim Amer Desouki
Maintainer:	Abdelmoneim Amer Desouki <desouki@hhu.de>
Depends:	R (>= 3.2.0), Matrix, methods, Rcpp
Description:	Provides four boolean matrix factorization (BMF) methods. BMF has many applications like data mining and categorical data analysis. BMF is also known as boolean matrix decomposition (BMD) and was found to be an NP-hard (non-deterministic polynomial-time) problem. Currently implemented methods are 'Asso' Miettinen, Pauli and others (2008) <doi:10.1109/TKDE.2008.53>, 'GreConD' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'GreConDPlus' R. Belohlavek, V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002> , 'topFiberM' A. Desouki, M. Roeder, A. Ngonga (2019) <arXiv:1903.10326>.
License:	GPL-3
NeedsCompilation:	no
Packaged:	2021-01-13 14:57:05 UTC; abdel
Repository:	CRAN
Date/Publication:	2021-01-13 16:40:02 UTC

Index of help topics:

Asso_approximate        Asso: Boolean Matrix Factorization
Chess                   Chess dataset
DBLP                    DBLP dataset
GreConD                 GreConD Boolean matrix factorization
GreConDPlus             GreConDPlus Boolean Matrix Factorization
rBMF-package            Boolean Matrix Factorization
topFiberM               topFiberM

Author(s)

Abdelmoneim Amer Desouki

References

topFiberM -Desouki, A. A., Röder, M., & Ngomo, A. C. N. (2019). topFiberM: Scalable and Efficient Boolean Matrix Factorization. arXiv preprint arXiv:1903.10326.

Asso -Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE transactions on knowledge and data engineering, 20(10), 1348-1362.

GreConD, GreConDPlus -Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20

Examples


data(DBLP)
 X=DBLP
    r=7
     Xb=X==1#Convert to boolean
    tempX=as(X,'TsparseMatrix')
    stats=NULL
    for(tP in c(0.2,0.3,0.4,0.5,0.6,0.7,0.8,1)){
    
      Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1)
    
    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    #Calculate metrics
    li=tempX@i[tempX@x==1]+1
    lj=tempX@j[tempX@x==1]+1
    tp=sum(X_[cbind(li,lj)]>0)
    fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
    fp=sum(X_@x>0)-tp
    cv=1-(fp+fn)/(tp+fn)
    stats=rbind(stats,cbind(tP,tp,fn,fp,cv,P=tp*1.0/(tp+fp),R=tp*1.0/(tp+fn)))
    }

   
    plot(stats[,'tP'],stats[,'R'],type='b',col='red',lwd=2,
    main=sprintf('topFiberM, dataset: %s, 
         #Known facts:%d','DBLP',sum(X)),ylab="",xlab='tP',
    xlim=c(0,1),ylim=c(0,1))
    HM=apply(stats,1,function(x){2/(1/x['P']+1/x['R'])})
    points(stats[,'tP'],stats[,'P'],col='blue',lwd=2,type='b')
    points(stats[,'tP'],HM,col='green',lwd=2,type='b')
    grid(nx=10, lty = "dotted", lwd = 2)
    legend(legend=c('Recall','Precision','Harmonic mean'),col=c('red','blue','green'),
    x=0.6,y=0.2,pch=1,cex=0.75,lwd=2)





data(DBLP)
 X=DBLP
    r=7
     Xb=X==1#Convert to boolean
    tempX=as(X,'TsparseMatrix')
    stats=NULL
    for(tP in c(0.2,0.3,0.4,0.5,0.6,0.7,0.8,1)){
    
      Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1)
    
    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    #Calculate metrics
    li=tempX@i[tempX@x==1]+1
    lj=tempX@j[tempX@x==1]+1
    tp=sum(X_[cbind(li,lj)]>0)
    fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
    fp=sum(X_@x>0)-tp
    cv=1-(fp+fn)/(tp+fn)
    stats=rbind(stats,cbind(tP,tp,fn,fp,cv,P=tp*1.0/(tp+fp),R=tp*1.0/(tp+fn)))
    }

   
    plot(stats[,'tP'],stats[,'R'],type='b',col='red',lwd=2,
    main=sprintf('topFiberM, dataset: %s, 
         #Known facts:%d','DBLP',sum(X)),ylab="",xlab='tP',
    xlim=c(0,1),ylim=c(0,1))
    HM=apply(stats,1,function(x){2/(1/x['P']+1/x['R'])})
    points(stats[,'tP'],stats[,'P'],col='blue',lwd=2,type='b')
    points(stats[,'tP'],HM,col='green',lwd=2,type='b')
    grid(nx=10, lty = "dotted", lwd = 2)
    legend(legend=c('Recall','Precision','Harmonic mean'),col=c('red','blue','green'),
    x=0.6,y=0.2,pch=1,cex=0.75,lwd=2)

Asso: Boolean Matrix Factorization

Description

Given binary matrix S (m x n), and a scalar k (number of factors), Asso finds two matrices A (m x r), B (r x n) such taht S~= A * B

Usage

Asso_approximate(S, k, opti)
Asso_approximate(S, k, opti)

Arguments

`S`	input matrix to be factorized
`k`	number of factors (rank)
`opti`	options : list containing components verbose:control of showing messages , threshold:precision limit,penalty_overcovered,bonus_covered.

Value

LIST of three components

`B`	k x n factor matrix
`O`	n x k factor matrix
`D`	n x n matrix, the result from calculate_association

Author(s)

Abdelmoneim Amer Desouki

References

Miettinen, P., Mielikaeinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE transactions on knowledge and data engineering, 20(10), 1348-1362.

Examples

data(DBLP)
X=DBLP
Xb=X==1
Res=Asso_approximate(Xb,7,list(threshold=0.5,penalty_overcovered=1,bonus_covered=1,verbose=0))

    X_=Res$O %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

data(DBLP)
X=DBLP
Xb=X==1
Res=Asso_approximate(Xb,7,list(threshold=0.5,penalty_overcovered=1,bonus_covered=1,verbose=0))

    X_=Res$O %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

Chess dataset

Description

The Chess data describes chess games of certain kind described by attributes representing the positions of pieces on the board. More info: http://fimi.ua.ac.be/data/ dimension: 3196 objects, 76 attributes, number of values in the scale of grades: 2 Downloaded from: http://datasets.inf.upol.cz/

Usage

data(Chess)data(Chess)

Format

Binary matrix, name Chess

References

Lichman, M: UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science, 2013

DBLP dataset

Description

The DBLP dataset contains records of in which of the 19 conferences the 6980 authors had published. The dataset is collected from the DBLP database . dimension: 6980 objects, 19 attributes, number of values in the scale of grades: 2 Downloaded from: http://datasets.inf.upol.cz/

Usage

data(DBLP)data(DBLP)

Format

Binary matrix, name DBLP

References

Miettinen, P.: Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms, PhD thesis, University of Helsinki, 2009

GreConD Boolean matrix factorization

Description

implements GreConD algorithm for Boolean matrix factorization. returns A . B = X (if the no. of factors is not limited).

Usage

GreConD(X, no_of_factors = NULL)
GreConD(X, no_of_factors = NULL)

Arguments

`X`	input binary matrix to be factorized.
`no_of_factors`	number of factors of the result (optional).

Value

List of the following four components:

`A`	Factor matrix A
`B`	Factor matrix B

Author(s)

Abdelmoneim Amer Desouki

References

Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20

Examples

data(DBLP)
X=DBLP
Xb=X==1
Res=GreConD(Xb,7)

   X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

data(DBLP)
X=DBLP
Xb=X==1
Res=GreConD(Xb,7)

   X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

GreConDPlus Boolean Matrix Factorization

Description

implements GreConDPlus Boolean Matrix Factorization Algorithm.

Usage

GreConDPlus(X, no_of_factors = NULL, w, verbose = 2)
GreConDPlus(X, no_of_factors = NULL, w, verbose = 2)

Arguments

`X`	input binary matrix to be factorized.
`no_of_factors`	number of factors of the result (optional).
`w`	wieght parameter for the algorithm : represents the weight of type1 and type2 errors
`verbose`	integer value to control the appearance of messages. 0 minimal messages will be showed. Default 2

Value

List of the following four components:

`A`	Factor matrix A
`B`	Factor matrix B

Author(s)

Abdelmoneim Amer Desouki

References

Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1)(2010), 3-20

Examples


data(DBLP)
X=DBLP
Xb=X==1
Res=GreConDPlus(Xb,7,w=4,verbose=1)

    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

data(DBLP)
X=DBLP
Xb=X==1
Res=GreConDPlus(Xb,7,w=4,verbose=1)

    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    X=as(X,'TsparseMatrix')
			li=X@i[X@x==1]+1
			lj=X@j[X@x==1]+1
			tp=sum(X_[cbind(li,lj)]>0)
			fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
			fp=sum(X_@x>0)-tp
			cv=1-(fp+fn)/(tp+fn)
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

topFiberM

Description

implements topFiberM Boolean matrix factorization algorithm. topFiberM chooses in a greedy way the fibers (rows or columns) to represent the entire matrix. Fibers are extended to rectangles according to a threshold on precision. The search for these" top fibers" can continue beyond the required rank and according to an optional parameter that defines the limit for this search.

Usage

topFiberM(X, r = 2, tP = 0.5, verbose = 2, SR = NULL)
topFiberM(X, r = 2, tP = 0.5, verbose = 2, SR = NULL)

Arguments

`X`	the input boolean sparse matrix
`r`	rank (number of factors) required.
`tP`	parameter to put threshold on precision
`verbose`	integer value to control the appearance of messages. 0 minimal messages will be showed. Default 2
`SR`	search limit which defines the number iterations, minimum value is rank and maximum value is minimum number of columns and number of rows

Value

List of the following four components:

`A`	Factor matrix A
`B`	Factor matrix B
`X1`	remaining uncovered ones, (False negatives)
`tf`	dataframe logging of steps giving description of each factor, contains index, based on column (2) / row (1), nnz, TP, FP

Author(s)

Abdelmoneim Amer Desouki

References

Desouki, A. A., Roeder, M., & Ngomo, A. C. N. (2019). topFiberM: Scalable and Efficient Boolean Matrix Factorization. arXiv preprint arXiv:1903.10326.

Examples


data(DBLP)
    r=7
    tP=0.6
    X=DBLP
    Xb=X==1#Convert to boolean
    
 Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1)
    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    #Calculate metrics
    tempX=as(X,'TsparseMatrix')
    li=tempX@i[tempX@x==1]+1
    lj=tempX@j[tempX@x==1]+1
    tp=sum(X_[cbind(li,lj)]>0)
    fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
    fp=sum(X_@x>0)-tp
    cv=1-(fp+fn)/(tp+fn)
    
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))
    
data(DBLP)
    r=7
    tP=0.6
    X=DBLP
    Xb=X==1#Convert to boolean
    
 Res=topFiberM(Xb,r=r,tP=tP,SR=100,verbose=1)
    X_=Res$A %*% Res$B
    X_=as(X_,'TsparseMatrix')
    #Calculate metrics
    tempX=as(X,'TsparseMatrix')
    li=tempX@i[tempX@x==1]+1
    lj=tempX@j[tempX@x==1]+1
    tp=sum(X_[cbind(li,lj)]>0)
    fn=sum(X)-tp#sum(!X_[cbind(li,lj)])
    fp=sum(X_@x>0)-tp
    cv=1-(fp+fn)/(tp+fn)
    
print(sprintf("tp:%d, fp:%d,fn:%d, Error:%d, covered=%.3f",tp,fp,fn,fn+fp,cv))

Package 'rBMF'

Help Index

Boolean Matrix Factorization

Description

Details

Author(s)

References

See Also

Examples

Asso: Boolean Matrix Factorization

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Chess dataset

Description

Usage

Format

References

DBLP dataset

Description

Usage

Format

References

GreConD Boolean matrix factorization

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

GreConDPlus Boolean Matrix Factorization

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

topFiberM

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples