Package 'MetChem' reference manual

Title:	Chemical Structural Similarity Analysis
Description:	A new pipeline to explore chemical structural similarity across metabolites. It classifies metabolites into structurally related modules and identifies their shared functional groups. The KODAMA algorithm is used to highlight structural similarity between metabolites. See Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705>, Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>, and Abdel-Shafy EA, Melak T, MacIntyre DA, Zadra G, Zerbini LF, Piazza S, Cacciatore S. (2023) Bioinformatics Advances <doi:10.1093/bioadv/vbad053>.
Authors:	Ebtesam Abdel-Shafy [aut], Tadele Melak [aut], David A. MacIntyre [aut], Giorgia Zadra [aut], Luiz F. Zerbini [aut], Silvano Piazza [aut], Stefano Cacciatore [aut, cre]
Maintainer:	Stefano Cacciatore <[email protected]>
License:	GPL (>= 2)
Version:	0.4
Built:	2025-03-03 07:05:33 UTC
Source:	CRAN

Cut a Tree into Groups of Data

Description

Cuts a tree, as produced by the `hclust` function, into groups (also known as modules).

Usage


allbranches(hh,minlen=5)


Arguments

`hh`	A tree as produced by the `hclust` function.
`minlen`	The minimum number of elements in each module.

Value

A list containing vectors of module memberships.

Examples


data(Metabolites)

data=Metabolites$readMet$concentration
hh=hclust(dist(data),method="ward.D")
res=allbranches(hh) 
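
The idea of cutting a tree into modules with a minimum size can be sketched in base R. This is only an illustration under stated assumptions, not the algorithm used by `allbranches`: the helper `modules_min_size` is hypothetical, it cuts the tree into a fixed number of groups with `cutree`, and it simply drops groups with fewer than `minlen` members.

```r
# Toy data: 30 observations in 2 dimensions (illustrative only)
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)
rownames(x) <- paste0("m", seq_len(nrow(x)))
hh <- hclust(dist(x), method = "ward.D")

# Hypothetical helper: cut into k groups, keep groups with >= minlen members
modules_min_size <- function(hh, k, minlen = 5) {
  grp <- cutree(hh, k = k)          # named vector of group labels
  mods <- split(names(grp), grp)    # list of member names per group
  mods[lengths(mods) >= minlen]
}

mods <- modules_min_size(hh, k = 4, minlen = 5)
```

The result has the same general shape as the documented return value, a list of membership vectors.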



Chemical dissimilarity.

Description

This function calculates the structural dissimilarity between metabolites, using the simplified molecular-input line-entry system (SMILES) notation of each metabolite as input.

Usage


chemical.dissimilarity (smiles,method="tanimoto",type="extended")


Arguments

`smiles`	A vector of SMILES notations.
`method`	The method used to calculate the distance between molecular fingerprints ("tanimoto" by default). For more information, see the `fp.sim.matrix` function.
`type`	The type of fingerprint applied to the SMILES ("extended" by default). For more information, see the `get.fingerprint` function.

Value

A list containing the distances between fingerprints.

Examples


data(Metabolites)
d=chemical.dissimilarity(Metabolites$SMILES[1:50])
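
For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below works through that arithmetic on two toy bit vectors in base R. The fingerprints are made up, and turning the similarity into a dissimilarity as 1 minus the similarity is an assumption for illustration; the package may compute it differently.

```r
# Two toy binary fingerprints (1 = bit set); illustrative values only
fp_a <- c(1, 1, 0, 1, 0, 0, 1, 0)
fp_b <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Tanimoto similarity: shared bits / bits set in either fingerprint
tanimoto <- function(a, b) {
  sum(a & b) / sum(a | b)
}

sim <- tanimoto(fp_a, fp_b)   # 3 shared bits, 5 bits in the union: 0.6
dis <- 1 - sim                # assumed conversion to a dissimilarity: 0.4
```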


ChemRICH Dataset

Description

This dataset consists of a list of metabolite names downloaded from https://chemrich.fiehnlab.ucdavis.edu/. HMDB IDs were retrieved from the PubChem Identifier Exchange Service (https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi) and manually curated.

Usage

data(ChemRICH)

Value

A list with the following elements in the variable ChemRICH:

`name`	A vector of metabolite names.
`SMILES`	A vector of the SMILES representation of each metabolite.
`HMDB`	A vector containing HMDB IDs of each metabolite.

Examples

data(ChemRICH)

Detection of clusters.

Description

This function calculates the structural similarity between metabolites, performs hierarchical clustering using the KODAMA algorithm, and detects the optimal number of clusters. The procedure is repeated to ensure the robustness of the detection.

Usage



clusters.detection  (smiles,
                     repetition=10,
                     k=50,
                     seed=12345,
                     max_nc = 30,
                     dissimilarity.parameters=list(),
                     kodama.matrix.parameters=list(),
                     kodama.visualization.parameters=list(),
                     hclust.parameters=list(method="ward.D"),
                     verbose = TRUE)


Arguments

`smiles`	A list of SMILES notations for the metabolites in the study dataset.
`repetition`	The number of times the KODAMA analysis is repeated.
`k`	The number of components of multidimensional scaling.
`seed`	Seed for the generation of random numbers.
`max_nc`	Maximum number of clusters.
`dissimilarity.parameters`	Optional parameters for the `chemical.dissimilarity` function.
`kodama.matrix.parameters`	Optional parameters for the `KODAMA.matrix` function.
`kodama.visualization.parameters`	Optional parameters for the `KODAMA.visualization` function.
`hclust.parameters`	Optional parameters for the `hclust` function.
`verbose`	If TRUE, the progress of each iteration is displayed.

Value

A list containing all results of the KODAMA chemical similarity analysis and hierarchical clustering.

Examples


data(Metabolites)

res=clusters.detection(Metabolites$SMILES) 
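
Arguments such as `hclust.parameters` are lists of optional settings for another function. A common R idiom for forwarding such a list is `do.call`, sketched below on toy data. Whether `clusters.detection` forwards its lists in exactly this way is an assumption; the sketch only shows how a list like `list(method="ward.D")` can end up as arguments of `hclust`.

```r
# Optional settings collected in a list, as in the Usage above
hclust.parameters <- list(method = "ward.D")

# Toy distance structure (illustrative only)
set.seed(12345)
d <- dist(matrix(rnorm(40), ncol = 2))

# Combine the required argument with the optional ones and call hclust
hh <- do.call(hclust, c(list(d = d), hclust.parameters))
```

Passing `list()` instead leaves every optional argument at the default of the target function.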



Metabolite-associated Diseases

Description

This function finds the diseases associated with each metabolite.

Usage

diseasesMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the diseases associated with each metabolite.

Examples


data(Metabolites)
dis=diseasesMet(Metabolites$readMet)


Metabolite-associated Enzymes

Description

This function finds the enzymes related to each metabolite.

Usage

enzymesMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the enzymes associated with each metabolite.

Examples


data(Metabolites)
enz=enzymesMet(Metabolites$readMet)


Cluster features extraction

Description

This function finds features associated with each cluster.

Usage


features(doc,cla,cl,HMDB_ID)

Arguments

`doc`	The output of the `readMet` function.
`cla`	The output of `diseasesMet`, `enzymesMet`, `pathwaysMet`, `propertiesMet`, `substituentsMet`, or `taxonomyMet` functions.
`cl`	The output of the `allbranches` function containing the module memberships.
`HMDB_ID`	A vector of HMDB IDs associated with their chemical name.

Value

A list of p-values, calculated using Fisher's test, for the features associated with each cluster.

Examples


data(Metabolites)
SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
HMDB=Metabolites$HMDB
names(HMDB)=Metabolites$name
res=KODAMA.chem.sim(SMILES)
cl=allbranches(res$hclust)
cla=substituentsMet(Metabolites$readMet)
f=features(Metabolites$readMet,cla,cl,HMDB)
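
The kind of test behind such p-values can be sketched with `fisher.test` from base R. The counts below are invented, and the one-sided alternative is an assumption chosen to test for enrichment; this is not necessarily how `features` sets up its tables.

```r
# Toy example: 20 metabolites, 8 of them in the module of interest
in_module <- rep(c(TRUE, FALSE), times = c(8, 12))

# Hypothetical feature (e.g. a substituent): present in 6 module members
# and in 2 metabolites outside the module
has_feature <- c(rep(TRUE, 6), rep(FALSE, 2), rep(TRUE, 2), rep(FALSE, 10))

# 2 x 2 contingency table: module membership versus feature presence
tab <- table(module = in_module, feature = has_feature)

# One-sided Fisher's exact test for over-representation in the module
ft <- fisher.test(tab, alternative = "greater")
p_value <- ft$p.value
```

A small `p_value` indicates that the feature occurs in the module more often than expected by chance.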


HFD Dataset

Description

This dataset is a data frame of metabolites that contains only chemical information.

Usage

data(HFD)

Value

A list with the following elements in the variable HFD:

`SMILES`	A vector of the SMILES representation of each metabolite.
`CHEMICAL_ID`	A vector of the chemical ID number of each metabolite.
`PUBCHEM`	A vector of identifiers from the PubChem database, which covers chemical molecules and their activities in biological assays.
`CHEMSPIDER`	A vector of unique identifiers from the ChemSpider database for each molecule.
`HMDB`	A vector containing HMDB IDs of each metabolite.

Examples

data(HFD)

KODAMA chemical similarity.

Description

This function calculates the structural similarity between metabolites and performs hierarchical clustering using the KODAMA algorithm.

Usage




KODAMA.chem.sim (smiles,
                 d=NULL,
                 k=50,
                 dissimilarity.parameters=list(),
                 kodama.matrix.parameters=list(),
                 kodama.visualization.parameters=list(),
                 hclust.parameters=list(method="ward.D"))


Arguments

`smiles`	A list of SMILES notations for the metabolites in the study dataset.
`d`	A distance structure such as that returned by `dist`, or a full symmetric matrix containing the dissimilarities. If NULL (default), the dissimilarity matrix is generated by the `chemical.dissimilarity` function. Otherwise, `d` is used as the dissimilarity matrix.
`k`	The number of components of multidimensional scaling.
`dissimilarity.parameters`	Optional parameters for the `chemical.dissimilarity` function.
`kodama.matrix.parameters`	Optional parameters for the `KODAMA.matrix` function.
`kodama.visualization.parameters`	Optional parameters for the `KODAMA.visualization` function.
`hclust.parameters`	Optional parameters for the `hclust` function.

Value

A list containing all results of the KODAMA chemical similarity analysis and of the hierarchical clustering of the KODAMA dimensions.

Examples


data(Metabolites)

res=KODAMA.chem.sim(Metabolites$SMILES)  
plot(res$kodama$visualization)
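
The roles of `d`, `k`, and `hclust.parameters` can be illustrated with base R alone: a dissimilarity structure is reduced to `k` components by classical multidimensional scaling, and the resulting coordinates are clustered. This sketch leaves out the KODAMA step entirely and uses random toy data, so it is not a substitute for `KODAMA.chem.sim`; it only shows the kind of objects involved.

```r
# Toy dissimilarity structure standing in for the chemical dissimilarities
set.seed(1)
x <- matrix(rnorm(100), ncol = 5)   # 20 items, 5 variables
d <- dist(x)

# Classical multidimensional scaling down to k components
k <- 3
coords <- cmdscale(d, k = k)

# Hierarchical clustering of the reduced coordinates
hh <- hclust(dist(coords), method = "ward.D")
```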



Metabolomic Dataset

Description

This dataset consists of a list of metabolites, as returned by the `readMet` function, together with the concentration values of each metabolite.

Usage

data(Metabolites)

Value

A list with the following elements in the variable Metabolites:

`concentration`	A matrix containing the concentration of each metabolite.
`name`	A vector of metabolite names.
`SMILES`	A vector of the SMILES representation of each metabolite.
`HMDB`	A vector containing HMDB IDs of each metabolite.
`readMet`	A list of metabolites information produced by `readMet` function.

Examples

data(Metabolites)

Name of metabolites

Description

This function extracts the metabolite names from the list generated by the `readMet` function.

Usage

nameMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the name of each metabolite.

Examples


data(Metabolites)
nam=nameMet(Metabolites$readMet)


Metabolic Pathways

Description

This function finds the pathways related to each metabolite.

Usage

pathwaysMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the pathways associated with each metabolite.

Examples



data(Metabolites)
pat=pathwaysMet(Metabolites$readMet)


Physical Properties of Metabolites

Description

This function finds the physical properties of metabolites.

Usage

propertiesMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the properties associated with each metabolite.

Examples


data(Metabolites)
pro=propertiesMet(Metabolites$readMet)


Metabolite Cards Reading

Description

This function extracts the metabocards of the metabolites in your dataset from the http://www.hmdb.ca/metabolites/ database and stores all of this information in a list.

Usage

readMet(ID, address =  c("http://www.hmdb.ca/metabolites/"),remove=TRUE) 

Arguments

`ID`	A vector containing the HMDB codes (i.e., metabolite IDs) of the metabolites in the dataset.
`address`	Optional address where the MetaboCards are located. The default address is http://www.hmdb.ca/metabolites/.
`remove`	A logical value. If TRUE, missing and wrong HMDB IDs are removed.

Value

A list containing all the information related to the metabocards.

Examples



ID=c("HMDB0000122","HMDB0000124","HMDB0000243","HMDB0000263")
doc=readMet(ID) 



Metabolites selection

Description

This function selects metabolites from the list generated by the `readMet` function.

Usage

selectionMet(doc, sel)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.
`sel`	A vector of the HMDB codes of the metabolites to be selected.

Value

`doc`	A `doc` list containing only the selected metabolites.

Examples


data(Metabolites)
doc=selectionMet(Metabolites$readMet,c("HMDB0000299","HMDB0000881"))
nameMet(doc)

Metabolite substituents

Description

This function finds the substituents of each metabolite.

Usage

substituentsMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the substituents of each metabolite.

Examples


data(Metabolites)
sub=substituentsMet(Metabolites$readMet)


Metabolite Taxonomy

Description

This function finds the taxonomy of each metabolite.

Usage

taxonomyMet(doc)

Arguments

`doc`	A list of metabolites information produced by the `readMet` function.

Value

A data frame containing the taxonomy of each metabolite.

Examples


data(Metabolites)
tax=taxonomyMet(Metabolites$readMet)


Optimal cluster number calculation.

Description

This function helps to estimate the optimal number of clusters for the metabolite dataset. It applies different algorithms for calculating the optimal cluster number to cut the clustering tree produced by the `hclust` function, and returns a list containing the index corresponding to each cluster number.

Usage

tree.cutting (res,max_nc=20)

Arguments

`res`	A list produced by the `KODAMA.chem.sim` function.
`max_nc`	The maximum number of clusters (default = 20).

Value

A list containing Rousseeuw's silhouette index calculated for each clustering.

Examples


data(Metabolites)

res=KODAMA.chem.sim(Metabolites$SMILES)
clu=tree.cutting(res,max_nc = 30)
plot(clu$min_nc:clu$max_nc,clu$res.S)
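
How a silhouette index can guide the choice of the cluster number is sketched below in base R. The mean silhouette width is computed by hand for each cut of a toy tree, and the cut with the largest value is taken as the candidate. The helper `mean_silhouette` and the toy data are assumptions made for illustration; `tree.cutting` may organise the calculation differently.

```r
set.seed(1)
# Toy data with three separated groups (illustrative only)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2),
           matrix(rnorm(40, mean = 12), ncol = 2))
d <- dist(x)
hh <- hclust(d, method = "ward.D")

# Hypothetical helper: mean silhouette width of one clustering
mean_silhouette <- function(d, cl) {
  m <- as.matrix(d)
  widths <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)                  # singleton: width 0 by convention
    a <- sum(m[i, own]) / (sum(own) - 1)          # mean distance within own cluster
    b <- min(tapply(m[i, !own], cl[!own], mean))  # mean distance to nearest other cluster
    (b - a) / max(a, b)
  })
  mean(widths)
}

# Evaluate every cut from 2 to 10 clusters and keep the best one
ks <- 2:10
sil <- sapply(ks, function(k) mean_silhouette(d, cutree(hh, k = k)))
best_k <- ks[which.max(sil)]
```

Plotting `sil` against `ks` gives the same kind of curve as the `plot` call in the example above.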



Weighted Metabolite Chemical Structural Analysis

Description

Summarizes the metabolite concentrations in each of the identified clusters using the module eigenvalue (eigen-metabolite), which is used to calculate module membership measures.

Usage

WMCSA(data,cl)

Arguments

`data`	A dataset of metabolite concentrations in different samples.
`cl`	The output of the `allbranches` function containing the module memberships.

Value

A matrix representing the similarity score of the metabolites within the same module across the different samples.

Examples



data(Metabolites)

SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
res=KODAMA.chem.sim(SMILES)  
cl=allbranches(res$hclust)
ww=WMCSA(Metabolites$concentration,cl)
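
One common way to summarise a module by a single profile is to take the first principal component of its scaled columns. The sketch below does this in base R on random toy data. It assumes samples in rows and metabolites in columns, and the helper `eigen_metabolite` is hypothetical; the exact computation inside `WMCSA` may differ.

```r
set.seed(1)
# Toy concentration matrix: 10 samples (rows) x 6 metabolites (columns)
conc <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("s", 1:10), paste0("met", 1:6)))

# Hypothetical module memberships, as a list of metabolite-name vectors
cl <- list(module1 = c("met1", "met2", "met3"),
           module2 = c("met4", "met5", "met6"))

# First left singular vector of the scaled module columns:
# one summary score per sample
eigen_metabolite <- function(data, members) {
  sub <- scale(data[, members, drop = FALSE])
  svd(sub)$u[, 1]
}

# One column per module, one row per sample
ww <- sapply(cl, function(members) eigen_metabolite(conc, members))
rownames(ww) <- rownames(conc)
```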




Write a CLS file

Description

This function writes a file in the CLS format defined by GenePattern.

Usage

write.cls(es, address)

Arguments

`es`	A matrix.
`address`	The address where the file should be saved.

Value

No return value. If an invalid address is inserted, the function will generate an error.
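
For orientation, a categorical CLS file has three lines: the number of samples, the number of classes and the constant 1; a `#` followed by the class names; and one class index per sample. The base R sketch below writes such a file from a hypothetical vector of labels. It does not reproduce the signature of `write.cls`, which takes a matrix, and the labels are invented.

```r
# Hypothetical class labels for six samples
labels <- c("control", "control", "case", "case", "control", "case")
classes <- unique(labels)

cls_lines <- c(
  paste(length(labels), length(classes), 1),          # samples, classes, constant 1
  paste("#", paste(classes, collapse = " ")),         # class names
  paste(match(labels, classes) - 1, collapse = " ")   # 0-based class index per sample
)

# Write to a temporary file so nothing is left behind
path <- tempfile(fileext = ".cls")
writeLines(cls_lines, path)
```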

Write a GCT file

Description

This function writes a file in the GCT format defined by GenePattern.

Usage

write.gct(es, address)

Arguments

`es`	A matrix.
`address`	The address where the file should be saved.

Value

No return value. If an invalid address is inserted, the function will generate an error.
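
A GCT file starts with the version line `#1.2`, followed by a line with the number of rows and samples, a header line beginning with `Name` and `Description`, and then one tab-separated row per feature. The sketch below writes this layout in base R for a small toy matrix. Reusing the row names as descriptions is an assumption, and the code is not taken from `write.gct`.

```r
set.seed(1)
# Toy matrix: 3 metabolites (rows) x 2 samples (columns)
es <- matrix(round(rnorm(6), 2), nrow = 3,
             dimnames = list(c("met1", "met2", "met3"), c("s1", "s2")))

path <- tempfile(fileext = ".gct")
con <- file(path, "w")
writeLines("#1.2", con)                                  # version line
writeLines(paste(nrow(es), ncol(es), sep = "\t"), con)   # number of rows and samples
writeLines(paste(c("Name", "Description", colnames(es)), collapse = "\t"), con)

# Name, description (here a copy of the name), then the values
body <- cbind(rownames(es), rownames(es), es)
write.table(body, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)
```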

Write a GMT file

Description

This function writes a file containing the Metabolite Set information in the GMT format defined by GenePattern.

Usage

write.gmt(sub,address,min_entry=2,max_entry=50)

Arguments

`sub`	A matrix.
`address`	The address where the file should be saved.
`min_entry`	The minimum number of metabolites for each metabolite set.
`max_entry`	The maximum number of metabolites for each metabolite set.

Value

No return value. If an invalid address is inserted, the function will generate an error.
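
In a GMT file each line describes one set: the set name, a description, and then the members, all separated by tabs. The sketch below writes such a file in base R and applies size limits in the spirit of `min_entry` and `max_entry`. The sets are hypothetical, the input is a plain list rather than the matrix expected by `write.gmt`, and the description field is filled with the placeholder `na`.

```r
# Hypothetical metabolite sets (names and members are made up)
sets <- list(setA = c("met1", "met2", "met3"),
             setB = "met4",
             setC = c("met2", "met5"))
min_entry <- 2
max_entry <- 50

# Keep only the sets whose size lies within the limits
keep <- lengths(sets) >= min_entry & lengths(sets) <= max_entry

# One line per set: name, description placeholder, members
gmt_lines <- vapply(names(sets)[keep], function(nm) {
  paste(c(nm, "na", sets[[nm]]), collapse = "\t")
}, character(1))

path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, path)
```

Here `setB` is dropped because it has fewer than `min_entry` members.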
