Title: | Chemical Structural Similarity Analysis |
---|---|
Description: | A pipeline to explore chemical structural similarity across metabolites. It classifies metabolites into structurally related modules and identifies shared functional groups. The KODAMA algorithm is used to highlight structural similarity between metabolites. See Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705>, Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>, and Abdel-Shafy EA, Melak T, MacIntyre DA, Zadra G, Zerbini LF, Piazza S, Cacciatore S. (2023) Bioinformatics Advances <doi:10.1093/bioadv/vbad053>. |
Authors: | Ebtesam Abdel-Shafy [aut], Tadele Melak [aut], David A. MacIntyre [aut], Giorgia Zadra [aut], Luiz F. Zerbini [aut], Silvano Piazza [aut], Stefano Cacciatore [aut, cre] |
Maintainer: | Stefano Cacciatore <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.4 |
Built: | 2024-11-03 06:43:14 UTC |
Source: | CRAN |
Cuts a tree, as produced by the hclust function, into groups (a.k.a. modules).
allbranches(hh,minlen=5)
hh | A tree as produced by hclust. |
minlen | The minimum number of elements in each module. |
A list containing vectors of module memberships.
See also: cutree, hclust, clusters.detection
data(Metabolites)
data=Metabolites$readMet$concentration
hh=hclust(dist(data),method="ward.D")
res=allbranches(hh)
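To make the idea of module memberships concrete, the following base R sketch lists every branch of an hclust tree that holds at least minlen leaves. The helper branches_sketch is hypothetical and is not the package's allbranches; whether allbranches enumerates branches in exactly this way is an assumption.

```r
# Hypothetical helper, NOT the package's allbranches(): collect every
# branch (internal node) of an hclust tree with at least `minlen` leaves.
branches_sketch <- function(hh, minlen = 5) {
  n_merges <- nrow(hh$merge)
  members <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    leaves <- integer(0)
    for (child in hh$merge[i, ]) {
      if (child < 0) {
        # Negative entries of hh$merge are single observations.
        leaves <- c(leaves, -child)
      } else {
        # Positive entries refer to a branch built in an earlier merge.
        leaves <- c(leaves, members[[child]])
      }
    }
    members[[i]] <- sort(leaves)
  }
  Filter(function(m) length(m) >= minlen, members)
}

# Simulated data, used only for illustration.
set.seed(1)
x <- matrix(rnorm(40), nrow = 20)
hh <- hclust(dist(x), method = "ward.D")
modules <- branches_sketch(hh, minlen = 5)
```

Each element of modules is a vector of row indices, and the last element is the root branch holding all 20 observations.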
This function calculates the structural dissimilarity between metabolites, using the simplified molecular-input line-entry system (SMILES) notation of each metabolite as input.
chemical.dissimilarity(smiles,method="tanimoto",type="extended")
smiles | A vector of SMILES notations. |
method | The method used to calculate the distance between molecular fingerprints ("tanimoto" by default). For more information see fp.sim.matrix. |
type | The type of fingerprint applied to the SMILES ("extended" by default). For more information see get.fingerprint. |
A list containing the distances between fingerprints.
See also: fp.sim.matrix, get.fingerprint
data(Metabolites)
d=chemical.dissimilarity(Metabolites$SMILES[1:50])
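As an illustration of the default "tanimoto" method, the sketch below computes a Tanimoto dissimilarity in base R from a small, made-up binary fingerprint matrix. The helper tanimoto_dist is hypothetical; that the package reports one minus the Tanimoto similarity is an assumption, and real fingerprints would come from the SMILES strings rather than being typed by hand.

```r
# Hypothetical helper: Tanimoto dissimilarity between binary fingerprints
# stored one per row. Assumes dissimilarity = 1 - Tanimoto similarity.
tanimoto_dist <- function(fp) {
  fp <- (fp > 0) * 1                       # force a 0/1 numeric matrix
  shared <- tcrossprod(fp)                 # bits set in both fingerprints
  bits <- rowSums(fp)                      # bits set in each fingerprint
  union <- outer(bits, bits, "+") - shared # bits set in either fingerprint
  sim <- ifelse(union == 0, 1, shared / union)
  1 - sim
}

# Made-up fingerprints for three molecules, four bits each.
fp <- rbind(a = c(1, 1, 0, 0),
            b = c(1, 0, 1, 0),
            c = c(1, 1, 0, 0))
d <- tanimoto_dist(fp)
```

Molecules a and c have identical fingerprints, so their dissimilarity is zero, while a and b share one of three set bits.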
This dataset consists of a list of metabolite names downloaded from https://chemrich.fiehnlab.ucdavis.edu/. HMDB IDs were retrieved from the PubChem Identifier Exchange Service (https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi) and manually curated.
data(ChemRICH)
A list with the following elements in the variable ChemRICH:
name | A vector of metabolite names. |
SMILES | A vector of the SMILES representation of each metabolite. |
HMDB | A vector containing the HMDB ID of each metabolite. |
data(ChemRICH)
This function calculates the structural similarity between metabolites, performs hierarchical clustering using the KODAMA algorithm, and detects the optimal number of clusters. The procedure is repeated to ensure the robustness of the detection.
clusters.detection(smiles, repetition=10, k=50, seed=12345, max_nc = 30, dissimilarity.parameters=list(), kodama.matrix.parameters=list(), kodama.visualization.parameters=list(), hclust.parameters=list(method="ward.D"), verbose = TRUE)
smiles | A list of SMILES notations for the metabolites in the study dataset. |
repetition | The number of times the KODAMA analysis is repeated. |
k | The number of components of the multidimensional scaling. |
seed | Seed for the generation of random numbers. |
max_nc | Maximum number of clusters. |
dissimilarity.parameters | Optional parameters for chemical.dissimilarity. |
kodama.matrix.parameters | Optional parameters for KODAMA.matrix. |
kodama.visualization.parameters | Optional parameters for KODAMA.visualization. |
hclust.parameters | Optional parameters for hclust. |
verbose | If TRUE, the progress of each iteration is displayed. |
A list containing all results of the KODAMA chemical similarity analysis and hierarchical clustering.
data(Metabolites)
res=clusters.detection(Metabolites$SMILES)
This function links metabolites to their associated diseases.
diseasesMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the diseases associated with each metabolite.
See also: pathwaysMet, taxonomyMet, enzymesMet
data(Metabolites)
dis=diseasesMet(Metabolites$readMet)
This function finds the enzymes related to each metabolite.
enzymesMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the enzymes associated with each metabolite.
See also: pathwaysMet, taxonomyMet, diseasesMet
data(Metabolites)
enz=enzymesMet(Metabolites$readMet)
This function finds the features associated with each cluster.
features(doc,cla,cl,HMDB_ID)
doc | The output of the readMet function. |
cla | The output of substituentsMet. |
cl | The output of the allbranches function. |
HMDB_ID | A vector of HMDB IDs named with the corresponding chemical names. |
A list of p-values, calculated using Fisher's test, for the features associated with each cluster.
See also: KODAMA.chem.sim, tree.cutting, substituentsMet
data(Metabolites)
SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
HMDB=Metabolites$HMDB
names(HMDB)=Metabolites$name
res=KODAMA.chem.sim(SMILES)
cl=allbranches(res$hclust)
cla=substituentsMet(Metabolites$readMet)
f=features(Metabolites$readMet,cla,cl,HMDB)
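The kind of test behind these p-values can be sketched in base R with fisher.test. The membership vectors below are made up, and the one-sided alternative is an assumption; the package may build its contingency tables or choose the alternative differently.

```r
# Made-up data: 10 metabolites, 4 of them in the cluster of interest,
# 4 of them carrying a given feature (e.g. a substituent).
in_cluster  <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
has_feature <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

# 2 x 2 contingency table with the TRUE level first in both margins.
tab <- table(cluster = factor(in_cluster,  levels = c(TRUE, FALSE)),
             feature = factor(has_feature, levels = c(TRUE, FALSE)))

# One-sided test: is the feature over-represented in the cluster?
# (The one-sided choice is an assumption of this sketch.)
p <- fisher.test(tab, alternative = "greater")$p.value
```

Repeating this for every feature and every cluster would give one p-value per feature and cluster pair, which in practice calls for a multiple-testing correction such as p.adjust.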
This dataset is a data frame of metabolites that contains only chemical information.
data(HFD)
A list with the following elements in the variable HFD:
SMILES | A vector of the SMILES representation of each metabolite. |
CHEMICAL_ID | A vector of the chemical ID number of each metabolite. |
PUBCHEM | A vector of identifiers from the PubChem database of chemical molecules and their activities in biological assays. |
CHEMSPIDER | A vector of the unique ChemSpider database identifier of each molecule. |
HMDB | A vector containing the HMDB ID of each metabolite. |
data(HFD)
This function calculates the structural similarity between metabolites and performs hierarchical clustering using the KODAMA algorithm.
KODAMA.chem.sim(smiles, d=NULL, k=50, dissimilarity.parameters=list(), kodama.matrix.parameters=list(), kodama.visualization.parameters=list(), hclust.parameters=list(method="ward.D"))
smiles | A list of SMILES notations for the metabolites in the study dataset. |
d | A distance structure such as that returned by dist, or a full symmetric matrix containing the dissimilarities. If NULL (default), the dissimilarity matrix is generated by chemical.dissimilarity. |
k | The number of components of the multidimensional scaling. |
dissimilarity.parameters | Optional parameters for chemical.dissimilarity. |
kodama.matrix.parameters | Optional parameters for KODAMA.matrix. |
kodama.visualization.parameters | Optional parameters for KODAMA.visualization. |
hclust.parameters | Optional parameters for hclust. |
A list containing all results of the KODAMA chemical similarity analysis and of the hierarchical clustering on the KODAMA dimensions.
data(Metabolites)
res=KODAMA.chem.sim(Metabolites$SMILES)
plot(res$kodama$visualization)
This dataset consists of a list of metabolites, as returned by the function readMet, and the concentration values of each metabolite.
data(Metabolites)
A list with the following elements in the variable Metabolites:
concentration | A matrix containing the concentration of each metabolite. |
name | A vector of metabolite names. |
SMILES | A vector of the SMILES representation of each metabolite. |
HMDB | A vector containing the HMDB ID of each metabolite. |
readMet | A list of metabolite information produced by readMet. |
data(Metabolites)
This function extracts the metabolite names from the list generated by the readMet function.
nameMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the name of each metabolite.
data(Metabolites)
nam=nameMet(Metabolites$readMet)
This function finds the pathways related to each metabolite.
pathwaysMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the pathways associated with each metabolite.
See also: readMet, taxonomyMet, enzymesMet, diseasesMet
data(Metabolites)
pat=pathwaysMet(Metabolites$readMet)
This function finds the physical properties of the metabolites.
propertiesMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the properties associated with each metabolite.
See also: readMet, taxonomyMet, substituentsMet, propertiesMet
data(Metabolites)
pro=propertiesMet(Metabolites$readMet)
This function extracts the MetaboCards of the metabolites in your dataset from the http://www.hmdb.ca/metabolites/ database and stores all of this information in a list.
readMet(ID, address = c("http://www.hmdb.ca/metabolites/"),remove=TRUE)
ID | A vector containing the HMDB codes (i.e., metabolite IDs) of the metabolites in the dataset. |
address | Optional address where the MetaboCards are located. The default address is http://www.hmdb.ca/metabolites/. |
remove | A logical value. If TRUE, missing and wrong HMDB IDs are removed. |
A list containing all the information related to the MetaboCards.
ID=c("HMDB0000122","HMDB0000124","HMDB0000243","HMDB0000263")
doc=readMet(ID)
This function selects metabolites from the list generated by the readMet function.
selectionMet(doc, sel)
doc | A list of metabolite information produced by readMet. |
sel | A vector of the HMDB codes of the metabolites to be selected. |
doc | A doc list containing only the selected metabolites. |
data(Metabolites)
doc=selectionMet(Metabolites$readMet,c("HMDB0000299","HMDB0000881"))
nameMet(doc)
This function finds the substituents of each metabolite.
substituentsMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the substituents of each metabolite.
See also: readMet, nameMet, propertiesMet
data(Metabolites)
sub=substituentsMet(Metabolites$readMet)
This function finds the taxonomy of each metabolite.
taxonomyMet(doc)
doc | A list of metabolite information produced by readMet. |
A data frame containing the taxonomy of each metabolite.
See also: readMet, propertiesMet, enzymesMet, diseasesMet
data(Metabolites)
tax=taxonomyMet(Metabolites$readMet)
This function helps to estimate the optimal number of clusters for the metabolite dataset. It applies different algorithms for calculating the optimal number of clusters to cut the clustering tree produced by the hclust function, and returns a list containing the index corresponding to each number of clusters.
tree.cutting(res,max_nc=20)
res | A list produced by KODAMA.chem.sim. |
max_nc | The maximum number of clusters (default = 20). |
A list containing, for each clustering, the calculation of Rousseeuw's silhouette index.
data(Metabolites)
res=KODAMA.chem.sim(Metabolites$SMILES)
clu=tree.cutting(res,max_nc = 30)
plot(clu$min_nc:clu$max_nc,clu$res.S)
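The silhouette-based choice of the number of clusters can be sketched in base R as follows. The helper mean_silhouette is hypothetical and the data are simulated; this only illustrates scanning candidate cluster numbers on an hclust tree and is not the package's implementation.

```r
# Hypothetical helper: average silhouette width for a given partition,
# computed from a distance object `d` and integer cluster `labels`.
mean_silhouette <- function(d, labels) {
  d <- as.matrix(d)
  n <- nrow(d)
  widths <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    if (sum(own) == 1) {
      widths[i] <- 0  # singleton clusters get width 0 by convention
      next
    }
    # a: mean distance to the other members of the same cluster
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    others <- setdiff(unique(labels), labels[i])
    b <- min(vapply(others, function(g) mean(d[i, labels == g]), numeric(1)))
    widths[i] <- (b - a) / max(a, b)
  }
  mean(widths)
}

# Simulated data with two well-separated groups.
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
hh <- hclust(dist(x), method = "ward.D")

# Score each candidate number of clusters and keep the best one.
ks <- 2:6
scores <- vapply(ks, function(k) mean_silhouette(dist(x), cutree(hh, k)), numeric(1))
best_k <- ks[which.max(scores)]
```

Plotting scores against ks gives the same kind of curve as the plot call in the example above, with the peak marking the suggested number of clusters.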
Summarizes the metabolite concentrations in each of the identified clusters, using the module eigenvalue (eigen-metabolite) to calculate module membership measures.
WMCSA(data,cl)
data | A dataset of the concentrations of the different metabolites in the different samples. |
cl | The output of the allbranches function. |
This function returns a matrix representing the similarity score of the metabolites within the same module across the different samples.
data(Metabolites)
SMILES=Metabolites$SMILES
names(SMILES)=Metabolites$name
res=KODAMA.chem.sim(SMILES)
cl=allbranches(res$hclust)
ww=WMCSA(Metabolites$concentration,cl)
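One common way to obtain a module "eigen-metabolite" is to take the first principal component of the scaled concentrations of the module members. The sketch below follows that idea on simulated data; the helper module_eigen is hypothetical, samples are assumed to be in rows and metabolites in columns, and WMCSA may compute its scores differently.

```r
# Hypothetical helper: first principal component of the scaled
# concentrations of the metabolites (columns) in one module.
module_eigen <- function(data, module) {
  sub <- scale(data[, module, drop = FALSE])
  pc <- prcomp(sub, center = FALSE)$x[, 1]
  # The sign of a principal component is arbitrary; orient it so that it
  # increases with the average scaled concentration of the module.
  if (cor(pc, rowMeans(sub)) < 0) pc <- -pc
  pc
}

# Simulated concentrations: 10 samples (rows) x 6 metabolites (columns).
set.seed(7)
conc <- matrix(rlnorm(60), nrow = 10,
               dimnames = list(paste0("s", 1:10), paste0("m", 1:6)))

# Two made-up modules, given as column indices.
modules <- list(mod1 = 1:3, mod2 = 4:6)

# One row per sample, one column per module.
scores <- sapply(modules, function(m) module_eigen(conc, m))
```

The resulting scores matrix has one value per sample and module, which can then be used like any other sample-level variable.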
This function writes a file in the CLS format defined by GenePattern.
write.cls(es, address)
es | A matrix. |
address | The path where the file should be saved. |
No return value. If an invalid address is given, the function generates an error.
This function writes a file in the GCT format defined by GenePattern.
write.gct(es, address)
es | A matrix. |
address | The path where the file should be saved. |
No return value. If an invalid address is given, the function generates an error.
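For orientation, a GCT file starts with the version line #1.2, then a line giving the number of rows and columns, then a header row, then one row per feature. The base R sketch below writes that layout; write_gct_sketch is a hypothetical stand-in and not the package's write.gct, and filling the Description column with "na" is an assumption.

```r
# Hypothetical helper, NOT the package's write.gct(): write a numeric
# matrix (features in rows, samples in columns) in GCT 1.2 layout.
write_gct_sketch <- function(es, path) {
  stopifnot(is.matrix(es), !is.null(rownames(es)), !is.null(colnames(es)))
  con <- file(path, "w")
  on.exit(close(con))
  writeLines("#1.2", con)
  writeLines(paste(nrow(es), ncol(es), sep = "\t"), con)
  writeLines(paste(c("Name", "Description", colnames(es)), collapse = "\t"), con)
  # The Description column is filled with "na" (an assumption of this sketch).
  body <- cbind(rownames(es), "na", es)
  write.table(body, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Made-up matrix: 2 metabolites x 3 samples.
es <- matrix(1:6, nrow = 2,
             dimnames = list(c("m1", "m2"), c("s1", "s2", "s3")))
gct_path <- tempfile(fileext = ".gct")
write_gct_sketch(es, gct_path)
gct_lines <- readLines(gct_path)
```

Writing to a temporary file keeps the example free of side effects; in practice the path would point to the intended output location.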
This function writes a file containing the metabolite set information in the GMT format defined by GenePattern.
write.gmt(sub,address,min_entry=2,max_entry=50)
sub | A matrix. |
address | The path where the file should be saved. |
min_entry | The minimum number of metabolites for each metabolite set. |
max_entry | The maximum number of metabolites for each metabolite set. |
No return value. If an invalid address is given, the function generates an error.
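In a GMT file each line holds one set: its name, a description, and its members, separated by tabs. The base R sketch below applies the size filter described by min_entry and max_entry to a named list of sets. The helper write_gmt_sketch and the list input are hypothetical; write.gmt itself takes a matrix, and how that matrix is turned into sets is not shown here.

```r
# Hypothetical helper, NOT the package's write.gmt(): write a named list
# of metabolite sets in GMT layout, keeping only sets whose size lies
# between min_entry and max_entry.
write_gmt_sketch <- function(sets, path, min_entry = 2, max_entry = 50) {
  sizes <- lengths(sets)
  keep <- sets[sizes >= min_entry & sizes <= max_entry]
  gmt <- vapply(names(keep), function(nm) {
    # Set name, a placeholder description, then the members.
    paste(c(nm, "na", keep[[nm]]), collapse = "\t")
  }, character(1))
  writeLines(gmt, path)
}

# Made-up sets: the single-member set is dropped by the default filter.
sets <- list(
  amines = c("HMDB0000001", "HMDB0000002", "HMDB0000003"),
  lonely = c("HMDB0000004"),
  acids  = c("HMDB0000005", "HMDB0000006")
)
gmt_path <- tempfile(fileext = ".gmt")
write_gmt_sketch(sets, gmt_path)
gmt_lines <- readLines(gmt_path)
```

The HMDB codes above are placeholders chosen only to show the layout.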