Title: | Interface to the 'CDK' Libraries |
---|---|
Description: | Allows the user to access functionality in the 'CDK', a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the 'CDK' API allows the user to view structures in 2D. |
Authors: | Rajarshi Guha [aut, cph], Zachary Charlop-Powers [cre], Emma Schymanski [ctb] |
Maintainer: | Zachary Charlop-Powers <[email protected]> |
License: | LGPL |
Version: | 3.8.1 |
Built: | 2024-10-24 06:52:00 UTC |
Source: | CRAN |
get.symbol
returns the chemical symbol for an atom
get.point3d
returns the 3D coordinates of the atom
get.point2d
returns the 2D coordinates of the atom
get.atomic.number
returns the atomic number of the atom
get.hydrogen.count
returns the number of implicit H's on the atom. Depending on where the molecule was read from this may be NULL or an integer greater than or equal to 0
get.charge
returns the partial charge on the atom. If charges have not been set the return value is NULL, otherwise the appropriate charge.
get.formal.charge
returns the formal charge on the atom. By default the formal charge will be 0 (i.e., NULL is never returned)
is.aromatic
returns TRUE if the atom is aromatic, FALSE otherwise
is.aliphatic
returns TRUE if the atom is part of an aliphatic chain, FALSE otherwise
is.in.ring
returns TRUE if the atom is in a ring, FALSE otherwise
get.atom.index
returns the index of the atom in the molecule (starting from 0)
get.connected.atoms
returns a list of atoms that are connected to the specified atom
get.symbol(atom)
get.point3d(atom)
get.point2d(atom)
get.atomic.number(atom)
get.hydrogen.count(atom)
get.charge(atom)
get.formal.charge(atom)
get.connected.atoms(atom, mol)
get.atom.index(atom, mol)
is.aromatic(atom)
is.aliphatic(atom)
is.in.ring(atom)
set.atom.types(mol)
atom |
A jobjRef representing an IAtom object |
mol |
A jobjRef representing an IAtomContainer object |
In the case of get.point3d the return value is a 3-element vector containing the X, Y and Z coordinates of the atom. If the atom does not have 3D coordinates, it returns a vector of the form c(NA,NA,NA). Similarly for get.point2d, in which case the return vector is of length 2.
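As a rough illustration of the accessors listed above, the following sketch parses a small molecule and queries each of its atoms. It assumes the package is installed with a working Java and 'rJava' setup; the SMILES string and the variable names are arbitrary choices made for this example.

```r
library(rcdk)

# parse.smiles returns a list of molecules, so take the first element
mol <- parse.smiles("CCO")[[1]]
atoms <- get.atoms(mol)

# Element symbol and 0-based index of every atom in the molecule
symbols <- sapply(atoms, get.symbol)
indices <- sapply(atoms, get.atom.index, mol = mol)

# Implicit hydrogen count per atom (may be NULL for some input sources,
# so keep the result as a list rather than simplifying it)
h.counts <- lapply(atoms, get.hydrogen.count)

print(symbols)
print(indices)
```

Because several accessors can return NULL, 'lapply' is often a safer choice than 'sapply' when collecting their results.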
Rajarshi Guha ([email protected])
A dataset containing the structures and associated boiling points for 277 molecules, primarily alkanes and substituted alkanes.
bpdata
A data frame with 277 rows and 2 columns:
Structure in SMILES format
Boiling point in Kelvin
The names of the molecules are used as the row names.
Goll, E.S. and Jurs, P.C.; "Prediction of the Normal Boiling Points of Organic Compounds From Molecular Structures with a Computational Neural Network Model", J. Chem. Inf. Comput. Sci., 1999, 39, 974-983.
Get the current CDK version used in the package.
cdk.version()
Returns a character containing the version of the CDK used in this package
Rajarshi Guha ([email protected])
This class handles molecular formulae. It provides extra information such as the IMolecularFormula Java object, elements contained and number of them.
Objects can be created using the 'new' constructor and filled with a specific mass and window accuracy.
Miguel Rojas-Cherto ([email protected])
A parallel effort to expand the Chemistry Development Kit: https://cdk.github.io/
get.formula
set.charge.formula
get.isotopes.pattern
isvalid.formula
Computes a similarity score between two different isotope abundance patterns.
compare.isotope.pattern(iso1, iso2, ips = NULL)
iso1 |
The first isotope pattern, which should be a |
iso2 |
The second isotope pattern, which should be a |
ips |
An instance of the |
A numeric value between 0 and 1 indicating the similarity between the two patterns
Miguel Rojas Cherto
http://cdk.github.io/cdk/2.3/docs/api/org/openscience/cdk/formula/IsotopePatternSimilarity.html
get.isotope.pattern.similarity
In some cases, a molecule may not have any hydrogens (such as when read in from an MDL MOL file that did not have hydrogens or SMILES with no explicit hydrogens). In such cases, this method will add implicit hydrogens and then convert them to explicit ones. The newly added H's will not have any 2D or 3D coordinates associated with them. Ensure that the molecule has been typed beforehand.
convert.implicit.to.explicit(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Rajarshi Guha ([email protected])
get.hydrogen.count, remove.hydrogens, set.atom.types
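The effect of converting implicit hydrogens to explicit ones can be checked by comparing atom counts before and after the call. The sketch below is illustrative only: it assumes a working 'rJava' setup, relies on parse.smiles having already typed the atoms and assigned implicit hydrogen counts, and uses ethanol as an arbitrary example.

```r
library(rcdk)

mol <- parse.smiles("CCO")[[1]]

# Before conversion only the heavy atoms are present in the container
n.heavy <- get.atom.count(mol)
n.h <- get.total.hydrogen.count(mol)

# The molecule is modified in place; the new H atoms carry no coordinates
convert.implicit.to.explicit(mol)
n.all <- get.atom.count(mol)

print(c(heavy = n.heavy, hydrogens = n.h, all = n.all))
```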
Generate an image and make it available to the system clipboard.
copy.image.to.clipboard(molecule, depictor = NULL)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
depictor |
Optional. Default |
Detect aromaticity of an input compound
do.aromaticity(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Configure isotopes
do.isotopes(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Compute descriptors for each atom in a molecule
eval.atomic.desc(molecule, which.desc, verbose = FALSE)
molecule |
A molecule object |
which.desc |
A character vector of atomic descriptor class names |
verbose |
Optional. Default |
A 'data.frame' with atoms in the rows and descriptors in the columns
Rajarshi Guha ([email protected])
Compute descriptor values for a set of molecules
eval.desc(molecules, which.desc, verbose = FALSE)
molecules |
A 'list' of molecule objects |
which.desc |
A character vector listing descriptor class names |
verbose |
If 'TRUE', verbose output |
A 'data.frame' with molecules in the rows and descriptors in the columns. If a descriptor value cannot be computed for a molecule, 'NA' is returned.
Rajarshi Guha ([email protected])
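A minimal sketch of descriptor evaluation, combining eval.desc with get.desc.names. It assumes a working 'rJava' setup; the three SMILES strings and the choice of the "constitutional" category are arbitrary for this example.

```r
library(rcdk)

# A small set of molecules to describe
mols <- parse.smiles(c("CCC", "c1ccccc1", "CC(=O)O"))

# Class names for one descriptor category
desc.names <- get.desc.names("constitutional")

# One row per molecule, one or more columns per descriptor class
descs <- eval.desc(mols, desc.names)
print(dim(descs))
```

Restricting which.desc to a single category keeps the run short; passing the result of get.desc.names("all") computes every available descriptor and takes correspondingly longer.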
Some file formats such as SMILES do not support 2D (or 3D) coordinates for the atoms. Other formats such as SD or MOL have support for coordinates but may not include them. This method will generate reasonable 2D coordinates based purely on connectivity information, overwriting any existing coordinates if present.
generate.2d.coordinates(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Note that when depicting a molecule (view.molecule.2d), 2D coordinates are generated, but since it does not modify the input molecule, we do not have access to the generated coordinates.
The input molecule, with 2D coordinates added
Rajarshi Guha ([email protected])
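The following sketch shows one way to obtain a coordinate matrix for a molecule read from SMILES, which carries no coordinates of its own. It assumes a working 'rJava' setup; the molecule and variable names are arbitrary.

```r
library(rcdk)

mol <- parse.smiles("CCO")[[1]]

# A molecule parsed from SMILES has no coordinates yet
before <- get.point2d(get.atoms(mol)[[1]])

# Generate a layout from connectivity and keep the returned molecule
mol2d <- generate.2d.coordinates(mol)

# Stack the per-atom (X, Y) vectors into an atoms-by-2 matrix
coords <- do.call(rbind, lapply(get.atoms(mol2d), get.point2d))
print(coords)
```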
generate.formula
generate.formula(
  mass,
  window = 0.01,
  elements = list(c("C", 0, 50), c("H", 0, 50), c("N", 0, 50), c("O", 0, 50), c("S", 0, 50)),
  validation = FALSE,
  charge = 0
)
mass |
Required. Mass. |
window |
Optional. Default |
elements |
Optional. Default |
validation |
Optional. Default |
charge |
Optional. Default |
Generate a list of possible formula objects given a mass and a mass tolerance.
generate.formula.iter(
  mass,
  window = 0.01,
  elements = list(c("C", 0, 50), c("H", 0, 50), c("N", 0, 50), c("O", 0, 50), c("S", 0, 50)),
  validation = FALSE,
  charge = 0,
  as.string = TRUE
)
mass |
Required. Mass. |
window |
Optional. Default |
elements |
Optional. Default |
validation |
Optional. Default |
charge |
Optional. Default |
as.string |
Optional. Default |
The adjacency matrix for a molecule with $N$ non-hydrogen atoms is an $N \times N$ matrix where the element [$i$, $j$] is set to 1 if atoms $i$ and $j$ are connected by a bond, otherwise set to 0.
get.adjacency.matrix(mol)
mol |
A |
A numeric matrix
Rajarshi Guha [email protected]
m <- parse.smiles("CC=C")[[1]]
get.adjacency.matrix(m)
Compute ALogP for a molecule
get.alogp(molecule)
molecule |
A molecule object |
A double value representing the ALogP value
Rajarshi Guha ([email protected])
Get the number of atoms in the molecule.
get.atom.count(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
An integer representing the number of atoms in the molecule
Rajarshi Guha ([email protected])
Get the index of an atom in a molecule.
get.atom.index(atom, mol)
atom |
The atom object |
mol |
The 'IAtomContainer' object containing the atom |
Access the index of an atom in the context of an IAtomContainer. Indexing starts from 0. If the index is not known, -1 is returned.
An integer representing the atom index.
Rajarshi Guha ([email protected])
Get class names for atomic descriptors
get.atomic.desc.names(type = "all")
type |
A string indicating which class of descriptors to return. Specifying '"all"' will return class names for all molecular descriptors. Options include: topological, geometrical, hybrid, constitutional, protein, electronic |
A character vector containing class names for atomic descriptors
Rajarshi Guha ([email protected])
Get the atomic number of the atom.
get.atomic.number(atom)
atom |
The atom to query |
An integer representing the atomic number
Rajarshi Guha ([email protected])
Get the atoms from a molecule or bond.
get.atoms(object)
object |
A 'jobjRef' representing either a molecule ('IAtomContainer') or bond ('IBond') object. |
A list of 'jobjRef' representing the 'IAtom' objects in the molecule or bond
Rajarshi Guha ([email protected])
get.bonds, get.connected.atoms
This function returns a Java enum representing a bond order. This can be used to modify the order of pre-existing bonds
get.bond.order(order = "single")
order |
A character vector that can be one of single, double, triple, quadruple, quintuple, sextuple or unset. Case is ignored |
A jobjRef representing an 'Order' enum object
Rajarshi Guha ([email protected])
## Not run:
m <- parse.smiles('CCN')[[1]]
b <- get.bonds(m)[[1]]
b$setOrder(get.bond.order("double"))
## End(Not run)
Get the bonds in a molecule.
get.bonds(mol)
mol |
A 'jobjRef' representing the molecule ('IAtomContainer') object. |
A list of 'jobjRef' representing the bonds ('IBond') objects in the molecule
Rajarshi Guha ([email protected])
get.atoms, get.connected.atoms
Get the charge on the atom.
get.charge(atom)
atom |
The atom to query |
This method returns the partial charge on the atom. If charges have not been set the return value is NULL, otherwise the appropriate charge.
A numeric representing the partial charge. If charges have not been set, 'NULL' is returned
Rajarshi Guha ([email protected])
The CDK employs a builder design pattern to construct instances of new chemical objects (e.g., atoms, bonds, parsers and so on). Many methods require an instance of a builder object to function. While most functions in this package handle this internally, it is useful to be able to get an instance of a builder object when directly working with the CDK API via 'rJava'.
get.chem.object.builder()
This method returns an instance of the SilentChemObjectBuilder. Note that this is a static object that is created at package load time, and the same instance is returned whenever this function is called.
An instance of SilentChemObjectBuilder
Rajarshi Guha ([email protected])
This function returns the atom that is connected to a specified atom in a specified bond. Note that this function assumes 2-atom bonds, mainly because the CDK does not currently support other types of bonds.
get.connected.atom(bond, atom)
bond |
A |
atom |
A |
A jobjRef representing an 'IAtom' object
Rajarshi Guha ([email protected])
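A short sketch of how get.connected.atom might be used: given a bond and one of its atoms, it returns the atom at the other end. The example assumes a working 'rJava' setup and uses methylamine (one bond between heavy atoms) so that the result is unambiguous; variable names are arbitrary.

```r
library(rcdk)

# Methylamine: a single bond between C and N
m <- parse.smiles("CN")[[1]]
bond <- get.bonds(m)[[1]]

# Pick one atom of the bond and ask for its partner
first <- get.atoms(bond)[[1]]
other <- get.connected.atom(bond, first)

pair <- sort(c(get.symbol(first), get.symbol(other)))
print(pair)
```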
Get atoms connected to the specified atom
get.connected.atoms(atom, mol)
atom |
The atom object |
mol |
The 'IAtomContainer' object containing the atom |
Returns a 'list' of atoms that are connected to the specified atom.
A 'list' containing 'IAtom' objects, representing the atoms directly connected to the specified atom
Rajarshi Guha ([email protected])
The connection matrix for a molecule with $N$ non-hydrogen atoms is an $N \times N$ matrix where the element [$i$, $j$] is set to the bond order if atoms $i$ and $j$ are connected by a bond, otherwise set to 0.
get.connection.matrix(mol)
mol |
A |
A numeric matrix
Rajarshi Guha [email protected]
m <- parse.smiles("CC=C")[[1]]
get.connection.matrix(m)
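To make the difference between the adjacency and connection matrices concrete, the sketch below computes both for propene. It assumes a working 'rJava' setup; the checks at the end follow from the definitions (a symmetric matrix, with the double bond recorded as 2 only in the connection matrix).

```r
library(rcdk)

# Propene: one single bond and one double bond
m <- parse.smiles("CC=C")[[1]]

adj <- get.adjacency.matrix(m)     # 1 where two atoms share a bond
conn <- get.connection.matrix(m)   # bond order where two atoms share a bond

# Each bond appears twice in a symmetric matrix
n.bonds <- sum(adj) / 2

print(adj)
print(conn)
print(n.bonds)
```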
Return an RcdkDepictor.
get.depictor(
  width = 200,
  height = 200,
  zoom = 1.3,
  style = "cow",
  annotate = "off",
  abbr = "on",
  suppressh = TRUE,
  showTitle = FALSE,
  smaLimit = 100,
  sma = NULL,
  fillToFit = FALSE
)
width |
Default. |
height |
Default. |
zoom |
Default. |
style |
Default. |
annotate |
Default. |
abbr |
Default. |
suppressh |
Default. |
showTitle |
Default. |
smaLimit |
Default. |
sma |
Default. |
fillToFit |
Default. |
List available descriptor categories
get.desc.categories()
A character vector listing available descriptor categories. This can be used in get.desc.names
Rajarshi Guha ([email protected])
Get descriptor class names
get.desc.names(type = "all")
type |
A string indicating which class of descriptors to return. Specifying '"all"' will return class names for all molecular descriptors. Options include: topological, geometrical, hybrid, constitutional, protein, electronic |
Rajarshi Guha ([email protected])
Supported element types are:
a central atom involved in a cumulated system (not yet supported)
an atom at one end of a geometric (double-bond) stereo bond or cumulated system
a tetrahedral atom (could also be square planar in future)
the atom is not a (supported) stereo element type
get.element.types(mol)
mol |
A |
A factor of length equal to the number of atoms, indicating the element type
Rajarshi Guha [email protected]
get.stereocenters, get.stereo.types
get.exact.mass
get.exact.mass(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Fragment the input molecule using an exhaustive fragmentation scheme
get.exhaustive.fragments(mols, min.frag.size = 6, as.smiles = TRUE)
mols |
A list of 'jobjRef' objects of Java class 'IAtomContainer' |
min.frag.size |
The smallest fragment to consider (in terms of heavy atoms) |
as.smiles |
If 'TRUE' return the fragments as SMILES strings. If not, then fragments are returned as 'jobjRef' objects |
A variety of methods for fragmenting molecules are available, ranging from exhaustive and ring-based methods to more specific ones such as Murcko frameworks. Fragmenting a collection of molecules can be useful for a variety of analyses. In addition, fragment-based analysis can be a useful and faster alternative to traditional clustering of the whole collection, especially when it is large.
Note that exhaustive fragmentation of large molecules (with many single bonds) can become time consuming.
Returns a list of length equal to the number of input molecules. Each element is a character vector of SMILES strings or a list of 'jobjRef' objects.
Rajarshi Guha ([email protected])
get.murcko.fragments
mol <- parse.smiles('c1ccc(cc1)CN(c2cc(ccc2[N+](=O)[O-])c3c(nc(nc3CC)N)N)C')[[1]]
mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=TRUE)
mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=FALSE)
'get.fingerprint' returns a 'fingerprint' object representing molecular fingerprint of the input molecule.
get.fingerprint(
  molecule,
  type = "standard",
  fp.mode = "bit",
  depth = 6,
  size = 1024,
  substructure.pattern = character(),
  circular.type = "ECFP6",
  verbose = FALSE
)
molecule |
A |
type |
The type of fingerprint. Possible values are:
|
fp.mode |
The style of fingerprint. Specifying "'bit'" will return a binary fingerprint, "'raw'" returns the original representation (usually a sequence of integers) and "'count'" returns the fingerprint as a sequence of counts. |
depth |
The search depth. This argument is ignored for the 'pubchem', 'maccs', 'kr' and 'estate' fingerprints |
size |
The final length of the fingerprint. This argument is ignored for the 'pubchem', 'maccs', 'kr', 'signature', 'circular' and 'estate' fingerprints |
substructure.pattern |
List of characters containing the SMARTS patterns to match. If an empty list is provided (default), then the functional group substructures (default in CDK) are used. |
circular.type |
Name of the circular fingerprint type that should be computed given as string. Possible values are: 'ECFP0', 'ECFP2', 'ECFP4', 'ECFP6' (default), 'FCFP0', 'FCFP2', 'FCFP4' and 'FCFP6'. |
verbose |
Verbose output if |
An S4 object of class fingerprint-class or featvec-class, which can be manipulated with the fingerprint package.
Rajarshi Guha ([email protected])
## get some molecules
sp <- get.smiles.parser()
smiles <- c('CCC', 'CCN', 'CCN(C)(C)', 'c1ccccc1Cc1ccccc1','C1CCC1CC(CN(C)(C))CC(=O)CC')
mols <- parse.smiles(smiles)

## get a single fingerprint using the standard
## (hashed, path based) fingerprinter
fp <- get.fingerprint(mols[[1]])

## get MACCS keys for all the molecules
fps <- lapply(mols, get.fingerprint, type='maccs')

## get Signature fingerprint
## feature, count fingerprinter
fps <- lapply(mols, get.fingerprint, type='signature', fp.mode='raw')

## get Substructure fingerprint for functional group fragments
fps <- lapply(mols, get.fingerprint, type='substructure')

## get Substructure count fingerprint for user defined fragments
mol1 <- parse.smiles("c1ccccc1CCC")[[1]]
smarts <- c("c1ccccc1", "[CX4H3][#6]", "[CX2]#[CX2]")
fps <- get.fingerprint(mol1, type='substructure', fp.mode='count', substructure.pattern=smarts)

## get ECFP0 count fingerprints
mol2 <- parse.smiles("C1=CC=CC(=C1)CCCC2=CC=CC=C2")[[1]]
fps <- get.fingerprint(mol2, type='circular', fp.mode='count', circular.type='ECFP0')
Get the formal charge on the atom.
get.formal.charge(atom)
atom |
The atom to query |
By default the formal charge will be 0 (i.e., NULL is never returned).
An integer representing the formal charge
Rajarshi Guha ([email protected])
Obtain a molecular formula object from a formula string
get.formula(mf, charge = 0)
mf |
Required. Molecular formula |
charge |
Optional. Default |
Get the implicit hydrogen count for the atom.
get.hydrogen.count(atom)
atom |
The atom to query |
This method returns the number of implicit H's on the atom. Depending on where the molecule was read from this may be NULL or an integer greater than or equal to 0
An integer representing the hydrogen count
Rajarshi Guha ([email protected])
Constructs an instance of the CDK IsotopePatternGenerator, with an optional minimum abundance specified. This object can be used to generate all combinatorial chemical isotopes given a structure.
get.isotope.pattern.generator(minAbundance = NULL)
minAbundance |
The minimum abundance |
A jobjRef corresponding to an instance of IsotopePatternGenerator
Miguel Rojas Cherto
http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/formula/IsotopePatternGenerator.html
A method that returns an instance of the CDK IsotopePatternSimilarity
class which can be used to compute similarity scores between pairs of
isotope abundance patterns.
get.isotope.pattern.similarity(tol = NULL)
tol |
The tolerance |
A jobjRef corresponding to an instance of IsotopePatternSimilarity
Miguel Rojas Cherto
http://cdk.github.io/cdk/1.5/docs/api/org/openscience/cdk/formula/IsotopePatternSimilarity.html
Generate the isotope pattern given a formula class
get.isotopes.pattern(formula, minAbund = 0.1)
formula |
Required. A CDK molecule formula |
minAbund |
Optional. Default |
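A hedged sketch of how get.formula and get.isotopes.pattern might be chained: a formula object is built from a string and then passed to the pattern generator. It assumes a working 'rJava' setup; the glucose formula is an arbitrary choice, and the assumption that the result is a two-column matrix (mass and abundance) reflects the package behaviour as understood here rather than anything stated in this page.

```r
library(rcdk)

# Build a formula object from a formula string (neutral glucose)
f <- get.formula("C6H12O6", charge = 0)

# Isotope pattern, dropping peaks below the minimum abundance
iso <- get.isotopes.pattern(f, minAbund = 0.1)

# Assumed layout: one row per isotope peak
print(iso)
```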
A molecule may be represented as a disconnected graph, such as when read in as a salt form. This method will return the largest connected component or, if there is only a single component (i.e., the molecular graph is fully connected), that component is returned.
get.largest.component(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
The largest component as an 'IAtomContainer' object or else the input molecule itself
Rajarshi Guha ([email protected])
m <- parse.smiles("CC.CCCCCC.CCCC")[[1]]
largest <- get.largest.component(m)
length(get.atoms(largest)) == 6
get.mcs
get.mcs(mol1, mol2, as.molecule = TRUE)
mol1 |
Required. First molecule to compare. Should be a 'jobjRef' representing an 'IAtomContainer' |
mol2 |
Required. Second molecule to compare. Should be a 'jobjRef' representing an 'IAtomContainer' |
as.molecule |
Optional. Default |
get.mol2formula
get.mol2formula(molecule, charge = 0)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
charge |
Optional. Default |
Fragment the input molecule using the Bemis-Murcko scheme
get.murcko.fragments(
  mols,
  min.frag.size = 6,
  as.smiles = TRUE,
  single.framework = FALSE
)
mols |
A list of 'jobjRef' objects of Java class 'IAtomContainer' |
min.frag.size |
The smallest fragment to consider (in terms of heavy atoms) |
as.smiles |
If 'TRUE' return the fragments as SMILES strings. If not, then fragments are returned as 'jobjRef' objects |
single.framework |
If 'TRUE', then a single framework (i.e., the framework consisting of the union of all ring systems and linkers) is returned for each molecule. Otherwise, all combinations of ring systems and linkers are returned |
A variety of methods for fragmenting molecules are available, ranging from exhaustive and ring-based methods to more specific ones such as Murcko frameworks. Fragmenting a collection of molecules can be useful for a variety of analyses. In addition, fragment-based analysis can be a useful and faster alternative to traditional clustering of the whole collection, especially when it is large.
Note that exhaustive fragmentation of large molecules (with many single bonds) can become time consuming.
Returns a list with each element being a list with two elements: 'rings' and 'frameworks'. Each of these elements is either a character vector of SMILES strings or a list of 'IAtomContainer' objects.
Rajarshi Guha ([email protected])
get.exhaustive.fragments
mol <- parse.smiles('c1ccc(cc1)CN(c2cc(ccc2[N+](=O)[O-])c3c(nc(nc3CC)N)N)C')[[1]]
mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=TRUE)
mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=FALSE)
get.natural.mass
get.natural.mass(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Get the 2D coordinates of the atom.
get.point2d(atom)
atom |
The atom to query |
If coordinates are unavailable (e.g., the molecule was read in from a SMILES file) or have not been generated yet, 'NA's are returned for the X & Y coordinates.
A 2-element numeric vector representing the X & Y coordinates.
Rajarshi Guha ([email protected])
## Not run:
atoms <- get.atoms(mol)
## iterate over the atoms (the original passed 'apply' here by mistake)
coords <- do.call('rbind', lapply(atoms, get.point2d))
## End(Not run)
Get the 3D coordinates of the atom.
get.point3d(atom)
atom |
The atom to query |
If coordinates are unavailable (e.g., the molecule was read in from a SMILES file) or have not been generated yet, 'NA's are returned for the X, Y and Z coordinates.
A 3-element numeric vector representing the X, Y and Z coordinates.
Rajarshi Guha ([email protected])
## Not run:
atoms <- get.atoms(mol)
## iterate over the atoms (the original passed 'apply' here by mistake)
coords <- do.call('rbind', lapply(atoms, get.point3d))
## End(Not run)
In this context a property is a value associated with a key and stored with the molecule. This method returns a list of all the properties of a molecule. The names of the list are set to the property names.
get.properties(molecule)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
A named 'list' with the property values. Element names are the keys for each property. If no properties have been defined, an empty list.
Rajarshi Guha ([email protected])
set.property, get.property, remove.property
mol <- parse.smiles("CC1CC(C=O)CCC1")[[1]]
set.property(mol, 'prop1', 23.45)
set.property(mol, 'prop2', 'inactive')
get.properties(mol)
This function retrieves the value of a keyed property that has previously been set on the molecule. Properties enable us to associate arbitrary pieces of data with a molecule. Such data can be text, numeric or a Java object (represented as a 'jobjRef').
get.property(molecule, key)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
key |
The property key as a character string |
The value of the property. If there is no property with the specified key, 'NA' is returned
Rajarshi Guha ([email protected])
mol <- parse.smiles("CC1CC(C=O)CCC1")[[1]]
set.property(mol, 'prop1', 23.45)
set.property(mol, 'prop2', 'inactive')
get.property(mol, 'prop1')
The function will generate a SMILES representation of an 'IAtomContainer' object. The default parameters of the CDK SMILES generator are used. This can mean that for large ring systems the method may fail. See CDK Javadocs for more information
get.smiles(molecule, flavor = smiles.flavors(c("Generic")), smigen = NULL)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
flavor |
The type of SMILES to generate. See |
smigen |
A pre-existing SMILES generator object. By default, a new one is created from the specified flavor |
A character string containing the generated SMILES
Rajarshi Guha ([email protected])
m <- parse.smiles('C1C=CCC1N(C)c1ccccc1')[[1]]
get.smiles(m)
get.smiles(m, smiles.flavors(c('Generic','UseAromaticSymbols')))
This function returns a reference to a SMILES parser object. If you are parsing multiple SMILES strings using multiple calls to parse.smiles, it is preferable to create your own parser and supply it to parse.smiles rather than forcing that function to instantiate a new parser for each call.
get.smiles.parser()
A 'jobjRef' object corresponding to the CDK SmilesParser class
Rajarshi Guha ([email protected])
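The parser reuse described for get.smiles.parser could look roughly like the sketch below. It assumes a working 'rJava' setup and, importantly, that parse.smiles accepts the parser through an argument named 'smiles.parser'; that argument name is not given on this page and should be checked against the parse.smiles documentation.

```r
library(rcdk)

# Create the parser once...
sp <- get.smiles.parser()

smiles <- c("CCO", "c1ccccc1", "CC(=O)O")

# ...and reuse it across separate parse.smiles calls
# ('smiles.parser' is an assumed argument name, see the note above)
mols <- lapply(smiles, function(s) parse.smiles(s, smiles.parser = sp)[[1]])

print(sapply(mols, get.atom.count))
```

When all SMILES strings are available up front, passing the whole character vector to a single parse.smiles call avoids the issue entirely.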
Supported stereo center types are
the atom has constitutionally different neighbors
the atom resembles a stereo centre but has constitutionally equivalent neighbors (e.g. inositol, decalin). The stereocenter depends on the configuration of one or more stereocenters.
the atom can support stereochemistry but has not been shown to be a true or para center
the atom is not a stereocenter (e.g. methane)
get.stereo.types(mol)
mol |
A |
A factor of length equal to the number of atoms, indicating the stereocenter type.
Rajarshi Guha [email protected]
get.stereocenters, get.element.types
This method identifies stereocenters based on connectivity.
get.stereocenters(mol)
mol |
A |
A logical vector of length equal to the number of atoms. The i'th element is TRUE if the i'th atom is identified as a stereocenter
Rajarshi Guha [email protected]
get.element.types, get.stereo.types
Get the atomic symbol of the atom.
get.symbol(atom)
atom |
The atom to query |
A character representing the atomic symbol
Rajarshi Guha ([email protected])
Some molecules may not have a title (such as when parsing a SMILES string with no title).
get.title(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
A character string with the title, or 'NA' if no title is specified
Rajarshi Guha ([email protected])
get.total.charge
get.total.charge(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
get.total.formal.charge
get.total.formal.charge(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
Counts the number of hydrogens on the provided molecule. As this method will sum all implicit hydrogens on each atom it is important to ensure the molecule has already been configured (and thus each atom has an implicit hydrogen count).
get.total.hydrogen.count(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
An integer representing the total number of implicit hydrogens
Rajarshi Guha ([email protected])
get.hydrogen.count, remove.hydrogens
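To make the relationship between the molecule-level and atom-level counts concrete, here is a minimal sketch that compares the total with the sum of the per-atom implicit hydrogen counts. It assumes a molecule parsed from SMILES, for which implicit hydrogen counts are available, and uses the rcdk function `get.atoms`, which is not documented in this section.

```r
library(rcdk)

# Molecule parsed from SMILES; all hydrogens are implicit
mol <- parse.smiles("CCO")[[1]]

# Total over the whole molecule
total <- get.total.hydrogen.count(mol)

# Per-atom implicit counts; unlist() drops any NULL entries
per.atom <- unlist(lapply(get.atoms(mol), get.hydrogen.count))

print(total)
print(per.atom)
```

For a molecule with no explicit hydrogen atoms, the two views should agree.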
Compute TPSA for a molecule
get.tpsa(molecule)
molecule |
A molecule object |
A double value representing the TPSA value
Rajarshi Guha ([email protected])
This method does not require 3D coordinates. As a result, it is an approximation.
get.volume(molecule)
molecule |
A molecule object |
A double value representing the volume
Rajarshi Guha ([email protected])
Compute XLogP for a molecule
get.xlogp(molecule)
molecule |
A molecule object |
A double value representing the XLogP value
Rajarshi Guha ([email protected])
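The three convenience descriptors (`get.tpsa`, `get.volume`, `get.xlogp`) can be applied across a list of molecules. The following is a sketch only: the three SMILES are arbitrary examples, and making the hydrogens explicit beforehand is a precaution assumed here rather than a documented requirement.

```r
library(rcdk)

smiles <- c("CCO", "CC(=O)O", "CCN")
mols <- parse.smiles(smiles)

# Assumption: make hydrogens explicit first, as a precaution for XLogP
invisible(lapply(mols, convert.implicit.to.explicit))

# Each function returns a single double per molecule
tpsa   <- sapply(mols, get.tpsa)
volume <- sapply(mols, get.volume)
xlogp  <- sapply(mols, get.xlogp)

print(data.frame(smiles = smiles,
                 tpsa = unname(tpsa),
                 volume = unname(volume),
                 xlogp = unname(xlogp)))
```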
The CDK can read a variety of molecular structure formats. Some file formats support multiple molecules in a single file. If read using load.molecules, all are read into memory. For very large structure files, this can lead to out of memory errors. Instead it is recommended to use the iterating version of the loader so that only a single molecule is read at a time.
iload.molecules(molfile, type = "smi", aromaticity = TRUE, typing = TRUE, isotopes = TRUE, skip = TRUE)
molfile |
A string containing the filename to load. Must be a local file |
type |
Indicates whether the input file is SMILES or SDF. Valid values are '"smi"' or '"sdf"' |
aromaticity |
If 'TRUE' then aromaticity detection is performed on all loaded molecules. If this fails for a given molecule, then the molecule is set to 'NA' in the return list |
typing |
If 'TRUE' then atom typing is performed on all loaded molecules. The assigned types will be CDK internal types. If this fails for a given molecule, then the molecule is set to 'NA' in the return list |
isotopes |
If 'TRUE' then atoms are configured with isotopic masses |
skip |
If 'TRUE', then the reader will continue reading even when faced with an invalid molecule. If 'FALSE', the reader will stop at the first invalid molecule |
Note that the iterating loader only supports SDF and SMILES file formats.
Rajarshi Guha ([email protected])
write.molecules, load.molecules, parse.smiles
## Not run:
moliter <- iload.molecules("big.sdf", type="sdf")
while(hasNext(moliter)) {
  mol <- nextElem(moliter)
  print(get.property(mol, "cdk:Title"))
}
## End(Not run)
Tests whether an atom is aliphatic.
is.aliphatic(atom)
atom |
The atom to test |
This assumes that the molecule containing the atom has been appropriately configured.
'TRUE' if the atom is aliphatic, 'FALSE' otherwise
Rajarshi Guha ([email protected])
Tests whether an atom is aromatic.
is.aromatic(atom)
atom |
The atom to test |
This assumes that the molecule containing the atom has been appropriately configured.
'TRUE' if the atom is aromatic, 'FALSE' otherwise
Rajarshi Guha ([email protected])
is.aliphatic, is.in.ring, do.aromaticity
A single molecule is normally represented as one connected graph. In some cases, such as for molecules in salt form, or after certain operations such as bond splits, the molecular graph may contain disconnected components. This method can be used to test whether the molecule is complete (i.e. fully connected).
is.connected(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
'TRUE' if molecule is complete, 'FALSE' otherwise
Rajarshi Guha ([email protected])
m <- parse.smiles("CC.CCCCCC.CCCC")[[1]]
is.connected(m)
Tests whether an atom is in a ring.
is.in.ring(atom)
atom |
The atom to test |
This assumes that the molecule containing the atom has been appropriately configured.
'TRUE' if the atom is in a ring, 'FALSE' otherwise
Rajarshi Guha ([email protected])
The test checks whether all atoms in the molecule have a formal charge of 0.
is.neutral(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
'TRUE' if molecule is neutral, 'FALSE' otherwise
Rajarshi Guha ([email protected])
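Because the test looks at each atom rather than at the net charge, a zwitterion is a useful case to compare against `get.total.formal.charge`. The sketch below uses three arbitrary example SMILES: a neutral molecule, a cation, and the zwitterionic form of glycine, whose formal charges sum to zero even though two of its atoms are charged.

```r
library(rcdk)

smiles <- c("CCO",               # neutral
            "C[N+](C)(C)C",      # cation
            "[NH3+]CC([O-])=O")  # zwitterion
mols <- parse.smiles(smiles)

# Per-atom test: are all formal charges zero?
neutral <- sapply(mols, is.neutral)

# Molecule-level sum of formal charges
total <- sapply(mols, get.total.formal.charge)

print(data.frame(smiles = smiles,
                 is.neutral = unname(neutral),
                 total.formal.charge = unname(total)))
```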
Validate a cdkFormula.
isvalid.formula(formula, rule = c("nitrogen", "RDBE"))
formula |
Required. A CDK Formula |
rule |
Optional. Default |
The CDK can read a variety of molecular structure formats. This function encapsulates the calls to the CDK API to load a structure given its filename or a URL to a structure file.
load.molecules(molfiles = NA, aromaticity = TRUE, typing = TRUE, isotopes = TRUE, verbose = FALSE)
molfiles |
A 'character' vector of filenames. Note that the full path to the files should be provided. URLs can also be used as paths. In such a case, the URL should start with "http://" |
aromaticity |
If 'TRUE' then aromaticity detection is performed on all loaded molecules. If this fails for a given molecule, then the molecule is set to 'NA' in the return list |
typing |
If 'TRUE' then atom typing is performed on all loaded molecules. The assigned types will be CDK internal types. If this fails for a given molecule, then the molecule is set to 'NA' in the return list |
isotopes |
If 'TRUE' then atoms are configured with isotopic masses |
verbose |
If 'TRUE', output (such as file download progress) will be bountiful |
Note that this method will load all molecules into memory. For files containing tens of thousands of molecules this may lead to out of memory errors. In such situations consider using the iterating file readers.
Note that if molecules are read in from formats that do not have rules for handling implicit hydrogens (such as MDL MOL), the molecule will not have implicit or explicit hydrogens. To add explicit hydrogens, make sure that the molecule has been typed (this is 'TRUE' by default for this function) and then call convert.implicit.to.explicit. On the other hand, for a format such as SMILES, implicit or explicit hydrogens will be present.
A 'list' of CDK 'IAtomContainer' objects, represented as 'jobjRef' objects in R, which can be used in other 'rcdk' functions
Rajarshi Guha ([email protected])
write.molecules, parse.smiles, iload.molecules
## Not run:
sdffile <- system.file("molfiles/dhfr00008.sdf", package="rcdk")
mols <- load.molecules(c('mol1.sdf', 'mol2.smi', sdffile))
## End(Not run)
matches
matches(query, target, return.matches = FALSE)
query |
Required. A SMARTSQuery |
target |
Required. The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
return.matches |
Optional. Default |
Various functions to perform operations on molecules.
get.exact.mass returns the exact mass of a molecule
get.natural.mass returns the natural exact mass of a molecule
convert.implicit.to.explicit converts implicit hydrogens to explicit hydrogens. This function does not return any value but rather modifies the molecule object passed to it
is.neutral returns TRUE if all atoms in the molecule have a formal charge of 0, otherwise FALSE
In some cases, a molecule may not have any hydrogens (such as when read in from an MDL MOLfile that did not have hydrogens). In such cases, convert.implicit.to.explicit will add implicit hydrogens and then convert them to explicit ones. In addition, for such cases, make sure that the molecule has been typed beforehand.
get.exact.mass(mol)
get.natural.mass(mol)
convert.implicit.to.explicit(mol)
is.neutral(mol)
mol A jobjRef representing an IAtomContainer or IMolecule object
get.exact.mass returns a numeric
get.natural.mass returns a numeric
convert.implicit.to.explicit has no return value
is.neutral returns a boolean.
Rajarshi Guha ([email protected])
This function parses a vector of SMILES strings to generate a list of 'IAtomContainer' objects. Note that the resultant molecule will not have any 2D or 3D coordinates.
Note that the molecules obtained from this method will not have any aromaticity perception (unless aromatic symbols are encountered, in which case the relevant atoms are automatically set to aromatic), atom typing or isotopic configuration done on them. This is in contrast to the load.molecules method. Thus, you should perform these steps manually on the molecules.
parse.smiles(smiles, kekulise = TRUE, omit.nulls = FALSE, smiles.parser = NULL)
smiles |
A single SMILES string or a vector of SMILES strings |
kekulise |
If set to 'FALSE' disables electron checking and allows for parsing of incorrect SMILES. If a SMILES does not parse by default, try setting this to 'FALSE' - though the resultant molecule may not have consistent bonding. As an example, 'c4ccc2c(cc1=Nc3ncccc3(Cn12))c4' will not be parsed by default because it is missing a nitrogen. With this argument set to 'FALSE' it will parse successfully, but this is a hack to handle an incorrect SMILES |
omit.nulls |
If set to 'TRUE', omits SMILES which were parsed as 'NULL' |
smiles.parser |
A SMILES parser object obtained from |
A 'list' of 'jobjRef's to their corresponding CDK 'IAtomContainer' objects. If a SMILES string could not be parsed and 'omit.nulls=TRUE', it is omitted from the output list.
Rajarshi Guha ([email protected])
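The manual configuration steps mentioned in the description (atom typing, aromaticity perception, isotope configuration) might look like the following sketch. It assumes the rcdk functions `do.aromaticity`, `do.isotopes` and `get.atoms`, which are not documented in this section, and uses benzene written in Kekule form so that the parser sees no aromatic symbols.

```r
library(rcdk)

# Benzene in Kekule form: no lower-case aromatic symbols in the input
mol <- parse.smiles("C1=CC=CC=C1")[[1]]

# Configure the molecule by hand; each call modifies mol in place
set.atom.types(mol)   # assign CDK atom types
do.aromaticity(mol)   # perceive aromaticity
do.isotopes(mol)      # configure isotopic masses

# Inspect the per-atom aromaticity flags after configuration
aromatic <- sapply(get.atoms(mol), is.aromatic)
print(aromatic)
```

Atom typing is done first here on the assumption that aromaticity perception benefits from typed atoms; load.molecules performs the equivalent steps automatically when its arguments are left at their defaults.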
These functions are provided for compatibility with older versions of the rcdk package. They may eventually be completely removed.
deprecated_rcdk_function(x, value, ...)
x |
For assignment operators, the object that will undergo a replacement (object inside parenthesis). |
value |
For assignment operators, the value to replace with (the right side of the assignment). |
... |
For functions other than assignment operators, parameters to be passed to the modern version of the function (see table). |
do.typing | now a synonym for set.atom.types |
Create a copy of the original structure with explicit hydrogens removed. Stereochemistry is updated but up and down bonds in a depiction may need to be recalculated. This can also be useful for descriptor calculations.
remove.hydrogens(mol)
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
A copy of the original molecule, with explicit hydrogens removed
Rajarshi Guha ([email protected])
get.hydrogen.count, get.total.hydrogen.count
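The following sketch shows the round trip between implicit and explicit hydrogens by counting atoms at each stage. It uses ethanol (three heavy atoms, six hydrogens) as an arbitrary example and the rcdk function `get.atoms`, which is not documented in this section.

```r
library(rcdk)

mol <- parse.smiles("CCO")[[1]]

# After SMILES parsing the hydrogens are implicit, so only heavy atoms count
n.heavy <- length(get.atoms(mol))

# Make hydrogens explicit; this modifies mol in place
convert.implicit.to.explicit(mol)
n.explicit <- length(get.atoms(mol))

# remove.hydrogens returns a copy and leaves mol unchanged
stripped <- remove.hydrogens(mol)
n.stripped <- length(get.atoms(stripped))

print(c(heavy = n.heavy, explicit = n.explicit, stripped = n.stripped))
```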
In this context a property is a value associated with a key and stored with the molecule. This method will remove the property defined by the key. If there is no such key, a warning is raised.
remove.property(molecule, key)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
key |
The property key as a character string |
Rajarshi Guha ([email protected])
set.property, get.property, get.properties
mol <- parse.smiles("CC1CC(C=O)CCC1")[[1]]
set.property(mol, 'prop1', 23.45)
set.property(mol, 'prop2', 'inactive')
get.properties(mol)
remove.property(mol, 'prop2')
get.properties(mol)
Set the CDK atom types for all atoms in the molecule.
set.atom.types(mol)
mol |
The molecule whose atoms should be typed |
Calling this method will overwrite any pre-existing type information. Currently there is no way to choose other atom typing schemes
Nothing is returned, the molecule is modified in place
Rajarshi Guha ([email protected])
Set the charge of a cdkFormula.
set.charge.formula(formula, charge = -1)
formula |
Required. Molecular formula |
charge |
Optional. Default |
This function sets the value of a keyed property on the molecule. Properties enable us to associate arbitrary pieces of data with a molecule. Such data can be text, numeric or a Java object (represented as a 'jobjRef').
set.property(molecule, key, value)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
key |
The property key as a character string |
value |
The value of the property. This can be a character, numeric or 'jobjRef' R object |
Rajarshi Guha ([email protected])
get.property, get.properties, remove.property
mol <- parse.smiles("CC1CC(C=O)CCC1")[[1]]
set.property(mol, 'prop1', 23.45)
set.property(mol, 'prop2', 'inactive')
get.property(mol, 'prop1')
Set the title of the molecule.
set.title(mol, title = "")
mol |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
title |
The title of the molecule as a character string. This will overwrite any pre-existing title. The default value is an empty string. |
Rajarshi Guha ([email protected])
The CDK supports a variety of customizations for SMILES generation, ranging from the use of lower case symbols for aromatic compounds to the use of the ChemAxon CxSmiles format. Each 'flavor' is represented by an integer and multiple customizations are bitwise OR'ed together. This method accepts the names of one or more customizations and returns the bitwise OR of them. See the CDK documentation for the list of flavors and what they mean.
smiles.flavors(flavors = c("Generic"))
flavors |
A character vector of flavors. The default is
|
A numeric representing the bitwise 'OR' of the specified flavors
Rajarshi Guha [email protected]
m <- parse.smiles('C1C=CCC1N(C)c1ccccc1')[[1]]
get.smiles(m)
get.smiles(m, smiles.flavors(c('Generic','UseAromaticSymbols')))
m <- parse.smiles("OS(=O)(=O)c1ccc(cc1)C(CC)CC |Sg:n:13:m:ht,Sg:n:11:n:ht|")[[1]]
get.smiles(m,flavor = smiles.flavors(c("CxSmiles")))
get.smiles(m,flavor = smiles.flavors(c("CxSmiles","UseAromaticSymbols")))
view.image.2d
view.image.2d(molecule, depictor = NULL)
molecule |
The molecule to display. Should be a 'jobjRef' representing an 'IAtomContainer' |
depictor |
Default |
Create a 2D depiction of a molecule. If more than one molecule is supplied, return a grid with ncol columns.
view.molecule.2d(molecule, ncol = 4, width = 200, height = 200, depictor = NULL)
molecule |
The molecule to query. Should be a 'jobjRef' representing an 'IAtomContainer' |
ncol |
Default |
width |
Default |
height |
Default |
depictor |
Default |
Create a tabular view of a set of molecules (in 2D) and associated data columns
view.table(molecules, dat, depictor = NULL)
molecules |
A list of molecule objects ('jobjRef' representing an 'IAtomContainer') |
dat |
The |
depictor |
Default |
This function writes one or more molecules to an SD file on disk, which can be of the single- or multi-molecule variety. In addition, if the molecule has keyed properties, they can also be written out as SD tags.
write.molecules(mols, filename, together = TRUE, write.props = FALSE)
mols |
A 'list' of 'jobjRef' objects representing 'IAtomContainer' objects |
filename |
The name of the SD file to write. Note that if 'together' is 'FALSE' then this argument is taken as a prefix for the name of the individual files |
together |
If 'TRUE' then all the molecules are written to a single SD file. If 'FALSE' each molecule is written to an individual file |
write.props |
If 'TRUE', keyed properties are included in the SD file output |
In case individual SD files are desired, the together argument can be set to FALSE. In this case, the value of filename is used as a prefix, to which a numeric identifier and the suffix ".sdf" are appended.
Rajarshi Guha ([email protected])
load.molecules, parse.smiles, iload.molecules
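A minimal round-trip sketch: write two molecules with a keyed property to a single SD file and read them back with load.molecules. The property key "activity", its values and the temporary file are hypothetical choices made for illustration. The molecules come from SMILES and so carry no 2D or 3D coordinates.

```r
library(rcdk)

mols <- parse.smiles(c("CCO", "CC(=O)O"))

# Attach an illustrative keyed property to each molecule
set.property(mols[[1]], "activity", "active")
set.property(mols[[2]], "activity", "inactive")

# Write both molecules to one SD file, including properties as SD tags
sdf <- tempfile(fileext = ".sdf")
write.molecules(mols, filename = sdf, together = TRUE, write.props = TRUE)

# Read the file back; load.molecules returns a list of molecules
loaded <- load.molecules(sdf)
print(length(loaded))
```

With `together = FALSE`, `filename` would instead be treated as a prefix and one file written per molecule.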