Title: | Biological Entity Dictionary (BED) |
---|---|
Description: | An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred using their respective mappings to HGNC IDs. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way as mapping between two gene ID scopes. Also, converting identifiers from different organisms should be possible using gene ortholog information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>. |
Authors: | Patrice Godard [aut, cre, cph] |
Maintainer: | Patrice Godard <[email protected]> |
License: | GPL-3 |
Version: | 1.6.0 |
Built: | 2024-12-11 06:44:53 UTC |
Source: | CRAN |
Available database instance: https://github.com/patzaw/BED#bed-database-instance-available-as-a-docker-image
Building a database instance: https://github.com/patzaw/BED#build-a-bed-database-instance
Repository: https://github.com/patzaw/BED
Bug reports: https://github.com/patzaw/BED/issues
Patrice Godard
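The indirect inference mentioned in the package description, where Ensembl and Entrez gene IDs are linked through a shared HGNC ID, can be illustrated with a small base R sketch. The identifiers and table names below are made up for illustration; in BED the inference is done on the 'Neo4j' graph, not with merge().

```r
# Toy cross-reference tables (hypothetical IDs, for illustration only)
ens_to_hgnc <- data.frame(
  ensembl = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc = c("HGNC:1", "HGNC:2", "HGNC:3"),
  stringsAsFactors = FALSE
)
entrez_to_hgnc <- data.frame(
  entrez = c("101", "102"),
  hgnc = c("HGNC:1", "HGNC:3"),
  stringsAsFactors = FALSE
)
# Direct Ensembl -> Entrez mappings already known to the sources
direct <- data.frame(
  ensembl = "ENSG_A", entrez = "101", stringsAsFactors = FALSE
)

# Infer mappings through the shared HGNC reference
inferred <- merge(ens_to_hgnc, entrez_to_hgnc, by = "hgnc")
inferred <- inferred[, c("ensembl", "entrez")]

# Union of direct and inferred mappings, without duplicates
all_map <- unique(rbind(direct, inferred))
all_map <- all_map[order(all_map$ensembl), ]
rownames(all_map) <- NULL
print(all_map)
```

ENSG_C has no direct Entrez mapping in this toy data, but one is recovered through HGNC:3. ENSG_B stays unmapped because no Entrez ID shares its HGNC ID.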
Call a function on the BED graph
bedCall(f, ..., bedCheck = FALSE)
f |
the function to call |
... |
params for f |
bedCheck |
check if a connection to BED exists (default: FALSE). |
The output of the called function.
## Not run: 
result <- bedCall(
   cypher,
   query=prepCql(
      'MATCH (n:BEID)',
      'WHERE n.value IN $values',
      'RETURN n.value AS value, n.labels, n.database'
   ),
   parameters=list(values=c("10", "100"))
)
## End(Not run)
Not exported to avoid unintended modifications of the DB.
bedImport(cql, toImport, periodicCommit = 10000, ...)
cql |
the CQL query to be applied on each row of toImport |
toImport |
the data.frame to be imported as "row". Use "row.FIELD" in the cql query to refer to one FIELD of the toImport data.frame |
periodicCommit |
use periodic commit when loading the data (default: 10000). |
... |
additional parameters for bedCall |
the results of the query
bedCall, neo2R::import_from_df
Create a BEIDList
BEIDList(l, metadata, scope)
l |
a named list of BEID vectors |
metadata |
a data.frame whose row names, or whose ".lname" column, are all in the names of l. If missing, the metadata is constructed with .lname set to the names of l. |
scope |
a list with 3 character vectors of length one named "be", "source" and "organism". If missing, it is guessed from l. |
A BEIDList object which is a list of BEID vectors with 2 additional attributes:
metadata: a data.frame with metadata about list elements. The ".lname" column corresponds to the names of the BEIDList.
scope: the BEID scope ("be", "source" and "organism")
## Not run: 
bel <- BEIDList(
   l=list(
      kinases=c("117283", "3706", "3707", "51447", "80271", "9807"),
      phosphatases=c(
         "130367", "249", "283871", "493911", "57026", "5723", "81537"
      )
   ),
   scope=list(be="Gene", source="EntrezGene", organism="Homo sapiens")
)
scope(bel)
metadata(bel)
metadata(bel) <- dplyr::mutate(
   metadata(bel),
   "description"=c("A few kinases", "A few phosphatases")
)
metadata(bel)
## End(Not run)
Get the BEIDs from an object
BEIDs(x, ...)
x |
an object representing a collection of BEID (e.g. BEIDList) |
... |
method specific parameters |
A tibble with at least 4 columns:
value
be
source
organism
...
Shiny module for searching BEIDs
beidsServer(
   id,
   toGene = TRUE,
   excludeTechID = FALSE,
   multiple = FALSE,
   beOfInt = NULL,
   selectBe = TRUE,
   orgOfInt = NULL,
   selectOrg = TRUE,
   oneColumn = FALSE,
   withId = FALSE,
   maxHits = 75,
   compact = FALSE,
   tableHeight = 150,
   highlightStyle = "",
   highlightClass = "bed-search"
)

beidsUI(id)
id |
an identifier for the module instance |
toGene |
focus on gene entities (default=TRUE): matches from other BE are converted to genes. |
excludeTechID |
do not display BED technical BEIDs |
multiple |
allow multiple selections (default=FALSE) |
beOfInt |
if toGene==FALSE, BE to consider (default=NULL ==> all) |
selectBe |
if toGene==FALSE, display an interface for selecting BE |
orgOfInt |
organism to consider (default=NULL ==> all) |
selectOrg |
display an interface for selecting organisms |
oneColumn |
if TRUE the hits are displayed in only one column |
withId |
if FALSE and one column, the BEIDs are not shown |
maxHits |
maximum number of raw hits to return |
compact |
compact display (default: FALSE) |
tableHeight |
height of the result table (default: 150) |
highlightStyle |
style to apply to the text to highlight |
highlightClass |
class to apply to the text to highlight |
A reactive data.frame with the following columns:
beid: the BE identifier
preferred: preferred identifier for the same BE in the same scope
be: the type of biological entity
source: the source of the identifier
organism: the BE organism
entity: internal identifier of the BE
match: the matching character string
beidsUI(): the UI function of the module
## Not run: 
library(shiny)
library(BED)
library(DT)

ui <- fluidPage(
   beidsUI("be"),
   fluidRow(
      column(
         12,
         tags$br(),
         h3("Selected gene entities"),
         DTOutput("result")
      )
   )
)

server <- function(input, output){
   found <- beidsServer("be", toGene=TRUE, multiple=TRUE, tableHeight=250)
   output$result <- renderDT({
      req(found())
      toRet <- found()
      datatable(toRet, rownames=FALSE)
   })
}

shinyApp(ui = ui, server = server)
## End(Not run)
Find all BEID and ProbeID corresponding to a BE
beIDsToAllScopes( beids, be, source, organism, entities = NULL, canonical_symbols = TRUE )
beids |
a character vector of gene identifiers |
be |
one BE. Guessed if not provided |
source |
the source of gene identifiers. Guessed if not provided |
organism |
the gene organism. Guessed if not provided |
entities |
a numeric vector of gene entity. If NULL (default), beids, source and organism arguments are used to identify BEs. Be careful when using entities, as these identifiers are not stable. |
canonical_symbols |
return only canonical symbols (default: TRUE). |
A data.frame with the following fields:
value: the identifier
be: the type of BE
source: the source of the identifier
organism: the BE organism
symbol: canonical symbol of the identifier
BE_entity: the BE entity input
BEID (optional): the BE ID input
BE_source (optional): the BE source input
This function calls the Neo4j DB the first time a query is sent and stores the result in the cache SQLite database. The next time the same query is called, the result is loaded directly from the cache SQLite database.
cacheBedCall(..., tn, recache = FALSE)
... |
params for bedCall |
tn |
the name of the cached table |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
Use only with "row" result returned by DB request.
Internal use.
The results of the bedCall.
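The query-once-then-reuse behaviour described for cacheBedCall can be sketched in base R. This is only an illustration of the pattern: it uses an in-memory environment instead of the SQLite cache, and cached_call and slow_query are hypothetical names that are not part of BED.

```r
# In-memory cache standing in for the SQLite cache (assumption)
.cache <- new.env(parent = emptyenv())

# Hypothetical helper: run query_fun(...) once per table name `tn`
cached_call <- function(query_fun, ..., tn, recache = FALSE) {
  if (!recache && exists(tn, envir = .cache, inherits = FALSE)) {
    # Result already cached under this name: skip the query
    return(get(tn, envir = .cache, inherits = FALSE))
  }
  value <- query_fun(...)
  assign(tn, value, envir = .cache)
  value
}

# Stand-in for a database query, counting how often it really runs
n_calls <- 0
slow_query <- function(ids) {
  n_calls <<- n_calls + 1
  data.frame(id = ids, found = TRUE)
}

r1 <- cached_call(slow_query, c("10", "100"), tn = "demo")
r2 <- cached_call(slow_query, c("10", "100"), tn = "demo")
r3 <- cached_call(slow_query, c("10", "100"), tn = "demo", recache = TRUE)
```

The second call is served from the cache, so slow_query runs only for the first call and for the call with recache = TRUE.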
Internal use
cacheBedResult(value, name)
value |
the result to cache |
name |
the name of the query |
This function checks information recorded into BED cache and resets it if not relevant.
checkBedCache(newCon = FALSE)
newCon |
if TRUE, forces the loading of the system information file |
Internal use.
Check if there is a connection to a BED database
checkBedConn(verbose = FALSE)
verbose |
if TRUE print information about the BED connection (default: FALSE). |
TRUE if the connection can be established
Or FALSE if the connection cannot be established or the "System" node does not exist or does not have "BED" as name or any version recorded.
This function takes a vector of identifiers and verifies whether they can be found in the provided source database according to the BE type and the organism of interest. If an ID is in the DB but is linked neither directly nor indirectly to any entity, it is considered as not found.
checkBeIds(ids, be, source, organism, stopThr = 1, caseSensitive = FALSE)
ids |
a vector of identifiers to be checked |
be |
biological entity. See getBeIds. Guessed if not provided |
source |
source of the ids. See getBeIds. Guessed if not provided |
organism |
the organism of interest. See getBeIds. Guessed if not provided |
stopThr |
proportion of non-recognized IDs above which an error is thrown. Default: 1 ==> no check |
caseSensitive |
if FALSE (default) the case is not taken into account when checking ids. |
invisible(TRUE). Stop if too many (see stopThr parameter) ids are not found. Warning if any id is not found.
getBeIds, listBeIdSources, getAllBeIdSources
## Not run: 
checkBeIds(
   ids=c("10", "100"), be="Gene", source="EntrezGene", organism="human"
)
checkBeIds(
   ids=c("10", "100"), be="Gene", source="Ens_gene", organism="human"
)
## End(Not run)
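The documented behaviour of stopThr and caseSensitive (warning when any ID is missing, error when the proportion of missing IDs is above stopThr) can be sketched on a plain character vector. The helper check_ids and the IDs below are hypothetical; the real function queries the BED database, and whether it compares with a strict or non-strict inequality is an assumption here.

```r
# Hypothetical helper mimicking the documented behaviour on a plain vector
check_ids <- function(ids, known, stopThr = 1, caseSensitive = FALSE) {
  if (length(ids) == 0) return(invisible(TRUE))
  if (caseSensitive) {
    ids_cmp <- ids
  } else {
    # Ignore case on both sides of the comparison
    ids_cmp <- toupper(ids)
    known <- toupper(known)
  }
  missing_ids <- unique(ids[!ids_cmp %in% known])
  prop_missing <- length(missing_ids) / length(unique(ids))
  # Assumption: strict ">" so that stopThr = 1 never stops
  if (prop_missing > stopThr) {
    stop("Too many IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  if (length(missing_ids) > 0) {
    warning("IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  invisible(TRUE)
}

known <- c("ENSG0001", "ENSG0002", "ENSG0003")

# All IDs found (case ignored by default)
ok <- check_ids(c("ensg0001", "ENSG0002"), known)

# One of two IDs missing, default stopThr = 1: warning only
warned <- tryCatch(
  check_ids(c("ENSG0001", "X1"), known),
  warning = function(w) "warned"
)

# Two of three IDs missing with stopThr = 0.5: error
stopped <- tryCatch(
  check_ids(c("ENSG0001", "X1", "X2"), known, stopThr = 0.5),
  error = function(e) "stopped"
)
```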
Not exported to avoid unintended modifications of the DB.
cleanDubiousXRef(d, strict = TRUE)
d |
a cross-reference data.frame with 2 columns. |
strict |
if TRUE (default), the function returns only unambiguous mappings |
This function returns d without dubious cross-references. Issues are reported in attr(d, "issues").
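One possible reading of "dubious" cross-references can be sketched as follows. This is not the package's algorithm: the helper clean_xref is hypothetical, and the rules used (strict keeps only one-to-one pairs, non-strict drops only pairs where both IDs are ambiguous) are assumptions made for illustration. Only the idea of returning the cleaned table with the removed rows in an "issues" attribute comes from the text above.

```r
# Hypothetical helper; the filtering rules are assumptions, not BED's code
clean_xref <- function(d, strict = TRUE) {
  stopifnot(is.data.frame(d), ncol(d) == 2)
  d <- unique(d)
  # Number of partners for each ID, on each side of the table
  n1 <- table(d[[1]])
  n2 <- table(d[[2]])
  amb1 <- as.vector(n1[as.character(d[[1]])]) > 1
  amb2 <- as.vector(n2[as.character(d[[2]])]) > 1
  # strict: any ambiguity is dubious; otherwise only many-to-many rows
  dubious <- if (strict) amb1 | amb2 else amb1 & amb2
  res <- d[!dubious, , drop = FALSE]
  attr(res, "issues") <- d[dubious, , drop = FALSE]
  res
}

xref <- data.frame(
  id1 = c("A", "B", "B", "C", "D"),
  id2 = c("1", "2", "3", "3", "4"),
  stringsAsFactors = FALSE
)
strict_res <- clean_xref(xref, strict = TRUE)
loose_res <- clean_xref(xref, strict = FALSE)
attr(strict_res, "issues")
```

In this toy table only A/1 and D/4 are one-to-one, so they are the only rows kept in strict mode.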
Clear the BED cache SQLite database
clearBedCache(queries = NULL, force = FALSE, hard = FALSE, verbose = FALSE)
queries |
a character vector of the names of queries to remove. If NULL all queries are removed. |
force |
if TRUE clear the BED cache table even if cache file is not found |
hard |
if TRUE remove everything in cache without checking file names |
verbose |
display some information during the process |
Compare 2 BED database instances
compareBedInstances(connections)
connections |
a numeric vector of length 1 or 2 providing connections from lsBedConnections to be compared. |
The current connection is restored when exiting this function.
If only one connection is provided, the function returns a list with information about the BEID and platforms available for the connection, along with DB version information. If two connections are provided, the same information is provided for the 2 connections, named V1 and V2 in that order. In addition, differences observed between the 2 instances are reported for BEID and platforms.
Connect to a neo4j BED database
connectToBed( url = NULL, username = NULL, password = NULL, connection = 1, remember = FALSE, useCache = NA, importPath = NULL, .opts = list() )
url |
a character string. The host and the port are sufficient (e.g: "localhost:5454") |
username |
a character string |
password |
a character string |
connection |
the id of the connection already registered to use. By default the first registered connection is used. |
remember |
if TRUE connection information is saved locally in a file and used to automatically connect the next time. The default is set to FALSE. All the connections that have been saved can be listed with lsBedConnections and any of them can be forgotten with forgetBedConnection. |
useCache |
if TRUE the results of large queries can be saved locally in a file. If NA (the default), the value is taken from the former connection if it exists; otherwise it is set to FALSE, for policy reasons. Setting it to TRUE is recommended to improve the speed of recurrent queries. |
importPath |
the path to the import folder for loading information in BED (used only when feeding the database ==> default: NULL) |
.opts |
a named list identifying the curl options for the handle |
Be careful: you should reconnect to the BED database each time the environment is reloaded. This is done automatically if remember is set to TRUE.
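Information about how to get an instance of the BED 'Neo4j' database is provided here:
https://github.com/patzaw/BED#bed-database-instance-available-as-a-docker-image
https://github.com/patzaw/BED#build-a-bed-database-instance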
This function does not return any value. It prepares the BED environment to allow transparent DB calls.
checkBedConn, lsBedConnections, forgetBedConnection
Converts lists of BE IDs
convBeIdLists(idList, entity = FALSE, ...)
idList |
a list of ID lists |
entity |
if TRUE returns BE instead of BEID (default: FALSE). BE CAREFUL, THIS INTERNAL ID IS NOT STABLE AND CANNOT BE USED AS A REFERENCE. This internal identifier is useful to avoid biases related to identifier redundancy. See <../doc/BED.html#3_managing_identifiers> |
... |
params for the convBeIds function |
A list of convBeIds output ids.
Scope ("be", "source", "organism" and "entity" (see Arguments)) is provided as a named list in the "scope" attribute: attr(x, "scope")
## Not run: 
convBeIdLists(
   idList=list(a=c("10", "100"), b=c("1000")),
   from="Gene", from.source="EntrezGene", from.org="human",
   to.source="Ens_gene"
)
## End(Not run)
Converts BE IDs
convBeIds( ids, from, from.source, from.org, to, to.source, to.org, caseSensitive = FALSE, canonical = FALSE, prefFilter = FALSE, restricted = TRUE, recache = FALSE, limForCache = 2000 )
ids |
list of identifiers |
from |
a character corresponding to the biological entity or Probe. Guessed if not provided |
from.source |
a character corresponding to the ID source. Guessed if not provided |
from.org |
a character corresponding to the organism. Guessed if not provided |
to |
a character corresponding to the biological entity or Probe |
to.source |
a character corresponding to the ID source |
to.org |
a character corresponding to the organism |
caseSensitive |
if TRUE the case of provided symbols is taken into account during search. This option will only affect the conversion from "Symbol" (default: caseSensitive=FALSE). All the other conversion will be case sensitive. |
canonical |
if TRUE, only returns the canonical "Symbol". (default: FALSE) |
prefFilter |
boolean indicating if the results should be filtered to keep only preferred BEID of BE when they exist (default: FALSE). If there are several preferred BEID of a BE, all are kept. If there are no preferred BEID of a BE, all non-preferred BEID are kept. |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned: Depending on history it can take a very long time to return a very large result! |
recache |
a logical value indicating if the results should be taken from cache or recomputed |
limForCache |
if there are more ids than limForCache, results are collected for all IDs (beyond the provided ids) and cached for future queries. If not, results are collected only for the provided ids and not cached. |
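The prefFilter rule stated in the arguments (keep only preferred BEID of a BE when at least one exists, otherwise keep all of them) can be sketched in base R. The helper pref_filter and the IDs are hypothetical, and grouping by the to.entity column is an assumption about how "a BE" is identified; the column names follow the Value section of this page.

```r
# Hypothetical helper applying the documented prefFilter rule
pref_filter <- function(res) {
  # Does each entity have at least one preferred identifier?
  has_pref <- tapply(res$to.preferred, res$to.entity, any)
  # Keep preferred rows, plus all rows of entities without any preferred ID
  keep <- res$to.preferred | !has_pref[as.character(res$to.entity)]
  res[as.vector(keep), , drop = FALSE]
}

# Toy conversion result (made-up identifiers)
res <- data.frame(
  from = c("10", "10", "10", "100", "100"),
  to = c("ENSG_A1", "ENSG_A2", "ENSG_A3", "ENSG_B1", "ENSG_B2"),
  to.preferred = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  to.entity = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)
filtered <- pref_filter(res)
filtered$to
```

Entity 1 has two preferred identifiers, so both are kept and ENSG_A3 is dropped. Entity 2 has none, so both of its identifiers are kept.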
a data.frame with the following columns:
from: the input IDs
to: the corresponding IDs in to.source
to.preferred: boolean indicating if the to ID is a preferred ID for the corresponding entity.
to.entity: the entity technical ID of the to IDs
This data.frame can be filtered to remove duplicated from/to.entity associations, which can lead to information bias.
Scope ("be", "source" and "organism") is provided as a named list in the "scope" attribute: attr(x, "scope")
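Removing duplicated from/to.entity associations, as suggested above, can be done with base R once the conversion table is available. The data frame below is a hand-made stand-in for convBeIds output with made-up identifiers, so the sketch runs without a database connection.

```r
# Toy conversion result: ID "10" maps to two identifiers of the same entity
res <- data.frame(
  from = c("10", "10", "100"),
  to = c("ENSG_A1", "ENSG_A2", "ENSG_B1"),
  to.preferred = c(TRUE, FALSE, TRUE),
  to.entity = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Keep one row per from/to.entity association
by_entity <- unique(res[, c("from", "to.entity")])

# Each input ID now counts once per entity instead of once per identifier
n_entities <- table(by_entity$from)
```

Without this step, input "10" would be counted twice even though both target identifiers refer to the same entity.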
getBeIdConvTable, convBeIdLists, convDfBeIds
## Not run: 
oriId <- c("10", "100")
convBeIds(
   ids=oriId,
   from="Gene", from.source="EntrezGene", from.org="human",
   to.source="Ens_gene"
)
convBeIds(
   ids=oriId,
   from="Gene", from.source="EntrezGene", from.org="human",
   to="Peptide", to.source="Ens_translation"
)
convBeIds(
   ids=oriId,
   from="Gene", from.source="EntrezGene", from.org="human",
   to="Peptide", to.source="Ens_translation", to.org="mouse"
)
## End(Not run)
Add BE ID conversion to a data frame
convDfBeIds(df, idCol = NULL, entity = FALSE, ...)
df |
the data.frame to be converted |
idCol |
the column containing the IDs to convert. If NULL (default) the row names are taken. |
entity |
if TRUE returns BE instead of BEID (default: FALSE). BE CAREFUL, THIS INTERNAL ID IS NOT STABLE AND CANNOT BE USED AS A REFERENCE. This internal identifier is useful to avoid biases related to identifier redundancy. See ../doc/BED.html#3_managing_identifiers |
... |
params for the convBeIds function |
A data.frame with converted IDs.
Scope ("be", "source", "organism" and "entity" (see Arguments)) is provided as a named list in the "scope" attribute: attr(x, "scope").
## Not run: 
toConv <- data.frame(a=1:2, b=3:4)
rownames(toConv) <- c("10", "100")
convDfBeIds(
   df=toConv,
   from="Gene", from.source="EntrezGene", from.org="human",
   to.source="Ens_gene"
)
## End(Not run)
Not exported to avoid unintended modifications of the DB.
dumpEnsCore( organism, release, gv, ddir, toDump = c("attrib_type", "gene_attrib", "transcript", "external_db", "gene", "translation", "external_synonym", "object_xref", "xref", "stable_id_event"), env = parent.frame(n = 1) )
organism |
the organism to download (e.g. "Homo sapiens"). |
release |
Ensembl release (e.g. "83") |
gv |
version of the genome (e.g. "38") |
ddir |
path to the directory where the data should be saved |
toDump |
the list of tables to download |
env |
the R environment in which to load the tables when downloaded |
Not exported to avoid unintended modifications of the DB.
dumpNcbiDb( taxOfInt, reDumpThr, ddir, toLoad = c("gene_info", "gene2ensembl", "gene_group", "gene_orthologs", "gene_history", "gene2refseq"), env = parent.frame(n = 1), curDate )
taxOfInt |
the organism to download (e.g. "9606"). |
reDumpThr |
time difference threshold between 2 downloads |
ddir |
path to the directory where the data should be saved |
toLoad |
the list of tables to load |
env |
the R environment in which to load the tables when downloaded |
curDate |
current date as given by Sys.Date |
Not exported to avoid unintended modifications of the DB.
dumpNcbiTax( reDumpThr, ddir, toDump = c("names.dmp"), env = parent.frame(n = 1), curDate )
reDumpThr |
time difference threshold between 2 downloads |
ddir |
path to the directory where the data should be saved |
toDump |
the list of tables to load |
env |
the R environment in which to load the tables when downloaded |
curDate |
current date as given by Sys.Date |
Not exported to avoid unintended modifications of the DB.
dumpUniprotDb( taxOfInt, divOfInt, release, ddir, ftp = "ftp://ftp.expasy.org/databases/uniprot", env = parent.frame(n = 1) )
taxOfInt |
the organism of interest (e.g., "9606" for human, "10090" for mouse or "10116" for rat) |
divOfInt |
the taxonomic division to which the organism belongs (e.g., "human", "rodents", "mammals", "vertebrates") |
release |
the release of interest (check if already downloaded) |
ddir |
path to the directory where the data should be saved |
ftp |
location of the ftp site |
env |
the R environment in which to load the tables when built |
This function uses visNetwork to draw all the identifiers corresponding to one BE (including ProbeID and BESymbol)
exploreBe( id, source, be, showBE = FALSE, showProbes = FALSE, showLegend = TRUE )
id |
one ID for the BE |
source |
the ID source database. Guessed if not provided |
be |
the type of BE. Guessed if not provided |
showBE |
boolean. If TRUE the Biological Entity corresponding to the id is shown. If id is isolated (not mapped to any other ID or symbol) BE is shown anyway. |
showProbes |
boolean. If TRUE, probes targeting any BEID are shown. |
showLegend |
boolean. If TRUE the legend is displayed. |
## Not run: 
## Arguments are named so that they match the usage order (id, source, be)
exploreBe(id="100", source="EntrezGene", be="Gene")
## End(Not run)
This function uses visNetwork to draw all the shortest conversion paths between two identifiers (including ProbeID).
exploreConvPath( from.id, to.id, from, from.source, to, to.source, edgeDirection = FALSE, showLegend = TRUE, verbose = FALSE )
from.id |
the first identifier |
to.id |
the second identifier |
from |
the type of entity: |
from.source |
the identifier source: database or platform. Guessed if not provided |
to |
the type of entity: |
to.source |
the identifier source: database or platform. Guessed if not provided |
edgeDirection |
a logical value indicating if the direction of the edges should be drawn. |
showLegend |
boolean. If TRUE the legend is displayed. |
verbose |
if TRUE the cypher query is shown |
## Not run: 
exploreConvPath(
   from.id="ENST00000413465",
   from="Transcript", from.source="Ens_transcript",
   to.id="ENSMUST00000108658",
   to="Transcript", to.source="Ens_transcript"
)
## End(Not run)
Filter an object to keep only a set of BEIDs
filterByBEID(x, toKeep, ...)
x |
an object representing a collection of BEID (e.g. BEIDList) |
toKeep |
a vector of elements to keep |
... |
method specific parameters |
Find Biological Entity in BED based on their IDs, symbols and names
findBe( be = NULL, organism = NULL, ncharSymb = 4, ncharName = 8, restricted = TRUE, by = 20, exclude = c("BEDTech_gene", "BEDTech_transcript") )
be |
optional. If provided the search is focused on provided BEs. |
organism |
optional. If provided the search is focused on provided organisms. |
ncharSymb |
The minimum number of characters in the searched term to consider incomplete symbol matches. |
ncharName |
The minimum number of characters in the searched term to consider incomplete name matches. |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned: Depending on history it can take a very long time to return a very large result! |
by |
number of found items to be converted into relevant IDs. |
exclude |
database to exclude from possible selection. Used to filter out technical database names such as "BEDTech_gene" and "BEDTech_transcript" used to manage orphan IDs (not linked to any gene based on information taken from sources) |
A data frame with the following fields:
found: the element found in BED corresponding to the searched term
be: the type of the element
source: the source of the element
organism: the related organism
entity: the related entity internal ID
ebe: the BE of the related entity
canonical: if the symbol is canonical
Relevant ID: the sought element id
Symbol: the symbol(s) of the corresponding gene(s)
Name: the name(s) of the corresponding gene(s)
Scope ("be", "source" and "organism") is provided as a named list in the "scope" attribute: attr(x, "scope")
Find Biological Entity identifiers
findBeids(toGene = TRUE, ...)
toGene |
focus on gene entities (default=TRUE): matches from other BE are converted to genes. |
... |
parameters for beidsServer |
NULL if there is no result, or a data.frame with the selected values and the following columns:
value: the BE identifier
preferred: preferred identifier for the same BE in the same scope
be: the type of biological entity
source: the source of the identifier
organism: the organism of the BE
canonical (if toGene==TRUE): canonical gene product? (if known)
symbol: the symbol of the identifier (if any)
Returns the first common Biological Entity (BE) upstream a set of BE.
firstCommonUpstreamBe(beList = listBe(), uniqueOrg = TRUE)
beList |
a character vector containing BE |
uniqueOrg |
a logical value indicating if a single organism is under focus. If FALSE, "Gene" is returned. |
This function is used to identify the level at which different BE should be compared. Peptides and transcripts should be compared at the level of transcripts, whereas transcripts and objects should be compared at the level of genes. BE from different organisms should be compared at the level of genes using homologs.
## Not run: 
firstCommonUpstreamBe(c("Object", "Transcript"))
firstCommonUpstreamBe(c("Peptide", "Transcript"))
firstCommonUpstreamBe(c("Peptide", "Transcript"), uniqueOrg=FALSE)
## End(Not run)
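The comparison levels described above behave like a lowest common ancestor in a small hierarchy of BE types. The base R sketch below hard-codes a hierarchy that is consistent with the text (Gene above Transcript and Object, Transcript above Peptide); this hierarchy and the helper names are assumptions for illustration, since the package derives the relationships from the BED database.

```r
# Assumed parent of each BE type (Gene is the root)
parent <- c(
  Gene = NA_character_,
  Transcript = "Gene",
  Object = "Gene",
  Peptide = "Transcript"
)

# Path from a BE type up to the root, starting with the type itself
ancestors <- function(be) {
  path <- be
  while (!is.na(parent[[be]])) {
    be <- parent[[be]]
    path <- c(path, be)
  }
  path
}

# Hypothetical equivalent of the documented behaviour
first_common_upstream <- function(beList, uniqueOrg = TRUE) {
  # Different organisms are always compared at the gene level
  if (!uniqueOrg) return("Gene")
  paths <- lapply(unique(beList), ancestors)
  # The first shared element is the closest common upstream BE
  Reduce(intersect, paths)[1]
}

first_common_upstream(c("Object", "Transcript"))
first_common_upstream(c("Peptide", "Transcript"))
first_common_upstream(c("Peptide", "Transcript"), uniqueOrg = FALSE)
```

With this assumed hierarchy the three calls return "Gene", "Transcript" and "Gene" respectively, matching the rules stated in the description.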
Focus a BE related object on a specific identifier (BEID) scope
focusOnScope( x, be, source, organism, scope, force, restricted, prefFilter, ... )
x |
an object representing a collection of BEID (e.g. BEIDList) |
be |
the type of biological entity to focus on.
Used if |
source |
the source of BEID to focus on.
Used if |
organism |
the organism of BEID to focus on.
Used if |
scope |
a list with the following element:
|
force |
if TRUE the conversion is done even between identical scopes (default: FALSE) |
restricted |
if TRUE (default) the BEID are limited to current version of the source |
prefFilter |
if TRUE (default) the BEID are limited to preferred identifiers when they exist |
... |
method specific parameters for BEID conversion |
Depends on the class of x
Convert a BEIDList object in a specific identifier (BEID) scope
## S3 method for class 'BEIDList'
focusOnScope(
   x,
   be = NULL,
   source = NULL,
   organism = NULL,
   scope = NULL,
   force = FALSE,
   restricted = TRUE,
   prefFilter = TRUE,
   ...
)
x |
the BEIDList to be converted |
be |
the type of biological entity to focus on.
If NULL (default), it's taken from |
source |
the source of BEID to focus on.
If NULL (default), it's taken from |
organism |
the organism of BEID to focus on.
If NULL (default), it's taken from |
scope |
a list with the following element:
|
force |
if TRUE the conversion is done even between identical scopes (default: FALSE) |
restricted |
if TRUE (default) the BEID are limited to current version of the source |
prefFilter |
if TRUE (default) the BEID are limited to preferred identifiers when they exist |
... |
additional parameters to the BEID conversion function |
A BEIDList
Forget a BED connection
forgetBedConnection(connection, save = FALSE)
connection |
the id of the connection to forget. |
save |
a logical. Should be set to TRUE to save the updated list of connections in the file space (default to FALSE to comply with CRAN policies). |
lsBedConnections, checkBedConn, connectToBed
Internal use
genBePath(from, to, onlyR = FALSE)
from |
one biological entity (BE) |
to |
one biological entity (BE) |
onlyR |
logical. If TRUE (default: FALSE) it returns only the names of the relationships and not the cypher sub-query |
A character value corresponding to the sub-query. Or, if onlyR, a character vector with the names of the relationships.
Find all GeneID, ObjectID, TranscriptID, PeptideID and ProbeID corresponding to a Gene in any organism
geneIDsToAllScopes( geneids, source, organism, entities = NULL, orthologs = TRUE, canonical_symbols = TRUE )
geneids |
a character vector of gene identifiers |
source |
the source of gene identifiers. Guessed if not provided |
organism |
the gene organism. Guessed if not provided |
entities |
a numeric vector of gene entities. If NULL (default), the geneids, source and organism arguments are used to identify genes. Be careful when using entities, as these identifiers are not stable. |
orthologs |
return identifiers from orthologs |
canonical_symbols |
return only canonical symbols (default: TRUE). |
A data.frame with the following fields:
value: the identifier
preferred: preferred identifier for the same BE in the same scope
be: the type of BE
organism: the BE organism
source: the source of the identifier
canonical: canonical gene product (logical)
symbol: canonical symbol of the identifier
Gene_entity: the gene entity input
GeneID (optional): the gene ID input
Gene_source (optional): the gene source input
Gene_organism (optional): the gene organism input
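As a minimal sketch of how the returned table might be used, the base R snippet below groups identifiers by BE type and keeps the preferred ones. The data frame is built by hand to mimic the documented columns. All its values, including the source names, are placeholders and not actual output of geneIDsToAllScopes.

```r
# Hand-built stand-in for a geneIDsToAllScopes() result.
# All values are placeholders chosen for illustration only.
res <- data.frame(
  value = c("gene_id_1", "tr_id_1", "tr_id_2", "pep_id_1"),
  preferred = c(TRUE, TRUE, FALSE, TRUE),
  be = c("Gene", "Transcript", "Transcript", "Peptide"),
  organism = "Homo sapiens",
  source = c("src_gene", "src_transcript", "src_transcript", "src_peptide"),
  canonical = c(NA, TRUE, FALSE, TRUE),
  symbol = "SYMB1",
  Gene_entity = 1,
  stringsAsFactors = FALSE
)

# Group the identifiers by type of biological entity.
by_be <- split(res$value, res$be)
by_be$Transcript

# Keep only the preferred identifiers of each scope.
res[res$preferred, c("value", "be", "source")]
```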
Internal use
genProbePath(platform)
platform |
the platform of the probes |
A character value corresponding to the sub-query.
The "be" attribute, attr(x, "be"), corresponds to the BE targeted by probes.
List all the source databases of BE identifiers whatever the BE type
getAllBeIdSources(recache = FALSE)
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
A data.frame indicating the BE related to the ID source (database).
listBeIdSources, listPlatforms
Get a conversion table between biological entity (BE) identifiers
getBeIdConvTable( from, to = from, from.source, to.source, organism, caseSensitive = FALSE, canonical = FALSE, restricted = TRUE, entity = TRUE, verbose = FALSE, recache = FALSE, filter = NULL, limForCache = 100 )
from |
one BE or "Probe" |
to |
one BE or "Probe" |
from.source |
the from BE ID database if BE or the from probe platform if Probe |
to.source |
the to BE ID database if BE or the to probe platform if Probe |
organism |
organism name |
caseSensitive |
if TRUE the case of provided symbols is taken into account during the conversion and selection. This option only affects conversions from "Symbol" (default: caseSensitive=FALSE). All other conversions are case sensitive. |
canonical |
if TRUE, only returns the canonical "Symbol". (default: FALSE) |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned: Depending on history it can take a very long time to return a very large result! |
entity |
boolean indicating if the technical ID of to BE should be returned |
verbose |
boolean indicating if the CQL query should be displayed |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
filter |
character vector on which to filter from IDs. If NULL (default), the result is not filtered: all from IDs are taken into account. |
limForCache |
if filter contains more IDs than limForCache, results are collected for all IDs (beyond provided ids) and cached for future queries. If not, results are collected only for provided ids and not cached. |
a data.frame mapping BE IDs with the following fields:
from: the from BE ID
to: the to BE ID
entity: (optional) the technical ID of to BE
preferred: true if "to" is the preferred identifier for the entity
getHomTable, listBe, listPlatforms, listBeIdSources
## Not run:
getBeIdConvTable(
  from="Gene", from.source="EntrezGene",
  to.source="Ens_gene", organism="human"
)
## End(Not run)
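The preferred column of the conversion table can be used to narrow down one-to-many mappings. The following base R sketch keeps, for each from ID, the preferred targets when at least one exists and all targets otherwise. The table is a hand-built stand-in with placeholder IDs, and this filtering rule is an illustration rather than the package's own implementation.

```r
# Hand-built stand-in for a getBeIdConvTable() result (placeholder IDs).
conv <- data.frame(
  from = c("g1", "g1", "g2", "g3"),
  to = c("E1", "E1_alt", "E2", "E3_alt"),
  entity = c(11, 11, 12, 13),
  preferred = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# For each `from` ID, flag whether at least one preferred target exists.
has_pref <- ave(conv$preferred, conv$from, FUN = any)

# Keep preferred targets when they exist, all targets otherwise.
best <- conv[conv$preferred | !has_pref, ]
best
```

Here "g3" has no preferred target, so its only mapping is kept instead of being dropped.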
This description can be used for annotating tables or graph based on BE IDs.
getBeIdDescription(ids, be, source, organism, ...)
ids |
list of identifiers |
be |
one BE. Guessed if not provided |
source |
the BE ID database. Guessed if not provided |
organism |
organism name. Guessed if not provided |
... |
further arguments for getBeIdNames and getBeIdSymbols functions |
a data.frame providing for each BE IDs (row.names are provided BE IDs):
id: the BE ID
symbol: the BE symbol
name: the corresponding name
## Not run:
getBeIdDescription(
  ids=c("10", "100"), be="Gene",
  source="EntrezGene", organism="human"
)
## End(Not run)
Get names of Biological Entity identifiers
getBeIdNames(ids, be, source, organism, limForCache = 4000, ...)
ids |
list of identifiers |
be |
one BE. Guessed if not provided |
source |
the BE ID database. Guessed if not provided |
organism |
organism name. Guessed if not provided |
limForCache |
if there are more ids than limForCache, results are collected for all IDs (beyond provided ids) and cached for future queries. If not, results are collected only for provided ids and not cached. |
... |
params for the getBeIdNameTable function |
a data.frame mapping BE IDs and names with the following fields:
id: the BE ID
name: the corresponding name
canonical: true if the name is canonical for the direct BE ID (often FALSE for backward compatibility)
direct: true if the name is directly related to the BE ID
preferred: true if the id is the preferred identifier for the BE
entity: (optional) the technical ID of the BE
getBeIdNameTable, getBeIdSymbols
## Not run:
getBeIdNames(
  ids=c("10", "100"), be="Gene",
  source="EntrezGene", organism="human"
)
## End(Not run)
Get a table of biological entity (BE) identifiers and names
getBeIdNameTable( be, source, organism, restricted, entity = TRUE, verbose = FALSE, recache = FALSE, filter = NULL )
be |
one BE |
source |
the BE ID database |
organism |
organism name |
restricted |
boolean indicating if the results should be restricted to direct names |
entity |
boolean indicating if the technical ID of BE should be returned |
verbose |
boolean indicating if the CQL query should be displayed |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
filter |
character vector on which to filter id. If NULL (default), the result is not filtered: all IDs are taken into account. |
a data.frame with the following fields:
id: the BE ID
name: the BE name
direct: false if the name is not directly associated to the BE ID
preferred: true if the id is the preferred identifier for the BE
entity: (optional) the technical ID of the BE
getBeIdNames, getBeIdSymbolTable
## Not run:
getBeIdNameTable(
  be="Gene", source="EntrezGene", organism="human"
)
## End(Not run)
Get biological entities identifiers
getBeIds( be = c(listBe(), "Probe"), source, organism = NA, restricted, entity = TRUE, attributes = NULL, verbose = FALSE, recache = FALSE, filter = NULL, caseSensitive = FALSE, limForCache = 100, bef = NULL )
be |
one BE or "Probe" |
source |
the BE ID database or "Symbol" if BE or the probe platform if Probe |
organism |
organism name |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned. |
entity |
boolean indicating if the technical ID of BE should be returned |
attributes |
a character vector listing attributes that should be returned. |
verbose |
boolean indicating if the CQL query should be displayed |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
filter |
character vector on which to filter id. If NULL (default), the result is not filtered: all IDs are taken into account. |
caseSensitive |
if TRUE the case of provided symbols is taken into account. This option will only affect "Symbol" source (default: caseSensitive=FALSE). |
limForCache |
if filter contains more IDs than limForCache, results are collected for all IDs (beyond provided ids) and cached for future queries. If not, results are collected only for provided ids and not cached. |
bef |
For internal use only |
a data.frame mapping BE IDs with the following fields:
id: the BE ID
preferred: true if the id is the preferred identifier for the BE
BE: IF entity is TRUE the technical ID of BE
db.version: IF be is not "Probe" and source not "Symbol" the version of the DB
db.deprecated: IF be is not "Probe" and source not "Symbol" a value if the BE ID is deprecated or FALSE if it's not
canonical: IF source is "Symbol" TRUE if the symbol is canonical
organism: IF be is "Probe" the organism of the targeted BE
If attributes are part of the query, additional columns for each of them.
Scope ("be", "source" and "organism") is provided as a named list in the "scope" attribute: attr(x, "scope")
listPlatforms, listBeIdSources
## Not run:
beids <- getBeIds(
  be="Gene", source="EntrezGene",
  organism="human", restricted=TRUE
)
## End(Not run)
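The "scope" attribute travels with the returned table and can be read with attr. The sketch below builds two toy tables by hand and compares their scopes in base R. The helper make_toy is hypothetical and the IDs are placeholders; the comparison only illustrates the idea and is not the code of identicalScopes.

```r
# Hypothetical helper building a toy table shaped like a getBeIds() result.
make_toy <- function(ids, scope) {
  x <- data.frame(id = ids, preferred = TRUE, stringsAsFactors = FALSE)
  attr(x, "scope") <- scope
  x
}

a <- make_toy(
  c("id1", "id2"),
  list(be = "Gene", source = "EntrezGene", organism = "Homo sapiens")
)
b <- make_toy(
  "id3",
  list(be = "Gene", source = "Ens_gene", organism = "Homo sapiens")
)

# Read one element of the scope.
attr(a, "scope")$source

# Compare the scopes of the two tables before combining them.
same_scope <- identical(attr(a, "scope"), attr(b, "scope"))
same_scope
```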
Get symbols of Biological Entity identifiers
getBeIdSymbols(ids, be, source, organism, limForCache = 4000, ...)
ids |
list of identifiers |
be |
one BE. Guessed if not provided |
source |
the BE ID database. Guessed if not provided |
organism |
organism name. Guessed if not provided |
limForCache |
if there are more ids than limForCache, results are collected for all IDs (beyond provided ids) and cached for future queries. If not, results are collected only for provided ids and not cached. |
... |
params for the getBeIdSymbolTable function |
a data.frame with the following fields:
id: the BE ID
symbol: the BE symbol
canonical: true if the symbol is canonical for the direct BE ID
direct: false if the symbol is not directly associated to the BE ID
preferred: true if the id is the preferred identifier for the BE
entity: (optional) the technical ID of the BE
getBeIdSymbolTable, getBeIdNames
## Not run:
getBeIdSymbols(
  ids=c("10", "100"), be="Gene",
  source="EntrezGene", organism="human"
)
## End(Not run)
Get a table of biological entity (BE) identifiers and symbols
getBeIdSymbolTable( be, source, organism, restricted, entity = TRUE, verbose = FALSE, recache = FALSE, filter = NULL )
be |
one BE |
source |
the BE ID database |
organism |
organism name |
restricted |
boolean indicating if the results should be restricted to direct symbols |
entity |
boolean indicating if the technical ID of BE should be returned |
verbose |
boolean indicating if the CQL query should be displayed |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
filter |
character vector on which to filter id. If NULL (default), the result is not filtered: all IDs are taken into account. |
a data.frame with the following fields:
id: the BE ID
symbol: the BE symbol
canonical: true if the symbol is canonical for the direct BE ID
direct: false if the symbol is not directly associated to the BE ID
preferred: true if the id is the preferred identifier for the BE
entity: (optional) the technical ID of the BE
getBeIdSymbols, getBeIdNameTable
## Not run:
getBeIdSymbolTable(
  be="Gene", source="EntrezGene", organism="human"
)
## End(Not run)
Get reference URLs for BE IDs
getBeIdURL(ids, databases)
ids |
the BE ID |
databases |
the databases from which each ID has been taken (if only one database is provided it is chosen for all ids) |
A character vector of the same length as ids corresponding to the relevant URLs. NA is returned if there is no URL corresponding to the provided database.
## Not run:
getBeIdURL(c("100", "ENSG00000145335"), c("EntrezGene", "Ens_gene"))
## End(Not run)
The origin is directly taken as provided by the original database. This function does not return indirect relationships.
getDirectOrigin( ids, sources = NULL, process = c("is_expressed_as", "is_translated_in", "codes_for") )
ids |
list of product identifiers |
sources |
a character vector corresponding to the possible product ID sources. If NULL (default), all sources are considered |
process |
the production process among: "is_expressed_as", "is_translated_in", "codes_for". |
a data.frame with the following columns:
origin: the origin BE identifiers
osource: the origin database
product: the product BE identifiers
psource: the product database
canonical: whether the production process is canonical or not
The process is also returned as an attribute of the data.frame.
## Not run:
oriId <- c("XP_016868427", "NP_001308979")
res <- getDirectOrigin(
  ids=oriId, sources="RefSeq_peptide",
  process="is_translated_in"
)
attr(res, "process")
## End(Not run)
The product is directly taken as provided by the original database. This function does not return indirect relationships.
getDirectProduct( ids, sources = NULL, process = c("is_expressed_as", "is_translated_in", "codes_for"), canonical = NA )
ids |
list of origin identifiers |
sources |
a character vector corresponding to the possible origin ID sources. If NULL (default), all sources are considered |
process |
the production process among: "is_expressed_as", "is_translated_in", "codes_for". |
canonical |
If TRUE returns only canonical production processes. If FALSE returns only non-canonical production processes. If NA (default) canonical information is not taken into account. |
a data.frame with the following columns:
origin: the origin BE identifiers
osource: the origin database
product: the product BE identifiers
psource: the product database
canonical: whether the production process is canonical or not
The process is also returned as an attribute of the data.frame.
## Not run:
oriId <- c("10", "100")
res <- getDirectProduct(
  ids=oriId, sources="EntrezGene",
  process="is_expressed_as", canonical=NA
)
attr(res, "process")
## End(Not run)
Not exported to avoid unintended modifications of the DB.
getEnsemblGeneIds(organism, release, gv, ddir, dbCref, dbAss, canChromosomes)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
release |
the Ensembl release of interest (e.g. "83") |
gv |
the genome version (e.g. "38") |
ddir |
path to the directory where the data should be saved |
dbCref |
a named vector of characters providing cross-reference DB of interest. These DB are also used to find indirect ID associations. |
dbAss |
a named vector of characters providing associated DB of interest. Unlike the DB in dbCref parameter, these DB are not used for indirect ID associations: the IDs are only linked to Ensembl IDs. |
canChromosomes |
canonical chromosomes to be considered as preferred ID (e.g. c(1:22, "X", "Y", "MT") for human) |
Not exported to avoid unintended modifications of the DB.
getEnsemblPeptideIds(organism, release, gv, ddir, dbCref, canChromosomes)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
release |
the Ensembl release of interest (e.g. "83") |
gv |
the genome version (e.g. "38") |
ddir |
path to the directory where the data should be saved |
dbCref |
a named vector of characters providing cross-reference DB of interest. These DB are also used to find indirect ID associations. |
canChromosomes |
canonical chromosomes to be considered as preferred ID (e.g. c(1:22, "X", "Y", "MT") for human) |
Not exported to avoid unintended modifications of the DB.
getEnsemblTranscriptIds(organism, release, gv, ddir, dbCref, canChromosomes)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
release |
the Ensembl release of interest (e.g. "83") |
gv |
the genome version (e.g. "38") |
ddir |
path to the directory where the data should be saved |
dbCref |
a named vector of characters providing cross-reference DB of interest. These DB are also used to find indirect ID associations. |
canChromosomes |
canonical chromosomes to be considered as preferred ID (e.g. c(1:22, "X", "Y", "MT") for human) |
This description can be used for annotating tables or graph based on BE IDs.
getGeneDescription( ids, be, source, organism, gsource = largestBeSource(be = "Gene", organism = organism, rel = "is_known_as", restricted = TRUE), limForCache = 2000 )
ids |
list of identifiers |
be |
one BE. Guessed if not provided |
source |
the BE ID database. Guessed if not provided |
organism |
organism name. Guessed if not provided |
gsource |
the source of the gene IDs to use. It's chosen automatically by default. |
limForCache |
The number of ids above which the description is gathered for all BE IDs and cached for future queries. |
a data.frame providing for each BE IDs (row.names are provided BE IDs):
id: the BE ID
gsource: the gene ID; the column name gives the source of the identifier used
symbol: the associated gene symbols
name: the associated gene names
getBeIdDescription, getBeIdNames, getBeIdSymbols
## Not run:
getGeneDescription(
  ids=c("1438_at", "1552335_at"), be="Probe",
  source="GPL570", organism="human"
)
## End(Not run)
Get gene homologs between 2 organisms
getHomTable( from.org, to.org, from.source = "Ens_gene", to.source = from.source, restricted = TRUE, verbose = FALSE, recache = FALSE, filter = NULL, limForCache = 100 )
from.org |
organism name |
to.org |
organism name |
from.source |
the from gene ID database |
to.source |
the to gene ID database |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned: Depending on history it can take a very long time to return a very large result! |
verbose |
boolean indicating if the CQL query should be displayed |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
filter |
character vector on which to filter from IDs. If NULL (default), the result is not filtered: all from IDs are taken into account. |
limForCache |
if filter contains more IDs than limForCache, results are collected for all IDs (beyond provided ids) and cached for future queries. If not, results are collected only for provided ids and not cached. |
a data.frame mapping gene IDs with the following fields:
from: the from gene ID
to: the to gene ID
## Not run:
getHomTable(
  from.org="human", to.org="mouse"
)
## End(Not run)
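A homolog table can be chained with a conversion table to move identifiers from one organism to another. The base R sketch below joins two hand-built tables shaped like the results of getBeIdConvTable and getHomTable. The IDs are placeholders and the columns are renamed for readability, so this is an illustration of the join and not the package's own conversion logic.

```r
# Toy conversion table within the source organism (placeholder IDs),
# shaped like the from/to columns of getBeIdConvTable().
conv <- data.frame(
  from = c("h_entrez_1", "h_entrez_2"),
  to = c("h_ens_1", "h_ens_2"),
  stringsAsFactors = FALSE
)

# Toy homolog table between two organisms, shaped like getHomTable().
hom <- data.frame(
  from = c("h_ens_1", "h_ens_1", "h_ens_3"),
  to = c("m_ens_1", "m_ens_1b", "m_ens_3"),
  stringsAsFactors = FALSE
)

# Rename columns so that the shared key has the same name in both tables.
names(conv) <- c("entrez_human", "ens_human")
names(hom) <- c("ens_human", "ens_mouse")

# Inner join: only genes with a known homolog are kept.
chained <- merge(conv, hom, by = "ens_human")
unique(chained[, c("entrez_human", "ens_mouse")])
```

In this toy case "h_entrez_2" is dropped because its gene has no homolog in the table, and "h_entrez_1" maps to two homologs.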
Not exported to avoid unintended modifications of the DB.
getNcbiGeneTransPep(organism, reDumpThr = 1e+05, ddir, curDate)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
reDumpThr |
time difference threshold between 2 downloads |
ddir |
path to the directory where the data should be saved |
curDate |
current date as given by Sys.Date |
Get organism names from taxonomy IDs
getOrgNames(taxID = NULL)
taxID |
a vector of taxonomy IDs. If NULL (default) the function lists all taxonomy IDs available in the DB. |
A data.frame mapping taxonomy IDs to organism names with the following fields:
taxID: the taxonomy ID
name: the organism name
nameClass: the class of the name
## Not run:
getOrgNames(c("9606", "10090"))
getOrgNames("9606")
## End(Not run)
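Since one taxonomy ID can have several names, the nameClass column is useful for selecting one name per organism. The sketch below works on a hand-built table with the documented columns. The nameClass labels used here ("scientific name", "common name") are an assumption about how the classes are labelled in the database.

```r
# Hand-built stand-in for a getOrgNames() result.
# The nameClass labels are assumed, not taken from a real query.
org_names <- data.frame(
  taxID = c("9606", "9606", "10090", "10090"),
  name = c("Homo sapiens", "human", "Mus musculus", "mouse"),
  nameClass = c(
    "scientific name", "common name",
    "scientific name", "common name"
  ),
  stringsAsFactors = FALSE
)

# Keep one row per organism: its scientific name.
sci <- org_names[org_names$nameClass == "scientific name", c("taxID", "name")]

# Named vector for quick lookup by taxonomy ID.
tax_to_name <- setNames(sci$name, sci$taxID)
tax_to_name[["10090"]]
```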
DEPRECATED: use searchBeid and geneIDsToAllScopes instead. This function is meant to be used with searchId in order to implement a dictionary of identifiers of interest. First the searchId function is used to search a term. Then the getRelevantIds function is used to find the corresponding IDs in a context of interest.
getRelevantIds( d, selected = 1, be = c(listBe(), "Probe"), source, organism, restricted = TRUE, simplify = TRUE, verbose = FALSE )
d |
the data.frame returned by searchId. |
selected |
the rows of interest in d |
be |
the BE in the context of interest |
source |
the source of the identifier in the context of interest |
organism |
the organism in the context of interest |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned: Depending on history it can take a very long time to return a very large result! |
simplify |
if TRUE (default) duplicated IDs are removed from the output |
verbose |
if TRUE, the CQL query is shown |
The d data.frame with a new column providing the relevant ID in the context of interest and without the gene field.
Scope ("be", "source" and "organism") is provided as a named list in the "scope" attribute: attr(x, "scope")
Identify the biological entity (BE) targeted by probes
getTargetedBe(platform)
platform |
the platform of the probes |
The BE targeted by the platform
## Not run:
getTargetedBe("GPL570")
## End(Not run)
Get taxonomy ID of an organism name
getTaxId(name)
name |
the name of the organism |
A vector of taxonomy ID
## Not run:
getTaxId("human")
## End(Not run)
Not exported to avoid unintended modifications of the DB.
getUniprot(organism, taxDiv, release, ddir)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
taxDiv |
the taxonomic division to which the organism belongs (e.g., "human", "rodents", "mammals", "vertebrates") |
release |
the release of interest (check if already downloaded) |
ddir |
path to the directory where the data should be saved |
Guess biological entity (BE), database source and organism of a vector of identifiers.
guessIdScope(ids, be, source, organism, tcLim = 100)
guessIdOrigin(...)
ids |
a character vector of identifiers |
be |
one BE or "Probe". Guessed if not provided |
source |
the BE ID database or "Symbol" if BE or the probe platform if Probe. Guessed if not provided |
organism |
organism name. Guessed if not provided |
tcLim |
number of identifiers to check to guess origin for the whole set. Inf ==> no limit. |
... |
params for |
A list (NULL if no match):
be: a character vector of length 1 providing the best BE guess (NA if inconsistent with user input: be, source or organism)
source: a character vector of length 1 providing the best source guess (NA if inconsistent with user input: be, source or organism)
organism: a character vector of length 1 providing the best organism guess (NA if inconsistent with user input: be, source or organism)
The "details" attribute (attr(x, "details")) is a data frame providing numbers supporting the guess
guessIdOrigin(): Deprecated version of guessIdScope
## Not run:
guessIdScope(ids=c("10", "100"))
## End(Not run)
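Because the result can be NULL (no match) or contain NA elements (guess inconsistent with user input), it may be worth checking it before use. The sketch below does this on hand-built lists shaped like the documented return value. The helper scope_is_usable is hypothetical, and the column names of the toy "details" table are assumptions.

```r
# Hypothetical helper: TRUE only if a guess exists and none of its
# three elements is NA.
scope_is_usable <- function(g) {
  !is.null(g) && !anyNA(unlist(g[c("be", "source", "organism")]))
}

# Hand-built stand-in for a successful guessIdScope() result.
guess <- list(be = "Gene", source = "EntrezGene", organism = "Homo sapiens")
# Toy supporting numbers; the column names here are assumed.
attr(guess, "details") <- data.frame(
  be = "Gene", source = "EntrezGene", organism = "Homo sapiens", nb = 2,
  stringsAsFactors = FALSE
)

# Stand-in for a guess that conflicts with user input.
conflict <- list(be = "Gene", source = NA, organism = "Homo sapiens")

scope_is_usable(guess)
scope_is_usable(conflict)
scope_is_usable(NULL)
```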
Check if two objects have the same BEID scope
identicalScopes(x, y)
x |
the object to test |
y |
the object to test |
A logical indicating if the 2 scopes are identical
Check if the provided object is a BEIDList
is.BEIDList(x)
x |
the object to check |
A logical value
The selection is based on direct identifiers
largestBeSource( be, organism, rel = NA, restricted = TRUE, exclude = c("BEDTech_gene", "BEDTech_transcript") )
be |
the biological entity under focus |
organism |
the organism under focus |
rel |
a type of relationship to consider in the query (e.g. "is_member_of") in order to focus on specific information. If NA (default) all be are taken into account whatever their available relationships. |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also taken into account. |
exclude |
database to exclude from possible selection. Used to filter out technical database names such as "BEDTech_gene" and "BEDTech_transcript" used to manage orphan IDs (not linked to any gene based on information taken from sources) |
The name of the selected source. The selected source will be the one providing the largest number of current identifiers.
## Not run:
largestBeSource(be="Gene", "Mus musculus")
## End(Not run)
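The selection rule (the source with the largest number of current identifiers, after excluding technical databases) can be sketched in base R on a table shaped like the output of listBeIdSources. The counts below are invented for illustration, and the actual function queries the database and may differ in detail.

```r
# Hand-built stand-in for a listBeIdSources() result; counts are invented.
sources <- data.frame(
  database = c("Ens_gene", "EntrezGene", "BEDTech_gene"),
  nbBe = c(100, 90, 500),
  nbId = c(120, 95, 500),
  be = "Gene",
  stringsAsFactors = FALSE
)

# Technical databases to leave out, as in the default `exclude` argument.
exclude <- c("BEDTech_gene", "BEDTech_transcript")
candidates <- sources[!sources$database %in% exclude, ]

# Pick the remaining source with the largest number of identifiers.
largest <- candidates$database[which.max(candidates$nbId)]
largest
```

Without the exclusion step, the technical database would win in this toy table, which is why such names are filtered out first.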
Lists all the biological entities (BE) available in the BED database
listBe()
A character vector of biological entities (BE)
listPlatforms, listBeIdSources, listOrganisms
Lists all the databases taken into account in the BED database for a biological entity (BE)
listBeIdSources( be = listBe(), organism, direct = FALSE, rel = NA, restricted = FALSE, recache = FALSE, verbose = FALSE, exclude = c() )
be |
the BE on which to focus |
organism |
the name of the organism to focus on. |
direct |
a logical value indicating if only "direct" BE identifiers should be considered |
rel |
a type of relationship to consider in the query (e.g. "is_member_of") in order to focus on specific information. If NA (default) all be are taken into account whatever their available relationships. |
restricted |
boolean indicating if the results should be restricted to current version of to BEID db. If FALSE former BEID are also returned. There is no impact if direct is set to TRUE. |
recache |
boolean indicating if the CQL query should be run even if the table is already in cache |
verbose |
boolean indicating if the CQL query should be shown. |
exclude |
database to exclude from possible selection. Used to filter out technical database names such as "BEDTech_gene" and "BEDTech_transcript" used to manage orphan IDs (not linked to any gene based on information taken from sources) |
A data.frame indicating the number of ID in each available database with the following fields:
database: the database name
nbBe: number of distinct entities
nbId: number of identifiers
be: the BE under focus
## Not run:
listBeIdSources(be="Transcript", organism="mouse")
## End(Not run)
List all attributes provided by a BEDB
listDBAttributes(dbname)
dbname |
the name of the database |
A character vector of attribute names
Lists all the organisms available in the BED database
listOrganisms()
A character vector of organism scientific names
listPlatforms, listBeIdSources, listBe, getTaxId, getOrgNames
Lists all the probe platforms available in the BED database
listPlatforms(be = c(NA, listBe()))
be |
a character vector of BE on which to focus. If NA (default), all the BE are considered. |
A data.frame mapping platforms to BE with the following fields:
name: the platform name
description: platform description
focus: Targeted BE
listBe, listBeIdSources, listOrganisms, getTargetedBe
## Not run:
listPlatforms(be="Gene")
listPlatforms()
## End(Not run)
Not exported to avoid unintended modifications of the DB.
loadBE( d, be = "Gene", dbname, version = NA, deprecated = NA, taxId = NA, onlyId = FALSE )
d |
a data.frame with information about the entities to be loaded. It should contain the following fields: "id". If there is a boolean column named "preferred", the value is loaded. |
be |
a character corresponding to the BE type (default: "Gene") |
dbname |
the DB from which the BE ID are taken |
version |
the version of the DB from which the BE IDs are taken |
deprecated |
NA (default) or the date when the ID was deprecated |
taxId |
the taxonomy ID of the BE organism |
onlyId |
a logical. If TRUE, only a BEID is created and not the corresponding BE. |
Not exported to avoid unintended modifications of the DB.
loadBeAttribute(d, be = "Gene", dbname, attribute)
d |
a data.frame providing for each BE ID ("id" column) an attribute value ("value" column). There can be several values for each id. |
be |
a character corresponding to the BE type (default: "Gene") |
dbname |
the DB from which the BE ID are taken |
attribute |
the name of the attribute to be loaded |
Not exported to avoid unintended modifications of the DB.
loadBedModel()
Not exported to avoid unintended modifications of the DB.
loadBedOtherIndexes()
Internal use
loadBedResult(name)
name |
the name of the query |
Not exported to avoid unintended modifications of the DB.
loadBENames(d, be = "Gene", dbname)
d |
a data.frame with information about the names to be loaded. It should contain the following fields: "id", "name" and "canonical" (optional). |
be |
a character corresponding to the BE type (default: "Gene") |
dbname |
the DB of BEID |
Not exported to avoid unintended modifications of the DB.
loadBESymbols(d, be = "Gene", dbname)
d |
a data.frame with information about the symbols to be loaded. It should contain the following fields: "id", "symbol" and "canonical" (optional). |
be |
a character corresponding to the BE type (default: "Gene") |
dbname |
the DB of BEID |
Not exported to avoid unintended modifications of the DB.
loadBEVersion(d, be = "Gene", dbname, taxId = NA, onlyId = FALSE)
d |
a data.frame with information about the entities to be loaded. It should contain the following fields: "id", "version" and "deprecated". |
be |
a character corresponding to the BE type (default: "Gene") |
dbname |
the DB from which the BE ID are taken |
taxId |
the taxonomy ID of the BE organism |
onlyId |
a logical. If TRUE, only a BEID is created and not the corresponding BE. |
Not exported to avoid unintended modifications of the DB.
loadCodesFor(d, gdb, odb)
d |
a data.frame with information about the coding events. It should contain the following fields: "gid" and "oid" |
gdb |
the DB of Gene IDs |
odb |
the DB of Object IDs |
Not exported to avoid unintended modifications of the DB.
loadCorrespondsTo(d, db1, db2, be = "Gene")
d |
a data.frame with information about the correspondences to be loaded. It should contain the following fields: "id1" and "id2". |
db1 |
the DB of id1 |
db2 |
the DB of id2 |
be |
a character corresponding to the BE type (default: "Gene") |
Not exported to avoid unintended modifications of the DB.
loadHistory(d, dbname, be = "Gene")
d |
a data.frame with information about the history. It should contain the following fields: "old" and "new". |
dbname |
the DB of BEID |
be |
a character corresponding to the BE type (default: "Gene") |
Not exported to avoid unintended modifications of the DB.
loadIsAssociatedTo(d, db1, db2, be = "Gene")
d |
a data.frame with information about the associations to be loaded. It should contain the following fields: "id1" and "id2". In the end, id1 is associated with id2 (in this direction and not the other). |
db1 |
the DB of id1 |
db2 |
the DB of id2 |
be |
a character corresponding to the BE type (default: "Gene") |
When associating one id1 with id2, the BE identified by id1 is deleted after its production edges have been transferred to the BE identified by id2. After this operation, all IDs "corresponding_to" id1 no longer directly identify any BE, as they are supposed to do. Thus, avoid running this function with id1 involved in "corresponds_to" edges.
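The side effect described above suggests checking the input before loading. The sketch below is a minimal pre-flight check in base R. It assumes two hypothetical data.frames: `assoc`, the "id1"/"id2" table intended for loadIsAssociatedTo, and `corresp`, a table of identifier pairs already known to be linked by "corresponds_to" edges (how that table is obtained is left open). All identifiers are made up.

```r
# Hypothetical association table intended for loadIsAssociatedTo().
assoc <- data.frame(
  id1 = c("A1", "A2", "A3"),
  id2 = c("B1", "B2", "B3"),
  stringsAsFactors = FALSE
)

# Hypothetical pairs already linked by "corresponds_to" edges.
corresp <- data.frame(
  id1 = c("A2", "C7"),
  id2 = c("C1", "A9"),
  stringsAsFactors = FALSE
)

# An id1 is risky if it appears on either side of a "corresponds_to" pair.
involved <- unique(c(corresp$id1, corresp$id2))
risky <- assoc$id1 %in% involved

# Keep only the associations that look safe to load; report the others.
safe_assoc <- assoc[!risky, , drop = FALSE]
flagged <- assoc$id1[risky]
```

Here `flagged` holds the identifiers that would need to be handled separately, and `safe_assoc` is the reduced table that could be passed on.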
Not exported to avoid unintended modifications of the DB.
loadIsExpressedAs(d, gdb, tdb)
d |
a data.frame with information about the expression events. It should contain the following fields: "gid", "tid" and "canonical" (optional). |
gdb |
the DB of Gene IDs |
tdb |
the DB of Transcript IDs |
Not exported to avoid unintended modifications of the DB.
loadIsHomologOf(d, db1, db2, be = "Gene")
d |
a data.frame with information about the homologies to be loaded. It should contain the following fields: "id1" and "id2". |
db1 |
the DB of id1 |
db2 |
the DB of id2 |
be |
a character corresponding to the BE type (default: "Gene") |
Not exported to avoid unintended modifications of the DB.
loadIsTranslatedIn(d, tdb, pdb)
d |
a data.frame with information about the translation events. It should contain the following fields: "tid", "pid" and "canonical" (optional). |
tdb |
the DB of Transcript IDs |
pdb |
the DB of Peptide IDs |
Not exported to avoid unintended modifications of the DB.
loadLuceneIndexes()
Not exported to avoid unintended modifications of the DB.
loadNCBIEntrezGOFunctions(organism, reDumpThr = 1e+05, ddir, curDate)
organism |
character vector of 1 element corresponding to the organism of interest (e.g. "Homo sapiens") |
reDumpThr |
time difference threshold between 2 downloads |
ddir |
path to the directory where the data should be saved |
curDate |
current date as given by Sys.Date |
Not exported to avoid unintended modifications of the DB.
loadNcbiTax(reDumpThr, ddir, orgOfInt = c("human", "rat", "mouse"), curDate)
reDumpThr |
time difference threshold between 2 downloads |
ddir |
path to the directory where the data should be saved |
orgOfInt |
organisms of interest: a character vector |
curDate |
current date as given by Sys.Date |
Not exported to avoid unintended modifications of the DB.
loadOrganisms(d)
d |
a data.frame with 2 columns named "tax_id" and "name_txt" providing the taxonomic ID for each organism name |
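For illustration, a table in the shape described above could be assembled as follows. This is only a sketch: storing `tax_id` as character, and allowing several names for the same taxonomic ID, are assumptions that the text above does not confirm. The commented call is hypothetical and would modify the database.

```r
# Taxonomy table with the two documented columns.
# Assumption: several names (scientific, common) may share one tax_id.
d <- data.frame(
  tax_id = c("9606", "9606", "10090", "10116"),
  name_txt = c("Homo sapiens", "human", "Mus musculus", "Rattus norvegicus"),
  stringsAsFactors = FALSE
)
stopifnot(all(c("tax_id", "name_txt") %in% colnames(d)))

# Number of names recorded per taxonomic ID.
names_per_tax <- table(d$tax_id)

# Hypothetical call (requires a writable BED instance; not run here):
# BED:::loadOrganisms(d)
```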
Not exported to avoid unintended modifications of the DB.
loadPlf(name, description, be)
name |
the name of the platform |
description |
a description of the platform |
be |
the type of BE targeted by the platform |
Not exported to avoid unintended modifications of the DB.
loadProbes(d, be = "Transcript", platform, dbname)
d |
a data.frame with information about the entities to be loaded. It should contain the following fields: "id" and "probeID". |
be |
a character corresponding to the BE targeted by the probes (default: "Transcript") |
platform |
the platform gathering the probes |
dbname |
the DB from which the BE ID are taken |
List all the BED queries in cache and the total size of the cache
lsBedCache(verbose = TRUE)
verbose |
if TRUE (default) prints a message displaying the total size of the cache |
A data.frame giving for each query (row names) its size in bytes (column "size") and in human-readable format (column "hr"). The attribute "Total" corresponds to the sum of all the file sizes.
List all registered BED connections
lsBedConnections()
connectToBed, forgetBedConnection, checkBedConn
Get object metadata
metadata(x, ...)
x |
an object representing a collection of BEID (e.g. BEIDList) |
... |
method specific parameters |
Set object metadata
metadata(x) <- value
x |
an object representing a collection of BEID (e.g. BEIDList) |
value |
a data.frame with row names, or a column ".lname", whose values are all in the names of l. |
Not exported to avoid unintended modifications of the DB.
registerBEDB(name, description = NA, currentVersion = NA, idURL = NA)
name |
the name of the database (e.g. "Ens_gene") |
description |
a short description of the database (e.g. "Ensembl gene") |
currentVersion |
the version taken into account in BED (e.g. 83) |
idURL |
the URL template to use to retrieve id information. A '%s' corresponding to the ID should be present in this character vector of length one. |
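To make the role of the '%s' placeholder concrete, the sketch below validates a template and expands it for a couple of identifiers with base R's `sprintf`. The template URL and the identifiers are hypothetical, and using `sprintf` is an assumption about how such a template is naturally consumed in R; the text above does not state how BED itself expands it.

```r
# Hypothetical URL template: '%s' marks where the identifier is inserted.
idURL <- "https://www.example.org/gene/%s"

# In this sketch a template is a single string containing one '%s'.
stopifnot(
  is.character(idURL),
  length(idURL) == 1,
  lengths(regmatches(idURL, gregexpr("%s", idURL, fixed = TRUE))) == 1
)

# Expanding the template for a few illustrative identifiers.
ids <- c("ENSG00000139618", "ENSG00000141510")
urls <- sprintf(idURL, ids)
```

Because `sprintf` is vectorized over its arguments, `urls` contains one URL per element of `ids`.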
Get the BEID scope of an object
scope(x, ...)
x |
an object representing a collection of BEID (e.g. BEIDList) |
... |
method specific parameters |
Get the BEID scopes of an object
scopes(x, ...)
x |
an object representing a collection of BEID (e.g. BEIDList) |
... |
method specific parameters |
A tibble with 4 columns:
be
source
organism
Freq
Search a BEID
searchBeid(x, maxHits = 75, clean_id_search = TRUE, clean_name_search = TRUE)
x |
a character value to search |
maxHits |
maximum number of raw hits to return |
clean_id_search |
clean x to avoid errors during ID search. Default: TRUE. Set it to FALSE if you're sure of your Lucene query. |
clean_name_search |
clean x to avoid errors during name search. Default: TRUE. Set it to FALSE if you're sure of your Lucene query. |
NULL if there is no match; otherwise a data.frame with the following columns:
value: the matching term
from: the type of the matched term (e.g. BESymbol, GeneID...)
be: the matching biological entity (BE)
beid: the BE identifier
source: the BEID reference database
preferred: TRUE if the BEID is considered as a preferred identifier
symbol: BEID canonical symbol
name: BEID name
entity: technical BE identifier
GeneID: Corresponding gene identifier
Gene_source: Gene ID database
preferred_gene: TRUE if the GeneID is considered as a preferred identifier
Gene_symbol: Gene symbol
Gene_name: Gene name
Gene_entity: technical gene identifier
organism: gene organism (scientific name)
score: score of the fuzzy search
included: is the search term fully included in the value
exact: is the value an exact match of the term
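The columns listed above lend themselves to simple post-filtering. The sketch below works on a mock data.frame that mimics a subset of those columns (the values, including the source names and scores, are made up) and keeps exact matches on preferred identifiers, best score first. It does not call searchBeid itself, which requires a connection to a BED database.

```r
# Mock result with a subset of the documented columns (made-up values).
res <- data.frame(
  value = c("TP53", "TP53BP1", "TP53"),
  beid = c("7157", "7158", "ENSG00000141510"),
  source = c("EntrezGene", "EntrezGene", "Ens_gene"),
  preferred = c(TRUE, TRUE, FALSE),
  score = c(9.1, 6.4, 9.1),
  exact = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# searchBeid() may return NULL, so guard before filtering.
hits <- NULL
if (!is.null(res)) {
  # Keep exact matches that point to a preferred identifier.
  hits <- res[res$exact & res$preferred, , drop = FALSE]
  # Highest score first.
  hits <- hits[order(-hits$score), , drop = FALSE]
}
```

The NULL guard matters because the function returns NULL rather than an empty data.frame when nothing matches.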
DEPRECATED: use searchBeid and geneIDsToAllScopes instead. This function is meant to be used with getRelevantIds in order to implement a dictionary of identifiers of interest. First the searchId function is used to search a term. Then the getRelevantIds function is used to find the corresponding ID in a context of interest.
searchId( searched, be = NULL, organism = NULL, ncharSymb = 4, ncharName = 8, verbose = FALSE )
searched |
the searched term. Identifiers are searched by exact match. Symbols and names are also searched for partial matches when the number of characters in searched is greater than ncharSymb and ncharName, respectively. |
be |
optional. If provided the search is focused on provided BEs. |
organism |
optional. If provided the search is focused on provided organisms. |
ncharSymb |
The minimum number of characters in searched to consider incomplete symbol matches. |
ncharName |
The minimum number of characters in searched to consider incomplete name matches. |
verbose |
boolean indicating if the CQL queries should be displayed |
A data frame with the following fields:
found: the element found in BED corresponding to the searched term
be: the type of the element
source: the source of the element
organism: the related organism
entity: the related entity internal ID
ebe: the BE of the related entity
canonical: if the symbol is canonical
gene: list of the related genes BE internal ID
Exact matches are returned first, followed by the shortest elements.
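The ordering rule stated above can be mimicked in base R as follows. This is an illustration of the rule on made-up strings, not the package's actual implementation; in particular, treating an exact match as strict string equality is an assumption of this sketch.

```r
# Made-up candidate matches for the searched term.
searched <- "TP53"
found <- c("TP53BP1", "TP53", "TP53I3", "TP53RK")

# Exact matches first, then increasing length; ties keep input order.
exact <- found == searched
ranked <- found[order(!exact, nchar(found))]
```

Sorting on `!exact` places TRUE matches first because FALSE sorts before TRUE, and `nchar(found)` then breaks ties among the remaining elements.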
Not exported to avoid unintended modifications of the DB. This function is used when modifying the BED content.
setBedVersion(bedInstance, bedVersion)
bedInstance |
instance of BED to be set |
bedVersion |
version of BED to be set |
Show the schema of the BED data model.
showBedDataModel()