```{r}
eid <- beids$id[which(beids$Gene %in% names(which(table(beids$Gene)>=3)))][1]
print(eid)
exploreBe(id=eid, source="EntrezGene", be="Gene") %>%
visPhysics(solver="repulsion")
```
```{r}
mapt <- convBeIds(
"MAPT", from="Gene", from.source="Symbol", from.org="human",
to.source="Ens_gene", restricted=TRUE
)
exploreBe(
mapt[1, "to"],
source="Ens_gene",
be="Gene"
)
getBeIds(
be="Gene", source="Ens_gene", organism="human",
restricted=TRUE,
attributes=listDBAttributes("Ens_gene"),
filter=mapt$to
)
```
## Checking identifiers
The origin of identifiers can be guessed as follows.
```{r}
oriId <- c(
"17237", "105886298", "76429", "80985", "230514", "66459",
"93696", "72514", "20352", "13347", "100462961", "100043346",
"12400", "106582", "19062", "245607", "79196", "16878", "320727",
"230649", "66880", "66245", "103742", "320145", "140795"
)
idOrigin <- guessIdScope(oriId)
print(idOrigin$be)
print(idOrigin$source)
print(idOrigin$organism)
```
The best guess is returned as a list but other possible origins are listed in
the *details* attribute.
```{r}
print(attr(idOrigin, "details"))
```
If the origin of identifiers is already known, it can also be tested.
```{r}
checkBeIds(ids=oriId, be="Gene", source="EntrezGene", organism="mouse")
```
```{r}
checkBeIds(ids=oriId, be="Gene", source="HGNC", organism="human")
```
## Identifier annotation
Identifiers can be annotated with symbols and names according to available
information.
The following code returns the most relevant symbol and the most relevant name
for each ID.
Source URLs can also be generated with the `getBeIdURL()` function.
```{r}
toShow <- getBeIdDescription(
ids=oriId, be="Gene", source="EntrezGene", organism="mouse"
)
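# Turn each ID into an HTML link to its record in the source database
toShow$id <- paste0(
   sprintf(
      '<a href="%s" target="_blank">',
      getBeIdURL(toShow$id, "EntrezGene")
   ),
   toShow$id,
   '</a>'
)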
kable(toShow, escape=FALSE, row.names=FALSE)
```
All possible symbols and all possible names for each ID can also be retrieved
using the following functions.
```{r}
res <- getBeIdSymbols(
ids=oriId, be="Gene", source="EntrezGene", organism="mouse",
restricted=FALSE
)
head(res)
```
```{r}
res <- getBeIdNames(
ids=oriId, be="Gene", source="EntrezGene", organism="mouse",
restricted=FALSE
)
head(res)
```
Probes and some biological entities do not have directly associated
symbols or names. These elements can nevertheless be annotated according to
information related to the relevant genes.
```{r}
someProbes <- c(
"238834_at", "1569297_at", "213021_at", "225480_at",
"216016_at", "35685_at", "217969_at", "211359_s_at"
)
toShow <- getGeneDescription(
ids=someProbes, be="Probe", source="GPL570", organism="human"
)
kable(toShow, escape=FALSE, row.names=FALSE)
```
## Products of molecular biology processes
The BED data model has been built to reflect molecular biology processes:

- **is_expressed_as** relationships correspond to the transcription process.
- **is_translated_in** relationships correspond to the translation process.
- **codes_for** is a fuzzy relationship allowing the mapping of genes to
objects not necessarily corresponding to the same kind of biological molecule.

These processes are described in different databases with different levels of
granularity. For example, Ensembl provides the possible transcripts for each
gene and specifies which one of them is canonical.
The following functions are used to retrieve direct products or direct
origins of molecular biology processes.
```{r}
getDirectProduct("ENSG00000145335", process="is_expressed_as")
getDirectProduct("ENST00000336904", process="is_translated_in")
getDirectOrigin("NM_001146055", process="is_expressed_as")
```
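The relationships listed above can be pictured as labelled edges between
identifiers. The chunk below is only a toy sketch in base R, not the way BED
stores or queries its data: the identifiers, the `toyEdges` table and the
`toyDirectProduct()` and `toyDirectOrigin()` helpers are all hypothetical and
exist only for this illustration.

```{r}
# Toy edge list (hypothetical identifiers, not taken from any database)
toyEdges <- data.frame(
   origin=c("geneA", "geneA", "txA1", "txA2"),
   product=c("txA1", "txA2", "pepA1", "pepA2"),
   process=c(
      "is_expressed_as", "is_expressed_as",
      "is_translated_in", "is_translated_in"
   ),
   stringsAsFactors=FALSE
)
# Direct products: follow the edges of the given process forward
toyDirectProduct <- function(id, process){
   toyEdges$product[toyEdges$origin == id & toyEdges$process == process]
}
# Direct origins: follow the same edges backward
toyDirectOrigin <- function(id, process){
   toyEdges$origin[toyEdges$product == id & toyEdges$process == process]
}
toyDirectProduct("geneA", process="is_expressed_as")
toyDirectOrigin("pepA1", process="is_translated_in")
```

In this sketch a gene has several direct transcripts, while each peptide has a
single direct origin; asking for a process that does not apply to an identifier
simply returns an empty vector.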
# Converting identifiers
## Same entity and same organism: from one source to another
```{r}
res <- convBeIds(
ids=oriId,
from="Gene",
from.source="EntrezGene",
from.org="mouse",
to.source="Ens_gene",
restricted=TRUE,
prefFilter=TRUE
)
head(res)
```
## Same organism: from one entity to another
```{r}
res <- convBeIds(
ids=oriId,
from="Gene",
from.source="EntrezGene",
from.org="mouse",
to="Peptide",
to.source="Ens_translation",
restricted=TRUE,
prefFilter=TRUE
)
head(res)
```
## From one organism to another
```{r}
res <- convBeIds(
ids=oriId,
from="Gene",
from.source="EntrezGene",
from.org="mouse",
to="Peptide",
to.source="Ens_translation",
to.org="human",
restricted=TRUE,
prefFilter=TRUE
)
head(res)
```
## Converting lists of identifiers
Lists of identifiers can be converted as follows.
Only converted IDs are returned in this case.
```{r}
humanEnsPeptides <- convBeIdLists(
idList=list(a=oriId[1:5], b=oriId[-c(1:5)]),
from="Gene",
from.source="EntrezGene",
from.org="mouse",
to="Peptide",
to.source="Ens_translation",
to.org="human",
restricted=TRUE,
prefFilter=TRUE
)
unlist(lapply(humanEnsPeptides, length))
lapply(humanEnsPeptides, head)
```
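What "only converted IDs are returned" means can be illustrated on toy data.
The following base R sketch is not the implementation of `convBeIdLists()`:
the `toyConv` table, the `toyList` list and all identifiers are hypothetical.
It only shows that each list element ends up holding target identifiers, and
that an input identifier without a mapping leaves no trace in the result.

```{r}
# Toy conversion table (hypothetical identifiers)
toyConv <- data.frame(
   from=c("g1", "g1", "g2", "g4"),
   to=c("p1", "p2", "p3", "p4"),
   stringsAsFactors=FALSE
)
toyList <- list(a=c("g1", "g2"), b=c("g3", "g4"))
# For each element, keep only the targets of the identifiers it contains;
# "g3" has no mapping and therefore does not appear in any form.
toyConverted <- lapply(
   toyList,
   function(x) unique(toyConv$to[toyConv$from %in% x])
)
toyConverted
```

Because the original identifiers are dropped, such a result is convenient for
set-based analyses but cannot be used to trace a target back to its input.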
### BEIDList
`BEIDList` objects are used to manage lists of BEIDs with
an attached explicit scope
and metadata provided in a data frame.
The `focusOnScope()` function is used to easily convert such an object to
another scope. For example, in the code below, Entrez gene identifiers are
converted to Ensembl gene identifiers.
```{r}
entrezGenes <- BEIDList(
list(a=oriId[1:5], b=oriId[-c(1:5)]),
scope=list(be="Gene", source="EntrezGene", organism="Mus musculus"),
metadata=data.frame(
.lname=c("a", "b"),
description=c("Identifiers in a", "Identifiers in b"),
stringsAsFactors=FALSE
)
)
entrezGenes
entrezGenes$a
ensemblGenes <- focusOnScope(entrezGenes, source="Ens_gene")
ensemblGenes$a
```
## Converting data frames
IDs in data frames can also be converted.
```{r}
toConv <- data.frame(a=1:25, b=runif(25))
rownames(toConv) <- oriId
res <- convDfBeIds(
df=toConv,
from="Gene",
from.source="EntrezGene",
from.org="mouse",
to.source="Ens_gene",
restricted=TRUE,
prefFilter=TRUE
)
head(res)
```
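Converting identifiers attached to a data frame raises the question of
one-to-many and missing mappings. The base R sketch below illustrates this
with a plain `merge()` on toy data. It is an assumption-laden illustration,
not a description of how `convDfBeIds()` handles these cases: the `toyDf` and
`toyConvTable` objects, their column names and the identifiers are
hypothetical, and the actual behavior should be checked in the function
documentation.

```{r}
toyDf <- data.frame(value=c(10, 20, 30), row.names=c("g1", "g2", "g3"))
toyConvTable <- data.frame(
   from=c("g1", "g1", "g3"),
   to=c("e1", "e2", "e3"),
   stringsAsFactors=FALSE
)
# Expose the row names as a column, then join on it;
# all.x=TRUE keeps "g2", which has no mapping, with NA as target
toyDf$from <- rownames(toyDf)
toyMerged <- merge(toyDf, toyConvTable, by="from", all.x=TRUE)
toyMerged
```

In this sketch the row for `g1` is duplicated because it maps to two targets,
so any downstream aggregation has to account for repeated values.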
## Exploring the shortest conversion path between two identifiers
Because the conversion process takes into account several resources,
it might be useful to explore the path between two identifiers
which have been mapped. This can be achieved by the `exploreConvPath`
function.
```{r}
from.id <- "ILMN_1220595"
res <- convBeIds(
ids=from.id, from="Probe", from.source="GPL6885", from.org="mouse",
to="Peptide", to.source="Uniprot", to.org="human",
prefFilter=TRUE
)
res
exploreConvPath(
from.id=from.id, from="Probe", from.source="GPL6885",
to.id=res$to[1], to="Peptide", to.source="Uniprot"
)
```
The figure above shows how the `r ifelse(exists("from.id"), from.id, "XXX")`
ProbeID, targeting
the mouse NM_010552 transcript, can be associated
to the `r ifelse(exists("res"), res$to[1], "XXX")` human protein ID in Uniprot.
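The idea of a shortest path between two identifiers can be sketched with a
breadth-first search on a toy graph. This is a generic illustration in base R,
assuming an undirected graph of hypothetical identifiers; it is not the
algorithm used by `exploreConvPath()`, and `toyLinks`, `toyNeighbors()` and
`toyShortestPath()` are names made up for this example.

```{r}
# Toy undirected graph of identifiers (hypothetical names)
toyLinks <- data.frame(
   a=c("probe1", "tx1", "gene1", "gene2", "tx2", "gene1"),
   b=c("tx1", "gene1", "gene2", "tx2", "pep2", "geneX"),
   stringsAsFactors=FALSE
)
toyNeighbors <- function(node){
   unique(c(toyLinks$b[toyLinks$a == node], toyLinks$a[toyLinks$b == node]))
}
# Breadth-first search: the first time the target is reached,
# the path that led to it is one of the shortest
toyShortestPath <- function(from, to){
   previous <- setNames(NA_character_, from)  # visited node -> predecessor
   queue <- from
   while(length(queue) > 0){
      current <- queue[1]
      queue <- queue[-1]
      if(current == to){
         # Walk back through the predecessors to rebuild the path
         path <- current
         while(!is.na(previous[[path[1]]])){
            path <- c(previous[[path[1]]], path)
         }
         return(path)
      }
      for(n in setdiff(toyNeighbors(current), names(previous))){
         previous[n] <- current
         queue <- c(queue, n)
      }
   }
   NULL  # no path between the two identifiers
}
toyShortestPath("probe1", "pep2")
```

The side branch to `geneX` is visited but does not appear in the returned
path, and `NULL` is returned when the two identifiers are not connected.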
## Notes about converting from and to gene symbols
Canonical and non-canonical symbols are associated to genes.
In some cases the same symbol (canonical or not) can be associated to
several genes. This can lead to ambiguous mapping.
The strategy to apply for such mapping depends
on the aim of the user and on their knowledge about the origin of the
symbols to consider.
The complete mapping between Ensembl gene identifiers and symbols is
retrieved by using the `getBeIdSymbolTable` function.
```{r}
compMap <- getBeIdSymbolTable(
be="Gene", source="Ens_gene", organism="rat",
restricted=FALSE
)
dim(compMap)
head(compMap)
```
The canonical field indicates if the symbol is canonical for the identifier.
The direct field indicates if the symbol is directly associated to the
identifier or indirectly through a relationship with another identifier.
As an example, let's consider the "Snca" symbol in rat. As shown below, this
symbol is associated to 2 genes; it is canonical for one gene and
not for another. These 2 genes are also associated to other symbols.
```{r}
sncaEid <- compMap[which(compMap$symbol=="Snca"),]
sncaEid
compMap[which(compMap$id %in% sncaEid$id),]
```
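The same kind of check can be generalized to list every symbol shared by
several identifiers. The base R sketch below runs on a toy table whose `id`,
`symbol` and `canonical` columns mimic the fields discussed above; the
`toySymbols` table and its content are hypothetical, chosen only to keep the
example self-contained.

```{r}
toySymbols <- data.frame(
   id=c("gA", "gA", "gB", "gB", "gC"),
   symbol=c("S1", "S2", "S1", "S3", "S4"),
   canonical=c(TRUE, FALSE, FALSE, TRUE, TRUE),
   stringsAsFactors=FALSE
)
# Number of distinct identifiers associated to each symbol
idsPerSymbol <- tapply(
   toySymbols$id, toySymbols$symbol,
   function(x) length(unique(x))
)
# Symbols associated to more than one identifier are ambiguous
ambiguousSymbols <- names(idsPerSymbol)[idsPerSymbol > 1]
ambiguousSymbols
toySymbols[toySymbols$symbol %in% ambiguousSymbols, ]
```

Here only `S1` is ambiguous: it is canonical for `gA` and non-canonical for
`gB`, the same situation as the one described for "Snca".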
The `getBeIdDescription` function, described before, reports only one symbol
for each identifier. Canonical and direct symbols are prioritized.
```{r}
getBeIdDescription(
sncaEid$id,
be="Gene", source="Ens_gene", organism="rat"
)
```
The `convBeIds` function works differently in order to provide a mapping as
exhaustive as possible. If a symbol is associated to several input identifiers,
non-canonical associations with this symbol are removed if a canonical
association exists for any other identifier. This can lead to inconsistent
results, depending on the user input, as shown below.
```{r}
convBeIds(
sncaEid$id[1],
from="Gene", from.source="Ens_gene", from.org="rat",
to.source="Symbol"
)
convBeIds(
sncaEid$id[2],
from="Gene", from.source="Ens_gene", from.org="rat",
to.source="Symbol"
)
convBeIds(
sncaEid$id,
from="Gene", from.source="Ens_gene", from.org="rat",
to.source="Symbol"
)
```
In the example above, when the query is run for each identifier independently,
the association to the "Snca" symbol is reported for both.
However, when running the same query with the 2 identifiers at the same time,
the "Snca" symbol is reported only for one gene corresponding to the canonical
association. An additional filter can be used to only keep canonical
symbols:
```{r}
convBeIds(
sncaEid$id,
from="Gene", from.source="Ens_gene", from.org="rat",
to.source="Symbol",
canonical=TRUE
)
```
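The input-dependent rule described in this section can be reproduced on toy
data. The base R sketch below is one reading of that rule and not the code of
`convBeIds()`: the `toyAssoc` table, its identifiers and the
`toySymbolFilter()` helper are hypothetical.

```{r}
toyAssoc <- data.frame(
   id=c("gA", "gA", "gB", "gB"),
   symbol=c("S1", "S2", "S1", "S3"),
   canonical=c(TRUE, FALSE, FALSE, TRUE),
   stringsAsFactors=FALSE
)
toySymbolFilter <- function(ids){
   queried <- toyAssoc[toyAssoc$id %in% ids, ]
   # Symbols with at least one canonical association among the queried ids
   hasCanonical <- unique(queried$symbol[queried$canonical])
   # Drop non-canonical rows whose symbol is canonical for a queried id
   queried[queried$canonical | !queried$symbol %in% hasCanonical, ]
}
# Queried alone, gB keeps its non-canonical association with S1
toySymbolFilter("gB")
# Queried together with gA, S1 is kept for gA only
toySymbolFilter(c("gA", "gB"))
```

The result for `gB` therefore depends on which other identifiers are part of
the same query, which is the inconsistency highlighted in this section.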
Finally, as shown below, when running the query the other way,
"Snca" is only associated to the gene for which it is the canonical symbol.
```{r}
convBeIds(
"Snca",
from="Gene", from.source="Symbol", from.org="rat",
to.source="Ens_gene"
)
```
Therefore, the user should choose the function to use with care when
converting from or to gene symbols.
# An interactive dictionary: Shiny module
IDs, symbols and names can be searched without knowing the original biological
entity or probe. The results can then be converted to the context of interest.
```{r}
searched <- searchBeid("sv2A")
toTake <- which(searched$organism=="Homo sapiens")[1]
relIds <- geneIDsToAllScopes(
geneids=searched$GeneID[toTake],
source=searched$Gene_source[toTake],
organism=searched$organism[toTake]
)
```
A Shiny gadget integrating these two functions has been developed and is also
available as an RStudio addin.
```{r, eval=FALSE}
relIds <- findBeids()
```
It relies on
a Shiny module (`beidsServer()` and `beidsUI()` functions)
made to facilitate the development
of applications focused on biological entity related information.
The code below shows a minimal example of such an application.
```{r, eval=FALSE}
library(shiny)
library(BED)
library(DT)
ui <- fluidPage(
beidsUI("be"),
fluidRow(
column(
12,
tags$br(),
h3("Selected gene entities"),
DTOutput("result")
)
)
)
server <- function(input, output){
found <- beidsServer("be", toGene=TRUE, multiple=TRUE, tableHeight=250)
output$result <- renderDT({
req(found())
toRet <- found()
datatable(toRet, rownames=FALSE)
})
}
shinyApp(ui = ui, server = server)
```
# Session info
```{r, echo=FALSE, eval=TRUE}
sessionInfo()
```