Title: | Convert Gene IDs Between Each Other and Fetch Annotations from Biomart |
---|---|
Description: | Gene Symbols or Ensembl Gene IDs are converted using the Bimap interface in 'AnnotationDbi' in convertId2(), but that function is provided only as a fallback mechanism for the most common use cases in data analysis. The main function in the package is convert.bm(), which queries BioMart using the full capacity of the API provided through the 'biomaRt' package. Presets and defaults are provided for convenience, but all "marts", "filters" and "attributes" can be set by the user. The function convert.alias() converts Gene Symbols to Aliases and vice versa, and likely_symbol() attempts to determine the most likely current Gene Symbol. |
Authors: | Vidal Fey [aut, cre], Henrik Edgren [aut] |
Maintainer: | Vidal Fey <[email protected]> |
License: | GPL-3 |
Version: | 0.1.8 |
Built: | 2024-12-24 06:59:07 UTC |
Source: | CRAN |
convert.alias()
attempts to find all possible symbol-alias combinations for a given gene symbol, i.e.,
it assumes the input ID to be either an Alias or a Symbol and performs multiple queries to find all possible
counterparts. The input IDs are converted to title and upper case before querying and all possibilities are tested.
There are species presets for Human and Mouse annotations.
convert.alias(id, species = c("Human", "Mouse"), db = NULL)
id |
( |
species |
( |
db |
( |
A data.frame
with two columns:
'SYMBOL': The official gene symbol. | |
'ALIAS': All possible aliases. | |
convert.alias("TRPV4")
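The description notes that input IDs are converted to title and upper case before querying. As a rough illustration of that normalisation step only, the base R sketch below builds both spellings for an input. The helper `case_variants()` is hypothetical and not part of convertid; how the package itself generates and tests the variants is not shown in this documentation.

```r
# Hypothetical helper (not part of convertid): build the upper-case and
# title-case spellings that a case-insensitive symbol lookup might try.
case_variants <- function(id) {
  id <- as.character(id)
  upper <- toupper(id)
  # Title case: first character upper case, the rest lower case.
  title <- paste0(toupper(substr(id, 1, 1)), tolower(substring(id, 2)))
  unique(c(upper, title))
}

variants <- case_variants("trpv4")
print(variants)
```

Upper case matches the usual spelling of human symbols and title case that of mouse symbols, which is presumably why both forms are tried.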
convert.bm()
is a wrapper for get.bm()
which in turn makes use of getBM()
from the biomaRt package.
It takes a matrix or data frame with the IDs to be converted in one column or as row names as input and returns a data frame with additional
annotations after cleaning the fetched annotations and merging them with the input data frame.
convert.bm(
  dat,
  id = "ID",
  biom.data.set = c("human", "mouse"),
  biom.mart = c("ensembl", "mouse", "snp", "funcgen", "plants"),
  host = "https://www.ensembl.org",
  biom.filter = "ensembl_gene_id",
  biom.attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
  biom.cache = rappdirs::user_cache_dir("biomaRt"),
  use.cache = TRUE,
  sym.col = "hgnc_symbol",
  rm.dups = FALSE,
  verbose = FALSE
)
dat |
|
id |
|
biom.data.set |
|
biom.mart |
|
host |
|
biom.filter |
|
biom.attributes |
|
biom.cache |
|
use.cache |
( |
sym.col |
|
rm.dups |
|
verbose |
( |
Wrapped around 'get.bm'.
A data frame with the retrieved information.
Vidal Fey
## Not run: 
dat <- data.frame(ID=c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611"))
bm <- convert.bm(dat)
bm
## End(Not run)
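The description says that fetched annotations are cleaned and merged with the input data frame. The sketch below mimics that idea offline with base R only. The annotation table is a mock standing in for a BioMart result, its symbols and descriptions are placeholders rather than real annotations, and the de-duplication and join shown here are assumptions about what "cleaning" and "merging" could look like, not the package's actual code.

```r
# Input data frame with the IDs in a column named "ID".
dat <- data.frame(
  ID = c("ENSG00000111199", "ENSG00000134121"),
  stringsAsFactors = FALSE
)

# Mock annotation table standing in for a BioMart result.
# Symbols and descriptions are placeholders, not real annotations.
annot <- data.frame(
  ensembl_gene_id = c("ENSG00000111199", "ENSG00000111199", "ENSG00000134121"),
  hgnc_symbol = c("SYM_A", "SYM_A", "SYM_B"),
  description = c("placeholder A", "placeholder A", "placeholder B"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows, then left-join the annotations onto the input.
annot <- unique(annot)
merged <- merge(dat, annot, by.x = "ID", by.y = "ensembl_gene_id",
                all.x = TRUE, sort = FALSE)
print(merged)
```

Using `all.x = TRUE` keeps input rows that have no annotation, which is usually what is wanted when annotating an existing results table.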
Package: | convertid |
Type: | Package |
Initial version: | 0.1-0 |
Created: | 2021-08-18 |
License: | GPL-3 |
LazyLoad: | yes |
Vidal Fey <[email protected]>
Maintainer: Vidal Fey <[email protected]>
convertId2()
uses the Bimap interface in AnnotationDbi to extract information from
annotation packages. The function is limited to Human and Mouse annotations and is provided only as
a fallback mechanism for the most common use cases in data analysis. Please use the Biomart interface
function convert.bm()
for more flexibility.
convertId2(id, species = c("Human", "Mouse"))
id |
( |
species |
( |
A named character vector where the input IDs are the names and the query results the values.
convertId2("ENSG00000111199")
convertId2("TRPV4")
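Because the return value is a named character vector, with input IDs as names and query results as values, it can be handled with ordinary base R indexing. The sketch below uses a mock vector of that shape; the values "SYM_A" and "SYM_B" are placeholders and not real query results.

```r
# Mock result shaped like the documented return value: input IDs as names,
# query results as values. The values are placeholders.
res <- c(ENSG00000111199 = "SYM_A", ENSG00000134121 = "SYM_B")

# Look up the result for one input ID by name.
one <- res[["ENSG00000111199"]]

# Convert to a two-column data frame, e.g. for merging with other tables.
tab <- data.frame(input = names(res), result = unname(res),
                  stringsAsFactors = FALSE)
print(tab)
```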
get.bm()
is a user-friendly wrapper for getBM()
from the biomaRt package with default
settings for Human and Mouse.
It sets all needed variables and performs the query.
get.bm(
  values,
  biom.data.set = c("human", "mouse"),
  biom.mart = c("ensembl", "mouse", "snp", "funcgen", "plants"),
  host = "https://www.ensembl.org",
  biom.filter = "ensembl_gene_id",
  biom.attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
  biom.cache = rappdirs::user_cache_dir("biomaRt"),
  use.cache = TRUE,
  verbose = FALSE
)
values |
|
biom.data.set |
|
biom.mart |
|
host |
|
biom.filter |
|
biom.attributes |
|
biom.cache |
|
use.cache |
( |
verbose |
( |
A data frame with the retrieved information.
Vidal Fey
## Not run: 
val <- c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611")
bm <- get.bm(val)
bm
## End(Not run)
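For orientation, the following sketch shows roughly what a direct biomaRt query with the same defaults could look like. It is not the body of get.bm(): the function name `query_ensembl()` is hypothetical, and the dataset name "hsapiens_gene_ensembl" is an assumption about what the "human" preset corresponds to. The function is only defined here, since calling it requires the biomaRt package and network access.

```r
# Hypothetical stand-in for a BioMart query with the documented defaults.
# Defining the function does not need biomaRt; calling it does.
query_ensembl <- function(values,
                          dataset = "hsapiens_gene_ensembl",  # assumed preset
                          host = "https://www.ensembl.org",
                          filter = "ensembl_gene_id",
                          attributes = c("ensembl_gene_id", "hgnc_symbol",
                                         "description")) {
  # Connect to the Ensembl mart and select the dataset.
  mart <- biomaRt::useMart(biomart = "ensembl", dataset = dataset, host = host)
  # Fetch the requested attributes for the given filter values.
  biomaRt::getBM(attributes = attributes, filters = filter,
                 values = values, mart = mart)
}

# Example call (needs biomaRt and an internet connection):
# query_ensembl(c("ENSG00000111199", "ENSG00000134121"))
```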
likely_symbol()
downloads the latest version of the HGNC gene symbol database as a text
file and queries it to obtain symbol aliases, previous symbols and all symbols currently in use. Optionally
assuming the input ID to be an Alias, a Symbol or a Previous Symbol, it performs multiple queries and
compares the results of all possible combinations to determine a likely current Symbol.
likely_symbol(
  syms,
  alias_sym = TRUE,
  prev_sym = TRUE,
  orgnsm = "human",
  hgnc = NULL,
  hgnc_url = NULL,
  output = c("likely", "symbols", "all"),
  verbose = TRUE
)
syms |
( |
alias_sym |
( |
prev_sym |
( |
orgnsm |
( |
hgnc |
( |
hgnc_url |
( |
output |
( |
verbose |
( |
Please note that the algorithm is very slow for large input vectors.
A data.frame
with the following columns depending on the output
setting.
output="likely"
:
'likely_symbol' | |
'input_symbol' | |
output="symbols"
:
'current_symbols' | |
'likely_symbol' | |
'input_symbol' | |
'all_symbols' | |
output="all"
:
'orig_input' | |
'organism' | |
'current_symbols' | |
'likely_symbol' | |
'input_symbol' | |
'all_symbols' | |
Only fully implemented for Human for now.
## Not run: 
likely_symbol(c("ABCC4", "ACPP", "KIAA1524"))
## End(Not run)
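The general idea of checking an input against current symbols, aliases and previous symbols can be illustrated with a toy lookup table. Everything in this sketch is hypothetical: the table layout, the "|" separator, the gene names and the helper `resolve_symbol()` are placeholders, and the real function compares the results of several queries rather than performing a single lookup.

```r
# Toy lookup table with placeholder names; multiple aliases or previous
# symbols are separated by "|" in this sketch.
hgnc_toy <- data.frame(
  symbol = c("GENE1", "GENE2"),
  alias  = c("ALT1|ALT1B", ""),
  prev   = c("", "OLD2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the current symbol(s) an input could refer to.
resolve_symbol <- function(sym, tab) {
  # TRUE for each row whose "|"-separated field contains the input.
  in_field <- function(field) {
    vapply(strsplit(field, "|", fixed = TRUE),
           function(x) sym %in% x, logical(1))
  }
  hits <- tab$symbol == sym | in_field(tab$alias) | in_field(tab$prev)
  tab$symbol[hits]
}

print(resolve_symbol("OLD2", hgnc_toy))
print(resolve_symbol("ALT1B", hgnc_toy))
```

An input that matches more than one row would return several candidates, which is the situation where a further comparison step is needed to pick one likely symbol.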
todisp2()
uses Biomart by employing get.bm()
to retrieve Gene Symbols for a set of Ensembl
Gene IDs. It is mainly meant as a fast way to convert IDs in standard gene expression analysis output to Symbols,
e.g., for visualisation, which is why the input ID type is hard-coded to ENSG IDs. If Biomart is not available,
the function can fall back to convertId2()
or a user-provided data frame with corresponding ENSG IDs and
Symbols.
todisp2(ensg, lab = NULL, biomart = TRUE, verbose = FALSE)
ensg |
( |
lab |
( |
biomart |
( |
verbose |
( |
A character vector of Gene Symbols.
## Not run: 
val <- c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102", "ENSG00000171611")
sym <- todisp2(val)
sym
## End(Not run)
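The fallback to a user-provided data frame of ENSG IDs and Symbols amounts to a table lookup. The base R sketch below shows one way such a lookup could work offline. The column names `ensg` and `symbol`, the placeholder symbols, and the choice to keep the original ID when no symbol is found are all assumptions of this sketch; the layout that todisp2() expects for `lab` is not specified in this documentation.

```r
ensg <- c("ENSG00000111199", "ENSG00000134121", "ENSG00000176102")

# Hypothetical lookup table; column names and symbols are placeholders.
lab <- data.frame(
  ensg   = c("ENSG00000134121", "ENSG00000111199"),
  symbol = c("SYM_B", "SYM_A"),
  stringsAsFactors = FALSE
)

# Look up each ID in the table, preserving the input order.
sym <- lab$symbol[match(ensg, lab$ensg)]

# Keep the original ID where no symbol was found (a choice of this sketch).
missing <- is.na(sym)
sym[missing] <- ensg[missing]
print(sym)
```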