Title: | Interface to the 'g:Profiler' Toolset |
---|---|
Description: | A toolset for functional enrichment analysis and visualization, gene/protein/SNP identifier conversion and mapping orthologous genes across species via 'g:Profiler' (<https://biit.cs.ut.ee/gprofiler/>). The main tools are: (1) 'g:GOSt' - functional enrichment analysis and visualization of gene lists; (2) 'g:Convert' - gene/protein/transcript identifier conversion across various namespaces; (3) 'g:Orth' - orthology search across species; (4) 'g:SNPense' - mapping SNP rs identifiers to chromosome positions, genes and variant effects. This package is an R interface corresponding to the 2019 update of 'g:Profiler' and provides access to 'g:Profiler' for versions 'e94_eg41_p11' and higher. See the package 'gProfileR' for accessing older versions from the 'g:Profiler' toolset. |
Authors: | Liis Kolberg <[email protected]>, Uku Raudvere <[email protected]> |
Maintainer: | Liis Kolberg <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.3 |
Built: | 2024-12-21 06:31:02 UTC |
Source: | CRAN |
Interface to the g:Profiler tool g:Convert (https://biit.cs.ut.ee/gprofiler/convert) that uses the information in Ensembl databases to handle hundreds of types of identifiers for genes, proteins, transcripts, microarray probesets, etc, for many species, experimental platforms and biological databases. The input is flexible: it accepts a mixed list of IDs and recognises their types automatically. It can also serve as a service to get all genes belonging to a particular functional category.
gconvert( query, organism = "hsapiens", target = "ENSG", numeric_ns = "", mthreshold = Inf, filter_na = TRUE )
query |
character vector that can consist of mixed types of gene IDs (proteins, transcripts, microarray IDs, etc), SNP IDs, chromosomal intervals or term IDs. |
organism |
organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
target |
target namespace. |
numeric_ns |
namespace to use for fully numeric IDs (list of available namespaces). |
mthreshold |
maximum number of results per initial alias to show. Shows all by default. |
filter_na |
logical indicating whether to filter out results without a corresponding target. |
The output is a data.frame which is a table closely corresponding to the web interface output.
The result fields are further described in the vignette.
Liis Kolberg <[email protected]>, Uku Raudvere <[email protected]>
gconvert(c("POU5F1", "SOX2", "NANOG"), organism = "hsapiens", target="AFFY_HG_U133_PLUS_2")
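The naming rule for the organism argument can be sketched in a few lines of base R. The helper make_organism_id() below is hypothetical and not part of gprofiler2; it assumes a two-word scientific name and simply joins the first letter of the first word with the second word, in lower case. Names with more than two words may be handled differently by g:Profiler.

```r
# Hypothetical helper (not part of gprofiler2): build a g:Profiler-style
# organism ID from a two-word scientific name.
make_organism_id <- function(scientific_name) {
  parts <- strsplit(trimws(scientific_name), "\\s+")[[1]]
  stopifnot(length(parts) >= 2)
  # First letter of the first word + the whole second word, lower-cased.
  tolower(paste0(substr(parts[1], 1, 1), parts[2]))
}

human_id <- make_organism_id("Homo sapiens")
mouse_id <- make_organism_id("Mus musculus")
print(c(human_id, mouse_id))
```

The resulting string can then be passed as the organism argument, for example gconvert(query, organism = human_id).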
Get the TLS version for SSL
get_tls_version()
Get the HTTP User-Agent string.
get_user_agent()
Get version info of g:Profiler data sources
get_version_info(organism = "hsapiens")
organism |
organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
A named nested list that includes the versions for all the data sources (GO, KEGG, Reactome, WP, etc) at the time of the data extraction for the given organism. The versions correspond to the g:Profiler version embedded in the base_url which is also returned by this function under the name 'gprofiler_version'.
Liis Kolberg <[email protected]>
## Not run: version_info <- get_version_info(organism = "hsapiens")
Interface to the g:Profiler tool g:Orth (https://biit.cs.ut.ee/gprofiler/orth) that, given a target organism, retrieves the genes of the target organism that are similar in sequence to the source organism genes in the input.
gorth( query, source_organism = "hsapiens", target_organism = "mmusculus", numeric_ns = "", mthreshold = Inf, filter_na = TRUE )
query |
character vector of gene IDs to be translated. |
source_organism |
name of the source organism. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
target_organism |
name of the target organism. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
numeric_ns |
namespace to use for fully numeric IDs (list of available namespaces). |
mthreshold |
maximum number of ortholog names per gene to show. |
filter_na |
logical indicating whether to filter out results without a corresponding target name. |
The output is a data.frame which is a table closely corresponding to the web interface output.
The result fields are further described in the vignette.
Liis Kolberg <[email protected]>, Uku Raudvere <[email protected]>
gorth(c("Klf4","Pax5","Sox2","Nanog"), source_organism="mmusculus", target_organism="hsapiens")
Interface to the g:Profiler tool g:GOSt (https://biit.cs.ut.ee/gprofiler/gost) for functional enrichment analysis of gene lists. If the input 'query' is a list of gene vectors, results for multiple queries are returned in the same data frame, with the column 'query' indicating the corresponding query name. If 'multi_query' is selected, the result is a data frame for comparing multiple input lists, just as in the web tool.
gost( query, organism = "hsapiens", ordered_query = FALSE, multi_query = FALSE, significant = TRUE, exclude_iea = FALSE, measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05, correction_method = c("g_SCS", "bonferroni", "fdr", "false_discovery_rate", "gSCS", "analytical"), domain_scope = c("annotated", "known", "custom", "custom_annotated"), custom_bg = NULL, numeric_ns = "", sources = NULL, as_short_link = FALSE, highlight = FALSE )
query |
character vector, or a (named) list of character vectors for multiple queries, that can consist of mixed types of gene IDs (proteins, transcripts, microarray IDs, etc), SNP IDs, chromosomal intervals or term IDs. |
organism |
organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
ordered_query |
if the input gene lists are ranked, this option may be used to get GSEA-style p-values. |
multi_query |
in case of multiple gene lists, returns a comparison table of these lists. If enabled, the result data frame has columns named 'p_values', 'query_sizes', 'intersection_sizes' with vectors showing values in the order of the input queries. Set 'multi_query' to FALSE and simply input the query as a list of multiple gene vectors to get the results in a long format. |
significant |
whether all or only statistically significant results should be returned. |
exclude_iea |
exclude GO electronic annotations (IEA). |
measure_underrepresentation |
measure underrepresentation. |
evcodes |
include evidence codes in the results. Note that this can decrease performance and make the query slower. In addition, a column 'intersection' is created that contains the gene IDs that intersect between the query and the term. This parameter does not work if 'multi_query' is set to TRUE. |
user_threshold |
custom p-value threshold for significance; results with a smaller p-value are tagged as significant. Setting it higher than 0.05 is not recommended. |
correction_method |
the algorithm used for multiple testing correction, one of "gSCS" (synonyms: "analytical", "g_SCS"), "fdr" (synonyms: "false_discovery_rate"), "bonferroni". |
domain_scope |
how to define statistical domain, one of "annotated", "known", "custom" or "custom_annotated". |
custom_bg |
vector of gene names to use as a statistical background. If given, domain_scope is set to "custom" by default; if domain_scope is set to "custom_annotated", that value is used instead. |
numeric_ns |
namespace to use for fully numeric IDs (list of available namespaces). |
sources |
a vector of data sources to use. Currently, these include GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. Please see the g:GOSt web tool for the comprehensive list and details on incorporated data sources. |
as_short_link |
indicator to return results as short-link to the g:Profiler web tool. If set to TRUE, then the function returns the results URL as a character string instead of the data.frame. |
highlight |
indicator to return a TRUE-FALSE column called 'highlighted' to indicate driver terms in GO. |
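The interaction between custom_bg and domain_scope described above can be written down as a small decision rule. The function resolve_domain_scope() below is a hypothetical illustration of that rule only, not the package's internal code; how gost() treats combinations not mentioned in the argument descriptions (for example "custom" without a background) is not covered here.

```r
# Hypothetical helper, not part of gprofiler2: mirrors the documented rule
# for how a supplied custom background affects the statistical domain.
resolve_domain_scope <- function(domain_scope = "annotated", custom_bg = NULL) {
  if (is.null(custom_bg)) {
    # No background supplied: keep whatever scope was requested.
    return(domain_scope)
  }
  # Background supplied: "custom_annotated" is kept, anything else becomes "custom".
  if (identical(domain_scope, "custom_annotated")) "custom_annotated" else "custom"
}

background <- c("POU5F1", "SOX2", "NANOG")  # illustrative background genes
scope_default  <- resolve_domain_scope()
scope_with_bg  <- resolve_domain_scope(custom_bg = background)
scope_explicit <- resolve_domain_scope("custom_annotated", background)
print(c(scope_default, scope_with_bg, scope_explicit))
```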
The input gene lists are not stored in g:Profiler unless the option 'as_short_link' is set to TRUE.
A named list where 'result' contains a data.frame with the enrichment analysis results and 'meta' contains the metadata needed for the Manhattan plot. If the input consisted of several lists, the corresponding list is indicated with the variable 'query'. The 'result' data.frame is ordered first by the query name, then by the data source (such as GO:BP, GO:CC, GO:MF, REAC, etc), and then by the adjusted p-value. The columns of the resulting data frame differ depending on whether 'multi_query' is TRUE or FALSE. If 'evcodes' is set, the return value includes the columns 'evidence_codes' and 'intersection'. The latter conveys info about the intersecting genes between the corresponding query and term.
The result fields are further described in the vignette.
If 'as_short_link' is set to TRUE, then the result is a character short-link to see and share corresponding results via the g:Profiler web tool. In this case, the input gene lists will be stored in a database.
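The ordering of the 'result' table (query name, then data source, then p-value) can be reproduced on any data frame with base R's order(). The rows below are made up for illustration and only use column names mentioned in this manual; they are not real g:GOSt output.

```r
# Hypothetical rows shaped like the 'result' table; all values are invented.
res <- data.frame(
  query   = c("query_2", "query_1", "query_1", "query_1"),
  source  = c("GO:BP", "REAC", "GO:BP", "GO:BP"),
  term_id = c("term_a", "term_b", "term_c", "term_d"),
  p_value = c(0.01, 0.03, 0.04, 0.002),
  stringsAsFactors = FALSE
)

# Sort by query name, then data source, then p-value.
ordered <- res[order(res$query, res$source, res$p_value), ]
rownames(ordered) <- NULL
print(ordered)
```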
Liis Kolberg <[email protected]>, Uku Raudvere <[email protected]>
gostres <- gost(c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"))
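A sketch of the long-format multi-query usage described above: the query is passed as a named list and 'multi_query' is left at FALSE. The list names and the wrapper run_gost_long() are hypothetical; the gene IDs are taken from the examples in this manual. The call itself needs the gprofiler2 package and network access to g:Profiler, so it is only run in an interactive session.

```r
# Named list of queries; the names are hypothetical labels that are expected
# to appear in the 'query' column of the result.
queries <- list(
  pluripotency = c("POU5F1", "SOX2", "NANOG"),
  mixed_ids    = c("X:1000:1000000", "rs17396340", "GO:0005005",
                   "ENSG00000156103", "NLRP1")
)

# Hypothetical wrapper: long-format results for several queries at once.
run_gost_long <- function(q) {
  gprofiler2::gost(query = q, organism = "hsapiens", multi_query = FALSE)
}

# Only contact g:Profiler when run interactively with the package installed.
if (interactive() && requireNamespace("gprofiler2", quietly = TRUE)) {
  gostres_long <- run_gost_long(queries)
  print(head(gostres_long$result[, c("query", "source", "term_id", "p_value")]))
}
```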
This function creates a Manhattan plot out of the results from gprofiler2::gost(). The plot is very similar to the one shown in the g:GOSt web tool.
gostplot( gostres, capped = TRUE, interactive = TRUE, pal = c(`GO:MF` = "#dc3912", `GO:BP` = "#ff9900", `GO:CC` = "#109618", KEGG = "#dd4477", REAC = "#3366cc", WP = "#0099c6", TF = "#5574a6", MIRNA = "#22aa99", HPA = "#6633cc", CORUM = "#66aa00", HP = "#990099") )
gostres |
named list from gost() function (with names 'result' and 'meta') |
capped |
whether -log10(p-value) values of 16 or more are capped, just as in the web options. |
interactive |
if enabled, returns interactive plot using 'plotly'. If disabled, static 'ggplot()' object is returned. |
pal |
values mapped to relevant colors for data sources. |
The output is either a plotly object (if interactive = TRUE) or a ggplot object (if interactive = FALSE).
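The effect of capping on the plotted values can be illustrated with a short sketch. The p-values are made up, and the code mirrors only the rule stated for the 'capped' argument (values of 16 or more are shown as 16); it is not the plotting code of gostplot().

```r
# Made-up p-values for illustration.
p_values <- c(0.05, 1e-8, 1e-16, 1e-30)

# Manhattan plots show -log10(p-value) on the y-axis.
neg_log10 <- -log10(p_values)

# Cap everything at 16, as described for capped = TRUE.
capped <- pmin(neg_log10, 16)
print(capped)
```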
Liis Kolberg <[email protected]>
gostres <- gost(c("Klf4", "Pax5", "Sox2", "Nanog"), organism = "mmusculus")
gostplot(gostres)
Interface to the g:Profiler tool g:SNPense (https://biit.cs.ut.ee/gprofiler/snpense) that maps SNP rs identifiers to chromosome positions, genes and variant effects. Available only for human variants.
gsnpense(query, filter_na = TRUE)
query |
vector of SNP IDs to be translated (should start with prefix 'rs'). |
filter_na |
logical indicating whether to filter out results without a corresponding target name. |
The output is a data.frame which is a table closely corresponding to the web interface output. Columns 'ensgs' and 'gene_names' can contain lists of multiple values.
The result fields are further described in the vignette.
Liis Kolberg <[email protected]>, Uku Raudvere <[email protected]>
gsnpense(c("rs11734132", "rs7961894", "rs4305276", "rs17396340", "rs3184504"))
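Since the query is expected to consist of identifiers with the prefix 'rs', a mixed vector can be filtered before calling gsnpense(). This is a minimal sketch in base R; the input vector mixes IDs from the examples in this manual, and how gsnpense() itself treats non-rs input is not assumed here.

```r
# Mixed input: three rs identifiers and one Ensembl gene ID.
ids <- c("rs11734132", "rs7961894", "ENSG00000156103", "rs3184504")

# Keep only identifiers that start with the 'rs' prefix.
is_rs <- startsWith(ids, "rs")
snp_ids <- ids[is_rs]
print(snp_ids)
```

The filtered vector can then be passed on as gsnpense(snp_ids).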
Map vector of numeric values to Viridis color scale.
mapViridis(values, domain_min = 0, domain_max = 50, n = 256)
values |
vector of numeric values (mostly -log10(p-values)) |
domain_min |
numeric value that corresponds to the 'yellow' in the color scale |
domain_max |
numeric value that corresponds to the 'dark blue' in the color scale |
n |
number of bins to generate from the color scale |
The output is a corresponding vector of colors from the Viridis color scale with domain in range(domain_min, domain_max).
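The idea of mapping numeric values onto a binned colour scale can be sketched with base R alone. The function map_to_viridis() below is a hypothetical stand-in, not the implementation of mapViridis(): it uses grDevices::hcl.colors() for the palette, reverses it so that domain_min lands on the yellow end, and clamps values outside the domain, which is an assumption about out-of-range handling.

```r
# Hypothetical sketch of value-to-colour mapping; not the package's own code.
map_to_viridis <- function(values, domain_min = 0, domain_max = 50, n = 256) {
  # hcl.colors() runs from dark to yellow; reverse so domain_min is yellow.
  pal_colors <- rev(grDevices::hcl.colors(n, palette = "viridis"))
  # Assumption: values outside the domain are clamped to its ends.
  clamped <- pmin(pmax(values, domain_min), domain_max)
  # Rescale to bin indices 1..n.
  idx <- floor((clamped - domain_min) / (domain_max - domain_min) * (n - 1)) + 1
  pal_colors[idx]
}

cols <- map_to_viridis(c(0, 5, 25, 50, 80))
print(cols)
```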
Liis Kolberg <[email protected]>
This function highlights a list of selected terms on the Manhattan plot created with the gprofiler2::gostplot() function. The resulting plot is saved to a publication-ready image if 'filename' is specified. The plot is very similar to the one shown in the g:GOSt web tool after clicking on circles.
publish_gostplot( p, highlight_terms = NULL, filename = NULL, width = NA, height = NA )
p |
ggplot object from gostplot(gostres, interactive = FALSE) function |
highlight_terms |
vector of selected term IDs from the analysis or a (subset) data.frame that has a column called 'term_id'. No annotation is added if set to NULL. |
filename |
file name to create on disk and save the annotated plot. Filename extension should be from c("png", "pdf", "jpeg", "tiff", "bmp"). |
width |
plot width in inches. If not supplied, the size of current graphics device is used. |
height |
plot height in inches. If not supplied, the size of current graphics device is used. |
The output is a ggplot object.
Liis Kolberg <[email protected]>
gostres <- gost(c("Klf4", "Pax5", "Sox2", "Nanog"), organism = "mmusculus")
p <- gostplot(gostres, interactive = FALSE)
publish_gostplot(p, highlight_terms = c("GO:0001010", "REAC:R-MMU-8939245"))
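Because 'filename' should carry one of the listed extensions, a file name can be checked before it is passed to publish_gostplot(). The helper has_allowed_ext() is hypothetical and uses tools::file_ext() from base R; it compares the extension exactly as written, so "jpg" or upper-case variants are reported as not allowed, which may be stricter than the package itself.

```r
# Extensions listed for the 'filename' argument.
allowed_ext <- c("png", "pdf", "jpeg", "tiff", "bmp")

# Hypothetical helper: TRUE if the file name ends in one of the listed extensions.
has_allowed_ext <- function(filename) {
  tools::file_ext(filename) %in% allowed_ext
}

print(has_allowed_ext(c("gostplot.pdf", "gostplot.svg", "gostplot")))
```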
This function creates a table mainly for the results from the gost() function. However, any enrichment results data frame can be visualised in a table with this function, as long as it contains at least the columns named 'term_id' and 'p_value'.
publish_gosttable( gostres, highlight_terms = NULL, use_colors = TRUE, show_columns = c("source", "term_name", "term_size", "intersection_size"), filename = NULL, ggplot = TRUE )
gostres |
named list from gost() function (with names 'result' and 'meta') or a data frame that has columns named "term_id" and "p_value(s)". |
highlight_terms |
vector of selected term IDs from the analysis or a (subset) data.frame that has a column called 'term_id'. All data is shown if set to NULL. |
use_colors |
if enabled, the p-values are highlighted in the viridis colorscale just as in g:Profiler, otherwise the table has no background colors. |
show_columns |
names of additional columns to show besides term_id and p_value. By default the output table shows additional columns named "source", "term_name", "term_size", "intersection_size" |
filename |
file name to create on disk and save the annotated plot. Filename extension should be from c("png", "pdf", "jpeg", "tiff", "bmp"). |
ggplot |
if FALSE, then the function returns a gtable object. |
The output table is very similar to the one shown under the Manhattan plot.
The output is a ggplot object.
Liis Kolberg <[email protected]>
gostres <- gost(c("Klf4", "Pax5", "Sox2", "Nanog"), organism = "mmusculus")
publish_gosttable(gostres, highlight_terms = c("GO:0001010", "REAC:R-MMU-8939245"))
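When the input is an enrichment table from another tool rather than a gost() result, it may help to check for the required columns first. The helper has_required_columns() and the example data frame are hypothetical; the check accepts either 'p_value' or 'p_values', following the "p_value(s)" wording in the argument description.

```r
# Hypothetical check for the minimum columns named in this manual.
has_required_columns <- function(df) {
  is.data.frame(df) &&
    "term_id" %in% names(df) &&
    any(c("p_value", "p_values") %in% names(df))
}

# Invented enrichment table for illustration only.
custom_results <- data.frame(
  term_id = c("term_a", "term_b"),
  p_value = c(0.001, 0.02),
  stringsAsFactors = FALSE
)

ok_full    <- has_required_columns(custom_results)
ok_missing <- has_required_columns(custom_results["term_id"])
print(c(ok_full, ok_missing))
```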
This function returns a vector of randomly selected genes from the selected organism.
random_query(organism = "hsapiens")
organism |
organism name. Organism names are constructed by concatenating the first letter of the genus name and the species name. Example: human - 'hsapiens', mouse - 'mmusculus'. |
a character vector containing randomly selected gene IDs from the selected organism.
Liis Kolberg <[email protected]>
random_genes <- random_query()
Function to change the g:Profiler base URL. Useful for overriding the default URL (http://biit.cs.ut.ee/gprofiler) with the beta (http://biit.cs.ut.ee/gprofiler_beta) or an archived version (available starting from the version e94_eg41_p11, e.g. http://biit.cs.ut.ee/gprofiler_archive3/e94_eg41_p11).
set_base_url(url)
url |
the base URL. |
Set the TLS version. This can be useful in environments where SSL was built without TLS 1.2 support.
set_tls_version(v)
v |
version: "1.2" (default), "1.1" (fallback) |
Set the HTTP User-Agent string. Useful for overriding the default user agent for packages that depend on gprofiler2 functionality.
set_user_agent(ua, append = F)
ua |
the user agent string. |
append |
logical indicating whether to append the passed string to the default user agent string. |
Upload your own annotation data using files in the Gene Matrix Transposed file format (GMT) for functional enrichment analysis in g:GOSt. The accepted file is either a single annotations file (with the extension .gmt) or a compressed directory of multiple annotation GMT files (with the extension .zip). The GMT format is a tab-separated list of gene annotation sets where every line represents a separate gene set/functional term. The first column defines the function ID, second defines a short name/description of the function and the following columns are the list of genes related to the specific function in that row.
upload_GMT_file(gmtfile)
gmtfile |
the filepath of the GMT file to be uploaded. The file extension should be .gmt or .zip in case of multiple GMT files. If the filepath does not contain an absolute path, the filename is relative to the current working directory. |
The uploaded filename is used to define 'source' name in the g:GOSt results.
A string that denotes the ID of the uploaded custom annotations in the g:Profiler database.
After the GMT file upload this unique ID can be used as a value for the argument 'organism' in the gost() function to perform functional enrichment analysis based on these custom data.
There is no need to upload the same custom GMT file(s) again every time you want to do the enrichment analysis. The custom ID can also be used in the web tool as a token under the Custom GMT options.
Liis Kolberg <[email protected]>
## Not run: custom_id <- upload_GMT_file("path/to/file.gmt")
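The GMT layout described above (function ID, short description, then the member genes, all tab-separated, one gene set per line) can be produced from an R list with base functions. The gene sets, their IDs and descriptions below are invented for illustration, and the file is written to a temporary directory.

```r
# Hypothetical gene sets; IDs, descriptions and members are for illustration only.
gene_sets <- list(
  SET1 = list(description = "Example set one", genes = c("POU5F1", "SOX2", "NANOG")),
  SET2 = list(description = "Example set two", genes = c("KLF4", "PAX5"))
)

# One tab-separated line per set: ID, description, then the genes.
gmt_lines <- vapply(names(gene_sets), function(id) {
  s <- gene_sets[[id]]
  paste(c(id, s$description, s$genes), collapse = "\t")
}, character(1))

# Write the file with the .gmt extension expected by upload_GMT_file().
gmt_path <- file.path(tempdir(), "example_sets.gmt")
writeLines(gmt_lines, gmt_path)
print(readLines(gmt_path))
```

The path could then be passed as upload_GMT_file(gmt_path), and the returned ID used as the 'organism' argument of gost().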