Annotation enrichment analysis increases the chance of identifying relevant biological pathways in a list of genes or proteins. The post-translational enrichment, integration, and matching analysis (PEIMAN v1) software was introduced to provide a systematic framework for identifying the most probable and enriched post-translational modification (PTM) terms in a list of proteins obtained from high-throughput technologies (Nickchi, Jafari, and Kalantari 2015). PEIMAN maps a large list of proteins to PTM pathways and tests their statistical significance using a hypergeometric test. It follows the most traditional form of enrichment analysis: it takes a list of proteins selected by the user and searches for enriched PTM terms one by one. This strategy is called Singular Enrichment Analysis (SEA). Although this is a promising approach for identifying biological pathways, the quality of the list selected by the researcher can affect the final results.
To avoid this problem, we extend our enrichment framework to a wider class of enrichment analysis called Gene Set Enrichment Analysis, or GSEA (Subramanian et al. 2005). The underlying idea of GSEA is very similar to that of SEA. Instead of applying a cutoff (on either the p-value or the fold change in gene expression) to the input genes obtained from microarray experiments, a ‘no-cutoff’ strategy is used. The immediate benefits of this approach are that it reduces the bias of gene selection and allows genes with a small change in expression level to take part in the final analysis. The maximum value of the running score profile for the ranked genes in each enrichment category is then calculated and compared with random scores obtained from permutation. More details are given in (Subramanian et al. 2005). This framework can be extended to enrichment analysis of proteins. Inspired by the GSEA idea, we here introduce an R package for Protein Set Enrichment Analysis (PSEA).
The database in the PEIMAN package is updated monthly to reflect changes in UniProt. The package can be used to perform singular enrichment analysis (SEA) and visualize the results. PEIMAN can also be used to match and integrate the results of two SEA analyses (for the same species) by visualizing their common pathways. To correct for biases in SEA, we implement protein set enrichment analysis (PSEA) as a new tool for the computational community. Researchers can use this package to run PSEA and visualize the results.
We consider two example datasets to demonstrate the features of our package.
exmplData1: We use the first example dataset for singular enrichment analysis. This dataset contains two lists of human proteins randomly selected from UniProt. The first list contains 45 proteins and the second list contains 97 randomly selected proteins. Both lists belong to Homo sapiens (Human). Note: Only the first six proteins in each list are shown below.
pl1 | pl2 |
---|---|
P31946 | P17174 |
P62258 | Q9NY61 |
Q04917 | P00505 |
P61981 | Q96GS6 |
P31947 | Q5VST6 |
P27348 | Q6PCB6 |
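Because the enrichment functions take UniProt accession codes as input, it can help to check the codes before running an analysis. The following is a minimal sketch, not part of PEIMAN2: the helper name is_valid_accession is hypothetical, and the pattern is the accession-number format published by UniProt.

```r
# Hypothetical helper (not a PEIMAN2 function): check that each string
# matches the UniProt accession number format.
is_valid_accession <- function(x) {
  pattern <- paste0(
    "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
    "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
  )
  grepl(pattern, x)
}

# The twelve accession codes shown in the excerpt above
proteins <- c("P31946", "P62258", "Q04917", "P61981", "P31947", "P27348",
              "P17174", "Q9NY61", "P00505", "Q96GS6", "Q5VST6", "Q6PCB6")

valid <- is_valid_accession(proteins)
print(all(valid))          # TRUE when every code is well formed
print(proteins[!valid])    # codes that need a second look, if any
```

A check like this only validates the format of a code; it does not confirm that the accession exists in UniProt or belongs to the expected species.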
exmplData2: We use the second dataset to perform protein set enrichment analysis (PSEA). The dataset is described in (Gholizadeh et al. 2021).
UniProtAC | Score |
---|---|
P47819 | 579.6287 |
P20428 | 129.7175 |
P62982 | 2139.2700 |
P0CG51 | 2139.2700 |
P62986 | 2139.2700 |
Q63429 | 2139.2700 |
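PSEA works on the whole scored list rather than on a pre-selected subset, so the order of proteins by score matters. As an illustration only, the sketch below rebuilds the six rows shown above as a data frame and ranks them in base R; the object names exmpl and ranked are hypothetical, and how the package itself handles ranking and ties is not described here.

```r
# The six rows shown in the excerpt above
exmpl <- data.frame(
  UniProtAC = c("P47819", "P20428", "P62982", "P0CG51", "P62986", "Q63429"),
  Score     = c(579.6287, 129.7175, 2139.2700, 2139.2700, 2139.2700, 2139.2700)
)

# Order proteins from highest to lowest score
ranked <- exmpl[order(exmpl$Score, decreasing = TRUE), ]

# Tied scores share the same (minimum) rank
ranked$Rank <- rank(-ranked$Score, ties.method = "min")
print(ranked)

# Flag whether any scores are tied
print(any(duplicated(exmpl$Score)))
```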
In this section, we introduce the functions related to singular enrichment analysis (SEA) in the PEIMAN2 package. The functions in this section are divided into two parts: functions for enrichment and functions for plotting. We use exmplData1 in this part.
The runEnrichment() function can be used to run singular enrichment analysis for one list of proteins. This function takes the following inputs:
protein: a character vector of protein UniProt accession codes.
os.name: a character vector of length one with the exact taxonomy name of the species.
p.adj.method: the p-value adjustment method (optional). By default the value is set to ‘BH’. To see the list of possible values, type p.adjust.methods in the R console.
As mentioned, the taxonomy name of the species must be provided. For example, for a list of human proteins we pass os.name as ‘Homo sapiens (Human)’. The list of names is available on the UniProt website. We also include a helper function named getTaxonomyName to help find the exact taxonomy name. More on this function later.
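The p.adj.method values are the method names understood by base R's p.adjust function. The sketch below uses made-up p-values (an assumption for illustration, not output from PEIMAN2) to show what the default ‘BH’ adjustment does to a small vector.

```r
# Methods accepted by stats::p.adjust; 'BH' is one of them
print(p.adjust.methods)

# Hypothetical raw p-values, one per PTM term tested
pvals <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment for multiple testing
adjusted <- p.adjust(pvals, method = "BH")
print(adjusted)

# Adjusted p-values are never smaller than the raw ones
print(all(adjusted >= pvals))
```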
The following lines of code illustrate the steps to run SEA on exmplData1. We pass pl1 (a character vector of UniProt accession codes) to the runEnrichment function to perform SEA and save the results in enrich1.
# Load PEIMAN2 package
library(PEIMAN2)
#> Loading required package: tidyverse
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Extract dataset and assign a variable name to it
pl1 <- exmplData1$pl1
# Run SEA on the list
enrich1 <- runEnrichment(protein = pl1, os.name = 'Homo sapiens (Human)')
The function returns a data frame with the following columns:
PTM: Post-translational modification (PTM).
Freq in Uniprot: The total number of proteins with this PTM in UniProt.
Freq in List: The total number of proteins with this PTM in the list.
Sample: Number of proteins in the given list.
Population: Total number of proteins in the current version of the PEIMAN database.
pvalue: The p-value obtained from the hypergeometric test (enrichment analysis).
corrected pvalue: Adjusted p-value to correct for multiple testing.
AC: UniProt accession codes (AC) of the proteins with each PTM.
PTM | FreqinUniprot | FreqinList | Sample | Population | pvalue | corrected pvalue | AC |
---|---|---|---|---|---|---|---|
N6-(pyridoxal phosphate)lysine | 53 | 5 | 97 | 14256 | 2e-06 | 6e-05 | Q96QU6; Q4AC99; Q8N5Z0; Q8NHS2; P17174 |
Pyridoxal phosphate | 60 | 5 | 97 | 14256 | 3e-06 | 6e-05 | Q96QU6; Q4AC99; Q8N5Z0; Q8NHS2; P17174 |
Isoglutamyl cysteine thioester (Cys-Gln) | 7 | 2 | 97 | 14256 | 1e-05 | 1e-04 | P01023; A8K2U0 |
Thioester bond | 11 | 2 | 97 | 14256 | 5e-05 | 5e-04 | P01023; A8K2U0 |
S-cysteinyl cysteine | 3 | 1 | 97 | 14256 | 1e-04 | 1e-03 | P01009 |
Sulfation | 57 | 3 | 97 | 14256 | 6e-04 | 4e-03 | P05408; P08697; P05067 |
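To make the link between the count columns and the hypergeometric test concrete, the following sketch computes an upper-tail hypergeometric probability in base R from the counts in the first row of the table above. It is an illustration under stated assumptions, not the package's code: the variable names are hypothetical, and the exact tail convention used by runEnrichment is not stated here, so the result need not coincide with the pvalue column.

```r
# Counts taken from the first row of the table above
population <- 14256  # proteins in the background (Population)
annotated  <- 53     # background proteins carrying the PTM (Freq in Uniprot)
sample_n   <- 97     # proteins in the input list (Sample)
hits       <- 5      # input proteins carrying the PTM (Freq in List)

# Number of hits expected by chance alone
expected_hits <- sample_n * annotated / population
print(expected_hits)

# P(X >= hits) when sample_n proteins are drawn without replacement
# (assumed convention: the observed count is included in the tail)
p_value <- phyper(hits - 1, annotated, population - annotated, sample_n,
                  lower.tail = FALSE)
print(p_value)
```

Observing five annotated proteins when fewer than one is expected by chance is what makes the term stand out.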
Note: As mentioned, os.name is the exact taxonomy name of the species you are working with. The name must match the UniProt definition exactly. To make this name easier to find, you can pass your list of UniProt accession codes to the getTaxonomyName function. The result is the exact taxonomy name for the protein list, which you then pass to runEnrichment.
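A call along the following lines would print the taxonomy name for the first list. This is a sketch that assumes PEIMAN2 is installed and that getTaxonomyName accepts the accession vector as its first argument; the guard simply skips the call when the package is not available.

```r
# Only run when PEIMAN2 is available (assumption: the package is installed)
has_peiman <- requireNamespace("PEIMAN2", quietly = TRUE)

if (has_peiman) {
  library(PEIMAN2)
  pl1 <- exmplData1$pl1
  # Look up the exact UniProt taxonomy name for the proteins in pl1
  os_name <- getTaxonomyName(pl1)
  print(os_name)
}
```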
Similarly, we can run SEA for the second list of proteins:
# Extract dataset and assign a variable name to it
pl2 <- exmplData1$pl2
# Run SEA on the list
enrich2 <- runEnrichment(protein = pl2, os.name = 'Homo sapiens (Human)')
PTM | FreqinUniprot | FreqinList | Sample | Population | pvalue | corrected pvalue | AC |
---|---|---|---|---|---|---|---|
Nucleotide-binding | 1800 | 33 | 45 | 14256 | 0e+00 | 0.000 | O95477; Q9BZC7; Q99758; P78363; Q8WWZ7; Q8N139; Q8IZY2; O94911; Q8IUA7; Q8WWZ4; Q86UK0; Q86UQ4; Q2M3G0; Q9NP58; O75027; Q9NP78; Q9NRK6; O95342; Q09428; O60706; P33897; Q9UBJ2; P28288; O14678; P61221; Q8NE71; Q9UG63; Q9NUQ8; P45844; Q9UNQ0; Q9H172; Q9H222; Q96J66 |
Glutathionylation | 11 | 1 | 45 | 14256 | 5e-04 | 0.003 | Q9NRK6 |
Glycoprotein | 4691 | 25 | 45 | 14256 | 5e-04 | 0.003 | O95477; Q9BZC7; Q99758; P78363; Q8WWZ7; Q8N139; Q8IZY2; O94911; Q8IUA7; Q86UK0; Q2M3G0; Q9NP58; O95342; Q09428; O60706; P33897; Q9UBJ2; P28288; Q9UNQ0; Q9H172; Q9H222; Q9H221; Q8N2K0; Q0P651; Q96J66 |
N6-(pyridoxal phosphate)lysine | 53 | 2 | 45 | 14256 | 6e-04 | 0.003 | P17174; P00505 |
S-glutathionyl cysteine | 8 | 1 | 45 | 14256 | 3e-04 | 0.003 | Q9NRK6 |
Pyridoxal phosphate | 60 | 2 | 45 | 14256 | 9e-04 | 0.004 | P17174; P00505 |
The plotEnrichment function can be used to visualize singular enrichment analysis for one set of proteins, or to match, analyse, and integrate the results for two sets of proteins. For more details on matching and integration, see (Nickchi, Jafari, and Kalantari 2015). We start by plotting the results for the first list.
The result is a lollipop plot that presents the “Relative frequency” of each “PTM keyword” along with its corrected p-value on a log scale. Note that only significant PTMs are shown. The default significance level is 5 percent. One can also visualize and match the results of two enrichment analyses. For example, the results of enrich1 and enrich2 can be integrated by passing both to plotEnrichment.
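The two plotting calls described here might look as follows. This is a hedged sketch: it assumes PEIMAN2 is installed, that the two results are passed through arguments named x and y, and that the function returns a plot object that print() can render.

```r
# Only run when PEIMAN2 is available (assumption: the package is installed)
has_peiman <- requireNamespace("PEIMAN2", quietly = TRUE)

if (has_peiman) {
  library(PEIMAN2)
  enrich1 <- runEnrichment(protein = exmplData1$pl1,
                           os.name = "Homo sapiens (Human)")
  enrich2 <- runEnrichment(protein = exmplData1$pl2,
                           os.name = "Homo sapiens (Human)")

  # One list: significant PTM terms of the first enrichment result
  print(plotEnrichment(x = enrich1, sig.level = 0.05))

  # Two lists: PTM terms common to both enrichment results
  print(plotEnrichment(x = enrich1, y = enrich2, sig.level = 0.05))
}
```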
The plot presents the ‘Relative frequency’ of the PTM terms common to the two enriched lists (x and y). The color encodes the corrected p-value on a log scale. By default, a significance level of 5 percent is used to filter the results. This can be changed with the sig.level parameter.
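The matching step can be pictured as filtering each result by the significance level and keeping the PTM terms present in both. The base R sketch below illustrates that idea with a few rows copied from the two result tables above; it is not the implementation used by plotEnrichment, and the column name corrected_pvalue is a renamed stand-in for the ‘corrected pvalue’ column.

```r
# A few rows copied from the two SEA result tables above
enrich1_top <- data.frame(
  PTM = c("N6-(pyridoxal phosphate)lysine", "Pyridoxal phosphate",
          "Thioester bond", "Sulfation"),
  corrected_pvalue = c(6e-05, 6e-05, 5e-04, 4e-03)
)
enrich2_top <- data.frame(
  PTM = c("Nucleotide-binding", "Glutathionylation",
          "N6-(pyridoxal phosphate)lysine", "Pyridoxal phosphate"),
  corrected_pvalue = c(0.000, 0.003, 0.003, 0.004)
)

sig.level <- 0.05

# Keep significant terms in each result, then match them by PTM name
sig1 <- subset(enrich1_top, corrected_pvalue <= sig.level)
sig2 <- subset(enrich2_top, corrected_pvalue <= sig.level)
common <- merge(sig1, sig2, by = "PTM", suffixes = c(".x", ".y"))
print(common)
```

In this excerpt the two pyridoxal phosphate terms are the only ones shared by both lists.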
In this section, we introduce the functions for protein set enrichment analysis (PSEA). The functions in this section are divided into two parts: functions for PSEA and functions for plotting the results. We use exmplData2 in this part.
To run protein set enrichment analysis (PSEA), use the runPSEA function. This function takes the following inputs:
protein: A character vector of protein UniProt accession codes.
os.name: A character vector of length one with the exact taxonomy name of the organism.
pexponent: Enrichment weighting exponent, p. The default value is 1. For values of p < 1, one can detect incoherent patterns in a set of proteins. If one expects a small number of proteins to be coherent in a large set, then p > 1 is a good choice.
nperm: Number of permutations used to adjust for multiple testing across pathways. The default is 1000.
p.adj.method: The adjustment method to correct for multiple testing. Run p.adjust.methods to get a list of possible methods.
sig.level: The significance level used to filter pathways (applied to the adjusted p-value). The default value is 0.05.
minSize: PTM pathways with fewer proteins than minSize are excluded. The default value is one.
The result is a list with six elements. The first element is the most important: a data frame with the protein set enrichment analysis (PSEA) results. Every row corresponds to a post-translational modification (PTM) pathway, with the following columns:
pval: p-value for singular enrichment analysis.
pvaladj: Adjusted p-value.
ES: Enrichment score.
NES: Enrichment score normalized to the mean enrichment of random samples of the same size.
nMoreExtreme: Number of times a permuted sample resulted in a profile with an ES larger than abs(ES original).
size: Number of proteins in the pathway.
Enrichment: Whether the proteins in the pathway have been enriched in the list.
leadingEdge: UniProt accession codes of the leading-edge proteins that drive the enrichment.
PTM | pval | pvaladj | FreqinUniProt | FreqinList | ES | NES | nMoreExtreme | size | Enrichment | AC | leadingEdge |
---|---|---|---|---|---|---|---|---|---|---|---|
ADP-ribosylglycine | 0e+00 | 0e+00 | 4 | 4 | 0.7707317 | 1.5658312 | 281 | 4 | Significant | P62986; P62982; P0CG51; Q63429 | P62982; P0CG51; P62986; Q63429 |
Acetylation | 0e+00 | 0e+00 | 1762 | 123 | 0.7521522 | 1.1849051 | 12 | 123 | Significant | P0C1X8; P11030; P60711; P63259; Q63028; Q62847; Q62848; Q9WUC4; P31399; P29419; P21571; P15999; D3ZAF6; Q9JJW3; O08839; P0DP29; P0DP30; P0DP31; P18418; P26772; P63039; B0K020; P08081; P08082; P45592; Q91ZN1; P11240; Q63768; P10715; P62898; Q9JHL4; Q7M0E3; P62628; Q07266; P84060; P62870; P15429; P07323; P60841; P56571; B0BN94; P55053; P55051; P07483; Q62658; Q32PX7; Q99PF5; Q5XI73; Q63228; P62994; P01946; P02091; P11517; P62959; P82995; P34058; P27321; Q5XI72; P50411; Q6AXU6; Q5BK20; P11980; Q99MZ8; Q792I0; Q66HF9; P15205; Q5M7W5; P02688; B0BN72; P30904; O35763; P62775; Q71UE8; Q9JJ19; P13084; Q01205; P08461; Q920Q0; O88767; P04785; P31044; O55012; P10111; Q6J4I0; Q9R063; Q9EPC6; P02625; Q63475; P51583; Q68A21; P02401; P62982; P62859; Q6RJR6; Q9JK11; Q63945; B0BN85; P07632; Q66HL2; P28042; O35814; P13668; P37377; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q6PEC1; P11232; P62076; P62078; Q9WV97; P48500; P04692; P58775; Q63610; P09495; Q7M767; Q9Z1A5; P63045 | P62628; P31044; P37377; P45592; P11030; P02625; P29419; P62775; P21571; O88767; P31399; P02688; P08082; P62898; P63045; P62076; P11232; O35814; Q9WUC4; Q62658; Q63228; P07632; Q5XI73; B0K020; P08081; P62959 |
Cysteine sulfinic acid (-SO2H) | 0e+00 | 0e+00 | 1 | 1 | 0.9423077 | -54.1472291 | 54 | 1 | Not significant | O88767 | O88767 |
N-acetylaspartate | 0e+00 | 0e+00 | 1 | 1 | -0.9615385 | 343.1506849 | 41 | 1 | Significant | P60711 | P31044 |
N-acetylglutamate | 0e+00 | 0e+00 | 1 | 1 | -0.9663462 | -24.1065407 | 46 | 1 | Not significant | P63259 | P31044 |
N6-acetyllysine | 0e+00 | 0e+00 | 992 | 73 | 0.7226249 | 1.1430644 | 53 | 73 | Significant | P11030; Q62848; Q9WUC4; P31399; P29419; P21571; P15999; D3ZAF6; Q9JJW3; P0DP29; P0DP30; P0DP31; P18418; P26772; P63039; B0K020; P08081; P08082; P45592; P11240; P62898; Q9JHL4; Q7M0E3; P07323; P56571; Q62658; Q99PF5; Q5XI73; P62994; P01946; P62959; P82995; P34058; P27321; Q6AXU6; Q5BK20; P11980; Q99MZ8; P02688; B0BN72; P30904; O35763; P62775; Q71UE8; P13084; Q01205; P08461; O88767; P04785; P10111; Q9R063; Q63475; P51583; Q68A21; P02401; P62982; Q9JK11; Q63945; P07632; Q66HL2; P28042; O35814; P13668; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P11232; P48500; P09495; Q9Z1A5 | P45592; P11030; P29419; P62775; P21571; O88767; P31399; P02688; P08082; P62898; P11232; O35814; Q9WUC4; Q62658; P07632; Q5XI73; B0K020; P08081; P62959 |
Phosphoprotein | 0e+00 | 0e+00 | 4088 | 171 | 0.5995932 | 0.9321392 | 745 | 171 | Significant | P0C1X8; P11030; Q63028; Q62847; O08838; Q99068; Q05140; Q62848; Q9WUC4; P29419; P21571; P15999; D3ZAF6; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; O35397; P26772; P63039; P08081; P08082; P10354; P45592; Q91ZN1; P11240; P84087; Q5U2U2; Q63768; Q6AY72; P11951; P10715; P62898; Q9JHL4; Q9QXU8; Q7M0E3; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; B0BN94; P55053; P07483; Q9JIX3; Q62658; Q32PX7; Q99PF5; Q920R4; Q5XI73; P47819; Q63228; P62994; P01946; P02091; P11517; P62959; Q9Z2X5; P82995; P34058; P27321; Q5XI72; Q68FR3; P50411; Q6AXU6; Q5BK20; P07335; P11980; Q99MZ8; Q66HF9; P34926; P15205; Q5M7W5; Q63560; P30009; P02688; B0BN72; Q5FVH7; Q4KM98; Q6XVN8; Q62625; O35763; Q9EPH2; P15146; P62775; P20428; Q05982; P69682; P97603; P07936; Q9JJ19; P13084; Q63083; Q9JI85; Q01205; P08461; Q4V8B0; Q5XIL2; Q9Z0W5; Q920Q0; O88767; P04785; Q5U318; P31044; O55012; Q99MC0; P10111; Q6J4I0; Q9R063; P02625; Q812D1; Q63475; P51583; P86252; Q68A21; P62986; P02401; P62982; P62859; Q64548; Q6RJR6; Q9JK11; O35314; P10362; Q63945; B0BN85; P60881; Q9Z2P6; P07632; Q66HL2; P28042; O35814; P13668; P21818; P09951; Q63537; O70441; Q58DZ9; P37377; Q63754; P21643; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q66HC1; P62076; Q9WVA1; P48500; P04692; P58775; Q63610; P09495; P02767; P0CG51; Q63429; P63045; P20156; Q5BJU7 | P31044; P37377; P45592; P11030; P02625; P29419; Q05175; P62775; P21571; O88767; P15146; Q63754; P02688; P08082; P62898; P63045; P62076; O35814; Q9WUC4; Q62658; P86252; Q63228; P07632; Q9WVA1; Q5XI73; P08081; P62959; P09951; P60881; P84087; P10362 |
Phosphothreonine | 0e+00 | 0e+00 | 1511 | 92 | 0.5499037 | 0.8687798 | 907 | 92 | Significant | P0C1X8; Q63028; O08838; Q05140; Q62848; P15999; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; P26772; P08082; P45592; Q91ZN1; P11240; Q9JHL4; Q9QXU8; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; B0BN94; P07483; Q32PX7; Q99PF5; Q920R4; P47819; P62994; P01946; P02091; P11517; P82995; P34058; P27321; P50411; Q6AXU6; P07335; P11980; Q99MZ8; P34926; P15205; Q5M7W5; P30009; P02688; B0BN72; Q4KM98; O35763; Q9EPH2; P15146; P62775; P20428; P69682; P97603; P07936; Q9JJ19; P13084; Q63083; Q4V8B0; Q9Z0W5; Q920Q0; P31044; Q99MC0; P10111; Q6J4I0; Q812D1; Q63475; P51583; Q68A21; Q6RJR6; Q9JK11; B0BN85; P60881; Q66HL2; O35814; P09951; Q63537; Q62880; P19332; P48500; P58775; Q63610; P09495 | P31044; P45592; Q05175; P62775; P15146; P02688; P08082; O35814 |
N-acetylalanine | 0e+00 | 0e+00 | 435 | 42 | 0.7139681 | 1.1321467 | 142 | 42 | Significant | P31399; D3ZAF6; O08839; P0DP29; P0DP30; P0DP31; P26772; P45592; Q63768; Q7M0E3; P62628; Q07266; P15429; B0BN94; P55053; P07483; Q32PX7; Q5XI73; P62959; Q5XI72; P50411; Q792I0; P15205; Q5M7W5; P02688; O88767; P31044; Q9EPC6; P51583; Q68A21; Q6RJR6; B0BN85; P07632; P13668; P19332; Q6PEC1; P62078; Q9WV97; Q63610; P09495; Q7M767; Q9Z1A5 | P62628; P31044; P45592; O88767; P31399; P02688; P07632; Q5XI73; P62959; Q9WV97; Q6PEC1; P07483 |
Phosphoserine | 0e+00 | 0e+00 | 3634 | 155 | 0.5378929 | 0.8429228 | 944 | 155 | Significant | P0C1X8; Q63028; Q62847; O08838; Q99068; Q05140; Q62848; Q9WUC4; P29419; P21571; P15999; D3ZAF6; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; O35397; P63039; P08081; P08082; P10354; P45592; Q91ZN1; P84087; Q63768; P11951; Q9JHL4; Q9QXU8; Q7M0E3; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; P55053; P07483; Q62658; Q32PX7; Q99PF5; Q920R4; Q5XI73; P47819; P01946; P02091; P11517; P62959; Q9Z2X5; P82995; P34058; P27321; Q5XI72; Q68FR3; P50411; Q6AXU6; Q5BK20; P07335; P11980; Q99MZ8; Q66HF9; P34926; P15205; Q5M7W5; Q63560; P30009; P02688; B0BN72; Q5FVH7; Q4KM98; Q6XVN8; O35763; Q9EPH2; P15146; P20428; Q05982; P97603; P07936; Q9JJ19; P13084; Q63083; Q9JI85; Q01205; P08461; Q4V8B0; Q5XIL2; Q9Z0W5; Q920Q0; P04785; Q5U318; P31044; O55012; Q99MC0; P10111; Q6J4I0; Q9R063; P02625; Q812D1; Q63475; P51583; P86252; Q68A21; P62986; P02401; P62982; P62859; Q64548; Q6RJR6; Q9JK11; O35314; P10362; Q63945; B0BN85; P60881; Q9Z2P6; P07632; Q66HL2; P28042; O35814; P13668; P21818; P09951; Q63537; O70441; Q58DZ9; P37377; Q63754; P21643; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q66HC1; P62076; Q9WVA1; P48500; P04692; P58775; Q63610; P09495; P02767; P0CG51; Q63429; P20156; Q5BJU7 | P31044; P37377; P45592; P02625; P29419; Q05175; P21571; P15146; Q63754; P02688; P08082; P62076; O35814; Q9WUC4; Q62658; P86252; P07632; Q9WVA1; Q5XI73; P08081; P62959; P09951; P60881; P84087; P10362; Q9Z0W5; Q63537; P07483; P15999; Q9JHL4; D3ZAF6; P62982; P0CG51; P62986; Q63429; O08838 |
Phosphotyrosine | 0e+00 | 0e+00 | 655 | 48 | 0.7093412 | 1.1147205 | 146 | 48 | Significant | P0C1X8; P11030; P0DP29; P0DP30; P0DP31; O35783; P63039; P45592; Q5U2U2; Q63768; Q6AY72; P62898; Q9JHL4; Q62950; P47942; Q9WTP0; P15429; P07323; P55053; P07483; Q9JIX3; P01946; P82995; P34058; P07335; P11980; P34926; P15205; Q63560; P02688; B0BN72; O35763; P15146; P13084; Q9Z0W5; O88767; P51583; Q63945; Q66HL2; O35814; P09951; P37377; P19332; Q6AYZ1; Q68FR8; Q5XIF6; P04692; P58775 | P37377; P45592; P11030; O88767; P15146; P02688; P62898; O35814 |
N6-succinyllysine | 0e+00 | 0e+00 | 327 | 31 | 0.7518702 | 1.1767539 | 76 | 31 | Significant | P11030; P31399; P21571; P15999; P26772; P63039; P62898; P47942; P07323; P56571; Q62658; Q5XI73; P01946; P02091; P11517; P34058; P11980; Q99MZ8; P30904; O35763; P13084; P08461; O88767; P04785; Q9R063; P02401; P07632; P28042; P11232; P62076; P48500 | P11030; P21571; O88767; P31399; P62898; P62076; P11232; Q62658; P07632; Q5XI73 |
Methylation | 0e+00 | 0e+00 | 494 | 39 | 0.3911697 | 0.6236016 | 993 | 39 | Significant | P0C1X8; P60711; P63259; Q05140; P15999; O88778; P0DP29; P0DP30; P0DP31; P47942; Q9Z1Z3; Q32PX7; Q99PF5; P47819; P02091; P11517; P34058; Q5XI72; P11980; Q99MZ8; P15205; P02688; Q920Q0; Q63475; Q68A21; P63033; P62986; Q63945; Q66HL2; P13668; P09951; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500; Q5BJU7 | P02688; P09951; P15999; P62986; Q6P9V9; Q6AYZ1; P68370; P48500; Q5XI72 |
3’-nitrotyrosine | 2e-07 | 1e-06 | 31 | 8 | 0.5834036 | 0.9696382 | 647 | 8 | Significant | Q62950; P07335; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500 | Q6P9V9; Q6AYZ1; P68370; P48500 |
Nitration | 3e-07 | 1e-06 | 32 | 8 | 0.5834036 | 0.9644070 | 657 | 8 | Significant | Q62950; P07335; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500 | Q6P9V9; Q6AYZ1; P68370; P48500 |
N6-methyllysine | 6e-06 | 3e-05 | 57 | 9 | -0.3200000 | -0.5188542 | 25 | 9 | Not significant | P60711; P63259; P0DP29; P0DP30; P0DP31; Q99MZ8; P13668; P19332; P48500 | P31044; Q9WUC4; P0DN35; P62859; Q5PPG6; P10715; Q71UE8; O88778; Q6AXU6 |
Lipoyl | 3e-05 | 1e-04 | 3 | 2 | -0.7584541 | -2.5565333 | 59 | 2 | Not significant | Q01205; P08461 | P31044; Q63754 |
N6-lipoyllysine | 3e-05 | 1e-04 | 3 | 2 | -0.7584541 | -2.5073660 | 51 | 2 | Not significant | Q01205; P08461 | P31044; Q63754 |
N-acetylvaline | 4e-05 | 1e-04 | 14 | 4 | 0.3804878 | 0.8106329 | 801 | 4 | Significant | P55051; P02091; P11517; P10111 | P10111; P55051; P11517; P02091 |
N6,N6,N6-trimethyllysine | 5e-05 | 2e-04 | 33 | 6 | 0.5493333 | 0.9619649 | 681 | 6 | Significant | P0DP29; P0DP30; P0DP31; P11980; P62986; Q6P9V9 | P62986; Q6P9V9 |
N6-malonyllysine | 8e-05 | 3e-04 | 16 | 4 | 0.8782475 | 1.7722250 | 107 | 4 | Significant | P11030; P26772; P63039; P34058 | P11030 |
Isopeptide bond | 1e-04 | 4e-04 | 708 | 38 | 0.6661330 | 1.0479937 | 381 | 38 | Significant | Q62847; Q05175; P0DP29; P0DP30; P0DP31; P63039; B0K020; P45592; P07323; Q99PF5; Q5XI73; P27321; Q68FR3; P11980; Q66HF9; Q5M7W5; Q05982; Q71UE8; P13084; O88767; O55012; P10111; Q812D1; P62986; P62982; Q63945; B0BN85; Q66HL2; O35814; P19332; P68370; Q6P9V9; Q66HC1; P48500; P0CG51; Q63429; Q5BJP3; P63025 | P45592; Q05175; Q5BJP3; O88767; O35814; Q5XI73; B0K020 |
Omega-N-methylarginine | 2e-04 | 7e-04 | 256 | 18 | 0.4700971 | 0.7404587 | 902 | 18 | Significant | P0C1X8; Q05140; P15999; O88778; Q9Z1Z3; Q32PX7; Q99PF5; P47819; Q5XI72; P15205; P02688; Q63475; Q68A21; Q66HL2; P09951; P19332; Q6P9V9; Q5BJU7 | P02688; P09951; P15999; Q6P9V9; Q5XI72 |
Phosphatidylethanolamine amidated glycine | 3e-04 | 7e-04 | 5 | 2 | 0.6231884 | 1.9808100 | 476 | 2 | Significant | Q6XVN8; Q62625 | Q62625; Q6XVN8 |
Phosphatidylserine amidated glycine | 3e-04 | 7e-04 | 5 | 2 | 0.6231884 | 2.3410292 | 444 | 2 | Significant | Q6XVN8; Q62625 | Q62625; Q6XVN8 |
S-nitrosocysteine | 4e-04 | 1e-03 | 45 | 6 | 0.7352011 | 1.2995806 | 369 | 6 | Significant | P47942; P82995; P34058; P15205; O35763; P11232 | P11232 |
Methionine (R)-sulfoxide | 5e-04 | 1e-03 | 6 | 2 | -0.9661836 | -3.2630716 | 2 | 2 | Not significant | P60711; P63259 | P31044; P37377 |
N-acetylmethionine | 5e-04 | 1e-03 | 383 | 23 | 0.6967950 | 1.0886932 | 315 | 23 | Significant | P0C1X8; P60711; P63259; Q63028; P84060; P62870; P62994; Q6AXU6; Q5BK20; Q99MZ8; P13084; Q920Q0; P10111; Q6J4I0; P02401; P62859; Q9JK11; O35814; P37377; Q62880; P62076; P04692; P58775 | P37377; P62076; O35814 |
Oxidation | 5e-04 | 1e-03 | 23 | 4 | 0.9077364 | 1.9429355 | 63 | 4 | Significant | P60711; P63259; P10354; O88767 | O88767 |
S-nitrosylation | 6e-04 | 1e-03 | 49 | 6 | 0.7352011 | 1.2838339 | 350 | 6 | Significant | P47942; P82995; P34058; P15205; O35763; P11232 | P11232 |
5-glutamyl polyglutamate | 9e-04 | 2e-03 | 7 | 2 | 0.7053140 | 2.4700832 | 371 | 2 | Significant | P68370; Q6P9V9 | Q6P9V9; P68370 |
Tele-methylhistidine | 1e-03 | 3e-03 | 8 | 2 | -0.9661836 | -3.2001164 | 2 | 2 | Not significant | P60711; P63259 | P31044; P37377 |
ADP-ribosylation | 2e-03 | 4e-03 | 43 | 5 | 0.7465420 | 1.3703850 | 331 | 5 | Significant | P13084; P62986; P62982; P0CG51; Q63429 | P62982; P0CG51; P62986; Q63429 |
Deamidated glutamine | 3e-03 | 6e-03 | 3 | 1 | 0.9230769 | 39.3715568 | 79 | 1 | Significant | P02688 | P02688 |
Arginine amide | 5e-03 | 1e-02 | 4 | 1 | 0.5144231 | 15.5641990 | 497 | 1 | Significant | O35314 | O35314 |
Glycine amide | 5e-03 | 1e-02 | 4 | 1 | -0.7644231 | -135.1500000 | 214 | 1 | Not significant | P10354 | P31044 |
N6-(2-hydroxyisobutyryl)lysine | 7e-03 | 1e-02 | 26 | 3 | 0.9429546 | 2.1509180 | 51 | 3 | Significant | P11030; P18418; P07323 | P11030 |
N,N,N-trimethylalanine | 9e-03 | 2e-02 | 5 | 1 | -0.7115385 | 46.3192020 | 309 | 1 | Significant | Q63945 | P31044 |
Methionine sulfoxide | 1e-02 | 2e-02 | 6 | 1 | -0.7644231 | -56.1605505 | 238 | 1 | Not significant | P10354 | P31044 |
N6-methylated lysine | 1e-02 | 2e-02 | 6 | 1 | -0.8365385 | 53.6859504 | 179 | 1 | Significant | P34058 | P31044 |
Asymmetric dimethylarginine | 1e-02 | 2e-02 | 105 | 7 | 0.5311510 | 0.8974831 | 715 | 7 | Significant | Q05140; O88778; P47942; P02091; P11517; P09951; Q5BJU7 | P09951 |
N-acetylglycine | 1e-02 | 2e-02 | 17 | 2 | 0.8299358 | 3.2356209 | 194 | 2 | Significant | P10715; P62898 | P62898 |
Pyruvate | 2e-02 | 4e-02 | 8 | 1 | -0.7692308 | -105.6147272 | 237 | 1 | Not significant | P11980 | P31044 |
N-acetylserine | 2e-02 | 4e-02 | 210 | 11 | 0.8333308 | 1.3111529 | 71 | 11 | Significant | P11030; Q62847; Q91ZN1; P07323; P60841; Q99PF5; Q63228; Q9JJ19; O55012; P02625; P63045 | P11030; P02625; P63045; Q63228 |
4-carboxyglutamate | 3e-02 | 4e-02 | 9 | 1 | -0.7596154 | 142.1006289 | 257 | 1 | Significant | P02767 | P31044 |
Citrulline | 3e-02 | 4e-02 | 39 | 3 | 0.8494968 | 1.9220972 | 165 | 3 | Significant | P47819; P02688; Q812D1 | P02688 |
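The NES and nMoreExtreme columns both come from comparing the observed enrichment score with scores from permuted data. The sketch below illustrates that idea in base R on simulated scores. It is not the code behind runPSEA: the helper es_of, the simulated data, the same-sign normalization, and the (count + 1) / (nperm + 1) p-value are assumptions borrowed from common GSEA practice.

```r
set.seed(42)

# Enrichment score: largest deviation of the weighted running sum from zero.
# ranked_scores must already be sorted in decreasing order.
es_of <- function(ranked_scores, in_set, p = 1) {
  hit  <- ifelse(in_set, abs(ranked_scores)^p, 0)
  walk <- cumsum(hit / sum(hit) - (!in_set) / sum(!in_set))
  walk[which.max(abs(walk))]
}

# Hypothetical data: 50 ranked proteins, 5 set members near the top
n      <- 50
scores <- sort(rexp(n), decreasing = TRUE)
in_set <- seq_len(n) %in% c(1, 2, 4, 7, 9)

es_obs <- es_of(scores, in_set)

# Null distribution: shuffle set membership, keep the set size fixed
nperm   <- 1000
perm_es <- replicate(nperm, es_of(scores, sample(in_set)))

# Normalize by the mean permuted ES that has the same sign as the observed ES
same_sign <- perm_es[sign(perm_es) == sign(es_obs)]
nes <- es_obs / mean(abs(same_sign))

# Permutations at least as extreme as the observed score
n_more_extreme <- sum(abs(same_sign) >= abs(es_obs))
p_perm <- (n_more_extreme + 1) / (nperm + 1)

print(c(ES = es_obs, NES = nes, nMoreExtreme = n_more_extreme, p = p_perm))
```

Shuffling set membership keeps the set size constant, which matches the description of NES as a normalization against random samples of the same size.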
We now introduce the plotting features for protein set enrichment analysis. Two functions are included to visualize the PSEA results returned by the runPSEA function. The first plot is generated by the plotPSEA function and shows the Normalized Enrichment Score (NES) for each PTM pathway. The number of pathways drawn can be restricted by adjusting the sig.level parameter (the default value is 0.05). The color of each pathway indicates whether it is enriched or not.
The second plot is generated by the plotRunningScore function, which draws the running enrichment score for each PTM.
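The quantity drawn in a running score plot can be sketched in a few lines of base R. The example below follows the weighted running sum described by Subramanian et al. (2005), using only the six exmplData2 rows shown earlier and the ADP-ribosylglycine members listed in the results table. It is an illustration, not the code behind plotRunningScore; the function name running_score is hypothetical, and because only six proteins are used the score differs from the ES reported for the full dataset.

```r
# Running enrichment score along the ranked list.
# scores: named numeric vector (names are accession codes)
# set:    accession codes annotated with the PTM of interest
# p:      weighting exponent (the role played by pexponent)
running_score <- function(scores, set, p = 1) {
  ranked <- sort(scores, decreasing = TRUE)
  in_set <- names(ranked) %in% set
  stopifnot(any(in_set), !all(in_set))
  hit  <- ifelse(in_set, abs(ranked)^p, 0)
  # Step up at set members (weighted by score), step down otherwise
  walk <- cumsum(hit / sum(hit) - (!in_set) / sum(!in_set))
  setNames(walk, names(ranked))
}

# The six rows of exmplData2 shown earlier
scores <- c(P47819 = 579.6287, P20428 = 129.7175, P62982 = 2139.2700,
            P0CG51 = 2139.2700, P62986 = 2139.2700, Q63429 = 2139.2700)

# Members of the ADP-ribosylglycine pathway from the results table
ptm_set <- c("P62986", "P62982", "P0CG51", "Q63429")

walk <- running_score(scores, ptm_set, p = 1)
es   <- walk[which.max(abs(walk))]  # largest deviation from zero
print(round(walk, 3))
print(es)

# Base graphics version of a running score plot
plot(seq_along(walk), walk, type = "s",
     xlab = "Rank in protein list", ylab = "Running enrichment score")
abline(h = 0, lty = 2)
```

The walk rises while set members are encountered and falls afterwards; the proteins up to the peak correspond to what the results table calls the leading edge.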
In addition to the features and extensions introduced since the previous version, the results from PEIMAN can also be used in mass spectrometry search tools. The enriched PTM terms found by the runPSEA function in the previous step can be looked up in a subset of the protein modification database. The psea2mass function takes the PSEA results and a significance level (the default value is 0.05) and returns the protein modifications of the statistically significant pathways for later searches in mass spectrometry tools. For example, continuing with the PSEA results for exmplData2, we call the psea2mass function as follows:
MS <- psea2mass(x = psea_res, sig.level = 0.05)
MS
#> MOD_ID name
#> 1 MOD:00085 N6-methyl-L-lysine
#> 2 MOD:00322 1'-methyl-L-histidine
#> 3 MOD:00051 N-acetyl-L-aspartic acid
#> 4 MOD:00053 N-acetyl-L-glutamic acid
#> def
#> 1 "converts an L-lysine residue to N6-methyl-L-lysine." [ChEBI:17604, DeltaMass:165, PubMed:11875433, PubMed:3926756, RESID:AA0076, Unimod:34#K]
#> 2 "converts an L-histidine residue to tele-methyl-L-histidine." [PubMed:10601317, PubMed:11474090, PubMed:11875433, PubMed:6692818, PubMed:8076, PubMed:8645219, RESID:AA0317]
#> 3 "converts an L-aspartic acid residue to N-acetyl-L-aspartic acid." [ChEBI:21547, PubMed:1560020, PubMed:2395459, RESID:AA0042]
#> 4 "converts an L-glutamic acid residue to N-acetyl-L-glutamic acid." [ChEBI:17533, PubMed:6725286, RESID:AA0044]
#> FreqinList
#> 1 9
#> 2 2
#> 3 1
#> 4 1
Note that the results generated by the runEnrichment function can likewise be passed to the sea2mass function.