Comprehending biological pathways stands as a pivotal endeavor in the realm of life sciences. However, merely possessing visual representations such as PNG or SVG files is insufficient for comprehensive analysis. The WayFindR package is a robust tool meticulously crafted for both biologists and bioinformaticians. Its primary objective is streamlining the intricate task of dissecting cellular pathways. Through seamless integration of WikiPathways (a well-known repository for biological pathways) and R’s igraph package, WayFindR bestows users with the capability to transform complex pathway architectures into actionable insights.
From a mathematical perspective, control theory asserts that negative feedback is essential for effective regulation and control of dynamic systems. Negative feedback mechanisms serve as stabilizing forces, ensuring that deviations from desired states are minimized and the system returns to equilibrium. In the context of biological pathways, where intricate networks of molecular interactions govern cellular processes, the concept of negative feedback is paramount.
Biologists rely on understanding the dynamics of biological pathways to elucidate mechanisms underlying disease, identify potential drug targets, and uncover regulatory networks. Negative feedback loops within these pathways play crucial roles in maintaining cellular homeostasis, modulating responses to external stimuli, and preventing excessive or aberrant signaling.
By enabling the conversion of pathway structures into computationally tractable formats and providing tools for analysis, WayFindR facilitates the identification and characterization of negative feedback loops within pathways. This capability is invaluable for deciphering the regulatory logic governing cellular processes and designing interventions to modulate pathway activity.
In essence, by bridging the gap between biological knowledge and mathematical principles, WayFindR empowers researchers to navigate the intricate landscapes of cellular pathways with precision and insight, ultimately advancing our understanding of complex biological systems and paving the way for innovative therapeutic strategies.
The WayFindR package is available from CRAN:
We will explore the functionality of WayFindR using one example file. You can obtain a GPML file by visiting https://www.wikipathways.org/ and selecting a pathway that piques your interest. We previously downloaded the pathway named “Factors and pathways influencing insulin-like growth factor (IGF1)-Akt signaling (WP3850),” which can be accessed directly at https://www.wikipathways.org/pathways/WP3850.html. We included this GPML file in the package, and you can find it as a system file.
Functions in WayFindR
rely on the XML
R
package to parse GPML files. (GPML is a “dialect” of the extensible
markup language, XML.) You can simly supply the file name to the
functions, but they also work is you explicitly parse them yourself:
We are going to illustrate the use of functions that allow users to
explore the various types of entities that are present in GPML files.
These are useful for understanding both how GPML files are defined and
how WayFindR
works. You do not need to use them separately
on every file; if you just want to get to the resulting
igraph
, you can skip ahead to the section on “Converting
pathways into igraph objects”.
A user can generate a matrix with information about edges from the provided GPML file:
## [1] "matrix" "array"
## [1] 46 3
The matrix comprises 46 rows (representing edges) and 3 columns: Source, Target, and MIM (Molecular Interaction Maps). Each edge is identified by a unique ID, such as id2257fd8. These edges are directional, indicating they originate from one node and terminate at another. The Source and Target columns contain labels corresponding to these nodes. The MIM column illustrates the type of interaction occurring between the nodes.
## Source Target MIM
## id2257fd8 "c0520" "b2305" "mim-inhibition"
## id25882942 "a531f" "b3910" "Arrow"
## id25c739ca "f5259" "ceb96" "Arrow"
## id2848101a "cc7e6" "fea9d" "Arrow"
## id295ae5fe "a8bb6" "b687e" "Arrow"
## id2e105012 "fea9d" "cd260" "Arrow"
## Source Target MIM
## idf7bb43f7 "f2f0d" "a2d31" "mim-inhibition"
## idfc5a5200 "f58ce" "d3cde" "Arrow"
## idfea9f3db "ceb96" "a2d31" "Arrow"
## idfed646d2 "ba0fb" "fca28" "Arrow"
## id2d5bc3d8 "b0088" "b2d78" "mim-inhibition"
## id70c7e487 "b62b2" "b3356" "mim-inhibition"
Likewise, users can acquire a matrix of nodes in a similar manner.
## [1] "matrix" "array"
## [1] 52 3
In the selected pathway, there are a total of 52 nodes. The resulting matrix consists of three columns: GraphId, label, and Type. GraphId correlates to a distinct node ID extracted from the GPML file, while label corresponds to the name of a gene, protein, group, complex or biological process. The Type column contains information regarding the type of node.
## GraphId label Type
## a169c "a169c" "GSK3B" "GeneProduct"
## a2d31 "a2d31" "PI3K" "GeneProduct"
## a4cc9 "a4cc9" "SMAD3" "GeneProduct"
## a531f "a531f" "4EPB" "GeneProduct"
## a5550 "a5550" "EIF2B2" "GeneProduct"
## a8bb6 "a8bb6" "N-WASP" "GeneProduct"
## GraphId label Type
## b3910 "b3910" "Protein synthesis" "Pathway"
## c2a5c "c2a5c" "MTOR" "GeneProduct"
## f5051 "f5051" "MLST8" "GeneProduct"
## e1ccf "e1ccf" "MAPKAP1" "GeneProduct"
## c6e5f "c6e5f" "RICTOR" "GeneProduct"
## a3b41 "a3b41" "MTOR" "GeneProduct"
We’ve already extracted all edges (Interactions) and vertices
(DataNodes) from a GPML file. Another significant element within a GPML
file is referred to as a “Group.” In some instances, Groups denote a
complex, often associated to a separate DataNode
labeled
with the attribute Type="Complex"
. In other cases, Groups
seem to represent genes with interchangeable roles within the pathway.
For instance, within the IGF1-AKT pathway, there is an anonymous group
featuring SMAD2 and SMAD3.
## [1] "list"
## [1] "nodes" "edges"
## GraphId label Type
## b5c6b b5c6b mTORC2 complex Complex
## b3356 b3356 mTORC1 complex Complex
## d51fd d51fd Group1 Complex
## dae36 dae36 Group2 Group
## Source Target MIM
## ec1 a4cc9 d51fd contained
## ec2 b3463 d51fd contained
## ec3 bb118 b3356 contained
## ec4 bd3fb dae36 contained
## ec5 ca5fb b3356 contained
## ec6 efb74 dae36 contained
## ec7 f8fb6 b3356 contained
## ec8 fad48 b3356 contained
## ec9 c2a5c b3356 contained
## ec10 f5051 b5c6b contained
## ec11 e1ccf b5c6b contained
## ec12 c6e5f b5c6b contained
## ec13 a3b41 b5c6b contained
The entry “contained” in the MIM column does not come from the MIM
standard, the WayFindR
package has invented this edge type
to be able to relate a group to its constituent parts.
The Anchor, which is the fourth and final “semantic” structure
outlined in the GPML specification, serves a specific purpose. It is
used to designate edges (or interactions, or arrows) that act as targets
for other edges, instead of the typical nodes (or DataNodes, or
vertices). To handle this scenario, WayFindR
modifies the
target arrow, originally represented as A -> B, into a two-step
arrow, A -> EDGE -> B. Subsequently, we transform the
unconventional arrow into a simpler, more standardized representation
that now points to the newly introduced EDGE node.
## [1] "list"
## GraphId label Type
## 1 b2d78 Anchor1 EDGE
## Source Target MIM
## 1 b2305 b2d78 Source
## 2 b2d78 cd260 Arrow
The core feature of WayFindR is its ability to convert pathways from WikiPathways GPML format into igraph objects. In the preocess, we also define “standard” colors, line types, and shapes to display edges and nodes.
data(edgeColors)
data(edgeTypes)
data(nodeColors)
data(nodeShapes)
if (requireNamespace("Polychrome")) {
opar <- par(mfrow = c(2,1))
Polychrome::swatch(edgeColors, main = "Edge Types")
Polychrome::swatch(nodeColors, main = "Node Types")
} else {
opar <- par(mfrow = c(1,2))
plot(0,0, type = "n", xlab="", ylab = "", main = "Edges")
legend("center", legend = names(edgeColors), lwd = 3,
col = edgeColors, lty = edgeTypes)
num <- c(rectangle = 15, circle = 16)
plot(0,0, type = "n", xlab="", ylab = "", main = "Nodes")
legend("center", legend = names(nodeColors), cex = 1.5,
col = nodeColors, pch = num[nodeShapes])
}
## Loading required namespace: Polychrome
The main function collects all the entities from the GPML file, adds
colors and styles, and produces an igraph
.
To visualize and analyze our graph, we can apply the graphopt (or any of the many other) layout algorithm(s) from the igraph package:
Once the pathway is converted, WayFindR provides a suite of tools for in-depth analysis. Let’s explore how to find cycles within the pathway graph (if they exist).
## [1] 5
## [[1]]
## dbe54 c0520 b2305 c7b3c b3356 ef7f5 ceb96 a2d31
## 2 27 15 8 18 52 30 23 2
##
## [[2]]
## dbe54 c0520 b2305 b3356 ef7f5 ceb96 a2d31
## 2 27 15 8 52 30 23 2
##
## [[3]]
## c7b3c b2305
## 8 18 8
##
## [[4]]
## c7b3c b3356 d51fd c0520 b2305
## 8 18 52 53 15 8
##
## [[5]]
## b3356 d51fd c0520 b2305
## 8 52 53 15 8
The output consists of a list of cycles extracted from a directed graph. Each cycle is represented as a sequence of vertices, where each vertex is indicated by its numerical position within the graph’s vertex list. These cycles denote closed loops of vertices in the graph, signifying paths that both start and end at the same vertex. The WP3850 pathway has 5 cycles.
Furthermore, within WayFindR, there exists a function named
“interpretCycle
” designed to interpret a cycle within the
context of the associated graph. This function takes a cycle and the
graph it belongs to as inputs. Its purpose is to extract pertinent
details from the cycle, including edge ids, arrow types (indicating the
edge direction), and labels associated with genes.
## [[1]]
## genes arrows
## 1 PI3K mim-stimulation
## 2 PDK1 mim-stimulation
## 3 AKT1 mim-inhibition
## 4 FoxO mim-stimulation
## 5 MAFbx mim-inhibition
## 6 mTORC1 complex mim-stimulation
## 7 S6K mim-inhibition
## 8 IRS1 mim-stimulation
##
## [[2]]
## genes arrows
## 1 PI3K mim-stimulation
## 2 PDK1 mim-stimulation
## 3 AKT1 mim-inhibition
## 4 FoxO mim-inhibition
## 5 mTORC1 complex mim-stimulation
## 6 S6K mim-inhibition
## 7 IRS1 mim-stimulation
##
## [[3]]
## genes arrows
## 1 FoxO mim-stimulation
## 2 MAFbx mim-stimulation
##
## [[4]]
## genes arrows
## 1 FoxO mim-stimulation
## 2 MAFbx mim-inhibition
## 3 mTORC1 complex mim-inhibition
## 4 Group1 mim-inhibition
## 5 AKT1 mim-inhibition
##
## [[5]]
## genes arrows
## 1 FoxO mim-inhibition
## 2 mTORC1 complex mim-inhibition
## 3 Group1 mim-inhibition
## 4 AKT1 mim-inhibition
Notice that the cycles in the chosen pathway include only inhibition and stimulation processes.
Finally, we can generate a subgraph from the original graph, containing only the vertices that participate in the specified cycles.
We depict the section of the graph that impacts cycles within the IGF1-AKT wikiPathway.
Here, Group1 corresponds to the the pair SMAD2, SMAD3.