Graph metrics are quantitative measures of the structural properties of
pathway graphs. They help characterize the topology of biological
networks. By integrating WikiPathways with R’s igraph package, WayFindR
provides a suite of functions that enable researchers to compute graph
metrics on biological pathways. In this vignette, we demonstrate how to
compute some of these metrics on the pathway named “Factors and pathways
influencing insulin-like growth factor (IGF1)-Akt signaling (WP3850)”,
accessible at https://www.wikipathways.org/pathways/WP3850.html. The
GPML file for this pathway is included in the package as a system file.
First, we load the required libraries, WayFindR and igraph.
Now we are ready to load our GPML file and convert it into an igraph object:
xmlfile <- system.file("pathways/WP3850.gpml", package = "WayFindR")
G <- GPMLtoIgraph(xmlfile)
class(G)
## [1] "igraph"
After obtaining an igraph object, we can use functions from the igraph
package to compute its structural properties.
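Users are free to choose which metrics to calculate for their own research purposes. Because our focus is the exploration of cycles in graphs, we concentrate on a selection of global metrics: the numbers of vertices, edges, and negative (inhibition) edges; the presence of loops, multiple edges, and an Eulerian path or cycle; the number of connected components; density, diameter, radius, and girth; the triangle count; global efficiency and mean distance; the clique number; and reciprocity.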
We refer readers to the igraph package tutorial for more detailed
explanations of these metrics.
Next, let’s create a table summarizing all the metrics of interest.
# Calculate metrics
metrics <- data.frame(nVertices = length(V(G)),
                      nEdges = length(E(G)),
                      nNegative = sum(edge_attr(G, "MIM") == "mim-inhibition"),
                      hasLoop = any_loop(G),
                      hasMultiple = any_multiple(G),
                      hasEuler = has_eulerian_cycle(G) | has_eulerian_path(G),
                      nComponents = count_components(G),
                      density = edge_density(G),
                      diameter = diameter(G),
                      radius = radius(G),
                      girth = ifelse(is.null(girth(G)), NA, girth(G)$girth),
                      nTriangles = sum(count_triangles(G)),
                      efficiency = global_efficiency(G),
                      meanDistance = mean_distance(G),
                      cliques = clique_num(G),
                      reciprocity = reciprocity(G))
## Warning in clique_num(G): At
## vendor/cigraph/src/cliques/maximal_cliques_template.h:219 : Edge directions are
## ignored for maximal clique calculation.
## nVertices nEdges nNegative hasLoop hasMultiple hasEuler nComponents
## 1 55 61 14 FALSE FALSE FALSE 1
## density diameter radius girth nTriangles efficiency meanDistance cliques
## 1 0.02053872 10 6 2 5 0.08133571 4.666256 3
## reciprocity
## 1 0.03278689
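Several of the reported values can be cross-checked by hand from the vertex and edge counts. The sketch below uses only base R and the numbers printed in the table. It assumes igraph's default definitions for a directed graph without self-loops: density is the number of edges divided by the number of ordered vertex pairs, and reciprocity is the fraction of edges whose reverse edge is also present. The variable names are illustrative and not part of WayFindR.

```r
# Counts copied from the metrics table
nVertices <- 55
nEdges <- 61

# Density of a directed graph without self-loops:
# edges divided by the number of ordered vertex pairs
density <- nEdges / (nVertices * (nVertices - 1))

# Mean total degree: every edge contributes to two vertex degrees
meanDegree <- 2 * nEdges / nVertices

# A reciprocity of 2/61 would correspond to a single pair of
# mutually connected nodes (two reciprocated edges out of 61)
reciprocityOnePair <- 2 / nEdges

round(c(density = density,
        meanDegree = meanDegree,
        reciprocity = reciprocityOnePair), 4)
```

Under these assumptions, the density and reciprocity agree with the table, and the mean degree agrees with the mean reported in the degree summary later in this vignette.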
We can find cycles and analyze the cycle subgraph (i.e., the subgraph
defined by including only the nodes that are present in at least one
cycle). Here, nCyVert is the number of vertices in the cycle subgraph,
nCyEdge is the number of edges in the cycle subgraph, and nCyNeg is the
number of edges in the cycle subgraph with the attribute “MIM” equal to
“mim-inhibition”. You can visually confirm the counts in the plot below.
## [1] 5
S <- cycleSubgraph(G, cy)
cymetrics <- data.frame(nCycles = length(cy),
nCyVert = length(V(S)),
nCyEdge = length(E(S)),
nCyNeg = sum(edge_attr(S, "MIM") == "mim-inhibition"))
cymetrics
## nCycles nCyVert nCyEdge nCyNeg
## 1 5 9 12 6
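To make the idea of a cycle subgraph concrete, here is a minimal sketch on a hypothetical toy graph that uses only igraph. It is not the WayFindR implementation: instead of enumerating cycles, it swaps in a different technique and marks a vertex as lying on a directed cycle when its strongly connected component has more than one member. The toy edge list and the names `toy`, `onCycle`, `S_toy`, and `toyMetrics` are assumptions made for illustration.

```r
library(igraph)

# Hypothetical toy pathway: A -> B -> C -> A forms a cycle; D and E hang off it
edges <- data.frame(
  from = c("A", "B", "C", "C", "D"),
  to   = c("B", "C", "A", "D", "E"),
  MIM  = c("mim-stimulation", "mim-inhibition", "mim-stimulation",
           "mim-inhibition", "mim-stimulation")
)
toy <- graph_from_data_frame(edges, directed = TRUE)

# A vertex lies on a directed cycle when its strongly connected
# component has more than one member (self-loops aside)
scc <- components(toy, mode = "strong")
onCycle <- scc$csize[scc$membership] > 1

# Induced subgraph on the cycle vertices, summarized with the same counts
S_toy <- induced_subgraph(toy, V(toy)[onCycle])
toyMetrics <- data.frame(
  nCyVert = vcount(S_toy),
  nCyEdge = ecount(S_toy),
  nCyNeg  = sum(edge_attr(S_toy, "MIM") == "mim-inhibition")
)
toyMetrics
```

One caveat of this shortcut: an induced subgraph also keeps any edge that links two different cycles without itself lying on a cycle, so its edge count may differ from one built from an explicit list of cycles.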
Hubs
In addition to numerous “global” graph metrics, the igraph package
includes tools to compute numerous “local” metrics, which describe the
properties of individual nodes or edges. In many networks (which, like
pathways, can be represented by mathematical graphs), highly connected
nodes are often viewed as “hubs” that play a more important role. The
simplest such metric is the “degree”, which counts the number of edges
connected to the node. (In directed graphs, which we use to represent
pathways, one can talk about both inbound and outbound edges and
degrees.)
Here we compute the total degree of each node, then show a summary of the degrees and the six nodes with the largest values.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.218 3.000 12.000
## ceb96 b5c6b d51fd c0520 b2305 b3356
## 4 5 5 6 7 12
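The code that produced this output is not shown, so the following is only a sketch of how such a degree summary could be obtained with igraph, run on a hypothetical toy graph rather than on the pathway. The graph and the names `toy` and `deg` are assumptions made for illustration.

```r
library(igraph)

# Hypothetical toy graph: "hub" receives three edges and sends one
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "c", "hub", "d"),
             to   = c("hub", "hub", "hub", "d", "e")),
  directed = TRUE
)

# Total degree counts inbound plus outbound edges at each node
deg <- degree(toy, mode = "all")
summary(deg)

# The nodes with the largest degrees, in increasing order
tail(sort(deg), 3)

# In- and out-degree can also be inspected separately
degree(toy, mode = "in")["hub"]
degree(toy, mode = "out")["hub"]
```

Applying the same calls to the pathway graph `G` should give a named vector of degrees indexed by the node identifiers.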
We see that the largest degree is equal to 12. To find out which node that is, we peek into the graph.
## [1] "mTORC1 complex"
So, the highest degree belongs to the group representing the mTORC1 complex. We know that the in-degree for this graph node is artificially inflated by the “contained” arrows defining the group members. So, it may be worth exploring how many such arrows there are, and how many are actual interactions.
Here are the genes that are the source of inbound arrows.
## $b3356
## + 9/55 vertices, named, from e40d773:
## [1] b2305 b62b2 bb118 c7b3c ca5fb f8fb6 fad48 fb887 c2a5c
By default, we only see the cryptic alphanumeric identifiers. By extracting the IDs, we can find the gene names.
## [1] "FoxO " "AMPK" "MLST8" "MAFbx" "AKT1S1"
## [6] "RPTOR" "DEPTOR" "amino acids" "MTOR"
Knowing the IDs, we can also determine the edge type.
## [1] "mim-inhibition" "mim-inhibition" "contained" "mim-inhibition"
## [5] "contained" "contained" "contained" "mim-stimulation"
## [9] "contained"
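The lookups described in this section (finding the hub, listing the sources of its inbound edges, translating identifiers into labels, and reading the edge types) can be sketched with igraph on a hypothetical toy graph. The names `w`, `A`, and `ids` mirror those used in the plotting code of this vignette, but their definitions here are assumptions, as is the use of a vertex attribute called `label` to hold the readable names.

```r
library(igraph)

# Hypothetical toy graph with a "label" vertex attribute and a "MIM" edge attribute
edges <- data.frame(
  from = c("n1", "n2", "n3", "n4"),
  to   = c("n4", "n4", "n4", "n5"),
  MIM  = c("mim-inhibition", "contained", "contained", "mim-stimulation")
)
nodes <- data.frame(
  name  = c("n1", "n2", "n3", "n4", "n5"),
  label = c("GeneA", "GeneB", "GeneC", "Complex", "GeneD")
)
toy <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# Locate the node with the largest total degree and read its label
deg <- degree(toy, mode = "all")
w <- which(deg == max(deg))
V(toy)$label[w]

# Sources of inbound edges, first as identifiers and then as labels
A <- adjacent_vertices(toy, w, mode = "in")
ids <- as_ids(A[[1]])
V(toy)[ids]$label

# Types of the inbound edges; "contained" edges only mark group membership
inEdges <- incident(toy, w, mode = "in")
inTypes <- inEdges$MIM
table(inTypes)
```

Tabulating the edge types separates the "contained" arrows, which only record group membership, from the arrows that represent interactions.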
Now we can plot the subgraph that connects directly to the mTORC1 complex.
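# Nodes reached by outbound edges from the hub
B <- adjacent_vertices(G, w, "out")
# Keep the hub, its inbound sources (ids), and its outbound targets
subg <- induced_subgraph(G, c(names(w), ids, as_ids(B[[1]])))
plot(subg, lwd=3)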
We see that five of the inbound arrows are for the genes that are “contained” in the complex, leaving seven arrows (four inbound and three outbound) that have biological meaning for the pathway.
We saw above that there is another node in this pathway that has 7 connected edges. It may be worth looking more closely at that node.
## [1] "FoxO "
This time, we get an actual gene, FoxO.
# All neighbors of the node, regardless of edge direction
B <- adjacent_vertices(G, w, "all")
subg <- induced_subgraph(G, c(names(w), as_ids(B[[1]])))
plot(subg, lwd=3)