Node-level network properties are properties that pertain to each individual node in the network graph.
Some are local properties, meaning that their value for a given node depends only on a subset of the nodes in the network. One example is the network degree of a given node, which represents the number of other nodes that are directly joined to the given node by an edge connection.
Other properties are global properties, meaning that their value for a given node depends on all of the nodes in the network. An example is the authority score of a node, which is computed using the entire graph adjacency matrix (if we denote this matrix by A, then the principal eigenvector of A^T A, where A^T is the transpose of A, represents the authority scores of the network nodes).
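The distinction between local and global properties can be illustrated with a minimal base R sketch. The four-node graph below is a made-up example, not part of the toy data used later, and rescaling the scores so that the largest equals 1 is an assumed convention rather than something stated here about NAIR.

```r
# Hypothetical toy graph: a triangle (nodes 1, 2, 3) plus node 4 attached to node 3
A <- matrix(0, nrow = 4, ncol = 4)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1  # make the adjacency matrix symmetric (undirected graph)

# Local property: the degree of a node depends only on its own row of A
degree <- rowSums(A)

# Global property: authority scores are the principal eigenvector of t(A) %*% A
ev <- eigen(crossprod(A), symmetric = TRUE)  # crossprod(A) equals t(A) %*% A
authority <- abs(ev$vectors[, 1])            # the eigenvector's sign is arbitrary
authority <- authority / max(authority)      # assumed scaling: largest score is 1

data.frame(node = 1:4, degree = degree, authority = authority)
```

Adding or removing an edge anywhere in the graph can change every authority score, whereas it changes the degree of only the two nodes it joins.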
Node-level network properties can be computed when calling buildRepSeqNetwork() or its alias buildNet() by setting node_stats = TRUE, or as a separate step using addNodeStats().
For demonstration, we simulate toy data consisting of two samples with 100 observations each, for a total of 200 observations (rows).
set.seed(42)
library(NAIR)
dir_out <- tempdir()
toy_data <- simulateToyData()
head(toy_data)
#> CloneSeq CloneFrequency CloneCount SampleID
#> 1 TTGAGGAAATTCG 0.007873775 3095 Sample1
#> 2 GGAGATGAATCGG 0.007777102 3057 Sample1
#> 3 GTCGGGTAATTGG 0.009094910 3575 Sample1
#> 4 GCCGGGTAATTCG 0.010160859 3994 Sample1
#> 5 GAAAGAGAATTCG 0.009336593 3670 Sample1
#> 6 AGGTGGGAATTCG 0.010369470 4076 Sample1
buildRepSeqNetwork()/buildNet()
Calling buildRepSeqNetwork() with node_stats = TRUE is one way to compute node-level network properties.
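A minimal sketch of both approaches follows. It reuses the simulated toy data, and it assumes that addNodeStats() accepts the list returned by buildNet() as its first argument and returns the updated list.

```r
library(NAIR)
set.seed(42)
toy_data <- simulateToyData()

# Method 1: compute node-level properties while building the network
net <- buildNet(toy_data, "CloneSeq", node_stats = TRUE)

# Method 2: build the network first, then add the properties as a separate step
# (assumption: addNodeStats() takes the network list and returns it with
# the new variables added to net$node_data)
net <- buildNet(toy_data, "CloneSeq")
net <- addNodeStats(net)
```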
After using either of the methods described above, the node metadata now contains additional variables for the network properties.
names(net$node_data)
#> [1] "CloneSeq" "CloneFrequency"
#> [3] "CloneCount" "SampleID"
#> [5] "degree" "transitivity"
#> [7] "eigen_centrality" "centrality_by_eigen"
#> [9] "betweenness" "centrality_by_betweenness"
#> [11] "authority_score" "coreness"
#> [13] "page_rank"
head(net$node_data[ , c("CloneSeq", "degree", "authority_score")])
#> CloneSeq degree authority_score
#> 2 GGAGATGAATCGG 1 3.161692e-18
#> 5 GAAAGAGAATTCG 3 1.642413e-17
#> 8 GGGGAGAAATTGG 2 4.558649e-02
#> 11 GGGGGAGAATTGC 4 1.505537e-01
#> 12 GGGGGGGAATTGC 10 5.269180e-01
#> 13 AGGGGGAAATTGG 5 1.468234e-01
The names of the node-level network properties that can be computed are listed below. For details on the individual properties, see ?chooseNodeStats(). The cluster_id property is discussed separately.
degree
cluster_id
transitivity
closeness
centrality_by_closeness
eigen_centrality
centrality_by_eigen
betweenness
centrality_by_betweenness
authority_score
coreness
page_rank
By default, all of the available node-level properties are computed except for closeness, centrality_by_closeness and cluster_id.
When computing node properties with buildRepSeqNetwork() or addNodeStats(), the properties to compute can be specified using the stats_to_include parameter.
stats_to_include = "all" computes all properties.
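As a short sketch of this option, the call below requests every property; the object name net_all is chosen here only for illustration.

```r
library(NAIR)
set.seed(42)
toy_data <- simulateToyData()

# Request every available node-level property, including those
# that are skipped by default
net_all <- buildNet(toy_data, "CloneSeq",
                    node_stats = TRUE,
                    stats_to_include = "all")

# Inspect which variables were added to the node metadata
names(net_all$node_data)
```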
To specify a subset of properties, stats_to_include accepts a named logical vector following a particular format. This vector can be created with chooseNodeStats(). Each parameter of chooseNodeStats() is one of the property names seen above, accepting TRUE or FALSE to specify whether the property is computed. (The default values match the default set of node properties, so stats_to_include = chooseNodeStats() is the same as leaving stats_to_include unspecified.)
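To see what is passed to stats_to_include, the vector can be inspected directly. This sketch assumes that the names of the returned vector match the property names listed earlier.

```r
library(NAIR)

# Default selection of node-level properties
default_stats <- chooseNodeStats()
default_stats

# Override two of the defaults; the remaining entries keep their default values
custom_stats <- chooseNodeStats(closeness = TRUE, page_rank = FALSE)

# Assumption: entries are named after the properties they control
custom_stats[c("closeness", "page_rank")]
```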
Below, the closeness property is computed along with the default properties except for page_rank.
# Modifying the default set of node-level properties
net <- buildNet(
  toy_data, "CloneSeq",
  node_stats = TRUE,
  stats_to_include = chooseNodeStats(closeness = TRUE, page_rank = FALSE)
)
To include only a few properties and exclude the rest, it is easier to use exclusiveNodeStats(), which behaves like chooseNodeStats(), but all argument values are FALSE by default.
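For instance, the following sketch computes only two properties. The choice of degree and transitivity is arbitrary and made here purely for illustration.

```r
library(NAIR)
set.seed(42)
toy_data <- simulateToyData()

# Compute only degree and transitivity; all other properties default to FALSE
net <- buildNet(
  toy_data, "CloneSeq",
  node_stats = TRUE,
  stats_to_include = exclusiveNodeStats(degree = TRUE, transitivity = TRUE)
)

# The node metadata should now hold the original columns plus the two
# requested properties
names(net$node_data)
```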