Node-Level Network Properties

Introduction

Node-level network properties are properties that pertain to each individual node in the network graph.

Some are local properties, meaning that their value for a given node depends only on a subset of the nodes in the network. One example is the degree of a node, which is the number of other nodes joined to it directly by an edge.
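
As a minimal sketch of a local property, the degree can be read directly off an adjacency matrix in base R. The four-node graph and the names A, edges and node_degree are illustrative only and are not part of the NAIR package.

```r
# Toy undirected graph on 4 nodes, stored as a symmetric adjacency matrix.
# Edges: a-b, a-c, b-c, c-d (hypothetical example graph)
A <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(letters[1:4], letters[1:4]))
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d"))
A[edges] <- 1            # set A[from, to] for each edge
A[edges[, 2:1]] <- 1     # mirror each edge so the matrix is symmetric

# The degree of a node is the sum of its row: it depends only on
# the node's direct neighbors, not on the rest of the graph.
node_degree <- rowSums(A)
node_degree
#> a b c d 
#> 2 2 3 1
```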

Other properties are global properties, meaning that their value for a given node depends on all of the nodes in the network. An example is the authority score of a node, which is computed using the entire graph adjacency matrix: if we denote this matrix by A, then the principal eigenvector of A^T A (where A^T is the transpose of A) gives the authority scores of the network nodes.
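
The eigenvector characterization of the authority score can be sketched in base R. This is an illustration on a hypothetical four-node graph, not the computation NAIR performs internally; in particular, rescaling the scores so that the largest equals 1 is an assumed convention.

```r
# Symmetric adjacency matrix of a small undirected graph
# (edges: a-b, a-c, b-c, c-d; hypothetical example)
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# Authority scores: principal eigenvector of t(A) %*% A
M <- t(A) %*% A
eig <- eigen(M, symmetric = TRUE)   # eigenvalues are returned in decreasing order
v <- abs(eig$vectors[, 1])          # eigenvector of the largest eigenvalue; remove sign ambiguity
authority <- v / max(v)             # assumed convention: rescale so the top score is 1
names(authority) <- rownames(A)
authority
```

Every entry of A enters the eigenvector, so changing a single edge anywhere in the graph can change the score of every node. This is what makes the property global.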

Node-level network properties can be computed when calling buildRepSeqNetwork() or its alias buildNet() by setting node_stats = TRUE, or as a separate step using addNodeStats().

Simulate Data for Demonstration

For demonstration, we simulate toy data consisting of two samples with 100 observations each, for a total of 200 observations (rows).

set.seed(42)
library(NAIR)
dir_out <- tempdir()

toy_data <- simulateToyData()
head(toy_data)
#>        CloneSeq CloneFrequency CloneCount SampleID
#> 1 TTGAGGAAATTCG    0.007873775       3095  Sample1
#> 2 GGAGATGAATCGG    0.007777102       3057  Sample1
#> 3 GTCGGGTAATTGG    0.009094910       3575  Sample1
#> 4 GCCGGGTAATTCG    0.010160859       3994  Sample1
#> 5 GAAAGAGAATTCG    0.009336593       3670  Sample1
#> 6 AGGTGGGAATTCG    0.010369470       4076  Sample1
nrow(toy_data)
#> [1] 200
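
As a quick check of the description above, the rows can be tabulated by sample. This sketch assumes the NAIR package is installed and relies only on the SampleID column shown in the output above; sample_counts is an illustrative name.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()

# Count the observations (rows) belonging to each sample
sample_counts <- table(toy_data$SampleID)
sample_counts
```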

Computing Node-Level Properties

With buildRepSeqNetwork()/buildNet()

Calling buildRepSeqNetwork() with node_stats = TRUE is one way to compute node-level network properties.

# build network with computation of node-level network properties
net <- buildNet(toy_data, "CloneSeq", 
                node_stats = TRUE
)

With addNodeStats()

addNodeStats() can be used with the output of buildRepSeqNetwork() to compute node properties for the network.

net <- buildNet(toy_data, "CloneSeq")

net <- addNodeStats(net)

Results

After using either of the methods described above, the node metadata (net$node_data) contains additional variables for the network properties.

names(net$node_data)
#>  [1] "CloneSeq"                  "CloneFrequency"           
#>  [3] "CloneCount"                "SampleID"                 
#>  [5] "degree"                    "transitivity"             
#>  [7] "eigen_centrality"          "centrality_by_eigen"      
#>  [9] "betweenness"               "centrality_by_betweenness"
#> [11] "authority_score"           "coreness"                 
#> [13] "page_rank"
head(net$node_data[ , c("CloneSeq", "degree", "authority_score")])
#>         CloneSeq degree authority_score
#> 2  GGAGATGAATCGG      1    3.161692e-18
#> 5  GAAAGAGAATTCG      3    1.642413e-17
#> 8  GGGGAGAAATTGG      2    4.558649e-02
#> 11 GGGGGAGAATTGC      4    1.505537e-01
#> 12 GGGGGGGAATTGC     10    5.269180e-01
#> 13 AGGGGGAAATTGG      5    1.468234e-01
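
Because the properties are stored as ordinary columns of net$node_data, they can be used with standard data frame operations. The sketch below, which assumes the NAIR package is installed, ranks the nodes by degree; ranked is an illustrative name.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()
net <- buildNet(toy_data, "CloneSeq", node_stats = TRUE)

# Sort the nodes from highest to lowest degree
ranked <- net$node_data[order(net$node_data$degree, decreasing = TRUE), ]

# Show the five best-connected clones with two of their properties
head(ranked[, c("CloneSeq", "degree", "authority_score")], n = 5)
```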

Choosing the Node-Level Properties

The names of the node-level network properties that can be computed are listed below. For details on the individual properties, see ?chooseNodeStats. The cluster_id property is discussed in the package's documentation on cluster analysis.

  • degree
  • cluster_id
  • transitivity
  • closeness
  • centrality_by_closeness
  • eigen_centrality
  • centrality_by_eigen
  • betweenness
  • centrality_by_betweenness
  • authority_score
  • coreness
  • page_rank

By default, all of the available node-level properties are computed except for closeness, centrality_by_closeness and cluster_id.

When computing node properties with buildRepSeqNetwork() or addNodeStats(), the properties to compute can be specified using the stats_to_include parameter.

stats_to_include = "all" computes all properties.

To specify a subset of properties, stats_to_include accepts a named logical vector following a particular format. This vector can be created with chooseNodeStats(). Each parameter of chooseNodeStats() is one of the property names seen above, accepting TRUE or FALSE to specify whether the property is computed. (The default values match the default set of node properties, so stats_to_include = chooseNodeStats() is the same as leaving stats_to_include unspecified.)

Below, the closeness property is computed along with the default properties except for page_rank.

# Modifying the default set of node-level properties
net <- buildNet(toy_data, "CloneSeq", 
                node_stats = TRUE,
                stats_to_include = 
                  chooseNodeStats(closeness = TRUE, 
                                  page_rank = FALSE
                  )
)
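
To see what is passed to stats_to_include, the vector returned by chooseNodeStats() can be inspected directly. This sketch assumes the NAIR package is installed; default_stats, custom_stats and changed are illustrative names.

```r
library(NAIR)

# Default selection: a named logical vector
default_stats <- chooseNodeStats()
default_stats

# Modified selection: enable closeness, disable page_rank
custom_stats <- chooseNodeStats(closeness = TRUE, page_rank = FALSE)

# Names of the properties whose setting differs from the default
changed <- names(custom_stats)[custom_stats != default_stats]
changed
```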

To include only a few properties and exclude the rest, it is easier to use exclusiveNodeStats(), which behaves like chooseNodeStats(), but all argument values are FALSE by default.

# Include only the node-level properties specified below
net <- buildNet(toy_data, "CloneSeq", 
                node_stats = TRUE, 
                stats_to_include = 
                  exclusiveNodeStats(degree = TRUE, 
                                     transitivity = TRUE
                  )
)
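
One way to confirm which properties were requested and added is sketched below, assuming the NAIR package is installed. The names selected and new_columns are illustrative, and no particular output is claimed for the last line.

```r
library(NAIR)

set.seed(42)
toy_data <- simulateToyData()

# Properties switched on in the exclusive selection
selected <- exclusiveNodeStats(degree = TRUE, transitivity = TRUE)
names(selected)[selected]

net <- buildNet(toy_data, "CloneSeq",
                node_stats = TRUE,
                stats_to_include = selected)

# Columns of the node metadata that were not in the input data
new_columns <- setdiff(names(net$node_data), names(toy_data))
new_columns
```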