---
title: 'DyNAM-i: an example script'
subtitle: "Dynamic Network Actor Models for face-to-face interaction data"
author: "Marion Hoffman"
date: '2022-08-11'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DyNAM-i: an example script}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



Example script analyzing the RFID validity study data (Elmer et al, 2019)
with the DyNAM-i model of the `goldfish` package. 
Data received from:

> Elmer, T., Chaitanya, K., Purwar, P., & Stadtfeld, C. (2019).
> The validity of RFID badges measuring face-to-face interactions.
> Behavior research methods, 51(5), 2120-2138.

Analyses inspired from:

> Hoffman, M., Block, P., Elmer, T., & Stadtfeld, C. (2020).
> A model for the dynamics of face-to-face interactions in social groups.
> Network Science, 8(S1), S4-S25.

# Step 0: Load package and data

First, we load the `goldfish` package and load the data. . 
You can find out more about this dataset and its format by callings
its documentation.


``` r
library(goldfish)
data("RFID_Validity_Study")
#?RFID_Validity_Study
```

The `participants` object contains the nodeset of actors interacting together.
The available attributes are their age, their gender, their organizational
unit (group), and their seniority level:

``` r
head(participants)
```

```
##   actor  label present age gender group level
## 1     1 Actor1    TRUE  35      0     1     4
## 2     2 Actor2    TRUE  34      0     2     3
## 3     3 Actor3    TRUE  25      1     1     1
## 4     4 Actor4    TRUE  26      0     1     2
## 5     5 Actor5    TRUE  26      0     1     1
## 6     6 Actor6    TRUE  31      1     3     3
```

The `rfid` object contains the list of dyadic interactions collected via
RFID badges.
Each interaction is characterized by two actors (`NodeA` and `NodeB`) and
two time points (`Start` and `End`):

``` r
head(rfid)
```

```
##   NodeA NodeB      Start        End
## 1     3     4 1491329893 1491329930
## 2     1    11 1491329894 1491329926
## 3    10     8 1491329894 1491329904
## 4    10     4 1491329895 1491329911
## 5    10     5 1491329895 1491329924
## 6     3     5 1491329902 1491329913
```

The `video` object contains the list of dyadic interactions collected
via video recordings and has the same format as the `rfid` object.
We know that these measures should be more reliable, so we will work with
those data in this script!

``` r
head(video)
```

```
##    NodeA NodeB      Start        End
## 13     5     8 1491329893 1491329984
## 14     7     1 1491329893 1491329984
## 18     1    11 1491329893 1491329984
## 19     1     6 1491329893 1491329984
## 42     7     6 1491329893 1491330372
## 45    11     6 1491329893 1491330372
```

Finally, the `known.before` matrix indicates which actors knew each other
before the event.


### Step 1: Create groups and interaction events

Before specifying the model, we need to set the data in the right format
for DyNAM-i: We had a list of dyadic interactions, but DyNAM-i is specifically
meant for group interactions.
More specifically, the model is designed for events of actors joining or
leavin interaction groups. We can use this function to create these group
events (please note we need to use the right labels in the dataframe:
`NodeA`, `NodeB`,`Start`, and `End`):

``` r
#?defineGroups_interaction
prepdata <- defineGroups_interaction(video, participants,
                                     seed.randomization = 1)
```

This functions to creates 5 objects:
1. `groups`: a goldfish nodeset containing the interaction groups
(initially there are as many groups as actors and they are all present,
meaning available)

``` r
groups <- prepdata$groups
head(groups)
```

```
##    label present
## 1 Group1    TRUE
## 2 Group2    TRUE
## 3 Group3    TRUE
## 4 Group4    TRUE
## 5 Group5    TRUE
## 6 Group6    TRUE
```

2. `dependent.events`: goldfish events specifying the events that we want
to model, when an actor (in the dataframe, the `sender` column) joins or
leaves (`increment` = 1 or -1) a group (`receiver`) at a particular point
in time (`time`)

``` r
dependent.events <- prepdata$dependent.events
head(dependent.events)
```

```
##         time  sender receiver increment
## 1 1491329893  Actor4   Group3         1
## 2 1491329893  Actor2   Group9         1
## 3 1491329893  Actor7   Group6         1
## 4 1491329893  Actor1   Group6         1
## 5 1491329893 Actor11   Group6         1
## 6 1491329893  Actor8   Group5         1
```

3. `exogenous.events`: goldfish events specifying the events that need to
happen but are not modeled (for example, when an actor leaves a group, a
dependent event is created for this leaving but an exogenous event is also
created because the actor "joins" a new group, its own isolated group)

``` r
exogenous.events <- prepdata$exogenous.events
head(exogenous.events)
```

```
##         time  sender receiver increment
## 1 1491329893  Actor4   Group4        -1
## 2 1491329893  Actor2   Group2        -1
## 3 1491329893  Actor7   Group7        -1
## 4 1491329893  Actor1   Group1        -1
## 5 1491329893 Actor11  Group11        -1
## 6 1491329893  Actor8   Group8        -1
```

4. `interaction.updates`: goldfish events that are used to update the number
of past interactions between participants: 

``` r
interaction.updates <- prepdata$interaction.updates
head(interaction.updates)
```

```
##         time  sender receiver increment
## 1 1491329893  Actor4   Actor3         1
## 2 1491329893  Actor2   Actor9         1
## 3 1491329893  Actor7   Actor6         1
## 4 1491329893  Actor1   Actor6         1
## 5 1491329893  Actor1   Actor7         1
## 6 1491329893 Actor11   Actor1         1
```

5. `opportunities`: list containing the interaction groups available at each
decision time (this will vary when groups are being joined or left)

``` r
opportunities <- prepdata$opportunities
head(opportunities)
```

```
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10 11
## 
## [[2]]
##  [1]  1  2  3  5  6  7  8  9 10 11
## 
## [[3]]
## [1]  1  9  3  5  6  7  8 10 11
## 
## [[4]]
## [1]  1  9  3  5  6  8 10 11
## 
## [[5]]
## [1]  6  9  3  5  8 10 11
## 
## [[6]]
## [1]  6  9  3  5  8 10
```

### Step 2: Set up goldfish objects

Now that we have all the data we need, we need to define the link between our
objects in a way that goldfish understands what is going on.

First, we define the first mode nodeset, the actors:

``` r
# goldfish requires character names
participants$label <- as.character(participants$label)
actors <- defineNodes(participants)
```

Then we define the second mode nodeset, the groups:

``` r
groups <- defineNodes(groups)
```

Then we create the dynamic interaction network between the actors and
the groups, updated by the previously created events in `dependent.events`
and `exogenous.events`

``` r
init.network <- diag(x = 1, nrow(actors), nrow(groups))
# goldfish check that row/column names agree with the nodes data frame labels
dimnames(init.network) <- list(actors$label, groups$label)
network.interactions <- defineNetwork(
  matrix = init.network, nodes = actors, nodes2 = groups, directed = TRUE
)
network.interactions <- linkEvents(
  x = network.interactions, changeEvent = dependent.events,
  nodes = actors, nodes2 = groups
)
network.interactions <- linkEvents(
  x = network.interactions, changeEvent = exogenous.events,
  nodes = actors, nodes2 = groups
)
```

Then we create the dynamic network between the actors that records
past interactions, updated by the previously created events in
`interaction.updates`

``` r
network.past <- defineNetwork(nodes = actors, directed = FALSE)
network.past <- linkEvents(
  x = network.past, changeEvents = interaction.updates, nodes = actors
) # don't worry about the warnings
```
Finally, we define the events that we want to model: `dependent.events`:

``` r
dependent.events <- defineDependentEvents(
  events = dependent.events, nodes = actors,
  nodes2 = groups, defaultNetwork = network.interactions
)
```

### Step 4: Estimate a model with attribute effects

Let us define some models we could estimate (we try not to put too many effects
because we have little data).

We first have a model only with effects related to individual attributes (M1). 

For the rate, we have the following effects:

1. general intercept (joining and leaving events) + dummy interaction for the
  intercept of joining events specifically
2. ego age (joining): tendency for older individuals to join groups faster
3. ego age (leaving): tendency for older individuals to leave groups faster
4. diff age (leaving): tendency for individuals to leave groups faster
  if the average sum of age difference to the group are high
5. diff level (leaving): tendency for individuals to leave groups faster
  if the average sum of level difference to the group are high
6. same gender (leaving): tendency for individuals to leave groups faster
  if the proportion of same gender in the group is high
7. same group (leaving): tendency for individuals to leave groups faster
  if the proportion of same group in the group is high
8. tie known before (leaving): tendency for individuals to leave groups faster
  if the proportion of previous friends in the group is high
  

``` r
formula.rate.M1 <- dependent.events ~  1 +
  intercept(network.interactions, joining = 1) +
  ego(actors$age, joining = 1, subType = "centered") +
  ego(actors$age, joining = -1, subType = "centered") +
  diff(actors$age, joining = -1, subType = "averaged_sum") +
  diff(actors$level, joining = -1, subType = "averaged_sum") +
  same(actors$gender, joining = -1, subType = "proportion") +
  same(actors$group, joining = -1, subType = "proportion") +
  tie(known.before, joining = -1, subType = "proportion")
```

and for the choice model:

9. diff age: tendency for individuals to join groups
  if the average sum of age difference to the group are high
10. diff level: tendency for individuals to join groups
  if the average sum of level difference to the group are high
11. same gender: tendency for individuals to join groups
  if the proportion of same gender in the group is high
12. same group: tendency for individuals to join groups
  if the proportion of same group in the group is high
13. tie known before: tendency for individuals to join groups
  if the proportion of previous friends in the group is high
  

``` r
formula.choice.M1 <- dependent.events ~
  diff(actors$age, subType = "averaged_sum") +
  diff(actors$level, subType = "averaged_sum") +
  same(actors$gender, subType = "proportion") +
  same(actors$group, subType = "proportion") +
  tie(known.before, subType = "proportion")
```

Note that effects in the choice model can mirror the ones in the leaving model:
The first model explains who people want to join (or not join),
the second explains who people want to leave (or stay with). 

Now let us run goldfish estimation!

``` r
est.rate.M1 <- estimate(formula.rate.M1, model = "DyNAMi", subModel = "rate")
summary(est.rate.M1)
```

```
## 
## Call:
## estimate(x = dependent.events ~ 1 + intercept(network.interactions, 
##     joining = 1) + ego(actors$age, joining = 1, subType = "centered") + 
##     ego(actors$age, joining = -1, subType = "centered") + diff(actors$age, 
##     joining = -1, subType = "averaged_sum") + diff(actors$level, 
##     joining = -1, subType = "averaged_sum") + same(actors$gender, 
##     joining = -1, subType = "proportion") + same(actors$group, 
##     joining = -1, subType = "proportion") + tie(known.before, 
##     joining = -1, subType = "proportion"), model = "DyNAMi", 
##     subModel = "rate")
## 
## 
## Effects details:
##           Object                 joining subType         
## Intercept ""                     ""      ""              
## intercept "network.interactions" "1"     ""              
## ego       "actors$age"           "1"     ""centered""    
## ego       "actors$age"           "-1"    ""centered""    
## diff      "actors$age"           "-1"    ""averaged_sum""
## diff      "actors$level"         "-1"    ""averaged_sum""
## same      "actors$gender"        "-1"    ""proportion""  
## same      "actors$group"         "-1"    ""proportion""  
## tie       "known.before"         "-1"    ""proportion""  
## 
## Coefficients:
##            Estimate Std. Error  z-value  Pr(>|z|)    
## Intercept -5.942916   0.322237 -18.4427 < 2.2e-16 ***
## intercept  2.509660   0.334344   7.5062 6.084e-14 ***
## ego        0.061687   0.024706   2.4968   0.01253 *  
## ego        0.027958   0.035570   0.7860   0.43188    
## diff      -0.163643   0.069175  -2.3656   0.01800 *  
## diff       0.361468   0.226675   1.5947   0.11079    
## same       0.027282   0.254374   0.1073   0.91459    
## same       0.508497   0.371577   1.3685   0.17116    
## tie       -0.156303   0.346267  -0.4514   0.65170    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   Converged with max abs. score of 0.00012 
##   Log-Likelihood: -1306.3
##   AIC:  2630.6 
##   AICc: 2631.4 
##   BIC:  2661.7 
##   model: "DyNAMi" subModel: "rate"
```


``` r
est.choice.M1 <- estimate(
  formula.choice.M1,
  model = "DyNAMi", subModel = "choice",
  estimationInit = list(opportunitiesList = opportunities)
)
summary(est.choice.M1)
```

```
## 
## Call:
## estimate(x = dependent.events ~ diff(actors$age, subType = "averaged_sum") + 
##     diff(actors$level, subType = "averaged_sum") + same(actors$gender, 
##     subType = "proportion") + same(actors$group, subType = "proportion") + 
##     tie(known.before, subType = "proportion"), model = "DyNAMi", 
##     subModel = "choice", estimationInit = list(opportunitiesList = opportunities))
## 
## 
## Effects details:
##      Object          subType         
## diff "actors$age"    ""averaged_sum""
## diff "actors$level"  ""averaged_sum""
## same "actors$gender" ""proportion""  
## same "actors$group"  ""proportion""  
## tie  "known.before"  ""proportion""  
## 
## Coefficients:
##       Estimate Std. Error z-value  Pr(>|z|)    
## diff -0.107578   0.067179 -1.6014 0.1092943    
## diff  0.262945   0.225568  1.1657 0.2437353    
## same  0.292121   0.298317  0.9792 0.3274665    
## same  0.024975   0.384103  0.0650 0.9481565    
## tie   1.316141   0.378545  3.4768 0.0005074 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   Converged with max abs. score of 0.00026 
##   Log-Likelihood: -195.16
##   AIC:  400.32 
##   AICc: 400.84 
##   BIC:  414.39 
##   model: "DyNAMi" subModel: "choice"
```
### Step 5: Estimate a model with structural and time effects

Now let's add effects related to group sizes and past interactions.
We add to the rate model the effects of:

1. size (leaving): tendency for individuals to leave large groups faster
2. egopop (joining): tendency for individuals
  who had more interactions in the past to join groups faster
3. egopop (leaving): tendency for individuals
  who had more interactions in the past to leave groups faster


``` r
formula.rate.M2 <- dependent.events ~  1 +
  intercept(network.interactions, joining = 1) +
  ego(actors$age, joining = 1, subType = "centered") +
  ego(actors$age, joining = -1, subType = "centered") +
  diff(actors$age, joining = -1, subType = "averaged_sum") +
  diff(actors$level, joining = -1, subType = "averaged_sum") +
  same(actors$gender, joining = -1, subType = "proportion") +
  same(actors$group, joining = -1, subType = "proportion") +
  tie(known.before, joining = -1, subType = "proportion") +
  size(network.interactions, joining = -1, subType = "identity") +
  egopop(network.past, joining = 1, subType = "normalized") +
  egopop(network.past, joining = -1, subType = "normalized")
```

We add to the choice model:

1. size: tendency for individuals to join large groups
2. alterpop: tendency for individuals to join groups with individuals
  who had more interactions in the past
3. inertia, window = 60s: tendency for individuals to join groups
  with a high average number of previous interactions with the group members
  within the last minute
3. inertia, window = 300s: same as above, for 5 minutes


``` r
formula.choice.M2 <- dependent.events ~
  diff(actors$age, subType = "averaged_sum") +
  diff(actors$level, subType = "averaged_sum") +
  same(actors$gender, subType = "proportion") +
  same(actors$group, subType = "proportion") +
  alter(actors$age, subType = "mean") +
  tie(known.before, subType = "proportion") +
  size(network.interactions, subType = "identity") +
  alterpop(network.past, subType = "mean_normalized") +
  inertia(network.past, window = 60, subType = "mean") +
  inertia(network.past, window = 300, subType = "mean")
```

One can note that we have normalized (using the option `subType="mean"`)
the effects related to previous interactions, because we want to avoid
time heterogeneity issues (in the beginning no interaction has happened,
at the end, a lot happened).

All effects can have different subTypes,
please look at the goldfish documentation to learn more.

Now we can run again

``` r
est.rate.M2 <- estimate(formula.rate.M2, model = "DyNAMi", subModel = "rate")
summary(est.rate.M2)
```

```
## 
## Call:
## estimate(x = dependent.events ~ 1 + intercept(network.interactions, 
##     joining = 1) + ego(actors$age, joining = 1, subType = "centered") + 
##     ego(actors$age, joining = -1, subType = "centered") + diff(actors$age, 
##     joining = -1, subType = "averaged_sum") + diff(actors$level, 
##     joining = -1, subType = "averaged_sum") + same(actors$gender, 
##     joining = -1, subType = "proportion") + same(actors$group, 
##     joining = -1, subType = "proportion") + tie(known.before, 
##     joining = -1, subType = "proportion") + size(network.interactions, 
##     joining = -1, subType = "identity") + egopop(network.past, 
##     joining = 1, subType = "normalized") + egopop(network.past, 
##     joining = -1, subType = "normalized"), model = "DyNAMi", 
##     subModel = "rate")
## 
## 
## Effects details:
##           Object                 joining subType         
## Intercept ""                     ""      ""              
## intercept "network.interactions" "1"     ""              
## ego       "actors$age"           "1"     ""centered""    
## ego       "actors$age"           "-1"    ""centered""    
## diff      "actors$age"           "-1"    ""averaged_sum""
## diff      "actors$level"         "-1"    ""averaged_sum""
## same      "actors$gender"        "-1"    ""proportion""  
## same      "actors$group"         "-1"    ""proportion""  
## tie       "known.before"         "-1"    ""proportion""  
## size      "network.interactions" "-1"    ""identity""    
## egopop    "network.past"         "1"     ""normalized""  
## egopop    "network.past"         "-1"    ""normalized""  
## 
## Coefficients:
##            Estimate Std. Error  z-value  Pr(>|z|)    
## Intercept -6.376865   0.396152 -16.0970 < 2.2e-16 ***
## intercept  2.942756   0.406082   7.2467  4.27e-13 ***
## ego        0.067331   0.026053   2.5844  0.009755 ** 
## ego        0.012435   0.036750   0.3384  0.735094    
## diff      -0.142109   0.076203  -1.8649  0.062200 .  
## diff       0.248709   0.240336   1.0348  0.300744    
## same       0.051020   0.263799   0.1934  0.846641    
## same       0.172918   0.409717   0.4220  0.672994    
## tie        0.145864   0.379845   0.3840  0.700971    
## size       0.133780   0.073838   1.8118  0.070015 .  
## egopop     0.068527   0.105430   0.6500  0.515708    
## egopop     0.177403   0.131982   1.3441  0.178900    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   Converged with max abs. score of 0.00053 
##   Log-Likelihood: -1303.5
##   AIC:  2631 
##   AICc: 2632.4 
##   BIC:  2672.4 
##   model: "DyNAMi" subModel: "rate"
```
and:

``` r
est.choice.M2 <- estimate(
  formula.choice.M2,
  model = "DyNAMi", subModel = "choice",
  estimationInit = list(opportunitiesList = opportunities)
)
summary(est.choice.M2)
```

```
## 
## Call:
## estimate(x = dependent.events ~ diff(actors$age, subType = "averaged_sum") + 
##     diff(actors$level, subType = "averaged_sum") + same(actors$gender, 
##     subType = "proportion") + same(actors$group, subType = "proportion") + 
##     alter(actors$age, subType = "mean") + tie(known.before, subType = "proportion") + 
##     size(network.interactions, subType = "identity") + alterpop(network.past, 
##     subType = "mean_normalized") + inertia(network.past, window = 60, 
##     subType = "mean") + inertia(network.past, window = 300, subType = "mean"), 
##     model = "DyNAMi", subModel = "choice", estimationInit = list(opportunitiesList = opportunities))
## 
## 
## Effects details:
##          Object                 window subType            
## diff     "actors$age"           ""     ""averaged_sum""   
## diff     "actors$level"         ""     ""averaged_sum""   
## same     "actors$gender"        ""     ""proportion""     
## same     "actors$group"         ""     ""proportion""     
## alter    "actors$age"           ""     ""mean""           
## tie      "known.before"         ""     ""proportion""     
## size     "network.interactions" ""     ""identity""       
## alterpop "network.past"         ""     ""mean_normalized""
## inertia  "network.past"         "60"   ""mean""           
## inertia  "network.past"         "300"  ""mean""           
## 
## Coefficients:
##           Estimate Std. Error z-value Pr(>|z|)   
## diff     -0.133077   0.073387 -1.8134 0.069776 . 
## diff      0.095857   0.231054  0.4149 0.678238   
## same      0.141033   0.321444  0.4387 0.660843   
## same     -0.284446   0.418710 -0.6793 0.496924   
## alter     0.033538   0.017678  1.8971 0.057809 . 
## tie       1.274992   0.393714  3.2384 0.001202 **
## size      0.097754   0.093141  1.0495 0.293934   
## alterpop  0.288526   0.170532  1.6919 0.090662 . 
## inertia   0.211654   0.458129  0.4620 0.644084   
## inertia  -0.096613   0.235330 -0.4105 0.681410   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   Converged with max abs. score of 2e-05 
##   Log-Likelihood: -189.76
##   AIC:  399.53 
##   AICc: 401.49 
##   BIC:  427.65 
##   model: "DyNAMi" subModel: "choice"
```

Of course these models ask a bit much from our data,
so ideally we would a bit more reasonable with the number of effects we include,
or do some model selection.

### Step 6: Extra steps

We now have for each rate model a general intercept and
a dummy interaction between the intercept and the joining events.
If we want to report the actual intercept for joining,
we can calculate the following (for M2 for example):

``` r
cov.matrix <- vcov(est.rate.M2)

est.interceptjoining <- coef(est.rate.M2)[1] + coef(est.rate.M2)[2]
se.interceptjoining <- sqrt(
  cov.matrix[1, 1] + cov.matrix[2, 2] + 2 * cov.matrix[1, 2]
)
t.interceptjoining <- est.interceptjoining / se.interceptjoining
sprintf(
  "Intercept for joining: %.3f (SE = %.3f, t = %.3f)",
  est.interceptjoining, se.interceptjoining, t.interceptjoining
)
```

```
## [1] "Intercept for joining: -3.434 (SE = 0.089, t = -38.477)"
```

This script can be continued by looking at better model specifications
than the ones provided, or trying to model the interaction data collected
from RFID badges rather than video recordings, and maybe compare both.