Title: | Relational Event Models (REM) |
---|---|
Description: | Calculate endogenous network effects in event sequences and fit relational event models (REM): Using network event sequences (where each tie between a sender and a target in a network is time-stamped), REMs can measure how networks form and evolve over time. Endogenous patterns such as popularity effects, inertia, similarities, cycles or triads can be calculated and analyzed over time. |
Authors: | Laurence Brandenberger |
Maintainer: | Laurence Brandenberger <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.1 |
Built: | 2024-12-25 06:51:06 UTC |
Source: | CRAN |
The rem package uses a combination of event history and network analysis to test network dependencies in event sequences. If events in an event sequence depend on each other, network structures and patterns can be calculated and estimated using relational event models. The rem package includes functions to calculate endogenous network statistics in (signed) one-, two- and multi-mode network event sequences. The statistics include inertia (inertiaStat), reciprocity (reciprocityStat), in- or outdegree statistics (degreeStat), closing triads (triadStat), closing four-cycles (fourCycleStat) and endogenous similarity statistics (similarityStat). The rate of event occurrence can then be tested using standard models of event history analysis, such as a stratified Cox model (or a conditional logistic regression). createRemDataset can be used to create counting process data sets with dynamic risk sets.
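Once a counting process data set and an endogenous statistic are available, the estimation step mentioned above (a conditional logistic regression, equivalent to a stratified Cox model) can be sketched with the survival package that ships with R. The data below are simulated purely for illustration, and the variable names (eventDummy, eventTime, inertia) are assumptions modeled on this package's conventions, not actual rem output.

```r
# Minimal sketch: conditional logit on a hand-made counting process data set.
# All data here are simulated for illustration only.
library(survival)

set.seed(1)
n.strata <- 30
# per event time: one true event (eventDummy = 1) and two null-events
count.data <- data.frame(
  eventTime  = rep(1:n.strata, each = 3),
  eventDummy = rep(c(1, 0, 0), times = n.strata)
)
# hypothetical endogenous statistic (e.g. inertia); true events get
# systematically higher values, so a positive coefficient is expected
count.data$inertia <- rnorm(nrow(count.data)) + count.data$eventDummy

# conditional logistic regression: each event time forms one stratum
fit <- clogit(eventDummy ~ inertia + strata(eventTime), data = count.data)
summary(fit)
```

The same model can also be written as a stratified Cox model with constant event times per stratum; clogit() is a convenience wrapper around coxph() that does exactly this.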
Package: | rem |
Type: | Package |
Version: | 1.3.1 |
Date: | 2018-10-24 |
Laurence Brandenberger [email protected]
Lerner, Jürgen, Margit Bussmann, Tom A. B. Snijders, and Ulrik Brandes. 2013. Modeling Frequency and Type of Interaction in Event Networks. Corvinus Journal of Sociology and Social Policy, 4(1): 3-32.
Brandenberger, Laurence. 2018. Trading Favors: Examining the Temporal Dynamics of Reciprocity in Congressional Collaborations Using Relational Event Models. Social Networks, 54: 238-253.
Malang, Thomas, Laurence Brandenberger, and Philip Leifeld. 2018. Networks and Social Influence in European Legislative Politics. British Journal of Political Science. DOI: 10.1017/S0007123417000217.
The function creates counting process data sets with dynamic risk sets for relational event models. For each event in the event sequence, null-events are generated that represent possible events which could have happened at that time but did not. A data set with true and null-events is returned, together with an event dummy (variable eventDummy) indicating whether the event occurred or was merely possible. The returned data set also includes a variable eventTime, which represents the true time of the reported event.
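To make the counting-process logic concrete, here is a minimal base-R sketch that builds such a risk set by hand for a tiny sequence in which every sender-target pair is permanently at risk. It only illustrates the idea of null-events; createRemDataset automates this (and its output columns may be named differently).

```r
# hand-rolled counting process data: all pairs at risk at all event times
events <- data.frame(sender = c('a', 'c', 'a'),
                     target = c('b', 'd', 'd'),
                     time   = c(1, 2, 3))
pairs <- expand.grid(sender = c('a', 'c'), target = c('b', 'd'),
                     stringsAsFactors = FALSE)
risk.set <- do.call(rbind, lapply(events$time, function(t) {
  rs <- pairs
  rs$eventTime <- t
  # dummy: 1 if this pair is the observed event at time t, else 0 (null-event)
  obs <- events[events$time == t, ]
  rs$eventDummy <- as.integer(rs$sender == obs$sender & rs$target == obs$target)
  rs
}))
# one true event per event time; all remaining rows are null-events
```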
createRemDataset(data, sender, target, eventSequence, eventAttribute = NULL, time = NULL, start = NULL, startDate = NULL, end = NULL, endDate = NULL, timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE, includeAllPossibleEvents = FALSE, possibleEvents = NULL, returnInputData = FALSE)
data |
A data frame containing all the events. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
eventSequence |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
eventAttribute |
An optional variable that represents an attribute of an event. Repeated events affect the construction of the counting process data set. |
time |
An optional date variable that represents the date an event took place. |
start |
An optional numeric variable that indicates at which point in the event sequence a specific event started being at risk. The variable has to be numerical and correspond to the eventSequence variable. |
startDate |
An optional date variable that represents the date an event started being at risk. |
end |
An optional numeric variable that indicates at which point in the event sequence a specific event stopped being at risk. The variable has to be numerical and correspond to the eventSequence variable. |
endDate |
An optional date variable that represents the date an event stopped being at risk. |
timeformat |
A character string indicating the format of the date variables (time, startDate and endDate), e.g. '%d.%m.%Y'. |
atEventTimesOnly |
Logical. If TRUE (default), null-events are only generated at points in the event sequence at which a true event occurred. If FALSE, null-events are also generated at times where no event took place. |
untilEventOccurrs |
Logical. If TRUE (default), each event stays in the risk set only until it occurs; any user-supplied end variable is overwritten. Set to FALSE to let events remain possible after their occurrence. |
includeAllPossibleEvents |
Logical. If TRUE, the risk set is built from the possibleEvents data set instead of being derived from the observed events. |
possibleEvents |
An optional data set of the form: column 1 = sender, column 2 = target, column 3 = start, column 4 = end, column 5 = event attribute (further columns optional). The data set provides all possible events for the entire event sequence and gives each possible event a start and end value to determine when each event could have been possible. This is useful if the risk set follows a complex pattern that cannot be resolved with the above options. |
returnInputData |
Logical. If TRUE, a list with two data frames is returned: the counting process data set and the original input data (useful for merging event attributes back in). If FALSE (default), only the counting process data set is returned. |
To follow.
Laurence Brandenberger [email protected]
## Example 1: standard conditional logistic set-up
dt <- data.frame(
  sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
  target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
  eventSequence = c(1, 2, 2, 3, 3, 4, 6)
)
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = NULL, start = NULL, startDate = NULL,
  end = NULL, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = FALSE, possibleEvents = NULL,
  returnInputData = FALSE)

## Example 2: add 2 attributes to the event classification
dt <- data.frame(
  sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
  target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
  pro.con = c('pro', 'pro', 'con', 'pro', 'con', 'pro', 'pro'),
  attack = c('yes', 'no', 'no', 'yes', 'yes', 'no', 'yes'),
  eventSequence = c(1, 2, 2, 3, 3, 4, 6)
)
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence,
  eventAttribute = paste0(dt$pro.con, dt$attack),
  time = NULL, start = NULL, startDate = NULL,
  end = NULL, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = FALSE, possibleEvents = NULL,
  returnInputData = FALSE)

## Example 3: adding start and end variables
# Note: the start and end variables will be overwritten
# if there are duplicate events. If you want to keep the
# strict start and stop values that you set, use
# includeAllPossibleEvents = TRUE and specify a
# possibleEvents data set.
# Note 2: if untilEventOccurrs = TRUE and an end variable
# is provided, this end variable is overwritten. Set
# untilEventOccurrs to FALSE and provide the end variable
# if you want the event possibilities to stop at these
# exact event times.
dt <- data.frame(
  sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
  target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
  eventSequence = c(1, 2, 2, 3, 3, 4, 6),
  start = c(0, 0, 1, 1, 1, 3, 3),
  end = rep(6, 7)
)
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = NULL, start = dt$start, startDate = NULL,
  end = dt$end, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = FALSE, possibleEvents = NULL,
  returnInputData = FALSE)

## Example 4: using start (and stop) dates
dt <- data.frame(
  sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
  target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
  eventSequence = c(1, 2, 2, 3, 3, 4, 6),
  date = c('01.02.1971', rep('02.02.1971', 2), rep('03.02.1971', 2),
           '04.02.1971', '06.02.1971'),
  dateAtRisk = c(rep('21.01.1971', 2), rep('01.02.1971', 5)),
  dateRiskEnds = rep('01.03.1971', 7)
)
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = dt$date, start = NULL, startDate = dt$dateAtRisk,
  end = NULL, endDate = NULL, timeformat = '%d.%m.%Y',
  atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = FALSE, possibleEvents = NULL,
  returnInputData = FALSE)

# if you want to include null-events at times when no event happened,
# either see Example 5 or create a start variable yourself
# by using the eventSequence() command with the option
# 'returnDateSequenceData = TRUE' in this package. With the
# generated sequence, dates from startDate can be matched
# to the event sequence values (using the match() command).

## Example 5: using start and stop dates and including
# possible events whenever no event occurred.
possible.events <- data.frame(
  sender = c('a', 'c', 'd', 'f'),
  target = c('b', 'd', 'd', 'a'),
  start = c(0, 0, 1, 1),
  end = rep(8, 4))
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = NULL, start = NULL, startDate = NULL,
  end = NULL, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
  returnInputData = FALSE)

# now you can set 'atEventTimesOnly = FALSE' to include
# null-events at times where none occurred, until the events happened
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = NULL, start = NULL, startDate = NULL,
  end = NULL, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = FALSE, untilEventOccurrs = TRUE,
  includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
  returnInputData = FALSE)

# plus you can set 'untilEventOccurrs = FALSE' to get the full range
# of the events (bounded by max(possible.events$end))
count.data <- createRemDataset(
  data = dt, sender = dt$sender, target = dt$target,
  eventSequence = dt$eventSequence, eventAttribute = NULL,
  time = NULL, start = NULL, startDate = NULL,
  end = NULL, endDate = NULL, timeformat = NULL,
  atEventTimesOnly = FALSE, untilEventOccurrs = FALSE,
  includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
  returnInputData = FALSE)
Calculate the endogenous network statistic indegree/outdegree for relational event models. The indegree/outdegree statistic measures the senders' tendency to be involved in events (sender activity: sender out- or indegree) or the tendency of events to surround a specific target (target popularity: target in- or outdegree).
degreeStat(data, time, degreevar, halflife, weight = NULL, eventtypevar = NULL, eventtypevalue = "valuematch", eventfiltervar = NULL, eventfiltervalue = NULL, eventvar = NULL, degreeOnOtherVar = NULL, variablename = "degree", returnData = FALSE, dataPastEvents = NULL, showprogressbar = FALSE, inParallel = FALSE, cluster = NULL)
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
degreevar |
A string (or factor or numeric) variable that represents the sender or target of the event. The degree statistic calculates how often a given sender or target has been active in the past by counting the number of past events in which the degreevar value matches the current one. |
halflife |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight is halved. E.g., with halflife = 2, an event that occurred two time units ago carries half the weight of an event occurring now. |
weight |
An optional numeric variable that represents the weight of each event. If NULL, each event is weighted with 1. |
eventtypevar |
An optional variable that represents the type of the event. Use eventtypevalue to specify how past events are filtered by their type. |
eventtypevalue |
An optional value (or set of values) used to specify how past events should be filtered depending on their type. |
eventfiltervar |
An optional numeric/character/factor variable for each event. If eventfiltervalue is specified, only past events with that value are used to calculate the degree statistic. |
eventfiltervalue |
An optional character string that represents the value for which past events should be filtered. |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If the data set is a counting process data set containing null-events, eventvar has to be specified so that only true events are counted as past events. |
degreeOnOtherVar |
A string (or factor or numeric) variable that represents the sender or target of the event. It can be used to calculate target-outdegree or sender-indegree statistics in one-mode networks. For the sender-indegree statistic, use the sender variable as degreevar and the target variable as degreeOnOtherVar; for the target-outdegree statistic, use the target variable as degreevar and the sender variable as degreeOnOtherVar. |
variablename |
An optional value (or values) with the name(s) the degree statistic variable should be given. Default "degree" is used. |
returnData |
Logical. If TRUE, the full data frame is returned with the new statistic appended; if FALSE (default), only the vector with the calculated statistic is returned. |
dataPastEvents |
An optional data frame containing the past events from which the statistic is calculated, in case they differ from the events in data. |
showprogressbar |
Logical. If TRUE, a progress bar is displayed while the statistic is calculated. |
inParallel |
Logical. If TRUE, the statistic is calculated in parallel on the cluster specified via the cluster argument. |
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize. Alternatively, a cluster created with parallel::makeCluster() can be supplied. |
The degreeStat()-function calculates an endogenous statistic that measures whether events have a tendency to include either the same sender or the same target over the entire event sequence.
The effect is calculated as follows: G_t = (E, t) represents the network of past events and includes all events E_t. Each of these events e consists of a sender a and a target b (in one-mode networks, a and b stem from the same set of actors) and a weight function w_t(a, b):

w_t(a, b) = sum over all past events e in E_t with sender a and target b of
            |w_e| * exp(-(t - t_e) * ln(2) / T_1/2) * (ln(2) / T_1/2),

where w_e is the event weight (usually a constant set to 1 for each event), t is the current event time, t_e is the past event time and T_1/2 is the halflife parameter.
For the degree effect, the past events are filtered to include only events
where the senders or targets are identical to the current sender or target.
Depending on whether the degree statistic is measured on the sender variable or the target variable, either activity or popularity effects are calculated.
For one-mode networks, four distinct statistics can be calculated: sender-indegree, sender-outdegree, target-indegree and target-outdegree. The sender-indegree measures how often the current sender was targeted by other senders in the past (i.e., how popular the current sender was). The sender-outdegree measures how often the current sender was involved in an event as the sender (i.e., how active the current sender has been in the past). The target-indegree statistic measures how often the current target was targeted in the past (i.e., how popular the current target was). And the target-outdegree measures how often the current target acted as a sender in the past (i.e., how active the current target was in the past).
For two-mode networks: Two distinct statistics can be calculated: sender-outdegree and target-indegree. Sender-outdegree measures how often the current sender has been involved in an event in the past (i.e. how active the sender has been up until now). The target-indegree statistic measures how often the current target has been involved in an event in the past (i.e. how popular a given target has been before the current event).
An exponential decay function is used to model the effect of time on the endogenous statistics. Each past event that contains the same sender or the same target (depending on the variable specified in degreevar
) and fulfills additional filtering options (specified via event type or event attributes) is weighted with an exponential decay. The further apart the past event is from the present event, the less weight is given to this event. The halflife parameter in the degreeStat()
-function determines at which rate the weights of past events should be reduced.
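The decayed degree count described above can be reproduced by hand. The following base-R sketch computes a sender-outdegree for a toy sequence under the stated assumptions (constant event weights of 1 and the ln(2)/T_1/2 normalization); the real degreeStat() additionally handles null-events, event types and filters.

```r
# sender-outdegree with exponential decay, computed manually
sender <- c('a', 'b', 'a', 'a')
time   <- c(1, 2, 3, 4)
halflife <- 2
lambda <- log(2) / halflife
outdeg <- sapply(seq_along(sender), function(i) {
  # past events with the same sender as the current event
  past <- which(time < time[i] & sender == sender[i])
  # each matching past event contributes an exponentially decayed weight
  sum(exp(-(time[i] - time[past]) * lambda) * lambda)
})
round(outdeg, 3)
```

Note that an event exactly one halflife in the past contributes half the weight of a simultaneous event, which is the defining property of the halflife parameter.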
The eventtypevar- and eventfiltervar-options help filter the past events more specifically. How they are filtered depends on the eventtypevalue- and eventfiltervalue-options.
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time' variable
# (Note: the data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'NIR', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'NAT', 'USA')
target <- c('BNG', 'ZAM', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'BEL', 'PAK')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')
type <- sample(c('cooperation', 'conflict'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = "%y%m%d", data = dt,
                    type = "continuous", byTime = "daily",
                    returnData = TRUE, sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
                        eventAttribute = dt$type,
                        atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
                        returnInputData = TRUE)
## divide up the results: counting process data = 1, original data = 2
dtrem <- dts[[1]]
dt <- dts[[2]]
## merge all necessary event attribute variables back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]
# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# calculate sender-outdegree statistic
dtrem$sender.outdegree <- degreeStat(data = dtrem,
                                     time = dtrem$eventTime,
                                     degreevar = dtrem$sender,
                                     halflife = 2,
                                     eventvar = dtrem$eventDummy,
                                     returnData = FALSE)

# plot sender-outdegree over time
library("ggplot2")
ggplot(dtrem, aes(eventTime, sender.outdegree,
                  group = factor(eventDummy),
                  color = factor(eventDummy))) +
  geom_point() +
  geom_smooth()

# calculate sender-indegree statistic
dtrem$sender.indegree <- degreeStat(data = dtrem,
                                    time = dtrem$eventTime,
                                    degreevar = dtrem$sender,
                                    halflife = 2,
                                    eventvar = dtrem$eventDummy,
                                    degreeOnOtherVar = dtrem$target,
                                    returnData = FALSE)

# calculate target-indegree statistic
dtrem$target.indegree <- degreeStat(data = dtrem,
                                    time = dtrem$eventTime,
                                    degreevar = dtrem$target,
                                    halflife = 2,
                                    eventvar = dtrem$eventDummy,
                                    returnData = FALSE)

# calculate target-outdegree statistic
dtrem$target.outdegree <- degreeStat(data = dtrem,
                                     time = dtrem$eventTime,
                                     degreevar = dtrem$target,
                                     halflife = 2,
                                     eventvar = dtrem$eventDummy,
                                     degreeOnOtherVar = dtrem$sender,
                                     returnData = FALSE)

# calculate target-indegree with typematch
dtrem$target.indegree.tm <- degreeStat(data = dtrem,
                                       time = dtrem$eventTime,
                                       degreevar = dtrem$target,
                                       halflife = 2,
                                       eventtypevar = dtrem$type,
                                       eventtypevalue = "valuematch",
                                       eventvar = dtrem$eventDummy,
                                       returnData = FALSE)
Create the event sequence for relational event models. Continuous or ordinal sequences can be created. Various dates may be excluded from the sequence (e.g. special holidays, specific weekdays or longer time spans).
eventSequence(datevar, dateformat = NULL, data = NULL, type = "continuous", byTime = "daily", excludeDate = NULL, excludeTypeOfDay = NULL, excludeYear = NULL, excludeFrom = NULL, excludeTo = NULL, returnData = FALSE, sortData = FALSE, returnDateSequenceData = FALSE)
datevar |
The variable containing the information on the date and/or time of the event. |
dateformat |
A character string indicating the format of the datevar variable, e.g. '%y%m%d' (see strptime for the format codes). |
data |
An optional data frame containing all the variables. |
type |
A string value: "continuous" (default) or "ordinal". Determines whether a continuous or an ordinal event sequence is created. |
byTime |
String value. Specifies at what interval the event sequence is created. Use "daily", "monthly" or "yearly". |
excludeDate |
An optional string or string vector containing one or more dates that should be excluded from the event sequence. The dates have to be in the same format as provided in dateformat. |
excludeTypeOfDay |
String value or vector naming the day(s) that should be excluded from the event sequence. Depending on the locale, the weekdays may be named differently; use weekdays() on your date variable to check the naming. |
excludeYear |
A string value or vector naming the year(s) that should be excluded from the event sequence. |
excludeFrom |
A string value (or a vector of strings) with the start date (from-value included) of a period that should be excluded from the event sequence. The value has to be in the same format as specified in dateformat. |
excludeTo |
A string value (or a vector of strings) with the end date (to-value included) of a period that should be excluded from the event sequence. The value has to be in the same format as specified in dateformat. |
returnData |
Logical. If TRUE, the data frame is returned with the event sequence variable appended; if FALSE (default), only the event sequence vector is returned. |
sortData |
Logical. If TRUE, the returned data frame is sorted according to the event sequence (only used together with returnData = TRUE). |
returnDateSequenceData |
Logical. If TRUE, the artificial date sequence is returned instead, matching each date to its value in the event sequence. This is useful for calculating start variables for createRemDataset. |
In order to estimate relational event models, the events have to be ordered, either according to an ordinal or a continuous event sequence. The ordinal event sequence simply orders the events and gives each event a place in the sequence.
The continuous event sequence creates an artificial sequence ranging from min(datevar) to max(datevar) and matches each event with its place in the artificial event sequence. Dates, years or weekdays can be excluded from the artificial event sequence. This is useful for excluding specific holidays, weekends, etc.
Where two or more events occur at the same time, they are given the same value in the event sequence.
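The mapping described above can be sketched in base R: a continuous sequence assigns each event its position on an artificial calendar running from the first to the last date, while an ordinal sequence only ranks the distinct event times; simultaneous events share a value in both cases. (This illustrates the logic, not the internals of eventSequence().)

```r
dates <- as.Date(c('1980-01-07', '1980-01-07', '1980-01-09', '1980-01-11'))
# continuous: position on the full daily calendar from min to max date
calendar <- seq(min(dates), max(dates), by = 'day')
seq.cont <- match(dates, calendar)
# ordinal: rank among the distinct event times only
seq.ord <- match(dates, sort(unique(dates)))
seq.cont  # 1 1 3 5
seq.ord   # 1 1 2 3
```

Excluding dates (holidays, weekends) amounts to dropping them from the calendar before matching.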
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time' variable
# (Note: the data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'NIR', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'NAT', 'USA')
target <- c('BNG', 'ZAM', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'BEL', 'PAK')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')

# combine them into a data.frame
dt <- data.frame(sender, target, time)

# create continuous event sequence: return the data with the
# event sequence and sort the data according to the event sequence
dt <- eventSequence(datevar = dt$time, dateformat = '%y%m%d', data = dt,
                    type = 'continuous', byTime = 'daily',
                    returnData = TRUE, sortData = TRUE)

# alternative: create a variable with the continuous event sequence, unsorted
dt$eventSeq <- eventSequence(datevar = dt$time, dateformat = '%y%m%d',
                             data = dt, type = 'continuous', byTime = 'daily',
                             returnData = FALSE, sortData = FALSE)
# manually sort the data set
dt <- dt[order(dt$eventSeq), ]

# create the sequence by month
dt$eventSeqMonthly <- eventSequence(datevar = dt$time, dateformat = '%y%m%d',
                                    data = dt, type = 'continuous',
                                    byTime = 'monthly',
                                    returnData = FALSE, sortData = FALSE)

# create the sequence by year
dt$eventSeqYearly <- eventSequence(datevar = dt$time, dateformat = '%y%m%d',
                                   data = dt, type = 'continuous',
                                   byTime = 'yearly',
                                   returnData = FALSE, sortData = FALSE)

# create an ordinal event sequence
dt$eventSeqOrdinal <- eventSequence(datevar = dt$time, dateformat = '%y%m%d',
                                    data = dt, type = 'ordinal',
                                    byTime = 'daily',
                                    returnData = FALSE, sortData = FALSE)

# exclude certain dates
dt$eventSeqEx <- eventSequence(datevar = dt$time, dateformat = '%y%m%d',
                               data = dt, type = 'continuous', byTime = 'daily',
                               excludeDate = c('800108', '800112'),
                               returnData = FALSE, sortData = FALSE)

# return the sequence data set, where all values in the event sequence
# correspond to the date of the events. Useful to calculate
# start variables for the createRemDataset command.
seq.data <- eventSequence(datevar = dt$time, dateformat = "%y%m%d",
                          data = dt, type = "continuous", byTime = "daily",
                          excludeDate = c("800108", "800112"),
                          returnData = FALSE, sortData = FALSE,
                          returnDateSequenceData = TRUE)
Calculate the endogenous network statistic fourCycle, which measures the tendency for events to close four-cycles in two-mode event sequences.
fourCycleStat(data, time, sender, target, halflife, weight = NULL,
  eventtypevar = NULL, eventtypevalue = 'standard',
  eventfiltervar = NULL, eventfilterAB = NULL, eventfilterAJ = NULL,
  eventfilterIB = NULL, eventfilterIJ = NULL, eventvar = NULL,
  variablename = 'fourCycle', returnData = FALSE, dataPastEvents = NULL,
  showprogressbar = FALSE, inParallel = FALSE, cluster = NULL)
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
halflife |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight should be halved. For example, with halflife = 2, an event that occurred 2 time units before the current event is given half the weight of a current event. |
weight |
An optional numeric variable that represents the weight of each event. If |
eventtypevar |
An optional variable that represents the type of the event. Use |
eventtypevalue |
An optional value (or set of values) used to specify how past events should be filtered depending on their type. |
eventfiltervar |
An optional variable that allows filtering of past events using an event attribute. It can be a sender attribute, a target attribute, a time attribute or a dyad attribute.
Use |
eventfilterAB |
An optional value used to specify how past events should be filtered depending on their attribute. Each of the distinct edges that form a four-cycle can be filtered. |
eventfilterAJ |
see |
eventfilterIB |
see |
eventfilterIJ |
see |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If the |
variablename |
An optional value (or values) with the name the four-cycle statistic variable should be given. To be used if |
returnData |
|
dataPastEvents |
An optional |
showprogressbar |
|
inParallel |
|
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize. By specifying a cluster using the |
The fourCycleStat()
-function calculates an endogenous statistic that measures whether events have a tendency to form four cycles.
The effect is calculated as follows: the network of past events at time $t$, $E_t$, includes all events $e$ with $t_e < t$. Each past event consists of a sender $s_e$ and a target $r_e$, and past events are aggregated through a weight function $w(s, r, t)$:

$$ w(s, r, t) = \sum_{e:\, s_e = s,\, r_e = r,\, t_e < t} w_e \cdot e^{-(t - t_e)\cdot\frac{\ln 2}{T_{1/2}}} \cdot \frac{\ln 2}{T_{1/2}}, $$

where $w_e$ is the event weight (usually a constant set to 1 for each event), $t$ is the current event time, $t_e$ is the past event time and $T_{1/2}$ is the halflife parameter.
For the four-cycle effect, the past events are filtered to include only events where the current event closes an open four-cycle in the past.
An exponential decay function is used to model the effect of time on the endogenous statistics. The further apart the past event is from the present event, the less weight is given to this event. The halflife parameter in the fourCycleStat() function determines at which rate the weights of past events should be reduced. Therefore, if one (or more) of the three past events in the four-cycle occurred further in the past, less weight is given to this four-cycle, because it becomes less likely that the two senders reacted to each other in the way the four-cycle assumes.
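To make the decay described above concrete, here is a minimal sketch (an illustration, not code from the rem package; the function name decay_weight is invented here) of the exponential decay factor for a single past event. The constant factor ln(2)/halflife that also appears in the weight function scales all statistics equally for a fixed halflife and is omitted:

```r
# Exponential-decay factor for a past event: an event that lies exactly
# one halflife in the past contributes half the weight of a current event.
decay_weight <- function(current_time, past_time, halflife) {
  exp(-(current_time - past_time) * log(2) / halflife)
}

decay_weight(current_time = 25, past_time = 5, halflife = 20)  # 0.5
decay_weight(current_time = 45, past_time = 5, halflife = 20)  # 0.25
```

With halflife = 20, as in the example below, an event 20 time units in the past counts half as much as a current event, and an event 40 time units in the past a quarter as much.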
The eventtypevar
- and eventfiltervar
-options help filter the past events more specifically. How they are filtered depends on the eventtypevalue
- and eventfilter__
-option.
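To see what "closing an open four-cycle" means operationally, the following toy sketch (an invented helper, not part of the package) checks whether a new two-mode event (a, b) is preceded by past events a-j, i-j and i-b for some other sender i and target j. The package additionally weights each such configuration by the decayed weights of its three past events:

```r
# Does a new two-mode event (a, b) close a four-cycle?
# It does if some other sender i and target j exist such that
# the past events a-j, i-j and i-b have all occurred.
closes_four_cycle <- function(past, a, b) {
  js <- unique(past$target[past$sender == a & past$target != b])  # targets a used
  is <- unique(past$sender[past$target == b & past$sender != a])  # other senders of b
  for (i in is) for (j in js) {
    if (any(past$sender == i & past$target == j)) return(TRUE)
  }
  FALSE
}

past <- data.frame(sender = c('A', 'I', 'I'),
                   target = c('J', 'J', 'B'))
closes_four_cycle(past, a = 'A', b = 'B')  # TRUE: A-J, I-J, I-B are all present
closes_four_cycle(past, a = 'I', b = 'J')  # FALSE: no second sender-target pair
```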
Laurence Brandenberger [email protected]
# create some two-mode network event sequence data with
# a 'sender', 'target' and a 'time'-variable
sender <- c('A', 'B', 'A', 'C', 'A', 'D', 'F', 'G', 'A', 'B', 'B',
            'C', 'D', 'E', 'F', 'B', 'C', 'D', 'E', 'C', 'A', 'F',
            'E', 'B', 'C', 'E', 'D', 'G', 'A', 'G', 'F', 'B', 'C')
target <- c('T1', 'T2', 'T3', 'T2', 'T1', 'T4', 'T6', 'T2', 'T4', 'T5', 'T5',
            'T5', 'T1', 'T6', 'T7', 'T2', 'T3', 'T1', 'T1', 'T4', 'T5', 'T6',
            'T8', 'T2', 'T7', 'T1', 'T6', 'T7', 'T3', 'T4', 'T7', 'T8', 'T2')
time <- c('03.01.15', '04.01.15', '10.02.15', '28.02.15', '01.03.15',
          '07.03.15', '07.03.15', '12.03.15', '04.04.15', '28.04.15',
          '06.05.15', '11.05.15', '13.05.15', '17.05.15', '22.05.15',
          '09.08.15', '09.08.15', '14.08.15', '16.08.15', '29.08.15',
          '05.09.15', '25.09.15', '02.10.15', '03.10.15', '11.10.15',
          '18.10.15', '20.10.15', '28.10.15', '04.11.15', '09.11.15',
          '10.12.15', '11.12.15', '12.12.15')
type <- sample(c('con', 'pro'), 33, replace = TRUE)
important <- sample(c('important', 'not important'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type, important)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = '%d.%m.%y', data = dt,
                    type = 'continuous', byTime = "daily",
                    returnData = TRUE, sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
                        eventAttribute = dt$type, atEventTimesOnly = TRUE,
                        untilEventOccurrs = TRUE, returnInputData = TRUE)

## divide up the results: counting process data = 1, original data = 2
dtrem <- dts[[1]]
dt <- dts[[2]]

## merge all necessary event attribute variables back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]
dtrem$important <- dt$important[match(dtrem$eventID, dt$eventID)]

# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# calculate closing four-cycle statistic
dtrem$fourCycle <- fourCycleStat(data = dtrem, time = dtrem$eventTime,
                                 sender = dtrem$sender, target = dtrem$target,
                                 eventvar = dtrem$eventDummy, halflife = 20)

# plot closing four-cycles over time
library("ggplot2")
ggplot(dtrem, aes(eventTime, fourCycle,
                  group = factor(eventDummy),
                  color = factor(eventDummy))) +
  geom_point() +
  geom_smooth()

# calculate positive closing four-cycles: general support
dtrem$fourCycle.pos <- fourCycleStat(data = dtrem, time = dtrem$eventTime,
                                     sender = dtrem$sender,
                                     target = dtrem$target,
                                     eventvar = dtrem$eventDummy,
                                     eventtypevar = dtrem$type,
                                     eventtypevalue = 'positive',
                                     halflife = 20)

# calculate negative closing four-cycles: general opposition
dtrem$fourCycle.neg <- fourCycleStat(data = dtrem, time = dtrem$eventTime,
                                     sender = dtrem$sender,
                                     target = dtrem$target,
                                     eventvar = dtrem$eventDummy,
                                     eventtypevar = dtrem$type,
                                     eventtypevalue = 'negative',
                                     halflife = 20)
Calculate the endogenous network statistic inertia
for relational event models. inertia
measures the tendency for events to consist of the same sender and target (i.e. repeated events).
inertiaStat(data, time, sender, target, halflife, weight = NULL,
  eventtypevar = NULL, eventtypevalue = "valuematch",
  eventfiltervar = NULL, eventfiltervalue = NULL,
  eventvar = NULL, variablename = "inertia", returnData = FALSE,
  showprogressbar = FALSE, inParallel = FALSE, cluster = NULL)
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
halflife |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight should be halved. For example, with halflife = 2, an event that occurred 2 time units before the current event is given half the weight of a current event. |
weight |
An optional numeric variable that represents the weight of each event. If |
eventtypevar |
An optional variable that represents the type of the event. Use |
eventtypevalue |
An optional value (or set of values) used to specify how past events should be filtered depending on their type.
|
eventfiltervar |
An optional numeric, character or factor variable for each event. If |
eventfiltervalue |
An optional character string that represents the value for which past events should be filtered. To filter the current events, use |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If the |
variablename |
An optional value (or values) with the name the inertia statistic variable should be given. To be used if |
returnData |
|
showprogressbar |
|
inParallel |
|
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize. By specifying a cluster using the |
The inertiaStat()
-function calculates an endogenous statistic that measures whether events have a tendency to be repeated with the same sender and target over the entire event sequence.
The effect is calculated as follows: the network of past events at time $t$, $E_t$, includes all events $e$ with $t_e < t$. Each past event consists of a sender $s_e$ and a target $r_e$, and past events are aggregated through a weight function $w(s, r, t)$:

$$ w(s, r, t) = \sum_{e:\, s_e = s,\, r_e = r,\, t_e < t} w_e \cdot e^{-(t - t_e)\cdot\frac{\ln 2}{T_{1/2}}} \cdot \frac{\ln 2}{T_{1/2}}, $$

where $w_e$ is the event weight (usually a constant set to 1 for each event), $t$ is the current event time, $t_e$ is the past event time and $T_{1/2}$ is the halflife parameter.
For the inertia effect, the past events are filtered to include only events where the sender and target are identical to the current sender and target, so that $\mathrm{inertia}(s, r, t) = w(s, r, t)$.
An exponential decay function is used to model the effect of time on the endogenous statistics. Each past event that contains the same sender and target and fulfills additional filtering options specified via event type or event attributes is weighted with an exponential decay. The further apart the past event is from the present event, the less weight is given to this event. The halflife parameter in the inertiaStat() function determines at which rate the weights of past events should be reduced.
The eventfiltervar
- and eventtypevar
-options help filter the past events more specifically. How they are filtered depends on the eventfiltervalue
- and eventtypevalue
-option.
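As a cross-check of the definition above, inertia for a single dyad can be computed naively as the decayed count of past events with the same sender and target. The sketch below is toy code, not the package implementation (inertia_naive is an invented name, and the constant ln(2)/halflife factor of the weight function, which scales all values equally, is omitted):

```r
# Naive inertia for dyad (s, r) at time t: sum of exponentially
# decayed weights of past events with the same sender and target.
inertia_naive <- function(events, s, r, t, halflife) {
  past <- events[events$sender == s & events$target == r & events$time < t, ]
  sum(exp(-(t - past$time) * log(2) / halflife))
}

events <- data.frame(sender = c('A', 'A', 'B'),
                     target = c('X', 'X', 'X'),
                     time   = c(1, 3, 4))
# two past A->X events, 4 and 2 time units old: 0.25 + 0.5
inertia_naive(events, s = 'A', r = 'X', t = 5, halflife = 2)  # 0.75
```

The B->X event is ignored because inertia only counts repeats of the exact sender-target pair.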
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time'-variable
# (Note: Data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'NIR', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'NAT', 'USA')
target <- c('BNG', 'ZAM', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'BEL', 'PAK')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')
type <- sample(c('cooperation', 'conflict'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = "%y%m%d", data = dt,
                    type = "continuous", byTime = "daily",
                    returnData = TRUE, sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
                        eventAttribute = dt$type, atEventTimesOnly = TRUE,
                        untilEventOccurrs = TRUE, returnInputData = TRUE)

## divide up the results: counting process data = 1, original data = 2
dtrem <- dts[[1]]
dt <- dts[[2]]

## merge all necessary event attribute variables back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]

# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# calculate inertia statistics
dtrem$inertia <- inertiaStat(data = dtrem, time = dtrem$eventTime,
                             sender = dtrem$sender, target = dtrem$target,
                             eventvar = dtrem$eventDummy, halflife = 2,
                             returnData = FALSE, showprogressbar = FALSE)

# plot inertia over time
library("ggplot2")
ggplot(dtrem, aes(eventTime, inertia,
                  group = factor(eventDummy),
                  color = factor(eventDummy))) +
  geom_point() +
  geom_smooth()

# inertia with typematch (e.g. for 'cooperation' events only count
# past 'cooperation' events)
dtrem$inertia.tm <- inertiaStat(data = dtrem, time = dtrem$eventTime,
                                sender = dtrem$sender, target = dtrem$target,
                                eventvar = dtrem$eventDummy, halflife = 2,
                                eventtypevar = dtrem$type,
                                eventtypevalue = "valuematch",
                                returnData = FALSE, showprogressbar = FALSE)

# inertia with valuemix: for each combination of types
# in the eventtypevar, create a variable
dtrem <- inertiaStat(data = dtrem, time = dtrem$eventTime,
                     sender = dtrem$sender, target = dtrem$target,
                     eventvar = dtrem$eventDummy, halflife = 2,
                     eventtypevar = dtrem$type,
                     eventtypevalue = "valuemix",
                     returnData = TRUE, showprogressbar = FALSE)
Calculate the endogenous network statistic reciprocity
for relational event models. reciprocity
measures the tendency for senders to reciprocate prior events where they were targeted by other senders. One-mode network statistic only.
reciprocityStat(data, time, sender, target, halflife, weight = NULL,
  eventtypevar = NULL, eventtypevalue = "valuematch",
  eventfiltervar = NULL, eventfiltervalue = NULL,
  eventvar = NULL, variablename = "recip", returnData = FALSE,
  showprogressbar = FALSE, inParallel = FALSE, cluster = NULL)
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
halflife |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight should be halved. For example, with halflife = 2, an event that occurred 2 time units before the current event is given half the weight of a current event. |
weight |
An optional numeric variable that represents the weight of each event. If |
eventtypevar |
An optional variable that represents the type of the event. Use |
eventtypevalue |
An optional value (or set of values) used to specify how past events should be filtered depending on their type.
|
eventfiltervar |
An optional numeric, character or factor variable for each event. If |
eventfiltervalue |
An optional character string that represents the value for which past events should be filtered. To filter the current events, use |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If the |
variablename |
An optional value (or values) with the name the reciprocity
statistic variable should be given. To be used if |
returnData |
|
showprogressbar |
|
inParallel |
|
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize. By specifying a cluster using the |
The reciprocityStat()
-function calculates an endogenous statistic that measures whether senders have a tendency to reciprocate events.
The effect is calculated as follows: the network of past events at time $t$, $E_t$, includes all events $e$ with $t_e < t$. Each past event consists of a sender $s_e$ and a target $r_e$, and past events are aggregated through a weight function $w(s, r, t)$:

$$ w(s, r, t) = \sum_{e:\, s_e = s,\, r_e = r,\, t_e < t} w_e \cdot e^{-(t - t_e)\cdot\frac{\ln 2}{T_{1/2}}} \cdot \frac{\ln 2}{T_{1/2}}, $$

where $w_e$ is the event weight (usually a constant set to 1 for each event), $t$ is the current event time, $t_e$ is the past event time and $T_{1/2}$ is the halflife parameter.
For the reciprocity effect, the past events are filtered to include only events where the senders are the present targets and the targets are the present senders, so that $\mathrm{reciprocity}(s, r, t) = w(r, s, t)$.
An exponential decay function is used to model the effect of time on the endogenous statistics. Each past event that involves the sender as target and the target as sender, and fulfills additional filtering options specified via event type or event attributes, is weighted with an exponential decay. The further apart the past event is from the present event, the less weight is given to this event. The halflife parameter in the reciprocityStat()
-function determines at which rate the weights of past events should be reduced.
The eventtypevar- and eventfiltervar-options help filter the past events more specifically. How they are filtered depends on the eventtypevalue- and eventfiltervalue-option.
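Reciprocity is the mirror image of inertia: the same decayed count, but over past events in the opposite direction, from the current target to the current sender. A toy sketch (reciprocity_naive is an invented name, not the package implementation; the constant ln(2)/halflife factor of the weight function, which scales all values equally, is omitted):

```r
# Naive reciprocity for dyad (s, r) at time t: sum of exponentially
# decayed weights of past events in the opposite direction, r -> s.
reciprocity_naive <- function(events, s, r, t, halflife) {
  past <- events[events$sender == r & events$target == s & events$time < t, ]
  sum(exp(-(t - past$time) * log(2) / halflife))
}

events <- data.frame(sender = c('B', 'B', 'A'),
                     target = c('A', 'A', 'B'),
                     time   = c(1, 3, 4))
# two past B->A events, 4 and 2 time units old: 0.25 + 0.5
reciprocity_naive(events, s = 'A', r = 'B', t = 5, halflife = 2)  # 0.75
```

The A->B event is ignored: only prior events in which the current sender was targeted by the current target count toward reciprocity.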
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time'-variable
# (Note: Data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'NIR', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'NAT', 'USA')
target <- c('BNG', 'ZAM', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'BEL', 'PAK')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')
type <- sample(c('cooperation', 'conflict'), 33, replace = TRUE)
important <- sample(c('important', 'not important'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type, important)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = "%y%m%d", data = dt,
                    type = "continuous", byTime = "daily",
                    returnData = TRUE, sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
                        eventAttribute = dt$type, atEventTimesOnly = TRUE,
                        untilEventOccurrs = TRUE, returnInputData = TRUE)

## divide up the results: counting process data = 1, original data = 2
dtrem <- dts[[1]]
dt <- dts[[2]]

## merge all necessary event attribute variables back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]
dtrem$important <- dt$important[match(dtrem$eventID, dt$eventID)]

# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# calculate reciprocity statistic
dtrem$recip <- reciprocityStat(data = dtrem, time = dtrem$eventTime,
                               sender = dtrem$sender, target = dtrem$target,
                               eventvar = dtrem$eventDummy, halflife = 2)

# plot reciprocity over time
library("ggplot2")
ggplot(dtrem, aes(eventTime, recip,
                  group = factor(eventDummy),
                  color = factor(eventDummy))) +
  geom_point() +
  geom_smooth()

# calculate reciprocity statistic with typematch
# if a cooperated with b in the past, does
# b cooperate with a now?
dtrem$recip.typematch <- reciprocityStat(data = dtrem, time = dtrem$eventTime,
                                         sender = dtrem$sender,
                                         target = dtrem$target,
                                         eventvar = dtrem$eventDummy,
                                         eventtypevar = dtrem$type,
                                         eventtypevalue = 'valuematch',
                                         halflife = 2)

# calculate reciprocity with valuemix on type
dtrem <- reciprocityStat(data = dtrem, time = dtrem$eventTime,
                         sender = dtrem$sender, target = dtrem$target,
                         eventvar = dtrem$eventDummy,
                         eventtypevar = dtrem$type,
                         eventtypevalue = 'valuemix',
                         halflife = 2, returnData = TRUE)

# calculate reciprocity and count important events only
dtrem$recip.filtered <- reciprocityStat(data = dtrem, time = dtrem$eventTime,
                                        sender = dtrem$sender,
                                        target = dtrem$target,
                                        eventvar = dtrem$eventDummy,
                                        eventfiltervar = dtrem$important,
                                        eventfiltervalue = 'important',
                                        halflife = 2)
Calculate the endogenous network statistic similarity
for relational event models. similarityStat
measures the tendency for senders to adapt their behavior to that of their peers.
similarityStat(data, time, sender, target, senderOrTarget = 'sender', whichSimilarity = NULL, halflifeLastEvent = NULL, halflifeTimeBetweenEvents = NULL, eventtypevar = NULL, eventfiltervar = NULL, eventfiltervalue = NULL, eventvar = NULL, variablename = 'similarity', returnData = FALSE, dataPastEvents = NULL, showprogressbar = FALSE, inParallel = FALSE, cluster = NULL )
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
senderOrTarget |
A string value: either 'sender' or 'target', depending on whether sender similarity or target similarity should be calculated. |
whichSimilarity |
An optional string value ('total' or 'average') that specifies which similarity measure should be calculated. |
halflifeLastEvent |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight should be halved. For sender similarity: the halflife determines the weight of the count of targets that two actors have in common. The further back the second sender was active, the less weight is given to the similarity between this sender and the current sender. For target similarity: the halflife determines the weight of the count of targets that have both been used by other senders in the past. The longer ago the current sender engaged in an event with the other target, the less weight is given to the count. |
halflifeTimeBetweenEvents |
A numeric value that is used in the decay function. Instead of counting each past event for the similarity statistic, each event's weight is reduced depending on the time that passed between the current event and the past event. For sender similarity: each target that two actors have in common is weighted by the time that passed between the two events. For target similarity: each sender that two targets have in common is weighted by the time that passed between the two events. |
eventtypevar |
An optional dummy variable that represents the type of the event. If specified, only past events that reflect the same type as the current event are considered for the count (typematch). |
eventfiltervar |
An optional variable that filters past events by the attribute specified in eventfiltervalue. |
eventfiltervalue |
A string that represents an event attribute by which all past events are filtered. |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If specified, only true events are considered in the calculation of the statistic. |
variablename |
An optional value (or values) with the name the similarity statistic variable should be given. To be used if returnData = TRUE. |
returnData |
Boolean option; default = FALSE. If TRUE, the data frame provided via data is returned with the calculated statistic merged in as an additional variable. If FALSE, only the vector with the statistic is returned. |
dataPastEvents |
An optional data frame with past events that occurred before the current event sequence and should be included in the calculation of the statistic. |
showprogressbar |
Boolean option; default = FALSE. If TRUE, a progress bar is displayed during the calculation of the statistic. |
inParallel |
Boolean option; default = FALSE. If TRUE, the statistic is calculated in parallel on the cluster specified via cluster. |
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize the calculation. Alternatively, a predefined cluster object can be provided. |
The similarityStat()
-function calculates an endogenous statistic that measures whether senders (or targets) have a tendency to cluster together. Two distinct types of similarity measures can be calculated: sender similarity and target similarity.
Sender similarity: How many targets does the current sender have in common with senders who used the current target in the past? How likely is it that two senders are alike?
The function proceeds as follows:
First, it compiles the list of targets the current sender has used in the past
Next, it identifies all other senders that have also used the current target
For each of the senders found in (2), it compiles a list of targets that this sender has used in the past
For each of the senders found in (2), it cross-checks the two lists generated in (1) and (3) and counts how many targets the two senders have in common
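Ignoring the decay weighting, the four steps above can be sketched in base R (a simplified toy illustration with hypothetical data, not the package implementation):

```r
# toy past-event history: one row per past event
past <- data.frame(sender = c("a", "b", "b", "c", "a"),
                   target = c("x", "x", "y", "y", "z"),
                   stringsAsFactors = FALSE)
cur.sender <- "a"; cur.target <- "x"   # current event: a -> x

# (1) targets the current sender has used in the past
targets.A <- unique(past$target[past$sender == cur.sender])
# (2) other senders that have also used the current target
senders.B <- setdiff(unique(past$sender[past$target == cur.target]),
                     cur.sender)
# (3) + (4) for each such sender, count the targets shared with the
# current sender
sim <- sapply(senders.B, function(b) {
  length(intersect(targets.A, unique(past$target[past$sender == b])))
})
sum(sim)  # unweighted sender-similarity count: 1 in this toy example
```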
Target similarity: How many senders have used the same two concepts that the current sender has used (in the past and is currently using)? For each target that the current sender has used in the past, how many senders have also used these past targets as well as the current target? How likely is it that two targets are used together?
The function proceeds as follows:
First, it compiles the list of targets the current sender has used in the past
Next, it identifies all senders that have also used the current target
For each target found in (1), it compiles a list of senders that have also used this past target
For each target found in (1), it cross-checks the list of senders that have used the current target (found under (2)) against the list of senders that have also used this past target (found under (3)) and counts the overlap
Two decay functions may be used in the calculation of the similarity score for each event.
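Both decay options rely on exponential halflife weighting. Assuming the standard form (an event that lies one halflife in the past receives half the weight of a current event), the decay can be sketched as:

```r
# exponential decay: the weight halves every `halflife` time units
decay.weight <- function(t.now, t.past, halflife) {
  exp(-(t.now - t.past) * log(2) / halflife)
}

decay.weight(10, 10, halflife = 2)  # 1.00: no time elapsed
decay.weight(10,  8, halflife = 2)  # 0.50: one halflife ago
decay.weight(10,  6, halflife = 2)  # 0.25: two halflives ago
```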
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time'-variable
# (Note: Data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'NIR', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'NAT', 'USA')
target <- c('BNG', 'ZAM', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'BEL', 'PAK')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')
type <- sample(c('cooperation', 'conflict'), 33, replace = TRUE)
important <- sample(c('important', 'not important'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type, important)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = "%y%m%d", data = dt,
    type = "continuous", byTime = "daily", returnData = TRUE,
    sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
    eventAttribute = dt$type, atEventTimesOnly = TRUE,
    untilEventOccurrs = TRUE, returnInputData = TRUE)

## divide up the results: counting process data = 1, original data = 2
dtrem <- dts[[1]]
dt <- dts[[2]]

## merge all necessary event attribute variables back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]
dtrem$important <- dt$important[match(dtrem$eventID, dt$eventID)]

# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# average sender similarity
dtrem$s.sim.av <- similarityStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    eventvar = dtrem$eventDummy,
    senderOrTarget = "sender",
    whichSimilarity = "average")

# average target similarity
dtrem$t.sim.av <- similarityStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    eventvar = dtrem$eventDummy,
    senderOrTarget = "target",
    whichSimilarity = "average")

# Calculate sender similarity with 1 halflife parameter: this parameter
# makes sure that the other senders (with whom you compare your targets)
# have been active in the past. The longer they have been inactive, the
# less weight is given to the number of similar targets.
dtrem$s.sim.hl2 <- similarityStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    eventvar = dtrem$eventDummy,
    senderOrTarget = "sender",
    halflifeLastEvent = 2)

# Calculate sender similarity with 2 halflife parameters: the first
# parameter makes sure that the actors against whom you compare yourself
# have been active in the recent past. The second halflife parameter makes
# sure that the two events containing the same targets (once by the current
# actor, once by the other actor) are not that far apart. The further apart,
# the less likely it is that the current sender will remember how the
# similar past sender has acted.
dtrem$s.sim.hl2.hl1 <- similarityStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    eventvar = dtrem$eventDummy,
    senderOrTarget = "sender",
    halflifeLastEvent = 2,
    halflifeTimeBetweenEvents = 1)
Calculate time-to-next-event or time-since-date for a REM data set.
timeToEvent(time, type = 'time-to-next-event', timeEventPossible = NULL)
time |
An integer or Date variable reflecting the time of the event. Note: make sure to specify the event time, not the event sequence, in a counting process data set. |
type |
Either 'time-to-next-event' or 'time-since-date'. |
timeEventPossible |
An optional integer or Date variable to be used if type = 'time-since-date'. |
To come.
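As a rough illustration of the two type-options (a hypothetical base-R sketch of the intended quantities, not the package implementation):

```r
event.time      <- c(1, 3, 6, 6, 10)  # hypothetical event-sequence times
available.since <- c(0, 0, 2, 2, 2)   # hypothetical reference dates per event

# 'time-since-date': time elapsed since the reference date
time.since <- event.time - available.since

# 'time-to-next-event' (one plausible reading): spell length from the
# previous event time to the current one
time.to.next <- event.time - c(0, head(event.time, -1))

time.since    # 1 3 4 4 8
time.to.next  # 1 2 3 0 4
```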
Laurence Brandenberger [email protected]
## get some random data
dt <- data.frame(
  sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
  target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
  date = c(rep('10.01.90', 2), '11.01.90', '04.01.90', '05.01.90',
           rep('10.01.90', 2)),
  start = c(0, 0, 1, 1, 1, 3, 3),
  end = rep(6, 7),
  targetAvailableSince = c(rep(-10, 6), -2),
  dateTargetAvailable = c(rep('31.12.89', 6), '01.01.90')
)

## create event sequence
dt <- eventSequence(dt$date, dateformat = '%d.%m.%y', data = dt,
    type = "continuous", byTime = "daily", excludeDate = '07.01.90',
    returnData = TRUE, sortData = TRUE, returnDateSequenceData = FALSE)

## also return the sequence data
dt.seq <- eventSequence(dt$date, dateformat = '%d.%m.%y', data = dt,
    type = "continuous", byTime = "daily", excludeDate = '07.01.90',
    returnDateSequenceData = TRUE)

## create counting process data set
dts <- createRemDataset(data = dt, sender = dt$sender, target = dt$target,
    eventSequence = dt$event.seq.cont, eventAttribute = NULL, time = NULL,
    start = dt$start, startDate = NULL, end = dt$end, endDate = NULL,
    timeformat = NULL, atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
    includeAllPossibleEvents = FALSE, possibleEvents = NULL,
    returnInputData = TRUE)

## divide up the results: counting process data = 1, original data = 2
dt.rem <- dts[[1]]
dt <- dts[[2]]

## merge all necessary event attribute variables back in
dt.rem$targetAvailableSince <- dt$targetAvailableSince[match(dt.rem$eventID, dt$eventID)]
dt.rem$dateTargetAvailable <- dt$dateTargetAvailable[match(dt.rem$eventID, dt$eventID)]

## add dates to the eventTime
dt.rem$eventDate <- dt.seq$date.sequence[match(dt.rem$eventTime, dt.seq$event.sequence)]

## sort the data frame according to eventTime
dt.rem <- dt.rem[order(dt.rem$eventTime), ]

## 1. numeric, time-to-next-event
dt.rem$timeToNextEvent <- timeToEvent(as.integer(dt.rem$eventTime))

## 2. numeric, time-since
dt.rem$timeSince <- timeToEvent(dt.rem$eventTime,
    type = 'time-since-date', dt.rem$targetAvailableSince)

## 3. Date, time-to-next-event
# Since the event sequence excluded 07.01.90, the time to next event differs
# between the integer specification (1) and the Date specification (3).
# To be consistent, pick the eventTime instead of the Date variable.
dt.rem$timeToNextEvent2 <- timeToEvent(as.Date(dt.rem$eventDate, '%d.%m.%y'))

## 4. Date, time-since
dt.rem$timeSince2 <- timeToEvent(as.Date(dt.rem$eventDate, '%d.%m.%y'),
    type = 'time-since-date',
    as.Date(dt.rem$dateTargetAvailable, '%d.%m.%y'))
Calculate the endogenous network statistic triads
that measures the tendency for events to close open triads.
triadStat(data, time, sender, target, halflife, weight = NULL, eventtypevar = NULL, eventtypevalues = NULL, eventfiltervar = NULL, eventfilterAI = NULL, eventfilterBI = NULL, eventfilterAB = NULL, eventvar = NULL, variablename = 'triad', returnData = FALSE, showprogressbar = FALSE, inParallel = FALSE, cluster = NULL )
data |
A data frame containing all the variables. |
time |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
halflife |
A numeric value that is used in the decay function. The vector of past events is weighted by an exponential decay function using the specified halflife. The halflife parameter determines after how long a period the event weight should be halved. E.g., if halflife = 2, an event that occurred two time units before the current event receives half the weight of a current event. |
weight |
An optional numeric variable that represents the weight of each event. If weight = NULL, each event is given a weight of 1. |
eventtypevar |
An optional dummy variable that represents the type of the event. Use eventtypevalues to specify how past events should be filtered by their type. |
eventtypevalues |
Two string values that represent the type of the past events. The first string value represents the eventtype that exists for all past events that include the current sender (either as sender or target) and a third actor. The second value represents the eventtype for all past events that include the target (either as sender or target) as well as the third actor.
An example: Let the eventtypevar indicate whether an event reflects 'cooperation' or 'conflict'. Setting eventtypevalues = c('cooperation', 'cooperation') then counts only closing triads among cooperative past events (a friend-of-friend statistic); the other combinations yield friend-of-enemy, enemy-of-friend and enemy-of-enemy statistics (see the examples below). |
eventfiltervar |
An optional string (or factor or numeric) variable that can be used to filter past and current events. Use eventfilterAI, eventfilterBI or eventfilterAB to specify the values by which the events should be filtered. |
eventfilterAI |
An optional value used to specify how past events should be filtered depending on their attribute. Each distinct edge that forms a triad can be filtered: eventfilterAI filters the past events between the current sender A and a third actor I. |
eventfilterBI |
See eventfilterAI. Filters the past events between the current target B and a third actor I. |
eventfilterAB |
See eventfilterAI. Filters the current event between sender A and target B. |
eventvar |
An optional dummy variable with 0 values for null-events and 1 values for true events. If specified, only true events are considered in the calculation of the statistic. |
variablename |
An optional value (or values) with the name the triad statistic variable should be given. To be used if returnData = TRUE. |
returnData |
Boolean option; default = FALSE. If TRUE, the data frame provided via data is returned with the calculated statistic merged in as an additional variable. If FALSE, only the vector with the statistic is returned. |
showprogressbar |
Boolean option; default = FALSE. If TRUE, a progress bar is displayed during the calculation of the statistic. |
inParallel |
Boolean option; default = FALSE. If TRUE, the statistic is calculated in parallel on the cluster specified via cluster. |
cluster |
An optional numeric or character value that defines the cluster. By specifying a single number, the cluster option uses the provided number of nodes to parallelize the calculation. Alternatively, a predefined cluster object can be provided. |
The triadStat()
-function calculates an endogenous statistic that measures whether events have a tendency to form closing triads.
The effect is calculated as follows:
G_t represents the network of past events at time t and includes all past events E_t. These events each consist of a sender a, a target b and a weight function w_t(a, b):
w_t(a, b) = sum over all past events e in E_t with sender a and target b of: w_e * exp( -(t - t_e) * ln(2) / T_1/2 ) * ln(2) / T_1/2,
where w_e is the event weight (usually a constant set to 1 for each event), t is the current event time, t_e is the past event time and T_1/2 is a halflife parameter.
For the triad effect, the past events are filtered to include only events
where the current event closes an open triad in the past.
An exponential decay function is used to model the effect of time on the endogenous statistics. The further apart the past event is from the present event, the less weight is given to this event. The halflife parameter in the triadStat()
-function determines at which rate the weights of past events should be reduced. Therefore, if one (or both) of the two past events in the triad occurred further in the past, less weight is given to this triad, because it becomes less likely that the sender and target actors reacted to each other in the way the triad assumes.
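Setting the decay weights aside, the triad-closing logic can be sketched in base R (a simplified toy illustration with hypothetical data, not the package implementation): a current event A -> B closes a triad for every third actor I with past A-I and B-I ties.

```r
# toy past-event history: one row per past event
past <- data.frame(sender = c("A", "C", "B", "D"),
                   target = c("C", "X", "C", "B"),
                   stringsAsFactors = FALSE)
A <- "A"; B <- "B"   # current event: A -> B

# actors tied to `actor` in past events (as sender or target)
tied.to <- function(actor, ev) {
  unique(c(ev$target[ev$sender == actor], ev$sender[ev$target == actor]))
}

# third actors I with both an A-I and a B-I tie in the past
third.actors <- setdiff(intersect(tied.to(A, past), tied.to(B, past)),
                        c(A, B))
length(third.actors)  # unweighted triad count: 1 here (via actor "C")
```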
The eventtypevar- and eventfiltervar-options help filter the past events more specifically. How they are filtered depends on the eventtypevalues- and eventfilterAI-/eventfilterBI-/eventfilterAB-options.
Laurence Brandenberger [email protected]
# create some data with 'sender', 'target' and a 'time'-variable
# (Note: Data used here are random events from the Correlates of War Project)
sender <- c('TUN', 'UNK', 'NIR', 'TUR', 'TUR', 'USA', 'URU', 'IRQ', 'MOR',
            'BEL', 'EEC', 'USA', 'IRN', 'IRN', 'USA', 'AFG', 'ETH', 'USA',
            'SAU', 'IRN', 'IRN', 'ROM', 'USA', 'USA', 'PAN', 'USA', 'USA',
            'YEM', 'SYR', 'AFG', 'NAT', 'UNK', 'IRN')
target <- c('BNG', 'RUS', 'JAM', 'SAU', 'MOM', 'CHN', 'IRQ', 'AFG', 'AFG',
            'EEC', 'BEL', 'ITA', 'RUS', 'UNK', 'IRN', 'RUS', 'AFG', 'ISR',
            'ARB', 'USA', 'USA', 'USA', 'AFG', 'IRN', 'IRN', 'IRN', 'AFG',
            'PAL', 'ARB', 'USA', 'EEC', 'IRN', 'CHN')
time <- c('800107', '800107', '800107', '800109', '800109', '800109',
          '800111', '800111', '800111', '800113', '800113', '800113',
          '800114', '800114', '800114', '800116', '800116', '800116',
          '800119', '800119', '800119', '800122', '800122', '800122',
          '800124', '800125', '800125', '800127', '800127', '800127',
          '800204', '800204', '800204')
type <- sample(c('cooperation', 'conflict'), 33, replace = TRUE)
important <- sample(c('important', 'not important'), 33, replace = TRUE)

# combine them into a data.frame
dt <- data.frame(sender, target, time, type, important)

# create event sequence and order the data
dt <- eventSequence(datevar = dt$time, dateformat = "%y%m%d", data = dt,
    type = "continuous", byTime = "daily", returnData = TRUE,
    sortData = TRUE)

# create counting process data set (with null-events) - conditional logit setting
dts <- createRemDataset(dt, dt$sender, dt$target, dt$event.seq.cont,
    eventAttribute = dt$type, atEventTimesOnly = TRUE,
    untilEventOccurrs = TRUE, returnInputData = TRUE)
dtrem <- dts[[1]]
dt <- dts[[2]]

# manually sort the data set
dtrem <- dtrem[order(dtrem$eventTime), ]

# merge type-variable back in
dtrem$type <- dt$type[match(dtrem$eventID, dt$eventID)]

# calculate triad statistic
dtrem$triad <- triadStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    eventvar = dtrem$eventDummy,
    halflife = 2)

# calculate friend-of-friend statistic
dtrem$triad.fof <- triadStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    halflife = 2,
    eventtypevar = dtrem$type,
    eventtypevalues = c("cooperation", "cooperation"),
    eventvar = dtrem$eventDummy)

# calculate friend-of-enemy statistic
dtrem$triad.foe <- triadStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    halflife = 2,
    eventtypevar = dtrem$type,
    eventtypevalues = c("conflict", "cooperation"),
    eventvar = dtrem$eventDummy)

# calculate enemy-of-friend statistic
dtrem$triad.eof <- triadStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    halflife = 2,
    eventtypevar = dtrem$type,
    eventtypevalues = c("cooperation", "conflict"),
    eventvar = dtrem$eventDummy)

# calculate enemy-of-enemy statistic
dtrem$triad.eoe <- triadStat(data = dtrem,
    time = dtrem$eventTime,
    sender = dtrem$sender,
    target = dtrem$target,
    halflife = 2,
    eventtypevar = dtrem$type,
    eventtypevalues = c("conflict", "conflict"),
    eventvar = dtrem$eventDummy)