Assign behavior estimates to observations
assign_behavior(dat.orig, dat.seg.list, theta.estim.long, behav.names)
dat.orig 
A data frame that contains all of the original data for all
animal IDs. Must be same as was used to originally segment the tracks. Must
have columns 
dat.seg.list 
A list of data associated with each animal ID where names
of list elements are the ID names and tracks have already been segmented.
Must have columns 
theta.estim.long 
A data frame in long format where each observation (time1) of each track segment (tseg) of each animal ID (id) has separate rows for behavior proportion estimates per state. Columns for behavior and proportion estimates should be labeled behavior and prop, respectively. Date (in POSIXct format) should also be included as a column labeled date. 
behav.names 
character. A vector of names to label each state (in order). 
A data frame of all animal IDs where columns (with names from behav.names) include proportions of each behavioral state per observation, as well as a column (behav) that stores the dominant behavior within the track segment to which the observation belongs. This is merged with the original data frame dat.orig, so any observations that were excluded (not at the primary time interval) will show NA for behavior estimates.
#load original and segmented data
data(tracks)
data(tracks.seg)
#convert segmented dataset into list
tracks.list <- df_to_list(dat = tracks.seg, ind = "id")
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res <- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000,
                        nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim <- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#Create augmented matrix by replicating rows (tsegs) according to obs per tseg
theta.estim.long <- expand_behavior(dat = tracks.seg, theta.estim = theta.estim, obs = obs,
                                    nbehav = 3, behav.names = c("Encamped","ARS","Transit"),
                                    behav.order = c(1,2,3))
#Run function
dat.out <- assign_behavior(dat.orig = tracks, dat.seg.list = tracks.list,
                           theta.estim.long = theta.estim.long,
                           behav.names = c("Encamped","ARS","Transit"))
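The merge described in the return value can be pictured with a small base R sketch. The data frames orig and est and all of their values are hypothetical, and merge() is only a stand-in for whatever join the function performs internally. The point is that observations absent from the estimates come back with NA.

```r
# Hypothetical original data: five observations for one animal ID
orig <- data.frame(id = "id1", date = c(1, 2, 3, 4, 5), x = c(0, 1, 2, 3, 4))

# Hypothetical estimates for the observations kept at the primary time interval
est <- data.frame(id = "id1", date = c(1, 2, 4),
                  Encamped = c(0.8, 0.8, 0.1),
                  Transit = c(0.2, 0.2, 0.9),
                  behav = c("Encamped", "Encamped", "Transit"))

# Left join: keep every row of orig; rows without a match get NA estimates
out <- merge(orig, est, by = c("id", "date"), all.x = TRUE)
out <- out[order(out$date), ]
out
```

Rows for dates 3 and 5 have no estimates, so their Encamped, Transit, and behav entries are NA.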
After breakpoints have been extracted for each animal ID, this function assigns the associated segment number to observations for each animal ID. These segments of observations will be used in the second stage of the model framework to perform mixed-membership clustering by Latent Dirichlet Allocation.
assign_tseg(dat, brkpts)
dat 
A list where each element stores the data for a unique animal ID.
Each element is a data frame that contains all data associated for a given
animal ID and must include a column labeled 
brkpts 
A data frame of breakpoints for each animal ID (as generated by

A data frame that updates the original data object by including the segment number associated with each observation in relation to the extracted breakpoints.
#load data
data(tracks.list)
#subset only first track
tracks.list <- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
#future::plan(future::multisession) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
# Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est <- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts <- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
# Assign track segments to all observations by ID
tracks.seg <- assign_tseg(dat = tracks.list, brkpts = brkpts)
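As a rough illustration of how segment numbers follow from breakpoints, the base R sketch below labels observations with findInterval(). The helper label_segments() is hypothetical. Its boundary convention (a breakpoint closes a segment, so the next observation opens a new one) is an assumption and may differ from the package.

```r
# Hypothetical helper: label each observation with a segment number,
# given breakpoints expressed as observation numbers.
# Assumption: a breakpoint at b closes a segment, so observation b + 1
# starts the next one.
label_segments <- function(time1, brkpts) {
  brkpts <- brkpts[!is.na(brkpts)]  # drop NA placeholders
  findInterval(time1, brkpts, left.open = TRUE) + 1
}

time1 <- 1:10
tseg <- label_segments(time1, brkpts = c(3, 7, NA))
tseg
```

With breakpoints at 3 and 7, observations 1 to 3 fall in segment 1, 4 to 7 in segment 2, and 8 to 10 in segment 3.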
After breakpoints have been extracted for each animal ID, this function assigns the associated segment number to observations for each animal ID. These segments of observations will be used in the second stage of the model framework to perform mixed-membership clustering by Latent Dirichlet Allocation.
assign_tseg_internal(dat, brkpts)
dat 
A data frame that contains all data associated for a given animal
ID. Must include a column labeled 
brkpts 
A data frame of breakpoints for each animal ID (as generated by

A data frame that updates the original data object by including the segment number associated with each observation in relation to the extracted breakpoints.
This function serves as a wrapper for samp_move by running this sampler for each iteration of the MCMC chain. It is called by segment_behavior to run the RJMCMC on all animal IDs simultaneously.
behav_gibbs_sampler(dat, ngibbs, nbins, alpha, breakpt, p)
dat 
A data frame that only contains columns for the animal IDs and for each of the discretized movement variables. 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

alpha 
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for 
breakpt 
numeric. A vector of breakpoints if pre-specifying where they
may occur, otherwise 
p 
An object storing information from

A list of the breakpoints, the number of breakpoints, and the log marginal likelihood at each MCMC iteration, as well as the time it took the model to finish running. This is only provided for the data of a single animal ID.
Transforms vectors of bin numbers into full matrices for plotting as a heatmap.
behav_seg_image(dat, nbins)
dat 
A data frame for a single animal ID that contains only columns for
the ID and each of the movement variables that were analyzed by

nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

A list where each element stores the presence-absence matrix for each of the movement variables.
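To make the idea of a presence-absence matrix concrete, here is a minimal base R sketch. The helper bins_to_matrix() is hypothetical. The orientation (rows as bins, columns as observations) is an assumption chosen so that time runs along the x-axis of a heatmap.

```r
# Hypothetical helper: turn a vector of bin numbers into a
# presence-absence matrix (rows = bins, columns = observations).
bins_to_matrix <- function(bins, nbins) {
  mat <- matrix(0, nrow = nbins, ncol = length(bins))
  ok <- !is.na(bins)                    # skip missing observations
  mat[cbind(bins[ok], which(ok))] <- 1  # mark the occupied bin per observation
  mat
}

SL <- c(1, 3, 3, 5, NA, 2)  # made-up step length bins (5 bins)
m <- bins_to_matrix(SL, nbins = 5)
m
```

Each column holds a single 1 in the row of the observed bin, and a column of zeros where the observation is missing.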
This function uses a Gibbs sampler within a mixture model to estimate the optimal number of behavioral states, the state-dependent distributions, and to assign behavioral states to each observation. This model does not assume an underlying mechanistic process.
cluster_obs(dat, alpha, ngibbs, nmaxclust, nburn)
dat 
A data frame that **only** contains columns for the discretized movement variables. 
alpha 
numeric. A single value used to specify the hyperparameter for the prior distribution. 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
nburn 
numeric. The length of the burn-in phase. 
The mixture model analyzes all animal IDs pooled together, thus providing a population-level estimate of behavioral states.
A list of model results is returned where elements include the phi matrix for each data stream, theta matrix, log likelihood estimates for each iteration of the MCMC chain (loglikel), a list of the MAP estimates of the latent states for each observation (z.MAP), a matrix of the whole posterior of state assignments per observation (z.posterior), and a vector (gamma1) of estimates for the gamma hyperparameter.
data(tracks.list)
#convert from list to data frame
tracks.list <- dplyr::bind_rows(tracks.list)
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks <- subset(tracks.list, select = c(SL, TA))
set.seed(1)
# Define model params
alpha <- 0.1
ngibbs <- 1000
nburn <- ngibbs/2
nmaxclust <- 7
dat.res <- cluster_obs(dat = tracks, alpha = alpha, ngibbs = ngibbs,
                       nmaxclust = nmaxclust, nburn = nburn)
This function performs a Gibbs sampler within the Latent Dirichlet Allocation (LDA) model to estimate proportions of each behavioral state for all time segments generated by segment_behavior. This is the second stage of the two-stage Bayesian model that estimates proportions of behavioral states by first segmenting individual tracks into relatively homogeneous segments of movement.
cluster_segments(dat, gamma1, alpha, ngibbs, nmaxclust, nburn, ndata.types)
dat 
A data frame returned by 
gamma1 
numeric. A hyperparameter for the truncated stick-breaking
prior for estimating the 
alpha 
numeric. A hyperparameter for the Dirichlet distribution when
estimating the 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
nburn 
numeric. The length of the burn-in phase. 
ndata.types 
numeric. A vector of the number of bins used to discretize
each movement variable. These must be in the same order as the columns
within 
The LDA model analyzes all animal IDs pooled together, thereby providing population-level estimates of behavioral states.
A list of model results is returned where elements include the phi matrix for each data stream, theta matrix, log likelihood estimates for each iteration of the MCMC chain (loglikel), and matrices of the latent cluster estimates for each data stream (z.agg).
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res <- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000,
                        nburn = 500, nmaxclust = 7, ndata.types = 2)
Internal function that calculates the inverted cumsum
CumSumInv(ntsegm, nmaxclust, z)
ntsegm 
An integer. 
nmaxclust 
An integer. 
z 
An integer matrix. 
Converts an object of class data.frame to a list where each element is a separate animal ID. This function prepares the data for further analysis and for mapping other functions onto the data for separate animal IDs.
df_to_list(dat, ind)
dat 
A data frame containing the data for each animal ID. 
ind 
character. The name of the column storing the animal IDs. 
A list where each element stores the data for a separate animal ID.
#load data
data(tracks)
#convert to list
dat.list <- df_to_list(dat = tracks, ind = "id")
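For readers who want a feel for the resulting structure without loading the package data, base R's split() produces a comparable named list. This is an analogue under the assumption that one element per ID is all that is needed, not the package's implementation. The data frame below is made up.

```r
# Made-up data frame with two animal IDs
dat <- data.frame(id = c("a", "a", "b"), x = c(1, 2, 3))

# Base R analogue: one list element per unique value of the id column
dat.list <- split(dat, dat$id)
names(dat.list)
```

Note that split() orders elements by the sorted unique IDs, which may differ from the order in which IDs first appear in the data.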
Convert movement variables from continuous to discrete values for analysis by segment_behavior.
discrete_move_var(dat, lims, varIn, varOut)
dat 
A data frame that contains the variable(s) of interest to convert from continuous to discrete values. 
lims 
A list of the bin limits for each variable. Each element of the list should be a vector of real numbers. 
varIn 
A vector of names for the continuous variable stored as columns
within 
varOut 
A vector of names for the storage of the discrete variables returned by the function. 
A data frame with new columns of discretized variables as labeled by varOut.
#load data
data(tracks)
#subset only first track
tracks <- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks <- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks <- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC",
                           units = "secs")
#create list from data frame
tracks.list <- df_to_list(dat = tracks, ind = "id")
#filter observations to only 1 hr (or 3600 s)
tracks_filt.list <- filter_time(dat.list = tracks.list, int = 3600)
#define bin number and limits for turning angles and step lengths
angle.bin.lims <- seq(from=-pi, to=pi, by=pi/4) #8 bins
dist.bin.lims <- quantile(tracks[tracks$dt == 3600,]$step,
                          c(0,0.25,0.50,0.75,0.90,1), na.rm=TRUE) #5 bins
# Assign bins to observations
tracks_disc.list <- purrr::map(tracks_filt.list,
                               discrete_move_var,
                               lims = list(dist.bin.lims, angle.bin.lims),
                               varIn = c("step", "angle"),
                               varOut = c("SL", "TA"))
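The binning step itself can be sketched with base R's cut(). This illustrates discretization in general and is not the function's internal code. The angle values are made up.

```r
# Made-up turning angles (radians) and 8 equal-width bins on [-pi, pi]
angle <- c(-3, -0.1, 0.1, 1.6, 3.1)
angle.bin.lims <- seq(from = -pi, to = pi, by = pi / 4)

# cut() returns the bin index of each value; include.lowest keeps -pi itself
TA <- cut(angle, breaks = angle.bin.lims, labels = FALSE, include.lowest = TRUE)
TA
```

Nine limits define eight bins, so each angle maps to an integer between 1 and 8.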
Expand behavior estimates from track segments to observations
expand_behavior(dat, theta.estim, obs, nbehav, behav.names, behav.order)
dat 
A data frame of the animal ID, track segment labels, and all other data per observation. Animal ID, date, track segment, and observation number columns must be labeled id, date, tseg, and time1, respectively. 
theta.estim 
A matrix (returned by 
obs 
A data frame summarizing the number of observations within each
bin per movement variable that is returned by

nbehav 
numeric. The number of behavioral states that will be retained in 1 to nmaxclust. 
behav.names 
character. A vector of names to label each state (in order). 
behav.order 
numeric. A vector that identifies the order in which the user would like to rearrange the behavioral states. If satisfied with order returned by the LDA model, this still must be specified. 
A new data frame that expands behavior proportions for each observation within all track segments, including the columns labeled time1 and date from the original dat data frame.
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res <- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000,
                        nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim <- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#Create augmented matrix by replicating rows (tsegs) according to obs per tseg
theta.estim.long <- expand_behavior(dat = tracks.seg, theta.estim = theta.estim, obs = obs,
                                    nbehav = 3, behav.names = c("Encamped","ARS","Transit"),
                                    behav.order = c(1,2,3))
Calculates the mean of the posterior for the proportions of each behavior within track segments. These results can be explored to determine the optimal number of latent behavioral states.
extract_prop(res, ngibbs, nburn, nmaxclust)
res 
A list of results returned by 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nburn 
numeric. The length of the burn-in phase. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
A matrix that stores the proportions of each state/cluster (columns) per track segment (rows).
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res <- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000,
                        nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim <- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
Selects observations that belong to the time interval of interest and removes all others. This function also removes entire IDs from the dataset when there are one or fewer observations at this time interval. This function works closely with round_track_time to only retain observations sampled at a regular time interval, which is important for analyzing step lengths and turning angles. The column storing the time intervals must be labeled dt.
filter_time(dat.list, int)
dat.list 
A list of data associated with each animal ID where names of list elements are the ID names. 
int 
numeric. The time interval of interest. 
A list where observations for each animal ID (element) have been filtered for int. Two columns (obs and time1) are added for each list element (ID), which store the original observation number before filtering and the new observation number after filtering, respectively.
#load data
data(tracks)
#subset only first track
tracks <- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks <- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks <- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC",
                           units = "secs")
#create list from data frame
tracks.list <- df_to_list(dat = tracks, ind = "id")
#filter observations to only 1 hr (or 3600 s)
tracks_filt.list <- filter_time(dat.list = tracks.list, int = 3600)
Identify changes within a discrete variable. These values can be used to pre-specify breakpoints within the segmentation model using segment_behavior.
find_breaks(dat, ind)
dat 
A data frame containing the data for each animal ID. 
ind 
character. The name of the column storing the discrete variable of interest. 
A vector of breakpoints is returned based on the data provided. To identify separate breakpoints per animal ID, this function should be mapped onto a list generated by df_to_list.
#simulate data
var <- sample(1:3, size = 50, replace = TRUE)
var <- rep(var, each = 20)
id <- rep(1:10, each = 100)
#create data frame
dat <- data.frame(id, var)
#create list
dat.list <- df_to_list(dat = dat, ind = "id")
#run function using purrr::map()
breaks <- purrr::map(dat.list, ~find_breaks(dat = ., ind = "var"))
#or with lapply()
breaks1 <- lapply(dat.list, find_breaks, ind = "var")
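The underlying idea of locating changes in a discrete variable can be sketched in one line of base R. The helper change_points() is hypothetical. Reporting the last observation before each change is an assumption about the indexing convention, which may differ in find_breaks.

```r
# Hypothetical stand-in: positions where a discrete variable changes value.
# Assumption: a break is reported at the last observation before the change.
change_points <- function(x) which(diff(x) != 0)

x <- c(1, 1, 1, 2, 2, 3, 3, 3)  # made-up discrete variable
change_points(x)
```

Here the value changes after observations 3 and 5, so those two positions are returned.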
Pulls model results for the estimates of bin proportions per movement variable from the posterior distribution. This can be used for visualization of movement variable distribution for each behavior estimated.
get_behav_hist(dat, nburn, ngibbs, nmaxclust, var.names)
dat 
The list object returned by the LDA model
( 
nburn 
numeric. The length of the burn-in phase. 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nmaxclust 
numeric. The maximum number of clusters on which to attribute behaviors. 
var.names 
character. A vector of names used for each of the movement
variables. Must be in the same order as were listed within the data frame
returned by 
A data frame that contains columns for bin number, behavioral state, proportion represented by a given bin, and movement variable name. This is displayed in a long format, which is easier to visualize using ggplot2.
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res <- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000,
                        nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim <- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#run function for clustered segments
behav.res <- get_behav_hist(dat = res, nburn = 500, ngibbs = 1000, nmaxclust = 7,
                            var.names = c("Step Length","Turning Angle"))
Extract breakpoints for each animal ID
get_breakpts(dat, MAP.est)
dat 
A list of lists where animal IDs are separated as well as the
breakpoints estimated for each iteration of the MCMC chain. This is stored
within 
MAP.est 
numeric. A vector of values at which the maximum a posteriori
(MAP) estimate was identified for each of the animal IDs as returned by

A data frame where breakpoints are returned per animal ID within each row. For animal IDs that have fewer breakpoints than the maximum number that were estimated, NA values are used as placeholders for these breakpoints that do not exist.
#load data
data(tracks.list)
#subset only first track
tracks.list <- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
#future::plan(future::multisession) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
# Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est <- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts <- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
Identify the MCMC iteration that holds the MAP estimate. This will be used to inform get_breakpts as to which breakpoints should be retained when assigning track segments to the observations of each animal ID.
get_MAP(dat, nburn)
dat 
A data frame where each row holds the log marginal likelihood values at each iteration of the MCMC chain. 
nburn 
numeric. The size of the burn-in phase after which the MAP estimate will be identified. 
A numeric vector of iterations at which the MAP estimate was found for each animal ID.
#load data
data(tracks.list)
#subset only first track
tracks.list <- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
#future::plan(future::multisession) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
# Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est <- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
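Selecting the MAP iteration amounts to finding the largest log marginal likelihood after the burn-in phase. The sketch below shows that logic for one animal ID. The helper map_iteration() and the values in lml are hypothetical.

```r
# Hypothetical helper: iteration with the highest log marginal likelihood
# among those that follow the burn-in phase.
map_iteration <- function(lml, nburn) {
  nburn + which.max(lml[(nburn + 1):length(lml)])
}

lml <- c(-50, -10, -30, -20, -25, -22)  # made-up values, one per iteration
map_iteration(lml, nburn = 3)
```

Iteration 2 has the largest value overall, but it lies within the burn-in phase, so iteration 4 is selected instead.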
Internal function to be used by a wrapper.
get_MAP_internal(dat, nburn)
dat 
numeric. A vector of log marginal likelihood values for a given animal ID. 
nburn 
numeric. The size of the burn-in phase after which the MAP estimate will be identified. 
A numeric value indicating the iteration after the burn-in phase that holds the MAP estimate.
An internal function that calculates the sufficient statistics to be used within the reversible-jump MCMC Gibbs sampler called by samp_move.
get_summary_stats(breakpt, dat, max.time, nbins, ndata.types)
breakpt 
numeric. A vector of breakpoints. 
dat 
A matrix that only contains columns storing discretized data for each of the movement variables. 
max.time 
numeric. The number of the last observation of 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

ndata.types 
numeric. The length of 
Returns the sufficient statistics associated with the provided breakpoints for a given animal ID.
Calculates the log-likelihood of the mixture model based on estimates for theta and phi.
get.llk.mixmod(phi, theta, ndata.types, dat, nobs, nmaxclust)
phi 
A list of proportion estimates that characterize distributions (bins) for each data stream and possible behavioral state. 
theta 
numeric. A vector of values that sum to one. 
ndata.types 
numeric. The number of data streams being analyzed. 
dat 
A data frame containing only columns of the discretized data streams for all observations. 
nobs 
numeric. The total number of rows in the dataset. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
A numeric value of the log-likelihood based upon the current values for phi and theta.
Calculates values of theta matrix within Gibbs sampler. Not for calling directly by users.
get.theta(v, nmaxclust, ntsegm)
v 
A matrix returned by 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
ntsegm 
numeric. The total number of time segments from all animal IDs. 
A matrix of proportion estimates that represent proportions of different behavioral states per time segment.
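Assuming v holds stick-breaking fractions, as the truncated stick-breaking prior mentioned for cluster_segments suggests, the conversion to proportions can be sketched as follows. The helper stick_to_theta() is hypothetical and uses the standard stick-breaking construction. It is not taken from the package source.

```r
# Hypothetical helper: convert stick-breaking fractions v (rows = segments,
# columns = clusters) into proportions theta.
# Assumes v has at least two columns and that its last column equals 1
# (truncation), so that each row of theta sums to one.
stick_to_theta <- function(v) {
  # cumulative stick left after each break: (1 - v1), (1 - v1)(1 - v2), ...
  remaining <- t(apply(1 - v, 1, cumprod))
  # stick available before each break: 1, (1 - v1), (1 - v1)(1 - v2), ...
  remaining <- cbind(1, remaining[, -ncol(v), drop = FALSE])
  v * remaining
}

v <- rbind(c(0.5, 0.5, 1),
           c(0.2, 0.5, 1))  # made-up fractions for two segments
theta <- stick_to_theta(v)
theta
```

Each entry of theta is the fraction broken off at that step multiplied by the length of stick still remaining.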
Insert NA gaps to regularize a time series
insert_NAs(data, int, units)
data 
A data frame that minimally contains columns for animal ID, date,
and time step. These must be labeled 
int 
integer. An integer that characterizes the desired interval on which to insert new rows. 
units 
character. The units of the selected time interval 
A data frame where new rows have been inserted to regularize the date column. This results in values provided for id, date, and dt while inserting NAs for all other columns. Additionally, observations with duplicate datetimes are removed.
#load data
data(tracks)
#remove rows to show how function works (create irregular time series)
set.seed(1)
ind <- sort(sample(2:15003, 500))
tracks.red <- tracks[ind,]
#calculate step lengths, turning angles, net-squared displacement, and time steps
tracks.red <- prep_data(dat = tracks.red, coord.names = c("x","y"), id = "id")
#round times to nearest interval
tracks.red <- round_track_time(dat = tracks.red, id = "id", int = c(3600, 7200, 10800, 14400),
                               tol = 300, units = "secs")
#insert NA gaps
dat.out <- insert_NAs(tracks.red, int = 3600, units = "secs")
An internal function that is used to calculate the log marginal likelihood of models for the current and proposed sets of breakpoints. Called within samp_move.
log_marg_likel(alpha, summary.stats, nbins, ndata.types)
alpha 
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for 
summary.stats 
A matrix of sufficient statistics returned from

nbins 
numeric. A vector of the number of bins used to discretize each movement variable. 
ndata.types 
numeric. The length of 
The log marginal likelihood is calculated for a model with a given set of breakpoints and the discretized data.
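For intuition, the log marginal likelihood of binned counts under a symmetric Dirichlet prior has a closed form. The sketch below implements that standard Dirichlet-multinomial result for a single segment and a single movement variable. Treating this as the model form is an assumption, and dm_log_marg() is a hypothetical name rather than the package's function.

```r
# Log marginal likelihood of an ordered sequence of binned observations,
# summarized by bin counts, under a symmetric Dirichlet(alpha) prior on the
# bin probabilities (standard result; assumed here, not taken from the package).
dm_log_marg <- function(counts, alpha) {
  K <- length(counts)
  lgamma(K * alpha) - lgamma(sum(counts) + K * alpha) +
    sum(lgamma(counts + alpha)) - K * lgamma(alpha)
}

# Two bins, uniform prior (alpha = 1), a single observation in bin 1
dm_log_marg(c(1, 0), alpha = 1)  # equals log(1/2)
```

Under independent segments and movement variables, summing such terms would give the total for a model, which is how a proposed set of breakpoints could be compared against the current one.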
Visualize the breakpoints estimated by the segmentation model as they relate to either the original (continuous) or discretized data. These plots assist in determining whether too many or too few breakpoints were estimated as well as whether the user needs to redefine how they discretized their data before analysis.
plot_breakpoints(data, as_date = FALSE, var_names, var_labels = NULL, brkpts)
data 
A list where each element stores a data frame for a given animal
ID. Each of these data frames contains columns for the ID, date or time1
generated by 
as_date 
logical. If 
var_names 
A vector of the column names for the movement variables to be plotted over time. 
var_labels 
A vector of the labels to be plotted on the y-axis for each
movement variable. Set to 
brkpts 
A data frame that contains the breakpoints associated with each
animal ID. This data frame is returned by 
A line plot per animal ID for each movement variable showing how the estimated breakpoints relate to the underlying data. Depending on the user input for var_names, this may either be on the scale of the original continuous data or the discretized data.
#load data
data(tracks.list)
#subset only first track
tracks.list <- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
#future::plan(future::multisession) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
# Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est <- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts <- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
#run function
plot_breakpoints(data = tracks.list, as_date = FALSE, var_names = c("step","angle"),
                 var_labels = c("Step Length (m)", "Turning Angle (rad)"), brkpts = brkpts)
An internal function for plotting the results of the segmentation model.
plot_breakpoints_behav(data, as_date, var_names, var_labels, brkpts)
data 
A data frame for a single animal ID that contains columns for the
ID, date or time variable, and each of the movement variables that were
analyzed by 
as_date 
logical. If 
var_names 
A vector of the column names for the movement variables to be plotted over time. 
var_labels 
A vector of the labels to be plotted on the y-axis for each
movement variable. Set to 
brkpts 
A data frame that contains the breakpoints associated with each
animal ID. This data frame is returned by 
A line plot for each movement variable showing how the estimated breakpoints relate to the underlying data. Depending on the user input for var_names, this may either be on the scale of the original continuous data or the discretized data.
Calculates step lengths, turning angles, and net-squared displacement based on coordinates for each animal ID and calculates time steps based on the datetime. Provides a self-contained method to calculate these variables without needing to rely on other R packages (e.g., adehabitatLT). However, functions from other packages can also be used to perform this step in data preparation.
prep_data(dat, coord.names, id)
dat 
A data frame that contains a column for animal IDs, the columns
associated with the x and y coordinates, and a column for the date. For
easier interpretation of the model results, it is recommended that
coordinates be stored in a UTM projection (meters) as opposed to
unprojected in decimal degrees (map units). Datetime should be of class

coord.names 
character. A vector of the column names under which the coordinates are stored. The name for the x coordinate should be listed first and the name for the y coordinate second. 
id 
character. The name of the column storing the animal IDs. 
A data frame where all original data are returned and new columns are added for step length (step), turning angle (angle), net-squared displacement (NSD), and time step (dt). Names for coordinates are changed to x and y. Units for step and NSD depend on the projection of the coordinates, angle is returned in radians, and dt is returned in seconds.
#load data
data(tracks)
#subset only first track
tracks <- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks <- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
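The quantities returned can be reproduced on a toy track with base R. This is a sketch of the usual geometric definitions, assuming planar coordinates. The track is made up, and the code is not the package's internal routine.

```r
# Made-up track: one step east, one north, one west
x <- c(0, 1, 1, 0)
y <- c(0, 0, 1, 1)

dx <- diff(x)
dy <- diff(y)
step <- sqrt(dx^2 + dy^2)  # step lengths, in coordinate units
heading <- atan2(dy, dx)   # absolute direction of each step (radians)

# Turning angle: change in heading between steps, wrapped to (-pi, pi]
turn <- diff(heading)
turn <- ifelse(turn > pi, turn - 2 * pi,
               ifelse(turn <= -pi, turn + 2 * pi, turn))

# Net-squared displacement: squared distance from the first location
NSD <- (x - x[1])^2 + (y - y[1])^2
```

The two left turns each come out as pi/2 radians, and NSD rises to 2 at the far corner before falling back to 1.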
An internal function that calculates step lengths, turning angles, and time steps for a given animal ID.
prep_data_internal(dat, coord.names)
dat 
A data frame that contains the columns associated with the x and y
coordinates as well as the datetime. For easier interpretation of the
model results, it is recommended that coordinates be stored after UTM
projection (meters) as opposed to unprojected in decimal degrees (map
units). Datetime should be of class 
coord.names 
character. A vector of the column names under which the coordinates are stored. The name for the x coordinate should be listed first and the name for the y coordinate second. 
A data frame where all original data are returned and new columns are added for step length (step), turning angle (angle), net-squared displacement (NSD), and time step (dt).
Internal function that samples z's from a categorical distribution
rmultinom1(prob, randu)
prob 
A numeric matrix. 
randu 
A numeric vector. 
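Drawing from a categorical distribution with pre-drawn uniform numbers is typically done by inverse-CDF sampling. The sketch below shows that technique in base R. The helper sample_categorical() is hypothetical, and the assumption that rows of prob correspond to entries of randu is illustrative only.

```r
# Hypothetical helper: draw one category per row of a probability matrix
# using pre-drawn uniform random numbers (inverse-CDF sampling).
sample_categorical <- function(prob, randu) {
  cum <- t(apply(prob, 1, cumsum))  # row-wise cumulative probabilities
  z <- rowSums(cum < randu) + 1     # first category whose CDF reaches randu
  pmin(z, ncol(prob))               # guard against floating-point round-off
}

prob <- rbind(c(0.2, 0.3, 0.5),
              c(0.7, 0.2, 0.1))     # made-up probabilities, rows sum to one
sample_categorical(prob, randu = c(0.6, 0.1))
```

A uniform draw of 0.6 falls past the first two cumulative probabilities of row 1, giving category 3, while 0.1 falls inside the first category of row 2.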
Internal function that samples z's from a multinomial distribution
rmultinom2(prob, n, randu, nmaxclust)
prob 
A numeric vector. 
n 
An integer. 
randu 
A numeric vector. 
nmaxclust 
An integer. 
Rounds sampling intervals that are close to, but not exactly, the time interval of interest (e.g., 240 s instead of 300 s). This can be performed on multiple time intervals, but only using a single tolerance value. This function prepares the data to be analyzed by segment_behavior, which requires that all time intervals exactly match the primary time interval when analyzing step lengths and turning angles. Columns storing the time intervals and dates must be labeled dt and date, respectively, where dates are of class POSIXct.
round_track_time(dat, id, int, tol, time.zone = "UTC", units)
dat 
A data frame that contains the sampling interval of the observations. 
id 
character. The name of the column storing the animal IDs. 
int 
numeric. A vector of the time interval(s) on which to perform rounding. 
tol 
numeric. A single tolerance value on which to round any 
time.zone 
character. Specify the time zone for which the datetimes
were recorded. Set to UTC by default. Refer to 
units 
character. The units of the selected time interval 
A data frame where dt and date are both adjusted based upon the rounding of time intervals according to the specified tolerance.
#load data
data(tracks)
#subset only first track
tracks <- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks <- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks <- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC",
                           units = "secs")
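The rounding rule for the time steps can be sketched in a few lines of base R. The helper snap_dt() is hypothetical and only handles the dt values. Adjusting the date column to match, as the function does, is left out.

```r
# Hypothetical helper: snap each time step to a target interval when it lies
# within the tolerance; otherwise leave it unchanged.
snap_dt <- function(dt, int, tol) {
  for (i in int) {
    dt[!is.na(dt) & abs(dt - i) <= tol] <- i
  }
  dt
}

dt <- c(3590, 3605, 3900, 7150, NA)  # made-up time steps in seconds
res <- snap_dt(dt, int = c(3600, 7200), tol = 180)
res
```

Values within 180 s of 3600 or 7200 are snapped to those intervals, while 3900 lies outside the tolerance and is kept as is.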
This is the RJMCMC algorithm that drives the proposal and selection of breakpoints for the data based on the difference in log marginal likelihood. This function is called within behav_gibbs_sampler.
samp_move(breakpt, max.time, dat, alpha, nbins, ndata.types)
breakpt 
numeric. A vector of breakpoints. 
max.time 
numeric. The number of the last observation of 
dat 
A matrix that only contains columns storing discretized data for
each of the movement variables used within 
alpha 
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

ndata.types 
numeric. The length of 
The breakpoints and log marginal likelihood are retained from the selected model from the Gibbs sampler and returned as elements of a list. This is performed for each iteration of the MCMC algorithm.
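The comparison of candidate breakpoint sets by log marginal likelihood can be sketched for a single discretized variable. The code below assumes the standard Dirichlet-multinomial form with a symmetric Dirichlet(alpha) prior on the bin probabilities; whether the package uses exactly this expression is an assumption. The helper names and the convention that a breakpoint `b` ends a segment at observation `b` are hypothetical.

```r
# Log marginal likelihood of one segment of binned data under a symmetric
# Dirichlet(alpha) prior on the bin probabilities.
log_marg_lik_segment <- function(x, nbins, alpha) {
  counts <- tabulate(x, nbins = nbins)
  lgamma(nbins * alpha) - lgamma(sum(counts) + nbins * alpha) +
    sum(lgamma(counts + alpha) - lgamma(alpha))
}

# Sum the segment-level values over the segments implied by the breakpoints
# (assumed convention: breakpoint b closes a segment at observation b).
log_marg_lik <- function(x, breakpt, nbins, alpha) {
  seg <- findInterval(seq_along(x), c(1, sort(breakpt) + 1))
  sum(tapply(x, seg, log_marg_lik_segment, nbins = nbins, alpha = alpha))
}

# Made-up series whose bin distribution changes abruptly after observation 50
x <- c(rep(1, 50), rep(5, 50))
lml_none  <- log_marg_lik(x, breakpt = numeric(0), nbins = 5, alpha = 1)
lml_break <- log_marg_lik(x, breakpt = 50, nbins = 5, alpha = 1)
```

In this toy case the model with a breakpoint at 50 has the higher log marginal likelihood, which is the kind of difference a birth, death, or swap proposal would be evaluated on.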
Internal function to sample the gamma hyperparameter
sample.gamma.mixmod(v, ngroup, gamma.possib)
v 
numeric. A vector of proportions for each of the possible clusters. 
ngroup 
numeric. The total number of possible clusters. 
gamma.possib 
numeric. A vector of possible values that gamma can take ranging between 0.1 and 1. 
A single numeric value for gamma that falls within
gamma.possib for calculation of the log-likelihood.
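One common way to sample a hyperparameter restricted to a grid of candidate values is to evaluate the log-likelihood at every candidate and draw one in proportion to the exponentiated values. The sketch below assumes the stick-breaking weights follow a Beta(1, gamma) distribution and that the final weight is fixed at 1 by truncation; both are assumptions about the model, and `sample_gamma_grid` is a hypothetical name rather than the package's function.

```r
# Hypothetical sketch of a grid-based draw for gamma
sample_gamma_grid <- function(v, gamma.possib) {
  v <- v[-length(v)]  # assumption: last weight is fixed at 1 under truncation
  # log-likelihood of the weights for every candidate value of gamma
  ll <- vapply(gamma.possib,
               function(g) sum(dbeta(v, 1, g, log = TRUE)),
               numeric(1))
  # subtract the maximum before exponentiating for numerical stability
  p <- exp(ll - max(ll))
  gamma.possib[sample.int(length(gamma.possib), size = 1, prob = p)]
}

set.seed(1)
gamma.possib <- seq(0.1, 1, by = 0.1)   # grid spacing is an assumption
v <- c(0.6, 0.3, 0.8, 1)                # made-up stick-breaking weights
gamma1 <- sample_gamma_grid(v, gamma.possib)
```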
Estimates values of phi matrix for use in characterizing distributions of the movement variables. Not for calling directly by users.
sample.phi(z.agg, alpha, nmaxclust, nbins, ndata.types)
z.agg 
A list of latent cluster estimates provided by

alpha 
numeric. A hyperparameter for the Dirichlet distribution. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

ndata.types 
numeric. The number of data streams being analyzed. 
A matrix of proportion estimates that characterize distributions (bins) for each movement variable and possible behavioral state.
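Because alpha is a Dirichlet hyperparameter, each row of phi can be thought of as a Dirichlet draw over the bins of one movement variable. A minimal sketch of such a draw in base R is shown below, using normalised gamma variates. `rdirichlet1` is a hypothetical helper, the counts are made up, and updating with `counts + alpha` is the usual conjugate form rather than something confirmed from the package source.

```r
# Hypothetical helper: one draw from Dirichlet(a) via normalised gamma variates
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a, rate = 1)
  g / sum(g)
}

set.seed(1)
alpha  <- 0.1
counts <- c(40, 25, 10, 3, 0)          # made-up counts per bin for one state
phi_k  <- rdirichlet1(counts + alpha)  # bin proportions for that state
```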
Estimates values of phi matrix for use in characterizing distributions of the movement variables. Not for calling directly by users.
sample.phi.mixmod(alpha, nmaxclust, nbins, ndata.types, nmat)
alpha 
numeric. A hyperparameter for the Dirichlet distribution. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
nbins 
numeric. A vector of the number of bins used to discretize each
data stream. These must be in the same order as the columns within

ndata.types 
numeric. The number of data streams being analyzed. 
nmat 
A list based on 
A list of proportion estimates that characterize distributions (bins) for each data stream and possible behavioral state.
This function samples the latent v parameter within the Gibbs sampler.
Calls on the CumSumInv function written in C++. Not for calling
directly by users.
sample.v(z.agg, gamma1, ntsegm, ndata.types, nmaxclust)
z.agg 
A list of latent cluster estimates provided by

gamma1 
numeric. Hyperparameter for the truncated stick-breaking prior. 
ntsegm 
numeric. The total number of time segments from all animal IDs. 
ndata.types 
numeric. The number of data streams being analyzed. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
A matrix with estimates for v for each of the number of time segments and possible states.
This function samples the latent v parameter within the Gibbs sampler. Not for calling directly by users.
sample.v.mixmod(z, gamma1, nmaxclust)
z 
A vector of latent cluster estimates provided by

gamma1 
numeric. Hyperparameter for the truncated stick-breaking prior. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
A list with estimates for v and theta for each of the possible states.
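Under a truncated stick-breaking prior, the weights v map to state proportions theta by giving each state a fraction of whatever "stick" remains after the preceding states. The sketch below shows this mapping in base R. `stick_break` is a hypothetical helper, and fixing the final weight at 1 is the usual truncation convention, assumed here rather than taken from the package.

```r
# Hypothetical helper: convert stick-breaking weights v into proportions theta
stick_break <- function(v) {
  v[length(v)] <- 1   # truncation: the last state takes the remaining stick
  # theta_k = v_k * prod_{j < k} (1 - v_j)
  v * c(1, cumprod(1 - v[-length(v)]))
}

v <- c(0.5, 0.5, 0.2, 0.7)   # made-up weights for nmaxclust = 4
theta <- stick_break(v)
```

For these weights theta is 0.5, 0.25, 0.05, and 0.2, which sums to 1.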
This function samples the latent z parameter within the Gibbs sampler.
Calls on the SampleZAgg function written in C++. Not for calling
directly by users.
sample.z(ntsegm, nbins, y, nmaxclust, phi, ltheta, zeroes, ndata.types)
ntsegm 
numeric. The total number of time segments from all animal IDs. 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

y 
A list where each element stores separate aggregated count data per bin per time segment for each movement variable being analyzed. These are stored as matrices. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
phi 
A list where each element stores separate proportions per bin per time segment for each movement variable. 
ltheta 
A matrix storing the log-transformed values from the

zeroes 
A list of three-dimensional arrays (ntsegm, nbins[i], nmaxclust) that contain only zero values. 
ndata.types 
numeric. The number of data streams being analyzed. 
A list with estimates for z where the number of elements is equal to the number of movement variables.
This function samples the latent z parameter within the Gibbs sampler.
Calls on the rmultinom1 function written in C++. Not for calling
directly by users.
sample.z.mixmod(nobs, nmaxclust, dat, ltheta, lphi, ndata.types)
nobs 
numeric. The total number of rows in the dataset. 
nmaxclust 
numeric. A single number indicating the maximum number of clusters to test. 
dat 
A data frame containing only columns of the discretized data streams for all observations. 
ltheta 
numeric. A vector of log-transformed estimates for parameter theta. 
lphi 
A list containing log-transformed estimates for each data stream of the phi parameter. 
ndata.types 
numeric. The number of data streams being analyzed. 
A vector with estimates for z for each observation within dat.
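Working with log-transformed theta and phi lets the membership probabilities for each observation be accumulated by addition and normalised safely. The base R sketch below illustrates that calculation under several assumptions: `z_probs` is a hypothetical helper, `dat` is taken to be a matrix of bin indices, and each element of `lphi` is assumed to be a matrix with one row per state and one column per bin.

```r
# Hypothetical sketch: per-observation state probabilities from log parameters
z_probs <- function(dat, ltheta, lphi) {
  nobs <- nrow(dat)
  K <- length(ltheta)
  # start from the log state proportions, one row per observation
  lprob <- matrix(ltheta, nrow = nobs, ncol = K, byrow = TRUE)
  for (j in seq_along(lphi)) {
    # add the log probability of the observed bin under every state
    lprob <- lprob + t(lphi[[j]][, dat[, j], drop = FALSE])
  }
  # subtract row maxima before exponentiating to avoid underflow
  lprob <- lprob - apply(lprob, 1, max)
  prob <- exp(lprob)
  prob / rowSums(prob)
}

# Made-up example: two states, one data stream with two bins
ltheta <- log(c(0.5, 0.5))
lphi <- list(log(rbind(c(0.9, 0.1),
                       c(0.1, 0.9))))
dat <- matrix(c(1, 2), ncol = 1)
p <- z_probs(dat, ltheta, lphi)
```

Each row of `p` could then be passed, together with a uniform draw, to a categorical sampler to obtain z for that observation.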
Internal function that samples z1 aggregate
SampleZAgg(ntsegm, b1, y1, nmaxclust, lphi1, ltheta, zeroes)
ntsegm 
An integer. 
b1 
An integer. 
y1 
An integer matrix. 
nmaxclust 
An integer. 
lphi1 
A numeric matrix. 
ltheta 
A numeric matrix. 
zeroes 
A numeric vector. 
This function performs the reversible-jump MCMC algorithm using a Gibbs sampler, which estimates the breakpoints of the movement variables for each of the animal IDs. This is the first stage of the two-stage Bayesian model that estimates proportions of behavioral states by first segmenting individual tracks into relatively homogeneous segments of movement.
segment_behavior(
data,
ngibbs,
nbins,
alpha,
breakpt = purrr::map(names(data), ~NULL)
)
data 
A list where each element stores the data for a separate animal ID. List elements are data frames that only contain columns for the animal ID and for each of the discretized movement variables. 
ngibbs 
numeric. The total number of iterations of the MCMC chain. 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

alpha 
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for 
breakpt 
A list where each element stores a vector of breakpoints if
prespecifying where they may occur for each animal ID. By default this is
set to 
This model is run in parallel using the future package. To ensure that
the model is run in parallel, future::plan must be called
with future::multisession as the argument (suitable for most operating
systems) before running segment_behavior. Otherwise, the model will run
sequentially by default.
A list of model results is returned where elements include the breakpoints, number of breakpoints, and log marginal likelihood at each iteration of the MCMC chain for all animal IDs. The time it took the model to finish running for each animal ID is also stored and returned.
#load data
data(tracks.list)
#subset only first track
tracks.list <- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
future::plan(future::multisession, workers = 3) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
future::plan(future::sequential) #return to single core
This Shiny application allows for the exploration of animal movement patterns. Options are available to interactively filter the plotted tracks by a selected time period of a given variable, which is then displayed on an interactive map. Additionally, a data table is shown with options to filter and export this table once satisfied.
shiny_tracks(data, epsg)
data 
A data frame that must contain columns labeled 
epsg 
numeric. The coordinate reference system (CRS) as an EPSG code. 
Currently, the time series plot shown for the exploration of individual
tracks cannot display variables of class character or factor.
Therefore, these should be changed to numeric values if they are to be
plotted.
If the data are stored as longitude and latitude (i.e., WGS84), the EPSG code is 4326. All other codes will need to be looked up if they are not already known.
## Not run:
#load data
data(tracks)
#run Shiny app
shiny_tracks(data = tracks, epsg = 32617)
## End(Not run)
Internal function that stores z from all iterations after burn-in
StoreZ(z, store_z, nobs)
z 
An integer vector. 
store_z 
An integer matrix. 
nobs 
An integer. 
Prepares the data that has already been segmented for clustering by Latent Dirichlet Allocation. This function summarizes the counts observed per movement variable bin within each track segment per animal ID.
summarize_tsegs(dat, nbins)
dat 
A data frame of only the animal ID, track segment number,
and the discretized data for each movement variable. Animal ID and time
segment must be the first two columns of this data frame. This should be a
simplified form of the output from 
nbins 
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within

A new data frame that contains the animal ID, track segment number,
and the counts per bin for each movement variable. The names for each of
these bins are labeled according to the order in which the variables were
provided to summarize_tsegs.
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2 <- tracks.seg[,c("id","tseg","SL","TA")]
#run function
obs <- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
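For intuition about what the summary contains, the base R sketch below counts how often each bin occurs within each track segment for one variable of a single animal ID. It is a simplified illustration on made-up data, not the package's implementation: `count_bins` is a hypothetical helper, and the column naming scheme is an assumption.

```r
# Made-up segmented data for one animal ID
dat <- data.frame(id = "id1",
                  tseg = c(1, 1, 1, 2, 2),
                  SL = c(1, 1, 3, 2, 2),
                  TA = c(4, 4, 1, 2, 3))

# Hypothetical helper: counts per bin (columns) within each segment (rows).
# Assumes nbins > 1 so that vapply returns a matrix.
count_bins <- function(dat, var, nbins) {
  out <- t(vapply(split(dat[[var]], dat$tseg),
                  tabulate, integer(nbins), nbins = nbins))
  colnames(out) <- paste0(var, ".", seq_len(nbins))  # naming is an assumption
  out
}

sl_counts <- count_bins(dat, "SL", nbins = 3)
ta_counts <- count_bins(dat, "TA", nbins = 4)
```

Using `tabulate` with a fixed `nbins` keeps bins with zero observations in the output, so every segment has the same number of columns.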
Internal function that summarizes bin distributions of track segments
summarize1(VecVals, Breakpts, nobs, nbins, nbreak)
VecVals 
A vector of bin values. 
Breakpts 
A vector of breakpoints. 
nobs 
The number of observations. 
nbins 
The number of bins for a given data stream. 
nbreak 
The number of estimated breakpoints. 
Internal function that generates nmat matrix to help with multinomial draws
SummarizeDat(z, dat, ncateg, nbehav, nobs)
z 
An integer vector. 
dat 
An integer vector. 
ncateg 
An integer. 
nbehav 
An integer. 
nobs 
An integer. 
Visualize traceplots of the number of breakpoints estimated by the model as well as the log marginal likelihood (LML) for each animal ID.
traceplot(data, type)
data 
A list of model results that is returned as output from 
type 
character. The type of data that are being plotted from the Bayesian segmentation model results. Takes either 'nbrks' for the number of breakpoints or 'LML' for the log marginal likelihood. 
Traceplots for the number of breakpoints or the log marginal likelihood are displayed for each of the animal IDs that were analyzed by the segmentation model.
#load data
data(tracks.list)
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2 <- purrr::map(tracks.list,
                           subset,
                           select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha <- 1
ngibbs <- 1000
nbins <- c(5,8)
future::plan(future::multisession, workers = 3) #run all MCMC chains in parallel
dat.res <- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins,
                            alpha = alpha)
future::plan(future::sequential) #return to single core
#run function
traceplot(data = dat.res, type = "nbrks")
traceplot(data = dat.res, type = "LML")
A dataset containing the IDs as well as x and y coordinates for three tracks of 5001 observations each (15,003 in total).
tracks
A data frame with 15003 rows and 4 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
A dataset containing the prepared data after discretizing step lengths and turning angles, as well as filtering observations at the primary time step.
tracks.list
A list with three elements, each containing a data frame with ~4700 rows and 11 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
the step length calculated as the distance between successive locations measured in units
the relative turning angle measured in radians
the time step or sampling interval between datetimes of successive observations
the ordered number of observations per ID before filtering for the primary time step
the ordered number of observations per ID after filtering for the primary time step
discretized step lengths, separated into five bins
discretized turning angles, separated into eight bins
A dataset containing the filtered track data with time segments assigned to all observations on an individual basis.
tracks.seg
A data frame with 14096 rows and 12 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
the step length calculated as the distance between successive locations measured in units
the relative turning angle measured in radians
the time step or sampling interval between datetimes of successive observations
the ordered number of observations per ID before filtering for the primary time step
the ordered number of observations per ID after filtering for the primary time step
discretized step lengths, separated into five bins
discretized turning angles, separated into eight bins
time segment assigned to a given set of observations per ID