Title: | Expectation-Maximization Binary Clustering |
---|---|
Description: | Unsupervised, multivariate, binary clustering for meaningful annotation of data, taking into account the uncertainty in the data. A specific constructor for trajectory analysis in movement ecology yields behavioural annotation of trajectories based on estimated local measures of velocity and turning angle, optionally with a solar position covariate as a daytime indicator ("Expectation-Maximization Binary Clustering for Behavioural Annotation"). |
Authors: | Joan Garriga, John R.B. Palmer, Aitana Oltra, Frederic Bartumeus |
Maintainer: | Joan Garriga <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 2.0.4 |
Built: | 2024-10-27 06:37:36 UTC |
Source: | CRAN |
The Expectation-Maximization binary clustering (EMbC) is a general-purpose, unsupervised, multivariate clustering algorithm driven by two main motivations: (i) it looks for a good compromise between statistical soundness and ease and generality of use, by minimizing prior assumptions and favouring the semantic interpretation of the final clustering, and (ii) it allows taking into account the uncertainty in the data. These features make it especially suitable for the behavioural annotation of animal movement trajectories.
The method is a variant of the well-established Expectation-Maximization Clustering (EMC) algorithm (i.e. it assumes an underlying Gaussian Mixture Model (GMM) describing the distribution of the data set), but constrained to generate a binary partition of the input space. This is achieved by means of the *delimiters*, a set of parameters that discretize the input features into high and low values and define the binary regions of the input space. As a result, each final cluster includes a unique combination of either low or high values of the input variables. Splitting the input features into low and high values is what favours the semantic interpretation of the final clustering.
The initial assumptions implemented in the EMbC algorithm aim at minimizing biases and sensitivity to initial conditions: (i) each data point is assigned a uniform probability of belonging to each cluster, (ii) the prior mixture distribution is uniform (each cluster starts with the same number of data points), (iii) the starting partition (*i.e.* the initial position of the delimiters) is selected based on a global maximum variance criterion, thus conveying the minimum information possible.
The number of output clusters is $2^m$, determined by the number of input features $m$. This number is only an upper bound, as some of the clusters can be merged during the likelihood optimization process. The EMbC algorithm is intended to be used with no more than 5 or 6 input features, yielding a maximum of 32 or 64 clusters. This limitation in the number of clusters is consistent with the main motivation of the algorithm, which is to favour the semantic interpretation of the results.
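To make the $2^m$ binary partition concrete, the following base-R sketch maps each data point to one of the $2^m$ low/high combinations. It is only an illustration under simplifying assumptions: `binary_cluster_index` is a hypothetical helper (not part of EMbC), it uses a single threshold per variable (whereas the package stores delimiters per binary region), and it assumes that the first variable is the most significant one in the numbering (1 = LL, 2 = LH, 3 = HL, 4 = HH for two variables).

```r
# Hypothetical helper, not an EMbC function: index of the binary region
# (1..2^m) that each row of X falls into, given one threshold per variable.
binary_cluster_index <- function(X, delims) {
  stopifnot(is.matrix(X), ncol(X) == length(delims))
  m <- ncol(X)
  # 1 where the value is above its threshold (H), 0 otherwise (L)
  high <- sweep(X, 2, delims, FUN = ">") * 1
  # read each row as a binary number, first variable most significant
  as.vector(high %*% 2^((m - 1):0)) + 1
}

# Four points, one in each region of a bivariate (m = 2) space
X <- rbind(c(0.2, 0.1), c(0.2, 2.0), c(6.0, 0.1), c(6.0, 2.0))
idx <- binary_cluster_index(X, delims = c(1, 1))
print(idx)
```

With m = 5 or 6 the same indexing yields up to 32 or 64 regions, which is the upper bound mentioned above.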
The algorithm deals very intuitively with data reliability: the larger the uncertainty associated with a data point, the smaller the leverage of that data point in the clustering.
Compared to closely related methods like EMC and Hidden Markov Models (HMM), the EMbC is especially useful when: (i) we can expect bi-modality, to some extent, in the conditional distribution of the input features or, at least, we can assume that a binary partition of the input space can provide useful information, and (ii) a first order temporal dependence assumption, a necessary condition in HMM, cannot be guaranteed.
The EMbC R-package is mainly intended for the behavioural annotation of animal movement trajectories, where an easy interpretation of the final clustering and the reliability of the data constitute two key issues, and the conditions of bi-modality and unreliable temporal dependence usually hold. In particular, the temporal dependence condition is easily violated in animal movement trajectories because of the heterogeneity in empirical time series due to large gaps or pre-fixed sampling schedules.
Input movement trajectories are given either as a *data.frame* or a *Move* object from the **move** R-package. The package also handles stacks of trajectories for population level analysis. Segmentation is based on local estimates of velocity and turning angle, optionally including a solar position covariate as a daytime indicator.
The core clustering method is complemented with a set of functions to easily visualize and analyze the output:
* clustering statistics,
* clustering scatterplot (2D and 3D),
* temporal labeling profile (ethogram),
* plotting of intermediate variables,
* confusion matrix (numerical validation with respect to an expert's labeling),
* visual validation with external information (e.g. environmental data),
* generation of kml or webmap docs for detailed inspection of the output.
Also, some functions are provided to further refine the output, either by pre-processing (smoothing) the input data or by post-processing (smoothing, relabeling, merging) the output labeling.
The results obtained for different empirical datasets suggest that the EMbC algorithm behaves reasonably well for a wide range of tracking technologies, species, and ecological contexts (e.g. migration, foraging).
Joan Garriga [email protected]
Unless otherwise specified, a binClst instance refers to any of the binary clustering objects defined in the package, either a binClst object itself, or any of its child classes, a binClstPath or a binClstMove instance. The latter inherit all slots and functionality defined for the former.
binClst is a generic multivariate binary clustering object.
X
The input data set. A multivariate matrix where each row is a data point and each column is an input feature (a variable).
U
A multivariate matrix with the same dimensions as X holding the certainty values associated with each corresponding value in X. Certainties assign reliability to the data points, so that the less reliable a data point is, the smaller its leverage in the clustering. By default certainties are set to one for all variables of all data points.
stdv
A numeric vector with variable specific values for minimum standard deviation.
m
The number of input features.
k
The number of clusters.
n
The number of observations (data points).
R
A matrix with the values delimiting each binary region (the Reference values).
P
A list with the GMM (Gaussian Mixture Model) parameters. Each element of the list corresponds to a component of the GMM and it is a named-sublist itself, with elements '$M' (the component's mean) and '$S' (the component's covariance matrix).
W
An n*k matrix with the likelihood weights.
A
A numeric vector with the clustering labels (annotations) for each data-point (the basic output data). Labels are assigned based on the likelihood weights. Only in case of equal likelihoods the delimiters are used as a further criterion to assign labels.
L
The values of likelihood at each step of the optimization process.
C
Default color palette used for the plots. Can be changed by means of the setc() function.
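The relation between the slots P (GMM parameters), W (likelihood weights) and A (labels) can be pictured with a short base-R sketch. It is a plain GMM expectation step written for illustration, not the package's implementation: `dmvn` and `likelihood_weights` are hypothetical helpers, the certainties in U and the delimiters in R are ignored, and a uniform prior over components is assumed.

```r
# Hypothetical helpers, not EMbC functions.
# Multivariate normal density of each row of X for mean M and covariance S.
dmvn <- function(X, M, S) {
  Xc <- sweep(X, 2, M)                    # centre the data on the mean
  q <- rowSums((Xc %*% solve(S)) * Xc)    # squared Mahalanobis distances
  exp(-0.5 * q) / sqrt((2 * pi)^ncol(X) * det(S))
}

# n*k matrix of normalized weights, one column per GMM component in P,
# where P is a list of sublists with elements $M and $S.
likelihood_weights <- function(X, P, prior = rep(1 / length(P), length(P))) {
  dens <- lapply(seq_along(P), function(j) prior[j] * dmvn(X, P[[j]]$M, P[[j]]$S))
  dens <- matrix(unlist(dens), nrow = nrow(X))
  dens / rowSums(dens)
}

# Two well separated components and three data points
P <- list(list(M = c(0, 0), S = diag(2)), list(M = c(10, 10), S = diag(2)))
X <- rbind(c(0, 0), c(10, 10), c(0.5, 0))
W <- likelihood_weights(X, P)
A <- max.col(W, ties.method = "first")    # label = component with largest weight
print(A)
```

In the package, ties between likelihood weights are resolved with the delimiters rather than by taking the first maximum as this sketch does.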
Unless otherwise specified, a binClstPath instance refers to a binClstPath object itself, as well as its child class binClstMove. The latter inherits all slots and functionality defined for the former.
binClstPath is a binClst subclass for fast and easy speed/turn-clustering of movement trajectories. The input trajectory is given as a data.frame with, at least, the columns (timeStamp,longitude,latitude). This format is described in detail in the class constructor stbc. As a binClst subclass, this class inherits all slots and functionality of its parent class.
pth
A data.frame with the trajectory timestamps and geolocation coordinates, plus any extra columns that were included in the input path data frame (see the stbc constructor).
spn
A numeric vector with the time intervals between locations (in seconds).
dst
A numeric vector with the distances between locations (in meters). We use loxodromic computations.
hdg
A numeric vector with local heading directions (in radians from North). We use loxodromic computations.
bursted
A logical value indicating whether the binClstPath instance has already been bursted. As bursting can be computationally demanding for long trajectories, an instance is bursted only when a burstwise representation of the trajectory is requested for the first time (unless this value is changed to FALSE).
tracks
If bursted=TRUE, a SpatialLinesDataFrame object ("sp" R-package) with the bursted track segments.
midPoints
If bursted=TRUE, a SpatialPointsDataFrame object ("sp" R-package) with the bursted track midpoints.
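The dst and hdg slots rely on loxodromic (rhumb line) computations. The following base-R sketch shows the standard rhumb line formulas for a single pair of locations. It is an illustration only: `rhumb` is a hypothetical helper, the spherical Earth radius of 6371000 m is an assumption, and the package's own implementation may differ in detail.

```r
# Hypothetical helper, not an EMbC function: rhumb line distance (meters)
# and heading (radians from North) between two locations given in degrees.
rhumb <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- phi2 - phi1
  # longitude difference wrapped to [-pi, pi) (shorter way around)
  dlam <- ((lon2 - lon1) * to_rad + pi) %% (2 * pi) - pi
  # difference in isometric (Mercator) latitude
  dpsi <- log(tan(pi / 4 + phi2 / 2) / tan(pi / 4 + phi1 / 2))
  # stretching factor; falls back to cos(latitude) on east-west courses
  q <- if (abs(dpsi) > 1e-12) dphi / dpsi else cos(phi1)
  list(dst = radius * sqrt(dphi^2 + q^2 * dlam^2),
       hdg = atan2(dlam, dpsi) %% (2 * pi))
}

east  <- rhumb(0, 0, 1, 0)   # one degree due East along the equator
north <- rhumb(0, 0, 0, 1)   # one degree due North along a meridian
print(c(east$dst, east$hdg, north$dst, north$hdg))
```

The helper handles one pair of scalars at a time; a vectorized version would replace the `if` with `ifelse`.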
binClstStck is a special class for population level speed/turn-clustering of movement trajectories, given either as path data.frames or move objects.
bCS
A list of either binClstPath or binClstMove objects, depending on how the input paths are given.
bC
A binClst instance with the global speed/turn clustering of the paths in the stack.
bkml generates a burstwise .kml file of a binClstPath instance, which can be viewed using Google Earth or other GIS software. On the first call, this command can take some time because the bursted segmentation has to be computed.
bkml(obj, folder = "embcDocs", markerRadius = 15, display = FALSE)
## S4 method for signature 'binClstPath'
bkml(obj, folder = "embcDocs", markerRadius = 15, display = FALSE)
obj |
|
folder |
A character string indicating the name of the folder in which the .kml file will be saved. If the folder does not exist it is automatically created, (defaults to '~/embcDocs'). |
markerRadius |
A numeric value indicating the radius of the markers to be plotted (defaults to 15 pixels). |
display |
A boolean value (defaults to FALSE) to automatically launch Google-Earth from within R to visualize the generated .kml document. (Google Earth must already be installed on the system. In Windows, it must be associated with the .kml file type.) |
The path/name of the saved kml file.
## Not run:
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- generate a burstwise kml of the output --
bkml(mybcp)
## End(Not run)
bmap generates a burstwise .html file map of a binClstPath instance in HTML5, using Google Maps JavaScript API v3 (https://developers.google.com/maps/documentation/javascript/). The resulting file can be viewed locally in most browsers (an internet connection is required for displaying the map tiles) or posted online.
bmap(obj, folder = "embcDocs", apiKey = "", mapType = "SATELLITE", markerRadius = 15, display = FALSE)
## S4 method for signature 'binClstPath'
bmap(obj, folder = "embcDocs", apiKey = "", mapType = "SATELLITE", markerRadius = 15, display = FALSE)
obj |
|
folder |
A character string indicating the name of the folder in which the .html file will be saved. If the folder does not exist it is automatically created, (defaults to '~/embcDocs'). |
apiKey |
A character string specifying the API Key to be passed to the Google Maps server. No Key is needed for using Google Maps JavaScript API v3, but users may wish to specify a key in order to monitor web traffic if the document is being posted online. |
mapType |
A character string specifying the type of map to be used in the background. This value is passed directly to the Google Maps server, and currently can be set to ROADMAP, SATELLITE, HYBRID, or TERRAIN. (See the Google Maps API documentation for more information.) |
markerRadius |
A numeric value indicating the radius of the markers to be plotted (defaults to 15 pixels). |
display |
A boolean value (defaults to FALSE) to automatically launch the system's default browser from within R to visualize the generated .html document. |
The path/name of the saved .html file.
## Not run:
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- generate a burstwise HTML of the output --
bmap(mybcp)
## End(Not run)
Plots the labeling profile of a binClst instance against a control variable (e.g. environmental information) depicted as background coloured bars.
chkp(obj, ...)
## S4 method for signature 'binClst'
chkp(obj, ctrlLbls = NULL, ctrlClrs = NULL, ctrlLgnd = NULL, lims = NULL)
obj |
|
... |
Parameters |
ctrlLbls |
A numeric vector with the control labels or a string specifying one of 'height', 'azimuth' or 'both' solar covariates. By default, for a binClstPath instance it is set to the solar height covariate, regardless of whether it was used for the clustering. |
ctrlClrs |
A vector of colors to depict the control labeling. At least one colour should be specified for each different control label. By default white/grey colours are used for the default control labels. |
ctrlLgnd |
A vector of strings identifying the labels for the legend of the plot. They are automatically generated for the solar covariates. |
lims |
A numeric vector with lower and upper bounds to limit the plot. |
# -- apply EMbC to expth --
mybcp <- stbc(expth)
# -- plot the labeling profile against 'both' solar covariates --
chkp(mybcp,ctrlLbls='both',ctrlClrs=RColorBrewer::brewer.pal(8,'Oranges')[1:4])
cnfm computes the confusion matrix of the clustering with respect to an expert/reference labeling of the data. It can also be used to compare the labelings of two different clusterings of the same trajectory (see details).
cnfm(obj, ref, ...)
## S4 method for signature 'binClst,numeric'
cnfm(obj, ref, ret = FALSE, ...)
## S4 method for signature 'binClstPath,missing'
cnfm(obj, ref, ret = FALSE, ...)
## S4 method for signature 'binClstStck,missing'
cnfm(obj, ref, ret = FALSE, ...)
## S4 method for signature 'binClst,binClst'
cnfm(obj, ref, ret = FALSE, ...)
obj |
A binClst_instance or |
ref |
Either a numeric vector with an expert/reference labeling of the data, or a second binClst instance (see details). |
... |
Parameters |
ret |
A boolean value (defaults to FALSE). If ret=TRUE the confusion matrix is returned as a matrix object. |
The confusion matrix yields marginal counts and Recall for each row, and marginal counts, Precision and class F-measure for each column. The 3x2 subset of cells at the bottom right shows (in this order): the overall Accuracy, the average Recall, the average Precision, NaN, NaN, and the overall Macro-F-Measure. The number of classes (expert/reference labeling) should match or, at least, not be greater than the number of clusters. The overall value of the Macro-F-Measure is an average of the class F-measure values, hence it is underestimated if the number of classes is lower than the number of clusters.
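The quantities named above can be reproduced with a few lines of base R. The sketch below is an illustration under stated assumptions, not the code behind cnfm: `confusion_metrics` is a hypothetical helper, rows are taken to be the reference classes and columns the clusters, and the macro F-measure is taken as the plain average of the class F-measures.

```r
# Hypothetical helper, not an EMbC function: confusion matrix and the
# derived per-class and overall scores for two label vectors.
confusion_metrics <- function(pred, ref) {
  lv <- sort(union(pred, ref))
  # rows: reference classes, columns: predicted clusters
  cm <- table(ref = factor(ref, lv), pred = factor(pred, lv))
  tp <- diag(cm)
  recall <- tp / rowSums(cm)        # per row (reference class)
  precision <- tp / colSums(cm)     # per column (cluster)
  f <- 2 * precision * recall / (precision + recall)
  list(matrix = cm, recall = recall, precision = precision, f_measure = f,
       accuracy = sum(tp) / sum(cm), macro_f = mean(f, na.rm = TRUE))
}

pred <- c(1, 1, 2, 2, 2, 1)   # clustering labels
ref  <- c(1, 1, 2, 2, 1, 2)   # expert labels
m <- confusion_metrics(pred, ref)
print(m$matrix)
print(c(accuracy = m$accuracy, macro_f = m$macro_f))
```

Clusters with no matching reference class produce NaN precision or F-measure values, which this sketch drops from the macro average.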
If obj is a binClstPath instance and there is a column "lbl" in the obj@pth slot with an expert labeling, this labeling will be used by default.
If obj is a binClstStck instance and, for all paths in the stack, there is a column "lbl" in the obj@pth slot of each, this labeling will be used to compute the confusion matrix for the whole stack.
If obj and ref are both a binClst instance (e.g. smoothed versus non-smoothed), the confusion matrix compares both labelings.
If ret=TRUE returns a matrix with the confusion matrix values.
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- compute the confusion matrix --
cnfm(mybcp,expth$lbl)
# -- as we have expth$lbl the following also works --
cnfm(mybcp,mybcp@pth$lbl)
# -- or simply --
cnfm(mybcp)
# -- numerical differences with respect to the smoothed clustering --
cnfm(mybcp,smth(mybcp))
embc implements the core function of the Expectation-Maximization multivariate binary clustering.
embc(X, U = NULL, stdv = NULL, maxItr = 200, info = 0)
X |
The input data set. A multivariate matrix where each row is a data point and each column is an input feature (a variable). |
U |
A multivariate matrix with the same dimensions as X holding the certainty values associated with each corresponding value in X. Certainties assign reliability to the data points, so that the less reliable a data point is, the smaller its leverage in the clustering. By default certainties are set to one (no uncertainty in any value in X). |
stdv |
A vector with bounds for the maximum precision of clusters, given as the minimum standard deviation for each variable (by default it is set to rep(sqrt(.Machine$double.eps),ncol(X))). |
maxItr |
A limit to the number of iterations in case of slow convergence (defaults to 200). |
info |
Level of information shown at each step: info=0 (default) shows step likelihood, number of clusters, and number of changing labels; info=1, include clustering statistics; info=2, include delimiters information; info<0, suppress any step information. |
Returns a binClst object.
# -- apply EMbC to the example set of data points x2d --
mybc <- embc(x2d@D)
A data.frame with a synthetically generated trajectory with column values (timeStamps, longitudes, latitudes, labels) and column headers ('dTm','lon','lat','lbl'). The order of the columns is important. Column headers can have any names but must be present. The only exception is the header of the labels column: if it is named 'lbl', it will be used automatically by any method that can make use of it.
See parameter pth of the stbc constructor.
lblp plots the labeling profile of a binClst instance.
lblp(obj, ref, ...)
## S4 method for signature 'binClst,missing'
lblp(obj, ref, lims = NULL, ...)
## S4 method for signature 'binClstStck,missing'
lblp(obj, ref, lims = NULL, ...)
## S4 method for signature 'binClst,numeric'
lblp(obj, ref, lims = NULL, ...)
## S4 method for signature 'binClst,binClst'
lblp(obj, ref, lims = NULL, ...)
obj |
|
ref |
Either a numeric vector with an expert's labeling profile, or a second binClst instance to be compared with the first. |
... |
Parameters |
lims |
A numeric vector with lower and upper bounds to limit the plot. |
# -- apply EMbC to the example path --
mybcp <- stbc(expth)
# -- plot the labeling profile comparing with expert labeling --
lblp(mybcp,expth$lbl)
# -- compare original and smoothed labeling profiles --
lblp(mybcp,smth(mybcp))
lkhp generates the likelihood optimization plot.
lkhp(obj, offSet = 1)
## S4 method for signature 'binClst'
lkhp(obj, offSet = 1)
## S4 method for signature 'list'
lkhp(obj, offSet = 1)
obj |
A |
offSet |
A numeric value indicating an offset to avoid the initial iterations. This is useful to see the likelihood evolution in the last iterations where the changes in likelihood are of different order of magnitude than those at the starting iterations. |
# -- apply EMbC to the example path --
mybcp <- stbc(expth)
# -- inspect the likelihood evolution --
lkhp(mybcp)
# -- avoid the initial values --
lkhp(mybcp,10)
pkml generates a pointwise KML file of a binClstPath instance, which can be viewed using Google Earth or other GIS software.
pkml(obj, folder = "embcDocs", markerRadius = 15, display = FALSE, ...)
## S4 method for signature 'binClstPath'
pkml(obj, folder, markerRadius, display, showClst = numeric(), ...)
obj |
|
folder |
A character string indicating the name of the folder in which the .kml file will be saved. If the folder does not exist it is automatically created, (defaults to '~/embcDocs'). |
markerRadius |
A numeric value indicating the radius of the markers to be plotted (defaults to 15 pixels). |
display |
A boolean value (defaults to FALSE) to automatically launch Google-Earth from within R to visualize the generated .kml document. (Google Earth must already be installed on the system. In Windows, it must be associated with the .kml file type.) |
... |
Parameters |
showClst |
A numeric vector indicating a subset of clusters to be shown. |
The path/name of the saved kml file.
## Not run:
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- generate a pointwise .kml of the output --
pkml(mybcp)
# -- show only stopovers and automatically display the .kml document --
pkml(mybcp,showClst=c(1,2),display=TRUE)
## End(Not run)
pmap generates a pointwise .html file map of a binClstPath instance in HTML5, using Google Maps JavaScript API v3 (https://developers.google.com/maps/documentation/javascript/). The resulting file can be viewed locally in most browsers (an internet connection is required for displaying the map tiles) or posted online.
pmap(obj, folder = "embcDocs", apiKey = "", mapType = "SATELLITE", markerRadius = 15, display = FALSE)
## S4 method for signature 'binClstPath'
pmap(obj, folder = "embcDocs", apiKey = "", mapType = "SATELLITE", markerRadius = 15, display = FALSE)
obj |
|
folder |
A character string indicating the name of the folder in which the .html file will be saved. If the folder does not exist it is automatically created, (defaults to '~/embcDocs'). |
apiKey |
A character string specifying the API Key to be passed to the Google Maps server. No Key is needed for using Google Maps JavaScript API v3, but users may wish to specify a key in order to monitor web traffic if the document is being posted online. |
mapType |
A character string specifying the type of map to be used in the background. This value is passed directly to the Google Maps server, and currently can be set to ROADMAP, SATELLITE, HYBRID, or TERRAIN. (See the Google Maps API documentation for more information.) |
markerRadius |
A numeric value indicating the radius of the markers to be plotted (defaults to 15 pixels). |
display |
A boolean value (defaults to FALSE) to automatically launch the system's default browser from within R to visualize the generated .html document. |
The path/name of the saved html file.
## Not run:
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- generate a pointwise HTML of the output --
pmap(mybcp)
## End(Not run)
rlbl performs a manual relabeling of clusters (to merge clusters or relabel merged clusters).
rlbl(obj, old = 0, new = 0, reset = FALSE)
## S4 method for signature 'binClst'
rlbl(obj, old = 0, new = 0, reset = FALSE)
obj |
|
old |
The number of the cluster to be relabeled. |
new |
The new number of the cluster. |
reset |
A boolean value (defaults to FALSE). If reset=TRUE the labeling is reset to the original state. |
Whenever two adjacent clusters are merged, the label identifying the splitting variable between them is meaningless, and the algorithm ends up assigning either an L or an H depending only on how it evolved until reaching the merging point. Thus the final labeling of the resulting cluster may not be the most intuitive one. With this method the labels can be changed as desired. It can also be used to manually force the merging of two clusters.
This method does not return a relabeled copy of the input obj; instead, the binClst instance itself is relabeled. However, this is intended only for output and visualization purposes (sctr(), lblp(), cnfm(), view()), as the binClst instance parameters (GMM parameters and binary delimiters) are not recomputed. Thus the input instance can always be reset to its original state.
This method does not return a relabeled copy of the input obj; instead, the binClst instance itself is relabeled. It is intended only for visualization purposes, as it recomputes neither the GMM parameters nor the binary delimiters of the binClst instance.
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- manually merge clusters 1 and 2 --
rlbl(mybcp,1,2)
# -- reset to the original state --
rlbl(mybcp,reset=TRUE)
sct3 generates a dynamic 3D-scatterplot of a multivariate binClst instance, showing clusters in different colors. The scatter plot can be zoomed/rotated with the mouse.
sct3(obj, ...)
## S4 method for signature 'binClst'
sct3(obj, showVars = NULL, showClst = NULL, ...)
obj |
|
... |
Parameters |
showVars |
When the number of variables is greater than two, a length 3 numeric vector indicating one splitting variable and two variables to be scattered (given in that order). |
showClst |
When the number of variables is greater than two, a numeric vector (of variable length) indicating a subset of the clusters that will be shown in the scatter plot. This is useful in case of overlapping clusters. |
This function needs the package "rgl" to be installed.
## Not run:
# -- apply EMbC to the example path with scv='height' --
mybcp <- stbc(expth,scv='height')
# -- show a dynamic 3D-scatterplot --
sct3(mybcp)
# -- show only a subset of clusters --
sct3(mybcp,showClst=c(2,4,6))
## End(Not run)
sctr generates a scatterplot from a binClst instance, showing clusters in different colors.
sctr(obj, ...)
## S4 method for signature 'binClst'
sctr(obj, ref = NULL, showVars = NULL, showClst = NULL, bg = NULL, ...)
## S4 method for signature 'binClstStck'
sctr(obj, ref = NULL, showVars = NULL, showClst = NULL, ...)
obj |
|
... |
Parameters |
ref |
Either a numeric vector with an expert/reference labeling for visual validation of the clustering, or a second binClst instance to be compared with the first. |
showVars |
When the number of variables is greater than two, a length 3 numeric vector indicating one splitting variable and two variables to be scattered (given in that order). |
showClst |
When the number of variables is greater than two, a numeric vector (of variable length) indicating a subset of the clusters that will be shown in the scatter plot. This is useful in case of overlapping clusters. |
bg |
A valid colour to be used as background colour for multivariate scatterplots. By default a light-grey colour is used to enhance data points visibility. |
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- show the scatterplot compared with expert labeling --
sctr(mybcp,expth$lbl)
setc sets the color palette to a color family from the RColorBrewer package.
setc(bC, fam = "RdYlBu")
bC |
|
fam |
The name of a color family from the RColorBrewer R-package (the default color palette is 'RdYlBu', which is colorblind safe and print friendly up to 6 colors). |
# -- change the color palette of mybc to "PuOr" --
## Not run:
setc(mybc,'PuOr')
## End(Not run)
slct selects a single path from a binClstStck instance.
slct(stck, pathNmbr)
stck |
A |
pathNmbr |
The number of the single path to be selected. |
Returns the single binClstPath_instance selected.
## Not run:
# -- select path number 3 in mybcpstack --
bcp3 <- slct(mybcpstack,3)
## End(Not run)
smth performs a posterior smoothing of single local labels (locations whose label differs from that of their neighbouring locations while the latter have equal labels).
smth(obj, dlta = 1)
## S4 method for signature 'binClst'
smth(obj, dlta = 1)
## S4 method for signature 'binClstStck'
smth(obj, dlta = 1)
obj |
Either a |
dlta |
A numeric value in the range (0,1) (default is 1) indicating the user's willingness to accept a change of label. The change of label is done whenever the decrease in likelihood is not greater than |
A smoothed copy of the input instance. In the case of a binClstStck instance, smoothing is performed at population level as well as for each individual trajectory in the stack.
# -- cluster the example path with a prior smooth of 1 hour --
mysmoothbcp <- stbc(expth,smth=1,info=-1)
# -- apply a posterior smoothing --
mysmoothbcpsmoothed <- smth(mysmoothbcp,dlta=0.5)
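To make the notion of a "single local label" concrete, the following base R sketch relabels any location whose two neighbours agree with each other but not with it. The helper `smooth_single_labels` is hypothetical and not part of EMbC. It deliberately leaves out the likelihood-based acceptance test controlled by dlta, so it only illustrates the idea and does not reproduce what smth does.

```r
# Hypothetical helper (not part of EMbC): relabel isolated single labels.
# NOTE: the likelihood check governed by 'dlta' is intentionally omitted.
smooth_single_labels <- function(lbl) {
  n <- length(lbl)
  if (n < 3) return(lbl)
  out <- lbl
  i <- 2:(n - 1)
  # a location is isolated when both neighbours share a label that differs from its own
  isolated <- lbl[i - 1] == lbl[i + 1] & lbl[i] != lbl[i - 1]
  # detection uses the original labels, so a change never triggers further changes
  out[i[isolated]] <- lbl[i[isolated] - 1]
  out
}

labels <- c(1, 1, 3, 1, 1, 2, 2, 4, 2, 3)
smoothed <- smooth_single_labels(labels)
print(smoothed)
```

In this toy sequence only locations 3 and 8 are relabelled. The first and last locations are never changed because they have a single neighbour.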
stbc
is a specific constructor for movement ecology purposes. By default it implements a bivariate (speed/turn) clustering for behavioural annotation of animals' movement trajectories. Alternatively, it can perform a trivariate clustering by including a solar position covariate (i.e. solar height or solar azimuth) as a daytime indicator.
stbc(
  obj,
  stdv = c(0.1, 5 * pi/180),
  spdLim = 40,
  smth = 0,
  scv = "None",
  maxItr = 200,
  info = 0
)
obj |
A A A |
stdv |
A vector with bounds for the maximum precision of clusters, given as the minimum standard deviation for each variable (by default 0.1 m/s for velocities and 5 degrees, i.e. 5 * pi/180 radians, for turns). |
spdLim |
A speed limit for automatic detection of outliers. Trajectory locations with associated speed values above spdLim are not eliminated but play no part in the clustering. By default it is set to 40 m/s. |
smth |
A smoothing time interval in hours. This is used to estimate local values of speed and turn computed as an average over a time window centered at each location. |
scv |
A solar position covariate to be used as a daytime indicator. It can be either 'height' (the solar height in degrees above the horizon) or 'azimuth' (the solar azimuth in degrees from north). If it is used, a trivariate clustering is performed, which raises the number of clusters (behaviours) that can potentially be identified to a maximum of 8. By default this value is set to "None" (i.e. perform the standard bivariate speed/turn clustering). |
maxItr |
A limit to the number of iterations in case of slow convergence (defaults to 200). |
info |
Level of information shown at each step: info=0 (default) shows step likelihood, number of clusters, and number of changing labels; info=1, include clustering statistics; info=2, include delimiters information; info<0, suppress any step information. |
Returns a binClstPath object.
# -- apply EMbC to the example path --
mybcp <- stbc(expth)
## Not run: 
# --- binary clustering of a Move object ---
require(move)
mybcm <- stbc(move(system.file("extdata","leroy.csv.gz",package="move")))
# --- binary clustering of a stack of trajectories ---
mybcm <- stbc(list(mypth1,mypth2,mypth3))
## End(Not run)
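The bivariate clustering operates on local speed and turn values. As a rough illustration of how such values could be derived from timestamped longitude/latitude fixes, the base R sketch below computes span times, great-circle distances, headings, speeds and absolute turns. The function `local_speed_turn`, its argument names and the spherical Earth radius are assumptions made for this example. It is not the computation performed inside stbc and it applies no smoothing window.

```r
# Illustrative only: hypothetical helper, not the EMbC implementation.
# lon/lat in decimal degrees, time_s in seconds; a spherical Earth is assumed.
local_speed_turn <- function(lon, lat, time_s, radius = 6371000) {
  rad <- pi / 180
  n <- length(lon)
  lat1 <- lat[-n] * rad
  lat2 <- lat[-1] * rad
  dlon <- (lon[-1] - lon[-n]) * rad
  # span times (s) between consecutive fixes
  spn <- diff(time_s)
  # haversine great-circle distances (m)
  a <- sin((lat2 - lat1) / 2)^2 + cos(lat1) * cos(lat2) * sin(dlon / 2)^2
  dst <- 2 * radius * asin(pmin(1, sqrt(a)))
  # initial heading of each step (radians, clockwise from north)
  hdg <- atan2(sin(dlon) * cos(lat2),
               cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dlon))
  # absolute turn between consecutive headings, wrapped into [0, pi]
  dh <- abs(diff(hdg)) %% (2 * pi)
  turn <- pmin(dh, 2 * pi - dh)
  list(speed = dst / spn, turn = turn, spn = spn, dst = dst, hdg = hdg)
}

# Three fixes one minute apart: a step due north followed by a step due east.
res <- local_speed_turn(lon = c(0, 0, 0.01),
                        lat = c(0, 0.01, 0.01),
                        time_s = c(0, 60, 120))
print(res$speed)  # m/s for each step
print(res$turn)   # radians, one value per pair of consecutive steps
# Analogue of the spdLim idea: flag (rather than drop) implausibly fast steps.
too_fast <- res$speed > 40
```

The last line mirrors the behaviour described for spdLim: fast steps are flagged rather than removed from the trajectory.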
stts
prints clustering statistics information.
stts(obj, dec = 2, width = 8)

## S4 method for signature 'binClst'
stts(obj, dec = 2, width = 8)

## S4 method for signature 'binClstStck'
stts(obj, dec = 2, width = 8)
obj |
Either a binClst_instance or a |
dec |
The number of decimals for mean/stdv formatting. |
width |
The number of digits for mean/stdv formatting. |
This method prints a line for each cluster with the following information: the cluster number, the cluster binary label, the cluster mean and variance of each input feature (two columns for each variable), and the size of the cluster in number and proportion of points (the posterior marginal distribution).
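Each cluster's binary label encodes a low (L) or high (H) value for every input variable, so m variables give at most 2^m clusters: 4 for the bivariate speed/turn case and 8 when a solar covariate is added. The base R sketch below simply enumerates these combinations. The helper `binary_labels`, the variable names and the row ordering are assumptions made for illustration and need not match the numbering that stts prints.

```r
# Hypothetical helper (not part of EMbC): enumerate low/high label combinations.
binary_labels <- function(vars) {
  grid <- expand.grid(rep(list(c("L", "H")), length(vars)),
                      stringsAsFactors = FALSE)
  # expand.grid varies its first column fastest; reverse the columns so that
  # the last variable varies fastest instead (ordering chosen for readability)
  grid <- grid[, rev(seq_along(vars)), drop = FALSE]
  names(grid) <- vars
  # paste the per-variable L/H values row-wise into a single label
  grid$label <- do.call(paste0, grid[vars])
  grid
}

labs2 <- binary_labels(c("speed", "turn"))
labs3 <- binary_labels(c("speed", "turn", "solar"))
print(labs2)
print(nrow(labs3))
```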
# -- apply EMbC to the example path with solar covariate 'height' --
mybcp <- stbc(expth,scv='height',info=-1)
# -- show clustering statistics --
stts(mybcp,width=5,dec=1)
## Not run: 
# -- show clustering statistics of mybcpstack at stack level --
stts(mybcpstack)
# -- show individual statistics for path number 3 in mybcpstack --
stts(slct(mybcpstack,3))
## End(Not run)
varp
provides an easy plot of the input, output and intermediate variables of a binClstPath_instance.
varp(obj, ...)

## S4 method for signature 'binClstPath'
varp(obj, lims = NULL, ...)

## S4 method for signature 'matrix'
varp(obj, lims = NULL, ...)
obj |
Either a matrix or a binClstPath_instance. |
... |
Parameter |
lims |
A numeric vector with lower and upper bounds to limit the plot. |
If obj is a matrix, axes labels are automatically generated from the colnames() of the matrix, hence they can be changed as desired.
If obj is a binClstPath_instance, it plots the values of the intermediate computations saved in slots mybcp@spn (span times), mybcp@dst (distances) and mybcp@hdg (local heading directions).
# -- apply EMbC to the example path --
mybcp <- stbc(expth,info=-1)
# -- plot clustering data points --
varp(mybcp@X)
# -- plot data points' certainties --
varp(mybcp@U)
# -- plot intermediate computations (span-times, distances and headings) in one figure --
varp(mybcp)
## Not run: 
# -- plot only span-times between locations a and b --
plot(seq(a,b),mybcp@spn[a:b],col=4,type='l',xlab='loc',ylab='spanTime (s)')
## End(Not run)
view
provides a fast plot of a segmented trajectory or specific chunks of it.
view(obj, ...)

## S4 method for signature 'binClstPath'
view(obj, lbl = NULL, lims = NULL, bg = NULL, ...)

## S4 method for signature 'data.frame'
view(obj, lbl = NULL, lims = NULL, bg = NULL, ...)
obj |
A binClstPath_instance or a data.frame with the format
described for slot |
... |
Parameters |
lbl |
A numeric vector with location labels. If |
lims |
A numeric vector with lower and upper limit locations to show only a chunk of the trajectory. |
bg |
A valid colour to be used as background colour. By default a light-grey colour is used to enhance data points visibility. |
# -- Fast view of the binClstPath instance included in the package --
view(expth)
# -- the same with reference labels --
view(expth,lbl=TRUE)
An ad-hoc object with a set of bivariate data points synthetically generated by sampling from a four component GMM and their corresponding labels indicating which component of the mixture generated each data point.
See parameter X of the embc constructor.