Title: | Spatial Entropy Measures |
---|---|
Description: | The heterogeneity of spatial data presenting a finite number of categories can be measured via computation of spatial entropy. Functions are available for the computation of the main entropy and spatial entropy measures in the literature. They include the traditional version of Shannon's entropy (Shannon, 1948 <doi:10.1002/j.1538-7305.1948.tb01338.x>), Batty's spatial entropy (Batty, 1974 <doi:10.1111/j.1538-4632.1974.tb01014.x>), O'Neill's entropy (O'Neill et al., 1998 <doi:10.1007/BF00162741>), Li and Reynolds' contagion index (Li and Reynolds, 1993 <doi:10.1007/BF00125347>), Karlstrom and Ceccato's entropy (Karlstrom and Ceccato, 2002 <https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-61351>), Leibovici's entropy (Leibovici, 2009 <doi:10.1007/978-3-642-03832-7_24>), Parresol and Edwards' entropy (Parresol and Edwards, 2014 <doi:10.3390/e16041842>) and Altieri's entropy (Altieri et al., 2018, <doi:10.1007/s10651-017-0383-1>). Full references for all measures can be found under the topic 'SpatEntropy'. The package is able to work with lattice and point data. The updated version works with the updated 'spatstat' package (>= 3.0-2). |
Authors: | L. Altieri, D. Cocchi, G. Roli |
Maintainer: | Altieri Linda <[email protected]> |
License: | GPL-3 |
Version: | 2.2-4 |
Built: | 2024-11-12 06:29:56 UTC |
Source: | CRAN |
This function computes spatial mutual information and spatial residual entropy as in Altieri et al (2017) and following works.
References can be found at SpatEntropy.
altieri(data, cell.size = 1, distbreak = "default", verbose = F, plotout = T)
data |
If data are lattice, a data matrix, which can be numeric, factor, character, ...
If the dataset is a point pattern, |
cell.size |
A single number or a vector of length two, only needed if data are lattice. It gives the length of the side of each pixel; if the pixel is rectangular, the first number gives the horizontal side and the second number gives the vertical side. Default to 1. Ignored if data are points. |
distbreak |
Numeric. The chosen distance breaks for selecting pairs of pixels/points within the observation area.
The default option is |
verbose |
Logical. If |
plotout |
Logical. Default to |
The computation of Altieri's entropy starts from a point or areal dataset, for which Shannon's entropy of the transformed variable $Z$ (for details see shannonZ)
$$H(Z)=\sum_r p(z_r)\log\left(\frac{1}{p(z_r)}\right)$$
is computed using all possible pairs within the observation area. Then, its two components, spatial mutual information
$$SMI(Z,W)=\sum_k p(w_k)\sum_r p(z_r|w_k)\log\left(\frac{p(z_r|w_k)}{p(z_r)}\right)$$
and spatial residual entropy
$$H(Z)_W=\sum_k p(w_k)\sum_r p(z_r|w_k)\log\left(\frac{1}{p(z_r|w_k)}\right)$$
are calculated in order to account for the overall role of space in determining the data heterogeneity. In addition, starting from a partition into distance classes, a list of adjacency matrices is built, which identifies which pairs of units must be considered for each class. Spatial mutual information and spatial residual entropy are split into local terms according to the chosen distance breaks, so that the role of space can be investigated both in absolute and relative terms. In the function output, the relative partial terms are returned so that they sum to 1 for each distance class: e.g. if the relative SPI term is 0.3 and the relative residual term is 0.7, the interpretation is that, at that specific distance class, 30% of the entropy is due to the role of space as a source of heterogeneity. The function is able to work with lattice data containing missing data, as long as they are specified as NA values. All NAs are ignored in the computation and only couples of non-NA observations are considered.
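The decomposition described above can be checked by hand on a small table. The following is a minimal base R sketch, not the package's implementation: the counts are made up, rows stand for distance classes $w_k$ and columns for pair categories $z_r$, and the relative terms are assumed to be each partial term divided by the sum of the two partial terms for that class.

```r
# Toy joint counts of pairs (made-up numbers for illustration):
# rows are distance classes w_k, columns are pair categories z_r.
counts <- matrix(c(30, 10, 10,
                   20, 20, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(W = c("w1", "w2"), Z = c("aa", "ab", "bb")))

p_joint <- counts / sum(counts)   # p(z_r, w_k)
p_w     <- rowSums(p_joint)       # p(w_k)
p_z     <- colSums(p_joint)       # p(z_r)
p_z_w   <- p_joint / p_w          # p(z_r | w_k), one row per distance class

# Shannon's entropy of Z
H_Z <- sum(p_z * log(1 / p_z))

# Partial terms, one per distance class
SPI_terms <- rowSums(p_z_w * log(sweep(p_z_w, 2, p_z, "/")))
RES_terms <- rowSums(p_z_w * log(1 / p_z_w))

# Global terms are the p(w_k)-weighted sums of the partial terms
SMI <- sum(p_w * SPI_terms)
RES <- sum(p_w * RES_terms)

# Relative partial terms (assumed normalisation): they sum to 1 within each class
rel_SPI <- SPI_terms / (SPI_terms + RES_terms)
rel_RES <- RES_terms / (SPI_terms + RES_terms)
```

In this sketch `SMI + RES` reproduces `H_Z`, which is the additive property the decomposition relies on. Categories with zero frequency would need to be dropped before taking logarithms.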
A list with elements:
distance.breaks
a two column matrix with the lower and upper extreme of each distance class
SPI.terms
the spatial partial information terms
rel.SPI.terms
the relative version of spatial partial information terms (see the details)
RES.terms
the spatial partial residual entropies
rel.RES.terms
the relative version of spatial partial residual entropies (see the details)
SMI
the spatial mutual information
RES
the global residual entropy
ShannonZ
Shannon's entropy of Z, in the same format as the output of shannonZ()
W.distribution
the spatial weights for each distance range
total.pairs
the total number of pairs over the area (realizations of Z)
class.pairs
the number of pairs for each distance range.
cond.Z.distribution
a list with the conditional absolute and relative frequencies of Z for each distance range
#lattice data
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
outp=altieri(data)
outp=altieri(data, cell.size=2) #same result
outp=altieri(data, cell.size=2, distbreak=c(2, 5))
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data))))),
     main="", ribbon=TRUE)

#lattice data with missing values
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
data=rbind(rep(NA, ncol(data)), data, rep(NA, ncol(data)))
outp=altieri(data)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=topo.colors(length(unique(c(data)[!is.na(c(data))]))),
     main="", ribbon=TRUE)

#point data
data=ppp(x=runif(400), y=runif(400), window=square(1),
         marks=(sample(c("a","b","c"), 400, replace=TRUE)))
outp=altieri(data)
outp=altieri(data, verbose=TRUE)
#plot data
plot(data, cols=1:length(unique(marks(data))), main="", pch=16)

#check what happens for badly specified distance breaks
#outp=altieri(data, distbreak=c(1,1.4))
#outp=altieri(data, distbreak=c(1,2))
This function partitions the observation area into a number of sub-areas, and assigns the data points/pixels to the areas. It is useful either when a random partition is needed, or when the user wants to set the areas' centroids and accepts a tessellation of the area into Voronoi polygons built from those centroids.
areapart(data, G, cell.size = 1, win = NULL, plotout = T)
data |
If data are lattice, a data matrix, which can be numeric, factor, character, ...
If the dataset is a point pattern, |
G |
An integer if sub-areas are randomly generated, determining the number |
cell.size |
A single number. If data are lattice, the length of the side of each pixel. Default to 1. Ignored if data are points. |
win |
Optional, the observation area given as a |
plotout |
Logical. Default to |
The function is preliminary to the computation of Batty's or Karlstrom and Ceccato's entropy. An event of interest (in the form of a point or binary areal dataset) occurs over an observation area divided into sub-areas. If the partition is random, this function generates the sub-areas by randomly drawing the areas' centroids over the observation window. Then, data points/pixels are assigned to the area with the closest centroid. When data are pixels, each pixel is assigned to an area according to the coordinates of its own centroid. The function also works for non-binary datasets and marked ppp objects.
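The nearest-centroid assignment described above can be sketched in base R. This is an illustration under stated assumptions, not the code used by areapart: the point coordinates and the three centroids are made up, and plain Euclidean distance is used.

```r
set.seed(1)
# Hypothetical inputs: 6 data points and G = 3 centroids in the unit square
pts  <- cbind(x = runif(6), y = runif(6))
cent <- cbind(x = c(0.2, 0.8, 0.5), y = c(0.2, 0.3, 0.9))

# Squared Euclidean distance from every point (rows) to every centroid (columns)
d2 <- outer(pts[, "x"], cent[, "x"], "-")^2 +
      outer(pts[, "y"], cent[, "y"], "-")^2

# Each point is assigned to the area whose centroid is closest
area <- max.col(-d2, ties.method = "first")

# Coordinates plus the area label (1 to G) for each point
assigned <- cbind(pts, area = area)
```

Assigning every location to its closest centroid is what produces Voronoi polygons, so the labels in `area` correspond to the tiles of the tessellation generated by `cent`.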
A list with elements:
G.pp
a point pattern containing the areas' centroids
data.assign
a four column matrix, with all pairs of data coordinates and data values
matched to one of the areas (numbered 1 to G). If the dataset is an unmarked ppp
object, the data category column is a vector of 1s.
Moreover, a plot is produced showing the data and the area partition.
#LATTICE DATA
data=matrix(sort(sample(c("a","b","c"), 100, replace=TRUE)), nrow=10)
partition=areapart(data, G=5)
partition=areapart(data, G=5, cell.size=2)

#providing a pre-fixed area partition
data=matrix(sort(sample(c("a","b","c"), 100, replace=TRUE)), nrow=10)
win=square(nrow(data))
GG=cbind(runif(5, win$xrange[1], win$xrange[2]),
         runif(5, win$yrange[1], win$yrange[2]))
partition=areapart(data, G=GG)

#POINT DATA
data=ppp(x=runif(100), y=runif(100), window=square(1))
partition=areapart(data, 10)

#with marks
data=ppp(x=runif(100), y=runif(100), window=square(1),
         marks=(sample(c("a","b","c"), 100, replace=TRUE)))
GG=cbind(runif(10, data$window$xrange[1], data$window$xrange[2]),
         runif(10, data$window$yrange[1], data$window$yrange[2]))
partition=areapart(data, G=GG)
This function computes Batty's spatial entropy, following Batty (1976), see also Altieri et al (2017 and following)
(references are under the topic SpatEntropy).
batty( data, category = 1, cell.size = 1, partition = 10, win = NULL, rescale = T, plotout = T )
data |
If data are lattice, a data matrix, which can be numeric, factor, character, ...
If the dataset is a point pattern, |
category |
A single value matching the data category of interest for computing Batty's entropy. Default to 1. If the dataset is an unmarked point pattern, this argument must not be changed from the default. In the plot, only data belonging to the selected category are displayed. |
cell.size |
A single number or a vector of length two, only needed if data are lattice. It gives the length of the side of each pixel; if the pixel is rectangular, the first number gives the horizontal side and the second number gives the vertical side. Default to 1. Ignored if data are points. |
partition |
Input defining the partition into subareas. If an integer, it defines the
number of sub-areas that are randomly generated by areapart; if a two column matrix
with coordinates, they are the centroids of the subareas built by areapart. Alternatively,
it can be the output of areapart, a |
win |
Optional, the observation area given as a |
rescale |
Logical. Default to |
plotout |
Logical. Default to |
Batty's spatial entropy measures the heterogeneity in the spatial distribution of a phenomenon of interest, with regard to an area partition. It is high when the phenomenon is equally intense over the sub-areas, and low when it concentrates in one or few sub-areas. This function computes Batty's entropy as
$$H_B=\sum_g p_g\log\left(\frac{T_g}{p_g}\right)$$
where $p_g$ is the probability of occurrence of the phenomenon over sub-area $g$, and $T_g$ is the sub-area size.
When data are categorical, the phenomenon of interest corresponds to
one category, which must be specified. If data are an unmarked
point pattern, a fake mark vector is created with the same category for all points.
For comparison purposes, the relative version of Batty's entropy is also returned, i.e.
Batty's entropy divided by its maximum $\log(\sum_g T_g)$.
Note that when the total observation area is 1, then $\log(\sum_g T_g)=0$; therefore,
in that case, all $T_g$ values are multiplied by 100 during the computation and a warning is produced.
The function is able to work with grids containing missing data, specified as NA values.
All NAs are ignored in the computation.
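As a worked example of the formula $H_B=\sum_g p_g\log(T_g/p_g)$ with maximum $\log(\sum_g T_g)$, the following base R sketch evaluates Batty's entropy by hand. The counts and sub-area sizes are made up, and the code is an illustration rather than the internals of batty().

```r
# Hypothetical partition into 4 sub-areas (made-up numbers)
n_g <- c(12, 5, 3, 20)      # occurrences of the category of interest per sub-area
T_g <- c(30, 20, 25, 25)    # sub-area sizes

p_g <- n_g / sum(n_g)       # probability of occurrence over each sub-area

# Sub-areas with p_g = 0 contribute nothing to the sum
keep <- p_g > 0
H_B  <- sum(p_g[keep] * log(T_g[keep] / p_g[keep]))

H_max <- log(sum(T_g))      # maximum of Batty's entropy
rel_H <- H_B / H_max        # relative version

# The maximum is reached when the intensity p_g / T_g is the same everywhere
p_unif <- T_g / sum(T_g)
H_unif <- sum(p_unif * log(T_g / p_unif))
```

Here `H_unif` equals `H_max`, which matches the statement that the entropy is highest when the phenomenon is equally intense over the sub-areas.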
A list of five elements:
batty
Batty's entropy
range
The theoretical range of Batty's entropy
rel.batty
Batty's entropy divided by $\log(\sum_g T_g)$ for comparison across observation areas.
areas
a dataframe giving, for each sub-area of the partition, the absolute and relative frequency of
the points/pixels of interest, the sub-area size and the intensity defined as $p_g/T_g$
area.tess
a tess object with the area partition
Moreover, a plot is produced showing the data and the area partition.
#LATTICE DATA
data=matrix((sample(c("a","b","c"), 100, replace=TRUE)), nrow=10)
batty.entropy=batty(data, category="a")

#POINT DATA
#unmarked pp
data=ppp(x=runif(100, 0, 10), y=runif(100, 0, 10), window=square(10))
batty.entropy=batty(data)

#smaller window so that some areas' size are smaller than 1
data=ppp(x=runif(100, 0, 3), y=runif(100, 0, 3), window=square(3))
batty.entropy=batty(data)

#marked pp
data=ppp(x=runif(100, 0, 10), y=runif(100, 0, 10), window=square(10),
         marks=(sample(1:5, 100, replace=TRUE)))
plot(data) #see ?plot.ppp for options
#if you want to compute the entropy on all points
batty.entropy=batty(unmark(data))
#if you want to compute the entropy on a category, say 3
batty.entropy=batty(data, category=3)
A lattice dataset with Bologna's Urban Morphological Zones (UMZ, see EEA, 2011).
bologna
A matrix with 135 rows and 124 columns. Values are either 0 (non-urban) or 1 (urban). Pixels
outside the administrative borders are classified as NA.
This raster/pixel/lattice dataset comes from the EU CORINE Land Cover project (EEA, 2011) and is dated 2011. It is the result of classifying the original land cover data into urbanised and non-urbanised zones, known as 'Urban Morphological Zones' (UMZ, see EEA, 2011). UMZ data are useful to identify shapes and patterns of urban areas, and thus to detect what is known as urban sprawl. Bologna's metropolitan area is extracted from the European dataset and is composed of the municipality of Bologna and the surrounding municipalities. The dataset is made of 135x124 pixels of size 250x250 metres.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(bologna)
#plot(as.im(bologna), main="", col=gray(c(0.8,0)), ribbon=FALSE)

#shannon's entropy
shannon(bologna)
#shannon's entropy of Z (urban/non-urban pairs)
shannonZ(bologna)
#oneill's entropy
oneill(bologna)

#leibovici's entropy on a subset of the window
bolsub=bologna[30:70,45:85]
plot(as.im(bolsub), main="", col=gray(c(0.8,0)), ribbon=FALSE)
leibovici(bolsub, cell.size=250, ccdist=400, verbose=TRUE)

#altieri's entropy
bolsub=bologna[30:70,45:85]
plot(as.im(bolsub), main="", col=gray(c(0.8,0)), ribbon=FALSE)
altieri(bolsub, cell.size=250, distbreak=c(250, 500), verbose=TRUE)

#batty's entropy
#on all points, with a random partition in 10 sub-areas
data(bolognaW) #load the window before it is used
batty.ent=batty(bologna, cell.size=250, partition=10, win=bolognaW)
#plot with partition
#plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.8,0)), ribbon=FALSE)
#plot(batty.ent$area.tess, add=TRUE, border=2)

#batty's entropy with a partition based on the administrative areas
data(bolognaTess)
batty.ent=batty(bologna, cell.size=250, partition=bolognaTess, win=bolognaW)
#plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.8,0)), ribbon=FALSE)
#for(i in 1:bolognaTess$n) plot(bolognaTess$tiles[[i]], add=TRUE, border=2)

#karlstrom and ceccato's entropy
data(bolognaW)
KC.ent=karlstrom(bologna, cell.size=250, partition=15, win=bolognaW, neigh=3)
#plot with partition
#plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.8,0)), ribbon=FALSE)
#plot(KC.ent$area.tess, add=TRUE, border=2)

#karlstrom and ceccato's entropy with a partition based on the administrative
#areas
data(bolognaTess)
KC.ent=karlstrom(bologna, cell.size=250, partition=bolognaTess, win=bolognaW,
                 neigh=10000, method="distance")
#plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.8,0)), ribbon=FALSE)
#for(i in 1:bolognaTess$n) plot(bolognaTess$tiles[[i]], add=TRUE, border=2)
City borders of all municipalities included in the Bologna dataset, in the format of polygonal window (owin) objects.
bolognaTess
A list of three:
tiles
a list of 11; each element is an owin object with the
administrative border of one municipality of Bologna's dataset
n
the number of municipalities
names
the names of the 11 municipalities, in the same order as the windows
The object contains a list of 11 observation windows created as owin
objects based on the coordinates of the border polygons, for each municipality.
See ?owin for details.
The object also contains the names of the municipalities, in Italian.
Examples on the usefulness of the administrative borders can be found at the topic bologna.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(bologna); data(bolognaW); data(bolognaTess)

plot(bolognaW, main="")
plot(bolognaTess$tiles[[1]],border=2, add=TRUE, lwd=2)
for(ll in 2:bolognaTess$n)
  plot(bolognaTess$tiles[[ll]],border=2, add=TRUE, lwd=2)

plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.85,0.4)), ribbon=FALSE)
plot(bolognaTess$tiles[[1]],border=1, add=TRUE, lwd=2)
for(ll in 2:bolognaTess$n)
  plot(bolognaTess$tiles[[ll]],border=1, add=TRUE, lwd=2)

#see examples under the topic "bologna"
An owin object with the rectangle circumscribing the city border for the Bologna dataset.
bolognaW
An owin object. The basic spatial unit is a 250x250 metre pixel.
This observation window is an owin object with margins given by the CORINE project coordinates.
See ?owin for details.
Examples on the usefulness of the window can be found at the topic bologna.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(bologna); data(bolognaW)
plot(bolognaW, main="")
plot(as.im(bologna, W=bolognaW), main="", col=gray(c(0.8,0)), ribbon=FALSE, add=TRUE)
#see examples under the topic "bologna"
This function computes Li and Reynolds' contagion index, following Li and Reynolds (1993),
starting from a data matrix. References can be found at SpatEntropy.
contagion( data, win = spatstat.geom::owin(xrange = c(0, ncol(data)), yrange = c(0, nrow(data))), plotout = T )
data |
A data matrix or vector, can be numeric, factor, character, ... |
win |
Optional, an object of class |
plotout |
Logical. Default to |
This index is based on the transformed variable $Z$ identifying couples of realizations
of the variable of interest. A distance of interest is fixed: the contagion index was
originally conceived for areas sharing a border, like O'Neill's entropy. Then, all contiguous couples
of realizations of the variable of interest are counted
and their relative frequencies are used to compute the index, which is $1-NO$,
where $NO$ is the relative version of O'Neill's entropy, i.e. O'Neill's entropy divided by its maximum $\log(I^2)$,
$I$ being the number of categories of the variable under study. The relative contagion index ranges
from 0 (no contagion, maximum entropy) to 1 (maximum contagion).
The function is able to work with grids containing missing data, specified as NA values.
All NAs are ignored in the computation and only couples of non-NA observations are considered.
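The steps above (count contiguous couples, take relative frequencies, compute O'Neill's entropy, then one minus its relative version) can be traced on a tiny grid. This base R sketch uses made-up data and assumes that each pair of bordering cells is counted once, in left-to-right and top-to-bottom order; the package may enumerate couples differently.

```r
# Toy 3 x 3 grid with two categories (made-up data)
x <- matrix(c("a", "a", "b",
              "a", "b", "b",
              "a", "a", "b"), nrow = 3, byrow = TRUE)
I <- length(unique(c(x)))   # number of categories

# Ordered couples of cells sharing a border
horiz   <- paste0(x[, -ncol(x)], x[, -1])   # left cell, right cell
vert    <- paste0(x[-nrow(x), ], x[-1, ])   # upper cell, lower cell
couples <- c(horiz, vert)

# Relative frequencies of the couple categories
p <- table(couples) / length(couples)

O  <- sum(p * log(1 / p))   # O'Neill's entropy
NO <- O / log(I^2)          # relative O'Neill's entropy
C  <- 1 - NO                # relative contagion index
```

A grid made of a single category yields one couple category, hence `O = 0` and contagion 1; a grid where all `I^2` couple categories are equally frequent yields contagion 0.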
a list of two elements:
contagion
Li and Reynolds' relative contagion index
probabilities
a table with absolute frequencies and estimated probabilities (relative frequencies) for all couple categories
Moreover, a plot of the dataset is produced.
#numeric data, square grid
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
contagion(data)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data))))),
     main="", ribbon=TRUE)

#character data, rectangular grid
data=matrix(sample(c("a","b","c"), 300, replace=TRUE), nrow=30)
contagion(data)
#plot data
plot(as.im(data, W=owin(xrange=c(0,ncol(data)), yrange=c(0,nrow(data)))),
     col=terrain.colors(length(unique(c(data)))),
     main="", ribbon=TRUE)
This function computes Karlstrom and Ceccato's spatial entropy for a
chosen neighbourhood distance,
following Karlstrom and Ceccato (2002), see also Altieri et al (2017) and following works
(references are under the topic SpatEntropy).
karlstrom( data, category = 1, cell.size = 1, partition = 10, win = NULL, neigh = 4, method = "number", plotout = T )
battyLISA( data, category = 1, cell.size = 1, partition = 10, win = NULL, neigh = 4, method = "number", plotout = T )
data |
If data are lattice, a data matrix, which can be numeric, factor, character, ...
If the dataset is a point pattern, |
category |
A single value matching the data category of interest for computing Karlstrom and Ceccato's entropy. Default to 1. If the dataset is an unmarked point pattern, this argument must not be changed from the default. |
cell.size |
A single number or a vector of length two, only needed if data are lattice. It gives the length of the side of each pixel; if the pixel is rectangular, the first number gives the horizontal side and the second number gives the vertical side. Default to 1. Ignored if data are points. |
partition |
Input defining the partition into subareas. If an integer, it defines the
number of sub-areas that are randomly generated by areapart; if a two column matrix
with coordinates, they are the centroids of the subareas built by areapart. Alternatively,
it can be the output of areapart, a |
win |
Optional, the observation area given as a |
neigh |
A single number. It can be either the number of neighbours for each sub-area (including the area itself), or the Euclidean distance used to define which sub-areas are neighbours, based on their centroids. Default to 4 neighbours. |
method |
Character, it guides the interpretation of |
plotout |
Logical. Default to |
Karlstrom and Ceccato's spatial entropy measures the heterogeneity in the spatial distribution
of a phenomenon of interest, with regard to an area partition and accounting for the neighbourhood.
It is similar to Batty's entropy (see batty) discarding the sub-area size,
with the difference that the probability of occurrence of the phenomenon over area $g$
is actually a weighted sum of the neighbouring probabilities:
$$H_{KC}=\sum_g p_g\log\left(\frac{1}{\tilde{p}_g}\right)$$
where $p_g$ is the probability of occurrence of the phenomenon over sub-area $g$,
and $\tilde{p}_g$ is the averaged probability over the neighbouring areas (including the g-th area itself).
When data are categorical, the phenomenon of interest corresponds to
one category, which must be specified. If data are an unmarked
point pattern, a fake mark vector is created with the same category for all points.
For comparison purposes, the relative version of Karlstrom and Ceccato's entropy is also returned, i.e.
Karlstrom and Ceccato's entropy divided by its maximum log(number of sub-areas).
The function is able to work with grids containing missing data, specified as NA values.
All NAs are ignored in the computation.
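To make the role of the neighbourhood concrete, the base R sketch below evaluates $H_{KC}=\sum_g p_g\log(1/\tilde{p}_g)$ by hand. It is an illustration, not the code of karlstrom(): the counts and the neighbourhood matrix are made up, and the averaged probability is assumed to be a simple unweighted mean over each area's neighbours.

```r
# Hypothetical partition into G = 4 sub-areas (made-up counts)
n_g <- c(12, 5, 3, 20)
p_g <- n_g / sum(n_g)

# Made-up neighbourhood matrix: entry (g, h) is 1 if area h is a neighbour of area g.
# Every area counts as a neighbour of itself.
nb <- matrix(c(1, 1, 0, 0,
               1, 1, 1, 0,
               0, 1, 1, 1,
               0, 0, 1, 1), nrow = 4, byrow = TRUE)

# Averaged probability over each neighbourhood (assumed: simple mean)
p_tilde <- as.vector(nb %*% p_g) / rowSums(nb)

H_KC   <- sum(p_g * log(1 / p_tilde))   # Karlstrom and Ceccato's entropy
rel_KC <- H_KC / log(length(p_g))       # divided by log(number of sub-areas)

# With no neighbours other than the area itself, p_tilde reduces to p_g
p_self    <- as.vector(diag(4) %*% p_g)
H_shannon <- sum(p_g * log(1 / p_self))
```

In the last two lines the neighbourhood is the identity matrix, so the measure reduces to Shannon's entropy of the $p_g$; widening the neighbourhood smooths the probabilities that enter the logarithm.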
A list of five elements:
karlstrom
Karlstrom and Ceccato's entropy
range
The theoretical range of Karlstrom and Ceccato's entropy
rel.karl
Karlstrom and Ceccato's entropy divided by log(number of sub-areas) for comparison across observation areas.
areas
a dataframe giving, for each sub-area, the absolute and relative frequency of
the points/pixels of interest, the weighted probabilities of the neighbours and the sub-area size
area.tess
a tess object with the area partition
Moreover, a plot is produced showing the data and the area partition.
#LATTICE DATA
data=matrix((sample(c("a","b","c"), 100, replace=TRUE)), nrow=10)
KC.entropy=karlstrom(data, category="a")
KC.entropy=karlstrom(data, category="a", neigh=3.5, method="distance")
##to plot
data.binary=matrix(as.numeric(data=="a"), nrow(data))
plot(as.im(data.binary, W=KC.entropy$area.tess$window), main="",
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data.binary))))),
     ribbon=FALSE)
plot(KC.entropy$area.tess, add=TRUE, border=2)

#POINT DATA
#unmarked pp
data=ppp(x=runif(100, 0, 10), y=runif(100, 0, 10), window=square(10))
KC.entropy=karlstrom(data)
##to plot
plot(data)
plot(KC.entropy$area.tess, add=TRUE, border=2)

#marked pp
data=ppp(x=runif(100, 0, 10), y=runif(100, 0, 10), window=square(10),
         marks=(sample(1:5, 100, replace=TRUE)))
#if you want to compute the entropy on all points
KC.entropy=karlstrom(unmark(data))
#if you want to compute the entropy on a category, say 3
KC.entropy=karlstrom(data, category=3)
##to plot using the selected category
ind=which(spatstat.geom::marks(data)==3)
data.binary=unmark(data[ind])
plot(data.binary)
plot(KC.entropy$area.tess, add=TRUE, border=2)
This function computes Leibovici's entropy according to a chosen distance
(with O'Neill's entropy as a special case)
following Leibovici (2009), see also Altieri et al (2017). References can be found at
SpatEntropy.
leibovici( data, cell.size = 1, ccdist = cell.size[1], win = NULL, verbose = F, plotout = T )
data |
If data are lattice, a data matrix, which can be numeric, factor, character, ...
If the dataset is a point pattern, |
cell.size |
A single number or a vector of length two, only needed if data are lattice. It gives the length of the side of each pixel; if the pixel is rectangular, the first number gives the horizontal side and the second number gives the vertical side. Default to 1. Ignored if data are points. |
ccdist |
A single number. The chosen distance for selecting couples of pixels/points within the observation area. Default to |
win |
Optional, an object of class |
verbose |
Logical. If |
plotout |
Logical. Default to |
This index is based on the transformed variable $Z$ identifying couples of realizations
of the variable of interest. A distance of interest is fixed, which in the case of O'Neill's
entropy is the contiguity, i.e. sharing a border for lattice data. Then, all couples
of realizations of the variable of interest lying at a distance smaller than or equal to the distance of interest
are counted, and their relative frequencies are used to compute the index with the traditional Shannon's
formula:
$$H(Z|d)=\sum_r p(z_r|d)\log\left(\frac{1}{p(z_r|d)}\right)$$
where $z_r$ is a generic couple of realizations of the study variable $X$.
The conditioning on $d$ means that only couples within a predefined distance are considered.
The maximum value for Leibovici's entropy is $\log(I^2)$,
where $I$ is the number of categories of the study variable $X$.
The relative version of Leibovici's entropy is obtained by dividing the entropy value by its maximum, and is useful for comparison across datasets with
a different number of categories.
The function is able to work with grids containing missing data, specified as NA values.
All NAs are ignored in the computation and only couples of non-NA observations are considered.
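The counting step described above can be illustrated in base R. This is only a minimal sketch under stated assumptions, not the package's implementation: `leibovici_sketch` is a hypothetical helper, `cell.size` is taken as a single number, distances are measured between pixel centroids, each pair of pixels is read once with its order kept (so `a-b` and `b-a` are distinct couple categories), and the natural logarithm is used.

```r
# Hypothetical helper: a base-R sketch of distance-based co-occurrence
# counting on a grid (NOT the code used by leibovici()).
leibovici_sketch <- function(m, cell.size = 1, ccdist = cell.size) {
  # centroid coordinates of each pixel, in the same column-major order as c(m)
  xy <- cbind(x = (c(col(m)) - 0.5) * cell.size,
              y = (nrow(m) - c(row(m)) + 0.5) * cell.size)
  val <- c(m)
  keep <- !is.na(val)                     # NA pixels are ignored
  xy <- xy[keep, , drop = FALSE]
  val <- val[keep]
  d <- as.matrix(dist(xy))                # pairwise centroid distances
  # each pair of pixels once, only if within the chosen distance
  idx <- which(upper.tri(d) & d <= ccdist, arr.ind = TRUE)
  couples <- paste(val[idx[, 1]], val[idx[, 2]], sep = "-")
  freq <- table(couples)
  p <- as.numeric(freq) / sum(freq)       # relative frequencies of couples
  h <- sum(p * log(1 / p))                # Shannon's formula on the couples
  h_max <- log(length(unique(val))^2)     # log(I^2)
  list(leib = h, range = c(0, h_max), rel.leib = h / h_max)
}

# toy 2 x 2 grid: first row "a" "a", second row "b" "b"
m <- matrix(c("a", "b", "a", "b"), nrow = 2)
out1 <- leibovici_sketch(m, ccdist = 1)    # only pixels sharing a border
out2 <- leibovici_sketch(m, ccdist = 1.5)  # diagonal neighbours included
out1$leib
out2$leib
```

With `ccdist` equal to the pixel side only bordering pixels are paired, which corresponds to the contiguity case mentioned in the text; a larger `ccdist` adds the diagonal couples and changes the relative frequencies.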
a list of four elements:
leib: Leibovici's entropy
range: the theoretical range of Leibovici's entropy, from 0 to $\log(I^2)$, with $I$ the number of categories
rel.leib: Leibovici's relative entropy
probabilities: a table with absolute frequencies and estimated probabilities (relative frequencies) for all couple categories
Moreover, a plot of the dataset is produced. Over the plot, a random point is displayed as a red star, and a circle is plotted around that point. The radius of the circle is set by ccdist, so that a visual idea is given about the choice of the distance for building co-occurrences.
#random grid data - high entropy
data=matrix(sample(c("a","b","c"), 400, replace=TRUE), nrow=20)
leibovici(data, cell.size=1, ccdist=2)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data))))),
     main="", ribbon=TRUE)

#compact grid data - low entropy
data=matrix(sort(sample(c("a","b","c"), 400, replace=TRUE)), nrow=20)
#Note: with sorted data, only some couple categories will be present
leibovici(data, cell.size=1, ccdist=1.5)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=heat.colors(length(unique(c(data)))),
     main="", ribbon=TRUE)

#point data
data=ppp(x=runif(400), y=runif(400), window=square(1),
         marks=sample(1:4, 400, replace=TRUE))
leibovici(data, ccdist=0.1)
#plot data
plot(data)
This function computes O'Neill's entropy for a data matrix (see O'Neill et al, 1988).
oneill( data, win = spatstat.geom::owin(xrange = c(0, ncol(data)), yrange = c(0, nrow(data))), plotout = T )
data |
A data matrix, can be numeric, factor, character, ... |
win |
Optional, an object of class owin. |
plotout |
Logical. Defaults to TRUE. |
O'Neill's entropy index is based on the transformed variable $Z$, identifying couples of realizations of the variable of interest:
$$H_O(Z) = \sum_{r} p(z_r|C) \log\left(\frac{1}{p(z_r|C)}\right)$$
where $z_r = (x_i, x_{i'})$ is a generic couple of realizations of the study variable $X$. The conditioning on $C$ for grid data means that only contiguous couples are considered, i.e. couples of pixels sharing a border. All contiguous couples of realizations of the variable of interest are counted and their relative frequencies are used to compute the index. The maximum value for O'Neill's entropy is $\log(I^2)$, where $I$ is the number of categories of $X$. The relative version of O'Neill's entropy is obtained by dividing the entropy value by its maximum, and is useful for comparison across datasets with a different number of categories.
The function is able to work with grids containing missing data, specified as NA values. All NAs are ignored in the computation and only couples of non-NA observations are considered.
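As an illustration of the contiguity count and of the NA handling, here is a minimal base-R sketch. It is not the code behind `oneill()`: `oneill_sketch` is a hypothetical helper, and it assumes that every pair of bordering pixels is read once (left to right and top to bottom), that the order within a couple is kept, and that the natural logarithm is used. The package's own counting convention may differ.

```r
# Hypothetical helper: counts couples of pixels sharing a border by shifting
# the grid by one column and by one row (NOT the code used by oneill()).
oneill_sketch <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  # horizontally adjacent couples: (left pixel, right pixel)
  horiz <- cbind(c(m[, -nc]), c(m[, -1]))
  # vertically adjacent couples: (upper pixel, lower pixel)
  vert <- cbind(c(m[-nr, ]), c(m[-1, ]))
  couples <- rbind(horiz, vert)
  # keep only couples of non-NA observations
  ok <- !is.na(couples[, 1]) & !is.na(couples[, 2])
  couples <- couples[ok, , drop = FALSE]
  freq <- table(paste(couples[, 1], couples[, 2], sep = "-"))
  p <- as.numeric(freq) / sum(freq)       # relative frequencies of couples
  h <- sum(p * log(1 / p))
  I <- length(unique(m[!is.na(m)]))       # number of categories present
  h_max <- log(I^2)
  list(oneill = h, range = c(0, h_max), rel.oneill = h / h_max)
}

# toy 2 x 2 grid: first row "a" "a", second row "b" "b"
m <- matrix(c("a", "b", "a", "b"), nrow = 2)
out <- oneill_sketch(m)
# the same grid with an extra row of missing values on top
out_na <- oneill_sketch(rbind(NA, m))
out$oneill
out$rel.oneill
```

In the toy grid the four contiguous couples are `a-a`, `b-b` and twice `a-b`; adding a row of NAs leaves the result unchanged because couples involving an NA are dropped before counting.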
a list of four elements:
oneill: O'Neill's entropy
range: the theoretical range of O'Neill's entropy, from 0 to $\log(I^2)$, with $I$ the number of categories
rel.oneill: O'Neill's relative entropy
probabilities: a table with absolute frequencies and estimated probabilities (relative frequencies) for all couple categories
Moreover, a plot of the dataset is produced.
#numeric data, square grid
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
oneill(data)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data))))),
     main="", ribbon=TRUE)

#character data, rectangular grid
data=matrix(sample(c("a","b","c"), 300, replace=TRUE), nrow=30)
oneill(data)
#plot data
plot(as.im(data, W=owin(xrange=c(0,ncol(data)), yrange=c(0,nrow(data)))),
     col=terrain.colors(length(unique(c(data)))),
     main="", ribbon=TRUE)

#data with missing values
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
data=rbind(rep(NA, ncol(data)), data, rep(NA, ncol(data)))
oneill(data)
Compute Parresol and Edwards' entropy, following Parresol and Edwards (2014), starting from data. References can be found at SpatEntropy.
parredw( data, win = spatstat.geom::owin(xrange = c(0, ncol(data)), yrange = c(0, nrow(data))), plotout = T )
data |
A data matrix or vector, can be numeric, factor, character, ... |
win |
Optional, an object of class owin. |
plotout |
Logical. Defaults to TRUE. |
This index is based on the transformed variable $Z$ identifying couples of realizations of the variable of interest. A distance of interest is fixed: Parresol and Edwards' entropy is designed for areas sharing a border, like O'Neill's entropy. All contiguous couples of realizations of the variable of interest are counted and their relative frequencies are used to compute the index, which is the negative of O'Neill's entropy.
The function is able to work with grids containing missing data, specified as NA values. All NAs are ignored in the computation and only couples of non-NA observations are considered.
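The relation to O'Neill's entropy can be written out explicitly. The block below is a sketch based only on the statement that the index is the negative of O'Neill's entropy; the symbol $H_{PE}$ and the toy frequencies are assumptions introduced here for illustration, and the way the relative version is obtained (ratio to the bound) is likewise an assumption.

```latex
% Sign relation, with p(z_r|C) the relative frequency of the r-th contiguous couple
H_{PE}(Z) = \sum_{r} p(z_r|C) \log p(z_r|C) = -H_O(Z),
\qquad -\log(I^2) \le H_{PE}(Z) \le 0 .

% Toy example: I = 2 categories, couple frequencies (0.5, 0.25, 0.25)
H_O(Z) = 0.5 \log 2 + 2 \cdot 0.25 \log 4 = 1.5 \log 2 \approx 1.040,
\qquad H_{PE}(Z) \approx -1.040 ,

% Ratio to the bound, assuming the relative version is read this way
\frac{H_{PE}(Z)}{-\log(I^2)} = \frac{1.5 \log 2}{2 \log 2} = 0.75 .
```

Under that reading, the relative value equals the one obtained for O'Neill's entropy on the same frequencies, in line with the stated shared interpretation.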
a list of four elements:
parredw: Parresol and Edwards' entropy
range: the theoretical range of Parresol and Edwards' entropy, from $-\log(I^2)$ to 0, with $I$ the number of categories
rel.parredw: Parresol and Edwards' relative entropy (with the same interpretation as O'Neill's relative entropy)
probabilities: a table with absolute frequencies and estimated probabilities (relative frequencies) for all couple categories
Moreover, a plot of the dataset is produced.
#numeric data, square grid
data=matrix(sample(1:5, 100, replace=TRUE), nrow=10)
parredw(data)
#plot data
plot(as.im(data, W=square(nrow(data))),
     col=grDevices::gray(seq(1,0,length.out=length(unique(c(data))))),
     main="", ribbon=TRUE)

#character data, rectangular grid
data=matrix(sample(c("a","b","c"), 300, replace=TRUE), nrow=30)
parredw(data)
#plot data
plot(as.im(data, W=owin(xrange=c(0,ncol(data)), yrange=c(0,nrow(data)))),
     col=terrain.colors(length(unique(c(data)))),
     main="", ribbon=TRUE)
A marked point pattern dataset about four rainforest tree species: Acalypha diversifolia, Chamguava schippii, Inga pezizifera and Rinorea sylvatica.
raintrees
A ppp object (see package spatstat) with 7251 points, containing:
window: An object of type owin (see package spatstat), the 1000x500 metres observation area
x, y: Numeric vectors with points' coordinates
marks: A character vector matching the tree species to the data points
This dataset documents the presence of tree species over Barro Colorado Island, Panama.
Barro Colorado Island has been the focus of intensive research on lowland tropical
rainforest since 1923 (http://www.ctfs.si.edu). Research identified several tree species
over a rectangular observation window of size 1000x500 metres; the tree species
constitute the point data categorical mark. This dataset presents 4 species with
different spatial configurations: Acalypha diversifolia, Chamguava schippii,
Inga pezizifera and Rinorea sylvatica. The overall dataset has a total number of 7251 points.
The dataset is analyzed with spatial entropy measures in Altieri et al. (2018) (references can be found at SpatEntropy).
http://www.ctfs.si.edu
data(raintrees)
#plot(raintrees, main="", pch=16, cols=1:4)

#shannon's entropy of the four trees
shannon(raintrees)

#shannon's entropy of Z (tree pairs)
shannonZ(raintrees)

#leibovici's entropy
raintrees$window #to check size and unit of measurement
#example run on a subset of the data to speed up computations
subdata=raintrees[owin(c(0,200),c(0,100))]; plot(subdata)
outp=leibovici(subdata, ccdist=10, verbose=TRUE)
#do not worry about warnings like "data contain duplicated points": since
#coordinates are rounded to the first decimal place, it looks like
#some trees are overlapping when they are just very close.
#Entropy computation works properly anyway

#altieri's entropy
#example run on a subset of the data to speed up computations
subdata=raintrees[owin(c(0,200),c(0,100))]; plot(subdata)
outp=altieri(subdata, distbreak=c(1,2,5,10), verbose=TRUE)

#batty's entropy
#on all points, with a random partition in 10 sub-areas
batty.ent=batty(unmark(raintrees), partition=10)
#plot with partition
#plot(unmark(raintrees), pch=16, cex=0.6, main="")
#plot(batty.ent$area.tess, add=TRUE, border=2, lwd=2)

#on a specific tree species, with a random partition in 6 sub-areas
unique(marks(raintrees)) #to check the species' names
#plot(split.ppp(raintrees), main="") #to plot by species
batty.ent=batty(raintrees, category="cha2sc", partition=6)
#plot with partition
#plot(split.ppp(raintrees)$cha2sc, pch=16, cex=0.6, main="")
#plot(batty.ent$area.tess, add=TRUE, border=2)

#batty's entropy with a partition based on the covariate,
#exploiting spatstat functions
data(raintreesCOV)
#plot(raintreesCOV$grad, main="", col=gray(seq(1,0,l=100)))
data=split.ppp(raintrees)$acaldi
#plot(data, add=TRUE, pch=16, cex=0.6, main="")
#discretize the covariate
slopecut=cut(raintreesCOV$grad,
             breaks = quantile(raintreesCOV$grad, probs = (0:4)/4),
             labels = 1:4)
maskv=tiles=list()
for(ii in 1:nlevels(slopecut))
{
  maskv[[ii]]=as.logical(c(slopecut$v)==levels(slopecut)[ii])
  tiles[[ii]]=owin(xrange=data$window$xrange, yrange=data$window$yrange,
                   mask=matrix(maskv[[ii]],nrow(slopecut$v)))
}
slopetess=list(tiles=tiles, n=nlevels(slopecut))
#plot(slopecut, main = "", col=gray(seq(1,0.4,l=4)))
#plot(data, add=TRUE, pch=16, cex=0.6, main="", col=1)
batty(data, partition=slopetess)

#karlstrom and ceccato's entropy
#on a specific tree species, with a random partition in 6 sub-areas
unique(marks(raintrees)) #to check the species' names
#plot(split.ppp(raintrees), main="") #to plot by species
KC.ent=karlstrom(raintrees, category="rinosy", partition=6, neigh=3)
#plot with partition
#plot(split.ppp(raintrees)$rinosy, pch=16, cex=0.6, main="")
#plot(KC.ent$area.tess, add=TRUE, border=2)
A marked point pattern dataset about four rainforest tree species: Astronium graveolens, Beilschmiedia pendula, Heisteria concinna and Inga sapindoides.
raintrees2
A ppp object (see package spatstat) with 5639 points, containing:
window: An object of type owin (see package spatstat), the 1000x500 metres observation area
x, y: Numeric vectors with points' coordinates
marks: A character vector matching the tree species to the data points
This dataset documents the presence of tree species over Barro Colorado Island, Panama. Barro Colorado Island has been the focus of intensive research on lowland tropical rainforest since 1923 (http://www.ctfs.si.edu). Research identified several tree species over a rectangular observation window of size 1000x500 metres; the tree species constitute the point data categorical mark. This dataset presents 4 species with different spatial configurations: Astronium graveolens, Beilschmiedia pendula, Heisteria concinna and Inga sapindoides. The overall dataset has a total number of 5639 points.
http://www.ctfs.si.edu
data(raintrees2)
#plot(raintrees2, main="", pch=16, cols=1:4)

#shannon's entropy of the four trees
shannon(raintrees2)

#shannon's entropy of Z (tree pairs)
shannonZ(raintrees2)

#leibovici's entropy
raintrees2$window #to check size and unit of measurement
#example run on a subset of the data to speed up computations
subdata=raintrees2[owin(c(0,200),c(0,100))]; plot(subdata)
outp=leibovici(subdata, ccdist=10, verbose=TRUE)
#do not worry about warnings like "data contain duplicated points": since
#coordinates are rounded to the first decimal place, it looks like
#some trees are overlapping when they are just very close.
#Entropy computation works properly anyway

#altieri's entropy
#example run on a subset of the data to speed up computations
subdata=raintrees2[owin(c(0,200),c(0,100))]; plot(subdata)
outp=altieri(subdata, distbreak=c(1,2,5,10), verbose=TRUE)

#batty's entropy
#on all points, with a random partition in 10 sub-areas
batty.ent=batty(unmark(raintrees2), partition=10)
#plot with partition
#plot(unmark(raintrees2), pch=16, cex=0.6, main="")
#plot(batty.ent$area.tess, add=TRUE, border=2, lwd=2)

#on a specific tree species, with a random partition in 6 sub-areas
unique(marks(raintrees2)) #to check the species' names
#plot(split.ppp(raintrees2), main="") #to plot by species
batty.ent=batty(raintrees2, category=levels(marks(raintrees2))[1], partition=6)
#plot with partition
#plot(split.ppp(raintrees2)[[1]], pch=16, cex=0.6, main="")
#plot(batty.ent$area.tess, add=TRUE, border=2)

#batty's entropy with a partition based on the covariate,
#exploiting spatstat functions
data(raintreesCOV)
#plot(raintreesCOV$grad, main="", col=gray(seq(1,0,l=100)))
data=split.ppp(raintrees2)[[1]]
#plot(data, add=TRUE, pch=16, cex=0.6, main="")
#discretize the covariate
slopecut=cut(raintreesCOV$grad,
             breaks = quantile(raintreesCOV$grad, probs = (0:4)/4),
             labels = 1:4)
maskv=tiles=list()
for(ii in 1:nlevels(slopecut))
{
  maskv[[ii]]=as.logical(c(slopecut$v)==levels(slopecut)[ii])
  tiles[[ii]]=owin(xrange=data$window$xrange, yrange=data$window$yrange,
                   mask=matrix(maskv[[ii]],nrow(slopecut$v)))
}
slopetess=list(tiles=tiles, n=nlevels(slopecut))
#plot(slopecut, main = "", col=gray(seq(1,0.4,l=4)))
#plot(data, add=TRUE, pch=16, cex=0.6, main="", col=1)
batty(data, partition=slopetess)

#karlstrom and ceccato's entropy
#on a specific tree species, with a random partition in 6 sub-areas
unique(marks(raintrees2)) #to check the species' names
#plot(split.ppp(raintrees2), main="") #to plot by species
KC.ent=karlstrom(raintrees2, category=levels(marks(raintrees2))[2], partition=6, neigh=3)
#plot with partition
#plot(split.ppp(raintrees2)[[2]], pch=16, cex=0.6, main="")
#plot(KC.ent$area.tess, add=TRUE, border=2)
A list of two pixel images with covariates altitude and soil slope for the rainforest tree data 1 and 2, i.e. raintrees and raintrees2.
raintreesCOV
A list of two elements:
elev: An object of type im (see package spatstat), the soil elevation
grad: An object of type im (see package spatstat), the soil slope (gradient of elevation)
For details of the point datasets, see raintrees and raintrees2. This accompanying dataset gives information about the elevation in the study region. It is a list containing two pixel images, elev (elevation in metres) and grad (norm of elevation gradient). These pixel images are objects of class im.
Covariate values are continuous. Once discretized as desired, they can be turned into categorical datasets for the computation of all entropy measures. Moreover, they can be used to build sensible sub-areas for Batty's and Karlstrom and Ceccato's entropies (see the examples).
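Discretizing a continuous covariate can be sketched in base R as follows. This is a hedged illustration on simulated values: the `slope` matrix is a made-up stand-in for a covariate image, the quartile breaks are just one possible choice, and `include.lowest = TRUE` is added here so that the minimum value is not left unclassified.

```r
set.seed(1)
# hypothetical continuous covariate on a 10 x 10 grid (stand-in for a slope image)
slope <- matrix(runif(100, 0, 30), nrow = 10)

# quartile breaks: four classes with roughly the same number of pixels
breaks <- quantile(slope, probs = (0:4) / 4)

# cut() returns a factor; matrix() turns it into a character matrix of labels
slope_cat <- matrix(cut(slope, breaks = breaks, labels = 1:4,
                        include.lowest = TRUE),
                    nrow = nrow(slope))

# number of pixels per class
table(slope_cat)
```

The resulting categorical matrix has the same shape as the original grid, so it can be treated as lattice data with a finite number of categories.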
http://www.ctfs.si.edu
data(raintreesCOV)
plot(raintreesCOV, main="")
This function computes Shannon's entropy of a variable with a finite number of categories. Shannon's entropy is a non-spatial measure.
shannon(data)
data |
A data matrix or vector, can be numeric, factor, character, ...
Alternatively, a marked ppp object. |
Shannon's entropy measures the heterogeneity of a set of categorical data. It is computed as
$$H(X) = \sum_{i=1}^{I} p(x_i) \log\left(\frac{1}{p(x_i)}\right)$$
where $p(x_i)$ is the probability of occurrence of the $i$-th category, here estimated, as usual, by its relative frequency. This is both the nonparametric and the maximum likelihood estimator for entropy. Shannon's entropy varies between 0 and $\log(I)$, $I$ being the number of categories of the variable under study. The relative version of Shannon's entropy, i.e. the entropy divided by $\log(I)$, is also computed, under the assumption that all data categories are present in the dataset. The relative entropy is useful for comparison across datasets with different $I$.
The function is able to work with lattice data with missing data, as long as they are specified as NAs: missing data are ignored in the computations.
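The computation described above can be reproduced by hand in a few lines of base R. This is a minimal sketch, not the package's `shannon()`: `shannon_sketch` is a hypothetical helper, the natural logarithm is assumed, and $I$ is taken as the number of categories actually observed.

```r
# Hypothetical helper: Shannon's entropy from relative frequencies
# (NOT the code used by shannon()).
shannon_sketch <- function(x) {
  x <- as.character(x[!is.na(x)])        # missing values are ignored
  freq <- table(x)                       # absolute frequencies per category
  p <- as.numeric(freq) / sum(freq)      # estimated probabilities
  h <- sum(p * log(1 / p))               # H(X) = sum p log(1/p)
  h_max <- log(length(freq))             # log(I), I = observed categories
  list(shann = h, range = c(0, h_max), rel.shann = h / h_max)
}

# three equally frequent categories and one missing value
x <- c("a", "a", "b", "b", "c", "c", NA)
out <- shannon_sketch(x)
out$shann       # log(3), the maximum for three categories
out$rel.shann   # 1

# an unbalanced numeric variable
out2 <- shannon_sketch(c(1, 1, 1, 2))
out2$rel.shann
```

With equally frequent categories the entropy reaches its maximum and the relative version equals 1; the unbalanced example gives a value strictly between 0 and 1. If only one category is observed, $\log(I) = 0$ and the ratio in this sketch is undefined.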
a list of four elements:
shann: Shannon's entropy
range: the theoretical range of Shannon's entropy, from 0 to $\log(I)$, with $I$ the number of categories
rel.shann: Shannon's relative entropy
probabilities: a table with absolute frequencies and estimated probabilities (relative frequencies) for all data categories
#NON SPATIAL DATA
shannon(sample(1:5, 50, replace=TRUE))

#POINT DATA
#requires marks with a finite number of categories
data.pp=runifpoint(100, win=square(10))
marks(data.pp)=sample(c("a","b","c"), 100, replace=TRUE)
shannon(marks(data.pp))

#LATTICE DATA
data.lat=matrix(sample(c("a","b","c"), 100, replace=TRUE), nrow=10)
shannon(data.lat)
This function computes Shannon's entropy of variable $Z$, where $Z$ identifies pairs of realizations of the variable of interest.
shannonZ(data)
data |
A data matrix or vector, can be numeric, factor, character, ...
Alternatively, a marked ppp object. |
Many spatial entropy indices are based on the transformation $Z$ of the study variable, i.e. on pairs (unordered couples) of realizations of the variable of interest. 'Unordered couples' means that the relative spatial location is irrelevant, i.e. that a couple where category $i$ occurs at the left of category $j$ is identical to a couple where category $j$ occurs at the left of category $i$. When all possible pairs occurring within the observation area are considered, Shannon's entropy of the variable $Z$ may be computed as
$$H(Z) = \sum_{r} p(z_r) \log\left(\frac{1}{p(z_r)}\right)$$
where $p(z_r)$ is the probability of the $r$-th pair of realizations, here estimated by its relative frequency. Shannon's entropy of $Z$ varies between 0 and $\log(R)$, $R$ being the number of possible pairs of categories of the variable under study; for unordered couples over $I$ categories, $R = \binom{I+1}{2}$.
The function is able to work with lattice data with missing data, as long as they are specified as NAs: missing data are ignored in the computations.
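Counting unordered pairs does not require listing them: the number of pairs falling in each couple category follows from the category counts. The base-R sketch below illustrates this; `shannonZ_sketch` is a hypothetical helper rather than the package's `shannonZ()`, and it assumes that all pairs of distinct observations are used, that $I$ is the number of observed categories, and that the logarithm is natural.

```r
# Hypothetical helper: Shannon's entropy of unordered pairs, computed from
# the category counts (NOT the code used by shannonZ()).
shannonZ_sketch <- function(x) {
  x <- as.character(x[!is.na(x)])          # missing values are ignored
  n_i <- as.numeric(table(x))              # observations per category
  I <- length(n_i)
  # pairs made of two observations of the same category
  same <- choose(n_i, 2)
  # pairs made of two different categories i < j: n_i * n_j
  cross <- outer(n_i, n_i)[upper.tri(matrix(0, I, I))]
  counts <- c(same, cross)                 # sums to choose(length(x), 2)
  p <- counts / sum(counts)
  p <- p[p > 0]                            # empty pair categories contribute 0
  h <- sum(p * log(1 / p))
  h_max <- log(choose(I + 1, 2))           # log(R), unordered pairs of categories
  list(shannZ = h, range = c(0, h_max), rel.shannZ = h / h_max,
       n.pairs = sum(counts))
}

# two categories, two observations each: 6 pairs in total
out <- shannonZ_sketch(c("a", "a", "b", "b"))
out$n.pairs
out$shannZ
out$rel.shannZ
```

In the toy example the six pairs split into one `a-a`, one `b-b` and four mixed pairs, so the mixed couple has estimated probability 4/6 and the entropy is below its maximum $\log(3)$.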
a list of four elements:
shannZ: Shannon's entropy of $Z$
range: the theoretical range of Shannon's entropy of $Z$, from 0 to $\log(R)$, with $R$ the number of possible pairs of categories
rel.shannZ: Shannon's relative entropy of $Z$
probabilities: a table with absolute frequencies and estimated probabilities (relative frequencies) for all $Z$ categories (data pairs)
#NON SPATIAL DATA
shannonZ(sample(1:5, 50, replace=TRUE))

#POINT DATA
data.pp=runifpoint(100, win=square(10))
marks(data.pp)=sample(c("a","b","c"), 100, replace=TRUE)
shannonZ(marks(data.pp))

#LATTICE DATA
data.lat=matrix(sample(c("a","b","c"), 100, replace=TRUE), nrow=10)
shannonZ(data.lat)
The heterogeneity of spatial data presenting a finite number of categories can be measured via computation of spatial entropy. Functions are available for the computation of the main entropy and spatial entropy measures in the literature. They include the traditional version of Shannon's entropy, Batty's spatial entropy, O'Neill's entropy, Li and Reynolds' contagion index, Karlstrom and Ceccato's entropy, Leibovici's entropy, Parresol and Edwards' entropy and Altieri's entropy. The package is able to work with lattice and point data. A step-by-step guide for new users can be found in the first referenced article.
References:
Altieri, L., D. Cocchi, and G. Roli (2021). Spatial entropy for biodiversity and environmental data: The R-package SpatEntropy. Environmental Modelling and Software.
Altieri, L., D. Cocchi, and G. Roli (2019). Advances in spatial entropy measures. Stochastic Environmental Research and Risk Assessment.
Altieri, L., D. Cocchi, and G. Roli (2019). Measuring heterogeneity in urban expansion via spatial entropy. Environmetrics 30(2), e2548.
Altieri, L., D. Cocchi, and G. Roli (2018). A new approach to spatial entropy measures. Environmental and Ecological Statistics 25(1), 95-110.
Altieri, L., D. Cocchi, and G. Roli (2017). The use of spatial information in entropy measures. arXiv:1703.06001
Batty, M. (1974). Spatial entropy. Geographical Analysis 6, 1-31.
Batty, M. (1976). Entropy in spatial aggregation. Geographical Analysis 8, 1-21.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1.
Karlstrom, A. and V. Ceccato (2002). A new information theoretical measure of global and local spatial association. The Review of Regional Research 22, 13-40.
Leibovici, D. (2009). Defining spatial entropy from multivariate distributions of co-occurrences. Berlin, Springer: In K. S. Hornsby et al. (eds.): 9th International Conference on Spatial Information Theory 2009, Lecture Notes in Computer Science 5756, 392-404.
Li, H. and J. Reynolds (1993). A new contagion index to quantify spatial patterns of landscapes. Landscape Ecology 8(3), 155-162.
O'Neill, R., J. Krummel, R. Gardner, G. Sugihara, B. Jackson, D. DeAngelis, B. Milne, M. Turner, B. Zygmunt, S. Christensen, V. Dale, and R. Graham (1988). Indices of landscape pattern. Landscape Ecology 1(3), 153-162.
Parresol, B. and L. Edwards (2014). An entropy-based contagion index and its sampling properties for landscape analysis. Entropy 16(4), 1842-1859.
Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal 27, 379-423, 623-656.
A lattice dataset with Turin's Urban Morphological Zones (UMZ, see EEA, 2011).
turin
A matrix with 111 rows and 113 columns. Values are either 0 (non-urban) or 1 (urban). Pixels outside the administrative borders are classified as NA.
This raster/pixel/lattice dataset comes from the EU CORINE Land Cover project (EEA, 2011) and is dated 2011. It is the result of classifying the original land cover data into urbanised and non-urbanised zones, known as 'Urban Morphological Zones' (UMZ, see EEA, 2011). UMZ data are useful to identify shapes and patterns of urban areas, and thus to detect what is known as urban sprawl. Turin's metropolitan area is extracted from the European dataset and consists of the municipality of Turin and the surrounding municipalities: Beinasco, Venaria Reale, San Mauro Torinese, Grugliasco, Borgaro Torinese, Collegno, Pecetto Torinese, Pino Torinese, Moncalieri, Nichelino, Settimo Torinese, Baldissero Torinese, Rivoli, Orbassano. The dataset is made of 111x113 pixels of size 250x250 metres.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(turin) #plot(as.im(turin), main="", col=gray(c(0.8,0)), ribbon=FALSE) #shannon's entropy shannon(turin) #shannon's entropy of Z (urban/non-urban pairs) shannonZ(turin) #oneill's entropy oneill(turin) #leibovici's entropy only on Collegno's municipality data(turinTess) cell.size=250; ncl=ncol(turin); nrw=nrow(turin) coords=expand.grid(rev(seq(cell.size/2, (nrw*cell.size-cell.size/2), l=nrw)), seq(cell.size/2, (ncl*cell.size-cell.size/2), l=ncl)) data.pp=ppp(x=coords[which(!is.na(c(turin))),2], y=coords[which(!is.na(c(turin))),1], window=owin(xrange=c(0, ncl*cell.size), yrange=c(0,nrw*cell.size)), marks=c(turin)[which(!is.na(c(turin)))]) data=data.pp[turinTess$tiles[[which(turinTess$names=="Collegno")]]] #plot(data, pch=16, cex=0.4) outp=leibovici(data, cell.size=250, ccdist=400, verbose=TRUE) #altieri's entropy only on Collegno's municipality outp=altieri(data, cell.size=250, distbreak=c(cell.size, 2*cell.size), verbose=TRUE) #batty's entropy #on all points, with a random partition in 10 sub-areas batty.ent=batty(turin, cell.size=250, partition=10) #plot with partition data(turinW) #plot(as.im(turin, W=turinW), main="", col=gray(c(0.8,0)), ribbon=FALSE) #plot(batty.ent$area.tess, add=TRUE, border=2) #batty's entropy with a partition based on the administrative areas data(turinTess) batty.ent=batty(turin, cell.size=250, partition=turinTess) #plot(as.im(turin, W=turinW), main="", col=gray(c(0.8,0)), ribbon=FALSE) #for(i in 1:turinTess$n) plot(turinTess$tiles[[i]], add=TRUE, border=2) #karlstrom and ceccato's entropy data(turinW) KC.ent=karlstrom(turin, cell.size=250, partition=15, neigh=3) #plot with partition #plot(as.im(turin, W=turinW), main="", col=gray(c(0.8,0)), ribbon=FALSE) #plot(KC.ent$area.tess, add=TRUE, border=2) #karlstrom and ceccato's entropy with a partition based on the administrative areas data(turinTess) KC.ent=karlstrom(turin, cell.size=250, partition=turinTess, neigh=5000, method="distance") #plot(as.im(turin, W=turinW), main="", 
col=gray(c(0.8,0)), ribbon=FALSE) #for(i in 1:turinTess$n) plot(turinTess$tiles[[i]], add=TRUE, border=2)
City borders of all municipalities included in the Turin dataset, stored as polygonal windows (owin objects).
turinTess
A list of three:
tiles |
a list of 15; each element is an owin object with the administrative border of one municipality of the Turin dataset
n |
the number of municipalities
names |
the names of the 15 municipalities, in the same order as the windows
The object contains a list of 15 observation windows, one for each municipality, created as owin objects from the coordinates of the border polygons. See ?owin for details.
The object also contains the names of the municipalities, in Italian.
Examples showing how the administrative borders can be used are given under the topic turin.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(turin); data(turinW); data(turinTess)
plot(turinW, col=c("black", "white"), main="")
plot(turinTess$tiles[[1]],border=2, add=TRUE, lwd=2)
for(ll in 2:turinTess$n) plot(turinTess$tiles[[ll]],border=2, add=TRUE, lwd=2)

plot(as.im(turin, W=turinW), main="", col=gray(c(0.85,0.4)), ribbon=FALSE)
plot(turinTess$tiles[[1]],border=1, add=TRUE, lwd=2)
for(ll in 2:turinTess$n) plot(turinTess$tiles[[ll]],border=1, add=TRUE, lwd=2)

#see examples under the topic "turin"
An owin object with the city border for the Turin dataset.
turinW
An owin object. Units are given in metres; the basic image unit is a 250x250 metre pixel.
This observation window is an owin object created as a binary mask. See ?owin for details.
Examples showing how the window can be used are given under the topic turin.
EEA (2011). Corine land cover 2000 raster data. Technical Report, downloadable at http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-1
data(turin); data(turinW)  #turin is needed by as.im() below
plot(turinW, col=c("red", "white"), main="")
plot(as.im(turin, W=turinW), main="", col=gray(c(0.8,0)), ribbon=FALSE, add=TRUE)

#see examples under the topic "turin"
This function estimates the variance of Shannon's entropy of a variable $X$.
varshannon(data)
data |
A data matrix or vector, can be numeric, factor, character, ...
Alternatively, a marked |
varshannon estimates the variance of the maximum likelihood estimator of Shannon's entropy given by shannon. The variance is
$V(H(X)) = H_2(X) - H(X)^2$, where $H_2(X)$ is a version of Shannon's entropy (see shannon) where the information function $\log(1/p(x_i))$ is squared:
$H_2(X) = \sum_{i} p(x_i) \left[\log(1/p(x_i))\right]^2$.
The function is able to work with lattice data with missing data, as long as they are specified as NAs: missing data are ignored in the computations.
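The computation described above can be illustrated in base R. This is a minimal sketch, not the package's implementation: it assumes the variance is the second moment of the information function $\log(1/p(x_i))$ minus the squared entropy, with probabilities estimated by relative frequencies. The helper name `entropy_moments` is hypothetical.

```r
# Hypothetical helper (not part of SpatEntropy): plug-in entropy, squared-information
# entropy and their difference for a categorical vector or matrix.
entropy_moments <- function(data) {
  x <- as.vector(data)                    # a matrix is flattened column-wise
  x <- x[!is.na(x)]                       # missing cells are ignored
  p <- as.numeric(table(x)) / length(x)   # relative frequencies of the categories
  p <- p[p > 0]                           # drop empty factor levels, if any
  info <- log(1 / p)                      # information function
  h  <- sum(p * info)                     # Shannon's entropy
  h2 <- sum(p * info^2)                   # entropy with squared information
  c(shannon = h, h2 = h2, variance = h2 - h^2)
}

# Lattice data with one missing cell
set.seed(1)
lat <- matrix(sample(c("a", "b", "c"), 100, replace = TRUE), nrow = 10)
lat[1, 1] <- NA
res <- entropy_moments(lat)
print(res)

# Two equally likely categories carry constant information, so the variance is zero
unif <- entropy_moments(rep(c("a", "b"), each = 5))
print(unif)
```

Because the quantity is a variance of the information function under the estimated distribution, it is never negative and vanishes when all observed categories are equally frequent.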
the estimated variance of Shannon's entropy.
#NON SPATIAL DATA
varshannon(sample(1:5, 50, replace=TRUE))

#POINT DATA
data.pp=runifpoint(100, win=square(10))
marks(data.pp)=sample(c("a","b","c"), 100, replace=TRUE)
varshannon(marks(data.pp))

#LATTICE DATA
data.lat=matrix(sample(c("a","b","c"), 100, replace=TRUE), nrow=10)
varshannon(data.lat)
This function estimates the variance of Shannon's entropy of $Z$, where $Z$ identifies pairs of categories of the original study variable.
varshannonZ(data)
data |
A data matrix or vector, can be numeric, factor, character, ...
Alternatively, a marked |
varshannonZ estimates the variance of the maximum likelihood estimator of Shannon's entropy of $Z$ given by shannonZ. The variance is
$V(H(Z)) = H_2(Z) - H(Z)^2$, where
$H_2(Z) = \sum_{r} p(z_r) \left[\log(1/p(z_r))\right]^2$.
The function is able to work with lattice data with missing data, as long as they are specified as NAs: missing data are ignored in the computations.
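The same idea can be sketched for the pair variable in base R. This is an illustration under stated assumptions, not the package's code: it assumes that $Z$ is observed on every unordered pair of non-missing observations, that the order within a pair is ignored, and that the variance is the second moment of $\log(1/p(z_r))$ minus the squared entropy. The helper name `pair_entropy_moments` is hypothetical.

```r
# Hypothetical helper (not part of SpatEntropy): plug-in entropy of the pair
# variable Z and the variance of its information function.
pair_entropy_moments <- function(data) {
  x <- as.character(as.vector(data))
  x <- x[!is.na(x)]                        # missing cells are ignored
  idx <- combn(length(x), 2)               # all unordered pairs of observations
  a <- x[idx[1, ]]
  b <- x[idx[2, ]]
  # Order-free pair label (assumption: "a-b" and "b-a" are the same category of Z)
  z <- paste(pmin(a, b), pmax(a, b), sep = "-")
  p <- as.numeric(table(z)) / length(z)    # relative frequencies of the pairs
  info <- log(1 / p)
  h  <- sum(p * info)
  h2 <- sum(p * info^2)
  c(shannonZ = h, h2 = h2, variance = h2 - h^2)
}

# Three observations give three pairs: one "a-a" and two "a-b"
toy <- pair_entropy_moments(c("a", "a", "b"))
print(toy)

# A small lattice with a missing cell
set.seed(2)
lat <- matrix(sample(c("a", "b", "c"), 36, replace = TRUE), nrow = 6)
lat[2, 3] <- NA
print(pair_entropy_moments(lat))
```

Enumerating pairs explicitly costs quadratic memory in the number of observations, so this sketch is only suited to small inputs; it is meant to show what is being counted rather than how to count it efficiently.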
the estimated variance of Shannon's entropy of $Z$.
#NON SPATIAL DATA
data=sample(1:5, 50, replace=TRUE)
varshannonZ(data)

#POINT DATA
data.pp=runifpoint(100, win=square(10))
marks(data.pp)=sample(c("a","b","c"), 100, replace=TRUE)
varshannonZ(marks(data.pp))

#LATTICE DATA
data.lat=matrix(sample(c("a","b","c"), 100, replace=TRUE), nrow=10)
varshannonZ(data.lat)