--- title: "Spatial Autocorrelation" author: "Stamatis Kalogirou" date: "`r Sys.Date()`" output: pdf_document vignette: > %\VignetteIndexEntry{Spatial Autocorrelation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction ## Exploratory Spatial Data Analysis (ESDA) & Spatial autocorrelation Spatial autocorrelation is a method of Exploratory Spatial Data Analysis (ESDA). The latter set of methods allow for the study and understanding of the spatial distribution and spatial structure as well as they allow for detecting spatial dependence or autocorrelation in spatial data. More specifically, spatial autocorrelation is the correlation between the values of a single variable that is strictly due to the proximity of these values in geographical space by introducing a deviation from the assumption of independent observations of classical statistics (Griffith, 2003). The most common spatial autocorrelation indicators in the literature are: the Moran's *I*, the Geary's *c*, and the Getis' *G*. ## Moran's I Moran's $I$ is one of the oldest statistics used to examine spatial autocorrelation. This global statistic was first proposed by Moran (1948, 1950). Later, Cliff and Ord (1973, 1981) present a comprehensive work on spatial autocorrelation and suggested a formula to calculate the $I$ which is now used in most textbooks and software: $I = \frac{n}{W} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} z_i z_j} {\sum_{i=1}^{n} z_i^2}$ where $n$ is number of observations, $W$ is the sum of the weights $w_{ij}$ for all pairs in the system, $z_i=x_i - \bar{x}$ where $x$ is the value of the variable at location $i$ and $\bar{x}$ the mean value of the variable in question (Eq. 5.2 Kalogirou, 2003). The original implementation here (function `moransI`) allows only nearest neighbour weighting schemes. However, the function `moransI.w` allows for any weighting scheme if by means of a weighting matrix. Using the package _lctools_, the latter can be computed by the function `w.matrix` that currently supports a fixed (straight distance) and an adaptive (number of nearest neighbours) spatial kernels. Resampling and randomization null hypotheses have been tested following the discussion of Goodchild (1986, pp. 24-26). # Exploring the data The _lctools_ package has some built in data to allow for practicing the various spatial analysis techniques. One of the datasets is `GR.Municipalities`. The latter is a SpatialPolygonsDataFrame that refers to the 325 Municipalities in Greece (Programme Kallikratis) in 2011. The descriptive data for each municipality include demographic and economic variables the source of which are the 2001 Census of Population and the General Secretariat for Information Systems, respectively. ```{r message=FALSE} library(lctools) data(GR.Municipalities) names(GR.Municipalities@data) ``` To learn more about the above data set, try `help(GR.Municipalities)`. # Calculate the global Moran's I To calculate the global Moran's I statistic one can: 1. make use of the function `MoransI` with three arguments: the coordinates of the observations, the number of nearest neighbours and the variable for which the statistic will be calculated for, or 2. calculate the weights matrix using the function `w.matrix` (arguments being the coordinates of the observations, the bandwidth -fixed or adaptive, and the weighting scheme) and then use the function `moransI.w` with two arguments: the weights matrix and the variable in question. The coordinates refer to the geometric centro-ids of the municipalities in Greece. In the next two examples, the number of nearest neighbours is set to 6 and the straight distance bandwidth to 50 kilometres. The selected weights are binary ($w_{ij}$=1 for neighbours and =0 for non-neighbours. The variable to be analysed refers to the mean annual recorded income in 2001 (in Euros). ## Case 1: function _MoransI_ ```{r} Coords <- cbind(GR.Municipalities@data$X, GR.Municipalities@data$Y) bw <- 6 mI <- moransI(Coords,bw,GR.Municipalities@data$Income01) moran.table <- matrix(data=NA,nrow=1,ncol=6) col.names <- c("Moran's I", "Expected I", "Z resampling", "P-value resampling", "Z randomization", "P-value randomization") colnames(moran.table) <- col.names moran.table[1,1] <- mI$Morans.I moran.table[1,2] <- mI$Expected.I moran.table[1,3] <- mI$z.resampling moran.table[1,4] <- mI$p.value.resampling moran.table[1,5] <- mI$z.randomization moran.table[1,6] <- mI$p.value.randomization ``` Using table formatting the results can be shown as follows: ```{r, echo=FALSE, results='asis'} knitr::kable(round(moran.table,5)) ``` ## Case 2: functions _w.matrix_ and _MoransI.w_ The above calculation can be repeated as follows: ```{r} #adaptive kernel w.adaptive <- w.matrix(Coords,6, WType='Binary', family='adaptive') mI.adaptive <- moransI.w(GR.Municipalities@data$Income01, w.adaptive) mI.adaptive <- t(as.numeric(as.matrix(mI.adaptive[1:6]))) colnames(mI.adaptive) <- col.names ``` Using table formatting the results can be shown as follows: ```{r, echo=FALSE, results='asis'} knitr::kable(round(mI.adaptive,5)) ``` An example Moran's I calculation with a fixed spatial kernel and a 50 km radius as the bandwidth can be computed as follows: ```{r} #fixed kernel w.fixed <- w.matrix(Coords, 50000, WType='Binary', family='fixed') mI.fixed<- moransI.w(GR.Municipalities@data$Income01, w.fixed) mI.fixed <- t(as.numeric(as.matrix(mI.fixed[1:6]))) colnames(mI.fixed) <- col.names ``` The above bandwidth is such that 5 observations will have no neighbours. This is a result of a check that is printed out from the `moransI.w` function. These observations refer to isolated islands, such as Megisti east of Rhodes, which is 150 km away from its nearest neighbour. Using table formatting the results can be shown as follows: ```{r, echo=FALSE, results='asis'} knitr::kable(round(mI.fixed ,5)) ``` The above results suggest a strong positive spatial autocorrelation that is statistically significant using either the randomization or resampling hypotheses (Cliff and Ord, 1973; 1981; Goodchild, 1986). In order to examine the sensitivity of the above results, one could try different bandwidth sizes (i.e. number of nearest neighbours). This can be done by either a loop or by using the function `moransI.v` that computes a number of Moran's I statistics with different kernel sizes of the same family and optionally shows a scatter plot with the Moran's I for each kernel size. ```{r fig.width = 5, fig.height = 5} bws <- c(3, 4, 6, 9, 12, 18, 24) moran <- moransI.v(Coords, bws, GR.Municipalities@data$Income01) ``` The Moran's I along with significance tests for all spatial kernels are presented in the table below. The Moran's I statistics for the mean annual recorded income is above 0.5 in all cases suggesting a strong positive spatial autocorrelation. They are also statistically significant in all cases. ```{r, echo=FALSE, results='asis'} knitr::kable(round(moran,4)) ``` # Local Moran's I The next step is the calculation of local Moran $I_i$ using the function `l.moransI`, which creates a Moran's I Scatter Plot by default, as follows: ```{r fig.width = 5, fig.height = 5} l.moran<-l.moransI(Coords,6,GR.Municipalities@data$Income01) ``` The object `l.moran` generated includes a plethora of results that can be plotted or mapped. # References Cliff, A.D., and Ord, J.K., 1973, Spatial autocorrelation (London: Pion). Cliff, A.D., and Ord, J.K., 1981, Spatial processes: models and applications (London: Pion). Goodchild, M. F., 1986, Spatial Autocorrelation. Catmog 47, Geo Books. Griffith, D.A. (2003). Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Berlin: Springer-Verlag. Kalogirou, S., 2003, The Statistical Analysis and Modelling of Internal Migration Flows within England and Wales, PhD Thesis, School of Geography, Politics and Sociology, University of Newcastle upon Tyne, UK. \url{http://gisc.gr/?mdocs-file=1245&mdocs-url=false} Kalogirou, S., 2015, Spatial Analysis: Methodology and Applications with R. [ebook] Athens: Hellenic Academic Libraries Link. ISBN: 978-960-603-285-1 (in Greek). \url{https://repository.kallipos.gr/handle/11419/5029?locale=en} Moran, P.A.P., 1948, The interpretation of statistical maps, Journal of the Royal Statistics Society, Series B (Methodological), 10, 2, pp. 243 - 251. Moran, P.A.P., 1950, Notes on continuous stochastic phenomena, Biometrika, 37, pp. 17 - 23.