--- title: "Introduction to labour market areas delineation and processing through the R package LabourMarketAreas" author: "Luisa Franconi, Daniela Ichim" date: "`r Sys.Date()`" output: rmarkdown::html_vignette always_allow_html: true vignette: > %\VignetteIndexEntry{LabourMarketAreas} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, warning = FALSE, message = FALSE, fig.align = 'center', fig.width = 7, comment = "#>", root.dir="/tmp" ) suppressMessages(library(data.table)) suppressMessages(library(sf)) suppressMessages(library(sp)) ``` ```{r,echo=FALSE} suppressPackageStartupMessages(library(sf)) suppressMessages(library(sf)) suppressPackageStartupMessages(library(sp)) suppressMessages(library(sp)) options(warn=-1) options(digits=2) options(rmarkdown.html_vignette.check_title=FALSE) options(big.mark="") library(data.table) library(sp) library(tmap) library(sf) setDTthreads(1) tmap_mode("view") ``` Labour market areas (LMAs) are sub-regional geographical areas where the bulk of the labour force lives and works, and where establishments can find the main part of the labour force necessary to occupy the offered jobs. They are functional regions that stem from the aggregation of elementary geographical units (municipalities, census output areas, etc.) on the basis of their level of spatial interaction measured by commuting to work flows through quantitative methods. The guiding idea is to maximise the flow inside the area (internal cohesion) and minimise it outside (external separation). The R package LabourMarketAreas includes a series of functions useful for the treatment of LMAs. The package contains tools for the creation, operationalisation, manipulation and dissemination of labour market areas. This vignette briefly illustrates the main stages of the labour market areas delimitation process. Users are invited to explore the other functions devoted to LMA management.


# 1. Introduction LMAs are clusters comprising two or more initial elementary units (communities) linked by commuting patterns between them. The key characteristics of LMAs is self-containment of commuting flows. An algorithm has been implemented in the package in order to create such clusters. The algorithm is an iterative agglomerative one that depends on a set of parameters. Such parameters set the level of desired self-containment and size of LMAs. It starts by considering each community as a cluster that is checked against a set of conditions to see whether it can be considered an LMA. At each iteration clusters that are not fit for the purpose are disaggregated and a single community inside the cluster is chosen to be attached to a new cluster in order to improve the set of given conditions. The final solution is obtained when the whole set of clusters satisfies the given conditions. The main ingredient of the algorithm are: 1. a set of parameters, 2. a function to decide when a cluster is “fit for the purpose”, 3. a measure to choose the cluster to be assigned to a selected community and 4. the steps of the iterative procedure. A little explanation on these components is the following: 1. The set of parameters, chosen by the user, identifies thresholds on the dimensions of the cluster to be created. The fist dimension is the size of the LMA, in terms of number of occupied residents; the second is the level of self-containment required in order for a cluster to be considered an LMA. To allow flexibility a trade-off is suggested between these two dimensions see Franconi et al. (2017) for further details; 2. A condition of validity states, through a quantification of a function based on the values of the parameters, whether a cluster of elementary units is an LMA; 3. A measure of cohesion between a community and all the clusters with whom such community has relationships; such measure identifies the cluster where the community will to be assigned: the one where the maximum is attained. 4. A reserve list (Coombes, 2014) comprising of communities which cannot be clearly assigned during the iterations of the algorithm; 5. An iterative procedure that selects a community at a time, aggregates it to a different cluster, and defines the order and the operations to be implemented. The R object *lma* is the main actor of the package. The description of this object necessarily needs information on all its dimensions: the initial units comprising it, the flows inward and outward and the summary of its size in terms of employees that reside and/or work there. The description of the process to delineate and produce a graphical representation of the LMA starts with the necessary input data, presented in Section 2. The core function of the package LabourMarketAreas - comprising all the elements listed above - is described in Section 3. The output of the main function provides information on all the dimensions of the LMAs, a recap on the parameters used and details of the process. Section 4 describe the functions for the production of maps. The quality assessment of the clusters is of extreme importance and functions devoted to its investigation are presented in Section 5. As the algorithm does not constrain on contiguity of comprising elements of each LMA, functions are present to check on fulfilment of such property and, whether needed, correct the composition; Section 6 presents this fine tuning stage of the process. In order to guide on the choice of initial parameters, Section 7 shows how to compare different partitions using implemented package functions. The vignette ends with examples of data aggregation and maps preparation through the package.


# 2. Input data There are two main data sets needed for the delimitation of the labour market areas: commuting flows and the shape files of the initial territorial partition (elementary geographical units). ## 2.1 Commuting flows Labour market areas are aggregations of basic territorial units. The latter are called communities. Labour market areas are built by aggregating communities; the aggregation process is driven by the commuting flows between communities. Thus, the commuting flows matrix is the main input for the delineation of the labour market areas. In the LabourMarketAreas package the commuting flows matrix is a data.table with three columns: the origin elementary geographical unit identifier, the destination identifier and the amount of commuting flow between origin and destination. The identifiers of the origin and the destination may be either numeric or character, but the amount is constrained to be a numeric variable. The examples in this vignette are based on the travel-to-work commuting flows between the municipalities of the Italian NUTS3 of Brindisi. The data stems from the 2001 Italian Population Census. ```{r eval=TRUE} library(LabourMarketAreas) data(Brindisi) #?Brindisi head(Brindisi) ``` ## 2.2 Shape files of the communities In order to visualize the obtained aggregations, the shape files of the initial territorial units are required. These objects should be loaded as an object of class sf (see st_read in the package lightgray">sf). Besides the commuting flows between Brindisi municipalities, the package LabourMarketAreas includes also the spatial information of this Italian region. ```{r eval=TRUE,fig.align='center'} #?shpBrindisi data("shpBrindisi") tm_shape(st_geometry(shpBrindisi))+tm_borders("black",alpha=0.5)+tm_fill("gray",alpha=0.2) ```


If available, the names/labels of the initial communities may be supplied to facilitate identification of the derived LMA partition. Although it is not a mandatory input for the LMA delimitation, the data.table containing the labels should have the following structure: the community identifiers *Code* and their labels *com.name*: ```{r eval=TRUE} data("names.Brindisi") head(names.Brindisi) ```

Please make sure that the initial communities' identifiers in the travel-to-work commuting matrix coincide with those present in the shape files and in the data.table containing the labels. The initial communities listed in the travel-to-work commuting matrix should be amongst the communities registered in the shape files and label data.frame. None of the communities should be identified or labeled by "0".


# 3. Delineation of Labour Market Areas The main function of the package LabourMarketAreas is **findClusters**. It implements the algorithm that iteratively aggregates the initial communities in order to fulfill the conditions set by the initial parameters. ```{r eval=F} ?findClusters ``` By means of the travel-to-work commuting flows contained in the $LWCom$ data.table, the algorithm iteratively aggregates the initial communities until convergence is reached. The algorithm creates a partition of the territory such that all areas satisfy the so called validity rule (convergence criteria). Such rule depends on area size, the number of commuters living in the area, and on self-containment, the proportion of the commuters not crossing area borders. In particular, the validity rule sets a trade-off between area size and its level of self-containment. The rationale is the smaller the area the higher the self-containment to be considered adequate. Lower self-containment values are instead acceptable for larger areas. These cut-off levels are defined by users through four parameters: *minSZ*, *tarSZ*, *minSC* and *tarSC*. - *minSZ* is the area acceptable minimum size - *tarSZ* is the size for an area to be considered large - *minSC* is the acceptable minimum level of selfcontainment for large areas - *tarSC* is the acceptable minimum level of self-containment for small areas.

The basic usage of the **findClusters** function is as follows: ```{r eval=T} out = findClusters(LWCom=Brindisi, minSZ=1000,minSC=0.6667,tarSZ=10000,tarSC=0.75, verbose=FALSE) ``` There are several additional arguments that render the function **findClusters** more flexible: - *idcom_type* may be used for switching from numeric to character type of the codes of the communities. - *PartialClusterData* may be used to start the aggregation of the communities from a given setting. - *verbose* and *sink.output* may be used to display some convergence information at each iteration. - *trace* may be used to save some intermediate results.

The output of the **findClusters** is a list with several components describing the derived partition, reserve list, the communities not assigned to any LMA and the isolated communities. ```{r eval=T} str(out) ``` The *lma* component of the output is a list of three data.table objects: 1. *clusterList* - it contains the allocation of each community to the corresponing labour market area together with the number of residents in each initial territorial unit ```{r eval=T} #str(out$lma) head(out$lma$clusterList) ``` 2. *LWClus* - it contains the commuting flows between the labour market areas ```{r eval=T} head(out$lma$LWClus) ``` 3. *marginals* - it contains the main characteristics of the labour market areas: - *amount_live* = number of employees who are residents in the LMA - *amount_work* = number of employees working in LMA regardless of where they live ```{r eval=T} head(out$lma$marginals) ```
The *reserve.list* is a list of particular communities. During the iterations, the algorithm tries to assign communities to their dominant cluster. If, for a particular community *C*, there is no dominant cluster or if the validity of the dominant cluster does not improve after the assignment, the dominant cluster is not modified. In these situations, the community *C* is assigned to the reserve list. ```{r eval=T} str(out$reserve.list) ```
As the reserve list is not emptied during the iterations, once the convergence is achieved (each cluster satisfies the validity rule), the function **findClusters** outputs also an object *lma.before0.* All the clusters in *lma.before0* satisfy the validity rule, except the cluster identified by "0". The latter cluster represents the reserve list created during the iterations of the algorithm. The communities belonging to the reserve list may be investigated through the data.table clusterList. ```{r eval=T} #str(out$lma.before0) out$lma.before0$clusterList[cluster==0] ``` *lma* and *lma.before0* share the same structure. Actually, *lma* is derived from *lma.before0* simply by assigning each community in the reserve list (cluster "0") to other clusters in *lma.before0*. This assignment is driven by the maximisation of the linkages between communities and clusters. For this final assignment of the communities from the reserve list, the validity of the obtained clusters is no more checked. Consequently, clusters in *lma.before0* satisfy the validity rule, while some clusters in *lma* may not satisfy the same rule. Furthermore, the *lma* component of the output does not include any cluster identified by "0".
The *zero.list* contains information on communities that could not be processed by the algorithm for various reasons; either the number of commuters resident in it equals zero or the number of workers/jobs is zero or the community has no interaction with any other community.


# 4. Shape files of the derived Labour Market Areas Once the labour market areas are delimited by means of the **findClusters** function, the LMAs are labeled according to the name of the community having the highest number of jobs (incoming commuters) among all the communities comprising the LMA. The function **AssignLmaName** changes the structure of an LMA partition. Firstly, the *cluster* columns (in *clusterList*, *LWClus* and *marginals*) are re-named into *LMA,* but they maintain their meaning, i.e. LMA identification code. The columns *residents* are re-named into *EMP_live,* while the columns *workers* are re-named into *EMP_work*. In the *LWClus* component, the commuting flows are re-named from *amount* into *commuters*. Secondly, the names of the communities are included in the *clusterList* component: the column *com.name* is added. Thirdly, the *clusterList*, *LWClus* and *marginals* components include the LMA names, columns starting with the prefix *Lma.name*. ```{r eval=T} lma_name=AssignLmaName(Brindisi,out$lma,names.Brindisi) head(lma_name$clusterList) #head(lma_name$LWClus) #head(lma_name$marginals) ```

Starting from the initial communities shape files, the function **CreateLMAshape** derives the shape files of the corresponding labour market areas. The output of this function includes the labour market areas shape files and the lists of communities belonging either to commuting flows or the communities shape files. The LMA shape files are registered in an sf R object (sf) ```{r eval=T} out_shp=CreateLMAshape(lma=lma_name, comIDs="community", lmaIDs="LMA", shp_com=shpBrindisi, id_shp_com="PRO_COM") # str(shp) ``` Obviously, if a community is registered in the commuting flows matrix, but not in the communities shape files, the completeness of the polygons should be checked. On the contrary, it might be possible for a community to be registered in the communities shape files, but not in the LMA partition. This is especially the case of the *zero.list* communities, i.e. those communities having no links with other communities. Such communities should be manually treated before generating the LMA shape files. For example, additional clusters could be created for each such isolated community. ```{r eval=T} # check whether there are communities registered in lma but not in the communities shape file out_shp$comID.in.LMA.not.in.SHP #check whether there are communities registered in the communities shape file but not in the lma out_shp$comID.in.SHP.not.in.LMA ``` In order to graphically visualise the shape of the labour market areas, the *shp_lma* component of the output of **CreateLMAshape** can be used: ```{r eval=T,fig.align='center'} # plot(shp) # or tm_shape(st_geometry(shpBrindisi))+tm_borders("black",alpha=0.5)+tm_fill("gray",alpha=0.2)+tm_shape(st_geometry(out_shp$shp_lma))+tm_borders("red",alpha=0.5)+tm_fill("blue",alpha=0.2) ```


# 5. Quality assessment The package LabourMarketAreas includes both statistical and spatial tools for quality evaluation. The function **StatClusterData** computes statistics on the LMA partition, its flows and quality indicators, e.g. statistics on internal cohesion flows, home-work ratio, $Q$-modularity, etc. Before using this function, the LMA names should be deleted from the corresponding R object, if any. The function **StatReserveList** computes statistics on the reserve list. It summarizes the typologies of the communities, their validities, statistics on their original clusters, etc. ```{r eval=T} lma_no_name=DeleteLmaName(lma_name) #?StatClusterData mystat=StatClusterData(lma_no_name,out$param,1,Brindisi) # mystat$marginals # mystat$StatFlows # mystat$StatQuality #?StatReserveList stat_reserve=StatReserveList(out$reserve.list,Brindisi) ``` \vspace{30pt}


# 6. Fine tuning The algorithm implemented in the **findClusters** function esclusively takes into account the travel-to-work commuting flows. Otherwise stated, the spatial contiguity of the communities is not at all considered by the greedy algorithm. Consequently, the spatial extension of some labour market areas might include isolated polygons. The latter are defined as those polygons having no contiguity relationship with other polygons belonging to the same LMA. Moreover, it might be that some LMAs contain a single elementary territorial unit which could be intended as an unwanted feature. The functions **FindIsolated** and **FindContig** help users in identifying these critical LMAs. ```{r eval=F} #?FindIsolated iso=FindIsolated(lma=lma_name, lma_shp=out_shp$shp_lma, com_shp=shpBrindisi, id_com="PRO_COM" ) ``` ```{r eval=T,echo=F} setwd("..") setwd("R") load("sysdata.rda") setwd("..") setwd("vignettes") ``` The function **FindIsolated** is an interactive one. It plots in a new window containing the isolated polygons together with the communities of the corresponding LMA: \newpage ```{r eval=T,echo=F,warning=FALSE,message=FALSE,cache.comments=FALSE,fig.align='center'} lma_shp_path = NULL lma_shp_name = NULL com_shp_path = NULL com_shp_name = NULL lma=lma_name lma_shp=out_shp$shp_lma com_shp=shpBrindisi id_com="PRO_COM" shp = lma_shp ##aug 2023 commentato la riga del proj4string #proj4string = lma_shp@proj4string comuni91 = com_shp #proj4string = com_shp@proj4string #aug 2023 cancellato @data # kiki = grep(id_com, names(comuni91@data)) kiki = grep(id_com, names(comuni91)) comuni91 <- sp::merge(comuni91, data.frame(lma$clusterList), by.x = kiki, by.y = "community") #aug 2023 commentato e sostituio il pezzo di nbpolysll # nbpolygsll <- spdep::poly2nb(shp, row.names = shp@data$LMA, # queen = TRUE) nbpolygsll <- st_relate(shp, pattern = "F***T****") as.nb.sgbp <- function(x, ...) { attrs <- attributes(x) x <- lapply(x, function(i) { if(length(i) == 0L) 0L else i } ) attributes(x) <- attrs class(x) <- "nb" x } nbpolygsll=as.nb.sgbp(nbpolygsll) Wsll = spdep::nb2mat(nbpolygsll, style = "B", zero.policy = TRUE) #rownames(Wsll[rowSums(Wsll) == 0, ]) rownames(Wsll[rowSums(Wsll)==0,]) #daniela 22febb2023 sf # colnames(Wsll)=shp@data$LMA colnames(Wsll)=shp$LMA rownames(Wsll)=shp$LMA #daniela 22febb2023 sf # nolinksll=shp@data$LMA[shp@data$LMA%in%rownames(Wsll)[rowSums(Wsll)==0]] # nolinksllname=shp@data$lma.name[shp@data$LMA%in%rownames(Wsll)[rowSums(Wsll)==0]] nolinksll=shp$LMA[shp$LMA%in%rownames(Wsll)[rowSums(Wsll)==0]] nolinksllname=shp$lma.name[shp$LMA%in%rownames(Wsll)[rowSums(Wsll)==0]] shp = shp[!shp$LMA %in% nolinksll, ] setDT(lma$clusterList) lma$clusterList = lma$clusterList[!(lma$clusterList$LMA %in% nolinksll), ] lma$LWCom = lma$LWCom[!(lma$LWCom$LMA_live %in% nolinksll), ] lma$LWCom = lma$LWCom[!(lma$LWCom$LMA_work %in% nolinksll), ] lma$marginals = lma$marginals[!(lma$marginals$LMA %in% nolinksll), ] lma$clusterList[, `:=`(Ncom, .N), by = LMA] badsll = lma$clusterList[Ncom == 1, ] setorder(badsll, -EMP_live) badsll = badsll$LMA badsllname = "0" if(length(badsll)>0){ for(kiki in 1:length(badsll)){ #daniela 22febb2023 sf # badsllname=c(badsllname, # as.character(shp@data$lma.name[shp$LMA==badsll[kiki]])) badsllname=c(badsllname, as.character(shp$lma.name[shp$LMA==badsll[kiki]])) } } badsllname=badsllname[-1] ##QUI INTERVENIRE: #d=sp::disaggregate(shp) d=st_cast(shp,"POLYGON") # d@data=data.table(d@data) # d@data[,x:=.N,by=lma.name] # d@data[x>=2,ID_PiecesLMA:=paste(as.character(LMA),1:max(x),sep="_"),by=lma.name] # d@data[is.na(ID_PiecesLMA),ID_PiecesLMA:=as.character(LMA)] d1=data.table(d) d1[,x:=.N,by=lma.name] d1[x>=2,ID_PiecesLMA:=paste(as.character(LMA),1:max(x),sep="_"),by=lma.name] d1[is.na(ID_PiecesLMA),ID_PiecesLMA:=as.character(LMA)] d$x=NULL d1[,x:=.N,by=lma.name] d=merge(d,d1) #QUI INTERVENIRE sll.shp.nb <- spdep::poly2nb(d, row.names =d$ID_PiecesLMA,queen=TRUE) W=spdep::nb2mat(sll.shp.nb,style="B", zero.policy=TRUE) rownames(W[rowSums(W)==0,]) colnames(W)=d$ID_PiecesLMA rownames(W)=d$ID_PiecesLMA nolinkpolig=d$ID_PiecesLMA[d$ID_PiecesLMA%in%rownames(W)[rowSums(W)==0]] nolinkpoligname=d$lma.name[d$ID_PiecesLMA%in%rownames(W)[rowSums(W)==0]] badlmalist=sort(unique(d$LMA[d$x>1])) zeris=c(0,0) df.mun.poly=data.frame(t(zeris)) colnames(df.mun.poly)=c("community","Polygon") rm(zeris) indice.ident=grep(id_com,names(comuni91)) badlma=badlmalist[1] zz = d[d$LMA == badlma, ] zz$colore=c("red", "yellow") # plot(zz, main = paste("polygons", unique(zz@data$lma.name), # sep = " "), col = c("yellow", "red", "cyan", "orange"), # col.main = "red") # centroids <- coordinates(zz) # text(centroids, label = zz@data$ID_PiecesLMA, cex = 0.8) mappa1=tm_shape(zz)+tm_borders("gray")+tm_fill(col="colore")+tm_text("ID_PiecesLMA") mappa2=tm_shape(comuni91[comuni91$LMA ==badlma,])+tm_borders("black")+tm_fill("gray",alpha=0.2)+tm_text("PRO_COM") tmap_arrange(mappa1,mappa2,ncol=2) ```


The isolated polygons have a yellow background. The user should identify the association between the isolated polygon and a new community. In this example, the user should type in the **R console** the community and polygon ID, e.g 74015 and 1_1, respectively. The function **FindIsolated** loops for all isolated polygons. At the end of the manual introduction of the associations between communities and polygons, the user is also asked to modify or confirm his choice (y/n). Once the association table is confirmed, the interactivity stops. The output of the **FindIsolated** function is a list with two quite similar components, one dedicated to particular LMAs and the other containing information about particular polygons. Depending on the case study, these components could be further manipulated and updated in order to reflect the expert choices. Then, these components could be further used to tune the isolated elements: 1. *isolated.lma* - *contig.matrix.lma* - the LMA contiguity matrix of the given LMA partition - *lma.unique* - these are the LMAs having a unique community - *lma.nolink* - the LMAs with no link w.r.t. other LMAs 2. *isolated.poly* - *contig.matrix.poly* - the contiguity matrix of the polygons of the given LMA partition - *poly.com.linkage* - association between polygons and communities (the one interactively confirmed by the user) - *poly.nolink* - the identifiers of the polygons without links Once the isolated polygons are identified, the function **FindContig** should be used to identify the contiguos polygons. Depending on the situation, the function **FindContig** could be used even twice, once for LMAs and once for polygons. Firstly, the function **FindContig** should be used for LMAs with an unique community. For such LMAs, the output is a list of contiguous LMAs. The contiguous LMAs are ordered in decreasing order of commuters who are resident in the given LMA (with unique community). ```{r eval=T} conti.lma=FindContig(type = "lma", lma=out$lma, contig.matrix=iso$isolated.lma$contig.matrix.lma, isolated=iso$isolated.lma$lma.unique$lma.unique.ID) conti.lma$list.contig.lma ``` Secondly, the function **FindContig** below should be used for isolated polygons. For such polygons, the list containing the IDs of the contiguous labour market areas of each community (polygon) is identified. ```{r eval=T} conti.poly=FindContig(type = "poly", lma=out$lma, contig.matrix=iso$isolated.poly$contig.matrix.poly, isolated=iso$isolated.poly$poly.com.linkage) conti.poly$list.contig.poly conti.poly$com_no.LMA.neigh ``` In order to be further processed, only the non-empty elements should be selected. ```{r eval=T} conti.poly$list.contig.poly= conti.poly$list.contig.poly[!is.na(conti.poly$list.contig.poly)] ``` Finally, the LMAs and polygons tunning could be performed after deleting the names of the LMAs. Of course, the LMA names could be re-attached at the end of the process. ```{r eval=T} out$lma=DeleteLmaName(out$lma) lma.tuned=FineTuning(dat=Brindisi, out.ini=out$lma, list.contiguity=conti.lma$list.contig.lma) str(lma.tuned$tunned.lma) str(lma.tuned$not.tunned.commID) poly.tuned=FineTuning(dat=Brindisi, out.ini=out$lma, list.contiguity=conti.poly$list.contig.poly) output=AssignLmaName(Brindisi,lma.tuned$tunned.lma,names.Brindisi) out$lma=output ``` \vspace{30pt}


# 7. Comparison of different partitions The LMA delineation is a complex process which needs expert input from many fields like statistics, geography, labour market, transportation, etc. The methodology proposed in the package LabourMarketAreas heavily depends on the choice of four parameters, i.e. *minSZ*, *tarSZ*, *minSC* and *tarSC*. In some cases, users test different parameters and then choose the optimal partition. The package LabourMarketAreas includes several assessment functionalities. In order to illustrate them, we first generate two different LMA partitions. ```{r eval=T} #generate a partition with a first set of parameters out1= findClusters(LWCom=Brindisi, minSZ=50,minSC=0.3,tarSZ=100,tarSC=0.4) out1_name=AssignLmaName(Brindisi,out1$lma,names.Brindisi) x=CreateLMAshape(out1_name,comIDs="community",lmaIDs="LMA",shp_com=shpBrindisi,id_shp_com="PRO_COM") shape1=x$shp_lma #generate a partition with a second set of parameters out2= findClusters(LWCom=Brindisi, minSZ=1000,minSC=0.6,tarSZ=10000,tarSC=0.7) out2_name=AssignLmaName(Brindisi,out2$lma,names.Brindisi) x=CreateLMAshape(out2_name,comIDs="community",lmaIDs="LMA",shp_com=shpBrindisi,id_shp_com="PRO_COM") shape2=x$shp_lma ```

Firstly, users could test whether the two partitions are equal: ```{r eval=F, warning=FALSE,message=FALSE} EqualLmaPartition(out1$lma, out2$lma) ```

Users could use the function **StatClusterData** in order to compare the statistical indicators stemming from each LMA partition. For example, users could compare the number of LMAs containing a single community; this is considered a unwanted feature as we do expect that elementary units are related with each other (except for peculiar cases). ```{r eval=T, warning=FALSE,message=FALSE} #Use the structure without names stats_first=StatClusterData(out1$lma,c(50,0.3,100,0.4),1,Brindisi) stats_second=StatClusterData(out2$lma,c(1000,0.6,10000,0.7),1,Brindisi) #str(stats_first) #str(stats_second) stats_first$StatQuality$NbClusterUniqueCom stats_second$StatQuality$NbClusterUniqueCom ```

Instead of individually and repeatedly using the function **StatClusterData**, two or more partitions could be simultaneously analyzed by means of the function **CompareLMAsStat**. ```{r eval=T, warning=FALSE,message=FALSE} comparison=CompareLMAsStat(list(out1,out2),Brindisi) ```

Users could also compare two partitions by checking the assignment of particular communities. The figure below shows the output of the **PlotLmaCommunity** function. For a given community, e.g. 74014, the figure shows the two LMA to which community 74014 belongs to, in the first and second setting, respectively. The red territories are those in common, while the yellow communities are those beloging to one LMA partition but not to the other. \newpage ```{r eval=F,warning=FALSE,message=FALSE} PlotLmaCommunity(list(out1_name,out2_name),"LMA","74014", shpBrindisi, "PRO_COM","my_full_path\\name_bmp_file.bmp") ``` ```{r eval=T,echo=F ,warning=FALSE,message=FALSE,fig.align='center'} col.vec = c("red", "orange", "yellow") list.lma=list(out1_name,out2_name) shp_com=shpBrindisi lmaIDs="LMA" communityID="74014" id_shp="PRO_COM" # par(mfrow = c(1, 2)) counter = 1 lma = list.lma[[counter]] x = CreateLMAshape(lma, comIDs = "community", lmaIDs = lmaIDs, shp_com = shp_com, dsn = NULL, shp_com_name = NULL, id_shp_com = id_shp, outd = NULL, outf = NULL, bf = NULL, po = c("green", 1, 2, "red", 1, 2, 0.8, 2)) indice = grep(lmaIDs, names(lma$clusterList)) current.lma = lma$clusterList[community %in% communityID, indice, with = F] com.in.current.lma = lma$clusterList[unlist(lma$clusterList[, indice, with = F]) %in% unlist(current.lma), community] current.lma2 = list.lma[[2]]$clusterList[community %in% communityID, indice, with = F] com.in.current.lma2 = list.lma[[2]]$clusterList[unlist(list.lma[[2]]$clusterList[, indice, with = F]) %in% unlist(current.lma2), community] new.com = setdiff(com.in.current.lma, com.in.current.lma2) indicecom = grep(id_shp, names(shp_com)) indicelma = grep(lmaIDs, names(x$shp_lma)) # plot(shp_com[shp_com@data[, indicecom] %in% com.in.current.lma, ], border = "gray", main = counter) # plot(x$shp_lma[x$shp_lma@data[, indicelma] %in% current.lma, ], border = "red", add = T) # plot(shp_com[shp_com@data[, indicecom] %in% communityID, ], border = "gray", col = col.vec[1], add = T) # mappa1=tm_shape(st_geometry(shp_com[st_drop_geometry(shp_com)[, indicecom] %in% com.in.current.lma,]))+tm_borders("gray") mappa1=mappa1+tm_shape(st_geometry(x$shp_lma[st_drop_geometry(x$shp_lma)[, indicelma] %in% current.lma, ]))+tm_borders("red") mappa1=mappa1+tm_shape(shp_com[st_drop_geometry(shp_com)[, indicecom] %in% communityID, ])+tm_borders("gray")+tm_fill(col=col.vec[1])+tm_text("PRO_COM") counter = 2 lma = list.lma[[counter]] x = CreateLMAshape(lma, comIDs = "community", lmaIDs = lmaIDs, shp_com = shp_com, dsn = NULL, shp_com_name = NULL, id_shp_com = id_shp, outd = NULL, outf = NULL, bf = NULL, po = c("green", 1, 2, "red", 1, 2, 0.8, 2)) indice = grep(lmaIDs, names(lma$clusterList)) current.lma = lma$clusterList[community %in% communityID, indice, with = F] com.in.current.lma = lma$clusterList[unlist(lma$clusterList[, indice, with = F]) %in% unlist(current.lma), community] current.lma1 = list.lma[[1]]$clusterList[community %in% communityID, indice, with = F] com.in.current.lma1 = list.lma[[1]]$clusterList[unlist(list.lma[[1]]$clusterList[, indice, with = F]) %in% unlist(current.lma1), community] new.com = setdiff(com.in.current.lma, com.in.current.lma1) indicecom = grep(id_shp, names(shp_com)) indicelma = grep(lmaIDs, names(x$shp_lma)) # plot(shp_com[shp_com@data[, indicecom] %in% com.in.current.lma, ], border = "gray", main = counter) # plot(x$shp_lma[x$shp_lma@data[, indicelma] %in% current.lma, ], border = "red", add = T) # plot(shp_com[shp_com@data[, indicecom] %in% communityID, ], border = "gray", col = col.vec[1], add = T) # mappa2=tm_shape(shp_com[st_drop_geometry(shp_com)[, indicecom] %in% com.in.current.lma, ])+tm_borders("gray")+tm_fill("yellow")+tm_text("PRO_COM") mappa2=mappa2+tm_shape(x$shp_lma[st_drop_geometry(x$shp_lma)[, indicelma] %in% current.lma, ])+tm_borders("red") mappa2=mappa2+tm_shape(shp_com[st_drop_geometry(shp_com)[, indicecom] %in% communityID, ])+tm_borders("gray")+tm_fill(col= "red")+tm_text("PRO_COM") mappa1 mappa2 #tmap_arrange(mappa1,mappa2,ncol=2) ```


Finally, in order to compare two partitions, the package LabourMarketAreas includes also the function **LmaSpatialComparison**. For each LMA in the first partition, the LMA in the second partition that maximizes the intersection area is found. Then the function **LmaSpatialComparison** returns the areas of the two LMAs and their intersection area together with the coverage percentages, i.e. *shape_area, shape_ref_area*, *area_intersection* , *perc_intersection_shape* and *perc_intersection_shape_ref* respectively. The function **LmaSpatialComparison** also returns the number of employees living and working in the two LMAs. ```{r eval=TRUE,warning=FALSE,message=FALSE} spatial_comp=LmaSpatialComparison(shape1,shape2)[] str(spatial_comp) ```


# 8. Thematic maps Besides the dissemination of structural information on labour market areas, the R package LabourMarketAreas includes the possibility to generate thematic maps, i.e. maps based on meaningful statistical indicators. The function **AddStatistics** could be used to compute statistics at LMA level provided data at community level is available. This function **sums** the values at community level to obtain the corresponding value at LMA level. The obtained statistics can then, in turn, be used to create maps by mean of the LMA shape files. ```{r echo=TRUE,eval=TRUE,fig.align='center'} lma_pop=AddStatistics(shpBrindisi[,c("PRO_COM","POP2001")], "PRO_COM",out1$lma,"community" ) head(lma_pop) shp_stats=sp::merge(shape2,lma_pop,by.x="LMA",by.y="cluster") tm_shape(shp_stats)+tm_borders("red")+tm_fill("POP2001")+tm_view(view.legend.position = c('right','bottom'))+tm_layout(legend.text.size=0.6) ``` ```{r echo=FALSE} options(warn=0) ```