Introduction to labour market areas delineation and processing through the R package LabourMarketAreas

Labour market areas (LMAs) are sub-regional geographical areas where the bulk of the labour force lives and works, and where establishments can find the main part of the labour force necessary to occupy the offered jobs. They are functional regions that stem from the aggregation of elementary geographical units (municipalities, census output areas, etc.) on the basis of their level of spatial interaction measured by commuting to work flows through quantitative methods. The guiding idea is to maximise the flow inside the area (internal cohesion) and minimise it outside (external separation).

The R package LabourMarketAreas includes a series of functions useful for the treatment of LMAs. The package contains tools for the creation, operationalisation, manipulation and dissemination of labour market areas. This vignette briefly illustrates the main stages of the labour market areas delimitation process. Users are invited to explore the other functions devoted to LMA management.




1. Introduction

LMAs are clusters comprising two or more initial elementary units (communities) linked by commuting patterns between them. The key characteristic of LMAs is the self-containment of commuting flows. An algorithm has been implemented in the package in order to create such clusters.

The algorithm is iterative and agglomerative, and depends on a set of parameters that set the desired level of self-containment and the size of LMAs. It starts by considering each community as a cluster, which is checked against a set of conditions to see whether it can be considered an LMA. At each iteration, clusters that are not fit for the purpose are disaggregated, and a single community from the cluster is chosen to be attached to a new cluster in order to improve the set of given conditions. The final solution is obtained when the whole set of clusters satisfies the given conditions. The main ingredients of the algorithm are: 1. a set of parameters, 2. a function to decide when a cluster is “fit for the purpose”, 3. a measure to choose the cluster to which a selected community is assigned, 4. a reserve list and 5. the steps of the iterative procedure.

These components are briefly explained below:

  1. The set of parameters, chosen by the user, identifies thresholds on the dimensions of the cluster to be created. The first dimension is the size of the LMA, in terms of number of occupied residents; the second is the level of self-containment required in order for a cluster to be considered an LMA. To allow flexibility, a trade-off is suggested between these two dimensions; see Franconi et al. (2017) for further details;

  2. A condition of validity states, through a quantification of a function based on the values of the parameters, whether a cluster of elementary units is an LMA;

  3. A measure of cohesion between a community and all the clusters with which that community has relationships; this measure identifies the cluster to which the community will be assigned, namely the one where the maximum is attained;

  4. A reserve list (Coombes, 2014) comprising communities which cannot be clearly assigned during the iterations of the algorithm;

  5. An iterative procedure that selects a community at a time, aggregates it to a different cluster, and defines the order and the operations to be implemented.
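To make the notion of self-containment used in the list above more concrete, the following base-R sketch computes it for a candidate cluster from a toy flow table. The toy data, the helper name self_containment and the choice of reporting the minimum of the supply-side and demand-side ratios are illustrative assumptions, not the package's internal code.

```r
# Toy commuting flows with the same three-column layout as the package input.
flows <- data.frame(
  community_live = c(1, 1, 2, 2, 3, 3),
  community_work = c(1, 2, 1, 2, 3, 1),
  amount         = c(80, 20, 30, 70, 90, 10)
)

# Hypothetical helper: self-containment of a set of communities.
self_containment <- function(flows, members) {
  lives <- flows$community_live %in% members
  works <- flows$community_work %in% members
  internal  <- sum(flows$amount[lives & works])  # live and work inside
  residents <- sum(flows$amount[lives])          # employed residents
  jobs      <- sum(flows$amount[works])          # jobs located inside
  c(supply = internal / residents,
    demand = internal / jobs,
    min    = min(internal / residents, internal / jobs))
}

sc_single <- self_containment(flows, members = 1)
sc_merged <- self_containment(flows, members = c(1, 2))
print(sc_single)
print(sc_merged)
```

In this toy case, merging communities 1 and 2 raises both ratios, which is the kind of improvement the agglomerative steps look for.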

The R object lma is the main actor of the package. Its description requires information on all its dimensions: the initial units comprising it, the inward and outward flows, and a summary of its size in terms of employees who reside and/or work there. The description of the process to delineate and produce a graphical representation of the LMA starts with the necessary input data, presented in Section 2. The core function of the package LabourMarketAreas, which comprises all the elements listed above, is described in Section 3. The output of the main function provides information on all the dimensions of the LMAs, a recap of the parameters used and details of the process. Section 4 describes the functions for the production of maps. The quality assessment of the clusters is of extreme importance, and the functions devoted to it are presented in Section 5. As the algorithm does not impose contiguity on the elements comprising each LMA, functions are provided to check whether this property holds and, where needed, to correct the composition; Section 6 presents this fine tuning stage of the process. To guide the choice of initial parameters, Section 7 shows how to compare different partitions using functions implemented in the package. The vignette ends with examples of data aggregation and map preparation through the package.




2. Input data

There are two main data sets needed for the delimitation of the labour market areas: commuting flows and the shape files of the initial territorial partition (elementary geographical units).

2.1 Commuting flows

Labour market areas are aggregations of basic territorial units, called communities. The aggregation process is driven by the commuting flows between communities; thus, the commuting flows matrix is the main input for the delineation of the labour market areas.

In the LabourMarketAreas package the commuting flows matrix is a data.table with three columns: the origin elementary geographical unit identifier, the destination identifier and the amount of commuting flow between origin and destination. The identifiers of the origin and the destination may be either numeric or character, but the amount is constrained to be a numeric variable.
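As a minimal sketch of the structure just described, the following code builds a toy flow table. The values are invented, and the column names simply mirror those of the Brindisi data set shown in this section; the conversion to data.table is guarded so that the snippet also runs where that package is not installed.

```r
# Build a toy flow table with the three columns described above.
flows <- data.frame(
  community_live = c(101L, 101L, 102L, 102L),  # origin identifier
  community_work = c(101L, 102L, 101L, 102L),  # destination identifier
  amount         = c(250, 40, 60, 180)         # commuting flow
)

# The amount column must be numeric; identifiers may be numeric or character.
stopifnot(is.numeric(flows$amount), ncol(flows) == 3)

# The package works with data.table objects; convert only if it is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  flows <- data.table::as.data.table(flows)
}
str(flows)
```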

The examples in this vignette are based on the travel-to-work commuting flows between the municipalities of the Italian NUTS3 of Brindisi. The data stems from the 2001 Italian Population Census.

library(LabourMarketAreas)
data(Brindisi)
#?Brindisi
head(Brindisi)
#>    community_live community_work amount
#>             <int>          <int>  <int>
#> 1:          74004          74001    320
#> 2:          74014          74001    142
#> 3:          74012          74001    599
#> 4:          74020          74001     77
#> 5:          74002          74001    500
#> 6:          74015          74001    410

2.2 Shape files of the communities

In order to visualize the obtained aggregations, the shape files of the initial territorial units are required. These should be loaded as an object of class sf (see st_read in the package sf).

Besides the commuting flows between Brindisi municipalities, the package LabourMarketAreas includes also the spatial information of this Italian region.

library(sf)    # provides st_geometry
library(tmap)  # provides tm_shape, tm_borders, tm_fill
#?shpBrindisi
data("shpBrindisi")
tm_shape(st_geometry(shpBrindisi))+tm_borders("black",alpha=0.5)+tm_fill("gray",alpha=0.2)




If available, the names/labels of the initial communities may be supplied to facilitate identification of the derived LMA partition. Although it is not a mandatory input for the LMA delimitation, the data.table containing the labels should have the following structure: the community identifiers Code and their labels com.name:

data("names.Brindisi")
head(names.Brindisi)
#>     Code          com.name
#>    <int>            <char>
#> 1: 74001          Brindisi
#> 2: 74002         Carovigno
#> 3: 74003  Ceglie Messapica
#> 4: 74004 Cellino San Marco
#> 5: 74005        Cisternino
#> 6: 74006            Erchie



Please make sure that the initial communities’ identifiers in the travel-to-work commuting matrix coincide with those present in the shape files and in the data.table containing the labels.

The initial communities listed in the travel-to-work commuting matrix should be amongst the communities registered in the shape files and in the label data.table.

None of the communities should be identified or labeled by “0”.
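The consistency requirements above can be checked before running the delineation. The sketch below uses invented identifier vectors standing in for the three inputs; the variable names are hypothetical.

```r
# Toy identifiers standing in for the three inputs (hypothetical values).
flow_ids  <- c(74001, 74002, 74003)          # from the commuting matrix
shape_ids <- c(74001, 74002, 74003, 74004)   # from the shape file
label_ids <- c(74001, 74002, 74003, 74004)   # from the label table

# Communities in the flows that are missing elsewhere (should be empty).
missing_in_shape  <- setdiff(flow_ids, shape_ids)
missing_in_labels <- setdiff(flow_ids, label_ids)

# "0" must not be used as a community identifier.
uses_zero <- any(as.character(c(flow_ids, shape_ids, label_ids)) == "0")

print(missing_in_shape)
print(missing_in_labels)
print(uses_zero)
```

With the data shipped in the package, the three vectors would presumably be taken from the columns community_live and community_work of Brindisi, PRO_COM of shpBrindisi and Code of names.Brindisi.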




3. Delineation of Labour Market Areas

The main function of the package LabourMarketAreas is findClusters. It implements the algorithm that iteratively aggregates the initial communities in order to fulfill the conditions set by the initial parameters.

?findClusters

Using the travel-to-work commuting flows contained in the LWCom data.table, the algorithm iteratively aggregates the initial communities until convergence is reached. The algorithm creates a partition of the territory such that all areas satisfy the so-called validity rule (the convergence criterion). This rule depends on area size (the number of commuters living in the area) and on self-containment (the proportion of commuters who do not cross the area borders). In particular, the validity rule sets a trade-off between the size of an area and its level of self-containment: the smaller the area, the higher the self-containment required for it to be considered adequate, whereas lower self-containment values are acceptable for larger areas. These cut-off levels are defined by users through four parameters: minSZ, tarSZ, minSC and tarSC.

  • minSZ is the area acceptable minimum size

  • tarSZ is the size for an area to be considered large

  • minSC is the acceptable minimum level of self-containment for large areas

  • tarSC is the acceptable minimum level of self-containment for small areas.
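The trade-off defined by the four parameters can be sketched with a simple validity function. The formula below is an assumption chosen because it reproduces the behaviour described above (areas of size tarSZ or more need self-containment of at least minSC, areas of size minSZ need at least tarSC, with a linear transition in between); it is not taken from the source of findClusters, and the names validity and is_valid are hypothetical.

```r
# Hypothetical validity score illustrating the size/self-containment trade-off.
# This formula is an assumption for illustration, not taken from findClusters.
validity <- function(SZ, SC, minSZ, minSC, tarSZ, tarSC) {
  size_term <- 1 - (1 - minSC / tarSC) *
    max((tarSZ - SZ) / (tarSZ - minSZ), 0)
  sc_term <- min(SC, tarSC) / tarSC
  size_term * sc_term
}

is_valid <- function(SZ, SC, minSZ = 1000, minSC = 0.6667,
                     tarSZ = 10000, tarSC = 0.75) {
  # Small tolerance guards against floating-point noise on the boundary.
  validity(SZ, SC, minSZ, minSC, tarSZ, tarSC) >= minSC / tarSC - 1e-12
}

is_valid(12000, 0.70)  # large area, moderate self-containment
is_valid(1000, 0.70)   # small area, same self-containment
is_valid(1000, 0.80)   # small area, high self-containment
is_valid(500, 0.95)    # below the minimum size
```

Under this assumed rule, the first and third areas are accepted while the second and fourth are rejected: the same self-containment of 0.70 is adequate for a large area but not for a small one.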



The basic usage of the findClusters function is as follows:

out = findClusters(LWCom=Brindisi, minSZ=1000,minSC=0.6667,tarSZ=10000,tarSC=0.75, 
verbose=FALSE)
#> [1] "The algorithm has converged."

There are several additional arguments that render the function findClusters more flexible:

  • idcom_type may be used for switching from numeric to character type of the codes of the communities.

  • PartialClusterData may be used to start the aggregation of the communities from a given setting.

  • verbose and sink.output may be used to display some convergence information at each iteration.

  • trace may be used to save some intermediate results.



The output of findClusters is a list with several components describing the derived partition, the reserve list, the communities not assigned to any LMA and the isolated communities.

str(out)
#> List of 8
#>  $ lma                 :List of 3
#>   ..$ clusterList:Classes 'data.table' and 'data.frame': 20 obs. of  3 variables:
#>   .. ..$ community: num [1:20] 74001 74002 74003 74004 74005 ...
#>   .. ..$ cluster  : int [1:20] 1 12 3 16 5 10 7 8 10 10 ...
#>   .. ..$ residents: num [1:20] 21433 3630 4009 1468 2557 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   .. ..- attr(*, "sorted")= chr "community"
#>   .. ..- attr(*, "index")= int(0) 
#>   .. .. ..- attr(*, "__cluster")= int [1:20] 1 14 15 17 3 5 7 8 11 6 ...
#>   ..$ LWClus     :Classes 'data.table' and 'data.frame': 80 obs. of  3 variables:
#>   .. ..$ cluster_live: int [1:80] 1 1 1 1 1 1 1 1 1 3 ...
#>   .. ..$ cluster_work: int [1:80] 1 3 5 7 8 10 12 16 20 1 ...
#>   .. ..$ amount      : num [1:80] 26868 120 11 72 233 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..$ marginals  :Classes 'data.table' and 'data.frame': 9 obs. of  3 variables:
#>   .. ..$ cluster    : int [1:9] 1 3 5 7 8 10 12 16 20
#>   .. ..$ amount_live: num [1:9] 28895 4009 2557 10162 10773 ...
#>   .. ..$ amount_work: num [1:9] 34742 3776 2315 10055 10376 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ lma.before0         :List of 3
#>   ..$ clusterList:Classes 'data.table' and 'data.frame': 20 obs. of  3 variables:
#>   .. ..$ community: num [1:20] 74001 74002 74003 74004 74005 ...
#>   .. ..$ cluster  : int [1:20] 1 12 3 16 5 10 7 8 10 10 ...
#>   .. ..$ residents: num [1:20] 21433 3630 4009 1468 2557 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   .. ..- attr(*, "sorted")= chr "community"
#>   ..$ LWClus     :Classes 'data.table' and 'data.frame': 81 obs. of  3 variables:
#>   .. ..$ cluster_live: int [1:81] 0 1 1 1 1 1 1 1 1 1 ...
#>   .. ..$ cluster_work: int [1:81] 0 1 3 5 7 8 10 12 16 20 ...
#>   .. ..$ amount      : num [1:81] 0 26868 120 11 72 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..$ marginals  :Classes 'data.table' and 'data.frame': 10 obs. of  3 variables:
#>   .. ..$ cluster    : int [1:10] 0 1 3 5 7 8 10 12 16 20
#>   .. ..$ amount_live: num [1:10] 0 28895 4009 2557 10162 ...
#>   .. ..$ amount_work: num [1:10] 0 34742 3776 2315 10055 ...
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ reserve.list        : list()
#>  $ comNotAssigned      :List of 1
#>   ..$ : num(0) 
#>  $ zero.list           :List of 4
#>   ..$ Communities: num(0) 
#>   ..$ LWCom      :Classes 'data.table' and 'data.frame': 0 obs. of  3 variables:
#>   .. ..$ community_live: int(0) 
#>   .. ..$ community_work: int(0) 
#>   .. ..$ amount        : int(0) 
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..$ Residents  :Classes 'data.table' and 'data.frame': 0 obs. of  2 variables:
#>   .. ..$ residents: int(0) 
#>   .. ..$ Code     : int(0) 
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..$ Workers    :Classes 'data.table' and 'data.frame': 0 obs. of  2 variables:
#>   .. ..$ workers: int(0) 
#>   .. ..$ Code   : int(0) 
#>   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ communitiesMovements:Classes 'data.table' and 'data.frame':   21 obs. of  2 variables:
#>   ..$ community: num [1:21] 74001 74002 74003 74004 74005 ...
#>   ..$ moves    : num [1:21] 0 1 0 1 0 1 0 0 1 0 ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..- attr(*, "sorted")= chr "community"
#>  $ param               : num [1:4] 1.00e+03 6.67e-01 1.00e+04 7.50e-01
#>  $ idcom_rel           : NULL

The lma component of the output is a list of three data.table objects:

  1. clusterList - it contains the allocation of each community to the corresponding labour market area, together with the number of residents in each initial territorial unit
#str(out$lma)
head(out$lma$clusterList)
#> Key: <community>
#>    community cluster residents
#>        <num>   <int>     <num>
#> 1:     74001       1     21433
#> 2:     74002      12      3630
#> 3:     74003       3      4009
#> 4:     74004      16      1468
#> 5:     74005       5      2557
#> 6:     74006      10      1708
  2. LWClus - it contains the commuting flows between the labour market areas
head(out$lma$LWClus)
#>    cluster_live cluster_work amount
#>           <int>        <int>  <num>
#> 1:            1            1  26868
#> 2:            1            3    120
#> 3:            1            5     11
#> 4:            1            7     72
#> 5:            1            8    233
#> 6:            1           10    781
  3. marginals - it contains the main characteristics of the labour market areas:
  • amount_live = number of employees who are residents in the LMA
  • amount_work = number of employees working in the LMA regardless of where they live
head(out$lma$marginals)
#>    cluster amount_live amount_work
#>      <int>       <num>       <num>
#> 1:       1       28895       34742
#> 2:       3        4009        3776
#> 3:       5        2557        2315
#> 4:       7       10162       10055
#> 5:       8       10773       10376
#> 6:      10       13596       10960


The reserve.list is a list of particular communities. During the iterations, the algorithm tries to assign communities to their dominant cluster. If, for a particular community C, there is no dominant cluster or if the validity of the dominant cluster does not improve after the assignment, the dominant cluster is not modified. In these situations, the community C is assigned to the reserve list.

str(out$reserve.list)
#>  list()


As the reserve list is not emptied during the iterations, once convergence is achieved (each cluster satisfies the validity rule), the function findClusters also outputs an object lma.before0.

All the clusters in lma.before0 satisfy the validity rule, except the cluster identified by “0”. The latter cluster represents the reserve list created during the iterations of the algorithm. The communities belonging to the reserve list may be investigated through the data.table clusterList.

#str(out$lma.before0)
out$lma.before0$clusterList[cluster==0]
#> Key: <community>
#> Empty data.table (0 rows and 3 cols): community,cluster,residents

lma and lma.before0 share the same structure. In fact, lma is derived from lma.before0 simply by assigning each community in the reserve list (cluster “0”) to one of the other clusters in lma.before0. This assignment is driven by the maximisation of the linkages between communities and clusters. For this final assignment of the communities from the reserve list, the validity of the resulting clusters is no longer checked. Consequently, clusters in lma.before0 satisfy the validity rule, while some clusters in lma may not. Furthermore, the lma component of the output does not include any cluster identified by “0”.


The zero.list contains information on communities that could not be processed by the algorithm for various reasons; either the number of commuters resident in it equals zero or the number of workers/jobs is zero or the community has no interaction with any other community.
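The three situations listed above can be detected directly from a flow table. The following sketch uses invented flows; the exact rules applied inside findClusters are not spelled out here, so the conditions below are an interpretation of the text rather than the package's code.

```r
# Toy flows: community 4 only commutes to itself, 5 has jobs but no residents,
# and 3 has residents but no jobs.
flows <- data.frame(
  community_live = c(1, 1, 2, 2, 3, 4),
  community_work = c(1, 2, 1, 2, 5, 4),
  amount         = c(50, 10, 20, 40, 15, 30)
)
ids <- sort(unique(c(flows$community_live, flows$community_work)))

# Employed residents and jobs per community (0 when a community is absent).
residents <- sapply(ids, function(i) sum(flows$amount[flows$community_live == i]))
workers   <- sapply(ids, function(i) sum(flows$amount[flows$community_work == i]))

# Flow exchanged with other communities, in either direction.
external <- sapply(ids, function(i) {
  crosses <- xor(flows$community_live == i, flows$community_work == i)
  sum(flows$amount[crosses])
})

zero_candidates <- ids[residents == 0 | workers == 0 | external == 0]
print(zero_candidates)
```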




4. Shape files of the derived Labour Market Areas

Once the labour market areas have been delimited by means of the findClusters function, they are labeled according to the name of the community having the highest number of jobs (incoming commuters) among all the communities comprising the LMA. The function AssignLmaName changes the structure of an LMA partition. Firstly, the cluster columns (in clusterList, LWClus and marginals) are renamed to LMA, but they keep their meaning, i.e. the LMA identification code. The columns residents are renamed to EMP_live, while the columns workers are renamed to EMP_work. In the LWClus component, the commuting flows are renamed from amount to commuters. Secondly, the names of the communities are included in the clusterList component: the column com.name is added. Thirdly, the clusterList, LWClus and marginals components include the LMA names, in columns starting with the prefix lma.name.

lma_name=AssignLmaName(Brindisi,out$lma,names.Brindisi)
head(lma_name$clusterList)
#> Key: <community>
#>    community          com.name   LMA             lma.name EMP_live
#>        <int>            <char> <int>               <char>    <num>
#> 1:     74001          Brindisi     1             BRINDISI    21433
#> 2:     74002         Carovigno    12               OSTUNI     3630
#> 3:     74003  Ceglie Messapica     3     CEGLIE MESSAPICA     4009
#> 4:     74004 Cellino San Marco    16 SAN PIETRO VERNOTICO     1468
#> 5:     74005        Cisternino     5           CISTERNINO     2557
#> 6:     74006            Erchie    10              MESAGNE     1708
#head(lma_name$LWClus)
#head(lma_name$marginals)
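The naming rule described above (each LMA takes the name of its community with the most jobs) can be sketched in base R as follows. The data are invented and the code only illustrates the rule; it is not the implementation of AssignLmaName. Names are converted to upper case simply to mirror the output shown above.

```r
# Toy allocation of communities to clusters, with labels (hypothetical data).
allocation <- data.frame(
  community = c(1, 2, 3, 4),
  com.name  = c("Alpha", "Beta", "Gamma", "Delta"),
  cluster   = c(10, 10, 20, 20)
)
flows <- data.frame(
  community_live = c(1, 2, 2, 3, 4, 4),
  community_work = c(1, 1, 2, 3, 3, 4),
  amount         = c(100, 40, 60, 30, 50, 20)
)

# Jobs located in each community = total flow with that workplace.
jobs <- tapply(flows$amount, flows$community_work, sum)
allocation$jobs <- as.numeric(jobs[as.character(allocation$community)])

# For each cluster, take the name of the community with the most jobs.
lma_names <- sapply(split(allocation, allocation$cluster), function(d) {
  toupper(d$com.name[which.max(d$jobs)])
})
print(lma_names)
```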



Starting from the initial communities shape files, the function CreateLMAshape derives the shape files of the corresponding labour market areas. The output of this function includes the labour market areas shape files and two lists of communities: those registered in the LMA partition but not in the communities shape files, and those registered in the communities shape files but not in the LMA partition. The LMA shape files are stored in an sf object.

out_shp=CreateLMAshape(lma=lma_name,
                  comIDs="community",
                  lmaIDs="LMA",
                  shp_com=shpBrindisi,
                  id_shp_com="PRO_COM")
# str(out_shp$shp_lma)

If a community is registered in the commuting flows matrix but not in the communities shape files, the completeness of the polygons should be checked. Conversely, a community may be registered in the communities shape files but not in the LMA partition. This is especially the case for the zero.list communities, i.e. those communities having no links with other communities. Such communities should be treated manually before generating the LMA shape files. For example, an additional cluster could be created for each isolated community.

# check whether there are communities registered in lma but not in the communities shape file
out_shp$comID.in.LMA.not.in.SHP
#> integer(0)
#check whether there are communities registered in the communities shape file but not in the lma
out_shp$comID.in.SHP.not.in.LMA
#> integer(0)

In order to graphically visualise the shape of the labour market areas, the shp_lma component of the output of CreateLMAshape can be used:

# plot(out_shp$shp_lma)
# or

tm_shape(st_geometry(shpBrindisi))+tm_borders("black",alpha=0.5)+tm_fill("gray",alpha=0.2)+tm_shape(st_geometry(out_shp$shp_lma))+tm_borders("red",alpha=0.5)+tm_fill("blue",alpha=0.2)




5. Quality assessment

The package LabourMarketAreas includes both statistical and spatial tools for quality evaluation.

The function StatClusterData computes statistics on the LMA partition, its flows and quality indicators, e.g. statistics on internal cohesion flows, home-work ratio, Q-modularity, etc. Before using this function, the LMA names should be deleted from the corresponding R object, if any.

The function StatReserveList computes statistics on the reserve list. It summarizes the typologies of the communities, their validities, statistics on their original clusters, etc.

lma_no_name=DeleteLmaName(lma_name)
#?StatClusterData
mystat=StatClusterData(lma_no_name,out$param,1,Brindisi)
#> [1] "Warning: names of clusterList were changed into community cluster residents"
# mystat$marginals
# mystat$StatFlows
# mystat$StatQuality

#?StatReserveList
stat_reserve=StatReserveList(out$reserve.list,Brindisi)




6. Fine tuning

The algorithm implemented in the findClusters function takes into account exclusively the travel-to-work commuting flows. In other words, the spatial contiguity of the communities is not considered at all by the greedy algorithm. Consequently, the spatial extension of some labour market areas might include isolated polygons, defined as polygons having no contiguity relationship with other polygons belonging to the same LMA. Moreover, some LMAs might contain a single elementary territorial unit, which may be regarded as an unwanted feature. The functions FindIsolated and FindContig help users identify these critical LMAs.

#?FindIsolated
iso=FindIsolated(lma=lma_name,
                 lma_shp=out_shp$shp_lma,
                 com_shp=shpBrindisi,
                 id_com="PRO_COM"
                )

The function FindIsolated is interactive. It opens a new window that plots the isolated polygons together with the communities of the corresponding LMA:

#> NULL
#> NULL




The isolated polygons have a yellow background. The user should identify the association between each isolated polygon and a community. In this example, the user should type the community and polygon ID in the R console, e.g. 74015 and 1_1, respectively. The function FindIsolated loops over all isolated polygons. Once the associations between communities and polygons have been entered manually, the user is asked to modify or confirm the choice (y/n). Once the association table is confirmed, the interactive session ends.

The output of the FindIsolated function is a list with two quite similar components, one dedicated to particular LMAs and the other containing information about particular polygons. Depending on the case study, these components could be further manipulated and updated in order to reflect the expert choices. Then, these components could be further used to tune the isolated elements:

  1. isolated.lma
  • contig.matrix.lma - the LMA contiguity matrix of the given LMA partition
  • lma.unique - the LMAs having a unique community
  • lma.nolink - the LMAs with no link w.r.t. other LMAs
  2. isolated.poly
  • contig.matrix.poly - the contiguity matrix of the polygons of the given LMA partition
  • poly.com.linkage - association between polygons and communities (the one interactively confirmed by the user)
  • poly.nolink - the identifiers of the polygons without links
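The ideas behind these components (a contiguity matrix, LMAs with a unique community and isolated elements) can be illustrated on a toy example that does not depend on the package. The matrix, the allocation and the definition of "isolated" at community level are simplifying assumptions; FindIsolated itself works on polygons and shape files.

```r
# Toy contiguity matrix between four communities (1 = shared border).
ids <- c("A", "B", "C", "D")
contig <- matrix(0, 4, 4, dimnames = list(ids, ids))
contig["A", "B"] <- contig["B", "A"] <- 1
contig["B", "C"] <- contig["C", "B"] <- 1
contig["C", "D"] <- contig["D", "C"] <- 1

# Hypothetical LMA allocation: D belongs to LMA 1 but only touches C (LMA 2).
lma <- c(A = 1, B = 1, C = 2, D = 1)

# A community is isolated if its LMA has other members but none is a neighbour.
is_isolated <- sapply(ids, function(i) {
  same_lma <- setdiff(ids[lma == lma[[i]]], i)
  length(same_lma) > 0 && sum(contig[i, same_lma]) == 0
})

# LMAs made of a single community.
unique_lma <- names(which(table(lma) == 1))

print(ids[is_isolated])
print(unique_lma)
```

Here community D is flagged as isolated and LMA 2 as an LMA with a unique community; these are the two kinds of cases that the fine tuning stage is meant to resolve.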

Once the isolated polygons are identified, the function FindContig should be used to identify the contiguous polygons. Depending on the situation, the function FindContig may be used twice, once for LMAs and once for polygons.

Firstly, the function FindContig should be used for LMAs with a unique community. For such LMAs, the output is a list of contiguous LMAs. The contiguous LMAs are sorted in decreasing order of commuters who are resident in the given LMA (with unique community).

conti.lma=FindContig(type = "lma", 
                     lma=out$lma, 
                     contig.matrix=iso$isolated.lma$contig.matrix.lma, 
                     isolated=iso$isolated.lma$lma.unique$lma.unique.ID)

conti.lma$list.contig.lma
#> $`74007`
#> [1] "12" "5" 
#> 
#> $`74003`
#> [1] "1"  "12" "20" "8" 
#> 
#> $`74005`
#> [1] "12" "7" 
#> 
#> $`74020`
#> [1] "3" "8"

Secondly, the function FindContig below should be used for isolated polygons. For such polygons, the list containing the IDs of the contiguous labour market areas of each community (polygon) is identified.

conti.poly=FindContig(type = "poly", 
                      lma=out$lma, 
                      contig.matrix=iso$isolated.poly$contig.matrix.poly, 
                      isolated=iso$isolated.poly$poly.com.linkage)


conti.poly$list.contig.poly
#> $`74015`
#> [1] "10" "16"
conti.poly$com_no.LMA.neigh
#> character(0)

In order to be further processed, only the non-empty elements should be selected.

conti.poly$list.contig.poly=
  conti.poly$list.contig.poly[!is.na(conti.poly$list.contig.poly)]

Finally, the tuning of LMAs and polygons can be performed after deleting the names of the LMAs. Of course, the LMA names can be re-attached at the end of the process.

out$lma=DeleteLmaName(out$lma)
lma.tuned=FineTuning(dat=Brindisi, 
                     out.ini=out$lma, 
                     list.contiguity=conti.lma$list.contig.lma)

str(lma.tuned$tunned.lma)
#> List of 3
#>  $ clusterList:Classes 'data.table' and 'data.frame':    20 obs. of  3 variables:
#>   ..$ community: num [1:20] 74001 74002 74003 74004 74005 ...
#>   ..$ cluster  : int [1:20] 1 12 12 16 12 10 5 8 10 10 ...
#>   ..$ residents: num [1:20] 21433 3630 4009 1468 2557 ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>   ..- attr(*, "sorted")= chr "community"
#>  $ LWClus     :Classes 'data.table' and 'data.frame':    36 obs. of  3 variables:
#>   ..$ cluster_live: int [1:36] 1 12 16 10 5 8 1 12 10 5 ...
#>   ..$ cluster_work: int [1:36] 1 1 1 1 1 1 12 12 12 12 ...
#>   ..$ amount      : int [1:36] 26868 1717 1743 3116 236 1062 677 14930 188 559 ...
#>   ..- attr(*, ".internal.selfref")=<externalptr> 
#>  $ marginals  :Classes 'data.table' and 'data.frame':    6 obs. of  3 variables:
#>   ..$ cluster    : num [1:6] 1 5 8 10 12 16
#>   ..$ amount_live: int [1:6] 28895 10162 12066 13596 17638 6835
#>   ..$ amount_work: int [1:6] 34742 10055 11490 10960 16717 5228
#>   ..- attr(*, ".internal.selfref")=<externalptr>
str(lma.tuned$not.tunned.commID)
#>  num(0)

poly.tuned=FineTuning(dat=Brindisi, 
                      out.ini=out$lma, 
                      list.contiguity=conti.poly$list.contig.poly)

output=AssignLmaName(Brindisi,lma.tuned$tunned.lma,names.Brindisi)
out$lma=output




7. Comparison of different partitions

The LMA delineation is a complex process which needs expert input from many fields like statistics, geography, labour market, transportation, etc. The methodology proposed in the package LabourMarketAreas heavily depends on the choice of four parameters, i.e. minSZ, tarSZ, minSC and tarSC. In some cases, users test different parameters and then choose the optimal partition. The package LabourMarketAreas includes several assessment functionalities. In order to illustrate them, we first generate two different LMA partitions.

#generate a partition with a first set of parameters
out1= findClusters(LWCom=Brindisi, minSZ=50,minSC=0.3,tarSZ=100,tarSC=0.4)
#> [1] "The algorithm has converged."
out1_name=AssignLmaName(Brindisi,out1$lma,names.Brindisi)                    
x=CreateLMAshape(out1_name,comIDs="community",lmaIDs="LMA",shp_com=shpBrindisi,id_shp_com="PRO_COM")
shape1=x$shp_lma

#generate a partition with a second set of parameters
out2= findClusters(LWCom=Brindisi, minSZ=1000,minSC=0.6,tarSZ=10000,tarSC=0.7)
#> [1] "The algorithm has converged."
out2_name=AssignLmaName(Brindisi,out2$lma,names.Brindisi)                    
x=CreateLMAshape(out2_name,comIDs="community",lmaIDs="LMA",shp_com=shpBrindisi,id_shp_com="PRO_COM")
shape2=x$shp_lma



Firstly, users could test whether the two partitions are equal:

EqualLmaPartition(out1$lma, out2$lma)
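As an illustration of what it means for two partitions to be equal, the sketch below compares two allocations of the same communities and treats them as equal when they group the communities identically, regardless of the cluster codes. Whether EqualLmaPartition ignores cluster codes in the same way is an assumption; the helper same_partition is hypothetical.

```r
# Hypothetical helper: TRUE when two allocations group communities identically,
# regardless of the cluster codes used.
same_partition <- function(p1, p2) {
  m <- merge(p1, p2, by = "community", suffixes = c(".1", ".2"))
  if (nrow(m) != nrow(p1) || nrow(m) != nrow(p2)) return(FALSE)
  pairs <- unique(m[, c("cluster.1", "cluster.2")])
  # One-to-one mapping between the two sets of cluster codes.
  !anyDuplicated(pairs$cluster.1) && !anyDuplicated(pairs$cluster.2)
}

a <- data.frame(community = 1:4, cluster = c(1, 1, 2, 2))
b <- data.frame(community = 1:4, cluster = c(7, 7, 9, 9))  # relabelled copy of a
d <- data.frame(community = 1:4, cluster = c(1, 2, 2, 2))  # different grouping

same_partition(a, b)
same_partition(a, d)
```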



Users could use the function StatClusterData in order to compare the statistical indicators stemming from each LMA partition. For example, users could compare the number of LMAs containing a single community; this is considered an unwanted feature, as elementary units are expected to be related to each other (except in peculiar cases).

#Use the structure without names
stats_first=StatClusterData(out1$lma,c(50,0.3,100,0.4),1,Brindisi)
#> [1] "Warning: names of clusterList were changed into community cluster residents"
#> [1] "Warning: names of LWClus were changed into cluster_live cluster_work amount"
#> [1] "Warning: names of marginals were changed into cluster_live amount_live amount_work"
stats_second=StatClusterData(out2$lma,c(1000,0.6,10000,0.7),1,Brindisi)
#> [1] "Warning: names of clusterList were changed into community cluster residents"
#> [1] "Warning: names of LWClus were changed into cluster_live cluster_work amount"
#> [1] "Warning: names of marginals were changed into cluster_live amount_live amount_work"
#str(stats_first)
#str(stats_second)
stats_first$StatQuality$NbClusterUniqueCom
#> [1] 20
stats_second$StatQuality$NbClusterUniqueCom
#> [1] 4
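The count of LMAs containing a single community can also be obtained directly from an allocation table. The sketch below uses an invented clusterList-like table; that NbClusterUniqueCom is computed in this way is an assumption.

```r
# Toy clusterList-like allocation (hypothetical values).
allocation <- data.frame(
  community = c(74001, 74002, 74003, 74004, 74005),
  cluster   = c(1, 1, 3, 4, 4)
)

# Number of communities per cluster, then count the clusters of size one.
sizes <- table(allocation$cluster)
n_unique_com <- sum(sizes == 1)
print(n_unique_com)
```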



Instead of individually and repeatedly using the function StatClusterData, two or more partitions could be simultaneously analyzed by means of the function CompareLMAsStat.

comparison=CompareLMAsStat(list(out1,out2),Brindisi)
#> [1] 1
#> [1] "Warning: names of clusterList were changed into community cluster residents"
#> [1] 2
#> [1] "Warning: names of clusterList were changed into community cluster residents"
#> [1] "Warning: names of clusterList were changed into community cluster residents"
#> [1] "Warning: names of clusterList were changed into community cluster residents"



Users could also compare two partitions by checking the assignment of particular communities. The figure below shows the output of the PlotLmaCommunity function. For a given community, e.g. 74014, the figure shows the two LMAs to which community 74014 belongs in the first and second setting, respectively. The red territories are those in common, while the yellow communities are those belonging to one LMA partition but not to the other.

PlotLmaCommunity(list(out1_name,out2_name),"LMA","74014", shpBrindisi, "PRO_COM","my_full_path\\name_bmp_file.bmp")




Finally, in order to compare two partitions, the package LabourMarketAreas also includes the function LmaSpatialComparison. For each LMA in the first partition, the LMA in the second partition that maximizes the intersection area is found. The function LmaSpatialComparison then returns the areas of the two LMAs and their intersection area together with the coverage percentages, i.e. shape_area, shape_ref_area, area_intersection, perc_intersection_shape and perc_intersection_shape_ref, respectively. It also returns the number of employees living and working in the two LMAs.

spatial_comp=LmaSpatialComparison(shape1,shape2)[]
str(spatial_comp)
#> Classes 'data.table' and 'data.frame':   20 obs. of  11 variables:
#>  $ shape_lma                  : chr  "1" "2" "3" "4" ...
#>  $ shape_ref_lma              : chr  "1" "12" "3" "16" ...
#>  $ area_intersection          : num  3.26e+08 1.05e+08 1.30e+08 3.73e+07 5.41e+07 ...
#>  $ shape_area                 : num  3.26e+08 1.05e+08 1.30e+08 3.73e+07 5.41e+07 ...
#>  $ shape_ref_area             : num  4.75e+08 3.28e+08 1.30e+08 1.48e+08 5.41e+07 ...
#>  $ shape_EMP_live             : num  21433 3630 4009 1468 2557 ...
#>  $ shape_EMP_work             : num  29346 2849 3776 1075 2315 ...
#>  $ shape_ref_EMP_live         : num  28895 11072 4009 6835 2557 ...
#>  $ shape_ref_EMP_work         : num  34742 10626 3776 5228 2315 ...
#>  $ perc_intersection_shape    : num  100 100 100 100 100 100 100 100 100 100 ...
#>  $ perc_intersection_shape_ref: num  68.7 32.1 100 25.1 100 ...
#>  - attr(*, ".internal.selfref")=<externalptr>




8. Thematic maps

Besides the dissemination of structural information on labour market areas, the R package LabourMarketAreas includes the possibility to generate thematic maps, i.e. maps based on meaningful statistical indicators. The function AddStatistics can be used to compute statistics at LMA level, provided that data at community level are available. This function sums the values at community level to obtain the corresponding value at LMA level. The resulting statistics can then be used to create maps by means of the LMA shape files.

lma_pop=AddStatistics(shpBrindisi[,c("PRO_COM","POP2001")], "PRO_COM",out1$lma,"community" )
head(lma_pop)
#>    cluster POP2001
#>      <int>   <int>
#> 1:       1   89081
#> 2:       2   14960
#> 3:       3   21370
#> 4:       4    6818
#> 5:       5   12078
#> 6:       6    8740
# lma_pop was computed from out1$lma, so it is merged with the matching shape1
shp_stats=sp::merge(shape1,lma_pop,by.x="LMA",by.y="cluster")
tm_shape(shp_stats)+tm_borders("red")+tm_fill("POP2001")+tm_view(view.legend.position = c('right','bottom'))+tm_layout(legend.text.size=0.6)
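The aggregation step performed by AddStatistics (summing community values within each LMA) can be reproduced in base R on toy data. The sketch below uses invented values and only illustrates the summation; it is not the package's implementation.

```r
# Toy community-level indicator and allocation (hypothetical values).
pop <- data.frame(PRO_COM = c(1, 2, 3, 4), POP2001 = c(100, 250, 80, 70))
allocation <- data.frame(community = c(1, 2, 3, 4), cluster = c(10, 10, 20, 20))

# Attach the cluster code to each community, then sum within clusters.
joined  <- merge(pop, allocation, by.x = "PRO_COM", by.y = "community")
lma_pop <- aggregate(POP2001 ~ cluster, data = joined, FUN = sum)
print(lma_pop)
```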