--- title: "Getting started" output: rmarkdown::html_vignette: toc: true fig_caption: true self_contained: yes fontsize: 11pt documentclass: article bibliography: references.bib csl: reference-style.csl vignette: > %\VignetteIndexEntry{Getting started} %\VignetteEngine{knitr::rmarkdown_notangle} --- ```{r, include = FALSE} # define dummy variables so that vignette passes package checks ##add variables here as needed prt_tradeoffs_metedata <- data.frame(total_restorable = NA) ``` ```{r, include = FALSE} # define variables for vignette figures and code execution h <- 3.5 w <- 3.5 is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_", "_R_CHECK_LICENSE_") %in% names(Sys.getenv())) knitr::opts_chunk$set(fig.align = "center", eval = !is_check) ``` ## Introduction Habitat restoration is urgently need to prevent further declines in biodiversity [@r6; @r7]. Since the resources available for restoring habitat are limited, plans for restoring habitat (hereafter, "restoration prioritizations") need to achieve conservation objectives for minimal cost [@r8]. Indeed, restoration prioritizations need to strategically site restoration actions in places that will enhance connectivity among remaining habitats [@r9]. They also need to account for existing land-use constraints that prevent such actions for being implemented in particular places (e.g., places that are too heavily modified for conservation activities) [@r10]. Since accounting for these considerations and exploring trade-offs between them can be difficult, decision support tools can play an important role in conservation decision making [@r11; @r12; @r13]. Here we provide a tutorial for using the _restoptr R_ package for ecological restoration planning. Briefly, it it uses optimization algorithms to generate restoration prioritizations [based on @r1]. They can be identified by maximizing landscape indices, such as the effective mesh size (Jaeger 2000), or the integral index of connectivity (Pascual-Hortal & Saura 2006). Additionally, constraints can be used to ensure that priority areas exhibit particular characteristics (e.g., ensure that particular places are not selected for restoration, ensure that priority areas form a single contiguous network). Furthermore, multiple near-optimal solutions can be generated to explore multiple options in restoration planning. The package leverages the [Choco-solver](https://choco-solver.org/) software to perform optimization using constraint programming (CP) techniques [@r3]. ## Setup Let's setup our _R_ session for the tutorial. Here, we will load the _restoptr R_ package. We will also load the _terra R_ package for working with raster data, and the _vegan_ and _cluster R_ packages for comparing prioritizations. Additionally, we will set the seed for the random number generator to help ensure consistent results. ```{r, message = FALSE, results = "hide"} # load packages library(restoptr) library(terra) library(vegan) library(cluster) library(ggthemes) # set seed for reproducibility set.seed(500) ``` Since the package needs Java to work, let's verify that Java is installed. ```{r} # check if Java is available for usage is_java_available() ``` If the previous code returns a `TRUE`, then this means that Java is installed on your computer and ready to use. If not, please see the package README documentation for instructions on installing and setting up Java (see https://github.com/dimitri-justeau/restoptr). Assuming that Java is installed correctly, we can then proceed with the tutorial. ## Data Let's import data for generating restoration prioritizations. Specifically, we will use an example dataset distributed with the package [obtained from @r1]. This dataset contains data for prioritizing forest restoration efforts within a protected area in New Caledonia. To begin with, let's import a raster layer to describe the spatial distribution of existing habitat across the study area. This raster layer (i.e., `habitat_data`) contains binary values (zeros and ones) indicating places that (0) do not contain existing habitat and (1) currently contain existing habitat. This raster layer also contains places with missing (`NA`) values, which represent places that should not be considered as part of the prioritization exercise (e.g., places located outside the study area). ```{r, fig.height = 4, fig.width = 7, out.width = "90%"} # load habitat data habitat_data <- rast(system.file( "extdata", "habitat_hi_res.tif", package = "restoptr" )) # preview habitat data print(habitat_data) # visualize habitat data plot(habitat_data, main = "Existing habitat", plg = list(x = "topright")) ``` Next, let's import a raster layer to describe the spatial distribution of places inside the study area that are not available for restoration activities. For example, these places could include places where restoration efforts would be too costly to implement, or places where restoration is not possible because they are needed for other land-uses (e.g., places needed for urban development). Similar to the previous raster layer, this raster layer (i.e., `locked_out_data`) contains binary values (zeros and ones) indicating places that (0) are suitable for restoration actions and (1) not suitable for restoration actions. This raster layer also contains missing (`NA`) values in the same places as the previous layer. ```{r, fig.height = 4, fig.width = 7, out.width = "90%"} # load locked out data locked_out_data <- rast(system.file( "extdata", "locked_out.tif", package = "restoptr" )) # preview locked out data print(locked_out_data) # visualize locked out data plot(locked_out_data, main = "Locked out areas", plg = list(x = "topright"), col=c("darkorange")) ``` After loading in the data, we can begin formulating an optimization problem. ## Problem formulation Now let's formulate a restoration optimization problem. To achieve this, we will build an object that contains the data and settings needed to perform the optimization process. First, we will create an object for the underlying data (using the `restopt_problem()` function). Here, we will specify that we are using the `habitat_data` raster layer to specify the spatial distribution of existing habitat (via the `existing_habitat` parameter). We will also specify that the data should be internally aggregated (using a factor of 16) to reduce computational processing (via the `aggregation_factor` parameter). Additionally, we will specify that -- following data aggregation -- places must contain at least 70% of habitat to be considered adequately managed for biodiversity (via the `habitat_threshold` parameter). Second, we will specify that the objective function -- the metric used to compare competing solutions -- will be a connectivity metric known as the effective mesh size statistic (via the `set_max_mesh_objective()` function) [@r2]. Third, we will specify that the total amount of land selected for restoration should not exceed 220 ha (via the `add_restorable_constraint()` function). Fourth, we will specify that the spatial extent of the priority areas must not exceed 2.5 km (via the `add_compactness_constraint()` function). Fifth, we will specify that particular places -- per the `locked_out_data` raster layer -- should not be selected for restoration activities (via the `add_locked_out_constraint()` function). Finally, we will specify that we wish to generate a single optimal prioritization (via the `add_settings()` function). ```{r} # build problem rp <- restopt_problem( existing_habitat = habitat_data, aggregation_factor = 16, habitat_threshold = 0.7 ) %>% set_max_mesh_objective() %>% add_restorable_constraint( min_restore = 1, max_restore = 220, unit = "ha" ) %>% add_compactness_constraint(2.5, unit = "km") %>% add_locked_out_constraint(data = locked_out_data) %>% add_settings(optimality_gap = 0) # preview problem print(rp) ``` After building the problem formulation object, we can solve it. ## Generating a prioritization We can generate a prioritization by solving the restoration problem object (i.e., `rp`). ```{r, fig.height = 4, fig.width = 7, out.width = "90%"} # solve problem to generate prioritization rs <- solve(rp) # preview prioritization print(rs) # display summary statistics for the prioritization print(get_metadata(rs)) # visualize prioritization ## here, values indicate: ## 0 : places that were locked out. ## 1 : places that were potential candidates for restoration ## 2 : places that already contain existing habitat. ## 3 : places selected for restoration actions. plot( rs, main = "Optimal prioritization", col = c("#E5E5E5", "#fff1d6", "#b2df8a", "#1f78b4"), plg = list(x = "topright") ) ``` We can see that the prioritization selected planning units for restoration in the southern region of the study area (shown in blue). By restoring the habitat inside these planning units, restoration efforts could link up two large patches of existing habitat (shown in green, located north and south of the prioritization). Thus the prioritization has identified a set of priority areas that could produce a single extremely large patch of habitat---substantially improving connectivity within the study area. After generating a prioritization, we can also extract metadata for it. These metadata describe the spatial extent of priority areas, and their ability to promote connectivity according to various metrics. ```{r} # extract metadata get_metadata(rs) ``` Although this prioritization identifies a set of places that -- in combination -- can help promote connectivity, it does not necessarily tell us which of these places are more (or less) important for achieving this objective. ## Relative importance We can assess the relative importance of places for restoring connectivity. This information is useful to help facilitate stakeholder discussions and determine which places in the prioritization should scheduled immediately for restoration. To assess relative importance, we will generate a portfolio of optimal prioritizations and calculate the number of times that each planning unit is selected for prioritization. This approach is similar to the selection frequency statistics provided by the _Marxan_ decision support tool for conservation planning [@r4]. ```{r, message = FALSE, result = "hide"} # generate multiple prioritizations for assessing relative importance prt_imp <- rp %>% add_settings(nb = 100, optimality_gap = 0) %>% solve(verbose = FALSE) %>% rast() ``` ```{r, fig.height = 4, fig.width = 7, out.width = "90%"} # preview portfolio prioritizations print(prt_imp) # calculate selection frequency of from portfolio selfreq <- sum(prt_imp == 3) # preview selection frequency data print(selfreq) # visualize selection frequencies plot(selfreq, main = "Selection frequencies", type = "continuous") ``` We can see that some planning units are critical for implementing an optimal prioritization (shown in dark green) and there are other planning units which are not so important (shown in yellow or pink). In addition to evaluating relative importance, we can also generate portfolios to explore alternative options for restoration. ## Exploring alternatives Spatial conservation planning exercises can help facilitate stakeholder discussions by presenting a range of alternate prioritizations. This is because stakeholders might have additional objectives that were not previously communicated (e.g., conservation of a particular charismatic species), or because existing data may be insufficient to fully parametrize all of their objectives (e.g., although a stakeholder might value a particular ecosystem service, data may not exist to incorporate the ecosystem service into the prioritization process). As such, it can be useful to generate a set of multiple prioritizations that have (roughly) similar performance and exhibit different spatial configurations. So, let's generate a second portfolio of prioritizations. Since our aim is to identify alternate spatial configurations with relatively similar performance, we will generate prioritizations under a range of different optimality gaps (i.e., ranging from 0% to 15%). ```{r, message = FALSE, result = "hide"} # define optimality gaps gap_values <- seq(0, 0.15, 0.1) # generate portfolio of prioritizations prt_alt <- lapply(gap_values, function(x) { rp %>% add_settings( nb = 10, optimality_gap = x, solution_name_prefix = paste0("opt_gap_", x, "__#") ) %>% solve(v, verbose = FALSE) }) %>% unlist(recursive = FALSE, use.names = FALSE) %>% rast() # preview portfolio print(prt_alt) ``` Although we generated a portfolio of different prioritizations, it's difficult to tell which exactly which prioritizations are different to each other. Although we could try viewing the prioritizations one by one, this isn't really practical. So, let's conduct a statistical analysis to visualize differences among the planning units -- based on which planning units they selected -- using a hierarchical cluster analysis [@r5]. ```{r, fig.height = 4, fig.width = 7, out.width = "90%"} # extract prioritization values prt_values <- as.data.frame(prt_alt == 3) # calculate pair-wise distances between different prioritizations for analysis prt_dists <- vegdist(t(prt_values), method = "jaccard", binary = TRUE) # run cluster analysis prt_clust <- hclust(as.dist(prt_dists), method = "average") # visualize clusters withr::with_par( list(oma = c(0, 0, 0, 0), mar = c(0, 4.1, 1.5, 2.1)), plot( prt_clust, labels = FALSE, sub = NA, xlab = "", main = "Cluster analysis of prioritizations in portfolio" ) ) ``` We can see that there are two main groups of prioritizations in the portfolio. To identify prioritizations within each of these groups, let's conduct a k-medoids cluster analysis to extract the most typical prioritization within each group. ```{r, fig.height = 8, fig.width = 7, out.width = "90%"} # run k-medoids analysis prt_med <- pam(prt_dists, k = 2) # extract names of prioritizations that are most central for each group prt_med_names <- prt_med$medoids print(prt_med_names) # create a new portfolio based on these prioritizations prt_cand <- prt_alt[[prt_med_names]] # visualize prioritizations in subset portfolio ## here, values indicate: ## 0 : places that were locked out. ## 1 : places that were potential candidates for restoration ## 2 : places that already contain existing habitat. ## 3 : places selected for restoration actions. plot( prt_cand, col = c("#E5E5E5", "#fff1d6", "#b2df8a", "#1f78b4"), plg = list(x = "topright"), nc = 1 ) ``` We can see that the first prioritization (shown on the top) is very similar to the optimal solution we generated previously. Indeed -- similar to the optimal prioritization -- this prioritization selected a set of planning units to link up the two patches of existing habitat in the southern region of the study area. However, the second prioritizations (shown on the bottom) selected a set of planning units in the central region of the study area. This prioritization identified a different set of planning units to link up other patches of existing habitat. Since both prioritizations are within 15% of optimality (per the greatest optimality gap used to generate the portfolio), they both have roughly the same performance. ## Evaluating trade-offs Conservation decision making fundamentally involves making trade-offs between multiple objectives. Here, the main trade-off is between (i) maximizing the level of connectivity that can be promoted by restoring habitats and (ii) minimizing the total cost of restoration actions. Specifically, we consider total cost to be reflected by the total area of the planning units selected for restoration. This trade-off can be particularly acute, wherein large gains in connectivity can be achieved through only a minor increase in cost. As such, it can be very useful to understand how the increasing the maximum amount of land available for restoration can influence the connectivity. To explore this, we will generate a portfolio of prioritizations using different constraints and measure their performance. Since we are interested in exploring trade-offs -- rather than alternate prioritizations -- we will specify an optimality gap of 0%. ```{r, message = FALSE, result = "hide"} # define a set of different maximum area values for generating prioritizations ## N.B. these values are in hectares max_restore_values <- seq(100, 450, 10) # build a baseline problem without restorable constraints bp <- restopt_problem( existing_habitat = habitat_data, aggregation_factor = 16, habitat_threshold = 0.7 ) %>% set_max_mesh_objective() %>% add_compactness_constraint(2.5, unit = "km") %>% add_locked_out_constraint(data = locked_out_data) %>% add_settings(optimality_gap = 0) # generate a portfolio of prioritizations to explore trade-offs prt_tradeoffs <- lapply(max_restore_values, function(x) { bp %>% add_restorable_constraint(min_restore = 1, max_restore = x, unit = "ha") %>% solve(verbose = FALSE) }) ```{r, fig.height = 9, fig.width = 9, out.width = "90%"} # extract metadata for each prioritization prt_tradeoffs_metedata <- do.call(rbind, lapply(prt_tradeoffs, get_metadata)) prt_tradeoffs_metedata$max_restore <- max_restore_values # print metadata print(prt_tradeoffs_metedata) # convert portfolio into a stack of rasters prt_tradeoffs <- rast(prt_tradeoffs) # preview portfolio print(prt_tradeoffs) # visualize some of the prioritizations in portfolio ## here, titles indicate maximum area and values indicate: ## 0 : places that were locked out. ## 1 : places that were potential candidates for restoration ## 2 : places that already contain existing habitat. ## 3 : places selected for restoration actions. plot_idx <- round(seq(1, length(max_restore_values), length.out = 9)) plot( prt_tradeoffs[[plot_idx]], main = paste("max_restore = ", max_restore_values[plot_idx]), col = c("#E5E5E5", "#fff1d6", "#b2df8a", "#1f78b4"), plg = list(x = "topright") ) ``` We can see that all of the prioritizations within this portfolio are similar to the first optimal prioritization that we generated at the start of this tutorial. Indeed, they all focus restoration efforts on connecting the two large patches of existing habitat in the southern region of the study area. They mainly differ in terms of the number of planning units selected for restoration, with prioritizations generated with a higher maximum amount of land (e.g., `max_restore = 450`) restoring a larger contiguous block of land between the two patches of existing habitat. To help understand this relationship better, let's create a plot. ```{r, fig.height = 5, fig.width = 6} # create plot showing relationship between restorable area and connectivity ## N.B. the prioritization corresponding to a maximum restorable area of 220 ## is shown in red, which is the same value used for previous prioritizations plot( mesh_best ~ max_restore_values, data = prt_tradeoffs_metedata, xlab = "Area restored (ha)", ylab = "Connectivity (mesh statistic)", pch = 16, col = ifelse(prt_tradeoffs_metedata$max_restore == 200, "red", "black") ) ``` We can see that there is a steady increase in connectivity (as measured by the effective mesh size metric) and the maximum amount of land that can be restored. This means that restoring more land can result in more connectivity. Interestingly, we also see that the level of connectivity reaches a plateau at approximately 350 ha of land. This is because -- when looking at `prt_tradeoffs_metedata` -- we can see that the largest amount of land restored in any prioritization is `r round(max(as.numeric(prt_tradeoffs_metedata$total_restorable)), 2)` ha. Although some of the prioritizations were generated using a higher maximum restoreable area constraint (e.g., 450 ha), the compactness and locked out constraints combined to prevent an larger amount of land from being selected for restoration. Thus we can see that multiple constraints can, in combination, affect the ability for restoration efforts to achieve conservation objectives. ## Conclusion Hopefully, this tutorial has provided a useful introduction to the _restoptr R_ package. For more information on using the package, please consult the package documentation which provides examples for each of the functions. If you have any questions about using the package, suggestions for improving it, or encounter any bugs, please [open an issue](https://github.com/dimitri-justeau/restoptr/issues/new/choose) on the online code repository. ## References