--- title: "dynamicSDM: Explanatory variable data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{dynamicSDM: Explanatory variable data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, include = FALSE} ``` ```{r setup} library(dynamicSDM) ``` ## Stage 2: Explanatory variable data In this tutorial, we will be extracting spatio-temporally buffered explanatory variables for each occurrence and pseudo-absence record. The *dynamicSDM* functions for extracting such variables require Google Earth Engine and Google Drive to be initialised. Fill in the code below with your Google account email, and run the code to check that *rgee* and *googledrive* have been correctly installed and authorised. ```{r check Google, eval=FALSE} library(rgee) rgee::ee_check() library(googledrive) googledrive::drive_user() # Set your user email here #user.email<-"your_google_email_here" ``` Note: You will need internet connection for this tutorial. Variable extraction may take some time depending on your internet connection strength. If you try out these functions and are excited to move onto the next tutorial, then don’t worry - you can read the extracted data into your R environment from the dynamicSDM package. ## Directory organisation We will be extracting data for three dynamic explanatory variables. Let’s first create new folders within the project directory to export extracted variable data to. ```{r create directories} project_directory <- file.path(file.path(tempdir(), "dynamicSDM_vignette")) dir.create(project_directory) variablenames<-c("eight_sum_prec","year_sum_prec","grass_crop_percentage") extraction_directories <- file.path(file.path(project_directory,"extraction")) dir.create(extraction_directories) extraction_directory_1 <- file.path(file.path(project_directory,variablenames[1])) dir.create(extraction_directory_1) extraction_directory_2 <- file.path(file.path(project_directory,variablenames[2])) dir.create(extraction_directory_2) extraction_directory_3 <- file.path(file.path(project_directory,variablenames[3])) dir.create(extraction_directory_3) ``` Now, the filtered occurrence and pseudo-absence record data frame generated in the first tutorial can be imported or read into your R environment from the *dynamicSDM* package. ```{r load data} # sample_filt_data<-read.csv(paste0(project_directory,"/filtered_quelea_occ.csv")) data(sample_filt_data) ``` ### a) Extract dynamic explanatory variables `extract_dynamic_coords()` extracts processed remote sensing data using the Google Earth Engine cloud servers. There are various arguments to this function to specify the explanatory variable including: • `datasetname`: the dataset’s Google Earth Engine catalogue name. • `bandname` : the band of interest with the dataset. • `temporal.res` : the temporal resolution (i.e. the number of days to calculate the variable over). • `temporal.direction`: temporal direction (days either prior or post each record’s date). • `spatial.res.metres`: spatial resolution (the resolution in metres to extract data at). • `GEE.math.fun` : the mathematical function to calculate across the period (e.g. mean, sum or standard deviation across the given period). #### Case study The distribution of our case study species, the red-billed quelea, is driven by precipitation levels. 
Run the code below to extract the total precipitation across the 8-week and 52-week periods prior to each occurrence record, from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) dataset available on [Google Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_DAILY).

For the 8-week precipitation extraction, we will use the "split" method to save the extracted data. Notice how each record's data are extracted and exported individually. If you specify `resume = TRUE`, then progress can be resumed if the internet connection is lost.

```{r example-extract_dynamic_coords week, eval=FALSE}
# 8-week total precipitation
extract_dynamic_coords(occ.data = sample_filt_data,
                       datasetname = "UCSB-CHG/CHIRPS/DAILY",
                       bandname = "precipitation",
                       spatial.res.metres = 5566,
                       GEE.math.fun = "sum",
                       temporal.direction = "prior",
                       temporal.res = 56,
                       save.method = "split",
                       varname = variablenames[1],
                       save.directory = extraction_directory_1)
```

For the 52-week precipitation extraction, we will use the "combined" method to save the extracted data. Here, all data are extracted and then exported as a single data frame. This approach writes fewer files, but it is more vulnerable to an internet connection outage because all progress is lost and cannot be resumed.

```{r example-extract_dynamic_coords annual, eval=FALSE}
# 52-week total precipitation
extract_dynamic_coords(occ.data = sample_filt_data,
                       datasetname = "UCSB-CHG/CHIRPS/DAILY",
                       bandname = "precipitation",
                       spatial.res.metres = 5566,
                       GEE.math.fun = "sum",
                       temporal.direction = "prior",
                       temporal.res = 364,
                       save.method = "combined",
                       varname = variablenames[2],
                       save.directory = extraction_directory_2)
```

### b) Extract spatially buffered explanatory variables

`extract_buffered_coords()` extracts explanatory variable data across a spatial buffer around each occurrence record's co-ordinates. These variables can be categorical or continuous, but if a temporal buffer is also used, only continuous data will work.

This function uses a "moving window matrix" that specifies the neighbourhood of cells (the spatial buffer area) surrounding each occurrence record's cell that are also included in the calculation. `get_moving_window()` generates the optimal moving window matrix size based upon a given spatial radius and the resolution of the remote-sensing data.

#### Case study

The distribution of the red-billed quelea is also driven by the availability of wild grass and cereal crop seeds. The code below extracts the total number of grassland or cereal cropland cells within a spatial buffer of each record, using the MODIS Annual Land Cover Type dataset in the [Google Earth Engine catalogue](https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1#bands).

First, however, we must generate the optimal moving window matrix for this calculation, based on the facts that quelea travel up to 10 km to access resources and that the data will be extracted at 0.05 degree resolution (the 500 m MODIS data aggregated by a factor of 12).
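For a rough intuition about where the window size comes from (an approximation only, assuming roughly 111.32 km per degree at the equator; `get_moving_window()` presumably performs this conversion more carefully, which is why it asks for `spatial.ext` when the resolution is given in degrees), each 0.05 degree cell spans about 5.6 km, so a 10 km foraging radius is covered by a window only a few cells across:

```{r window size intuition}
# Approximate width of one 0.05 degree cell in metres (assumes ~111.32 km per degree)
cell_size_m <- 0.05 * 111320
cell_size_m

# Approximate number of cells needed to span the 10 km radial distance
10000 / cell_size_m
```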
```{r example-get_moving_window}
matrix <- get_moving_window(radial.distance = 10000,
                            spatial.res.degrees = 0.05,
                            spatial.ext = c(-35, -6, 10, 40))
matrix
```

```{r example-extract_buffered_coords, eval=FALSE}
# Total grassland and cereal cropland cells in the surrounding area
extract_buffered_coords(occ.data = sample_filt_data,
                        datasetname = "MODIS/061/MCD12Q1",
                        bandname = "LC_Type5",
                        spatial.res.metres = 500,
                        GEE.math.fun = "sum",
                        moving.window.matrix = matrix,
                        user.email = user.email,
                        save.method = "split",
                        temporal.level = "year",
                        categories = c(6, 7),
                        agg.factor = 12,
                        varname = variablenames[3],
                        save.directory = extraction_directory_3)
```

### c) Combine explanatory variable data

Data for each explanatory variable have been saved across multiple directories and files. `extract_coords_combine()` combines the extracted explanatory variable data into a single data frame.

```{r combine extracted data, eval=FALSE}
complete.dataset <- extract_coords_combine(varnames = variablenames,
                                           local.directory = c(extraction_directory_1,
                                                               extraction_directory_2,
                                                               extraction_directory_3))
```

## Summary

At the end of this vignette, we have a complete data frame of filtered species occurrence and pseudo-absence records with their associated extracted dynamic variables. Let's save this to our project directory for use in the next tutorial!

```{r save extracted data, eval=FALSE}
# Set NA values to zero
complete.dataset[is.na(complete.dataset$grass_crop_percentage), "grass_crop_percentage"] <- 0

write.csv(complete.dataset,
          file = paste0(project_directory, "/extracted_quelea_occ.csv"))
```
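If you ran the extraction and combination chunks above, an optional check such as the one below (not evaluated here) confirms that a column was created for each entry of `variablenames` and counts any missing values that remain after the replacement step.

```{r check combined data, eval=FALSE}
# Optional check: one column per extracted variable, plus remaining NA counts
all(variablenames %in% colnames(complete.dataset))
colSums(is.na(complete.dataset[, variablenames]))
```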