Title: | Species Distribution Modeling and Ecological Niche Modeling |
---|---|
Description: | Implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with 'PROJ6' geodetic objects and coordinate reference systems. |
Authors: | Adam B. Smith [cre, aut] |
Maintainer: | Adam B. Smith <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.10 |
Built: | 2024-12-11 16:43:43 UTC |
Source: | CRAN |
Calculates metrics of "movement" of cell densities across a time series of rasters. Rasters could represent, for example, the probability of presence of a species through time. In this case, velocities would indicate rates and directions of range shift. The simplest metric is movement of the density-weighted centroid (i.e., range "center"), but many more are available to provide a nuanced indicator of velocity. See Details
for the types of metrics that can be calculated.
bioticVelocity( x, times = NULL, atTimes = NULL, elevation = NULL, metrics = c("centroid", "nsCentroid", "ewCentroid", "nCentroid", "sCentroid", "eCentroid", "wCentroid", "nsQuants", "ewQuants", "similarity", "summary"), quants = c(0.05, 0.1, 0.5, 0.9, 0.95), onlyInSharedCells = FALSE, cores = 1, warn = TRUE, longitude = NULL, latitude = NULL, paths = NULL, ... )
bioticVelocity( x, times = NULL, atTimes = NULL, elevation = NULL, metrics = c("centroid", "nsCentroid", "ewCentroid", "nCentroid", "sCentroid", "eCentroid", "wCentroid", "nsQuants", "ewQuants", "similarity", "summary"), quants = c(0.05, 0.1, 0.5, 0.9, 0.95), onlyInSharedCells = FALSE, cores = 1, warn = TRUE, longitude = NULL, latitude = NULL, paths = NULL, ... )
x |
Either a
|
times |
Numeric vector with the same number of layers in |
atTimes |
Numeric, values of |
elevation |
Either |
metrics |
Biotic velocity metrics to calculate (default is to calculate them all). All metrics ignore
|
quants |
Numeric vector indicating the quantiles at which biotic velocity is calculated for the " |
onlyInSharedCells |
If |
cores |
Positive integer. Number of processor cores to use. Note that if the number of time steps at which velocity is calculated is small, using more cores may not always be faster. If you have issues when |
warn |
Logical, if |
longitude |
Numeric matrix or
|
latitude |
Numeric matrix or
|
paths |
This is used internally and rarely (never?) needs to be defined by a user (i.e., leave it as |
... |
Other arguments (not used). |
Attention:
This function may yield erroneous velocities if the region of interest is near or spans a pole or the international date line. Results using the "Quant" and "quant" metrics may be somewhat counterintuitive if just one cell is >0, or one row or column has the same values with all other values equal to 0 or NA
because defining quantiles in these situations is not intuitive. Results may also be counterintuitive if some cells have negative values because they can "push" a centroid away from what would seem to be the center of mass as assessed by visual examination of a map.
Note:
For the nsQuants
and ewQuants
metrics it is assumed that the latitude/longitude assigned to a cell is at its exact center. For calculating the position of a quantile, density is interpolated linearly from one cell center to the center of the adjacent cell. If a desired quantile does not fall exactly on the cell center, it is calculated from the interpolated values. For quantiles that fall south/westward of the first row/column of cells, the cell border is assumed to be at 0.5 * cell length south/west of the cell center.
A data frame with biotic velocities and related values. Fields are as follows:
timeFrom
: Start time of interval
timeTo
: End time of interval
timeMid
: Time point between timeFrom
and timeTo
timeSpan
: Duration of interval
Depending on metrics
that are specified, additional fields are as follows. All measurements of velocity are in distance units (typically meters) per time unit (which is the same as the units used for times
and atTimes
). For example, if the rasters are in an Albers equal-area projection and times
are in years, then the output will be meters per year.
If metrics
has 'centroid'
: Columns named centroidVelocity
, centroidLong
, centroidLat
– Speed of weighted centroid, plus its longitude and latitude (in the timeTo
period of each time step). Values are always >= 0.
If metrics
has 'nsCentroid'
: Columns named nsCentroid
and nsCentroidLat
– Velocity of weighted centroid in north-south direction, plus its latitude (in the timeTo
period of each time step). Positive values connote movement north, and negative values south.
If metrics
has 'ewCentroid'
: ewCentroid
and ewCentroidLong
– Velocity of weighted centroid in east-west direction, plus its longitude (in the timeTo
period of each time step). Positive values connote movement east, and negative values west.
If metrics
has 'nCentroid'
, 'sCentroid'
, 'eCentroid'
, and/or 'wCentroid'
: Columns named nCentroidVelocity
and nCentroidAbund
, sCentroid
and sCentroidAbund
, eCentroid
and eCentroidAbund
, and/or wCentroid
and wCentroidAbund
– Speed of weighted centroid of all cells that fall north, south, east, or west of the landscape-wide centroid, plus a column indicating the total weight (abundance) of all such populations. Values are always >= 0.
If metrics
contains any of nsQuants
or ewQuants
: Columns named nsQuantVelocity_quant
Q and nsQuantLat_quant
Q, or ewQuantVelocity_quant
Q and ewQuantLat_quant
Q: Velocity of the Qth quantile weight in the north-south or east-west directions, plus the latitude or longitude thereof (in the timeTo
period of each time step). Quantiles are cumulated starting from the south or the west, so the 0.05th quantile, for example, is in the far south or west of the range and the 0.95th in the far north or east. Positive values connote movement north or east, and negative values movement south or west.
If metrics
contains similarity
, metrics of similarity are calculated for each pair of successive landscapes, defined below as x1
(raster in timeFrom
) and x2
(raster in timeTo
), with the number of shared non-NA
cells between them being n
:
A column named simpleMeanDiff
: sum(x2 - x1, na.rm = TRUE) / n
A column named meanAbsDiff
: sum(abs(x2 - x1), na.rm = TRUE) / n
A column named rmsd
(root-mean square difference): sqrt(sum((x2 - x1)^2, na.rm = TRUE)) / n
A column named godsoeEsp
: 1 - sum(2 * (x1 * x2), na.rm=TRUE) / sum(x1 + x2, na.rm = TRUE)
, values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
A column named schoenersD
: 1 - (sum(abs(x1 - x2), na.rm = TRUE) / n)
, values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
A column named warrensI
: 1 - sqrt(sum((sqrt(x1) - sqrt(x2))^2, na.rm = TRUE) / n)
, values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
A column named cor
: Pearson correlation between x1
and x2
.
A column named rankCor
: Spearman rank correlation between x1
and x2
.
If metrics
contains elevCentroid
: Columns named elevCentroidVelocity
and elevCentroidElev
– Velocity of the centroid in elevation (up or down) and the elevation in the "to" timestep. Positive values of velocity connote movement upward, and negative values downward.
If metrics
contains elevQuants
: Columns named elevQuantVelocity_quant
Q and elevQuantVelocityElev_quant
Q – Velocity of the Nth quantile of mass in elevation (up or down) and the elevation of this quantile in the "to" timestep. Positive values of velocity connote movement upward, and negative values downward.
If metrics
contains summary
:
A column named propSharedCellsNotNA
: Proportion of cells that are not NA
in both the "from" and "to" time steps. The proportion is calculated using the total number of cells in a raster as the denominator (i.e., not total number of cells across two rasters).
Columns named timeFromPropNotNA
and timeToPropNotNA
: Proportion of cells in the "from" time and "to" steps that are not NA
.
A column named mean
: Mean weight in timeTo
time step. In the same units as the values of the cells.
Columns named quantile_quant
Q: The Qth quantile(s) of weight in the timeTo
time step. In the same units as the values of the cells.
A column named prevalence
: Proportion of non-NA
cells with weight >0 in the timeTo
time step relative to all non-NA
cells. Unitless.
# NB These examples can take a few minutes to run. # To illustrate calculation and interpretation of biotic velocity, # we will calibrate a SDM for the Red-Bellied Lemur and project # the model to the present and successive future climates. The time series # of rasters is then used to calculate biotic velocity. library(sf) library(terra) ### process environmental rasters ################################# # get rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) rastFile <- system.file('extdata/madClim2030.tif', package='enmSdmX') madClim2030 <- rast(rastFile) rastFile <- system.file('extdata/madClim2050.tif', package='enmSdmX') madClim2050 <- rast(rastFile) rastFile <- system.file('extdata/madClim2070.tif', package='enmSdmX') madClim2070 <- rast(rastFile) rastFile <- system.file('extdata/madClim2090.tif', package='enmSdmX') madClim2090 <- rast(rastFile) # The bioticVelocity() function needs rasters to be in equal-area # projection, so we will project them here. madAlbers <- getCRS('madAlbers') # Albers projection for Madagascar madClim <- project(madClim, madAlbers) madClim2030 <- project(madClim2030, madAlbers) madClim2050 <- project(madClim2050, madAlbers) madClim2070 <- project(madClim2070, madAlbers) madClim2090 <- project(madClim2090, madAlbers) # Coordinate reference systems: wgs84 <- getCRS('WGS84') # WGS84 madAlbers <- getCRS('madAlbers') # Madagascar Albers # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- project(occs, madAlbers) # eliminate cell duplicates occs <- elimCellDuplicates(occs, madClim) # extract environment at occurrences occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) ### calibrate model ################### predictors <- c('bio1', 'bio12') # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast cores = 2 ) ### project to present and future climate ######################################### predPresent <- predictEnmSdm(mx, madClim) pred2030 <- predictEnmSdm(mx, madClim2030) pred2050 <- predictEnmSdm(mx, madClim2050) pred2070 <- predictEnmSdm(mx, madClim2070) pred2090 <- predictEnmSdm(mx, madClim2090) plot(predPresent, main = 'Present Suitability') # plot change in suitability between present and 2090s delta <- pred2090 - predPresent plot(delta, main = 'Change in Suitability') ### calculate biotic velocity ############################# series <- c( predPresent, pred2030, pred2050, pred2070, pred2090 ) names(series) <- c('present', 't2030', 't2050', 't2070', 't2090') plot(series) times <- c(1985, 2030, 2050, 2070, 2090) quants <- c(0.10, 0.90) bv <- bioticVelocity( x = series, times = times, quants = quants, cores = 2 ) bv ### centroid velocities # centroid (will always be >= 0) # fastest centroid movement around 2060 plot(bv$timeMid, bv$centroidVelocity, type = 'l', xlab = 'Year', ylab = 'Speed (m / y)', main = 'Centroid Speed') # velocity northward/southward through time # shows northward shift because always positive, fastest around 2060 plot(bv$timeMid, bv$nsCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid N/S Velocity') # velocity eastward (positive)/westward (negative) through time # movement eastward (positive) first, then westward (negative) plot(bv$timeMid, bv$ewCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid E/W Velocity') ### map of centroid location through time # shows centroid moves slightly northward through time plot(delta, main = 'Centroid Location &\nChange in Suitability') points(bv$centroidLong[1], bv$centroidLat[1], pch = 1) points(bv$centroidLong[4], bv$centroidLat[4], pch = 16) lines(bv$centroidLong, bv$centroidLat) legend( 'bottomright', legend = c( 'start (~1985)', 'stop (~2090)', 'trajectory' ), pch = c(1, 16, NA), lwd = c(NA, NA, 1) ) ### velocities of portions of range north/south/east/west of centroid # positive ==> northward shift # negative ==> southward shift # portion of range north of centroid # shows northward expansion because always positive plot(bv$timeMid, bv$nCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Northern Part of Range') # portion of range south of centroid # shows northward contraction because always positive plot(bv$timeMid, bv$sCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Southern Part of Range') # portion of range east of centroid # shows eastern portion moves farther east plot(bv$timeMid, bv$eCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Eastern Part of Range') # portion of range west of centroid # shows western portion moves east plot(bv$timeMid, bv$wCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Western Part of Range') ### velocities of range margins # from south to north, 10th and 90th quantiles of density # positive ==> northward shift # negative ==> southward shift # shows both northern and southern range margins shift northward # because always positive... northern margin shift usually slower ylim <- range(bv$nsQuantVelocity_quant0p1, bv$nsQuantVelocity_quant0p9) plot(bv$timeMid, bv$nsQuantVelocity_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Northern/Southern Range Margins') lines(bv$timeMid, bv$nsQuantVelocity_quant0p9, lty = 'dashed') legend( 'bottomright', legend = c('Southern Margin', 'Northern Margin'), lty = c('solid', 'dashed') ) # from east to west, 10th and 90th quantiles of density # positive ==> eastward shift # negative ==> westward shift ylim <- range(bv$ewQuantVelocity_quant0p1, bv$ewQuantVelocity_quant0p9) plot(bv$timeMid, bv$ewQuantVelocity_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Eastern/Western Range Margins') lines(bv$timeMid, bv$ewQuantVelocity_quant0p9, lty = 'dashed') legend( 'bottomright', legend = c('Eastern Margin', 'Western Margin'), lty = c('solid', 'dashed') ) ### summary statistics # mean density across cells through time plot(bv$timeMid, bv$mean, type = 'l', xlab = 'Year', ylab = 'Mean Density', main = 'Mean Density') # sum of density across cells through time plot(bv$timeMid, bv$sum, type = 'l', xlab = 'Year', ylab = 'Sum of Density', main = 'Sum of Density') ### change metrics # average change in suitability from one time period to next # shows average conditions getting worse plot(bv$timeMid, bv$simpleMeanDiff, type = 'l', xlab = 'Year', ylab = 'Mean Change in Suitability') # average absolute change in suitability from one time period to next # shows average absolute change declining plot(bv$timeMid, bv$meanAbsDiff, type = 'l', xlab = 'Year', ylab = 'Mean Absolute Change in Suitability') # root-mean square difference from one time period to the next # shows difference between successive rasters declines through time plot(bv$timeMid, bv$rmsd, type = 'l', xlab = 'Year', ylab = 'RMSD') ### raster similarity # most indicate that successive rasters are similar through time ylim <- range(bv$godsoeEsp, bv$schoenerD, bv$warrenI, bv$cor, bv$warrenI) plot(bv$timeMid, bv$godsoeEsp, type = 'l', lty = 1, col = 1, xlab = 'Year', ylab = 'Raster similarity', ylim = ylim) lines(bv$timeMid, bv$schoenerD, lty = 2, col = 2) lines(bv$timeMid, bv$warrenI, lty = 3, col = 3) lines(bv$timeMid, bv$cor, lty = 4, col = 4) lines(bv$timeMid, bv$rankCor, lty = 5, col = 5) legend( 'right', legend = c( 'Godsoe\'s ESP', 'Schoener\'s D', 'Warren\'s I', 'Correlation', 'Rank Correlation' ), col = 1:5, lty = 1:5 ) # values of 10th and 90th quantiles across cells through time # shows most favorable cells becoming less favorable # least favorable cells remain mainly unchanged ylim <- range(bv$quantile_quant0p1, bv$quantile_quant0p9) plot(bv$timeMid, bv$quantile_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Quantile Value', main = 'Quantiles across Cells') lines(bv$timeMid, bv$quantile_quant0p9, lty = 'dashed') legend( 'topright', legend = c('10th quantile', '90th quantile'), lty = c('solid', 'dashed') ) ### map of northern/southern range margins through time # range of longitude shown in plot madExtent <- ext(madClim) xExtent <- as.vector(madExtent)[1:2] plot(predPresent, main = 'North/South Range Margin Location') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[1], bv$nsQuantLat_quant0p9[1])) lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[2], bv$nsQuantLat_quant0p9[2]), lty = 'dashed') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[3], bv$nsQuantLat_quant0p9[3]), lty = 'dotdash') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[4], bv$nsQuantLat_quant0p9[4]), lty = 'dotted') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[1], bv$nsQuantLat_quant0p1[1])) lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[2], bv$nsQuantLat_quant0p1[2]), lty = 'dashed') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[3], bv$nsQuantLat_quant0p1[3]), lty = 'dotdash') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[4], bv$nsQuantLat_quant0p1[4]), lty = 'dotted') legend( 'bottomright', legend = c( '1980s', '2030s', '2050s', '2070s', '2090s' ), lty = c('solid', 'dashed', 'dotdash', 'dotted') ) ### map of eastern/western range margins through time # range of longitude shown in plot madExtent <- ext(madClim) yExtent <- as.vector(madExtent)[3:4] plot(predPresent, main = 'North/South Range Margin Location') lines(c(bv$ewQuantLong_quant0p9[1], bv$ewQuantLong_quant0p9[1]), c(yExtent[1], yExtent[2])) lines(c(bv$ewQuantLong_quant0p9[2], bv$ewQuantLong_quant0p9[2]), c(yExtent[1], yExtent[2]), lty = 'dashed') lines(c(bv$ewQuantLong_quant0p9[3], bv$ewQuantLong_quant0p9[3]), c(yExtent[1], yExtent[2]), lty = 'dotdash') lines(c(bv$ewQuantLong_quant0p9[4], bv$ewQuantLong_quant0p9[4]), c(yExtent[1], yExtent[2]), lty = 'dotted') lines(c(bv$ewQuantLong_quant0p1[1], bv$ewQuantLong_quant0p1[1]), c(yExtent[1], yExtent[2])) lines(c(bv$ewQuantLong_quant0p1[2], bv$ewQuantLong_quant0p1[2]), c(yExtent[1], yExtent[2]), lty = 'dashed') lines(c(bv$ewQuantLong_quant0p1[3], bv$ewQuantLong_quant0p1[3]), c(yExtent[1], yExtent[2]), lty = 'dotdash') lines(c(bv$ewQuantLong_quant0p1[4], bv$ewQuantLong_quant0p1[4]), c(yExtent[1], yExtent[2]), lty = 'dotted') legend( 'bottomright', legend = c( '1980s', '2030s', '2050s', '2070s', '2090s' ), lty = c('solid', 'dashed', 'dotdash', 'dotted') )
# NB These examples can take a few minutes to run. # To illustrate calculation and interpretation of biotic velocity, # we will calibrate a SDM for the Red-Bellied Lemur and project # the model to the present and successive future climates. The time series # of rasters is then used to calculate biotic velocity. library(sf) library(terra) ### process environmental rasters ################################# # get rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) rastFile <- system.file('extdata/madClim2030.tif', package='enmSdmX') madClim2030 <- rast(rastFile) rastFile <- system.file('extdata/madClim2050.tif', package='enmSdmX') madClim2050 <- rast(rastFile) rastFile <- system.file('extdata/madClim2070.tif', package='enmSdmX') madClim2070 <- rast(rastFile) rastFile <- system.file('extdata/madClim2090.tif', package='enmSdmX') madClim2090 <- rast(rastFile) # The bioticVelocity() function needs rasters to be in equal-area # projection, so we will project them here. madAlbers <- getCRS('madAlbers') # Albers projection for Madagascar madClim <- project(madClim, madAlbers) madClim2030 <- project(madClim2030, madAlbers) madClim2050 <- project(madClim2050, madAlbers) madClim2070 <- project(madClim2070, madAlbers) madClim2090 <- project(madClim2090, madAlbers) # Coordinate reference systems: wgs84 <- getCRS('WGS84') # WGS84 madAlbers <- getCRS('madAlbers') # Madagascar Albers # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- project(occs, madAlbers) # eliminate cell duplicates occs <- elimCellDuplicates(occs, madClim) # extract environment at occurrences occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) ### calibrate model ################### predictors <- c('bio1', 'bio12') # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast cores = 2 ) ### project to present and future climate ######################################### predPresent <- predictEnmSdm(mx, madClim) pred2030 <- predictEnmSdm(mx, madClim2030) pred2050 <- predictEnmSdm(mx, madClim2050) pred2070 <- predictEnmSdm(mx, madClim2070) pred2090 <- predictEnmSdm(mx, madClim2090) plot(predPresent, main = 'Present Suitability') # plot change in suitability between present and 2090s delta <- pred2090 - predPresent plot(delta, main = 'Change in Suitability') ### calculate biotic velocity ############################# series <- c( predPresent, pred2030, pred2050, pred2070, pred2090 ) names(series) <- c('present', 't2030', 't2050', 't2070', 't2090') plot(series) times <- c(1985, 2030, 2050, 2070, 2090) quants <- c(0.10, 0.90) bv <- bioticVelocity( x = series, times = times, quants = quants, cores = 2 ) bv ### centroid velocities # centroid (will always be >= 0) # fastest centroid movement around 2060 plot(bv$timeMid, bv$centroidVelocity, type = 'l', xlab = 'Year', ylab = 'Speed (m / y)', main = 'Centroid Speed') # velocity northward/southward through time # shows northward shift because always positive, fastest around 2060 plot(bv$timeMid, bv$nsCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid N/S Velocity') # velocity eastward (positive)/westward (negative) through time # movement eastward (positive) first, then westward (negative) plot(bv$timeMid, bv$ewCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid E/W Velocity') ### map of centroid location through time # shows centroid moves slightly northward through time plot(delta, main = 'Centroid Location &\nChange in Suitability') points(bv$centroidLong[1], bv$centroidLat[1], pch = 1) points(bv$centroidLong[4], bv$centroidLat[4], pch = 16) lines(bv$centroidLong, bv$centroidLat) legend( 'bottomright', legend = c( 'start (~1985)', 'stop (~2090)', 'trajectory' ), pch = c(1, 16, NA), lwd = c(NA, NA, 1) ) ### velocities of portions of range north/south/east/west of centroid # positive ==> northward shift # negative ==> southward shift # portion of range north of centroid # shows northward expansion because always positive plot(bv$timeMid, bv$nCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Northern Part of Range') # portion of range south of centroid # shows northward contraction because always positive plot(bv$timeMid, bv$sCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Southern Part of Range') # portion of range east of centroid # shows eastern portion moves farther east plot(bv$timeMid, bv$eCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Eastern Part of Range') # portion of range west of centroid # shows western portion moves east plot(bv$timeMid, bv$wCentroidVelocity, type = 'l', xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Western Part of Range') ### velocities of range margins # from south to north, 10th and 90th quantiles of density # positive ==> northward shift # negative ==> southward shift # shows both northern and southern range margins shift northward # because always positive... northern margin shift usually slower ylim <- range(bv$nsQuantVelocity_quant0p1, bv$nsQuantVelocity_quant0p9) plot(bv$timeMid, bv$nsQuantVelocity_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Northern/Southern Range Margins') lines(bv$timeMid, bv$nsQuantVelocity_quant0p9, lty = 'dashed') legend( 'bottomright', legend = c('Southern Margin', 'Northern Margin'), lty = c('solid', 'dashed') ) # from east to west, 10th and 90th quantiles of density # positive ==> eastward shift # negative ==> westward shift ylim <- range(bv$ewQuantVelocity_quant0p1, bv$ewQuantVelocity_quant0p9) plot(bv$timeMid, bv$ewQuantVelocity_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Eastern/Western Range Margins') lines(bv$timeMid, bv$ewQuantVelocity_quant0p9, lty = 'dashed') legend( 'bottomright', legend = c('Eastern Margin', 'Western Margin'), lty = c('solid', 'dashed') ) ### summary statistics # mean density across cells through time plot(bv$timeMid, bv$mean, type = 'l', xlab = 'Year', ylab = 'Mean Density', main = 'Mean Density') # sum of density across cells through time plot(bv$timeMid, bv$sum, type = 'l', xlab = 'Year', ylab = 'Sum of Density', main = 'Sum of Density') ### change metrics # average change in suitability from one time period to next # shows average conditions getting worse plot(bv$timeMid, bv$simpleMeanDiff, type = 'l', xlab = 'Year', ylab = 'Mean Change in Suitability') # average absolute change in suitability from one time period to next # shows average absolute change declining plot(bv$timeMid, bv$meanAbsDiff, type = 'l', xlab = 'Year', ylab = 'Mean Absolute Change in Suitability') # root-mean square difference from one time period to the next # shows difference between successive rasters declines through time plot(bv$timeMid, bv$rmsd, type = 'l', xlab = 'Year', ylab = 'RMSD') ### raster similarity # most indicate that successive rasters are similar through time ylim <- range(bv$godsoeEsp, bv$schoenerD, bv$warrenI, bv$cor, bv$warrenI) plot(bv$timeMid, bv$godsoeEsp, type = 'l', lty = 1, col = 1, xlab = 'Year', ylab = 'Raster similarity', ylim = ylim) lines(bv$timeMid, bv$schoenerD, lty = 2, col = 2) lines(bv$timeMid, bv$warrenI, lty = 3, col = 3) lines(bv$timeMid, bv$cor, lty = 4, col = 4) lines(bv$timeMid, bv$rankCor, lty = 5, col = 5) legend( 'right', legend = c( 'Godsoe\'s ESP', 'Schoener\'s D', 'Warren\'s I', 'Correlation', 'Rank Correlation' ), col = 1:5, lty = 1:5 ) # values of 10th and 90th quantiles across cells through time # shows most favorable cells becoming less favorable # least favorable cells remain mainly unchanged ylim <- range(bv$quantile_quant0p1, bv$quantile_quant0p9) plot(bv$timeMid, bv$quantile_quant0p1, type = 'l', ylim = ylim, xlab = 'Year', ylab = 'Quantile Value', main = 'Quantiles across Cells') lines(bv$timeMid, bv$quantile_quant0p9, lty = 'dashed') legend( 'topright', legend = c('10th quantile', '90th quantile'), lty = c('solid', 'dashed') ) ### map of northern/southern range margins through time # range of longitude shown in plot madExtent <- ext(madClim) xExtent <- as.vector(madExtent)[1:2] plot(predPresent, main = 'North/South Range Margin Location') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[1], bv$nsQuantLat_quant0p9[1])) lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[2], bv$nsQuantLat_quant0p9[2]), lty = 'dashed') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[3], bv$nsQuantLat_quant0p9[3]), lty = 'dotdash') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p9[4], bv$nsQuantLat_quant0p9[4]), lty = 'dotted') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[1], bv$nsQuantLat_quant0p1[1])) lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[2], bv$nsQuantLat_quant0p1[2]), lty = 'dashed') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[3], bv$nsQuantLat_quant0p1[3]), lty = 'dotdash') lines(c(xExtent[1], xExtent[2]), c(bv$nsQuantLat_quant0p1[4], bv$nsQuantLat_quant0p1[4]), lty = 'dotted') legend( 'bottomright', legend = c( '1980s', '2030s', '2050s', '2070s', '2090s' ), lty = c('solid', 'dashed', 'dotdash', 'dotted') ) ### map of eastern/western range margins through time # range of longitude shown in plot madExtent <- ext(madClim) yExtent <- as.vector(madExtent)[3:4] plot(predPresent, main = 'North/South Range Margin Location') lines(c(bv$ewQuantLong_quant0p9[1], bv$ewQuantLong_quant0p9[1]), c(yExtent[1], yExtent[2])) lines(c(bv$ewQuantLong_quant0p9[2], bv$ewQuantLong_quant0p9[2]), c(yExtent[1], yExtent[2]), lty = 'dashed') lines(c(bv$ewQuantLong_quant0p9[3], bv$ewQuantLong_quant0p9[3]), c(yExtent[1], yExtent[2]), lty = 'dotdash') lines(c(bv$ewQuantLong_quant0p9[4], bv$ewQuantLong_quant0p9[4]), c(yExtent[1], yExtent[2]), lty = 'dotted') lines(c(bv$ewQuantLong_quant0p1[1], bv$ewQuantLong_quant0p1[1]), c(yExtent[1], yExtent[2])) lines(c(bv$ewQuantLong_quant0p1[2], bv$ewQuantLong_quant0p1[2]), c(yExtent[1], yExtent[2]), lty = 'dashed') lines(c(bv$ewQuantLong_quant0p1[3], bv$ewQuantLong_quant0p1[3]), c(yExtent[1], yExtent[2]), lty = 'dotdash') lines(c(bv$ewQuantLong_quant0p1[4], bv$ewQuantLong_quant0p1[4]), c(yExtent[1], yExtent[2]), lty = 'dotted') legend( 'bottomright', legend = c( '1980s', '2030s', '2050s', '2070s', '2090s' ), lty = c('solid', 'dashed', 'dotdash', 'dotted') )
This SpatVector
represents the outline of Canada in WGS84 (unprojected) coordinates. This is the "low resolution" (less accurate) version from GADM.
An object of class 'SpatVector'
.
Database of Global Administrative Areas (GADM)
library(terra) canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX') canada <- vect(canFile) plot(canada)
library(terra) canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX') canada <- vect(canFile) plot(canada)
This function calculates a suite of metrics reflecting of niche overlap for two response curves. Response curves are predicted responses of a uni- or multivariate model along a single variable. Depending on the user-specified settings the function calculates these values either at each pair of values of pred1
and pred2
or along a smoothed version of pred1
and pred2
.
compareResponse( pred1, pred2, data, predictor = names(data), adjust = FALSE, gap = Inf, smooth = FALSE, smoothN = 1000, smoothRange = c(0, 1), graph = FALSE, ... )
compareResponse( pred1, pred2, data, predictor = names(data), adjust = FALSE, gap = Inf, smooth = FALSE, smoothN = 1000, smoothRange = c(0, 1), graph = FALSE, ... )
pred1 |
Numeric vector. Predictions from first model along |
pred2 |
Numeric vector. Predictions from second model along |
data |
Data frame or matrix corresponding to |
predictor |
Character vector. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in |
adjust |
Logical. If |
gap |
Numeric >0. Proportion of range of predictor variable across which to assume a gap exists. Calculation of |
smooth |
Logical. If |
smoothN |
|
smoothRange |
2-element numeric vector or |
graph |
Logical. If |
... |
Arguments to pass to functions like |
Either a data frame (if smooth = FALSE
or a list object with the smooth model plus a data frame (if smooth = TRUE
) . The data frame represents metrics comparing response curves of pred1
and pred2
:
predictor
Predictor for which comparison was made
n
Number of values of predictor at which comparison was calculated
adjust
adjust
argument.
smooth
smooth
argument.
meanDiff
Mean difference between predictions of pred1
and pred2
(higher ==> more different).
meanAbsDiff
Mean absolute value of difference between predictions of pred1
and pred2
(higher ==> more different).
areaAbsDiff
Sum of the area between curves predicted by pred1
and pred2
, standardized by total potential area between the two curves (i.e., the area available between the minimum and maximum prediction along the minimum and maximum values of the predictor) (higher ==> more different).
d
Schoener's D
i
Hellinger's I (adjusted to have a range [0, 1])
esp
Godsoe's ESP
cor
Pearson correlation between predictions of pred1
and pred2
.
rankCor
Spearman rank correlation between predictions of pred1
and pred2
.
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution 62:2868-2883.
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Erratum. Evolution 62:2868-2883.
Godsoe, W. 2014. Inferring the similarity of species distributions using Species Distribution Models. Ecography 37:130-136.
set.seed(123) data <- data.frame( x1=seq(-1, 1, length.out=100), x2=seq(-1, 1, length.out=100) + rnorm(100, 0, 0.3) ) pred1 <- 1 / (1 + exp(-(0.3 + 2 * (data$x1 - 0.2) -0.3 * data$x2))) pred2 <- 1 / (1 + exp(-(-0 + 0.1 * data$x1 - 4 * data$x1^2 + 0.4 * data$x2))) compareResponse(pred1, pred2, data, graph=TRUE) compareResponse(pred1, pred2, data, smooth=TRUE, graph=TRUE) compareResponse(pred1, pred2, data, adjust=TRUE, graph=TRUE)
set.seed(123) data <- data.frame( x1=seq(-1, 1, length.out=100), x2=seq(-1, 1, length.out=100) + rnorm(100, 0, 0.3) ) pred1 <- 1 / (1 + exp(-(0.3 + 2 * (data$x1 - 0.2) -0.3 * data$x2))) pred2 <- 1 / (1 + exp(-(-0 + 0.1 * data$x1 - 4 * data$x1^2 + 0.4 * data$x2))) compareResponse(pred1, pred2, data, graph=TRUE) compareResponse(pred1, pred2, data, smooth=TRUE, graph=TRUE) compareResponse(pred1, pred2, data, adjust=TRUE, graph=TRUE)
This function calculates the imprecision of geographic coordinates due to rounded coordinate values. See Details for an explanation of how this is calculated.
coordImprecision(x, dms = FALSE, epsilon = 2)
coordImprecision(x, dms = FALSE, epsilon = 2)
x |
Spatial points represented as a |
dms |
Logical, if |
epsilon |
Zero or positive integer, number of digits to which to round seconds value if |
For coordinates originally reported in decimal notation, coordinate imprecision is half the distance between the two opposing corners on a bounding box whose size is based on the number of significant digits in the coordinates. This box is defined by 1) finding the maximum number of significant digits after the decimal in the longitude/latitude pair; 2) adding/subtracting 5 to the decimal place that falls just after this; and 3) calculating the distance between these points then dividing by 2. For example, if longitude is 82.37 and latitude 45.8 then the number of significant digits after the decimal place is 2 and 1, respectively so 2 is used on the assumption that latitude is measured to the nearest 100th degree. The precision is then the distance between the point pairs (82.37 - 0.05 = 82.365, 45.8 - 0.05 = 45.795) and (82.37 + 0.05 = 82.375, 45.8 + 0.05 = 45.805).
For coordinates originally reported in degree-minus-second (DMS) format, the bounding box is defined by adding/subtracting 0.5 units (degrees, minutes, or seconds, depending on the smallest non-zero unit reported) from the coordinate. For example, if longitude is 90deg 00min 00sec and latitude is 37deg 37min 37sec, then the bounding box will be defined by adding/subtracting 0.5 arcsec to the coordinates.
Numeric values (by default in units of meters).
# coarse-precision cases long <- c(45, 45.1, 45.1) lat <- c(45, 45.1, 45) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # fine-precision cases long <- rep(45, 8) lat <- c(45, 45.1, 45.11, 45.111, 45.1111, 45.11111, 45.111111, 45.1111111) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # precision varies with latitude long <- rep(45, 181) lat <- seq(-90, 90) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) plot(lat, precision_m / 1000, xlab='Latitude', ylab='Precision (km)') # dateline/polar cases long <- c(0, 180, 45, 45) lat <- c(45, 45, 90, -90) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # original coordinates in degrees-minutes-seconds format longDD <- c(90, 90, 90, 90, 90, 90) longMM <- c(0, 0, 0, 11, 11, 0) longSS <- c(0, 0, 0, 0, 52, 52) latDD <- c(38, 38, 38, 38, 38, 38) latMM <- c(0, 37, 37, 37, 37, 0) latSS <- c(0, 0, 38, 38, 38, 0) longHemis <- rep('W', 6) latHemis <- rep('N', 6) longDec <- dmsToDecimal(longDD, longMM, longSS, longHemis) latDec <- dmsToDecimal(latDD, latMM, latSS, latHemis) decimal <- cbind(longDec, latDec) (decImp <- coordImprecision(decimal)) (dmsImp <- coordImprecision(decimal, dms=TRUE)) # What if we do not know if coordinates were originally reported in # decimal or degrees-minutes-seconds format? Most conservative option # is to use maximum: pmax(decImp, dmsImp) if (FALSE) { # known error when longitude is negative and latitude is -90 long <- -45 lat <- -90 ll <- cbind(long, lat) coordImprecision(ll) }
# coarse-precision cases long <- c(45, 45.1, 45.1) lat <- c(45, 45.1, 45) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # fine-precision cases long <- rep(45, 8) lat <- c(45, 45.1, 45.11, 45.111, 45.1111, 45.11111, 45.111111, 45.1111111) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # precision varies with latitude long <- rep(45, 181) lat <- seq(-90, 90) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) plot(lat, precision_m / 1000, xlab='Latitude', ylab='Precision (km)') # dateline/polar cases long <- c(0, 180, 45, 45) lat <- c(45, 45, 90, -90) ll <- cbind(long, lat) precision_m <- coordImprecision(ll) cbind(ll, precision_m) # original coordinates in degrees-minutes-seconds format longDD <- c(90, 90, 90, 90, 90, 90) longMM <- c(0, 0, 0, 11, 11, 0) longSS <- c(0, 0, 0, 0, 52, 52) latDD <- c(38, 38, 38, 38, 38, 38) latMM <- c(0, 37, 37, 37, 37, 0) latSS <- c(0, 0, 38, 38, 38, 0) longHemis <- rep('W', 6) latHemis <- rep('N', 6) longDec <- dmsToDecimal(longDD, longMM, longSS, longHemis) latDec <- dmsToDecimal(latDD, latMM, latSS, latHemis) decimal <- cbind(longDec, latDec) (decImp <- coordImprecision(decimal)) (dmsImp <- coordImprecision(decimal, dms=TRUE)) # What if we do not know if coordinates were originally reported in # decimal or degrees-minutes-seconds format? Most conservative option # is to use maximum: pmax(decImp, dmsImp) if (FALSE) { # known error when longitude is negative and latitude is -90 long <- -45 lat <- -90 ll <- cbind(long, lat) coordImprecision(ll) }
Returns the number of points in a sf
or SpatVector
object. This is typically done using either length(x)
or nrow(x)
, depending on whether the object in question has rows or not. This function helps in ambiguous cases, so users need not care if nrow
or length
is needed.
countPoints(x, byFeature = FALSE)
countPoints(x, byFeature = FALSE)
x |
Object of class |
byFeature |
If |
Numeric.
library(sf) # lemur occurrence data data(lemurs) wgs84 <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84) countPoints(occs)
library(sf) # lemur occurrence data data(lemurs) wgs84 <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84) countPoints(occs)
A table of commonly-used coordinate reference systems, their nicknames, and WKT2 (well-known text) strings
data(crss)
data(crss)
An object of class data.frame
. This is a table with "named" coordinate referenbce systems and their well-known-text (WKT2) representation. It can be used as-is, or with getCRS
to quickly get a WKT for a particular CRS. The fields are as:
long
: "Long" name of the CRS
short1
and short2
: "Short" names of the CRS
region
: Region for which CRS is fit
projected
: Is the CRS projected or not?
projectionGeometry
: Type of projection (NA
, 'cylindrical', 'conic', or 'planar')
datum
: Datum
type
: Either 'CRS' or 'data'. The former are proper CRSs, and the latter are those used by popular datasets.
wkt2
: WKT2 string.
notes
: Notes.
data(crss) getCRS('North America Albers', nice = TRUE)
data(crss) getCRS('North America Albers', nice = TRUE)
These functions take as input either a spatial object or coordinate pair and a custom WKT2 (well-known text) coordinate reference system string centered on the object or coordinate. Projections include:
Albers conic equal-area
Lambert azimuthal equal-area
Vertical near-side (i.e., as the world appears from geosynchronous orbit)
Please note that these are NOT standard projections, so do not have an EPSG or like code.
customAlbers(x) customLambert(x) customVNS(x, alt = 35800)
customAlbers(x) customLambert(x) customVNS(x, alt = 35800)
x |
Either an object of class |
alt |
Altitude in meters of the viewpoint in km. The default (35800 km) is geosynchronous orbit. |
A WKT2 (well-known text) string.
customLambert()
: Custom coordinate reference system WKT2 string
customVNS()
: Custom coordinate reference system WKT2 string
getCRS
, customAlbers
, customLambert
, customVNS
library(sf) ### Madagascar data(mad0) alb <- customAlbers(mad0) lamb <- customLambert(mad0) vert <- customVNS(mad0) madAlb <- st_transform(mad0, alb) madLamb <- st_transform(mad0, lamb) madVert <- st_transform(mad0, vert) oldPar <- par(mfrow=c(2, 2)) plot(st_geometry(mad0), main='Unprojected (WGS84)') plot(st_geometry(madAlb), main='Albers') plot(st_geometry(madLamb), main='Lambert') plot(st_geometry(madVert), main='Vertical') par(oldPar) ### Canada # The effect is more noticeable when plotting large areas, # especially if they lie near the poles. # This example can take a few minutes to run and plot. library(terra) canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX') can <- vect(canFile) alb <- customAlbers(can) lamb <- customLambert(can) vert <- customVNS(can) canAlb <- project(can, alb) canLamb <- project(can, lamb) canVert <- project(can, vert) oldPar <- par(mfrow=c(2, 2)) plot(can, main = 'Unprojected (WGS84)') plot(canAlb, main = 'Albers') plot(canLamb, main = 'Lambert') plot(canVert, main = 'Vertical') par(oldPar)
library(sf) ### Madagascar data(mad0) alb <- customAlbers(mad0) lamb <- customLambert(mad0) vert <- customVNS(mad0) madAlb <- st_transform(mad0, alb) madLamb <- st_transform(mad0, lamb) madVert <- st_transform(mad0, vert) oldPar <- par(mfrow=c(2, 2)) plot(st_geometry(mad0), main='Unprojected (WGS84)') plot(st_geometry(madAlb), main='Albers') plot(st_geometry(madLamb), main='Lambert') plot(st_geometry(madVert), main='Vertical') par(oldPar) ### Canada # The effect is more noticeable when plotting large areas, # especially if they lie near the poles. # This example can take a few minutes to run and plot. library(terra) canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX') can <- vect(canFile) alb <- customAlbers(can) lamb <- customLambert(can) vert <- customVNS(can) canAlb <- project(can, alb) canLamb <- project(can, lamb) canVert <- project(can, vert) oldPar <- par(mfrow=c(2, 2)) plot(can, main = 'Unprojected (WGS84)') plot(canAlb, main = 'Albers') plot(canLamb, main = 'Lambert') plot(canVert, main = 'Vertical') par(oldPar)
This function converts geographic coordinates in decimal format to degrees-minutes-seconds (DD-MM-SS) format.
decimalToDms(x)
decimalToDms(x)
x |
Numeric or vector of numeric values, longitude or latitude in decimal format. |
A numeric matrix with three columns: degrees, seconds, and seconds. Note that the hemisphere (i.e., indicated by the sign of x) is not returned since it could be either north/south or east/west.
decimalToDms(38.56123) # latitude of St. Louis, Missouri, USA decimalToDms(90.06521) # longitude of St. Louis, Missouri, USA
decimalToDms(38.56123) # latitude of St. Louis, Missouri, USA decimalToDms(90.06521) # longitude of St. Louis, Missouri, USA
This function converts geographic coordinates in degrees-minutes-seconds (DD-MM-SS) format to decimal format.
dmsToDecimal(dd, mm, ss, hemis = NULL)
dmsToDecimal(dd, mm, ss, hemis = NULL)
dd |
Numeric. Degrees longitude or latitude. Can be a decimal value. |
mm |
Numeric. Minutes longitude or latitude. Can be a decimal value. |
ss |
Numeric. Second longitude or latitude. Can be a decimal value. |
hemis |
Character or |
Numeric.
dmsToDecimal(38, 37, 38) # latitude of St. Louis, Missouri, USA dmsToDecimal(38, 37, 38, 'N') # latitude of St. Louis, Missouri, USA dmsToDecimal(90, 11, 52.1) # longitude of St. Louis, Missouri, USA dmsToDecimal(90, 11, 52.1, 'W') # longitude of St. Louis, Missouri, USA
dmsToDecimal(38, 37, 38) # latitude of St. Louis, Missouri, USA dmsToDecimal(38, 37, 38, 'N') # latitude of St. Louis, Missouri, USA dmsToDecimal(90, 11, 52.1) # longitude of St. Louis, Missouri, USA dmsToDecimal(90, 11, 52.1, 'W') # longitude of St. Louis, Missouri, USA
This function thins spatial points such that no more than one point falls within each cell of a reference raster. If more than one point falls in a cell, the first point in the input data is retained unless the user specifies a priority for keeping points.
elimCellDuplicates(x, rast, longLat = NULL, priority = NULL)
elimCellDuplicates(x, rast, longLat = NULL, priority = NULL)
x |
Points. This can be either a |
rast |
|
longLat |
Two-element character vector or two-element integer vector. If |
priority |
Either |
Object of class x
.
# This example can take >10 second to run. library(terra) x <- data.frame( long=c(-90.1, -90.1, -90.2, 20), lat=c(38, 38, 38, 38), point=letters[1:4] ) rast <- rast() # empty raster covering entire world with 1-degree resolution elimCellDuplicates(x, rast, longLat=c(1, 2)) elimCellDuplicates(x, rast, longLat=c(1, 2), priority=c(3, 2, 1, 0))
# This example can take >10 second to run. library(terra) x <- data.frame( long=c(-90.1, -90.1, -90.2, 20), lat=c(38, 38, 38, 38), point=letters[1:4] ) rast <- rast() # empty raster covering entire world with 1-degree resolution elimCellDuplicates(x, rast, longLat=c(1, 2)) elimCellDuplicates(x, rast, longLat=c(1, 2), priority=c(3, 2, 1, 0))
This function calculates the area under the receiver-operator characteristic curve (AUC) following Mason & Graham (2002). Each case (presence/non-presence) can be assigned a weight, if desired.
evalAUC( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), na.rm = FALSE, ... )
evalAUC( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), na.rm = FALSE, ... )
pres |
Vector of predictions for "positive" cases (e.g., predictions at presence sites). |
contrast |
Vector of predictions for "negative" cases (e.g., predictions at absence/background sites). |
presWeight |
Weights of positive cases. The default is to assign each positive case a weight of 1. |
contrastWeight |
Weights of contrast cases. The default is to assign each case a weight of 1. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
A Numeric value.
Mason, S.J. and N.E. Graham. 2002. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Quarterly Journal of the Royal Meteorological Society 128:2145-2166. doi:10.1256/003590002320603584
pres <- seq(0.5, 1, by=0.1) contrast <- seq(0, 1, by=0.01) # unweighted evalAUC(pres, contrast) # weighted (weight presences with low predictions more) presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5) evalAUC(pres, contrast, presWeight=presWeight) # weighted (weight presences with high predictions more) presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1) evalAUC(pres, contrast, presWeight=presWeight) # weight presences and absences contrastWeight <- sqrt(contrast) evalAUC(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
pres <- seq(0.5, 1, by=0.1) contrast <- seq(0, 1, by=0.01) # unweighted evalAUC(pres, contrast) # weighted (weight presences with low predictions more) presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5) evalAUC(pres, contrast, presWeight=presWeight) # weighted (weight presences with high predictions more) presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1) evalAUC(pres, contrast, presWeight=presWeight) # weight presences and absences contrastWeight <- sqrt(contrast) evalAUC(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
This function calculates the continuous Boyce index (CBI), a measure of model accuracy for presence-only test data.
evalContBoyce( pres, contrast, numBins = 101, binWidth = 0.1, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), autoWindow = TRUE, method = "spearman", dropZeros = TRUE, graph = FALSE, table = FALSE, na.rm = FALSE, ... )
evalContBoyce( pres, contrast, numBins = 101, binWidth = 0.1, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), autoWindow = TRUE, method = "spearman", dropZeros = TRUE, graph = FALSE, table = FALSE, na.rm = FALSE, ... )
pres |
Numeric vector. Predicted values at presence sites. |
contrast |
Numeric vector. Predicted values at background sites. |
numBins |
Positive integer. Number of (overlapping) bins into which to divide predictions. |
binWidth |
Positive numeric value < 1. Size of a bin. Each bin will be |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
autoWindow |
Logical. If |
method |
Character. Type of correlation to calculate. The default is |
dropZeros |
Logical. If |
graph |
Logical. If |
table |
Logical. If |
na.rm |
Logical. If |
... |
Other arguments (not used). |
CBI is the Spearman rank correlation coefficient between the proportion of sites in each prediction class and the expected proportion of predictions in each prediction class based on the proportion of the landscape that is in that class. The index ranges from -1 to 1. Values >0 indicate the model's output is positively correlated with the true probability of presence. Values <0 indicate it is negatively correlated with the true probability of presence.
Numeric value, or if table
is TRUE
, then a list object with CBI plus a data frame with P (proportion of presence weights per bin), E (expected proportion of presence weights per bin–from contrast sites), and the ratio of the two.
Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modeling 157:281-300. doi:10.1016/S0304-3800(02)00200-4
Hirzel, A.H., Le Lay, G., Helfer, V., Randon, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modeling 199:142-152. doi:10.1016/j.ecolmodel.2006.05.017
cor
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
, evalTSS
set.seed(123) pres <- sqrt(runif(100)) contrast <- runif(1000) evalContBoyce(pres, contrast) presWeight <- c(rep(1, 10), rep(0.5, 90)) evalContBoyce(pres, contrast, presWeight=presWeight)
set.seed(123) pres <- sqrt(runif(100)) contrast <- runif(1000) evalContBoyce(pres, contrast) presWeight <- c(rep(1, 10), rep(0.5, 90)) evalContBoyce(pres, contrast, presWeight=presWeight)
This function calculates a multivariate version of the area under the receiver-operator characteristic curve (AUC). The multivariate version is simply the mean AUC across all possible pairwise AUCs for all cases (Hand & Till 2001). For example, if we have predictions that can be classified into three groups of expectation, say A, B, and C, where we expect predictions assigned to group A are > those in B and C, and predictions in group B are expected to be > those in group C, the multivariate AUC for this situation is mean(wAB * auc_mean(A, B), wAC * auc_mean(A, C), wBC * auc_mean(B, C))
, where auc_mean(X, Y)
, is the AUC calculated between cases X
and Y
, and wXY
is a weight for that case-comparison.
evalMultiAUC(..., weightBySize = FALSE, na.rm = FALSE)
evalMultiAUC(..., weightBySize = FALSE, na.rm = FALSE)
... |
A set of two or more numeric vectors or two or more 2-column matrices or data frames. The objects must be listed in order of expected probability. For example, you might have a set of predictions for objects you expect to have a low predicted probability (e.g., long-term absences of an animal), a set that you expect to have middle levels of probability (e.g., sites that were recently vacated), and a set for which you expect a high level of predicted probability (e.g., sites that are currently occupied). In this case you should list the cases in order: low, middle, high. If a 2-column matrix or data frame is supplied, then the first column is assumed to represent predictions and the second assumed to represent site-level weights (see |
weightBySize |
Logical, if |
na.rm |
Logical. If |
Named numeric vector. The names will appear as case2_over_case1
(which in this example means the AUC of item #1 in the ...
when compared to the second item in ...
), plus multivariate
(which is the multivariate AUC).
Hand, DJ and Till, RJ. 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45:171-186 doi:10.1023/A:1010920819831.
pa_evaluate
, evalAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
, evalTSS
set.seed(123) # no weights low <- runif(10)^2 middle <- runif(10) high <- sqrt(runif(20)) evalMultiAUC(low, middle, high) # equal weights low <- matrix(c(low, rep(1, length(low))), ncol=2) middle <- matrix(c(middle, rep(1, length(middle))), ncol=2) high <- matrix(c(high, rep(1, length(high))), ncol=2) evalMultiAUC(low, middle, high) # equal weights with weighting by number of comparisons evalMultiAUC(low, middle, high, weightBySize=TRUE) # unequal weights middle[ , 2] <- ifelse(middle[ , 1] > 0.5, 0.1, 1) evalMultiAUC(low, middle, high) # unequal weights with weighting by number of comparisons evalMultiAUC(low, middle, high, weightBySize=TRUE)
set.seed(123) # no weights low <- runif(10)^2 middle <- runif(10) high <- sqrt(runif(20)) evalMultiAUC(low, middle, high) # equal weights low <- matrix(c(low, rep(1, length(low))), ncol=2) middle <- matrix(c(middle, rep(1, length(middle))), ncol=2) high <- matrix(c(high, rep(1, length(high))), ncol=2) evalMultiAUC(low, middle, high) # equal weights with weighting by number of comparisons evalMultiAUC(low, middle, high, weightBySize=TRUE) # unequal weights middle[ , 2] <- ifelse(middle[ , 1] > 0.5, 0.1, 1) evalMultiAUC(low, middle, high) # unequal weights with weighting by number of comparisons evalMultiAUC(low, middle, high, weightBySize=TRUE)
This function is similar to the threshold
function in the predicts package, which calculates thresholds to create binary predictions from continuous values. However, unlike that function, it allows the user to specify weights for presences and absence/background predictions. The output will thus be the threshold that best matches the specified criterion taking into account the relative weights of the input values.
evalThreshold( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), at = c("msss", "mdss", "minPres", "prevalence", "sensitivity"), sensitivity = 0.9, thresholds = seq(0, 1, by = 0.001), na.rm = FALSE, ... )
evalThreshold( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), at = c("msss", "mdss", "minPres", "prevalence", "sensitivity"), sensitivity = 0.9, thresholds = seq(0, 1, by = 0.001), na.rm = FALSE, ... )
pres |
Numeric vector. Predicted values at test presences. |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
at |
Character or character vector, name(s) of threshold(s) to calculate. The default is to calculate them all.
|
sensitivity |
Value of specificity to match (used only if |
thresholds |
Numeric vector. Thresholds at which to calculate the sum of sensitivity and specificity. The default evaluates all values from 0 to 1 in steps of 0.01. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Named numeric vector. Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
threshold
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThresholdStats
, evalTjursR2
, evalTSS
set.seed(123) # set of bad and good predictions at presences bad <- runif(100)^2 good <- runif(100)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) evalThreshold(pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 100), rep(0.1, 100)) evalThreshold(pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 100), rep(1, 100)) evalThreshold(pres, contrast, presWeight=presWeight)
set.seed(123) # set of bad and good predictions at presences bad <- runif(100)^2 good <- runif(100)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) evalThreshold(pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 100), rep(0.1, 100)) evalThreshold(pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 100), rep(1, 100)) evalThreshold(pres, contrast, presWeight=presWeight)
This function calculates a series of evaluation statistics based on a threshold or thresholds used to convert continuous predictions to binary predictions.
evalThresholdStats( thresholds, pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), delta = 0.001, na.rm = FALSE, bg = NULL, bgWeight = NULL, ... )
evalThresholdStats( thresholds, pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), delta = 0.001, na.rm = FALSE, bg = NULL, bgWeight = NULL, ... )
thresholds |
Numeric or numeric vector. Threshold(s) at which to calculate sensitivity and specificity. |
pres |
Numeric vector. Predicted values at test presences |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
delta |
Positive numeric >0 in the range [0, 1] and usually very small. This value is used only if calculating the SEDI threshold when any true positive rate or false negative rate is 0 or the false negative rate is 1. Since SEDI uses log(x) and log(1 - x), values of 0 and 1 will produce |
na.rm |
Logical. If |
bg |
Same as |
bgWeight |
Same as |
... |
Other arguments (unused). |
8-column matrix with the following named columns. a = weight of presences >= threshold, b = weight of backgrounds >= threshold, c = weight of presences < threshold, d = weight of backgrounds < threshold, and N = sum of presence and background weights.
'threshold'
: Threshold
'sensitivity'
: Sensitivity (a / (a + c))
'specificity'
: Specificity (d / (d + b))
'ccr'
: Correct classification rate ((a + d) / N)
'ppp'
: Positive predictive power (a / (a + b))
'npp'
: Negative predictive power (d / (c + d))
'mr'
: Misclassification rate ((b + c) / N)
Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
threshold
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalTjursR2
, evalTSS
set.seed(123) # set of bad and good predictions at presences bad <- runif(100)^2 good <- runif(100)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) thresholds <- c(0.1, 0.5, 0.9) evalThresholdStats(thresholds, pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 100), rep(0.1, 100)) evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 100), rep(1, 100)) evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight)
set.seed(123) # set of bad and good predictions at presences bad <- runif(100)^2 good <- runif(100)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) thresholds <- c(0.1, 0.5, 0.9) evalThresholdStats(thresholds, pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 100), rep(0.1, 100)) evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 100), rep(1, 100)) evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight)
This function calculates Tjur's R2 metric of model discrimination accuracy. Unweighted R2 is simply the difference between the mean predicted value at presence sites and the mean predicted value at absence/background sites. The weighted version allows for differing weights between presences and between absences/contrast values (i.e., the difference between the weighted mean of predictions at presences and weighted mean predictions at absences/contrast locations).
evalTjursR2( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), na.rm = FALSE, ... )
evalTjursR2( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), na.rm = FALSE, ... )
pres |
Predictions at presence sites. |
contrast |
Predictions at absence/background sites. |
presWeight |
Weights of presence cases. The default is to assign each presence case a weight of 1. |
contrastWeight |
Weights of absence/background cases. The default is to assign each case a weight of 1. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Numeric value.
Tjur, T. 2009. Coefficients of determination in logistic regression models-A new proposal: The coefficient of discrimination. The American Statistician 63:366-372. doi:10.1198/tast.2009.08210
pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTSS
pres <- seq(0.5, 1, by=0.1) contrast <- seq(0, 1, by=0.01) # unweighted evalTjursR2(pres, contrast) # weighted (weight presences with low predictions more) presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5) evalTjursR2(pres, contrast, presWeight=presWeight) # weighted (weight presences with high predictions more) presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1) evalTjursR2(pres, contrast, presWeight=presWeight) # weight presences and absences contrastWeight <- sqrt(contrast) evalTjursR2(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
pres <- seq(0.5, 1, by=0.1) contrast <- seq(0, 1, by=0.01) # unweighted evalTjursR2(pres, contrast) # weighted (weight presences with low predictions more) presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5) evalTjursR2(pres, contrast, presWeight=presWeight) # weighted (weight presences with high predictions more) presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1) evalTjursR2(pres, contrast, presWeight=presWeight) # weight presences and absences contrastWeight <- sqrt(contrast) evalTjursR2(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
This function calculates the True Skill Statistic (TSS).
evalTSS( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), thresholds = seq(0, 1, by = 0.001), na.rm = FALSE, ... )
evalTSS( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), thresholds = seq(0, 1, by = 0.001), na.rm = FALSE, ... )
pres |
Numeric vector. Predicted values at test presences |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
thresholds |
Numeric vector. Thresholds at which to calculate the sum of sensitivity and specificity. The default evaluates all values from 0 to 1 in steps of 0.01. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
This function calculates the maximum value of the True Skill Statistic (i.e., across all thresholds, the values that maximizes sensitivity plus specificity).
Numeric value.
See Allouche, O., Tsoar, A., and Kadmon, R. 2006. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43:1223-1232. doi:10.1111/j.1365-2664.2006.01214.x
pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
set.seed(123) # set of bad and good predictions at presences bad <- runif(30)^2 good <- runif(30)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) evalTSS(pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 30), rep(0.1, 30)) evalTSS(pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 30), rep(1, 30)) evalTSS(pres, contrast, presWeight=presWeight)
set.seed(123) # set of bad and good predictions at presences bad <- runif(30)^2 good <- runif(30)^0.1 hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences') hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE) pres <- c(bad, good) contrast <- runif(1000) evalTSS(pres, contrast) # upweight bad predictions presWeight <- c(rep(1, 30), rep(0.1, 30)) evalTSS(pres, contrast, presWeight=presWeight) # upweight good predictions presWeight <- c(rep(0.1, 30), rep(1, 30)) evalTSS(pres, contrast, presWeight=presWeight)
This function returns a SpatVector
or sf
polygon representing an extent. The input can be a SpatExtent
or sf
object, or an object from which a SpatExtent
(extent) can be obtained.
extentToVect(x, ...)
extentToVect(x, ...)
x |
A |
... |
Arguments to supply to |
A SpatVector
(usual) or, if the input is an sf
object, an sf
polygon object.
data(mad0) madExtent <- extentToVect(mad0) plot(madExtent, border='blue', lty='dotted') plot(mad0[1], add=TRUE) # NB This is the same as: library(terra) madExtent <- ext(mad0) madExtent <- as.polygons(madExtent, crs = crs(mad0))
data(mad0) madExtent <- extentToVect(mad0) plot(madExtent, border='blue', lty='dotted') plot(mad0[1], add=TRUE) # NB This is the same as: library(terra) madExtent <- ext(mad0) madExtent <- as.polygons(madExtent, crs = crs(mad0))
This function generates geographically-distinct cross-validation folds, or "geo-folds" ("g-folds" for short). Points are grouped by proximity to one another. Folds can be forced to have at least a minimum number of points in them. Results are deterministic (i.e., the same every time for the same data).
More specifically, g-folds are created using this process:
To start, all pairwise distances between points are calculated. These are used in a clustering algorithm to create a dendrogram of relationships by distance. The dendrogram is then "cut" so it has k
groups (folds). If each fold has at least the minimum desired number of points (minIn
), then the process stops and fold assignments are returned.
However, if at least one fold has fewer than the desired number of points, a series of steps is executed.
First, the fold with a centroid that is farthest from all others is selected. If it has sufficient points, then the next-most distant fold is selected, and so on.
Once a fold is identified that has fewer than the desired number of points, it is grown by adding to it the points closest to its centroid, one at a time. Each time a point is added, the fold centroid is calculated again. The fold is grown until it has the desired number of points. Call this "fold #1". From hereafter, these points are considered "assigned" and not eligible for re-assignment.
The remaining "unassigned" points are then clustered again, but this time into k - 1
folds. And again, the most-distant group found that has fewer than the desired number of points is found. This fold is then grown as before, using only unassigned points. This fold then becomes "fold #2."
The process repeats iteratively until there are k
folds assigned, each with at least the desired number of points.
The potential downside of this approach is that the last fold is assigned the remainder of points, so will be the largest. One way to avoid gross imbalance is to select the value of minIn
such that it divides the points into nearly equally-sized groups.
geoFold(x, k, minIn = 1, longLat = 1:2, method = "complete", ...)
geoFold(x, k, minIn = 1, longLat = 1:2, method = "complete", ...)
x |
A "spatial points" object of class |
k |
Numeric: Number of folds to create. |
minIn |
Numeric: Minimum number of points required to be in a fold. |
longLat |
Character or integer vector: This is ignored if |
method |
Character: Method used by |
... |
Additional arguments (unused) |
Note that in general it is probably mathematically impossible to cluster points in 2-dimensional space into k
groups, each with at least minIn
points, in a manner that seems "reasonable" to the eye in all cases. In experimentation, "unreasonable" results often appear when the number of groups is high.
A vector of integers the same length as the number of points in x
. Each integer indicates which fold a point in x
belongs to.
library(sf) library(terra) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') ll <- c('longitude', 'latitude') # use occurrences of all species... easier to see on map occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84')) # create 100 background points mad0 <- vect(mad0) bg <- spatSample(mad0, 100) ### assign 3 folds to occurrences and to background sites k <- 3 minIn <- floor(nrow(occs) / k) # maximally spread between folds presFolds <- geoFold(occs, k = k, minIn = minIn) bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds) # number of sites per fold table(presFolds) table(bgFolds) # map plot(mad0, border = 'gray', main = paste(k, 'geo-folds')) plot(bg, pch = 3, col = bgFolds + 1, add = TRUE) plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE) legend( 'bottomright', legend = c( 'presence fold 1', 'presence fold 2', 'presence fold 3', 'background fold 1', 'background fold 2', 'background fold 3' ), pch = c(21, 22, 23, 3, 3), col = c(rep('black', 3), 2, 3), pt.bg = c(2, 3, 4, NA, NA) )
library(sf) library(terra) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') ll <- c('longitude', 'latitude') # use occurrences of all species... easier to see on map occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84')) # create 100 background points mad0 <- vect(mad0) bg <- spatSample(mad0, 100) ### assign 3 folds to occurrences and to background sites k <- 3 minIn <- floor(nrow(occs) / k) # maximally spread between folds presFolds <- geoFold(occs, k = k, minIn = minIn) bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds) # number of sites per fold table(presFolds) table(bgFolds) # map plot(mad0, border = 'gray', main = paste(k, 'geo-folds')) plot(bg, pch = 3, col = bgFolds + 1, add = TRUE) plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE) legend( 'bottomright', legend = c( 'presence fold 1', 'presence fold 2', 'presence fold 3', 'background fold 1', 'background fold 2', 'background fold 3' ), pch = c(21, 22, 23, 3, 3), col = c(rep('black', 3), 2, 3), pt.bg = c(2, 3, 4, NA, NA) )
This function generates geographically-distinct cross-validation folds, or "geo-folds" of background or absence sites (i.e., "contrast" sites). Each contrast site is assigned to a fold based on the fold of the presence site that is closest. Typically, this function is run after geoFold
is run to assign presences to folds.
geoFoldContrast( contrast, pres, presFolds, contrastLongLat = 1:2, presLongLat = 1:2, ... )
geoFoldContrast( contrast, pres, presFolds, contrastLongLat = 1:2, presLongLat = 1:2, ... )
contrast |
A "spatial points" object representing contrast sites:
|
pres |
A "spatial points" object representing presence sites:
|
presFolds |
Numeric vector: These provide the folds to which |
contrastLongLat , presLongLat
|
Character or integer vector: A character or integer vector specifying the columns in |
... |
Additional arguments (unused) |
A vector of integers the same length as the number of points in contrast
. Each integer indicates which fold a point in contrast
belongs to.
library(sf) library(terra) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') ll <- c('longitude', 'latitude') # use occurrences of all species... easier to see on map occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84')) # create 100 background points mad0 <- vect(mad0) bg <- spatSample(mad0, 100) ### assign 3 folds to occurrences and to background sites k <- 3 minIn <- floor(nrow(occs) / k) # maximally spread between folds presFolds <- geoFold(occs, k = k, minIn = minIn) bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds) # number of sites per fold table(presFolds) table(bgFolds) # map plot(mad0, border = 'gray', main = paste(k, 'geo-folds')) plot(bg, pch = 3, col = bgFolds + 1, add = TRUE) plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE) legend( 'bottomright', legend = c( 'presence fold 1', 'presence fold 2', 'presence fold 3', 'background fold 1', 'background fold 2', 'background fold 3' ), pch = c(21, 22, 23, 3, 3), col = c(rep('black', 3), 2, 3), pt.bg = c(2, 3, 4, NA, NA) )
library(sf) library(terra) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') ll <- c('longitude', 'latitude') # use occurrences of all species... easier to see on map occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84')) # create 100 background points mad0 <- vect(mad0) bg <- spatSample(mad0, 100) ### assign 3 folds to occurrences and to background sites k <- 3 minIn <- floor(nrow(occs) / k) # maximally spread between folds presFolds <- geoFold(occs, k = k, minIn = minIn) bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds) # number of sites per fold table(presFolds) table(bgFolds) # map plot(mad0, border = 'gray', main = paste(k, 'geo-folds')) plot(bg, pch = 3, col = bgFolds + 1, add = TRUE) plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE) legend( 'bottomright', legend = c( 'presence fold 1', 'presence fold 2', 'presence fold 3', 'background fold 1', 'background fold 2', 'background fold 3' ), pch = c(21, 22, 23, 3, 3), col = c(rep('black', 3), 2, 3), pt.bg = c(2, 3, 4, NA, NA) )
This function thins geographic points such that none have nearest neighbors closer than some user-specified distance. For a given set of points that fall within this distance, thinning can be conducted in two ways. Both begin by first calculating all pairwise distances between points. Then, clusters of points are found based on proximity using the "single-linkage" method (i.e., based on minimum distance between groups). Then, either a deterministic or random method is used to select the retained points:
Deterministic: For each cluster, distances between each point in the cluster and all points outside of the cluster are calculated. The point retained in each cluster is the one with the greatest minimum pairwise distance to any points in any other cluster. This point will this be maximally isolated from any other point.
Random: For each cluster, a random point is chosen.
geoThin(x, minDist, random = FALSE, longLat = 1:2, method = "single", ...)
geoThin(x, minDist, random = FALSE, longLat = 1:2, method = "single", ...)
x |
A "spatial points" object of class |
minDist |
Minimum distance (in meters) needed between points to retain them. Points falling closer than this distance will be candidates for being discarded. |
random |
Logical: If |
longLat |
Numeric or integer vector: This is ignored if |
method |
Character: Method used by |
... |
Additional arguments. Not used. |
Object of class x
.
library(sf) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] ll <- c('longitude', 'latitude') occs <- st_as_sf(occs, coords = ll, crs = getCRS('WGS84')) # deterministically thin det <- geoThin(x = occs, minDist = 30000) # randomly thin set.seed(123) rand <- geoThin(x = occs, minDist = 30000, random = TRUE) # map oldPar <- par(mfrow = c(1, 2)) plot(st_geometry(occs), cex = 1.4, main = 'Deterministic') plot(st_geometry(det), pch = 21, cex = 1.4, bg = 1:nrow(det), add = TRUE) plot(st_geometry(mad0), add = TRUE) plot(st_geometry(occs), cex = 1.4, main = 'Random') plot(st_geometry(rand), pch = 21, cex = 1.4, bg = 1:nrow(rand), add = TRUE) plot(st_geometry(mad0), add = TRUE) par(oldPar)
library(sf) # lemur occurrence data data(mad0) data(lemurs) crs <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] ll <- c('longitude', 'latitude') occs <- st_as_sf(occs, coords = ll, crs = getCRS('WGS84')) # deterministically thin det <- geoThin(x = occs, minDist = 30000) # randomly thin set.seed(123) rand <- geoThin(x = occs, minDist = 30000, random = TRUE) # map oldPar <- par(mfrow = c(1, 2)) plot(st_geometry(occs), cex = 1.4, main = 'Deterministic') plot(st_geometry(det), pch = 21, cex = 1.4, bg = 1:nrow(det), add = TRUE) plot(st_geometry(mad0), add = TRUE) plot(st_geometry(occs), cex = 1.4, main = 'Random') plot(st_geometry(rand), pch = 21, cex = 1.4, bg = 1:nrow(rand), add = TRUE) plot(st_geometry(mad0), add = TRUE) par(oldPar)
Retrieve the Well-Known text string (WKT2) for a coordinate reference system (CRS) by name or from a spatial object. The most common usage of the function is to return the WKT2 string using an easy-to-remember name. For example, getCRS('wgs84')
returns the WKT2 string for the WGS84 datum. To get a table of strings, just use getCRS()
.
getCRS(x = NULL, nice = FALSE, warn = TRUE)
getCRS(x = NULL, nice = FALSE, warn = TRUE)
x |
This can be any of:
|
nice |
If |
warn |
If |
A string representing WKT2 (well-known text) object or a data.frame
.
# view table of available CRSs getCRS() # get specific WKT2 strings getCRS('WGS84') getCRS('Mollweide') getCRS('WorldClim') # WKT2 strings nice for your eyes getCRS('WGS84', TRUE) data(mad0) getCRS(mad0)
# view table of available CRSs getCRS() # get specific WKT2 strings getCRS('WGS84') getCRS('Mollweide') getCRS('WorldClim') # WKT2 strings nice for your eyes getCRS('WGS84', TRUE) data(mad0) getCRS(mad0)
These functions get values from a raster at specific cells, or values to specific cells.
getValueByCell(x, cell, format = "raster") setValueByCell(x, val, cell, format = "raster")
getValueByCell(x, cell, format = "raster") setValueByCell(x, val, cell, format = "raster")
x |
A |
cell |
Cell indices. There must be one per value in |
format |
The type of cell indexing used. This can be either "raster" for row indexing (default) or "matrix" for column indexing. Row indexing (the default for rasters), starts with cell "1" in the upper left, cell "2" is to its right, and so on. Numbering then wraps around to the next row. Column indexing (the default for matrices) has the cell "1" in the upper left corner of the matrix. The cell "2" is below it, and so on. The numbering then wraps around to the top of the next column. |
val |
One or more values. If more the number of cells specified is greater than the number of values in |
A data frame (getValueByCell
) with cell numbers (in row format), or a SpatRaster
(setValueByCell
).
library(terra) x <- rast(nrow=10, ncol=10) x[] <- round(10 * runif(100)) cell <- c(1, 20, 40, 80) getValueByCell(x, cell = cell) getValueByCell(x, cell = cell, format = 'matrix') y <- setValueByCell(x, val = 20, cell = cell) plot(y) z <- setValueByCell(x, val = 30, cell = cell, format = 'matrix') plot(c(x, y, z))
library(terra) x <- rast(nrow=10, ncol=10) x[] <- round(10 * runif(100)) cell <- c(1, 20, 40, 80) getValueByCell(x, cell = cell) getValueByCell(x, cell = cell, format = 'matrix') y <- setValueByCell(x, val = 20, cell = cell) plot(y) z <- setValueByCell(x, val = 30, cell = cell, format = 'matrix') plot(c(x, y, z))
Calculate "global" statistics across all the values in a raster. This function is a wrapper for global
. That function, by default, sets na.rm = FALSE
, so any cell that is NA
can cause the summary statistic to also be NA
(usually undesirable). The function also returns a data.frame
, so often needs a further line of code to get the actual value(s). This function sets na.rm = TRUE
by default, and returns a numeric vector (not a data.frame
).
globalx(x, fun, na.rm = TRUE, ..., weights = NULL)
globalx(x, fun, na.rm = TRUE, ..., weights = NULL)
x |
A |
fun |
A function or the name of a function (in quotes). See |
na.rm |
If |
... |
Additional arguments to pass to |
weights |
Either |
A numeric vector, one value per layer in x
.
library(terra) r <- rast(ncols=10, nrows=10) values(r) <- 1:ncell(r) global(r, 'sum') # terra globalx(r, 'sum') # enmSdmX global(r, "mean", na.rm=TRUE)[1, 1] # terra... same as enmSdmX::globalx
library(terra) r <- rast(ncols=10, nrows=10) values(r) <- 1:ncell(r) global(r, 'sum') # terra globalx(r, 'sum') # enmSdmX global(r, "mean", na.rm=TRUE)[1, 1] # terra... same as enmSdmX::globalx
This function returns a series of rasters interpolated from another series of rasters. For example, the input might represent rasters of a process measured at times t, t + 1, and t + 4. The rasters at t + 2 and t + 3 could be interpolated based on the values in the other rasters. Note that this function can take a lot of time and memory, even for relatively small rasters.
interpolateRasts( rasts, interpFrom, interpTo, type = "linear", onFail = NA, useRasts = FALSE, na.rm = TRUE, verbose = TRUE, ... )
interpolateRasts( rasts, interpFrom, interpTo, type = "linear", onFail = NA, useRasts = FALSE, na.rm = TRUE, verbose = TRUE, ... )
rasts |
A "stack" of |
interpFrom |
Numeric vector, one value per raster in |
interpTo |
Numeric vector, values of "distances" at which to interpolate the rasters. |
type |
Character. The type of model used to do the interpolation. Note that some of these (the first few) are guaranteed to go through every point being interpolated from. The second set, however, are effectively regressions so are not guaranteed to do through any of the points. Note that some methods cannot handle cases where at least some series of cells have < a given number of non-
|
onFail |
Either |
useRasts |
Logical. If |
na.rm |
Logical, if |
verbose |
Logical. If |
... |
Other arguments passed to |
This function can be very memory-intensive for large rasters. It may speed things up (and make them possible) to do interpolations piece by piece (e.g., instead of interpolating between times t0, t1, t2, t3, ..., interpolate between t0 and t1, then t1 and t2, etc. This may give results that differ from using the entire set, however, depending on what type of interpolation is used. Note that using linear and splines will often yield very similar results, except that in a small number of cases splines may produce very extreme interpolated values.
A SpatRaster
"stack" with one layer per element in interpTo
.
approximate
, approxfun
, splinefun
, trainNS
, glm
, , bs
, smooth.spline
.
# NB: The example below can take a few minutes to run. library(terra) interpFrom <- c(1, 3, 4, 8, 10, 11, 15) interpTo <- 1:15 rx <- rast(nrows=10, ncols=10) r1 <- setValues(rx, rnorm(100, 1)) r3 <- setValues(rx, rnorm(100, 3)) r4 <- setValues(rx, rnorm(100, 5)) r8 <- setValues(rx, rnorm(100, 11)) r10 <- setValues(rx, rnorm(100, 3)) r11 <- setValues(rx, rnorm(100, 5)) r15 <- setValues(rx, rnorm(100, 13)) rasts <- c(r1, r3, r4, r8, r10, r11, r15) names(rasts) <- paste0('rasts', interpFrom) linear <- interpolateRasts(rasts, interpFrom, interpTo) spline <- interpolateRasts(rasts, interpFrom, interpTo, type='spline') gam <- interpolateRasts(rasts, interpFrom, interpTo, type='gam', onFail='linear') ns <- interpolateRasts(rasts, interpFrom, interpTo, type='ns', onFail='linear', verbose=FALSE) poly <- interpolateRasts(rasts, interpFrom, interpTo, type='poly', onFail='linear') bs <- interpolateRasts(rasts, interpFrom, interpTo, type='bs', onFail='linear') ss <- interpolateRasts(rasts, interpFrom, interpTo, type='smooth.spline', onFail='linear', verbose=FALSE) # examine trends for a particular point on the landscape pts <- matrix(c(-9, 13), ncol = 2) pts <- vect(pts) linearExt <- unlist(terra::extract(linear, pts, ID=FALSE)) splineExt <- unlist(terra::extract(spline, pts, ID=FALSE)) gamExt <- unlist(terra::extract(gam, pts, ID=FALSE)) nsExt <- unlist(terra::extract(ns, pts, ID=FALSE)) polyExt <- unlist(terra::extract(poly, pts, ID=FALSE)) bsExt <- unlist(terra::extract(bs, pts, ID=FALSE)) ssExt <- unlist(terra::extract(ss, pts, ID=FALSE)) mins <- min(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt) maxs <- max(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt) plot(interpTo, linearExt, type='l', lwd=2, ylim=c(mins, maxs), ylab='Value') lines(interpTo, splineExt, col='blue') lines(interpTo, gamExt, col='green') lines(interpTo, nsExt, col='orange') lines(interpTo, polyExt, col='gray') lines(interpTo, bsExt, col='magenta') lines(interpTo, ssExt, col='cyan') ext <- unlist(extract(rasts, pts, ID = FALSE)) points(interpFrom, ext) legend('topleft', inset=0.01, lty=c(rep(1, 7), NA), legend=c('linear', 'spline', 'GAM', 'NS', 'polynomial', 'B-spline', 'Smooth spline', 'Observed'), col=c('black', 'blue', 'green', 'orange', 'gray', 'magenta', 'cyan'), pch=c(rep(NA, 7), 1))
# NB: The example below can take a few minutes to run. library(terra) interpFrom <- c(1, 3, 4, 8, 10, 11, 15) interpTo <- 1:15 rx <- rast(nrows=10, ncols=10) r1 <- setValues(rx, rnorm(100, 1)) r3 <- setValues(rx, rnorm(100, 3)) r4 <- setValues(rx, rnorm(100, 5)) r8 <- setValues(rx, rnorm(100, 11)) r10 <- setValues(rx, rnorm(100, 3)) r11 <- setValues(rx, rnorm(100, 5)) r15 <- setValues(rx, rnorm(100, 13)) rasts <- c(r1, r3, r4, r8, r10, r11, r15) names(rasts) <- paste0('rasts', interpFrom) linear <- interpolateRasts(rasts, interpFrom, interpTo) spline <- interpolateRasts(rasts, interpFrom, interpTo, type='spline') gam <- interpolateRasts(rasts, interpFrom, interpTo, type='gam', onFail='linear') ns <- interpolateRasts(rasts, interpFrom, interpTo, type='ns', onFail='linear', verbose=FALSE) poly <- interpolateRasts(rasts, interpFrom, interpTo, type='poly', onFail='linear') bs <- interpolateRasts(rasts, interpFrom, interpTo, type='bs', onFail='linear') ss <- interpolateRasts(rasts, interpFrom, interpTo, type='smooth.spline', onFail='linear', verbose=FALSE) # examine trends for a particular point on the landscape pts <- matrix(c(-9, 13), ncol = 2) pts <- vect(pts) linearExt <- unlist(terra::extract(linear, pts, ID=FALSE)) splineExt <- unlist(terra::extract(spline, pts, ID=FALSE)) gamExt <- unlist(terra::extract(gam, pts, ID=FALSE)) nsExt <- unlist(terra::extract(ns, pts, ID=FALSE)) polyExt <- unlist(terra::extract(poly, pts, ID=FALSE)) bsExt <- unlist(terra::extract(bs, pts, ID=FALSE)) ssExt <- unlist(terra::extract(ss, pts, ID=FALSE)) mins <- min(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt) maxs <- max(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt) plot(interpTo, linearExt, type='l', lwd=2, ylim=c(mins, maxs), ylab='Value') lines(interpTo, splineExt, col='blue') lines(interpTo, gamExt, col='green') lines(interpTo, nsExt, col='orange') lines(interpTo, polyExt, col='gray') lines(interpTo, bsExt, col='magenta') lines(interpTo, ssExt, col='cyan') ext <- unlist(extract(rasts, pts, ID = FALSE)) points(interpFrom, ext) legend('topleft', inset=0.01, lty=c(rep(1, 7), NA), legend=c('linear', 'spline', 'GAM', 'NS', 'polynomial', 'B-spline', 'Smooth spline', 'Observed'), col=c('black', 'blue', 'green', 'orange', 'gray', 'magenta', 'cyan'), pch=c(rep(NA, 7), 1))
Data frame of lemur occurrences
data(lemurs)
data(lemurs)
An object of class 'data.frame'
.
data(lemurs) lemurs
data(lemurs) lemurs
This function generates a raster stack with two rasters, one with cell values equal to the cell's longitude and the other with cell values equal to the cell's latitude.
longLatRasts(x, m = TRUE, filePath = NULL, ...)
longLatRasts(x, m = TRUE, filePath = NULL, ...)
x |
|
m |
Any of:
|
filePath |
String or |
... |
Arguments to pass to |
Object of class SpatRaster
.
library(terra) # generate long/lat rasters for the world x <- rast() # raster with 1 deg resolution and extent equal to entire world x[] <- 1:ncell(x) longLat <- longLatRasts(x) plot(longLat) # demonstrate masking # randomly force some cells to NA v <- 1:ncell(x) n <- 10000 v[sample(v, n)] <- NA x[] <- v longLatTRUE <- longLatRasts(x, m = TRUE) longLatFALSE <- longLatRasts(x, m = FALSE) rasts <- c(x, longLatTRUE, x, longLatFALSE) names(rasts) <- c('x', 'long_m_TRUE', 'lat_m_TRUE', 'x', 'long_m_FALSE', 'lat_m_FALSE') plot(rasts)
library(terra) # generate long/lat rasters for the world x <- rast() # raster with 1 deg resolution and extent equal to entire world x[] <- 1:ncell(x) longLat <- longLatRasts(x) plot(longLat) # demonstrate masking # randomly force some cells to NA v <- 1:ncell(x) n <- 10000 v[sample(v, n)] <- NA x[] <- v longLatTRUE <- longLatRasts(x, m = TRUE) longLatFALSE <- longLatRasts(x, m = FALSE) rasts <- c(x, longLatTRUE, x, longLatFALSE) names(rasts) <- c('x', 'long_m_TRUE', 'lat_m_TRUE', 'x', 'long_m_FALSE', 'lat_m_FALSE') plot(rasts)
Outline of Madagascar from GADM. The geometry has been simplified from the version available in GADM, so pleased do not use this for "official" analyses.
data(mad0, package='enmSdmX')
data(mad0, package='enmSdmX')
An object of class sf
.
library(sf) data(mad0) mad0 plot(st_geometry(mad0), main='Madagascar')
library(sf) data(mad0) mad0 plot(st_geometry(mad0), main='Madagascar')
Outlines of regions ("Faritra") of Madagascar from GADM. The geometry has been simplified from the version available in GADM, so pleased do not use this for "official" analyses.
data(mad1, package='enmSdmX')
data(mad1, package='enmSdmX')
An object of class sf
.
library(sf) data(mad1) mad1 plot(st_geometry(mad1), main='Madagascar')
library(sf) data(mad1) mad1 plot(st_geometry(mad1), main='Madagascar')
Rasters representing average climate across 1970-2000 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
An object of class 'SpatRaster'
.
library(terra) rastFile <- system.file('extdata', 'madClim.tif', package='enmSdmX') madClim <- rast(rastFile) plot(madClim)
library(terra) rastFile <- system.file('extdata', 'madClim.tif', package='enmSdmX') madClim <- rast(rastFile) plot(madClim)
Rasters representing average climate across 2021-2040 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
An object of class 'SpatRaster'
.
library(terra) rastFile <- system.file('extdata', 'madClim2030.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
library(terra) rastFile <- system.file('extdata', 'madClim2030.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
Rasters representing average climate across 2041-2060 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
An object of class 'SpatRaster'
.
library(terra) rastFile <- system.file('extdata', 'madClim2050.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
library(terra) rastFile <- system.file('extdata', 'madClim2050.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
Rasters representing average climate across 2061-2080 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
An object of class 'SpatRaster'
.
library(terra) rastFile <- system.file('extdata', 'madClim2070.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
library(terra) rastFile <- system.file('extdata', 'madClim2070.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
Rasters representing average climate across 2081-2100 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
An object of class 'SpatRaster'
.
library(terra) rastFile <- system.file('extdata', 'madClim2090.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
library(terra) rastFile <- system.file('extdata', 'madClim2090.tif', package='enmSdmX') madClimFut <- rast(rastFile) plot(madClimFut)
This function returns the number of response data used in a model (i.e., the sample size). If the data are binary it can return the number of 1s and 0s.
modelSize(x, binary = TRUE, graceful = TRUE)
modelSize(x, binary = TRUE, graceful = TRUE)
x |
A model object. This can be of many classes, including "gbm", "glm", "gam", "MaxEnt", and so on. |
binary |
If |
graceful |
If |
One or two named integers.
set.seed(123) y <- runif(1:101)^2 yBinary <- as.integer(y > 0.6) x <- data.frame(x1=1:101, x2=rnorm(101)) model <- lm(y ~ x1 + x2, data=x) modelBinary <- glm(yBinary ~ x1 + x2, data=x, family='binomial') modelSize(model, FALSE) modelSize(model, TRUE) # not binary input... notice warning modelSize(modelBinary) modelSize(modelBinary, FALSE)
set.seed(123) y <- runif(1:101)^2 yBinary <- as.integer(y > 0.6) x <- data.frame(x1=1:101, x2=rnorm(101)) model <- lm(y ~ x1 + x2, data=x) modelBinary <- glm(yBinary ~ x1 + x2, data=x, family='binomial') modelSize(model, FALSE) modelSize(model, TRUE) # not binary input... notice warning modelSize(modelBinary) modelSize(modelBinary, FALSE)
This function implements the "nearest environmental point" method (Smith et al. 2023) to enable the use of occurrence records geolocated only to a general place (e.g., a country or province), along with occurrences georeferenced with little error. The function returns environments from a set of precisely-geolocated points plus the environment associated with each imprecise record.
nearestEnvPoints( rasts, pts = NULL, polys = NULL, centerFrom = "pts", pca = TRUE, numPcs = 3, center = TRUE, scale = TRUE, rule = "nearest", na.rm = TRUE, out = "both" )
nearestEnvPoints( rasts, pts = NULL, polys = NULL, centerFrom = "pts", pca = TRUE, numPcs = 3, center = TRUE, scale = TRUE, rule = "nearest", na.rm = TRUE, out = "both" )
rasts |
A |
pts |
A set of spatial points of class |
polys |
A set of spatial polygons of class |
centerFrom |
Indicates how to locate the "reference" centroid used to identify single points on each polygon. This is only relevant if both
|
pca |
If |
numPcs |
The number of PC axes used to find environmental centroids. This is only used if |
center , scale
|
Settings for |
rule |
Determines how to identify the single environmental point to associate with each polygon. Options include:
|
na.rm |
If |
out |
Determines what is returned. Only used if both
|
This function locates a set of points from the environments covered by each polygon using the following procedure, the details of which depend on what arguments are specified:
Only pts
is specified: Environments are taken directly from the locations of pts
in environmental space.
Only polys
is specified: Environments are taken from the closest environment of all the environments associated with each each polygon that is closest to the environmental centroid of the environmental centroids of the polygons (that may be confusing, but it is not a typo).
pts
and polys
are specified: Environments are taken from the locations of pts
plus the environment from each polygon closest to the environmental centroid of pts
. By default, the function uses the environmental centroid of the precise occurrences in step (1), but this can be changed to the environmental centroid of the centroids of the polygons or the environmental centroid of the points defined by the union of precise occurrence points plus the environmental centroids of the polygons.
The function can alternatively return the points on the vertices of the MCP, or points on the input polygons closest to the reference centroid.
A data frame.
Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography In press. Open access pre-print: doi:10.1101/2021.06.10.447988
nearestGeogPoints
for the "nearest geographic point" method, a related approach for geographic space.
# This is a contrived example based on red-bellied lemurs in Madagascar. # Point locations (which are real data) will be assumed to be "precise" # records. We will designate a set of Faritas ("counties") to represent # "imprecise" occurrences that can only be georeferenced to a geopolitical # unit. library(sf) library(terra) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur point data data(lemurs) precise <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') precise <- sf::st_as_sf(precise[ , ll], coords=ll, crs=wgs84) # *fake* lemur administrative unit-level data faritras <- c('Vakinankaratra', 'Haute matsiatra', 'Ihorombe', 'Vatovavy Fitovinany', 'Alaotra-Mangoro', 'Analanjirofo', 'Atsinanana', 'Analamanga', 'Itasy') data(mad1) imprecise <- mad1[mad1$NAME_2 %in% faritras, ] # climate predictors rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') rasts <- rast(rastFile) ### Plot environment of points and environments of each polygon closest to ### centroid of environments of points. In this example, we use the first two ### principal component axes to characterize the niche. # apply Nearest Environmental Point method envPtsPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, pca = TRUE, numPcs = 2) envPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2, out = 'polys') envPts <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2, out = 'pts') allPolyEnvs <- extract(rasts, imprecise) # plot occurrences in environmental space plot(envPtsPolys$PC1, envPtsPolys$PC2, pch=16, col='black', xlab='PC1', ylab='PC2') points(envPolys$PC1, envPolys$PC2, pch=21, bg='orange') legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)'), pch = c(16, 21), col = c('black', 'black'), pt.bg = c('orange', 'orange') ) ### compare identified environments to all environments across all polygons ########################################################################### env <- as.data.frame(rasts) pca <- stats::prcomp(env, center=TRUE, scale.=TRUE) allPolyEnvs <- extract(rasts, imprecise, ID = FALSE) allPolyEnvsPcs <- predict(pca, allPolyEnvs) allPolyEnvs <- cbind(allPolyEnvs, allPolyEnvsPcs) # plot in environmental space plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange', xlab='PC1', ylab='PC2') points(envPts$PC1, envPts$PC2, pch=16) points(envPolys$PC1, envPolys$PC2, pch=1) legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)', 'imprecise (all)'), pch = c(16, 21, 16), col = c('black', 'black', 'orange'), pt.bg = c(NA, 'orange') ) ### display niches (minimum convex hulls) estimated ### using just precise or precise + imprecise records ##################################################### pcs <- c('PC1', 'PC2') preciseIndices <- chull(envPts[ , pcs]) preciseImpreciseIndices <- chull(envPtsPolys[ , pcs]) preciseIndices <- c(preciseIndices, preciseIndices[1]) preciseImpreciseIndices <- c(preciseImpreciseIndices, preciseImpreciseIndices[1]) preciseOnlyNiche <- envPts[preciseIndices, pcs] preciseImpreciseNiche <- envPtsPolys[preciseImpreciseIndices, pcs] # plot in environmental space plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange', xlab='PC1', ylab='PC2') points(envPts$PC1, envPts$PC2, pch=16) points(envPolys$PC1, envPolys$PC2, pch=1) lines(preciseImpreciseNiche, col='coral4', lwd=2) lines(preciseOnlyNiche, lty='dotted') legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)', 'imprecise (all)', 'MCP imprecise-only', 'MCP precise + imprecise'), pch = c(16, 21, 16, NA, NA), col = c('black', 'black', 'orange', 'black', 'coral4'), pt.bg = c(NA, 'orange', NA, NA, NA), lwd = c(NA, NA, NA, 1, 2), lty = c(NA, NA, NA, 'dotted', 'solid') )
# This is a contrived example based on red-bellied lemurs in Madagascar. # Point locations (which are real data) will be assumed to be "precise" # records. We will designate a set of Faritas ("counties") to represent # "imprecise" occurrences that can only be georeferenced to a geopolitical # unit. library(sf) library(terra) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur point data data(lemurs) precise <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') precise <- sf::st_as_sf(precise[ , ll], coords=ll, crs=wgs84) # *fake* lemur administrative unit-level data faritras <- c('Vakinankaratra', 'Haute matsiatra', 'Ihorombe', 'Vatovavy Fitovinany', 'Alaotra-Mangoro', 'Analanjirofo', 'Atsinanana', 'Analamanga', 'Itasy') data(mad1) imprecise <- mad1[mad1$NAME_2 %in% faritras, ] # climate predictors rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') rasts <- rast(rastFile) ### Plot environment of points and environments of each polygon closest to ### centroid of environments of points. In this example, we use the first two ### principal component axes to characterize the niche. # apply Nearest Environmental Point method envPtsPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, pca = TRUE, numPcs = 2) envPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2, out = 'polys') envPts <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2, out = 'pts') allPolyEnvs <- extract(rasts, imprecise) # plot occurrences in environmental space plot(envPtsPolys$PC1, envPtsPolys$PC2, pch=16, col='black', xlab='PC1', ylab='PC2') points(envPolys$PC1, envPolys$PC2, pch=21, bg='orange') legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)'), pch = c(16, 21), col = c('black', 'black'), pt.bg = c('orange', 'orange') ) ### compare identified environments to all environments across all polygons ########################################################################### env <- as.data.frame(rasts) pca <- stats::prcomp(env, center=TRUE, scale.=TRUE) allPolyEnvs <- extract(rasts, imprecise, ID = FALSE) allPolyEnvsPcs <- predict(pca, allPolyEnvs) allPolyEnvs <- cbind(allPolyEnvs, allPolyEnvsPcs) # plot in environmental space plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange', xlab='PC1', ylab='PC2') points(envPts$PC1, envPts$PC2, pch=16) points(envPolys$PC1, envPolys$PC2, pch=1) legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)', 'imprecise (all)'), pch = c(16, 21, 16), col = c('black', 'black', 'orange'), pt.bg = c(NA, 'orange') ) ### display niches (minimum convex hulls) estimated ### using just precise or precise + imprecise records ##################################################### pcs <- c('PC1', 'PC2') preciseIndices <- chull(envPts[ , pcs]) preciseImpreciseIndices <- chull(envPtsPolys[ , pcs]) preciseIndices <- c(preciseIndices, preciseIndices[1]) preciseImpreciseIndices <- c(preciseImpreciseIndices, preciseImpreciseIndices[1]) preciseOnlyNiche <- envPts[preciseIndices, pcs] preciseImpreciseNiche <- envPtsPolys[preciseImpreciseIndices, pcs] # plot in environmental space plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange', xlab='PC1', ylab='PC2') points(envPts$PC1, envPts$PC2, pch=16) points(envPolys$PC1, envPolys$PC2, pch=1) lines(preciseImpreciseNiche, col='coral4', lwd=2) lines(preciseOnlyNiche, lty='dotted') legend( 'bottomleft', inset = 0.01, legend = c('precise', 'imprecise (closest)', 'imprecise (all)', 'MCP imprecise-only', 'MCP precise + imprecise'), pch = c(16, 21, 16, NA, NA), col = c('black', 'black', 'orange', 'black', 'coral4'), pt.bg = c(NA, 'orange', NA, NA, NA), lwd = c(NA, NA, NA, 1, 2), lty = c(NA, NA, NA, 'dotted', 'solid') )
This function implements the "nearest geographic point" method (Smith et al. 2023) to enable the use of occurrence records geolocated only to a general place (e.g., a country or province), along with occurrences georeferenced with little error. The function returns a minimum convex polygon (MCP) constructed from a set of spatial polygons and/or points.
nearestGeogPoints( pts = NULL, polys = NULL, centerFrom = "pts", return = "mcp", terra = TRUE )
nearestGeogPoints( pts = NULL, polys = NULL, centerFrom = "pts", return = "mcp", terra = TRUE )
pts |
Either |
polys |
Either |
centerFrom |
Indicates how to locate the "reference" centroid used to identify points on each polygon. This is only relevant if both
|
return |
Determines what is returned:
|
terra |
If |
This function constructs a minimum convex polygon (MCP) from a set of spatial points and/or spatial polygons. The manner in which this is done depends on whether polys
and/or pts
are specified:
Only pts
is supplied: The MCP is constructed directly from the points.
Only polys
is supplied: The MCP is constructed from the point on each polygon closest to the centroid of the centroids of the polygons.
Both pts
and polys
are supplied: The MCP is constructed from the combined set of pts
and from the point on each polygon closest to the centroid of pts
. By default, the function uses the centroid of the precise occurrences in step (1), but this can be changed to the centroid of the centroids of the polygons or the centroid of the points defined by the union of precise occurrence points plus the centroids of the polygons.
The function can alternatively return the points on the vertices of the MCP, or points on the input polygons closest to the reference centroid.
SpatVector
, or sf POLYGON
representing a minimum convex polygon.
Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography 32:342-355. doi:10.1111/geb.13628 Open access pre-print: doi:10.1101/2021.06.10.447988.
nearestEnvPoints
for the "nearest environmental point" method, a related application for estimating niche breadth in environmental space.
library(sf) library(terra) ####################################################### ### example using SpatVector inputs (terra package) ### ####################################################### ### prepare data ################ # Get coordinate reference systems: # * WGS84 # * Tananarive (Paris) / Laborde Grid - EPSG:29701 wgs84 <- getCRS('WGS84') madProj <- getCRS('Madagascar Albers') # outline of Madagascar faritras data(mad1) mad1 <- vect(mad1) mad1 <- project(mad1, madProj) # lemur point data data(lemurs) redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') redBelly <- vect(redBelly, geom=ll, crs=wgs84) redBelly <- project(redBelly, madProj) # *fake* lemur farita-level data faritras <- c('Toamasina', 'Atsimo-Atsinana', 'Amoron\'i mania', 'Sava', 'Itasy') polys <- mad1[mad1$NAME_2 %in% faritras, ] ### apply Nearest Geographic Point method ######################################### # get three kinds of minimum convex polygons (MCPs): # MCP using just polygons mcpPolys <- nearestGeogPoints(polys = polys) # MCP using just points mcpPts <- nearestGeogPoints(pts = redBelly) # MCP using points & polys mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys) # compare extent of occurrence (EOO) in m2 expanse(mcpPolys) expanse(mcpPts) expanse(mcpPolysPoints) ### plot minimum convex polygons ################################ # MCP from precise occurrences only plot(mad1, border='gray', main='MCP points only') plot(polys, col='gray80', add=TRUE) plot(mcpPts, col=scales::alpha('red', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('red', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from imprecise occurrences only plot(mad1, border='gray', main='MCP polys only') plot(polys, col='gray80', add=TRUE) plot(mcpPolys, col=scales::alpha('orange', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('orange', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from precise and imprecise occurrences plot(mad1, border='gray', main='MCP polys + points') plot(polys, col='gray80', add=TRUE) plot(mcpPolysPoints, col=scales::alpha('green', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('green', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) ############################################ ### example using sf inputs (sf package) ### ############################################ ### prepare data ################ # Get coordinate reference systems: # * WGS84 # * Tananarive (Paris) / Laborde Grid - EPSG:29701 madProj <- sf::st_crs(getCRS('Madagascar Albers')) wgs84 <- getCRS('WGS84') # outline of Madagascar faritras data(mad1) mad1 <- sf::st_transform(mad1, madProj) # lemur point occurrence data data(lemurs) redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') redBelly <- sf::st_as_sf(redBelly[ , ll], crs=wgs84, coords=ll) redBelly <- sf::st_transform(redBelly, madProj) # *fake* farita-level occurrences faritras <- c('Toamasina', 'Atsimo-Atsinana', 'Amoron\'i mania', 'Sava', 'Itasy') polys <- mad1[mad1$NAME_2 %in% faritras, ] ### apply Nearest Geographic Point method ######################################### # get three kinds of minimum convex polygons (MCPs): # MCP using just polygons mcpPolys <- nearestGeogPoints(polys = polys, terra = FALSE) # MCP using just points mcpPts <- nearestGeogPoints(pts = redBelly, terra = FALSE) # MCP using points & polys mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys, terra = FALSE) # extent of occurrence (EOO) in m2 sf::st_area(mcpPolys) sf::st_area(mcpPts) sf::st_area(mcpPolysPoints) ### plot minimum convex polygons ################################ # MCP from precise occurrences only plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPts), col=scales::alpha('red', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('red', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from imprecise occurrences only plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPolys), col=scales::alpha('orange', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('orange', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from precise and imprecise occurrences plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPolysPoints), col=scales::alpha('green', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('green', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) ### NOTE # Using SpatVector input (terra package) yields EOOs that are slightly # larger than using Spatial* (sp) or sf (sf) objects (by about 0.03-0.07% # in this example). The difference arises because terra::expanse() yields a # different value than sf::st_area.
library(sf) library(terra) ####################################################### ### example using SpatVector inputs (terra package) ### ####################################################### ### prepare data ################ # Get coordinate reference systems: # * WGS84 # * Tananarive (Paris) / Laborde Grid - EPSG:29701 wgs84 <- getCRS('WGS84') madProj <- getCRS('Madagascar Albers') # outline of Madagascar faritras data(mad1) mad1 <- vect(mad1) mad1 <- project(mad1, madProj) # lemur point data data(lemurs) redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') redBelly <- vect(redBelly, geom=ll, crs=wgs84) redBelly <- project(redBelly, madProj) # *fake* lemur farita-level data faritras <- c('Toamasina', 'Atsimo-Atsinana', 'Amoron\'i mania', 'Sava', 'Itasy') polys <- mad1[mad1$NAME_2 %in% faritras, ] ### apply Nearest Geographic Point method ######################################### # get three kinds of minimum convex polygons (MCPs): # MCP using just polygons mcpPolys <- nearestGeogPoints(polys = polys) # MCP using just points mcpPts <- nearestGeogPoints(pts = redBelly) # MCP using points & polys mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys) # compare extent of occurrence (EOO) in m2 expanse(mcpPolys) expanse(mcpPts) expanse(mcpPolysPoints) ### plot minimum convex polygons ################################ # MCP from precise occurrences only plot(mad1, border='gray', main='MCP points only') plot(polys, col='gray80', add=TRUE) plot(mcpPts, col=scales::alpha('red', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('red', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from imprecise occurrences only plot(mad1, border='gray', main='MCP polys only') plot(polys, col='gray80', add=TRUE) plot(mcpPolys, col=scales::alpha('orange', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('orange', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from precise and imprecise occurrences plot(mad1, border='gray', main='MCP polys + points') plot(polys, col='gray80', add=TRUE) plot(mcpPolysPoints, col=scales::alpha('green', 0.4), add=TRUE) plot(redBelly, pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('green', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) ############################################ ### example using sf inputs (sf package) ### ############################################ ### prepare data ################ # Get coordinate reference systems: # * WGS84 # * Tananarive (Paris) / Laborde Grid - EPSG:29701 madProj <- sf::st_crs(getCRS('Madagascar Albers')) wgs84 <- getCRS('WGS84') # outline of Madagascar faritras data(mad1) mad1 <- sf::st_transform(mad1, madProj) # lemur point occurrence data data(lemurs) redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ] ll <- c('longitude', 'latitude') redBelly <- sf::st_as_sf(redBelly[ , ll], crs=wgs84, coords=ll) redBelly <- sf::st_transform(redBelly, madProj) # *fake* farita-level occurrences faritras <- c('Toamasina', 'Atsimo-Atsinana', 'Amoron\'i mania', 'Sava', 'Itasy') polys <- mad1[mad1$NAME_2 %in% faritras, ] ### apply Nearest Geographic Point method ######################################### # get three kinds of minimum convex polygons (MCPs): # MCP using just polygons mcpPolys <- nearestGeogPoints(polys = polys, terra = FALSE) # MCP using just points mcpPts <- nearestGeogPoints(pts = redBelly, terra = FALSE) # MCP using points & polys mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys, terra = FALSE) # extent of occurrence (EOO) in m2 sf::st_area(mcpPolys) sf::st_area(mcpPts) sf::st_area(mcpPolysPoints) ### plot minimum convex polygons ################################ # MCP from precise occurrences only plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPts), col=scales::alpha('red', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('red', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from imprecise occurrences only plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPolys), col=scales::alpha('orange', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('orange', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) # MCP from precise and imprecise occurrences plot(st_geometry(mad1), border='gray', main='MCP points only') plot(st_geometry(polys), col='gray80', add=TRUE) plot(st_geometry(mcpPolysPoints), col=scales::alpha('green', 0.4), add=TRUE) plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE) legend('topleft', legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'), fill=c(NA, 'gray', scales::alpha('green', 0.4)), pch=c(21, NA, NA), pt.bg=c('red', NA, NA), border=c(NA, 'black', 'black')) ### NOTE # Using SpatVector input (terra package) yields EOOs that are slightly # larger than using Spatial* (sp) or sf (sf) objects (by about 0.03-0.07% # in this example). The difference arises because terra::expanse() yields a # different value than sf::st_area.
This function calculates several metrics of niche overlap based on predictions for two species (or for the same species but different models) at the same sites.
nicheOverlapMetrics( x1, x2, method = c("meanDiff", "meanAbsDiff", "rmsd", "d", "i", "esp", "cor", "rankCor"), w = rep(1, length(x1)), na.rm = FALSE, ... )
nicheOverlapMetrics( x1, x2, method = c("meanDiff", "meanAbsDiff", "rmsd", "d", "i", "esp", "cor", "rankCor"), w = rep(1, length(x1)), na.rm = FALSE, ... )
x1 |
Numeric. Vector of predictions from a model. |
x2 |
Numeric. Vector of predictions from another model. |
method |
Character vector, indicates type of metric to calculate:
|
w |
Numeric vector. Weights of predictions in |
na.rm |
Logical. If T |
... |
Other arguments (not used). |
List object with one element per value specified by the argument in method
.
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution 62:2868-2883. doi:10.1111/j.1558-5646.2008.00482.x
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Erratum. Evolution 62:2868-2883. doi:10.1111/j.1558-5646.2010.01204.x
Godsoe, W. 2014. Inferring the similarity of species distributions using Species' Distribution Models. Ecography 37:130-136. doi:10.1111/j.1600-0587.2013.00403.x
x1 <- seq(0, 1, length.out=100) x2 <- x1^2 nicheOverlapMetrics(x1, x2)
x1 <- seq(0, 1, length.out=100) x2 <- x1^2 nicheOverlapMetrics(x1, x2)
This function creates a "rectangular" SpatVector
object with the same dimensions as a plot window. It is especially useful for cropping subsequent rasters or vector objects to the plot window. A plot must be made before calling this function.
plotExtent(x = NULL)
plotExtent(x = NULL)
x |
Either |
SpatVector
if (FALSE) { library(sf) data(mad0) plot(st_geometry(mad0)) outline <- plotExtent(mad0) plot(outline, col='cornflowerblue', lty='dotted') plot(st_geometry(mad0), add=TRUE) }
if (FALSE) { library(sf) data(mad0) plot(st_geometry(mad0)) outline <- plotExtent(mad0) plot(outline, col='cornflowerblue', lty='dotted') plot(st_geometry(mad0), add=TRUE) }
This is a generic predict function that automatically uses the model common arguments for predicting models of the following types: linear models, generalized linear models (GLMs), generalized additive models (GAMs), random forests, boosted regression trees (BRTs)/gradient boosting machines (GBMs), conditional random forests, MaxEnt, and more.
predictEnmSdm( model, newdata, maxentFun = "terra", scale = TRUE, cores = 1, nrows = nrow(newdata), paths = .libPaths(), ... )
predictEnmSdm( model, newdata, maxentFun = "terra", scale = TRUE, cores = 1, nrows = nrow(newdata), paths = .libPaths(), ... )
model |
Object of class |
newdata |
Data frame or matrix, or |
maxentFun |
This argument is only used if the |
scale |
Logical. If the model is a GLM trained with |
cores |
Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. This is forced to 1 if |
nrows |
Number of rows of |
paths |
Locations where packages are stored. This is typically not useful to the general user, and is only supplied for when the function is called as a functional. |
... |
Arguments to pass to the algorithm-specific |
Numeric or SpatRaster
.
predict
from the stats package, predict
from the terra package, predictMaxEnt
, predictMaxNet
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
Takes a MaxEnt lambda
object or a MaxEnt object and returns raw or logistic predictions. Its output is the same as the predict
function from the terra package, and in fact, is slower than the function from terra. However, this function does allow custom manipulations that those functions do not allow (e.g., permuting product features while leaving other features with the same variables intact). This function does not clamp predictions–beyond the range of the training data, it extends the prediction in the direction it was going (up/down/no change). The function is based on Peter D. Wilson's document "Guidelines for computing MaxEnt model output values from a lambdas file". The function has a special feature in that it allows you to permute single variables or combinations of variables in specific features before making predictions. This is potentially useful, for example, if you wanted to determine the relative importance of a quadratic feature for a particular variable in a Maxent model relative to the other features in the model. You can also permute values of a variable regardless of which features they appear in. For product features, you can implement the permutation before or after the values are multiplied together (before often makes for bigger differences in predictions).
predictMaxEnt( x, data, type = "cloglog", perm = NULL, permLinear = NULL, permQuad = NULL, permHinge = NULL, permThresh = NULL, permProd = NULL, permProdRule = NULL, ... )
predictMaxEnt( x, data, type = "cloglog", perm = NULL, permLinear = NULL, permQuad = NULL, permHinge = NULL, permThresh = NULL, permProd = NULL, permProdRule = NULL, ... )
x |
Either a Maxent lambda object or a Maxent model object |
data |
Data frame. Data to which to make predictions |
type |
Character. One of:
|
perm |
Character vector. Name(s) of variable to permute before calculating predictions. This permutes the variables for all features in which they occur. If a variable is named here, it overrides permutation settings for each feature featType. Note that for product features the variable is permuted before the product is taken. This permutation is performed before any subsequent permutations (i.e., so if both variables in a product feature are included in |
permLinear |
Character vector. Names(s) of variables to permute in linear features before calculating predictions. Ignored if |
permQuad |
Names(s) of variables to permute in quadratic features before calculating predictions. Ignored if |
permHinge |
Character vector. Names(s) of variables to permute in forward/reverse hinge features before calculating predictions. Ignored if |
permThresh |
Character vector. Names(s) of variables to permute in threshold features before calculating predictions. Ignored if |
permProd |
Character list. A list object of |
permProdRule |
Character. Rule for how permutation of product features is applied: |
... |
Extra arguments (not used). |
Numeric.
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function is the same as the predict
function in the maxnet package, except that:
If the input is a data frame, the output is a vector as output (not a single-column matrix);
If the input is a SpatRaster
, the output is a SpatRaster
;
The default output is on the cloglog scale;
The function can be explicitly called (versus doing, say, maxnet:::predict.maxnet
, which does not work even when that would be really useful...).
predictMaxNet(model, newdata, clamp = TRUE, type = "cloglog", ...)
predictMaxNet(model, newdata, clamp = TRUE, type = "cloglog", ...)
model |
Object of class |
newdata |
Object of class |
clamp |
If |
type |
One of:
|
... |
Other arguments (unused). |
Numeric vector or SpatRaster
predict
from the terra package, and maxnet
(see the predict
function therein)
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function returns coordinates randomly located on a raster where cells can be sampled with replacement (if desired) and where the probability of selection is proportionate to the cell value, cell area, or the product of cell value times cell area.
sampleRast(x, n, adjArea = TRUE, replace = TRUE, prob = TRUE)
sampleRast(x, n, adjArea = TRUE, replace = TRUE, prob = TRUE)
x |
|
n |
Positive integer. Number of points to draw. |
adjArea |
If |
replace |
If |
prob |
If |
2-column matrix with longitude and latitude of random points.
library(terra) r <- rast() nc <- ncell(r) r[] <- 1:nc rands1 <- sampleRast(r, 10000) rands2 <- sampleRast(r, 10000, adjArea=FALSE) rands3 <- sampleRast(r, 10000, prob=FALSE) rands4 <- sampleRast(r, 10000, adjArea=FALSE, prob=FALSE) oldPar <- par(mfrow=c(2, 2)) plot(r, main='adjArea = TRUE & prob = TRUE') points(rands1, pch='.') plot(r, main='adjArea = FALSE & prob = TRUE') points(rands2, pch='.') plot(r, main='adjArea = TRUE & prob = FALSE') points(rands3, pch='.') plot(r, main='adjArea = FALSE & prob = FALSE') points(rands4, pch='.') par(oldPar)
library(terra) r <- rast() nc <- ncell(r) r[] <- 1:nc rands1 <- sampleRast(r, 10000) rands2 <- sampleRast(r, 10000, adjArea=FALSE) rands3 <- sampleRast(r, 10000, prob=FALSE) rands4 <- sampleRast(r, 10000, adjArea=FALSE, prob=FALSE) oldPar <- par(mfrow=c(2, 2)) plot(r, main='adjArea = TRUE & prob = TRUE') points(rands1, pch='.') plot(r, main='adjArea = FALSE & prob = TRUE') points(rands2, pch='.') plot(r, main='adjArea = TRUE & prob = FALSE') points(rands3, pch='.') plot(r, main='adjArea = FALSE & prob = FALSE') points(rands4, pch='.') par(oldPar)
This function converts a SpatVector
object from the terra package to a Spatial
object of the appropriate class (SpatialPoints
, SpatialPointsDataFrame
, SpatialPolygons
, or SpatialPolygonsDataFrame
) from the sp package. Note that sp is to be retired in 2023, so this function is to be come useful only for legacy applications.
spatVectorToSpatial(x)
spatVectorToSpatial(x)
x |
|
Object of class Spatial
.
library(terra) f <- system.file('ex/lux.shp', package='terra') v <- vect(f) spat <- spatVectorToSpatial(v) class(spat)
library(terra) f <- system.file('ex/lux.shp', package='terra') v <- vect(f) spat <- spatVectorToSpatial(v) class(spat)
This function creates a raster from an object with an extent (i.e., another raster or similar spatial object) with square cells. The user can specify cell resolution (linear dimension) or the approximate number of cells desired.
squareCellRast(x, numCells = NULL, res = NULL, vals = NULL)
squareCellRast(x, numCells = NULL, res = NULL, vals = NULL)
x |
An object with a spatial extent property (e.g., a |
numCells |
Positive integer, approximate number of cells desired. If this is specified, then |
res |
Positive numeric. Size of a cell in the units of the projection of |
vals |
Numeric, value to assign to cells. Note that if this is shorter than the number of cells in the output, then values will be recycled. If longer, then values will be truncated. The default is to assign all 0s. |
SpatRaster
object. The raster will have an extent of the same size or larger than the extent of x
.
library(sf) library(terra) # project outline of Madagascar to equal-area: data(mad0) mad0Ea <- st_transform(mad0, getCRS('madAlbers')) n <- 101 cellSize_meters <- 10E4 byNumCells <- squareCellRast(mad0Ea, numCells=n) byCellSize <- squareCellRast(mad0Ea, res=cellSize_meters) oldPar <- par(mfrow=c(1, 2)) main1 <- paste0('Cells: ', n, ' desired, ', ncell(byNumCells), ' actual') plot(byNumCells, main = main1) plot(mad0Ea, add = TRUE) main2 <- paste0('Cells ', cellSize_meters, ' m on a side') plot(byCellSize, main = main2) plot(mad0Ea, add = TRUE) par(oldPar) # Note that in this example they look the same, but the one on the left # has one less row than the one on the right.
library(sf) library(terra) # project outline of Madagascar to equal-area: data(mad0) mad0Ea <- st_transform(mad0, getCRS('madAlbers')) n <- 101 cellSize_meters <- 10E4 byNumCells <- squareCellRast(mad0Ea, numCells=n) byCellSize <- squareCellRast(mad0Ea, res=cellSize_meters) oldPar <- par(mfrow=c(1, 2)) main1 <- paste0('Cells: ', n, ' desired, ', ncell(byNumCells), ' actual') plot(byNumCells, main = main1) plot(mad0Ea, add = TRUE) main2 <- paste0('Cells ', cellSize_meters, ' m on a side') plot(byCellSize, main = main2) plot(mad0Ea, add = TRUE) par(oldPar) # Note that in this example they look the same, but the one on the left # has one less row than the one on the right.
This function summarizes models calibrated using the trainByCrossValid
function. It returns aspects of the best models across k-folds (the particular aspects depends on the kind of models used).
summaryByCrossValid( x, metric = "cbiTest", decreasing = TRUE, interceptOnly = TRUE )
summaryByCrossValid( x, metric = "cbiTest", decreasing = TRUE, interceptOnly = TRUE )
x |
The output from the |
metric |
Metric by which to select the best model in each k-fold. This can be any of the columns that appear in the data frames in
|
decreasing |
Logical, if |
interceptOnly |
Logical. If |
Data frame with statistics on the best set of models across k-folds. Depending on the model algorithm, this could be:
BRTs (boosted regression trees): Learning rate, tree complexity, and bag fraction.
GLMs (generalized linear models): Frequency of use of each term in the best models.
Maxent: Frequency of times each specific combination of feature classes was used in the best models plus mean master regularization multiplier for each feature set.
NSs (natural splines): Data frame, one row per fold and one column per predictor, with values representing the maximum degrees of freedom used for each variable in the best model of each fold.
RFs (random forests): Data frame, one row per fold, with values representing the optimal value of numTrees
and mtry
(see ranger
).
trainByCrossValid
, trainBRT
, trainGAM
, trainGLM
, trainMaxEnt
, trainNS
, trainRF
# The example below show a very basic modeling workflow. It has been # designed to work fast, not produce accurate, defensible models. # The general idea is to calibrate a series of models and evaluate them # against a withheld set of data. One can then use the series of models # of the top models to better select a "final" model. ## Not run: # Running the entire set of commands can take a few minutes. This can # be sped up by increasing the number of cores used. The examples below use # one core, but you can change that argument according to your machine's # capabilities. library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') # using "vector" form of "folds" argument folds <- dismo::kfold(env, 3) # just 3 folds (for speed) ### calibrate models #################### cores <- 1 # increase this to go faster, if your computer handles it ## MaxEnt mxx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxEnt, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mxx) ## MaxNet mnx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxNet, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mnx) ## generalized linear models glx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGLM, verbose = 1, cores = cores ) # summarize GLM terms in best models summaryByCrossValid(glx) ## generalized additive models gax <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGAM, verbose = 1, cores = cores ) # summarize GAM terms in best models summaryByCrossValid(gax) ## natural splines nsx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainNS, df = 1:2, verbose = 1, cores = cores ) # summarize NS terms in best models summaryByCrossValid(nsx) ## boosted regression trees brtx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainBRT, learningRate = c(0.001, 0.0001), # too few values for reliable model(?) treeComplexity = c(2, 4), # too few values for reliable model, but fast minTrees = 1000, maxTrees = 1500, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = 1, cores = cores ) # summarize BRT parameters across best models summaryByCrossValid(brtx) ## random forests rfx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainRF, verbose = 1, cores = cores ) # summarize RF parameters in best models summaryByCrossValid(rfx) ## End(Not run)
# The example below show a very basic modeling workflow. It has been # designed to work fast, not produce accurate, defensible models. # The general idea is to calibrate a series of models and evaluate them # against a withheld set of data. One can then use the series of models # of the top models to better select a "final" model. ## Not run: # Running the entire set of commands can take a few minutes. This can # be sped up by increasing the number of cores used. The examples below use # one core, but you can change that argument according to your machine's # capabilities. library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') # using "vector" form of "folds" argument folds <- dismo::kfold(env, 3) # just 3 folds (for speed) ### calibrate models #################### cores <- 1 # increase this to go faster, if your computer handles it ## MaxEnt mxx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxEnt, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mxx) ## MaxNet mnx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxNet, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mnx) ## generalized linear models glx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGLM, verbose = 1, cores = cores ) # summarize GLM terms in best models summaryByCrossValid(glx) ## generalized additive models gax <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGAM, verbose = 1, cores = cores ) # summarize GAM terms in best models summaryByCrossValid(gax) ## natural splines nsx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainNS, df = 1:2, verbose = 1, cores = cores ) # summarize NS terms in best models summaryByCrossValid(nsx) ## boosted regression trees brtx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainBRT, learningRate = c(0.001, 0.0001), # too few values for reliable model(?) treeComplexity = c(2, 4), # too few values for reliable model, but fast minTrees = 1000, maxTrees = 1500, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = 1, cores = cores ) # summarize BRT parameters across best models summaryByCrossValid(brtx) ## random forests rfx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainRF, verbose = 1, cores = cores ) # summarize RF parameters in best models summaryByCrossValid(rfx) ## End(Not run)
This function calibrates a boosted regression tree (or gradient boosting machine) model, and is a wrapper for gbm
. The function uses a grid search to assess the best combination of learning rate, tree depth, and bag fraction based on cross-validated deviance. If a particular combination of parameters leads to an unconverged model, the script attempts again using slightly different parameters. Its output is any or all of: a table with deviance of evaluated models; all evaluated models; and/or the single model with the lowest deviance.
trainBRT( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], learningRate = c(1e-04, 0.001, 0.01), treeComplexity = c(5, 3, 1), bagFraction = 0.6, minTrees = 1000, maxTrees = 8000, tries = 5, tryBy = c("learningRate", "treeComplexity", "maxTrees", "stepSize"), w = TRUE, anyway = FALSE, family = "bernoulli", out = "model", cores = 1, verbose = FALSE, ... )
trainBRT( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], learningRate = c(1e-04, 0.001, 0.01), treeComplexity = c(5, 3, 1), bagFraction = 0.6, minTrees = 1000, maxTrees = 8000, tries = 5, tryBy = c("learningRate", "treeComplexity", "maxTrees", "stepSize"), w = TRUE, anyway = FALSE, family = "bernoulli", out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
learningRate |
Numeric. Learning rate at which model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1). |
treeComplexity |
Positive integer. Tree complexity: depth of branches in a single tree (1 to 16). |
bagFraction |
Numeric in the range [0, 1]. Bag fraction: proportion of data used for training in cross-validation (Elith et al. 2008 recommend 0.5 to 0.7). |
minTrees |
Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000. |
maxTrees |
Positive integer. Maximum number of trees in model set. |
tries |
Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function will stop training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step. So if non-convergence occurs then the function automatically increases the number of trees in the step size until |
tryBy |
Character vector. A list that contains one or more of |
w |
Weights. Any of:
|
anyway |
Logical. If |
family |
Character. Name of error family. |
out |
Character vector. One or more values:
|
cores |
Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Additional arguments (not used). |
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of two or more of these.
Elith, J., J.R. Leathwick, & T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802-813. doi:10.1111/j.1365-2656.2008.01390.x
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function is an extension of any of the trainXYZ
functions for calibrating species distribution and ecological niche models. This function uses the trainXYZ
function to calibrate and evaluate a suite of models using cross-validation. The models are evaluated against withheld data to determine the optimal settings for a "final" model using all available data. The function returns a set of models and/or a table with statistics on each model. The statistics represent various measures of model accuracy, and are calculated against training and test sites (separately).
trainByCrossValid( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], folds = predicts::folds(data), trainFx = enmSdmX::trainGLM, ..., weightEvalTrain = TRUE, weightEvalTest = TRUE, na.rm = FALSE, outputModels = TRUE, verbose = 0 )
trainByCrossValid( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], folds = predicts::folds(data), trainFx = enmSdmX::trainGLM, ..., weightEvalTrain = TRUE, weightEvalTest = TRUE, na.rm = FALSE, outputModels = TRUE, verbose = 0 )
data |
Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites. |
resp |
Character or integer. Name or column index of response variable. Default is to use the first column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in |
folds |
Either a numeric vector, or matrix or data frame which specify which rows in
|
trainFx |
Function, name of the |
... |
Arguments to pass to the "trainXYZ" function. |
weightEvalTrain |
Logical, if |
weightEvalTest |
Logical, if |
na.rm |
Logical, if |
outputModels |
If |
verbose |
Numeric. If 0 show no progress updates. If > 0 then show minimal progress updates for this function only. If > 1 show detailed progress for this function. If > 2 show detailed progress plus detailed progress for the |
In some cases models do not converge (e.g., boosted regression trees and generalized additive models sometimes suffer from this issue). In this case the model will be skipped, but a data frame with the k-fold and model number in the fold will be returned in the $meta element in the output. If no models converged, then this data frame will be empty.
A list object with several named elements:
meta
: Meta-data on the model call.
folds
: The folds
object.
models
(if outputModels
is TRUE
): A list of model objects, one per data fold.
tuning
: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of:
'logLoss'
: Log loss. Higher (less negative) values imply better fit.
'cbi'
: Continuous Boyce Index (CBI). Calculated with evalContBoyce
.
'auc'
: Area under the receiver-operator characteristic curve (AUC). Calculated with evalAUC
.
'tss'
: Maximum value of the True Skill Statistic. Calculated with evalTSS
.
'msss'
: Sensitivity and specificity calculated at the threshold that maximizes sensitivity (true presence prediction rate) plus specificity (true absence prediction rate).
'mdss'
: Sensitivity (se) and specificity (sp) calculated at the threshold that minimizes the difference between sensitivity and specificity.
'minTrainPres'
: Sensitivity (se) and specificity (sp) at the greatest threshold at which all training presences are classified as "present".
'trainSe95'
and/or 'trainSe90'
: Sensitivity (se) and specificity (sp) at the threshold that ensures either 95 or 90 percent of all training presences are classified as "present" (training sensitivity = 0.95 or 0.9).
Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088 La Rest, K., Pinaud, D., Monestiez, P., Chadoeuf, J., and Bretagnolle, V. 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation. Global Ecology and Biogeography 23:811-820. doi:10.1111/geb.12161 Radosavljevic, A. and Anderson, R.P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography 41:629-643. doi:10.1111/jbi.12227
summaryByCrossValid
, trainBRT
, trainGAM
, trainGLM
, trainMaxEnt
, trainMaxNet
, trainNS
, trainRF
# The example below show a very basic modeling workflow. It has been # designed to work fast, not produce accurate, defensible models. # The general idea is to calibrate a series of models and evaluate them # against a withheld set of data. One can then use the series of models # of the top models to better select a "final" model. ## Not run: # Running the entire set of commands can take a few minutes. This can # be sped up by increasing the number of cores used. The examples below use # one core, but you can change that argument according to your machine's # capabilities. library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') # using "vector" form of "folds" argument folds <- dismo::kfold(env, 3) # just 3 folds (for speed) ### calibrate models #################### cores <- 1 # increase this to go faster, if your computer handles it ## MaxEnt mxx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxEnt, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mxx) ## MaxNet mnx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxNet, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mnx) ## generalized linear models glx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGLM, verbose = 1, cores = cores ) # summarize GLM terms in best models summaryByCrossValid(glx) ## generalized additive models gax <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGAM, verbose = 1, cores = cores ) # summarize GAM terms in best models summaryByCrossValid(gax) ## natural splines nsx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainNS, df = 1:2, verbose = 1, cores = cores ) # summarize NS terms in best models summaryByCrossValid(nsx) ## boosted regression trees brtx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainBRT, learningRate = c(0.001, 0.0001), # too few values for reliable model(?) treeComplexity = c(2, 4), # too few values for reliable model, but fast minTrees = 1000, maxTrees = 1500, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = 1, cores = cores ) # summarize BRT parameters across best models summaryByCrossValid(brtx) ## random forests rfx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainRF, verbose = 1, cores = cores ) # summarize RF parameters in best models summaryByCrossValid(rfx) ## End(Not run)
# The example below show a very basic modeling workflow. It has been # designed to work fast, not produce accurate, defensible models. # The general idea is to calibrate a series of models and evaluate them # against a withheld set of data. One can then use the series of models # of the top models to better select a "final" model. ## Not run: # Running the entire set of commands can take a few minutes. This can # be sped up by increasing the number of cores used. The examples below use # one core, but you can change that argument according to your machine's # capabilities. library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create background sites (using just 1000 to speed things up!) bgEnv <- terra::spatSample(madClim, 3000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') # using "vector" form of "folds" argument folds <- dismo::kfold(env, 3) # just 3 folds (for speed) ### calibrate models #################### cores <- 1 # increase this to go faster, if your computer handles it ## MaxEnt mxx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxEnt, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mxx) ## MaxNet mnx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainMaxNet, regMult = 1:2, # too few values for valid model, but fast! verbose = 1, cores = cores ) # summarize MaxEnt feature sets and regularization across folds summaryByCrossValid(mnx) ## generalized linear models glx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGLM, verbose = 1, cores = cores ) # summarize GLM terms in best models summaryByCrossValid(glx) ## generalized additive models gax <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainGAM, verbose = 1, cores = cores ) # summarize GAM terms in best models summaryByCrossValid(gax) ## natural splines nsx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainNS, df = 1:2, verbose = 1, cores = cores ) # summarize NS terms in best models summaryByCrossValid(nsx) ## boosted regression trees brtx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainBRT, learningRate = c(0.001, 0.0001), # too few values for reliable model(?) treeComplexity = c(2, 4), # too few values for reliable model, but fast minTrees = 1000, maxTrees = 1500, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = 1, cores = cores ) # summarize BRT parameters across best models summaryByCrossValid(brtx) ## random forests rfx <- trainByCrossValid( data = env, resp = 'presBg', preds = c('bio1', 'bio12'), folds = folds, trainFx = trainRF, verbose = 1, cores = cores ) # summarize RF parameters in best models summaryByCrossValid(rfx) ## End(Not run)
This function calibrates a set of "ensembles of small models" (ESM), which are designed for modeling species with few occurrence records. In the original formulation, each model has two covariates interacting additively. Models are calibrated using all possible combinations of covariates. By default, this function does the same, but can also include univariate models, models with two covariates plus their interaction term, and models with quadratic and corresponding linear terms. This function will only train generalized linear models. Extending the types of algorithms is planned!
trainESM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], univariate = FALSE, quadratic = FALSE, interaction = FALSE, interceptOnly = FALSE, method = "glm.fit", scale = NA, w = TRUE, family = stats::binomial(), ..., verbose = FALSE )
trainESM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], univariate = FALSE, quadratic = FALSE, interaction = FALSE, interceptOnly = FALSE, method = "glm.fit", scale = NA, w = TRUE, family = stats::binomial(), ..., verbose = FALSE )
data |
Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites. |
resp |
Character or integer. Name or column index of response variable. Default is to use the first column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in |
univariate , quadratic , interaction
|
|
interceptOnly |
If |
method |
Character: Name of function used to solve the GLM. For "normal" GLMs, this can be |
scale |
Either |
w |
Weights. Any of:
|
family |
Character or function. Name of family for data error structure (see |
... |
Arguments to pass to |
verbose |
Logical. If |
A list object with several named elements:
models
: A list with each ESM model.
tuning
: A data.frame
with one row per model, in the order as they appear in $models
.
Breiner, F.T., Guisan, A., Bergamini, A., and Nobis, M.P. 2015. Overcoming limitations of modeling rare species by using ensembles of small models. Methods in Ecology and Evolution 6:1210-1218.. doi:10.1111/2041-210X.12403 Lomba, A., L. Pellissier, C. Randin, J. Vicente, J. Horondo, and A. Guisan. 2010. Overcoming the rare species modeling complex: A novel hierarchical framework applied to an Iberian endemic plant. Biological Conservation 143:2647-2657. doi:10.1016/j.biocon.2010.07.007
trainBRT
, trainGAM
, trainGLM
, trainMaxEnt
, trainMaxNet
, trainNS
, trainRF
, trainByCrossValid
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # "traditional" ESMs with just 2 linear predictors # just one model in this case because we have just 2 predictors esm1 <- trainESM( data = env, resp = 'presBg', preds = predictors, family = stats::binomial(), scale = TRUE, w = TRUE ) str(esm1, 1) esm1$tuning # extended ESM with other kinds of terms esm2 <- trainESM( data = env, resp = 'presBg', preds = predictors, univariate = TRUE, quadratic = TRUE, interaction = TRUE, interceptOnly = TRUE, family = stats::binomial(), scale = TRUE, w = TRUE, verbose = TRUE ) str(esm2, 1) esm2$tuning ### make a set of predictions to rasters ######################################## # center environmental rasters and divide by their SD madClimScaled <- scale(madClim, center = esm2$scale$mean, scale = esm2$scale$sd) # make one raster per model predictions <- list() for (i in 1:length(esm2$models)) { predictions[[i]] <- predict(madClimScaled, esm2$models[[i]], type = 'response') } # combine into a "stack" predictions <- do.call(c, predictions) names(predictions) <- esm2$tuning$model plot(predictions) # calculate (unweighted) mean prediction <- mean(predictions) plot(prediction) plot(occs, pch = 1, add = TRUE)
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # "traditional" ESMs with just 2 linear predictors # just one model in this case because we have just 2 predictors esm1 <- trainESM( data = env, resp = 'presBg', preds = predictors, family = stats::binomial(), scale = TRUE, w = TRUE ) str(esm1, 1) esm1$tuning # extended ESM with other kinds of terms esm2 <- trainESM( data = env, resp = 'presBg', preds = predictors, univariate = TRUE, quadratic = TRUE, interaction = TRUE, interceptOnly = TRUE, family = stats::binomial(), scale = TRUE, w = TRUE, verbose = TRUE ) str(esm2, 1) esm2$tuning ### make a set of predictions to rasters ######################################## # center environmental rasters and divide by their SD madClimScaled <- scale(madClim, center = esm2$scale$mean, scale = esm2$scale$sd) # make one raster per model predictions <- list() for (i in 1:length(esm2$models)) { predictions[[i]] <- predict(madClimScaled, esm2$models[[i]], type = 'response') } # combine into a "stack" predictions <- do.call(c, predictions) names(predictions) <- esm2$tuning$model plot(predictions) # calculate (unweighted) mean prediction <- mean(predictions) plot(prediction) plot(occs, pch = 1, add = TRUE)
This function constructs a generalized additive model. By default, the model is constructed in a two-stage process. First, the "construct" phase generates a series of simple models with univariate and bivariate interaction terms. These simple models are then ranked based on their AICc. Second, the "select" phase creates a "full" model from the simple models such that there is at least presPerTermInitial
presences (if the response is binary) or data rows (if not) for each smooth term to be estimated (not counting the intercept). Finally, it selects the best model using AICc from all possible subsets of this "full" model. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
trainGAM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], gamma = 1, scale = 0, smoothingBasis = "cs", interaction = "te", interceptOnly = TRUE, construct = TRUE, select = TRUE, presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = "binomial", out = "model", cores = 1, verbose = FALSE, ... )
trainGAM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], gamma = 1, scale = 0, smoothingBasis = "cs", interaction = "te", interceptOnly = TRUE, construct = TRUE, select = TRUE, presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = "binomial", out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
gamma |
Initial penalty to degrees of freedom to use (larger ==> smoother fits). |
scale |
A numeric value indicating the "scale" parameter (see argument |
smoothingBasis |
Character. Indicates the type of smoothing basis. The default is |
interaction |
Character or |
interceptOnly |
If |
construct |
If |
select |
If |
presPerTermInitial |
Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if |
presPerTermFinal |
Positive integer. Minimum number of presence sites per term in initial starting model; used only if |
maxTerms |
Maximum number of terms to be used in any model, not including the intercept (default is 8). Used only if |
w |
Weights. Any of:
|
family |
Name of family for data error structure (see |
out |
Character vector. One or more values:
|
cores |
Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Extra arguments (not used). |
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these.
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function constructs a generalized linear model. By default, the model is constructed in a two-stage process. First, the "construct" phase generates a series of simple models with univariate, quadratic, or 2-way-interaction terms. These simple models are then ranked based on their AICc. Second, the "select" phase creates a "full" model from the simple models such that there is at least presPerTermInitial
presences (if the response is binary) or data rows (if not) for each coefficient to be estimated (not counting the intercept). Finally, it selects the best model using AICc from all possible subsets of this "full" model, while respecting marginality (i.e., all lower-order terms of higher-order terms appear in the model).
The function outputs any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
trainGLM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], scale = NA, construct = TRUE, select = TRUE, quadratic = TRUE, interaction = TRUE, interceptOnly = TRUE, method = "glm.fit", presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = stats::binomial(), removeInvalid = TRUE, failIfNoValid = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
trainGLM( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], scale = NA, construct = TRUE, select = TRUE, quadratic = TRUE, interaction = TRUE, interceptOnly = TRUE, method = "glm.fit", presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = stats::binomial(), removeInvalid = TRUE, failIfNoValid = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
scale |
Either |
construct |
Logical. If |
select |
Logical. If |
quadratic |
Logical. Used only if |
interaction |
Logical. Used only if |
interceptOnly |
If |
method |
Character: Name of function used to solve the GLM. For "normal" GLMs, this can be |
presPerTermInitial |
Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only is |
presPerTermFinal |
Positive integer. Minimum number of presence sites per term in initial starting model. Used only if |
maxTerms |
Maximum number of terms to be used in any model, not including the intercept (default is 8). Used only if |
w |
Weights. Any of:
|
family |
Name of family for data error structure (see |
removeInvalid |
Logical. If |
failIfNoValid |
Logical. If |
out |
Character vector. One or more values:
|
cores |
Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Arguments to pass to |
This function is designed to find the most parsimonious model given the amount of calibration data that is available to it. 'trainGLM()' can work with any data, but has been designed to work specifically as a species distribution model where the response is either binary (default) or abundance. Specifically, it 1) identifies the most parsimonious model (lowest AICc) with 2) optimal flexibility (optimal degrees of freedom in splines) and 3) allows for (but does not require) interaction terms between predictors (if desired). If the defaults are used, the following procedure is applied:
Constructing a set of simple model terms, each with 1 to 4 degrees of freedom. Terms can be univariate or bilabiate (two-way interactions). Predictors can be continuous or factors. If any simple models has convergence issues or boundary issues (coefficients that approach negative or positive infinity), it is removed.
Constructing a series of models, each with one of the terms, then using the models to rank terms by AICc.
From the top set of terms, creating a "full" model. The full model will ensure the maximum number of terms is <= 'maxTerms', and that for each term, there are at least 'presPerTermFinal' data points.
All possible submodels, plus the full model, are evaluated and ranked by AICc. If a model has convergence or boundary issues, it is removed from the set. The most parsimonious model (lowest AICc) is returned.
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these. If scale
is TRUE
, any model object will also have an element named $scale
, which contains the means and standard deviations for predictors that are not factors. The data frame reports the AICc for all of the models evaluated, sorted by best to worst. The converged
column indicates whether the model converged ("TRUE
" is good), and the boundary
column whether the model parameters are near the boundary (usually, negative or positive infinity; "FALSE
" is good).
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function calculates the "best" MaxEnt model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better).
trainMaxEnt( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10), classes = "default", testClasses = TRUE, dropOverparam = TRUE, anyway = TRUE, forceLinear = TRUE, jackknife = FALSE, arguments = NULL, scratchDir = NULL, out = "model", cores = 1, verbose = FALSE, ... )
trainMaxEnt( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10), classes = "default", testClasses = TRUE, dropOverparam = TRUE, anyway = TRUE, forceLinear = TRUE, jackknife = FALSE, arguments = NULL, scratchDir = NULL, out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
regMult |
Numeric vector. Values of the master regularization parameters (called |
classes |
Character vector. Names of feature classes to use (either |
testClasses |
Logical. If |
dropOverparam |
Logical, if |
anyway |
Logical. Same as |
forceLinear |
Logical. If |
jackknife |
Logical. If |
arguments |
|
scratchDir |
Character. Directory to which to write temporary files. Leave as NULL to create a temporary folder in the current working directory. |
out |
Character vector. One or more values:
|
cores |
Number of cores to use. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Extra arguments. Not used. |
The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the maxent
jar file (see Details). Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data.
This function is a wrapper for MaxEnt()
. The MaxEnt
function creates a series of files on disk for each model. This function assumes you do not want those files, so deletes most of them. However, there is one that cannot be deleted and the normal ways of changing its permissions in R do not work. So the function simply writes over that file (which is allowed) to make it smaller. Regardless, if you run many models your temporary directory (argument scratchDir
) can fill up and require manual deletion.
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these.
Warren, D.L. and S.N. Siefert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342. doi:10.1890/10-1171.1
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The "best" model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better).
trainMaxNet( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10), classes = "default", testClasses = TRUE, dropOverparam = TRUE, forceLinear = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
trainMaxNet( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10), classes = "default", testClasses = TRUE, dropOverparam = TRUE, forceLinear = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame or matrix. Contains a column indicating whether each row is a presence (1) or background (0) site, plus columns for environmental predictors. |
resp |
Character or integer. Name or column index of response variable. Default is to use the first column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in |
regMult |
Numeric vector. Values of the master regularization parameters (called |
classes |
Character vector. Names of feature classes to use (either |
testClasses |
Logical. If |
dropOverparam |
Logical, if |
forceLinear |
Logical. If |
out |
Character vector. One or more values:
|
cores |
Number of cores to use. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Extra arguments. Not used. |
The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data.
If out = 'model'
this function returns an object of class MaxEnt
. If out = 'tuning'
this function returns a data frame with tuning parameters, log likelihood, and AICc for each model tried. If out = c('model', 'tuning'
then it returns a list object with the MaxEnt
object and the data frame.
Phillips, S.J., Anderson, R.P., Dudík, M. Schapire, R.E., and Blair, M.E. 2017. Opening the black box: An open-source release of Maxent. Ecography 40:887-893. doi:10.1111/ecog.03049 Warren, D.L. and S.N. Siefert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342. doi:10.1890/10-1171.1
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function constructs a natural-spline model by evaluating all possible models given the available predictors and constraints. "Constraints" in this case include the degrees of freedom for a spline, whether or not interaction terms are included, minimum number of presence sites per model term, and maximum number of terms to include in the model. Its output is any or all of: the most parsimonious model (lowest AICc); all models evaluated; and/or a table with AICc for all evaluated models.
trainNS( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], scale = NA, df = 1:4, interaction = TRUE, interceptOnly = TRUE, method = "glm.fit", presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = "binomial", removeInvalid = TRUE, failIfNoValid = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
trainNS( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], scale = NA, df = 1:4, interaction = TRUE, interceptOnly = TRUE, method = "glm.fit", presPerTermFinal = 10, maxTerms = 8, w = TRUE, family = "binomial", removeInvalid = TRUE, failIfNoValid = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
scale |
Either |
df |
A vector of integers > 0 or |
interaction |
If |
interceptOnly |
If |
method |
Character, name of function used to solve. This can be |
presPerTermFinal |
Minimum number of presence sites per term in initial starting model. |
maxTerms |
Maximum number of terms to be used in any model, not including the intercept (default is 8). |
w |
Weights. Any of:
|
family |
Name of family for data error structure (see |
removeInvalid |
Logical. If |
failIfNoValid |
Logical. If |
out |
Character vector. One or more values:
|
cores |
Number of cores to use. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Arguments to send to |
This function is designed to find the most parsimonious model given the amount of calibration data that is available to it. 'trainNS()' can work with any data, but has been designed to work specifically as a species distribution model where the response is either binary (default) or abundance. Specifically, it 1) identifies the most parsimonious model (lowest AICc) with 2) optimal flexibility (optimal degrees of freedom in splines) and 3) allows for (but does not require) interaction terms between predictors (if desired). If the defaults are used, the following procedure is applied:
Constructing a set of simple model terms, each with 1 to 4 degrees of freedom. Terms can be univariate or bilabiate (two-way interactions). Predictors can be continuous or factors. If any simple models has convergence issues or boundary issues (coefficients that approach negative or positive infinity), it is removed.
Constructing a series of models, each with one of the terms, then using the models to rank terms by AICc.
From the top set of terms, creating a "full" model. The full model will ensure the maximum number of terms is <= 'maxTerms', and that for each term, there are at least 'presPerTermFinal' data points.
All possible submodels, plus the full model, are evaluated and ranked by AICc. If a model has convergence or boundary issues, it is removed from the set. The most parsimonious model (lowest AICc) is returned.
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these. If scale
is TRUE
, any model object will also have an element named $scale
, which contains the means and standard deviations for predictors that are not factors. The data frame reports the AICc for all of the models evaluated, sorted by best to worst. The converged
column indicates whether the model converged ("TRUE
" is good), and the boundary
column whether the model parameters are near the boundary (usually, negative or positive infinity; "FALSE
" is good).
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This function trains a random forest model. It identifies the optimal number of trees and value for mtry
(number of variables sampled as candidates at each split) using out-of-bag error (OOB). The number of trees in each candidate model is set by the user with argument numTrees
. The number of predictors to test per split, mtry
, is found by exploring a range of values. If the response (y
) is a factor, the starting value for mtry
is max(1, floor(p / 3))
, where p
is the number of predictors. If the response is not a factor, the starting value is max(1, floor(sqrt(p)))
. Values ymtryIncrement
argument until the total number of predictors is used. See ranger
for more details.
The output of the function is any or all of: a table with out-of-bag (OOB) error of evaluated models; all evaluated models; and/or the single model with the lowest OOB error.
trainRF( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], numTrees = c(250, 500, 750, 1000), mtryIncrement = 2, w = TRUE, binary = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
trainRF( data, resp = names(data)[1], preds = names(data)[2:ncol(data)], numTrees = c(250, 500, 750, 1000), mtryIncrement = 2, w = TRUE, binary = TRUE, out = "model", cores = 1, verbose = FALSE, ... )
data |
Data frame. |
resp |
Response variable. This is either the name of the column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in |
numTrees |
Vector of number of trees to grow. All possible combinations of |
mtryIncrement |
Positive integer (default is 2). Number of predictors to add to |
w |
Weights. For random forests, weights are simply used as relative probabilities of selecting a row in
|
binary |
Logical. If |
out |
Character vector. One or more values:
|
cores |
Number of cores to use. Default is 1. If you have issues when |
verbose |
Logical. If |
... |
Arguments to pass to |
The object that is returned depends on the value of the out
argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these.
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
# NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. library(mgcv) library(sf) library(terra) set.seed(123) ### setup data ############## # environmental rasters rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') madClim <- rast(rastFile) # coordinate reference system wgs84 <- getCRS('WGS84') # lemur occurrence data data(lemurs) occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) occs <- elimCellDuplicates(occs, madClim) occEnv <- extract(madClim, occs, ID = FALSE) occEnv <- occEnv[complete.cases(occEnv), ] # create 10000 background sites (or as many as raster can support) bgEnv <- terra::spatSample(madClim, 20000) bgEnv <- bgEnv[complete.cases(bgEnv), ] bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] # collate occurrences and background sites presBg <- data.frame( presBg = c( rep(1, nrow(occEnv)), rep(0, nrow(bgEnv)) ) ) env <- rbind(occEnv, bgEnv) env <- cbind(presBg, env) predictors <- c('bio1', 'bio12') ### calibrate models #################### # Note that all of the trainXYZ functions can made to go faster using the # "cores" argument (set to just 1, by default). The examples below will not # go too much faster using more cores because they are simplified, but # you can try! cores <- 1 # MaxEnt mx <- trainMaxEnt( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # MaxNet mn <- trainMaxNet( data = env, resp = 'presBg', preds = predictors, regMult = 1, # too few values for reliable model, but fast verbose = TRUE, cores = cores ) # generalized linear model (GLM) gl <- trainGLM( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors verbose = TRUE, cores = cores ) # generalized additive model (GAM) ga <- trainGAM( data = env, resp = 'presBg', preds = predictors, verbose = TRUE, cores = cores ) # natural splines ns <- trainNS( data = env, resp = 'presBg', preds = predictors, scale = TRUE, # automatic scaling of predictors df = 1:2, # too few values for reliable model(?) verbose = TRUE, cores = cores ) # boosted regression trees envSub <- env[1:1049, ] # subsetting data to run faster brt <- trainBRT( data = envSub, resp = 'presBg', preds = predictors, learningRate = 0.001, # too few values for reliable model(?) treeComplexity = c(2, 3), # too few values for reliable model, but fast minTrees = 1200, # minimum trees for reliable model(?), but fast maxTrees = 1200, # too small for reliable model(?), but fast tryBy = 'treeComplexity', anyway = TRUE, # return models that did not converge verbose = TRUE, cores = cores ) # random forests rf <- trainRF( data = env, resp = 'presBg', preds = predictors, numTrees = c(100, 500), # using at least 500 recommended, but fast! verbose = TRUE, cores = cores ) ### make maps of models ####################### # NB We do not have to scale rasters before predicting GLMs and NSs because we # used the `scale = TRUE` argument in trainGLM() and trainNS(). mxMap <- predictEnmSdm(mx, madClim) mnMap <- predictEnmSdm(mn, madClim) glMap <- predictEnmSdm(gl, madClim) gaMap <- predictEnmSdm(ga, madClim) nsMap <- predictEnmSdm(ns, madClim) brtMap <- predictEnmSdm(brt, madClim) rfMap <- predictEnmSdm(rf, madClim) maps <- c( mxMap, mnMap, glMap, gaMap, nsMap, brtMap, rfMap ) names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs') fun <- function() plot(occs, col='black', pch=3, add=TRUE) plot(maps, fun = fun, nc = 4) ### compare model responses to BIO12 (mean annual precipitation) ################################################################ # make a data frame holding all other variables at mean across occurrences, # varying only BIO12 occEnvMeans <- colMeans(occEnv, na.rm=TRUE) occEnvMeans <- rbind(occEnvMeans) occEnvMeans <- as.data.frame(occEnvMeans) climFrame <- occEnvMeans[rep(1, 100), ] rownames(climFrame) <- NULL minBio12 <- min(env$bio12) maxBio12 <- max(env$bio12) climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100) predMx <- predictEnmSdm(mx, climFrame) predMn <- predictEnmSdm(mn, climFrame) predGl <- predictEnmSdm(gl, climFrame) predGa <- predictEnmSdm(ga, climFrame) predNat <- predictEnmSdm(ns, climFrame) predBrt <- predictEnmSdm(brt, climFrame) predRf <- predictEnmSdm(rf, climFrame) plot(climFrame$bio12, predMx, xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1)) lines(climFrame$bio12, predMn, lty='solid', col='red') lines(climFrame$bio12, predGl, lty='dotted', col='blue') lines(climFrame$bio12, predGa, lty='dashed', col='green') lines(climFrame$bio12, predNat, lty=4, col='purple') lines(climFrame$bio12, predBrt, lty=5, col='orange') lines(climFrame$bio12, predRf, lty=6, col='cyan') legend( 'topleft', inset = 0.01, legend = c( 'MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF' ), lty = c(1, 1:6), col = c( 'black', 'red', 'blue', 'green', 'purple', 'orange', 'cyan' ), bg = 'white' )
This is a guide to solving issues with running functions that can use more than one core. This includes the train
XYZ functions, bioticVelocity
, and predictEnmSdm
. Each of these function has the argument cores
. By default, the value of cores
is 1, so the function will use only one core. By setting this higher, you can use more cores on your machine. However, occasionally you will run into the error:Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: object '.doSnowGlobals' not found
This means that the worker "nodes" (different instances of R
started by the function to run in parallel) cannot find the doParallel package, even if it is installed on your system.
There are several solutions to this issue. One of them may work for you, and none are inherent to enmSdmX, as far as I can tell.
Strangely enough, running R in parallel sometimes looks like you are accessing the internet to anti-virus software. So, it may block access to other instances of R. You will have to do some surgery on your anti-virus software settings to find where to change this.
R has a default directory where packages are stored on any system. If your packages are stored in a different place, worker nodes may not be able to find them if you use setwd
to change the working directory. I do not know if you have to set the working directory back to the default for your system, or if you have to change it to the folder that contains the folder where your R packages reside (for me, they are the same directory). You can see what your current working directory is using getwd
. RStudio will often change this directory automatically.
So, if you get this error, try using setwd
to set your working directory to the default one for your system, or to the folder that contains the folder that contains your packages.
I'm always game to help you track down your problems (with this package, not necessarily in general). The best way is to create an issue on GitHub.
Not responsible for damage to your computer.
This function calculates weights for points based on proximity to other points and the distance of spatial autocorrelation.
weightByDist(x, maxDist, alpha = 1)
weightByDist(x, maxDist, alpha = 1)
x |
A spatial points object of class |
maxDist |
Maximum distance beyond which a two neighboring points are assumed to have no effect on one another for calculation of weights. |
alpha |
Scaling parameter (see equations above). |
Weights can be used, for example, to account for spatial bias in the manner in which the points were observed. Weighting is calculated on the assumption that if two points fell exactly on top of one another, they should each have a weight of 1/2. If three points had the exact same coordinates, then their weights should be 1/3, and so on. Increasing distance between points should increase their weight, up to the distance at which there is no "significant" spatial autocorrelation, beyond which a point should have a weight of 1. This distance needs to be supplied by the user, as it will depend on the intended use of the weights. The distance can be calculated from "standard" metrics of spatial autocorrelation (e.g., a variogram), or on the basis of knowledge of the system (e.g., maximum dispersal distance of an organism).
For a given point , the weight is defined as
where
in which is the total number of points closer than the maximum distance (
) of point
, and
the distance between focal point
and point
.
is a weighting factor. By default, this is set to 1, but can be changed by the user to augment or diminish the effect that neighboring points have on the weight of a focal cell. When
is <1, neighboring points will reduce the weight of the focal point relative to the default, and when
is >1, they will have less effect relative to the default. When all neighboring points are at or beyond the maximum distance of spatial autocorrelation, then the focal point gets a weight
of 1. When at least neighboring one point is less than this distance away, the weight of the focal point will be >0 but <1.
A numeric vector of weights.
library(sf) # lemur occurrence data data(lemurs) wgs84 <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84) # weights maxDist <- 30000 # in meters, for this example w <- weightByDist(occs, maxDist) # plot plot(st_geometry(occs), cex=5 * w, main='point size ~ weight') plot(st_geometry(mad0), col='gainsboro', border='gray70', add=TRUE) plot(st_geometry(occs), cex=5 * w, add=TRUE)
library(sf) # lemur occurrence data data(lemurs) wgs84 <- getCRS('WGS84') occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84) # weights maxDist <- 30000 # in meters, for this example w <- weightByDist(occs, maxDist) # plot plot(st_geometry(occs), cex=5 * w, main='point size ~ weight') plot(st_geometry(mad0), col='gainsboro', border='gray70', add=TRUE) plot(st_geometry(occs), cex=5 * w, add=TRUE)