Title: | Species Distribution Modeling |
---|---|
Description: | Methods for species distribution modeling, that is, predicting the environmental similarity of any site to that of the locations of known occurrences of a species. |
Authors: | Robert J. Hijmans [cre, aut] , Steven Phillips [aut], John Leathwick [aut], Jane Elith [aut] |
Maintainer: | Robert J. Hijmans <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.3-16 |
Built: | 2024-11-25 19:30:59 UTC |
Source: | CRAN |
This package implements a few species distribution models, including an R link to the 'maxent' model, and native implementations of Bioclim and Domain. It also provides a number of functions that can assist in using Boosted Regresssion Trees.
A good place to start is the vignette, which you can access by typing vignette('sdm', 'dismo')
In addition there are a number of functions, such sampling background points, k-fold sampling, and for model evaluation (AUC) that are useful for these and for other species distribution modeling methods available in R (e.g. GLM, GAM, and RandomForest).
Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith
Distribution data for Solanum acaule (a plant species that occurs in the high Andes of Peru and Bolivia).
Downloaded from GBIF with the gbif
function. For use in the 'species distribution modeling' vignette.
data(acaule)
data(acaule)
A number of sites with presence or absence of the short-finned eel (Anguilla australis) in New Zealand, and environmental data at these sites; and gridded data of the environmental variables for the study area.
type | variable name | values | mean and range |
Reach | LocSed | weighted average of proportional cover of bed sediment | 1=mud, 2=sand, 3=fine gravel; 4=coarse gravel; 5=cobble; 6=boulder; 7=bedrock; 3.77, 1 to 7 |
Segment | SegSumT | Summer air temperature (degrees C) | 16.3, 8.9 to 19.8 |
SegTSeas | Winter air temperature (degrees C), normalised with respect to SegJanT | 0.36, -4.2 to 4.1 | |
SegLowFlow | segment low flow (m3/sec), fourth root transformed | 1.092, 1.0 to 4.09 | |
Downstream | DSDist | distance to coast (km) | 74, 0.03 to 433.4 |
DSDam | presence of known downstream obstructions, mostly dams | 0.18, 0 or 1 | |
DSMaxSlope | maximum downstream slope (degrees) | 3.1, 0 to 29.7 | |
Upstream / catchment | USAvgT | average temperature in catchment (deg C) compared to segment, normalised with respect to SegJanT | -0.38, -7.7 to 2.2 |
USRainDays | days/month with rain greater than 25 mm | 1.22, 0.21 to 3.30 | |
USSlope | average slope in the upstream catchment (degrees) | 14.3, 0 to 41.0 | |
USNative | area with indigenous forest (proportion) | 0.57, 0 to 1 | |
Fishing method | fishing method in five classes: electric, net, spot, trap & mixture | NA | |
data(Anguilla_train) data(Anguilla_test) data(Anguilla_grids)
data(Anguilla_train) data(Anguilla_test) data(Anguilla_grids)
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
The Bioclim algorithm has been extensively used for species distribution modeling. Bioclim is the classic 'climate-envelope-model'. Although it generally does not perform as good as some other modeling methods (Elith et al. 2006) and is unsuited for predicting climate change effects (Hijmans and Graham, 2006). It is still used, however, among other reasons because the algorithm is easy to understand and thus useful in teaching species distribution modeling.
The BIOCLIM algorithm computes the similarity of a location by comparing the values of environmental variables at any location to a percentile distribution of the values at known locations of occurrence ('training sites'). The closer to the 50th percentile (the median), the more suitable the location is. The tails of the distribution are not distinguished, that is, 10 percentile is treated as equivalent to 90 percentile.
In this R implementation, percentile scores are between 0 and 1, but predicted values larger than 0.5 are subtracted from 1. Then, the minimum percentile score across all the environmental variables is computed (i.e. this is like Liebig's law of the minimum, except that high values can also be limiting factors). The final value is subtracted from 1 and multiplied with 2 so that the results are between 0 and 1. The reason for this transformation is that the results become more like that of other distribution modeling methods and are thus easier to interpret. The value 1 will rarely be observed as it would require a location that has the median value of the training data for all the variables considered. The value 0 is very common as it is assigned to all cells with a value of an environmental variable that is outside the percentile distribution (the range of the training data) for at least one of the variables.
In the predict function, you can choose to ignore one of the tails of the distribution (e.g, to make low rainfall a limiting factor, but not high rainfall),
bioclim(x, p, ...)
bioclim(x, p, ...)
x |
Raster* object or matrix |
p |
two column matrix or SpatialPoints* object |
... |
Additional arguments |
An object of class 'Bioclim' (inherits from DistModel-class
)
Robert J. Hijmans
Nix, H.A., 1986. A biogeographic analysis of Australian elapid snakes. In: Atlas of Elapid Snakes of Australia. (Ed.) R. Longmore, pp. 4-15. Australian Flora and Fauna Series Number 7. Australian Government Publishing Service: Canberra.
Booth, T.H., H.A. Nix, J.R. Busby and M.F. Hutchinson, 2014. BIOCLIM: the first species distribution modelling package, its early applications and relevance to most current MAXENT studies. Diversity and Distributions 20: 1-9
Elith, J., C.H. Graham, R.P. Anderson, M. Dudik, S. Ferrier, A. Guisan, R.J. Hijmans, F. Huettmann, J. Leathwick, A. Lehmann, J. Li, L.G. Lohmann, B. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. Overton, A.T. Peterson, S. Phillips, K. Richardson, R. Scachetti-Pereira, R. Schapire, J. Soberon, S. Williams, M. Wisz and N. Zimmerman, 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29: 129-151. doi:10.1111/j.2006.0906-7590.04596.x
Hijmans R.J., and C.H. Graham, 2006. Testing the ability of climate envelope models to predict the effect of climate change on species distributions. Global change biology 12: 2272-2281. doi:10.1111/j.1365-2486.2006.01256.x
predict, maxent, domain, mahal
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359,7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155) , ncol=2) bc <- bioclim(logo, pts) #or v <- extract(logo, pts) bc <- bioclim(v) p1 <- predict(logo, bc) p2 <- predict(logo, bc, tails=c('both', 'low', 'high')) #or #sp <- SpatialPoints(pts) #bc <- bioclim(logo, pts)
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359,7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155) , ncol=2) bc <- bioclim(logo, pts) #or v <- extract(logo, pts) bc <- bioclim(v) p1 <- predict(logo, bc) p2 <- predict(logo, bc, tails=c('both', 'low', 'high')) #or #sp <- SpatialPoints(pts) #bc <- bioclim(logo, pts)
Function to create 'bioclimatic variables' from monthly climate data.
biovars(prec, tmin, tmax, ...)
biovars(prec, tmin, tmax, ...)
prec |
vector, matrix, or RasterStack/Brick of precipitation data |
tmin |
vector, matrix, or RasterStack/Brick of minimum temperature data |
tmax |
vector, matrix, or RasterStack/Brick of maximum temperature data |
... |
Additional arguments |
Input data is normaly monthly. I.e. there should be 12 values (layers) for each variable, but the function should also work for e.g. weekly data (with some changes in the meaning of the output variables. E.g. #8 would then not be for a quater (3 months), but for a 3 week period).
Depending on the class of the input data, an object of class 'vector', 'matrix' or 'RasterBrick' with 19 variables (columns, layers)
bio1 = Mean annual temperature
bio2 = Mean diurnal range (mean of max temp - min temp)
bio3 = Isothermality (bio2/bio7) (* 100)
bio4 = Temperature seasonality (standard deviation *100)
bio5 = Max temperature of warmest month
bio6 = Min temperature of coldest month
bio7 = Temperature annual range (bio5-bio6)
bio8 = Mean temperature of the wettest quarter
bio9 = Mean temperature of driest quarter
bio10 = Mean temperature of warmest quarter
bio11 = Mean temperature of coldest quarter
bio12 = Total (annual) precipitation
bio13 = Precipitation of wettest month
bio14 = Precipitation of driest month
bio15 = Precipitation seasonality (coefficient of variation)
bio16 = Precipitation of wettest quarter
bio17 = Precipitation of driest quarter
bio18 = Precipitation of warmest quarter
Robert J. Hijmans
tmin <- c(10,12,14,16,18,20,22,21,19,17,15,12) tmax <- tmin + 5 prec <- c(0,2,10,30,80,160,80,20,40,60,20,0) biovars(prec, tmin, tmax) tmn = tmx = prc = brick(nrow=1, ncol=1) tmn <- setValues(tmn, t(matrix(c(10,12,14,16,18,20,22,21,19,17,15,12)))) tmx <- tmn + 5 prc <- setValues(prc, t(matrix(c(0,2,10,30,80,160,80,20,40,60,20,0)))) b <- biovars(prc, tmn, tmx) as.matrix(b)
tmin <- c(10,12,14,16,18,20,22,21,19,17,15,12) tmax <- tmin + 5 prec <- c(0,2,10,30,80,160,80,20,40,60,20,0) biovars(prec, tmin, tmax) tmn = tmx = prc = brick(nrow=1, ncol=1) tmn <- setValues(tmn, t(matrix(c(10,12,14,16,18,20,22,21,19,17,15,12)))) tmx <- tmn + 5 prc <- setValues(prc, t(matrix(c(0,2,10,30,80,160,80,20,40,60,20,0)))) b <- biovars(prc, tmn, tmx) as.matrix(b)
Make a box plot of model evaluation data, i.e., the model predictions for known presence and absence points.
Arguments:
x
Object of class ModelEvaluation
...
Additional arguments that can be passed to boxplot
Robert J. Hijmans
Function to calculate deviance given two vectors of observed and predicted values. Requires a family argument which is set to binomial by default
calc.deviance(obs, pred, weights = rep(1,length(obs)), family="binomial", calc.mean = TRUE)
calc.deviance(obs, pred, weights = rep(1,length(obs)), family="binomial", calc.mean = TRUE)
obs |
a vector with observed values |
pred |
a vector with predicted values that correspond the the values in obs |
weights |
a vector of weight values |
family |
One of "binomial", "bernoulli", "poisson", "laplace", or "gaussian" |
calc.mean |
Logical. If |
John R. Leathwick and Jane Elith
The Circle hull model predicts that a species is present at sites inside the smallest circle that can contain a set of training points, and absent outside that circle.
circleHull(p, ...)
circleHull(p, ...)
p |
point locations (presence). Two column matrix, data.frame or SpatialPoints* object |
... |
Additional arguments. See details |
An object of class 'CircleHull' (inherits from DistModel-class
)
Robert J. Hijmans
circles, convHull, rectHull, predict
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] ch <- circleHull(train) predict(ch, test) plot(r) plot(ch, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(ch, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(ch) p
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] ch <- circleHull(train) predict(ch, test) plot(r) plot(ch, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(ch, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(ch) p
The Circles Range model predicts that a species is present at sites within a certain distance from a training point, and absent further away.
## S4 method for signature 'matrix' circles(p, d, lonlat, n=360, r=6378137, dissolve=TRUE, ...) ## S4 method for signature 'SpatialPoints' circles(p, d, lonlat, n=360, r=6378137, dissolve=TRUE, ...)
## S4 method for signature 'matrix' circles(p, d, lonlat, n=360, r=6378137, dissolve=TRUE, ...) ## S4 method for signature 'SpatialPoints' circles(p, d, lonlat, n=360, r=6378137, dissolve=TRUE, ...)
p |
point locations (presence). Two column matrix, data.frame or SpatialPoints* object |
d |
numeric. The radius of each circle in meters. A single number or a vector with elements corresponding to rows in |
lonlat |
logical. Are these longitude/latitude data? If missing this is taken from the |
n |
integer. How many vertices in the circle? Default is 360 |
r |
numeric. Radius of the earth. Only relevant for longitude/latitude data. Default is 6378137 m |
dissolve |
logical. Dissolve overlapping circles. Setting this to |
... |
additional arguments, none implemented |
An object of class 'CirclesRange' (inherits from DistModel-class
)
Robert J. Hijmans
predict, geoDist, convHull, maxent, domain,
mahal, convHull
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] cc <- circles(train, lonlat=FALSE) predict(cc, test) plot(r) plot(cc, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(cc, r, progress='') plot(r, legend=FALSE) plot(pr, add=TRUE, col='blue') points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(cc) p # to compute the Circular Area Range of a species (see Hijmans and Spooner, 2001) pts <- train*10 ca100 <- polygons(circles(pts, d=100, lonlat=FALSE)) ca150 <- polygons(circles(pts, d=150, lonlat=FALSE)) sum(area(ca150)) / (pi*150^2) sum(area(ca100)) / (pi*100^2) par(mfrow=c(1,2)) plot(ca100); points(pts) plot(ca150); points(pts)
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] cc <- circles(train, lonlat=FALSE) predict(cc, test) plot(r) plot(cc, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(cc, r, progress='') plot(r, legend=FALSE) plot(pr, add=TRUE, col='blue') points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(cc) p # to compute the Circular Area Range of a species (see Hijmans and Spooner, 2001) pts <- train*10 ca100 <- polygons(circles(pts, d=100, lonlat=FALSE)) ca150 <- polygons(circles(pts, d=150, lonlat=FALSE)) sum(area(ca150)) / (pi*150^2) sum(area(ca100)) / (pi*100^2) par(mfrow=c(1,2)) plot(ca100); points(pts) plot(ca150); points(pts)
The Convex hull model predicts that a species is present at sites inside the convex hull of a set of training points, and absent outside that hull. I.e. this is the spatial convex hull, not an environmental hull.
convHull(x, ...)
convHull(x, ...)
x |
point locations (presence). Two column matrix, data.frame or SpatialPoints* object |
... |
Additional arguments. See details |
You can supply an argument n (>= 1) to get n convex hulls around subsets of the points. You can also set n=1:x, to get a set of overlapping polygons consisting of 1 to x parts. I.e. the first polygon has 1 part, the second has 2 parts, and x has x parts.
An object of class 'ConvexHull' (inherits from DistModel-class
)
Robert J. Hijmans
predict, geoDist, maxent, domain, mahal
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] ch <- convHull(train) predict(ch, test) plot(r) plot(ch, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(ch, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(ch) p
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] ch <- convHull(train) predict(ch, test) plot(r) plot(ch, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(ch, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) # to get the polygons: p <- polygons(ch) p
Evaluate a model for intervals of distances to the nearest point in a reference dataset for presence data and a sample of the absence data selected to have a low spatial sorting bias (obtained with pwdSample).
dcEvaluate(p, a, reference, lonlat=TRUE, binsize=15, predp, preda, model, predictors, fun=predict)
dcEvaluate(p, a, reference, lonlat=TRUE, binsize=15, predp, preda, model, predictors, fun=predict)
p |
two column matrix (x, y) or (longitude/latitude) or SpatialPoints object, for point locations |
a |
two column matrix (x, y) or (longitude/latitude) or SpatialPoints object, for point locations |
reference |
as above for reference point locations to which distances are computed |
lonlat |
Logical. Use |
binsize |
postive integer. How many presence points in each distance bin? |
predp |
p |
preda |
a |
model |
m |
predictors |
pr |
fun |
function |
list with Evaluation objects
Robert J. Hijmans
Create a density plots of presence and absence data
A density plot. Presence data are in red, and absence data (if available) are in blue.
density(x, ...)
x |
Object of class 'ModelEvaluation' or of a class that inherits from 'DistModel, (such as 'MaxEnt', 'Bioclim') | |
... |
Additional arguments that can be passed to plot | |
Robert J. Hijmans
Parent class for a number of distribution models defined in the dismo package (those created by bioclim, domain, maxent and mahal
). This is a virtual Class, no objects may be direclty created from it.
presence
:presence data used
absence
:absence or background data used
hasabsence
:Logical indicating whether there is any absence data
Robert J. Hijmans
The Domain algorithm (Carpenter et al. 1993) that has been extensively used for species distribution modeling. It is included here for that reason but please note that it generally does not perform very well in model comparison (Elith et al. 2006, Hijmans and Graham, 2006). The Domain algorithm computes the Gower distance between environmental variables at any location and those at any of the known locations of occurrence ('training sites'). For each variable the minimum distance between a site and any of the training points is taken. To integrate over environmental variables, the maximum distance to any of the variables is used. This distance is subtracted from one, and (in this R implementation) values below zero are truncated so that the scores are between 0 (low) and 1 (high).
domain(x, p, ...)
domain(x, p, ...)
x |
Raster* object or matrix |
p |
two column matrix or SpatialPoints* object |
... |
Additional arguments |
An object of class 'Domain' (inherits from DistModel-class
)
Robert J. Hijmans
Carpenter G., A.N. Gillison and J. Winter, 1993. Domain: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodiversity Conservation 2:667-680.
Elith, J., C.H. Graham, R.P. Anderson, M. Dudik, S. Ferrier, A. Guisan, R.J. Hijmans, F. Huettmann, J. Leathwick, A. Lehmann, J. Li, L.G. Lohmann, B. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. Overton, A.T. Peterson, S. Phillips, K. Richardson, R. Scachetti-Pereira, R. Schapire, J. Soberon, S. Williams, M. Wisz and N. Zimmerman, 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29: 129-151. doi:10.1111/j.2006.0906-7590.04596.x
Hijmans R.J., and C.H. Graham, 2006. Testing the ability of climate envelope models to predict the effect of climate change on species distributions. Global change biology 12: 2272-2281. doi:10.1111/j.1365-2486.2006.01256.x
predict, maxent, bioclim, mahal
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359,7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2) d <- domain(logo, pts) p <- predict(d, logo)
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359,7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2) d <- domain(logo, pts) p <- predict(d, logo)
Very simple mechanistic model for plants.
ecocrop(crop, tmin, tavg, prec, rainfed=TRUE, ...) getCrop(name) data(ECOcrops)
ecocrop(crop, tmin, tavg, prec, rainfed=TRUE, ...) getCrop(name) data(ECOcrops)
crop |
An object of class 'ECOCROP', or the name of a crop as in getCrop |
tmin |
Vector of monthly minimum temperature (degrees C) |
tavg |
Vector of monthly average temperature (degrees C) |
prec |
Vector of monthly precipitation (mm) |
rainfed |
Logical. If |
... |
Additinal arguments |
name |
Name of a crop (character). If missing a data.frame with all crop names is returned |
Object of class ECOCROP
Robert J. Hijmans
ecocrop('potato', 5:16, 15:26, runif(12)*100) getCrop('Acacia brachystachya Benth.') crop <- getCrop('Hot pepper') ecocrop(crop, 5:16, 15:26, rainfed=FALSE) # with spatial data tmin = tavg = prec = brick(nrow=1, ncol=1) tmin <- setValues(tmin, t(matrix(5:16))) tavg <- tmin + 5 prec <- setValues(prec, t(matrix(15:26))) crop <- getCrop('Hot pepper') ecocrop(crop, tmin, tavg, prec, rainfed = FALSE)
ecocrop('potato', 5:16, 15:26, runif(12)*100) getCrop('Acacia brachystachya Benth.') crop <- getCrop('Hot pepper') ecocrop(crop, 5:16, 15:26, rainfed=FALSE) # with spatial data tmin = tavg = prec = brick(nrow=1, ncol=1) tmin <- setValues(tmin, t(matrix(5:16))) tavg <- tmin + 5 prec <- setValues(prec, t(matrix(15:26))) crop <- getCrop('Hot pepper') ecocrop(crop, tmin, tavg, prec, rainfed = FALSE)
Simple generic limiting factor based model, in the tradition of the PLANTGRO model (Hackett, 1991)
## S4 method for signature 'matrix,matrix' ecolim(x, y, extrapolate=TRUE, ...)
## S4 method for signature 'matrix,matrix' ecolim(x, y, extrapolate=TRUE, ...)
x |
numeric matrix with driver variables (each column has values for the variables). Values have to be in ascending order |
y |
numeric matrix with responses (between 0 and 1), one column for each column in |
extrapolate |
logical. Should the model extrapolate beyond the extremes of x? If |
... |
Additional arguments. None implemented |
Robert J. Hijmans
Hackett, C., 1991. PLANTGRO, a software package for coarse prediction of plant growth. CSIRO, Melbourne, Australia
# get predictor variables fnames <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ) env <- stack(fnames) bio1 <- c(200,250,400,450) bio12 <- c(0,1000, 3000, 4000) r1 <- c(0, 1, 1, 0) r2 <- c(0, 0, 1, 1) x <- cbind(bio1, bio12) y <- cbind(r1, r2) e <- ecolim(x, y) plot(e, lwd=2, col='red') p <- predict(e, env) plot(p) # no extrapolation: ef <- ecolim(x, y, extrapolate=FALSE) pf <- predict(ef, env) plot(pf) occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] fold <- kfold(occ, k=5) occtest <- occ[fold == 1, ] occtrain <- occ[fold != 1, ] bg <- randomPoints(env, 1000) ## Not run: # An approach to optimize the values based on # some known presences and (here random) absences # for the same species as in the maxent example # intial parameters v <- c(200, 250, 400, 450, 0, 1000, 3000, 4000) # function to be minimized f <- function(p) { x[] <- p # numbers must go up if ( any(x[-1,] < x[-nrow(x), ]) ) return(Inf) e <- ecolim(x, y) # we are minimizing, hence 1-AUC 1-evaluate(e, p=occtrain, a=bg, x=env)@auc } # patience... set.seed(0) z <- optim(v, f) x[] <- z$par eco <- ecolim(x, y) evaluate(eco, p=occtest, a=bg, x=env) set.seed(0) pwd <- pwdSample(occtest,bg,occtrain) ptest <- occtest[!is.na(pwd),] atest <- bg[na.omit(pwd),] evaluate(eco, p=ptest, a=atest, x=env) p2 <- predict(eco, env) plot(p2) ## End(Not run)
# get predictor variables fnames <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ) env <- stack(fnames) bio1 <- c(200,250,400,450) bio12 <- c(0,1000, 3000, 4000) r1 <- c(0, 1, 1, 0) r2 <- c(0, 0, 1, 1) x <- cbind(bio1, bio12) y <- cbind(r1, r2) e <- ecolim(x, y) plot(e, lwd=2, col='red') p <- predict(e, env) plot(p) # no extrapolation: ef <- ecolim(x, y, extrapolate=FALSE) pf <- predict(ef, env) plot(pf) occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] fold <- kfold(occ, k=5) occtest <- occ[fold == 1, ] occtrain <- occ[fold != 1, ] bg <- randomPoints(env, 1000) ## Not run: # An approach to optimize the values based on # some known presences and (here random) absences # for the same species as in the maxent example # intial parameters v <- c(200, 250, 400, 450, 0, 1000, 3000, 4000) # function to be minimized f <- function(p) { x[] <- p # numbers must go up if ( any(x[-1,] < x[-nrow(x), ]) ) return(Inf) e <- ecolim(x, y) # we are minimizing, hence 1-AUC 1-evaluate(e, p=occtrain, a=bg, x=env)@auc } # patience... set.seed(0) z <- optim(v, f) x[] <- z$par eco <- ecolim(x, y) evaluate(eco, p=occtest, a=bg, x=env) set.seed(0) pwd <- pwdSample(occtest,bg,occtrain) ptest <- occtest[!is.na(pwd),] atest <- bg[na.omit(pwd),] evaluate(eco, p=ptest, a=atest, x=env) p2 <- predict(eco, env) plot(p2) ## End(Not run)
Evaluation of models with presence/absence data. Given a vector of presence and a vector of absence values (or a model and presence and absence points and predictors), confusion matrices are computed (for varying thresholds), and model evaluation statistics are computed for each confusion matrix / threshold. See the description of class ModelEvaluation-class
for more info.
evaluate(p, a, model, x, tr, ...)
evaluate(p, a, model, x, tr, ...)
p |
presence points (x and y coordinates or SpatialPoints* object). Or, if Or, a matrix with values to compute predictions for |
a |
absence points (x and y coordinates or SpatialPoints* object). Or, if Or, a matrix with values to compute predictions for |
model |
any fitted model, including objects inheriting from 'DistModel'; not used when |
x |
Optional. Predictor variables (object of class Raster*). If present, |
tr |
Optional. a vector of threshold values to use for computing the confusion matrices |
... |
Additional arguments for the predict function |
An object of ModelEvaluation-class
Robert J. Hijmans
Fielding, A.H. and J.F. Bell, 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49
## See ?maxent for an example with real data. # this is a contrived example: # p has the predicted values for 50 known cases (locations) # with presence of the phenomenon (species) p <- rnorm(50, mean=0.7, sd=0.3) # a has the predicted values for 50 background locations (or absence) a <- rnorm(50, mean=0.4, sd=0.4) e <- evaluate(p=p, a=a) threshold(e) plot(e, 'ROC') plot(e, 'TPR') boxplot(e) density(e) str(e)
## See ?maxent for an example with real data. # this is a contrived example: # p has the predicted values for 50 known cases (locations) # with presence of the phenomenon (species) p <- rnorm(50, mean=0.7, sd=0.3) # a has the predicted values for 50 background locations (or absence) a <- rnorm(50, mean=0.4, sd=0.4) e <- evaluate(p=p, a=a) threshold(e) plot(e, 'ROC') plot(e, 'TPR') boxplot(e) density(e) str(e)
Preparing data for model testing with the ROCR package.
evaluateROCR(model, p, a, x)
evaluateROCR(model, p, a, x)
model |
any fitted model, including objects inheriting from 'DistModel' |
p |
presence points (x and y coordinates or SpatialPoints* object). Or, if Or, a matrix with values to compute predictions for |
a |
absence points (x and y coordinates or SpatialPoints* object). Or, if Or, a matrix with values to compute predictions for |
x |
optional. predictor variables, if present, |
An object of class "prediction" (defined in the ROCR package)
Robert J. Hijmans
Make a ROC curve, or a plot of a threshold dependent measure against threshold values
usage: plot(x, y, ...)
x |
Object of class ModelEvaluation | |
y |
Character. Either 'ROC' or a threshold dependent measure such as 'kappa', 'TPR' | |
... |
Additional arguments that can be passed to plot | |
Robert J. Hijmans
ModelEvaluation-class, density, pairs, plot
# p = the predicted value for 50 known cases (locations) with presence of the phenomenon (species) p = rnorm(50, mean=0.7, sd=0.3) # b = the predicted value for 50 known cases (locations) with absence of the phenomenon (species) a = rnorm(50, mean=0.4, sd=0.4) e = evaluate(p=p, a=a) plot(e, 'ROC') plot(e, 'kappa') plot(e, 'FPR') plot(e, 'prevalence')
# p = the predicted value for 50 known cases (locations) with presence of the phenomenon (species) p = rnorm(50, mean=0.7, sd=0.3) # b = the predicted value for 50 known cases (locations) with absence of the phenomenon (species) a = rnorm(50, mean=0.4, sd=0.4) e = evaluate(p=p, a=a) plot(e, 'ROC') plot(e, 'kappa') plot(e, 'FPR') plot(e, 'prevalence')
This function downloads species occurence records from the Global Biodiversity Information Facility (GBIF) data portal. You can download either a single species (if you append a '*' to the species name) or a subspecies of comparable level. You can download the data for an entire genus by using species='*'
. Before using this function, please first check the GBIF data use agreement and see the note below about how to cite these data.
gbif(genus, species="", ext=NULL, args=NULL, geo=TRUE, sp=FALSE, removeZeros=FALSE, download=TRUE, ntries=5, nrecs=300, start=1, end=Inf)
gbif(genus, species="", ext=NULL, args=NULL, geo=TRUE, sp=FALSE, removeZeros=FALSE, download=TRUE, ntries=5, nrecs=300, start=1, end=Inf)
genus |
character. genus name |
species |
character. species name. Use '*' to download the entire genus. Append '*' to the species name to get all naming variants (e.g. with and witout species author name) and sub-taxa |
ext |
Extent object to limit the geographic extent of the records. An extent can be created using functions like |
args |
character. Additional arguments to refine the query. See query parameters in https://www.gbif.org/developer/occurrence for more details |
geo |
logical. If |
sp |
logical. If |
removeZeros |
logical. If |
download |
logical. If |
ntries |
integer. How many times should the function attempt to download the data, if an invalid response is returned (perhaps because the GBIF server is very busy) |
nrecs |
integer. How many records to download in a single request (max is 300)? |
start |
integer. Record number from which to start requesting data |
end |
integer. Last record to request |
data frame
Under the terms of the GBIF data user agreement, users who download data agree to cite a DOI. Citation rewards data-publishing institutions and individuals and provides support for sharing open data [1][2]. You can get a DOI for the data you downloaded by creating a "derived" dataset. For this to work, you need to keep the "datasetKey" variable in your dataset.
Robert J. Hijmans
https://www.gbif.org/occurrence
## Not run: gbif('solanum', download=FALSE) gbif('solanum', 'acaule', download=FALSE) gbif('Batrachoseps', '' , down=FALSE) gbif('Batrachoseps', 'luciae', down=FALSE) g <- gbif('Batrachoseps', 'luciae', geo=TRUE) plot(g$lon, g$lat) gs <- gbif('Batrachoseps', 'luciae', sp=TRUE) plot(gs) ## End(Not run)
## Not run: gbif('solanum', download=FALSE) gbif('solanum', 'acaule', download=FALSE) gbif('Batrachoseps', '' , down=FALSE) gbif('Batrachoseps', 'luciae', down=FALSE) g <- gbif('Batrachoseps', 'luciae', geo=TRUE) plot(g$lon, g$lat) gs <- gbif('Batrachoseps', 'luciae', sp=TRUE) plot(gs) ## End(Not run)
Calculates a gradient boosting (gbm) object with a fixed number of trees. The optimal number of trees can be identified using gbm.step or some other procedure. Mostly used as a utility function, e.g., when being called by gbm.simplify. It takes as input a dataset and arguments selecting x and y variables, learning rate and tree complexity.
gbm.fixed(data, gbm.x, gbm.y, tree.complexity = 1, site.weights = rep(1, nrow(data)), verbose = TRUE, learning.rate = 0.001, n.trees = 2000, bag.fraction = 0.5, family = "bernoulli", keep.data = FALSE, var.monotone = rep(0, length(gbm.x)))
gbm.fixed(data, gbm.x, gbm.y, tree.complexity = 1, site.weights = rep(1, nrow(data)), verbose = TRUE, learning.rate = 0.001, n.trees = 2000, bag.fraction = 0.5, family = "bernoulli", keep.data = FALSE, var.monotone = rep(0, length(gbm.x)))
data |
data.frame |
gbm.x |
indices of the predictors in the input dataframe |
gbm.y |
index of the response in the input dataframe |
tree.complexity |
the tree depth - sometimes referred to as interaction depth |
site.weights |
by default set equal |
verbose |
to control reporting |
learning.rate |
controls speed of the gradient descent |
n.trees |
default number of trees |
bag.fraction |
varies random sample size for each new tree |
family |
can be any of "bernoulli", "poisson", "gaussian", or "laplace" |
keep.data |
Logical. If |
var.monotone |
constrain to positive (1) or negative monontone (-1) |
object of class gbm
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
Calculates a gradient boosting (gbm) object in which model complexity is determined using a training set with predictions made to a withheld set. An initial set of trees is fitted, and then trees are progressively added testing performance along the way, using gbm.perf until the optimal number of trees is identified.
As any structured ordering of the data should be avoided, a copy of the data set is BY DEFAULT randomly reordered each time the function is run.
gbm.holdout(data, gbm.x, gbm.y, learning.rate = 0.001, tree.complexity = 1, family = "bernoulli", n.trees = 200, add.trees = n.trees, max.trees = 20000, verbose = TRUE, train.fraction = 0.8, permute = TRUE, prev.stratify = TRUE, var.monotone = rep(0, length(gbm.x)), site.weights = rep(1, nrow(data)), refit = TRUE, keep.data = TRUE)
gbm.holdout(data, gbm.x, gbm.y, learning.rate = 0.001, tree.complexity = 1, family = "bernoulli", n.trees = 200, add.trees = n.trees, max.trees = 20000, verbose = TRUE, train.fraction = 0.8, permute = TRUE, prev.stratify = TRUE, var.monotone = rep(0, length(gbm.x)), site.weights = rep(1, nrow(data)), refit = TRUE, keep.data = TRUE)
data |
data.frame |
gbm.x |
indices of the predictors in the input dataframe |
gbm.y |
index of the response in the input dataframe |
learning.rate |
typically varied between 0.1 and 0.001 |
tree.complexity |
sometimes called interaction depth |
family |
"bernoulli","poisson", etc. as for gbm |
n.trees |
initial number of trees |
add.trees |
number of trees to add at each increment |
max.trees |
maximum number of trees to fit |
verbose |
controls degree of screen reporting |
train.fraction |
proportion of data to use for training |
permute |
reorder data to start with |
prev.stratify |
stratify selection for presence/absence data |
var.monotone |
allows constraining of response to monotone |
site.weights |
set equal to 1 by default |
refit |
refit the model with the full data but id'd no of trees |
keep.data |
keep copy of the data |
A gbm object
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
Tests whether interactions have been detected and modelled, and reports the relative strength of these. Results can be visualised with gbm.perspec
The function assesses the magnitude of 2nd order interaction effects in gbm models fitted with interaction depths greater than 1. This is achieved by:
1. forming predictions on the linear scale for each predictor pair;
2. fitting a linear model that relates these predictions to the predictor pair, with the the predictors fitted as factors;
3. calculating the mean value of the residuals, the magnitude of which increases with the strength of any interaction effect;
4. results are stored in an array;
5. finally, the n most important interactions are identified, where n is 25
gbm.interactions(gbm.object, use.weights=FALSE, mask.object)
gbm.interactions(gbm.object, use.weights=FALSE, mask.object)
gbm.object |
A gbm object |
use.weights |
Logical. If |
mask.object |
a gbm object describing sample intensity |
object of class gbm
Takes a gbm boosted regression tree object produced by gbm.step and plots a perspective plot showing predicted values for two predictors as specified by number using x and y. Values for all other variables are set at their mean by default but values can be specified by giving a list consisting of the variable name and its desired value, e.g., c(name1 = 12.2, name2 = 57.6)
gbm.perspec(gbm.object, x = 1, y = 2, pred.means = NULL, x.label = NULL, x.range = NULL, y.label = NULL, z.label = "fitted value", y.range = NULL, z.range = NULL, leg.coords = NULL, ticktype = "detailed", theta = 55, phi = 40, smooth = "none", mask = FALSE, perspective = TRUE, ...)
gbm.perspec(gbm.object, x = 1, y = 2, pred.means = NULL, x.label = NULL, x.range = NULL, y.label = NULL, z.label = "fitted value", y.range = NULL, z.range = NULL, leg.coords = NULL, ticktype = "detailed", theta = 55, phi = 40, smooth = "none", mask = FALSE, perspective = TRUE, ...)
gbm.object |
object of class gbm |
x |
the first variable to be plotted |
y |
the second variable to be plotted |
pred.means |
allows specification of values for other variables |
x.label |
allows manual specification of the x label |
x.range |
manual range specification for the x variable |
y.label |
and y label |
z.label |
default z label |
y.range |
and the y |
z.range |
allows control of the vertical axis |
leg.coords |
can specify coords (x, y) for legend |
ticktype |
specifiy detailed types - otherwise "simple" |
theta |
rotation |
phi |
and elevation |
smooth |
controls smoothing of the predicted surface |
mask |
controls masking using a sample intensity model |
perspective |
controls whether a contour or perspective plot is drawn |
... |
allows the passing of additional arguments to plotting routine useful options include shade, ltheta, lphi for controlling illumination and cex for controlling text size - cex.axis and cex.lab have no effect |
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
Function to plot gbm response variables, with the option of adding a smooth representation of the response if requested additional options in this version allow for plotting on a common scale. Note hat fitted functions are centered by subtracting their mean.
gbm.plot(gbm.object, variable.no=0, smooth=FALSE, rug=TRUE, n.plots=length(pred.names), common.scale=TRUE, write.title=TRUE, y.label="fitted function", x.label=NULL, show.contrib=TRUE, plot.layout=c(3, 4), ...)
gbm.plot(gbm.object, variable.no=0, smooth=FALSE, rug=TRUE, n.plots=length(pred.names), common.scale=TRUE, write.title=TRUE, y.label="fitted function", x.label=NULL, show.contrib=TRUE, plot.layout=c(3, 4), ...)
gbm.object |
a gbm object - could be one from gbm.step |
variable.no |
the var to plot - if zero then plots all |
smooth |
Logical. If |
rug |
Logical. If |
n.plots |
plot the first n most important preds |
common.scale |
Logical. If |
write.title |
Logical. If |
y.label |
the default y-axis label |
x.label |
the default x-axis label |
show.contrib |
Logical. If |
plot.layout |
define the default layout for graphs on the page |
... |
other arguments to pass to the plotting useful options include cex.axis, cex.lab, etc. |
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
Plots the fitted values from a gbm object returned by any of the model fitting options. This can give a more reliable guide to the shape of the fitted surface than can be obtained from the individual functions, particularly when predictor variables are correlated and/or samples are unevenly distributed in environmental space. Allows masking out of absences to enable focus on sites with high predicted values.
gbm.plot.fits(gbm.object, v=0, mask.presence=FALSE, use.factor=FALSE )
gbm.plot.fits(gbm.object, v=0, mask.presence=FALSE, use.factor=FALSE )
gbm.object |
a gbm object |
v |
variable numbers to be plotted (if 0 then all are plotted) |
mask.presence |
Logical. If |
use.factor |
Logical. If |
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
The function takes an inital cross-validated model as produced by gbm.step and then assesses the potential to remove predictors using k-fold cross validation. This done for each fold, removing the lowest contributing predictor, and repeating this process for a set number of steps. After the removal of each predictor, the change in predictive deviance is computed relative to that obtained when using all predictors. The function returns a list containing the mean change in deviance and its standard error as a function of the number of variables removed. Having completed the cross validation, it then identifies the sequence of variable to remove when using the full data set, testing this up to the number of steps used in the cross-validation phase of the analysis with results reported to the screen.
The function returns a table containing the order in which variables are to be removed and some vectors, each of which specifies the predictor column numbers in the original dataframe - the latter can be used as an argument to gbm.step e.g., gbm.step(data = data, gbm.x = simplify.object$pred.list[[4]]... would implement a new analysis with the original predictor set, minus its four lowest contributing predictors.
gbm.simplify(gbm.object, n.folds = 10, n.drops = "auto", alpha = 1, prev.stratify = TRUE, eval.data = NULL, plot = TRUE)
gbm.simplify(gbm.object, n.folds = 10, n.drops = "auto", alpha = 1, prev.stratify = TRUE, eval.data = NULL, plot = TRUE)
gbm.object |
a gbm object describing sample intensity |
n.folds |
number of times to repeat the analysis |
n.drops |
can be automatic or an integer specifying the number of drops to check |
alpha |
controls stopping when n.drops = "auto" |
prev.stratify |
use prevalence stratification in selecting evaluation data |
eval.data |
an independent evaluation data set - leave here for now |
plot |
plot results |
A list with these elements: deviance.summary, deviance.matrix, drop.count, final.drops, pred.list, and gbm.call = gbm.call))
John R. Leathwick and Jane Elith
Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
Function to assess the optimal number of boosting trees using k-fold cross validation. This is an implementation of the cross-validation procedure described on page 215 of Hastie et al (2001).
The data is divided into 10 subsets, with stratification by prevalence if required for presence/absence data. The function then fits a gbm model of increasing complexity along the sequence from n.trees
to n.trees + (n.steps * step.size)
, calculating the residual deviance at each step along the way. After each fold processed, the function calculates the average holdout residual deviance and its standard error and then identifies the optimal number of trees as that at which the holdout deviance is minimised. It fits a model with this number of trees, returning it as a gbm model along with additional information from the cross-validation selection process.
gbm.step(data, gbm.x, gbm.y, offset = NULL, fold.vector = NULL, tree.complexity = 1, learning.rate = 0.01, bag.fraction = 0.75, site.weights = rep(1, nrow(data)), var.monotone = rep(0, length(gbm.x)), n.folds = 10, prev.stratify = TRUE, family = "bernoulli", n.trees = 50, step.size = n.trees, max.trees = 10000, tolerance.method = "auto", tolerance = 0.001, plot.main = TRUE, plot.folds = FALSE, verbose = TRUE, silent = FALSE, keep.fold.models = FALSE, keep.fold.vector = FALSE, keep.fold.fit = FALSE, ...)
gbm.step(data, gbm.x, gbm.y, offset = NULL, fold.vector = NULL, tree.complexity = 1, learning.rate = 0.01, bag.fraction = 0.75, site.weights = rep(1, nrow(data)), var.monotone = rep(0, length(gbm.x)), n.folds = 10, prev.stratify = TRUE, family = "bernoulli", n.trees = 50, step.size = n.trees, max.trees = 10000, tolerance.method = "auto", tolerance = 0.001, plot.main = TRUE, plot.folds = FALSE, verbose = TRUE, silent = FALSE, keep.fold.models = FALSE, keep.fold.vector = FALSE, keep.fold.fit = FALSE, ...)
data |
input data.frame |
gbm.x |
indices or names of predictor variables in |
gbm.y |
index or name of response variable in |
offset |
offset |
fold.vector |
a fold vector to be read in for cross validation with offsets |
tree.complexity |
sets the complexity of individual trees |
learning.rate |
sets the weight applied to inidivudal trees |
bag.fraction |
sets the proportion of observations used in selecting variables |
site.weights |
allows varying weighting for sites |
var.monotone |
restricts responses to individual predictors to monotone |
n.folds |
number of folds |
prev.stratify |
prevalence stratify the folds - only for presence/absence data |
family |
family - bernoulli (=binomial), poisson, laplace or gaussian |
n.trees |
number of initial trees to fit |
step.size |
numbers of trees to add at each cycle |
max.trees |
max number of trees to fit before stopping |
tolerance.method |
method to use in deciding to stop - "fixed" or "auto" |
tolerance |
tolerance value to use - if method == fixed is absolute, if auto is multiplier * total mean deviance |
plot.main |
Logical. plot hold-out deviance curve |
plot.folds |
Logical. plot the individual folds as well |
verbose |
Logical. control amount of screen reporting |
silent |
Logical. to allow running with no output for simplifying model) |
keep.fold.models |
Logical. keep the fold models from cross valiation |
keep.fold.vector |
Logical. allows the vector defining fold membership to be kept |
keep.fold.fit |
Logical. allows the predicted values for observations from cross-validation to be kept |
... |
Logical. allows for any additional plotting parameters |
object of S3 class gbm
This and other boosted regression trees (BRT) functions in the dismo package do not work if you use only one predictor. There is an easy work around: make a dummy variable with a constant value and then fit a model with two predictors, the one of interest and the dummy variable, which will be ignored by the model fitting as it has no useful information.
John R. Leathwick and Jane Elith
Hastie, T., R. Tibshirani, and J.H. Friedman, 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York Elith, J., J.R. Leathwick and T. Hastie, 2009. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-81
data(Anguilla_train) # reduce data set to speed things up a bit Anguilla_train = Anguilla_train[1:200,] angaus.tc5.lr01 <- gbm.step(data=Anguilla_train, gbm.x = 3:14, gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)
data(Anguilla_train) # reduce data set to speed things up a bit Anguilla_train = Anguilla_train[1:200,] angaus.tc5.lr01 <- gbm.step(data=Anguilla_train, gbm.x = 3:14, gbm.y = 2, family = "bernoulli", tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)
A wrapper around the Google geocoding web-service. It returns 0 to n matches. It is important to be as precise as possible, e.g. always include the country in the locality description.
geocode(x, oneRecord=FALSE, extent=NULL, progress='', geocode_key, ...)
geocode(x, oneRecord=FALSE, extent=NULL, progress='', geocode_key, ...)
x |
A vector of locality descriptions |
oneRecord |
Logical. If |
extent |
An Extent object, or an object that can be coerced to one, to bias the search towards that region |
progress |
Character. Valid values are "" (no progress indicator), "text" or "window" |
geocode_key |
character. Your Google API key for geocoding (and billing). See https://developers.google.com/maps/documentation/geocoding/get-api-key) |
... |
additional arguments (currently none implemeted) |
data.frame
with the following fields:
originalPlace |
the locality description as provided (in argument |
interpretedPlace |
the locality as interpreted by the Google API |
lon |
longitude |
lat |
latitude |
lonmin |
minimum longitude of the bounding box |
lonmax |
maximum longitude of the bounding box |
latmin |
minimum latitude of the bounding box |
latmax |
maximum latitude of the bounding box |
uncertainty |
distance from |
It is important to compare fields originalPlace
and interpretedPlace
as the Google interpretation of a (perhaps vague) locality description can be very speculative
Robert J. Hijmans
## Not run: geocode(c('1600 Pennsylvania Ave NW, Washington DC', 'Luca, Italy', 'Kampala')) geocode(c('San Jose', 'San Jose, Mexico')) geocode(c('San Jose', 'San Jose, Mexico'), oneRecord=TRUE) ## End(Not run)
## Not run: geocode(c('1600 Pennsylvania Ave NW, Washington DC', 'Luca, Italy', 'Kampala')) geocode(c('San Jose', 'San Jose, Mexico')) geocode(c('San Jose', 'San Jose, Mexico'), oneRecord=TRUE) ## End(Not run)
The geographic distance model predicts that the likelyhood of presence is highest near places where a species has been observed. It can be used as a null-model to calibrate cross-validation scores with.
The predicted values are the inverse distance to the nearest known presence point. Distances smaller than or equal to zero are set to 1 (highest score).
geoDist(p, ...)
geoDist(p, ...)
p |
point locations (presence). Two column matrix, data.frame or SpatialPoints* object |
... |
Additional arguments. You must supply a lonlat= argument (logical), unless p is a SpatialPoints* object and has a valid CRS (coordinate reference system). You can also supply an additional argument 'a' for absence points (currently ignored). Argument 'a' should be of the same class as argument 'p' |
An object of class 'GeographicDistance' (inherits from DistModel-class
)
Robert J. Hijmans
predict, convHull, maxent, domain, mahal, voronoiHull, geoIDW
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) colnames(pts) <- c('x', 'y') train <- pts[1:12, ] test <- pts[13:20, ] gd <- geoDist(train, lonlat=FALSE) p <- predict(gd, r) ## Not run: plot(p) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) ## End(Not run)
r <- raster(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) colnames(pts) <- c('x', 'y') train <- pts[1:12, ] test <- pts[13:20, ] gd <- geoDist(train, lonlat=FALSE) p <- predict(gd, r) ## Not run: plot(p) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2) ## End(Not run)
Retrieve a 'Google Map' that can be used as a background for plotting points and other spatial data.
The projection of the returned Raster object is "Mercator" (unless you use lonlat=TRUE
), and other spatial data may need to be transformed before it can be plotted on top of the Google map. You can use the Mercator
function to transform points from longitude/latitude to Mercator. For SpatialLines and SpatialPolygons objects, use spTransform
in the rgdal
package.
This function uses the Google static maps web-service, and is based on functions by Markus Loecher for the RgoogleMaps
package.
gmap(x, exp=1, type='terrain', filename='', style=NULL, scale=1, zoom=NULL, size=c(640, 640), rgb=FALSE, lonlat=FALSE, map_key, geocode_key, ...) Mercator(p, inverse = FALSE)
gmap(x, exp=1, type='terrain', filename='', style=NULL, scale=1, zoom=NULL, size=c(640, 640), rgb=FALSE, lonlat=FALSE, map_key, geocode_key, ...) Mercator(p, inverse = FALSE)
x |
a textual locality description, or an Extent object (with longitude/latitude coordinates), or an object that can be coerced to one (such as a Raster* or Spatial* object), in any (known) coordinate system |
exp |
numeric. An expansion factor to enlarge (by multiplication) the extent specified by |
type |
character. Choose from 'roadmap', 'satellite', 'hybrid', 'terrain' |
filename |
character. Filename (optional). You can open the resulting file in a GIS program |
style |
character. Additional style arguments. See https://developers.google.com/maps/documentation/maps-static/overview?csw=1#StyledMapFeatures.
Note that certain style features do not work in combination with (the default) |
scale |
1 or 2. Using 2 doubles the number of pixels returned (and thus gives you better image quality if you need a large image) |
zoom |
integer between 0 (the whole world) to 21 (very small area), centered on the center of the extent |
size |
vector of two integers indicating the number of columns and rows that is requested (what is returned depends on other factors as well). Maximum values are |
rgb |
logical. If |
lonlat |
logical. If |
map_key |
character. Your Google API key for mapping (and billing). See https://developers.google.com/maps/documentation/javascript/get-api-key) |
geocode_key |
character. Your Google API key for geocoding (and billing). Only relevant if |
... |
additional parameters |
p |
Points. A two-column matrix, or a SpatialPoints object |
inverse |
Should the inverse projection be done (from Mercator to longitude/latitude?) |
If argument x
is a textual locality description, the geocode
function is used to retrieve the extent that should be mapped.
Change the type to 'roadmap' if the map returned says "sorry we have no imagery here"; or use a larger extent.
The returned RasterLayer has a Mercator projection. To plot points (or lines or polygons) on top of it, these need to be transformed first.
A matrix of longitude/latitude data can be transformed with the Mercator function used in the example below. 'Spatial*' objects can be transformed with spTransform
p <- spTransform(x, "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs")
RasterLayer
Robert Hijmans and Sebastien Rochette, based on code by Markus Loecher, Sense Networks <markus at sensenetworks.com> in the RgoogleMaps package
## Not run: mymapkey = "pk-tHVbDiymfUL" mygeokey = "Skxe99-adfKeax" library(rgdal) # from a maxtrix with lon/lat points x <- runif(30)*10 + 40 y <- runif(30)*10 - 20 xy <- cbind(x, y) g <- gmap(xy, type='hybrid', map_key=mymapkey) plot(g, inter=TRUE) points(Mercator(xy) , col='red', pch=20) # or from an Extent object e <- extent( -121.9531 , -120.3897 , 35.36 , 36.61956 ) # you can also get an Extent object by clicking on the map twice after using: # drawExtent() r <- gmap(e, map_key=mymapkey) plot(r, interpolate=TRUE) # transform points to Mercator for plotting on top of map: pt <- matrix(c(-121, 36), ncol=2) ptm <- Mercator(pt) points(ptm, cex=3, pch=20, col='blue') Mercator(ptm, inverse=TRUE) # transform Spatial objects to Mercator for plotting on top of map # here for points, but particularly relevant for lines and polygons pt <- data.frame(pt) coordinates(pt) <- ~X1 + X2 proj4string(pt) <-"+proj=longlat +datum=WGS84 +ellps=WGS84" ptm2 <- spTransform(pt, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs")) points(ptm, col='red', pch='x', cex=3) # get a map using names g = gmap('Australia', map_key=mymapkey, geocode_key=mygeokey) plot(g, inter=TRUE) gs = gmap('Sydney, New South Wales, Australia', type='satellite', map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) gs = gmap('Sydney, Australia', type='satellite', exp=3, map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) gs = gmap('Sydney, Australia', type='hybrid', zoom=10, scale=2, map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) # styles: g <- gmap("Brooklyn", style="feature:road.local|element:geometry|hue:0x00ff00|saturation:100 &style=feature:landscape|element:geometry|lightness:-100", type='roadmap', map_key=mymapkey, geocode_key=mygeokey) plot(g) ## End(Not run)
## Not run: mymapkey = "pk-tHVbDiymfUL" mygeokey = "Skxe99-adfKeax" library(rgdal) # from a maxtrix with lon/lat points x <- runif(30)*10 + 40 y <- runif(30)*10 - 20 xy <- cbind(x, y) g <- gmap(xy, type='hybrid', map_key=mymapkey) plot(g, inter=TRUE) points(Mercator(xy) , col='red', pch=20) # or from an Extent object e <- extent( -121.9531 , -120.3897 , 35.36 , 36.61956 ) # you can also get an Extent object by clicking on the map twice after using: # drawExtent() r <- gmap(e, map_key=mymapkey) plot(r, interpolate=TRUE) # transform points to Mercator for plotting on top of map: pt <- matrix(c(-121, 36), ncol=2) ptm <- Mercator(pt) points(ptm, cex=3, pch=20, col='blue') Mercator(ptm, inverse=TRUE) # transform Spatial objects to Mercator for plotting on top of map # here for points, but particularly relevant for lines and polygons pt <- data.frame(pt) coordinates(pt) <- ~X1 + X2 proj4string(pt) <-"+proj=longlat +datum=WGS84 +ellps=WGS84" ptm2 <- spTransform(pt, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs")) points(ptm, col='red', pch='x', cex=3) # get a map using names g = gmap('Australia', map_key=mymapkey, geocode_key=mygeokey) plot(g, inter=TRUE) gs = gmap('Sydney, New South Wales, Australia', type='satellite', map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) gs = gmap('Sydney, Australia', type='satellite', exp=3, map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) gs = gmap('Sydney, Australia', type='hybrid', zoom=10, scale=2, map_key=mymapkey, geocode_key=mygeokey) plot(gs, inter=TRUE) # styles: g <- gmap("Brooklyn", style="feature:road.local|element:geometry|hue:0x00ff00|saturation:100 &style=feature:landscape|element:geometry|lightness:-100", type='roadmap', map_key=mymapkey, geocode_key=mygeokey) plot(g) ## End(Not run)
Sample points from xy, using a grid (raster) as stratification. Up to n points are sampled from each stratum (cell). For "chessboard" sampling (i.e. sampling from half the cells), use the argument chess='black'
, or chess='white'
.
gridSample(xy, r, n=1, chess='')
gridSample(xy, r, n=1, chess='')
xy |
A two column matrix or data.frame with x and y coordinates (or longitude and latitude), or a SpatialPoints* object |
r |
Raster* object |
n |
Maximum number of samples per cell |
chess |
Character. ”, 'black', or 'white'. If 'black' or 'white', "chess-board" sampling is used. I.e. only the 'white' fields, or only the 'black' fields are sampled. Cell number 1 (the upper left corner of |
A two column matrix with x and y coordinates (or longitude and latitude)
Robert J. Hijmans
x <- rnorm(1000, 10, 5) y <- rnorm(1000, 50, 5) xy <- cbind(x,y) res <- 5 r <- raster(extent(range(xy[,1]), range(xy[,2])) + res) res(r) <- res samp <- gridSample(xy, r, n=1) plot(xy, cex=0.1) points(samp, pch='x', col='red')
x <- rnorm(1000, 10, 5) y <- rnorm(1000, 50, 5) xy <- cbind(x,y) res <- 5 r <- raster(extent(range(xy[,1]), range(xy[,2])) + res) res(r) <- res samp <- gridSample(xy, r, n=1) plot(xy, cex=0.1) points(samp, pch='x', col='red')
Inverse-distance weighted predictions for presence/absence data. Computed with the gstat function from the gstat package.
geoIDW(p, a, ...)
geoIDW(p, a, ...)
p |
Presence points. Two column matrix, data.frame, or a SpatialPoints* object |
a |
Absence points. Must be of the same class as p |
... |
Addtional arguments. None implemented |
An object of class InvDistWeightModel (inherits from DistModel-class
)
Robert J. Hijmans
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) # absence points a <- matrix(c(30, 23, 5, 5, 31, 33, 91, 63, 60, 88, 93, 97, 65, 68, 85, 97, 35, 32, 29, 55, 3, 8, 19, 71, 49, 36, 69, 41, 20, 28, 18, 9, 5, 9, 25, 71, 8, 32, 46, 60), ncol=2) idw <- geoIDW(p, a) prd <- predict(r, idw) ## Not run: plot(prd) points(p) points(a, pch='x') ## End(Not run)
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) # absence points a <- matrix(c(30, 23, 5, 5, 31, 33, 91, 63, 60, 88, 93, 97, 65, 68, 85, 97, 35, 32, 29, 55, 3, 8, 19, 71, 49, 36, 69, 41, 20, 28, 18, 9, 5, 9, 25, 71, 8, 32, 46, 60), ncol=2) idw <- geoIDW(p, a) prd <- predict(r, idw) ## Not run: plot(prd) points(p) points(a, pch='x') ## End(Not run)
k-fold partitioning of a data set for model testing purposes. Each record in a matrix (or similar data structure) is randomly assigned to a group. Group numbers are between 1 and k
.
kfold(x, k=5, by)
kfold(x, k=5, by)
x |
a vector, matrix, data.frame, or Spatial object |
k |
number of groups |
by |
Optional argument. A vector or factor with sub-groups (e.g. species). Its length should be the same as the number of records in x |
a vector with group assignments
Robert J. Hijmans
#library(disdat) #data(NSWtrain) ## a single species #srsp1 <- subset(NSWtrain, spid=='srsp1') #kfold(srsp1, k = 5) ## all species #k = kfold(NSWtrain, k=5, by=NSWtrain$spid) #k[NSWtrain$spid=='srsp1'] ## each group has the same number of records ##(except for adjustments if the number of records divided by k is not an integer) #table(k[NSWtrain$spid=='srsp1']) #k[NSWtrain$spid=='ousp5']
#library(disdat) #data(NSWtrain) ## a single species #srsp1 <- subset(NSWtrain, spid=='srsp1') #kfold(srsp1, k = 5) ## all species #k = kfold(NSWtrain, k=5, by=NSWtrain$spid) #k[NSWtrain$spid=='srsp1'] ## each group has the same number of records ##(except for adjustments if the number of records divided by k is not an integer) #table(k[NSWtrain$spid=='srsp1']) #k[NSWtrain$spid=='ousp5']
Distribution model based on the Mahalanobis distance. The predictions are (1-distance). I.e. the highest possible value is 1, and there will likely be large negative numbers.
mahal(x, p, ...)
mahal(x, p, ...)
x |
Raster* object or matrix |
p |
two column matrix or SpatialPoints* object |
... |
Additional arguments. Currently not used |
An object of class Mahalanobis (inherits from DistModel-class
)
Robert J. Hijmans
predict, maxent, bioclim, domain
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359, 7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2) # fit model m <- mahal(logo, pts) # make a prediction predict(m, logo[1]) x <- predict(m, logo) # or x <- predict(logo, m) via raster::predict # plot(x > 0)
logo <- stack(system.file("external/rlogo.grd", package="raster")) #presence data pts <- matrix(c(48.243420, 48.243420, 47.985820, 52.880230, 49.531423, 46.182616, 54.168232, 69.624263, 83.792291, 85.337894, 74.261072, 83.792291, 95.126713, 84.565092, 66.275456, 41.803408, 25.832176, 3.936132, 18.876962, 17.331359, 7.048974, 13.648543, 26.093446, 28.544714, 39.104026, 44.572240, 51.171810, 56.262906, 46.269272, 38.161230, 30.618865, 21.945145, 34.390047, 59.656971, 69.839163, 73.233228, 63.239594, 45.892154, 43.252326, 28.356155), ncol=2) # fit model m <- mahal(logo, pts) # make a prediction predict(m, logo[1]) x <- predict(m, logo) # or x <- predict(logo, m) via raster::predict # plot(x > 0)
Build a "MaxEnt" (Maximum Entropy) species distribution model (see references below). The function uses environmental data for locations of known presence and for a large number of 'background' locations. Environmental data can be extracted from raster files. The result is a model object that can be used to predict the suitability of other locations, for example, to predict the entire range of a species.
Background points are sampled randomly from the cells that are not NA
in the first predictor variable, unless background points are specified with argument a
.
This function uses the MaxEnt species distribution model software by Phillips, Dudik and Schapire.
## S4 method for signature 'Raster,ANY' maxent(x, p, a=NULL, factors=NULL, removeDuplicates=TRUE, nbg=10000, ...) ## S4 method for signature 'SpatialGridDataFrame,ANY' maxent(x, p, a=NULL, removeDuplicates=TRUE, nbg=10000, ...) ## S4 method for signature 'data.frame,vector' maxent(x, p, args=NULL, path, ...) ## S4 method for signature 'missing,missing' maxent(x, p, silent=FALSE, ...)
## S4 method for signature 'Raster,ANY' maxent(x, p, a=NULL, factors=NULL, removeDuplicates=TRUE, nbg=10000, ...) ## S4 method for signature 'SpatialGridDataFrame,ANY' maxent(x, p, a=NULL, removeDuplicates=TRUE, nbg=10000, ...) ## S4 method for signature 'data.frame,vector' maxent(x, p, args=NULL, path, ...) ## S4 method for signature 'missing,missing' maxent(x, p, silent=FALSE, ...)
x |
Predictors. Raster* object or SpatialGridDataFrame, containing grids with predictor variables. These will be used to extract values from for the point locations. |
p |
Occurrence data. This can be a data.frame, matrix, SpatialPoints* object, or a vector. If If |
a |
Background points. Only used if |
nbg |
Number of background points to use. These are sampled randomly from the cells that are not |
factors |
character. Which (if any) variables should be considered as categorical? Either by (layer)name or by index. Only used when argument 'x' is a Raster* object because it is not needed in other cases as you can set the appropriate class to the variables in the data.frame |
args |
charater. Additional argument that can be passed to MaxEnt. See the MaxEnt help for more information. The R maxent function only uses the arguments relevant to model fitting. There is no point in using args='outputformat=raw' when *fitting* the model; but you can use arguments relevant for *prediction* when using the predict function. Some other arguments do not apply at all to the R implementation. An example is 'outputfiletype', because the 'predict' function has its own 'filename' argument for that |
removeDuplicates |
Boolean. If |
path |
character. Optional argument to set where you want the MaxEnt output files to be stored. This allows you to permanently keep these files. If not supplied the MaxEnt files will be stored in a temporary file. These are the files that are shown in a browser when typing the model name or when you use "show(model)" |
silent |
Boolean. If |
... |
Additional arguments |
An object of class 'MaxEnt' (inherits from DistModel-class
). Or a 'MaxEntReplicates' object if you use 'replicates=' as part of the args
argument. If the function is run without any arguments a boolean value is returned (TRUE
if maxent.jar was found).
Steven Phillips and Robert J. Hijmans
Steven J. Phillips, Miroslav Dudik, Robert E. Schapire, 2004. A maximum entropy approach to species distribution modeling. Proceedings of the Twenty-First International Conference on Machine Learning: 655-662.
Steven J. Phillips, Robert P. Anderson, Robert E. Schapire, 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.
Jane Elith, Steven J. Phillips, Trevor Hastie, Miroslav Dudik, Yung En Chee, Colin J. Yates, 2011. A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17:43-57. doi:10.1111/j.1472-4642.2010.00725.x
# test if you can use maxent maxent() if (maxent()) { # get predictor variables fnames <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ) predictors <- stack(fnames) #plot(predictors) # file with presence points occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] # witholding a 20% sample for testing fold <- kfold(occ, k=5) occtest <- occ[fold == 1, ] occtrain <- occ[fold != 1, ] # fit model, biome is a categorical variable me <- maxent(predictors, occtrain, factors='biome') # see the maxent results in a browser: me # use "args" # me2 <- maxent(predictors, occtrain, factors='biome', args=c("-J", "-P")) # plot showing importance of each variable plot(me) # response curves # response(me) # predict to entire dataset r <- predict(me, predictors) # with some options: # r <- predict(me, predictors, args=c("outputformat=raw"), progress='text', # filename='maxent_prediction.grd') plot(r) points(occ) #testing # background data bg <- randomPoints(predictors, 1000) #simplest way to use 'evaluate' e1 <- evaluate(me, p=occtest, a=bg, x=predictors) # alternative 1 # extract values pvtest <- data.frame(extract(predictors, occtest)) avtest <- data.frame(extract(predictors, bg)) e2 <- evaluate(me, p=pvtest, a=avtest) # alternative 2 # predict to testing points testp <- predict(me, pvtest) head(testp) testa <- predict(me, avtest) e3 <- evaluate(p=testp, a=testa) e3 threshold(e3) plot(e3, 'ROC') }
# test if you can use maxent maxent() if (maxent()) { # get predictor variables fnames <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ) predictors <- stack(fnames) #plot(predictors) # file with presence points occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] # witholding a 20% sample for testing fold <- kfold(occ, k=5) occtest <- occ[fold == 1, ] occtrain <- occ[fold != 1, ] # fit model, biome is a categorical variable me <- maxent(predictors, occtrain, factors='biome') # see the maxent results in a browser: me # use "args" # me2 <- maxent(predictors, occtrain, factors='biome', args=c("-J", "-P")) # plot showing importance of each variable plot(me) # response curves # response(me) # predict to entire dataset r <- predict(me, predictors) # with some options: # r <- predict(me, predictors, args=c("outputformat=raw"), progress='text', # filename='maxent_prediction.grd') plot(r) points(occ) #testing # background data bg <- randomPoints(predictors, 1000) #simplest way to use 'evaluate' e1 <- evaluate(me, p=occtest, a=bg, x=predictors) # alternative 1 # extract values pvtest <- data.frame(extract(predictors, occtest)) avtest <- data.frame(extract(predictors, bg)) e2 <- evaluate(me, p=pvtest, a=avtest) # alternative 2 # predict to testing points testp <- predict(me, pvtest) head(testp) testa <- predict(me, avtest) e3 <- evaluate(p=testp, a=testa) e3 threshold(e3) plot(e3, 'ROC') }
Compute multivariate environmental similarity surfaces (MESS), as described by Elith et al., 2010
mess(x, v, full=FALSE, filename='', ...)
mess(x, v, full=FALSE, filename='', ...)
x |
Raster* object |
v |
matrix or data.frame containing the reference values. Each column should correspond to one layer of the Raster* object |
full |
logical. If |
filename |
character. Output filename (optional) |
... |
additional arguments as for |
v
can be obtained for a set of points using extract
.
A RasterBrick with layers corresponding to the input layers and an additional layer with the mess values (if full=TRUE
and nlayers(x) > 1
) or a RasterLayer with the MESS values (if full=FALSE
).
Jean-Pierre Rossi <[email protected]>, Robert Hijmans, Paulo van Breugel
Elith J., M. Kearney M., and S. Phillips, 2010. The art of modelling range-shifting species. Methods in Ecology and Evolution 1:330-342.
set.seed(9) r <- raster(ncol=10, nrow=10) r1 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) r2 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) r3 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) s <- stack(r1,r2,r3) names(s) <- c('a', 'b', 'c') xy <- cbind(rep(c(10,30,50), 3), rep(c(10,30,50), each=3)) refpt <- extract(s, xy) ms <- mess(s, refpt, full=TRUE) plot(ms) ## Not run: filename <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') bradypus <- read.table(filename, header=TRUE, sep=',') bradypus <- bradypus[,2:3] files <- list.files(path=paste(system.file(package="dismo"),'/ex', sep=''), pattern='grd', full.names=TRUE ) predictors <- stack(files) predictors <- dropLayer(x=predictors,i=9) reference_points <- extract(predictors, bradypus) mss <- mess(x=predictors, v=reference_points, full=TRUE) plot(mss) ## End(Not run)
set.seed(9) r <- raster(ncol=10, nrow=10) r1 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) r2 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) r3 <- setValues(r, (1:ncell(r))/10 + rnorm(ncell(r))) s <- stack(r1,r2,r3) names(s) <- c('a', 'b', 'c') xy <- cbind(rep(c(10,30,50), 3), rep(c(10,30,50), each=3)) refpt <- extract(s, xy) ms <- mess(s, refpt, full=TRUE) plot(ms) ## Not run: filename <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') bradypus <- read.table(filename, header=TRUE, sep=',') bradypus <- bradypus[,2:3] files <- list.files(path=paste(system.file(package="dismo"),'/ex', sep=''), pattern='grd', full.names=TRUE ) predictors <- stack(files) predictors <- dropLayer(x=predictors,i=9) reference_points <- extract(predictors, bradypus) mss <- mess(x=predictors, v=reference_points, full=TRUE) plot(mss) ## End(Not run)
Class to store results of model cross-validation with presence/absence (0/1) data
presence
:presence data used
absence
:absence data used
np
:number of presence points
na
:number of absence points
auc
:Area under the receiver operator (ROC) curve
pauc
:p-value for the AUC (for the Wilcoxon test W statistic
cor
:Correlation coefficient
pcor
:p-value for correlation coefficient
t
:vector of thresholds used to compute confusion matrices
confusion
:confusion matrices
prevalence
:Prevalence
ODP
:Overall diagnostic power
CCR
:Correct classification rate
TPR
:True positive rate
TNR
:True negative rate
FPR
:False positive rate
FNR
:False negative rate
PPP
:Positive predictive power
NPP
:Negative predictive power
MCR
:Misclassification rate
OR
:Odds-ratio
kappa
:Cohen's kappa
Robert J. Hijmans
Fielding, A. H. & J.F. Bell, 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24: 38-49
Liu, C., M. White & G. Newell, 2011. Measuring and comparing the accuracy of species distribution models with presence-absence data. Ecography 34: 232-243.
Compute niche equivalency for two species following Warren et al. (2009). The statistic ranges from 0 to 1.
nicheEquivalency(sp1, sp2, predictors, n=99, model=maxent, verbose=TRUE, ...)
nicheEquivalency(sp1, sp2, predictors, n=99, model=maxent, verbose=TRUE, ...)
sp1 |
coordinates for species 1 (matrix with (x, y) or (lon, lat), or SpatialPoints) |
sp2 |
coordinates for species 2 (matrix with (x, y) or (lon, lat), or SpatialPoints) |
predictors |
Raster object with environmental variables |
n |
integer. Number of randomizations |
model |
function. modeling algorithm to me used |
verbose |
logical. If |
... |
additional arguments (none) |
numeric
Brian Anacker. Based on a similar function in by Christoph Heibl in the phyloclim package
Warren, D.L., R.E. Glor, M. Turelli, 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution 62:2868-2883.
Compute niche overlap from predictions of species distributions with the 'I' or 'D' similarity statistic of Warren et al. (2009). The statistic ranges from 0 (no overlap) to 1 (the distributions are identical).
nicheOverlap(x, y, stat='I', mask=TRUE, checkNegatives=TRUE)
nicheOverlap(x, y, stat='I', mask=TRUE, checkNegatives=TRUE)
x |
RasterLayer with non-negative values (predictions of the probability that a site is suitable for a species) |
y |
RasterLayer with non-negative values, as above |
stat |
character either 'I' or 'D' to get the statistic with that name |
mask |
logical. If |
checkNegatives |
logical. If |
numeric
Based on SDMTools::Istat by Jeremy VanDerWal
Warren, D.L., R.E. Glor, M. Turelli, and D. Funk. 2009. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution 62:2868-2883; Erratum: Evolution 65: 1215
r1 <- raster(nr=18, nc=36) r2 <- raster(nr=18, nc=36) set.seed(0) r1[] <- runif(ncell(r1)) r2[] <- runif(ncell(r1)) nicheOverlap(r1, r2)
r1 <- raster(nr=18, nc=36) r2 <- raster(nr=18, nc=36) set.seed(0) r1[] <- runif(ncell(r1)) r2[] <- runif(ncell(r1)) nicheOverlap(r1, r2)
Pair plots of presence and absence (background) data.
pairs(x, v=NULL, pa='pa', hist=TRUE, cor=TRUE)
x |
Object of class DistModel or derived from that class (such as 'MaxEnt', 'Bioclim') | |
v |
numeric, to select a subset of pairs, e.g. v=1:3 to plot only the first three variables | |
pa |
Character. Either 'pa', 'p', or 'a' to show presence and absence, presence, or absence data respectively | |
hist |
logical. If TRUE a histogram of the values is shown on the diagonal |
|
cor |
logical. If TRUE the correlation coefficient is shown in the upper panels |
|
Robert J. Hijmans
Plot predictor values for occurrence (presence and absence) data in a DistModel (or derived) object.
usage: plot(x, y, ...)
x |
Object of class DistModel or from a class that inherits from it | |
y |
missing | |
... |
Additional arguments that can be passed to plot | |
Robert J. Hijmans
Extract values from a Raster* object for point locations. This function adds a few options that can be useful in the context of species distribution modeling to extract
function in the raster package.
pointValues(x, p, a, uniquecells = TRUE, na.rm = TRUE)
pointValues(x, p, a, uniquecells = TRUE, na.rm = TRUE)
x |
A Raster* object |
p |
Points. Two-colum matrix or data.frame; or a SpatialPoints* object |
a |
Additional points (absences). Two-colum matrix or data.frame; or a SpatialPoints* object |
uniquecells |
Logical. If |
na.rm |
Logical. If |
matrix
Robert J. Hijmans
Make a RasterLayer with a prediction based on a model object of class the inherits from 'DistModel', including: Bioclim, Domain, MaxEnt, Mahalanobis, and GeographicDistance. Predictions with model objects that do not inherit from DistModel can be made using the similar predict
function in the 'raster' package.
Provide a Raster* object with the independent variables. The names of the layers in the Raster* object should include those expected by the model.
A RasterLayer or, (if x
is a matrix), a vector.
predict(object, x, ext=NULL, filename='', progress='text', ...)
object |
A fitted model of class Bioclim, Domain, MaxEnt, ConvexHull, or Mahalanobis (classes that inherit from DistModel) | |
x |
A Raster* object or a data.frame | |
ext |
An extent object to limit the prediction to a sub-region of 'x'. Or an object that can be coerced to an Extent object by extent; such as a Raster* or Spatial* object | |
filename |
Output filename for a new raster; if NA the result is not written to a file but returned with the RasterLayer object, in the data slot | |
progress |
Character. Valid values are "" (no progress bar), "text" and "windows" (on that platform only) | |
... |
Additional model specific arguments. And additional arguments for file writing as for writeRaster | |
For maxent models, there is an additional argument 'args'
used to pass arguments (options) to the maxent software. See the help page for maxent
for more information.
For bioclim models, there is an additional argument 'tails'
which you can use to ignore the left or right tail of the percentile distribution for a variable. If supplied, tails should be a character vector with a length equal to the number of variables used in the model. Valid values are "both" (the default), "low" and "high". For example, if you have a variable x with an observed distribution between 10 and 20 and you are predicting the bioclim value for a value 25, the default result would be zero (outside of all observed values); but if you use tail='low', the high (right) tail is ignored and the value returned will be 1.
For geoDist models, there is an additional argument fun
that allows you to use your own (inverse) distance function, and argument scale=1
that allows you to scale the values (distances smaller than this value become one, and the others are divided by this value before computing the inverse distance).
Robert J. Hijmans
For spatial predictions with GLM, GAM, BRT, randomForest, etc., see predict in the Raster package.
To fit a model that can be used with this predict method, see maxent, bioclim, mahal, domain, geoDist, convHull
Extent object: extent
logo <- stack(system.file("external/rlogo.grd", package="raster")) pts <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85, 66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31, 22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2) b <- bioclim(logo, pts) # prediction for a sub-region e <- extent(30,90,20,60) p <- predict(b, logo, progress='text', ext=e) plot(p)
logo <- stack(system.file("external/rlogo.grd", package="raster")) pts <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85, 66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31, 22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2) b <- bioclim(logo, pts) # prediction for a sub-region e <- extent(30,90,20,60) p <- predict(b, logo, progress='text', ext=e) plot(p)
Simple helper function to prepare data for model fitting
prepareData(x, p, b, factors, xy=FALSE)
prepareData(x, p, b, factors, xy=FALSE)
x |
Raster* object |
p |
presence points |
b |
background (or absence) points |
factors |
vectors indicating which variables are factors (using layer names or numbers) |
xy |
logical. If |
data.frame with nlayers(x)+1
columns and nrow(p) + nrow(b)
rows. The first column, 'pb' indicates whether a record represents presence '1' or background '0' values. The other columns have the values from the Raster* object.
Robert J. Hijmans
Select pairs of points from two sets (without replacement) that have a similar distance to their nearest point in another set of points.
For each point in "fixed
", a point is selected from "sample
" that has a similar distance (as defined by threshold
) to its nearest point in "reference
" (note that these are likely to be different points in reference
). The select point is either the nearest point nearest=TRUE
, or a randomly select point nearest=FALSE
that is within the threshold distance. If no point within the threshold distance is found in sample
, the point in fixed
is dropped.
Hijmans (2012) proposed this sampling approach to remove 'spatial sorting bias' (ssb
) from evaluation data used in cross-validation of presence-only species distribution models. In that context, fixed
are the testing-presence points, sample
the testing-absence (or testing-background) points, and reference
the training-presence points.
pwdSample(fixed, sample, reference, tr=0.33, nearest=TRUE, n=1, lonlat=TRUE, warn=TRUE)
pwdSample(fixed, sample, reference, tr=0.33, nearest=TRUE, n=1, lonlat=TRUE, warn=TRUE)
fixed |
two column matrix (x, y) or (longitude/latitude) or SpatialPoints object, for point locations for which a pair should be found in |
sample |
as above for point locations from which to sample to make a pair with a point from |
reference |
as above for reference point locations to which distances are computed |
n |
How many pairs do you want for each point in |
tr |
Numeric, normally below 1. The threshold distance for a pair of points (one of |
nearest |
Logical. If |
lonlat |
Logical. Use |
warn |
Logical. If |
A matrix of nrow(fixed) and ncol(n), that indicates, for each point (row) in fixed
which point(s) in sample
it is paired to; or NA
if no suitable pair was available.
Robert J. Hijmans
Hijmans, R.J., 2012. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null-model. Ecology 93: 679-688
ref <- matrix(c(-54.5,-38.5, 2.5, -9.5, -45.5, 1.5, 9.5, 4.5, -10.5, -10.5), ncol=2) fix <- matrix(c(-56.5, -30.5, -6.5, 14.5, -25.5, -48.5, 14.5, -2.5, 14.5, -11.5, -17.5, -11.5), ncol=2) r <- raster() extent(r) <- c(-110, 110, -45, 45) r[] <- 1 set.seed(0) sam <- randomPoints(r, n=50) par(mfrow=c(1,2)) plot(sam, pch='x') points(ref, col='red', pch=18, cex=2) points(fix, col='blue', pch=20, cex=2) i <- pwdSample(fix, sam, ref, lonlat=TRUE) i sfix <- fix[!is.na(i), ] ssam <- sam[i[!is.na(i)], ] ssam plot(sam, pch='x', cex=0) points(ssam, pch='x') points(ref, col='red', pch=18, cex=2) points(sfix, col='blue', pch=20, cex=2) # try to get 3 pairs for each point in 'fixed' pwdSample(fix, sam, ref, lonlat=TRUE, n=3)
ref <- matrix(c(-54.5,-38.5, 2.5, -9.5, -45.5, 1.5, 9.5, 4.5, -10.5, -10.5), ncol=2) fix <- matrix(c(-56.5, -30.5, -6.5, 14.5, -25.5, -48.5, 14.5, -2.5, 14.5, -11.5, -17.5, -11.5), ncol=2) r <- raster() extent(r) <- c(-110, 110, -45, 45) r[] <- 1 set.seed(0) sam <- randomPoints(r, n=50) par(mfrow=c(1,2)) plot(sam, pch='x') points(ref, col='red', pch=18, cex=2) points(fix, col='blue', pch=20, cex=2) i <- pwdSample(fix, sam, ref, lonlat=TRUE) i sfix <- fix[!is.na(i), ] ssam <- sam[i[!is.na(i)], ] ssam plot(sam, pch='x', cex=0) points(ssam, pch='x') points(ref, col='red', pch=18, cex=2) points(sfix, col='blue', pch=20, cex=2) # try to get 3 pairs for each point in 'fixed' pwdSample(fix, sam, ref, lonlat=TRUE, n=3)
Null model based on randomization of locations as suggested by Raes and ter Steege (2007).
nullRandom(x, model, n=25, rep=25, pa=FALSE)
nullRandom(x, model, n=25, rep=25, pa=FALSE)
x |
data.frame with environmental predictor values for collecting localities |
model |
Model function that creates a model of class 'DistModel' |
n |
Sample size |
rep |
Number of repetitions |
pa |
Boolean. Prensence-only or presence/background model (e.g. Maxent) |
List with n
object of class ModelEvaluation-class
Robert J. Hijmans
Raes, N. & H. ter Steege, 2007. A null-model for significance testing of presence-only species distribution models. Ecography 30:727-736.
predictors <- stack(list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE )) occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] x <- extract(predictors, occ) nr <- nullRandom(x, bioclim, n=25, rep=25, pa=FALSE) mean(sapply(nr, function(x)x@auc))
predictors <- stack(list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE )) occurence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='') occ <- read.table(occurence, header=TRUE, sep=',')[,-1] x <- extract(predictors, occ) nr <- nullRandom(x, bioclim, n=25, rep=25, pa=FALSE) mean(sapply(nr, function(x)x@auc))
Generate random points that can be used to extract background values ("random-absence"). The points are sampled (without replacement) from the cells that are not 'NA
' in raster 'mask
'.
If the coordinate reference system (of mask
) is longitude/latitude, sampling is weighted by the size of the cells. That is, because cells close to the equator are larger than cells closer to the poles, equatorial cells have a higher probability of being selected.
randomPoints(mask, n, p, ext=NULL, extf=1.1, excludep=TRUE, prob=FALSE, cellnumbers=FALSE, tryf=3, warn=2, lonlatCorrection=TRUE)
randomPoints(mask, n, p, ext=NULL, extf=1.1, excludep=TRUE, prob=FALSE, cellnumbers=FALSE, tryf=3, warn=2, lonlatCorrection=TRUE)
mask |
Raster* object. If the object has cell values, cells with |
n |
integer. Number of points |
p |
Presence points (if provided, random points won't be in the same cells (as defined by mask) |
ext |
Extent object. Can be used to restrict sampling to a spatial extent |
extf |
numeric. Multiplyer to adjust the size of extent 'ext'. The default increases of 1.1 increases the extent a little (5% at each side of the extent) |
excludep |
logical. If |
prob |
logical. If |
cellnumbers |
logical. If |
tryf |
numeric > 1. Multiplyer used for initial sample size from which the requested sample size is extracted after removing NA points (outside of mask) |
warn |
integer. 2 or higher gives most warnings. 0 or lower gives no warnings if sample size |
lonlatCorrection |
logical. If |
matrix with coordinates, or, if cellnumbers=TRUE
, a vector with cell numbers.
Robert J. Hijmans
The Rectangular Hull model predicts that a species is present at sites inside the minimum (rotated) bounding rectangle of a set of training points, and absent outside that rectangle.
rectHull(p, ...)
rectHull(p, ...)
p |
point locations (presence). Two column matrix, data.frame or SpatialPoints* object |
... |
Additional arguments. See details |
You can supply an argument n (>= 1) to get n hulls around subset of the points. This uses k-means to form clusters. To reproduce the clusters you may need to use set.seed
.
An object of class 'RectangularHull' (inherits from DistModel-class
)
Robert J. Hijmans. Using an algorithm by whuber and Bangyou on gis.stackexchange.com
predict, circleHull, convHull, maxent, domain, mahal
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] rh <- rectHull(train) predict(rh, test) plot(r) plot(rh, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(rh, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2)
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence data pts <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) train <- pts[1:12, ] test <- pts[13:20, ] rh <- rectHull(train) predict(rh, test) plot(r) plot(rh, border='red', lwd=2, add=TRUE) points(train, col='red', pch=20, cex=2) points(test, col='black', pch=20, cex=2) pr <- predict(rh, r, progress='') plot(pr) points(test, col='black', pch=20, cex=2) points(train, col='red', pch=20, cex=2)
Generate 'response plots', i.e. single variable response curves for a model
response(x, ...)
response(x, ...)
x |
Model object that inherits from 'DistModel', e.g. 'MaxEnt'. Also works for some other models (e.g. GLM) |
... |
Additional arguments. See Details |
var |
Variable to be plotted (if NULL, all variables will be plotted) | |
at |
Function to indicate at what level the other variables should be. E.g. median (the default), mean, min, max. Note that currently the function fails when 'mean' is used and one of the variables is a factor. Instead of a function you can also supply a single-row matrix with values for each of the variables. If NULL, partial response plots as are computed (this can take a lot of time). Partial response plots show, for each variable, the average prediction for all data points that were used to fit the model (or other data, if supplied), for 10 regularly spaced values of the variable. | |
range |
'pa' (default) or 'p'. Show responses for the range of the presence data (p) or presence and absence (background) data (pa) for models that use absence data. | |
expand |
percentage to expand the range of values with. Default is 10 | |
rug |
Logical. If TRUE (the default) a 'rug' of deciles is plotted on the horizontal axes) |
|
data |
data.frame with data to use, normally this is the data used to fit the model and does not needs to be supplied as they are available from the model object | |
fun |
predict function. The default is "predict"; but you may change this to e.g., function(x, y, ...) predict(x, y, type='response', ...). | |
... |
Additional graphical parameters | |
Used for the side-effect of a plot
Robert J. Hijmans
Determine "spatial sorting bias", or the difference between two point data sets in the average distance to the nearest point in a reference dataset.
ssb(p, a, reference, lonlat=TRUE, avg=TRUE)
ssb(p, a, reference, lonlat=TRUE, avg=TRUE)
p |
two column matrix (x, y) or (longitude/latitude) or SpatialPoints object, for point locations |
a |
two column matrix (x, y) or (longitude/latitude) or SpatialPoints object, for point locations |
reference |
as above for reference point locations to which distances are computed |
lonlat |
Logical. Use |
avg |
Logical. If |
matrix with two values. 'dp': the average distance from a point in p
to the nearest point in reference
and 'da': the average distance from a point in a
to the nearest point in reference
.
Distance is in meters if lonlat=TRUE
, and in mapunits (typically also meters) if lonlat=FALSE
Robert J. Hijmans
Hijmans, R.J., 2012. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null-model. Ecology 93: 679-688.
ref <- matrix(c(-54.5,-38.5, 2.5, -9.5, -45.5, 1.5, 9.5, 4.5, -10.5, -10.5), ncol=2) p <- matrix(c(-56.5, -30.5, -6.5, 14.5, -25.5, -48.5, 14.5, -2.5, 14.5, -11.5, -17.5, -11.5), ncol=2) r <- raster() extent(r) <- c(-110, 110, -45, 45) r[] <- 1 set.seed(0) a <- randomPoints(r, n=50) b <- ssb(p, a, ref) # distances in km b / 1000 # an index of spatial sorting bias (1 is no bias, near 0 is extreme bias) b[1] / b[2]
ref <- matrix(c(-54.5,-38.5, 2.5, -9.5, -45.5, 1.5, 9.5, 4.5, -10.5, -10.5), ncol=2) p <- matrix(c(-56.5, -30.5, -6.5, 14.5, -25.5, -48.5, 14.5, -2.5, 14.5, -11.5, -17.5, -11.5), ncol=2) r <- raster() extent(r) <- c(-110, 110, -45, 45) r[] <- 1 set.seed(0) a <- randomPoints(r, n=50) b <- ssb(p, a, ref) # distances in km b / 1000 # an index of spatial sorting bias (1 is no bias, near 0 is extreme bias) b[1] / b[2]
Find a threshold (cut-off) to transform model predictions (probabilities, distances, or similar values) to a binary score (presence or absence).
## S4 method for signature 'ModelEvaluation' threshold(x, stat='', sensitivity=0.9, ...)
## S4 method for signature 'ModelEvaluation' threshold(x, stat='', sensitivity=0.9, ...)
x |
A ModelEvaluation object (see |
stat |
character. To select a particular threshold (see section 'value' for possible values) |
sensitivity |
numeric between 0 and 1. For the fixed sensitivity threshold |
... |
Additional arguments. None implemented |
data.frame with the following columns:
kappa: the threshold at which kappa is highest ("max kappa")
spec_sens: the threshold at which the sum of the sensitivity (true positive rate) and specificity (true negative rate) is highest
no_omission: the highest threshold at which there is no omission
prevalence: modeled prevalence is closest to observed prevalence
equal_sens_spec: equal sensitivity and specificity
sensitivty: fixed (specified) sensitivity
Robert J. Hijmans and Diego Nieto-Lugilde
## See ?maxent for an example with real data. # this is a contrived example: # p has the predicted values for 50 known cases (locations) # with presence of the phenomenon (species) p <- rnorm(50, mean=0.7, sd=0.3) # b has the predicted values for 50 background locations (or absence) a <- rnorm(50, mean=0.4, sd=0.4) e <- evaluate(p=p, a=a) threshold(e)
## See ?maxent for an example with real data. # this is a contrived example: # p has the predicted values for 50 known cases (locations) # with presence of the phenomenon (species) p <- rnorm(50, mean=0.7, sd=0.3) # b has the predicted values for 50 background locations (or absence) a <- rnorm(50, mean=0.4, sd=0.4) e <- evaluate(p=p, a=a) threshold(e)
Create Voronoi polygons for a set of points. (These are also known Thiessen polygons, and Nearest Neighbor polygons; and the technique used is referred to as Delauny triangulation.)
## S4 method for signature 'ANY' voronoi(x, ext, eps=1e-09, ...)
## S4 method for signature 'ANY' voronoi(x, ext, eps=1e-09, ...)
x |
SpatialPoints* or two column matrix with x and y coordinates |
ext |
Extent. Can be used to set the corners of the rectangular window enclosing the triangulation. The default is the data range plus 10 percent. See |
eps |
Numerical tolerance used in triangulation. See |
... |
Additional arguments (none) |
SpatialPolygonsDataFrame
This method is based on the link[deldir]{deldir}
function by Rolf Turner and code by Carson Farmer
# points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) v <- voronoi(p) v
# points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) v <- voronoi(p) v
Voronoi polygons for presence/absence data
## S4 method for signature 'matrix,matrix' voronoiHull(p, a, ext=NULL, dissolve=FALSE, crs=NA, ...) ## S4 method for signature 'data.frame,data.frame' voronoiHull(p, a, ext=NULL, dissolve=FALSE, crs=NA, ...) ## S4 method for signature 'SpatialPoints,SpatialPoints' voronoiHull(p, a, ext=NULL, dissolve=FALSE, ...)
## S4 method for signature 'matrix,matrix' voronoiHull(p, a, ext=NULL, dissolve=FALSE, crs=NA, ...) ## S4 method for signature 'data.frame,data.frame' voronoiHull(p, a, ext=NULL, dissolve=FALSE, crs=NA, ...) ## S4 method for signature 'SpatialPoints,SpatialPoints' voronoiHull(p, a, ext=NULL, dissolve=FALSE, ...)
p |
Presence points. Two column |
a |
Absence points. Must be of the same class as |
ext |
Extent to limit or expand the area of interest |
dissolve |
Boolean. Dissolve (aggregate) polygons? |
crs |
character or CRS object. PROJ.4 notation coordinate reference system |
... |
Additional arguments passed to |
A VoronoiHull object (inherits from DistModel-class
)
This function is only correct when using a planar coordinate reference system (not longitude/latitude).
Robert J. Hijmans
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) # absence points a <- matrix(c(30, 23, 5, 5, 31, 33, 91, 63, 60, 88, 93, 97, 65, 68, 85, 97, 35, 32, 29, 55, 3, 8, 19, 71, 49, 36, 69, 41, 20, 28, 18, 9, 5, 9, 25, 71, 8, 32, 46, 60), ncol=2) v <- voronoiHull(p, a) x <- predict(r, v) ## Not run: plot(x) points(p, col='black', pch=20, cex=2) points(a, col='red', pch=20, cex=2) ## End(Not run)
r <- raster(system.file("external/rlogo.grd", package="raster")) # presence points p <- matrix(c(17, 42, 85, 70, 19, 53, 26, 84, 84, 46, 48, 85, 4, 95, 48, 54, 66, 74, 50, 48, 28, 73, 38, 56, 43, 29, 63, 22, 46, 45, 7, 60, 46, 34, 14, 51, 70, 31, 39, 26), ncol=2) # absence points a <- matrix(c(30, 23, 5, 5, 31, 33, 91, 63, 60, 88, 93, 97, 65, 68, 85, 97, 35, 32, 29, 55, 3, 8, 19, 71, 49, 36, 69, 41, 20, 28, 18, 9, 5, 9, 25, 71, 8, 32, 46, 60), ncol=2) v <- voronoiHull(p, a) x <- predict(r, v) ## Not run: plot(x) points(p, col='black', pch=20, cex=2) points(a, col='red', pch=20, cex=2) ## End(Not run)