Title: | Spatiotemporal Resampling Methods for 'mlr3' |
---|---|
Description: | Extends the mlr3 machine learning framework with spatio-temporal resampling methods to account for the presence of spatiotemporal autocorrelation (STAC) in predictor variables. STAC may cause highly biased performance estimates in cross-validation if ignored. A JSS article is available at <doi:10.18637/jss.v111.i07>. |
Authors: | Patrick Schratz [aut, cre], Marc Becker [aut], Jannes Muenchow [ctb], Michel Lang [ctb] |
Maintainer: | Patrick Schratz <[email protected]> |
License: | LGPL-3 |
Version: | 2.3.2 |
Built: | 2024-11-29 16:18:17 UTC |
Source: | CRAN |
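As a rough illustration of the bias described above, the following sketch compares a random cross-validation with a coordinate-based spatial one on the ecuador task shipped with the package. The learner (classif.rpart), the measure (classif.ce), the number of folds and the seed are choices made for this example only, and the size of the gap between the two estimates will depend on data and learner.

```r
if (mlr3misc::require_namespaces(c("sf", "rpart"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  set.seed(42)
  task = tsk("ecuador")
  learner = lrn("classif.rpart")

  # random partitioning: ignores the location of the observations
  rr_random = resample(task, learner, rsmp("cv", folds = 5))

  # spatial partitioning: folds are formed by clustering the coordinates
  rr_spatial = resample(task, learner, rsmp("spcv_coords", folds = 5))

  # aggregated classification error of both resamplings
  ce_random = unname(rr_random$aggregate(msr("classif.ce")))
  ce_spatial = unname(rr_spatial$aggregate(msr("classif.ce")))
  print(c(random = ce_random, spatial = ce_spatial))
}
```

If the data are spatially autocorrelated, the random estimate is typically the more optimistic of the two.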
Book on mlr3: https://mlr3book.mlr-org.com
mlr3book section about spatiotemporal data: https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#spatiotemp-cv
Package vignettes: https://mlr3spatiotempcv.mlr-org.com/dev/articles/
Use cases and examples: https://mlr3gallery.mlr-org.com
More classification and regression tasks: mlr3data
More classification and regression learners: mlr3learners
Even more learners: https://github.com/mlr-org/mlr3extralearners
Preprocessing and machine learning pipelines: mlr3pipelines
Tuning of hyperparameters: mlr3tuning
Visualizations for many mlr3 objects: mlr3viz
Survival analysis and probabilistic regression: mlr3proba
Cluster analysis: mlr3cluster
Feature selection filters: mlr3filters
Feature selection wrappers: mlr3fselect
Interface to real (out-of-memory) databases: mlr3db
Performance measures as plain functions: mlr3measures
Parallelization framework: future
Progress bars: progressr
Maintainer: Patrick Schratz (ORCID)
Authors:
Marc Becker (ORCID)
Other contributors:
Jannes Muenchow (ORCID) [contributor]
Michel Lang (ORCID) [contributor]
Schratz P, Muenchow J, Iturritxa E, Richter J, Brenning A (2019). “Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data.” Ecological Modelling, 406, 109–120. doi:10.1016/j.ecolmodel.2019.06.002.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
Meyer H, Reudenbach C, Hengl T, Katurji M, Nauss T (2018). “Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation.” Environmental Modelling & Software, 101, 1–9. doi:10.1016/j.envsoft.2017.12.001.
Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515–524. doi:10.1145/584792.584877.
Useful links:
Report bugs at https://github.com/mlr-org/mlr3spatiotempcv/issues
Convert an object to a TaskClassifST. This is an S3 generic for the following objects:
TaskClassifST: Ensure the identity.
data.frame() and mlr3::DataBackend: Provides an alternative to the constructor of TaskClassifST.
sf::sf: Extracts spatial metadata before construction.
mlr3::TaskRegr: Calls mlr3::convert_task().
as_task_classif_st(x, ...)

## S3 method for class 'TaskClassifST'
as_task_classif_st(x, clone = FALSE, ...)

## S3 method for class 'data.frame'
as_task_classif_st(
  x,
  target,
  id = deparse(substitute(x)),
  positive = NULL,
  coordinate_names,
  crs = NA_character_,
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)

## S3 method for class 'DataBackend'
as_task_classif_st(
  x,
  target,
  id = deparse(substitute(x)),
  positive = NULL,
  coordinate_names,
  crs,
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)

## S3 method for class 'sf'
as_task_classif_st(
  x,
  target = NULL,
  id = deparse(substitute(x)),
  positive = NULL,
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)
Arguments: x (any), ... (any), clone, target, id, positive, coordinate_names, crs, coords_as_features, label
if (mlr3misc::require_namespaces(c("sf"), quietly = TRUE)) {
  library("mlr3")
  data("ecuador", package = "mlr3spatiotempcv")

  # data.frame
  as_task_classif_st(ecuador, target = "slides", positive = "TRUE",
    coords_as_features = FALSE,
    crs = "+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs",
    coordinate_names = c("x", "y"))

  # sf
  ecuador_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)
  as_task_classif_st(ecuador_sf, target = "slides", positive = "TRUE")
}
Convert an object to a TaskRegrST. This is an S3 generic, specialized for at least the following objects:
TaskRegrST: Ensure the identity.
data.frame() and mlr3::DataBackend: Provides an alternative to the constructor of TaskRegrST.
sf::sf: Extracts spatial metadata before construction.
as_task_regr_st(x, ...)

## S3 method for class 'TaskRegrST'
as_task_regr_st(x, clone = FALSE, ...)

## S3 method for class 'data.frame'
as_task_regr_st(
  x,
  target,
  id = deparse(substitute(x)),
  coordinate_names,
  crs = NA_character_,
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)

## S3 method for class 'DataBackend'
as_task_regr_st(
  x,
  target,
  id = deparse(substitute(x)),
  positive = NULL,
  coordinate_names,
  crs,
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)

## S3 method for class 'sf'
as_task_regr_st(
  x,
  target = NULL,
  id = deparse(substitute(x)),
  coords_as_features = FALSE,
  label = NA_character_,
  ...
)

## S3 method for class 'TaskClassifST'
as_task_regr_st(
  x,
  target = NULL,
  drop_original_target = FALSE,
  drop_levels = TRUE,
  ...
)
Arguments: x (any), target, drop_original_target, drop_levels, ... (any), clone, id, coordinate_names, crs, coords_as_features, label, positive
if (mlr3misc::require_namespaces(c("sf"), quietly = TRUE)) {
  library("mlr3")
  data("cookfarm_mlr3", package = "mlr3spatiotempcv")

  # data.frame
  as_task_regr_st(cookfarm_mlr3, target = "PHIHOX",
    coords_as_features = FALSE, crs = 26911,
    coordinate_names = c("x", "y"))

  # sf
  cookfarm_sf = sf::st_as_sf(cookfarm_mlr3, coords = c("x", "y"), crs = 26911)
  as_task_regr_st(cookfarm_sf, target = "PHIHOX")
}
Generic S3 plot() and autoplot() (ggplot2) methods.
## S3 method for class 'ResamplingCustomCV'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingCustomCV'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, sample_fold_n, ... (passed on), x
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf", "patchwork"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  breaks = quantile(task$data()$dem, seq(0, 1, length = 6))
  zclass = cut(task$data()$dem, breaks, include.lowest = TRUE)

  resampling = rsmp("custom_cv")
  resampling$instantiate(task, f = zclass)

  autoplot(resampling, task) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
  autoplot(resampling, task, fold_id = 1)
  autoplot(resampling, task, fold_id = c(1, 2)) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
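The example above builds its folds from a factor of elevation classes. The following base R sketch shows the idea on synthetic data: each factor level becomes the test set of one fold and the remaining rows form the training set. The elevation values and the names test_sets and train_sets are made up for illustration; this is not the package's internal code.

```r
set.seed(1)

# hypothetical elevation values standing in for the "dem" feature
elev = runif(100, 1700, 3100)

# five quantile-based elevation classes, as in the example above
breaks = quantile(elev, seq(0, 1, length.out = 6))
zclass = cut(elev, breaks, include.lowest = TRUE)

# one fold per factor level: rows of that level form the test set
test_sets = split(seq_along(zclass), zclass)

# all other rows form the corresponding training set
train_sets = lapply(test_sets, function(test) setdiff(seq_along(zclass), test))

# number of test observations per fold
lengths(test_sets)
```

Because the classes come from quantiles, the folds have roughly equal size, but each test fold covers an elevation band the model has not seen during training.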
Generic S3 plot() and autoplot() (ggplot2) methods.
## S3 method for class 'ResamplingCV'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedCV'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingCV'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedCV'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, sample_fold_n, ... (passed on), repeats_id, x
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf", "patchwork", "ggtext", "ggsci"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("cv")
  resampling$instantiate(task)

  autoplot(resampling, task) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
  autoplot(resampling, task, fold_id = 1)
  autoplot(resampling, task, fold_id = c(1, 2)) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSpCVBlock'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  show_blocks = FALSE,
  show_labels = FALSE,
  sample_fold_n = NULL,
  label_size = 2,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVBlock'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  show_blocks = FALSE,
  show_labels = FALSE,
  sample_fold_n = NULL,
  label_size = 2,
  ...
)

## S3 method for class 'ResamplingSpCVBlock'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVBlock'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, show_blocks, show_labels, sample_fold_n, label_size, ... (passed on), repeats_id, x
By default a plot is returned; if fold_id is set, a gridded plot is created. If plot_as_grid = FALSE, a list of plot objects is returned. This can be used to align the plots individually.
When no single fold is selected, the ggsci::scale_color_ucscgb() palette is used to display all partitions. If you want to change the colors, call <plot> + <color-palette>().
ggplot2::ggplot() or list of ggplot2 objects.
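The two options described above, returning a list of plots and swapping the color palette, can be sketched as follows. The use of patchwork::wrap_plots() for the manual arrangement and of ggplot2::scale_color_viridis_d() as the replacement palette are choices made for this illustration; any other layout tool or discrete color scale should work in the same way.

```r
if (mlr3misc::require_namespaces(c("sf", "blockCV", "patchwork"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_block", range = 1000L)
  resampling$instantiate(task)

  # plot_as_grid = FALSE: one plot object per requested fold
  plot_list = autoplot(resampling, task, fold_id = c(1, 2), plot_as_grid = FALSE)

  # arrange the individual plots manually, here stacked in one column
  stacked = patchwork::wrap_plots(plot_list, ncol = 1)

  # all partitions in one plot, with the default palette replaced
  recolored = autoplot(resampling, task) + ggplot2::scale_color_viridis_d()
}
```

Adding a second color scale makes ggplot2 print a message that the existing scale is being replaced; this is expected here.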
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_block", range = 1000L)
  resampling$instantiate(task)

  ## list of ggplot2 resamplings
  plot_list = autoplot(resampling, task,
    crs = 4326,
    fold_id = c(1, 2), plot_as_grid = FALSE)

  ## Visualize all partitions
  autoplot(resampling, task) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))

  ## Visualize the train/test split of a single fold
  autoplot(resampling, task, fold_id = 1) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))

  ## Visualize train/test splits of multiple folds
  autoplot(resampling, task,
    fold_id = c(1, 2),
    show_blocks = TRUE) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSpCVBuffer'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  show_omitted = FALSE,
  ...
)

## S3 method for class 'ResamplingSpCVBuffer'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, show_omitted, ... (passed on), x
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_buffer", theRange = 1000)
  resampling$instantiate(task)

  ## single fold
  autoplot(resampling, task, fold_id = 1) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))

  ## multiple folds
  autoplot(resampling, task, fold_id = c(1, 2)) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods.
## S3 method for class 'ResamplingSpCVCoords'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVCoords'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingSpCVCoords'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVCoords'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, sample_fold_n, ... (passed on), repeats_id, x
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_coords")
  resampling$instantiate(task)

  autoplot(resampling, task) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
  autoplot(resampling, task, fold_id = 1)
  autoplot(resampling, task, fold_id = c(1, 2)) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSpCVDisc'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  repeats_id = NULL,
  show_omitted = FALSE,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVDisc'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  show_omitted = FALSE,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingSpCVDisc'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVDisc'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, repeats_id, show_omitted, sample_fold_n, ... (passed on), x
This method requires the argument fold_id to be set; no plot containing all partitions can be created. This is because the method does not use all observations but only a subset of them (many observations are left out). Hence, the train and test sets of one fold are not reused in other folds, as they are in other methods, and plotting them without a train/test indicator would not make sense.
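Why a single fold leaves observations out can be seen in a small base R sketch of the disc-with-buffer idea: a disc around a randomly chosen center forms the test set, a ring of width buffer around it is dropped, and only the points beyond the ring are used for training. The coordinates are synthetic, the radius and buffer values mirror the example in this section, and the code is an illustration under these assumptions rather than the package's implementation.

```r
set.seed(42)

# synthetic point coordinates in a 1000 x 1000 m square
coords = cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))
radius = 200
buffer = 200

# pick one observation as the disc center for this fold
center = coords[sample(nrow(coords), 1), ]

# Euclidean distance of every observation to the center
d = sqrt((coords[, "x"] - center["x"])^2 + (coords[, "y"] - center["y"])^2)

# inside the disc: test; inside the buffer ring: omitted; beyond it: train
role = ifelse(d <= radius, "test",
  ifelse(d > radius + buffer, "train", "omitted"))

table(role)
```

A different center gives a different three-way split, which is why each fold needs its own train/test/omitted display.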
This method has both a 2D and a 3D plotting method. The 2D method returns a ggplot with x and y axes representing the spatial coordinates. The 3D method uses plotly to create an interactive 3D plot. Set plot3D = TRUE to use the 3D method.
Note that spatiotemporal datasets usually suffer from overplotting in 2D mode.
mlr3book chapter on "Spatial Analysis"
Vignette Spatiotemporal Visualization.
if (mlr3misc::require_namespaces("sf", quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_disc",
    folds = 5, radius = 200L, buffer = 200L)
  resampling$instantiate(task)

  autoplot(resampling, task,
    fold_id = 1,
    show_omitted = TRUE, size = 0.7) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods.
## S3 method for class 'ResamplingSpCVEnv'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVEnv'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingSpCVEnv'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVEnv'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, sample_fold_n, ... (passed on), repeats_id, x
mlr3book chapter on "Spatial Analysis"
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_env", folds = 4, features = "dem")
  resampling$instantiate(task)

  autoplot(resampling, task) +
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
  autoplot(resampling, task, fold_id = 1)
  autoplot(resampling, task, fold_id = c(1, 2)) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSpCVKnndm'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  repeats_id = NULL,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVKnndm'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingSpCVKnndm'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVKnndm'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, repeats_id, sample_fold_n, ... (passed on), x
This method requires the argument fold_id to be set; no plot containing all partitions can be created. This is because the method does not use all observations but only a subset of them (many observations are left out). Hence, the train and test sets of one fold are not reused in other folds, as they are in other methods, and plotting them without a train/test indicator would not make sense.
This method has both a 2D and a 3D plotting method. The 2D method returns a ggplot with x and y axes representing the spatial coordinates. The 3D method uses plotly to create an interactive 3D plot. Set plot3D = TRUE to use the 3D method.
Note that spatiotemporal datasets usually suffer from overplotting in 2D mode.
mlr3book chapter on "Spatial Analysis"
Vignette Spatiotemporal Visualization.
if (mlr3misc::require_namespaces(c("CAST", "sf"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  points = sf::st_as_sf(task$coordinates(), crs = task$crs, coords = c("x", "y"))
  modeldomain = sf::st_as_sfc(sf::st_bbox(points))

  resampling = rsmp("spcv_knndm",
    folds = 5, modeldomain = modeldomain)
  resampling$instantiate(task)

  autoplot(resampling, task,
    fold_id = 1, size = 0.7) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSpCVTiles'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  repeats_id = NULL,
  show_omitted = FALSE,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSpCVTiles'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  show_omitted = FALSE,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingSpCVTiles'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSpCVTiles'
plot(x, ...)
Arguments: object, task, fold_id, plot_as_grid, train_color, test_color, repeats_id, show_omitted, sample_fold_n, ... (passed on), x
Specific combinations of arguments of "spcv_tiles" remove some observations from the resampling; in those cases show_omitted = TRUE displays the removed observations, otherwise it has no visible effect.
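One way observations can drop out of a tile-based partitioning is sketched below in base R. The sketch assumes that tiles holding fewer than a minimum number of observations are discarded instead of being merged into a neighbor, which is how it interprets reassign = FALSE from the example in this section. The threshold min_n, the synthetic coordinates and the fixed grid extent are assumptions for illustration and do not describe the package's actual rule.

```r
set.seed(7)

# synthetic coordinates: the easternmost strip (x > 75) is sparsely sampled
x = c(runif(290, 0, 75), runif(10, 75, 100))
y = runif(300, 0, 90)

# 4 columns and 3 rows of rectangular tiles, as in nsplit = c(4L, 3L)
nsplit = c(4, 3)
col_id = cut(x, breaks = seq(0, 100, length.out = nsplit[1] + 1),
  include.lowest = TRUE, labels = FALSE)
row_id = cut(y, breaks = seq(0, 90, length.out = nsplit[2] + 1),
  include.lowest = TRUE, labels = FALSE)
tile = interaction(col_id, row_id, drop = TRUE)

# hypothetical rule: tiles with fewer than min_n observations are dropped
min_n = 15
counts = table(tile)
small_tiles = names(counts)[counts < min_n]

# observations in dropped tiles belong to no fold (NA)
tile_kept = tile
tile_kept[tile %in% small_tiles] = NA

table(tile_kept, useNA = "ifany")
```

The NA group corresponds to what show_omitted = TRUE would make visible in a plot.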
mlr3book chapter on "Spatial Analysis"
Vignette Spatiotemporal Visualization.
if (mlr3misc::require_namespaces(c("sf", "sperrorest"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  task = tsk("ecuador")
  resampling = rsmp("spcv_tiles",
    nsplit = c(4L, 3L), reassign = FALSE)
  resampling$instantiate(task)

  autoplot(resampling, task,
    fold_id = 1,
    show_omitted = TRUE, size = 0.7) *
    ggplot2::scale_x_continuous(breaks = seq(-79.085, -79.055, 0.01))
}
Generic S3 plot() and autoplot() (ggplot2) methods to visualize mlr3 spatiotemporal resampling objects.
## S3 method for class 'ResamplingSptCVCstf'
autoplot(
  object,
  task,
  fold_id = NULL,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  repeats_id = NULL,
  tickformat_date = "%Y-%m",
  nticks_x = 3,
  nticks_y = 3,
  point_size = 3,
  axis_label_fontsize = 11,
  static_image = FALSE,
  show_omitted = FALSE,
  plot3D = NULL,
  plot_time_var = NULL,
  sample_fold_n = NULL,
  ...
)

## S3 method for class 'ResamplingRepeatedSptCVCstf'
autoplot(
  object,
  task,
  fold_id = NULL,
  repeats_id = 1,
  plot_as_grid = TRUE,
  train_color = "#0072B5",
  test_color = "#E18727",
  tickformat_date = "%Y-%m",
  nticks_x = 3,
  nticks_y = 3,
  point_size = 3,
  axis_label_fontsize = 11,
  plot3D = NULL,
  plot_time_var = NULL,
  ...
)

## S3 method for class 'ResamplingSptCVCstf'
plot(x, ...)

## S3 method for class 'ResamplingRepeatedSptCVCstf'
plot(x, ...)
object |
|
task |
|
fold_id |
|
plot_as_grid |
|
train_color |
|
test_color |
|
repeats_id |
|
tickformat_date |
|
nticks_x |
|
nticks_y |
|
point_size |
|
axis_label_fontsize |
|
static_image |
|
show_omitted |
|
plot3D |
|
plot_time_var |
|
sample_fold_n |
|
... |
Passed down to |
x |
|
This method requires the argument fold_id to be set.
No plot showing all folds in one plot can be created.
This is because the LLTO method does not make use of all observations but only
a subset of them (many observations are omitted).
Hence, train and test sets of one fold are not re-used in other folds as in
other methods and plotting these without a train/test indicator would be
misleading.
This method has both a 2D and a 3D plotting method.
The 2D method returns a ggplot with x and y axes representing the spatial
coordinates.
The 3D method uses plotly to create an interactive 3D plot.
Set plot3D = TRUE to use the 3D method.
Note that spatiotemporal datasets usually suffer from overplotting in 2D mode.
mlr3book chapter on "Spatiotemporal Visualization"
Vignette Spatiotemporal Visualization.
if (mlr3misc::require_namespaces(c("sf", "plotly"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task_st = tsk("cookfarm_mlr3")
  task_st$set_col_roles("SOURCEID", "space")
  task_st$set_col_roles("Date", "time")
  resampling = rsmp("sptcv_cstf", folds = 5)
  resampling$instantiate(task_st)

  # with both `"space"` and `"time"` column roles set (LLTO), the omitted
  # observations per fold can be shown by setting `show_omitted = TRUE`
  autoplot(resampling, task_st, fold_id = 1, show_omitted = TRUE)
}
This function creates spatially separated folds based on a distance or on a number of rows and/or columns.
It assigns blocks to the training and testing folds randomly, systematically or in a checkerboard pattern. The distance (size) should be in metres, regardless of the unit of the reference system of the input data (for more information see the details section). By default, the function creates blocks according to the extent and shape of the spatial sample data (x, e.g. the species occurrence). Alternatively, blocks can be created based on r, assuming that the user has considered the landscape for the given species and case study.
Blocks can also be offset so the origin is not at the outer corner of the rasters.
Instead of providing a distance, the blocks can also be created by specifying a number of rows and/or columns, which divides the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012). Finally, the blocks can be specified by a user-defined spatial polygon layer.
To maintain consistency, all functions in this package use metres as their unit of measurement. However, when the input map has a geographic coordinate system (in decimal degrees), the block size is calculated by dividing the size parameter by deg_to_metre (which defaults to 111325 metres, the standard distance of one degree of latitude on the Equator).
In reality, this value varies by a factor of the cosine of the latitude. So, an alternative sensible value could be cos(mean(sf::st_bbox(x)[c(2,4)]) * pi/180) * 111325.
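The latitude correction described above can be sketched in base R. This is a minimal illustration, not part of blockCV: the bounding box values are made up, and with real data they would come from sf::st_bbox(x).

```r
# Hypothetical bounding box of the sample data in decimal degrees
# (made-up values; with sf data, take them from sf::st_bbox(x)).
bbox = c(xmin = -79.09, ymin = -4.01, xmax = -79.05, ymax = -3.93)

# Default conversion factor quoted in the text above.
metres_per_degree = 111325

# Mean latitude of the study area, converted to radians.
mean_lat_rad = mean(bbox[c("ymin", "ymax")]) * pi / 180

# Latitude-corrected conversion factor, following the formula in the text.
deg_to_metre = cos(mean_lat_rad) * metres_per_degree

# A block size of 5000 m expressed in decimal degrees.
size_m = 5000
size_deg = size_m / deg_to_metre
print(c(deg_to_metre = deg_to_metre, size_deg = size_deg))
```

Close to the Equator the corrected factor stays near the default; the difference grows with latitude, which is where passing a custom deg_to_metre matters most.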
The offset can be used to change the spatial position of the blocks. It can also be used to assess the sensitivity of analysis results to shifting in the blocking arrangements.
These options are available when size is defined. By default the region is located in the middle of the blocks and by setting the offsets, the blocks will shift.
Roberts et al. (2017) suggest that blocks should be substantially bigger than the range of spatial autocorrelation (in model residuals) to obtain realistic error estimates, while a buffer with the size of the spatial autocorrelation range would result in a good estimation of error. This is because of the so-called edge effect (O'Sullivan & Unwin, 2014), whereby points located on the edges of the blocks of opposite sets are not separated spatially. Blocking with a buffering strategy overcomes this issue (see cv_buffer).
By default blockCV::cv_spatial() does not allow the creation of multiple repetitions. mlr3spatiotempcv adds support for this when using the size argument for fold creation. When supplying a vector with one value per repetition for argument size, these different settings will be used to create folds which differ among the repetitions.
Multiple repetitions are not possible when using the "rows & cols" approach because the created folds will always be the same.
The 'Description' and 'Details' fields are inherited from the respective upstream function.
For a list of available arguments, please see blockCV::cv_spatial.
blockCV >= 3.0.0 changed the argument names of the implementation. For backward compatibility, mlr3spatiotempcv is still using the old ones.
The following list shows the mapping between blockCV < 3.0.0 and blockCV >= 3.0.0:
range -> size
rasterLayer -> r
speciesData -> points
showBlocks -> plot
cols and rows -> rows_cols
The default of argument hexagon is different in mlr3spatiotempcv (FALSE instead of TRUE) to create square blocks instead of hexagonal blocks by default.
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSpCVBlock
blocks
sf | list of sf objects
Polygons (sf
objects) as returned by blockCV which grouped
observations into partitions.
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "spatial block" repeated resampling instance.
For a list of available arguments, please see blockCV::cv_spatial.
ResamplingRepeatedSpCVBlock$new(id = "repeated_spcv_block")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVBlock$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVBlock$repeats(iters)
iters
integer()
Iteration number.
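The translation performed by folds() and repeats() can be sketched in base R. This sketch assumes that iterations are numbered repetition by repetition with a fixed number of folds per repetition, as in mlr3's repeated cross-validation; the helper names iter_to_fold() and iter_to_repeat() are hypothetical and not part of the package.

```r
folds = 3L
repeats = 2L
iters = seq_len(folds * repeats)

# Hypothetical helpers: map a running iteration number to its fold
# and repetition, assuming `folds` iterations per repetition.
iter_to_fold = function(iters, folds) ((iters - 1L) %% folds) + 1L
iter_to_repeat = function(iters, folds) ((iters - 1L) %/% folds) + 1L

mapping = data.frame(
  iter = iters,
  fold = iter_to_fold(iters, folds),
  rep = iter_to_repeat(iters, folds)
)
print(mapping)
```

With 3 folds and 2 repeats, iterations 1 to 3 belong to the first repetition and iterations 4 to 6 to the second.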
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVBlock$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVBlock$clone(deep = FALSE)
deep
Whether to make a deep clone.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
## Not run:
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task = tsk("diplodia")

  # Instantiate Resampling
  rrcv = rsmp("repeated_spcv_block",
    folds = 3, repeats = 2, range = c(5000L, 10000L))
  rrcv$instantiate(task)

  # Iterations and their fold/repetition numbers:
  rrcv$iters
  rrcv$folds(1:6)
  rrcv$repeats(1:6)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # check that no observation is in both sets
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance # table
}
## End(Not run)
Splits data by clustering in the coordinate space.
See the upstream implementation at sperrorest::partition_kmeans() and Brenning (2012) for further information.
Universal partitioning method that splits the data in the coordinate space.
Useful for spatially homogeneous datasets that cannot be split well with rectangular approaches like ResamplingSpCVBlock.
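The idea of partitioning by clustering the coordinates can be sketched with base R's stats::kmeans(). This is only an illustration on simulated points, not the package's or sperrorest's implementation; the data and variable names are made up.

```r
set.seed(1)

# Hypothetical point coordinates in a projected CRS (metres).
n = 200
coords = data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))

# Cluster the coordinates into k groups; each cluster becomes one fold.
k = 4
cl = stats::kmeans(coords, centers = k, nstart = 5, iter.max = 50)
fold = cl$cluster

# Fold 1 as test set, all remaining folds as training set.
test_set = which(fold == 1L)
train_set = which(fold != 1L)
print(table(fold))
```

Because clusters follow the point pattern rather than a fixed grid, fold sizes are usually unequal, which is worth checking with table(fold) before evaluating a model.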
folds
(integer(1))
Number of folds.
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSpCVCoords
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "coordinate-based" repeated resampling instance.
For a list of available arguments, please see sperrorest::partition_kmeans.
ResamplingRepeatedSpCVCoords$new(id = "repeated_spcv_coords")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVCoords$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVCoords$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVCoords$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVCoords$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
library(mlr3)
library(mlr3spatiotempcv)
task = tsk("diplodia")

# Instantiate Resampling
rrcv = rsmp("repeated_spcv_coords", folds = 3, repeats = 5)
rrcv$instantiate(task)

# Iterations and their fold/repetition numbers:
rrcv$iters
rrcv$folds(1:6)
rrcv$repeats(1:6)

# Individual sets:
rrcv$train_set(1)
rrcv$test_set(1)
# check that no observation is in both sets
intersect(rrcv$train_set(1), rrcv$test_set(1))

# Internal storage:
rrcv$instance # table
(sperrorest) Repeated spatial "disc" resampling
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSpCVDisc
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "Spatial 'Disc' resampling" resampling instance.
For a list of available arguments, please see sperrorest::partition_disc.
ResamplingRepeatedSpCVDisc$new(id = "repeated_spcv_disc")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVDisc$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVDisc$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVDisc$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVDisc$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
library(mlr3)
library(mlr3spatiotempcv)
task = tsk("ecuador")

# Instantiate Resampling
rrcv = rsmp("repeated_spcv_disc",
  folds = 3L, repeats = 2, radius = 200L, buffer = 200L)
rrcv$instantiate(task)

# Iterations and their fold/repetition numbers:
rrcv$iters
rrcv$folds(1:6)
rrcv$repeats(1:6)

# Individual sets:
rrcv$train_set(1)
rrcv$test_set(1)
# check that no observation is in both sets
intersect(rrcv$train_set(1), rrcv$test_set(1))

# Internal storage:
rrcv$instance # table
Splits data by clustering in the feature space.
See the upstream implementation at blockCV::cv_cluster() and Valavi et al. (2018) for further information.
Useful when the dataset is supposed to be split on environmental information which is present in features. The method allows for a combination of multiple features for clustering.
The input of raster images directly as in blockCV::cv_cluster() is not supported. See mlr3spatial and its raster DataBackends for such support in mlr3.
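Clustering in the feature space can be illustrated with a small base R sketch. The feature names and values below are invented, and standardising the features before clustering is an assumption of this sketch rather than a statement about the upstream implementation.

```r
set.seed(1)

# Hypothetical environmental features on very different scales.
n = 150
features = data.frame(
  elevation = rnorm(n, mean = 2000, sd = 300),
  slope = runif(n, min = 0, max = 45)
)

# Standardise so that no single feature dominates the Euclidean distance
# (an assumption of this sketch).
scaled = scale(features)

# Cluster in feature space; each cluster becomes one fold.
k = 3
fold = stats::kmeans(scaled, centers = k, nstart = 5, iter.max = 50)$cluster

# Each fold serves once as the test set.
splits = lapply(seq_len(k), function(i) {
  list(train = which(fold != i), test = which(fold == i))
})
print(lengths(lapply(splits, `[[`, "test")))
```

Observations in one test fold are environmentally similar to each other, so the model is evaluated on conditions that are under-represented in the corresponding training set.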
folds
(integer(1))
Number of folds.
features
(character())
The features to use for clustering.
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSpCVEnv
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create an "Environmental Block" repeated resampling instance.
For a list of available arguments, please see blockCV::cv_cluster.
ResamplingRepeatedSpCVEnv$new(id = "repeated_spcv_env")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVEnv$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVEnv$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVEnv$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVEnv$clone(deep = FALSE)
deep
Whether to make a deep clone.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task = tsk("ecuador")

  # Instantiate Resampling
  rrcv = rsmp("repeated_spcv_env", folds = 4, repeats = 2)
  rrcv$instantiate(task)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # check that no observation is in both sets
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance
}
This function implements the kNNDM algorithm and returns the necessary indices to perform a k-fold NNDM CV for map validation.
knndm is a k-fold version of NNDM LOO CV for medium and large datasets. Briefly, the algorithm tries to find a k-fold configuration such that the integral of the absolute differences (Wasserstein W statistic) between the empirical nearest neighbour distance distribution function between the test and training data during CV (Gj*), and the empirical nearest neighbour distance distribution function between the prediction and training points (Gij), is minimised. It does so by performing clustering of the training points' coordinates for different numbers of clusters that range from k to N (number of observations), merging them into k final folds, and selecting the configuration with the lowest W.
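The W statistic that kNNDM minimises can be sketched in base R on simulated points. This is not the CAST implementation: it only compares two fixed fold assignments (random folds and k-means folds) instead of searching over cluster configurations, and the helper names nn_dist(), nn_dist_cv() and w_stat() are hypothetical.

```r
set.seed(42)

# Hypothetical data: training points cover only part of the prediction area.
train = cbind(x = runif(100, 0, 50), y = runif(100, 0, 50))
pred = as.matrix(expand.grid(x = seq(5, 95, 10), y = seq(5, 95, 10)))

# Distance from each row of `from` to its nearest row of `to` (basis of Gij).
nn_dist = function(from, to) {
  apply(from, 1, function(p) min(sqrt(colSums((t(to) - p)^2))))
}

# Distance from each point to its nearest point in another fold (basis of Gj*).
nn_dist_cv = function(pts, fold) {
  d = as.matrix(dist(pts))
  vapply(seq_len(nrow(pts)), function(i) min(d[i, fold != fold[i]]), numeric(1))
}

# Integral of the absolute difference between two empirical CDFs.
# Both ECDFs are step functions, so the integral is an exact sum over
# the intervals between consecutive pooled values.
w_stat = function(a, b) {
  grid = sort(unique(c(a, b)))
  gap = abs(ecdf(a)(grid) - ecdf(b)(grid))
  sum(gap[-length(grid)] * diff(grid))
}

k = 5
random_folds = sample(rep_len(seq_len(k), nrow(train)))
cluster_folds = stats::kmeans(train, centers = k, nstart = 5, iter.max = 50)$cluster

Gij = nn_dist(pred, train)
W_random = w_stat(nn_dist_cv(train, random_folds), Gij)
W_cluster = w_stat(nn_dist_cv(train, cluster_folds), Gij)
print(c(W_random = W_random, W_cluster = W_cluster))
```

The configuration with the lower W has cross-validation distances that resemble the prediction-to-training distances more closely, which is the selection criterion described above.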
Using a projected CRS in 'knndm' has large computational advantages since fast nearest neighbour search can be done via the 'FNN' package, while working with geographic coordinates requires computing the full spherical distance matrices. As a clustering algorithm, 'kmeans' can only be used for projected CRS while 'hierarchical' can work with both projected and geographical coordinates, though it requires calculating the full distance matrix of the training points even for a projected CRS.
To select between clustering algorithms and the number of folds 'k', different 'knndm' configurations can be run and compared; the one with the lower W statistic offers the better match. W statistics between 'knndm' runs are comparable as long as 'tpoints' and 'predpoints' or 'modeldomain' stay the same.
Map validation with 'knndm' should be done using 'CAST::global_validation', i.e. by stacking all out-of-sample predictions and evaluating them all at once. The reasons behind this are 1) the resulting folds can be unbalanced and 2) nearest neighbour functions are constructed and matched using all CV folds simultaneously.
If training data points are very clustered with respect to the prediction area and the presented 'knndm' configuration still shows signs of Gj* > Gij, there are several things that can be tried. First, increase the 'maxp' parameter; this may help to control for strong clustering (at the cost of having unbalanced folds). Secondly, decrease the number of final folds 'k', which may help to have larger clusters.
The 'modeldomain' is either a sf polygon that defines the prediction area, or alternatively a SpatRaster out of which a polygon, transformed into the CRS of the training points, is defined as the outline of all non-NA cells. Then, the function takes a regular point sample (amount defined by 'samplesize') from the spatial extent. As an alternative use 'predpoints' instead of 'modeldomain', if you have already defined the prediction locations (e.g. raster pixel centroids). When using either 'modeldomain' or 'predpoints', we advise to plot the study area polygon and the training/prediction points as a previous step to ensure they are aligned.
'knndm' can also be performed in the feature space by setting 'space' to "feature". Euclidean distances or Mahalanobis distances can be used for distance calculation, but only Euclidean distances are tested. In this case, nearest neighbour distances are calculated in n-dimensional feature space rather than in geographical space. 'tpoints' and 'predpoints' can be data frames or sf objects containing the values of the features. Note that the column names of 'tpoints' and 'predpoints' must be the same. 'predpoints' can also be missing, if 'modeldomain' is of class SpatRaster. In this case, the values of the SpatRaster will be extracted to the 'predpoints'. In the case of any categorical features, Gower distances will be used to calculate the nearest neighbour distances [Experimental]. If categorical features are present, and 'clustering' = "kmeans", K-Prototype clustering will be performed instead.
folds
(integer(1))
Number of folds.
stratify
If TRUE, stratify on the target column.
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSpCVKnndm
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "K-fold Nearest Neighbour Distance Matching" resampling instance.
ResamplingRepeatedSpCVKnndm$new(id = "repeated_spcv_knndm")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVKnndm$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVKnndm$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVKnndm$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVKnndm$clone(deep = FALSE)
deep
Whether to make a deep clone.
Linnenbrink, J., Mila, C., Ludwig, M., Meyer, H. (2023). “kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation.” EGUsphere, 2023, 1–16. doi:10.5194/egusphere-2023-1308, https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1308/.
library(mlr3)
library(mlr3spatial)
library(mlr3spatiotempcv)
set.seed(42)

# Simulate a square study area with randomly placed training points
simarea = list(matrix(c(0, 0, 0, 100, 100, 100, 100, 0, 0, 0), ncol = 2, byrow = TRUE))
simarea = sf::st_polygon(simarea)
train_points = sf::st_sample(simarea, 1000, type = "random")
train_points = sf::st_as_sf(train_points)
train_points$target = as.factor(sample(c("TRUE", "FALSE"), 1000, replace = TRUE))
# Regularly spaced prediction locations
pred_points = sf::st_sample(simarea, 1000, type = "regular")

task = mlr3spatial::as_task_classif_st(sf::st_as_sf(train_points), "target", positive = "TRUE")

cv_knndm = rsmp("repeated_spcv_knndm", predpoints = pred_points, repeats = 2)
cv_knndm$instantiate(task)

# Individual sets:
# cv_knndm$train_set(1)
# cv_knndm$test_set(1)
# check that no obs are in both sets
intersect(cv_knndm$train_set(1), cv_knndm$test_set(1)) # good!

# Internal storage:
# cv_knndm$instance # table
Spatial partitioning using rectangular tiles.
Small partitions can optionally be merged into adjacent ones to avoid partitions with too few observations.
This method is similar to ResamplingSpCVBlock in that it makes use of rectangular zones in the coordinate space.
See the upstream implementation at sperrorest::partition_tiles() and Brenning (2012) for further information.
dsplit
(integer(2))
Equidistance of splits in (possibly rotated) x direction (dsplit[1]) and y direction (dsplit[2]) used to define tiles.
If dsplit is of length 1, its value is recycled.
Either dsplit or nsplit must be specified.
nsplit
(integer(2))
Number of splits in (possibly rotated) x direction (nsplit[1]) and y direction (nsplit[2]) used to define tiles.
If nsplit is of length 1, its value is recycled.
rotation
(character(1))
Whether and how the rectangular grid should be rotated; random rotation is only possible between -45 and +45 degrees.
Accepted values: One of c("none", "random", "user").
user_rotation
(numeric())
Only used when rotation = "user".
Angle(s) (in degrees) by which the rectangular grid is to be rotated in each repetition.
Either a vector with one value per repetition, or a single number that is recycled for all repetitions.
offset
(character(1))
Whether and how the rectangular grid should be shifted by an offset.
Accepted values: One of c("none", "random", "user").
user_offset
(numeric() | list())
Only used when offset = "user".
A list (or vector) of two components specifying a shift of the rectangular grid in (possibly rotated) x and y direction.
The offset values are relative values, a value of 0.5 resulting in a one-half tile shift towards the left, or upward.
If this is a list, its first (second) component refers to the rotated x (y) direction, and both components must have one value per repetition (or length 1).
If a vector of length 2 (or list components have length 1), the two values will be interpreted as relative shifts in (rotated) x and y direction, respectively, and will therefore be recycled as needed (once per repetition).
reassign
(logical(1))
If TRUE, 'small' tiles (as per min_frac and min_n) are merged with (smallest) adjacent tiles.
If FALSE, small tiles are 'eliminated', i.e., set to NA.
min_frac
(numeric(1))
Value must be >=0, <1.
Minimum relative size of a partition as a fraction of the sample.
min_n
(integer(1))
Minimum number of samples per partition.
iterate
(integer(1))
Passed down to sperrorest::tile_neighbors().
repeats
(integer(1))
Number of repeats.
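How nsplit, min_n and reassign = FALSE interact can be sketched in base R. The sketch assumes that nsplit = c(4, 3) yields a 4 x 3 grid of equal-width tiles over the data extent, without rotation or offset; it is an illustration on simulated coordinates, not the sperrorest implementation.

```r
set.seed(1)

# Hypothetical point coordinates.
n = 300
coords = data.frame(x = runif(n, 0, 120), y = runif(n, 0, 90))

# Assumed meaning of nsplit: number of tiles in x and y direction.
nsplit = c(4L, 3L)

# cut() with a single number of breaks creates equal-width intervals.
col_id = cut(coords$x, breaks = nsplit[1], labels = FALSE)
row_id = cut(coords$y, breaks = nsplit[2], labels = FALSE)
tile = interaction(col_id, row_id, drop = TRUE)

# With reassign = FALSE, tiles with fewer than min_n points are dropped,
# i.e. their observations are set to NA (there may be none in this example).
min_n = 5L
small = names(which(table(tile) < min_n))
tile_kept = droplevels(replace(tile, tile %in% small, NA))
print(table(tile_kept, useNA = "ifany"))
```

Each remaining tile then acts as one test partition, which is why the number of folds follows from the grid rather than from a folds parameter.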
mlr3::Resampling
-> ResamplingRepeatedSpCVTiles
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "Spatial 'Tiles' resampling" resampling instance.
For a list of available arguments, please see sperrorest::partition_tiles.
ResamplingRepeatedSpCVTiles$new(id = "repeated_spcv_tiles")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSpCVTiles$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSpCVTiles$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVTiles$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVTiles$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
ResamplingSpCVBlock
if (mlr3misc::require_namespaces("sperrorest", quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task = tsk("ecuador")

  # Instantiate Resampling
  rrcv = rsmp("repeated_spcv_tiles",
    repeats = 2, nsplit = c(4L, 3L), reassign = FALSE)
  rrcv$instantiate(task)

  # Iterations and their fold/repetition numbers:
  rrcv$iters
  rrcv$folds(10:12)
  rrcv$repeats(10:12)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # check that no observation is in both sets
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance # table
}
Splits data using Leave-Location-Out (LLO), Leave-Time-Out (LTO) and Leave-Location-and-Time-Out (LLTO) partitioning.
See the upstream implementation at CreateSpacetimeFolds() (package CAST) and Meyer et al. (2018) for further information.
LLO predicts on unknown locations, i.e. complete locations are left out in the training sets.
The "space" role in Task$col_roles identifies spatial units.
If stratify is TRUE, the target distribution is similar in each fold.
This is useful for land cover classification when the observations are polygons.
In this case, LLO with stratification should be used to hold back complete polygons and have a similar target distribution in each fold.
LTO leaves out complete temporal units which are identified by the "time" role in Task$col_roles.
LLTO leaves out spatial and temporal units.
See the examples.
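The LLTO logic, and why it omits observations, can be sketched in base R. This is a simplified illustration on made-up data rather than the CAST implementation: locations and dates are each assigned to k folds, the test set of a fold holds observations whose location and date both belong to that fold, and the training set holds observations where neither does.

```r
set.seed(1)

# Hypothetical data: 6 locations, each observed on 6 dates.
obs = expand.grid(
  location = paste0("loc", 1:6),
  date = as.Date("2020-01-01") + 0:5,
  stringsAsFactors = FALSE
)
k = 3

# Randomly assign every spatial unit and every temporal unit to a fold.
loc_ids = unique(obs$location)
date_ids = as.character(unique(obs$date))
space_fold = setNames(sample(rep_len(seq_len(k), length(loc_ids))), loc_ids)
time_fold = setNames(sample(rep_len(seq_len(k), length(date_ids))), date_ids)

obs$space_fold = unname(space_fold[obs$location])
obs$time_fold = unname(time_fold[as.character(obs$date)])

# LLTO split for fold 1.
fold = 1L
test_set = which(obs$space_fold == fold & obs$time_fold == fold)
train_set = which(obs$space_fold != fold & obs$time_fold != fold)

# Observations sharing only the location or only the date with the test
# set end up in neither set.
omitted = setdiff(seq_len(nrow(obs)), c(train_set, test_set))
print(c(test = length(test_set), train = length(train_set), omitted = length(omitted)))
```

In this balanced toy setting almost half of the observations are omitted per fold, which matches the note in the autoplot() documentation that LLTO uses only a subset of the observations.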
folds
(integer(1))
Number of folds.
stratify
If TRUE, stratify on the target column.
repeats
(integer(1))
Number of repeats.
mlr3::Resampling
-> ResamplingRepeatedSptCVCstf
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "Spacetime Folds" resampling instance.
ResamplingRepeatedSptCVCstf$new(id = "repeated_sptcv_cstf")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold number.
ResamplingRepeatedSptCVCstf$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition number.
ResamplingRepeatedSptCVCstf$repeats(iters)
iters
integer()
Iteration number.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSptCVCstf$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSptCVCstf$clone(deep = FALSE)
deep
Whether to make a deep clone.
Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 51-524. doi:10.1145/584792.584877.
library(mlr3)
library(mlr3spatiotempcv)
task = tsk("cookfarm_mlr3")
task$set_col_roles("SOURCEID", roles = "space")
task$set_col_roles("Date", roles = "time")

# Instantiate Resampling
rcv = rsmp("repeated_sptcv_cstf", folds = 5, repeats = 2)
rcv$instantiate(task)

# Individual sets:
# rcv$train_set(1)
# rcv$test_set(1)
# check that no obs are in both sets
intersect(rcv$train_set(1), rcv$test_set(1)) # good!

# Internal storage:
# rcv$instance # table
This function creates spatially separated folds based on a distance or on a number of rows and/or columns.
It assigns blocks to the training and testing folds randomly, systematically or in a checkerboard pattern. The distance (size) should be in metres, regardless of the unit of the reference system of the input data (for more information see the details section). By default, the function creates blocks according to the extent and shape of the spatial sample data (x, e.g. the species occurrence). Alternatively, blocks can be created based on r, assuming that the user has considered the landscape for the given species and case study.
Blocks can also be offset so the origin is not at the outer corner of the rasters.
Instead of providing a distance, the blocks can also be created by specifying a number of rows and/or columns, which divides the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012). Finally, the blocks can be specified by a user-defined spatial polygon layer.
To maintain consistency, all functions in this package use metres as their unit of measurement. However, when the input map has a geographic coordinate system (in decimal degrees), the block size is calculated by dividing the size parameter by deg_to_metre (which defaults to 111325 metres, the standard distance of one degree of latitude on the Equator).
In reality, this value varies by a factor of the cosine of the latitude. So, an alternative sensible value could be cos(mean(sf::st_bbox(x)[c(2,4)]) * pi/180) * 111325.
The offset can be used to change the spatial position of the blocks. It can also be used to assess the sensitivity of analysis results to shifting in the blocking arrangements.
These options are available when size is defined. By default the region is located in the middle of the blocks and by setting the offsets, the blocks will shift.
Roberts et al. (2017) suggest that blocks should be substantially bigger than the range of spatial autocorrelation (in model residuals) to obtain realistic error estimates, while a buffer with the size of the spatial autocorrelation range would result in a good estimation of error. This is because of the so-called edge effect (O'Sullivan & Unwin, 2014), whereby points located on the edges of the blocks of opposite sets are not separated spatially. Blocking with a buffering strategy overcomes this issue (see cv_buffer).
By default blockCV::cv_spatial() does not allow the creation of multiple repetitions. mlr3spatiotempcv adds support for this when using the size argument for fold creation. When supplying a vector with one value per repetition for argument size, these different settings will be used to create folds which differ among the repetitions.
Multiple repetitions are not possible when using the "rows & cols" approach because the created folds will always be the same.
The 'Description' and 'Details' fields are inherited from the respective upstream function.
For a list of available arguments, please see blockCV::cv_spatial.
blockCV >= 3.0.0 changed the argument names of the implementation. For backward compatibility, mlr3spatiotempcv is still using the old ones.
Here's a list which shows the mapping between blockCV < 3.0.0 and blockCV >= 3.0.0:
range -> size
rasterLayer -> r
speciesData -> points
showBlocks -> plot
cols and rows -> rows_cols
The default of argument hexagon is different in mlr3spatiotempcv (FALSE instead of TRUE) to create square blocks instead of hexagonal blocks by default.
mlr3::Resampling
-> ResamplingSpCVBlock
blocks
sf | list of sf objects
Polygons (sf
objects) as returned by blockCV which grouped
observations into partitions.
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "spatial block" resampling instance.
For a list of available arguments, please see
blockCV::cv_spatial()
.
ResamplingSpCVBlock$new(id = "spcv_block")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVBlock$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVBlock$clone(deep = FALSE)
deep
Whether to make a deep clone.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  task = tsk("ecuador")

  # Instantiate Resampling
  rcv = rsmp("spcv_block", range = 3000L, folds = 3)
  rcv$instantiate(task)

  # Individual sets:
  rcv$train_set(1)
  rcv$test_set(1)
  intersect(rcv$train_set(1), rcv$test_set(1))

  # Internal storage:
  rcv$instance
}
This function generates spatially separated train and test folds by considering buffers of the specified distance (size parameter) around each observation point. This approach is a form of leave-one-out cross-validation. Each fold is generated by excluding nearby observations around each testing point within the specified distance (ideally the range of spatial autocorrelation, see cv_spatial_autocor). In this method, the testing set never directly abuts a training sample (e.g. presence or absence; 0s and 1s). For more information see the details section.
When working with presence-background (presence and pseudo-absence) species distribution data (specified by the presence_bg = TRUE argument), only presence records are used for specifying the folds (recommended). Consider a target presence point. The buffer is defined around this target point, using the specified range (size). By default, the testing fold comprises only the target presence point (all background points within the buffer are also added when add_bg = TRUE). Any non-target presence points inside the buffer are excluded. All points (presence and background) outside of the buffer are used for the training set. The method cycles through all the presence data, so the number of folds is equal to the number of presence points in the dataset.
For presence-absence data (and all other types of data), folds are created based on all records, both presences and absences. As above, a target observation (presence or absence) forms a test point, all presence and absence points other than the target point within the buffer are ignored, and the training set comprises all presences and absences outside the buffer. Apart from the folds, the number of training-presence, training-absence, testing-presence and testing-absence records is stored and returned in the records table. If column = NULL and presence_bg = FALSE, the procedure is like presence-absence data. All other data types (continuous, count or multi-class responses) should be done by presence_bg = FALSE.
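The buffering logic for the presence-absence case can be sketched in base R as follows. This is an illustration only, assuming planar coordinates in metres; buffer_folds is a hypothetical helper and does not reproduce blockCV::cv_buffer().

```r
# Illustrative only: buffered leave-one-out folds on planar coordinates.
buffer_folds = function(coords, size) {
  d = as.matrix(dist(coords))  # pairwise Euclidean distances
  lapply(seq_len(nrow(coords)), function(i) {
    list(
      test = i,                             # the target point
      train = unname(which(d[i, ] > size))  # points outside the buffer
    )
  })
}

# Four made-up points on a line, 0, 10, 50 and 200 m from the origin
coords = cbind(x = c(0, 10, 50, 200), y = c(0, 0, 0, 0))
folds = buffer_folds(coords, size = 100)
folds[[1]]$train  # only point 4 lies outside the 100 m buffer around point 1
```

Points 2 and 3 fall inside the buffer of point 1, so they are used neither for training nor for testing in the first fold.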
The 'Description' and 'Details' fields are inherited from the respective upstream function. For a list of available arguments, please see blockCV::cv_buffer.
blockCV >= 3.0.0 changed the argument names of the implementation. For backward compatibility, mlr3spatiotempcv is still using the old ones.
Here's a list which shows the mapping between blockCV < 3.0.0 and blockCV >= 3.0.0:
theRange -> size
addBG -> add_bg
spDataType (character vector) -> presence_bg (boolean)
mlr3::Resampling
-> ResamplingSpCVBuffer
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "spatial buffering" resampling instance.
For a list of available arguments, please see
blockCV::cv_buffer()
.
ResamplingSpCVBuffer$new(id = "spcv_buffer")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVBuffer$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVBuffer$clone(deep = FALSE)
deep
Whether to make a deep clone.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
ResamplingSpCVDisc
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  task = tsk("ecuador")

  # Instantiate Resampling
  rcv = rsmp("spcv_buffer", theRange = 10000)
  rcv$instantiate(task)

  # Individual sets:
  rcv$train_set(1)
  rcv$test_set(1)
  intersect(rcv$train_set(1), rcv$test_set(1))

  # Internal storage:
  # rcv$instance
}
Splits data by clustering in the coordinate space.
See the upstream implementation at sperrorest::partition_kmeans() and Brenning (2012) for further information.
Universal partitioning method that splits the data in the coordinate space. Useful for spatially homogeneous datasets that cannot be split well with rectangular approaches like ResamplingSpCVBlock.
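The idea of coordinate-based clustering can be sketched with stats::kmeans() from base R. The data below are synthetic, and the sketch does not reproduce sperrorest::partition_kmeans(); it only shows cluster membership being used as the fold assignment.

```r
set.seed(1)
# Two well-separated synthetic point clouds (hypothetical data)
coords = rbind(
  cbind(x = rnorm(20, mean = 0), y = rnorm(20, mean = 0)),
  cbind(x = rnorm(20, mean = 100), y = rnorm(20, mean = 100))
)

# Cluster membership in coordinate space serves as the fold assignment
fold = kmeans(coords, centers = 2, nstart = 10)$cluster

# First resampling iteration: cluster 1 is the test set, the rest is training
test_1 = which(fold == 1)
train_1 = which(fold != 1)
```

Because each cluster is spatially compact, test observations tend to be far from the training observations, which is the purpose of the method.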
folds
(integer(1)
)
Number of folds.
mlr3::Resampling
-> ResamplingSpCVCoords
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "coordinate-based" repeated resampling instance.
For a list of available arguments, please see sperrorest::partition_cv.
ResamplingSpCVCoords$new(id = "spcv_coords")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVCoords$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVCoords$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
library(mlr3)
task = tsk("ecuador")

# Instantiate Resampling
rcv = rsmp("spcv_coords", folds = 5)
rcv$instantiate(task)

# Individual sets:
rcv$train_set(1)
rcv$test_set(1)
# check that no obs are in both sets
intersect(rcv$train_set(1), rcv$test_set(1)) # good!

# Internal storage:
rcv$instance # table
Spatial partitioning using circular test areas of one or more observations. Optionally, a buffer around the test area can be used to exclude observations.
See the upstream implementation at sperrorest::partition_disc() and Brenning (2012) for further information.
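A single disc-based split can be sketched in base R as follows. disc_split is a hypothetical helper on made-up planar coordinates; it illustrates the roles of radius and buffer and is not the sperrorest implementation.

```r
# Illustrative only: one disc-based split with an exclusion buffer.
disc_split = function(coords, center, radius, buffer = 0) {
  # distance of every observation to the observation at the disc centre
  d = sqrt((coords[, 1] - coords[center, 1])^2 +
    (coords[, 2] - coords[center, 2])^2)
  list(
    test = unname(which(d <= radius)),          # inside the disc
    train = unname(which(d > radius + buffer))  # beyond disc plus buffer
  )
}

# Five made-up points on a line; the disc is centred on the first one
coords = cbind(x = c(0, 50, 150, 300, 500), y = 0)
split = disc_split(coords, center = 1, radius = 100, buffer = 100)
```

Here the third point (150 m from the centre) lies in the buffer ring and is therefore used neither for testing nor for training.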
folds
(integer(1)
)
Number of folds.
radius
(numeric(1)
)
Radius of test area disc.
buffer
(integer(1)
)
Radius around test area disc which is excluded from training or test set.
prob
(integer(1)
)
Optional argument passed down to sample()
.
replace
(logical(1)
)
Optional argument passed down to sample()
. Sample with or without
replacement.
mlr3::Resampling
-> ResamplingSpCVDisc
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "Spatial 'Disc' resampling" resampling instance.
For a list of available arguments, please see sperrorest::partition_disc.
ResamplingSpCVDisc$new(id = "spcv_disc")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVDisc$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVDisc$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
library(mlr3)
task = tsk("ecuador")

# Instantiate Resampling
rcv = rsmp("spcv_disc", folds = 3L, radius = 200L, buffer = 200L)
rcv$instantiate(task)

# Individual sets:
rcv$train_set(1)
rcv$test_set(1)
# check that no obs are in both sets
intersect(rcv$train_set(1), rcv$test_set(1)) # good!

# Internal storage:
rcv$instance # table
Splits data by clustering in the feature space.
See the upstream implementation at blockCV::cv_cluster() and Valavi et al. (2018) for further information.
Useful when the dataset is supposed to be split on environmental information which is present in features. The method allows for a combination of multiple features for clustering.
The input of raster images directly as in blockCV::cv_cluster() is not supported. See mlr3spatial and its raster DataBackends for such support in mlr3.
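Clustering on a subset of features can be sketched in base R as below. The feature table is synthetic, and standardising the features with scale() before clustering is an assumption of this sketch, made so that no single feature dominates the distances; the actual implementation may differ.

```r
set.seed(1)
# Hypothetical feature table; 'elevation' has a much larger scale than 'slope'
feats = data.frame(
  elevation = c(rnorm(15, 500, 20), rnorm(15, 2500, 20)),
  slope = c(rnorm(15, 5, 1), rnorm(15, 30, 1)),
  aspect = runif(30, 0, 360)
)

# Features chosen for clustering (analogous to the 'features' parameter)
use = c("elevation", "slope")

# Standardise (assumption of this sketch), then cluster in feature space
fold = kmeans(scale(feats[, use]), centers = 2, nstart = 10)$cluster
table(fold)
```

Observations with similar environmental conditions end up in the same fold, regardless of where they are located.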
folds
(integer(1)
)
Number of folds.
features
(character()
)
The features to use for clustering.
mlr3::Resampling
-> ResamplingSpCVEnv
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create an "Environmental Block" resampling instance.
For a list of available arguments, please see blockCV::cv_cluster.
ResamplingSpCVEnv$new(id = "spcv_env")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVEnv$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVEnv$clone(deep = FALSE)
deep
Whether to make a deep clone.
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  task = tsk("ecuador")

  # Instantiate Resampling
  rcv = rsmp("spcv_env", folds = 4)
  rcv$instantiate(task)

  # Individual sets:
  rcv$train_set(1)
  rcv$test_set(1)
  intersect(rcv$train_set(1), rcv$test_set(1))

  # Internal storage:
  rcv$instance
}
This function implements the kNNDM algorithm and returns the necessary indices to perform a k-fold NNDM CV for map validation.
knndm is a k-fold version of NNDM LOO CV for medium and large datasets. Briefly, the algorithm tries to find a k-fold configuration such that the integral of the absolute differences (Wasserstein W statistic) between the empirical nearest neighbour distance distribution function between the test and training data during CV (Gj*), and the empirical nearest neighbour distance distribution function between the prediction and training points (Gij), is minimised. It does so by performing clustering of the training points' coordinates for different numbers of clusters that range from k to N (number of observations), merging them into k final folds, and selecting the configuration with the lowest W.
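The integral of absolute differences between two empirical distribution functions can be sketched in base R. w_statistic is a hypothetical helper operating on two made-up samples of nearest neighbour distances; CAST's own computation may differ in detail.

```r
# Illustrative only: integral of |F_a(r) - F_b(r)| for two samples of
# nearest neighbour distances, using their empirical CDFs.
w_statistic = function(a, b) {
  Fa = ecdf(a)
  Fb = ecdf(b)
  # both step functions are constant between consecutive observed values
  r = sort(unique(c(a, b)))
  left = r[-length(r)]
  sum(abs(Fa(left) - Fb(left)) * diff(r))
}

w_same = w_statistic(c(1, 2, 3), c(1, 2, 3))     # identical samples
w_shifted = w_statistic(c(1, 2, 3), c(2, 3, 4))  # second sample shifted by 1
```

Identical distance distributions give a value of zero, and the value grows as the two distributions move apart, which is why the configuration with the lowest W is preferred.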
Using a projected CRS in 'knndm' has large computational advantages since fast nearest neighbour search can be done via the 'FNN' package, while working with geographic coordinates requires computing the full spherical distance matrices. As a clustering algorithm, 'kmeans' can only be used for projected CRS while 'hierarchical' can work with both projected and geographical coordinates, though it requires calculating the full distance matrix of the training points even for a projected CRS.
To choose between clustering algorithms and the number of folds 'k', different 'knndm' configurations can be run and compared; the configuration with the lower W statistic offers the better match. W statistics between 'knndm' runs are comparable as long as 'tpoints' and 'predpoints' or 'modeldomain' stay the same.
Map validation using 'knndm' should be performed using 'CAST::global_validation', i.e. by stacking all out-of-sample predictions and evaluating them all at once. The reasons behind this are 1) the resulting folds can be unbalanced and 2) nearest neighbour functions are constructed and matched using all CV folds simultaneously.
If training data points are very clustered with respect to the prediction area and the presented 'knndm' configuration still shows signs of Gj* > Gij, there are several things that can be tried. First, increase the 'maxp' parameter; this may help to control for strong clustering (at the cost of having unbalanced folds). Secondly, decrease the number of final folds 'k', which may help to have larger clusters.
The 'modeldomain' is either a sf polygon that defines the prediction area, or alternatively a SpatRaster out of which a polygon, transformed into the CRS of the training points, is defined as the outline of all non-NA cells. Then, the function takes a regular point sample (amount defined by 'samplesize') from the spatial extent. As an alternative use 'predpoints' instead of 'modeldomain', if you have already defined the prediction locations (e.g. raster pixel centroids). When using either 'modeldomain' or 'predpoints', we advise to plot the study area polygon and the training/prediction points as a previous step to ensure they are aligned.
'knndm' can also be performed in the feature space by setting 'space' to "feature". Euclidean distances or Mahalanobis distances can be used for distance calculation, but only Euclidean distances are tested. In this case, nearest neighbour distances are calculated in n-dimensional feature space rather than in geographical space. 'tpoints' and 'predpoints' can be data frames or sf objects containing the values of the features. Note that the names of 'tpoints' and 'predpoints' must be the same. 'predpoints' can also be missing, if 'modeldomain' is of class SpatRaster. In this case, the values of the SpatRaster will be extracted to the 'predpoints'. In the case of any categorical features, Gower distances will be used to calculate the Nearest Neighbour distances [Experimental]. If categorical features are present, and 'clustering' = "kmeans", K-Prototype clustering will be performed instead.
folds
(integer(1)
)
Number of folds.
stratify
If TRUE
, stratify on the target column.
mlr3::Resampling
-> ResamplingSpCVKnndm
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "K-fold Nearest Neighbour Distance Matching" resampling instance.
ResamplingSpCVKnndm$new(id = "spcv_knndm")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVKnndm$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVKnndm$clone(deep = FALSE)
deep
Whether to make a deep clone.
Linnenbrink, J., Mila, C., Ludwig, M., Meyer, H. (2023). “kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation.” EGUsphere, 2023, 1–16. doi:10.5194/egusphere-2023-1308, https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1308/.
if (mlr3misc::require_namespaces(c("sf", "CAST"), quietly = TRUE)) {
  library(mlr3)
  library(sf)

  set.seed(42)
  task = tsk("ecuador")
  points = sf::st_as_sf(task$coordinates(), crs = task$crs, coords = c("x", "y"))
  modeldomain = sf::st_as_sfc(sf::st_bbox(points))

  set.seed(42)
  cv_knndm = rsmp("spcv_knndm", modeldomain = modeldomain)
  cv_knndm$instantiate(task)

  # Individual sets:
  # cv_knndm$train_set(1)
  # cv_knndm$test_set(1)
  # check that no obs are in both sets
  intersect(cv_knndm$train_set(1), cv_knndm$test_set(1)) # good!

  # Internal storage:
  # cv_knndm$instance # table
}
Spatial partitioning using rectangular tiles. Small partitions can optionally be merged into adjacent ones to avoid partitions with too few observations. This method is similar to ResamplingSpCVBlock in that it makes use of rectangular zones in the coordinate space.
See the upstream implementation at sperrorest::partition_tiles() and Brenning (2012) for further information.
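Assigning observations to a regular grid of tiles can be sketched in base R as follows. tile_id is a hypothetical helper; it treats nsplit as the number of tiles per direction, which is an assumption of this sketch, and it ignores rotation, offsets and the merging of small tiles.

```r
# Illustrative only: assign observations to a regular nsplit[1] x nsplit[2] grid.
tile_id = function(coords, nsplit) {
  ix = cut(coords[, 1], breaks = nsplit[1], labels = FALSE)  # index along x
  iy = cut(coords[, 2], breaks = nsplit[2], labels = FALSE)  # index along y
  paste0("X", ix, ":Y", iy)
}

# Four made-up points near the corners of a 10 x 10 area
coords = cbind(x = c(0, 1, 9, 10), y = c(0, 10, 0, 10))
tiles = tile_id(coords, nsplit = c(2, 2))

# Each tile serves as the test set of one resampling iteration
tile_members = split(seq_len(nrow(coords)), tiles)
```

Tiles that contain very few observations are what the reassign, min_frac and min_n parameters deal with.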
dsplit
(integer(2)
)
Equidistance of splits in (possibly rotated) x direction (dsplit[1]
) and y direction (dsplit[2]
) used to define tiles.
If dsplit is of length 1, its value is recycled.
Either dsplit
or nsplit
must be specified.
nsplit
(integer(2)
)
Number of splits in (possibly rotated) x direction (nsplit[1]
) and y direction (nsplit[2]
) used to define tiles.
If nsplit
is of length 1, its value is recycled.
rotation
(character(1)
)
Whether and how the rectangular grid should be rotated; random rotation is only possible between -45 and +45 degrees.
Accepted values: One of c("none", "random", "user")
.
user_rotation
(numeric()
)
Only used when rotation = "user"
.
Angle(s) (in degrees) by which the rectangular grid is to be rotated in
each repetition.
Either a vector of same length as repeats
, or a single number that
will be replicated length(repeats)
times.
offset
(character(1)
)
Whether and how the rectangular grid should be shifted by an offset.
Accepted values: One of c("none", "random", "user")
.
user_offset
(logical(1)
)
Only used when offset = "user"
.
A list (or vector) of two components specifying a shift of the rectangular
grid in (possibly rotated) x and y direction.
The offset values are relative values, a value of 0.5 resulting in a
one-half tile shift towards the left, or upward.
If this is a list, its first (second) component refers to the rotated
x (y) direction, and both components must have same length as repeats
(or length 1).
If a vector of length 2 (or list components have length 1), the two values
will be interpreted as relative shifts in (rotated) x and y direction,
respectively, and will therefore be recycled as needed (length(repeats)
times each).
reassign
(logical(1)
)
If TRUE
, 'small' tiles (as per min_frac
and min_n
) are merged with
(smallest) adjacent tiles.
If FALSE
, small tiles are 'eliminated', i.e., set to NA.
min_frac
(numeric(1)
)
Value must be >=0, <1.
Minimum relative size of partition as percentage of sample.
min_n
(integer(1)
)
Minimum number of samples per partition.
iterate
(integer(1)
)
Passed down to sperrorest::tile_neighbors()
.
mlr3::Resampling
-> ResamplingSpCVTiles
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "Spatial 'Tiles' resampling" resampling instance.
ResamplingSpCVTiles$new(id = "spcv_tiles")
id
character(1)
Identifier for the resampling strategy.
For a list of available arguments, please see
sperrorest::partition_tiles.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSpCVTiles$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSpCVTiles$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brenning A (2012). “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. doi:10.1109/igarss.2012.6352393.
ResamplingSpCVBlock
if (mlr3misc::require_namespaces("sperrorest", quietly = TRUE)) {
  library(mlr3)
  task = tsk("ecuador")

  # Instantiate Resampling
  rcv = rsmp("spcv_tiles", nsplit = c(4L, 3L), reassign = FALSE)
  rcv$instantiate(task)

  # Individual sets:
  rcv$train_set(1)
  rcv$test_set(1)
  # check that no obs are in both sets
  intersect(rcv$train_set(1), rcv$test_set(1)) # good!

  # Internal storage:
  rcv$instance # table
}
Splits data using Leave-Location-Out (LLO), Leave-Time-Out (LTO) and Leave-Location-and-Time-Out (LLTO) partitioning. See the upstream implementation at CreateSpacetimeFolds() (package CAST) and Meyer et al. (2018) for further information.
LLO predicts on unknown locations, i.e. complete locations are left out in the training sets. The "space" role in Task$col_roles identifies spatial units. If stratify is TRUE, the target distribution is similar in each fold. This is useful for land cover classification when the observations are polygons. In this case, LLO with stratification should be used to hold back complete polygons and have a similar target distribution in each fold.
LTO leaves out complete temporal units which are identified by the "time" role in Task$col_roles.
LLTO leaves out spatial and temporal units.
See the examples.
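The three variants can be sketched in base R on a small synthetic table. The data, the column names loc and time, and the fold assignment are all made up for illustration; the sketch does not call CAST.

```r
set.seed(1)
# Hypothetical data: 4 locations, each observed at 4 time points
obs = expand.grid(loc = paste0("s", 1:4), time = 1:4, stringsAsFactors = FALSE)

k = 2
# Assign whole locations and whole time units to folds
loc_fold = setNames(sample(rep_len(seq_len(k), 4)), paste0("s", 1:4))
time_fold = setNames(sample(rep_len(seq_len(k), 4)), 1:4)
space_id = loc_fold[obs$loc]                  # fold of each row's location
time_id = time_fold[as.character(obs$time)]   # fold of each row's time unit

i = 1  # first resampling iteration
llo = list(test = which(space_id == i), train = which(space_id != i))
lto = list(test = which(time_id == i), train = which(time_id != i))
# LLTO: test units are left out of training in both space and time
llto = list(
  test = which(space_id == i & time_id == i),
  train = which(space_id != i & time_id != i)
)
```

In the LLTO case, rows that share either the location or the time unit with the test set are dropped from training, so both sets are smaller than in LLO or LTO.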
folds
(integer(1)
)
Number of folds.
stratify
If TRUE
, stratify on the target column.
mlr3::Resampling
-> ResamplingSptCVCstf
iters
integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set
.
new()
Create a "Spacetime Folds" resampling instance.
ResamplingSptCVCstf$new(id = "sptcv_cstf")
id
character(1)
Identifier for the resampling strategy.
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingSptCVCstf$instantiate(task)
task
mlr3::Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingSptCVCstf$clone(deep = FALSE)
deep
Whether to make a deep clone.
Meyer H, Reudenbach C, Hengl T, Katurji M, Nauss T (2018). “Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation.” Environmental Modelling & Software, 101, 1–9. doi:10.1016/j.envsoft.2017.12.001.
library(mlr3)
task = tsk("cookfarm_mlr3")
task$set_col_roles("SOURCEID", roles = "space")
task$set_col_roles("Date", roles = "time")

# Instantiate Resampling
rcv = rsmp("sptcv_cstf", folds = 5)
rcv$instantiate(task)

# Individual sets:
# rcv$train_set(1)
# rcv$test_set(1)
# check that no obs are in both sets
intersect(rcv$train_set(1), rcv$test_set(1)) # good!

# Internal storage:
# rcv$instance # table
The R.J. Cook Agronomy Farm (cookfarm) is a Long-Term Agroecosystem Research Site operated by Washington State University, located near Pullman, Washington, USA. Contains spatio-temporal (3D+T) measurements of three soil properties and a number of spatial and temporal regression covariates.
Here, only the "Profiles" dataset is used from the collection.
The Date column was appended from the readings dataset. In addition, coordinates were appended to the task as variables "x" and "y".
The dataset was borrowed and adapted from package GSIF, which was archived on CRAN in 2021-03.
data(cookfarm_mlr3)
R6::R6Class inheriting from mlr3::TaskRegr.
mlr_tasks$get("cookfarm_mlr3")
tsk("cookfarm_mlr3")
The task has set column roles "space" and "time" for variables "SOURCEID" and "Date", respectively. These are used by certain methods during partitioning, e.g., mlr_resamplings_sptcv_cstf with variant "Leave-location-and-time-out". If only one of space or time should be left out, the column roles must be adjusted by the user!
Gasch, C.K., Hengl, T., Gräler, B., Meyer, H., Magney, T., Brown, D.J., 2015. Spatio-temporal interpolation of soil water, temperature, and electrical conductivity in 3D+T: the Cook Agronomy Farm data set. Spatial Statistics, 14, pp.70–90.
Gasch, C.K., D.J. Brown, E.S. Brooks, M. Yourek, M. Poggio, D.R. Cobos, C.S. Campbell, 2016? Retroactive calibration of soil moisture sensors using a two-step, soil-specific correction. Submitted to Vadose Zone Journal.
Gasch, C.K., D.J. Brown, C.S. Campbell, D.R. Cobos, E.S. Brooks, M. Chahal, M. Poggio, 2016? A field-scale sensor network data set for monitoring and modeling the spatial and temporal variation of soil moisture in a dryland agricultural field. Submitted to Water Resources Research.
Dictionary of Tasks: mlr3::mlr_tasks
as.data.table(mlr_tasks)
for a complete table of all (also dynamically created) Tasks.
Other Task:
TaskClassifST
,
TaskRegrST
,
mlr_tasks_diplodia
,
mlr_tasks_ecuador
Data set created by Patrick Schratz, University of Jena (Germany) and Eugenia Iturritxa, NEIKER, Vitoria-Gasteiz (Spain). This dataset should be cited as Schratz et al. (2019) (see reference below). The publication also contains additional information on data collection. The data set provided here shows infections of trees by the pathogen Diplodia sapinea in the Basque Country in Spain. Predictors are environmental variables like temperature, precipitation, soil and more.
data(diplodia)
R6::R6Class inheriting from mlr3::TaskClassif.
mlr_tasks$get("diplodia")
tsk("diplodia")
Schratz P, Muenchow J, Iturritxa E, Richter J, Brenning A (2019). “Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data.” Ecological Modelling, 406, 109–120. doi:10.1016/j.ecolmodel.2019.06.002.
Dictionary of Tasks: mlr3::mlr_tasks
as.data.table(mlr_tasks)
for a complete table of all (also dynamically created) Tasks.
Other Task:
TaskClassifST
,
TaskRegrST
,
mlr_tasks_cookfarm_mlr3
,
mlr_tasks_ecuador
Data set created by Jannes Muenchow, University of Erlangen-Nuernberg, Germany. This dataset should be cited as Muenchow et al. (2012) (see reference below). The publication also contains additional information on data collection and the geomorphology of the area. The data set provided here is (a subset of) the one from the 'natural' part of the RBSF area and corresponds to landslide distribution in the year 2000.
data(ecuador)
R6::R6Class inheriting from mlr3::TaskClassif.
mlr_tasks$get("ecuador")
tsk("ecuador")
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology, 139-140: 271-284.
Dictionary of Tasks: mlr3::mlr_tasks
as.data.table(mlr_tasks)
for a complete table of all (also dynamically created) Tasks.
Other Task:
TaskClassifST
,
TaskRegrST
,
mlr_tasks_cookfarm_mlr3
,
mlr_tasks_diplodia
This task specializes mlr3::Task and mlr3::TaskSupervised for spatiotemporal classification problems. The target column is assumed to be a factor. The task_type is set to "classif" and "spatiotemporal".
A spatial example task is available via tsk("ecuador"), a spatiotemporal one via tsk("cookfarm_mlr3").
The coordinate reference system passed during initialization must match the one which was used during data creation, otherwise offsets of multiple meters may occur. By default, coordinates are not used as features. This can be changed by setting coords_as_features = TRUE.
mlr3::Task
-> mlr3::TaskSupervised
-> mlr3::TaskClassif
-> TaskClassifST
crs
(character(1)
)
Returns coordinate reference system of task.
coordinate_names
(character()
)
Coordinate names.
coords_as_features
(logical(1)
)
If TRUE, coordinates are used as features. This is a shortcut for task$set_col_roles(c("x", "y"), roles = "feature") with the assumption that the coordinates in the data are named "x" and "y".
mlr3::Task$add_strata()
mlr3::Task$cbind()
mlr3::Task$data()
mlr3::Task$divide()
mlr3::Task$filter()
mlr3::Task$format()
mlr3::Task$formula()
mlr3::Task$head()
mlr3::Task$help()
mlr3::Task$levels()
mlr3::Task$missings()
mlr3::Task$rbind()
mlr3::Task$rename()
mlr3::Task$select()
mlr3::Task$set_col_roles()
mlr3::Task$set_levels()
mlr3::Task$set_row_roles()
mlr3::TaskClassif$droplevels()
mlr3::TaskClassif$truth()
new()
Create a new spatiotemporal resampling Task
TaskClassifST$new( id, backend, target, positive = NULL, label = NA_character_, coordinate_names, crs = NA_character_, coords_as_features = FALSE, extra_args = list() )
id
(character(1)
)
Identifier for the new instance.
backend
(mlr3::DataBackend)
Either a mlr3::DataBackend, or any object which is convertible to a mlr3::DataBackend with as_data_backend(). E.g., an sf object will be converted to a mlr3::DataBackendDataTable.
target (character(1))
Name of the target column.
positive (character(1))
Only for binary classification: Name of the positive class. The levels of the target columns are reordered accordingly, so that the first element of $class_names is the positive class, and the second element is the negative class.
label (character(1))
Label for the new instance. Shown in as.data.table(mlr_tasks).
coordinate_names (character(1))
The column names of the coordinates in the data.
crs (character(1))
Coordinate reference system. WKT2 or EPSG string.
coords_as_features (logical(1))
If TRUE, coordinates are used as features. This is a shortcut for task$set_col_roles(c("x", "y"), role = "feature") with the assumption that the coordinates in the data are named "x" and "y".
extra_args (named list())
Named list of constructor arguments, required for converting task types via mlr3::convert_task().
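The constructor arguments above can be put together as in the following sketch. The data frame, its column names ("lon", "lat", "temp", "presence"), the task id and the EPSG code are all hypothetical and serve only to show how the arguments fit together; the target is supplied as a factor, as the class description assumes.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Hypothetical toy data; the target column is already a factor.
dat = data.frame(
  lon = c(10.1, 10.4, 10.9, 11.3, 11.8, 12.2),
  lat = c(48.0, 48.3, 48.1, 48.6, 48.9, 48.4),
  temp = c(7.1, 6.8, 8.0, 5.9, 5.5, 7.4),
  presence = factor(c("yes", "no", "yes", "no", "no", "yes"))
)

task = TaskClassifST$new(
  id = "toy_presence",                 # illustrative identifier
  backend = dat,                       # converted to a DataBackend internally
  target = "presence",
  positive = "yes",                    # becomes the first element of $class_names
  coordinate_names = c("lon", "lat"),
  crs = "EPSG:4326"                    # assumed CRS of the toy coordinates
)

task$class_names
task$coordinate_names
```

In everyday use, as_task_classif_st() is likely the more convenient entry point, since it wraps this constructor for data frames and sf objects.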
coordinates()
Returns coordinates of observations.
TaskClassifST$coordinates(row_ids = NULL)
row_ids (integer())
Vector of row indices as subset of task$row_ids.
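A brief sketch of the method in use, based on the bundled tsk("ecuador") task; the choice of the first five row ids is arbitrary.

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# Coordinates of all observations (row_ids = NULL is the default).
coords_all = task$coordinates()

# Coordinates of a subset, selected via valid row ids of the task.
ids = task$row_ids[1:5]
coords_sub = task$coordinates(row_ids = ids)
coords_sub
```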
print()
Print the task.
TaskClassifST$print(...)
...
Arguments passed to the $print() method of the superclass.
clone()
The objects of this class are cloneable with this method.
TaskClassifST$clone(deep = FALSE)
deep
Whether to make a deep clone.
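Because tasks are R6 objects with reference semantics, cloning is a common way to try out a setting without touching the original. The sketch below assumes the default configuration of tsk("ecuador"), where coordinates are not used as features.

```r
library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# Work on an independent copy so the original task stays unchanged.
task_copy = task$clone(deep = TRUE)
task_copy$coords_as_features = TRUE

task$coords_as_features       # setting of the original task
task_copy$coords_as_features  # setting of the modified copy
```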
Other Task: TaskRegrST, mlr_tasks_cookfarm_mlr3, mlr_tasks_diplodia, mlr_tasks_ecuador
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  task = as_task_classif_st(ecuador,
    target = "slides",
    positive = "TRUE", coordinate_names = c("x", "y")
  )

  # passing objects of class 'sf' is also supported
  data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"))
  task = as_task_classif_st(data_sf, target = "slides", positive = "TRUE")

  task$task_type
  task$formula()
  task$class_names
  task$positive
  task$negative
  task$coordinates()
  task$coordinate_names
}
This task specializes mlr3::Task and mlr3::TaskSupervised for spatiotemporal regression problems.
A spatial example task is available via tsk("ecuador"), a spatiotemporal one via tsk("cookfarm_mlr3").
The coordinate reference system passed during initialization must match the one which was used during data creation, otherwise offsets of multiple meters may occur. By default, coordinates are not used as features. This can be changed by setting coords_as_features = TRUE.
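A regression task can be set up in the same way as its classification counterpart. The sketch below uses as_task_regr_st() on synthetic data; the column names, values and the EPSG code are assumptions made up for this illustration.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Synthetic data with a numeric target; all names and values are illustrative.
set.seed(1)
dat = data.frame(
  x = runif(15, min = 0, max = 500),
  y = runif(15, min = 0, max = 500),
  elevation = rnorm(15, mean = 300, sd = 50),
  soil_moisture = runif(15)
)

task = as_task_regr_st(dat,
  target = "soil_moisture",
  coordinate_names = c("x", "y"),
  crs = "EPSG:32717"
)

class(task)
task$target_names
task$feature_names
```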
mlr3::Task -> mlr3::TaskSupervised -> mlr3::TaskRegr -> TaskRegrST
crs (character(1))
Returns coordinate reference system of task.
coordinate_names (character())
Coordinate names.
coords_as_features (logical(1))
If TRUE, coordinates are used as features. This is a shortcut for task$set_col_roles(c("x", "y"), role = "feature") with the assumption that the coordinates in the data are named "x" and "y".
mlr3::Task$add_strata()
mlr3::Task$cbind()
mlr3::Task$data()
mlr3::Task$divide()
mlr3::Task$droplevels()
mlr3::Task$filter()
mlr3::Task$format()
mlr3::Task$formula()
mlr3::Task$head()
mlr3::Task$help()
mlr3::Task$levels()
mlr3::Task$missings()
mlr3::Task$rbind()
mlr3::Task$rename()
mlr3::Task$select()
mlr3::Task$set_col_roles()
mlr3::Task$set_levels()
mlr3::Task$set_row_roles()
mlr3::TaskRegr$truth()
new()
Create a new spatiotemporal resampling Task.
TaskRegrST$new(id, backend, target, label = NA_character_, coordinate_names, crs = NA_character_, coords_as_features = FALSE, extra_args = list())
id (character(1))
Identifier for the new instance.
backend (mlr3::DataBackend)
Either a mlr3::DataBackend, or any object which is convertible to a mlr3::DataBackend with as_data_backend(). E.g., an sf object will be converted to a mlr3::DataBackendDataTable.
target (character(1))
Name of the target column.
label (character(1))
Label for the new instance. Shown in as.data.table(mlr_tasks).
coordinate_names (character(1))
The column names of the coordinates in the data.
crs (character(1))
Coordinate reference system. WKT2 or EPSG string.
coords_as_features (logical(1))
If TRUE, coordinates are used as features. This is a shortcut for task$set_col_roles(c("x", "y"), role = "feature") with the assumption that the coordinates in the data are named "x" and "y".
extra_args (named list())
Named list of constructor arguments, required for converting task types via mlr3::convert_task().
coordinates()
Returns coordinates of observations.
TaskRegrST$coordinates(row_ids = NULL)
row_ids (integer())
Vector of row indices as subset of task$row_ids.
print()
Print the task.
TaskRegrST$print(...)
...
Arguments passed to the $print() method of the superclass.
clone()
The objects of this class are cloneable with this method.
TaskRegrST$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Task: TaskClassifST, mlr_tasks_cookfarm_mlr3, mlr_tasks_diplodia, mlr_tasks_ecuador