Title: | A Set of Common Tools for Seasonal to Decadal Verification |
---|---|
Description: | The advanced version of the package 's2dverification'. It is intended for 'seasonal to decadal' (s2d) climate forecast verification, but it can also be used for other kinds of forecasts or general climate analysis. This package is especially designed for comparing experimental and observational datasets. The included functions cover data retrieval, data post-processing, and skill scores against observations, as well as visualization. Compared to 's2dverification', 's2dv' is more compatible with the package 'startR', can use multiple cores for computation, and handles multi-dimensional arrays with greater flexibility. The CDO version used in development is 1.9.8. |
Authors: | BSC-CNS [aut, cph], An-Chi Ho [aut], Nuria Perez-Zanon [aut], Roberto Bilbao [ctb], Josep Cos [ctb], Carlos Delgado [ctb], Llorenç Lledó [ctb], Andrea Manrique [ctb], Deborah Verfaillie [ctb], Eva Rifà [ctb], Victòria Agudetse [ctb], Nadia Milders [ctb], Ariadna Batalla [ctb, cre] |
Maintainer: | Ariadna Batalla <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0 |
Built: | 2024-12-13 06:58:04 UTC |
Source: | CRAN |
The Absolute Mean Bias Skill Score is based on the Absolute Mean Error (Wilks,
2011) between the ensemble-mean forecast and the observations. It measures
the accuracy of the forecast in comparison with a reference forecast to assess
whether the forecast presents an improvement or a worsening with respect to
that reference. The Absolute Mean Bias Skill Score ranges between minus
infinity and 1. Positive values indicate that the forecast has higher skill
than the reference forecast, while negative values indicate lower skill.
Examples of reference forecasts are the climatological forecast (the average
of the observations), a previous model version, or another model. It is
computed as AbsBiasSS = 1 - AbsBias_exp / AbsBias_ref. The statistical
significance is obtained with a Random Walk test at the specified confidence
level (DelSole and Tippett, 2016). If there is more than one dataset, the
result is computed for each pair of exp and obs data.
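As a rough illustration of the formula above (a minimal sketch with hypothetical data, using the climatological forecast as reference; AbsBiasSS() itself additionally handles named dimensions, members, datasets and significance):

set.seed(1)
obs <- rnorm(50)                            # observations along the time dimension
exp <- obs + rnorm(50, sd = 0.5)            # ensemble-mean forecast
abs_bias_exp <- mean(abs(exp - obs))        # Absolute Mean Error of the forecast
abs_bias_ref <- mean(abs(mean(obs) - obs))  # same, for the climatological reference
abs_bias_ss <- 1 - abs_bias_exp / abs_bias_ref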
AbsBiasSS( exp, obs, ref = NULL, time_dim = "sdate", memb_dim = NULL, dat_dim = NULL, na.rm = FALSE, sig_method.type = "two.sided.approx", alpha = 0.05, ncores = NULL )
exp |
A named numerical array of the forecast with at least time dimension. |
obs |
A named numerical array of the observation with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. |
ref |
A named numerical array of the reference forecast data with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have dataset dimension. If there is a corresponding reference for each experiment, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the climatological forecast is used as reference forecast. The default value is NULL. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension used to compute the ensemble mean; it should be set to NULL if the parameters 'exp' and 'ref' are already ensemble means. The default value is NULL. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
na.rm |
A logical value indicating if NAs should be removed (TRUE) or kept (FALSE) for computation. The default value is FALSE. |
sig_method.type |
A character string indicating the test type of the
significance method. Check the RandomWalkTest() documentation for details.
The default value is 'two.sided.approx'. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
$biasSS |
A numerical array of BiasSS with dimensions nexp, nobs and the rest of the dimensions of 'exp' except 'time_dim' and 'memb_dim'. |
$sign |
A logical array of the statistical significance of the BiasSS with the same dimensions as $biasSS. nexp is the number of experiments (i.e., 'dat_dim' in exp), and nobs is the number of observations (i.e., 'dat_dim' in obs). If dat_dim is NULL, nexp and nobs are omitted. |
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
DelSole and Tippett, 2016; https://doi.org/10.1175/MWR-D-15-0218.1
exp <- array(rnorm(1000), dim = c(dat = 1, lat = 3, lon = 5, member = 10, sdate = 50))
ref <- array(rnorm(1000), dim = c(dat = 1, lat = 3, lon = 5, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(dat = 1, lat = 3, lon = 5, sdate = 50))
biasSS1 <- AbsBiasSS(exp = exp, obs = obs, ref = ref, memb_dim = 'member')
biasSS2 <- AbsBiasSS(exp = exp, obs = obs, ref = NULL, memb_dim = 'member')
Calculate the spatial anomaly correlation coefficient (ACC) for the ensemble mean of each model and the corresponding references over a spatial domain. It can return a forecast time series if the data contain a forecast time dimension, and also the mean ACC over one dimension, e.g., the start date dimension. The domain of interest can be specified by providing the list of longitudes/latitudes of the data together with the corners of the domain: lonlatbox = c(lonmin, lonmax, latmin, latmax). The data will be adjusted to have a spatial mean of zero, then area weighting is applied. The formula is referenced from Wilks (2011; section 7.6.4; https://doi.org/10.1016/B978-0-12-385022-5.00008-7).
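As a minimal sketch of the core computation (hypothetical data; a plausible area-weighted formulation, not the package implementation, which additionally handles ensembles, datasets, significance and confidence intervals):

lat <- seq(-90, 90, 10)                                 # 19 latitudes
lon <- seq(0, 350, 10)                                  # 36 longitudes
exp_map <- array(rnorm(19 * 36), dim = c(19, 36))       # anomalies, c(lat, lon)
obs_map <- array(rnorm(19 * 36), dim = c(19, 36))
w <- matrix(cos(lat * pi / 180), nrow = 19, ncol = 36)  # latitude (area) weights
# Remove the weighted spatial mean, then correlate with area weighting
ea <- exp_map - sum(w * exp_map) / sum(w)
oa <- obs_map - sum(w * obs_map) / sum(w)
acc <- sum(w * ea * oa) / sqrt(sum(w * ea^2) * sum(w * oa^2))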
ACC( exp, obs, dat_dim = NULL, lat_dim = "lat", lon_dim = "lon", avg_dim = "sdate", memb_dim = "member", lat = NULL, lon = NULL, lonlatbox = NULL, alpha = 0.05, pval = TRUE, sign = FALSE, conf = TRUE, conftype = "parametric", ncores = NULL )
exp |
A numeric array of experimental anomalies with named dimensions. The dimension must have at least 'lat_dim' and 'lon_dim'. |
obs |
A numeric array of observational anomalies with named dimensions. The dimension should be the same as 'exp' except the length of 'dat_dim' and 'memb_dim'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL (no dataset). |
lat_dim |
A character string indicating the name of the latitude dimension of 'exp' and 'obs' along which ACC is computed. The default value is 'lat'. |
lon_dim |
A character string indicating the name of the longitude dimension of 'exp' and 'obs' along which ACC is computed. The default value is 'lon'. |
avg_dim |
A character string indicating the name of the dimension to be averaged, which is usually the time dimension. Set it to NULL if the mean ACC is not needed. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension. Set it to NULL if the data are not ensembles. The default value is 'member'. |
lat |
A vector of the latitudes of the exp/obs grids. It is used for area weighting and when the domain of interest 'lonlatbox' is specified. |
lon |
A vector of the longitudes of the exp/obs grids. Only required when the domain of interest 'lonlatbox' is specified. The default value is NULL. |
lonlatbox |
A numeric vector of 4 indicating the corners of the domain of interest: c(lonmin, lonmax, latmin, latmax). The default value is NULL, in which case the whole domain of the data is used. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
pval |
A logical value indicating whether to compute the p-value or not. The default value is TRUE. |
sign |
A logical value indicating whether to retrieve the statistical significance of the test H0: ACC = 0 based on 'alpha'. The default value is FALSE. |
conf |
A logical value indicating whether to retrieve the confidence intervals or not. The default value is TRUE. |
conftype |
A character string of "parametric" or "bootstrap". "parametric" provides a confidence interval for the ACC computed by a Fisher transformation and a significance level for the ACC from a one-sided Student's t-distribution. "bootstrap" provides a confidence interval for the ACC and MACC computed from bootstrapping on the members with 100 drawings with replacement. To guarantee the statistical robustness of the result, make sure that your experiment and observation always have the same number of members. "bootstrap" requires 'memb_dim' to be set. The default value is 'parametric'. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays:
acc |
The ACC with the dimensions c(nexp, nobs, the rest of the dimensions except lat_dim, lon_dim and memb_dim). nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted. |
macc |
The mean anomaly correlation coefficient with dimensions c(nexp, nobs, the rest of the dimensions except lat_dim, lon_dim, memb_dim, and avg_dim). Only present if 'avg_dim' is not NULL. If dat_dim is NULL, nexp and nobs are omitted. |
conf.lower (if conftype = "parametric") or acc_conf.lower (if conftype = "bootstrap") |
The lower confidence interval of ACC with the same dimensions as ACC. Only present if conf = TRUE. |
conf.upper (if conftype = "parametric") or acc_conf.upper (if conftype = "bootstrap") |
The upper confidence interval of ACC with the same dimensions as ACC. Only present if conf = TRUE. |
p.val |
The p-value with the same dimensions as ACC. Only present if pval = TRUE. |
sign |
The statistical significance of the ACC. Only present if sign = TRUE. |
macc_conf.lower |
The lower confidence interval of MACC with the same dimensions as MACC. Only present if conftype = "bootstrap". |
macc_conf.upper |
The upper confidence interval of MACC with the same dimensions as MACC. Only present if conftype = "bootstrap". |
Jolliffe and Stephenson (2012). Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley-Blackwell. Wilks (2011; section 7.6.4; https://doi.org/10.1016/B978-0-12-385022-5.00008-7).
sampleData$mod <- Season(sampleData$mod, monini = 11, moninf = 12, monsup = 2)
sampleData$obs <- Season(sampleData$obs, monini = 11, moninf = 12, monsup = 2)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
acc <- ACC(ano_exp, ano_obs, lat = sampleData$lat, dat_dim = 'dataset')
acc_bootstrap <- ACC(ano_exp, ano_obs, conftype = 'bootstrap', lat = sampleData$lat, dat_dim = 'dataset')
# Combine acc results for PlotACC
res <- array(c(acc$conf.lower, acc$acc, acc$conf.upper, acc$p.val), dim = c(dim(acc$acc), 4))
res_bootstrap <- array(c(acc_bootstrap$acc_conf.lower, acc_bootstrap$acc, acc_bootstrap$acc_conf.upper, acc_bootstrap$p.val), dim = c(dim(acc_bootstrap$acc), 4))
PlotACC(res, startDates)
PlotACC(res_bootstrap, startDates)
The Atlantic Multidecadal Variability (AMV), also known as the Atlantic Multidecadal Oscillation (AMO), is a mode of natural variability of the sea surface temperature (SST) over the North Atlantic Ocean on multi-decadal time scales. The AMV index is computed as the difference between the weighted-averaged SST anomalies over the North Atlantic region (0ºN-60ºN, 280ºE-360ºE) and the weighted-averaged SST anomalies over 60ºS-60ºN, 0ºE-360ºE (Trenberth & Dennis, 2005; Doblas-Reyes et al., 2013). If different members and/or datasets are provided, the climatology (used to calculate the anomalies) is computed individually for each of them.
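As a rough sketch of the index definition (hypothetical SST anomalies on a regular grid; a plausible cosine-latitude weighting, not the package implementation, which additionally handles members, masks and forecast time):

lat <- seq(-87.5, 87.5, 5)
lon <- seq(2.5, 357.5, 5)
ssta <- array(rnorm(length(lat) * length(lon)),
              dim = c(length(lat), length(lon)))      # SST anomalies, c(lat, lon)
w <- matrix(cos(lat * pi / 180), nrow = length(lat), ncol = length(lon))
natl <- outer(lat >= 0 & lat <= 60, lon >= 280 & lon <= 360, "&")   # North Atlantic box
glob <- outer(lat >= -60 & lat <= 60, rep(TRUE, length(lon)), "&")  # 60S-60N band
amv <- sum(ssta[natl] * w[natl]) / sum(w[natl]) -
  sum(ssta[glob] * w[glob]) / sum(w[glob])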
AMV( data, data_lats, data_lons, type, lat_dim = "lat", lon_dim = "lon", mask = NULL, monini = 11, fmonth_dim = "fmonth", sdate_dim = "sdate", indices_for_clim = NULL, year_dim = "year", month_dim = "month", na.rm = TRUE, ncores = NULL )
data |
A numerical array to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. |
data_lats |
A numeric vector indicating the latitudes of the data. |
data_lons |
A numeric vector indicating the longitudes of the data. |
type |
A character string indicating the type of data ('dcpp' for decadal predictions, 'hist' for historical simulations, or 'obs' for observations or reanalyses). |
lat_dim |
A character string of the name of the latitude dimension. The default value is 'lat'. |
lon_dim |
A character string of the name of the longitude dimension. The default value is 'lon'. |
mask |
An array of a mask (with 0's in the grid points that have to be masked) or NULL (i.e., no mask is used). This parameter allows removing the values over land in case the dataset is a combination of surface air temperature over land and sea surface temperature over the ocean. It can also be used to mask the grid points that are missing in the observational dataset for a fair comparison between the forecast system and the reference dataset. The default value is NULL. |
monini |
An integer indicating the month in which the forecast system is initialized. Only used when parameter 'type' is 'dcpp'. The default value is 11, i.e., initialized in November. |
fmonth_dim |
A character string indicating the name of the forecast month dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'fmonth'. |
sdate_dim |
A character string indicating the name of the start date dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'sdate'. |
indices_for_clim |
A numeric vector of the indices of the years to
compute the climatology for calculating the anomalies, or NULL so the
climatology is calculated over the whole period. If the data are already
anomalies, set it to FALSE. The default value is NULL. |
year_dim |
A character string indicating the name of the year dimension. Only used if parameter 'type' is 'hist' or 'obs'. The default value is 'year'. |
month_dim |
A character string indicating the name of the month dimension. The default value is 'month'. Only used if parameter 'type' is 'hist' or 'obs'. |
na.rm |
A logical value indicating whether to remove NA values. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the AMV index with the same dimensions as data except the lat_dim, lon_dim and fmonth_dim (month_dim) in case of decadal predictions (historical simulations or observations). In case of decadal predictions, a new dimension 'fyear' is added.
## Observations or reanalyses
obs <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_obs <- AMV(data = obs, data_lats = lat, data_lons = lon, type = 'obs')
## Historical simulations
hist <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
index_hist <- AMV(data = hist, data_lats = lat, data_lons = lon, type = 'hist')
## Decadal predictions
dcpp <- array(1:100, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
index_dcpp <- AMV(data = dcpp, data_lats = lat, data_lons = lon, type = 'dcpp', monini = 1)
Create animations of maps in an equirectangular or stereographic projection, showing the anomalies, the climatologies, the mean inter-quartile range, the maximum-minimum spread, the standard deviation, the median absolute deviation, the trends, the RMSE, the correlation, or the RMSSS between modelled and observed data along the forecast time (lead time) for all input experiments and input observational datasets.
AnimateMap( var, lon, lat, toptitle = rep("", 11), sizetit = 1, units = "", monini = 1, freq = 12, msk95lev = FALSE, brks = NULL, cols = NULL, filled.continents = FALSE, lonmin = 0, lonmax = 360, latmin = -90, latmax = 90, intlon = 20, intlat = 30, drawleg = TRUE, subsampleg = 1, colNA = "white", equi = TRUE, fileout = c("output1_animvsltime.gif", "output2_animvsltime.gif", "output3_animvsltime.gif"), ... )
var |
Matrix of dimensions (nltime, nlat, nlon) or (nexp/nmod, nltime, nlat, nlon) or (nexp/nmod, 3/4, nltime, nlat, nlon) or (nexp/nmod, nobs, 3/4, nltime, nlat, nlon). |
lon |
Vector containing longitudes (degrees). |
lat |
Vector containing latitudes (degrees). |
toptitle |
c('', '', ...) vector of main titles for each animation, optional. If RMS, RMSSS or correlations: first exp with successive obs, then second exp with successive obs, etc. |
sizetit |
Multiplicative factor to increase title size, optional. |
units |
Units, optional. |
monini |
Starting month between 1 and 12. Default = 1. |
freq |
1 = yearly, 12 = monthly, 4 = seasonal ... |
msk95lev |
TRUE/FALSE: mark grid points with dots where the 95% significance level is reached. Default = FALSE. |
brks |
Limits of colour levels, optional. For example: seq(min(var), max(var), (max(var) - min(var)) / 10). |
cols |
Vector of colours of length(brks) - 1, optional. |
filled.continents |
Continents filled in grey (TRUE) or represented by a black line (FALSE). Default = FALSE. Filling unavailable if crossing Greenwich and equi = TRUE. Filling unavailable if square = FALSE and equi = TRUE. |
lonmin |
Westward limit of the domain to plot (> 0 or < 0). Default: 0 degrees. |
lonmax |
Eastward limit of the domain to plot (> 0 or < 0). lonmax > lonmin. Default: 360 degrees. |
latmin |
Southward limit of the domain to plot. Default: -90 degrees. |
latmax |
Northward limit of the domain to plot. Default: 90 degrees. |
intlon |
Interval between longitude ticks on x-axis. Default = 20 degrees. |
intlat |
Interval between latitude ticks on y-axis for equi = TRUE or between latitude circles for equi = FALSE. Default = 30 degrees. |
drawleg |
Draw a colorbar. Can be FALSE only if square = FALSE or equi = FALSE. Default = TRUE. |
subsampleg |
Subsampling factor of the interval between ticks on the colorbar. Default = 1 = every colour level. |
colNA |
Color used to represent NA. Default = 'white'. |
equi |
TRUE for a cylindrical equidistant projection, FALSE for a stereographic projection. Default: TRUE. |
fileout |
c('', '', ...) vector of output file names for each animation. If RMS, RMSSS or correlations: first exp with successive obs, then second exp with successive obs, etc. |
... |
Arguments to be passed to the method. Only a subset of the standard graphical parameters is accepted. |
Examples of input:
Outputs from clim (exp, obs, memb = FALSE): (nmod, nltime, nlat, nlon) or (nobs, nltime, nlat, nlon)
Model output from load/ano/smoothing: (nmod, nmemb, sdate, nltime, nlat, nlon), then passed through spread(var, posdim = 2, narm = TRUE) & mean1dim(var, posdim = 3, narm = TRUE), or through trend(mean1dim(var, 2), posTR = 2): (nmod, 3, nltime, nlat, nlon). Animates the average along start dates of IQR/MaxMin/SD/MAD across members, or trends of the ensemble mean computed across the start dates.
Model and observed output from load/ano/smoothing: (nmod, nmemb, sdate, nltime, nlat, nlon) & (nobs, nmemb, sdate, nltime, nlat, nlon), then averaged along members with mean1dim(var_exp/var_obs, posdim = 2): (nmod, sdate, nltime, nlat, nlon) (nobs, sdate, nltime, nlat, nlon), then passed through corr(exp, obs, posloop = 1, poscor = 2) or RMS(exp, obs, posloop = 1, posRMS = 2): (nmod, nobs, 3, nltime, nlat, nlon). Animates correlations or RMS between each exp & each obs against lead time.
# See ?Load for explanations on the first part of this example
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment', path = file.path(data_path,
             'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
             '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
             '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
             '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   output = 'lonlat', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40)
## End(Not run)
clim <- Clim(sampleData$mod, sampleData$obs, memb = FALSE)
## Not run:
AnimateMap(clim$clim_exp, sampleData$lon, sampleData$lat,
           toptitle = "climatology of decadal prediction", sizetit = 1,
           units = "degree", brks = seq(270, 300, 3), monini = 11, freq = 12,
           msk95lev = FALSE, filled.continents = TRUE, intlon = 10,
           intlat = 10, fileout = 'clim_dec.gif')
## End(Not run)
# More examples are available in s2dverification but were removed for now
This function computes anomalies from a multidimensional data array and a climatology array.
Ano(data, clim, ncores = NULL)
data |
A numeric array with named dimensions, representing the model or observational data from which to calculate the anomalies. It must include all the dimensions in parameter 'clim' and can have additional dimensions. |
clim |
A numeric array with named dimensions, representing the climatologies to be subtracted from parameter 'data'. It can be generated by Clim(). All its dimensions must be present in parameter 'data' with the same lengths. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
An array with the same dimensions as parameter 'data'.
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
## Not run:
PlotAno(ano_exp, ano_obs, startDates, toptitle = 'Anomaly',
        ytitle = c('K', 'K', 'K'), legends = 'ERSST', biglab = FALSE)
## End(Not run)
Compute the anomalies from the arrays of experimental and observational data by subtracting climatologies computed with a leave-one-out cross-validation technique and a per-pair method (Garcia-Serrano and Doblas-Reyes, CD, 2012). Per-pair climatology means that only the start dates covered by all the experimental/observational datasets are used. In other words, the start dates which do not all have values along the 'dat_dim' dimension of both 'exp' and 'obs' are excluded when computing the climatologies.
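The leave-one-out idea can be sketched in a few lines (hypothetical data; the climatology for each start date excludes that start date itself; Ano_CrossValid() additionally applies the per-pair criterion along 'dat_dim'):

set.seed(1)
x <- rnorm(10)   # one value per start date
ano <- sapply(seq_along(x), function(i) x[i] - mean(x[-i]))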
Ano_CrossValid( exp, obs, time_dim = "sdate", dat_dim = c("dataset", "member"), memb_dim = "member", memb = TRUE, ncores = NULL )
exp |
A named numeric array of experimental data, with at least dimensions 'time_dim' and 'dat_dim'. |
obs |
A named numeric array of observational data, same dimensions as parameter 'exp' except along 'dat_dim'. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
dat_dim |
A character vector indicating the names of the dataset and member dimensions. When calculating the climatology, if data at one start date (i.e., 'time_dim') are not complete along 'dat_dim', this start date along 'dat_dim' will be discarded. If there is no dataset dimension, it can be NULL. The default value is "c('dataset', 'member')". |
memb_dim |
A character string indicating the name of the member dimension. Only used when parameter 'memb' is FALSE. It must be one element in 'dat_dim'. The default value is 'member'. |
memb |
A logical value indicating whether to subtract the climatology based on the individual members (TRUE) or the ensemble mean over all members (FALSE) when calculating the anomalies. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list of 2:
$exp |
A numeric array with the same dimensions as 'exp'. The dimension order may change. |
$obs |
A numeric array with the same dimensions as 'obs'. The dimension order may change. |
# Load sample data as in Load() example:
example(Load)
anomalies <- Ano_CrossValid(sampleData$mod, sampleData$obs)
## Not run:
PlotAno(anomalies$exp, anomalies$obs, startDates, toptitle = 'anomalies',
        ytitle = c('K', 'K', 'K'), legends = 'ERSST', biglab = FALSE)
## End(Not run)
The Mean Bias or Mean Error (Wilks, 2011) is defined as the mean difference between the ensemble-mean forecast and the observations. It is a deterministic metric. Positive values indicate that the forecasts are on average too high and negative values indicate that the forecasts are on average too low. The function can also compute the Absolute Mean Bias or the bias without the temporal mean. If there is more than one dataset, the result is computed for each pair of exp and obs data.
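As a minimal sketch with hypothetical data (Bias() itself works on named dimensions, multiple datasets and significance testing):

set.seed(1)
exp <- matrix(rnorm(50 * 10, mean = 0.3), nrow = 50)   # c(sdate, member)
obs <- rnorm(50)
ens_mean <- rowMeans(exp)                    # collapse the member dimension
mean_bias <- mean(ens_mean - obs)            # positive: forecasts too high on average
abs_mean_bias <- mean(abs(ens_mean - obs))   # analogue of absolute = TRUE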
Bias( exp, obs, time_dim = "sdate", memb_dim = NULL, dat_dim = NULL, na.rm = FALSE, absolute = FALSE, time_mean = TRUE, alpha = 0.05, ncores = NULL )
exp |
A named numerical array of the forecast with at least time dimension. |
obs |
A named numerical array of the observation with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean; it should be set to NULL if the parameter 'exp' is already the ensemble mean. The default value is NULL. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
na.rm |
A logical value indicating if NAs should be removed (TRUE) or kept (FALSE) for computation. The default value is FALSE. |
absolute |
A logical value indicating whether to compute the absolute bias. The default value is FALSE. |
time_mean |
A logical value indicating whether to compute the temporal mean of the bias. The default value is TRUE. |
alpha |
A numeric value indicating the significance level for Welch's t-test, or NULL to skip the significance test. Only available when absolute is FALSE. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of bias with dimensions c(nexp, nobs, the rest of the dimensions of 'exp' except 'time_dim' (if time_mean = TRUE) and 'memb_dim'). nexp is the number of experiments (i.e., 'dat_dim' in exp), and nobs is the number of observations (i.e., 'dat_dim' in obs). If dat_dim is NULL, nexp and nobs are omitted. If alpha is specified and absolute is FALSE, the result is a list with two elements: the bias as described above and the significance as a logical array with the same dimensions.
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
exp <- array(rnorm(1000), dim = c(dat = 1, lat = 3, lon = 5, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(dat = 1, lat = 3, lon = 5, sdate = 50))
bias <- Bias(exp = exp, obs = obs, memb_dim = 'member')
bias2 <- Bias(exp = exp, obs = obs, memb_dim = 'member', alpha = 0.01)
abs_bias <- Bias(exp = exp, obs = obs, memb_dim = 'member', absolute = TRUE, alpha = NULL)
Compute the Brier score (BS) and the components of its standard decomposition, together with the two within-bin components described in Stephenson et al. (2008). It also returns the bias-corrected decomposition of the BS (Ferro and Fricker, 2012). The Brier skill score (BSS) uses the climatology as the reference forecast.
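As a rough sketch of the Brier score and its standard (binned) decomposition with hypothetical data (the package function additionally returns the within-bin, generalized and bias-corrected terms):

set.seed(1)
p <- runif(200)                          # forecast probabilities
o <- rbinom(200, 1, p)                   # binary observations
bs <- mean((p - o)^2)                    # Brier score
bins <- cut(p, breaks = seq(0, 1, 0.1), include.lowest = TRUE)
nk <- tapply(p, bins, length)            # number of forecasts per bin
fk <- tapply(p, bins, mean)              # mean forecast probability per bin
ok <- tapply(o, bins, mean)              # observed frequency per bin
obar <- mean(o)
rel <- sum(nk * (fk - ok)^2, na.rm = TRUE) / length(p)    # reliability
res <- sum(nk * (ok - obar)^2, na.rm = TRUE) / length(p)  # resolution
unc <- obar * (1 - obar)                                  # uncertainty
# rel - res + unc approximates bs, up to the within-bin components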
BrierScore( exp, obs, thresholds = seq(0.1, 0.9, 0.1), time_dim = "sdate", dat_dim = NULL, memb_dim = NULL, ncores = NULL )
exp |
A vector or a numeric array with named dimensions. If 'memb_dim' does not exist, it should contain the predicted probabilities, which are within the range [0, 1]. If it has 'memb_dim', the values should be 0 or 1, and the predicted probabilities will be computed by the ensemble mean. The dimensions must at least include 'time_dim'. |
obs |
A numeric array with named dimensions of the binary observations (0 or 1). The dimensions must be the same as 'exp' except 'memb_dim', which is optional. If it has 'memb_dim', its length must be 1. The length of 'dat_dim' can differ from that of 'exp' if present. |
thresholds |
A numeric vector used to bin the forecasts. The default value is seq(0.1, 0.9, 0.1). |
time_dim |
A character string indicating the name of dimension along which Brier score is computed. The default value is 'sdate'. |
dat_dim |
A character string indicating the name of dataset dimension in 'exp' and 'obs'. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
memb_dim |
A character string of the name of the member dimension in 'exp' (and 'obs', optional). The function will do the ensemble mean over this dimension. If there is no member dimension, set NULL. The default value is NULL. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list that contains:
$rel |
standard reliability |
$res |
standard resolution |
$unc |
standard uncertainty |
$bs |
Brier score |
$bs_check_res |
rel - res + unc |
$bss_res |
(res - rel) / unc |
$gres |
generalized resolution |
$bs_check_gres |
rel - gres + unc |
$bss_gres |
(gres - rel) / unc |
$rel_bias_corrected |
bias-corrected rel |
$gres_bias_corrected |
bias-corrected gres |
$unc_bias_corrected |
bias-corrected unc |
$bss_bias_corrected |
(gres_bias_corrected - rel_bias_corrected) / unc_bias_corrected |
$nk |
number of forecasts in each bin |
$fkbar |
average probability of each bin |
$okbar |
relative frequency with which the observed event occurred in each bin |
The data type and dimensions of the items depend on whether the input 'exp' and 'obs' are:
(a) Vectors
(b) Arrays with 'dat_dim' specified
(c) Arrays with no 'dat_dim' specified
Items 'rel', 'res', 'unc', 'bs', 'bs_check_res', 'bss_res', 'gres',
'bs_check_gres', 'bss_gres', 'rel_bias_corrected', 'gres_bias_corrected',
'unc_bias_corrected', and 'bss_bias_corrected' are (a) a number, (b) an array
with dimensions c(nexp, nobs, all the rest of the dimensions in 'exp' and
'obs' except 'time_dim' and 'memb_dim'), or (c) an array with the dimensions
of 'exp' and 'obs' except 'time_dim' and 'memb_dim'.
Items 'nk', 'fkbar', and 'okbar' are (a) a vector whose length is the number
of bins determined by 'thresholds', (b) an array with dimensions c(nexp, nobs,
number of bins, all the rest of the dimensions in 'exp' and 'obs' except
'time_dim' and 'memb_dim'), or (c) an array with dimensions c(number of bins,
all the rest of the dimensions in 'exp' and 'obs' except 'time_dim' and
'memb_dim').
Wilks (2006) Statistical Methods in the Atmospheric Sciences.
Stephenson et al. (2008). Two extra components in the Brier score decomposition.
Weather and Forecasting, 23: 752-757.
Ferro and Fricker (2012). A bias-corrected decomposition of the BS.
Quarterly Journal of the Royal Meteorological Society, DOI: 10.1002/qj.1924.
# Inputs are vectors
exp <- runif(10)
obs <- round(exp)
x <- BrierScore(exp, obs)
# Inputs are arrays
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
bins_ano_exp <- ProbBins(ano_exp, thr = c(1/3, 2/3))
bins_ano_obs <- ProbBins(ano_obs, thr = c(1/3, 2/3))
res <- BrierScore(bins_ano_exp, MeanDims(bins_ano_obs, 'member'), memb_dim = 'member')
This function takes as inputs a multidimensional array (optional), a vector
or matrix of longitudes, a vector or matrix of latitudes, a destination grid
specification, and the name of a method to be used to interpolate (one of
those available in the 'remap' utility in CDO). The interpolated array is
returned (if provided) together with the new longitudes and latitudes.
CDORemap() permutes the dimensions of the input array by default (if needed),
splits it into chunks (CDO can work with data arrays of up to 4 dimensions),
generates a file with the data of each chunk, interpolates it with CDO, reads
it back into R, and merges it into a result array. If no input array is
provided, only the longitude and latitude vectors are transformed. If the
array is already on the desired destination grid, no transformation is
performed (this behaviour applies only to lonlat and gaussian grids).
Any metadata attached to the input data array, longitudes or latitudes will
be preserved or accordingly modified.
CDORemap( data_array = NULL, lons, lats, grid, method, avoid_writes = TRUE, crop = TRUE, force_remap = FALSE, write_dir = tempdir(), print_sys_msg = FALSE, ncores = NULL )
data_array |
Multidimensional numeric array to be interpolated. If provided, it must have at least a longitude and a latitude dimension, identified by the array dimension names. The names for these dimensions must be among those recognized by s2dv. |
lons |
Numeric vector or array of longitudes of the centers of the grid cells. Its size must match the size of the longitude/latitude dimensions of the input array. |
lats |
Numeric vector or array of latitudes of the centers of the grid cells. Its size must match the size of the longitude/latitude dimensions of the input array. |
grid |
Character string specifying either the name of a target grid (recognized by CDO; e.g.: 'r256x128', 't106grid') or a path to another NetCDF file from which to read the target grid (a single grid must be defined in that file). |
method |
Character string specifying an interpolation method (recognized by CDO; e.g.: 'con', 'bil', 'bic', 'dis', 'con2', 'laf', 'nn'). The following long names are also supported: 'conservative', 'bilinear', 'bicubic' and 'distance-weighted'. |
avoid_writes |
The permutation step is needed when the input array has more than 3 dimensions and neither the longitude nor the latitude dimension is in the right-most position (CDO would not accept it without permuting first). This step, executed by default when needed, can be avoided at the cost of writing more intermediate files (which is usually inconvenient) by setting this parameter to FALSE. The default value is TRUE. |
crop |
Whether to crop the data after interpolation with 'cdo sellonlatbox' (TRUE) or to extend the interpolated data to the whole world as CDO does by default (FALSE). The default value is TRUE. |
force_remap |
Whether to force remapping, even if the input data array is already on the target grid. The default value is FALSE. |
write_dir |
Path to the directory in which to create the intermediate files for CDO to work with. By default, the R session temporary directory is used (tempdir()). |
print_sys_msg |
A logical value indicating whether to print the messages from the system CDO commands. The default value is FALSE, to keep the output clean. |
ncores |
An integer indicating the number of threads used for the interpolation (i.e., the CDO '-P' option). The default value is NULL. |
A list with the following components:
'data_array' |
The interpolated data array (if an input array is provided; NULL otherwise). |
'lons' |
The longitudes of the data on the destination grid. |
'lats' |
The latitudes of the data on the destination grid. |
## Not run:
# Interpolating only vectors of longitudes and latitudes
lon <- seq(0, 360 - 360/50, length.out = 50)
lat <- seq(-90, 90, length.out = 25)
tas2 <- CDORemap(NULL, lon, lat, 't170grid', 'bil', TRUE)
# Minimal array interpolation
tas <- array(1:50, dim = c(25, 50))
names(dim(tas)) <- c('lat', 'lon')
lon <- seq(0, 360 - 360/50, length.out = 50)
lat <- seq(-90, 90, length.out = 25)
tas2 <- CDORemap(tas, lon, lat, 't170grid', 'bil', TRUE)
# Metadata can be attached to the inputs. It will be preserved and
# accordingly modified.
tas <- array(1:50, dim = c(25, 50))
names(dim(tas)) <- c('lat', 'lon')
lon <- seq(0, 360 - 360/50, length.out = 50)
metadata <- list(lon = list(units = 'degrees_east'))
attr(lon, 'variables') <- metadata
lat <- seq(-90, 90, length.out = 25)
metadata <- list(lat = list(units = 'degrees_north'))
attr(lat, 'variables') <- metadata
metadata <- list(tas = list(dim = list(lat = list(len = 25, vals = lat),
                                       lon = list(len = 50, vals = lon))))
attr(tas, 'variables') <- metadata
tas2 <- CDORemap(tas, lon, lat, 't170grid', 'bil', TRUE)
# Arrays of any number of dimensions in any order can be provided.
num_lats <- 25
num_lons <- 50
tas <- array(1:(10*num_lats*10*num_lons*10),
             dim = c(10, num_lats, 10, num_lons, 10))
names(dim(tas)) <- c('a', 'lat', 'b', 'lon', 'c')
lon <- seq(0, 360 - 360/num_lons, length.out = num_lons)
metadata <- list(lon = list(units = 'degrees_east'))
attr(lon, 'variables') <- metadata
lat <- seq(-90, 90, length.out = num_lats)
metadata <- list(lat = list(units = 'degrees_north'))
attr(lat, 'variables') <- metadata
metadata <- list(tas = list(dim = list(a = list(),
                                       lat = list(len = num_lats, vals = lat),
                                       b = list(),
                                       lon = list(len = num_lons, vals = lon),
                                       c = list())))
attr(tas, 'variables') <- metadata
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil', TRUE)
# The step of permutation can be avoided but more intermediate file writes
# will be performed.
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil', FALSE)
# If the provided array has the longitude or latitude dimension in the
# right-most position, the same number of file writes will be performed,
# even if avoid_writes = FALSE.
num_lats <- 25
num_lons <- 50
tas <- array(1:(10*num_lats*10*num_lons*10), dim = c(10, num_lats, 10, num_lons))
names(dim(tas)) <- c('a', 'lat', 'b', 'lon')
lon <- seq(0, 360 - 360/num_lons, length.out = num_lons)
metadata <- list(lon = list(units = 'degrees_east'))
attr(lon, 'variables') <- metadata
lat <- seq(-90, 90, length.out = num_lats)
metadata <- list(lat = list(units = 'degrees_north'))
attr(lat, 'variables') <- metadata
metadata <- list(tas = list(dim = list(a = list(),
                                       lat = list(len = num_lats, vals = lat),
                                       b = list(),
                                       lon = list(len = num_lons, vals = lon))))
attr(tas, 'variables') <- metadata
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil', TRUE)
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil', FALSE)
# An example of an interpolation from and onto a rectangular regular grid
num_lats <- 25
num_lons <- 50
tas <- array(1:(1*num_lats*num_lons), dim = c(num_lats, num_lons))
names(dim(tas)) <- c('y', 'x')
lon <- array(seq(0, 360 - 360/num_lons, length.out = num_lons),
             dim = c(num_lons, num_lats))
metadata <- list(lon = list(units = 'degrees_east'))
names(dim(lon)) <- c('x', 'y')
attr(lon, 'variables') <- metadata
lat <- t(array(seq(-90, 90, length.out = num_lats),
               dim = c(num_lats, num_lons)))
metadata <- list(lat = list(units = 'degrees_north'))
names(dim(lat)) <- c('x', 'y')
attr(lat, 'variables') <- metadata
tas2 <- CDORemap(tas, lon, lat, 'r100x50', 'bil')
# An example of an interpolation from an irregular grid onto a gaussian grid
num_lats <- 25
num_lons <- 50
tas <- array(1:(10*num_lats*10*num_lons*10), dim = c(10, num_lats, 10, num_lons))
names(dim(tas)) <- c('a', 'j', 'b', 'i')
lon <- array(seq(0, 360 - 360/num_lons, length.out = num_lons),
             dim = c(num_lons, num_lats))
metadata <- list(lon = list(units = 'degrees_east'))
names(dim(lon)) <- c('i', 'j')
attr(lon, 'variables') <- metadata
lat <- t(array(seq(-90, 90, length.out = num_lats),
               dim = c(num_lats, num_lons)))
metadata <- list(lat = list(units = 'degrees_north'))
names(dim(lat)) <- c('i', 'j')
attr(lat, 'variables') <- metadata
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil')
# Again, the dimensions can be in any order
num_lats <- 25
num_lons <- 50
tas <- array(1:(10*num_lats*10*num_lons), dim = c(10, num_lats, 10, num_lons))
names(dim(tas)) <- c('a', 'j', 'b', 'i')
lon <- array(seq(0, 360 - 360/num_lons, length.out = num_lons),
             dim = c(num_lons, num_lats))
names(dim(lon)) <- c('i', 'j')
lat <- t(array(seq(-90, 90, length.out = num_lats),
               dim = c(num_lats, num_lons)))
names(dim(lat)) <- c('i', 'j')
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil')
tas2 <- CDORemap(tas, lon, lat, 't17grid', 'bil', FALSE)
# It is possible to specify an external NetCDF file as target grid reference
tas2 <- CDORemap(tas, lon, lat, 'external_file.nc', 'bil')
## End(Not run)
This function computes per-pair climatologies for the experimental and observational data using one of the following methods:
per-pair method (Garcia-Serrano and Doblas-Reyes, CD, 2012 https://doi.org/10.1007/s00382-012-1413-1)
Kharin method (Kharin et al, GRL, 2012 https://doi.org/10.1029/2012GL052647)
Fuckar method (Fuckar et al, GRL, 2014 https://doi.org/10.1002/2014GL060815)
Per-pair climatology means that only the start dates covered by all the experimental/observational datasets are used. In other words, the start dates which are not available along the 'dat_dim' dimension of both 'exp' and 'obs' are excluded when computing the climatologies. The Kharin method is the linear-trend bias-correction method, and the Fuckar method is the initial-condition bias-correction method. Both methods apply the per-pair correction beforehand.
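The per-pair criterion can be sketched as follows (hypothetical data; start dates that are incomplete along 'dat_dim' in either input are dropped before averaging):

exp <- matrix(rnorm(5 * 2), nrow = 5)    # c(sdate, dataset)
obs <- rnorm(5)
exp[3, 2] <- NA                          # start date 3 is incomplete in 'exp'
complete <- complete.cases(exp) & !is.na(obs)
clim_exp <- colMeans(exp[complete, , drop = FALSE])   # per-dataset climatology
clim_obs <- mean(obs[complete])          # same start dates used for 'obs'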
Clim( exp, obs, time_dim = "sdate", dat_dim = c("dataset", "member"), method = "clim", ftime_dim = "ftime", memb = TRUE, memb_dim = "member", na.rm = TRUE, ncores = NULL )
exp |
A named numeric array of experimental data with at least dimension 'time_dim'. |
obs |
A named numeric array of observational data that has the same dimension as 'exp' except 'dat_dim'. |
time_dim |
A character string indicating the name of dimension along which the climatologies are computed. The default value is 'sdate'. |
dat_dim |
A character vector indicating the names of the dataset and member dimensions. If data at one start date (i.e., 'time_dim') are not complete along 'dat_dim', this start date along 'dat_dim' will be discarded. If there is no dataset dimension, it can be NULL; however, it will be more efficient to simply use mean() to do the calculation. The default value is "c('dataset', 'member')". |
method |
A character string indicating the method to be used. The options include 'clim' (per-pair method), 'kharin' (Kharin method), and 'NDV' (Fuckar method). The default value is 'clim'. |
ftime_dim |
A character string indicating the name of forecast time dimension. Only used when method = 'NDV'. The default value is 'ftime'. |
memb |
A logical value indicating whether to retain the 'memb_dim' dimension (TRUE) or compute the ensemble mean over 'memb_dim' (FALSE). The default value is TRUE. |
memb_dim |
A character string indicating the name of the member dimension. Only used when parameter 'memb' is FALSE. It must be one element in 'dat_dim'. The default value is 'member'. |
na.rm |
A logical value indicating whether to remove NA values along 'time_dim' when calculating climatology (TRUE) or return NA if there is NA along 'time_dim' (FALSE). The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list of 2:
$clim_exp |
A numeric array with the same dimensions as parameter 'exp' but dimension 'time_dim' is moved to the first position. If parameter 'method' is 'clim', dimension 'time_dim' is removed. If parameter 'memb' is FALSE, dimension 'memb_dim' is also removed. |
$clim_obs |
A numeric array with the same dimensions as parameter 'obs' except dimension 'time_dim' is removed. If parameter 'memb' is FALSE, dimension 'memb_dim' is also removed. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
clim2 <- Clim(sampleData$mod, sampleData$obs, method = 'kharin', memb = FALSE)
## Not run:
PlotClim(clim$clim_exp, clim$clim_obs,
         toptitle = 'sea surface temperature climatologies',
         ytitle = 'K', monini = 11, listexp = c('CMIP5 IC3'),
         listobs = c('ERSST'), biglab = FALSE)
## End(Not run)
Generates a colorblind-friendly color palette with color ranges useful for plotting climate temperature variables.
clim.palette(palette = "bluered") clim.colors(n, palette = "bluered")
palette |
Which type of palette to generate: from blue through white to red ('bluered'), from red through white to blue ('redblue'), from yellow through orange to red ('yellowred'), from red through orange to yellow ('redyellow'), from purple through white to orange ('purpleorange'), and from orange through white to purple ('orangepurple'). |
n |
Number of colors to generate. |
lims <- seq(-1, 1, length.out = 21)
ColorBar(lims, color_fun = clim.palette('redyellow'))
cols <- clim.colors(20)
ColorBar(lims, cols)
Compute cluster centers and their time series of occurrences, with the K-means clustering method using Euclidean distance, of an array of input data with any number of dimensions that contains at least 'time_dim'. Specifically, it partitions the array along the time axis into K groups or clusters in which each space vector/array belongs to (i.e., is a member of) the cluster with the nearest center or centroid. This function is a wrapper of kmeans() and relies on the NbClust package (Charrad et al., 2014 JSS) to determine the optimal number of clusters for K-means clustering if it is not provided by the user.
Cluster( data, weights = NULL, time_dim = "sdate", space_dim = NULL, nclusters = NULL, index = "sdindex", ncores = NULL )
data |
A numeric array with named dimensions that has at least 'time_dim', corresponding to time, and optionally 'space_dim', corresponding to either area-averages over a series of domains or the grid points of any spatial grid structure. |
weights |
A numeric array with named dimensions of multiplicative weights based on the areas covering each domain/region or grid cell of 'data'. The dimensions must be equal to 'space_dim' in 'data'. The default value is NULL, which means no weighting is applied. |
time_dim |
A character string indicating the name of time dimension in 'data'. The default value is 'sdate'. |
space_dim |
A character vector indicating the names of spatial dimensions in 'data'. The default value is NULL. |
nclusters |
A positive integer K that must be bigger than 1 indicating the number of clusters to be computed, or K initial cluster centers to be used in the method. The default value is NULL, which means that the number of clusters will be determined by NbClust(). The parameter 'index' therefore needs to be specified for NbClust() to find the optimal number of clusters to be used for K-means clustering calculation. |
index |
A character string of the validity index from the NbClust package that can be used to determine the optimal K if K is not specified with 'nclusters'. The default value is 'sdindex' (Halkidi et al. 2001, JIIS). Other indices available in NbClust are "kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", and "sdbw". One can also use all of them with the option 'alllong', or almost all indices except gap, gamma, gplus and tau with 'all', in which case the optimal number of clusters K is determined by the majority rule (the maximum of the histogram of the results of all indices with finite solutions). Use of some indices on a big and/or unstructured dataset can be computationally intense and/or could lead to numerical singularity. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
$cluster |
An integer array of the occurrence of a cluster along time, i.e., which cluster each time step is allocated to. The dimensions are the same as 'data' without 'space_dim'. |
$centers |
A numeric array of cluster centres or centroids (e.g. [1:K, 1:spatial degrees of freedom]). The remaining dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$totss |
A numeric array of the total sum of squares. The dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$withinss |
A numeric array of within-cluster sum of squares, one component per cluster. The first dimension is the number of clusters, and the remaining dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$tot.withinss |
A numeric array of the total within-cluster sum of squares, i.e., sum(withinss). The dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$betweenss |
A numeric array of the between-cluster sum of squares, i.e., totss - tot.withinss. The dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$size |
A numeric array of the number of points in each cluster. The first dimension is the number of clusters, and the remaining dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$iter |
A numeric array of the number of (outer) iterations. The dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
$ifault |
A numeric array of an indicator of a possible algorithm problem. The dimensions are the same as 'data' except 'time_dim' and 'space_dim'. |
Wilks, 2011, Statistical Methods in the Atmospheric Sciences, 3rd ed., Elsevier, 676 pp.
# Generating synthetic data
a1 <- array(dim = c(200, 4))
mean1 <- 0
sd1 <- 0.3
c0 <- seq(1, 200)
c1 <- sort(sample(x = 1:200, size = sample(x = 50:150, size = 1), replace = FALSE))
x1 <- c(1, 1, 1, 1)
for (i1 in c1) {
  a1[i1, ] <- x1 + rnorm(4, mean = mean1, sd = sd1)
}
c1p5 <- c0[!(c0 %in% c1)]
c2 <- c1p5[seq(1, length(c1p5), 2)]
x2 <- c(2, 2, 4, 4)
for (i2 in c2) {
  a1[i2, ] <- x2 + rnorm(4, mean = mean1, sd = sd1)
}
c3 <- c1p5[seq(2, length(c1p5), 2)]
x3 <- c(3, 3, 1, 1)
for (i3 in c3) {
  a1[i3, ] <- x3 + rnorm(4, mean = mean1, sd = sd1)
}
# Computing the clusters
names(dim(a1)) <- c('sdate', 'space')
res1 <- Cluster(data = a1, weights = array(1, dim = dim(a1)[2]), nclusters = 3)
res2 <- Cluster(data = a1, weights = array(1, dim = dim(a1)[2]))
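The returned elements can be inspected directly; a minimal sketch reusing 'res1' from the example above:
# A minimal sketch of inspecting the output (reusing 'res1' from above)
dim(res1$cluster)    # one cluster label per 'sdate'
res1$centers         # centroids: one row per cluster, one column per 'space' point
table(res1$cluster)  # occurrences per cluster; should agree with res1$size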
Generates a color bar to use as colouring function for map plots and
optionally draws it (horizontally or vertically) to be added to map
multipanels or plots. It is possible to draw triangles at the ends of the
colour bar to represent values that go beyond the range of interest. A
number of options are provided to adjust the colours and the position and
size of the components. The drawn colour bar spans a whole figure region
and is compatible with figure layouts.
The generated colour bar consists of a set of breaks that define the
length(brks) - 1 intervals to classify each of the values in each of the
grid cells of a two-dimensional field. The corresponding grid cell of a
given value of the field will be coloured according to the interval it
belongs to.
The only mandatory parameters are 'var_limits' or 'brks' (in its second
format, see below).
ColorBar(
  brks = NULL,
  cols = NULL,
  vertical = TRUE,
  subsampleg = NULL,
  bar_limits = NULL,
  var_limits = NULL,
  triangle_ends = NULL,
  col_inf = NULL,
  col_sup = NULL,
  color_fun = clim.palette(),
  plot = TRUE,
  draw_ticks = TRUE,
  draw_separators = FALSE,
  triangle_ends_scale = 1,
  extra_labels = NULL,
  title = NULL,
  title_scale = 1,
  label_scale = 1,
  tick_scale = 1,
  extra_margin = rep(0, 4),
  label_digits = 4,
  ...
)
brks |
Can be provided in two formats:
- A single value: the number of breaks to be generated automatically between the minimum and maximum specified in 'var_limits' (both inclusive). In this case, 'var_limits' is mandatory.
- A vector of numeric values: the exact positions of the breaks, in increasing order.
This parameter is optional if 'var_limits' is specified. If 'brks' is not specified but 'cols' is, it will take as value length(cols) + 1. If 'cols' is not specified either, 'brks' will take 21 as value. |
cols |
Vector of length(brks) - 1 valid colour identifiers, one for each
interval defined by the breaks. This parameter is optional and will be
filled in with a vector of length(brks) - 1 colours generated with the
function provided in 'color_fun' (clim.palette() by default). |
vertical |
TRUE/FALSE for vertical/horizontal colour bar (disregarded if plot = FALSE). |
subsampleg |
The first of each subsampleg breaks will be ticked on the colorbar. Takes by default an approximation of a value that yields a readable tick arrangement (extreme breaks always ticked). If set to 0 or lower, no labels are drawn. See the code of the function for details or use 'extra_labels' for customized tick arrangements. |
bar_limits |
Vector of two numeric values with the extremes of the range of values represented in the colour bar. If 'var_limits' go beyond this interval, the drawing of triangle extremes is triggered at the corresponding sides, painted in 'col_inf' and 'col_sup'. Either of them can be set as NA and will then take as value the corresponding extreme in 'var_limits' (hence a triangle end won't be triggered for these sides). Takes as default the extremes of 'brks' if available, else the same values as 'var_limits'. |
var_limits |
Vector of two numeric values with the minimum and maximum values of the field to represent. These are used to know whether to draw triangle ends at the extremes of the colour bar and what colour to fill them in with. If not specified, take the same value as the extremes of 'brks'. Hence the parameter 'brks' is mandatory if 'var_limits' is not specified. |
triangle_ends |
Vector of two logical elements, indicating whether to force the drawing of triangle ends at each of the extremes of the colour bar. This choice is automatically made from the provided 'brks', 'bar_limits', 'var_limits', 'col_inf' and 'col_sup', but the behaviour can be manually forced to draw or not to draw the triangle ends with this parameter. If 'cols' is provided, 'col_inf' and 'col_sup' will take priority over 'triangle_ends' when deciding whether to draw the triangle ends or not. |
col_inf |
Colour to fill the inferior triangle end with. Useful if specifying colours manually with parameter 'cols', to specify the colour and to trigger the drawing of the lower extreme triangle, or if 'cols' is not specified, to replace the colour automatically generated by ColorBar(). |
col_sup |
Colour to fill the superior triangle end with. Useful if specifying colours manually with parameter 'cols', to specify the colour and to trigger the drawing of the upper extreme triangle, or if 'cols' is not specified, to replace the colour automatically generated by ColorBar(). |
color_fun |
Function to generate the colours of the color bar. Must take an integer and must return as many colours. The returned colour vector can have the attribute 'na_color', with a colour to draw NA values. This parameter is set by default to clim.palette(). |
plot |
Logical value indicating whether to only compute its breaks and colours (FALSE) or to also draw it on the current device (TRUE). |
draw_ticks |
Whether to draw ticks for the labels along the colour bar (TRUE) or not (FALSE). TRUE by default. Disregarded if 'plot = FALSE'. |
draw_separators |
Whether to draw black lines in the borders of each of the colour rectangles of the colour bar (TRUE) or not (FALSE). FALSE by default. Disregarded if 'plot = FALSE'. |
triangle_ends_scale |
Scale factor for the drawn triangle ends of the colour bar, if drawn at all. Takes 1 by default (rectangle triangle proportional to the thickness of the colour bar). Disregarded if 'plot = FALSE'. |
extra_labels |
Numeric vector of extra labels to draw along axis of the colour bar. The number of provided decimals will be conserved. Disregarded if 'plot = FALSE'. |
title |
Title to draw on top of the colour bar, most commonly with the units of the represented field in the neighbour figures. Empty by default. |
title_scale |
Scale factor for the 'title' of the colour bar. Takes 1 by default. |
label_scale |
Scale factor for the labels of the colour bar. Takes 1 by default. |
tick_scale |
Scale factor for the length of the ticks of the labels along the colour bar. Takes 1 by default. |
extra_margin |
Extra margins to be added around the colour bar, in the format c(y1, x1, y2, x2). The units are margin lines. Takes rep(0, 4) by default. |
label_digits |
Number of significant digits to be displayed in the labels of the colour bar, usually to avoid too many decimal digits overflowing the figure region. This has no effect on the labels provided in 'extra_labels'. Takes 4 by default. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
brks |
Breaks used for splitting the range in intervals. |
cols |
Colours generated for each of the length(brks) - 1 intervals. Always of length length(brks) - 1. |
col_inf |
Colour used to draw the lower triangle end in the colour bar (NULL if not drawn at all). |
col_sup |
Colour used to draw the upper triangle end in the colour bar (NULL if not drawn at all). |
cols <- c("dodgerblue4", "dodgerblue1", "forestgreen", "yellowgreen", "white", "white", "yellow", "orange", "red", "saddlebrown") lims <- seq(-1, 1, 0.2) ColorBar(lims, cols)
cols <- c("dodgerblue4", "dodgerblue1", "forestgreen", "yellowgreen", "white", "white", "yellow", "orange", "red", "saddlebrown") lims <- seq(-1, 1, 0.2) ColorBar(lims, cols)
Composite a multi-dimensional array that contains two spatial dimensions and one temporal dimension, e.g., (lon, lat, time), according to the indices of mode/cluster occurrences in time. The p-value from a t-test is also computed.
Composite(
  data,
  occ,
  time_dim = "time",
  space_dim = c("lon", "lat"),
  lag = 0,
  eno = FALSE,
  K = NULL,
  fileout = NULL,
  ncores = NULL
)
data |
A numeric array containing two spatial and one temporal dimensions. |
occ |
A vector of the occurrence time series of mode(s)/cluster(s). The length should be the same as the temporal dimension in 'data'. (*1) When one wants to composite all modes, e.g., all K = 3 clusters, the occurrences could look like: 1 1 2 3 2 3 1 3 3 2 3 2 2 3 2. (*2) Otherwise, for compositing only the 2nd mode or cluster of the above example, the occurrences should look like: 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1. |
time_dim |
A character string indicating the name of the temporal dimension in 'data'. The default value is 'time'. |
space_dim |
A character vector indicating the names of the spatial dimensions in 'data'. The default value is c('lon', 'lat'). |
lag |
An integer indicating the lag time step. E.g., for lag = 2, +2 occurrences will be used (i.e., shifted 2 time steps forward). The default value is 0. |
eno |
A logical value indicating whether to use the effective sample size (TRUE) or the total sample size (FALSE) for the number of degrees of freedom. The default value is FALSE. |
K |
A numeric value indicating the maximum number of composites. The default value is NULL, which means the maximum value provided in 'occ' is used. |
fileout |
A character string indicating the name of the .sav output file. The default value is NULL, which means not to save the output. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
$composite |
A numeric array of the spatial dimensions and new dimension 'K' first, followed by the same dimensions as parameter 'data'. The length of dimension 'K' is parameter 'K'. |
$p.val |
A numeric array with the same dimension as $composite. It is the p-value of the composites obtained through a t-test that accounts for the serial dependence of the data. |
blank <- array(0, dim = c(20, 10, 30))
x1 <- blank
t1 <- blank
f1 <- blank
for (i in 1:20) {
  x1[i, , ] <- i
}
for (i in 1:30) {
  t1[, , i] <- i
}
# This is a 2D propagating sine wave example, where we use the f1(lon, lat, time)
# wave field. Compositing (like using stroboscopic light) at different
# time steps can lead to modification or cancellation of the wave pattern.
for (i in 1:20) {
  for (j in 1:30) {
    f1[i, , j] <- 3 * sin(2 * pi * x1[i, , j] / 5. - 2 * pi * t1[i, , j] / 6.)
  }
}
names(dim(f1)) <- c('lon', 'lat', 'time')
occ <- rep(0, 30)
occ[c(2, 5, 8, 11, 14, 17, 20, 23)] <- 1
res <- Composite(data = f1, occ = occ)
filled.contour(res$composite[, , 1])
occ <- rep(0, 30)
occ[c(3, 9, 15, 21)] <- 1
res <- Composite(data = f1, occ = occ)
filled.contour(res$composite[, , 1])
# Example with one missing composite in occ:
data <- 1:(4 * 5 * 6)
dim(data) <- c(lon = 4, lat = 5, case = 6)
occ <- c(1, 1, 2, 2, 3, 3)
res <- Composite(data, occ, time_dim = 'case', K = 4)
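Since the 'lag' parameter shifts the composited time steps, a minimal sketch reusing 'f1' from the example above (the occurrence vector is redefined for clarity):
# A minimal sketch of lagged compositing (reusing 'f1' from above)
occ <- rep(0, 30)
occ[c(2, 5, 8, 11, 14, 17, 20, 23)] <- 1
# With lag = 1, the fields one time step after each occurrence are composited
res_lag <- Composite(data = f1, occ = occ, lag = 1)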
Given a pair of dataset name and variable name, this function applies all the matching entries found in the corresponding configuration table to work out the dataset main path, file path, actual name of the variable inside NetCDF files, ...
ConfigApplyMatchingEntries(
  configuration,
  var,
  exp = NULL,
  obs = NULL,
  show_entries = FALSE,
  show_result = TRUE
)
configuration |
Configuration object obtained from ConfigFileOpen() or ConfigFileCreate(). |
var |
Name of the variable to load. Will be interpreted as a string, regular expressions do not apply here. Examples: 'tas' or 'tasmax_q90'. |
exp |
Set of experimental dataset identifiers. Will be interpreted as strings, regular expressions do not apply here. Can be NULL (not to check in experimental dataset tables), and takes by default NULL. Examples: c('EnsEcmwfSeas', 'EnsUkmoSeas'), c('i00k'). |
obs |
Set of observational dataset identifiers. Will be interpreted as strings, regular expressions do not apply here. Can be NULL (not to check in observational dataset tables), and takes by default NULL. Examples: c('GLORYS', 'ERAint'), c('NCEP'). |
show_entries |
Flag to stipulate whether to show the found matching entries for all datasets and variable name. |
show_result |
Flag to stipulate whether to show the result of applying all the matching entries (dataset main path, file path, ...). |
A list with the information resulting of applying the matching entries is returned.
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
s2dv::ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment2" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 1,
                 var_name = ".*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Now apply matching entries for variable and experiment name and show the
# result
match_info <- ConfigApplyMatchingEntries(configuration, 'tas',
              exp = c('ExampleExperiment2'), show_result = TRUE)
These functions help in adding, modifying or removing variable definitions
in a configuration object obtained with ConfigFileOpen or ConfigFileCreate.
ConfigEditDefinition() will add the definition if it does not exist.
ConfigEditDefinition(configuration, name, value, confirm = TRUE)

ConfigRemoveDefinition(configuration, name)
configuration |
Configuration object obtained with ConfigFileOpen() or ConfigFileCreate(). |
name |
Name of the variable to add/modify/remove. |
value |
Value to associate to the variable. |
confirm |
Flag to stipulate whether to ask for confirmation if the variable is being modified. Takes by default TRUE. |
A modified configuration object is returned.
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment2" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 1,
                 var_name = ".*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Now apply matching entries for variable and experiment name and show the
# result
match_info <- ConfigApplyMatchingEntries(configuration, 'tas',
              exp = c('ExampleExperiment2'), show_result = TRUE)
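The shared example above does not actually call the functions documented here. A minimal sketch of editing and removing a definition, reusing the 'configuration' object from the example; the definition name is just an illustrative target:
# A minimal sketch (reusing 'configuration' from the example above);
# the chosen definition name is only an illustrative target
configuration <- ConfigEditDefinition(configuration, "DEFAULT_VAR_MIN",
                                      "-1e19", confirm = FALSE)
ConfigShowDefinitions(configuration)
configuration <- ConfigRemoveDefinition(configuration, "DEFAULT_VAR_MIN")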
ConfigAddEntry(), ConfigEditEntry() and ConfigRemoveEntry() are functions
to manage entries in a configuration object created with ConfigFileOpen().
Before adding an entry, make sure the defaults don't already do what you
want (ConfigShowDefinitions(), ConfigShowTable()).
Before adding an entry, make sure it doesn't override and spoil what other
entries do (ConfigShowTable(), ConfigFileOpen()).
Before adding an entry, make sure there aren't other entries that already
do what you want (ConfigShowSimilarEntries()).
ConfigEditEntry(
  configuration,
  dataset_type,
  position,
  dataset_name = NULL,
  var_name = NULL,
  main_path = NULL,
  file_path = NULL,
  nc_var_name = NULL,
  suffix = NULL,
  varmin = NULL,
  varmax = NULL
)

ConfigAddEntry(
  configuration,
  dataset_type,
  position = "last",
  dataset_name = ".*",
  var_name = ".*",
  main_path = "*",
  file_path = "*",
  nc_var_name = "*",
  suffix = "*",
  varmin = "*",
  varmax = "*"
)

ConfigRemoveEntry(
  configuration,
  dataset_type,
  dataset_name = NULL,
  var_name = NULL,
  position = NULL
)
configuration |
Configuration object obtained via ConfigFileOpen() or ConfigFileCreate() that will be modified accordingly. |
dataset_type |
Whether to modify a table of experimental datasets or a table of observational datasets. Can take values 'experiments' or 'observations' respectively. |
position |
'position' tells the index in the table of the entry to edit or remove. Use ConfigShowTable() to see the index of the entry. In ConfigAddEntry() it can also take the value "last" (default), that will put the entry at the end of the corresponding level, or "first" at the beginning. See ?ConfigFileOpen for more information. If 'dataset_name' and 'var_name' are specified this argument is ignored in ConfigRemoveEntry(). |
dataset_name, var_name, main_path, file_path, nc_var_name, suffix, varmin, varmax |
These parameters tell the dataset name, variable name, main path, ..., of the entry to add, edit or remove. |
The function returns an accordingly modified configuration object. To apply the changes in the configuration file it must be saved using ConfigFileSave().
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment", "ExampleVariable",
                 "/path/to/ExampleExperiment/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Add another entry
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit second entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 2,
                 var_name = ".*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Remove first entry
configuration <- ConfigRemoveEntry(configuration, "experiments",
                 "ExampleExperiment", "ExampleVariable")
# Show results
ConfigShowTable(configuration, "experiments")
# Save the configuration
ConfigFileSave(configuration, config_file, confirm = FALSE)
These functions help in creating, opening and saving configuration files.
ConfigFileOpen(file_path, silent = FALSE, stop = FALSE)

ConfigFileCreate(file_path, confirm = TRUE)

ConfigFileSave(configuration, file_path, confirm = TRUE)
file_path |
Path to the configuration file to create/open/save. |
silent |
Flag to activate or deactivate verbose mode. Defaults to FALSE (verbose mode on). |
stop |
TRUE/FALSE whether to raise an error if not all the mandatory default variables are defined in the configuration file. |
confirm |
Flag to stipulate whether to ask for confirmation when
saving a configuration file that already exists. |
configuration |
Configuration object to save in a file. |
ConfigFileOpen() loads all the data contained in the configuration file
specified as parameter 'file_path'.
Returns a configuration object with the variables needed for the
configuration file mechanism to work.
This function is called from inside the Load() function to load the
configuration file specified in 'configfile'.
ConfigFileCreate() creates an empty configuration file and saves it to
the specified path. It may be opened later with ConfigFileOpen() to be edited.
Some default values are set when creating a file with this function; you
can check these with ConfigShowDefinitions().
ConfigFileSave() saves a configuration object into a file, which may then
be used from Load().
Two examples of configuration files can be found inside the 'inst/config/'
folder in the package:
BSC.conf: configuration file used at BSC-CNS. Contains location data on several datasets and variables.
template.conf: very simple configuration file intended to be used as pattern when starting from scratch.
How the configuration file works:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It contains one list and two tables.
Each of these have a header that starts with '!!'. These are key lines and
should not be removed or reordered.
Lines starting with '#' and blank lines will be ignored.
The list should contain variable definitions and default value definitions.
The first table contains information about experiments.
The second table contains information about observations.
Each table entry is a list of comma-separated elements.
The first two elements form a key that is associated with a value formed by
the other elements.
The key elements are a dataset identifier and a variable name.
The value elements are the dataset main path, dataset file path, the
variable name inside the .nc file, a default suffix (explained below) and
minimum and maximum values beyond which loaded data is deactivated.
Given a dataset name and a variable name, a full path is obtained
concatenating the main path and the file path.
Also the nc variable name, the suffixes and the limit values are obtained.
Any of the elements in the keys can contain regular expressions[1] that will
cause matching for sets of dataset names or variable names.
The dataset path and file path can contain shell globbing expressions[2]
that will cause matching for sets of paths when fetching the file in the
full path.
The full path can point to an OPeNDAP URL.
Any of the elements in the value can contain variables that will be replaced
by an associated string.
Variables can be defined only in the list at the top of the file.
The pattern of a variable definition is
VARIABLE_NAME = VARIABLE_VALUE
and can be accessed from within the table values or from within the variable
values as
$VARIABLE_NAME$
For example:
FILE_NAME = tos.nc
!!table of experiments
ecmwf, tos, /path/to/dataset/, $FILE_NAME$
There are some reserved variables that will offer information about the
store frequency, the current startdate Load() is fetching, etc:
$VAR_NAME$, $START_DATE$, $STORE_FREQ$, $MEMBER_NUMBER$
for experiments only: $EXP_NAME$
for observations only: $OBS_NAME$, $YEAR$, $MONTH$, $DAY$
Additionally, from an element in an entry value you can access the other
elements of the entry as:
$EXP_MAIN_PATH$, $EXP_FILE_PATH$,
$VAR_NAME$, $SUFFIX$, $VAR_MIN$, $VAR_MAX$
The variable $SUFFIX$ is useful because it can be used to take part in the
main or file path. For example: '/path/to$SUFFIX$/dataset/'.
It will be replaced by the value in the column that corresponds to the
suffix unless the user specifies a different suffix via the parameter
'suffixexp' or 'suffixobs'.
This way the user is able to load two variables with the same name in the
same dataset but with slight modifications, with a suffix anywhere in the
path to the data that indicates this modification.
The entries in a table will be grouped in 4 levels of specificity:
General entries:
- the key dataset name and variable name are both a regular expression
matching any sequence of characters (.*) that will cause matching for any
pair of dataset and variable names
Example: .*, .*, /dataset/main/path/, file/path, nc_var_name, suffix,
var_min, var_max
Dataset entries:
- the key variable name matches any sequence of characters
Example: ecmwf, .*, /dataset/main/path/, file/path, nc_var_name,
suffix, var_min, var_max
Variable entries:
- the key dataset name matches any sequence of characters
Example: .*, tos, /dataset/main/path/, file/path, nc_var_name,
suffix, var_min, var_max
Specific entries:
- both key values are specified
Example: ecmwf, tos, /dataset/main/path/, file/path, nc_var_name,
suffix, var_min, var_max
Given a pair of dataset name and variable name for which we want to know the
full path, all the rules that match will be applied from more general to
more specific.
If there is more than one entry per group that match a given key pair,
these will be applied in the order of appearance in the configuration file
(top to bottom).
An asterisk (*) in any value element will be interpreted as 'leave it as is
or take the default value if not yet defined'.
The default values are defined in the following reserved variables:
$DEFAULT_EXP_MAIN_PATH$, $DEFAULT_EXP_FILE_PATH$, $DEFAULT_NC_VAR_NAME$,
$DEFAULT_OBS_MAIN_PATH$, $DEFAULT_OBS_FILE_PATH$, $DEFAULT_SUFFIX$,
$DEFAULT_VAR_MIN$, $DEFAULT_VAR_MAX$,
$DEFAULT_DIM_NAME_LATITUDES$, $DEFAULT_DIM_NAME_LONGITUDES$,
$DEFAULT_DIM_NAME_MEMBERS$
Trailing asterisks in an entry are not mandatory. For example
ecmwf, .*, /dataset/main/path/, *, *, *, *, *
will have the same effect as
ecmwf, .*, /dataset/main/path/
A double quote only (") in any key or value element will be interpreted as
'fill in with the same value as the entry above'.
ConfigFileOpen() returns a configuration object with all the information for
the configuration file mechanism to work.
ConfigFileSave() returns TRUE if the file has been saved and FALSE otherwise.
ConfigFileCreate() returns nothing.
[1] https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html
[2] https://tldp.org/LDP/abs/html/globbingref.html
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment2" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 1,
                 var_name = ".*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Now apply matching entries for variable and experiment name and show the
# result
match_info <- ConfigApplyMatchingEntries(configuration, 'tas',
              exp = c('ExampleExperiment2'), show_result = TRUE)
# Finally save the configuration file.
ConfigFileSave(configuration, config_file, confirm = FALSE)
These functions help in finding similar entries in tables of supported
datasets by comparing all entries with some given information.
This is useful when dealing with complex configuration files, when one is
not sure whether certain variables or datasets are already supported.
At least one field must be provided in ConfigShowSimilarEntries().
Other fields can be unspecified and won't be taken into account. If more
than one field is provided, sameness is averaged over all provided fields
and entries are sorted from higher average to lower.
ConfigShowSimilarEntries(
  configuration,
  dataset_name = NULL,
  var_name = NULL,
  main_path = NULL,
  file_path = NULL,
  nc_var_name = NULL,
  suffix = NULL,
  varmin = NULL,
  varmax = NULL,
  n_results = 10
)
configuration |
Configuration object obtained either from ConfigFileCreate() or ConfigFileOpen(). |
dataset_name |
Optional dataset name to look for similar entries of. |
var_name |
Optional variable name to look for similar entries of. |
main_path |
Optional main path to look for similar entries of. |
file_path |
Optional file path to look for similar entries of. |
nc_var_name |
Optional variable name inside NetCDF file to look for similar entries of. |
suffix |
Optional suffix to look for similar entries of. |
varmin |
Optional variable minimum to look for similar entries of. |
varmax |
Optional variable maximum to look for similar entries of. |
n_results |
Only the top 'n_results' most similar results will be shown. Defaults to 10 in ConfigShowSimilarEntries() and to 5 in ConfigShowSimilarVars(). |
Sameness is calculated with string distances as specified by Simon White in [1].
These functions return information about the found matches.
[1] Simon White, string similarity: http://www.catalysoft.com/articles/StrikeAMatch.html
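For reference, the letter-pair similarity described in [1] can be sketched in a few lines of R; this is an illustration of the metric (ignoring repeated pairs, for simplicity), not the internal implementation of the package:
# A sketch of the letter-pair similarity from [1] (illustrative only)
letter_pairs <- function(s) {
  s <- toupper(s)
  if (nchar(s) < 2) return(character(0))
  substring(s, 1:(nchar(s) - 1), 2:nchar(s))  # all adjacent letter pairs
}
similarity <- function(a, b) {
  pa <- letter_pairs(a)
  pb <- letter_pairs(b)
  # twice the number of shared pairs over the total number of pairs
  2 * length(intersect(pa, pb)) / (length(pa) + length(pb))
}
similarity("ExampleExperiment2", "Exper")  # higher values mean more alike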
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment2" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 1,
                 var_name = "Var.*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Look for similar entries
ConfigShowSimilarEntries(configuration, dataset_name = "Exper",
                         var_name = "Vari")
These functions show the tables of supported datasets and definitions in a configuration object obtained via ConfigFileCreate() or ConfigFileOpen().
ConfigShowTable(configuration, dataset_type, line_numbers = NULL)

ConfigShowDefinitions(configuration)
configuration |
Configuration object obtained from ConfigFileCreate() or ConfigFileOpen(). |
dataset_type |
In ConfigShowTable(), 'dataset_type' tells whether the table to show is of experimental datasets or of observational datasets. Can take values 'experiments' or 'observations'. |
line_numbers |
'line_numbers' is an optional vector of numbers as long as the number of entries in the specified table. Intended for internal use. |
These functions return nothing.
ConfigApplyMatchingEntries, ConfigEditDefinition, ConfigEditEntry, ConfigFileOpen, ConfigShowSimilarEntries, ConfigShowTable
# Create an empty configuration file
config_file <- paste0(tempdir(), "/example.conf")
ConfigFileCreate(config_file, confirm = FALSE)
# Open it into a configuration object
configuration <- ConfigFileOpen(config_file)
# Add an entry at the bottom of 4th level of file-per-startdate experiments
# table which will associate the experiment "ExampleExperiment2" and variable
# "ExampleVariable" to some information about its location.
configuration <- ConfigAddEntry(configuration, "experiments", "last",
                 "ExampleExperiment2", "ExampleVariable",
                 "/path/to/ExampleExperiment2/",
                 "ExampleVariable/ExampleVariable_$START_DATE$.nc")
# Edit entry to generalize for any variable.
configuration <- ConfigEditEntry(configuration, "experiments", 1,
                 var_name = ".*",
                 file_path = "$VAR_NAME$/$VAR_NAME$_$START_DATE$.nc")
# Show tables, lists and definitions
ConfigShowTable(configuration, 'experiments')
ConfigShowDefinitions(configuration)
Compute the linear trend of a time series by least-squares fitting, together
with the associated error interval, for both the observational and model data.
The 95% confidence interval and the detrended observational and model data are
also provided.
The function does not compute the ensemble mean, so if the input data have a
member dimension, the ensemble mean needs to be computed beforehand.
Consist_Trend(
  exp,
  obs,
  dat_dim = "dataset",
  time_dim = "sdate",
  interval = 1,
  ncores = NULL
)
exp |
A named numeric array of experimental data, with at least two dimensions 'time_dim' and 'dat_dim'. |
obs |
A named numeric array of observational data, same dimensions as parameter 'exp' except along 'dat_dim'. |
dat_dim |
A character string indicating the name of the dataset dimension. If data at some point of 'time_dim' are not complete along 'dat_dim' in both 'exp' and 'obs', this point in all 'dat_dim' will be discarded. The default value is 'dataset'. |
time_dim |
A character string indicating the name of dimension along which the trend is computed. The default value is 'sdate'. |
interval |
A positive numeric indicating the unit length between two points along 'time_dim' dimension. The default value is 1. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
$trend |
A numeric array of the trend coefficients of model and observational data with dimensions c(stats = 2, nexp + nobs, the rest of the dimensions of 'exp' and 'obs' except time_dim), where 'nexp' is the length of 'dat_dim' in 'exp' and 'nobs' is the length of 'dat_dim' in 'obs'. The 'stats' dimension contains the intercept and the slope. |
$conf.lower |
A numeric array of the lower limit of 95% confidence interval with dimensions same as $trend. The 'stats' dimension contains the lower confidence level of the intercept and the slope. |
$conf.upper |
A numeric array of the upper limit of 95% confidence interval with dimensions same as $trend. The 'stats' dimension contains the upper confidence level of the intercept and the slope. |
$detrended_exp |
A numeric array of the detrended model data with the same dimensions as 'exp'. |
$detrended_obs |
A numeric array of the detrended observational data with the same dimensions as 'obs'. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
runmean_months <- 12
smooth_ano_exp <- Smoothing(ano_exp, runmeanlen = runmean_months)
smooth_ano_obs <- Smoothing(ano_obs, runmeanlen = runmean_months)
dim_to_mean <- 'member'  # average along members
years_between_startdates <- 5
trend <- Consist_Trend(MeanDims(smooth_ano_exp, dim_to_mean, na.rm = TRUE),
                       MeanDims(smooth_ano_obs, dim_to_mean, na.rm = TRUE),
                       interval = years_between_startdates)
# Bind data for plotting
trend_bind <- abind::abind(trend$conf.lower[2, , ], trend$trend[2, , ],
                           trend$conf.upper[2, , ], trend$trend[1, , ],
                           along = 0)
trend_bind <- Reorder(trend_bind, c(2, 1, 3))
PlotVsLTime(trend_bind, toptitle = "trend", ytitle = "K/(5 years)",
            monini = 11, limits = c(-0.8, 0.8), listexp = c('CMIP5 IC3'),
            listobs = c('ERSST'), biglab = FALSE, hlines = c(0))
PlotAno(InsertDim(trend$detrended_exp, 2, 1), InsertDim(trend$detrended_obs, 2, 1),
        startDates, "Detrended tos anomalies", ytitle = 'K',
        legends = 'ERSST', biglab = FALSE)
Calculate the correlation coefficient (Pearson, Kendall or Spearman) for
an array of forecasts and an array of observations. The correlations are
computed along 'time_dim', which usually refers to the start date dimension. If
'comp_dim' is given, the correlations are computed only if obs along the comp_dim
dimension are complete between limits[1] and limits[2], i.e., there is no NA
between limits[1] and limits[2]. This option can be activated if the user
wants to account only for the forecasts for which the corresponding observations
are available at all leadtimes.
The confidence interval is computed by the Fisher transformation and the
significance level relies on a one-sided Student's t-distribution.
The function can calculate the ensemble mean before computing the correlation if
'memb_dim' is specified and 'memb = FALSE'. If the ensemble mean is not
calculated, the correlation is calculated for each member.
If there is only one dataset for exp and obs, you can simply use cor() to
compute the correlation.
Corr(
  exp,
  obs,
  time_dim = "sdate",
  dat_dim = NULL,
  comp_dim = NULL,
  limits = NULL,
  method = "pearson",
  memb_dim = NULL,
  memb = TRUE,
  pval = TRUE,
  conf = TRUE,
  sign = FALSE,
  alpha = 0.05,
  ncores = NULL
)
exp |
A named numeric array of experimental data, with at least dimension 'time_dim'. |
obs |
A named numeric array of observational data, same dimensions as parameter 'exp' except along 'dat_dim' and 'memb_dim'. |
time_dim |
A character string indicating the name of dimension along which the correlations are computed. The default value is 'sdate'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL (no dataset). |
comp_dim |
A character string indicating the name of dimension along which obs is taken into account only if it is complete. The default value is NULL. |
limits |
A vector of two integers indicating the range along comp_dim to be completed. The default is c(1, length(comp_dim dimension)). |
method |
A character string indicating the type of correlation: 'pearson', 'spearman', or 'kendall'. The default value is 'pearson'. |
memb_dim |
A character string indicating the name of the member dimension. It must be one dimension in 'exp' and 'obs'. If there is no member dimension, set NULL. The default value is NULL. |
memb |
A logical value indicating whether to retain the 'memb_dim' dimension (TRUE) or to compute the ensemble mean over 'memb_dim' (FALSE). Only functional when 'memb_dim' is not NULL. The default value is TRUE. |
pval |
A logical value indicating whether to return or not the p-value of the test Ho: Corr = 0. The default value is TRUE. |
conf |
A logical value indicating whether to return or not the confidence intervals. The default value is TRUE. |
sign |
A logical value indicating whether to retrieve the statistical significance of the test Ho: Corr = 0 based on 'alpha'. The default value is FALSE. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimension:
c(nexp, nobs, exp_memb, obs_memb, all other dimensions of exp except
time_dim and memb_dim).
nexp is the number of experiments (i.e., 'dat_dim' in exp), and nobs is the
number of observations (i.e., 'dat_dim' in obs). If dat_dim is NULL, nexp and
nobs are omitted. exp_memb is the number of members in the experiment (i.e.,
'memb_dim' in exp) and obs_memb is the number of members in the observation
(i.e., 'memb_dim' in obs). If memb = FALSE, exp_memb and obs_memb are omitted.
$corr |
The correlation coefficient. |
$p.val |
The p-value. Only present if 'pval = TRUE'. |
$conf.lower |
The lower confidence interval. Only present if 'conf = TRUE'. |
$conf.upper |
The upper confidence interval. Only present if 'conf = TRUE'. |
$sign |
The statistical significance. Only present if 'sign = TRUE'. |
# Case 1: Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
runmean_months <- 12
# Smooth along lead-times
smooth_ano_exp <- Smoothing(ano_exp, runmeanlen = runmean_months)
smooth_ano_obs <- Smoothing(ano_obs, runmeanlen = runmean_months)
required_complete_row <- 3  # Discard start dates which contain any NA lead-times
leadtimes_per_startdate <- 60
corr <- Corr(MeanDims(smooth_ano_exp, 'member'),
             MeanDims(smooth_ano_obs, 'member'),
             comp_dim = 'ftime', dat_dim = 'dataset',
             limits = c(ceiling((runmean_months + 1) / 2),
                        leadtimes_per_startdate - floor(runmean_months / 2)))
# Case 2: Keep member dimension
corr <- Corr(smooth_ano_exp, smooth_ano_obs, memb_dim = 'member',
             dat_dim = 'dataset')
# ensemble mean
corr <- Corr(smooth_ano_exp, smooth_ano_obs, memb_dim = 'member',
             memb = FALSE, dat_dim = 'dataset')
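To make the dimension ordering described above concrete, a small sketch with synthetic data; the dimension names and sizes are invented:
# A sketch of the output shape with synthetic data (sizes are illustrative)
exp <- array(rnorm(240), dim = c(dataset = 2, member = 3, sdate = 10, lat = 4))
obs <- array(rnorm(120), dim = c(dataset = 1, member = 3, sdate = 10, lat = 4))
res <- Corr(exp, obs, dat_dim = 'dataset', memb_dim = 'member')
dim(res$corr)  # expected: nexp = 2, nobs = 1, exp_memb = 3, obs_memb = 3, lat = 4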
The Continuous Ranked Probability Score (CRPS; Wilks, 2011) is the continuous version of the Ranked Probability Score (RPS; Wilks, 2011). It is a skill metric to evaluate the full distribution of probabilistic forecasts. It has a negative orientation (i.e., the higher the forecast quality, the smaller the CRPS) and it rewards forecasts with probability concentrated around the observed value. In the case of a deterministic forecast, the CRPS reduces to the mean absolute error. It has the same units as the data. The function is based on enscrps_cpp from SpecsVerification. If there is more than one dataset, CRPS will be computed for each pair of exp and obs data.
CRPS(
  exp,
  obs,
  time_dim = "sdate",
  memb_dim = "member",
  dat_dim = NULL,
  Fair = FALSE,
  return_mean = TRUE,
  ncores = NULL
)
exp |
A named numerical array of the forecast with at least time dimension. |
obs |
A named numerical array of the observation with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast. The default value is 'member'. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
Fair |
A logical indicating whether to compute the FairCRPS (the potential CRPS that the forecast would have with an infinite ensemble size). The default value is FALSE. |
return_mean |
A logical indicating whether to return the temporal mean of the CRPS or not. If TRUE, the temporal mean is calculated along time_dim, if FALSE the time dimension is not aggregated. The default is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of CRPS with dimensions c(nexp, nobs, the rest of the dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the number of experiments (i.e., the length of dat_dim in 'exp'), and nobs is the number of observations (i.e., the length of dat_dim in 'obs'). If dat_dim is NULL, nexp and nobs are omitted.
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
exp <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(lat = 3, lon = 2, sdate = 50))
res <- CRPS(exp = exp, obs = obs)
The Continuous Ranked Probability Skill Score (CRPSS; Wilks, 2011) is the skill score based on the Continuous Ranked Probability Score (CRPS; Wilks, 2011). It can be used to assess whether a forecast presents an improvement or a worsening with respect to a reference forecast. The CRPSS ranges between minus infinity and 1. If the CRPSS is positive, it indicates that the forecast has higher skill than the reference forecast, while a negative value means that it has lower skill. Examples of reference forecasts are the climatological forecast, persistence, a previous model version, or another model. It is computed as CRPSS = 1 - CRPS_exp / CRPS_ref. The statistical significance is obtained based on a Random Walk test at the specified confidence level (DelSole and Tippett, 2016).
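The skill-score step itself can be sketched directly from CRPS values; a minimal illustration with synthetic data (assuming the CRPS() function described above):
# Minimal sketch of CRPSS = 1 - CRPS_exp / CRPS_ref with synthetic data
set.seed(1)
exp <- array(rnorm(300), dim = c(member = 10, sdate = 30))
obs <- array(rnorm(30), dim = c(sdate = 30))
ref <- array(rnorm(300), dim = c(member = 10, sdate = 30))
crpss_manual <- 1 - mean(CRPS(exp = exp, obs = obs)) / mean(CRPS(exp = ref, obs = obs))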
CRPSS( exp, obs, ref = NULL, time_dim = "sdate", memb_dim = "member", dat_dim = NULL, Fair = FALSE, clim.cross.val = TRUE, sig_method.type = "two.sided.approx", alpha = 0.05, N.eff = NA, ncores = NULL )
exp |
A named numerical array of the forecast with at least time dimension. |
obs |
A named numerical array of the observation with at least time dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. |
ref |
A named numerical array of the reference forecast data with at least time and member dimension. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have dataset dimension. If there is corresponding reference for each experiment, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the climatological forecast is used as reference forecast. To build the climatological forecast, the observed values along the whole time period are used as different members for all time steps. The parameter 'clim.cross.val' controls whether to build it using cross-validation. The default value is NULL. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast and the reference forecast. The default value is 'member'. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
Fair |
A logical indicating whether to compute the FairCRPSS (the potential CRPSS that the forecast would have with an infinite ensemble size). The default value is FALSE. |
clim.cross.val |
A logical indicating whether to build the climatological forecast in cross-validation (i.e. excluding the observed value of the time step when building the probabilistic distribution function for that particular time step). Only used if 'ref' is NULL. The default value is TRUE. |
sig_method.type |
A character string indicating the test type of the significance method. Check RandomWalkTest() parameter 'test.type' for details. The default value is 'two.sided.approx'. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
N.eff |
Effective sample size to be used in the statistical significance test. It can be NA (in which case it is computed with s2dv:::.Eno), FALSE (in which case the length of 'obs' along 'time_dim' is used, so the autocorrelation is not taken into account), a numeric (which is used for all cases), or an array with the same dimensions as 'obs' except 'time_dim' (for a particular N.eff to be used for each case). The default value is NA. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
$crpss |
A numerical array of the CRPSS with dimensions c(nexp, nobs, the rest of the dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the number of experiments (i.e., the length of dat_dim in 'exp'), and nobs is the number of observations (i.e., the length of dat_dim in 'obs'). If 'dat_dim' is NULL, nexp and nobs are omitted. |
$sign |
A logical array of the statistical significance of the CRPSS with the same dimensions as $crpss. |
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
DelSole and Tippett, 2016; https://doi.org/10.1175/MWR-D-15-0218.1
exp <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(lat = 3, lon = 2, sdate = 50))
ref <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
res <- CRPSS(exp = exp, obs = obs)             # climatology as reference forecast
res <- CRPSS(exp = exp, obs = obs, ref = ref)  # 'ref' as reference forecast
Compute the correlation difference between two deterministic forecasts. Positive values of the correlation difference indicate that the forecast is more skillful than the reference forecast, while negative values mean that the reference forecast is more skillful. The statistical significance of the correlation differences is computed with a one-sided or two-sided test for equality of dependent correlation coefficients (Steiger, 1980; Siegert et al., 2017) using effective degrees of freedom to account for the autocorrelation of the time series (Zwiers and von Storch, 1995).
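The quantity being tested is simply the difference between the two correlations with the observations; a minimal sketch for one grid point (synthetic vectors along time):
set.seed(1)
obs <- rnorm(50)
exp_em <- obs + rnorm(50)          # ensemble mean of the experiment
ref_em <- obs + rnorm(50, sd = 2)  # ensemble mean of the reference
diff_corr <- cor(exp_em, obs) - cor(ref_em, obs)  # positive: 'exp' more skillful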
DiffCorr( exp, obs, ref, N.eff = NA, time_dim = "sdate", memb_dim = NULL, method = "pearson", alpha = 0.05, handle.na = "return.na", test.type = "two-sided", pval = TRUE, sign = FALSE, ncores = NULL )
exp |
A named numerical array of the forecast data with at least time dimension. |
obs |
A named numerical array with the observations with at least time dimension. The dimensions must be the same as "exp" except 'memb_dim'. |
ref |
A named numerical array of the reference forecast data with at least time dimension. The dimensions must be the same as "exp" except 'memb_dim'. |
N.eff |
Effective sample size to be used in the statistical significance test. It can be NA (in which case it is computed with s2dv:::.Eno), a numeric (which is used for all cases), or an array with the same dimensions as 'obs' except 'time_dim' (for a particular N.eff to be used for each case). The default value is NA. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean of the forecast and reference forecast. If it is NULL (default), the ensemble mean should be provided directly to the function. |
method |
A character string indicating the correlation coefficient to be computed ("pearson" or "spearman"). The default value is "pearson". |
alpha |
A numeric of the significance level to be used in the statistical significance test (output "sign"). The default value is 0.05. |
handle.na |
A character string indicating how to handle missing values. If "return.na", NAs will be returned for the cases that contain at least one NA in "exp", "ref", or "obs". If "only.complete.triplets", only the time steps with no missing values in all "exp", "ref", and "obs" will be used. If "na.fail", an error will arise if any of "exp", "ref", or "obs" contains any NA. The default value is "return.na". |
test.type |
A character string indicating the type of significance test. It can be "two-sided" (to assess whether the skill of "exp" and "ref" are significantly different) or "one-sided" (to assess whether the skill of "exp" is significantly higher than that of "ref") following Steiger (1980). The default value is "two-sided". |
pval |
A logical value indicating whether to return the p-value of the significance test Ho: DiffCorr = 0. The default value is TRUE. |
sign |
A logical value indicating whether to return the statistical significance of the test Ho: DiffCorr = 0 based on 'alpha'. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list with:
$diff.corr |
A numerical array of the correlation differences with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). |
$sign |
A logical array of the statistical significance of the correlation differences with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). Returned only if "sign" is TRUE. |
$p.val |
A numeric array of the p-values with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). Returned only if "pval" is TRUE. |
Steiger, 1980; https://content.apa.org/doi/10.1037/0033-2909.87.2.245
Siegert et al., 2017; https://doi.org/10.1175/MWR-D-16-0037.1
Zwiers and von Storch, 1995; https://doi.org/10.1175/1520-0442(1995)008<0336:TSCIAI>2.0.CO;2
exp <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(lat = 3, lon = 2, sdate = 50))
ref <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
res_two.sided_sign <- DiffCorr(exp, obs, ref, memb_dim = 'member',
                               test.type = 'two-sided', alpha = 0.05)
res_one.sided_pval <- DiffCorr(exp, obs, ref, memb_dim = 'member',
                               test.type = 'one-sided', alpha = NULL)
Compute the number of effective samples along one dimension of an array. This
effective number of independent observations can be used in
statistical/inference tests.
The calculation is based on the 'eno' function by Caio Coelho from rclim.txt.
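For intuition, a common first-order approximation scales the sample size by the lag-1 autocorrelation (Zwiers and von Storch, 1995); Eno() itself uses an estimate based on more lags, so this is only a sketch:
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 100))  # autocorrelated series
r1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]       # lag-1 autocorrelation
n_eff <- length(x) * (1 - r1) / (1 + r1)             # effective sample size
n_eff  # noticeably smaller than 100 for positive autocorrelation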
Eno(data, time_dim = "sdate", na.action = na.pass, ncores = NULL)
data |
A numeric array with named dimensions. |
time_dim |
A character string indicating the dimension along which to compute the effective sample size. The default value is 'sdate'. |
na.action |
A function. It can be na.pass (missing values are allowed) or na.fail (no missing values are allowed). See details in stats::acf(). The default value is na.pass. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
An array with the same dimensions as parameter 'data' except the time_dim dimension, which is removed after the computation. The array indicates the effective sample size along time_dim.
set.seed(1)
data <- array(rnorm(800), dim = c(dataset = 1, member = 2, sdate = 4,
                                  ftime = 4, lat = 10, lon = 10))
na <- floor(runif(40, min = 1, max = 800))
data[na] <- NA
res <- Eno(data)
Perform an area-weighted EOF analysis using singular value decomposition (SVD), based on a covariance matrix, or on a correlation matrix if parameter 'corr' is set to TRUE.
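The core of the computation can be sketched with base R. This is a simplified illustration without the NA handling and normalization of EOF(); it assumes the grid-point columns are ordered with latitude varying fastest and applies square-root-of-cosine weights to the anomalies so that the variances are cosine-of-latitude weighted:
set.seed(1)
nt <- 30; nlat <- 8; nlon <- 10
lat <- seq(-70, 70, length.out = nlat)
ano <- matrix(rnorm(nt * nlat * nlon), nrow = nt)     # time x grid points
wght <- rep(sqrt(cos(lat * pi / 180)), times = nlon)  # one weight per grid point
svd_out <- svd(sweep(ano, 2, wght, `*`))              # SVD of the weighted anomalies
eofs <- svd_out$v                                     # spatial patterns (weighted space)
pcs <- svd_out$u %*% diag(svd_out$d)                  # principal components
expl_var <- svd_out$d^2 / sum(svd_out$d^2)            # fraction of variance per mode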
EOF( ano, lat, lon, time_dim = "sdate", space_dim = c("lat", "lon"), neofs = 15, corr = FALSE, ncores = NULL )
ano |
A numerical array of anomalies with named dimensions to calculate EOF. The dimensions must have at least 'time_dim' and 'space_dim'. NAs are allowed, but they must be consistent along time_dim: if one grid point has NAs, all the time steps at this point should be NAs. |
lat |
A vector of the latitudes of 'ano'. |
lon |
A vector of the longitudes of 'ano'. |
time_dim |
A character string indicating the name of the time dimension of 'ano'. The default value is 'sdate'. |
space_dim |
A vector of two character strings. The first is the dimension name of latitude of 'ano' and the second is the dimension name of longitude of 'ano'. The default value is c('lat', 'lon'). |
neofs |
A positive integer of the modes to be kept. The default value is 15. If the time length or the product of the lengths of space_dim is smaller than neofs, neofs is set to the minimum of the three values. |
corr |
A logical value indicating whether to base the analysis on a correlation matrix (TRUE) or on a covariance matrix (FALSE). The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
EOFs |
An array of EOF patterns normalized to 1 (unitless) with dimensions (number of modes, rest of the dimensions of 'ano' except 'time_dim'). Multiplying 'EOFs' by 'PCs' gives the original reconstructed field. |
PCs |
An array of principal components with the units of the original field to the power of 2, with dimensions (time_dim, number of modes, rest of the dimensions of 'ano' except 'space_dim'). 'PCs' already contains the percentage of explained variance, so the original field can be reconstructed by simply multiplying 'EOFs' by 'PCs'. |
var |
An array of the percentage (%) of variance explained by each mode. The dimensions are (number of modes, rest of the dimensions of 'ano' except 'time_dim' and 'space_dim'). |
mask |
An array of the mask with dimensions (space_dim, rest of the dimensions of 'ano' except 'time_dim'). It is made from 'ano', 1 for the positions that 'ano' has value and NA for the positions that 'ano' has NA. It is used to replace NAs with 0s for EOF calculation and mask the result with NAs again after the calculation. |
wght |
An array of the area weighting with dimensions 'space_dim'. It is calculated by the cosine of 'lat' and used to compute the fraction of variance explained by each EOF. |
tot_var |
A number or a numeric array of the total variance explained by all the modes. The dimensions are the same as 'ano' except 'time_dim' and 'space_dim'. |
ProjectField, NAO, PlotBoxWhisker
# This example computes the EOFs along forecast horizons and plots the one
# that explains the greatest amount of variability. The example data has low
# resolution so the result may not be explanatory, but it displays how to
# use this function.
ano <- Ano_CrossValid(sampleData$mod, sampleData$obs)
tmp <- MeanDims(ano$exp, c('dataset', 'member'))
ano <- tmp[1, , , ]
names(dim(ano)) <- names(dim(tmp))[-2]
eof <- EOF(ano, sampleData$lat, sampleData$lon)
## Not run: 
PlotEquiMap(eof$EOFs[1, , ], sampleData$lon, sampleData$lat)
## End(Not run)
Calculate the four main teleconnection indices in the European Atlantic Ocean region: the North Atlantic Oscillation (NAO), the East Atlantic pattern (EA), the East Atlantic/Western Russia pattern (EAWR), and the Scandinavian pattern (SCA). The function REOF() is used for the calculation, and the first four modes are returned.
EuroAtlanticTC( ano, lat, lon, ntrunc = 30, time_dim = "sdate", space_dim = c("lat", "lon"), corr = FALSE, ncores = NULL )
ano |
A numerical array of anomalies with named dimensions to calculate REOF then the four teleconnections. The dimensions must have at least 'time_dim' and 'space_dim', and the data should cover the European Atlantic Ocean area (20N-80N, 90W-60E). |
lat |
A vector of the latitudes of 'ano'. It should be 20N-80N. |
lon |
A vector of the longitudes of 'ano'. It should be 90W-60E. |
ntrunc |
A positive integer of the modes to be kept. The default value is 30. If the time length or the product of the latitude and longitude lengths is less than ntrunc, ntrunc is set to the minimum of the three values. |
time_dim |
A character string indicating the name of the time dimension of 'ano'. The default value is 'sdate'. |
space_dim |
A vector of two character strings. The first is the dimension name of latitude of 'ano' and the second is the dimension name of longitude of 'ano'. The default value is c('lat', 'lon'). |
corr |
A logical value indicating whether to base the analysis on a correlation matrix (TRUE) or on a covariance matrix (FALSE). The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
patterns |
An array of the first four REOF patterns normalized to 1 (unitless) with dimensions (modes = 4, the rest of the dimensions of 'ano' except 'time_dim'). The modes represent NAO, EA, EAWR, and SCA, whose order and sign change depending on the dataset and period employed, so manual reordering may be needed. Multiplying 'patterns' by 'indices' gives the original reconstructed field. |
indices |
An array of the first four principal components with the units of the original field to the power of 2, with dimensions (time_dim, modes = 4, the rest of the dimensions of 'ano' except 'space_dim'). |
var |
An array of the percentage (%) of variance explained by each mode. The dimensions are (modes = ntrunc, the rest of the dimensions of 'ano' except 'time_dim' and 'space_dim'). |
wght |
An array of the area weighting with dimensions 'space_dim'. It is calculated by the square root of the cosine of 'lat' and used to compute the fraction of variance explained by each REOF mode. |
REOF, NAO
# Use synthetic data
set.seed(1)
dat <- array(rnorm(800), dim = c(dat = 2, sdate = 5, lat = 8, lon = 15))
lat <- seq(10, 90, length.out = 8)
lon <- seq(-100, 70, length.out = 15)
res <- EuroAtlanticTC(dat, lat = lat, lon = lon)
Filter out the selected frequency from a time series. The filtering is performed by dichotomy, searching for a frequency around the parameter 'freq' and the phase that maximize the signal to subtract from the time series. The maximization of the signal to subtract relies on a minimization of the mean square differences between the time series ('data') and a cosine of the specified frequency and phase.
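The idea of fitting and removing a cosine of a given frequency can be sketched with a least-squares regression on a cosine/sine pair (this captures the amplitude and phase in one step; Filter() additionally searches around 'freq' by dichotomy):
set.seed(1)
t <- 1:120
x <- sin(2 * pi * t / 12) + rnorm(120, sd = 0.3)  # 12-step periodic signal plus noise
freq <- 1 / 12
fit <- lm(x ~ cos(2 * pi * freq * t) + sin(2 * pi * freq * t))  # amplitude and phase
x_filtered <- residuals(fit) + coef(fit)[1]  # subtract the fitted cosine, keep the mean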
Filter(data, freq, time_dim = "ftime", ncores = NULL)
data |
A numeric vector or array of the data to be filtered. If it's a vector, it should be a time series. If it's an array, the dimensions must have at least 'time_dim'. |
freq |
A number of the frequency to filter. |
time_dim |
A character string indicating the dimension along which to compute the filtering. The default value is 'ftime'. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numeric vector or array of the filtered data with the dimensions the same as 'data'.
# Load sample data as in Load() example:
example(Load)
ensmod <- MeanDims(sampleData$mod, 2)
spectrum <- Spectrum(ensmod)
for (jsdate in 1:dim(spectrum)['sdate']) {
  for (jlen in 1:dim(spectrum)['ftime']) {
    if (spectrum[jlen, 2, 1, jsdate] > spectrum[jlen, 3, 1, jsdate]) {
      ensmod[1, jsdate, ] <- Filter(ensmod[1, jsdate, ],
                                    spectrum[jlen, 1, 1, jsdate])
    }
  }
}
PlotAno(InsertDim(ensmod, 2, 1), sdates = startDates)
Compute probabilistic forecasts from an ensemble based on the relative thresholds, or the probabilistic observations (i.e., which probabilistic category was observed). A reference period can be specified to calculate the absolute thresholds between each probabilistic category. The absolute thresholds can be computed in cross-validation mode. If data is an ensemble, the probabilities are calculated as the percentage of members that fall into each category. For observations (or forecast without member dimension), 1 means that the event happened, while 0 indicates that the event did not happen. Weighted probabilities can be computed if the weights are provided for each ensemble member and time step. The absolute thresholds can also be provided directly for probabilities calculation.
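The unweighted tercile case can be sketched directly: absolute thresholds are taken from the reference period, and member counts give the probabilities (a simplified illustration of what GetProbs() generalizes):
set.seed(1)
data <- array(rnorm(500), dim = c(member = 25, sdate = 20))
thr <- quantile(data, probs = c(1/3, 2/3))  # absolute thresholds from the whole period
probs <- apply(data, 2, function(m) {       # fraction of members per category
  c(mean(m <= thr[1]), mean(m > thr[1] & m <= thr[2]), mean(m > thr[2]))
})
dim(probs)  # 3 categories x 20 start dates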
GetProbs( data, time_dim = "sdate", memb_dim = "member", indices_for_quantiles = NULL, prob_thresholds = c(1/3, 2/3), abs_thresholds = NULL, bin_dim_abs = "bin", weights = NULL, cross.val = FALSE, ncores = NULL )
data |
A named numerical array of the forecasts or observations with, at least, time dimension. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast, or NULL if there is no member dimension (e.g., for observations, or for forecast with only one ensemble member). The default value is 'member'. |
indices_for_quantiles |
A vector of the indices to be taken along 'time_dim' for computing the absolute thresholds between the probabilistic categories. If NULL (default), the whole period is used. It is only used when 'prob_thresholds' is provided. |
prob_thresholds |
A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories. |
abs_thresholds |
A numeric array or vector of the absolute thresholds in the same units as 'data'. If an array is provided, it should have at least the 'bin_dim_abs' dimension; if it has more dimensions, they should match the dimensions of 'data' except the member dimension. The default value is NULL, in which case 'prob_thresholds' is used for calculating the probabilities. |
bin_dim_abs |
A character string of the dimension name of 'abs_thresholds' array in which category limits are stored. It will also be the probabilistic category dimension name in the output. The default value is 'bin'. |
weights |
A named numerical array of the weights for 'data' with dimensions 'time_dim' and 'memb_dim' (if 'data' has them). The default value is NULL. The ensemble should have at least 70 members or span at least 10 time steps and have more than 45 members if consistency between the weighted and unweighted methodologies is desired. |
cross.val |
A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation mode. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of probabilities with dimensions c(bin_dim_abs, the rest of the dimensions of 'data' except 'memb_dim'). The 'bin' dimension has the length of the number of probabilistic categories, i.e., length(prob_thresholds) + 1.
data <- array(rnorm(2000), dim = c(ensemble = 25, sdate = 20, time = 4))
res <- GetProbs(data = data, time_dim = 'sdate', memb_dim = 'ensemble',
                indices_for_quantiles = 4:17)
# abs_thresholds is provided
abs_thr1 <- c(-0.2, 0.3)
abs_thr2 <- array(c(-0.2, 0.3) + rnorm(40) * 0.1, dim = c(cat = 2, sdate = 20))
res1 <- GetProbs(data = data, time_dim = 'sdate', memb_dim = 'ensemble',
                 prob_thresholds = NULL, abs_thresholds = abs_thr1)
res2 <- GetProbs(data = data, time_dim = 'sdate', memb_dim = 'ensemble',
                 prob_thresholds = NULL, abs_thresholds = abs_thr2,
                 bin_dim_abs = 'cat')
The Global Mean Surface Temperature (GMST) anomalies are computed as the weighted average of surface air temperature anomalies over land and sea surface temperature anomalies over the ocean. If different members and/or datasets are provided, the climatology (used to calculate the anomalies) is computed individually for all of them.
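The blending step can be sketched as follows: 'mask_sea_land' selects 'data_tos' at sea points and 'data_tas' elsewhere, and the global mean is cosine-of-latitude weighted. A simplified single-time-step illustration with synthetic values:
set.seed(1)
lat <- c(-45, 0, 45)
tas <- matrix(rnorm(12), nrow = 3)                     # [lat, lon] air temperature
tos <- matrix(rnorm(12), nrow = 3)                     # [lat, lon] sea surface temperature
mask_sea_land <- matrix(rbinom(12, 1, 0.7), nrow = 3)  # 1 = sea (the sea_value)
blended <- ifelse(mask_sea_land == 1, tos, tas)        # tos over sea, tas over land
w <- cos(lat * pi / 180)                               # latitude weights
gmst <- sum(blended * w) / (ncol(blended) * sum(w))    # area-weighted global mean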
GMST( data_tas, data_tos, data_lats, data_lons, mask_sea_land, sea_value, type, mask = NULL, lat_dim = "lat", lon_dim = "lon", monini = 11, fmonth_dim = "fmonth", sdate_dim = "sdate", indices_for_clim = NULL, year_dim = "year", month_dim = "month", na.rm = TRUE, ncores = NULL )
data_tas |
A numerical array with the surface air temperature data to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. The dimensions must be identical to those of data_tos. |
data_tos |
A numerical array with the sea surface temperature data to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. The dimensions must be identical to those of data_tas. |
data_lats |
A numeric vector indicating the latitudes of the data. |
data_lons |
A numeric vector indicating the longitudes of the data. |
mask_sea_land |
An array with dimensions [lat_dim = data_lats, lon_dim = data_lons] for blending 'data_tas' and 'data_tos'. |
sea_value |
A numeric value indicating the sea grid points in 'mask_sea_land'. |
type |
A character string indicating the type of data ('dcpp' for decadal predictions, 'hist' for historical simulations, or 'obs' for observations or reanalyses). |
mask |
An array of a mask (with 0's in the grid points that have to be masked) or NULL (i.e., no mask is used). This parameter allows removing the values over land in case the dataset is a combination of surface air temperature over land and sea surface temperature over the ocean. It can also be used to mask those grid points that are missing in the observational dataset for a fair comparison between the forecast system and the reference dataset. The default value is NULL. |
lat_dim |
A character string of the name of the latitude dimension. The default value is 'lat'. |
lon_dim |
A character string of the name of the longitude dimension. The default value is 'lon'. |
monini |
An integer indicating the month in which the forecast system is initialized. Only used when parameter 'type' is 'dcpp'. The default value is 11, i.e., initialized in November. |
fmonth_dim |
A character string indicating the name of the forecast month dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'fmonth'. |
sdate_dim |
A character string indicating the name of the start date dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'sdate'. |
indices_for_clim |
A numeric vector of the indices of the years to
compute the climatology for calculating the anomalies, or NULL so the
climatology is calculated over the whole period. If the data are already
anomalies, set it to FALSE. The default value is NULL. |
year_dim |
A character string indicating the name of the year dimension. The default value is 'year'. Only used if parameter 'type' is 'hist' or 'obs'. |
month_dim |
A character string indicating the name of the month dimension. The default value is 'month'. Only used if parameter 'type' is 'hist' or 'obs'. |
na.rm |
A logical value indicating whether to remove NA values. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the GMST anomalies with the same dimensions as data_tas except the lat_dim, lon_dim and fmonth_dim (month_dim) in case of decadal predictions (historical simulations or observations). In case of decadal predictions, a new dimension 'fyear' is added.
## Observations or reanalyses
obs_tas <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12))
obs_tos <- array(2:101, dim = c(year = 5, lat = 19, lon = 37, month = 12))
mask_sea_land <- array(c(1, 0, 1), dim = c(lat = 19, lon = 37))
sea_value <- 1
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_obs <- GMST(data_tas = obs_tas, data_tos = obs_tos, data_lats = lat,
                  data_lons = lon, type = 'obs',
                  mask_sea_land = mask_sea_land, sea_value = sea_value)
## Historical simulations
hist_tas <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
hist_tos <- array(2:101, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
index_hist <- GMST(data_tas = hist_tas, data_tos = hist_tos, data_lats = lat,
                   data_lons = lon, type = 'hist',
                   mask_sea_land = mask_sea_land, sea_value = sea_value)
## Decadal predictions
dcpp_tas <- array(1:100, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
dcpp_tos <- array(2:101, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
index_dcpp <- GMST(data_tas = dcpp_tas, data_tos = dcpp_tos, data_lats = lat,
                   data_lons = lon, type = 'dcpp', monini = 1,
                   mask_sea_land = mask_sea_land, sea_value = sea_value)
The Global Surface Air Temperature (GSAT) anomalies are computed as the weighted average of surface air temperature anomalies over the globe. If different members and/or datasets are provided, the climatology (used to calculate the anomalies) is computed individually for all of them.
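A minimal sketch of the two steps, area weighting and anomaly computation over a climatology period (cf. 'indices_for_clim'), on a synthetic [year, lat, lon] field:
set.seed(1)
dat <- array(rnorm(60), dim = c(year = 5, lat = 3, lon = 4))
w <- cos(c(-45, 0, 45) * pi / 180)  # latitude weights
gsat <- apply(dat, 1, function(x) sum(x * w) / (ncol(x) * sum(w)))  # yearly global mean
anom <- gsat - mean(gsat[1:3])  # anomalies w.r.t. years 1-3 as climatology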
GSAT( data, data_lats, data_lons, type, lat_dim = "lat", lon_dim = "lon", mask = NULL, monini = 11, fmonth_dim = "fmonth", sdate_dim = "sdate", indices_for_clim = NULL, year_dim = "year", month_dim = "month", na.rm = TRUE, ncores = NULL )
data |
A numerical array to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. |
data_lats |
A numeric vector indicating the latitudes of the data. |
data_lons |
A numeric vector indicating the longitudes of the data. |
type |
A character string indicating the type of data ('dcpp' for decadal predictions, 'hist' for historical simulations, or 'obs' for observations or reanalyses). |
lat_dim |
A character string of the name of the latitude dimension. The default value is 'lat'. |
lon_dim |
A character string of the name of the longitude dimension. The default value is 'lon'. |
mask |
An array of a mask (with 0's in the grid points that have to be masked) or NULL (i.e., no mask is used). This parameter allows removing the values over land in case the dataset is a combination of surface air temperature over land and sea surface temperature over the ocean. It can also be used to mask those grid points that are missing in the observational dataset for a fair comparison between the forecast system and the reference dataset. The default value is NULL. |
monini |
An integer indicating the month in which the forecast system is initialized. Only used when parameter 'type' is 'dcpp'. The default value is 11, i.e., initialized in November. |
fmonth_dim |
A character string indicating the name of the forecast month dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'fmonth'. |
sdate_dim |
A character string indicating the name of the start date dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'sdate'. |
indices_for_clim |
A numeric vector of the indices of the years to
compute the climatology for calculating the anomalies, or NULL so the
climatology is calculated over the whole period. If the data are already
anomalies, set it to FALSE. The default value is NULL. |
year_dim |
A character string indicating the name of the year dimension. The default value is 'year'. Only used if parameter 'type' is 'hist' or 'obs'. |
month_dim |
A character string indicating the name of the month dimension. The default value is 'month'. Only used if parameter 'type' is 'hist' or 'obs'. |
na.rm |
A logical value indicanting whether to remove NA values. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the GSAT anomalies with the same dimensions as data except the lat_dim, lon_dim and fmonth_dim (month_dim) in case of decadal predictions (historical simulations or observations). In case of decadal predictions, a new dimension 'fyear' is added.
## Observations or reanalyses
obs <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_obs <- GSAT(data = obs, data_lats = lat, data_lons = lon, type = 'obs')
## Historical simulations
hist <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
index_hist <- GSAT(data = hist, data_lats = lat, data_lons = lon, type = 'hist')
## Decadal predictions
dcpp <- array(1:100, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
index_dcpp <- GSAT(data = dcpp, data_lats = lat, data_lons = lon, type = 'dcpp', monini = 1)
Reorganize a long run (typically a historical simulation) with only one start date into chunks corresponding to a set of start dates. The time frequency of the data should be monthly.
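The underlying idea is plain windowing of the long series; a minimal sketch with a monthly ts object (Histo2Hindcast() itself works on named arrays and 'YYYYMMDD'/'YYYYMM' strings):
set.seed(1)
x <- ts(rnorm(120), start = c(1990, 1), frequency = 12)  # one long monthly run
start_years <- c(1990, 1992, 1994)                       # January start dates
chunks <- sapply(start_years,                            # 24 lead times per start date
                 function(y) window(x, start = c(y, 1), end = c(y + 1, 12)))
dim(chunks)  # 24 lead times x 3 start dates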
Histo2Hindcast( data, sdatesin, sdatesout, nleadtimesout, sdate_dim = "sdate", ftime_dim = "ftime", ncores = NULL )
data |
A numeric array of model or observational data with dimensions at least sdate_dim and ftime_dim. |
sdatesin |
A character string of the start date of 'data'. The format should be 'YYYYMMDD' or 'YYYYMM'. |
sdatesout |
A vector of character string indicating the expected start dates of the output. The format should be 'YYYYMMDD' or 'YYYYMM'. |
nleadtimesout |
A positive integer indicating the length of leadtimes of the output. |
sdate_dim |
A character string indicating the name of the start date dimension of 'data'. The default value is 'sdate'. |
ftime_dim |
A character string indicating the name of the lead time dimension of 'data'. The default value is 'ftime'. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numeric array with the same dimensions as 'data', except that the length of sdate_dim equals the number of start dates in 'sdatesout' and the length of ftime_dim equals 'nleadtimesout'.
sdates_out <- c('19901101', '19911101', '19921101', '19931101', '19941101')
leadtimes_per_startdate <- 12
exp_data <- Histo2Hindcast(sampleData$mod, startDates, sdates_out,
                           leadtimes_per_startdate)
obs_data <- Histo2Hindcast(sampleData$obs, startDates, sdates_out,
                           leadtimes_per_startdate)
## Not run: 
exp_data <- Reorder(exp_data, c(3, 4, 1, 2))
obs_data <- Reorder(obs_data, c(3, 4, 1, 2))
PlotAno(exp_data, obs_data, sdates_out,
        toptitle = 'Anomalies reorganized into shorter chunks',
        ytitle = 'K', fileout = NULL)
## End(Not run)
Insert an extra dimension into an array at position 'posdim' with length 'lendim'. The array repeats along the new dimension.
InsertDim(data, posdim, lendim, name = NULL)
data |
An array to which the additional dimension is to be added. |
posdim |
An integer indicating the position of the new dimension. |
lendim |
An integer indicating the length of the new dimension. |
name |
A character string indicating the name for the new dimension. The default value is NULL. |
An array like parameter 'data' but with the added named dimension.
a <- array(rnorm(15), dim = c(a = 3, b = 1, c = 5, d = 1))
res <- InsertDim(InsertDim(a, posdim = 2, lendim = 1, name = 'e'), 4, c(f = 2))
dim(res)
This function tells whether a year is a leap year or not.
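The Gregorian rule it implements can be sketched in one line (divisible by 4, except centuries unless divisible by 400):
# Sketch of the Gregorian leap-year rule (is_leap is a hypothetical helper)
is_leap <- function(year) (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
is_leap(c(1990, 1992, 1900, 2000))  # FALSE TRUE FALSE TRUE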
LeapYear(year)
year |
A numeric value indicating the year in the Gregorian calendar. |
A logical value telling whether the year is a leap year or not.
print(LeapYear(1990))
print(LeapYear(1991))
print(LeapYear(1992))
print(LeapYear(1993))
This function loads monthly or daily data from a set of specified
experimental datasets together with data that date-corresponds from a set
of specified observational datasets. See parameters 'storefreq',
'sampleperiod', 'exp' and 'obs'.
A set of starting dates is specified through the parameter 'sdates'. Data of
each starting date is loaded for each model.
Load()
arranges the data in two arrays with a similar format both
with the following dimensions:
The number of experimental datasets determined by the user through the argument 'exp' (for the experimental data array) or the number of observational datasets available for validation (for the observational array) determined as well by the user through the argument 'obs'.
The greatest number of members across all experiments (in the experimental data array) or across all observational datasets (in the observational data array).
The number of starting dates determined by the user through the 'sdates' argument.
The greatest number of lead-times.
The number of latitudes of the selected zone.
The number of longitudes of the selected zone.
Dimensions 5 and 6 are optional and their presence depends on the type of
the specified variable (global mean or 2-dimensional) and on the selected
output type (area averaged time series, latitude averaged time series,
longitude averaged time series or 2-dimensional time series).
In the case of loading an area average the dimensions of the arrays will be
only the first 4.
Only a specified variable is loaded from each experiment at each starting
date. See parameter 'var'.
Afterwards, observational data that matches every starting date and lead-time
of every experimental dataset is fetched in the file system (so, if two
predictions at two different start dates overlap, some observational values
will be loaded and kept in memory more than once).
If no data is found in the file system for an experimental or observational
array point it is filled with an NA value.
If the specified output is 2-dimensional or latitude- or longitude-averaged
time series all the data is interpolated into a common grid. If the
specified output type is area averaged time series the data is averaged on
the individual grid of each dataset but can also be averaged after
interpolating into a common grid. See parameters 'grid' and 'method'.
Once the two arrays are filled by calling this function, other functions in
the s2dv package that receive data formatted in this data structure as
input can be executed (e.g., Clim() to compute climatologies, Ano() to
compute anomalies, ...).
Load() has many additional parameters to disable values and trim dimensions
of the selected variable; masks can even be applied to 2-dimensional
variables. See parameters 'nmember', 'nmemberobs', 'nleadtime',
'leadtimemin', 'leadtimemax', 'sampleperiod', 'lonmin', 'lonmax', 'latmin',
'latmax', 'maskmod', 'maskobs', 'varmin', 'varmax'.
The parameters 'exp' and 'obs' can take various forms. The most direct form
is a list of lists, where each sub-list has the component 'path' associated
to a character string with a pattern of the path to the files of a dataset
to be loaded. These patterns can contain wildcards and tags that will be
replaced automatically by Load()
with the specified starting dates,
member numbers, variable name, etc.
See parameter 'exp' or 'obs' for details.
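As an illustration of this list-of-lists form (the paths below are hypothetical; the tags are replaced by Load() as described above):
exp <- list(list(name = 'experimentA',
                 path = file.path('/path/to/$DATASET_NAME$/$STORE_FREQ$',
                                  '$VAR_NAME$_$START_DATE$.nc')))
obs <- list(list(name = 'observationX',
                 path = file.path('/path/to/$DATASET_NAME$/$STORE_FREQ$',
                                  '$VAR_NAME$_$YEAR$$MONTH$.nc')))
# dat <- Load(var = 'tas', exp = exp, obs = obs,
#             sdates = c('19901101', '19911101'), leadtimemax = 12)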
Only NetCDF files are supported. OPeNDAP URLs to NetCDF files are also
supported. Load() can load 2-dimensional or global mean variables in any of
the following formats:
experiments:
file per ensemble per starting date (YYYY, MM and DD somewhere in the path)
file per member per starting date
(YYYY, MM, DD and MemberNumber somewhere in the path. Ensemble
experiments with different numbers of members can be loaded in
a single Load()
call.)
(YYYY, MM and DD specify the starting dates of the predictions)
observations:
file per ensemble per month (YYYY and MM somewhere in the path)
file per member per month (YYYY, MM and MemberNumber somewhere in the path, obs with different numbers of members supported)
file per dataset (No constraints in the path but the time axes in the file have to be properly defined)
(YYYY and MM correspond to the actual month data in the file)
In all the formats the data can be stored in a daily or monthly frequency,
or a multiple of these (see parameters 'storefreq' and 'sampleperiod').
All the data files must contain the target variable defined over time and
potentially over members, latitude and longitude dimensions in any order,
time being the record dimension.
In the case of a two-dimensional variable, the longitude and latitude
variables must also be defined inside the data file and must have the same
names as the corresponding dimensions. The names of these dimensions (and
of the longitude and latitude variables) and the name of the members
dimension are expected to be 'longitude', 'latitude' and 'ensemble'
respectively. However, these names can be adjusted with the parameter
'dimnames' or can be configured in the configuration file (read below in
parameters 'exp' and 'obs', or see ?ConfigFileOpen for more information).
All the data files are expected to have numeric values representable with
32 bits. Be aware when choosing the fill values or infinite values in the
datasets to load.
The Load() function returns a named list following a structure similar to
that used in the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'obs' is the array that contains the observational data. It has the attribute 'dimensions' associated to a vector of strings with the labels of each dimension of the array, in order.
'lat' and 'lon' are the latitudes and longitudes of the grid into
which the data is interpolated (0 if the loaded variable is a global
mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. Is kept to NULL by now.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date the Load()
call to obtain
the data was issued.
'source_files', a vector of character strings with complete paths
to all the found files involved in the Load()
call.
'not_found_files', a vector of character strings with complete
paths to not found files involved in the Load()
call.
Load( var, exp = NULL, obs = NULL, sdates, nmember = NULL, nmemberobs = NULL, nleadtime = NULL, leadtimemin = 1, leadtimemax = NULL, storefreq = "monthly", sampleperiod = 1, lonmin = 0, lonmax = 360, latmin = -90, latmax = 90, output = "areave", method = "conservative", grid = NULL, maskmod = vector("list", 15), maskobs = vector("list", 15), configfile = NULL, varmin = NULL, varmax = NULL, silent = FALSE, nprocs = NULL, dimnames = NULL, remapcells = 2, path_glob_permissive = "partial" )
var |
Short name of the variable to load. It should coincide with the
variable name inside the data files. |
exp |
Parameter to specify which experimental datasets to load data from.
The tag $START_DATES$ will be replaced with all the starting dates
specified in 'sdates'. $YEAR$, $MONTH$ and $DAY$ will take a value for each
iteration over 'sdates'; these are simply the same as $START_DATE$ but
split in parts.
list(
  list(
    name = 'experimentA',
    path = file.path('/path/to/$DATASET_NAME$/$STORE_FREQ$',
                     '$VAR_NAME$$SUFFIX$',
                     '$VAR_NAME$_$START_DATE$.nc'),
    nc_var_name = '$VAR_NAME$',
    suffix = '_3hourly',
    var_min = '-1e19',
    var_max = '1e19'
  )
)
This will make Load() look for paths such as
'/path/to/experimentA/monthly/tas_3hourly/tas_19901101.nc', with the tags
replaced by the dataset name, the value of 'storefreq', the requested
variable name and each of the start dates in 'sdates'. |
obs |
Argument with the same format as parameter 'exp'. See details on
parameter 'exp'. |
sdates |
Vector of starting dates of the experimental runs to be loaded
following the pattern 'YYYYMMDD'. |
nmember |
Vector with the numbers of members to load from the specified
experimental datasets in 'exp'. |
nmemberobs |
Vector with the numbers of members to load from the
specified observational datasets in 'obs'. |
nleadtime |
Deprecated. See parameter 'leadtimemax'. |
leadtimemin |
Only lead-times higher or equal to 'leadtimemin' are loaded. Takes the value 1 by default. |
leadtimemax |
Only lead-times lower or equal to 'leadtimemax' are loaded.
Takes by default the number of lead-times of the first experimental
dataset in 'exp'. |
storefreq |
Frequency at which the data to be loaded is stored in the
file system. Can take values 'monthly' or 'daily'. |
sampleperiod |
To load only a subset between 'leadtimemin' and
'leadtimemax' with the period of subsampling 'sampleperiod'. |
lonmin |
If a 2-dimensional variable is loaded, values at longitudes
lower than 'lonmin' aren't loaded. |
lonmax |
If a 2-dimensional variable is loaded, values at longitudes
higher than 'lonmax' aren't loaded. |
latmin |
If a 2-dimensional variable is loaded, values at latitudes
lower than 'latmin' aren't loaded. |
latmax |
If a 2-dimensional variable is loaded, values at latitudes
higher than 'latmax' aren't loaded. |
output |
This parameter determines the format in which the data is
arranged in the output arrays.
Takes by default the value 'areave'. If the variable specified in 'var' is
a global mean, this parameter is forced to 'areave'. |
method |
This parameter determines the interpolation method to be used
when regridding data (see 'output'). Can take values 'bilinear', 'bicubic',
'conservative', 'distance-weighted'. |
grid |
A common grid can be specified through the parameter 'grid' when
loading 2-dimensional data. Data is then interpolated onto this grid
whichever 'output' type is specified. If the selected output type is
'areave' and a 'grid' is specified, the area averages are calculated after
interpolating to the specified grid. |
maskmod |
List of masks to be applied to the data of each experimental
dataset respectively, if a 2-dimensional variable is specified in 'var'. |
maskobs |
See help on parameter 'maskmod'. |
configfile |
Path to the s2dv configuration file from which
to retrieve information on location in file system (and other) of datasets. |
varmin |
Loaded experimental and observational data values smaller
than 'varmin' will be disabled (replaced by NA values). |
varmax |
Loaded experimental and observational data values greater
than 'varmax' will be disabled (replaced by NA values). |
silent |
Parameter to show (FALSE) or hide (TRUE) information messages. |
nprocs |
Number of parallel processes created to perform the fetch
and computation of data. |
dimnames |
Named list where the name of each element is a generic
name of the expected dimensions inside the NetCDF files. These generic
names are 'lon', 'lat' and 'member'. 'time' is not needed because it is
detected automatically. |
remapcells |
When loading a 2-dimensional variable, spatial subsets can
be requested via 'lonmin', 'lonmax', 'latmin' and 'latmax'. 'remapcells'
indicates the number of grid cells to load beyond the borders of the
requested subset so that the interpolation onto the target grid is well
defined near the edges. The default value is 2. |
path_glob_permissive |
In some cases, when specifying a path pattern
(either in the parameters 'exp'/'obs' or in a configuration file) one can
specify path patterns that contain shell globbing expressions. Too much
freedom in putting globbing expressions in the path patterns can be
dangerous and make Load() match files that do not belong to the requested
dataset. This parameter controls how strictly the globbing expressions are
resolved; the default value is 'partial'. |
The two output matrices have between 2 and 6 dimensions:
Number of experimental/observational datasets.
Number of members.
Number of startdates.
Number of leadtimes.
Number of latitudes (optional).
Number of longitudes (optional).
but the two matrices have the same number of dimensions and only the first two dimensions can have different lengths depending on the input arguments. For a detailed explanation of the process, read the documentation attached to the package or check the comments in the code.
Load() returns a named list following a structure similar to that used in
the package 'downscaleR'.
The components are the following:
'mod' is the array that contains the experimental data. It has the
attribute 'dimensions' associated to a vector of strings with the
labels of each dimension of the array, in order. The order of the
latitudes is always forced to be from 90 to -90 whereas the order of
the longitudes is kept as in the original files (if possible).
Longitude values in lon lower than 0 have 360 added to them (but are
still kept in the original order). In some cases, however, if
multiple data sets are loaded in longitude-latitude mode, the
longitudes (and also the data arrays in mod and obs) are
re-ordered afterwards by Load() to range from 0 to 360; a
warning is given in such cases. The longitude and latitude of the
center of the grid cell that corresponds to the value [j, i] in 'mod'
(along the dimensions latitude and longitude, respectively) can be
found in the outputs lon[i] and lat[j].
'obs' is the array that contains the observational data. The documentation of 'mod' above also applies to this array.
'lat' and 'lon' are the latitudes and longitudes of the centers of
the cells of the grid the data is interpolated into (0 if the loaded
variable is a global mean or the output is an area average).
Both have the attribute 'cdo_grid_des' associated with a character
string with the name of the common grid of the data, following the CDO
naming conventions for grids.
'lon' has the attributes 'first_lon' and 'last_lon', with the first
and last longitude values found in the region defined by 'lonmin' and
'lonmax'. 'lat' also has the equivalent attributes 'first_lat' and
'last_lat'.
'lon' also has the attribute 'data_across_gw', which tells whether the
region requested via 'lonmin', 'lonmax', 'latmin' and 'latmax' goes
across the Greenwich meridian. As explained in the documentation of the
parameter 'mod', the loaded data array is kept in the same order as in
the original files when possible: this means that, in some cases, even
if the data goes across the Greenwich meridian, the data array may not.
The attribute 'array_across_gw' tells whether the array itself actually
goes across the Greenwich meridian. E.g.: the longitudes in the data
files are defined to be from 0 to 360, and the requested longitudes are
from -80 to 40. The original order is kept, hence the longitudes in the
array will be ordered as follows:
0, ..., 40, 280, ..., 360. In that case, 'data_across_gw' will be TRUE
and 'array_across_gw' will be FALSE.
The attribute 'projection' is kept for compatibility with 'downscaleR'.
'Variable' has the following components:
'varName', with the short name of the loaded variable as specified in the parameter 'var'.
'level', with information on the pressure level of the variable. It is currently kept as NULL.
And the following attributes:
'is_standard', kept for compatibility with 'downscaleR', tells if a dataset has been homogenized to standards with 'downscaleR' catalogs.
'units', a character string with the units of measure of the variable, as found in the source files.
'longname', a character string with the long name of the variable, as found in the source files.
'daily_agg_cellfun', 'monthly_agg_cellfun', 'verification_time', kept for compatibility with 'downscaleR'.
'Datasets' has the following components:
'exp', a named list where the names are the identifying character strings of each experiment in 'exp', each associated to a list with the following components:
'members', a list with the names of the members of the dataset.
'source', a path or URL to the source of the dataset.
'obs', similar to 'exp' but for observational datasets.
'Dates', with the following components:
'start', an array of dimensions (sdate, time) with the POSIX initial date of each forecast time of each starting date.
'end', an array of dimensions (sdate, time) with the POSIX final date of each forecast time of each starting date.
'InitializationDates', a vector of starting dates as specified in 'sdates', in POSIX format.
'when', a time stamp of the date on which the Load() call to obtain the data was issued.
'source_files', a vector of character strings with the complete paths to all the found files involved in the Load() call.
'not_found_files', a vector of character strings with the complete paths to the not-found files involved in the Load() call.
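As a quick orientation, the pieces of this structure described above can be inspected directly; the following is a minimal sketch, assuming a 'sampleData' object obtained as in the examples below:

# Minimal sketch (assumes 'sampleData' was obtained as in the examples
# below): inspect the structure returned by Load().
names(sampleData)                         # 'mod', 'obs', 'lat', 'lon', ...
attr(sampleData$mod, 'dimensions')        # labels of the array dimensions
attr(sampleData$lon, 'cdo_grid_des')      # common grid, CDO naming
attr(sampleData$lon, 'data_across_gw')    # requested region crosses Greenwich?
attr(sampleData$lon, 'array_across_gw')   # does the array itself cross it?
sampleData$Dates$start[1]                 # initial date of the first leadtime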
# Let's assume we want to perform verification with data of a variable
# called 'tos' from a model called 'model' and observed data coming from
# an observational dataset called 'observation'.
#
# The model was run in the context of an experiment named 'experiment'.
# It simulated from 1st November in 1985, 1990, 1995, 2000 and 2005 for a
# period of 5 years from each starting date. 5 different sets of initial
# conditions were used, so an ensemble of 5 members was generated for
# each starting date.
# The model generated values for the variables 'tos' and 'tas' at
# 3-hourly frequency but, after some initial post-processing, the data
# was averaged over every month.
# The resulting monthly average series were stored in a file for each
# starting date for each variable, with the data of the 5 ensemble
# members. The resulting directory tree was the following:
#   model
#    |--> experiment
#          |--> monthly_mean
#                |--> tos_3hourly
#                |     |--> tos_19851101.nc
#                |     |--> tos_19901101.nc
#                |     .
#                |     .
#                |     |--> tos_20051101.nc
#                |--> tas_3hourly
#                      |--> tas_19851101.nc
#                      |--> tas_19901101.nc
#                      .
#                      .
#                      |--> tas_20051101.nc
#
# The observation recorded values of 'tos' and 'tas' at each day of the
# month over that period but was also averaged over months and stored in
# a file per month. The directory tree was the following:
#   observation
#    |--> monthly_mean
#          |--> tos
#          |     |--> tos_198511.nc
#          |     |--> tos_198512.nc
#          |     |--> tos_198601.nc
#          |     .
#          |     .
#          |     |--> tos_201010.nc
#          |--> tas
#                |--> tas_198511.nc
#                |--> tas_198512.nc
#                |--> tas_198601.nc
#                .
#                .
#                |--> tas_201010.nc
#
# The model data is stored in a file-per-startdate fashion, the
# observational data is stored in a file-per-month fashion, and both are
# stored at monthly frequency. The file format is NetCDF.
# Hence all the data is supported by Load() (see details and other
# supported conventions in ?Load), but first we need to configure it
# properly.
#
# These data files are included in the package (in the 'sample_data'
# folder), only for the variable 'tos'. They have been interpolated to a
# very low resolution grid so that they fit on CRAN.
# The original grid names (following CDO conventions) for experimental
# and observational data were 't106grid' and 'r180x89' respectively. The
# final resolutions are 'r20x10' and 'r16x8' respectively.
# The experimental data comes from the decadal climate prediction
# experiment run at IC3 in the context of the CMIP5 project. Its name
# within the IC3 local database is 'i00k'.
# The observational dataset used for verification is the 'ERSST'
# observational dataset.
#
# The next two examples are equivalent and show how to load the variable
# 'tos' from these sample datasets, the first providing lists of lists to
# the parameters 'exp' and 'obs' (see documentation on these parameters)
# and the second providing vectors of character strings, hence using a
# configuration file.
#
# The code is not run because it dispatches system calls to 'cdo' which
# is not allowed in the examples as per CRAN policies. You can run it on
# your system, though.
# Instead, the code in 'dontshow' is run, which loads the equivalent
# already processed data in R.
#
# Example 1: Providing lists of lists to 'exp' and 'obs':
#
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
exp <- list(
         name = 'experiment',
         path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
                          '$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATES$.nc')
       )
obs <- list(
         name = 'observation',
         path = file.path(data_path, 'observation/$OBS_NAME$/monthly_mean',
                          '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
       )
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
                   output = 'areave', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40)
## End(Not run)
#
# Example 2: Providing vectors of character strings to 'exp' and 'obs'
# and using a configuration file.
#
# The configuration file 'sample.conf' that we will create in the example
# has the proper entries to load these (see ?LoadConfigFile for details on
# writing a configuration file).
#
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment', path = file.path(data_path,
             'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
             '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
             '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
             '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   output = 'areave', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40)
#
# Example 3: Providing character strings in 'exp' and 'obs', and providing
# a configuration file.
# The configuration file 'sample.conf' that we will create in the example
# has the proper entries to load these (see ?LoadConfigFile for details on
# writing a configuration file).
#
configfile <- paste0(tempdir(), '/sample.conf')
ConfigFileCreate(configfile, confirm = FALSE)
c <- ConfigFileOpen(configfile)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MIN', '-1e19', confirm = FALSE)
c <- ConfigEditDefinition(c, 'DEFAULT_VAR_MAX', '1e19', confirm = FALSE)
data_path <- system.file('sample_data', package = 's2dv')
exp_data_path <- paste0(data_path, '/model/$EXP_NAME$/')
obs_data_path <- paste0(data_path, '/$OBS_NAME$/')
c <- ConfigAddEntry(c, 'experiments', dataset_name = 'experiment',
       var_name = 'tos', main_path = exp_data_path,
       file_path = '$STORE_FREQ$_mean/$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATE$.nc')
c <- ConfigAddEntry(c, 'observations', dataset_name = 'observation',
       var_name = 'tos', main_path = obs_data_path,
       file_path = '$STORE_FREQ$_mean/$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
ConfigFileSave(c, configfile, confirm = FALSE)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', c('experiment'), c('observation'), startDates,
                   output = 'areave', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40, configfile = configfile)
## End(Not run)
This function returns the mean of an array along a set of dimensions, preserving the dimension names if the array has them.
MeanDims(data, dims, na.rm = FALSE, drop = TRUE)
data |
An array to be averaged. |
dims |
A vector of numeric indices or character strings indicating along which dimensions to average. |
na.rm |
A logical value indicating whether to ignore NA values (TRUE) or not (FALSE). |
drop |
A logical value indicating whether to keep the averaged dimension (FALSE) or drop it (TRUE). The default value is TRUE. |
A numeric value or an array with the same dimensions as parameter 'data' except the 'dims' dimensions. If 'drop' is TRUE, 'dims' will be removed; if 'drop' is FALSE, 'dims' will be preserved and their length will be 1. If all the dimensions are averaged out, a numeric value is returned.
a <- array(rnorm(24), dim = c(dat = 2, member = 3, time = 4))
ens_mean <- MeanDims(a, 'member')
dim(ens_mean)
ens_time_mean <- MeanDims(a, c(2, 3), drop = FALSE)
dim(ens_time_mean)
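As noted in the value description above, a plain numeric is returned when every dimension is averaged out. A minimal sketch of that edge case:

# Averaging over all dimensions returns a single numeric, not an array.
a <- array(rnorm(24), dim = c(dat = 2, member = 3, time = 4))
total_mean <- MeanDims(a, c('dat', 'member', 'time'))
dim(total_mean)   # NULL: the result is a plain numeric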
Compute the mean square error for an array of forecasts and an array of
observations. The MSEs are computed along time_dim, the dimension which
corresponds to the start date dimension. If comp_dim is given, the MSEs are
computed only if obs along the comp_dim dimension are complete between
limits[1] and limits[2], i.e. there are no NAs between limits[1] and
limits[2]. This option can be activated if the user wants to account only
for the forecasts for which the corresponding observations are available at
all leadtimes.
The confidence intervals are computed based on the chi-squared distribution.
MSE( exp, obs, time_dim = "sdate", dat_dim = NULL, memb_dim = NULL, comp_dim = NULL, limits = NULL, conf = TRUE, alpha = 0.05, ncores = NULL )
exp |
A named numeric array of experimental data, with at least 'time_dim' dimension. It can also be a vector with the same length as 'obs'. |
obs |
A named numeric array of observational data, same dimensions as parameter 'exp' except along 'dat_dim' and 'memb_dim'. It can also be a vector with the same length as 'exp'. |
time_dim |
A character string indicating the name of the dimension along which the MSE is computed. The default value is 'sdate'. |
dat_dim |
A character string indicating the name of the dataset (nobs/nexp) dimension. The datasets of 'exp' and 'obs' will be paired and the MSE computed for each pair. The default value is NULL. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean; it should be set to NULL if the input data are already the ensemble mean. The default value is NULL. |
comp_dim |
A character string indicating the name of dimension along which obs is taken into account only if it is complete. The default value is NULL. |
limits |
A vector of two integers indicating the range along 'comp_dim' to be completed. The default value is c(1, length of the 'comp_dim' dimension). |
conf |
A logical value indicating whether to retrieve the confidence intervals or not. The default value is TRUE. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimension:
c(nexp, nobs, all other dimensions of exp except time_dim).
nexp is the number of experiments (i.e., the length of 'dat_dim' in 'exp'), and
nobs is the number of observations (i.e., the length of 'dat_dim' in 'obs').
$mse |
The mean square error. |
$conf.lower |
The lower confidence interval. Only present if 'conf' is TRUE. |
$conf.upper |
The upper confidence interval. Only present if 'conf' is TRUE. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
smooth_ano_exp <- Smoothing(ano_exp, runmeanlen = 12, time_dim = 'ftime')
smooth_ano_obs <- Smoothing(ano_obs, runmeanlen = 12, time_dim = 'ftime')
res <- MSE(smooth_ano_exp, smooth_ano_obs, memb_dim = 'member',
           comp_dim = 'ftime', limits = c(7, 54))

# Synthetic data:
exp1 <- array(rnorm(120), dim = c(dat = 3, sdate = 10, ftime = 4))
obs1 <- array(rnorm(80), dim = c(dat = 2, sdate = 10, ftime = 4))
res1 <- MSE(exp1, obs1, comp_dim = 'ftime', dat_dim = 'dat')

exp2 <- array(rnorm(20), dim = c(sdate = 5, member = 4))
obs2 <- array(rnorm(10), dim = c(sdate = 5, member = 2))
res2 <- MSE(exp2, obs2, memb_dim = 'member')
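To illustrate the 'comp_dim' screening described above, here is a minimal sketch with deliberately incomplete observations (the array and dimension names are made up for the illustration):

# Sketch: 'comp_dim' only keeps start dates whose observations are
# complete along the given dimension (here, at all leadtimes).
set.seed(1)
exp3 <- array(rnorm(40), dim = c(sdate = 10, ftime = 4))
obs3 <- array(rnorm(40), dim = c(sdate = 10, ftime = 4))
obs3[2, 3] <- NA   # start date 2 is incomplete at leadtime 3
res3 <- MSE(exp3, obs3, comp_dim = 'ftime')
# Start date 2 is discarded from the MSE at every leadtime.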
Compute the mean square error skill score (MSSS) between an array of forecast
'exp' and an array of observation 'obs'. The two arrays should have the same
dimensions except along 'dat_dim' and 'memb_dim'. The MSSSs are computed along
'time_dim', the dimension which corresponds to the start date dimension.
MSSS computes the mean square error skill score of each exp in 1:nexp
against each obs in 1:nobs which gives nexp * nobs MSSS for each grid point
of the array.
The p-value and significance test are optionally provided by a one-sided
Fisher test or a Random Walk test.
MSSS( exp, obs, ref = NULL, time_dim = "sdate", dat_dim = NULL, memb_dim = NULL, pval = TRUE, sign = FALSE, alpha = 0.05, N.eff = NA, sig_method = "one-sided Fisher", sig_method.type = NULL, ncores = NULL )
exp |
A named numeric array of experimental data which contains at least time dimension (time_dim). It can also be a vector with the same length as 'obs', then the vector will automatically be 'time_dim'. |
obs |
A named numeric array of observational data which contains at least time dimension (time_dim). The dimensions should be the same as parameter 'exp' except the length of 'dat_dim' and 'memb_dim' dimension. It can also be a vector with the same length as 'exp', then the vector will automatically be 'time_dim'. |
ref |
A named numerical array of the reference forecast data with at least time dimension, or 0 (typical climatological forecast) or 1 (normalized climatological forecast). If it is an array, the dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have dataset dimension. If there is corresponding reference for each experiment, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the typical climatological forecast is used as reference forecast (equivalent to 0.) The default value is NULL. |
time_dim |
A character string indicating the name of dimension along which the MSSS are computed. The default value is 'sdate'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean; it should be set to NULL if the data are already the ensemble mean. The default value is NULL. |
pval |
A logical value indicating whether to compute or not the p-value of the test Ho: MSSS = 0. The default value is TRUE. |
sign |
A logical value indicating whether to compute or not the statistical significance of the test Ho: MSSS = 0. The default value is FALSE. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
N.eff |
Effective sample size to be used in the statistical significance test with the Random Walk. It can be NA (it will be computed with s2dv:::.Eno), FALSE (the length of 'obs' along 'time_dim' is used, so the autocorrelation is not taken into account), a numeric (used for all cases), or an array with the same dimensions as 'obs' except 'time_dim' (a particular N.eff used for each case). The default value is NA. |
sig_method |
A character string indicating the significance method. The options are "one-sided Fisher" (default) and "Random Walk". |
sig_method.type |
A character string indicating the test type of the
significance method. Check RandomWalkTest() parameter 'sig_method.type' for details if 'sig_method' is "Random Walk". The default value is NULL. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimension:
c(nexp, nobs, all other dimensions of exp except time_dim).
nexp is the number of experiments (i.e., the length of 'dat_dim' in 'exp'), and
nobs is the number of observations (i.e., the length of 'dat_dim' in 'obs').
If 'dat_dim' is NULL, nexp and nobs are omitted.
$msss |
A numerical array of the mean square error skill score. |
$p.val |
A numerical array of the p-value with the same dimensions as $msss.
Only present if 'pval' is TRUE. |
$sign |
A logical array of the statistical significance of the MSSS with the same
dimensions as $msss. Only present if 'sign' is TRUE. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
rmsss <- MSSS(ano_exp, ano_obs, dat_dim = 'dataset', memb_dim = 'member')

# Synthetic data:
exp <- array(rnorm(30), dim = c(dataset = 2, time = 3, memb = 5))
obs <- array(rnorm(15), dim = c(time = 3, dataset = 1))
res <- MSSS(exp, obs, time_dim = 'time', dat_dim = 'dataset',
            memb_dim = 'memb')
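Since a NULL 'ref' is documented above as equivalent to the zero climatological forecast, the skill score should agree with a direct computation of 1 - MSE_exp / MSE_ref; a minimal sketch with vector inputs:

# Sketch: check MSSS against the default climatological (zero) reference.
set.seed(1)
exp_v <- rnorm(20)
obs_v <- rnorm(20)
res_v <- MSSS(exp_v, obs_v, pval = FALSE)
msss_manual <- 1 - mean((exp_v - obs_v)^2) / mean((0 - obs_v)^2)
# res_v$msss and msss_manual should agree.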
Compute the North Atlantic Oscillation (NAO) index based on the leading EOF of the sea level pressure (SLP) anomalies over the north Atlantic region (20N-80N, 80W-40E). The PCs are obtained by projecting the forecast and observed anomalies onto the observed EOF pattern or the forecast anomalies onto the EOF pattern of the other years of the forecast. By default (ftime_avg = 2:4), NAO() computes the NAO index for 1-month lead seasonal forecasts that can be plotted with PlotBoxWhisker(). It returns cross-validated PCs of the NAO index for hindcast (exp) and observations (obs) based on the leading EOF pattern, or, if forecast (exp_cor) is provided, the NAO index for forecast and the corresponding data (exp and obs).
NAO( exp = NULL, obs = NULL, exp_cor = NULL, lat, lon, time_dim = "sdate", memb_dim = "member", space_dim = c("lat", "lon"), ftime_dim = "ftime", ftime_avg = 2:4, obsproj = TRUE, ncores = NULL )
exp |
A named numeric array of North Atlantic SLP (20N-80N, 80W-40E)
hindcast anomalies from Ano() or Ano_CrossValid(), with at least the dimensions 'time_dim', 'memb_dim', 'space_dim' and 'ftime_dim'. The default value is NULL. |
obs |
A named numeric array of North Atlantic SLP (20N-80N, 80W-40E)
observed anomalies from Ano() or Ano_CrossValid(), with at least the dimensions 'time_dim', 'space_dim' and 'ftime_dim'. The default value is NULL. |
exp_cor |
A named numeric array of the North Atlantic SLP (20N-80N, 80W-40E)
forecast anomalies from Ano() or Ano_CrossValid(), with the same dimensions as 'exp' except that 'time_dim' is typically of length 1. The default value is NULL. |
lat |
A vector of the latitudes of 'exp' and 'obs'. |
lon |
A vector of the longitudes of 'exp' and 'obs'. |
time_dim |
A character string indicating the name of the time dimension of 'exp' and 'obs'. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension of 'exp' (and 'obs', optional). If 'obs' has memb_dim, the length must be 1. The default value is 'member'. |
space_dim |
A vector of two character strings. The first is the dimension name of latitude and the second is the dimension name of longitude of 'exp' and 'obs'. The default value is c('lat', 'lon'). |
ftime_dim |
A character string indicating the name of the forecast time dimension of 'exp' and 'obs'. The default value is 'ftime'. |
ftime_avg |
A numeric vector of the forecast time steps to average across the target period. If average is not needed, set NULL. The default value is 2:4, i.e., from 2nd to 4th forecast time steps. |
obsproj |
A logical value indicating whether to compute the NAO index by projecting the forecast anomalies onto the leading EOF of observational reference (TRUE, default) or compute the NAO by first computing the leading EOF of the forecast anomalies (in cross-validation mode, i.e. leave the evaluated year out), then projecting forecast anomalies onto this EOF (FALSE). If 'exp_cor' is provided, 'obs' will be used when obsproj is TRUE and 'exp' will be used when obsproj is FALSE, and no cross-validation is applied. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list which contains some of the following items depending on the data inputs:
exp |
A numeric array of hindcast NAO index in verification format with the same dimensions as 'exp' except space_dim and ftime_dim. If ftime_avg is NULL, ftime_dim remains. |
obs |
A numeric array of observation NAO index in verification format with the same dimensions as 'obs' except space_dim and ftime_dim. If ftime_avg is NULL, ftime_dim remains. |
exp_cor |
A numeric array of forecast NAO index in verification format with the same dimensions as 'exp_cor' except space_dim and ftime_dim. If ftime_avg is NULL, ftime_dim remains. |
Doblas-Reyes, F.J., Pavan, V. and Stephenson, D. (2003). The skill of multi-model seasonal forecasts of the wintertime North Atlantic Oscillation. Climate Dynamics, 21, 501-514. DOI: 10.1007/s00382-003-0350-4
# Make up synthetic data
set.seed(1)
exp <- array(rnorm(1620), dim = c(member = 2, sdate = 3, ftime = 5,
                                  lat = 6, lon = 9))
set.seed(2)
obs <- array(rnorm(1620), dim = c(member = 1, sdate = 3, ftime = 5,
                                  lat = 6, lon = 9))
lat <- seq(20, 80, length.out = 6)
lon <- seq(-80, 40, length.out = 9)
nao <- NAO(exp = exp, obs = obs, lat = lat, lon = lon)

exp_cor <- array(rnorm(540), dim = c(member = 2, sdate = 1, ftime = 5,
                                     lat = 6, lon = 9))
nao <- NAO(exp = exp, obs = obs, exp_cor = exp_cor, lat = lat, lon = lon,
           obsproj = TRUE)
# plot the NAO index
## Not run:
nao$exp <- Reorder(nao$exp, c(2, 1))
nao$obs <- Reorder(nao$obs, c(2, 1))
PlotBoxWhisker(nao$exp, nao$obs, "NAO index, DJF", "NAO index (PC1) TOS",
               monini = 12, yearini = 1985, freq = 1, "Exp. A", "Obs. X")
## End(Not run)
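As stated in the value description above, the returned indices keep the dimensions of the inputs except 'space_dim' and, when 'ftime_avg' is used, 'ftime_dim'; a quick check on the synthetic example above:

# The NAO index drops 'lat', 'lon' and the averaged 'ftime' dimension;
# 'member' and 'sdate' remain.
dim(nao$exp)
dim(nao$obs)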
Compute a persistence forecast based on a lagged autoregression of
observational data along the time dimension, with a measure of forecast
uncertainty (prediction interval) based on Coelho et al., 2004.
Persistence( data, dates, start, end, ft_start, ft_end = ft_start, time_dim = "time", max_ft = 10, nmemb = 1, na.action = 10, ncores = NULL )
data |
A numeric array corresponding to the observational data including the time dimension along which the autoregression is computed. The data should start at least 40 time steps (years or days) before 'start'. |
dates |
A sequence of 4-digit integers (YYYY) or string (YYYY-MM-DD) in class 'Date' indicating the dates available in the observations. |
start |
A 4-digit integer (YYYY) or a string (YYYY-MM-DD) in class 'Date' indicating the first start date of the persistence forecast. It must be between 1850 and 2020. |
end |
A 4-digit integer (YYYY) or a string (YYYY-MM-DD) in class 'Date' indicating the last start date of the persistence forecast. It must be between 1850 and 2020. |
ft_start |
An integer indicating the forecast time for which the persistence forecast should be calculated, or the first forecast time of the average forecast times for which persistence should be calculated. |
ft_end |
An (optional) integer indicating the last forecast time of the average forecast times for which persistence should be calculated in the case of a multi-timestep average persistence. The default value is 'ft_start'. |
time_dim |
A character string indicating the dimension along which to compute the autoregression. The default value is 'time'. |
max_ft |
An integer indicating the maximum forecast time possible for 'data'. For example, for decadal prediction 'max_ft' would correspond to 10 (years). The default value is 10. |
nmemb |
An integer indicating the number of ensemble members to generate for the persistence forecast. The default value is 1. |
na.action |
A function or an integer. A function (e.g., na.omit, na.exclude, na.fail, na.pass) indicates what should happen when the data contain NAs. An integer indicates the maximum number of NA positions allowed in order to compute the regression. The default value is 10. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
$persistence |
A numeric array with dimensions 'memb', time (start dates), latitudes and longitudes containing the persistence forecast. |
$persistence.mean |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the ensemble mean persistence forecast. |
$persistence.predint |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the prediction interval of the persistence forecast. |
$AR.slope |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the slope coefficient of the autoregression. |
$AR.intercept |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the intercept coefficient of the autoregression. |
$AR.lowCI |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the lower value of the confidence interval of the autoregression. |
$AR.highCI |
A numeric array with same dimensions as 'persistence', except the 'memb' dimension which is of length 1, containing the upper value of the confidence interval of the autoregression. |
# Case 1: year
# Building an example dataset with yearly start dates from 1920 to 1989
set.seed(1)
obs1 <- rnorm(1 * 70 * 2 * 2)
dim(obs1) <- c(member = 1, time = 70, lat = 2, lon = 2)
dates <- seq(1920, 1989, 1)
res <- Persistence(obs1, dates = dates, start = 1961, end = 1980,
                   ft_start = 1, nmemb = 2)

# Case 2: day
dates <- seq(as.Date(ISOdate(1990, 1, 1)), as.Date(ISOdate(1990, 4, 1)), 1)
start <- as.Date(ISOdate(1990, 2, 15))
end <- as.Date(ISOdate(1990, 4, 1))
set.seed(1)
data <- rnorm(1 * length(dates))
dim(data) <- c(member = 1, time = length(dates))
res <- Persistence(data, dates = dates, start = start, end = end,
                   ft_start = 1)
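A minimal sketch for inspecting the returned list (using 'res' from Case 1 above, where nmemb = 2); the component names follow the value description:

# Inspect the Persistence() output ('res' from Case 1, nmemb = 2).
names(res)                  # persistence, persistence.mean,
                            # persistence.predint, AR.slope, AR.intercept,
                            # AR.lowCI, AR.highCI
dim(res$persistence)        # keeps the 'memb' dimension (length 2)
dim(res$persistence.mean)   # 'memb' dimension collapsed to length 1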
Plot two input variables that have the same dimensions in a common plot. One plot for all experiments. The input variables should have dimensions (nexp/nmod, nltime).
Plot2VarsVsLTime( var1, var2, toptitle = "", ytitle = "", monini = 1, freq = 12, nticks = NULL, limits = NULL, listexp = c("exp1", "exp2", "exp3"), listvars = c("var1", "var2"), biglab = FALSE, hlines = NULL, leg = TRUE, siglev = FALSE, sizetit = 1, show_conf = TRUE, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
var1 |
Matrix of dimensions (nexp/nmod, nltime). |
var2 |
Matrix of dimensions (nexp/nmod, nltime). |
toptitle |
Main title, optional. |
ytitle |
Title of Y-axis, optional. |
monini |
Starting month between 1 and 12. Default = 1. |
freq |
1 = yearly, 12 = monthly, 4 = seasonal, ... Default = 12. |
nticks |
Number of ticks and labels on the x-axis, optional. |
limits |
c(lower limit, upper limit): limits of the Y-axis, optional. |
listexp |
List of experiment names, up to three, optional. |
listvars |
List of names of input variables, optional. |
biglab |
TRUE/FALSE for presentation/paper plot. Default = FALSE. |
hlines |
c(a, b, ...) Add horizontal black lines at Y-positions a, b, ... The default value is NULL. |
leg |
TRUE/FALSE if legend should be added or not to the plot. Default = TRUE. |
siglev |
TRUE/FALSE if the significance level should replace the confidence
interval. Default = FALSE. |
sizetit |
Multiplicative factor to change title size, optional. |
show_conf |
TRUE/FALSE to show or hide the confidence intervals of the input variables. |
fileout |
Name of output file. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. The default value is NULL. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
Example of input: RMSE for a number of experiments and along lead-time: (nexp, nltime).
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
runmean_months <- 12
smooth_ano_exp <- Smoothing(data = ano_exp, runmeanlen = runmean_months)
smooth_ano_obs <- Smoothing(data = ano_obs, runmeanlen = runmean_months)
dim_to_mean <- 'member'  # mean along members
# discard startdates for which there are NA leadtimes
required_complete_row <- 'ftime'
leadtimes_per_startdate <- 60
rms <- RMS(MeanDims(smooth_ano_exp, dim_to_mean),
           MeanDims(smooth_ano_obs, dim_to_mean),
           comp_dim = required_complete_row, dat_dim = 'dataset',
           limits = c(ceiling((runmean_months + 1) / 2),
                      leadtimes_per_startdate - floor(runmean_months / 2)))
smooth_ano_exp_m_sub <- smooth_ano_exp -
  InsertDim(MeanDims(smooth_ano_exp, 'member', na.rm = TRUE),
            posdim = 3, lendim = dim(smooth_ano_exp)['member'],
            name = 'member')
suppressWarnings({
  spread <- Spread(smooth_ano_exp_m_sub, compute_dim = c('member', 'sdate'))
})
# Combine rms outputs into one array
rms_combine <- abind::abind(rms$conf.lower, rms$rms, rms$conf.upper,
                            along = 0)
rms_combine <- Reorder(rms_combine, c(2, 3, 1, 4))
Plot2VarsVsLTime(InsertDim(rms_combine[, , , ], 1, 1),
                 Reorder(spread$sd, c(1, 3, 2)),
                 toptitle = 'RMSE and spread', monini = 11, freq = 12,
                 listexp = c('CMIP5 IC3'), listvars = c('RMSE', 'spread'))
Plots plumes/timeseries of ACC from an array with dimensions
c(nexp, nobs, nsdates, nltime, 4) (output from ACC()), where the last
dimension is of length 4 and contains the lower limit of the 95%
confidence interval, the ACC, the upper limit of the 95% confidence
interval and the 95% significance level given by a one-sided t-test.
PlotACC( ACC, sdates, toptitle = "", sizetit = 1, ytitle = "", limits = NULL, legends = NULL, freq = 12, biglab = FALSE, fill = FALSE, linezero = FALSE, points = TRUE, vlines = NULL, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
ACC |
An ACC array with dimensions c(nexp, nobs, nsdates, nltime, 4), as output from ACC(). |
sdates |
A character vector of startdates: c('YYYYMMDD','YYYYMMDD'). |
toptitle |
A character string of the main title, optional. |
sizetit |
A multiplicative factor to scale title size, optional. |
ytitle |
A character string of the title of the Y-axis for each experiment: c('', ''), optional. |
limits |
A numeric vector c(lower limit, upper limit): limits of the Y-axis, optional. |
legends |
A character vector of flags to be written in the legend, optional. |
freq |
An integer: 1 = yearly, 12 = monthly, 4 = seasonal, ... Default: 12. |
biglab |
A logical value for presentation/paper plot. Default = FALSE. |
fill |
A logical value indicating whether to draw a filled confidence interval. Default = FALSE. |
linezero |
A logical value indicating whether a line at y = 0 should be added. Default = FALSE. |
points |
A logical value indicating whether to draw points instead of lines. Default = TRUE. |
vlines |
A vector of x locations where to add vertical black lines, optional. |
fileout |
A character string of the output file name. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. Default is NULL. |
width |
A numeric of the file width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
A numeric of the file height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
A character string of the units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
sampleData$mod <- Season(sampleData$mod, monini = 11, moninf = 12, monsup = 2)
sampleData$obs <- Season(sampleData$obs, monini = 11, moninf = 12, monsup = 2)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
acc <- ACC(ano_exp, ano_obs, lat = sampleData$lat, dat_dim = 'dataset')
acc_bootstrap <- ACC(ano_exp, ano_obs, lat = sampleData$lat,
                     conftype = 'bootstrap', dat_dim = 'dataset')
# Combine acc results for PlotACC
res <- array(c(acc$conf.lower, acc$acc, acc$conf.upper, acc$p.val),
             dim = c(dim(acc$acc), 4))
res_bootstrap <- array(c(acc$acc_conf.lower, acc$acc, acc$acc_conf.upper,
                         acc$p.val),
                       dim = c(dim(acc$acc), 4))
PlotACC(res, startDates)
PlotACC(res_bootstrap, startDates)
Plots time series of raw or smoothed anomalies of any variable output from
Load(), Ano(), Ano_CrossValid() or Smoothing().
PlotAno( exp_ano, obs_ano = NULL, sdates, toptitle = rep("", 15), ytitle = rep("", 15), limits = NULL, legends = NULL, freq = 12, biglab = FALSE, fill = TRUE, memb = TRUE, ensmean = TRUE, linezero = FALSE, points = FALSE, vlines = NULL, sizetit = 1, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
exp_ano |
A numerical array containing the experimental data: c(nmod/nexp, nmemb/nparam, nsdates, nltime). |
obs_ano |
A numerical array containing the observational data: c(nobs, nmemb, nsdates, nltime). |
sdates |
A character vector of start dates in the format of c('YYYYMMDD','YYYYMMDD'). |
toptitle |
Main title for each experiment: c('', ''), optional. |
ytitle |
Title of Y-axis for each experiment: c('', ''), optional. |
limits |
c(lower limit, upper limit): limits of the Y-axis, optional. |
legends |
List of observational dataset names, optional. |
freq |
1 = yearly, 12 = monthly, 4 = seasonal, ... Default: 12. |
biglab |
TRUE/FALSE for presentation/paper plot. Default = FALSE. |
fill |
TRUE/FALSE if the spread between members should be filled. Default = TRUE. |
memb |
TRUE/FALSE if all members or only the ensemble mean should be
plotted. Default = TRUE. |
ensmean |
TRUE/FALSE if the ensemble-mean should be plotted. Default = TRUE. |
linezero |
TRUE/FALSE if a line at y=0 should be added. Default = FALSE. |
points |
TRUE/FALSE if points instead of lines should be shown. Default = FALSE. |
vlines |
List of x locations where to add vertical black lines, optional. |
sizetit |
Multiplicative factor to scale title size, optional. |
fileout |
Name of the output file for each experiment: c('', ''). Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. If filenames with different extensions are passed, only the first one will be considered and its extension will be applied to the rest. The default value is NULL, in which case the figure is shown in a pop-up window. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
smooth_ano_exp <- Smoothing(ano_exp, time_dim = 'ftime', runmeanlen = 12)
smooth_ano_obs <- Smoothing(ano_obs, time_dim = 'ftime', runmeanlen = 12)
smooth_ano_exp <- Reorder(smooth_ano_exp, c(2, 3, 4, 1))
smooth_ano_obs <- Reorder(smooth_ano_obs, c(2, 3, 4, 1))
PlotAno(smooth_ano_exp, smooth_ano_obs, startDates,
        toptitle = 'smoothed anomalies', ytitle = c('K', 'K', 'K'),
        legends = 'ERSST', biglab = FALSE)
Produces time series of box-and-whisker plots showing the distribution of the members of a forecast against the observed evolution. The correlation between the forecast and the observational data is calculated and displayed. Only works for n-monthly to n-yearly time series.
PlotBoxWhisker( exp, obs, toptitle = "", ytitle = "", monini = 1, yearini = 0, freq = 1, expname = "exp 1", obsname = "obs 1", drawleg = TRUE, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
exp |
Forecast array of multi-member time series, e.g., the NAO index of one experiment. The expected dimensions are c(members, start dates/forecast horizons). A vector with only the time dimension can also be provided. Only monthly or lower frequency time series are supported. See parameter freq. |
obs |
Observational vector or array of time series, e.g., the NAO index
of the observations that correspond to the forecast data in 'exp'. |
toptitle |
Character string to be drawn as figure title. |
ytitle |
Character string to be drawn as y-axis title. |
monini |
Number of the month of the first time step, from 1 to 12. |
yearini |
Year of the first time step. |
freq |
Frequency of the provided time series: 1 = yearly, 12 = monthly, 4 = seasonal, ... |
expname |
Experimental dataset name. |
obsname |
Name of the observational reference dataset. |
drawleg |
TRUE/FALSE: whether to draw the legend or not. |
fileout |
Name of output file. Extensions allowed: eps/ps, jpeg, png,
pdf, bmp and tiff. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
Generates a file at the path specified via 'fileout'.
History:
0.1 - 2013-09 (F. Lienert, [email protected]) - Original code
0.2 - 2015-03 (L. Batte, [email protected]) - Removed all
normalization for sake of clarity.
1.0 - 2016-03 (N. Manubens, [email protected]) - Formatting to R CRAN
EOF, ProjectField, NAO
# See examples on Load() to understand the first lines in this example
## Not run:
data_path <- system.file('sample_data', package = 's2dverification')
expA <- list(name = 'experiment', path = file.path(data_path,
             'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
             '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
             '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
             '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   leadtimemin = 1, leadtimemax = 4, output = 'lonlat',
                   latmin = 27, latmax = 48, lonmin = -12, lonmax = 40)
## End(Not run)
# Now ready to compute the EOFs and project on, for example, the first
# variability mode.
ano <- Ano_CrossValid(sampleData$mod, sampleData$obs)
ano_exp <- array(ano$exp, dim = dim(ano$exp)[-2])
ano_obs <- array(ano$obs, dim = dim(ano$obs)[-2])
nao <- NAO(exp = ano_exp, obs = ano_obs, lat = sampleData$lat,
           lon = sampleData$lon)
# Finally plot the nao index
## Not run:
nao$exp <- Reorder(nao$exp, c(2, 1))
nao$obs <- Reorder(nao$obs, c(2, 1))
PlotBoxWhisker(nao$exp, nao$obs, "NAO index, DJF", "NAO index (PC1) TOS",
               monini = 12, yearini = 1985, freq = 1, "Exp. A", "Obs. X")
## End(Not run)
Plots climatologies as a function of the forecast time for any index output
from Clim(), organized in matrices with dimensions
c(nmod/nexp, nmemb/nparam, nltime) or c(nmod/nexp, nltime) for the
experimental data, and c(nobs, nmemb, nltime) or c(nobs, nltime) for the
observational data.
PlotClim( exp_clim, obs_clim = NULL, toptitle = "", ytitle = "", monini = 1, freq = 12, limits = NULL, listexp = c("exp1", "exp2", "exp3"), listobs = c("obs1", "obs2", "obs3"), biglab = FALSE, leg = TRUE, sizetit = 1, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
exp_clim |
Matrix containing the experimental data with dimensions: |
obs_clim |
Matrix containing the observational data (optional) with
dimensions: |
toptitle |
Main title, optional. |
ytitle |
Title of Y-axis, optional. |
monini |
Starting month between 1 and 12. Default = 1. |
freq |
1 = yearly, 12 = monthly, 4 = seasonal, ... Default = 12. |
limits |
c(lower limit, upper limit): limits of the Y-axis, optional. |
listexp |
List of experiment names, optional. |
listobs |
List of observational dataset names, optional. |
biglab |
TRUE/FALSE for presentation/paper plot. Default = FALSE. |
leg |
TRUE/FALSE to plot the legend or not. |
sizetit |
Multiplicative factor to scale title size, optional. |
fileout |
Name of output file. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. The default value is NULL, in which case the figure is shown in a pop-up window. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
PlotClim(clim$clim_exp, clim$clim_obs, toptitle = paste('climatologies'),
         ytitle = 'K', monini = 11, listexp = c('CMIP5 IC3'),
         listobs = c('ERSST'), biglab = FALSE, fileout = NULL)
Map a longitude-latitude array (on a regular rectangular or gaussian grid) onto a cylindrical equidistant latitude and longitude projection with coloured grid cells. Only the region for which data has been provided is displayed. A colour bar (legend) can be plotted and adjusted. It is possible to draw superimposed arrows, dots, symbols, contour lines and boxes. A number of options are provided to adjust the position, size and colour of the components. Some parameters are provided to add and adjust the masks that include continents, oceans, and lakes. This plot function is compatible with figure layouts if the colour bar is disabled.
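For instance, the following minimal sketch (synthetic data; the field, grid and dot layer are illustrative assumptions) draws a random field with a dot layer superimposed on the largest values:
field <- array(rnorm(20 * 30), dim = c(20, 30))  # c(latitude, longitude)
lons <- seq(0, 348, by = 12)                     # 30 longitudes
lats <- seq(-57, 57, by = 6)                     # 20 latitudes
PlotEquiMap(field, lons, lats, toptitle = 'Synthetic field',
            dots = field > 1, brks = seq(-3, 3, 0.5),
            filled.continents = FALSE)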
PlotEquiMap(
  var, lon, lat, varu = NULL, varv = NULL, toptitle = NULL,
  sizetit = NULL, units = NULL, brks = NULL, cols = NULL,
  bar_limits = NULL, triangle_ends = NULL, col_inf = NULL, col_sup = NULL,
  colNA = NULL, color_fun = clim.palette(), square = TRUE,
  filled.continents = NULL, filled.oceans = FALSE, country.borders = FALSE,
  coast_color = NULL, coast_width = 1, lake_color = NULL, shapefile = NULL,
  shapefile_color = NULL, shapefile_lwd = 1, contours = NULL, brks2 = NULL,
  contour_lwd = 0.5, contour_color = "black", contour_lty = 1,
  contour_draw_label = TRUE, contour_label_scale = 1, dots = NULL,
  dot_symbol = 4, dot_size = 1, arr_subsamp = floor(length(lon)/30),
  arr_scale = 1, arr_ref_len = 15, arr_units = "m/s", arr_scale_shaft = 1,
  arr_scale_shaft_angle = 1, axelab = TRUE, labW = FALSE, lab_dist_x = NULL,
  lab_dist_y = NULL, degree_sym = FALSE, intylat = 20, intxlon = 20,
  xlonshft = 0, ylatshft = 0, xlabels = NULL, ylabels = NULL,
  axes_tick_scale = 1, axes_label_scale = 1, drawleg = TRUE,
  subsampleg = NULL, bar_extra_labels = NULL, draw_bar_ticks = TRUE,
  draw_separators = FALSE, triangle_ends_scale = 1, bar_label_digits = 4,
  bar_label_scale = 1, units_scale = 1, bar_tick_scale = 1,
  bar_extra_margin = rep(0, 4), boxlim = NULL, boxcol = "purple2",
  boxlwd = 5, margin_scale = rep(1, 4), title_scale = 1, numbfig = NULL,
  fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ...
)
var |
Array with the values at each cell of a grid on a regular rectangular or gaussian grid. The array is expected to have two dimensions: c(latitude, longitude). Longitudes can be in ascending or descending order and latitudes in any order. It can contain NA values (coloured with 'colNA'). Arrays with dimensions c(longitude, latitude) will also be accepted but 'lon' and 'lat' will be used to disambiguate so this alternative is not appropriate for square arrays. It is allowed that the positions of the longitudinal and latitudinal coordinate dimensions are interchanged. |
lon |
Numeric vector of longitude locations of the cell centers of the
grid of 'var', in ascending or descending order (same as 'var'). Expected
to be regularly spaced, within either of the ranges [-180, 180] or
[0, 360]. Data for two adjacent regions split by the limits of the
longitude range can also be provided, e.g. |
lat |
Numeric vector of latitude locations of the cell centers of the grid of 'var', in any order (same as 'var'). Expected to be from a regular rectangular or gaussian grid, within the range [-90, 90]. |
varu |
Array of the zonal component of wind/current/other field with the same dimensions as 'var'. It is allowed that the positions of the longitudinal and latitudinal coordinate dimensions are interchanged. |
varv |
Array of the meridional component of wind/current/other field with the same dimensions as 'var'. It is allowed that the positions of the longitudinal and latitudinal coordinate dimensions are interchanged. |
toptitle |
Top title of the figure, scalable with parameter 'title_scale'. |
sizetit |
Scale factor for the figure top title provided in parameter 'toptitle'. Deprecated. Use 'title_scale' instead. |
units |
Title at the top of the colour bar, most commonly the units of the variable provided in parameter 'var'. |
brks , cols , bar_limits , triangle_ends
|
Usually only providing 'brks' is
enough to generate the desired colour bar. These parameters allow defining
n breaks that delimit n - 1 intervals to classify each of the values in
'var'. The corresponding grid cell of a given value in 'var' will be
coloured according to the interval it belongs to. These parameters are
sent to ColorBar(). |
col_inf , col_sup , colNA
|
Colour identifiers to colour the values in 'var' that go beyond the extremes of the colour bar and to colour NA values, respectively. 'colNA' takes attr(cols, 'na_color') if available by default, where cols is the parameter 'cols' if provided or the vector of colors returned by 'color_fun'. If not available, it takes 'pink' by default. 'col_inf' and 'col_sup' will take the value of 'colNA' if not specified. See ?ColorBar for a full explanation on 'col_inf' and 'col_sup'. |
color_fun , subsampleg , bar_extra_labels , draw_bar_ticks
|
Set of parameters to control the visual aspect of the drawn colour bar (1/3). See ?ColorBar for a full explanation. |
square |
Logical value to choose either to draw a coloured square for each grid cell in 'var' (TRUE; default) or to draw contour lines and fill the spaces in between with colours (FALSE). In the latter case, 'filled.continents' will take the value FALSE if not specified. |
filled.continents |
Colour to fill in drawn projected continents. Takes the value gray(0.5) by default or, if 'square = FALSE', takes the value FALSE. If set to FALSE, continents are not filled in. |
filled.oceans |
A logical value or the color name to fill in drawn projected oceans. The default value is FALSE. If it is TRUE, the default colour is "light blue". |
country.borders |
A logical value indicating if the country borders should be plotted (TRUE) or not (FALSE). It only works when 'filled.continents' is FALSE. The default value is FALSE. |
coast_color |
Colour of the coast line of the drawn projected continents. Takes the value gray(0.5) by default. |
coast_width |
Line width of the coast line of the drawn projected continents. Takes the value 1 by default. |
lake_color |
Colour of the lake or other water body inside continents. The default value is NULL. |
shapefile |
A character string of the path to a .rds file or a list object containing shapefile data. If it is a .rds file, it should contain a list. The list should contain at least 'x' and 'y', which indicate the locations of the shape. The default value is NULL. |
shapefile_color |
Line color of the shapefile. |
shapefile_lwd |
Line width of the shapefile. The default value is 1. |
contours |
Array of same dimensions as 'var' to be added to the plot and displayed with contours. Parameter 'brks2' is required to define the magnitude breaks for each contour curve. Disregarded if 'square = FALSE'. It is allowed that the positions of the longitudinal and latitudinal coordinate dimensions are interchanged. |
brks2 |
Vector of magnitude breaks where to draw contour curves for the array provided in 'contours' or if 'square = FALSE'. |
contour_lwd |
Line width of the contour curves provided via 'contours' and 'brks2', or if 'square = FALSE'. |
contour_color |
Line color of the contour curves provided via 'contours' and 'brks2', or if 'square = FALSE'. |
contour_lty |
Line type of the contour curves. Takes 1 (solid) by default. See help on 'lty' in par() for other accepted values. |
contour_draw_label |
A logical value indicating whether to draw the contour labels or not. The default value is TRUE. |
contour_label_scale |
Scale factor for the superimposed labels when drawing contour levels. |
dots |
Array of same dimensions as 'var' or with dimensions c(n, dim(var)), where n is the number of dot/symbol layers to add to the plot. A value of TRUE at a grid cell will draw a dot/symbol on the corresponding square of the plot. By default all layers provided in 'dots' are plotted with dots, but a symbol can be specified for each of the layers via the parameter 'dot_symbol'. It is allowed that the positions of the longitudinal and latitudinal coordinate dimensions are interchanged. |
dot_symbol |
Single character/number or vector of characters/numbers that correspond to each of the symbol layers specified in parameter 'dots'. If a single value is specified, it will be applied to all the layers in 'dots'. Takes 15 (centered square) by default. See 'pch' in par() for additional accepted options. |
dot_size |
Scale factor for the dots/symbols to be plotted, specified in 'dots'. If a single value is specified, it will be applied to all layers in 'dots'. Takes 1 by default. |
arr_subsamp |
Subsampling factor to select a subset of arrows in 'varu' and 'varv' to be drawn. Only one out of arr_subsamp arrows will be drawn. Takes 1 by default. |
arr_scale |
Scale factor for drawn arrows from 'varu' and 'varv'. Takes 1 by default. |
arr_ref_len |
Length of the reference arrow to be drawn as a legend at the bottom of the figure (in the same units as 'varu' and 'varv'; only affects the legend for the wind or variable in these arrays). Defaults to 15. |
arr_units |
Units of 'varu' and 'varv', to be drawn in the legend. Takes 'm/s' by default. |
arr_scale_shaft |
Parameter for the scale of the shaft of the arrows (which also depend on the number of figures and the arr_scale parameter). Defaults to 1. |
arr_scale_shaft_angle |
Parameter for the scale of the angle of the shaft of the arrows (which also depend on the number of figure and the arr_scale parameter). Defaults to 1. |
axelab |
Whether to draw longitude and latitude axes or not. TRUE by default. |
labW |
Whether to label the longitude axis with a 'W' instead of minus for negative values. Defaults to FALSE. |
lab_dist_x |
A numeric of the distance of the longitude labels to the box borders. The default value is NULL and is automatically adjusted by the function. |
lab_dist_y |
A numeric of the distance of the latitude labels to the box borders. The default value is NULL and is automatically adjusted by the function. |
degree_sym |
A logical indicating whether to include degree symbol (30° N) or not (30N; default). |
intylat |
Interval between latitude ticks on y-axis, in degrees. Defaults to 20. |
intxlon |
Interval between longitude ticks on the x-axis, in degrees. Defaults to 20. |
xlonshft |
A numeric of the degrees to shift the longitude ticks. The default value is 0. |
ylatshft |
A numeric of the degrees to shift the latitude ticks. The default value is 0. |
xlabels |
A vector of character strings of the customized x-axis labels. The values should correspond to each tick, which is decided by the longitude and parameter 'intxlon'. The default value is NULL and the labels will be automatically generated. |
ylabels |
A vector of character strings of the customized y-axis labels. The values should correspond to each tick, which is decided by the latitude and parameter 'intylat'. The default value is NULL and the labels will be automatically generated. |
axes_tick_scale |
Scale factor for the tick lines along the longitude and latitude axes. |
axes_label_scale |
Scale factor for the labels along the longitude and latitude axes. |
drawleg |
Whether to plot a color bar (legend, key) or not. Defaults to TRUE. It is not possible to plot the colour bar if 'add = TRUE'. Use ColorBar() and the return values of PlotEquiMap() instead. |
draw_separators , triangle_ends_scale , bar_label_digits
|
Set of parameters to control the visual aspect of the drawn colour bar (2/3). See ?ColorBar for a full explanation. |
bar_label_scale , units_scale , bar_tick_scale , bar_extra_margin
|
Set of parameters to control the visual aspect of the drawn colour bar (3/3). See ?ColorBar for a full explanation. |
boxlim |
Limits of a box to be added to the plot, in degrees: c(x1, y1, x2, y2). A list with multiple box specifications can also be provided. |
boxcol |
Colour of the box lines. A vector with a colour for each of the boxes is also accepted. Defaults to 'purple2'. |
boxlwd |
Line width of the box lines. A vector with a line width for each of the boxes is also accepted. Defaults to 5. |
margin_scale |
Scale factor for the margins around the map plot, with the format c(y1, x1, y2, x2). Defaults to rep(1, 4). If drawleg = TRUE, 1 unit is subtracted from margin_scale[1]. |
title_scale |
Scale factor for the figure top title. Defaults to 1. |
numbfig |
Number of figures in the layout the plot will be put into. A higher numbfig will result in narrower margins, smaller labels and axis labels, smaller ticks and thinner lines. Defaults to 1. |
fileout |
File where to save the plot. If not specified (default) a graphics device will pop up. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
brks |
Breaks used for colouring the map (and legend if drawleg = TRUE). |
cols |
Colours used for colouring the map (and legend if drawleg = TRUE). Always of length length(brks) - 1. |
col_inf |
Colour used to draw the lower triangle end in the colour bar (NULL if not drawn at all). |
col_sup |
Colour used to draw the upper triangle end in the colour bar (NULL if not drawn at all). |
# See examples on Load() to understand the first lines in this example
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment', path = file.path(data_path,
             'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
             '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
             '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
             '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   leadtimemin = 1, leadtimemax = 4, output = 'lonlat',
                   latmin = 27, latmax = 48, lonmin = -12, lonmax = 40)
## End(Not run)
PlotEquiMap(sampleData$mod[1, 1, 1, 1, , ], sampleData$lon, sampleData$lat,
            toptitle = 'Predicted sea surface temperature for Nov 1960 from 1st Nov',
            title_scale = 0.5)
This function takes an array or list of arrays and loops over each of them
to plot all the sub-arrays they contain on an automatically generated
multi-panel layout. A different plot function (not necessarily from
s2dv) can be applied over each of the provided arrays. The input
dimensions of each of the functions have to be specified, either with the
names or the indices of the corresponding input dimensions. It is possible
to draw a common colour bar at any of the sides of the multi-panel for all
the s2dv plots that use a colour bar. Common plotting arguments
for all the arrays in 'var' can be specified via the '...' parameter, and
specific plotting arguments for each array can be fully adjusted via
'special_args'. It is possible to draw titles for each of the figures,
layout rows, layout columns and for the whole figure. A number of parameters
are provided in order to adjust the position, size and colour of the
components. Blank cells can be forced to appear and later be filled in
manually with customized plots.
This function pops up a blank new device and fills it in, so it cannot be
nested in complex layouts.
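As a minimal sketch of the mechanics (synthetic data; shapes and titles are illustrative assumptions), four random maps can be arranged in a 2 x 2 layout with a common colour bar:
maps <- array(rnorm(4 * 20 * 30), dim = c(map = 4, lat = 20, lon = 30))
lons <- seq(0, 348, by = 12)
lats <- seq(-57, 57, by = 6)
PlotLayout(PlotEquiMap, c('lat', 'lon'), maps, lons, lats,
           nrow = 2, ncol = 2, toptitle = 'Four synthetic maps',
           titles = paste('Map', 1:4), filled.continents = FALSE)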
PlotLayout(
  fun, plot_dims, var, ..., special_args = NULL, nrow = NULL, ncol = NULL,
  toptitle = NULL, row_titles = NULL, col_titles = NULL, bar_scale = 1,
  title_scale = 1, title_margin_scale = 1, title_left_shift_scale = 1,
  subtitle_scale = 1, subtitle_margin_scale = 1, subplot_titles_scale = 1,
  brks = NULL, cols = NULL, drawleg = "S", titles = NULL, subsampleg = NULL,
  bar_limits = NULL, triangle_ends = NULL, col_inf = NULL, col_sup = NULL,
  color_fun = clim.colors, draw_bar_ticks = TRUE, draw_separators = FALSE,
  triangle_ends_scale = 1, bar_extra_labels = NULL, units = NULL,
  units_scale = 1, bar_label_scale = 1, bar_tick_scale = 1,
  bar_extra_margin = rep(0, 4), bar_left_shift_scale = 1,
  bar_label_digits = 4, extra_margin = rep(0, 4), layout_by_rows = TRUE,
  fileout = NULL, width = NULL, height = NULL, size_units = "in",
  res = 100, close_device = TRUE
)
fun |
Plot function (or name of the function) to be called on the arrays provided in 'var'. If multiple arrays are provided in 'var', a vector of as many function names (character strings!) can be provided in 'fun', one for each array in 'var'. |
plot_dims |
Numeric or character string vector with identifiers of the input plot dimensions of the plot function specified in 'fun'. If character labels are provided, names(dim(var)) or attr('dimensions', var) will be checked to locate the dimensions. As many plots as prod(dim(var)[-plot_dims]) will be generated. If multiple arrays are provided in 'var', 'plot_dims' can be sent a list with a vector of plot dimensions for each. If a single vector is provided, it will be used for all the arrays in 'var'. |
var |
Multi-dimensional array with at least the dimensions expected by the specified plot function in 'fun'. The dimensions required by the function must be specified in 'plot_dims'. The dimensions can be disordered and will be reordered automatically. Dimensions can optionally be labelled in order to refer to them with names in 'plot_dims'. All the available plottable sub-arrays will be automatically plotted and arranged in consecutive cells of an automatically arranged layout. A list of multiple (super-)arrays can be specified. The process will be repeated for each of them, by default applying the same plot function to all of them or, if properly specified in 'fun', a different plot function will be applied to each of them. NAs can be passed to the list: an NA will yield a blank cell in the layout, which can be populated afterwards (see .SwitchToFigure). |
... |
Parameters to be sent to the plotting function 'fun'. If multiple arrays are provided in 'var' and multiple functions are provided in 'fun', the parameters provided through ... will be sent to all the plot functions, as common parameters. To specify concrete arguments for each of the plot functions see parameter 'special_args'. |
special_args |
List of sub-lists, each sub-list having specific extra
arguments for each of the plot functions provided in 'fun'. If you want to
fix a different value for each plot in the layout you can do so by
a) splitting your array into a list of sub-arrays (each with the data for
one plot) and providing it as parameter 'var',
b) providing a list of named sub-lists in 'special_args', where the names
of each sub-list match the names of the parameters to be adjusted, and
each value in a sub-list contains the value of the corresponding parameter.
For example, if the plots are two maps with different arguments, the
structure would be a list of two named sub-lists (see the sketch after
this parameter list). |
nrow |
Numeric value to force the number of rows in the automatically generated layout. If higher than required, this will yield blank cells in the layout (which can then be populated). If lower than required, the function will stop. By default it is configured to arrange the layout in a shape as square as possible. Blank cells can be manually populated afterwards with customized plots (see .SwitchToFigure). |
ncol |
Numeric value to force the number of columns in the automatically generated layout. If higher than required, this will yield blank cells in the layout (which can then be populated). If lower than required, the function will stop. By default it is configured to arrange the layout in a shape as square as possible. Blank cells can be manually populated afterwards with customized plots (see .SwitchToFigure). |
toptitle |
Top title for the multi-panel. Blank by default. |
row_titles |
Character string vector with titles for each of the rows in the layout. Blank by default. |
col_titles |
Character string vector with titles for each of the columns in the layout. Blank by default. |
bar_scale |
Scale factor for the common colour bar. Takes 1 by default. |
title_scale |
Scale factor for the multi-panel title. Takes 1 by default. |
title_margin_scale |
Scale factor for the margins surrounding the top title. Takes 1 by default. |
title_left_shift_scale |
When plotting row titles, a shift is added to the horizontal positioning of the top title in order to center it to the region of the figures (without taking row titles into account). This shift can be reduced. A value of 0 will remove the shift completely, centering the title to the total width of the device. This parameter will be disregarded if no 'row_titles' are provided. |
subtitle_scale |
Scale factor for the row titles and column titles (specified in 'row_titles' and 'col_titles'). Takes 1 by default. |
subtitle_margin_scale |
Scale factor for the margins surrounding the subtitles. Takes 1 by default. |
subplot_titles_scale |
Scale factor for the subplots top titles. Takes 1 by default. |
brks , cols , bar_limits , triangle_ends
|
Usually only providing 'brks' is
enough to generate the desired colour bar. These parameters allow defining
n breaks that delimit n - 1 intervals to classify each of the values in
'var'. The corresponding grid cell of a given value in 'var' will be
coloured according to the interval it belongs to. These parameters are
sent to ColorBar(). |
drawleg |
Where to draw the common colour bar. Can take values TRUE,
FALSE or: |
titles |
Character string vector with titles for each of the figures in the multi-panel, from top-left to bottom-right. Blank by default. |
col_inf , col_sup
|
Colour identifiers to colour the values in 'var' that go beyond the extremes of the colour bar and to colour NA values, respectively. 'colNA' takes 'white' by default. 'col_inf' and 'col_sup' will take the value of 'colNA' if not specified. See ?ColorBar for a full explanation on 'col_inf' and 'col_sup'. |
color_fun , subsampleg , bar_extra_labels , draw_bar_ticks , draw_separators , triangle_ends_scale , bar_label_digits , bar_label_scale , units_scale , bar_tick_scale , bar_extra_margin
|
Set of parameters to control the visual aspect of the drawn colour bar. See ?ColorBar for a full explanation. |
units |
Title at the top of the colour bar, most commonly the units of the variable provided in parameter 'var'. |
bar_left_shift_scale |
When plotting row titles, a shift is added to the horizontal positioning of the colour bar in order to center it to the region of the figures (without taking row titles into account). This shift can be reduced. A value of 0 will remove the shift completely, centering the colour bar to the total width of the device. This parameter will be disregarded if no 'row_titles' are provided. |
extra_margin |
Extra margins to be added around the layout, in the format c(y1, x1, y2, x2). The units are margin lines. Takes rep(0, 4) by default. |
layout_by_rows |
Logical indicating whether the panels should be filled by columns (FALSE) or by rows (TRUE, default). |
fileout |
File where to save the plot. If not specified (default) a graphics device will pop up. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. |
width |
Width in inches of the multi-panel. 7 by default, or 11 if 'fileout' has been specified. |
height |
Height in inches of the multi-panel. 7 by default, or 11 if 'fileout' has been specified. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
close_device |
Whether to close the graphics device after plotting the layout when a 'fileout' has been specified. This is useful to avoid closing the device when saving the layout into a file and intending to add extra elements or figures afterwards. Takes TRUE by default. Disregarded if no 'fileout' has been specified. |
brks |
Breaks used for colouring the map (and legend if drawleg = TRUE). |
cols |
Colours used for colouring the map (and legend if drawleg = TRUE). Always of length length(brks) - 1. |
col_inf |
Colour used to draw the lower triangle end in the colour bar (NULL if not drawn at all). |
col_sup |
Colour used to draw the upper triangle end in the colour bar (NULL if not drawn at all). |
layout_matrix |
Underlying matrix of the layout. Useful to later set any of the layout cells as current figure to add plot elements. See .SwitchToFigure. |
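As referenced in the 'special_args' entry above, here is a minimal sketch of its structure (synthetic data; all names and values are illustrative assumptions): two maps share 'filled.continents' through '...' while each receives its own dot layer through 'special_args'.
map1 <- array(rnorm(20 * 30), dim = c(lat = 20, lon = 30))
map2 <- array(rnorm(20 * 30), dim = c(lat = 20, lon = 30))
lons <- seq(0, 348, by = 12)
lats <- seq(-57, 57, by = 6)
PlotLayout(PlotEquiMap, c('lat', 'lon'), list(map1, map2), lons, lats,
           special_args = list(list(dots = map1 > 1), list(dots = map2 > 1)),
           titles = c('First map', 'Second map'), filled.continents = FALSE)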
# See examples on Load() to understand the first lines in this example
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment', path = file.path(data_path,
             'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
             '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation', path = file.path(data_path,
             '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
             '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   leadtimemin = 1, leadtimemax = 4, output = 'lonlat',
                   latmin = 27, latmax = 48, lonmin = -12, lonmax = 40)
## End(Not run)
PlotLayout(PlotEquiMap, c('lat', 'lon'), sampleData$mod[1, , 1, 1, , ],
           sampleData$lon, sampleData$lat,
           toptitle = 'Predicted tos for Nov 1960 from 1st Nov',
           titles = paste('Member', 1:15))
This function converts a numerical data matrix into a coloured grid. It is useful for a slide or article to present tabular results as colors instead of numbers.
PlotMatrix( var, brks = NULL, cols = NULL, toptitle = NULL, title.color = "royalblue4", xtitle = NULL, ytitle = NULL, xlabels = NULL, xvert = FALSE, ylabels = NULL, line = 3, figure.width = 1, legend = TRUE, legend.width = 0.15, xlab_dist = NULL, ylab_dist = NULL, fileout = NULL, size_units = "px", res = 100, ... )
var |
A numerical matrix containing the values to be displayed in a colored image. |
brks |
A vector of the color bar intervals. The length must be one more than the parameter 'cols'. Use ColorBar() to generate default values. |
cols |
A vector of valid color identifiers for color bar. The length must be one less than the parameter 'brks'. Use ColorBar() to generate default values. |
toptitle |
A string of the title of the grid. Set NULL as default. |
title.color |
A string of valid color identifier to decide the title color. Set "royalblue4" as default. |
xtitle |
A string of title of the x-axis. Set NULL as default. |
ytitle |
A string of title of the y-axis. Set NULL as default. |
xlabels |
A vector of labels for the x-axis. The length must equal the number of columns of parameter 'var'. The default is the sequence from 1 to the number of columns of 'var'. |
xvert |
A logical value to decide whether to place x-axis labels vertically. Set FALSE as default, which keeps the labels horizontally. |
ylabels |
A vector of labels for the y-axis. The length must equal the number of rows of parameter 'var'. The default is the sequence from 1 to the number of rows of 'var'. |
line |
An integer specifying the distance between the title of the x-axis and the x-axis. Set 3 as default. Adjust if the x-axis labels are long. |
figure.width |
A positive number as a ratio adjusting the width of the grids. Set 1 as default. |
legend |
A logical value to decide to draw the grid color legend or not. Set TRUE as default. |
legend.width |
A number between 0 and 0.5 to adjust the legend width. Set 0.15 as default. |
xlab_dist |
A number specifying the distance between the x labels and the x axis. If not specified, it equals to -1 - (nrow(var) / 10 - 1). |
ylab_dist |
A number specifying the distance between the y labels and the y axis. If not specified, it equals to 0.5 - ncol(var) / 10. |
fileout |
A string of full directory path and file name indicating where to save the plot. If not specified (default), a graphics device will pop up. |
size_units |
A string indicating the units of the size of the device (file or window) to plot in. Set 'px' as default. See ?Devices and the creator function of the corresponding device. |
res |
A positive number indicating resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
The additional parameters to be passed to function ColorBar() in s2dv for color legend creation. |
A figure in a popup window by default, or saved to the specified path.
#Example with random data
PlotMatrix(var = matrix(rnorm(n = 120, mean = 0.3), 10, 12),
           cols = c('white', '#fef0d9', '#fdd49e', '#fdbb84', '#fc8d59',
                    '#e34a33', '#b30000', '#7f0000'),
           brks = c(-1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1),
           toptitle = "Mean Absolute Error",
           xtitle = "Forecast time (month)", ytitle = "Start date",
           xlabels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
Plot a (longitude,depth) or (latitude,depth) section.
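A minimal sketch with synthetic data (the field, coordinates and settings are illustrative assumptions):
# Temperature-like values decreasing with depth over a (latitude, depth) grid.
section <- outer(seq(30, 0, length.out = 40), seq(1, 0.2, length.out = 20))
lats <- seq(-80, 80, length.out = 40)
depths <- seq(0, 4000, length.out = 20)
PlotSection(section, lats, depths, toptitle = 'Synthetic section',
            units = 'degC', intxhoriz = 20, intydep = 500)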
PlotSection( var, horiz, depth, toptitle = "", sizetit = 1, units = "", brks = NULL, cols = NULL, axelab = TRUE, intydep = 200, intxhoriz = 20, drawleg = TRUE, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
var |
Matrix to plot with (longitude/latitude, depth) dimensions. |
horiz |
Array of longitudes or latitudes. |
depth |
Array of depths. |
toptitle |
Title, optional. |
sizetit |
Multiplicative factor to increase title size, optional. |
units |
Units, optional. |
brks |
Colour levels, optional. |
cols |
List of colours, optional. |
axelab |
TRUE/FALSE, label the axis. Default = TRUE. |
intydep |
Interval between depth ticks on y-axis. Default: 200m. |
intxhoriz |
Interval between longitude/latitude ticks on x-axis. |
drawleg |
Draw colorbar. Default: TRUE. |
fileout |
Name of output file. Extensions allowed: eps/ps, jpeg, png,
pdf, bmp and tiff. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
sampleData <- s2dv::sampleDepthData
PlotSection(sampleData$mod[1, 1, 1, 1, , ], sampleData$lat, sampleData$depth,
            toptitle = 'temperature 1995-11 member 0')
Map a longitude-latitude array (on a regular rectangular or gaussian grid) onto a polar stereographic world projection with coloured grid cells. Only the region within a specified latitude interval is displayed. A colour bar (legend) can be plotted and adjusted. It is possible to draw superimposed dots, symbols, boxes, contours, and arrows. A number of options are provided to adjust the position, size and colour of the components. This plot function is compatible with figure layouts if the colour bar is disabled.
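For instance, the following minimal sketch (synthetic data; all values are illustrative assumptions) superimposes arrows from random zonal and meridional components:
data <- matrix(rnorm(100 * 50), 100, 50)
u <- matrix(rnorm(100 * 50), 100, 50)
v <- matrix(rnorm(100 * 50), 100, 50)
x <- seq(from = 0, to = 360, length.out = 100)
y <- seq(from = -90, to = 90, length.out = 50)
PlotStereoMap(data, x, y, varu = u, varv = v, latlims = c(60, 90),
              brks = 50, arr_subsamp = 5,
              toptitle = "Synthetic field with arrows")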
PlotStereoMap(
  var, lon, lat, varu = NULL, varv = NULL, latlims = c(60, 90),
  toptitle = NULL, sizetit = NULL, units = NULL, brks = NULL, cols = NULL,
  bar_limits = NULL, triangle_ends = NULL, col_inf = NULL, col_sup = NULL,
  colNA = NULL, color_fun = clim.palette(), filled.continents = FALSE,
  coast_color = NULL, coast_width = 1, contours = NULL, brks2 = NULL,
  contour_lwd = 0.5, contour_color = "black", contour_lty = 1,
  contour_label_draw = TRUE, contour_label_scale = 0.6, dots = NULL,
  dot_symbol = 4, dot_size = 0.8, intlat = 10,
  arr_subsamp = floor(length(lon)/30), arr_scale = 1, arr_ref_len = 15,
  arr_units = "m/s", arr_scale_shaft = 1, arr_scale_shaft_angle = 1,
  drawleg = TRUE, subsampleg = NULL, bar_extra_labels = NULL,
  draw_bar_ticks = TRUE, draw_separators = FALSE, triangle_ends_scale = 1,
  bar_label_digits = 4, bar_label_scale = 1, units_scale = 1,
  bar_tick_scale = 1, bar_extra_margin = rep(0, 4), boxlim = NULL,
  boxcol = "purple2", boxlwd = 5, margin_scale = rep(1, 4), title_scale = 1,
  numbfig = NULL, fileout = NULL, width = 6, height = 5, size_units = "in",
  res = 100, ...
)
var |
Array with the values at each cell of a grid on a regular rectangular or gaussian grid. The array is expected to have two dimensions: c(latitude, longitude). Longitudes can be in ascending or descending order and latitudes in any order. It can contain NA values (coloured with 'colNA'). Arrays with dimensions c(longitude, latitude) will also be accepted but 'lon' and 'lat' will be used to disambiguate so this alternative is not appropriate for square arrays. |
lon |
Numeric vector of longitude locations of the cell centers of the
grid of 'var', in ascending or descending order (same as 'var'). Expected
to be regularly spaced, within either of the ranges [-180, 180] or
[0, 360]. Data for two adjacent regions split by the limits of the
longitude range can also be provided, e.g. |
lat |
Numeric vector of latitude locations of the cell centers of the grid of 'var', in any order (same as 'var'). Expected to be from a regular rectangular or gaussian grid, within the range [-90, 90]. |
varu |
Array of the zonal component of wind/current/other field with the same dimensions as 'var'. |
varv |
Array of the meridional component of wind/current/other field with the same dimensions as 'var'. |
latlims |
Latitudinal limits of the figure. |
toptitle |
Top title of the figure, scalable with parameter 'title_scale'. |
sizetit |
Scale factor for the figure top title provided in parameter 'toptitle'. Deprecated. Use 'title_scale' instead. |
units |
Title at the top of the colour bar, most commonly the units of the variable provided in parameter 'var'. |
brks , cols , bar_limits , triangle_ends
|
Usually only providing 'brks' is
enough to generate the desired colour bar. These parameters allow defining
n breaks that delimit n - 1 intervals to classify each of the values in
'var'. The corresponding grid cell of a given value in 'var' will be
coloured according to the interval it belongs to. These parameters are
sent to ColorBar(). |
col_inf , col_sup , colNA
|
Colour identifiers to colour the values in 'var' that go beyond the extremes of the colour bar and to colour NA values, respectively. 'colNA' takes attr(cols, 'na_color') if available by default, where cols is the parameter 'cols' if provided or the vector of colors returned by 'color_fun'. If not available, it takes 'pink' by default. 'col_inf' and 'col_sup' will take the value of 'colNA' if not specified. See ?ColorBar for a full explanation on 'col_inf' and 'col_sup'. |
color_fun , subsampleg , bar_extra_labels , draw_bar_ticks , draw_separators , triangle_ends_scale , bar_label_digits , bar_label_scale , units_scale , bar_tick_scale , bar_extra_margin
|
Set of parameters to control the visual aspect of the drawn colour bar. See ?ColorBar for a full explanation. |
filled.continents |
Colour to fill in drawn projected continents. Takes the value gray(0.5) by default. If set to FALSE, continents are not filled in. |
coast_color |
Colour of the coast line of the drawn projected continents. Takes the value gray(0.5) by default. |
coast_width |
Line width of the coast line of the drawn projected continents. Takes the value 1 by default. |
contours |
Array of same dimensions as 'var' to be added to the plot and displayed with contours. Parameter 'brks2' is required to define the magnitude breaks for each contour curve. |
brks2 |
A numeric value or vector of magnitude breaks where to draw contour curves for the array provided in 'contours'. If it is a number, it represents the number of breaks (n) that defines (n - 1) intervals to classify 'contours'. |
contour_lwd |
Line width of the contour curves provided via 'contours' and 'brks2'. The default value is 0.5. |
contour_color |
Line color of the contour curves provided via 'contours' and 'brks2'. |
contour_lty |
Line type of the contour curves. Takes 1 (solid) by default. See help on 'lty' in par() for other accepted values. |
contour_label_draw |
A logical value indicating whether to draw the contour labels (TRUE) or not (FALSE) when 'contours' is used. The default value is TRUE. |
contour_label_scale |
Scale factor for the superimposed labels when drawing contour levels. The default value is 0.6. |
dots |
Array of same dimensions as 'var' or with dimensions c(n, dim(var)), where n is the number of dot/symbol layers to add to the plot. A value of TRUE at a grid cell will draw a dot/symbol on the corresponding square of the plot. By default all layers provided in 'dots' are plotted with dots, but a symbol can be specified for each of the layers via the parameter 'dot_symbol'. |
dot_symbol |
Single character/number or vector of characters/numbers that correspond to each of the symbol layers specified in parameter 'dots'. If a single value is specified, it will be applied to all the layers in 'dots'. Takes 15 (centered square) by default. See 'pch' in par() for additional accepted options. |
dot_size |
Scale factor for the dots/symbols to be plotted, specified in 'dots'. If a single value is specified, it will be applied to all layers in 'dots'. Takes 1 by default. |
intlat |
Interval between latitude lines (circles), in degrees. Defaults to 10. |
arr_subsamp |
A number as subsampling factor to select a subset of arrows in 'varu' and 'varv' to be drawn. Only one out of arr_subsamp arrows will be drawn. The default value is 1. |
arr_scale |
A number as scale factor for drawn arrows from 'varu' and 'varv'. The default value is 1. |
arr_ref_len |
A number giving the length of the reference arrow to be drawn as a legend at the bottom of the figure (in the same units as 'varu' and 'varv'; only affects the legend for the wind or variable in these arrays). The default value is 15. |
arr_units |
Units of 'varu' and 'varv', to be drawn in the legend. Takes 'm/s' by default. |
arr_scale_shaft |
A number for the scale of the shaft of the arrows (which also depend on the number of figures and the arr_scale parameter). The default value is 1. |
arr_scale_shaft_angle |
A number for the scale of the angle of the shaft of the arrows (which also depend on the number of figure and the arr_scale parameter). The default value is 1. |
drawleg |
Whether to plot a color bar (legend, key) or not. Defaults to TRUE. |
boxlim |
Limits of a box to be added to the plot, in degrees: c(x1, y1, x2, y2). A list with multiple box specifications can also be provided. |
boxcol |
Colour of the box lines. A vector with a colour for each of the boxes is also accepted. Defaults to 'purple2'. |
boxlwd |
Line width of the box lines. A vector with a line width for each of the boxes is also accepted. Defaults to 5. |
margin_scale |
Scale factor for the margins to be added to the plot, with the format c(y1, x1, y2, x2). Defaults to rep(1, 4). If drawleg = TRUE, 1 unit is subtracted from margin_scale[1]. |
title_scale |
Scale factor for the figure top title. Defaults to 1. |
numbfig |
Number of figures in the layout the plot will be put into. A higher numbfig will result in narrower margins, smaller labels and axis labels, smaller ticks and thinner lines. Defaults to 1. |
fileout |
File where to save the plot. If not specified (default) a graphics device will pop up. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 6 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the
following graphical parameters: |
brks |
Breaks used for colouring the map (and legend if drawleg = TRUE). |
cols |
Colours used for colouring the map (and legend if drawleg = TRUE). Always of length length(brks) - 1. |
col_inf |
Colour used to draw the lower triangle end in the colour bar (NULL if not drawn at all). |
col_sup |
Colour used to draw the upper triangle end in the colour bar (NULL if not drawn at all). |
data <- matrix(rnorm(100 * 50), 100, 50)
x <- seq(from = 0, to = 360, length.out = 100)
y <- seq(from = -90, to = 90, length.out = 50)
## Not run:
PlotStereoMap(data, x, y, latlims = c(60, 90), brks = 50,
              toptitle = "This is the title")
## End(Not run)
Plot the correlation (Corr()
), the root mean square error
(RMS()
) between the forecast values and their observational
counterpart, the slope of their trend (Trend()
), the
interquartile range, maximum-minimum, standard deviation or median absolute
deviation of the ensemble members (Spread()
), or the ratio between
the ensemble spread and the RMSE of the ensemble mean (RatioSDRMS()
)
along the forecast time for all the input experiments on the same figure
with their confidence intervals.
PlotVsLTime( var, toptitle = "", ytitle = "", monini = 1, freq = 12, nticks = NULL, limits = NULL, listexp = c("exp1", "exp2", "exp3"), listobs = c("obs1", "obs2", "obs3"), biglab = FALSE, hlines = NULL, leg = TRUE, siglev = FALSE, sizetit = 1, show_conf = TRUE, fileout = NULL, width = 8, height = 5, size_units = "in", res = 100, ... )
var |
Matrix containing any Prediction Score with dimensions: |
toptitle |
Main title, optional. |
ytitle |
Title of Y-axis, optional. |
monini |
Starting month between 1 and 12. Default = 1. |
freq |
1 = yearly, 12 = monthly, 4 = seasonal, ... Default = 12. |
nticks |
Number of ticks and labels on the x-axis, optional. |
limits |
c(lower limit, upper limit): limits of the Y-axis, optional. |
listexp |
List of experiment names, optional. |
listobs |
List of observation names, optional. |
biglab |
TRUE/FALSE for presentation/paper plot. Default = FALSE. |
hlines |
c(a, b, ...): adds horizontal black lines at Y-positions a, b, ... |
leg |
TRUE/FALSE if legend should be added or not to the plot. Default = TRUE. |
siglev |
TRUE/FALSE if significance level should replace confidence
interval. |
sizetit |
Multiplicative factor to change title size, optional. |
show_conf |
TRUE/FALSE to show/not confidence intervals for input variables. |
fileout |
Name of output file. Extensions allowed: eps/ps, jpeg, png, pdf, bmp and tiff. The default value is NULL. |
width |
File width, in the units specified in the parameter size_units (inches by default). Takes 8 by default. |
height |
File height, in the units specified in the parameter size_units (inches by default). Takes 5 by default. |
size_units |
Units of the size of the device (file or window) to plot in. Inches ('in') by default. See ?Devices and the creator function of the corresponding device. |
res |
Resolution of the device (file or window) to plot in. See ?Devices and the creator function of the corresponding device. |
... |
Arguments to be passed to the method. Only accepts the following
graphical parameters: |
Examples of input:
Model and observed output from Load() then Clim() then Ano() then Smoothing():
(nmod, nmemb, nsdate, nltime) and (nobs, nmemb, nsdate, nltime)
then averaged over the members with Mean1Dim(var_exp/var_obs, posdim = 2):
(nmod, nsdate, nltime) and (nobs, nsdate, nltime)
then passed through Corr(exp, obs, posloop = 1, poscor = 2) or
RMS(exp, obs, posloop = 1, posRMS = 2):
(nmod, nobs, 3, nltime)
would plot the correlations or RMS between each exp & each obs as a function
of the forecast time.
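A minimal sketch of an input array built directly in this shape (synthetic values; all numbers are illustrative assumptions): one experiment and one observation over 12 lead times, with the third dimension holding the lower confidence bound, the score and the upper confidence bound.
score <- 0.8 * exp(-(1:12) / 8)            # skill decaying with lead time
synthetic <- array(NA, dim = c(1, 1, 3, 12))
synthetic[1, 1, 1, ] <- score - 0.15       # lower confidence bound
synthetic[1, 1, 2, ] <- score              # score
synthetic[1, 1, 3, ] <- score + 0.15       # upper confidence bound
PlotVsLTime(synthetic, toptitle = 'Synthetic skill score', ytitle = 'score',
            monini = 1, freq = 12, listexp = c('exp1'), listobs = c('obs1'),
            hlines = c(0))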
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
runmean_months <- 12
smooth_ano_exp <- Smoothing(data = ano_exp, runmeanlen = runmean_months)
smooth_ano_obs <- Smoothing(data = ano_obs, runmeanlen = runmean_months)
dim_to_mean <- 'member'  # mean along members
required_complete_row <- 'ftime'  # discard startdates for which there are NA leadtimes
leadtimes_per_startdate <- 60
corr <- Corr(MeanDims(smooth_ano_exp, dim_to_mean),
             MeanDims(smooth_ano_obs, dim_to_mean),
             comp_dim = required_complete_row, dat_dim = 'dataset',
             limits = c(ceiling((runmean_months + 1) / 2),
                        leadtimes_per_startdate - floor(runmean_months / 2)))
# Combine corr results for plotting
corr_combine <- abind::abind(corr$conf.lower, corr$corr, corr$conf.upper,
                             corr$p.val, along = 0)
corr_combine <- Reorder(corr_combine, c(2, 3, 1, 4))
PlotVsLTime(corr_combine, toptitle = "correlations", ytitle = "correlation",
            monini = 11, limits = c(-1, 2), listexp = c('CMIP5 IC3'),
            listobs = c('ERSST'), biglab = FALSE, hlines = c(-1, 0, 1))
Compute probabilistic bins of a set of forecast years ('fcyr') relative to the forecast climatology over the whole period of anomalies, optionally excluding the selected forecast years ('fcyr') or the forecast year for which the probabilistic bins are being computed (see 'compPeriod').
ProbBins( data, thr, fcyr = "all", time_dim = "sdate", memb_dim = "member", quantile = TRUE, compPeriod = "Full period", ncores = NULL )
data |
A numeric array of anomalies with the dimensions 'time_dim' and
'memb_dim' at least. It can be generated by |
thr |
A numeric vector used as the quantiles (if 'quantile' is TRUE) or thresholds (if 'quantile' is FALSE) to bin the anomalies. If it is quantile, it must be within [0, 1]. |
fcyr |
A numeric vector of the indices of the forecast years (i.e., time_dim) to compute the probabilistic bins for, or 'all' to compute the bins for all the years. E.g., c(1:5), c(1, 4), 4, or 'all'. The default value is 'all'. |
time_dim |
A character string indicating the dimension along which to compute the probabilistic bins. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension or the dimension to be merged with 'time_dim' for probabilistic calculation. The default value is 'member'. |
quantile |
A logical value indicating if the thresholds ('thr') are quantiles (TRUE) or the absolute thresholds of the bins (FALSE). The default value is TRUE. |
compPeriod |
A character string referring to three computation options: 'Full period' (the default; the probabilities are computed over the whole period), 'Without fcyr' (the probabilities are computed over the period excluding the selected forecast years), or 'Cross-validation' (the probabilities for each forecast year are computed excluding that year). |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numeric array of probabilistic information with dimensions:
c(bin = length of 'thr' + 1, time_dim = length of 'fcyr', memb_dim, the
rest of dimensions of 'data')
The values along the 'bin' dimension take values 0 or 1 depending on which
of the length('thr') + 1 categories the forecast or observation at the
corresponding grid point, time step, member, and start date belongs to.
clim <- Clim(sampleMap$mod, sampleMap$obs)
ano_exp <- Ano(sampleMap$mod, clim$clim_exp)
PB <- ProbBins(ano_exp, fcyr = 3, thr = c(1/3, 2/3), quantile = TRUE)
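As a quick check of the output structure described above, the result gains a 'bin' dimension of length(thr) + 1 entries (three for the tercile thresholds used here), each value being 0 or 1. A minimal sketch continuing the example:

# Continuing the example above: inspect the dimensions of the output.
# The 'bin' dimension has length(thr) + 1 entries (here 3, for terciles).
dim(PB)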
Project anomalies onto modes of variability to get the temporal evolution of
the selected EOF mode. It returns the principal components (PCs) obtained by
area-weighted projection onto the EOF pattern (from EOF()) or the REOF
pattern (from REOF() or EuroAtlanticTC()). The calculation removes NAs and
returns NA if the whole spatial pattern is NA.
ProjectField( ano, eof, time_dim = "sdate", space_dim = c("lat", "lon"), mode = NULL, ncores = NULL )
ano |
A numerical array of anomalies with named dimensions. The dimensions must have at least 'time_dim' and 'space_dim'. It can be generated by Ano(). |
eof |
A list that contains at least 'EOFs' or 'REOFs' and 'wght', which are both arrays. 'EOFs' or 'REOFs' must have dimensions 'mode' and 'space_dim' at least. 'wght' has dimensions space_dim. It can be generated by EOF() or REOF(). |
time_dim |
A character string indicating the name of the time dimension of 'ano'. The default value is 'sdate'. |
space_dim |
A vector of two character strings. The first is the dimension name of latitude of 'ano' and the second is the dimension name of longitude of 'ano'. The default value is c('lat', 'lon'). |
mode |
An integer of the variability mode number in the EOF to be projected on. The default value is NULL, which means all the modes of 'eof' are calculated. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of the principal components in the verification format. The dimensions are the same as 'ano' except 'space_dim'.
EOF, NAO, PlotBoxWhisker
ano <- Ano_CrossValid(sampleData$mod, sampleData$obs)
eof_exp <- EOF(ano$exp, sampleData$lat, sampleData$lon)
eof_obs <- EOF(ano$obs, sampleData$lat, sampleData$lon)
mode1_exp <- ProjectField(ano$exp, eof_exp, mode = 1)
mode1_obs <- ProjectField(ano$obs, eof_obs, mode = 1)
## Not run:
# Plot the forecast and the observation of the first mode for the last year
# of forecast
sdate_dim_length <- dim(mode1_obs)['sdate']
plot(mode1_obs[sdate_dim_length, 1, 1, ], type = "l", ylim = c(-1, 1), lwd = 2)
for (i in 1:dim(mode1_exp)['member']) {
  par(new = TRUE)
  plot(mode1_exp[sdate_dim_length, 1, i, ], type = "l",
       col = rainbow(10)[i], ylim = c(-15000, 15000))
}
## End(Not run)
Forecast comparison of the skill obtained with two forecasts (with respect to a common observational reference) based on Random Walks (DelSole and Tippett, 2016).
RandomWalkTest( skill_A, skill_B, time_dim = "sdate", test.type = "two.sided.approx", alpha = 0.05, pval = TRUE, sign = FALSE, N.eff = FALSE, ncores = NULL )
skill_A |
A numerical array of the time series of the scores obtained with forecaster A. |
skill_B |
A numerical array of the time series of the scores obtained with forecaster B. The dimensions should be identical to those of parameter 'skill_A'. |
time_dim |
A character string indicating the name of the dimension along which the tests are computed. The default value is 'sdate'. |
test.type |
A character string indicating the type of significance test. It can be "two.sided.approx" (to assess whether forecaster A and forecaster B are significantly different in terms of skill with a two-sided test using the approximation of DelSole and Tippett, 2016), "two.sided" (to assess whether forecaster A and forecaster B are significantly different in terms of skill with an exact two-sided test), "greater" (to assess whether forecaster A shows significantly better skill than forecaster B with a one-sided test for negatively oriented scores), or "less" (to assess whether forecaster A shows significantly better skill than forecaster B with a one-sided test for positively oriented scores). The default value is "two.sided.approx". |
alpha |
A numeric of the significance level to be used in the statistical significance test (output "sign"). The default value is 0.05. |
pval |
A logical value indicating whether to return the p-value of the significance test. The default value is TRUE. |
sign |
A logical value indicating whether to return the statistical significance of the test based on 'alpha'. The default value is FALSE. |
N.eff |
Effective sample size to be used in the statistical significance test. It can be FALSE (and the length of the time series will be used), a numeric (which is used for all cases), or an array with the same dimensions as "skill_A" except "time_dim" (for a particular N.eff to be used for each case). The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
Null and alternative hypothesis for "two-sided" test (regardless of the
orientation of the scores):
H0: forecaster A and forecaster B are not different in terms of skill
H1: forecaster A and forecaster B are different in terms of skill
Null and alternative hypothesis for one-sided "greater" (for negatively
oriented scores, i.e., the lower the better) and "less" (for positively
oriented scores, i.e., the higher the better) tests:
H0: forecaster A is not better than forecaster B
H1: forecaster A is better than forecaster B
Examples of negatively oriented scores are the RPS, the RMSE, and the error, while the ROC score is a positively oriented score.
DelSole and Tippett (2016) approximation for the two-sided test at the 95% confidence level: the skill difference is significant if the difference between the number of times that forecaster A has been better than forecaster B and the number of times that forecaster B has been better than forecaster A is above 2 * sqrt(N) or below -2 * sqrt(N).
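To make the counting logic concrete, the approximate criterion above can be reproduced by hand for a single pair of score series. This is a minimal sketch of the described test, not the internal code of RandomWalkTest():

# Sketch of the "two.sided.approx" criterion for one pair of series of
# negatively oriented scores (lower is better).
set.seed(1)
scores_A <- abs(rnorm(30))
scores_B <- abs(rnorm(30) + 0.5)
N <- length(scores_A)
steps <- sign(scores_B - scores_A)  # +1 when A beats B, -1 when B beats A
score <- sum(steps)                 # corresponds to the $score output
abs(score) > 2 * sqrt(N)            # significant at the ~95% level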
A list with:
$score |
A numerical array with the same dimensions as the input arrays except 'time_dim'. The number of times that forecaster A has been better than forecaster B minus the number of times that forecaster B has been better than forecaster A (for negatively oriented scores, i.e., the lower the better). If $score is positive, forecaster A has been better more times than forecaster B. If $score is negative, forecaster B has been better more times than forecaster A. |
$sign |
A logical array of the statistical significance with the same dimensions as the input arrays except "time_dim". Returned only if "sign" is TRUE. |
$p.val |
A numeric array of the p-values with the same dimensions as the input arrays except "time_dim". Returned only if "pval" is TRUE. |
DelSole and Tippett (2016): https://doi.org/10.1175/MWR-D-15-0218.1
fcst_A <- array(data = 11:50, dim = c(sdate = 10, lat = 2, lon = 2))
fcst_B <- array(data = 21:60, dim = c(sdate = 10, lat = 2, lon = 2))
reference <- array(data = 1:40, dim = c(sdate = 10, lat = 2, lon = 2))
scores_A <- abs(fcst_A - reference)
scores_B <- abs(fcst_B - reference)
res1 <- RandomWalkTest(skill_A = scores_A, skill_B = scores_B,
                       pval = FALSE, sign = TRUE)
res2 <- RandomWalkTest(skill_A = scores_A, skill_B = scores_B,
                       test.type = 'greater')
This function computes the ratio of predictable components (RPC; Eade et al., 2014).
RatioPredictableComponents( exp, obs, time_dim = "year", memb_dim = "member", na.rm = FALSE, ncores = NULL )
exp |
A numerical array with, at least, 'time_dim' and 'memb_dim' dimensions. |
obs |
A numerical array with the same dimensions as 'exp' except the 'memb_dim' dimension. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'year'. |
memb_dim |
A character string indicating the name of the member dimension. The default value is 'member'. |
na.rm |
A logical value indicating whether to remove NA values during the computation. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
An array of the ratio of the predictable components. It has the same dimensions as 'exp' except the 'time_dim' and 'memb_dim' dimensions.
exp <- array(data = runif(600), dim = c(year = 15, member = 10, lat = 2, lon = 2))
obs <- array(data = runif(60), dim = c(year = 15, lat = 2, lon = 2))
RatioPredictableComponents(exp, obs)
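For reference, Eade et al. (2014) define the RPC from the correlation between the ensemble mean and the observations and the ratio of predictable to total variance. The single-point sketch below illustrates that definition; the exact estimators used inside RatioPredictableComponents() may differ:

# Single-point illustration of
# RPC = r(ensemble mean, obs) / (sd(ensemble mean) / sd(all members)).
set.seed(1)
exp1 <- array(rnorm(150), dim = c(year = 15, member = 10))
obs1 <- rnorm(15)
ens_mean <- apply(exp1, 1, mean)  # ensemble mean per year
r_mo <- cor(ens_mean, obs1)       # correlation with the observations
r_mo / (sd(ens_mean) / sd(exp1))  # the ratio of predictable components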
Calculate the ratio of the RMSE for two forecasts with the same observation, that is, RMSE(ens, obs) / RMSE(ens.ref, obs). The p-value is provided by a two-sided Fisher test.
RatioRMS(exp1, exp2, obs, time_dim = "sdate", pval = TRUE, ncores = NULL)
exp1 |
A numeric array with named dimensions of the first experimental data. It must have at least 'time_dim' and have the same dimensions as 'exp2' and 'obs'. |
exp2 |
A numeric array with named dimensions of the second experimental data. It must have at least 'time_dim' and have the same dimensions as 'exp1' and 'obs'. |
obs |
A numeric array with named dimensions of the observational data. It must have at least 'time_dim' and have the same dimensions as 'exp1' and 'exp2'. |
time_dim |
A character string of the dimension name along which RMS is computed. The default value is 'sdate'. |
pval |
A logical value indicating whether to compute the p-value of Ho: RMSE1/RMSE2 = 1 or not. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimensions identical to those of 'exp1', 'exp2', and 'obs', except 'time_dim':
$ratiorms |
The ratio between the RMSE (i.e., RMSE1/RMSE2). |
$p.val |
The p-value of the two-sided Fisher test with Ho: RMSE1/RMSE2 = 1. Only exists if 'pval' is TRUE. |
# Compute DJF seasonal means and anomalies.
initial_month <- 11
mean_start_month <- 12
mean_stop_month <- 2
sampleData$mod <- Season(sampleData$mod, monini = initial_month,
                         moninf = mean_start_month, monsup = mean_stop_month)
sampleData$obs <- Season(sampleData$obs, monini = initial_month,
                         moninf = mean_start_month, monsup = mean_stop_month)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
# Generate two experiments with 2 and 1 members from the only experiment
# available in the sample data. Take only data values for a single forecast
# time step.
ano_exp_1 <- ClimProjDiags::Subset(ano_exp, 'member', c(1, 2))
ano_exp_2 <- ClimProjDiags::Subset(ano_exp, 'member', c(3))
ano_exp_1 <- ClimProjDiags::Subset(ano_exp_1, c('dataset', 'ftime'),
                                   list(1, 1), drop = 'selected')
ano_exp_2 <- ClimProjDiags::Subset(ano_exp_2, c('dataset', 'ftime'),
                                   list(1, 1), drop = 'selected')
ano_obs <- ClimProjDiags::Subset(ano_obs, c('dataset', 'ftime'),
                                 list(1, 1), drop = 'selected')
# Compute ensemble mean and provide as inputs to RatioRMS.
rrms <- RatioRMS(MeanDims(ano_exp_1, 'member'),
                 MeanDims(ano_exp_2, 'member'),
                 MeanDims(ano_obs, 'member'))
# Plot the RatioRMS for the first forecast time step.
PlotEquiMap(rrms$ratiorms, sampleData$lon, sampleData$lat,
            toptitle = 'Ratio RMSE')
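The quantity in $ratiorms can be verified by hand for a single series; the minimal sketch below computes the ratio itself (the Fisher test on top of it is omitted):

# Hand computation of RMSE(exp1, obs) / RMSE(exp2, obs) for one time series.
set.seed(1)
exp1 <- rnorm(20); exp2 <- rnorm(20); obs <- rnorm(20)
rmse1 <- sqrt(mean((exp1 - obs)^2))
rmse2 <- sqrt(mean((exp2 - obs)^2))
rmse1 / rmse2  # should correspond to the $ratiorms value for this series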
Compute the ratio between the standard deviation of the members around the ensemble mean in experimental data and the RMSE between the ensemble mean of experimental and observational data. The p-value is provided by a one-sided Fisher's test.
RatioSDRMS( exp, obs, dat_dim = NULL, memb_dim = "member", time_dim = "sdate", pval = TRUE, ncores = NULL )
exp |
A named numeric array of experimental data with at least two dimensions 'memb_dim' and 'time_dim'. |
obs |
A named numeric array of observational data with at least two dimensions 'memb_dim' and 'time_dim'. It should have the same dimensions as parameter 'exp' except along 'dat_dim' and 'memb_dim'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL (no dataset). |
memb_dim |
A character string indicating the name of the member dimension. It must be one dimension in 'exp' and 'obs'. The default value is 'member'. |
time_dim |
A character string indicating the name of dimension along which the ratio is computed. The default value is 'sdate'. |
pval |
A logical value indicating whether to compute the p-value of the test Ho: SD/RMSE = 1. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list of two arrays with dimensions c(nexp, nobs, the rest of the
dimensions of 'exp' and 'obs' except memb_dim and time_dim), where nexp is
the length of dat_dim of 'exp' and nobs is the length of dat_dim of 'obs'.
If dat_dim is NULL, nexp and nobs are omitted.
$ratio |
The ratio of the ensemble spread and RMSE. |
$p.val |
The p-value of the one-sided Fisher's test with Ho: SD/RMSE = 1. Only present if pval = TRUE. |
# Load sample data as in Load() example:
example(Load)
rsdrms <- RatioSDRMS(sampleData$mod, sampleData$obs, dat_dim = 'dataset')
# Reorder the data in order to plot it with PlotVsLTime
rsdrms_plot <- array(dim = c(dim(rsdrms$ratio)[1:2], 4, dim(rsdrms$ratio)[3]))
rsdrms_plot[, , 2, ] <- rsdrms$ratio
rsdrms_plot[, , 4, ] <- rsdrms$p.val
## Not run:
PlotVsLTime(rsdrms_plot, toptitle = "Ratio ensemble spread / RMSE",
            ytitle = "", monini = 11, limits = c(-1, 1.3),
            listexp = c('CMIP5 IC3'), listobs = c('ERSST'),
            biglab = FALSE, siglev = TRUE)
## End(Not run)
Compute the regression of the array 'datay' on the array 'datax' along the
'reg_dim' dimension by least-squares fitting (default) or a self-defined
model. The function provides the slope and the intercept of the regression,
as well as the associated p-value and confidence interval. The filtered
datay from the regression onto datax is also provided.
The p-value relies on the F distribution, and the confidence interval relies
on Student's t-distribution.
Regression( datay, datax, reg_dim = "sdate", formula = y ~ x, pval = TRUE, conf = TRUE, sign = FALSE, alpha = 0.05, na.action = na.omit, ncores = NULL )
datay |
A numeric array as predictand including the dimension along which the regression is computed. |
datax |
A numeric array as predictor. The dimensions should be identical to those of parameter 'datay'. |
reg_dim |
A character string indicating the dimension along which to compute the regression. The default value is 'sdate'. |
formula |
An object of class "formula" (see function lm()) indicating the regression model to fit. The default value is y ~ x. |
pval |
A logical value indicating whether to retrieve the p-value or not. The default value is TRUE. |
conf |
A logical value indicating whether to retrieve the confidence intervals or not. The default value is TRUE. |
sign |
A logical value indicating whether to compute the statistical significance of the test. The default value is FALSE. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
na.action |
A function or an integer. A function (e.g., na.omit, na.exclude, na.fail, na.pass) indicates what should happen when the data contain NAs. An integer indicates the maximum number of NA positions allowed (a position counts as long as either datay or datax is NA) for the regression to be computed. The default value is na.omit. |
ncores |
An integer indicating the number of cores to use for parallel computation. Default value is NULL. |
A list containing:
$regression |
A numeric array with the same dimensions as parameters 'datay' and 'datax'
except the 'reg_dim' dimension, which is replaced by a 'stats' dimension
containing the regression coefficients from the lowest order (i.e.,
intercept) to the highest degree. The length of the 'stats' dimension equals
the number of coefficients of the fitted model (e.g., 2 for the default y ~ x). |
$conf.lower |
A numeric array with the same dimensions as parameters 'datay' and 'datax'
except the 'reg_dim' dimension, which is replaced by a 'stats' dimension
containing the lower bound of the (1 - alpha) confidence interval of the
regression coefficients, with the same format as $regression. Only present
if conf = TRUE. |
$conf.upper |
A numeric array with the same dimensions as parameters 'datay' and 'datax'
except the 'reg_dim' dimension, which is replaced by a 'stats' dimension
containing the upper bound of the (1 - alpha) confidence interval of the
regression coefficients, with the same format as $regression. Only present
if conf = TRUE. |
$p.val |
A numeric array with the same dimensions as parameters 'datay' and 'datax' except the 'reg_dim' dimension. The array contains the p-value of the regression. Only present if pval = TRUE. |
$sign |
A logical array of the statistical significance of the regression with the
same dimensions as $regression. Only present if sign = TRUE. |
$filtered |
A numeric array with the same dimensions as parameters 'datay' and 'datax': the filtered datay from the regression onto datax along the 'reg_dim' dimension. |
# Load sample data as in Load() example:
example(Load)
datay <- sampleData$mod[, 1, , ]
names(dim(datay)) <- c('sdate', 'ftime')
datax <- sampleData$obs[, 1, , ]
names(dim(datax)) <- c('sdate', 'ftime')
res1 <- Regression(datay, datax, formula = y ~ poly(x, 2, raw = TRUE))
res2 <- Regression(datay, datax, alpha = 0.1)
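Conceptually, at each grid point the function fits the given formula to the predictand and predictor series, as lm() would. The sketch below shows that correspondence for a single pair of series; it is an assumed equivalence for illustration (including the assumption that $filtered is datay with the fitted regression removed), not the internal code:

# Single-series sketch of what the outputs correspond to.
set.seed(1)
x <- rnorm(30)
y <- 2 + 0.5 * x + rnorm(30, sd = 0.3)
fit <- lm(y ~ x)      # the default 'formula'
coef(fit)             # cf. the 'stats' dimension of $regression
y - fitted(fit)       # cf. $filtered: datay with the fit removed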
Perform an area-weighted EOF analysis with varimax rotation using singular
value decomposition (SVD) based on a covariance matrix, or on a correlation
matrix if parameter 'corr' is set to TRUE. The s2dv internal function .EOF()
is used for the decomposition.
REOF( ano, lat, lon, ntrunc = 15, time_dim = "sdate", space_dim = c("lat", "lon"), corr = FALSE, ncores = NULL )
ano |
A numerical array of anomalies with named dimensions to calculate REOF. The dimensions must have at least 'time_dim' and 'space_dim'. |
lat |
A vector of the latitudes of 'ano'. |
lon |
A vector of the longitudes of 'ano'. |
ntrunc |
A positive integer of the number of EOFs to be kept for varimax rotation. This function uses this value as 'neof' too, which is the number of EOFs returned by .EOF(). The default value is 15. |
time_dim |
A character string indicating the name of the time dimension of 'ano'. The default value is 'sdate'. |
space_dim |
A vector of two character strings. The first is the dimension name of latitude of 'ano' and the second is the dimension name of longitude of 'ano'. The default value is c('lat', 'lon'). |
corr |
A logical value indicating whether to base on a correlation (TRUE) or on a covariance matrix (FALSE). The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
REOFs |
An array of REOF patterns normalized to 1 (unitless) with dimensions (number of modes, the rest of the dimensions of 'ano' except 'time_dim'). Multiplying 'REOFs' by 'RPCs' gives the original reconstructed field. |
RPCs |
An array of principal components with the units of the original field to the power of 2, with dimensions (time_dim, number of modes, the rest of the dimensions of 'ano' except 'space_dim'). |
var |
An array of the percentage (%) of variance explained by each mode. The dimensions are (number of modes, the rest of the dimensions except 'time_dim' and 'space_dim'). |
wght |
An array of the area weighting with dimensions 'space_dim'. It is calculated by the square root of the cosine of 'lat' and used to compute the fraction of variance explained by each REOF. |
EOF
# This example computes the REOFs along forecast horizons and plots the one
# that explains the greatest amount of variability. The example data has low
# resolution so the result may not be explanatory, but it displays how to
# use this function.
ano <- Ano_CrossValid(sampleData$mod, sampleData$obs)
ano <- MeanDims(ano$exp, c('dataset', 'member'))
res <- REOF(ano, lat = sampleData$lat, lon = sampleData$lon, ntrunc = 5)
## Not run:
PlotEquiMap(res$REOFs[1, , , 1], sampleData$lat, sampleData$lon)
## End(Not run)
Reorder the dimensions of a multi-dimensional array. The order can be provided either as indices or as dimension names. If the order is given by dimension names, the function looks for names(dim(x)). If names(dim(x)) doesn't exist, the function checks whether the attribute "dimensions" exists; this attribute is found in the objects generated by Load().
Reorder(data, order)
data |
An array whose dimensions are to be reordered. |
order |
A vector of indices or character strings indicating the new order of the dimensions. |
An array which has the same values as parameter 'data' but with different dimension order.
dat1 <- array(c(1:30), dim = c(dat = 1, sdate = 3, ftime = 2, lon = 5))
print(dim(Reorder(dat1, c(2, 1, 4, 3))))
print(dim(Reorder(dat1, c('sdate', 'dat', 'lon', 'ftime'))))
dat2 <- array(c(1:10), dim = c(2, 1, 5))
print(dim(Reorder(dat2, c(2, 1, 3))))
attr(dat2, 'dimensions') <- c('sdate', 'time', 'region')
dat2_reorder <- Reorder(dat2, c('time', 'sdate', 'region'))
# A character array
dat3 <- array(paste0('a', 1:24), dim = c(b = 2, c = 3, d = 4))
dat3_reorder <- Reorder(dat3, c('d', 'c', 'b'))
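For plain numeric arrays, the permutation itself is what base R's aperm() does; Reorder() additionally accepts dimension names and the 'dimensions' attribute of Load() objects. A quick comparison, using aperm() here only for illustration:

dat1 <- array(c(1:30), dim = c(dat = 1, sdate = 3, ftime = 2, lon = 5))
dim(Reorder(dat1, c(2, 1, 4, 3)))  # sdate, dat, lon, ftime
dim(aperm(dat1, c(2, 1, 4, 3)))    # the same shape with base R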
The residual correlation assesses whether a forecast captures any of the observed variability that is not already captured by a reference forecast (Smith et al., 2019; https://doi.org/10.1038/s41612-019-0071-y). The procedure is as follows: the residuals of the forecasts and observations are computed by linearly regressing out the reference forecast's ensemble mean from the forecasts' ensemble mean and observations, respectively. Then, the residual correlation is computed as the correlation between both residuals. Positive values of the residual correlation indicate that the forecast captures more observed variability than the reference forecast, while negative values mean that the reference forecast captures more. The significance of the residual correlation is computed with a two-sided t-test (Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7) using the effective degrees of freedom to account for the time series' autocorrelation (von Storch and Zwiers, 1999; https://doi.org/10.1017/CBO9780511612336).
ResidualCorr( exp, obs, ref, N.eff = NA, time_dim = "sdate", memb_dim = NULL, method = "pearson", alpha = 0.05, handle.na = "return.na", pval = TRUE, sign = FALSE, ncores = NULL )
exp |
A named numerical array of the forecast with at least time dimension. |
obs |
A named numerical array of the observations with at least time dimension. The dimensions must be the same as "exp" except 'memb_dim'. |
ref |
A named numerical array of the reference forecast data with at least time dimension. The dimensions must be the same as "exp" except 'memb_dim'. |
N.eff |
Effective sample size to be used in the statistical significance test. It can be NA (and it will be computed with s2dv:::.Eno), a numeric (which is used for all cases), or an array with the same dimensions as "obs" except "time_dim" (for a particular N.eff to be used for each case). The default value is NA. |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean of the forecast and reference forecast. If it is NULL, the ensemble mean should be provided directly to the function. The default value is NULL. |
method |
A character string indicating the correlation coefficient to be computed ("pearson", "kendall", or "spearman"). The default value is "pearson". |
alpha |
A numeric of the significance level to be used in the statistical significance test (output "sign"). The default value is 0.05. |
handle.na |
A character string indicating how to handle missing values. If "return.na", NAs will be returned for the cases that contain at least one NA in "exp", "ref", or "obs". If "only.complete.triplets", only the time steps with no missing values in all of "exp", "ref", and "obs" will be used. If "na.fail", an error will arise if any of "exp", "ref", or "obs" contains any NA. The default value is "return.na". |
pval |
A logical value indicating whether to return the p-value of the significance test Ho: the residual correlation is zero. The default value is TRUE. |
sign |
A logical value indicating whether to return the statistical significance of the test Ho: the residual correlation is zero, based on 'alpha'. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list with:
$res.corr |
A numerical array of the residual correlation with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). |
$sign |
A logical array indicating whether the residual correlation is statistically significant or not with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). Returned only if "sign" is TRUE. |
$p.val |
A numeric array of the p-values with the same dimensions as the input arrays except "time_dim" (and "memb_dim" if provided). Returned only if "pval" is TRUE. |
exp <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(lat = 3, lon = 2, sdate = 50))
ref <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 5, sdate = 50))
res <- ResidualCorr(exp = exp, obs = obs, ref = ref, memb_dim = 'member')
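The procedure described above can be illustrated for a single grid point: regress the reference out of the forecast and the observations, then correlate the residuals. A minimal sketch (ensemble means assumed already computed; the significance test is omitted):

# Single-point sketch of the residual correlation.
set.seed(1)
fcst <- rnorm(50); refc <- rnorm(50); obse <- rnorm(50)
res_fcst <- residuals(lm(fcst ~ refc))  # forecast with 'refc' regressed out
res_obs  <- residuals(lm(obse ~ refc))  # observations with 'refc' regressed out
cor(res_fcst, res_obs)                  # the residual correlation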
Compute the root mean square error for an array of forecasts and an array of
observations. The RMSEs are computed along time_dim, the dimension which
corresponds to the start date dimension. If comp_dim is given, the RMSEs are
computed only if obs along the comp_dim dimension are complete between
limits[1] and limits[2], i.e. there are no NAs between limits[1] and
limits[2]. This option can be activated if the user wishes to account only
for the forecasts for which the corresponding observations are available at
all leadtimes.
The confidence interval is computed with the chi-squared distribution.
RMS( exp, obs, time_dim = "sdate", memb_dim = NULL, dat_dim = NULL, comp_dim = NULL, limits = NULL, conf = TRUE, alpha = 0.05, ncores = NULL )
exp |
A named numeric array of experimental data, with at least 'time_dim' dimension. It can also be a vector with the same length as 'obs'. |
obs |
A named numeric array of observational data, same dimensions as parameter 'exp' except along 'dat_dim' and 'memb_dim'. It can also be a vector with the same length as 'exp'. |
time_dim |
A character string indicating the name of the dimension along which the RMS is computed. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean; it should be set to NULL if the input data are already the ensemble mean. The default value is NULL. |
dat_dim |
A character string indicating the name of the dataset or member (nobs/nexp) dimension. The datasets of exp and obs will be paired and the RMS will be computed for each pair. The default value is NULL. |
comp_dim |
A character string indicating the name of dimension along which obs is taken into account only if it is complete. The default value is NULL. |
limits |
A vector of two integers indicating the range along comp_dim to be completed. The default value is c(1, length of the 'comp_dim' dimension). |
conf |
A logical value indicating whether to retrieve the confidence intervals or not. The default value is TRUE. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimension:
c(nexp, nobs, all other dimensions of exp except time_dim).
nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the
number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and
nobs are omitted.
$rms |
The root mean square error. |
$conf.lower |
The lower confidence interval. Only present if conf = TRUE. |
$conf.upper |
The upper confidence interval. Only present if conf = TRUE. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
smooth_ano_exp <- Smoothing(ano_exp, runmeanlen = 12, time_dim = 'ftime')
smooth_ano_obs <- Smoothing(ano_obs, runmeanlen = 12, time_dim = 'ftime')
res <- RMS(smooth_ano_exp, smooth_ano_obs, memb_dim = 'member',
           comp_dim = 'ftime', limits = c(7, 54))
# Synthetic data:
exp1 <- array(rnorm(120), dim = c(dat = 3, sdate = 10, ftime = 4))
obs1 <- array(rnorm(80), dim = c(dat = 2, sdate = 10, ftime = 4))
na <- floor(runif(10, min = 1, max = 80))
obs1[na] <- NA
res1 <- RMS(exp1, obs1, comp_dim = 'ftime', dat_dim = 'dat')
exp2 <- array(rnorm(20), dim = c(sdate = 5, member = 4))
obs2 <- array(rnorm(10), dim = c(sdate = 5, member = 2))
res2 <- RMS(exp2, obs2, memb_dim = 'member')
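The $rms value itself is the plain root mean square error along 'time_dim', and a chi-squared interval of the kind mentioned above can be sketched by hand. The exact degrees of freedom used by RMS() (e.g., an effective sample size) may differ from this sketch:

# Hand computation of the RMSE and a chi-squared confidence interval
# for one series (a sketch; RMS() also handles comp_dim, dat_dim, etc.).
set.seed(1)
exp0 <- rnorm(10); obs0 <- rnorm(10)
n <- length(obs0); alpha <- 0.05
rms <- sqrt(mean((exp0 - obs0)^2))
c(lower = rms * sqrt(n / qchisq(1 - alpha / 2, df = n)),
  upper = rms * sqrt(n / qchisq(alpha / 2, df = n)))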
Compute the root mean square error skill score (RMSSS) between an array of
forecast 'exp' and an array of observation 'obs'. The two arrays should
have the same dimensions except along 'dat_dim' and 'memb_dim'. The RMSSSs
are computed along 'time_dim', the dimension which corresponds to the start
date dimension.
RMSSS computes the root mean square error skill score of each exp in 1:nexp
against each obs in 1:nobs which gives nexp * nobs RMSSS for each grid point
of the array.
The p-value and significance test are optionally provided by a one-sided
Fisher test or a Random Walk test.
RMSSS( exp, obs, ref = NULL, time_dim = "sdate", dat_dim = NULL, memb_dim = NULL, pval = TRUE, sign = FALSE, alpha = 0.05, N.eff = NA, sig_method = "one-sided Fisher", sig_method.type = NULL, ncores = NULL )
exp |
A named numeric array of experimental data which contains at least time dimension (time_dim). It can also be a vector with the same length as 'obs', then the vector will automatically be 'time_dim'. |
obs |
A named numeric array of observational data which contains at least time dimension (time_dim). The dimensions should be the same as parameter 'exp' except the length of 'dat_dim' and 'memb_dim' dimension. It can also be a vector with the same length as 'exp', then the vector will automatically be 'time_dim'. |
ref |
A named numerical array of the reference forecast data with at least time dimension, or 0 (typical climatological forecast) or 1 (normalized climatological forecast). If it is an array, the dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have dataset dimension. If there is a corresponding reference for each experiment, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the typical climatological forecast is used as reference forecast (equivalent to 0). The default value is NULL. |
time_dim |
A character string indicating the name of dimension along which the RMSSS are computed. The default value is 'sdate'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL. |
memb_dim |
A character string indicating the name of the member dimension to compute the ensemble mean; it should be set to NULL if the data are already the ensemble mean. The default value is NULL. |
pval |
A logical value indicating whether to compute the p-value of the test Ho: RMSSS = 0. The default value is TRUE. |
sign |
A logical value indicating whether to compute the statistical significance of the test Ho: RMSSS = 0. The default value is FALSE. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
N.eff |
Effective sample size to be used in the statistical significance test with the Random Walk. It can be NA (and it will be computed with s2dv:::.Eno), FALSE (and it will use the length of 'obs' along 'time_dim', so the autocorrelation is not taken into account), a numeric (which is used for all cases), or an array with the same dimensions as 'obs' except 'time_dim' (for a particular N.eff to be used for each case). The default value is NA. |
sig_method |
A character string indicating the significance method. The options are "one-sided Fisher" (default) and "Random Walk". |
sig_method.type |
A character string indicating the test type of the significance method. Check the parameter 'test.type' of RandomWalkTest() for details if 'sig_method' is "Random Walk". The default value is NULL. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing the numeric arrays with dimension:
c(nexp, nobs, all other dimensions of exp except time_dim).
nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the
number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and
nobs are omitted.
$rmsss |
A numerical array of the root mean square error skill score. |
$p.val |
A numerical array of the p-value with the same dimensions as $rmsss. Only present if pval = TRUE. |
$sign |
A logical array of the statistical significance of the RMSSS with the same
dimensions as $rmsss. Only present if sign = TRUE. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
rmsss <- RMSSS(ano_exp, ano_obs, dat_dim = 'dataset', memb_dim = 'member')
set.seed(1)
exp1 <- array(rnorm(30), dim = c(dataset = 2, time = 3, memb = 5))
set.seed(2)
obs1 <- array(rnorm(15), dim = c(time = 3, memb = 5, dataset = 1))
res1 <- RMSSS(exp1, obs1, time_dim = 'time', dat_dim = 'dataset')
exp2 <- array(rnorm(30), dim = c(lat = 2, time = 3, memb = 5))
obs2 <- array(rnorm(15), dim = c(time = 3, lat = 2))
res2 <- RMSSS(exp2, obs2, time_dim = 'time', memb_dim = 'memb')
exp3 <- array(rnorm(30), dim = c(lat = 2, time = 3))
obs3 <- array(rnorm(15), dim = c(time = 3, lat = 2))
res3 <- RMSSS(exp3, obs3, time_dim = 'time')
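The skill score itself follows RMSSS = 1 - RMSE_exp / RMSE_ref; with the typical climatological reference (ref = 0) it can be computed by hand for one series, as in this minimal sketch:

# Hand computation of the RMSSS with the constant-zero reference forecast.
set.seed(1)
exp0 <- rnorm(10); obs0 <- rnorm(10)
rmse_exp <- sqrt(mean((exp0 - obs0)^2))
rmse_ref <- sqrt(mean((0 - obs0)^2))  # reference: climatological forecast of 0
1 - rmse_exp / rmse_ref               # the skill score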
The Relative Operating Characteristic Skill Score (ROCSS; Kharin and Zwiers,
2003) is based on the ROC curve, which gives information about the hit rates
against the false-alarm rates for a particular category or event. The ROC
curve can be summarized with the area under the ROC curve, known as the ROC
score, to provide a skill value for each category. The ROCSS ranges between
minus infinite and 1. A positive ROCSS value indicates that the forecast has
higher skill than the reference forecast, meaning the contrary otherwise.
The function accepts either the data or the probabilities of each data as
inputs. If there is more than one dataset, RPSS will be computed for each pair
of exp and obs data.
ROCSS( exp, obs, ref = NULL, time_dim = "sdate", memb_dim = "member", cat_dim = NULL, dat_dim = NULL, prob_thresholds = c(1/3, 2/3), indices_for_clim = NULL, cross.val = FALSE, ncores = NULL )
exp |
A named numerical array of either the forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
obs |
A named numerical array of either the observations with at least time dimension, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
ref |
A named numerical array of the reference forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). The default value is NULL, in which case a random forecast is used as the reference (as in the example below). |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast and the reference forecast. The default value is 'member'. If the data are probabilities, set memb_dim as NULL. |
cat_dim |
A character string indicating the name of the category dimension that is needed when exp, obs, and ref are probabilities. The default value is NULL, which means that the data are not probabilities. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
prob_thresholds |
A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories. |
indices_for_clim |
A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL. |
cross.val |
A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of ROCSS with dimensions c(nexp, nobs, cat, the rest of
the dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the
number of experiments (i.e., dat_dim in exp), and nobs is the number of
observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are
omitted. The dimension 'cat' refers to the probabilistic categories, and its
length is 1 + length(prob_thresholds).
Kharin, V. V. and Zwiers, F. W. (2003): https://doi.org/10.1175/1520-0442(2003)016
# Use data as input
exp <- array(rnorm(1000), dim = c(lon = 3, lat = 2, sdate = 60, member = 10))
ref <- array(rnorm(1000), dim = c(lon = 3, lat = 2, sdate = 60, member = 10))
obs <- array(rnorm(1000), dim = c(lon = 3, lat = 2, sdate = 60))
ROCSS(exp = exp, obs = obs)  ## random forecast as reference forecast
ROCSS(exp = exp, obs = obs, ref = ref)  ## ref as reference forecast
# Use probs as input
exp_probs <- GetProbs(exp, memb_dim = 'member')
obs_probs <- GetProbs(obs, memb_dim = NULL)
ref_probs <- GetProbs(ref, memb_dim = 'member')
ROCSS(exp = exp_probs, obs = obs_probs, ref = ref_probs, memb_dim = NULL,
      cat_dim = 'bin')
The Ranked Probability Score (RPS; Wilks, 2011) is defined as the sum of the
squared differences between the cumulative forecast probabilities (computed
from the ensemble members) and the observations (defined as 0% if the
category did not happen and 100% if it happened). It is a skill metric for
multi-categorical probabilistic forecasts. The RPS ranges between 0
(perfect forecast) and n-1 (worst possible forecast), where n is the number
of categories. In the case of a forecast divided into two categories (the
lowest number of categories that a probabilistic forecast can have), the RPS
corresponds to the Brier Score (BS; Wilks, 2011), therefore ranging between
0 and 1.
The function first calculates the probabilities for the forecasts and
observations, then uses them to calculate the RPS. Alternatively, the
probabilities of exp and obs can be provided directly to compute the score.
If there is more than one dataset, the RPS will be computed for each pair of
exp and obs data. The fraction of acceptable NAs can be adjusted.
RPS( exp, obs, time_dim = "sdate", memb_dim = "member", cat_dim = NULL, dat_dim = NULL, prob_thresholds = c(1/3, 2/3), indices_for_clim = NULL, Fair = FALSE, weights = NULL, cross.val = FALSE, return_mean = TRUE, na.rm = FALSE, ncores = NULL )
exp |
A named numerical array of either the forecasts with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
obs |
A named numerical array of either the observations with at least time dimension, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast. The default value is 'member'. If the data are probabilities, set memb_dim as NULL. |
cat_dim |
A character string indicating the name of the category dimension that is needed when the exp and obs are probabilities. The default value is NULL, which means that the data are not probabilities. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
prob_thresholds |
A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories. |
indices_for_clim |
A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL. |
Fair |
A logical indicating whether to compute the FairRPS (the potential RPS that the forecast would have with an infinite ensemble size). The default value is FALSE. |
weights |
A named numerical array of the weights for 'exp' probability calculation. If 'dat_dim' is NULL, the dimensions should include 'memb_dim' and 'time_dim'. Else, the dimension should also include 'dat_dim'. The default value is NULL. The ensemble should have at least 70 members or span at least 10 time steps and have more than 45 members if consistency between the weighted and unweighted methodologies is desired. |
cross.val |
A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation. The default value is FALSE. |
return_mean |
A logical indicating whether to return the temporal mean of the RPS or not. If TRUE, the temporal mean is calculated along time_dim, if FALSE the time dimension is not aggregated. The default is TRUE. |
na.rm |
A logical or numeric value between 0 and 1. If it is numeric, it means the lower limit for the fraction of non-NA values. 1 is equal to FALSE (no NA is acceptable), 0 is equal to TRUE (all NAs are acceptable). The function returns NA if the fraction of non-NA values in the data is smaller than na.rm; otherwise, the RPS will be calculated. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array of RPS with dimensions c(nexp, nobs, the rest of the dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted.
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
# Use synthetic data
exp <- array(rnorm(1000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(1000), dim = c(lat = 3, lon = 2, sdate = 50))
res <- RPS(exp = exp, obs = obs)
# Use probabilities as inputs
exp_probs <- GetProbs(exp, time_dim = 'sdate', memb_dim = 'member')
obs_probs <- GetProbs(obs, time_dim = 'sdate', memb_dim = NULL)
res2 <- RPS(exp = exp_probs, obs = obs_probs, memb_dim = NULL, cat_dim = 'bin')
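The definition above reduces, for a single time step, to the sum of squared differences between cumulative probabilities. A minimal worked sketch with three categories:

# RPS for one time step and three categories, from the definition.
fcst_prob <- c(0.2, 0.5, 0.3)  # forecast probabilities per category
obs_ind   <- c(0, 1, 0)        # observed category as a 0/1 (0%/100%) indicator
sum((cumsum(fcst_prob) - cumsum(obs_ind))^2)  # the RPS for this time step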
The Ranked Probability Skill Score (RPSS; Wilks, 2011) is the skill score
based on the Ranked Probability Score (RPS; Wilks, 2011). It can be used to
assess whether a forecast presents an improvement or a worsening with
respect to a reference forecast. The RPSS ranges between minus infinity and
1. If the RPSS is positive, it indicates that the forecast has higher skill
than the reference forecast, while a negative value means that it has lower
skill. Examples of reference forecasts are the climatological forecast (same
probabilities for all categories for all time steps), persistence, a
previous model version, and another model. It is computed as
RPSS = 1 - RPS_exp / RPS_ref. The statistical significance is obtained based
on a Random Walk test at the specified confidence level (DelSole and
Tippett, 2016).
The function accepts either the ensemble members or the probabilities of
each data as inputs. If there is more than one dataset, the RPSS will be
computed for each pair of exp and obs data. The NA ratio of the data will be
examined before the calculation. If the ratio is higher than the threshold
(assigned by parameter na.rm), NA will be returned directly. NAs are counted
by a per-pair method, which means that only the time steps at which all the
datasets have values count as non-NA values.
RPSS( exp, obs, ref = NULL, time_dim = "sdate", memb_dim = "member", cat_dim = NULL, dat_dim = NULL, prob_thresholds = c(1/3, 2/3), indices_for_clim = NULL, Fair = FALSE, weights_exp = NULL, weights_ref = NULL, cross.val = FALSE, na.rm = FALSE, sig_method.type = "two.sided.approx", alpha = 0.05, N.eff = NA, ncores = NULL )
exp |
A named numerical array of either the forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
obs |
A named numerical array of either the observations with at least time dimension, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). |
ref |
A named numerical array of either the reference forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by GetProbs(). The default value is NULL, in which case the climatological forecast is used as the reference (as in the example below). |
time_dim |
A character string indicating the name of the time dimension. The default value is 'sdate'. |
memb_dim |
A character string indicating the name of the member dimension to compute the probabilities of the forecast and the reference forecast. The default value is 'member'. If the data are probabilities, set memb_dim as NULL. |
cat_dim |
A character string indicating the name of the category dimension that is needed when exp, obs, and ref are probabilities. The default value is NULL, which means that the data are not probabilities. |
dat_dim |
A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL. |
prob_thresholds |
A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories. |
indices_for_clim |
A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL. |
Fair |
A logical indicating whether to compute the FairRPSS (the potential RPSS that the forecast would have with an infinite ensemble size). The default value is FALSE. |
weights_exp |
A named numerical array of the forecast ensemble weights for probability calculation. The dimension should include 'memb_dim', 'time_dim' and 'dat_dim' if there are multiple datasets. All dimension lengths must be equal to 'exp' dimension lengths. The default value is NULL, which means no weighting is applied. The ensemble should have at least 70 members or span at least 10 time steps and have more than 45 members if consistency between the weighted and unweighted methodologies is desired. |
weights_ref |
Same as 'weights_exp' but for the reference forecast. |
cross.val |
A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation. The default value is FALSE. |
na.rm |
A logical or numeric value between 0 and 1. If it is numeric, it means the lower limit for the fraction of non-NA values. 1 is equal to FALSE (no NA is acceptable), 0 is equal to TRUE (all NAs are acceptable). The function returns NA if the fraction of non-NA values in the data is smaller than na.rm; otherwise, the RPSS will be calculated. The default value is FALSE. |
sig_method.type |
A character string indicating the test type of the significance method. Check the parameter 'test.type' of RandomWalkTest() for details. The default value is "two.sided.approx". |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
N.eff |
Effective sample size to be used in the statistical significance test. It can be NA (and it will be computed with the s2dv:::.Eno function), FALSE (and it will use the length of 'obs' along 'time_dim', so the autocorrelation is not taken into account), a numeric (which is used for all cases), or an array with the same dimensions as 'obs' except 'time_dim' (for a particular N.eff to be used for each case). The default value is NA. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
$rpss |
A numerical array of RPSS with dimensions c(nexp, nobs, the rest of the dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted. |
$sign |
A logical array of the statistical significance of the RPSS with the same dimensions as $rpss. |
Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
DelSole and Tippett, 2016; https://doi.org/10.1175/MWR-D-15-0218.1
set.seed(1)
exp <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
set.seed(2)
obs <- array(rnorm(300), dim = c(lat = 3, lon = 2, sdate = 50))
set.seed(3)
ref <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
weights <- sapply(1:dim(exp)['sdate'], function(i) {
  n <- abs(rnorm(10))
  n / sum(n)
})
dim(weights) <- c(member = 10, sdate = 50)
# Use data as input
res <- RPSS(exp = exp, obs = obs) ## climatology as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref) ## ref as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref, weights_exp = weights, weights_ref = weights)
res <- RPSS(exp = exp, obs = obs, alpha = 0.01, sig_method.type = 'two.sided')
# Use probs as input
exp_probs <- GetProbs(exp, memb_dim = 'member')
obs_probs <- GetProbs(obs, memb_dim = NULL)
ref_probs <- GetProbs(ref, memb_dim = 'member')
res <- RPSS(exp = exp_probs, obs = obs_probs, ref = ref_probs, memb_dim = NULL,
            N.eff = FALSE, cat_dim = 'bin')
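The numeric 'na.rm' option can be seen in action with a minimal sketch (not part of the original examples; it assumes NA members propagate to NA probabilities at those time steps, so the non-NA fraction falls below the limit):
exp_na <- exp
exp_na[1, 1, 1, 1:30] <- NA  # 60% of the start dates affected at one grid point
res_na <- RPSS(exp = exp_na, obs = obs, na.rm = 0.9)
res_na$rpss[1, 1]  # expected NA: the non-NA fraction is below 0.9
res_na$rpss[2, 1]  # unaffected grid points are still computed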
This data set provides data as a function of latitudes and depths for the
variable 'tos', i.e. sea surface temperature, from the decadal climate
prediction experiment run at IC3 in the context of the CMIP5 project.
Its name within IC3 local database is 'i00k'.
data(sampleDepthData)
The data set provides a variable named 'sampleDepthData'.
sampleDepthData$exp is an array that contains the experimental data and the
dimension meanings and values are:
c(# of experimental datasets, # of members, # of starting dates,
# of lead-times, # of depths, # of latitudes)
c(1, 5, 3, 60, 7, 21)
sampleDepthData$obs would be an array containing the observational data, but
in this sample it is not defined (NULL).
sampleDepthData$depths is an array with the 7 depths covered by the data.
sampleDepthData$lat is an array with the 21 latitudes covered by the data.
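A quick way to check the structure described above is to load and inspect the object (a minimal sketch):
library(s2dv)
data(sampleDepthData)
names(sampleDepthData)    # should include 'exp', 'obs', 'depths' and 'lat'
dim(sampleDepthData$exp)  # c(1, 5, 3, 60, 7, 21), as listed above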
This data set provides data as a function of longitudes and latitudes for the variable 'tos', i.e. sea surface temperature, over the Mediterranean zone from the sample experimental and observational datasets attached to the package. See the examples on how to use Load() for details.
The data is provided through a variable named 'sampleMap' and is structured as expected from the 'Load()' function in the 's2dv' package if it was called as follows:
data_path <- system.file('sample_data', package = 's2dv')
exp <- list(
  name = 'experiment',
  path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
                   '$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATES$.nc')
)
obs <- list(
  name = 'observation',
  path = file.path(data_path, 'observation/$OBS_NAME$/monthly_mean',
                   '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
                   leadtimemin = 1, leadtimemax = 4, output = 'lonlat',
                   latmin = 27, latmax = 48, lonmin = -12, lonmax = 40)
Check the documentation on 'Load()' in the package 's2dv' for more information.
data(sampleMap)
The data set provides a variable named 'sampleMap'.
sampleMap$mod is an array that contains the experimental data and the dimension meanings and values are:
c(# of experimental datasets, # of members, # of starting dates, # of lead-times, # of latitudes, # of longitudes)
c(1, 3, 5, 60, 2, 3)
sampleMap$obs is an array that contains the observational data and the dimension meanings and values are:
c(# of observational datasets, # of members, # of starting dates, # of lead-times, # of latitudes, # of longitudes)
c(1, 1, 5, 60, 2, 3)
sampleMap$lat is an array with the 2 latitudes covered by the data (see examples on Load() for details on why such low resolution).
sampleMap$lon is an array with the 3 longitudes covered by the data (see examples on Load() for details on why such low resolution).
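As a minimal sketch, the dimensions described above can be verified directly:
data(sampleMap)
dim(sampleMap$mod)  # c(1, 3, 5, 60, 2, 3), as listed above
dim(sampleMap$obs)  # c(1, 1, 5, 60, 2, 3)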
This data set provides area-averaged data for the variable 'tos', i.e. sea
surface temperature, over the Mediterranean zone from the example datasets
attached to the package. See examples on Load() for more details.
The data is provided through a variable named 'sampleTimeSeries' and is
structured as expected from the 'Load()' function in the 's2dv'
package if it was called as follows:
data_path <- system.file('sample_data', package = 's2dv')
exp <- list(
  name = 'experiment',
  path = file.path(data_path, 'model/$EXP_NAME$/monthly_mean',
                   '$VAR_NAME$_3hourly/$VAR_NAME$_$START_DATES$.nc')
)
obs <- list(
  name = 'observation',
  path = file.path(data_path, 'observation/$OBS_NAME$/monthly_mean',
                   '$VAR_NAME$/$VAR_NAME$_$YEAR$$MONTH$.nc')
)
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(exp), list(obs), startDates,
                   output = 'areave', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40)
Check the documentation on 'Load()' in the package 's2dv' for more information.
data(sampleTimeSeries)
The data set provides a variable named 'sampleTimeSeries'.
sampleTimeSeries$mod is an array that contains the experimental data and the dimension meanings and values are:
c(# of experimental datasets, # of members, # of starting dates, # of lead-times)
c(1, 3, 5, 60)
sampleTimeSeries$obs is an array that contains the observational data and the dimension meanings and values are:
c(# of observational datasets, # of members, # of starting dates, # of lead-times)
c(1, 1, 5, 60)
sampleTimeSeries$lat is an array with the 2 latitudes covered by the data that were area-averaged to calculate the time series (see examples on Load() for details on why such low resolution).
sampleTimeSeries$lon is an array with the 3 longitudes covered by the data that were area-averaged to calculate the time series (see examples on Load() for details on why such low resolution).
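A minimal sketch inspecting the object and deriving an ensemble-mean series (the member dimension is the second one, as listed above):
data(sampleTimeSeries)
dim(sampleTimeSeries$mod)  # c(1, 3, 5, 60), as listed above
# Ensemble-mean time series for the first dataset and start date:
ens_mean <- colMeans(sampleTimeSeries$mod[1, , 1, ])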
Compute the seasonal mean (or apply another method) to monthly time series along one dimension of a named multi-dimensional array. Partial seasons are not accounted for.
Season( data, monini, moninf, monsup, time_dim = "ftime", method = mean, na.rm = TRUE, ncores = NULL )
data |
A named numeric array with at least one dimension 'time_dim'. |
monini |
An integer indicating what the first month of the time series is. It can be from 1 to 12. |
moninf |
An integer indicating the starting month of the seasonal calculation. It can be from 1 to 12. |
monsup |
An integer indicating the end month of the seasonal calculation. It can be from 1 to 12. |
time_dim |
A character string indicating the name of dimension along which the seasonal mean or other calculations are computed. The default value is 'ftime'. |
method |
An R function to be applied for seasonal calculation. For example, 'sum' can be used for total precipitation. The default value is mean. |
na.rm |
A logical value indicating whether to remove NA values along 'time_dim' when calculating climatology (TRUE) or return NA if there is NA along 'time_dim' (FALSE). The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
An array with the same dimensions as data except along the 'time_dim' dimension, of which the length changes to the number of seasons.
set.seed(1)
dat1 <- array(rnorm(144 * 3), dim = c(member = 2, sdate = 2, ftime = 12 * 3, lon = 3))
res <- Season(data = dat1, monini = 1, moninf = 1, monsup = 2)
res <- Season(data = dat1, monini = 10, moninf = 12, monsup = 2)
dat2 <- dat1
set.seed(2)
na <- floor(runif(30, min = 1, max = 144 * 3))
dat2[na] <- NA
res <- Season(data = dat2, monini = 3, moninf = 1, monsup = 2)
res <- Season(data = dat2, monini = 3, moninf = 1, monsup = 2, na.rm = FALSE)
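As a sanity check on the season count implied by the description above (a sketch; three complete December-February seasons fit in 36 months starting in October, and partial seasons are dropped):
res_djf <- Season(data = dat1, monini = 10, moninf = 12, monsup = 2)
dim(res_djf)['ftime']  # expected: 3 seasons instead of the 36 input months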
This function computes the signal-to-noise ratio, where the signal is the ensemble mean variance and the noise is the variance of the ensemble members about the ensemble mean (Eade et al., 2014; Scaife and Smith, 2018).
SignalNoiseRatio( data, time_dim = "year", member_dim = "member", na.rm = FALSE, ncores = NULL )
data |
A numerical array with, at least, 'time_dim' and 'member_dim' dimensions. |
time_dim |
A character string indicating the name of the time dimension in 'data'. The default value is 'year'. |
member_dim |
A character string indicating the name of the member dimension in 'data'. The default value is 'member'. |
na.rm |
A logical value indicating whether to remove NA values during the computation. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
An array of the signal-to-noise ratio. It has the same dimensions as 'data' except the 'time_dim' and 'member_dim' dimensions.
exp <- array(data = runif(600), dim = c(year = 15, member = 10, lat = 2, lon = 2))
SignalNoiseRatio(exp)
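To make the signal and noise terms concrete, here is a hand-rolled sketch for a single grid point; it is one plausible reading of the definition above, not necessarily the package's exact estimator:
x <- exp[, , 1, 1]                     # year x member slice at one grid point
ens_mean <- rowMeans(x)                # ensemble mean per year
signal <- var(ens_mean)                # variance of the ensemble mean
noise <- var(as.vector(x - ens_mean))  # variance of members about the ensemble mean
signal / noise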
Smooth an array of any number of dimensions along one dimension.
Smoothing(data, time_dim = "ftime", runmeanlen = 12, ncores = NULL)
data |
A numerical array to be smoothed along one of its dimension (typically the forecast time dimension). |
time_dim |
A character string indicating the name of the dimension to be smoothed along. The default value is 'ftime'. |
runmeanlen |
An integer indicating the running mean length of sampling units (typically months). The default value is 12. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the same dimensions as parameter 'data', but with the 'time_dim' dimension moved to the first position. The head and tail parts, which do not have enough neighboring data for smoothing, are assigned NA.
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
ano_obs <- Ano(sampleData$obs, clim$clim_obs)
smooth_ano_exp <- Smoothing(ano_exp, time_dim = 'ftime', runmeanlen = 12)
smooth_ano_obs <- Smoothing(ano_obs, time_dim = 'ftime', runmeanlen = 12)
smooth_ano_exp <- Reorder(smooth_ano_exp, c(2, 3, 4, 1))
smooth_ano_obs <- Reorder(smooth_ano_obs, c(2, 3, 4, 1))
## Not run:
PlotAno(smooth_ano_exp, smooth_ano_obs, startDates,
        toptitle = "Smoothed Mediterranean mean SST", ytitle = "K")
## End(Not run)
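A tiny sketch of the NA padding described in the value section (assuming a centered window, so an odd 'runmeanlen' of 5 leaves two NAs at each end):
x <- array(1:24, dim = c(ftime = 24))
sm <- Smoothing(x, time_dim = 'ftime', runmeanlen = 5)
head(sm)  # the first two values are expected to be NA
tail(sm)  # the last two values are expected to be NA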
Estimate the frequency spectrum of the data array together with a
user-specified confidence level. The output is provided as an array with
dimensions c(number of frequencies, stats = 3, other margin dimensions of
data). The 'stats' dimension contains the frequencies at which the spectral
density is estimated, the estimates of the spectral density, and the
significance level.
The spectrum estimation relies on the R built-in function spectrum()
and the confidence interval is estimated by the Monte-Carlo method.
Spectrum(data, time_dim = "ftime", alpha = 0.05, ncores = NULL)
data |
A vector or numeric array of which the frequency spectrum is required. If it's a vector, it should be a time series. If it's an array, the dimensions must have at least 'time_dim'. The data is assumed to be evenly spaced in time. |
time_dim |
A character string indicating the dimension along which to compute the frequency spectrum. The default value is 'ftime'. |
alpha |
A numeric indicating the significance level for the Monte-Carlo significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numeric array of the frequency spectrum with dimensions c(<time_dim> = number of frequencies, stats = 3, the rest of the dimensions of 'data'). The 'stats' dimension contains the frequency values, the spectral density, and the confidence interval.
# Load sample data as in Load() example:
example(Load)
ensmod <- MeanDims(sampleData$mod, 2)
spectrum <- Spectrum(ensmod)
for (jsdate in 1:dim(spectrum)['sdate']) {
  for (jlen in 1:dim(spectrum)['ftime']) {
    if (spectrum[jlen, 2, 1, jsdate] > spectrum[jlen, 3, 1, jsdate]) {
      ensmod[1, jsdate, ] <- Filter(ensmod[1, jsdate, ], spectrum[jlen, 1, 1, jsdate])
    }
  }
}
PlotAno(InsertDim(ensmod, 2, 1), sdates = startDates)
The South Pacific Ocean Dipole (SPOD) index is related to the El Nino-Southern Oscillation (ENSO) and the Interdecadal Pacific Oscillation (IPO). The SPOD index is computed as the difference between the weighted-averaged SST anomalies over 20ºS-48ºS, 165ºE-190ºE (NW pole) and the weighted-averaged SST anomalies over 44ºS-65ºS, 220ºE-260ºE (SE pole) (Saurral et al., 2020). If different members and/or datasets are provided, the climatology (used to calculate the anomalies) is computed individually for all of them.
SPOD( data, data_lats, data_lons, type, lat_dim = "lat", lon_dim = "lon", mask = NULL, monini = 11, fmonth_dim = "fmonth", sdate_dim = "sdate", indices_for_clim = NULL, year_dim = "year", month_dim = "month", na.rm = TRUE, ncores = NULL )
data |
A numerical array to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. |
data_lats |
A numeric vector indicating the latitudes of the data. |
data_lons |
A numeric vector indicating the longitudes of the data. |
type |
A character string indicating the type of data ('dcpp' for decadal predictions, 'hist' for historical simulations, or 'obs' for observations or reanalyses). |
lat_dim |
A character string of the name of the latitude dimension. The default value is 'lat'. |
lon_dim |
A character string of the name of the longitude dimension. The default value is 'lon'. |
mask |
An array of a mask (with 0's in the grid points that have to be masked) or NULL (i.e., no mask is used). This parameter allows removing the values over land in case the dataset is a combination of surface air temperature over land and sea surface temperature over the ocean. Also, it can be used to mask those grid points that are missing in the observational dataset for a fair comparison between the forecast system and the reference dataset. The default value is NULL. |
monini |
An integer indicating the month in which the forecast system is initialized. Only used when parameter 'type' is 'dcpp'. The default value is 11, i.e., initialized in November. |
fmonth_dim |
A character string indicating the name of the forecast month dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'fmonth'. |
sdate_dim |
A character string indicating the name of the start date dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'sdate'. |
indices_for_clim |
A numeric vector of the indices of the years to
compute the climatology for calculating the anomalies, or NULL so the
climatology is calculated over the whole period. If the data are already
anomalies, set it to FALSE. The default value is NULL. |
year_dim |
A character string indicating the name of the year dimension. The default value is 'year'. Only used if parameter 'type' is 'hist' or 'obs'. |
month_dim |
A character string indicating the name of the month dimension. The default value is 'month'. Only used if parameter 'type' is 'hist' or 'obs'. |
na.rm |
A logical value indicating whether to remove NA values. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the SPOD index with the same dimensions as data except the lat_dim, lon_dim and fmonth_dim (month_dim) in case of decadal predictions (historical simulations or observations). In case of decadal predictions, a new dimension 'fyear' is added.
## Observations or reanalyses
obs <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_obs <- SPOD(data = obs, data_lats = lat, data_lons = lon, type = 'obs')
## Historical simulations
hist <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_hist <- SPOD(data = hist, data_lats = lat, data_lons = lon, type = 'hist')
## Decadal predictions
dcpp <- array(1:100, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_dcpp <- SPOD(data = dcpp, data_lats = lat, data_lons = lon, type = 'dcpp', monini = 1)
Compute the interquartile range, maximum-minimum, standard deviation and
median absolute deviation along the list of dimensions provided by the
compute_dim argument (typically along the ensemble member and start date
dimensions). The confidence interval is computed by bootstrapping 100 times.
The input data can be the output of Load(), Ano(), or Ano_CrossValid(), for
example.
Spread( data, compute_dim = "member", na.rm = TRUE, conf = TRUE, alpha = 0.05, ncores = NULL )
data |
A numeric vector or array with named dimensions to compute the statistics. The dimensions should at least include 'compute_dim'. |
compute_dim |
A vector of character strings of the dimension names along which to compute the statistics. The default value is 'member'. |
na.rm |
A logical value indicating if NAs should be removed (TRUE) or kept (FALSE) for computation. The default value is TRUE. |
conf |
A logical value indicating whether to compute the confidence intervals or not. The default value is TRUE. |
alpha |
A numeric of the significance level to be used in the statistical significance test. The default value is 0.05. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list of numeric arrays with the same dimensions as 'data' but without 'compute_dim' and with the first dimension 'stats'. If 'conf' is TRUE, the length of 'stats' is 3 corresponding to the lower limit of the confidence interval, the spread, and the upper limit of the confidence interval. If 'conf' is FALSE, the length of 'stats' is 1 corresponding to the spread.
$iqr |
InterQuartile Range. |
$maxmin |
Maximum - Minimum. |
$sd |
Standard Deviation. |
$mad |
Median Absolute Deviation. |
# Load sample data as in Load() example:
example(Load)
clim <- Clim(sampleData$mod, sampleData$obs)
ano_exp <- Ano(sampleData$mod, clim$clim_exp)
runmean_months <- 12
smooth_ano_exp <- Smoothing(ano_exp, runmeanlen = runmean_months)
smooth_ano_exp_m_sub <- smooth_ano_exp -
  InsertDim(MeanDims(smooth_ano_exp, 'member', na.rm = TRUE),
            posdim = 3, lendim = dim(smooth_ano_exp)['member'], name = 'member')
suppressWarnings({
  spread <- Spread(smooth_ano_exp_m_sub, compute_dim = c('member', 'sdate'))
})
## Not run:
PlotVsLTime(Reorder(spread$iqr, c('dataset', 'stats', 'ftime')),
            toptitle = "Inter-Quartile Range between ensemble members",
            ytitle = "K", monini = 11, limits = NULL,
            listexp = c('CMIP5 IC3'), listobs = c('ERSST'),
            biglab = FALSE, hlines = c(0))
PlotVsLTime(Reorder(spread$maxmin, c('dataset', 'stats', 'ftime')),
            toptitle = "Maximum minus minimum of the members",
            ytitle = "K", monini = 11, limits = NULL,
            listexp = c('CMIP5 IC3'), listobs = c('ERSST'),
            biglab = FALSE, hlines = c(0))
PlotVsLTime(Reorder(spread$sd, c('dataset', 'stats', 'ftime')),
            toptitle = "Standard deviation of the members",
            ytitle = "K", monini = 11, limits = NULL,
            listexp = c('CMIP5 IC3'), listobs = c('ERSST'),
            biglab = FALSE, hlines = c(0))
PlotVsLTime(Reorder(spread$mad, c('dataset', 'stats', 'ftime')),
            toptitle = "Median Absolute Deviation of the members",
            ytitle = "K", monini = 11, limits = NULL,
            listexp = c('CMIP5 IC3'), listobs = c('ERSST'),
            biglab = FALSE, hlines = c(0))
## End(Not run)
Compute the ratio between the spread of the members around the ensemble mean in experimental data and the RMSE between the ensemble mean of experimental and observational data. The p-value and/or the statistical significance is provided by a two-sided Fisher's test.
SprErr( exp, obs, dat_dim = NULL, memb_dim = "member", time_dim = "sdate", pval = TRUE, sign = FALSE, alpha = 0.05, na.rm = FALSE, ncores = NULL )
exp |
A named numeric array of experimental data with at least two dimensions 'memb_dim' and 'time_dim'. |
obs |
A named numeric array of observational data with at least two dimensions 'memb_dim' and 'time_dim'. It should have the same dimensions as parameter 'exp' except along 'dat_dim' and 'memb_dim'. |
dat_dim |
A character string indicating the name of dataset (nobs/nexp) dimension. The default value is NULL (no dataset). |
memb_dim |
A character string indicating the name of the member dimension. It must be one dimension in 'exp' and 'obs'. The default value is 'member'. |
time_dim |
A character string indicating the name of dimension along which the ratio is computed. The default value is 'sdate'. |
pval |
A logical value indicating whether to compute the p-value of the test Ho : SD/RMSE = 1 or not. The default value is TRUE. |
sign |
A logical value indicating whether to retrieve the statistical significance of the test Ho: Spread/RMSE = 1 based on 'alpha'. The default value is FALSE. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
na.rm |
A logical value indicating whether to remove NA values. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list of two arrays with dimensions c(nexp, nobs, the rest of the
dimensions of 'exp' and 'obs' except memb_dim and time_dim), where nexp is
the length of dat_dim of 'exp' and nobs is the length of dat_dim of 'obs'.
If dat_dim is NULL, nexp and nobs are omitted.
$ratio |
The ratio of the ensemble spread and RMSE. |
$p_val |
The p-value of the two-sided Fisher's test with Ho: Spread/RMSE = 1. Only
present if 'pval' is TRUE. |
exp <- array(rnorm(30), dim = c(lat = 2, sdate = 3, member = 5))
obs <- array(rnorm(30), dim = c(lat = 2, sdate = 3))
sprerr1 <- SprErr(exp, obs)
sprerr2 <- SprErr(exp, obs, pval = FALSE, sign = TRUE)
sprerr3 <- SprErr(exp, obs, pval = TRUE, sign = TRUE)
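A short note on reading the output (a sketch based on the ratio's definition above):
str(sprerr1)  # a list with $ratio (and the p-value when pval = TRUE)
# A ratio close to 1 suggests the ensemble spread matches the ensemble-mean
# error; values below 1 point to an underdispersive ensemble, values above 1
# to an overdispersive one.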
Compute one of G. Villarini's statistically downscaled measures of mean
Atlantic hurricane activity and its variance. The hurricane activity is
estimated using seasonal averages of sea surface temperature anomalies over
the tropical Atlantic (bounded by 10N-25N and 80W-20W) and the tropics at
large (bounded by 30N-30S). The anomalies are for the JJASON season.
The estimated seasonal average is either 1) number of hurricanes, 2) number
of tropical cyclones with lifetime >=48h or 3) power dissipation index
(PDI; in 10^11 m^3 s^(-2)).
The statistical models used in this function are described in the references.
StatSeasAtlHurr(atlano, tropano, hrvar = "HR", ncores = NULL)
atlano |
A numeric array with named dimensions of Atlantic sea surface temperature anomalies. It must have the same dimensions as 'tropano'. |
tropano |
A numeric array with named dimensions of tropical sea surface temperature anomalies. It must have the same dimensions as 'atlano'. |
hrvar |
A character string of the seasonal average to be estimated. The options are either "HR" (hurricanes), "TC" (tropical cyclones with lifetime >=48h), or "PDI" (power dissipation index). The default value is 'HR'. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list composed of two arrays with the same dimensions as 'atlano' and 'tropano'.
$mean |
The mean of the desired quantity. |
$var |
The variance of that quantity. |
Villarini et al. (2010) Mon Wea Rev, 138, 2681-2705.
Villarini et al. (2012) Mon Wea Rev, 140, 44-65.
Villarini et al. (2012) J Clim, 25, 625-637.
An example of how the function can be used in hurricane forecast studies
is given in
Caron, L.-P. et al. (2014) Multi-year prediction skill of Atlantic hurricane
activity in CMIP5 decadal hindcasts. Climate Dynamics, 42, 2675-2690.
doi:10.1007/s00382-013-1773-1.
# Let AtlAno represent 5 different 5-year forecasts of seasonally averaged
# Atlantic sea surface temperature anomalies.
AtlAno <- array(runif(25, -1, 1), dim = c(sdate = 5, ftime = 5))
# Let TropAno represent 5 corresponding 5-year forecasts of seasonally
# averaged tropical sea surface temperature anomalies.
TropAno <- array(runif(25, -1, 1), dim = c(sdate = 5, ftime = 5))
# The seasonal average of hurricanes for each of the five forecast years,
# for each forecast, is then given by:
hr_count <- StatSeasAtlHurr(atlano = AtlAno, tropano = TropAno, hrvar = 'HR')
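The two components of the returned list can then be read as follows (a sketch grounded in the value section above):
hr_count$mean  # expected seasonal hurricane count for each forecast year
hr_count$var   # the variance associated with that count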
The toymodel is based on the model presented in Weigel et al. (2008) QJRMS, with an extension to consider non-stationary distributions by prescribing a linear trend. The toymodel allows generating an artificial forecast based on observations provided as input (from Load) or on artificially generated observations based on the input parameters (sig, trend). The forecast can be specified for any number of start dates, lead times and ensemble members. It imitates components of a forecast: (1) predictability, (2) forecast error, (3) non-stationarity and (4) ensemble generation. The forecast can be computed for real observations or observations generated artificially.
ToyModel( alpha = 0.1, beta = 0.4, gamma = 1, sig = 1, trend = 0, nstartd = 30, nleadt = 4, nmemb = 10, obsini = NULL, fxerr = NULL )
alpha |
Predictability of the forecast on the observed residuals. Must be a scalar with 0 < alpha < 1. |
beta |
Standard deviation of the forecast error. Must be a scalar with 0 < beta < 1. |
gamma |
Factor on the linear trend to sample model uncertainty. Can be a scalar or a vector of scalars with -inf < gamma < inf. Defining a vector results in multiple forecasts, corresponding to different models with different trends. |
sig |
Standard deviation of the residual variability of the forecast. If observations are provided 'sig' is computed from the observations. |
trend |
Linear trend of the forecast. The same trend is used for each lead-time. If observations are provided the 'trend' is computed from the observations, with potentially different trends for each lead-time. The trend has no unit and needs to be defined according to the time vector [1,2,3,... nstartd]. |
nstartd |
Number of start-dates of the forecast. If observations are provided the 'nstartd' is computed from the observations. |
nleadt |
Number of lead-times of the forecasts. If observations are provided the 'nleadt' is computed from the observations. |
nmemb |
Number of members of the forecasts. |
obsini |
Observations that can be used in the synthetic forecast coming from Load (anomalies are expected). If no observations are provided, artificial observations are generated based on Gaussian variability with standard deviation from 'sig' and linear trend from 'trend'. |
fxerr |
Provides a fixed error of the forecast instead of generating one from the level of beta. This allows performing pairs of forecasts with the same conditional error, as required for instance in an attribution context. |
A list of forecasts with $mod containing the forecast and $obs the observations. The dimensions correspond to c(length(gamma), nmemb, nstartd, nleadt).
# Example 1: Generate forecast with artificial observations
# Seasonal prediction example
a <- 0.1
b <- 0.3
g <- 1
sig <- 1
t <- 0.02
ntd <- 30
nlt <- 4
nm <- 10
toyforecast <- ToyModel(alpha = a, beta = b, gamma = g, sig = sig, trend = t,
                        nstartd = ntd, nleadt = nlt, nmemb = nm)
# Example 2: Generate forecast from loaded observations
# Decadal prediction example
## Not run:
data_path <- system.file('sample_data', package = 's2dv')
expA <- list(name = 'experiment',
             path = file.path(data_path,
                              'model/$EXP_NAME$/$STORE_FREQ$_mean/$VAR_NAME$_3hourly',
                              '$VAR_NAME$_$START_DATE$.nc'))
obsX <- list(name = 'observation',
             path = file.path(data_path,
                              '$OBS_NAME$/$STORE_FREQ$_mean/$VAR_NAME$',
                              '$VAR_NAME$_$YEAR$$MONTH$.nc'))
# Now we are ready to use Load().
startDates <- c('19851101', '19901101', '19951101', '20001101', '20051101')
sampleData <- Load('tos', list(expA), list(obsX), startDates,
                   output = 'areave', latmin = 27, latmax = 48,
                   lonmin = -12, lonmax = 40)
## End(Not run)
a <- 0.1
b <- 0.3
g <- 1
nm <- 10
toyforecast <- ToyModel(alpha = a, beta = b, gamma = g, nmemb = nm,
                        obsini = sampleData$obs, nstartd = 5, nleadt = 60)
## Add PlotAno() back when this function is included!!
# \donttest{
# PlotAno(toyforecast$mod, toyforecast$obs, startDates,
#         toptitle = c("Synthetic decadal temperature prediction"),
#         fileout = "ex_toymodel.eps")
# }
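A quick dimension check against the value section above (a sketch; the numbers reflect the last call in the examples):
dim(toyforecast$mod)  # c(length(gamma), nmemb, nstartd, nleadt) = c(1, 10, 5, 60)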
The Tripole Index (TPI) for the Interdecadal Pacific Oscillation (IPO) is computed as the difference between the weighted-averaged SST anomalies over 10ºS-10ºN, 170ºE-270ºE and the mean of the weighted-averaged SST anomalies over 25ºN-45ºN, 140ºE-215ºE and 50ºS-15ºS, 150ºE-200ºE (Henley et al., 2015). If different members and/or datasets are provided, the climatology (used to calculate the anomalies) is computed individually for all of them.
TPI( data, data_lats, data_lons, type, lat_dim = "lat", lon_dim = "lon", mask = NULL, monini = 11, fmonth_dim = "fmonth", sdate_dim = "sdate", indices_for_clim = NULL, year_dim = "year", month_dim = "month", na.rm = TRUE, ncores = NULL )
data |
A numerical array to be used for the index computation with, at least, the dimensions: 1) latitude, longitude, start date and forecast month (in case of decadal predictions), 2) latitude, longitude, year and month (in case of historical simulations or observations). This data has to be provided, at least, over the whole region needed to compute the index. |
data_lats |
A numeric vector indicating the latitudes of the data. |
data_lons |
A numeric vector indicating the longitudes of the data. |
type |
A character string indicating the type of data ('dcpp' for decadal predictions, 'hist' for historical simulations, or 'obs' for observations or reanalyses). |
lat_dim |
A character string of the name of the latitude dimension. The default value is 'lat'. |
lon_dim |
A character string of the name of the longitude dimension. The default value is 'lon'. |
mask |
An array of a mask (with 0's in the grid points that have to be masked) or NULL (i.e., no mask is used). This parameter allows removing the values over land in case the dataset is a combination of surface air temperature over land and sea surface temperature over the ocean. Also, it can be used to mask those grid points that are missing in the observational dataset for a fair comparison between the forecast system and the reference dataset. The default value is NULL. |
monini |
An integer indicating the month in which the forecast system is initialized. Only used when parameter 'type' is 'dcpp'. The default value is 11, i.e., initialized in November. |
fmonth_dim |
A character string indicating the name of the forecast month dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'fmonth'. |
sdate_dim |
A character string indicating the name of the start date dimension. Only used if parameter 'type' is 'dcpp'. The default value is 'sdate'. |
indices_for_clim |
A numeric vector of the indices of the years to
compute the climatology for calculating the anomalies, or NULL so the
climatology is calculated over the whole period. If the data are already
anomalies, set it to FALSE. The default value is NULL. |
year_dim |
A character string indicating the name of the year dimension The default value is 'year'. Only used if parameter 'type' is 'hist' or 'obs'. |
month_dim |
A character string indicating the name of the month dimension. The default value is 'month'. Only used if parameter 'type' is 'hist' or 'obs'. |
na.rm |
A logical value indicanting whether to remove NA values. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A numerical array with the TPI index with the same dimensions as data except the lat_dim, lon_dim and fmonth_dim (month_dim) in case of decadal predictions (historical simulations or observations). In case of decadal predictions, a new dimension 'fyear' is added.
## Observations or reanalyses
obs <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_obs <- TPI(data = obs, data_lats = lat, data_lons = lon, type = 'obs')
## Historical simulations
hist <- array(1:100, dim = c(year = 5, lat = 19, lon = 37, month = 12, member = 5))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_hist <- TPI(data = hist, data_lats = lat, data_lons = lon, type = 'hist')
## Decadal predictions
dcpp <- array(1:100, dim = c(sdate = 5, lat = 19, lon = 37, fmonth = 24, member = 5))
lat <- seq(-90, 90, 10)
lon <- seq(0, 360, 10)
index_dcpp <- TPI(data = dcpp, data_lats = lat, data_lons = lon, type = 'dcpp', monini = 1)
Compute the linear trend or any degree of polynomial regression along the
forecast time. It returns the regression coefficients (including the intercept)
and the detrended array. The confidence intervals and p-value are also
provided if needed.
The confidence interval relies on Student's t-distribution, and the p-value
is calculated by ANOVA.
Trend( data, time_dim = "ftime", interval = 1, polydeg = 1, alpha = 0.05, conf = TRUE, pval = TRUE, sign = FALSE, ncores = NULL )
data |
A numeric array including the dimension along which the trend is computed. |
time_dim |
A character string indicating the dimension along which to compute the trend. The default value is 'ftime'. |
interval |
A positive numeric indicating the unit length between two points along 'time_dim' dimension. The default value is 1. |
polydeg |
A positive integer indicating the degree of polynomial regression. The default value is 1. |
alpha |
A numeric indicating the significance level for the statistical significance test. The default value is 0.05. |
conf |
A logical value indicating whether to retrieve the confidence intervals or not. The default value is TRUE. |
pval |
A logical value indicating whether to compute the p-value or not. The default value is TRUE. |
sign |
A logical value indicating whether to retrieve the statistical significance based on 'alpha'. The default value is FALSE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
A list containing:
$trend |
A numeric array with the first dimension 'stats', followed by the same
dimensions as parameter 'data' except the 'time_dim' dimension. The length
of the 'stats' dimension should be polydeg + 1, containing the regression
coefficients from the lowest order (i.e., the intercept) to the highest
degree. |
$conf.lower |
A numeric array with the first dimension 'stats', followed by the same
dimensions as parameter 'data' except the 'time_dim' dimension. The length
of the 'stats' dimension should be polydeg + 1. Only present if 'conf' is
TRUE. |
$conf.upper |
A numeric array with the first dimension 'stats', followed by the same
dimensions as parameter 'data' except the 'time_dim' dimension. The length
of the 'stats' dimension should be polydeg + 1. Only present if 'conf' is
TRUE. |
$p.val |
A numeric array of p-values calculated by anova(). The first dimension
'stats' is 1, followed by the same dimensions as parameter 'data' except
the 'time_dim' dimension. Only present if 'pval' is TRUE. |
$sign |
The statistical significance based on 'alpha'. Only present if 'sign' is TRUE. |
$detrended |
A numeric array with the same dimensions as paramter 'data', containing the detrended values along the 'time_dim' dimension. |
# Load sample data as in Load() example:
example(Load)
months_between_startdates <- 60
trend <- Trend(sampleData$obs, polydeg = 2, interval = months_between_startdates)
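To illustrate the 'stats' layout described in the value section, a minimal sketch (assuming polydeg + 1 regression coefficients, so 3 for polydeg = 2):
dim(trend$trend)              # the first dimension 'stats' has length 3
detrended <- trend$detrended  # same dimensions as the input data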
Interface to compute probabilistic scores (Brier Score, Brier Skill Score) from the forecast and observational data anomalies. It provides six score types to choose from.
UltimateBrier( exp, obs, dat_dim = NULL, memb_dim = "member", time_dim = "sdate", quantile = TRUE, thr = c(5/100, 95/100), type = "BS", decomposition = TRUE, ncores = NULL )
exp |
A numeric array of forecast anomalies with named dimensions that
at least include 'memb_dim', and 'time_dim'. It can be provided
by 'Ano()'. |
obs |
A numeric array of observational reference anomalies with named
dimensions that at least include 'time_dim'. If it has
'memb_dim', the length must be 1. The dimensions should be consistent with
'exp' except 'dat_dim' and 'memb_dim'. It can be provided by 'Ano()'. |
dat_dim |
A character string indicating the name of the dataset dimension in 'exp' and 'obs'. If there is no dataset dimension, set it to NULL. The default value is NULL. |
memb_dim |
A character string indicating the name of the member dimension in 'exp' (and 'obs') for ensemble mean calculation. The default value is 'member'. |
time_dim |
A character string indicating the dimension along which to compute the probabilistic scores. The default value is 'sdate'. |
quantile |
A logical value to decide whether a quantile (TRUE) or a threshold (FALSE) is used to estimate the forecast and observed probabilities. If 'type' is 'FairEnsembleBS' or 'FairEnsembleBSS', it must be TRUE. The default value is TRUE. |
thr |
A numeric vector to be used in probability calculation (for 'BS',
'FairStartDatesBS', 'BSS', and 'FairStartDatesBSS') and binary event
judgement (for 'FairEnsembleBS' and 'FairEnsembleBSS'). It is as
quantiles if 'quantile' is TRUE or as thresholds if 'quantile' is FALSE.
The default value is c(5/100, 95/100). |
type |
A character string of the desired score type. It can be one of the following six values: 'BS', 'FairEnsembleBS', 'FairStartDatesBS', 'BSS', 'FairEnsembleBSS', or 'FairStartDatesBSS'.
The default value is 'BS'. |
decomposition |
A logical value to determine whether the decomposition of the Brier Score should be provided (TRUE) or not (FALSE). It is only used when 'type' is 'BS' or 'FairStartDatesBS'. The default value is TRUE. |
ncores |
An integer indicating the number of cores to use for parallel computation. The default value is NULL. |
If 'type' is 'BS' or 'FairStartDatesBS' and 'decomposition' is TRUE, the
output is a list of 4 arrays (see details below). In other cases, the output
is an array of Brier scores or Brier skill scores. All the arrays have the
same dimensions:
c(nexp, nobs, no. of bins, the rest of the dimensions of 'exp' except
'time_dim' and 'memb_dim'). 'nexp' and 'nobs' are the lengths of the dataset
dimension in 'exp' and 'obs' respectively. If dat_dim is NULL, nexp and nobs
are omitted.
The list of 4 includes:
$bs: Brier Score
$rel: Reliability component
$res: Resolution component
$unc: Uncertainty component
sampleData$mod <- Season(sampleData$mod, monini = 11, moninf = 12, monsup = 2)
sampleData$obs <- Season(sampleData$obs, monini = 11, moninf = 12, monsup = 2)
clim <- Clim(sampleData$mod, sampleData$obs)
exp <- Ano(sampleData$mod, clim$clim_exp)
obs <- Ano(sampleData$obs, clim$clim_obs)
bs <- UltimateBrier(exp, obs, dat_dim = 'dataset')
bss <- UltimateBrier(exp, obs, type = 'BSS', dat_dim = 'dataset')
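The four components listed above follow the classic Brier score decomposition. The following sketch relates them; treat it as an approximate check, since exact closure depends on the estimator:
# Murphy's decomposition: BS = reliability - resolution + uncertainty
bs$bs
bs$rel - bs$res + bs$unc  # should be close to bs$bs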