In this example, the seasonal temperature forecasts, initialized in october, will be used to perform a downscaling in the Balearic Islands temperature using the cmcc system 3 seasonal forecasting system from the Euro-Mediterranean Center of Climate Change (CMCC), by computing Analogs in Sea level pressure data (SLP) in a larger region (North Atlantic). The first step will be to load the data we want to downscale (i.e. cmcc) in the large region (i.e North Atlantic) for temperature (predictand) and SLP (predictor) and same variables and region for a higher resolution data (ERA5). In a second step we will interpolate the model to the resolution of ERA5. In a third step we will find the analogs using one of the three criterias. In a four step we will get the downscaled dataset in the region selected (local scale, in this case Balearic Islands).
For instance if we want to perform a temperature donwscaling in Balearic Island for October we will get a daily series of temperature with 1 analog per day, the best analog. How we define the best analog for a certain day? This function offers three options for that:
The day with the minimum Euclidean distance in a large scale field: using i.e. pressure or geopotencial height as variables and North Atlantic region as large scale region. The Atmospheric circulation pattern in the North Atlantic (LargeScale) has an important role in the climate in Spain (LocalScale). The function will find the day with the most similar pattern in atmospheric circulation in the database (obs, slp in ERA5) to the day of interest (exp, slp in model). Once the date of the best analog is found, the function takes the associated temperature to that day (obsVar, tas in ERA5), with a subset of the region of interest (Balearic Island).
Same that (1) but in this case we will search for analogs in the local scale (Balearic Island) instead of in the large scale (North Atlantic). Once the date of the best analog is found, the function takes the associated temperature to that day (obsVar, t2m in ERA5), with a subset of the region of interest (Balearic Island).
Same that (2) but here we will search for analogs with higher correlation at local scale (Balearic Island) and instead of using SLP we will use t2m.
In particular the Analogs Method uses a nonlinear approach that follows (Analogs; Yiou et al. 2013).
An efficient implementation of Analogs is provided for CSTools by the
CST_Analogs()
function.
Two datasets are used to illustrate how to use the function. The first one could be enterly run by the users since it is using data samples provided along with the package. The second one uses data that needs to be downloaded or requested.
After loading CSTools package on the R session, the
user will have access to the sample data created with using
CST_Start
:
lonlat_temp_stand
lonlat_prec_st`.
*Note: If it is the first time using CSTools, install the package by
running install.packages("CSTools")
.
library(CSTools)
After exploring the data, the user can directly run the Analogs downscaling method using the ‘Large_dis’ metric:
class(lonlat_temp_st$exp)
names(lonlat_temp_st$obs)
dim(lonlat_temp_st$obs$data)
dim(lonlat_temp_st$exp$data)
lonlat_temp_st$exp$attrs$Dates
lonlat_temp_st$obs$attrs$Dates
There are 15 ensemble members available in the exp
data
set, 6 starting dates and 3 forecast times, which refer to monthly
values during 3 months following starting dates on November in the years
2000, 2001, 2002, 2003, 2004 and 2005.
exp1 <- CST_Subset(x = lonlat_temp_st$exp, along = c('sdate', 'ftime'), indices = list(1, 1))
down_1 <- CST_Analogs(expL = exp1, obsL = lonlat_temp_st$obs)
exp2 <- CST_Subset(x = lonlat_temp_st$exp, along = c('sdate', 'ftime'), indices = list(1, 2))
down_2 <- CST_Analogs(expL = exp2, obsL = lonlat_temp_st$obs)
exp3 <- CST_Subset(x = lonlat_temp_st$exp, along = c('sdate', 'ftime'), indices = list(1, 3))
down_3 <- CST_Analogs(expL = exp3, obsL = lonlat_temp_st$obs)
The visualization of the first three time steps for the ensemble mean of the forecast initialized the 1st of Noveber 2000 can be done using the package s2dv:
library(s2dv)
var = list(MeanDims(down_1$data, 'member'),
MeanDims(down_2$data, 'member'),
MeanDims(down_3$data, 'member'))
PlotLayout(PlotEquiMap, c('lat', 'lon'),
var = var,
nrow = 1, ncol = 3,
lon = down_1$coords$lon,
lat = down_1$coords$lat,
filled.continents = FALSE,
titles = c("2000-11-01", "2000-12-01", "2001-01-01"), units = 'T(K)',
toptitle = 'Analogs sdate November 2000',
width = 10, height = 4)
The user can also request extra Analogs and the information:
down <- CST_Analogs(expL = exp1, obsL = lonlat_temp_st$obs,
nAnalogs = 2, AnalogsInfo = TRUE)
Again, the user can explore the object down1 which is class ‘s2dv_cube’. The element ‘data’ contains in this case metrics and the dates corresponding to the observed field:
class(down)
names(down$data)
dim(down$data$fields)
dim(down$data$metric)
dim(down$data$dates)
down$data$dates[1,15]
The last command run concludes that the best analog of the ensemble 15 corresponding to the 1st of November 2000 is the 1st November 2004:
PlotLayout(PlotEquiMap, c('lat', 'lon'), var = list(down$data$fields[1, , , 15],
lonlat_temp_st$obs$data[1, 1, 5, 1, , ]), nrow = 1, ncol = 2,
lon = down$coords$lon, lat = down$coords$lat, filled.continents = FALSE,
titles = c("Downscaled 2000-11-01", "Observed 2004-11-01"), units = 'T(K)',
width = 7, height = 4)
As expected, they are exatly the same.
In this case, the spatial field of a single forecast day will be
downscale using Analogs in this example. This will allow illustrating
how to use CST_Start
to retrieve observations separated
from simulations. To explore other options, see other CSTools vignettes
as well as CST_Start
documentation and startR package.
The simulations available for the desired model cover the period 1993-2016. Here, the 15th of October 2000 (for the simulation initialized in the 1st of October 2000), will be downscaled. For ERA5 from 1979 to the present days. For this example we will just use October days from 2000 to 2006, so, the starting dates can be defined by running the following lines:
start <- as.Date(paste(2000, 10, "01", sep = ""), "%Y%m%d")
end <- as.Date(paste(2006, 10, "01", sep = ""), "%Y%m%d")
dates <- as.POSIXct(seq(start, end, by = "year"), format = '%Y%m%d', 'UTC')
Using the CST_Start
function from CSTools
package, the data available in our data store can be loaded.
The following lines show how this function can be used. The experimental
datasets are interpolated to the ERA5 grid by specifying the ‘grid’
parameter while ERA5 doesn’t need to be interpolated. While dimension
‘time’ is set to 1 for the experimental dataset, it is set to 31 for the
observations, returning the daily observations for October for the years
requested in ‘sdate’ (2000-2006). Download the data to run the recipe
under the HTTPS:
downloads.cmcc.bo.it/d_chaves/ANALOGS/data_for_Analogs.Rdat or ask
carmen.alvarez-castro at cmcc.it or nuria.perez at bsc.es.
exp_path <- paste0('/esarchive/exp/ecmwf/system4_m1/daily_mean/',
'$var$_f6h/$var$_$sdate$.nc')
obs_path <- paste0('/esarchive/recon/ecmwf/era5/daily_mean/',
'$var$_f1h-r1440x721cds/$var$_$sdate$.nc')
date_exp <- '20001001'
lonmax <- 50
lonmin <- -80
latmax <- 70
latmin <- 22
expTAS <- CST_Start(dataset = exp_path,
var = 'tas',
member = startR::indices(1:15),
sdate = '20001001',
ftime = startR::indices(15),
lat = startR::values(list(latmin, latmax)),
lat_reorder = startR::Sort(decreasing = TRUE),
lon = startR::values(list(lonmin, lonmax)),
lon_reorder = startR::CircularSort(0, 360),
synonims = list(lon = c('lon', 'longitude'),
lat = c('lat', 'latitude'),
member = c('member', 'ensemble'),
ftime = c('ftime', 'time')),
transform = startR::CDORemapper,
transform_params = list(grid = 'r1440x721',
method = 'bilinear'),
transform_vars = c('lat', 'lon'),
return_vars = list(lat = NULL,
lon = NULL, ftime = 'sdate'),
retrieve = TRUE)
expPSL <- CST_Start(dataset = exp_path,
var = 'psl',
member = startR::indices(1:15),
sdate = '20001001',
ftime = startR::indices(15),
lat = startR::values(list(latmin, latmax)),
lat_reorder = startR::Sort(decreasing = TRUE),
lon = startR::values(list(lonmin, lonmax)),
lon_reorder = startR::CircularSort(0, 360),
synonims = list(lon = c('lon', 'longitude'),
lat = c('lat', 'latitude'),
member = c('member', 'ensemble'),
ftime = c('ftime', 'time')),
transform = startR::CDORemapper,
transform_params = list(grid = 'r1440x721',
method = 'bilinear'),
transform_vars = c('lat', 'lon'),
return_vars = list(lat = NULL,
lon = NULL, ftime = 'sdate'),
retrieve = TRUE)
obsTAS <- CST_Start(dataset = obs_path,
var = 'tas',
sdate = unique(format(dates, '%Y%m')),
ftime = startR::indices(1:31),
lat = startR::values(list(latmin, latmax)),
lat_reorder = startR::Sort(decreasing = TRUE),
lon = startR::values(list(lonmin, lonmax)),
lon_reorder = startR::CircularSort(0, 360),
synonims = list(lon = c('lon', 'longitude'),
lat = c('lat', 'latitude'),
ftime = c('ftime', 'time')),
return_vars = list(lat = NULL,
lon = NULL, ftime = 'sdate'),
retrieve = TRUE)
obsPSL <- CST_Start(dataset = obs_path,
var = 'psl',
sdate = unique(format(dates, '%Y%m')),
ftime = startR::indices(1:31),
lat = startR::values(list(latmin, latmax)),
lat_reorder = startR::Sort(decreasing = TRUE),
lon = startR::values(list(lonmin, lonmax)),
lon_reorder = startR::CircularSort(0, 360),
synonims = list(lon = c('lon', 'longitude'),
lat = c('lat', 'latitude'),
ftime = c('ftime', 'time')),
return_vars = list(lat = NULL,
lon = NULL, ftime = 'sdate'),
retrieve = TRUE)
The ‘s2dv_cube’ objects expTAS
,obsTAS
,
expPSL
and obsPSL
are now loaded in the R
environment. The first two elements correspond to the experimental and
observed data for temperature and the other two are the equivalent for
the SLP data.
Loading the data using CST_Start
allows to obtain two
lists, one for the experimental data and another for the observe data,
with the same elements and compatible dimensions of the data
element:
dim(expTAS$data)
# dataset var member sdate ftime lat lon
# 1 1 15 1 1 193 521
dim(obsTAS$data)
# dataset var sdate ftime lat lon
# 1 1 7 31 193 521
The aim is to downscale the temperature field of the simulation for the 15th of October 2000 but looking at the pressure pattern:
down1 <- CST_Analogs(expL = expPSL, obsL = obsPSL, AnalogsInfo = TRUE,
criteria = "Large_dist", nAnalogs = 3,
obsVar = obsTAS, expVar = expTAS)
Some warnings could appear indicating information about undefining
parameters. It is possible to explore the information in object
down
by runing:
names(down1$data)
dim(down1$data$field)
# nAnalogs lat lon member
# 3 193 521 15
dim(down1$data$dates)
# nAnalogs member
# 3 15
down1$data$dates[1,1]
# "07-10-2005"
Now, we can visualize the output:
PlotLayout(PlotEquiMap, c('lat', 'lon'),
var = list(expPSL$data[1, 1, 1, 1, 1, , ], obsPSL$data[1, 1, 1, 15, , ],
obsPSL$data[1, 1, 6, 7, , ]), lon = obsPSL$coords$lon,
lat = obsPSL$coords$lat, filled.continents = FALSE,
titles = c('Exp PSL 15-10-2000','Obs PSL 15-10-2000',
'Obs PSL 7-10-2005'),
toptitle = 'First member', ncol = 3, nrow = 1, width = 10, height = 4)
PlotLayout(PlotEquiMap, c('lat', 'lon'), var = list(
expTAS$data[1, 1, 1, 1, 1, , ], obsTAS$data[1, 1, 1, 15, , ],
down1$data$field[1, , , 1], obsTAS$data[1, 1, 6, 7, , ]),
lon = obsTAS$coords$lon, lat = obsTAS$coords$lat, filled.continents = FALSE,
titles = c('Exp TAS 15-10-2000', 'Obs TAS 15-10-2000',
'Analog TAS 15-10-2000', 'Obs TAS 7-10-2005'),
ncol = 2, nrow = 2)
The previous figure, shows the PSL inputs and the PSL pattern for the PSL the 7th of October, 2005, which is the best analog. Note: Analogs automatically exclude the day is being downscaled from the observations.
The next figure shows the input temperature fields, and the result analog which corresponds to the temperature of the 7th of October, 2005:
The aim is to downscale the temperature simulation of the 15th of October 2000, by considering the pressure spatial pattern n the large scale and the local pressure pattern on a given region. Therefore, a region is defined providing maximum and minimum latitude and longitude coordinates, in this case, selecting the Balearic Islands:
region <- c(lonmin = 0, lonmax = 5, latmin = 38.5, latmax = 40.5)
expPSL$data <- expPSL$data[1, 1, 1, 1, 1, , ]
expTAS$data <- expTAS$data[1, 1, 1, 1, 1, , ]
down2 <- CST_Analogs(expL = expPSL, obsL = obsPSL, AnalogsInfo = TRUE,
criteria = "Local_dist", # nAnalogs = 50,
obsVar = obsTAS, expVar = expTAS,
region = region)
The parameter ‘nAnalogs’ doesn’t correspond to the number of Analogs returned, but to the number of the best observations to use in the comparison between large and local scale.
In this case, when looking to a large scale pattern and also to local scale pattern the best analog for the first member is the 7th of October 2001:
down2$data$dates[2]
# "13-10-2001"
library(ClimProjDiags)
var = list(expTAS$data, obsTAS$data[1, 1, 1, 15, , ],
down2$data$field[1, , ], SelBox(obsTAS$data[1, 1, 2, 13, , ],
lon = as.vector(obsTAS$coords$lon),
lat = as.vector(obsTAS$coords$lat),
region, londim = 'lon', latdim = 'lat')$data)
PlotLayout(PlotEquiMap, c('lat', 'lon'), var = var,
special_args = list(list(lon = expTAS$coords$lon, lat = expTAS$coords$lat),
list(lon = obsTAS$coords$lon, lat = obsTAS$coords$lat),
list(lon = down2$coords$lon, down2$coords$lat),
list(lon = down2$coords$lon, down2$coords$lat)),
filled.continents = FALSE,
titles = c('Exp TAS 15-10-2000', 'Obs TAS 15-10-2000',
'Analog TAS 15-10-2000', 'Obs TAS 13-10-2001'),
ncol = 2, nrow = 2)
Previous figure shows that the best Analog field corrspond to the observed field on the 13th of October 2001.
down3 <- CST_Analogs(expL = expPSL, obsL = obsPSL, AnalogsInfo = TRUE,
criteria = "Local_cor", # nAnalogs = 50,
obsVar = obsTAS, expVar = expTAS,
region = region)
In this case, when looking to a large scale pattern and also to local scale pattern the best analog for the first member is the 10th of October 2001:
down3$data$dates[3]
# [1] "10-10-2001"
var = list(down3$data$field[1, , ],
SelBox(obsTAS$data[1, 1, 2, 10, , ],
lon = as.vector(obsTAS$coords$lon),
lat = as.vector(obsTAS$coords$lat),
region, londim = 'lon', latdim = 'lat')$data)
PlotLayout(PlotEquiMap, c('lat', 'lon'), var = var,
lon = down3$coords$lon, lat = down3$coords$lat,
filled.continents = FALSE,
titles = c('Analog TAS 15-10-2000', 'Obs TAS 10-10-2001'),
ncol = 2, nrow = 1)
Previous figure shows that the best Analog field corrspond to the observed field on the 10th of October 2001.
ExludeTime
is set by default to Time_expL in order to
find the same analog than the day of interest. If there is some interest
in excluding other dates should be included in the argument
‘excludeTime’.
down4 <- CST_Analogs(expL = expPSL, obsL = obsPSL, AnalogsInfo = TRUE,
criteria = "Large_dist", nAnalogs = 20,
obsVar = obsTAS, expVar = expTAS,
region = region, excludeTime = obsPSL$attrs$Dates[10:20])
In this case, the best analog is still being 7th of October, 2005.
Note: You can compute the anomalies values before applying the
criterias (as in Yiou et al, 2013) using CST_Anomaly
of
CSTools package