Title: | Modelling Relative Sea Level Data |
---|---|
Description: | The Bayesian modelling of relative sea-level data using a comprehensive approach that incorporates various statistical models within a unifying framework. Details regarding each statistical models; linear regression (Ashe et al 2019) <doi:10.1016/j.quascirev.2018.10.032>, change point models (Cahill et al 2015) <doi:10.1088/1748-9326/10/8/084002>, integrated Gaussian process models (Cahill et al 2015) <doi:10.1214/15-AOAS824>, temporal splines (Upton et al 2023) <arXiv:2301.09556>, spatio-temporal splines (Upton et al 2023) <arXiv:2301.09556> and generalised additive models (Upton et al 2023) <arXiv:2301.09556>. This package facilitates data loading, model fitting and result summarisation. Notably, it accommodates the inherent measurement errors found in relative sea-level data across multiple dimensions, allowing for their inclusion in the statistical models. |
Authors: | Maeve Upton [cph, aut, cre] , Andrew Parnell [aut], Niamh Cahill [aut] |
Maintainer: | Maeve Upton <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-12-23 06:51:07 UTC |
Source: | CRAN |
Cross validation check for spline in time, spline in space time and GAM in order to select the most appropriate number of knots when creating basis functions.
cross_val_check( data, prediction_grid_res = 50, spline_nseg = NULL, spline_nseg_t = 20, spline_nseg_st = 6, n_iterations = 1000, n_burnin = 100, n_thin = 5, n_chains = 2, model_type, n_fold = 5, seed = NULL, CI = 0.95 )
cross_val_check( data, prediction_grid_res = 50, spline_nseg = NULL, spline_nseg_t = 20, spline_nseg_st = 6, n_iterations = 1000, n_burnin = 100, n_thin = 5, n_chains = 2, model_type, n_fold = 5, seed = NULL, CI = 0.95 )
data |
Raw input data |
prediction_grid_res |
Resolution of grid. Predictions over every 50 years(default) can vary based on user preference, as larger values will reduce computational run time. |
spline_nseg |
This setting is focused on the Noisy Input Spline model. It provides the number of segments used to create basis functions. |
spline_nseg_t |
This setting is focused on the Noisy Input Generalised Additive Model. It provides the number of segments used to create basis functions. |
spline_nseg_st |
This setting is focused on the Noisy Input Generalised Additive Model. It provides the number of segments used to create basis functions. |
n_iterations |
Number of iterations. Increasing this value will increase the computational run time. |
n_burnin |
Size of burn-in. This number removes a certain number of samples at the beginning. |
n_thin |
Amount of thinning. |
n_chains |
Number of MCMC chains. The number of times the model will be run. |
model_type |
The user selects their statistical model type. The user can select a Noisy Input Spline in Time using "ni_spline_t". The user can select a Noisy Input Spline in Space Time using "ni_spline_st". The user can select a Noisy Input Generalised Additive Model using "ni_gam_decomp". |
n_fold |
Number of folds required in the cross validation. The default is 5 fold cross validation. |
seed |
If the user wants reproducible results, seed stores the output when random selection was used in the creation of the cross validation. |
CI |
Size of the credible interval required by the user. The default is 0.95 corresponding to 95%. |
A list containing the model comparison measures, e.g. Root Mean Square Error (RMSE), and plot of true vs predicted values
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") cross_val_check(data = data, model_type = "ni_spline_t",n_fold = 2)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") cross_val_check(data = data, model_type = "ni_spline_t",n_fold = 2)
An example dataset for 22 proxy sites along the Atlantic coast of North America
NAACproxydata
NAACproxydata
A data frame with 1715 rows and 8 columns:
All regions along Atlantic coast of North America
All sites along Atlantic coast of North America
Latitude of the data site
Longitude of the data site
Relative Sea level in meters
Error in RSL
Age in years CE
Error in Age
Check out the vignettes to see the sources
In this function, the raw data is plotted with grey uncertainty boxes representing the uncertainty associated with the input and the output. The function allows the user to plot the proxy record data and tide gauge data together or separately.
## S3 method for class 'reslr_input' plot( x, title = "", xlab = "Year (CE)", ylab = "Relative Sea Level (m)", plot_tide_gauges = FALSE, plot_proxy_records = TRUE, plot_caption = TRUE, ... )
## S3 method for class 'reslr_input' plot( x, title = "", xlab = "Year (CE)", ylab = "Relative Sea Level (m)", plot_tide_gauges = FALSE, plot_proxy_records = TRUE, plot_caption = TRUE, ... )
x |
An object created via the function |
title |
Title of the plot |
xlab |
Labeling the x-axis |
ylab |
Labeling the y-axis |
plot_tide_gauges |
Plotting the tide gauge data with the proxy records |
plot_proxy_records |
Plotting the proxy records on their own and this is the default |
plot_caption |
Plotting an informed caption with the number of tide gauges and proxy sites. |
... |
Not used |
Plot of the raw data with the measurement uncertainty.
full_dataset <- reslr_load(NAACproxydata) plot(full_dataset)
full_dataset <- reslr_load(NAACproxydata) plot(full_dataset)
reslr_mcmc
function.Depending on the model chosen in the reslr_mcmc
function, the package produces a range of output plots.
Total posterior model fit plot with the raw data and measurement uncertainty are created for each statistical model.
The rate of change plots are created for the EIV IGP and the NI spline regression models.
For the NI GAM decomposition, each individual component of the model is plotted. Also, the regional and the non-linear local component, an associated rate plot is produced.
If tide gauges are used in the model, the user has the ability plot the output with or without this additional data source.
## S3 method for class 'reslr_output' plot( x, plot_proxy_records = TRUE, plot_tide_gauges = FALSE, title = "", plot_type = c("model_fit_plot"), plot_caption = TRUE, xlab = "Year (CE)", ylab = "Relative Sea Level (m)", y_rate_lab = "Rate of change (mm per year)", ... )
## S3 method for class 'reslr_output' plot( x, plot_proxy_records = TRUE, plot_tide_gauges = FALSE, title = "", plot_type = c("model_fit_plot"), plot_caption = TRUE, xlab = "Year (CE)", ylab = "Relative Sea Level (m)", y_rate_lab = "Rate of change (mm per year)", ... )
x |
An object of class |
plot_proxy_records |
Plotting the proxy records on their own and this is the default |
plot_tide_gauges |
Plotting the tide gauge data as well as proxy data |
title |
Plotting a title on the output plots |
plot_type |
The user can select the type of output plot they require from the following: "rate_plot", "model_fit_plot", "regional_plot", "regional_rate_plot", "linear_local_plot", "non_linear_local_plot", "non_linear_local_rate_plot", "nigam_component_plot" |
plot_caption |
Plotting an informed caption with the number of tide gauges and proxy sites. |
xlab |
Labeling the x-axis |
ylab |
Labeling the y-axis |
y_rate_lab |
Labeling the y-axis for rate of change plots |
... |
Not used |
Plot of model fit and the rate of change depending on the statistical model in question.
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") x <- reslr_load(data = data) jags_output <- reslr_mcmc(x, model_type = "eiv_slr_t") plot(x = jags_output)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") x <- reslr_load(data = data) jags_output <- reslr_mcmc(x, model_type = "eiv_slr_t") plot(x = jags_output)
reslr_load
function.In this function, the reslr input object is printed. This is a high-level summary which provides the number of observations and the number of sites utilised in the dataset.
## S3 method for class 'reslr_input' print(x, ...)
## S3 method for class 'reslr_input' print(x, ...)
x |
An object of class |
... |
Other arguments (not supported) |
A neat presentation of your input reslr object
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") reslr_input <- reslr_load(data = data) print(x = reslr_input)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") reslr_input <- reslr_load(data = data) print(x = reslr_input)
reslr_mcmc
function.This will be very high level printing that the user can use to obtain information about the MCMC run using JAGS. The number of iterations and chains used by the user is printed In addition, the type of statistical model is printed.
## S3 method for class 'reslr_output' print(x, ...)
## S3 method for class 'reslr_output' print(x, ...)
x |
An object of class |
... |
Other arguments (not supported) |
Returns high level information about the reslr_output object, i.e. the number of iterations and chains used.
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) jags_output <- reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t") print(x = jags_output)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) jags_output <- reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t") print(x = jags_output)
reslr
packageIn this function, the data provided by the user is loaded into the package. The prerequisites of the input data structure has been given in the vignettes. The user can choose to include data from tide gauges which is sourced from the Permanent Service for Mean Sea Level (PSMSL) online database.
reslr_load( data, prediction_grid_res = 50, include_tide_gauge = FALSE, include_linear_rate = FALSE, list_preferred_TGs = NULL, TG_minimum_dist_proxy = FALSE, all_TG_1deg = FALSE, input_age_type = "CE", sediment_average_TG = 10, detrend_data = FALSE, core_col_year = NULL, cross_val = FALSE, test_set )
reslr_load( data, prediction_grid_res = 50, include_tide_gauge = FALSE, include_linear_rate = FALSE, list_preferred_TGs = NULL, TG_minimum_dist_proxy = FALSE, all_TG_1deg = FALSE, input_age_type = "CE", sediment_average_TG = 10, detrend_data = FALSE, core_col_year = NULL, cross_val = FALSE, test_set )
data |
The input data as a dataframe. |
prediction_grid_res |
Resolution of grid. Predictions over every 50 years(default) can vary based on user preference, as larger values will reduce computational run time. |
include_tide_gauge |
Including tide gauge data from PSMSL website. The tide gauge data undergo a cleaning process in this function where flagged stations are removed as recommended by the online database. Next, the tide gauge data is averaged over period defined by sediment_average_TG which default is 10 years corresponding to accumulation rates of proxy records. Then, the user selects their preferred tide gauge based on three criteria: 1.nearest tide gauge to the proxy site; 2. User supplies a list of names of preferred tide gauges; 3. all tide gauges within 1 degree are chosen. |
include_linear_rate |
User decides to include linear_rate and linear_rate_err associated. This relates to linear rate which corresponds to an important physical process that impacts sea level observations which is glacial isostatic adjustment (GIA). For the linear_rate and its associated linear_rate_err, the user can provide these values as additional columns in the input dataframe. If they prefer, the package will calculate the linear_rate and the linear_rate_err using the data. |
list_preferred_TGs |
The user can supply the name or names of the preferred tide gauges from the PSMSL database. |
TG_minimum_dist_proxy |
The package finds the tide gauge closest to the proxy site |
all_TG_1deg |
The package finds all tide gauges within 1 degree of the proxy site |
input_age_type |
The inputted age in years "CE" or year "BP". The default is "CE" and is the preferred structure of the package. The package has the ability to use Before Present ("BP") observations. |
sediment_average_TG |
Average the tide gauge data to make it comparable to accumulation rates of proxy records. The default averaging period for tide gauges is 10 years and the user can alter this. |
detrend_data |
Detrend the data using the linear rate provided to remove this component. |
core_col_year |
The year the sediment core was collected in order to the data to be detrended. |
cross_val |
For the spline in time, spline in space time and the NIGAM the user can undertake cross validation to examine the |
test_set |
The test set dataframe for cross validation |
A list containing data frame of data and prediction grid. The output of this function is two data frames, one with the data and one with the data_grid which represent a grid with evenly spaced time points. If tide gauge data is used, an ID column is included in the two output dataframes displaying the data source, "ProxyRecord" or "TideGaugeData".
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") reslr_load(data = data)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") reslr_load(data = data)
reslr_input
object through the main reslr Markov chain Monte Carlo (MCMC) function using a chosen statistical modelIn this function, a variety of statistical models can be run depending on the requirements of the user. All models are written within a Bayesian framework and use JAGS (Just Another Gibbs Sampler) to calculate Markov Chain Monte Carlo (MCMC) simulation to obtain estimates of unknown parameters. The user has the ability to alter the number of iterations, the number of burnin, the number of chains and the thinning. These options relate to the amount of MCMC simulations required and should be reviewed by the user to ensure model convergence is achieved without excessively long run times. The user chooses their 'model_type' and as a range of models to choose from.
reslr_mcmc( input_data, model_type, n_cp = 1, igp_smooth = 0.2, n_iterations = 5000, n_burnin = 1000, n_thin = 4, n_chains = 3, CI = 0.95, spline_nseg = NULL, spline_nseg_t = 20, spline_nseg_st = 6 )
reslr_mcmc( input_data, model_type, n_cp = 1, igp_smooth = 0.2, n_iterations = 5000, n_burnin = 1000, n_thin = 4, n_chains = 3, CI = 0.95, spline_nseg = NULL, spline_nseg_t = 20, spline_nseg_st = 6 )
input_data |
Input data from the |
model_type |
The user selects their statistical model type. The user can select a Errors in Variable Simple Linear Regression using "eiv_slr_t". The user can select a Errors in Variable Change Point Regression using "eiv_cp_t". The user can select a Errors in Variable Integrated Gaussian Process using "eiv_igp_t". The user can select a Noisy Input Spline in Time using "ni_spline_t". The user can select a Noisy Input Spline in Space Time using "ni_spline_st". The user can select a Noisy Input Generalised Additive Model using "ni_gam_decomp". |
n_cp |
This setting is focused on the Errors in Variables Change Point model. The user can select the number of change points 1,2 or 3. |
igp_smooth |
This setting is focused on the Errors in Variables Integrated Gaussian Process model. It informs the prior for the smoothness (correlation) parameter if model = "igp" is chosen. Choose a value between 0 and 1. Closer to 1 will increase smoothness. |
n_iterations |
Number of iterations. Increasing this value will increase the computational run time. |
n_burnin |
Size of burn-in. This number removes a certain number of samples at the beginning. |
n_thin |
Amount of thinning. |
n_chains |
Number of MCMC chains. The number of times the model will be run. |
CI |
Size of the credible interval required by the user. The default is 0.95 corresponding to 95%. |
spline_nseg |
This setting is focused on the Noisy Input Spline model. It provides the number of segments used to create basis functions. |
spline_nseg_t |
This setting is focused on the Noisy Input Generalised Additive Model. It provides the number of segments used to create basis functions. |
spline_nseg_st |
This setting is focused on the Noisy Input Generalised Additive Model. It provides the number of segments used to create basis functions. |
A list containing the input data, the JAGS output and output dataframes used for final plots.
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t")
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t")
reslr_mcmc
.A warning message will appear if the model has not been converge. If this appears the user is recommended to re-run the model and alter the reslr_mcmc
function default iteration and MCMC settings. Also, it provides high-level summaries of the estimated parameters.
## S3 method for class 'reslr_output' summary(object, ...)
## S3 method for class 'reslr_output' summary(object, ...)
object |
Output object from the |
... |
Not in use |
A list containing convergence diagnostics and parameter estimates for the output.
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) jags_output <- reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t") summary(object = jags_output)
data <- NAACproxydata %>% dplyr::filter(Site == "Cedar Island") input_data <- reslr_load(data = data) jags_output <- reslr_mcmc(input_data = input_data, model_type = "eiv_slr_t") summary(object = jags_output)