| Title: | Non-Stationary Multivariate (Copula-Based) Framework, Hydrological Applications |
|---|---|
| Description: | To account for non-stationary multivariate data, this package implements the framework including copula and marginal distributions. In addition to modeling and parameter estimations, it allows the computation and visualization of multivariate quantile curves for given events. This package is useful for a variety of disciplines such as finance, climatology and particularly for hydrological applications, where dependence structures and marginal parameters may vary over time. This framework, based on Chebana & Ouarda (2021) <doi:10.1016/j.jhydrol.2020.125907>, integrates both multivariate and non-stationary aspects to be more accurate (e.g. for risk assessment) and more realistic (e.g. considering climate changes). |
| Authors: | Dorsaf Goutali [aut, cre], Fateh Chebana [aut] |
| Maintainer: | Dorsaf Goutali <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.2 |
| Built: | 2026-05-15 09:00:50 UTC |
| Source: | https://github.com/cran/NSMM |
Fitting of a Clayton class copula using Pseudo-maximum Likelihood, the parameters
are estimated as stationary, linear, or quadratic, relating to the covariate ()
.
It returns a data frame with the coefficients of each estimated model, and its corresponding AIC (or AICc), BIC and negative logarithmic pseudo-likelihood. It also returns a 'NScopula' object with filled in attributes according to the best model, selected on minimizing AIC or AICc.
claytonEstimation(data, covar = NULL)claytonEstimation(data, covar = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
The bivariate Clayton copula function is given by:
where and are the pseudo-observations of the data series and
is the copula parameter.
To measure the fit of the models two measures are used the AIC and BIC. In case the series contains less than 25 observations the small-sample corrected AIC (AICc) is to be used.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
A 'NScopula' object containing the parameters of the best fitted model. NAs are used when a model doesn't take the coefficient.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model.
Clayton, David G. (1978). "A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence". Biometrika. 65 (1): 141–151.
# Load sample data data("SampData") # Estimate parameters for Clayton copula temp <- claytonEstimation(SampData) temp$all_fits# Load sample data data("SampData") # Estimate parameters for Clayton copula temp <- claytonEstimation(SampData) temp$all_fits
This function uses Maximum Likelihood Estimation to fit to a given data series, the parameters, stationary or parametric, of the following copulas. Gumbel, Galambos, Hüsler-Reiss, Joe, and Clayton. It returns a 'NScopula' object with filled in attributes according to the best model selected on minimizing AIC (or AICc).
Also returns a table of fits with the negative log-likelihood, AIC (or small-sample corrected AIC for series with less than 25 observations), BIC, and estimated parameters for each distribution. If declared, saves to an Excel workspace said table.
copulaNSestimation(data, covar = NULL, file = NULL)copulaNSestimation(data, covar = NULL, file = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
file |
String to indicate name of Excel file to save as.
Default is |
Chebana and Ouarda (2021) consider three copula classes due to their applications in the field of hydrology; logistic Gumbel, Galambos, and Hüsler-Reiss copulas. In addition the Clayton and Joe copulas are also implemented as alternatives.
All copulas depend depend on the parameter, should be
bigger than 1 for the Gumbel and Joe copulas, for the Clayton copula bigger than -1,
and bigger than 0 for the rest.
To ensure this holds the following transformations are applied when fitting the models,
(Gumbel and Joe), (Clayton),
and (Galambos and Hüsler-Reiss). This results in the following expressions of the parameter that
allow for variations in stationariety:
where are the coefficients to be estimated and the covariate.
To estimate the coefficient the Maximum Pseudo-likelihood (MLP) is preferred.
It is based on the pseudo-observations rather than the original.
where represents the density function corresponding to the copula
.
To assess the fit of the models two metrics are to be used, the Akaike Information Criterion and the Bayesian Information Criterion. The one with the lowest AIC is selected as the best.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model (All 'NAs' in the data frame indicate that the model doesn't take said parameter).
'NScopula' object containing the parameters of the best fitted model.
If declared, '.xlsx' file with the evaluation results.
Fateh Chebana, Taha B.M.J. Ouarda, 2021, Multivariate non-stationary hydrological frequency analysis, Journal of Hydrology, Volume 593, doi:10.1016/j.jhydrol.2020.125907.
gumbelEstimation, galambosEstimation, huslerReissEstimation, claytonEstimation
copula: a comprehensive package that allows for copula estimation, analysis, and distribution.
# Load sample data data("SampData") # Estimate parameters for different copula classes temp <- copulaNSestimation(SampData, covar = NULL, file = NULL) temp$all_fits # Get 'NScopula' object with the best fit copula <- temp$best# Load sample data data("SampData") # Estimate parameters for different copula classes temp <- copulaNSestimation(SampData, covar = NULL, file = NULL) temp$all_fits # Get 'NScopula' object with the best fit copula <- temp$best
Fitting of a Galambos class copula using Pseudo-maximum Likelihood, the parameters
are estimated as stationary, linear, or quadratic, relating to the covariate ()
.
It returns a data frame with the coefficients of each estimated model, and its corresponding AIC (or AICc), BIC and negative log-pseudo-likelihood. It also returns a 'NScopula' object with filled in attributes according to the best model, selected on minimizing AIC or AICc.
galambosEstimation(data, covar = NULL)galambosEstimation(data, covar = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
The bivariate Galambos copula function is given by:
where and are the pseudo-observations of the data series and
is the copula parameter.
To measure the fit of the models two measures are used the AIC and BIC. In case the series contains less than 25 observations the small-sample corrected AIC (AICc) is to be used.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
A 'NScopula' object containing the parameters of the best fitted model. NAs are used when a model doesn't take the coefficient.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model.
Galambos, J., 1975. Order statistics of samples from multivariate distributions. J. Am. Stat. Assoc. 70 (351), 674–680.
# Load sample data data("SampData") # Estimate parameters for Galambos copula temp <- galambosEstimation(SampData) temp$all_fits# Load sample data data("SampData") # Estimate parameters for Galambos copula temp <- galambosEstimation(SampData) temp$all_fits
Fitting of a Gumbel class copula using Pseudo-maximum Likelihood, the parameters
are estimated as stationary, linear, or quadratic, relating to the covariate ()
.
It returns a data frame with the coefficients of each estimated model, and its corresponding AIC (or AICc), BIC and negative log-pseudo-likelihood. It also returns a 'NScopula' object with filled in attributes according to the best model, selected on minimizing AIC or AICc.
gumbelEstimation(data, covar = NULL)gumbelEstimation(data, covar = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
The bivariate Gumbel copula function is given by:
where and are the pseudo-observations of the data series and
is the copula parameter.
To measure the fit of the models two measures are used, the AIC and BIC. In case the series contains less than 25 observations the small-sample corrected AIC (AICc) is to be used.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
A 'NScopula' object containing the parameters of the best fitted model. NAs are used when a model doesn't take the coefficient.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model.
# Load sample data data("SampData") # Estimate parameters for Gumbel copula temp <- gumbelEstimation(SampData) temp$all_fits# Load sample data data("SampData") # Estimate parameters for Gumbel copula temp <- gumbelEstimation(SampData) temp$all_fits
Fitting of a Joe class copula using Pseudo-maximum Likelihood, the parameters
are estimated as stationary, linear, or quadratic, relating to the covariate ()
.
It returns a data frame with the coefficients of each estimated model, and its corresponding AIC (or AICc), BIC and negative log-pseudo-likelihood. It also returns a 'NScopula' object with filled in attributes according to the best model, selected on minimizing AIC or AICc.
huslerReissEstimation(data, covar = NULL)huslerReissEstimation(data, covar = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
The Hüsler-Reiss copula function is given by:
where is the cumulative function of the standard normal distribution,
and the pseudo-observations of the data series, and
the copula parameter.
To measure the fit of the models two measures are used the AIC and BIC. In case the series contains less than 25 observations the small-sample corrected AIC (AICc) is to be used.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
A 'NScopula' object containing the parameters of the best fitted model. NAs are used when a model doesn't take the coefficient.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model.
Hüsler, J., Reiss, R.-D., 1989. Maxima of normal random vectors: between independence and complete dependence. Statistics Probability Lett. 7 (4), 283–286.
# Load sample data data("SampData") # Estimate parameters for Husler-Reiss copula temp <- huslerReissEstimation(SampData) temp$all_fits# Load sample data data("SampData") # Estimate parameters for Husler-Reiss copula temp <- huslerReissEstimation(SampData) temp$all_fits
Fitting of a Joe class copula using Pseudo-maximum Likelihood, the parameters
are estimated as stationary, linear, or quadratic, relating to the covariate ()
.
It returns a data frame with the coefficients of each estimated model, and its corresponding AIC (or AICc), BIC and negative log-pseudo-likelihood. It also returns a 'NScopula' object with filled in attributes according to the best model, selected on minimizing AIC or AICc.
joeEstimation(data, covar = NULL)joeEstimation(data, covar = NULL)
data |
Numeric matrix of dimensions |
covar |
Numeric vector of length |
The Joe copula function is given by:
where and are the data series pseudo-observations, and
is the copula parameter.
To measure the fit of the models two measures are used the AIC and BIC. In case the series contains less than 25 observations the small-sample corrected AIC (AICc) is to be used.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
A 'NScopula' object containing the parameters of the best fitted model. NAs are used when a model doesn't take the coefficient.
Data frame containing the Negative Log-likelihood, AIC, BIC, and the estimated parameter coefficients for each model.
# Load sample data data("SampData") # Estimate parameters for Joe copula temp <- joeEstimation(SampData) temp$all_fits# Load sample data data("SampData") # Estimate parameters for Joe copula temp <- joeEstimation(SampData) temp$all_fits
This function uses Maximum Likelihood Estimation to fit to a given data series, the parameters, stationary or parametric, of the following marginal distributions: Generalized Extreme Value, Log-normal, and three parameter Log-normal. It prints or saves to an Excel workspace a table of fits with the negative log-likelihood, AIC (or small-sample corrected AIC for series with less than 25 points), BIC, and estimated parameters for each distribution and returns a 'marginal' object with filled in attributes according to the best model selected on minimizing AIC (or AICc).
marginalNSestimation(data, covar = NULL, file = NULL)marginalNSestimation(data, covar = NULL, file = NULL)
data |
Numeric vector of length n, the data series variable to be fitted. |
covar |
Numeric vector of length |
file |
String to indicate name of Excel file to save as.
Default is |
The function fits a distribution and parameters, stationary or not, to the provided data. Due to their applications in hydrology, three distributions are implemented Chebana and Ouarda (2021).
Generalized Extreme Value:
The cumulative distribution funciton of the GEV distribution is given by:
where and are respectively
the location, scale, and shape parameters.
Log-normal distribution:
The density function for the two parameter LN2 distribution is given by:
where and are respectively the mean and
standard deviation of .
Three parameter Log-normal:
The density function for the three parameter LN3 distribution is given by:
where and are as in LN2 and is the threshold parameter
or lower bound.
To incorporate the possible non-stationarity Chebana and Ouarda (2021) propose to link the distribution parameters to the covariate in the shape:
where is the parameter where trend is inducted and is
is the covariate. Various models; stationary, with linear or quadratic trend, are to be fitted.
To estimate the coefficients of any parameter the maximum likelihood approach is used. To assess the fit of the model the Akaike Information Criterion and Bayesia Information Criterion are computed for each model. The one with the lowest AIC will be selected as best.
Note: During optimization, numerical warnings may occur when the likelihood is evaluated at invalid parameter values. These warnings do not affect the final parameter estimates.
Data frame containing the Negative Log-likelihood, AIC, BIC and the estimated parameter coefficients for each model (All 'NAs' in the data frame indicate that the model doesn't take said parameter).
'NSmarginal' object containing the parameters of the best fitted model.
If declared, '.xlsx' file with the evaluation results.
Fateh Chebana, Taha B.M.J. Ouarda, 2021, Multivariate non-stationary hydrological frequency analysis, Journal of Hydrology, Volume 593, doi:10.1016/j.jhydrol.2020.125907.
Aitchison, J., Brown, J.A.C., 1957. The Lognormal Distribution. Cambridge University Press, London, p. 176.
# Load sample data data("SampData") X <- SampData[, 1] Y <- SampData[, 2] # Estimate parameters for different marginal distributions temp <- marginalNSestimation(X, covar = NULL) temp$all_fits # Get 'NSmarginal' object with the best fit marginal <- temp$best# Load sample data data("SampData") X <- SampData[, 1] Y <- SampData[, 2] # Estimate parameters for different marginal distributions temp <- marginalNSestimation(X, covar = NULL) temp$all_fits # Get 'NSmarginal' object with the best fit marginal <- temp$best
This file contains the functions for computing the bivariate quantile curve
for a given model. It defines a grid on and calls
contourLines to find the values that yield a quantile at level p.
mqc( seriesX, seriesY, copula, covariate = 1, p = 0.95, event = c("E.AND.E", "E.OR.E", "nE.AND.nE", "nE.OR.nE"), step = 0.01, file = NULL, c_name = "Time" )mqc( seriesX, seriesY, copula, covariate = 1, p = 0.95, event = c("E.AND.E", "E.OR.E", "nE.AND.nE", "nE.OR.nE"), step = 0.01, file = NULL, c_name = "Time" )
seriesX |
'NSmarginal' object. |
seriesY |
'NSmarginal' object. |
copula |
'NScopula' object |
covariate |
Numeric vector, if not stationary this value represents the covariate at which the parameters are evaluated. Default is 1. |
p |
Numeric, quantile level to evaluate at. Must be between 0 and 1, default is 0.95. |
event |
Character vector, type of event for the bivariate estimation. Default is "nE.AND.nE". |
step |
Numeric, step size for creation of grid. Must be between 0 and 1, default is 0.01. |
file |
Character vector, file name to use for saving information in Excel. If default NULL, no xlsx is saved. |
c_name |
Character vector. Optional parameter, used to name the covariate or give units. Default is "Time" |
This function calculates the p-level bivariate quantile curve using the MQC approach.
It first sets up a grid on , calculates the (u,v) probabilities using the copula,
computes the proper event based on the chosen event type, and then obtains the contour lines
to find the (u,v) points corresponding to the desired quantile level p. Finally, the
inverse marginal distributions are applied to convert (u,v) points to (X,Y) coordinates in
the original data space.
A 'MQC' object containing the X and Y coordinates of the p-level bivariate quantile curve and all the input information.
# Load sample data data(SampData) X <- SampData[, 1] Y <- SampData[, 2] # Fit marginal distributions temp <- marginalNSestimation(X, covar = NULL, file = NULL) marx <- temp$best temp <- marginalNSestimation(Y, covar = NULL, file = NULL) mary <- temp$best # Fit copula model temp <- copulaNSestimation(SampData, covar = NULL) cop <- temp$best # Compute the multivariate quantile curves MQC <- mqc(marx, mary, cop, covariate = seq(1:50), p = 0.05, event = "E.AND.E", step = 0.01, file = NULL, c_name = "Time")# Load sample data data(SampData) X <- SampData[, 1] Y <- SampData[, 2] # Fit marginal distributions temp <- marginalNSestimation(X, covar = NULL, file = NULL) marx <- temp$best temp <- marginalNSestimation(Y, covar = NULL, file = NULL) mary <- temp$best # Fit copula model temp <- copulaNSestimation(SampData, covar = NULL) cop <- temp$best # Compute the multivariate quantile curves MQC <- mqc(marx, mary, cop, covariate = seq(1:50), p = 0.05, event = "E.AND.E", step = 0.01, file = NULL, c_name = "Time")
The purpose of this function is creating a NScopula object. This object contains
information of the copula: "family", "type", "gamma", and "name".
This object is needed to use some of the functions in the package.
NScopula( family = c("GUM", "GAL", "HUS", "CLA", "JOE"), type = c("0", "1", "2"), gamma, name = "Copula" )NScopula( family = c("GUM", "GAL", "HUS", "CLA", "JOE"), type = c("0", "1", "2"), gamma, name = "Copula" )
family |
Character vector. Options are "GEV", "LN2", and "LN3". |
type |
Character vector. Options are:
|
gamma |
Numeric vector, copula parameter coefficients, |
name |
Character vector, optional parameter for naming the copula. Default is "Copula". |
The purpose of this function is creating a NScopula object. This object contains
information of the copula; "family" refers to the copula family, Gumbel, Galambos,
Hüsler-Reiss, Clayton, or Joe; "type" refers to the relationship between the
copula parameter and the covariate, stationary, linear, or quadratic; "gamma" is a vector containing
coefficients of the parameter . To ensure that
is in an adequate range for each copula family, different transformations must be applied to
obtain it. Let be the covariate of the data series. The gamma parameter for Gumbel and Joe
copula families is given by
for the Clayton copula
and for Hüsler-Reiss and Galambos
.
For more information on the estimation see copulaNSestimation.
A NScopula object
# A Gumbel copula, with a quadratic non-stationary parameter copula <- NScopula(family = "GUM", type = "2", gamma = c(1, 0.1, 0.03))# A Gumbel copula, with a quadratic non-stationary parameter copula <- NScopula(family = "GUM", type = "2", gamma = c(1, 0.1, 0.03))
The purpose of this function is creating a NSmarginal object. This object contains
information of the marginal distribution: "family", "type", "mu", "sig", "k", and"name".
This object is needed to use other functions of the package.
NSmarginal( family = c("GEV", "LN2", "LN3"), type = c("00", "10", "20", "11", "21"), mu, sig, k, name = "Marginal" )NSmarginal( family = c("GEV", "LN2", "LN3"), type = c("00", "10", "20", "11", "21"), mu, sig, k, name = "Marginal" )
family |
Character vector. Options are "GEV", "LN2", and "LN3". |
type |
Character vector. Options are "GEV", "LN2", and "LN3". Options are:
|
mu |
Numeric vector, location parameter coefficients, |
sig |
Numeric vector, scale parameter coefficients, |
k |
Numeric vector, shape or threshold parameter coefficients, |
name |
Character vector, optional parameter for naming the marginal. Default is "Marginal". |
The purpose of this function is creating a NSmarginal object. This object contains
information of the marginal distribution; "family" refers to the distribution, Generalized
Extreme Value, Log-normal or three-parameter Log normal; "type" refers to the relationship between the
distribution parameters and the covariate, stationary, linear, or quadratic; "mu" is a vector containing
coefficients of the location parameter , where
is the covariate; "sig" is vector containing the coefficients for the scale parameter
; "k" is a vector with the coefficients for the shape parameter
in the GEV distribution and threshold in the LN3 distributions ; lastly the optional "name" is
used to name the marginal distribution. Based on the non zero inputs of "mu", "sig", and "k" parameters
the "type" of marginal might be overwritten.
For more information on the estimation see marginalNSestimation.
A NSmarginal object
El Adlouni, S., Ouarda, T. B., Zhang, X., Roy, R., & Bobée, B. (2007). Generalized maximum likelihood estimators for the nonstationary generalized extreme value model. Water Resources Research, 43(3).
Villarini, G., Smith, J.A., Serinaldi, F., Bales, J., Bates, P.D., Krajewski, W.F., 2009. Flood frequency analysis for nonstationary annual peak records in an urban drainage basin. Advances in Water Resources, 32(8): 1255-1266.
# A GEV marginal, with quadratic location parameter marginal <- NSmarginal(family = "GEV", type = "20", mu = c(0.5, 0.3, 0.001), sig = 1, k = 0.5)# A GEV marginal, with quadratic location parameter marginal <- NSmarginal(family = "GEV", type = "20", mu = c(0.5, 0.3, 0.001), sig = 1, k = 0.5)
This file contains the functions for plotting the bivariate quantile curves for a given model. It needs the results from mqc, and the data used to estimate the marginals and copula.
quantilePlotNS( MQC, data = NULL, ylim = NULL, xlim = NULL, color = c("cool", "warm", "dark", "blue"), file = NULL )quantilePlotNS( MQC, data = NULL, ylim = NULL, xlim = NULL, color = c("cool", "warm", "dark", "blue"), file = NULL )
MQC |
'MQC' object, must be related to |
data |
Numeric matrix of dimensions |
ylim |
Numeric vector of length 2, optional parameter the range of the |
xlim |
Numeric vector of length 2, optional parameter the range of the |
color |
Character vector, the plots colorway. Options are "cool", "warm", "dark", and "blue". |
file |
Character vector, file name to use for saving information in Excel. If default NULL, no xlsx is saved. |
The quantile curves plot, if declared a png file
# Load sample data data(SampData) X <- SampData[, 1] Y <- SampData[, 2] # Fit marginal distributions temp <- marginalNSestimation(X, covar = NULL, file = NULL) marx <- temp$best temp <- marginalNSestimation(Y, covar = NULL, file = NULL) mary <- temp$best # Fit copula model temp <- copulaNSestimation(SampData, covar = NULL) cop <- temp$best # Compute the multivariate quantile curves MQC <- mqc(marx, mary, cop, covariate = seq(1:50), p = 0.05, event = "E.AND.E", step = 0.01, file = NULL, c_name = "Time") # Plot the quantile curve quantilePlotNS(MQC, data = SampData)# Load sample data data(SampData) X <- SampData[, 1] Y <- SampData[, 2] # Fit marginal distributions temp <- marginalNSestimation(X, covar = NULL, file = NULL) marx <- temp$best temp <- marginalNSestimation(Y, covar = NULL, file = NULL) mary <- temp$best # Fit copula model temp <- copulaNSestimation(SampData, covar = NULL) cop <- temp$best # Compute the multivariate quantile curves MQC <- mqc(marx, mary, cop, covariate = seq(1:50), p = 0.05, event = "E.AND.E", step = 0.01, file = NULL, c_name = "Time") # Plot the quantile curve quantilePlotNS(MQC, data = SampData)
Synthetic data modeled using copulas with the purpose of exampling the functions of the package, 50 two dimensional observations that follow a Clayton copula with a linear non stationary parameter.
SampDataSampData
SampDataA data frame with 50 rows and 2 columns:
Numeric, first variable
Numeric, second variable
Goutali and Chebana (2024) doi:10.1016/j.envsoft.2024.106090