Title: | Non-Supervised Regional Frequency Analysis |
Description: | A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology. The package refers to the index-value method and, more precisely, helps the hydrologist to: (1) regionalize the index-value; (2) form homogeneous regions with similar growth curves; (3) fit distribution functions to the empirical regional growth curves. Most of the methods are those described in the Flood Estimation Handbook (Centre for Ecology & Hydrology, 1999, ISBN:9781906698003). Homogeneity tests from Hosking and Wallis (1993) <doi:10.1029/92WR01980> and Viglione et al. (2007) <doi:10.1029/2006WR005095> are available. |
Authors: | Alberto Viglione [aut, cre], Francesco Laio [ctb], Eric Gaume [ctb], Olivier Payrastre [ctb], Jose Luis Salinas [ctb], Chi Cong N'guyen [ctb], Karine Halbert [ctb] |
Maintainer: | Alberto Viglione <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7-17 |
Built: | 2024-10-31 20:48:27 UTC |
Source: | CRAN |
The estimation of hydrological variables in ungauged basins is a very important topic for many purposes, from research to engineering applications and water management (see the PUB project, Sivapalan et al., 2003). Regardless of the method used to perform such estimation, the underlying idea is to transfer the hydrological information from gauged to ungauged sites. When observations of the same variable at different measuring sites are available and many data samples are used together as a source of information, the methods are called regional methods. The well-known regional frequency analysis (e.g. Hosking and Wallis, 1997), where the interest is in the assessment of the frequency of hydrological events, belongs to this class of methods. In the literature, the most studied variable is the flood peak, and the most used regional approach is the index-flood method of Dalrymple (1960), in which it is implicitly assumed that the flood frequency distribution for different sites belonging to a homogeneous region is the same except for a site-specific scale factor, the index-flood (see Hosking and Wallis, 1997, for details). Hence, the estimation of the flood frequency distribution for an ungauged site can be divided into two parts: estimation of the index-flood (more in general, the index-value) through linear or non-linear relations with climatic and basin descriptors; and estimation of the dimensionless flood frequency distribution, the growth curve, assigning the ungauged basin to one homogeneous region.
nsRFA is a collection of statistical tools for objective (non-supervised) applications of Regional Frequency Analysis methods in hydrology. This does not mean that Regional Frequency Analysis should be non-supervised: these tools are addressed to experts, to support their expert judgement. The functions in nsRFA allow the application of the index-flood method in the following steps:
- regionalization of the index-value;
- formation of homogeneous regions for the growth curves;
- fit of a distribution function to the empirical growth curve of each region.
Regionalization of the index-value
The index-value can be either the sample mean (e.g. Hosking and Wallis, 1997), the sample median (e.g. Robson and Reed, 1999) or another scale parameter. Many methodological approaches are available for the index-value estimation, and their differences can be related to the amount of information available. Excluding direct methods, which use information provided by flow data available at the station of interest, regional estimation methods require ancillary hydrological and physical information. Those methods can be divided into two classes: the multiregressive approach and the hydrological simulation approach. For both of them, the best estimator is the one that optimizes some criterion, such as the minimum error, the minimum variance or the maximum efficiency. Due to its simplicity, the most frequently used method is the multiregressive approach (see e.g. Kottegoda and Rosso, 1997; Viglione et al., 2007a), which relates the index-flow to catchment characteristics, such as climatic indices, geologic and morphologic parameters, land cover type, etc., through linear or non-linear equations.
R provides the functions lm and nls for linear and non-linear regressions (package stats). With the package nsRFA, a tool to select the best linear regressions given a set of candidate descriptors, bestlm, is provided.
In REGRDIAGNOSTICS several functions are provided to analyze the output of lm, such as:
- the coefficient of determination (classic and adjusted);
- the Student t significance test;
- the variance inflation factor (VIF);
- the root mean squared error (RMSE);
- the mean absolute error (MAE);
- the prediction intervals;
- predicted values by a jackknife (cross-validation) procedure.
The functions in DIAGNOSTICS provide more general diagnostics of model results (which can also be non-linear), comparing estimated values with observed values.
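A minimal sketch of this regression-plus-diagnostics workflow, using the hydroSIMN example data shipped with the package (the choice of Dm as response and of Am, Hm and Ybar as descriptors is illustrative only):

library(nsRFA)
data(hydroSIMN)                                  # provides the 'parameters' data.frame
regr <- lm(Dm ~ Am + Hm + Ybar, data=parameters) # candidate multiregressive model
summary(regr)                                    # t-tests and classic/adjusted R2
obs <- parameters[,"Dm"]
est <- regr$fitted.values
R2(obs, est)                                     # coefficient of determination
RMSE(obs, est)                                   # root mean squared error
MAE(obs, est)                                    # mean absolute error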
More details are provided in the vignette:

nsRFA_ex01 | How to use the package nsRFA (example 1): Regional frequency analysis of the annual flows in Piemonte and Valle d'Aosta |

which can be accessed via vignette("nsRFA_ex01", package="nsRFA").
Formation of homogeneous regions for the growth curves
Different techniques exist, for example those that lead to the formation of fixed regions through cluster analysis (Hosking and Wallis, 1997; Viglione, 2007), or those based on the method of the region of influence (ROI, Burn, 1990).
The regional procedure can be divided into two parts: the formation of regions and the assignment of an ungauged site to one of them.
Concerning the first point, the sites are grouped according to their similarity in terms of those basin descriptors that are assumed to explain the shape of the growth curve.
This shape is usually quantified in a parametric way.
For instance, the coefficient of variation (CV) or the L-CV of the curve can be used for this purpose.
The package nsRFA provides the functions in moments and Lmoments to calculate sample moments and L-moments.
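For example (the interpretation of the first two elements of the returned vector follows the par.exp examples later in this manual):

library(nsRFA)
x <- rnorm(50, mean=100, sd=20)   # a synthetic sample
ll <- Lmoments(x)                 # sample L-moments
ll[1]; ll[2]                      # lambda1 and lambda2, as used e.g. by par.exp(ll[1], ll[2])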
Subsequently, the selected parameter is related with basin descriptors through a linear or a more complex model.
A regression analysis is performed with different combinations of descriptors, and the descriptors that are strongly related with the parameter are used to group sites into regions. The same tools used for the regionalization of the index value, i.e. bestlm, REGRDIAGNOSTICS and DIAGNOSTICS, can be used if the parametric method is chosen.
nsRFA also provides a non-parametric approach that considers the dimensionless growth curve as a whole (see Viglione et al., 2006; Viglione, 2007).
The multiregressive approach can still be used if we reason in terms of (dis)similarity between pairs of basins in the following way: (1) for each pair of stations, a dissimilarity index between the non-dimensional curves is calculated using a quantitatively predefined metric, for example the Anderson-Darling statistic (A2), and the distances are organised in a matrix with AD.dist; (2) for each basin descriptor, the absolute value (or another metric) of the difference between its measures in two basins is used as the distance between them, using dist of the package stats to obtain distance matrices; (3) a multiregressive approach (bestlm, lm) is applied considering the matrices as variables, and the basin descriptors associated with the best regression are chosen; (4) the significance of the linear relationship between distance matrices is assessed through the Mantel test with mantel.lm. A sketch of steps (1) and (2) is given below.
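A minimal sketch of steps (1) and (2) on the hydroSIMN example data (the descriptor Hm is an arbitrary choice; steps (3)-(4) are only indicated in comments because the exact mantel.lm interface should be checked in its help page):

library(nsRFA)
data(hydroSIMN)
x <- annualflows["dato"][,]         # pooled data
cod <- annualflows["cod"][,]        # site codes
Y <- AD.dist(x, cod)                # (1) Anderson-Darling distances between
                                    #     growth curves (it takes some time)
X <- dist(parameters[,"Hm"])        # (2) distances between basins in terms of
                                    #     one candidate descriptor
# (3)-(4): regress Y on the descriptor distance matrices (lm) and assess the
# significance of the relationship with mantel.lm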
In the suitable-descriptor's space, stations with similar descriptor values can be grouped into disjoint regions through a cluster analysis (using the functions in traceWminim) or the ROI method can be used, adapting a region to the ungauged basin (roi).
In both cases, the homogeneity of the regions can be assessed with the functions in HOMTESTS, where the Hosking and Wallis heterogeneity measures (HW.tests, see Hosking and Wallis, 1997) and the Anderson-Darling homogeneity test (ADbootstrap.test, see Viglione et al., 2007b) are provided.
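A minimal sketch of the homogeneity tests on the hydroSIMN example data (the bare (x, cod) calls assume the default settings of HW.tests and ADbootstrap.test, e.g. the number of simulations, which should be checked in the HOMTESTS help page):

library(nsRFA)
data(hydroSIMN)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
HW.tests(x, cod)            # Hosking-Wallis heterogeneity measures
ADbootstrap.test(x, cod)    # Anderson-Darling homogeneity test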
More details are provided in the vignettes:

nsRFA_ex01 | How to use the package nsRFA (example 1): Regional frequency analysis of the annual flows in Piemonte and Valle d'Aosta |
nsRFA_ex02 | How to use the package nsRFA (example 2): Region-Of-Influence approach, some FEH examples |

which can be accessed via vignette("nsRFA_ex01", package="nsRFA") and vignette("nsRFA_ex02", package="nsRFA").
Fit of a distribution function to the empirical growth curve of each region
Once a homogeneous region is defined, the empirical growth curves can be pooled together and a probability distribution can be fitted to the pooled sample. The choice of the best distribution can be assisted by the model selection criteria in MSClaio2008 (see Laio et al., 2008). The parameters of the selected distribution can be estimated using the method of moments (moment_estimation), L-moments (par.GEV, par.genpar, par.gamma, ...) or maximum likelihood (MLlaio2004). Goodness-of-fit tests are also available: the Anderson-Darling goodness-of-fit test with GOFlaio2004 (Laio, 2004), and Monte Carlo based tests with GOFmontecarlo. Confidence intervals for the fitted distribution can be calculated with a Markov Chain Monte Carlo algorithm, using BayesianMCMC.
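A minimal fitting sketch for one site of the hydroSIMN example data; calling MSClaio2008 with a single argument (its default candidate set) and passing ll[4] to par.GEV as the L-skewness are assumptions based on the examples elsewhere in this manual:

library(nsRFA)
data(hydroSIMN)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
camp <- split(x, fac)$"45"       # sample of one site
MSClaio2008(camp)                # model selection criteria
ll <- Lmoments(camp)             # sample L-moments
par.GEV(ll[1], ll[2], ll[4])     # GEV parameters from lambda1, lambda2 and
                                 # (assumed) tau3 = ll[4]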
More details are provided in the vignettes:

nsRFA_ex01 | How to use the package nsRFA (example 1): Regional frequency analysis of the annual flows in Piemonte and Valle d'Aosta |
MSClaio2008 | Model selection techniques for the frequency analysis of hydrological extremes: the MSClaio2008 R function |

which can be accessed via vignette("nsRFA_ex01", package="nsRFA") and vignette("MSClaio2008", package="nsRFA").
Other functions
varLmoments provides distribution-free unbiased estimators of the variances and covariances of sample L-moments, as described in Elamir and Seheult (2004). More details are provided in the vignette:

Fig1ElamirSeheult | Figure 1 in Elamir and Seheult (2004) |
Package: | nsRFA |
Version: | 0.7 |
The package provides several tools for Regional Frequency Analysis of hydrological variables. The first version dates to 2006 and was developed in Turin at the Politecnico by Alberto Viglione.
For a complete list of the functions, use library(help="nsRFA").
Main changes in version 0.7
0.7-17: | removal of the old Fortran code, and consequently of the functions bestlm (and, with it, the vignette nsRFA_ex01) and HW.original; |
0.7-12: | refinement of the function BayesianMCMC, allowing several thresholds, and new function BayesianMCMCreg; |
0.7-1: | refinement of function BayesianMCMC; |
0.7-0: | plotting positions for historical information in DISTPLOTS; |
Main changes in version 0.6
0.6-9: | new vignette Fig11GriffisStedinger; |
0.6-8: | exponential and Gumbel distributions added in GOFmontecarlo; |
0.6-6: | some plotting position/probability plots have been added in DISTPLOTS; |
0.6-4: | refinement of function BayesianMCMC; |
0.6-2: | new vignette nsRFA_ex02; |
0.6-2: | refinement of function BayesianMCMC; |
0.6-0: | new vignette nsRFA_ex01; |
0.6-0: | new function bestlm; |
0.6-0: | the plotting position/probability plots in DISTPLOTS have been reshaped; |
0.6-0: | this list of changes has been added; |
Alberto Viglione
Maintainer: Alberto Viglione <[email protected]>
All the manual references are listed here:
Beirlant, J., Goegebeur, Y., Teugels, J., Segers, J., 2004. Statistics of Extremes: Theory and Applications. John Wiley and Sons Inc., 490 pp.
Burn, D.H., 1990. Evaluation of regional flood frequency analysis with a region of influence approach. Water Resources Research 26(10), 2257-2265.
Castellarin, A., Burn, D.H., Brath, A., 2001. Assessing the effectiveness of hydrological similarity measures for flood frequency analysis. Journal of Hydrology 241, 270-285.
Chowdhury, J.U., Stedinger, J.R., Lu, L.H., Jul. 1991. Goodness-of-fit tests for regional generalized extreme value flood distributions. Water Resources Research 27(7), 1765-1776.
D'Agostino, R., Stephens, M., 1986. Goodness-of-fit techniques. Vol. 68 of Statistics: Textbooks and Monographs. Department of Statistics, Southern Methodist University, Dallas, Texas.
Dalrymple, T., 1960. Flood frequency analyses. Vol. 1543-A of Water Supply Paper. U.S. Geological Survey, Reston, Va.
Durbin, J., Knott, M., 1971. Components of Cramer-von Mises statistics. London School of Economics and Political Science, pp. 290-307.
El Adlouni, S., Bobée, B., Ouarda, T.B.M.J., 2008. On the tails of extreme event distributions in hydrology. Journal of Hydrology 355(1-4), 16-33.
Elamir, E.A.H., Seheult, A.H., 2004. Exact variance structure of sample L-moments. Journal of Statistical Planning and Inference 124, 337-359.
Everitt, B., 1974. Cluster Analysis. Social Science Research Council. Halsted Press, New York.
Fill, H., Stedinger, J., 1998. Using regional regression within index flood procedures and an empirical Bayesian estimator. Journal of Hydrology 210(1-4), 128-145.
Greenwood, J., Landwehr, J., Matalas, N., Wallis, J., 1979. Probability weighted moments: Definition and relation to parameters of several distributions expressible in inverse form. Water Resources Research 15, 1049-1054.
Hosking, J., 1986. The theory of probability weighted moments. Tech. Rep. RC12210, IBM Research, Yorktown Heights, NY.
Hosking, J., 1990. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B 52, 105-124.
Hosking, J., Wallis, J., 1993. Some statistics useful in regional frequency analysis. Water Resources Research 29(2), 271-281.
Hosking, J., Wallis, J., 1997. Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press.
Institute of Hydrology, 1999. Flood Estimation Handbook, Institute of Hydrology, Oxford.
Kendall, M., Stuart, A., 1961-1979. The Advanced Theory of Statistics. Charles Griffin & Company Limited, London.
Kottegoda, N.T., Rosso, R., 1997. Statistics, Probability, and Reliability for Civil and Environmental Engineers, international Edition. McGraw-Hill Companies.
Laio, F., 2004. Cramer-von Mises and Anderson-Darling goodness of fit tests for extreme value distributions with unknown parameters, Water Resour. Res., 40, W09308, doi:10.1029/2004WR003204.
Laio, F., Tamea, S., 2007. Verification tools for probabilistic forecast of continuous hydrological variables. Hydrology and Earth System Sciences 11, 1267-1277.
Laio, F., Di Baldassarre G., Montanari A., 2008. Model selection techniques for the frequency analysis of hydrological extremes, Water Resour. Res., Under Revision.
Regione Piemonte, 2004. Piano di tutela delle acque. Direzione Pianificazione Risorse Idriche.
Robson, A., Reed, D., 1999. Statistical procedures for flood frequency estimation. In: Flood Estimation Handbook. Vol. 3. Institute of Hydrology, Crowmarsh Gifford, Wallingford, Oxfordshire.
Sankarasubramanian, A., Srinivasan, K., 1999. Investigation and comparison of sampling properties of L-moments and conventional moments. Journal of Hydrology 218, 13-34.
Scholz, F., Stephens, M., 1987. K-sample Anderson-Darling tests. Journal of the American Statistical Association 82(399), 918-924.
Sivapalan, M., Takeuchi, K., Franks, S.W., Gupta, V.K., Karambiri, H., Lakshmi, V., Liang, X., McDonnell, J.J., Mendiondo, E.M., O'Connell, P.E., Oki, T., Pomeroy, J.W, Schertzer, D., Uhlenbrook, S., Zehe, E., 2003. IAHS decade on Predictions in Ungauged Basins (PUB), 2003-2012: Shaping an exciting future for the hydrological sciences, Hydrological Sciences - Journal - des Sciences Hydrologiques, 48(6), 857-880.
Smouse, P.E., Long, J.C., Sokal, R.R., 1986. Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Systematic Zoology, 35(4), 627-632.
Stedinger, J., Lu, L., 1995. Appraisal of regional and index flood quantile estimators. Stochastic Hydrology and Hydraulics 9(1), 49-75.
Stedinger, J.R., Vogel, R.M. and Foufoula-Georgiou, E., 1993. Frequency analysis of extreme events. In: David R. Maidment (Ed.), Handbook of Hydrology, chapter 18. McGraw-Hill Companies, international edition.
Viglione, A., Claps, P., Laio, F., 2006. Utilizzo di criteri di prossimità nell'analisi regionale del deflusso annuo, XXX Convegno di Idraulica e Costruzioni Idrauliche - IDRA 2006, Roma, 10-15 Settembre 2006.
Viglione, A., 2007. Metodi statistici non-supervised per la stima di grandezze idrologiche in siti non strumentati, PhD thesis at the Politecnico di Torino, February 2007.
Viglione, A., Claps, P., Laio, F., 2007a. Mean annual runoff estimation in North-Western Italy. In: G. La Loggia (Ed.), Water resources assessment and management under water scarcity scenarios, CDSU Publ., Milano.
Viglione, A., Laio, F., Claps, P., 2007b. A comparison of homogeneity tests for regional frequency analysis. Water Resources Research 43(3).
Vogel, R., Fennessey, N., 1993. L moment diagrams should replace product moment diagrams. Water Resources Research 29(6), 1745-1752.
Vogel, R., Wilson, I., 1996. Probability distribution of annual maximum, mean, and minimum streamflows in the United States. Journal of Hydrologic Engineering 1(2), 69-76.
Ward, J., 1963. Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, pp. 236-244.
Distance matrix for growth curves. Every element of the matrix is the Anderson-Darling statistic calculated between two series.
AD.dist (x, cod, index=2)
x | vector representing data from many samples defined with cod |
cod | array that defines the data subdivision among sites |
index | if index=1 the index value is the sample mean, if index=2 (default) it is the sample median |
The Anderson-Darling statistic used here is the one defined in https://en.wikipedia.org/wiki/Anderson-Darling_test.
AD.dist returns the distance matrix between growth curves built with the Anderson-Darling statistic.
For information on the package and the Author, and for all the references, see nsRFA.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

# AD.dist
AD.dist(x, cod)   # it takes some time
Systematic flood data and historical flood data for the Ardèche region (France) as described in: Naulet, R. (2002). Utilisation de l'information des crues historiques pour une meilleure prédétermination du risque d'inondation. Application au bassin de l'Ardèche à Vallon Pont-d'Arc et St-Martin d'Ardèche. PhD thesis at CEMAGREF, Lyon, and at the Université du Québec, pp. 322. And: Nguyen, C.C. (2012). Amélioration des approches Bayésiennes MCMC pour l'analyse régionale des crues (Improvement of Bayesian MCMC approaches for regional flood frequency analyses). PhD thesis at the University of Nantes, pp. 192.
data(Ardechedata)
Ardeche_areas, areas (km2) of the gauged and ungauged catchments in the Ardèche region (France);
Ardeche_ungauged_extremes, flood peaks (m3/s) reconstructed in ungauged catchments and number of years for which the peak was not exceeded;
Beauvene_cont, systematic flood peaks (m3/s) recorded at one station;
Chambonas_cont, systematic flood peaks (m3/s) recorded at one station;
SaintLaurent_cont, systematic flood peaks (m3/s) recorded at one station;
SaintMartin_cont, systematic flood peaks (m3/s) recorded at one station;
SaintMartin_hist, values for historical peaks (m3/s) for one station and for flood perception thresholds (m3/s) not exceeded in the periods indicated;
Vogue_cont, systematic flood peaks (m3/s) recorded at one station.
data(Ardechedata)
SaintMartin_cont
SaintMartin_hist
Bayesian Markov Chain Monte Carlo algorithm for flood frequency analysis with historical and other information. The user can choose between a local and a regional analysis.
BayesianMCMC (xcont, xhist=NA, infhist=NA, suphist=NA, nbans=NA, seuil=NA,
              nbpas=1000, nbchaines=3, confint=c(0.05, 0.95), dist="GEV",
              apriori=function(...){1}, parameters0=NA, varparameters0=NA)
BayesianMCMCcont (x, nbpas=NA)
BayesianMCMCreg (xcont, scont, xhist=NA, infhist=NA, suphist=NA, shist=NA,
                 nbans=NA, seuil=NA, nbpas=1000, nbchaines=3,
                 confint=c(0.05, 0.95), dist="GEV",
                 apriori=function(...){1}, parameters0=NA, varparameters0=NA)
BayesianMCMCregcont (x, nbpas=NA)
plotBayesianMCMCreg_surf (x, surf, ask=FALSE, ...)
## S3 method for class 'BayesianMCMC'
plot(x, which=1, ask=FALSE, ...)
## S3 method for class 'BayesianMCMC'
print(x, ...)
## S3 method for class 'BayesianMCMCreg'
plot(x, which=1, ask=FALSE, ...)
## S3 method for class 'BayesianMCMCreg'
print(x, ...)
x | object of class BayesianMCMC or BayesianMCMCreg (the output of the fitting functions) |
xcont | vector of systematic data |
scont | vector of upstream catchment surfaces of systematic data |
xhist | vector of historical data and/or extreme discharges at ungauged sites |
infhist | vector of inferior limits for historical data and/or extreme discharges at ungauged sites |
suphist | vector of superior limits for historical data and/or extreme discharges at ungauged sites |
shist | vector of upstream catchment surfaces of extreme discharges at ungauged sites |
nbans | period (in years) over which every threshold has not been exceeded, except for the historical data and/or extreme discharges at ungauged sites. If there are several values of xhist for the same threshold, put the number of years associated to the threshold on the first row, then put 0 (see examples) |
seuil | threshold not exceeded in the historical period except for the historical data and/or extreme discharges at ungauged sites (several thresholds allowed) |
nbpas | number of iterations for the MCMC algorithm |
nbchaines | number of chains for the MCMC algorithm |
confint | confidence limits for the flood quantiles |
dist | distribution to fit, e.g. normal or GEV (the default) |
apriori | function of the parameters of the model 'proportional to' their a-priori guessed distribution. The default function always returns 1, i.e. there is no a-priori information |
parameters0 | initial values of the parameters for the MCMC algorithm |
varparameters0 | initial values of the parameter variances for the MCMC algorithm |
which | a number or a vector of numbers that defines the graph to plot (see details) |
ask | if TRUE, the interactive mode is run |
surf | a particular surface (number or vector), not necessarily a surface included in the scont or shist vectors |
... | other arguments |
Supported cases
These functions take four cases into account, depending on the type of data provided:
- using only the systematic data: xcont provided, xhist=NA, infhist=NA and suphist=NA;
- using censored information, with the historical floods known: xcont and xhist provided, infhist=NA and suphist=NA;
- using censored information, with the historical floods not known precisely but their lower limits known: xcont and infhist provided, xhist=NA and suphist=NA;
- taking into account flood estimation intervals: infhist and suphist (respectively the lower and upper limits) provided, xcont provided, xhist=NA.
Please note that every other case is NOT supported. For example, you cannot have some historical flood values perfectly known as well as some others for which you only know a lower limit or an interval.
Regarding the perception thresholds:
- by definition, the number of exceedances of each perception threshold within its application period has to be known precisely, and all the floods exceeding the threshold have to be included in xhist (or infhist or suphist);
- several thresholds are allowed;
- it is possible to include in xhist (or infhist or suphist) historical values that do not exceed the associated perception threshold;
- if for one or several thresholds you only know that the threshold has never been exceeded, and no more information is available on floods that did not exceed it, this case is also supported: in this case, put xhist=-1 (or infhist=-1 or infhist=suphist=-1) for the historical flood corresponding to the threshold.
Bayesian inference
Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been observed.
The name ‘Bayesian’ comes from the frequent use of Bayes' theorem in the inference process.
In our case the problem is the following: what is the probability that a frequency function (of the type defined in dist) has parameters θ, given that we have observed the realizations D (defined in xcont, xhist, infhist, suphist, nbans, seuil)? Bayes' theorem writes

  P(θ|D) = P(D|θ) P(θ) / P(D)

where P(θ|D) is the conditional probability of θ given D (also called the posterior probability, because it is derived from, or depends upon, the specified value of D), and is the result we are interested in; P(θ) is the prior or marginal probability of θ ('prior' in the sense that it does not take into account any information about D), and can be given using the input apriori (it can be used to account for causal information); P(D|θ) is the conditional probability of D given θ, and is defined by choosing dist and depending on the availability of historical data; P(D) is the prior or marginal probability of D, and acts as a normalizing constant. Intuitively, Bayes' theorem in this form describes the way in which one's beliefs about θ are updated by having observed D.

Since complex models cannot be processed in closed form by a Bayesian analysis, namely because of the extreme difficulty in computing the normalization factor P(D), simulation-based Monte Carlo techniques such as the MCMC approaches are used.
MCMC Metropolis algorithm
Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods), are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution. The quality of the sample improves as a function of the number of steps.
The MCMC is performed here through a simple Metropolis algorithm, i.e. a Metropolis-Hastings algorithm with symmetric proposal density.
The Metropolis-Hastings algorithm can draw samples from any probability distribution P(x), requiring only that a function proportional to the density can be calculated at x. In Bayesian applications, the normalization factor is often extremely difficult to compute, so the ability to generate a sample without knowing this constant of proportionality is a major virtue of the algorithm. The algorithm generates a Markov chain in which each state x_{t+1} depends only on the previous state x_t. It uses a Gaussian proposal density N(x_t, σ²), which depends on the current state x_t, to generate a new proposed sample x'. This proposal is accepted as the next value, x_{t+1} = x', if a value a drawn from the uniform distribution U(0,1) satisfies

  a < P(x') / P(x_t).

If the proposal is not accepted, then the current value of x is retained (x_{t+1} = x_t). The Markov chain is started from a random initial value and the algorithm is run for many iterations until this initial state is forgotten. These samples, which are discarded, are known as burn-in. The remaining set of accepted values of x represents a sample from the distribution P(x).

As a Gaussian proposal density (or a lognormal one for positive-definite parameters) is used, the variance parameter σ² has to be tuned during the burn-in period. This is done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last N samples. The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate for a one-dimensional Gaussian distribution is approximately 50%, decreasing to approximately 23% for an N-dimensional Gaussian target distribution. If σ² is too small, the chain will mix slowly (i.e., the acceptance rate will be too high, so the sampling will move around the space slowly and converge slowly to P(x)). If σ² is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density. The desired acceptance rate is fixed here at 34%.
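The following standalone sketch (plain R, not part of nsRFA) illustrates the mechanism just described on a one-dimensional target, including the acceptance-rate bookkeeping; the target density and tuning constants are arbitrary:

set.seed(1)
ptarget <- function(x) dnorm(x, mean=5, sd=2)  # any density known up to a constant
nit <- 10000
x <- numeric(nit); x[1] <- 0                   # arbitrary initial state
sigma <- 1                                     # proposal standard deviation
accepted <- 0
for (t in 1:(nit-1)) {
  xprop <- rnorm(1, mean=x[t], sd=sigma)       # Gaussian (symmetric) proposal
  a <- runif(1)
  if (a < ptarget(xprop)/ptarget(x[t])) {      # Metropolis acceptance rule
    x[t+1] <- xprop; accepted <- accepted + 1
  } else {
    x[t+1] <- x[t]                             # retain the current state
  }
}
accepted/(nit-1)                               # acceptance rate (tune sigma on this)
mean(x[-(1:1000)])                             # discard burn-in, then use the sample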
The MCMC algorithm is based on a code developed by Eric Gaume in Scilab. It is still unstable and not all the distributions have been tested.
BayesianMCMC and BayesianMCMCcont (which just continues the simulations of BayesianMCMC, starting from its output) for local analyses, and BayesianMCMCreg and BayesianMCMCregcont (which just continues the simulations of BayesianMCMCreg, starting from its output) for regional analyses, return the following values:
parameters | matrix (nbpas)x(nbchaines) with the sets of parameters simulated by the MCMC algorithm |
parametersML | set of parameters corresponding to the maximum likelihood |
returnperiods | return periods for which quantilesML and intervals are calculated |
quantilesML | quantiles corresponding to returnperiods for the distribution whose parameters are parametersML |
logML | maximum log-likelihood |
intervals | confidence intervals for the quantiles quantilesML for limits confint |
varparameters | matrix (nbpas)x(nbchaines)x(number of parameters) with the variances simulated for the MCMC algorithm |
vraisdist | likelihoods for the sets parameters |
propsaut | vector showing the evolution of the acceptance rate during the Bayesian MCMC fitting |
plot.BayesianMCMC and plot.BayesianMCMCreg (for a normalized surface of 1 km2) plot the following figures:
1 | data as plotting positions (the Cunnane plotting position is used), fitted distribution (maximum likelihood) and confidence intervals |
2 | diagnostic plot of the MCMC simulation (parameters) |
3 | diagnostic plot of the MCMC simulation (likelihood and MCMC acceptance rate) |
4 | posterior distribution of the parameters obtained with the MCMC simulation (cloud plots) |
5 | a-priori distribution of the parameters (contour plots) |
plotBayesianMCMCreg_surf plots the same figure as the first one given by plot.BayesianMCMCreg, but for each surface in the argument, as well as its mean as a function of the surfaces.
For information on the package and the Author, and for all the references, see nsRFA.

Eric Gaume, Alberto Viglione, Jose Luis Salinas, Olivier Payrastre, Chi Cong N'guyen, Karine Halbert
set.seed(2988)
serie <- rand.GEV(120, xi=40, alfa=20, k=-0.4)
serie100 <- serie[1:100]
serie100[serie100 < 250] <- NA
serie20 <- serie[101:120]
serie <- c(serie100, serie20)
plot(serie, type="h", ylim=c(0, 600), xlab="",
     ylab="Annual flood peaks [m3/s]", lwd=3)
abline(h=0)
points(serie100, col=2)

## Not run:
# Using only systematic data
only_sist <- BayesianMCMC(xcont=serie20, nbpas=5000, nbchaines=3,
                          varparameters0=c(70, 20, 0.5),
                          confint=c(0.05, 0.95), dist="GEV")
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))
only_sist <- BayesianMCMCcont(only_sist)
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))
only_sist <- BayesianMCMCcont(only_sist)
plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))

# Adding the information that the threshold 250 m3/s was exceeded
# 3 times in the past 100 years
with_hist_thresh <- BayesianMCMC(xcont=serie20, infhist=rep(250,3),
                                 nbans=100, seuil=250,
                                 nbpas=5000, nbchaines=3,
                                 confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_thresh, which=c(1:3), ask=TRUE, ylim=c(1,600))

# Assuming that the 3 historical events are known with high uncertainty
with_hist_limits <- BayesianMCMC(xcont=serie20, infhist=c(320,320,250),
                                 suphist=c(360,400,270),
                                 nbans=100, seuil=250,
                                 nbpas=5000, nbchaines=3,
                                 confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_limits, which=c(1:3), ask=TRUE, ylim=c(1,600))

# Assuming that the 3 historical events are perfectly known
with_hist_known <- BayesianMCMC(xcont=serie20, xhist=serie100[!is.na(serie100)],
                                nbans=100, seuil=250,
                                nbpas=5000, nbchaines=3,
                                confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known, which=c(1:3), ask=TRUE, ylim=c(1,600))

# Perception threshold without available information on floods
without_info <- BayesianMCMC(xcont=serie20, xhist=-1,
                             nbans=100, seuil=2400,
                             nbpas=5000, nbchaines=3,
                             confint=c(0.05, 0.95), dist="GEV")
plot(without_info, which=c(1:3), ask=TRUE, ylim=c(1,600))

# Using one reasonable a-priori distribution
fNORM3 <- function (x) {
  # x = vector of values
  # mu = vector of means
  mu = c(44, 26, -0.40)
  # CM = covariance matrix
  CM = matrix(c(13, 7.8, -0.055,
                7.8, 15, -0.42,
                -0.055, -0.42, 0.056), nrow=3, ncol=3)
  CMm1 <- solve(CM)
  term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
  term1 <- 1/(2*pi)^(3/2)/sqrt(det(CM))
  term1*term2
}
with_hist_known2 <- BayesianMCMC(xcont=serie20, xhist=serie100[!is.na(serie100)],
                                 nbans=100, seuil=250,
                                 nbpas=5000, nbchaines=3, apriori=fNORM3,
                                 confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known2, 5)
plot(with_hist_known2, 4)
plot(with_hist_known, 4)
plot(with_hist_known)
plot(with_hist_known2)

# Using one non-reasonable a-priori distribution
fNORM3 <- function (x) {
  # x = vector of values
  # mu = vector of means
  mu = c(30, 50, -0.10)
  # CM = covariance matrix
  CM = matrix(c(13, 7.8, -0.055,
                7.8, 15, -0.42,
                -0.055, -0.42, 0.056), nrow=3, ncol=3)
  CMm1 <- solve(CM)
  term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
  term2
}
with_hist_known3 <- BayesianMCMC(xcont=serie20, xhist=serie100[!is.na(serie100)],
                                 nbans=100, seuil=250,
                                 nbpas=5000, nbchaines=3, apriori=fNORM3,
                                 confint=c(0.05, 0.95), dist="GEV")
plot(with_hist_known3, 5)
plot(with_hist_known3, 4)
plot(with_hist_known, 4)
plot(with_hist_known)
plot(with_hist_known3)
## End(Not run)

## Not run:
# Assuming that the historical events are perfectly known and
# there are 4 different thresholds.
# The data file is presenting this way:
# xhist nbans seuil
#  6000    55  6000
#  7400    28  7250
#  6350     8  3050
#  4000     0  3050
#  4550     0  3050
#  3950     0  3050
#  7550    58  2400
#  4650     0  2400
#  3950     0  2400
# Warning: nbans and seuil should have the same length as xhist.
# So when a threshold is exceeded several times, replicate it as many
# times as it is exceeded, and split the number of years of exceedance
# among the times of exceedance (the way you split the nbans vector is
# not important; what is important is that length(nbans)=length(xhist)
# and that the total of years for one same threshold equals the number
# of years covered by the perception threshold)
xhist_thres <- c(6000, 7400, 6350, 4000, 4550, 3950, 7550, 4650, 3950)
seuil_thres <- c(6000, 7250, rep(3050, 4), rep(2400, 3))
nbans_thres <- c(55, 28, 8, 0, 0, 0, 58, 0, 0)
# The threshold at 6000 has been exceeded for 55 years, the one at 7250
# for 28 years, the one at 3050 for 8 years and the one at 2400 for 58 years
with_hist_known_several_thresholds <-
  BayesianMCMC(xcont=serie20, xhist=xhist_thres,
               nbans=nbans_thres, seuil=seuil_thres,
               nbpas=5000, nbchaines=3,
               confint=c(0.05, 0.95), dist="GEV",
               varparameters0=c(NA, NA, 0.5))
plot(with_hist_known_several_thresholds, which=c(1:3), ask=TRUE)

## REGIONAL:
# Regional analysis, assuming that the 3 historical events are
# perfectly known and there are 2 perception thresholds
regional_with_hist_known <-
  BayesianMCMCreg(xcont=serie20,
                  scont=c(rep(507,9), rep(2240,11)),
                  xhist=serie100[!is.na(serie100)],
                  shist=c(495, 495, 87),
                  nbans=c(100, 0, 50), seuil=c(312, 312, 221),
                  nbpas=5000, nbchaines=3,
                  confint=c(0.05, 0.95), dist="GEV",
                  varparameters0=c(NA, NA, NA, 0.5))
plot(regional_with_hist_known, which=1:3, ask=TRUE, ylim=c(1,600))
surf <- c(571, 2240)
plotBayesianMCMCreg_surf(regional_with_hist_known, surf)
## End(Not run)
Diagnostics of model results: comparison of estimated values y with observed values x.
R2 (x, y, na.rm=FALSE)
RMSE (x, y, na.rm=FALSE)
MAE (x, y, na.rm=FALSE)
RMSEP (x, y, na.rm=FALSE)
MAEP (x, y, na.rm=FALSE)
x | observed values |
y | estimated values |
na.rm | logical. Should missing values be removed? |
If x_i, with i = 1, ..., n, are the observed values, y_i the estimated values, and x̄ the sample mean of the x_i, then:

  R2 = 1 - Σ(x_i - y_i)² / Σ(x_i - x̄)²
  RMSE = √[ (1/n) Σ(x_i - y_i)² ]
  MAE = (1/n) Σ|x_i - y_i|
  RMSEP = √[ (1/n) Σ((x_i - y_i)/x_i)² ]
  MAEP = (1/n) Σ|(x_i - y_i)/x_i|
See https://en.wikipedia.org/wiki/Coefficient_of_determination, https://en.wikipedia.org/wiki/Mean_squared_error and https://en.wikipedia.org/wiki/Mean_absolute_error for other details.
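As a plain-R cross-check of the definitions above (a sketch; the package functions themselves should be preferred in practice):

obs <- c(10, 12, 9, 14); est <- c(11, 11.5, 8, 15)
1 - sum((obs - est)^2) / sum((obs - mean(obs))^2)  # R2
sqrt(mean((obs - est)^2))                          # RMSE
mean(abs(obs - est))                               # MAE
sqrt(mean(((obs - est)/obs)^2))                    # RMSEP (relative/percentual)
mean(abs((obs - est)/obs))                         # MAEP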
R2 returns the coefficient of determination of a model.
RMSE returns the root mean squared error of a model.
MAE returns the mean absolute error of a model.
RMSEP returns the percentual root mean squared error of a model.
MAEP returns the percentual mean absolute error of a model.
For information on the package and the Author, and for all the references, see nsRFA.

lm, summary.lm, predict.lm, REGRDIAGNOSTICS
data(hydroSIMN)
datregr <- parameters
regr0 <- lm(Dm ~ ., datregr); summary(regr0)
regr1 <- lm(Dm ~ Am + Hm + Ybar, datregr); summary(regr1)
obs <- parameters[,"Dm"]
est0 <- regr0$fitted.values
est1 <- regr1$fitted.values

R2(obs, est0)
R2(obs, est1)
RMSE(obs, est0)
RMSE(obs, est1)
MAE(obs, est0)
MAE(obs, est1)
RMSEP(obs, est0)
RMSEP(obs, est1)
MAEP(obs, est0)
MAEP(obs, est1)
Sample values are plotted against their empirical distribution in graphs where points belonging to a particular distribution should lie on a straight line.
plotpos (x, a=0, orient="xF", ...)
plotposRP (x, a=0, orient="xF", ...)
loglogplot (x, a=0, orient="xF", ...)
unifplot (x, a=0, orient="xF", line=FALSE, ...)
normplot (x, a=0, orient="xF", line=FALSE, ...)
lognormplot (x, a=0, orient="xF", line=FALSE, ...)
studentplot (x, df, a=0, orient="xF", line=FALSE, ...)
logisplot (x, a=0, orient="xF", line=FALSE, ...)
gammaplot (x, shape, a=0, orient="xF", line=FALSE, ...)
expplot (x, a=0, orient="xF", line=FALSE, ...)
paretoplot (x, a=0, orient="xF", line=FALSE, ...)
gumbelplot (x, a=0, orient="xF", line=FALSE, ...)
frechetplot (x, a=0, orient="xF", line=FALSE, ...)
weibullplot (x, a=0, orient="xF", line=FALSE, ...)
plotposRPhist (xcont, xhist=NA, infhist=NA, suphist=NA, nbans=NA, seuil=NA,
               col12=c(1,1), a=0, orient="xF", ...)
pointspos (x, a=0, orient="xF", ...)
pointsposRP (x, a=0, orient="xF", ...)
loglogpoints (x, a=0, orient="xF", ...)
unifpoints (x, a=0, orient="xF", ...)
normpoints (x, a=0, orient="xF", ...)
studentpoints (x, df, a=0, orient="xF", ...)
logispoints (x, a=0, orient="xF", ...)
gammapoints (x, shape, a=0, orient="xF", ...)
exppoints (x, a=0, orient="xF", ...)
gumbelpoints (x, a=0, orient="xF", ...)
weibullpoints (x, a=0, orient="xF", ...)
regionalplotpos (x, cod, a=0, orient="xF", ...)
regionalnormplot (x, cod, a=0, orient="xF", ...)
regionallognormplot (x, cod, a=0, orient="xF", ...)
regionalexpplot (x, cod, a=0, orient="xF", ...)
regionalparetoplot (x, cod, a=0, orient="xF", ...)
regionalgumbelplot (x, cod, a=0, orient="xF", ...)
regionalfrechetplot (x, cod, a=0, orient="xF", ...)
pointsposRPhist (xcont, xhist=NA, infhist=NA, suphist=NA, nbans=NA, seuil=NA,
                 col12=c(1,1), a=0, orient="xF", ...)
x | vector representing a data sample |
xcont | vector of systematic data (see BayesianMCMC) |
xhist | vector of historical data (see BayesianMCMC) |
infhist | inferior limit for historical data (see BayesianMCMC) |
suphist | superior limit for historical data (see BayesianMCMC) |
nbans | period (in years) over which the threshold has not been exceeded except for the historical data (see BayesianMCMC) |
seuil | threshold not exceeded in the historical period except for the historical data (see BayesianMCMC) |
df | degrees of freedom (> 0, maybe non-integer) of the Student t distribution. 'df = Inf' is allowed |
shape | shape parameter of the distribution |
a | plotting position parameter, normally between 0 and 0.5; the default a=0 corresponds to the Weibull plotting position (see details) |
orient | if orient="xF" (the default), the sample values are on the x-axis and the non-exceedance probabilities on the y-axis |
line | if TRUE, a straight line indicating the normal, lognormal, ..., distribution with parameters estimated from x is drawn |
cod | array that defines the data subdivision among sites |
col12 | vector of 2 elements containing the colors for the systematic and historical data respectively |
... | graphical parameters (xlab, ylab, main, ...) |
A brief introduction on Probability Plots (or Quantile-Quantile plots) is available on https://en.wikipedia.org/wiki/Q-Q_plot. For plotting positions see https://en.wikipedia.org/wiki/Plotting_position.
For the quantiles of the comparison distribution, typically the Weibull formula is used (the default here). Several different formulas have been used or proposed as symmetrical plotting positions. Such formulas have the form

  F_i = (i - a) / (n + 1 - 2a)

for some value of a in the range from 0 to 1/2. The Weibull expression F_i = i/(n+1) is one example of these, for a = 0. The Filliben plotting position has a = 0.3175, and the Cunnane plotting position, a = 0.4, should be nearly quantile-unbiased for a range of distributions. The Hazen plotting position, widely used by engineers, has a = 0.5. Blom's plotting position, a = 3/8, gives nearly unbiased quantiles for the normal distribution, while the Gringorten plotting position, a = 0.44, is optimized for the largest observations from a Gumbel distribution. For the generalized Pareto, the GEV and related distributions of the Type I (Gumbel) and Weibull, a = 0.44 is suggested. For large sample size n, there is little difference between these various expressions.
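A quick plain-R illustration of how the choice of a shifts the plotting positions F_i = (i - a)/(n + 1 - 2a), using the constants listed above:

n <- 10; i <- 1:n
pp <- function(a) (i - a) / (n + 1 - 2*a)
round(rbind(Weibull    = pp(0),
            Filliben   = pp(0.3175),
            Cunnane    = pp(0.4),
            Gringorten = pp(0.44),
            Hazen      = pp(0.5)), 3)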
Representation of the values of x vs their empirical probability function in a Cartesian, uniform, normal, lognormal or Gumbel plot. plotpos and unifplot are analogous except for the axis notation; unifplot has the same notation as normplot, lognormplot, etc. plotposRP is analogous to plotpos, but the frequencies are expressed as return periods. With the default settings, F is defined with the Weibull plotting position F = i/(n+1). The straight line (if line=TRUE) indicates the uniform, normal, lognormal or Gumbel distribution with parameters estimated from x. The regional plots draw the samples of a region on the same plot.
pointspos, normpoints, ... are the analogues of points; they can be used to add points or lines to plotpos, normplot, .... normpoints can be used either on normplot or lognormplot. exppoints can be used either on expplot or paretoplot (since the log-transformed Pareto random variable is exponentially distributed). gumbelpoints can be used either on gumbelplot or frechetplot (since the log-transformed Frechet random variable is distributed as a Gumbel).
loglogplot plots the logarithm of the sample vs the logarithm of the empirical exceedance probability. In the log-log plot, the tail probability is represented by a straight line for power-law distributions (e.g. log-Pearson, log-logistic, Frechet, ..., HEAVY TAIL), but not for the other subexponential or exponential distributions (e.g. Gumbel, gamma, Pearson type III, ..., MODERATE TAIL); see El Adlouni et al. (2008). plotposRPhist is based on the method in Stedinger et al. (1993, pp. 18.41-42).
For information on the package and the Author, and for all the references, see nsRFA.

These functions are analogous to qqnorm; for the distributions, see Normal, Lognormal, LOGNORM, GUMBEL.
x <- rnorm(30,10,2)
plotpos(x)
normplot(x)
normplot(x, xlab=expression(D[m]), ylab=expression(hat(F)),
         main="Normal plot", cex.main=1, font.main=1)
normplot(x, line=FALSE)

x <- rlnorm(30, log(100), log(10))
normplot(x)
lognormplot(x)

x <- rand.gumb(30, 1000, 100)
normplot(x)
gumbelplot(x)

x <- rnorm(30,10,2)
y <- rnorm(50,10,3)
z <- c(x, y)
codz <- c(rep(1,30), rep(2,50))
regionalplotpos(z, codz)
regionalnormplot(z, codz, xlab="z")
regionallognormplot(z, codz)
regionalgumbelplot(z, codz)
plotpos(x)
pointspos(y, pch=2, col=2)

x <- rnorm(50,10,2)
F <- seq(0.01, 0.99, by=0.01)
qq <- qnorm(F, 10, 2)
plotpos(x)
pointspos(qq, type="l")

normplot(x, line=FALSE)
normpoints(x, type="l", lty=2, col=3)
lognormplot(x)
normpoints(x, type="l", lty=2, col=3)
gumbelplot(x)
gumbelpoints(x, type="l", lty=2, col=3)

# distributions comparison in probabilistic graphs
x <- rnorm(50,10,2)
F <- seq(0.001, 0.999, by=0.001)
loglikelihood <- function(param) {-sum(dgamma(x, shape=param[1],
                                              scale=param[2], log=TRUE))}
parameters <- optim(c(1,1), loglikelihood)$par
qq <- qgamma(F, shape=parameters[1], scale=parameters[2])
plotpos(x)
pointspos(qq, type="l")
normplot(x, line=FALSE)
normpoints(qq, type="l")
lognormplot(x, line=FALSE)
normpoints(qq, type="l")
EXP provides the link between the L-moments of a sample and the two-parameter exponential distribution.
f.exp (x, xi, alfa)
F.exp (x, xi, alfa)
invF.exp (F, xi, alfa)
Lmom.exp (xi, alfa)
par.exp (lambda1, lambda2)
rand.exp (numerosita, xi, alfa)
x | vector of quantiles |
xi | vector of exp location parameters |
alfa | vector of exp scale parameters |
F | vector of probabilities |
lambda1 | vector of sample means |
lambda2 | vector of L-variances |
numerosita | numeric value indicating the length of the vector to be generated |
See https://en.wikipedia.org/wiki/Exponential_distribution for a brief introduction on the Exponential distribution.
Definition

Parameters (2): ξ (lower endpoint of the distribution), α (scale).

Range of x: ξ ≤ x < ∞.

Probability density function:

  f(x) = α⁻¹ exp{-(x - ξ)/α}

Cumulative distribution function:

  F(x) = 1 - exp{-(x - ξ)/α}

Quantile function:

  x(F) = ξ - α log(1 - F)

L-moments

  λ₁ = ξ + α
  λ₂ = α/2
  τ₃ = 1/3
  τ₄ = 1/6

Parameters

If ξ is known, α is given by α = λ₁ - ξ, and the L-moment, moment, and maximum-likelihood estimators are identical. If ξ is unknown, the parameters are given by

  α = 2 λ₂,  ξ = λ₁ - α

For estimation based on a single sample these estimates are inefficient, but in regional frequency analysis they can give reasonable estimates of upper-tail quantiles.
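A small plain-R check of the parameter relations above (a sketch; par.exp implements the same relations):

lambda1 <- 1000; lambda2 <- 200   # illustrative sample L-moments
alfa <- 2*lambda2                 # scale, when xi is unknown
xi <- lambda1 - alfa              # lower endpoint
c(xi=xi, alfa=alfa)
# compare with the package function:
# par.exp(lambda1, lambda2)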
Lmom.exp and par.exp accept input in the form of vectors of equal length. In f.exp, F.exp, invF.exp and rand.exp the parameters (xi, alfa) must be atomic.

f.exp gives the density f, F.exp gives the distribution function F, invF.exp gives the quantile function x(F), Lmom.exp gives the L-moments (λ₁, λ₂, τ₃, τ₄), par.exp gives the parameters (xi, alfa), and rand.exp generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.

rnorm, runif, GENLOGIS, GENPAR, GEV, GUMBEL, KAPPA, LOGNORM, P3; DISTPLOTS, GOFmontecarlo, Lmoments
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x, fac)

camp <- split(x, fac)$"45"
ll <- Lmoments(camp)
parameters <- par.exp(ll[1], ll[2])

f.exp(1800, parameters$xi, parameters$alfa)
F.exp(1800, parameters$xi, parameters$alfa)
invF.exp(0.7870856, parameters$xi, parameters$alfa)
Lmom.exp(parameters$xi, parameters$alfa)
rand.exp(100, parameters$xi, parameters$alfa)

Rll <- regionalLmoments(x, fac); Rll
parameters <- par.exp(Rll[1], Rll[2])
Lmom.exp(parameters$xi, parameters$alfa)
Flood Estimation Handbook flood peak data CD-ROM.
data(FEH1000)
Data.frames:

am is the data.frame of the annual maximum flows with 4 columns:
number, the code of the station;
date, date of the annual maximum;
year, year of the annual maximum (we consider the hydrologic year: 1 October - 30 September);
am, the value of the annual maximum flow [m3/s].
cd is the data.frame of parameters of 1000 catchments with 24 columns:
number, the code of the station;
nominal_area, catchment drainage area [km2];
nominal_ngr_x, basin outflow coordinates [m];
nominal_ngr_y, basin outflow coordinates [m];
ihdtm_ngr_x, basin outflow coordinates by Institute of Hydrology digital terrain model [m];
ihdtm_ngr_y, basin outflow coordinates by Institute of Hydrology digital terrain model [m];
dtm_area, catchment drainage area [km2] derived by CEH using their DTM (IHDTM);
saar4170, standard average annual rainfall 1941-1970 [mm];
bfihost, baseflow index derived from HOST soils data;
sprhost, standard percentage runoff derived from HOST soils data;
farl, index of flood attenuation due to reservoirs and lakes;
saar, standard average annual rainfall 1961-1990 [mm];
rmed_1d, median annual maximum 1-day rainfall [mm];
rmed_2d, median annual maximum 2-day rainfall [mm];
rmed_1h, median annual maximum 1-hour rainfall [mm];
smdbar, mean SMD (soil moisture deficit) for the period 1961-1990 calculated from MORECS month-end values [mm];
propwet, proportion of time when soil moisture deficit <= 6 mm during 1961-1990, defined using MORECS;
ldp, longest drainage path [km], defined by recording the greatest distance from a catchment node to the defined outlet;
dplbar, mean drainage path length [km];
altbar, mean catchment altitude [m];
dpsbar, mean catchment slope [m/km];
aspbar, index representing the dominant aspect of catchment slopes (its values increase clockwise from zero to 360, starting from the north); mean direction of all inter-nodal slopes, with north being zero;
aspvar, index describing the invariability in aspect of catchment slopes; values are close to one when all slopes face a similar direction;
urbext1990, extent of urban and suburban land cover in 1990 [fraction].
For information on the package and the Author, and for all the references, see nsRFA.
http://www.environment-agency.gov.uk/hiflowsuk/
data(FEH1000)
names(cd)
am[1:20,]
Functions for inversion calculation.
data(functionsLaio)
Data.frames:

df.kgev is a data.frame with the skewness coefficient (first column) and the corresponding shape parameter of the GEV (second column).

df.polig represents the polygamma function.
For information on the package and the Author, and for all the references, see nsRFA.
GENLOGIS provides the link between L-moments of a sample and the three-parameter generalized logistic distribution.
f.genlogis (x, xi, alfa, k)
F.genlogis (x, xi, alfa, k)
invF.genlogis (F, xi, alfa, k)
Lmom.genlogis (xi, alfa, k)
par.genlogis (lambda1, lambda2, tau3)
rand.genlogis (numerosita, xi, alfa, k)
x | vector of quantiles
xi | vector of genlogis location parameters
alfa | vector of genlogis scale parameters
k | vector of genlogis shape parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
numerosita | numeric value indicating the length of the vector to be generated
See https://en.wikipedia.org/wiki/Logistic_distribution for an introduction to the Logistic Distribution.
Definition

Parameters (3): \xi (location), \alpha (scale), k (shape).

Range of x: -\infty < x \le \xi + \alpha/k if k > 0; -\infty < x < \infty if k = 0; \xi + \alpha/k \le x < \infty if k < 0.

Probability density function: f(x) = \alpha^{-1} e^{-(1-k)y} / (1 + e^{-y})^2
where y = -k^{-1} \log\{1 - k(x-\xi)/\alpha\} if k \ne 0, y = (x-\xi)/\alpha if k = 0.

Cumulative distribution function: F(x) = 1/(1 + e^{-y})

Quantile function: x(F) = \xi + \alpha [1 - \{(1-F)/F\}^k]/k if k \ne 0, x(F) = \xi - \alpha \log\{(1-F)/F\} if k = 0.

k = 0 is the logistic distribution.

L-moments

L-moments are defined for -1 < k < 1.

\lambda_1 = \xi + \alpha [1/k - \pi/\sin(k\pi)]
\lambda_2 = \alpha k \pi / \sin(k\pi)
\tau_3 = -k
\tau_4 = (1 + 5k^2)/6

Parameters

k = -\tau_3, \alpha = \lambda_2 \sin(k\pi)/(k\pi), \xi = \lambda_1 - \alpha [1/k - \pi/\sin(k\pi)].
Lmom.genlogis and par.genlogis accept input as vectors of equal length. In f.genlogis, F.genlogis, invF.genlogis and rand.genlogis the parameters (xi, alfa, k) must be atomic.
f.genlogis gives the density f, F.genlogis gives the distribution function F, invF.genlogis gives the quantile function x(F), Lmom.genlogis gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.genlogis gives the parameters (xi, alfa, k), and rand.genlogis generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENPAR, GEV, GUMBEL, KAPPA, LOGNORM, P3; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.genlogis(ll[1],ll[2],ll[4])
f.genlogis(1800,parameters$xi,parameters$alfa,parameters$k)
F.genlogis(1800,parameters$xi,parameters$alfa,parameters$k)
invF.genlogis(0.7697433,parameters$xi,parameters$alfa,parameters$k)
Lmom.genlogis(parameters$xi,parameters$alfa,parameters$k)
rand.genlogis(100,parameters$xi,parameters$alfa,parameters$k)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.genlogis(Rll[1],Rll[2],Rll[4])
Lmom.genlogis(parameters$xi,parameters$alfa,parameters$k)
GENPAR provides the link between L-moments of a sample and the three-parameter generalized Pareto distribution.
f.genpar (x, xi, alfa, k)
F.genpar (x, xi, alfa, k)
invF.genpar (F, xi, alfa, k)
Lmom.genpar (xi, alfa, k)
par.genpar (lambda1, lambda2, tau3)
rand.genpar (numerosita, xi, alfa, k)
x | vector of quantiles
xi | vector of genpar location parameters
alfa | vector of genpar scale parameters
k | vector of genpar shape parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
numerosita | numeric value indicating the length of the vector to be generated
See https://en.wikipedia.org/wiki/Pareto_distribution for an introduction to the Pareto distribution.
Definition

Parameters (3): \xi (location), \alpha (scale), k (shape).

Range of x: \xi \le x \le \xi + \alpha/k if k > 0; \xi \le x < \infty if k \le 0.

Probability density function: f(x) = \alpha^{-1} e^{-(1-k)y}
where y = -k^{-1} \log\{1 - k(x-\xi)/\alpha\} if k \ne 0, y = (x-\xi)/\alpha if k = 0.

Cumulative distribution function: F(x) = 1 - e^{-y}

Quantile function: x(F) = \xi + \alpha [1 - (1-F)^k]/k if k \ne 0, x(F) = \xi - \alpha \log(1-F) if k = 0.

k = 0 is the exponential distribution; k = 1 is the uniform distribution on the interval \xi \le x \le \xi + \alpha.

L-moments

L-moments are defined for k > -1.

\lambda_1 = \xi + \alpha/(1+k)
\lambda_2 = \alpha/[(1+k)(2+k)]
\tau_3 = (1-k)/(3+k)
\tau_4 = (1-k)(2-k)/[(3+k)(4+k)]

The relation between \tau_3 and \tau_4 is given by

\tau_4 = \tau_3 (1 + 5\tau_3)/(5 + \tau_3)

Parameters

If \xi is known, k = (\lambda_1 - \xi)/\lambda_2 - 2 and \alpha = (1+k)(\lambda_1 - \xi);
if \xi is unknown, k = (1 - 3\tau_3)/(1 + \tau_3), \alpha = (1+k)(2+k)\lambda_2 and \xi = \lambda_1 - (2+k)\lambda_2.
Lmom.genpar and par.genpar accept input as vectors of equal length. In f.genpar, F.genpar, invF.genpar and rand.genpar the parameters (xi, alfa, k) must be atomic.
f.genpar gives the density f, F.genpar gives the distribution function F, invF.genpar gives the quantile function x(F), Lmom.genpar gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.genpar gives the parameters (xi, alfa, k), and rand.genpar generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GEV, GUMBEL, KAPPA, LOGNORM, P3; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.genpar(ll[1],ll[2],ll[4])
f.genpar(1800,parameters$xi,parameters$alfa,parameters$k)
F.genpar(1800,parameters$xi,parameters$alfa,parameters$k)
invF.genpar(0.7161775,parameters$xi,parameters$alfa,parameters$k)
Lmom.genpar(parameters$xi,parameters$alfa,parameters$k)
rand.genpar(100,parameters$xi,parameters$alfa,parameters$k)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.genpar(Rll[1],Rll[2],Rll[4])
Lmom.genpar(parameters$xi,parameters$alfa,parameters$k)
GEV provides the link between L-moments of a sample and the three-parameter generalized extreme value distribution.
f.GEV (x, xi, alfa, k)
F.GEV (x, xi, alfa, k)
invF.GEV (F, xi, alfa, k)
Lmom.GEV (xi, alfa, k)
par.GEV (lambda1, lambda2, tau3)
rand.GEV (numerosita, xi, alfa, k)
x | vector of quantiles
xi | vector of GEV location parameters
alfa | vector of GEV scale parameters
k | vector of GEV shape parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
numerosita | numeric value indicating the length of the vector to be generated
See https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution for an introduction to the GEV distribution.
Definition

Parameters (3): \xi (location), \alpha (scale), k (shape).

Range of x: -\infty < x \le \xi + \alpha/k if k > 0; -\infty < x < \infty if k = 0; \xi + \alpha/k \le x < \infty if k < 0.

Probability density function: f(x) = \alpha^{-1} e^{-(1-k)y - e^{-y}}
where y = -k^{-1} \log\{1 - k(x-\xi)/\alpha\} if k \ne 0, y = (x-\xi)/\alpha if k = 0.

Cumulative distribution function: F(x) = \exp(-e^{-y})

Quantile function: x(F) = \xi + \alpha [1 - (-\log F)^k]/k if k \ne 0, x(F) = \xi - \alpha \log(-\log F) if k = 0.

k = 0 is the Gumbel distribution; k = 1 is the reverse exponential distribution.

L-moments

L-moments are defined for k > -1.

\lambda_1 = \xi + \alpha [1 - \Gamma(1+k)]/k
\lambda_2 = \alpha (1 - 2^{-k}) \Gamma(1+k)/k
\tau_3 = 2(1 - 3^{-k})/(1 - 2^{-k}) - 3
\tau_4 = [5(1 - 4^{-k}) - 10(1 - 3^{-k}) + 6(1 - 2^{-k})]/(1 - 2^{-k})

Here \Gamma denotes the gamma function \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} dt.

Parameters

To estimate k, no explicit solution is possible, but the following approximation has accuracy better than 9 \times 10^{-4} for -0.5 \le \tau_3 \le 0.5:

k \approx 7.8590 c + 2.9554 c^2

where c = 2/(3 + \tau_3) - \log 2 / \log 3.
The other parameters are then given by \alpha = \lambda_2 k / [(1 - 2^{-k}) \Gamma(1+k)], \xi = \lambda_1 - \alpha [1 - \Gamma(1+k)]/k.
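The approximation can be sketched in plain R as follows (the helper name lmom2par_gev is illustrative, not part of the package; par.GEV implements the same computation):

lmom2par_gev <- function(lambda1, lambda2, tau3) {
  cc <- 2/(3 + tau3) - log(2)/log(3)
  k <- 7.8590*cc + 2.9554*cc^2                    # approximate shape parameter
  alfa <- lambda2*k/((1 - 2^(-k))*gamma(1 + k))   # scale
  xi <- lambda1 - alfa*(1 - gamma(1 + k))/k       # location
  list(xi=xi, alfa=alfa, k=k)
}
lmom2par_gev(100, 30, 0.2)    # compare with par.GEV(100, 30, 0.2)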
Lmom.GEV and par.GEV accept input as vectors of equal length. In f.GEV, F.GEV, invF.GEV and rand.GEV the parameters (xi, alfa, k) must be atomic.
f.GEV gives the density f, F.GEV gives the distribution function F, invF.GEV gives the quantile function x(F), Lmom.GEV gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.GEV gives the parameters (xi, alfa, k), and rand.GEV generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GENPAR, GUMBEL, KAPPA, LOGNORM, P3; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.GEV(ll[1],ll[2],ll[4])
f.GEV(1800,parameters$xi,parameters$alfa,parameters$k)
F.GEV(1800,parameters$xi,parameters$alfa,parameters$k)
invF.GEV(0.7518357,parameters$xi,parameters$alfa,parameters$k)
Lmom.GEV(parameters$xi,parameters$alfa,parameters$k)
rand.GEV(100,parameters$xi,parameters$alfa,parameters$k)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.GEV(Rll[1],Rll[2],Rll[4])
Lmom.GEV(parameters$xi,parameters$alfa,parameters$k)
Anderson-Darling goodness of fit tests for extreme-value distributions, from Laio (2004).
A2_GOFlaio (x, dist="NORM")
A2 (F)
W2 (F)
fw2 (w)
x | data sample
dist | distribution: normal
F | cumulative distribution function (that has to be sorted increasingly)
w | transformed test statistic (Laio, 2004)
An introduction to the Anderson-Darling test is available on https://en.wikipedia.org/wiki/Anderson-Darling_test and in the GOFmontecarlo help page.
The original paper of Laio (2004) is available on his web site.
A2_GOFlaio tests the goodness of fit of a distribution with the sample x; it returns the value of the Anderson-Darling statistic and its non-exceedance probability P.
Note that P is the probability of obtaining the test statistic lower than the one that was actually observed, assuming that the null hypothesis is true, i.e., P is one minus the p-value usually employed in statistical testing (see https://en.wikipedia.org/wiki/P-value).
If P is, for example, greater than 0.90, the null hypothesis at significance level \alpha = 10% is rejected.

A2 is the Anderson-Darling test statistic; it is used by A2_GOFlaio.

W2 is the Cramer-von Mises test statistic.

fw2 is the approximation of the probability distribution of w (first 2 terms) when the null hypothesis is true (Anderson-Darling, 1952); it is used by A2_GOFlaio.
For information on the package and the Author, and for all the references, see nsRFA
.
sm <- rand.gumb(100, 0, 1)

ml <- ML_estimation (sm, dist="GEV"); ml
F.GEV(sm, ml[1], ml[2], ml[3])
A2(sort(F.GEV(sm, ml[1], ml[2], ml[3])))
A2_GOFlaio(sm, dist="GEV")

ml <- ML_estimation (sm, dist="P3"); ml
A2(sort(F.gamma(sm, ml[1], ml[2], ml[3])))
A2_GOFlaio(sm, dist="P3")
Anderson-Darling goodness of fit tests for Regional Frequency Analysis: Monte-Carlo method.
gofNORMtest (x)
gofEXPtest (x, Nsim=1000)
gofGUMBELtest (x, Nsim=1000)
gofGENLOGIStest (x, Nsim=1000)
gofGENPARtest (x, Nsim=1000)
gofGEVtest (x, Nsim=1000)
gofLOGNORMtest (x, Nsim=1000)
gofP3test (x, Nsim=1000)
x | data sample
Nsim | number of simulated samples from the hypothetical parent distribution
An introduction, analogous to the following one, on the Anderson-Darling test is available on https://en.wikipedia.org/wiki/Anderson-Darling_test.
Given a sample x_i (i = 1, ..., m) of data extracted from a distribution F_R(x), the test is used to check the null hypothesis H_0: F_R(x) = F(x, \theta), where F(x, \theta) is the hypothetical distribution and \theta is an array of parameters estimated from the sample x_i.

The Anderson-Darling goodness of fit test measures the departure between the hypothetical distribution F(x, \theta) and the cumulative frequency function F_m(x) defined as:

F_m(x) = 0 for x < x_{(1)}
F_m(x) = i/m for x_{(i)} \le x < x_{(i+1)}
F_m(x) = 1 for x_{(m)} \le x

where x_{(i)} is the i-th element of the ordered sample (in increasing order).

The test statistic is:

Q = m \int_x [F_m(x) - F(x, \theta)]^2 \Psi(x) dF(x)

where \Psi(x), in the case of the Anderson-Darling test (Laio, 2004), is \Psi(x) = [F(x, \theta) (1 - F(x, \theta))]^{-1}.
In practice, the statistic is calculated as:

A^2 = -m - \frac{1}{m} \sum_{i=1}^{m} \{(2i - 1) \log[F(x_{(i)}, \theta)] + (2m + 1 - 2i) \log[1 - F(x_{(i)}, \theta)]\}

The statistic A^2, obtained in this way, may be compared with the population of the A^2's that one obtains if the samples effectively belong to the hypothetical distribution F(x, \theta).
In the case of the test of normality, this distribution is defined (see Laio, 2004).
In other cases, e.g. the Pearson Type III case, it can be derived with a Monte-Carlo procedure.
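The Monte-Carlo procedure can be sketched as follows for the Gumbel case (a simplified sketch, assuming the nsRFA package is attached; gofGUMBELtest does this internally, possibly with different details):

x <- rand.gumb(50, 0, 1)
ll <- Lmoments(x)
p <- par.gumb(ll[1], ll[2])
A2obs <- A2(sort(F.gumb(x, p$xi, p$alfa)))    # observed statistic
A2sim <- replicate(500, {
  xs <- rand.gumb(length(x), p$xi, p$alfa)    # sample from the fitted parent
  lls <- Lmoments(xs)
  ps <- par.gumb(lls[1], lls[2])              # re-estimate the parameters
  A2(sort(F.gumb(xs, ps$xi, ps$alfa)))
})
mean(A2sim < A2obs)                           # non-exceedance probability P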
gofNORMtest tests the goodness of fit of a normal (Gauss) distribution with the sample x.
gofEXPtest tests the goodness of fit of an exponential distribution with the sample x.
gofGUMBELtest tests the goodness of fit of a Gumbel (EV1) distribution with the sample x.
gofGENLOGIStest tests the goodness of fit of a Generalized Logistic distribution with the sample x.
gofGENPARtest tests the goodness of fit of a Generalized Pareto distribution with the sample x.
gofGEVtest tests the goodness of fit of a Generalized Extreme Value distribution with the sample x.
gofLOGNORMtest tests the goodness of fit of a three-parameter lognormal distribution with the sample x.
gofP3test tests the goodness of fit of a Pearson type III (gamma) distribution with the sample x.
They return the value of the Anderson-Darling statistic and its non-exceedance probability P.
Note that P is the probability of obtaining the test statistic lower than the one that was actually observed, assuming that the null hypothesis is true, i.e., P is one minus the p-value usually employed in statistical testing (see https://en.wikipedia.org/wiki/P-value).
If P is, for example, greater than 0.90, the null hypothesis at significance level \alpha = 10% is rejected.
For information on the package and the Author, and for all the references, see nsRFA.
x <- rnorm(30,10,1)
gofNORMtest(x)

x <- rand.gamma(50, 100, 15, 7)
gofP3test(x, Nsim=200)

x <- rand.GEV(50, 0.907, 0.169, 0.0304)
gofGEVtest(x, Nsim=200)

x <- rand.genlogis(50, 0.907, 0.169, 0.0304)
gofGENLOGIStest(x, Nsim=200)

x <- rand.genpar(50, 0.716, 0.418, 0.476)
gofGENPARtest(x, Nsim=200)

x <- rand.lognorm(50, 0.716, 0.418, 0.476)
gofLOGNORMtest(x, Nsim=200)
GUMBEL provides the link between L-moments of a sample and the two-parameter Gumbel distribution.
f.gumb (x, xi, alfa)
F.gumb (x, xi, alfa)
invF.gumb (F, xi, alfa)
Lmom.gumb (xi, alfa)
par.gumb (lambda1, lambda2)
rand.gumb (numerosita, xi, alfa)
x | vector of quantiles
xi | vector of gumb location parameters
alfa | vector of gumb scale parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
numerosita | numeric value indicating the length of the vector to be generated
See https://en.wikipedia.org/wiki/Fisher-Tippett_distribution for an introduction to the Gumbel distribution.
Definition

Parameters (2): \xi (location), \alpha (scale).

Range of x: -\infty < x < \infty.

Probability density function: f(x) = \alpha^{-1} \exp[-(x-\xi)/\alpha - \exp\{-(x-\xi)/\alpha\}]

Cumulative distribution function: F(x) = \exp[-\exp\{-(x-\xi)/\alpha\}]

Quantile function: x(F) = \xi - \alpha \log(-\log F).

L-moments

\lambda_1 = \xi + \gamma \alpha
\lambda_2 = \alpha \log 2
\tau_3 = 0.1699
\tau_4 = 0.1504

Here \gamma is Euler's constant, 0.5772...

Parameters

\alpha = \lambda_2 / \log 2, \xi = \lambda_1 - \gamma \alpha.
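As a minimal illustration (plain R, assuming the nsRFA package is attached; par.gumb implements the same relations):

ll <- Lmoments(rand.gumb(200, 0, 1))
alfa <- ll[2]/log(2)             # alfa = lambda2/log(2)
xi <- ll[1] - 0.5772157*alfa     # xi = lambda1 - gamma*alfa
c(xi, alfa)                      # compare with par.gumb(ll[1], ll[2])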
Lmom.gumb and par.gumb accept input as vectors of equal length. In f.gumb, F.gumb, invF.gumb and rand.gumb the parameters (xi, alfa) must be atomic.
f.gumb gives the density f, F.gumb gives the distribution function F, invF.gumb gives the quantile function x(F), Lmom.gumb gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.gumb gives the parameters (xi, alfa), and rand.gumb generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GENPAR, GEV, KAPPA, LOGNORM, P3; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows[1:10,]
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.gumb(ll[1],ll[2])
f.gumb(1800,parameters$xi,parameters$alfa)
F.gumb(1800,parameters$xi,parameters$alfa)
invF.gumb(0.7686843,parameters$xi,parameters$alfa)
Lmom.gumb(parameters$xi,parameters$alfa)
rand.gumb(100,parameters$xi,parameters$alfa)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.gumb(Rll[1],Rll[2])
Lmom.gumb(parameters$xi,parameters$alfa)
Homogeneity tests for Regional Frequency Analysis.
ADbootstrap.test (x, cod, Nsim=500, index=2)
HW.tests (x, cod, Nsim=500)
DK.test (x, cod)
discordancy (x, cod)
criticalD ()
x | vector representing data from many samples defined with cod
cod | array that defines the data subdivision among sites
Nsim | number of regions simulated with the bootstrap of the original region
index | if index=1 the samples are normalized by their average value; if index=2 (default) by their median value
The Hosking and Wallis heterogeneity measures

The idea underlying the Hosking and Wallis (1993) heterogeneity statistics is to measure the sample variability of the L-moment ratios and compare it to the variation that would be expected in a homogeneous region. The latter is estimated through repeated simulations of homogeneous regions with samples drawn from a four-parameter kappa distribution (see e.g., Hosking and Wallis, 1997, pp. 202-204).
More in detail, the steps are the following:

with regards to the k samples belonging to the region under analysis, find the sample L-moment ratios (see Hosking and Wallis, 1997) pertaining to the i-th site: these are the L-coefficient of variation (L-CV) t^{(i)}, the coefficient of L-skewness t_3^{(i)} and the coefficient of L-kurtosis t_4^{(i)}. Note that the L-moment ratios are not affected by the normalization by the index value, i.e. it is the same to use the original or the normalized data in the equations.

Define the regional averaged L-CV, L-skewness and L-kurtosis coefficients, weighted by the record lengths n_i,

t^R = \sum_{i=1}^{k} n_i t^{(i)} / \sum_{i=1}^{k} n_i
t_3^R = \sum_{i=1}^{k} n_i t_3^{(i)} / \sum_{i=1}^{k} n_i
t_4^R = \sum_{i=1}^{k} n_i t_4^{(i)} / \sum_{i=1}^{k} n_i

and compute the statistic

V = \left[ \sum_{i=1}^{k} n_i (t^{(i)} - t^R)^2 / \sum_{i=1}^{k} n_i \right]^{1/2}

Fit the parameters of a four-parameter kappa distribution to the regional averaged L-moment ratios t^R, t_3^R and t_4^R, and then generate a large number N_{sim} of realizations of sets of k samples. The i-th site sample in each set has a kappa distribution as its parent and record length equal to n_i. For each simulated homogeneous set, calculate the statistic V, obtaining N_{sim} values. On this vector of V values determine the mean \mu_V and standard deviation \sigma_V that relate to the hypothesis of homogeneity (actually, under the composite hypothesis of homogeneity and kappa parent distribution).

A heterogeneity measure, which is called here HW_1, is finally found as

HW_1 = (V - \mu_V) / \sigma_V

HW_1 can be approximated by a normal distribution with zero mean and unit variance: following Hosking and Wallis (1997), the region under analysis can therefore be regarded as 'acceptably homogeneous' if HW_1 < 1, 'possibly heterogeneous' if 1 \le HW_1 < 2, and 'definitely heterogeneous' if HW_1 \ge 2. Hosking and Wallis (1997) suggest that these limits should be treated as useful guidelines. Even if the HW_1 statistic is constructed like a significance test, significance levels obtained from such a test would in fact be accurate only under special assumptions: to have independent data both serially and between sites, and the true regional distribution being kappa.

Hosking and Wallis (1993) also give alternative heterogeneity measures (that we call HW_2 and HW_3), in which V is replaced by:

V_2 = \sum_{i=1}^{k} n_i \{(t^{(i)} - t^R)^2 + (t_3^{(i)} - t_3^R)^2\}^{1/2} / \sum_{i=1}^{k} n_i

or

V_3 = \sum_{i=1}^{k} n_i \{(t_3^{(i)} - t_3^R)^2 + (t_4^{(i)} - t_4^R)^2\}^{1/2} / \sum_{i=1}^{k} n_i

The test statistic in this case becomes

HW_2 = (V_2 - \mu_{V_2}) / \sigma_{V_2}

or

HW_3 = (V_3 - \mu_{V_3}) / \sigma_{V_3}

with similar acceptability limits as the HW_1 statistic. Hosking and Wallis (1997) judge HW_2 and HW_3 to be inferior to HW_1 and say that they rarely yield values larger than 2 even for grossly heterogeneous regions.
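The HW_1 computation can be sketched in plain R as follows (the helper names are illustrative, not package functions; HW.tests performs the whole procedure, including the kappa simulations that produce Vsim):

tR <- function(t, n) sum(n*t)/sum(n)    # record-length-weighted regional mean
V1 <- function(t, n) sqrt(sum(n*(t - tR(t, n))^2)/sum(n))
# t: vector of site L-CVs, n: vector of record lengths,
# Vsim: V1 values from regions simulated with the fitted kappa distribution
HW1 <- function(Vobs, Vsim) (Vobs - mean(Vsim))/sd(Vsim)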
The bootstrap Anderson-Darling test

A test that does not make any assumption on the parent distribution is the Anderson-Darling (AD) rank test (Scholz and Stephens, 1987). The AD test is the generalization of the classical Anderson-Darling goodness of fit test (e.g., D'Agostino and Stephens, 1986), and it is used to test the hypothesis that k independent samples belong to the same population without specifying their common distribution function.

The test is based on the comparison between local and regional empirical distribution functions. The empirical distribution function, or sample distribution function, is defined by F(x) = j/\eta, x_{(j)} \le x < x_{(j+1)}, where \eta is the size of the sample and x_{(j)} are the order statistics, i.e. the observations arranged in ascending order. Denote the empirical distribution function of the i-th sample (local), of size n_i, by \hat{F}_i(x), and that of the pooled sample of all N = n_1 + ... + n_k observations (regional) by H_N(x). The k-sample Anderson-Darling test statistic is then defined as

\theta = \sum_{i=1}^{k} n_i \int \frac{[\hat{F}_i(x) - H_N(x)]^2}{H_N(x)[1 - H_N(x)]} dH_N(x)

If the pooled ordered sample is Z_1 < ... < Z_N, the computational formula to evaluate \theta is:

\theta = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{N-1} \frac{(N M_{ij} - j n_i)^2}{j (N - j)}

where M_{ij} is the number of observations in the i-th sample that are not greater than Z_j. The homogeneity test can be carried out by comparing the obtained \theta value to the tabulated percentage points reported by Scholz and Stephens (1987) for different significance levels.

The statistic \theta depends on the sample values only through their ranks. This guarantees that the test statistic remains unchanged when the samples undergo monotonic transformations, an important stability property not possessed by the HW heterogeneity measures. However, problems arise in applying this test in a common index-value procedure. In fact, the index-value procedure corresponds to dividing each site sample by a different value, thus modifying the ranks in the pooled sample. In particular, this has the effect of making the local empirical distribution functions much more similar to each other, providing an impression of homogeneity even when the samples are highly heterogeneous. The effect is analogous to that encountered when applying goodness-of-fit tests to distributions whose parameters are estimated from the same sample used for the test (e.g., D'Agostino and Stephens, 1986; Laio, 2004). In both cases, the percentage points for the test should be opportunely redetermined. This can be done with a nonparametric bootstrap approach presenting the following steps:

build up the pooled sample S of the observed non-dimensional data;
sample with replacement from S and generate k artificial local samples, of size n_1, ..., n_k;
divide each sample by its index value, and calculate \theta^{(1)};
repeat the procedure N_{sim} times and obtain a sample of \theta^{(j)}, j = 1, ..., N_{sim}, values, whose empirical distribution function can be used as an approximation of G_{\theta}(\theta), the distribution of \theta under the null hypothesis of homogeneity.

The acceptance limits for the test, corresponding to any significance level \alpha, are then easily determined as the quantiles of G_{\theta}(\theta) corresponding to a probability (1 - \alpha).

We will call the test obtained with the above procedure the bootstrap Anderson-Darling test, hereafter referred to as AD.
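The resampling scheme can be sketched as follows (a sketch only: ksampleA2 stands for a function evaluating the k-sample Anderson-Darling statistic above and is not a package function; ADbootstrap.test implements the complete test):

bootstrapAD <- function(x, cod, Nsim=500, ksampleA2) {
  xad <- x/ave(x, cod, FUN=median)       # normalize each sample by its index value
  A2obs <- ksampleA2(xad, cod)
  A2sim <- replicate(Nsim, {
    xb <- sample(xad, replace=TRUE)      # resample from the pooled sample S
    xb <- xb/ave(xb, cod, FUN=median)    # re-divide each sample by its index value
    ksampleA2(xb, cod)
  })
  mean(A2sim < A2obs)                    # empirical non-exceedance probability
}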
Durbin and Knott test

The last considered homogeneity test derives from a goodness-of-fit statistic originally proposed by Durbin and Knott (1971). The test is formulated to measure discrepancies in the dispersion of the samples, without accounting for the possible presence of discrepancies in the mean or skewness of the data. Under this aspect, the test is similar to the HW_1 test, while it is analogous to the AD test for the fact that it is a rank test.

The original goodness-of-fit test is very simple: suppose to have a sample x_i, i = 1, ..., m, with hypothetical distribution F(x); under the null hypothesis the random variable F(x_i) has a uniform distribution in the (0, 1) interval, and the statistic

D = \sqrt{2/m} \sum_{i=1}^{m} \cos(2 \pi F(x_i))

is approximately normally distributed with mean 0 and variance 1 (Durbin and Knott, 1971). D serves the purpose of detecting discrepancy in data dispersion: if the variance of x_i is greater than that of the hypothetical distribution F(x), D is significantly greater than 0, while D is significantly below 0 in the reverse case. Differences between the mean (or the median) of x_i and F(x) are instead not detected by D, which guarantees that the normalization by the index value does not affect the test.

The extension to homogeneity testing of the Durbin and Knott (DK) statistic is straightforward: we substitute the empirical distribution function obtained with the pooled observed data, H_N(x), for F(x) in the equation above, obtaining at each site a statistic D_i which is normal under the hypothesis of homogeneity. The statistic DK = \sum_{i=1}^{k} D_i^2 has then a chi-squared distribution, which allows one to determine the acceptability limits for the test, corresponding to any significance level \alpha.
Comparison among tests

The comparison (Viglione et al., 2007) shows that the Hosking and Wallis heterogeneity measure HW_1 (only based on L-CV) is preferable when skewness is low, while the bootstrap Anderson-Darling test should be used for more skewed regions. As for HW_2, the Hosking and Wallis heterogeneity measure based on L-CV and L-CA, it is shown once more how much it lacks power.

Our suggestion is to guide the choice of the test according to a compromise between power and Type I error of the HW_1 and AD tests. The L-moment space is divided into two regions: if the regional L-skewness coefficient t_3^R for the region under analysis is lower than 0.23, we propose to use the Hosking and Wallis heterogeneity measure HW_1; if t_3^R > 0.23, the bootstrap Anderson-Darling test is preferable.
ADbootstrap.test and DK.test give the value of the test statistic and its non-exceedance probability P. If P is, for example, 0.92, the samples shouldn't be considered heterogeneous for significance levels smaller than 8%.

HW.tests gives the two Hosking and Wallis heterogeneity measures HW_1 and HW_2; following Hosking and Wallis (1997), the region under analysis can therefore be regarded as 'acceptably homogeneous' if HW_1 < 1, 'possibly heterogeneous' if 1 \le HW_1 < 2, and 'definitely heterogeneous' if HW_1 \ge 2.

discordancy returns the discordancy measure D of Hosking and Wallis for all sites. Hosking and Wallis suggest to consider a site discordant if D \ge 3 when N \ge 15 (where N is the number of sites considered in the region). For N < 15 the critical values of D can be listed with criticalD.
For information on the package and the Author, and for all the references, see nsRFA.
traceWminim, roi, KAPPA, HW.original.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
split(x,cod)

#ADbootstrap.test(x,cod,Nsim=100)   # it takes some time
#HW.tests(x,cod)                    # it takes some time
DK.test(x,cod)

fac <- factor(annualflows["cod"][,],levels=c(34:38))
x2 <- annualflows[!is.na(fac),"dato"]
cod2 <- annualflows[!is.na(fac),"cod"]

ADbootstrap.test(x2,cod2,Nsim=100)
ADbootstrap.test(x2,cod2,index=1,Nsim=200)
HW.tests(x2,cod2,Nsim=100)
DK.test(x2,cod2)

discordancy(x,cod)

criticalD()
SIMN (Servizio Idrografico e Mareografico Nazionale) flow data samples and catchment parameters.
data(hydroSIMN)
Data.frames:

annualflows is the data.frame of the annual flows with 3 columns:
cod, the code of the station;
anno, the year;
dato, the value of the annual flow [mm].
parameters is the data.frame of parameters of 47 catchments with 16 columns:
cod, the code of the station;
Dm, the mean annual streamflow [mm] as reported in the 'Pubblicazione n. 17';
Am, the mean annual rainfall [mm] as reported in the 'Pubblicazione n. 17';
S, area of the plane projection of the drainage basin [km2];
Hm, mean elevation of the drainage basin [m a.s.l.];
Pm, mean slope of the basin [%], a function of the basin area, the median elevation and the elevation of the closing section; Pm is a slope measure of a square equivalent basin, and does not account for basin shape;
LLDP, length of the longest drainage path [km]; the longest drainage path is the longest path between the basin outlet and the most distant point on the basin border, following drainage directions; actually the longest drainage path corresponds to the main stream plus the path on the hillslope that connects the stream source to the basin border;
PLDP, slope of the longest drainage path [%], average of the slope values associated to each pixel in the longest drainage path;
S2000, area above 2000 m a.s.l. [%];
EST, 'easting', sine of the angle between the segment connecting the center of mass and the outlet of the basin and the north; EST is 1 if the basin is oriented eastward, -1 if it is oriented westward;
NORD, 'northing', cosine of the angle between the segment connecting the center of mass and the outlet of the basin and the north; NORD is 1 if the basin is oriented northward, -1 if it is oriented southward;
Rc, circularity ratio, the ratio between the basin area S and the area of a circle having the same perimeter: Rc = 4 \pi S / P^2, where P is the watershed perimeter;
Xbar, longitude [deg] of the centroid of the plane projection of the drainage basin;
Ybar, latitude [deg] of the centroid of the plane projection of the drainage basin;
IT, Thornthwaite index: a global moisture index that can be estimated, in its simplest form, as the ratio (A_m - E_p)/E_p, where E_p is the mean annual potential evapotranspiration on the basin;
IB, Budyko index: a radiational aridity index expressed as IB = R_n / (L A_m), where R_n is the mean annual net radiation and L is the latent vaporization heat; values assumed by IB are lower than 1 for humid regions and greater than 1 in arid regions.
meanmonthlyflows is the data.frame of the mean monthly streamflows [mm] as reported in the 'Pubblicazione n. 17'. It has 13 columns because the first one, cod, is the code of the station.

monthlyflows is the data.frame of the monthly streamflows [mm] with 4 columns:
cod, the code of the station;
anno, the year;
mese, the month;
dato, the value of the monthly flow [mm].
For information on the package and the Author, and for all the references, see nsRFA.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
split(x,cod)
sapply(split(x,cod),mean)
sapply(split(x,cod),median)
sapply(split(x,cod),quantile)
sapply(split(x,cod),Lmoments)

parameters
KAPPA provides the link between L-moments of a sample and the four-parameter kappa distribution.
f.kappa (x, xi, alfa, k, h)
F.kappa (x, xi, alfa, k, h)
invF.kappa (F, xi, alfa, k, h)
Lmom.kappa (xi, alfa, k, h)
par.kappa (lambda1, lambda2, tau3, tau4)
rand.kappa (numerosita, xi, alfa, k, h)
x | vector of quantiles
xi | vector of kappa location parameters
alfa | vector of kappa scale parameters
k | vector of kappa third parameters
h | vector of kappa fourth parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
tau4 | vector of L-kurtosis
numerosita | numeric value indicating the length of the vector to be generated
Definition

Parameters (4): \xi (location), \alpha (scale), k, h.

Range of x: the upper bound is \xi + \alpha/k if k > 0, \infty if k \le 0; the lower bound is \xi + \alpha (1 - h^{-k})/k if h > 0, \xi + \alpha/k if h \le 0 and k < 0, and -\infty if h \le 0 and k \ge 0.

Probability density function: f(x) = \alpha^{-1} [1 - k(x-\xi)/\alpha]^{1/k - 1} [F(x)]^{1-h}

Cumulative distribution function: F(x) = \{1 - h [1 - k(x-\xi)/\alpha]^{1/k}\}^{1/h}

Quantile function: x(F) = \xi + (\alpha/k) \{1 - [(1 - F^h)/h]^k\}

h = -1 is the generalized logistic distribution; h = 0 is the generalized extreme value distribution; h = 1 is the generalized Pareto distribution.

L-moments

L-moments are defined for h \ge 0 and k > -1, or if h < 0 and -1 < k < -1/h.

\lambda_1 = \xi + \alpha (1 - g_1)/k
\lambda_2 = \alpha (g_1 - g_2)/k
\tau_3 = (-g_1 + 3 g_2 - 2 g_3)/(g_1 - g_2)
\tau_4 = -(-g_1 + 6 g_2 - 10 g_3 + 5 g_4)/(g_1 - g_2)

where g_r = r \Gamma(1+k) \Gamma(r/h) / [h^{1+k} \Gamma(1 + k + r/h)] if h > 0, and g_r = r \Gamma(1+k) \Gamma(-k - r/h) / [(-h)^{1+k} \Gamma(1 - r/h)] if h < 0.
Here \Gamma denotes the gamma function \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} dt.

Parameters

There are no simple expressions for the parameters in terms of the L-moments. However they can be obtained with a numerical algorithm considering the formulations of \tau_3 and \tau_4 in terms of k and h. Here we use the function optim to minimize (t_3 - \tau_3)^2 + (t_4 - \tau_4)^2, where t_3 and t_4 are the sample L-moment ratios.
Lmom.kappa and par.kappa accept input as vectors of equal length. In f.kappa, F.kappa, invF.kappa and rand.kappa the parameters (xi, alfa, k, h) must be atomic.
f.kappa gives the density f, F.kappa gives the distribution function F, invF.kappa gives the quantile function x(F), Lmom.kappa gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.kappa gives the parameters (xi, alfa, k, h), and rand.kappa generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GENPAR, GEV, GUMBEL, LOGNORM, P3; optim, DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.kappa(ll[1],ll[2],ll[4],ll[5])
f.kappa(1800,parameters$xi,parameters$alfa,parameters$k,parameters$h)
F.kappa(1800,parameters$xi,parameters$alfa,parameters$k,parameters$h)
invF.kappa(0.771088,parameters$xi,parameters$alfa,parameters$k,parameters$h)
Lmom.kappa(parameters$xi,parameters$alfa,parameters$k,parameters$h)
rand.kappa(100,parameters$xi,parameters$alfa,parameters$k,parameters$h)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.kappa(Rll[1],Rll[2],Rll[4],Rll[5])
Lmom.kappa(parameters$xi,parameters$alfa,parameters$k,parameters$h)
Lmoments provides the estimate of L-moments of a sample or regional L-moments of a region.
Lmoments (x)
regionalLmoments (x, cod)
LCV (x)
LCA (x)
Lkur (x)
x | vector representing a data-sample (or data from many samples defined with cod)
cod | array that defines the data subdivision among sites
The estimation of L-moments is based on a sample of size n, arranged in ascending order. Let x_{1:n} \le x_{2:n} \le ... \le x_{n:n} be the ordered sample. An unbiased estimator of the probability weighted moments \beta_r is:

b_r = n^{-1} \sum_{j=r+1}^{n} \frac{(j-1)(j-2)...(j-r)}{(n-1)(n-2)...(n-r)} x_{j:n}

The sample L-moments are defined by:

l_1 = b_0
l_2 = 2 b_1 - b_0
l_3 = 6 b_2 - 6 b_1 + b_0
l_4 = 20 b_3 - 30 b_2 + 12 b_1 - b_0

and in general

l_{r+1} = \sum_{k=0}^{r} \frac{(-1)^{r-k} (r+k)!}{(k!)^2 (r-k)!} b_k

where r = 0, 1, ..., n-1.

The sample L-moment ratios are defined by t_r = l_r / l_2 and the sample L-CV by t = l_2 / l_1.

Sample regional L-CV, L-skewness and L-kurtosis coefficients are defined as

t^R = \sum_{i=1}^{k} n_i t^{(i)} / \sum_{i=1}^{k} n_i
t_3^R = \sum_{i=1}^{k} n_i t_3^{(i)} / \sum_{i=1}^{k} n_i
t_4^R = \sum_{i=1}^{k} n_i t_4^{(i)} / \sum_{i=1}^{k} n_i
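The unbiased PWM estimators translate directly into plain R (a sketch of what Lmoments computes; the helper name samlmom is illustrative, not a package function):

samlmom <- function(x) {
  x <- sort(x); n <- length(x); j <- 1:n
  b0 <- mean(x)
  b1 <- sum((j - 1)/(n - 1)*x)/n
  b2 <- sum((j - 1)*(j - 2)/((n - 1)*(n - 2))*x)/n
  b3 <- sum((j - 1)*(j - 2)*(j - 3)/((n - 1)*(n - 2)*(n - 3))*x)/n
  l2 <- 2*b1 - b0; l3 <- 6*b2 - 6*b1 + b0; l4 <- 20*b3 - 30*b2 + 12*b1 - b0
  c(l1=b0, l2=l2, lcv=l2/b0, lca=l3/l2, lkur=l4/l2)
}
samlmom(rnorm(100, 10, 2))    # compare with Lmoments(...)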
Lmoments gives the L-moments (l_1, l_2, t, t_3, t_4), regionalLmoments gives the regional weighted L-moments (l_1^R, l_2^R, t^R, t_3^R, t_4^R), LCV gives the coefficient of L-variation, LCA gives the L-skewness and Lkur gives the L-kurtosis of x.
For information on the package and the Author, and for all the references, see nsRFA.
x <- rnorm(30,10,2)
Lmoments(x)

data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
split(x,cod)
camp <- split(x,cod)$"45"
Lmoments(camp)
sapply(split(x,cod),Lmoments)

regionalLmoments(x,cod)
LOGNORM provides the link between L-moments of a sample and the three-parameter log-normal distribution.
f.lognorm (x, xi, alfa, k)
F.lognorm (x, xi, alfa, k)
invF.lognorm (F, xi, alfa, k)
Lmom.lognorm (xi, alfa, k)
par.lognorm (lambda1, lambda2, tau3)
rand.lognorm (numerosita, xi, alfa, k)
x | vector of quantiles
xi | vector of lognorm location parameters
alfa | vector of lognorm scale parameters
k | vector of lognorm shape parameters
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
numerosita | numeric value indicating the length of the vector to be generated
See https://en.wikipedia.org/wiki/Log-normal_distribution for an introduction to the lognormal distribution.
Definition

Parameters (3): \xi (location), \alpha (scale), k (shape).

Range of x: -\infty < x \le \xi + \alpha/k if k > 0; -\infty < x < \infty if k = 0; \xi + \alpha/k \le x < \infty if k < 0.

Probability density function: f(x) = \frac{e^{k y - y^2/2}}{\alpha \sqrt{2\pi}}
where y = -k^{-1} \log\{1 - k(x-\xi)/\alpha\} if k \ne 0, y = (x-\xi)/\alpha if k = 0.

Cumulative distribution function: F(x) = \Phi(y)
where \Phi is the cumulative distribution function of the standard normal distribution.

Quantile function: x(F) has no explicit analytical form.

k = 0 is the Normal distribution with parameters \xi and \alpha.

L-moments

L-moments are defined for all values of k.

\lambda_1 = \xi + \alpha (1 - e^{k^2/2})/k
\lambda_2 = (\alpha/k) e^{k^2/2} [1 - 2 \Phi(-k/\sqrt{2})]

There are no simple expressions for the L-moment ratios \tau_r with r \ge 3. Here we use the rational-function approximation given in Hosking and Wallis (1997, p. 199).

Parameters

The shape parameter k is a function of \tau_3 alone. No explicit solution is possible. Here we use the approximation given in Hosking and Wallis (1997, p. 199).
Given k, the other parameters are given by

\alpha = \frac{\lambda_2 k e^{-k^2/2}}{1 - 2 \Phi(-k/\sqrt{2})}
\xi = \lambda_1 - \frac{\alpha}{k} (1 - e^{k^2/2})
Lmom.lognorm and par.lognorm accept input as vectors of equal length. In f.lognorm, F.lognorm, invF.lognorm and rand.lognorm the parameters (xi, alfa, k) must be atomic.
f.lognorm gives the density f, F.lognorm gives the distribution function F, invF.lognorm gives the quantile function x(F), Lmom.lognorm gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.lognorm gives the parameters (xi, alfa, k), and rand.lognorm generates random deviates.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GENPAR, GEV, GUMBEL, KAPPA, P3; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.lognorm(ll[1],ll[2],ll[4])
f.lognorm(1800,parameters$xi,parameters$alfa,parameters$k)
F.lognorm(1800,parameters$xi,parameters$alfa,parameters$k)
invF.lognorm(0.7529877,parameters$xi,parameters$alfa,parameters$k)
Lmom.lognorm(parameters$xi,parameters$alfa,parameters$k)
rand.lognorm(100,parameters$xi,parameters$alfa,parameters$k)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.lognorm(Rll[1],Rll[2],Rll[4])
Lmom.lognorm(parameters$xi,parameters$alfa,parameters$k)
Maximum Likelihood estimation of parameters for extreme-value distributions, from Laio (2004).
ML_estimation (x, dist="NORM")
moment_estimation (x, dist="NORM")
x | data sample
dist | distribution: normal
ML_estimation estimates the parameters of the distribution dist from a sample x using the maximum likelihood approach.

moment_estimation estimates the parameters of the distribution dist from a sample x using the moment method.
For information on the package and the Author, and for all the references, see nsRFA.
# sample from an EV1 distribution
sm <- rand.gumb(100, 0, 1)

moment_estimation (sm, dist="GEV")
ML_estimation (sm, dist="GEV")
F.GEV(sm, -0.051, 0.97, -0.024)
rand.GEV (100, -0.051, 0.97, -0.024)

moment_estimation (sm, dist="P3")
ML_estimation (sm, dist="P3")
moments provides the estimate of the first 4 moment-statistics of a sample.
moments (x)
CV (x)
skew (x)
kurt (x)
x | vector representing a data-sample
Skewness and kurtosis are defined as the third and fourth standardized sample moments:

skew = \frac{n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^3}{[n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2]^{3/2}}

kurt = \frac{n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^4}{[n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2]^{2}}

where n is the size of x.
See https://en.wikipedia.org/wiki/Skewness and https://en.wikipedia.org/wiki/Kurtosis for additional information.
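A plain-R sketch of these definitions (the package versions may apply small-sample corrections, so treat this only as the definition):

mom_stats <- function(x) {
  n <- length(x); m <- mean(x)
  s2 <- mean((x - m)^2)                # second central moment
  c(CV = sqrt(s2)/m,
    skew = mean((x - m)^3)/s2^(3/2),   # third standardized moment
    kurt = mean((x - m)^4)/s2^2)       # fourth standardized moment
}
mom_stats(rnorm(100, 10, 2))           # compare with moments(...)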
For information on the package and the Author, and for all the references, see nsRFA.
x <- rnorm(30,10,2)
moments(x)

data(hydroSIMN)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
sapply(split(x,cod),moments)
Model selection criteria for the frequency analysis of hydrological extremes, from Laio et al. (2008).
MSClaio2008 (sample, dist=c("NORM","LN","GUMBEL","EV2","GEV","P3","LP3"),
             crit=c("AIC", "AICc", "BIC", "ADC"))
## S3 method for class 'MSClaio2008'
print(x, digits=max(3, getOption("digits") - 3), ...)
## S3 method for class 'MSClaio2008'
summary(object, ...)
## S3 method for class 'MSClaio2008'
plot(x, ...)
sample | data sample
dist | distributions: normal
crit | model-selection criteria: Akaike Information Criterion
x | object of class MSClaio2008
object | object of class MSClaio2008
digits | minimal number of "significant" digits, see 'print.default'
... | other arguments
The following lines are extracted from Laio et al. (2008). See the paper for more details and references.
Model selection criteria

The problem of model selection can be formalized as follows: a sample of n data, arranged in ascending order, is available, sampled from an unknown parent distribution f(x); N_m operating models, M_j, j = 1, ..., N_m, are used to represent the data. The operating models are in the form of probability distributions, M_j: p_j(x|\hat{\theta}), with parameters \hat{\theta} estimated from the available data sample. The scope of model selection is to identify the model M_{opt} which is better suited to represent the data, i.e. the model which is closer in some sense to the parent distribution f(x).
Three different model selection criteria are considered here, namely, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Anderson-Darling Criterion (ADC). Of the three methods, the first two belong to the category of classical literature approaches, while the third derives from a heuristic interpretation of the results of a standard goodness-of-fit test (see Laio, 2004).
Akaike Information Criterion

The Akaike Information Criterion (AIC) for the j-th operational model can be computed as

AIC_j = -2 \log(L_j(\hat{\theta})) + 2 p_j

where L_j(\hat{\theta}) is the likelihood function, evaluated at the point \theta = \hat{\theta} corresponding to the maximum likelihood estimator of the parameter vector \theta, and p_j is the number of estimated parameters of the j-th operational model. In practice, after the computation of the AIC_j, for all of the operating models, one selects the model with the minimum AIC value, AIC_{min}.

When the sample size n is small with respect to the number of estimated parameters p, the AIC may perform inadequately. In those cases a second-order variant of AIC, called AICc, should be used:

AICc_j = -2 \log(L_j(\hat{\theta})) + 2 p_j \frac{n}{n - p_j - 1}

Indicatively, AICc should be used when n/p < 40.
Bayesian Information Criterion

The Bayesian Information Criterion (BIC) for the j-th operational model reads

BIC_j = -2 \log(L_j(\hat{\theta})) + \log(n) \, p_j

In practical application, after the computation of the BIC_j, for all of the operating models, one selects the model with the minimum BIC value, BIC_{min}.
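The two classical criteria (and the AICc correction) reduce to simple arithmetic once the maximized log-likelihood is available; a minimal sketch in plain R (the helper name ic is illustrative, not part of the package):

ic <- function(logL, n, p) {
  c(AIC = -2*logL + 2*p,
    AICc = -2*logL + 2*p*n/(n - p - 1),    # second-order correction
    BIC = -2*logL + log(n)*p)
}
# e.g. for a normal model fitted by maximum likelihood:
x <- rnorm(30, 10, 2)
logL <- sum(dnorm(x, mean(x), sqrt(mean((x - mean(x))^2)), log=TRUE))
ic(logL, n=length(x), p=2)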
Anderson-Darling Criterion

The Anderson-Darling Criterion (ADC) is based on a discrepancy measure characterizing the criterion, the Anderson-Darling statistic A2 in GOFlaio2004, transformed with distribution-dependent coefficients that are tabled by Laio [2004, Tables 3 and 5] for a set of seven distributions commonly employed for the frequency analysis of extreme events. In practice, after the computation of the ADC_j, for all of the operating models, one selects the model with the minimum ADC value, ADC_{min}.
MSClaio2008 returns the value of the criteria crit (see Details) applied to the sample, for every distribution dist.

plot.MSClaio2008 plots the empirical distribution function of sample (Weibull plotting position) on a log-normal probability plot, plots the candidate distributions dist (whose parameters are evaluated with the maximum likelihood technique, see MLlaio2004), and highlights the ones chosen by the criteria crit.
For information on the package and the Author, and for all the references, see nsRFA.
data(FEH1000)

sitedata <- am[am[,1]==53004, ]    # data of site 53004
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)

sitedata <- am[am[,1]==69023, ]    # data of site 69023
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4], crit=c("AIC", "ADC"))
MSC
summary(MSC)
plot(MSC)

sitedata <- am[am[,1]==83802, ]    # data of site 83802
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4], dist=c("GEV", "P3", "LP3"))
MSC
summary(MSC)
plot(MSC)

# short sample, high positive L-CA
sitedata <- am[am[,1]==40012, ]    # data of site 40012
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)

# negative L-CA
sitedata <- am[am[,1]==68002, ]    # data of site 68002
serieplot(sitedata[,4], sitedata[,3])
MSC <- MSClaio2008(sitedata[,4])
MSC
summary(MSC)
plot(MSC)
P3 provides the link between L-moments of a sample and the three-parameter Pearson type III distribution.
f.gamma (x, xi, beta, alfa)
F.gamma (x, xi, beta, alfa)
invF.gamma (F, xi, beta, alfa)
Lmom.gamma (xi, beta, alfa)
par.gamma (lambda1, lambda2, tau3)
rand.gamma (numerosita, xi, beta, alfa)
mom2par.gamma (mu, sigma, gamm)
par2mom.gamma (alfa, beta, xi)
x | vector of quantiles
mu | vector of gamma mean
sigma | vector of gamma standard deviation
gamm | vector of gamma third moment
F | vector of probabilities
lambda1 | vector of sample means
lambda2 | vector of L-variances
tau3 | vector of L-CA (or L-skewness)
numerosita | numeric value indicating the length of the vector to be generated
alfa | vector of gamma shape parameters
beta | vector of gamma scale parameters
xi | vector of gamma location parameters
See https://en.wikipedia.org/wiki/Pearson_distribution for an introduction to the Pearson distribution, and https://en.wikipedia.org/wiki/Gamma_distribution for an introduction to the Gamma distribution (the Pearson type III distribution is, essentially, a Gamma distribution with 3 parameters).
Definition

Parameters (3): \xi (location), \beta (scale), \alpha (shape).
Moments (3): \mu (mean), \sigma (standard deviation), \gamma (skewness).

If \gamma \ne 0, let \alpha = 4/\gamma^2, \beta = (1/2) \sigma |\gamma|, and \xi = \mu - 2\sigma/\gamma.

If \gamma > 0, then the range of x is \xi \le x < \infty and

f(x) = \frac{(x-\xi)^{\alpha-1} e^{-(x-\xi)/\beta}}{\beta^{\alpha} \Gamma(\alpha)}
F(x) = G(\alpha, (x-\xi)/\beta) / \Gamma(\alpha)

If \gamma = 0, then the distribution is Normal, the range of x is -\infty < x < \infty and

f(x) = \phi((x-\mu)/\sigma)
F(x) = \Phi((x-\mu)/\sigma)

where \phi and \Phi are the probability density and the cumulative distribution function of the standard normal distribution.

If \gamma < 0, then the range of x is -\infty < x \le \xi and

f(x) = \frac{(\xi-x)^{\alpha-1} e^{-(\xi-x)/\beta}}{\beta^{\alpha} \Gamma(\alpha)}
F(x) = 1 - G(\alpha, (\xi-x)/\beta) / \Gamma(\alpha)

In each case, x(F) has no explicit analytical form. Here \Gamma is the gamma function, defined as \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} dt, and G(\alpha, x) = \int_0^x t^{\alpha-1} e^{-t} dt is the incomplete gamma function.

\gamma = 2 is the exponential distribution; \gamma = 0 is the Normal distribution; \gamma = -2 is the reverse exponential distribution.

The parameters \mu, \sigma and \gamma are the conventional moments of the distribution.

L-moments

Assuming \gamma > 0, L-moments are defined for 0 < \alpha < \infty.

\lambda_1 = \xi + \alpha \beta
\lambda_2 = \pi^{-1/2} \beta \, \Gamma(\alpha + 1/2) / \Gamma(\alpha)
\tau_3 = 6 I_{1/3}(\alpha, 2\alpha) - 3

where I_x(p, q) is the incomplete beta function ratio

I_x(p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)} \int_0^x t^{p-1} (1-t)^{q-1} dt

There is no simple expression for \tau_4. Here we use the rational-function approximation given by Hosking and Wallis (1997, pp. 201-202).
The corresponding results for \gamma < 0 are obtained by changing the signs of \lambda_1, \tau_3 and \xi wherever they occur above.

Parameters

\alpha is obtained with an approximation. If 0 < |\tau_3| < 1/3, let z = 3 \pi \tau_3^2 and use

\alpha \approx \frac{1 + 0.2906 z}{z + 0.1882 z^2 + 0.0442 z^3}

if 1/3 \le |\tau_3| < 1, let z = 1 - |\tau_3| and use

\alpha \approx \frac{0.36067 z - 0.59567 z^2 + 0.25361 z^3}{1 - 2.78861 z + 2.56096 z^2 - 0.77045 z^3}

Given \alpha, then \gamma = 2 \alpha^{-1/2} \, \mathrm{sign}(\tau_3), \sigma = \lambda_2 \pi^{1/2} \alpha^{1/2} \Gamma(\alpha) / \Gamma(\alpha + 1/2), and \mu = \lambda_1.
Lmom.gamma and par.gamma accept input as vectors of equal length. In f.gamma, F.gamma, invF.gamma and rand.gamma the parameters (mu, sigma, gamm) must be atomic.
f.gamma gives the density f, F.gamma gives the distribution function F, invF.gamma gives the quantile function x(F), Lmom.gamma gives the L-moments (\lambda_1, \lambda_2, \tau_3, \tau_4), par.gamma gives the parameters (mu, sigma, gamm), and rand.gamma generates random deviates.

mom2par.gamma returns the parameters (alfa, beta, xi), given the moments (mu, sigma, gamm); par2mom.gamma performs the inverse transformation.
For information on the package and the Author, and for all the references, see nsRFA.
rnorm, runif, EXP, GENLOGIS, GENPAR, GEV, GUMBEL, KAPPA, LOGNORM; DISTPLOTS, GOFmontecarlo, Lmoments.
data(hydroSIMN)
annualflows
summary(annualflows)
x <- annualflows["dato"][,]
fac <- factor(annualflows["cod"][,])
split(x,fac)

camp <- split(x,fac)$"45"
ll <- Lmoments(camp)
parameters <- par.gamma(ll[1],ll[2],ll[4])
f.gamma(1800,parameters$xi,parameters$beta,parameters$alfa)
F.gamma(1800,parameters$xi,parameters$beta,parameters$alfa)
invF.gamma(0.7511627,parameters$xi,parameters$beta,parameters$alfa)
Lmom.gamma(parameters$xi,parameters$beta,parameters$alfa)
rand.gamma(100,parameters$xi,parameters$beta,parameters$alfa)

Rll <- regionalLmoments(x,fac); Rll
parameters <- par.gamma(Rll[1],Rll[2],Rll[4])
Lmom.gamma(parameters$xi,parameters$beta,parameters$alfa)

moments <- par2mom.gamma(parameters$alfa,parameters$beta,parameters$xi); moments
mom2par.gamma(moments$mu,moments$sigma,moments$gamm)
Diagnostics of the output of lm, which is used to fit linear models.
R2.lm (x)
prt.lm (x)
mantel.lm (x, Nperm = 1000)
vif.lm (x)
RMSE.lm (x)
MAE.lm (x)
predinterval.lm (x, level = 0.95)
jackknife1.lm (x)
RMSEjk.lm (x)
MAEjk.lm (x)
x | object of class "lm" (output of 'lm')
Nperm | number of permutations
level | significance level
mantel.lm is performed under the assumption that the dependent distance matrix is variable, while the independent distance matrices are fixed and measured without error (they are not related to random variables, see Smouse et al., 1986). Under this assumption, the significance of the regression between distance matrices can be evaluated by simultaneously permuting the rows and corresponding columns of only the dependent distance matrix, while the others are held constant.
R2.lm returns the coefficient of determination R^2 and the adjusted coefficient of determination R^2_{adj} of the regression.

prt.lm returns the probability Pr(>|t|) of the significance test (Student t) of the independent variables. If the value is 0.06 for a regressor, its coefficient is not significantly different from 0 for a test with significance level of 5%.

mantel.lm returns the probability P of the Mantel test on every variable conditioned to the others. It substitutes prt.lm when dealing with distance matrices. If P is, for example, 0.92, the variable should be considered significant with significance levels greater than 8%.

vif.lm returns the variance inflation factors (VIF) of the independent variables of the regression. If VIF > 5 (or 10) there is a problem of multicollinearity.
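What vif.lm computes can be sketched by hand: regress each predictor on the others and transform the resulting R^2 (the helper name vif_by_hand is illustrative, not a package function):

vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response=v), data=data))$r.squared
    1/(1 - r2)    # VIF = 1/(1 - R^2)
  })
}
# with data(hydroSIMN) loaded:
# vif_by_hand(parameters, c("Am", "Hm", "Ybar"))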
RMSE.lm returns the root mean squared error of the regression.
MAE.lm returns the mean absolute error of the regression.
predinterval.lm returns the prediction intervals, at the specified level, in correspondence to the fitted data.
jackknife1.lm returns predicted values by a jackknife (cross-validation) procedure.
The procedure (remove one observation, fit the model, estimate at the removed point) is repeated for all the points.
RMSEjk.lm returns the root mean squared error of the cross-validation (performed with jackknife1.lm).
MAEjk.lm returns the mean absolute error of the cross-validation (performed with jackknife1.lm).
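A minimal sketch of this leave-one-out procedure, assuming only a fitted lm object on plain (untransformed) variables; jackknife1.lm should behave analogously:

# Leave-one-out predictions for a fitted linear model
loo_predict <- function(model) {
  dat <- model.frame(model)
  sapply(seq_len(nrow(dat)), function(i) {
    fit.i <- lm(formula(model), data = dat[-i, ])      # refit without point i
    predict(fit.i, newdata = dat[i, , drop = FALSE])   # estimate at the removed point
  })
}
# RMSEjk.lm and MAEjk.lm then correspond to
#   sqrt(mean((y - loo_predict(fit))^2))  and  mean(abs(y - loo_predict(fit)))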
For information on the package and the Author, and for all the references, see nsRFA.
data(hydroSIMN)
D <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
#Dm <- tapply(D,cod,mean)
#datregr <- cbind(Dm,parameters)
datregr <- parameters
regr0 <- lm(Dm ~ ., datregr); summary(regr0)
regr1 <- lm(Dm ~ Am + Hm + Ybar, datregr); summary(regr1)

R2.lm(regr0)
R2.lm(regr1)
prt.lm(regr0)
prt.lm(regr1)
vif.lm(regr0)
vif.lm(regr1)
RMSE.lm(regr0)
RMSE.lm(regr1)
MAE.lm(regr0)
MAE.lm(regr1)
predinterval.lm(regr0)
jackknife1.lm(regr0)
jackknife1.lm(regr1)
RMSEjk.lm(regr0)
RMSEjk.lm(regr1)
MAEjk.lm(regr0)
MAEjk.lm(regr1)

# Mantel test on distance matrices
#Y <- AD.dist(D,cod)    # it takes some time
#X <- data.frame(apply(datregr[,c("Hm","Ybar")],2,dist))
#dati <- cbind(X)
#modello <- lm(Y ~ Hm + Ybar, dati)
#mantel.lm(modello, Nperm=100)
Formation of clusters for Regional Frequency Analysis: region of influence (Burn, 1990).
roi (p.ungauged, p.gauged, cod.p, x=NULL, cod=NULL)
roi.hom (p.ungauged, p.gauged, cod.p, x, cod, test="HW", limit=2, Nsim=500, index=2)
roi.st.year (p.ungauged, p.gauged, cod.p, x, cod, test="HW", station.year=500, Nsim=500, index=2)
x | vector representing data from many samples defined with cod |
cod | array that defines the data subdivision among sites |
index | if index=1 samples are divided by their average value, if index=2 (default) by their median value |
p.ungauged | parameters of the ungauged site (1 row) |
p.gauged | parameters of the gauged sites |
cod.p | codes of the gauged sites |
test | homogeneity test to apply: "HW" (default) or "AD" |
limit | limit over which regions must be considered heterogeneous: for example 2 for the "HW" test |
Nsim | number of simulations in the "HW" and "AD" tests |
station.year | number of station-years used to form the region |
The Euclidean distance is used.
Given $p$ different classification variables, the distance between two elements $i$ and $j$ is

$$d_{ij} = \sqrt{\sum_{h=1}^{p} \left( x_{hi} - x_{hj} \right)^2}$$

where $x_{hi}$ is the value of the $h$-th variable of the $i$-th element.
roi returns the ‘region of influence’ for the site defined with p.ungauged, i.e. the gauged sites ordered according to their Euclidean distance from the site of interest (the distance is evaluated in the space defined by the parameters p.ungauged and p.gauged).
If x=NULL and cod=NULL (default), a data.frame with the ordered sites and their distances from the site of interest is returned.
If x and cod are provided, the data.frame will also contain statistics of the samples (number of data n and L-moments).
roi.hom returns the ‘region of influence’ for the site defined with p.ungauged: the codes of the gauged sites that form a homogeneous region according to the Hosking and Wallis "HW" or Anderson-Darling "AD" tests.
The region is formed using distances in the space defined by the parameters p.ungauged and p.gauged.
roi.st.year returns the ‘region of influence’ for the site defined with p.ungauged: the codes of the gauged sites that form a region, and the result of the homogeneity tests, according to the station-year criterion.
It also returns the similarity ranking factors, the weights and the regional L-moments as evaluated in the Flood Estimation Handbook (Robson and Reed, 1999).
The region is formed using distances in the space defined by the parameters p.ungauged and p.gauged.
For information on the package and the Author, and for all the references, see nsRFA.
See also: traceWminim, AD.dist, and HOMTESTS for the definition of the Hosking and Wallis "HW" and Anderson-Darling "AD" tests.
data(hydroSIMN)
parameters
summary(parameters)

annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

roi(parameters[5,3:5],parameters[-5,3:5],parameters[-5,1])
roi(parameters[5,3:5],parameters[-5,3:5],parameters[-5,1],x,cod)

# roi.hom
#roi.hom(parameters[5,3:5],parameters[-5,3:5],parameters[-5,1],x,cod)  # it takes some time
#roi.hom(parameters[5,3:5],parameters[-5,3:5],parameters[-5,1],x,cod,
#        test="AD",limit=.95)     # it takes some time
#roi.hom(parameters[8,3:5],parameters[-8,3:5],
#        parameters[-8,1],x,cod)  # it takes some time

# roi.st.year
roi.st.year(parameters[5,3:5],parameters[-5,3:5],
            parameters[-5,1],x,cod)
roi.st.year(parameters[5,3:5],parameters[-5,3:5],parameters[-5,1],
            x,cod,test="AD",station.year=100)
Plots for time-series investigation.
serieplot (x, t, lim.x=c(min(x),max(x)), lim.t=c(min(t),max(t)), ...)
consistencyplot (t, cod, cex.axis=.8, mark.code=TRUE, ...)
x | data sample |
t | vector representing time (e.g. years) of data-samples defined with cod |
lim.x, lim.t | plot limits |
cod | array that defines the data subdivision among sites |
cex.axis | dimensions of points and labels |
mark.code | if TRUE (default) codes of samples are plotted on the y axis |
... | graphical parameters, as in plot |
consistencyplot displays time-series consistency, i.e. the years for which data are available at each site.
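A rough base-R analogue of such a display (an assumption about what the plot shows, not the package code):

# One row per site, one point for each year with data
data(hydroSIMN)
y   <- annualflows["anno"][,]
cod <- annualflows["cod"][,]
plot(y, as.numeric(factor(cod)), pch = 20,
     xlab = "year", ylab = "site",
     main = "years with data at each site")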
For information on the package and the Author, and for all the references, see nsRFA.
data(hydroSIMN)
annualflows[c(1:10),]
x <- annualflows["dato"][,]
y <- annualflows["anno"][,]
cod <- annualflows["cod"][,]

consistencyplot(y,cod)

for (i in unique(cod)) {
  serieplot(x[cod==i], y[cod==i], c(0,max(x)), c(min(y),max(y)),
            xlab="", ylab="D [mm]", main=i)
  readline()
}
Plots from books and articles.
Lmoment.ratio.diagram (grid=TRUE, ...)
Lspace.HWvsAD (grid=TRUE, ...)
Lspace.limits (grid=TRUE, ...)
grid | should a grid be plotted? |
... | other arguments |
Lmoment.ratio.diagram plots the points corresponding to two-parameter distributions and the lines corresponding to three-parameter distributions on the 'L-CA - L-kur' plane.
The distributions are:
E = exponential,
G = Gumbel,
L = logistic,
N = normal,
U = uniform,
GLO = generalized logistic,
GEV = generalized extreme-value,
GPA = generalized Pareto,
LN3 = lognormal,
PE3 = Pearson type III.
Lspace.HWvsAD separates the regions of the 'L-CA - L-CV' space where the homogeneity tests of Hosking and Wallis (HW) and of Anderson-Darling (AD) are preferable.
Lspace.limits displays the limits for the regional L-moments in the 'L-CA - L-CV' space.
For information on the package and the Author, and for all the references, see nsRFA.
See also: EXP, GENLOGIS, GENPAR, LOGNORM, GUMBEL, GEV, P3.
Lmoment.ratio.diagram()
Lspace.HWvsAD()
Lspace.limits()

data(hydroSIMN)
annualflows[c(1:10),]
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
rlm <- regionalLmoments(x,cod)

Lmoment.ratio.diagram()
points(rlm["lcaR"],rlm["lkurR"],col="red",pch=19)

Lspace.HWvsAD()
points(rlm["lcaR"],rlm["lcvR"],col="red",pch=19)
Formation of disjoint regions for Regional Frequency Analysis.
traceWminim (X, centers)
sumtraceW (clusters, X)
nearest (clusters, X)
X | a numeric matrix of characteristics, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns) |
centers | the number of clusters |
clusters | a numeric vector containing the subdivision of X into clusters |
The Euclidean distance is used.
Given $p$ different classification variables, the distance between two elements $i$ and $j$ is

$$d_{ij} = \sqrt{\sum_{h=1}^{p} \left( x_{hi} - x_{hj} \right)^2}$$

where $x_{hi}$ is the value of the $h$-th variable of the $i$-th element.
The function traceWminim is a composition of a hierarchical algorithm, Ward's (1963), and an optimisation procedure consisting in the minimisation of

$$W = \sum_{i=1}^{k} \left( \sum_{j=1}^{n_i} \delta_{ij}^2 \right)$$

where $k$ is the number of clusters (obtained initially with Ward's algorithm), $n_i$ is the number of sites in the $i$-th cluster and $\delta_{ij}$ is the Euclidean distance between the $j$-th element of the $i$-th group and the center of mass of the $i$-th cluster.
$W$ is calculated with sumtraceW.
The algorithm consists in moving a site from one cluster to another if this makes $W$ decrease.
traceWminim gives a vector defining the subdivision of the elements characterized by X in n=centers clusters.
sumtraceW gives $W$ (it is used by traceWminim).
nearest gives the site nearest to the center of mass of each cluster (it is used by traceWminim).
For information on the package and the Author, and for all the references, see nsRFA.
data(hydroSIMN)
parameters
summary(parameters)

# traceWminim
param <- parameters[c("Hm","Ybar")]
n <- dim(param)[1]; k <- dim(param)[2]
param.norm <- (param - matrix(apply(param,2,mean),nrow=n,ncol=k,
                byrow=TRUE)) / matrix(apply(param,2,sd),
                nrow=n,ncol=k,byrow=TRUE)
clusters <- traceWminim(param.norm,4)
names(clusters) <- parameters["cod"][,]
clusters

annualflows
summary(annualflows)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]

fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==1]))
x1 <- annualflows[!is.na(fac),"dato"]
cod1 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x1,cod1)    # it takes some time

fac <- factor(annualflows["cod"][,],
              levels=names(clusters[clusters==3]))
x3 <- annualflows[!is.na(fac),"dato"]
cod3 <- annualflows[!is.na(fac),"cod"]
#HW.tests(x3,cod3)    # it takes some time
varLmoments provides distribution-free unbiased estimators of the variances and covariances of sample L-moments.
varLmoments (x, matrix=TRUE)
varLCV (x)
varLCA (x)
varLkur (x)
x | vector representing a data-sample |
matrix | if TRUE (default) the matrix of variances and covariances of the sample L-moments is returned, if FALSE a vector of estimates |
The estimation of the exact variance structure of sample L-moments is based on Elamir and Seheult (2004).
varLmoments gives the matrix of unbiased estimates of the variance structure of sample L-moments:
this is a 4x4 matrix containing $\mathrm{var}(l_1)$, $\mathrm{var}(l_2)$, $\mathrm{var}(l_3)$ and $\mathrm{var}(l_4)$ on the main diagonal,
and the corresponding covariances elsewhere ($\mathrm{cov}(l_1,l_2)$, $\mathrm{cov}(l_1,l_3)$, etc.);
varLCV gives the unbiased estimate of the variance of the sample coefficient of L-variation of x;
varLCA gives the unbiased estimate of the variance of the sample L-skewness of x;
varLkur gives the unbiased estimate of the variance of the sample L-kurtosis of x.
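For intuition, the variance of the L-CV $t = l_2/l_1$ can be approximated from the varLmoments output with the delta method (an approximation shown for illustration; varLCV's exact computation may differ):

x <- rnorm(50, 10, 2)
V <- varLmoments(x)           # 4x4 covariance matrix of l1..l4
l <- Lmoments(x)
l1 <- l[1]; l2 <- l[2]
# delta method for the ratio t = l2/l1
var.lcv.approx <- V[2,2]/l1^2 - 2*l2*V[1,2]/l1^3 + l2^2*V[1,1]/l1^4
var.lcv.approx
varLCV(x)                     # compare with the package value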
For information on the package and the Author, and for all the references, see nsRFA.
x <- rnorm(30,10,2)
varLmoments(x)
varLmoments(x, FALSE)
varLCV(x)
varLCA(x)
varLkur(x)

data(hydroSIMN)
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
dvarLmom <- function(x) {diag(varLmoments(x))}
sapply(split(x,cod),dvarLmom)