Title: | Regional Frequency Analysis using L-Moments |
---|---|
Description: | Functions for regional frequency analysis using the methods of J. R. M. Hosking and J. R. Wallis (1997), "Regional frequency analysis: an approach based on L-moments". |
Authors: | J. R. M. Hosking [aut, cre] |
Maintainer: | J. R. M. Hosking <[email protected]> |
License: | Common Public License Version 1.0 |
Version: | 3.8 |
Built: | 2024-11-03 06:43:25 UTC |
Source: | CRAN |
R functions for regional frequency analysis using -moments.
This package implements methods described in the book
“Regional frequency analysis: an approach based on -moments”
by J. R. M. Hosking and J. R. Wallis.
It is a supplement to the lmom package,
which implements
-moment methods for more
general statistical applications.
The following functions are contained in this package.
cluagg
performs agglomerative hierarchical clustering.
cluinf
provides information about cluster membership in a
hierarchical clustering.
clukm
performs cluster analysis via the K-means algorithm.
as.regdata
creates an object of class "regdata"
, which contains a
“regional data set” consisting of summary statistics
for different data samples, one of the main building blocks
of regional frequency analysis.
regsamlmu
computes the sample -moments of multiple data sets.
regavlmom
and reglmr
both compute, with slightly different interfaces, a regional
weighted average of sample -moments from multiple sites.
Function
regavlmom
is recommended for general use;
reglmr
is deprecated.
regtst
computes discordancy, heterogeneity and goodness-of-fit measures
for regional frequency analysis. These statistics are as described
in Hosking and Wallis (1997, chaps. 3-5).
regfit
fits a frequency distribution to a regional data set, giving a
“regional frequency distribution”.
regqfunc
and siteqfunc
return the regional growth curve and the quantile functions for individual
sites, respectively, from a regional frequency distribution fitted by
regfit
.
regquant
and sitequant
directly compute quantiles of the regional growth curve and of
distributions for individual sites, respectively,
from a regional frequency distribution fitted by regfit
.
regsimh
runs Monte Carlo simulations to estimate the distribution of
heterogeneity and goodness-of-fit measures for an artificial region.
regsimq
runs Monte Carlo simulations to estimate the variability of
quantile estimates from a regional frequency distribution.
regquantbounds
and sitequantbounds
compute error bounds for the regional growth curve and for quantiles
at individual sites, respectively, from a regional frequency distribution
fitted by regfit
.
Functions cluagg
, cluinf
, clukm
, reglmr
,
and regtst
are analogous to Fortran routines from the
LMOMENTS package, version 3.04, available from StatLib at
https://lib.stat.cmu.edu/general/lmoments.
In addition, functions regsimh
and regsimq
provide similar functionality to PROGRAM XSIM
in the LMOMENTS Fortran package.
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
Site characteristics and sample -moments of annual maximum streamflow
for 104 gaging stations in Appalachia.
Appalach
Appalach
A data frame with 104 observations on the following 11 variables:
siteid
Character vector: each site's Hydrologic Unit Code, a unique identifier.
lat
Numeric vector: gage latitude, in degrees.
long
Numeric vector: gage longitude, in degrees west of the Greenwich Meridian.
area
Numeric vector: drainage basin area, in square miles.
elev
Numeric vector: gage elevation, in feet.
n
Numeric vector: record length.
mean
Numeric vector: sample mean.
t
Numeric vector: sample -CV.
t_3
Numeric vector: sample -skewness.
t_4
Numeric vector: sample -kurtosis.
t_5
Numeric vector: sample -moment ratio
.
The data in columns lat
, long
, area
and elev
,
and the streamflow data used to compute the sample -moments,
were obtained from “Hydrodata” CD-ROMs (Hydrosphere, 1993),
which reproduce data from the U.S. Geological Survey's WATSTORE data files.
The file appalach.dat
in the LMOMENTS Fortran package (Hosking, 1996).
Hydrosphere (1993). Hydrodata CD-ROMs, vol. 4.0: USGS peak values. Hydrosphere Data Products, Boulder, Colo.
Hosking, J. R. M. (1996).
Fortran routines for use with the method of -moments, Version 3.
Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
Appalach
Appalach
-moments of annual precipitation totals for the
“North Cascades” region of Plantico et al. (1990).
Cascades
Cascades
An object of class regdata
.
It is a data frame with 19 observations on the following 7 variables:
name
Character vector: site identifier.
n
Numeric vector: record length.
mean
Numeric vector: sample mean.
t
Numeric vector: sample -CV.
t_3
Numeric vector: sample -skewness.
t_4
Numeric vector: sample -kurtosis.
t_5
Numeric vector: sample -moment ratio
.
The data are summary statistics of annual precipitation totals at 19 sites in the northwest U.S., the “North Cascades” region of Plantico et al. (1990). The precipitation data were obtained from the Historical Climatology Network (Karl et al., 1990).
The file cascades.dat
in the LMOMENTS Fortran package (Hosking, 1996).
Hosking, J. R. M. (1996).
Fortran routines for use with the method of -moments, Version 3.
Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
Karl, T. R., Williams, C. N., Quinlan, F. T., and Boden, T. A. (1990). United States historical climatology network (HCN) serial temperature and precipitation data. ORNL/CDIAC-30 NDP-019/R1, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, Tenn.
Plantico, M. S., Karl, T. R., Kukla, G., and Gavin, J. (1990). Is recent climate change across the United States related to rising levels of anthropogenic greenhouse gases? Journal of Geophysical Research, 95, 16617–16637.
Cascades # L-moment ratios, etc., for the Cascades data lmrd(Cascades) # L-moment ratio diagram for the Cascades data
Cascades # L-moment ratios, etc., for the Cascades data lmrd(Cascades) # L-moment ratio diagram for the Cascades data
Performs cluster analysis by one of several agglomerative hierarchical methods.
cluagg(x, method="ward")
cluagg(x, method="ward")
x |
A numeric matrix (or a data frame with all numeric columns, which will be coerced to a matrix). Contains the data: each row should contain the attributes for a single point. |
method |
Clustering method. Any method valid for
|
In agglomerative hierarchical clustering, there are initially clusters,
each containing one data point, labeled
through
in the same order as the data points.
At each stage of clustering, two clusters are merged.
Their labels are saved in the
merge
array.
The smaller of the two labels is used as the label of the merged cluster.
After the th stage of clustering there are
clusters.
To find which data points belong to which clusters, use function
cluinf
.
A list with elements as follows.
merge |
Matrix of dimension |
wgss |
Vector of length |
Clustering is performed internally by function hclust
in the R stats package.
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
cluinf
to get details of the clusters
at a particular stage of the merging.
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl<-cluagg(att)) # Details of the clustering with 7 clusters cluinf(cl,7)
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl<-cluagg(att)) # Details of the clustering with 7 clusters cluinf(cl,7)
Agglomerative hierarchical clustering procedures typically produce
a list of the clusters merged at each stage of the clustering.
cluinf
uses this list to construct arrays that explicitly show
which cluster a given data point belongs to, and which data points belong
to a given cluster.
cluinf(merge, nclust)
cluinf(merge, nclust)
merge |
Matrix with 2 columns. The Can also be the object returned by a call to |
nclust |
Number of clusters. |
Information about the clustering that has nclust
clusters.
It is a list with the following elements:
assign |
Vector giving the assignment of items to clusters. |
list |
List with |
num |
Vector of length |
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl<-cluagg(att)) # Details of the clustering with 7 clusters cluinf(cl, 7)
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl<-cluagg(att)) # Details of the clustering with 7 clusters cluinf(cl, 7)
Performs cluster analysis using the K-means algorithm.
clukm(x, assign, maxit = 10, algorithm = "Hartigan-Wong")
clukm(x, assign, maxit = 10, algorithm = "Hartigan-Wong")
x |
A numeric matrix (or a data frame with all numeric columns, which will be coerced to a matrix). Contains the data: each row should contain the attributes for a single point. |
assign |
A vector whose distinct values indicate the initial clustering of the points. |
maxit |
Maximum number of iterations. |
algorithm |
Clustering algorithm. Permitted values are the same as for
|
An object of class kmeans
. For details see the help
for kmeans
.
clukm
is a wrapper for the R function kmeans
.
The only difference is that in clukm
the user supplies an initial
assignment of sites to clusters (from which cluster centers are computed),
whereas in kmeans
the user supplies the initial cluster centers
explicitly.
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl <- cluagg(att)) # Details of the clustering with 7 clusters (inf <- cluinf(cl, 7)) # Refine the 7 clusters by K-means clkm <- clukm(att, inf$assign) # Compare the original and K-means clusters table(Kmeans=clkm$cluster, Ward=inf$assign) # Some details about the K-means clusters: range of area, number # of sites, weighted average L-CV and L-skewness bb <- by(Appalach, clkm$cluster, function(x) c( min.area = min(x$area), max.area = max(x$area), n = nrow(x), ave.t = round(weighted.mean(x$t, x$n), 3), ave.t_3 = round(weighted.mean(x$t_3, x$n), 3))) # Order the clusters in increasing order of minimum area ord <- order(sapply(bb, "[", "min.area")) # Make the result into a data frame. Compare with Hosking # and Wallis (1997), Table 9.5. do.call(rbind, bb[ord])
## Clustering of gaging stations in Appalachia, as in Hosking ## and Wallis (1997, sec. 9.2.3) data(Appalach) # Form attributes for clustering (Hosking and Wallis's Table 9.4) att <- cbind(a1 = log(Appalach$area), a2 = sqrt(Appalach$elev), a3 = Appalach$lat, a4 = Appalach$long) att <- apply(att, 2, function(x) x/sd(x)) att[,1] <- att[,1] * 3 # Clustering by Ward's method (cl <- cluagg(att)) # Details of the clustering with 7 clusters (inf <- cluinf(cl, 7)) # Refine the 7 clusters by K-means clkm <- clukm(att, inf$assign) # Compare the original and K-means clusters table(Kmeans=clkm$cluster, Ward=inf$assign) # Some details about the K-means clusters: range of area, number # of sites, weighted average L-CV and L-skewness bb <- by(Appalach, clkm$cluster, function(x) c( min.area = min(x$area), max.area = max(x$area), n = nrow(x), ave.t = round(weighted.mean(x$t, x$n), 3), ave.t_3 = round(weighted.mean(x$t_3, x$n), 3))) # Order the clusters in increasing order of minimum area ord <- order(sapply(bb, "[", "min.area")) # Make the result into a data frame. Compare with Hosking # and Wallis (1997), Table 9.5. do.call(rbind, bb[ord])
Plots a regional frequency distribution, optionally with error bounds for either the regional growth curve or the quantile function for an individual site.
The graph is an “extreme-value plot”, i.e. the horizontal axis is the quantile of an extreme-value type I (Gumbel) distribution, and the quantile function of that distribution would plot as a straight line.
## S3 method for class 'rfd' evplot(y, ybounds, npoints=101, add=FALSE, plim, xlim=c(-2,5), ylim, xlab=expression("Reduced variate, " * -log(-log(italic(F)))), ylab="Quantile", rp.axis=TRUE, type="l", lty=c(1,2), col=c(1,1), ...)
## S3 method for class 'rfd' evplot(y, ybounds, npoints=101, add=FALSE, plim, xlim=c(-2,5), ylim, xlab=expression("Reduced variate, " * -log(-log(italic(F)))), ylab="Quantile", rp.axis=TRUE, type="l", lty=c(1,2), col=c(1,1), ...)
y |
Object of class |
ybounds |
Optional. Object of class |
npoints |
Number of points to use in drawing the quantile function. The points are equally spaced along the x axis. |
add |
Logical: if |
plim |
X axis limits, specified as probabilities. |
xlim |
X axis limits, specified as values of the Gumbel reduced variate
|
ylim |
Y axis limits. |
xlab |
X axis label. |
ylab |
Y axis label. |
rp.axis |
Logical: whether to draw the “Return period” axis, a secondary horizontal axis. |
type |
Vector of plot types. The first element is for the quantile
function; subsequent elements are for the error bounds,
and will be used cyclically until all lines are drawn.
Interpreted in the same way as the |
lty |
Vector of line types. The first element is for the quantile function; subsequent elements are for the error bounds, and will be used cyclically until all lines are drawn. |
col |
Vector of colors. The first element is for the quantile function; subsequent elements are for the error bounds, and will be used cyclically until all lines are drawn. |
... |
Additional parameters are passed to the plotting routine. |
If ybounds
is missing, a graph is drawn of the quantile function
(regional growth curve) of the distribution specified by y
.
If ybounds
is present, it may contain error bounds for either
a regional growth curve or the quantile function at a single site.
This regional growth curve or site quantile function is plotted
using arguments type[1]
, lty[1]
and col[1]
.
Then, in each case, error bounds are added to the plot.
The ybounds
object typically contains,
for several probabilities specified by ybounds$bounds
,
error bounds corresponding to that probability
for several quantiles.
For thej
th bound probability, the bounds for the various quantiles
will be joined by straight lines (so to obtain a smooth curve
there should be a lot of quantiles!), using graphics parameters
type[j+1]
, lty[j+1]
and col[j+1]
.
J. R. M. Hosking [email protected]
regfit
, which creates objects of class "rfd"
;
regquantbounds
and sitequantbounds
,
which create objects of class "rfdbounds"
;
evdistp
, evdistq
, and evpoints
,
all in package lmom, for adding further curves and points
to the plot.
Cascades # An object of class "regdata" rfit <- regfit(Cascades, "gno") # Fit a generalized normal distribution evplot(rfit) # Plot the regional growth curve # Compute error bounds for quantile estimates. We will # (optimistically) generate bounds for a homogeneous region # with the same frequency distribution as the one fitted to # the Cascades data. fval <- seq(.01, .99, by=.01) # A lot of quantiles simq <- regsimq(rfit$qfunc, nrec=Cascades$n, nrep=100, f=fval, fit=rfit$dist) # Regional growth curve, and bounds rbounds <- regquantbounds(simq, rfit) evplot(rfit, rbounds) # Quantile function for site 3, and bounds sbounds <- sitequantbounds(simq, rfit, site=3) evplot(rfit, sbounds)
Cascades # An object of class "regdata" rfit <- regfit(Cascades, "gno") # Fit a generalized normal distribution evplot(rfit) # Plot the regional growth curve # Compute error bounds for quantile estimates. We will # (optimistically) generate bounds for a homogeneous region # with the same frequency distribution as the one fitted to # the Cascades data. fval <- seq(.01, .99, by=.01) # A lot of quantiles simq <- regsimq(rfit$qfunc, nrec=Cascades$n, nrep=100, f=fval, fit=rfit$dist) # Regional growth curve, and bounds rbounds <- regquantbounds(simq, rfit) evplot(rfit, rbounds) # Quantile function for site 3, and bounds sbounds <- sitequantbounds(simq, rfit, site=3) evplot(rfit, sbounds)
Annual maximum wind speeds at 12 sites in the southeast U.S.
Maxwind
Maxwind
A list of 12 numeric vectors.
The name of a list element is the site location, including a reference number used by Simiu et al. (1979). Each list element is a numeric vector containing the annual maximum wind speeds for that site. The period of observation varies from site to site: for details see Simiu et al. (1979).
Simiu, E., Changery, M. J., and Filliben, J. J. (1979). Extreme wind speeds at 129 stations in the contiguous United States. Building Science Series 118, National Bureau of Standards, Washington, D.C.
str(Maxwind) regsamlmu(Maxwind) # sample L-moments
str(Maxwind) regsamlmu(Maxwind) # sample L-moments
Computes a regional weighted average of -moments.
regavlmom(regdata, weight)
regavlmom(regdata, weight)
regdata |
Object of class |
weight |
Vector containing the weights to be used for each site.
If omitted, weights will be the sample size at each site,
taken from the second column of |
Vector containing the regional average -moments.
J. R. M. Hosking [email protected]
reglm <- regsamlmu(Maxwind) regavlmom(reglm) # Weight proportional to record length regavlmom(reglm, weight=1) # Equal weights
reglm <- regsamlmu(Maxwind) regavlmom(reglm) # Weight proportional to record length regavlmom(reglm, weight=1) # Equal weights
An object of class "regdata"
stores
summary statistics of the data for the sites in a region.
It is a data frame with each row containing data for one site.
The columns should contain
the site name, record length and -moments
and
-moment ratios, in the order
(mean),
(
-CV),
(
-skewness),
(
-kurtosis),
,
, etc.
There should be at least four columns, but most functions that use
objects of class "regdata"
typically require more columns.
Six or seven columns (4 or 5 -moments) is usually
adequate for regional frequency analysis.
Note that the fourth column should contain values of
the -CV
, not the
-scale
!
Objects of class "regdata"
are created by as.regdata
,
and by regsamlmu
(with default settings of its arguments).
They are used by several functions in package lmomRFA, including
regavlmom
(which computes regional average -moments),
regfit
(which fits a regional frequency distribution), and
regtst
(which computes discordancy, heterogeneity and
goodness-of-fit measures).
as.regdata(x, warn.names=TRUE)
as.regdata(x, warn.names=TRUE)
x |
R object. |
warn.names |
Logical: if |
as.regdata
converts an R object to class "regdata"
.
Only data frames and numeric matrices can be converted.
An object of class "regdata"
.
J. R. M. Hosking [email protected]
Cascades # 'Cascades' is of class "regdata" # Create a data frame with site statistics dd<-data.frame( name =c("site 1", "site 2", "site 3"), n =c( 20, 30, 40), mean =c( 100, 110, 120), LCV =c(0.20, 0.25, 0.30), L_skewness=c(0.15, 0.20, 0.25), L_kurtosis=c(0.10, 0.15, 0.20), t_5 =c(0.10, 0.12, 0.14)) # Convert to class "regdata" rdd<-as.regdata(dd) rdd class(rdd)
Cascades # 'Cascades' is of class "regdata" # Create a data frame with site statistics dd<-data.frame( name =c("site 1", "site 2", "site 3"), n =c( 20, 30, 40), mean =c( 100, 110, 120), LCV =c(0.20, 0.25, 0.30), L_skewness=c(0.15, 0.20, 0.25), L_kurtosis=c(0.10, 0.15, 0.20), t_5 =c(0.10, 0.12, 0.14)) # Convert to class "regdata" rdd<-as.regdata(dd) rdd class(rdd)
Fits a frequency distribution to a vector of regional average
-moments. Returns an object of class
"rfd"
,
which contains the specification of the regional frequency distribution:
the quantile function, parameters of the regional growth curve,
and the index-flood values (site-specific scale factors) for each site.
regfit(regdata, dist)
regfit(regdata, dist)
regdata |
Object of class |
dist |
Character string specifying the distribution to be fitted. See “Details” below. |
The function computes regional average -moments
(by calling
regavlmom
) and fits a probability distribution
to the regional average -moments.
This distribution has mean 1, i.e., the index flood is the
mean of the distribution.
For distribution dist
there should exist a function to estimate
the parameters of the distribution given a set of -moments.
The function should have a name that is the character string
"pel"
followed by the character string dist
.
It should accept a single argument, a vector containing -moments
,
,
,
, etc.,
and return a vector of distribution parameters.
For distribution dist
there should also exist a quantile function,
which should have a name that is the character string
"qua"
followed by the character string dist
.
It should accept two arguments: a vector of probabilities
and a vector containing the parameters of the distribution.
The search path used to find the "pel"
and "qua"
functions
is the same as for arguments supplied to regfit
, i.e.
the enclosing frames of the function, followed by the search path
specified by search()
.
The estimation routines and quantile functions in package lmom
have the form described here. For example, to use a
generalized extreme value distribution set dist
to be
the string "gev"
; then the fitting function pelgev
and the quantile function quagev
will be used
(unless these functions have been masked by another object
on the search path).
An object of class "rfd"
,
containing the specification of the regional frequency distribution:
It is a list with the following elements:
dist |
The character string |
para |
Vector containing the parameters of the fitted regional distribution. |
qfunc |
The quantile function of distribution |
rmom |
The regional average |
index |
Index flood values at each site. This is a named vector
whose values are the index flood values at each site, from |
J. R. M. Hosking [email protected]
data(Cascades) # An object of class "regdata" rfit <- regfit(Cascades, "gno") # Fit a generalized normal distribution rfit # Print details of the fitted distribution # (components 'dist' and 'para') rfit$index # Index flood values evplot(rfit) # Plot the regional growth curve evplot(qfunc=rfit$qfunc) # The same, but with more typing evplot(qfunc=regqfunc(rfit)) # The same, with still more typing
data(Cascades) # An object of class "regdata" rfit <- regfit(Cascades, "gno") # Fit a generalized normal distribution rfit # Print details of the fitted distribution # (components 'dist' and 'para') rfit$index # Index flood values evplot(rfit) # Plot the regional growth curve evplot(qfunc=rfit$qfunc) # The same, but with more typing evplot(qfunc=regqfunc(rfit)) # The same, with still more typing
Regional weighted average of -moments.
reglmr(xmom, weight)
reglmr(xmom, weight)
xmom |
Matrix or data frame each of whose rows contains the |
weight |
Vector containing the weights to be used for each site. If omitted, equal weights will be used. |
Vector containing the regional average -moments.
This function is deprecated and may be removed
from a future version of the package.
Function regavlmom
is the recommended replacement.
J. R. M. Hosking [email protected]
(xmom<-t(sapply(Maxwind,samlmu))) nrec<-sapply(Maxwind,length) reglmr(xmom,nrec) # weighted by record length reglmr(xmom) # unweighted
(xmom<-t(sapply(Maxwind,samlmu))) nrec<-sapply(Maxwind,length) reglmr(xmom,nrec) # weighted by record length reglmr(xmom) # unweighted
regquant
computes quantiles of a regional frequency distribution,
i.e., values of the regional growth curve.
regqfunc
returns a function that will compute the quantiles.
regquant(f, rfd) regqfunc(rfd)
regquant(f, rfd) regqfunc(rfd)
f |
Vector of probabilities. |
rfd |
Object of class |
regquant
returns a vector of quantiles.
regqfunc
returns the qfunc
element of rfd
.
This is a function that takes one argument,
which should be a vector of probabilities, and returns a vector of quantiles.
J. R. M. Hosking [email protected]
rfit <- regfit(Cascades,"gno") # Fit regional distribution # Compute some quantiles regquant(seq(0.1, 0.9, by=0.1), regfit(Cascades,"gno")) # Get the quantile function (regional growth curve) rgc <- regqfunc(rfit) # Compute quantiles by evaluating the regional growth curve rgc(seq(0.1, 0.9, by=0.1)) # Plot the regional growth curve curve(rgc, 0.01, 0.99)
rfit <- regfit(Cascades,"gno") # Fit regional distribution # Compute some quantiles regquant(seq(0.1, 0.9, by=0.1), regfit(Cascades,"gno")) # Get the quantile function (regional growth curve) rgc <- regqfunc(rfit) # Compute quantiles by evaluating the regional growth curve rgc(seq(0.1, 0.9, by=0.1)) # Plot the regional growth curve curve(rgc, 0.01, 0.99)
For a regional frequency distribution, the functions compute
the root mean square error (RMSE) and error bounds for quantiles
either of the regional growth curve (regquantbounds
)
or of distributions at individual sites (sitequantbounds
).
regquantbounds(relbounds, rfd) sitequantbounds(relbounds, rfd, sitenames, index, seindex, drop = TRUE)
regquantbounds(relbounds, rfd) sitequantbounds(relbounds, rfd, sitenames, index, seindex, drop = TRUE)
relbounds |
An object of class |
rfd |
An object of class |
sitenames |
Vector of site names. |
index |
Values of the estimated site-specific scale factor (“index flood”) for the sites. |
seindex |
Standard errors of the estimates in |
drop |
Logical: if |
The relative RMSE values from relbounds
are
multiplied by the quantile values from rfd
to yield absolute RMSE values for quantile estimates,
and the quantile values from rfd
are divided by
the error bounds from relbounds
to yield error bounds for quantiles,
as in Hosking and Wallis (1997), eq. (6.19).
These computations apply to quantiles
either of the regional growth curve (for regquantbounds
) or
of the frequency distributions at individual sites (for sitequantbounds
).
If argument index
of sitequantbounds
is missing,
then results (RMSE and error bounds of quantiles) are computed for sites
in the region specified by rfd
and its index
component,
assuming that the site-specific scale factor (“index flood”)
is estimated by the sample mean at each site, computed from the same data set
that was used to fit the regional frequency distribution.
If index
and sitenames
are both missing, then results
will be computed for all of the sites in the region specified by rfd
.
If index
is missing and sitenames
is present,
then error bounds will be computed for a subset of the
sites in the region specified by rfd
.
sitenames
will be used to select sites from the vector rfd$index
,
either by position or by name.
If argument index
of sitequantbounds
is present,
then results are computed for arbitrary sites
(for example, ungauged sites for which the regional growth curve
of the regional frequency distribution rfd
is believed to apply),
assuming that the site-specific scale factor (“index flood”)
is estimated from data that are (approximately) statistically independent
of the data used to fit the regional frequency distribution.
In this case relbounds$sim.rgc
must not be NULL
,
i.e. relbounds
should have been generated by a call to
regsimq
with argument save=TRUE
.
If index
and sitenames
are both present,
they must have the same length, and will be taken to refer to sites
whose names are the elements of sitenames
and whose index-flood values
are the elements of index
.
If index
is present and sitenames
is missing,
results are computed for sites whose index-flood values
are the elements of index
; if index
has names, these
names will be used as the site names.
When index
and seindex
are specified, it is assumed
in the simulation procedure that the estimated index flood value has a
gamma distribution with mean index
and standard deviation seindex
.
As noted by Hosking and Wallis (1997, discussion following (6.19)),
error bounds in the lower tail of the distribution may be unhelpful
when the fitted distribution can take negative values.
In these cases the computed bounds will be
NA
(if the quantile estimate is negative) or
Inf
(if the quantile estimate is positive but the
corresponding error bound in relbounds
is negative).
For regquantbounds
, an object of class "rfdbounds"
.
This is a data frame with columns
f
, probabilities for which quantiles are estimated;
qhat
, estimated quantiles;
RMSE
, RMSE of the estimated quantiles.
Also, for each bound probability in relbounds$boundprob
, there is
a column containing the error bound corresponding to that probability.
The object also has an attribute "boundprob"
that contains the
bound probabilities.
For sitequantbounds
, a list each of whose components is
an object of class "rfdbounds"
containing results for one site.
In this case the second column of the data frame is named Qhat
,
not qhat
.
If drop
is TRUE
and the list has one component,
a single "rfdbounds"
object is returned.
For exact definitions of quantities returned by functions
regquantbounds
and regsitebounds
, see vignette RegSim
.
For a region that is confidently believed to be homogeneous,
the region used to generate the results in relbounds
may be the same as that specified by rfd
.
In practice, it is often acknowledged that some degree of heterogeneity
is present in the data to which the distribution rfd
is fitted.
The simulations used in function regsimq
to generate relbounds
can then be based on a region whose specification includes
an appropriate degree of heterogeneity, and the error bounds calculated
by regquantbounds
and sitequantbounds
will honestly
reflect the failure of the assumption of homogeneity
made by regfit
(i.e. that the at-site growth curves
are the same for all sites in the region) to hold exactly.
The example below illustrates this practice.
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
regsimq
, which runs the simulations
that generate the results returned by regquantbounds
.
data(Cascades) # A regional data set rmom <- regavlmom(Cascades) # Regional average L-moments # Fit a generalized normal distribution to the regional data rfit <- regfit(Cascades, "gno") # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Same site means as Cascades # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites, # and is chosen to reflect the amount of heterogeneity that # may reasonably be believed to be present in the Cascades # data (see the example for 'regsimh'). # -- L-skewness is the same at each site, and is equal to the # regional average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- Cascades$mean LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) pp # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. To save time, use only 100 replications. simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100, fit="gno") # Apply the simulated bounds to the estimated regional growth curve regquantbounds(simq, rfit) # Apply the simulated bounds to quantiles for site 3 sitequantbounds(simq, rfit, site=3) # Apply the simulated bounds to quantiles for a site whose mean # is estimated to be 100 with standard error 25 sitequantbounds(simq, rfit, index=100, seindex=25)
data(Cascades) # A regional data set rmom <- regavlmom(Cascades) # Regional average L-moments # Fit a generalized normal distribution to the regional data rfit <- regfit(Cascades, "gno") # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Same site means as Cascades # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites, # and is chosen to reflect the amount of heterogeneity that # may reasonably be believed to be present in the Cascades # data (see the example for 'regsimh'). # -- L-skewness is the same at each site, and is equal to the # regional average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- Cascades$mean LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) pp # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. To save time, use only 100 replications. simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100, fit="gno") # Apply the simulated bounds to the estimated regional growth curve regquantbounds(simq, rfit) # Apply the simulated bounds to quantiles for site 3 sitequantbounds(simq, rfit, site=3) # Apply the simulated bounds to quantiles for a site whose mean # is estimated to be 100 with standard error 25 sitequantbounds(simq, rfit, index=100, seindex=25)
Computes the “unbiased” sample -moments and
-moment
ratios of multiple sets of data stored in a list or matrix.
Following the paradigm of regional frequency analysis,
we regard the data sets as coming from different measurement sites.
regsamlmu(x, nmom = 5, sort.data = TRUE, lcv = TRUE)
regsamlmu(x, nmom = 5, sort.data = TRUE, lcv = TRUE)
x |
A list of numeric vectors, or a numeric matrix. |
nmom |
Number of |
sort.data |
Logical: whether each data set should be sorted. |
lcv |
Logical.
If |
Sample -moments are computed for each data set.
The calculations use
samlmu
internally.
If x
is a list, each list element should contain data for one site
and the names of the list elements should be the site names.
If x
is a matrix, each column should contain data for one site
and the column names should be the site names.
An object of class regdata
.
It is a data frame with columns "name"
and "n"
,
containing respectively the site names and the
number of non-missing data values at each site,
and further columns containing the -moments and
-moment ratios,
in the order
,
(or
),
,
, etc.
The default parameter values are chosen to be convenient
for the regional frequency analysis methods described by
Hosking and Wallis (1997).
Note that the number of -moments and the choice
of whether to return
-CV or
-scale
are different from the defaults for
samlmu
.
Users of the LMOMENTS Fortran package, version 3.04, should note that
its PROGRAM XFIT
by default uses plotting-position estimators
of -moment ratios, which give different results from the
“unbiased” estimators used by
regsamlmu
(and by all other functions in package lmomRFA).
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
data(Maxwind) # a list regsamlmu(Maxwind) data(airquality) # a data frame regsamlmu(airquality[1:4])
data(Maxwind) # a list regsamlmu(Maxwind) data(airquality) # a data frame regsamlmu(airquality[1:4])
Estimates, using Monte Carlo simulation, the distribution of
heterogeneity and goodness-of-fit measures for regional frequency analysis.
These are the statistics and
defined respectively in sections 4.3.3 and 5.2.3 of Hosking and Wallis (1997).
regsimh(qfunc, para, cor = 0, nrec, nrep = 500, nsim = 500)
regsimh(qfunc, para, cor = 0, nrec, nrep = 500, nsim = 500)
qfunc |
List containing the quantile functions for each site. Can also be a single quantile function, which will be used for each site. |
para |
Parameters of the quantile functions at each site.
If |
cor |
Specifies the correlation matrix of the frequency distribution of each site's data. Can be a matrix (which will be rescaled to a correlation matrix if necessary) or a constant (which will be taken as the correlation between each pair of sites). |
nrec |
Numeric vector containing the record lengths at each site. |
nrep |
Number of simulated regions. |
nsim |
Number of simulations used, within each of the |
A realization is generated of data simulated from
the region specified by parameters qfunc
, para
, and cor
,
and with record lengths at each site specified by argument nrec
.
The simulation procedure is as described in Hosking and Wallis (1997),
Table 6.1, through step 3.1.2.
Heterogeneity and goodness-of-fit measures are computed
for the realization, using the same method as in function regtst
.
The entire procedure is repeated nrep
times, and the values
of the heterogeneity and goodness-of-fit measures are saved.
Average values, across all nrep
realizations,
of the heterogeneity and goodness-of-fit measures are computed.
An object of class "regsimh"
.
This is a list with the following components:
nrep |
The number of simulated regions (argument |
nsim |
The number of simulation used within each region
(argument |
results |
Matrix of dimension 8 |
means |
Vector of length 8, containing the mean values,
across the |
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
regtst
for details of the
heterogeneity and goodness-of-fit measures.
## Not run: data(Cascades) # A regional data set rmom<-regavlmom(Cascades) # Regional average L-moments # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Mean 1 at every site (results do not depend on the site means) # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites. # -- L-skewness the same at each site, and equal to the regional # average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- rep(1,nsites) LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. It will take some time (about 25 sec # on a Lenovo W500, a moderately fast 2011-vintage laptop) # Note that the results are consistent with the statement # "the average H value of simulated regions is 1.08" # in Hosking and Wallis (1997, p.98). set.seed(123) regsimh(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100) ## End(Not run)
## Not run: data(Cascades) # A regional data set rmom<-regavlmom(Cascades) # Regional average L-moments # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Mean 1 at every site (results do not depend on the site means) # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites. # -- L-skewness the same at each site, and equal to the regional # average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- rep(1,nsites) LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. It will take some time (about 25 sec # on a Lenovo W500, a moderately fast 2011-vintage laptop) # Note that the results are consistent with the statement # "the average H value of simulated regions is 1.08" # in Hosking and Wallis (1997, p.98). set.seed(123) regsimh(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100) ## End(Not run)
Computes, using Monte Carlo simulation, relative error bounds
for estimated quantiles of a regional frequency distribution
fitted by regfit
.
regsimq(qfunc, para, cor = 0, index = NULL, nrec, nrep = 10000, fit = "gev", f = c(0.01, 0.1, 0.5, 0.9, 0.99, 0.999), boundprob = c(0.05, 0.95), save = TRUE)
regsimq(qfunc, para, cor = 0, index = NULL, nrec, nrep = 10000, fit = "gev", f = c(0.01, 0.1, 0.5, 0.9, 0.99, 0.999), boundprob = c(0.05, 0.95), save = TRUE)
qfunc |
List containing the quantile functions for each site. Can also be a single quantile function, which will be used for each site. |
para |
Parameters of the quantile functions at each site.
If |
cor |
Specifies the correlation matrix of the frequency distribution of each site's data. Can be a matrix (which will be rescaled to a correlation matrix if necessary) or a constant (which will be taken as the correlation between each pair of sites). |
index |
Specifies the value of the site-specific scale factor
(“index flood”) at each site. Can be:
a vector, containing the values at each site;
a constant, which will be taken to be the index flood value at each site;
or (the default) |
nrec |
Numeric vector containing the record lengths at each site. |
nrep |
Number of simulated regions. |
fit |
Character string specifying the distribution to be fitted. See “Details” below. |
f |
Vector of probabilities corresponding to the quantiles whose accuracy is to be estimated. |
boundprob |
Vector of probabilities for which error bounds will be computed. |
save |
Logical. Should the simulated values of the ratio of
the estimated to the true regional growth curve be saved?
These values are needed when |
A realization of data from a region is generated as follows.
The frequency distributions at sites (specified by
arguments qfunc
and para
) are expressed
as where
is the site-specific scale factor
(“index flood”) and
is the at-site growth curve.
At each simulation run the at-site growth curves of each site
are randomly permuted, and are scaled by the (unpermuted)
index flood values for the sites.
Data are simulated from these frequency distributions,
with inter-site correlation specified by argument
cor
and record lengths at each site specified by argument nrec
.
The regional frequency distribution specified by argument fit
is then fitted to the simulated data, as in function regfit
.
The procedure is as described in Hosking and Wallis (1997), Table 6.1,
except that the permutation of at-site growth curves is a later
modification, intended to give more realistic sets of simulated data.
For more details, including exact definitions of quantities computed
in the simulation and returned by functions regsimq
,
regquantbounds
, and regsitebounds
, see vignette RegSim
.
From each realization the sample mean values and estimates of the
quantiles of the regional growth curve, for nonexceedance probabilities
specified by argument f
, are saved.
From the simulated values, for each quantile specified by argument f
the relative root mean square error (relative RMSE) is computed as in
Hosking and Wallis (1997, eq. (6.15)).
Error bounds are also computed, as in Hosking and Wallis (1997, eq. (6.18))
but with bound probabilities specified by argument boundprob
rather than the fixed values 0.05 and 0.95 considered by Hosking and Wallis.
The error bounds are sample quantiles, across the nrep
realizations,
of the ratio of the estimated regional growth curve
to the true at-site growth curve
or of the ratio of the estimated to the true quantiles at individual sites.
For distribution fit
there should exist a function to estimate
the parameters of the distribution given a set of -moments.
The function should have a name that is the character string
"pel"
followed by the character string fit
.
It should accept a single argument, a vector containing -moments
,
,
,
, etc.,
and return a vector of distribution parameters.
For distribution fit
there should also exist a quantile function,
which should have a name that is the character string
"qua"
followed by the character string fit
.
It should accept two arguments: a vector of probabilities
and a vector containing the parameters of the distribution.
The search path used to find the "pel"
and "qua"
functions
is the same as for arguments supplied to regsimq
, i.e.
the enclosing frames of the function, followed by the search path
specified by search()
.
The estimation routines and quantile functions in package lmom
have the form described here. For example, to use a
generalized extreme value distribution set fit
to be
the string "gev"
; then the fitting function pelgev
and the quantile function quagev
will be used
(unless these functions have been masked by another object
on the search path).
An object of class "regsimq"
.
This is a list with the following components:
f |
Vector of probabilities corresponding to the quantiles
whose accuracy is to be estimated. A copy of argument |
boundprob |
Vector of probabilities corresponding to the error bounds.
A copy of argument |
nrep |
Number of simulated regions. |
relbounds.rgc |
Data frame containing the relative RMSE and
error bounds for the regional growth curve. It has columns
|
relbounds.by.site |
List of |
true.asgc |
If If |
sim.rgc |
If If |
Error bounds for the regional growth curve apply to the
regional growth curve regarded as an estimator of the
growth curve for a randomly chosen site.
The growth curve for site is defined to be
where
is the site's quantile function
and
is the site-specific scale factor
(“index flood”) for site
.
For each of the
nrep
simulated regions,
and each probability in
f
,
the regional growth curve is estimated
and the ratios
are calculated.
The relative error bounds are empirical quantiles,
corresponding to the probabilities in
boundprob
,
of the nrep * length(nrec)
values of these ratios
obtained from the simulations.
This differs from the approach taken in Hosking and Wallis (1997),
Table 6.2 and Fig. 6.2, in which error bounds are computed regarding
the estimated regional growth curve as an estimator of the regional
average growth curve , the harmonic mean
of the sites' growth curves (Hosking and Wallis, 1997, p. 102).
When the parent region is homogeneous, with identical frequency distributions at each site (apart from a site-specific scale factor), the two approaches give identical results. For heterogeneous regions the “regard as estimator of randomly chosen site growth curve” approach yields error bounds that are wider, but are more realistic given that the ultimate aim of regional frequency analysis is estimation of quantiles at individual sites.
J. R. M. Hosking [email protected]
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
regfit
, for details of fitting a regional frequency distribution;
regquantbounds
and sitequantbounds
, for
converting the relative bounds returned by regsimq
into absolute bounds
for quantiles of the regional growth curve or of the
frequency distributions at individual sites.
data(Cascades) # A regional data set rmom <- regavlmom(Cascades) # Regional average L-moments # Fit generalized normal distribution to regional data rfit <- regfit(Cascades, "gno") # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Same site means as Cascades # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites. # -- L-skewness the same at each site, and equal to the regional # average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- Cascades$mean LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. To save time, use only 100 replications. simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100, fit="gno") # Relative RMSE and error bounds for the regional growth curve simq$relbounds.rgc # Relative RMSE and error bounds for quantiles at site 3 simq$relbounds.by.site[[3]]
data(Cascades) # A regional data set rmom <- regavlmom(Cascades) # Regional average L-moments # Fit generalized normal distribution to regional data rfit <- regfit(Cascades, "gno") # Set up an artificial region to be simulated: # -- Same number of sites as Cascades # -- Same record lengths as Cascades # -- Same site means as Cascades # -- L-CV varies linearly across sites, with mean value equal # to the regional average L-CV for the Cascades data. # 'LCVrange' specifies the range of L-CV across the sites. # -- L-skewness the same at each site, and equal to the regional # average L-skewness for the Cascades data nsites <- nrow(Cascades) means <- Cascades$mean LCVrange <- 0.025 LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites) Lskews<-rep(rmom[3], nsites) # Each site will have a generalized normal distribution: # get the parameter values for each site pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno)) # Set correlation between each pair of sites to 0.64, the # average inter-site correlation for the Cascades data avcor <- 0.64 # Run the simulation. To save time, use only 100 replications. simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n, nrep=100, fit="gno") # Relative RMSE and error bounds for the regional growth curve simq$relbounds.rgc # Relative RMSE and error bounds for quantiles at site 3 simq$relbounds.by.site[[3]]
Computes discordancy, heterogeneity and goodness-of-fit measures
for regional frequency analysis.
These are the statistics ,
, and
defined respectively in sections 3.2.3, 4.3.3, and 5.2.3 of
Hosking and Wallis (1997).
regtst(regdata, nsim=1000) regtst.s(regdata, nsim=1000)
regtst(regdata, nsim=1000) regtst.s(regdata, nsim=1000)
regdata |
Object of class Note that the fourth column should contain values of
the Function |
nsim |
Number of simulations to use in the calculation of the heterogeneity and goodness-of-fit measures. If less than 2, only the discordancy measure will be calculated. |
The discordancy measure indicates, for site
,
the discordancy between the site's
-moment ratios
and the (unweighted) regional average
-moment ratios.
Large values might be used as a flag to indicate potential errors
in the data at the site. “Large” might be 3 for regions with 15
or more sites, but less (exact values in list element
Dcrit
)
for smaller regions.
Three heterogeneity measures are calculated, each based on
a different measure of between-site dispersion of -moment ratios:
[1] weighted standard deviation of
-CVs;
[2] average of
-CV/
-skew distances;
[3] average of
-skew/
-kurtosis distances.
These dispersion measures are the quantities
,
,
and
defined respectively in equations (4.4), (4.6), and (4.7)
of Hosking and Wallis (1997).
The heterogeneity measures are calculated from them as in
equation (4.5) of Hosking and Wallis (1997).
In practice
H[1]
is probably sufficient. A value greater than
(say) 1.0 suggests that further subdivision of the region should
be considered as it might improve the accuracy of quantile estimates.
Goodness of fit is evaluated for five candidate distributions:
generalized logistic,
generalized extreme value,
generalized normal (lognormal),
Pearson type III (3-parameter gamma), and
generalized Pareto.
In the output the distributions are referred to by 3-letter abbreviations,
respectively glo
, gev
, gno
, pe3
, and gpa
.
If the region is homogeneous and data at different sites are
statistically independent, then if one of the distributions is
the true distribution for the region its goodness-of-fit measure
should have approximately a standard normal distribution.
Provided that the region is acceptably close to homogeneous,
the fit may be judged acceptable at the 10 per cent significance level
if the value is less than 1.645 (i.e.,
qnorm(0.95)
) in absolute value.
Calculation of heterogeneity and goodness-of-fit measures
involves the sampling variability of -moment ratios
in a homogeneous region whose record lengths and
average
-moment ratios match those of the data.
The sampling variability is estimated by Monte Carlo simulation
using
nsim
replications of the region.
Results will vary between invocations of regtst
with different seeds for the random-number generator.
In the homogeneous region used in the simulations, the sites have a
kappa distribution, fitted to the regional average -moment ratios
of the data in
regdata
. The kappa fit may fail if the regional average
-kurtosis is high relative to the regional average
-skewness.
In this case a kappa distribution is fitted with shape parameter
constrained to be
(i.e., a generalized logistic distribution);
this gives the largest possible
-kurtosis value for a kappa distribution
with given
-skewness.
regtst
and regtst.s
are functionally identical.
regtst
calls a Fortran routine internally and is faster,
typically by a factor of 3 or 4.
regtst.s
is written almost entirely in the S language;
it is provided so that users can see how the calculations are done,
and can conveniently alter the code for their own purposes if necessary.
An object of class "regtst"
, which is a list with elements as follows.
data |
The input data, i.e. data frame |
nsim |
Number of simulations, i.e. the argument |
D |
Vector containing the discordancy measures for each site. |
Dcrit |
Vector of length 2 containing critical values of the discordancy measure corresponding to significance levels of 10 and 5 per cent — except that the values never exceed 3 and 4 respectively. See Hosking and Wallis (1997), section 3.2.4. |
rmom |
Vector of length 5 containing the regional weighted average
|
rpara |
Vector of length 4 containing the parameters of a kappa distribution
fitted to the regional weighted average |
vobs |
Vector of length 3 containing the observed values of the three
measures of between-site dispersion of |
vbar |
Vector of length 3 containing the mean of the simulated values of the three dispersion measures. |
vsd |
Vector of length 3 containing the standard deviation of the simulated values of the three dispersion measures. |
H |
Vector of length 3 containing the three measures of regional heterogeneity. |
para |
List of length 6 containing the parameters of the five candidate
distributions and the Wakeby distribution (3-letter abbreviation
|
t4fit |
Vector of length 5 containing the |
Z |
Vector of length 5 containing the goodness-of-fit measures for each of the five candidate distributions. |
Data frame regdata
may have only six columns,
i.e. the fifth -moment ratio
may be omitted.
In this case the return value will contain missing values for
rmom[5]
and the elements of para$wak
.
J. R. M. Hosking [email protected]
Hosking, J. R. M. (1996).
Fortran routines for use with the method of -moments, Version 3.
Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
Hosking, J. R. M., and Wallis, J. R. (1997).
Regional frequency analysis: an approach based on -moments.
Cambridge University Press.
summary.regtst
for summaries.
# An example from Hosking (1996). Compare the output with # the file 'cascades.out' in the LMOMENTS Fortran package at # https://lib.stat.cmu.edu/general/lmoments (results will not # be identical, because random-number generators are different). summary(regtst(Cascades, nsim=500)) # Output from 'regsamlmu' can be fed straight into 'regtst' regtst(regsamlmu(Maxwind))
# An example from Hosking (1996). Compare the output with # the file 'cascades.out' in the LMOMENTS Fortran package at # https://lib.stat.cmu.edu/general/lmoments (results will not # be identical, because random-number generators are different). summary(regtst(Cascades, nsim=500)) # Output from 'regsamlmu' can be fed straight into 'regtst' regtst(regsamlmu(Maxwind))
Quantiles and quantile functions for individual sites in a region.
sitequant
computes quantiles directly;
siteqfunc
returns a function that will compute quantiles.
sitequant(f, rfd, sitenames, index, drop = TRUE) siteqfunc(rfd, sitenames, index)
sitequant(f, rfd, sitenames, index, drop = TRUE) siteqfunc(rfd, sitenames, index)
f |
Vector of probabilities. |
rfd |
Object of class |
sitenames |
Vector of site names. |
index |
Values of the site-specific scale factor (“index flood”) for the sites. |
drop |
Logical: if |
If index
and sitenames
are both present,
they must have the same length, and will be taken to refer to sites
whose names are the elements of sitename
and whose index-flood values
are the elements of index
.
If index
is present and sitenames
is missing,
quantiles are computed for sites whose index-flood values
are the elements of index
; if index
has names, these
names will be used as the site names.
If sitenames
is present and index
is missing,
then quantiles will be computed for a subset of the
sites in the region specified by rfd
.
sitenames
will be used to select sites from the vector rfd$index
,
either by position or by name.
If sitenames
and index
are both missing,
then quantiles will be computed for all of the
sites in the region specified by rfd
.
For sitequant
, a matrix whose rows contain quantiles
for a single site, for the probabilities specified in f
.
If drop
is TRUE
and the matrix has only one row or column,
it will be returned as a vector.
For siteqfunc
, a function or a list of functions that each compute
quantiles for one site.
Each function takes a single argument, a vector of probabilities,
and returns a vector of quantiles.
J. R. M. Hosking [email protected]
rfit <- regfit(Cascades, "gno") # Fit regional distribution ## Quantiles for: # - sites in the Cascades data set, indexed by number sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=1:3) # - sites in the Cascades data set, indexed by name sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=c("350304", "351433", "351862")) # - other sites, with specified index floods sitequant(c(0.9, 0.99, 0.999), rfit, index=c(80, 100)) # - other sites, with specified index floods and names sitequant(c(0.9, 0.99, 0.999), rfit, index=c(80, 100), sitenames=c("Site 80", "Site 100")) # - a single site, with drop=FALSE: result is a matrix sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=10, drop=FALSE) # - a single site, with drop=TRUE (the default): result is a vector sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=10) # Quantile function for site 10 qfunc10 <- siteqfunc(rfit, site=10) # Compute quantiles via the quantile function qfunc10(c(0.9, 0.99, 0.999)) # Plot the quantile function evplot(qfunc=qfunc10)
rfit <- regfit(Cascades, "gno") # Fit regional distribution ## Quantiles for: # - sites in the Cascades data set, indexed by number sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=1:3) # - sites in the Cascades data set, indexed by name sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=c("350304", "351433", "351862")) # - other sites, with specified index floods sitequant(c(0.9, 0.99, 0.999), rfit, index=c(80, 100)) # - other sites, with specified index floods and names sitequant(c(0.9, 0.99, 0.999), rfit, index=c(80, 100), sitenames=c("Site 80", "Site 100")) # - a single site, with drop=FALSE: result is a matrix sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=10, drop=FALSE) # - a single site, with drop=TRUE (the default): result is a vector sitequant(c(0.9, 0.99, 0.999), rfit, sitenames=10) # Quantile function for site 10 qfunc10 <- siteqfunc(rfit, site=10) # Compute quantiles via the quantile function qfunc10(c(0.9, 0.99, 0.999)) # Plot the quantile function evplot(qfunc=qfunc10)
summary
method for an object of class "regtst"
.
## S3 method for class 'regtst' summary(object, prob = c(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99, 0.999), conf = 0.90, decimals = c(4, 4, 2, 3), ...) ## S3 method for class 'summary.regtst' print(x, decimals, ...)
## S3 method for class 'regtst' summary(object, prob = c(0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99, 0.999), conf = 0.90, decimals = c(4, 4, 2, 3), ...) ## S3 method for class 'summary.regtst' print(x, decimals, ...)
object |
An object of class |
x |
An object of class |
prob |
Nonexceedance probabilities for which quantile estimates should be printed. |
conf |
Confidence level for printing parameter and quantile estimates. These quantities will be printed only for distributions that give an adequate fit at the specified confidence level. |
decimals |
Vector of length 4. The four elements specify the number of
decimal places to be used when printing |
... |
Further arguments passed to or from other methods. |
The printed output corresponds closely to that produced by
function REGTST
in the LMOMENTS Fortran package (Hosking, 1996).
summary.regtst
and print.summary.regtst
each return,
invisibly, an object of class "summary.regtst"
, which is a list
with elements as for class "regtst"
, plus the following elements:
conf |
Confidence level — the |
prob |
Vector of nonexceedance probabilities — the |
quant |
Matrix with 6 rows and |
decimals |
Vector of length 4. Number of decimals to be used when printing
an object of class |
J. R. M. Hosking [email protected]
Hosking, J. R. M. (1996).
Fortran routines for use with the method of -moments, Version 3.
Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.
# An example from Hosking (1996). Compare the output with # the file 'cascades.out' in the LMOMENTS Fortran package at # https://lib.stat.cmu.edu/general/lmoments (results will not # be identical, because random-number generators are different). summary(regtst(Cascades, nsim=500))
# An example from Hosking (1996). Compare the output with # the file 'cascades.out' in the LMOMENTS Fortran package at # https://lib.stat.cmu.edu/general/lmoments (results will not # be identical, because random-number generators are different). summary(regtst(Cascades, nsim=500))