Title: | An Introduction to Statistical Modeling of Extreme Values |
---|---|
Description: | Functions to support the computations carried out in `An Introduction to Statistical Modeling of Extreme Values' by Stuart Coles. The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes. |
Authors: | Original S functions written by Janet E. Heffernan with R port and R documentation provided by Alec G. Stephenson. |
Maintainer: | Eric Gilleland <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.42 |
Built: | 2024-10-26 06:19:56 UTC |
Source: | CRAN |
The dowjones
data frame has 1304 rows and 2 columns.
The second column contains daily closing prices of the Dow
Jones Index over the period 1996 to 2000. The first column
contains a POSIXct
object giving the dates of
each observation.
data(dowjones)
data(dowjones)
This data frame contains the following columns:
A POSIXct
object containing dates.
A numeric vector containing daily closing prices of the Dow Jones Index.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
The engine
data frame has 32 rows and 2 columns.
The first column contains the corrosion level, the second
column gives the engine failure time.
data(engine)
data(engine)
This data frame contains the following columns:
A numeric vector of corrosion levels.
A numeric vector of failure times.
Unknown.
A numeric vector of daily exchange rates between the Euro and UK sterling.
data(euroex)
data(euroex)
A vector containing 975 observations.
Unknown.
The exchange
data frame has 975 rows and 2 columns.
The columns contain daily exchange rates; UK sterling
against the US dollar (first column) and UK sterling
against the Canadian dollar (second column).
The rownames contain the corresponding dates in a character
string with the format "2000/05/26"
. This can be
converted into a POSIXct
or POSIXlt
object
using as.POSIXct
or as.POSIXlt
.
data(exchange)
data(exchange)
This data frame contains the following columns:
US against UK exchange rate.
Canada against UK exchange rate.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
The fremantle
data frame has 86 rows and 3 columns.
The second column gives 86 annual maximimum sea levels
recorded at Fremantle, Western Australia, within the period
1897 to 1989.
The first column gives the corresponding years.
The third column gives annual mean values of the Southern
Oscillation Index (SOI), which is a proxy for meteorological
volitility.
data(fremantle)
data(fremantle)
This data frame contains the following columns:
A numeric vector of years.
A numeric vector of annual sea level maxima.
A numeric vector of annual mean values of the Southern Oscillation Index.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
gamGPDfit()
fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot()
fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05, progress=TRUE, verbose=FALSE, ...) gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5, boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)
gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05, progress=TRUE, verbose=FALSE, ...) gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5, boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)
x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nexc |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
xiFrhs |
right-hand side of the formula for |
nuFrhs |
right-hand side of the formula for |
init |
bivariate vector containing initial values
for |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
epsxi |
epsilon for stop criterion for |
epsnu |
epsilon for stop criterion for |
boot.progress |
|
progress |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
gamGPDfit()
fits the parameters and
of the generalized Pareto distribution
depending on covariates in
a non- or semiparametric way. The distribution function is given by
for (which is what we assume) and
. Note that
is also denoted by
in this package. Estimation of
and
by
gamGPDfit()
is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of in terms of the parameter
is done via
The parameters and
(and thus
) are allowed to depend on covariates (including
time) in a non- or semiparametric way, for example:
where denotes the vector of covariates,
,
are parameter vectors and
,
are
regression splines. For more details, see the references and the source
code.
gamGPDboot()
first fits the GPD parameters via
gamGPDfit()
. It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B
times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit()
again.
gamGPDfit()
returns a list with the components
xi
:estimated parameters ;
beta
:estimated parameters ;
nu
:estimated parameters ;
se.xi
:standard error for ((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to
) multiplied by -1;
se.nu
:standard error for ((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to
) multiplied by -1;
xi.covar
:(unique) covariates for ;
nu.covar
:(unique) covariates for ;
covar
:available covariate combinations used for
fitting (
,
);
y
:vector of excesses (exceedances minus threshold);
res
:residuals;
MRD
:mean relative distances between for all
iterations, calculated between old parameters (from the last iteration) and new parameters (currently
estimated ones);
logL
:log-likelihood at the estimated parameters;
xiObj
:R object of type gamObject
for estimated
(returned by
mgcv::gam()
);
nuObj
:R object of type gamObject
for estimated
(returned by
mgcv::gam()
);
xiUpdates
:if include.updates
is
TRUE
, updates for for each
iteration. This is a list of R objects of type
gamObject
which contains xiObj
as last element;
nuUpdates
:if include.updates
is
TRUE
, updates for for each
iteration. This is a list of R objects of type
gamObject
which contains nuObj
as last element;
gamGPDboot()
returns a list of length B+1
where the
first component contains the results of
the initial fit via gamGPDfit()
and the other B
components contain the results for each replication of the
post-blackend bootstrap.
Marius Hofert, Valerie Chavez-Demoulin.
Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.
### Example 1: fitting capability ############################################## ## generate an example data set years <- 2003:2012 # years nyears <- length(years) n <- 250 # sample size for each (different) xi u <- 200 # threshold rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD set.seed(17) # setting seed xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A" ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) xi.true.B <- xi.true.A^2 # true xi for group "B" ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame time <- rep(rep(years, each=n), 2) # "2" stands for the two groups covar <- rep(c("A","B"), each=n*nyears) value <- c(lossA, lossB) x <- data.frame(covar=covar, time=time, value=value) ## fit eps <- 1e-3 # to decrease the run time for this example fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1, nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps) ## note: choosing s(..., bs="cr") will fit cubic splines ## grab the fitted values per group and year xi.fit <- fitted(fit$xiObj) xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year ## plot the fitted values of xi and the true ones we simulated from par(mfrow=c(1,2)) plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A), main="Group A", xlab="Year", ylab=expression(xi)) points(years, xi.fit.A, type="l", col="red") legend("topleft", inset=0.04, lty=1, col=c("black", "red"), legend=c("true", "fitted"), bty="n") plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B), main="Group B", xlab="Year", ylab=expression(xi)) points(years, xi.fit.B, type="l", col="blue") legend("topleft", inset=0.04, lty=1, col=c("black", "blue"), legend=c("true", "fitted"), bty="n") ## Not run: ### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ######## set.seed(17) # setting seed xi.true.A <- rep(0.4, length=nyears) xi.true.B <- rep(0.8, length=nyears) ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame x <- data.frame(covar=covar, time=time, value=c(lossA, lossB)) ## fit with gpd.fit fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x) xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4] xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4] ## fit with gamGPDfit() fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar, epsxi=eps, epsnu=eps) xi.fit <- fitted(fit$xiObj) xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A" xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B" ## comparison xi.fit.A-xi.fit.coles.A xi.fit.B-xi.fit.coles.B ## End(Not run) # dontrun
### Example 1: fitting capability ############################################## ## generate an example data set years <- 2003:2012 # years nyears <- length(years) n <- 250 # sample size for each (different) xi u <- 200 # threshold rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD set.seed(17) # setting seed xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A" ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) xi.true.B <- xi.true.A^2 # true xi for group "B" ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame time <- rep(rep(years, each=n), 2) # "2" stands for the two groups covar <- rep(c("A","B"), each=n*nyears) value <- c(lossA, lossB) x <- data.frame(covar=covar, time=time, value=value) ## fit eps <- 1e-3 # to decrease the run time for this example fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1, nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps) ## note: choosing s(..., bs="cr") will fit cubic splines ## grab the fitted values per group and year xi.fit <- fitted(fit$xiObj) xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year ## plot the fitted values of xi and the true ones we simulated from par(mfrow=c(1,2)) plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A), main="Group A", xlab="Year", ylab=expression(xi)) points(years, xi.fit.A, type="l", col="red") legend("topleft", inset=0.04, lty=1, col=c("black", "red"), legend=c("true", "fitted"), bty="n") plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B), main="Group B", xlab="Year", ylab=expression(xi)) points(years, xi.fit.B, type="l", col="blue") legend("topleft", inset=0.04, lty=1, col=c("black", "blue"), legend=c("true", "fitted"), bty="n") ## Not run: ### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ######## set.seed(17) # setting seed xi.true.A <- rep(0.4, length=nyears) xi.true.B <- rep(0.8, length=nyears) ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame x <- data.frame(covar=covar, time=time, value=c(lossA, lossB)) ## fit with gpd.fit fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x) xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4] xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4] ## fit with gamGPDfit() fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar, epsxi=eps, epsnu=eps) xi.fit <- fitted(fit$xiObj) xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A" xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B" ## comparison xi.fit.A-xi.fit.coles.A xi.fit.B-xi.fit.coles.B ## End(Not run) # dontrun
Produces diagnostic plots for GEV models using the output
of the function gev.fit
.
gev.diag(z)
gev.diag(z)
z |
An object returned by |
For stationary models four plots are produced; a probability plot, a quantile plot, a return level plot and a histogram of data with fitted density.
For non-stationary models two plots are produced; a residual probability plot and a residual quantile plot.
data(portpirie) ppfit <- gev.fit(portpirie[,2]) gev.diag(ppfit)
data(portpirie) ppfit <- gev.fit(portpirie[,2]) gev.diag(ppfit)
Maximum-likelihood fitting for the generalized extreme value distribution, including generalized linear modelling of each parameter.
gev.fit(xdat, ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
gev.fit(xdat, ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
xdat |
A numeric vector of data to be fitted. |
ydat |
A matrix of covariates for generalized linear modelling
of the parameters (or |
mul , sigl , shl
|
Numeric vectors of integers, giving the columns
of |
mulink , siglink , shlink
|
Inverse link functions for generalized linear modelling of the location, scale and shape parameters repectively. |
muinit , siginit , shinit
|
numeric of length equal to total number of parameters used to model the location, scale or shape parameter(s), resp. See Details section for default (NULL) initial values. |
show |
Logical; if |
method |
The optimization method (see |
maxit |
The maximum number of iterations. |
... |
Other control parameters for the optimization. These
are passed to components of the |
The form of the GEV used is that of Coles (2001) Eq (3.2). Specifically, positive values of the shape parameter imply a heavy tail, and negative values imply a bounded upper tail.
For non-stationary fitting it is recommended that the covariates
within the generalized linear models are (at least approximately)
centered and scaled (i.e.\ the columns of ydat
should be
approximately centered and scaled).
Let m=mean(xdat) and s=sqrt(6*var(xdat))/pi. Then, initial values assigend when 'muinit' is NULL are m - 0.57722 * s (stationary case). When 'siginit' is NULL, the initial value is taken to be s, and when 'shinit' is NULL, the initial value is taken to be 0.1. When covariates are introduced (non-stationary case), these same initial values are used by default for the constant term, and zeros for all other terms. For example, if a GEV( mu(t)=mu0+mu1*t, sigma, xi) is being fitted, then the initial value for mu0 is m - 0.57722 * s, and 0 for mu1.
A list containing the following components. A subset of these
components are printed after the fit. If show
is
TRUE
, then assuming that successful convergence is
indicated, the components nllh
, mle
and se
are always printed.
nllh |
single numeric giving the negative log-likelihood value. |
mle |
numeric vector giving the MLE's for the location, scale and shape parameters, resp. |
se |
numeric vector giving the standard errors for the MLE's for the location, scale and shape parameters, resp. |
trans |
An logical indicator for a non-stationary fit. |
model |
A list with components |
link |
A character vector giving inverse link functions. |
conv |
The convergence code, taken from the list returned by
|
nllh |
The negative logarithm of the likelihood evaluated at the maximum likelihood estimates. |
data |
The data that has been fitted. For non-stationary models, the data is standardized. |
mle |
A vector containing the maximum likelihood estimates. |
cov |
The covariance matrix. |
se |
A vector containing the standard errors. |
vals |
A matrix with three columns containing the maximum likelihood estimates of the location, scale and shape parameters at each data point. |
Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London, U.K., 208pp.
data(portpirie) gev.fit(portpirie[,2])
data(portpirie) gev.fit(portpirie[,2])
Produce profile log-likelihoods for shape parameters and
m year/block return levels for stationary GEV models using
the output of the function gev.fit
.
gev.prof(z, m, xlow, xup, conf = 0.95, nint = 100) gev.profxi(z, xlow, xup, conf = 0.95, nint = 100)
gev.prof(z, m, xlow, xup, conf = 0.95, nint = 100) gev.profxi(z, xlow, xup, conf = 0.95, nint = 100)
z |
An object returned by |
m |
The return level (i.e.\ the profile likelihood is for
the value that is exceeded with probability 1/ |
xlow , xup
|
The least and greatest value at which to evaluate the profile likelihood. |
conf |
The confidence coefficient of the plotted profile confidence interval. |
nint |
The number of points at which the profile likelihood is evaluated. |
A plot of the profile likelihood is produced, with a horizontal
line representing a profile confidence interval with confidence
coefficient conf
.
data(portpirie) ppfit <- gev.fit(portpirie[,2]) ## Not run: gev.prof(ppfit, m = 10, 4.1, 5) ## Not run: gev.profxi(ppfit, -0.3, 0.3)
data(portpirie) ppfit <- gev.fit(portpirie[,2]) ## Not run: gev.prof(ppfit, m = 10, 4.1, 5) ## Not run: gev.profxi(ppfit, -0.3, 0.3)
A numeric vector containing breaking strengths of 63 glass fibres of length 1.5 centimetres, recorded under experimental conditions.
data(glass)
data(glass)
A vector containing 63 observations.
Smith, R. L. and Naylor, J. C. (1987) A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Applied Statistics 36, 358–396.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
Produces diagnostic plots for GPD models using the output
of the function gpd.fit
.
gpd.diag(z)
gpd.diag(z)
z |
An object returned by |
For stationary models four plots are produced; a probability plot, a quantile plot, a return level plot and a histogram of data with fitted density.
For non-stationary models two plots are produced; a residual probability plot and a residual quantile plot.
data(rain) rnfit <- gpd.fit(rain, 10) gpd.diag(rnfit)
data(rain) rnfit <- gpd.fit(rain, 10) gpd.diag(rnfit)
Maximum-likelihood fitting for the GPD model, including generalized linear modelling of each parameter.
gpd.fit(xdat, threshold, npy = 365, ydat = NULL, sigl = NULL, shl = NULL, siglink = identity, shlink = identity, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
gpd.fit(xdat, threshold, npy = 365, ydat = NULL, sigl = NULL, shl = NULL, siglink = identity, shlink = identity, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
xdat |
A numeric vector of data to be fitted. |
threshold |
The threshold; a single number or a numeric
vector of the same length as |
npy |
The number of observations per year/block. |
ydat |
A matrix of covariates for generalized linear modelling
of the parameters (or |
sigl , shl
|
Numeric vectors of integers, giving the columns
of |
siglink , shlink
|
Inverse link functions for generalized linear modelling of the scale and shape parameters repectively. |
siginit , shinit
|
numeric giving initial value(s) for parameter estimates. If NULL, default is sqrt(6 * var(xdat))/pi and 0.1 for the scale and shape parameters, resp. If using parameter covariates, then these values are used for the constant term, and zeros for all other terms. |
show |
Logical; if |
method |
The optimization method (see |
maxit |
The maximum number of iterations. |
... |
Other control parameters for the optimization. These
are passed to components of the |
For non-stationary fitting it is recommended that the covariates
within the generalized linear models are (at least approximately)
centered and scaled (i.e.\ the columns of ydat
should be
approximately centered and scaled).
The form of the GP model used follows Coles (2001) Eq (4.7). In particular, the shape parameter is defined so that positive values imply a heavy tail and negative values imply a bounded upper value.
A list containing the following components. A subset of these
components are printed after the fit. If show
is
TRUE
, then assuming that successful convergence is
indicated, the components nexc
, nllh
,
mle
, rate
and se
are always printed.
nexc |
single numeric giving the number of threshold exceedances. |
nllh |
nsingle umeric giving the negative log-likelihood value. |
mle |
numeric vector giving the MLE's for the scale and shape parameters, resp. |
rate |
single numeric giving the estimated probability of exceeding the threshold. |
se |
numeric vector giving the standard error estiamtes for the scale and shape parameter estimates, resp. |
trans |
An logical indicator for a non-stationary fit. |
model |
A list with components |
link |
A character vector giving inverse link functions. |
threshold |
The threshold, or vector of thresholds. |
nexc |
The number of data points above the threshold. |
data |
The data that lie above the threshold. For non-stationary models, the data is standardized. |
conv |
The convergence code, taken from the list returned by
|
nllh |
The negative logarithm of the likelihood evaluated at the maximum likelihood estimates. |
vals |
A matrix with three columns containing the maximum likelihood estimates of the scale and shape parameters, and the threshold, at each data point. |
mle |
A vector containing the maximum likelihood estimates. |
rate |
The proportion of data points that lie above the threshold. |
cov |
The covariance matrix. |
se |
A vector containing the standard errors. |
n |
The number of data points (i.e.\ the length of
|
npy |
The number of observations per year/block. |
xdata |
The data that has been fitted. |
Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London, U.K., 208pp.
gpd.diag
, optim
,
gpd.prof
, gpd.fitrange
,
mrl.plot
, pp.fit
data(rain) gpd.fit(rain, 10)
data(rain) gpd.fit(rain, 10)
Maximum-likelihood fitting for a stationary GPD model, over a range of thresholds. Graphs of parameter estimates which aid the selection of a threshold are produced.
gpd.fitrange(data, umin, umax, nint = 10, show = FALSE, ...)
gpd.fitrange(data, umin, umax, nint = 10, show = FALSE, ...)
data |
A numeric vector of data to be fitted. |
umin , umax
|
The minimum and maximum thresholds at which the model is fitted. |
nint |
The number of fitted models. |
show |
Logical; if |
... |
Optional arguments to |
Two graphs showing maximum likelihood estimates and confidence intervals of the shape and modified scale parameters over a range of thresholds are produced. A list object is returned invisibly with components: 'threshold' numeric vector of length 'nint' giving the thresholds used, 'mle' an 'nint X 3' matrix giving the maximum likelihood parameter estimates (columns are location, scale and shape respectively), 'se' an 'nint X 3' matrix giving the estimated standard errors for the parameter estimates (columns are location, scale and shape, resp.), 'ci.low', 'ci.up' 'nint X 3' matrices giving the lower and upper 95 intervals, resp. (columns same as for 'mle' and 'se').
gpd.fit
, mrl.plot
,
pp.fit
, pp.fitrange
## Not run: data(rain) ## Not run: gpd.fitrange(rain, 10, 40)
## Not run: data(rain) ## Not run: gpd.fitrange(rain, 10, 40)
Produce profile log-likelihoods for shape parameters and
m year/block return levels for stationary GPD models using
the output of the function gpd.fit
.
gpd.prof(z, m, xlow, xup, npy = 365, conf = 0.95, nint = 100) gpd.profxi(z, xlow, xup, conf = 0.95, nint = 100)
gpd.prof(z, m, xlow, xup, npy = 365, conf = 0.95, nint = 100) gpd.profxi(z, xlow, xup, conf = 0.95, nint = 100)
z |
An object returned by |
m |
The return level (i.e.\ the profile likelihood is for
the value that is exceeded with probability 1/ |
xlow , xup
|
The least and greatest value at which to evaluate the profile likelihood. |
npy |
The number of observations per year. |
conf |
The confidence coefficient of the plotted profile confidence interval. |
nint |
The number of points at which the profile likelihood is evaluated. |
A plot of the profile likelihood is produced, with a horizontal
line representing a profile confidence interval with confidence
coefficient conf
.
data(rain) rnfit <- gpd.fit(rain, 10) ## Not run: gpd.prof(rnfit, m = 10, 55, 75) ## Not run: gpd.profxi(rnfit, -0.02, 0.15)
data(rain) rnfit <- gpd.fit(rain, 10) ## Not run: gpd.prof(rnfit, m = 10, 55, 75) ## Not run: gpd.profxi(rnfit, -0.02, 0.15)
Produces diagnostic plots for Gumbel models using the output
of the function gum.fit
.
gum.diag(z)
gum.diag(z)
z |
An object returned by |
For stationary models four plots are produced; a probability plot, a quantile plot, a return level plot and a histogram of data with fitted density.
For non-stationary models two plots are produced; a residual probability plot and a residual quantile plot.
data(portpirie) ppfit <- gum.fit(portpirie[,2]) gum.diag(ppfit)
data(portpirie) ppfit <- gum.fit(portpirie[,2]) gum.diag(ppfit)
Maximum-likelihood fitting for the gumbel distribution, including generalized linear modelling of each parameter.
gum.fit(xdat, ydat = NULL, mul = NULL, sigl = NULL, mulink = identity, siglink = identity, muinit = NULL, siginit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
gum.fit(xdat, ydat = NULL, mul = NULL, sigl = NULL, mulink = identity, siglink = identity, muinit = NULL, siginit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
xdat |
A numeric vector of data to be fitted. |
ydat |
A matrix of covariates for generalized linear modelling
of the parameters (or |
mul , sigl
|
Numeric vectors of integers, giving the columns
of |
mulink , siglink
|
Inverse link functions for generalized linear modelling of the location and scale parameters repectively. |
muinit , siginit
|
numeric giving initial parameter estimates. See Details section for information about default values (NULL). |
show |
Logical; if |
method |
The optimization method (see |
maxit |
The maximum number of iterations. |
... |
Other control parameters for the optimization. These
are passed to components of the |
For non-stationary fitting it is recommended that the covariates
within the generalized linear models are (at least approximately)
centered and scaled (i.e.\ the columns of ydat
should be
approximately centered and scaled).
Let m=mean(xdat) and s=sqrt(6*var(xdat))/pi. Then, initial values assigend when 'muinit' is NULL are m - 0.57722 * s (stationary case). When 'siginit' is NULL, the initial value is taken to be s, and when 'shinit' is NULL. When covariates are introduced (non-stationary case), these same initial values are used by default for the constant term, and zeros for all other terms. For example, if a Gumbel( mu(t)=mu0+mu1*t, sigma) is being fitted, then the initial value for mu0 is m - 0.57722 * s, and 0 for mu1.
A list containing the following components. A subset of these
components are printed after the fit. If show
is
TRUE
, then assuming that successful convergence is
indicated, the components nllh
, mle
and se
are always printed.
trans |
An logical indicator for a non-stationary fit. |
model |
A list with components |
link |
A character vector giving inverse link functions. |
conv |
The convergence code, taken from the list returned by
|
nllh |
The negative logarithm of the likelihood evaluated at the maximum likelihood estimates. |
data |
The data that has been fitted. For non-stationary models, the data is standardized. |
mle |
A vector containing the maximum likelihood estimates. |
cov |
The covariance matrix. |
se |
A vector containing the standard errors. |
vals |
A matrix with two columns containing the maximum likelihood estimates of the location and scale parameters at each data point. |
data(portpirie) gum.fit(portpirie[,2])
data(portpirie) gum.fit(portpirie[,2])
ismev includes functions to support the computations carried out in Coles (2001). The functions may be divided into the following groups; maxima/minima, order statistics, peaks over thresholds and point processes. ismev is an R port of the S-Plus extreme value statistical routines believed to be originally written by Janet E. Heffernan.
Primary functions include:
gev.fit
, gev.diag
, gpd.fit
, gpd.diag
, pp.fit
and pp.diag
.
Original R port was carried out by Alec G. Stephenson, and the package is currently being maintained by Eric Gilleland.
Datasets from Coles (2001) included are:
dowjones euroex fremantle portpirie venice wind engine exchange glass rain wavesurge wooster
Coles, Stuart (2001) An Introduction to Statistical Modeling of Extreme Values, London, UK: Springer, ISBN: 1852334592, 208 pp.
An empirical mean residual life plot, including confidence intervals, is produced. The mean residual life plot aids the selection of a threshold for the GPD or point process models.
mrl.plot(data, umin = min(data), umax = max(data) - 0.1, conf = 0.95, nint = 100)
mrl.plot(data, umin = min(data), umax = max(data) - 0.1, conf = 0.95, nint = 100)
data |
A numeric vector of data to be fitted. |
umin , umax
|
The minimum and maximum thresholds at which the mean residual life function is calculated. |
conf |
The confidence coefficient for the confidence intervals depicted in the plot. |
nint |
The number of points at which the mean residual life function is calculated. |
data(rain) mrl.plot(rain)
data(rain) mrl.plot(rain)
The portpirie
data frame has 65 rows and 2 columns.
The second column gives annual maximimum sea levels recorded
at Port Pirie, South Australia, from 1923 to 1987.
The first column gives the corresponding years.
data(portpirie)
data(portpirie)
This data frame contains the following columns:
A numeric vector of years.
A numeric vector of annual sea level maxima.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
Produces diagnostic plots for point process models using the
output of the function pp.fit
.
pp.diag(z)
pp.diag(z)
z |
An object returned by |
For stationary models two plots are produced; a probability plot and a quantile plot.
For non-stationary models two plots are produced; a residual probability plot and a residual quantile plot.
data(rain) rnfit <- pp.fit(rain, 10) pp.diag(rnfit)
data(rain) rnfit <- pp.fit(rain, 10) pp.diag(rnfit)
Maximum-likelihood fitting for the point process model, including generalized linear modelling of each parameter.
pp.fit(xdat, threshold, npy = 365, ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
pp.fit(xdat, threshold, npy = 365, ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
xdat |
A numeric vector of data to be fitted. |
threshold |
The threshold; a single number or a numeric
vector of the same length as |
npy |
The number of observations per year/block. |
ydat |
A matrix of covariates for generalized linear modelling
of the parameters (or |
mul , sigl , shl
|
Numeric vectors of integers, giving the columns
of |
mulink , siglink , shlink
|
Inverse link functions for generalized linear modelling of the location, scale and shape parameters repectively. |
muinit , siginit , shinit
|
numeric giving initial parameter estimates. See Details section for information on default (NULL) initial values. |
show |
Logical; if |
method |
The optimization method (see |
maxit |
The maximum number of iterations. |
... |
Other control parameters for the optimization. These
are passed to components of the |
For non-stationary fitting it is recommended that the covariates
within the generalized linear models are (at least approximately)
centered and scaled (i.e.\ the columns of ydat
should be
approximately centered and scaled). Otherwise, the numerics may
become unstable.
As of version 1.32, a more accurate estimate of the exceedance rate, in the face of covariates, is used (at the expense of computational efficiency). In particular, when including covariates, parameter estimates may differ from those in Coles (2001).
Let m=mean(xdat) and s=sqrt(6*var(xdat))/pi. Then, initial values assigend when 'muinit' is NULL are m - 0.57722 * s (stationary case). When 'siginit' is NULL, the initial value is taken to be s, and when 'shinit' is NULL, the initial value is taken to be 0.1. When covariates are introduced (non-stationary case), these same initial values are used by default for the constant term, and zeros for all other terms. For example, if a GEV( mu(t)=mu0+mu1*t, sigma, xi) is being fitted, then the initial value for mu0 is m - 0.57722 * s, and 0 for mu1.
A list containing the following components. A subset of these
components are printed after the fit. If show
is
TRUE
, then assuming that successful convergence is
indicated, the components nexc
, nllh
, mle
and se
are always printed.
trans |
An logical indicator for a non-stationary fit. |
model |
A list with components |
link |
A character vector giving inverse link functions. |
threshold |
The threshold, or vector of thresholds. |
npy |
The number of observations per year/block. |
nexc |
The number of data points above the threshold. |
data |
The data that lie above the threshold. For non-stationary models, the data is standardized. |
conv |
The convergence code, taken from the list returned by
|
nllh |
The negative logarithm of the likelihood evaluated at the maximum likelihood estimates. |
vals |
A matrix with four columns containing the maximum likelihood estimates of the location, scale and shape parameters, and the threshold, at each data point. |
gpd |
A matrix with three rows containing the maximum likelihood estimates of corresponding GPD location, scale and shape parameters at each data point. |
mle |
A vector containing the maximum likelihood estimates. |
cov |
The covariance matrix. |
se |
A vector containing the standard errors. |
Different optimization methods may result in wildly different parameter estimates.
This code is adapted by Eric Gilleland from code originally written for S-Plus by Stuart Coles, and ported to R by Alec Stephenson. See details section above.
Beirlant J, Goegebeur Y, Segers J and Teugels J. (2004). Statistics of Extremes, Wiley, Chichester, England.
Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.
pp.diag
, optim
,
pp.fitrange
, mrl.plot
,
gpd.fit
data(rain) pp.fit(rain, 10)
data(rain) pp.fit(rain, 10)
Maximum-likelihood fitting for a stationary point process model, over a range of thresholds. Graphs of parameter estimates which aid the selection of a threshold are produced.
pp.fitrange(data, umin, umax, npy = 365, nint = 10, show = FALSE, ...)
pp.fitrange(data, umin, umax, npy = 365, nint = 10, show = FALSE, ...)
data |
A numeric vector of data to be fitted. |
umin , umax
|
The minimum and maximum thresholds at which the model is fitted. |
npy |
The number of observations per year/block. |
nint |
The number of fitted models. |
show |
Logical; if |
... |
Optional arguments to |
Three graphs showing maximum likelihood estimates and confidence intervals of the location, scale and shape parameters over a range of thresholds are produced. A list object is returned invisibly with components: 'threshold' numeric vector of length 'nint' giving the thresholds used, 'mle' an 'nint X 3' matrix giving the maximum likelihood parameter estimates (columns are location, scale and shape respectively), 'se' an 'nint X 3' matrix giving the estimated standard errors for the parameter estimates (columns are location, scale and shape, resp.), 'ci.low', 'ci.up' 'nint X 3' matrices giving the lower and upper 95 intervals, resp. (columns same as for 'mle' and 'se').
pp.fit
, mrl.plot
,
gpd.fit
, gpd.fitrange
## Not run: data(rain) ## Not run: pp.fitrange(rain, 10, 40)
## Not run: data(rain) ## Not run: pp.fitrange(rain, 10, 40)
A numeric vector containing daily rainfall accumulations at a location in south-west England over the period 1914 to 1962.
data(rain)
data(rain)
A vector containing 17531 observations.
Coles, S. G. and Tawn, J. A. (1996) Modelling extremes of the areal rainfall process. Journal of the Royal Statistical Society, B 53, 329–347.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
Produces diagnostic plots for order statistics models using
the output of the function rlarg.fit
.
rlarg.diag(z, n = z$r)
rlarg.diag(z, n = z$r)
z |
An object returned by |
n |
Probability and quantile plots are produced for the
largest |
For stationary models four plots are initially produced;
a probability plot, a quantile plot, a return level plot
and a histogram of data with fitted density.
Then probability and quantile plots are produced for the
largest n
order statistics.
For non-stationary models residual probability plots and
residual quantile plots are produced for the largest
n
order statistics.
## Not run: data(venice) ## Not run: venfit <- rlarg.fit(venice[,-1]) ## Not run: rlarg.diag(venfit)
## Not run: data(venice) ## Not run: venfit <- rlarg.fit(venice[,-1]) ## Not run: rlarg.diag(venfit)
Maximum-likelihood fitting for the order statistic model, including generalized linear modelling of each parameter.
rlarg.fit(xdat, r = dim(xdat)[2], ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
rlarg.fit(xdat, r = dim(xdat)[2], ydat = NULL, mul = NULL, sigl = NULL, shl = NULL, mulink = identity, siglink = identity, shlink = identity, muinit = NULL, siginit = NULL, shinit = NULL, show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)
xdat |
A numeric matrix of data to be fitted. Each row
should be a vector of decreasing order, containing the
largest order statistics for each year (or time period).
The first column therefore contains annual (or period)
maxima.
Only the first |
r |
The largest |
ydat |
A matrix of covariates for generalized linear modelling
of the parameters (or |
mul , sigl , shl
|
Numeric vectors of integers, giving the columns
of |
mulink , siglink , shlink
|
Inverse link functions for generalized linear modelling of the location, scale and shape parameters repectively. |
muinit , siginit , shinit
|
numeric of length equal to total number of parameters used to model the location, scale or shape parameter(s), resp. See Details section for default (NULL) initial values. |
show |
Logical; if |
method |
The optimization method (see |
maxit |
The maximum number of iterations. |
... |
Other control parameters for the optimization. These
are passed to components of the |
For non-stationary fitting it is recommended that the covariates
within the generalized linear models are (at least approximately)
centered and scaled (i.e.\ the columns of ydat
should be
approximately centered and scaled).
Let m=mean(xdat) and s=sqrt(6*var(xdat))/pi. Then, initial values assigend when 'muinit' is NULL are m - 0.57722 * s (stationary case). When 'siginit' is NULL, the initial value is taken to be s, and when 'shinit' is NULL, the initial value is taken to be 0.1. When covariates are introduced (non-stationary case), these same initial values are used by default for the constant term, and zeros for all other terms. For example, if a GEV( mu(t)=mu0+mu1*t, sigma, xi) is being fitted, then the initial value for mu0 is m - 0.57722 * s, and 0 for mu1.
A list containing the following components. A subset of these
components are printed after the fit. If show
is
TRUE
, then assuming that successful convergence is
indicated, the components nllh
, mle
and se
are always printed.
trans |
An logical indicator for a non-stationary fit. |
model |
A list with components |
link |
A character vector giving inverse link functions. |
conv |
The convergence code, taken from the list returned by
|
nllh |
The negative logarithm of the likelihood evaluated at the maximum likelihood estimates. |
data |
The data that has been fitted. For non-stationary models, the data is standardized. |
mle |
A vector containing the maximum likelihood estimates. |
cov |
The covariance matrix. |
se |
A vector containing the standard errors. |
vals |
A matrix with three columns containing the maximum likelihood estimates of the location, scale and shape parameters at each data point. |
r |
The number of order statistics used. |
## Not run: data(venice) ## Not run: rlarg.fit(venice[,-1])
## Not run: data(venice) ## Not run: rlarg.fit(venice[,-1])
The venice
data frame has 51 rows and 11 columns.
The final ten columns contain the 10 largest sea levels
observed within the year given by the first column.
The ten largest sea levels are given for every year in
the period 1931 to 1981, excluding 1935 in which only
the six largest measurements are available.
data(venice)
data(venice)
This data frame contains the following columns:
A numeric vector of years.
Annual sea level maxima.
The second largest sea level.
The third largest sea level.
The forth largest sea level.
The fifth largest sea level.
The sixth largest sea level.
The seventh largest sea level.
The eigth largest sea level.
The ninth largest sea level.
The tenth largest sea level.
Smith, R. L. (1986) Extreme value theory based on the r largest annual events. Journal of Hydrology 86, 27–43.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
The wavesurge
data frame has 2894 rows and 2 columns.
The columns contain wave and surge heights (in metres) at a
single location off south-west England.
data(wavesurge)
data(wavesurge)
This data frame contains the following columns:
A numeric vector of wave heights.
A numeric vector of surge heights.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
The wind
data frame has 40 rows and 3 columns.
The second and third columns contain annual maximum wind
speeds at Albany, New York and Hartford, Connecticut
respectively, over the period 1944 to 1983.
The first column gives the corresponding years.
data(wind)
data(wind)
This data frame contains the following columns:
A numeric vector of years.
Annual maximum wind speeds at Hartford.
Annual maximum wind speeds at Albany.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
A numeric vector containing daily minimum temperatures, in degrees Fahrenheit, at Wooster, Ohio, over the period 1983 to 1988.
data(wooster)
data(wooster)
A vector containing 1826 observations.
Coles, S. G., Tawn, J. A. and Smith, R. L. (1994) A seasonal Markov model for extremely low temperatures. Environmetrics 5, 221–239.
Coles, S. G. (2001) An Introduction to Statistical Modelling of Extreme Values. London: Springer.
Smith, R. L., Tawn, J. A. and Coles, S. G. (1997) Markov chain models for threshold exceedences. Biometrica 84, 249–268.