Title: | Density Goodness-of-Fit Test |
---|---|
Description: | Provides functions for the implementation of a density goodness-of-fit test, based on piecewise approximation of the L2 distance. |
Authors: | Dimitrios Bagkavos [aut, cre] |
Maintainer: | Dimitrios Bagkavos <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.6.0 |
Built: | 2024-11-14 06:32:48 UTC |
Source: | CRAN |
Implements an asymptotically normal critical value for testing the goodness-of-fit of a parametrically estimated density with the test statistic S.n of Bagkavos, Patil and Wood (2021).
cutoff.asymptotic(dist, p1, p2, sig.lev)
dist | The null distribution. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
sig.lev | Significance level of the hypothesis test. |
Implements the asymptotic critical value defined in Remark 1 of Bagkavos, Patil and Wood (2021). The value is based on a quantile of the normal distribution at the significance level sig.lev; the full expression is given in the reference.
A scalar, the estimate of the asymptotic critical value at the given significance level.
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
cutoff.edgeworth, cutoff.bootstrap
Implements a bootstrap critical value for testing the goodness-of-fit of a parametrically estimated density with the test statistic S.n of Bagkavos, Patil and Wood (2021).
cutoff.bootstrap(xin, M, sim, dist, h.use, kfun, p1, p2, sig.lev)
xin | A vector of data points - the available sample. |
M | Number of bootstrap replications. |
sim | A character string indicating the type of simulation required: "ordinary" (the default), "parametric", "balanced", "permutation", or "antithetic". |
dist | The null distribution. |
h.use | The test statistic bandwidth, best implemented with |
kfun | The kernel to use in the density estimates used in the bandwidth expression. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
sig.lev | Significance level of the hypothesis test. |
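Implements the bootstrap-based finite sample critical value defined in Section 2.6 of Bagkavos, Patil and Wood (2021), calculated as follows:
1. Resample the observations $\mathcal{X} = \{X_1, \dots, X_n\}$ to obtain $M$ bootstrap samples, denoted by $\mathcal{X}_m^* = \{X_{1m}^*, \dots, X_{nm}^*\}$, where for each $m = 1, \dots, M$, $\mathcal{X}_m^*$ is sampled randomly, with replacement, from $\mathcal{X}$. Write $\hat\theta = \theta(\mathcal{X})$ for the estimator of $\theta$ based on the original sample $\mathcal{X}$ and, for each $m$, define the bootstrap estimator of $\theta$ by $\hat\theta_m^* = \theta(\mathcal{X}_m^*)$, where $\theta(\cdot)$ is the relevant functional for the parameter $\theta$.
2. For $m = 1, \dots, M$, use $\mathcal{X}_m^*$ and $\hat\theta_m^*$ from the previous step to calculate $S_{n,m}^*$, $m = 1, \dots, M$.
3. Calculate $\ell_\alpha^*$ as the $1 - \alpha$ empirical quantile of the values $S_{n,m}^*$, $m = 1, \dots, M$. Then $\ell_\alpha^*$ approximately satisfies $P^*(S_n^* \ge \ell_\alpha^*) = \alpha$, where $P^*$ indicates the bootstrap probability measure conditional on $\mathcal{X}$.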
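The three steps above can be sketched in base R. This is only an illustration of the resampling scheme: `toy_stat` is a hypothetical stand-in (a Kolmogorov-type distance to a unit-variance normal), not the package's S.n, and the use of the `1 - sig.lev` empirical quantile is an assumption about how the significance level enters.

```r
# Hypothetical stand-in statistic (NOT the package's S.n): scaled maximum gap
# between the empirical CDF of x and a N(theta, 1) CDF.
toy_stat <- function(x, theta) {
  xs <- sort(x)
  ecdf_vals <- seq_along(xs) / length(xs)
  sqrt(length(xs)) * max(abs(ecdf_vals - pnorm(xs, mean = theta, sd = 1)))
}

# Bootstrap critical value following steps 1-3 above.
boot_cutoff_sketch <- function(xin, M, sig.lev, estimator = mean, stat = toy_stat) {
  n <- length(xin)
  s_star <- numeric(M)
  for (m in seq_len(M)) {
    # Step 1: resample with replacement and re-estimate the parameter
    x_star <- sample(xin, size = n, replace = TRUE)
    theta_star <- estimator(x_star)
    # Step 2: statistic computed from the bootstrap sample and its estimate
    s_star[m] <- stat(x_star, theta_star)
  }
  # Step 3: (1 - sig.lev) empirical quantile of the bootstrap statistics
  quantile(s_star, probs = 1 - sig.lev, names = FALSE)
}

set.seed(15)
xin <- rnorm(80)
cutoff_05 <- boot_cutoff_sketch(xin, M = 500, sig.lev = 0.05)
```

A smaller `sig.lev` picks a higher quantile of the same bootstrap values, so the resulting cut-off cannot decrease.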
A scalar, the estimate of the bootstrap critical value at the given significance level.
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)
cutoff.asymptotic, cutoff.edgeworth
    library(nor1mix)
    library(boot)

    SampleSize <- 80
    M <- 1000
    dist <- "normixt"
    kfun <- Epanechnikov
    p1 <- MW.nm2
    p2 <- 1
    sig.lev <- 0.05
    sim <- "ordinary"

    ## Not run:
    # Run the following to compare the asymptotic and bootstrap cut-off points on 4 occasions:
    for (i in 15:18) {
      set.seed(i)
      xin <- rnorMix(SampleSize, p1)
      h.use <- hopt.be(xin)
      l.a.a <- cutoff.asymptotic(dist, p1, p2, sig.lev)
      l.a.b <- cutoff.bootstrap(xin, M, sim, dist, h.use, kfun, p1, p2, sig.lev)
      # print the result of each iteration:
      cat("Asympt. cut.off= ", l.a.a, "Boot. cut.off= ", l.a.b, "\n")
    }
    ## End(Not run)
Implements the critical value for the density goodness-of-fit test S.n of Bagkavos, Patil and Wood (2021), approximating the size function of the test statistic S.n via an Edgeworth expansion.
cutoff.edgeworth(xin, dist, kfun, p1, p2, sig.lev)
xin | A vector of data points - the available sample. |
dist | The null distribution. |
kfun | The kernel to use in the density estimates used in the bandwidth expression. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
sig.lev | Significance level of the hypothesis test. |
Implements the critical value for the density goodness-of-fit test S.n by approximating the size function of the test statistic via an Edgeworth expansion. The resulting expression involves a quantile of the normal distribution together with further terms from the expansion.
This critical value is the density-function equivalent of the critical value estimate obtained in the closely related regression setting of Gao and Gijbels (2008), and is suitable for finite sample implementations of the test.
A scalar, the estimate of the critical value at the given significance level.
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)
cutoff.asymptotic, cutoff.bootstrap
Implements a bandwidth that is optimal with respect to the Berry-Esseen bound, for use with the density goodness-of-fit test of Bagkavos, Patil and Wood (2021).
hopt.be(xin)
xin | A vector of data points - the available sample. |
Implements the Berry-Esseen bound optimal bandwidth defined in (18) of Bagkavos, Patil and Wood (2022). Its expression is built on the density-optimal bandwidth calculated by reference to a parametric distribution.
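A density bandwidth chosen by reference to a parametric distribution can be illustrated with the familiar normal reference rule. The sketch below assumes a normal reference and the constants used by base R's `bw.nrd`; whether `hopt.be` uses exactly this rule internally is not stated here, and `normal_reference_bw` is a hypothetical helper name.

```r
# Normal reference rule: h = 1.06 * min(sd, IQR / 1.34) * n^(-1/5).
# Assumption: this mirrors stats::bw.nrd, not necessarily the rule inside hopt.be.
normal_reference_bw <- function(x) {
  n <- length(x)
  spread <- min(sd(x), IQR(x) / 1.34)  # robust scale estimate
  1.06 * spread * n^(-1 / 5)
}

set.seed(1)
x <- rnorm(100)
h_ref <- normal_reference_bw(x)
h_base <- bw.nrd(x)  # base R implementation of the same rule
```

The bandwidth shrinks at the rate n^(-1/5) as the sample grows, which is the usual rate for density estimation with a second-order kernel.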
The estimate of the Berry-Esseen optimal bandwidth.
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
Implements the power-optimal bandwidth for the density goodness-of-fit test S.n, based on optimization of the test statistic's power function.
hopt.edgeworth(xin, dist, kfun, p1, p2, sig.lev)
xin | A vector of data points - the available sample. |
dist | The null distribution. |
kfun | The kernel to use in the density estimates used in the bandwidth expression. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
sig.lev | Significance level of the hypothesis test. |
Implements the power-optimal bandwidth for the test statistic S.n.
This bandwidth rule is the density-function equivalent of the bandwidth rule obtained in the closely related regression setting of Gao and Gijbels (2008), and is designed to optimize the test's power while keeping the size constant.
A scalar, the estimate of the power-optimal bandwidth.
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Gao and Gijbels, Bandwidth selection in nonparametric kernel testing, pp. 1584-1594, JASA (2008)
Implements the (classical) kernel density estimator, see (2.2a) in Silverman (1986).
kde(xin, xout, h, kfun)
xin | A vector of data points. Missing values not allowed. |
xout | A vector of grid points at which the estimate will be calculated. |
h | A scalar, the bandwidth to use in the estimate, e.g. bw.nrd(xin). |
kfun | Kernel function to use. Supported kernels: Epanechnikov, Biweight, Gaussian, Rectangular, Triangular. |
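The classical kernel density estimator is given by
$$\hat f(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$
where $n$ is the sample size, $K$ is the kernel and the bandwidth $h$ is determined by a bandwidth selector such as Silverman's default plug-in rule.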
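The estimator can be written in a few lines of base R. This is a minimal sketch, not the package's `kde`: `kde_sketch` and `epan` are hypothetical names, and the Epanechnikov kernel is assumed to be 0.75(1 - u^2) on [-1, 1].

```r
# Assumed Epanechnikov kernel on [-1, 1]; the package's own scaling may differ.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel density estimate at each point of xout:
# (1 / (n * h)) * sum over i of K((x - X_i) / h)
kde_sketch <- function(xin, xout, h, kfun = epan) {
  n <- length(xin)
  vapply(xout, function(x0) sum(kfun((x0 - xin) / h)) / (n * h), numeric(1))
}

set.seed(1)
xin <- rnorm(50)
grid <- seq(-6, 6, by = 0.01)
est <- kde_sketch(xin, grid, h = bw.nrd(xin))
area <- sum(est) * 0.01  # Riemann sum; should be close to 1
```

Because each kernel integrates to one and the estimate averages n rescaled kernels, the estimate itself integrates to one whatever the bandwidth.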
A vector with the density estimates at the designated points xout.
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Silverman (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.
    x <- seq(-5, 5, length = 100)  # design points where the estimate will be calculated
    plot(x, dnorm(x), type = "l", xlab = "x", ylab = "density")  # plot true density function
    SampleSize <- 100
    ti <- rnorm(SampleSize)  # draw a random sample from the actual distribution
    huse <- bw.nrd(ti)
    arg2 <- kde(ti, x, huse, Epanechnikov)  # calculate the estimate
    lines(x, arg2, lty = 2)  # draw the result on the graphics device
Implements various kernel functions, including boundary, integrated and discrete kernels, for use in the definition of the nonparametric estimates.
Biweight(x, ...)
Epanechnikov(x, ...)
Triangular(x, ...)
Gaussian(x, ...)
Rectangular(x, ...)
Epanechnikov2(x)
x | A vector of data points where the kernel will be evaluated. |
... | Further arguments. |
Implements the Biweight, Triangular, Gaussian, Rectangular and Epanechnikov (including the alternative version in Epanechnikov2) kernels.
The value of the kernel at x.
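For orientation, the textbook forms of these kernels can be written directly in base R. The definitions below are the standard ones on [-1, 1] (Gaussian on the whole real line) and are assumptions: the package's own functions may use a different scaling, as the separate Epanechnikov2 suggests. The `_k` names are hypothetical, chosen to avoid clashing with the package functions.

```r
# Textbook kernel definitions (assumed, not copied from the package).
biweight_k     <- function(x) ifelse(abs(x) <= 1, 15 / 16 * (1 - x^2)^2, 0)
epanechnikov_k <- function(x) ifelse(abs(x) <= 1, 3 / 4 * (1 - x^2), 0)
triangular_k   <- function(x) ifelse(abs(x) <= 1, 1 - abs(x), 0)
rectangular_k  <- function(x) ifelse(abs(x) <= 1, 1 / 2, 0)
gaussian_k     <- function(x) dnorm(x)

# Each kernel is a density, so it should integrate to one over its support.
compact <- list(biweight_k, epanechnikov_k, triangular_k, rectangular_k)
areas <- c(
  sapply(compact, function(k) integrate(k, -1, 1)$value),
  integrate(gaussian_k, -Inf, Inf)$value
)
```

All five functions are vectorised in `x`, so they can be passed to a kernel density estimator that evaluates them on a vector of scaled differences.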
Wand and Jones, (1996), Kernel Smoothing, Chapman and Hall, London
Implements the selection of the null distribution, for use within the implementation of the test statistic S.n.
NDistDens(x, dist, p1, p2)
x | A vector of data points - the available sample. |
dist | The null distribution. |
p1 | Argument 1 (vector or object) for the null distribution. |
p2 | Argument 2 (vector or object) for the null distribution. |
Implements the null distribution evaluation at designated points, given the parameters p1 and p2.
A vector containing the density values of the designated distribution
Dimitrios Bagkavos
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
Implements the density goodness of fit test statistic of Bagkavos, Patil and Wood (2021), based on aggregation of local discrepancies between the fitted parametric density and a nonparametric empirical density estimator.
S.n(xin, h, dist, p1, p2)
xin | A vector of data points - the available sample. |
h | The bandwidth to use, typically the output of |
dist | The null distribution. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
Implements the test statistic used for testing the hypothesis that the data follow the null density specified by dist, p1 and p2.
This density goodness-of-fit test is based on a discretized approximation of the L2 distance, in which the range of the data is split into a number of bins. The statistic uses the Epanechnikov kernel, implemented in this package by the Epanechnikov function. The null model is specified through the dist argument, with parameters passed through the p1 and p2 arguments. The test is implemented with either the hopt.edgeworth or the hopt.be bandwidth, which provide the bandwidth value needed to calculate the test statistic and the critical value used to determine acceptance or rejection of the null hypothesis. See the example below for an application to a real world dataset.
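The idea of a discretized L2 distance can be illustrated in base R. The sketch below is not the S.n statistic: it simply bins the data range, evaluates an Epanechnikov kernel estimate and a candidate null density at the bin midpoints, and sums the squared gaps times the bin width. `l2_binned`, the kernel scaling and the choice of 20 bins are all assumptions made for illustration.

```r
# Binned approximation of the integrated squared difference between a kernel
# density estimate and a null density (illustration only, not S.n).
l2_binned <- function(xin, h, n_bins, null_density) {
  edges <- seq(min(xin), max(xin), length.out = n_bins + 1)
  width <- edges[2] - edges[1]
  mids <- edges[-1] - width / 2  # bin midpoints
  epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
  f_hat <- vapply(mids, function(m) mean(epan((m - xin) / h)) / h, numeric(1))
  sum((f_hat - null_density(mids))^2) * width
}

set.seed(1)
x <- rnorm(200)
h <- bw.nrd(x)
d_true_null <- l2_binned(x, h, 20, dnorm)                           # correct null
d_wrong_null <- l2_binned(x, h, 20, function(z) dnorm(z, mean = 3)) # shifted null
```

A correctly specified null gives a small discrepancy, while a misspecified one gives a much larger value; a formal test compares a suitably normalised statistic with a critical value and rejects when the statistic exceeds it.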
A vector with the value of the test statistic as well as the Delta value used for its calculation
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
    library(fGarch)
    library(boot)

    ## Not run:
    data(EuStockMarkets)
    DAX <- as.ts(EuStockMarkets[, "DAX"])
    dax <- diff(log(DAX))  #[,"DAX"]

    # Fit a GARCH(1,1) model to dax returns:
    lll <- garchFit(~ garch(1, 1), data = as.ts(dax), trace = FALSE, cond.dist = "std")

    # define the model innovations, to be used as input to the test statistic
    xin <- lll@residuals / lll@sigma.t

    # exclude smallest value - only for uniform presentation of results
    # (this step can be excluded):
    xin <- xin[xin != min(xin)]

    # inputs for the test statistic:
    # kernel function to use in implementing the statistic
    # and functional estimates for optimal h:
    kfun <- "epanechnikov"
    a.sig <- 0.05  # define the significance level

    # null hypothesis is that the innovations are normally distributed:
    Nulldist <- "normal"
    p1 <- mean(xin)
    p2 <- sd(xin)

    # Power optimal bandwidth:
    h <- hopt.edgeworth(xin, Nulldist, kfun, p1, p2, a.sig)
    h.be <- hopt.be(xin)

    # Edgeworth cutoff point:
    cutoff <- cutoff.edgeworth(xin, Nulldist, kfun, p1, p2, a.sig)
    # Bootstrap cutoff point:
    cutoff.boot <- cutoff.bootstrap(xin, 100, "permutation", Nulldist, h.be, kfun, p1, p2, a.sig)
    # Asympt. Norm. cutoff point:
    cutoff.asympt <- cutoff.asymptotic(Nulldist, p1, p2, a.sig)

    TestStatistic <- S.n(xin, h, Nulldist, p1, p2)
    TestStatistic.be <- S.n(xin, h.be, Nulldist, p1, p2)

    cat("L2 test statistic value with power opt. band:", TestStatistic[1],
        "\nL2 test statistic value Berry-Esseen bandwidth:", TestStatistic.be[1],
        "\ncritical value asymptotic:", round(cutoff.asympt, 3),
        "critical value bootstrap:", round(cutoff.boot, 3),
        "critical value Edgeworth:", round(cutoff, 3), "\n")

    # L2 test statistic value Edgeworth: 7.257444
    # L2 test statistic value Berry-Esseen bandwidth: 10.97069
    # critical value Asymptotically Norm.: 1.801847
    # critical value Edgeworth: 2.140446
    # critical value bootstrap: 6.040048
    # L2 test statistic > critical value on all occasions, hence normality is rejected
    ## End(Not run)
Implements the bootstrapped version of the density goodness-of-fit test defined in (6) of Bagkavos, Patil and Wood (2021).
S.n.Boot(xin1, indices, h, dist, kfun, p1, p2)
xin1 | A vector of data points to perform bootstrap on. |
indices | Indices to use for the bootstrap process. |
h | The bandwidth to use, typically the output of |
dist | The null distribution. |
kfun | The kernel to use in the density estimates used in the bandwidth expression. |
p1 | Argument 1 (vector or object) for the null distribution. |
p2 | Argument 2 (vector or object) for the null distribution. |
Implements the bootstrap version of the test statistic S.n for use in the cutoff.bootstrap function. This function is typically not called directly by the user; it is meant to be called indirectly through the cutoff.bootstrap function.
A vector of values of the test statistic.
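The `(data, indices, ...)` argument order is the convention expected of a statistic passed to `boot()` from the boot package, which the examples in this manual load. The sketch below shows that convention with a hypothetical toy statistic, `toy_boot_stat`; it is an inference from the signature that `cutoff.bootstrap` drives `S.n.Boot` in a similar way, and the 0.95 quantile is an assumed choice of cut-off.

```r
library(boot)

# Hypothetical statistic following boot's (data, indices, ...) convention.
# NOT the package's S.n.Boot: it only measures the scaled distance of the
# resampled mean from a null mean.
toy_boot_stat <- function(data, indices, null_mean) {
  x_star <- data[indices]  # the resample selected by boot()
  sqrt(length(x_star)) * abs(mean(x_star) - null_mean)
}

set.seed(42)
xin <- rnorm(80)

# Extra arguments (here null_mean) are forwarded to the statistic via '...'.
res <- boot(data = xin, statistic = toy_boot_stat, R = 200,
            sim = "ordinary", null_mean = 0)

# res$t holds one bootstrap statistic per replication.
cutoff_sketch <- quantile(res$t, probs = 0.95, names = FALSE)
```

The `sim` values accepted by `boot()` ("ordinary", "parametric", "balanced", "permutation", "antithetic") are the same strings listed for the `sim` argument of `cutoff.bootstrap`.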
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
Implements the multivariate (d >=2) density goodness of fit test statistic of Bagkavos, Patil and Wood (2021), based on aggregation of local discrepancies between the fitted parametric density and a nonparametric empirical density estimator.
S.nd(xin, h, dist, p1, p2)
xin | A matrix (n x d) of data points - the available sample with n rows and d columns, each column corresponds to a different coordinate axis. |
h | The bandwidth vector to use, typically the output of |
dist | The null distribution. |
p1 | Parameter 1 (vector or object) for the null distribution. |
p2 | Parameter 2 (vector or object) for the null distribution. |
Implements the test statistic used for testing the hypothesis that the data follow the null density specified by dist, p1 and p2.
This density goodness-of-fit test is based on a discretized approximation of the L2 distance, in which the range of the data is split into a number of bins. The statistic uses the Epanechnikov kernel, implemented in this package by the Epanechnikov function. The null model is specified through the dist argument, with parameters passed through the p1 and p2 arguments. The test is implemented with either the hopt.edgeworth or the hopt.be bandwidth, which provide the bandwidth value needed to calculate the test statistic and the critical value used to determine acceptance or rejection of the null hypothesis.
A vector with the value of the test statistic as well as the Delta value used for its calculation
R implementation and documentation: Dimitrios Bagkavos <[email protected]>
Bagkavos, Patil and Wood: Nonparametric goodness-of-fit testing for a continuous multivariate parametric model, (2021), under review.
    library(mvtnorm)
    sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
    x <- rmvnorm(n = 100, mean = c(1, 2), sigma = sigma)
    h.be1 <- hopt.be(x[, 1])
    h.be2 <- hopt.be(x[, 2])
    h <- c(h.be1, h.be2)
    Nulldist <- "normal"
    S.nd(x, h, Nulldist, c(1, 2), sigma)