Package 'GoFKernel' reference manual

Title:	Testing Goodness-of-Fit with the Kernel Density Estimator
Description:	Tests of goodness-of-fit based on a kernel smoothing of the data. References: Pavía (2015) <doi:10.18637/jss.v066.c01>.
Authors:	Jose M. Pavía [aut, cre]
Maintainer:	Jose M. Pavía <jose.m.pavia@uv.es>
License:	GPL
Version:	2.1-3
Built:	2025-03-06 06:58:17 UTC
Source:	CRAN

Testing Goodness-of-fit with the Kernel Density Estimator

Description

Tests of goodness-of-fit based on kernel smoothing of the data.

Details

Package:	GoFKernel
Depends:	R (>=2.17.3), `stats`, `KernSmooth` (>=2.23-8)
Type:	Package
Version:	2.1-1
Date:	2018-05-26
License:	GPL

The most important functions in GoFKernel are dgeometric.test and fan.test.

Author(s)

Jose M. Pavia

Maintainer: Jose M. Pavia <Jose.M.Pavia@uv.es>

References

Fan, Y (1994) "Testing the goodness-of-fit of a parametric density function by kernel method", Econometric Theory, 10, 316-356.

Li, Q. and Racine, J.F. (2007) "Nonparametric Econometrics", Princeton University Press, New Jersey.

Pavia, JM (2015) "Testing Goodness-of-fit with the Kernel Density Estimator: GoFKernel", Journal of Statistical Software, Code Snippets, 66(1), 1–27.

Area between a Density Function and a Kernel Estimate

Description

The function area.between is an (internal) function of the GoFKernel package that calculates the area, in a given interval, between a theoretical density function and an empirical kernel estimate. area.between is called by dgeometric.test of the GoFKernel package.

Usage

area.between(f, kernel.density, lower = -Inf, upper = Inf)

Arguments

`f`	a density function.
`kernel.density`	an empirical kernel estimate, an object of the class `density`.
`lower`	lower limit of the support of f, default -Inf.
`upper`	upper limit of the support of f, default Inf.

Details

area.between is called by dgeometric.test and numerically calculates the area between the density function of the null hypothesis and the kernel density estimate of either the observed sample or a simulated sample from f.
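
The calculation described above can be illustrated with base R alone. The following is a minimal sketch, not the package's own code: the interpolation of the kernel estimate with approxfun, the integration grid and the trapezoidal rule are all assumptions made for illustration.

```r
# Sketch: area between a theoretical density and a kernel estimate.
# Assumptions (not from the package): linear interpolation of the
# estimate and a trapezoidal rule on a fixed grid.
set.seed(1)
x  <- rnorm(200)
kd <- density(x)                                   # kernel estimate of the sample

# Turn the estimate into a function that is zero outside its grid
kd_fun <- approxfun(kd$x, kd$y, yleft = 0, yright = 0)

grid <- seq(-6, 6, length.out = 4001)              # de facto support of dnorm
vals <- abs(dnorm(grid) - kd_fun(grid))            # pointwise absolute difference

# Trapezoidal approximation of the integral of the absolute difference
area <- sum((vals[-1] + vals[-length(vals)]) / 2 * diff(grid))
area
```

Since both curves integrate to approximately one, the resulting area lies between 0 and 2, with small values indicating close agreement.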

Value

A number corresponding to the numerical value of the area between a density function and a kernel estimate.

Author(s)

Jose M. Pavia

Examples

## Unbounded example
x <- rnorm(100)
dx <- density(x)
area.between(dnorm, dx)

## Bounded example
x <- rbeta(100, 1.3, 2)
dx <- density.reflected(x, lower=0, upper=1)
area.between(dunif, dx)

Kernel Density Estimation with Reflection

Description

The function density.reflected computes kernel density estimates for univariate observations using reflection in the borders.

Usage

## S3 method for class 'reflected'
density(x, lower = -Inf, upper = Inf, weights= NULL, ...)

Arguments

`x`	a numeric vector of data from which the estimate is to be computed.
`lower`	the lower limit of the interval to which x is theoretically constrained, default -Inf.
`upper`	the upper limit of the interval to which x is theoretically constrained, default Inf.
`weights`	numeric vector of non-negative observation weights, hence of same length as x. The default NULL is equivalent to weights = rep(1/length(x), length(x)).
`...`	further `density` arguments.

Details

density.reflected is called by dgeometric.test and computes the kernel density estimate of a univariate random sample x of a random variable defined in the interval (lower, upper), using the default options of density and reflection at the borders. Reflection prevents the kernel density estimate from being underestimated in the proximity of lower or upper. For unbounded variables, density.reflected generates the same output as density with its default options.
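
The idea of reflection at the borders can be sketched in base R. This is an illustrative assumption about how such a correction may be built, not the package's implementation: the helper reflect_density is hypothetical, and the choice of keeping the bandwidth of the original sample is an assumption.

```r
# Sketch of boundary reflection (hypothetical helper, not GoFKernel code).
# The sample is mirrored around both limits, the estimate is computed on the
# augmented sample with the bandwidth of the original data, and the result
# is rescaled so that it integrates to about one on (lower, upper).
reflect_density <- function(x, lower, upper) {
  bw  <- bw.nrd0(x)                                # default bandwidth rule of density()
  aug <- c(x, 2 * lower - x, 2 * upper - x)        # original + two mirrored copies
  kd  <- density(aug, bw = bw, from = lower, to = upper)
  kd$y <- 3 * kd$y                                 # undo the 1/3 weight per copy
  kd
}

set.seed(234)
x <- runif(2000)
refl  <- reflect_density(x, 0, 1)
plain <- density(x, bw = bw.nrd0(x), from = 0, to = 1)

# At the lower border the reflected estimate is roughly twice the plain one
c(reflected = refl$y[1], plain = plain$y[1])
```

Without the correction, about half of the kernel mass of observations near a border falls outside the support, which is why the plain estimate drops towards half the true density there.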

Value

An object of the class density with borders correction, whose underlying structure is a list containing the following components.

`x`	the `n` coordinates of the points where the density is estimated.
`y`	the estimated density values. These will be non-negative.
`bw`	the bandwidth used.
`n`	the sample size after elimination of missing values.
`call`	the call which produced the result.
`data.name`	the deparsed name of the `x` argument.
`has.na`	logical, for compatibility (always `FALSE`).

The print method reports summary values on the x and y components.

Note

The function is based on density.

Author(s)

Jose M. Pavia

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) "The New S Language." Wadsworth & Brooks/Cole (for S version).

Scott, D. W. (1992) "Multivariate Density Estimation. Theory, Practice and Visualization." New York: Wiley.

Sheather, S. J. and Jones M. C. (1991) "A reliable data-based bandwidth selection method for kernel density estimation." J. Roy. Statist. Soc. B, 683–690.

Silverman, B. W. (1986) "Density Estimation." London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002) "Modern Applied Statistics with S." New York: Springer.

Examples

set.seed(234)
x <- runif(2000)
dx <- density.reflected(x,0,1)

## Plot of the density estimate with and without reflection
par(mfcol=c(1,2))
plot(dx, xlim=c(-0.1,1.1), ylim=c(0,1.1))
abline(h=1, col="red")

plot(density(x), xlim=c(-0.1,1.1), ylim=c(0,1.1))
abline(h=1, col="blue")

Geometric Goodness-of-fit Test

Description

Implementation of the goodness-of-fit test based on assessing the size of the area between the null hypothesis density function and a kernel density estimate of a sample.

Usage

dgeometric.test(x, fun.den, par = NULL, lower = -Inf, upper = Inf, n.sim = 101,
                bw=NULL)

Arguments

`x`	a numeric vector of data values for which the null hypothesis is tested.
`fun.den`	a density function, such as `dnorm`. Only continuous densities are valid.
`par`	list of (additional) parameters of the density function under the null hypothesis, default NULL.
`lower`	lower end point of the support of the random variable defined by `fun.den`, default -Inf.
`upper`	upper end point of the support of the random variable defined by `fun.den`, default Inf.
`n.sim`	number of iterations performed to calculate the `p.value` of the test, default `101`.
`bw`	a number indicating the bandwidth to be used in the empirical kernel estimate of the data, default NULL. In its default option, the bandwidth varies in each simulated dataset and is the one estimated by default by `density` with a Gaussian kernel.

Details

dgeometric.test uses numerical integration and Monte Carlo simulation to implement the test based on assessing the extent of the area between a null hypothesis density function and a kernel density estimate. First, the area between the density function under the null hypothesis and the kernel estimate of the sample, obtained using density.reflected, is computed by numerical integration. The p-value of the test is then obtained by simulation: (i) drawing n.sim samples from fun.den, each of the same size, length(x), as the actual sample x; (ii) estimating the kernel density function for each of these new samples; (iii) computing the area between the theoretical density and each of the estimates obtained in (ii); and (iv) calculating the p-value as the proportion of the n.sim areas computed in (iii) that exceed the area computed from the observed sample.
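
Steps (i) to (iv) can be sketched with base R for an unbounded null density. This is a simplified illustration under stated assumptions, not the package's code: the helper area_between is hypothetical, the plain density estimator is used instead of density.reflected (no borders are involved for a normal null), and the integration grid is fixed by hand.

```r
# Sketch of the Monte Carlo p-value (hypothetical helper, base R only).
area_between <- function(sample, dens, grid) {
  kd     <- density(sample)
  kd_fun <- approxfun(kd$x, kd$y, yleft = 0, yright = 0)
  vals   <- abs(dens(grid) - kd_fun(grid))
  sum((vals[-1] + vals[-length(vals)]) / 2 * diff(grid))   # trapezoidal rule
}

set.seed(42)
x     <- rnorm(60)                        # observed sample
grid  <- seq(-8, 8, length.out = 2001)    # assumed integration range
n_sim <- 99

obs <- area_between(x, dnorm, grid)       # area for the observed sample

# (i)-(iii): simulate under the null, re-estimate, recompute the area
sim <- replicate(n_sim, area_between(rnorm(length(x)), dnorm, grid))

# (iv): proportion of simulated areas exceeding the observed one
p_value <- mean(sim > obs)
p_value
```

Because the p-value is a proportion over n_sim simulations, it can only take values that are multiples of 1/n_sim, so a larger n.sim gives a finer resolution at the cost of computing time.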

Value

The output is an object of the class htest exactly like for the Kolmogorov-Smirnov test, ks.test. A list containing the following components:

`statistic`	the value of the test statistic.
`p.value`	the p-value of the test.
`method`	the character string "Geometric test".
`data.name`	a character string giving the name of the data.

Note

dgeometric.test calls density.reflected and area.between (and, in some circumstances, also inverse, random.function and support.facto), which are (internal) functions of the package GoFKernel.

Author(s)

Jose M. Pavia

References

Pavia, JM (2015) "Testing Goodness-of-fit with the Kernel Density Estimator: GoFKernel", Journal of Statistical Software, Code Snippets, 66(1), 1–27.

Examples


set.seed(12)
x <- rlnorm(50, meanlog=1, sdlog=1)
## test if x follows a Gamma distribution with shape .6 and rate .1
dgeometric.test(x, dgamma, par=list(shape=0.6, rate=0.1), lower=0, upper=Inf, n.sim=100)

f0 <- function(x) ifelse(x>=0 & x<=1, 2-2*x, 0)
## test if risk76.1929 follows the distribution characterized by f0
dgeometric.test(risk76.1929, f0, lower=0, upper=1, n.sim=21)

Univariate implementation of the test of Fan (1994) in the form proposed by Li and Racine (2007).

Description

Given a sample of a continuous univariate random variable and a density function fun.den with support in the interval (lower, upper), fan.test tests the null hypothesis that the sample has fun.den as its density function. It returns the test statistic and the corresponding p-value of the test, which is based on the integral of the squared difference between the null hypothesis density function and a kernel smoothing approximation. To run properly, the KernSmooth package needs to be installed, since by default the bandwidth is estimated with its dpik function.

Usage

fan.test(x, fun.den, par = NULL, lower = -Inf, upper = Inf, kernel = "normal",
         bw=NULL)

Arguments

`x`	a numeric vector of data values for which the null hypothesis is tested.
`fun.den`	a density function, such as `dnorm`. Only continuous densities are valid.
`par`	list of (additional) parameters of the density function under the null hypothesis, default NULL.
`lower`	lower end point of the support of the random variable defined by `fun.den`, default -Inf.
`upper`	upper end point of the support of the random variable defined by `fun.den`, default Inf.
`kernel`	a character string with the kernel to be used, either "normal" (a N(0,1) density), "box" (a uniform in -1 to 1) or "epanech" (an Epanechnikov quadratic kernel), default "normal".
`bw`	a number indicating the bandwidth to be used in the empirical kernel estimate of the data, default NULL. In its default option, the bandwidth is estimated using the `dpik` function included in the package `KernSmooth`.

Details

Fan's test is based on a normal approximation of the integral of the squared difference between the null hypothesis density function and a kernel smoothing approximation. In Li and Racine's form, the statistic is obtained as the aggregation of (i) a sampling component, (ii) the integral of the square of the kernel convolution of the null density function and (iii) the sum of the convolution of the density at the sampled values; see Li and Racine (2007, pp. 380-1) for details.
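
The raw quantity behind the test, the integral of the squared difference between the null density and a kernel estimate, can be approximated numerically as follows. This sketch only illustrates that integral: it is not the standardized statistic of Li and Racine (2007), it uses the default bandwidth of density rather than dpik, and the grid size is an arbitrary assumption.

```r
# Sketch: integrated squared difference between a kernel estimate and the
# null density (illustration only, not the statistic returned by fan.test).
set.seed(7)
x  <- runif(150)
kd <- density(x, kernel = "gaussian", from = 0, to = 1, n = 1024)

sq_diff <- (kd$y - dunif(kd$x))^2          # squared pointwise difference

# Trapezoidal approximation of the integral over (0, 1)
ise <- sum((sq_diff[-1] + sq_diff[-length(sq_diff)]) / 2 * diff(kd$x))
ise
```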

Value

The output is an object of the class htest exactly like for the Kolmogorov-Smirnov test, ks.test. A list containing the following components:

`statistic`	the value of the test statistic.
`p.value`	the p-value of the test.
`method`	the character string "Geometric test".
`data.name`	a character string giving the name of the data.

Warning

fan.test calls the dpik function of KernSmooth.

Note

To run properly, the function requires the package KernSmooth to be installed, since it is used to estimate the bandwidth.

Author(s)

Jose M. Pavia

References

Fan, Y (1994) "Testing the goodness-of-fit of a parametric density function by kernel method", Econometric Theory, 10, 316–356.

Li, Q. and Racine, J.F. (2007) "Nonparametric Econometrics", Princeton University Press, New Jersey.

Examples

fan.test(runif(100), dunif, lower=0, upper=1)

f0 <- function(x) ifelse(x>=0 & x<=1, 2-2*x, 0)
## testing if risk76.1929 follows the distribution characterized by f0
fan.test(risk76.1929, f0, lower=0, upper=1, kernel="epanech")

Inverse CDF Function

Description

Function to calculate the inverse function of a cumulative distribution function.

Usage

inverse(f, lower = -Inf, upper = Inf)

Arguments

`f`	a cdf function for which we want to obtain its inverse.
`lower`	the lower limit of `f` domain (support of the random variable), default -Inf.
`upper`	the upper limit of `f` domain (support of the random variable), default Inf.

Details

inverse is called by random.function and calculates the inverse of a given function f. inverse has been specifically designed to compute the inverse of the cumulative distribution function of an absolutely continuous random variable; it therefore assumes that there is exactly one root for each value in the interval (0,1) between f(lower) and f(upper). It is for internal use in dgeometric.test.
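
For a bounded support, the root-finding idea can be sketched with uniroot from base R. The helper inverse_cdf below is hypothetical and only covers the case of finite lower and upper limits; it is not the package's implementation.

```r
# Sketch: numerical inverse of a cdf on a bounded support
# (hypothetical helper; finite lower and upper limits are assumed).
inverse_cdf <- function(cdf, lower, upper) {
  function(p) {
    # Solve cdf(q) = p for q; a single root is assumed in (lower, upper)
    uniroot(function(q) cdf(q) - p,
            lower = lower, upper = upper, tol = 1e-9)$root
  }
}

f     <- function(x) pbeta(x, shape1 = 2, shape2 = 3)
f_inv <- inverse_cdf(f, lower = 0, upper = 1)

q <- f_inv(0.2)
c(numerical = q, exact = qbeta(0.2, 2, 3))   # the two values should agree closely
```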

Value

A function, the inverse function of a cumulative distribution function f.

Note

This function uses either optim with default options method="L-BFGS-B" or uniroot to derive the inverse function.

The upper endpoint must be strictly larger than the lower endpoint.

Author(s)

Jose M. Pavia

References

See the references in optim and uniroot.

Examples

f <- function(x) pbeta(x, shape1=2, shape2=3)
f.inv <- inverse(f,lower=0,upper=1)
f.inv(.2)

Random Draw Generator

Description

This function generates random draws of a continuous random variable given either its density or its cumulative distribution function.

Usage

random.function(n = 1, f, lower = -Inf, upper = Inf, kind = "density")

Arguments

`n`	number of draws, default 1.
`f`	either a density (default) or cumulative distribution function of the random variable.
`lower`	lower limit of the support of the random variable, default -Inf.
`upper`	upper limit of the support of the random variable, default Inf.
`kind`	character string identifying the type of function supplied in `f`, either "density" (default) or "cumulative".

Details

random.function uses the method of the inverse of the cdf to generate random draws from f.
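
The inverse-cdf method can be sketched in base R for the density f(x) = 2 - 2x on (0, 1). The helpers cdf0 and draw_one are hypothetical names introduced for illustration; the package's own steps may differ.

```r
# Sketch of inverse-transform sampling from a density (base R only).
f0 <- function(x) ifelse(x >= 0 & x <= 1, 2 - 2 * x, 0)

# cdf obtained by numerical integration of the density
cdf0 <- function(q) {
  if (q <= 0) return(0)
  integrate(f0, lower = 0, upper = q)$value
}

# One draw: solve cdf0(q) = u for a uniform value u
draw_one <- function(u) {
  uniroot(function(q) cdf0(q) - u, lower = 0, upper = 1, tol = 1e-9)$root
}

set.seed(3)
u     <- runif(5)                         # uniform draws on (0, 1)
draws <- vapply(u, draw_one, numeric(1))

# For this density the cdf is 2x - x^2, so the exact inverse is 1 - sqrt(1 - u)
cbind(numerical = draws, exact = 1 - sqrt(1 - u))
```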

Value

A vector of length n with n draws from a random variable with density (or cumulative distribution) function given by f.

Note

random.function is called by dgeometric.test when the corresponding r- function (random generator of f) is not available in the environment. random.function generates random samples from the null hypothesis density function specified in dgeometric.test.

Author(s)

Jose M. Pavia

Examples

f0 <- function(x) ifelse(x>=0 & x<=1, 2-2*x, 0)
random.function(10, f0, lower=0, upper=1, kind="density")

Immigrants Exposed to Risk of Death

Description

Vector containing the time exposed to risk of death at age 76 during 2006 for the Spanish immigrants registered in 2006 who were born in 1929.

Usage

data(risk76.1929)

Format

The format is: num [1:362] 0.94 0.885 0.863 0.852 0.797 ...

Note

Under the null hypotheses of uniform distribution of date of birth and date of migration, this time exposed to risk has density f(x) = 2 - 2x for 0 < x < 1.

Source

Author's own elaboration from data available at www.ine.es

Examples

plot(density.reflected(risk76.1929, 0, 1))

"De Facto" Support

Description

support.facto computes the de facto numerical limits of a density function with theoretical infinite support. This function is an (internal) function of the GoFKernel package.

Usage

support.facto(f, lower = -Inf, upper = Inf)

Arguments

`f`	a density function.
`lower`	theoretical lower limit of the support of the random variable, default -Inf.
`upper`	theoretical upper limit of the support of the random variable, default Inf.

Details

support.facto requires that the first two ordinary moments of f exist; otherwise, support.facto returns the limits supplied in lower and upper.
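
One plausible way in which the first two moments could yield finite working limits is shown below. This is purely an assumption for illustration: the rule of mean plus or minus k standard deviations, the helper de_facto_limits and the value k = 8 are hypothetical and are not taken from the package.

```r
# Sketch: finite working limits from the first two moments
# (hypothetical rule and helper, not the rule used by support.facto).
de_facto_limits <- function(dens, k = 8) {
  m1 <- integrate(function(t) t   * dens(t), -Inf, Inf)$value  # first moment
  m2 <- integrate(function(t) t^2 * dens(t), -Inf, Inf)$value  # second moment
  s  <- sqrt(m2 - m1^2)                                        # standard deviation
  c(lower = m1 - k * s, upper = m1 + k * s)
}

lims <- de_facto_limits(dnorm)
lims
```

If either integral diverges, as for a Cauchy density, no such limits can be computed, which is consistent with the requirement that the first two moments exist.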

Value

A two-component vector with the de facto lower and upper limits of f.

Author(s)

Jose M. Pavia

Examples

support.facto(dnorm)
