R Package Rgof

library(Rgof)
Bsim = c(100, 200) #Number of Simulation Runs

The package Rgof brings together a number of routines for the goodness-of-fit problem for univariate data. We have a data set x, and we want to test whether it was generated by the probability distribution F.

The highlights of this package are:

  • it runs a large number of different tests simultaneously.
  • it can run several tests and find an adjusted p value for the combination of all the tests.
  • it works for continuous and for discrete (aka histogram) data.
  • it allows for parameter estimation.
  • it uses Rcpp and parallel programming to be very fast.
  • it includes routines that make it very easy to estimate the power of the different tests for specific situations.
  • it allows for the possibility that the sample size was a draw from a Poisson random variable.
  • it allows the user to add more tests.
  • it includes a routine to draw the usual power graph.
set.seed(123)

Note all runs of the test routine are done with B=1000 and all runs of the power routines with arguments B=c(500, 500), maxProcessor = 2 in order to pass devtools::check().

The Methods

  • Continuous Data
  1. Kolmogorov Smirnov (KS) (Massey 1951), (Kolmogorov 1933), (Smirnov 1948)
  2. Kuiper (K) (Kuiper 1960)
  3. Anderson-Darling (AD) (Anderson and Darling 1952), (Anderson and Darling 1954)
  4. Cramer-vonMises (CvM) (Anderson 1962)
  5. Wilson (W)
  6. Zhang’s methods (ZA, ZK, ZC) (Zhang 2002)
  7. Wasserstein p=1 (Wassp1) (E del Barrio 1999)

For all of these tests the distribution of the test statistic under the null hypothesis is found via simulation.

  1. Eight variations of chi square tests, using the formulas by Pearson or based on the likelihood ratio test, bins of equal size or equal probability and both a large number and a small number of bins, 50 and 10 by default. The p values are found using the usual chi square approximation to the null distribution. If parameters are estimated this is done either via the method of minimum chi square (Berkson 1980) or via a user-provided estimator. In all cases bins are combined until all of them have an expected count of at least 5.

There is a very large literature on chi square tests, the oldest of the goodness of fit tests. For a survey see (Rolke and Gutierrez-Gongora 2020).

  • Discrete (Histogram) Data

All the methods above are also implemented for discrete data, except for Zhang’s tests, which have no discrete analog.

It is worth noting that these discrete versions are based on the theoretical ideas of the tests and not on the actual formula of calculation for the continuous case. The test statistics can therefore be different even when applied to the same data. For example, the Anderson-Darling test is based on the distance measure

$$A^2=n\int_{-\infty}^{\infty} \frac{(\hat{F}(x)-F(x))^2}{F(x)(1-F(x))}dF(x) $$ where F is the theoretical distribution function under the null hypothesis and is the empirical distribution function. In the case of continuous data it can be shown that

$$A^2=-n-\frac1n\sum_{i=1}^n (2i-1)\left(\log F(x_i) +\log[1-F(x_{n+1-i})\right)$$ However, for discrete data we have

$$A^2=n\sum_{i=1}^k \frac{(\hat{F}(x_i)-F(x_i))^2}{F(x_i)(1-F(x_i))}\left(F(x_i)-F(x_{i-1}\right)$$

with F(x0) = 0.

In the continuous case is a step function but F is continuous, and therefore A2 > 0. In the discrete case howeverA2 = 0 is possible. This shows that the two cases are fundamentally different.

As for continuous data null distributions are found using simulation. In fact in the case of discrete data none of the tests has a known distribution for the test statistic under the null hypothesis.

  1. Four variations of chi square tests, using the formulas by Pearson and log-likelihood as well as a large number and a small number of bins. Again the routine combines bins until all have expected counts greater than 5, and the chi square approximation is used to find p values. The combination of bins is done in such a way that the bins remain of equal size as much as possible.

These methods can be used for both discrete and histogram data. The main difference between these two is that discrete data has (a countable) number of possible values whereas histogram data has possible ranges of values (the bins). The only method directly affected by this difference is Wassp1, which requires actual values. All other methods ignore the vals argument.

Testing

Discrete (Histogram) Data/Model

Simple Null Hypothesis

We generate a data set of size 1000 from a Binomial distribution with n=20 and success probability 0.5, and then test H0 : F = Bin(20, 0.5).

vals=0:20 #possible values of random variable
pnull=function()  pbinom(0:20, 20, 0.5)  # cumulative distribution function (cdf)
rnull = function() table(c(0:20, rbinom(1000, 20, 0.5)))-1 
# generate data under the null hypothesis, make sure that vector of counts has same length as vals, possibly 0.
  • Null Hypothesis is true
x = rnull()
# Basic Test
gof_test(x, vals, pnull, rnull, B=1000)
#> $statistics
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#> 0.1701 0.1738 0.1329 0.0249 2.0453 0.0346 2.6011 2.2335 2.6752 2.2953 
#> 
#> $p.values
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#> 0.6710 0.7540 0.9970 0.9700 0.6690 0.9950 0.9978 0.9942 0.9974 0.9935
#Test with adjusted overall p value
gof_test_adjusted_pvalue(x, vals, pnull, rnull, B=c(1000, 500))
#> p values of individual tests:
#> W :  0.689
#> AD :  0.993
#> s-P :  0.9963
#> adjusted p value of combined tests: 0.9571
  • Null Hypothesis is false
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1
#true p is 0.55, not 0.5
# Basic Test
gof_test(x, vals, pnull, rnull, B=1000, doMethod = "all")$p.value
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#>      0      0      0      0      0      0      0      0      0      0
#Test with adjusted overall p value
gof_test_adjusted_pvalue(x, vals, pnull, rnull, B=c(1000, 500))
#> p values of individual tests:
#> W :  0
#> AD :  0
#> s-P :  0
#> adjusted p value of combined tests: 0

Arguments of gof_test for discrete data/model:

  • x: vector with counts (histogram heights). Should have a number for each value of vals, possibly 0.

  • vals: all possible values of discrete random variable, that is all x with P(X = x) > 0

  • pnull: function to find values of cumulative distribution function for each value of vals. Function has no arguments.

  • rnull: function to generate data from true density. Function has no arguments. Function needs to insure that output is a vector with same length as vals.

  • B=5000: number of simulation runs

  • w: function to find importance sampling weights, if needed

  • phat: function to estimate parameters

  • TS: function to find values of user-supplied test statistics

  • TSextra: a list that is passed to TS if any additional info is required.

  • nbins=c(50, 10): number of bins for chi square tests. The first one is already given by the data in the discrete case, the for the second bins are joined.

  • rate=0, if not 0 sample size is assumed to have come from a Poisson random variable with rate “rate”.

  • minexpcount=5, minimal expected counts for chi square tests.

  • ChiUsePhat=TRUE, if TRUE uses phat for parameter estimation. If false uses method of minimum chi square

  • maxProcessors=1 if greater than 1 number of cores for parallel processing.

  • doMethods=“all” names of methods to include

The arguments of gof_test_adjusted_pvalue for discrete data/model are the same, except that the number of simulation runs B is two numbers. The first is used for estimating the individual p values, the second for the adjustment.

Random Sample Size

In some fields like high energy physics it is common that the sample size is not fixed but a random variable drawn from a Poisson distribution with a known rate. Our package runs this as follows:

rnull = function() table(c(0:20, rbinom(rpois(1, 650), 20, 0.5)))-1 
x = rnull()
gof_test(x, vals, pnull, rnull, rate=650, B=1000)$p.value

Composite Null Hypothesis

We generate a data set of size 1000 from a binomial distribution with n=20 and success probability p, and then test F=Bin(20, .). p is estimated from data.

vals=0:20
pnull=function(p=0.5)  pbinom(0:20, 20, ifelse(p>0&&p<1, p, 0.5))  
rnull = function(p=0.5) table(c(0:20, rbinom(1000, 20, p)))-1
phat = function(x) sum(0:20*x)/sum(x)/20
  • Null Hypothesis is true
x = table(c(0:20, rbinom(1000, 20, 0.5)))-1  
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#> 0.8380 0.9060 0.8090 0.7790 0.5720 0.8190 0.6860 0.5217 0.6839 0.5153
  • Null Hypothesis is true
x = table(c(0:20, rbinom(1000, 20, 0.55)))-1 
# p is not 0.5, but data is still from a binomial distribution with n=20
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#> 0.9770 0.9560 0.1730 0.1150 0.9890 0.1350 0.3852 0.2295 0.3100 0.1673
  • Null Hypothesis is false
x = table(c(rep(0:20, 5), rbinom(1000-21*5, 20, 0.53))) 
# data has to many small and large values to be from a binomial
gof_test(x, vals, pnull, rnull, phat=phat, B=1000)$p.value
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#>  0.561  0.046  0.000  0.006  0.138  0.000  0.000  0.000  0.000  0.000

The estimation of the parameter(s) in the case of the chi square tests is done either by using the function phat or via the minimum chi square method. The routine uses a general function minimizer. If there are values of the parameter that are not possible this can lead to warnings. It is best to put a check into the pnull function to avoid this issue. As an example the function pnull above checks that the success probability p is in the interval (0, 1).

Histogram Data

A variant of discrete data sometimes encountered is data given in the form of a histogram, that is as a set of bins and their counts. The main distinction is that discrete data has specific values, for example the non-negative integers for a Poisson distribution, whereas histogram data has ranges of numbers, the bins. It turns out that, though, that the only method that requires actual values is Wassp1, and for that method one can use the midpoint of the intervals.

As an example consider the following case: we have histogram data and we want to test whether it comes from an exponential rate 1 distribution, truncated to the interval 0-2:

rnull = function() {
  y = rexp(2500, 1) # Exp(1) data
  y = y[y<2][1:1500] # 1500 events on 0-2
  bins = 0:40/20 # binning
  hist(y, bins, plot=FALSE)$counts # find bin counts
}
x = rnull()
bins = 0:40/20
vals = (bins[-1]+bins[-21])/2
pnull = function() {
   bins = 1:40/20
   pexp(bins, 1)/pexp(2, 1)
}
  
gof_test(x, vals, pnull, rnull)$p.value
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P    l-L    s-L 
#> 0.4910 0.4066 0.6282 0.5134 0.3926 0.4582 0.9121 0.8115 0.9140 0.8212

Continuous Data

Simple Hypothesis

pnull = function(x) pnorm(x)
rnull = function()  rnorm(1000)
TSextra = list(qnull=function(x) qnorm(x)) #optional quantile function used by chi square tests and Wassp1 test.
  • Null Hypothesis is true
x = rnorm(1000)
#Basic Tests
gof_test(x, NA, pnull, rnull, B=1000, TSextra=TSextra)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.1740 0.1740 0.1180 0.2190 0.0220 0.0860 0.1200 0.0830 0.1000 0.2065 0.1726 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.0900 0.3902 0.0657 0.1222 0.0780 0.3964
#Adjusted p value
gof_test_adjusted_pvalue(x, NA, pnull, rnull, B=c(1000,500), TSextra=TSextra)
#> p values of individual tests:
#> W :  0.024
#> ZC :  0.076
#> AD :  0.114
#> ES-s-P :  0.1726
#> adjusted p value of combined tests: 0.0739
  • Null Hypothesis is false
x = rnorm(1000, 0.5) 
gof_test(x, NA, pnull, rnull, B=1000, TSextra=TSextra)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#>      0      0      0      0      0      0      0      0      0      0      0 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#>      0      0      0      0      0      0

Composite Hypothesis - One Parameter

pnull = function(x, p=0) pnorm(x, p)
TSextra = list(qnull = function(x, p=0) qnorm(x, p))
rnull = function(p)  rnorm(1000, p)
phat = function(x) mean(x)
  • Null Hypothesis is true
x = rnorm(1000) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.5000 0.5000 0.5700 0.4070 0.3770 0.7700 0.9630 0.8130 0.5750 0.6294 0.3721 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.6892 0.5759 0.6864 0.3567 0.6990 0.5952
  • Null Hypothesis is true
x = rnorm(1000, 0.5) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.6290 0.6290 0.9116 0.8350 0.8398 0.7108 0.8500 0.5976 0.9202 0.2519 0.4048 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.3209 0.6517 0.1733 0.3840 0.3186 0.6514
  • Null Hypothesis is false
x = rnorm(1000, 0.5, 2) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#>      0      0      0      0      0      0      0      0      0      0      0 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#>      0      0      0      0      0      0

Composite Hypothesis - Multiple Parameters

pnull = function(x, p=c(0, 1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001))
TSextra = list(qnull = function(x, p=c(0, 1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.001)))
rnull = function(p=c(0, 1))  rnorm(1000, p[1], ifelse(p[2]>0, p[2], 0.001))
phat = function(x) c(mean(x), sd(x))
  • Null Hypothesis is true
x = rnorm(1000) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.0140 0.0140 0.0530 0.0420 0.0510 0.2500 0.1640 0.2040 0.0930 0.6443 0.0854 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.6430 0.1610 0.6388 0.0891 0.6430 0.1832
  • Null Hypothesis is true
x = rnorm(1000, 0.5) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.0990 0.0990 0.0650 0.0840 0.0620 0.4200 0.2240 0.3270 0.0640 0.0207 0.1731 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.0093 0.0104 0.0149 0.1613 0.0095 0.0124
  • Null Hypothesis is true
x = rnorm(1000, 0.5, 2) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#> 0.6170 0.6170 0.7500 0.7780 0.8580 0.1300 0.2060 0.0090 0.7090 0.9293 0.6697 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#> 0.5599 0.6478 0.9147 0.6639 0.5355 0.6650
  • Null Hypothesis is false
x = rt(1000, 2) 
gof_test(x, NA, pnull, rnull, phat=phat, TSextra=TSextra, B=1000)$p.value
#>     KS      K     AD    CvM      W     ZA     ZK     ZC Wassp1 ES-l-P ES-s-P 
#>      0      0      0      0      0      0      0      0      0      0      0 
#> EP-l-P EP-s-P ES-l-L ES-s-L EP-l-L EP-s-L 
#>      0      0      0      0      0      0

Power Estimation

For estimating the power of the various tests one also has to provide the routine ralt, which generates data under the alternative hypothesis:

Discrete Data/Model

Simple Null Hypothesis

vals = 0:10
pnull = function() pbinom(0:10, 10, 0.5)
rnull =function () table(c(0:10, rbinom(100, 10, 0.5)))-1
ralt =function (p=0.5) table(c(0:10, rbinom(100, 10, p)))-1
P=gof_power(pnull, vals, rnull, ralt, 
  param_alt=seq(0.5, 0.6, 0.02),  B=Bsim, nbins=c(11, 5))
plot_power(P, "p", Smooth=FALSE)

In all cases the arguments are the same as for gof_test. In addition we now have

  • ralt: a routine with one parameter that generates data under some alternative hypothesis.

  • param_alt: values to be passed to ralt. This allows the calculation of the power for many different values.

  • alpha=0.05 type I error probability for tests.

  • B=c(1000, 1000) the first number is the number of simulation runs for power estimation and the second the number of runs to be used to find the null distribution.

Composite Null Hypothesis

vals = 0:10
pnull = function(p=0.5) pbinom(0:10, 10, ifelse(0<p&p<1,p,0.001))
rnull = function (p=0.5) table(c(0:10, rbinom(100, 10, ifelse(0<p&p<1,p,0.001))))-1
phat = function(x) sum(0:10*x)/1000
  • Null Hypothesis is true
ralt =function (p=0.5) table(c(0:10, rbinom(100, 10, p)))-1
gof_power(pnull, vals, rnull, ralt, c(0.5, 0.6), phat=phat,
        B=Bsim, nbins=c(11, 5), maxProcessors = 2)
#>       KS    K   AD  CvM    W Wassp1   l-P   s-P
#> 0.5 0.04 0.03 0.04 0.03 0.05   0.03 0.095 0.045
#> 0.6 0.08 0.07 0.03 0.04 0.07   0.03 0.045 0.040

Note that power estimation in the case of a composite hypothesis (aka with parameters estimated) is much slower than the simple hypothesis case.

  • Null Hypothesis is false
ralt =function (p=0.5) table(c(rep(0:10, 2), rbinom(100, 10, p)))
gof_power(pnull, vals, rnull, ralt, 0.5, phat=phat,
        B=Bsim, nbins=c(11, 5), maxProcessors = 2)
#>     KS      K     AD    CvM      W Wassp1    l-P    s-P 
#>   0.07   0.33   1.00   1.00   0.67   1.00   1.00   1.00

Continuous Data/Model

Simple Null Hypothesis

pnull = function(x) pnorm(x)
TSextra = list(qnull = function(x) qnorm(x))
rnull = function() rnorm(100)
ralt = function(mu=0) rnorm(100, mu)
gof_power(pnull, NA, rnull, ralt, c(0, 1), TSextra=TSextra, B=Bsim)
#>     KS    K   AD  CvM    W   ZA   ZK   ZC Wassp1 ES-l-P ES-s-P EP-l-P EP-s-P
#> 0 0.02 0.02 0.03 0.04 0.07 0.03 0.01 0.03   0.04   0.06   0.04   0.04   0.03
#> 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00   1.00   1.00   1.00   1.00   1.00
#>   ES-l-L ES-s-L EP-l-L EP-s-L
#> 0   0.06   0.03   0.01   0.03
#> 1   1.00   1.00   1.00   1.00

Composite Null Hypothesis

pnull = function(x, p=c(0,1)) pnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01))
TSextra = list(qnull = function(x, p=c(0,1)) qnorm(x, p[1], ifelse(p[2]>0, p[2], 0.01)))
rnull = function(p=c(0,1)) rnorm(500, p[1], p[2])
ralt = function(mu=0) rnorm(100, mu)
phat = function(x) c(mean(x), sd(x))
gof_power(pnull, NA, rnull, ralt, c(0, 1), phat= phat, 
          TSextra=TSextra, B=Bsim, maxProcessor=2)
#>     KS    K   AD  CvM    W   ZA   ZK   ZC Wassp1 ES-l-P ES-s-P EP-l-P EP-s-P
#> 0 1.00 0.93 0.11 0.11 0.11 0.75 0.02 0.02      1   0.05   0.03   0.04   0.07
#> 1 0.95 0.92 0.12 0.06 0.06 0.65 0.01 0.04      1   0.04   0.04   0.05   0.04
#>   ES-l-L ES-s-L EP-l-L EP-s-L
#> 0   0.06   0.05   0.06   0.06
#> 1   0.05   0.05   0.07   0.05
ralt = function(df=1) {
# t distribution truncated at +- 5  
  x=rt(1000, df)
  x=x[abs(x)<5]
  x[1:100]
}  
gof_power(pnull, NA, rnull, ralt, c(2, 50), phat=phat, 
          Range=c(-5,5), TSextra=TSextra, B=Bsim, maxProcessor=2)
#>      KS    K   AD  CvM    W   ZA   ZK   ZC Wassp1 ES-l-P ES-s-P EP-l-P EP-s-P
#> 2  1.00 1.00 0.63 0.62 0.62 0.93 0.18 0.18      1   0.36   0.31   0.36   0.44
#> 50 0.96 0.91 0.05 0.04 0.05 0.64 0.02 0.02      1   0.04   0.12   0.08   0.03
#>    ES-l-L ES-s-L EP-l-L EP-s-L
#> 2    0.34   0.34   0.31   0.43
#> 50   0.05   0.12   0.08   0.03

Running other tests

Its is very easy for a user to add other goodness-of-fit tests to the package. This can be done by editing the routines TS_cont and/or TS_disc, which are located in the folder inst/examples in the Rgof library folder. Or a user can write their own version of these files.

Example

Say we wish to use a test that is a variant of the Cramer-vonMises test, using the integrated absolute difference of the empirical and the theoretical distribution function:

−∞|F(x) − (x)|dF(x) For continuous data we have the routine

newTScont = function(x, Fx) {
   Fx=sort(Fx)
   n=length(x)
   out = sum(abs( (2*1:n-1)/2/n-Fx ))
   names(out) = "CvM alt"
   out
}

This routine has to have two arguments x and Fx. Note that the return object has to be a named vector. The object TSextra can be used to provide further information to the TS routine, if necessary.

Then we can run this test with

pnull = function(x) punif(x)
rnull = function() runif(500)
x = rnull()
Rgof::gof_test(x, NA, pnull, rnull, TS=newTScont)
#> $statistics
#> CvM alt 
#>  10.833 
#> 
#> $p.values
#> CvM alt 
#>  0.1112

Say we want to find the power of this test when the true distribution is a linear:

ralt = function(slope=0) {
  if(slope==0) y=runif(500)
    else y=(slope-1+sqrt((1-slope)^2+4*slope* runif(500)))/2/slope
}
gof_power(pnull, NA, rnull, ralt, TS=newTScont, param_alt=round(seq(0, 0.5, length=5), 3), Range=c(0,1), B=Bsim)
#>       CvM alt
#> 0        0.07
#> 0.125    0.48
#> 0.25     0.90
#> 0.375    1.00
#> 0.5      1.00

for discrete data we write will the routine using Rcpp:

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector newTSdisc(IntegerVector x, 
                      NumericVector Fx,  
                      NumericVector vals) {
    
  Rcpp::CharacterVector methods=CharacterVector::create("CvM alt");    
  int const nummethods=methods.size();
  int k=x.size(), n, i;
  NumericVector TS(nummethods), ecdf(k);
  double tmp;
  TS.names() =  methods;

  n=0;
  for(i=0;i<k;++i) n = n + x[i];
  ecdf(0) = double(x(0))/double(n);  
  for(i=1;i<k;++i) {
    ecdf(i) = ecdf(i-1) + x(i)/double(n);
  }

  tmp = std::abs(ecdf[0]-Fx(0))*Fx(0);
  for(i=1;i<k;++i) 
     tmp = tmp + std::abs(ecdf(i)-Fx(i))*(Fx(i)-Fx(i-1));
  TS(0) = tmp;
 
  return TS;
}

Again the routine has to have three arguments x, Fx and vals and the output vector has to have names.

Note that one drawback of writing the routine in Rcpp is that it is then not possible to use multiple processors.

vals=1:50/51
pnull = function() (1:50)/50
rnull = function() c(rmultinom(1, 500, rep(1/50,50)))
x = rnull()
gof_test(x, vals, pnull, rnull, TS=newTSdisc)
#> Parallel Programming is not possible if custom TS is written in C++. Switching to single processor
#> $statistics
#> CvM alt 
#> 0.02635 
#> 
#> $p.values
#> CvM alt 
#>  0.0432

and for power calculations we can run

ralt = function(slope=0) {
    if(slope==0) p=rep(1/50, 50)
    else p=diff(slope * (0:50/50)^2 + (1 - slope) * 0:50/50)  
  c(rmultinom(1, 500, p))
}
gof_power(pnull, vals, rnull, ralt, TS=newTSdisc, param_alt=round(seq(0, 0.5, length=5), 3), B=Bsim)
#> Parallel Programming is not possible if custom TS is written in C++. Switching to single processor
#>       CvM alt
#> 0        0.06
#> 0.125    0.38
#> 0.25     0.86
#> 0.375    1.00
#> 0.5      1.00

Adjusted p values for Several Tests

As no single test can be relied upon to consistently have good power, it is reasonable to employ several of them. We could then reject the null hypothesis if any of the tests does so, that is, if the smallest p-value is less than the desired type I error probability α.

This procedure clearly suffers from the problem of simultaneous inference, and the true type I error probability will be much larger than α. It is however possible to adjust the p value so it does achieve the desired α. This can be done as follows:

We generate a number of data sets under the null hypothesis. Generally about 1000 will be sufficient. Then for each simulated data set we apply the tests we wish to include, and record the smallest p value. Here is an example. Say the null hypothesis specifies a uniform [0.1] and a sample size of 250.

pnull=function(x) punif(x)
rnull=function() runif(250)
pvals=matrix(0,1000,16)
for(i in 1:1000) pvals[i, ]=Rgof::gof_test(rnull(), NA, pnull, rnull,B=1000)$p.values

Next we find the smallest p value in each run for two selections of four methods. One is the selection found to be best above, namely the methods by Wilson, Anderson-Darling, Zhang’s ZC and a chi square test with a small number of bins and using Pearson’s formula. As a second selection we use the methods by Kolmogorov-Smirnov, Kuiper, Anderson-Darling and Cramer-vonMises. It can be checked that for this null hypothesis these methods are highly correlated.

colnames(pvals)=names(Rgof::gof_test(rnull(), NA, pnull, rnull,B=10)$p.values)
p1=apply(pvals[, c("W", "ZC", "AD", "ES-s-P" )], 1, min)
p2=apply(pvals[, c("KS", "K", "AD", "CvM")], 1, min)

Next we find the empirical distribution function for the two sets of p values and draw their graphs. We also add the curve for the cases of four identical tests and the case of four independent tests, which of course is the Bonferroni correction. The data for the cdf is in the inst/extdata directory of the package

tmp=readRDS("../inst/extdata/pvaluecdf.rds")
Tests=factor(c(rep("Identical Tests", nrow(tmp)),
        rep("Correlated Selection", nrow(tmp)),
        rep("Best Selection", nrow(tmp)),
        rep("Independent Tests", nrow(tmp))),
        levels=c("Identical Tests",  "Correlated Selection", 
                 "Best Selection", "Independent Tests"),
        ordered = TRUE)
dta=data.frame(x=c(tmp[,1],tmp[,1],tmp[,1],tmp[,1]),
          y=c(tmp[,1],tmp[,3],tmp[,2],1-(1-tmp[,1])^4),
          Tests=Tests)
ggplot2::ggplot(data=dta, ggplot2::aes(x=x,y=y,col=Tests))+
  ggplot2::geom_line(linewidth=1.2)+
  ggplot2::labs(x="p value", y="CDF")+
  ggplot2::scale_color_manual(values=c("blue","red", "Orange", "green"))

Weighted Data

Sometimes the data/model uses importance sampling weights. This can be done as follows. Say we want to test whether the data comes from a standard normal distribution, truncated to [-3,3] and with weights from a t distribution with 3 degrees of freedom:

H0 : F = N(0, 1), X ∼ t(3)

df=3
pnull=function(x) pnorm(x)/(2*pnorm(3)-1)
rnull=function() {x=rt(2000, df);x=x[abs(x)<3];sort(x[1:1000])}
w=function(x) (dnorm(x)/(2*pnorm(3)-1))/(dt(x,df)/(2*pt(3,df)-1))
x=sort(rnull())
plot(x, w(x), type="l", ylim=c(0, 2*max(w(x))))

ralt=function(m=0) {x=rt(2000,df)+m;x=x[abs(x)<3];sort(x[1:1000])}
set.seed(111)
Rgof::gof_power(pnull, NA, rnull, ralt, w=w, param_alt = c(0,0.2), Range=c(-3,3),B=Bsim)
#>       KS    K  CvM   AD ES-l-P ES-s-P EP-l-P EP-s-P
#> 0   0.07 0.07 0.09 0.05   0.05   0.06   0.06   0.08
#> 0.2 1.00 1.00 1.00 1.00   0.83   0.98   0.77   0.96

It should be noted that these tests are quite sensitive to the size of the weights and to the sample size, so one should always do a simulation study to verify that they work in the case under consideration.

References

Anderson, T W. 1962. “On the Distribution of the Two-Sample Cramer-von Mises Criterion.” Annals of Mathematical Statistics 33 (3): 1148–59.
Anderson, T W, and D A Darling. 1952. “Asymptotic Theory of Certain Goodness-of-Fit Criteria Based on Stochastic Processes.” Annals of Mathematical Statistics 23: 193–212.
———. 1954. “A Test of Goodness-of-Fit.” JASA 49: 765–69.
Berkson, J. 1980. “Minimum Chi-Square, Not Maximum Likelihood.” Ann. Math. Stat 8 (3): 457–87.
E del Barrio, C Matran, J A Cuesta-Albertos. 1999. “Tests of Goodness of Fit Based on the L2-Wasserstein Distance.” Annals of Statistics 1230-1239: 27.
Kolmogorov, A. 1933. “Sulla Determinazione Empirica Di Una Legge Di Distribuzione.” G. Ist. Ital. Attuari. 4: 83–91.
Kuiper, N H. 1960. “Tests Concerning Random Points on a Circle.” Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen 63: 38–47.
Massey, F J. 1951. “The Kolmogorov-Smirnov Test for Goodness-of-Fit.” JASA 46: 68–78.
Rolke, Wolfgang, and Cristian Gutierrez-Gongora. 2020. “A Chi-Square Goodness-of-Fit Test for Continuous Distributions Against a Known Alternative.” Computational Statistics. https://doi.org/10.1007/s00180-020-00997-x.
Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” Annals of Mathematical Statistics 19: 279–81.
Zhang, J. 2002. “Powerful Goodness-of-Fit Tests Based on Likelihood Ratio.” Journal of the RSS (Series B) 64: 281–94.