Title: | Statistical Tests for Evaluating Conformity to Benford's Law |
---|---|
Description: | Several specialized statistical tests and support functions for determining if numerical data could conform to Benford's law. |
Authors: | Dieter William Joenssen [aut, cre, cph], Thomas Muellerleile [ctb] |
Maintainer: | Dieter William Joenssen <[email protected]> |
License: | GPL-3 |
Version: | 1.2.0 |
Built: | 2024-11-01 06:49:23 UTC |
Source: | CRAN |
This package contains several specialized statistical tests and support functions for determining if numerical data could conform to Benford's law.
Package: | BenfordTests |
Type: | Package |
Version: | 1.2.0 |
Date: | 2015-07-18 |
License: | GPL-3 |
BenfordTests implements eight goodness-of-fit (GOF) tests for assessing whether data conform to Benford's law.
Tests include:
Pearson $\chi^2$ statistic (Pearson, 1900)
Kolmogorov-Smirnov D statistic (Kolmogorov, 1933)
Freedman's modification of Watson's $U^2$ statistic (Freedman, 1981; Watson, 1961)
Chebyshev distance m statistic (Leemis et al., 2000)
Euclidean distance d statistic (Cho and Gaines, 2007)
Judge-Schechter mean deviation $a^*$ statistic (Judge and Schechter, 2009)
Joenssen's $J_P^2$ statistic, a Shapiro-Francia type correlation test (Shapiro and Francia, 1972)
Joint Digit Test $T^2$ statistic, a Hotelling type test (Hotelling, 1931)
All tests may be performed using more than one leading digit.
All tests simulate the specific p-values required for statistical inference, while p-values for the $\chi^2$, D, $a^*$, and $T^2$ statistics may also be determined using their asymptotic distributions.
Dieter William Joenssen
Maintainer: Dieter William Joenssen <[email protected]>
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Cho, W.K.T. and Gaines, B.J. (2007) Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. The American Statistician. 61, 218–223.
Freedman, L.S. (1981) Watson's Un2 Statistic for a Discrete Distribution. Biometrika. 68, 708–711.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
Judge, G. and Schechter, L. (2009) Detecting Problems in Survey Data using Benford's Law. Journal of Human Resources. 44, 1–24.
Kolmogorov, A.N. (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari. 4, 83–91.
Leemis, L.M., Schmeiser, B.W. and Evans, D.L. (2000) Survival Distributions Satisfying Benford's law. The American Statistician. 54, 236–241.
Newcomb, S. (1881) Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics. 4, 39–40.
Pearson, K. (1900) On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it can be Reasonably Supposed to have Arisen from Random Sampling. Philosophical Magazine Series 5. 50, 157–175.
Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association. 67, 215–216.
Watson, G.S. (1961) Goodness-of-Fit Tests on a Circle. Biometrika. 48, 109–114.
Hotelling, H. (1931) The Generalization of Student's Ratio. Annals of Mathematical Statistics. 2, 360–378.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Look at sample
X
#Look at the first digits of the sample
signifd(X)
#Perform a Chi-squared Test on the sample's first digits using defaults
chisq.benftest(X)
#p-value = 0.648
chisq.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs Pearson's chi-square goodness-of-fit test to assess whether the data conform to Benford's law.
chisq.benftest(x = NULL, digits = 1, pvalmethod = "asymptotic", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Either "asymptotic" or "simulate". |
pvalsims | An integer specifying the number of replicates to use if pvalmethod = "simulate". |
A $\chi^2$ goodness-of-fit test is performed on signifd(x,digits) versus pbenf(digits). Specifically:
$$\chi^2 = \sum_{i=10^{k-1}}^{10^k-1} \frac{\left(f_i^o - f_i^e\right)^2}{f_i^e}$$
where $f_i^o$ denotes the observed frequency of digits $i$, $f_i^e$ denotes the expected frequency of digits $i$, and $k$ is the number of leading digits (digits).
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
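The Pearson statistic compares observed and expected first-digit counts. The following is a minimal base-R sketch of that calculation for one leading digit, not the package's own code. The variable names (first_digit, observed, expected, chi_sq, p_asym) are hypothetical, frequencies are taken as counts, and the data are the values from the rbenf example in this manual, rounded to two decimals.

```r
# Sample values (rounded) from the rbenf() example in this manual.
x <- c(6.16, 1.40, 5.19, 2.06, 7.00, 5.01, 7.95, 4.82, 3.39, 1.62,
       2.08, 2.24, 1.94, 5.46, 6.44, 2.66, 2.08, 3.70, 1.36, 3.35)

# Leading digit of each value (a base-R stand-in for signifd(x, 1)).
first_digit <- floor(abs(x) * 10^(-floor(log10(abs(x)))))

# Observed counts for digits 1..9 and the counts expected under Benford's law.
observed <- as.numeric(table(factor(first_digit, levels = 1:9)))
expected <- length(x) * log10(1 + 1 / (1:9))

# Pearson statistic and its asymptotic p-value (9 - 1 = 8 degrees of freedom).
chi_sq <- sum((observed - expected)^2 / expected)
p_asym <- pchisq(chi_sq, df = 8, lower.tail = FALSE)
```

The same statistic can be cross-checked with stats::chisq.test(observed, p = log10(1 + 1/(1:9))), which warns about small expected counts at this sample size.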
A list with class "htest" containing the following components:
statistic | the value of the $\chi^2$ test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
Pearson, K. (1900) On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it can be Reasonably Supposed to have Arisen from Random Sampling. Philosophical Magazine Series 5. 50, 157–175.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform a Chi-squared Test on the sample's
#first digits using defaults but determine
#the p-value by simulation
chisq.benftest(X,pvalmethod ="simulate")
#p-value = 0.6401
edist.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs a goodness-of-fit test based on the Euclidean distance between the first digits' distribution and Benford's distribution to assess whether the data conform to Benford's law.
edist.benftest(x = NULL, digits = 1, pvalmethod = "simulate", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Currently only "simulate" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A statistical test is performed utilizing the Euclidean distance between signifd(x,digits) and pbenf(digits). Specifically:
$$d = \sqrt{n} \cdot \sqrt{\sum_{i=10^{k-1}}^{10^k-1} \left(f_i^o - f_i^e\right)^2}$$
where $f_i^o$ denotes the observed frequency of digits $i$, and $f_i^e$ denotes the expected frequency of digits $i$.
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
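Because this test relies on simulated p-values, it may help to see the whole procedure once. The sketch below is an illustration in base R under stated assumptions, not the package's implementation: frequencies are taken as relative frequencies, the distance is scaled by the square root of n, and the null distribution is approximated by drawing first digits directly from Benford's probabilities. The helper d_stat and all variable names are hypothetical.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
n <- sum(counts)
obs_rel <- counts / n              # observed relative frequencies
exp_rel <- log10(1 + 1 / (1:9))    # Benford's expected relative frequencies

# Scaled Euclidean distance between two relative-frequency vectors.
d_stat <- function(obs, expd, n) sqrt(n) * sqrt(sum((obs - expd)^2))
d_obs <- d_stat(obs_rel, exp_rel, n)

# Approximate the null distribution by simulating Benford first digits.
set.seed(1)
d_null <- replicate(2000, {
  digits_sim <- sample(1:9, size = n, replace = TRUE, prob = exp_rel)
  d_stat(tabulate(digits_sim, nbins = 9) / n, exp_rel, n)
})

# Simulated p-value: share of null statistics at least as large as observed.
p_sim <- mean(d_null >= d_obs)
```

Large distances count against the null hypothesis, so the p-value is taken from the upper tail.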
A list with class "htest" containing the following components:
statistic | the value of the Euclidean distance test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Cho, W.K.T. and Gaines, B.J. (2007) Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. The American Statistician. 61, 218–223.
Morrow, J. (2010) Benford's Law, Families of Distributions and a Test Basis. [available under http://www.johnmorrow.info/projects/benford/benfordMain.pdf]
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform a Euclidean Distance Test on the
#sample's first digits using defaults
edist.benftest(X,pvalmethod ="simulate")
#p-value = 0.6085
jointdigit.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs a Hotelling T-square type goodness-of-fit test to assess whether the data conform to Benford's law.
jointdigit.benftest(x = NULL, digits = 1, eigenvalues="all", tol = 1e-15, pvalmethod = "asymptotic", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
eigenvalues | How the eigenvalues used in testing are selected; see the details. |
tol | Tolerance in detecting values that are essentially zero. |
pvalmethod | Method used for calculating the p-value. Currently only "asymptotic" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A Hotelling $T^2$ type goodness-of-fit test is performed on signifd(x,digits) versus pbenf(digits).
x is a numeric vector of arbitrary length.
The argument eigenvalues can be defined as:
numeric, a vector specifying which eigenvalues should be used
string of length 1, an eigenvalue selection scheme:
"all", use all non-zero eigenvalues
"kaiser", use all eigenvalues larger than the mean of all non-zero eigenvalues
Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
A list with class "htest" containing the following components:
statistic | the value of the $T^2$ test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
eigenvalues_tested | a vector containing the index numbers of the eigenvalues used in testing |
eigen_val_vect | the eigenvalues and eigenvectors of the null distribution, computed using |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Hotelling, H. (1931) The Generalization of Student's Ratio. Annals of Mathematical Statistics. 2, 360–378.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform Test
#on the sample's first digits using defaults
jointdigit.benftest(X)
#p-value = 0.648
#Perform Test
#using only the two largest eigenvalues
jointdigit.benftest(x=X,eigenvalues=1:2)
#p-value = 0.5176
#Perform Test
#using the kaiser selection criterion
jointdigit.benftest(x=X,eigenvalues="kaiser")
#p-value = 0.682
jpsq.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs a goodness-of-fit test based on the correlation between the first digits' distribution and Benford's distribution to assess whether the data conform to Benford's law.
jpsq.benftest(x = NULL, digits = 1, pvalmethod = "simulate", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Currently only "simulate" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A statistical test is performed utilizing the sign-preserved squared correlation between signifd(x,digits) and pbenf(digits). Specifically:
$$J_P^2 = \mathrm{sgn}\left(\mathrm{cor}\left(f^o, f^e\right)\right) \cdot \mathrm{cor}\left(f^o, f^e\right)^2$$
where $f^o$ denotes the observed frequencies and $f^e$ denotes the expected frequency of digits $10^{k-1}, \ldots, 10^k-1$.
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
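The sign-preserved squared correlation is short enough to compute by hand. The lines below are a base-R sketch for one leading digit, with hypothetical variable names and relative frequencies assumed; they are not taken from the package source.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
obs_rel <- counts / sum(counts)    # observed relative frequencies
exp_rel <- log10(1 + 1 / (1:9))    # Benford's expected relative frequencies

# Pearson correlation between observed and expected frequencies.
r <- cor(obs_rel, exp_rel)

# Squaring loses the sign, so it is multiplied back in.
jp_sq <- sign(r) * r^2
```

Values near 1 indicate close agreement with Benford's distribution. The simulateH0 example in this manual takes the 5% quantile of the simulated jpsq statistics, which suggests that small values are the ones that count against the null hypothesis.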
A list with class "htest" containing the following components:
statistic | the value of the $J_P^2$ test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Joenssen, D.W. (2013) A New Test for Benford's Distribution. In: Abstract-Proceedings of the 3rd Joint Statistical Meeting DAGStat, March 18-22, 2013; Freiburg, Germany.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association. 67, 215–216.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform Joenssen's JP-square Test
#on the sample's first digits using defaults
jpsq.benftest(X)
#p-value = 0.3241
ks.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs the Kolmogorov-Smirnov goodness-of-fit test to assess whether the data conform to Benford's law.
ks.benftest(x = NULL, digits = 1, pvalmethod = "simulate", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Currently only "simulate" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A Kolmogorov-Smirnov test is performed between signifd(x,digits) and pbenf(digits). Specifically:
$$D = \sup_{i=10^{k-1},\ldots,10^k-1} \left| \sum_{j=10^{k-1}}^{i} \left(f_j^o - f_j^e\right) \right| \cdot \sqrt{n}$$
where $f_i^o$ denotes the observed frequency of digits $i$, and $f_i^e$ denotes the expected frequency of digits $i$.
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
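For discrete digits, the Kolmogorov-Smirnov statistic reduces to the largest absolute gap between the cumulative observed and cumulative expected frequencies. A base-R sketch for one leading digit follows; the scaling by the square root of n and the use of relative frequencies are assumptions, and the variable names are hypothetical.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
n <- sum(counts)
obs_rel <- counts / n              # observed relative frequencies
exp_rel <- log10(1 + 1 / (1:9))    # Benford's expected relative frequencies

# Gap between the empirical and the Benford cumulative distributions.
cum_gap <- cumsum(obs_rel) - cumsum(exp_rel)

# Largest absolute gap, scaled by sqrt(n).
D <- max(abs(cum_gap)) * sqrt(n)
```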
A list with class "htest" containing the following components:
statistic | the value of the Kolmogorov-Smirnov D test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
Kolmogorov, A.N. (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari. 4, 83–91.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform a Kolmogorov-Smirnov Test on the
#sample's first digits using defaults
ks.benftest(X)
#p-value = 0.7483
mdist.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs a goodness-of-fit test based on the Chebyshev distance between the first digits' distribution and Benford's distribution to assess whether the data conform to Benford's law.
mdist.benftest(x = NULL, digits = 1, pvalmethod = "simulate", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Currently only "simulate" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A statistical test is performed utilizing the Chebyshev distance between signifd(x,digits) and pbenf(digits). Specifically:
$$m = \max_{i=10^{k-1},\ldots,10^k-1} \left| f_i^o - f_i^e \right| \cdot \sqrt{n}$$
where $f_i^o$ denotes the observed frequency of digits $i$, and $f_i^e$ denotes the expected frequency of digits $i$.
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
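The Chebyshev (maximum norm) distance looks only at the single digit with the worst fit. A base-R sketch for one leading digit follows, assuming relative frequencies and a scaling by the square root of n; the variable names are hypothetical and this is not the package's code.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
n <- sum(counts)
obs_rel <- counts / n              # observed relative frequencies
exp_rel <- log10(1 + 1 / (1:9))    # Benford's expected relative frequencies

# Per-digit absolute deviations and the digit where the deviation is largest.
abs_dev <- abs(obs_rel - exp_rel)
worst_digit <- which.max(abs_dev)

# Maximum deviation, scaled by sqrt(n).
m <- max(abs_dev) * sqrt(n)
```

Reporting worst_digit alongside m shows which digit drives the statistic, which the test result alone does not reveal.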
A list with class "htest" containing the following components:
statistic | the value of the Chebyshev distance (maximum norm) test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Leemis, L.M., Schmeiser, B.W. and Evans, D.L. (2000) Survival Distributions Satisfying Benford's law. The American Statistician. 54, 236–241.
Morrow, J. (2010) Benford's Law, Families of Distributions and a Test Basis. [available under http://www.johnmorrow.info/projects/benford/benfordMain.pdf]
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform a Chebyshev Distance Test on the
#sample's first digits using defaults
mdist.benftest(X)
#p-value = 0.6421
meandigit.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs a goodness-of-fit test based on the deviation in means of the first digits' distribution and Benford's distribution to assess whether the data conform to Benford's law.
meandigit.benftest(x = NULL, digits = 1, pvalmethod = "asymptotic", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Either "asymptotic" or "simulate". |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
A statistical test is performed utilizing the deviation between the mean digit of signifd(x,digits) and pbenf(digits). Specifically:
$$a^* = \frac{\left|\mu_k^o - \mu_k^e\right|}{\left(9 \cdot 10^{k-1}\right) - \mu_k^e}$$
where $\mu_k^o$ is the observed mean of the chosen $k$ number of digits, and $\mu_k^e$ is the expected/true mean value for Benford's predictions. $a^*$ conforms asymptotically to a truncated normal distribution under the null-hypothesis, i.e.,
$$a^* \sim \mathrm{truncnorm}\left(\mu = 0, \sigma = \sigma_B, a = 0, b = \infty\right)$$
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.
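For one leading digit, the mean deviation statistic can be sketched in a few lines of base R. The sketch assumes the normalising constant is the largest possible digit (9) minus the expected mean, and it computes only the statistic, not a p-value. The variable names are hypothetical and this is not the package's code.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
digit_values <- 1:9
exp_rel <- log10(1 + 1 / digit_values)   # Benford's expected relative frequencies

# Observed mean first digit and the mean expected under Benford's law.
mu_obs <- sum(digit_values * counts) / sum(counts)
mu_exp <- sum(digit_values * exp_rel)

# Absolute mean deviation, normalised by the largest possible deviation.
a_star <- abs(mu_obs - mu_exp) / (9 - mu_exp)
```

The normalisation keeps the statistic between 0 and 1, with 0 meaning the observed mean digit equals Benford's expected mean.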
A list with class "htest" containing the following components:
statistic | the value of the $a^*$ test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Judge, G. and Schechter, L. (2009) Detecting Problems in Survey Data using Benford's Law. Journal of Human Resources. 44, 1–24.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform a Judge-Schechter Mean Deviation Test
#on the sample's first digits using defaults
meandigit.benftest(X)
#p-value = 0.1458
Returns the complete probability mass function for Benford's distribution for a given number of first digits.
pbenf(digits = 1)
digits | An integer determining the number of first digits for which the probability mass function is returned, i.e., 1 for 1:9, 2 for 10:99, etc. |
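Benford's distribution has the following probability mass function:
$$P(d_k) = \log_{10}\left(1 + d_k^{-1}\right)$$
where $d_k \in \{10^{k-1}, 10^{k-1}+1, \ldots, 10^k-1\}$ for any chosen $k$ number of digits.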
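Benford's probability for a leading digit string d is log10(1 + 1/d). As an illustration, the hypothetical helper benford_pmf below evaluates it in base R for any number of leading digits. It is a sketch and returns a named numeric vector, not the "table" object that pbenf returns.

```r
# Benford probabilities for all k-digit leading strings (hypothetical helper).
benford_pmf <- function(k = 1) {
  d <- seq(10^(k - 1), 10^k - 1)   # 1:9 for k = 1, 10:99 for k = 2, ...
  p <- log10(1 + 1 / d)
  names(p) <- d
  p
}

p1 <- benford_pmf(1)   # first digit
p2 <- benford_pmf(2)   # first two digits

# Summing the two-digit probabilities over the second digit recovers
# the one-digit probabilities.
p1_from_p2 <- tapply(p2, rep(1:9, each = 10), sum)
```

Both p1 and p2 sum to 1, because the logarithms telescope to log10(10^k) - log10(10^(k-1)).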
Returns an object of class "table" containing the expected density of Benford's distribution for the given number of digits.
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
#show Benford's predictions for the frequencies of the first digit values
pbenf(1)
Returns the complete quantile function for Benford's distribution with a given number of first digits, i.e., the cumulative probabilities cumsum(pbenf(digits)).
qbenf(digits = 1)
digits | An integer determining the number of first digits for which the quantile function is returned, i.e., 1 for 1:9, 2 for 10:99, etc. |
Returns an object of class "table" containing the expected quantile function of Benford's distribution with a given number of digits.
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
qbenf(1)
qbenf(1)==cumsum(pbenf(1))
Returns a random sample of length n satisfying Benford's law.
rbenf(n)
n | Number of observations. |
This distribution has the density:
$$f(x) = \frac{1}{x \cdot \ln(10)} \quad \text{for all } x \in [1, 10]$$
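A variable X on [1, 10] whose base-10 logarithm is uniform on [0, 1] has density 1/(x ln 10), and its first digits follow Benford's law. The base-R sketch below draws such a sample by inverse transform and compares the first-digit shares with Benford's probabilities. Whether rbenf uses this construction is an assumption here, and the variable names are hypothetical.

```r
set.seed(421)

# If U is uniform on (0, 1), then X = 10^U has P(X <= x) = log10(x) on [1, 10],
# which gives the density 1 / (x * log(10)).
u <- runif(1e5)
x <- 10^u

# Share of each first digit in the sample versus Benford's probabilities.
first_digit <- floor(x)
obs_share <- tabulate(first_digit, nbins = 9) / length(x)
benford <- log10(1 + 1 / (1:9))

comparison <- round(rbind(observed = obs_share, benford = benford), 3)
```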
Returns a random sample of length n satisfying Benford's law.
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Look at sample
X
#should be
# [1] 6.159420 1.396476 5.193371 2.064033 7.001284 5.006184
#7.950332 4.822725 3.386809 1.619609 2.080063 2.242473 1.944697 5.460581
#[15] 6.443031 2.662821 2.079283 3.703353 1.364175 3.354136
Applies the first digits function to each element of a given vector.
signifd(x = NULL, digits = 1)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
The first digits function can be written as:
$$D_k(x) = \left\lfloor |x| \cdot 10^{-\lfloor \log_{10}|x| \rfloor + k - 1} \right\rfloor$$
with $k$ being the number of first digits that should be extracted.
x is a numeric vector of arbitrary length.
Unlike other solutions, this function will work reliably with all real numbers.
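The first digits of a number can be obtained by shifting its decimal point so that exactly k digits sit before it, then truncating. The base-R sketch below shows the idea. The helper name first_digits is hypothetical, this is not the package's signifd, and zero is not handled because log10(0) is not finite.

```r
# Extract the first k significant digits of each non-zero element of x.
first_digits <- function(x, k = 1) {
  ax <- abs(x)                      # the sign does not affect the digits
  magnitude <- floor(log10(ax))     # position of the leading digit
  floor(ax * 10^(-magnitude + k - 1))
}

# Works for values above 1, below 1, and negative values alike.
two_digits <- first_digits(c(6.159420, 0.00345, -123.4), k = 2)
# two_digits is 61 34 12

one_digit <- first_digits(c(6.159420, 1.396476, 5.193371))
# one_digit is 6 1 5
```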
Returns a vector of integers the same length as the input vector x.
Dieter William Joenssen [email protected]
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
chisq.benftest; ks.benftest; usq.benftest; mdist.benftest; edist.benftest; meandigit.benftest; jpsq.benftest
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Look at the first digits of the sample
signifd(X)
#should be:
#[1] 6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3
signifd.analysis takes any numerical vector and reduces the sample to the specified number of significant digits. The (relative) frequencies are then plotted so that a subjective analysis may be performed.
signifd.analysis(x = NULL, digits = 1, graphical_analysis = TRUE, freq = FALSE, alphas = 20, tick_col = "red", ci_col = "darkgreen", ci_lines = c(.05))
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
graphical_analysis | Boolean value indicating if results should be plotted. |
freq | Boolean value indicating if absolute frequencies should be used. |
alphas | Either a vector containing the significance levels (in [0,1]) that will be shaded, or an integer defining the number of evenly spaced confidence intervals. |
tick_col | Color code or name that will be passed to " |
ci_col | Color code or name that will be passed to " |
ci_lines | Boolean or fractional value(s) indicating significance levels where lines are drawn. |
Confidence intervals are calculated from the normal distribution with $\mu_i = n p_i$ and $\sigma_i^2 = n p_i (1 - p_i)$, where i represents the considered digit and $p_i$ its probability under Benford's law. Be aware that the normal approximation only holds for "large" n.
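The per-digit normal approximation treats each digit count as binomial, with mean n times p and variance n times p times (1 - p). The base-R sketch below computes two-sided 95% intervals and per-digit p-values for first-digit counts. The two-sided form and the variable names are assumptions made for illustration, not the package's code.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
n <- sum(counts)
p <- log10(1 + 1 / (1:9))          # Benford probability of each digit

# Normal approximation to the binomial count of each digit.
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Two-sided interval at significance level alpha for each digit count.
alpha <- 0.05
z <- qnorm(1 - alpha / 2)
ci <- cbind(digit = 1:9, observed = counts,
            lower = mu - z * sigma, upper = mu + z * sigma)

# Two-sided p-value for each digit taken on its own.
p_digit <- 2 * pnorm(-abs(counts - mu) / sigma)
```

With n = 20, several lower bounds come out negative, which is a visible symptom of the approximation being poor for small samples.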
A list containing the following components:
summary | the summary printed below the graph, a matrix of digits, their (relative) frequencies and individual p-values |
CIs | confidence intervals used for plotting as defined by parameter "alphas" |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Freedman, L.S. (1981) Watson's Un2 Statistic for a Discrete Distribution. Biometrika. 68, 708–711.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Analyze the first digits using the defaults
signifd.analysis(X)
#Turn off plot
signifd.analysis(X,graphical_analysis=FALSE)
#Use absolute frequencies
signifd.analysis(X,graphical_analysis=FALSE,freq=TRUE)
#Use five evenly spaced confidence intervals, no lines
#alphas is used for shading
signifd.analysis(X,graphical_analysis=TRUE,alphas=5,freq=TRUE,ci_lines=FALSE)
#Use fifty evenly spaced, gray confidence intervals, blue ticks, and lines at
#the 1 and 5 percent confidence intervals
signifd.analysis(X,graphical_analysis=TRUE,alphas=50,freq=TRUE,tick_col="blue",
ci_col="gray",ci_lines=c(.01,.05))
Returns a vector containing all possible significant digits for a given number of places.
signifd.seq(digits = 1)
digits | An integer determining the number of first digits to be returned, i.e., 1 for 1:9, 2 for 10:99, etc. |
Returns an integer vector.
Dieter William Joenssen [email protected]
signifd.seq(1)
seq(from=1,to=9)==signifd.seq(1)
signifd.seq(2)
seq(from=10,to=99)==signifd.seq(2)
simulateH0 is a wrapper function that calculates the specified test statistic under the null hypothesis a certain number of times.
simulateH0(teststatistic="chisq", n=10, digits=1, pvalsims=10)
teststatistic | Which test statistic should be used: "chisq", "edist", "jpsq", "ks", "mdist", "meandigit", or "usq". |
n | Sample size of interest. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalsims | An integer specifying the number of replicates to be used in simulation. |
Wrapper function that directly outputs the simulated distribution of the specified test statistic under the null hypothesis.
A vector of length equal to "pvalsims".
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
pbenf, chisq.benftest, edist.benftest, jpsq.benftest, ks.benftest, mdist.benftest, meandigit.benftest, usq.benftest
#Set the random seed to an arbitrary number
set.seed(421)
#calculate critical value for chisquare test via simulation
quantile(simulateH0(teststatistic="chisq", n=100,digits=1,pvalsims=100000),probs=.95)
#calculate the "real" critical value
qchisq(.95,df=8)
#alternatively look at critical values for the jpsq statistic
#for different sample sizes (notice the low value for pvalsims)
set.seed(421)
apply(sapply((1:9)*10,FUN=simulateH0,teststatistic="jpsq", digits=1, pvalsims=100),
MARGIN=2,FUN=quantile,probs=.05)
usq.benftest takes any numerical vector, reduces the sample to the specified number of significant digits, and performs the Freedman-Watson test for discrete distributions between the first digits' distribution and Benford's distribution to assess whether the data conform to Benford's law.
usq.benftest(x = NULL, digits = 1, pvalmethod = "simulate", pvalsims = 10000)
x | A numeric vector. |
digits | An integer determining the number of first digits to use for testing, i.e., 1 for only the first, 2 for the first two, etc. |
pvalmethod | Method used for calculating the p-value. Currently only "simulate" is available. |
pvalsims | An integer specifying the number of replicates used if pvalmethod = "simulate". |
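A Freedman-Watson test for discrete distributions is performed between signifd(x,digits) and pbenf(digits). Specifically:
$$U^2 = \frac{n}{9 \cdot 10^{k-1}} \cdot \left[ \sum_{i=10^{k-1}}^{10^k-2} \left( \sum_{j=10^{k-1}}^{i} \left(f_j^o - f_j^e\right) \right)^2 - \frac{1}{9 \cdot 10^{k-1}} \cdot \left( \sum_{i=10^{k-1}}^{10^k-2} \sum_{j=10^{k-1}}^{i} \left(f_j^o - f_j^e\right) \right)^2 \right]$$
where $f_i^o$ denotes the observed frequency of digits $i$, and $f_i^e$ denotes the expected frequency of digits $i$.
x is a numeric vector of arbitrary length. Values of x should be continuous, as dictated by theory, but may also be integers. digits should be chosen so that signifd(x,digits) is not influenced by previous rounding.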
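Freedman's discrete version of Watson's statistic is built from the partial sums of the differences between observed and expected frequencies, and measures how much those partial sums vary around their own mean. The base-R sketch below works under stated assumptions: one leading digit (nine cells), relative frequencies, and the form U^2 = (n / cells) * [sum(S^2) - sum(S)^2 / cells]. Variable names are hypothetical and this is not the package's code.

```r
# First-digit counts for digits 1..9, tallied from the signifd() example
# output in this manual (6 1 5 2 7 5 7 4 3 1 2 2 1 5 6 2 2 3 1 3).
counts <- c(4, 5, 3, 1, 3, 2, 2, 0, 0)
n <- sum(counts)
n_cells <- length(counts)          # 9 cells for one leading digit
obs_rel <- counts / n              # observed relative frequencies
exp_rel <- log10(1 + 1 / (1:9))    # Benford's expected relative frequencies

# Partial sums of the differences for digits 1..8; the ninth is always zero.
S <- cumsum(obs_rel - exp_rel)[-n_cells]

# Variation of the partial sums around their mean, scaled by n / n_cells.
u_sq <- n / n_cells * (sum(S^2) - sum(S)^2 / n_cells)
```

Subtracting the squared sum is what separates this statistic from a Cramér-von Mises type sum of squared partial sums.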
A list with class "htest" containing the following components:
statistic | the value of the $U^2$ test statistic |
p.value | the p-value for the test |
method | a character string indicating the type of test performed |
data.name | a character string giving the name of the data |
Dieter William Joenssen [email protected]
Benford, F. (1938) The Law of Anomalous Numbers. Proceedings of the American Philosophical Society. 78, 551–572.
Freedman, L.S. (1981) Watson's Un2 Statistic for a Discrete Distribution. Biometrika. 68, 708–711.
Joenssen, D.W. (2013) Two Digit Testing for Benford's Law. Proceedings of the ISI World Statistics Congress, 59th Session in Hong Kong. [available under http://www.statistics.gov.hk/wsc/CPS021-P2-S.pdf]
Watson, G.S. (1961) Goodness-of-Fit Tests on a Circle. Biometrika. 48, 109–114.
#Set the random seed to an arbitrary number
set.seed(421)
#Create a sample satisfying Benford's law
X<-rbenf(n=20)
#Perform Freedman-Watson U-squared Test on
#the sample's first digits using defaults
usq.benftest(X)
#p-value = 0.4847