Title: | Parameter Inference and Optimal Designs for Grouped and/or Right-Censored Count Data |
---|---|
Description: | We implement two main functions. The first function uses a given grouped and/or right-censored grouping scheme and empirical data to infer parameters, and implements chi-square goodness-of-fit tests. The second function searches for the global optimal grouping scheme of grouped and/or right-censored count responses in surveys. |
Authors: | Xin Guo <[email protected]>, Qiang Fu <[email protected]> |
Maintainer: | Xin Guo <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0 |
Built: | 2024-10-21 06:22:34 UTC |
Source: | CRAN |
This package consists of two main functions. The first, `grcmle`, uses a given grouped and/or right-censored grouping scheme and empirical data to infer parameters, and implements chi-square goodness-of-fit tests. The second, `find.scheme`, searches for the global optimal grouping scheme of grouped and/or right-censored count responses in surveys.
This R package implements the methods and algorithms developed in the following papers. Please cite these articles where appropriate:
Qiang Fu, Xin Guo and Kenneth C. Land. Forthcoming. "A Poisson-Multinomial Mixture Approach to Grouped and Right-Censored Counts." Communications in Statistics – Theory and Methods. DOI: 10.1080/03610926.2017.1303736 (mainly about the first function for aggregate-level parameter inference)
Qiang Fu, Xin Guo and Kenneth C. Land. Conditionally accepted. "Optimizing Count Responses in Surveys: A Machine-Learning Approach." Sociological Methods & Research. (mainly about the second function for finding optimal grouping schemes)
To install the package from the source file "GRCdata_1.0.tar.gz", place this file in the working directory of R and type
install.packages("GRCdata_1.0.tar.gz", repos = NULL, type = "source")
To check the current working directory of R, one may type
getwd()
To see the source code, one can extract the package archive "GRCdata_1.0.tar.gz". It contains two directories, `man` and `R`; the source code is under the `R` directory.
Package: | GRCdata |
---|---|
Type: | Package |
Version: | 1.0 |
Date: | July 28, 2017 |
License: | GPLv3 |
Authors: Xin Guo <[email protected]>, Qiang Fu <[email protected]>
Maintainer: Xin Guo <[email protected]>
Given the prior distribution (or values) of the parameters and the total (maximum) number of groups N allowed in a grouping scheme, `find.scheme` finds the global optimal grouping scheme, i.e., the one that makes the sampling process most informative.
find.scheme(N, densityFUN, lambda.lwr, lambda.upr, p.lwr, p.upr, probs, lambdas, ps, is.0.isolated = TRUE, model = c("Poisson", "ZIP"), matSc = c("A", "D", "E"), M = "auto")
Argument | Description |
---|---|
`N` | (maximum) number of groups allowed for all grouping schemes. A non-integer value is coerced to an integer. |
`densityFUN`, `lambda.lwr`, `lambda.upr`, `p.lwr`, `p.upr` | prior information of the parameters in continuous form: the prior probability density function (optional), the lower bound of λ, the upper bound of λ, the lower bound of p, and the upper bound of p, respectively. |
`probs`, `lambdas`, `ps` | prior information of the parameters in discrete form: vectors giving the mass probabilities, the corresponding values of λ, and the corresponding values of p, respectively. |
`is.0.isolated` | a logical value indicating whether zero must form a group by itself, i.e., be the only value in its group. |
`model` | the underlying model used for the optimal design: `"Poisson"` or `"ZIP"` (zero-inflated Poisson). |
`matSc` | a character indicating the type of optimality criterion applied to the Fisher information (matrix). It must be one of the three letters `"A"`, `"D"`, or `"E"`. |
`M` | a sufficiently large integer needed to facilitate the search, or the character string `"auto"` (see Details). |
`find.scheme` searches for the N-group scheme that maximizes the Fisher information (matrix), as measured by the optimality criterion selected through `matSc`.
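For the Poisson model with a single λ, the Fisher information carried by one response recorded under a grouping scheme is I(λ) = Σ_k π_k'(λ)² / π_k(λ), where π_k is the probability of group k. Because d/dλ P(X ≤ b) = -P(X = b) for a Poisson variable, the derivative of the probability of a group [a, b] is P(X = a - 1) - P(X = b). The base-R sketch below computes this quantity and averages it over a discrete prior (`probs`, `lambdas`). Averaging the information over the prior is an assumption made here for illustration; the criterion actually optimized by `find.scheme` is defined in the cited paper. The function names are hypothetical and not part of GRCdata.

```r
# Hypothetical helpers (not GRCdata functions); Poisson model only.
# `scheme` holds the lowest integer of each group, e.g. c(0, 1, 2, 5, 9).
pois_group_probs <- function(scheme, lambda) {
  K <- length(scheme)
  vapply(seq_len(K), function(k) {
    if (k < K) sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))  # group [a, b]
    else ppois(scheme[K] - 1, lambda, lower.tail = FALSE)          # group [a, Inf)
  }, numeric(1))
}

# Fisher information about lambda in one grouped response
fisher_info_grouped <- function(scheme, lambda) {
  p <- pois_group_probs(scheme, lambda)
  # d/dlambda of group [a, b] is dpois(a - 1) - dpois(b); dpois(-1, .) is 0,
  # and the right-censored last group has no upper end (contributes 0)
  dp <- dpois(scheme - 1, lambda) - c(dpois(scheme[-1] - 1, lambda), 0)
  keep <- p > 0
  sum(dp[keep]^2 / p[keep])
}

# Information averaged over a discrete prior on lambda (ASSUMED criterion)
prior_avg_info <- function(scheme, probs, lambdas) {
  sum(probs * vapply(lambdas, function(l) fisher_info_grouped(scheme, l),
                     numeric(1)))
}

fisher_info_grouped(c(0, 1, 2, 5, 9), lambda = 4)
prior_avg_info(c(0, 1, 2, 5, 9), probs = c(0.5, 0.5), lambdas = c(4, 5))
```

As a sanity check, recording every value in its own group (up to a far tail) recovers 1/λ, the information of a single ungrouped Poisson observation; any coarser grouping gives less.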
If `model` is `"Poisson"`, `p.lwr` and `p.upr` are ignored.
When the prior distribution is discrete, in the Poisson model `lambdas` specifies the discrete values that λ may take, and `probs` specifies the probabilities associated with these values.
In the ZIP model, `lambdas` and `ps` specify the discrete values that λ and p may take, respectively, and `probs` denotes the joint mass probabilities associated with (λ, p). The values of (`p.lwr`, `p.upr`) cannot be (0, 1), as the algorithm will not converge. Approximate values, such as (0.000001, 0.999999), can be used instead.
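For the ZIP model the Fisher information is a 2 × 2 matrix in (λ, p), and `matSc` chooses how it is reduced to a single number. The base-R sketch below builds that matrix for one response recorded under a grouping scheme, I = Σ_k ∇π_k ∇π_kᵀ / π_k, and evaluates the textbook A-, D- and E-criteria (trace of the inverse, determinant, smallest eigenvalue). It assumes that p is the probability of a structural zero, so that the group containing 0 receives the extra mass p; the package's own parameterization and the exact form of each criterion are specified in the cited paper and may differ. All function names are hypothetical.

```r
# Hypothetical helpers (not GRCdata functions).
pois_group_probs <- function(scheme, lambda) {
  K <- length(scheme)
  vapply(seq_len(K), function(k) {
    if (k < K) sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))
    else ppois(scheme[K] - 1, lambda, lower.tail = FALSE)
  }, numeric(1))
}

# Fisher information matrix of (lambda, p) for one grouped ZIP response,
# ASSUMING P(group k) = (1 - p) * poisson_k + p * [k == 1].
zip_info_matrix <- function(scheme, lambda, p) {
  K <- length(scheme)
  pois <- pois_group_probs(scheme, lambda)
  e1 <- c(1, rep(0, K - 1))            # indicator of the group containing 0
  probs <- (1 - p) * pois + p * e1
  # d poisson_k / d lambda = dpois(a - 1) - dpois(b) for group [a, b]
  d_pois <- dpois(scheme - 1, lambda) - c(dpois(scheme[-1] - 1, lambda), 0)
  grad <- cbind(lambda = (1 - p) * d_pois, p = e1 - pois)
  keep <- probs > 0
  crossprod(grad[keep, , drop = FALSE] / sqrt(probs[keep]))  # sum of g g' / pi
}

# Textbook optimality criteria of an information matrix
design_criteria <- function(info) {
  ev <- eigen(info, symmetric = TRUE, only.values = TRUE)$values
  c(A = sum(1 / ev),  # trace of the inverse: smaller is better
    D = prod(ev),     # determinant: larger is better
    E = min(ev))      # smallest eigenvalue: larger is better
}

info <- zip_info_matrix(c(0, 1, 2, 5, 9), lambda = 4, p = 0.3)
design_criteria(info)
```

In the Poisson model the information is a scalar, so the three criteria rank grouping schemes identically.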
A sufficiently large integer `M` should be provided by the user so that the search algorithm can handle the infinitely many possible grouping schemes. In theory, `M` is the lowest integer contained in the last (right-censored) group of the global optimal grouping scheme. In practice, `M` should be chosen slightly higher than this theoretical value, because the search algorithm is designed never to accept a false optimal solution, at the cost of sometimes rejecting the correct optimal grouping scheme.
This idea is implemented through the logical indicator `succeed` in the output. Its value is `TRUE` if the true optimal grouping scheme has been identified. A value of `FALSE` means that `M` is not large enough to guarantee that the grouping scheme returned by the search algorithm is the global optimal grouping scheme; researchers then need to select a larger `M` and repeat the process until `succeed` becomes `TRUE`. Alternatively, users may use the `"auto"` option so that this iterative process is carried out automatically.
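To illustrate how a bound such as `M` turns an infinite search into a finite one, the sketch below enumerates every N-group scheme whose starting points (after the leading 0) do not exceed M, optionally forcing 0 into its own group, and keeps the scheme with the largest prior-averaged Fisher information under a discrete Poisson prior. This is a naive exhaustive search written for illustration only. It is not the search algorithm of `find.scheme`, it has no `succeed` check or automatic choice of M, and the prior-averaged criterion is an assumption. All names are hypothetical.

```r
# Hypothetical helpers (not GRCdata functions); Poisson model only.
pois_group_probs <- function(scheme, lambda) {
  K <- length(scheme)
  vapply(seq_len(K), function(k) {
    if (k < K) sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))
    else ppois(scheme[K] - 1, lambda, lower.tail = FALSE)
  }, numeric(1))
}

fisher_info_grouped <- function(scheme, lambda) {
  p <- pois_group_probs(scheme, lambda)
  dp <- dpois(scheme - 1, lambda) - c(dpois(scheme[-1] - 1, lambda), 0)
  keep <- p > 0
  sum(dp[keep]^2 / p[keep])
}

# Every N-group scheme that starts at 0 and whose other starting points lie in 1..M.
# Splitting a group never lowers Fisher information, so when at most N groups
# are allowed it suffices to search schemes with exactly N groups.
enumerate_schemes <- function(N, M, is.0.isolated) {
  fixed <- if (is.0.isolated) c(0, 1) else 0
  cands <- setdiff(seq_len(M), fixed)
  k <- N - length(fixed)
  if (k < 0 || k > length(cands)) return(list())
  if (k == 0) return(list(fixed))
  idx <- combn(length(cands), k)  # each column: positions of one set of cut points
  lapply(seq_len(ncol(idx)), function(j) c(fixed, cands[idx[, j]]))
}

# Exhaustive search maximizing the prior-averaged information (ASSUMED criterion)
search_scheme <- function(N, M, probs, lambdas, is.0.isolated = TRUE) {
  schemes <- enumerate_schemes(N, M, is.0.isolated)
  stopifnot(length(schemes) > 0)
  score <- vapply(schemes, function(s) {
    sum(probs * vapply(lambdas, function(l) fisher_info_grouped(s, l), numeric(1)))
  }, numeric(1))
  best <- which.max(score)
  list(scheme = schemes[[best]], value = score[best],
       n.candidates = length(schemes))
}

res <- search_scheme(N = 3, M = 7, probs = c(0.5, 0.5), lambdas = c(4, 5),
                     is.0.isolated = FALSE)
res$scheme
```

The number of candidates grows combinatorially with M, which is why a practical search needs a principled bound rather than brute force.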
The returned value is a list with the following components:
Component | Description |
---|---|
`best.scheme.compact`, `best.scheme.loose`, `best.scheme.innerCode` | the same optimal grouping scheme, printed in different forms. |
`succeed` | a logical variable (see Details). The global optimal grouping scheme has been obtained if it is `TRUE`. |
Xin Guo <[email protected]>, Qiang Fu <[email protected]>
Qiang Fu, Xin Guo and Kenneth C. Land. Conditionally accepted. "Optimizing Count Responses in Surveys: A Machine-Learning Approach." Sociological Methods & Research.
```r
# Example 1 ####################################
# M = 7, N = 3; 0 is not required to be in a separate group.
# Poisson model; lambda takes the values 4 and 5, each with probability 0.5.
find.scheme(probs = c(0.5, 0.5), lambdas = c(4, 5), M = 7, N = 3,
            is.0.isolated = FALSE, model = "Poisson")

# Example 2 ####################################
# N = 3; 0 is required to be in a separate group.
# Poisson model; lambda takes the values 4 and 5, each with probability 0.5.
# M is not given, so it is selected automatically.
find.scheme(probs = c(0.5, 0.5), lambdas = c(4, 5), N = 3,
            is.0.isolated = TRUE, model = "Poisson")

# Example 3 ####################################
# M = 7, N = 3; 0 is not required to be in a separate group.
# ZIP model; (lambda, p) takes (4, 0.3) and (5, 0.5),
# with probabilities c(0.5, 0.5).
find.scheme(probs = c(0.5, 0.5), lambdas = c(4, 5), ps = c(0.3, 0.5),
            M = 7, N = 3, is.0.isolated = FALSE, model = "ZIP")

# Example 4 ####################################
# N = 3; 0 is not required to be in a separate group.
# Poisson model; lambda follows a normal distribution (mean 3, sd 1)
# truncated to [1, 10].
# M is not given, so it is selected automatically.
find.scheme(densityFUN = function(lambda) dnorm(lambda, mean = 3, sd = 1),
            lambda.lwr = 1, lambda.upr = 10, N = 3,
            is.0.isolated = FALSE, model = "Poisson")

# Example 5 ####################################
# M = 7, N = 3; 0 is required to be in a separate group.
# Poisson model; lambda follows a normal distribution (mean 3, sd 1)
# truncated to [1, 10].
find.scheme(densityFUN = function(lambda) dnorm(lambda, mean = 3, sd = 1),
            lambda.lwr = 1, lambda.upr = 10, M = 7, N = 3,
            is.0.isolated = TRUE, model = "Poisson")

# Example 6 ####################################
# N = 3; 0 is required to be in a separate group.
# Poisson model; lambda follows a uniform distribution on [1, 10].
# M is not given, so it is selected automatically.
find.scheme(densityFUN = function(lambda) dunif(lambda, min = 1, max = 10),
            lambda.lwr = 1, lambda.upr = 10, N = 3,
            is.0.isolated = TRUE, model = "Poisson")

# Example 7 ####################################
# M = 7, N = 3; 0 is required to be in a separate group.
# ZIP model; (lambda, p) follows a uniform distribution with lambda on [1, 10]
# and p on [0.0001, 0.9999], an approximation of (0, 1).
find.scheme(densityFUN = function(...) 1, lambda.lwr = 1, lambda.upr = 10,
            p.lwr = 0.0001, p.upr = 0.9999, M = 7, N = 3,
            is.0.isolated = TRUE, model = "ZIP")

# Example 8 ####################################
# M = 7, N = 3; 0 is required to be in a separate group.
# ZIP model; (lambda, p) follows a normal distribution centered at (5.5, 0.5)
# with covariance matrix
#   / 11/3    3   \
#   \  3    11/3  /
# truncated to [1, 10] x [0.1, 0.9].
# Note: this example may take several minutes to converge,
# depending on your computer configuration.
dsty <- function(lambda, p) {
  vec <- c(lambda - 5.5, p - 0.5)
  mat <- matrix(c(11/3, 3, 3, 11/3), nrow = 2, ncol = 2)
  # unnormalized bivariate normal density: exp(-0.5 * t(vec) %*% solve(mat) %*% vec)
  pw <- -0.5 * sum(vec * solve(mat, vec))
  return(exp(pw))
}
find.scheme(densityFUN = dsty, lambda.lwr = 1, lambda.upr = 10,
            p.lwr = 0.1, p.upr = 0.9, M = 7, N = 3,
            is.0.isolated = TRUE, model = "ZIP")
```
The function `grcmle` infers Poisson or zero-inflated Poisson (ZIP) parameters from grouped and right-censored count data, and conducts a chi-squared goodness-of-fit test. A grouped and right-censored scheme may look like
0, 1, 2-4, 5-8, 9+.
For grouped and right-censored count data collected in a survey, such as the frequency of alcohol drinking, the number of births, or the occurrence of crimes, the response categories in the example above mean never, once, 2 to 4 times, 5 to 8 times, and 9 or more times. The frequency distribution of a sample under this scheme may look like
3, 15, 168, 155, 15.
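To make the scheme notation concrete, the base-R sketch below maps the starting points c(0, 1, 2, 5, 9) to the five groups above, computes each group's probability under a Poisson model with a given λ, and converts these into expected counts for a sample of the observed size. The zero-inflation parameter is an assumption of this sketch: it treats p as the probability of a structural zero, so P(Y = 0) = p + (1 - p)e^(-λ); the package's own parameterization is given in the cited paper. The helper name `scheme_probs` is hypothetical and not a GRCdata function.

```r
# Hypothetical helper: probability of each group of a grouped and right-censored
# scheme. `scheme` holds the lowest integer of each group, so c(0, 1, 2, 5, 9)
# encodes 0, 1, 2-4, 5-8, 9+. `p` is ASSUMED to be a structural-zero probability.
scheme_probs <- function(scheme, lambda, p = 0) {
  stopifnot(scheme[1] == 0, all(diff(scheme) > 0), lambda > 0, p >= 0, p < 1)
  K <- length(scheme)
  pois <- vapply(seq_len(K), function(k) {
    if (k < K) {
      # finite group: Poisson masses from scheme[k] to scheme[k + 1] - 1
      sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))
    } else {
      # right-censored last group: P(X >= scheme[K]) from the upper tail
      ppois(scheme[K] - 1, lambda, lower.tail = FALSE)
    }
  }, numeric(1))
  # zero inflation: scale the Poisson part and add the extra zeros to group 1,
  # which is the group containing 0
  (1 - p) * pois + c(p, rep(0, K - 1))
}

scheme <- c(0, 1, 2, 5, 9)
counts <- c(3, 15, 168, 155, 15)
pr <- scheme_probs(scheme, lambda = 4)
expected <- sum(counts) * pr   # expected frequencies for 356 respondents
round(expected, 1)
```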
grcmle(counts, scheme, method = c("Poisson", "ZIP"), do.plot = T, init.guess = NULL, optimizing.algorithm.index = 2, lambda.extend.ratio = 3, conf.level = 0.95)
Argument | Description |
---|---|
`counts` | the frequency distribution of the grouped and right-censored count data. For the example above, one may input `counts = c(3, 15, 168, 155, 15)`. |
`scheme` | the grouping scheme, given as a vector of integers containing the starting point (the lowest integer) of each group. For example, the scheme above may be input as `scheme = c(0, 1, 2, 5, 9)`. |
`method` | a string specifying which statistical model to use. Currently there are two options: `"Poisson"` and `"ZIP"`. |
`do.plot` | a logical variable indicating whether to plot the log likelihood. The default is `TRUE`. |
`init.guess` | the initial value used by the optimization procedure of the likelihood estimation. The default value is `NULL`. |
`optimizing.algorithm.index` | defines which optimization algorithm to use; the default is `2`. For details of these algorithms, please see the manual of the R package … |
`lambda.extend.ratio` | specifies the search interval of possible values of λ; the default is `3`. |
`conf.level` | the confidence level of the confidence interval(s) for the inferred parameter(s); the default is `0.95`. |
Maximum likelihood estimation is used for the inference.
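For grouped and right-censored counts, a natural likelihood is the multinomial one, Σ_k n_k log π_k(λ), where n_k is the observed frequency of group k and π_k(λ) its probability. The base-R sketch below maximizes this for the Poisson model with `optimize()`. It is an illustration, not the code used by `grcmle`; the helper names and the search interval (0, 3 × max(scheme)] are assumptions of this sketch.

```r
# Hypothetical helpers for illustration; not the code used by grcmle.
pois_group_probs <- function(scheme, lambda) {
  K <- length(scheme)
  vapply(seq_len(K), function(k) {
    if (k < K) sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))  # group [a, b]
    else ppois(scheme[K] - 1, lambda, lower.tail = FALSE)          # group [a, Inf)
  }, numeric(1))
}

# Multinomial log-likelihood of the observed group frequencies
grouped_loglik <- function(lambda, counts, scheme) {
  sum(counts * log(pois_group_probs(scheme, lambda)))
}

# One-dimensional maximization over an ASSUMED search interval (0, upper]
grouped_poisson_mle <- function(counts, scheme, upper = 3 * max(scheme)) {
  fit <- optimize(grouped_loglik, interval = c(1e-8, upper),
                  counts = counts, scheme = scheme, maximum = TRUE)
  list(lambda = fit$maximum, loglik = fit$objective)
}

fit <- grouped_poisson_mle(counts = c(3, 15, 168, 155, 15),
                           scheme = c(0, 1, 2, 5, 9))
fit$lambda
```

The ZIP model adds a second parameter, so a two-dimensional optimizer such as `optim()` would replace `optimize()` there.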
The returned value is a list containing:
Component | Description |
---|---|
`mle` | the inferred parameter(s). For the Poisson model, it is the estimate of λ; for the ZIP model, it contains the estimates of λ and p. |
`p.value` | the p-value of the chi-squared goodness-of-fit test. |
`df` | the degrees of freedom of the chi-squared goodness-of-fit test. |
`CI.lambda` | the confidence interval of λ. |
`CI.p` | the confidence interval of p (ZIP model). |
`conf.level` | the confidence level. |
`std.err` | the standard error(s) of the estimate(s). |
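Quantities analogous to `mle`, `std.err`, `CI.lambda`, `df` and `p.value` can be computed in base R for the Poisson model as sketched below. This sketch assumes a Pearson chi-square statistic with K - 1 - (number of estimated parameters) degrees of freedom and a Wald interval based on the observed information from a numerical second derivative; `grcmle` may compute these quantities differently, and all names here are hypothetical.

```r
# Hypothetical helpers for illustration; grcmle's own computations may differ.
pois_group_probs <- function(scheme, lambda) {
  K <- length(scheme)
  vapply(seq_len(K), function(k) {
    if (k < K) sum(dpois(scheme[k]:(scheme[k + 1] - 1), lambda))
    else ppois(scheme[K] - 1, lambda, lower.tail = FALSE)
  }, numeric(1))
}

grouped_loglik <- function(lambda, counts, scheme) {
  sum(counts * log(pois_group_probs(scheme, lambda)))
}

grouped_poisson_summary <- function(counts, scheme, conf.level = 0.95) {
  # ML estimate over an ASSUMED search interval
  fit <- optimize(grouped_loglik, interval = c(1e-8, 3 * max(scheme)),
                  counts = counts, scheme = scheme, maximum = TRUE)
  lambda_hat <- fit$maximum
  # Observed information from a central finite-difference second derivative
  h <- 1e-4 * max(1, lambda_hat)
  d2 <- (grouped_loglik(lambda_hat + h, counts, scheme) - 2 * fit$objective +
           grouped_loglik(lambda_hat - h, counts, scheme)) / h^2
  std_err <- sqrt(-1 / d2)
  z <- qnorm(1 - (1 - conf.level) / 2)
  # Pearson chi-square goodness of fit; one parameter (lambda) is estimated
  expected <- sum(counts) * pois_group_probs(scheme, lambda_hat)
  statistic <- sum((counts - expected)^2 / expected)
  df <- length(counts) - 1 - 1
  list(mle = lambda_hat, std.err = std_err,
       CI.lambda = lambda_hat + c(-1, 1) * z * std_err,
       conf.level = conf.level, statistic = statistic, df = df,
       p.value = pchisq(statistic, df = df, lower.tail = FALSE))
}

res <- grouped_poisson_summary(counts = c(3, 15, 168, 155, 15),
                               scheme = c(0, 1, 2, 5, 9))
res[c("mle", "CI.lambda", "df", "p.value")]
```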
Authors: Xin Guo <[email protected]>, Qiang Fu <[email protected]>
```r
grcmle(counts = c(6, 15, 168, 155, 15), scheme = c(0, 1, 2, 5, 9))
grcmle(counts = c(6, 15, 168, 155, 15), scheme = c(0, 1, 2, 5, 9), method = "ZIP")
```