Title: | Generalized Linear Density Ratio Models |
---|---|
Description: | Fits a generalized linear density ratio model (GLDRM). A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known. Huang (2014) <doi:10.1080/01621459.2013.824892>. Huang and Rathouz (2012) <doi:10.1093/biomet/asr075>. Rathouz and Gao (2008) <doi:10.1093/biostatistics/kxn030>. |
Authors: | Michael Wurm [aut, cre], Paul Rathouz [aut] |
Maintainer: | Michael Wurm <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.6 |
Built: | 2024-12-04 07:30:38 UTC |
Source: | CRAN |
update algorithmThis function returns control arguments for the update algorithm.
Each argument has a default value, which will be used unless a different
value is provided by the user.
beta.control(eps = 1e-10, maxiter = 1, maxhalf = 10)
beta.control(eps = 1e-10, maxiter = 1, maxhalf = 10)
eps |
Convergence threshold. The update has converged when the relative
change in log-likelihood between iterations is less than |
maxiter |
Maximum number of iterations allowed. |
maxhalf |
Maximum number of half steps allowed per iteration if log-likelihood does not improve. |
Object of S3 class "betaControl", which is a list of control arguments.
This function returns control arguments for the update algorithm.
Each argument has a default value, which will be used unless a different
value is provided by the user.
f0.control(eps = 1e-10, maxiter = 1000, maxhalf = 20, maxlogstep = 2)
f0.control(eps = 1e-10, maxiter = 1000, maxhalf = 20, maxlogstep = 2)
eps |
Convergence threshold. The update has converged when the relative
change in log-likelihood between iterations is less than |
maxiter |
Maximum number of iterations allowed. |
maxhalf |
Maximum number of half steps allowed per iteration if log-likelihood does not improve between iterations. |
maxlogstep |
Maximum optimization step size allowed on the
|
Object of S3 class "f0Control", which is a list of control arguments.
A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known.
gldrm( formula, data = NULL, link = "identity", mu0 = NULL, offset = NULL, gldrmControl = gldrm.control(), thetaControl = theta.control(), betaControl = beta.control(), f0Control = f0.control() )
gldrm( formula, data = NULL, link = "identity", mu0 = NULL, offset = NULL, gldrmControl = gldrm.control(), thetaControl = theta.control(), betaControl = beta.control(), f0Control = f0.control() )
formula |
An object of class "formula". |
data |
An optional data frame containing the variables in the model. |
link |
Link function. Can be a character string to be passed to the
|
mu0 |
Mean of the reference distribution. The reference distribution is
not unique unless its mean is restricted to a specific value. This value can
be any number within the range of observed values, but values near the boundary
may cause numerical instability. This is an optional argument with |
offset |
Known component of the linear term. Offset must be passed through
this argument - offset terms in the formula will be ignored.
value and covariate values. If sampling weights are a function of both the
response value and covariates, then |
gldrmControl |
Optional control arguments.
Passed as an object of class "gldrmControl", which is constructed by the
|
thetaControl |
Optional control arguments for the theta update procedure.
Passed as an object of class "thetaControl", which is constructed by the
|
betaControl |
Optional control arguments for the beta update procedure.
Passed as an object of class "betaControl", which is constructed by the
|
f0Control |
Optional control arguments for the |
The arguments linkfun
, linkinv
, and mu.eta
mirror the "link-glm" class. Objects of this class can be created with the
stats::make.link
function.
The "gldrm" class is a list of the following items.
conv
Logical indicator for whether the gldrm algorithm
converged within the iteration limit.
iter
Number of iterations used. A single iteration is a beta
update, followed by an f0
update.
llik
Semiparametric log-likelihood of the fitted model.
beta
Vector containing the regression coefficient estimates.
mu
Vector containing the estimated mean response value for each
observation in the training data.
eta
Vector containing the estimated linear combination of
covariates for each observation.
f0
Vector containing the semiparametric estimate of the reference
distribution, evaluated at the observed response values. The values of correspond
to the support values, sorted in increasing order.
spt
Vector containing the unique observed response values, sorted in
increasing order.
mu0
Mean of the estimated semiparametric reference distribution.
The mean of the reference distribution must be fixed at a value in order for
the model to be identifiable. It can be fixed at any value within the range
of observed response values, but the gldrm
function assigns mu0
to be the mean of the observed response values.
varbeta
Estimated variance matrix of the regression coefficients.
seBeta
Standard errors for . Equal to
sqrt(diag(varbeta))
.
seMu
Standard errors for computed from
varbeta
.
seEta
Standard errors for computed from
varbeta
.
theta
Vector containing the estimated tilt parameter for each observation.
The tilted density function of the response variable is given by
bPrime
is a vector containing the mean of the tilted distribution,
, for each observation.
bPrime
should match mu
,
except in cases where theta
is capped for numerical stability.
bPrime2
is a vector containing the variance of the tilted
distribution, , for each observation.
fTilt
is a vector containing the semiparametric fitted probability,
, for each observation. The semiparametric
log-likelihood is equal to
sampprobs
If sampling probabilities were passed through the
sampprobs
argument, then they are returned here in matrix form.
Each row corresponds to an observation.
llikNull
Log-likelihood of the null model with no covariates.
lr.stat
Likelihood ratio test statistic comparing fitted model to
the null model. It is calculated as .
The asymptotic distribution is F(p-1, n-p) under the null hypothesis.
lr.pval
P-value of the likelihood ratio statistic.
fTiltMatrix
is a matrix containing the semiparametric density for
each observation, i.e. for each unique
y
value.
This is a matrix with nrow equal to the number of observations and ncol equal
to the number of unique response values observed.
Only returned if returnfTilt = TRUE
in the gldrmControl arguments.
score.logf0
Score function for log(f0)
.
Only returned if returnf0ScoreInfo = TRUE
in the gldrmControl arguments.
info.logf0
Information matrix for log(f0)
.
Only returned if returnf0ScoreInfo = TRUE
in the gldrmControl arguments.
formula
Model formula.
data
Model data frame.
link
Link function. If a character string was passed to the
link
argument, then this will be an object of class "link-glm".
Otherwise, it will be the list of three functions passed to the link
argument.
An S3 object of class "gldrm". See details.
data(iris, package="datasets") # Fit a gldrm with log link fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") fit # Fit a gldrm with custom link function link <- list() link$linkfun <- function(mu) log(mu)^3 link$linkinv <- function(eta) exp(eta^(1/3)) link$mu.eta <- function(eta) exp(eta^(1/3)) * 1/3 * eta^(-2/3) fit2 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link=link) fit2
data(iris, package="datasets") # Fit a gldrm with log link fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") fit # Fit a gldrm with custom link function link <- list() link$linkfun <- function(mu) log(mu)^3 link$linkinv <- function(eta) exp(eta^(1/3)) link$mu.eta <- function(eta) exp(eta^(1/3)) * 1/3 * eta^(-2/3) fit2 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link=link) fit2
gldrm
algorithmThis function returns control arguments for the gldrm
algorithm.
Each argument has a default value, which will be used unless a different
value is provided by the user.
gldrm.control( eps = 1e-10, maxiter = 100, returnfTiltMatrix = TRUE, returnf0ScoreInfo = FALSE, print = FALSE, betaStart = NULL, f0Start = NULL )
gldrm.control( eps = 1e-10, maxiter = 100, returnfTiltMatrix = TRUE, returnf0ScoreInfo = FALSE, print = FALSE, betaStart = NULL, f0Start = NULL )
eps |
Convergence threshold. The fitting algorithm has converged when the
relative change in log-likelihood between iterations is less than |
maxiter |
Maximum number of iterations allowed. |
returnfTiltMatrix |
Logical. Return nonparametric fitted probabilities for each observation. This is a matrix with nrow equal to the number of observations and ncol equal to the number of unique response values observed. |
returnf0ScoreInfo |
Logical. If |
print |
Logical. If |
betaStart |
Optional vector of starting values for |
f0Start |
Optional vector of starting values for |
Object of S3 class "gldrmControl", which is a list of control arguments.
Calculates a Wald, likelihood ratio, or score confidence interval for a single gldrm coefficient. Also calculates upper or lower confidence bounds. Wald confidence intervals and bounds are calculated from the standard errors which are available from the gldrm model fit. For likelihood ratio and score intervals and bounds, a bisection search method is used, which takes longer to run.
gldrmCI( gldrmFit, term, test = c("Wald", "LRT", "Score"), level = 0.95, type = c("2-sided", "lb", "ub"), eps = 1e-10, maxiter = 100 )
gldrmCI( gldrmFit, term, test = c("Wald", "LRT", "Score"), level = 0.95, type = c("2-sided", "lb", "ub"), eps = 1e-10, maxiter = 100 )
gldrmFit |
A gldrm model fit. Must be an S3 object of class "gldrm",
returned from the |
term |
Character string containing the name of the coefficient of interest. The coefficient names are the names of the beta component of the fitted model object. They can also be obtained from the printed model output. Usually the names match the formula syntax, but can be more complicated for categorical variables and interaction terms. |
test |
Character string for the type confidence interval. Options are "Wald", "LRT" (for likelihood ratio), and "Score". |
level |
Confidence level of the interval. Should be between zero and one. |
type |
Character string containing "2-sided" for a two-sided confidence interval, "lb" for a lower bound, or "ub" for an upper bound. |
eps |
Convergence threshold. Only applies for
|
maxiter |
The maximum number of bisection method iterations for likelihood
ratio intervals or bounds. For two-sided intervals, |
An S3 object of class 'gldrmCI', which is a list of the following items.
term
Coefficient name.
test
Type of interval or bound - Wald or likelihood ratio.
level
Confidence level.
type
Type of interval or bound - two-sided, upper bound, or lower
bound.
cilo
/cihi
Upper and lower interval bounds. One one of the
two applies for confidence bounds.
iterlo
/iterhi
Number of bisection iterations used. Only
applies for likelihood ratio intervals and bounds.
pvallo
/pvalhi
For likelihood ratio intervals and bounds,
the p-value at convergence is reported.
conv
Indicator for whether the confidence interval limit or bound
converged.
data(iris, package="datasets") ### Fit gldrm with all variables fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") ### Wald 95% confidence interval for Sepal.Width ci <- gldrmCI(fit, "Sepal.Width", test="Wald", level=.95, type="2-sided") ci
data(iris, package="datasets") ### Fit gldrm with all variables fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") ### Wald 95% confidence interval for Sepal.Width ci <- gldrmCI(fit, "Sepal.Width", test="Wald", level=.95, type="2-sided") ci
Performs a likelihood ratio F-test between nested gldrm models.
The F-statistic is calculated as , where
is the difference is the number of parameters between the full and null
models. The F-statistic has degrees of freedom
and
, where
is the number of observations and
is the number of parameters
in the full model.
gldrmLRT(gldrmFit, gldrmNull)
gldrmLRT(gldrmFit, gldrmNull)
gldrmFit |
The full model. Must be an object of S3 class 'gldrm' returned from
the |
gldrmNull |
The sub-model being tested under the null hypotheses.
Must be an object of S3 class 'gldrm' returned from the |
An S3 object of class 'gldrmLRT', containing numerator and denominator degrees of freedom, an F-statistic, and a p-value.
data(iris, package="datasets") ### Fit gldrm with all variables fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") ### Fit gldrm without the categorical variable "Species" fit0 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data=iris, link="log") ### Likelihood ratio test for the nested models lrt <- gldrmLRT(fit, fit0) lrt
data(iris, package="datasets") ### Fit gldrm with all variables fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") ### Fit gldrm without the categorical variable "Species" fit0 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data=iris, link="log") ### Likelihood ratio test for the nested models lrt <- gldrmLRT(fit, fit0) lrt
Plots and returns the randomized probability inverse transform of a fitted gldrm.
gldrmPIT( gldrmFit, nbreaks = 7, cex.main = NULL, cex.lab = NULL, cex.axis = NULL )
gldrmPIT( gldrmFit, nbreaks = 7, cex.main = NULL, cex.lab = NULL, cex.axis = NULL )
gldrmFit |
A gldrm model fit. Must be an S3 object of class "gldrm",
returned from the |
nbreaks |
Number of breaks in the histogram. |
cex.main |
Text size for main titles. |
cex.lab |
Text size for axis labels. |
cex.axis |
Text size for axis numbers. |
The probability inverse transform is defined generally as ,
which is the fitted conditional cdf of each observation evaluated at the
observed response value. In the case of gldrm, the fitted cdf is descrete, so
we draw a random value from a uniform distribution on the interval
(
,
), where
is the next largest
observed support less than
(or -Infinity if
is the minimum
support value). The output and plots generated by this function will vary
slightly each time it is called (unless the random number generator seed is
set beforehand).
Randomized robability inverse transform as a vector. Also plots the histogram and uniform QQ plot.
data(iris, package="datasets") ### Fit gldrm and return fTiltMatrix fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") # Probability inverse transform plot gldrmPIT(fit)
data(iris, package="datasets") ### Fit gldrm and return fTiltMatrix fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris, link="log") # Probability inverse transform plot gldrmPIT(fit)
Obtains predicted probabilities, predicted class, or linear predictors.
## S3 method for class 'gldrm' predict( object, newdata = NULL, type = c("link", "response", "terms", "fTilt"), se.fit = FALSE, offset = NULL, ... )
## S3 method for class 'gldrm' predict( object, newdata = NULL, type = c("link", "response", "terms", "fTilt"), se.fit = FALSE, offset = NULL, ... )
object |
S3 object of class "gldrm", returned from the |
newdata |
Optional data frame. If NULL, fitted values will be obtained for the training data. |
type |
The type of prediction required. Type "link" returns the linear
predictor. Type "response" returns the fitted mean. Type "terms" returns
a matrix giving the fitted values of each term in the model formula on the
linear predictor scale. Type "fTilt" returns a matrix containing the
fitted nonparametric distribution for each observation. Each row of the matrix
corresponds to an observation in |
se.fit |
Logical. If TRUE, standard errors are also returned. Does not apply
for |
offset |
Optional offset vector. Only used if |
... |
Not used. Additional predict arguments. |
The object returned depends on type
.
Prints fitted coefficients and standard errors, along with a likelihood ratio test against the null model.
## S3 method for class 'gldrm' print(x, digits = 3, ...)
## S3 method for class 'gldrm' print(x, digits = 3, ...)
x |
S3 object of class "gldrm", returned from the |
digits |
Number of digits for rounding. |
... |
Unused. Additional arguments for print method. |
Print method for gldrmCI objects.
## S3 method for class 'gldrmCI' print(x, digits = 3, ...)
## S3 method for class 'gldrmCI' print(x, digits = 3, ...)
x |
An S3 object of class 'gldrmCI'. |
digits |
Number of digits for rounding. |
... |
Not used. Additional arguments for print method. |
Print method for gldrmLRT objects. Prints results of a likelihood ratio F-test between nested models.
## S3 method for class 'gldrmLRT' print(x, digits = 3, ...)
## S3 method for class 'gldrmLRT' print(x, digits = 3, ...)
x |
S3 object of class 'gldrmLRT', returned from the |
digits |
Number of digits for rounding. |
... |
Not used. Additional arguments for print method. |
update algorithmThis function returns control arguments for the update algorithm.
Each argument has a default value, which will be used unless a different
value is provided by the user.
theta.control( eps = 1e-10, maxiter = 100, maxhalf = 20, maxtheta = 500, logit = TRUE, logsumexp = FALSE )
theta.control( eps = 1e-10, maxiter = 100, maxhalf = 20, maxtheta = 500, logit = TRUE, logsumexp = FALSE )
eps |
Convergence threshold for theta updates. Convergence is
evaluated separately for each observation. An observation has converged when
the difference between |
maxiter |
Maximum number of iterations. |
maxhalf |
Maximum number of half steps allowed per iteration if the convergence criterion does not improve. |
maxtheta |
Absolute value of theta is not allowed to exceed |
logit |
Logical for whether logit transformation should be used. Use of this stabilizing transformation appears to be faster in general. Default is TRUE. |
logsumexp |
Logical argument for whether log-sum-exp trick should be used. This may improve numerical stability at the expense of computational time. |
Object of S3 class "thetaControl", which is a list of control arguments.