Title: | Robust Generalized Linear Models (GLM) using Mixtures |
---|---|
Description: | Robust generalized linear models (GLM) using a mixture method, as described in Beath (2018) <doi:10.1080/02664763.2017.1414164>. This assumes that the data are a mixture of standard observations, being a generalised linear model, and outlier observations from an overdispersed generalized linear model. The overdispersed linear model is obtained by including a normally distributed random effect in the linear predictor of the generalized linear model. |
Authors: | Ken Beath [aut, cre] |
Maintainer: | Ken Beath <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2-4 |
Built: | 2024-10-28 06:38:33 UTC |
Source: | CRAN |
Robust generalized linear models (GLM) using a mixture method, as described in Beath (2018) <doi:10.1080/02664763.2017.1414164>.
This is the main function that allows fitting the models. The robmixglm objects may be tested for outliers using outlierTest. The results of test.outliers may also be plotted.
Ken Beath <[email protected]>
Beath, K. J. A mixture-based approach to robust analysis of generalised linear models, Journal of Applied Statistics, 45(12), 2256-2268 (2018) DOI: 10.1080/02664763.2017.1414164
# for the following cores is set to 1 to satisfy the CRAN testing requirements # removing will reduce the time taken depending on number of cores available # animal brain vs body weight library(MASS) data(Animals) Animals$logbrain <- log(Animals$brain) Animals$logbody <- log(Animals$body) lm1 <- lm(logbrain~logbody, data = Animals) lm2 <- robmixglm(logbrain~logbody, data = Animals, cores = 1) plot(Animals$logbody, Animals$logbrain) abline(lm1, col = "red") abline(lm2, col = "green") plot(outlierProbs(lm2)) outlierTest(lm2, cores = 1) # Forbes data on relationship between atmospheric pressure and boiling point of water library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = MASS::forbes, cores = 1) summary(forbes.robustmix) plot(outlierProbs(forbes.robustmix)) outlierTest(forbes.robustmix, cores = 1) # diabetes diabdata.robustmix <- robmixglm(glyhb~age+gender+bmi+waisthip+frame, data = diabdata, cores = 1) summary(diabdata.robustmix) # this will take about 5-10 minutes diabdata.step <- step(diabdata.robustmix, glyhb~age+gender+bmi+waisthip+frame) summary(diabdata.step) plot(outlierProbs(diabdata.step)) outlierTest(diabdata.step, cores = 1) # Hawkins' data library(forward) data(hawkins) hawkins.robustmix <- robmixglm(y~x1+x2+x3+x4+x5+x6+x7+x8, cores = 1, data=hawkins) summary(hawkins.robustmix) plot(outlierProbs(hawkins.robustmix)) outlierTest(hawkins.robustmix, cores = 1) # carrot damage library(robustbase) data(carrots) carrots.robustmix <- robmixglm(cbind(success, total-success)~logdose+factor(block), family = "binomial", data = carrots, cores = 1) summary(carrots.robustmix) plot(outlierProbs(carrots.robustmix)) outlierTest(carrots.robustmix, cores = 1) # train derailment library(forward) data(derailme) derailme$cYear <- derailme$Year-mean(derailme$Year) derailme$TrainKm100 <- derailme$TrainKm*100.0 derailme.robustmix <- robmixglm(y~cYear+factor(Type), offset = log(TrainKm100), family = "truncpoisson", quadpoints = 51, data = derailme, cores = 1) summary(derailme.robustmix) plot(outlierProbs(derailme.robustmix)) outlierTest(derailme.robustmix, cores = 1) # hospital costs hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix) plot(outlierProbs(hospcosts.robustmix)) outlierTest(hospcosts.robustmix, cores = 1)
# for the following cores is set to 1 to satisfy the CRAN testing requirements # removing will reduce the time taken depending on number of cores available # animal brain vs body weight library(MASS) data(Animals) Animals$logbrain <- log(Animals$brain) Animals$logbody <- log(Animals$body) lm1 <- lm(logbrain~logbody, data = Animals) lm2 <- robmixglm(logbrain~logbody, data = Animals, cores = 1) plot(Animals$logbody, Animals$logbrain) abline(lm1, col = "red") abline(lm2, col = "green") plot(outlierProbs(lm2)) outlierTest(lm2, cores = 1) # Forbes data on relationship between atmospheric pressure and boiling point of water library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = MASS::forbes, cores = 1) summary(forbes.robustmix) plot(outlierProbs(forbes.robustmix)) outlierTest(forbes.robustmix, cores = 1) # diabetes diabdata.robustmix <- robmixglm(glyhb~age+gender+bmi+waisthip+frame, data = diabdata, cores = 1) summary(diabdata.robustmix) # this will take about 5-10 minutes diabdata.step <- step(diabdata.robustmix, glyhb~age+gender+bmi+waisthip+frame) summary(diabdata.step) plot(outlierProbs(diabdata.step)) outlierTest(diabdata.step, cores = 1) # Hawkins' data library(forward) data(hawkins) hawkins.robustmix <- robmixglm(y~x1+x2+x3+x4+x5+x6+x7+x8, cores = 1, data=hawkins) summary(hawkins.robustmix) plot(outlierProbs(hawkins.robustmix)) outlierTest(hawkins.robustmix, cores = 1) # carrot damage library(robustbase) data(carrots) carrots.robustmix <- robmixglm(cbind(success, total-success)~logdose+factor(block), family = "binomial", data = carrots, cores = 1) summary(carrots.robustmix) plot(outlierProbs(carrots.robustmix)) outlierTest(carrots.robustmix, cores = 1) # train derailment library(forward) data(derailme) derailme$cYear <- derailme$Year-mean(derailme$Year) derailme$TrainKm100 <- derailme$TrainKm*100.0 derailme.robustmix <- robmixglm(y~cYear+factor(Type), offset = log(TrainKm100), family = "truncpoisson", quadpoints = 51, data = derailme, cores = 1) summary(derailme.robustmix) plot(outlierProbs(derailme.robustmix)) outlierTest(derailme.robustmix, cores = 1) # hospital costs hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix) plot(outlierProbs(hospcosts.robustmix)) outlierTest(hospcosts.robustmix, cores = 1)
Returns AIC for a robmixglm object.
## S3 method for class 'robmixglm' AIC(object, ..., k = 2)
## S3 method for class 'robmixglm' AIC(object, ..., k = 2)
object |
robmixglm object |
... |
additional argument; currently none is used. |
k |
penalty per parameter |
AIC
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) AIC(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) AIC(forbes.robustmix)
Returns BIC for a robmixglm object.
## S3 method for class 'robmixglm' BIC(object, ...)
## S3 method for class 'robmixglm' BIC(object, ...)
object |
robmixglm object |
... |
additional argument; currently none is used. |
BIC
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix)
Returns coefficients for a robmixglm object. Only the coefficients for the linear part of the model are returned. Additional coefficients may be obtained using summary().
## S3 method for class 'robmixglm' coef(object, ...)
## S3 method for class 'robmixglm' coef(object, ...)
object |
robmixglm object |
... |
additional argument; currently none is used. |
coef
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) coef(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) coef(forbes.robustmix)
Data from Heritier et al (2009), originally from Harrell (2001, p379). This data was from a study of the prevalence of cardiovascular risk factors such as obesity and diabetes for African Americans. (Willems et al, 19997) Data was available for 403 subjects screened for diabetes, reduced to 372 after removal of cases with missing data.
diabdata
diabdata
A data frame with 372 observations on the following 8 variables.
glyhb
Glycosated haemoglobin (values above 7.0 are usually taken as a positive diagnosis of diabetes)
age
age in years
gender
male or female
bmi
body mass index in kg/m^2
waisthip
ratio of waist to hip measurement
frame
body frame, small, medium or large
stab.glu
glucose
location
location, Buckingham or Louisa
Heritier et al (2009)
Harrell, F.E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression and Survival Analysis. Springer.
Heritier, S., Cantoni, E., Copt, S. and Victoria-Feser, M-P (2009). Robust Methods in Biostatistics. Wiley.
Willems, J.P., Saunders, J.T., Hunt, D.E. and Schorling, J.B. (1997) Prevalence of coronary heart disease risk factors among rural blacks: A community-based study. Southern Medical Journal, 90:814-820.
diabdata.robustmix <- robmixglm(glyhb~age+gender+bmi+waisthip+frame+location, data = diabdata, cores = 1) summary(diabdata.robustmix) diabdata.step <- step(diabdata.robustmix, glyhb~age+gender+bmi+waisthip+frame+location, cores = 1) summary(diabdata.step)
diabdata.robustmix <- robmixglm(glyhb~age+gender+bmi+waisthip+frame+location, data = diabdata, cores = 1) summary(diabdata.robustmix) diabdata.step <- step(diabdata.robustmix, glyhb~age+gender+bmi+waisthip+frame+location, cores = 1) summary(diabdata.step)
Computes the (generalized) AIC for a fitted robmixglm
model. Used in step
, otherwise use AIC
.
## S3 method for class 'robmixglm' extractAIC(fit, scale, k = 2, ...)
## S3 method for class 'robmixglm' extractAIC(fit, scale, k = 2, ...)
fit |
fitted |
scale |
ignored. |
k |
numeric specifying the ‘weight’ of the
equivalent degrees of freedom ( |
... |
further arguments (currently unused). |
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = MASS::forbes, cores = 1) extractAIC(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = MASS::forbes, cores = 1) extractAIC(forbes.robustmix)
Calculates the fitted values.
## S3 method for class 'robmixglm' fitted(object, ...)
## S3 method for class 'robmixglm' fitted(object, ...)
object |
A robmixglm object with a mixture (robust) random effects distribution. |
... |
Other parameters. (not used) |
A vector of the fitted values.
Ken Beath <[email protected]>
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix) plot(fitted(forbes.robustmix), residuals(forbes.robustmix))
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix) plot(fitted(forbes.robustmix), residuals(forbes.robustmix))
Data for the analysis in Beath (2018), previously analysed in Marazzi and Yohai (2004), Cantoni and Ronchetti (2006) and Heritier et al (2009). The data is for 100 patients hospitalised at the Centre Hospitalier Universitaire Vaudois in Lausanne, Switzerland for "medical back problems" (APDRG 243).
hospcosts
hospcosts
A data frame with 100 observations on the following 9 variables.
id
patient id
costs
cost of stay in Swiss francs
los
length of stay in days
adm
admission type, 0 = planned, 1 = emergency
ins
insurance type, 0 = regular, 1 = private
age
age in years
sex
sex, 0 = female, 1 = male
dest
discharge destination, 0 = another health institution, 1 = home
loglos
log of length of stay
Heritier et al (2009)
Cantoni, E., & Ronchetti, E. (2006). A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures. Journal of Health Economics, 25(2), 198213. http://doi.org/10.1016/j.jhealeco.2005.04.010
Heritier, S., Cantoni, E., Copt, S. and Victoria-Feser, M-P (2009). Robust Methods in Biostatistics. Wiley.
Marazzi, A., & Yohai, V. J. (2004). Adaptively truncated maximum likelihood regression with asymmetric errors. Journal of Statistical Planning and Inference, 122(12), 271291. http://doi.org/10.1016/j.jspi.2003.06.011
hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix)
hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix)
Returns log Likelihood for a robmixglm object.
## S3 method for class 'robmixglm' logLik(object, ...)
## S3 method for class 'robmixglm' logLik(object, ...)
object |
robmixglm object |
... |
additional argument; currently none is used. |
The loglikelihood.
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) logLik(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) logLik(forbes.robustmix)
For the normal mixture random effect calculates the probability that each observation is an outlier based on the posterior probability of it being an outlier.
outlierProbs(object)
outlierProbs(object)
object |
A metaplus object with a mixture (robust) random effects distribution. |
The outlier probabilities are obtained as the posterior probabilities of each observation being an outlier based on the fitted mixture model.
outlier.prob |
Posterior probability that each observation is an outlier |
Ken Beath <[email protected]>
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) outlierProbs(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) outlierProbs(forbes.robustmix)
Uses the parametric bootstrap to test for the presence of outliers.
outlierTest(object, R = 999, cores = max(detectCores() %/% 2, 1))
outlierTest(object, R = 999, cores = max(detectCores() %/% 2, 1))
object |
A robmixglm object with a mixture (robust) random effects distribution. |
R |
number of bootstrap replications |
cores |
Number of cores to be used in parallel. Default is one less than available. |
Performs a parametric bootstrap to compare models with and without outliers.
An outlierTest object which is the object of class “boot” returned by the call to boot
.
Ken Beath <[email protected]>
hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix) summary(outlierTest(hospcosts.robustmix, cores = 1))
hospcosts.robustmix <- robmixglm(costs~adm+age+dest+ins+loglos+sex, family = "gamma", data = hospcosts, cores = 1) summary(hospcosts.robustmix) summary(outlierTest(hospcosts.robustmix, cores = 1))
Plots the outlier probability for each observation, from an outlierProbs object.
## S3 method for class 'outlierProbs' plot(x, ...)
## S3 method for class 'outlierProbs' plot(x, ...)
x |
outlierProbs object to be plotted |
... |
additional parameters to plot |
Plot
Ken Beath <[email protected]>
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) plot(outlierProbs(forbes.robustmix))
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) plot(outlierProbs(forbes.robustmix))
Obtains predictions from a fitted robust mixture generalized linear model object.
## S3 method for class 'robmixglm' predict(object, newdata = NULL, type = c("link", "response"), ...)
## S3 method for class 'robmixglm' predict(object, newdata = NULL, type = c("link", "response"), ...)
object |
a fitted object of class inheriting from |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
the type of prediction required. The default |
... |
Other parameters. (not used) |
If newdata
is omitted the predictions are based on the data
used for the fit. In that case how cases with missing values in the
original fit is determined by the na.action
argument of that
fit. If na.action = na.omit
omitted cases will not appear in
the residuals, whereas if na.action = na.exclude
they will
appear (in predictions and standard errors), with residual value
NA
. See also napredict
.
A vector predicted linear predictors or response. For binomial
the
resonse is the predicted proportion.
library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = forbes, cores = 1) plot(forbes$bp, forbes$pres) preddata <- data.frame(bp = seq(from = min(forbes$bp), to = max(forbes$bp), by = 0.01)) # convert to original scale preddata$predpres <-10^(predict(forbes.robustmix, newdata = preddata)/100) lines(preddata$bp, preddata$predpres, col = "red")
library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = forbes, cores = 1) plot(forbes$bp, forbes$pres) preddata <- data.frame(bp = seq(from = min(forbes$bp), to = max(forbes$bp), by = 0.01)) # convert to original scale preddata$predpres <-10^(predict(forbes.robustmix, newdata = preddata)/100) lines(preddata$bp, preddata$predpres, col = "red")
Print an outlierTest object.
## S3 method for class 'outlierTest' print(x, ...)
## S3 method for class 'outlierTest' print(x, ...)
x |
outlierTest object |
... |
further arguments (not currently used) |
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) summary(forbes.robustmix) print(outlierTest(forbes.robustmix, cores = 1))
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) summary(forbes.robustmix) print(outlierTest(forbes.robustmix, cores = 1))
Extracts model residuals from objects returned by modeling functions.
## S3 method for class 'robmixglm' residuals(object, type = c("deviance", "pearson"), ...)
## S3 method for class 'robmixglm' residuals(object, type = c("deviance", "pearson"), ...)
object |
an object for which the extraction of model residuals is meaningful. |
type |
Type of residual where valid types are deviance and pearson. |
... |
other arguments. |
Residuals extracted from the object object
.
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix) plot(fitted(forbes.robustmix), residuals(forbes.robustmix))
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) BIC(forbes.robustmix) plot(fitted(forbes.robustmix), residuals(forbes.robustmix))
Fits robust generalized linear models and variants described in Beath (2018).
robmixglm(formula, family = c("gaussian", "binomial", "poisson", "gamma", "truncpoisson", "nbinom"), data, offset = NULL, quadpoints = 21, notrials = 50, EMTol = 1.0e-4, cores = max(detectCores() %/% 2, 1), verbose = FALSE)
robmixglm(formula, family = c("gaussian", "binomial", "poisson", "gamma", "truncpoisson", "nbinom"), data, offset = NULL, quadpoints = 21, notrials = 50, EMTol = 1.0e-4, cores = max(detectCores() %/% 2, 1), verbose = FALSE)
formula |
Model formula |
family |
Distribution of response |
data |
Data frame from which variables are obtained |
offset |
Offset to be incorporated in the linear predictor. |
quadpoints |
Number of quadrature points used in the Gauss-Hermite integration. |
notrials |
Number of random starting values to be used for EM |
EMTol |
Relative change in likelihood for completion of EM algorithm before switching to quasi-Newton |
cores |
Number of cores to be used for parallel evaluation of starting values |
verbose |
Print out diagnostic information? This includes the likelihood and parameter estimates for each EM run. |
Fits robust generalized models assuming that data is a mixture of standard observations and outlier abservations, which belong to an overdispersed model (Beath, 2018). For binomial, Poisson, truncated Poisson and gamma, the overdispersed component achieved through including a random effect as part of the linear predictor, as described by Aitkin (1996). For gaussian and negative binomial data the outlier component is also a gaussian and negative binomial model, respectively but with a higher dispersion. For gaussian this corresponds to a higher value of but for negative binomial this is a lower value of
.
The method used is a generalised EM. Random starting values are determined by randomly allocating observations to either the standard or outlier class for the first iteration of the EM. The EM is then run to completion for all sets of starting values. The best set of starting values is then used to obtain the final results using a quasi-Newton method. Where the overdispersed data is obtained using a random effect, the likelihood is obtained by integrating out the random effect using Gauss-Hermite quadrature.
robmixglm object. This contains
fit |
Final model fit from quasi-Newton |
prop |
Posterior probability of observation in each class |
logLik |
final log likelihood |
np |
Number of parameters |
nobs |
Number of observations |
coef.names |
Coefficient names |
call |
Call to function |
family |
Family of model to be fitted |
model |
model |
terms |
terms |
xlevels |
Levels for factors. |
quadpoints |
Number of quadrature points used in the Gauss-Hermite integration. |
notrials |
Number of random starting values to be used for EM |
EMTol |
Relative change in likelihood for completion of EM algorithm before switching to quasi-Newton |
verbose |
Was verbose output requested? |
Ken Beath
Beath, K. J. A mixture-based approach to robust analysis of generalised linear models, Journal of Applied Statistics, 45(12), 2256-2268 (2018) DOI: 10.1080/02664763.2017.1414164
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6, 251262. DOI: 10.1007/BF00140869
if (requireNamespace("MASS", quietly = TRUE)) { library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = forbes, cores = 1) }
if (requireNamespace("MASS", quietly = TRUE)) { library(MASS) data(forbes) forbes.robustmix <- robmixglm(100*log10(pres)~bp, data = forbes, cores = 1) }
Returns summary for a robmixglm object.
## S3 method for class 'robmixglm' summary(object, ...)
## S3 method for class 'robmixglm' summary(object, ...)
object |
robmixglm object |
... |
additional argument; currently none is used. |
summary
Ken Beath
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) summary(forbes.robustmix)
library(MASS) data(forbes) forbes.robustmix <- robmixglm(bp~pres, data = forbes, cores = 1) summary(forbes.robustmix)