Title: | Half-Normal Plots with Simulation Envelopes |
---|---|
Description: | Generates (half-)normal plots with simulation envelopes using different diagnostics from a range of different fitted models. A few example datasets are included. |
Authors: | Rafael de Andrade Moral [aut, cre], John Hinde [aut], Clarice Garcia Borges Demetrio [aut] |
Maintainer: | Rafael de Andrade Moral <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2-6 |
Built: | 2024-12-11 07:07:34 UTC |
Source: | CRAN |
Generates (half-)normal plots with simulation envelopes for common diagnostics from fitted models for a range of model classes. The function hnp()
is written so that it is relatively easy to extend it to new model classes and different diagnostics that are not already implemented. A few example datasets are included. The main functions are hnp
and plot.hnp
Package: | hnp |
Type: | Package |
Version: | 1.2-6 |
Date: | 2018-05-21 |
License: | GPL version 2 or newer |
Package includes two functions, hnp
and plot.hnp
, as well as seven data sets extracted from Demétrio et al (2014).
Explanation on how to produce the half-normal plots with simulation envelopes can be found in Demétrio and Hinde (1997) and on the hnp
function documentation.
Data sets are based on entomology studies and the focus is on overdispersed data. Details on overdispersion models can be found in Hinde and Demétrio (1998).
Special thanks to FAPESP and CNPq for funding.
Rafael A. Moral, John Hinde and Clarice G. B. Demétrio
Maintainer: Rafael Moral <[email protected]>
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.
Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Data on counts of coffee berry borer obtained using different traps through time.
data(cbb)
data(cbb)
A data frame with 288 observations on the following 4 variables.
week
|
numeric | week of observation (1 to 24) |
block
|
factor | levels I II III IV
|
trap
|
factor | levels CV F SF
|
count
|
numeric | number of observed insects |
The coffee berry borer is a major pest of commercial coffee. The insect directly attacks the coffee fruit in development causing severe losses in bean production and quality.
This data set was obtained in an experiment conducted by Mota (2013), where three types of traps (SF
, F
, CV
) were randomized in each of four equidistant lines (blocks) of a coffee field. Each week, over a 24 week period, the insects were removed from the traps and counted.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Mota, L. H. C. (2013) Desenvolvimento de armadilha de auto-inoculacao para o controlde de Hypothenemus hampei (Ferrari, 1867) (Coleoptera: Curculionidae) com Beauveria bassiana (Bals.) Vuil (Ascomycota: Hypocreales) em tecido sinetico. Master's dissertation, ESALQ-USP
data(cbb) # exploratory plot require(latticeExtra) trellis.par.set(strip.background=list(col="lightgrey")) useOuterStrips(xyplot(count ~ week | block + trap, data=cbb, layout=c(3,1),type="l", col=1, xlab="Week", ylab="Insect counts")) # Poisson fit model1 <- glm(count ~ block + trap*factor(week), data=cbb, family=poisson) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) summary(model1) hnp(model1, sim=19, conf=1) ## Not run: hnp(model1) # default call ## End(Not run) # Quasi-Poisson fit model2 <- glm(count ~ block + trap*factor(week), data=cbb, family=quasipoisson) anova(model2, test="F") summary(model2) hnp(model2, sim=19, conf=1) ## Not run: hnp(model2) # default call ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(cbb) # exploratory plot require(latticeExtra) trellis.par.set(strip.background=list(col="lightgrey")) useOuterStrips(xyplot(count ~ week | block + trap, data=cbb, layout=c(3,1),type="l", col=1, xlab="Week", ylab="Insect counts")) # Poisson fit model1 <- glm(count ~ block + trap*factor(week), data=cbb, family=poisson) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) summary(model1) hnp(model1, sim=19, conf=1) ## Not run: hnp(model1) # default call ## End(Not run) # Quasi-Poisson fit model2 <- glm(count ~ block + trap*factor(week), data=cbb, family=quasipoisson) anova(model2, test="F") summary(model2) hnp(model2, sim=19, conf=1) ## Not run: hnp(model2) # default call ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Mortality of the predator Chrysoperla externa on different doses of lime sulphur, a substance used to control pests on trees.
data(chryso)
data(chryso)
A data frame with 24 observations on the following 4 variables.
dead
|
numeric | count of dead C. externa specimens |
alive
|
numeric | count of alive C. externa specimens |
conc
|
numeric | lime sulphur concentration |
log.conc
|
numeric | natural logarithm of lime sulphur concentration |
The neuropteran Chrysoperla externa is a predator that acts
as a natural enemy of the brown citrus aphid, Toxoptera
citricida, which is among the most important citrus pests
worldwide. A possible strategy to control T. citricida
populations would be to use a substance called lime sulphur and the C. externa predator
in combination, which may be beneficial as long as the lime sulphur has less
effect on the predator than the prey. To explore this, Battel (2012) conducted an
experiment with first-instar larvae of Chrysoperla
externa exposed to different levels of lime sulphur.
Specifically, twenty-four Orange Jessamine (Murraya
paniculata) plants were sprayed with different concentrations (conc
) of
lime sulphur and up to seven first-instar larvae were placed on each
plant. The experiment was set up in a completely randomized design with four treatments:
lime sulphur concentrations at 0ppm (water control), 60ppm, 600ppm,
and 6000ppm. The plants were observed until the predators reached the second instar and
the number of larvae that died on each plant was recorded.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Battel, A. P. M. B. (2012) Dinamica de predacao e resposta funcional em Chrysoperla externa (Neuroptera: Chrysopidae) sobre Toxoptera citricida (Hemiptera: Aphididae) aplicada a citricultura organica. Master's dissertation, ESALQ-USP
data(chryso) # fit model using conc levels both on log-scale and # as a factor to produce simple analysis of deviance model <- glm(cbind(dead, alive) ~ log.conc + factor(conc), family=binomial, data=chryso) anova(model, test="Chisq") # test adequacy of factor model using deviance and X2 1-pchisq(deviance(model), df.residual(model)) (X2 <- sum(residuals(model, type="pearson")^2)) 1-pchisq(X2, df.residual(model)) model1 <- glm(cbind(dead, alive) ~ log.conc, family=binomial, data=chryso) par(mfrow=c(1,2), cex=1.4) with(chryso, plot(jitter(log.conc), dead/(dead+alive), ylab="Proportion dead", xlab="log(conc+1)")) x <- seq(0, 8.7, .1) pr <- predict(model1, data.frame(log.conc=x), ty="response") lines(x, pr) # half-normal plot hnp(model1, xlab="Half-normal scores", ylab="Deviance residuals", pch=4) require(MASS) dose.p(model1, p=.10) logLC10 <- dose.p(model1, p=.10) LC10 <- exp(logLC10[[1]]) - 1 #95% CI on log-dose scale using transformation c(logLC10[1]-2*attr(logLC10,'SE'), logLC10[1]+2*attr(logLC10,'SE')) #95% CI on dose scale using transformation c(exp(logLC10[1]-2*attr(logLC10,'SE'))-1, exp(logLC10[1]+2*attr(logLC10,'SE'))-1) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(chryso) # fit model using conc levels both on log-scale and # as a factor to produce simple analysis of deviance model <- glm(cbind(dead, alive) ~ log.conc + factor(conc), family=binomial, data=chryso) anova(model, test="Chisq") # test adequacy of factor model using deviance and X2 1-pchisq(deviance(model), df.residual(model)) (X2 <- sum(residuals(model, type="pearson")^2)) 1-pchisq(X2, df.residual(model)) model1 <- glm(cbind(dead, alive) ~ log.conc, family=binomial, data=chryso) par(mfrow=c(1,2), cex=1.4) with(chryso, plot(jitter(log.conc), dead/(dead+alive), ylab="Proportion dead", xlab="log(conc+1)")) x <- seq(0, 8.7, .1) pr <- predict(model1, data.frame(log.conc=x), ty="response") lines(x, pr) # half-normal plot hnp(model1, xlab="Half-normal scores", ylab="Deviance residuals", pch=4) require(MASS) dose.p(model1, p=.10) logLC10 <- dose.p(model1, p=.10) LC10 <- exp(logLC10[[1]]) - 1 #95% CI on log-dose scale using transformation c(logLC10[1]-2*attr(logLC10,'SE'), logLC10[1]+2*attr(logLC10,'SE')) #95% CI on dose scale using transformation c(exp(logLC10[1]-2*attr(logLC10,'SE'))-1, exp(logLC10[1]+2*attr(logLC10,'SE'))-1) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Corn grain damage by the maize weevil, a major pest of stored maize worldwide.
data(corn)
data(corn)
A data frame with 40 observations on the following 3 variables.
extract
|
factor | with levels leaf , branch , seed and control
|
m
|
numeric | total number of corn grains |
y
|
numeric | number of damaged corn grains |
A major pest of stored maize in Brazil is Sitophilus zeamais. In an experiment to assess the insecticide action of organic extracts of Annona mucosa (Annonaceae), Petri dishes containing 10g of corn were treated with extracts prepared with different parts of the plant (seeds, leaves and branches) at a concentration of 1500mg/kg or just water (control), using a completely randomized design with 10 replicates. Then 20 Sitophilus zeamais adults were placed in each Petri dish and, after 60 days, the numbers of damaged and undamaged corn grains were counted, see Ribeiro et al (2013).
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
Ribeiro, L. P., Vendramin, J. D., Bicalho, K. U., Andrade, M. S., Fernandes, J. B., Moral, R. A., Demétrio, C. G. B. (2013) Annona mucosa Jacq. (Annonaceae): A promising source of bioactive compounds against Sitophilus zeamais Mots. (Coleoptera: Curculionidae). J Stored Prod Res 55:6-14
data(corn) # Binomial fit model1 <- glm(cbind(y, m-y) ~ extract, family=binomial, data=corn) anova(model1, test="Chisq") hnp(model1, pch=4, main="Binomial: Logit", xlab="Half-normal scores", ylab="Deviance residuals") # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ extract, family=quasibinomial, data=corn) anova(model2, test="F") summary(model2)$dispersion # estimated phi # half-normal plots par(mfrow=c(1,2),cex=1.4, cex.main=0.9, pty='s') hnp(model1, pch=4, main="(a) Binomial; Logit", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, pch=4, main="(b) Quasibinomial; Logit", xlab="Half-normal scores", ylab="Deviance residuals") anova(model1, test="Chisq") # binomial model anova(model2, test="F") # quasi-binomial model summary(model1) # binomial model summary(model2) # quasi-binomial model # now with factor level parameterisation summary(update(model1,.~.-1)) summary(update(model2,.~.-1)) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(corn) # Binomial fit model1 <- glm(cbind(y, m-y) ~ extract, family=binomial, data=corn) anova(model1, test="Chisq") hnp(model1, pch=4, main="Binomial: Logit", xlab="Half-normal scores", ylab="Deviance residuals") # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ extract, family=quasibinomial, data=corn) anova(model2, test="F") summary(model2)$dispersion # estimated phi # half-normal plots par(mfrow=c(1,2),cex=1.4, cex.main=0.9, pty='s') hnp(model1, pch=4, main="(a) Binomial; Logit", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, pch=4, main="(b) Quasibinomial; Logit", xlab="Half-normal scores", ylab="Deviance residuals") anova(model1, test="Chisq") # binomial model anova(model2, test="F") # quasi-binomial model summary(model1) # binomial model summary(model2) # quasi-binomial model # now with factor level parameterisation summary(update(model1,.~.-1)) summary(update(model2,.~.-1)) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Mortality of the Citrus psyllid, Diaphorina citri, a major pest of Citrus worldwide, when exposed to different concentrations of two fungi species, Beauveria bassiana and Isaria fumosorosea.
data(fungi)
data(fungi)
A data frame with 30 observations on the following 5 variables.
y
|
numeric | number of dead insects |
m
|
numeric | total number of insects |
conc
|
numeric | fungi concentration (in conidia/ml) |
lconc
|
numeric | natural logarithm of fungi concentration |
species
|
factor | levels isaria and beauveria , fungi species
|
The Citrus psyllid Diaphorina citri is a vector of Huanglongbing, known as greening disease. An alternative to chemical control is to use solutions of fungi conidia as a biological control strategy. D'Alessandro (2014) conducted a completely randomized experiment to assess how different conidia concentrations ($10^4$, $10^5$, $10^6$, $10^7$ and $10^8$ conidia/ml) of two fungi species, Beauveria bassiana and Isaria fumosorosea, infected D. citri adults. Each experimental unit consisted of around 20 D. citri adults, which were placed on Citrus limonia plants. The insects were pulverized with the solutions and after 10 days the number of dead insects and dead insects due to fungus infection were observed. Note that in this case the conidia concentrations are obtained in successive dilutions and therefore small variations in the number of conidia per ml may contribute additional variability to the data. Such additional variability may be accounted for in the model by including an additive random effect in the linear predictor.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
D'Alessandro (2014) Unpublished data, private communication.
data(fungi) # Binomial fit model1 <- glm(cbind(y, m-y) ~ lconc*species, family=binomial, data=fungi) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) 1 - pchisq(sum(resid(model1, ty="pearson")^2), 20) hnp(model1) # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ lconc*species, family=quasibinomial, data=fungi) anova(model2, test="F") hnp(model2) ## Not run: # Logistic-normal fit require(lme4) fungi$ind <- factor(1:nrow(fungi)) model3 <- glmer(cbind(y, m-y) ~ lconc*species + (1|ind), family=binomial, data=fungi) summary(model3) hnp(model3) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(fungi) # Binomial fit model1 <- glm(cbind(y, m-y) ~ lconc*species, family=binomial, data=fungi) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) 1 - pchisq(sum(resid(model1, ty="pearson")^2), 20) hnp(model1) # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ lconc*species, family=quasibinomial, data=fungi) anova(model2, test="F") hnp(model2) ## Not run: # Logistic-normal fit require(lme4) fungi$ind <- factor(1:nrow(fungi)) model3 <- glmer(cbind(y, m-y) ~ lconc*species + (1|ind), family=binomial, data=fungi) summary(model3) hnp(model3) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Produces a (half-)normal plot from a fitted model object for a range of different models. Extendable to non-implemented model classes.
hnp(object, sim = 99, conf = 0.95, resid.type, maxit, halfnormal = T, scale = F, plot.sim = T, verb.sim = F, warn = F, how.many.out = F, print.on = F, paint.out = F, col.paint.out, newclass = F, diagfun, simfun, fitfun, ...)
hnp(object, sim = 99, conf = 0.95, resid.type, maxit, halfnormal = T, scale = F, plot.sim = T, verb.sim = F, warn = F, how.many.out = F, print.on = F, paint.out = F, col.paint.out, newclass = F, diagfun, simfun, fitfun, ...)
object |
fitted model object or numeric vector. |
sim |
number of simulations used to compute envelope. Default is 99. |
conf |
confidence level of the simulated envelope. Default is 0.95. |
resid.type |
type of residuals to be used; must be one of "deviance", "pearson", "response", "working", "simple", "student", or "standard". Not all model type and residual type combinations are allowed. Defaults are "student" for |
maxit |
maximum number of iterations of the estimation algorithm. Defaults are 25 for |
halfnormal |
logical. If |
scale |
logical. If |
plot.sim |
logical. Should the (half-)normal plot be plotted? Default is |
verb.sim |
logical. If |
warn |
logical. If |
how.many.out |
logical. If |
print.on |
logical. If |
paint.out |
logical. If |
col.paint.out |
If |
newclass |
logical. If |
diagfun |
user-defined function used to obtain the diagnostic measures from the fitted model object (only used when |
simfun |
user-defined function used to simulate a random sample from the model estimated parameters (only used when |
fitfun |
user-defined function used to re-fit the model to simulated data (only used when |
... |
extra graphical arguments passed to |
A relatively easy way to assess goodness-of-fit of a fitted model is to use (half-)normal plots of a model diagnostic, e.g., different types of residuals, Cook's distance, leverage. These plots are obtained by plotting the ordered absolute values of a model diagnostic versus the expected order statistic of a half-normal distribution,
(for a half-normal plot) or the normal distribution,
(for a normal plot).
Atkinson (1985) proposed the addition of a simulated envelope, which is such that under the correct model the plot for the observed data is likely to fall within the envelope. The objective is not to provide a region of acceptance, but some sort of guidance to what kind of shape to expect.
Obtaining the simulated envelope is simple and consists of (1) fitting a model; (2) extracting model diagnostics and calculating sorted absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again sorted absolute values; (5) computing the desired percentiles (e.g., 2.5 and 97.5) at each value of the expected order statistic to form the envelope.
This function handles different model classes and more will be implemented as time goes by. So far, the following models are included:
Continuous data: | |
Normal: | functions lm , aov and glm with family=gaussian
|
Gamma: | function glm with family=Gamma
|
Inverse gaussian: | function glm with family=inverse.gaussian
|
Proportion data: | |
Binomial: | function glm with family=binomial
|
Quasi-binomial: | function glm with family=quasibinomial
|
Beta-binomial: | package VGAM - function vglm , with family=betabinomial ; |
package aods3 - function aodml , with family="bb" ; |
|
package gamlss - function gamlss , with family=BB ; |
|
package glmmADMB - function glmmadmb , with family="betabinomial"
|
|
Zero-inflated binomial: | package VGAM - function vglm , with family=zibinomial ; |
package gamlss - function gamlss , with family=ZIBI ; |
|
package glmmADMB - function glmmadmb , with family="binomial"
|
|
and zeroInfl=TRUE
|
|
Zero-inflated beta-binomial: | package gamlss - function gamlss , with family=ZIBB ; |
package glmmADMB - function glmmadmb , with family="betabinomial"
|
|
and zeroInfl=TRUE
|
|
Multinomial: | package nnet - function multinom
|
Count data: | |
Poisson: | function glm with family=poisson
|
Quasi-Poisson: | function glm with family=quasipoisson
|
Negative binomial: | package MASS - function glm.nb ; |
package aods3 - function aodml , with family="nb"
|
|
and phi.scale="inverse"
|
|
Zero-inflated Poisson: | package pscl - function zeroinfl , with dist="poisson"
|
Zero-inflated negative binomial: | package pscl - function zeroinfl , with dist="negbin"
|
Hurdle Poisson: | package pscl - function hurdle , with dist="poisson"
|
Hurdle negative binomial: | package pscl - function hurdle , with dist="negbin"
|
Mixed models: | |
Linear mixed models: | package lme4 , function lmer
|
Generalized linear mixed models: | package lme4 , function glmer with family=poisson or binomial
|
Users can also use a numeric vector as object
and hnp
will generate the (half-)normal plot with a simulated envelope using the standard normal distribution (scale=F
) or (
scale=T
).
Implementing a new model class is done by providing three functions to hnp
: diagfun
- to obtain model diagnostics, simfun
- to simulate random variables and fitfun
- to refit the model to simulated variables. The way these functions must be written is shown in the Examples section.
hnp
returns an object of class "hnp"
, which is a list containing the following components:
x |
quantiles of the (half-)normal distribution |
lower |
lower envelope band |
median |
median envelope band |
upper |
upper envelope band |
residuals |
diagnostic measures in absolute value and in order |
out.index |
vector indicating which points are out of the envelope |
col.paint.out |
color of points which are outside of the envelope (used if |
how.many.out |
logical. Equals |
total |
length of the diagnostic measure vector |
out |
number of points out of the envelope |
print.on |
logical. Equals |
paint.out |
logical. Equals |
all.sim |
matrix with all diagnostics obtained in the simulations. Each column represents one simulation |
See documentation on example data sets for simple analyses and goodness-of-fit checking using hnp
.
Rafael A. Moral <[email protected]>, John Hinde and Clarice G. B. Demétrio
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
Atkinson, A. C. (1985) Plots, transformations and regression, Clarendon Press, Oxford.
Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.
Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
plot.hnp
, cbb
, chryso
, corn
, fungi
, oil
, progeny
, wolbachia
## Simple Poisson regression set.seed(100) counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8)) treatment <- gl(4, 5) fit <- glm(counts ~ treatment, family=poisson) anova(fit, test="Chisq") ## half-normal plot hnp(fit) ## or save it in an object and then use the plot method my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE) plot(my.hnp) ## changing graphical parameters plot(my.hnp, lty=2, pch=4, cex=1.2) plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1)) plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores", ylab="Deviance residuals", legpos="bottomright") ## Using a numeric vector my.vec <- rnorm(20, 4, 4) hnp(my.vec) # using N(0,1) hnp(my.vec, scale=TRUE) # using N(mu, sigma^2) ## Implementing new classes ## Users provide three functions - diagfun, simfun and fitfun, ## in the following way: ## ## diagfun <- function(obj) { ## userfunction(obj, other_argumens) ## # e.g., resid(obj, type="pearson") ## } ## ## simfun <- function(n, obj) { ## userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj)) ## } ## ## fitfun <- function(y.) { ## userfunction(y. ~ linear_predictor, other_arguments, data=data) ## # e.g., glm(y. ~ block + factor1 * factor2, family=poisson, ## # data=mydata) ## } ## ## when response is binary: ## fitfun <- function(y.) { ## userfunction(cbind(y., m-y.) ~ linear_predictor, ## other_arguments, data=data) ## #e.g., glm(cbind(y., m-y.) ~ treatment - 1, ## # family=binomial, data=data) ## } ## Not run: ## Example no. 1: Using Cook's distance as a diagnostic measure y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10)) tr <- gl(3, 10) fit1 <- glm(y ~ tr, family=poisson) # diagfun d.fun <- function(obj) cooks.distance(obj) # simfun s.fun <- function(n, obj) { lam <- fitted(obj) rpois(n, lambda=lam) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data) # hnp call hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## Example no. 2: Implementing gamma model using package gamlss # load package require(gamlss) # model fitting y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5) tr <- gl(3, 10) fit2 <- gamlss(y ~ tr, family=GA) # diagfun d.fun <- function(obj) resid(obj) # this is the default if no # diagfun is provided # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv sig <- obj$sigma.fv rGA(n, mu=mu, sigma=sig) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data) # hnp call hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun, data=data.frame(y, tr)) ## Example no. 3: Implementing binomial model in gamlss # model fitting y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10)) m <- 50 tr <- gl(3, 10) fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI) # diagfun d.fun <- function(obj) resid(obj) # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv bd <- obj$bd rBI(n, bd=bd, mu=mu) } # fitfun my.data <- data.frame(y, tr, m) f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr, family=BI, data=my.data) # hnp call hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run)
## Simple Poisson regression set.seed(100) counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8)) treatment <- gl(4, 5) fit <- glm(counts ~ treatment, family=poisson) anova(fit, test="Chisq") ## half-normal plot hnp(fit) ## or save it in an object and then use the plot method my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE) plot(my.hnp) ## changing graphical parameters plot(my.hnp, lty=2, pch=4, cex=1.2) plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1)) plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores", ylab="Deviance residuals", legpos="bottomright") ## Using a numeric vector my.vec <- rnorm(20, 4, 4) hnp(my.vec) # using N(0,1) hnp(my.vec, scale=TRUE) # using N(mu, sigma^2) ## Implementing new classes ## Users provide three functions - diagfun, simfun and fitfun, ## in the following way: ## ## diagfun <- function(obj) { ## userfunction(obj, other_argumens) ## # e.g., resid(obj, type="pearson") ## } ## ## simfun <- function(n, obj) { ## userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj)) ## } ## ## fitfun <- function(y.) { ## userfunction(y. ~ linear_predictor, other_arguments, data=data) ## # e.g., glm(y. ~ block + factor1 * factor2, family=poisson, ## # data=mydata) ## } ## ## when response is binary: ## fitfun <- function(y.) { ## userfunction(cbind(y., m-y.) ~ linear_predictor, ## other_arguments, data=data) ## #e.g., glm(cbind(y., m-y.) ~ treatment - 1, ## # family=binomial, data=data) ## } ## Not run: ## Example no. 1: Using Cook's distance as a diagnostic measure y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10)) tr <- gl(3, 10) fit1 <- glm(y ~ tr, family=poisson) # diagfun d.fun <- function(obj) cooks.distance(obj) # simfun s.fun <- function(n, obj) { lam <- fitted(obj) rpois(n, lambda=lam) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data) # hnp call hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## Example no. 2: Implementing gamma model using package gamlss # load package require(gamlss) # model fitting y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5) tr <- gl(3, 10) fit2 <- gamlss(y ~ tr, family=GA) # diagfun d.fun <- function(obj) resid(obj) # this is the default if no # diagfun is provided # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv sig <- obj$sigma.fv rGA(n, mu=mu, sigma=sig) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data) # hnp call hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun, data=data.frame(y, tr)) ## Example no. 3: Implementing binomial model in gamlss # model fitting y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10)) m <- 50 tr <- gl(3, 10) fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI) # diagfun d.fun <- function(obj) resid(obj) # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv bd <- obj$bd rBI(n, bd=bd, mu=mu) } # fitfun my.data <- data.frame(y, tr, m) f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr, family=BI, data=my.data) # hnp call hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run)
hnp
objectsThis is an internally called function used to prepare hnp
objects for plotting
Rafael A. Moral <[email protected]>, John Hinde and Clarice G. B. Demétrio
Uses user defined functions to produce the (half-)normal plot with simulated envelope.
newhnp(object, sim=99, conf=.95, halfnormal=T, plot.sim=T, verb.sim=F, how.many.out=F, print.on=F, paint.out=F, col.paint.out, diagfun, simfun, fitfun, ...)
newhnp(object, sim=99, conf=.95, halfnormal=T, plot.sim=T, verb.sim=F, how.many.out=F, print.on=F, paint.out=F, col.paint.out, diagfun, simfun, fitfun, ...)
object |
fitted model object or numeric vector. |
sim |
number of simulations used to compute envelope. Default is 99. |
conf |
confidence level of the simulated envelope. Default is 0.95. |
halfnormal |
logical. If |
plot.sim |
logical. Should the (half-)normal plot be plotted? Default is |
verb.sim |
logical. If |
how.many.out |
logical. If |
print.on |
logical. If |
paint.out |
logical. If |
col.paint.out |
If |
diagfun |
user-defined function used to obtain the diagnostic measures from the fitted model object (only used when |
simfun |
user-defined function used to simulate a random sample from the model estimated parameters (only used when |
fitfun |
user-defined function used to re-fit the model to simulated data (only used when |
... |
extra graphical arguments passed to |
By providing three user-defined functions, newhnp
produces the half-normal plot with simulated envelope for a model whose class is not yet implemented in the package.
The first function, diagfun
, must extract the desired model diagnostics from a model fit object. The second function, simfun
, must return the response variable (numeric vector or matrix), simulated using the same error distributions and estimated parameters from the fitted model. The third and final function, fitfun
, must return a fitted model object. See the Examples section.
Rafael A. Moral <[email protected]>, John Hinde and Clarice G. B. Demétrio
## Implementing new classes ## Users provide three functions - diagfun, simfun and fitfun, ## in the following way: ## ## diagfun <- function(obj) { ## userfunction(obj, other_argumens) ## # e.g., resid(obj, type="pearson") ## } ## ## simfun <- function(n, obj) { ## userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj)) ## } ## ## fitfun <- function(y.) { ## userfunction(y. ~ linear_predictor, other_arguments, data=data) ## # e.g., glm(y. ~ block + factor1 * factor2, family=poisson, ## # data=mydata) ## } ## ## when response is binary: ## fitfun <- function(y.) { ## userfunction(cbind(y., m-y.) ~ linear_predictor, ## other_arguments, data=data) ## #e.g., glm(cbind(y., m-y.) ~ treatment - 1, ## # family=binomial, data=data) ## } ## Not run: ## Example no. 1: Using Cook's distance as a diagnostic measure y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10)) tr <- gl(3, 10) fit1 <- glm(y ~ tr, family=poisson) # diagfun d.fun <- function(obj) cooks.distance(obj) # simfun s.fun <- function(n, obj) { lam <- fitted(obj) rpois(n, lambda=lam) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data) # hnp call hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## Example no. 2: Implementing gamma model using package gamlss # load package require(gamlss) # model fitting y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5) tr <- gl(3, 10) fit2 <- gamlss(y ~ tr, family=GA) # diagfun d.fun <- function(obj) resid(obj) # this is the default if no # diagfun is provided # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv sig <- obj$sigma.fv rGA(n, mu=mu, sigma=sig) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data) # hnp call hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun, data=data.frame(y, tr)) ## Example no. 3: Implementing binomial model in gamlss # model fitting y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10)) m <- 50 tr <- gl(3, 10) fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI) # diagfun d.fun <- function(obj) resid(obj) # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv bd <- obj$bd rBI(n, bd=bd, mu=mu) } # fitfun my.data <- data.frame(y, tr, m) f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr, family=BI, data=my.data) # hnp call hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run)
## Implementing new classes ## Users provide three functions - diagfun, simfun and fitfun, ## in the following way: ## ## diagfun <- function(obj) { ## userfunction(obj, other_argumens) ## # e.g., resid(obj, type="pearson") ## } ## ## simfun <- function(n, obj) { ## userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj)) ## } ## ## fitfun <- function(y.) { ## userfunction(y. ~ linear_predictor, other_arguments, data=data) ## # e.g., glm(y. ~ block + factor1 * factor2, family=poisson, ## # data=mydata) ## } ## ## when response is binary: ## fitfun <- function(y.) { ## userfunction(cbind(y., m-y.) ~ linear_predictor, ## other_arguments, data=data) ## #e.g., glm(cbind(y., m-y.) ~ treatment - 1, ## # family=binomial, data=data) ## } ## Not run: ## Example no. 1: Using Cook's distance as a diagnostic measure y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10)) tr <- gl(3, 10) fit1 <- glm(y ~ tr, family=poisson) # diagfun d.fun <- function(obj) cooks.distance(obj) # simfun s.fun <- function(n, obj) { lam <- fitted(obj) rpois(n, lambda=lam) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data) # hnp call hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## Example no. 2: Implementing gamma model using package gamlss # load package require(gamlss) # model fitting y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5) tr <- gl(3, 10) fit2 <- gamlss(y ~ tr, family=GA) # diagfun d.fun <- function(obj) resid(obj) # this is the default if no # diagfun is provided # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv sig <- obj$sigma.fv rGA(n, mu=mu, sigma=sig) } # fitfun my.data <- data.frame(y, tr) f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data) # hnp call hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun, data=data.frame(y, tr)) ## Example no. 3: Implementing binomial model in gamlss # model fitting y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10)) m <- 50 tr <- gl(3, 10) fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI) # diagfun d.fun <- function(obj) resid(obj) # simfun s.fun <- function(n, obj) { mu <- obj$mu.fv bd <- obj$bd rBI(n, bd=bd, mu=mu) } # fitfun my.data <- data.frame(y, tr, m) f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr, family=BI, data=my.data) # hnp call hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run)
Effects of three agricultural oils on Diaphorina citri oviposition.
data(oil)
data(oil)
A data frame with 70 observations on the following 2 variables.
y
|
numeric | number of eggs laid |
treat
|
factor | treatments applied in the experiment |
In an experiment to assess the effect of three agricultural oils on the oviposition of Diaphorina citri, seventy Orange Jessamine (Murraya paniculata) plants were sprayed with solutions of the mineral oils Oppa and Iharol, and the vegetable oil Nortox. The experiment used the oils in concentrations of 0.5 and 1.0 percent and a control of plain water set out in a completely randomized design with ten replicates. Following treatment, when the plants were dry, ten pregnant females of D. citri were released on each plant. After five days, the insects were removed and the total number of eggs on each plant was observed, see Amaral et al (2012). This is an example of aggregated data as the number of eggs is the sum over the (unrecorded) numbers of eggs deposited each day and the possibility of day to day variation may contribute additional variability to the recorded counts.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Amaral, F. S. A., Poltronieri, A. S., Alves, E. B., Omoto, C. (2012) Efeito de oleos agricolas no comportamento de oviposicao e viabilidade de ovos de Diaphorina citri Kuwayama (Hemiptera: Psyllidae). In: XX Simposio Internacional de Iniciacao Cientifica da Universidade de Sao Paulo, 2012, Pirassununga
data(oil) # Poisson fit model1 <- glm(y ~ treat, family=poisson, data=oil) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) # Quasi-Poisson fit model2 <- glm(y ~ treat, family=quasipoisson, data=oil) summary(model2) anova(model2,test="F") summary(model2)$dispersion # Negative binomial fit require(MASS) model3 <- glm.nb(y ~ treat, data=oil) thetahat <- summary(model3)$theta anova(model3, test="F") # half-normal plots par(mfrow=c(1,3),cex=1.4, cex.main=0.9) hnp(model1,pch=4, main="(a) Poisson", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2,pch=4, main="(b) Quasi-Poisson", xlab="Half-normal scores", ylab='') hnp(model3,pch=4, main="(c) Negative binomial", xlab="Half-normal scores", ylab='') ## Not run: # using aods3 require(aods3) model3b <- aodml(y ~ treat, family="nb", phi.scale="inverse", fixpar=list(8, 1.086148), data=oil) hnp(model3b) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(oil) # Poisson fit model1 <- glm(y ~ treat, family=poisson, data=oil) anova(model1, test="Chisq") sum(resid(model1, ty="pearson")^2) # Quasi-Poisson fit model2 <- glm(y ~ treat, family=quasipoisson, data=oil) summary(model2) anova(model2,test="F") summary(model2)$dispersion # Negative binomial fit require(MASS) model3 <- glm.nb(y ~ treat, data=oil) thetahat <- summary(model3)$theta anova(model3, test="F") # half-normal plots par(mfrow=c(1,3),cex=1.4, cex.main=0.9) hnp(model1,pch=4, main="(a) Poisson", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2,pch=4, main="(b) Quasi-Poisson", xlab="Half-normal scores", ylab='') hnp(model3,pch=4, main="(c) Negative binomial", xlab="Half-normal scores", ylab='') ## Not run: # using aods3 require(aods3) model3b <- aodml(y ~ treat, family="nb", phi.scale="inverse", fixpar=list(8, 1.086148), data=oil) hnp(model3b) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Number of embryos of orange variety Caipira produced with different sugar types.
data("orange")
data("orange")
A data frame with 150 observations on the following 4 variables.
block
a factor with levels 1
2
3
4
5
sugar
a factor with levels Maltose
Glucose
Lactose
Galactose
Sucrose
Glycerol
dose
a numeric vector
embryos
a numeric vector
To study the effect of six sugars (maltose, glucose, galactose, lactose, sucrose and glycerol) on the stimulation of somatic embryos from callus cultures, the number of embryos after approximately four weeks was observed. The experiment was set up in a completely randomized block design with five blocks and the six sugars at dose levels of 18, 37, 75, 110 and 150 mM for the first five and 6, 12, 24, 36 and 50 mM for glycerol, see Tomaz (2001). The main interest was in the dose-response relationship.
Tomaz ML, Mendes BMJ, Filho FAM, Demetrio CGB, Jansakul N, Rodriguez APM (1997). Somatic embryogenesis in Citrus spp.: Carbohydrate stimulation and histodifferentiation. In Vitro Cellular & Developmental Biology - Plant, 37, 446–452.
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
data(orange) require(gamlss) fit_nbI <- gamlss(embryos ~ block + poly(dose, 2) * sugar, family=NBII(), data=orange) d.fun <- function(obj) resid(obj) s.fun <- function(n, obj) { mu <- obj$mu.fv sigma <- obj$sigma.fv rNBII(n, mu, sigma) } f.fun <- function(y.) { gamlss(y. ~ block + poly(dose, 2) * sugar, family=NBII(), data=orange) } ## Not run: hnp(fit_nbI, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run) fit_pred <- gamlss(embryos ~ poly(dose, 2) * sugar, family=NBII(), data=orange) orange.pred <- rbind(expand.grid(sugar=levels(orange$sugar)[-6], dose=18:150), expand.grid(sugar="Glycerol", dose=6:50)) orange.pred$pred <- predict(fit_pred, newdata=orange.pred, type="response") require(latticeExtra) trellis.par.set(strip.background=list(col="lightgrey")) xyplot(embryos ~ dose | sugar, scales=list(relation="free"), layout=c(3,2), data=orange, col=1, xlab="Dose levels", ylab="Number of embryos") + as.layer(xyplot(pred ~ dose | sugar, type="l", col=1, data=orange.pred))
data(orange) require(gamlss) fit_nbI <- gamlss(embryos ~ block + poly(dose, 2) * sugar, family=NBII(), data=orange) d.fun <- function(obj) resid(obj) s.fun <- function(n, obj) { mu <- obj$mu.fv sigma <- obj$sigma.fv rNBII(n, mu, sigma) } f.fun <- function(y.) { gamlss(y. ~ block + poly(dose, 2) * sugar, family=NBII(), data=orange) } ## Not run: hnp(fit_nbI, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun) ## End(Not run) fit_pred <- gamlss(embryos ~ poly(dose, 2) * sugar, family=NBII(), data=orange) orange.pred <- rbind(expand.grid(sugar=levels(orange$sugar)[-6], dose=18:150), expand.grid(sugar="Glycerol", dose=6:50)) orange.pred$pred <- predict(fit_pred, newdata=orange.pred, type="response") require(latticeExtra) trellis.par.set(strip.background=list(col="lightgrey")) xyplot(embryos ~ dose | sugar, scales=list(relation="free"), layout=c(3,2), data=orange, col=1, xlab="Dose levels", ylab="Number of embryos") + as.layer(xyplot(pred ~ dose | sugar, type="l", col=1, data=orange.pred))
The plot
method for objects of class hnp.
## S3 method for class 'hnp' plot(x, cex, pch, colour, lty, lwd, type, xlab, ylab, main, legpos, legcex, ...)
## S3 method for class 'hnp' plot(x, cex, pch, colour, lty, lwd, type, xlab, ylab, main, legpos, legcex, ...)
x |
object of class |
cex |
character expansion size. |
pch |
character string or vector of one character or integer for plotting characters, see |
colour |
vector of colours. |
lty |
vector of line types. |
lwd |
vector of line widths. |
type |
type of plot for each envelope band and points. Default is |
xlab |
title for x axis, as in |
ylab |
title for y axis, as in |
main |
plot title. |
legpos |
if |
legcex |
if |
... |
extra graphical arguments passed to |
None.
Rafael A. Moral <[email protected]>, John Hinde and Clarice G. B. Demétrio
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.
Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
## Simple Poisson regression set.seed(100) counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8)) treatment <- gl(4, 5) fit <- glm(counts ~ treatment, family=poisson) anova(fit, test="Chisq") ## half-normal plot hnp(fit) ## or save it in an object and then use the plot method my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE) plot(my.hnp) ## changing graphical parameters plot(my.hnp, lty=2, pch=4, cex=1.2) plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1)) plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores", ylab="Deviance residuals", legpos="bottomright")
## Simple Poisson regression set.seed(100) counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8)) treatment <- gl(4, 5) fit <- glm(counts ~ treatment, family=poisson) anova(fit, test="Chisq") ## half-normal plot hnp(fit) ## or save it in an object and then use the plot method my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE) plot(my.hnp) ## changing graphical parameters plot(my.hnp, lty=2, pch=4, cex=1.2) plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1)) plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores", ylab="Deviance residuals", legpos="bottomright")
Progeny of Sitophilus zeamais, the maize weevil, when treated with different organic extracts
data(progeny)
data(progeny)
A data frame with 40 observations on the following 2 variables.
extract
|
factor | levels leaf , branch , seed and control
|
y
|
numeric | number of emerged insects after 60 days |
Petri dishes containing 10g of corn were treated with extracts prepared with different parts of the plant Annona mucosa (seeds, leaves and branches) at a concentration of 1500 mg/kg or just water (control), using a completely randomized design with 10 replicates. Then 20 S. zeamais adults were placed in each Petri dish and the focus is on the numbers of emerged insects (progeny) after 60 days, see Ribeiro et al (2013).
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Ribeiro, L. P., Vendramin, J. D., Bicalho, K. U., Andrade, M. S., Fernandes, J. B., Moral, R. A., Demétrio, C. G. B. (2013) Annona mucosa Jacq. (Annonaceae): A promising source of bioactive compounds against Sitophilus zeamais Mots. (Coleoptera: Curculionidae). J Stored Prod Res 55:6-14
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
data(progeny) # Poisson fit model1 <- glm(y ~ extract, family=poisson, data=progeny) anova(model1, test="Chisq") # Quasi-Poisson fit model2 <- glm(y ~ extract, family=quasipoisson, data=progeny) summary(model2)$dispersion anova(model2, test="F") # half-normal plots par(mfrow=c(1,2),cex=1.4, cex.main=0.9, pty='s') hnp(model1, pch=4, main="(a) Poisson; log-linear", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, pch=4, main="(b) Quasi-Poisson; log-linear", xlab="Half-normal scores", ylab="Deviance residuals") anova(model1, test="Chisq") # Poisson model anova(model2, test="F") # quasi-Poisson model summary(model1) # Poisson model summary(model2) # quasi-Poisson model # now with factor level parameterisation summary(update(model1,.~.-1)) summary(update(model2,.~.-1)) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(progeny) # Poisson fit model1 <- glm(y ~ extract, family=poisson, data=progeny) anova(model1, test="Chisq") # Quasi-Poisson fit model2 <- glm(y ~ extract, family=quasipoisson, data=progeny) summary(model2)$dispersion anova(model2, test="F") # half-normal plots par(mfrow=c(1,2),cex=1.4, cex.main=0.9, pty='s') hnp(model1, pch=4, main="(a) Poisson; log-linear", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, pch=4, main="(b) Quasi-Poisson; log-linear", xlab="Half-normal scores", ylab="Deviance residuals") anova(model1, test="Chisq") # Poisson model anova(model2, test="F") # quasi-Poisson model summary(model1) # Poisson model summary(model2) # quasi-Poisson model # now with factor level parameterisation summary(update(model1,.~.-1)) summary(update(model2,.~.-1)) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
Viability of Trichogramma galloi, a parasitoid wasp, when infected with Wolbachia, a bacteria known to change its reproductive aspects.
data(wolbachia)
data(wolbachia)
A data frame with 106 observations on the following 3 variables.
y
|
numeric; number of eggs with an orifice |
m
|
numeric; total number of parasitised eggs |
treat
|
a factor with levels m+f+ , m+f- , f+ , m-f- and f- , |
where m stands for male, f stands for female, + means infected and - means non-infected; |
|
f+ and f- represent virgin infected and non-infected females, respectively.
|
The bacteria Wolbachia is commonly found in various insect species and has the ability to change reproductive aspects of its host. When it infects the wasp Trichogramma galloi it is known to induce thelythokous parthenogenesis, i.e., only females are produced from unfertilized eggs. Souza (2011) conducted an experiment to assess the effects of Wolbachia on the viability of T. galloi eggs. Around 100 Diatraea saccharalis eggs (the host) were offered to infected (+) or non-infected (-) parasitoid couples or virgin females every day until the death of the female. The parasitised eggs, easily identifiable because they become dark, were then kept on moist filter paper for twenty days when counts were then made of the number of eggs that had an orifice, which meant that an adult parasitoid had emerged and thus the parasitoid was viable.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
Souza, A. R. (2011) A interacao Wolbachia - Trichogramma galloi Zucchi, 1988 (Hymenoptera: Trichogrammatidae). Master's dissertation, ESALQ-USP
data(wolbachia) # Binomial fit model1 <- glm(cbind(y, m-y) ~ treat, family=binomial, data=wolbachia) anova(model1, test="Chisq") # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ treat, family=quasibinomial, data=wolbachia) summary(model2) anova(model2,test="F") ## half normal plots par(mfrow=c(1,2),cex=1.2, cex.main=1.1) hnp(model1, print=TRUE, pch=4, main="(a) Binomial", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, print=TRUE, pch=4, main="(b) Quasi-binomial", xlab="Half-normal scores", ylab='') ## Not run: # Beta-binomial fit ### using package aods3 require(aods3) model3 <- aodml(cbind(y, m-y) ~ treat, family='bb', data=wolbachia) hnp(model3, print=TRUE, pch=4, xlab="Half-normal scores", ylab='Deviance residuals') ### using package VGAM require(VGAM) model3a <- vglm(cbind(y, m-y) ~ treat, family=betabinomial, data=wolbachia) summary(model3a) 1/(1+exp(-coef(model3a)[2])) # phi estimate hnp(model3a, data=wolbachia) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)
data(wolbachia) # Binomial fit model1 <- glm(cbind(y, m-y) ~ treat, family=binomial, data=wolbachia) anova(model1, test="Chisq") # Quasi-binomial fit model2 <- glm(cbind(y, m-y) ~ treat, family=quasibinomial, data=wolbachia) summary(model2) anova(model2,test="F") ## half normal plots par(mfrow=c(1,2),cex=1.2, cex.main=1.1) hnp(model1, print=TRUE, pch=4, main="(a) Binomial", xlab="Half-normal scores", ylab="Deviance residuals") hnp(model2, print=TRUE, pch=4, main="(b) Quasi-binomial", xlab="Half-normal scores", ylab='') ## Not run: # Beta-binomial fit ### using package aods3 require(aods3) model3 <- aodml(cbind(y, m-y) ~ treat, family='bb', data=wolbachia) hnp(model3, print=TRUE, pch=4, xlab="Half-normal scores", ylab='Deviance residuals') ### using package VGAM require(VGAM) model3a <- vglm(cbind(y, m-y) ~ treat, family=betabinomial, data=wolbachia) summary(model3a) 1/(1+exp(-coef(model3a)[2])) # phi estimate hnp(model3a, data=wolbachia) ## End(Not run) ## for discussion on the analysis of this data set, ## see Demetrio et al. (2014)