Title: | Finite Mixture Distribution Models |
---|---|
Description: | Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm. |
Authors: | Peter Macdonald <[email protected]>, with contributions from Juan Du <[email protected]> |
Maintainer: | Peter Macdonald <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5-5 |
Built: | 2024-12-01 08:51:21 UTC |
Source: | CRAN |
Compute analysis of variance tables for one or two mixture model objects.
## S3 method for class 'mix'
anova(object, mixobj2, ...)
object |
an object of class |
mixobj2 |
an object of the same type to be compared with |
... |
additional objects of the same type. |
An object of class "anova" inheriting from class "data.frame". When given a single argument, this function produces a table which tests whether the model is significant. The table contains the residual degrees of freedom, the chi-square statistic and the P value. If the class of the argument is not "mix", this function returns NULL. When given two objects, it tests the models against one another and lists them in order of the number of parameters fitted. For the model with fewer parameters fitted, the change in degrees of freedom is given. This only makes statistical sense if the models are nested. If one of the arguments does not belong to the class "mix", the function gives the anova table for the other argument; if neither does, it returns NULL.
The comparison between two models is only valid if they are fitted to the same dataset and the two models are nested.
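As a rough illustration of the two-model comparison, the sketch below carries out a chi-square difference test in base R. The chi-square statistics and degrees of freedom are made-up numbers standing in for two nested fits, and the assumption that the difference in chi-square is referred to a chi-square distribution on the difference in degrees of freedom is ours; it is not taken from the package source.

```r
# Hypothetical goodness-of-fit results for two nested models
# (illustrative numbers only, not output from mix()).
fits <- data.frame(
  model = c("restricted", "full"),
  chisq = c(20.5, 14.2),
  df    = c(16, 13)
)

# The restricted model has more residual degrees of freedom.
delta_chisq <- fits$chisq[1] - fits$chisq[2]
delta_df    <- fits$df[1] - fits$df[2]

# Upper-tail probability of the chi-square difference
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)

print(data.frame(delta_chisq, delta_df, p_value))
```

A small p_value would suggest that the extra parameters of the fuller model improve the fit by more than chance alone.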
The model fitting function mix, the generic function anova.
data(pike65) # load the grouped data `pike65'
data(pikepar) # load the initial values of parameters for the data `pike65'
fitpike3 <- mix(pike65, pikepar, "lnorm",
                mixconstr(conmu = "MFX", fixmu = c(FALSE, FALSE, FALSE, FALSE, TRUE),
                          consigma = "CCV"),
                emsteps = 3)
anova(fitpike3)
fitpike4 <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
anova(fitpike4)
anova(fitpike3, fitpike4)
anova(fitpike4, fitpike3)
We randomly generate four groups of binomially distributed data with means 4, 8, 12 and 16, and corresponding variances 3.2, 4.8, 4.8 and 3.2. We then mix the four groups, with 100 observations in each group, i.e., with equal proportions. After grouping the mixture data, we obtain the grouped data bindat.
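A data set of this kind can be simulated in base R. The sketch below assumes a binomial size of 20 (the value used in the examples for this data set), which together with the stated means implies success probabilities 0.2, 0.4, 0.6 and 0.8. The seed and the column names are arbitrary, and the first column holds the raw values 0 to 20 rather than interval boundaries, so this is not a reconstruction of bindat itself.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
size  <- 20                        # assumed binomial size
prob  <- c(0.2, 0.4, 0.6, 0.8)     # implied by means 4, 8, 12, 16 with size 20
n_per <- 100                       # observations per component

# Theoretical means and variances of the four components
comp_mean <- size * prob
comp_var  <- size * prob * (1 - prob)

# Draw 100 observations from each component and pool them
x <- unlist(lapply(prob, function(p) rbinom(n_per, size, p)))

# Group the pooled data: one row per possible value 0..20
grouped <- data.frame(
  X     = 0:size,
  count = as.vector(table(factor(x, levels = 0:size)))
)
print(grouped)
```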
The bindat data frame has 21 rows and 2 columns.
data(bindat)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
data(bindat)
data(binpar)
plot.mixdata(bindat)
fit <- mix(bindat, binpar, "binom",
           mixconstr(conpi = "PFX", fixpi = c(TRUE, TRUE, TRUE, TRUE),
                     consigma = "BINOM", size = c(20, 20, 20, 20)))
fit
plot(fit)
Starting values of parameters for fitting a mixture distribution to the data set bindat.
The binpar data frame has 4 rows and 3 columns.
data(binpar)
This data frame contains the following columns:
the starting values for proportions.
the starting values for means.
the starting values for standard deviations.
data(binpar)
Data for Cassie's (1954) analysis of size frequency distributions.
The cassie data frame has 40 rows and 2 columns.
data(cassie)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
Cassie, R.M. (1954). Some uses of probability paper in the analysis of size frequency distributions. Aust. J. Mar. Freshwater Res. 5, 513-522.
The data, lengths (in) of 256 snapper (Chrysophrys auratus Forster) taken by a trawl with a mesh of about 1.5 in, are given in Table 5 of that paper. Cassie's results are given in his Table 1.
http://www.math.mcmaster.ca/peter/mix/demex/excass.html
data(cassie)
plot.mixdata(cassie)
coef.mix is a function which extracts mixture model coefficients from objects returned by the model fitting function mix. It is called via the generic function coef.
## S3 method for class 'mix'
coef(object, natpar = FALSE, ...)
object |
an object of class |
natpar |
a logical scalar specifying whether the natural parameters should be given. |
... |
other arguments. |
A data frame containing three variables, which are, in order, the proportions, means, and standard deviations. If natpar is TRUE, then the natural parameters of the component distributions are also displayed.
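What the natural parameters are depends on the component distribution. The sketch below assumes they are the parameters used by R's own density functions (meanlog and sdlog for the lognormal, shape and rate for the gamma) and shows the standard moment conversions from a mean and standard deviation. The helper names are hypothetical and this is not the package's code.

```r
# Hypothetical helpers: convert (mean, sd) to the parameters used by
# dlnorm() and dgamma(). This is an assumed reading of "natural parameters".
lnorm_natpar <- function(mu, sigma) {
  sdlog   <- sqrt(log(1 + (sigma / mu)^2))
  meanlog <- log(mu) - sdlog^2 / 2
  c(meanlog = meanlog, sdlog = sdlog)
}

gamma_natpar <- function(mu, sigma) {
  c(shape = (mu / sigma)^2, rate = mu / sigma^2)
}

# Round trip for the lognormal: recover the mean and sd from the parameters
p <- lnorm_natpar(mu = 30, sigma = 3)
back_mean <- exp(p[["meanlog"]] + p[["sdlog"]]^2 / 2)
back_sd   <- back_mean * sqrt(exp(p[["sdlog"]]^2) - 1)

# Gamma with mean 4 and sd 2 has shape 4 and rate 1
g <- gamma_natpar(mu = 4, sigma = 2)
print(p)
print(g)
```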
mix for model fitting.
data(pike65) # load the grouped data `pike65'
data(pikepar) # load the initial values of parameters for the data `pike65'
fit <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
coef(fit)
coef(fit, natpar = TRUE)
Automatically combines grouped data with conditional data, given the conditional samples.
conditdat(mixdat, k, conditsamples)
mixdat |
a data frame containing grouped data, whose first column should be the right boundaries of grouping intervals, and the second one should be the numbers of observations falling into each interval. |
k |
the number of components. |
conditsamples |
a vector containing the conditional data, formed by concatenating the conditional samples; the first element of each sample is a number indicating which interval the sample comes from. |
A data frame containing the grouped data with conditional data.
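The layout of conditsamples can be unpacked with base R alone. The sketch below assumes each sample occupies k + 1 consecutive elements (an interval number followed by k component counts), as in the package example; the object names and the choice of 25 intervals are illustrative, and this is not the package's implementation.

```r
k <- 5                                   # number of components
conditsamples <- c(c(4, 9, 2, 0, 0, 0),  # sample taken from interval 4
                   c(5, 8, 6, 0, 0, 0),  # sample taken from interval 5
                   c(12, 0, 2, 34, 0, 0))# sample taken from interval 12

# One row per sample: interval number, then counts for components 1..k
samples <- matrix(conditsamples, ncol = k + 1, byrow = TRUE,
                  dimnames = list(NULL, c("interval", paste0("C", 1:k))))

# Place the counts into a table with one row per grouping interval
n_intervals <- 25                        # assumed number of intervals
condit <- matrix(0, nrow = n_intervals, ncol = k)
condit[samples[, "interval"], ] <- samples[, -1]

print(samples)
print(condit[c(4, 5, 12), ])
```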
mixgroup for constructing grouped and conditional data.
data(pike65) # load the data set `pike65'
pike65 # display the data set `pike65'
# add conditional data to the grouped data `pike65'
conditdat(pike65, k = 5,
          conditsamples = c(c(4, 9, 2, 0, 0, 0), c(5, 8, 6, 0, 0, 0),
                            c(12, 0, 2, 34, 0, 0), c(13, 0, 0, 21, 0, 0),
                            c(15, 0, 0, 5, 5, 0), c(16, 0, 0, 6, 5, 1),
                            c(17, 0, 0, 5, 7, 0), c(18, 0, 0, 4, 4, 3),
                            c(19, 0, 0, 0, 8, 0), c(20, 0, 0, 0, 2, 1),
                            c(21, 0, 0, 0, 1, 5), c(22, 0, 0, 0, 2, 4)))
A total of 1000 observations was generated by computer to follow the mixture distribution 1/3 E(1) + 1/3 E(4) + 1/3 E(16) where E(m) denotes an exponential distribution with mean m.
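A sample from this mixture can be drawn in base R by first choosing a component for each observation and then drawing from the matching exponential distribution. The seed and object names below are arbitrary, so the sketch will not reproduce expdat itself.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 1000
means <- c(1, 4, 16)             # E(1), E(4), E(16)

# Pick a component for each observation with equal probability 1/3
component <- sample(seq_along(means), n, replace = TRUE)

# rexp() is parameterised by rate = 1 / mean
x <- rexp(n, rate = 1 / means[component])

# Theoretical mean of the mixture: the average of the component means
mixture_mean <- mean(means)
print(c(theoretical = mixture_mean, sample = mean(x)))
```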
The expdat data frame has 25 rows and 2 columns.
data(expdat)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
http://www.math.mcmaster.ca/peter/mix/demex/exexp.html
data(expdat)
plot.mixdata(expdat)
Fifteen normal components grouped over eighty intervals.
The fiftn80 data frame has 80 rows and 2 columns.
data(fiftn80)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
A total of 820 observations were generated by computer to follow the distribution 1/15 N(5, 1) + 1/15 N(10, 1) + ... + 1/15 N(75, 1) where N(m, s) denotes a normal distribution with mean m and standard deviation s.
http://www.math.mcmaster.ca/peter/mix/demex/ex1580.html
data(fiftn80)
plot.mixdata(fiftn80)
fitted.mix is a function which computes fitted values from objects returned by the modeling function mix. It is called via the generic function fitted.
## S3 method for class 'mix'
fitted(object, digits = NULL, ...)
object |
an object of class |
digits |
the number of decimal places to be retained. |
... |
other arguments. |
List with the following components:
mixed |
the estimated mixed data, that is, the fitted numbers of observations falling into each interval. |
joint |
the estimated joint data, that is, the fitted numbers of observations from each component falling into every interval. |
conditional |
the estimated conditional data to be
returned if |
conditprob |
the estimated conditional probabilities that an observation from a given interval belongs to each component. |
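The relationship between the joint table, the mixed totals and the conditional probabilities can be shown with a small made-up table. The numbers below are illustrative only and the object names simply mirror the component names above; this is a sketch of the arithmetic, not output from fitted.

```r
# Made-up fitted counts: rows are intervals, columns are components
joint <- matrix(c(30, 10,  0,
                   5, 20,  5,
                   0, 10, 40),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("interval", 1:3), paste0("C", 1:3)))

# Fitted number of observations per interval, summed over components
mixed <- rowSums(joint)

# Probability that an observation in a given interval belongs to each
# component: each row of the joint table divided by its row total
conditprob <- joint / mixed

print(mixed)
print(conditprob)
```

Each row of conditprob sums to one, since every observation in an interval belongs to exactly one component.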
mix for fitting mixture distributions.
data(pike65)
data(pikepar)
fit1 <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
fitted(fit1)
data(pike65sg)
fit2 <- mix(pike65sg, pikepar, "gamma", mixconstr(consigma = "CCV"), usecondit = TRUE)
fitted(fit2, digits = 2)
groupstats is a function which estimates the proportion, mean and standard deviation for a mixture distribution with one component.
groupstats(mixdat)
mixdat |
A data frame containing grouped data, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. |
A list containing the following components:
pi |
the value is |
mu |
the estimated mean of |
sigma |
the estimated standard deviation of |
mixgroup for grouping data, mixparam for constructing starting values of parameters.
data(pike65)
groupstats(pike65)
Find a set of overlapping component distributions that gives the best fit to grouped data and conditional data, using a combination of a Newton-type method and the EM algorithm.
mix(mixdat, mixpar, dist = "norm",
    constr = list(conpi = "NONE", conmu = "NONE", consigma = "NONE",
                  fixpi = NULL, fixmu = NULL, fixsigma = NULL,
                  cov = NULL, size = NULL),
    emsteps = 1, usecondit = FALSE, exptol = 5e-06, print.level = 0, ...)
mixdat |
A data frame containing grouped data, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. If conditional data are available, this data frame should have k + 2 columns, where k is the number of components, whose element in row j and column i + 2 is the number of observations from the jth interval belonging to the ith component. |
mixpar |
A data frame containing starting values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist |
the distribution of components, it can be one of
|
constr |
a list of constraints on parameters of
component distributions. See function |
emsteps |
a non-negative integer specifying the number of EM steps to be performed. |
usecondit |
logical. If |
exptol |
a positive scalar giving the tolerance at which the scaled fitted value is considered large enough to be a degree of freedom. |
print.level |
this argument determines the level of printing
which is done during the optimization process. The default
value of |
... |
additional arguments to the optimization function
|
.
A list containing the following items:
parameters |
A data frame containing estimated values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
se |
A data frame containing estimated values for standard errors of parameters of component distributions. |
distribution |
the distribution used to fit the data. |
constraint |
the constraints on parameters. |
chisq |
the goodness-of-fit chi-square statistic. |
df |
degrees of freedom of the fitted mixture model. |
P |
a significance level (P-value) for the goodness-of-fit test. |
vmat |
covariance matrix for the estimated parameters. |
mixdata |
the original data, i.e. the argument |
usecondit |
the value of the argument |
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
mixgroup for grouping data, mixparam for organizing the parameter values, mixconstr for constructing constraints, nlm for additional arguments.
data(pike65)
data(pikepar)
fitpike1 <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
fitpike1
plot(fitpike1)
data(pike65sg)
fitpike2 <- mix(pike65sg, pikepar, "lnorm", emsteps = 3, usecondit = TRUE)
fitpike2
plot(fitpike2)
data(bindat)
data(binpar)
fitbin1 <- mix(bindat, binpar, "binom",
               constr = mixconstr(consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fitbin1)
fitbin2 <- mix(bindat, binpar, "binom",
               constr = mixconstr(conpi = "PFX", fixpi = c(TRUE, TRUE, TRUE, TRUE),
                                  consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fitbin2)
Construct constraints on parameters and check whether the constraints are valid. See the reference for details.
mixconstr(conpi = "NONE", conmu = "NONE", consigma = "NONE",
          fixpi = NULL, fixmu = NULL, fixsigma = NULL, cov = NULL, size = NULL)
conpi |
a constraint on proportions, it can be either
|
conmu |
a constraint on means, it can be |
consigma |
a constraint on standard deviations, it can be
|
fixpi |
|
fixmu |
similar to |
fixsigma |
similar to |
cov |
|
size |
|
A list containing the following components, in order: conpi, conmu, consigma, fixpi, fixmu, fixsigma, cov, size.
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
mixgroup for grouping data, mixparam for constructing starting values of parameters.
mixconstr()
mixconstr(conmu = "MEQ", consigma = "SFX", fixsigma = c(TRUE, FALSE, TRUE, TRUE, FALSE))
mixconstr(consigma = "BINOM", size = c(25, 25, 25))
as.mixdata checks whether its argument is mixed data; if so, it returns the data with class "mixdata", otherwise it returns NULL.
is.mixdata returns TRUE if its argument is of class "mixdata" and FALSE otherwise.
as.mixdata(x)
is.mixdata(x)
x |
object to be tested. |
Mixed data consist of grouped data and conditional data (if available). Grouped data is either a data frame or a matrix, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. If conditional data are available, mixed data should have k + 2 columns, where k is the number of components, whose element in row j and column i + 2 is the number of observations from the jth interval belonging to the ith component.
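A tiny made-up data frame in this layout may make the description concrete. The column names, the counts, and the use of Inf to mark the open-ended last interval are assumptions made for illustration; only the column order follows the description above.

```r
k <- 2   # number of components

mixed <- data.frame(
  X     = c(10, 20, 30, Inf),   # right boundaries; Inf is an assumed marker
  count = c(12, 30, 25, 8),     # observations falling into each interval
  C1    = c(5, 6, 0, 0),        # subsampled observations from component 1
  C2    = c(0, 2, 7, 3)         # subsampled observations from component 2
)

# Grouped data plus conditional data occupy k + 2 columns
n_cols <- ncol(mixed)

# Conditional counts come from subsamples, so they cannot exceed the totals
condit_total <- rowSums(mixed[, -(1:2)])
consistent   <- all(condit_total <= mixed$count)

print(mixed)
```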
mixgroup to construct mixed data.
data(pike65) # load data set `pike65'
pike65 # display the mixed data `pike65'
data(pike65sg) # load data set `pike65sg'
pike65sg # display the mixed data `pike65sg'
data(pikepar)
as.mixdata(pikepar)
as.mixdata(pike65)
is.mixdata(pike65)
is.mixdata(as.mixdata(pike65))
Group raw data in the form of numbers of observations over successive intervals.
mixgroup(x, breaks = NULL, xname = NULL, k = NULL, usecondit = FALSE)
x |
a data frame or matrix containing raw data, whose first column should be the measurements to be grouped, and second column, if available, includes the numbers indicating which component each individual belongs to. |
breaks |
one of: * a vector giving the boundaries of intervals which raw data are grouped into, * a single number giving the number of intervals, * a character string naming an algorithm to compute the number of intervals, * a function to compute the number of intervals. In the last three cases the number is a suggestion only. |
xname |
the name of the measurement. |
k |
the number of components. |
usecondit |
if |
A data frame containing grouped data derived from raw data, whose first column includes the right boundaries of grouping intervals, where the first and last intervals are open-ended, and whose second column consists of the frequencies, that is, the numbers of observations falling into each interval. If usecondit is TRUE and the numbers indicating which component each individual comes from are available, the conditional data are displayed with the grouped data as a table whose element in row j and column i is the number of observations from the jth interval belonging to the ith component.
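The grouping step can be imitated with cut() and table() from base R. The measurements and boundaries below are made up, and treating the intervals as closed on the right is an assumption rather than something stated for mixgroup; Inf marks the open-ended last interval.

```r
# Made-up raw measurements
x <- c(18.2, 21.0, 22.5, 24.9, 25.1, 27.3, 30.8, 33.4)

# Finite right boundaries; the first and last intervals are open-ended
inner  <- c(20, 24, 28, 32)
breaks <- c(-Inf, inner, Inf)

# Assign each measurement to an interval (assumed closed on the right)
bins <- cut(x, breaks = breaks, right = TRUE)

# Grouped data: right boundary of each interval and its frequency
grouped <- data.frame(
  X     = c(inner, Inf),
  count = as.vector(table(bins))
)
print(grouped)
```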
hist for more information about the argument breaks, is.mixdata for checking the class of data sets, mixparam for organizing the parameter values, mixconstr for constructing constraints.
data(pikeraw) # load raw data `pikeraw'
pikeraw # display the data set `pikeraw'
mixgroup(pikeraw) # group raw data
pikemd <- mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80))
plot(pikemd)
mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80),
         usecondit = TRUE, k = 5) # construct grouped data associated with conditional data
mixgroup(pikeraw, usecondit = TRUE)
mixgroup(pikeraw, usecondit = TRUE, k = 3) # grouping data with a warning message
mixgroup(pikeraw, usecondit = TRUE, k = 8)
Construct starting values for parameters of a mixture model.
mixparam(mu, sigma, pi = NULL)
mu |
a vector of means of component distributions, which should be in ascending order. |
sigma |
a vector of standard deviations of the component distributions, corresponding to the means. |
pi |
the corresponding mixing proportions of components.
If |
A data frame containing three variables, which are, in order, the proportions, means, and standard deviations.
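The shape of the returned data frame can be sketched with a hypothetical base R helper. The name make_param, the equal-proportion default when pi is NULL, and the recycling of a single sigma across components are all assumptions made for illustration; they are not documented behaviour of mixparam.

```r
# Hypothetical stand-in for a starting-value constructor
make_param <- function(mu, sigma, pi = NULL) {
  k <- length(mu)
  if (is.null(pi)) {
    pi <- rep(1 / k, k)               # assumed default: equal proportions
  }
  data.frame(
    pi    = pi,
    mu    = mu,
    sigma = rep(sigma, length.out = k) # recycle a single sigma if needed
  )
}

par1 <- make_param(mu = c(20, 30, 40), sigma = c(2, 3, 4))
par2 <- make_param(c(20, 30, 40), 3, c(0.15, 0.78, 0.07))
print(par1)
print(par2)
```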
mixgroup for grouping data, mixconstr for constructing constraints.
mixparam(mu = c(20, 30, 40), sigma = c(2, 3, 4))
mixparam(c(20, 30, 40), c(3), c(0.15, 0.78, 0.07))
Scale mixture of three normal distributions.
The normals data frame has 25 rows and 2 columns.
data(normals)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
A total of 249 observations were generated by computer to follow the mixture distribution 1/3 N(12.5, 1) + 1/3 N(12.5, 3) + 1/3 N(12.5, 5) where N(m, s) denotes a normal distribution with mean m and standard deviation s.
http://www.math.mcmaster.ca/peter/mix/demex/exscle.html
data(normals)
plot.mixdata(normals)
The data give the ratio of "forehead" breadth to body length for 1000 crabs sampled at Naples by Professor W.F.R. Weldon.
The pearson data frame has 29 rows and 2 columns.
data(pearson)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. London A 185, 71-110.
http://www.math.mcmaster.ca/peter/mix/demex/excrabs.html
data(pearson)
plot.mixdata(pearson)
Starting values of parameters for fitting a mixture distribution to the data set pearson.
The pearsonpar data frame has 2 rows and 3 columns.
data(pearsonpar)
This data frame contains the following columns:
the starting values for proportions.
the starting values for means.
the starting values for standard deviations.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. London A 185, 71-110.
http://www.math.mcmaster.ca/peter/mix/demex/excrabs.html
data(pearsonpar)
The raw data pikeraw give the lengths of 523 pike (Esox lucius), and there are known to be five age-groups in the sample. We grouped the lengths over 25 intervals to obtain the grouped data, given as separate samples for each age group as determined by scale reading.
The pikdat5 data frame has 25 rows and 6 columns.
data(pikdat5)
This data frame contains the following columns:
the boundaries of grouping intervals.
the numbers of observations from each interval belonging to the first age group.
the numbers of observations from each interval belonging to the second age group.
the numbers of observations from each interval belonging to the third age group.
the numbers of observations from each interval belonging to the fourth age group.
the numbers of observations from each interval belonging to the fifth age group.
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
data(pikdat5)
The raw data pikeraw give the lengths of 523 pike (Esox lucius). We grouped the lengths over 25 intervals to obtain these length-frequency data.
The pike65 data frame has 25 rows and 2 columns.
data(pike65)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
data(pike65)
data(pikepar)
plot.mixdata(pike65)
fit <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit)
The raw data pikeraw give the lengths of 523 pike (Esox lucius), and there are known to be five age-groups in the sample. After grouping the data, we took subsamples from some intervals to determine the age group, and so obtained this data set.
The pike65sg data frame has 25 rows and 7 columns.
data(pike65sg)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
the numbers of observations in the subsamples belonging to the first age group.
the numbers of observations in the subsamples belonging to the second age group.
the numbers of observations in the subsamples belonging to the third age group.
the numbers of observations in the subsamples belonging to the fourth age group.
the numbers of observations in the subsamples belonging to the fifth age group.
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
data(pike65sg)
data(pikepar)
fit1 <- mix(pike65sg, pikepar, "gamma", mixconstr(consigma = "CCV"), usecondit = TRUE)
plot(fit1)
fit2 <- mix(pike65sg, pikepar, "gamma", usecondit = TRUE)
plot(fit2)
Starting values of parameters for fitting a mixture distribution to the data set pike65.
The pikepar data frame has 5 rows and 3 columns.
data(pikepar)
This data frame contains the following columns:
the starting values for proportions.
the starting values for means.
the starting values for standard deviations.
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
data(pikepar)
The data give the lengths of 523 pike (Esox lucius), sampled in 1965 from Heming Lake, Manitoba, Canada. There are known to be five age-groups in the sample. For each fish, the age group is determined by scale reading.
The pikeraw data frame has 523 rows and 2 columns.
data(pikeraw)
This data frame contains the following columns:
the lengths of 523 pike
the age groups of 523 pike
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
data(pikeraw)
A function for plotting Mix objects. It is called via the generic function plot.
## S3 method for class 'mix'
plot(x, mixpar = NULL, dist = "norm", root = FALSE, ytop = NULL, clwd = 1,
     main, sub, xlab, ylab, bty, BW = FALSE, ...)
x |
an object of class |
mixpar |
|
dist |
the distribution of components, it can be
|
root |
if |
ytop |
a scalar which determines the top of the y-axis. |
clwd |
a positive number denoting line width, defaulting to |
main |
an overall title for the plot. |
sub |
a subtitle for the plot. |
xlab |
a title for the x-axis. |
ylab |
a title for the y-axis. |
bty |
A character string which determines the type of box which is drawn about plots. If |
BW |
logical; if TRUE the plot will be drawn in black and white. |
... |
additional arguments to the function |
If the argument x gives an object of class "mix", the plot will be a histogram of the grouped data taken from the element mixdata of x. Although the leftmost (first) and rightmost (mth) intervals are always open-ended, on the histogram the first interval is shown as being twice the width of the second interval and the mth is shown as being twice the width of the (m - 1)st interval. When the fitted distribution is one of "lnorm", "gamma" and "weibull", the left boundary of the first interval is taken to be zero, since negative values and zeroes are not allowed for these distributions. For the distributions "binom", "nbinom" and "pois", negative data are not permitted, so the left boundary of the first interval is taken to be -0.5. The component distributions, weighted by their respective proportions, and the mixture distribution are computed from the estimated parameter values in the element parameters of x and superimposed on the histogram. The distribution of the components is taken from the element distribution. If sub, xlab, ylab and bty are not specified, the default values will be used. The positions of the means are indicated with triangles.
When the argument root is TRUE, a hanging rootogram will be displayed. That is, if only grouped data are given, this option plots the histogram with the square root of relative frequency on the y-axis. If there is a model as well as data, not only is the y-axis the square root of relative frequency, but the bars of the histogram, instead of rising from 0, are shifted up or down so that the mid-point of the top of each bar lies exactly on the curve indicating the mixture distribution; the bottom of the bar may therefore be above or below the x-axis. If the bar goes below the x-axis, the portion below is shown as a blue rectangle. If the bar does not reach the x-axis, the space between the bottom of the bar and the x-axis is shown as a blue rectangle. If the blue rectangles in some region of the x-axis lie almost all above or almost all below the axis, the mixture curve is not fitting well in that region.
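The geometry of the hanging rootogram can be worked out numerically without drawing anything. In the sketch below the boundaries and counts are made up, and a single normal density stands in for the fitted mixture curve; both are assumptions for illustration and do not reproduce the plotting code of the package.

```r
# Made-up grouped data on four bounded intervals
breaks <- c(0, 1, 2, 3, 4)
counts <- c(10, 40, 35, 15)
width  <- diff(breaks)
mid    <- breaks[-1] - width / 2

# Observed relative frequency per unit width (histogram height)
obs_density <- counts / sum(counts) / width

# Stand-in for the fitted mixture curve, evaluated at the bar mid-points
fit_density <- dnorm(mid, mean = 2, sd = 0.9)

# Hanging bars: the top sits on the square-root curve and the bar hangs
# down by the square root of the observed height
top    <- sqrt(fit_density)
bottom <- top - sqrt(obs_density)

print(data.frame(mid, top, bottom))
```

A negative bottom means the bar extends below the x-axis (more observations than the curve predicts), while a positive bottom leaves a gap above the axis (fewer observations than predicted).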
mixparam for organizing the parameter values, mix for fitting
mixture models, plot.mixdata for plotting Mixdata objects,
plot.default for additional arguments.
data(pike65)
data(pikepar)
fit1 <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit1)
plot(fit1, root = TRUE)
data(bindat)
data(binpar)
fit2 <- mix(bindat, binpar, "binom", constr = mixconstr(consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fit2)
plot(fit2, root = TRUE)
A function for plotting Mixdata objects. It is called via the
generic function plot.
## S3 method for class 'mixdata'
plot(x, mixpar = NULL, dist = "norm", root = FALSE, ytop = NULL, clwd = 1, main, sub, xlab, ylab, bty, ...)
x | an object of class |
mixpar | |
dist | the distribution of components, it can be |
root | if |
ytop | a scalar which determines the top of the y-axis. |
clwd | a positive number denoting line width, defaulting to |
main | an overall title for the plot. |
sub | a subtitle for the plot. |
xlab | a title for the x-axis. |
ylab | a title for the y-axis. |
bty | a character string which determines the type of box which is drawn about plots. If |
... | additional arguments to the function |
If the argument mixpar is NULL, then only the histogram of the
data will be displayed; if mixpar gives the values of parameters,
the component distributions and the mixture distribution are
computed from the parameter values and superimposed on the
histogram.
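As a rough illustration of how the superimposed curves relate to the parameter values, the base R sketch below scales each component density by its proportion and sums the results to obtain the mixture. The parameter values, the column names pi, mu and sigma, and the choice of normal components are assumptions made for the example; this is not the package's own plotting code.

```r
# Hypothetical parameter values, one row per component, in the order
# proportions, means, standard deviations (column names are assumed).
par <- data.frame(pi = c(0.3, 0.7), mu = c(2, 5), sigma = c(0.5, 1))

x <- seq(0, 9, by = 0.05)

# One column per component: the component density scaled by its proportion.
weighted <- sapply(seq_len(nrow(par)), function(k) {
  par$pi[k] * dnorm(x, par$mu[k], par$sigma[k])
})

# The mixture density is the sum of the weighted component densities.
mixture <- rowSums(weighted)

# Draw the mixture and the weighted components on one set of axes.
plot(x, mixture, type = "l", lwd = 2, ylab = "density")
matlines(x, weighted, lty = 2, col = "grey40")
```

Because the proportions sum to one, the area under the mixture curve is one while each weighted component has area equal to its proportion.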
plot.mix for plotting Mix objects, plot.default for additional
arguments.
data(cassie)
as.mixdata(cassie) # if the result isn't `NULL', then cassie is mixed data
plot.mixdata(cassie)
data(pikeraw)
data(pikepar)
pikemd <- mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80))
plot(pikemd)
plot(pikemd, pikepar, "lnorm")
fit <- mix(pikemd, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit)
plot(pikemd, pikepar, "lnorm", root = TRUE)
plot(fit, root = TRUE)
The poisdat data frame has 15 rows and 2 columns.
data(poisdat)
This data frame contains the following columns:
the boundaries of grouping intervals.
the frequencies of observations falling into each interval.
data(poisdat)
plot.mixdata(poisdat)
Starting values of parameters for fitting a mixture distribution to the data set poisdat.
The poispar data frame has 4 rows and 3 columns.
data(poispar)
This data frame contains the following columns:
the starting values for proportions.
the starting values for means.
the starting values for standard deviations.
data(poispar)
print.mix is a function which prints objects of class "mix" and
returns them invisibly. It is called via the generic function
print.
## S3 method for class 'mix'
print(x, digits = 4, ...)
x | an object of class |
digits | how many significant digits are to be used. |
... | further arguments passed to or from other methods. |
This function prints only part of the information about the
mixture model, namely the estimated parameters of the mixture,
the distribution of components and the constraints on the
parameters. The parameter values are rounded to the specified
number of decimal places (default 4). The whole object can be
printed using the function print.default.
mix for model fitting, print.default for printing the whole
object.
data(pike65)
data(pikepar)
fit <- mix(pike65, pikepar, "gamma", mixconstr(consigma = "CCV"), emsteps = 3)
fit
print(fit)
print.mix(fit)
print.default(fit)
summary method for class "mix". It is called via the generic
function summary.
## S3 method for class 'mix'
summary(object, digits = 4, ...)
object | an object of class |
digits | how many significant digits are to be used. |
... | additional arguments affecting the summary produced. |
A list containing the following items:
parameters | a data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
standard errors | a data frame giving the standard errors of estimated parameters. |
anova table | analysis of variance table for the |
mix for model fitting, summary for summarizing other kinds of
objects, anova.mix for information about the anova table.
data(pike65)
data(pikepar)
fit <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
fit
summary(fit)
Check if constraints on parameters are valid. See the reference for details.
testconstr(mixdat, mixpar, dist, constr)
mixdat | a data frame containing grouped data, whose first column holds the right boundaries of the grouping intervals and whose second column holds the frequencies, i.e. the numbers of observations falling into each interval. If conditional data are available, this data frame should have |
mixpar | a data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist | the distribution of components, it can be one of |
constr | a list of constraints on parameters of component distributions. See function |
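The grouped-data layout expected for mixdat (right boundaries in the first column, frequencies in the second) can be illustrated with base R alone. The sketch below is a hand-rolled example, not the package's grouping function; the raw values, the breaks and the column names X and count are assumptions made for illustration, and the last right boundary is set to Inf to represent the open-ended final interval.

```r
# Hypothetical raw measurements from two overlapping groups.
set.seed(1)
raw <- c(rnorm(60, mean = 20, sd = 2), rnorm(40, mean = 30, sd = 3))

# Interior boundaries; the outermost intervals are left open-ended.
breaks <- seq(16, 36, by = 2)
bins <- cut(raw, breaks = c(-Inf, breaks, Inf))

# First column: right boundary of each interval (Inf for the last one).
# Second column: number of observations falling into the interval.
grouped <- data.frame(X = c(breaks, Inf), count = as.vector(table(bins)))
print(grouped)
```

With k interior boundaries there are k + 1 intervals, so the data frame has one more row than there are breaks, and the counts sum to the number of raw observations.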
If the constraints are valid, this function returns the logical
value TRUE. If not, it gives an error message explaining the
reason.
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
mixgroup for grouping data, mixparam for organizing the parameter
values, mixconstr for constructing constraints.
## Not run:
testconstr(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"))
testconstr(bindat, binpar, "binom", constr = mixconstr())
testconstr(bindat, binpar, "binom", constr = mixconstr(consigma = "BINOM"))
testconstr(bindat, binpar, "pois", constr = mixconstr(conmu = "MEQ", consigma = "POIS"))
## End(Not run)
Compute the shape and scale parameters of the Weibull distribution given the mean, standard deviation and location.
weibullpar(mu, sigma, loc = 0)
mu | the mean of the Weibull distribution. |
sigma | the standard deviation of the Weibull distribution. |
loc | the location parameter of the Weibull distribution, defaulting to |
A data frame containing three parameters, which are, in order, shape, scale, and location.
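One way to carry out this conversion, shown here only as a sketch, uses the standard Weibull moment formulas: the coefficient of variation of the shifted variable depends on the shape alone, so the shape can be found numerically and the scale then follows from the mean. The function name weibull_from_moments and the search interval are hypothetical, and this is not claimed to be how the package computes its result.

```r
# Sketch: recover Weibull shape and scale from a mean, a standard
# deviation and a location shift. Hypothetical helper, base R only.
weibull_from_moments <- function(mu, sigma, loc = 0) {
  # Coefficient of variation of the shifted variable X - loc.
  cv <- sigma / (mu - loc)
  # For a Weibull, cv^2 = gamma(1 + 2/k) / gamma(1 + 1/k)^2 - 1,
  # which depends only on the shape k, so solve this equation for k.
  f <- function(k) sqrt(gamma(1 + 2 / k) / gamma(1 + 1 / k)^2 - 1) - cv
  shape <- uniroot(f, interval = c(0.1, 100), tol = 1e-10)$root
  # The mean of X - loc is scale * gamma(1 + 1/shape).
  scale <- (mu - loc) / gamma(1 + 1 / shape)
  data.frame(shape = shape, scale = scale, loc = loc)
}

p <- weibull_from_moments(2, 1.2)
print(p)
```

The search interval limits the coefficients of variation that can be handled; values far outside the usual range would need a wider interval.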
weibullparinv for computing the mean and standard deviation from
the parameters shape, scale and location.
weibullpar(2, 1.2)
weibullpar(2, 1.2, 1)
Compute the mean and standard deviation of the Weibull distribution given the values of shape, scale and location.
weibullparinv(shape, scale, loc = 0)
shape | the shape parameter of the Weibull distribution. |
scale | the scale parameter of the Weibull distribution. |
loc | the location parameter of the Weibull distribution, defaulting to 0. |
A data frame containing three parameters, which are, in order, mean, standard deviation and location.
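The forward direction follows directly from the standard Weibull moment formulas, with the location acting as a simple shift of the mean. The base R sketch below uses a hypothetical helper, weibull_moments, with assumed column names; it illustrates the formulas rather than reproducing the package's code.

```r
# Sketch of the forward direction; the function name is hypothetical.
weibull_moments <- function(shape, scale, loc = 0) {
  g1 <- gamma(1 + 1 / shape)
  g2 <- gamma(1 + 2 / shape)
  data.frame(mu = loc + scale * g1,           # location shifts the mean
             sigma = scale * sqrt(g2 - g1^2), # but not the spread
             loc = loc)
}

# With shape = 1 the Weibull reduces to an exponential distribution,
# whose mean and standard deviation both equal the scale.
exp_case <- weibull_moments(shape = 1, scale = 3)
print(exp_case)

# Cross-check against simulated draws from base R's rweibull().
set.seed(42)
draws <- rweibull(1e5, shape = 2, scale = 1.5) + 1
print(c(mean = mean(draws), sd = sd(draws)))
print(weibull_moments(shape = 2, scale = 1.5, loc = 1))
```

The simulated mean and standard deviation should agree with the formula values up to sampling error.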
weibullpar for computing the parameters shape and scale from mean
and standard deviation.
weibullparinv(weibullpar(2, 1.2)$shape, weibullpar(2, 1.2)$scale)