Title: | Finite Mixture Modeling, Clustering & Classification |
---|---|
Description: | Random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or circular von Mises parametric families. |
Authors: | Marko Nagode [aut, cre] , Branislav Panic [ctb] , Jernej Klemenc [ctb] , Simon Oman [ctb] |
Maintainer: | Marko Nagode <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.16.0 |
Built: | 2024-12-08 07:14:31 UTC |
Source: | CRAN |
The adult dataset, containing 48842 instances with 16 continuous, binary and discrete variables, was extracted by Barry Becker from the 1994 census bureau database.
data(adult)
adult is a data frame with 48842 cases (rows) and 16 variables (columns) named:
Type
binary train or test.
Age
continuous.
Workclass
one of the 8 discrete values private, self-emp-not-inc, self-emp-inc, federal-gov, local-gov, state-gov, without-pay or never-worked.
Fnlwgt
stands for continuous final weight.
Education
one of the 16 discrete values bachelors, some-college, 11th, hs-grad, prof-school, assoc-acdm, assoc-voc, 9th, 7th-8th, 12th, masters, 1st-4th, 10th, doctorate, 5th-6th or preschool.
Education.Num
continuous.
Marital.Status
one of the 7 discrete values married-civ-spouse, divorced, never-married, separated, widowed, married-spouse-absent or married-af-spouse.
Occupation
one of the 14 discrete values tech-support, craft-repair, other-service, sales, exec-managerial, prof-specialty, handlers-cleaners, machine-op-inspct, adm-clerical, farming-fishing, transport-moving, priv-house-serv, protective-serv or armed-forces.
Relationship
one of the 6 discrete values wife, own-child, husband, not-in-family, other-relative or unmarried.
Race
one of the 5 discrete values white, asian-pac-islander, amer-indian-eskimo, other or black.
Sex
binary female or male.
Capital.Gain
continuous.
Capital.Loss
continuous.
Hours.Per.Week
continuous.
Native.Country
one of the 41 discrete values united-states, cambodia, england, puerto-rico, canada, germany, outlying-us(guam-usvi-etc), india, japan, greece, south, china, cuba, iran, honduras, philippines, italy, poland, jamaica, vietnam, mexico, portugal, ireland, france, dominican-republic, laos, ecuador, taiwan, haiti, columbia, hungary, guatemala, nicaragua, scotland, thailand, yugoslavia, el-salvador, trinadad&tobago, peru, hong or holand-netherlands.
Income
binary <=50k or >50k.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
data(adult)

# Find complete cases.
adult <- adult[complete.cases(adult),]

# Show level attributes for binary and discrete variables.
levels(adult[["Type"]])
levels(adult[["Workclass"]])
levels(adult[["Education"]])
levels(adult[["Marital.Status"]])
levels(adult[["Occupation"]])
levels(adult[["Relationship"]])
levels(adult[["Race"]])
levels(adult[["Sex"]])
levels(adult[["Native.Country"]])
levels(adult[["Income"]])
Returns the Akaike information criterion at pos.
## S4 method for signature 'REBMIX'
AIC(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AIC3(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AIC4(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AICc(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
CAIC(x = NULL, pos = 1, ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.
A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society. Series B, 42(2):213-220, 1980. https://www.jstor.org/stable/2984964.
H. Bozdogan. Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987. doi:10.1007/BF02294361.
C. M. Hurvich and C.-L. Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297-307, 1989. https://www.jstor.org/stable/2336663.
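The Akaike-type criteria listed above differ only in how strongly they penalise model size. The base R sketch below evaluates their textbook forms for a given log likelihood, number of free parameters and sample size. The helper ic_family() and its arguments logL, M and n are hypothetical names introduced for illustration; they are not part of rebmix, and the package may use its own conventions.

```r
# Hypothetical helper, not part of rebmix: textbook penalised-likelihood criteria.
# logL = log likelihood, M = number of free parameters, n = number of observations.
ic_family <- function(logL, M, n) {
  c(AIC  = -2 * logL + 2 * M,
    AIC3 = -2 * logL + 3 * M,
    AIC4 = -2 * logL + 4 * M,
    AICc = -2 * logL + 2 * M * (1 + (M + 1) / (n - M - 1)),
    CAIC = -2 * logL + M * (log(n) + 1))
}

# Illustrative values only.
ic <- ic_family(logL = -120.5, M = 5, n = 100)
print(round(ic, 2))
```

Smaller values indicate a better trade-off between fit and complexity under each criterion.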
Returns the approximate weight of evidence criterion at pos.
## S4 method for signature 'REBMIX'
AWE(x = NULL, pos = 1, ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3):803-821, 1993. doi:10.2307/2532201.
These data are the results of the extraction process from the vibrational data of healthy and faulty bearings. Different faults are considered: faultless (1), defect on outer race (2), defect on inner race (3) and defect on ball (4). The extracted features are: root mean square (RMS), square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), peak to peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), kurtosis factor (KF), frequency centre (FC), root mean square frequency (RMSF) and root variance frequency (RVF).
data(bearings)
bearings is a data frame with 1906 cases (rows) and 14 variables (columns) named:
RMS
continuous.
SRA
continuous.
KV
continuous.
SV
continuous.
PPV
continuous.
CF
continuous.
IF
continuous.
MF
continuous.
SF
continuous.
KF
continuous.
FC
continuous.
RMSF
continuous.
RVF
continuous.
Class
discrete 1, 2, 3 or 4.
Case Western Reserve University Bearing Data Center Website https://engineering.case.edu/bearingdatacenter/welcome.
B. Panic, J. Klemenc and M. Nagode. Gaussian mixture model based classification revisited: Application to the bearing fault classification. Journal of Mechanical Engineering, 66(4):215-226, 2020. doi:10.5545/sv-jme.2020.6563.
## Not run: 
data(bearings)

# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)

Bearings <- split(p = 0.75, Dataset = bearings, class = 14)

# Estimate number of components, component weights and component
# parameters for train subsets.
bearingsest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Bearings),
  Preprocessing = "histogram",
  cmax = 15,
  Criterion = "BIC")

# Classification.
bearingscla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(bearingsest),
  Dataset = a.test(Bearings),
  Zt = a.Zt(Bearings))

bearingscla

summary(bearingscla)
## End(Not run)
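To make the feature names in the bearings dataset more concrete, the base R sketch below computes a handful of them from a raw signal. The helper signal_features() is hypothetical and uses common textbook definitions; the cited paper may define or scale individual features differently, and the sine wave only stands in for a measured vibration signal.

```r
# Hypothetical helper, not part of rebmix: a few time-domain features
# computed from a raw vibration signal x using common textbook definitions.
signal_features <- function(x) {
  rms <- sqrt(mean(x^2))                                     # root mean square
  c(RMS = rms,
    SRA = mean(sqrt(abs(x)))^2,                              # square root of the amplitude
    PPV = max(x) - min(x),                                   # peak to peak value
    CF  = max(abs(x)) / rms,                                 # crest factor
    KV  = mean((x - mean(x))^4) / mean((x - mean(x))^2)^2)   # kurtosis value
}

# Simulated signal standing in for a measured one.
tt <- seq(0, 1, length.out = 1000)
x <- sin(2 * pi * 50 * tt)
print(round(signal_features(x), 3))
```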
Returns as default the optimized RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM", optimized output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'RCLSMIX'
BFSMIX(model = "RCLSMIX",
  x = list(),
  Dataset = data.frame(),
  Zt = factor(), ...)
## ... and for other signatures
model |
see Methods section below. |
x |
a list of objects of class |
Dataset |
a data frame containing test dataset |
Zt |
a factor of true class membership |
... |
currently not used. |
Returns an optimized object of class RCLSMIX or RCLSMVNORM.
signature(model = "RCLSMIX")
a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM")
a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Marko Nagode
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997. doi:10.1016/S0004-3702(97)00043-X.
Returns the Bayesian information criterion at pos.
## S4 method for signature 'REBMIX'
BIC(x = NULL, pos = 1, ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
G. Schwarz. Estimating the dimension of the model. The Annals of Statistics, 6(2):461-464, 1978.
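As an illustration of how the Bayesian information criterion is typically used to choose the number of components, the base R sketch below compares three candidate fits. The log likelihoods are made-up numbers, and the parameter counts assume univariate normal mixtures with 3c - 1 free parameters; none of this is taken from rebmix output.

```r
# Hypothetical candidate fits with c = 1, 2 and 3 components.
n    <- 200                       # number of observations
logL <- c(-450.2, -410.7, -408.9) # made-up log likelihoods
M    <- c(2, 5, 8)                # free parameters, assuming 3c - 1 each

# Schwarz criterion: fit term plus a penalty growing with log(n).
bic  <- -2 * logL + M * log(n)
best <- which.min(bic)

print(round(bic, 2))
print(best)
```

The third candidate fits slightly better than the second, but its extra parameters cost more than the gain in log likelihood, so the two-component model is preferred.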
Returns the list of data frames containing bin means and frequencies
for the histogram preprocessing.
## S4 method for signature 'list'
bins(Dataset = list(), K = matrix(),
  ymin = numeric(), ymax = numeric(), ...)
## ... and for other signatures
Dataset |
a list of length |
K |
a matrix of size |
ymin |
a vector of length |
ymax |
a vector of length |
... |
currently not used. |
signature(x = "list")
a list of data frames.
Branislav Panic, Marko Nagode
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate multivariate normal datasets.
n <- c(7, 10)

Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)

a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)

sim2d <- RNGMIX(model = "RNGMVNORM",
  Dataset.name = paste("sim2d_", 1:2, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))

# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset,
  Rule = "Knuth equal",
  kmin = 1,
  kmax = 20)

opt.k

Y <- bins(Dataset = sim2d@Dataset, K = opt.k)

Y

opt.k <- optbins(Dataset = sim2d@Dataset,
  Rule = "Knuth unequal",
  kmin = 1,
  kmax = 20)

opt.k

Y <- bins(Dataset = sim2d@Dataset, K = opt.k)

Y
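For readers who want to see what a table of bin means and frequencies looks like without installing the package, here is a minimal base R sketch for a single variable. It assumes equal-width bins and takes the bin midpoint as the "bin mean"; the names y, k, ymin and ymax are illustrative, and rebmix may compute these quantities differently.

```r
# One-dimensional sketch of histogram preprocessing in base R.
set.seed(1)
y    <- rnorm(50, mean = 10, sd = 2)
k    <- 5      # number of bins
ymin <- 0      # lower limit of the binned range
ymax <- 20     # upper limit of the binned range

breaks <- seq(ymin, ymax, length.out = k + 1)
bin    <- cut(y, breaks = breaks, include.lowest = TRUE)

# Bin midpoints (assumed here to play the role of bin means) and frequencies.
Y <- data.frame(mean = (head(breaks, -1) + tail(breaks, -1)) / 2,
                freq = as.vector(table(bin)))
print(Y)
```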
Returns as default the boot output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If x is of class REBMVNORM, the boot output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'REBMIX'
boot(x = NULL, rseed = -1, pos = 1, Bootstrap = "parametric",
  B = 100, n = numeric(), replace = TRUE, prob = numeric(), ...)
## ... and for other signatures
## S4 method for signature 'REBMIX.boot'
summary(object, ...)
## ... and for other signatures
x |
see Methods section below. |
rseed |
set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it.
For each next bootstrap dataset the random seed is decremented |
pos |
a desired row number in |
Bootstrap |
a character giving the bootstrap type. One of default |
B |
number of bootstrap datasets. The default value is |
n |
number of observations. The default value is |
replace |
logical. The sampling is with replacement if |
prob |
a vector of length |
... |
maximum number of components |
object |
see Methods section below. |
Returns an object of class REBMIX.boot or REBMVNORM.boot.
signature(x = "REBMIX")
an object of class REBMIX for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(x = "REBMVNORM")
an object of class REBMVNORM for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX")
an object of class REBMIX.
signature(object = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
## Not run: 
data(weibull)

# Create object of class EM.Control.
EM <- new("EM.Control",
  strategy = "single",
  variant = "EM",
  acceleration = "fixed",
  acceleration.multiplier = 1.0,
  tolerance = 1.0E-4,
  maximum.iterations = 1000)

# Estimate number of components, component weights and component parameters.
weibullest <- REBMIX(Dataset = list(weibull),
  Preprocessing = "kernel density estimation",
  cmin = 2,
  cmax = 4,
  Criterion = "BIC",
  pdf = "Weibull",
  EMcontrol = EM)

# Plot finite mixture.
plot(weibullest,
  what = c("pdf", "marginal cdf", "IC", "logL", "D"),
  nrow = 3, ncol = 2, npts = 1000)

# Bootstrap finite mixture.
weibullboot <- boot(x = weibullest,
  Bootstrap = "nonparametric",
  B = 10)

weibullboot
## End(Not run)
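The difference between the two bootstrap types can be sketched in base R without rebmix. In the hedged example below, the nonparametric variant resamples the observations with replacement, while the parametric variant simulates new data from a fitted model. A simple normal fit is used purely for brevity, and all object names are illustrative.

```r
set.seed(10)
y <- rweibull(100, shape = 2, scale = 3)
B <- 100   # number of bootstrap datasets

# Nonparametric: resample the observed values with replacement.
nonpar <- replicate(B, mean(sample(y, size = length(y), replace = TRUE)))

# Parametric: simulate from a fitted model (a normal fit, chosen only for brevity).
param <- replicate(B, mean(rnorm(length(y), mean = mean(y), sd = sd(y))))

# Bootstrap standard errors of the sample mean under both schemes.
se <- c(nonparametric = sd(nonpar), parametric = sd(param))
print(se)
```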
Returns an object of class Histogram. The method can be called recursively. This way more than one dataset can be binned into one histogram. The method is time consuming.
## S4 method for signature 'Histogram'
chistogram(x = NULL, Dataset = data.frame(),
  K = numeric(), ymin = numeric(), ymax = numeric(), ...)
## ... and for other signatures
x |
an object of class |
Dataset |
a data frame of size |
K |
an integer or a vector of length |
ymin |
a vector of length |
ymax |
a vector of length |
... |
currently not used. |
signature(x = "Histogram")
an object of class Histogram.
Marko Nagode
# Create three datasets.
set.seed(1)

n <- 15

Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))

apply(Dataset1, 2, range)
apply(Dataset2, 2, range)
apply(Dataset3, 2, range)

# Bin the first dataset.
hist <- chistogram(Dataset = Dataset1,
  K = c(4, 5),
  ymin = c(100.0, 0.0),
  ymax = c(300.0, 300.0))

# Bin the second dataset.
hist <- chistogram(x = hist, Dataset = Dataset2)

# Bin the third dataset.
hist <- chistogram(x = hist, Dataset = Dataset3)

hist
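The idea of binning several datasets into one histogram can be illustrated in base R with a fixed set of breaks and a running vector of counts. The helper add_to_histogram() below is hypothetical and one-dimensional; it only mimics the accumulation step and says nothing about how the Histogram class is implemented.

```r
# Fixed bin edges, analogous to K = 4, ymin = 100 and ymax = 300.
breaks <- seq(100, 300, length.out = 5)
counts <- integer(4)

# Hypothetical helper: add the bin counts of a new batch to the running total.
add_to_histogram <- function(counts, y, breaks) {
  counts + as.vector(table(cut(y, breaks = breaks, include.lowest = TRUE)))
}

counts <- add_to_histogram(counts, c(150, 157, 163), breaks)  # first dataset
counts <- add_to_histogram(counts, c(230, 244, 260), breaks)  # second dataset
counts <- add_to_histogram(counts, c(190, 198, 205), breaks)  # third dataset
print(counts)
```

Because the bin edges never change, the final counts do not depend on the order in which the datasets are added.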
Returns (invisibly) the object containing train and test observations as well as true class membership
for the test dataset. Vectors
are subvectors of
.
## S4 method for signature 'RCLS.chunk'
chunk(x = NULL, variables = expression(1:d))
## ... and for other signatures
x |
see Methods section below. |
variables |
a vector containing indices of variables in subvectors |
Returns an object of class RCLS.chunk.
signature(x = "RCLS.chunk")
an object of class RCLS.chunk.
Marko Nagode
data(iris)

# Split dataset into train (75%) and test (25%) subsets.
set.seed(5)

Iris <- split(p = 0.75, Dataset = iris, class = 5)

# Extract chunk from train and test datasets.
Iris14 <- chunk(x = Iris, variables = c(1,4))

Iris14
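A rough base R counterpart of the split-then-chunk workflow is sketched below. It assumes a split that samples 75% of the rows within each class and then keeps columns 1 and 4; whether the package's split method stratifies in the same way is not stated here, so treat the details as assumptions.

```r
data(iris)
set.seed(5)
p <- 0.75

# Sample a share p of the row indices within each class.
by_class <- base::split(seq_len(nrow(iris)), iris$Species)
idx <- unlist(lapply(by_class, function(i) sample(i, size = floor(p * length(i)))))

# Keep only variables 1 and 4, as in chunk(x = Iris, variables = c(1, 4)).
train <- iris[idx, c(1, 4)]
test  <- iris[-idx, c(1, 4)]
Zt    <- iris$Species[-idx]   # true class membership of the test observations

print(dim(train))
print(dim(test))
```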
Returns the classification likelihood criterion at pos.
## S4 method for signature 'REBMIX'
CLC(x = NULL, pos = 1, ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
C. Biernacki and G. Govaert. Using the classification likelihood to choose the number of clusters. In E. J. Wegman and S. P. Azen, editors, Computing Science and Statistics, 1997.
Returns the data frame containing observations and empirical
densities
for the kernel density estimation or k-nearest neighbour or bin means
and empirical densities
for the histogram preprocessing. Vectors
and
are subvectors of
and
.
## S4 method for signature 'REBMIX'
demix(x = NULL, pos = 1, variables = expression(1:d), ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate simulated dataset.
n <- c(15, 15)

Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))

a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)

simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))

# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "best")

# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(model = "REBMVNORM",
  Dataset = a.Dataset(simulated),
  Preprocessing = "h",
  cmax = 8,
  Criterion = "BIC",
  EMcontrol = NULL)

# Preprocess simulated dataset.
f <- demix(simulatedest, pos = 3, variables = c(1, 3))

f

# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1)

par(usr = opar[[2]]$usr, mfg = c(2, 1))

text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 1)
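For the histogram case, an empirical density is usually obtained by dividing each bin frequency by the number of observations times the bin width. The base R sketch below shows this for one variable with made-up data; it illustrates the general idea only and is not a description of what demix computes internally.

```r
# Made-up observations and equal-width bins of width 1.
y      <- c(1.2, 1.9, 2.1, 2.4, 3.3, 3.8, 4.1, 4.6)
breaks <- seq(1, 5, by = 1)

k <- as.vector(table(cut(y, breaks = breaks, include.lowest = TRUE)))  # frequencies
h <- diff(breaks)                                                      # bin widths

# Empirical density per bin: frequency / (n * bin width).
f <- k / (length(y) * h)

print(data.frame(mean = head(breaks, -1) + h / 2, f = f))
```

By construction the densities integrate to one over the binned range, that is sum(f * h) equals 1.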
Returns the data frame containing observations and
predictive marginal densities
. Vectors
are subvectors of
. If
the method returns the data frame containing observations
and
the corresponding predictive mixture densities
.
## S4 method for signature 'REBMIX'
dfmix(x = NULL, Dataset = data.frame(), pos = 1, variables = expression(1:d), ...)
## ... and for other signatures
x |
see Methods section below. |
Dataset |
a data frame containing observations |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate simulated dataset.
n <- c(15, 15)

Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))

a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)

simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))

# Number of classes or nearest neighbours to be processed.
K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
  as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.

# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(model = "REBMVNORM",
  Dataset = a.Dataset(simulated),
  Preprocessing = "h",
  cmax = 4,
  Criterion = "BIC")

# Preprocess simulated dataset.
Dataset <- data.frame(c(-7, 1), NA, c(3, 7))

f <- dfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))

f

# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1,
  contour.drawlabels = TRUE, contour.labcex = 0.6)

par(usr = opar[[2]]$usr, mfg = c(2, 1))

points(x = f[, 1], y = f[, 2])

text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 4)
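A predictive mixture density is a weighted sum of component densities evaluated at the new observations. The base R sketch below does this for a univariate two-component normal mixture. The weights, means and standard deviations are illustrative values, not estimates returned by REBMIX, and dmix() is a hypothetical helper.

```r
# Illustrative parameters of a two-component normal mixture.
w     <- c(0.5, 0.5)    # component weights, summing to one
mu    <- c(10, 3)       # component means
sigma <- c(3, 15)       # component standard deviations

# Hypothetical helper: mixture density as a weighted sum of normal densities.
dmix <- function(y) {
  w[1] * dnorm(y, mu[1], sigma[1]) + w[2] * dnorm(y, mu[2], sigma[2])
}

# Evaluate the density at two new observations.
y_new <- c(-7, 1)
f <- data.frame(y = y_new, f = dmix(y_new))
print(f)
```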
"EM.Control"
Object of class EM.Control.
Objects can be created by calls of the form new("EM.Control", ...). Accessor methods for the slots are a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL) and a.eliminate.zero.components(x = NULL), where x stands for an object of class EM.Control. Setter methods a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL) and a.eliminate.zero.components(x = NULL) are provided to write to the strategy, variant, acceleration, tolerance, acceleration.multiplier, maximum.iterations, K and eliminate.zero.components slots respectively.
strategy
:a character containing the EM and REBMIX strategy. One of "none", "exhaustive", "best" and "single". The default value is "none".
variant
:a character containing the type of the EM algorithm to be used. One of "EM" or "ECM". The default value is "EM".
acceleration
:a character containing the type of acceleration of the EM iteration increment. One of "fixed", "line" or "golden". The default value is "fixed".
tolerance
:tolerance value for the EM convergence criteria. The default value is 1.0E-4.
acceleration.multiplier
:acceleration multiplier for the EM step increment. The default value is 1.0.
maximum.iterations
:a positive integer containing the maximum allowed number of iterations of the EM algorithm. The default value is 1000.
K
:an integer containing the number of bins for the histogram based EM algorithm. This option can reduce computational time drastically if the datasets contain a large number of observations and
K
is set to the value . The default value of 0 means that the EM algorithm runs over all
.
eliminate.zero.components
:a logical indicating if zero components should be eliminated from the output. Only used with EMMIX-methods.
Branislav Panic
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
A. P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1):1-38, 1977. https://www.jstor.org/stable/2984875.
G. Celeux and G. Govaert. A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14(3):315-332, 1992. doi:10.1016/0167-9473(92)90042-E.
# Inline creation by new call.
EM <- new("EM.Control",
  strategy = "exhaustive",
  variant = "EM",
  acceleration = "fixed",
  tolerance = 1e-4,
  acceleration.multiplier = 1.0,
  maximum.iterations = 1000,
  K = 0)

EM

# Creation of EM object with setter method.
EM <- new("EM.Control")

a.strategy(EM) <- "exhaustive"
a.variant(EM) <- "EM"
a.acceleration(EM) <- "fixed"
a.tolerance(EM) <- 1e-4
a.acceleration.multiplier(EM) <- 1.0
a.maximum.iterations(EM) <- 1000
a.K(EM) <- 256

EM
Returns as default the EM algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma,
Gumbel, binomial, Poisson, Dirac or von Mises component densities. If model
equals "REBMVNORM"
output
for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'REBMIX'
EMMIX(model = "REBMIX", Dataset = list(),
  Theta = NULL, EMcontrol = NULL, ...)
## ... and for other signatures
model |
see Methods section below. |
Dataset |
a list of length |
Theta |
an object of class |
EMcontrol |
an object of class |
... |
currently not used. |
Returns an object of class REBMIX or REBMVNORM.
signature(model = "REBMIX")
a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac or von Mises component densities.
signature(model = "REBMVNORM")
a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Branislav Panic
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
## Not run: 
devAskNewPage(ask = TRUE)

# Load faithful dataset.
data(faithful)

# Plot faithful dataset.
plot(faithful)

# Number of dimensions.
d <- ncol(faithful)

# Obtain 2 component solution with Gaussian mixtures.
c <- 2

# Create EMMVNORM.Theta object with new call.
Theta <- new("EMMVNORM.Theta", d = d, c = c)

# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Covariances.
a.theta2.all(Theta) <- c(1, 0, 0, 1, 1, 0, 0, 1)

# Run EMMIX method.
model <- EMMIX(model = "REBMVNORM", Dataset = list(faithful), Theta = Theta)

# show.
model

# summary.
summary(model)

# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))

# Create EMMIX.Theta object with new call.
Theta <- new("EMMIX.Theta", c = c, pdf = c("normal", "normal"))

# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Covariances.
a.theta2.all(Theta) <- c(1, 1, 1, 1)

# Run EMMIX method.
model <- EMMIX(Dataset = list(faithful), Theta = Theta)

# show.
model

# summary.
summary(model)

# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))
## End(Not run)
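To make the role of the initial Theta concrete, the following is a minimal base R sketch of plain EM iterations for a univariate two-component normal mixture on faithful$eruptions. It is only an illustration under stated assumptions, not the package's implementation: the helper em_normal_1d and its arguments are hypothetical, and the stopping rule (absolute change in log likelihood below tol) is an assumption.

```r
# Hypothetical helper: plain EM for a univariate normal mixture (base R only).
em_normal_1d <- function(y, w, mu, sigma, tol = 1e-4, max_iter = 1000) {
  logL_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities, one column per component.
    dens <- sapply(seq_along(w), function(l) w[l] * dnorm(y, mu[l], sigma[l]))
    # Log likelihood at the parameters entering this iteration.
    logL <- sum(log(rowSums(dens)))
    # Posterior membership probabilities.
    tau <- dens / rowSums(dens)
    # M-step: update weights, means and standard deviations.
    nl <- colSums(tau)
    w <- nl / length(y)
    mu <- colSums(tau * y) / nl
    sigma <- sqrt(colSums(tau * outer(y, mu, "-")^2) / nl)
    # Assumed stopping rule: small change in log likelihood.
    if (abs(logL - logL_old) < tol) break
    logL_old <- logL
  }
  list(w = w, mu = mu, sigma = sigma, logL = logL, iterations = iter)
}

# Initial weights, means and standard deviations play the role of Theta.
fit <- em_normal_1d(faithful$eruptions, w = c(0.5, 0.5),
  mu = c(2.0, 4.5), sigma = c(1, 1))
fit
```

The starting values mirror the first-variable means used in the example above; poor starting values can lead EM to a different local optimum.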
"EMMIX.Theta"
Object of class EMMIX.Theta.
Objects can be created by calls of the form new("EMMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL), a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class EMMIX.Theta. Setter methods a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()), a.theta3(x = NULL, l = numeric()), a.theta1.all(x = NULL), a.theta2.all(x = NULL), a.theta3.all(x = NULL) and a.w(x = NULL) are provided to write to the Theta slot, where l is the index of the component being set.
c: number of components. The default value is 1.
d: number of dimensions.
pdf: a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or "vonMises".
Theta: a list containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or circular "vonMises" defined for .
Component parameters theta1.l follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions and for Weibull, gamma, binomial, Poisson and Dirac distributions.
Component parameters theta2.l follow theta1.l. One of for normal, lognormal and Gumbel distributions, for Weibull and gamma distributions, for binomial distribution, for von Mises distribution.
Component parameters theta3.l follow theta2.l. One of for Gumbel distribution.
w: a vector of length c containing component weights summing to 1.
Branislav Panic
Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(0.5, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(20, 50)
a.theta2(Theta, l = 2) <- c(3, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
Theta

Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel", "Poisson"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, 30, 20, 50, 60)
a.theta2.all(Theta) <- c(0.5, 2.3, NA, 3, 4.2, NA)
a.theta3.all(Theta) <- c(NA, 1.0, NA, NA, -1.0, NA)
Theta

Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10, -20)
a.theta2(Theta, l = 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
a.theta2(Theta, l = 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta

Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, -20, -2.4, -15.1, 30)
a.theta2.all(Theta) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1,
  4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta
Returns an object of class Histogram. The method can be called repeatedly, each time passing the previously returned object as x, so that more than one dataset can be binned into one histogram. Set shrink to TRUE only when the method is called for the last time to optimize the size of the object. The method is memory consuming.
## S4 method for signature 'Histogram'
fhistogram(x = NULL, Dataset = data.frame(), K = numeric(),
  ymin = numeric(), ymax = numeric(), shrink = FALSE, ...)
## ... and for other signatures
x | an object of class |
Dataset | a data frame of size |
K | an integer or a vector of length |
ymin | a vector of length |
ymax | a vector of length |
shrink | logical. If |
... | currently not used. |
signature(x = "Histogram")
an object of class Histogram.
Marko Nagode
# Create three datasets.
set.seed(1)
n <- 15
Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))
apply(Dataset1, 2, range)
apply(Dataset2, 2, range)
apply(Dataset3, 2, range)

# Bin the first dataset.
hist <- fhistogram(Dataset = Dataset1, K = c(4, 5),
  ymin = c(100.0, 0.0), ymax = c(300.0, 300.0))

# Bin the second dataset.
hist <- fhistogram(x = hist, Dataset = Dataset2)

# Bin the third dataset and shrink the hist object.
hist <- fhistogram(x = hist, Dataset = Dataset3, shrink = TRUE)
hist
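The idea of binning several datasets into one histogram can be sketched in base R by accumulating counts on a fixed grid. This is only an illustration of the principle, not how fhistogram is implemented: the helper bin_counts is hypothetical, it handles two dimensions only, and clamping out-of-range observations to the edge bins is an assumption.

```r
# Hypothetical helper: add the counts of a new 2D dataset to an existing grid.
bin_counts <- function(counts, data, K, ymin, ymax) {
  h <- (ymax - ymin) / K  # bin widths per dimension
  # Bin index per observation and dimension, clamped to 1..K (assumption).
  idx <- sapply(seq_along(K), function(i) {
    pmin(pmax(floor((data[[i]] - ymin[i]) / h[i]) + 1, 1), K[i])
  })
  for (j in seq_len(nrow(idx))) {
    counts[idx[j, 1], idx[j, 2]] <- counts[idx[j, 1], idx[j, 2]] + 1
  }
  counts
}

set.seed(1)
n <- 15
Dataset1 <- data.frame(rnorm(n, 157, 8), rnorm(n, 71, 10))
Dataset2 <- data.frame(rnorm(n, 244, 14), rnorm(n, 61, 29))

K <- c(4, 5)
ymin <- c(100, 0)
ymax <- c(300, 300)

# Start from an empty grid and add one dataset at a time.
counts <- matrix(0, nrow = K[1], ncol = K[2])
counts <- bin_counts(counts, Dataset1, K, ymin, ymax)
counts <- bin_counts(counts, Dataset2, K, ymin, ymax)
counts
```

Because the grid (K, ymin, ymax) is fixed by the first call, later datasets only add to the frequencies, which is why the grid arguments are not repeated in the subsequent fhistogram calls above.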
The unfilled survey of the Corona Borealis region contains the velocities of 82 galaxies from 6 well separated conic sections of space.
data(galaxy)
galaxy is a data frame with 82 cases (rows) and 1 continuous variable (column) called Velocity.
K. Roeder. Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. Journal of the American Statistical Association, 85(411):617-624, 1990. https://www.jstor.org/stable/2289993.
S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B, 59(4):731-792, 1997. https://www.jstor.org/stable/2985194.
G. McLachlan and D. Peel. Contribution to the discussion of paper by S. Richardson and P. J. Green. Journal of the Royal Statistical Society B, 59(4):779-780, 1997. https://www.jstor.org/stable/2985194.
M. Stephens. Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods. The Annals of Statistics, 28(1):40-74, 2000. https://www.jstor.org/stable/2673981.
"Histogram"
Object of class Histogram.
Objects can be created by calls of the form new("Histogram", ...). Accessor methods for the slots are a.Y(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.y0(x = NULL), a.h(x = NULL), a.n(x = NULL) and a.ns(x = NULL).
Y: a data frame of size containing the d-dimensional histogram. Each of the first d columns represents one random variable and contains bin means. Column d + 1 contains frequencies.
K: an integer or a vector of length d containing numbers of bins.
ymin: a vector of length d containing minimum observations.
ymax: a vector of length d containing maximum observations.
y0: a vector of length d containing origins.
h: a vector of length d containing bin widths.
n: an integer containing total number of observations.
ns: an integer containing number of samples.
Marko Nagode
Y <- as.data.frame(matrix(1.0, nrow = 8, ncol = 3))
hist <- new("Histogram", Y = Y, K = c(4, 2), ymin = c(2, 1), ymax = c(10, 8))
a.Y(hist)
a.K(hist)
a.ymin(hist)
a.ymax(hist)
a.y0(hist)
a.h(hist)
a.n(hist)
a.ns(hist)

# Multiply Y[ , d + 1] by 0.1.
a.Y(hist) <- 0.1
Returns the Hannan-Quinn information criterion at pos.
## S4 method for signature 'REBMIX'
HQC(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B, 41(2):190-195, 1979. https://www.jstor.org/stable/2985032.
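For orientation, the Hannan-Quinn criterion in its usual textbook form can be computed from a log likelihood, the number of free parameters and the sample size. The base R sketch below uses that common form; the function name hqc and its arguments are hypothetical, and whether the package applies exactly this expression is an assumption.

```r
# Hypothetical helper: Hannan-Quinn criterion in its common form.
# logL: log likelihood, M: number of free parameters, n: number of observations.
hqc <- function(logL, M, n) {
  -2 * logL + 2 * M * log(log(n))
}

# Two candidate models fitted to the same n = 100 observations.
hqc(logL = -250, M = 5, n = 100)
hqc(logL = -248, M = 8, n = 100)
```

Smaller values are preferred, so a gain in log likelihood has to outweigh the penalty for the additional parameters.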
Returns the integrated classification likelihood criterion at pos.
## S4 method for signature 'REBMIX'
ICL(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
Returns the approximate integrated classification likelihood criterion at pos.
## S4 method for signature 'REBMIX'
ICLBIC(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
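A common way of writing the approximate criterion adds an entropy term of the posterior membership probabilities to BIC. The base R sketch below follows that form; the function iclbic, its arguments and the matrix tau (one row per observation, one column per component) are hypothetical names, and it is an assumption that the package evaluates the same expression.

```r
# Hypothetical helper: BIC plus twice the entropy of the fuzzy classification.
# tau: n x c matrix of posterior membership probabilities (rows sum to 1).
iclbic <- function(logL, M, n, tau) {
  # Entropy term; entries with tau == 0 contribute nothing.
  entropy <- -sum(ifelse(tau > 0, tau * log(tau), 0))
  -2 * logL + 2 * entropy + M * log(n)
}

# Well separated components: memberships are nearly crisp, entropy is small.
tau_crisp <- rbind(c(0.99, 0.01), c(0.02, 0.98), c(0.97, 0.03))
# Overlapping components: memberships are ambiguous, entropy is large.
tau_fuzzy <- rbind(c(0.55, 0.45), c(0.50, 0.50), c(0.40, 0.60))

iclbic(logL = -10, M = 5, n = 3, tau = tau_crisp)
iclbic(logL = -10, M = 5, n = 3, tau = tau_fuzzy)
```

With the same log likelihood and parameter count, the model with ambiguous memberships receives the larger (worse) value, which is why the criterion favours well separated clusters.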
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
data(iris)
iris is a data frame with 150 cases (rows) and 5 variables (columns) named:
Sepal.Length continuous.
Sepal.Width continuous.
Petal.Length continuous.
Petal.Width continuous.
Class discrete iris-setosa, iris-versicolour or iris-virginica.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188, 1936.
## Not run: 
devAskNewPage(ask = TRUE)

data(iris)

# Show level attributes.
levels(iris[["Class"]])

# Split dataset into train and test subsets.
set.seed(5)
Iris <- split(p = 0.6, Dataset = iris, class = 5)

# Estimate number of components, component weights and component
# parameters for train subsets.
n <- range(a.ntrain(Iris))

irisest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Iris),
  Preprocessing = "histogram",
  cmax = 10,
  Criterion = "ICL-BIC",
  EMcontrol = new("EM.Control", strategy = "single"))

plot(irisest, pos = 1, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 2, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 3, nrow = 3, ncol = 2, what = c("pdf"))

# Selected chunks.
iriscla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(irisest),
  Dataset = a.test(Iris),
  Zt = a.Zt(Iris))

iriscla
summary(iriscla)

# Plot selected chunks.
plot(iriscla, nrow = 3, ncol = 2)
## End(Not run)
Returns (invisibly) a vector containing numbers of bins for the histogram and the kernel density estimation or numbers of nearest neighbours for the k-nearest neighbour.
kseq(from = NULL, to = NULL, f = 0.05, ...)
from | starting value of the sequence. The default value is |
to | end value of the sequence. The default value is |
f | number specifying the fraction by which the bins or nearest neighbours should be separated |
... | currently not used. |
Marko Nagode
# Generate numbers of bins.
n <- 10000
Sturges <- as.integer(1 + log2(n)) # Minimum v follows Sturges rule.
Log10 <- as.integer(10 * log10(n)) # Maximum v follows Log10 rule.
RootN <- as.integer(2 * n^0.5) # Maximum v follows RootN rule.
K <- kseq(from = Sturges, to = Log10, f = 0.05)
K
K <- kseq(from = Sturges, to = RootN, f = 0.03)
K
Returns the list with the data frame Mij containing the cluster levels, the numbers of pixels and the cluster moments for 2D images or the data frame Mijk containing the cluster levels, the numbers of voxels and the cluster moments for 3D images and the adjacency matrix A of size . It may have some NA rows and columns. To calculate the adjacency matrix, the raw cluster moments are first converted into z-scores.
## S4 method for signature 'array'
labelmoments(Zp = array(), cmax = integer(), Sigma = 1.0, ...)
## ... and for other signatures
Zp | a 2D array of size |
cmax | maximum number of clusters |
Sigma | scale parameter |
... | currently not used. |
signature(Zp = "array")
an array.
Marko Nagode, Branislav Panic
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Zp <- matrix(rep(0, 100), nrow = 10, ncol = 10)
Zp[2, 2:4] <- 1
Zp[2:4, 5] <- 2
Zp[8, 7:10] <- 3
Zp[9, 6] <- 4
Zp[10, 5] <- 4
Zp[10, 1:4] <- 5
Zp[6:9, 1] <- 6
labelmoments <- labelmoments(Zp, cmax = 6, Sigma = 1.0)
set.seed(12)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 3)
Zp
mergelabels
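The step from cluster moments to an adjacency matrix can be sketched in base R as z-scoring followed by a Gaussian affinity, in the spirit of the spectral clustering reference cited above. This is an assumption-laden illustration, not the package's code: the moment values, the column names m10 and m01, the exact affinity formula and the zero diagonal are all hypothetical choices.

```r
# Hypothetical raw moments for four labelled clusters (rows).
moments <- data.frame(m10 = c(3.0, 3.2, 8.0, 8.5),
                      m01 = c(2.0, 2.1, 9.0, 8.7))

# Convert raw moments into z-scores (column-wise centring and scaling).
z <- scale(moments)

# Squared Euclidean distances between clusters in z-score space.
d2 <- as.matrix(dist(z))^2

# Gaussian affinity with scale parameter Sigma (assumed form).
Sigma <- 1.0
A <- exp(-d2 / (2 * Sigma^2))
diag(A) <- 0  # no self-affinity, as in Ng, Jordan and Weiss
A
```

Clusters 1 and 2, and clusters 3 and 4, have similar moments, so their mutual affinities are close to 1 while the cross affinities are close to 0. A smaller Sigma sharpens this contrast.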
Returns the log likelihood at pos.
## S4 method for signature 'REBMIX'
logL(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
Returns a factor of predictive cluster membership for dataset.
## S4 method for signature 'RCLRMIX'
mapclusters(x = NULL, Dataset = data.frame(), s = expression(c), ...)
## ... and for other signatures
x | see Methods section below. |
Dataset | a data frame of size |
s | a desired number of clusters to be created. The default value is |
... | currently not used. |
signature(x = "RCLRMIX")
an object of class RCLRMIX.
signature(x = "RCLRMVNORM")
an object of class RCLRMVNORM.
Marko Nagode, Branislav Panic
devAskNewPage(ask = TRUE)

# Generate normal dataset.
n <- c(50, 20, 40)

Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)

normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("normal_", 1:10, sep = ""),
  n = n, Theta = a.Theta(Theta))

# Convert all datasets to single histogram.
hist <- NULL
n <- length(normal@Dataset)

hist <- fhistogram(Dataset = normal@Dataset[[1]], K = c(10, 10),
  ymin = a.ymin(normal), ymax = a.ymax(normal))

for (i in 2:n) {
  hist <- fhistogram(x = hist, Dataset = normal@Dataset[[i]], shrink = i == n)
}

# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM", Dataset = list(hist),
  Preprocessing = "histogram", cmax = 6, Criterion = "BIC")

summary(normalest)

# Plot finite mixture.
plot(normalest)

# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest)

# Plot clusters.
plot(normalclu)
summary(normalclu)

# Map clusters.
Zp <- mapclusters(x = normalclu, Dataset = a.Dataset(normal, 4))
Zt <- a.Zt(normal)

Zp
Zt
Returns the minimum description length at pos.
## S4 method for signature 'REBMIX'
MDL2(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
MDL5(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
M. H. Hansen and B. Yu. Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454):746-774, 2001. https://www.jstor.org/stable/2670311.
Returns the list with the normalised adjacency matrix L of size . The normalised adjacency matrix depends on the probability adjacency matrix , where and the degree matrix . The matrices may contain some NA rows and columns, which are eliminated by the method. The list also contains the vector of integers cluster of length , which indicates the cluster to which each label is assigned.
## S4 method for signature 'list'
mergelabels(A = list(), w = numeric(), k = 2, ...)
## ... and for other signatures
A | a list of length |
w | vector of length |
k | number of clusters |
... | further arguments to |
signature(A = "list")
a list.
Marko Nagode, Branislav Panic
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Zp <- array(0, dim = c(10, 10, 2))
Zp[ , ,1][10, 1:4] <- 1
Zp[ , ,1][1:4, 10] <- 2
Zp[ , ,2][9, 1:5] <- 3
Zp[ , ,2][1:6, 9] <- 4
labelmoments <- labelmoments(Zp, cmax = 4, Sigma = 1.0)
labelmoments
set.seed(3)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 5)
Zp
mergelabels
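The spectral clustering procedure of the cited reference (normalise the adjacency matrix by the degrees, embed the labels with the leading eigenvectors, then run k-means) can be sketched in base R as follows. It is a simplified illustration for a single adjacency matrix without NA rows; the helper spectral_merge and the example matrix are hypothetical, and the package's handling of several weighted matrices is not reproduced.

```r
# Hypothetical helper: spectral clustering after Ng, Jordan and Weiss.
spectral_merge <- function(A, k, nstart = 5) {
  deg <- rowSums(A)
  # Normalised adjacency matrix D^(-1/2) A D^(-1/2).
  L <- A / sqrt(outer(deg, deg))
  # Eigenvectors of the k largest eigenvalues (eigen sorts decreasingly).
  X <- eigen(L, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
  # Scale each row of the embedding to unit length.
  X <- X / sqrt(rowSums(X^2))
  list(L = L, cluster = kmeans(X, centers = k, nstart = nstart)$cluster)
}

# Four labels: 1 and 2 are strongly linked, as are 3 and 4.
A <- matrix(c(0.00, 0.90, 0.10, 0.05,
              0.90, 0.00, 0.05, 0.10,
              0.10, 0.05, 0.00, 0.80,
              0.05, 0.10, 0.80, 0.00), nrow = 4, byrow = TRUE)

set.seed(3)
merged <- spectral_merge(A, k = 2)
merged$cluster
```

The cluster numbers themselves are arbitrary; only the grouping matters. Arguments such as nstart are passed to kmeans here, mirroring how the examples above pass nstart through mergelabels.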
Returns the matrix of size containing optimal numbers of bins for all processed datasets.
## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal", ymin = numeric(),
  ymax = numeric(), kmin = numeric(), kmax = numeric(), ...)
## ... and for other signatures
Dataset | a list of length |
Rule | a character giving the histogram binning rule. One of |
ymin | a vector of length |
ymax | a vector of length |
kmin | lower limit of the number of bins. The default value is |
kmax | upper limit of the number of bins. The default value is |
... | currently not used. |
signature(x = "list")
a list of data frames.
Branislav Panic, Marko Nagode
K. H. Knuth. Optimal data-based binning for histograms and histogram-based probability density models. Digital Signal Processing, 95:102581, 2019. doi:10.1016/j.dsp.2019.102581.
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
# Generate multivariate normal datasets.
n <- c(750, 1000)

Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)

sim2d <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("sim2d_", 1:5, sep = ""),
  rseed = -1, n = n, Theta = a.Theta(Theta))

# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset, Rule = "Knuth equal",
  ymin = sim2d@ymin, ymax = sim2d@ymax, kmin = 2, kmax = 20)
opt.k

# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "EM",
  acceleration = "fixed", acceleration.multiplier = 1.0,
  tolerance = 1.0E-4, maximum.iterations = 1000)

# Estimate number of components, component weights and component parameters.
sim2dest <- REBMIX(model = "REBMVNORM", Dataset = a.Dataset(sim2d),
  Preprocessing = "h", cmax = 10, ymin = a.ymin(sim2d), ymax = a.ymax(sim2d),
  K = opt.k, Criterion = "BIC", EMcontrol = EM)

# Plot finite mixture.
plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))

# Estimate number of components, component weights and component
# parameters for well known Iris dataset.
Dataset <- list(iris[, c(1:4)])

# Calculate optimal numbers of bins using non-equal number of bins in each dimension.
opt.k <- optbins(Dataset = Dataset, Rule = "Knuth unequal", kmin = 2, kmax = 20)
opt.k

# Estimate number of components, component weights and component parameters.
irisest <- REBMIX(model = "REBMVNORM", Dataset = Dataset, Preprocessing = "h",
  cmax = 10, K = opt.k, Criterion = "BIC", EMcontrol = EM)
irisest
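The Knuth rule selects the number of equal-width bins that maximises a log posterior. The base R sketch below evaluates that expression for one-dimensional data and scans the range kmin to kmax; it is a simplified illustration, not the package's multivariate implementation, and the helper knuth_logp and the simulated vector y are hypothetical.

```r
# Hypothetical helper: Knuth's log posterior for M equal-width bins (1D data).
knuth_logp <- function(y, M, ymin = min(y), ymax = max(y)) {
  N <- length(y)
  # Bin index of every observation; the maximum falls into the last bin.
  bin <- pmin(floor((y - ymin) / (ymax - ymin) * M) + 1, M)
  nk <- tabulate(bin, nbins = M)
  N * log(M) + lgamma(M / 2) - M * lgamma(0.5) -
    lgamma(N + M / 2) + sum(lgamma(nk + 0.5))
}

set.seed(1)
y <- c(rnorm(750, 8, 2), rnorm(1000, 6, 1))

# Scan candidate numbers of bins and keep the maximiser.
kmin <- 2
kmax <- 20
logp <- sapply(kmin:kmax, function(M) knuth_logp(y, M))
opt.k <- (kmin:kmax)[which.max(logp)]
opt.k
```

When the maximiser sits at kmax, the search range was probably too narrow and a larger kmax is worth trying.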
Returns the partition coefficient of Bezdek at pos.
## S4 method for signature 'REBMIX'
PC(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
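Bezdek's partition coefficient is usually defined as the mean of the squared membership probabilities summed over components. The base R sketch below uses that definition; the function pc and the membership matrices are hypothetical, and it is an assumption that the package applies the same normalisation.

```r
# Hypothetical helper: partition coefficient of a membership matrix.
# tau: n x c matrix of posterior membership probabilities (rows sum to 1).
pc <- function(tau) {
  sum(tau^2) / nrow(tau)
}

# Crisp memberships give the upper bound 1.
tau_crisp <- rbind(c(1, 0), c(0, 1), c(1, 0))
# Completely ambiguous memberships give the lower bound 1 / c.
tau_fuzzy <- matrix(0.5, nrow = 3, ncol = 2)

pc(tau_crisp)
pc(tau_fuzzy)
```

Values near 1 therefore indicate well separated components, values near 1 / c indicate heavy overlap.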
Returns the data frame containing observations and empirical distribution functions . Vectors are subvectors of .
## S4 method for signature 'REBMIX'
pemix(x = NULL, pos = 1, variables = expression(1:d),
  lower.tail = TRUE, log.p = FALSE, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
variables | a vector containing indices of variables in subvectors |
lower.tail | logical. If |
log.p | logical. if |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate simulated dataset.
n <- c(15, 15)

Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)

simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
  rseed = -1, n = n, Theta = a.Theta(Theta))

# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "ECM",
  acceleration = "fixed", acceleration.multiplier = 1.0,
  tolerance = 1.0E-4, maximum.iterations = 1000)

# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(Dataset = a.Dataset(simulated),
  Preprocessing = "kernel density estimation", cmax = 4,
  pdf = c("n", "n", "n"), EMcontrol = EM)

# Preprocess simulated dataset.
f <- pemix(simulatedest, pos = 3, variables = c(1))
f
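For a single variable, an empirical distribution function is the fraction of observations not exceeding each value, which base R provides through ecdf. The sketch below shows that univariate case together with the usual meaning of lower.tail and log.p in R's distribution functions; it is an assumption that pemix follows the same conventions, and the vector y is made up for illustration.

```r
# Hypothetical observations of one variable.
y <- c(2.1, 3.5, 1.2, 3.5, 4.8)

# Empirical distribution function evaluated at the observations.
Fy <- ecdf(y)(y)
f <- data.frame(y = y, F = Fy)
f

# Usual R conventions (assumed here): upper tail and log probabilities.
upper <- 1 - Fy
logF <- log(Fy)
```

Tied observations, such as the two values 3.5, share the same function value because both count all observations up to and including 3.5.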
Returns the data frame containing observations and
predictive marginal distribution functions
. Vectors
are subvectors of
. If
the method returns the data frame containing observations
and
the corresponding predictive mixture distribution function
.
## S4 method for signature 'REBMIX' pfmix(x = NULL, Dataset = data.frame(), pos = 1, variables = expression(1:d), lower.tail = TRUE, log.p = FALSE, ...) ## ... and for other signatures
## S4 method for signature 'REBMIX' pfmix(x = NULL, Dataset = data.frame(), pos = 1, variables = expression(1:d), lower.tail = TRUE, log.p = FALSE, ...) ## ... and for other signatures
x | see Methods section below. |
Dataset | a data frame containing observations |
pos | a desired row number in |
variables | a vector containing indices of variables in subvectors |
lower.tail | logical. If |
log.p | logical. If |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
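The predictive mixture distribution function can be pictured as a weighted sum of component distribution functions. The sketch below uses base R only and is not the package's implementation: the weights, means and standard deviations are made-up values, mix_cdf is a hypothetical helper, and its lower.tail and log.p arguments are assumed to follow the usual base R convention.

```r
# Hypothetical two-component univariate normal mixture (illustrative values).
w     <- c(0.4, 0.6)  # component weights, summing to 1
mu    <- c(10, 3)     # component means
sigma <- c(3, 15)     # component standard deviations

# Mixture distribution function: weighted sum of the component CDFs.
mix_cdf <- function(y, lower.tail = TRUE, log.p = FALSE) {
  p <- vapply(y, function(yi) sum(w * pnorm(yi, mean = mu, sd = sigma)),
              numeric(1))
  if (!lower.tail) p <- 1 - p   # upper tail probability
  if (log.p) log(p) else p      # optionally return log probabilities
}

# Evaluate the mixture CDF at a few observations.
y <- c(25, 5, -20)
data.frame(y = y, F = mix_cdf(y))
```

The resulting data frame pairs each observation with its distribution function value, which is the general shape of output described above for pfmix.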
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
  rseed = -1, n = n, Theta = a.Theta(Theta))
# Number of classes or nearest neighbours to be processed.
K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
  as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(Dataset = a.Dataset(simulated),
  Preprocessing = "h", cmax = 4, Criterion = "BIC",
  pdf = c("n", "n", "n"))
# Preprocess simulated dataset.
Dataset <- data.frame(c(25, 5, -20), NA, c(31, 20, 20))
f <- pfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))
f
# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1, what = "pdf",
  contour.drawlabels = TRUE, contour.labcex = 0.6)
par(usr = opar[[2]]$usr, mfg = c(2, 1))
points(x = f[, 1], y = f[, 2])
text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3),
  cex = 0.8, pos = 4)
Plots true clusters if x is an object of class RNGMIX.
Plots the REBMIX output, depending on the what argument, if x is an object of class REBMIX.
Plots predictive clusters if x is an object of class RCLRMIX. Wrongly clustered observations are plotted only if x@Zt is available.
Plots predictive classes and wrongly classified observations if x is an object of class RCLSMIX.
## S4 method for signature 'RNGMIX,missing'
plot(x, y, pos = 1, nrow = 1, ncol = 1, cex = 0.8,
     fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
     plot.cex = 0.8, plot.pch = 19, ...)
## S4 method for signature 'REBMIX,missing'
plot(x, y, pos = 1, what = c("pdf"), nrow = 1, ncol = 1, npts = 200,
     n = 200, cex = 0.8, fg = "black", lty = "solid", lwd = 1, pty = "m",
     tcl = 0.5, plot.cex = 0.8, plot.pch = 19, contour.drawlabels = FALSE,
     contour.labcex = 0.8, contour.method = "flattest",
     contour.nlevels = 12, log = "", ...)
## S4 method for signature 'RCLRMIX,missing'
plot(x, y, s = expression(c), nrow = 1, ncol = 1, cex = 0.8,
     fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
     plot.cex = 0.8, plot.pch = 19, ...)
## S4 method for signature 'RCLSMIX,missing'
plot(x, y, nrow = 1, ncol = 1, cex = 0.8, fg = "black", lty = "solid",
     lwd = 1, pty = "m", tcl = 0.5, plot.cex = 0.8, plot.pch = 19, ...)
## ... and for other signatures
x | see Methods section below. |
y | currently not used. |
pos | a desired row number in |
s | a desired number of clusters to be plotted. The default value is |
what | a character vector giving the plot types. One of |
nrow | a desired number of rows in which the empirical and predictive densities are to be plotted. The default value is |
ncol | a desired number of columns in which the empirical and predictive densities are to be plotted. The default value is |
npts | a number of points at which the predictive densities are to be plotted. The default value is |
n | a number of observations to be plotted. The default value is |
cex | a numerical value giving the amount by which the plotting text and symbols should be magnified relative to the default, see also |
fg | a colour used for things like axes and boxes around plots, see also |
lty | a line type, see also |
lwd | a line width, see also |
pty | a character specifying the type of the plot region to be used. One of |
tcl | a length of tick marks as a fraction of the height of a line of the text, see also |
plot.cex | a numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default. It works as a multiple of |
plot.pch | a vector of plotting characters or symbols, see also |
contour.drawlabels | logical. The contours are labelled if |
contour.labcex | |
contour.method | a character specifying where the labels will be located. The possible values are |
contour.nlevels | a number of desired contour levels. The default value is |
log | a character which contains |
... | further arguments to |
Returns (invisibly) a list containing graphical parameters par. Such a list can be passed as an argument to par to restore the parameter values.
signature(x = "RNGMIX", y = "missing")
an object of class RNGMIX.
signature(x = "RNGMVNORM", y = "missing")
an object of class RNGMVNORM.
signature(x = "REBMIX", y = "missing")
an object of class REBMIX.
signature(x = "REBMVNORM", y = "missing")
an object of class REBMVNORM.
signature(x = "RCLRMIX", y = "missing")
an object of class RCLRMIX.
signature(x = "RCLRMVNORM", y = "missing")
an object of class RCLRMVNORM.
signature(x = "RCLSMIX", y = "missing")
an object of class RCLSMIX.
signature(x = "RCLSMVNORM", y = "missing")
an object of class RCLSMVNORM.
Marko Nagode
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
## Not run:
devAskNewPage(ask = TRUE)
data(wine)
colnames(wine)
# Remove Cultivar column from wine dataset.
winecolnames <- !(colnames(wine) %in% "Cultivar")
wine <- wine[, winecolnames]
# Determine number of dimensions d and wine dataset size n.
d <- ncol(wine)
n <- nrow(wine)
wineest <- REBMIX(model = "REBMVNORM", Dataset = list(wine = wine),
  Preprocessing = "kernel density estimation", Criterion = "ICL-BIC",
  EMcontrol = new("EM.Control", strategy = "best"))
# Plot finite mixture.
plot(wineest, what = c("pdf", "IC", "logL", "D"), nrow = 2, ncol = 2, pty = "s")
## End(Not run)
Returns the total of positive relative deviations D at pos.
## S4 method for signature 'REBMIX'
PRD(x = NULL, pos = 1, ...)
## ... and for other signatures
x | see Methods section below. |
pos | a desired row number in |
... | currently not used. |
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
"RCLRMIX"
Object of class RCLRMIX.
Objects can be created by calls of the form new("RCLRMIX", ...).
Accessor methods for the slots are a.Dataset(x = NULL), a.pos(x = NULL), a.Zt(x = NULL), a.Zp(x = NULL, s = expression(c)), a.c(x = NULL), a.p(x = NULL, s = expression(c)), a.pi(x = NULL, s = expression(c)), a.P(x = NULL, s = expression(c)), a.tau(x = NULL, s = expression(c)), a.prob(x = NULL), a.Rule(x = NULL), a.from(x = NULL), a.to(x = NULL), a.EN(x = NULL) and a.ED(x = NULL), where x stands for an object of class RCLRMIX and s for a desired number of clusters for which the slot is calculated.
x: an object of class REBMIX.
Dataset: a data frame or an object of class Histogram to be clustered.
pos: a desired row number in x@summary for which the clustering is performed. The default value is 1.
Zt: a factor of true cluster membership.
Zp: a factor of predictive cluster membership.
c: number of nonempty clusters.
p: a vector containing prior probabilities of cluster memberships summing to 1. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is numeric().
pi: a list of matrices containing cluster conditional probabilities. Each entry is the cluster conditional probability that an observation in a given cluster produces a given outcome on a given variable. The variables are assumed to be polytomous categorical variables (the manifest variables), each with a finite number of possible outcomes. A manifest variable is a variable that can be measured or observed directly. It must be coded as whole numbers starting at zero for the first outcome and increasing to the possible number of outcomes minus one. It is presumed here that all variables are statistically independent within clusters and that the dataset is an observed multidimensional dataset of vector observations. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is list().
P: a data frame containing true and predictive frequencies calculated for unique observations.
tau: a matrix containing conditional probabilities that observations arise from clusters.
prob: a vector containing probabilities of correct clustering.
Rule: a character containing the merging rule. One of "Entropy" and "Demp". The default value is "Entropy".
from: a vector containing the clusters that are merged into the to clusters.
to: a vector containing the clusters originating from the from clusters.
EN: a vector containing entropies for combined clusters.
ED: a vector containing decreases of entropy for combined clusters.
A: an adjacency matrix.
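The tau, EN and ED slots relate to the "Entropy" merging rule. As a rough base R illustration of the entropy criterion described by Baudry et al. (2010), and not the package's own code, the sketch below merges the pair of clusters whose combination reduces the entropy of the soft assignment the most. The matrix tau and the helpers entropy and merge_pair are hypothetical.

```r
# Hypothetical conditional probabilities: rows are observations, columns clusters.
tau <- matrix(c(0.90, 0.05, 0.05,
                0.10, 0.60, 0.30,
                0.05, 0.45, 0.50,
                0.02, 0.08, 0.90), ncol = 3, byrow = TRUE)

# Entropy of a soft clustering; terms with zero probability contribute zero.
entropy <- function(tau) {
  -sum(ifelse(tau > 0, tau * log(tau), 0))
}

# Merge clusters i and j by adding their conditional probabilities.
merge_pair <- function(tau, i, j) {
  cbind(tau[, -c(i, j), drop = FALSE], tau[, i] + tau[, j])
}

# Decrease of entropy for every candidate pair of clusters.
pairs <- t(combn(ncol(tau), 2))
ED <- apply(pairs, 1, function(p) {
  entropy(tau) - entropy(merge_pair(tau, p[1], p[2]))
})

# The pair with the largest decrease is merged first.
best <- pairs[which.max(ED), ]
best
```

Repeating this step until one cluster remains yields one entropy and one entropy decrease per merge, which is the kind of information the EN and ED slots are described as holding.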
Marko Nagode, Branislav Panic
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering.
Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
S. Kyoya and K. Yamanishi. Summarizing finite mixture model with overlapping quantification. Entropy, 23(11):1503, 2021. doi:10.3390/e23111503
devAskNewPage(ask = TRUE)
# Generate normal dataset.
n <- c(500, 200, 400)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)
normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n,
  Theta = a.Theta(Theta))
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM", Dataset = a.Dataset(normal),
  Preprocessing = "histogram", cmax = 6, Criterion = "BIC")
summary(normalest)
# Plot finite mixture.
plot(normalest)
# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = a.Zt(normal))
# Plot clusters.
plot(normalclu)
summary(normalclu)
By default, returns the RCLRMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities, following the methodology proposed in the article cited in the references. If model equals "RCLRMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'RCLRMIX'
RCLRMIX(model = "RCLRMIX", x = NULL, Dataset = NULL,
        pos = 1, Zt = factor(), Rule = character(), ...)
## ... and for other signatures
## S4 method for signature 'RCLRMIX'
summary(object, ...)
## ... and for other signatures
model | see Methods section below. |
x | an object of class |
Dataset | a data frame or an object of class |
pos | a desired row number in |
Zt | a factor of true cluster membership. The default value is |
Rule | a character containing the merging rule. One of |
object | see Methods section below. |
... | currently not used. |
Returns an object of class RCLRMIX or RCLRMVNORM.
signature(model = "RCLRMIX")
a character giving the default class name "RCLRMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLRMVNORM")
a character giving the class name "RCLRMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLRMIX")
an object of class RCLRMIX.
signature(object = "RCLRMVNORM")
an object of class RCLRMVNORM.
Marko Nagode
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
devAskNewPage(ask = TRUE)
# Generate Poisson dataset.
n <- c(500, 200, 400)
Theta <- new("RNGMIX.Theta", c = 3, pdf = "Poisson")
a.theta1(Theta) <- c(3, 12, 36)
poisson <- RNGMIX(Dataset.name = "Poisson_1", n = n, Theta = a.Theta(Theta))
# Estimate number of components, component weights and component parameters.
EM <- new("EM.Control", strategy = "exhaustive")
poissonest <- REBMIX(Dataset = a.Dataset(poisson),
  Preprocessing = "histogram", cmax = 6, Criterion = "BIC",
  pdf = rep("Poisson", 1), EMcontrol = EM)
summary(poissonest)
# Plot finite mixture.
plot(poissonest)
# Cluster dataset.
poissonclu <- RCLRMIX(x = poissonest, Zt = a.Zt(poisson))
summary(poissonclu)
# Plot clusters.
plot(poissonclu)
# Create new dataset.
Dataset <- sample.int(n = 50, size = 10, replace = TRUE)
Dataset <- as.data.frame(Dataset)
# Cluster the dataset.
poissonclu <- RCLRMIX(x = poissonest, Dataset = Dataset, Rule = "Demp")
a.Dataset(poissonclu)
"RCLS.chunk"
Object of class RCLS.chunk.
Objects can be created by calls of the form new("RCLS.chunk", ...). Accessor methods for the slots are a.s(x = NULL), a.levels(x = NULL), a.ntrain(x = NULL), a.train(x = NULL), a.Zr(x = NULL), a.ntest(x = NULL), a.test(x = NULL) and a.Zt(x = NULL), where x stands for an object of class RCLS.chunk.
s: size of the finite set of classes.
levels: a character vector containing class names.
ntrain: a vector containing numbers of observations in train datasets.
train: a list of data frames containing train datasets.
Zr: a list of factors of true class membership for the train datasets.
ntest: number of observations in test dataset.
test: a data frame containing test dataset.
Zt: a factor of true class membership for the test dataset.
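To make the slot layout above concrete, the following base R sketch builds analogous pieces by hand from a small made-up data frame. It does not use the package's own split method; the column names, the Type flag and the object names are assumptions chosen only for illustration.

```r
set.seed(1)
# Hypothetical dataset: two variables, a class label and a train/test flag.
Y <- data.frame(y1 = rnorm(12), y2 = rnorm(12),
                Class = factor(rep(c("a", "b", "c"), times = 4)),
                Type = rep(c("train", "train", "test"), each = 4))

is.train <- Y$Type == "train"
vars <- c("y1", "y2")

class.levels <- levels(Y$Class)    # class names
s <- length(class.levels)          # number of classes

# One train data frame per class, with the class label column removed.
train <- split(Y[is.train, vars], Y$Class[is.train])
ntrain <- vapply(train, nrow, integer(1))

# Test dataset and its true class membership.
test <- Y[!is.train, vars]
ntest <- nrow(test)
Zt <- Y$Class[!is.train]

ntrain
ntest
```

Keeping one train data frame per class is what allows a separate mixture to be estimated for each class before the test observations are classified.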
Marko Nagode
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
"RCLSMIX"
Object of class RCLSMIX.
Objects can be created by calls of the form new("RCLSMIX", ...). Accessor methods for the slots are a.o(x = NULL), a.Dataset(x = NULL), a.s(x = NULL), a.ntrain(x = NULL), a.P(x = NULL), a.ntest(x = NULL), a.Zt(x = NULL), a.Zp(x = NULL), a.CM(x = NULL), a.Accuracy(x = NULL), a.Error(x = NULL), a.Precision(x = NULL), a.Sensitivity(x = NULL), a.Specificity(x = NULL) and a.Chunks(x = NULL), where x stands for an object of class RCLSMIX.
x: a list of objects of class REBMIX obtained by running REBMIX on the train datasets. For the train datasets the corresponding class membership is known. Each object in the list corresponds to one chunk.
o: number of chunks. The observed dataset of vector observations is partitioned into train and test datasets. Vector observations may further be split into chunks when running REBMIX.
Dataset: a data frame containing test dataset. For the test dataset the corresponding class membership is not known.
s: size of the finite set of classes.
ntrain: a vector containing numbers of observations in train datasets.
P: a vector containing prior probabilities.
ntest: number of observations in test dataset.
Zt: a factor of true class membership for the test dataset.
Zp: a factor of predictive class membership for the test dataset.
CM: a table containing the confusion matrix for the multiclass classifier. It contains the numbers of test observations with a given true class that are classified into a given predictive class.
Accuracy: proportion of all test observations that are classified correctly.
Error: proportion of all test observations that are classified wrongly.
Precision: a vector containing, for each class, the proportion of observations predicted to be in the class that are classified correctly.
Sensitivity: a vector containing, for each class, the proportion of test observations in the class that are classified correctly into the class.
Specificity: a vector containing, for each class, the proportion of test observations not in the class that are classified into another class.
Chunks: a vector containing selected chunks.
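The CM, Accuracy, Error, Precision, Sensitivity and Specificity slots can be reproduced by hand from two factors. The sketch below is base R only and uses made-up class memberships; it follows the verbal definitions given above and is not taken from the package source.

```r
# Hypothetical true and predictive class memberships for ten test observations.
Zt <- factor(c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))
Zp <- factor(c(1, 1, 2, 2, 2, 2, 3, 3, 3, 1), levels = levels(Zt))

# Confusion matrix: rows are true classes, columns are predictive classes.
CM <- table(Zt, Zp)

# Overall proportions of correctly and wrongly classified observations.
Accuracy <- sum(diag(CM)) / sum(CM)
Error <- 1 - Accuracy

# Per-class measures.
Precision <- diag(CM) / colSums(CM)     # correct among those predicted in class
Sensitivity <- diag(CM) / rowSums(CM)   # correct among those truly in class
TN <- sum(CM) - rowSums(CM) - colSums(CM) + diag(CM)  # true negatives
Specificity <- TN / (sum(CM) - rowSums(CM))

CM
round(rbind(Precision, Sensitivity, Specificity), 3)
```

Setting the levels of Zp to those of Zt keeps the table square even when some class is never predicted.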
Marko Nagode
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
By default, returns the RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'RCLSMIX'
RCLSMIX(model = "RCLSMIX", x = list(), Dataset = data.frame(),
        Zt = factor(), ...)
## ... and for other signatures
## S4 method for signature 'RCLSMIX'
summary(object, ...)
## ... and for other signatures
model | see Methods section below. |
x | a list of objects of class |
Dataset | a data frame containing test dataset |
Zt | a factor of true class membership |
object | see Methods section below. |
... | currently not used. |
Returns an object of class RCLSMIX or RCLSMVNORM.
signature(model = "RCLSMIX")
a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM")
a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLSMIX")
an object of class RCLSMIX.
signature(object = "RCLSMVNORM")
an object of class RCLSMVNORM.
Marko Nagode
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
## Not run:
devAskNewPage(ask = TRUE)
data(adult)
# Find complete cases.
adult <- adult[complete.cases(adult),]
# Replace levels with numbers.
adult <- as.data.frame(data.matrix(adult))
# Find numbers of levels.
cmax <- unlist(lapply(apply(adult[, c(-1, -16)], 2, unique), length))
cmax
# Split adult dataset into train and test subsets for two Incomes
# and remove Type and Income columns.
Adult <- split(p = list(type = 1, train = 2, test = 1),
  Dataset = adult, class = 16)
# Estimate number of components, component weights and component parameters
# for the set of chunks 1:14.
adultest <- list()
for (i in 1:14) {
  adultest[[i]] <- REBMIX(Dataset = a.train(chunk(Adult, i)),
    Preprocessing = "histogram",
    cmax = min(120, cmax[i]),
    Criterion = "BIC",
    pdf = "Dirac",
    K = 1)
}
# Class membership prediction based upon the best first search algorithm.
adultcla <- BFSMIX(x = adultest, Dataset = a.test(Adult), Zt = a.Zt(Adult))
adultcla
summary(adultcla)
# Plot selected chunks.
plot(adultcla, nrow = 5, ncol = 2)
## End(Not run)
"REBMIX"
Object of class REBMIX.
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0), a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
Dataset: a list of data frames or objects of class Histogram. Data frames should contain d-dimensional datasets, with each column representing one random variable. The numbers of observations equal the numbers of rows in the datasets.
Preprocessing: a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax: maximum number of components. The default value is 15.
cmin: minimum number of components. The default value is 1.
Criterion: a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables: a character vector containing types of variables. One of "continuous" or "discrete".
pdf: a character vector containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1: a vector containing initial component parameters. Applies to the "binomial" distribution.
theta2: a vector containing initial component parameters. Currently not used.
theta3: a vector containing initial component parameters. Applies to the "Gumbel" distribution.
K: a character or a vector or a list of vectors containing numbers of bins for the histogram and the kernel density estimation or numbers of nearest neighbours for the k-nearest neighbour. There is no genuine rule to identify either number. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule, log10 rule or RootN rule can be applied to estimate the limiting numbers of bins, or the rule of thumb to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for sequence of bins or nearest neighbours generation. The default value is "auto".
ymin: a vector containing minimum observations. The default value is numeric().
ymax: a vector containing maximum observations. The default value is numeric().
ar: acceleration rate. The default value is 0.1 and in most cases does not have to be altered.
Restraints: a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.
Mode: a character giving the mode type. One of "all", "outliers" or default "outliersplus". The modes are determined in decreasing order of magnitude from all observations if Mode = "all". If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.
w: a list of vectors containing component weights summing to 1.
Theta: a list of lists each containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises". Component parameters theta1.l follow the parametric family types, with one parameter type shared by the normal, lognormal, Gumbel and von Mises distributions, another by the Weibull, gamma, binomial, Poisson and Dirac distributions and another used for the uniform distribution. Component parameters theta2.l follow theta1.l, with one parameter type shared by the normal, lognormal and Gumbel distributions and further types for the Weibull and gamma distributions, the binomial distribution, the von Mises distribution and the uniform distribution. Component parameters theta3.l follow theta2.l and apply to the Gumbel distribution.
summary: a data frame with additional information about dataset, preprocessing, maximum and minimum numbers of components, information criterion type, acceleration rate, restraints type, mode type, the optimal number of components, the optimal number of bins or nearest neighbours and further optimal settings, together with the information criterion, log likelihood and degrees of freedom.
summary.EM: a data frame with additional information about dataset, strategy for the EM algorithm (strategy), variant of the EM algorithm (variant), acceleration type (acceleration), tolerance (tolerance), acceleration multiplier (acceleration.multiplier), maximum allowed number of iterations (maximum.iterations), number of iterations used for obtaining optimal solution (opt.iterations.nbr) and total number of iterations of the EM algorithm (total.iterations.nbr).
pos
:position in the summary
data frame at which log likelihood attains its maximum.
opt.c
:a list of vectors containing numbers of components for optimal for the histogram and the kernel density estimation or for optimal number of nearest
neighbours
for the k-nearest neighbour.
opt.IC
:a list of vectors containing information criteria for optimal for the histogram and the kernel density estimation or for optimal number of nearest
neighbours
for the k-nearest neighbour.
opt.logL
:a list of vectors containing log likelihoods for optimal for the histogram and the kernel density estimation or for optimal number of nearest
neighbours
for the k-nearest neighbour.
opt.Dmin
:a list of vectors containing values for optimal
for the histogram and the kernel density estimation or for optimal number of nearest
neighbours
for the k-nearest neighbour.
opt.D
:a list of vectors containing totals of positive relative deviations for optimal for the histogram and the kernel density estimation or for optimal number of nearest
neighbours
for the k-nearest neighbour.
all.K
:a list of vectors containing all processed numbers of bins for the histogram and the kernel density estimation or all processed numbers of nearest
neighbours
for the k-nearest neighbour.
all.IC
:a list of vectors containing information criteria for all processed numbers of bins for the histogram and the kernel density estimation or for all processed numbers of nearest
neighbours
for the k-nearest neighbour.
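The summary slot stores an information criterion, a log likelihood and degrees of freedom, and pos points at the row where the log likelihood is largest. A minimal base-R sketch of how these quantities relate, assuming the standard AIC and BIC definitions. The data frame toy and its column names (logL, M, n) are hypothetical stand-ins and not the actual layout of the summary slot.

```r
# Hypothetical stand-in for a summary data frame: one row per dataset.
toy <- data.frame(
  Dataset = c("d1", "d2", "d3"),
  logL    = c(-512.4, -498.7, -505.1),  # log likelihood
  M       = c(5, 8, 11),                # degrees of freedom
  n       = c(200, 200, 200)            # number of observations
)

# Standard definitions; smaller values indicate a better trade-off
# between goodness of fit and model complexity.
toy$AIC <- -2 * toy$logL + 2 * toy$M
toy$BIC <- -2 * toy$logL + toy$M * log(toy$n)

# Row at which the log likelihood attains its maximum (the role of pos).
pos <- which.max(toy$logL)
toy[pos, ]
```

Other criteria accepted by the Criterion argument may use different penalty terms, so this sketch only illustrates the two best-known cases.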
Marko Nagode
By default, returns the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "REBMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'REBMIX'
REBMIX(model = "REBMIX", Dataset = list(), Preprocessing = character(),
       cmax = 15, cmin = 1, Criterion = "AIC", pdf = character(),
       theta1 = numeric(), theta2 = numeric(), theta3 = numeric(),
       K = "auto", ymin = numeric(), ymax = numeric(), ar = 0.1,
       Restraints = "loose", Mode = "outliersplus", EMcontrol = NULL, ...)
## ... and for other signatures
## S4 method for signature 'REBMIX'
summary(object, ...)
## ... and for other signatures
model |
see Methods section below. |
Dataset |
a list of length |
Preprocessing |
a character giving the preprocessing type. One of |
cmax |
maximum number of components |
cmin |
minimum number of components |
Criterion |
a character giving the information criterion type. One of default Akaike |
pdf |
a character vector of length |
theta1 |
a vector of length |
theta2 |
a vector of length |
theta3 |
a vector of length |
K |
a character or a vector or a matrix of size |
ymin |
a vector of length |
ymax |
a vector of length |
ar |
acceleration rate |
Restraints |
a character giving the restraints type. One of |
Mode |
a character giving the mode type. One of |
EMcontrol |
an object of class |
object |
see Methods section below. |
... |
currently not used. |
Returns an object of class REBMIX
or REBMVNORM
.
signature(model = "REBMIX")
a character giving the default class name "REBMIX"
for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "REBMVNORM")
a character giving the class name "REBMVNORM"
for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX")
an object of class REBMIX
.
signature(object = "REBMVNORM")
an object of class REBMVNORM
.
Marko Nagode
H. A. Sturges. The choice of a class interval. Journal of American Statistical Association, 21(153):
65-66, 1926. https://www.jstor.org/stable/2965501.
P. F. Velleman. Interactive computing for exploratory data analysis I: display algorithms. Proceedings of the Statistical Computing Section,
American Statistical Association, 1976.
W. J. Dixon and R. A. Kronmal. The Choice of origin and scale for graphs. Journal of the ACM, 12(2):
259-261, 1965. doi:10.1145/321264.321277.
M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the
rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223,
1998. doi:10.1016/S0142-1123(97)00106-0.
M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed
Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. doi:10.1016/S0142-1123(99)00112-7.
M. Nagode, J. Klemenc and M. Fajdiga. Parametric modelling and scatter prediction of rainflow
matrices. International Journal of Fatigue, 23(6):525-532, 2001. doi:10.1016/S0142-1123(01)00007-X.
M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability
Engineering & System Safety, 91(4):388-397, 2006. doi:10.1016/j.ress.2005.02.005.
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
# Generate and plot univariate normal dataset.
n <- c(998, 263, 1086, 487)
Theta <- new("RNGMIX.Theta", c = 4, pdf = "normal")
a.theta1(Theta) <- c(688, 265, 30, 934)
a.theta2(Theta) <- c(72, 54, 34, 28)
normal <- RNGMIX(Dataset.name = "complex1", rseed = -1, n = n, Theta = a.Theta(Theta))
normal
a.Dataset(normal, 1)[1:20,]
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(Dataset = a.Dataset(normal), Preprocessing = "h", cmax = 8, Criterion = "BIC", pdf = "n")
normalest
BIC(normalest)
logL(normalest)
# Plot finite mixture.
plot(normalest, nrow = 2, what = c("pdf", "marginal cdf"), npts = 1000)
# EM algorithm utilization.
# Load iris data.
data(iris)
Dataset <- list(data.frame(iris[, c(1:4)]))
# Create EM.Control object.
EM <- new("EM.Control", strategy = "exhaustive", variant = "EM", acceleration = "fixed", tolerance = 1e-4, acceleration.multiplier = 1.0, maximum.iterations = 1000)
# Mixture parameter estimation using REBMIX and EM algorithm.
irisest <- REBMIX(model = "REBMVNORM", Dataset = Dataset, Preprocessing = "histogram", cmax = 10, Criterion = "BIC", EMcontrol = EM)
irisest
# Print total number of EM iterations used in the exhaustive strategy from the summary.EM slot.
a.summary.EM(irisest, col.name = "total.iterations.nbr", pos = 1)
"REBMIX.boot"
Object of class REBMIX.boot
.
Objects can be created by calls of the form new("REBMIX.boot", ...). Accessor methods for the slots are a.rseed(x = NULL), a.pos(x = NULL), a.Bootstrap(x = NULL), a.B(x = NULL), a.n(x = NULL), a.replace(x = NULL), a.prob(x = NULL), a.c(x = NULL), a.c.se(x = NULL), a.c.cv(x = NULL), a.c.mode(x = NULL), a.c.prob(x = NULL), a.w(x = NULL), a.w.se(x = NULL), a.w.cv(x = NULL), a.Theta(x = NULL), a.Theta.se(x = NULL) and a.Theta.cv(x = NULL), where x stands for an object of class REBMIX.boot.
x
:an object of class REBMIX
.
rseed
:set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it. For each subsequent bootstrap dataset the random seed is decremented. The default value is -1.
pos
:a desired row number in x@summary
to be bootstrapped. The default value is 1
.
Bootstrap
:a character giving the bootstrap type. One of default "parametric"
or "nonparametric"
.
B
:number of bootstrap datasets. The default value is 100
.
n
:number of observations. The default value is numeric()
.
replace
:logical. The sampling is with replacement if TRUE
, see also sample
. The default value is TRUE
.
prob
:a vector of length containing probability weights, see also
sample
. The default value is numeric()
.
c
:a vector containing numbers of components for bootstrap datasets.
c.se
:standard error of numbers of components c
.
c.cv
:coefficient of variation of numbers of components c
.
c.mode
:mode of numbers of components c
.
c.prob
:probability of mode c.mode
.
w
:a matrix containing component weights for bootstrap datasets.
w.se
:a vector containing standard errors of component weights w
.
w.cv
:a vector containing coefficients of variation of component weights w
.
Theta
:a list of matrices containing component parameters theta1.l
, theta2.l
and theta3.l
for bootstrap datasets.
Theta.se
:a list of vectors containing standard errors of component parameters theta1.l
, theta2.l
and theta3.l
.
Theta.cv
:a list of vectors containing coefficients of variation of component parameters theta1.l
, theta2.l
and theta3.l
.
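The summary slots c.se, c.cv, c.mode and c.prob can be illustrated with a short base-R sketch. It assumes the standard error is the sample standard deviation of the bootstrap replicates and the coefficient of variation is that value divided by the mean; the package may compute them differently. The vector c_boot holds made-up values, not output of a real bootstrap run.

```r
# Hypothetical numbers of components estimated on B = 10 bootstrap datasets.
c_boot <- c(3, 4, 3, 3, 5, 3, 4, 3, 3, 4)

c_se <- sd(c_boot)              # standard error (sample standard deviation)
c_cv <- c_se / mean(c_boot)     # coefficient of variation

counts <- table(c_boot)         # frequency of each number of components
c_mode <- as.numeric(names(counts)[which.max(counts)])  # most frequent value
c_prob <- max(counts) / length(c_boot)                   # relative frequency of the mode

c(se = c_se, cv = c_cv, mode = c_mode, prob = c_prob)
```

A high c.prob suggests that the bootstrap datasets mostly agree on the number of components, whereas a large c.cv points to an unstable estimate.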
Marko Nagode
"RNGMIX"
Object of class RNGMIX
.
Objects can be created by calls of the form new("RNGMIX", ...). Accessor methods for the slots are a.Dataset.name(x = NULL), a.rseed(x = NULL), a.n(x = NULL), a.Theta(x = NULL), a.Dataset(x = NULL, pos = 0), a.Zt(x = NULL), a.w(x = NULL), a.Variables(x = NULL), a.ymin(x = NULL) and a.ymax(x = NULL), where x and pos stand for an object of class RNGMIX and a desired slot item, respectively.
Dataset.name
:a character vector containing list names of data frames of size that d-dimensional datasets are written in.
rseed
:set the random seed to any negative integer value to initialize the sequence. The first file in Dataset.name corresponds to it. For each subsequent file the random seed is decremented. The default value is -1.
n
:a vector containing numbers of observations in classes , where number of observations
.
Theta
:a list containing parametric family types
pdfl
. One of "normal"
, "lognormal"
, "Weibull"
, "gamma"
, "Gumbel"
, "binomial"
, "Poisson"
, "Dirac"
, "uniform"
or circular "vonMises"
defined for .
Component parameters
theta1.l
follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions,
for Weibull, gamma, binomial, Poisson and Dirac distributions and
for uniform distribution.
Component parameters
theta2.l
follow theta1.l
. One of for normal, lognormal and Gumbel distributions,
for Weibull and gamma distributions,
for binomial distribution,
for von Mises distribution and
for uniform distribution.
Component parameters
theta3.l
follow theta2.l
. One of for Gumbel distribution.
Dataset
:a list of length of data frames of size
containing d-dimensional datasets. Each of the
columns represents one random variable. Numbers of observations
equal the number of rows
in the datasets.
Zt
:a factor of true cluster membership.
w
:a vector of length containing component weights
summing to 1.
Variables
:a character vector containing types of variables. One of "continuous"
or "discrete"
.
ymin
:a vector of length containing minimum observations.
ymax
:a vector of length containing maximum observations.
Marko Nagode
By default, returns the RNGMIX univariate or multivariate random datasets for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RNGMVNORM", multivariate random datasets for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices are returned.
## S4 method for signature 'RNGMIX'
RNGMIX(model = "RNGMIX", Dataset.name = character(), rseed = -1,
       n = numeric(), Theta = list(), ...)
## ... and for other signatures
model |
see Methods section below. |
Dataset.name |
a character vector containing list names of data frames of size |
rseed |
set the random seed to any negative integer value to initialize the sequence. The first file in |
n |
a vector containing numbers of observations in classes |
Theta |
a list containing |
... |
currently not used. |
RNGMIX is based on the "Minimal" random number generator ran1 of Park and Miller with the Bays-Durham shuffle and added safeguards. The generator returns a uniform random deviate between 0.0 and 1.0 (exclusive of the endpoint values).
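The generator described above can be sketched in base R. This is an illustrative port of the ran1 routine from Numerical Recipes, assuming its published constants. The function name ran1_generator is hypothetical and the code is not the implementation used inside RNGMIX, so its output should not be expected to match the package.

```r
# Park-Miller "minimal standard" generator with Bays-Durham shuffle (sketch).
ran1_generator <- function(seed = -1) {
  IA <- 16807; IM <- 2147483647; IQ <- 127773; IR <- 2836
  NTAB <- 32; NDIV <- 1 + (IM - 1) %/% NTAB
  RNMX <- 1 - 1.2e-7                      # largest value ever returned
  idum <- seed; iy <- 0; iv <- numeric(NTAB)

  # One multiplicative congruential step via Schrage's method.
  step <- function(x) {
    k <- x %/% IQ
    x <- IA * (x - k * IQ) - IR * k
    if (x < 0) x + IM else x
  }

  function() {
    if (idum <= 0 || iy == 0) {           # initialize on first call
      idum <<- max(-idum, 1)              # guard against a zero seed
      for (j in (NTAB + 7):0) {           # 8 warm-ups, then fill the table
        idum <<- step(idum)
        if (j < NTAB) iv[j + 1] <<- idum
      }
      iy <<- iv[1]
    }
    idum <<- step(idum)
    j <- iy %/% NDIV                      # table index in 0..NTAB-1
    iy <<- iv[j + 1]                      # output the shuffled entry
    iv[j + 1] <<- idum                    # refill the slot
    min(iy / IM, RNMX)                    # never return the endpoint 1.0
  }
}

next_u <- ran1_generator(seed = -1)
u <- vapply(1:5, function(i) next_u(), numeric(1))
u
```

Each call to ran1_generator creates an independent stream, so two generators started from the same negative seed produce the same sequence.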
Returns an object of class RNGMIX
or RNGMVNORM
.
signature(model = "RNGMIX")
a character giving the default class name "RNGMIX"
for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RNGMVNORM")
a character giving the class name "RNGMVNORM"
for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Marko Nagode
W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.
devAskNewPage(ask = TRUE)
# Generate and print multivariate normal datasets with diagonal
# variance-covariance matrices.
n <- c(75, 100, 125, 150, 175)
Theta <- new("RNGMIX.Theta", c = 5, pdf = rep("normal", 4))
a.theta1(Theta, 1) <- c(10, 12, 10, 12)
a.theta1(Theta, 2) <- c(8.5, 10.5, 8.5, 10.5)
a.theta1(Theta, 3) <- c(12, 14, 12, 14)
a.theta1(Theta, 4) <- c(13, 15, 7, 9)
a.theta1(Theta, 5) <- c(7, 9, 13, 15)
a.theta2(Theta, 1) <- c(1, 1, 1, 1)
a.theta2(Theta, 2) <- c(1, 1, 1, 1)
a.theta2(Theta, 3) <- c(1, 1, 1, 1)
a.theta2(Theta, 4) <- c(2, 2, 2, 2)
a.theta2(Theta, 5) <- c(3, 3, 3, 3)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:25, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 22, nrow = 2, ncol = 3)
# Generate and print multivariate normal datasets with unrestricted
# variance-covariance matrices.
n <- c(200, 50, 50)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 3)
a.theta1(Theta, 1) <- c(0, 0, 0)
a.theta1(Theta, 2) <- c(-6, 3, 6)
a.theta1(Theta, 3) <- c(6, 6, 4)
a.theta2(Theta, 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
a.theta2(Theta, 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
a.theta2(Theta, 3) <- c(4, 3.2, 2.8, 3.2, 4, 2.4, 2.8, 2.4, 2)
simulated <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("simulated_", 1:2, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 2, nrow = 3, ncol = 1)
# Generate and print multivariate mixed continuous-discrete datasets.
n <- c(400, 100, 500)
Theta <- new("RNGMIX.Theta", c = 3, pdf = c("lognormal", "Poisson", "binomial", "Weibull"))
a.theta1(Theta, 1) <- c(1, 2, 10, 2)
a.theta1(Theta, 2) <- c(3.5, 10, 10, 10)
a.theta1(Theta, 3) <- c(2.5, 15, 10, 25)
a.theta2(Theta, 1) <- c(0.3, NA, 0.9, 3)
a.theta2(Theta, 2) <- c(0.2, NA, 0.1, 7)
a.theta2(Theta, 3) <- c(0.4, NA, 0.7, 20)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:5, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 4, nrow = 2, ncol = 3)
# Generate and print univariate mixed Weibull dataset.
n <- c(75, 100, 125, 150, 175)
Theta <- new("RNGMIX.Theta", c = 5, pdf = "Weibull")
a.theta1(Theta) <- c(12, 10, 14, 15, 9)
a.theta2(Theta) <- c(2, 4.1, 3.2, 7.1, 5.3)
simulated <- RNGMIX(Dataset.name = "simulated", rseed = -1, n = n, Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 1)
# Generate and print multivariate normal datasets with unrestricted
# variance-covariance matrices.
# Set dimension, dataset size, number of components and seed.
d <- 2; n <- 1000; c <- 10; set.seed(123)
# Component weights are generated.
w <- runif(c, 0.1, 0.9); w <- w / sum(w)
# Set range of means and range of eigenvalues.
mu <- c(-100, 100); lambda <- c(1, 100)
# Component means and variance-covariance matrices are calculated.
Mu <- list(); Sigma <- list()
for (l in 1:c) {
  Mu[[l]] <- runif(d, mu[1], mu[2])
  Lambda <- diag(runif(d, lambda[1], lambda[2]), nrow = d, ncol = d)
  P <- svd(matrix(runif(d * d, -1, 1), ncol = d))$u
  # Compose the variance-covariance matrix from eigenvalues and eigenvectors.
  Sigma[[l]] <- P %*% Lambda %*% t(P)
}
# Numbers of observations are calculated and component means and
# variance-covariance matrices are stored.
n <- round(w * n); Theta <- list()
for (l in 1:c) {
  Theta[[paste0("pdf", l)]] <- rep("normal", d)
  Theta[[paste0("theta1.", l)]] <- Mu[[l]]
  Theta[[paste0("theta2.", l)]] <- as.vector(Sigma[[l]])
}
# Dataset is generated.
simulated <- RNGMIX(model = "RNGMVNORM", Dataset.name = "mvnorm_1", rseed = -1, n = n, Theta = Theta)
plot(simulated)
# Generate and print bivariate mixed uniform-Gumbel dataset.
n <- c(100, 150)
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("uniform", "Gumbel"))
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(10, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(10, 50)
a.theta2(Theta, l = 2) <- c(30, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
plot(simulated)
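The unrestricted example draws random eigenvalues Lambda and an orthogonal matrix P for each component. A minimal base-R sketch of how such a pair yields a valid variance-covariance matrix, assuming the usual spectral construction P Lambda t(P). It uses base R only and does not call the package.

```r
set.seed(123)
d <- 2

# Positive eigenvalues on the diagonal.
Lambda <- diag(runif(d, 1, 100), nrow = d, ncol = d)

# Left singular vectors of a random matrix form an orthogonal matrix.
P <- svd(matrix(runif(d * d, -1, 1), ncol = d))$u

# Spectral construction; symmetrize to remove floating-point asymmetry.
Sigma <- P %*% Lambda %*% t(P)
Sigma <- (Sigma + t(Sigma)) / 2

# The eigenvalues of Sigma recover the diagonal of Lambda, so Sigma is
# symmetric positive definite.
eigen(Sigma, symmetric = TRUE)$values
```

Because the eigenvalues are drawn from a strictly positive range, every matrix built this way is positive definite, which is what a multivariate normal component requires.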
"RNGMIX.Theta"
Object of class RNGMIX.Theta
.
Objects can be created by calls of the form new("RNGMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL), a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class RNGMIX.Theta. Setter methods a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()) and a.theta3(x = NULL, l = numeric()), a.theta1.all(x = NULL), a.theta2.all(x = NULL) and a.theta3.all(x = NULL)
are provided to write to Theta
slot, where .
c
:number of components . The default value is
1
.
d
:number of dimensions.
pdf
:a character vector of length containing continuous or discrete parametric family types. One of
"normal"
, "lognormal"
, "Weibull"
, "gamma"
, "Gumbel"
, "binomial"
, "Poisson"
, "Dirac"
, "uniform"
or "vonMises"
.
Theta
:a list containing parametric family types
pdfl
. One of "normal"
, "lognormal"
, "Weibull"
, "gamma"
, "Gumbel"
, "binomial"
, "Poisson"
, "Dirac"
, "uniform"
or circular "vonMises"
defined for .
Component parameters
theta1.l
follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions,
for Weibull, gamma, binomial, Poisson and Dirac distributions and
for uniform distribution.
Component parameters
theta2.l
follow theta1.l
. One of for normal, lognormal and Gumbel distributions,
for Weibull and gamma distributions,
for binomial distribution,
for von Mises distribution and
for uniform distribution.
Component parameters
theta3.l
follow theta2.l
. One of for Gumbel distribution.
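The layout of the Theta list, with elements named pdfl, theta1.l and theta2.l for component l, can be illustrated by evaluating a mixture density in base R. The sketch below handles univariate normal components only and assumes that theta1.l is the mean and theta2.l the standard deviation. The function mixture_pdf and the weights w are hypothetical and not part of the package.

```r
# Two-component univariate normal mixture in the pdfl / theta1.l / theta2.l layout.
w <- c(0.3, 0.7)                       # component weights summing to 1
Theta <- list(pdf1 = "normal", theta1.1 = 2,  theta2.1 = 0.5,
              pdf2 = "normal", theta1.2 = 20, theta2.2 = 3)

# Weighted sum of component densities (normal components only in this sketch).
mixture_pdf <- function(x, w, Theta) {
  f <- numeric(length(x))
  for (l in seq_along(w)) {
    stopifnot(Theta[[paste0("pdf", l)]] == "normal")
    f <- f + w[l] * dnorm(x,
                          mean = Theta[[paste0("theta1.", l)]],
                          sd   = Theta[[paste0("theta2.", l)]])
  }
  f
}

# Density at the two component means.
mixture_pdf(c(2, 20), w, Theta)
```

Other parametric families would need their own density function and parameter interpretation in place of dnorm.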
Marko Nagode
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(0.5, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(20, 50)
a.theta2(Theta, l = 2) <- c(3, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
Theta
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.theta1.all(Theta) <- c(2, 10, 20, 50)
a.theta2.all(Theta) <- c(0.5, 2.3, 3, 4.2)
a.theta3.all(Theta) <- c(NA, 1.0, NA, -1.0)
Theta
Theta <- new("RNGMVNORM.Theta", c = 2, d = 3)
a.theta1(Theta, l = 1) <- c(2, 10, -20)
a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
Theta
These data are the results of a sensorless drive diagnosis procedure. Features are extracted from the electric current drive signals. The drive has intact and defective components, which results in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, that is, at different speeds, load moments and load forces. The current signals are measured with a current probe and an oscilloscope on two phases. The original dataset contains 49 features; here only three are used, namely features 5, 7 and 11. The first class (1) comprises the healthy drives and the remaining classes comprise drives with faulty components.
data(sensorlessdrive)
sensorlessdrive
is a data frame with 58509 cases (rows) and 4 variables (columns) named:
V5
continuous.
V7
continuous.
V11
continuous.
Class
discrete 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11.
A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. http://archive.ics.uci.edu/ml/.
F. Paschke, C. Bayer, M. Bator, U. Moenks, A. Dicks, O. Enge-Rosenblatt and V. Lohweg. Sensorlose Zustandsueberwachung an Synchronmotoren.
23. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2013.
M. Bator, A. Dicks, U. Moenks and V. Lohweg. Feature extraction and reduction applied to sensorless drive diagnosis.
22. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2012. doi:10.13140/2.1.2421.5689.
## Not run: 
data(sensorlessdrive)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)
Drive <- split(p = 0.75, Dataset = sensorlessdrive, class = 4)
# Estimate number of components, component weights and component
# parameters for train subsets.
driveest <- REBMIX(model = "REBMVNORM", Dataset = a.train(Drive), Preprocessing = "histogram", cmax = 15, Criterion = "BIC")
# Classification.
drivecla <- RCLSMIX(model = "RCLSMVNORM", x = list(driveest), Dataset = a.test(Drive), Zt = a.Zt(Drive))
drivecla
summary(drivecla)
## End(Not run)
Returns (invisibly) the object containing train and test observations as well as true class membership
for the test dataset.
## S4 method for signature 'numeric'
split(p = 0.75, Dataset = data.frame(), class = numeric(), ...)
## S4 method for signature 'list'
split(p = list(), Dataset = data.frame(), class = numeric(), ...)
## ... and for other signatures
p |
see Methods section below. |
Dataset |
a data frame containing dataset |
class |
a column number in |
... |
further arguments to |
Returns an object of class RCLS.chunk
.
signature(p = "numeric")
a number specifying the fraction of observations used for training. The default value is 0.75.
signature(p = "list")
a list composed of column number p$type in Dataset containing the type membership information followed by the corresponding train p$train and test p$test values. The default value is list().
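The idea behind the numeric signature can be sketched in base R: a fraction p of the rows goes to the train subsets, the rest goes to the test dataset, and the true class membership of the test rows is kept. The helper split_by_class is hypothetical and is not the package's split() method. It assumes that rows are sampled separately within each class, which the package may or may not do.

```r
# Hypothetical helper illustrating a per-class train/test split.
split_by_class <- function(Dataset, class, p = 0.75) {
  is_train <- logical(nrow(Dataset))
  # Sample a fraction p of the row indices within each class.
  for (rows in base::split(seq_len(nrow(Dataset)), Dataset[[class]])) {
    n_train <- round(p * length(rows))
    is_train[rows[sample.int(length(rows), n_train)]] <- TRUE
  }
  features <- Dataset[, -class, drop = FALSE]
  list(
    # One training data frame per class.
    train = base::split(features[is_train, , drop = FALSE],
                        Dataset[[class]][is_train]),
    # Test observations and their true class membership.
    test  = features[!is_train, , drop = FALSE],
    Zt    = factor(Dataset[[class]][!is_train])
  )
}

set.seed(5)
chunk <- split_by_class(iris, class = 5, p = 0.75)
sapply(chunk$train, nrow)
nrow(chunk$test)
```

Keeping one training data frame per class mirrors how the classification examples pass the train subsets to REBMIX and the test dataset with its true membership to RCLSMIX.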
Marko Nagode
## Not run: 
data(iris)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(5)
Iris <- split(p = 0.75, Dataset = iris, class = 5)
Iris
# Generate simulated dataset.
N <- 1000
class <- c(rep("A", 0.4 * N), rep("B", 0.2 * N), rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))
type <- c(rep("train", 0.75 * N), rep("test", 0.25 * N))
n <- 300
Dataset <- data.frame(1:n, sample(class, n))
colnames(Dataset) <- c("y", "class")
# Split dataset into train (60%) and test (40%) subsets.
simulated <- split(p = 0.6, Dataset = Dataset, class = 2)
simulated
# Generate simulated dataset.
Dataset <- data.frame(1:n, sample(class, n), sample(type, n))
colnames(Dataset) <- c("y", "class", "type")
# Split dataset into train and test subsets.
simulated <- split(p = list(type = 3, train = "train", test = "test"), Dataset = Dataset, class = 2)
simulated
## End(Not run)
Returns the sum of squares error at pos
.
## S4 method for signature 'REBMIX'
SSE(x = NULL, pos = 1, ...)
## ... and for other signatures
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
signature(x = "REBMIX")
an object of class REBMIX
.
signature(x = "REBMVNORM")
an object of class REBMVNORM
.
Marko Nagode
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
These data are the results of an extraction process from images of faults of steel plates. There are seven different faults: Pastry (1), Z_Scratch (2), K_Scratch (3), Stains (4), Dirtiness (5), Bumps (6), Other faults (7).
data(steelplates)
steelplates
is a data frame with 1941 cases (rows) and 28 variables (columns) named:
X_Minimum
integer.
X_Maximum
integer.
Y_Minimum
integer.
Y_Maximum
integer.
Pixels_Areas
integer.
X_Perimeter
integer.
Y_Perimeter
integer.
Sum_of_Luminosity
integer.
Minimum_of_Luminosity
integer.
Maximum_of_Luminosity
integer.
Length_of_Conveyer
integer.
TypeOfSteel_A300
binary.
TypeOfSteel_A400
binary.
Steel_Plate_Thickness
integer.
Edges_Index
continuous.
Empty_Index
continuous.
Square_Index
continuous.
Outside_X_Index
continuous.
Edges_X_Index
continuous.
Edges_Y_Index
continuous.
Outside_Global_Index
continuous.
LogOfAreas
continuous.
Log_X_Index
continuous.
Log_Y_Index
continuous.
Orientation_Index
continuous.
Luminosity_Index
continuous.
SigmoidOfAreas
continuous.
Class
discrete 1, 2, 3, 4, 5, 6 or 7.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
M. Buscema, S. Terzi, W. Tastle. A new meta-classifier. Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS, 2010. doi:10.1109/NAFIPS.2010.5548298.
M. Buscema. MetaNet*: The theory of independent judges. Substance Use & Misuse. 33(2):439-461, 1998. doi:10.3109/10826089809115875.
## Not run:
data(steelplates)

# Split dataset into train and test subsets (p = 0.75).
set.seed(3)

Steelplates <- split(p = 0.75, Dataset = steelplates, class = 28)

# Estimate number of components, component weights and component
# parameters for train subsets.
steelplatesest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Steelplates),
  Preprocessing = "histogram",
  cmax = 15,
  Criterion = "BIC")

# Classification.
steelplatescla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(steelplatesest),
  Dataset = a.test(Steelplates),
  Zt = a.Zt(Steelplates))

steelplatescla

summary(steelplatescla)
## End(Not run)
The dataset contains amplitudes and means measured on truck wheels.
data(truck)
truck is a data frame with 31665 rows and 2 variables (columns) named:
Amplitude
continuous.
Mean
continuous.
Mitja Franko
data(truck)
The complete data are the failure times in weeks.
data(weibull)
weibull is a data frame with 50 cases (rows) and 1 variable (column) named:
Failure.Time
continuous.
D. N. P. Murthy, M. Xie and R. Jiang. Weibull Models. John Wiley & Sons, New York, 2003.
data(weibull)
The dataset contains amplitudes and means simulated from a three-component Weibull-normal mixture.
data(weibullnormal)
weibullnormal is a data frame with 10000 rows and 2 variables (columns) named:
Amplitude
continuous.
Mean
continuous.
Mitja Franko
data(weibullnormal)
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars (1-3). The analysis determined the quantities of 13 constituents: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, colour intensity, hue, OD280/OD315 of diluted wines, and proline found in each of the three types of the wines. The number of instances in classes 1 to 3 is 59, 71 and 48, respectively.
data(wine)
wine is a data frame with 178 cases (rows) and 14 variables (columns) named:
Alcohol
continuous.
Malic.Acid
continuous.
Ash
continuous.
Alcalinity.of.Ash
continuous.
Magnesium
continuous.
Total.Phenols
continuous.
Flavanoids
continuous.
Nonflavanoid.Phenols
continuous.
Proanthocyanins
continuous.
Color.Intensity
continuous.
Hue
continuous.
OD280.OD315.of.Diluted.Wines
continuous.
Proline
continuous.
Cultivar
discrete 1, 2 or 3.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
S. J. Roberts, R. Everson and I. Rezek. Maximum certainty data partitioning. Pattern Recognition, 33(5):833-839, 2000. doi:10.1016/S0031-3203(99)00086-2.
## Not run:
devAskNewPage(ask = TRUE)

data(wine)

# Show level attributes.
levels(factor(wine[["Cultivar"]]))

# Split dataset into train and test subsets (p = 0.75).
set.seed(3)

Wine <- split(p = 0.75, Dataset = wine, class = 14)

# Estimate number of components, component weights and component
# parameters for train subsets.
n <- range(a.ntrain(Wine))

K <- c(as.integer(1 + log2(n[1])), # Minimum v follows Sturges rule.
  as.integer(10 * log10(n[2]))) # Maximum v follows log10 rule.

K <- c(floor(K[1]^(1/13)), ceiling(K[2]^(1/13)))

wineest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Wine),
  Preprocessing = "kernel density estimation",
  cmax = 10,
  Criterion = "ICL-BIC",
  pdf = rep("normal", 13),
  K = K[1]:K[2],
  Restraints = "loose",
  Mode = "outliersplus")

plot(wineest, pos = 1, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 2, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 3, nrow = 7, ncol = 6, what = c("pdf"))

# Selected chunks.
winecla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(wineest),
  Dataset = a.test(Wine),
  Zt = a.Zt(Wine))

winecla

summary(winecla)

# Plot selected chunks.
plot(winecla, nrow = 7, ncol = 6)
## End(Not run)
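The range of K in the example above is derived in two steps: a lower bound from Sturges' rule and an upper bound from the log10 rule, each followed by a 13th root because the wine data have 13 continuous variables. The base R sketch below repeats that arithmetic on its own. The helper name `k_range` and the sample sizes 36 and 53 are hypothetical stand-ins for the smallest and largest class sizes returned by a.ntrain(Wine); reading v as a total count that the root spreads across the dimensions is an interpretation of the example's comments, not a statement from the package documentation.

```r
# Hypothetical helper (not part of the package): reproduce the K bounds
# used in the wine example for d variables.
k_range <- function(n_min, n_max, d) {
  v_min <- as.integer(1 + log2(n_min))    # Sturges rule on the smallest class.
  v_max <- as.integer(10 * log10(n_max))  # log10 rule on the largest class.

  # Take the d-th root, rounding the lower bound down and the upper bound up.
  c(floor(v_min^(1 / d)), ceiling(v_max^(1 / d)))
}

# Hypothetical class sizes; the real values come from a.ntrain(Wine).
K <- k_range(n_min = 36, n_max = 53, d = 13)

K[1]:K[2]
```

With 13 dimensions the root shrinks both bounds sharply, so the candidate range K[1]:K[2] stays very short even for moderately large classes.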