Title: | Distribution Optimization |
---|---|
Description: | Fits Gaussian mixtures by means of evolutionary optimization. The fitness function combines the chi-square test for distributions with a novel measure that approximates the common area under the curves of multiple Gaussians. The package presents an alternative to the commonly used likelihood maximization as applied in Expectation Maximization. The algorithm and applications of this package are published in: Lerch, F., Ultsch, A., Lotsch, J. (2020) <doi:10.1038/s41598-020-57432-w>. The evolution is based on the 'GA' package: Scrucca, L. (2013) <doi:10.18637/jss.v053.i04>, while the Gaussian mixture logic stems from 'AdaptGauss': Ultsch, A, et al. (2015) <doi:10.3390/ijms161025897>. |
Authors: | Florian Lerch, Jorn Lotsch, Alfred Ultsch |
Maintainer: | Florian Lerch <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.6 |
Built: | 2024-12-02 06:46:45 UTC |
Source: | CRAN |
Distribution Optimization fits Gaussian mixture models onto one-dimensional samples by minimizing the chi-squared error through evolutionary optimization. It is an alternative to likelihood maximizers such as Expectation Maximization. Through the included "Overlapping" methods, single Gaussians can be forced to be separated, yielding various significant models to choose from. The evolutionary part is done through the "GA" package. The Gaussian mixture logic is based on the "AdaptGauss" package.
Florian Lerch, Jorn Lotsch, Alfred Ultsch
Luca Scrucca (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37. URL http://www.jstatsoft.org/v53/i04/
Calculates the probabilities of the bins/intervals within the data space that are defined by the given breaks.
BinProb4Mixtures(Means, SDs, Weights, Breaks, IsLogDistribution = rep(F, length(Means)), LimitsAreFinite = T)
Means |
Means of the GMM Components |
SDs |
Standard Deviations of the GMM Components |
Weights |
Weights of the GMM Components |
Breaks |
Breaks defining c-1 or c+1 bins (depending on LimitsAreFinite), where c is the number of breaks |
IsLogDistribution |
If TRUE, the corresponding GMM component is interpreted as a logarithmic (log-normal) distribution |
LimitsAreFinite |
If TRUE, there are c+1 bins, where the first and last bins are of infinite size |
Probabilities of either c-1 or c+1 bins/intervals (depending on LimitsAreFinite)
Florian Lerch
Data = c(rnorm(50,1,2), rnorm(50,3,4))
NoBins = 20
breaks = seq(min(Data), max(Data), length.out = NoBins + 1)
BinProb4Mixtures(c(1,3), c(2,4), c(0.5,0.5), breaks)
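The bin probabilities are differences of the mixture's cumulative distribution function at consecutive breaks. The base-R sketch below illustrates this idea under stated assumptions: all components are plain Gaussians (no log components), and the helper gmm_bin_probs and its open_ends flag are hypothetical names that only mirror the c-1 versus c+1 bin distinction described above. It is not the package's implementation of BinProb4Mixtures.

```r
# Minimal sketch (base R only): probability mass of a Gaussian mixture
# inside the bins defined by a vector of breaks. Assumes plain (non-log)
# Gaussian components; gmm_bin_probs is a hypothetical helper, not the
# package's BinProb4Mixtures.
gmm_bin_probs <- function(means, sds, weights, breaks, open_ends = TRUE) {
  # Mixture CDF evaluated at each point in x
  mix_cdf <- function(x) {
    vapply(x, function(xi) {
      sum(weights * pnorm(xi, mean = means, sd = sds))
    }, numeric(1))
  }
  # Optionally add -Inf/Inf so the outermost bins catch the tails,
  # giving length(breaks) + 1 bins instead of length(breaks) - 1
  edges <- if (open_ends) c(-Inf, breaks, Inf) else breaks
  diff(mix_cdf(edges))
}

set.seed(1)
Data <- c(rnorm(50, 1, 2), rnorm(50, 3, 4))
NoBins <- 20
breaks <- seq(min(Data), max(Data), length.out = NoBins + 1)
p <- gmm_bin_probs(c(1, 3), c(2, 4), c(0.5, 0.5), breaks)
length(p)  # 22: 20 inner bins plus two open-ended tail bins
sum(p)     # approximately 1, because the tails are included
```

With open_ends = FALSE only the 20 inner bins are returned, and their probabilities sum to less than 1 because the mass outside the data range is dropped.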
Fits a Gaussian Mixture Model (GMM) onto a dataset by minimizing a fitting error through evolutionary optimization. Every individual encodes one GMM. Details of the evolutionary process itself can be found in the 'GA' package (see ga).
DistributionOptimization(Data, Modes, Monitor = 1, SelectionMethod = "UnbiasedTournament", MutationMethod = "Uniform+Focused", CrossoverMethod = "WholeArithmetic", PopulationSize = Modes * 3 * 25, MutationRate = 0.7, Elitism = 0.05, CrossoverRate = 0.2, Iter = Modes * 3 * 200, OverlapTolerance = NULL, IsLogDistribution = rep(F, Modes), ErrorMethod = "chisquare", NoBins = NULL, Seed = NULL, ConcurrentInit = F, ParetoRad = NULL)
Data |
Data to be modelled |
Modes |
Number of expected Modes |
Monitor |
0:no monitoring, 1: status messages, 2: status messages and plots, 3: status messages, plots and calculated error-measures |
SelectionMethod |
1: LinearRank selection; 4: UnbiasedTournament; 5: FitnessProportional selection |
MutationMethod |
1: UniformRandom mutation; 2: NonuniformRandom mutation; 4: Focused mutation, an alternative random mutation around the solution; 5: GaussMutationCust; 6: TwoPhaseMutation, where mutation is uniformly random during the first half of the iterations and then focuses around the current solution |
CrossoverMethod |
1: single point crossover; 2: whole arithmetic crossover; 3: local arithmetic crossover; 4: blend crossover; 5: GaussCrossover, which exchanges complete Gaussian components; 6: MultiPointCrossover, where a random amount of information is exchanged between mixtures |
PopulationSize |
Size of the population |
MutationRate |
Fraction (0..1) of the population that gets mutated |
Elitism |
Fraction of the best individuals that survive each generation unchanged |
CrossoverRate |
Fraction of individuals that will be used for crossover |
Iter |
number of iterations of this algorithm |
OverlapTolerance |
ratio between Chi-Square and OverlapError (only if FitnessMethod = 4 (Chi2ValueWithOverlap)) |
IsLogDistribution |
Which Gaussian components should be considered log-Gaussian |
ErrorMethod |
"pde": fitting is measured by pareto density estimation. "chisquare": fitting is measured by Chi-Square test |
NoBins |
Number of Bins that will be used for evaluating fitting |
Seed |
Random Seed for reproducible results |
ConcurrentInit |
If TRUE, a number of short optimizations are run before initialization to find a good starting point for the evolution |
ParetoRad |
Pareto Radius for Pareto Density Estimation and its plots |
The GA object containing the evolutionary training and a description of the final GMM consisting of means, sdevs and weights.
Florian Lerch
Jorn Lotsch
Alfred Ultsch
## Not run:
Data = c(rnorm(200), rnorm(200, 3))
DistributionOptimization(Data, Modes = 2)
## End(Not run)
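Each individual of the evolution encodes one GMM, and its fitness is a fitting error such as the chi-square error. The sketch below shows one plausible way to decode a real-valued individual and score it against binned data in base R. The parameter layout c(means, sds, weights), the helpers decode_individual and chisq_error, and the default of 20 equal-width bins are assumptions for illustration; the package's own encoding, binning (NoBins) and error normalization may differ.

```r
# Hypothetical sketch: decode one individual (a real-valued vector) into a
# GMM and score it with a chi-square style error. The layout
# c(means, sds, weights) and the helper names are assumptions for
# illustration, not the package's internal representation.
decode_individual <- function(x, modes) {
  means   <- x[1:modes]
  sds     <- abs(x[(modes + 1):(2 * modes)])   # keep SDs positive
  weights <- abs(x[(2 * modes + 1):(3 * modes)])
  weights <- weights / sum(weights)            # weights must sum to 1
  list(means = means, sds = sds, weights = weights)
}

chisq_error <- function(gmm, data, no_bins = 20) {
  breaks   <- seq(min(data), max(data), length.out = no_bins + 1)
  observed <- hist(data, breaks = breaks, plot = FALSE)$counts
  # Expected counts per bin from differences of the mixture CDF
  cdf <- vapply(breaks, function(b) {
    sum(gmm$weights * pnorm(b, gmm$means, gmm$sds))
  }, numeric(1))
  expected <- length(data) * diff(cdf)
  ok <- expected > 0                           # avoid division by zero
  sum((observed[ok] - expected[ok])^2 / expected[ok])
}

set.seed(42)
Data <- c(rnorm(200), rnorm(200, 3))
good <- decode_individual(c(0, 3, 1, 1, 0.5, 0.5), modes = 2)
bad  <- decode_individual(c(-5, 8, 1, 1, 0.5, 0.5), modes = 2)
chisq_error(good, Data) < chisq_error(bad, Data)  # TRUE: closer fit, lower error
```

An evolutionary optimizer then only needs to minimize such an error over the individuals; normalizing the weights inside the decoder keeps every individual a valid mixture regardless of what mutation or crossover produced.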
Calculates a fitting error as well as the overlap measure for the mixture components, and combines the two with the ratio rho, where a larger rho favors the overlap error.
MixedDistributionError(Means, SDs, Weights, Data, rho = 0.5, breaks = NULL, Kernels = NULL, ErrorMethod = "chisquare")
Means |
Means of the GMM Components |
SDs |
Standard Deviations of the GMM Components |
Weights |
Weights of the GMM Components |
Data |
Empirical data on which the GMM is built |
rho |
Ratio of OverlappingError vs Fitting Error |
breaks |
vector containing the breaks between bins |
Kernels |
positions at which density is to be compared |
ErrorMethod |
"pdeerror": fitting error is measured through Pareto Density Estimation. "chisquare": fitting error is measured through the Chi Square fitting error. |
Mixed Error
Florian Lerch
Data = c(rnorm(50,1,2), rnorm(50,3,4))
MixedDistributionError(c(1,3), c(2,4), c(0.5,0.5), Data = Data)
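How rho trades the two errors off can be illustrated with a simple weighted sum. The linear form rho * overlap + (1 - rho) * fit and the helper mixed_error are assumptions made for this sketch, not the package's documented formula; the point is only that a larger rho shifts the preference towards models whose components overlap less.

```r
# Hypothetical sketch: combine a fitting error and an overlap error with a
# ratio rho. The linear form rho * overlap + (1 - rho) * fit is an
# assumption for illustration; the package may scale or normalize the
# two terms differently.
mixed_error <- function(fit_error, overlap_error, rho = 0.5) {
  stopifnot(rho >= 0, rho <= 1)
  rho * overlap_error + (1 - rho) * fit_error
}

# Two hypothetical candidate models: A fits better, B overlaps less
models  <- c("A", "B")
fit     <- c(0.10, 0.30)
overlap <- c(0.80, 0.20)
for (rho in c(0.2, 0.8)) {
  err <- mixed_error(fit, overlap, rho)
  cat("rho =", rho, "-> preferred model:", models[which.min(err)], "\n")
}
```

With a small rho the better-fitting model A wins; with a large rho the better-separated model B wins, which is how an overlap term can push single Gaussians apart.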
Similarity in GMM by Density
OverlapErrorByDensity(Means, SDs, Weights, Data = NULL, Kernels = NULL)
Means |
Means of the GMM Components |
SDs |
Standard Deviations of the GMM Components |
Weights |
Weights of the GMM Components |
Data |
Dataset that the GMM should be compared with |
Kernels |
If length(Kernels) = 1: number of kernels. If length(Kernels) > 1: positions in the data space at which the GMM components are compared with each other |
Calculates the similarity (overlap) between multiple modes in Gaussian Mixture Models. Kernels at equally distanced positions are used, if not explicitly given.
List with the following elements:
OverlapError |
Error estimating the maximal overlap of the areas under the curves (AUC) of the PDFs of each pair of GMM components |
Kernels |
Kernels that were used for comparing the GMM components |
Florian Lerch
Data = c(rnorm(50,1,2), rnorm(50,3,4))
V <- OverlapErrorByDensity(c(1,3), c(2,4), c(0.5,0.5), Data = Data)
AdaptGauss::PlotMixtures(Data, c(1,3), c(2,4), SingleGausses = TRUE)
print(V$OverlapError)
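The overlap between two components can be approximated by integrating the smaller of their two weighted densities over equally spaced kernels. The base-R sketch below does this for every pair of components. The helper pairwise_overlap, the default grid of 200 kernels spanning four standard deviations around the modes, the trapezoidal rule, and taking the maximum over all pairs are assumptions for illustration; OverlapErrorByDensity may compute and scale its OverlapError differently.

```r
# Minimal sketch (base R): approximate the overlapping area under the
# weighted PDFs of every pair of GMM components on equally spaced kernels.
# Integrating min(p_i, p_j) and reporting the largest pairwise overlap are
# assumptions for illustration, not the package's exact formula.
pairwise_overlap <- function(means, sds, weights, kernels = 200) {
  if (length(kernels) == 1) {
    # A single number is taken as the number of equally spaced kernels
    kernels <- seq(min(means - 4 * sds), max(means + 4 * sds),
                   length.out = kernels)
  }
  dx <- diff(kernels)
  # One column of weighted densities per component
  dens <- sapply(seq_along(means), function(i) {
    weights[i] * dnorm(kernels, means[i], sds[i])
  })
  k <- length(means)
  overlap <- matrix(0, k, k)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      m <- pmin(dens[, i], dens[, j])
      # Trapezoidal rule over the (possibly unevenly spaced) kernels
      overlap[i, j] <- overlap[j, i] <- sum(dx * (head(m, -1) + tail(m, -1)) / 2)
    }
  }
  list(MaxOverlap = max(overlap), Overlap = overlap, Kernels = kernels)
}

far   <- pairwise_overlap(c(0, 10), c(1, 1), c(0.5, 0.5))
close <- pairwise_overlap(c(0, 0.5), c(1, 1), c(0.5, 0.5))
far$MaxOverlap < close$MaxOverlap  # TRUE: closer modes overlap more
```

Because each weighted component integrates to at most its weight, the pairwise overlap is bounded by the smaller of the two weights, which makes it usable as a penalty that pushes components apart.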