Package 'admix' reference manual

Title:	Package Admix for Admixture (aka Contamination) Models
Description:	Implements techniques to estimate the unknown quantities related to two-component admixture models, where the two components can belong to any distribution (note that in the case of multinomial mixtures, the two components must belong to the same family). Estimation methods depend on the assumptions made on the unknown component density; see Bordes and Vandekerkhove (2010) <doi:10.3103/S1066530710010023>, Patra and Sen (2016) <doi:10.1111/rssb.12148>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>. In practice, one can estimate both the mixture weight and the unknown component density in a wide variety of frameworks. In addition, hypothesis tests on the unknown component density can be performed in one- and two-sample contexts (see Milhaud, Pommeret, Salhi and Vandekerkhove (2022) <doi:10.1016/j.jspi.2021.05.010>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>). Finally, clustering of unknown mixture components is also feasible in a K-sample setting (see Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <https://jmlr.org/papers/v25/23-0914.html>).
Authors:	Xavier Milhaud [aut, cre], Pierre Vandekerkhove [ctb], Denys Pommeret [ctb], Yahia Salhi [ctb]
Maintainer:	Xavier Milhaud
License:	GPL (>= 3)
Version:	2.3.3
Built:	2024-12-11 06:47:38 UTC
Source:	CRAN

Clustering of K populations following admixture models

Description

Creates clusters of the unknown components of K populations, each following an admixture model. The procedure is based on the K-sample test using the Inversion - Best Matching (IBM) approach; see 'Details' below for further information.

Usage

admix_cluster(
  samples,
  admixMod,
  conf_level = 0.95,
  tune_penalty = TRUE,
  tabul_dist = NULL,
  echo = TRUE,
  ...
)

Arguments

`samples`	A list of the K (K>1) samples to be studied, all following admixture distributions.
`admixMod`	A list of objects of class admix_model, containing useful information about distributions and parameters.
`conf_level`	(default to 0.95) The confidence level of the K-sample tests used in the clustering procedure.
`tune_penalty`	(default to TRUE) A boolean choosing between a classical penalty term and an optimized penalty (embedding some tuning parameters that are automatically optimized). The optimized penalty is particularly useful for small or medium sample sizes, or unbalanced sample sizes, to detect alternatives to the null hypothesis (H0). Its use is recommended.
`tabul_dist`	(default to NULL) Only useful when comparing clusters detected at different confidence levels. A list of the tabulated distributions of the stochastic integral used in the K-sample test, with one element per previously detected cluster.
`echo`	(default to TRUE) If TRUE, displays the remaining computation time.
`...`	Optional arguments passed to IBM_k_samples_test, namely 'n_sim_tab', 'parallel' and 'n_cpu'. These are key to speeding up the construction of clusters.

Value

An object of class admix_cluster, containing 12 attributes:
1) the number of samples under study;
2) the sizes of the samples;
3) information about the mixture components in each sample (distributions and parameters);
4) the number of detected clusters;
5) the list of p-values of the K-sample tests from which the clusters were detected;
6) the cluster affiliation of each sample;
7) the confidence level of the statistical tests;
8) the samples belonging to each cluster;
9) the size of each cluster;
10) the estimated weights of the unknown component distributions within each cluster (recall that the estimated weights are consistent only if the unknown components are tested to be identical, which is the case within clusters);
11) the matrix of pairwise discrepancies across all samples;
12) the list of tabulated distributions used for the statistical tests involved in building the clusters.

`samples`	A list of the K (K>0) samples to be studied, all following admixture distributions.
`admixMod`	A list of objects of class admix_model, containing useful information about distributions and parameters.
`est_method`	The estimation method to be applied. For continuous random variables, one of 'BVdk' (Bordes and Vandekerkhove estimator), 'PS' (Patra and Sen estimator) or 'IBM' (Inversion - Best Matching approach); for discrete random variables, only 'IBM' is available. When several samples are provided, the same estimation method is applied to each of them.
`...`	Optional arguments passed to estim_PS, estim_BVdk or estim_IBM, depending on the estimation method chosen by the user.

`knownComp_dist`	(Character) The name of the distribution of the known component of the admixture model, specified as in R (e.g. 'norm' for the normal distribution).
`knownComp_param`	(Character) A vector of the names of the parameters of the chosen known distribution (specified as in R), together with their values.
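The known component is thus identified by an R distribution name and its parameters, named as in the corresponding base R functions. As a minimal sketch of that naming convention (base R only; `known_cdf()` is a hypothetical helper, not part of admix), the name and parameter list can be mapped onto the matching CDF function:

```r
# Hypothetical helper (not part of admix): evaluate the CDF of a distribution
# given its R name (e.g. "norm", "exp") and a named list of its parameters.
known_cdf <- function(x, dist, param) {
  cdf_fun <- match.fun(paste0("p", dist))    # "norm" -> pnorm, "exp" -> pexp
  do.call(cdf_fun, c(list(q = x), param))    # parameters passed by R argument name
}

# Known component g = N(0, 1), parameters named as in ?pnorm
known_cdf(c(-1, 0, 1), dist = "norm", param = list(mean = 0, sd = 1))
```

The same pattern with the prefix "d" or "r" instead of "p" gives the density or a random generator of the known component.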

`samples`	A list of the K (K > 0) samples to be studied, each one assumed to follow a mixture distribution.
`admixMod`	A list of objects of class admix_model, containing useful information about distributions and parameters of the contamination / admixture models under study.
`test_method`	The testing method to be applied, either 'poly' (polynomial basis expansion) or 'icv' (inner convergence from IBM). The same testing method is applied to all samples. In the one-sample case, only 'poly' is available and the test is a Gaussianity test. For further details, see 'Details' below.
`conf_level`	The confidence level of the K-sample test.
`...`	Depending on the test method chosen by the user ('poly' or 'icv'), optional arguments passed to gaussianity_test or orthobasis_test (for 'poly'), or to IBM_k_samples_test (for 'icv').

`sample1`	Observations of the sample under study.
`estim.p`	The estimated weight of the unknown component distribution, related to the proportion of the unknown component in the admixture model studied.
`admixMod`	An object of class 'admix_model', containing useful information about distributions and parameters.
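Given a sample, an estimated weight `estim.p` of the unknown component and the known component g, the relation l = pf + (1-p)g can be inverted to recover the unknown component. A minimal base-R sketch for the CDF, using the true weight in place of an estimate; `decontam_cdf_sketch()` is hypothetical, and the clamping to [0, 1] is an assumption of this sketch rather than the package's own correction:

```r
# Sketch (base R, not the admix implementation): plug-in estimate of the CDF F
# of the unknown component in l = p*f + (1 - p)*g, obtained by inverting the
# mixture relation F = (L - (1 - p)*G) / p with the empirical CDF L_n of l.
decontam_cdf_sketch <- function(x, sample1, estim.p, known_cdf) {
  L_n <- ecdf(sample1)                                     # empirical CDF of the sample
  F_hat <- (L_n(x) - (1 - estim.p) * known_cdf(x)) / estim.p
  pmin(pmax(F_hat, 0), 1)                                  # clamp to [0, 1]
}

# Simulated admixture: f = N(3, 1) with weight p = 0.4, known g = N(0, 1)
set.seed(1)
n <- 1000
p <- 0.4
from_f <- runif(n) < p
sample1 <- ifelse(from_f, rnorm(n, mean = 3), rnorm(n, mean = 0))

x <- c(0, 3, 6)
decontam_cdf_sketch(x, sample1, estim.p = p, known_cdf = pnorm)  # compare with pnorm(x, mean = 3)
```

In practice `estim.p` would come from one of the estimation methods; the raw inversion need not be monotone, which is why some correction is applied.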

`sample1`	Sample under study.
`estim.p`	The estimated weight of the unknown component distribution, related to the proportion of the unknown component in the admixture model studied.
`admixMod`	An object of class 'admix_model', containing useful information about distributions and parameters.
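The same inversion applies at the density level, f = (l - (1-p)g)/p, once l is replaced by a nonparametric estimate. A minimal base-R sketch, assuming a Gaussian kernel estimate of l and a known N(0, 1) component; `decontam_density_sketch()` is hypothetical, and truncating negative values at 0 is a choice of this sketch, not necessarily what the package does:

```r
# Sketch (base R, not the admix implementation): plug-in estimate of the density
# f of the unknown component in l = p*f + (1 - p)*g, i.e. f = (l - (1 - p)*g) / p,
# with l replaced by a Gaussian kernel density estimate.
decontam_density_sketch <- function(x_val, sample1, estim.p, known_dens) {
  kde <- density(sample1)                                      # kernel estimate of l
  l_hat <- approx(kde$x, kde$y, xout = x_val, rule = 2)$y      # evaluate it at x_val
  f_hat <- (l_hat - (1 - estim.p) * known_dens(x_val)) / estim.p
  pmax(f_hat, 0)                                               # truncate negative values
}

# Simulated admixture: f = N(3, 1) with weight p = 0.4, known g = N(0, 1)
set.seed(1)
n <- 1000
p <- 0.4
from_f <- runif(n) < p
sample1 <- ifelse(from_f, rnorm(n, mean = 3), rnorm(n, mean = 0))

x_val <- seq(-2, 7, by = 0.5)
f_hat <- decontam_density_sketch(x_val, sample1, estim.p = p, known_dens = dnorm)
```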

`samples`	The observed sample under study.
`admixMod`	An object of class admix_model, containing useful information about distributions and parameters.
`method`	The method used throughout the optimization process, either 'L-BFGS-B' or 'Nelder-Mead' (see ?optim).

`samples`	A list of the two considered samples.
`admixMod`	A list of two objects of class admix_model, one for each sample.
`n.integ`	Number of data points generated for the distribution on which to integrate.

`samples`	Sample to be studied.
`admixMod`	An object of class admix_model, containing information about the known component distribution and its parameter(s).
`method`	One of 'lwr.bnd', 'fixed' or 'cv', depending on whether one computes a lower bound of the mixing proportion, the estimate based on the given value of 'c.n', or the estimate where 'c.n' (tuning parameter) is chosen by cross-validation.
`c.n`	(default to NULL) A positive penalization constant; see the reference below. If NULL, it equals 0.1*log(log(n)), where n is the sample size.
`folds`	(optional, default to 10) Number of folds used for cross-validation.
`reps`	(optional, default to 1) Number of replications for cross-validation.
`cn.s`	(optional) A sequence (vector) of 'c.n' values to be used for cross-validation. The default is an equally spaced grid of 100 values between 0.001 x log(log(n)) and 0.2 x log(log(n)).
`cn.length`	(optional, default to 100) Number of equally spaced tuning parameter values (between 0.001 x log(log(n)) and 0.2 x log(log(n))) to search over.
`gridsize`	(default to 600) Number of equally spaced points (between 0 and 1) at which the distance function is evaluated. Larger values are more computationally intensive but lead to more accurate estimates.
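The defaults for 'c.n' and for the cross-validation grid 'cn.s' described above depend only on the sample size n. A small base-R sketch computing them (`ps_tuning_defaults()` is a hypothetical helper, not an admix function):

```r
# Hypothetical helper (not part of admix): default penalization constant and
# cross-validation grid for the Patra and Sen estimator, as described above.
ps_tuning_defaults <- function(n, cn.length = 100) {
  lln <- log(log(n))
  list(
    c.n  = 0.1 * lln,                                            # used when c.n = NULL
    cn.s = seq(0.001 * lln, 0.2 * lln, length.out = cn.length)   # grid searched by 'cv'
  )
}

tun <- ps_tuning_defaults(n = 1000)
tun$c.n
range(tun$cn.s)
```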

`samples`	A list of the K samples to be studied, all following admixture distributions.
`admixMod`	A list of objects of class admix_model, containing useful information about distributions and parameters.
`conf_level`	The confidence level of the K-sample test.
`sim_U`	(default to NULL) Random draws of the inner convergence part of the contrast as defined in the IBM approach (see 'Details' below).
`tune_penalty`	(default to TRUE) A boolean choosing between a classical penalty term and an optimized penalty (embedding some tuning parameters that are automatically optimized). The optimized penalty is very useful for small or unbalanced sample sizes, to detect alternatives to the null hypothesis (H0).
`n_sim_tab`	(default to 100) Number of simulated Gaussian processes when tabulating the inner convergence distribution in the 'icv' testing method using the IBM estimation approach.
`parallel`	(default to FALSE) Boolean indicating whether computations are performed in parallel (to speed up the tabulation).
`n_cpu`	(default to 2) Number of cores used for parallel computations.

`samples`	A list of the two samples under study.
`admixMod`	A list of two objects of class 'admix_model', with information about distributions and parameters.
`min_size`	(optional, NULL by default) In the K-sample case, the minimal size among all samples, needed to account for the correction factor in the variance-covariance assessment. Unused otherwise.
`n.varCovMat`	(default to 80) Number of time points at which the Gaussian processes are simulated.
`n_sim_tab`	(default to 100) Number of simulated Gaussian processes when tabulating the inner convergence distribution in the 'icv' testing method using the IBM estimation approach.
`parallel`	(default to FALSE) Boolean indicating whether computations are performed in parallel (to speed up the tabulation).
`n_cpu`	(default to 2) Number of cores used for parallel computations.
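The tabulation controlled by 'n_sim_tab' and 'n.varCovMat' amounts to simulating Gaussian processes on a time grid and recording a functional of each simulated path. The sketch below shows only that generic Monte Carlo step in base R. Assumptions, stated loudly: the Brownian-bridge covariance and the functional (a Riemann approximation of the integral of Z(t)^2) are placeholders, not the covariance or the statistic used by the IBM test, and `tabulate_gp_functional()` is hypothetical:

```r
# Generic sketch (NOT the admix implementation): tabulate the distribution of a
# functional of a centered Gaussian process observed on n.varCovMat time points.
# ASSUMPTION: Brownian-bridge covariance min(s, t) - s*t and the functional
# mean(Z^2) are illustrative placeholders.
tabulate_gp_functional <- function(n.varCovMat = 80, n_sim_tab = 100) {
  t_grid <- seq_len(n.varCovMat) / (n.varCovMat + 1)              # interior points of (0, 1)
  Sigma <- outer(t_grid, t_grid, pmin) - outer(t_grid, t_grid)    # min(s, t) - s * t
  R <- chol(Sigma)                                                # Sigma = t(R) %*% R
  replicate(n_sim_tab, {
    Z <- drop(rnorm(n.varCovMat) %*% R)   # one path with covariance Sigma
    mean(Z^2)                             # Riemann approximation of the integral of Z^2
  })
}

set.seed(42)
tab <- tabulate_gp_functional()
quantile(tab, 0.95)   # empirical quantile usable as a critical value at level 0.95
```

An empirical quantile of the tabulated values then plays the role of a critical value; larger 'n_sim_tab' gives a more precise tabulation at a higher computational cost.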

`x`	An object of class 'decontaminated_density' (see ?decontaminated_density).
`x_val`	(numeric) A vector of points at which to evaluate the probability mass/density function.
`add_plot`	(default to FALSE) A boolean specifying whether the decontaminated density is plotted over an existing plot, for visual comparison.
`...`	Arguments to be passed to generic method 'plot', such as graphical parameters (see par).

`x`	An object of class 'decontaminated_density' (see ?decontaminated_density).
`...`	Arguments to be passed to generic method 'plot', such as graphical parameters (see par).

`samples`	List of the two samples, each one following the mixture distribution given by l = pf + (1-p)g, with f and p unknown and g known.
`admixMod`	An object of class admix_model, containing useful information about distributions and parameters.
`conf_level`	The confidence level (default to 95 percent). Equals 1-alpha, where alpha is the level of the test (type-I error).
`est_method`	Estimation method used to get the component weights, either 'PS' (Patra and Sen estimation) or 'BVdk' (Bordes and Vandekerkhove estimation). Choosing 'PS' requires specifying the number of bootstrap samples.
`ask_poly_param`	(default to FALSE) If TRUE, the user is asked to choose both the order 'K' of expansion coefficients in the orthonormal polynomial basis and the penalization rate 's' involved in the penalization rule of the test.
`K`	(K > 0, default to 3) If not asked (see the previous argument), number of coefficients considered for the polynomial basis expansion.
`s`	(in ]0,1/2[, default to 0.49) If not asked (see the previous argument), rate at which the normalization factor is set in the penalization rule for model selection. Low values of 's' favor the detection of the alternative hypothesis. See reference below.
`nb_echBoot`	(default to 100) Number of bootstrap samples, useful when choosing the 'PS' estimation method.
`support`	Support of the probability distributions, used to choose the appropriate orthonormal polynomial basis. One of 'Real', 'Integer', 'Positive', or 'Bounded.continuous'.
`bounds_supp`	(default to NULL) Useful if support = 'Bounded.continuous': a list of minimum and maximum bounds, specified as follows: list( list(min.f1,min.g1,min.f2,min.g2) , list(max.f1,max.g1,max.f2,max.g2) )
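The nesting of 'bounds_supp' is easy to get wrong. A short R illustration of the layout stated above, with made-up bound values (the numbers are purely illustrative):

```r
# Illustrative bounds for support = 'Bounded.continuous' (values are made up):
# first inner list = lower bounds, second = upper bounds, each ordered
# as (f1, g1, f2, g2) following the layout given above.
bounds_supp <- list(
  list(0, 0, 0, 0),   # min.f1, min.g1, min.f2, min.g2
  list(1, 2, 1, 2)    # max.f1, max.g1, max.f2, max.g2
)

# Sanity check: every lower bound is below the matching upper bound
stopifnot(all(unlist(bounds_supp[[1]]) < unlist(bounds_supp[[2]])))
```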
`...`	Optional arguments to estim_BVdk or estim_PS, depending on the chosen argument 'est_method' (see above).

`x`	Object of class 'twoComp_mixt' from which the density will be plotted.
`add.plot`	(default to FALSE) Option to plot another mixture distribution on the same graph.
`...`	further classical arguments and graphical parameters for methods plot and hist.

`x`	An object of class 'admix_cluster' (see ?admix_clustering).
`...`	further arguments passed to or from other methods.

`x`	An object of class 'admix_estim' (see ?admix_estim).
`...`	further arguments passed to or from other methods.

`x`	An object of class 'admix_model'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'admix_test'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'estim_BVdk'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'estim_IBM'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'estim_PS'.
`...`	further arguments passed to or from other methods.

`x`	An object of class 'gaussianity_test'.
`...`	A list of additional parameters belonging to the default method.

Help Index

Clustering of K populations following admixture models
Estimate the unknown parameters of the admixture model(s)
Create an object of class 'admix_model'
Equality test for the unknown components of admixture models
Measurements of heliocentric velocities in four galaxies
Estimates the decontaminated CDF of the unknown component in an admixture
Estimates the decontaminated density of the unknown component in an admixture
Estimation of the admixture parameters by Bordes & Vandekerkhove (2010)
Estimates weights of unknown components from 2 admixtures using IBM
Estimates in an admixture using Patra and Sen approach

`x`	An object of class 'IBM_test'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'orthobasis_test'.
`...`	A list of additional parameters belonging to the default method.

`x`	An object of class 'twoComp_mixt'.
`...`	A list of additional parameters belonging to the default method.

`object`	An object of class 'admix_cluster' (see ?admix_clustering).
`...`	further arguments passed to or from other methods.

`object`	An object of class 'admix_test' (see ?admix_test).
`...`	further arguments passed to or from other methods.

`n`	Number of observations to be simulated.
`weight`	Weight of the first component distribution (distribution f) in the mixture.
`comp.dist`	A list of two elements corresponding to the component distributions (specified with R native names) involved in the mixture model. These elements respectively refer to the two component distributions f and g.
`comp.param`	A list of two elements corresponding to the parameters of the component distributions, each element being a list itself. The names used in each list must correspond to the native R argument names for these distributions. These elements respectively refer to the parameters of f and g distributions of the mixture model.
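These arguments describe a mixture weight*f + (1-weight)*g. A minimal base-R sketch of how such a sample can be simulated by first drawing latent component labels; `sim_two_comp_sketch()` is hypothetical and not the package's own simulation routine, which may proceed differently:

```r
# Sketch (base R, not the admix implementation): simulate n draws from
# weight*f + (1 - weight)*g, where f and g are given by R distribution names
# and parameter lists named as in the corresponding r* functions.
sim_two_comp_sketch <- function(n, weight, comp.dist, comp.param) {
  from_f <- runif(n) < weight                       # latent labels: TRUE -> draw from f
  draw <- function(k, dist, param) {
    do.call(match.fun(paste0("r", dist)), c(list(n = k), param))  # e.g. rnorm(n = k, ...)
  }
  x <- numeric(n)
  x[from_f]  <- draw(sum(from_f),  comp.dist[[1]], comp.param[[1]])
  x[!from_f] <- draw(sum(!from_f), comp.dist[[2]], comp.param[[2]])
  x
}

set.seed(123)
x <- sim_two_comp_sketch(n = 2000, weight = 0.3,
                         comp.dist = list("norm", "exp"),
                         comp.param = list(list(mean = 5, sd = 1), list(rate = 1)))
mean(x)   # theoretical mean: 0.3 * 5 + 0.7 * 1 = 2.2
```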