Package 'RPMM' reference manual

Title:	Recursively Partitioned Mixture Model
Description:	Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
Authors:	E. Andres Houseman, Sc.D. and Devin C. Koestler, Ph.D.
Maintainer:	E. Andres Houseman <[email protected]>
License:	GPL (>= 2)
Version:	1.25
Built:	2025-01-19 06:48:58 UTC
Source:	CRAN

Beta Distribution Maximum Likelihood Estimator

Description

Estimates a beta distribution via Maximum Likelihood

Usage

betaEst(y, w, weights)
betaEst(y, w, weights)

Arguments

`y`	data vector
`w`	posterior weights
`weights`	case weights

Details

Typically not be called by user.

Value

(a,b) parameters

Beta Maximum Likelihood on a Matrix

Description

Maximum likelihood estimator for beta model on matrix of values (columns having different, independent beta distributions)

Usage

betaEstMultiple(Y, weights = NULL)
betaEstMultiple(Y, weights = NULL)

Arguments

`Y`	data matrix
`weights`	case weights

Value

A list of beta parameters and BIC

Beta Maximum Likelihood Objective Function

Description

Objective function for fitting a beta model using maximum likelihood

Usage

betaObjf(logab, ydata, wdata, weights)
betaObjf(logab, ydata, wdata, weights)

Arguments

`logab`	log(a,b) parameters
`ydata`	data vector
`wdata`	posterior weights
`weights`	case weights

Details

Typically not be called by user.

Value

negative log-likelihood

Beta Latent Class Model

Description

Fits a beta mixture model for any number of classes

Usage

blc(Y, w, maxiter = 25, tol = 1e-06, weights = NULL, verbose = TRUE)
blc(Y, w, maxiter = 25, tol = 1e-06, weights = NULL, verbose = TRUE)

Arguments

`Y`	Data matrix (n x j) on which to perform clustering
`w`	Initial weight matrix (n x k) representing classification
`maxiter`	Maximum number of EM iterations
`tol`	Convergence tolerance
`weights`	Case weights
`verbose`	Verbose output?

Details

Typically not be called by user.

Value

A list of parameters representing mixture model fit, including posterior weights and log-likelihood

Initialize Gaussian Latent Class via Mean Dichotomization

Description

Creates a function for initializing latent class model by dichotomizing via mean over all responses

Usage

blcInitializeSplitDichotomizeUsingMean(threshold = 0.5, fuzz = 0.95)
blcInitializeSplitDichotomizeUsingMean(threshold = 0.5, fuzz = 0.95)

Arguments

`threshold`	Mean threshold for determining class
`fuzz`	“fuzz” factor for producing imperfectly clustered subjects

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, a simple threshold will be applied to the mean over all item responses. See blcTree for example of using “blcInitializeSplit...” to create starting values.

Value

A function f(x) (see Details.)

Initialize Gaussian Latent Class via Eigendecomposition

Description

Creates a function for initializing latent class model based on Eigendecomposition

Usage

blcInitializeSplitEigen(eigendim = 1, 
    assignmentf = function(s) (rank(s) - 0.5)/length(s))
blcInitializeSplitEigen(eigendim = 1, 
    assignmentf = function(s) (rank(s) - 0.5)/length(s))

Arguments

`eigendim`	How many eigenvalues to use
`assignmentf`	assignment function for transforming eigenvector to weight

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, the initialized classes will be based on eigendecomposition of the variance of x. See blcTree for example of using “blcSplitCriterion...” to control split.

Value

A function f(x) (see Details.)

Initialize Beta Latent Class via Fanny

Description

Creates a function for initializing latent class model using the fanny algorithm

Usage

blcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")
blcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")

Arguments

`nu`	`memb.exp` parameter in `fanny`
`nufac`	Factor by which to multiply `nu` if an error occurs
`metric`	Metric to use for `fanny`

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, the “fanny” algorithm will be used. See blcTree for example of using “blcSplitCriterion...” to control split.

Value

A function f(x) (see Details.)

Initialize Beta Latent Class via Hierarchical Clustering

Description

Creates a function for initializing latent class model using hierarchical clustering.

Usage

blcInitializeSplitHClust(metric = "manhattan", method = "ward")
blcInitializeSplitHClust(metric = "manhattan", method = "ward")

Arguments

`metric`	Dissimilarity metric used for hierarchical clustering
`method`	Linkage method used for hierarchical clustering

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, a two-branch split from hierarchical clustering will be used. See blcTree for example of using “blcSplitCriterion...” to control split.

Value

A function f(x) (see Details.)

Beta Latent Class Splitter

Description

Splits a data set into two via a beta mixture model

Usage

blcSplit(x, initFunctions, weight = NULL, index = NULL, level = NULL, 
    wthresh = 1e-09, verbose = TRUE, nthresh = 5, 
    splitCriterion = NULL)
blcSplit(x, initFunctions, weight = NULL, index = NULL, level = NULL, 
    wthresh = 1e-09, verbose = TRUE, nthresh = 5, 
    splitCriterion = NULL)

Arguments

`x`	Data matrix (n x j) on which to perform clustering
`initFunctions`	List of functions of type “blcInitialize...” for initializing latent class model. See `blcInitializeFanny` for an example of arguments and return values.
`weight`	Weight corresponding to the indices passed (see `index`). Defaults to 1 for all indices
`index`	Row indices of data matrix to include. Defaults to all (1 to n).
`level`	Current level.
`wthresh`	Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes.
`verbose`	Level of verbosity. Default=2 (too much). 0 for quiet.
`nthresh`	Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split.
`splitCriterion`	Function of type “blcSplitCriterion...” for determining whether split should occur. See `blcSplitCriterionBIC` for an example of arguments and return values. Default behavior is `blcSplitCriterionBIC` (though the function is bypassed by internal calculations for some modest computational efficiency gains).

Details

Should not be called by user.

Value

A list of objects representing split.

Beta RPMM Split Criterion: Use BIC

Description

Split criterion function: compare BICs to determine split.

Usage

blcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)
blcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information. See blcTree for example of using “blcSplitCriterion...” to control split.

Value

`bic1`	one-class (weighted) BIC
`bic2`	two-class (weighted) BIC
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Beta RPMM Split Criterion: Use ICL-BIC

Description

Split criterion function: compare ICL-BICs to determine split (i.e. include entropy term in comparison).

Usage

blcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)
blcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

Value

`bic1`	one-class (weighted) BIC
`bic2`	two-class (weighted) BIC
`entropy`	two-class entropy
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Beta RPMM Split Criterion: Always Split and Record Everything

Description

Split criterion function: always split, but record everything as you go.

Usage

blcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)
blcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information. This function ALWAYS returns split=TRUE. Useful for gathering information. It is recommended that you set the maxlev argument in the main function to something less than infinity (say, 3 or 4). See blcTree for example of using “blcSplitCriterion...” to control split.

Value

`llike1`	Just returns `llike1`
`llike2`	Just returns `llike2`
`J`	Just returns `J`
`weight`	Just returns `weight`
`ww`	Just returns `ww`
`degFreedom`	Degrees-of-freedom for LRT
`chiSquareStat`	Chi-square statistic
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Beta RPMM Split Criterion: Level-Weighted BIC

Description

Split criterion function: use a level-weighted version of BIC to determine split; there is an additional penalty incorporated for deep recursion.

Usage

blcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)
blcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

Value

`bic1`	One-class BIC, with additional penalty for deeper levels
`bic2`	Two-class BIC, with additional penalty for deeper levels
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Beta RPMM Split Criterion: use likelihood ratio test p value

Description

Split criterion function: Use likelihood ratio test p value to determine split.

Usage

blcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)
blcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information. See blcTree for example of using “blcSplitCriterion...” to control split.

Value

`llike1`	Just returns `llike1`
`llike2`	Just returns `llike2`
`J`	Just returns `J`
`weight`	Just returns `weight`
`degFreedom`	Degrees-of-freedom for LRT
`chiSquareStat`	Chi-square statistic
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Beta Subtree

Description

Subsets a “blcTree” object, i.e. considers the tree whose root is a given node.

Usage

blcSubTree(tr, node)
blcSubTree(tr, node)

Arguments

`tr`	“blcTree” object to subset
`node`	Name of node to make root.

Details

Typically not be called by user.

Value

A “blcTree” object whose root is the given node of tr

Beta RPMM Tree

Description

Performs beta latent class modeling using recursively-partitioned mixture model

Usage

blcTree(x, initFunctions = list(blcInitializeSplitFanny()), 
   weight = NULL, index = NULL, wthresh = 1e-08, nodename = "root",
   maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, env = NULL, 
   unsplit = NULL, splitCriterion = blcSplitCriterionBIC)
blcTree(x, initFunctions = list(blcInitializeSplitFanny()), 
   weight = NULL, index = NULL, wthresh = 1e-08, nodename = "root",
   maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, env = NULL, 
   unsplit = NULL, splitCriterion = blcSplitCriterionBIC)

Arguments

`x`	Data matrix (n x j) on which to perform clustering. Missing values are supported. All values should lie strictly between 0 and 1.
`initFunctions`	List of functions of type “blcInitialize...” for initializing latent class model. See `blcInitializeFanny` for an example of arguments and return values.
`weight`	Weight corresponding to the indices passed (see `index`). Defaults to 1 for all indices
`index`	Row indices of data matrix to include. Defaults to all (1 to n).
`wthresh`	Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. Default=1E-8.
`nodename`	Name of object that will represent node in tree data object. Defaults to “root”. USER SHOULD NOT SET THIS.
`maxlevel`	Maximum depth to recurse. Default=Inf.
`verbose`	Level of verbosity. Default=2 (too much). 0 for quiet.
`nthresh`	Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. Defaults to 5.
`level`	Current level. Defaults to 0. USER SHUOLD NOT SET THIS.
`env`	Object of class “blcTree” to store tree data. Defaults to a new object. USER SHOULD NOT SET THIS.
`unsplit`	Latent class parameters from parent, to store in current node. Defaults to NULL for root. This is used in plotting functions. USER SHOULD NOT SET THIS.
`splitCriterion`	Function of type “blcSplitCriterion...” for determining whether a node should be split. See `blcSplitCriterionBIC` for an example of arguments and return values.

Details

This function is called recursively by itself. Upon each recursion, certain arguments (e.g. nodename) are reset. Do not attempt to set these arguments yourself.

Value

An object of class “blcTree”. This is an environment, each of whose component objects represents a node in the tree.

Note

The class “blcTree” is currently implemented as an environment object with nodes represented flatly, with name indicating positition in hierarchy (e.g. “rLLR” = “right child of left child of left child of root”) This implementation is to make certain plotting and update functions simpler than would be required if the data were stored in a more natural “list of list” format.

The following error may appear during the course of the algorithm:

      Error in optim(logab, betaObjf, ydata = y, wdata = w, weights = weights,  : 
           non-finite value supplied by optim

This is merely an indication that the node being split is too small, in which case the splitting will terminate at that node; in other words, it is nothing to worry about.

Author(s)

E. Andres Houseman

References

Houseman et al., Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9:365, 2008.

Examples

## Not run: 
data(IlluminaMethylation)

heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

# Fit Gaussian RPMM
rpmm <- blcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- blcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- blcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.blcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.blcTree(rpmm, 
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- blcTree(IllumBeta, verbose=0, 
  initFunctions=list(blcInitializeSplitEigen(),
                     blcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)
## Not run: 
data(IlluminaMethylation)

heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

# Fit Gaussian RPMM
rpmm <- blcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- blcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- blcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.blcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.blcTree(rpmm, 
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- blcTree(IllumBeta, verbose=0, 
  initFunctions=list(blcInitializeSplitEigen(),
                     blcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)

Recursive Apply Function for Beta RPMM Objects

Description

Recursively applies a function down the nodes of a Gaussian RPMM tree.

Usage

blcTreeApply(tr, f, start = "root", terminalOnly = FALSE, asObject = TRUE, ...)
blcTreeApply(tr, f, start = "root", terminalOnly = FALSE, asObject = TRUE, ...)

Arguments

`tr`	Tree object to recurse
`f`	Function to apply to every node
`start`	Starting node. Default = “root”.
`terminalOnly`	`TRUE`=only terminal nodes, `FALSE`=all nodes.
`asObject`	`TRUE`: `f` accepts node as object. `FALSE`: `f` accepts node by node name and object name, f(nn,tr)

. In the latter case, f should be defined as f <- function(nn,tree){...}.

...

Additional arguments to pass to f

Value

A list of results; names of elements are names of nodes.

Posterior Class Assignments for Beta RPMM

Description

Gets a vector of posterior class membership assignments for terminal nodes.

Usage

blcTreeLeafClasses(tr)
blcTreeLeafClasses(tr)

Arguments

`tr`	Tree from which to create assignments.

Details

See blcTree for example.

Value

Vector of class assignments

Posterior Weight Matrix for Beta RPMM

Description

Gets a matrix of posterior class membership weights for terminal nodes.

Usage

blcTreeLeafMatrix(tr, rounding = 3)
blcTreeLeafMatrix(tr, rounding = 3)

Arguments

`tr`	Tree from which to create matrix.
`rounding`	Digits to round.

Details

See blcTree for example.

Value

N x K matrix of posterior weights

Overall BIC for Entire RPMM Tree (Beta version)

Description

Computes the BIC for the latent class model represented by terminal nodes

Usage

blcTreeOverallBIC(tr, ICL = FALSE)
blcTreeOverallBIC(tr, ICL = FALSE)

Arguments

`tr`	Tree object on which to compute BIC
`ICL`	Include ICL entropy term?

Value

BIC or BIC-ICL.

Empirical Bayes predictions for a specific RPMM model

Description

Empirical Bayes predictions for a specific RPMM model

Usage

ebayes(rpmm, x, type, nodelist=NULL)
ebayes(rpmm, x, type, nodelist=NULL)

Arguments

`rpmm`	RPMM object
`x`	Data matrix
`type`	RPMM type ("blc" or "glc")
`nodelist`	RPMM subnode to use (default = root)

Details

Typically not be called by user.

Value

Matrix of empirical bayes predictions corresponding to x.

Gaussian Maximum Likelihood on a Matrix

Description

Maximum likelihood estimator for Gaussian model on matrix of values (columns having different, independent Gaussian distributions)

Usage

gaussEstMultiple(Y, weights = NULL)
gaussEstMultiple(Y, weights = NULL)

Arguments

`Y`	data matrix
`weights`	case weights

Value

A list of beta parameters and BIC

Gaussian Finite Mixture Model

Description

Fits a Gaussian mixture model for any number of classes

Usage

glc(Y, w, maxiter = 100, tol = 1e-06, weights = NULL, verbose = TRUE)
glc(Y, w, maxiter = 100, tol = 1e-06, weights = NULL, verbose = TRUE)

Arguments

`Y`	Data matrix (n x j) on which to perform clustering
`w`	Initial weight matrix (n x k) representing classification
`maxiter`	Maximum number of EM iterations
`tol`	Convergence tolerance
`weights`	Case weights
`verbose`	Verbose output?

Details

Typically not be called by user.

Value

A list of parameters representing mixture model fit, including posterior weights and log-likelihood

Initialize Gaussian Latent Class via Eigendecomposition

Description

Creates a function for initializing latent class model based on Eigendecomposition

Usage

glcInitializeSplitEigen(eigendim = 1, 
   assignmentf = function(s) (rank(s) - 0.5)/length(s))
glcInitializeSplitEigen(eigendim = 1, 
   assignmentf = function(s) (rank(s) - 0.5)/length(s))

Arguments

`eigendim`	How many eigenvalues to use
`assignmentf`	assignment function for transforming eigenvector to weight

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, the initialized classes will be based on eigendecomposition of the variance of x. See glcTree for example of using “glcInitializeSplit...” to create starting values.

Value

A function f(x) (see Details.)

Initialize Gaussian Latent Class via Fanny

Description

Creates a function for initializing latent class model using the fanny algorithm

Usage

glcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")
glcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")

Arguments

`nu`	`memb.exp` parameter in `fanny`
`nufac`	Factor by which to multiply `nu` if an error occurs
`metric`	Metric to use for `fanny`

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, the “fanny” algorithm will be used. See glcTree for example of using “glcInitializeSplit...” to create starting values.

Value

A function f(x) (see Details.)

Initialize Gaussian Latent Class via Hierarchical Clustering

Description

Creates a function for initializing latent class model using hierarchical clustering.

Usage

glcInitializeSplitHClust(metric = "manhattan", method = "ward")
glcInitializeSplitHClust(metric = "manhattan", method = "ward")

Arguments

`metric`	Dissimilarity metric used for hierarchical clustering
`method`	Linkage method used for hierarchical clustering

Details

Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model. Here, a two-branch split from hierarchical clustering will be used. See glcTree for example of using “glcInitializeSplit...” to create starting values.

Value

A function f(x) (see Details.)

Gaussian Latent Class Splitter

Description

Splits a data set into two via a Gaussian mixture models

Usage

glcSplit(x, initFunctions, weight = NULL, index = NULL, level =
                 0, wthresh = 1e-09, verbose = TRUE, nthresh = 5,
                 splitCriterion = glcSplitCriterionBIC)
glcSplit(x, initFunctions, weight = NULL, index = NULL, level =
                 0, wthresh = 1e-09, verbose = TRUE, nthresh = 5,
                 splitCriterion = glcSplitCriterionBIC)

Arguments

`x`	Data matrix (n x j) on which to perform clustering
`initFunctions`	List of functions of type “glcInitialize...” for initializing latent class model. See `glcInitializeFanny` for an example of arguments and return values.
`weight`	Weight corresponding to the indices passed (see `index`). Defaults to 1 for all indices
`index`	Row indices of data matrix to include. Defaults to all (1 to n).
`level`	Current level.
`wthresh`	Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes.
`verbose`	Level of verbosity. Default=2 (too much). 0 for quiet.
`nthresh`	Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split.
`splitCriterion`	Function of type “glcSplitCriterion...” for determining whether split should occur. See `glcSplitCriterionBIC` for an example of arguments and return values.

Details

Should not be called by user.

Value

A list of objects representing split.

Gaussian RPMM Split Criterion: Use BIC

Description

Split criterion function: compare BICs to determine split.

Usage

glcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)
glcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information. See glcTree for example of using “glcSplitCriterion...” to control split.

Value

`bic1`	one-class (weighted) BIC
`bic2`	two-class (weighted) BIC
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Gaussian RPMM Split Criterion: Use ICL-BIC

Description

Split criterion function: compare ICL-BICs to determine split (i.e. include entropy term in comparison).

Usage

glcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)
glcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

Value

`bic1`	one-class (weighted) BIC
`bic2`	two-class (weighted) BIC
`entropy`	two-class entropy
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Gaussian RPMM Split Criterion: Always Split and Record Everything

Description

Split criterion function: always split, but record everything as you go.

Usage

glcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)
glcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information. This function ALWAYS returns split=TRUE. Useful for gathering information. It is recommended that you set the maxlev argument in the main function to something less than infinity (say, 3 or 4). See glcTree for example of using “glcSplitCriterion...” to control split.

Value

`llike1`	Just returns `llike1`
`llike2`	Just returns `llike2`
`J`	Just returns `J`
`weight`	Just returns `weight`
`ww`	Just returns `ww`
`degFreedom`	Degrees-of-freedom for LRT
`chiSquareStat`	Chi-square statistic
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Gaussian RPMM Split Criterion: Level-Weighted BIC

Description

Split criterion function: use a level-weighted version of BIC to determine split; there is an additional penalty incorporated for deep recursion.

Usage

glcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)
glcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

Value

`bic1`	One-class BIC, with additional penalty for deeper levels
`bic2`	Two-class BIC, with additional penalty for deeper levels
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Gaussian RPMM Split Criterion: Use likelihood ratio test p value

Description

Split criterion function: use likelihood ratio test p value to determine split.

Usage

glcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)
glcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)

Arguments

`llike1`	one-class likelihood.
`llike2`	two-class likelihood.
`weight`	weights from RPMM node.
`ww`	“ww” from RPMM node.
`J`	Number of items.
`level`	Node level.

Details

Value

`llike1`	Just returns `llike1`
`llike2`	Just returns `llike2`
`J`	Just returns `J`
`weight`	Just returns `weight`
`degFreedom`	Degrees-of-freedom for LRT
`chiSquareStat`	Chi-square statistic
`split`	`TRUE`=split the node, `FALSE`=do not split the node.

Gaussian Subtree

Description

Subsets a “glcTree” object, i.e. considers the tree whose root is a given node.

Usage

glcSubTree(tr, node)
glcSubTree(tr, node)

Arguments

`tr`	“glcTree” object to subset
`node`	Name of node to make root.

Details

Typically not be called by user.

Value

A “glcTree” object whose root is the given node of tr

Gaussian RPMM Tree

Description

Performs Gaussian latent class modeling using recursively-partitioned mixture model

Usage

glcTree(x, initFunctions = list(glcInitializeSplitFanny(nu=1.5)), 
   weight = NULL, index = NULL, wthresh = 1e-08, 
   nodename = "root", maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, 
   env = NULL, unsplit = NULL, splitCriterion = glcSplitCriterionBIC)
glcTree(x, initFunctions = list(glcInitializeSplitFanny(nu=1.5)), 
   weight = NULL, index = NULL, wthresh = 1e-08, 
   nodename = "root", maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, 
   env = NULL, unsplit = NULL, splitCriterion = glcSplitCriterionBIC)

Arguments

`x`	Data matrix (n x j) on which to perform clustering. Missing values are supported.
`initFunctions`	List of functions of type “glcInitialize...” for initializing latent class model. See `glcInitializeFanny` for an example of arguments and return values.
`weight`	Weight corresponding to the indices passed (see `index`). Defaults to 1 for all indices
`index`	Row indices of data matrix to include. Defaults to all (1 to n).
`wthresh`	Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. Default=1E-8.
`nodename`	Name of object that will represent node in tree data object. Defaults to “root”. USER SHOULD NOT SET THIS.
`maxlevel`	Maximum depth to recurse. Default=Inf.
`verbose`	Level of verbosity. Default=2 (too much). 0 for quiet.
`nthresh`	Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. Defaults to 5.
`level`	Current level. Defaults to 0. USER SHUOLD NOT SET THIS.
`env`	Object of class “glcTree” to store tree data. Defaults to a new object. USER SHOULD NOT SET THIS.
`unsplit`	Latent class parameters from parent, to store in current node. Defaults to NULL for root. This is used in plotting functions. USER SHOULD NOT SET THIS.
`splitCriterion`	Function of type “glcSplitCriterion...” for determining whether a node should be split. See `glcSplitCriterionBIC` for an example of arguments and return values.

Details

This function is called recursively by itself. Upon each recursion, certain arguments (e.g. nodename) are reset. Do not attempt to set these arguments yourself.

Value

An object of class “glcTree”. This is an environment, each of whose component objects represents a node in the tree.

Note

The class “glcTree” is currently implemented as an environment object with nodes represented flatly, with name indicating positition in hierarchy (e.g. “rLLR” = “right child of left child of left child of root”) This implementation is to make certain plotting and update functions simpler than would be required if the data were stored in a more natural “list of list” format.

The following error may appear during the course of the algorithm:

      Error in optim(logab, betaObjf, ydata = y, wdata = w, weights = weights,  : 
           non-finite value supplied by optim

This is merely an indication that the node being split is too small, in which case the splitting will terminate at that node; in other words, it is nothing to worry about.

Author(s)

E. Andres Houseman

References

Examples


data(IlluminaMethylation)

## Not run: 
heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

## End(Not run)

# Fit Gaussian RPMM
rpmm <- glcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- glcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- glcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

## Not run: 
# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.glcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.glcTree(rpmm, 
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- glcTree(IllumBeta, verbose=0, 
  initFunctions=list(glcInitializeSplitEigen(),
                     glcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)
data(IlluminaMethylation)

## Not run: 
heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

## End(Not run)

# Fit Gaussian RPMM
rpmm <- glcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- glcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- glcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

## Not run: 
# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.glcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.glcTree(rpmm, 
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- glcTree(IllumBeta, verbose=0, 
  initFunctions=list(glcInitializeSplitEigen(),
                     glcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)

Recursive Apply Function for Gaussian RPMM Objects

Description

Recursively applies a function down the nodes of a Gaussian RPMM tree.

Usage

glcTreeApply(tr, f, start = "root", terminalOnly = FALSE, 
   asObject = TRUE, ...)
glcTreeApply(tr, f, start = "root", terminalOnly = FALSE, 
   asObject = TRUE, ...)

Arguments

`tr`	Tree object to recurse
`f`	Function to apply to every node
`start`	Starting node. Default = “root”.
`terminalOnly`	`TRUE`=only terminal nodes, `FALSE`=all nodes.
`asObject`	`TRUE`: `f` accepts node as object. `FALSE`: `f` accepts node by node name and object name, f(nn,tr)

. In the latter case, f should be defined as f <- function(nn,tree){...}.

...

Additional arguments to pass to f

Value

A list of results; names of elements are names of nodes.

Posterior Class Assignments for Gaussian RPMM

Description

Gets a vector of posterior class membership assignments for terminal nodes.

Usage

glcTreeLeafClasses(tr)
glcTreeLeafClasses(tr)

Arguments

`tr`	Tree from which to create assignments.

Details

See glcTree for example.

Value

Vector of class assignments

Posterior Weight Matrix for Gaussian RPMM

Description

Gets a matrix of posterior class membership weights for terminal nodes.

Usage

glcTreeLeafMatrix(tr, rounding = 3)
glcTreeLeafMatrix(tr, rounding = 3)

Arguments

`tr`	Tree from which to create matrix.
`rounding`	Digits to round.

Details

See glcTree for example.

Value

N x K matrix of posterior weights

Overall BIC for Entire RPMM Tree (Gaussian version)

Description

Computes the BIC for the latent class model represented by terminal nodes

Usage

glcTreeOverallBIC(tr, ICL = FALSE)
glcTreeOverallBIC(tr, ICL = FALSE)

Arguments

`tr`	Tree object on which to compute BIC
`ICL`	Include ICL entropy term?

Value

BIC or BIC-ICL.

Weighted GLM for latent class covariates

Description

Wrapper for glm function to incorporate weights corresponding to latent classes

Usage

glmLC(y,W,family=quasibinomial(),eps=1E-8,Z=NULL)
glmLC(y,W,family=quasibinomial(),eps=1E-8,Z=NULL)

Arguments

`y`	outcome
`W`	weight matrix (rows=cases, # rows = length of y)
`family`	glm family (default = quasibinomial for logistic regression)
`eps`	threshold below which to delete pseudo-subject corresponding to a specific weight
`Z`	matrix of additional covariates

Details

This function is a wrapper for glm to incorporate weights corresponding to latent classes (e.g. from an RPMM prediction)

Value

a glm object

DNA Methylation Data for Normal Tissue Types

Description

Illumina GoldenGate DNA methylation data for 217 normal tissues. 100 most variable CpG sites.

Usage

IlluminaMethylationIlluminaMethylation

Format

a 217 x 100 matrix containing Illumina Avg Beta values (IllumBeta), and a corresponding factor vector of 217 tissue types (tissue).

References

Christensen BC, Houseman EA, et al. 2009 Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context. PLoS Genet 5(8): e1000602.

Data log-likelihood implied by a specific RPMM model

Description

Data log-likelihood implied by a specific RPMM model

Usage

llikeRPMMObject(o, x, type)
llikeRPMMObject(o, x, type)

Arguments

`o`	RPMM object
`x`	Data matrix
`type`	RPMM type ("blc" or "glc")

Details

Typically not be called by user.

Value

Vector of loglikelihoods corresponding to rows of x.

Plot a Beta RPMM Tree Profile

Description

Plot method for objects of type “blcTree”. Plots profiles of terminal nodes in color. Method wrapper for plotImage.blcTree.

Usage

## S3 method for class 'blcTree'
plot(x,...)
## S3 method for class 'blcTree'
plot(x,...)

Arguments

`x`	RPMM object to plot.
`...`	Additional arguments to pass to `plotImage.blcTree`.

Details

See blcTree for example.

Plot a Gaussian RPMM Tree Profile

Description

Plot method for objects of type “glcTree”. Plots profiles of terminal nodes in color. Method wrapper for plotImage.glcTree.

Usage

## S3 method for class 'glcTree'
plot(x,...)
## S3 method for class 'glcTree'
plot(x,...)

Arguments

`x`	RPMM object to plot.
`...`	Additional arguments to pass to `plotImage.glcTree`.

Details

See glcTree for example.

Plot a Beta RPMM Tree Profile

Description

Plots profiles of terminal nodes in color.

Usage

plotImage.blcTree(env, 
   start = "r", method = "weight", 
   palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), 
   divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR")
plotImage.blcTree(env, 
   start = "r", method = "weight", 
   palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), 
   divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR")

Arguments

`env`	RPMM object to plot.
`start`	Node to plot (usually root)
`method`	Method to determine width of columns that represent classes: “weight” (subject weight in class) or dQuotebinary (depth in tree).
`palette`	Color palette to use for image plot.
`divcol`	Divider color
`xorder`	Order of variables. Can be useful for constant ordering across multiple plots.
`dimensions`	Subset of dimensions of source data to show. Defaults to all. Useful to show a subset of dimensions.
`labelType`	Label name type: “LR” or “01”.

Details

See blcTree for example.

Value

Returns a vector of indices similar to the order function, representing the orrdering of items used in the plot. This is useful for replicating the order in another plot, or for axis labeling.

Plot a Gaussian RPMM Tree Profile

Description

Plots profiles of terminal nodes in color.

Usage

plotImage.glcTree(env, 
   start = "r", method = "weight", 
   palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), 
   divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR", muColorEps = 1e-08)
plotImage.glcTree(env, 
   start = "r", method = "weight", 
   palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), 
   divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR", muColorEps = 1e-08)

Arguments

`env`	RPMM object to print.
`start`	Node to plot (usually root)
`method`	Method to determine width of columns that represent classes: “weight” (subject weight in class) or dQuotebinary (depth in tree).
`palette`	Color palette to use for image plot.
`divcol`	Divider color
`xorder`	Order of variables. Can be useful for constant ordering across multiple plots.
`dimensions`	Subset of dimensions of source data to show. Defaults to all. Useful to show a subset of dimensions.
`labelType`	Label name type: “LR” or “01”.
`muColorEps`	Small value to stabilize color generation.

Details

See glcTree for example.

Value

Returns a vector of indices similar to the order function, representing the orrdering of items used in the plot. This is useful for replicating the order in another plot, or for axis labeling.

Plot a Beta RPMM Tree Dendrogram

Description

Alternate plot function for objects of type blcTree: plots a dendrogram

Usage

plotTree.blcTree(env, start = "r", labelFunction = NULL, 
    buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)
plotTree.blcTree(env, start = "r", labelFunction = NULL, 
    buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)

Arguments

`env`	Tree object to print
`start`	Note from which to start. Default=“r” for “root”.
`labelFunction`	Function for generating node labels. Useful for labeling each node with a value.
`buff`	Buffer for placing tree in plot window.
`cex`	Text size
`square`	Square dendrogram or “V” shaped
`labelAllNodes`	`TRUE`=All nodes will be labeled; `FALSE`=Terminal nodes only.
`labelDigits`	Digits to include in labels, if `labelFunction` returns numeric values.
`...`	Other parameters to be passed to `labelFunction`.

Details

This plots a dendrogram based on RPMM tree, with labels constructed from summaries of tree object. See blcTree for example.

Plot a Gaussian RPMM Tree Dendrogram

Description

Alternate plot function for objects of type glcTree: plots a dendrogram

Usage

plotTree.glcTree(env, start = "r", labelFunction = NULL, 
    buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)
plotTree.glcTree(env, start = "r", labelFunction = NULL, 
    buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)

Arguments

`env`	Tree object to print
`start`	Note from which to start. Default=“r” for “root”.
`labelFunction`	Function for generating node labels. Useful for labeling each node with a value.
`buff`	Buffer for placing tree in plot window.
`cex`	Text size
`square`	Square dendrogram or “V” shaped
`labelAllNodes`	`TRUE`=All nodes will be labeled; `FALSE`=Terminal nodes only.
`labelDigits`	Digits to include in labels, if `labelFunction` returns numeric values.
`...`	Other parameters to be passed to `labelFunction`.

Details

This plots a dendrogram based on RPMM tree, with labels constructed from summaries of tree object. See glcTree for example.

Predict using a Beta RPMM object

Description

Prediction method for objects of type blcTree

Usage

## S3 method for class 'blcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)
## S3 method for class 'blcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)

Arguments

`object`	RPMM object to print
`newdata`	external data matrix from which to apply predictions
`nodelist`	RPMM subnode to use (default = root)
`type`	output type: "weight" produces output similar to `blcTreeLeafMatrix`, "class" produces output similar to `blcTreeLeafClasses`.
`...`	(Unused).

Details

This function is similar to blcTreeLeafMatrix and blcTreeLeafClasses, except that it supports prediction on an external data set via the argument newdata.

Predict using a Gaussian RPMM object

Description

Prediction method for objects of type glcTree

Usage

## S3 method for class 'glcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)
## S3 method for class 'glcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)

Arguments

`object`	RPMM object to print
`newdata`	external data matrix from which to apply predictions
`nodelist`	RPMM subnode to use (default = root)
`type`	output type: "weight" produces output similar to `glcTreeLeafMatrix`, "class" produces output similar to `glcTreeLeafClasses`.
`...`	(Unused).

Details

This function is similar to glcTreeLeafMatrix and glcTreeLeafClasses, except that it supports prediction on an external data set via the argument newdata.

Print a Beta RPMM object

Description

Print method for objects of type blcTree

Usage

## S3 method for class 'blcTree'
print(x,...)
## S3 method for class 'blcTree'
print(x,...)

Arguments

`x`	RPMM object to print
`...`	(Unused).

Details

See blcTree for example.

Print a Gaussian RPMM object

Description

Print method for objects of type blcTree

Usage

## S3 method for class 'glcTree'
print(x,...)
## S3 method for class 'glcTree'
print(x,...)

Arguments

`x`	RPMM object to print
`...`	(Unused).

Details

See glcTree for example.

Package 'RPMM'

Help Index

Beta Distribution Maximum Likelihood Estimator

Description

Usage

Arguments

Details

Value

Beta Maximum Likelihood on a Matrix

Description

Usage

Arguments

Value

Beta Maximum Likelihood Objective Function

Description

Usage

Arguments

Details

Value

Beta Latent Class Model

Description

Usage

Arguments

Details

Value

Initialize Gaussian Latent Class via Mean Dichotomization

Description

Usage

Arguments

Details

Value

See Also

Initialize Gaussian Latent Class via Eigendecomposition

Description

Usage

Arguments

Details

Value

See Also

Initialize Beta Latent Class via Fanny

Description

Usage

Arguments

Details

Value

See Also

Initialize Beta Latent Class via Hierarchical Clustering

Description

Usage

Arguments

Details

Value

See Also

Beta Latent Class Splitter

Description

Usage

Arguments

Details

Value

Beta RPMM Split Criterion: Use BIC

Description

Usage

Arguments

Details

Value

See Also

Beta RPMM Split Criterion: Use ICL-BIC

Description

Usage

Arguments

Details

Value

See Also

Beta RPMM Split Criterion: Always Split and Record Everything

Description

Usage

Arguments

Details

Value

See Also