Title: | Recursively Partitioned Mixture Model |
---|---|
Description: | Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models. |
Authors: | E. Andres Houseman, Sc.D. and Devin C. Koestler, Ph.D. |
Maintainer: | E. Andres Houseman <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.25 |
Built: | 2025-01-19 06:48:58 UTC |
Source: | CRAN |
Estimates a beta distribution via Maximum Likelihood
betaEst(y, w, weights)
y |
data vector |
w |
posterior weights |
weights |
case weights |
Typically not called by the user.
(a,b) parameters
Maximum likelihood estimator for beta model on matrix of values (columns having different, independent beta distributions)
betaEstMultiple(Y, weights = NULL)
Y |
data matrix |
weights |
case weights |
A list of beta parameters and BIC
Objective function for fitting a beta model using maximum likelihood
betaObjf(logab, ydata, wdata, weights)
logab |
log(a,b) parameters |
ydata |
data vector |
wdata |
posterior weights |
weights |
case weights |
Typically not called by the user.
negative log-likelihood
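As a rough illustration of what such an objective can look like, the sketch below writes a weighted negative log-likelihood for a single beta distribution, parameterized on the log scale so that both shape parameters stay positive, and minimizes it with `optim`. The function name `beta_negloglik` and the way the two weight vectors are multiplied together are assumptions made for this example; this is not the package source, and it assumes complete data.

```r
# Hypothetical stand-in for a beta objective function (not the package code).
beta_negloglik <- function(logab, ydata, wdata, weights) {
  ab <- exp(logab)                              # back-transform so a, b > 0
  ll <- dbeta(ydata, ab[1], ab[2], log = TRUE)  # per-observation log density
  -sum(weights * wdata * ll)                    # weighted negative log-likelihood
}

set.seed(1)
y  <- rbeta(500, shape1 = 2, shape2 = 5)  # simulated data in (0, 1)
w  <- rep(1, length(y))                   # posterior weights (single class here)
cw <- rep(1, length(y))                   # case weights

fit <- optim(log(c(1, 1)), beta_negloglik, ydata = y, wdata = w, weights = cw)
ab_hat <- exp(fit$par)                    # estimated (a, b)
```

Optimizing over log(a, b) rather than (a, b) avoids the need for box constraints in the optimizer.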
Fits a beta mixture model for any number of classes
blc(Y, w, maxiter = 25, tol = 1e-06, weights = NULL, verbose = TRUE)
Y |
Data matrix (n x j) on which to perform clustering |
w |
Initial weight matrix (n x k) representing classification |
maxiter |
Maximum number of EM iterations |
tol |
Convergence tolerance |
weights |
Case weights |
verbose |
Verbose output? |
Typically not called by the user.
A list of parameters representing mixture model fit, including posterior weights and log-likelihood
Creates a function for initializing latent class model by dichotomizing via mean over all responses
blcInitializeSplitDichotomizeUsingMean(threshold = 0.5, fuzz = 0.95)
threshold |
Mean threshold for determining class |
fuzz |
“fuzz” factor for producing imperfectly clustered subjects |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, a simple threshold will be applied to the mean over all item responses.
See blcTree for example of using “blcInitializeSplit...” to create starting values.
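The "function that returns a function" pattern described here can be sketched in a few lines of base R. The name `init_dichotomize_mean` and the exact way `fuzz` is turned into weights (weight `fuzz` on the assigned class, `1 - fuzz` on the other) are assumptions made for illustration; the package may weight the classes differently.

```r
# Hypothetical sketch of a mean-threshold initializer (not the package code).
init_dichotomize_mean <- function(threshold = 0.5, fuzz = 0.95) {
  function(x) {
    high <- rowMeans(x, na.rm = TRUE) > threshold   # TRUE -> class 2
    w <- matrix(1 - fuzz, nrow(x), 2)               # small weight on the other class
    w[cbind(seq_len(nrow(x)), ifelse(high, 2, 1))] <- fuzz
    w                                               # n x 2 initial weight matrix
  }
}

f <- init_dichotomize_mean()
x <- rbind(c(0.1, 0.2, 0.3),
           c(0.8, 0.9, 0.7),
           c(0.4, 0.6, 0.9))
w0 <- f(x)   # each row sums to 1
```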
A function f(x) (see Details.)
glcInitializeSplitFanny, glcInitializeSplitHClust
Creates a function for initializing latent class model based on Eigendecomposition
blcInitializeSplitEigen(eigendim = 1, assignmentf = function(s) (rank(s) - 0.5)/length(s))
eigendim |
How many eigenvalues to use |
assignmentf |
assignment function for transforming eigenvector to weight |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, the initialized classes will be based on eigendecomposition of the variance of x.
See blcTree for example of using “blcInitializeSplit...” to create starting values.
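One way to picture an eigendecomposition-based initializer is sketched below: rows are scored on the leading eigenvector of the covariance matrix, and the scores are mapped to class weights by the default rank-based assignment function shown in the usage. The name `init_eigen_sketch`, the use of only the first eigenvector, and the assumption of complete data are all choices made for this example and may not match the package.

```r
# Hypothetical sketch of an eigenvector-based initializer (not the package code).
init_eigen_sketch <- function(assignmentf = function(s) (rank(s) - 0.5) / length(s)) {
  function(x) {
    xc <- scale(x, center = TRUE, scale = FALSE)          # center each column
    v1 <- eigen(cov(xc), symmetric = TRUE)$vectors[, 1]   # leading eigenvector
    s  <- as.vector(xc %*% v1)                            # one score per row
    w2 <- assignmentf(s)                                  # class-2 weight in (0, 1)
    cbind(1 - w2, w2)                                     # n x 2 initial weight matrix
  }
}

set.seed(4)
x  <- matrix(runif(100), 20, 5)
w0 <- init_eigen_sketch()(x)
```

The rank-based assignment function spreads weights evenly over (0, 1), so no row starts with a weight of exactly 0 or 1.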
A function f(x) (see Details.)
blcInitializeSplitDichotomizeUsingMean, glcInitializeSplitFanny, glcInitializeSplitHClust
Creates a function for initializing latent class model using the fanny algorithm
blcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")
nu |
|
nufac |
Factor by which to multiply |
metric |
Metric to use for |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, the “fanny” algorithm will be used.
See blcTree for example of using “blcInitializeSplit...” to create starting values.
A function f(x) (see Details.)
blcInitializeSplitDichotomizeUsingMean, blcInitializeSplitEigen, blcInitializeSplitHClust
Creates a function for initializing latent class model using hierarchical clustering.
blcInitializeSplitHClust(metric = "manhattan", method = "ward")
metric |
Dissimilarity metric used for hierarchical clustering |
method |
Linkage method used for hierarchical clustering |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, a two-branch split from hierarchical clustering will be used.
See blcTree for example of using “blcInitializeSplit...” to create starting values.
A function f(x) (see Details.)
blcInitializeSplitDichotomizeUsingMean, blcInitializeSplitEigen, blcInitializeSplitFanny
Splits a data set into two via a beta mixture model
blcSplit(x, initFunctions, weight = NULL, index = NULL, level = NULL, wthresh = 1e-09, verbose = TRUE, nthresh = 5, splitCriterion = NULL)
x |
Data matrix (n x j) on which to perform clustering |
initFunctions |
List of functions of type “blcInitialize...” for initializing latent class model.
See |
weight |
Weight corresponding to the indices passed (see |
index |
Row indices of data matrix to include. Defaults to all (1 to n). |
level |
Current level. |
wthresh |
Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. |
verbose |
Level of verbosity. Default=2 (too much). 0 for quiet. |
nthresh |
Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. |
splitCriterion |
Function of type “blcSplitCriterion...” for determining whether split should occur.
See |
Should not be called by the user.
A list of objects representing split.
Split criterion function: compare BICs to determine split.
blcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See blcTree for example of using “blcSplitCriterion...” to control split.
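The general shape of a BIC-based split rule can be sketched as follows. This is a simplified illustration, not the package code: it assumes `llike1` and `llike2` are log-likelihoods, uses a plain sample size `n` in place of the node weights, and counts two parameters per item per class plus one mixing proportion. The function name `split_by_bic` is hypothetical.

```r
# Hypothetical sketch of a BIC comparison for one-class vs. two-class fits.
split_by_bic <- function(llike1, llike2, n, J) {
  df1  <- 2 * J                        # one class: two parameters per item
  df2  <- 4 * J + 1                    # two classes, plus one mixing proportion
  bic1 <- -2 * llike1 + df1 * log(n)
  bic2 <- -2 * llike2 + df2 * log(n)
  list(bic1 = bic1, bic2 = bic2, split = bic2 < bic1)   # split if two classes win
}

res_split   <- split_by_bic(llike1 = -1200, llike2 = -1100, n = 100, J = 10)
res_nosplit <- split_by_bic(llike1 = -1200, llike2 = -1195, n = 100, J = 10)
```

In the first call the gain in log-likelihood outweighs the extra parameters, so `split` is TRUE; in the second it does not, so `split` is FALSE.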
bic1 |
one-class (weighted) BIC |
bic2 |
two-class (weighted) BIC |
split |
|
blcSplitCriterionBICICL, blcSplitCriterionJustRecordEverything, blcSplitCriterionLevelWtdBIC, blcSplitCriterionLRT
Split criterion function: compare ICL-BICs to determine split (i.e. include entropy term in comparison).
blcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See blcTree for example of using “blcSplitCriterion...” to control split.
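The entropy term that distinguishes an ICL-type criterion from plain BIC can be computed directly from a matrix of posterior class weights. The sketch below uses the common convention of adding twice the classification entropy to BIC (on the scale where smaller is better); whether the package uses exactly this form is an assumption, and the BIC value shown is illustrative only.

```r
# Hypothetical sketch: classification entropy of a posterior weight matrix.
class_entropy <- function(w) {
  p <- w[w > 0]          # treat 0 * log(0) as 0
  -sum(p * log(p))
}

w <- rbind(c(0.99, 0.01),
           c(0.50, 0.50),
           c(0.02, 0.98))
ent <- class_entropy(w)

bic2     <- 2388.8             # illustrative two-class BIC, not real output
icl_bic2 <- bic2 + 2 * ent     # entropy penalizes poorly separated classes
```

Rows with weights near 0 or 1 contribute almost nothing to the entropy, so a cleanly separated split is penalized very little.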
bic1 |
one-class (weighted) BIC |
bic2 |
two-class (weighted) BIC |
entropy |
two-class entropy |
split |
|
blcSplitCriterionBIC, blcSplitCriterionJustRecordEverything, blcSplitCriterionLevelWtdBIC, blcSplitCriterionLRT
Split criterion function: always split, but record everything as you go.
blcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
This function ALWAYS returns split=TRUE. Useful for gathering information.
It is recommended that you set the maxlevel argument in the main function to something less than infinity (say, 3 or 4).
See blcTree for example of using “blcSplitCriterion...” to control split.
llike1 |
Just returns |
llike2 |
Just returns |
J |
Just returns |
weight |
Just returns |
ww |
Just returns |
degFreedom |
Degrees-of-freedom for LRT |
chiSquareStat |
Chi-square statistic |
split |
|
blcSplitCriterionBIC, blcSplitCriterionBICICL, blcSplitCriterionLevelWtdBIC, blcSplitCriterionLRT
Split criterion function: use a level-weighted version of BIC to determine split; there is an additional penalty incorporated for deep recursion.
blcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See blcTree for example of using “blcSplitCriterion...” to control split.
bic1 |
One-class BIC, with additional penalty for deeper levels |
bic2 |
Two-class BIC, with additional penalty for deeper levels |
split |
|
blcSplitCriterionBIC, blcSplitCriterionBICICL, blcSplitCriterionJustRecordEverything, blcSplitCriterionLRT
Split criterion function: Use likelihood ratio test p value to determine split.
blcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “blcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See blcTree for example of using “blcSplitCriterion...” to control split.
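A likelihood ratio split rule can be sketched along these lines. The function name `split_by_lrt`, the significance level `alpha`, and the degrees-of-freedom count (the difference in parameter counts between a two-class and a one-class model with two parameters per item) are assumptions for illustration, and the inputs are assumed to be log-likelihoods. This is not the package code.

```r
# Hypothetical sketch of a likelihood ratio test between one and two classes.
split_by_lrt <- function(llike1, llike2, J, alpha = 0.05) {
  degFreedom    <- 2 * J + 1                  # extra parameters in two-class model
  chiSquareStat <- 2 * (llike2 - llike1)      # likelihood ratio statistic
  pvalue <- pchisq(chiSquareStat, df = degFreedom, lower.tail = FALSE)
  list(degFreedom = degFreedom, chiSquareStat = chiSquareStat,
       pvalue = pvalue, split = pvalue < alpha)
}

lrt_split   <- split_by_lrt(llike1 = -1200, llike2 = -1100, J = 10)
lrt_nosplit <- split_by_lrt(llike1 = -1200, llike2 = -1195, J = 10)
```

Note that the chi-square reference distribution is only a rough guide when testing the number of mixture components, so the p-value is best read as a heuristic.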
llike1 |
Just returns |
llike2 |
Just returns |
J |
Just returns |
weight |
Just returns |
degFreedom |
Degrees-of-freedom for LRT |
chiSquareStat |
Chi-square statistic |
split |
|
blcSplitCriterionBIC, blcSplitCriterionBICICL, blcSplitCriterionJustRecordEverything, blcSplitCriterionLevelWtdBIC
Subsets a “blcTree” object, i.e. considers the tree whose root is a given node.
blcSubTree(tr, node)
tr |
“blcTree” object to subset |
node |
Name of node to make root. |
Typically not called by the user.
A “blcTree” object whose root is the given node of tr
Performs beta latent class modeling using recursively-partitioned mixture model
blcTree(x, initFunctions = list(blcInitializeSplitFanny()), weight = NULL, index = NULL, wthresh = 1e-08, nodename = "root", maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, env = NULL, unsplit = NULL, splitCriterion = blcSplitCriterionBIC)
x |
Data matrix (n x j) on which to perform clustering. Missing values are supported. All values should lie strictly between 0 and 1. |
initFunctions |
List of functions of type “blcInitialize...” for initializing latent class model. See |
weight |
Weight corresponding to the indices passed (see |
index |
Row indices of data matrix to include. Defaults to all (1 to n). |
wthresh |
Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. Default=1E-8. |
nodename |
Name of object that will represent node in tree data object. Defaults to “root”. USER SHOULD NOT SET THIS. |
maxlevel |
Maximum depth to recurse. Default=Inf. |
verbose |
Level of verbosity. Default=2 (too much). 0 for quiet. |
nthresh |
Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. Defaults to 5. |
level |
Current level. Defaults to 0. USER SHOULD NOT SET THIS. |
env |
Object of class “blcTree” to store tree data. Defaults to a new object. USER SHOULD NOT SET THIS. |
unsplit |
Latent class parameters from parent, to store in current node. Defaults to NULL for root. This is used in plotting functions. USER SHOULD NOT SET THIS. |
splitCriterion |
Function of type “blcSplitCriterion...” for determining whether a node should be split. See |
This function is called recursively by itself. Upon each recursion, certain arguments (e.g. nodename) are reset. Do not attempt to set these arguments yourself.
An object of class “blcTree”. This is an environment, each of whose component objects represents a node in the tree.
The class “blcTree” is currently implemented as an environment object with nodes represented flatly, with the name indicating position in the hierarchy (e.g. “rLLR” = “right child of left child of left child of root”). This implementation makes certain plotting and update functions simpler than they would be if the data were stored in a more natural “list of lists” format.
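The flat naming scheme can be decoded mechanically. The helper below, `describe_node` (a hypothetical name, not part of the package), assumes that the root node is called "root" and that every other node name is an "r" followed by one "L" or "R" per step down from the root, as the “rLLR” example suggests.

```r
# Hypothetical helper: turn a flat node name into a description of its position.
describe_node <- function(name) {
  if (name == "root") return("root")
  steps <- strsplit(substring(name, 2), "")[[1]]       # drop the leading "r"
  words <- c(L = "left", R = "right")[rev(steps)]      # deepest step first
  paste0(paste(words, "child of", collapse = " "), " root")
}

describe_node("rLLR")
describe_node("rR")
```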
The following error may appear during the course of the algorithm:
Error in optim(logab, betaObjf, ydata = y, wdata = w, weights = weights, : non-finite value supplied by optim
This is merely an indication that the node being split is too small, in which case the splitting will terminate at that node; in other words, it is nothing to worry about.
E. Andres Houseman
Houseman et al., Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9:365, 2008.
## Not run:
data(IlluminaMethylation)

heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

# Fit beta RPMM
rpmm <- blcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- blcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- blcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.blcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.blcTree(rpmm,
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- blcTree(IllumBeta, verbose=0,
  initFunctions=list(blcInitializeSplitEigen(),
                     blcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- blcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=blcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)
Recursively applies a function down the nodes of a beta RPMM tree.
blcTreeApply(tr, f, start = "root", terminalOnly = FALSE, asObject = TRUE, ...)
tr |
Tree object to recurse |
f |
Function to apply to every node |
start |
Starting node. Default = “root”. |
terminalOnly |
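Apply f only to terminal nodes (TRUE) or to all nodes (FALSE). |
asObject |
Whether f receives the node as an object (TRUE) or the node name together with the tree (FALSE). In the latter case, f should be defined as f <- function(nn,tree){...}. |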
... |
Additional arguments to pass to |
A list of results; names of elements are names of nodes.
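To make the "names of elements are names of nodes" convention concrete, the sketch below builds a toy environment with flatly named nodes and applies a function of the form `f(nn, tree)` to each of them, optionally restricted to terminal nodes. The toy tree, the helper names `is_terminal` and `apply_nodes`, and the child-naming rule (a node is terminal when no left child with its name plus "L" exists) are assumptions for illustration. The sketch iterates over names rather than recursing, so it is not the package implementation.

```r
# Toy tree stored flatly in an environment (hypothetical node contents).
tree <- new.env()
for (nm in c("root", "rL", "rR", "rLL", "rLR")) {
  assign(nm, list(name = nm), envir = tree)
}

# A node is treated as terminal if it has no left child in the environment.
is_terminal <- function(nn, tree) {
  base <- if (nn == "root") "r" else nn
  !exists(paste0(base, "L"), envir = tree, inherits = FALSE)
}

# Apply f(nn, tree) to every node, or only to terminal nodes.
apply_nodes <- function(tree, f, terminalOnly = FALSE) {
  nodes <- sort(ls(tree))
  if (terminalOnly) nodes <- Filter(function(nn) is_terminal(nn, tree), nodes)
  sapply(nodes, function(nn) f(nn, tree), simplify = FALSE)
}

get_name <- function(nn, tree) get(nn, envir = tree)$name
all_nodes <- apply_nodes(tree, get_name)
leaves    <- names(apply_nodes(tree, get_name, terminalOnly = TRUE))
```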
Gets a vector of posterior class membership assignments for terminal nodes.
blcTreeLeafClasses(tr)
tr |
Tree from which to create assignments. |
See blcTree for example.
Vector of class assignments
Gets a matrix of posterior class membership weights for terminal nodes.
blcTreeLeafMatrix(tr, rounding = 3)
tr |
Tree from which to create matrix. |
rounding |
Digits to round. |
See blcTree for example.
N x K matrix of posterior weights
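A posterior weight matrix of this shape can be reduced to hard class labels by taking, for each row, the column with the largest weight. The matrix below is made up for illustration, and treating the maximum-weight column as the class assignment is an assumption about how such labels are usually derived, not a statement about the package internals.

```r
# Illustrative N x K posterior weight matrix; columns are terminal nodes.
W <- rbind(c(0.90, 0.10, 0.00),
           c(0.05, 0.15, 0.80),
           c(0.20, 0.70, 0.10))
colnames(W) <- c("rLL", "rLR", "rR")

# Hard assignment: the terminal node with the largest posterior weight per row.
cls <- colnames(W)[max.col(W, ties.method = "first")]
```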
Computes the BIC for the latent class model represented by terminal nodes
blcTreeOverallBIC(tr, ICL = FALSE)
tr |
Tree object on which to compute BIC |
ICL |
Include ICL entropy term? |
BIC or BIC-ICL.
Empirical Bayes predictions for a specific RPMM model
ebayes(rpmm, x, type, nodelist=NULL)
rpmm |
RPMM object |
x |
Data matrix |
type |
RPMM type ("blc" or "glc") |
nodelist |
RPMM subnode to use (default = root) |
Typically not called by the user.
Matrix of empirical Bayes predictions corresponding to x.
Maximum likelihood estimator for Gaussian model on matrix of values (columns having different, independent Gaussian distributions)
gaussEstMultiple(Y, weights = NULL)
Y |
data matrix |
weights |
case weights |
A list of Gaussian parameters and BIC
Fits a Gaussian mixture model for any number of classes
glc(Y, w, maxiter = 100, tol = 1e-06, weights = NULL, verbose = TRUE)
Y |
Data matrix (n x j) on which to perform clustering |
w |
Initial weight matrix (n x k) representing classification |
maxiter |
Maximum number of EM iterations |
tol |
Convergence tolerance |
weights |
Case weights |
verbose |
Verbose output? |
Typically not called by the user.
A list of parameters representing mixture model fit, including posterior weights and log-likelihood
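The kind of EM fit described here can be sketched in base R for a mixture in which each class has independent Gaussian columns. The function name `gauss_mix_em`, the names of the returned list elements, the variance floor of 1e-12, and the assumption of complete data without case weights are all choices made for this example; the package code may differ in each of these respects.

```r
# Hypothetical EM sketch for a K-class mixture with independent Gaussian columns.
gauss_mix_em <- function(Y, w, maxiter = 100, tol = 1e-6) {
  n <- nrow(Y); K <- ncol(w)
  llike_old <- -Inf
  for (iter in seq_len(maxiter)) {
    # M-step: mixing proportions, weighted means, weighted standard deviations
    eta   <- colSums(w) / n
    mu    <- (t(w) %*% Y) / colSums(w)                   # K x J
    m2    <- (t(w) %*% Y^2) / colSums(w)                 # K x J second moments
    sigma <- sqrt(pmax(m2 - mu^2, 1e-12))                # floor avoids sd = 0
    # E-step: log of (mixing proportion * class density) for each row and class
    logd <- sapply(seq_len(K), function(k) {
      d <- dnorm(Y, mean = rep(mu[k, ], each = n),
                 sd = rep(sigma[k, ], each = n), log = TRUE)
      log(eta[k]) + rowSums(matrix(d, n, ncol(Y)))
    })
    m     <- apply(logd, 1, max)                         # for a stable log-sum-exp
    llike <- sum(m + log(rowSums(exp(logd - m))))
    w     <- exp(logd - m)
    w     <- w / rowSums(w)                              # posterior weights
    if (abs(llike - llike_old) < tol) break
    llike_old <- llike
  }
  list(w = w, mu = mu, sigma = sigma, eta = eta, llike = llike)
}

set.seed(2)
Y  <- rbind(matrix(rnorm(60, 0), 30, 2), matrix(rnorm(60, 4), 30, 2))
w0 <- cbind(rep(c(0.9, 0.1), each = 30), rep(c(0.1, 0.9), each = 30))
fit <- gauss_mix_em(Y, w0)
```

The number of classes is taken from the number of columns of the initial weight matrix, which mirrors the (n x k) shape of `w` in the argument list above.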
Creates a function for initializing latent class model based on Eigendecomposition
glcInitializeSplitEigen(eigendim = 1, assignmentf = function(s) (rank(s) - 0.5)/length(s))
eigendim |
How many eigenvalues to use |
assignmentf |
assignment function for transforming eigenvector to weight |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, the initialized classes will be based on eigendecomposition of the variance of x.
See glcTree for example of using “glcInitializeSplit...” to create starting values.
A function f(x) (see Details.)
glcInitializeSplitFanny, glcInitializeSplitHClust
Creates a function for initializing latent class model using the fanny algorithm
glcInitializeSplitFanny(nu = 2, nufac = 0.875, metric = "euclidean")
nu |
|
nufac |
Factor by which to multiply |
metric |
Metric to use for |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, the “fanny” algorithm will be used.
See glcTree for example of using “glcInitializeSplit...” to create starting values.
A function f(x) (see Details.)
glcInitializeSplitEigen, glcInitializeSplitHClust
Creates a function for initializing latent class model using hierarchical clustering.
glcInitializeSplitHClust(metric = "manhattan", method = "ward")
metric |
Dissimilarity metric used for hierarchical clustering |
method |
Linkage method used for hierarchical clustering |
Creates a function f(x) that will take a data matrix x and initialize a weight matrix for a two-class latent class model.
Here, a two-branch split from hierarchical clustering will be used.
See glcTree for example of using “glcInitializeSplit...” to create starting values.
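A two-branch split from hierarchical clustering can be turned into an initial weight matrix roughly as follows. The name `init_hclust_sketch` and the `fuzz` argument (used to avoid weights of exactly 0 or 1) are hypothetical and not part of the documented signature. The sketch uses `method = "ward.D"`, which is the name base R's `hclust` has used since R 3.1.0 for the linkage formerly called "ward".

```r
# Hypothetical sketch of a hierarchical-clustering initializer (not package code).
init_hclust_sketch <- function(metric = "manhattan", method = "ward.D", fuzz = 0.95) {
  function(x) {
    hc <- hclust(dist(x, method = metric), method = method)
    cl <- cutree(hc, k = 2)                        # two-branch split: labels 1 or 2
    w  <- matrix(1 - fuzz, nrow(x), 2)             # small weight on the other class
    w[cbind(seq_len(nrow(x)), cl)] <- fuzz
    w                                              # n x 2 initial weight matrix
  }
}

set.seed(5)
x  <- rbind(matrix(rnorm(20, 0, 0.1), 10, 2),
            matrix(rnorm(20, 5, 0.1), 10, 2))
w0 <- init_hclust_sketch()(x)
```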
A function f(x) (see Details.)
glcInitializeSplitEigen, glcInitializeSplitFanny
Splits a data set into two via a Gaussian mixture model
glcSplit(x, initFunctions, weight = NULL, index = NULL, level = 0, wthresh = 1e-09, verbose = TRUE, nthresh = 5, splitCriterion = glcSplitCriterionBIC)
x |
Data matrix (n x j) on which to perform clustering |
initFunctions |
List of functions of type “glcInitialize...” for initializing latent class model.
See |
weight |
Weight corresponding to the indices passed (see |
index |
Row indices of data matrix to include. Defaults to all (1 to n). |
level |
Current level. |
wthresh |
Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. |
verbose |
Level of verbosity. Default=2 (too much). 0 for quiet. |
nthresh |
Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. |
splitCriterion |
Function of type “glcSplitCriterion...” for determining whether split should occur.
See |
Should not be called by the user.
A list of objects representing split.
Split criterion function: compare BICs to determine split.
glcSplitCriterionBIC(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See glcTree for example of using “glcSplitCriterion...” to control split.
bic1 |
one-class (weighted) BIC |
bic2 |
two-class (weighted) BIC |
split |
|
glcSplitCriterionBICICL, glcSplitCriterionJustRecordEverything, glcSplitCriterionLevelWtdBIC, glcSplitCriterionLRT
Split criterion function: compare ICL-BICs to determine split (i.e. include entropy term in comparison).
glcSplitCriterionBICICL(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See glcTree for example of using “glcSplitCriterion...” to control split.
bic1 |
one-class (weighted) BIC |
bic2 |
two-class (weighted) BIC |
entropy |
two-class entropy |
split |
|
glcSplitCriterionBIC, glcSplitCriterionJustRecordEverything, glcSplitCriterionLevelWtdBIC, glcSplitCriterionLRT
Split criterion function: always split, but record everything as you go.
glcSplitCriterionJustRecordEverything(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
This function ALWAYS returns split=TRUE. Useful for gathering information.
It is recommended that you set the maxlevel argument in the main function to something less than infinity (say, 3 or 4).
See glcTree for example of using “glcSplitCriterion...” to control split.
llike1 |
Just returns |
llike2 |
Just returns |
J |
Just returns |
weight |
Just returns |
ww |
Just returns |
degFreedom |
Degrees-of-freedom for LRT |
chiSquareStat |
Chi-square statistic |
split |
|
glcSplitCriterionBIC, glcSplitCriterionBICICL, glcSplitCriterionLevelWtdBIC, glcSplitCriterionLRT
Split criterion function: use a level-weighted version of BIC to determine split; there is an additional penalty incorporated for deep recursion.
glcSplitCriterionLevelWtdBIC(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See glcTree for example of using “glcSplitCriterion...” to control split.
bic1 |
One-class BIC, with additional penalty for deeper levels |
bic2 |
Two-class BIC, with additional penalty for deeper levels |
split |
|
glcSplitCriterionBIC, glcSplitCriterionBICICL, glcSplitCriterionJustRecordEverything, glcSplitCriterionLRT
Split criterion function: use likelihood ratio test p value to determine split.
glcSplitCriterionLRT(llike1, llike2, weight, ww, J, level)
llike1 |
one-class likelihood. |
llike2 |
two-class likelihood. |
weight |
weights from RPMM node. |
ww |
“ww” from RPMM node. |
J |
Number of items. |
level |
Node level. |
This is a function of the form “glcSplitCriterion...”, which is required to return a list with at least a boolean value split, along with supporting information.
See glcTree for example of using “glcSplitCriterion...” to control split.
llike1 |
Just returns |
llike2 |
Just returns |
J |
Just returns |
weight |
Just returns |
degFreedom |
Degrees-of-freedom for LRT |
chiSquareStat |
Chi-square statistic |
split |
|
glcSplitCriterionBIC, glcSplitCriterionBICICL, glcSplitCriterionJustRecordEverything, glcSplitCriterionLevelWtdBIC
Subsets a “glcTree” object, i.e. considers the tree whose root is a given node.
glcSubTree(tr, node)
tr |
“glcTree” object to subset |
node |
Name of node to make root. |
Typically not called by the user.
A “glcTree” object whose root is the given node of tr
Performs Gaussian latent class modeling using recursively-partitioned mixture model
glcTree(x, initFunctions = list(glcInitializeSplitFanny(nu=1.5)), weight = NULL, index = NULL, wthresh = 1e-08, nodename = "root", maxlevel = Inf, verbose = 2, nthresh = 5, level = 0, env = NULL, unsplit = NULL, splitCriterion = glcSplitCriterionBIC)
x |
Data matrix (n x j) on which to perform clustering. Missing values are supported. |
initFunctions |
List of functions of type “glcInitialize...” for initializing latent class model. See |
weight |
Weight corresponding to the indices passed (see |
index |
Row indices of data matrix to include. Defaults to all (1 to n). |
wthresh |
Weight threshold for filtering data to children. Indices having weight less than this value will not be passed to children nodes. Default=1E-8. |
nodename |
Name of object that will represent node in tree data object. Defaults to “root”. USER SHOULD NOT SET THIS. |
maxlevel |
Maximum depth to recurse. Default=Inf. |
verbose |
Level of verbosity. Default=2 (too much). 0 for quiet. |
nthresh |
Total weight in node required for node to be a candidate for splitting. Nodes with weight less than this value will never split. Defaults to 5. |
level |
Current level. Defaults to 0. USER SHOULD NOT SET THIS. |
env |
Object of class “glcTree” to store tree data. Defaults to a new object. USER SHOULD NOT SET THIS. |
unsplit |
Latent class parameters from parent, to store in current node. Defaults to NULL for root. This is used in plotting functions. USER SHOULD NOT SET THIS. |
splitCriterion |
Function of type “glcSplitCriterion...” for determining whether a node should be split. See |
This function is called recursively by itself. Upon each recursion, certain arguments (e.g. nodename) are reset. Do not attempt to set these arguments yourself.
An object of class “glcTree”. This is an environment, each of whose component objects represents a node in the tree.
The class “glcTree” is currently implemented as an environment object with nodes represented flatly, with the name indicating position in the hierarchy (e.g. “rLLR” = “right child of left child of left child of root”). This implementation makes certain plotting and update functions simpler than they would be if the data were stored in a more natural “list of lists” format.
The following error may appear during the course of the algorithm:
Error in optim(logab, betaObjf, ydata = y, wdata = w, weights = weights, : non-finite value supplied by optim
This is merely an indication that the node being split is too small, in which case the splitting will terminate at that node; in other words, it is nothing to worry about.
E. Andres Houseman
Houseman et al., Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 9:365, 2008.
data(IlluminaMethylation)

## Not run:
heatmap(IllumBeta, scale="n",
  col=colorRampPalette(c("yellow","black","blue"),space="Lab")(128))

## End(Not run)

# Fit Gaussian RPMM
rpmm <- glcTree(IllumBeta, verbose=0)
rpmm

# Get weight matrix and show first few rows
rpmmWeightMatrix <- glcTreeLeafMatrix(rpmm)
rpmmWeightMatrix[1:3,]

# Get class assignments and compare with tissue
rpmmClass <- glcTreeLeafClasses(rpmm)
table(rpmmClass,tissue)

## Not run:
# Plot fit
par(mfrow=c(2,2))
plot(rpmm) ; title("Image of RPMM Profile")
plotTree.glcTree(rpmm) ; title("Dendrogram with Labels")
plotTree.glcTree(rpmm,
  labelFunction=function(u,digits) table(as.character(tissue[u$index])))
title("Dendrogram with Tissue Counts")

# Alternate initialization
rpmm2 <- glcTree(IllumBeta, verbose=0,
  initFunctions=list(glcInitializeSplitEigen(),
                     glcInitializeSplitFanny(nu=2.5)))
rpmm2

# Alternate split criterion
rpmm3 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionLevelWtdBIC)
rpmm3

rpmm4 <- glcTree(IllumBeta, verbose=0, maxlev=3,
  splitCriterion=glcSplitCriterionJustRecordEverything)
rpmm4$rLL$splitInfo$llike1
rpmm4$rLL$splitInfo$llike2

## End(Not run)
Recursively applies a function down the nodes of a Gaussian RPMM tree.
glcTreeApply(tr, f, start = "root", terminalOnly = FALSE, asObject = TRUE, ...)
tr |
Tree object to recurse |
f |
Function to apply to every node |
start |
Starting node. Default = “root”. |
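terminalOnly |
TRUE = apply f to terminal nodes only; FALSE = apply f to all nodes. |
asObject |
TRUE: f accepts the node as an object. FALSE: f accepts the node by node name and tree object, f(nn,tr). In the latter case, f should be defined as f <- function(nn,tree){...}. |
... |
Additional arguments to pass to f |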
A list of results; names of elements are names of nodes.
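The recursion pattern described above can be sketched on a toy tree in base R. This is an illustration only: the nested-list structure, the helper `toyTreeApply`, and the node values are hypothetical and do not reflect the internal layout of a glcTree object or the package's implementation.

```r
# Toy tree: each node is a list with a name, a value, and optional children.
# This structure is hypothetical and is not the internal layout of a glcTree.
leaf <- function(name, value) list(name = name, value = value, children = list())
toyTree <- list(
  name = "root", value = 10,
  children = list(
    list(name = "rL", value = 4,
         children = list(leaf("rLL", 1), leaf("rLR", 3))),
    leaf("rR", 6)
  )
)

# Apply f to a node, then recurse into its children, collecting the
# results in a flat list whose element names are the node names.
toyTreeApply <- function(node, f, terminalOnly = FALSE, ...) {
  isTerminal <- length(node$children) == 0
  out <- list()
  if (isTerminal || !terminalOnly) {
    out[[node$name]] <- f(node, ...)
  }
  for (child in node$children) {
    out <- c(out, toyTreeApply(child, f, terminalOnly = terminalOnly, ...))
  }
  out
}

allValues  <- toyTreeApply(toyTree, function(u) u$value)
leafValues <- toyTreeApply(toyTree, function(u) u$value, terminalOnly = TRUE)
names(allValues)
names(leafValues)
```

With terminalOnly = TRUE, only the leaves contribute elements to the returned list.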
Gets a vector of posterior class membership assignments for terminal nodes.
glcTreeLeafClasses(tr)
tr |
Tree from which to create assignments. |
See glcTree for example.
Vector of class assignments
Gets a matrix of posterior class membership weights for terminal nodes.
glcTreeLeafMatrix(tr, rounding = 3)
tr |
Tree from which to create matrix. |
rounding |
Digits to round. |
See glcTree for example.
N x K matrix of posterior weights
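To show how a matrix of posterior weights relates to a vector of class assignments, here is a minimal base R sketch. It assumes that each subject is assigned to the terminal node with the largest posterior weight; the package may handle ties or labels differently. The matrix `W` and its node names are made up for illustration.

```r
# Hypothetical 4 x 3 posterior weight matrix (rows = subjects, columns =
# terminal nodes); each row sums to 1. Node names are illustrative.
W <- matrix(c(0.90, 0.08, 0.02,
              0.10, 0.70, 0.20,
              0.05, 0.15, 0.80,
              0.40, 0.35, 0.25),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), c("rLL", "rLR", "rR")))

# Hard assignment: pick the column with the largest posterior weight.
hardClass <- factor(colnames(W)[apply(W, 1, which.max)], levels = colnames(W))
names(hardClass) <- rownames(W)

round(W, 1)
table(hardClass)
```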
Computes the BIC for the latent class model represented by terminal nodes
glcTreeOverallBIC(tr, ICL = FALSE)
tr |
Tree object on which to compute BIC |
ICL |
Include ICL entropy term? |
BIC or BIC-ICL.
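As a rough illustration of what an ICL entropy term adds to BIC, the sketch below uses the common "smaller is better" convention, BIC = -2 logLik + p log(n), plus twice the entropy of the posterior weights when ICL is requested. The function `toyBIC`, its arguments, and the sign and scaling conventions are assumptions for illustration; they are not taken from the package source.

```r
# Sketch: BIC with an optional ICL entropy penalty (smaller is better).
# llike: total log-likelihood; npar: number of free parameters;
# n: number of subjects; W: n x K posterior weight matrix.
toyBIC <- function(llike, npar, n, W = NULL, ICL = FALSE) {
  bic <- -2 * llike + npar * log(n)
  if (ICL) {
    # Entropy of the posterior weights; 0 * log(0) is treated as 0.
    ent <- -sum(ifelse(W > 0, W * log(W), 0))
    bic <- bic + 2 * ent
  }
  bic
}

# Two subjects are classified with certainty, one is split evenly.
W <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))
bicPlain <- toyBIC(llike = -12, npar = 4, n = 3)
bicICL   <- toyBIC(llike = -12, npar = 4, n = 3, W = W, ICL = TRUE)
c(BIC = bicPlain, ICL = bicICL)
```

Under these assumptions the entropy term is zero when every subject is assigned with certainty, so the penalty grows only with classification ambiguity.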
Wrapper for glm function to incorporate weights corresponding to latent classes
glmLC(y,W,family=quasibinomial(),eps=1E-8,Z=NULL)
y |
outcome |
W |
weight matrix (rows=cases, # rows = length of y) |
family |
glm family (default = quasibinomial for logistic regression) |
eps |
threshold below which to delete pseudo-subject corresponding to a specific weight |
Z |
matrix of additional covariates |
This function is a wrapper for glm to incorporate weights corresponding to latent classes (e.g. from an RPMM prediction)
a glm object
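One plausible reading of the pseudo-subject idea behind the eps argument can be sketched with base R's glm alone. Each subject is expanded into one pseudo-subject per latent class, weighted by its posterior weight, and pseudo-subjects with weight below eps are dropped. The simulated data, the column names, and the exact design are assumptions; the wrapper's actual construction may differ.

```r
set.seed(1)
n <- 60
# Hypothetical posterior weights for K = 2 latent classes (rows sum to 1).
p1 <- runif(n)
W  <- cbind(c1 = p1, c2 = 1 - p1)
# Binary outcome whose probability depends on class-1 membership.
y  <- rbinom(n, size = 1, prob = 0.2 + 0.6 * p1)

# Expand each subject into one pseudo-subject per class, weighted by the
# posterior weight; drop pseudo-subjects with weight below eps.
eps <- 1e-8
pseudo <- data.frame(
  y      = rep(y, times = ncol(W)),
  class  = factor(rep(colnames(W), each = n)),
  weight = as.vector(W)
)
pseudo <- pseudo[pseudo$weight > eps, ]

# Weighted logistic regression of the outcome on latent class.
fit <- glm(y ~ class, family = quasibinomial(), data = pseudo,
           weights = weight)
coef(fit)
```

The quasibinomial family is used here because the weights are not integers, which matches the default family shown in the usage above.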
Illumina GoldenGate DNA methylation data for 217 normal tissues. 100 most variable CpG sites.
IlluminaMethylation
a 217 x 100 matrix containing Illumina Avg Beta values (IllumBeta), and a corresponding factor vector of 217 tissue types (tissue).
Christensen BC, Houseman EA, et al. 2009 Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context. PLoS Genet 5(8): e1000602.
Data log-likelihood implied by a specific RPMM model
llikeRPMMObject(o, x, type)
o |
RPMM object |
x |
Data matrix |
type |
RPMM type ("blc" or "glc") |
Typically not called by the user.
Vector of log-likelihoods corresponding to rows of x.
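For intuition about a per-row log-likelihood under a Gaussian mixture, the following base R sketch computes it for a K-class model with independent columns, using log-sum-exp for stability. The function `toyMixLogLik` and its parametrization (mixing proportions `eta`, K x J matrices `mu` and `sigma`) are hypothetical and are not the structure of an RPMM object; a beta ("blc") analogue would use dbeta in place of dnorm.

```r
# Per-row log-likelihood of a K-class Gaussian mixture with independent
# columns. x: n x J data matrix; eta: length-K mixing proportions;
# mu, sigma: K x J matrices of class-specific means and standard deviations.
toyMixLogLik <- function(x, eta, mu, sigma) {
  K <- length(eta)
  J <- ncol(x)
  logComp <- matrix(NA_real_, nrow(x), K)
  for (k in seq_len(K)) {
    # t(x) is traversed row of x by row of x, so the length-J parameter
    # vectors recycle correctly across columns.
    ld <- dnorm(as.vector(t(x)), mean = mu[k, ], sd = sigma[k, ], log = TRUE)
    logComp[, k] <- log(eta[k]) + colSums(matrix(ld, nrow = J))
  }
  # Log-sum-exp across classes.
  m <- apply(logComp, 1, max)
  m + log(rowSums(exp(logComp - m)))
}

x <- matrix(c(0, 1, 2, 3), nrow = 2)
ll <- toyMixLogLik(x,
                   eta   = c(0.5, 0.5),
                   mu    = rbind(c(0, 2), c(1, 3)),
                   sigma = matrix(1, 2, 2))
ll
```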
Plot method for objects of type “blcTree”. Plots profiles of terminal nodes in color.
Method wrapper for plotImage.blcTree.
## S3 method for class 'blcTree'
plot(x,...)
x |
RPMM object to plot. |
... |
Additional arguments to pass to plotImage.blcTree |
See blcTree for example.
Plot method for objects of type “glcTree”. Plots profiles of terminal nodes in color.
Method wrapper for plotImage.glcTree.
## S3 method for class 'glcTree'
plot(x,...)
x |
RPMM object to plot. |
... |
Additional arguments to pass to plotImage.glcTree |
See glcTree for example.
Plots profiles of terminal nodes in color.
plotImage.blcTree(env, start = "r", method = "weight", palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR")
env |
RPMM object to plot. |
start |
Node to plot (usually root) |
method |
Method to determine width of columns that represent classes: “weight” (subject weight in class) or “binary” (depth in tree). |
palette |
Color palette to use for image plot. |
divcol |
Divider color |
xorder |
Order of variables. Can be useful for constant ordering across multiple plots. |
dimensions |
Subset of dimensions of source data to show. Defaults to all. Useful to show a subset of dimensions. |
labelType |
Label name type: “LR” or “01”. |
See blcTree for example.
Returns a vector of indices similar to the order function, representing the ordering of items used in the plot.
This is useful for replicating the order in another plot, or for axis labeling.
Plots profiles of terminal nodes in color.
plotImage.glcTree(env, start = "r", method = "weight", palette = colorRampPalette(c("yellow", "black", "blue"), space = "Lab")(128), divcol = "red", xorder = NULL, dimensions = NULL, labelType = "LR", muColorEps = 1e-08)
env |
RPMM object to plot. |
start |
Node to plot (usually root) |
method |
Method to determine width of columns that represent classes: “weight” (subject weight in class) or “binary” (depth in tree). |
palette |
Color palette to use for image plot. |
divcol |
Divider color |
xorder |
Order of variables. Can be useful for constant ordering across multiple plots. |
dimensions |
Subset of dimensions of source data to show. Defaults to all. Useful to show a subset of dimensions. |
labelType |
Label name type: “LR” or “01”. |
muColorEps |
Small value to stabilize color generation. |
See glcTree for example.
Returns a vector of indices similar to the order function, representing the ordering of items used in the plot.
This is useful for replicating the order in another plot, or for axis labeling.
Alternate plot function for objects of type blcTree: plots a dendrogram
plotTree.blcTree(env, start = "r", labelFunction = NULL, buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)
env |
Tree object to plot |
start |
Node from which to start. Default=“r” for “root”. |
labelFunction |
Function for generating node labels. Useful for labeling each node with a value. |
buff |
Buffer for placing tree in plot window. |
cex |
Text size |
square |
Square dendrogram or “V” shaped |
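labelAllNodes |
TRUE = label all nodes; FALSE = label terminal nodes only. |
labelDigits |
Digits to include in labels, if labelFunction returns numeric values. |
... |
Other parameters to be passed to labelFunction |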
This plots a dendrogram based on RPMM tree, with labels constructed from summaries of tree object.
See blcTree for example.
Alternate plot function for objects of type glcTree: plots a dendrogram
plotTree.glcTree(env, start = "r", labelFunction = NULL, buff = 4, cex = 0.9, square = TRUE, labelAllNodes = FALSE, labelDigits = 1, ...)
env |
Tree object to plot |
start |
Node from which to start. Default=“r” for “root”. |
labelFunction |
Function for generating node labels. Useful for labeling each node with a value. |
buff |
Buffer for placing tree in plot window. |
cex |
Text size |
square |
Square dendrogram or “V” shaped |
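labelAllNodes |
TRUE = label all nodes; FALSE = label terminal nodes only. |
labelDigits |
Digits to include in labels, if labelFunction returns numeric values. |
... |
Other parameters to be passed to labelFunction |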
This plots a dendrogram based on RPMM tree, with labels constructed from summaries of tree object.
See glcTree for example.
Prediction method for objects of type blcTree
## S3 method for class 'blcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)
object |
RPMM object to use for prediction |
newdata |
external data matrix from which to apply predictions |
nodelist |
RPMM subnode to use (default = root) |
type |
output type: "weight" produces output similar to blcTreeLeafMatrix |
... |
(Unused). |
This function is similar to blcTreeLeafMatrix and blcTreeLeafClasses, except that it supports prediction on an external data set via the argument newdata.
Prediction method for objects of type glcTree
## S3 method for class 'glcTree'
predict(object, newdata=NULL, nodelist=NULL, type="weight",...)
object |
RPMM object to use for prediction |
newdata |
external data matrix from which to apply predictions |
nodelist |
RPMM subnode to use (default = root) |
type |
output type: "weight" produces output similar to glcTreeLeafMatrix |
... |
(Unused). |
This function is similar to glcTreeLeafMatrix and glcTreeLeafClasses, except that it supports prediction on an external data set via the argument newdata.
Print method for objects of type blcTree
## S3 method for class 'blcTree'
print(x,...)
x |
RPMM object to print |
... |
(Unused). |
See blcTree for example.
Print method for objects of type glcTree
## S3 method for class 'glcTree'
print(x,...)
x |
RPMM object to print |
... |
(Unused). |
See glcTree for example.