| Title: | Subclone Multiplicity Allocation and Somatic Heterogeneity |
|---|---|
| Description: | Cluster user-supplied somatic read counts with corresponding allele-specific copy number and tumor purity to infer feasible underlying intra-tumor heterogeneity in terms of number of subclones, multiplicity, and allocation (Little et al. (2019) <doi:10.1186/s13073-019-0643-9>). |
| Authors: | Paul Little [aut, cre] |
| Maintainer: | Paul Little |
| License: | GPL (>= 3) |
| Version: | 1.0.0 |
| Built: | 2026-05-09 05:43:54 UTC |
| Source: | https://github.com/cran/SMASH |

`eS`: An R list containing subclone configurations in matrix form for 1 to 5 subclones. In each matrix, each column corresponds to a subclone and each row corresponds to one allocation of a variant across all subclones. For example, the first row of each matrix is a vector of 1's representing clonal variants, i.e., variants present in all subclones.

An object of class `list` of length 5.
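
As a rough illustration of what such configuration matrices can look like, the sketch below enumerates every non-empty 0/1 allocation of a variant across `K` subclones, with the clonal row of 1's first. The helper `enumerate_allocations()` and the object `toy_eS` are hypothetical; the packaged `eS` may contain a restricted set of rows (for example, only phylogenetically consistent allocations) or order them differently.

```r
# Hypothetical helper (not part of SMASH): enumerate every non-empty 0/1
# allocation of a variant across K subclones, clonal row (all 1's) first.
# The packaged `eS` may keep fewer rows or order them differently.
enumerate_allocations <- function(K) {
  stopifnot(K >= 1)
  # all 2^K binary vectors of length K, one per row
  alloc <- as.matrix(expand.grid(rep(list(0:1), K)))
  dimnames(alloc) <- NULL
  alloc <- alloc[rowSums(alloc) > 0, , drop = FALSE]  # drop the all-zero row
  # rows carried by more subclones come first; order() keeps ties stable
  alloc[order(-rowSums(alloc)), , drop = FALSE]
}

toy_eS <- lapply(1:5, enumerate_allocations)  # configurations for 1 to 5 subclones
toy_eS[[3]]
```

For `K = 3` this enumeration has 7 rows, starting with `c(1, 1, 1)`.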
Simulates observed alternate and reference read counts
`gen_ITH_RD(DATA, RD)`

| Argument | Description |
|---|---|
| `DATA` | The output data.frame from |
| `RD` | A positive integer giving the mean of the negative binomial distribution from which read depths are generated |

A matrix of simulated alternate and reference read counts.
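
A minimal sketch of this kind of simulation, assuming that each variant's total depth is drawn from a negative binomial distribution with mean `RD` and then split into alternate and reference reads by a binomial draw at the variant's mutant allele frequency. The function `simulate_read_counts()`, the dispersion `nb_size`, and the column names are assumptions, not necessarily what `gen_ITH_RD` does internally.

```r
# Hypothetical sketch (not the package's code): simulate alternate and reference
# read counts for variants with known mutant allele frequencies `maf`.
# Assumptions: total depth ~ negative binomial with mean RD and dispersion
# `nb_size`; alternate reads ~ binomial(depth, maf).
simulate_read_counts <- function(maf, RD, nb_size = 10) {
  depth <- rnbinom(length(maf), size = nb_size, mu = RD)  # per-variant depth
  alt <- rbinom(length(maf), size = depth, prob = maf)    # mutant-supporting reads
  cbind(alt = alt, ref = depth - alt)
}

set.seed(1)
counts <- simulate_read_counts(maf = c(0.5, 0.25, 0.1), RD = 500)
counts
```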
Simulates copy number states, multiplicities, and allocations
`gen_subj_truth(mat_eS, maxLOCI, nCN = NULL)`

| Argument | Description |
|---|---|
| `mat_eS` | A subclone configuration matrix pre-defined in the R list |
| `maxLOCI` | A positive integer, the number of simulated somatic variant calls |
| `nCN` | A positive integer for the number of allelic copy number pairings to sample from. If |

A list containing the following components:

| Component | Description |
|---|---|
| `subj_truth` | Data frame of each variant's simulated minor (`CN_1`) and major (`CN_2`) copy number states, total copy number (`tCN`), subclone allocation (`true_A`), multiplicity (`true_M`), mutant allele frequency (`true_MAF`), and cellular prevalence (`true_CP`) |
| `purity` | Tumor purity |
| `eta` | The product of tumor purity and subclone proportions |
| `q` | Vector of subclone proportions |

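
The truth components relate to one another through a purity and copy number adjustment. The sketch below assumes the standard formulation in which normal cells are diploid, a variant with multiplicity `M` and 0/1 allocation vector `A` has expected mutant allele frequency `M * sum(eta * A) / (purity * tCN + 2 * (1 - purity))` with `eta = purity * q`, and cellular prevalence is `sum(q * A)` among cancer cells. The helper `expected_maf()` is hypothetical, and the package's exact definitions may differ.

```r
# Hypothetical helper: standard purity/copy-number adjustment (assumption),
# with normal cells assumed diploid. A is a 0/1 allocation across subclones.
expected_maf <- function(purity, q, A, M, tCN) {
  eta <- purity * q                        # subclone proportions scaled by purity
  CP  <- sum(q * A)                        # assumed: prevalence among cancer cells
  MAF <- M * sum(eta * A) / (purity * tCN + 2 * (1 - purity))
  c(MAF = MAF, CP = CP)
}

# clonal variant (present in both subclones), one mutant copy, total CN 2
expected_maf(purity = 0.8, q = c(0.6, 0.4), A = c(1, 1), M = 1, tCN = 2)
# MAF = 0.4, CP = 1
# subclonal variant present only in the second subclone
expected_maf(purity = 0.8, q = c(0.6, 0.4), A = c(0, 1), M = 1, tCN = 2)
# MAF = 0.16, CP = 0.4
```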
Performs a grid search over the enumerated configurations within the pre-defined list `eS`
`grid_ITH_optim(my_data, my_purity, list_eS, pi_eps0 = NULL, trials = 20, max_iter = 4000, my_epsilon = 1e-06)`

| Argument | Description |
|---|---|
| `my_data` | An R data frame containing the following columns: |
| `my_purity` | A single numeric value of known or estimated tumor purity |
| `list_eS` | A nested list of subclone configuration matrices |
| `pi_eps0` | A user-specified parameter denoting the proportion of loci not explained by the combinations of purity, copy number, multiplicity, and allocation. If |
| `trials` | Positive integer, the number of random initializations of subclone proportions |
| `max_iter` | Positive integer, preferably 1000 or more, setting the maximum number of iterations |
| `my_epsilon` | Convergence threshold for changes in the log likelihood, preferably 1e-6 or smaller |

An R list containing two objects: `GRID`, a data frame in which each row denotes a feasible subclone configuration with corresponding subclone proportion estimates `q` and somatic variant allocations `alloc`; and `INFER`, a list in which `INFER[[i]]` corresponds to the i-th row, or model, of `GRID`.
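
The `trials` argument implies several random starting points for the subclone proportions, and the `unc_q0` and `unc_q` outputs of `ITH_optim` suggest that `q` is optimized on an unconstrained scale. A minimal sketch of that pattern follows, assuming a multinomial-logit (softmax) map with the last subclone as reference and a toy objective in place of SMASH's likelihood; `unc_to_q()`, `best_of_trials()`, and `toy_LL()` are hypothetical.

```r
# Hypothetical sketch: optimize subclone proportions on an unconstrained scale
# with several random starts. The softmax map and toy objective are assumptions.
unc_to_q <- function(unc_q) {
  z <- c(unc_q, 0)                 # last subclone is the reference category
  e <- exp(z - max(z))             # subtract the max for numerical stability
  e / sum(e)                       # positive proportions that sum to 1
}

best_of_trials <- function(objective, K, trials = 20) {
  stopifnot(K >= 2)
  best <- NULL
  for (t in seq_len(trials)) {
    unc_q0 <- rnorm(K - 1)         # random initialization
    fit <- optim(unc_q0, function(u) objective(unc_to_q(u)),
                 method = "BFGS", control = list(fnscale = -1))  # maximize
    if (is.null(best) || fit$value > best$value) {
      best <- list(unc_q0 = unc_q0, unc_q = fit$par,
                   q = unc_to_q(fit$par), value = fit$value)
    }
  }
  best
}

# toy objective (not SMASH's likelihood), maximized at q = (0.5, 0.3, 0.2)
toy_LL <- function(q) sum(c(5, 3, 2) * log(q))
set.seed(2)
res <- best_of_trials(toy_LL, K = 3, trials = 5)
round(res$q, 3)   # approximately 0.5 0.3 0.2
```

Working on the unconstrained scale lets a general-purpose optimizer such as `optim()` search without explicit constraints while every candidate `q` stays positive and sums to 1.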
Performs the EM algorithm for a given subclone configuration matrix
`ITH_optim(my_data, my_purity, init_eS, pi_eps0 = NULL, my_unc_q = NULL, max_iter = 4000, my_epsilon = 1e-06)`

| Argument | Description |
|---|---|
| `my_data` | An R data frame containing the following columns: |
| `my_purity` | A single numeric value of known or estimated tumor purity |
| `init_eS` | A subclone configuration matrix pre-defined in the R list |
| `pi_eps0` | A user-specified parameter denoting the proportion of loci not explained by the combinations of purity, copy number, multiplicity, and allocation. If |
| `my_unc_q` | An optimal initial vector for the unconstrained |
| `max_iter` | Positive integer, preferably 1000 or more, setting the maximum number of iterations |
| `my_epsilon` | Convergence threshold for changes in the log likelihood, preferably 1e-6 or smaller |

If the EM algorithm converges, the output is a list containing:

| Component | Description |
|---|---|
| `iter` | Number of iterations |
| `converge` | Convergence status |
| `unc_q0` | Initial unconstrained subclone proportion parameters |
| `unc_q` | Unconstrained estimate of `q` |
| `q` | Estimated subclone proportions among cancer cells |
| `CN_MA_pi` | Estimated mixture probabilities of multiplicities and allocations given copy number states |
| `eta` | Estimated subclone proportions among tumor cells |
| `purity` | User-supplied tumor purity |
| `entropy` | Estimated entropy |
| `infer` | An R data frame containing inferred variant allocations (`infer_A`), multiplicities (`infer_M`), and cellular prevalences (`infer_CP`) |
| `ms` | Model size, i.e., the number of parameters in the parameter space |
| `LL` | The observed log likelihood evaluated at the maximum likelihood estimates |
| `AIC` | Negative AIC, `2 * LL - 2 * ms`, used for model selection (higher is better) |
| `BIC` | Negative BIC, `2 * LL - ms * log(LOCI)`, used for model selection (higher is better) |
| `LOCI` | The number of input somatic variants |

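
To make the EM and model-selection outputs more concrete, here is a simplified sketch. It assumes the candidate mutant allele frequencies are already fixed (for example, one per multiplicity/allocation combination at given `q`, purity, and copy number) and estimates only the mixture weights, with a fixed fraction `pi_eps0` of loci assigned to a noise component whose allele frequency has a uniform prior, giving a binomial marginal of `1 / (depth + 1)`. This is not SMASH's implementation, which also estimates `q`; `em_mixture()` is hypothetical, and its model size counts only the free mixture weights. The negative `AIC` and `BIC` follow the definitions listed for `ITH_optim`.

```r
# Simplified, hypothetical EM: mixture weights over FIXED candidate MAFs plus a
# noise component with fixed weight pi_eps0. SMASH's own EM also updates q.
em_mixture <- function(alt, depth, cand_maf, pi_eps0 = 0.01,
                       max_iter = 4000, my_epsilon = 1e-6) {
  J <- length(cand_maf)
  w <- rep(1 / J, J)                           # start from equal weights
  # n x J binomial likelihoods, one column per candidate MAF
  lik <- outer(seq_along(alt), seq_len(J),
               function(i, j) dbinom(alt[i], depth[i], cand_maf[j]))
  noise <- 1 / (depth + 1)                     # uniform prior on the MAF
  LL_old <- -Inf
  converge <- FALSE
  for (iter in seq_len(max_iter)) {
    mix <- (1 - pi_eps0) * sweep(lik, 2, w, "*")
    dens <- rowSums(mix) + pi_eps0 * noise
    LL <- sum(log(dens))                       # log likelihood at current weights
    resp <- mix / dens                         # E-step: responsibilities
    w <- colSums(resp) / sum(resp)             # M-step: update weights
    if (abs(LL - LL_old) < my_epsilon) {
      converge <- TRUE
      break
    }
    LL_old <- LL
  }
  ms <- J - 1                                  # free mixture weights only
  LOCI <- length(alt)
  list(iter = iter, converge = converge, pi = w, LL = LL, ms = ms,
       AIC = 2 * LL - 2 * ms, BIC = 2 * LL - ms * log(LOCI), LOCI = LOCI)
}

set.seed(3)
depth <- rep(200, 300)
true_maf <- sample(c(0.4, 0.16), 300, replace = TRUE, prob = c(0.7, 0.3))
alt <- rbinom(300, depth, true_maf)
fit <- em_mixture(alt, depth, cand_maf = c(0.4, 0.16))
round(fit$pi, 2)   # close to 0.7 and 0.3
```

Because both criteria are negated, the candidate model with the largest `AIC` or `BIC` value is the preferred one.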
A simple visualization of SMASH's grid of solutions
`vis_GRID(GRID)`

| Argument | Description |
|---|---|
| `GRID` | The |

A ggplot object for data visualization