Title: | Fused Lasso Latent Feature Model |
---|---|
Description: | Fits the Fused Lasso Latent Feature model, which is used for modeling multi-sample aCGH data to identify regions of copy number variation (CNV). Produces a set of features that describe the patterns of CNV and a set of weights that describe the composition of each sample. Also provides functions for choosing the optimal tuning parameters and the appropriate number of features, and for estimating the false discovery rate. |
Authors: | Gen Nowak [aut, cre], Trevor Hastie [aut], Jonathan R. Pollack [aut], Robert Tibshirani [aut], Nicholas Johnson [aut] |
Maintainer: | Gen Nowak <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2-1 |
Built: | 2024-12-02 06:33:57 UTC |
Source: | CRAN |
Fits the Fused Lasso Latent Feature (FLLat) model for given values of $J$ (the number of features), and $\lambda_1$ and $\lambda_2$ (the two fused lasso tuning parameters).
FLLat(Y, J=min(15,floor(ncol(Y)/2)), B="pc", lam1, lam2, thresh=10^(-4), maxiter=100, maxiter.B=1, maxiter.T=1)
Y |
A matrix of data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples. |
J |
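The number of features in the FLLat model. The default is the smaller of either 15 or the number of samples divided by 2 (rounded down). |
B |
The initial values for the features. Can be one of "pc" (the first J principal components of Y), "rand" (a random selection of J columns of Y), or a user-specified matrix of initial values, where rows correspond to the probes and columns correspond to the features. The default is "pc". |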
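lam1 |
The tuning parameter $\lambda_1$ in the fused lasso penalty that controls the level of sparsity in the features. |
lam2 |
The tuning parameter $\lambda_2$ in the fused lasso penalty that controls the level of smoothness in the features. |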
thresh |
The threshold for determining when the solutions have converged. The default is $10^{-4}$. |
maxiter |
The maximum number of iterations for the outer loop of the algorithm. The default is 100. |
maxiter.B |
The maximum number of iterations for the inner loop of the algorithm for estimating the features $B$. The default is 1. |
maxiter.T |
The maximum number of iterations for the inner loop of the algorithm for estimating the weights $\Theta$. The default is 1. |
This function fits the Fused Lasso Latent Feature model to multi-sample aCGH data, as described in Nowak and others (2011), for given values of $J$, $\lambda_1$ and $\lambda_2$. Given aCGH data consisting of $S$ samples and $L$ probes, the model is given by:
$$Y = B\Theta$$
where $Y$ is an $L$-by-$S$ matrix denoting the aCGH data (with samples in columns), $B$ is an $L$-by-$J$ matrix denoting the features (with features in columns), and $\Theta$ is a $J$-by-$S$ matrix denoting the weights. Each feature describes a pattern of copy number variation and the weights describe the composition of each sample. Specifically, each sample (column of $Y$) is modeled as a weighted sum of the features (columns of $B$), with the weights given by the corresponding column of $\Theta$.
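As a rough illustration of the model structure, the following base R sketch builds a toy data matrix as features times weights plus noise. The dimensions, the segment positions and the noise level are all hypothetical choices made for the example; none of them come from the package.

```r
# Hypothetical toy dimensions: L probes, S samples, J features.
set.seed(1)
L <- 100; S <- 8; J <- 2

# Piecewise-constant features (columns of B), each with one altered segment.
B <- matrix(0, nrow = L, ncol = J)
B[21:40, 1] <- 1     # a gain spanning probes 21-40
B[61:90, 2] <- -1    # a loss spanning probes 61-90

# Weights: one row per feature, one column per sample.
Theta <- matrix(runif(J * S, -1, 1), nrow = J, ncol = S)

# Rescale any row whose L2 norm exceeds 1 so the row-norm constraint holds.
row.norms <- sqrt(rowSums(Theta^2))
Theta <- Theta / pmax(row.norms, 1)

# Each sample (column of Y) is a weighted sum of the features, plus noise.
Y <- B %*% Theta + matrix(rnorm(L * S, sd = 0.1), nrow = L, ncol = S)

# The noiseless first sample is B[,1]*Theta[1,1] + B[,2]*Theta[2,1].
first.sample <- B[, 1] * Theta[1, 1] + B[, 2] * Theta[2, 1]
```

A matrix like `Y` has the same layout (probes in rows, samples in columns) that the `Y` argument expects.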
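The model is fitted by minimizing a penalized version of the residual sum of squares (RSS):
$$RSS + \sum_{j=1}^{J} PEN_j = \|Y - B\Theta\|_F^2 + \sum_{j=1}^{J} PEN_j$$
where the penalty is given by:
$$PEN_j = \lambda_1 \sum_{l=1}^{L} |\beta_{lj}| + \lambda_2 \sum_{l=2}^{L} |\beta_{lj} - \beta_{l-1,j}|.$$
Here $\beta_{lj}$ denotes the $(l,j)$th element of $B$. We also constrain the $L_2$ norm of each row of $\Theta$ to be less than or equal to $1$.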
For more details, please see Nowak and others (2011) and the package vignette.
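The penalized criterion can be evaluated directly for any candidate features and weights. The sketch below is a minimal base R version, assuming the penalty for each feature is `lam1` times the sum of absolute values plus `lam2` times the sum of absolute successive differences. The helper names `fl.penalty` and `fllat.objective` are hypothetical and are not part of the package.

```r
# Fused lasso penalty for a single feature (one column of B).
fl.penalty <- function(beta, lam1, lam2) {
  lam1 * sum(abs(beta)) + lam2 * sum(abs(diff(beta)))
}

# Penalized objective: RSS plus the penalty summed over all features.
fllat.objective <- function(Y, B, Theta, lam1, lam2) {
  rss <- sum((Y - B %*% Theta)^2)
  pen <- sum(apply(B, 2, fl.penalty, lam1 = lam1, lam2 = lam2))
  rss + pen
}

# Toy check: three non-zero entries and two jumps.
beta <- c(0, 0, 1, 1, 1, 0)
fl.penalty(beta, lam1 = 1, lam2 = 9)   # 1*3 + 9*2 = 21
```

The first term rewards features that are mostly zero, and the second rewards features that are piecewise constant, which matches the expected shape of copy number profiles.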
An object of class FLLat with components:
Beta |
The estimated features $\hat{B}$. |
Theta |
The estimated weights $\hat{\Theta}$. |
niter |
The number of iterations taken by the algorithm (outer loop). |
rss |
The residual sum of squares of the fitted model. |
bic |
The BIC for the fitted model. See FLLat.BIC for more details. |
lam1 |
The value of $\lambda_1$ used in the model. |
lam2 |
The value of $\lambda_2$ used in the model. |
There is a plot method and a predict method for FLLat objects.
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
plot.FLLat, predict.FLLat, FLLat.BIC, FLLat.PVE, FLLat.FDR
## Load simulated aCGH data.
data(simaCGH)
## Run FLLat for J = 5, lam1 = 1 and lam2 = 9.
result <- FLLat(simaCGH,J=5,lam1=1,lam2=9)
## Plot the estimated features.
plot(result)
## Plot a heatmap of the estimated weights.
plot(result,type="weights")
Returns the optimal values of the fused lasso tuning parameters for the Fused Lasso Latent Feature (FLLat) model by minimizing the BIC. Also returns the fitted FLLat model for the optimal values of the tuning parameters.
FLLat.BIC(Y, J=min(15,floor(ncol(Y)/2)), B="pc", thresh=10^(-4), maxiter=100, maxiter.B=1, maxiter.T=1)
Y |
A matrix of data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples. |
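J |
The number of features in the FLLat model. The default is the smaller of either 15 or the number of samples divided by 2 (rounded down). |
B |
The initial values for the features. Can be one of "pc" (the first J principal components of Y), "rand" (a random selection of J columns of Y), or a user-specified matrix of initial values, where rows correspond to the probes and columns correspond to the features. The default is "pc". |
thresh |
The threshold for determining when the solutions have converged. The default is $10^{-4}$. |
maxiter |
The maximum number of iterations for the outer loop of the FLLat algorithm. The default is 100. |
maxiter.B |
The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the features $B$. The default is 1. |
maxiter.T |
The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the weights $\Theta$. The default is 1. |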
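This function returns the optimal values of the fused lasso tuning parameters, $\lambda_1$ and $\lambda_2$, for the FLLat model. The optimal values are chosen by first re-parameterizing $\lambda_1$ and $\lambda_2$ in terms of $\lambda_0$ and a proportion $\alpha$ such that $\lambda_1 = \alpha\lambda_0$ and $\lambda_2 = (1 - \alpha)\lambda_0$. The values of $\alpha$ are fixed to be $\{0.1, 0.3, 0.5, 0.7, 0.9\}$ and for each value of $\alpha$ we consider a range of $\lambda_0$ values. The optimal values of $\lambda_0$ and $\alpha$ (and consequently $\lambda_1$ and $\lambda_2$) are chosen by minimizing the following BIC-type criterion over this two-dimensional grid:
$$(SL) \times \log\left(\frac{RSS}{SL}\right) + k_{\alpha,\lambda_0} \log(SL)$$
where $S$ is the number of samples, $L$ is the number of probes, $RSS$ denotes the residual sum of squares and $k_{\alpha,\lambda_0}$ denotes the sum over all the features of the number of unique non-zero elements in each estimated feature.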
Note that for extremely large data sets, this function may take some time to run.
For more details, please see Nowak and others (2011) and the package vignette.
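The re-parameterization and the criterion can be sketched in base R as follows. This assumes the criterion has the form SL * log(RSS / (SL)) + k * log(SL), with k counting the unique non-zero values in each feature. The helper names `reparam`, `count.k` and `bic.crit`, and the rounding used to decide which values count as distinct, are hypothetical and not taken from the package.

```r
# Map (lam0, alpha) to the two fused lasso tuning parameters.
reparam <- function(lam0, alpha) {
  c(lam1 = alpha * lam0, lam2 = (1 - alpha) * lam0)
}

# Number of unique non-zero values, summed over features (columns of B).
# Rounding to 8 digits is a hypothetical guard against floating point noise.
count.k <- function(B, digits = 8) {
  sum(apply(B, 2, function(b) {
    b <- round(b, digits)
    length(unique(b[b != 0]))
  }))
}

# BIC-type criterion for given data, features and weights.
bic.crit <- function(Y, B, Theta) {
  n <- length(Y)                       # S * L
  rss <- sum((Y - B %*% Theta)^2)
  n * log(rss / n) + count.k(B) * log(n)
}

reparam(lam0 = 10, alpha = 0.1)        # lam1 = 1, lam2 = 9
```

In a grid search, a criterion like `bic.crit` would be evaluated on the fit obtained for each (alpha, lam0) pair, and the pair with the smallest value would be kept.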
A list with components:
lam0 |
The optimal value of $\lambda_0$. |
alpha |
The optimal value of $\alpha$. |
lam1 |
The optimal value of $\lambda_1$. |
lam2 |
The optimal value of $\lambda_2$. |
opt.FLLat |
The fitted FLLat model for the optimal values of the tuning parameters. |
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
## Load simulated aCGH data.
data(simaCGH)
## Run FLLat.BIC to choose optimal tuning parameters for J = 5 features.
result.bic <- FLLat.BIC(simaCGH,J=5)
## Plot the features for the optimal FLLat model.
plot(result.bic$opt.FLLat)
## Plot a heatmap of the weights for the optimal FLLat model.
plot(result.bic$opt.FLLat,type="weights")
Estimates the false discovery rate (FDR) over a range of threshold values for a fitted Fused Lasso Latent Feature (FLLat) model. Also plots the FDRs against the threshold values.
FLLat.FDR(Y, Y.FLLat, n.thresh=50, fdr.control=0.05, pi0=1, n.perms=20)
## S3 method for class 'FDR'
plot(x, xlab="Threshold", ylab="FDR", ...)
Y |
A matrix of data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples. |
Y.FLLat |
A FLLat model fitted to Y. That is, an object of class FLLat, as returned by FLLat. |
n.thresh |
The number of threshold values at which to estimate the FDR. The default is 50. |
fdr.control |
A value at which to control the FDR. The function will return the smallest threshold value which controls the FDR at the specified value. The default is 0.05. |
pi0 |
The proportion of true null hypotheses. For probe location $l$ in sample $s$, the null hypothesis $H_0(l,s)$ states that there is no copy number variation at that location. The default is 1. |
n.perms |
The number of permutations of the aCGH data used in estimating the FDRs. The default is 20. |
x |
An object of class FDR, as returned by FLLat.FDR. |
xlab |
The title for the x axis. |
ylab |
The title for the y axis. |
... |
Further graphical parameters. |
Identifying regions of copy number variation (CNV) in aCGH data can be viewed in a multiple-testing framework. For each probe location $l$ within sample $s$, we are essentially testing the hypothesis $H_0(l,s)$ that there is no CNV at that location. The decision to reject each hypothesis can be based on the fitted values $\hat{Y} = \hat{B}\hat{\Theta}$ produced by the FLLat model. Specifically, for a given threshold value $T$, we can declare location $(l,s)$ as exhibiting CNV if $|\hat{y}_{ls}| \ge T$. The FDR is then defined to be the expected proportion of declared CNVs which are not true CNVs.
The FDR for a fitted FLLat model is estimated in the following manner. Firstly, n.thresh threshold values are chosen, equally spaced between $0$ and the largest absolute fitted value over all locations $(l,s)$. Then, for each threshold value, the estimated FDR is equal to
$$\widehat{FDR} = \frac{\pi_0 \times V_0}{R}$$
where:
The quantity $R$ is the number of declared CNVs calculated from the fitted FLLat model, as described above.
The quantity $V_0$ is the number of declared CNVs calculated from re-fitting the FLLat model to permuted versions of the data $Y$. In each permuted data set, the probe locations within each sample are permuted to approximate the null distribution of the data.
The quantity $\pi_0$ is the proportion of true null hypotheses. The default value of $1$ will result in conservative estimates of the FDR. If warranted, smaller values of $\pi_0$ can be specified.
For more details, please see Nowak and others (2011) and the package vignette.
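The thresholding step of the FDR estimate can be sketched in base R. This is only an outline under stated assumptions: the two matrices of fitted values are random stand-ins rather than real FLLat fits, a single permuted data set is used where the function uses several, the re-fitting step is omitted, and capping the estimate at 1 is an added convenience rather than documented behaviour.

```r
set.seed(1)  # permutations are random, so fix the seed for reproducibility

# Permuting probe locations within each sample (column) of a toy data matrix.
Y <- matrix(rnorm(200), nrow = 50)
Y.perm <- apply(Y, 2, sample)

# Hypothetical stand-ins for fitted values. In practice these would come from
# fitting the model to the observed data and to the permuted data.
Yhat      <- matrix(rnorm(200, sd = 1),   nrow = 50)
Yhat.perm <- matrix(rnorm(200, sd = 0.3), nrow = 50)

n.thresh <- 50
pi0 <- 1
fdr.control <- 0.05

# Thresholds equally spaced between 0 and the largest absolute fitted value.
thresh.vals <- seq(0, max(abs(Yhat)), length.out = n.thresh)

# Declared CNVs in the observed fit (R) and in the permuted fit (V0).
R  <- sapply(thresh.vals, function(t) sum(abs(Yhat) >= t))
V0 <- sapply(thresh.vals, function(t) sum(abs(Yhat.perm) >= t))

# Estimated FDR at each threshold (capped at 1 as an assumption).
FDRs <- pmin(1, pi0 * V0 / R)

# Smallest threshold whose estimated FDR is at or below fdr.control, if any.
ok <- FDRs <= fdr.control
thresh.control <- if (any(ok)) min(thresh.vals[ok]) else NA
```

Because the largest threshold equals the largest absolute fitted value, `R` is at least 1 at every threshold, so the ratio is always defined.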
An object of class FDR with components:
thresh.vals |
The threshold values for which each FDR was estimated. |
FDRs |
The estimated FDR for each value of thresh.vals. |
thresh.control |
The smallest threshold value which controls the estimated FDR at fdr.control. |
There is a plot method for FDR objects.
Because the permutations are random, set the random seed with set.seed before running FLLat.FDR to make the results reproducible.
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
## Load simulated aCGH data.
data(simaCGH)
## Run FLLat for J = 5, lam1 = 1 and lam2 = 9.
result <- FLLat(simaCGH,J=5,lam1=1,lam2=9)
## Estimate the FDRs.
result.fdr <- FLLat.FDR(simaCGH,result)
## Plotting the FDRs against the threshold values.
plot(result.fdr)
## The threshold value which controls the FDR at 0.05.
result.fdr$thresh.control
Calculates the percentage of variation explained (PVE) for a range of values of $J$ (the number of features) for the Fused Lasso Latent Feature (FLLat) model. Also plots the PVE against $J$, which can be used for choosing the value of $J$.
FLLat.PVE(Y, J.seq=seq(1,min(15,floor(ncol(Y)/2)),by=2), B=c("pc","rand"), lams=c("same","diff"), thresh=10^(-4), maxiter=100, maxiter.B=1, maxiter.T=1)
## S3 method for class 'PVE'
plot(x, xlab="Number of Features", ylab="PVE", ...)
Y |
A matrix of data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples. |
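J.seq |
A vector of values of $J$ for which to calculate the PVE. The default values are every second integer between 1 and the smaller of either 15 or the number of samples divided by 2 (rounded down). |
B |
The initial values for the features to use in the FLLat algorithm for each value of $J$. Can be one of "pc" (the first J principal components of Y) or "rand" (a random selection of J columns of Y). The default is "pc". |
lams |
The choice of whether to use the same values of the tuning parameters in the FLLat algorithm for each value of $J$ ("same") or to calculate the optimal tuning parameters for each value of $J$ ("diff"). The default is "same". |
thresh |
The threshold for determining when the solutions have converged in the FLLat algorithm. The default is $10^{-4}$. |
maxiter |
The maximum number of iterations for the outer loop of the FLLat algorithm. The default is 100. |
maxiter.B |
The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the features $B$. The default is 1. |
maxiter.T |
The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the weights $\Theta$. The default is 1. |
x |
An object of class PVE, as returned by FLLat.PVE. |
xlab |
The title for the x axis. |
ylab |
The title for the y axis. |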
... |
Further graphical parameters. |
This function calculates the PVE for each value of $J$ as specified by J.seq. The PVE is defined to be:
$$PVE_J = 1 - \frac{RSS}{TSS}$$
where RSS and TSS denote the residual sum of squares and the total sum of squares, respectively. For each value of $J$, the PVE is calculated by fitting the FLLat model with that value of $J$.
There are two choices for how the tuning parameters are chosen when fitting the FLLat model for each value of $J$. The first choice, given by lams="same", applies the FLLat.BIC function just once for the default value of $J$. The resulting optimal tuning parameters are then used for all values of $J$ in J.seq. The second choice, given by lams="diff", applies the FLLat.BIC function for each value of $J$ in J.seq. Although this second choice will give a more accurate measure of the PVE, it will take much longer to run than the first choice.
When the PVE is plotted against $J$, as $J$ increases the PVE will begin to plateau after a certain point, indicating that additional features are not improving the model. Therefore, the value of $J$ to use in the FLLat algorithm can be chosen as the point at which the PVE plot begins to plateau.
For more details, please see Nowak and others (2011) and the package vignette.
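The plateau behaviour can be illustrated without fitting the model itself. The base R sketch below computes PVE as 1 - RSS/TSS for toy data generated from two features, using a truncated SVD as a stand-in for the FLLat fit. The SVD substitution, the helper name `pve`, and the choice to compute TSS about each sample's mean are assumptions made for the example.

```r
# PVE for given data, features and weights.
pve <- function(Y, B, Theta) {
  rss <- sum((Y - B %*% Theta)^2)
  # TSS about each sample's mean (an assumption about how TSS is centered).
  tss <- sum(sweep(Y, 2, colMeans(Y))^2)
  1 - rss / tss
}

# Toy data generated from two piecewise-constant features plus noise.
set.seed(1)
L <- 60; S <- 10
B0 <- cbind(rep(c(0, 1, 0), each = 20), rep(c(-1, 0, 0), each = 20))
Y <- B0 %*% matrix(runif(2 * S, -1, 1), 2, S) +
  matrix(rnorm(L * S, sd = 0.05), L, S)

# Stand-in for the model fit: the best rank-J approximation from the SVD.
sv <- svd(Y)
pves <- sapply(1:5, function(J) {
  B <- sv$u[, 1:J, drop = FALSE] %*% diag(sv$d[1:J], nrow = J)
  Theta <- t(sv$v[, 1:J, drop = FALSE])
  pve(Y, B, Theta)
})

plot(1:5, pves, type = "b", xlab = "Number of Features", ylab = "PVE")
```

With two true features, the curve should rise steeply up to J = 2 and then flatten, which is the pattern used to choose J.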
An object of class PVE with components:
PVEs |
The PVE for each value of $J$ in J.seq. |
J.seq |
The sequence of $J$ values used. |
There is a plot method for PVE objects.
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
## Load simulated aCGH data.
data(simaCGH)
## Generate PVEs for J ranging from 1 to the number of samples divided by 2.
result.pve <- FLLat.PVE(simaCGH,J.seq=1:(ncol(simaCGH)/2))
## Generate PVE plot.
plot(result.pve)
Plots either the estimated features or a heatmap of the estimated weights from a fitted Fused Lasso Latent Feature (FLLat) model (i.e., an object of class FLLat).
## S3 method for class 'FLLat'
plot(x, type=c("features","weights"), f.mar=c(5,3,4,2), f.xlab="Probe", w.mar=c(3,5,0,2), samp.names=1:ncol(x$Theta), hc.meth="complete", ...)
x |
A fitted FLLat model. That is, an object of class FLLat, as returned by FLLat. |
type |
The choice of whether to plot the estimated features $\hat{B}$ ("features") or a heatmap of the estimated weights $\hat{\Theta}$ ("weights"). The default is "features". |
f.mar |
The margins for the plot of each estimated feature. |
f.xlab |
The label for the x axis for the plot of each estimated feature. |
w.mar |
The margins for the heatmap of the estimated weights. |
samp.names |
The sample names used to label the columns in the heatmap of the estimated weights. |
hc.meth |
The agglomeration method to be used in the hierarchical clustering of the columns of $\hat{\Theta}$. See hclust. |
... |
Further graphical parameters, for the underlying plotting functions. |
This function plots the estimated features $\hat{B}$ or a heatmap of the estimated weights $\hat{\Theta}$ from a fitted FLLat model. The features are plotted in order of decreasing total magnitude, where the magnitude is given by $\sum_{l=1}^{L} \hat{\beta}_{lj}^2$ with $\hat{\beta}_{lj}$ for $l = 1, \ldots, L$ denoting the $j$th estimated feature (column of $\hat{B}$). Similarly, the rows of the heatmap of the estimated weights are re-ordered in the same way. The heatmap also includes a dendrogram of a hierarchical clustering of the samples based on their estimated weights (columns of $\hat{\Theta}$).
For more details, please see Nowak and others (2011) and the package vignette.
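The ordering and clustering described above can be reproduced by hand in base R. The sketch below assumes that total magnitude means the sum of squared entries of each feature, and uses small made-up matrices in place of the Beta and Theta components of a fitted model.

```r
set.seed(1)

# Hypothetical estimated features (L = 4 probes, J = 3 features) and
# weights (J = 3 features, S = 5 samples).
B.hat <- cbind(c(0, 0, 1, 1), c(2, 2, 0, 0), c(0, 0.5, 0.5, 0))
Theta.hat <- matrix(runif(3 * 5, -1, 1), nrow = 3)

# Total magnitude of each feature: sum of squared entries (an assumption).
mag <- colSums(B.hat^2)                 # 2, 8, 0.5
ord <- order(mag, decreasing = TRUE)    # 2, 1, 3

# Re-order the features and, in the same way, the rows of the weights.
B.ord <- B.hat[, ord]
Theta.ord <- Theta.hat[ord, ]

# Cluster the samples on their weight vectors (columns of Theta).
hc <- hclust(dist(t(Theta.ord)), method = "complete")
plot(hc, labels = 1:ncol(Theta.ord))
```

Re-ordering columns of the features together with rows of the weights leaves the fitted values unchanged, so the ordering is purely a display choice.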
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
## Load simulated aCGH data.
data(simaCGH)
## Run FLLat for J = 5, lam1 = 1 and lam2 = 9.
result <- FLLat(simaCGH,J=5,lam1=1,lam2=9)
## Plot the estimated features.
plot(result)
## Plot a heatmap of the estimated weights.
plot(result,type="weights")
Calculates predicted values and weights for a new set of samples using the estimated features from a fitted Fused Lasso Latent Feature (FLLat) model.
## S3 method for class 'FLLat'
predict(object, newY=NULL, thresh=10^(-4), maxiter.T=100, ...)
object |
A fitted FLLat model. That is, an object of class FLLat, as returned by FLLat. |
newY |
A matrix of new data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples. The number of probes must match the number of probes in the data used to produce the fitted FLLat model. Note that if newY is not specified, the fitted values and estimated weights from the fitted FLLat model are returned. |
thresh |
The threshold for determining when the predicted weights have converged. The default is $10^{-4}$. |
maxiter.T |
The maximum number of iterations for the algorithm for calculating the predicted weights. The default is 100. |
... |
Arguments passed to or from other methods. |
Based on the estimated features $\hat{B}$ from a fitted FLLat model, this function predicts the new weights that need to be applied to each feature for predicting a new set of samples $Y^*$. The predicted weights $\hat{\Theta}^*$ are calculated by minimizing the residual sum of squares:
$$RSS = \|Y^* - \hat{B}\Theta^*\|_F^2$$
where the $L_2$ norm of each row of $\Theta^*$ is still constrained to be less than or equal to $1$. From these predicted weights, the predicted values for the new set of samples are calculated as $\hat{Y}^* = \hat{B}\hat{\Theta}^*$.
These predicted values can be useful when performing model validation. Note that for the predictions to be meaningful and useful, the new set of samples must be similar in scale/magnitude to the original data used in producing the fitted FLLat model. If a new set of samples $Y^*$ is not specified, the function returns the fitted values $\hat{Y}$ and estimated weights $\hat{\Theta}$ from the fitted FLLat model.
For more details, please see Nowak and others (2011) and the package vignette.
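To make the prediction step concrete, the base R sketch below holds a set of features fixed and solves for new weights by ordinary (unconstrained) least squares. This is a simplification, not the package's algorithm: the row-norm constraint is only checked afterwards, and when it is violated the constrained solution would differ. All data and names here are hypothetical.

```r
set.seed(1)
L <- 50; J <- 2; S.new <- 4

# Hypothetical estimated features, held fixed during prediction.
B.hat <- cbind(rep(c(0, 1), each = 25),
               rep(c(1, 0, -1, 0, 0), each = 10))

# Toy new samples generated from those features with small weights.
Theta.true <- matrix(runif(J * S.new, -0.3, 0.3), J, S.new)
newY <- B.hat %*% Theta.true + matrix(rnorm(L * S.new, sd = 0.05), L, S.new)

# Unconstrained least squares for the new weights (a simplification).
Theta.pred <- qr.solve(B.hat, newY)

# Check the row-norm constraint after the fact.
row.norms <- sqrt(rowSums(Theta.pred^2))
constraint.ok <- all(row.norms <= 1)

# Predicted values and residual sum of squares for the new samples.
pred.Y <- B.hat %*% Theta.pred
rss <- sum((newY - pred.Y)^2)
```

Comparing `rss` on held-out samples across candidate models is one way such predictions can support model validation.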
A list with components:
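pred.Y |
The predicted values $\hat{Y}^*$ for the new set of samples, or the fitted values $\hat{Y}$ from the fitted FLLat model. |
Theta |
The predicted weights $\hat{\Theta}^*$ for the new set of samples, or the estimated weights $\hat{\Theta}$ from the fitted FLLat model. |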
niter |
The number of iterations taken by the algorithm for calculating the predicted weights, or the number of iterations taken by the algorithm for producing the fitted FLLat model. |
rss |
The residual sum of squares based on the new set of samples, or based on the original data used in producing the fitted FLLat model. |
Gen Nowak [email protected], Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
## Load simulated aCGH data.
data(simaCGH)
## Divide the data into a training and test set.
tr.dat <- simaCGH[,1:15]
tst.dat <- simaCGH[,16:20]
## Run FLLat for J = 5, lam1 = 1 and lam2 = 9 on the training set.
result.tr <- FLLat(tr.dat,J=5,lam1=1,lam2=9)
## Calculate fitted values on the training set.
tr.pred <- predict(result.tr)
## Calculate predicted values and weights on the test set using the FLLat
## model (i.e., the features) fitted on the training set.
tst.pred <- predict(result.tr,newY=tst.dat)
## Plotting predicted values and data for the first sample in the test set.
plot(tst.dat[,1],xlab="Probe",ylab="Y")
lines(tst.pred$pred.Y[,1],col="red",lwd=3)
These 20 samples of aCGH data are simulated using the Fused Lasso Latent Feature model with 5 features, as described in Section 4 of Nowak and others (2011).
data(simaCGH)
A matrix consisting of 1000 probes (rows) and 20 samples (columns).
G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012
data(simaCGH)