Title: | Robust Tests for Differential Dispersion and Differential Expression in RNA-Sequencing Data |
---|---|
Description: | Implements the algorithm described in Jun Li and Alicia T. Lamere, "DiPhiSeq: Robust comparison of expression levels on RNA-Seq data with large sample sizes" (Unpublished). Detects not only genes that show different average expressions ("differential expression", DE), but also genes that show different diversities of expressions in different groups ("differentially dispersed", DD). DD genes can be important clinical markers. 'DiPhiSeq' uses a redescending penalty on the quasi-likelihood function, and thus has superior robustness against outliers and other noise. Updates from version 0.1.0: (1) Added the option of using adaptive initial value for phi. (2) Added a function for estimating the proportion of outliers in the data. (3) Modified the input parameter names for clarity, and modified the output format for the main function. |
Authors: | Jun Li [aut, cre], Alicia T. Lamere [aut] |
Maintainer: | Jun Li <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2024-10-27 06:44:38 UTC |
Source: | CRAN |
Main function. For most users, this function is all what they need for the analysis.
diphiseq(countmat, classlab, depth = NULL, c.tukey.beta = 4, c.tukey.phi = 4, phi.ini = "adaptive")
diphiseq(countmat, classlab, depth = NULL, c.tukey.beta = 4, c.tukey.phi = 4, phi.ini = "adaptive")
countmat |
A count matrix. Rows are genes, and columns are samples. Each element/count is the number of reads mapped to a gene in a sample. |
classlab |
The class labels. A vector whose elements are of value 1 or 2. |
depth |
Sequencing depth. A vector of sequencing depth. Users are encouraged to provide estimated values from edgeR, DESeq, PoissonSeq, or other software that they prefer. If no values are provided, depth estimated by total counts will be used. |
c.tukey.beta |
The c value for beta in Tukey's biweight function. The default value, 4, is typically regarded as appropriate and should work for most datasets. |
c.tukey.phi |
The c value for phi in Tukey's biweight function. The default value, 4, is typically regarded as appropriate and should work for most datasets. |
phi.ini |
The initial value for phi to start search. If phi.in == 'adaptive' (the default value), the algorithm will adaptively choose a value (check Algorithm 1 for details). Otherwise, a positive numeric value (such as 0.5) should be given. |
A List that contains the following elements:
tab
: This is a data frame that contains the main results of this package. It has the following columns:
phi1
: the estimated dispersion of sample group 1.
phi2
: the estimated dispersion of sample group 2.
beta1
: the estimated (log) expression of sample group 1.
beta2
: the estimated (log) expression of sample group 2.
statistic.phi
: the z statistic for DD.
statistic.beta
: the z statistic for DE.
p.value.phi
: the p value for DD.
p.value.beta
: the p value for DE.
fdr.phi
: the FDR for DD.
fdr.beta
: the FDR for DE.
log.depth
: A vector of the log sequencing depths.
countmat
: This is the count matrix that the user provided.
classlab
: This is the vecgtor of class labels that the user provided.
phi.ini
: The initial searching value of the dispersion parameter.
mumat
: This is the (estimated) mu (expected expression) matrix, of the same size as countmat.
In another word, E(countmat)=mumat.
phimat
: This is the (estimated) phi matrix, of the same size as countmat.
In another word, counmat ~ negative binomial(mumat, phimat).
countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab) countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab, phi.ini=0.5)
countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab) countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab, phi.ini=0.5)
The first several rows of a dataset from the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma data collection, consisting of 5 genes from a dataset of raw RNA-Seq counts from 35 tumor and 35 normal samples.
example_data
example_data
A numeric matrix with five genes observed across 70 experiments. The first 35 are tumor samples, the second 35 are normal samples.
Akin, O. and Elnajjar, P. and Heller, M. and Jarosz, R. and Erickson, B. J. and Kirk, S. and ... Filippini, J. (2016) Data from The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) collection. The Cancer Imaging Archive.
Give a rough estimate of the proportion of outliers in the data based on the results of DiPhiSeq.
outprop(diphiseq.res, fdr.cutoff = 0.1)
outprop(diphiseq.res, fdr.cutoff = 0.1)
diphiseq.res |
The results given by running diphiseq. |
fdr.cutoff |
The cutoff for FDR. |
a numeric value. The estimated proportion of outliers under the FDR cutoff in the data.
countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab) outlier.proportion <- outprop(res)
countmat <- matrix(rnbinom(100, size=1, mu=50), nrow=4, ncol=25) classlab <- c(rep(1, 10), rep(2, 15)) res <- diphiseq(countmat, classlab) outlier.proportion <- outprop(res)
Calculates the estimate and standard error of beta and phi. It takes as input counts from one group of samples for a single gene. This function is the core underlining function of the whole package. A significant part of the code is edited based on William H. Aeberhard's glmrob.nb R function; we appreciate them very much for sharing their code online. This function also implement Algorithm 1 of our submitted paper about DiPhiSeq. This function is called by robtest. Most users don't need to call this function directly.
robnb(y, log.depth, c.tukey.beta = 4, c.tukey.phi = 4, phi.ini = 0.5, alpha = 0.2, minphi = 0.01, maxphi = 5, maxit = 30, maxit.beta = 30, maxit.phi = 30, tol.beta = 0.01, tol.phi = 0.005)
robnb(y, log.depth, c.tukey.beta = 4, c.tukey.phi = 4, phi.ini = 0.5, alpha = 0.2, minphi = 0.01, maxphi = 5, maxit = 30, maxit.beta = 30, maxit.phi = 30, tol.beta = 0.01, tol.phi = 0.005)
y |
A count vector. |
log.depth |
Vector of log(sequencing depths). |
c.tukey.beta |
The c value for beta in Huber function. The default value should be appropriate for most datasets. |
c.tukey.phi |
The c value for phi in Huber function. The default value should be appropriate for most datasets. |
phi.ini |
The initial value of phi. |
alpha |
A positive value for setting initial values. The default value is usually appropriate. |
minphi |
A searching parameter for Algorithm 1 (check the algorithm for details.) The default value is usually appropriate. |
maxphi |
A searching parameter for Algorithm 1 (check the algorithm for details.) The default value is usually appropriate. |
maxit |
Maximum number of iterations for the outer loop. The default value is usually appropriate. |
maxit.beta |
Maximum number of iterations for the inner loop of solving beta. The default value is usually appropriate. |
maxit.phi |
Maximum number of iterations for the inner loop of solving phi. The default value is usually appropriate. |
tol.beta |
The numerical tolerance of solving beta. The default value is usually appropriate. |
tol.phi |
The numerical tolerance of solving phi. The default value is usually appropriate. |
A list that contains the elements:
beta
: the estimated (log) expression.
phi
: the estimated dispersion.
fconv
: flag of the convergence of the search.
vars
: the variance-covariance matrix of the estimates.
sd.beta
: the standard error of beta.
sd.phi
: the standard error of phi.
y
: the input y value.
log.depth
: log(sequencing depth).
d <- runif(10, min=1, max=2) y <- rnbinom(10, size=1, mu=d*50) res <- robnb(y, log(d))
d <- runif(10, min=1, max=2) y <- rnbinom(10, size=1, mu=d*50) res <- robnb(y, log(d))
Calls the robnb function to estimate the coefficients, and then construct the statistical tests for DD and DE. It works for a single gene. y1 and y2 are count vectors for a single gene. diphiseq calls this function to do the calculation for each gene. Normal users often don't need to use this function directly.
robtest(y1, log.depth1, y2, log.depth2, c.tukey.beta = 4, c.tukey.phi = 4)
robtest(y1, log.depth1, y2, log.depth2, c.tukey.beta = 4, c.tukey.phi = 4)
y1 |
counts from group 1. a vector. |
log.depth1 |
log(sequencing depths) for samples in group 1. a vector. |
y2 |
counts from group 2. a vector. |
log.depth2 |
log(sequencing depths) for samples in group 2. a vector. |
c.tukey.beta |
The c value for beta in Huber function. The default value, 4, is typically regarded as appropriate and should work for most datasets. |
c.tukey.phi |
The c value for phi in Huber function. The default value, 4, is typically regarded as appropriate and should work for most datasets. |
A vector that contains the elements:
phi1
: the estimated dispersion of sample group 1.
phi2
: the estimated dispersion of sample group 2.
beta1
: the estimated (log) expression of sample group 1.
beta2
: the estimated (log) expression of sample group 2.
statistic.phi
: the z statistic for DD.
statistic.beta
: the z statistic for DE.
p.value.phi
: the p value for DD.
p.value.beta
: the p value for DE.
d1 <- runif(10, min=1, max=2) d2 <- runif(15, min=1, max=2) y1 <- rnbinom(10, size=1, mu=d1*50) y2 <- rnbinom(15, size=1, mu=d2*50) res <- robtest(y1, log(d1), y2, log(d2))
d1 <- runif(10, min=1, max=2) d2 <- runif(15, min=1, max=2) y1 <- rnbinom(10, size=1, mu=d1*50) y2 <- rnbinom(15, size=1, mu=d2*50) res <- robtest(y1, log(d1), y2, log(d2))