| Title: | Unified Differential Expression, Variability and Skewness Analysis for RNA-Seq Data |
|---|---|
| Description: | Provides a unified framework for the simultaneous testing of differential expression, variability, and skewness of genes using RNA-Seq data. The framework adopts a compositional data analysis approach for modelling RNA-Seq count data, applies the centered log-ratio transformation to obtain continuous variables, and uses the skew-normal distribution for statistical inference. Methods are described in Li and Khang (bioRxiv preprint, 2024, version 3) <doi:10.1101/2024.04.09.588804>. |
| Authors: | Hongxiang Li [aut, cre], Tsung Fei Khang [aut] |
| Maintainer: | Hongxiang Li <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.0.0 |
| Built: | 2026-06-25 19:58:20 UTC |
| Source: | https://github.com/cran/SIEVEseq |
Estimate the mean, standard deviation, and skewness parameters of the skew-normal distribution using centered log-ratio (CLR) transformed RNA-Seq data.
clr.SN.fit(data)clr.SN.fit(data)
data |
A table of CLR-transformed count data, with genes/transcripts on the rows and samples on columns. |
mu |
The maximum likelihood estimate of the mean parameter. |
se.mu |
The standard error of the maximum likelihood estimate of |
z.mu |
The Wald statistic for |
p.mu |
The p-value of the Wald statistic for |
sigma |
The maximum likelihood estimate of the standard deviation parameter. |
se.sigma |
The standard error of the maximum likelihood estimate of |
z.sigma |
The Wald statistic for |
p.sigma |
The p-value of the Wald statistic for |
gamma |
The maximum likelihood estimate of the skewness parameter. |
se.gamma |
The standard error of the maximum likelihood estimate of |
z.gamma |
The Wald statistic for |
p.gamma |
The p-value of the Wald statistic for |
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178, JSTOR
Azzalini, A. and Capitanio, A. (2014). The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.
Azzalini, A. and Arellano-Valle, R. B. (2013). Maximum penalized likelihood estimation for skew-normal and skew-t distributions. Journal of Statistical Planning and Inference 143, 419–433.
Azzalini, A. (2022). The R package sn: The skew-normal and related distribution such as the skew-t and the SUN (version 2.0.2). Universit\'a degli Studi di Padova, Italia. Home page: https://cran.r-project.org/package=sn.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman & Hall, London.
library(SIEVEseq) data(clrCounts1) clr.SN.fit(clrCounts1[1:2, ]) clr.SN.fit(clrCounts1[1, ])library(SIEVEseq) data(clrCounts1) clr.SN.fit(clrCounts1[1:2, ]) clr.SN.fit(clrCounts1[1, ])
A simulated CLR-transformed count table with genes on the rows and samples on the columns for one condition.
data(clrCounts1)data(clrCounts1)
A data frame containing CLR-transformed counts, with 200 genes and 500 samples.
A simulated CLR-transformed count table with genes on the rows and samples on the columns for two groups. Ten percent of the genes are DV genes (gene1 to gene50).
data(clrCounts2)data(clrCounts2)
A data frame containing CLR-transformed counts, with 500 genes and 200 samples per group.
A simulated CLR-transformed count table with genes on the rows and samples on the columns for two groups. Ten percent of the genes are DE genes (gene1 to gene50).
data(clrCounts3)data(clrCounts3)
A data frame containing CLR-transformed counts, with 500 genes and 200 samples per group.
Model CLR-transformed RNA-Seq data using the skew-normal distribution, and then conduct a statistical test for finding genes/transcripts with differential expression using the Wald test.
clrDE(data = NULL, group = NULL)clrDE(data = NULL, group = NULL)
data |
CLR-transformed counts matrix with genes/transcripts on the rows and samples on the columns |
group |
2 groups (control vs. treatment) |
An object of 'clrDE' class that contains the results of the DV test and
associated information:
DE |
The difference of |
se |
The standard error of |
z |
The observed Wald statistic. |
pval |
The unadjusted p-value of the Wald test. |
adj_pval |
The p-value of the Wald test adjusted using the Benjamini-Yekutieli procedure. |
mu1 |
The maximum likelihood estimate of the mean parameter of group 1. |
se.mu1 |
The standard error of the maximum likelihood estimate of |
z.mu1 |
The Wald statistic for |
p.mu1 |
The p-value of the Wald test for |
mu2 |
The maximum likelihood estimate of the mean parameter of group 2. |
se.mu2 |
The standard error of the maximum likelihood estimate of |
z.mu2 |
The Wald statistic for |
p.mu2 |
The p-value of the Wald test for |
clrSIEVE() for DE, DV and DS tests.
library(SIEVEseq) data(clrCounts3) # The first 50 genes (gene1 to gene50) are DE genes groups <- c(rep(0, 200), rep(1, 200)) clrDE_test <- clrDE(clrCounts3[46:100, ], group = groups) sum(is.na(clrDE_test)) # check NA values head(clrDE_test, 5) # adj_pval < 0.05, DE genes tail(clrDE_test, 5) # adj_pval > 0.05, non-DE geneslibrary(SIEVEseq) data(clrCounts3) # The first 50 genes (gene1 to gene50) are DE genes groups <- c(rep(0, 200), rep(1, 200)) clrDE_test <- clrDE(clrCounts3[46:100, ], group = groups) sum(is.na(clrDE_test)) # check NA values head(clrDE_test, 5) # adj_pval < 0.05, DE genes tail(clrDE_test, 5) # adj_pval > 0.05, non-DE genes
Model CLR-transformed RNA-Seq data using the skew-normal distribution, and then conduct a statistical test for finding genes/transcripts with differential variability using the Wald test.
clrDV(data = NULL, group = NULL)clrDV(data = NULL, group = NULL)
data |
A CLR-transformed count matrix with genes/transcripts on the rows and samples on the columns. |
group |
A vector specifying the group labels of the data. |
An object of 'clrDV' class that contains the results of the DV test and
associated information:
DV |
The difference of standard deviation ( |
se |
The standard error of |
z |
The observed Wald statistic. |
pval |
The unadjusted p-value of Wald test. |
adj_pval |
The p-value of the Wald test adjusted using the Benjamini-Yekutieli procedure. |
sigma1 |
The maximum likelihood estimate of the standard deviation parameter for group 1. |
se.sigma1 |
The standard error of the maximum likelihood estimate of |
z.sigma1 |
The Wald statistic for |
p.sigma1 |
The p-value of the Wald test for |
sigma2 |
The maximum likelihood estimate of the standard deviation parameter for group 2. |
se.sigma2 |
The standard error of the maximum likelihood estimate of |
z.sigma2 |
The Wald statistic for |
p.sigma2 |
The p-value of the Wald test for |
clrSIEVE() for DE, DV and DS tests.
library(SIEVEseq) data(clrCounts2) # first 50 genes (gene1 to gene50) are DV genes groups <- c(rep(0, 200), rep(1, 200)) clrDV_test <- clrDV(clrCounts2[46:100, ], group = groups) sum(is.na(clrDV_test)) # check NA values head(clrDV_test, 5) # adj_pval < 0.05, DV genes tail(clrDV_test, 5) # adj_pval > 0.05, non-DV geneslibrary(SIEVEseq) data(clrCounts2) # first 50 genes (gene1 to gene50) are DV genes groups <- c(rep(0, 200), rep(1, 200)) clrDV_test <- clrDV(clrCounts2[46:100, ], group = groups) sum(is.na(clrDV_test)) # check NA values head(clrDV_test, 5) # adj_pval < 0.05, DV genes tail(clrDV_test, 5) # adj_pval > 0.05, non-DV genes
Estimate the mean, standard deviation, and skewness parameters of the skew-normal distribution using centered log-ratio (CLR) transformed RNA-Seq data for 2 groups. The related computational works (standard error, Wald statistic, and p value) are also provided.
clrSeq(data = NULL, group = NULL)clrSeq(data = NULL, group = NULL)
data |
A table of CLR-transformed count data, with genes/transcripts on the rows and samples on the columns for 2 groups. |
group |
A vector specifying the group labels of the data. |
mu1 |
The maximum likelihood estimate of the mean parameter of group 1. |
se.mu1 |
The standard error of the maximum likelihood estimate of |
z.mu1 |
The Wald statistic for |
p.mu1 |
The p-value of the Wald test for |
sigma1 |
The maximum likelihood estimate of the standard deviation parameter of group 1. |
se.sigma1 |
The standard error of the maximum likelihood estimate of |
z.sigma1 |
The Wald statistic for |
p.sigma1 |
The p-value of the Wald test for |
gamma1 |
The maximum likelihood estimate of the skewness parameter of group 1. |
se.gamma1 |
The standard error of the maximum likelihood estimate of |
z.gamma1 |
The Wald statistic for |
p.gamma1 |
The p-value of the Wald test for |
mu2 |
The maximum likelihood estimate of mean parameter of group 2. |
se.mu2 |
The standard error of the maximum likelihood estimate of |
z.mu2 |
The Wald statistic for |
p.mu2 |
The p-value of the Wald test for |
sigma2 |
The maximum likelihood estimate of the standard deviation parameter of group 2. |
se.sigma2 |
The standard error of the maximum likelihood estimate of |
z.sigma2 |
The Wald statistic for |
p.sigma2 |
The p-value of the Wald test for |
gamma2 |
The maximum likelihood estimate of the skewness parameter of group 2. |
se.gamma2 |
The standard error of the maximum likelihood estimate of |
z.gamma2 |
The Wald statistic for |
p.gamma2 |
The p-value of the Wald test for |
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178, JSTOR
Azzalini, A. and Capitanio, A. (2014). The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.
Azzalini, A. and Arellano-Valle, R. B. (2013). Maximum penalized likelihood estimation for skew-normal and skew-t distributions. Journal of Statistical Planning and Inference 143, 419–433.
Azzalini, A. (2022). The R package sn: The skew-normal and related distribution such as the skew-t and the SUN (version 2.0.2). Universit\'a degli Studi di Padova, Italia. Home page: https://cran.r-project.org/package=sn.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman & Hall, London.
clrSIEVE() for DE, DV and DS tests.
library(SIEVEseq) data("clrCounts2") groups <- c(rep(0, 200), rep(1, 200)) clrSeq_result <- clrSeq(clrCounts2[46:100, ], group = groups) tail(clrSeq_result, 5) SN.plot(clrCounts2[1, 1:200]) clrSeq_result[1, c("mu1", "sigma1", "gamma1")]library(SIEVEseq) data("clrCounts2") groups <- c(rep(0, 200), rep(1, 200)) clrSeq_result <- clrSeq(clrCounts2[46:100, ], group = groups) tail(clrSeq_result, 5) SN.plot(clrCounts2[1, 1:200]) clrSeq_result[1, c("mu1", "sigma1", "gamma1")]
Model CLR-transformed RNA-Seq data using the skew-normal distribution, and then conduct statistical tests for finding genes/transcripts with differential expression, variability and skewness using the Wald test.
clrSIEVE( clrSeq_result = NULL, alpha_level = 0.05, order_DE = FALSE, order_LFC = FALSE, order_DS = FALSE, order_sieve = FALSE )clrSIEVE( clrSeq_result = NULL, alpha_level = 0.05, order_DE = FALSE, order_LFC = FALSE, order_DS = FALSE, order_sieve = FALSE )
clrSeq_result |
The result of clrSeq() function. |
alpha_level |
The adjusted p-value cutoff used for flagging genes that show significant differential expression, variability and skewness. |
order_DE |
Logical string. "FALSE" for no ordering; "TRUE" for ordering by the value of |
order_LFC |
Logical string. "FALSE" for no ordering; "TRUE" for ordering by the value of |
order_DS |
Logical string. "FALSE" for no ordering; "TRUE" for ordering by the value of |
order_sieve |
Character/logical string specifying the order method. Possibilities are "DE" for
the value of |
clrDE_test, clrDV_test and clrDS_test contain the
result of the DE, DV and DS tests, respectively.
clrSIEVE_tests contains the result of all three tests.
clrDE_test |
A data.frame contating the results of differential expression test. |
clrDV_test |
A data.frame contating the results of differential variability test. |
clrDS_test |
A data.frame contating the results of differential skewness test. |
clrSIEVE_tests |
A data.frame contating the results of differential expression, variability and skewness tests. |
DE |
The difference of mean ( |
se_DE |
The standard error of |
z_DE |
The observed Wald statistic of |
pval_DE |
The unadjusted p-value of Wald test of |
adj_pval_DE |
The p-value of the Wald test of |
mu1 |
The maximum likelihood estimate of the mean parameter for group 1. |
mu2 |
The maximum likelihood estimate of the mean parameter for group 2. |
de_indicator |
1: DE gene; 0: non-DE gene. |
SD_ratio |
The ratio of standard deviation ( |
LFC |
|
DV |
The difference of standard deviation ( |
se_DV |
The standard error of |
z_DV |
The observed Wald statistic of |
pval_DV |
The unadjusted p-value of Wald test of |
adj_pval_DV |
The p-value of the Wald test of |
sigma1 |
The maximum likelihood estimate of the standard deviation parameter for group 1. |
sigma2 |
The maximum likelihood estimate of the standard deviation parameter for group 2. |
dv_indicator |
1: DV gene; 0: non-DV gene. |
DS |
The difference of skewness parameter ( |
se_DS |
The standard error of |
z_DS |
The observed Wald statistic of |
pval_DS |
The unadjusted p-value of Wald test of |
adj_pval_DS |
The p-value of the Wald test of |
gamma1 |
The maxiimum likelihood estimate of the skewness parameter for group 1. |
gamma2 |
The maxiimum likelihood estimate of the skewness parameter for group 2. |
ds_indicator |
1: DS gene; 0: non-DS gene. |
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178, JSTOR
Azzalini, A. and Capitanio, A. (2014). The Skew-Normal and Related Families. Cambridge University Press, IMS Monographs series.
Azzalini, A. and Arellano-Valle, R. B. (2013). Maximum penalized likelihood estimation for skew-normal and skew-t distributions. Journal of Statistical Planning and Inference 143, 419–433.
Azzalini, A. (2022). The R package sn: The skew-normal and related distribution such as the skew-t and the SUN (version 2.0.2). Universit\'a degli Studi di Padova, Italia. Home page: https://cran.r-project.org/package=sn.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman & Hall, London.
clrSeq() for the results of parameters estimation; alternatively, clrDE() only provides DE test and clrDV() only provides DV test.
library(SIEVEseq) data(clrCounts3) # first 50 genes (gene1 to gene50) are DE genes groups <- c(rep(0, 200), rep(1, 200)) clrSeq_result3 <- clrSeq(clrCounts3[46:100, ], group = groups) # DE dataset clrSIEVE_result3 <- clrSIEVE(clrSeq_result = clrSeq_result3, alpha_level = 0.05, order_DE = FALSE, order_LFC = FALSE, order_DS = FALSE, order_sieve = FALSE) clrDE_test3 <- clrSIEVE_result3$clrDE_test # DE test head(clrDE_test3, 5) clrDS_test3 <- clrSIEVE_result3$clrDS_test # DS test clrDS_test3[clrDS_test3$adj_pval_DS < 0.05, ] clrSIEVE_tests3 <- clrSIEVE_result3$clrSIEVE_tests # Sieve DE, DV and DS genes head(clrSIEVE_tests3, 5) tail(clrSIEVE_tests3, 5)library(SIEVEseq) data(clrCounts3) # first 50 genes (gene1 to gene50) are DE genes groups <- c(rep(0, 200), rep(1, 200)) clrSeq_result3 <- clrSeq(clrCounts3[46:100, ], group = groups) # DE dataset clrSIEVE_result3 <- clrSIEVE(clrSeq_result = clrSeq_result3, alpha_level = 0.05, order_DE = FALSE, order_LFC = FALSE, order_DS = FALSE, order_sieve = FALSE) clrDE_test3 <- clrSIEVE_result3$clrDE_test # DE test head(clrDE_test3, 5) clrDS_test3 <- clrSIEVE_result3$clrDS_test # DS test clrDS_test3[clrDS_test3$adj_pval_DS < 0.05, ] clrSIEVE_tests3 <- clrSIEVE_result3$clrSIEVE_tests # Sieve DE, DV and DS genes head(clrSIEVE_tests3, 5) tail(clrSIEVE_tests3, 5)
Produce a histogram of observed CLR-transformed counts, with the fitted skew-normal probability density function for a particular gene/transcript.
SN.plot(data)SN.plot(data)
data |
CLR-transformed counts for a particular gene/transcript. |
No return value. This function is called for its side effect of generating a plot showing the observed CLR-transformed counts and the fitted skew-normal density.
library(SIEVEseq) data("clrCounts1") SN.plot(clrCounts1[1,]) SN.plot(clrCounts1[2,]) SN.plot(clrCounts1[3,])library(SIEVEseq) data("clrCounts1") SN.plot(clrCounts1[1,]) SN.plot(clrCounts1[2,]) SN.plot(clrCounts1[3,])
Produce violin plots of CLR-transformed count data for two or three groups.
violin.plot.SIEVE( data = NULL, name.gene = NULL, group = NULL, group.names = NULL, xlab = "CLR-transformed count", ylab = "Condition" )violin.plot.SIEVE( data = NULL, name.gene = NULL, group = NULL, group.names = NULL, xlab = "CLR-transformed count", ylab = "Condition" )
data |
A CLR-transformed count table. |
name.gene |
Gene/transcript name. |
group |
A vector specifying the group labels of the data. |
group.names |
A vector specifying the group names. |
xlab |
Name of the x-axis. |
ylab |
Name of the y-axis. |
A ggplot object displaying the distribution of CLR-transformed expression values for the specified gene across groups.
library(SIEVEseq) data(clrCounts2) # first 50 genes (gene1 to gene50) are DV genes data(clrCounts3) # first 50 genes (gene1 to gene50) are DE genes group0 <- c(rep(0, 200), rep(1, 200)) group1 <- c(rep(0, 200), rep(1, 100), rep(2, 100)) violin.plot.SIEVE(data = clrCounts2, "gene1", group = group0, group.names = c("control", "case")) # DV violin.plot.SIEVE(data = clrCounts2, "gene1", group = group1, group.names = c("control", "case1", "case2")) # DV violin.plot.SIEVE(data = clrCounts3, "gene1", group = group0, group.names = c("control", "case")) # DE violin.plot.SIEVE(data = clrCounts3, "gene2", group = group0, group.names = c("control", "case")) # DE violin.plot.SIEVE(data = clrCounts3, "gene200", group = group0, group.names = c("control", "case")) # non-DElibrary(SIEVEseq) data(clrCounts2) # first 50 genes (gene1 to gene50) are DV genes data(clrCounts3) # first 50 genes (gene1 to gene50) are DE genes group0 <- c(rep(0, 200), rep(1, 200)) group1 <- c(rep(0, 200), rep(1, 100), rep(2, 100)) violin.plot.SIEVE(data = clrCounts2, "gene1", group = group0, group.names = c("control", "case")) # DV violin.plot.SIEVE(data = clrCounts2, "gene1", group = group1, group.names = c("control", "case1", "case2")) # DV violin.plot.SIEVE(data = clrCounts3, "gene1", group = group0, group.names = c("control", "case")) # DE violin.plot.SIEVE(data = clrCounts3, "gene2", group = group0, group.names = c("control", "case")) # DE violin.plot.SIEVE(data = clrCounts3, "gene200", group = group0, group.names = c("control", "case")) # non-DE