| Title: | Compositional Statistical Framework for RNA Fractionation Analysis |
|---|---|
| Description: | A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data. 'FracFixR' addresses the fundamental challenge in fractionated RNA-seq experiments where library preparation and sequencing depth obscure the original proportions of RNA fractions. It reconstructs original fraction proportions using non-negative linear regression, estimates the "lost" unrecoverable fraction, corrects individual transcript frequencies, and performs differential proportion testing between conditions. Supports any RNA fractionation protocol including polysome profiling, sub-cellular localization, and RNA-protein complex isolation. |
| Authors: | Alice Cleynen [aut, cre] (ORCID: <https://orcid.org/0000-0001-8083-0204>), Agin Ravindran [aut], Nikolay Shirokikh [aut] (ORCID: <https://orcid.org/0000-0001-8249-358X>) |
| Maintainer: | Alice Cleynen <[email protected]> |
| License: | CC BY 4.0 |
| Version: | 1.1.0 |
| Built: | 2026-05-11 10:40:56 UTC |
| Source: | https://github.com/cran/FracFixR |
Performs statistical testing to identify transcripts with significantly different proportions between conditions in specified fraction(s). Implements three test options: GLM (most powerful), Logit, and Wald.

    DiffPropTest(NormObject, Conditions, Types, Test = c("GLM", "Logit", "Wald"))

| Argument | Description |
|---|---|
| NormObject | Output from FracFixR() function |
| Conditions | Character vector of exactly 2 conditions to compare |
| Types | Character vector of fraction type(s) to analyze. Can be single fraction or multiple (will be combined) |
| Test | Statistical test to use: "GLM", "Logit", or "Wald" |

GLM: Uses binomial generalized linear model (most statistically powerful)
Logit: Faster alternative using logit transformation
Wald: Beta-binomial Wald test for overdispersed count data
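To make the GLM option more concrete, here is a minimal sketch of a binomial GLM comparing one transcript's proportion between two conditions. It uses only base R; the counts, the column names (success, failure, condition) and the model formula are assumptions for illustration and are not taken from the FracFixR source.

```r
# Hypothetical counts for one transcript, two replicates per condition.
# "success" = reads attributed to the fraction of interest,
# "failure" = the remaining reads for that transcript.
d <- data.frame(
  condition = factor(c("Control", "Control", "Treatment", "Treatment")),
  success   = c(30, 35, 60, 55),
  failure   = c(70, 65, 40, 45)
)

# Binomial GLM: does the proportion success / (success + failure)
# depend on the condition?
fit <- glm(cbind(success, failure) ~ condition, family = binomial, data = d)

# Wald z-test p-value for the condition effect (log-odds scale)
pval <- coef(summary(fit))["conditionTreatment", "Pr(>|z|)"]
print(pval)
```

DiffPropTest() would run a test of this kind per transcript; how it builds the success and failure counts internally is not shown here.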
Data frame with columns:
transcript: transcript identifier
mean_success_cond1/2: mean proportions in each condition
mean_diff: difference in proportions
log2FC: log2 fold change
pval: p-value from statistical test
padj: FDR-adjusted p-value
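As an illustration of the padj column, the sketch below applies a Benjamini-Hochberg correction to a small vector of p-values with base R's p.adjust(). The p-values are made up, and the choice of the "BH" method is an assumption: the text above only says the p-values are FDR-adjusted.

```r
# Hypothetical raw p-values for five transcripts
pval <- c(0.001, 0.01, 0.03, 0.2, 0.8)

# Benjamini-Hochberg adjustment (assumed here as the FDR method)
padj <- p.adjust(pval, method = "BH")

# Flag transcripts at a 5% FDR threshold
significant <- padj <= 0.05
print(data.frame(pval, padj, significant))
```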

    data(example_counts)
    data(example_annotation)
    # Run FracFixR
    results <- FracFixR(example_counts, example_annotation, parallel = FALSE)
    # Run differential testing
    diff_results <- DiffPropTest(results,
                                 Conditions = c("Control", "Treatment"),
                                 Types = "Heavy_Polysome",
                                 Test = "GLM")

A data frame containing sample annotations for the example_counts matrix. Describes the experimental design with conditions, fraction types, and replicates.

    example_annotation

A data frame with 12 rows and 4 columns:
Sample identifier matching column names in example_counts
Experimental condition (Control or Treatment)
Fraction type (Total, Light_Polysome, or Heavy_Polysome)
Replicate identifier (Rep1 or Rep2)
Simulated data generated for package examples

    data(example_annotation)
    head(example_annotation)
    table(example_annotation$Condition, example_annotation$Type)

A matrix containing simulated RNA-seq counts for 100 genes across 12 samples. The data simulates a polysome profiling experiment with two conditions (Control and Treatment) and three fractions (Total, Light_Polysome, Heavy_Polysome).

    example_counts

A numeric matrix with 100 rows (genes) and 12 columns (samples):
Gene identifiers (Gene1 to Gene100)
Sample identifiers (Sample1 to Sample12)
Simulated data generated for package examples

    data(example_counts)
    dim(example_counts)
    head(example_counts[, 1:6])

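For readers preparing their own input, the following sketch builds a toy counts matrix and a matching annotation data frame with the same shape as the example data (100 genes, 12 samples, 2 conditions, 3 fraction types, 2 replicates). The Condition and Type column names appear in the example above; the Sample and Replicate column names, the sample ordering and the Poisson counts are assumptions made for illustration.

```r
# Hypothetical design: 2 conditions x 3 fraction types x 2 replicates
annotation <- expand.grid(
  Replicate = c("Rep1", "Rep2"),
  Type      = c("Total", "Light_Polysome", "Heavy_Polysome"),
  Condition = c("Control", "Treatment"),
  stringsAsFactors = FALSE
)
annotation$Sample <- paste0("Sample", seq_len(nrow(annotation)))
annotation <- annotation[, c("Sample", "Condition", "Type", "Replicate")]

# Toy counts: 100 genes in rows, one column per annotated sample
set.seed(1)
counts <- matrix(
  rpois(100 * nrow(annotation), lambda = 20),
  nrow = 100,
  dimnames = list(paste0("Gene", 1:100), annotation$Sample)
)

# Sample identifiers must match the column names of the counts matrix
stopifnot(identical(colnames(counts), annotation$Sample))
print(table(annotation$Condition, annotation$Type))
```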
This is the core function that implements the FracFixR framework. It takes raw count data from total and fractionated samples and reconstructs the original fraction proportions through compositional modeling.

    FracFixR(MatrixCounts, Annotation, st1 = 0.6, st2 = 0.999, parallel = TRUE)

| Argument | Description |
|---|---|
| MatrixCounts | A numeric matrix of raw transcript/gene counts, with transcripts/genes in rows and samples in columns (as in example_counts). |
| Annotation | A data.frame with the required sample annotation columns. example_annotation shows the expected layout: sample identifier, condition, fraction type, and replicate. |
| parallel | A boolean indicating whether to process transcripts in parallel (default = TRUE). |
| st1 | Lower quantile threshold (between 0 and 1) for selecting informative transcripts for the NNLS regression fit (default = 0.6). Transcripts below this quantile of Total abundance are considered too noisy for reliable regression. |
| st2 | Upper quantile threshold (between 0 and 1) for selecting informative transcripts for the NNLS regression fit (default = 0.999). Transcripts above this quantile are potential outliers and are excluded from the regression. |

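The effect of st1 and st2 can be sketched with base R's quantile(). The abundances below are simulated, and treating both bounds as inclusive is an assumption; the package may handle the boundaries differently.

```r
# Hypothetical Total abundances for 1000 transcripts
set.seed(42)
total <- rexp(1000, rate = 1 / 100)

st1 <- 0.6    # lower quantile threshold (default)
st2 <- 0.999  # upper quantile threshold (default)

# Abundance values at the two quantiles
bounds <- quantile(total, probs = c(st1, st2))

# Keep transcripts between the two quantiles for the regression fit
keep <- total >= bounds[1] & total <= bounds[2]
print(sum(keep))
```

With the defaults, roughly the top 40% of transcripts by Total abundance are kept, minus the most extreme values.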
The function works by:
Filtering transcripts based on presence in Total samples
For each condition and replicate, fitting NNLS regression
Estimating global fraction weights and individual transcript proportions
Calculating the "lost" unrecoverable fraction
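The NNLS step can be illustrated with a small self-contained sketch. It models Total abundance as a non-negative combination of two fraction abundances and solves the problem by cyclic coordinate descent in base R. The simulated data, the fraction names and the solver (nnls_cd, a hypothetical helper) are assumptions for illustration; the package may use a different NNLS implementation and model specification.

```r
# Toy data (hypothetical): abundances of 200 transcripts in two fractions
set.seed(1)
n <- 200
X <- cbind(
  light = rlnorm(n, meanlog = 3, sdlog = 1),
  heavy = rlnorm(n, meanlog = 3, sdlog = 1)
)

# Assumed ground truth for the illustration (noise-free)
true_coef <- c(light = 0.3, heavy = 0.5)
total <- as.vector(X %*% true_coef)

# Non-negative least squares by cyclic coordinate descent:
# each coefficient is set to its least-squares optimum given the others,
# then clipped at zero.
nnls_cd <- function(X, y, sweeps = 1000, tol = 1e-12) {
  b <- setNames(numeric(ncol(X)), colnames(X))
  col_ss <- colSums(X^2)
  for (s in seq_len(sweeps)) {
    b_old <- b
    for (j in seq_along(b)) {
      partial_resid <- y - X[, -j, drop = FALSE] %*% b[-j]
      b[j] <- max(0, sum(X[, j] * partial_resid) / col_ss[j])
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

coefs <- nnls_cd(X, total)
print(round(coefs, 3))
```

Because the toy data are noise-free and the true weights are non-negative, the fitted coefficients should match the assumed weights closely.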
A list containing:
OriginalData: filtered input count matrix
Annotation: input annotation data
Propestimates: matrix of proportion estimates (values between 0 and 1)
NewData: corrected count matrix (proportions multiplied by predicted total, rounded)
Coefficients: data.frame of regression coefficients
Fractions: data.frame of estimated fraction proportions
plots: list of diagnostic plots
Cleynen et al. FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.

    # Load example data
    data(example_counts)
    data(example_annotation)
    # Run FracFixR
    results <- FracFixR(example_counts, example_annotation, parallel = FALSE)
    # View fraction proportions
    print(results$Fractions)

Returns the corrected count matrix embedded in a FracFixR() result
object. This matrix is computed internally by multiplying each transcript's
proportion estimate by the predicted total abundance for that replicate,
providing counts that are corrected for compositional bias while remaining
on the original count scale.
If you need to re-scale using the raw (observed) Total counts instead of the
NNLS-predicted totals, multiply fracfixr_results$Propestimates by the
corresponding column of fracfixr_results$OriginalData manually.
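The manual re-scaling described above amounts to an element-wise product followed by rounding. The sketch below uses small stand-in objects rather than a real FracFixR() result, so the matrix layout, the gene and fraction names, and the single Total vector are all assumptions; with real output, each fraction column would be paired with the Total column of the same condition and replicate.

```r
# Hypothetical stand-in for results$Propestimates (genes x fraction samples)
prop <- matrix(
  c(0.2, 0.5,   # Light column: Gene1, Gene2
    0.6, 0.3),  # Heavy column: Gene1, Gene2
  nrow = 2,
  dimnames = list(c("Gene1", "Gene2"), c("Light", "Heavy"))
)

# Hypothetical observed Total counts for the matching replicate
observed_total <- c(Gene1 = 100, Gene2 = 40)

# Multiply each gene's proportions by its observed Total, then round
rescaled <- round(prop * observed_total)
print(rescaled)
```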

    get_corrected_counts(fracfixr_results)

| Argument | Description |
|---|---|
| fracfixr_results | Output list from FracFixR() |

A numeric matrix with the same dimensions as
fracfixr_results$Propestimates. Non-Total columns contain corrected
counts (rounded integers, proportion estimate multiplied by the
NNLS-predicted total abundance). The Total column contains the
NNLS-predicted total abundance itself.

    data(example_counts)
    data(example_annotation)
    results <- FracFixR(example_counts, example_annotation, parallel = FALSE)
    corrected <- get_corrected_counts(results)
    head(corrected)

Generates a volcano plot showing transcripts with significant differential proportions between conditions.

    PlotComparison(DiffPropResult, Conditions = NULL, Types = NULL, cutoff = NULL)

| Argument | Description |
|---|---|
| DiffPropResult | Output from DiffPropTest() function |
| Conditions | Character vector of conditions being compared |
| Types | Character vector of fraction types analyzed |
| cutoff | Optional y-axis maximum for plot |

Volcano plot-type object

    data(example_counts)
    data(example_annotation)
    # Run FracFixR
    results <- FracFixR(example_counts, example_annotation, parallel = FALSE)
    # Run differential testing
    diff_results <- DiffPropTest(results,
                                 Conditions = c("Control", "Treatment"),
                                 Types = "Heavy_Polysome",
                                 Test = "GLM")
    # Create volcano plot
    volcano <- PlotComparison(diff_results,
                              Conditions = c("Control", "Treatment"),
                              Types = "Heavy_Polysome")

Creates a stacked bar plot showing the distribution of RNA across fractions for each replicate, including the "lost" fraction.

    PlotFractions(FracFixed)

| Argument | Description |
|---|---|
| FracFixed | Output from FracFixR() function |

ggplot2 object showing fraction proportions

    data(example_counts)
    data(example_annotation)
    # Run FracFixR
    results <- FracFixR(example_counts, example_annotation, parallel = FALSE)
    # Create fraction plot
    frac_plot <- PlotFractions(results)
    # Save plot with
    # ggsave("fractions.pdf", frac_plot, width = 10, height = 8)

An alternative annotation data frame for polysome profiling experiments with monosome and polysome fractions.

    polysome_annotation

A data frame with 12 rows and 4 columns:
Sample identifier
Experimental condition (Control or Stress)
Fraction type (Total, Monosome, or Polysome)
Replicate identifier (Rep1 or Rep2)
Simulated data generated for package examples

    data(polysome_annotation)
    head(polysome_annotation)

An annotation data frame for subcellular fractionation experiments with nuclear and cytoplasmic fractions.

    subcellular_annotation

A data frame with 12 rows and 4 columns:
Sample identifier
Experimental condition (WT or Mutant)
Fraction type (Total, Nuclear, or Cytoplasmic)
Replicate identifier (Rep1 or Rep2)
Simulated data generated for package examples

    data(subcellular_annotation)
    head(subcellular_annotation)
