| Title: | Visualizing Multi-Species Relative Synonymous Codon Usage and Extensible Data Exploration |
|---|---|
| Description: | Facilitates efficient visualization of Relative Synonymous Codon Usage patterns across species. Based on analytical outputs from 'codonW', 'MEGA', and 'Phylosuite', it supports multi-species 'RSCU' comparisons and allows users to explore visual analysis of structurally similar datasets. |
| Authors: | Xunhuan Ding [aut, cre, cph], Mingru Fu [ctb], Guojun Xing [ctb], Xinxin Ding [ctb], Xiaoli Liu [aut, ths], Tao Sun [aut, ths] |
| Maintainer: | Xunhuan Ding <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-21 08:15:27 UTC |
| Source: | https://github.com/cran/ggmRSCU |
A dataset containing projected energy production values under different scenarios from 2025 to 2050. The data includes projections for various energy sources.
data(energy_data)data(energy_data)
A data frame with 24 rows and 7 variables:
Integer year of projection (2025-2050)
Character: High, Low, Moderate, Reference
Numeric biomass energy proportion
Numeric coal energy proportion
Numeric electrolysis energy proportion
Numeric gas energy proportion
Numeric nuclear energy proportion
Data simulated from:
Ampah, J.D., Jin, C., Liu, H. et al. (2024) "Deployment expectations
of multi-gigatonne scale carbon removal could have adverse impacts on
Asia’s energy-water-land nexus" <doi:10.1038/s41467-024-50594-5>
Monthly financial data (income and expenses breakdown)
data(financial_data)data(financial_data)
A data frame with 66 rows and 4 variables:
Month of record (e.g., Jan, Feb)
Main category: Income or Expenses
Detailed type (e.g., Revenue, COGS, Payroll, R&D)
Amount in local currency
Self-created simulated data.
Region-specific genome size data for six hominid species across two haplotypes.
data(genomic_data)data(genomic_data)
Tibble with 62 rows × 4 columns:
Species code (PAB, PPY, GGO, HSA, PPA, PTR)
Haplotype (H1/H2)
Genomic region: Euchromatin, Centromere, Acrocentric, QH_region, etc.
Region size in megabases
Data simulated from:
Yoo, D., Rhie, A., Hebbar, P. et al. (2025) "Complete sequencing of ape genomes"
<doi:10.1038/s41586-025-08816-3>
This function generates a customized visualization of Relative Synonymous Codon Usage (RSCU) across distinct biological categories. It represents codon usage through color-coded blocks using species-specific shapes, and optionally incorporates a secondary plot displaying codon labels. Also applicable for extended visualizations of similar data structures (e.g., energy consumption data).
ggmRSCU( data, x, y, fill, rev = FALSE, theme = "theme1", shape = FALSE, size = 1, width = 0.1, sub_axis = "fill", merge = FALSE, border_color = NA, blocks_colors = NULL )ggmRSCU( data, x, y, fill, rev = FALSE, theme = "theme1", shape = FALSE, size = 1, width = 0.1, sub_axis = "fill", merge = FALSE, border_color = NA, blocks_colors = NULL )
data |
A data frame containing the input data. |
x |
Primary and secondary x-axis variables: a character vector (e.g., c("Group", "Subgroup")) or a single variable ("Group"). |
y |
Primary and secondary y-axis variables: a character vector (e.g., c("ValueVar", "Subgroup")) or a single variable ("ValueVar"). |
fill |
Variable for color filling. |
rev |
(Optional) logical value. If TRUE, the color order will be reversed. The default is FALSE. |
theme |
(Optional) A string specifying the color theme for the blocks. Available themes are: "theme1"-"theme8". |
shape |
(Optional) logical value. If TRUE, a vector of shape values for species in the plot is applied (default: FALSE). |
size |
(Optional) Size of shape markers (default: 1). |
width |
(Optional) Thickness of bar plot borders. (e.g., 0.1 for thin, 0.5 for thick). (Default: 0.1). |
sub_axis |
(Optional) Annotation type for secondary plot ("ysub" or "fill", default: "fill"). |
merge |
(Optional) logical value. If TRUE, the main plot will be merged with the secondary annotation plot. Default is FALSE. |
border_color |
(Optional) Color for bar borders (default: NA) |
blocks_colors |
(Optional) A custom color vector to fill the plot. If "NULL", the default color scheme will be used. |
Key features: - Hierarchical grouping with automatic position calculation - Flexible color management with 8 predefined themes or custom palettes - Dual-axis annotation system for biological information - Interactive subplot arrangements with smart label positioning
A ggplot object or patchwork composite plot when merge=TRUE
# Example 1: Basic Usage library(ggmRSCU) ggmRSCU( data = rscu_data, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill, shape = TRUE, merge = TRUE ) # Example 2: Visualize RSCU from data loaded using `read_codonw` function # Load data from `read_codonw` output example_dir <- system.file("extdata", "codonw", package = "ggmRSCU") data <- read_codonw(example_dir) col <- c("#FF6F61", "#6B5B95", "#88B04B", "#F7CAC9", "#92A8D1", "#9b59b6") ggmRSCU( data = data, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill, blocks_colors = col, shape = TRUE, merge = TRUE, sub_axis = "fill" ) # Example 3: Visualize RSCU from data loaded using `read_mega` function # Load data from `read_mega` output example_dir <- system.file("extdata", "mega", package = "ggmRSCU") df <- read_mega(example_dir) ggmRSCU( data = df, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill ) # Example 4: Additional Dataset (scoring_data) col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = scoring_data, x = c(Sample, Replicate), y = c(Score, CellType), fill = CellType, blocks_colors = col, shape = TRUE, sub_axis = "Fill" ) # Example 5: Additional Dataset (hydrogen_data) library(tidyr) df_long <- hydrogen_data %>% pivot_longer(cols = -c(Year, type), names_to = "category", values_to = "value") col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = df_long, x = c(Year, category), y = c(value, type), fill = type, shape = TRUE, blocks_colors = col ) # Example 6: Additional Dataset (energy_data) long_data <- energy_data %>% pivot_longer(cols = 3:7, names_to = "Source", values_to = "Production") col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = long_data, x = c(Scenario, Year), y = c(Production, Source), fill = Source, blocks_colors = col, shape = TRUE ) # Example 7: Additional Dataset (financial_data) ggmRSCU( data = financial_data, x = c(Month, Category), y = c(Amount), fill = Subcategory ) # Example 8: Additional Dataset (genomic_data) ggmRSCU( data = genomic_data, x = c(Species, haplotypes), y = c(Mb), fill = Region, shape = TRUE )# Example 1: Basic Usage library(ggmRSCU) ggmRSCU( data = rscu_data, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill, shape = TRUE, merge = TRUE ) # Example 2: Visualize RSCU from data loaded using `read_codonw` function # Load data from `read_codonw` output example_dir <- system.file("extdata", "codonw", package = "ggmRSCU") data <- read_codonw(example_dir) col <- c("#FF6F61", "#6B5B95", "#88B04B", "#F7CAC9", "#92A8D1", "#9b59b6") ggmRSCU( data = data, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill, blocks_colors = col, shape = TRUE, merge = TRUE, sub_axis = "fill" ) # Example 3: Visualize RSCU from data loaded using `read_mega` function # Load data from `read_mega` output example_dir <- system.file("extdata", "mega", package = "ggmRSCU") df <- read_mega(example_dir) ggmRSCU( data = df, x = c(AA, Species), y = c(RSCU, Codon), fill = Fill ) # Example 4: Additional Dataset (scoring_data) col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = scoring_data, x = c(Sample, Replicate), y = c(Score, CellType), fill = CellType, blocks_colors = col, shape = TRUE, sub_axis = "Fill" ) # Example 5: Additional Dataset (hydrogen_data) library(tidyr) df_long <- hydrogen_data %>% pivot_longer(cols = -c(Year, type), names_to = "category", values_to = "value") col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = df_long, x = c(Year, category), y = c(value, type), fill = type, shape = TRUE, blocks_colors = col ) # Example 6: Additional Dataset (energy_data) long_data <- energy_data %>% pivot_longer(cols = 3:7, names_to = "Source", values_to = "Production") col <- c("#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F", "#9b59b6") ggmRSCU( data = long_data, x = c(Scenario, Year), y = c(Production, Source), fill = Source, blocks_colors = col, shape = TRUE ) # Example 7: Additional Dataset (financial_data) ggmRSCU( data = financial_data, x = c(Month, Category), y = c(Amount), fill = Subcategory ) # Example 8: Additional Dataset (genomic_data) ggmRSCU( data = genomic_data, x = c(Species, haplotypes), y = c(Mb), fill = Region, shape = TRUE )
Hydrogen production by technology and scenario (2010–2070)
data(hydrogen_data)data(hydrogen_data)
A data frame with 27 rows and 8 variables:
Year of production
Production technology (e.g., CG, SMR, EL, BG+CCS, etc.)
Production under the NPI scenario
Production under the C scenario
Production under the MNET scenario
Production under the gCCS scenario
Production under the PBIO scenario
Production under the all scenario
Data simulated from:
Zanon-Zotin, M., Baptista, L.B., Draeger, R. et al. (2024) "Unaddressed
non-energy use in the chemical industry can undermine fossil fuels phase-out"
<doi:10.1038/s41467-024-52434-y>
This function processes .blk files from the CodonW software. It reads the files, cleans the data, and returns a combined data frame.
read_codonw(folder_path)read_codonw(folder_path)
folder_path |
The path to the folder containing .blk files to process. |
A combined data frame of all processed .blk files with columns:
AA |
Amino acid abbreviation |
Codon |
DNA codon sequence |
RSCU |
Relative Synonymous Codon Usage value |
Fill |
Position index within amino acid group |
Species |
Species name derived from file name |
# Using example data example_dir <- system.file("extdata", "codonw", package = "ggmRSCU") result <- read_codonw(example_dir) head(result)# Using example data example_dir <- system.file("extdata", "codonw", package = "ggmRSCU") result <- read_codonw(example_dir) head(result)
Reads and processes MEGA-generated CSV files from a specified directory, extracts codon usage data, and returns a consolidated dataset.
read_mega(folder_path)read_mega(folder_path)
folder_path |
Path to directory containing MEGA CSV files. Files should follow MEGA format with columns for Codon, Count, and RSCU values. |
Processes files through these steps:
Identifies valid header lines containing "Codon", "Count", and "RSCU"
Extracts codon/amino acid data below headers
Calculates position indices (Fill) for synonymous codons
Combines all species data into a single data frame
A data frame containing:
Amino acid abbreviation
DNA codon sequence
Position index within amino acid group
Species identifier from file name
Calculated Relative Synonymous Codon Usage value
# Using example data example_dir <- system.file("extdata", "mega", package = "ggmRSCU") result <- read_mega(example_dir) head(result)# Using example data example_dir <- system.file("extdata", "mega", package = "ggmRSCU") result <- read_mega(example_dir) head(result)
Processes Relative Synonymous Codon Usage (RSCU) data from PhyloSuite CSV files, preparing it for visualization with the 'ggmRSCU' package.
read_phy(folder_path)read_phy(folder_path)
folder_path |
Path to folder containing CSV files. Files must contain: - 'RSCU' column - Species-reflecting file names - Required columns: AA, Codon, Fill, RSCU |
Performs:
File validation for required structure
Species name extraction
Data reshaping to long format
NA removal
Visualization-ready data frame with:
Amino acid code
RNA triplet
Position index (1-6)
Organism identifier
Normalized usage values
# Using example data example_dir <- system.file("extdata", "phy", package = "ggmRSCU") result <- read_phy(example_dir) head(result)# Using example data example_dir <- system.file("extdata", "phy", package = "ggmRSCU") result <- read_phy(example_dir) head(result)
A longitudinal dataset documenting codon usage bias patterns across multiple species. Contains normalized RSCU values calculated from protein- coding sequences.
data(rscu_data)data(rscu_data)
A tibble with 186 rows and 5 columns:
character Standard three-letter amino acid code with
isotype variants indicating synonymous codon groups: e.g.,
"Phe", "Leu1", "Leu2"
character RNA codon triplet in uppercase:
e.g., "UUU", "CUA"
integer Group index classifying synonymous codons
for each amino acid (e.g., 1, 2, etc.)
character Organism identifier in
[Accession].[Version] format: e.g.,
"MK693137.1"
numeric Relative Synonymous Codon Usage value
Data derived from previously published studies based on protein- coding sequences analyzed in our laboratory. Accession numbers correspond to NCBI GenBank.
Simulated dataset containing normalized immune cell type scores across biological samples with technical replicates.
data(scoring_data)data(scoring_data)
A data frame with 207 rows and 5 variables:
Character vector indicating sample groups (e.g., "A-CP4-T", "CP4", "NTC")
Character vector identifying technical replicates (Rep1-Rep6)
Numeric total score per replicate for normalization
Character vector specifying immune cell types
Numeric normalized cell type score
Self-created simulated data.