| Title: | Ultrahigh-Resolution Mass Spectrometry Data Evaluation for Complex Organic Matter |
|---|---|
| Description: | Provides tools for assigning molecular formulas from exact masses obtained by ultrahigh-resolution mass spectrometry. The methodology follows the workflow described in Leefmann et al. (2019) <doi:10.1002/rcm.8315>. The package supports the inspection, filtering and visualization of molecular formula data and includes utilities for calculating common molecular parameters (e.g., double bond equivalents, DBE). A graphical user interface is available via the 'shiny'-based 'ume' application. |
| Authors: | Boris Koch [aut, cre] (ORCID: <https://orcid.org/0000-0002-8453-731X>), Stephan Frickenhaus [ctb] (ORCID: <https://orcid.org/0000-0002-0356-9791>), Oliver Lechtenfeld [ctb] (ORCID: <https://orcid.org/0000-0001-5313-6014>), Tim Leefmann [ctb] (ORCID: <https://orcid.org/0000-0002-5784-8657>), Fabian Moye [ctb] (ORCID: <https://orcid.org/0000-0002-4632-5033>) |
| Maintainer: | Boris Koch <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.6.1 |
| Built: | 2026-05-09 17:52:38 UTC |
| Source: | https://github.com/cran/ume |
Annotate molecular formula categories using ume::known_mf.
Join molecular formula data and metadata about known formulas
(e.g. annotate carboxylic-rich alicyclic molecules; CRAM).
The name of the molecular formula column will be set to "mf".
This function works with:
a vector of molecular formulas: returns a 2-column data.table(mf, categories)
a data.table with a formula column: returns the table with an added categories column
add_known_mf(mfd, mf_col = "mf", known_mf = ume::known_mf, wide = FALSE, ...)
mfd |
Either (1) a character vector of molecular formulas, or (2) a data.frame / data.table containing such a column. |
mf_col |
Name of the molecular formula column if |
known_mf |
data.table with known molecular formulas ( |
wide |
Logical. If TRUE, return one column per category (CRAM, surfactant, ...).
If FALSE (default), return only a single |
... |
Additional arguments passed to methods. |
A data.table containing additional columns having information on formula categories
Boris P. Koch
CRAM Hertkorn N., Benner R., Frommberger M., Schmitt-Kopplin P., Witt M., Kaiser K., Kettrup A., Hedges J.I. (2006). Characterization of a major refractory component of marine dissolved organic matter. Geochimica et Cosmochimica Acta, 70, 2990-3010. doi:10.1016/j.gca.2006.03.021 Surfactants Lechtenfeld O.J., Koch B.P., Gasparovic B., Frka S., Witt M., Kattner G. (2013). The influence of salinity on the molecular and optical properties of surface microlayers in a karstic estuary. Marine Chemistry, 150, 25-38. doi:10.1016/j.marchem.2013.01.006
Ideg Flerus R., Lechtenfeld O.J., Koch B.P., McCallister S.L., Schmitt-Kopplin P., Benner R., Kaiser K., Kattner G. (2012). A molecular perspective on the ageing of marine dissolved organic matter. Biogeosciences, 9, 1935-1955. doi:10.5194/bg-9-1935-2012
iTerr Medeiros P.M., Seidel M., Niggemann J., Spencer R.G.M., Hernes P.J., Yager P.L., Miller W.L., Dittmar T., Hansell D.A. (2016). A novel molecular approach for tracing terrigenous dissolved organic matter into the deep ocean. Global Biogeochemical Cycles, 30, 689-699. doi:10.1002/2015gb005320
Other Formula assignment:
calc_eval_params(),
check_formula_library(),
eval_isotopes(),
ume_assign_formulas()
add_known_mf(mfd = mf_data_demo)
This function ensures that missing isotope columns are added to the input data table (mfd), which is required for further data evaluation that considers isotope information. If any of the specified isotope columns are not already present in the data, they will be added with a default value of 0.
The function is typically used to standardize the dataset by ensuring that all expected isotopes (e.g., nitrogen-15, carbon-13) are represented, even if they are not initially present in the data. The function works by checking for the existence of each specified isotope column and adding the missing ones.
add_missing_element_columns(mfd, missing_cols = "15n")
mfd |
data.table with molecular formula data as derived from
|
missing_cols |
A character vector of isotope column names that should be checked and added if missing. By default, it includes |
A data.table object with the missing isotope columns added,
where missing columns are populated with a default value of 0.
The original mfd object is modified in place.
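The column check described above can be sketched in base R. This is only an illustration, not the package code: `add_zero_columns()` is a hypothetical helper, the input is a plain data.frame, and the sketch returns a modified copy rather than modifying the object in place.

```r
# Base-R sketch (not the ume implementation): add absent columns filled with 0.
add_zero_columns <- function(df, cols) {  # hypothetical helper
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- 0
  }
  df
}

# Hypothetical formula table with element counts only
formulas <- data.frame(
  mf = c("C6H10O6", "C3H8O"),
  c  = c(6, 3),
  h  = c(10, 8),
  o  = c(6, 1)
)

# "15n" is missing and gets added; "c" already exists and is left untouched
out <- add_zero_columns(formulas, c("15n", "c"))
print(out)
```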
Other tools:
order_columns()
# Add missing isotope columns to a demo dataset
mfd_with_isotopes <- add_missing_element_columns(mfd = mf_data_demo)
# Add a specific isotope column for Nitrogen-15 (if missing)
mfd_with_15n <- add_missing_element_columns(mfd = mf_data_demo, missing_cols = c("15n", "na"))
Flexible entry point for UME. Accepts:
data.frame / data.table peaklists
numeric m/z vectors
file paths (csv, txt, tsv, rds)
Normalizes column names, adds missing structural columns (file_id, peak_id),
removes invalid rows, validates schema, and assigns the UME peaklist class.
Creates a standardized data.table ready for formula assignment.
as_peaklist(pl, verbose = FALSE, track_original_names = TRUE, ...)
pl |
Input object representing a peaklist. Can be:
|
verbose |
logical; if |
track_original_names |
Logical (default: TRUE). If TRUE,
|
... |
Reserved for future extensions. |
A validated and normalized peaklist as a data.table
with class "ume_peaklist".
Other check ume objects:
check_formula_library(),
check_mfd()
Assigns molecular formulas to molecular masses using a predefined library.
The input peaklist (pl) is checked internally with as_peaklist(),
converted to neutral masses with calc_neutral_mass(), and assigned
molecular formulas within the mass accuracy window (ma_dev) computed by calc_ma_abs().
The input can be either:
A peaklist (data.table) containing m/z values or neutral masses
and additional metadata.
A numeric vector of m/z values or neutral masses without additional metadata
(internally checked and standardized by as_peaklist()).
assign_formulas(pl, formula_library, verbose = FALSE, ...)
pl |
Either a peaklist ( |
formula_library |
Molecular formula library: a predefined data.table used for
assigning molecular formulas to a peak list and for mass calibration. The library
requires a fixed format, including mass values for matching. Predefined libraries
are available in the R package ume.formulas and further described in
Leefmann et al. (2019). A standard library for marine dissolved organic matter is
|
verbose |
logical; if |
... |
Arguments passed on to
|
This function calculates the neutral mass of peaks in pl and
compares it to mass values in formula_library, assigning molecular formulas
based on mass accuracy thresholds. If 13C, 15N, or 34S isotope information
is missing, additional columns are added to the output table.
A data.table where each row represents a molecular formula assigned to a
mass peak. The table contains:
All columns of the input peaklist pl (e.g. mz, i_magnitude, file_id).
All columns of the input formula_library (e.g. mf, element counts).
Calculated columns:
m — neutral mass.
m_cal — exact mass of the assigned formula.
del — absolute mass error (Da).
ppm — mass error in parts per million.
mf_id — unique ID for each (file_id, mf) combination.
Added isotope columns (13C, 15N, 34S) if missing in the library.
One peak may receive zero, one, or multiple assigned formulas depending on the mass accuracy threshold.
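The matching step described above can be sketched in base R. This is a simplified illustration, not the package implementation: the three-entry library and the helper `match_mass()` are hypothetical, the element masses are rounded monoisotopic values, and the ppm error is assumed to be (measured - theoretical) / theoretical.

```r
# Monoisotopic masses (Da) of 12C, 1H and 16O, rounded
iso_mass <- c(c = 12, h = 1.00782503, o = 15.99491462)

# Hypothetical three-entry library; real libraries hold many more rows and columns
lib <- data.frame(
  mf = c("C11H10O7", "C12H14O6", "C10H6O8"),
  c  = c(11, 12, 10),
  h  = c(10, 14, 6),
  o  = c(7, 6, 8)
)
lib$m_cal <- lib$c * iso_mass[["c"]] + lib$h * iso_mass[["h"]] + lib$o * iso_mass[["o"]]

# Keep every library entry whose mass lies within +/- ma_dev ppm of m
match_mass <- function(m, lib, ma_dev) {  # hypothetical helper
  ppm <- (m - lib$m_cal) / lib$m_cal * 1e6
  keep <- abs(ppm) <= ma_dev
  hits <- lib[keep, , drop = FALSE]
  hits$m <- rep(m, nrow(hits))
  hits$ppm <- ppm[keep]
  hits
}

hits <- match_mass(254.0426527, lib, ma_dev = 0.5)
print(hits)
```

With a wider `ma_dev`, the same neutral mass can match several library entries, which is why one peak may end up with more than one row in the result.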
Boris P. Koch
# Example using demo data and demo peak list:
assign_formulas(pl = peaklist_demo, formula_library = ume::lib_demo, pol = "neg", ma_dev = 0.2, verbose = FALSE)
# Example using a given mass and UME demo library:
mfd <- assign_formulas(pl = 254.0426527, formula_library = ume::lib_demo, pol = "neutral", ma_dev = 0.5, verbose = TRUE)
Generates a data summary table that provides intensity-weighted averages for element ratios, mass accuracy, and additional parameters. Results can be grouped based on the specified grouping columns.
calc_data_summary(mfd, grp = "file_id", ...)
mfd |
data.table with molecular formula data as derived from
|
grp |
Character vector. Names of columns (e.g., sample or file identifiers) used to aggregate results. |
... |
Additional arguments passed to methods. |
This function computes a variety of weighted averages and summary statistics for mass spectrometry data
using the provided peak list (mfd). Calculated values include weighted averages for elemental counts
(e.g., Carbon, Hydrogen), elemental ratios (e.g., O/C, H/C), and additional parameters such as the base peak intensity
and summed intensities. It also calculates the aromaticity index (wa(AI)) based on the elemental composition.
If grouping columns are provided, the summary statistics are calculated for each group.
The function also joins additional indices (ideg, iterr) from related functions calc_ideg() and calc_iterr()
to the final summary table.
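As an illustration of the intensity-weighted averaging, the following base-R sketch computes a weighted mean O/C ratio per group. The table and the column name `oc` are hypothetical; only `file_id` and `i_magnitude` are taken from this manual, and the package's own computation may differ in detail.

```r
# Hypothetical two-sample table
mfd <- data.frame(
  file_id     = c(1, 1, 1, 2, 2),
  oc          = c(0.50, 0.40, 0.60, 0.30, 0.50),  # O/C ratio per formula
  i_magnitude = c(100, 300, 100, 200, 200)        # peak magnitude used as weight
)

# Intensity-weighted mean of O/C for each group
wa_oc <- sapply(
  split(mfd, mfd$file_id),
  function(d) weighted.mean(d$oc, d$i_magnitude)
)
print(wa_oc)
```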
A data.table containing the summarized results, with columns including:
Number of molecular formulas per group.
Median accuracy in parts-per-million (ppm) for the identified peaks.
Maximum ppm accuracy within a three-sigma range.
Weighted average m/z value.
Weighted average Double Bond Equivalent (DBE).
Weighted averages for elements (C, H, N, O, P, S) and ratios (O/C, H/C, N/C, S/C).
Weighted average nominal oxidation state of carbon.
Weighted average Gibbs free energy (Cox) in kJ/mol.
Weighted average aromaticity index.
Ratios derived from N/C and S/C.
Degradation index (Ideg), as calculated by calc_ideg().
Terrestrial indices (Iterr, Iterr2) from calc_iterr().
Median intensity value.
Intensity of the base peak.
Summed intensity of all peaks.
Other calculations:
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Example using demo data, grouping by file ID
calc_data_summary(mfd = mf_data_demo, grp = c("file_id"))
Calculates the Double Bond Equivalent (DBE) for a given
neutral molecular formula.
DBE is a measure of unsaturation, representing the total number of rings
and pi bonds in a molecule.
The function uses the ume::masses data table to determine valence information
for each element in the input molecular formula.
It can be calculated from the molecular formula using atomic valences:
$$\mathrm{DBE} = 1 + \frac{1}{2} \sum_i n_i \, (v_i - 2)$$
where:
$n_i$: number of atoms of element $i$
$v_i$: valence of element $i$ (e.g., C = 4, H = 1,
N = 3, O = 2, S = 2/4/6 depending on bonding state)
This formula works for any set of elements as long as their valence
is known. Be aware that some elements can have more than one valence
under normal conditions (e.g., sulfur can have valences of 2, 4 and 6).
The function uses the valence given in ume::masses$valence.
For a reasonable neutral molecule DBE has an integer value >=0. A higher DBE indicates a more unsaturated structure; a lower DBE indicates a more saturated structure.
calc_dbe(mfd, masses = ume::masses, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
masses |
A data.table. Defaults to |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
This function computes DBE based on the molecular formula specified in mfd.
mfd can be a data.table or a character string or character vector of molecular formula strings.
DBE is calculated by summing (valence - 2) multiplied by the atom count over all elements and isotopes in the formula, dividing that sum by 2, and adding 1. Elements with a valence of 2 contribute zero and therefore do not affect the result.
The function stops with an informative error if valence information is missing
for any element or isotope present in mfd.
A numeric vector of the same length as the number of rows in mfd,
where each entry represents the calculated DBE for the corresponding molecular formula.
The result vector is named 'dbe'.
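The valence-based calculation can be reproduced by hand with a short base-R sketch. The valence vector below is an assumption made for illustration (the package reads valences from ume::masses), and `dbe_from_counts()` is a hypothetical helper that takes element counts rather than formula strings.

```r
# Assumed valences for this sketch; ume takes them from ume::masses$valence
valence <- c(C = 4, H = 1, N = 3, O = 2, Br = 1)

# DBE = 1 + sum(n_i * (v_i - 2)) / 2, with an error if a valence is unknown
dbe_from_counts <- function(counts, valence) {  # hypothetical helper
  missing_el <- setdiff(names(counts), names(valence))
  if (length(missing_el) > 0) {
    stop("No valence for: ", paste(missing_el, collapse = ", "))
  }
  1 + sum(counts * (valence[names(counts)] - 2)) / 2
}

dbe_a <- dbe_from_counts(c(C = 6, H = 10, O = 6), valence)   # C6H10O6  -> 2
dbe_b <- dbe_from_counts(c(C = 6, H = 10, Br = 2), valence)  # C6H10Br2 -> 1
print(c(dbe_a, dbe_b))
```

Oxygen has a valence of 2, so its term is zero and the O count has no influence on the result.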
Other calculations:
calc_data_summary(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Example with user-defined data
calc_dbe("C6H10O6")
calc_dbe("C6H10Br2")
calc_dbe(c("C3[13C1]H10O4", "C6H10O6"))
# Example with demo data from UME package
calc_dbe(mfd = mf_data_demo)
This function calculates and adds several evaluation parameters as additional columns to the mfd data table.
These parameters are essential for evaluating the molecular structure and isotopic distribution, enabling further analysis.
For a detailed description of the output table, see help(mf_data_demo).
calc_eval_params(mfd, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
verbose |
logical; if |
... |
Additional arguments passed to methods. |
The original data.table mfd with additional evaluation columns:
nm: Nominal molecular mass; calculated if not already present.
dbe: Double Bond Equivalent (measure of unsaturation).
kmd: Kendrick mass defect for CH4 versus O exchange.
O/C, H/C, N/C, S/C: Element ratios for a molecular formula.
nsp_type, snp_check: Types of combinations of N, S, and P atoms in a formula.
nosc: Weighted average nominal oxidation state of carbon.
delG0_Cox: Weighted average Gibbs free energy (Cox) in kJ/mol.
ai: Aromaticity index.
ppm_filt: A mass accuracy threshold calculated for each spectrum.
Boris P. Koch
Hughey C.A., Hendrickson C.L., Rodgers R.P., Marshall A.G., Qian K.N. (2001). Kendrick mass defect spectrum: A compact visual analysis for ultrahigh-resolution broadband mass spectra. Analytical Chemistry, 73, 4676-4681. doi:10.1021/ac010560w
Koch B.P., Dittmar T. (2006). From mass to structure: an aromaticity index for high-resolution mass data of natural organic matter. Rapid Communications in Mass Spectrometry, 20, 926-932. doi:10.1002/rcm.2386
LaRowe D.E., Van Cappellen P. (2011). Degradation of natural organic matter: A thermodynamic analysis. Geochimica et Cosmochimica Acta, 75, 2030-2042. doi:10.1016/j.gca.2011.01.020
Other Formula assignment:
add_known_mf(),
check_formula_library(),
eval_isotopes(),
ume_assign_formulas()
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Example usage with a demo molecular formula dataset
mfd_with_params <- calc_eval_params(mfd = mf_data_demo, verbose = TRUE)
This function calculates the exact monoisotopic mass for each molecule
in a given data table based on the specified isotope composition. Exact masses of
elements and isotopes used in the calculation are retrieved from the ume::masses data,
based on data from NIST (https://www.nist.gov/pml/atomic-weights-and-isotopic-compositions-relative-atomic-masses).
calc_exact_mass(mfd, ...)
mfd |
data.table with molecular formula data as derived from
|
... |
Additional arguments passed to methods. |
A numeric vector of the calculated exact monoisotopic mass.
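For orientation, the exact mass of a formula given as element counts can be computed by hand as a sum of count times isotope mass. The sketch below uses base R and hard-coded monoisotopic masses for 12C, 1H and 16O; it is an illustration only, since the package takes its masses from ume::masses.

```r
# Monoisotopic masses (Da) of 12C, 1H and 16O
iso_mass <- c(c = 12, h = 1.00782503223, o = 15.99491461957)

# Element counts for C3H8O, in the column layout used in the manual's example
counts <- data.frame(c = 3, h = 8, o = 1)

# Row-wise sum of count * isotope mass
exact_mass <- as.vector(as.matrix(counts[, names(iso_mass)]) %*% iso_mass)
print(exact_mass, digits = 10)
```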
Boris P. Koch
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Example with demo data
calc_exact_mass(mfd = mf_data_demo)
# Custom example
calc_exact_mass(data.table::data.table(c = 3, h = 8, o = 1))
This function calculates the degradation index ('Ideg') following Flerus et al. (2012). High Ideg values indicate 'older' marine DOM (i.e., a higher contribution of peaks that correlate negatively with delta14C), while low values indicate 'younger' DOM (i.e., a higher contribution of peaks that correlate positively with delta14C).
Ideg is computed as the ratio of summed magnitudes for five negative (NEG) molecular formulas to the total summed magnitudes of five positive (POS) and five negative (NEG) molecular formulas:
$$I_{deg} = \frac{\sum \mathrm{NEG}}{\sum \mathrm{NEG} + \sum \mathrm{POS}}$$
The index ranges from 0 to 1 and is valid only if all required formulas (n = 10) are present. Ideg depends strongly on the type of sample preparation, ionization method, and instrument settings, and should only be interpreted for relative changes within the same dataset.
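The ratio can be worked through in base R with the formula sets labelled NEG and POS in the calc_ideg() example of this manual. The helper `ideg_one()` is hypothetical, and returning NA when a required formula is missing is an assumption of this sketch, not documented package behavior.

```r
# Formula sets as labelled in the calc_ideg() example
neg <- c("C17H20O9", "C19H22O10", "C20H22O10", "C20H24O11", "C21H26O11")
pos <- c("C13H18O7", "C14H20O7", "C15H22O7", "C15H22O8", "C16H24O8")

mfd <- data.frame(
  file_id     = 1,
  mf          = c(neg, pos),
  i_magnitude = c(1200, 900, 1500, 700, 800,     # NEG magnitudes
                  2000, 1800, 2200, 1600, 1900)  # POS magnitudes
)

# sum(NEG) / (sum(NEG) + sum(POS)) for one group; NA if any formula is absent
ideg_one <- function(d) {  # hypothetical helper
  if (!all(c(neg, pos) %in% d$mf)) return(NA_real_)
  sum_neg <- sum(d$i_magnitude[d$mf %in% neg])
  sum_pos <- sum(d$i_magnitude[d$mf %in% pos])
  round(sum_neg / (sum_neg + sum_pos), 3)
}

ideg <- sapply(split(mfd, mfd$file_id), ideg_one)
print(ideg)  # 5100 / (5100 + 9500) = 0.349
```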
calc_ideg(
  mfd,
  mf_col = "mf",
  magnitude_col = "i_magnitude",
  grp = "file_id",
  ...
)
mfd |
data.table with molecular formula data as derived from
|
mf_col |
Character. The name of the column containing molecular formulas. Default is "mf". |
magnitude_col |
Character. The name of the column containing magnitude values (absolute or relative). Default is "i_magnitude". |
grp |
Character vector. Names of columns (e.g., sample or file identifiers) used to aggregate results. |
... |
Additional arguments passed to methods. |
A data.table with columns:
grp: Grouping variable.
ideg: Calculated degradation index (rounded to 3 decimals).
ideg_n: Number of assigned formulas used in the calculation.
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Create a minimal dataset containing all required POS and NEG formulas
library(data.table)
demo_ideg <- data.table(
  file_id = 1,
  mf = c(
    "C17H20O9", "C19H22O10", "C20H22O10", "C20H24O11", "C21H26O11",  # NEG
    "C13H18O7", "C14H20O7", "C15H22O7", "C15H22O8", "C16H24O8"       # POS
  ),
  i_magnitude = c(
    1200, 900, 1500, 700, 800,     # NEG intensities
    2000, 1800, 2200, 1600, 1900   # POS intensities
  )
)
calc_ideg(
  mfd = demo_ideg,
  mf_col = "mf",
  magnitude_col = "i_magnitude",
  grp = "file_id"
)
Calculates the theoretical isotope pattern of a molecular formula based on natural isotope abundances using multinomial/binomial isotope combinations.
calc_isotope_pattern(
  mf,
  masses = ume::masses,
  threshold = 1e-12,
  rel_threshold = 1e-06,
  max_peaks = 5000L,
  mass_digits = 6L
)
mf |
A character vector of molecular formulas or a |
masses |
A data.table. Defaults to |
threshold |
Numeric. Minimum absolute isotope probability retained during intermediate calculations. |
rel_threshold |
Numeric. Minimum relative abundance retained in the final isotope pattern. |
max_peaks |
Integer. Maximum number of isotope peaks retained during intermediate calculations. |
mass_digits |
Integer. Number of decimal places used to merge nearly identical masses during intermediate calculations. |
Calculate Theoretical Isotope Pattern
The function calculates all relevant isotope combinations for each element in a molecular formula and combines them into a theoretical isotope pattern.
For each isotope peak, the function returns the exact mass, nominal mass,
absolute probability, relative abundance, elemental molecular formula (mf),
and isotope-specific molecular formula (mf_iso).
The isotope-specific molecular formula uses bracket notation, for example
[12C2][13C][1H6][16O].
Very small isotope peaks can be removed using threshold and
rel_threshold to keep the output compact.
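For a single element with two isotopes, the isotope combinations follow a binomial distribution. The base-R sketch below covers only the carbon atoms of C2H6O and assumes a 13C abundance of about 1.07 %; it illustrates the principle and is not the package's full multi-element calculation.

```r
p13c <- 0.0107  # approximate natural abundance of 13C (assumption for this sketch)
n_c  <- 2       # number of carbon atoms in C2H6O

# Probability of finding k = 0, 1, 2 atoms of 13C among the n_c carbon atoms
k    <- 0:n_c
prob <- dbinom(k, size = n_c, prob = p13c)

pattern <- data.frame(
  n_13c         = k,
  probability   = prob,
  rel_abundance = prob / max(prob)  # normalized to the most abundant combination
)

# Drop very small peaks, analogous in spirit to rel_threshold
pattern <- pattern[pattern$rel_abundance >= 1e-6, ]
print(pattern)
```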
A data.table with one row per isotope peak and the following columns:
Elemental molecular formula.
Isotope-specific molecular formula.
Exact mass of the isotope composition.
Nominal mass of the isotope composition.
Absolute probability of the isotope composition.
Relative abundance normalized to the most abundant isotope peak.
Peak number ordered by increasing mass.
Other isotopes:
create_isotope_expanded_table(),
eval_isotopes(),
uplot_isotope_precision()
calc_isotope_pattern("C2H6O")
calc_isotope_pattern("FeC10H10", rel_threshold = 1e-4)
Calculates the terrestrial index 'Iterr' and the modified index 'Iterr2' after Medeiros et al. (2016). High Iterr values indicate a higher contribution of terrestrial material (i.e., a higher contribution of peaks that correlate positively with delta13C), while low values indicate less terrestrial material (i.e., a higher contribution of peaks that correlate negatively with delta13C). Iterr and Iterr2 are calculated from a peak magnitude ratio of 50 or 5 POS and NEG formulas, respectively:
sum(terr) / (sum(terr) + sum(marine))
Both indices therefore range between 0 and 1. Absolute values depend strongly on factors such as the type of solid-phase extraction, ionization method, and instrument settings, so values should only be interpreted as relative changes. For an appropriate evaluation, all index formulas must be present.
calc_iterr(
  mfd,
  mf_col = "mf",
  magnitude_col = "i_magnitude",
  grp = "file_id",
  ...
)
mfd |
data.table with molecular formula data as derived from
|
mf_col |
Name of the column containing molecular formulas (string) |
magnitude_col |
Name of the column containing absolute or relative mass peak magnitudes (string). |
grp |
Character vector. Names of columns (e.g., sample or file identifiers) used to aggregate results. |
... |
Additional arguments passed to methods. |
Iterr and iterr2 values
Medeiros P.M., Seidel M., Niggemann J., Spencer R.G.M., Hernes P.J., Yager P.L., Miller W.L., Dittmar T., Hansell D.A. (2016). A novel molecular approach for tracing terrigenous dissolved organic matter into the deep ocean. Global Biogeochemical Cycles, 30, 689-699. doi:10.1002/2015gb005320
library(data.table)
# Create a minimal dataset containing all required
# POS, NEG, POS2, and NEG2 formulas for demonstration
demo_iterr <- data.table(
  file_id = 1,
  mf = c(
    # NEG (Iterr)
    'C13H12O5','C15H14O4','C14H12O5','C14H14O5','C13H12O6',
    'C16H16O4','C15H14O5','C14H12O6','C15H16O5','C14H14O6',
    'C16H14O5','C16H16O5','C15H14O6','C15H16O6','C14H14O7',
    'C17H16O5','C16H14O6','C17H18O5','C16H16O6','C15H14O7',
    'C17H16O6','C16H14O7','C18H18O6','C17H16O7','C17H18O7',
    'C18H16O7','C18H18O7','C17H16O8','C19H18O7','C20H20O7',
    'C19H18O8','C20H18O9','C19H16O10','C21H20O9','C20H18O10',
    'C22H22O9','C21H20O10','C23H22O10','C24H24O10','C25H26O10',
    # POS (Iterr)
    'C15H19NO6','C15H21NO6','C17H21NO7','C17H23NO7','C17H22O8',
    'C16H21NO8','C17H20N2O7','C17H19NO8','C18H23NO7','C17H21NO8',
    'C18H24O8','C16H19NO9','C17H23NO8','C17H22O9','C17H24O9',
    'C18H21NO8','C17H19NO9','C18H23NO8','C18H22O9','C17H21NO9',
    'C18H24O9','C18H20N2O8','C18H21NO9','C19H24O9','C18H23NO9',
    'C18H22O10','C18H24O10','C20H24O9','C19H22O10','C20H26O9',
    'C19H24O10','C19H26O10','C20H24O10','C20H26O10','C19H24O11',
    'C20H24O11','C20H26O11','C20H26O12','C22H28O11','C21H28O12',
    # NEG2 (Iterr2)
    'C17H18O7','C18H18O7','C17H16O7','C17H16O8','C15H16O6',
    # POS2 (Iterr2)
    'C20H24O9','C20H24O10','C19H22O10','C17H21NO8','C20H26O9'
  ),
  # Assign magnitude values (arbitrary but valid)
  i_magnitude = c(
    rep(1000, 40),  # NEG
    rep(2000, 40),  # POS
    rep(1500, 5),   # NEG2
    rep(1800, 5)    # POS2
  )
)
calc_iterr(
  mfd = demo_iterr,
  mf_col = "mf",
  magnitude_col = "i_magnitude",
  grp = "file_id"
)
Calculates relative mass accuracy (ma, in parts per million) as:
$$ma = \frac{m - m_{cal}}{m_{cal}} \times 10^{6}$$
where:
$m$ = measured mass
$m_{cal}$ = calculated / theoretical (exact) mass
Returned value is rounded to 4 digits. In this context the theoretical mass is represented by the mass of the assigned molecular formula. A small absolute ppm value indicates a very precise measurement and increases confidence in correct molecular formula assignment.
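A worked example in base R, assuming the conventional definition of the relative mass error (measured minus theoretical mass, divided by the theoretical mass, times 10^6). The helper `ppm_error()` is hypothetical, and the sign convention is an assumption that may differ from the package.

```r
# Relative mass error in ppm, rounded to 4 decimals (assumed convention)
ppm_error <- function(m, m_cal) {  # hypothetical helper
  round((m - m_cal) / m_cal * 1e6, 4)
}

# Measured mass slightly below the theoretical mass gives a negative error
err <- ppm_error(m = 264.08641, m_cal = 264.08653)
print(err)
```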
calc_ma(m, m_cal, ...)
m |
Measured mass |
m_cal |
Calculated (theoretical) mass. |
... |
Additional arguments passed to methods. |
A numeric vector of mass accuracy (rounded to 4 decimals).
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Use of single values
calc_ma(m = 264.08641, m_cal = 264.08653)
# Use in a molecular formula table
calc_ma(m = mf_data_demo$m, m_cal = mf_data_demo$m_cal)
mf_data_demo[, .(m, m_cal, accuracy_in_ppm = calc_ma(m, m_cal))]
This function calculates the absolute mass accuracy range for a neutral mass (m) at a given mass accuracy (ma_dev).
calc_ma_abs(m, ma_dev, ...)
m |
Measured mass |
ma_dev |
Mass accuracy in +/- parts per million (ppm) |
... |
Additional arguments passed to methods. |
Returns a list with two values: m_min, m_max
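A minimal base-R sketch of such a window, assuming it is symmetric around m with a half-width of m * ma_dev / 10^6. The helper `ppm_window()` is hypothetical and only mirrors the documented return names m_min and m_max.

```r
# Absolute mass window for a tolerance of +/- ma_dev ppm around m
ppm_window <- function(m, ma_dev) {  # hypothetical helper
  delta <- m * ma_dev / 1e6
  list(m_min = m - delta, m_max = m + delta)
}

win <- ppm_window(m = 327.0134, ma_dev = 0.5)
print(win)
```

At m = 327.0134 and 0.5 ppm the half-width is about 0.00016 Da, which shows how narrow the matching window is at this mass.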
calc_ma_abs(m = 327.0134, ma_dev = 0.5)
Calculates neutral molecular masses for singly charged ions with full numerical precision. No user options are modified.
The conversion used is:
negative mode: m = mz + 1.0072763
positive mode: m = mz - 1.0072763
neutral: m = mz
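The three conversions can be written as a small base-R function. This is a sketch of the arithmetic listed above, with a hypothetical helper name `to_neutral()`; it does not reproduce the input checks of the package function.

```r
proton_shift <- 1.0072763  # mass shift used in the conversions above (Da)

# Convert m/z of singly charged ions to neutral mass
to_neutral <- function(mz, pol = c("neg", "pos", "neutral")) {  # hypothetical helper
  pol <- match.arg(pol)
  switch(pol,
    neg     = mz + proton_shift,
    pos     = mz - proton_shift,
    neutral = mz
  )
}

m_neg <- to_neutral(199.32, pol = "neg")
m_pos <- to_neutral(199.32, pol = "pos")
print(c(m_neg, m_pos), digits = 10)
```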
calc_neutral_mass(mz, pol = c("neg", "pos", "neutral"), ...)
mz |
Numeric vector of m/z values (> 0). |
pol |
Character: |
... |
Additional arguments passed to methods. |
Numeric vector of neutral masses.
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_nm(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
calc_neutral_mass(199.32, pol = "neg")
Computes the nominal mass (integer mass) for each molecular formula in the provided data.
This function uses isotope masses stored in the dataset ume::masses, based on values from NIST,
for accurate calculation of each element's nominal mass contribution.
calc_nm(mfd, ...)
mfd |
data.table with molecular formula data as derived from
|
... |
Additional arguments passed to methods. |
The function calculates the nominal mass of each molecular formula by retrieving the relevant
integer mass values of isotopes from ume::masses. This information is processed to create a calculation
string which is then evaluated to obtain the nominal mass for each molecule.
The nominal mass is derived by summing the integer masses of each constituent element in the formula, where the integer mass for each element is multiplied by the number of atoms of that element in the molecule.
Note: This function depends on ume::get_isotope_info() for isotope data retrieval.
A numeric vector of the calculated nominal mass.
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_norm_int(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
# Example using a demo dataset to calculate nominal mass
calc_nm(mfd = mf_data_demo)
Computes normalized peak intensities for a molecular formula dataset and adds the results
as additional columns to the input data.table (mfd). It also calculates:
the number of molecular formula assignments per peak (n_assignments)
the total occurrences of each formula across the dataset (n_occurrence)
Normalized intensities are stored in a new column norm_int, and the reference
intensity used for normalization is stored in int_ref.
Supported normalization methods:
"none" – no normalization; raw peak intensities are copied to norm_int
"bp" – normalized to the base peak intensity per spectrum
"sum" – normalized by the total sum of intensities per spectrum
"sum_ubiq" – normalized by the sum of intensities of ubiquitous peaks across the dataset
"sum_rank" – normalized by the sum of the top n_rank most intense peaks per spectrum
"euc" – Euclidean normalization (optional, not implemented in current version)
calc_norm_int(
  mfd,
  ms_id = "file_id",
  peak_id = "peak_id",
  peak_magnitude = "i_magnitude",
  normalization = c("bp", "sum", "sum_ubiq", "sum_rank", "none"),
  n_rank = 200,
  verbose = FALSE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
ms_id |
Character; name of the column identifying individual spectra (default: |
peak_id |
Character; name of the column identifying unique peaks (default: |
peak_magnitude |
Character; name of the column containing peak intensity values (default: |
normalization |
Character; normalization method to apply. One of |
n_rank |
Integer; number of top-ranked peaks to use for |
verbose |
logical; if |
... |
Additional arguments (currently unused). |
A data.table identical to mfd but with additional columns:
Normalized peak intensity based on selected method.
Reference intensity used for normalization (e.g., sum, base peak).
Number of formula assignments per peak (calculated internally).
Number of occurrences of each formula across all spectra (calculated internally).
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_number_assignment(),
calc_number_occurrence(),
calc_recalibrate_ms()
mfd_norm <- calc_norm_int(
  mfd = mf_data_demo,
  normalization = "sum_ubiq"
)
This function calculates the number of molecular formula (mf) assignments for each individual peak (peak_id) within a specified mass spectrum (ms_id). It counts the occurrences of molecular formulas assigned to each peak and returns a vector of counts corresponding to the number of assignments for each unique combination of mass spectrum ID, peak ID, and molecular formula.
calc_number_assignment(ms_id, peak_id, mf, ...)
ms_id |
A vector containing the mass spectrum ID for each peak. |
peak_id |
A vector containing the peak ID for each peak. |
mf |
Character vector of molecular formula(s)
(e.g., |
... |
Additional arguments passed to methods. |
A vector of integer counts representing the number of molecular formula assignments for each unique combination of mass spectrum ID, peak ID, and molecular formula.
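The counting idea can be sketched with base R's ave(). This is an assumption-laden illustration, not the package implementation: it counts rows per (spectrum, peak) pair, and the input vectors are made up so that one peak carries two candidate formulas.

```r
# Made-up input: peak 1 of "file1" has two candidate formulas
ms_id   <- c("file1", "file1", "file1", "file2")
peak_id <- c(1, 1, 2, 1)
mf      <- c("C10H10N2O8", "C11H14O9", "C10H12N2O8", "C10H10N2O8")

# For every row, count how many rows share its spectrum and peak
n_assignments <- ave(seq_along(mf), ms_id, peak_id, FUN = length)
n_assignments  # 2 2 1 1
```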
Other calculations:
calc_data_summary(),
calc_dbe(),
calc_eval_params(),
calc_exact_mass(),
calc_ideg(),
calc_ma(),
calc_neutral_mass(),
calc_nm(),
calc_norm_int(),
calc_number_occurrence(),
calc_recalibrate_ms()
ms_ids <- c("file1", "file1", "file2", "file2", "file3")
peak_ids <- c(1, 2, 2, 3, 4)
mfs <- c("C10H10N2O8", "C10H12N2O8", "C10H10N2O8", "C10H11NOS4", "C10H24N4O2S")
n_assignments <- calc_number_assignment(ms_id = ms_ids, peak_id = peak_ids, mf = mfs)
print(n_assignments)
mf_data_demo[, calc_number_assignment(file_id, peak_id, mf)]
This function calculates Pielou's evenness index, a measure of the distribution of abundances across molecular formulas. Evenness ranges from 0 (one molecular formula dominates) to 1 (all formulas are equally abundant).
Evenness is derived using the Shannon index:
$$J = \frac{H}{\ln(S)}$$
where:
$H$ is the Shannon diversity index.
$S$ is the number of unique molecular formulas.
If there is only one molecular formula, evenness is defined as 1.
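A minimal base-R sketch of this calculation follows. It assumes the natural logarithm and counts only formulas with positive abundance; `pielou_evenness()` is a hypothetical helper, not the package function.

```r
pielou_evenness <- function(magnitude) {
  # Relative abundances of formulas that are actually present
  p <- magnitude[magnitude > 0] / sum(magnitude)
  s <- length(p)
  if (s <= 1) return(1)        # a single formula is defined as even
  h <- -sum(p * log(p))        # Shannon index (natural log assumed)
  h / log(s)                   # scale by the maximum possible value
}

pielou_evenness(c(1, 1, 1))                 # perfectly even
pielou_evenness(c(1982375, 2424, 312410))   # dominated by one formula
```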
calc_pielou_evenness(mf, magnitude)
mf |
Character vector. A list of unique molecular formulas. |
magnitude |
Numeric vector. A list of respective intensities (abundances) for each molecular formula.
Must be non-negative and have the same length as |
A single numeric value representing Pielou's evenness.
calc_pielou_evenness(
  mf = c("C10H20O5", "C12H18O3", "C18H30O6"),
  magnitude = c(1982375, 2424, 312410)
)
The Shannon diversity index is calculated to quantify the diversity of molecular formulas based on their relative abundances. This index considers both the richness (number of unique formulas) and the evenness (distribution of abundances). Higher values indicate greater diversity.
The Shannon index is defined as:
$$H = -\sum_{i} p_i \ln(p_i)$$
where:
$p_i$ is the relative abundance of the $i$-th molecular formula.
Zero-abundance formulas are excluded from the calculation.
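The definition translates into a few lines of base R. The helper `shannon_index()` below is a hypothetical sketch that assumes the natural logarithm; it mirrors the documented handling of zero abundances but is not the package code.

```r
shannon_index <- function(magnitude) {
  if (sum(magnitude) == 0) return(0)   # nothing measured: index is 0
  p <- magnitude / sum(magnitude)      # relative abundances
  p <- p[p > 0]                        # drop zero-abundance formulas
  -sum(p * log(p))
}

shannon_index(c(1, 1, 1))   # equals log(3) for three equally abundant formulas
shannon_index(c(5, 0))      # a single formula present gives 0
```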
calc_shannon_index(mf, magnitude)
mf |
Character vector. A list of unique molecular formulas. |
magnitude |
Numeric vector. A list of respective abundances (intensities) for each molecular formula.
Must be non-negative and have the same length as |
A single numeric value representing the Shannon diversity index. Returns 0 if magnitude is all zeros.
calc_shannon_index(
  mf = c("C10H20O5", "C12H18O3", "C18H30O6"),
  magnitude = c(1982375, 2424, 312410)
)
The Simpson diversity index is calculated to measure the probability that two randomly selected individuals (e.g., molecular formulas) belong to the same category. It quantifies the dominance or evenness within a dataset.
The Simpson index is defined as:
$$D = \sum_{i} p_i^2$$
where:
$p_i$ is the relative abundance of the $i$-th molecular formula.
The index ranges between 0 and 1:
A value near 0 indicates high diversity (even distribution of abundances).
A value of 1 indicates no diversity (one molecular formula dominates).
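A short base-R sketch shows the behaviour at both ends of the range. It assumes the index is the sum of squared relative abundances, which matches the interpretation given above; `simpson_index()` is a hypothetical helper.

```r
simpson_index <- function(magnitude) {
  if (sum(magnitude) == 0) return(0)   # all-zero input returns 0
  p <- magnitude / sum(magnitude)      # relative abundances
  sum(p^2)
}

simpson_index(c(1, 1, 1, 1))   # even distribution: 1 / 4
simpson_index(c(7, 0, 0))      # one formula dominates: 1
```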
calc_simpson_index(mf, magnitude)
mf |
Character vector. A list of unique molecular formulas. |
magnitude |
Numeric vector. A list of respective abundances (intensities) for each molecular formula.
Must be non-negative and have the same length as |
A single numeric value representing the Simpson diversity index. Returns 0 if magnitude is all zeros.
calc_simpson_index(
  mf = c("C10H20O5", "C12H18O3", "C18H30O6"),
  magnitude = c(1982375, 2424, 312410)
)
Checks whether character strings are valid neutral molecular formulas that
can be parsed by convert_molecular_formula_to_data_table().
The function is intended as a lightweight pre-check before converting molecular formulas into element-count tables. It identifies common non-formula entries such as InChIKeys, charged formulas, empty values, unsupported isotope notation, and formulas containing unknown element or isotope labels.
check_neutral_mf(mf, masses = ume::masses)
mf |
A character vector of molecular formulas. |
masses |
A |
Check molecular formulas for neutral formula validity
This function validates syntax only. It does not check chemical plausibility, valence rules, isotope natural abundance, charge balance, or whether the molecular formula corresponds to a real compound.
The parser uses the valid element symbols and isotope labels provided in
masses. This avoids hard-coding element symbols and ensures that the
validation is consistent with convert_molecular_formula_to_data_table().
Supported isotope notation follows the convention used in ume, for example:
[13C] for one carbon-13 atom
[13C2] for two carbon-13 atoms
[18O2] for two oxygen-18 atoms
The alternative notation [13C]2 is currently classified as unsupported
because the isotope count is placed outside the brackets.
Charged formulas such as "C10H13N2+", "C11H18N2+2", or
"C18H35CaO2Zn+3" are classified as charged and therefore not neutral.
InChIKeys such as "IOVCWXUNBOPUCH-UHFFFAOYSA-M" are detected separately
and classified as non-formula identifiers.
A data.table with one row per input entry and the following columns:
Original input string.
Logical; TRUE if the input is NA or empty.
Logical; TRUE if the input resembles an InChIKey.
Logical; TRUE if the formula ends with charge notation
such as +, -, +2, -3, 2+, or 3-.
Logical; TRUE if the string can be fully tokenized
using valid element and isotope labels from masses.
Logical; TRUE if the input is non-empty, does not
resemble an InChIKey, has no terminal charge notation, and can be fully
parsed as a molecular formula.
Character label describing the detected issue. Valid neutral
formulas are labelled "valid_neutral_mf".
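The first three checks (empty input, InChIKey, terminal charge) can be approximated with regular expressions. The sketch below is illustrative only: `classify_entry()` and its column names are hypothetical, the patterns are assumptions rather than the ones used by check_neutral_mf(), and tokenization against masses is not attempted.

```r
classify_entry <- function(x) {
  x <- trimws(x)
  is_empty <- is.na(x) | x == ""
  # Standard InChIKey layout: 14 letters, 10 letters, 1 letter
  is_inchikey <- !is_empty & grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", x)
  # Terminal "+" or "-" with optional trailing digits, e.g. "+", "+2", "2+"
  has_charge <- !is_empty & !is_inchikey & grepl("[+-][0-9]*$", x)
  data.frame(input = x, is_empty, is_inchikey, has_charge)
}

res <- classify_entry(c(
  "C6H6", "C4H5FeO4+", "C11H18N2+2",
  "IOVCWXUNBOPUCH-UHFFFAOYSA-M", NA
))
res
```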
Other molecular formula functions:
convert_data_table_to_molecular_formulas(),
convert_molecular_formula_to_data_table()
mf <- c(
  "C6H6",
  "C6[13C2]HF15O2",
  "C6[13C]2HF15O2",
  "C4H5FeO4+",
  "C11H18N2+2",
  "IOVCWXUNBOPUCH-UHFFFAOYSA-M",
  NA_character_
)
check_neutral_mf(mf)
valid_mf <- check_neutral_mf(mf)[is_neutral_mf == TRUE, mf]
Classifies entries into categories (blank, standard, pool, sample, …) based on pattern rules applied to a specific search column. The identifiers returned in each category are also configurable.
classify_files(
  fi,
  search_col = "link_rawdata",
  id_col = "file_id",
  patterns = list(
    blank = c("blk", "blank", "mq"),
    standard = c("srfa", "standard"),
    pool = c("pool")
  ),
  include_blank_check = TRUE,
  return = c("list", "table")
)
fi |
|
search_col |
Character. Name of the column used for pattern matching.
Defaults to |
id_col |
Character. Name of the column whose values are returned for
each category. Defaults to |
patterns |
Named list of character vectors. Each list entry is a category name, and its value is a vector of patterns. |
include_blank_check |
Logical; if TRUE and |
return |
Either
|
Default behavior (with the default patterns):
"blank": blank_check == "blank" or one of the patterns "blk", "blank", "mq"
"standard": one of the patterns "srfa", "standard"
"pool": pattern "pool"
"sample": everything unmatched
Pattern matching is case-insensitive.
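The rule-based matching can be sketched with grepl(). This is a simplified illustration: the file names are made up, the blank_check column is ignored, and the precedence used when several categories match (earlier list entries win) is an assumption.

```r
patterns <- list(
  blank    = c("blk", "blank", "mq"),
  standard = c("srfa", "standard"),
  pool     = "pool"
)
files <- c("NS_blk_01.raw", "SRFA_20.raw", "Pool_A.raw",
           "Sample_01.raw", "MQ_blank.raw")

# Everything starts as "sample" and is overwritten on a pattern hit
category <- rep("sample", length(files))
for (cat_name in rev(names(patterns))) {
  rx <- paste(patterns[[cat_name]], collapse = "|")
  hit <- grepl(rx, files, ignore.case = TRUE)   # case-insensitive match
  category[hit] <- cat_name
}

data.frame(files, category)
```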
Named list or a classified data.table.
# Minimal demo data
fi <- data.table::data.table(
  file_id = 1:6,
  filename = c("NS_blk_01.raw", "SRFA_20.raw", "Pool_A.raw",
               "Sample_01.raw", "Sample_02.raw", "MQ_blank.raw"),
  blank_check = c("blank", NA, NA, NA, NA, "blank"),  # optional column
  link_rawdata = c("NS_blk_01.raw", "SRFA_20.raw", "Pool_A.raw",
                   "Sample_01.raw", "Sample_02.raw", "MQ_blank.raw")
)

# 1) Default behavior: return named list of file_ids by category
classify_files(fi)

# 2) Use a different column for pattern matching
classify_files(fi, search_col = "filename")

# 3) Return another ID field (here: file_id → stays the same for demo)
classify_files(fi, id_col = "file_id")

# 4) Return the full table with new category column
classify_files(fi, return = "table")
Constructs a continuous color palette from a sequence of base colors. Intermediate colors are interpolated between each pair of adjacent colors, optionally using a custom number of interpolation steps.
color.palette(steps, n.steps.between = NULL, ...)
steps |
A character vector of base colors (e.g., hex codes or color names). These colors define the breakpoints in the palette. |
n.steps.between |
An optional integer vector specifying how many interpolated colors should
be added between each pair of entries in |
... |
Additional arguments passed to methods. |
This helper is primarily used for UME visualizations (e.g., color bars in density plots), but it can be used independently for any plotting task.
A function of class "colorRampPalette" that generates interpolated color
vectors when called with a single integer argument n.
For example, pal <- color.palette(c("blue", "white", "red")); pal(100)
returns a vector of 100 smoothly interpolated colors.
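For readers unfamiliar with palette functions, the base-R analogue below shows what such a function returns. It uses grDevices::colorRampPalette() directly and does not reproduce the n.steps.between behaviour of color.palette().

```r
# Base-R palette function built from three breakpoint colors
pal_base <- grDevices::colorRampPalette(c("blue", "white", "red"))

# Calling it with n returns n interpolated hex color strings
cols <- pal_base(5)
cols
```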
# Generate a simple blue-white-red palette
pal <- color.palette(c("blue", "white", "red"))
pal(10)

# Add additional steps between colors
pal2 <- color.palette(c("blue", "white", "red"), n.steps.between = c(5, 10))
pal2(20)
Creates standardized molecular formula strings from isotope or element count
columns and adds them to the input data.table.
convert_data_table_to_molecular_formulas(
  mfd,
  isotope_formulas = FALSE,
  keep_element_sums = FALSE,
  verbose = FALSE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
isotope_formulas |
Logical. If |
keep_element_sums |
Logical. If |
verbose |
Logical. If |
... |
Additional arguments passed to |
The function extracts element or isotope counts from a table with one column
per isotope or element. Valid isotope columns are detected using
get_isotope_info() and the reference table ume::masses.
The standard molecular formula mf is created by summing isotopes belonging
to the same element and arranging elements according to Hill order.
If isotope_formulas = TRUE, an additional mf_iso column is created that
keeps isotope-specific information, for example [12C5][13C][1H12][16O6].
The function preserves the original row order and keeps duplicate rows.
The original table mfd as a data.table with additional columns:
Standardized molecular formula following Hill order.
If isotope_formulas = TRUE, isotope-specific molecular
formula.
If keep_element_sums = TRUE, total count of all carbon
isotopes. Equivalent *_tot columns are created for other elements.
Isotopic columns such as 13C are formatted as [13C] in mf_iso.
The output follows Hill order: C, H, then all other elements
alphabetically.
Single-element counts, e.g. C1H4, are formatted without explicit 1.
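The formatting rules above can be sketched for a single formula as follows. `hill_formula()` is a hypothetical helper that works on a named vector of element totals; the carbon-free branch follows the usual Hill convention (all elements alphabetically) and is included as an assumption about this function.

```r
hill_formula <- function(counts) {
  counts <- counts[counts > 0]               # drop absent elements
  el <- names(counts)
  if ("C" %in% el) {
    # Carbon first, then hydrogen, then the rest alphabetically
    ordered <- c("C", if ("H" %in% el) "H", sort(setdiff(el, c("C", "H"))))
  } else {
    ordered <- sort(el)
  }
  n <- counts[ordered]
  # A count of 1 is written without an explicit number
  paste0(ordered, ifelse(n == 1, "", n), collapse = "")
}

hill_formula(c(O = 6, H = 12, C = 6))   # "C6H12O6"
hill_formula(c(H = 4, C = 1))           # "CH4"
```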
Hill E.A. (1900). On a system of indexing chemical literature; adopted by the classification division of the U. S. patent office. Journal of the American Chemical Society, 22, 478-494. doi:10.1021/ja02046a005
Other molecular formula functions:
check_neutral_mf(),
convert_molecular_formula_to_data_table()
convert_data_table_to_molecular_formulas(
  mf_data_demo[, .(`12C`, `1H`, `14N`, `16O`, `31P`, `32S`)]
)
Parses molecular formulas and returns a data.table where each row
represents one molecular formula and each element or isotope is represented
by a separate count column.
convert_molecular_formula_to_data_table(
  mf,
  masses = ume::masses,
  table_format = c("wide", "long"),
  keep_mf_old = TRUE,
  isotope_default = c("most_abundant", "lightest"),
  check_neutral = FALSE
)
mf |
Character vector of molecular formula(s)
(e.g., |
masses |
A data.table. Defaults to |
table_format |
A string controlling the output format. Either |
keep_mf_old |
Logical. If |
isotope_default |
A string defining which isotope should be used when
an element is given without explicit isotope notation. Either
|
check_neutral |
Logical. If |
The function supports normal element notation such as C6H12O6 and bracketed
isotope notation such as [13C], [13C2], and [18O2].
Input formulas are parsed using the element symbols and isotope labels
provided in masses. This avoids hard-coded element lists and allows rare
elements to be parsed as long as they are present in masses.
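A simplified tokenizer illustrates how plain and bracketed notation can be read into counts. Unlike the package, this sketch uses a generic regular expression instead of the labels in masses, and `parse_formula()` is a hypothetical name.

```r
parse_formula <- function(mf) {
  # A token is a bracketed isotope such as [13C2] or a plain element such as H12
  rx <- "\\[[0-9]+[A-Z][a-z]?[0-9]*\\]|[A-Z][a-z]?[0-9]*"
  tokens <- regmatches(mf, gregexpr(rx, mf))[[1]]
  tokens <- gsub("\\[|\\]", "", tokens)          # strip the brackets
  label <- sub("[0-9]*$", "", tokens)            # "13C", "C", "H", ...
  n <- sub("^[0-9]*[A-Za-z]+", "", tokens)       # trailing count
  n[n == ""] <- "1"                              # missing count means 1
  # Sum counts per element or isotope label
  vapply(split(as.integer(n), label), sum, integer(1))
}

counts <- parse_formula("C5[13C2]H12O6")
counts
```

Note that the unsupported form `[13C]2` would be misread by this sketch, which is one reason a dedicated pre-check is useful.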
If check_neutral = TRUE, input formulas are checked with check_neutral_mf()
before parsing.
The standardized molecular formula mf is generated using dynamic Hill
ordering:
if carbon is present: C, then H, then all other elements alphabetically
if carbon is absent: all elements alphabetically, including H
A data.table in wide or long format.
Other molecular formula functions:
check_neutral_mf(),
convert_data_table_to_molecular_formulas()
Creates a new molecular formula table containing the original parent formulas and their corresponding single-isotope daughter formulas.
create_isotope_expanded_table(
  mfd,
  id_col = "peak_id",
  allow_duplicates = TRUE,
  elements = NULL
)
mfd |
A |
id_col |
Name of the column in |
allow_duplicates |
Logical. If |
elements |
Optional character vector of element symbols
(matching |
The output includes annotation columns that facilitate isotope validation in downstream workflows:
iso_role indicates whether a row represents a "parent" or
"daughter" isotopologue.
iso_element stores the element symbol for which the isotope
substitution was generated (e.g. "C", "N", "S").
iso_from and iso_to store the parent and daughter isotope labels
(e.g. "12C" and "13C").
A data.table containing parent and daughter formulas, including
isotope annotation columns for downstream validation.
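The parent/daughter relationship can be pictured with a two-row table. This sketch builds one 13C daughter by hand for a made-up parent row; filling the parent's annotation columns with NA is an assumption, not documented behaviour.

```r
# Made-up parent formula C6H12O6 with isotope count columns
parent <- data.frame(
  peak_id = 1L, `12C` = 6L, `13C` = 0L, `1H` = 12L, `16O` = 6L,
  check.names = FALSE
)
parent$iso_role    <- "parent"
parent$iso_element <- NA_character_
parent$iso_from    <- NA_character_
parent$iso_to      <- NA_character_

# Daughter: exchange one 12C atom for one 13C atom
daughter <- parent
daughter[["12C"]] <- parent[["12C"]] - 1L
daughter[["13C"]] <- parent[["13C"]] + 1L
daughter$iso_role    <- "daughter"
daughter$iso_element <- "C"
daughter$iso_from    <- "12C"
daughter$iso_to      <- "13C"

expanded <- rbind(parent, daughter)
expanded
```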
Other isotopes:
calc_isotope_pattern(),
eval_isotopes(),
uplot_isotope_precision()
Generates all combinations of element / isotope counts between
min_formula and max_formula, filtered by mass, DBE, element ratios,
and heuristic rules (Kind & Fiehn 2007).
create_ume_formula_library(
  max_formula,
  min_formula = "C1H1",
  lib_version = 99,
  masses = ume::masses,
  max_mass = 152,
  ratio_filter = TRUE,
  heu_filter = TRUE,
  max_oc = 1.2,
  max_hc = 3.1,
  max_nc = 1.3,
  max_pc = 0.3,
  max_sc = 0.8,
  verbose = FALSE
)
max_formula |
Character. Maximum element/isotope counts, e.g. "C20H40O10" or "C1000[13C1]H2000". |
min_formula |
Character. Minimum element/isotope counts (default "C1H1"). |
lib_version |
Integer. Library version identifier (default 99). |
masses |
A data.table. Defaults to |
max_mass |
Numeric. Maximum allowed exact mass. |
ratio_filter |
Logical. Apply O/C, H/C, N/C, P/C, S/C filters. |
heu_filter |
Logical. Apply the heuristic rules of Kind & Fiehn (2007). |
max_oc |
Maximum oxygen / carbon ratio in a molecule; (UM_orig: 1.5; 7 rules: 1.2) |
max_hc |
Maximum hydrogen / carbon ratio in a molecule; (UM_orig: ; 7 rules: 3.1) |
max_nc |
Maximum nitrogen / carbon ratio in a molecule; (UM_orig: 0.5; 7 rules: 1.3) |
max_pc |
Maximum phosphorus / carbon ratio in a molecule; (UM_orig: 3; 7 rules: 0.3) |
max_sc |
Maximum sulfur / carbon ratio in a molecule; (UM_orig: 4; 7 rules: 0.8) |
verbose |
Logical. Print progress messages. |
A data.table containing the generated molecular formula library.
The returned object has class "ume_library" and includes one row per
molecular formula, with columns for:
elemental and isotopic counts (e.g., 12C, 13C, 1H, 16O, ...)
double bond equivalent (dbe)
exact mass (mass)
molecular formula string (mf)
a unique versioned key (vkey)
Additional metadata is stored as attributes:
"lib_version": numeric version identifier
"min_formula": user-supplied minimum formula
"max_formula": user-supplied maximum formula
"max_mass": maximum allowed exact mass
"filters": list describing applied ratio and heuristic filters
"call": the matched function call
The object inherits from both "ume_library" and "data.table".
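The enumerate-then-filter idea can be sketched on a tiny CHO search space. This is a toy illustration, not the package algorithm: the mass limit of 60, the restriction to three elements, and the DBE conditions (non-negative, integer) are assumptions, and the heuristic rules are not applied.

```r
# Exact masses of the main isotopes (rounded)
exact <- c(C = 12, H = 1.00782503, O = 15.99491462)

# All count combinations between the minimum and maximum formula
grid <- expand.grid(C = 1:3, H = 1:8, O = 0:2)
grid$mass <- as.numeric(as.matrix(grid[, names(exact)]) %*% exact)
grid$dbe  <- grid$C - grid$H / 2 + 1     # DBE for CHO formulas

keep <- grid$mass <= 60 &                # mass limit
  grid$dbe >= 0 & grid$dbe %% 1 == 0 &   # plausible, integer DBE
  grid$O / grid$C <= 1.2 &               # max_oc
  grid$H / grid$C <= 3.1                 # max_hc

lib <- grid[keep, ]
lib <- lib[order(lib$mass), ]
head(lib)
```

In this toy grid, ethanol (C2H6O) passes all filters, whereas methane (CH4) is removed by the H/C limit.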
Kind T., Fiehn O. (2007). Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics, 8, 105. doi:10.1186/1471-2105-8-105
Hill E.A. (1900). On a system of indexing chemical literature; adopted by the classification division of the U. S. patent office. Journal of the American Chemical Society, 22, 478-494. doi:10.1021/ja02046a005
Downloads one of the UME formula libraries from Zenodo only when explicitly called by the user.
Unlike earlier versions, this CRAN-compliant implementation:
never writes to the user's filespace unless dest is explicitly provided
does NOT create ~/.ume/ or any other default directory
does NOT perform automatic caching
In non-interactive environments (CRAN checks), the function returns NULL.
download_library(
  library = "lib_05.rds",
  doi = "10.5281/zenodo.17606457",
  dest = NULL,
  overwrite = FALSE
)
library |
Character. One of |
doi |
Character. Zenodo DOI. |
dest |
Optional file path where the library should be saved.
If |
overwrite |
Logical. Redownload even if |
A data.table or NULL (in non-interactive mode).
Adds isotope information to the parent mass and optionally removes isotopologues from the mfd table. Required for further data evaluation that considers isotope information.
eval_isotopes(mfd, remove_isotopes = TRUE, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
remove_isotopes |
If set to TRUE (default), all entries for isotopologues are removed from mfd. The main isotope information for each parent ion is still maintained in the "intxy"-columns. |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
A data.table with additional columns such as "int_13c" containing stable isotope abundance information.
Boris P. Koch
Other Formula assignment:
add_known_mf(),
calc_eval_params(),
check_formula_library(),
ume_assign_formulas()
Other isotopes:
calc_isotope_pattern(),
create_isotope_expanded_table(),
uplot_isotope_precision()
eval_isotopes(mfd = mf_data_demo)
This function filters molecular formulas by (relative) peak abundances.
filter_int(mfd, norm_int_min = NULL, norm_int_max = NULL, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
norm_int_min |
Lower threshold (>=) of (normalized) peak magnitude |
norm_int_max |
Upper threshold (<=) of (normalized) peak magnitude |
verbose |
logical; if |
... |
Arguments passed on to
|
data.table; subset of original molecular formula table
Other Formula subsetting:
filter_mass_accuracy(),
filter_mf_data(),
remove_blanks(),
subset_known_mf(),
ume_assign_formulas(),
ume_filter_formulas()
filter_int(
  mfd = calc_norm_int(mfd = mf_data_demo, normalization = "sum_rank", n_rank = 100),
  norm_int_min = 1
)
This function automatically sets a filter for mass accuracy for each individual spectrum.
filter_mass_accuracy(
  mfd,
  ma_col = "ppm",
  file_col = "file_id",
  msg = FALSE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
ma_col |
Name of the column that contains mass accuracy values in ppm (string) |
file_col |
Name of the column that contains file name |
msg |
logical. Deprecated synonym for |
... |
Additional arguments passed to methods. |
data.table; subset of original molecular formula table
Other Formula subsetting:
filter_int(),
filter_mf_data(),
remove_blanks(),
subset_known_mf(),
ume_assign_formulas(),
ume_filter_formulas()
This function filters molecular formulas by isotope numbers, element ratios, etc.
filter_mf_data(
  mfd,
  c_iso_check = FALSE, n_iso_check = FALSE, s_iso_check = FALSE,
  ma_dev = 3,
  dbe_max = 999, dbe_o_min = -999, dbe_o_max = 999,
  mz_min = 1, mz_max = 9999,
  n_min = 0, n_max = 999,
  s_min = 0, s_max = 999,
  p_min = 0, p_max = 999,
  oc_min = 0, oc_max = 999,
  hc_min = 0, hc_max = 999,
  nc_min = 0, nc_max = 99,
  verbose = FALSE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
c_iso_check |
(TRUE / FALSE); check whether formulas are verified by the presence of the main carbon daughter isotope (13C) |
n_iso_check |
(TRUE / FALSE); check whether formulas are verified by the presence of the main nitrogen daughter isotope (15N) |
s_iso_check |
(TRUE / FALSE); check whether formulas are verified by the presence of the main sulfur daughter isotope (34S) |
ma_dev |
Deviation range of mass accuracy in +/- ppm (default: 3 ppm) |
dbe_max |
Maximum number for DBE |
dbe_o_min |
Minimum number for DBE minus O atoms |
dbe_o_max |
Maximum number for DBE minus O atoms |
mz_min |
Minimum of mass to charge value |
mz_max |
Maximum of mass to charge value |
n_min |
Minimum number of nitrogen atoms |
n_max |
Maximum number of nitrogen atoms |
s_min |
Minimum number of sulfur atoms |
s_max |
Maximum number of sulfur atoms |
p_min |
Minimum number of phosphorus atoms |
p_max |
Maximum number of phosphorus atoms |
oc_min |
Minimum atomic ratio of oxygen / carbon |
oc_max |
Maximum atomic ratio of oxygen / carbon |
hc_min |
Minimum atomic ratio of hydrogen / carbon |
hc_max |
Maximum atomic ratio of hydrogen / carbon |
nc_min |
Minimum atomic ratio of nitrogen / carbon |
nc_max |
Maximum atomic ratio of nitrogen / carbon |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
data.table; subset of original molecular formula table
Boris P. Koch
Other Formula subsetting:
filter_int(),
filter_mass_accuracy(),
remove_blanks(),
subset_known_mf(),
ume_assign_formulas(),
ume_filter_formulas()
filter_mf_data(mfd = mf_data_demo, dbe_o_max = 10)
Checks which element/isotope columns are present in mfd
and looks up the corresponding NIST isotope information (based on masses).
Can be applied to a formula library and any table having molecular formula data.
If only an element name is identified, the symbol and data of the lightest isotope
of the element will be returned.
For example, the column name "C" will return "12C" isotope data.
get_isotope_info(mfd, masses = ume::masses, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
masses |
A data.table. Defaults to |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
A data.table containing information on all isotopes identified in mfd
and a column "orig_name" having the original names of the
isotope / element columns in mfd. Results are ordered according to the Hill system.
Hill E.A. (1900). On a system of indexing chemical literature; adopted by the classification division of the U. S. patent office. Journal of the American Chemical Society, 22, 478-494. doi:10.1021/ja02046a005
get_isotope_info(mfd = mf_data_demo, verbose = TRUE)
Extracts the molecular formula from an InChI string by parsing the first layer of the InChI representation. The function performs a fast string-based extraction without requiring external cheminformatics libraries.
inchi_to_mf(inchi)
inchi |
A character vector containing InChI strings (e.g.,
|
Extract molecular formula from InChI
The function extracts the molecular formula from the first layer of the
InChI string (i.e., the part immediately following "InChI=1S/" or
"InChI=1/" and before the next / separator).
This approach is highly efficient because the molecular formula is explicitly encoded in the InChI and does not require interpretation of molecular structure, in contrast to SMILES-based approaches.
Leading and trailing whitespace is ignored. Non-character inputs result in an error.
A character vector of molecular formulas in Hill notation, with the same
length and order as inchi. Invalid or missing inputs return NA_character_.
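The string-based extraction described above amounts to a single regular expression. The helper `inchi_formula()` below is a hypothetical base-R sketch of that idea, not the source of inchi_to_mf().

```r
inchi_formula <- function(inchi) {
  stopifnot(is.character(inchi))          # non-character input is an error
  x <- trimws(inchi)                      # ignore surrounding whitespace
  # Capture the first layer after "InChI=1S/" or "InChI=1/"
  pattern <- "^InChI=1S?/([^/]+).*$"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

mf_out <- inchi_formula(c(
  "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
  "InChI=1S/H2O/h1H2",
  NA_character_
))
mf_out  # "C2H6O" "H2O" NA
```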
https://www.inchi-trust.org/technical-faq/
inchi <- c(
  "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
  "InChI=1S/H2O/h1H2",
  NA_character_
)
inchi_to_mf(inchi)
# [1] "C2H6O" "H2O" NA
Check whether an object is a UME peaklist
is_ume_peaklist(x)
x |
Any object |
TRUE/FALSE
Known formulas: molecular formulas for which additional knowledge is available. These can also be calibration lists. To limit its size, the table is restricted to what is covered by the standard UME formula library (m/z <= 700; elements C, H, O, N, S, P). The original version is part of the UME database and was transferred to UME using UTF-8 encoding. CRAM molecular formulas are taken from the supplementary material provided by Hertkorn et al. (2006).
known_mf
A data.table with ~300,000 rows and 14 variables:
Mass to charge ratio (numeric)
molecular formula
taken from www.awi.de
Other ume data:
lib_demo,
masses,
mf_data_demo,
nice_labels_dt,
peaklist_demo,
tab_ume_labels
data(known_mf)
Contains a small molecular formula library for demonstration and validation purposes. Complete formula libraries are available in the 'ume.formulas' data package.
lib_demo
A data.table having ~115,111 rows and 12 variables:
First two digits represent the formula library version; last digits are unique identifiers for each formula
Neutral molecular formula (no differentiation of isotopes)
Calculated exact neutral mass of a formula (based on ume::masses)
Other ume data:
known_mf,
masses,
mf_data_demo,
nice_labels_dt,
peaklist_demo,
tab_ume_labels
data(peaklist_demo)
Contains masses, valences, isotopes and isotope ratios of elements based on data by NIST Physical Measurement Laboratory (https://www.nist.gov/pml).
masses
A data.table having 288 rows and 23 variables:
Element symbol in lower case
Element symbol in upper case
Isotope symbol in lower case
Isotope symbol in upper case
Nominal mass of the isotope
Exact mass of the isotope
Mole fraction compared to all isotopes of an element
Relative abundance compared to the main (most abundant) isotope
Valence at standard conditions
Alternative valence at standard conditions
Rank in Hill Order for molecular formulas (cf. https://en.wikipedia.org/wiki/Chemical_formula)
https://www.nist.gov/pml/atomic-weights-and-isotopic-compositions-relative-atomic-masses
Hill E.A. (1900). On a system of indexing chemical literature; adopted by the classification division of the U. S. patent office. Journal of the American Chemical Society, 22, 478-494. doi:10.1021/ja02046a005
Other ume data:
known_mf,
lib_demo,
mf_data_demo,
nice_labels_dt,
peaklist_demo,
tab_ume_labels
data(masses)
Contains molecular formula data and metainformation on formulas. The metainformation
mf_data_demo
A data.table with ~9245 rows (formulas) and 65 variables:
Unique ID (integer) for each analysis
Unique ID (integer) for each mass peak in the peak list 'pl'
Mass to charge ratio of the singly charged molecular ion (numeric)
Measured mass peak magnitude of the singly charged molecular ion (numeric)
Normalized intensity as calculated by calc_norm_int()
Neutral measured mass of the molecular ion
Neutral calculated mass of the assigned formula
Relative mass accuracy of measured mass compared to m_cal (in ppm)
Nominal mass of the neutral molecule
molecular formula (no differentiation of isotopes)
Double bond equivalent
12C: Number of carbon atoms (12C)
1H: Number of hydrogen atoms
hydrogen / carbon ratio in a molecular formula
oxygen / carbon ratio in a molecular formula
nitrogen / carbon ratio in a molecular formula
sulfur / carbon ratio in a molecular formula
Aromaticity index according to Koch and Dittmar (2008, 2016)
z score according to Stenson et al. (2003)
Kendrick mass defect (based on CH2-units) according to Kendrick (1963)
Calculated threshold value for relative mass accuracy (in ppm) that can be used for formula filtering
Identifier for each unique molecular formula identified in the unfiltered dataset
Molecular formula that was identified (CRAM == 1) as carboxylic rich alicyclic molecule according to Hertkorn et al. (2006). See ume::known_mf for details.
Measured relative peak magnitude of the 13C1 isotope compared to the parent ion (0 if isotope was not existing)
Measured relative peak magnitude of the 15N1 isotope compared to the parent ion (0 if isotope was not existing)
Measured relative peak magnitude of the 34S1 isotope compared to the parent ion (0 if isotope was not existing)
Deviation of the 12C/13C isotope ratio represented in carbon numbers according to Koch et al. (2007)
DBE minus O
Nominal oxidation state of carbon according to LaRowe & Van Cappellen (2011)
Standard molal Gibbs energies of the oxidation half reactions of organic compounds according to LaRowe & Van Cappellen (2011)
Total number of carbon and oxygen atoms in a molecular formula
Total number of nitrogen, sulfur, and phosphorus atoms in a molecular formula
Number of occurrences of a molecular formula in the entire unfiltered set of formulas
Number of molecular formula assignments per molecular mass in the unfiltered set of formulas
Number of molecular formula assignments per molecular mass after filter process
Magnitude of the base peak in a mass spectrum
Total magnitude of the reference that was used for normalization (cf. calc_norm_int())
taken from www.awi.de
Other ume data:
known_mf,
lib_demo,
masses,
nice_labels_dt,
peaklist_demo,
tab_ume_labels
data(mf_data_demo)
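Several of the variables listed for mf_data_demo follow directly from element counts and masses. The sketch below recomputes a few of them on a made-up two-row table in base R. The column names mimic those described above, but the values, the sign convention for ppm, and the CHNO-only DBE formula are assumptions rather than the package's implementation.

```r
# Toy table; column names mimic those listed above, values are made up.
mf <- data.frame(
  mf    = c("C10H12O5", "C12H15N1O6"),
  c     = c(10, 12),
  h     = c(12, 15),
  n     = c(0, 1),
  o     = c(5, 6),
  m     = c(212.06850, 269.09000),  # measured neutral mass (made up)
  m_cal = c(212.06847, 269.08994)   # calculated neutral mass of the formula
)

mf$ppm   <- (mf$m - mf$m_cal) / mf$m_cal * 1e6  # sign convention is an assumption
mf$dbe   <- mf$c - mf$h / 2 + mf$n / 2 + 1      # double bond equivalents for CHNO
mf$dbe_o <- mf$dbe - mf$o                       # DBE minus O
mf$hc    <- mf$h / mf$c                         # hydrogen / carbon ratio
mf$oc    <- mf$o / mf$c                         # oxygen / carbon ratio
print(mf[, c("mf", "ppm", "dbe", "dbe_o", "hc", "oc")])
```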
nice_labels_dt
nice_labels_dt
A data.table with labels that can be used for plots
Name that will be displayed instead of the standard column name
Name of the standard column in ume tables
taken from www.awi.de
Other ume data:
known_mf,
lib_demo,
masses,
mf_data_demo,
peaklist_demo,
tab_ume_labels
data(nice_labels_dt)
Places the most prominent columns required for data evaluation first, followed by all other columns.
order_columns(mfd, col_order = NULL, ...)
mfd |
data.table with molecular formula data as derived from
|
col_order |
A list of column names that defines the order of columns of mfd. Default is: cols = c("sample_tag", "sample_id", "file", "file_id", "peak_id", "i_magnitude", "norm_int", "m", "m_cal", "ppm", "nm", "mf", "dbe", "c", "h", "n", "o", "p", "s", "hc", "oc", "nc", "sc", "ai", "z", "kmd"). If col_order is NULL, the default order is applied. |
... |
Additional arguments passed to methods. |
A data.table containing isotope data for those isotopes present in mfd.
Other tools:
add_missing_element_columns()
order_columns(mfd = mf_data_demo)
Contains parts of the peaklist (200 - 300 m/z) from mass spectra to use as demonstration and validation dataset. The sample mass spectra contain one blank, three replicates of North Sea water, and three Arctic fjord samples as triplicates.
peaklist_demo
A data.table having 31,091 rows and 7 variables:
A unique identifier for a mass spectrum (integer)
A unique label for a mass spectrum or sample (character)
A unique identifier for a peak in the entire peak list (integer)
Mass to charge ratio of the singly charged molecular ion (numeric)
Peak magnitude of the molecular ion (numeric)
Signal to noise ratio of the molecular ion (numeric)
Mass resolution of the peak / ion (numeric)
taken from www.awi.de
Other ume data:
known_mf,
lib_demo,
masses,
mf_data_demo,
nice_labels_dt,
tab_ume_labels
data(peaklist_demo)
Remove all molecular formulas that were detected in one or more blank analyses
(identified via blank_file_ids). Matching is always on mf. If a
retention-time column is present (or provided using ret_time_col), removal
is restricted to the corresponding LC segment.
remove_blanks(mfd, blank_file_ids = NULL, blank_prevalence = 0.5, ret_time_col = NULL, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
blank_file_ids |
Integer vector of |
blank_prevalence |
Numeric between 0 and 1. Threshold for blank filtering:
the proportion of blanks in which a molecular formula must occur before it is
excluded from the sample data. For example, |
ret_time_col |
Character scalar. Name of the retention-time column that
contains the beginning of the retention time segment that corresponds to the
mass spectrum.
If |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
Requires a unique integer file_id per analysis in mfd.
Minimal required columns in mfd: mf, file_id.
Optional column: a retention-time column (e.g. "ret_time_min").
If a retention-time column is used, formulas present in blanks are only
removed for rows whose mf and retention time match.
The input mfd is not modified by reference; a subset is returned.
data.table; subset of the original molecular formula table (mfd)
with blank formulas removed (globally or LC-segment-wise).
The argument LCMS is deprecated and no longer used. Retention-time-aware
removal is now enabled automatically when a retention-time column is present
or explicitly provided via ret_time_col.
Boris P. Koch
Other Formula subsetting:
filter_int(),
filter_mass_accuracy(),
filter_mf_data(),
subset_known_mf(),
ume_assign_formulas(),
ume_filter_formulas()
# Presence/absence removal, no retention time:
remove_blanks(mfd = mf_data_demo, remove_blank_list = "Blank", verbose = TRUE)
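The idea behind prevalence-based blank removal can be sketched in base R on a made-up table. This is not the ume implementation: the comparison operator (>=), the threshold value, and the choice to drop the blank rows from the result are all assumptions made for illustration.

```r
# Toy formula table: file_id 1 and 2 are blanks, 3 and 4 are samples (made up).
mfd_toy <- data.frame(
  file_id = c(1, 1, 2, 3, 3, 3, 4, 4),
  mf      = c("C6H6O2", "C7H8O3", "C6H6O2",
              "C6H6O2", "C7H8O3", "C8H10O4", "C7H8O3", "C8H10O4")
)
blank_ids  <- c(1, 2)
prevalence <- 0.6  # hypothetical threshold

# One row per (blank, formula) pair, then the fraction of blanks containing each formula
blanks <- unique(mfd_toy[mfd_toy$file_id %in% blank_ids, ])
frac   <- table(blanks$mf) / length(blank_ids)
drop   <- names(frac)[as.vector(frac) >= prevalence]

# Keep sample rows whose formula is not flagged as a blank formula
cleaned <- mfd_toy[!(mfd_toy$file_id %in% blank_ids) & !(mfd_toy$mf %in% drop), ]
print(cleaned)
```

Here C6H6O2 occurs in both blanks and is removed, while C7H8O3 occurs in only half of the blanks and stays under the chosen threshold.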
Removes columns that contain only NA values from a data.table.
Columns listed in excl_cols are retained even if they are empty.
remove_empty_columns(df, excl_cols = NULL, ...)
df |
A |
excl_cols |
Optional character vector of column names that must be preserved, even if all values in those columns are missing. |
... |
Additional arguments passed to methods. |
A data.table containing all original non-empty columns, plus any
columns listed in excl_cols, regardless of whether they are empty.
Columns that contain only NA values and are not explicitly preserved
are removed from the output.
dt <- data.table::data.table(c = c(2, 2, 2), x = c(NA, NA, NA), y = c(NA, NA, NA))
remove_empty_columns(dt, excl_cols = "y")
This function removes ID columns ('_id') and hierarchical search columns ('_lft', '_rgt') from a table. The only exceptions are "sample_id" and "bottle_id", which are always kept in the output table.
remove_id_columns(df, ...)
df |
data.table that contains ID columns |
... |
Additional arguments passed to methods. |
Other Clean data output:
remove_unknown_columns()
This function removes columns that exclusively contain the value defined in 'search_term' (such as " unknown" (default)).
remove_unknown_columns(df, excl_cols = NULL, search_term = " unknown", ...)
df |
data.table that contains empty columns |
excl_cols |
List of column names that should not be removed, even if all values contain search_term |
search_term |
String that uniquely occurs in one column |
... |
Additional arguments passed to methods. |
Other Clean data output:
remove_id_columns()
Subset all molecular formulas that are present in one or more categories of ume::known_mf. Based on presence / absence.
subset_known_mf(mfd, select_category = NULL, exclude_category = NULL, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
select_category |
List of category names that should be selected |
exclude_category |
List of category names that should be ignored |
verbose |
logical; if |
... |
Additional arguments passed to methods. |
data.table; subset of original molecular formula data.table (mfd)
Other Formula subsetting:
filter_int(),
filter_mass_accuracy(),
filter_mf_data(),
remove_blanks(),
ume_assign_formulas(),
ume_filter_formulas()
subset_known_mf(category_list = c("marine_dom"), mfd = mf_data_demo, verbose = TRUE)
Labels of UME columns.
tab_ume_labels
A data.table that is derived from the MarChem database:
Identifier for each label
Label that can be used e.g. in figures
Shows if label is used in the UME shiny app
taken from www.awi.de
Other ume data:
known_mf,
lib_demo,
masses,
mf_data_demo,
nice_labels_dt,
peaklist_demo
data(tab_ume_labels)
Applies a clean UME-style theme used across all uplot_* visualisations.
theme_uplots(base_size = 12, base_family = "")
base_size |
Numeric base font size. |
base_family |
Font family. |
Unified UME Theme for All uplot_* Functions
A ggplot2 theme object.
Assigns molecular formulas to neutral molecular masses and calculates all parameters required for data evaluation, such as a posteriori filtering of molecular formulas, plotting, and statistics. The function uses a pre-built molecular formula library.
ume_assign_formulas(pl, formula_library, verbose = FALSE, ...)
pl |
data.table containing peak data. Mandatory columns include neutral
molecular mass ( |
formula_library |
Molecular formula library: a predefined data.table used for
assigning molecular formulas to a peak list and for mass calibration. The library
requires a fixed format, including mass values for matching. Predefined libraries
are available in the R package ume.formulas and further described in
Leefmann et al. (2019). A standard library for marine dissolved organic matter is
|
verbose |
logical; if |
... |
Arguments passed on to
|
All function arguments: args(filter_mf_data) args(filter_int)
A data.table having molecular formula assignments for each mass.
Other Formula assignment:
add_known_mf(),
calc_eval_params(),
check_formula_library(),
eval_isotopes()
Other Formula subsetting:
filter_int(),
filter_mass_accuracy(),
filter_mf_data(),
remove_blanks(),
subset_known_mf(),
ume_filter_formulas()
Other ume wrapper:
ume_filter_formulas()
ume_assign_formulas(pl = peaklist_demo, formula_library = lib_demo, pol = "neg", ma_dev = 0.2)
A wrapper function to filter molecular formulas according to evaluation parameters.
ume_filter_formulas(mfd, verbose = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
verbose |
logical; if |
... |
Arguments passed on to
|
A data.table having molecular formula assignments for each mass.
ume_filter_formulas(mfd = mf_data_demo, dbe_o_max = 15, norm_int_min = 2)
Other Formula subsetting:
filter_int(),
filter_mass_accuracy(),
filter_mf_data(),
remove_blanks(),
subset_known_mf(),
ume_assign_formulas()
Other ume wrapper:
ume_assign_formulas()
This function plots the results of a cluster analysis and a multi-dimensional scaling (MDS) plot based on the input data. It first creates a hierarchical cluster dendrogram using the Bray-Curtis dissimilarity index, followed by an MDS plot for dimensionality reduction. The function outputs both plots side by side.
uplot_cluster(mfd, grp = "file_id", int_col = "norm_int", ...)
mfd |
data.table with molecular formula data as derived from
|
grp |
Character vector. Names of columns (e.g., sample or file identifiers) used to aggregate results. |
int_col |
Character. The name of the column that contains the intensity values to be used (e.g. for clustering or color coding). Default usually is "norm_int" for normalized intensity values. |
... |
Additional arguments passed to methods. |
Plot Cluster Analysis and Multi-Dimensional Scaling
A named list with two elements:
dendrogram: A recordedplot object containing the hierarchical clustering
dendrogram generated from the Bray–Curtis dissimilarity matrix.
mds: A plotly object representing the two-dimensional
Multi-Dimensional Scaling (MDS) scatter plot.
This can be rendered interactively in HTML or converted to
a static ggplot object if needed.
The function always returns a list with these two components.
This function requires the vegan package for the Bray-Curtis
dissimilarity and MDS calculations.
Other uplots:
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
# Example with demo data
out <- uplot_cluster(mfd = mf_data_demo, grp = "file", int_col = "norm_int")
out$dendrogram
out$mds
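For readers unfamiliar with the underlying calculation, the following base R sketch computes Bray-Curtis dissimilarities on a made-up intensity matrix and feeds them into hierarchical clustering and a two-dimensional scaling. It is a stand-in only: uplot_cluster() relies on the vegan package, whereas the helper bray(), the average linkage, and the use of classical MDS via cmdscale() are assumptions chosen here to keep the example dependency-free.

```r
# Toy intensity matrix: rows = samples, columns = molecular formulas (made-up values).
x <- rbind(
  s1 = c(10, 5, 0, 2),
  s2 = c(9, 6, 1, 2),
  s3 = c(0, 1, 12, 7)
)

# Bray-Curtis dissimilarity between two non-negative vectors (hypothetical helper)
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

n <- nrow(x)
d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) d[i, j] <- bray(x[i, ], x[j, ])
}
d <- as.dist(d)

hc  <- hclust(d, method = "average")  # linkage method is an assumption
mds <- cmdscale(d, k = 2)             # classical (metric) MDS from base R
print(round(d, 3))
```

Samples s1 and s2 share most of their intensity pattern and therefore end up close together in both the dendrogram and the MDS coordinates.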
Generates a scatter plot of nominal molecular mass (nm) versus carbon count (12C),
coloured by the median of a supplied variable (z_var), following Reemtsma (2010).
uplot_cvm(mfd, z_var = "co_tot", fun = median, palname = "redblue", tf = FALSE, size_dots = 1.5, ...)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
fun |
Function used to aggregate |
palname |
Character. Name of the palette. Available palettes:
|
tf |
Logical. If |
size_dots |
Numeric. Size of the dots in the plot (default = 1.5). |
... |
Arguments passed on to
|
Carbon vs Mass (CvM) Diagram
A ggplot or plotly object.
Reemtsma, T. (2010). The carbon versus mass diagram to visualize and exploit FTICR-MS data of natural organic matter. Journal of Mass Spectrometry, 45(4), 382–390. doi:10.1002/jms.1722
Other uplots:
uplot_cluster(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_cvm(mfd = mf_data_demo, z_var = "co_tot", ume_logo = FALSE)
uplot_cvm(mfd = mf_data_demo, z_var = "norm_int", palname = "viridis")
## Not run:
uplot_cvm(mfd = mf_data_demo, z_var = "co_tot", interactive = TRUE)
uplot_cvm(mf_data_demo, base_size = 11, palname = "awi", tf = TRUE, title_show = FALSE, col_bar = FALSE)
## End(Not run)
Bar plot showing the frequency distribution of double bond equivalents (dbe)
minus the number of oxygen atoms in a molecular formula (dbe_o).
The unified UME plotting system is applied (theme, labels, logo, hover text, plotly).
The formula assignment strategy follows chemically motivated constraints and group-wise decision criteria based on DBE and oxygen content to distinguish reliable from equivocal molecular formulas.
uplot_dbe_minus_o_freq(mfd, ...)
mfd |
data.table with molecular formula data as derived from
|
... |
Arguments passed on to
|
Frequency Plot of DBE - O
ggplot or plotly object
Herzsprung, P., Hertkorn, N., von Tümpling, W., Harir, M., Friese, K., & Schmitt-Kopplin, P. (2014). Understanding molecular formula assignment of Fourier transform ion cyclotron resonance mass spectrometry data of natural organic matter from a chemical point of view. Analytical and Bioanalytical Chemistry, 406(30), 7977–7987. doi:10.1007/s00216-014-8249-y
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_dbe_minus_o_freq(mf_data_demo)
uplot_dbe_minus_o_freq(mf_data_demo, interactive = TRUE, ume_logo = FALSE, title_show = FALSE)
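The quantity shown in this plot is a simple count of formulas per DBE - O value. A minimal base R sketch with made-up dbe and o vectors (not taken from mf_data_demo) could look like this:

```r
# Made-up DBE values and oxygen counts for six formulas
dbe <- c(5, 6, 6, 8, 4, 7)
o   <- c(5, 6, 4, 5, 6, 7)

dbe_o <- dbe - o        # DBE minus number of oxygen atoms
freq  <- table(dbe_o)   # number of formulas per DBE - O value
print(freq)
```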
Creates a scatter plot of DBE (double bond equivalents) vs. number of carbon
atoms. Points are color-coded by a selected variable (z_var). The plot
follows the same stylistic conventions as the other uplot_* functions,
including the unified theme and optional UME caption.
This approach follows the DBE/C concept introduced for identifying aromatic sub-structures in a molecular formula.
uplot_dbe_vs_c(mfd, z_var = "norm_int", fun = median, palname = "redblue", tf = FALSE, size_dots = 1.5, ...)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
fun |
Function used to aggregate |
palname |
Character. Name of the palette. Available palettes:
|
tf |
Logical. If |
size_dots |
Numeric. Size of the dots in the plot (default = 1.5). |
... |
Arguments passed on to
|
A ggplot2 object or a plotly object (if plotly = TRUE).
Hockaday, W. C., Grannas, A. M., Kim, S., & Hatcher, P. G. (2006). Direct molecular evidence for the degradation and mobility of black carbon in soils from ultrahigh-resolution mass spectral analysis of dissolved organic matter from a fire-impacted forest soil. Organic Geochemistry, 37(4), 501–510. doi:10.1016/j.orggeochem.2005.11.003
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_dbe_vs_c(mf_data_demo, z_var = "norm_int")
This function generates a scatter plot of DBE (Double Bond Equivalent) versus parts per million (ppm) from the provided data.
It also provides the option to customize the appearance and to return an interactive plotly plot.
uplot_dbe_vs_ma(mfd, z_var = "norm_int", fun = median, palname = "redblue", tf = FALSE, size_dots = 1.5, ...)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
fun |
Function used to aggregate |
palname |
Character. Name of the palette. Available palettes:
|
tf |
Logical. If |
size_dots |
Numeric. Size of the dots in the plot (default = 1.5). |
... |
Arguments passed on to
|
A ggplot or plotly object.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_dbe_vs_ma(mfd = mf_data_demo, size_dots = 1)
This function generates a scatter plot of Double Bond Equivalent (DBE) versus the number of oxygen atoms (o).
It allows for optional customization of colors based on a specified variable (z_var) and offers the
option to convert the plot to an interactive plotly object.
uplot_dbe_vs_o(mfd, z_var = "norm_int", fun = median, palname = "redblue", tf = FALSE, size_dots = 1.5, ...)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
fun |
Function used to aggregate |
palname |
Character. Name of the palette. Available palettes:
|
tf |
Logical. If |
size_dots |
Numeric. Size of the dots in the plot (default = 1.5). |
... |
Arguments passed on to
|
A ggplot or plotly object.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
Creates a frequency plot (bar plot) for a selected variable in a molecular formula dataset. Values are grouped and counted, then visualized as bars. A unified UME plot theme is applied for consistent styling across all uplot_* functions.
uplot_freq(mfd, var = "14N", col = "grey", space = 0.5, width = 0.3, logo = TRUE, gg_size = 12, plotly = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
var |
Character. Name of the variable for which the frequency
distribution should be plotted (e.g. |
col |
Bar fill color. |
space |
Not used (kept for backward compatibility). |
width |
Bar width. |
logo |
Logical. If TRUE, adds a UME caption. |
gg_size |
Base text size for |
plotly |
Logical. If TRUE, return interactive plotly object. |
... |
Additional arguments passed to methods. |
A ggplot object, or a plotly object when plotly = TRUE.
Creates a histogram of mass accuracy values (ppm). Includes summary statistics (median, 2.5% and 97.5% quantiles). Follows general uplot behavior:
returns a ggplot2 object by default
converts to plotly only if plotly = TRUE
uses caption-style UME logo
uplot_freq_ma(mfd, ma_col = "ppm", bins = NULL, ...)
mfd |
data.table with molecular formula data as derived from
|
ma_col |
String. Name of the column having mass accuracy values. |
bins |
Numeric. Number of bins (e.g. for the x-scale in a histogram) |
... |
Arguments passed on to
|
ggplot or plotly object
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
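The summary statistics that uplot_freq_ma() reports (median, 2.5% and 97.5% quantiles) can be reproduced with base R. The ppm values below are made up, and the number of histogram breaks is an arbitrary choice for this sketch.

```r
# Made-up relative mass errors in ppm
ppm <- c(-0.21, -0.05, 0.02, 0.08, 0.11, 0.15, 0.30)

# Median and the 2.5% / 97.5% quantiles
stats <- quantile(ppm, probs = c(0.025, 0.5, 0.975))
print(round(stats, 3))

# Binned counts without drawing; 'breaks' is only a suggestion to hist()
h <- hist(ppm, breaks = 5, plot = FALSE)
print(h$counts)
```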
Creates a histogram showing the frequency distribution of mass accuracy
values (ppm).
Displays median and quantile statistics in the title and optionally adds
a UME caption (logo).
The plot uses the unified UME theme (theme_uplots()), ensuring visual
consistency across all uplot_* functions.
uplot_freq_vs_ppm(df, col = "grey", width = 0.01, gg_size = 12, logo = TRUE, plotly = FALSE)
df |
A
|
col |
Character. Histogram bar color. Default |
width |
Numeric. Histogram bin width (not used when |
gg_size |
Base text size for |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. If TRUE, return interactive plotly object. |
This plot is useful for visual inspection of mass accuracy performance.
The required additional columns (14N, 32S, 31P, dbe_o) ensure that the
dataset is a complete UME molecular formula table and can be compared
to other quality-control plots.
A ggplot2 histogram, or a plotly object if plotly = TRUE.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_freq_vs_ppm(mf_data_demo)
Creates a scatter plot of the hydrogen-to-carbon ratio (H/C) versus molecular
mass (nm). Points are color-coded according to a selected intensity or
property column (int_col). This visualization follows the conceptual design
in Schmitt-Kopplin et al. (2010).
The function can optionally add a branding label ("UltraMassExplorer") and can optionally return an interactive Plotly version of the plot.
uplot_hc_vs_m(df, int_col = "norm_int", palname = "redblue", size_dots = 1.2, gg_size = 12, logo = TRUE, plotly = FALSE, ...)
df |
A
|
int_col |
Character, column used for color-coding. Default |
palname |
Character, palette name passed to |
size_dots |
Numeric. Size of the dots in the plot (default = 1.2). |
gg_size |
Base text size for |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. If TRUE, return interactive plotly object. |
... |
Arguments passed on to
|
A ggplot2 scatter plot, or a plotly object if plotly = TRUE.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_hc_vs_m(mf_data_demo, int_col = "norm_int")
Produces a boxplot visualizing the distribution of mass accuracy (ppm)
for different heteroatom combinations (nsp_type) defined by the number
of nitrogen (N), sulfur (S), and phosphorus (P) atoms in each formula.
The plot can be returned as either a ggplot object or as an interactive
plotly object (plotly = TRUE). An optional “UltraMassExplorer”
watermark can be added.
uplot_heteroatoms(df, col = "grey", gg_size = 12, logo = TRUE, plotly = FALSE)
df |
A
|
col |
Character. Box color. Default |
gg_size |
Base text size for |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. If TRUE, return interactive plotly object. |
A ggplot or plotly interactive boxplot.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_heteroatoms(mf_data_demo)
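The grouping behind such a boxplot can be sketched in base R: formulas are labelled by their N, S and P counts and the mass accuracy is summarised per label. The label format used for nsp_type below is hypothetical (the package may encode it differently), and all values are made up.

```r
# Made-up element counts and mass accuracies for six formulas
toy <- data.frame(
  n   = c(0, 0, 1, 1, 0, 2),
  s   = c(0, 0, 0, 0, 1, 0),
  p   = c(0, 0, 0, 0, 0, 0),
  ppm = c(0.05, -0.02, 0.10, 0.14, -0.20, 0.31)
)

# Hypothetical label format, e.g. "N1 S0 P0"
toy$nsp_type <- paste0("N", toy$n, " S", toy$s, " P", toy$p)

# Median mass accuracy per heteroatom combination
med <- tapply(toy$ppm, toy$nsp_type, median)
print(med)
```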
Isotope precision describes how reliably the instrument reproduces the
expected intensity of the naturally occurring isotope
peak relative to its corresponding monoisotopic peak.
uplot_isotope_precision(mfd, z_var = "nsp_tot", int_col = "norm_int", size_dots = 1.5, bins = 100, data_reduction = FALSE, tf = FALSE, logo = TRUE, plotly = FALSE, cex.axis = 1, cex.lab = 1.4)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Column used for color mapping (default: "nsp_tot") |
int_col |
Intensity column (default: "norm_int") |
size_dots |
Numeric. Size of the dots in the plot (default = 1.5). |
bins |
Number of bins used when data_reduction = TRUE |
data_reduction |
Logical. If TRUE, bins the data and uses bin medians (recommended for very large datasets; speeds up rendering massively). |
tf |
Logical. If |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. Return a plotly object instead of ggplot. |
cex.axis |
Numeric. Size of axis text (default is |
cex.lab |
Numeric. Size of axis labels (default is |
The measured $^{13}\mathrm{C}$ signal provides an
intrinsic validation of molecular formula assignments.
For a molecule containing $n_{\mathrm{C}}$ carbon atoms with a natural abundance
of 1.07% for $^{13}\mathrm{C}$, the theoretical relative intensity
of the $^{13}\mathrm{C}_1$ isotope peak is:
$$R_{\mathrm{theo}} = n_{\mathrm{C}} \cdot \frac{0.0107}{1 - 0.0107}$$
The measured intensity ratio $R_{\mathrm{meas}}$ provides an independent estimate of
the number of carbon atoms:
$$n_{\mathrm{C,est}} = R_{\mathrm{meas}} \cdot \frac{1 - 0.0107}{0.0107}$$
From this, the deviation in carbon number can be defined:
$$\Delta C = n_{\mathrm{C,est}} - n_{\mathrm{C}}$$
A value of $\Delta C = 0$ indicates perfect agreement between the
formula assignment and the isotope-based estimate. Negative values indicate
that the measured isotope abundance is lower than expected.
Isotope precision is assessed by evaluating the distribution of $\Delta C$
across peaks with sufficient signal quality.
$\Delta C$ becomes small and stable at higher signal-to-noise ratios.
Therefore, isotopic peak ratios for intense mass signals
provide an internal metric for validating molecular formula assignments.
The function visualizes the deviation between measured and
theoretical isotope ratios.
Supports optional data reduction (binning) to enhance interactive
rendering speed in Plotly.
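A minimal numeric sketch of the carbon-number deviation is given below. It assumes the binomial expectation for the ratio of the singly 13C-substituted peak to the monoisotopic peak; whether ume uses this exact form or the simpler approximation n_c * 0.0107 is not stated here. All variable names and values are made up for illustration.

```r
p13    <- 0.0107                    # natural 13C abundance used in the text
n_c    <- c(10, 15, 20)             # carbon atoms in the assigned formulas (made up)
r_meas <- c(0.108, 0.150, 0.220)    # measured 13C1 / monoisotopic intensity ratio (made up)

r_theo  <- n_c * p13 / (1 - p13)    # expected ratio from the binomial distribution
n_c_est <- r_meas * (1 - p13) / p13 # carbon number implied by the measured ratio
delta_c <- n_c_est - n_c            # negative: less 13C signal than expected

print(round(data.frame(n_c, r_theo, n_c_est, delta_c), 3))
```

In this toy data the second formula shows a clearly negative deviation, which is the kind of case that would call the assignment into question.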
A ggplot or plotly object.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
Other isotopes:
calc_isotope_pattern(),
create_isotope_expanded_table(),
eval_isotopes()
This function generates a scatter plot of Kendrick Mass Defect (KMD) versus
nominal mass (nm), with color-coding based on a specified variable
(z_var). Optionally, the plot can be returned as an interactive Plotly
object.
uplot_kmd(mfd, z_var = "norm_int", fun = median, palname = "redblue", tf = FALSE, size_dots = 1, ...)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
fun |
Function used to aggregate |
palname |
Character. Name of the palette. Available palettes:
|
tf |
Logical. If |
size_dots |
Numeric. Size of the dots in the plot (default = 1). |
... |
Arguments passed on to
|
Kendrick Mass Defect (KMD) vs. Nominal Mass Plot
A ggplot or plotly object.
Kendrick E. (1963). A mass scale based on CH2 = 14.0000 for high
resolution mass spectrometry of organic compounds.
Analytical Chemistry, 35, 2146–2154.
Hughey C.A., Hendrickson C.L., Rodgers R.P., Marshall A.G., Qian K.N. (2001). Kendrick mass defect spectrum: A compact visual analysis for ultrahigh-resolution broadband mass spectra. Analytical Chemistry, 73, 4676–4681. doi:10.1021/ac010560w
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_kmd(mf_data_demo, z_var = "norm_int")
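As a rough illustration of the quantity shown on the y-axis, the sketch below computes the Kendrick mass and the Kendrick mass defect in base R. It assumes the CH2-based Kendrick scale (factor 14/14.01565) and the sign convention KMD = nominal Kendrick mass minus Kendrick mass. The helper name `kendrick()` is hypothetical and not part of ume, which may compute or round these values differently.

```r
# Hypothetical helper (not part of ume): Kendrick mass and mass defect
# on the CH2 scale, assuming KMD = nominal Kendrick mass - Kendrick mass.
kendrick <- function(mass) {
  km  <- mass * 14 / 14.01565   # rescale so that CH2 has a mass of exactly 14
  nkm <- round(km)              # nominal Kendrick mass (nearest integer)
  data.frame(mass = mass, km = km, nkm = nkm, kmd = nkm - km)
}

# Two members of a CH2 homologous series (C10H20O2 and C11H22O2, neutral masses)
res <- kendrick(c(172.14633, 186.16198))
res
```

Members of the same CH2 homologous series share (almost) the same KMD, which is why they line up horizontally in a KMD plot.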
Creates a 3D LC–MS plot (RT x m/z x intensity) when retention time is available.
If no retention-time column exists (e.g., with DI-FTMS demo data), the function
gracefully falls back to uplot_ms() and issues an informative message.
uplot_lcms(
  pl,
  mass = "mz",
  peak_magnitude = "i_magnitude",
  retention_time = "ret_time_min",
  label = "file_id",
  logo = FALSE,
  ...
)
pl |
data.table containing peak data. Mandatory columns include neutral
molecular mass ( |
mass |
Column containing m/z values (default |
peak_magnitude |
Column containing intensity (default |
retention_time |
Column with retention time (default |
label |
Sample/group labeling column (default |
logo |
Logical. If TRUE, adds a UME caption. |
... |
Additional arguments passed to methods. |
A plotly 3D visualization (LC-MS) or a 2D MS spectrum fallback.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
Generates a UME-style scatter plot showing mass accuracy (ppm)
versus mass-to-charge ratio (m/z).
Summary statistics (median, 2.5% and 97.5% quantiles) are displayed as horizontal reference lines and an annotation panel.
The plot is returned as a ggplot2 object by default, with optional plotly conversion for interactivity.
uplot_ma_vs_mz(mfd, ma_col = "ppm", logo = FALSE, plotly = FALSE, ...)
mfd |
data.table with molecular formula data as derived from
|
ma_col |
Character. Column containing mass accuracy (ppm). |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. If TRUE, return interactive plotly object. |
... |
Additional arguments passed to methods. |
A ggplot or plotly object.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_ma_vs_mz(mf_data_demo, ma_col = "ppm")
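The summary statistics drawn as reference lines (median, 2.5% and 97.5% quantiles) can be approximated outside the plot with `stats::quantile()`. This is a minimal sketch on made-up ppm values; it assumes the default quantile algorithm of base R, which may not match the one used internally.

```r
# Toy mass accuracy values in ppm (hypothetical data, not from ume)
ppm <- seq(-1, 1, length.out = 201)

# Median and the central 95% range of the mass error distribution
ma_stats <- quantile(ppm, probs = c(0.025, 0.5, 0.975), names = FALSE)
names(ma_stats) <- c("q2.5", "median", "q97.5")
ma_stats
```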
Plots a mass spectrum, showing peak magnitude versus mass-to-charge ratio (m/z).
Optionally reduces the dataset by selecting the most abundant peaks per spectrum.
uplot_ms(
  pl,
  mass = "mz",
  peak_magnitude = "i_magnitude",
  label = "file_id",
  logo = FALSE,
  plotly = TRUE,
  data_reduction = 1,
  ...
)
pl |
A |
mass |
Character. Name of the column containing mass-to-charge or mass
information (default = |
peak_magnitude |
Character. Name of the column containing peak magnitude
(default = |
label |
Character. Name of the column identifying individual spectra
(default = |
logo |
Logical. If TRUE, adds a UME caption. |
plotly |
Logical. If TRUE, return interactive plotly object. |
data_reduction |
Numeric between 0 and 1. Fraction of the most abundant peaks to retain per spectrum. Default = 1 (no reduction). If set to 0, a minimum of 0.01 is used to ensure some data is displayed. |
... |
Additional arguments passed to methods. |
A ggplot object or a plotly object if plotly = TRUE.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_ms(pl = peaklist_demo, data_reduction = 0.1, plotly = TRUE)
uplot_ms(pl = peaklist_demo, data_reduction = 1, plotly = FALSE)
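To make the effect of `data_reduction` more concrete, the following base R sketch keeps the most abundant fraction of peaks per spectrum. The helper `reduce_peaks()` and the toy peak list are hypothetical; the rounding rule (`ceiling`) is an assumption, and only the clamp to a minimum of 0.01 is taken from the argument description.

```r
# Hypothetical helper (not part of ume): keep the top `fraction` of peaks
# per spectrum, ranked by peak magnitude.
reduce_peaks <- function(pl, fraction = 1, label = "file_id",
                         peak_magnitude = "i_magnitude") {
  fraction <- min(max(fraction, 0.01), 1)   # clamp to [0.01, 1]
  parts <- lapply(split(pl, pl[[label]]), function(s) {
    n_keep <- max(1L, ceiling(nrow(s) * fraction))   # assumed rounding rule
    ord <- order(s[[peak_magnitude]], decreasing = TRUE)
    s[ord[seq_len(n_keep)], , drop = FALSE]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Toy peak list with two spectra of ten peaks each
toy_pl <- data.frame(
  file_id     = rep(c("a", "b"), each = 10),
  mz          = rep(seq(200, 290, by = 10), times = 2),
  i_magnitude = c(1:10, 10:1)
)
reduced <- reduce_peaks(toy_pl, fraction = 0.2)
reduced
```

Reducing the data before plotting mainly helps interactive plotly output, where every retained peak becomes a rendered element.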
Creates a bar plot showing how many molecular formulas were assigned per
sample (file_id). The plot title contains the mean and standard deviation
of assigned molecular formulas across samples. Optionally, the plot can be
converted to an interactive Plotly plot and can display the UltraMassExplorer logo.
uplot_n_mf_per_sample(
  df,
  col = "grey",
  logo = TRUE,
  width = 0.3,
  gg_size = 12,
  plotly = FALSE
)
df |
A data.table containing at least a |
col |
Character. Fill color for the bars (default |
logo |
Logical. If TRUE, adds a UME caption. |
width |
Numeric. Width of bars (default |
gg_size |
Base text size for |
plotly |
Logical. If TRUE, return interactive plotly object. |
Number of Molecular Formulas per Sample / File
A ggplot object, or a plotly object if plotly = TRUE.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
uplot_n_mf_per_sample(mf_data_demo)
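The numbers behind the bars and the title can be reproduced roughly as follows. This sketch assumes that each distinct `mf` is counted once per `file_id`; the toy table and the title format are made up for illustration and need not match what the function prints.

```r
# Toy molecular formula table (hypothetical data)
toy <- data.frame(
  file_id = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3"),
  mf      = c("C10H20O2", "C11H22O2", "C12H24O2",
              "C10H20O2", "C11H22O2",
              "C10H20O2", "C11H22O2", "C12H24O2", "C6H12O6")
)

# Number of distinct formulas per sample (assumed counting rule)
n_mf <- tapply(toy$mf, toy$file_id, function(x) length(unique(x)))

# Mean and standard deviation across samples, as reported in the plot title
plot_title <- sprintf("Mean = %.1f, SD = %.1f", mean(n_mf), sd(n_mf))
plot_title
```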
Performs Principal Component Analysis (PCA) on molecular formula intensity data and visualizes the results as a PCA score plot and a Van Krevelen plot colored by PC1 loadings.
uplot_pca(
  mfd,
  grp,
  int_col = "norm_int",
  palname = "viridis",
  col_bar = TRUE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
grp |
Character. Name of the column used to define rows/samples in the PCA matrix. |
int_col |
Character. Name of the intensity column used for PCA
(default = |
palname |
Character. Name of the color palette passed to |
col_bar |
Logical. If |
... |
Additional arguments passed to |
Principal Component Analysis (PCA) Plotting
The PCA is performed on a wide matrix with one row per group defined by
grp and one column per molecular formula (mf). Intensities are aggregated
using the mean if multiple values occur for the same combination of grp
and mf.
Columns with zero variance are removed before PCA because they cannot be
scaled. The argument grp defines the observational unit for the PCA, for
example "file_id", "sample_id", or "ms_id".
A list containing:
The PCA model object returned by stats::prcomp().
A data.table with PCA scores for each group.
A Van Krevelen plot colored by PC1 loadings.
A PCA score plot of PC1 versus PC2.
The input molecular formula data augmented with PC1/PC2 scores and PC1/PC2 loadings.
The function uses stats::prcomp() for PCA and uplot_vk() for the
Van Krevelen plot.
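The data preparation described above (wide matrix, mean aggregation, removal of zero-variance columns, then `stats::prcomp()`) can be sketched in base R as shown below. The toy data are made up, and filling missing group/formula combinations with 0 as well as centering and scaling are assumptions; the function itself may handle these steps differently.

```r
# Toy long-format data: three samples, four formulas (hypothetical values)
toy <- data.frame(
  file_id  = rep(c("s1", "s2", "s3"), each = 4),
  mf       = rep(c("C10H20O2", "C11H22O2", "C12H24O2", "C6H12O6"), times = 3),
  norm_int = c(1, 2, 3, 5,
               2, 1, 4, 5,
               4, 3, 1, 5)
)

# One row per group, one column per formula; mean if duplicates occur
wide <- tapply(toy$norm_int, list(toy$file_id, toy$mf), mean)
wide[is.na(wide)] <- 0   # assumption: absent formulas get intensity 0

# Drop zero-variance columns, which cannot be scaled
keep <- apply(wide, 2, var) > 0
pca  <- prcomp(wide[, keep, drop = FALSE], center = TRUE, scale. = TRUE)

scores   <- pca$x[, 1:2]        # PC1/PC2 scores per group
loadings <- pca$rotation[, 1]   # PC1 loading per molecular formula
```

The PC1 loadings carry one value per molecular formula, which is what allows them to be mapped as a color variable onto a Van Krevelen plot.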
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
res <- uplot_pca(
  mfd = mf_data_demo,
  grp = "file_id",
  int_col = "norm_int"
)
res$fig_pca
res$fig_vk
Generates a bar plot of the median mass accuracy (ppm) for each sample.
The plot can optionally be converted into an interactive plotly object.
uplot_ppm_avg(df, cex.axis = 12, cex.lab = 15, plotly = FALSE, ...)
df |
A data frame containing the data. The columns |
cex.axis |
Numeric. Size of axis text (default is |
cex.lab |
Numeric. Size of axis labels (default is |
plotly |
Logical. If TRUE, return interactive plotly object. |
... |
Additional arguments passed to methods. |
A ggplot object or a plotly object depending on the plotly argument.
Computes the intensity ratio between a sample and a control group and visualizes it in a Van Krevelen diagram. Optionally highlights unique molecular formulas and plots the ratio distribution.
uplot_ratios(
  df,
  upper = 90,
  lower = -90,
  grp = "file_id",
  int_col = "norm_int",
  control,
  sample,
  uniques = FALSE,
  conservative = FALSE,
  palname = "ratios",
  distrib = TRUE,
  main = NA,
  ...
)
df |
A data.table containing at least columns:
|
upper, lower
|
Ratio filtering limits (default 90 / -90) |
grp |
Column defining sample/control grouping |
int_col |
Intensity column to use |
control |
Character: control group name |
sample |
Character: sample group name |
uniques |
Logical: highlight uniquely present formulas |
conservative |
Logical: stricter uniqueness definition |
palname |
Color palette for projection |
distrib |
Logical: include ratio distribution plot |
main |
Optional main title |
... |
Additional arguments passed to methods. |
Ratio Plot in Van Krevelen Space
A list with:
ratio_table
plot_ratio_vk
plot_ratio_distr
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_reproducibility(),
uplot_ri_vs_sample(),
uplot_vk()
Computes reproducibility of sample analyses based on the relative intensity
column (norm_int). For each molecular formula (mf), the function calculates:
number of occurrences (N)
median relative intensity (ri)
relative standard deviation (RSD = sd/median × 100)
It also bins ri into integer bins and calculates the median RSD per bin.
The function returns:
processed tables
two ggplot2 objects:
intensity vs RSD scatter plot
binned median RSD plot
uplot_reproducibility(df, ri = "norm_int")
df |
A data.table or data.frame containing at least columns |
ri |
Character string: name of the intensity column. Default: |
A list containing:
tmp: Summary table by molecular formula
tmp2: Binned median RSD table
plot_rsd: Scatter plot of RI vs RSD (ggplot2)
plot_bins: Median RSD per bin (ggplot2)
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_ri_vs_sample(),
uplot_vk()
out <- uplot_reproducibility(mf_data_demo, ri = "norm_int")
out$plot_rsd
out$plot_bins
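The per-formula statistics described for this function (N, median relative intensity, RSD = sd/median × 100, and the median RSD per integer bin) can be illustrated with a small base R sketch. The toy data are made up, and binning with `floor()` is an assumption; the function may round the intensities differently.

```r
# Toy data: three formulas measured three times each (hypothetical values)
toy <- data.frame(
  mf       = rep(c("C10H20O2", "C11H22O2", "C12H24O2"), each = 3),
  norm_int = c(10, 12, 11,
               1.2, 1.0, 1.4,
               11.4, 11.4, 11.4)
)

# Per-formula summary: occurrences, median intensity, relative standard deviation
per_mf <- do.call(rbind, lapply(split(toy, toy$mf), function(d) {
  data.frame(
    mf  = d$mf[1],
    N   = nrow(d),
    ri  = median(d$norm_int),
    rsd = sd(d$norm_int) / median(d$norm_int) * 100
  )
}))

# Integer intensity bins (assumed: floor) and median RSD per bin
per_mf$bin <- floor(per_mf$ri)
per_bin <- aggregate(rsd ~ bin, data = per_mf, FUN = median)
per_bin
```

Low-intensity formulas typically show a higher RSD, so the binned view helps to judge above which relative intensity the measurements become reproducible.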
Creates a bar plot showing the median relative intensity (default: norm_int)
for each sample (grouped by file_id).
The overall dataset-wide median and standard deviation are shown in the title.
uplot_ri_vs_sample(
  df,
  int_col = "norm_int",
  grp = "file_id",
  col = "grey",
  logo = TRUE,
  width = 0.3,
  gg_size = 12
)
df |
A data.table containing at least:
|
int_col |
Character. Column name containing relative intensity values. |
grp |
Character. Column name specifying sample / file grouping. |
col |
Character. Fill color for bars. |
logo |
Logical. If TRUE, adds a UME caption. |
width |
Numeric. Width of bars (default |
gg_size |
Base text size for |
Plot Average Relative Intensity per Sample
A ggplot2 object containing a bar plot of per-sample median relative intensity.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_vk()
uplot_ri_vs_sample(mf_data_demo, int_col = "norm_int", grp = "file")
Creates a Van Krevelen diagram (H/C vs O/C).
uplot_vk(
  mfd,
  z_var = "norm_int",
  projection = TRUE,
  palname = "viridis",
  median_vK = TRUE,
  col_median = "white",
  ai = TRUE,
  size_dots = 3,
  col_bar = TRUE,
  tf = FALSE,
  ...
)
mfd |
data.table with molecular formula data as derived from
|
z_var |
Character. Column name for variable used for color-coding. Content of column should be numeric. |
projection |
If TRUE, median z-values per (oc, hc) are used. |
palname |
Character. Name of the palette. Available palettes:
|
median_vK |
Add median VK point. |
col_median |
Color of the marker for the median O/C and H/C value (Default = "white") |
ai |
Add aromaticity index threshold lines. |
size_dots |
Numeric. Size of the dots in the plot (default = 3). |
col_bar |
Logical. If |
tf |
Logical. If |
... |
Arguments passed on to
|
Plot Van Krevelen Diagram
A ggplot or plotly object.
Other uplots:
uplot_cluster(),
uplot_cvm(),
uplot_dbe_minus_o_freq(),
uplot_dbe_vs_c(),
uplot_dbe_vs_ma(),
uplot_dbe_vs_o(),
uplot_freq_ma(),
uplot_freq_vs_ppm(),
uplot_hc_vs_m(),
uplot_heteroatoms(),
uplot_isotope_precision(),
uplot_kmd(),
uplot_lcms(),
uplot_ma_vs_mz(),
uplot_ms(),
uplot_n_mf_per_sample(),
uplot_pca(),
uplot_ratios(),
uplot_reproducibility(),
uplot_ri_vs_sample()
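The `projection` option of uplot_vk() collapses all formulas that fall on the same Van Krevelen coordinate into one median value. A minimal base R sketch of that idea is given below; the element-count column names `c`, `h` and `o` and the toy values are assumptions for illustration and are not taken from the ume data format.

```r
# Toy formula table with element counts (hypothetical column names and values)
toy <- data.frame(
  c        = c(10, 10, 20, 6),
  h        = c(20, 20, 40, 12),
  o        = c(2, 2, 4, 6),
  norm_int = c(1, 3, 5, 7)
)

# Van Krevelen coordinates: atomic H/C and O/C ratios
toy$hc <- toy$h / toy$c
toy$oc <- toy$o / toy$c

# Projection: one median z-value per (oc, hc) position
proj <- aggregate(norm_int ~ oc + hc, data = toy, FUN = median)
proj
```

Without projection, overlapping points would be drawn on top of each other and the visible color would depend on plotting order.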
This function computes an out_score for each value in a selected column.
The score increases when a value is flagged as an outlier by one or more tests:
IQR test, quantile cutoffs, and Hampel filter.
ustats_outlier(dt, check_col = "ppm", verbose = FALSE, ...)
dt |
A |
check_col |
A character string naming the column to test for outliers. |
verbose |
Logical; print summary statistics when TRUE. |
... |
Additional arguments passed to methods. |
A data.table containing new columns: out_score, out_box,
out_quantile, and out_hampel.
ustats_outlier(mf_data_demo, check_col = "ppm")
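The idea of an additive outlier score can be sketched in base R as follows. The helper `score_outliers()` is hypothetical, and all thresholds (1.5 × IQR fences, 2.5%/97.5% quantile cutoffs, 3 × MAD for the Hampel filter) are common textbook defaults chosen here as assumptions; the cutoffs used by ustats_outlier() may differ.

```r
# Hypothetical helper (not part of ume): count how many tests flag each value.
# All thresholds below are assumed defaults, not taken from the package.
score_outliers <- function(x, iqr_k = 1.5, probs = c(0.025, 0.975),
                           hampel_k = 3) {
  # IQR (boxplot) test
  q   <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  out_box <- x < q[1] - iqr_k * iqr | x > q[2] + iqr_k * iqr

  # Quantile cutoffs
  lim <- unname(quantile(x, probs, na.rm = TRUE))
  out_quantile <- x < lim[1] | x > lim[2]

  # Hampel filter: distance from the median in units of the scaled MAD
  med <- median(x, na.rm = TRUE)
  out_hampel <- abs(x - med) > hampel_k * mad(x, na.rm = TRUE)

  data.frame(x, out_box, out_quantile, out_hampel,
             out_score = out_box + out_quantile + out_hampel)
}

# Toy mass errors in ppm with one gross outlier (hypothetical data)
ppm <- c(-0.2, -0.1, 0, 0.1, 0.2, -0.15, 0.05, 0.12, -0.05, 0.08, 5)
res <- score_outliers(ppm)
res
```

A value flagged by all three tests receives the highest score, which makes it possible to filter conservatively (score of 3) or more aggressively (score of 1 or higher).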