This function characterizes peaks by species/strain in a simulated spectrum after taking the highest peak or merging peaks in each bin.
characterize_peak(spec, option = 1, bin.size = 1, min.mz = 1000, max.mz = 2200)
spec: A data frame that contains m/z values of peaks, normalized intensities of peaks, species names, and strain names. Either an output of
option: How peaks are handled after the spectrum is binned by bin.size. There are two options: 1) no merging, so only the highest-intensity peak in each bin is kept; or 2) merging, so the intensities of all peaks within each bin are summed.
bin.size: An integer. The bin size. (Default: 1)
min.mz: A real number. The minimum mass-to-charge ratio (m/z). (Default: 1000)
max.mz: A real number. The maximum mass-to-charge ratio (m/z). (Default: 2200)
A data frame that contains m/z values of peaks (mz), intensities of peaks (int), species names (species), and strain names (strain). If option 2 is chosen, the species and strain columns may list more than one species/strain for a single peak.
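The two binning options can be illustrated with a minimal base R sketch. It is not the MGMS2 implementation: the helper name bin_peaks_sketch, the rule that bins of width bin.size start at min.mz, and the choice to report the m/z of the most intense peak for a merged bin (option 2) are all assumptions made for illustration.

```r
# Hypothetical sketch of binning a spectrum and characterizing each bin;
# not MGMS2's implementation.
bin_peaks_sketch <- function(spec, option = 1, bin.size = 1,
                             min.mz = 1000, max.mz = 2200) {
  cols <- c("mz", "int", "species", "strain")
  # Keep only peaks inside [min.mz, max.mz]
  spec <- spec[spec$mz >= min.mz & spec$mz <= max.mz, cols, drop = FALSE]
  if (nrow(spec) == 0) return(spec)
  # Assumed binning rule: bins of width bin.size starting at min.mz
  bin <- floor((spec$mz - min.mz) / bin.size)
  rows <- lapply(split(seq_len(nrow(spec)), bin), function(idx) {
    top <- idx[which.max(spec$int[idx])]
    if (option == 1) {
      # Option 1: keep only the most intense peak in the bin
      spec[top, cols]
    } else {
      # Option 2: sum intensities and list every contributing species/strain
      data.frame(mz = spec$mz[top],
                 int = sum(spec$int[idx]),
                 species = paste(unique(spec$species[idx]), collapse = ","),
                 strain = paste(unique(spec$strain[idx]), collapse = ","),
                 stringsAsFactors = FALSE)
    }
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

spec <- data.frame(mz = c(1000.2, 1000.7, 1500.1, 1500.4, 2300),
                   int = c(0.3, 0.9, 0.5, 0.2, 1.0),
                   species = c("A", "B", "A", "C", "A"),
                   strain = c("A1", "B1", "A2", "C1", "A1"),
                   stringsAsFactors = FALSE)
bin_peaks_sketch(spec, option = 1)  # one row per bin, highest peak kept
bin_peaks_sketch(spec, option = 2)  # one row per bin, intensities summed
```

In this toy input the peak at m/z 2300 falls outside the default range and is dropped; with option 2, the bin starting at m/z 1000 reports the combined species "A,B".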
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
mono.info=gather_summary(c(spectra.mono.summary.A, spectra.mono.summary.B,
  spectra.mono.summary.C))
mixture.ratio <- list()
mixture.ratio['A']=1
mixture.ratio['B']=0.5
mixture.ratio['C']=0
sim.template <- create_insilico_mixture_template(mono.info)
insilico.spectrum <- simulate_poly_spectra(sim.template, mixture.ratio)
merged.spectrum <- characterize_peak(insilico.spectrum, option=2)
This function generates an initial template for simulated mass spectra.
create_insilico_mixture_template(mono.info, mz.tol = 0.5)
mono.info: An output of
mz.tol: An m/z tolerance in Da. (Default: 0.5)
A data frame which contains simulated m/z, log intensity, and normalized intensity values of peaks.
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
mono.info=gather_summary(c(spectra.mono.summary.A, spectra.mono.summary.B,
  spectra.mono.summary.C))
template <- create_insilico_mixture_template(mono.info)
Internal function. This function removes peaks whose mass (m/z) values fall outside a given mass range.
This function is used in process_monospectra.
filtermass(spectra, mass.range)
spectra: Mass spectra (a MALDIquant MassSpectrum (S4) object). An output of
mass.range: The mass (m/z) range (a vector). For example, c(1000,2200).
A list of filtered mass spectra (MALDIquant MassSpectrum (S4) objects), each of which contains mass, intensity, and metaData.
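A minimal sketch of the range filter, assuming each spectrum is represented as a plain list with mass, intensity, and metaData elements rather than a MALDIquant MassSpectrum object, and that both range bounds are inclusive. The helper name filter_mass_sketch is hypothetical. For real MassSpectrum objects, MALDIquant's trim() performs a comparable range restriction.

```r
# Hypothetical stand-in for filtermass(): each spectrum is a plain list with
# mass, intensity, and metaData elements (not a MALDIquant S4 object).
filter_mass_sketch <- function(spectra, mass.range) {
  lapply(spectra, function(s) {
    # Assumed inclusive bounds: keep mass.range[1] <= m/z <= mass.range[2]
    keep <- s$mass >= mass.range[1] & s$mass <= mass.range[2]
    s$mass <- s$mass[keep]
    s$intensity <- s$intensity[keep]
    s  # metaData is carried over unchanged
  })
}

spectra <- list(
  list(mass = c(900, 1000, 1500, 2200, 2500),
       intensity = c(1, 2, 3, 4, 5),
       metaData = list(name = "s1"))
)
filtered <- filter_mass_sketch(spectra, mass.range = c(1000, 2200))
filtered[[1]]$mass  # 1000 1500 2200
```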
This function combines outputs from summarize_monospectra.
gather_summary(x)
x: A list of multiple monomicrobial mass spectra information from
A list of combined summaries (data frames) of mass spectra from summarize_monospectra and the corresponding species (a vector).
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
mono.info=gather_summary(c(spectra.mono.summary.A, spectra.mono.summary.B,
  spectra.mono.summary.C))
This function combines output files from summarize_monospectra.
gather_summary_file(directory)
directory: A directory that contains summary files from
A list of combined summaries of mass spectra (data frames) from summarize_monospectra and the corresponding species (a vector).
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
summary <- gather_summary_file(directory=tempdir())
Internal function. This function preprocesses spectra by transforming/smoothing intensity, removing baseline, and calibrating intensities.
preprocessMS(spectra, halfWindowSize = 20, SNIP.iteration = 60)
spectra: Spectra. A MALDIquant object. An output of either
halfWindowSize: The highest peaks in the given window (+/-halfWindowSize) will be recognized as peaks. (Default: 20). See
SNIP.iteration: The number of iterations of the SNIP algorithm used to remove the baseline of a spectrum. (Default: 60). See
The processed mass spectra. A list of MALDIquant MassSpectrum objects (S4 objects).
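The four preprocessing steps can be sketched in base R on a single intensity vector. This is a simplified illustration, not the package's code: the square-root transform, moving-average smoother, basic SNIP baseline (without the LLS transform), and total-ion-current (TIC) calibration are assumed choices, and preprocess_sketch is a hypothetical name. MALDIquant provides transformIntensity(), smoothIntensity(), removeBaseline(), and calibrateIntensity() for these steps on MassSpectrum objects.

```r
# Simplified sketch of the preprocessing steps on one intensity vector;
# the methods chosen here are assumptions, not the package's settings.
preprocess_sketch <- function(intensity, halfWindowSize = 20,
                              SNIP.iteration = 60) {
  # 1. Variance-stabilizing square-root transform
  y <- sqrt(intensity)
  # 2. Moving-average smoothing over 2 * halfWindowSize + 1 points
  w <- 2 * halfWindowSize + 1
  sm <- as.numeric(stats::filter(y, rep(1 / w, w), sides = 2))
  edge <- is.na(sm)
  sm[edge] <- y[edge]  # edges lack a full window; keep transformed values
  # 3. Basic SNIP baseline: clip each point to the mean of its neighbours
  #    at distance p, for p = 1, ..., SNIP.iteration
  base <- sm
  n <- length(base)
  for (p in seq_len(SNIP.iteration)) {
    if (2 * p >= n) break
    i <- (p + 1):(n - p)
    base[i] <- pmin(base[i], (base[i - p] + base[i + p]) / 2)
  }
  corrected <- pmax(sm - base, 0)
  # 4. TIC calibration: rescale so the intensities sum to 1
  tic <- sum(corrected)
  if (tic > 0) corrected / tic else corrected
}

mass <- seq(1000, 2200, length.out = 1201)
raw <- 50 + 0.02 * (mass - 1000) + 5000 * exp(-((mass - 1500) / 5)^2)
processed <- preprocess_sketch(raw)
mass[which.max(processed)]  # near the simulated peak at m/z 1500
```

The toy spectrum is a sloped baseline plus one Gaussian peak; after preprocessing, the sloped baseline is largely removed and the smoothed peak carries most of the calibrated intensity.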
This function processes multiple mzXML files that are listed in a file specified by the user.
process_monospectra( file, mass.range = c(1000, 2200), halfWindowSize = 20, SNIP.iteration = 60 )
file: A file name. This file is a tab-delimited file which contains the following columns: file names, strain.no, and strain. See below for details.
mass.range: The m/z range to consider in the analysis. (Default: c(1000,2200))
halfWindowSize: The half window size used for smoothing the intensity values. (Default: 20). See
SNIP.iteration: The number of iterations used to remove the baseline of a spectrum. (Default: 60). See
A list of processed monobacterial mass spectra (S4 objects, MALDIquant MassSpectrum objects), and their strain numbers (a vector), unique strains (a vector), and strain names (a vector).
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
Internal function. The function simulates m/z and intensity values using given summary statistics.
simulate_ind_spec_single(interest, mz.tol, species, strain)
interest: Summary statistics of spectra.
mz.tol: The m/z tolerance. This is used to generate m/z values of peaks.
species: Species name.
strain: Strain name.
A data frame that contains m/z, (normalized) intensity values, missing rates of peaks, species name, and strain name.
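One plausible way to turn per-peak summary statistics into a simulated spectrum is sketched below. The column names of interest (mz, mean.log.int, sd.log.int, missing.rate), the uniform m/z jitter within +/- mz.tol, natural-log intensities, and normalization to the base peak are all assumptions; simulate_ind_spec_sketch is a hypothetical helper, not the internal function itself. Missing rates are passed through to the output, matching the returned columns described above.

```r
# Hypothetical sketch: simulate one spectrum from per-peak summary statistics.
simulate_ind_spec_sketch <- function(interest, mz.tol, species, strain) {
  n <- nrow(interest)
  # Jitter each reference m/z uniformly within +/- mz.tol
  mz <- interest$mz + stats::runif(n, min = -mz.tol, max = mz.tol)
  # Draw (natural) log intensities from a normal distribution per peak
  log.int <- stats::rnorm(n, mean = interest$mean.log.int,
                          sd = interest$sd.log.int)
  int <- exp(log.int)
  data.frame(mz = mz,
             int = int / max(int),  # normalize to the base peak
             missing.rate = interest$missing.rate,
             species = species,
             strain = strain,
             stringsAsFactors = FALSE)
}

set.seed(42)
interest <- data.frame(mz = c(1200, 1500, 1800),
                       mean.log.int = c(2, 3, 1),
                       sd.log.int = c(0.2, 0.1, 0.3),
                       missing.rate = c(0, 0.1, 0.4))
sim <- simulate_ind_spec_sketch(interest, mz.tol = 0.5,
                                species = "A", strain = "A1")
sim
```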
This function simulates multiple mass spectra of bacterial mixtures, optionally writes them to a PDF file, and returns the simulated mass spectra (m/z and intensity values of peaks).
simulate_many_poly_spectra( mono.info, nsim = 10000, file = NULL, mixture.ratio, mixture.missing.prob.peak = 0.05, noise.peak.ratio = 0.05, snr.basepeak = 500, noise.cv = 0.25, mz.range = c(1000, 2200), mz.tol = 0.5 )
mono.info: A list output of
nsim: The number of simulated spectra. (Default: 10000)
file: An output file name. (Default: NULL, in which case no PDF file is generated.)
mixture.ratio: A list of bacterial mixture ratios for the given bacterial species in sim.template.
mixture.missing.prob.peak: A real value. The probability that a peak is missing because multiple bacterial species are mixed. (Default: 0.05)
noise.peak.ratio: The ratio of the number of noise peaks to the number of signal peaks. (Default: 0.05)
snr.basepeak: The signal-to-noise ratio of the base peak. (Default: 500)
noise.cv: The coefficient of variation of noise peaks. (Default: 0.25)
mz.range: The range of m/z values. (Default: c(1000,2200))
mz.tol: The m/z tolerance. (Default: 0.5)
A list of simulated mass spectra (data frames), each of which contains m/z values of peaks, normalized intensities of peaks, species names, and strain names. If file is specified, this function also writes the simulated spectra to a PDF file.
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
mono.info=gather_summary(c(spectra.mono.summary.A, spectra.mono.summary.B,
  spectra.mono.summary.C))
mixture.ratio <- list()
mixture.ratio['A']=1
mixture.ratio['B']=0.5
mixture.ratio['C']=0
insilico.spectra <- simulate_many_poly_spectra(mono.info,
  mixture.ratio=mixture.ratio, nsim=10)
This function takes simulated m/z values and peak intensities from create_insilico_mixture_template and modifies them based on the given parameters.
simulate_poly_spectra( sim.template, mixture.ratio, spectrum.name = "Spectrum", mixture.missing.prob.peak = 0.05, noise.peak.ratio = 0.05, snr.basepeak = 500, noise.cv = 0.25, mz.range = c(1000, 2200) )
sim.template: A data frame which contains m/z, log intensity, and normalized intensity values and missing rates of peaks, as well as species and strain information. An object of
mixture.ratio: A list of bacterial mixture ratios for the given bacterial species in sim.template.
spectrum.name: A character string. A user-defined spectrum name. (Default: 'Spectrum')
mixture.missing.prob.peak: A real value. The probability that a peak is missing because multiple bacterial species are mixed. (Default: 0.05)
noise.peak.ratio: The ratio of the number of noise peaks to the number of signal peaks. (Default: 0.05)
snr.basepeak: The signal-to-noise ratio of the base peak. (Default: 500)
noise.cv: The coefficient of variation of noise peaks. (Default: 0.25)
mz.range: The range of m/z values. (Default: c(1000,2200))
A data frame that contains m/z values of peaks, normalized intensities of peaks, species names, and strain names. A modified version of sim.template.
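How the mixing parameters might act on a template can be sketched as follows. This is an assumed model for illustration only: intensities are scaled by each species' mixture ratio, peaks are dropped using the template's per-peak missing rates plus mixture.missing.prob.peak, and noise peaks (labelled "noise" here) are placed uniformly over mz.range with a mean height equal to the base peak divided by snr.basepeak and a standard deviation of noise.cv times that mean. The helper name mix_spectrum_sketch and the template column names are hypothetical.

```r
# Hypothetical model of how the mixing parameters could act on a template.
mix_spectrum_sketch <- function(template, mixture.ratio,
                                mixture.missing.prob.peak = 0.05,
                                noise.peak.ratio = 0.05, snr.basepeak = 500,
                                noise.cv = 0.25, mz.range = c(1000, 2200)) {
  # Scale each peak by its species' ratio (species not in the list get 0)
  ratio <- vapply(template$species, function(s) {
    r <- mixture.ratio[[s]]
    if (is.null(r)) 0 else r
  }, numeric(1))
  sig <- template
  sig$int <- sig$int * ratio
  n <- nrow(sig)
  # Drop absent species, per-peak missing peaks, and mixture-induced losses
  keep <- sig$int > 0 &
    stats::runif(n) >= sig$missing.rate &
    stats::runif(n) >= mixture.missing.prob.peak
  sig <- sig[keep, c("mz", "int", "species", "strain")]
  if (nrow(sig) == 0) stop("no signal peaks left after mixing")
  # Noise peaks: count = noise.peak.ratio * signal peaks,
  # mean height = base peak / snr.basepeak, sd = noise.cv * mean
  n.noise <- round(noise.peak.ratio * nrow(sig))
  mu <- max(sig$int) / snr.basepeak
  noise <- data.frame(mz = stats::runif(n.noise, mz.range[1], mz.range[2]),
                      int = abs(stats::rnorm(n.noise, mu, noise.cv * mu)),
                      species = rep("noise", n.noise),
                      strain = rep("noise", n.noise),
                      stringsAsFactors = FALSE)
  out <- rbind(sig, noise)
  out$int <- out$int / max(out$int)  # renormalize to the base peak
  out <- out[order(out$mz), ]
  rownames(out) <- NULL
  out
}

set.seed(7)
species <- rep(c("A", "B", "C"), times = 7)
template <- data.frame(mz = seq(1100, 2100, by = 50),
                       int = runif(21),
                       missing.rate = 0.1,
                       species = species,
                       strain = paste0(species, "1"),
                       stringsAsFactors = FALSE)
mixed <- mix_spectrum_sketch(template, list(A = 1, B = 0.5, C = 0))
head(mixed)
```

With a mixture ratio of 0, species C contributes no peaks, which mirrors the mixture.ratio['C']=0 setting used in the examples.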
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.B <- process_monospectra(
  file=system.file("extdata", "listB.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.processed.C <- process_monospectra(
  file=system.file("extdata", "listC.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
spectra.mono.summary.B <- summarize_monospectra(
  processed.obj=spectra.processed.B, species='B', directory=tempdir())
spectra.mono.summary.C <- summarize_monospectra(
  processed.obj=spectra.processed.C, species='C', directory=tempdir())
mono.info=gather_summary(c(spectra.mono.summary.A, spectra.mono.summary.B,
  spectra.mono.summary.C))
mixture.ratio <- list()
mixture.ratio['A']=1
mixture.ratio['B']=0.5
mixture.ratio['C']=0
sim.template <- create_insilico_mixture_template(mono.info)
insilico.spectrum <- simulate_poly_spectra(sim.template, mixture.ratio)
This function summarizes monomicrobial spectra and writes the summary to the specified directory.
summarize_monospectra( processed.obj, species, directory = NULL, minFrequency = 0.5, align.tolerance = 5e-04, snr = 3, halfWindowSize = 20, top.N = 50 )
processed.obj: A list from
species: Species name.
directory: A directory in which the summary file is written. (Default: NULL, in which case no summary file is generated.)
minFrequency: A proportion. The minimum occurrence proportion a peak needs in order to be included in the reference peaks. All peaks whose occurrence proportion is less than minFrequency will be removed. (Default: 0.50). See
align.tolerance: Relative mass tolerance. To specify the tolerance in ppm, multiply the ppm value by 10^-6; for example, the default 0.0005 corresponds to 500 ppm. (Default: 0.0005)
snr: Signal-to-noise ratio. (Default: 3)
halfWindowSize: The highest peaks in the given window (+/-halfWindowSize) will be recognized as peaks. (Default: 20). See
top.N: An integer. The top N peaks will be chosen for the analysis. (Default: 50)
A data frame that contains the peak information (m/z, mean log intensity, standard deviation of log intensity, and missing rate of peaks), as well as species and strain information.
spectra.processed.A <- process_monospectra(
  file=system.file("extdata", "listA.txt", package="MGMS2"),
  mass.range=c(1000,2200))
spectra.mono.summary.A <- summarize_monospectra(
  processed.obj=spectra.processed.A, species='A', directory=tempdir())
Internal function. This function calculates summary statistics for peaks after aligning the spectra of interest.
summary_mono( spectra.interest, minFrequency = 0.5, align.tolerance = 5e-04, snr = 3, halfWindowSize = 20, top.N = 50 )
spectra.interest: A list which contains peak information for a strain of interest.
minFrequency: A proportion. The minimum occurrence proportion a peak needs in order to be included in the reference peaks. All peaks whose occurrence proportion is less than minFrequency will be removed. (Default: 0.50). See
align.tolerance: Relative mass tolerance. To specify the tolerance in ppm, multiply the ppm value by 10^-6; for example, the default 0.0005 corresponds to 500 ppm. (Default: 0.0005)
snr: Signal-to-noise ratio. (Default: 3)
halfWindowSize: The highest peaks in the given window (+/-halfWindowSize) will be recognized as peaks. (Default: 20). See
top.N: An integer. The top N peaks will be chosen for the analysis. (Default: 50)
Summary information (Data frame) of spectra of interest.
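The summary statistics described in this section can be illustrated with a sketch that starts from already aligned peaks. It assumes a matrix of intensities with spectra in rows and aligned peaks in columns, with NA marking undetected peaks; alignment itself (governed by align.tolerance) and peak detection (snr, halfWindowSize) are not reproduced. The helper name summarize_peaks_sketch and the output column names are hypothetical.

```r
# Hypothetical sketch of per-peak summary statistics on aligned peaks.
summarize_peaks_sketch <- function(int.matrix, mz, minFrequency = 0.5,
                                   top.N = 50) {
  # int.matrix: spectra in rows, aligned peaks in columns; NA = not detected
  log.int <- log(int.matrix)
  summ <- data.frame(
    mz = mz,
    mean.log.int = colMeans(log.int, na.rm = TRUE),
    sd.log.int = apply(log.int, 2, stats::sd, na.rm = TRUE),
    missing.rate = colMeans(is.na(int.matrix))
  )
  # Keep peaks detected in at least minFrequency of the spectra
  summ <- summ[1 - summ$missing.rate >= minFrequency, ]
  # Keep the top.N peaks by mean log intensity, then sort by m/z
  summ <- summ[order(summ$mean.log.int, decreasing = TRUE), ]
  summ <- utils::head(summ, top.N)
  summ <- summ[order(summ$mz), ]
  rownames(summ) <- NULL
  summ
}

int.matrix <- rbind(c(100, 20, NA, 5),
                    c(120, 25, NA, NA),
                    c(110, NA, 8, 6),
                    c(90, 22, NA, 4))
mz <- c(1200.5, 1350.2, 1600.8, 1900.1)
res <- summarize_peaks_sketch(int.matrix, mz, minFrequency = 0.5, top.N = 2)
res
```

Here the peak at m/z 1600.8 is detected in only one of four spectra (missing rate 0.75), so it falls below minFrequency = 0.5; of the remaining three peaks, top.N = 2 keeps the two with the highest mean log intensity.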