| Title: | R Functions Useful for Community Ecology |
|---|---|
| Description: | A collection of utility functions for community ecology analyses, with emphasis on workflows using the 'phyloseq' and 'vegan' packages. Includes functions for normalizing OTU tables, computing alpha diversity via rarefaction (using a fast C++ implementation), differential abundance comparisons with compact letter displays, primer checking for amplicon sequencing, plotting QIIME 2/DADA2 generated transition stats and miscellaneous helpers for ordination plots and taxonomic name formatting. |
| Authors: | John Quensen [aut, cre, cph] |
| Maintainer: | John Quensen <[email protected]> |
| License: | GPL-2 |
| Version: | 0.2.1 |
| Built: | 2026-05-13 23:10:37 UTC |
| Source: | https://github.com/cran/QsRutils |
Removes elements present in 'y' from 'x'.
x %wo% yx %wo% y
x |
A vector. |
y |
A vector of elements to remove from 'x'. |
A vector containing elements of 'x' that are not in 'y'.
c(1, 2, 3, 4, 5) %wo% c(2, 4) letters[1:5] %wo% c("b", "d")c(1, 2, 3, 4, 5) %wo% c(2, 4) letters[1:5] %wo% c("b", "d")
Transform a percentage to its arc sine.
arc_sine(x)arc_sine(x)
x |
A percentage. |
The arcsine transformation of x.
arc_sine(30.1)arc_sine(30.1)
Indicate Significance with Stars
asterix(prob)asterix(prob)
prob |
p value |
Returns '***; for p < 0.001, '**' for p < 0.01, '*' for p < 0.05.
Character vector of asterisks indicating significance level.
asterix(0.039)asterix(0.039)
Calculates alpha-diversity metrics from n samplings of an OTU table to a constant number of counts per sample.
avg_alpha( otu, sampling_depth, iterations = 100, sum_method = c("median", "mean"), ncores = 1 )avg_alpha( otu, sampling_depth, iterations = 100, sum_method = c("median", "mean"), ncores = 1 )
otu |
An OTU table as a data frame or matrix with samples as rows and taxa as columns. |
sampling_depth |
The number of counts per sample in the sampled OTU table |
iterations |
The number of times the OTU table should be sampled. |
sum_method |
Method ("median" or "mean") for summarizing replication results. |
ncores |
Number of cores to use for parallel execution. Default 1 (no parallelism). |
This implementation focuses on speed: - vectorizes the alpha-metric calculations (avoids repeated calls to vegan::diversity and vegan::specnumber) - uses base R operations (rowSums, logical ops, arithmetic) which are much faster in tight loops - optionally parallelizes replicates across cores (platform-aware).
The OTU data frame supplied must be in typical vegan format: samples as row names and taxa as column names. The minimum row sum must be greater than or equal to the sampling depth.
By default the sum_method is mean. For a similar function in QIIME2, the default sum_method is median.
Returns a data frame with Shannon, Observed, Pielou, Simpson and Inverse Simpson as the column names for each sample as the row names in an OTU table.
{ data(BCI, package = "vegan") otu <- BCI[rowSums(BCI) > 400, ] avg_alpha(otu, sampling_depth = 400, iterations = 100) }{ data(BCI, package = "vegan") otu <- BCI[rowSums(BCI) > 400, ] avg_alpha(otu, sampling_depth = 400, iterations = 100) }
Determine hits of all orientations of primers to paired sequence files.
check_primer_hits( path, fwd_pattern = "_R1.fastq", rev_pattern = "_R2.fastq", fwd_primer = "GGAAGTAAAAGTCGTAACAAGG", rev_primer = "GCTGCGTTCTTCATCGATGC" )check_primer_hits( path, fwd_pattern = "_R1.fastq", rev_pattern = "_R2.fastq", fwd_primer = "GGAAGTAAAAGTCGTAACAAGG", rev_primer = "GCTGCGTTCTTCATCGATGC" )
path |
Path to paired sequence files |
fwd_pattern |
Portion of file name that distinguishes forward read files. The default is "_R1.fastq" |
rev_pattern |
Portion of the file name that distinguishes the reverse file. The default is "_R2.fastq" |
fwd_primer |
Nucleotide sequence of the forward primer |
rev_primer |
Nucleotide sequence of the reverse primer |
This function is for checking the effectiveness of primer trimming of ITS sequences. Because the ITS region varies in length, it is possible for forward sequences to extend past the reverse primenr region and vice-versa, leadsing to what Robert Edgar calls staggered pairs fi the sequences are merged.
The fwd_pattern and rev_pattern must contain the file extension. The defaults are "_R1.fastq" and "_R2.fastq".
Default primers are ITS5 (forward) and ITS2 (reverse) from White et.al 1990.
ITS5: "GGAAGTAAAAGTCGTAACAAGG"
ITS2: "GCTGCGTTCTTCATCGATGC"
A table of hits to the sequences by all primer orientations
{ # Copy example fastq files to a writable temp directory src <- system.file("extdata", package = "QsRutils") path <- file.path(tempdir(), "QsRutils_example") dir.create(path, showWarnings = FALSE) file.copy(list.files(src, pattern = "\\.fastq\\.gz$", full.names = TRUE), path) check_primer_hits( path = path, fwd_pattern = "raw_1.fastq.gz", rev_pattern = "raw_2.fastq.gz", fwd_primer = "GGAAGTAAAAGTCGTAACAAGG", rev_primer = "GCTGCGTTCTTCATCGATGC" ) }{ # Copy example fastq files to a writable temp directory src <- system.file("extdata", package = "QsRutils") path <- file.path(tempdir(), "QsRutils_example") dir.create(path, showWarnings = FALSE) file.copy(list.files(src, pattern = "\\.fastq\\.gz$", full.names = TRUE), path) check_primer_hits( path = path, fwd_pattern = "raw_1.fastq.gz", rev_pattern = "raw_2.fastq.gz", fwd_primer = "GGAAGTAAAAGTCGTAACAAGG", rev_primer = "GCTGCGTTCTTCATCGATGC" ) }
Tests for Heterogeneity of Variances in make_comparisons Result
check_var(otu.pc.transformed, group.vector) ## S3 method for class 'check_var' print(x, ...)check_var(otu.pc.transformed, group.vector) ## S3 method for class 'check_var' print(x, ...)
otu.pc.transformed |
An OTU matrix of transformed data. Taxa are rows. |
group.vector |
A vector of treatments. |
x |
A |
... |
Arguments passed to |
A data frame of class "check_var" with one row per taxon and
columns taxon, statistic, df, and p.value from
the Fligner-Killeen test. The object can be printed with print().
make_comparisons
# Transform species matrix to proportion; # Check variances for the first three species # in the dune data set grouped by Management. data(dune, package = "vegan") data(dune.env, package = "vegan") dune <- vegan::decostand(dune, method = "total") dune <- dune[, 1:3] dune <- t(dune) check_var(dune, dune.env$Management)# Transform species matrix to proportion; # Check variances for the first three species # in the dune data set grouped by Management. data(dune, package = "vegan") data(dune.env, package = "vegan") dune <- vegan::decostand(dune, method = "total") dune <- dune[, 1:3] dune <- t(dune) check_var(dune, dune.env$Management)
Generate compact letter displays from Dunn.test results
cld_dunn(dunn_rslt, significance = 0.05)cld_dunn(dunn_rslt, significance = 0.05)
dunn_rslt |
The result of the function dunn.test::dunn.test |
significance |
The alpha level for statistical significance |
A list of two items. The first, p_adj_matrix, is an object of class 'dist' giving p values adjusted for multiple comparisons. The second,clds, is a character vector of compact letter displays (CLDs) for each treatment.
# Example cribbed and modified from the kruskal.test documentation x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects y <- c(3.8, 2.7, 4.0, 2.4) # with obstructive airway disease z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis x <- c(x, y, z) g <- factor(rep(1:3, c(5, 4, 5)), labels = c("Normal", "COPD", "Asbestosis")) dunn_rslt <- dunn.test::dunn.test(x, g) cld_dunn(dunn_rslt, significance = 0.05)# Example cribbed and modified from the kruskal.test documentation x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects y <- c(3.8, 2.7, 4.0, 2.4) # with obstructive airway disease z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis x <- c(x, y, z) g <- factor(rep(1:3, c(5, 4, 5)), labels = c("Normal", "COPD", "Asbestosis")) dunn_rslt <- dunn.test::dunn.test(x, g) cld_dunn(dunn_rslt, significance = 0.05)
Makes a tibble for adding compact letter assignments to a boxplot using HSD.test results.
cld_hsd(hsd_rslt, y_pos = "boxtop")cld_hsd(hsd_rslt, y_pos = "boxtop")
hsd_rslt |
The result of the HSD.test function of package agricolae |
y_pos |
The y-position in relation to the boxplots. Choices are at the top of the box ("boxtop", the default) or at the maximum group value ("max"). |
hsd_rslt must be created with agricolae::HSD.test
A tibble with columns for treatment groups (x), the y-positions of the treatment CLD (y), and the CLD letters indicating significantly different treatments.
data("iris") model <- lm(Petal.Length ~ Species, data = iris) hsd_rslt <- agricolae::HSD.test(model, trt="Species") cld_hsd(hsd_rslt)data("iris") model <- lm(Petal.Length ~ Species, data = iris) hsd_rslt <- agricolae::HSD.test(model, trt="Species") cld_hsd(hsd_rslt)
Clears all warning messages from the base environment.
clear_warnings()clear_warnings()
Sometimes when working in the console R retains a list of warnings such that they keep being reported after the function call which originated them. This function removes them so that they are not a nuisance.
No return value, called for side effects. Clears the stored warning list so that stale warnings are no longer reported in the console.
{ as.numeric(c("1", "2", "apples")) summary(warnings()) clear_warnings() summary(warnings()) }{ as.numeric(c("1", "2", "apples")) summary(warnings()) clear_warnings() summary(warnings()) }
Calculates the number of combinations of n things drawn r at a time.
comb(n, r, repetition = FALSE)comb(n, r, repetition = FALSE)
n |
The total number of items. |
r |
The number of items to be drawn. |
repetition |
A logical, whether or not repetitions are allowed. FALSE by default. |
An integer giving the number of ways a set of r items can be drawn from a set of n items.
comb(5, 3) comb(5, 3, repetition = TRUE)comb(5, 3) comb(5, 3, repetition = TRUE)
Assembles Comparison Data Frame
comp_assemble(part1, part2, part3)comp_assemble(part1, part2, part3)
part1 |
Result from comp_means_sd |
part2 |
Result from comp_make_f_tests |
part3 |
Result from comp_comparisons |
The data frame returned has taxa as row names. The first three column names are mean (relative abundance of the taxa), sd and F_value for comparisons among groups. The remaining column names are the groups. The group columns show the mean relative abundance for the group plus/minus the standard deviation and a compact letter display for the group. See also the vignette "Compare Relative Abundances Among Treatments."
A summary data frame of differential abundances by taxon and treatment.
{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp3 <- comp_means_sd(temp2$otu.pc) temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE) temp5 <- comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans, grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE) comp_assemble(temp3, temp4, temp5) |> dplyr::arrange(desc(mean)) }{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp3 <- comp_means_sd(temp2$otu.pc) temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE) temp5 <- comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans, grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE) comp_assemble(temp3, temp4, temp5) |> dplyr::arrange(desc(mean)) }
Calculates the treatment comparison portion of a table comparing relative abundances of each taxon among treatments.
comp_comparisons( otu.pc, otu.pc.trans, grps, p.adjust.method = "BH", pool.sd = FALSE )comp_comparisons( otu.pc, otu.pc.trans, grps, p.adjust.method = "BH", pool.sd = FALSE )
otu.pc |
An OTU table of percentages. |
otu.pc.trans |
An OTU table of transfromed data. |
grps |
A vector of treatemnt groups for which to make comparisons. |
p.adjust.method |
Adjustment method for multiple comparisons. |
pool.sd |
A logical, whether or not to pool standard deviations. |
The row names of the data frame returned are taxa. The columns are of type character and their names are the group names. For each group, the entry gives the mean relative abundance +/- the standard deviation and a compact letter display (CLD) for the group. See also the vignette "Compare Relative Abundances Among Treatments."
A data frame of differences in relative abundances among treatments.
{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans, grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE) }{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") comp_comparisons(otu.pc = temp2$otu.pc, otu.pc.trans = temp2$otu.pc.trans, grps = temp2$groups, p.adjust.method = "BH", pool.sd = TRUE) }
Calculates omnibus F tests to be included in a table comparing relative abundances of each taxon among treatments.
comp_make_f_tests(otu.pc.trans, grps, var.equal = FALSE)comp_make_f_tests(otu.pc.trans, grps, var.equal = FALSE)
otu.pc.trans |
An OTU table of transformed data from comp_prepare_otu_table. |
grps |
A vector of treatment groups for which to make comparisons. |
var.equal |
Logical, whether or not to assume variances equal. |
The row names of the data frame returned are taxa. The column names are 'F', 'Prob', 'sig' and 'F_value'. The column 'sig' includes asterisks indicating the degree of significance and the 'F_value' column is the F value to 2 decimal places plus the asterisk(s) from 'sig'. See also the vignette "Compare Relative Abundances Among Treatments."
A data frame of the F-test results.
{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE) temp4 }{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp4 <- comp_make_f_tests(temp2$otu.pc.trans, grps = temp2$groups, var.equal = TRUE) temp4 }
Calculates means and standard deviation for each taxon to be included in a table comparing relative abundances of each taxon among treatments.
comp_means_sd(otu.pc)comp_means_sd(otu.pc)
otu.pc |
An OTU table with data as percentages. |
The OTU table should be created with comp_prepare_otu_table. The data frame returned has taxa as row names. For each taxa, 'mean' is the mean relative abundance and 'sd' is the standard deviation. See also the vignette "Compare Relative Abundances Among Treatments."
A data frame with mean relative abundance and standard deviations by taxon.
{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp3 <- comp_means_sd(temp2$otu.pc) temp3 }{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp3 <- comp_means_sd(temp2$otu.pc) temp3 }
Make OTU tables for making comparisons of relative abundances among treatments.
comp_prepare_otu_table( expt.taxon.pc, grps = "Treatment", transformation = "sqrt_arc_sine" )comp_prepare_otu_table( expt.taxon.pc, grps = "Treatment", transformation = "sqrt_arc_sine" )
expt.taxon.pc |
Phyloseq object from comp_prepare_phyloseq with percentages in the otu_table. |
grps |
Factor in sample data for which to make comparisons. |
transformation |
Transformation function to use. |
The transformation applied may be "none" or a user-supplied function name in quotation marks or any of the built-it transformations("arc_sine", "log_arc_sine", or "sqrt_arc_sine"). The "sqrt_arc_sine" has generally proven most effective. See also the vignette "Compare Relative Abundances Among Treatments."
A list consisting of an OTU table with percentages, an OTU table with transformed data, and a vector of treatment groups.
{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp2 }{ data("its.root") temp1 <- comp_prepare_phyloseq(its.root) temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc, grps = "Label", transformation = "sqrt_arc_sine") temp2 }
Prepares a phyloseq object for making comparisons of relative abundances among treatments.
comp_prepare_phyloseq(expt, taxrank = "Phylum", pc.filter = 0.01)comp_prepare_phyloseq(expt, taxrank = "Phylum", pc.filter = 0.01)
expt |
Experiment level phyloseq object. |
taxrank |
Taxonomic rank for which to make comparisons. |
pc.filter |
Minimum percentage of total counts to include rank in result. |
For both returned phyloseq objects, the OTU table has been filtered to include only OTUs initially present at >= pc.filter times the original total count and only taxrank is included in the taxonomy table. For the second object in the list, the OTU table has been transformed to percentages of the total counts per sample. See also the vignette "Compare Relative Abundances Among Treatments."
A list of two modified experiment level phyloseq objects.
{ data(its.root) temp1 <- comp_prepare_phyloseq(its.root) temp1 }{ data(its.root) temp1 <- comp_prepare_phyloseq(its.root) temp1 }
Transform an angle in degrees to radians.
deg2rad(x)deg2rad(x)
x |
Angle in degrees |
Angle in radians.
deg2rad(90)deg2rad(90)
Based on 16S rRNA gene sequences from Iowa loess soil
exptexpt
A phyloseq object with otu_table, sample_data, tax_table, phylogenetic tree and refseqs.
0-20 cm, 60-80cm
HNC
Top, Middle, Bottom
Formats a taxon so that it will be properly italicized in a ggpplot
format_taxon(x)format_taxon(x)
x |
A string representing a phylum, class, etc. |
If a taxon begins with an upper case letter followed by lower case letters and does not contain an underscore, it is wrapped in asterisks. If it begins with an upper case letter followed by lower case letters and contains an underscore, the portion before the underscore is wrapped in asterisks, the underscore removed and any letters following the underscore left alone. If a taxon contains all upper case letters or digits it is not a proper taxon and is left alone.
The string with asterisks properly inserted.
format_taxon("UAB") format_taxon("Pseudomonas_B") format_taxon("Pseudomonas")format_taxon("UAB") format_taxon("Pseudomonas_B") format_taxon("Pseudomonas")
Generates a random character string of specified length.
generate_password(n, type = "alpha_numeric")generate_password(n, type = "alpha_numeric")
n |
Numberof characters in password. |
type |
c("alpha_numeric", "anything_else") |
If type equals "alpha_numeric" (the default), only alpha-numeric characters are used to generate the password. If type does not equal "alpha_numeric" then at least one non-alpha-numeric symbol will be included in the password. In either case, the alpha characters used are both upper and lower case.
A character string.
generate_password(8)generate_password(8)
Assign treatment groups based on pairwise t-tests.
get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)
ptt.rslt |
Result from the stats function pairwise.t.test. |
alpha |
Confidence level. |
rm.subset |
A logical; remove group subsets if true. |
This function aids in making letter assignments as to which treatments are significantly different. Also returns a square matrix of alpha values for all pairwise differences. This square matrix can serve as input to the multcompLetters function of the multcompView package which provides letter assignments. If rm.subset is FALSE, then groups such as {A,B} and {A, B, C} may be reported. This is redundant in the sense the {A, B} is a subset of {A, B, C}. In this case if rm.subset is FALSE, the group {A, B} is not reported.
A list consisting of groups of treatment groups that are not significantly different and a matrix of p values.
make_letter_assignments
attach(airquality) Month <- factor(Month, labels = month.abb[5:9]) ptt.rslt <- pairwise.t.test(Ozone, Month) detach(airquality) get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)attach(airquality) Month <- factor(Month, labels = month.abb[5:9]) ptt.rslt <- pairwise.t.test(Ozone, Month) detach(airquality) get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)
Gets the ranges for the width and height of a ggplot panel.
get_plot_limits(plot)get_plot_limits(plot)
plot |
A plot created with ggplot2 |
Sometimes when adding text to a ggplot, the text is cutoff if it extends beyond the limits of plot panel. This function provides information enabling the user to extend the panel limits so that the text is not cutoff.
A list: xmin, xmax, ymin, ymax giving the coordinates of the limits of a ggplot panel.
library(ggplot2) data(iris) plt <- ggplot(data=iris, aes(x=Species, y=Petal.Length)) + geom_boxplot() get_plot_limits(plt)library(ggplot2) data(iris) plt <- ggplot(data=iris, aes(x=Species, y=Petal.Length)) + geom_boxplot() get_plot_limits(plt)
Calculates Good's coverage from a community data matrix with samples as rows and OTUs as columns.
goods(com)goods(com)
com |
a vegan compatible community data matrix. |
A table with the headings number of singletons, number of sequences, and Good's coverage for each sample in rows.
Good, I. J. 1953. The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika 40:237-264.
{ data(dune, package = "vegan") goods(dune) }{ data(dune, package = "vegan") goods(dune) }
Renames sequences in a multi-fasta file with the MD5 hash of each sequence.
hash_dna_seqs(seqs)hash_dna_seqs(seqs)
seqs |
A list of DNA sequences |
Taxa names for the ASV table returned by the R version of DADA2 are the sequences themselves. This function renames them with the MD5 hash of each sequences so that the results are directly comparable to QIIME2/DADA2 results.
DNA sequences renamed with their hashes.
seqs <- ">ASV1 AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG" hash_dna_seqs(seqs)seqs <- ">ASV1 AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG" hash_dna_seqs(seqs)
Based on ITS2 sequences amplified from corn roots.
its.rootits.root
A phyloseq object with otu_table, sample_data and tax_table. The sample_data variables are:
Phosporous level, H or L
One of three: 2, 3, and C
A code for treatments: 2HR, 2LR, 3HR, 3LR, CHR, CLR
Transform a percentage to the log of its arc sine.
log_arc_sine(x)log_arc_sine(x)
x |
A percentage. |
The common logarithm of the arcsine transformation of x.
log_arc_sine(x = 30.1)log_arc_sine(x = 30.1)
Makes multiple comparisons of the relative abundances of taxa between treatment groups using the pairwise.t.test. Data may be transformed by a user supplied function. Three are included in this package.
make_comparisons( expt, taxrank = "Phylum", grps = "Treatment", transformation = "none", pc.filter = 0.01, p.adjust.method = "BH", pool.sd = FALSE )make_comparisons( expt, taxrank = "Phylum", grps = "Treatment", transformation = "none", pc.filter = 0.01, p.adjust.method = "BH", pool.sd = FALSE )
expt |
Experiment level phyloseq object. |
taxrank |
Rank for which to make comparisons. |
grps |
Factor in sample data for which to make comparisons. |
transformation |
Transformation function to use. |
pc.filter |
Minimum percentage of total counts to include rank in result. |
p.adjust.method |
Adjustment method for multiple comparisons. |
pool.sd |
Logical, whether or not to pool standard deviations. |
This is essentially a wrapper around the functions comp_prepare_phyloseq(), comp_prepare_otu_table(), comp_means_sd(), comp_make_f_tests(), comp_comparisons() and comp_assemble(). Transformation may be "none" or a user-supplied function name in quotation marks or any of the built-it transformations ("arc_sine", "log_arc_sine", or "sqrt_arc_sine"). The "sqrt_arc_sine" has generally proven most effective.
A list consisting of three data frames and a vector of treatment groups. The first data frame is comparison.table.giving for each row (taxon) the mean relative abundance, standard deviation, F statistic with significance indicated by asterisks, and then for each treatment group the mean+/-sd with a CLD indicating group membership. The second data frame is taxa.pc giving for each row (taxon) the relative abundance of each treatment group. The third data frame is taxa.pc.transformed. It is like the second data frame but the data has been transformed using the specified function.
arc_sine, log_arc_sine, sqrt_arc_sine, check_var
{ data(its.root) make_comparisons(its.root, taxrank = "Phylum", grps = "Label", transformation = "sqrt_arc_sine", pc.filter = 0.01, p.adjust.method ="BH", pool.sd = TRUE) }{ data(its.root) make_comparisons(its.root, taxrank = "Phylum", grps = "Label", transformation = "sqrt_arc_sine", pc.filter = 0.01, p.adjust.method ="BH", pool.sd = TRUE) }
Makes letter assignments for treatment groups that are not significantly different.
make_letter_assignments(ptt.rslt, significance = 0.05)make_letter_assignments(ptt.rslt, significance = 0.05)
ptt.rslt |
Output from the pairwise.t.test function. |
significance |
Alpha level to be declared a significant difference. |
Letter assignments are made using Piepho's algorithm.
A named vector of letter assignments. Names are treatment groups.
Piepho, H. P. 2004. An algorithm for a letter-based representation of all-pairwise comparisons. Journal of Computational and Graphical Statistics **13**:456-466.
{ data(iris, package = "datasets") ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE)) make_letter_assignments(ptt.rslt, significance = 0.05) }{ data(iris, package = "datasets") ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE)) make_letter_assignments(ptt.rslt, significance = 0.05) }
Merge two data frames by their row names.
merge_2_frames(one, two)merge_2_frames(one, two)
one |
A data frame. |
two |
A second data frame. |
Merges data frames by common row names. This function differs from merge.data.frames in that the merged data frame returned has row names and not a new column of the row names.
A merged data frame.
{ common_rows <- paste0("ID_", 1:5) df1 <- data.frame( Value_A = runif(5), Category = sample(c("X", "Y"), 5, replace = TRUE), row.names = common_rows ) df2 <- data.frame( Value_B = rnorm(5), Flag = sample(c(TRUE, FALSE), 5, replace = TRUE), row.names = common_rows ) merge_2_frames(df1, df2) }{ common_rows <- paste0("ID_", 1:5) df1 <- data.frame( Value_A = runif(5), Category = sample(c("X", "Y"), 5, replace = TRUE), row.names = common_rows ) df2 <- data.frame( Value_B = rnorm(5), Flag = sample(c(TRUE, FALSE), 5, replace = TRUE), row.names = common_rows ) merge_2_frames(df1, df2) }
Makes ordination axis labels that include, if appropriate, the % of the total variance explained by each axis.
ord_labels(ord)ord_labels(ord)
ord |
A vegan ordination object. |
If there are no eigenvalues in ord, or if any eigenvalues are less than 0, each element of the vector returned has the form "DIMn" where n is the axis number. Otherwise, each element of the vector returned has the form Axisn xx.x % where Axis is taken from the vector of eigenvalues in ord if they are named or simply "DIM" if they are not, n is the number of the axis, and xx.x is the % of total variance explained by the axis.
For this function to work correctly, ord should be created in one of the following ways:
As an unconstrained ordination using vegan::rda. In this case the labels have the form PCAn xx.x %
As a PCoA made with stats::cmdscale. In this case the labels have the form DIMn xx.x %.
As a CA made with vegan::ca. In this case the labels have the form CAn xx.x %.
A character vector, each element of which can be used to label the corresponding axis of an ordination plot.
{ # For PCA using rda: data("dune", package = "vegan") dune_hel <- vegan::decostand(dune, method = "hellinger") pca <- vegan::rda(dune_hel) print("For the PCA case:") print(ord_labels(pca)[1:2]) cat("\n") # For PCoA with negative eigenvalues d <- vegan::vegdist(dune) pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = FALSE) print("For the PCoA case with negative eigenvalues:") print(ord_labels(pcoa)[1:2]) cat("\n") # For PCoA without negative eigenvalues pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = TRUE) print("For the PCoA case without negative eigenvalues:") print(ord_labels(pcoa)[1:2]) cat("\n") # For correspondence analysis ca_ord <- vegan::ca(dune) (ord_labels(ca_ord)) print("For the CA case:") print(ord_labels(ca_ord)) }{ # For PCA using rda: data("dune", package = "vegan") dune_hel <- vegan::decostand(dune, method = "hellinger") pca <- vegan::rda(dune_hel) print("For the PCA case:") print(ord_labels(pca)[1:2]) cat("\n") # For PCoA with negative eigenvalues d <- vegan::vegdist(dune) pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = FALSE) print("For the PCoA case with negative eigenvalues:") print(ord_labels(pcoa)[1:2]) cat("\n") # For PCoA without negative eigenvalues pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = TRUE) print("For the PCoA case without negative eigenvalues:") print(ord_labels(pcoa)[1:2]) cat("\n") # For correspondence analysis ca_ord <- vegan::ca(dune) (ord_labels(ca_ord)) print("For the CA case:") print(ord_labels(ca_ord)) }
Makes PCA axis labels that include the
pca_labels(pca)pca_labels(pca)
pca |
Object containing the results of vegan's rda function. |
Each element of the vector returned has the form "PCAn xx.x
A character vector, each element of which can be used to label the corresponding axis of a PCA plot.
{ data(dune, package = "vegan") dune_hel <- vegan::decostand(dune, method = "hellinger") pca_ord <- vegan::rda(dune_hel) pca_labels(pca_ord) }{ data(dune, package = "vegan") dune_hel <- vegan::decostand(dune, method = "hellinger") pca_ord <- vegan::rda(dune_hel) pca_labels(pca_ord) }
Retuns the number of permutaions of n things taken r at a time.
perm(n, r, repetition = FALSE)perm(n, r, repetition = FALSE)
n |
Total number of items. |
r |
Number of items drawn. |
repetition |
A logical, whether or not repetitions are allowed. FALSE by default. |
An integer giving how many ways m things can be drawn n at a time.
perm(10, 5) perm(10, 5, repetition = TRUE)perm(10, 5) perm(10, 5, repetition = TRUE)
Used in Case 3 of the vignette make_comparisons
plot_dfplot_df
A data file in long format used for a ggplot. The sample_data variables are:
A code for genotype (2, 3, or C), P level (H or L) and sample type (R)
One of the families in Gigasporaceae
Percent of total counts for family and treatment combination.
Extracts a QIIME2/DADA2 transition stats file and returns a plot.
plot_transition_stats(trans_stats.qza)plot_transition_stats(trans_stats.qza)
trans_stats.qza |
The transitions stats file output by QIIME2 DADA2 |
Beginning with QIIME2 version 2025.7 the DADA2 plugin requires output of a compressed (qza) file of the transition stats. This function makes a ggplot plot from the data in that file. It is useful in determining how well DADA2 has corrected read errors.
A ggplot of the transition probabilities
qza <- system.file("extdata", "base-transition-stats.qza", package = "QsRutils") plot_transition_stats(qza)qza <- system.file("extdata", "base-transition-stats.qza", package = "QsRutils") plot_transition_stats(qza)
Allows subsetting of a phyloseq object according to the relative abundance of OTUs in a minimal number of samples. Returns a logical vector of OTUs that are at least n% of the sequences in at least m samples.
prop_filter(x, n, m)prop_filter(x, n, m)
x |
A phyloseq object. |
n |
Minimum percentage to keep OTU. |
m |
Minimum number of samples. |
The functions creates a logical vector to be used in subsetting a phyloseq object according to the relative abundance of OTUs in a given number of samples. For example, if n = 1 and m = 2, then the OTUs to be kept must represent at least 1% of the sequences in at least 2 samples. The vector is then used as an argument to the phyloseq object 'prune_taxa'.
A logical vector of OTUs to keep.
{ data("its.root") prop_filter(x = its.root, n = 1, m = 5) }{ data("its.root") prop_filter(x = its.root, n = 1, m = 5) }
The QsRutils package contains functions I have written to make some aspects of using phyloseq and vegan simpler. I originally called the package MyRutils, but that does not make much sense if I am posting it publically!
Transform an angle in radians to degrees.
rad2deg(x)rad2deg(x)
x |
Angle in radians |
Angle in degrees.
rad2deg(pi * 0.5)rad2deg(pi * 0.5)
Makes RDA axis labels that include the axis, that is by the constrained portion of the analysis.
rda_labels(rda)rda_labels(rda)
rda |
A constrained ordination object made with vegan::rda() or vegan::cca(). |
Each element of the vector returned has the form "RDAn xx.x n is the number of the RDA axis and xx.x is the explained by the axis. The percent of total variance is for the constrained portion only.
A character vector, each element of which can be used to label the corresponding axis of an RDA plot.
ord_labels()
{ # Using vegan::rda() data(dune, package = "vegan") data(dune.env, package = "vegan") dune.Manure <- vegan::rda(dune ~ Manure, dune.env) print("For the rda case:") print(rda_labels(dune.Manure)[1:2]) cat("\n") # Using vegan::cca() data(varespec, package = "vegan") data(varechem, package = "vegan") vare.cca <- vegan::cca(varespec ~ Al + P*(K + Baresoil), data=varechem) print("For the cca case:") print(rda_labels(vare.cca)[1:2]) }{ # Using vegan::rda() data(dune, package = "vegan") data(dune.env, package = "vegan") dune.Manure <- vegan::rda(dune ~ Manure, dune.env) print("For the rda case:") print(rda_labels(dune.Manure)[1:2]) cat("\n") # Using vegan::cca() data(varespec, package = "vegan") data(varechem, package = "vegan") vare.cca <- vegan::cca(varespec ~ Al + P*(K + Baresoil), data=varechem) print("For the cca case:") print(rda_labels(vare.cca)[1:2]) }
Roots an unrooted tree in a phyloseq object
root_phyloseq_tree(phylo)root_phyloseq_tree(phylo)
phylo |
A phyloseq object containing an unrooted tree |
The tree is rooted by the longest terminal branch.
The same phyloseq object with a rooted tree
{ data("expt") expt.rooted <- root_phyloseq_tree(expt) ape::is.rooted(phyloseq::phy_tree(expt.rooted)) }{ data("expt") expt.rooted <- root_phyloseq_tree(expt) ape::is.rooted(phyloseq::phy_tree(expt.rooted)) }
Calculates the standard error of a numeric vector
se(x)se(x)
x |
A numeric vector |
NA values are ignored.
The standard error of the numeric vector
x <- c(1,2,3,4,5, NA) se(x)x <- c(1,2,3,4,5, NA) se(x)
Transform a percentage to the square root of its arc sine.
sqrt_arc_sine(x)sqrt_arc_sine(x)
x |
A percentage. |
The square root of the arcsine transformation of x.
sqrt_arc_sine(30.1)sqrt_arc_sine(30.1)
Normalize sample counts using scaling with ranked subsampling (SRS)
srs_p(p)srs_p(p)
p |
a phyloseq object containing an OTU table |
This is an alternative to "rarefying" an OTU table to a constant sample size. The phyloseq object submitted must be pruned to the desired sample size before using this function.
a phyloseq object including an OTU table, all sample sums equal.
John Quensen
Beule L, Karlovsky P. Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities. PeerJ. 2020;8:e9593.
{ data("expt") print("Sample sums before srs_p") print(phyloseq::sample_sums(expt)) cat("\n") expt_srs <- srs_p(p = expt) print("Sample sums after srs_p") print(phyloseq::sample_sums(expt_srs)) }{ data("expt") print("Sample sums before srs_p") print(phyloseq::sample_sums(expt)) cat("\n") expt_srs <- srs_p(p = expt) print("Sample sums after srs_p") print(phyloseq::sample_sums(expt_srs)) }
Subset physeq by refseq lengths
subset_by_refseq_lengths(p, min_len = 252, max_len = 255)subset_by_refseq_lengths(p, min_len = 252, max_len = 255)
p |
An experiment level phyloseq object with reference sequences |
min_len |
The minimum reference sequence length to keep |
max_len |
The maximum references sequences length to keep |
Sometimes, due to sequencing errors, reference sequences have lengths less than and/or greater than the expected amplicon length. This function offers an easy way of removing such extraneous reference sequences from an experiment level phyloseq object. The default values for min_len and max_len are fro the V4 region of the 16S rRNA gene.
A phyloseq object filtered to have reference sequences within the length range specified.
{ data("expt") print("Refseq length range before subsetting:") expt@refseq@ranges@width |> summary() |> print() cat("\n") expt_filt <- subset_by_refseq_lengths(p = expt, min_len = 400, max_len = 420) print("Refseq length range after subsetting:") expt_filt@refseq@ranges@width |> summary() |> print() }{ data("expt") print("Refseq length range before subsetting:") expt@refseq@ranges@width |> summary() |> print() cat("\n") expt_filt <- subset_by_refseq_lengths(p = expt, min_len = 400, max_len = 420) print("Refseq length range after subsetting:") expt_filt@refseq@ranges@width |> summary() |> print() }
Subsets a distance matrix.
subset_dist(physeq, d.matrix)subset_dist(physeq, d.matrix)
physeq |
An experiment level phyloseq object. |
d.matrix |
A distance matrix. |
Some distance matrices take a long time to calculate for large data sets. This is especially true of unifrac and generalized unifrac distances calculated by GUniFracs. If distances are first calculated from data in a large experiment level phyloseq object and then it is desired to perform PERMANOVA (with adonis) on a subset of that object, this function provides a means of sub-setting the distance matrix so that it does not have to be calculated again for the subset data. The arguments are the distance matrix for the original phyloseq object and the smaller phyloseq object subset from the original.
A distance matrix of smaller dimensions.
Chen J, Bittinger K, Charlson ES et al. (2012) Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28, 2106-2113.
{ otu <- veganotu(its.root) d <- vegan::vegdist(otu) print("Before subsetting:") print(dim(as.matrix(d))) cat("\n") p <- phyloseq::subset_samples(its.root, P_Location == "LR") d_sub <- subset_dist(p, d) print("After subsetting:") print(dim(as.matrix(d_sub))) }{ otu <- veganotu(its.root) d <- vegan::vegdist(otu) print("Before subsetting:") print(dim(as.matrix(d))) cat("\n") p <- phyloseq::subset_samples(its.root, P_Location == "LR") d_sub <- subset_dist(p, d) print("After subsetting:") print(dim(as.matrix(d_sub))) }
Applies any vegan decostand standardization method to a phyloseq OTU table.
vegan_stand(physeq, method = "hellinger", ...)vegan_stand(physeq, method = "hellinger", ...)
physeq |
A phyloseq object containing at least an OTU table. |
method |
A method from vegan's decostand function. |
... |
Other parameters passed to vegan's decostand function. |
Returns a phyloseq object with transformed OTU table.
{ data("expt") print("Before standardization, first 5 rows and columns:") print(veganotu(expt)[1:5, 1:5]) cat("\n") expt_mod <- vegan_stand(expt, method = "hellinger") print("After standardization, first 5 rows and columns:") print(veganotu(expt_mod)[1:5, 1:5]) }{ data("expt") print("Before standardization, first 5 rows and columns:") print(veganotu(expt)[1:5, 1:5]) cat("\n") expt_mod <- vegan_stand(expt, method = "hellinger") print("After standardization, first 5 rows and columns:") print(veganotu(expt_mod)[1:5, 1:5]) }
Extracts a vegan compatible OTU table from a phyloseq object.
veganotu(physeq)veganotu(physeq)
physeq |
A phyloseq object contaning at least an OTU table. |
A matrix with samples in rows and OTUs in columns.
{ data("expt") # Show only first 5 columns and rows: veganotu(physeq = expt)[1:5, 1:5] }{ data("expt") # Show only first 5 columns and rows: veganotu(physeq = expt)[1:5, 1:5] }
Extracts a sample data table from a phyloseq object.
vegansam(physeq)vegansam(physeq)
physeq |
A phyloseq object containing sample_data. |
A data frame with samples in rows and factors and/or variables in columns.
{ data("expt") vegansam(physeq = expt) }{ data("expt") vegansam(physeq = expt) }