strollur includes the function read_mothur() as well as several functions to read mothur output files individually. To create a data set from the outputs of the MiSeq SOP example, run the following:

data <- read_mothur(
  fasta = strollur_example("final.fasta.gz"),
  count = strollur_example("final.count_table.gz"),
  taxonomy = strollur_example("final.taxonomy.gz"),
  design = strollur_example("mouse.time.design"),
  otu_list = strollur_example("final.opti_mcc.list.gz"),
  asv_list = strollur_example("final.asv.list.gz"),
  phylo_list = strollur_example("final.tx.list.gz"),
  sample_tree = strollur_example("final.opti_mcc.jclass.ave.tre"),
  dataset_name = "miseq_sop"
)
#> Added 2425 sequences.
#> Assigned 2425 sequence abundances.
#> Assigned 2425 sequence taxonomies.
#> Assigned 531 otu bins.
#> Assigned 2425 asv bins.
#> Assigned 63 phylotype bins.
#> Assigned 19 samples to treatments.

To view a summary of data:

data
#> miseq_sop:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 4 0 2849.08
#> 25%-tile: 1 375 252 0 4 0 28490.75
#> Median: 1 375 253 0 4 0 56981.50
#> 75%-tile: 1 375 253 0 5 0 85472.25
#> 97.5%-tile: 1 375 254 0 6 0 111113.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 56981.64
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19
#> Total number of treatments: 2
#> Total number of otus: 531
#> Total number of otu bin classifications: 531
#> Total number of asvs: 2425
#> Total number of asv bin classifications: 2425
#> Total number of phylotypes: 63
#> Total number of phylotype bin classifications: 63
#> Total number of sequence classifications: 2425

read_fasta(): read a FASTA formatted sequence file
read_mothur_count(): read a mothur formatted count file
read_mothur_taxonomy(): read a mothur formatted taxonomy file
read_mothur_cons_taxonomy(): read a mothur formatted cons_taxonomy file
read_mothur_list(): read a mothur formatted list file
read_mothur_shared(): read a mothur formatted shared file
read_mothur_rabund(): read a mothur formatted rabund file

To create a data set and read the individual file types, you can use the functions below. First let's create a data set named my_data.

To add FASTA data to your data set you can use the read_fasta() function:

fasta_data is a data.frame containing sequence names, sequence
nucleotide strings, and comments if provided. You can add the FASTA
sequences to your data set using the add() function:
add(my_data, table = fasta_data, type = "sequence")
#> Added 2425 sequences.
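
As a rough illustration of the table shape described above (sequence names, nucleotide strings, optional comments), the following base R sketch parses a few FASTA lines into a data.frame. The helper parse_fasta_text() and the column names are hypothetical, chosen only for this example; it is not how read_fasta() is implemented and it does not use strollur.

```r
# Hypothetical helper (not part of strollur): parse FASTA text lines
# into a data.frame with one row per sequence.
parse_fasta_text <- function(lines) {
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  # Each header opens a new record; cumsum() labels every line with its record.
  record <- cumsum(is_header)
  headers <- sub("^>", "", lines[is_header])
  # Join wrapped sequence lines that belong to the same record.
  groups <- factor(record[!is_header], levels = seq_along(headers))
  seqs <- vapply(
    split(lines[!is_header], groups),
    paste, character(1),
    collapse = "", USE.NAMES = FALSE
  )
  data.frame(
    # The name is the first whitespace-delimited token of the header.
    sequence_names = sub("[[:space:]].*$", "", headers),
    sequences = seqs,
    # Anything after the name is treated as a comment (assumed column name).
    comments = trimws(sub("^[^[:space:]]+", "", headers)),
    stringsAsFactors = FALSE
  )
}

example_lines <- c(">seq1 first comment", "ACGT", "TTGA", ">seq2", "GGCC")
fasta_df <- parse_fasta_text(example_lines)
print(fasta_df)
```

The real fasta_data table may use different column names; inspect it with str(fasta_data) before relying on any particular layout.
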
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 4 0 60.62
#> 25%-tile: 1 375 252 0 4 0 606.25
#> Median: 1 375 253 0 4 0 1212.50
#> 75%-tile: 1 375 253 0 5 0 1818.75
#> 97.5%-tile: 1 375 254 0 6 0 2364.38
#> Maximum: 1 375 256 0 6 0 2425.00
#> Mean: 1 375 252 0 4 0 1212.64
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 2425

To add your sequence abundance data, you can read a mothur count file using the read_mothur_count() function:

sample_table is a data.frame containing sequence_names, samples, and
abundances. You can add the sequence abundance data to your data set
using the assign() function:
assign(my_data, table = sample_table, type = "sequence_abundance")
#> Assigned 2425 sequence abundances.
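
To make the shape of such a table concrete, here is a minimal base R sketch that reshapes a tiny count table into one row per sequence and sample. It assumes the classic uncompressed mothur count layout (Representative_Sequence, total, then one column per sample); the column names of the resulting table and the choice to drop zero counts are assumptions for illustration, not a description of what read_mothur_count() returns.

```r
# Toy count table in the classic (uncompressed) mothur layout.
count_text <- c(
  "Representative_Sequence\ttotal\tF3D0\tF3D1",
  "seq1\t5\t3\t2",
  "seq2\t4\t0\t4"
)
wide <- read.delim(text = count_text, stringsAsFactors = FALSE)

# Every column other than the name and the total is a sample.
sample_cols <- setdiff(names(wide), c("Representative_Sequence", "total"))

# Stack the sample columns: one row per (sequence, sample) pair.
long <- data.frame(
  sequence_names = rep(wide$Representative_Sequence, times = length(sample_cols)),
  samples = rep(sample_cols, each = nrow(wide)),
  abundances = unlist(wide[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Dropping zero counts is a choice made for this sketch only.
long <- long[long$abundances > 0, ]
print(long)
```

The per-sample abundances in the long table sum to the same total as the "total" column of the wide table, which is a quick sanity check after any such reshaping.
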
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 4 0 2849.08
#> 25%-tile: 1 375 252 0 4 0 28490.75
#> Median: 1 375 253 0 4 0 56981.50
#> 75%-tile: 1 375 253 0 5 0 85472.25
#> 97.5%-tile: 1 375 254 0 6 0 111113.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 56981.64
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19

To add sequence taxonomy assignments, you can read a taxonomy file using the read_mothur_taxonomy() function:

classification_data is a data.frame containing sequence names and taxonomies. You can add the sequence classification data to your data set as follows:
assign(my_data, table = classification_data, type = "sequence_taxonomy")
#> Assigned 2425 sequence taxonomies.

To assign sequences to bins, you can read a mothur list file using the read_mothur_list() function:

otu_data <- read_mothur_list(list = strollur_example("final.opti_mcc.list.gz"))
asv_data <- read_mothur_list(list = strollur_example("final.asv.list.gz"))
phylotype_data <- read_mothur_list(list = strollur_example("final.tx.list.gz"))

otu_data, asv_data and phylotype_data are data.frames containing bin names and sequence names. You can add the bin data to your data set as follows:

assign(my_data, table = otu_data, type = "bin", bin_type = "otu")
#> Assigned 531 otu bins.
assign(my_data, table = asv_data, type = "bin", bin_type = "asv")
#> Assigned 2425 asv bins.
assign(
  my_data,
  table = phylotype_data,
  type = "bin", bin_type = "phylotype"
)
#> Assigned 63 phylotype bins.
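
For readers unfamiliar with the list format, the following base R sketch shows how one row of a mothur list file (label, numOtus, then one comma-separated group of sequence names per bin) maps onto a table of bin names and sequence names. The toy file contents and the column names bin_names and sequence_names are assumptions made for this example; it does not reproduce the internals of read_mothur_list().

```r
# Toy list file: a header row with bin names and one data row whose
# bin columns hold comma-separated sequence names.
list_text <- c(
  "label\tnumOtus\tOtu1\tOtu2",
  "0.03\t2\tseq1,seq3\tseq2"
)
fields <- strsplit(list_text, "\t", fixed = TRUE)

# The first two columns are the label and the number of bins.
bin_names <- fields[[1]][-(1:2)]
members <- strsplit(fields[[2]][-(1:2)], ",", fixed = TRUE)

# Repeat each bin name once per member sequence.
bin_table <- data.frame(
  bin_names = rep(bin_names, times = lengths(members)),
  sequence_names = unlist(members),
  stringsAsFactors = FALSE
)
print(bin_table)
```
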
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 4 0 2849.08
#> 25%-tile: 1 375 252 0 4 0 28490.75
#> Median: 1 375 253 0 4 0 56981.50
#> 75%-tile: 1 375 253 0 5 0 85472.25
#> 97.5%-tile: 1 375 254 0 6 0 111113.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 56981.64
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19
#> Total number of otus: 531
#> Total number of otu bin classifications: 531
#> Total number of asvs: 2425
#> Total number of asv bin classifications: 2425
#> Total number of phylotypes: 63
#> Total number of phylotype bin classifications: 63
#> Total number of sequence classifications: 2425

When you assign bins to sequences with taxonomic assignments the data
set object will find the consensus taxonomy of the bins automatically.
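
The idea of a consensus taxonomy can be sketched in base R as a simple, unweighted majority rule applied rank by rank. This is only an illustration: the helper consensus_taxonomy(), its cutoff argument, and the rule itself are assumptions, and the source does not state which rule strollur applies or whether it weights sequences by abundance.

```r
# Hypothetical helper (not part of strollur): unweighted majority-rule
# consensus over the taxonomy strings of the sequences in one bin.
consensus_taxonomy <- function(taxonomies, cutoff = 0.51) {
  ranks <- strsplit(taxonomies, ";", fixed = TRUE)
  depth <- min(lengths(ranks))
  # Tracks which sequences still agree with the consensus so far.
  agree <- rep(TRUE, length(ranks))
  consensus <- character(0)
  for (i in seq_len(depth)) {
    taxa <- vapply(ranks, `[`, character(1), i)
    counts <- table(taxa[agree])
    best <- names(counts)[which.max(counts)]
    # Stop when the most common taxon lacks enough support in the bin.
    if (max(counts) / length(ranks) < cutoff) break
    consensus <- c(consensus, best)
    agree <- agree & taxa == best
  }
  paste(consensus, collapse = ";")
}

bin_members <- c(
  "Bacteria;Firmicutes;Clostridia;",
  "Bacteria;Firmicutes;Bacilli;",
  "Bacteria;Firmicutes;Clostridia;"
)
consensus_taxonomy(bin_members)               # "Bacteria;Firmicutes;Clostridia"
consensus_taxonomy(bin_members, cutoff = 0.8) # "Bacteria;Firmicutes"
```

Raising the cutoff shortens the consensus: with two of three sequences agreeing at the third rank, a 0.8 threshold stops at the second rank.
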
If you wish to assign the bin taxonomy separately, you can read a mothur cons_taxonomy file using the read_mothur_cons_taxonomy() function:

otu_taxonomy_data <- read_mothur_cons_taxonomy(
  taxonomy = strollur_example("final.cons.taxonomy")
)

otu_taxonomy_data is a data.frame containing bin names, abundances and taxonomies. You can add the bin taxonomic data to your data set as follows:

write_mothur(): write mothur formatted files for all data
write_fasta(): write a FASTA formatted sequence file
write_mothur_count(): write a mothur formatted count file
write_mothur_design(): write a mothur formatted design file
write_taxonomy(): write a mothur formatted taxonomy file
write_mothur_cons_taxonomy(): write a mothur formatted cons_taxonomy file
write_mothur_list(): write a mothur formatted list file
write_mothur_shared(): write a mothur formatted shared file
write_mothur_rabund(): write a mothur formatted rabund file