Importing from mothur

strollur includes the function read_mothur(), which imports several mothur output files in one call, as well as functions that read mothur output files individually. To create a data set from the outputs of the MiSeq SOP example, run the following:

Using `read_mothur()`

data <- read_mothur(
  fasta = strollur_example("final.fasta.gz"),
  count = strollur_example("final.count_table.gz"),
  taxonomy = strollur_example("final.taxonomy.gz"),
  design = strollur_example("mouse.time.design"),
  otu_list = strollur_example("final.opti_mcc.list.gz"),
  asv_list = strollur_example("final.asv.list.gz"),
  phylo_list = strollur_example("final.tx.list.gz"),
  sample_tree = strollur_example("final.opti_mcc.jclass.ave.tre"),
  dataset_name = "miseq_sop"
)
#> Added 2425 sequences.
#> Assigned 2425 sequence abundances.
#> Assigned 2425 sequence taxonomies.
#> Assigned 531 otu bins.
#> Assigned 2425 asv bins.
#> Assigned 63 phylotype bins.
#> Assigned 19 samples to treatments.

To view a summary of the data set, print it:

data
#> miseq_sop:
#> 
#>             starts ends nbases ambigs polymers numns   numseqs
#> Minimum:         1  375    249      0        3     0      1.00
#> 2.5%-tile:       1  375    252      0        4     0   2849.08
#> 25%-tile:        1  375    252      0        4     0  28490.75
#> Median:          1  375    253      0        4     0  56981.50
#> 75%-tile:        1  375    253      0        5     0  85472.25
#> 97.5%-tile:      1  375    254      0        6     0 111113.93
#> Maximum:         1  375    256      0        6     0 113963.00
#> Mean:            1  375    252      0        4     0  56981.64
#> 
#> Number of unique seqs: 2425 
#> Total number of seqs: 113963 
#> 
#> Total number of samples: 19 
#> Total number of treatments: 2 
#> Total number of otus: 531 
#> Total number of otu bin classifications: 531 
#> Total number of asvs: 2425 
#> Total number of asv bin classifications: 2425 
#> Total number of phylotypes: 63 
#> Total number of phylotype bin classifications: 63 
#> Total number of sequence classifications: 2425

Importing Individual Files

read_fasta() read a FASTA formatted sequence file
read_mothur_count() read a mothur formatted count file
read_mothur_taxonomy() read a mothur formatted taxonomy file
read_mothur_cons_taxonomy() read a mothur formatted cons_taxonomy file
read_mothur_list() read a mothur formatted list file
read_mothur_shared() read a mothur formatted shared file
read_mothur_rabund() read a mothur formatted rabund file

You can also build a data set step by step, reading each file type with the functions listed above. First, let’s create an empty data set named my_data.

my_data <- new_dataset(dataset_name = "my_data")

To add FASTA data to your data set, first read the file with the read_fasta() function:

fasta_data <- strollur::read_fasta(fasta = strollur_example("final.fasta.gz"))

fasta_data is a data.frame containing sequence names, sequence nucleotide strings, and comments if provided. You can add the FASTA sequences to your data set using the add() function:

add(my_data, table = fasta_data, type = "sequence")
#> Added 2425 sequences.
my_data
#> my_data:
#> 
#>             starts ends nbases ambigs polymers numns numseqs
#> Minimum:         1  375    249      0        3     0    1.00
#> 2.5%-tile:       1  375    252      0        4     0   60.62
#> 25%-tile:        1  375    252      0        4     0  606.25
#> Median:          1  375    253      0        4     0 1212.50
#> 75%-tile:        1  375    253      0        5     0 1818.75
#> 97.5%-tile:      1  375    254      0        6     0 2364.38
#> Maximum:         1  375    256      0        6     0 2425.00
#> Mean:            1  375    252      0        4     0 1212.64
#> 
#> Number of unique seqs: 2425 
#> Total number of seqs: 2425

To add your sequence abundance data, you can read a mothur count file using the read_mothur_count() function:

sample_table <- read_mothur_count(
  filename = strollur_example("final.count_table.gz")
)

sample_table is a data.frame containing sequence_names, samples, and abundances. You can add the sequence abundance data to your data set using the assign() function:

assign(my_data, table = sample_table, type = "sequence_abundance")
#> Assigned 2425 sequence abundances.
my_data
#> my_data:
#> 
#>             starts ends nbases ambigs polymers numns   numseqs
#> Minimum:         1  375    249      0        3     0      1.00
#> 2.5%-tile:       1  375    252      0        4     0   2849.08
#> 25%-tile:        1  375    252      0        4     0  28490.75
#> Median:          1  375    253      0        4     0  56981.50
#> 75%-tile:        1  375    253      0        5     0  85472.25
#> 97.5%-tile:      1  375    254      0        6     0 111113.93
#> Maximum:         1  375    256      0        6     0 113963.00
#> Mean:            1  375    252      0        4     0  56981.64
#> 
#> Number of unique seqs: 2425 
#> Total number of seqs: 113963 
#> 
#> Total number of samples: 19
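
The relationship between a wide count file and a long table of sequence names, samples, and abundances can be sketched in base R. This is only an illustration under stated assumptions: the toy counts are made up, the column names of `long` are hypothetical, and the sketch does not claim to reproduce what read_mothur_count() does internally (for example, whether zero counts are kept).

```r
# Toy count table in mothur's uncompressed wide layout (values are invented).
count_text <- "Representative_Sequence\ttotal\tF3D0\tF3D1
seq1\t5\t3\t2
seq2\t4\t0\t4"

wide <- read.delim(text = count_text, stringsAsFactors = FALSE)

# Every column other than the name and the total is a sample.
sample_cols <- setdiff(names(wide), c("Representative_Sequence", "total"))

# One row per (sequence, sample) pair; these column names are hypothetical.
long <- data.frame(
  sequence_names = rep(wide$Representative_Sequence, times = length(sample_cols)),
  samples = rep(sample_cols, each = nrow(wide)),
  abundances = unlist(wide[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop zero counts (a choice made for this sketch only).
long <- long[long$abundances > 0, ]
```

In this toy example the abundances in `long` sum to the same value as the `total` column of the wide table, which is a convenient sanity check after any reshaping of this kind.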

To add sequence taxonomy assignments, you can read a mothur taxonomy file using the read_mothur_taxonomy() function:

classification_data <- read_mothur_taxonomy(
  taxonomy = strollur_example("final.taxonomy.gz")
)

classification_data is a data.frame containing sequence names and taxonomies. You can add the sequence classification data to your data set as follows:

assign(my_data, table = classification_data, type = "sequence_taxonomy")
#> Assigned 2425 sequence taxonomies.

To assign sequences to bins, you can read a mothur list file using the read_mothur_list() function:

otu_data <- read_mothur_list(list = strollur_example("final.opti_mcc.list.gz"))
asv_data <- read_mothur_list(list = strollur_example("final.asv.list.gz"))
phylotype_data <- read_mothur_list(list = strollur_example("final.tx.list.gz"))

otu_data, asv_data and phylotype_data are data.frames containing bin names and sequence names. You can add the bin data to your data set as follows:

assign(my_data, table = otu_data, type = "bin", bin_type = "otu")
#> Assigned 531 otu bins.
assign(my_data, table = asv_data, type = "bin", bin_type = "asv")
#> Assigned 2425 asv bins.
assign(
  my_data,
  table = phylotype_data,
  type = "bin", bin_type = "phylotype"
)
#> Assigned 63 phylotype bins.
my_data
#> my_data:
#> 
#>             starts ends nbases ambigs polymers numns   numseqs
#> Minimum:         1  375    249      0        3     0      1.00
#> 2.5%-tile:       1  375    252      0        4     0   2849.08
#> 25%-tile:        1  375    252      0        4     0  28490.75
#> Median:          1  375    253      0        4     0  56981.50
#> 75%-tile:        1  375    253      0        5     0  85472.25
#> 97.5%-tile:      1  375    254      0        6     0 111113.93
#> Maximum:         1  375    256      0        6     0 113963.00
#> Mean:            1  375    252      0        4     0  56981.64
#> 
#> Number of unique seqs: 2425 
#> Total number of seqs: 113963 
#> 
#> Total number of samples: 19 
#> Total number of otus: 531 
#> Total number of otu bin classifications: 531 
#> Total number of asvs: 2425 
#> Total number of asv bin classifications: 2425 
#> Total number of phylotypes: 63 
#> Total number of phylotype bin classifications: 63 
#> Total number of sequence classifications: 2425
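
To make the shape of the bin tables concrete, the following base R sketch turns a tiny list file (a header row, then one row holding comma-separated sequence names per bin) into a table of bin names and sequence names. The input is a made-up two-bin example, the column names of `bins` are hypothetical, and this is not presented as the parsing code behind read_mothur_list().

```r
# Toy mothur list file: label, number of bins, then one field per bin.
list_text <- "label\tnumOtus\tOtu1\tOtu2
0.03\t2\tseq1,seq3\tseq2"

lines <- strsplit(list_text, "\n", fixed = TRUE)[[1]]
header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
fields <- strsplit(lines[2], "\t", fixed = TRUE)[[1]]

# The first two fields are the label and the bin count; the rest are bins.
bin_names <- header[-(1:2)]
members <- strsplit(fields[-(1:2)], ",", fixed = TRUE)

# Repeat each bin name once per member sequence.
bins <- data.frame(
  bin_names = rep(bin_names, times = lengths(members)),
  sequence_names = unlist(members),
  stringsAsFactors = FALSE
)
```

Each sequence appears in exactly one row, so the number of rows in `bins` equals the number of sequences assigned to bins.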

When you assign bins to sequences that already have taxonomic assignments, the data set object will find the consensus taxonomy of the bins automatically. If you wish to assign the bin taxonomy separately, you can read a mothur cons_taxonomy file using the read_mothur_cons_taxonomy() function:

otu_taxonomy_data <- read_mothur_cons_taxonomy(
  taxonomy =
    strollur_example("final.cons.taxonomy")
)

otu_taxonomy_data is a data.frame containing bin names, abundances and taxonomies. You can add the bin taxonomic data to your data set as follows:

assign(my_data, table = otu_taxonomy_data, type = "bin_taxonomy")
#> Assigned 531 otu bin taxonomies.
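
For intuition about what a consensus taxonomy is, here is a minimal base R sketch of a rank-by-rank majority rule for a single bin. It is an assumption-laden illustration: the function name consensus_taxonomy(), the 51% cutoff, and the unweighted vote are all chosen for this example, and the sketch is not claimed to match how strollur or mothur actually computes bin consensus.

```r
# Taxonomies of the sequences in one hypothetical bin.
taxonomies <- c(
  "Bacteria;Firmicutes;Clostridia;",
  "Bacteria;Firmicutes;Bacilli;",
  "Bacteria;Firmicutes;Clostridia;",
  "Bacteria;Bacteroidetes;Bacteroidia;"
)

# Hypothetical helper: walk down the ranks and keep the most common taxon
# while it is supported by at least `cutoff` of all sequences in the bin.
consensus_taxonomy <- function(taxonomies, cutoff = 0.51) {
  ranks <- strsplit(taxonomies, ";", fixed = TRUE)
  depth <- min(lengths(ranks))
  keep <- rep(TRUE, length(ranks))
  result <- character(0)
  for (level in seq_len(depth)) {
    # Only sequences that agreed with the consensus so far get a vote.
    taxa <- vapply(ranks[keep], `[`, character(1), level)
    counts <- table(taxa)
    best <- names(counts)[which.max(counts)]
    if (max(counts) / length(taxonomies) < cutoff) break
    result <- c(result, best)
    keep <- keep & vapply(ranks, `[`, character(1), level) == best
  }
  paste0(paste(result, collapse = ";"), ";")
}

bin_consensus <- consensus_taxonomy(taxonomies)
```

With this toy input, three of four sequences agree down to Firmicutes but only two of four agree on Clostridia, so the consensus stops at the phylum level.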

Writing mothur formatted file types

write_mothur() write mothur formatted files for all data
write_fasta() write a FASTA formatted sequence file
write_mothur_count() write a mothur formatted count file
write_mothur_design() write a mothur formatted design file
write_taxonomy() write a mothur formatted taxonomy file
write_mothur_cons_taxonomy() write a mothur formatted cons_taxonomy file
write_mothur_list() write a mothur formatted list file
write_mothur_shared() write a mothur formatted shared file
write_mothur_rabund() write a mothur formatted rabund file
