strollur includes the function read_mothur() as
well as several functions to read mothur output files individually. To
create a data set from the outputs of the Miseq SOP Example, run the
following:
Using read_mothur()
data <- read_mothur(
fasta = strollur_example("final.fasta.gz"),
count = strollur_example("final.count_table.gz"),
taxonomy = strollur_example("final.taxonomy.gz"),
design = strollur_example("mouse.time.design"),
otu_list = strollur_example("final.opti_mcc.list.gz"),
asv_list = strollur_example("final.asv.list.gz"),
phylo_list = strollur_example("final.tx.list.gz"),
sample_tree = strollur_example("final.opti_mcc.jclass.ave.tre"),
dataset_name = "miseq_sop"
)
#> Added 2425 sequences.
#> Assigned 2425 sequence abundances.
#> Assigned 2425 sequence taxonomies.
#> Assigned 531 otu bins.
#> Assigned 2425 asv bins.
#> Assigned 63 phylotype bins.
#> Assigned 19 samples to treatments.To view a summary of data:
data
#> miseq_sop:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 3 0 2850.08
#> 25%-tile: 1 375 252 0 4 0 28491.75
#> Median: 1 375 252 0 4 0 56982.50
#> 75%-tile: 1 375 253 0 5 0 85473.25
#> 97.5%-tile: 1 375 253 0 6 0 111114.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 0.00
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19
#> Total number of treatments: 2
#> Total number of otus: 531
#> Total number of otu bin classifications: 531
#> Total number of asvs: 2425
#> Total number of asv bin classifications: 2425
#> Total number of phylotypes: 63
#> Total number of phylotype bin classifications: 63
#> Total number of sequence classifications: 2425Importing Individual Files
-
read_fasta()read a FASTA formatted sequence file -
read_mothur_count()read a mothur formatted count file -
read_mothur_taxonomy()read a mothur formatted taxonomy file -
read_mothur_cons_taxonomy()read a mothur formatted cons_taxonomy file -
read_mothur_list()read a mothur formatted list file -
read_mothur_shared()read a mothur formatted shared file -
read_mothur_rabund()read a mothur formatted rabund file
To create a data set and read the individual file types, you can use the functions below. First let’s create a data set named my_data.
my_data <- new_dataset(dataset_name = "my_data")To add FASTA data
to your data set you can use the read_fasta() function:
fasta_data <- read_fasta(fasta = strollur_example("final.fasta.gz"))fasta_data is a data.frame containing sequence names, sequence
nucleotide strings, and comments if provided. You can add the FASTA
sequences to your data set using the add() function:
add(my_data, table = fasta_data, type = "sequences")
#> Added 2425 sequences.
#> [1] 2425
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 4 0 61.62
#> 25%-tile: 1 375 252 0 4 0 607.25
#> Median: 1 375 253 0 4 0 1213.50
#> 75%-tile: 1 375 253 0 5 0 1819.75
#> 97.5%-tile: 1 375 254 0 6 0 2365.38
#> Maximum: 1 375 256 0 6 0 2425.00
#> Mean: 1 375 252 0 4 0 0.00
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 2425To add your sequence abundance data, you can read a mothur count file file
using the read_mothur_count() function:
sample_table <- read_mothur_count(
filename = strollur_example("final.count_table.gz")
)sample_table is a data.frame containing sequence_names, samples, and
abundances. You can add the sequence abundance data to your data set
using the assign() function:
assign(my_data, table = sample_table, type = "sequence_abundance")
#> Assigned 2425 sequence abundances.
#> [1] 2425
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 3 0 2850.08
#> 25%-tile: 1 375 252 0 4 0 28491.75
#> Median: 1 375 252 0 4 0 56982.50
#> 75%-tile: 1 375 253 0 5 0 85473.25
#> 97.5%-tile: 1 375 253 0 6 0 111114.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 0.00
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19To add sequence taxonomy assignments, you can read a taxonomy file file
using the read_mothur_taxonomy() function:
classification_data <- read_mothur_taxonomy(
taxonomy = strollur_example("final.taxonomy.gz")
)classification_data is a data.frame containing sequence names and taxonomies. You can add the sequence classification data to your data set as follows:
assign(my_data, table = classification_data, type = "sequence_taxonomy")
#> Assigned 2425 sequence taxonomies.
#> [1] 2425To assign sequences to bins, you can read a mothur list file file
using the read_mothur_list() function:
otu_data <- read_mothur_list(list = strollur_example("final.opti_mcc.list.gz"))
asv_data <- read_mothur_list(list = strollur_example("final.asv.list.gz"))
phylotype_data <- read_mothur_list(list = strollur_example("final.tx.list.gz"))otu_data, asv_data and phylotype_data are data.frames containing bin names and sequence names. You can add the bin data to your data set as follows:
assign(my_data, table = otu_data, type = "bins", bin_type = "otu")
#> Assigned 531 otu bins.
#> [1] 531
assign(my_data, table = asv_data, type = "bins", bin_type = "asv")
#> Assigned 2425 asv bins.
#> [1] 2425
assign(
my_data,
table = phylotype_data,
type = "bins", bin_type = "phylotype"
)
#> Assigned 63 phylotype bins.
#> [1] 63
my_data
#> my_data:
#>
#> starts ends nbases ambigs polymers numns numseqs
#> Minimum: 1 375 249 0 3 0 1.00
#> 2.5%-tile: 1 375 252 0 3 0 2850.08
#> 25%-tile: 1 375 252 0 4 0 28491.75
#> Median: 1 375 252 0 4 0 56982.50
#> 75%-tile: 1 375 253 0 5 0 85473.25
#> 97.5%-tile: 1 375 253 0 6 0 111114.93
#> Maximum: 1 375 256 0 6 0 113963.00
#> Mean: 1 375 252 0 4 0 0.00
#>
#> Number of unique seqs: 2425
#> Total number of seqs: 113963
#>
#> Total number of samples: 19
#> Total number of otus: 531
#> Total number of otu bin classifications: 531
#> Total number of asvs: 2425
#> Total number of asv bin classifications: 2425
#> Total number of phylotypes: 63
#> Total number of phylotype bin classifications: 63
#> Total number of sequence classifications: 2425When you assign bins to sequences with taxonomic assignments the data
set object will find the consensus taxonomy of the bins automatically.
If you wish to assign the bin taxonomy separately, you can read a mothur cons_taxonomy
file file using the read_mothur_cons_taxonomy()
function:
otu_taxonomy_data <- read_mothur_cons_taxonomy(
taxonomy =
strollur_example("final.cons.taxonomy")
)otu_taxonomy_data is a data.frame containing bin names, abundances and taxonomies. You can add the bin taxonomic data to your data set as follows:
assign(my_data, table = otu_taxonomy_data, type = "bin_taxonomy")
#> Assigned 531 otu bin taxonomies.
#> [1] 531Writing mothur formatted file types
-
write_mothur()write mothur formatted files for all data -
write_fasta()read a FASTA formatted sequence file -
write_mothur_count()write a mothur formatted count file -
write_mothur_design()write a mothur formatted design file -
write_taxonomy()write a mothur formatted taxonomy file -
write_mothur_cons_taxonomy()write a mothur formatted cons_taxonomy file -
write_mothur_list()write a mothur formatted list file -
write_mothur_shared()write a mothur formatted shared file -
write_mothur_rabund()write a mothur formatted rabund file
