Sequence and taxonomy data for the 10,049 sequences in the Ribosomal
Database Project's (RDP) trainset9_032012 training set, for use with the
naive Bayesian classifier as implemented in the {phylotypr} R package.
Originally released by the RDP in September 2012. The rdp version contains
the same sequences as provided by the official RDP version (9,665 bacterial
and 384 archaeal). The pds version contains extra eukaryotic sequences
including 119 chloroplasts and mitochondria (10,168 total sequences). See the
mothur reference file page in "Sources" for more information. Be sure to see
the mothur GitHub project where you can find the phylotyprrefdata package
(https://github.com/mothur/phylotyprrefdata) for access to other taxonomic
reference data.
Format
A data frame with 3 columns. Each row represents a different sequence:
- id: Sequence accession identifier
- sequence: DNA sequence string
- taxonomy: Taxonomic string with each level separated by a ;
An object of class data.frame with 10169 rows and 3 columns.
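
As a rough illustration of the three-column format described above, the following base R sketch splits a taxonomy string into its individual levels. The data frame `refs` and its two rows are invented stand-ins, not rows from the training set, and the trailing `;` on each taxonomy string is an assumption about how the strings are terminated.

```r
# Toy stand-in for the reference data frame. These rows are invented for
# illustration and are NOT taken from the training set.
refs <- data.frame(
  id = c("seq_a", "seq_b"),
  sequence = c("ACGTACGT", "TTGACCGA"),
  taxonomy = c(
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;",
    "Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;"
  ),
  stringsAsFactors = FALSE
)

# Split each taxonomy string on ";" to get one character vector per sequence.
# strsplit() does not return an empty element for a trailing ";".
levels_by_seq <- strsplit(refs$taxonomy, ";", fixed = TRUE)

# Number of taxonomic levels recorded for each sequence
n_levels <- lengths(levels_by_seq)

# Deepest (last) level for each sequence
deepest <- vapply(levels_by_seq, function(x) x[length(x)], character(1))
```

The same calls should work on the real data frame by substituting it for `refs`, provided its columns are named as listed above.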
Source
RDP sourceforge page