Skip to contents

The sequence and taxonomy data for the 10,049 sequences found in the Ribosomal Database Project's trainset9_032012 training set for use with the naive Bayesian classifier as implemented in the {phylyotypr} R package. Originally released by the RDP in September 2012. The rdp version contains the same sequences as provided by the official RDP version (9,665 bacterial and 384 archaeal). The pds version contains extra eukaryotic sequences including 119 chloroplasts and mitochondria (10,168 total sequences). See the mothur reference file page in "Sources" for more information. Be sure to see the mothur GitHub project where you can find the phylotyprrefdata package (https://github.com/mothur/phylotyprrefdata) for access to other taxonomic reference data.

Usage

trainset9_rdp

trainset9_pds

Format

A data frame with 3 columns. Each row represents a different sequence:

id

Sequence accession identifier

sequence

DNA sequence string

taxonomy

Taxonomic string with each level separated with a ;

An object of class data.frame with 10169 rows and 3 columns.

Source