A taxonomy database consists of unaligned sequences in fasta format and a taxonomy file. The taxonomy file is a two-column text file in which the first column is the name of the sequence and the second column is a string of taxonomic information separated by semicolons. The taxonomy string must not include spaces, and its last character must be a semicolon. For example, the first lines of silva.slv.taxonomy are as follows:
U87775.1 Bacteria;Alphaproteobacteria;Rhizobiales;Azorhizobium_et_rel.;Methylobacterium_et_rel.;Bosea;
DQ904772.1 Bacteria;Firmicutes;Clostridiales;Ruminococcus_et_rel.;Anaerofilum-Faecalibacterium;Faecalibacterium;Faecalibacterium_prausnitzii;
AY553109.1 Bacteria;Firmicutes;Bacillales_Mollicutes;Bacillus_subtilis_et_rel.;Bacillus_carboniphilus_et_rel.;Bacillus_licheniformis-pumilus-subtilis;
AY553101.1 Bacteria;Firmicutes;Bacillales_Mollicutes;Bacillus_subtilis_et_rel.;Bacillus_carboniphilus_et_rel.;Bacillus_licheniformis-pumilus-subtilis;
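The format rules described above (two columns, no spaces in the taxonomy string, a trailing semicolon) can be checked before a file is used. The following is a minimal Python sketch, not part of mothur itself; the function name `parse_taxonomy_line` is hypothetical, and it assumes the two columns are separated by whitespace (a tab or a space).

```python
def parse_taxonomy_line(line):
    """Split one taxonomy line into (sequence_name, list_of_taxon_levels).

    Hypothetical helper for illustration; assumes whitespace-separated columns.
    Raises ValueError if the line breaks one of the format rules.
    """
    # Splitting on any whitespace means a space inside the taxonomy
    # string produces more than two fields, which is reported as an error.
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected exactly two columns: %r" % line)

    name, taxonomy = fields

    # The last character of the taxonomy string must be a semicolon.
    if not taxonomy.endswith(";"):
        raise ValueError("taxonomy must end with a semicolon: %r" % taxonomy)

    # Drop the trailing semicolon, then split into individual levels.
    levels = taxonomy[:-1].split(";")
    if any(level == "" for level in levels):
        raise ValueError("empty taxonomic level in: %r" % taxonomy)

    return name, levels


if __name__ == "__main__":
    example = ("U87775.1\tBacteria;Alphaproteobacteria;Rhizobiales;"
               "Azorhizobium_et_rel.;Methylobacterium_et_rel.;Bosea;")
    name, levels = parse_taxonomy_line(example)
    print(name, levels)
```

Running the check over every line of a taxonomy file before classification can catch a missing trailing semicolon or a stray space early, rather than partway through a run.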
You can download our version of the:
- SILVA reference files: The fasta and taxonomic outlines for the SILVA, greengenes, RDP, and NCBI hierarchies, which can be used with the Bayesian classifier
- RDP reference files: The fasta and taxonomic outline that the RDP uses with their implementation of the Bayesian classifier
- greengenes reference files: The fasta and taxonomic outline that greengenes uses with their classifier, which can also be used with the Bayesian classifier