
Greengenes-formatted databases

From mothur
Revision as of 18:46, 18 May 2015 by Pschloss (Talk | contribs) (Old stuff)


The greengenes-based alignment is 7,682 columns wide. Because of the poor alignment quality in the variable regions we strongly discourage people from using it for their "real" analysis. One side effect of this is that chimera.slayer detects fewer real chimeras when using greengenes-aligned sequences compared to SILVA-aligned sequences.
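If you want to confirm the width of an alignment file yourself, a minimal Python sketch along these lines could do it. It assumes the alignment is an ordinary FASTA file in which every sequence has been padded to the same number of columns with gap characters; the function name `alignment_widths` and the toy input are hypothetical and not part of mothur.

```python
from collections import Counter


def alignment_widths(fasta_lines):
    """Count how many sequences have each aligned length (in columns)."""
    widths = Counter()
    length = None  # stays None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                widths[length] += 1
            length = 0
        elif length is not None:
            # Sequence data may be wrapped over several lines.
            length += len(line)
    if length is not None:
        widths[length] += 1  # close the final record
    return widths


# Toy input: two sequences of 5 columns and one of 3 columns.
example = [">seq1", "AC-G.", ">seq2", "A--", "GT", ">seq3", "ACG"]
print(dict(alignment_widths(example)))
```

Passing an open file handle for the reference alignment instead of the toy list should, if the description above holds, give a single width of 7682 for every sequence. More than one width would suggest the file is not a proper alignment.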

Current stuff

  • greengenes reference taxonomy - This is from the August 2013 release of gg_13_8_99 and contains 202,421 bacterial and archaeal sequences. The source data for this file was downloaded from the greengenes.microbio.me ftp server, and the file should be used with classify.seqs. Approximately 10% of the sequences in this dataset have species-level names. Depending on your sample and 16S rRNA gene region of interest, you might find that this reference taxonomy does better than the RDP taxonomy. A README file is included that describes precisely how the files were created. You can also find this README on the mothur blog site.
  • greengenes reference alignment - This is from the August 2013 release of gg_13_8_99 and contains 202,421 bacterial and archaeal sequences. The source data for this file was downloaded from the greengenes.microbio.me ftp server and should be used (if necessary, but why!?) to align sequences with align.seqs. The greengenes-based alignment is 7,682 columns wide. Because of the poor alignment quality in the variable regions we strongly discourage people from using it for their "real" analysis. A README file is included that describes precisely how the files were created.
  • greengenes gold alignment - This contains 5,181 bacterial and archaeal sequences, which do not necessarily cover the entire 16S rRNA gene. This file was downloaded from the Broad Institute's SourceForge website and should be used with chimera.slayer or chimera.uchime. One side effect of the crappy greengenes alignment is that chimera.slayer detects fewer real chimeras when using greengenes-aligned sequences compared to SILVA-aligned sequences.
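To see how many entries in a taxonomy file carry a species-level name, something like the following Python sketch may help. It rests on assumptions that are not stated on this page: that each line holds a sequence identifier, a tab, and a semicolon-delimited lineage, and that species are written with the greengenes-style `s__` prefix, with a bare `s__` meaning "no species name". Check the README that ships with the files before relying on it. The function name `species_level_fraction` is hypothetical.

```python
def species_level_fraction(tax_lines):
    """Return the fraction of entries whose lineage has a non-empty s__ rank."""
    total = 0
    named = 0
    for line in tax_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<sequence id>\t<rank;rank;...;>"
        _seq_id, lineage = line.split("\t", 1)
        total += 1
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        # "s__" alone is treated as unnamed; "s__something" counts as named.
        if any(r.startswith("s__") and len(r) > 3 for r in ranks):
            named += 1
    return named / total if total else 0.0


# Toy input: only the first of four entries has a species-level name.
example = [
    "1\tk__Bacteria;p__Firmicutes;g__Bacillus;s__subtilis;",
    "2\tk__Bacteria;p__Proteobacteria;g__;s__;",
    "3\tk__Archaea;p__Euryarchaeota;",
    "4\tk__Bacteria;p__Firmicutes;g__Bacillus;s__;",
]
print(species_level_fraction(example))
```

If the format assumptions hold, running this over the reference taxonomy should return a value near the roughly 10% figure quoted above.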

Old stuff