align.check

The align.check command allows you to calculate the number of potentially misaligned bases in a 16S rRNA gene sequence alignment. If you are familiar with the editor window in ARB, this is the same as counting the number of ~, #, -, and = signs. To run through the command below use the greengenes secondary structure map and the esophagus dataset.

Meaning of symbols

It should first be noted that every sequence alignment will have non-ideal pairing, thus this tool should be used to identify outliers in your alignment.

Symbol Base-pairings ——– ————————————————– # AA, AC, CC, CT, GG, A-, C-, G-, T- - GT = AG + TT ~ AT, CG loop -- and positions not in the secondary structure

Default settings

To run align.check you need to provide your aligned sequences as a FASTA-formatted file. You also need to provide the appropriate file containing the secondary structure map. These esophagus sequences were aligned using the greengenes alignment. The output will be in

.align.check file. Try the following command:

mothur > align.check(fasta=esophagus.align, map=gg.ss.map)

The output file will look like:

name   pound   dash    plus    equal   loop    tilde   total
9_1_12 42  68  8   28  275 420 872
9_1_14 36  68  6   26  266 422 851
9_1_15 44  68  8   28  276 418 873
9_1_16 34  72  6   30  267 430 860
9_1_18 46  80  2   36  261 406 862
...

name

You can also provide a name file associated with your fasta file.

mothur > align.check(fasta=esophagus.align, map=gg.ss.map, name=esophagus.names)

count

The count file is similar to the name file in that it is used to represent the number of duplicate sequences for a given representative sequence.

mothur > align.check(fasta=esophagus.align, map=gg.ss.map, count=esophagus.count_table)

Comparing alignments

We find that the SILVA alignment actually provides a better alignment than the greengenes alignment. To test this, realign the esophagus sequences using the SILVA alignment with the align.seqs command:

mothur > align.seqs(candidate=esophagus.fasta, template=silva.bacteria.fasta)
mothur > align.check(fasta=esophagus.align, map=silva.ss.map)

And the top of the resulting esophagus.align.check file looks like:

name   pound   dash    plus    equal   loop    tilde   total
9_1_12 20  72  6   28  277 432 866
9_1_14 26  70  6   28  271 420 848
9_1_15 20  72  6   28  277 432 866
9_1_16 30  72  4   32  274 426 859
9_1_18 18  80  4   36  265 416 850
...

You can seem by comparing the greengenes and SILVA alignments for these 5 sequences that there are significantly fewer #’s in the SILVA aligned sequences.

Revisions

  • 1.28.0 Added count option
  • 1.28.0 Added name parameter back