chimera.pintail

Use Pintail approach .… silva quantile file, silva conservation file

Algorithm

Default settings

The fasta and template parameters are required. You may enter multiple fasta files by separating them by dashes. Example: fasta=ex.align-abrecovery.align.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta)

The output to the screen looks like:

mothur > chimera.seqs(fasta=ex.align, template=core_set_aligned.imputed.fasta)
Reading sequences and template file... Done.
Finding closest sequence in template to each sequence... Done.
Getting conservation... Calculating probability of conservation for your template sequences.
This can take a while...  I will  output the frequency of the highest base in each position 
to a .freq file so that you can input them using the conservation parameter next time you run this command.
Providing the .freq file will improve speed.    Done.
Finding window breaks... Done.
Calculating observed distance... Done.
Finding variability... Done.
Calculating alpha... Done.
Calculating expected distance... Done.
Finding deviation... Done.
Calculating quantiles for your template.  This can take a while...  
I will output the quantiles to a .quan file that you can input them using the quantile
parameter next time you run this command. 
Providing the .quan file will dramatically improve speed.    
Processing sequence 0
Processing sequence 1
Processing sequence 2
Processing sequence 3
Processing sequence 4
Processing sequence 5
...
...
Processing sequence 4936
Processing sequence 4937
Done.
gi|11093941|MNA3|AF293013  div: 29.9873    stDev: 6.01747  chimera flag: Yes
gi|11093935|MNC9|AF293007  div: 23.9793    stDev: 4.81477  chimera flag: Yes
gi|11093933|MNA5|AF293005  div: 23.8932    stDev: 4.81141  chimera flag: Yes
gi|11093930|MNH4|AF293002  div: 32.3988    stDev: 7.03789  chimera flag: Yes
gi|11093924|MNF4|AF292996  div: 26.593 stDev: 5.26257  chimera flag: Yes

Opening ex.pintail.chimeras you would see:

gi|11093941|MNA3|AF293013  div: 29.9873    stDev: 6.01747  chimera flag: Yes
Observed   9.66667 10.6667 12  10.6667 12.3333 12.3333 11.3333 11 ...  
Expected   6.72447 7.19081 7.3083  7.35233 7.49571 7.34767 7.34767 7.3494 ...  
gi|11093940|MNF8|AF293012  div: 31.3688    stDev: 5.98965  chimera flag: No
Observed   9.66667 10.6667 12  10.6667 12.3333 12.3333 11.3333 11 ...  
Expected   7.03297 7.5207  7.64357 7.68963 7.83958 7.68475 7.68475 7.68656 ...
...

Options

conservation

You can upload a file containing the frequency information for your template file to increase speed. mothur will generate this for you but it takes a long time.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
conservation=core_set_aligned.imputed.freq)

quantile

You can upload a file containing the quantiles information for your template file to increase speed. mothur can generate this for you but it takes a VERY long time. Note that when you use the filter, mask or mask and filter you need to select the appropriate quantile file. The filter parameter makes the quantile file generated specific to the query set you are analyzing.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
 quantile=core_set_aligned.imputed.pintail.quan)

filter

By default the filter parameter is set to false, but if you set it to true a 50% soft vertical filter will be applied. Filtering makes the quantile file specific to the query set you are analyzing.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t, 
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.quan)

With the filter...

gi|11093937|MNF2|AF293009  div: 16.3738    stDev: 5.44241  chimera flag: Yes
gi|11093934|MND1|AF293006  div: 15.6675    stDev: 5.29495  chimera flag: Yes

mask

By default there is no mask applied, but you can set it to a file containing your mask or mask=default will apply the ecoli mask.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta, filter=t, mask=default, 
    conservation=core_set_aligned.imputed.freq, quantile=core_set_aligned.imputed.pintail.filtered.ex.masked.quan)

With the ecoli mask and filter applied...

gi|11093934|MND1|AF293006  div: 15.6675    stDev: 4.26498  chimera flag: Yes

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, mask=default, filter=t, quantile=core_set_aligned.imputed.pintail.filtered.masked.quan)

window

The window parameter is used to determine the length of sequence you want in each window analyzed. By default it is set to 300. Note, changing the window size will require new quantile files to be made.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, window=200)

increment

The increment parameter is used to slide the window along the sequence. For the pintail algorithm the default is 25. Note, changing the increment will require new quantile files to be made.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, increment=50)

processors

To speed up your processing the chimera.seqs command can be run with multiple processors by using the processors parameter. By default the processors parameter is 1. If you are using the mpi-enabled version, processors is set to the number of processors you have running.

mothur > chimera.pintail(fasta=ex.align, template=core_set_aligned.imputed.fasta,
method=pintail, quantile=core_set_aligned.imputed.pintail.quan, processors=2)

This method was written using the algorithms described in the paper: “At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies” by Kevin E. Ashelford, Nadia A. Chuzhanova, John C. Fry, Antonia J. Jones and Andrew J. Weightman. Applied and Environmental Microbiology 71 (12): 7724:7736.

Revisions

  • 1.38.0 - Removes save option.