summary.single

The summary.single command produces a summary file that contains the calculator values for each line in the OTU data and, when a shared file is used, for each group. This can be useful if you aren’t interested in generating collector’s or rarefaction curves for your multi-sample data analysis. It is still worth looking at the collector’s curves for the calculators you are interested in to determine how sensitive the values are to sampling. If the values are not sensitive to sampling, then you can trust them. Otherwise, you need to keep sampling. For this tutorial you should download and decompress amazondata.zip.

Default settings

Enter either of the following commands:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list)

or to run the single analysis with multiple samples:

mothur > make.shared(list=98_lt_phylip_amazon.fn.list, group=amazon.groups)
mothur > summary.single(shared=98_lt_phylip_amazon.fn.shared)

The summary data for all of the single sample calculators is generated by default with the following command:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list)

This will result in output to the screen looking like:

unique 1
0.00   2
0.01   3
0.02   4
0.03   5
0.04   6
0.05   7
0.06   8
0.07   9
0.08   10
0.09   11
0.10   12

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. In dotur, the summary data was provided in separate files ending in “ltt” and was only generated after the collector’s curves were generated. Now, in mothur, all of this data is contained within a single “summary” file. In this case data was written to the file 98_lt_phylip_amazon.fn.summary, which looks like:

label  Sobs        Chao        Chao_lci    Chao_hci
unique 96.000000   1558.222222 347.616318  8593.437067
0.00   95.000000   1144.375000 350.293575  4408.417964
0.01   93.000000   732.222222  307.998499  1993.501867
0.02   89.000000   1255.666667 288.403803  6914.903478
0.03   84.000000   481.193878  227.569853  1182.858662
0.04   81.000000   315.945000  179.197207  643.125486
0.05   73.000000   179.211735  119.908132  313.489912
0.06   68.000000   143.306667  100.656145  241.660852
0.07   66.000000   121.723183  90.116038   194.755526
0.08   59.000000   92.109568   72.389087   140.875896
0.09   57.000000   96.744444   72.675322   157.771188
0.10   55.000000   95.158163   70.499679   159.045903

Again, the first column contains the label for the row in the data set you are analyzing. The header row names the calculator that generated the data in each column. For instance, the column labeled Sobs contains the number of OTUs that were observed in each row of the data set. The next three columns hold the data for the Chao1 richness estimator. Because there are formulae for its 95% confidence interval, the Chao column contains the estimate itself and the Chao_lci and Chao_hci columns contain the lower and upper bounds of the interval.
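To make the relationship between Sobs and a richness estimate concrete, here is a minimal Python sketch of the bias-corrected form of Chao1, computed from the number of sequences in each OTU. It is an illustration only: the function name and the example abundances are hypothetical, and the assumption that this variant matches mothur's own chao calculator has not been checked, so do not expect it to reproduce the values in the table above.

```python
from collections import Counter

def chao1_bias_corrected(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU sequence counts.

    Assumption: this is the textbook bias-corrected formula,
    Sobs + n1*(n1-1) / (2*(n2+1)); it may differ from mothur's implementation.
    """
    counts = [c for c in otu_counts if c > 0]
    sobs = len(counts)        # number of observed OTUs (the Sobs column)
    freq = Counter(counts)
    n1 = freq[1]              # OTUs seen exactly once (singletons)
    n2 = freq[2]              # OTUs seen exactly twice (doubletons)
    return sobs + n1 * (n1 - 1) / (2 * (n2 + 1))

# Hypothetical abundances: 7 OTUs, 3 singletons, 2 doubletons
example = [5, 3, 1, 1, 1, 2, 2]
print(chao1_bias_corrected(example))
```

The more singletons there are relative to doubletons, the further the estimate rises above Sobs, which is why the estimates at small distances are so much larger than the observed richness.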

Options

calc

If you don’t want to see all of the default calculators, you can tell mothur which ones to use in the summary file:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list, calc=sobs-chao-npshannon)

This would generate the 98_lt_phylip_amazon.fn.summary file:

label  Sobs        Chao        Chao_lci    Chao_hci    NPShannon
unique 96.000000   1558.22222  347.61631   8593.43706  7.768419
0.00   95.000000   1144.37500  350.29357   4408.41796  7.355786
0.01   93.000000   732.222222  307.99849   1993.50186  6.831284
0.02   89.000000   1255.66666  288.40380   6914.90347  6.344819
0.03   84.000000   481.193878  227.56985   1182.85866  5.800593
0.04   81.000000   315.945000  179.19720   643.125486  5.559488
0.05   73.000000   179.211735  119.90813   313.489912  5.090494
0.06   68.000000   143.306667  100.65614   241.660852  4.853388
0.07   66.000000   121.723183  90.116038   194.755526  4.776910
0.08   59.000000   98.4453120  74.865828   157.068166  4.483528
0.09   57.000000   105.568047  75.830083   182.270567  4.377893
0.10   55.000000   100.872781  72.625547   174.389885  4.286385
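If you want to work with the summary file outside of mothur, a short Python sketch such as the following can load it. It assumes the file is whitespace-delimited with a single header row and a label in the first column, as in the listing above; the function name read_summary and the inline sample text are hypothetical.

```python
def read_summary(text):
    """Parse summary-file text into a list of dicts keyed by column name.

    Assumption: whitespace-delimited columns, one header row, and every
    column after the label is numeric (no group column).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        row = {"label": fields[0]}
        for name, value in zip(header[1:], fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows

# Two rows copied from the listing above, used here as inline sample data
sample = """label  Sobs        Chao        Chao_lci    Chao_hci    NPShannon
0.03   84.000000   481.193878  227.56985   1182.85866  5.800593
0.05   73.000000   179.211735  119.90813   313.489912  5.090494
"""

rows = read_summary(sample)
for row in rows:
    print(row["label"], row["Sobs"], row["NPShannon"])
```

In practice you would pass the contents of 98_lt_phylip_amazon.fn.summary instead of the inline sample.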

abund

By default the ace estimator uses 10 as the cutoff between OTUs that are rare and abundant. So if an OTU has more than 10 individuals in it, then it is considered abundant. This is really just an empirical decision and we are merely following the lead of Anne Chao and others who implement 10 in their software. If you would like to use a different cutoff, you can use the abund option:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list, calc=ace, abund=5)

Looking at the file 98_lt_phylip_amazon.fn.summary, you’ll see that when the distance is 0.10, the ACE estimate is 101.1 (95% CI=75.5-158.8) compared to 161.4 (95% CI=120.3-228.4) when abund was 10. You will not see a difference between two abund values when no OTU has more individuals than the smaller cutoff.
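The effect of the cutoff can be illustrated with a small Python sketch that splits OTUs into rare and abundant classes. It follows the rule stated above (an OTU with more than abund individuals is abundant) and does not compute the ACE estimate itself; the function name and the abundances are hypothetical.

```python
def split_by_abund(otu_counts, abund=10):
    """Split per-OTU counts into (rare, abundant) lists.

    An OTU is abundant when it has more than `abund` individuals;
    everything else that was observed is rare.
    """
    rare = [c for c in otu_counts if 0 < c <= abund]
    abundant = [c for c in otu_counts if c > abund]
    return rare, abundant

counts = [1, 1, 2, 5, 6, 10, 11, 30]   # hypothetical OTU abundances
for cutoff in (10, 5):
    rare, abundant = split_by_abund(counts, cutoff)
    print(f"abund={cutoff}: {len(rare)} rare, {len(abundant)} abundant")

# No OTU exceeds the smaller cutoff, so both cutoffs give the same split
small = [1, 2, 3]
print(split_by_abund(small, 5) == split_by_abund(small, 10))
```

Lowering the cutoff moves OTUs from the rare class to the abundant class, which changes the inputs to the estimator. When every OTU is at or below the smaller cutoff, the split is identical and so is the estimate.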

size

Within the suite of calculators available in mothur is a set that predicts the number of additional OTUs that will be observed for a given sample size. By default these calculators base the prediction on a sample that is the same size as the initial sampling. If you would like to use a different sample size, use the size option:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list, calc=boneh, size=50)

The value of size should be between 1 and the size of the initial sampling. If you go beyond those limits, the default sample size will be used.
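The fallback rule can be written out as a few lines of Python. This is only a sketch of the behavior described above, not mothur's code: the function name is hypothetical, and treating both limits as inclusive is an assumption.

```python
def effective_size(requested, initial_size):
    """Return the sample size that would be used for the prediction.

    Assumption: the valid range is 1..initial_size inclusive; any other
    value (or no value) falls back to the initial sampling size.
    """
    if requested is not None and 1 <= requested <= initial_size:
        return requested
    return initial_size

initial = 98  # hypothetical number of sequences in the initial sampling
for requested in (50, 500, 0, None):
    print(requested, "->", effective_size(requested, initial))
```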

label

There may only be a couple of lines in your OTU data that you are interested in summarizing. You have two options: (i) manually delete the lines you aren’t interested in from your rabund, sabund, or list file, or (ii) use the label option. To use the label option with the summary.single() command you need to know the labels you are interested in. If you want the summary data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > summary.single(list=98_lt_phylip_amazon.fn.list, label=unique-0.03-0.05-0.10, calc=sobs-chao)

Opening 98_lt_phylip_amazon.fn.summary you would see the output as:

label  Sobs        Chao        Chao_lci    Chao_hci
unique 96.000000   1558.22222  347.616318  8593.437067
0.03   84.000000   481.193878  227.569853  1182.858662
0.05   73.000000   179.211735  119.908132  313.4899120
0.10   55.000000   100.872781  72.6255470  174.3898850
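If you already have a full summary file and only want certain rows, the same selection can be sketched in Python. The sketch assumes labels are joined with dashes, as in the command above, and that the label is the first whitespace-delimited field of each row; the function name and the inline sample are hypothetical.

```python
def select_labels(summary_text, label_option):
    """Keep the header row plus the rows whose label is in label_option.

    label_option uses the dash-separated form, e.g. "unique-0.03-0.10".
    """
    wanted = set(label_option.split("-"))
    lines = [ln for ln in summary_text.splitlines() if ln.strip()]
    kept = [lines[0]]  # always keep the header row
    kept += [ln for ln in lines[1:] if ln.split()[0] in wanted]
    return "\n".join(kept)

# A few rows copied from the default summary output, used as sample data
sample = """label  Sobs        Chao
unique 96.000000   1558.222222
0.00   95.000000   1144.375000
0.03   84.000000   481.193878
0.05   73.000000   179.211735
"""

print(select_labels(sample, "unique-0.03-0.05"))
```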

groupmode

If you are running summary.single with a shared file, the summary results for all groups are collated in a single file by default (groupmode=t). If you would like a separate summary file for each group instead, set groupmode=f. If you run:

mothur > make.shared(list=98_lt_phylip_amazon.fn.list, group=amazon.groups)
mothur > summary.single(shared=98_lt_phylip_amazon.fn.shared, groupmode=t)

the collated summary file will look like:

label  group   sobs    chao    chao_lci    chao_hci ...    
unique forest  48.000000   588.500000  235.495870  1606.115657 ...     
unique pasture 48.000000   588.500000  235.495870  1606.115657 ...     
0.00   forest  48.000000   588.500000  235.495870  1606.115657 ...     
0.00   pasture 48.000000   588.500000  235.495870  1606.115657 ... 
0.01   forest  47.000000   377.000000  165.127879  968.882296 ...      
0.01   pasture 47.000000   377.000000  165.127879  968.882296 ...  
0.02   forest  46.000000   519.000000  208.687352  1421.208320 ... 
0.02   pasture 44.000000   905.000000  517.445731  1609.799312 ...     
0.03   forest  45.000000   332.000000  146.710970  854.833987 ...      
0.03   pasture 43.000000   433.000000  175.375225  1192.006545 ... 
0.04   forest  44.000000   239.000000  115.962675  572.398926 ...      
0.04   pasture 43.000000   433.000000  175.375225  1192.006545 ...
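As an illustration of how the collated layout relates to per-group tables, the following Python sketch regroups collated rows by the group column. It is not a description of how mothur writes its files when groupmode=f; it only assumes the whitespace-delimited layout shown above with a column named group. The function name and the inline sample are hypothetical.

```python
from collections import defaultdict

def split_by_group(collated_text):
    """Return (header_without_group, {group: [row_fields, ...]})."""
    lines = [ln for ln in collated_text.splitlines() if ln.strip()]
    header = lines[0].split()
    group_idx = header.index("group")
    per_group = defaultdict(list)
    for ln in lines[1:]:
        fields = ln.split()
        group = fields[group_idx]
        # Drop the group column from the row; it becomes the dict key
        per_group[group].append(fields[:group_idx] + fields[group_idx + 1:])
    new_header = header[:group_idx] + header[group_idx + 1:]
    return new_header, dict(per_group)

# Rows copied from the collated listing above, without the elided columns
sample = """label  group   sobs    chao
0.03   forest  45.000000   332.000000
0.03   pasture 43.000000   433.000000
0.04   forest  44.000000   239.000000
0.04   pasture 43.000000   433.000000
"""

header, tables = split_by_group(sample)
for group, rows in tables.items():
    print(group, header, rows)
```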

subsample

The subsample parameter lets you enter the number of sequences to draw from each sample. With a shared file you can instead set subsample=T and mothur will use the size of your smallest group. With a list, sabund or rabund file you must provide a subsample size.

iters

The iters parameter allows you to choose the number of times the subsampling is repeated. Default=1000.

withreplacement

The withreplacement parameter allows you to indicate you want to subsample your data allowing for the same read to be included multiple times. Default=f.
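The three subsampling parameters can be pictured with a short Python sketch. It draws a fixed number of reads from a vector of per-OTU counts, with or without replacement, and repeats the draw several times. Everything here is illustrative: the function names are hypothetical, the example groups are made up, and averaging Sobs across iterations is an assumption about how repeated subsamples might be summarized rather than a statement of what mothur reports.

```python
import random

def subsample_counts(otu_counts, size, with_replacement=False, rng=None):
    """Draw `size` reads from per-OTU counts and return the new counts."""
    rng = rng or random.Random()
    # Expand counts into one entry per read, tagged with its OTU index
    pool = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    if not with_replacement and size > len(pool):
        raise ValueError("size exceeds the number of reads available")
    if with_replacement:
        picks = rng.choices(pool, k=size)   # the same read may be drawn twice
    else:
        picks = rng.sample(pool, size)      # each read is drawn at most once
    result = [0] * len(otu_counts)
    for otu in picks:
        result[otu] += 1
    return result

def mean_sobs(otu_counts, size, iters=1000, with_replacement=False, seed=0):
    """Average number of observed OTUs over `iters` subsamples (assumed summary)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(iters):
        sub = subsample_counts(otu_counts, size, with_replacement, rng)
        total += sum(1 for c in sub if c > 0)
    return total / iters

# Hypothetical groups; subsample=T would correspond to the smallest group size
groups = {"forest": [10, 5, 3, 1, 1], "pasture": [6, 4, 2, 1]}
smallest = min(sum(counts) for counts in groups.values())
for name, counts in groups.items():
    print(name, mean_sobs(counts, smallest, iters=100))
```

Drawing without replacement can never return more reads from an OTU than it originally contained, whereas drawing with replacement can.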

Revisions

  • 1.29.0 - Added subsample options.
  • 1.40.0 - Speed and memory improvements for shared files. #357, #347
  • 1.40.0 - Fixes segfault error for commands that use subsampling. #357, #347
  • 1.42.0 - Adds withreplacement parameter to sub.sample command. #262
  • 1.43.0 - Modifies output files from dist.shared, summary.single and summary.shared. You may run with or without rarefaction, but not both. #607