summary.shared

The summary.shared command produces a summary file that has the calculator value for each line in the OTU data and for all possible comparisons between the different groups in the group file. This can be useful if you aren’t interested in generating collector’s or rarefaction curves for your multi-sample data analysis. It would be worth your while, however, to look at the collector’s curves for the calculators you are interested in to determine how sensitive the values are to sampling. If the values are not sensitive to sampling, then you can trust them. Otherwise, you need to keep sampling. For this tutorial you should download and decompress Patient70Data.zip.

Default settings

First you will need to make a shared file from your list and group files.

mothur > make.shared(list=patient70.fn.list, group=patient70.tissue_stool.groups)

The summary data for multi-sample calculators are generated by default with the following command:

mothur > summary.shared(shared=patient70.fn.shared)

This will result in output to the screen looking like:

unique 1
0.00   2
0.01   3
0.02   4
0.03   5
0.04   6
0.05   7
0.06   8
0.07   9
0.08   10
0.09   11
0.10   12

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. In sons, the summary data was provided in a file ending in “sons.ltt” and was only generated after the collector’s curves were generated. Now, in mothur, all of this data is contained within a single “shared.summary” file. In this case data was written to the file patient70.fn.shared.summary, which looks like:

label  comparison  sharedsobs  sharedchao  sharedace   JAbund      SorAbund    Jclass      SorClass
unique stool   tissue  73.000000   161.449997  108.60603   0.150565    0.261723    0.026613    0.051847
0.00   stool   tissue  124.000000  237.481247  254.53860   0.489131    0.656935    0.174402    0.297006
0.01   stool   tissue  94.000000   162.892853  135.36864   0.736210    0.848066    0.367188    0.537143
0.02   stool   tissue  76.000000   110.477272  86.50789    0.892669    0.943291    0.554745    0.713615
0.03   stool   tissue  60.000000   75.916664   72.30236    0.926541    0.961870    0.545455    0.705882
...

Again, the first column contains the label for the row in the data set you are analyzing. The second and third columns give the group names of the pairwise comparison that is represented by the row. Further columns are labeled to indicate the calculator that was used to generate the data. For instance, here the data in the column labeled sharedsobs contains the number of OTUs that were observed to be shared between groups for each line in the list file. This is actually just a snippet of the file; 11 calculators are run by default.
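The column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of mothur: the function name parse_shared_summary is hypothetical, and it assumes the file is whitespace-delimited, that group names contain no spaces, and that every column after the two group names is a numeric calculator value.

```python
def parse_shared_summary(text):
    """Parse the text of a *.shared.summary file into a list of row dicts."""
    # Skip blank lines and the "..." used in this page to mark truncated output.
    lines = [ln for ln in text.splitlines() if ln.strip() and ln.strip() != "..."]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        parts = line.split()
        label, group_a, group_b = parts[0], parts[1], parts[2]
        values = [float(v) for v in parts[3:]]
        # The header may name the comparison in one column or in two, so the
        # calculator names are taken from the right-hand end of the header.
        calcs = header[len(header) - len(values):]
        rows.append({
            "label": label,
            "groups": (group_a, group_b),
            "values": dict(zip(calcs, values)),
        })
    return rows


# Two rows copied from the example output above.
example = """label  comparison  sharedsobs  sharedchao
unique stool   tissue  73.000000   161.449997
0.03   stool   tissue  60.000000   75.916664
"""

rows = parse_shared_summary(example)
for row in rows:
    print(row["label"], row["groups"], row["values"]["sharedsobs"])
```

Each row keeps its label, the pair of group names, and a dictionary of calculator values keyed by the header names, so a value can be looked up as row["values"]["sharedchao"].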

Options

calc

If you don’t want to see all of the default calculators, you can tell mothur which ones to use in the summary file:

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao-jest)

This would generate the patient70.fn.shared.summary file:

label  A   B       sharedsobs  sharedchao  Jest
unique stool   tissue      73.000000   161.449997  0.008066
0.00   stool   tissue      124.000000  237.481247  0.219289
0.01   stool   tissue      94.000000   162.892853  0.546228
0.02   stool   tissue      76.000000   110.477272  0.665435
...

label

There may only be a couple of lines in your OTU data that you are interested in summarizing. There are two options. You could: (i) manually delete the lines you aren’t interested in from your rabund, sabund, or list file; or (ii) use the label option. To use the label option with the summary.shared command you need to know the labels you are interested in. If you want the summary data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > summary.shared(shared=patient70.fn.shared, label=unique-0.03-0.05-0.10, calc=sharedsobs-sharedchao)

Opening patient70.fn.shared.summary you would see the output as:

label  A   B       sharedsobs  sharedchao
unique stool   tissue      73.000000   161.449997
0.03   stool   tissue      60.000000   75.916664
0.05   stool   tissue      51.000000   63.312500
0.10   stool   tissue      28.000000   33.416668

groups

If you had started this tutorial with the following commands:

mothur > make.shared(list=patient70.fn.list, group=patient70.sites.groups)
mothur > get.group(shared=patient70.fn.shared)

You would have seen that there were 7 groups here: 70A-70F and 70S. The sequences from 70S were collected from Patient 70’s stool sample; those from samples 70A-70F were from their mucosa. These 7 groups would yield 21 pairwise comparisons if you ran the summary.shared command; however, if you were only interested in the comparisons between each mucosa site and the stool sample you could use the groups option:

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70A-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70B-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70C-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70D-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70E-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70F-70S)
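The figure of 21 comparisons and the six commands above can be reproduced with a short Python sketch. It is only an illustration: the group names and the command text are taken from this page, and nothing here calls mothur itself.

```python
from itertools import combinations

groups = ["70A", "70B", "70C", "70D", "70E", "70F", "70S"]

# Every unordered pair of groups: 7 choose 2.
all_pairs = list(combinations(groups, 2))
print(len(all_pairs))  # prints 21

# One command per mucosa site, each compared against the stool sample.
stool = "70S"
commands = [
    f"summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups={site}-{stool})"
    for site in groups
    if site != stool
]
for command in commands:
    print(command)
```

Generating the command lines this way can be convenient when the number of sites is larger than six.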

Alternatively, if you want all of the pairwise comparisons you can either omit the groups option or set it equal to “all”.

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=all)

all

The sharedsobs and sharedchao calculators not only do the pairwise estimates, but also estimate the shared richness of all the groups in your file. This calculation is RAM intensive. If your RAM is limited and you have a large number of groups this may result in a crash, so by default the all parameter is set to false. To calculate the shared richness of all your groups, set the all parameter to true.

mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao, all=true)

distance

The distance parameter allows you to indicate that you would like a distance file created for each calculator at each label. The default is f.

mothur > summary.shared(shared=patient70.fn.shared, distance=true)

subsample

The subsample parameter allows you to enter the size per group of the sample, or you can set subsample=T and mothur will use the size of your smallest group.

iters

The iters parameter allows you to choose the number of times you would like to run the subsample.

output

The output parameter allows you to indicate whether you want the distance file created by summary.shared to be in lower triangle or square format. Options are lt or square; lt is the default.

mothur > summary.shared(shared=patient70.fn.shared, distance=true, output=square)
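The difference between the two layouts can be pictured with a small Python sketch. It assumes a PHYLIP-style layout (a count line, then one row per group) with tab-separated values and four decimal places; those formatting details, the function name format_matrix, and the distance values are all illustrative assumptions rather than mothur's exact output.

```python
def format_matrix(names, dist, output="lt"):
    """Render a symmetric distance matrix as lower-triangle or square text."""
    lines = [str(len(names))]  # first line: number of groups
    for i, name in enumerate(names):
        # lt keeps only the distances to earlier rows; square keeps every column.
        cols = range(i) if output == "lt" else range(len(names))
        cells = [f"{dist[i][j]:.4f}" for j in cols]
        lines.append("\t".join([name] + cells))
    return "\n".join(lines)


names = ["70A", "70B", "70S"]
# Made-up distances, for illustration only.
dist = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

lt_text = format_matrix(names, dist, "lt")
square_text = format_matrix(names, dist, "square")
print(lt_text)
print(square_text)
```

The lower-triangle form stores each pairwise distance once, which is why it is the more compact of the two.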

processors

The processors option allows you to reduce the processing time by using multiple processors. By default, mothur autodetects the number of available processors and uses all of them.

mothur > summary.shared(shared=patient70.fn.shared, processors=2)

Running this command on my laptop doesn’t exactly cut the time in half, but it’s pretty close. There is no software limit on the number of processors that you can use.

withreplacement

The withreplacement parameter allows you to indicate you want to subsample your data allowing for the same read to be included multiple times. Default=f.
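The ideas behind subsample, iters and withreplacement can be sketched in Python. This is not mothur's implementation: the function names, the OTU counts, and the choice to average the number of shared OTUs over the iterations are assumptions made for illustration.

```python
import random


def subsample_counts(counts, size, with_replacement=False, rng=None):
    """Draw `size` reads from per-OTU counts and return the new per-OTU counts."""
    rng = rng or random.Random()
    # Expand the counts into one entry per read, tagged with its OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if with_replacement:
        drawn = rng.choices(reads, k=size)  # the same read may be drawn again
    else:
        drawn = rng.sample(reads, size)     # each read is drawn at most once
    result = [0] * len(counts)
    for otu in drawn:
        result[otu] += 1
    return result


def shared_observed(a, b):
    """Count the OTUs present in both groups."""
    return sum(1 for x, y in zip(a, b) if x > 0 and y > 0)


# Hypothetical per-OTU read counts for two groups.
group_counts = {"stool": [10, 5, 0, 3, 2], "tissue": [4, 0, 6, 1, 1]}

# Analogous to subsample=T: use the size of the smallest group.
size = min(sum(counts) for counts in group_counts.values())

iters = 100
rng = random.Random(1)  # fixed seed so the run is repeatable
total = 0
for _ in range(iters):
    sub = {g: subsample_counts(c, size, rng=rng) for g, c in group_counts.items()}
    total += shared_observed(sub["stool"], sub["tissue"])
mean_shared = total / iters
print(size, mean_shared)
```

Passing with_replacement=True switches the draw to sampling with replacement, in which case a single OTU's subsampled count can exceed its original count.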

Revisions

  • 1.24.0 - parallelized for Windows.
  • 1.29.0 - added subsampling parameters
  • 1.33.0 - Bug Fix: *.ave.dist matrix = 0 when processors > 2 when using the subsample parameter and not using the distance parameter. https://forum.mothur.org/viewtopic.php?f=4&t=2660&p=7372#p7372
  • 1.33.0 - added Square Root Jensen-Shannon Divergence and Jensen-Shannon Divergence calculators.
  • 1.40.0 - Speed and memory improvements for shared files. #357 , #347
  • 1.40.0 - Rewrite of threaded code. Default processors=Autodetect number of available processors and use all available.
  • 1.40.0 - Fixes segfault error for commands that use subsampling. #357 , #347
  • 1.42.0 - Adds withreplacement parameter to sub.sample command. #262
  • 1.43.0 - Modifies output files from dist.shared, summary.single and summary.shared. You may run with or without rarefaction, but not both. #607