summary.shared
The summary.shared command produces a summary file containing the calculator values for each line in the OTU data and for all possible comparisons between the groups in the group file. This is useful if you aren’t interested in generating collector’s or rarefaction curves for your multi-sample data analysis. It is still worth looking at the collector’s curves for the calculators you are interested in to determine how sensitive the values are to sampling. If the values are not sensitive to sampling, you can trust them; otherwise, you need to keep sampling. For this tutorial you should download and decompress Patient70Data.zip.
Default settings
First you will need to make a shared file from your list and group files.
mothur > make.shared(list=patient70.fn.list, group=patient70.tissue_stool.groups)
The summary data for multi-sample calculators are generated by default with the following command:
mothur > summary.shared(shared=patient70.fn.shared)
This will result in output to the screen looking like:
unique 1
0.00 2
0.01 3
0.02 4
0.03 5
0.04 6
0.05 7
0.06 8
0.07 9
0.08 10
0.09 11
0.10 12
The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. In sons, the summary data was provided in a file ending in “sons.ltt” and was only generated after the collector’s curves were generated. Now, in mothur, all of this data is contained within a single “shared.summary” file. In this case data was written to the file patient70.fn.shared.summary, which looks like:
label comparison sharedsobs sharedchao sharedace JAbund SorAbund Jclass SorClass
unique stool tissue 73.000000 161.449997 108.60603 0.150565 0.261723 0.026613 0.051847
0.00 stool tissue 124.000000 237.481247 254.53860 0.489131 0.656935 0.174402 0.297006
0.01 stool tissue 94.000000 162.892853 135.36864 0.736210 0.848066 0.367188 0.537143
0.02 stool tissue 76.000000 110.477272 86.50789 0.892669 0.943291 0.554745 0.713615
0.03 stool tissue 60.000000 75.916664 72.30236 0.926541 0.961870 0.545455 0.705882
...
Again, the first column contains the label for the row in the data set you are analyzing. The second and third columns give the group names of the pairwise comparison that is represented by the row. Further columns are labeled to indicate the calculator that was used to generate the data. For instance, the column labeled sharedsobs contains the number of OTUs that were observed to be shared between the groups for each line in the list file. This is just a snippet of the file; 11 calculators are calculated by default.
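As a quick sanity check on the excerpt above, the classic Jaccard (J) and Sørensen (S) indices computed from the same counts are related by S = 2J / (1 + J). The sketch below assumes that Jclass and SorClass follow those classic definitions (this page does not spell them out) and recomputes SorClass from the Jclass values shown. The function name is only illustrative.

```python
# Rows copied from the patient70.fn.shared.summary excerpt above:
# (label, Jclass, SorClass)
rows = [
    ("unique", 0.026613, 0.051847),
    ("0.00", 0.174402, 0.297006),
    ("0.01", 0.367188, 0.537143),
    ("0.02", 0.554745, 0.713615),
    ("0.03", 0.545455, 0.705882),
]


def sorensen_from_jaccard(j):
    """Convert a classic Jaccard index into the matching Sorensen index."""
    return 2.0 * j / (1.0 + j)


# Print the reported and the recomputed SorClass side by side.
for label, jclass, sorclass in rows:
    recomputed = sorensen_from_jaccard(jclass)
    print(f"{label}\t{sorclass:.6f}\t{recomputed:.6f}")
```

If the two printed columns agree to within rounding, the columns in the file are consistent with one another. The same relation can be tried on the JAbund and SorAbund columns.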
Options
calc
If you don’t want to see all of the default calculators, you can tell mothur which ones to use in the summary file:
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao-jest)
This would generate the patient70.fn.shared.summary file:
label A B sharedsobs sharedchao Jest
unique stool tissue 73.000000 161.449997 0.008066
0.00 stool tissue 124.000000 237.481247 0.219289
0.01 stool tissue 94.000000 162.892853 0.546228
0.02 stool tissue 76.000000 110.477272 0.665435
...
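If you want to work with the summary values outside of mothur, the file can be read with a few lines of code. This is a minimal sketch, assuming the file is whitespace-delimited with a single header row laid out as in the excerpt above; parse_summary is a hypothetical helper and not part of mothur.

```python
# Text copied from the patient70.fn.shared.summary excerpt above.
summary_text = """\
label A B sharedsobs sharedchao Jest
unique stool tissue 73.000000 161.449997 0.008066
0.00 stool tissue 124.000000 237.481247 0.219289
0.01 stool tissue 94.000000 162.892853 0.546228
0.02 stool tissue 76.000000 110.477272 0.665435
"""


def parse_summary(text):
    """Return one dict per data row, with calculator values as floats."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        # The first three columns are the label and the two group names;
        # every later column holds a calculator value.
        for name in header[3:]:
            record[name] = float(record[name])
        records.append(record)
    return records


records = parse_summary(summary_text)
for record in records:
    print(record["label"], record["A"], record["B"], record["sharedsobs"])
```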
label
There may only be a couple of lines in your OTU data that you are interested in summarizing. There are two options. You could (i) manually delete the lines you aren’t interested in from your rabund, sabund, or list file, or (ii) use the label option. To use the label option with the summary.shared command you need to know the labels you are interested in. If you want the summary data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:
mothur > summary.shared(shared=patient70.fn.shared, label=unique-0.03-0.05-0.10, calc=sharedsobs-sharedchao)
Opening patient70.fn.shared.summary you would see the output as:
label A B sharedsobs sharedchao
unique stool tissue 73.000000 161.449997
0.03 stool tissue 60.000000 75.916664
0.05 stool tissue 51.000000 63.312500
0.10 stool tissue 28.000000 33.416668
groups
If you had started this tutorial with the following commands:
mothur > make.shared(list=patient70.fn.list, group=patient70.sites.groups)
mothur > get.group(shared=patient70.fn.shared)
You would have seen that there were 7 groups here: 70A-70F and 70S. The sequences from 70S were collected from Patient 70’s stool sample; those from samples 70A-70F were from their mucosa. These 7 groups would yield 21 pairwise comparisons if you ran the summary.shared command; however, if you were only interested in the comparisons between each mucosa site and the stool sample you could use the groups option:
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70A-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70B-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70C-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70D-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70E-70S)
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=70F-70S)
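The figure of 21 pairwise comparisons is the number of ways to choose 2 groups out of 7. The sketch below counts the pairs and also builds the six command lines shown above, which may be convenient if you want to write them to a batch file. It only produces text; it does not run mothur.

```python
from itertools import combinations

groups = ["70A", "70B", "70C", "70D", "70E", "70F", "70S"]

# Every unordered pair of groups is one pairwise comparison.
all_pairs = list(combinations(groups, 2))
print(len(all_pairs))

# Only the comparisons between each mucosa site and the stool sample.
stool = "70S"
commands = [
    f"summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups={site}-{stool})"
    for site in groups
    if site != stool
]
for command in commands:
    print(command)
```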
Alternatively, if you want all of the pairwise comparisons you can either leave out the groups option or set it equal to “all”.
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs, groups=all)
all
The sharedsobs and sharedchao calculators not only do the pairwise estimates, but also estimate the shared richness of all the groups in your file. This calculation is RAM intensive. If your RAM is limited and you have a large number of groups this may result in a crash, so by default the all parameter is set to false. To calculate the shared richness of all your groups, set the all parameter to true.
mothur > summary.shared(shared=patient70.fn.shared, calc=sharedsobs-sharedchao, all=true)
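To make the difference between a pairwise value and an all-groups value concrete, the sketch below counts observed shared OTUs on a small, made-up abundance table. It assumes that an OTU counts as shared when it has at least one sequence in every group being compared; the table and the shared_sobs helper are hypothetical, and this is not how mothur itself stores or processes the data.

```python
# Hypothetical OTU abundance table: one list per group, one position per OTU.
otu_counts = {
    "70A": [5, 0, 2, 1],
    "70B": [3, 1, 0, 4],
    "70S": [7, 2, 1, 9],
}


def shared_sobs(counts_by_group, groups):
    """Count the OTUs that are present in every one of the named groups."""
    rows = [counts_by_group[group] for group in groups]
    # zip(*rows) walks the table OTU by OTU across the selected groups.
    return sum(1 for column in zip(*rows) if all(n > 0 for n in column))


pairwise = shared_sobs(otu_counts, ["70A", "70S"])
everything = shared_sobs(otu_counts, ["70A", "70B", "70S"])
print(pairwise, everything)
```

An OTU shared by all groups must also be shared by every pair, so the all-groups count can never be larger than any of the pairwise counts.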
distance
The distance parameter allows you to indicate you would like a distance file created for each calculator at each label, default=f.
mothur > summary.shared(shared=patient70.fn.shared, distance=true)
subsample
The subsample parameter allows you to enter the size per group of the sample, or you can set subsample=T and mothur will use the size of your smallest group.
iters
The iters parameter allows you to choose the number of times you would like to run the subsample.
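Conceptually, subsampling draws the same number of sequences from every group and iters repeats that draw so the calculator values can be averaged. The sketch below illustrates the idea on made-up counts, using the size of the smallest group as the subsample size. It is an illustration under those assumptions, not mothur's own code, and the function names are hypothetical.

```python
import random


def subsample_counts(counts, size, rng):
    """Draw `size` reads without replacement and return the new OTU counts."""
    # Expand the counts into one entry per read, labeled by its OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    drawn = [0] * len(counts)
    for otu in rng.sample(reads, size):
        drawn[otu] += 1
    return drawn


def mean_shared_sobs(group_a, group_b, size, iters, seed=0):
    """Average the number of shared OTUs over `iters` random subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(iters):
        sub_a = subsample_counts(group_a, size, rng)
        sub_b = subsample_counts(group_b, size, rng)
        total += sum(1 for x, y in zip(sub_a, sub_b) if x > 0 and y > 0)
    return total / iters


# Hypothetical OTU counts for two groups of unequal size.
stool = [40, 30, 20, 10, 0]
tissue = [10, 5, 0, 3, 2]
size = min(sum(stool), sum(tissue))  # size of the smallest group
average = mean_shared_sobs(stool, tissue, size=size, iters=100)
print(size, average)
```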
output
The output parameter allows you to indicate whether you want the distance file created by summary.shared to be in lower triangle or square format. Options are lt or square; lt is the default.
mothur > summary.shared(shared=patient70.fn.shared, distance=true, output=square)
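The two layouts differ only in how much of the matrix is written. The sketch below prints the same made-up distances both ways, assuming a PHYLIP-style layout in which the first line holds the number of groups and each following line starts with a group name. The values and the format_distances helper are hypothetical, so check the exact layout against a file produced by your own run.

```python
def format_distances(names, dist, output="lt"):
    """Format a symmetric distance matrix as lower triangle or square text."""
    lines = [str(len(names))]
    for i, name in enumerate(names):
        # Lower triangle: only the distances to the groups listed earlier.
        stop = i if output == "lt" else len(names)
        values = [f"{dist[i][j]:.6f}" for j in range(stop)]
        lines.append("\t".join([name] + values))
    return "\n".join(lines)


names = ["70A", "70B", "70S"]
dist = [
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

lower = format_distances(names, dist, output="lt")
square = format_distances(names, dist, output="square")
print(lower)
print(square)
```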
processors
The processors option allows you to reduce the processing time by using multiple processors. By default, mothur autodetects the number of available processors and uses all of them.
mothur > summary.shared(shared=patient70.fn.shared, processors=2)
Running this command on my laptop doesn’t exactly cut the time in half, but it’s pretty close. There is no software limit on the number of processors that you can use.
withreplacement
The withreplacement parameter allows you to indicate you want to subsample your data allowing for the same read to be included multiple times. Default=f.
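The difference between the two sampling modes is easiest to see on a tiny example. In the sketch below, each of three OTUs has a single read. Drawing three reads without replacement always returns one read per OTU, while drawing with replacement can pick the same read more than once. The draw_reads helper and the counts are hypothetical and only illustrate the concept.

```python
import random


def draw_reads(counts, size, with_replacement, rng):
    """Draw `size` reads from OTU counts, with or without replacement."""
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if with_replacement:
        picked = rng.choices(reads, k=size)  # a read may be picked repeatedly
    else:
        picked = rng.sample(reads, size)  # each read is picked at most once
    drawn = [0] * len(counts)
    for otu in picked:
        drawn[otu] += 1
    return drawn


counts = [1, 1, 1]
rng = random.Random(0)
without = draw_reads(counts, 3, with_replacement=False, rng=rng)
with_repl = draw_reads(counts, 3, with_replacement=True, rng=rng)
print(without, with_repl)
```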
Revisions
- 1.24.0 - parallelized for Windows.
- 1.29.0 - added subsampling parameters
- 1.33.0 - Bug Fix: *.ave.dist matrix = 0 when processors > 2 when using the subsample parameter and not using the distance parameter. https://forum.mothur.org/viewtopic.php?f=4&t=2660&p=7372#p7372
- 1.33.0 - added Square Root Jensen-Shannon Divergence and Jensen-Shannon Divergence calculators.
- 1.40.0 - Speed and memory improvements for shared files. #357 , #347
- 1.40.0 - Rewrite of threaded code. Default processors=Autodetect number of available processors and use all available.
- 1.40.0 - Fixes segfault error for commands that use subsampling. #357 , #347
- 1.42.0 - Adds withreplacement parameter to sub.sample command. #262
- 1.43.0 - Modifies output files from dist.shared, summary.single and summary.shared. You may run with or without rarefaction, but not both. #607