rarefaction.shared

The rarefaction.shared command will generate inter-sample rarefaction curves using a re-sampling without replacement approach. The traditional way that ecologists use rarefaction is not to randomize the sampling order within a sample, rather between samples. For instance, if we wanted to know the number of OTUs in the human colon, we might sample from various sites within the colon, and sequence a bunch of 16S rRNA genes. By determining the number of OTUs in each sample and comparing the composition of those samples it is possible to determine how well you have sampled the biodiversity within the individual. mothur has the ability to generate data for inter-sample rarefaction curves for the number of observed species. For this tutorial you should download and decompress Patient70Data.zip

Default settings

For example:

mothur > make.shared(list=patient70.fn.list, group=patient70.sites.groups)

By default, the rarefaction.shared() command will perform 1,000 randomizations of the order in which the samples are considered. So if you run rarefaction.shared() multiple times, you will get slightly different results. To get the default behavior enter:

mothur > rarefaction.shared(shared=patient70.fn.shared)

This will result in output to the screen looking like:

unique 1
0.00   2
0.01   3
0.02   4
0.03   5
0.04   6
0.05   7
0.06   8
0.07   9
0.08   10
0.09   11
0.10   12

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. Execution of rarefaction.shared() will generate the file patient70.fn.shared.rarefaction, which looks like:

samples    unique      lci     hci     0.00        lci     hci     ... 
1  443.7500    164.7345    722.7655    182.8480    86.3489     279.3471    ...
2  850.0090    490.6902    1209.3278   308.3940    187.8304    428.9576    ...
3  1248.1000   855.0973    1641.1027   413.4350    291.1591    535.7109    ...
4  1634.4600   1251.6258   2017.2942   502.5470    392.0305    613.0635    ...
5  2002.1400   1655.6566   2348.6234   582.0000    485.6954    678.3046    ...
6  2375.1600   2107.6512   2642.6688   650.0620    578.2567    721.8673    ...
7  2742.0000   2742.0000   2742.0000   713.0000    713.0000    713.0000    ...

The actual file has many more columns, but this should give you the idea. The first column is the number of samples that have been considered. Since their were 7 group names in the group file, rarefaction.shared() rarefied the data over the 7 groups. The rest of the columns are in triplets. The first column of the triplet is the average number of OTUs observed after sampling 1, 2, 3, etc. groups. The second and third columns of each triplet are the bounds on the 95% confidence interval of average. If you look closely, you will find that the average for 1 sample is the average number of OTUs observed across all of the samples. Also, if you look at the data in the row for 7 samples you will see that the bounds on the 95% confidence interval are the same as the average. Finally, the richness observed for sample 7 is the total number of OTUs you would observe if you processed patient70.fn.list through summary.single().

Options

label

There may only be a couple of lines in your OTU data that you are interested in generating the rarefaction curve for. There are two options. You could: (i) manually delete the lines you aren’t interested in from your list file; (ii) or use the label option. To use the label option with the rarefaction.shared() command you need to know the labels you are interested in. If you want the rarefaction curve data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

mothur > rarefaction.shared(shared=patient70.fn.shared, label=unique-0.03)

In the file patient70.fn.shared.rarefaction you would see something like:

numsampled unique      lci     hci     0.03        lci     hci
1      437.9010    156.7862    719.0158    58.6599     32.9568     84.3630
2      844.5540    484.9140    1204.1940   76.2709     52.1988     100.3430
3      1234.6800   850.1886    1619.1714   86.6658     64.5309     108.8007
4      1617.0100   1239.6273   1994.3927   94.2071     74.2022     114.2120
5      2003.0800   1661.2946   2344.8654   100.1300    82.7736     117.4864
6      2376.2700   2105.6267   2646.9133   105.6120    92.6939     118.5301
7      2742.0000   2742.0000   2742.0000   111.0000    111.0000    111.0000

iters

To improve the accuracy of the calculations you can change the number of randomizations that are performed using the iters option; the default value is 1,000. Running 10,000 randomization should take 10-times as long as the default:

mothur > rarefaction.shared(shared=patient70.fn.shared, iters=10000)

processors

The processors option allows you to reduce the processing time by using multiple processors. By default mothur will use all the processors found. You can set the processors with the following option:

mothur > rarefaction.shared(shared=patient70.fn.shared, processors=2)

jumble

Obviously, the goal of rarefaction is to randomize across the samples; however, if you just want a collector’s curve across the samples you can use the jumble option:

mothur > rarefaction.shared(shared=patient70.fn.shared, jumble=0)

design

The design parameter allows you to assign your groups to sets. If provided mothur will run rarefaction.shared on a per set basis.

mothur > rarefaction.shared(shared=final.an.0.03.subsample.0.03.pick.shared, design=mouse.sex_time.design)

sets

The sets parameter allows you to specify which of the sets in your design file you would like to analyze. The set names are separated by dashes. The default is all sets in the design file.

mothur > rarefaction.shared(shared=final.an.0.03.subsample.0.03.pick.shared, design=mouse.sex_time.design, sets=F003Late)

groupmode

If you are running rarefaction.shared with a design file and would like your results collated in multiple files, set groupmode=f. (Default=True).

mothur > rarefaction.shared(shared=final.an.0.03.subsample.0.03.pick.shared, design=mouse.sex_time.design, groupmode=f)

groups

If you had started this tutorial with the following comamnds:

mothur > get.group(shared=patient70.fn.shared)

You would have seen that there were 7 groups here: 70A-70F and 70S. The sequences from 70S were collected from Patient 70’s stool sample those from samples 70A-70F were from their mucosa. These 7 groups would yield 21 lies if you ran the rarefaction.shared() command; however, if you were only interested in the comparisons between each mucosa site you could use the groups option:

mothur > rarefaction.shared(shared=patient70.fn.shared, groups=70A-70B-70C-70D-70E-70F)

Alternatively, if you want all of the pairwise comparisons you can either not include the group option or set it equal to “all”.

mothur > rarefaction.shared(shared=patient70.fn.shared, calc=sharedobserved, groups=all)

subsample

The subsample parameter allows you to enter the size pergroup of the sample or you can set subsample=T and mothur will use the size of your smallest group.

subsampleiters

The subsampleiters parameter allows you to choose the number of times you would like to run the subsample. Default=1000.

withreplacement

The withreplacement parameter allows you to indicate you want to subsample your data allowing for the same read to be included multiple times. Default=f.

Visualization with R

To visual your data follow Pat’s tutorial here.

Revisions

  • 1.26.0 - added design, sets and groupmode parameters
  • 1.29.0 - added subsampling parameters.
  • 1.40.0 - Speed and memory improvements for shared files. #357 , #347
  • 1.40.0 - Fixes segfault error for commands that use subsampling. #357 , #347
  • 1.42.0 - Adds processors option. #544
  • 1.42.0 - Adds withreplacement parameter to sub.sample command. #262
  • 1.45.0 rarefaction.shared groups output bug. #729