Rarefaction

mothur calculates rarefaction curve data using the rarefaction.single command for the number of observed OTUs as a function of distance between sequences and the number of sequences sampled. By theory, the rarefaction curve should match the following expression:

\[S_n = S_t - \left (\frac{\sum_{i=1}^{S_t}{N - N_i \choose n} }{ {N \choose n}} \right )\]

where,

\(S_n\) = Average number of OTUs observed after drawing n individuals.

\(S_{t}\) = Total number of OTUs in sample of N total individuals.

In the amazonian dataset the total number of sequences was 98, and there were 84 total OTUs with 75 singletons, 6 doubletons, 1 tripleton, and 2 quadrupletons when defining OTUs at a 0.03 distance cutoff. If we were interested in the predicted observed richness after sampling 50 individuals from this distribution we would evaluate the formula:

\[S_{50} = 84 - 75\frac{\binom{98 - 1}{50}}{\binom{98}{50}} - 6\frac{\binom{98 - 2}{50}}{\binom{98}{50}} - 1\frac{\binom{98 - 3}{50}}{\binom{98}{50}}- 2\frac{\binom{98 - 4}{50}}{\binom{98}{50}}\] \[S_{50} = 84 - 75\left(0.4898 \right) - 6 \left(0.2373 \right) - 1 \left(0.1137 \right) - 2 \left(0.0539 \right) = 45.620\]

Although this closed form solution exists, it is actually very slow compared to averaging across a large number of randomized collector’s curves. The following table compares the output based on the formula and using 1,000 permutations using mothur:

n  Theory  mothur Diff. %Error     n   Theory  mothur   Diff. %Error
1  1.000   1.000 0.000  0.000      50  45.620  45.634 -0.014   0.031
2  1.996   1.995 0.001  0.049      51  46.461  46.477 -0.016   0.035
3  2.987   2.986 0.001  0.024      52  47.299  47.317 -0.017   0.037
4  3.974   3.971 0.003  0.069      53  48.136  48.154 -0.019   0.038
5  4.956   4.954 0.002  0.045      54  48.970  48.995 -0.025   0.051
6  5.935   5.934 0.001  0.010      55  49.802  49.822 -0.019   0.038
7  6.909   6.909 0.000  0.002      56  50.633  50.647 -0.014   0.028
8  7.880   7.881 -0.001 0.014      57  51.461  51.478 -0.016   0.032
9  8.846   8.847 -0.001 0.013      58  52.288  52.305 -0.018   0.034
10     9.808   9.809 -0.001 0.011      59  53.112  53.126 -0.014   0.027
11     10.767 10.768 -0.001 0.012      60  53.935  53.946 -0.011   0.020
12     11.721 11.721  0.000 0.001      61  54.755  54.767 -0.012   0.022
13     12.672 12.675 -0.003 0.028      62  55.574  55.586 -0.012   0.021
14     13.619 13.623 -0.004 0.031      63  56.391  56.401 -0.009   0.017
15     14.562 14.566 -0.004 0.027      64  57.206  57.213 -0.007   0.012
16     15.502 15.510 -0.008 0.051      65  58.020  58.026 -0.006   0.010
17     16.438 16.446 -0.008 0.046      66  58.832  58.840 -0.008   0.014
18     17.371 17.378 -0.007 0.040      67  59.642  59.647 -0.005   0.009
19     18.300 18.305 -0.005 0.027      68      60.450  60.448  0.002   0.003
20     19.225 19.233 -0.007 0.037      69  61.256  61.260 -0.004   0.006
21     20.148 20.158 -0.010 0.049      70  62.061  62.057  0.004   0.007
22     21.066 21.079 -0.013 0.061      71  62.865  62.861  0.004   0.006
23     21.982 21.995 -0.012 0.057      72  63.666  63.667  0.000   0.000
24     22.894 22.904 -0.010 0.042      73  64.467  64.469 -0.003   0.004
25     23.804 23.814 -0.010 0.042      74  65.265  65.273 -0.008   0.013
26     24.710 24.719 -0.010 0.039      75  66.062  66.070 -0.008   0.012
27     25.613 25.623 -0.011 0.041      76  66.857  66.862 -0.005   0.007
28     26.512 26.521 -0.008 0.032      77  67.651  67.654 -0.002   0.004
30     28.303 28.313 -0.010 0.034      79  69.235  69.241 -0.007   0.010
31     29.194 29.204 -0.010 0.033      80  70.024  70.035 -0.011   0.015
32     30.082 30.093 -0.010 0.034      81  70.812  70.824 -0.011   0.016
33     30.967 30.984 -0.016 0.052      82  71.599  71.612 -0.013   0.018
34     31.850 31.863 -0.013 0.041      83  72.384  72.391 -0.007   0.010
35     32.729 32.745 -0.015 0.047      84  73.168  73.175 -0.007   0.009
36     33.606 33.618 -0.012 0.035      85  73.950  73.958 -0.008   0.010
37     34.481 34.498 -0.017 0.050      86  74.731  74.746 -0.015   0.020
38     35.352 35.371 -0.019 0.052      87  75.511  75.530 -0.019   0.026
39     36.221 36.241 -0.020 0.055      88  76.289  76.302 -0.013   0.017
40     37.088 37.109 -0.021 0.056      89  77.066  77.081 -0.015   0.020
41     37.952 37.973 -0.022 0.057      90  77.842  77.856 -0.014   0.018
42     38.813 38.835 -0.021 0.055      91  78.616  78.619 -0.003   0.004
43     39.672 39.687 -0.014 0.036      92  79.389  79.391 -0.002   0.003
44     40.529 40.542 -0.013 0.032      93  80.161  80.161  0.000   0.000
45     41.383 41.391 -0.008 0.020      94  80.931  80.930  0.002   0.002
46     42.235 42.245 -0.010 0.023      95  81.700  81.700  0.000   0.000
47     43.085 43.093 -0.009 0.020      96  82.468  82.473 -0.005   0.006
48     43.932 43.945 -0.013 0.029      97  83.235  83.237 -0.003   0.003
49     44.777 44.792 -0.015 0.034      98  84.000  84.000  0.000   0.000