Jack

The jack calculator returns the interpolated Jackknife richness estimate for an OTU definition. This calculator can be used in the summary.single, collect.single, and rarefaction.single commands. The calculations for the Jackknife richness estimator are implemented as described by Burnham and Overton. Note that programs like EstiamteS and various microbial ecology papers present either the first and/or second order Jackknife estimate. This method essentially uses a statistical procedure to determine which order results in the minimum bias (error).

\[S_{jack,k} = S_{obs} + \sum_{i=1}^k \left ( -1 \right )^{i+1} {k \choose i} n_i\] \[var \left( S_{jack,k} \right ) = \sum_{i=1}^{n_t} \left ( a_{ik} \right )^2 n_i - S_{jack,k}\] \[a_{ik} = \begin{cases} \left ( -1 \right )^{i+1} {k \choose i} + 1\mbox{, i}=\mbox{1...k} 1\mbox{, i}>\mbox{k} \end{cases}\]

where,

k = The order of the Jackknife estimate

\(n_t\) = The number of sequences in the largest OTU.

To determine which order of the estimate to use it is necessary to calculate the test statistics, \(T_k\):

\[T_k = \frac{S_{jack,k+1} - S_{jack,k}}{\left ( var \left( S_{jack,k+1} - S_{jack,k} | S \right )\right )^2}\] \[var \left( S_{jack,k+1} - S_{jack,k} | S \right ) = \frac {S_{obs}}{S_{obs}-1} \left [ \sum_{i=1}^{n_1} \left ( b_{i}^2 n_i \right ) - \frac{\left ( s_{jack,k+1} - s_{jack,k} \right )^2 }{s_{obs}}\right ]\]

where,

\[b_i = a_{i,k+1}-a_{i,k}\]

For each \(T_k\) value, calculate its two-sided p-value. Find the first k-value where \(P_k\)>0.05 and calculate c and d:

\[c = \frac {0.05 - P_{k-1}}{P_k - P_{k-1}}\] \[d_i = ca_{i,k} + \left( i-c \right )a_{i,k-1}\]

With c and d, calculate the interpolated \(S_{jack}\) and its standard error:

\[S_{jack} = \sum_{i=1}^{n_1}d_i n_i\] \[se \left ( S_{jack} \right ) = \left ( \sum_{i=1}^{n_1} \left ( d_{i}^2 n_i\right )-S_{jack}\right )^{0.5}\]

Open the file 98_lt_phylip_amazon.fn.sabund generated using the Amazonian dataset with the following commands:

mothur > cluster(phylip=98_lt_phylip_amazon.dist, cutoff=0.10)

The 98_lt_phylip_amazon.fn.sabund file is also outputted to the terminal window when the cluster() command is executed:

unique 2   94  2   
0.00   2   92  3   
0.01   2   88  5   
0.02   4   84  2   2   1   
0.03   4   75  6   1   2   
0.04   4   69  9   1   2   
0.05   4   55  13  3   2   
0.06   4   48  14  2   4   
0.07   4   44  16  2   4   
0.08   7   35  17  3   2   1   0   1   
0.09   7   35  14  3   3   0   0   2   
0.10   7   34  13  3   2   0   0   3   

The first column is the label for the OTU definition and the second column is an integer indicating the number of sequences in the dominant OTU. The Jackknife estimator is then calculated using the values found in the subsequent columns. For example, the calculations for an OTU definition of 0.03 and k=1 would include:

\[S_{jack,1} = 84 + \left ( -1 \right )^{2} {1 \choose 1} 75 = 159\] \[a_{11}=\left ( -1 \right )^{2} {1 \choose 1} + 1 = 2\] \[a_{21}= a_{31}= a_{41}=1\] \[var \left( S_{jack,1} \right ) = \left(a_{11}^2 \right) 75 + \left(a_{21}^2 \right) 6 + \left(a_{31}^2 \right) 1 + \left(a_{41}^2 \right) 2-S_{jack,k}= \left(4\right) 75 + \left(1\right) 6 + \left(1\right) 1 + \left(1\right) 2 -159= 150\]

The statistics for the remaining k’s and the Tk and Pk can be calculated:

k     Sj,k    var      Tk       Pk
1     159     150     13.91  <0.0001
2     228     450     8.89   <0.0001
3     292     938     5.77   <0.0001
4     350     1700    3.36    0.0008<---
5     399     2940    1.54    0.1235<---
6     434     5250     

The p-value crosses 0.05 between a order of 4 and 5 and you can calculate a c-value of 0.40 and the interpolated \(S_{jack}\) of 369.64 with 95% confidence interval between 278.98 and 460.30.

Running...

mothur > summary.single(calc=jack)

...and opening 98_lt_phylip_amazon.fn.summary gives:

label  Jackknife   Jackknife_lci   Jackknife_hci
unique 1549.202880 939.980213  2158.425548
0.00   1106.253448 698.849662  1513.657235
0.01   705.482860  471.308255  939.657466
0.02   623.978806  464.534765  783.422847
0.03   369.639494  278.978083  460.300904 <---
0.04   297.567063  221.375904  373.758221
0.05   185.682530  141.619413  229.745648
0.06   150.890195  116.993035  184.787355
0.07   117.801661  95.890012   139.713311
0.08   95.071587   78.049671   112.093503
0.09   95.406507   77.224837   113.588177
0.10   93.278681   74.883420   111.673942

These are the same values that we found above for a cutoff of 0.03.