phylo.diversity

The phylo.diversity command requires an input tree file. By default it writes a .phylodiv.summary file; if you set rarefy=T it also writes a .phylodiv.rarefaction file. To run this tutorial, download AbRecovery.zip.

Default settings

For example:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups)

or with a count file:

mothur > phylo.diversity(tree=abrecovery.paup.nj, count=abrecovery.count_table)

Execution of phylo.diversity() will generate the file abrecovery.paup.1.phylodiv.summary, which looks like:

Groups    numSampled    phyloDiversity
C    74    1.7782
A    84    2.0530
B    84    2.4740  
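
If you want to use the summary values in a script, the table is easy to read. The following is a minimal Python sketch, assuming the file is whitespace-delimited with the three columns shown above; parse_phylodiv_summary is a hypothetical helper, not part of mothur.

```python
def parse_phylodiv_summary(text):
    """Map each group name to (numSampled, phyloDiversity)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    # Sanity-check the header against the columns shown in the example output.
    if header != ["Groups", "numSampled", "phyloDiversity"]:
        raise ValueError(f"unexpected header: {header}")
    return {group: (int(n), float(pd)) for group, n, pd in records}


# Example text copied from the summary output above.
example = """Groups    numSampled    phyloDiversity
C    74    1.7782
A    84    2.0530
B    84    2.4740
"""
summary = parse_phylodiv_summary(example)
print(summary["B"])  # (84, 2.474)
```

To read a real file, pass the result of open(path).read() instead of the example string.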

Options

name

The name option allows you to enter a namefile with your treefile.

 mothur > phylo.diversity(tree=abrecovery.phylip.nj, group=abrecovery.groups, name=abrecovery.names)

count

The count file is similar to the name file in that it is used to represent the number of duplicate sequences for a given representative sequence. It can also contain group information.

 mothur > make.table(group=abrecovery.groups, name=abrecovery.names)
 mothur > phylo.diversity(tree=abrecovery.phylip.nj, count=abrecovery.count_table)

groups

The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. The group names are separated by dashes. By default all groups are used.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, groups=A-B)

In the file abrecovery.paup.1.phylodiv.summary you would see something like:

Groups    numSampled    phyloDiversity
A    84    2.0530
B    84    2.4740          
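
If you generate mothur commands from a script, the dash-separated groups value can be assembled from a list. This is a minimal Python sketch; build_command is a hypothetical helper that only formats a string, and it assumes nothing about mothur beyond the command syntax shown above.

```python
def build_command(name, **params):
    """Format a mothur-style command string from keyword parameters."""
    parts = []
    for key, value in params.items():
        # Lists and tuples become dash-separated values, e.g. groups=A-B.
        if isinstance(value, (list, tuple)):
            value = "-".join(value)
        parts.append(f"{key}={value}")
    return f"{name}({', '.join(parts)})"


cmd = build_command(
    "phylo.diversity",
    tree="abrecovery.paup.nj",
    group="abrecovery.groups",
    groups=["A", "B"],
)
print(cmd)
# phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, groups=A-B)
```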

rarefy

The rarefy parameter allows you to calculate the rarefaction data. The default is false. If you set rarefy=T, 1000 randomizations will be performed by default.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, rarefy=T)

In the file abrecovery.paup.1.phylodiv.rarefaction you would see something like:

numSampled    C    A    B
1    0.3187    0.2612    0.2537
74    1.7782    1.9327    2.3498
84    NA    2.0530    2.4740
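
The NA entries need a little care when you load this table. The sketch below is one way to read it in Python, assuming a whitespace-delimited layout with numSampled in the first column and one column per group. It also assumes that NA marks sampling depths beyond the number of sequences in a group (group C has 74), which is an inference from the example rather than something stated by mothur. parse_rarefaction is a hypothetical helper.

```python
def parse_rarefaction(text):
    """Return {group: {numSampled: value or None}} from a rarefaction table."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    groups = rows[0][1:]  # the first header cell is "numSampled"
    curves = {group: {} for group in groups}
    for row in rows[1:]:
        depth = int(row[0])
        for group, cell in zip(groups, row[1:]):
            # Store NA cells as None so they are not mistaken for numbers.
            curves[group][depth] = None if cell == "NA" else float(cell)
    return curves


# Example text copied from the rarefaction output above.
example = """numSampled C A B
1 0.3187 0.2612 0.2537
74 1.7782 1.9327 2.3498
84 NA 2.0530 2.4740
"""
curves = parse_rarefaction(example)
print(curves["C"][84])  # None
print(curves["B"][84])  # 2.474
```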

collect

The collect parameter allows you to create a collector's curve. The default is false.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T)

In the file abrecovery.paup.1.phylodiv you would see something like:

numSampled    C    A    B
1    0.3665    0.4120    0.2117
74    1.7782    1.9658    2.3353
84    NA    2.0530    2.4740

summary

The summary parameter allows you to create a .summary file. The default is true.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, summary=T)

In the file abrecovery.paup.1.phylodiv.summary you would see something like:

Groups    numSampled    phyloDiversity
C    74    1.7782
A    84    2.0530
B    84    2.4740

freq

For larger datasets you might not be interested in obtaining data for every number of sequences sampled. For instance, if you have 100,000 sequences, you may only want to output the data every 100 sequences. Alternatively, if you only have 100 sequences, you may want to output all of the data. The default setting is to output data every 100 sequences. By altering the freq option you can set the frequency at which data are output:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, freq=1)

or

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T,freq=10)

or you can set the frequency as a proportion of the total sequences. For example, to output after every 10%:

 mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, freq=0.10)

The third command would generate data such as this:

numSampled    C    A    B    
1    0.3073    0.2236    0.1102    
8    1.0329    0.7658    0.8554    
16    1.1919    1.0696    1.3787    
24    1.2975    1.2211    1.4297    
32    1.3998    1.3733    1.6132    
40    1.5672    1.3934    1.7927    
48    1.6548    1.4878    1.8862    
56    1.6727    1.6068    1.9921    
64    1.7378    1.8582    2.1792    
72    1.7707    1.9093    2.3820    
74    1.7782    1.9499    2.4021    
80    NA    2.0316    2.4492    
84    NA    2.0530    2.4740       
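
The numSampled column above can be reproduced with a few lines of Python. This is only a sketch of how the rows appear to be chosen, inferred from the table: it assumes a freq below 1 is taken as a proportion of the largest group (10% of 84, truncated to 8), and that row 1 and each group's final depth (74 and 84) are always reported. sample_points is a hypothetical helper and does not describe mothur's actual implementation.

```python
def sample_points(group_sizes, freq):
    """Depths at which output rows would appear, under the stated assumptions."""
    largest = max(group_sizes)
    # Assumption: a freq below 1 is a proportion of the largest group size.
    step = int(freq * largest) if freq < 1 else int(freq)
    step = max(step, 1)  # never step by less than one sequence
    points = {1}
    points.update(range(step, largest + 1, step))
    points.update(group_sizes)  # each group's final depth is always included
    return sorted(points)


print(sample_points([74, 84, 84], 0.10))
# [1, 8, 16, 24, 32, 40, 48, 56, 64, 72, 74, 80, 84]
```

With freq=1 the same function returns every depth from 1 to 84, which corresponds to the first command above.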

iters

To improve the accuracy of the calculations you can change the number of randomizations that are performed using the iters option; the default value is 1,000. Running 10,000 randomizations should take about 10 times as long as the default:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, rarefy=T, iters=10000)
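
To see why more randomizations give a smoother curve at a proportional cost in time, here is a toy Python sketch of rarefaction by randomization. It is not mothur's implementation. It assumes phylogenetic diversity is the total length of the branches connecting the sampled leaves to the root, and it uses a made-up three-leaf tree; TREE, LEAVES, phylo_diversity and rarefy are all hypothetical names.

```python
import random

# Toy rooted tree: node -> (parent, branch length). Hypothetical data.
TREE = {
    "a": ("n1", 0.1),
    "b": ("n1", 0.2),
    "n1": ("root", 0.3),
    "c": ("root", 0.4),
}
LEAVES = ["a", "b", "c"]


def phylo_diversity(leaves):
    """Sum the lengths of all branches on the paths from the leaves to the root."""
    used = set()
    for node in leaves:
        while node in TREE:  # the root has no entry, so the walk stops there
            used.add(node)
            node = TREE[node][0]
    return sum(TREE[node][1] for node in used)


def rarefy(iters, seed=0):
    """Average diversity at each sampling depth over random leaf orderings."""
    rng = random.Random(seed)
    totals = [0.0] * len(LEAVES)
    for _ in range(iters):
        order = LEAVES[:]
        rng.shuffle(order)
        for depth in range(1, len(order) + 1):
            totals[depth - 1] += phylo_diversity(order[:depth])
    return [total / iters for total in totals]


curve = rarefy(iters=1000)
print([round(value, 2) for value in curve])
```

Each iteration does the same amount of work, so the run time of this sketch grows linearly with iters, while the averaged values at each depth fluctuate less from run to run.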

scale

The scale parameter is used to indicate that you want your output scaled to the number of sequences sampled. The default is false.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, collect=T, scale=t)

In the file abrecovery.paup.1.phylodiv you would see something like:

numSampled    C    A    B
1    0.3053    0.4170    0.0921
74    0.0240    0.0248    0.0325
84    NA    0.0244    0.0295
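
The final value for each group in this table is consistent with dividing the unscaled phylogenetic diversity by the number of sequences sampled. The short Python check below assumes that this is what scale does; it is an inference from the example numbers, and scale_diversity is a hypothetical helper.

```python
def scale_diversity(phylo_diversity, num_sampled):
    """Divide a diversity value by the number of sequences sampled."""
    return phylo_diversity / num_sampled


# Final values from the unscaled summary shown earlier: (numSampled, phyloDiversity).
unscaled = {"C": (74, 1.7782), "A": (84, 2.0530), "B": (84, 2.4740)}
for group, (n, pd) in unscaled.items():
    print(group, f"{scale_diversity(pd, n):.4f}")
# C 0.0240
# A 0.0244
# B 0.0295
```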

sampledepth

The sampledepth parameter allows you to enter the number of sequences you want to sample.

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, sampledepth=50)

The output would look something like:

numSampled    A    B    C
1    0.2298    0.1765    0.2469
50    1.6520    2.1287    1.5145

processors

The processors parameter allows you to use multiple processors to reduce processing time. By default, mothur detects the number of available processors and uses all of them. You can use 2 processors with the following option:

mothur > phylo.diversity(tree=abrecovery.paup.nj, group=abrecovery.groups, processors=2)

Running this command on my laptop doesn’t exactly cut the time in half, but it’s pretty close. There is no software limit on the number of processors that you can use.

Revisions

  • 1.28.0 Added count parameter
  • 1.31.0 Added multiple processors for Windows.
  • 1.36.0 Added sampledepth parameter. - https://forum.mothur.org/viewtopic.php?f=3&t=3320
  • 1.40.0 Rewrite of threaded code. Default processors=Autodetect number of available processors and use all available.