Difference between revisions of "Get.lineage"

Revision as of 19:59, 11 March 2016

The get.lineage command reads a taxonomy file and a taxon and generates a new file that contains only the sequences that are from that taxon. You may also provide a fasta, name, group, list, count, or align.report file, and mothur will generate a new file for each of those containing only the selected sequences. To complete this tutorial, you are encouraged to obtain the AbRecovery dataset.



Default Settings

To run get.lineage, you must provide a taxonomy or constaxonomy file and one or more taxon names. The command will generate a *.pick.* file.

Running with a taxonomy file

To generate a taxonomy file, let's first run classify.seqs:

mothur > classify.seqs(fasta=abrecovery.fasta, template=silva.nogap.fasta, taxonomy=silva.bacteria.silva.tax)
mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;)

This generates abrecovery.silva.pick.taxonomy, a file containing the 106 sequences from Bacteria;Firmicutes;
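The selection itself can be pictured with a short Python sketch. This is not mothur's implementation; it assumes a taxonomy file with one sequence per line, a tab between the sequence name and its lineage, and confidence scores in parentheses after each level. It also assumes a sequence belongs to the taxon when the taxon string appears in its lineage once the scores are stripped. The names select_by_taxon and strip_confidence, and the example lineages, are hypothetical.

```python
import re

def strip_confidence(lineage):
    """Remove numeric confidence scores such as "(100)" from a lineage string."""
    return re.sub(r"\(\d+\)", "", lineage)

def select_by_taxon(taxonomy_lines, taxon):
    """Keep the lines whose lineage contains the requested taxon string.

    Assumption: substring matching on the score-free lineage.
    """
    kept = []
    for line in taxonomy_lines:
        name, lineage = line.rstrip("\n").split("\t", 1)
        if taxon in strip_confidence(lineage):
            kept.append(line)
    return kept

# Hypothetical taxonomy lines, not taken from the AbRecovery dataset.
example = [
    "seq1\tBacteria(100);Firmicutes(98);Clostridia(95);",
    "seq2\tBacteria(100);Bacteroidetes(99);Bacteroidia(97);",
    "seq3\tBacteria(100);Firmicutes(100);Bacilli(88);",
]
picked = select_by_taxon(example, "Bacteria;Firmicutes;")
picked_names = [line.split("\t")[0] for line in picked]
print(picked_names)
```

Only seq1 and seq3 survive here, which mirrors what a *.pick.taxonomy file would hold for this toy input.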

You can select sequences from multiple taxa by separating them with dashes. Example:

 mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;-Bacteria;Bacteroidetes;)

This generates abrecovery.silva.pick.taxonomy, a file containing the 187 sequences from Bacteria;Firmicutes; or Bacteria;Bacteroidetes;
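As a rough illustration of the dash-separated form, the sketch below splits the taxon argument on dashes and keeps a sequence if it matches any of the resulting taxa. It is a simplification under stated assumptions: lineages are shown without confidence scores, matching is by substring, and taxon names are assumed not to contain dashes themselves. The function names split_taxa and in_any_taxon are hypothetical.

```python
def split_taxa(taxon_arg):
    """Split a dash-separated taxon argument into individual taxa."""
    return [taxon for taxon in taxon_arg.split("-") if taxon]

def in_any_taxon(lineage, taxon_arg):
    """True if the lineage matches at least one of the requested taxa."""
    return any(taxon in lineage for taxon in split_taxa(taxon_arg))

# Hypothetical lineages, shown without confidence scores for brevity.
lineages = {
    "seq1": "Bacteria;Firmicutes;Clostridia;",
    "seq2": "Bacteria;Bacteroidetes;Bacteroidia;",
    "seq3": "Bacteria;Proteobacteria;Gammaproteobacteria;",
}
taxon_arg = "Bacteria;Firmicutes;-Bacteria;Bacteroidetes;"
selected = sorted(
    name for name, lineage in lineages.items() if in_any_taxon(lineage, taxon_arg)
)
print(selected)
```

The result is the union of the two selections, which is why the count in the example above (187) is larger than for Firmicutes alone.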

You may enter your taxon names with confidence scores. Doing so selects only those sequences that belong to the taxon and whose confidence scores are at or above the scores you give.

mothur > classify.seqs(fasta=abrecovery.fasta, template=silva.nogap.fasta, taxonomy=silva.bacteria.silva.tax)
mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria(100);Firmicutes(90);)

This generates abrecovery.silva.pick.taxonomy, a file containing the 104 sequences from Bacteria;Firmicutes; whose confidence scores are at 100 percent for Bacteria and at or above 90 for Firmicutes.
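The confidence comparison can be sketched in Python as follows. This is an illustration, not mothur's code: it assumes the query is compared level by level from the start of the lineage, that a level passes when its score is at or above the requested score, and that a query level without a score imposes no threshold. The names parse_levels and passes are hypothetical.

```python
import re

# A level is a name optionally followed by a numeric score, e.g. "Firmicutes(90)".
LEVEL = re.compile(r"^(.*?)(?:\((\d+)\))?$")

def parse_levels(lineage):
    """Split a lineage string into (name, score) pairs; score is None if absent."""
    levels = []
    for part in lineage.split(";"):
        if not part:
            continue
        name, score = LEVEL.match(part).groups()
        levels.append((name, int(score) if score is not None else None))
    return levels

def passes(lineage, query):
    """True if the lineage starts with the query taxa and meets each score."""
    have = parse_levels(lineage)
    want = parse_levels(query)
    if len(have) < len(want):
        return False
    for (have_name, have_score), (want_name, want_score) in zip(have, want):
        if have_name != want_name:
            return False
        if want_score is not None and (have_score is None or have_score < want_score):
            return False
    return True

query = "Bacteria(100);Firmicutes(90);"
print(passes("Bacteria(100);Firmicutes(95);Clostridia(80);", query))
print(passes("Bacteria(100);Firmicutes(85);Clostridia(80);", query))
```

The first lineage is kept and the second is dropped, because its Firmicutes score of 85 falls below the requested 90.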

Running with a constaxonomy file

First we need to find the consensus taxonomies for each OTU with the classify.otu command:

mothur > classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy)
mothur > get.lineage(constaxonomy=final.an.0.03.cons.taxonomy, list=final.an.list, taxon=Bacteria;Firmicutes;, label=0.03)

This generates final.an.0.03.pick.list containing the 401 OTUs that classified to Bacteria;Firmicutes;.

fasta option

To use the fasta option, follow this example:

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, fasta=abrecovery.fasta)

This generates the file abrecovery.pick.fasta, which contains only sequences from Bacteria;Firmicutes;
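Conceptually, the fasta output is the input fasta file restricted to the sequence names that passed the taxon selection. The sketch below shows that filtering step only, assuming standard fasta headers in which the sequence name is the first whitespace-delimited token after ">". The function name filter_fasta and the example records are hypothetical, and this is not how mothur itself is written.

```python
def filter_fasta(fasta_text, keep_names):
    """Return only the fasta records whose name is in keep_names."""
    kept_lines = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            # The record name is the first token on the header line.
            keep = bool(fields) and fields[0] in keep_names
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines)

# Hypothetical records; seq1 and seq3 stand in for the selected sequences.
fasta_text = ">seq1\nACGTACGT\nACGT\n>seq2\nTTTTGGGG\n>seq3 extra description\nGGCCGGCC"
filtered_fasta = filter_fasta(fasta_text, {"seq1", "seq3"})
print(filtered_fasta)
```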

name option

To use the name option, follow this example:

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, name=abrecovery.names)

This generates the file abrecovery.pick.names, which contains only the names of sequences from Bacteria;Firmicutes;

dups

The dups parameter is only used in tandem with a namefile. By default, dups=TRUE, so if any sequence in a specific line in the names file is in your taxon, then all sequences in that line will be kept. This is especially useful when used with the groupfile, since for most commands your files can contain only the unique sequences, but the groupfile needs to contain all the sequences in your namefile.
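The dups=TRUE behaviour described above can be sketched like this. The sketch assumes a names file with two tab-separated columns, the representative sequence name and a comma-separated list of all the names it represents. Only the default behaviour is modelled; the function name pick_names and the example lines are hypothetical.

```python
def pick_names(names_lines, in_taxon):
    """Model dups=TRUE: keep a whole line if any of its names is in the taxon."""
    kept = []
    for line in names_lines:
        representative, members = line.split("\t")
        all_names = members.split(",")
        if any(name in in_taxon for name in all_names):
            kept.append(line)
    return kept

# Hypothetical names file lines.
names_lines = [
    "seq1\tseq1,seq4,seq5",
    "seq2\tseq2",
    "seq3\tseq3,seq6",
]
# Suppose only seq5 and seq3 were found in the requested taxon.
kept_names = pick_names(names_lines, {"seq5", "seq3"})
print(kept_names)
```

The first line is kept in full even though only seq5 matched, so seq1 and seq4 stay available to a group file that lists every sequence.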

group option

To use the group option, follow this example:

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, group=abrecovery.groups)

This generates the file abrecovery.pick.groups, which contains only sequences from Bacteria;Firmicutes;

count option

The count file is similar to the name file in that it is used to represent the number of duplicate sequences for a given representative sequence. It can also contain group information.

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, count=abrecovery.count_table)

alignreport option

To use the alignreport option, follow this example:

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, alignreport=abrecovery.align.report)

This generates the file abrecovery.pick.align.report, which contains only sequences from Bacteria;Firmicutes;

list option

To use the list option, follow this example:

mothur > get.lineage(taxonomy=abrecovery.silva.taxonomy, taxon=Bacteria;Firmicutes;, list=abrecovery.fn.list)

This generates the file abrecovery.fn.pick.list, which contains only sequences from Bacteria;Firmicutes;

constaxonomy option

If you provide a constaxonomy file, mothur will select the OTUs from a shared or list file that are assigned to the requested taxon. The constaxonomy parameter requires a list or shared file.

mothur > get.lineage(constaxonomy=final.an.0.03.cons.taxonomy, shared=final.an.shared, taxon=Bacteria;Firmicutes;, label=0.03)

or

mothur > get.lineage(constaxonomy=final.an.0.03.cons.taxonomy, list=final.an.list, taxon=Bacteria;Firmicutes;, label=0.03)
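With a constaxonomy file the unit of selection is the OTU rather than the sequence. The sketch below shows one way to picture that step, assuming a cons.taxonomy file with a header line and three tab-separated columns (OTU, Size, Taxonomy) and substring matching on the score-free lineage. The function name select_otus and the example rows are hypothetical; the sketch only collects the OTU labels and does not rewrite the list or shared file.

```python
import re

def select_otus(cons_lines, taxon):
    """Return the labels of OTUs whose consensus lineage contains the taxon."""
    picked = []
    for line in cons_lines:
        otu, size, lineage = line.split("\t")
        if otu == "OTU":
            # Skip the header row.
            continue
        # Drop numeric confidence scores before comparing.
        if taxon in re.sub(r"\(\d+\)", "", lineage):
            picked.append(otu)
    return picked

# Hypothetical consensus taxonomy rows.
cons_lines = [
    "OTU\tSize\tTaxonomy",
    "Otu001\t120\tBacteria(100);Firmicutes(100);Clostridia(98);",
    "Otu002\t87\tBacteria(100);Bacteroidetes(100);Bacteroidia(100);",
    "Otu003\t12\tBacteria(100);Firmicutes(97);Bacilli(90);",
]
picked_otus = select_otus(cons_lines, "Bacteria;Firmicutes;")
print(picked_otus)
```

The selected labels are what would then be used to keep the matching OTUs in the list or shared file.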

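Revisions

* 1.28.0 Added count parameter
* 1.28.0 Can handle () in taxon names, http://www.mothur.org/forum/viewtopic.php?f=3&t=1815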