remove.groups

The remove.groups command removes sequences from a specific group or set of groups from the following file types: fasta, count, shared, list, taxonomy, phylip, column, design, name and group. To run through the example below, download Example Data

Default Settings

The count parameter is required, unless you have a current count file, or are using a shared file. The command will generate a *.pick.* file. To remove the groups F3D0, F3D1 and F3D8, run the following:

mothur > remove.groups(count=final.count_table, groups=F3D0-F3D1-F3D8)

or you may wish to list your groups in a file instead of manually typing them. This can be done by using an accnos file.

mothur > remove.groups(count=final.count_table, accnos=final.groupsToRemove.accnos) 

final.groupsToRemove.accnos looks like:

F3D0
F3D1
F3D8

Both commands will output the file final.pick.count_table containing the 2132 unique sequences that represent the 98863 sequences in groups F3D141, F3D142, F3D143, F3D144, F3D145, F3D146, F3D147, F3D148, F3D149, F3D150, F3D2, F3D3, F3D5, F3D6, F3D7 and F3D9.

Options

fasta

To use the fasta option, follow this example:

mothur > remove.groups(count=final.count_table, fasta=final.fasta, groups=F3D0-F3D1-F3D8)

This generates the file final.pick.fasta as well as the final.pick.count_table with the same sequences. If you open the final.pick.fasta file you will see something like:

>M00967_43_000000000-A3JHG_1_2102_22092_15614
TAC--GG-AG-GAT--GCG-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-T-T--GG-GT--TT-AAGA-GG-GT-GC-G-TA-GGC-G-G-A-CA-G-T-T-AA-G-T-C-T-G-C-...
>M00967_43_000000000-A3JHG_1_1102_8406_20325
TAC--GG-AG-GAT--CCG-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-T-T--GG-GT--TT-A-AA-GG-GA-GC-C-TA-GGT-G-G-A-TT-G-T-T-AA-G-T-C-A-G-T-...
>M00967_43_000000000-A3JHG_1_1106_15955_6621
TAC--GT-AG-GTG--GCA-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-C-T--GG-GT--GT-A-AA-GG-GA-GC-G-TA-GAC-G-G-C-AG-G-G-C-AA-G-T-C-T-G-G-...
>M00967_43_000000000-A3JHG_1_2103_24256_12640
TAC--GT-AG-GGG--GCG-A-G-C-G-T-T--GT-C-CGG-AA--TG-A-C-T--GG-GC--GT-A-AA-GG-GA-GT-G-TA-GGC-G-G-C-CT-T-T-T-AA-G-T-T-A-T-G-...
>M00967_43_000000000-A3JHG_1_1101_6929_7655
TAC--GG-AG-GAT--GCG-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-T-T--GG-GT--TT-A-AA-GG-GT-GC-G-TA-GGC-G-G-T-TT-G-T-C-AA-G-T-C-A-G-C-...
>M00967_43_000000000-A3JHG_1_1105_5520_9241
TAC--GT-AG-GGG--GCA-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-C-T--GG-GT--GT-A-AA-GG-GA-GC-G-TA-GAC-G-G-C-GA-T-G-C-AA-G-T-C-T-G-A-...
>M00967_43_000000000-A3JHG_1_1112_5981_8948
TAC--GT-GG-GAT--GCG-A-G-C-G-T-T--AT-C-CGG-AT--TT-A-T-T--GG-GT--TT-A-AA-GG-GT-GC-G-CA-GGC-G-G-A-AG-A-T-C-AA-G-T-C-A-G-C-...
...

list

To use the list option, follow this example:

mothur > remove.groups(count=final.count_table, list=final.opti_mcc.list, groups=F3D0-F3D1-F3D8)

This generates the file final.opti_mcc.0.03.pick.list, which contains the 500 otus containing sequences from the groups F3D141, F3D142, F3D143, F3D144, F3D145, F3D146, F3D147, F3D148, F3D149, F3D150, F3D2, F3D3, F3D5, F3D6, F3D7 or F3D9.

label	numOtus	Otu001	Otu002	Otu003	Otu004	Otu005	Otu006	Otu007	Otu008	Otu009	Otu010	Otu011	Otu012	...
0.03	500	M00967_43_000000000-A3JHG_1_1101_20262_22075,M00967_43_000000000-A3JHG_1_2101_13755_8645,M00967_43_000000000-A3JHG_1_1110_16292_27414,M00967_43_000000000-A3JHG_1_2108_7751_22068,M00967_43_000000000-A3JHG_1_1110_19425_20093,M00967_43_000000000-A3JHG_1_2101_7036_14239,M00967_43_000000000-A3JHG_1_2103_14884_16027,M00967_43_000000000-...

as well as a final.pick.count_table with the same sequences.

taxonomy

To use the taxonomy option, follow this example:

mothur > remove.groups(count=final.count_table, taxonomy=final.taxonomy, groups=F3D0-F3D1-F3D8)

This generates the file final.pick.taxonomy, which looks like:

M00967_43_000000000-A3JHG_1_1106_24198_4737	Bacteria(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);
M00967_43_000000000-A3JHG_1_2109_10445_7921	Bacteria(100);Firmicutes(97);Clostridia(97);Clostridiales(97);Lachnospiraceae(94);Lachnospiraceae_unclassified(94);
M00967_43_000000000-A3JHG_1_1106_7807_21999	Bacteria(100);Firmicutes(92);Clostridia(91);Clostridiales(91);Lachnospiraceae(89);Lachnospiraceae_unclassified(89);
M00967_43_000000000-A3JHG_1_2113_27238_12024	Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Lachnospiraceae(98);Dorea(82);
M00967_43_000000000-A3JHG_1_2109_21713_13835	Bacteria(100);Firmicutes(99);Clostridia(99);Clostridiales(99);Lachnospiraceae(99);Lachnospiraceae_unclassified(99);
M00967_43_000000000-A3JHG_1_1104_18175_14208	Bacteria(100);"Bacteroidetes"(99);"Bacteroidia"(93);"Bacteroidales"(93);"Porphyromonadaceae"(83);"Porphyromonadaceae"_unclassified(83);
M00967_43_000000000-A3JHG_1_2104_19276_2046	Bacteria(100);Firmicutes(100);Clostridia(99);Clostridiales(99);Lachnospiraceae(97);Lachnospiraceae_unclassified(97);
M00967_43_000000000-A3JHG_1_1101_26300_8238	Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(96);"Bacteroidales"(96);"Porphyromonadaceae"(90);"Porphyromonadaceae"_unclassified(90);
M00967_43_000000000-A3JHG_1_2111_18339_15747	Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Ruminococcaceae(91);Butyricicoccus(89);
M00967_43_000000000-A3JHG_1_1114_7614_20887	Bacteria(100);"Bacteroidetes"(98);"Bacteroidia"(91);"Bacteroidales"(91);"Porphyromonadaceae"(91);"Porphyromonadaceae"_unclassified(91);
...

as well as a final.pick.count_table with the same sequences.

shared

To use the shared option, follow this example:

mothur > remove.groups(shared=final.opti_mcc.shared, groups=F3D0-F3D1-F3D8)

This generates the file final.opti_mcc.0.03.pick.shared, which contains the following lines:

label	Group	numOtus	Otu001	Otu002	Otu003	Otu004	Otu005	Otu006	Otu007	Otu008	Otu009	Otu010	Otu011	Otu012	...
0.03	F3D141	500	388	335	301	482	425	279	206	331	119	175	111	101	8	13	28	96	60	4	37	45	48	76	11	42	6	...
0.03	F3D142	500	244	258	142	152	253	193	201	81	72	43	51	81	96	6	8	45	24	5	21	4	9	28	20	7	4	...
0.03	F3D143	500	189	152	173	207	308	183	116	92	62	76	70	43	37	2	7	36	26	1	15	9	8	56	17	13	3	...
0.03	F3D144	500	346	243	260	316	469	282	132	36	140	256	96	84	13	7	10	100	38	5	39	39	14	13	17	23	3	...
0.03	F3D145	500	566	445	461	527	665	471	362	127	217	319	131	188	18	8	14	107	81	5	81	32	11	12	18	11	5	...
0.03	F3D146	500	270	204	225	365	397	257	193	66	78	185	80	71	4	39	22	35	47	3	29	18	21	68	24	24	19	...
0.03	F3D147	500	1316	1107	819	1052	1619	1051	545	89	240	498	526	476	135	22	70	277	358	31	117	139	...
0.03	F3D148	500	750	666	520	830	1230	725	499	542	186	419	283	239	17	47	43	248	186	44	106	105	24	87	88	24	...
0.03	F3D149	500	770	701	650	861	879	651	454	542	257	497	290	159	74	39	37	163	155	3	99	133	41	245	32	59	8	...
0.03	F3D150	500	265	207	343	436	281	276	182	110	132	57	161	71	55	19	18	30	58	4	40	19	12	128	29	22	6	...
0.03	F3D2	500	3076	1439	1065	451	148	410	354	1222	572	60	306	143	308	550	337	17	32	378	173	29	148	...
0.03	F3D3	500	831	543	400	187	23	213	448	398	268	175	221	65	84	16	47	22	13	328	89	115	34	1	18	19	35	...
0.03	F3D5	500	278	235	236	148	20	137	142	193	191	67	115	66	45	83	78	29	12	13	40	7	38	2	116	32	47	...
0.03	F3D6	500	902	608	507	381	12	356	501	250	218	48	128	59	392	72	54	0	6	66	34	26	50	5	72	52	57	...
0.03	F3D7	500	576	449	385	294	9	235	517	223	251	13	117	43	106	25	35	2	7	37	15	35	13	2	32	12	10	...
0.03	F3D9	500	435	374	429	189	5	255	610	419	203	28	97	22	166	98	94	0	8	39	33	22	37	3	79	45	37	...

You can see that some OTU’s were entirely eliminated because they do not contain sequences from the groups F3D141, F3D142, F3D143, F3D144, F3D145, F3D146, F3D147, F3D148, F3D149, F3D150, F3D2, F3D3, F3D5, F3D6, F3D7 or F3D9.

design

To use the design option, follow this example:

mothur > remove.groups(shared=final.opti_mcc.shared, design=mouse.time.design, groups=F3D0-F3D1-F3D8)

mouse.time.design looks like:

group	treatment
F3D0   Early
F3D1   Early
F3D141 Late
F3D142 Late
F3D143 Late
F3D144 Late
F3D145 Late
F3D146 Late
F3D147 Late
F3D148 Late
F3D149 Late
F3D150 Late
F3D2   Early
F3D3   Early
F3D5   Early
F3D6   Early
F3D7   Early
F3D8   Early
F3D9   Early

The resulting mouse.time.pick.design file will look like:

group	treatment
F3D141	Late
F3D142	Late
F3D143	Late
F3D144	Late
F3D145	Late
F3D146	Late
F3D147	Late
F3D148	Late
F3D149	Late
F3D150	Late
F3D2	Early
F3D3	Early
F3D5	Early
F3D6	Early
F3D7	Early
F3D9	Early

sets

The sets parameter allows you to specify which of the sets in your designfile you would like to remove. You can separate set names with dashes. If you want to remove all the groups in the Early treatment, try this:

mothur > remove.groups(shared=final.opti_mcc.shared, design=mouse.time.design, sets=Early)

The resulting mouse.time.pick.design file will look like:

group	treatment
F3D141	Late
F3D142	Late
F3D143	Late
F3D144	Late
F3D145	Late
F3D146	Late
F3D147	Late
F3D148	Late
F3D149	Late
F3D150	Late

And the shared file will look like:

label	Group	numOtus	Otu001	Otu002	Otu003	Otu004	Otu005	Otu006	Otu007	Otu008	Otu009	Otu010	Otu011	Otu012	...
0.03	F3D141	428	388	335	301	482	425	279	206	331	119	175	111	101	8	13	28	96	60	4	37	45	48	76	11	42	6	...
0.03	F3D142	428	244	258	142	152	253	193	201	81	72	43	51	81	96	6	8	45	24	5	21	4	9	28	20	7	4	...
0.03	F3D143	428	189	152	173	207	308	183	116	92	62	76	70	43	37	2	7	36	26	1	15	9	8	56	17	13	3	...
0.03	F3D144	428	346	243	260	316	469	282	132	36	140	256	96	84	13	7	10	100	38	5	39	39	14	13	17	23	3	...
0.03	F3D145	428	566	445	461	527	665	471	362	127	217	319	131	188	18	8	14	107	81	5	81	32	11	12	18	11	5	...
0.03	F3D146	428	270	204	225	365	397	257	193	66	78	185	80	71	4	39	22	35	47	3	29	18	21	68	24	24	19	...
0.03	F3D147	428	1316	1107	819	1052	1619	1051	545	89	240	498	526	476	135	22	70	277	358	31	117	139	...
0.03	F3D148	428	750	666	520	830	1230	725	499	542	186	419	283	239	17	47	43	248	186	44	106	105	24	87	88	24	...
0.03	F3D149	428	770	701	650	861	879	651	454	542	257	497	290	159	74	39	37	163	155	3	99	133	41	245	32	59	8	...
0.03	F3D150	428	265	207	343	436	281	276	182	110	132	57	161	71	55	19	18	30	58	4	40	19	12	128	29	22	6	...

phylip

The phylip option is used to specify the name of the phylip formatted distance file you would like to remove distances from. To generate the .dist file, run the following:

mothur > dist.shared(shared=final.opti_mcc.shared, subsample=t)

The final.opti_mcc.thetayc.0.03.lt.ave.dist file contains:

19
F3D0
F3D1	0.474922
F3D141	0.210618	0.536193
F3D142	0.245883	0.522963	0.159277
F3D143	0.178547	0.587579	0.075820	0.111769
F3D144	0.245769	0.614136	0.150278	0.164039	0.081844
F3D145	0.240859	0.592020	0.113041	0.099472	0.069555	0.034270
F3D146	0.193530	0.559788	0.104369	0.182698	0.064180	0.103431	0.097991
F3D147	0.231679	0.593135	0.152776	0.105287	0.081308	0.054359	0.050777	0.122631
F3D148	0.214540	0.587721	0.067956	0.125263	0.048796	0.090400	0.075236	0.100006	0.085041
F3D149	0.213746	0.516693	0.037621	0.148458	0.066224	0.129811	0.105153	0.083807	0.134928	0.068088
F3D150	0.239095	0.565672	0.122289	0.225027	0.124343	0.224982	0.184366	0.123207	0.206566	0.189054	0.101765
F3D2	0.486915	0.388567	0.451548	0.422166	0.528051	0.530900	0.486701	0.565963	0.496491	0.524713	0.469993	0.531830
F3D3	0.508477	0.460610	0.383161	0.325817	0.463416	0.462134	0.389007	0.498920	0.436586	0.444289	0.386265	0.458765	0.168128
F3D5	0.396747	0.287033	0.347348	0.377409	0.434369	0.478551	0.425063	0.422747	0.468263	0.440300	0.337063	0.340654	0.322947	0.299151
F3D6	0.426449	0.406230	0.361172	0.249032	0.409600	0.443616	0.362942	0.431433	0.396574	0.440542	0.357365	0.358107	0.200769	0.156920	0.237299
F3D7	0.485061	0.512503	0.357344	0.257352	0.418828	0.444764	0.338009	0.448287	0.408318	0.429156	0.369801	0.375126	0.271915	0.125416	0.296154	0.093075
F3D8	0.534782	0.447580	0.458594	0.385501	0.517817	0.598668	0.503624	0.523409	0.575409	0.523240	0.451311	0.461517	0.485066	0.320019	0.273138	0.260956	0.217905
F3D9	0.485800	0.378642	0.407103	0.359453	0.477310	0.563293	0.469351	0.485860	0.539667	0.485885	0.404651	0.416494	0.380421	0.266538	0.190510	0.192757	0.191119	0.056494

To remove the distances for the Early treatment, run the following:

mothur > remove.groups(phylip=final.opti_mcc.thetayc.0.03.lt.ave.dist, design=mouse.time.design, sets=Early)

creates a final.opti_mcc.thetayc.0.03.lt.ave.pick.dist file:

10
F3D141
F3D142	0.159277
F3D143	0.07582	0.111769
F3D144	0.150278	0.164039	0.081844
F3D145	0.113041	0.099472	0.069555	0.03427
F3D146	0.104369	0.182698	0.06418	0.103432	0.097991
F3D147	0.152776	0.105287	0.081308	0.054359	0.050777	0.122631
F3D148	0.067956	0.125263	0.048796	0.0904	0.075236	0.100006	0.085041
F3D149	0.037621	0.148458	0.066224	0.129811	0.105153	0.083807	0.134928	0.068088
F3D150	0.122289	0.225027	0.124343	0.224982	0.184366	0.123207	0.206566	0.189054	0.101765

Alternatively, you can select distances between sequences. To generate the distance matrix, run the following:

mothur > dist.seqs(fasta=final.fasta, output=lt)

This will create the final.phylip.dist file. You can remove the distances related to sequences from groups F3D0, F3D1 or F3D8, as follows:

mothur > remove.groups(phylip=final.phylip.dist, count=final.count_table, groups=F3D0-F3D1-F3D8)

column

The column option is used to specify the name of the column formatted distance file you would like to remove distances from.

mothur > dist.shared(shared=final.opti_mcc.shared, subsample=t, output=column)

The final.opti_mcc.thetayc.0.03.column.ave.dist file contains:

F3D1	F3D0	0.474922
F3D141	F3D0	0.210618
F3D141	F3D1	0.536193
F3D142	F3D0	0.245883
F3D142	F3D1	0.522963
F3D142	F3D141	0.159277
F3D143	F3D0	0.178547
F3D143	F3D1	0.587579
F3D143	F3D141	0.075820
F3D143	F3D142	0.111769
F3D144	F3D0	0.245769
F3D144	F3D1	0.614136
F3D144	F3D141	0.150278
F3D144	F3D142	0.164039
F3D144	F3D143	0.081844
F3D145	F3D0	0.240859
F3D145	F3D1	0.592020
...

To select the distances for the Early treatment, run the following:

mothur > remove.groups(column=final.opti_mcc.thetayc.0.03.column.ave.dist, design=mouse.time.design, sets=Early)

creates a final.opti_mcc.thetayc.0.03.column.ave.pick.dist file:

F3D142	F3D141	0.159277
F3D143	F3D141	0.07582
F3D143	F3D142	0.111769
F3D144	F3D141	0.150278
F3D144	F3D142	0.164039
F3D144	F3D143	0.081844
F3D145	F3D141	0.113041
F3D145	F3D142	0.099472
F3D145	F3D143	0.069555
F3D145	F3D144	0.03427
F3D146	F3D141	0.104369
F3D146	F3D142	0.182698
F3D146	F3D143	0.06418
F3D146	F3D144	0.103431
F3D146	F3D145	0.097991
F3D147	F3D141	0.152776
F3D147	F3D142	0.105287
F3D147	F3D143	0.081308
F3D147	F3D144	0.054359
...

Alternatively, you can remove distances between sequences. To generate the distance matrix, run the following:

mothur > dist.seqs(fasta=final.fasta, cutoff=0.03)

This will create the final.dist file. You can remove the distances related to sequences from groups F3D0, F3D1 or F3D8, as follows:

mothur > remove.groups(column=final.dist, count=final.count_table, groups=F3D0-F3D1-F3D8)

You may see a warning similar to:

Removed 15096 sequences from your count file.
[WARNING]: Your accnos file contains 292 groups or sequences, but I only found 276 of them in the column file.
Removed 276 groups or sequences from your column file.

This warning can be disregarded. When you use a cutoff with the dist.seqs command, sequences with distance above the cutoff are NOT added to the distance file. This means you can have sequences in the count file, that are not in the distance file. Mothur assumes reads missing from the distance file but present in the count file only have distances above the cutoff.

The name option allows you to provide a name file associated with your group file.

We DO NOT recommend using the name file. Instead we recommend using a count file. The count file reduces the time and resources needed to process commands. It is a smaller file and can contain group information.

The group parameter allows you to provide a group file to use when selecting items from your files.

We DO NOT recommend using the name / group file combination. Instead we recommend using a count file. The count file reduces the time and resources needed to process commands. It is a smaller file and can contain group information.

Revisions

  • 1.23.0 File mismatch bug - https://forum.mothur.org/viewtopic.php?f=4&t=1396&p=3560#p3560
  • 1.24.0 Added design option
  • 1.28.0 Added count option
  • 1.31.0 Bug Fix: remove.groups(groups=notValidGroupName, ...) mothur removes all of the groups. Fix will ignore invalid group and continue.
  • 1.36.0 Bug Fix: not creating a list file for each label.
  • 1.37.0 Adds phylip and column options #79
  • 1.40.0 - Speed and memory improvements for shared files. #357 , #347
  • 1.40.0 - Allow for () characters in taxonomy definitions. #350
  • 1.44.0 - Adds sets parameter. #277
  • 1.47.0 - Fixes bug with remove.groups phylip file option.