The mothur AMI

July 12, 2016 • PD Schloss

We get asked a lot of questions by mothur users. Perhaps the one I hate the most is, “What type of computer should I get?” I hate this question because I don’t want to spend other people’s money and because I honestly don’t have the answer. I used to encourage people to get the biggest, baddest computer they could afford. I’ve followed this advice myself.

Over the years, we have spent upwards of $50,000 on a high performance computer cluster with a ton of processors, RAM, and storage. Then the System Administrator told us that we were really only using 10% of the cluster’s capacity. In other words, we were effectively spending $50,000 to get $5,000 worth of service.

I’ve come to realize that you can do amazing and very affordable bioinformatics on a pretty crappy computer. Just to make the point clear, I’ve run mothur using my iPhone. The caveat, of course, is that you need to be able to log into a remote high performance computer cluster. Many institutions have high performance computing clusters (HPCCs) that they make available very cheaply to their constituents. Not everyone is so fortunate. For this latter group of researchers, there is Amazon Web Services (AWS). Although this tends to be a bit more expensive than institutional HPCCs, it is a very powerful and well-supported option.

I’m curious what people think of this AMI (Amazon Machine Image). I hope to achieve a few goals with it. First, we want to provide an easier on-ramp for analyzing large datasets for people who don’t have access to large amounts of computing power. Part of this involves putting mothur into the path, preloading the AMI with various references, and throwing in RStudio so people can work with their data where it lives in the cloud. Second, we want to be a bit opinionated about how people set up their data analyses by doing things like separating reference, raw, and processed data and keeping data separate from the code. These steps are generally considered good data hygiene habits. Third, researchers who modify the AMI to do their own analyses can create a derivative AMI and make it publicly accessible to other researchers. The result could be clearer documentation and more reproducible analyses.
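To make the data hygiene point concrete, here is a minimal sketch of what such a separation could look like. The directory names (`data/references`, `data/raw`, `data/process`, `code`) and the `make_project` helper are assumptions chosen for illustration, not necessarily the layout the AMI ships with.

```python
from pathlib import Path

# Hypothetical layout: these names are illustrative assumptions,
# not the actual directory structure of the mothur AMI.
LAYOUT = (
    "data/references",  # reference alignments and taxonomies
    "data/raw",         # untouched sequence files
    "data/process",     # intermediate and processed output
    "code",             # scripts, kept apart from the data
)


def make_project(root):
    """Create the skeleton under `root` and return the directories."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        directory = root / relative
        # exist_ok lets the function be re-run safely on an existing project
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    for directory in make_project("my_analysis"):
        print(directory)
```

Keeping raw data in its own directory makes it easy to treat those files as read-only, so everything under the processed directory can be regenerated from the raw data and the code.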