Low Cost, Scalable Proteomics Data Analysis Using Amazon’s Cloud Computing Services and Open Source Search Algorithms

One of the major difficulties for many laboratories setting up
proteomics programs has been obtaining and maintaining the
computational infrastructure required for the analysis of the large
flow of proteomics data. We describe a system that combines distributed
cloud computing and open source software to allow laboratories to set
up scalable virtual proteomics analysis clusters without investing in
computational hardware or paying software licensing fees. Additionally, the
pricing structure of distributed computing providers, such as Amazon
Web Services, allows laboratories or even individuals to have
large-scale computational resources at their disposal at a very low
cost per run. We provide detailed step-by-step instructions on how to
implement the virtual proteomics analysis clusters, as well as a list of
currently available preconfigured Amazon machine images containing the
OMSSA and X!Tandem search algorithms and sequence databases, on the
Medical College of Wisconsin Proteomics Center Web site (http://proteomics.mcw.edu/vipdac).