Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud.

Cianfrocco MA, Leschziner AE - Elife (2015)

Bottom Line: The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.


Affiliation: Department of Molecular and Cellular Biology, Harvard University, Cambridge, United States.

ABSTRACT
The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available 'off-the-shelf' computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16-480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.



Figure 4. Relion performance on STARcluster configurations of Amazon instances.
(A) Processing times (minutes) for Relion to perform 3D classification or 3D refinement on the 80S ribosome dataset.
(B) Speedup for each cluster size relative to a single CPU (black line) shown alongside performance estimate for a perfectly parallel cluster using Amdahl's Law (curve labeled ‘Theoretical limit’). For cluster sizes ≤ 64 CPUs, Relion exhibits near-perfect performance on STARcluster configurations, while cluster sizes > 64 show that Relion's performance reaches a maximum at 256 CPUs for both 3D classification and 3D refinement.
(C) Speedup/Cost is plotted against cluster size, where Speedup/Cost is defined as the speedup observed divided by the cost associated with Amazon's pricing at $0.35/hr/16 CPUs.
(D) Average STARcluster boot up time (± s.d.) was measured for clusters of increasing size (n = 5).
Source data: Figure 4—source data 1.
DOI: http://dx.doi.org/10.7554/eLife.06664.008
Figure 4—source data 1. Performance analysis statistics for Relion 3D classification and 3D refinement on STARcluster configurations.
DOI: http://dx.doi.org/10.7554/eLife.06664.009
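
Panel B compares the measured speedup with an estimate from Amdahl's Law. For reference, the general form of the law is sketched below. The symbols S (speedup), N (number of CPUs), and p (fraction of the work that can run in parallel) are notation introduced here; the caption gives no value for p. In the perfectly parallel case the caption describes, p = 1 and the estimate reduces to S(N) = N.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad S(N)\big|_{p = 1} = N
```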

Mentions: To further test the performance of Amazon instances, we carried out 3D classification and refinement on a variety of STARcluster configurations using Relion. As before, we ran our tests on clusters of r3.8xlarge high-memory instances (256 GiB RAM and 16 CPUs per instance). Comparing performance across cluster sizes showed that 256 CPUs had the fastest overall time and the highest speedup relative to a single CPU for both 3D classification and refinement (Figure 4A,B). However, cluster sizes of 128 and 64 CPUs were the most cost effective for 3D classification and refinement, respectively, as these were the cluster configurations where the speedup per dollar reached a maximum (Figure 4C). Importantly, the average time required to boot up these STARclusters was ≤ 10 min for all cluster sizes (Figure 4D) and, once booted up, the clusters do not have any associated job wait times. Therefore, these tests showed that Amazon's EC2 infrastructure was amenable to the analysis of single particle cryo-EM data using Relion over a range of STARcluster sizes.
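
The speedup-per-dollar comparison can be illustrated with a short Python sketch. It is not the authors' analysis code. The price ($0.35 per hour per 16 CPUs) and the 16-CPU instance size come from the text above; the function names, the cost formula (instances × hourly price × run time in hours, with no rounding to whole billed hours), and any timings passed in are assumptions made for illustration.

```python
from math import ceil

# Values taken from the text: $0.35/hr per 16-CPU r3.8xlarge instance.
PRICE_PER_INSTANCE_HOUR = 0.35
CPUS_PER_INSTANCE = 16


def amdahl_speedup(n_cpus, parallel_fraction):
    """Amdahl's Law estimate; parallel_fraction=1.0 is the perfectly parallel case."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


def run_cost(n_cpus, minutes):
    """Assumed cost model: whole instances, billed pro rata by run time."""
    instances = ceil(n_cpus / CPUS_PER_INSTANCE)
    return instances * PRICE_PER_INSTANCE_HOUR * minutes / 60.0


def speedup_per_dollar(single_cpu_minutes, n_cpus, minutes):
    """Observed speedup relative to one CPU, divided by the cost of the run."""
    speedup = single_cpu_minutes / minutes
    return speedup / run_cost(n_cpus, minutes)


if __name__ == "__main__":
    # Hypothetical timings (minutes), NOT the values from Figure 4.
    single_cpu_minutes = 6400.0
    hypothetical_runs = {64: 110.0, 128: 62.0, 256: 45.0}
    for n_cpus, minutes in hypothetical_runs.items():
        ratio = speedup_per_dollar(single_cpu_minutes, n_cpus, minutes)
        print(f"{n_cpus} CPUs: speedup/cost = {ratio:.1f} per USD")
```

Under this cost model, doubling the cluster size pays off per dollar only if it more than halves the run time, which is one way to read why the fastest cluster (256 CPUs) was not the most cost effective.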

