Communications of the ACM

ACM TechNews

Google Researchers Reveal Lessons Learned in Large-Scale Cloud Storage

By HPC in the Cloud
March 3, 2011

Much is already known about how storage systems fail, but new research on Google's main storage infrastructure sheds more light on the overall availability of cloud-based storage services.

"Highly-available cloud storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of commodity servers and disk drives," say Google researchers. As a result, "sophisticated management, load balancing, and recovery techniques are needed to achieve high performance and availability amidst an abundance of failure sources that include hardware, software, network connectivity, and power issues."

The Google researchers developed a series of statistical models for different design choices, including variable replication and data placement, and used them to examine availability under system parameters observed in Google's fleet. The researchers conclude that transitory node failures account for most unavailability.
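
To give a rough sense of how replication enters an availability model, here is a minimal sketch. It is not the researchers' model: it assumes node failures are independent and that a piece of data is unavailable only when every replica's node is down at the same time. The function name `unavailability` and the 1% node downtime figure are illustrative assumptions, not values from the study.

```python
def unavailability(p_node_down: float, replicas: int) -> float:
    """Probability that all replicas are down at once.

    Toy model (assumption): each replica lives on a different node and
    nodes fail independently with probability p_node_down.
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_down ** replicas


if __name__ == "__main__":
    # Hypothetical 1% chance that any given node is down.
    for r in (1, 2, 3):
        print(f"replicas={r}: unavailability={unavailability(0.01, r):.2e}")
```

Under this simplified assumption, each extra replica multiplies the unavailability by the per-node downtime probability. Real fleets may not behave this way if failures are correlated, which is one reason models built on observed failure patterns are useful.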

From HPC in the Cloud