Scalability is the capability of a parallel program to speed up its execution as we provide it with more CPUs. Back in 1967, Gene Amdahl noticed that the sequential part of a parallel program had a disproportionate influence on scalability.1 Suppose that some program takes 100 s to run on a sequential processor. Now, let's run it on a parallel computer. If we are able to parallelize, say, 80% of the code, then with enough CPUs that 80% would take essentially zero time. However, the remaining sequential portion will not run any faster; this means the parallel program will always take at least 20 s to run, a maximum speedup of only 5X. If we are able to parallelize 95% of the code, speedup is still limited to 20X, even with an infinite number of CPUs! This back-of-the-envelope calculation, known as Amdahl's Law, does not take into account other factors, such as increased memory size, but remains an important guideline.
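In symbols (a standard way of stating the law; the notation is mine, not Amdahl's): if a fraction p of the work parallelizes perfectly over n CPUs, then

\[
  \mathrm{speedup}(n) \;=\; \frac{1}{(1-p) + p/n},
  \qquad
  \lim_{n \to \infty} \mathrm{speedup}(n) \;=\; \frac{1}{1-p}.
\]

With p = 0.8 the limit is 1/0.2 = 5X, and with p = 0.95 it is 1/0.05 = 20X, matching the 100 s example above.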
In 1967, parallelism was a niche topic, but not any more. Improving program performance on today's clusters, clouds, and multicore computers requires the developer to pay serious attention to scalability. The inherent scalability of an interface is the focus of the following paper.
When one thread updates a shared datum and another thread wants to read or write the most recent version of that datum, the two accesses conflict: the threads must synchronize, and that synchronization constitutes a sequential bottleneck. As the authors explain, this is a general result that does not depend on any particular implementation, and it holds even with efficient hardware support for cache coherence.
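As a concrete, if simplified, illustration of such a conflict, consider a hypothetical event counter that every CPU increments. The sketch below is mine, not the authors' (the names and the 64-CPU, 64-byte-cache-line layout are assumptions); it contrasts the conflicting counter with a per-CPU variant whose updates do not conflict.

    /* Conflicting update: a single counter shared by all CPUs.  Every
       increment writes the same cache line, so the hardware must pass
       ownership of that line from core to core and the updates are
       effectively serialized, however cheap each atomic add is. */
    #include <stdatomic.h>

    static atomic_long shared_counter;            /* one hot cache line */

    void count_event_shared(void) {
        atomic_fetch_add(&shared_counter, 1);     /* every core contends here */
    }

    /* Non-conflicting alternative: one counter per CPU, padded to its own
       (assumed 64-byte) cache line, summed only when the value is read. */
    #define NCPU 64                               /* assumed core count */
    static struct { atomic_long n; char pad[56]; } percpu[NCPU];

    void count_event_percpu(int cpu) {
        atomic_fetch_add(&percpu[cpu].n, 1);      /* no sharing between cores */
    }

    long read_count(void) {
        long total = 0;
        for (int i = 0; i < NCPU; i++)
            total += atomic_load(&percpu[i].n);
        return total;
    }

The per-CPU variant scales because its updates no longer conflict; the price is a more expensive (and, one hopes, rarer) read.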
Here comes the paper's main insight: if two concurrent procedure calls commute with each other (that is, executing them in either order yields equivalent results), then neither call depends on the result of the other. There is therefore no inherent reason for the calls to conflict, and hence it is possible to implement them in a way that scales well.
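A classic example of this kind of reasoning is the POSIX rule that open() must return the lowest-numbered unused file descriptor. The sketch below is mine, not taken from the paper, and only illustrates the argument:

    /* Two threads of the same process call open() concurrently.  Because
       POSIX requires the lowest unused descriptor, the smaller fd goes to
       whichever call happened first: the results reveal the order, so the
       calls do not commute, and any implementation must coordinate on a
       shared "lowest free fd".  An interface that promised only SOME
       unused descriptor would let the calls commute, and so would admit a
       conflict-free, scalable implementation (e.g., per-CPU fd pools). */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *path) {
        int fd = open((const char *)path, O_RDONLY);
        printf("%s -> fd %d\n", (const char *)path, fd);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "/etc/hostname");
        pthread_create(&t2, NULL, worker, "/etc/passwd");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }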
The advantages of commutativity in software have been known for a long time; see the paper for relevant references. It is only recently, however, that focus has shifted from simply leveraging existing commutativity toward designing software to achieve commutativity.2,3 The paper goes well beyond previous work. First, instead of simple abstract data types, it considers the more complex case of software with an intricate interface and a massive amount of shared state: a whole operating system (OS). Second, instead of just a black-and-white "commute/don't-commute" characterization, it considers calls that may commute in some states and not in others. This is especially important when commuting is the common case, as it is for many OS calls. Finally, it leverages static program verification techniques, providing a tool that will prove whether and when a given interface is commutative, and will generate test cases exercising the scalability of its implementation.
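For a flavor of state-dependent commutativity, consider two unlink() calls on the same pathname; this is my own illustration, not necessarily one of the paper's examples.

    /* State-dependent commutativity, sketched.  Whether the two unlink()
       calls commute depends on whether the file exists. */
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>

    static void run_pair(const char *path) {
        for (int i = 0; i < 2; i++) {
            int r = unlink(path);
            printf("unlink(%s) -> %d (errno %d)\n", path, r, r < 0 ? errno : 0);
        }
    }

    int main(void) {
        /* State 1: the file does not exist.  Both calls fail with ENOENT
           no matter which runs first, so the calls commute in this state
           and can be served without any coordination. */
        run_pair("no-such-file");

        /* State 2: the file exists.  Exactly one call succeeds, and which
           one depends on the order, so the calls do not commute here. */
        FILE *f = fopen("victim", "w");
        if (f) fclose(f);
        run_pair("victim");
        return 0;
    }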
The authors built a whole OS based on these ideas: it is similar to Linux, but its APIs are designed for commutativity. The implementation is mostly scalable, but not always: even when a scalable implementation of an API exists in theory, it will not necessarily be the most obvious or even the most efficient one; sometimes a scalable implementation is simply not worthwhile. The authors also learned that many advanced data structures do not scale well; for instance, rebalancing a tree might modify a portion of the tree that is semantically unrelated to the update that triggered the rebalancing.
The authors present a simple and powerful idea. It is not just about OSs, but applies to any piece of parallel software, whether running on a multicore computer or in the cloud. Commutativity enables us to reason about scalability in a principled way, independently of a particular implementation, benchmark, or workload. We can now design our APIs to be scalable by ensuring calls commute in the common case, and we can use verification tools to automate and exercise this reasoning: an unexpected connection between high-school mathematics and hardcore computer science.
1. Amdahl, G.M. Validity of the single-processor approach to achieving large scale computing capabilities. In Proceedings of the AFIPS Conference, 30 (Atlantic City, NJ, Apr. 1967), AFIPS Press.
2. Shapiro, M., Preguica, N., Baquero, C., and Zawirski, M. Conflict-free replicated data types. In Proceedings of the Int. Symp. on Stabilization, Safety, and Security of Distributed Systems (Grenoble, France, Oct. 2011). Lecture Notes in Computer Science 6976, X. Défago, F. Petit, and V. Villain, Eds. Springer-Verlag, 386–400; doi: 10.1007/978-3-642-24550-3_29; http://www.springerlink.com/content/3rg39l2287330370/.
3. Shapiro, M., Preguica, N., Baquero, C., and Zawirski, M. Convergent and commutative replicated data types. Bulletin of the European Association for Theoretical Computer Science 104 (June 2011), 67–88; http://www.eatcs.org/images/bulletin/beatcs104.pdf.