The phenomenal growth of digital data produced by all kinds of systems and devices has been well documented. Studies indicate that until the year 2000, the amount of digital data in the world was dwarfed by the amount of analog data. Today, virtually all data being generated, transmitted, and consumed is in a digital form.
Commercial computers evolved when a large fraction of data in the digital universe was of the transactional kind. Transactional data continues to grow today, but it is a much smaller part of a digital universe dominated by consumer images, surveillance footage, entertainment feeds, social media, and sensor data. Yet, most systems today, including personal computers, use fundamentally the same type of processors, the same storage hierarchies, and the same computing paradigms pioneered by the transactional systems.
Systems built to support the processing of transactions, for example, financial transactions, have special requirements. Much of the complexity in processors for these systems is incurred in providing high reliability, high precision, and a well-defined ordering of operations. These properties help in reasoning about and debugging a program, but they impose a cost even on those operations and phases of execution where they could have been relaxed.
What would systems look like if we had to deal only with this new body of nontransactional data? Typical applications that mine and process this data can often tolerate lower precision, imprecise ordering, and even some unreliability in the operation of the system. For such applications, an occasionally stale or approximately correct piece of information delivered promptly is often more useful than up-to-date and precise information delivered later or at greater cost.
It may be argued that approximate computing makes it difficult to reason logically about the results produced by a program. However, such reasoning is often difficult even in traditional computing: real numbers cannot always be represented precisely, the order in which multiple processors access shared variables is often unpredictable and may lead to nondeterministic results, and it is virtually impossible to eliminate all potential sources of error, hardware or software, from any system. Unlike traditional computing, approximate computing accepts that errors will occur and transfers the responsibility for tolerating them to the runtime, the compiler, or even the application itself.
The following paper demonstrates the significant cost, power, and latency advantages that approximate computing can deliver. The programming model in the paper allows the programmer to mark a region of a program as approximable. The compiler then produces two versions of the approximable code: one that runs on the host, and another that runs on a separate accelerator that potentially performs the computation faster and with lower energy consumption, but with results that may deviate from the exact results the host would have produced.
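The paper's transformation is performed by a compiler on annotated source code; purely as an illustration of the two-version idea, the following Python sketch uses a hypothetical approximable decorator (not the paper's actual interface) that keeps the exact implementation and, once an approximate surrogate is installed, routes calls to it instead. The example region is a Sobel-style edge-detection kernel, the kind of small, error-tolerant computation the paper targets.

```python
# Minimal sketch of the two-version programming model, assuming a hypothetical
# @approximable marker; the real system does this at compile time, not at run time.

def approximable(func):
    """Mark a function as a candidate for approximate acceleration."""
    func._surrogate = None  # to be filled in later, e.g., with a trained model

    def wrapper(*args):
        if func._surrogate is not None:
            return func._surrogate(*args)   # approximate, cheaper path
        return func(*args)                  # exact host path

    wrapper.install_surrogate = lambda model: setattr(func, "_surrogate", model)
    wrapper.exact = func
    return wrapper


@approximable
def sobel(p00, p01, p02, p10, p12, p20, p21, p22):
    # Exact 3x3 Sobel edge-strength kernel (center pixel unused);
    # pixel values are assumed to lie in [0, 1].
    gx = (p02 + 2 * p12 + p22) - (p00 + 2 * p10 + p20)
    gy = (p20 + 2 * p21 + p22) - (p00 + 2 * p01 + p02)
    return min((gx * gx + gy * gy) ** 0.5, 1.0)
```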
The authors postulate that in many approximable programs the solution space is such that knowledge of the program's past behavior on a range of inputs is a good predictor of its behavior on a new input. As the accelerator for the approximable region, they use a learning engine, a digital neural network, to deliver an approximately correct result. The host version of the approximable section is executed on a set of typical inputs or a set of random inputs, and the outputs of these executions are used to train the neural network accelerator. On a variety of programs that tolerate approximation, the authors show impressive performance gains and power reductions for a tolerable loss of quality. It is also impressive that these gains did not require different accelerator implementations for different programs; they were obtained with the same approximate accelerator handling all types of approximable code regions across all programs.
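The paper's accelerator is a hardware neural processing unit; as a software-only illustration that continues the sketch above, the code below records input/output pairs from the exact sobel code, trains a small multilayer perceptron on them with plain gradient descent (using NumPy), and installs the result as the approximate version. The 8-8-1 topology, learning rate, iteration count, and training-set size are arbitrary choices for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Record training data by running the exact version on random inputs.
X = rng.random((5000, 8))
y = np.array([sobel.exact(*row) for row in X]).reshape(-1, 1)

# 2. Train a tiny 8-8-1 network (tanh hidden layer, linear output, MSE loss).
W1 = 0.5 * rng.standard_normal((8, 8)); b1 = np.zeros(8)
W2 = 0.5 * rng.standard_normal((8, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    pred = h @ W2 + b2                  # network output
    err = pred - y                      # d(0.5 * MSE) / d(pred)
    dW2 = h.T @ err / len(X); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)    # backpropagate through tanh
    dW1 = X.T @ dh / len(X); db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# 3. Install the trained network as the accelerator for the approximable region.
sobel.install_surrogate(
    lambda *p: (np.tanh(np.asarray(p) @ W1 + b1) @ W2 + b2).item())

print("mean |error| on training inputs:",
      float(np.abs(np.tanh(X @ W1 + b1) @ W2 + b2 - y).mean()))
```

After the final step, calls to sobel(...) go through the trained network rather than the exact kernel, mirroring the paper's workflow of training offline and invoking the neural accelerator at run time, with the reported error standing in for the paper's notion of quality loss.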
Beyond what is described in the paper, there are further energy savings to be harnessed by operating such approximate computing engines at considerably lower voltages. The real engine of growth in the industry, Dennard scaling, has allowed transistors to shrink in size while simultaneously consuming less power and improving performance. But Dennard scaling is ending. Device sizes have shrunk to near-atomic dimensions, and voltages cannot be scaled down much further without compromising the reliability of devices. New technologies promise good power-performance, but their advantage is offset by the redundancy needed to compensate for their unreliability. Thus, technologies of the future are unlikely to provide much better power-performance characteristics for traditional transactional systems. However, for systems that produce approximate results, such as the one described by the authors, the redundancy needed is likely to be different and considerably simpler, making low-voltage operation practical for these types of systems.
Approximate computing appears to be our best hope to put massive amounts of computation to work on the massive amounts of data that will be produced in the world using future scaled CMOS and post-silicon technologies. The following paper is a good example of the type of work that needs to be done to fulfill this promise.