"You may fire when you are ready, Gridley," is the famous command from Commodore Dewey in the Battle of Manila Bay, 1898. He may not have realized it, but he was articulating the basic principle of dataflow computing, where an instruction can be executed as soon as its inputs are available. Dataflow has long fascinated computer architects as perhaps a more "natural" way for computation circuits to best exploit parallelism for performance.
A visiting alien may be forgiven for experiencing whiplash when shown how we treat parallelism in programs. Mathematical algorithms have abundant parallelism; the only limit is data dependency (an operator can be evaluated when its inputs are available). We then code these algorithms in a mainstream programming language (C/C++, Python, among others), whose semantics are completely sequential (zero parallelism) in order to make sense of reads and writes to memory. As illustrated in Figure 1, compilers sweat mightily to rediscover some of the lost parallelism in their internal CDFGs (control and data flow graphs), and then produce machine code that, again, is completely sequential. When we execute this code on a modern von Neumann CPU, wide-issue, out-of-order circuits once again sweat mightily (burning power) to rediscover parallelism.
Figure 1. Parallelism during coding, compilation, and execution.
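As a rough illustration of the claim that data dependency is the only limit, the sketch below takes a small expression and computes the earliest step at which each operation could run if every operation started as soon as its inputs existed. The expression, the names `deps` and `asap_levels`, and the one-step-per-operation cost are all assumptions made for this example, not anything taken from the paper.

```python
# Hypothetical expression: r = (a + b) * (c + d) - (e * f)
# Each entry maps an operation's result to the values it consumes.
deps = {
    "t1": ["a", "b"],    # t1 = a + b
    "t2": ["c", "d"],    # t2 = c + d
    "t3": ["e", "f"],    # t3 = e * f
    "t4": ["t1", "t2"],  # t4 = t1 * t2
    "r":  ["t4", "t3"],  # r  = t4 - t3
}

def asap_levels(deps):
    """Return the earliest step at which each operation could run,
    assuming (for illustration) that every operation takes one step."""
    level = {}

    def visit(name):
        if name not in deps:
            return 0  # a program input, available before step 1
        if name not in level:
            level[name] = 1 + max(visit(d) for d in deps[name])
        return level[name]

    for op in deps:
        visit(op)
    return level

levels = asap_levels(deps)
sequential_steps = len(deps)          # one operation after another
parallel_steps = max(levels.values()) # limited only by data dependency

print(levels)
print(sequential_steps, parallel_steps)
```

Under these assumptions the three independent operations `t1`, `t2`, and `t3` all land in the first step, so the five operations need three steps rather than five. Sequential code for the same expression would hide that fact behind an arbitrary statement order.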
The 1970s through early 1990s saw several attempts to avoid these "unnecessary" sequentializations (green circles in Figure 2). Dataflow languages (mostly purely functional) and machine code (dataflow graphs) retained parallelism from the math. Instead of a program counter, each instruction directly named its successor(s) receiving its outputs. Dataflow CPUs directly executed this graph machine code. Nowadays this computation model goes by the acronym EDGE, for explicit dataflow graph execution.
Figure 2. Alternative strategies for exploiting parallelism.
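The execution model described above, in which there is no program counter and each instruction names the successors that receive its output, can be sketched with a toy interpreter. This is a minimal illustration under assumed conventions, not the design of any real dataflow machine: the graph encoding, the `run` function, and the token format `(destination, port, value)` are all hypothetical.

```python
import operator
from collections import deque

# Hypothetical dataflow graph for r = (a + b) * (c + d).
# Each instruction lists the (successor, input port) pairs that receive
# its result; there is no program counter and no instruction ordering.
graph = {
    "add1": {"op": operator.add, "arity": 2, "dests": [("mul", 0)]},
    "add2": {"op": operator.add, "arity": 2, "dests": [("mul", 1)]},
    "mul":  {"op": operator.mul, "arity": 2, "dests": [("out", 0)]},
}

def run(graph, initial_tokens):
    """Deliver tokens one at a time; fire an instruction as soon as
    all of its input ports hold a value."""
    operands = {name: {} for name in graph}
    tokens = deque(initial_tokens)  # each token: (destination, port, value)
    outputs = {}
    fired = []  # records the order in which instructions fired
    while tokens:
        node, port, value = tokens.popleft()
        if node not in graph:
            # Destination is not an instruction, so treat it as a program output.
            outputs[(node, port)] = value
            continue
        operands[node][port] = value
        spec = graph[node]
        if len(operands[node]) == spec["arity"]:
            # All inputs are present: the instruction may fire.
            args = [operands[node][p] for p in range(spec["arity"])]
            operands[node] = {}
            result = spec["op"](*args)
            fired.append(node)
            for dest_node, dest_port in spec["dests"]:
                tokens.append((dest_node, dest_port, result))
    return outputs, fired

# Inputs a=1, b=2, c=3, d=4 arrive in a deliberately scrambled order.
initial = [("add2", 0, 3), ("add1", 0, 1), ("add2", 1, 4), ("add1", 1, 2)]
outputs, fired = run(graph, initial)
print(outputs, fired)
```

In this run `add2` fires before `add1` simply because its operands happen to arrive first, and the result is the same either way. The firing order falls out of data availability rather than the order in which the instructions were written down.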
So, why aren't we all using EDGE machines today? A short answer is that they never quite mastered spatial or temporal locality and were subpar on inherently sequential code regions. In contrast, modern von Neumann CPUs excel at this, managing efficient flow of data between circuits that are fast-and-expensive (registers, wires), medium (caches), and slow-and-cheap (DRAMs).
The following paper by Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam describes an innovative approach to exploit both models. From the CDFG, their compiler generates both traditional sequential machine code and a data graph, each being executed on appropriate circuits (blue squares in Figure 2), with efficient hand-off mechanisms. The authors describe extensive studies to validate the viability of this approach for existing codes.
EDGE computing is undergoing a renaissance, with many researchers pursuing related ideas. There are indications that big industry players are also contemplating this direction (see note a).
a. Morgan, T.P. Intel's Exascale dataflow engine drops x86 and von Neumann. The NEXT Platform, Aug 30, 2018.
To view the accompanying paper, visit doi.acm.org/10.1145/3323923
The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.