Pacific Northwest National Laboratory and Ohio State University researchers have developed a systematic framework that uses recursive broadcast, rotation, and reduction (RRR) to derive communication-efficient algorithms for distributed contraction of arbitrary dimensional tensors on the IBM Blue Gene/Q Mira supercomputer.
The framework automatically models potential space-performance tradeoffs to optimize the communication costs incurred in carrying out tensor contractions on supercomputers.
The researchers described distributed tensor contraction algorithms on torus networks, defining tensor indices, the iteration space, and their mapping onto processors. By laying out the iteration space explicitly, they could determine precisely where each computation of a tensor contraction occurs and which data must be resident on each processor.
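To make the idea concrete, here is a minimal sketch (the grid shape and cyclic mapping are illustrative assumptions, not the paper's actual scheme) of how enumerating the iteration space of a contraction C[i,j] = Σ_k A[i,k]·B[k,j] reveals which input elements each processor must hold:

```python
import numpy as np

# Hypothetical sketch: the 3-D iteration space of the contraction
# C[i, j] = sum_k A[i, k] * B[k, j] is the set of points (i, j, k).
# Assigning each point to a processor determines where that multiply
# happens and which tensor elements the processor must hold.

I, J, K = 4, 4, 4          # iteration-space extents (toy sizes)
P = (2, 2)                 # an assumed 2x2 processor grid over (i, j)

needed_A = {p: set() for p in np.ndindex(*P)}  # A elements per proc
needed_B = {p: set() for p in np.ndindex(*P)}  # B elements per proc

for i, j, k in np.ndindex(I, J, K):
    proc = (i % P[0], j % P[1])   # cyclic mapping of the (i, j) plane
    needed_A[proc].add((i, k))    # this proc needs A[i, k] ...
    needed_B[proc].add((k, j))    # ... and B[k, j] for its multiplies

# Each processor touches (I/P0)*K elements of A and K*(J/P1) of B,
# which quantifies the data that communication must deliver to it.
print(len(needed_A[(0, 0)]), len(needed_B[(0, 0)]))
```

The sizes of these per-processor sets are exactly the quantities the framework reasons about when it models space-performance tradeoffs.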
For each iteration space mapping, the RRR framework identified the fundamental data movement direction required by a distributed algorithm. This process allows the framework to compute compatible input and output tensor distributions and systematically generate a contraction algorithm for them using communication operators.
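The pattern of composing a contraction from communication operators can be illustrated with a SUMMA-style schedule, simulated here in plain NumPy (an assumed stand-in for the framework's actual operators; in a real distributed run the inner loops would be MPI row/column broadcasts and local accumulation):

```python
import numpy as np

# Minimal sketch of building a contraction C = A @ B from broadcast
# and reduction operators on a p x p block grid: at step t, A's block
# column t is broadcast along grid rows and B's block row t along grid
# columns; each processor reduces the partial products locally.

p, b = 2, 3                          # 2x2 grid, 3x3 blocks (toy sizes)
A = np.arange(36.).reshape(6, 6)
B = np.ones((6, 6))

def blk(M, r, c):
    """Return the (r, c) block of a blocked matrix M."""
    return M[r*b:(r+1)*b, c*b:(c+1)*b]

C = np.zeros((6, 6))
for t in range(p):                   # one step per shared block index t
    for r in range(p):
        for c in range(p):
            # processor (r, c) receives A[r, t] via a row broadcast and
            # B[t, c] via a column broadcast, then accumulates (reduce)
            C[r*b:(r+1)*b, c*b:(c+1)*b] += blk(A, r, t) @ blk(B, t, c)

assert np.allclose(C, A @ B)         # matches the serial contraction
```

Rotation-based variants (as in Cannon's algorithm) replace the broadcasts with cyclic shifts of the blocks; the RRR framework's contribution is choosing among such operator compositions systematically for arbitrary-dimensional tensors.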
During testing, the researchers demonstrated that the framework scales up to 16,384 nodes on Blue Gene/Q supercomputers. Furthermore, they showed that the framework can improve communication optimality and even outperform the Cyclops Tensor Framework, widely regarded as the state of the art.
From Pacific Northwest National Laboratory
Abstracts Copyright © 2014 Information Inc., Bethesda, Maryland, USA