Pacific Northwest National Laboratory and Ohio State University researchers have developed a systematic framework that uses recursive broadcast, rotation, and reduction (RRR) to derive communication-efficient algorithms for distributed contraction of arbitrary dimensional tensors on the IBM Blue Gene/Q Mira supercomputer.
The framework automatically models potential space-performance tradeoffs to optimize the communication costs incurred in carrying out tensor contractions on supercomputers.
The researchers described distributed tensor contraction algorithms on torus networks, defining tensor indices, the iteration space, and their mapping onto processors. By laying out the iteration space explicitly, they could determine precisely where each computation of a tensor contraction occurs and which data must be resident on each processor.
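To make the idea concrete, here is a minimal sketch (the grid shape and cyclic mapping are illustrative assumptions, not the paper's actual scheme) of how enumerating the iteration space of a contraction C[i,j] = Σ_k A[i,k]·B[k,j] reveals which input elements each processor must hold:

```python
import numpy as np

# Hypothetical sketch: the 3-D iteration space of the contraction
# C[i, j] = sum_k A[i, k] * B[k, j] is the set of points (i, j, k).
# Assigning each point to a processor determines where that multiply
# happens and which tensor elements the processor must hold.

I, J, K = 4, 4, 4          # iteration-space extents (toy sizes)
P = (2, 2)                 # an assumed 2x2 processor grid over (i, j)

needed_A = {p: set() for p in np.ndindex(*P)}  # A elements per proc
needed_B = {p: set() for p in np.ndindex(*P)}  # B elements per proc

for i, j, k in np.ndindex(I, J, K):
    proc = (i % P[0], j % P[1])   # cyclic mapping of the (i, j) plane
    needed_A[proc].add((i, k))    # this proc needs A[i, k] ...
    needed_B[proc].add((k, j))    # ... and B[k, j] for its multiplies

# Each processor touches (I/P0)*K elements of A and K*(J/P1) of B,
# which quantifies the data that communication must deliver to it.
print(len(needed_A[(0, 0)]), len(needed_B[(0, 0)]))
```

The sizes of these per-processor sets are exactly the quantities the framework reasons about when it models space-performance tradeoffs.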
For each iteration space mapping, the RRR framework identified the fundamental data movement direction required by a distributed algorithm. This process allows the framework to compute compatible input and output tensor distributions and systematically generate a contraction algorithm for them using communication operators.
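The pattern of composing a contraction from communication operators can be illustrated with a SUMMA-style schedule, simulated here in plain NumPy (an assumed stand-in for the framework's actual operators; in a real distributed run the inner loops would be MPI row/column broadcasts and local accumulation):

```python
import numpy as np

# Minimal sketch of building a contraction C = A @ B from broadcast
# and reduction operators on a p x p block grid: at step t, A's block
# column t is broadcast along grid rows and B's block row t along grid
# columns; each processor reduces the partial products locally.

p, b = 2, 3                          # 2x2 grid, 3x3 blocks (toy sizes)
A = np.arange(36.).reshape(6, 6)
B = np.ones((6, 6))

def blk(M, r, c):
    """Return the (r, c) block of a blocked matrix M."""
    return M[r*b:(r+1)*b, c*b:(c+1)*b]

C = np.zeros((6, 6))
for t in range(p):                   # one step per shared block index t
    for r in range(p):
        for c in range(p):
            # processor (r, c) receives A[r, t] via a row broadcast and
            # B[t, c] via a column broadcast, then accumulates (reduce)
            C[r*b:(r+1)*b, c*b:(c+1)*b] += blk(A, r, t) @ blk(B, t, c)

assert np.allclose(C, A @ B)         # matches the serial contraction
```

Rotation-based variants (as in Cannon's algorithm) replace the broadcasts with cyclic shifts of the blocks; the RRR framework's contribution is choosing among such operator compositions systematically for arbitrary-dimensional tensors.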
During testing, the researchers demonstrated that the framework scales up to 16,384 nodes on Blue Gene/Q supercomputers. Furthermore, they showed that the framework can improve communication optimality and even outperform the Cyclops Tensor Framework, widely regarded as the state of the art.
From Pacific Northwest National Laboratory
Abstracts Copyright © 2014 Information Inc., Bethesda, Maryland, USA