Japan-based Fujitsu has developed an approach to accelerating the parallel computation that drives deep-learning neural networks, enlarging the networks that can fit on a single chip.
The method trimmed the internal graphics-processing unit (GPU) memory needed for neural-network calculations by 40% via a memory-reuse shortcut, says Yasumoto Tomita of Fujitsu Laboratories' Next-Generation Computer Systems Project.
Tomita says Fujitsu determined how to reuse certain segments of the GPU's memory by calculating intermediate error data from weighted data and producing weighted error data from intermediate data, independently but simultaneously.
He estimates the 40% reduction in memory usage lets a larger neural network with "roughly two times more layers or neurons" run on a single GPU. Tomita notes the method avoids some of the performance bottlenecks that arise when a neural network distributed across numerous GPUs must share data during training.
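To make the general idea concrete, the following is a minimal NumPy sketch of the buffer-reuse pattern described above, not Fujitsu's actual implementation. In back-propagation through a fully connected layer, the error passed toward the previous layer depends only on the weights, while the weight gradient depends only on the cached intermediate data; because the two are computed independently, a buffer that is no longer needed can be overwritten in place rather than a new one allocated. The layer sizes and variable names here are illustrative assumptions.

```python
# Illustrative sketch only -- not Fujitsu's implementation.
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 32, 256, 128

x = rng.standard_normal((batch, n_in))           # cached intermediate data (forward activations)
W = rng.standard_normal((n_in, n_out)) * 0.01    # layer weights
grad_out = rng.standard_normal((batch, n_out))   # error arriving from the layer above

# Weight gradient ("weighted error data"): needs only the cached
# intermediate data and the incoming error.
grad_W = x.T @ grad_out

# Input-side error ("intermediate error data"): needs only the weights
# and the incoming error. Since grad_W above no longer needs x, the
# activation buffer can be reused to hold the result instead of
# allocating a new array (np.matmul's out= argument writes in place).
np.matmul(grad_out, W.T, out=x)   # x now holds the error for the previous layer

print(grad_W.shape, x.shape)      # (256, 128) (32, 256)
```

In a framework that allocated a fresh array for every intermediate result, both buffers would coexist; reusing the no-longer-needed one is the kind of saving that lets a larger network stay within a single GPU's memory.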
In addition, Fujitsu is developing software to expedite data exchange across multiple GPUs, which could be merged with the memory-efficiency technology to advance the company's deep-learning capabilities. "By combining the memory-efficiency technology...with GPU parallelization technology, fast learning on large-scale networks becomes possible, without model parallelization," Tomita says.
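As a rough illustration of the distinction Tomita draws, the sketch below uses PyTorch's stock data-parallel wrapper, a stand-in assumed here rather than Fujitsu's software: every GPU holds a complete copy of the model and processes a slice of the batch, which is only practical when the whole network fits in one GPU's memory, so the model itself never has to be split across devices (model parallelization).

```python
# Illustrative data-parallel training step (generic PyTorch, not Fujitsu's software).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))

# Replicate the full model on every visible GPU; inputs are split along the
# batch dimension and outputs gathered automatically. Falls back to a single
# device when fewer than two GPUs are present.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch (hypothetical data).
x = torch.randn(64, 256, device=next(model.parameters()).device)
y = torch.randint(0, 10, (64,), device=x.device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()      # per-replica gradients are reduced onto the primary device
optimizer.step()
print(float(loss))
```

The communication cost in this scheme is exchanging gradients between GPUs, which is the data-sharing step Fujitsu's parallelization software aims to speed up.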
From IEEE Spectrum
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA