
Communications of the ACM

ACM TechNews

Supercomputing Speeds Up Deep Learning Training


[Image] Two-dimensional embedding of images from the ImageNet database, extracted by a convolutional neural network using Caffe. Credit: Andrej Karpathy

A team led by researchers at the Texas Advanced Computing Center (TACC) recently published the results of an effort to use supercomputers to train a deep neural network for rapid image recognition.

They used the Stampede2 supercomputer to complete a 100-epoch ImageNet training run with AlexNet in 11 minutes, the fastest such training time recorded to date.

The team also completed a 90-epoch ImageNet training with ResNet-50 in 32 minutes.

"These results show the potential of using advanced computing resources...along with large mini-batch enabling algorithms, to train deep neural networks interactively and in a distributed way," says TACC's Zhao Zhang.

The research involved developing a Layer-Wise Adaptive Rate Scaling (LARS) algorithm that keeps training stable at mini-batch sizes of up to 32,000 items, allowing the work to be distributed efficiently across many processors computing simultaneously.
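The article does not give the update rule, but the core idea of LARS is to give each layer its own step size, scaled by the ratio of that layer's weight norm to its gradient norm. A minimal NumPy sketch of one such update follows; the hyperparameter values are illustrative defaults, not the settings used in the TACC experiments.

import numpy as np

def lars_update(weights, grads, velocity, global_lr,
                trust_coef=0.001, weight_decay=5e-4, momentum=0.9):
    """One illustrative LARS step for a single layer's weight tensor."""
    w_norm = np.linalg.norm(weights)
    g_norm = np.linalg.norm(grads)
    # Layer-wise ("local") learning rate: proportional to ||w|| / ||g||,
    # so each layer's step stays in proportion to its weight magnitude.
    # The small constant guards against division by zero.
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + 1e-12)
    # Momentum update on the rescaled, weight-decayed gradient.
    velocity = momentum * velocity + global_lr * local_lr * (grads + weight_decay * weights)
    return weights - velocity, velocity

# Hypothetical usage: one step for a single layer's flattened weights.
w = 0.01 * np.random.randn(1000)
g = np.random.randn(1000)
v = np.zeros_like(w)
w, v = lars_update(w, g, v, global_lr=0.1)

Because the local rate shrinks when gradients are large relative to the weights, very large mini-batches can use an aggressive global learning rate without the early-training divergence that plain SGD tends to exhibit.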

"By not having to migrate large datasets between specialized hardware systems, the time to data-driven discovery is reduced and overall efficiency can be significantly increased," says TACC's Niall Gaffney.

From Texas Advanced Computing Center

Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA