Oak Ridge National Laboratory's Frontier supercomputer is experiencing numerous hardware failures every day.
Frontier, which has not yet been officially deployed, aims to deliver up to 1.685 FP64 ExaFLOPS of peak performance using AMD's 64-core EPYC Trento processors, Instinct MI250X compute GPUs, and HPE's Slingshot interconnect, while drawing 21 MW of power.
"We are working through issues in hardware and making sure that we understand [what they are]," says Oak Ridge Leadership Computing Facility's Justin Whitt. "You are going to have failures at this scale. Mean time between failure on a system this size is hours; it's not days."
From Tom's Hardware
Abstracts Copyright © 2022 SmithBucklin, Washington, DC, USA