
Communications of the ACM

ACM TechNews

Addressing the Threat of Silent Data Corruption


Illustration: binary data error. Credit: The Economic Times

Researchers at Los Alamos National Laboratory (LANL) are conducting a large-scale field-test study of incorrect results on high-performance computing platforms to better understand soft errors and silent data corruption (SDC). Soft errors can lead to unintended changes in the state of an electronic device that alter stored information without destroying functionality, says LANL's Sarah E. Michalak. SDC is a troubling type of soft error in which a computing system delivers incorrect results without logging an error. In some cases SDC leads to incorrect scientific results; in others, the application hangs for a long time or even indefinitely.
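
To make the definition concrete, the following is a minimal sketch that simulates a single flipped bit in a stored floating-point value and shows that the computation still completes without raising or logging anything. It is an illustration only and is not taken from the LANL study; the helper name `flip_bit`, the sample data, and the choice of bit position are all assumptions made for the example.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding inverted.

    Hypothetical helper for illustration: it imitates in software what a
    soft error might do to a stored 64-bit float.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Invert the chosen bit and reinterpret the result as a double.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


data = [1.0, 2.0, 3.0, 4.0]
expected = sum(data)

# Simulate a soft error: flip the lowest exponent bit (bit 52) of one element.
corrupted = list(data)
corrupted[2] = flip_bit(corrupted[2], 52)

# The sum runs to completion with no exception and no log entry,
# yet the result no longer matches the uncorrupted one.
silent_result = sum(corrupted)
print(expected, silent_result)
```

Nothing in the program signals that `silent_result` is wrong, which is what makes this class of error hard to detect; only a comparison against an independently obtained value reveals the discrepancy.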

"Silent data corruption has the potential to threaten the integrity of scientific calculations performed on high-performance computing platforms and other systems," the researchers note in a recent paper.

SDC has many possible causes; the main culprits include temperature and voltage fluctuations, particles, manufacturing residues, oxide breakdown, and electrostatic discharge. The researchers note that new technologies in which clock frequencies, transistor counts, and noise levels rise while feature sizes and voltages decline could increase the incidence of SDC and lead to reliability problems.

From HPC Wire

 

Abstracts Copyright © 2013 Information Inc., Bethesda, Maryland, USA


 
