As the transistors in computer chips shrink, concern is growing that ever larger and more intricate cloud computing networks depend fundamentally on chips that are less reliable and less predictable.
Recent studies by Facebook and Google researchers described outages with difficult-to-diagnose causes, arguing that underlying hardware was to blame.
Stanford University's Subhasish Mitra said people increasingly believe that manufacturing defects are tied to silent hardware errors, while scientists worry that they are finding rare defects because they are tackling bigger computing challenges, which stress systems in unexpected ways.
The smallest error in a microprocessor hosting billions of transistors can disrupt systems that routinely execute billions of calculations each second, and mounting evidence suggests the problem is getting worse with each new chip generation.
Proposed remedies include software that proactively monitors for hardware errors.
From "Tiny Chips, Big Headaches"
The New York Times (02/07/22) John Markoff