Communications of the ACM

Inside Risks: A Tale of Two Thousands


It was the best of times, it was the worst of times, but now it is time to reflect on the lessons of Y2K. Ironically, if the extensive media hype had not stimulated significant progress in the past half-year, serious social disruptions could have occurred. However, the colossal remediation effort is simultaneously (1) a success story that improved systems and people's technical knowledge, (2) a wonderful opportunity to have gotten rid of some obsolete systems (although there were some unnecessary hardware upgrades where software fixes would have sufficed), and (3) a manifestation of long-term short-sightedness. After spending billions of dollars worldwide, we must wonder why a little more foresight could not have avoided many of the Y2K computer problems in the first place.

System development practice. System development should be based on constructive measures throughout the life cycle: well-specified requirements, inherently sound system architectures, and intelligently applied system engineering and software engineering. The Y2K problem is a painful example of the absence of good practice—somewhat akin to its much-neglected but long-suffering stepchild, the less glitzy but persistent buffer-overflow problem. For example, systematic use of concepts such as abstraction, encapsulation, information hiding, and object-orientation could have allowed the construction of efficient programs in which the representation of dates could be changed easily when needed.
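To make that point concrete, here is a minimal sketch (not from the column itself; the names Date, date_create, and date_compare are hypothetical) of how encapsulation and information hiding localize the representation of dates. Callers go through a small interface, so widening a two-digit year to four digits becomes a one-file change rather than a system-wide hunt.

    #include <stdio.h>
    #include <stdlib.h>

    /* Opaque date type: in a real program the struct definition would
     * live in date.c, with only the typedef exposed in date.h, so no
     * caller could depend on the representation.  Changing it (say,
     * from a two-digit to a four-digit year) would then touch one
     * file, not the whole system. */
    typedef struct Date Date;
    struct Date { int year, month, day; };   /* full four-digit year */

    Date *date_create(int year, int month, int day) {
        Date *d = malloc(sizeof *d);
        if (d) { d->year = year; d->month = month; d->day = day; }
        return d;
    }

    /* Comparison works across the century boundary because the hidden
     * representation stores the full year. */
    int date_compare(const Date *a, const Date *b) {
        if (a->year  != b->year)  return a->year  - b->year;
        if (a->month != b->month) return a->month - b->month;
        return a->day - b->day;
    }

    void date_destroy(Date *d) { free(d); }

    int main(void) {
        Date *eve = date_create(1999, 12, 31);
        Date *y2k = date_create(2000, 1, 1);
        printf("1999-12-31 %s 2000-01-01\n",
               date_compare(eve, y2k) < 0 ? "precedes" : "follows");
        date_destroy(eve);
        date_destroy(y2k);
        return 0;
    }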

Integrity of remediation. In the rush to remediation, relatively little attention was paid to the integrity of the process and ensuing risks. Many would-be fixes introduced new bugs. Windowing deferred some problems until later. Opportunities existed for theft of proprietary software, blackmail, financial fraud, and insertion of Trojan horses—some of which may not be evident for some time.
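As an illustration of why windowing only defers the problem (a sketch, not taken from any actual remediation code; the pivot value is an arbitrary assumption), the technique reinterprets a two-digit year relative to a pivot:

    /* "Windowing": interpret a two-digit year relative to a pivot.
     * With a pivot of 50 (a hypothetical choice), 00..49 are read as
     * 2000..2049 and 50..99 as 1950..1999.  Dates outside that
     * 100-year window are still misinterpreted, so the failure is
     * postponed, not eliminated. */
    int expand_year(int yy) {        /* yy must be in 0..99 */
        const int pivot = 50;
        return (yy < pivot) ? 2000 + yy : 1900 + yy;
    }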

What happened? In addition to various problems triggered before the new year, there were many Y2K date-time screwups. (See the online Risks Forum, Vol. 20, beginning with issue 71, and www.csl.sri.com/neumann/cal.html for background.) The Pentagon had a self-inflicted Y2K mis-fix that resulted in complete loss of ability to process satellite intelligence data for almost three hours at midnight GMT on the year turnover, with the fix for that leaving only a trickle of data from five satellites for several days afterward. The Pentagon DefenseLINK site was disabled by a preventive mistake. The Kremlin press office could not send email. In New Zealand, an automated radio station kept playing the New Year's Eve 11 p.m. news hour as the most recent, because 99 is greater than 00. Toronto abandoned its Y2K-noncompliant bus schedule information system altogether, rather than fix it. Birth certificates for British newborns were for 1900. Some credit-card machines failed, and some banks repeatedly charged for the same transaction—once a day until a previously available fix was finally installed. Various people received bills for cumulative interest since 1900. At least one person was temporarily rich, for the same reason. In email, Web sites, and other applications, strange years were observed beginning on New Year's Day (and continuing until patched), notably the years 100 (99+1), 19100 (19 concatenated with 99+1), 19000 (19 concatenated with 99+1 (mod 100)), 1900, 2100, 3900, and even 20100. Some Compaq sites said it was Jan. 2 on Jan. 1. The U.K.'s NPL atomic clock read Dec. 31, 1999 27:00 at 2 a.m. GMT on New Year's Day. But all of these anomalies should be no surprise; as we noted here in January 1991, calendar arithmetic is a tricky business, even in the hands of expert programmers.
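One common mechanism behind the 100, 19100, and 1900 displays is worth spelling out. In C's <time.h>, the tm_year field holds years since 1900; code that printed the field raw, or glued a literal "19" in front of it, produced exactly the anomalies listed above. A minimal reconstruction (hypothetical code, but consistent with the reported symptoms):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct tm t = {0};
        t.tm_year = 100;   /* tm_year counts years since 1900, so the
                              year 2000 arrives as the value 100 */

        printf("%d\n",    t.tm_year);           /* prints 100   */
        printf("19%d\n",  t.tm_year);           /* prints 19100 */
        printf("19%02d\n", t.tm_year % 100);    /* prints 1900  */
        printf("%d\n",    1900 + t.tm_year);    /* prints 2000: correct */
        return 0;
    }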

Conclusions. Local optimization certainly seems advantageous in the short term (to reduce immediate costs), but is often counterproductive in the long term. The security and safety communities (among others) have long maintained that trying to retrofit quality into poorly conceived systems is throwing good money after bad. It is better to do things right from the outset, with a clear strategy for evolution and analysis—so that mistakes can be readily fixed whenever they are recognized. Designing for evolvability, interoperability, and the desired functional "-ities" (such as security, reliability, survivability in the presence of arbitrary adversities) is difficult. Perhaps this column should have been entitled "A Tale of Two -ities"—predictability and dependability, both of which are greatly simplified when the requirements are defined in advance.

Between grumbles about the large cost of Y2K remediation and views on what might have happened had there not been such an intensive remediation effort, we still have much to learn. (Will this experience be repeated for Y10K?) Perhaps the biggest Y2K lessons are simply further reminders that greater foresight would have been beneficial, that fixes themselves are prone to errors, and that testing is inherently incomplete. We need better system-oriented training. Maybe it is also time for certification of developers, especially when dealing with critical systems.


Author

For more information see catless.ncl.ac.uk/Risks/; ftp://www.sri.com/risks/; and mirror.aarnet.edu.au/risks/house, the archives for the ACM Risks Forum.


©2000 ACM  0002-0782/00/0300  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.



 
