acm-header
Sign In

Communications of the ACM

Contributed articles

Mars Code


Mars Code, illustration

Credit: Brian Greenberg / Andrij Borys Associates

On August 5, 2012, 10:18 P.M. PST, a large rover named Curiosity made a soft landing on the surface of Mars. Given the one-way light-time to Mars, the controllers on Earth learned about the successful touchdown 14 minutes later, at 10:32 P.M. PST. As can be expected, all functions on the rover, and on the spacecraft that brought it to its destination 350 million miles from Earth, are controlled by software. This article discusses some of the precautions the JPL flight software team took to improve its reliability.

Back to Top

Key Insights

ins01.gif

To begin the journey to Mars you need a launch vehicle with enough thrust to escape Earth's gravity. On Earth, Curiosity weighed 900 kg. It weighs no more than 337.5 kg on Mars because Mars is smaller than Earth. Curiosity began its trip atop a large Atlas V 541 rocket, which, together with fuel and all other parts needed for the trip, brought the total launch weight to a whopping 531,000 kg, or 590 times the weight of the rover alone.


Comments


CACM Administrator

The accompanying video of Gerard Holzmann was recorded at HotDep '12 (https://www.usenix.org/conference/hotdep12).


Don Sannella

The article points out the difficulties in writing correct multithreaded code, because of the ever-present risks of race conditions and other subtle concurrency flaws. It describes how critical MSL software components were checked for race conditions via the use of logical assertions in the code and a powerful model checker. Such methods are the state of the art for safety-critical software and software running on spacecraft, in which requirements for software quality are extreme. The difficulty of scaling up such methods is hinted at by the fact that these methods were used only for critical software components and by the use of different methods for subsystems of different sizes.

Static code analysis tools, referred to elsewhere in the article, can be used to automatically find race conditions and other concurrency flaws without requiring the use of assertions. Their accuracy is not quite as high as methods involving model checking, but they can be used routinely by ordinary software engineers and they scale to million-line codebases. We have developed a static analysis tool that focuses specifically on concurrency defects in Java, called ThreadSafe (www.contemplateltd.com/threadsafe), and can wholeheartedly appreciate the difficulty of finding such flaws. Our experience with users is that concurrency defects are as much of a problem in commercial software as in systems like the one described, and tool support is desperately needed to avoid the risk of failures.


CACM Administrator

The following letter was published in the Letters to the Editor in the April 2014 CACM (http://cacm.acm.org/magazines/2014/4/173227).
--CACM Administrator

Gerard J. Holzmann's article "Mars Code" (Feb. 2014) demonstrated a nonblocking implementation of concurrent double-ended queues, or deques,1 (previously shown to be incorrect by Doherty2) to not work through an application of Holzmann's own Spin model checker. However, the demonstration seemed too shallow. Assuming the writer process of the test driver is allowed to pushRight(0) and pushRight(1) before the reader process gets a chance to run, then the value of rv returned by the first succeeding popRight() in the reader process would definitely not be 0 and the assert(rv==i) would fail because i is 0; that is, the test driver is incorrect and could fail, even with a correct implementation of concurrent deques.

Holzmann's demonstration included a description of a failing run of his test driver exercising the concurrent deque implementation. Even though Holzmann clearly left out some detailsthe initialize function and complete output of the model checkerthe failing run he described was definitely much simpler than the failure described by Doherty,2 indicating the failure detected was in the test driver, not in the concurrent deque implementation.

Thorkil Naur
Odense, Denmark

----------------------------------------------

AUTHOR'S RESPONSE

Naur is correct, and I thank him for his keen observation. The test driver used for the example was flawed. If we make the required changes we can show it takes minimally three processes to expose the bug in the original DCAS algorithm,1 as was also shown in Doherty.2 A corrected version of the example, with the model-checking result, is available from the author.

Gerard J. Holzmann
Pasadena, CA

REFERENCES

1. Detlefs, D.L. et al. Even better DCAS-based concurrent deques. In Distributed Algorithms, LNCS Vol. 1914, M. Herlihy, Ed. Springer-Verlag, Heidelberg, Germany, 2000, 5973.

2. Doherty, S. Modelling and Verifying Non-blocking Algorithms That Use Dynamically Allocated Memory. Master's thesis, Victoria University, Wellington, New Zealand, 2004.


Yaki Tebeka

The article mentions that "In 145 code reviews held between 2008 and 2012 for MSL flight software, approximately 10,000 peer comments and 30,000 tool-generated reports were discussed"

This seems like a low number of code reviews for 4 years. Is this figure correct? If so, is there a reason for not doing code reviews more often?


Gerard Holzmann

Yaki Tebeka: each software module was peer reviewed at least once, and some more than once. (There were about 120 such modules -- one for each VxWorks task.)
So 145 code reviews is roughly what would be expected for this code. Note also that the peer code review was meant to be the final step before a code freeze for launch.
It would be good to do more frequent code reviews of course, but the time schedule for a mission like this is typically very tight.


Stephen Siegel

A small point: a kilogram (kg) is a unit of mass, not weight, and the mass of Curiosity is the same on Earth and Mars or anywhere else in the universe. The weight is the product of the mass and the local acceleration due to free-fall. Weight is a force, and has units such as kg.m/s^2; this is what changes from Earth to Mars.


Displaying all 6 comments

Log in to Read the Full Article

Sign In

Sign in using your ACM Web Account username and password to access premium content if you are an ACM member, Communications subscriber or Digital Library subscriber.

Need Access?

Please select one of the options below for access to premium content and features.

Create a Web Account

If you are already an ACM member, Communications subscriber, or Digital Library subscriber, please set up a web account to access premium content on this site.

Join the ACM

Become a member to take full advantage of ACM's outstanding computing information resources, networking opportunities, and other benefits.
  

Subscribe to Communications of the ACM Magazine

Get full access to 50+ years of CACM content and receive the print version of the magazine monthly.

Purchase the Article

Non-members can purchase this article or a copy of the magazine in which it appears.
Sign In for Full Access
» Forgot Password? » Create an ACM Web Account