Communications of the ACM

More Testing Should Be Taught


Testing typically takes 50% or more of the resources of a software development project. Curiously, a far smaller share of the software development portion of a typical undergraduate computing curriculum is allocated to testing. A share below 50% may well be appropriate, even in a software curriculum, but the current level is very low and has been determined largely by a perceived lack of space, especially in computer science curricula. This is not appropriate.

If testing is broadened to include all of Verification and Validation (V&V), the situation is even more serious. Highly effective practices such as software inspection [5] are hardly taught at all, and many computer science professors do not know (or care) what inspection is and why it is valuable. With the current pace of software development, V&V techniques must become ever more efficient and effective. Students today are not well equipped to apply widely practiced techniques, and have even further to go to understand techniques that could improve current practice. They are graduating with a serious gap in the knowledge they need to be effective software developers. Even new software engineering curricula tend to be weak in V&V.

We describe the teaching of testing in an undergraduate software specialization in a computer engineering curriculum and in a graduate course on V&V, and we compare the two different approaches. We believe the equivalent of several courses on testing, software quality, and the broader issues of V&V should be available to undergraduates, and should be mandatory for software engineers. The undergraduate curriculum we describe is moving in this direction by including V&V material in several courses. At the graduate level, particularly for students who already have industrial experience, a significant part of the material can be concentrated into a single course, sacrificing depth in favor of breadth.

Software V&V techniques are part of the larger discipline of software engineering. Software engineering is an emerging discipline, first identified more than 30 years ago [9]. Today, most undergraduate computing curricula are based on the 1991 ACM/IEEE Computing Curriculum (www.acm.org/education/curricula.html). It sets a minimum of only eight lecture hours to cover the whole of V&V, including reviews, testing, and elementary proofs of correctness. This situation is only slightly improved in the draft Curriculum 2001 (www.acm.org/sigcse/cc2001/): there are fewer hours (six) suggested in the core for software validation; formal proofs of correctness are gone, and inspection has been added. More recently, curricula in software engineering (as opposed to computer science) have started to appear. There is not yet a definitive source for guidance on such curricula. One source is the 1999 Guidelines for Software Engineering Education, Version 1.0 (www.sei.cmu.edu). While there is more attention to V&V in these guidelines, it is still well below the level we are proposing.

V&V covers issues of software quality, inspection, testing, and formal techniques. All of these need to be addressed, but in current practice, testing is the most common technique for finding bugs, and finding bugs is too often the dominant cost in software development. An extensive study [10] a decade ago showed that code inspections can be up to four times more efficient at finding errors than testing. In the last few years, there has been considerable research into the evaluation of more effective inspection processes and techniques. Much of this research, along with its potential benefits, has not yet penetrated industry. Even worse, one informal survey [7] indicates that 80% of respondents practiced software inspection irregularly or not at all, after 30 years of research showing the effectiveness of software inspections over testing.

Good teaching should emphasize that execution of tests is only one of many V&V techniques, and should only be applied when it is expected to be the most effective of all possible actions. For example, resources are usually best put into execution of integration and system tests only when the software is already working reasonably well at the unit level. When tests find failures, debugging is needed to find the faults that caused the failures. If there is too much debugging, quality suffers, and costs and unpredictable delays mount. Debugging is a major contributor to the low cost-effectiveness of testing.

Usually allocated toward the end of the development process, testing bears the brunt of bug-finding under tight time and resource constraints and often becomes a grueling task. This isn't new, but it raises the (perhaps unanswerable) question of what students should be taught in order to improve their ability to conduct testing under severe pressure. On the other hand, strategies such as precise specifications and early test planning have long been proposed to rectify this situation, and such strategies should be emphasized in teaching testing. When they can be applied, these strategies work well.

Some recent ideas even advocate using test cases as a kind of specification, developing code and new test cases in incremental pieces, and testing early and often. This is a divide-and-conquer approach that splits testing into smaller pieces spread over longer periods of time, and it is a fundamental one. One place it is used is in Extreme Programming [1].
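As a minimal illustration of this idea (our own sketch in Python, not drawn from [1] or from any course material; the Stack class and its tests are hypothetical), a developer writes the tests first, as a small executable specification, and then grows the code until they pass:

    import unittest

    # The tests are written first and act as a small, executable
    # specification of the intended behavior.
    class StackSpecTest(unittest.TestCase):
        def test_push_then_pop_returns_last_item(self):
            s = Stack()
            s.push(1)
            s.push(2)
            self.assertEqual(s.pop(), 2)  # last in, first out
            self.assertEqual(s.pop(), 1)

        def test_pop_on_empty_stack_raises(self):
            self.assertRaises(IndexError, Stack().pop)

    # The implementation is then written (or grown incrementally)
    # until the tests above pass.
    class Stack:
        def __init__(self):
            self._items = []
        def push(self, item):
            self._items.append(item)
        def pop(self):
            return self._items.pop()  # raises IndexError when empty

    if __name__ == "__main__":
        unittest.main()

Each new increment of functionality is accompanied by new tests, so the test suite grows with the code and is run early and often.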

No matter how carefully other V&V techniques are applied, there will still be residual faults, as well as load, performance, and other issues, that can only be addressed at one of the many testing stages. Testing also has a legitimate role in determining when software is ready for release. Ideally, testing should be used to estimate the reliability of the tested product [3], although there are special circumstances in which a level of reliability is needed that is beyond what can be assured or measured by testing [2]. It is important to teach an appreciation and enthusiasm for this wide range of activities, to try to change the prevailing mindset of testing as a necessary evil.
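To give a flavor of the reliability argument (a generic back-of-the-envelope sketch, not the certification model of [3]), suppose N test runs are drawn at random from the operational profile and all of them succeed; a standard one-sided bound then limits the per-run failure probability consistent with that observation:

    # Back-of-the-envelope sketch: if n_runs randomly selected test runs
    # all succeed, compute an upper bound on the per-run failure
    # probability consistent with that outcome at the given confidence.
    def failure_prob_upper_bound(n_runs, confidence=0.95):
        # If the true failure probability were p, the chance of seeing
        # n_runs consecutive successes would be (1 - p) ** n_runs.
        # Solving (1 - p) ** n_runs = 1 - confidence for p gives the bound.
        return 1.0 - (1.0 - confidence) ** (1.0 / n_runs)

    if __name__ == "__main__":
        for n in (100, 1000, 10000):
            print(n, round(failure_prob_upper_bound(n), 6))

The numbers make the point of [2] concrete: demonstrating ultra-high reliability (say, a failure probability of 10^-9 per run) by testing alone would require an infeasible number of failure-free runs.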


Testing in an Undergraduate Software Engineering Specialization

Any discussion of how best to teach testing at the undergraduate level is complicated because there are several different types of software-related undergraduate programs, all with differing goals and priorities. For example, universities offer programs in computer science, computer engineering, software engineering, and information science, each emphasizing different aspects of software, and each having different amounts of time available to devote to testing and other software engineering issues. In addition, many different computer languages and systems are used, which influence the tools and resources available to practice testing.

Our specific example of an undergraduate program is the software specialization within computer engineering, taught at the Royal Military College of Canada (RMC) since 1992 [11]. It puts more emphasis on testing than most other undergraduate programs do. In the RMC program, software engineering is spread over eight key courses, one of which is a two-semester course. Issues of scale are taught in each of these courses, and are important for testing, although the scale of actual projects is necessarily small.

Computer Program Design is the first in-depth programming course. It teaches top-down and bottom-up testing, using stubs and test drivers. Students learn to use a debugger, which helps them understand the behavior of their programs. It is appropriate for students to learn about debuggers, since professional programmers need them as well, but serious testing should not be reduced to extensive use of a debugger.
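A minimal sketch of the stub-and-driver idea follows (the function names are invented for illustration and are not taken from the course): the unit under test depends on a routine that does not yet exist, so a stub supplies canned data and a driver exercises the unit and checks the result.

    # Hypothetical sketch of top-down testing with a stub and a driver.
    # report_total() is the unit under test; the real fetch_prices() is
    # not yet implemented, so a stub stands in for it.

    def fetch_prices_stub(item_ids):
        # Stub: returns canned data instead of querying the real source.
        return {item_id: 10.0 for item_id in item_ids}

    def report_total(item_ids, fetch_prices):
        prices = fetch_prices(item_ids)
        return sum(prices[i] for i in item_ids)

    def test_driver():
        # Driver: sets up inputs, calls the unit, checks the result.
        total = report_total(["a", "b", "c"], fetch_prices_stub)
        assert total == 30.0, "expected 30.0, got %s" % total
        print("report_total passed with stubbed prices")

    if __name__ == "__main__":
        test_driver()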

In Software Work Products and Maintenance (based on [6]), students are taught specific techniques for different parts of software development and maintenance. One of the main threads in the course is the use of precise specifications for objects. Unit tests are planned from these specifications and automated using a simple scripting language. Unit testing at the object level, and planning tests from specifications, are fundamental concepts. The students are also taught the concepts of functional and structural testing, and the use of coverage measures as one way to assess the adequacy of testing. They have the opportunity to practice integration and system testing in an assignment (as discussed later). Recently, they have also had the opportunity to work with a commercial test tool suite. Inspections and reviews are emphasized as precursors to testing, and are applied to some of the specification documents used in the course.
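The following sketch suggests what planning unit tests from a precise specification can look like (the BoundedQueue module, its assumed specification, and the test script are ours, not the course's, and the course uses its own scripting language rather than Python): each automated test case is traced to one clause of the specification.

    # Hypothetical module with an assumed specification: FIFO order; at
    # most `capacity` items; enqueue on a full queue raises OverflowError;
    # dequeue on an empty queue raises IndexError.
    class BoundedQueue:
        def __init__(self, capacity):
            self._items = []
            self._capacity = capacity
        def enqueue(self, x):
            if len(self._items) >= self._capacity:
                raise OverflowError("queue full")
            self._items.append(x)
        def dequeue(self):
            if not self._items:
                raise IndexError("queue empty")
            return self._items.pop(0)

    # Simple test script: one case per specification clause.
    def fifo_order():
        q = BoundedQueue(2)
        q.enqueue("a")
        q.enqueue("b")
        assert q.dequeue() == "a" and q.dequeue() == "b"

    def full_queue_rejected():
        q = BoundedQueue(1)
        q.enqueue("a")
        try:
            q.enqueue("b")
        except OverflowError:
            return
        raise AssertionError("enqueue on a full queue did not raise")

    def empty_queue_rejected():
        try:
            BoundedQueue(1).dequeue()
        except IndexError:
            return
        raise AssertionError("dequeue on an empty queue did not raise")

    if __name__ == "__main__":
        for case in (fifo_order, full_queue_rejected, empty_queue_rejected):
            case()
            print(case.__name__, "passed")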

Software Process and Quality is organized around a major project. The class is divided into teams to emulate a software development house. Several variations have been tried, including variations in the roles defined for each team and in the amount of structure and direction provided to the students by the instructor. For example, in some years, teams have consisted of a developer and a tester. In some years, there is a Quality Assessment (QA) team, which usually achieves quality assessment at best rather than quality assurance. Some unit testing takes place, but integration and system testing normally fall victim to the schedule crunch. The students undertake code inspections. Despite their inexperience, students find enough defects when inspecting fellow students' code to be convinced of the value of inspection. A possible distraction is that student code is so buggy it yields an overabundance of results. As a result of the course, students learn that a comprehensive V&V strategy is difficult to design and implement, and requires discipline. The exact experience changes from year to year. Lectures in the course related to V&V cover inspection and testing as well as other topics such as testing processes, defect tracking, configuration management, and quality attributes.

In Real-Time Embedded System Design, students use a variant of the Unified Modeling Language (UML) for real-time design supported by Rational's Rose/RT toolset (www.rational.com/products/rosert/index.jsp). It introduces the students to a new form of verification, namely visualization of execution at the design level. This technique is a blend of testing and inspection in the sense that execution of a design model is driven by scenarios represented by sets of test inputs. While we are not aware of any formal studies to confirm the value of this technique, intuition suggests it should be valuable. Another V&V problem dealt with briefly in this course is the specification and verification of timing behavior, including schedulability analysis and rate monotonic analysis. The course emphasizes that V&V of timing and performance issues must not be left to the testing stage alone.
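As one concrete example of the kind of analysis involved (a textbook utilization test, not tied to the course's toolset; the task parameters below are invented), the classic Liu and Layland bound says that n periodic tasks scheduled rate-monotonically are guaranteed schedulable if their total utilization does not exceed n(2^(1/n) - 1):

    # Classic Liu and Layland utilization test for rate monotonic
    # scheduling.  Each task is a (computation time, period) pair; the
    # test is sufficient but not necessary.
    def rm_utilization_test(tasks):
        n = len(tasks)
        utilization = sum(c / t for c, t in tasks)
        bound = n * (2 ** (1.0 / n) - 1)
        return utilization, bound, utilization <= bound

    if __name__ == "__main__":
        tasks = [(1, 4), (1, 5), (2, 10)]  # invented task set
        u, bound, ok = rm_utilization_test(tasks)
        print("U = %.3f, bound = %.3f, schedulable: %s" % (u, bound, ok))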

Software Architecture and Distributed Systems has been offered since 1998/99. The original intent was to include testing theory and practice, as well as failure and recovery, which require specialized testing techniques. This turned out to be too ambitious. Instead, there are brief discussions of the specialized testing requirements of distributed systems.

All computer engineering students take a full-year Design Project course, in which they design and construct a prototype system, often including several thousand lines of code, that satisfies specified requirements. One requirement is the creation of an acceptance test plan used at the end of the project to test the system as built. Reviews are part of the development process, but students typically need to learn from experience how valuable reviews are in the early stages of development. Another lesson often learned is how much time can disappear into testing and debugging.

In the remaining software courses (Object-Oriented Techniques, Principles of Operating Systems), testing is used, but there is no special emphasis on teaching it.

In summary, students in this program learn more about concepts and techniques related to testing than is usual in most other programs. They also have opportunities to practice them. On the other hand, they learn very little about languages for describing and controlling the execution of test cases, strategies for selecting test cases, regression testing techniques, logging and analysis of test results, system testing issues (including performance, load, and stress testing), automatic test generation, software reliability analysis, commercial testing tools, or the management of test suites. We intend to acquire or create a tool environment for configuration management, version control, and defect tracking that is uniform across several courses, but it is proving difficult. Many commercial tools require a degree of discipline that is artificial for the scale of projects that students work on, so tools may get in the way of the learning experience.


Testing in a Graduate Course on V&V

In the winter term of 1992/93, a graduate course on software V&V was introduced at RMC. (An early version is described in [12].) To the best of our knowledge, this was the first such course in Canada. The course has now been given 11 times. It differs from the material taught at the undergraduate level in several ways. As many of our students have prior experience as software project managers, it emphasizes breadth, leaving students to learn the details of most techniques and tools on the job. It does provide specific hands-on testing experience, although the range is narrower than for the undergraduates, who work directly with a larger variety of testing techniques and tools across eight different courses. The accompanying box lists the topics in the current version of the course.

The course meets for three hours per week. Students are expected to do a substantial amount of reading in preparation for each class. There are also typically three hours of guest lectures. These have included industrial V&V experiences, formal techniques, and testing tools.

The course includes four assignments. In past versions, there were fewer assignments and students had to do a research paper. The emphasis on a research paper has been reduced in order to broaden the common core of knowledge. The first assignment is a software inspection exercise in which the students apply three different inspection techniques to examples of industrial software [8]. The second assignment examines the test apparatus of an existing software system for adequacy, understandability, repeatability, maintainability, completeness, and automation effectiveness. In the third assignment, the students are asked to evaluate a commercial testing tool against a brief list of capabilities. The fourth assignment is the shortest. Students are each assigned a research paper in the area of V&V and are asked to summarize and present the paper to the class.

Even at the graduate level, most students initially believe testing is the only practical way to really determine if software is correct. If they are recent graduates of a computer science program, they also tend to believe testing should be the primary way to find errors. There is a paradox here, since the students do appreciate that testing can never be used to demonstrate correctness completely, and they know how difficult debugging is. Students tend to have a bias against the use of inspections as a cost-effective engineering technique. In this respect, they share the views still held by many practicing software developers. By the end of the course, their views have changed. There is also an initial belief that test automation is easy and pushing a button takes away all the work of testing. That view also changes by the end of the course. Many of the students have been software project managers. Nearly all of them say the knowledge gained from the course would have been valuable in those positions.

By the end of the graduate course, students have learned more than undergraduates do about strategies for selecting test cases, regression testing techniques, logging and analysis of test results, software reliability analysis, commercial testing tools, and the management of test suites. They have also had a brief exposure to formal techniques of analysis and have been introduced to issues that affect the feasible scale of application of such techniques, but have no opportunity to try them.

The Software Work Products and Maintenance course and the graduate V&V course both do assignments working with an existing system of about 4,000 lines of code and 19 modules. A module in this context is an encapsulated set of state variables and methods that change or query the state variables. Two different systems can be built from the 19 modules. There are test plans and/or scripts for the modules and for both systems. Some of the testing is automated, including automated checking of results. Some requires the tester to painstakingly follow a script and compare actual results to expected results by hand. In some cases, there is no test plan and only a very simple script that acts as a driver to support manual testing. This diversity is typical of industrial testing.

In earlier versions of the courses, only unit (module level) testing was done. The undergraduates still write unit tests for one module. Currently, in both courses, the students are asked to evaluate the testing apparatus, and suggest improvements, assuming there will be future changes, and therefore a need for regression testing. The purpose of the exercise is not to run the tests to find bugs (of which there are very few, if any) but to understand the diversity of test-related documentation and review it to find faults (of which there are many, none seeded). The graduate students are expected to conduct a more thorough evaluation than the undergraduates. Greater thoroughness would benefit the undergraduates, but there is too little time for them to undertake it.

For most modules, tests are planned from precise specifications, and test execution is automated. For modules dealing with external devices (screen, keyboard, printer, disk), test automation is more difficult and is not complete in the present system. Testing adequacy is checked by a structural coverage measure [6], but it is emphasized that coverage by itself is not a sufficient indicator of adequacy. The approach illustrates several important ideas about testing, such as white box/black box/gray box issues, the value of specifications in deriving test cases and expected results, the need for careful design of test plans, the importance of automating the repetitive parts of the testing process, and the difficulty of testing some exceptions. It also opens discussion of the reasons some of the ideas taught (such as test planning from specifications and coverage measures to determine testing adequacy) are not widely used in practice.
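A minimal illustration of a structural coverage check follows (it uses Python's standard trace module and a toy triangle-classification function of our own; it is not the coverage measure of [6]): the annotated output shows which statements the tests did and did not reach, and here the deliberately incomplete test set never reaches the "scalene" branch.

    # Minimal statement-coverage check using Python's standard trace
    # module.  The unit under test and its tests are toy examples.
    import trace

    def classify_triangle(a, b, c):
        if a == b == c:
            return "equilateral"
        if a == b or b == c or a == c:
            return "isosceles"
        return "scalene"

    def run_tests():
        assert classify_triangle(2, 2, 2) == "equilateral"
        assert classify_triangle(2, 2, 3) == "isosceles"
        # No test exercises the "scalene" branch, so coverage is incomplete.

    if __name__ == "__main__":
        tracer = trace.Trace(count=True, trace=False)
        tracer.runfunc(run_tests)
        # Writes an annotated listing that marks unexecuted lines.
        tracer.results().write_results(show_missing=True, coverdir=".")

Even full statement coverage here would say nothing about, for example, whether degenerate inputs such as (0, 0, 0) are handled sensibly, which is the sense in which coverage by itself is not a sufficient indicator of adequacy.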


Suggestions for Improvements at the Undergraduate Level

Current industrial testing practice includes many topics not covered in the undergraduate courses described. Among them are user-interface testing, performance and stress testing, usability testing, and installation testing. It would also be interesting to include alternative models of testing used in industry, such as the practice of daily builds and their associated testing activities [4]. In addition, there are other topics that could be covered more thoroughly. Examples are how to design test cases, scripts and other apparatus needed to support testing, how to debug, how to define and determine coverage, what adequacy means in relation to testing, how to decide when to stop testing, how far to go with test automation, how to calculate and use operational profiles, how to manage version control for test suites, how to select cases for regression testing, and how to evaluate commercial testing tools.
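To make one of these topics concrete, the following sketch shows a naive form of regression test selection (the module names, test names, and dependency map are all invented): given a record of which modules each test exercises, rerun only the tests that touch a changed module.

    # Naive regression test selection from an invented test-to-module map.
    TEST_DEPENDENCIES = {
        "test_login":    {"auth", "session"},
        "test_checkout": {"cart", "payment"},
        "test_search":   {"catalog"},
        "test_receipt":  {"payment", "printer"},
    }

    def select_regression_tests(changed_modules):
        changed = set(changed_modules)
        return sorted(test for test, modules in TEST_DEPENDENCIES.items()
                      if modules & changed)

    if __name__ == "__main__":
        print(select_regression_tests(["payment"]))
        # -> ['test_checkout', 'test_receipt']

In practice, the dependency information must itself be collected and kept up to date (for example, from coverage data), which is part of what makes test-suite management a topic worth teaching.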

Introduction of a wider range of practices into undergraduate or graduate curricula needs to be done on the basis of fundamental underlying principles. These principles have not been well delineated for teaching purposes; one consequence is that there are almost no textbooks that can even be considered in this area. There are clear advantages to a separate course that teaches only V&V techniques. It provides a context for focusing on the comparison of the different techniques available. It allows a larger range of techniques to be presented, although not always in depth. It can be used to give a clearer indication of the importance of V&V in software development, since V&V taught in other courses is clearly secondary. On the other hand, it is essential to include V&V material in all software courses, although a case can be made for a V&V course at the undergraduate level that integrates and extends the knowledge acquired in earlier courses. What to include in other courses and how much to split off into V&V-specific courses is a difficult question in curriculum design.


Conclusion

Much less than 50% of RMC's undergraduate curriculum is devoted to testing and other V&V issues. The percentage, and resultant depth of understanding, should increase but there is no obvious basis on which to set a target. Ideally, the time devoted to V&V activities in industry will be reduced as a result of better design practices, and curricula will in some way reflect that. The aim is to provide students with an understanding of:

  • The broad issues of V&V,
  • The proper places for testing activities in software processes,
  • How to plan and design good test strategies, and
  • How to minimize testing.

A concern is the continuing weakness of curriculum guidelines as they apply to the teaching of V&V. The two approaches we have described address this concern by integrating important elements into several courses at the undergraduate level, and offering a single course at the graduate level that emphasizes breadth.

Some of our graduate students have found jobs in V&V, in large part on the strength of taking this one course. In the undergraduate courses, some students have taken V&V ideas from one course and applied them in other courses—a positive sign they are learning! It is still the case at both the graduate and undergraduate levels that there is much more that could be usefully taught.

More dialogue between academics and software practitioners is needed to ensure that practical material with a sound engineering foundation is easily available in a form that is suitable for inclusion in undergraduate curricula.

It is clear more testing should be taught; exactly how much depends on factors such as the degree being offered and what the students expect to be doing upon graduation. We believe students do not reach a sufficient level of knowledge about V&V in most current computer degree programs to prepare them for software development.

The long-range goal should be to reduce the 50% of project resources typically devoted to testing. Better education in software engineering will contribute to that goal, but it must include significant improvement in how testing is taught.


References

1. Beck, K. Extreme Programming Explained. Addison-Wesley, Reading, MA, 2000.

2. Butler, R.W., and Finelli, G.B. The infeasibility of quantifying the reliability of life-critical real-time software. IEEE Trans. Softw. Eng. 19, 1 (Jan. 1993), 3–12.

3. Currit, P.A., Dyer, M., and Mills, H. Certifying the reliability of software. IEEE Trans. Softw. Eng. (Mar. 1989), 362.

4. Cusumano, M., and Selby, R. Microsoft Secrets: How the World's Most Powerful Software Company Creates Technology, Shapes Markets, and Manages People. Simon & Schuster, New York, NY, 1995.

5. Fagan, M.E. Design and code inspections to reduce errors in program development. IBM Syst. J. 15, 3 (1976), 182–211.

6. Hoffman, D. and Strooper, P. Software Design, Automated Testing, and Maintenance: A Practical Approach. International Thomson Computer Press, 1995.

7. Johnson, P.M. Reengineering Inspection. Commun. ACM 41, 2 (Feb. 1998), 49–52.

8. Kelly, D., and Shepard, T. Task-directed software inspection technique: An experiment and case study. IBM CASCON 2000 (Toronto, Nov. 2000).

9. Naur, P., and Randell, B. Software Engineering. Report on a Conference sponsored by the NATO Science Committee. (Oct. 1968) Garmisch, Germany.

10. Russell, G.W. Experience with inspection in ultralarge-scale developments. IEEE Softw. (Jan. 1991), 25–31.

11. Shepard, T. Software engineering in an undergraduate computer engineering program. In Proceedings of the 7th SEI Conference on Software Engineering Education (San Antonio, Jan. 5-7, 1994), 23–34.

12. Shepard, T. On teaching software verification and validation. In Proceedings of the 8th SEI Conference on Software Engineering Education. (New Orleans, 1995), 375–386.


Authors

Terry Shepard ([email protected]) is a professor in the Department of Electrical and Computer Engineering at the Royal Military College of Canada, Kingston, Ontario, Canada.

Margaret Lamb ([email protected]) is an adjunct lecturer in the Department of Computing and Information Science at Queen's University, Kingston, Ontario, Canada.

Diane Kelly ([email protected]) is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the Royal Military College of Canada, Kingston, Ontario; and an adjunct instructor in the Department of Computing and Information Science at Queen's University, Kingston, Ontario, Canada.



©2001 ACM  0002-0782/01/0600  $5.00
