acm-header
Sign In

Communications of the ACM

Research highlights

Technical Perspective: Example-Driven Program Synthesis For End-User Programming


Understanding how to get a computer to perform a given task is a central question in computer science. For many years the standard answer has been to use a programming language to write a program the computer will then execute to accomplish the task.

An intriguing alternative, however, is to provide the computer with examples of inputs and corresponding outputs, then have the computer automatically generalize the examples to produce a program that performs the desired task for all inputs. Researchers have worked on this approach for decades, first in the LISP community,4 then later in the inductive logic programming community,1,2,3 to name two prominent examples. Given the relatively modest size of the programs the resulting techniques are able to produce, the field has evolved to focus largely on data mining, concept learning, knowledge discovery, and other applications (as opposed to mainstream software development).

The following paper focuses on an important emerging area—end user programming. As information technology has come to permeate our society, broader classes of users have developed the need for more sophisticated data manipulation and processing. While users in the past were satisfied with relatively simple interactive models of computation such as spreadsheets and other business applications, current users are now looking to automate custom data manipulations such as reformatting, reorganizing, simple calculations, or data cleaning. While such users may have a good command of the interactive functionality of their application, they often lack the expertise, time, or inclination to develop software specifically for their task.

The authors illustrate how to apply example-driven program synthesis to automate spreadsheet computations. This work, therefore, is of interest to the millions of people worldwide who use spreadsheets. The methodology consists of four basic steps:

  • Domain-specific language: Develop a domain-specific language capable of representing the desired set of computations.
  • Data structure: Develop a data structure that can efficiently represent the large set of programs that are consistent with a given input/output example.
  • Learn and intersect: Generate data structures for representing the programs consistent with each individual input/output example, then intersect the data structures to obtain a representation of the programs consistent with all examples.
  • Rank the resulting set of programs, preferring more general programs over less general programs. Users can then view the results of the ranked programs on different inputs to guide the program selection process.

This approach effectively addresses many of the issues that complicate example-driven approaches. Domain-specific languages help focus the synthesis process by avoiding the generality of standard programming languages (that can produce very large program search spaces that are intractable to manipulate efficiently). Compact program representations make it possible to manipulate large numbers of programs efficiently. An effective ranking algorithm helps users quickly identify a desirable program (out of the potentially unbounded number of programs that are consistent with the provided examples). And the interactive program evaluation mechanisms (automatic identification of inputs on which candidate programs produce different outputs) help users navigate the space of synthesized programs.

The authors have successfully applied this approach to three classes of spreadsheet programs: syntactic string manipulations (such as converting telephone numbers into a standard format), semantic string manipulations that make use of some background system knowledge stored as relational tables (such as transforming strings in inventory tables or transforming dates), and table layout computations (which reorganize data stored in tables). All of these systems have been successfully integrated with Microsoft Excel and have been tested on multiple examples from Excel help forums.

In the future, the need for users to obtain ever more sophisticated custom behavior will only increase. Given the significant obstacles that traditional programming approaches pose for the vast majority of users, we can expect to see a proliferation of solutions that help non-programmers accomplish software development tasks that have traditionally been the exclusive domain of software professionals. Given the difficulty of specifying and implementing large software systems, these solutions will (at least initially) focus on the automatic generation of relatively small but still useful solutions to everyday problems. This paper provides an outstanding example of the kinds of useful solutions that non-programmers can now automatically obtain and demonstrates the kind of sophisticated implementation techniques that will make such automatic program synthesis systems feasible for a variety of problem domains.

Back to Top

References

1. Lavrac, N. and Dzeroski, S. Inductive Logic Programming—Techniques and Applications. Ellis Horwood, NY, 1994.

2. Muggleton, S. and De Raedt, L. Inductive logic programming: Theory and methods. Journal of Logic Programming 19, 20 (1994), 629–679.

3. Muggleton, S., De Raedt, L., Poole, D., Bratko, I., Flach, P.A., Inoue, K. and Srinivasan, A. ILP turns 20—Biography and future challenges. Machine Learning 86, 1 (2012), 3–23.

4. Smith, D.R. The synthesis of LISP programs from examples: A survey. Automatic Program Construction Techniques, A.W. Biermann, G. Guiho. and Y. Kodratoff, Eds. Macmillan, 1984, 307–324.

Back to Top

Author

Martin C. Rinard ([email protected]) is a professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, Cambridge, MA.


©2012 ACM  0001-0782/12/0800  $10.00

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2012 ACM, Inc.


 

No entries found

Sign In for Full Access
» Forgot Password? » Create an ACM Web Account
Article Contents: