I read "Automated Program Repair" with interest (Dec. 2019, p. 56–65). This is exciting technology that, if successful, holds out the promise of substantially improving software quality. While the article highlights systems developed by the first and third authors (GenProg, SemFix, Angelix), it omits quantitative data that can provide a more complete picture of the capabilities of extant program repair systems. My hope is this quantitative data can help researchers and practitioners better understand the capabilities and current limitations of this promising technology.
The most complete evaluation of the GenProg system was reported in Le Goues et al.,1,2 which examines results for a superset of the defects originally considered in Le Goues et al.3 Unfortunately, as reported in Qi et al.7 and communicated to the authors of Le Goues et al.3 in fall of 2014, the experimental setup contains a variety of test harness and test script issues. When these issues are corrected, the results show that GenProg does not fix 55 of 105 bugs, as one might reasonably expect from reading the title of the article. Instead, GenProg fixes only two bugs, highlighting the remarkable ineffectiveness of GenProg as an automatic patch generation system. Moreover, only 69 of the reported 105 bugs are bugs; the remaining 36 are deliberate functionality changes.