Automated Patch Assessment for Program Repair at Scale

He Ye, Matias Martinez, Martin Monperrus

subject

FOS: Computer and information sciences; Computer Science - Software Engineering (cs.SE); Program repair; Patch assessment; Ground truth; Oracle; Random testing; Correctness; Overfitting; External validity; Machine learning; Software engineering

description

In this paper, we perform automated correctness assessment of patches generated by program repair systems. We use the human-written patch as the ground-truth oracle and randomly generate tests based on it, a technique proposed by Shamshiri et al. that we call Random testing with Ground Truth (RGT) in this paper. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems, and we evaluate automated patch assessment on this dataset. The results of this study are novel and significant. First, we improve the state-of-the-art performance of automated patch assessment with RGT by 190% by improving the oracle. Second, we show that RGT is reliable enough to help scientists perform overfitting analysis when they evaluate program repair systems. Third, we improve the external validity of program repair knowledge with the largest study of its kind to date.
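To make the RGT idea concrete, the following is a minimal Java sketch, not the authors' implementation: all names (RgtSketch, isLikelyOverfitting, groundTruth, candidate) are hypothetical, the programs under assessment are reduced to integer functions, and real RGT uses automated test generation tools over Defects4J programs rather than uniform random integers. The human-written patch plays the role of the oracle; a machine-generated patch whose behavior diverges from that oracle on any generated test is classified as overfitting.

import java.util.Random;
import java.util.function.Function;

// Hypothetical sketch of RGT-style patch assessment.
// groundTruth: the human-written patch, used as the test oracle.
// candidate:   a machine-generated patch under assessment.
public class RgtSketch {
    static boolean isLikelyOverfitting(Function<Integer, Integer> groundTruth,
                                       Function<Integer, Integer> candidate,
                                       int numTests, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < numTests; i++) {
            int input = random.nextInt();                // random test input
            Integer expected = groundTruth.apply(input); // oracle output
            Integer actual = candidate.apply(input);
            if (!expected.equals(actual)) {
                return true; // behavioral divergence: patch is overfitting
            }
        }
        return false; // no divergence observed: patch assessed as correct
    }

    public static void main(String[] args) {
        // Toy example: the oracle computes abs(x); the candidate patch
        // only handles the non-negative inputs it was tested on.
        Function<Integer, Integer> groundTruth = x -> Math.abs(x);
        Function<Integer, Integer> candidate = x -> x;
        System.out.println(isLikelyOverfitting(groundTruth, candidate, 1000, 42L));
        // Prints "true": a random negative input exposes the divergence.
    }
}

Note that RGT can only prove a patch incorrect (one failing test suffices); a patch that passes all generated tests is assessed as correct with respect to the generated tests, not proven correct.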

10.1007/s10664-020-09920-w
http://arxiv.org/abs/1909.13694