
Regression Testing in Continuous and Large-Scale Environments (FSE 2014)

Gregg Rothermel, John Penix, and I co-authored a paper entitled “Techniques for Improving Regression Testing in Continuous Integration Development Environments,” which has been accepted to FSE 2014. The paper describes the new regression testing challenges that arise in large-scale, rapidly changing environments, along with some initial solutions for dealing with them. It reflects some of what I experienced and observed during my sabbatical at Google and, better yet, it uses the Google Data Set that emerged from that experience.

Abstract:
In continuous integration development environments, software engineers frequently integrate new or changed code with the mainline codebase. This can reduce the amount of code rework that is needed as systems evolve and speed up development time. While continuous integration processes traditionally require that extensive testing be performed following the actual submission of code to the codebase, it is also important to ensure that enough testing is performed prior to code submission to avoid breaking builds and delaying the fast feedback that makes continuous integration desirable. In this work, we present algorithms that make continuous integration processes more cost-effective. In an initial phase of testing prior to code submission, developers specify related modules to be tested, and we use regression test selection techniques to select a subset of the test suites for those modules that render that phase more cost-effective. In a second phase of testing following code submission, where dependent modules as well as changed modules are tested, we use test case prioritization techniques to ensure that failures are reported more quickly. In both cases, the techniques we utilize are novel, involving algorithms that are relatively inexpensive and do not rely on code coverage information, two requirements for conducting testing cost-effectively in this context. To evaluate our approach, we conducted an empirical study on a large data set from Google that we make publicly available. The results of our study show that our selection and prioritization techniques can each lead to cost-effectiveness improvements in the continuous integration process.
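
To give a flavor of what coverage-free selection and prioritization might look like, here is a minimal Python sketch. It is an illustration only, not the algorithms from the paper: it assumes that each test suite's recent execution history (when it last ran and when it last failed) is all we have to go on, and every name in it (`SuiteHistory`, `select_suites`, `prioritize_suites`, `fail_window`, `run_window`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuiteHistory:
    """Hypothetical record of a test suite's recent history (no coverage data)."""
    name: str
    last_failure: Optional[int]  # time step of the most recent failure, None if never failed
    last_run: Optional[int]      # time step of the most recent execution, None if never run


def recently_failed(s: SuiteHistory, now: int, fail_window: int) -> bool:
    """True if the suite failed within the last `fail_window` time steps."""
    return s.last_failure is not None and now - s.last_failure <= fail_window


def select_suites(suites: List[SuiteHistory], now: int,
                  fail_window: int, run_window: int) -> List[SuiteHistory]:
    """Pre-submit phase (assumed heuristic): keep suites that are new,
    failed recently, or have not been run within `run_window` time steps."""
    selected = []
    for s in suites:
        is_new = s.last_run is None
        is_stale = s.last_run is not None and now - s.last_run > run_window
        if is_new or is_stale or recently_failed(s, now, fail_window):
            selected.append(s)
    return selected


def prioritize_suites(suites: List[SuiteHistory], now: int,
                      fail_window: int) -> List[SuiteHistory]:
    """Post-submit phase (assumed heuristic): run recently failed suites first.
    sorted() is stable, so the original order is preserved within each group."""
    return sorted(suites, key=lambda s: not recently_failed(s, now, fail_window))


# Made-up example data for illustration only.
suites = [
    SuiteHistory("A", last_failure=None, last_run=9),
    SuiteHistory("B", last_failure=8, last_run=9),
    SuiteHistory("C", last_failure=None, last_run=None),
    SuiteHistory("D", last_failure=1, last_run=2),
]
selected = select_suites(suites, now=10, fail_window=3, run_window=5)
ordered = prioritize_suites(suites, now=10, fail_window=3)
print([s.name for s in selected])  # ['B', 'C', 'D']
print([s.name for s in ordered])   # ['B', 'A', 'C', 'D']
```

Both functions need only a couple of integer comparisons per suite, which is the kind of low overhead the abstract calls for. The window sizes and the example data are invented for this sketch.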
