Making System User Interactive Tests Repeatable: When and What Should We Control?
Zebao Gao, Yalan Liang, Myra B. Cohen, Atif M. Memon and Zhen Wang
Supplementary Data -- ICSE 2015
Abstract
System testing and invariant detection are usually conducted from the user interface perspective when the goal is to evaluate the behavior of an application as a whole. A large number of tools and techniques have been developed to generate and automate this process, many of which have been evaluated in the literature or internally within companies. Typical metrics for determining the effectiveness of these techniques include code coverage and fault detection; however, this assumes that there is determinism in the resulting outputs. In this paper we examine the extent to which a common set of factors, such as the system platform, Java version, application starting state, and tool harness configurations, impact these metrics. We examine three layers of testing outputs: the code layer, the behavioral (or invariant) layer, and the external (or user interaction) layer. In a study using five open source applications across three operating system platforms, manipulating several factors, we observe as many as 184 lines of code coverage difference between runs using the same test cases in the worst case, and up to 96 percent false positives with respect to fault detection. We also see some (although less) variation among the invariants inferred. Despite our best efforts, we can reduce, but not completely control, all possible variation in the output. We use our findings to provide a set of best practices that should lead to better consistency and smaller differences in outcomes, allowing more repeatable and reliable testing and experimentation.
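As a rough illustration of the code-layer comparison described in the abstract, the sketch below shows one way a line-coverage difference between runs of the same test case could be computed. It is only a sketch under stated assumptions: it treats each run as a set of (file, line) pairs and counts the lines covered in exactly one of two runs. The function names and the sample data are hypothetical, and the paper's exact metric may be defined differently.

```python
from itertools import combinations


def coverage_difference(run_a, run_b):
    """Return the lines covered in exactly one of the two runs.

    Assumption: each run is a set of (file, line) pairs; this is a
    hypothetical representation, not the format used by the authors.
    """
    return set(run_a) ^ set(run_b)


def max_pairwise_difference(runs):
    """Largest number of differing covered lines over all pairs of runs."""
    return max(
        (len(coverage_difference(a, b)) for a, b in combinations(runs, 2)),
        default=0,
    )


# Hypothetical coverage from three runs of the same test case.
runs = [
    {("Main.java", 10), ("Main.java", 11), ("Main.java", 12)},
    {("Main.java", 10), ("Main.java", 11), ("Timer.java", 40)},
    {("Main.java", 10), ("Main.java", 11), ("Main.java", 12)},
]

print(max_pairwise_difference(runs))  # prints 2
```

A deterministic test case would give a difference of zero for every pair of runs. Any nonzero value points to a factor (platform, Java version, starting state, or harness configuration) that was not controlled.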
Experiment Settings
Supplemental Data
The following table shows the results of 5 applications in different configurations across different platforms.
No.  Program Name  Version         Lines of Code  Number of Windows  Number of Events  Configurations
1    Rachota       2.3                     8,803                 10               149
2    Buddi         3.4.0.8                 9,588                 11               185
3    JabRef        2.10b2                 32,032                 49               680
4    JEdit         5.1.0                  55,006                 20               457
5    DrJava        20130901-r5756         92,813                 25               305

This research was supported in part by the National Science Foundation awards CCF-1161767, CNS-1205472, and CNS-1205501. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.