Title
Scaling up the Fitness Function for Reverse Engineering Feature Models
Thammasak Thianniwet and Myra B. Cohen
Supplementary Data for -- SSBSE 2016
Abstract
In recent research on software product line engineering, sev- eral frameworks have been developed to reverse engineer feature models using a genetic algorithm. In these frameworks the starting point for the search is either a set of Boolean equations or a set of valid products representing the software product line. The most successful fitness func- tions aim to maximize the number of matched products while reducing the number of missing and/or additional products. To calculate this fit- ness, each valid product defined by the model is enumerated using a SAT solver. In attempting to reproduce the results of these frameworks, we have discovered that once the size of the feature models increase beyond 27 features, the enumeration step becomes a bottleneck and the search either runs out of memory or times out. In this paper we propose a new fitness function, SATff, that simulates validity by computing the tauto- logical implication of the sets of constraints in each model against the original, estimating the distance between the two. In an empirical study on 101 feature models we evaluate the quality of the new fitness function compared with two existing fitness functions that use the enumeration technique. We see that SATff shows a significant improvement over one of these across all models, and no significant difference with the other one, suggesting a similar effectiveness. However, SATff requires only 7% of the runtime on average, scaling to feature models with as many as 97 features.
Subjects Used

We used 101 subjects for our experiments, categorized into several groups as bellows.

  1. Small - 22 models
  2. Medium - 54 models
  3. Large - 15 models
  4. XLarge - 9 models
  5. C code - 1 model
  6. Total: 101 models download
Each model has two files *.cfm and *.sfm. *.cfm is an extended (attributed) FaMa feature model [1]. *.sfm is a feature model used in our framework (same as it is used in [2]). The C code model was extracted from nano-2.4.2 [3] using FARCE [4]. The rest of models are originally from SPLOT feature model repository [5]. The Small, Medium, and Large models were used in ETHOM study [6]. Pleaes see subjects.xlsx for more information about the models download.

Supplementary Data

Here is the result from our experiment.

References
  1. Benavides, D., Segura, S., Trinidad, P., Ruiz-cortés, A.: FAMA, a framework for automated analyses of feature models. http://http://www.isa.us.es/fama/ (2014)
  2. N. Andersen, K. Czarnecki, S. She, and A. W, asowski. Efficient synthesis of feature models. In Proceedings of the 16th International Software Product Line Conference - Volume 1, SPLC '12, pages 106-115, 2012.
  3. http://www.nano-editor.org
  4. Nadi,S.,Berger,T.,Kästner,C.,Czarnecki,K.:Where do configuration constraints stem from? an extraction approach and an empirical study. IEEE Transactions on Software Engineering (2015)
  5. Mendonca, M., Branco, M., Cowan, D.: S.P.L.O.T.: Software product lines on- line tool. http://www.splot-research.org/ (2014), Generative Software Develop- ment Lab. Computer Systems Group, University of Waterloo
  6. Lopez-Herrejon, R.E., Galindo, J.A., Benavides, D., Segura, S., Egyed, A.: Reverse engineering feature models with evolutionary algorithms: An exploratory study. In: Symposium on Search Based Software Engineering, Lecture Notes in Computer Science, vol. 7515, pp. 168–182. Springer Berlin Heidelberg (2012)
Acknowledgments

This research was supported in part by the National Science Foundation award CCF-1161767. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Air Force Office of Scientific Research.