An Investigation of Routine Repetitiveness in Open-Source Projects
Many programming languages provide a way to package a portion of source code that performs a specific, often self-contained behavior. Depending on the language, this unit is called a (sub)routine, method, function, or procedure. A main purpose of creating a routine is reuse: routines are designed to be called from multiple places within a program. Sometimes, however, the same code is repeated within a project or across projects. In this work, we investigate how often such routines are repeated in a large-scale corpus of open-source software. We attempt to independently reproduce a prior research result by Nguyen et al., building the analysis framework from the ground up and analyzing a different, very large set of open-source projects. We use the Boa infrastructure to study routine repetitiveness in over 300k open-source projects from GitHub. As in the prior work, we first compute the program dependence graph (PDG) of each routine in the dataset, normalize the PDGs, and look for repetitions both within and across projects. Our experiment shows that about 16.4% of routines repeat within a project and approximately 11% repeat across at least two different projects. We then perform static program slicing on the PDGs, slicing each graph on every routine argument to obtain subroutines, and look for repetitiveness again. We observe that approximately 17% of all subroutines repeat within a project and 11% repeat across projects. Finally, we investigate whether the size of the PDG or the number of control nodes affects the repetitiveness of routines. Overall, our results confirm the trends shown in the prior study, though the magnitudes of the results differ.
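The measurement described above (match normalized PDGs, count repeats within and across projects, then repeat the count on argument slices) can be illustrated with a small Python sketch. This is not the Boa-based implementation used in the study; it is a hypothetical illustration under stated assumptions: the `PDG` class and the `signature`, `forward_slice`, and `repetitiveness` functions are invented names, node labels are assumed to be already normalized, two graphs are treated as the same when a label-multiset signature matches (an approximation of graph isomorphism, not an exact test), and "slicing on an argument" is assumed to mean a forward slice from that argument's node.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PDG:
    """Hypothetical normalized PDG (an assumption, not Boa's representation)."""
    labels: dict[int, str]  # node id -> normalized label, e.g. "PARAM", "CALL"
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, kind)


def signature(pdg: PDG, nodes: set[int] | None = None) -> tuple:
    """Signature of the (sub)graph induced by `nodes`: sorted node labels plus
    sorted labeled edges. Isomorphic graphs always get equal signatures, but
    distinct graphs may collide, so this only approximates isomorphism."""
    if nodes is None:
        nodes = set(pdg.labels)
    node_part = tuple(sorted(pdg.labels[n] for n in nodes))
    edge_part = tuple(sorted(
        (pdg.labels[s], pdg.labels[d], kind)
        for s, d, kind in pdg.edges
        if s in nodes and d in nodes
    ))
    return (node_part, edge_part)


def forward_slice(pdg: PDG, start: int) -> set[int]:
    """Assumed slicing criterion: every node reachable from `start`
    (e.g. an argument node) along data or control dependence edges."""
    successors = defaultdict(list)
    for s, d, _ in pdg.edges:
        successors[s].append(d)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def repetitiveness(routines: list[tuple[str, tuple]]) -> tuple[float, float]:
    """Given (project, signature) pairs, return the fraction of routines that
    repeat within their own project and the fraction whose signature occurs
    in at least two different projects."""
    if not routines:
        return 0.0, 0.0
    per_project = Counter(routines)  # occurrences of each (project, signature)
    projects_of = defaultdict(set)   # signature -> projects containing it
    for project, sig in routines:
        projects_of[sig].add(project)
    within = sum(1 for r in routines if per_project[r] >= 2)
    across = sum(1 for _, sig in routines if len(projects_of[sig]) >= 2)
    return within / len(routines), across / len(routines)


if __name__ == "__main__":
    # g1 and g1b differ only in node numbering, so their signatures match.
    g1 = PDG({0: "PARAM", 1: "CALL", 2: "RETURN"}, [(0, 1, "data"), (1, 2, "ctrl")])
    g1b = PDG({5: "PARAM", 7: "CALL", 9: "RETURN"}, [(5, 7, "data"), (7, 9, "ctrl")])
    g2 = PDG({0: "PARAM", 1: "RETURN"}, [(0, 1, "data")])
    routines = [("A", signature(g1)), ("A", signature(g1b)),
                ("B", signature(g1)), ("B", signature(g2))]
    print(repetitiveness(routines))  # (0.5, 0.75)
    # Subroutine obtained by slicing g1 on its argument node 0.
    print(signature(g1, forward_slice(g1, 0)))
```

In this toy input, two of the four routines repeat inside project A, and three share a signature that appears in both A and B. The same `repetitiveness` call can be applied to slice signatures to obtain subroutine-level figures; a real analysis would need a stronger matching criterion than this label-multiset signature.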
Students
- BGSU, Master's, Graduated: August 2018