Mining Ultra-Large-Scale Software Repositories
Consider answering a question such as "what is the average number of changed files per revision (churn rate) for all projects?" Answering this question ordinarily requires knowledge of (at a minimum):
- mining project metadata,
- mining code repository locations,
- how to access those code repositories,
- additional filtering code,
- controller logic,
- ...
Solving this task in Boa is much easier:
# what are the churn rates for all projects
p: Project = input;
counts: output mean[string] of int;

visit(p, visitor {
    before node: Revision -> counts[p.id] << len(node.files);
});
First, we declare the input to be of type Project and give it an alias. Next, we declare the output, which accepts integer values and computes the mean of all the integers it receives, indexed by a string. Then we visit the input data, including each code repository and each revision in those repositories. Finally, when we see a Revision, we send the number of files changed in that revision to the output variable, indexed by the project's id. The resulting mean for each project is its churn rate.
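To make the semantics of the query concrete, the following is a minimal plain-Python sketch of what it computes on a small in-memory dataset. It is not how Boa is implemented: the `MeanAggregator` class, the `churn_rates` function, and the toy `projects` data are all hypothetical names introduced for illustration, standing in for the `output mean[string] of int` declaration and the visitor traversal.

```python
from collections import defaultdict

# Hypothetical toy data (an assumption for illustration): each project id maps
# to a list of revisions, and each revision is the list of files it changed.
projects = {
    "proj-a": [["a.java", "b.java"], ["a.java"], ["c.java", "d.java", "e.java"]],
    "proj-b": [["README.md"], ["main.c", "util.c", "util.h", "Makefile"]],
}


class MeanAggregator:
    """Keyed running mean, standing in for `output mean[string] of int`."""

    def __init__(self):
        self._sum = defaultdict(int)
        self._count = defaultdict(int)

    def emit(self, key, value):
        # Plays the role of `counts[key] << value` in the Boa query.
        self._sum[key] += value
        self._count[key] += 1

    def result(self):
        # One mean per key that received at least one value.
        return {key: self._sum[key] / self._count[key] for key in self._sum}


def churn_rates(projects):
    counts = MeanAggregator()
    for project_id, revisions in projects.items():
        # Plays the role of `before node: Revision -> ...`.
        for changed_files in revisions:
            counts.emit(project_id, len(changed_files))
    return counts.result()


rates = churn_rates(projects)
print(rates)
```

In this sketch the loops over projects and revisions are written out by hand, whereas in the Boa query the traversal is driven by the `visit` call, so the query author only writes the body of the visitor.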
Boa has over 1,400 registered users from 36 countries. Boa has been used in over 50 research papers and 15 theses.
Related Publications
- MSR: Boa Views: Easy Modularization and Sharing of MSR Analyses Che Shian Hung, . June 29, 2020. Acceptance rate: 45/171 (26.32%)
- Boa Views: Enabling Modularization and Sharing of Boa Queries Che Shian Hung. August 1, 2019. A master's thesis at Bowling Green State University.
- An Investigation of Routine Repetitiveness in Open-Source Projects Mohd Arafat. August 1, 2018. A master's thesis at Bowling Green State University.
- TOSEM: Boa: Ultra-Large-Scale Software Repository and Source-Code Mining Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. December 1, 2015.
- SPLASH: Demonstrating Programming Language Feature Mining Using Boa Hridesh Rajan, Tien N. Nguyen, Hoan Anh Nguyen. October 30, 2015.
- Boa: an Enabling Language and Infrastructure for Ultra-large Scale MSR Studies Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. September 15, 2015. The Art and Science of Analyzing Software Data.
- Bringing Ultra-Large-Scale Software Repository Mining to the Masses With Boa . December 21, 2013. A Ph.D. dissertation at Iowa State University.
- SPLASH: Mining Source Code Repositories with Boa Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. October 31, 2013.
- SPLASH SRC: Task Fusion: Improving Utilization of Multi-user Clusters . October 31, 2013.
- GPCE: Declarative Visitors to Ease Fine-grained Source Code Mining with Full History on Billions of AST Nodes Hridesh Rajan, Tien N. Nguyen. October 27, 2013. Acceptance rate: 20/57 (35.09%)
- ICSE: Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. May 23, 2013. Acceptance rate: 85/461 (18.43%)
- SPLASH: Analyzing Ultra-Large-Scale Code Corpus with Boa Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. October 22, 2012.
- SPLASH: Boa: Analyzing Ultra-Large-Scale Code Corpus Hoan Anh Nguyen, Hridesh Rajan, Tien N. Nguyen. October 22, 2012.
Collaborators
- Ganesha Upadhyaya, Senior Research Engineer, Celestia
Students
- Mohd Arafat, BGSU, Master's. Graduated: August 2018.
- Neha Bhide, BGSU, Master's. Graduated: May 2016. Project: A Domain Specific Language for Inherited Visitor Attributes
- Che Shian Hung, BGSU, Master's. Graduated: August 2019.
- Kaushik Nimmala, BGSU, Master's. Graduated: August 2017. Project: Shadow Types: Automatically Mapping Object-Oriented Types to Discriminated Unions
- Brian Sigurdson, BGSU, Master's. Graduated: August 2018. Project: Boidae: Your Personal Mining Platform
- Farheen Sultana, BGSU, Master's. Graduated: August 2016. Project: A Comparative Study of Actual and Potential Usage of Java Language Features
- Jingyi Su, BGSU, Master's. Graduated: December 2017. Project: Supporting Specifications in Boa's Data Types