Mining Ultra-Large-Scale Software Repositories

Consider answering a question such as "what is the average number of changed files per revision (the churn rate) for all projects?" Answering this question ordinarily requires knowledge of (at a minimum):

  • how to mine project metadata,
  • how to locate code repositories,
  • how to access those code repositories,
  • how to write additional filtering code,
  • how to write the controller logic,
  • ...

Solving this task in Boa is much easier:

# what are the churn rates for all projects
p: Project = input;
counts: output mean[string] of int;

visit(p, visitor {
	before node: Revision -> counts[p.id] << len(node.files);
});

First, we declare the input to be of type Project and give it the alias p. Next, we declare the output variable counts, which accepts values of type int, is indexed by a string, and computes the mean of all values it receives. Then we visit the input data, traversing each code repository and each revision in those repositories. Finally, whenever the traversal reaches a Revision, we send the number of files changed in that revision to the output variable, indexed by the project's id. The resulting mean per project is its churn rate.
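The same pattern generalizes to other metrics by changing the aggregator or the value that is emitted. As a rough sketch (our own illustrative example, not taken from the Boa documentation; the name revCounts is arbitrary), using a sum aggregator and emitting 1 per revision counts the revisions in each project instead:

# how many revisions does each project have? (illustrative sketch)
p: Project = input;
revCounts: output sum[string] of int;

visit(p, visitor {
	before node: Revision -> revCounts[p.id] << 1;
});

In both cases the output variable is write-only from the program's perspective: values sent to it with << are combined by the declared aggregator, which is what lets Boa evaluate the query over many projects in parallel.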

Boa has over 1,400 registered users from 36 countries, and it has been used in over 50 research papers and 15 theses.