Enabling Efficient Storage of Git Repositories in PAClab
A current trend in static analysis is to use open source platforms such as GitHub to build benchmark program suites for evaluating static analysis tools. Programs collected from GitHub often must be transformed by hand before these tools can process them. PAClab is an online platform that performs these code transformations automatically. PAClab stores a base selection of Git projects that expands over time, which creates a need to maintain multiple versions of each Git project efficiently. In this work, we propose a solution for storing multiple versions of a Git project efficiently in the PAClab system. We model Git projects as a graph and focus on reducing the graph size and removing node duplication. We found that filtering projects reduced the storage size of 2,955 Git projects by 96%. Compared with a naive approach, our approach also used around 6x less space when storing a week's worth of snapshots.
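The idea of removing node duplication across snapshots can be illustrated with a minimal sketch. It is not PAClab's implementation: the `DedupStore` class, the `naive_bytes` helper, and the sample file contents are all hypothetical, and a content-addressed store keyed by SHA-256 stands in for whatever graph representation the work actually uses. Each snapshot is recorded as a manifest of paths pointing to content digests, so a file that is unchanged between snapshots is stored only once.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}      # digest -> file content, stored once
        self.snapshots = {}  # snapshot name -> {path: digest}

    def add_snapshot(self, name, files):
        """Record a snapshot given as a {path: bytes} mapping."""
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Identical content hashes to the same key, so it is kept once.
            self.blobs.setdefault(digest, content)
            manifest[path] = digest
        self.snapshots[name] = manifest

    def restore(self, name):
        """Rebuild the {path: bytes} mapping for a stored snapshot."""
        manifest = self.snapshots[name]
        return {path: self.blobs[digest] for path, digest in manifest.items()}

    def stored_bytes(self):
        """Total size of the unique blobs held by the store."""
        return sum(len(content) for content in self.blobs.values())


def naive_bytes(snapshots):
    """Size when every snapshot keeps a full copy of every file."""
    return sum(len(c) for files in snapshots.values() for c in files.values())


# Two made-up daily snapshots in which only one file changes.
snapshots = {
    "day1": {"a.java": b"class A {}", "b.java": b"class B {}"},
    "day2": {"a.java": b"class A {}", "b.java": b"class B { int x; }"},
}

store = DedupStore()
for name, files in snapshots.items():
    store.add_snapshot(name, files)

print("naive:", naive_bytes(snapshots), "deduplicated:", store.stored_bytes())
```

In this toy example the unchanged `a.java` is stored once rather than once per snapshot, which is the kind of saving that grows as more snapshots of a slowly changing project accumulate.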
Students
- BGSU, Master's. Graduated: August 2020