Searching for relevant code is a common task among programmers, one that is becoming more crucial to their productivity as open code repositories grow in size and programmers have the opportunity to reuse the resources therein. Yet, the code search tools for programmers to leverage such massive repositories have barely evolved in the last decade. Programmers still search by specifying keywords, an approach that may perform very inconsistently as it depends on the programmer’s ability to provide terms that match how the programs in the repository are implemented or documented. This is problematic because a slightly wrong keyword could miss relevant results and return mostly spurious results, reducing programmer’s productivity.
This research aims to address these limitations by exploring a code search that is novel in two aspects. First, it allows programmers to search for code by specifying desired code behavior in the form of inputs and outputs. This removes the need for the programmer to imagine how the solution to a problem may have been implemented, and instead the programmer can concentrate on defining what the software should do. Second, given the inputs and outputs describing the desired behavior and the set of programs in a code repository, both automatically encoded as constraints, the proposed approach employs a
constraints solver to identify what programs in the repository could satisfy the programmers’ specifications, effectively solving the search. This guarantees the search results behave as specified. Developing this approach requires tackling several fundamental challenges including the definition of mappings to automatically and efficiently encode programs as constraints so that solvers can find suitable solutions. Systematic strategies must also be developed to refine the programmers’ specifications when they are too weak and return too many matches, and to relax the constraints in order to find partial matches when exact ones are not available. Techniques to support the composition of partial matches will also be necessary to scale the approach to larger and more diverse search queries. Last, infrastructure must be built and studies conducted to determine whether the proposed approach can be cost-effective in practice.
If successful, this research will change the way programmers operate, directly impacting their productivity by enabling them to truly leverage existing code in increasingly rich code repositories.
This very cool project is being supported by NSF and Google.