Robert Dyer Robert Dyer

Assistant Professor
School of Computing
University of Nebraska–Lincoln

EmailGitHubLinkedInCVGoogle Scholar

 NEW Leveraging Code Embeddings to Identify and Address Blind Spots in Benchmark Creation

Published: April 8, 2026
A master's thesis at University of Nebraska-Lincoln

Formal software verification remains critical for early vulnerability detection, yet benchmarking these tools is costly and often reliant on centralized datasets such as SV-COMP. While such repositories enable standardized evaluation, they introduce risks of overfitting and bias, particularly due to first-party benchmark contributions. To address these limitations, we extend ARG-V, our tool for generating SV-COMP-compatible benchmarks from real-world Java code, with a novel approach of using code embedding techniques to selectively sample from mined code. By leveraging Nomic Embed Code and a cosine-based Minimum Hyperspherical Energy (MHE) objective, we systematically select and transform benchmarks from scraped GitHub code that maximize embedding diversity and minimize redundancy with existing corpora. From 5,100 candidate files, we identify 86 new benchmarks and evaluate them against the state-of-the-art in Java verification. We demonstrate that the proposed benchmarks significantly increase verification difficulty for all four leading Java verifiers at SV-COMP 2026, observing a notable decline in verifier accuracy for all verifiers across assertion safety and exception detection tasks. Qualitative analysis further demonstrates that the benchmarks selected with MHE address numerous “blind spots” for Java features in the SV-COMP corpus, including lambdas, streams, and GUI libraries. These findings highlight the potential of embedding-driven benchmark selection to improve the diversity, realism, and evaluation rigor within software verification.

Slides Preview

Download Download


Students



 Back to all publications