NEW Leveraging Code Embeddings to Identify and Address Blind Spots in Benchmark Creation

Published: April 8, 2026

A master's thesis at University of Nebraska-Lincoln

Formal software verification remains critical for early vulnerability detection, yet benchmarking these tools is costly and often reliant on centralized datasets such as SV-COMP. While such repositories enable standardized evaluation, they introduce risks of overfitting and bias, particularly due to first-party benchmark contributions. To address these limitations, we extend ARG-V, our tool for generating SV-COMP-compatible benchmarks from real-world Java code, with a novel approach of using code embedding techniques to selectively sample from mined code. By leveraging Nomic Embed Code and a cosine-based Minimum Hyperspherical Energy (MHE) objective, we systematically select and transform benchmarks from scraped GitHub code that maximize embedding diversity and minimize redundancy with existing corpora. From 5,100 candidate files, we identify 86 new benchmarks and evaluate them against the state-of-the-art in Java verification. We demonstrate that the proposed benchmarks significantly increase verification difficulty for all four leading Java verifiers at SV-COMP 2026, observing a notable decline in verifier accuracy for all verifiers across assertion safety and exception detection tasks. Qualitative analysis further demonstrates that the benchmarks selected with MHE address numerous “blind spots” for Java features in the SV-COMP corpus, including lambdas, streams, and GUI libraries. These findings highlight the potential of embedding-driven benchmark selection to improve the diversity, realism, and evaluation rigor within software verification.

Slides Preview

Download

Students

Charles Moloney
UNL, Master's
Graduated: May 2026
Thesis: Leveraging Code Embeddings to Identify and Address Blind Spots in Benchmark Creation

Back to all publications

Slides Preview

Students

Cite Export Citation

Abstract Abstract

Export Citation

Abstract