Thursday, February 7, 2019
4 p.m., Avery 115
3:30 p.m., Avery 348
Alexandra Meliou, Ph.D., Assistant Professor, University of Massachusetts Amherst
Data-driven software has the ability to shape human behavior: it affects the products we view and purchase, the news articles we read, the social interactions we engage in, and, ultimately, the opinions we form. The correctness and proper function of such data-driven systems rely heavily on the correctness of their data. Errors, omissions, biases, and poor data quality in general can lead to disruptions, loss of revenue, incorrect conclusions, and misguided policy decisions. Improving data quality involves far more than purging datasets of errors; it is critical to improve the processes that produce the data, to collect good sources for generating the data, and to address the root causes of problems.
Our work is grounded in an important insight: while existing data cleaning techniques can be effective at purging datasets of errors, they disregard the fact that many errors are systemic, inherent to the process that produces the data, and thus will keep occurring unless the problem is corrected at its source. In contrast to traditional data cleaning, we focus on data diagnosis: explaining where and how errors arise in a data-generative process. I will describe our work on two diagnostic frameworks, one for large-scale extraction systems and one for relational data systems. I will also provide a broader overview of my lab’s work on enhancing usability, understandability, and trust in data technologies, highlighting the role of data management in realizing a vision for toolsets that assist the exploration and effective use of information in a varied, diverse, and highly non-integrated data world.
Alexandra Meliou is an Assistant Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. Before that, she was a Postdoctoral Research Associate at the University of Washington.
Alexandra received her PhD from the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. She has received recognition for her research, teaching, and service, including a CACM Research Highlight, an ACM SIGMOD Research Highlight Award, an ACM SIGSOFT Distinguished Paper Award, an NSF CAREER Award, a Google Faculty Research Award, multiple Distinguished Reviewer Awards, and a Lilly Fellowship for Teaching Excellence. Her research focuses on data provenance, causality, explanations, data quality, and algorithmic fairness.