Data Cleaning and Information Quality

This web site collects information on data quality issues such as data cleaning, record matching and data reconciliation. The list of items below is by no means a complete enumeration of the work accomplished in this area, but we are doing our best to enrich the collection. Our main goal in maintaining this site is to report the experience we have gained in various projects closely related to the management and integrity of data.

Our research approach to data cleaning focuses on using machine learning and statistical techniques to build models automatically from training data. Models derived in this way can be applied to clean enormous amounts of data efficiently and effectively, with very high precision and recall. We are in the process of building a powerful data cleaning tool that produces different data cleaning models on the fly and evaluates them using a public domain database generator.
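As a rough illustration of how precision and recall might be measured when a cleaning model is evaluated against generated data with known duplicates, consider the minimal sketch below. It is not the tool's actual evaluation code: the function name `precision_recall`, the record IDs and the example pairs are all hypothetical, and it assumes the generator supplies the true duplicate pairs.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Compare predicted duplicate pairs against known duplicate pairs.

    Pairs are treated as unordered, so (1, 2) and (2, 1) are the same match.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in true_pairs}
    true_positives = len(predicted & truth)
    # Precision: fraction of predicted matches that are real duplicates.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real duplicates that the model found.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth, as a synthetic data generator might provide it.
true_pairs = [(1, 2), (3, 4), (5, 6)]
# Hypothetical output of a matching model on the same records.
predicted_pairs = [(2, 1), (3, 4), (7, 8)]

precision, recall = precision_recall(predicted_pairs, true_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted pairs are true duplicates and two of the three true duplicates are found, so both measures come out at two thirds.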

 

Related Links

  1. Data and Information Quality
  2. Data Cleaning and Dirty Data
  3. Record Linkage

 

Publications (A collection of papers)

 

People

 

Events

 

Companies