This paper introduces a data quality (DQ) framework and develops a methodology for assessing and managing DQ in the University data warehouse (DW). The framework includes a technical design that defines the metrics, techniques and tools required to measure, store and report DQ values in the DW. It is the first to integrate metrics for both subjective and objective analysis of DQ and to store the results of these analyses in a DQ repository in the DW. The framework is validated on one University domain application to provide guidance for the implementation phase.
The University data quality framework includes a policy, a methodology for DQ assessment and management, and a technical design that leads to more informative reports and measurable improvements in DQ. This framework focuses on the applications and data resources stored and managed by the DW. It measures the quality confidence of the reports from those applications and provides supporting evidence to the DW developer and data experts to trigger possible actions that improve the DQ.
Previous work on DQ assessment of DWs has discussed metrics for specific application contexts, provided by database administration, as the objective analysis. That work suggested establishing precise constraints and business rules on the data and applications. However, information technology experts alone cannot improve DQ; data consumers' perspectives must also be considered, so that consumers can define the level of usefulness and trustworthiness they require from the data. Methodologies such as Total Information Quality Management and Data Quality Assessment have studied this perspective through subjective analysis at the organizational level.
Our framework is the first to apply both kinds of analysis to the data stored in the DW using a metrics-based approach, which helps remove the assumptions and subjective emotion associated with DQ issues. For the objective analysis, it defines metrics that measure the completeness, consistency, validity, accuracy and timeliness of the data. The subjective analysis of DQ is defined in terms of accessibility, interpretability and reasonableness measures.
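As an illustration only, two of the objective dimensions named above could be computed as simple ratios over a column of values. The sketch below is not the framework's own definition of these metrics: the ratio form, the function names `completeness` and `validity`, the example e-mail rule and the sample data are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def _is_present(value: Optional[str]) -> bool:
    """A value counts as present if it is not None and not blank."""
    return value is not None and value.strip() != ""


def completeness(values: Sequence[Optional[str]]) -> float:
    """Assumed metric: share of values that are present."""
    if not values:
        return 1.0  # an empty column has nothing missing
    return sum(1 for v in values if _is_present(v)) / len(values)


def validity(values: Sequence[Optional[str]],
             rule: Callable[[str], bool]) -> float:
    """Assumed metric: share of present values that satisfy a business rule."""
    present = [v for v in values if _is_present(v)]
    if not present:
        return 1.0  # no present values, so none can violate the rule
    return sum(1 for v in present if rule(v)) / len(present)


# Hypothetical sample column: two missing values and one malformed address.
emails = ["a@uni.edu", None, "", "bad-address", "b@uni.edu"]

completeness_score = completeness(emails)            # 3 of 5 values present
validity_score = validity(emails, lambda v: "@" in v)  # 2 of 3 present values pass
```

Scoring validity only over present values keeps the two dimensions independent, so a missing value lowers completeness without also being counted as invalid.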
Identifying the metrics for the subjective and objective assessments provides valuable input to the DQ management and improvement process and gives an informative, tangible indication of the DQ issues.
Details of the DQ management, including how to store and report the DQ values, are also discussed in the design. The conceptual model discusses the governance of DQ and presents the important roles and responsibilities for the DQ improvement.
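To make the idea of storing and reporting DQ values concrete, the following is a minimal sketch of what a DQ repository table and a summary report might look like. It uses an in-memory SQLite database purely for illustration; the table name `dq_measurement`, its columns, and the sample rows are hypothetical and are not taken from the framework's actual design.

```python
import sqlite3

# In-memory database standing in for the DW; schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dq_measurement (
        table_name    TEXT NOT NULL,
        column_name   TEXT NOT NULL,
        dimension     TEXT NOT NULL,  -- e.g. completeness, accessibility
        analysis_type TEXT NOT NULL
            CHECK (analysis_type IN ('objective', 'subjective')),
        score         REAL NOT NULL CHECK (score BETWEEN 0 AND 1),
        measured_at   TEXT NOT NULL   -- ISO 8601 date of the measurement
    )
    """
)

# Hypothetical measurements for one application's data.
rows = [
    ("student", "email", "completeness", "objective", 0.6, "2024-01-01"),
    ("student", "birth_date", "completeness", "objective", 1.0, "2024-01-01"),
    ("student", "email", "validity", "objective", 0.5, "2024-01-01"),
    ("student", "email", "interpretability", "subjective", 0.75, "2024-01-01"),
]
conn.executemany("INSERT INTO dq_measurement VALUES (?, ?, ?, ?, ?, ?)", rows)

# Report: average score per dimension, split by analysis type.
report = conn.execute(
    """
    SELECT dimension, analysis_type, AVG(score)
    FROM dq_measurement
    GROUP BY dimension, analysis_type
    ORDER BY dimension
    """
).fetchall()

for dimension, analysis_type, avg_score in report:
    print(f"{dimension:18s} {analysis_type:10s} {avg_score:.2f}")
```

Keeping objective and subjective scores in one table, distinguished by a type column, lets a single query report both kinds of analysis side by side.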