Application Areas and Meanings of Data Cleaning

Source: Internet
Author: User
Keywords: data cleaning, data cleaning meanings, data cleaning application areas
In English-language literature, data cleaning goes by several names, such as Data Cleaning, Data Cleansing and Data Scrubbing, all of which generally refer to the same concept. In practice, we believe data cleaning is applied in two main areas:
   * Data cleaning in data warehouse applications and data mining applications
   * Data cleaning for data quality management
 
Data cleaning in data warehouse applications and data mining applications
 
   The literature usually discusses data cleaning in the context of data warehouse applications and data mining applications. In this area, the purpose of data cleaning is to identify defective data when several databases are merged or multiple data sources are integrated, and then to correct and normalize that data so that it meets the required data quality standards. A data warehouse is a collection of data that supports decision analysis, and data mining is a value-adding technology built on top of the data warehouse. In traditional data warehouse applications, data cleaning is an integral part of the ETL process; the core responsibility of ETL is to extract data from the business systems into the ODS (the operational data store, which can be regarded as the data source of the data warehouse). For data warehouse and data mining applications, data cleaning is therefore a basic step in obtaining reliable and valid data: the foundation of the foundation.
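
   The merge-time cleaning described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not a prescribed ETL implementation: the two source lists, the field names (id, email, country) and the helper functions normalize, find_defect and merge_and_clean are all hypothetical. The sketch normalizes each record, sets aside records that fail a basic check, and drops duplicates that appear once the sources are merged.

```python
from typing import Optional

# Hypothetical records extracted from two business systems (illustrative only).
crm_rows = [
    {"id": "1", "email": " Alice@Example.com ", "country": "us"},
    {"id": "2", "email": "bob@example.com", "country": "DE"},
]
billing_rows = [
    {"id": "1", "email": "alice@example.com", "country": "US"},
    {"id": "3", "email": "", "country": "fr"},
]


def normalize(row: dict) -> dict:
    """Trim whitespace and unify letter case so equal values compare equal."""
    return {
        "id": row["id"].strip(),
        "email": row["email"].strip().lower(),
        "country": row["country"].strip().upper(),
    }


def find_defect(row: dict) -> Optional[str]:
    """Return a short defect description, or None if the row passes the checks."""
    if not row["email"]:
        return "missing email"
    if "@" not in row["email"]:
        return "malformed email"
    return None


def merge_and_clean(*sources):
    """Merge several sources into (clean rows, rejected rows with reasons)."""
    clean, rejected, seen = [], [], set()
    for source in sources:
        for raw in source:
            row = normalize(raw)
            defect = find_defect(row)
            if defect is not None:
                rejected.append((row, defect))  # keep for later correction
                continue
            key = (row["id"], row["email"])
            if key in seen:
                continue  # duplicate of a record already loaded
            seen.add(key)
            clean.append(row)
    return clean, rejected


clean, rejected = merge_and_clean(crm_rows, billing_rows)
print(clean)     # two rows: ids "1" and "2"
print(rejected)  # one row: id "3", reason "missing email"
```

   Keeping the rejected rows together with a reason, rather than silently discarding them, leaves room for the correction step that the cleaning process calls for.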
 
Data cleaning for data quality management
 
   An inherent shortcoming of traditional data warehouse applications is that the original business data has to be reprocessed through data cleaning and data conversion before it can be used. After all, the defects cleaned in the ETL process, whether inconsistent data, missing data, erroneous data, duplicate data or noisy data, are defects in the original business data. Cleaning the original business data is natural, but who ensures that the cleaning process is correct? Without the original data to compare against, what guarantees that the cleaning results are correct? As informatization has developed and been put into practice, people have become increasingly aware of the limitations of this kind of data processing. In the concept of comprehensive data quality management, data cleaning is therefore moved forward into the processes in which data is generated and used; from the perspective of data quality, the data cleaning process is combined with the data life cycle. Accordingly, data cleaning in data quality management is defined as a process of evaluating the accuracy of data and improving its quality. Using the methods and tools of data quality management, defective data is found promptly while data is being generated, used and retired, and is then corrected and standardized through data management methods so that it meets the required data quality standards. This approach ensures the correctness and reliability of data at the source, helps improve data quality across the entire business information process, helps solve the problem of integrating information and data, and reduces errors caused by data defects. It is increasingly favored in enterprise information management.
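
   The idea of moving cleaning forward to the point where data is generated, and of evaluating quality as a measurable property, can be sketched in Python as follows. This is a minimal illustration under assumptions, not a standard data quality framework: the rule names, the OrderStore class, the record fields (name, amount) and the quality_score function are all hypothetical.

```python
# Hypothetical quality rules: each maps a defect name to a check that
# returns True when the record is defective.
RULES = {
    "missing_name": lambda r: not (r.get("name") or "").strip(),
    "negative_amount": lambda r: r.get("amount", 0) < 0,
}


def check_record(record: dict) -> list:
    """Return the names of all rules the record violates."""
    return [name for name, is_defective in RULES.items() if is_defective(record)]


class OrderStore:
    """Toy store that validates records when they are created, not afterwards."""

    def __init__(self):
        self.rows = []
        self.rejected = []  # (record, defects) pairs awaiting correction

    def insert(self, record: dict) -> bool:
        defects = check_record(record)
        if defects:
            self.rejected.append((record, defects))
            return False
        self.rows.append(record)
        return True


def quality_score(records: list) -> float:
    """Fraction of records that violate no rule (1.0 for an empty list)."""
    if not records:
        return 1.0
    passed = sum(1 for r in records if not check_record(r))
    return passed / len(records)


store = OrderStore()
accepted = [
    store.insert({"name": "Alice", "amount": 10.0}),
    store.insert({"name": " ", "amount": 5.0}),
    store.insert({"name": "Bob", "amount": -3}),
]

# Evaluating data that already exists, for example from a legacy system.
legacy = [
    {"name": "Alice", "amount": 10},
    {"name": "", "amount": -1},
    {"name": "Bob", "amount": 2},
    {"name": "Eve", "amount": 4},
]
print(accepted)               # [True, False, False]
print(quality_score(legacy))  # 0.75
```

   Because the same rules serve both the write-time check and the score, defects are caught where they originate and the quality of existing data can still be assessed against the same standard.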
 
   Because data cleaning serves different users and purposes in these two application areas, the methods, algorithms and implementations also differ.