Data Warehouse construction and continuous improvement of data quality

Source: Internet
Author: User

The process and methods for building a data warehouse system differ from those for building a traditional operational (transaction-processing) system. Building a data warehouse system involves two main difficulties. The first is how to guarantee data quality so that the data is accurate and credible. The second is how to build applications that meet the needs of users in different roles.

Because of the state of the production systems, data quality problems are unavoidable. Source data is incomplete or inconsistent, and extraction time points cannot be synchronized. Local networks compete with one another in the market and apply different business rules, and different business lines use different statistical definitions. Managing data quality therefore runs through the entire process of building the data warehouse system. Applications of the data warehouse originate from user requirements, which in turn come from the developers' understanding of the business, and the development and refinement of those applications are constrained by data quality. Building a data warehouse system therefore requires an ongoing interplay between data and applications.

Data Warehouse requirements for data quality

The data warehouse's requirements for data quality can be summarized as follows:

Data completeness: whether the data sources are complete, whether the data itself is complete, and whether dimension values are complete.

Data accuracy: whether the data sources are accurate, whether code mappings are accurate, whether the processing logic is accurate, and so on. When data is reconciled, it is judged accurate if the results agree, or if they disagree but the discrepancy can be explained.

Data consistency: the same data is consistent across source systems, source data is consistent with extracted data, data is consistent across the processing stages inside the data warehouse, and so on.

Logical soundness of data: whether the data is correct from a business-logic point of view, for example whether the expected logical relationships among amounts, account types, times, and frequencies hold. A monthly rental fee record, for instance, cannot carry a call count or call duration.

Data timeliness: the timeliness of data processing (acquisition, cleansing, loading, etc.), of anomaly detection, and of correction and rollback.
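Several of these requirements can be expressed as record-level rules that run during loading. The following is a minimal sketch, not the author's implementation. It checks completeness (required fields and known dimension values) and logical soundness (a monthly rental fee must not carry call usage). All field names (`customer_id`, `local_network`, `fee_type`, `amount`, `call_count`, `call_duration`) and the set of valid network codes are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical valid values for the "local_network" dimension.
KNOWN_NETWORKS = {"net_a", "net_b", "net_c"}
# Hypothetical fields every fee record must carry.
REQUIRED_FIELDS = ("customer_id", "local_network", "fee_type", "amount")


@dataclass
class Issue:
    row: int     # index of the offending record
    rule: str    # quality dimension that was violated
    detail: str  # human-readable description


def check_records(records: list[dict]) -> list[Issue]:
    """Return completeness and logic issues found in a batch of records."""
    issues: list[Issue] = []
    for i, rec in enumerate(records):
        # Completeness: each required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                issues.append(Issue(i, "completeness", f"missing {field}"))
        # Completeness of dimension values: an unknown code means the
        # dimension table or the code mapping has a gap.
        net = rec.get("local_network")
        if net not in (None, "") and net not in KNOWN_NETWORKS:
            issues.append(Issue(i, "completeness", f"unknown local_network {net!r}"))
        # Logical soundness: a monthly rental fee cannot have call usage.
        if rec.get("fee_type") == "monthly_rental":
            if rec.get("call_count") or rec.get("call_duration"):
                issues.append(Issue(i, "logic", "monthly rental fee has call usage"))
    return issues
```

Each issue records which dimension it violates, so the results can be counted per dimension and fed back to the source system rather than silently fixed in the warehouse.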

The data warehouse serves management decision-making, so the data underlying those decisions must be comprehensive, accurate, reliable, and meaningful. If timeliness is not guaranteed, market analysts' work may be delayed and business opportunities may be lost.

Within the data warehouse construction process, the warehouse itself has limited ability to improve data quality by repairing data. It can, however, uncover data quality problems in the production systems, alert users to them, and feed them back to the business support systems, which then correct the data.

Analysis of the quality of source data

Traditional business support systems are designed to meet production business-processing goals and start from internal management needs. Each support system was designed independently, without considering the business process as a whole or using resources efficiently, so many information silos formed within the enterprise. This is mainly reflected in the following:

Data is highly dispersed, manually processed data coexists with system-processed data, and data formats are diverse.

In the data models, entity semantics, attribute definitions, naming rules, and coding rules are each specific to their own system, which makes them difficult to match with other systems.

The number of records for the same entity is inconsistent between systems; for example, the number of customers differs between the billing system and the 97 system.

Information is incomplete, and there is no complete, unified view of the customer.

In the production systems, source data is too coarse-grained to meet analysis requirements, which need data broken down to the atomic level.

Within the same provincial company, different local networks handle billing differently, so data is delivered at inconsistent times.
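Because local networks deliver at different times, the timeliness requirement can be monitored with a simple arrival check against an agreed deadline. This is a minimal sketch, assuming each network's data arrival time is recorded as a `datetime` and that a single daily deadline applies. The network codes and the deadline are hypothetical.

```python
from datetime import datetime


def check_timeliness(arrivals: dict[str, datetime],
                     expected_networks: set[str],
                     deadline: datetime) -> dict[str, str]:
    """Classify each expected local network as on time, late, or missing."""
    report = {}
    for net in sorted(expected_networks):
        arrived = arrivals.get(net)
        if arrived is None:
            report[net] = "missing"  # no delivery recorded yet
        elif arrived > deadline:
            # Subtracting two datetimes gives a timedelta, e.g. "2:30:00".
            report[net] = f"late by {arrived - deadline}"
        else:
            report[net] = "on time"
    return report
```

Reporting missing networks separately from late ones matters here, because downstream aggregates built before a network delivers will be incomplete rather than merely delayed.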

Improving data quality faces the following difficulties:

The data volume is large and data formats are not uniform.

Data quality standards are hard to define.

The boundaries of data cleansing are hard to define.

Factors such as continuous upgrades of the production systems and staff reassignments easily lead to confusion later on.

Because the data warehouse draws its data from many kinds of business systems, such as billing, business accounts, customer service, and network management, inconsistencies between systems are often discovered when the source data is integrated, and source-data quality problems become more prominent. In the early stage of building a data warehouse system, data quality is not high. It must be corrected and supplemented as the system is built and used, so that it improves gradually until the system's data quality problems are finally resolved.
