Data Quality Management: A Required Course for Data Center Optimization

Source: Internet
Author: User

Data is an important asset of an enterprise's data center, and obtaining and maintaining high-quality data is critical to efficient IT and business operations. As business data grows more complex, how can data quality be fully ensured? With the complete Informatica data quality platform, you can access, identify, cleanse, integrate, and deliver trusted data across the entire enterprise, anytime and anywhere. You can also immediately locate and correct data quality problems that could otherwise cost your company millions of dollars.

Data management objectives of Informatica

How much is enterprise data worth? According to a survey, information accounts for an average of 37% of an enterprise's value. Information has become one of the most important enterprise assets, and more and more enterprises are focusing on building data centers. However, many factors can devalue these "assets": redundant and duplicate data makes information hard to identify and trust; information may be untimely or insufficiently accurate; the mix of structured and unstructured data makes integration difficult; changes in management have an impact; and when data standards are not unified and the relevant specifications are incomplete, data is poorly understood.

Informatica provides a series of infrastructure solutions at the data architecture level, including information transmission, B2B data exchange, and enterprise data integration. It also provides data quality management, master data management, and complex event processing solutions that help data centers manage information assets in a trusted, interactive, and authoritative way and achieve the enterprise's business goals. This is also the focus of information center construction.


 
Overall data quality management framework

Data quality management forms a complete ecosystem within data center construction. Data quality is affected by suppliers, production staff, process flows, internal customers, and external systems. From the application and software perspective, data providers, software development and integration, and quality control methods all affect the enterprise's overall data quality.

In terms of the overall framework and methodology, we should first determine the goal, and then clarify which people, which processes, and which supporting technology will achieve it. People, processes, and technology are all indispensable. The most important step before setting the final goal is to understand the status quo: find out which data quality issues the enterprise cares about most, evaluate existing data with a scorecard, and monitor it in real time to discover how data changes across processes and over time. Goals set after understanding the status quo are credible and achievable, rather than vague and illusory.

In theory, data quality is not completely controllable. To improve it, indicators must be quantified so that quality can be controlled through quantitative measures. At the technical level, data quality should be considered from the following six perspectives, also known as the data quality matrix: completeness (whether information is fully filled in); conformity (whether data is filled in according to the standard format); consistency (whether there are internal conflicts, for example whether the derivation relationships and constraints between two fields in the same system hold); accuracy (whether data is true and valid, and whether it is updated in a timely manner); uniqueness (whether multiple records of the same information are duplicated or agree with each other); and integrity (whether the references and constraints between data hold). Of course, each enterprise needs to determine the indicators or system it uses to assess data quality based on its own business needs, and is not limited to these six aspects.
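
As a rough illustration of how such dimensions can be quantified, the following Python sketch computes four of them over a small in-memory data set. It does not use Informatica; the sample records, field names, phone format, and country reference set are all assumptions made up for this example.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Alice", "phone": "555-0100", "country": "US"},
    {"id": 2, "name": "", "phone": "5550101", "country": "US"},
    {"id": 2, "name": "Bob", "phone": "555-0102", "country": "XX"},
    {"id": 4, "name": "Carol", "phone": None, "country": "DE"},
]

PHONE_FORMAT = re.compile(r"^\d{3}-\d{4}$")  # assumed standard format
VALID_COUNTRIES = {"US", "DE", "CN"}          # assumed reference data


def ratio(hits, total):
    """Return hits / total, or 0.0 when there is nothing to measure."""
    return hits / total if total else 0.0


def is_filled(value):
    return value not in (None, "")


def completeness(rows, field):
    """Share of rows where the field is filled in."""
    return ratio(sum(1 for r in rows if is_filled(r.get(field))), len(rows))


def conformity(rows, field, pattern):
    """Share of filled-in values that follow the standard format."""
    values = [r[field] for r in rows if is_filled(r.get(field))]
    return ratio(sum(1 for v in values if pattern.match(v)), len(values))


def uniqueness(rows, field):
    """Ratio of distinct key values to rows (1.0 means no duplicates)."""
    values = [r[field] for r in rows]
    return ratio(len(set(values)), len(values))


def integrity(rows, field, reference):
    """Share of rows whose value exists in the reference set."""
    return ratio(sum(1 for r in rows if r.get(field) in reference), len(rows))


scorecard = {
    "completeness(name)": completeness(records, "name"),
    "conformity(phone)": conformity(records, "phone", PHONE_FORMAT),
    "uniqueness(id)": uniqueness(records, "id"),
    "integrity(country)": integrity(records, "country", VALID_COUNTRIES),
}

for metric, value in scorecard.items():
    print(f"{metric}: {value:.2f}")
```

Each indicator is a ratio between 0 and 1, so it can be tracked over time on a scorecard and compared against a target set by the business.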

Building a data quality process

A complete data quality management system combines people, processes, and technology to achieve the data quality management goal. What does the data quality process look like? We divide it into two parts: a data quality analysis process, and a process that improves data based on the analysis results. First, identify and quantify data quality, and then define data quality and its objectives. Next, hand these over to the relevant departments to design the quality improvement process. The improvement process is then carried out to turn the original low-quality data into high-quality data and deliver it to business personnel. Throughout, monitoring and comparison are also required to assess whether the goal has been achieved and whether a new round of data quality improvement is needed. This is a cyclical, spiraling process, not a one-off effort that solves every problem at once.

Building data quality management with Informatica is divided into four main parts. First, Analyze and Profiling: existing data is analyzed and described to determine how it should be processed and standardized. Second, Standardize/Cleanse: standardization and cleansing make data easier for computers to recognize; for example, data can be constrained by format, and data standard requirements can be met through dictionaries or reference data. Third, Match: after standardization, related data must be matched to solve the data duplication problem and ensure data uniqueness. Fourth, merge: duplicate data is consolidated and finally applied to the different systems. Note that monitoring is required throughout the process. In day-to-day data center construction, these parts can also be extended or streamlined to improve the data quality management process.
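
The four parts above can be sketched in plain Python as follows. This is a minimal illustration of the profile, standardize, match, and merge steps, not Informatica's implementation; the sample rows, field names, and function names are hypothetical, and the match step uses simple exact matching on standardized values.

```python
from collections import defaultdict

# Hypothetical raw input; field names are illustrative only.
raw = [
    {"name": "  john SMITH ", "city": "new york", "email": "john@example.com"},
    {"name": "John Smith", "city": "New York", "email": None},
    {"name": "Mary Jones", "city": "boston", "email": "mary@example.com"},
]


def profile(rows):
    """Step 1: describe the data, here as a count of empty values per field."""
    counts = defaultdict(int)
    for row in rows:
        for field, value in row.items():
            if value in (None, ""):
                counts[field] += 1
    return dict(counts)


def standardize(row):
    """Step 2: collapse whitespace and normalize case on text fields."""
    out = dict(row)
    for field in ("name", "city"):
        out[field] = " ".join(row[field].split()).title()
    return out


def match(rows):
    """Step 3: group rows that share the same standardized name and city."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["city"])].append(row)
    return list(groups.values())


def merge(group):
    """Step 4: consolidate a group, keeping the first non-empty value per field."""
    merged = {}
    for row in group:
        for field, value in row.items():
            if merged.get(field) in (None, "") and value not in (None, ""):
                merged[field] = value
            merged.setdefault(field, value)
    return merged


print(profile(raw))                      # what needs attention before cleansing
cleaned = [standardize(r) for r in raw]  # standardize every row
golden = [merge(g) for g in match(cleaned)]
for record in golden:
    print(record)
```

Standardizing before matching is what allows the first two rows to be recognized as the same record; without it, differences in case and whitespace would keep them apart.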

Here we introduce Informatica's most distinctive technology, fuzzy matching. Different matching techniques can be used for data matching and association. For example, before running a precise, fast match, you can preview the data, check the degree of match between two records, and analyze the result to decide whether that matching method suits the data. If exact matching cannot deliver an adequate match rate, you can use fuzzy comparison. In personal name comparison, for instance, one person's records may be inconsistent because the name is written in pinyin, simplified Chinese characters, or traditional Chinese characters. These forms are encoded differently, so a computer treats the names as different. In this case, we perform a fuzzy match that scores the name and use the score to judge the reliability of the data. For example, if scores above 0.8 are treated as trusted, records whose names score above 0.8 against each other are judged to be the same person, and a record scoring below the threshold is judged to be a different person.
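
The idea of scoring two names and comparing the score with a 0.8 threshold can be sketched with Python's standard library. This is only an illustration: it uses difflib.SequenceMatcher as a stand-in similarity measure, which is not Informatica's matching algorithm, and the names below are hypothetical. It compares romanized names only; matching across simplified and traditional characters would first need a normalization step that is not shown here.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # scores above this are treated as the same person


def name_score(a, b):
    """Return a similarity score between 0.0 and 1.0 for two names."""
    left = a.casefold().strip()
    right = b.casefold().strip()
    return SequenceMatcher(None, left, right).ratio()


def same_person(a, b, threshold=THRESHOLD):
    """Judge two names as the same person when the score exceeds the threshold."""
    return name_score(a, b) > threshold


# Hypothetical name pairs: a spacing variant and an unrelated name.
pairs = [
    ("Wang Guoshu", "Wang Guo Shu"),
    ("Wang Guoshu", "Li Ming"),
]

for first, second in pairs:
    score = name_score(first, second)
    verdict = "same person" if same_person(first, second) else "different person"
    print(f"{first!r} vs {second!r}: {score:.2f} ({verdict})")
```

The threshold is a business decision: raising it reduces false merges at the cost of missing more true duplicates, so it is usually tuned against a sample of manually reviewed pairs.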

Today, many enterprise data centers are no longer just simple storage centers for data warehouses. They support both business operations and system analysis, as well as integration between systems. In building a data center, data quality must be addressed from the data source through governance to business interaction. Informatica data quality tools, including PowerCenter, Data Quality, MDM Hub, and Informatica 9, can turn business logic and rules into services. These data services are called by the various business front ends and business processes to perform data verification and data cleansing. This is the support the Informatica data quality platform provides for enterprise data center construction. Let's start building trusted data now!
