Data Quality Monitoring

Source: Internet
Author: User

  1. Definition of data quality
    • From the data consumer's point of view, high-quality data is data that fully meets the users' requirements.
  2. Standards for data quality
    • Completeness: whether data records are missing, and whether field values within records are missing.
    • Consistency: whether field contents follow their expected rules (for example, the format of a telephone number or an IP address), and whether logical relationships between data hold (for example, PV (page views) >= UV (unique visitors), or a percentage never exceeding 100%).
    • Accuracy: whether values are garbled, or abnormally large or small.
    • Timeliness: whether data is delivered within the agreed SLA (service-level agreement).
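The four standards above can be turned into simple automated checks. The following Python sketch is only an illustration under stated assumptions: the record layout (fields `name`, `ip`, `pv`, `uv`, `ratio`, `amount`), the bounds used for "abnormally large or small", treating the U+FFFD replacement character as the sign of garbled text, and the SLA deadline are all hypothetical choices, not part of the original method.

```python
import ipaddress
from datetime import datetime

# Hypothetical record layout: every record is a dict with these fields.
REQUIRED_FIELDS = ("name", "ip", "pv", "uv", "ratio", "amount")


def completeness(records, expected_count):
    """Return (share of expected records present, share of records with no empty field)."""
    record_rate = len(records) / expected_count if expected_count else 0.0
    complete = sum(all(r.get(f) is not None for f in REQUIRED_FIELDS) for r in records)
    field_rate = complete / len(records) if records else 0.0
    return record_rate, field_rate


def is_valid_ip(value):
    """Format rule: the field must parse as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def is_consistent(r):
    """Format rules plus logical relationships between fields."""
    return (
        is_valid_ip(r["ip"])
        and r["pv"] >= r["uv"]        # page views cannot be fewer than unique visitors
        and 0 <= r["ratio"] <= 100    # a percentage cannot exceed 100%
    )


def is_accurate(r, low=0, high=1_000_000):
    """No garbled text (U+FFFD replacement characters) and no out-of-range values.

    The [low, high] bounds are hypothetical and must be chosen per field.
    """
    return "\ufffd" not in r["name"] and low <= r["amount"] <= high


def is_timely(finished_at, sla_deadline):
    """Timeliness: the data must be ready no later than the SLA deadline."""
    return finished_at <= sla_deadline


if __name__ == "__main__":
    records = [  # hypothetical sample data
        {"name": "Alice", "ip": "10.0.0.1", "pv": 120, "uv": 80, "ratio": 66.7, "amount": 250},
        {"name": "B\ufffdb", "ip": "999.1.1.1", "pv": 50, "uv": 70, "ratio": 140.0, "amount": -5},
    ]
    print(completeness(records, expected_count=4))        # (0.5, 1.0)
    print([is_consistent(r) for r in records])            # [True, False]
    print([is_accurate(r) for r in records])              # [True, False]
    print(is_timely(datetime(2024, 1, 1, 5), datetime(2024, 1, 1, 6)))  # True
```

Each check returns a boolean or a rate so that it can later be turned into a percentage score per rule.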
  3. Data quality assessment process
    • Analyze the data quality requirements
    • Determine the object and scope of the evaluation
    • Select the data quality dimensions and evaluation criteria
    • Determine the quality measures and evaluation methods
    • Apply the methods to perform the evaluation
    • Analyze and grade the results
    • Report the quality results
  4. Methods of evaluating data quality
    • Basic concepts
      • Model: M = <D, I, R, W, E, S>
      • D (Dataset): the dataset to be evaluated.
      • I (Indicator): the indicators to be evaluated on dataset D, such as completeness, accuracy, and consistency.
      • R (Rule): the rules corresponding to each evaluation indicator.
      • W (Weight): the weight of rule R, an integer greater than 0, which describes the rule's share among all rules.
      • E (Expectation): the expected value of rule R, a real number from 0 to 100; it is the result the rule is expected to achieve, set before the evaluation.
      • S (Result): the final result of rule R, a real number from 0 to 100.
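The model M = <D, I, R, W, E, S> maps naturally onto a small data structure. The Python sketch below is one possible encoding, assuming each rule is a predicate over a single record; the class names `Rule` and `QualityModel` and their field names are hypothetical. It enforces the stated value ranges: W must be an integer greater than 0 and E a real number from 0 to 100, while S stays empty until the rule has been evaluated.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


@dataclass
class Rule:
    """One rule in R, carrying its weight W, expected value E and result S."""
    name: str
    indicator: str                       # the indicator in I that this rule measures
    check: Callable[[Record], bool]      # hypothetical: a rule is a per-record predicate
    weight: int                          # W: integer greater than 0
    expected: float                      # E: real number from 0 to 100
    result: Optional[float] = None       # S: real number from 0 to 100, set after evaluation

    def __post_init__(self):
        if not isinstance(self.weight, int) or self.weight <= 0:
            raise ValueError("weight W must be an integer greater than 0")
        if not 0 <= self.expected <= 100:
            raise ValueError("expected value E must lie in [0, 100]")


@dataclass
class QualityModel:
    """M = <D, I, R, W, E, S>; W, E and S are stored on each rule."""
    dataset: List[Record]                                # D
    indicators: List[str]                                # I
    rules: List[Rule] = field(default_factory=list)      # R

    def add_rule(self, rule: Rule) -> None:
        # Every rule must belong to one of the selected indicators.
        if rule.indicator not in self.indicators:
            raise ValueError(f"unknown indicator: {rule.indicator}")
        self.rules.append(rule)


if __name__ == "__main__":
    model = QualityModel(dataset=[{"id": "42"}], indicators=["completeness"])
    model.add_rule(Rule("ID is not null", "completeness",
                        lambda r: r.get("id") is not None, weight=5, expected=90.0))
    print(len(model.rules))  # 1
```

Keeping W, E and S on the rule object means the later scoring step only needs to walk the rule list.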
    • Construction technique
      • Constructing a data quality evaluation model takes four steps: determine the application view of the dataset to be evaluated, select the evaluation indicators, create the rule set, and calculate the rule result scores.

      • The following concrete example shows how to construct a data quality assessment model.

        • 1. Determine the application view of the dataset to be evaluated

          In a data quality assessment, we first state the assessment requirements and determine which data the user cares about (databases, the datasets in those databases, and the fields of those datasets), and then establish a corresponding user view of that data.

          2. Select evaluation indicators

          For each dataset, select the evaluation indicators of interest. For example, for the Customer dataset, select two indicators: completeness and validity.

          3. Create the rule set

          Based on the selected indicators, define the data quality assessment rules and set each rule's weight and expected value. For Customer, the following rules are defined for the completeness and validity indicators:

          (1) ID is not null (weight: 5, expected value: 90): completeness

          (2) ID is 18 characters long (weight: 10, expected value: 90): accuracy

          (3) sex is F or M (weight: 10, expected value: 98): validity
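The three Customer rules can be written as record-level predicates. The sketch below assumes each Customer record is a Python dict with the hypothetical keys `id` and `sex`, reads "18 characters" as the string length of the ID, and computes a rule's result S as the percentage of records that satisfy it, as step 4 describes. The names `CustomerRule`, `CUSTOMER_RULES` and `rule_result` are illustrative only.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Record = Dict[str, Any]


class CustomerRule(NamedTuple):
    description: str
    indicator: str
    weight: int                        # W
    expected: float                    # E, from 0 to 100
    check: Callable[[Record], bool]    # True if the record satisfies the rule


# The three rules from the text; the record keys "id" and "sex" are hypothetical.
CUSTOMER_RULES = [
    CustomerRule("ID is not null", "completeness", 5, 90.0,
                 lambda r: r.get("id") is not None),
    CustomerRule("ID is 18 characters long", "accuracy", 10, 90.0,
                 lambda r: isinstance(r.get("id"), str) and len(r["id"]) == 18),
    CustomerRule("sex is F or M", "validity", 10, 98.0,
                 lambda r: r.get("sex") in ("F", "M")),
]


def rule_result(records: List[Record], check: Callable[[Record], bool]) -> float:
    """S: percentage (0 to 100) of records that satisfy the rule."""
    if not records:
        return 0.0  # design choice: an empty dataset scores 0
    passed = sum(1 for r in records if check(r))
    return 100.0 * passed / len(records)


if __name__ == "__main__":
    customers = [  # hypothetical sample data
        {"id": "123456789012345678", "sex": "M"},
        {"id": "12345678901234567X", "sex": "F"},
        {"id": "12345", "sex": "F"},
        {"id": None, "sex": "U"},
    ]
    for rule in CUSTOMER_RULES:
        print(rule.description, rule_result(customers, rule.check))
```

On the four sample records this prints 75.0 for the not-null rule, 50.0 for the length rule and 75.0 for the sex rule.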

          4. Calculate rule result scores

          For each rule in rule set R, check the data instances in the dataset and calculate the percentage of all data tuples that satisfy the rule; that percentage is the rule's result S. Assume that their results are, respectively, …, 90.
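Once each rule's result S is known, the weights W and expected values E can be used to summarize the evaluation. The text does not spell out how they are combined, so the sketch below assumes a weighted mean, sum(W * S) / sum(W), as the overall score (reading W as each rule's share of the total), and flags every rule whose S falls below its E. The function name `evaluate` and the sample results are hypothetical.

```python
from typing import List, Tuple

# (rule name, weight W, expected value E, result S)
RuleOutcome = Tuple[str, int, float, float]


def evaluate(outcomes: List[RuleOutcome]) -> Tuple[float, List[str]]:
    """Return the weighted overall score and the rules whose S fell below E.

    Assumption: overall score = sum(W * S) / sum(W).
    """
    total_weight = sum(w for _, w, _, _ in outcomes)
    if total_weight == 0:
        raise ValueError("at least one rule with a positive weight is required")
    score = sum(w * s for _, w, _, s in outcomes) / total_weight
    below = [name for name, _, e, s in outcomes if s < e]
    return score, below


if __name__ == "__main__":
    # Hypothetical results S; replace them with the percentages from step 4.
    outcomes = [
        ("ID is not null", 5, 90.0, 96.0),
        ("ID is 18 characters long", 10, 90.0, 88.0),
        ("sex is F or M", 10, 98.0, 99.0),
    ]
    score, below = evaluate(outcomes)
    print(f"overall score: {score:.1f}")   # overall score: 94.0
    print("below expectation:", below)     # below expectation: ['ID is 18 characters long']
```

Comparing S against E per rule, rather than only looking at the overall score, shows which specific rule misses its target even when the weighted total looks acceptable.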

