1. Determine the application view of the data sets to be evaluated
In data quality assessment, the first step is to state the assessment requirements: determine which data the user is interested in (the databases, the data sets within them, and the fields of those data sets) and establish a corresponding user view of that data.
2. Select evaluation indicators
For each given data set, select the desired evaluation metrics. For the Customer data set, two metrics are selected: completeness and validity.
3. Define rule sets
Based on the selected evaluation indicators, define the data quality assessment rules and determine the weight and expected value of each rule. For Customer, the following rules are established for the completeness and validity metrics:
(1) ID is non-null (weight: 5, expected value: 90): Completeness
(2) ID length is 18 characters (weight: 10, expected value: 90): Accuracy
(3) The sex value is F or M (weight: 10, expected value: 98): Validity
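The three rules above could be written down as data, for example as in the following minimal sketch. The `Rule` class, the `CUSTOMER_RULES` list, and the field names `id` and `sex` are hypothetical names chosen for illustration; the source does not specify how rules are stored.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# A data tuple is modelled as a mapping from field name to value (assumption).
Row = Mapping[str, Optional[str]]


@dataclass(frozen=True)
class Rule:
    name: str
    metric: str      # evaluation metric the rule belongs to
    weight: int      # weight of the rule
    expected: float  # expected value, as a percentage
    check: Callable[[Row], bool]  # True if a tuple satisfies the rule


# Hypothetical encoding of the three Customer rules listed above.
CUSTOMER_RULES = [
    Rule("ID is non-null", "completeness", 5, 90.0,
         lambda row: row.get("id") not in (None, "")),
    Rule("ID length is 18", "accuracy", 10, 90.0,
         lambda row: len(row.get("id") or "") == 18),
    Rule("sex is F or M", "validity", 10, 98.0,
         lambda row: row.get("sex") in ("F", "M")),
]
```

Keeping the check as a function next to the weight and expected value means a later scoring step only needs to iterate over the list.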
4. Calculate the rule result scores
For each rule R in the rule set, check the data instances in the data set, calculate the percentage of data tuples that satisfy R, and record the result as the score S corresponding to R. Each score is thus expressed as a percentage of the total number of data tuples. Assuming that their results are, respectively ,90.
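The per-rule calculation can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: `rule_score`, the field names `id` and `sex`, and the sample tuples are all made up, and an empty data set is assumed to score 0.

```python
from typing import Callable, Iterable, Mapping, Optional

# A data tuple is modelled as a mapping from field name to value (assumption).
Row = Mapping[str, Optional[str]]


def rule_score(rows: Iterable[Row], check: Callable[[Row], bool]) -> float:
    """Return the percentage (0-100) of tuples that satisfy `check`."""
    rows = list(rows)
    if not rows:
        return 0.0  # assumption: an empty data set scores 0
    passed = sum(1 for row in rows if check(row))
    return 100.0 * passed / len(rows)


# Made-up sample tuples, for illustration only.
customers = [
    {"id": "1" * 18, "sex": "F"},
    {"id": "2" * 18, "sex": "M"},
    {"id": None, "sex": "M"},
    {"id": "12345", "sex": "X"},
]

# One score S per rule R.
id_non_null = rule_score(customers, lambda r: r.get("id") not in (None, ""))
id_length_18 = rule_score(customers, lambda r: len(r.get("id") or "") == 18)
sex_valid = rule_score(customers, lambda r: r.get("sex") in ("F", "M"))
print(id_non_null, id_length_18, sex_valid)
```

Each score can then be compared against the expected value of its rule.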