Keywords: big data, data quality, data quality monitoring
Because big data covers so much complex content, many business calibers (the precise definitions behind a statistic) are hard to state clearly, even for marketing staff. For example, suppose Wang Xiaoqiang has to calculate yesterday's ring back tone (RBT) revenue. Who counts as an RBT user: a user who has ordered the RBT service, or a user who has actually used it? Does "revenue" mean receivable revenue or revenue actually received? These different calibers lead to very different calculation results, and Wang Xiaoqiang may not be able to fully explain the statistical caliber he needs.
Once the caliber is determined, the data must be accurate under it. There is also the problem of the time window. For example, does "yesterday" end at 12 o'clock at night? Is revenue whose rating had not finished by 12 o'clock included?
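The effect of the caliber and the time window can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `UsageRecord` fields, the `rbt_revenue` function, the caliber names ("ordered", "used", "receivable", "received"), the `include_late_rated` switch, and the sample figures. It is not taken from any real billing system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    user_id: str
    ordered: bool         # user has ordered the RBT service
    used: bool            # user has actually used the RBT service
    receivable: float     # revenue that is receivable
    received: float       # revenue actually received
    event_time: datetime  # when the chargeable event happened
    rated_at: datetime    # when rating of the event finished


def rbt_revenue(records, user_caliber, income_caliber,
                window_start, window_end, include_late_rated):
    """Sum revenue for events in [window_start, window_end) under one caliber."""
    if user_caliber not in ("ordered", "used"):
        raise ValueError("user_caliber must be 'ordered' or 'used'")
    if income_caliber not in ("receivable", "received"):
        raise ValueError("income_caliber must be 'receivable' or 'received'")
    total = 0.0
    for r in records:
        # Time window: only events that happened inside the window count.
        if not (window_start <= r.event_time < window_end):
            continue
        # Optionally drop events whose rating finished after the window closed.
        if not include_late_rated and r.rated_at >= window_end:
            continue
        # User caliber: "ordered" or "used" selects the matching boolean field.
        if not getattr(r, user_caliber):
            continue
        # Income caliber: "receivable" or "received" selects the amount field.
        total += getattr(r, income_caliber)
    return total


# Hypothetical sample data for one day.
DAY_START = datetime(2024, 1, 1)
DAY_END = datetime(2024, 1, 2)
SAMPLE = [
    UsageRecord("u1", True, True, 2.0, 2.0,
                datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5)),
    UsageRecord("u2", True, False, 2.0, 0.0,
                datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 5)),
    UsageRecord("u3", True, True, 1.0, 1.0,
                datetime(2024, 1, 1, 23, 50), datetime(2024, 1, 2, 0, 20)),
]

for user_caliber, income_caliber in [("ordered", "receivable"), ("used", "received")]:
    for include_late in (False, True):
        value = rbt_revenue(SAMPLE, user_caliber, income_caliber,
                            DAY_START, DAY_END, include_late)
        print(user_caliber, income_caliber, include_late, value)
```

With the same three records, the answer to "yesterday's RBT revenue" changes with each combination of user caliber, income caliber, and cutoff rule, which is why the caliber has to be agreed on before the numbers are compared.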
Fortunately, once the caliber is fixed, we can compare relative values over time and still see the real changes in the RBT market.
Similarly, many data quality problems arise during big data processing, and 80% of them are caused by such caliber differences. As a result, marketing staff who see the result data sometimes question whether it is accurate. Data analysts need to respond to such questions properly, using technical means to show that the data in their hands is correct and that their analysis results are true and reliable.
In the early stage of putting big data into use, a situation of "false data, true analysis" sometimes occurs. Such false data has many causes, chiefly the lack of means for inspection and audit. Some data source providers, intentionally or unintentionally, supply embellished data to "package" their business performance.
From the very start of building a big data system, it is necessary to plan for data quality monitoring and to identify fake data in time through multi-dimensional, multi-angle data detection rules.
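As a rough illustration of what multi-angle detection rules might look like, the Python sketch below runs three independent checks on one day's data: completeness, value range, and volatility against recent history. The rule names, the `revenue` field, the 30% volatility threshold, and the sample rows are all assumptions made for this example; real rules would come from the agreed business caliber.

```python
from statistics import mean


def check_not_null(rows, field):
    """Completeness: every row must carry a value for the field."""
    return all(row.get(field) is not None for row in rows)


def check_non_negative(rows, field):
    """Range: values that are present must not be negative."""
    return all(row[field] >= 0 for row in rows if row.get(field) is not None)


def check_volatility(today_total, history_totals, max_change=0.3):
    """Volatility: today's total must stay within max_change of the recent mean.

    history_totals must be non-empty; the 0.3 threshold is an assumed default.
    """
    baseline = mean(history_totals)
    if baseline == 0:
        return today_total == 0
    return abs(today_total - baseline) / baseline <= max_change


def run_checks(rows, history_totals, field="revenue"):
    """Run every rule and return a mapping of rule name to pass/fail."""
    today_total = sum(row[field] for row in rows if row.get(field) is not None)
    return {
        "not_null": check_not_null(rows, field),
        "non_negative": check_non_negative(rows, field),
        "volatility": check_volatility(today_total, history_totals),
    }


# Hypothetical daily totals from previous days and three candidate data sets.
HISTORY = [100.0, 110.0, 90.0]
CLEAN_ROWS = [{"region": "north", "revenue": 50.0},
              {"region": "south", "revenue": 45.0}]
INFLATED_ROWS = [{"region": "north", "revenue": 400.0},
                 {"region": "south", "revenue": 100.0}]
BROKEN_ROWS = [{"region": "north", "revenue": None},
               {"region": "south", "revenue": -5.0}]

for name, rows in [("clean", CLEAN_ROWS), ("inflated", INFLATED_ROWS),
                   ("broken", BROKEN_ROWS)]:
    print(name, run_checks(rows, HISTORY))
```

The inflated data set passes the completeness and range checks and is caught only by the volatility rule, which suggests why a single rule is rarely enough to spot "packaged" data.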