Big data analysis should be approached rationally
Source: Internet
Author: User
According to statistics, from the beginning of human civilization until 2003, mankind created a total of about 5 EB (exabytes) of information. Today the same amount of data is created in just two days, and the pace is still accelerating. Such a volume of data complicates data analysis, and the unstructured data within big data deepens this complexity.
In this situation, we need to be clear about which data should be saved. Only when data is collected and stored selectively do the results obtained from analyzing and computing over massive data have practical value. That is where the value of big data lies.
Regarding the sheer volume of data, one of the more radical views in the industry is that the term "big data" is itself problematic, because data being "big" is of no use in itself. Although data is ubiquitous, it becomes valuable only when it is highly reusable and can be translated into useful abstract information.
Even as our data collection and processing capabilities grow, we still have to uphold the principle that "not all data is important". For an enterprise, this means following two specific points. The first is to keep the data broad: internally, take control of the enterprise's own analytical data; externally, understand users' preferences and habits. The second is to focus on key data: start from what matters most, make full use of data reuse, and achieve maximum value at optimized cost.
The Harvard Business Review recently published an article entitled "Will greater data lead to better decisions?" The article warns that focusing on quantity will lead to big mistakes. Today many companies are trying to benefit from huge amounts of data, but only a handful truly succeed, and the reason is that the rest pay too much attention to the "volume" of data.
Data quality and data sharing in big data analysis
We know that to ensure the accuracy of analysis results, the data being analyzed must be true and valid; at the very least, the majority of the data samples must be of assured quality. But when large amounts of data are aggregated from many data sources, it is inevitable that poor-quality data gets mixed in.
When shopping on Taobao, a seller's credit rating is an important reference for buyers deciding whether to purchase. To boost sales, artificially inflating credit ratings has become an open secret in the industry. When sellers fraudulently raise their credit ratings, the process generates a large amount of distorted data, which not only deceives consumers but also directly affects the results of later data analysis.
Second, the phenomenon of "data separatism" is fairly serious in China's Internet industry: the Internet giants each hold large amounts of core data and are unwilling to share it with one another. Baidu, which holds search data, Tencent, which holds social data, and Alibaba, which holds consumer data, are all aware of how important data is to an enterprise's future competitiveness, so none of them will hand over its data chips easily.
Taking Baidu, Tencent, and Alibaba as an example again: based on their current prevalence on China's Internet, we can roughly estimate the combined users of the three companies' applications as a proportion of all Internet users, and a conservative estimate of 50% is not a problem. Therefore, if the three parties shared their data, a complete map of network information could be pieced together. Instead, "data separatism" leaves big data fragmented and one-sided, which greatly reduces its value in use.
CMiC believes that, at a moment when the torrent of big data is hitting, the flow of information matters most, and that the Internet giants' data-separatist thinking seriously hinders the development of the entire industry. This is especially true for midstream and downstream enterprises that have big data analysis technology but no big data source of their own: they face the dilemma of "making bricks without straw".