In recent years big data has become a hot topic, and more and more people are paying attention to its applications. Overall, people are optimistic about its prospects. Take its technical features: the easiest to remember are the four "V"s: vast (large volume), variety (diversity), velocity (rapid growth) and value (high total value). These are all correct, but on closer inspection they all lean toward the advantages of big data. In fact, being big also brings big difficulties, and big data inevitably has its disadvantages. These can be summed up in four points:
Inflated--big data is bloated. The size of big data shows not only in the number of rows of records but even more in the number of fields (variables), which makes it hard to analyze the correlations among many factors. Even the simplest analysis of variance is manageable for one or two variables but daunting for one or two hundred.
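A rough illustration of why width is the problem, assuming the analysis compares every pair of variables (an assumption made here for illustration; the function name `pairwise_count` is hypothetical). The number of pairs grows roughly with the square of the number of fields:

```python
from math import comb

def pairwise_count(n_variables: int) -> int:
    """Number of distinct variable pairs among n_variables fields."""
    return comb(n_variables, 2)

# Compare a narrow table with a wide one.
for n in (2, 20, 200):
    print(n, "variables ->", pairwise_count(n), "pairs")
```

Two variables give a single pair to examine, while two hundred give 19,900, which is why an analysis that is trivial on a narrow table becomes daunting on a wide one.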
Unstructured--big data is unstructured. Its structure is also very complex: alongside traditional structured data, such as continuous variables (turnover, time) and discrete variables (gender, job type), it adds new unstructured data such as text, social networks, and even voice and images. The information contained in this unstructured data is often even greater, but the analytical methods available for it are comparatively thin.
Incomplete--big data is incomplete. In the real world, missing data is common, caused for example by incomplete user registration information or by errors in computer data storage. In big data scenarios missing data is even more common, which adds uncertainty to the quality of later analysis and modeling.
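One way to gauge how incomplete a data set is, before any modeling, is to measure the share of missing values per field. The sketch below is a minimal example under stated assumptions: records are plain dictionaries, a lost value appears as `None` or an empty string, and the sample records and the function name `missing_rates` are hypothetical.

```python
def missing_rates(records, fields):
    """Return the share of records in which each field is absent or empty."""
    total = len(records)
    rates = {}
    for field in fields:
        # A value counts as missing if the key is absent, None, or "".
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates

# Hypothetical user-registration records; None or "" marks a lost value.
users = [
    {"name": "A", "gender": "F", "age": 31},
    {"name": "B", "gender": None, "age": 45},
    {"name": "C", "gender": "M", "age": None},
    {"name": "D", "gender": "", "age": 28},
]

print(missing_rates(users, ["name", "gender", "age"]))
```

A report like this does not repair anything; it only shows which fields carry the most uncertainty, so the analyst can decide how to treat them.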
Abnormal--big data is abnormal. Likewise, real-world big data contains quite a few outliers. For example, some continuous variables take values that are far too large (such as the transaction amount within a short period), and some levels of a discrete variable appear far too rarely (such as one particular product name among those selected). If they are not removed, they are likely to distort the estimation and evaluation of model coefficients; if they are removed, information that may be genuine is thrown away. This leaves analysts with a dilemma.
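The two kinds of anomaly described above can be flagged with simple rules. The sketch below is illustrative only: it uses the common 1.5 x IQR fence for continuous values and a minimum-share threshold for discrete levels. Both thresholds, the sample data, and the function names `iqr_outliers` and `rare_levels` are assumptions, not something the article prescribes.

```python
from collections import Counter
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def rare_levels(labels, min_share=0.05):
    """Flag levels of a discrete variable that occur in too few records."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(level for level, c in counts.items() if c / total < min_share)

# Hypothetical transaction amounts with one suspiciously large value.
amounts = [120, 95, 130, 110, 105, 99, 125, 115, 9800]
# Hypothetical product choices with one level that barely appears.
products = ["tea"] * 12 + ["coffee"] * 10 + ["juice"] * 7 + ["kombucha"]

print(iqr_outliers(amounts))
print(rare_levels(products))
```

The functions only flag candidates. Whether to drop them, cap them, or keep them remains the analyst's judgment call, which is exactly the dilemma described above.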
As Confucius said long ago, "review the old to learn the new." Today, by analyzing big data, people are already able to foresee and analyze many of the trends of the current era. But how can the reliability of these analytical results be guaranteed? At present, big data in our country is still at a developing stage. If we cannot handle these unfavorable factors and instead blindly "follow the trend" in using it, the advantages of big data applications will be hard to realize. Truly using big data is not a simple upgrade of conventional data analysis, but a comprehensive undertaking that requires great wisdom.
In fact, when big data is used effectively, it can monitor a variety of potential risks in real time and improve production efficiency. More importantly, it can give enterprises many insights that raise return on investment and competitive advantage, and help them judge potential business opportunities in the global market from multiple dimensions so as to achieve rapid development. But without a comprehensive, objective understanding of how big data is processed, even if we can easily obtain many predictions from it, how accurate will they be?
Therefore, we must have a comprehensive and objective understanding of big data, and the four difficulties described in this article are an important part of it. Only by adopting different strategies and tactics for different business and data contexts can we truly exploit the leverage of big data in the big data age and effectively improve operational efficiency and market competitiveness.