On big data, there is a well-known quip:
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it."
Reading this, do people actually have a basic concept of what big data is? At present, much of the understanding of big data still stops at: massive data, super large scale, data volumes reaching the PB level, or even the EB or ZB level, and the idea that by deeply analyzing these data we can reach very valuable conclusions that guide the enterprise to make the best decisions.
Big data is the kind of topic that many people have heard of or read about, yet without knowing what the specific thing actually is.
In fact, big data is not just a huge amount of data; more accurately, it is a method of data analysis. Traditional data analysis verifies hypotheses: you make an assumption and then obtain the corresponding data to test it. Big data works the other way around: starting from a collection of massive data, algorithms directly analyze data coming from different channels and in different formats, looking for correlations between the data. In simple terms, big data tends to focus on discovery, and on the iterative cycle of guessing and verifying.
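As a rough, hypothetical illustration of this correlation-first style of analysis, the Python sketch below merges records from two different channels and simply scans for correlations between their numeric fields; the column names and figures are invented for the example.

```python
import pandas as pd

# Hypothetical data from two different channels: daily retail sales and
# social-media mention counts for the same product.
sales = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5),
    "units_sold": [120, 135, 160, 150, 180],
})
mentions = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5),
    "social_mentions": [30, 42, 55, 48, 70],
})

# Merge the channels on a shared key, then look at correlations between the
# numeric fields -- discovery first, hypotheses afterwards.
merged = sales.merge(mentions, on="date")
print(merged[["units_sold", "social_mentions"]].corr())
```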
The value of big data is embodied in its analysis and use. All along, the bottleneck of big data has not been the storage and computation problems caused by data scale, but the way front-end data are collected and structured, which in turn determines the models and algorithms used in later business decisions.
Every industry is generating data, and the volume of data in modern society is increasing at an unprecedented rate. These data are extremely varied in type, including structured, semi-structured, and unstructured data. Enterprises need to consolidate and analyze data from complex traditional and non-traditional sources of information, both internal and external to the enterprise. With the explosive growth of sensors, smart devices, and social collaboration technologies, data types have become hard to count: text, microblog posts, sensor readings, audio, video, and so on.
What today's much-hyped data analysts actually do is gather information, structure it, and only then produce the "magic" of big data that we see. The problem is that there is simply too much of this data-processing work: according to interviews and expert estimates, data analysts spend 50%~80% of their time on processing data.
Monica Rogati, who leads data work at Jawbone, said:
Working with data is a huge part of the whole job. But sometimes we get frustrated, because it feels as if all we ever do is process data.
This sounds a bit like the iceberg theory: the big data we see is just the tip of the iceberg, and what we don't see, the upfront work behind big data, is the far larger part below the waterline.
But McKinsey, a consultancy, said in a 2011 report:
"Data has penetrated into every industry and business functional area today, becoming an important factor in production." The excavation and application of massive data indicates a new wave of productivity growth and consumer surplus. ”
Yes, opportunities also lurk where the problems are. Raw data formats and sources are too numerous to count. For example, if a food company needs to collect and analyze big data, it can gather data including production output, location information for shipments, weather reports, daily retail sales, social media reviews, and so on. Based on this information, the enterprise can gain insight into market direction and changes in demand, and then develop the corresponding product plan.
Indeed, the more information you get, the better placed a company is to make informed decisions. But such decisions rest on very different sets of data: data from a variety of sensors, documents, web pages, and databases, all in different formats, which must be converted into a uniform format so that software can understand and analyze them.
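As a minimal sketch of that kind of format unification (the field names and sample records here are assumptions, not a real schema), the snippet below maps CSV rows and JSON documents onto one common set of fields before analysis.

```python
import csv
import io
import json

# Hypothetical raw inputs in two different formats describing the same kind of event.
csv_source = "date,store,amount\n2014-01-01,Palo Alto,120\n2014-01-02,Palo Alto,135\n"
json_source = '[{"ts": "2014-01-01", "location": "Berkeley", "sales": 98}]'

def normalize_csv(text):
    # CSV rows already share the target field names; just cast the amount.
    return [
        {"date": row["date"], "store": row["store"], "amount": int(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def normalize_json(text):
    # JSON documents use different field names; map them onto the same schema.
    return [
        {"date": doc["ts"], "store": doc["location"], "amount": int(doc["sales"])}
        for doc in json.loads(text)
    ]

# One uniform list of records that downstream analysis software can consume.
records = normalize_csv(csv_source) + normalize_json(json_source)
print(records)
```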
Formatting all these kinds of data is a serious challenge, because data can be as fuzzy as human language: some data make sense to a human but cannot be recognized by a computer, so the same cleanup work has to be repeated over and over again.
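One small, hypothetical example of that fuzziness is reconciling the many spellings humans use for the same value; the mapping below is invented, and in practice it has to keep growing as new variants appear, which is exactly the repetitive work described above.

```python
# Hypothetical cleanup step: humans write the same location many ways,
# and software only sees distinct strings until we normalize them.
CANONICAL = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
    "n.y.": "New York",
    "ny": "New York",
}

def normalize_location(value: str) -> str:
    # Fall back to the original value when no rule exists yet --
    # each new variant means another round of manual mapping.
    return CANONICAL.get(value.strip().lower(), value.strip())

raw = ["CA", "Calif.", " california ", "NY", "Oregon"]
print([normalize_location(v) for v in raw])
```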
A number of startups are already trying to develop technologies to ease this. ClearStory Data, a start-up in Palo Alto, develops software that identifies different data sources, integrates them, and presents the results visually as charts, graphs, or data maps. Paxata, a Californian start-up, focuses on data automation: discovering, cleaning, and provisioning data, so that data processed by Paxata can be fed into various analytical or visualization software tools.
The current situation of big data somewhat resembles the trajectory of the computer's development. An advanced technology is at first mastered by only a small elite, but over time, through continual technological innovation and investment, the technology, or tool, becomes better and better. Especially once it enters the commercial field, the tool can be widely applied and become mainstream in society.
So we are now witnesses to history, watching how big data advances step by step; we all need to master, or at least choose, an optimal analytical method to better mine the value of big data.