If the amount of coverage and attention from non-IT media is used to measure how hot an IT term is, I believe "big data" is definitely the hottest IT term of the moment, and cloud computing is certainly no match for it.
Big data is so hot because it seems to touch the lives of ordinary people more than other IT terms do, and to come with more legendary stories. From targeted marketing to helping win the U.S. presidential election, big data quickly put on a magical cloak. It is precisely this visible influence on everyday life that has turned it into a trend that everyone living in the information world should pay at least some attention to.
Take just the relationship between enterprises and customers. Business owners want to use big data to analyze customers' behavior patterns and interests, to find target customers, and to push information that potential customers may want. Consumers, for their part, judge the big data capability of the companies involved by the unreliable recommendations they sometimes receive. In addition, cases of companies improving their operational efficiency through big data seem to be multiplying rapidly. From whichever angle you look, then, big data has increasingly become a topic of public conversation. Yet ask what big data actually is, and the participants will probably give different answers: some interpret it at the technical level, some at the conceptual level, and some offer after-the-fact summaries based on the results of using it. To be honest, big data is more varied in meaning and more uncertain than other IT terms.
Is big data a gimmick?
Today we are going to talk about what big data is. We do not want to delve into technology or theory; we hope instead that a simple description can build as broad a consensus on big data as possible. In fact, the concept of big data first took shape in a research report produced jointly by IDC and EMC, but the focus of that report was a warning about the accumulation, preservation, and management of data under trends such as the Internet, the Internet of Things, and cloud computing. Different vendors then kept expanding and enriching this foundation, eventually arriving at an industry-recognized set of 4V attributes: volume, variety, velocity, and value (IBM's version of the 4Vs defines the last V as veracity). This produced a self-contained big data system which, viewed vertically, is summarized as a four-layer architecture running from hardware infrastructure to data management, then to data analysis, and finally to data presentation. The consensus everyone ultimately reached is that big data exists to serve society; the accumulation, preservation, management, and analysis mentioned above all serve this purpose.
So the idea of big data seems to have been unified, but that very consensus leads many people to scoff at the concept. If big data shows its value only in applications that serve society, how does it differ from the earlier concepts of data mining and business intelligence? As for volume, it is not evident on many occasions; a few terabytes of data can already be overwhelming, and the rest is nothing more than an expansion of data types. So the claim that "big data is just a gimmick" is not without reason.
However, some current big data applications do differ from earlier data warehouse applications: they may handle unstructured and structured data together, and they also bring new concepts and changes in processing models and methods. As for exactly what new things big data has brought, we do not need to dwell on that here. We only need to answer one core question: what is data?
Data energy?
If what so many people chase in big data is its magic, its ability to turn obscure historical data into something marvelous, then can we not compare big data to the processing of a kind of energy? If we can, then a look at the energy sources on Earth reveals a great many similarities with big data.
There are many kinds of energy on Earth, but a precondition for using any of them is human understanding. In ancient times, humans saw fires ignited by lightning and discovered two sources of energy: fire, which could be used for warmth, for driving away wild animals, and for cooking food, and wood, which could be used to make a fire. After that, as civilization progressed and science and technology developed, humans gradually discovered more and more energy sources, such as coal, natural gas, oil, and solar energy. But did these not exist before humans could recognize them? Obviously they did. Long before human beings appeared, they had been waiting quietly on Earth for hundreds of millions of years; only once humans had the corresponding technology and tools could they be put to use.
The same is true of data. If data goes through a life cycle of generation/capture, application/processing, storage/management, analysis/mining, and finally archiving or deletion, then it has value from the moment it is generated; the only question is whether you have the ability to find that value. Finding it requires new ideas, knowledge, technology, and tools. Even if primitive people had known that there was oil deep underground, they could not have extracted it. The same goes for data analysis.
In the history of human development, the discovery of new energy sources has been all but inevitable. Once we have mastered more advanced theory and developed more advanced tools based on it, new surprises are only to be expected. When the car was first invented, no one would have thought that electricity and water could become viable energy sources for automobiles. From this point of view, the magic that big data currently delivers is simply progress in data collection, management, analysis, and related areas. It is an inevitability, not a sudden human "epiphany."
Let's look at the 4V attributes as they apply to the Earth's energy:
Note: the 4V attributes of big data are volume, variety, velocity, and value.
1. Volume (reserves): The proven and newly proven reserves of coal, natural gas, oil, wind, and solar energy are staggering, but they are spread across all kinds of terrain and landforms, so how much can be obtained depends on human capability.
2. Variety (categories): As technology advances, humans obtain more and more types of energy, and more and more things once thought not to be resources gradually become energy sources. Radioactive elements are a typical example, and if the technology matures, seawater will become an inexhaustible new source of energy. A science fiction story once said, "Maybe in the future a handful of soil will be all you need to send a rocket to the moon." On the other hand, different energy sources require different extraction techniques and tools, just as structured and unstructured data do. As a result, the more categories of energy humans face, the more kinds of extraction capability they need.
3. Velocity (extraction/conversion efficiency): What is the value of an oil well that yields only a single gallon a day? And is there any need to use a solar technology whose conversion efficiency is below 1%? The efficiency of extraction and conversion determines whether an energy source is usable at all. The same goes for data analysis: if the analysis of today's sales data takes a month to come out, you might as well not analyze it.
4. Value: The value of an energy source depends on the energy and contribution it can provide. Everyone knows that 95-octane gasoline is better than 92-octane, but it is also more expensive; coal is cheap, but it provides relatively little energy; and electricity, even though it still has to be converted from other energy sources, has value of its own. The value of data likewise depends on its own attributes and on the corresponding mining and refining capability. The former corresponds to the difference between gasoline and coal, the latter to the difference between a refinery's processes for 95-octane and 92-octane gasoline. Together they ultimately determine the value of the data. It is not, as some big data campaigns preach, as if any data analyzed with big data techniques could turn dirt into gold, unless your own perception of that data was wrong in the first place (something that was dirt all along is fundamentally different from something that was gold all along).
The veracity that IBM advocates can, I think, be understood as the quality of energy refining. Even for 95-octane gasoline, can what is produced in China compare with what is produced in Europe and America? This is the so-called difference between true 95 and fake 95. Another typical example is enriched uranium: low-enriched uranium containing about 3% uranium-235 can be used for nuclear power generation, while uranium enriched to more than 90% uranium-235 can be used to make nuclear weapons, and the value at different enrichment levels is certainly different. Differences in quality and capability are thus eventually reflected in the value of the energy. IBM's emphasis on the accuracy and authenticity of data analysis corresponds to purity in the energy refining process, and expresses a qualitative requirement for data analysis.
Figure note: IBM defines the fourth V of big data as veracity.
Having compared the concepts, let us look at the similarities in the vertical technical framework. Do the data collection, aggregation, preservation, management, analysis, and presentation that big data involves not correspond one to one with energy exploration, mining, gathering, storage, refining, and use?
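The one-to-one correspondence described above can be written down as a small lookup table. This is only an illustrative sketch: the stage names come from the paragraph, but pairing them strictly in the order listed, and the names DATA_STAGES, ENERGY_STAGES, STAGE_MAP, and energy_counterpart, are assumptions made for the example.

```python
# Stage names as listed in the text; pairing them by position is an assumption.
DATA_STAGES = ["collection", "aggregation", "preservation",
               "management", "analysis", "presentation"]
ENERGY_STAGES = ["exploration", "mining", "gathering",
                 "storage", "refining", "use"]

# Lookup table from each big data stage to its assumed energy counterpart.
STAGE_MAP = dict(zip(DATA_STAGES, ENERGY_STAGES))


def energy_counterpart(data_stage: str) -> str:
    """Return the energy stage paired with a big data stage (hypothetical helper)."""
    return STAGE_MAP[data_stage.lower()]


if __name__ == "__main__":
    # Print the six pairs side by side.
    for data_stage, energy_stage in STAGE_MAP.items():
        print(f"data {data_stage:<12} <-> energy {energy_stage}")
```

Read this way, data analysis sits where refining sits in the energy chain, which is the stage where the article's points about value and veracity apply.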
In particular, the Industrial Internet, a major branch of the Internet of Things, is getting hotter, and big data is its final and crucial support (GE has also invested in Pivotal specifically for this). The idea is to bring all kinds of sensors and actuators into one overall information collection system and, with the help of an industrial equipment control platform, to analyze the valuable information in industrial equipment for easier management, precise tuning, and health warnings. Think about it: is this not like geological prospecting, where the sensor readings from different shot points are aggregated and seismic wave analysis software is then used to show the distribution and structure of underground minerals?
What is big data?
Having said all that, we should be able to sum up: in a sense, big data is a reproduction and mapping, in the IT field, of the history of human energy development.
Everything said about big data, in terms of both ideas and implementation methods, has a counterpart in humanity's long experience of exploring, extracting, and using energy, so fundamentally it is nothing new.
But the key point is that, in the IT field, awareness of "data energy" lags far behind human understanding of other types of energy. So when the day finally comes that we have perfected the underlying theory, developed the corresponding tools, and seen the energy inside data, we may feel the same excitement as when oil was first discovered, and then exaggerate it and turn it into a myth. But think about it: humans have been through so many surprises of this kind (fire, electricity, coal, oil, gas, nuclear energy, solar energy; which of them did not surprise us?) that we should not make such a fuss.
In my view, in its vision for data utilization, big data is cut from the same cloth as the earlier concepts of data warehousing, data mining, and business intelligence. Big data is a new phase in humanity's use of data. It represents an idea (data energy), a way of thinking (an end-to-end approach running from data collection through data analysis to data presentation), and a new set of tools (a collection of tools that bring together structured and unstructured data, semantic and machine-generated data, and provide unified processing, analysis, and presentation). It gives people a new ability to understand data and opens up the imagination about how data can be used. In this respect, it is not advisable to reject big data and dismiss it as a gimmick.
In short, we must understand that big data did not fall from the sky. It is the inevitable result of human IT capability reaching a certain stage and, like PCs and smartphones, a natural product of the interaction of many related technologies. We have to look at it as a whole, rather than seeing only the attractive parts and turning what was originally very plain information into a "legend." Big data clearly has this tendency now: it seems omnipotent and all-encompassing. As we said above, if the value of the data itself is at the level of dirt, there is no point expecting to extract gold from it; big data only reveals the "data energy" that already existed at its various levels. In the foreseeable future it will become a normal, basic capability, just like gasoline today: no car takes pride in burning gasoline or becomes legendary because of it.
So I think the current big data boom should be deliberately cooled down, so that big data returns to its true nature and we focus on what any energy source needs to have in place. Are the data collection channels not broad enough? Is the data aggregation capability not strong enough? Is data management too complex? Is the data processing capability too weak? Is data analysis not smart enough? Does data presentation fall short in ease of use and friendliness? In fact, when something is what everyone ends up having to do, and a capability is one everyone has to have, it is no longer a myth or a legend. Big data is exactly that.
(Responsible editor: The good of the Legacy)