In the big data age, we need to think this way


In their book "Big Data: A Revolution That Will Transform How We Live, Work, and Think", Viktor Mayer-Schönberger and Kenneth Cukier describe the 4V characteristics of big data: volume, velocity, variety, and veracity. Big data is inevitably more complex than small data, but for us that complexity is an opportunity, not a problem. Embracing the big data age begins with changing the way we think.

From "structured database based on presets" to "non-relational database without presets"

In the small data age, data storage and retrieval depended on classification and indexing, a mechanism for organizing data that is explicitly designed around preset fields. A structured database with preset fields keeps data neatly arranged and precisely stored, which fits the goal of data accuracy. In an age when data was scarce and questions were clearly defined, such preset, structured databases answered people's questions effectively, and they returned consistent results at different times.

In the face of big data, preset database systems break down under the volume and messiness of the data. That messiness in fact reflects the complexity and uncertainty of the world itself; to extract value from big data, accepting the mess rather than fighting or avoiding it is the viable path. Hence, alongside the rise of big data came non-relational databases, which require no predetermined record structure and can process a variety of heterogeneous data. Because they accommodate diverse structures, these schema-free designs can process and store far more data, making them an essential tool of the big data age. As Microsoft database expert Pat Helland put it: "We can no longer pretend to live in a neat world."
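To make the contrast concrete, here is a minimal sketch in Python, using only the standard library. It stands in for a document store such as MongoDB: each record is saved whole as a JSON document rather than into preset columns, so heterogeneous records coexist and fields are discovered at read time. The records themselves are invented for illustration.

```python
import json
import sqlite3

# Schema-free storage in miniature: one opaque TEXT column holds whole JSON
# documents, so no fields have to be declared before data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")

# Heterogeneous records: no two share the same fields.
records = [
    {"type": "tweet", "user": "alice", "text": "big data!", "retweets": 3},
    {"type": "sensor", "device": "t-101", "celsius": 21.4},
    {"type": "clickstream", "session": "x9", "pages": ["/home", "/cart"]},
]
conn.executemany("INSERT INTO docs VALUES (?)",
                 [(json.dumps(r),) for r in records])

# Schema-on-read: structure is interpreted at query time,
# not enforced at write time.
for (body,) in conn.execute("SELECT body FROM docs"):
    doc = json.loads(body)
    print(doc["type"], "->", sorted(doc.keys()))
```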

From "random samples" to "full data"

Statisticians long ago found that the accuracy of sampling analysis improves greatly with the randomness of the sampling, but bears little relation to increases in sample size. This discovery was enormously valuable in the small data age; random sampling succeeded so well that it became a core idea of modern social measurement. But random sampling rests on absolute randomness, which is extremely difficult to achieve in practice: any bias in the sampling process pushes the analysis far from the truth. Moreover, a random sample can only answer the questions that were posed in advance. This lack of extensibility inevitably causes us to miss entire problem domains.
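The point about randomness versus size can be demonstrated directly. The following Python sketch, on synthetic data, draws a fixed-size sample from populations of very different sizes: the random sample's error stays small regardless of population size, while a systematically biased sample stays wrong no matter what.

```python
import random
import statistics

random.seed(42)

for population_size in (10_000, 1_000_000):
    population = [random.gauss(100, 15) for _ in range(population_size)]
    true_mean = statistics.mean(population)

    random_sample = random.sample(population, 1_000)   # unbiased draw
    biased_sample = sorted(population)[:1_000]         # systematically skewed draw

    print(f"N={population_size:>9,}  true mean={true_mean:6.2f}  "
          f"random-sample error={abs(statistics.mean(random_sample) - true_mean):5.2f}  "
          f"biased-sample error={abs(statistics.mean(biased_sample) - true_mean):6.2f}")
```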

In the big data age, collecting data is no longer the obstacle; collecting all of the data has become realistic. Full data gives us a macroscopic, elevated perspective, letting us examine problems at a higher level, recover the value of data that would otherwise stay submerged, and discover the interesting details hidden in the whole. With all, or almost all, of the data, we can observe it in finer detail and from more angles, so that big data analysis becomes a process of surprising discovery and of expanding the problem domain.

From "The accuracy of the data and the veracity of the result" to "the fault tolerance of the data's confounding and result"

In the small data age, the amount of available data was small, so we had to record what we obtained as accurately as possible, which drove the refinement of measuring instruments. Limited data-processing capabilities meant the usable data was essentially restricted to the structured data that traditional databases could handle, and the precision of the sampling process was therefore given great weight. Clearly, this obsession with precision is a product of an age of information scarcity, the analog age.

In the big data age, the sheer volume of data inevitably increases its messiness and produces imprecise results; if we cling to precision, we will be unable to cope with the new era. Compared with the additional errors that messy data may introduce, the new insights, new trends, and new value gained from expanding the data volume matter far more, since big data usually speaks in probabilities, and much of the erroneous data can in any case be cleaned before processing. Tolerating error, then, yields more information than striving to avoid it. Accepting messy data and imprecise results is the right attitude for embracing big data; only by conceding to, accepting, and even appreciating imprecision can we see the future big data brings us, and we should grow accustomed to thinking this way.
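As a small illustration of cleaning before processing, the sketch below takes a deliberately messy feed of invented sensor readings, coerces what is recoverable, drops what is not, and accepts that the resulting average is only approximately right.

```python
# A messy feed: locale commas, stray units, blanks, and placeholders.
raw_readings = ["21.4", "22,1", "n/a", "20.9", "", "23.0C", "19.8"]

def to_celsius(value):
    """Best-effort parse: normalize common glitches, return None if hopeless."""
    cleaned = value.strip().rstrip("C").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None

parsed = [c for c in (to_celsius(v) for v in raw_readings) if c is not None]
print(f"kept {len(parsed)}/{len(raw_readings)} readings, "
      f"mean = {sum(parsed) / len(parsed):.1f} C (approximate, and good enough)")
```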

From "Complex algorithm" to "Simple algorithm"

Algorithms are the tools for mining the value of data, so algorithm research has always been an important route to using data more effectively. In the small data age, the extraction of information and value from data went ever deeper, and, because the data ceiling could not be broken, algorithms grew ever more complicated. Yet it turns out that when the amount of data expands by orders of magnitude, simple algorithms that perform poorly on small data see their accuracy rise dramatically, whereas complex algorithms that excel on small data show no such advantage when more data is added. More data, then, matters more than a cleverer algorithm: a simple algorithm on big data can be more effective than a complex algorithm on small data.
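The claim can be tried out on synthetic data. The sketch below implements about the simplest classifier there is, one-nearest-neighbour, in plain Python, and shows its accuracy on two overlapping clusters typically climbing as the training set grows, with no change to the algorithm itself.

```python
import random

random.seed(0)

def make_points(n_per_class):
    # Two overlapping Gaussian clusters, centred at (0, 0) and (1, 1),
    # with the centre value doubling as the class label.
    return [((random.gauss(c, 1.0), random.gauss(c, 1.0)), c)
            for _ in range(n_per_class) for c in (0, 1)]

def predict(train, x):
    # 1-nearest-neighbour: copy the label of the closest training point.
    nearest = min(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)
    return nearest[1]

test = make_points(200)
for n in (10, 100, 1_000, 5_000):
    train = make_points(n)
    accuracy = sum(predict(train, x) == y for x, y in test) / len(test)
    print(f"training points: {2 * n:>6}  accuracy: {accuracy:.1%}")
```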

From "Why" to "what"?

In the small data age, limited data availability and computing power meant that research had to proceed by testing hypotheses, exploring the "why"; and analysis that begins from a hypothesis is highly susceptible to bias.

In the big data age, the rapid development of technologies for data storage, transmission, acquisition, and processing gives our research a new field of view and valuable predictive power, and reveals relationships and dynamics that previously went unnoticed. Exploring "what" becomes a more convenient way for us to discover the world, unprejudiced by a priori hypotheses.

From "Causal relationship" to "related relationship"

In the small data age, the scarcity of information tended to push us toward causal thinking as a way to understand problems and make decisions quickly. The causal relationship might not actually exist, but it was a shortcut to understanding and interpreting the world. When human capacities are limited, such a cognitive shortcut offers comfort and security, as if the world were built on cause and effect.

Because research in the big data age is no longer rigidly confined to the study of causation, we have every condition needed to shift from causal relationships to correlations. The correlation between beer and diapers is the classic example. Massive data is being manufactured constantly, and our capacity to collect, store, transmit, and process it keeps growing; that is the signature of the big data age. Using the Internet, cloud computing, and other modern means to statistically search, compare, analyze, and summarize vast amounts of data, we find strong correlations between things that seem to have nothing to do with each other, correlations that traditional causal analysis and logical reasoning struggle to explain or cannot explain at all.
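The beer-and-diapers story is a market-basket computation, and a toy version fits in a few lines of Python. "Lift" measures how much more often two items are bought together than independence would predict; a lift well above 1 flags a correlation, with no causal story attached. The baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers"},
    {"bread", "chips"},
    {"milk", "chips"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(frozenset(pair)
                      for basket in baskets
                      for pair in combinations(sorted(basket), 2))

for pair, joint in pair_counts.most_common():
    a, b = sorted(pair)
    # lift = P(a and b) / (P(a) * P(b)); 1.0 means statistically independent.
    lift = (joint / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    if lift > 1.5:
        print(f"{a} + {b}: lift = {lift:.2f}")
```

On this toy data only beer and diapers stand out, which is exactly the kind of "what" the paragraph describes: the correlation is plainly there, while the "why" is left to further study.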

Of course, correlation is not the end goal of big data insight. In many cases, once a correlation analysis of big data is complete and we are no longer content with the mere "what", we go on to study causation and seek the "why"; building the search for causal relationships on top of a correlation analysis greatly reduces the cost of that analysis. Causation is, in fact, a special kind of correlation.

From "Prudent decision and action" to "quick decision and action"

In the small data age, we started from an assumption about how the social environment works, then collected and analyzed data to validate it. If checking the data showed the original hypothesis did not hold, we started over with a new hypothesis and collected and analyzed new data, until a hypothesis passed verification. Decisions and actions in the small data age were therefore prudent.

In the big data age, we are no longer confined to traditional thinking patterns and implicit assumptions. Equipped with the tools and theory of big data analysis, we let big data show us new insights and release enormous value. We explore the world under the guidance of data rather than hypotheses, which makes it imperative to receive the insights the data offers in an open manner and to decide and act quickly, because opportunity and value are refreshed just as quickly. The value of big data lies precisely in delivering information to those who need it in time, and in making decisions and taking actions in time. In the foreseeable future, data will surely shape the world.

In fact, we are only standing at the starting point of a very long journey. (This article was first published on Titanium Media.)

"Author: Platoguo, now serving in Shanghai-Thinking Information Technology Co., Ltd."

(Responsible editor: Lvguang)
