In the big data age, you need to think like this

Source: Internet
Author: User

In their book on the big data age, Viktor Mayer-Schönberger and Kenneth Cukier describe the four V features of big data: volume (massive), velocity (high speed), variety (diverse), and veracity (real). Compared with small data, big data is necessarily complex. That complexity, however, is an opportunity for us, not a problem. Embracing the big data era begins with a change in the way we think.

From "preset structured databases" to "non-relational databases that need no preset structure"

In the era of small data, we relied on classification and indexing to store and retrieve data. Classification and indexing give us a clear mechanism for accessing data, and that mechanism is built on preset fields. The preset fields of a structured database keep data neatly arranged and accurately stored. This matches the goal of data accuracy: in an age of scarce data and clearly defined questions, a preset structured database can answer people's questions effectively and return consistent results at different times.

In the face of big data, a preset database system breaks down under the sheer volume and messiness of the data. In fact, messy data is what truly reflects the complexity and uncertainty of the world. To obtain the value of big data, the feasible path is to accept the mess rather than fight or avoid it. This is why non-relational databases emerged alongside big data: they do not require a record structure to be defined in advance, and they allow many different types of data. Because they tolerate diverse structures, these non-relational designs can process and store more data, and they have become an important contender in the big data era. As Microsoft database design expert Pat Helland says, "We can no longer pretend to be in a neat world."
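The contrast between preset and non-preset storage can be sketched in a few lines. This is only an illustration under stated assumptions: the `users` table, the sample records, and the helper names `insert_preset` and `insert_document` are hypothetical, and a plain Python list of JSON strings stands in for a real non-relational database.

```python
import json
import sqlite3

# Structured store: the record fields are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")


def insert_preset(record):
    """Insert a dict into the preset table; unknown fields are rejected."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO users ({columns}) VALUES ({placeholders})",
        tuple(record.values()),
    )


# Schemaless store (stand-in): every record is a free-form JSON document.
documents = []


def insert_document(record):
    """Store any dict as JSON text without declaring its fields first."""
    documents.append(json.dumps(record))


insert_preset({"name": "Ann", "age": 31})  # fits the preset fields
insert_document({"name": "Ann", "age": 31})
insert_document({"name": "Bob", "tags": ["a", "b"]})  # a different shape is fine

try:
    insert_preset({"name": "Bob", "hobby": "chess"})  # no such column
    preset_accepted_new_field = True
except sqlite3.OperationalError:
    preset_accepted_new_field = False
```

The preset table refuses the record with an undeclared field, while the document list accepts records of any shape. The price is that the reader of the documents, not the store, has to cope with the variety.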

From "random sample" to "full data"

Statisticians found that the accuracy of sampling analysis increases significantly with the randomness of the sampling, but has little to do with an increase in sample size. In the era of small data this discovery was very encouraging: random sampling achieved great success and became the core idea of modern social measurement. The foundation of a random sample, however, is the absolute randomness of the sampling, and randomness in this strict sense is very difficult to achieve. Once the sampling process has any bias, the analysis results can be far off. In addition, random samples only answer the questions set in advance. This lack of extensibility inevitably causes us to miss other questions.
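The point about randomness versus sample size can be checked with a small simulation. The sketch below is an assumption-laden illustration, not a statistical proof: the population is synthetic (normally distributed around 50), and the bias is an artificial one, drawing only from the upper half of the values.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def mean(values):
    return sum(values) / len(values)


# Hypothetical population: 100,000 values centered on 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = mean(population)

# Small but truly random sample.
random_sample = rng.sample(population, 1_000)

# Ten times larger, but drawn only from the upper half of the population.
upper_half = sorted(population)[len(population) // 2:]
biased_sample = rng.sample(upper_half, 10_000)

random_error = abs(mean(random_sample) - true_mean)
biased_error = abs(mean(biased_sample) - true_mean)

print(f"random sample (n=1,000):  error {random_error:.2f}")
print(f"biased sample (n=10,000): error {biased_error:.2f}")
```

The small random sample estimates the population mean far better than the larger biased one, which is why a flaw in the sampling process cannot be repaired simply by collecting more of the same biased data.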

In the big data era, data collection is no longer a problem, and collecting the full data set has become a reality. Full data gives us a macro, high-level perspective, which lets us see the value of data that used to be drowned out and discover interesting details hidden in the whole. Because we have all or almost all of the data, we can examine it from different perspectives in more detail and more comprehensively. This makes big data analysis a pleasant process of discovery in which the problem domain keeps expanding.

From "accurate data and accurate results" to "messy data and error-tolerant results"

In the era of small data, the amount of available data was relatively small, so we had to record everything as accurately as possible, which drove the optimization of measurement tools. Because of the limits of data processing methods, usable data was basically restricted to structured data that fit traditional databases. And because random sampling was used, the accuracy of the sampling process mattered. This kind of precision is a product of the information-scarce, analog age.

In the big data era, massive data inevitably brings messier data and less exact results. If we remain obsessed with accuracy, we will not be able to cope with this new era. The new insights, trends, and value that come from a larger volume of data outweigh the extra errors caused by messy data, because big data usually speaks in probabilities. Moreover, data can be cleansed before processing to remove part of the erroneous data. Tolerating errors therefore brings us more information than striving to avoid them. The right attitude toward big data is to allow the data to be messy and the results to be inexact. Only by conceding, accepting, and even appreciating inexactness can we see the bright prospects that big data brings, and we should get used to this way of thinking.

From "complex algorithms" to "simple algorithms"

Algorithms are tools for mining the value of data, so algorithm research has always been an important way to improve how efficiently data is used. In the era of small data, when the limits on data could not be broken through, the desire to extract information and value from data drove ever deeper algorithm research and ever more complex algorithms. Experience shows, however, that when data volume grows exponentially, simple algorithms that performed poorly on small data sets become far more accurate. By contrast, complex algorithms that perform best on small amounts of data show little advantage once more data is added. More data therefore matters more than a smarter algorithm: simple algorithms on big data are more effective than complex algorithms on small data.

From "why" to "what"

In the era of small data, limited data availability and computing power meant that our research had to start from assumptions, verify them, and explore the "why". However, analysis that begins with assumptions is very vulnerable to bias.

In the big data era, the rapid development of a group of technologies, including data storage, data transmission, data acquisition, and data processing, gives our research new perspectives and valuable predictions. It also reveals connections and developments that went unnoticed before. Exploring the "what" is becoming a more convenient way for us to discover and understand the world, one that is not affected by the bias of prior assumptions.

From "causal relationship" to "correlation"

In the era of small data, the lack of information made us inclined to use the causal paradigm to understand problems and make decisions quickly. The causal relationship may not actually exist, but it is a shortcut for understanding and explaining the world. When the limits of human ability are exposed, this cognitive shortcut often gives us a sense of comfort and security, as if everything in the world has a cause.

In the big data era, research on data is no longer constrained by the search for causal relationships, and we now have every means to shift toward exploring correlations, whether things are associated or not. There are countless classic cases like beer and diapers. Massive data is constantly being created, and our capacity to collect, store, transmit, and process it keeps growing. This is a defining feature of the big data era. With modern means such as the Internet and cloud computing, we find high degrees of correlation between seemingly unrelated things, something beyond the reach of traditional causal analysis and logical reasoning.
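A correlation of the beer-and-diapers kind can be measured with the Pearson correlation coefficient. The following is a minimal sketch with made-up data: the ten shopping baskets and the `bread` column are hypothetical and only show the mechanics of the calculation, not any real retail finding.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical shopping baskets: 1 means the basket contains the item.
beer = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
diapers = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]
bread = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

r_beer_diapers = pearson(beer, diapers)
r_beer_bread = pearson(beer, bread)

print(f"beer vs diapers: {r_beer_diapers:.2f}")
print(f"beer vs bread:   {r_beer_bread:.2f}")
```

On this toy data the coefficient for beer and diapers is 0.6, against 0.2 for beer and bread. The number says only that the two items tend to appear together. It says nothing about why.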

Of course, correlation is not the final goal of big data insight. In many cases, once we have completed the correlation analysis and are no longer satisfied with knowing the "what", we go on to study causal relationships and look for the "why". Searching for causal relationships on the basis of correlation analysis greatly reduces the cost of that analysis. In fact, a causal relationship is a special kind of correlation.

From "prudent decision-making and action" to "quick decision-making and action"

In the era of small data, we started from assumptions about how society operates and used data collection and analysis to verify them. If testing showed that an assumption did not hold, we had to form a new assumption and collect and analyze new data, repeating until the verification passed. In the era of small data, therefore, our decisions and actions were prudent.

In the big data era, we are no longer limited by traditional thinking models and implicit assumptions. With big data analysis tools and theories, big data presents us with profound new insights and releases great value. We explore the world under the guidance of big data and are no longer bound by assumptions. This lets us receive insights from data at any time with an open attitude and make quick decisions and take quick action, because opportunities and value are quickly superseded. The value of big data lies in delivering timely information to the people who need it so that they can decide and act in time. In the foreseeable future, data will be everywhere.

In fact, we are only standing at the starting point of a long process.
