What is big data? Huawei Cloud Academy takes you on a journey to big data
Let's start our big data learning journey with the question of what big data actually is. This lesson covers the emergence of big data and its basic concepts. First, let's go back to the origin and development of big data, which has gone through three stages.
The first stage is what we call the budding period. From the 1990s to the beginning of this century, as data mining theory and database technology gradually matured, a group of business intelligence tools and knowledge management techniques came into use, such as data warehouses, expert systems, and knowledge management systems.

The second stage is what we call the maturity period. In the first decade of this century, Web 2.0 applications developed rapidly, and the massive generation of unstructured data overwhelmed traditional processing methods. Big data technology made rapid breakthroughs, and big data solutions matured, forming two core technologies: parallel computing and distributed systems. Google's big data technologies such as GFS and MapReduce attracted wide attention, and the open-source Hadoop platform began to gain popularity.

The third stage is what we call the large-scale application period. After 2010, big data began to be widely applied in all walks of life. People began to use data to drive decision-making, and the degree of intelligence in society improved greatly.
In short, the development of big data has gone through three stages: budding, maturity, and large-scale application.
Now that we understand the history of big data, what exactly is big data? The concept of big data has even become a business topic and has been widely covered in business publications. For example, Forbes reported that big data had arrived in Dorset's healthcare system, where analysis tools help more than 2 million patients with complex conditions every year. The New York Times pointed out that data has become a class of economic asset, like currency or gold. And CNBC offered this analogy: data is like the new oil, of no value until extracted, but after processing and refining it will greatly advance the world's development. So how exactly do we define big data? In fact, big data has no clear, unified definition so far; different organizations describe it differently. McKinsey holds that big data refers to data sets whose size exceeds the capability of typical database software tools to capture, store, manage, and analyze, and puts the general range of big data at a few terabytes to several petabytes. Wikipedia defines big data as a collection of data so large and complex that it cannot be captured, managed, and processed with conventional software tools within an acceptable time.
The U.S. National Institute of Standards and Technology defines big data as data whose volume, velocity, or variety makes it difficult to analyze effectively with traditional relational methods, or which requires large-scale horizontal scaling for effective processing. Gartner holds that big data is high-volume, high-velocity, and diverse information assets that require efficient and innovative forms of information processing to enhance insight, decision-making, and process optimization. As you can see, there is no unified conclusion on the definition of big data, but whatever the description, the defining characteristics of big data are universally acknowledged. What are these characteristics? They are commonly summarized as the four Vs. The first V is Volume, the capacity, referring mainly to the scale and growth rate of unstructured data: unstructured data accounts for 80% to 90% of the total data volume, grows 10 to 50 times faster than structured data, and is 10 to 50 times the size of a traditional data warehouse. The second V is Variety, diversity, referring mainly to the heterogeneity and diversity of big data: data comes in many different forms, such as text, images, video, and machine data, and most of it is schema-less or has no obvious schema. The third V is Value: big data contains a large amount of irrelevant information and has low value density, so deep and complex analysis is needed before future trends and patterns can be predicted. The fourth V is Velocity, timeliness, embodied mainly in real-time analysis and real-time presentation of results.
OK, now let's look at these characteristics in detail. The first V, Volume: the data volume is huge, growing from the TB level to the PB level. So far, the data contained in all human printed material amounts to about 200 PB. A typical personal computer hard disk today is on the TB scale, while the data volume of some large enterprises is already approaching the EB level. What does that mean? Let's look at some vivid examples of capacity units. One PB equals 1,024 TB, equivalent to half the total content of all U.S. academic research library collections. One EB equals 1,024 PB, and 5 EB is equivalent to all the words ever spoken by mankind. One ZB equals 1,024 EB, about the total amount of sand on all the world's beaches. One YB equals 1,024 ZB, equivalent to the total number of cells in 7,000 human bodies. As you can see, the volume of big data really is huge. The second V is Variety: big data is diversified, and the data of every enterprise and every industry is a component of big data.
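As a side note, the capacity units just mentioned form a simple 1,024-fold ladder, which can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the `to_bytes` function and unit list are hypothetical, not part of the course material:

```python
# Each capacity unit in the lesson is 1,024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a quantity in the given unit to a byte count."""
    return int(value * 1024 ** UNITS.index(unit))

# 1 PB = 1,024 TB, as described above.
assert to_bytes(1, "PB") == to_bytes(1024, "TB")
# 1 ZB = 1,024 EB = 1,024 * 1,024 PB.
assert to_bytes(1, "ZB") == to_bytes(1, "PB") * 1024 ** 2
```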
The diversity of big data is mainly reflected in the following aspects. First, data comes from many sources: the multiple application systems within an enterprise, plus the rise of the Internet and the IoT, which drive data from microblogs, social networks, sensors, and other sources.
Second, there are many kinds of data. The structured data stored in relational databases is in fact only a small fraction; 80% to 90% of data is unstructured or semi-structured, such as audio, video, documents, and connection information. This unstructured data is becoming more important than the easy-to-store, text-centric structured data of the past. At the same time, these multiple data types place higher demands on data processing capabilities.
Third, the correlation between data is strong, with frequent interaction between data sets. For example, the photos and travel logs uploaded by tourists along the way are strongly correlated with their location and itinerary. So big data is reflected not only in its huge volume but also in the diversity of its types. The third V is Value. Low value density is a typical characteristic of big data itself, and tapping the value hidden in big data is like finding a needle in a haystack: the core of big data lies in mining scarce, valuable information from massive amounts of data. So what about the fourth V?
The fourth V is Velocity. The ability to process streaming data is one of the key differences between big data technology and traditional data warehouse and BI/BA technology. For example, there is the well-known one-second rule: a big data application must return results within about one second, otherwise the processing result is outdated or invalid. According to IDC's Digital Universe report, the global data volume was expected to reach 35.2 ZB by 2020. Facing such a huge amount of data, the efficiency of data processing is the lifeline of an enterprise. These, then, are the 4V characteristics of big data.
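To make the Velocity idea concrete, here is a minimal sketch of stream-style processing in plain Python: each value is handled as it arrives and a result is emitted immediately, rather than waiting for the whole batch. The `rolling_average` function and sample readings are hypothetical illustrations, not part of any real big data framework:

```python
from collections import deque

def rolling_average(stream, window_size=3):
    """Emit the average of the most recent `window_size` values
    as each new value arrives (stream-style, not batch-style)."""
    window = deque(maxlen=window_size)  # old values drop out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

# Each output becomes available as soon as its input arrives,
# instead of after the whole data set has been collected.
readings = [10, 20, 30, 40, 50]
print(list(rolling_average(readings)))  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Real stream-processing engines apply the same idea at scale, distributing such windowed computations across many machines to keep latency within the one-second budget.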
Next time we will continue to explore big data. For further video learning, please visit Huawei Cloud Academy (https://edu.huaweicloud.com/)
I'll be waiting for you at Huawei Cloud Academy. See you there!