We have all heard predictions like these: by 2020, the amount of data stored electronically worldwide will reach 35 ZB, 40 times the world's total in 2009. According to IDC, by the end of 2010 global data volume had already reached 1.2 million PB, or 1.2 ZB. If you burned all of that data onto DVDs, the stack would reach from the Earth to the moon and back (about 240,000 miles each way).
For habitual worriers, a number that large may seem ominous, a sign that the end of the world is near. For optimists, these numbers are an information gold mine, and as technology advances, the wealth they contain becomes easier and easier to extract.
With the arrival of the "big data" era, a wave of emerging data mining technologies has made storing, processing, and analyzing this wealth of data cheaper and faster than ever before. Given a suitable supercomputing environment, big data technology can be put to use by a great many enterprises, changing the way business is run across many industries.
Our definition of big data technology is the use of non-traditional data-filtering tools, including but not limited to Hadoop, to mine large collections of structured and unstructured data and produce useful insights.
Like "cloud computing," the concept of big data technology comes with plenty of hype and plenty of uncertainty. To cut through it, we consulted a number of analysts and big data experts to explain what big data technologies are and are not, and what they mean for the future of data mining.
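To make that definition concrete, here is a minimal sketch (not drawn from the article itself) of the kind of job a tool like Hadoop runs: a MapReduce word count that tallies terms across a large collection of unstructured text files. It uses the standard org.apache.hadoop.mapreduce APIs; the class name and input/output paths are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each line of unstructured text into tokens
  // and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word across the whole data set.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

On a cluster, such a job would be launched with something like `hadoop jar wordcount.jar WordCount /input/logs /output/wordcounts` (the jar name and paths are assumptions); the point is that the same small program scales from gigabytes to petabytes simply by adding commodity nodes.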
The background behind big data technology's rise
For large companies, the rise of big data is due partly to computing power becoming available at lower cost, and to systems that are now capable of multitasking. The cost of memory is also plummeting, so businesses can process more data in memory than ever before. And it keeps getting easier to aggregate ordinary computers into server clusters. Carl Olofson, IDC's database management analyst, believes the combination of these three factors gave birth to big data.
"Not only can we do these things well, we can do them at a much lower cost," he said. "In the past, heavy processing systems were built from large supercomputers assembled into tightly coupled clusters, but because they relied on specially designed hardware, they cost hundreds of thousands or even millions of dollars. Now we can get the same computing power from ordinary commodity hardware, which helps us process more data faster and more cheaply."
Of course, not every company with a large data warehouse can claim to be using big data technology. IDC argues that for a technology to count as big data technology, it must first be affordable, and it must then meet at least two of the three "V" criteria described by IBM: variety, volume, and velocity.
Variety means the data includes both structured and unstructured data. Volume means the amount of data aggregated for analysis must be very large. Velocity means the data must be processed quickly. Big data, Olofson says, "does not always mean hundreds of terabytes. Depending on the actual use case, even a few hundred gigabytes can qualify as big data; that depends mainly on the third dimension, velocity, or the time dimension. If I can analyze 300 GB of data in one second when it normally takes an hour, the result of that enormous change adds great value. Big data technology is any affordable application that meets at least two of these three criteria."
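For illustration only, the sketch below encodes that rule of thumb, affordability plus at least two of the three Vs, along with Olofson's 300 GB example (one second instead of an hour is a 3,600x speedup). The class, method names, and numeric thresholds are assumptions introduced for this example, not figures from IDC or IBM.

```java
// Hypothetical sketch of the "affordable + at least two of three Vs" test.
public class BigDataCriteria {

    // Variety: the workload mixes structured and unstructured sources.
    static boolean hasVariety(boolean structured, boolean unstructured) {
        return structured && unstructured;
    }

    // Volume: the data set aggregated for analysis is very large.
    // The 100 TB cutoff is an assumed threshold, not an official one.
    static boolean hasVolume(double dataSizeGB) {
        return dataSizeGB >= 100_000;
    }

    // Velocity: processing is dramatically faster than the usual baseline.
    // The 100x speedup cutoff is likewise an assumption.
    static boolean hasVelocity(double baselineSeconds, double actualSeconds) {
        return baselineSeconds / actualSeconds >= 100;
    }

    static boolean isBigData(boolean affordable, boolean variety,
                             boolean volume, boolean velocity) {
        int met = (variety ? 1 : 0) + (volume ? 1 : 0) + (velocity ? 1 : 0);
        return affordable && met >= 2;
    }

    public static void main(String[] args) {
        // Olofson's example: 300 GB analyzed in 1 second instead of 1 hour.
        boolean variety  = hasVariety(true, true);
        boolean volume   = hasVolume(300);        // only 300 GB: volume not met
        boolean velocity = hasVelocity(3600, 1);  // 3,600x faster than the usual hour
        System.out.println("Qualifies as big data: "
                + isBigData(true, variety, volume, velocity)); // true: 2 of 3 Vs met
    }
}
```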