Big Data: Doomsday Omens or Big Business Opportunities?


We have all heard the predictions: by 2020, the amount of data stored electronically worldwide will reach 35ZB, 40 times the world's total in 2009. According to IDC, global data volume had already reached 1.2 million PB, or 1.2ZB, by the end of 2010. Burned onto DVDs, that data would form a stack reaching from the Earth to the moon and back (about 240,000 miles each way).

For those inclined to worry, such enormous numbers may seem ominous, a sign of the end of the world. For optimists, they are an information gold mine, and as technology advances, the wealth they contain becomes ever easier to extract.

As we enter the "big data" era, a number of emerging data mining technologies are making it cheaper and faster than ever to store, process, and analyze this wealth of data. Given a supercomputing-class environment, big data technology can be used by a wide range of enterprises, changing the way many industries run their businesses.

Our definition of big data: the use of non-traditional data-filtering tools, including but not limited to Hadoop, to mine large collections of structured and unstructured data for useful insights.

Like "cloud computing," the concept of big data comes with a great deal of hype and a great deal of uncertainty. To that end, we consulted a number of analysts and big data experts to explain what big data is and is not, and what it means for the future of data mining.

The development background of big data

For large companies, the rise of big data is partly because computing power has become available at lower cost, and systems are now capable of multitasking. Second, the cost of memory is plummeting, so businesses can handle more data in memory than ever before. And it is getting easier to aggregate computers into server clusters. Carl Olofson, IDC's database management analyst, believes the combination of these three factors has spawned big data.

"Not only can we do these things well, we can do them at a lower cost," he said. "In the past, heavy processing ran on large supercomputers or on specially designed hardware built into tightly coupled clusters, which cost hundreds of thousands or even millions of dollars. Now we can get the same computing power from ordinary commodity hardware. This helps us process more data more quickly and cheaply."

Of course, not every company with a large data warehouse can claim to be using big data technology. IDC argues that for a technology to count as big data, it must first be affordable, and second it must meet two of the three "V" criteria described by IBM: variety, volume, and velocity.

Variety means the data should include both structured and unstructured data. Volume means the amount of data aggregated for analysis must be very large. Velocity means the data must be processed quickly. Olofson says big data "is not always a matter of hundreds of terabytes. Depending on the actual usage, sometimes hundreds of gigabytes can also qualify as big data; that depends mainly on the third dimension, velocity, or the time dimension. If I can analyze 300GB of data in 1 second when it usually takes 1 hour, that enormous change adds great value. Big data technology is an affordably priced application that meets at least two of these three criteria."
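Olofson's velocity example is easy to quantify. The figures below come straight from his quote; the code is just illustrative arithmetic, not a benchmark:

```python
# Olofson's example: analyzing 300GB in 1 second instead of 1 hour.
data_gb = 300
fast_seconds = 1
slow_seconds = 60 * 60  # 1 hour

speedup = slow_seconds / fast_seconds      # how many times faster
fast_throughput = data_gb / fast_seconds   # GB per second, fast case
slow_throughput = data_gb / slow_seconds   # GB per second, slow case

print(speedup, fast_throughput, round(slow_throughput, 3))
```

The same workload goes from roughly 0.083 GB/s to 300 GB/s, a 3600x speedup, which is the kind of change in the time dimension Olofson argues creates new value.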

Relationship to open source

"Many people think Hadoop is synonymous with big data, but that is a mistake," Olofson explained. Implementations based on Teradata, MySQL, and some "smart clustering" technologies do not use Hadoop, yet are also considered big data implementations.

As an application environment for big data, Hadoop attracts attention because it is based on MapReduce, a simplified programming environment common in supercomputing circles that originated as a project at Google. Hadoop is a hybrid implementation environment closely tied to various Apache projects, including the HBase database.
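MapReduce itself is a simple idea: a map step emits key-value pairs, and a reduce step aggregates them by key. A minimal single-process sketch of the classic word-count example in Python (illustrating the concept only, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data", "big clusters process big data"]
result = reduce_phase(map_phase(docs))
print(result)
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, with the framework handling the shuffle of pairs between them; the logic per phase, however, stays this simple.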

Software developers have responded with everything from Hadoop to similar advanced technologies, many of them developed in the open source community. "They created a dizzying variety of so-called NoSQL databases, most of them key-value stores optimized for processing power, variety, or sheer size," Olofson said.
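The key-value stores Olofson describes trade SQL's generality for a minimal get/put interface. A toy sketch of the access pattern (illustrative only, not modeled on any particular NoSQL product):

```python
class TinyKV:
    """Toy key-value store illustrating the NoSQL access pattern:
    no schema, no joins -- just fast get/put by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values can be any shape; there is no enforced schema.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = TinyKV()
store.put("user:42", {"name": "Alice", "visits": 7})
```

Real systems add persistence, replication, and partitioning across machines, but the programming model is essentially this: look values up by key, at speed, at scale.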

Open source technology generally lacks commercial support, "so these things have to evolve over time and gradually shed their flaws, which typically takes years." In other words, fledgling big data technology is not yet ready for the general market. At the same time, IDC expects at least three commercial vendors to provide some form of support for Hadoop by the end of the year. Other vendors, such as Datameer, provide analysis tools built on Hadoop components that let businesses develop their own applications. Cloudera and Tableau, for example, already use Hadoop in their products.

Upgrading a relational database

Industry watchers generally agree that big data technology should also be considered when upgrading a relational database management system (RDBMS). "Big data technology applies to situations that call for faster, larger, and cheaper processing," Olofson said. "Teradata, for example, has made its systems cheaper, more scalable, and clusterable."

Others, however, disagree. Marcus Collins, Gartner's data management analyst, said, "Running BI tools against an RDBMS is not really big data. That process has a long history."

So who really uses big data analysis?

A year ago, the major users of big data technology were large web companies, such as Facebook and Yahoo, which needed to analyze clickstream data. But today, "big data technology has moved beyond the web; any business with lots of data to process is a candidate. Banks, utilities, intelligence agencies, and others are all climbing aboard the big data train."

In fact, some big data technologies were first adopted by companies on the cutting edge, such as those building web services driven by social media. Those companies have contributed significantly to big data projects.

In other vertical industries, some companies are realizing that the value of their information-based services is much greater than they had previously imagined, so big data technology has quickly attracted their attention. Coupled with falling hardware and software costs, these companies find themselves in a perfect storm of opportunity for a major business transformation.

TRA, a New York City company, helps TV advertisers gauge the effectiveness of their commercials by comparing the ads a household receives through television and DVRs (digital video recorders) with its bills at retail stores. The company collects data from cable TV DVRs and grocery store membership card programs to make this comparison. TRA's big data system processes the viewing habits of 1.7 million households in seconds; a task of that scale could hardly be accomplished without big data technology. The company deploys Kognitio's WX2 database, which lets it quickly load, describe, and analyze data, collect fine-grained ad-viewing information from DVRs, combine it with point-of-sale details, and generate customized reports.

"Kognitio offers an in-memory solution, so half of our entire existing database can sit in memory, which means that when a customer needs to run a query, the response time is seconds rather than hours or days," said TRA's CEO Mark Lieberman.
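The idea behind the in-memory database Lieberman describes can be sketched with SQLite, whose ":memory:" mode keeps the entire database in RAM so queries never touch disk. This is only an illustration of the concept, not Kognitio's product, and the table and data are invented:

```python
import sqlite3

# ':memory:' creates a database that lives entirely in RAM,
# so queries avoid disk I/O altogether.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE viewings (household INTEGER, ad TEXT)")
conn.executemany(
    "INSERT INTO viewings VALUES (?, ?)",
    [(1, "ad_a"), (1, "ad_b"), (2, "ad_a")],
)

# How many households saw ad_a? Answered without any disk reads.
count = conn.execute(
    "SELECT COUNT(*) FROM viewings WHERE ad = 'ad_a'"
).fetchone()[0]
print(count)
```

At TRA's scale the same principle applies across a clustered, distributed memory pool rather than a single process, but the payoff is identical: query latency drops from the speed of disk to the speed of RAM.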

The database runs on ordinary hardware, and TRA's own front-end application is built on .NET with Visual Studio. "We also use a little MySQL, and the user interface is developed with DevExpress," Lieberman said.

In his view, big data technology could revolutionize the $70 billion U.S. TV advertising market. The traditional method of evaluating advertising could only analyze sampled data from special set-top boxes installed in 20,000 sample households nationwide. Today, big data technology can analyze actual data from 2.5 million DVRs and set-top boxes.

Greg Belkin, an analyst with Aberdeen Group, believes the big data tools used by TRA and other companies meet the velocity, volume, and variety criteria of big data. "In retailing, big data is very impressive because the industry has a great deal of data to analyze that would be unthinkable to handle by traditional means," such as data from social media sites, DVR devices, and grocery store membership cards. "The industry's data stores are so huge and complex that analyzing them with traditional database tools is impossible, so retailers are turning to big data platforms."

Similarly, big data technology has transformed Catalina Marketing of St. Petersburg, Florida. The company maintains a large database of member customers, 2.5PB in size, including years of historical sales data covering 190 million U.S. grocery shoppers. Its largest database holds an astonishing 425 million rows of data, and the company must manage about 625 million rows in it every day.

By analyzing the data, Catalina can help some major consumer-goods manufacturers and large supermarket chains predict what consumers might buy and who will be interested in new products.

"We want to bring the technology to the data, not bring the data to the technology," said Catalina's executive vice president and CIO Eric Williams. "Some of the existing technology lets SAS run its analytics inside the database. That has greatly changed our entire business. We were doing these things before, but severe technical constraints kept us from reaching the goals we wanted. We had to rely on tools we developed ourselves, and what those tools could achieve was very limited. The advent of big data technology has revolutionized our entire enterprise."

In addition to using some open source software in its proprietary systems, Catalina runs SAS analysis tools on the Netezza data warehouse appliance platform.

Big data is fundamentally changing the way America's banks do business. Abhishek Mehta, a former executive in charge of big data and analytics at Bank of America, said at the Hadoop World conference in October 2010: "I think today's Hadoop is like Linux 20 years ago. We have all seen Linux's success in the enterprise software market. Hadoop will achieve the same success. It is only a matter of time."

In addition to analyzing clickstream and transaction data, Hadoop lets Bank of America tackle a variety of business problems quickly. "The first thing I think of as a bank is how to root out customer fraud," Mehta said. "Now I can build a model for each customer that backtracks over every fraud event of the past 5 years. Before, we had to take a sampling approach, build a single model, and rework it whenever a particular case turned out not to fit. Those days are finally over."
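A per-customer model of the kind Mehta describes can be as simple as scoring each new transaction against that customer's own history rather than a population sample. A hypothetical sketch (the threshold rule and the data are invented for illustration; real fraud models are far more sophisticated):

```python
from statistics import mean, stdev

def is_suspicious(history, amount, k=3.0):
    """Flag a transaction that is more than `k` standard deviations
    above this customer's own historical mean spend."""
    m, s = mean(history), stdev(history)
    return amount > m + k * s

# One customer's past transaction amounts (invented data).
history = [20.0, 25.0, 22.0, 24.0, 21.0]

print(is_suspicious(history, 480.0))  # far outside this customer's pattern
print(is_suspicious(history, 23.0))   # ordinary for this customer
```

The point of the big data approach is that this per-customer scoring can be run against every customer's full multi-year history at once, instead of fitting one model to a sample and patching it when cases don't fit.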

The utilities industry is just beginning to understand the application and value of big data. A power company in the Midwest uses Hadoop to analyze data from smart meters. The meters automate billing, but they also capture arbitrary current fluctuations on transmission lines. "If you collect this information and can chart the current changes, you can spot a transformer somewhere before it fails," Olofson said. "Or when a power outage occurs, the company can detect the fluctuation and act before the user calls for help."
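Spotting a failing transformer from current fluctuations is, at its core, anomaly detection on a stream of meter readings. A hypothetical sketch (the window size, threshold, and readings are all invented for illustration):

```python
def detect_fluctuation(readings, window=3, limit=5.0):
    """Return indices where a reading jumps more than `limit` amps
    away from the average of the preceding `window` readings."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if abs(readings[i] - baseline) > limit:
            alerts.append(i)
    return alerts

# Invented current readings: steady, then a sudden spike at index 4.
readings = [10.1, 10.0, 10.2, 10.1, 17.5, 10.2]
print(detect_fluctuation(readings))
```

Run across millions of meters, a Hadoop job can apply exactly this kind of per-line check in bulk, which is what makes it possible to act on a fluctuation before the customer calls.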

Olofson predicts that at some point in the future, power companies will be able to use big data technology to improve customer service and reduce operating costs through grid monitoring, problem detection, and fine-tuning of the grid, though this may require major upgrades to some aging infrastructure.

Some brand marketing companies are also using Hadoop to experiment with so-called "sentiment analysis" in social media. These service providers use Hadoop to scrutinize customers' behavior on Twitter and see what they say and think about a particular product.
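Sentiment analysis at its simplest is a lexicon lookup: count positive and negative words per message and score the difference. A deliberately naive sketch (real systems use far richer models; the word lists here are invented):

```python
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "broken", "awful"}

def sentiment(tweet):
    """Naive lexicon score: +1 per positive word, -1 per negative word."""
    words = tweet.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

print(sentiment("I love this great product"))
print(sentiment("awful and broken"))
```

Hadoop's role in such a pipeline is scale, not cleverness: a trivially simple scorer like this, mapped over millions of tweets, already yields a usable picture of how a product is being discussed.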

Act cautiously

Big data technology is developing rapidly. Some of the companies already using it have highly skilled IT professionals who can keep pace with both the technology's progress and the needs of the enterprise.

"If you don't have the resources to deploy big data yourself, consider choosing a service provider, perhaps a cloud provider, or wait for the technology to mature to the point where plenty of polished software products and supporting services are available," Olofson said.

There is no doubt that the field of data mining has changed radically. But analysts say big data technologies will not completely replace today's data warehousing and data-mining tools.

"Existing data mining does not really involve very large data sets, so you need to build relatively complex analysis models," says Gartner's Collins. "Now big data gives companies very large volumes of data, which means they no longer need to build such complex models. The way data mining is done will therefore change significantly."

"My view is that big data actually expands the market for the data warehouse," Olofson said. "Companies use technologies such as MapReduce, whether Hadoop or some commercial extension, to generate interesting business intelligence data that was not available before. Then, to reuse that data and track it historically, they put it into the data warehouse, which actually expands data warehouse usage."

The scale of big data presents another challenge, Collins said: "There is no mature architectural model for deploying and using big data technology, so we have to learn as we go."

But Collins believes some of the risk inherent in big data technology is being eliminated, because a number of pre-packaged tools are now available, though the technology still feels very much like a raw programming interface, a throwback to the early days of business intelligence. "Hadoop is a highly technical system, but business intelligence, pushed by vendors over the years, gradually made its way into the enterprise and onto the desktop with very good user interfaces. Using Hadoop today is a step backward in approachability, but some of the emerging vendors will help push it out to the user community that needs it."

"Big data technology also needs to make a bit of a leap: we have to put these tools in the hands of business users, and we can't do that yet," Collins added.

Three misconceptions about big data technology

There is a lot of confusion in the industry about what big data is and what it can do. Here are three common misconceptions about big data:

1. Relational databases cannot scale to very large volumes, so they do not count as big data technology.

2. Hadoop, or any MapReduce environment, is the best choice for big data regardless of the actual workload or usage environment.

3. The era of the relational database management system is over; from now on, real development can happen only in big data deployments.

(Responsible editor: Lu Guang)
