Big Data Arrives
Many years ago, the industry was already debating a question: how do we deal with massive data? The question was most pressing in industries that store large volumes of user data, such as finance, telecommunications, and insurance. Users in these industries can generate large amounts of data at almost any hour of the day, and storage systems must record all of it meticulously to prevent loss. These systems must also keep backups, including off-site disaster-recovery backups, and business interruptions must not exceed a defined time window; anything longer counts as a major incident. Business continuity, in short, must be guaranteed by the IT systems.
However, once people recognized that the data in their systems falls into three types, structured, unstructured, and semi-structured, the problem no longer looked so simple. If the challenge were only a large amount of structured data, the solution would be relatively straightforward: buy more storage equipment and improve the efficiency of the storage devices. 
But when complex data types surge, the impact on a user's IT system demands a different response. Through market research, many industry experts and third-party analysts concluded that the big data era was arriving.
Of this data, 85% is unstructured, coming from social networks, the IoT, e-commerce, and similar sources. Such unstructured data is typically generated alongside the emergence and adoption of new channels and technologies such as social networks, mobile computing, and sensors.
The concept of big data, like cloud computing, is surrounded by plenty of hype and uncertainty. To cut through it, we consulted a number of analysts and big data experts about what big data is, what it is not, and what it means for the future of data mining.
Against the backdrop of continued cloud computing adoption and intensifying competition among tablet makers, 2011 was expected to bring more multi-terabyte (1 TB = 1,000 GB) datasets into business intelligence and business analytics. Multi-terabyte datasets are also called big data. According to the market research firm IDC, data usage is expected to grow 44-fold, with global data usage reaching approximately 35.2 ZB (1 ZB = 1 billion TB). The file size of individual datasets will also increase, demanding greater processing power to analyze and understand them.
Storage giant EMC points out that more than 1,000 of its customers store 1 PB (1 PB = 1,000 TB) of data in their arrays, and that this number will grow to 100,000 by 2020. Within a year or two, some customers will begin storing thousands of times more data than that, reaching 1 EB (1 EB = 1 billion GB) or more.
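The unit glosses scattered through these figures are easy to mix up. A quick sanity check of the decimal (SI) byte units the article uses, as a small Python sketch:

```python
# Decimal (SI) byte units, as used in the article's figures.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9,
         "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}

def convert(value: float, src: str, dst: str) -> float:
    """Convert a quantity between decimal byte units."""
    return value * UNITS[src] / UNITS[dst]

print(convert(1, "ZB", "TB"))  # 1 ZB = 1 billion TB -> 1000000000.0
print(convert(1, "EB", "GB"))  # 1 EB = 1 billion GB -> 1000000000.0
print(convert(1, "PB", "TB"))  # 1 PB = 1,000 TB     -> 1000.0
```

Note these are decimal prefixes (powers of 1,000), which is how IDC and EMC quote their figures; binary prefixes (TiB, PiB) based on powers of 1,024 are slightly larger.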
For large companies, the rise of big data stems partly from computing power becoming available at lower cost, with systems now capable of multitasking. Second, the cost of memory has plummeted, so businesses can process more data in memory than ever before. Third, it keeps getting easier to aggregate computers into server clusters. Carl Olofson, IDC's database management analyst, believes the combination of these three factors spawned big data.
IDC holds that for a technology to qualify as a big data technology, it must first be affordable, and second it must meet at least two of the three "V" criteria described by IBM: variety, volume, and velocity.
Variety means the data spans both structured and unstructured types.
Volume means the amount of data aggregated for analysis must be very large.
Velocity means the data must be processed quickly.
"Big data doesn't always mean hundreds of terabytes," says Olofson.
Depending on actual usage, even a few hundred gigabytes of data can count as big data; it depends mainly on the third dimension, velocity, or the time dimension.
If I can analyze 300 GB of data in 1 second when it would normally take 1 hour, that dramatic change adds enormous value. A big data technology, then, is an affordable one that meets at least two of these three criteria.
Global information is growing at an annual rate of more than 59%. While volume is a significant challenge in managing that data, business and IT leaders must also focus on its variety and velocity.
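A 59% annual growth rate compounds quickly; a short calculation shows how fast volumes double at that rate:

```python
import math

# 59% annual growth in global information, per the figure above.
annual_growth = 0.59

# Doubling time under compound growth: solve (1 + r)^t = 2 for t.
doubling_time = math.log(2) / math.log(1 + annual_growth)

print(f"Data volume doubles roughly every {doubling_time:.1f} years")
# -> roughly every 1.5 years
```

In other words, an organization at that growth rate must plan for its storage and analysis capacity to double about every eighteen months.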
Volume: The growth of data within enterprise systems is driven by transaction volumes, other traditional data types, and new data types. Too much data is a storage problem, but too much data is also an analysis problem.
Variety: IT leaders have long struggled to turn large volumes of transactional information into decisions, and now there are more types of information to analyze, mainly from social media and mobile devices (situational awareness). Variety covers tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock quotes, financial transactions, and more.
Velocity: This concerns data streams, the creation of structured records, and the availability of data for access and delivery. Velocity means how fast data is generated and how fast it must be processed to meet demand.
While big data is a big issue, Gartner analysts say the real problem is making big data meaningful: finding patterns in it that help organizations make better business decisions.
A Unisphere Research survey of 531 independent Oracle users found that data is growing rapidly at 90% of enterprises, with 16% seeing annual growth of 50% or more. The same survey found that 87% of respondents blamed their company's application performance problems on growing data volumes.
How the experts define big data
Although "big data" is sometimes used interchangeably with "massive data," the two are not the same thing.
Definition One: Big data = massive data + complex data types
But Bin, Informatica's chief product consultant in China, said that big data includes massive data but goes beyond it. In short, big data is massive data plus complex data types.
But Bin further pointed out that big data covers both transactional and interactive datasets whose size or complexity exceeds the ability of commonly used technologies to capture, manage, and process them at reasonable cost and within reasonable time.
Big data is shaped by three key technology trends:
Massive transaction data: Traditional relational data, along with unstructured and semi-structured information, continues to grow in online transaction processing (OLTP) and analysis systems, from ERP applications to data warehouse applications. The situation becomes more complex as businesses move more of their data and business processes to public and private clouds.
Massive interactive data: This force consists of social media data from Facebook, Twitter, LinkedIn, and other sources. It includes call detail records (CDRs), device and sensor information, GPS and geolocation mapping data, massive image files transmitted through managed file transfer protocols, web text and clickstream data, scientific information, e-mail, and so on.
Massive data processing: The rise of big data has spawned architectures designed for data-intensive processing, such as Apache Hadoop, which is open source and runs on commodity hardware clusters. The challenge for businesses is to access data from Hadoop quickly, reliably, and cost-effectively.
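Hadoop's programming model, MapReduce, can be illustrated without Hadoop itself. Below is a minimal pure-Python simulation of the map, shuffle, and reduce phases for a word count, the canonical example; this is a sketch of the model, not the actual Hadoop API:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

records = ["big data big analytics", "data moves fast"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
# -> {'big': 2, 'data': 2, 'analytics': 1, 'moves': 1, 'fast': 1}
```

In a real cluster, the map and reduce functions run in parallel across many commodity machines and the shuffle moves data over the network, which is exactly the data-intensive architecture the trend above describes.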
Definition Two: Big data comprises three elements, A, B, and C
How should we understand big data? Chen, general manager of NetApp Greater China, argues that big data means doing things differently by breaking through to faster access to information. Big data is defined as large volumes of (usually unstructured) data that require us to rethink how we store, manage, and recover data. How big is big? One way to think about it: the data is so large that none of the tools we use today can handle it, so the key question becomes how to digest the data and transform it into valuable insight and information.
Based on the workload requirements learned from its customers, NetApp understands big data to include three elements, A, B, and C: analytics, bandwidth, and content.
1. Big analytics, helping to gain insight: the requirement for real-time analysis of huge datasets, bringing new business models, better customer service, and better outcomes.
2. Big bandwidth, helping to move data faster: the requirement to handle critical data at extremely high speeds, enabling fast, efficient ingestion and processing of large datasets.
3. Big content, losing no information: highly scalable data storage with strong security requirements and easy recovery. It supports manageable repositories of information content, not merely data that has sat in storage for a long time, and it can span continents.
Big data is a groundbreaking economic and technological force that introduces new infrastructure to IT. Big data solutions remove traditional computing and storage limitations. With private and public data continuing to grow, an epoch-making new business model is emerging, one that promises substantial new revenue growth and competitive advantage to big data customers.