The difference between "big data" and "massive data"

Source: Internet
Author: User
Keywords: big data, massive data, complex data types

If the problem were only a large amount of structured data, the solution would be relatively simple: users could buy more storage devices, improve their efficiency, and so on. However, once people realized that data in their systems actually falls into three types, structured, unstructured, and semi-structured, the problem no longer looked so simple.
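As a concrete illustration of the three types (the record below is hypothetical), the same piece of information can appear in all three forms:

```python
import json

# Structured: fixed schema, every record has the same columns (as in a SQL table).
structured_row = {"order_id": 1001, "customer": "Alice", "amount": 25.50}

# Semi-structured: self-describing and nested; fields may vary per record (e.g. JSON).
semi_structured = json.loads(
    '{"order_id": 1001, "customer": {"name": "Alice", "tags": ["vip"]}, "notes": null}'
)

# Unstructured: free text with no predefined schema; meaning must be extracted.
unstructured = "Alice ordered item 1001 on Tuesday and paid $25.50 by card."

print(sorted(structured_row))               # fixed set of column names
print(semi_structured["customer"]["name"])  # nested, per-record schema
print("Alice" in unstructured)              # requires text analysis to interpret
```

Structured data fits existing relational tools directly; the other two forms are what pushed storage and analysis beyond the "just buy more disks" approach.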


Big data arrives


When complex data types surge, the impact on users' IT systems has to be handled in a different way. Through market research, many industry experts and third-party analysts have concluded that the big data era is coming. Surveys found that 85% of this complex data is unstructured data from social networks, the Internet of Things, e-commerce, and similar sources. The generation of this unstructured data typically accompanies the emergence of new channels and technologies such as social networking, mobile computing, and sensors.


The concept of big data currently carries a great deal of hype and uncertainty. For this reason, we asked a number of industry experts about the relevant issues: what big data is and is not, and how to deal with it. Their answers are presented in this series of articles.


Multi-terabyte datasets are also referred to as "big data." According to market research firm IDC, data usage is expected to grow 44-fold, with global data usage reaching roughly 35.2 ZB (1 ZB = 1 billion TB). At the same time, the file size of individual datasets will also increase, demanding greater processing power to analyze and understand them.


EMC has said that more than 1,000 of its customers store over 1 PB (petabyte, or 1,000 TB) of data in their arrays, and that this number will grow to 100,000 by 2020. Within a year or two, some customers will begin handling a thousand times more data than that: 1 EB (1 EB = 1 billion GB) or more.
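The unit conversions quoted above can be checked directly (using decimal SI prefixes):

```python
# Storage units with decimal (SI) prefixes: each step is a factor of 1,000.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
PB, EB, ZB = 10**15, 10**18, 10**21

print(ZB // TB)  # 1 ZB = 1 billion TB
print(EB // GB)  # 1 EB = 1 billion GB
print(PB // TB)  # 1 PB = 1,000 TB
```

(Operating systems sometimes report sizes in binary units instead, where 1 TiB = 1024^4 bytes, but industry forecasts like IDC's use the decimal prefixes shown here.)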


For large companies, the rise of big data is partly because computing power has become available at lower cost and systems are now capable of multitasking. Second, memory costs have plummeted, so businesses can process more data in memory than ever before. Third, it has become simpler to aggregate computers into server clusters. IDC believes the combination of these three factors has spawned big data. IDC also says that for a technology to qualify as a big data technology, it must first be affordable, and second meet the three "V" criteria described by IBM: variety, volume, and velocity.


Variety means the data includes both structured and unstructured data.


Volume means the amount of data aggregated together for analysis must be very large.


Velocity means the data must be processed quickly.


"Big data" does not always mean hundreds of terabytes. Depending on the actual use case, hundreds of gigabytes of data can sometimes also be called big data; this depends mainly on the third dimension, velocity, i.e., the time dimension.


Gartner says global information volume is growing at an annual rate of more than 59%. While volume is a significant challenge in managing this data, business and IT leaders must also focus on the variety and velocity of information.


Volume: The growth of data within enterprise systems is driven by transaction volumes, other traditional data types, and new data types. Too much data is a storage problem, but it is also an analysis problem.


Variety: IT leaders have long struggled to turn large amounts of transactional information into decisions; now there are even more types of information to analyze, mainly from social media and mobile devices (situational awareness). Variety includes tabular data (databases), hierarchical data, files, e-mail, metering data, video, still images, audio, stock quotes, financial transactions, and more.


Velocity: This involves data streams, the creation of structured records, and the availability of access and delivery. Velocity means both how fast data is generated and how fast it must be processed to meet demand.


While big data is a big issue, Gartner analysts say the real problem is making big data meaningful, and finding patterns in it that help organizations make better business decisions.


Competing views on how to define "big data"


Although "big data" can be translated as either large data or massive data, there is a difference between big data and massive data.


Definition one: Big data = massive data + complex data types


Dan Bin, chief product consultant of Informatica China, said: "Big data" encompasses the meaning of "massive data" and goes beyond it in content. In short, big data is massive data plus complex data types.


Dan Bin further pointed out that big data includes both transactional and interactive datasets whose size or complexity exceeds the ability of common technologies to capture, manage, and process them at reasonable cost and within reasonable time.


Big data is driven by three key technology trends:


Massive transactional data: Traditional relational data, along with unstructured and semi-structured information, continues to grow in online transaction processing (OLTP) and analytical systems, from ERP applications to data warehouse applications. The situation becomes more complex as businesses move more of their data and business processes into public and private clouds.


Massive interactive data: This new force consists of social media data from Facebook, Twitter, LinkedIn, and other sources. It includes call detail records (CDRs), device and sensor information, GPS and geo-location mapping data, large image files transmitted via managed file transfer protocols, web text and clickstream data, scientific information, e-mail, and so on.


Massive data processing: The rise of big data has spawned architectures designed for data-intensive processing, such as the open-source Apache Hadoop, which runs on clusters of commodity hardware. The challenge for businesses is to access data from Hadoop quickly, reliably, and cost-effectively.
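The data-intensive processing model Hadoop popularized is MapReduce. The following is a minimal single-process sketch in plain Python, not Hadoop's actual Java API: Hadoop distributes the same map, shuffle, and reduce steps across a commodity cluster, but the logical flow is the one shown here.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (key, value) pair for each word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final result (here, a count).
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(l) for l in lines)))
print(counts["big"], counts["data"], counts["is"])  # 2 2 2
```

Because each map call sees only one line and each reduce call sees only one key's values, both phases can be spread across many machines; that independence is what lets the model scale to the dataset sizes discussed above.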


Definition two: Big data comprises three elements, A, B, and C


How should we understand big data? Chen, general manager of NetApp Greater China, argues that big data means doing things differently and breaking through to faster access to information. Big data is defined as large volumes of data (usually unstructured) that require us to rethink how we store, manage, and retrieve data. So how big is big? One way to think about it is that the data is so large that none of the tools we use today can handle it; the key, then, is how to digest the data and transform it into valuable insight and information.


Based on workload requirements learned from its customers, NetApp understands big data as comprising three elements, A, B, and C: analytics, bandwidth, and content.


1. Big analytics, which helps gain insight: the requirement for real-time analysis of huge datasets, enabling new business models, better customer service, and better outcomes.


2. Big bandwidth, which helps move data faster: the requirement to handle critical data at extremely high speeds, enabling fast and efficient ingestion and processing of large datasets.


3. Big content, which loses no information: highly scalable data storage with strong security requirements and easy recovery. It supports manageable repositories of information content, not merely long-retained data, and can span continents.


Big data is a groundbreaking economic and technological force that introduces new infrastructure for IT to support. Big data solutions eliminate traditional computing and storage limitations. Driven by growing private and public data, an epoch-making new business model is emerging that promises substantial new revenue growth and competitive advantage for big data customers.
