Years ago, the industry was debating a common question: how do we deal with massive data? The question was especially pressing in sectors that store large volumes of user data, such as finance, telecommunications, and insurance. Users in these industries generate large amounts of data nearly every hour of the day, and storage systems must record it meticulously. As data volumes grew rapidly, many industry users began looking for ways to turn their "numbers" into treasure by mining valuable information from that massive data.
If the problem were only a large amount of structured data, the solution would be relatively simple: buy more storage equipment and improve its efficiency. However, once people realized that data actually falls into three types (structured, unstructured, and semi-structured), the problem no longer looked so simple.
Big data arrives.
As complex data types surge, the impact on users' IT systems demands a different response. Through market research, many industry experts and third-party analysts have concluded that the big data era is coming. Surveys found that 85% of this complex data is unstructured data from social networks, the Internet of Things, e-commerce, and similar sources. The generation of such unstructured data often accompanies the emergence and adoption of new channels and technologies, such as social networking, mobile computing, and sensors.
The concept of big data is currently surrounded by a great deal of hype and uncertainty. To that end, the editors asked several industry experts to discuss what big data is and is not, and how to deal with it; their answers are presented to readers in a series of articles.
Datasets of many terabytes are also referred to as "big data." According to market research firm IDC, data usage is expected to grow 44-fold, with global data usage reaching approximately 35.2 ZB (1 ZB = 1 billion TB). The size of individual datasets will also increase, requiring greater processing power to analyze and understand them.
EMC has said that more than 1,000 of its customers store over 1 PB (petabyte) of data in their arrays, and that this number will grow to 100,000 by 2020. Within a year or two, some customers will also begin storing a thousand times more data than that: 1 EB (1 EB = 1 billion GB) or more.
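The unit conversions quoted above can be checked with a few lines of arithmetic. The sketch below uses decimal (SI) prefixes, which is the convention IDC's figures assume; the 35.2 ZB projection is taken from the text.

```python
# Decimal (SI) storage units, in bytes.
GB = 10**9   # gigabyte
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

# 1 ZB is one billion TB, and 1 EB is one billion GB,
# matching the parentheticals in the article.
print(ZB // TB)  # 1000000000
print(EB // GB)  # 1000000000

# IDC's projected 35.2 ZB of global data, expressed in TB.
print(int(35.2 * ZB) // TB)  # 35200000000
```

The check makes the scale concrete: the projected global total is tens of billions of terabytes, which is why individual petabyte-scale arrays no longer seem unusual.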
For large companies, the rise of big data is partly because computing power has become available at lower cost and systems are now capable of multitasking. Second, the cost of memory has plummeted, so businesses can process more data in memory than ever before, and it has become simpler to aggregate computers into server clusters. IDC believes the combination of these three factors has spawned big data. IDC also says that for a technology to qualify as a big data technology, it must first be affordable, and it must then meet the three "V" criteria described by IBM: variety, volume, and velocity.
Variety means that the data includes both structured and unstructured data.
Volume means that the amount of data aggregated for analysis must be very large.
Velocity means that data processing must be fast.
"Big data" does not always mean hundreds of terabytes. Depending on the actual use case, hundreds of gigabytes of data can sometimes also be called big data; this depends mainly on the third dimension, velocity, that is, the time dimension.
Gartner says global information is growing at an annual rate of more than 59%. While volume is a significant challenge in managing this data, business and IT leaders must also focus on its variety and velocity.
Volume: the increase in the amount of data within enterprise systems is driven by transaction volumes, other traditional data types, and new data types. Too much volume is a storage problem, but too much data is also an analysis problem.
Variety: IT leaders have long struggled to turn large volumes of transactional information into decisions, and there are now even more types of information to analyze, mainly from social media and mobile devices (situational awareness). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock quotes, financial transactions, and more.
Velocity: this involves streams of data, the creation of structured records, and the availability of access and delivery. Velocity means both how fast data is generated and how fast it must be processed to meet demand.
While big data is a big issue, Gartner analysts say the real challenge is making big data meaningful: finding patterns in it that help organizations make better business decisions.