Big Data: Storage technology must keep up

Last Update:2014-12-18 Source: Internet

Author: User

"Big data" usually refers to data sets that are huge, difficult to collect, process, analyze, and those that are kept in the traditional infrastructure for long periods of time. The "big" here has several meanings, it can describe the size of the organization, and more importantly, it defines the size of the IT infrastructure in the enterprise. The industry has an infinite expectation of large data applications. The more value the business information accumulates, the more it's worth, but we need a way to dig it out.

People's impression of big data may come mainly from how cheap storage capacity has become, but in fact businesses create large and growing amounts of data every day, and people are trying to extract valuable business intelligence from it. Users also keep data that has already been analyzed, because old data can be compared with new data collected in the future and therefore still has potential value.

Why do we need big data? Why now?

In addition to being able to store more data than ever before, we face more types of data. Sources include online transactions, social networking, automated sensors, mobile devices, and scientific instruments, among others. Beyond these fixed sources of data, various transactions also speed up the accumulation of data. For example, the explosive growth of social multimedia data stems from new online transactions and record-keeping practices. Data keeps growing, but the ability to store huge amounts of it is not enough by itself, because storage alone does not guarantee that we can extract business value from the data.

Data is an important factor of production

In the information age, data has become an important factor of production, like capital, labor, and raw materials. As a general need, it is no longer limited to applications in a few specialized industries. Companies in every industry collect data and use the results of analyzing it to reduce costs, improve product quality, raise production efficiency, and create new products. For example, analyzing data collected directly from product test sites can help a company improve its designs. A company can also get ahead of its rivals by analyzing customer behavior in depth and comparing it against large amounts of market data.

Storage technology must keep up

The explosive growth of big data applications has produced its own distinctive architectures and has directly driven the development of storage, networking, and computing technology. After all, the particular demands of handling big data are a new challenge. Hardware development is ultimately driven by software requirements, and in this case the requirements of big data analytics applications are clearly shaping the development of data storage infrastructure.

On the other hand, this change is also an opportunity for storage vendors and other IT infrastructure vendors. As structured and unstructured data keep growing and data sources diversify, earlier storage system designs can no longer meet the needs of big data applications. Storage vendors have recognized this and have begun to modify the architecture of block-based and file-based storage systems to accommodate the new requirements. Below we discuss the attributes that matter for big data storage infrastructure and how they address the challenges of big data.

Capacity Problem

"Large capacity" here usually means data at the petabyte (PB) scale, so a mass storage system must be able to expand to a corresponding level. Expansion must also be simple: capacity should grow by adding modules or disk shelves, ideally without downtime. For this reason, customers increasingly favor storage with a scale-out architecture. In a scale-out cluster, each node provides a certain amount of storage capacity along with its own data processing capability and interconnect. Unlike the siloed ("chimney-style") architecture of traditional storage systems, a scale-out architecture can expand seamlessly and smoothly and avoids islands of storage.
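
One way to picture how a scale-out cluster grows without a large data reshuffle is consistent hashing. The article does not name a placement technique, so this is only a sketch under that assumption; the class `ScaleOutCluster`, the node names, and the 100 TB node size are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (illustrative helper)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ScaleOutCluster:
    """Toy model of a scale-out cluster; all names here are hypothetical."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes      # virtual points per node on the hash ring
        self.ring = []            # sorted list of (hash, node_name)
        self.capacity_tb = {}     # node_name -> capacity in TB

    def add_node(self, name: str, capacity_tb: int) -> None:
        """Adding a node adds capacity and takes over part of the key space."""
        self.capacity_tb[name] = capacity_tb
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{name}#{i}"), name))

    def total_capacity_tb(self) -> int:
        return sum(self.capacity_tb.values())

    def locate(self, key: str) -> str:
        """Return the node that owns a key: the next point clockwise on the ring."""
        idx = bisect.bisect(self.ring, (stable_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]


cluster = ScaleOutCluster()
for name in ("node-1", "node-2", "node-3"):
    cluster.add_node(name, capacity_tb=100)

keys = [f"object-{i}" for i in range(10_000)]
before = {k: cluster.locate(k) for k in keys}

cluster.add_node("node-4", capacity_tb=100)   # expand while "online"
after = {k: cluster.locate(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(cluster.total_capacity_tb())            # 400
print(len(moved) / len(keys))                 # roughly a quarter of the keys
```

In this sketch, every key that changes owner moves to the new node, and the other three nodes keep the rest of their data in place. That is the property that makes adding a node cheap compared with rebalancing everything.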

"Big Data" applications, in addition to the sheer scale of the data, mean a large number of files. Therefore, how to manage the accumulated metadata of the filesystem layer is a difficult problem, and improper handling can affect the scalability and performance of the system, which is the bottleneck of traditional NAS systems. Fortunately, there is no such problem with object-based storage architectures, which can manage the number of files at level 1 billion in a single system, and are not plagued by metadata management like traditional storage. Object-based Storage systems also have wide-area scalability to deploy and form a large, cross-regional storage infrastructure in several different locations.

Latency Problem

"Big data" applications also have real-time requirements, especially applications involving online transactions or finance. For example, an online advertising service for apparel retailers needs to analyze customers' browsing records in real time and serve accurately targeted ads. The storage system must support this while maintaining fast response, because the consequence of response latency is that the system pushes "expired" advertising content to the customer. In this scenario a scale-out storage system has an advantage, because each node has its own processing and interconnect components, so processing capability grows in step with capacity. Object-based storage systems can support concurrent data streams, which further improves throughput.

Many "big data" applications require high IOPS, such as high-performance computing (HPC). The spread of server virtualization has also created demand for high IOPS, just as it has changed the traditional IT environment. To meet these challenges, solid-state storage devices of many kinds have emerged and are developing rapidly, from simple caches inside a server to scalable storage systems built entirely on solid-state media.
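
The server-side cache case can be sketched as a small read cache that keeps recently used blocks on fast media and evicts the least recently used block when full. The LRU policy, the `ReadCache` class, and the `slow_disk_read` stand-in are assumptions made for illustration; the article does not say which caching policy such devices use.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache standing in for a server-side flash cache."""

    def __init__(self, capacity_blocks, backend_read):
        self.capacity = capacity_blocks
        self.backend_read = backend_read   # function that reads from slow storage
        self.blocks = OrderedDict()        # block ID -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)    # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.backend_read(block_id)       # slow path: go to the backend
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return data


backend_reads = []


def slow_disk_read(block_id):
    """Hypothetical backend; records each access so we can count them."""
    backend_reads.append(block_id)
    return f"data-{block_id}".encode()


cache = ReadCache(capacity_blocks=2, backend_read=slow_disk_read)
for block in [1, 2, 1, 3, 1, 2]:
    cache.read(block)

print(cache.hits, cache.misses)   # 2 4
print(backend_reads)              # [1, 2, 3, 2]
```

Every hit is a read that never reaches the slower back-end storage, which is how a small amount of solid-state capacity can raise the effective IOPS of a much larger system.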

Concurrent Access

Once enterprises realize the potential value of big data analytics applications, they bring more datasets into the system for comparison and let more people share and use the data. To create more business value, enterprises tend to analyze data objects from different platforms together. Storage infrastructure that includes a global file system can help users solve these data access problems: a global file system lets multiple users on multiple hosts access file data concurrently, and that data may be stored on several different types of storage devices in multiple locations.

Security Issues

Some specialized sectors, such as finance, healthcare, and government intelligence, have their own security standards and confidentiality requirements for data. For IT managers this is nothing new, and all such requirements must be met. However, big data analytics often requires different types of data to be cross-referenced, and data was not mixed and accessed in this way in the past, so big data applications raise new security issues that must be considered.

Cost Problem

"Big" can also mean expensive. Cost control is a key issue for companies running big data environments. Controlling cost means making each device more "efficient" while cutting back on expensive components. Technologies such as data deduplication have now entered the primary storage market and can handle more data types, which brings more value to big data storage applications and improves storage efficiency. Where data volumes keep growing, reducing back-end storage consumption by even a few percentage points can yield a significant return on investment. Thin provisioning, snapshots, and cloning can also improve storage efficiency.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
