Data Center Storage Architecture


The storage system is the core infrastructure of the data center IT environment and the final carrier of data access. Under cloud computing, virtualization, big data, and related technologies, storage has undergone enormous change: block storage, file storage, and object storage now support access to many different data types, centralized storage is no longer the mainstream architecture of the data center, and access to massive data volumes demands a highly scalable, elastic distributed storage architecture.

In the new phase of IT development, data center construction has entered the cloud computing era, and an enterprise can no longer build its cloud data center storage environment simply from the operational needs of individual business lines. A cloud computing data center is not built to meet the special goals of one particular business system; it is built so that all business systems can achieve flexible resource scheduling, good scalability, flexible business expansion, and fast delivery on the cloud platform. It is therefore a bottom-up construction model (shown in Figure 1): the cloud platform is built ahead of the needs of any application system and is no longer bundled with a specific business. When an application system is built, expanded, or upgraded, it mainly requests software and hardware resources from the resource pool, and the storage system becomes an allocatable, schedulable resource of the cloud data center. This helps eliminate bottlenecks, improve processing speed, and keep business systems running stably and efficiently over the long term.

Figure 1 Development of system construction in the data center

I. Evolution of data center storage architecture

As data centers evolve from isolated enterprise application systems to large-scale cloud computing services in the Internet era, the storage architecture evolves with them (see Figure 2): from meeting the performance and capacity requirements of critical systems, to virtualization architectures that consolidate data center storage resources and provide on-demand storage services and automated operations, and further toward intelligent, agile storage systems. Changing application requirements are the driving force behind the continuous improvement of the storage architecture. Today, silo, virtualization, and cloud storage architectures coexist, and the emergence of software-defined storage marks the storage development stage of the post-cloud-computing era.

Figure 2 Storage system architecture and management evolution

Silo architecture

In early systems built on the host architecture, data and logic were integrated, process-oriented design methods were used, and each application was an isolated system: relatively easy to maintain, but hard to integrate. The client/server architecture separates logic from data (whether in C/S or B/S mode, the essence is client/server) and uses object-oriented design methods; each application is still an isolated system, but it provides some back-end integration capability. Storage in this kind of architecture grows up independently alongside each system. The hardware of a business platform is sized for the maximum number of users expected during the planning period, yet at the start of a business it is impossible to evaluate the eventual storage scale and performance requirements, so large amounts of hardware, floor space, and power are often wasted, and hardware resources cannot be flexibly scheduled. Each line of business must go through software selection, resource assessment, hardware selection, procurement, and implementation, so bringing a business online takes a long time, which is not conducive to business development.

Storage Virtualization

As the business develops, data center storage inevitably becomes a large heterogeneous environment in which standardized management processes are difficult to implement. The storage virtualization architecture centralizes the management of different storage devices, unifying them into a storage pool; it masks the hardware particularities of each storage device from the server layer and presents uniform logical characteristics, thus achieving centralized, unified, and convenient management of the storage system. All storage volumes in a pool can be given the same attributes, such as performance, redundancy, backup requirements, or cost, and storage management becomes automated (for example, LUN management) and policy-based.

At the same time, automated management of storage resources gives users a higher level of policy choice. Within a storage pool, multiple storage tiers can be defined to represent different service levels for different business areas or storage users, as sketched below. Users can also manage the storage resources within each pool in a modular way, adding, removing, or changing them as needed while remaining transparent to the application servers' business systems. Policy-based storage virtualization manages the entire storage infrastructure and maintains a reasonable allocation of storage resources: applications with higher priority get higher storage priority and the best-performing storage, while low-priority applications use inexpensive storage.
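
As a minimal illustration of this kind of policy (the pool and tier definitions below are invented for the sketch, not any vendor's API), a volume request from a high-priority application lands on the best-performing tier, while a low-priority request goes to the cheapest tier:

    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str        # e.g. "ssd", "sas", "nearline"
        perf_class: int  # higher = better performance
        cost_per_gb: float

    # One storage pool exposing several service levels (invented values).
    POOL = [
        Tier("ssd", perf_class=3, cost_per_gb=0.50),
        Tier("sas", perf_class=2, cost_per_gb=0.12),
        Tier("nearline", perf_class=1, cost_per_gb=0.03),
    ]

    def allocate_volume(app_priority: int) -> Tier:
        """High-priority apps get the fastest tier, others the cheapest."""
        if app_priority >= 2:
            return max(POOL, key=lambda t: t.perf_class)
        return min(POOL, key=lambda t: t.cost_per_gb)

    print(allocate_volume(3).name)  # -> ssd
    print(allocate_volume(1).name)  # -> nearline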

Cloud Storage Architecture

With the advent of the large-scale cloud computing era, the cloud storage architecture delivers storage as a cloud service, both as enterprise private cloud and as public cloud. It focuses on the creation and distribution of large amounts of stored data and on fast data access through the cloud. The cloud storage architecture must support the storage, backup, migration, and transmission of large-scale data loads while offering significant cost, performance, and management advantages.

Cloud storage is deployed using cluster technology or distributed file system functions: a large number of storage devices of different types across the network cooperate through application software to provide data storage and business access as one system, ensuring data security while saving storage space.

For large-scale system support, distributed file systems and distributed object storage provide highly scalable, elastic support and strong data access for the various applications of cloud storage, and because these distributed technologies run on standardized hardware, massive cloud storage can be built and operated at low cost.

Cloud storage is not meant to replace existing disk arrays; it is a new form of storage system built to cope with rapidly growing data volumes and bandwidth. Cloud storage construction therefore focuses on three points: simple capacity expansion, easy performance scaling, and ease of management.

Software-defined storage

Software-defined storage does not yet have an exact definition, but it represents a trend toward separating software from hardware in the storage architecture, that is, separating the data plane from the control plane. For data center users, storage resources are managed and scheduled through software, for example flexible volume migration, without having to consider the hardware itself.

Through software-defined storage, storage resources are virtualized, abstracted, and automated, which satisfies many requirements of data center storage deployment, management, monitoring, and adjustment, and gives the storage system flexibility, freedom, and high availability.
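
A minimal sketch of this separation (all class and method names here are hypothetical, not a real product's API): the control plane is pure software that decides where volumes live and when they migrate, while the backends below it only store bits:

    class Backend:
        """Data plane: any device or array that can hold a volume."""
        def __init__(self, name: str, free_gb: int):
            self.name, self.free_gb = name, free_gb

    class ControlPlane:
        """Control plane: software policy decides placement and migration."""
        def __init__(self, backends):
            self.backends = backends
            self.volumes = {}  # volume id -> backend

        def provision(self, vol_id: str, size_gb: int) -> str:
            # Example policy: choose the backend with the most free space.
            target = max(self.backends, key=lambda b: b.free_gb)
            target.free_gb -= size_gb
            self.volumes[vol_id] = target
            return target.name

        def migrate(self, vol_id: str, dest: str) -> None:
            # A volume move is a software decision, transparent to apps.
            self.volumes[vol_id] = next(b for b in self.backends
                                        if b.name == dest)

    cp = ControlPlane([Backend("array-a", 500), Backend("array-b", 800)])
    print(cp.provision("vol1", 100))  # -> array-b
    cp.migrate("vol1", "array-a")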

II. Data center storage technology architecture

1. Data types

The data types stored in the data center have changed considerably. By degree of structure, they can be roughly divided into the following three kinds.

• Storage and application of structured data. This is data with a user-defined type that contains a series of attributes, each attribute having a data type, stored in a relational database. General business systems hold large amounts of structured data, usually kept in relational databases such as Oracle or MySQL. In enterprise data centers it generally resides on a centralized storage architecture, which becomes the primary storage system, accessed mainly as block storage.

• Storage and application of unstructured data. In contrast to structured data, data that cannot conveniently be represented in the two-dimensional logical tables of a database is called unstructured data, including office documents of all formats, text, pictures, XML, HTML, reports of all kinds, images, and audio/video information. The distributed file system is the main technology for storing unstructured data.

• Storage and application of semi-structured data. The semi-structured data model has some structural features but is more flexible than the traditional relational and object-oriented models; it sits between fully structured data (such as data in relational or object-oriented databases) and completely unstructured data (such as sound and image files). Semi-structured data models are not based on the strict concept of a traditional database schema: the data in these models is self-describing. Because semi-structured data has no strict semantic definition, it is not well suited to traditional relational databases, and the databases that store this kind of data are called "NoSQL" databases; a small example of the difference follows this list.
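
To make "self-describing" concrete, here is a minimal comparison (table and field names are invented): a relational row only has meaning through a schema defined separately in the database, while each semi-structured JSON document carries its own field names, and the structure may vary from document to document:

    import json
    import sqlite3

    # Structured: the schema lives in the database, outside the data itself.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    db.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")

    # Semi-structured: each document describes itself, and fields can differ.
    docs = [
        {"id": 1, "name": "alice", "email": "alice@example.com"},
        {"id": 2, "name": "bob", "tags": ["admin"], "phone": "555-0100"},
    ]
    for doc in docs:
        print(json.dumps(doc))  # a NoSQL store would keep these as-is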

2. Blocks, files, and objects

Block storage

For storage systems, block reads and writes are concepts at the level of the storage medium: for a disk, the block storage unit is one or more disk sectors. Block-level data access therefore faces the bottom physical layer, with operations based on a starting sector number, an operation code (read, write, etc.), and a number of consecutive sectors; the block data access interface is the SCSI interface. There are two common types of block storage, and a small sketch of sector-based access follows the list below.

• DAS (Direct Attached Storage). A storage method in which storage is directly connected to the host server. Each host server has its own separate storage device, and the storage of different host servers does not interoperate: accessing data across hosts requires relatively complex configuration, and if the host servers run different operating systems, mutual access is more complex still; some systems cannot access each other at all. DAS is usually used in single-network environments with modest data exchange volumes and performance requirements, and it is the earliest implemented technology.

• SAN (Storage Area Network). A high-speed network connects the host servers and storage devices, with the storage system sitting behind the host group; FC, iSCSI, and FCoE are the current mainstream forms of the high-speed I/O connection. SAN deployments are generally characterized by high cost and good performance, and they suit applications that demand high network speed, high data reliability and security, extensive data sharing, and high performance. A SAN uses the SCSI block I/O command set and provides high-performance random I/O and data throughput at the network level, with high bandwidth and low latency; but because SAN systems are expensive and do not scale out to very large sizes, they cannot meet the storage needs of large cloud computing data centers.
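
As a minimal sketch of what sector-based block access looks like from software (the device path is a placeholder, and reading a raw block device on Linux normally requires root privileges), the snippet below reads one 4 KiB block starting at a given sector, mirroring the (start sector, opcode, sector count) form of a SCSI read:

    import os

    SECTOR_SIZE = 512      # classic sector size in bytes
    BLOCK_SECTORS = 8      # 8 sectors = one 4 KiB block

    def read_block(device: str, start_sector: int) -> bytes:
        """Read BLOCK_SECTORS consecutive sectors starting at start_sector."""
        fd = os.open(device, os.O_RDONLY)
        try:
            os.lseek(fd, start_sector * SECTOR_SIZE, os.SEEK_SET)
            return os.read(fd, BLOCK_SECTORS * SECTOR_SIZE)
        finally:
            os.close(fd)

    # Placeholder device path; needs appropriate privileges to run.
    # data = read_block("/dev/sdX", start_sector=2048)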

Table 1 Several file systems

File storage

Files are "accessed by name." To distinguish the different files on a disk, each file is given a name, the filename, which represents the file on disk so that it can be "found by name." File data operations are based on the file name, an offset, and the number of bytes to read or write; the operations themselves do not carry the file's own attribute information, i.e. its metadata. Many forms of file system have been developed on top of file storage for different environments (as shown in Table 1).
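
A minimal sketch of access by name plus offset (the filename is arbitrary): the application names the file, seeks to an offset, and asks for a byte count, and the file system resolves the name to blocks behind the scenes:

    # Write a small file, then read 4 bytes at offset 7 by name.
    with open("report.log", "wb") as f:      # arbitrary example filename
        f.write(b"hello, file storage world")

    with open("report.log", "rb") as f:
        f.seek(7)               # offset within the named file
        print(f.read(4))        # -> b'file'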

Object storage

Objects are self-contained: they carry metadata, data, and attributes, and can manage themselves, and all objects are peers. In other words, objects live in a flat space rather than the tree-like logical structure of a file system. Object storage is ID-based: data is accessed directly by ID, the data path (data reads and writes) is separated from the control path (metadata), and the storage system is built from object-based storage devices (Object-based Storage Device, OSD), each of which has enough intelligence to manage the distribution of its own data automatically. Typical representatives: Swift, Ceph.
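
As a minimal sketch of ID-based access over a Swift-style REST interface (the endpoint, account, and token below are placeholders; real deployments issue tokens through an authentication service), an object is written and read back under a flat container/object identifier rather than a directory path:

    import requests

    # Placeholders for a Swift-style object store endpoint and token.
    ENDPOINT = "http://objectstore.example.com:8080/v1/AUTH_demo"
    HEADERS = {"X-Auth-Token": "REPLACE_ME"}

    # PUT: store an object under container/object-id (flat namespace).
    requests.put(f"{ENDPOINT}/photos/img-0001",
                 headers=HEADERS, data=b"...jpeg bytes...")

    # GET: retrieve the object by the same identifier.
    resp = requests.get(f"{ENDPOINT}/photos/img-0001", headers=HEADERS)
    print(resp.status_code, len(resp.content))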

Table 2 compares block, file, and object storage at the device level.

Table 2 Device-level access comparisons for block, file, and object storage

3. Primary storage architecture

Primary storage is the most important storage system in the data center, often referred to as Tier 1 storage. It holds active data (data that must be accessed frequently) and data that requires high performance, low latency, and high availability. Primary storage typically supports mission-critical data center programs such as databases, e-mail, and transaction processing. Most critical programs use random data access patterns with varying access requirements, and they generate much of the data a company uses to run its business.

Even as more and more new data storage technologies appear in the virtualized world, traditional primary storage systems remain prevalent. DAS is the oldest primary storage architecture, but the SAN has become the most widely used and most popular. NAS is used for file-sharing applications in the data center and often uses a SAN extension on the back end. Most vendors in the data center space also deploy primary storage solutions for advanced users on a SAN architecture, building the related disaster recovery and storage virtualization scenarios on that basis.

The SAN is characterized by high performance, strong stability, and a high price. For the most important applications with high real-time service requirements, such as databases that need centralized storage, the SAN is still the mainstream technology: high-end applications that require centralized storage are carried by the SAN, while applications based on small files are better suited to NAS, making SAN and NAS complementary storage architectures.

In the early days of a data center, most data is primary data. As data grows, large amounts of it are typically moved to secondary and tertiary storage. Thus, as storage technology develops and the business matures, data centers gradually look for ways to shrink primary storage, to make full use of capacity and reduce cost across the entire data lifecycle.

4. Distributed file storage architecture

The main function of a distributed file system is to store unstructured data such as documents, images, and video. It is network-based, manages system resources globally, and can schedule the storage resources in the network, with the scheduling process kept "transparent" to users.

A distributed storage system uses a scalable system structure: multiple storage servers share the storage load, and a location server locates stored information. This not only improves system reliability, availability, and access efficiency but also makes the system easy to expand. High performance and high capacity are the main characteristics of distributed storage systems.

HDFS (Hadoop Distributed File System) is a member of the open-source Hadoop project family and an open-source implementation of the Google File System (GFS). The following is a brief introduction to how HDFS works.

HDFS is designed as a distributed file system that runs on commodity hardware; it is highly fault tolerant and intended for deployment on inexpensive machines. HDFS provides high-throughput data access and is well suited to unstructured and semi-structured applications on large datasets. Programs running on HDFS work with large datasets, with typical HDFS file sizes ranging from gigabytes to terabytes, so HDFS is tuned to support large files. An HDFS cluster consists of one NameNode and a number of DataNodes (as shown in Figure 3):

The NameNode (name node) is the central server of HDFS. It manages the file system's directory namespace and client access to files, and it manages all DataNodes.

The DataNode (data node) manages the storage blocks (data blocks) held on its node. Inside HDFS, a file is not placed on a single disk: a file is actually divided into blocks (chunks) that are dispersed across the DataNode cluster, and the NameNode records the mapping of blocks to the different DataNodes.

The NameNode accepts the client's metadata requests and then issues block operation instructions to the DataNodes for file creation, deletion, and replication, while determining the mapping of blocks to concrete DataNodes. DataNodes create, delete, and replicate blocks under the NameNode's management.
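
As a minimal sketch of this division of labor from a client's point of view, the WebHDFS REST interface first asks the NameNode for metadata and is redirected to a DataNode for the bytes; the host and file path below are placeholders (9870 is the default NameNode HTTP port in recent Hadoop releases):

    import requests

    NAMENODE = "http://namenode.example.com:9870"  # placeholder host
    PATH = "/data/logs/part-00000"                 # placeholder file

    # The NameNode answers the metadata request: op=OPEN returns an
    # HTTP 307 redirect pointing at a DataNode that holds the blocks.
    url = f"{NAMENODE}/webhdfs/v1{PATH}?op=OPEN"
    resp = requests.get(url, allow_redirects=False)
    print(resp.status_code, resp.headers.get("Location"))

    # Following the redirect streams the data from the DataNode itself,
    # so bulk data never passes through the NameNode.
    data = requests.get(url).content
    print(len(data))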

Figure 3 HDFS classic architecture diagram

HDFS reliability and performance are achieved primarily through replication of data blocks, and HDFS uses a strategy called rack awareness (rack-aware) to improve reliability, availability, and the utilization of network bandwidth.

With the usual replica count of 3, the HDFS policy keeps one replica on a node in the local rack, one on another node in the same rack, and the last replica on a node in a different rack. When reading, to reduce overall bandwidth consumption and read latency, the client reads from a replica on its own rack if one exists.
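
A minimal illustration of that placement rule (the rack topology and node names are invented, and the real NameNode placement logic handles many more constraints):

    import random

    # Invented topology: rack id -> nodes on that rack.
    TOPOLOGY = {
        "rack1": ["r1n1", "r1n2", "r1n3"],
        "rack2": ["r2n1", "r2n2"],
        "rack3": ["r3n1", "r3n2"],
    }

    def place_replicas(local_rack: str) -> list:
        """Replica 1: a node on the writer's rack.
        Replica 2: another node on the same rack.
        Replica 3: a node on a different rack."""
        first, second = random.sample(TOPOLOGY[local_rack], 2)
        remote_rack = random.choice([r for r in TOPOLOGY if r != local_rack])
        third = random.choice(TOPOLOGY[remote_rack])
        return [first, second, third]

    print(place_replicas("rack1"))  # e.g. ['r1n2', 'r1n3', 'r2n1']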

HDFS remains a master/slave structure, so the NameNode becomes both the bottleneck of the whole system and its single point of failure. For this reason, many users of distributed file systems keep improving its availability, for example by developing storage architectures with no central node.

5. Distributed object storage architecture

Object storage stores not only the data itself but also rich attribute information related to the data. Each object is assigned a unique OID (object ID). Objects are peers, and all OIDs belong to a flat address space rather than the tree-like logical structure of a file system. An object can only be accessed through its unique OID; there is no complex path structure and no concept of a "path" or "folder." The object storage architecture has the following components.

Object

An object is the basic unit of data storage in the system. An object is actually a combination of file data and a set of attribute information (metadata); the attributes can define file-based RAID parameters, data distribution, and quality of service. Traditional storage systems use files or blocks as the basic storage unit, and in a block storage system the system must also constantly track the attributes of every block, whereas an object maintains its own attributes and communicates them to the storage system. On a storage device, every object has an object identifier and is accessed through object identifier commands. There are usually several types of object: the root object on a storage device identifies the device and its various attributes, and group objects are collections of objects that share resource management policies on the device.

Object storage device (OSD, Object Storage Device)

An OSD has its own CPU, memory, network, and disk system. The difference between an OSD and a block device is not the storage medium but the access interface each provides. The main functions of the OSD are data storage and secure access, so object storage devices are usually implemented as standardized computing units. The OSD performs the mapping from object to blocks, which lets the local entity decide how best to store an object. An OSD storage node not only stores data but also includes advanced, intelligent capabilities. A traditional storage drive acts only as a target responding to client I/O requests, while an object storage device is a smart device that performs both target and initiator functions and supports communication and collaboration with other object storage devices, for example for data allocation, replication, and recovery.

Metadata server (MDS, Metadata Server)

The metadata server manages the namespace of the file system, controls the interaction between clients and OSD objects, and caches and synchronizes distributed metadata. Although both metadata and data are stored in the object storage cluster, they are managed separately to support scalability.

Client of the object storage system

To effectively support access to objects on the OSDs, the object storage system client must be implemented on the compute nodes. It typically provides a POSIX file system interface, which lets applications operate just as they would on a standard file system.

On the client side, the file system is transparent to the user; on Linux, access passes through the kernel's virtual file system switch (VFS) for low-level operations. End users access the large-capacity storage system without needing to know about the metadata servers, monitors, and individual object storage devices aggregated into the mass storage pool below. The intelligence of the file system is distributed across the nodes, which simplifies the client interface and supports large-scale dynamic expansion.

Object storage is built on standard hardware infrastructure. It eliminates the need for RAID (redundant array of independent disks), achieving high availability and scalability by introducing consistent hashing and data redundancy at the software level, and it sacrifices a degree of data consistency in exchange for multi-tenancy and container and object read/write operations. It is well suited to the unstructured data storage problems of Internet application environments.
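
A minimal sketch of consistent hashing for object placement (node names are invented; production systems such as Swift's ring or Ceph's CRUSH add virtual nodes, weights, and failure domains on top of this idea): each object ID hashes to a point on a ring and is stored on the next node clockwise, so adding or removing a node remaps only a small share of the objects:

    import bisect
    import hashlib

    def ring_hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        def __init__(self, nodes):
            # One point per node for brevity; real rings place many
            # virtual nodes per physical node for smoother balance.
            self.points = sorted((ring_hash(n), n) for n in nodes)
            self.keys = [p for p, _ in self.points]

        def node_for(self, object_id: str) -> str:
            i = bisect.bisect(self.keys, ring_hash(object_id))
            return self.points[i % len(self.points)][1]

    ring = ConsistentHashRing(["osd-1", "osd-2", "osd-3"])
    for oid in ("img-0001", "img-0002", "log-42"):
        print(oid, "->", ring.node_for(oid))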

As in common distributed file systems, a file placed into an object storage cluster is striped and distributed across cluster nodes according to a specific data distribution algorithm; a small sketch follows. Applications can communicate with the OSD nodes through a RESTful interface and store objects directly in the cluster.
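
A minimal sketch of that striping step (chunk size, naming scheme, and node names are invented, and the simple hash placement stands in for the consistent-hash ring above): the file is cut into fixed-size chunks, and each chunk's ID determines which node stores it:

    import hashlib

    NODES = ["osd-1", "osd-2", "osd-3"]  # invented node names
    CHUNK_SIZE = 4                       # tiny, to keep the output short

    def node_for(chunk_id: str) -> str:
        h = int(hashlib.md5(chunk_id.encode()).hexdigest(), 16)
        return NODES[h % len(NODES)]

    def stripe(object_id: str, data: bytes):
        """Cut data into fixed-size chunks and map each chunk to a node."""
        placement = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk_id = f"{object_id}.chunk{i // CHUNK_SIZE}"
            placement.append((chunk_id, data[i:i + CHUNK_SIZE],
                              node_for(chunk_id)))
        return placement

    for chunk_id, chunk, node in stripe("img-0001", b"0123456789abcdef"):
        print(chunk_id, chunk, "->", node)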

III. Concluding remarks

Enterprise and Internet data is growing at roughly 50% per year, and within the new data the proportion of structured data is limited: most of it is unstructured or semi-structured. The data center storage architecture must adapt flexibly to business development, and low cost, large (massive) capacity expansion, and high concurrent performance are the basic technical attributes of storage architectures for large cloud data centers. How to store huge volumes of data, apply deep processing to them, quickly extract valuable information, and turn it into rapid business decisions is becoming the basis of survival for enterprises of all kinds, and it is the direction in which storage, and the business around the storage architecture, will develop. The direction of change in data storage technology is therefore the distributed parallel file system and the parallel database; efficient and unified data loading and access, multiple interfaces, flexible expansion, and the ability to host multiple services are the keys to a large-capacity, massive-scale storage technology architecture.
