Distributed File System Metadata Service Models


With the explosion of unstructured data, distributed file systems have entered a golden period of development. From high-performance computing to data centers, from data sharing to Internet applications, they have penetrated every corner of data processing. Most distributed file systems (also called clustered or parallel file systems) separate metadata from data, that is, the control flow from the data flow, which yields higher scalability and I/O concurrency. The metadata management model is therefore critical: it directly affects the scalability, performance, reliability, and stability of the system. One of the biggest challenges for a highly scale-out storage system is recording the mapping between data's logical names and physical locations, which is the metadata, including information such as attributes and access permissions. For workloads with huge numbers of small files in particular, metadata handling is a very big challenge. Broadly speaking, metadata management in distributed file systems falls into three models: the centralized metadata service model, the distributed metadata service model, and the no-metadata service model. All three have long been debated in academia and industry; each has advantages and disadvantages, and real implementations are difficult to rank definitively. In fact, the idea of designing a general-purpose distributed file system that suits every data workload is inherently unrealistic. In that sense, each of the three metadata service models is justified, at least within the storage workloads to which it is suited.

Centralized Metadata Service Model
In a distributed file system, data and I/O load are spread across multiple physically independent storage and compute nodes, giving the system high scalability and performance. For a set of files, either whole files are distributed, with different files stored on different nodes, or files are striped, with a single file split into multiple sections stored across multiple nodes. Clearly, one key problem is ensuring that data can be correctly located and accessed, and the metadata service exists to solve it. The metadata service records the relationship between a file's logical name and its physical location, along with all the metadata required for access control. A client first queries the metadata service for the relevant metadata, then uses it to perform subsequent I/O operations such as file reads and writes.
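The control-flow/data-flow split described above can be sketched as follows. This is a minimal illustrative model, not any real system's API: the dictionaries stand in for the metadata service and the storage nodes, and all names are hypothetical.

```python
# Sketch of the control-flow / data-flow split: the client asks the
# metadata service *where* the data lives (control path), then reads
# the stripes directly from the storage nodes (data path).

METADATA = {  # metadata service: logical name -> physical layout
    "/data/a.bin": [("node1", "c0"), ("node2", "c1")],
}
STORAGE = {   # storage nodes hold the actual chunks
    "node1": {"c0": b"hello "},
    "node2": {"c1": b"world"},
}

def read_file(path):
    layout = METADATA[path]                # control flow: one metadata lookup
    return b"".join(STORAGE[node][chunk]   # data flow: direct chunk reads
                    for node, chunk in layout)

print(read_file("/data/a.bin"))  # b'hello world'
```

The point of the split is visible in the code: the metadata service is touched once per file, while the (much larger) data transfer goes straight to the storage nodes.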

To simplify system design, and because of a large installed base of legacy systems, most distributed file systems employ a centralized metadata service; examples include Lustre, PVFS, StorNext, and GFS. The centralized model typically deploys a dedicated metadata server that stores metadata and answers client queries, provides a unified file system namespace, and handles access-control functions such as name resolution and data location. In a traditional NAS system all I/O traffic passes through the server; in a distributed file system I/O traffic bypasses the metadata server, and the client interacts directly with the storage nodes. This architectural change, separating the control flow from the data flow, significantly improves the scalability and performance of both the metadata server and the storage servers. The most important advantage of the centralized model is its simple design: it is essentially equivalent to building a single-machine application that exposes a network interface such as sockets, RPC, HTTP REST, or SOAP. The key metric for the metadata service is OPS throughput, the number of operations per unit time, which is especially critical for the centralized model because it can only scale up. To maximize OPS, the model demands a powerful CPU, large memory, and fast disks, and the back-end storage may even use high-end disk arrays or SSDs. In the software architecture, mechanisms such as process/thread pools, asynchronous communication, caching, and event-driven design should all be considered.
For the design and implementation of distributed file system namespaces, see the article "Research on the Implementation of Namespaces in Distributed File Systems"; it is not discussed here. The shortcomings of the centralized metadata service model are, however, equally prominent, the two most critical being the performance bottleneck and the single point of failure.

Performance bottleneck: as load grows, the metadata server in this model quickly becomes the bottleneck of overall system performance. By Amdahl's law, the speedup attainable through parallelism is ultimately limited by the serial portion of the workload. Here the metadata server is that serial portion; it directly bounds the scale and performance of the system. The nature of file metadata requires that it be maintained and updated synchronously: whenever file data or metadata is manipulated, the metadata must be updated, and even a read or a directory listing updates a file's access time. When a client accesses the file system, it must first interact with the metadata server for namespace resolution, data location, access control, and so on, before doing I/O directly with the storage nodes. As the system scales, the number of storage nodes, disks, files, clients, and file operations all grow dramatically, while the metadata server's physical capacity is finite, so the centralized metadata server eventually becomes a performance bottleneck. For the well-known LOSF (lots of small files) workloads, where the file count is huge and individual files are only a few KB to a few tens of KB, as in CDN or life-science DNA applications, the bottleneck is even more severe. LOSF workloads are dominated by metadata operations; once the metadata server runs into performance problems, both OPS and I/O throughput drop sharply. Distributed file systems built on this model, such as Lustre, PVFS, and GFS, are therefore not well suited to LOSF applications.
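The Amdahl's-law argument above can be made concrete with a small calculation. The 5% serial fraction below is an assumed, illustrative number, not a measurement of any particular system.

```python
def amdahl_speedup(serial_fraction, n_nodes):
    """Amdahl's law: overall speedup when only the parallel portion
    of the workload scales with the number of nodes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

# If 5% of every request is serialized at the metadata server, no
# number of storage nodes can push speedup past 1 / 0.05 = 20x.
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 10   -> 6.9
# 100  -> 16.81
# 1000 -> 19.63
```

This is why LOSF workloads are the worst case: metadata operations dominate, so the serial fraction is large and the achievable speedup ceiling is low.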

In practice, the performance bottleneck is not as serious as imagined: Lustre, StorNext, GFS, and others achieve extremely high performance on large-file workloads, and StorNext even performs well with small files. First, such systems should simply avoid LOSF workloads unless performance requirements are very low. Second, large-file applications emphasize I/O data throughput, and metadata operations make up only a small fraction of the load; with large files, the amount of metadata drops markedly, the system spends more time on data transfer, and the pressure on the metadata server falls dramatically, leaving essentially no bottleneck. Furthermore, if a bottleneck does appear, the metadata server can be optimized for the maximum load the system must carry. The most direct optimization is a hardware upgrade: CPU, memory, storage, network; Moore's law still applies. System-level optimization is also often effective, including OS trimming and parameter tuning, which leave plenty of room for improvement. Optimizing the metadata server's own design is most critical, since it saves cost and simplifies maintenance and management; the main techniques include data locality, caching, and asynchronous I/O, all aimed at improving concurrency, reducing disk I/O, and shortening request processing time. In a great many data applications, therefore, centralized metadata server performance is not a major problem, or can be resolved through optimization.

Single point of failure (SPOF): this problem appears even more severe than the performance bottleneck. The entire system depends heavily on the metadata server; if it fails, the system becomes completely unusable, directly interrupting applications and harming business continuity. The network, compute, and storage components and the software of any physical server can fail, so the SPOF risk is always latent; better hardware and software only reduce its probability and cannot eliminate it. The SPOF problem is currently addressed mainly with an HA mechanism: depending on availability requirements, one or more mirror metadata servers (logical or physical) form a metadata service HA cluster. One server in the cluster acts as the primary, accepting and processing client requests and synchronizing with the others. When the primary fails, an available server is automatically elected as the new primary, a process transparent to upper-layer applications that causes no business interruption. HA solves the SPOF problem but adds cost: only the primary server is active, and the standby servers contribute nothing to performance.

Distributed Metadata Service Model

Naturally, the distributed metadata service model was proposed: as the name implies, multiple servers form a cooperating cluster that provides the metadata service, eliminating the centralized model's performance bottleneck and single point of failure. The model can be subdivided into two categories. In the fully peer-to-peer form, every metadata server in the cluster is equivalent and can serve metadata independently, with metadata synchronized within the cluster to maintain consistency; examples include Isilon, LoongStore, and CZSS. In the fully distributed form, each metadata server is responsible for a portion of the metadata (partitions may overlap), and together they form the complete service; examples include PanFS, GPFS, and Ceph. By spreading load across multiple servers, the distributed model removes the performance bottleneck, and peer servers or redundant metadata partitions remove the single point of failure. While distribution looks perfect, it greatly increases design and implementation complexity and may introduce new problems, namely performance overhead and data consistency.
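A toy sketch of the partitioned (fully distributed) form: route each file's metadata to a server by hashing its parent directory, so siblings stay on one server. This static hash is an assumption for illustration only; real systems differ (Ceph, for instance, uses dynamic subtree partitioning rather than a fixed hash).

```python
import hashlib

def metadata_server_for(path, n_servers):
    """Assign a file's metadata to one of n_servers by hashing the
    parent directory, keeping a directory's entries co-located so a
    listing touches a single server.  Illustrative static scheme only."""
    parent = path.rsplit("/", 1)[0] or "/"
    digest = hashlib.md5(parent.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_servers

# Files in the same directory land on the same metadata server:
s1 = metadata_server_for("/home/u/a.txt", 4)
s2 = metadata_server_for("/home/u/b.txt", 4)
print(s1 == s2)  # True
```

The design tension the text describes is visible here: a static partition needs no synchronization for routing, but rebalancing when servers join or leave, and operations that span partitions (e.g. a cross-directory rename), are where the locking and consistency overhead appears.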

Performance overhead: distributed systems incur extra overhead from inter-node data synchronization, because various locking and synchronization mechanisms are needed to guarantee consistency. If node synchronization is handled poorly, this overhead badly hurts scalability and performance, recreating the very bottleneck of the centralized model, which places higher demands on the design of the distributed metadata servers. The overhead offsets some of the performance gained from distribution, and it grows as the number of metadata servers, files, and file operations increases, as the storage system and disk count grow, as files get smaller, and as I/O becomes more random. In addition, with many metadata servers, highly concurrent metadata access makes the synchronization overhead even more pronounced. Some distributed file systems use technologies such as high-performance networks (InfiniBand, GigE, etc.), SSDs or SAN disk arrays, and distributed shared memory (SMP or ccNUMA) for intra-cluster metadata synchronization and communication. This can indeed raise performance enough to offset the synchronization overhead, but at a correspondingly higher cost.

Data consistency: this is a problem every distributed system must face. The distributed metadata service model also carries the risk of partial failures: the failure of some metadata nodes does not bring the whole system down, but it may disrupt overall operation or cause access errors. For high availability, metadata is replicated to multiple nodes, and keeping multiple replicas synchronized is risky. If metadata falls out of sync or is accidentally corrupted, different replicas of a file's metadata diverge, access to the file's data becomes inconsistent, and the correctness of upper-layer applications is directly affected. The probability of this risk grows sharply with system scale, so synchronization and concurrent access for distributed metadata are a huge challenge. Synchronous replication combined with transactions or logging can solve the consistency problem, but it greatly reduces concurrency and defeats the design intent of a distributed system. The goal is to maximize concurrency while guaranteeing metadata consistency, which imposes strict requirements on the synchronization mechanisms and algorithms; the complexity and challenge are self-evident.

No-Metadata Service Model
Since neither the centralized nor the distributed metadata service model completely solves the problem, can removing the metadata server altogether avoid these issues? In theory the no-metadata service model is entirely feasible, provided an alternative method of locating data can replace metadata queries. Ideally, this model eliminates the metadata performance bottleneck, the single point of failure, the consistency problem, and a series of related issues; scalability improves markedly, and concurrency and performance scale nearly linearly. At present, distributed file systems built on the no-metadata service model are rare, and GlusterFS is the most typical representative.

For a distributed system, metadata handling is key to its scalability, performance, and stability. GlusterFS takes a new approach: it abandons the metadata service entirely, replacing the centralized or distributed metadata service of traditional distributed file systems with an elastic hashing algorithm. This solves the metadata problem at its root, yielding near-linear scalability while also improving performance and reliability. GlusterFS locates data algorithmically: any server or client in the cluster can locate, read, and write data based solely on the path and file name. In other words, GlusterFS does not need to separate metadata from data, because file location can be performed independently and in parallel. In this unique design, metadata and data are stored together rather than separated. Every storage server in the cluster can locate a file's data shards intelligently, using only the file name, path, and the algorithm, without consulting an index or other servers. Data access is thus fully parallelized, giving true linear performance scaling. The no-metadata-server design greatly improves GlusterFS's performance, reliability, and stability. (For a deeper analysis of GlusterFS, see the article "GlusterFS Cluster File System Research".)
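The essence of algorithmic location can be sketched in a few lines. This is a deliberate simplification in the spirit of elastic hashing, not GlusterFS's actual algorithm (which assigns per-directory hash ranges to bricks and rebalances them elastically); the brick names and the fixed modulo are assumptions for illustration.

```python
import hashlib

# Hypothetical brick layout; in a real cluster this comes from the
# volume configuration shared by all clients.
BRICKS = ["server1:/brick", "server2:/brick", "server3:/brick"]

def locate(path):
    """Compute the owning brick from the file path alone.  Every
    client derives the same answer independently, so no metadata
    server is ever consulted.  Simplified stand-in for elastic hashing."""
    h = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16)
    return BRICKS[h % len(BRICKS)]

print(locate("/vol/a.txt"))  # same result on every client, every time
```

The trade-off the following paragraph describes also falls out of this sketch: point lookups are free and fully parallel, but an operation like listing a directory has no single place to ask and must query every brick.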

The benefit of the no-metadata-server design is the absence of a single point of failure and a performance bottleneck, which improves scalability, performance, reliability, and stability. For workloads with huge numbers of small files, the design effectively solves the metadata problem. Its drawbacks are that data consistency becomes more complex, directory traversal is inefficient, and global monitoring and management functions are lacking. It also pushes more work onto the client, such as file location, namespace caching, and logical volume view maintenance, increasing the client's load and consuming considerable CPU and memory.

Comparison of the Three Metadata Service Models

One of the biggest challenges for a scale-out storage system is recording the mapping between data's logical names and physical locations, that is, the metadata. Traditional distributed storage systems use centralized or distributed metadata services to maintain it: the centralized service suffers from a single point of failure and a performance bottleneck, while the distributed service brings performance overhead, metadata synchronization and consistency problems, and design complexity. The no-metadata service model eliminates the metadata access problem but increases the complexity of managing the data itself, lacks global monitoring and management functions, and raises the client's load. None of the three models is perfect; each has its advantages and disadvantages, and none is absolutely superior. In practice, one should choose the model appropriate to the specific circumstances and work to mitigate its shortcomings, thereby improving the distributed file system's scalability, performance, availability, and other characteristics. Representatives of the centralized metadata service model are Lustre, StorNext, and GFS; typical cases of the distributed metadata service model are Isilon, GPFS, and Ceph; GlusterFS is the classic of the no-metadata service model. All are very capable distributed file systems and excellent design examples, which is enough to show that while architecture is critical, implementation technique often determines the final outcome.

