Red Hat's Big Data Bet: An All-Round Look at Gluster

Source: Internet
Author: User

Red Hat has announced its acquisition of Gluster, widely known as the developer of the GlusterFS open source file system and the Gluster Storage Platform software stack. With this move, Red Hat positions itself as a one-stop shop for customers looking for big data solutions such as Apache Hadoop. But it is also buying a file system with tremendous potential for cloud deployments. If you haven't heard of Gluster, read on to learn how the company stands out in scale-out network-attached storage.

Gluster Company Profile

In the company's own words, GlusterFS is a "scalable open source clustered file system that provides customers with a global namespace, a distributed front end, and scalability to hundreds of petabytes." That is no modest claim, but GlusterFS really is built to solve big problems, truly big ones. In fact, Gluster's theoretical maximum capacity is measured in brontobytes (yes, the word now has a real use; a brontobyte is 10^27 bytes).
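For a sense of the scale gap, the short calculation below relates the units. It assumes decimal prefixes (1 PB = 10^15 bytes) and treats "brontobyte" as the informal, non-SI name for 10^27 bytes.

```python
# Decimal (SI-style) byte units. "Brontobyte" is an informal, non-SI name
# for 10**27 bytes, assumed here for illustration.
PETABYTE = 10**15
BRONTOBYTE = 10**27


def petabytes_per_brontobyte() -> int:
    """Return how many petabytes fit in one brontobyte."""
    return BRONTOBYTE // PETABYTE


if __name__ == "__main__":
    print(petabytes_per_brontobyte())  # prints 1000000000000 (10**12 PB)
```

In other words, "hundreds of petabytes" is still about ten orders of magnitude short of a single brontobyte.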

Perhaps the most important thing to grasp about GlusterFS right away is that it achieves massive scale-out of network-attached storage without relying on what other systems use to manage large data sets: a central metadata service. Metadata describes where a given file or chunk lives in a distributed file system, and it is also the fatal scalability weakness of many network-attached storage solutions.

In some systems, such as Hadoop's native HDFS, the metadata server is a single point of failure. In others, it also blocks linear performance scaling, because every node must stay in contact with the server (or server group) that holds the metadata for the entire cluster. This approach almost inevitably adds latency and leaves storage hardware idle while it waits for responses to metadata requests.
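To make the bottleneck concrete, here is a toy model of a centralized metadata service. `MetadataServer` and `read_all` are hypothetical names for illustration only; real metadata servers such as the HDFS NameNode are far more sophisticated. The point is that every client request funnels through one service, so its load grows with the number of clients, and if it is unavailable no file can be located at all.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Toy central metadata service mapping file paths to storage nodes (hypothetical)."""
    locations: dict = field(default_factory=dict)
    requests_served: int = 0
    available: bool = True

    def _handle_request(self) -> None:
        # Every request, from every client, passes through this single service.
        if not self.available:
            raise RuntimeError("metadata server unavailable: no file can be located")
        self.requests_served += 1

    def record(self, path: str, node: str) -> None:
        """Register where a newly written file is stored."""
        self._handle_request()
        self.locations[path] = node

    def lookup(self, path: str) -> str:
        """Return the node holding a file; clients must call this before reading."""
        self._handle_request()
        return self.locations[path]


def read_all(server: MetadataServer, paths: list, clients: int) -> None:
    """Each client asks the metadata server before reading each file."""
    for _ in range(clients):
        for path in paths:
            server.lookup(path)


if __name__ == "__main__":
    server = MetadataServer()
    files = [f"/data/file{i}" for i in range(10)]
    for i, path in enumerate(files):
        server.record(path, f"node{i % 4}")
    read_all(server, files, clients=4)
    print(server.requests_served)  # 10 writes + 40 lookups = 50
```

Doubling the number of clients doubles the lookups the single server must answer, which is exactly the scaling pressure described above.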

Gluster solves this problem with its own elastic hashing algorithm. With this algorithm, every node in a Gluster cluster can compute the location of a particular file without contacting any other node in the cluster, which essentially eliminates the need to track and update metadata. It is this design that gives GlusterFS its competitive edge and lets it truly deliver on its promise of linear performance scaling.
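The idea behind hash-based placement can be sketched as follows: split a 32-bit hash space into one contiguous range per node, hash the file name, and pick the node whose range contains the hash. This is a simplified stand-in, not Gluster's implementation: the MD5-based `file_hash` and the helper names `build_layout` and `locate` are assumptions for illustration, and GlusterFS's actual distribution logic uses its own hash function and per-directory range assignments.

```python
import hashlib

HASH_SPACE = 2**32  # 32-bit hash space, split into one contiguous range per node


def file_hash(name: str) -> int:
    """Stand-in 32-bit hash: the first 4 bytes of the MD5 digest of the name."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")


def build_layout(nodes: list) -> list:
    """Assign each node an equal, contiguous slice [start, end) of the hash space."""
    step = HASH_SPACE // len(nodes)
    layout = []
    for i, node in enumerate(nodes):
        start = i * step
        # The last node absorbs any remainder so the whole space is covered.
        end = HASH_SPACE if i == len(nodes) - 1 else (i + 1) * step
        layout.append((start, end, node))
    return layout


def locate(name: str, layout: list) -> str:
    """Compute a file's node locally: no metadata server or peer is consulted."""
    h = file_hash(name)
    for start, end, node in layout:
        if start <= h < end:
            return node
    raise ValueError("layout does not cover the whole hash space")


if __name__ == "__main__":
    layout = build_layout(["server1", "server2", "server3", "server4"])
    print(locate("report.pdf", layout))
```

Because the computation depends only on the file name and the shared node list, any client or node arrives at the same answer independently, with no lookup round trip.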

Back-end deployments

GlusterFS is a set of user-space file system drivers that can be deployed on any Linux distribution (mainly RHEL or CentOS). In other words, GlusterFS is completely independent of the underlying hardware, which makes it highly portable. In on-premises or private cloud deployments, GlusterFS can run on commodity server hardware backed by JBOD (just a bunch of disks), DAS (direct-attached storage), or SAN storage, depending on the end user's choice. In public cloud environments, GlusterFS can be installed directly on existing cloud offerings, providing better scalability and efficiency (Amazon and RightScale already offer such options). In addition, as Gluster is increasingly deployed as a virtual appliance, Gluster nodes run on top of a hypervisor, whether on premises or in the cloud.

Gluster can be deployed in several modes with different performance and availability characteristics, depending on how data is stored across the nodes of a GlusterFS cluster. The simplest is the distributed mode, which is essentially file-level RAID 0. In this mode each file is stored on only one Gluster node, so the failure of a single node means losing the files on it, which should come as no surprise. In exchange for this lower level of protection, distributed mode delivers the highest performance and the most efficient use of storage capacity, because no file copies are made.
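A minimal sketch of distributed mode's trade-off: each file goes to exactly one node, so a node failure removes exactly the files that node held. The placement rule here (CRC32 of the name modulo the node count) and the helper names are assumptions chosen for brevity; they are not GlusterFS's real placement logic.

```python
import zlib


def place_distributed(files: list, nodes: list) -> dict:
    """Distributed-mode sketch: each file lands on exactly one node (no copies)."""
    placement = {node: [] for node in nodes}
    for name in files:
        # Stand-in placement rule (CRC32 modulo node count), not Gluster's own hashing.
        node = nodes[zlib.crc32(name.encode("utf-8")) % len(nodes)]
        placement[node].append(name)
    return placement


def surviving_files(placement: dict, failed_node: str) -> set:
    """Files still readable after one node fails: everything not stored on it."""
    return {f for node, names in placement.items() if node != failed_node for f in names}


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    files = [f"file{i}" for i in range(100)]
    placement = place_distributed(files, nodes)
    lost = len(files) - len(surviving_files(placement, "node2"))
    print(f"files lost when node2 fails: {lost}")
```

Storage efficiency is maximal (one copy per file), which is the same property that makes every node failure a data loss.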

For applications that need their data to survive a node failure, Gluster's distributed replicated mode, essentially similar to RAID 10, meets the requirement. In this mode each file is stored on a pair of mirrored nodes that are kept in sync at all times. If one node fails, its mirror takes over immediately, so the file remains available.
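The RAID 10 analogy can be sketched like this: nodes are grouped into mirror pairs, each file is hashed to one pair, and a copy is written to both members. Grouping consecutive nodes into pairs mirrors the way adjacent bricks form replica sets in Gluster, but the CRC32 placement and the helper names are illustrative assumptions.

```python
import zlib


def make_replica_sets(nodes: list, replica: int = 2) -> list:
    """Group consecutive nodes into mirror sets, e.g. [(n1, n2), (n3, n4)]."""
    if len(nodes) % replica != 0:
        raise ValueError("node count must be a multiple of the replica count")
    return [tuple(nodes[i:i + replica]) for i in range(0, len(nodes), replica)]


def place_replicated(files: list, replica_sets: list) -> dict:
    """Hash each file to one replica set and store a copy on every node in it."""
    placement = {node: set() for rs in replica_sets for node in rs}
    for name in files:
        # Stand-in placement rule, not Gluster's own hashing.
        rs = replica_sets[zlib.crc32(name.encode("utf-8")) % len(replica_sets)]
        for node in rs:
            placement[node].add(name)
    return placement


def readable_after_failure(placement: dict, failed_node: str) -> set:
    """Files still readable from at least one surviving node."""
    return set().union(*(names for node, names in placement.items() if node != failed_node))


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 7)]
    placement = place_replicated([f"file{i}" for i in range(100)], make_replica_sets(nodes))
    print(len(readable_after_failure(placement, "node3")))  # 100: the mirror covers the loss
```

The price is capacity: with two-way mirroring, every file is stored twice.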

Finally, Gluster also supports a striped mode, which in operation is very close to standard block-level RAID 0. It is generally recommended for storing large files (typically more than 50GB) that need high performance from multiple nodes. It is the only mode that splits a file and distributes the pieces across multiple nodes; all the other modes operate only at the file level. Unfortunately, mirroring cannot be combined with striped mode, so to achieve very high availability you must build redundancy into the underlying hardware deployment.
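Striping can be illustrated by cutting a file into fixed-size blocks and dealing them round-robin across nodes. The block size and the helper names `stripe` and `reassemble` are assumptions for this sketch; it does not reproduce Gluster's on-disk format.

```python
def stripe(data: bytes, nodes: list, block_size: int) -> dict:
    """Split data into fixed-size blocks and deal them round-robin across nodes."""
    chunks = {node: [] for node in nodes}
    for index, offset in enumerate(range(0, len(data), block_size)):
        node = nodes[index % len(nodes)]
        # Keep the block index so the file can be put back in order.
        chunks[node].append((index, data[offset:offset + block_size]))
    return chunks


def reassemble(chunks: dict) -> bytes:
    """Rebuild the file by ordering all surviving blocks by their original index."""
    blocks = sorted(block for node_blocks in chunks.values() for block in node_blocks)
    return b"".join(piece for _, piece in blocks)


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10240 bytes -> ten 1024-byte blocks
    chunks = stripe(data, ["n1", "n2", "n3", "n4"], block_size=1024)
    print({node: len(blocks) for node, blocks in chunks.items()})
    # {'n1': 3, 'n2': 3, 'n3': 2, 'n4': 2}
```

Reads and writes of one large file can proceed on all nodes in parallel, but losing any single node leaves holes in every file striped across it, which is why the text points to hardware-level redundancy.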

Although a single Gluster cluster cannot use more than one storage mode, you can run several logical clusters on the same set of hardware. As a result, you can run a distributed replicated cluster and a striped cluster on a single set of physical servers.
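One way to picture several logical clusters sharing hardware is a small model in which each logical cluster owns its own brick directories on the same physical servers. `Volume`, `shared_servers`, and `check_no_shared_bricks` are hypothetical names, not Gluster APIs, and the rule that a brick directory belongs to only one logical cluster is an assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical model of one logical Gluster cluster built from bricks."""
    name: str
    mode: str      # storage mode, e.g. "distributed-replicated" or "striped"
    bricks: tuple  # (server, directory) pairs holding this cluster's data

    def servers(self) -> set:
        return {server for server, _ in self.bricks}


def shared_servers(volumes: list) -> set:
    """Physical servers that host bricks for more than one logical cluster."""
    counts = Counter(server for vol in volumes for server in vol.servers())
    return {server for server, n in counts.items() if n > 1}


def check_no_shared_bricks(volumes: list) -> None:
    """Assumed rule: each brick directory belongs to only one logical cluster."""
    owner = {}
    for vol in volumes:
        for brick in vol.bricks:
            if brick in owner:
                raise ValueError(f"brick {brick} already used by {owner[brick]}")
            owner[brick] = vol.name


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4"]
    replicated = Volume("rep", "distributed-replicated",
                        tuple((s, "/bricks/rep") for s in servers))
    striped = Volume("stripe", "striped",
                     tuple((s, "/bricks/stripe") for s in servers))
    check_no_shared_bricks([replicated, striped])
    print(sorted(shared_servers([replicated, striped])))  # ['s1', 's2', 's3', 's4']
```

Both logical clusters span all four servers, yet each keeps its own mode and its own directories.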

Beyond replication within a Gluster cluster, Gluster also supports geo-replication across sites, between different clusters. This can protect against the failure of an entire site or make it easier to migrate applications from one site to another. Gluster geo-replication is flexible, supporting replication chains with any number of intermediate replicas (for example, from A to B, then from B to C and D).
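The cascading example (A to B, then B to C and D) can be modeled as a directed graph. The sketch below, using a hypothetical `hops_from` helper, does a breadth-first walk to show which sites eventually receive data from the origin and how many replication hops each is away.

```python
from collections import deque


def hops_from(topology: dict, origin: str) -> dict:
    """Breadth-first walk of a cascading replication graph.

    Returns each reachable site mapped to its number of hops from the origin.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        site = queue.popleft()
        for downstream in topology.get(site, []):
            if downstream not in hops:
                hops[downstream] = hops[site] + 1
                queue.append(downstream)
    return hops


if __name__ == "__main__":
    topology = {"A": ["B"], "B": ["C", "D"]}  # the chain described above
    print(hops_from(topology, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 2}
```

Sites C and D receive A's data only after it has passed through B, so each intermediate hop adds to how far behind the downstream copies can be.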

Stretching a single Gluster cluster across physical sites is also possible, but for a replicated, distributed cluster to perform acceptably it requires high WAN bandwidth and low latency. In practice, an individual Gluster cluster is likely to be limited to a single site or a metropolitan area network.
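A back-of-the-envelope calculation shows why latency matters so much for a stretched cluster. Assuming each synchronous write must wait one full network round trip for the remote replica's acknowledgement, a single client issuing writes back to back is capped at 1000 / RTT(ms) writes per second. The round-trip times below are illustrative assumptions, not measurements.

```python
def max_sequential_writes_per_second(rtt_ms: float) -> float:
    """Upper bound for one client issuing synchronous writes back to back.

    Assumes each write waits one full network round trip for the remote
    replica's acknowledgement; disk and processing time are ignored.
    """
    return 1000.0 / rtt_ms


if __name__ == "__main__":
    for label, rtt in [("same site (assumed 0.5 ms)", 0.5),
                       ("metro network (assumed 5 ms)", 5.0),
                       ("long-haul WAN (assumed 50 ms)", 50.0)]:
        print(f"{label}: {max_sequential_writes_per_second(rtt):.0f} writes/s")
    # same site (assumed 0.5 ms): 2000 writes/s
    # metro network (assumed 5 ms): 200 writes/s
    # long-haul WAN (assumed 50 ms): 20 writes/s
```

Each tenfold increase in round-trip time cuts this ceiling tenfold, which is consistent with the advice to keep a single cluster within one site or a metropolitan area network.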

(Responsible editor: The good of the Legacy)
