RAID in GlusterFS

Tags: glusterfs, gluster

I had never really understood the concept of RAID until I read up on it over the past two days. RAID is short for "Redundant Array of Independent Disks". A disk array applies different techniques for different applications; these are known as RAID levels, and each level represents a different technique. The industry-recognized standards are RAID 0 through RAID 5; Baidu Encyclopedia has a detailed explanation.

Here I will use glusterfs to illustrate raid0 and raid1.

What is the simplest way to understand raid0 and raid1?
Raid0 is the fastest: data is striped across every hard disk in the array, so once one disk fails, all data is lost. Advantages: high speed, low cost. Disadvantages: data is easily lost and cannot be recovered once a disk is damaged.
Raid1 is an array of two hard disks, one used normally and the other dedicated to holding a backup copy of the first. Even if one disk fails, no data is lost, but writes are slower, and the two disks together provide only a single disk's worth of usable capacity.

The core of glusterfs cluster storage is its cluster translators, of which there are three types: AFR (Automatic File Replication), DHT (Distributed Hash Table), and stripe.

AFR is equivalent to raid1: the same file is kept on multiple storage nodes for high availability and automatic data repair. All the subvolumes of AFR share the same namespace. When looking up a file, AFR queries the nodes starting from the first until the lookup succeeds or the last node has been searched. When reading, AFR spreads requests across all storage nodes for load balancing, which improves system performance. When writing, the file must first be locked on all lock servers; by default the first node is the lock server, and multiple lock servers can be specified. AFR then writes the data to all servers as change-log events, and once the operation succeeds everywhere, the log entries are cleared and the file is unlocked. AFR automatically detects and repairs inconsistent copies of the same file, using the change log to decide which copy is correct. Self-heal is triggered the first time a file or directory is accessed: for a directory, the correct data is replicated to all subvolumes; a file missing on some subvolume is created there, and a file whose metadata does not match is updated according to the change log.
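To make this concrete, here is a minimal sketch of the replicate translator in volfile form, assuming client1 and client2 are protocol/client volumes defined elsewhere (the full client configuration later in this post shows such definitions):

volume afr0
  type cluster/replicate      # AFR: keep a full copy of every file on each subvolume
  subvolumes client1 client2
end-volume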

DHT implements the elastic hash algorithm mentioned above: files are distributed by hashing, and the namespace is spread across all nodes. A file lookup uses the elastic hash directly and does not depend on the namespace, but traversing a directory is complicated and inefficient, since every storage node must be queried. Each file is placed on exactly one storage node, so once the file is located, reads and writes are straightforward. DHT has no fault tolerance of its own, so AFR is required for high availability.
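A correspondingly minimal sketch of the distribute translator, under the same assumption that client1 and client2 are defined elsewhere:

volume dht0
  type cluster/distribute     # DHT: hash each file name to exactly one subvolume
  subvolumes client1 client2
end-volume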

Stripe is equivalent to raid0, that is, striped storage. Files are split into fixed-length fragments that are stored across all storage nodes in round-robin fashion. All of stripe's storage nodes together form one complete namespace, so looking up a file requires asking every node, which is inefficient. Reads and writes, however, involve all of the nodes holding a file's fragments and can proceed concurrently across them, giving high performance. Stripe is usually combined with AFR to form raid10/raid01, gaining both high performance and high availability, though of course storage utilization is then at most 50%.
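And a minimal sketch of the stripe translator; the 128KB block size is purely illustrative, and some glusterfs versions expect a pattern form such as *:128KB instead:

volume stripe0
  type cluster/stripe         # raid0-style: fixed-length fragments in round-robin
  option block-size 128KB     # fragment size; illustrative value
  subvolumes client1 client2
end-volume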

So much for the concepts. The main question remains: how does glusterfs implement raid0 + raid1?

I initially thought this could be done with a few gluster volume commands, but I ran into a problem that I could not solve for quite a while. The details are as follows:

Based on the previous glusterfs installation and cluster setup, let's take a closer look at the gluster volume command.

Create a distributed volume

sudo gluster peer probe 192.168.30.8
sudo gluster peer probe 192.168.30.9

sudo gluster volume create test-volume1 transport tcp 192.168.30.8:/home/dir1 192.168.30.9:/home/dir1
View information: sudo gluster volume info
Start the volume: sudo gluster volume start test-volume1
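Once the volume is started, a machine with the glusterfs fuse client installed can mount it. A quick sketch; the mount point /mnt/gluster is just an example:

sudo mkdir -p /mnt/gluster
sudo mount -t glusterfs 192.168.30.8:/test-volume1 /mnt/gluster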

Create a replicated (mirrored) volume
gluster volume create test-volume2 replica 2 transport tcp 192.168.30.8:/home/dir2 192.168.30.9:/home/dir2

Create a striped volume
gluster volume create test-volume3 stripe 2 transport tcp 192.168.30.8:/home/dir3 192.168.30.9:/home/dir3

Expand a volume
gluster peer probe 192.168.30.10
gluster volume add-brick test-volume1 192.168.30.10:/home/dir1

Shrink a volume
gluster volume remove-brick test-volume1 192.168.30.10:/home/dir1
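Note that adding or removing bricks does not move existing data by itself; depending on the glusterfs version, a rebalance is needed to redistribute it. A sketch, assuming test-volume1 is started:

gluster volume rebalance test-volume1 start
gluster volume rebalance test-volume1 status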

These commands all work on their own, but how can stripe and replica be combined? Suppose I want to mirror the volume built from 192.168.30.8:/home/dir1 and 192.168.30.9:/home/dir1. If I then run gluster volume create test-volume2 replica 2 transport tcp 192.168.30.8:/home/dir1 192.168.30.10:/home/dir1, the prompt says that 192.168.30.8:/home/dir1 is already in use, so the replicated volume cannot be created. A brick can only be a fresh, unused directory, yet what I want is precisely to mirror a directory that is already in use; I searched for a long time and could not find a command for that.

Searching the net, I found that raid0 + raid1 can be built with server and client configuration files: start the glusterfs server with glusterfsd, start the glusterfs client with glusterfs, and mount it on a target directory. Following the practices described online, this method does work, and all you need to do is write the configuration files. The client configuration file is as follows:

volume client1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.30.7    # IP address of the remote brick
  option remote-subvolume brick      # name of the remote volume
end-volume

volume client2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.30.8    # IP address of the remote brick
  option remote-subvolume brick      # name of the remote volume
end-volume

volume client3
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.30.9    # IP address of the remote brick
  option remote-subvolume brick      # name of the remote volume
end-volume

volume client4
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.30.10   # IP address of the remote brick
  option remote-subvolume brick      # name of the remote volume
end-volume

# relationship between servers
volume brick1
  type cluster/distribute    # distributed volume
  subvolumes client1 client2
end-volume

volume brick2
  type cluster/replicate     # replicated volume
  subvolumes client3 client4
end-volume

volume brick3
  type cluster/distribute    # distributed volume
  subvolumes brick1 brick2
end-volume
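Each client volume above references a remote subvolume named brick, so every server must export one. A minimal server-side sketch, assuming the brick data lives under /home/brick (the path and the locks translator are illustrative; AFR needs locking support on the server):

volume posix
  type storage/posix
  option directory /home/brick    # assumed on-disk location of the brick
end-volume

volume brick
  type features/locks             # posix locking, required by AFR
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *  # open to all clients; restrict in production
  subvolumes brick
end-volume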

Each client volume (client1 through client4) corresponds to a physical directory on a server; of course, one server can export multiple directories. If an error reports that the port is not bound, add option remote-port 24016 to the configuration file; 24016 is the default port, and you can change it as needed.
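With the volfiles in place, the daemons can be started roughly as follows; the paths /etc/glusterfs/server.vol and /etc/glusterfs/client.vol and the mount point /mnt/glusterfs are assumptions for illustration:

sudo glusterfsd -f /etc/glusterfs/server.vol
sudo glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs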

Notice that this method does not use any physical directory twice either. It first forms two groups of subvolumes (brick1 and brick2), and then combines those two groups. Could the same be done with the gluster volume command?

But once two groups of subvolumes have been formed, how can they be merged into one volume? The create volume command, whether distributed or replicated, must be followed by brick directories (host:/path), not by existing volumes, so two finished volumes cannot simply be combined.

In fact, you don't need to replicate what the configuration file does by hand: gluster volume directly provides a command for distributed replication. I had simply misunderstood it before. Take a look at the "creating distributed replicated volumes" command; it is exactly what implements raid0 + raid1. I had been stuck on "creating distributed volumes" and "creating replicated volumes" alone. Carefully compare the functions and usage of the five commands below (an example follows the list):

Creating distributed volumes
Creating replicated volumes
Creating striped volumes
Creating distributed striped volumes
Creating distributed replicated volumes
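For example, here is a sketch of creating a distributed replicated volume over four bricks, assuming all four peers have been probed and each exports a fresh directory /home/dir4 (a hypothetical path). With replica 2 and four bricks, gluster groups the bricks into two replica pairs in the order listed and distributes files across the pairs:

gluster volume create test-volume4 replica 2 transport tcp 192.168.30.7:/home/dir4 192.168.30.8:/home/dir4 192.168.30.9:/home/dir4 192.168.30.10:/home/dir4

Similarly, "creating distributed striped volumes" takes stripe 2 with four bricks, giving raid0-style striping distributed across two stripe sets.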
My takeaway: whatever the configuration file can implement, the gluster volume management commands should be able to implement as well.

References:

1. http://smilett.com/?cat=27
2. http://techbbs.zol.com.cn/1/60_1837.html
3. http://blog.csdn.net/liuben/article/details/6284551
4. http://zhoubo.sinaapp.com?cat=22
5. http://blog.sina.com.cn/s/blog_4cbf97060100s8dh.html
6. http://blog.csdn.net/njchenyi/article/details/5545395
