This article is reprinted from the blog "Soaring Water Drop": "A description of the six GlusterFS volume types".
The six GlusterFS volume types
1. Distributed volume
In a distributed volume, files are spread randomly across the bricks in the volume. Use distributed volumes where you need to scale storage and redundancy is either not important or is provided by other hardware/software layers. (Description: files are placed on bricks by a hash algorithm; each file lives whole on exactly one server, with no mirroring or striping within the storage pool.)
Note
Disk or server failure in a distributed volume can result in a serious loss of data, because directory contents are spread randomly across the bricks in the volume.
(For example, file1 and file2 are stored on server1, while file3 is stored on server2.)
Create the distributed volume:
# gluster volume create NEW-VOLNAME [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a distributed volume with four storage servers using TCP:
# gluster volume create test-volume server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
(Optional) You can display the volume information:
# gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Created
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp2
Brick3: server3:/exp3
Brick4: server4:/exp4
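Before clients can use the volume, it must be started and then mounted. A minimal sketch, assuming a client machine and a mount point /mnt/glusterfs that are not part of the original article:
# gluster volume start test-volume
# mount -t glusterfs server1:/test-volume /mnt/glusterfs
The native client fetches the volume layout from whichever server it contacts, so any server in the pool can be named in the mount command.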
2. Replicated volume
Replicated volumes create copies of files across multiple bricks in the volume. You can use replicated volumes in environments where high availability and high reliability are critical. (Description: the replica count must equal the number of bricks in the volume. This is similar to RAID 1 and gives high availability: with a two-way replica, if one disk in the storage pool fails, data access is unaffected. At least two servers are required to create a replicated volume.)
Note
The number of bricks should be equal to the replica count for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
(For example, file1 is stored on both server1 and server2, and likewise file2; the files on server2 are copies of the files on server1.)
Create the replicated volume:
# gluster volume create NEW-VOLNAME [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a replicated volume with two storage servers:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2
Creation of test-volume has been successful
Please start the volume to access data.
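If losing a single brick is not acceptable, the replica count can be raised. A sketch of a three-way replica, with server3:/exp3 as an illustrative third brick not taken from the original text:
# gluster volume create test-volume replica 3 transport tcp server1:/exp1 server2:/exp2 server3:/exp3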
3. Striped volume
Striped volumes stripe data across the bricks in the volume. For best results, you should use striped volumes only in high-concurrency environments accessing very large files. (Description: similar to RAID 0; the stripe count must equal the number of bricks in the volume. Files are divided into data blocks and stored across the bricks in round-robin fashion, so concurrency works at data-block granularity and large-file performance is good.)
Note
The number of bricks should be equal to the stripe count for a striped volume.
(For example, a file divided into six segments is stored round-robin: segments 1, 3, and 5 on server1 and segments 2, 4, and 6 on server2.)
Create the striped volume:
# gluster volume create NEW-VOLNAME [stripe COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a striped volume across two storage servers:
# gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2
Creation of test-volume has been successful
Please start the volume to access data.
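The size of the blocks handed out round-robin can be tuned. As a hedged example, GlusterFS releases that ship the stripe translator expose a cluster.stripe-block-size volume option (128KB by default in those releases; check your version's documentation before relying on it):
# gluster volume set test-volume cluster.stripe-block-size 256KB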
4. Distributed striped volume
Distributed striped volumes stripe files across two or more nodes in the cluster. For best results, you should use distributed striped volumes where the requirement is to scale storage and highly concurrent access to very large files is critical. (Description: the number of bricks in the volume must be a multiple of the stripe count (at least two times it), combining the distribute and stripe functions. Each file is striped across one group of servers; this type is typically used for large-file access, and at least four servers are required to create a distributed striped volume.)
Note
The number of bricks should be a multiple of the stripe count for a distributed striped volume.
Create the distributed striped volume:
# gluster volume create NEW-VOLNAME [stripe COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a distributed striped volume across eight storage servers:
# gluster volume create test-volume stripe 4 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
Creation of test-volume has been successful
Please start the volume to access data.
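Because the brick count must stay a multiple of the stripe count, such a volume grows one whole stripe set at a time. A sketch using gluster volume add-brick, with server9 through server12 as hypothetical new servers:
# gluster volume add-brick test-volume server9:/exp9 server10:/exp10 server11:/exp11 server12:/exp12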
5. Distributed replicated volume
Distributed replicated volumes distribute files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high reliability is critical. Distributed replicated volumes also offer improved read performance in most environments. (Description: the number of bricks in the volume must be a multiple of the replica count (at least two times it), combining the distribute and replicate functions.)
Note
The number of bricks should be a multiple of the replica count for a distributed replicated volume. Also, the order in which bricks are specified has a great effect on data protection: each replica-count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a volume-wide distribute set. To make sure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, and so on.
Create the distributed replicated volume:
# gluster volume create NEW-VOLNAME [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a four-node distributed replicated volume with a two-way mirror:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
For example, to create a six-node distributed replicated volume with a two-way mirror:
# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6
Creation of test-volume has been successful
Please start the volume to access data.
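When a distributed replicated volume is expanded later, bricks are added one replica set at a time and existing data is then redistributed. A sketch, with server7 and server8 as hypothetical additions to the six-node volume above:
# gluster volume add-brick test-volume server7:/exp7 server8:/exp8
# gluster volume rebalance test-volume start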
6. Striped replicated volume
Striped replicated volumes stripe data across replicated bricks in the cluster. For best results, you should use striped replicated volumes in highly concurrent environments where parallel access to very large files and performance are critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.
Note
The number of bricks should be a multiple of the replica count and the stripe count for a striped replicated volume.
Create a striped replicated volume:
# gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a striped replicated volume across four storage servers:
# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.
To create a striped replicated volume across six storage servers:
# gluster volume create test-volume stripe 3 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6
Creation of test-volume has been successful
Please start the volume to access data.
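A quick sanity check after creation: stripe 3 times replica 2 accounts for all 6 bricks, and the brick count reported by the info command should reflect that (exact output format varies by release):
# gluster volume info test-volume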
7. Distributed striped replicated volume (a hybrid of all three types)
Distributed striped replicated volumes distribute striped data across replicated bricks in the cluster. For best results, you should use distributed striped replicated volumes in highly concurrent environments where parallel access to very large files and performance are critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.
Note
The number of bricks should be a multiple of the stripe count and the replica count for a distributed striped replicated volume.
Create a distributed striped replicated volume using the following command:
# gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport [tcp | rdma | tcp,rdma]] NEW-BRICK...
For example, to create a distributed striped replicated volume across eight storage servers:
# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 server7:/exp7 server8:/exp8
Creation of test-volume has been successful
Please start the volume to access data.
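On the client side, the native mount can name a fallback server in case the first one is down. A hedged sketch using the backupvolfile-server mount option (the option's name and spelling vary across GlusterFS releases, so check the mount.glusterfs man page for your version):
# mount -t glusterfs -o backupvolfile-server=server2 server1:/test-volume /mnt/glusterfs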
RAID technology includes the specifications RAID 0 through RAID 7, among others, each with a different focus. The common levels are as follows:
RAID 0: RAID 0 splits data continuously at the bit or byte level and reads and writes across multiple disks in parallel, so it achieves a high data transfer rate. It has no data redundancy, however, and so cannot be regarded as a true RAID structure. RAID 0 only improves performance and does not guarantee data reliability: the failure of a single disk affects all of the data, so RAID 0 cannot be used in scenarios with high data security requirements.
RAID 1: RAID 1 achieves data redundancy through disk mirroring, producing mutually backed-up data on a pair of independent disks. When the primary disk is busy, data can be read directly from the mirror, so RAID 1 can improve read performance. RAID 1 has the highest disk cost of any array level, but it provides high data security and availability: when one disk fails, the system automatically switches to the mirror disk for reads and writes, with no need to reconstruct the failed data.
RAID 0+1: Also known as RAID 10, this is effectively a combination of the RAID 0 and RAID 1 standards: data is split continuously at the bit or byte level and read/written across multiple disks in parallel, while each disk is mirrored for redundancy. Its advantage is that it has both the exceptional speed of RAID 0 and the high data reliability of RAID 1, but CPU usage is higher and disk utilization is relatively low.
RAID 2: Data is distributed in blocks across different hard disks, in units of bytes or bits, and Hamming-code error-correcting encoding is used to provide error detection and recovery. This encoding requires multiple disks to store the check and recovery information, which makes RAID 2 more complicated to implement, so it is rarely used in commercial environments.
RAID 3: Like RAID 2, RAID 3 stripes data in blocks across different hard disks. The difference is that RAID 3 uses simple parity and stores the parity information on a single dedicated disk. If a data disk fails, the data can be regenerated from the parity disk and the remaining data disks; if the parity disk fails, data access is not affected. RAID 3 provides a good transfer rate for large amounts of sequential data, but for random data the parity disk becomes the bottleneck for write operations.
RAID 4: RAID 4 also stripes data across different disks, but in units of blocks or records. Like RAID 3, it uses one disk as a dedicated parity disk, and every write operation must access that parity disk, making it the write bottleneck. For this reason RAID 4 is rarely used in commercial environments.
RAID 5: RAID 5 does not designate a separate parity disk; instead, data and parity information are spread across all disks. On RAID 5, reads and writes can proceed on multiple array devices at the same time, giving higher data throughput. RAID 5 is better suited to workloads with small data blocks and random reads and writes.
The main difference between RAID 3 and RAID 5 is that every data transfer in RAID 3 involves all of the array's disks, whereas in RAID 5 most transfers operate on only one disk and can proceed in parallel. RAID 5 also has a "write penalty": each write operation generates four actual I/O operations, two to read the old data and old parity, and two to write the new data and new parity.
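A rough worked example of the write penalty: if each disk sustains 200 random IOPS (an illustrative figure, not from the article), a 4-disk RAID 5 array offers about 4 x 200 = 800 raw IOPS, but since each small write costs four I/Os the array sustains only about 800 / 4 = 200 random-write IOPS.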
RAID 6: Compared with RAID 5, RAID 6 adds a second, independent block of parity information. The two independent parity systems use different algorithms, so data reliability is very high: even if two disks fail at the same time, data access is not affected. However, RAID 6 must devote more disk space to parity information and has an even greater write penalty than RAID 5, so its write performance is poor. The poor performance and complex implementation mean that RAID 6 is rarely used in practice.
RAID 7: This is a newer RAID standard that comes with an intelligent real-time operating system and software tools for storage management. It can run completely independently of the host and does not consume host CPU resources, so RAID 7 can be viewed as a storage computer, which distinguishes it significantly from the other RAID standards. Beyond the standards above, the various RAID specifications can be combined, as with RAID 0+1, to build the required array; for example, RAID 5+3 (RAID 53) is a widely used combination. In general, disk arrays can be configured flexibly to obtain a storage system that better fits the application.
When reprinting, please cite: http://blog.163.com/?email protected]/blog/static/62713185201351510303223/