A preliminary discussion of GlusterFS volume types (test notes)

Source: Internet
Author: User
Tags: glusterfs, gluster


1. Preparation

1) Three physical machines; add the IP mappings to the hosts file on each:

192.168.20.72 node72
192.168.20.73 node73
192.168.20.86 node86

2) Partitioning (on all three nodes; prompts below are shown for node72):

[root@node72 ~]# yum install lvm2 xfsprogs -y
[root@node72 ~]# pvcreate /dev/sdb
[root@node72 ~]# vgcreate vg0 /dev/sdb
[root@node72 ~]# lvcreate -L 16t -n lv01 vg0
[root@node72 ~]# mkfs.xfs -f -i size=512 /dev/vg0/lv01
[root@node72 ~]# blkid /dev/vg0/lv01
/dev/vg0/lv01: UUID="58a47793-3202-45ab-8297-1c867b6fdd68" TYPE="xfs"
[root@node72 ~]# mkdir /data
[root@node72 ~]# cat << '_EOF' >> /etc/fstab
UUID=58a47793-3202-45ab-8297-1c867b6fdd68 /data                    xfs     defaults        0 0
_EOF
[root@node72 ~]# mount -a
[root@node72 ~]# df -h | grep data
/dev/mapper/vg0-lv01   16T   33M   16T   1% /data

3) Adjusting the firewall:

[root@node72 ~]# vim /etc/sysconfig/iptables
# rpc.statd
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
# glusterd
-A INPUT -p tcp -m tcp --dport 24007 -j ACCEPT
# portmapper
-A INPUT -p tcp -m tcp --dport 38465 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38466 -j ACCEPT
# NFS
-A INPUT -p tcp -m tcp --dport 38467 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2049  -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38469 -j ACCEPT
# nrpe
-A INPUT -p tcp --dport 5666 -j ACCEPT
# status
-A INPUT -p tcp -m tcp --dport 39543 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 55863 -j ACCEPT
# nlockmgr
-A INPUT -p tcp -m tcp --dport 38468 -j ACCEPT
-A INPUT -p udp -m udp --dport 963   -j ACCEPT
-A INPUT -p tcp -m tcp --dport 965   -j ACCEPT
# ctdbd
-A INPUT -p tcp -m tcp --dport 4379  -j ACCEPT
# smbd
-A INPUT -p tcp -m tcp --dport 139   -j ACCEPT
-A INPUT -p tcp -m tcp --dport 445   -j ACCEPT
# ports for gluster volume bricks (default 100 ports)
-A INPUT -p tcp -m tcp --dport 24009:24108 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 49152:49251 -j ACCEPT

4) Service installation:

[root@node72 ~]# yum install glusterfs-server -y
[root@node72 ~]# service glusterd start
[root@node72 ~]# chkconfig glusterd on

5) Peer probe (run once, from node72):

[root@node72 ~]# gluster peer probe node73
peer probe: success
[root@node72 ~]# gluster peer probe node86
peer probe: success
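Before creating a volume, the trusted pool can be sanity-checked from any node with gluster peer status; both probed peers should appear as connected (output abridged here; the real output also prints a Uuid line per peer):

[root@node72 ~]# gluster peer status
Number of Peers: 2

Hostname: node73
State: Peer in Cluster (Connected)

Hostname: node86
State: Peer in Cluster (Connected)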
2. Configuration: 2 distributed, 2 striped, 3 replicated

Required number of bricks: 2 (distribute) x 2 (stripe) x 3 (replica) = 12

Create the brick directories on all 3 nodes:

# mkdir -p /data/gv0/brick{1,2,3,4}
# cd /data/gv0

Then, on one of the nodes:

[root@node72 gv0]# gluster volume create gv0 stripe 2 replica 3 transport tcp \
    node72:/data/gv0/brick1 node73:/data/gv0/brick1 node86:/data/gv0/brick1 \
    node72:/data/gv0/brick2 node73:/data/gv0/brick2 node86:/data/gv0/brick2 \
    node72:/data/gv0/brick3 node73:/data/gv0/brick3 node86:/data/gv0/brick3 \
    node72:/data/gv0/brick4 node73:/data/gv0/brick4 node86:/data/gv0/brick4
[root@node72 gv0]# gluster volume start gv0
[root@node72 gv0]# gluster volume info

Volume Name: gv0
Type: Distributed-Striped-Replicate
Volume ID: 83f759f9-2dad-433c-ae5d-d608c9d88a46
Status: Started
Number of Bricks: 2 x 2 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: node72:/data/gv0/brick1
Brick2: node73:/data/gv0/brick1
Brick3: node86:/data/gv0/brick1
Brick4: node72:/data/gv0/brick2
Brick5: node73:/data/gv0/brick2
Brick6: node86:/data/gv0/brick2
Brick7: node72:/data/gv0/brick3
Brick8: node73:/data/gv0/brick3
Brick9: node86:/data/gv0/brick3
Brick10: node72:/data/gv0/brick4
Brick11: node73:/data/gv0/brick4
Brick12: node86:/data/gv0/brick4
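Before digging into the volume file, it is worth confirming that every brick process actually started; gluster volume status (a standard gluster CLI subcommand in this release line, exact output not reproduced here) reports per-brick state:

[root@node72 gv0]# gluster volume status gv0
(one row per brick: hostname, brick path, TCP port, Online Y/N, PID -- the ports should fall within the brick port ranges opened in the firewall earlier)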
Now look at how the current bricks are assembled into the volume graph:

[root@node72 gv0]# cat /var/lib/glusterd/vols/gv0/trusted-gv0-fuse.vol
(partial output; irrelevant options omitted)
--------------------------------------------
##### The bricks are mapped to gv0-client-0 through gv0-client-11, in the order they were given to "volume create":
volume gv0-client-0
    option remote-subvolume /data/gv0/brick1
    option remote-host node72
end-volume
volume gv0-client-1
    option remote-subvolume /data/gv0/brick1
    option remote-host node73
end-volume
volume gv0-client-2
    option remote-subvolume /data/gv0/brick1
    option remote-host node86
end-volume
volume gv0-client-3
    option remote-subvolume /data/gv0/brick2
    option remote-host node72
end-volume
volume gv0-client-4
    option remote-subvolume /data/gv0/brick2
    option remote-host node73
end-volume
volume gv0-client-5
    option remote-subvolume /data/gv0/brick2
    option remote-host node86
end-volume
volume gv0-client-6
    option remote-subvolume /data/gv0/brick3
    option remote-host node72
end-volume
volume gv0-client-7
    option remote-subvolume /data/gv0/brick3
    option remote-host node73
end-volume
volume gv0-client-8
    option remote-subvolume /data/gv0/brick3
    option remote-host node86
end-volume
volume gv0-client-9
    option remote-subvolume /data/gv0/brick4
    option remote-host node72
end-volume
volume gv0-client-10
    option remote-subvolume /data/gv0/brick4
    option remote-host node73
end-volume
volume gv0-client-11
    option remote-subvolume /data/gv0/brick4
    option remote-host node86
end-volume
--------------------------------------------
##### We asked for 3 replicas, so every 3 adjacent gv0-client subvolumes form one replicate (mirror) group, 4 groups in total: gv0-replicate-0 through gv0-replicate-3:
volume gv0-replicate-0
    subvolumes gv0-client-0 gv0-client-1 gv0-client-2
end-volume
volume gv0-replicate-1
    subvolumes gv0-client-3 gv0-client-4 gv0-client-5
end-volume
volume gv0-replicate-2
    subvolumes gv0-client-6 gv0-client-7 gv0-client-8
end-volume
volume gv0-replicate-3
    subvolumes gv0-client-9 gv0-client-10 gv0-client-11
end-volume
--------------------------------------------
##### We asked for 2 stripes, so every 2 adjacent gv0-replicate subvolumes form one stripe group, 2 groups in total: gv0-stripe-0 and gv0-stripe-1:
volume gv0-stripe-0
    subvolumes gv0-replicate-0 gv0-replicate-1
end-volume
volume gv0-stripe-1
    subvolumes gv0-replicate-2 gv0-replicate-3
end-volume
--------------------------------------------
##### Distribution is the default behavior, so the 2 gv0-stripe subvolumes form one distribute group, gv0-dht:
volume gv0-dht
    type cluster/distribute
    subvolumes gv0-stripe-0 gv0-stripe-1
end-volume
--------------------------------------------

Correspondence between client subvolumes and bricks (gv0-client-0, gv0-client-1, gv0-client-2, ... in brick order):

node72:/data/gv0/brick1 node73:/data/gv0/brick1 node86:/data/gv0/brick1
node72:/data/gv0/brick2 node73:/data/gv0/brick2 node86:/data/gv0/brick2
node72:/data/gv0/brick3 node73:/data/gv0/brick3 node86:/data/gv0/brick3
node72:/data/gv0/brick4 node73:/data/gv0/brick4 node86:/data/gv0/brick4

### Guess: if we write files a1, a2, a3, a4, a5, ..., we expect the following behavior:

# a1 -> distributed to gv0-dht/gv0-stripe-0
------------ cut into 2 data blocks (because stripe=2), written to gv0-replicate-0 and gv0-replicate-1 respectively
---------------- write to gv0-replicate-0
-------------------- copied to 3 bricks (because replica=3): gv0-client-0 gv0-client-1 gv0-client-2 ("brick1" on each node)
---------------- write to gv0-replicate-1
-------------------- copied to 3 bricks (because replica=3): gv0-client-3 gv0-client-4 gv0-client-5 ("brick2" on each node)
# a2 -> distributed to gv0-dht/gv0-stripe-1
------------ cut into 2 data blocks (because stripe=2), written to gv0-replicate-2 and gv0-replicate-3 respectively
---------------- write to gv0-replicate-2
-------------------- copied to 3 bricks (because replica=3): gv0-client-6 gv0-client-7 gv0-client-8 ("brick3" on each node)
---------------- write to gv0-replicate-3
-------------------- copied to 3 bricks (because replica=3): gv0-client-9 gv0-client-10 gv0-client-11 ("brick4" on each node)
# a3 -> distributed to gv0-dht/gv0-stripe-0
# a4 -> distributed to gv0-dht/gv0-stripe-1
# a5 -> distributed to gv0-dht/gv0-stripe-0

Predictions:
1) Files are written to the members of the stripe groups in round-robin fashion.
2) Mirroring: 3 identical copies of each data block.
3) Striping: each data block is roughly half the file size.
4) Capacity: a1 -> a1.m + a1.n; a1.m -> R0; R0 -> C0/C1/C2; a1.n -> R1.
   capacity(R0) = capacity(C0)
   capacity(gv0) = capacity(R0+R1+R2+R3) = 16T * 4 = 64T
   (Note that all 4 bricks on a node live on the same 16T filesystem, so this figure counts the same space several times; it is what df will report, not how much unique data can be stored.)

3. Client tests

Configure the hosts file to resolve the 3 nodes above:

192.168.20.72 node72
192.168.20.73 node73
192.168.20.86 node86

# yum install glusterfs-fuse -y
# mount.glusterfs 192.168.20.72:/gv0 /mnt
# df -h | grep mnt
192.168.20.72:/gv0   64T  131M   64T   1% /mnt
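The mount.glusterfs call above does not persist across reboots. Where persistence is wanted, an /etc/fstab line of the following shape is the usual approach (a sketch, not part of the original walkthrough; _netdev defers the mount until networking is up):

192.168.20.72:/gv0    /mnt    glusterfs    defaults,_netdev    0 0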
[Test 1] Use dd to write 6 files, a1 through a6, and see how they are distributed:

# dd if=/dev/zero of=/mnt/a1 bs=1024 count=32000
# dd if=/dev/zero of=/mnt/a2 bs=1024 count=24000
# dd if=/dev/zero of=/mnt/a3 bs=1024 count=20000
# dd if=/dev/zero of=/mnt/a4 bs=1024 count=16000
# dd if=/dev/zero of=/mnt/a5 bs=1024 count=12000
# dd if=/dev/zero of=/mnt/a6 bs=1024 count=160000

On the servers (all 3 nodes behave identically; only one node's output is listed):
============
[root@node72 gv0]# find . -type f -name 'a*' -exec ls -l {} \; | sort -n -k5
-rw-r--r-- 2 root root  12189696 Sep 24 11:24 ./brick3/a5
-rw-r--r-- 2 root root  12288000 Sep 24 11:24 ./brick4/a5
-rw-r--r-- 2 root root  16252928 Sep 24 11:23 ./brick4/a4
-rw-r--r-- 2 root root  16384000 Sep 24 11:23 ./brick3/a4
-rw-r--r-- 2 root root  20447232 Sep 24 11:23 ./brick4/a3
-rw-r--r-- 2 root root  20480000 Sep 24 11:23 ./brick3/a3
-rw-r--r-- 2 root root  24510464 Sep 24 11:22 ./brick1/a2
-rw-r--r-- 2 root root  24576000 Sep 24 11:22 ./brick2/a2
-rw-r--r-- 2 root root  32636928 Sep 24 11:14 ./brick3/a1
-rw-r--r-- 2 root root  32768000 Sep 24 11:14 ./brick4/a1
-rw-r--r-- 2 root root 163708928 Sep 24 11:36 ./brick1/a6
-rw-r--r-- 2 root root 163840000 Sep 24 11:36 ./brick2/a6

Summary:

1) Round-robin writes across the stripe groups: NOT confirmed.
a1 -> brick3+brick4 -> gv0-stripe-1
a2 -> brick1+brick2 -> gv0-stripe-0
a3 -> brick3+brick4 -> gv0-stripe-1
a4 -> brick3+brick4 -> gv0-stripe-1
a5 -> brick3+brick4 -> gv0-stripe-1
a6 -> brick1+brick2 -> gv0-stripe-0
Conclusion: a file lands in one stripe group "randomly" from the user's point of view (DHT in fact picks the group by hashing the file name).

2) Mirroring: as expected. ./brick3/a1 is the same size on every node, and so is ./brick4/a1, which means the blocks are mirrored.

3) Striping: NOT as expected. The file sizes written by dd look quite strange. Take a1 (file size about 32M):
./brick3/a1: 32636928
./brick4/a1: 32768000
Both data blocks are nearly 32M, and one of them is exactly the full file size.

[Test 2] Next, write several small files:

# for i in $(seq 1 1000); do echo $i >> /mnt/b1; done
# for i in $(seq 1 10000); do echo $i >> /mnt/b2; done
# for i in $(seq 1 100000); do echo $i >> /mnt/b3; done
# cat /mnt/b3 >> /mnt/b4 && cat /mnt/b3 >> /mnt/b4
# cat /mnt/b4 >> /mnt/b5 && cat /mnt/b4 >> /mnt/b5
# ll /mnt/ -h
total 5.5M
-rw-r--r-- 1 root root 3.9K Sep 24 12:19 b1
-rw-r--r-- 1 root root  48K Sep 24 12:20 b2
-rw-r--r-- 1 root root 576K Sep 24 12:05 b3
-rw-r--r-- 1 root root 1.2M Sep 24 12:28 b4
-rw-r--r-- 1 root root 2.3M Sep 24 12:32 b5

On the servers (all 3 nodes behave identically; only one node's output is listed):
============
# find . -type f -name 'b*' -exec ls -lh {} \; | sort -k3 -t'/'
-rw-r--r-- 2 root root    0 Sep 24 12:19 ./brick2/b1
-rw-r--r-- 2 root root 3.9K Sep 24 12:19 ./brick1/b1
-rw-r--r-- 2 root root    0 Sep 24 12:19 ./brick2/b2
-rw-r--r-- 2 root root  48K Sep 24 12:20 ./brick1/b2
-rw-r--r-- 2 root root 512K Sep 24 12:04 ./brick4/b3
-rw-r--r-- 2 root root 576K Sep 24 12:05 ./brick3/b3
-rw-r--r-- 2 root root 1.0M Sep 24 12:28 ./brick4/b4
-rw-r--r-- 2 root root 1.2M Sep 24 12:28 ./brick3/b4
-rw-r--r-- 2 root root 2.2M Sep 24 12:32 ./brick3/b5
-rw-r--r-- 2 root root 2.3M Sep 24 12:32 ./brick4/b5

[Test 3] Test one large file:

# find . -type f -name 'c*' -exec ls -lh {} \; | sort -k3 -t'/'
-rw-r--r-- 2 vdsm kvm 4.2G Jul 24  2014 ./brick1/CentOS-6.5-x86_64-bin-DVD1.iso
-rw-r--r-- 2 vdsm kvm 4.2G Jul 24  2014 ./brick2/CentOS-6.5-x86_64-bin-DVD1.iso

So far we were puzzled: why is each of the 2 stripe pieces almost equal in size to the source file?
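Rather than running find on every server, placement can also be checked from the client side: GlusterFS FUSE mounts expose a virtual extended attribute listing the brick paths backing a file (getfattr comes from the attr package; the exact value format varies by version, so it is only described here):

# getfattr -n trusted.glusterfs.pathinfo /mnt/a1
(the returned value nests one entry per translator layer -- dht, stripe, replicate -- and bottoms out in the brick paths holding the file's blocks)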
Perhaps a large file makes this easier to observe, and indeed we quickly notice that if the sizes listed above were the real on-disk usage, they would contradict df:

# df -h /data
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv01   16T  6.3G   16T   1% /data

Why is only 6.3G actually in use? A quick look inside the data directory:

# du -h /data/gv0/ | grep G
3.2G    /data/gv0/brick1/.glusterfs/1f/86
3.2G    /data/gv0/brick1/.glusterfs/1f
3.2G    /data/gv0/brick1/.glusterfs
3.2G    /data/gv0/brick1
3.2G    /data/gv0/brick2/.glusterfs/1f/86
3.2G    /data/gv0/brick2/.glusterfs/1f
3.2G    /data/gv0/brick2/.glusterfs
3.2G    /data/gv0/brick2
6.3G    /data/gv0/

So it turns out Gluster had indeed cut the file into 2 data blocks: the stripe translator writes each brick's share of the file at its original offsets, leaving holes in between, so ls -l shows the full apparent size on every brick while du shows the space actually used.

We can now summarize again:
1) When there are multiple distribute groups, a file is placed into one group (effectively at random from the user's point of view).
2) The numbers of mirrors and stripes are exactly as specified.
3) Reading the brick composition numbers: "2 x 2 x 3 = 12" means 2 (distribute) x 2 (stripe) x 3 (replica).

The configuration

gluster volume create gv0 replica 3 transport tcp node72:/data/gv0/brick1 node73:/data/gv0/brick1 node86:/data/gv0/brick1

means 1 (distribute) x 3 (replica), while

gluster volume create gv0 replica 3 transport tcp node72:/data/gv0/brick1 node73:/data/gv0/brick1 node86:/data/gv0/brick1 node72:/data/gv0/brick2 node73:/data/gv0/brick2 node86:/data/gv0/brick2

means 2 (distribute) x 3 (replica). Other combinations follow the same pattern.

4. Cleaning up the data

[root@node72 gv0]# gluster volume stop gv0
[root@node72 gv0]# gluster volume delete gv0

On each host:

# find /data/gv0 -delete

5. Errors encountered

[Q1] Starting glusterd fails with:
/usr/lib64/glusterfs/3.4.4/rpc-transport/rdma.so: cannot open shared object file: no such file or directory
A: A different version of GlusterFS had been used on this machine before, so the /var/lib/glusterd directory needs to be moved out of the way:

# mv /var/lib/glusterd /tmp/glusterd_old
# service glusterd start
Starting glusterd:                                         [  OK  ]

[Q2] Creating a volume fails with:
volume create: gv0: failed: /data/gv0/brick1 or a prefix of it is already part of a volume
A: An earlier failed "volume create" left attributes behind on the brick directories, which need cleaning up:

# cd /data/gv0/brick1
# attr -lq .
glusterfs.volume-id
# setfattr -x trusted.glusterfs.volume-id .

Or in bulk:

# cd /data/gv0
# for i in $(ls .); do setfattr -x trusted.glusterfs.volume-id $i; done

(A fuller cleanup sketch is given after Q3 below.)

[Q3] Mounting the volume fails with a name-resolution error.
A: The Gluster nodes are addressed by hostname, so every client must be able to resolve the hostnames of all nodes in the cluster (e.g. via the /etc/hosts entries shown earlier).
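Following up the pointer under Q2: the volume-id attribute is not always the only leftover on a reused brick directory. GlusterFS also stores a trusted.gfid attribute and a .glusterfs metadata tree inside each brick, and either can make a new "volume create" fail the same way. A cleanup sketch for this article's brick layout (an illustrative example, not from the original text; it is destructive and assumes the bricks hold nothing worth keeping; run on every node):

cd /data/gv0
for d in brick{1,2,3,4}; do
    setfattr -x trusted.glusterfs.volume-id "$d" 2>/dev/null   # old volume id, if present
    setfattr -x trusted.gfid "$d" 2>/dev/null                  # old gfid, if present
    rm -rf "$d/.glusterfs"                                     # gluster's internal metadata tree
done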

