One, GlusterFS overview
1.1 Introduction
GlusterFS is an open-source distributed file system. It is the core of scale-out storage solutions and can handle thousands of clients. GlusterFS can flexibly combine physical, virtual, and cloud resources to deliver highly available storage with enterprise-grade performance.
GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA network links and uses a single global namespace to manage data, disk, and memory resources.
GlusterFS is based on a stackable user-space design and can deliver high performance for a variety of workloads.
1.2 Features
●Scalability and high performance
GlusterFS leverages these dual characteristics to provide highly scalable storage solutions ranging from a few terabytes to several petabytes. The scale-out architecture allows storage capacity and performance to be increased simply by adding resources: disk, compute, and I/O resources can all be grown independently, and high-speed interconnects such as 10GbE and InfiniBand are supported. The Gluster elastic hash removes GlusterFS's need for a metadata server, eliminating the single point of failure and performance bottleneck and enabling truly parallel data access.
●High availability
GlusterFS can automatically replicate files, as with mirroring or multiple copies, so that data remains accessible even in the event of hardware failure. The self-healing feature restores data to a correct state, and the repair runs incrementally in the background with almost no performance overhead. GlusterFS does not define its own proprietary on-disk data format; instead it stores files on mainstream standard disk file systems of the operating system (such as EXT3 and ZFS), so data can be copied and accessed with a wide range of standard tools.
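As a minimal sketch of how the self-healing state can be observed, the heal queue of a replicated volume can be checked from any server in the trusted storage pool (the volume name rep-vol is simply the example created later in section 3.3):
gluster volume heal rep-vol info                # list entries that still need healing
gluster volume heal rep-vol info split-brain    # list entries currently in split-brain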
●Global unified namespace
The global unified namespace gathers disk and memory resources into a single virtual storage pool, shielding upper-layer users and applications from the underlying physical hardware. Storage resources can be elastically expanded or shrunk in the virtual storage pool as needed. When storing virtual machine images, there is no limit on the number of image files, and thousands of virtual machines can share data through a single mount point. Virtual machine I/O is automatically load-balanced across all servers in the namespace, eliminating the access hotspots and performance bottlenecks that often occur in SAN environments.
●Flexible volume management
Data is stored in logical volumes, which are carved independently from virtualized physical storage pools. Storage servers can be added and removed online without interrupting applications. Logical volumes can be grown or shrunk across the configured servers, capacity can be rebalanced between servers, and systems can be added or removed, all while online. File system configuration changes can also be made and applied online in real time, adapting to changing workload conditions or enabling online performance tuning.
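A hedged sketch of online expansion, assuming an existing distributed volume named dis-vol (as created in section 3.1) and a new brick node3:/data/sdb that is not part of the original example:
gluster volume add-brick dis-vol node3:/data/sdb   # add a brick while the volume stays online
gluster volume rebalance dis-vol start             # redistribute existing files onto the new brick
gluster volume rebalance dis-vol status            # watch rebalance progress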
●Based on standard protocol
Gluster storage services support NFS, CIFS, HTTP, FTP, and the Gluster native protocol, and are fully POSIX-compliant. Existing applications can access data in Gluster without any modification or the use of dedicated APIs. This is very useful when deploying Gluster in a public cloud environment: Gluster abstracts away the cloud provider's dedicated API and presents a standard POSIX interface.
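For illustration, the same volume can be reached through the Gluster native (FUSE) protocol or through a standard NFS client without changing the application. The mount points below are assumptions, and NFS access relies on the volume's built-in NFSv3 service being enabled (newer releases may provide NFS through NFS-Ganesha instead):
mount -t glusterfs node1:/dis-vol /mnt/gluster              # native protocol (FUSE client)
gluster volume set dis-vol nfs.disable off                  # enable the built-in NFSv3 server on the volume
mount -t nfs -o vers=3,proto=tcp node1:/dis-vol /mnt/nfs    # standard NFSv3 client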
1.3 GlusterFS related terms
●Brick: the basic storage unit in GFS, an export directory on a server in the trusted storage pool. It is identified by host name and directory name, e.g. 'SERVER:EXPORT'
●Volume: a logical collection of bricks
●FUSE: Filesystem in Userspace, a loadable kernel module that allows unprivileged users to create their own file systems without modifying kernel code. The file system code runs in user space, and the FUSE module bridges it to the kernel
●VFS: Virtual File System
●Glusterd: the Gluster management daemon, which runs on every server in the trusted storage pool
●Node: a server that provides one or more bricks
●Client: a device that has a GFS volume mounted
●RDMA: remote direct memory access, which transfers data directly between the memory of two machines without involving either operating system
●RRDNS: round-robin DNS, a load-balancing method that returns the addresses of different devices in rotation for DNS queries
●Self-heal: a background process that detects inconsistencies between files and directories in a replicated volume and resolves them
●Split-brain: the state in which the copies of a file in a replicated volume have diverged and GlusterFS cannot determine which copy is correct
●Volfile: the configuration file of a glusterfs process, usually located under /var/lib/glusterd/vols/volname
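To tie several of these terms together, the following sketch shows how a trusted storage pool is formed by glusterd (node names match the examples used later; service management details may differ by distribution):
systemctl start glusterd        # the management daemon must run on every server
gluster peer probe node2        # run on node1: add node2 to the trusted storage pool
gluster peer status             # list the peers in the pool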
1.4 Modular stacked architecture
●Modular, stacked architecture
●Realize complex functions through the combination of modules
GlusterFS adopts a modular, stacked architecture and can support highly customized application environments through flexible configuration, such as large-file storage, massive small-file storage, cloud storage, and multi-protocol access. Each function is implemented as a module, and complex functionality is realized by combining modules like building blocks. For example, the Replicate module implements RAID 1 and the Stripe module implements RAID 0; combining the two yields RAID 10 or RAID 01, providing both high performance and high reliability, as shown below.
[Figure: GlusterFS modular stacked architecture]
Two, GlusterFS working principle
2.1 Elastic hash algorithm
●A hash algorithm computes a 32-bit integer from the file name
●The 32-bit hash space is divided into N contiguous subranges, each of which corresponds to one Brick
●Advantages of the elastic hash algorithm
Data is distributed evenly across all Bricks
No dependence on a metadata server, which eliminates the single point of failure and the access bottleneck
[Figure: elastic hash algorithm]
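The hash subrange assigned to each Brick is recorded in an extended attribute on the brick directory, so it can be inspected directly on a storage server. A minimal sketch, assuming the brick layout used in the examples and that getfattr (from the attr package) is installed:
# run on the server that exports the brick; shows the 32-bit hash range owned by this brick
getfattr -n trusted.glusterfs.dht -e hex /data/sdb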
2.2 GlusterFS workflow
1. A client or application accesses data through a GlusterFS mount point. To the user, the cluster is completely transparent; it is impossible to tell whether the operation targets the local system or a remote cluster.
2. The user's operation is handed to the VFS layer of the local Linux system for processing.
3. VFS passes the request to the FUSE kernel module, which forwards it to the GlusterFS client process through the /dev/fuse device file. The FUSE file system can therefore be thought of as a proxy.
4. After receiving the request, the GlusterFS client processes the data according to its configuration file.
5. The data is then sent over the network to the remote GlusterFS server, which writes it to the server's storage device.
[Figure: GlusterFS workflow]
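Assuming a volume has already been mounted with the native FUSE client, the user-space pieces of this workflow can be observed on the client machine:
mount | grep fuse.glusterfs   # the mounted volume shows up as a fuse.glusterfs file system
ps -ef | grep glusterfs       # the user-space GlusterFS client process that reads requests from /dev/fuse
ls -l /dev/fuse               # the device file the FUSE kernel module uses to hand requests to the client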
Three, GlusterFS volume type
3.1 Distributed Volume
●Files are not split into blocks
●Hash values are stored in extended file attributes
●The supported underlying file systems are ext4, zfs, xfs, etc.
●Features
Files are distributed across different servers, with no redundancy
The size of the volume can be expanded flexibly
A single point of failure can cause data loss
Relies on the underlying system for data protection
●Create distributed volumes
Create a distributed volume named dis-vol; files will be distributed across node1:/data/sdb and node2:/data/sdb according to their hash values
gluster volume create dis-vol node1:/data/sdb node2:/data/sdb force
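The create command alone does not make the volume usable. A hedged sketch of the remaining steps (the mount point /mnt/dis-vol is an assumption; the same pattern applies to the other volume types below):
gluster volume start dis-vol                     # the volume must be started before it can be mounted
gluster volume info dis-vol                      # verify Type: Distribute and the brick list
mkdir -p /mnt/dis-vol
mount -t glusterfs node1:/dis-vol /mnt/dis-vol   # mount on a client; files then spread across the bricks by hash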
3.2 Striped volume
●Files are split into N blocks (N = number of stripe nodes) by offset and stored round-robin across the Brick Server nodes
●Performance is particularly good when storing large files
●No redundancy, similar to RAID 0
●Features
Data is divided into smaller chunks and distributed across the stripes on different block servers
Distribution reduces the load, and the smaller chunks speed up access
No data redundancy
●Create a striped volume
Create a striped volume named stripe-vol; files will be stored round-robin across node1:/data/sdc and node2:/data/sdc
gluster volume create stripe-vol stripe 2 transport tcp node1:/data/sdc node2:/data/sdc
(When transport is not specified, the default is tcp)
3.3 Replicated volume
●Keeps one or more copies of the same file
●Because copies are kept, disk utilization is lower
●If the storage space on the nodes is unequal, the capacity of the smallest node becomes the total capacity of the volume (the barrel effect)
●Features
Every server in the volume keeps a complete copy
The replica count is decided when the volume is created
At least two block servers are required, and more are possible
Provides redundancy
●Create a replicated volume
Create a replicated volume named rep-vol; each file will be stored in two copies at the same time, on the two bricks node3:/data/sdb and node4:/data/sdb
gluster volume create rep-vol replica 2 node3:/data/sdb node4:/data/sdb force
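A quick way to confirm the replication behaviour described above (the volume start and the client mount point are assumptions; the two ls commands are run on node3 and node4 respectively):
gluster volume start rep-vol
mount -t glusterfs node3:/rep-vol /mnt/rep-vol   # mount on a client
echo test > /mnt/rep-vol/demo.txt                # write one file through the mount point
ls /data/sdb                                     # on node3: demo.txt is present
ls /data/sdb                                     # on node4: the identical copy is present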
3.4 Distributed striped volume
●Combines the functions of distributed volumes and striped volumes
●Mainly used for large-file access
●At least 4 servers are required
●Create distributed strip volume
Create a distributed striped volume named dis-stripe. When configuring a distributed striped volume, the number of bricks in the volume must be a multiple of the stripe count (at least 2x)
gluster volume create dis-stripe stripe 2 node1:/data/sdd node2:/data/sdd node3:/data/sdd node4:/data/sdd force
3.5 Distributed replicated volumes
●Combines the functions of distributed volumes and replicated volumes
●Used when redundancy is required
●Create distributed replication volume
Create a distributed replicated volume named dis-rep. When configuring a distributed replicated volume, the number of bricks in the volume must be a multiple of the replica count (at least 2x)
gluster volume create dis-rep replica 2 node1:/data/sde node2:/data/sde node3:/data/sde node4:/data/sde force
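As with the other volume types, the volume is started and its layout verified before use; in the info output the brick count of this example should appear as 2 x 2 = 4, i.e. files are distributed across two replica pairs (node1/node2 and node3/node4). A minimal sketch:
gluster volume start dis-rep
gluster volume info dis-rep      # Type: Distributed-Replicate, Number of Bricks: 2 x 2 = 4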