A must-read for beginners: Linux distributed storage with MogileFS

Source: Internet
Author: User

1: The background of distributed storage


①Time background

With the advent of the Web 2.0 era, a single computer node can no longer meet users' needs for massive data storage and application capacity. Individuals and businesses both need to store information securely and durably, and backup has become the most common way to do so: individual users protect their data by keeping multiple copies. Losing every copy at once is a low-probability event, but if it does happen the loss is permanent and the data cannot be restored. At the same time, storage must sustain real-time reads and writes, support complex queries, and hold and process large amounts of unstructured data, while traditional storage often lacks disaster recovery and backup capabilities. All of this poses challenges to traditional storage methods.

②Technical background

With the rapid development of Internet technology, the "cloud" has taken off, and cloud storage systems now provide online storage services. Distributed storage builds on these services by spreading data across multiple servers, which makes storage simple to deploy, largely self-managing, stable, reliable, and easy to expand.

2: MogileFS software introduction

1. MogileFS software features

① Supports multi-node redundancy

② Replicates files automatically

③ Uses a namespace in which each file is identified by a key

For example: the key of file 123.jpg is: /000/000/00/01/md5hash.fid

④ No RAID is needed: redundancy is implemented at the application layer, with nothing shared between nodes. Services are provided through the "cluster" interface.

⑤ Works at the application layer, with no special component requirements

⑥ Shares no data. MogileFS does not rely on an expensive SAN to share disks; each machine only needs to maintain its own disks.



2. MogileFS architecture

A MogileFS distributed storage system has three major components:

① Tracker (the mogilefsd process): the scheduler and core of MogileFS. Its main job is to help clients find where the data is actually stored. Trackers handle many tasks: Replication, Delete, Query, Reaper, Monitor, and so on. mogilefsd is an event-based parent process / message bus that manages all interaction with client applications (requests for operations to be performed), including load balancing those requests across multiple "query workers", and then lets its child processes handle them.

② MySQL: stores the MogileFS metadata (the namespace and the file storage locations) and is operated and managed by the trackers. The database can be initialized with the mogdbsetup program. Because it holds all MogileFS metadata, it is recommended to run it in an HA (master-slave) architecture.

③ Storage Nodes: the servers that store the actual file data, also called Storage Servers. Each storage node must run a mogstored service. Capacity is expanded by adding storage node servers.

3. MogileFS system management concepts

① Domain: a MogileFS system can have multiple domains to hold different kinds of files (by size or type). Keys must be unique within a domain, but the same key can exist in different domains.

② Host and device: each storage node is called a host. A host can have multiple storage devices (dev, typically separate hard disks), and each device has an ID number. Domain + Fid is used to locate a file.

③ Class: manages file attributes, defining the minimum number of copies of a file to keep on different devices.

3: Detailed explanation of MogileFS software

1. MogileFS software installation process

① Install the Perl-related packages

yum install perl-Net-Netmask perl-IO-String perl-Sys-Syslog perl-IO-AIO

② Install the following rpm packages locally

MogileFS-Server-2.46-2.el6.noarch.rpm # Core services
perl-Danga-Socket-1.61-1.el6.rf.noarch.rpm # Socket library
MogileFS-Server-mogilefsd-2.46-2.el6.noarch.rpm # Tracker (must be installed on tracker nodes)
perl-MogileFS-Client-1.14-1.el6.noarch.rpm # Client
MogileFS-Server-mogstored-2.46-2.el6.noarch.rpm # Storage (must be installed on storage nodes)
MogileFS-Utils-2.19-1.el6.noarch.rpm # MogileFS management tools, such as mogadm

2. Program files and configuration files

Main program: /usr/bin/mogilefsd
Command line management tool: /usr/bin/mogadm
Main configuration file (Tracker): /etc/mogilefs/mogilefsd.conf
Main configuration file (Storage Nodes): /etc/mogilefs/mogstored.conf

4: Detailed explanation of the basic operations of MogileFS

1. Tracker initialization

1. Grant database privileges
GRANT ALL PRIVILEGES ON mogilefs.* TO 'mogile'@'127.0.0.1' IDENTIFIED BY 'mogile' WITH GRANT OPTION;

2. Set up the database
mogdbsetup --dbhost=127.0.0.1 --dbpass=mogile

3. Add the service user mogilefs (not necessary after a yum installation)
useradd -r mogilefs
mkdir /var/run/mogilefsd/
chown -R mogilefs:mogilefs /var/run/mogilefsd

4. Modify the tracker configuration file
vim /etc/mogilefs/mogilefsd.conf
    # Database connection information: DSN, user, and password
    db_dsn = DBI:mysql:mogilefs:host=127.0.0.1
    db_user = mogile
    db_pass = mogile
    # IP:PORT on which the tracker listens for MogileFS client requests
    listen = 172.17.250.121:7001

5. Start mogilefsd
/etc/init.d/mogilefsd start

6. Create a tracker client configuration file that specifies the tracker IP. Once it exists, MogileFS commands run on this host no longer need a tracker to be specified.
vim /etc/mogilefs/mogilefs.conf # Note: this is not the main configuration file!
    trackers = 172.17.250.121:7001

2. Storage Nodes initialization

1. Modify the configuration file and set the data storage directory
vim /etc/mogilefs/mogstored.conf
    # Data storage location, usually mounted on a separate disk
    docroot = /data/mogdata/

2. Create this directory and change its owner and group to mogilefs
mkdir -p /data/mogdata/
chown -R mogilefs:mogilefs /data/mogdata/

3. Start the mogstored service
/etc/init.d/mogstored start

Note: Once mogstored has started, the machine is a storage node. Next, use mogadm to add the host to the MogileFS system.

3. Add nodes to MogileFS

1. Add the storage hosts to the specified tracker
mogadm --tracker=172.17.250.121:7001 host add node1 --ip=172.17.214.74 --port=7500 --status=alive
mogadm --tracker=172.17.250.121:7001 host add node2 --ip=172.17.214.75 --port=7500 --status=alive
Note: Because the tracker is given explicitly, these commands can be run on either a tracker or a storage host.

2. Check that the hosts were added successfully
mogadm --tracker=172.17.214.73:7001 check
mogadm --tracker=172.17.214.73:7001 host list

3. Modify host information
mogadm host modify node1 --ip=123.xxx.xxx.70 --status=alive
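
When there are more than a couple of storage hosts, the host add step can be scripted. The following is a minimal sketch, not part of the original procedure: the `run` and `add_hosts` helpers and the `DRY_RUN` and `TRACKER` variables are hypothetical names introduced here, and the mogadm flags are copied from the commands above. With `DRY_RUN=1` the commands are only printed, so the list can be reviewed before anything is changed.

```shell
#!/bin/sh
# Tracker address: the sample value used in the host add commands above.
TRACKER="${TRACKER:-172.17.250.121:7001}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: read "name ip" pairs from stdin, one per line,
# and register each one as a storage host on port 7500.
add_hosts() {
    while read -r name ip; do
        [ -n "$name" ] || continue    # skip blank lines
        # </dev/null keeps the command from consuming the host list.
        run mogadm --tracker="$TRACKER" host add "$name" \
            --ip="$ip" --port=7500 --status=alive </dev/null
    done
}

# Example (prints the two commands from step 1 without running them):
#   DRY_RUN=1
#   add_hosts <<EOF
#   node1 172.17.214.74
#   node2 172.17.214.75
#   EOF
```

Running the same call without `DRY_RUN=1` executes the commands for real, after which `mogadm host list` can confirm the result.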

4. Add storage devices to MogileFS

Note: In a production environment, each dev<ID> directory should be the mount point of a dedicated hard disk, which is then added as a device of the node. Do not use the system disk.

1. Create the directory /data/mogdata/dev<ID>
Directory name: dev + ID. The ID must be unique and cannot be reused.
mkdir /data/mogdata/dev1
mkdir /data/mogdata/dev2

2. Change the owner of the directories to mogilefs
chown -R mogilefs:mogilefs /data/mogdata/dev1/
chown -R mogilefs:mogilefs /data/mogdata/dev2/

3. Add the devices
Format: mogadm --tracker=<IP:PORT> device add <storage_node_name> <ID>
mogadm --tracker=172.17.214.73:7001 device add node1 1
mogadm --tracker=172.17.214.73:7001 device add node2 2

4. View the device list
mogadm --tracker=172.17.214.73:7001 device list
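
The three device steps (create the directory, fix ownership, register the device) can be bundled so that the directory name and the registered ID cannot drift apart. This is a sketch under assumptions: `run`, `add_device`, `DRY_RUN`, `TRACKER`, and `DOCROOT` are hypothetical names, the defaults are the sample values from this section, and the function is meant to be run on the storage node that owns the disk.

```shell
#!/bin/sh
# Sample values from this section; override them for a real cluster.
TRACKER="${TRACKER:-172.17.214.73:7001}"
DOCROOT="${DOCROOT:-/data/mogdata}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: add_device <host_name> <dev_id>
# Creates $DOCROOT/dev<dev_id>, hands it to the mogilefs user,
# then registers the device with the tracker.
add_device() {
    host_name=$1
    dev_id=$2
    # Reject anything that is not a plain number, since the ID
    # becomes part of the directory name.
    case "$dev_id" in
        ''|*[!0-9]*)
            echo "device ID must be a number: $dev_id" >&2
            return 1
            ;;
    esac
    dev_dir="$DOCROOT/dev$dev_id"
    run mkdir -p "$dev_dir" || return 1
    run chown -R mogilefs:mogilefs "$dev_dir" || return 1
    run mogadm --tracker="$TRACKER" device add "$host_name" "$dev_id"
}

# Example (dry run): DRY_RUN=1; add_device node1 1
```

Deriving the directory from the ID in one place is the point of the helper: a device registered as 1 always lives in dev1.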

Side note:
Marking a device as dead (handle with care!)
When a hard disk is damaged or a device has failed, you can mark the device as dead. MogileFS then stops using the device, removes its records of the files stored there, and tries to re-replicate those files to other devices in the cluster.
mogadm --tracker=172.17.214.73:7001 device mark node1 1 dead

5. Domain and Class management

1. Add a domain
mogadm --tracker=172.17.250.121:7001 domain add zdd

2. List the existing domains
mogadm --tracker=172.17.250.121:7001 domain list
 domain               class                mindevcount   replpolicy   hashtype
-------------------- -------------------- ------------- ------------ -------
 zdd                  default                   2        MultipleHosts() NONE
Note: mindevcount is the minimum number of copies kept for each file.

3. Create the class zd in the domain zdd and set its minimum number of copies to 3
mogadm --tracker=172.17.250.121:7001 class add zdd zd --mindevcount=3

6. MogileFS file management

1. Upload a file: mogupload
mogupload --domain=zdd --key=test1 --file=Chrysanthemum.jpg

2. Query a file: mogfileinfo
mogfileinfo --domain=zdd --key=test1

3. Delete a file: mogdelete
mogdelete --domain=zdd --key=test1

4. List file keys: moglistkeys
This lists all the keys in the specified domain. You can also give a prefix to list only the keys that start with it.
moglistkeys --domain=zdd
moglistkeys --domain=<domain_name> --key_prefix=<string>

5. List files by fid: moglistfids
fid: the file number in the MogileFS file system, which is auto-incrementing
Purpose: list all files, or a given number of files, after a specified fid
moglistfids --fromfid=<fid> # show all files after the specified fid
moglistfids --fromfid=<fid> --count=<num> # show num files after the specified fid
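
Key prefixes become useful once keys follow a naming scheme. As an illustration only, the sketch below uploads every regular file in a directory under a common prefix and then lists the keys with that prefix. The `run` and `upload_dir` helpers and the `DRY_RUN` variable are hypothetical names, the key layout (prefix + file name) is an assumption rather than anything MogileFS requires, and the tracker is expected to come from /etc/mogilefs/mogilefs.conf as configured earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: upload_dir <domain> <key_prefix> <directory>
# Uploads each regular file as key "<key_prefix><file name>",
# then lists the keys stored under that prefix.
upload_dir() {
    domain=$1
    prefix=$2
    src_dir=$3
    for path in "$src_dir"/*; do
        [ -f "$path" ] || continue    # skip subdirectories and empty globs
        run mogupload --domain="$domain" \
            --key="$prefix$(basename "$path")" --file="$path" || return 1
    done
    run moglistkeys --domain="$domain" --key_prefix="$prefix"
}

# Example (dry run): DRY_RUN=1; upload_dir zdd img/ /tmp/pictures
```

Because keys must be unique within a domain, two files with the same name in different source directories need different prefixes.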

7. Storage server status management and hard disk management

1. Take a server out of service temporarily
Usage scenario: if a server needs maintenance that requires a shutdown, such as a memory upgrade or an operating system upgrade, it is recommended to mark the host as "down" before you start. Unplanned failures are handled flexibly by MogileFS on its own.
mogadm host mark node_name down # Take the storage node offline; the tracker treats it as temporarily unavailable
... perform the maintenance ...
... when the work is complete, start the server again ...
mogadm host mark node_name alive # Bring the storage node back online
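
The down / maintain / alive sequence is easy to leave half finished. A minimal sketch of a wrapper, assuming it runs on an admin machine (not on the node being rebooted) where /etc/mogilefs/mogilefs.conf already names the tracker; `run`, `with_host_down`, and `DRY_RUN` are hypothetical names, and only the two mogadm commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: with_host_down <node_name> <command> [args...]
# Marks the host down, runs the maintenance command, and marks the host
# alive again only if that command succeeded.
with_host_down() {
    node=$1
    shift
    run mogadm host mark "$node" down || return 1
    if "$@"; then
        run mogadm host mark "$node" alive
    else
        status=$?
        echo "maintenance failed (exit $status); $node is still marked down" >&2
        return "$status"
    fi
}

# Example (dry run; upgrade_node.sh is a placeholder for your own script):
#   DRY_RUN=1; with_host_down node1 ./upgrade_node.sh
```

Leaving the host down after a failed maintenance command is a deliberate choice in this sketch: a node in an unknown state stays out of rotation until someone marks it alive by hand.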

2. Add a hard disk device
Usage scenario: when you replace an old hard disk with a new one, give the new disk a new device ID. This lets MogileFS copy every file that was listed on the old device to the new disk, which preserves data integrity.
mogadm device add node_name dev_id --status=alive
mogadm device add node_name dev_id --status=down

8. Storage mode management

1. Read-only mode: readonly. Files can only be read; writing and deleting are prohibited.
mogadm device mark node_name dev_id readonly

2. Drain mode: drain. Files can be read and deleted; no new files are written.
mogadm device mark node_name dev_id drain

Note: In earlier versions of MogileFS, drain mode also removed FIDs from the device. That behavior has been replaced by rebalancing.

3. Re-replicate files
If a hard disk fails, MogileFS automatically stops sending requests to the device, but it does not automatically re-replicate the files that were on it. You must mark the device as dead manually with mogadm. Once you do, MogileFS removes its records of the files on that device and tries to re-replicate them to other devices in the cluster.
mogadm device mark node_name dev_id dead

9. MogileFS software bug fix: unable to create copies

Careful readers may notice that the mindevcount setting has no effect: no copies are created. The following steps fix this.

1: Perform the following on both the tracker and the storage nodes
1. Install the Perl build environment
yum -y install make gcc unzip perl-DBD-MySQL perl perl-CPAN perl-YAML perl-Time-HiRes

2. Download the required module
Package address: http://www.cpan.org/authors/id/B/BR/BRADFITZ/Sys-Syscall-0.23.tar.gz

3. Unpack, build, and install
tar zxvf Sys-Syscall-0.23.tar.gz
cd Sys-Syscall-0.23/
perl Makefile.PL
make && make install

2: Restart the services
1. Restart the mogilefsd service on the tracker
2. Restart the mogstored service on the storage nodes

10. Rebalance strategy

Rebalance policy official document: https://github.com/hrchu/mogilefs/blob/wiki/Rebalance.md

① Strategy description

Move data from disks with little free space to disks with more free space, to balance space usage.

② Strategy diagram

Interpretation of the diagram: when a new hard disk differs in size and utilization from the disks already in the cluster, continued writes fill the four existing disks first. Once they are full, copies can no longer be created, which puts data safety at risk.

③ Policy-related commands

Policy options, as shown by mogadm rebalance settings:
rebal_policy = from_percent_used=95 to_percent_free=50 limit_type=device limit_by=size limit=5g fid_age=old

1. Move data off host 2 onto devices that have at least 40% free space, to balance storage
mogadm rebalance policy --options="from_hosts=2 to_percent_free=40"

2. Test the policy: show the current policy and what it would select
mogadm rebalance test

3. Start | stop | reset the rebalance, and check the queues
$ mogadm rebalance start
$ mogadm rebalance stop
$ mogadm rebalance reset
$ mogadm stats --stats="general-queues" # Work queued before the stop keeps running; this shows what is still in the queues
Note: stop only prevents new work from being scheduled; rebalance work that is already queued still runs.

4. Check the rebalance status
$ mogadm rebalance status
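
Since a rebalance moves real data, it can help to set the policy and run the test step together, and to start the rebalance only after reading the test output. The sketch below does that for the from_hosts / to_percent_free pair used above. `run`, `rebalance_preview`, and `DRY_RUN` are hypothetical names, and the 0 to 100 integer check is a conservative assumption of this sketch, not a documented mogadm rule.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
# Note that the printed form loses the quotes around --options.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: rebalance_preview <from_host_ids> <to_percent_free>
# Sets the policy and runs "rebalance test". It never starts the rebalance.
rebalance_preview() {
    from_hosts=$1
    pct=$2
    # Assumption of this sketch: accept whole percentages from 0 to 100 only.
    case "$pct" in
        ''|*[!0-9]*)
            echo "to_percent_free must be an integer: $pct" >&2
            return 1
            ;;
    esac
    if [ "$pct" -gt 100 ]; then
        echo "to_percent_free must be between 0 and 100: $pct" >&2
        return 1
    fi
    run mogadm rebalance policy \
        --options="from_hosts=$from_hosts to_percent_free=$pct" || return 1
    run mogadm rebalance test
}

# Example: rebalance_preview 2 40
# If the test output looks right, start it by hand: mogadm rebalance start
```

Keeping `rebalance start` out of the helper means a typo in the policy costs a re-run of the preview rather than an unwanted data move.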

