MogileFS is an efficient, automated file backup component developed by Six Apart. It is widely used on Web 2.0 sites, including LiveJournal.
MogileFS consists of three parts:
The first part is the server side, which includes two programs, mogilefsd and mogstored. mogilefsd is the tracker, which keeps global information such as domains, classes, and hosts in the database. mogstored is the storage node, which is actually an HTTP daemon that listens on port 7500 by default and accepts file backup requests from clients. After installation, the mogadm tool is run to register all storage nodes in the mogilefsd database, and mogilefsd then manages and monitors them.
The second part is the utilities (toolset), mainly the MogileFS management tools such as mogadm.
The third part is the client API, currently available for Perl (MogileFS.pm) and PHP. With these modules you can write client programs that implement file backup and management functions.
Introduction
MogileFS is what first brought me into the field of distributed file systems. Since Gearman has already been covered on TTLSA, MogileFS deserves a look as well, because both were developed by the same person. MogileFS makes clever use of HTTP PUT to implement a distributed storage server, and it is well suited to storing small files.
Methodology
To get to know a system, I follow these steps:
What the system is, and what problems it exists to solve;
What the system looks like;
How to talk to the system.
So this article starts from the origins of MogileFS, then sketches its structure, introduces its basic use, and finally covers its management.
The article also introduces how to incorporate MogileFS into third-party applications.
Mind Map
Background
MogileFS, a distributed file system developed by Danga Interactive, was created to address the storage challenges of the LiveJournal site the company was operating at the time.
Before that, the technical team had already adopted techniques such as database partitioning, so MogileFS also embodies the idea of divide and conquer. Today MogileFS is widely used on some high-performance Web 2.0 sites; the most typical example is Instagram, which uses it as a picture storage cluster.
Terminology and interpretation
Understanding the terminology that appears in MogileFS is critical to mastering its architecture.
Term: Interpretation
Application: The thing that wants to store/load files.
Database: The database that stores the MogileFS metadata (the namespace, and which files are where). This should be set up in an HA configuration so you don't have a single point of failure.
Tracker: Event-based parent process/message bus. It manages all client communication from applications (requesting operations to be performed), including load balancing those requests onto "query workers", and handles all communication between mogilefsd child processes.
Storage node: Where files are stored. The storage nodes are just HTTP servers that do DELETE, PUT, etc. Any WebDAV server is fine, but mogstored is recommended. mogilefsd can be configured to use two servers on different ports: mogstored for all DAV operations (and sideband monitoring), and your fast/light HTTP server of choice for GET operations. Typically people have one fat SATA disk per mountpoint, each mounted at /var/mogdata/devNN.
Domain: The top-level separation of files. File keys are unique within domains. A domain consists of a set of classes that define the files within the domain. Examples of domains: fotobilder, livejournal.
Class: Every file is part of exactly one class. A class is part of exactly one domain. A class, in effect, specifies the minimum replica count of a file. Examples of classes: userpicture, userbackup, phonepost. Classes may have extra replication policies defined.
Minimum replica count (mindevcount): A property of a class. It defines how many times the files in that class need to be replicated onto different devices in order to ensure redundancy among the data and prevent loss.
Key: A key is a unique textual string that identifies a file. Keys are unique within domains. Examples of keys: userpicture:34:39, phonepost:93:3834, userbackup:15. Fake structures work too: /pics/hello.png, any string.
File: A file is a defined collection of bits uploaded to MogileFS to store. Files are replicated according to their minimum replica count. Each file has a key, is part of one class, and is located in one domain. Files are the things that MogileFS stores for you.
FID: A FID is an internal numerical representation of a file. Every file is assigned a unique FID. If a file is overwritten, it is given a new FID.
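The relationships between these terms can be sketched as a small in-memory model. This is only an illustration of the glossary, not how the tracker actually stores its metadata: the Namespace class, its method names, and the default mindevcount of 2 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FileClass:
    """A class belongs to one domain and carries the minimum replica count."""
    name: str
    mindevcount: int = 2  # assumed default, for illustration only


@dataclass
class Domain:
    """Top-level namespace: keys are unique only within a single domain."""
    name: str
    classes: dict = field(default_factory=dict)  # class name -> FileClass
    keys: dict = field(default_factory=dict)     # key -> (fid, class name)


class Namespace:
    """Hypothetical stand-in for the metadata database."""

    def __init__(self):
        self.domains = {}
        self._next_fid = count(1)  # every stored file gets a fresh FID

    def add_domain(self, name):
        self.domains[name] = Domain(name)

    def add_class(self, domain, name, mindevcount=2):
        self.domains[domain].classes[name] = FileClass(name, mindevcount)

    def store(self, domain, key, class_name):
        """Record a file under (domain, key); overwriting yields a new FID."""
        dom = self.domains[domain]
        if class_name not in dom.classes:
            raise KeyError(f"class {class_name!r} not in domain {domain!r}")
        fid = next(self._next_fid)
        dom.keys[key] = (fid, class_name)
        return fid
```

Storing the key userpicture:34:39 in both the livejournal and fotobilder domains produces two independent files, and storing it a second time in the same domain replaces the mapping with a new FID.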
MogileFS Installation and Configuration
The architecture of MogileFS
The MogileFS architecture is as follows.
In a MogileFS cluster, a node plays one of three roles:
Tracker node
    Task distribution and scheduling
Meta database node
    Stores the meta information for the cluster
        Host information
        Device information
        Domain information
        Class information
        Key information
        File information
Storage node
    File storage
MogileFS has two kinds of programs:
mogilefsd # implements the tracker role
mogstored # implements the storage node role
In MogileFS, a file is defined as a series of bits uploaded to a storage node and identified by a key that is unique within its domain. A file belongs to a class, and a class acts as a set of attribute values.
Installation of MogileFS
Server environment
IP            Hostname
10.1.192.63   cluster-database
10.1.192.58   cluster-master01
10.1.192.59   cluster-master02
10.1.192.60   cluster-segment01
10.1.192.61   cluster-segment02
10.1.192.62   cluster-segment03
These servers are all virtual machines on VMware vSphere. The virtual machines are attached to a new VMware NETWORK2 port and connected through a VMware switch; the port rate is 10000 Mbps.
Because the dependencies between modules are not strictly separated by server role, it is recommended that you install the following modules on all servers:
MogileFS-Utils-2.28.tar.gz
MogileFS-Server-2.70.tar.gz
MogileFS-Client-1.17.tar.gz
Installation process for MogileFS
Initializing the database on cluster-database
Create the user and database
CREATE DATABASE mogilefs;
GRANT ALL ON mogilefs.* TO 'mogile'@'cluster-database';
SET PASSWORD FOR 'mogile'@'cluster-database' = OLD_PASSWORD('mo');
GRANT ALL ON mogilefs.* TO 'mogile'@'%';
SET PASSWORD FOR 'mogile'@'%' = OLD_PASSWORD('mo');
FLUSH PRIVILEGES;
Initialize the database schema
mogdbsetup --dbname=mogilefs --dbuser=mogile --dbpass=mo
Configuring the tracker node
mkdir -p /etc/mogilefs
cat << END > /etc/mogilefs/mogilefsd.conf
db_dsn = DBI:mysql:mogilefs:host=cluster-database;port=3306;mysql_connect_timeout=5
# DB connection string
db_user = mogile
db_pass = mo
conf_port = 7001
# management port
listener_jobs = 5
node_timeout = 5
rebalance_ignore_missing = 1
END
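Before starting the tracker it can be useful to confirm that the configuration file contains the settings shown above. The following is a minimal sketch that reads the "key = value" format of that listing; the REQUIRED tuple and both function names are assumptions made for this example, not something mogilefsd itself defines.

```python
# Keys this sketch treats as mandatory (an assumption, taken from the listing above).
REQUIRED = ("db_dsn", "db_user", "db_pass", "conf_port")


def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only: the DSN value itself contains '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        settings[key.strip().lower()] = value.strip()
    return settings


def missing_keys(settings, required=REQUIRED):
    """Return the required keys that are absent from the parsed settings."""
    return [key for key in required if key not in settings]
```

Calling missing_keys(parse_conf(open("/etc/mogilefs/mogilefsd.conf").read())) returns an empty list when all four settings are present.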
Configuring the storage nodes
mkdir -p /etc/mogilefs
cat << END > /etc/mogilefs/mogstored.conf
httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/data/mogdata
# HTTP server document root
END
Creating the device directories on the storage nodes
mkdir -p /data/mogdata/devN   # N is the device ID, for example dev1 on segment01
Add hosts and devices
Start the tracker
mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
Add the hosts and devices
mogadm --trackers=cluster-master01:7001 host add segment01 --ip=10.1.192.60 --status=alive
mogadm --trackers=cluster-master01:7001 host add segment02 --ip=10.1.192.61 --status=alive
mogadm --trackers=cluster-master01:7001 host add segment03 --ip=10.1.192.62 --status=alive
mogadm --trackers=cluster-master01:7001 device add segment01 1
mogadm --trackers=cluster-master01:7001 device add segment02 2
mogadm --trackers=cluster-master01:7001 device add segment03 3
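With more than a handful of storage nodes, typing these registration commands by hand becomes error-prone. A minimal sketch that generates them from a host table follows; it only builds strings in the same form as the commands above and does not run anything. The function name and the shape of the hosts mapping are assumptions made for the example.

```python
def mogadm_commands(tracker, hosts):
    """Build 'host add' and 'device add' command lines for mogadm.

    hosts maps a host name to a (ip, device_id) tuple.
    """
    base = f"mogadm --trackers={tracker}"
    commands = [
        f"{base} host add {name} --ip={ip} --status=alive"
        for name, (ip, _devid) in hosts.items()
    ]
    # Devices are registered after all hosts exist.
    commands += [
        f"{base} device add {name} {devid}"
        for name, (_ip, devid) in hosts.items()
    ]
    return commands


if __name__ == "__main__":
    table = {
        "segment01": ("10.1.192.60", 1),
        "segment02": ("10.1.192.61", 2),
        "segment03": ("10.1.192.62", 3),
    }
    print("\n".join(mogadm_commands("cluster-master01:7001", table)))
```

The printed lines can be reviewed and then pasted into a shell on any machine that has the MogileFS utilities installed.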
Using MogileFS
File download
mogfetch --trackers=cluster-master01:7001 --domain=abc --key="speach_of_dependence" --file=./speach_of_dependence_income.words
Files live in a domain, so you must specify the domain parameter when downloading.
File upload
mogupload --trackers=cluster-master01:7001 --domain=abc --class=test01.abc --key="speach_of_dependence" --file=./speach_of_dependence.words
A file has a class attribute, so you specify both the class and domain parameters when uploading.
Listing files
moglistkeys --trackers=cluster-master01:7001 --domain=abc
Listing storage devices
mogadm --trackers=cluster-master01:7001 device list
Listing hosts
mogadm --trackers=cluster-master01:7001 host list
Listing domains
mogadm --trackers=cluster-master01:7001 domain list
Listing classes
mogadm --trackers=cluster-master01:7001 class list
All requests are sent to the tracker node.
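What such a request to the tracker looks like on the wire can be sketched as follows. This assumes the tracker speaks a line-based text protocol in which a request is a command name followed by URL-encoded arguments, and a reply starts with OK or ERR; the get_paths command and the paths/pathN reply fields are likewise assumptions based on the Perl client, and the function names are made up for this example. No network access is performed here.

```python
from urllib.parse import parse_qsl, urlencode


def encode_request(command, **args):
    """Serialize a tracker request as one CRLF-terminated line."""
    return f"{command} {urlencode(args)}\r\n".encode("ascii")


def decode_response(line):
    """Turn an 'OK k=v&k=v' reply into a dict; raise on an 'ERR ...' reply."""
    status, _, rest = line.decode("ascii").strip().partition(" ")
    if status == "OK":
        return dict(parse_qsl(rest))
    raise RuntimeError(f"tracker error: {rest}")


def paths_from_response(fields):
    """Collect path1..pathN from a decoded reply that carries a 'paths' count."""
    total = int(fields.get("paths", 0))
    return [fields[f"path{i}"] for i in range(1, total + 1)]
```

A client would send the encoded line to TCP port 7001 on the tracker, read one line back, and then fetch the file over HTTP from one of the returned storage node URLs.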
Inside MogileFS
Key-file
MogileFS does not keep the original filename; a so-called file is simply the bit stream that a storage node receives. Inside MogileFS the file is tagged with a key that is visible within its domain.
File storage
MogileFS assigns a FID to each file, stores the file with a .fid suffix, and maintains the mapping from FID to path. The FID is split into four parts with (\d)(\d{3})(\d{3})(\d{3}), and the file is placed in the directory /devid/$1/$2/$3. Which devid to use is decided by the master and handed to the client.
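The FID-to-path mapping described above can be sketched in a few lines. The regular expression is the one from the text; zero-padding the FID to ten digits, the devN directory prefix, and the function name fid_path are assumptions made for this example.

```python
import re

FID_PARTS = re.compile(r"(\d)(\d{3})(\d{3})(\d{3})")


def fid_path(devid, fid):
    """Return the on-disk path of a FID on the given device."""
    padded = f"{fid:010d}"  # assumed: FIDs are zero-padded to ten digits
    match = FID_PARTS.fullmatch(padded)
    if match is None:
        raise ValueError("FID must be a non-negative integer of at most 10 digits")
    first, second, third, _rest = match.groups()
    # The first three groups become directories; the full padded FID names the file.
    return f"/dev{devid}/{first}/{second}/{third}/{padded}.fid"
```

Under these assumptions, FID 1234 on device 1 lands at /dev1/0/000/001/0000001234.fid, so at most a thousand files share one leaf directory.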
File redundancy
The mindevcount attribute of a class ensures the redundancy of its files within the system.
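How a minimum replica count translates into replication work can be sketched as below. This is a deliberate simplification under stated assumptions: it only counts distinct devices and picks the first free ones, whereas a real replication policy may also weigh hosts, free space, and device state. Both function names are hypothetical.

```python
def replicas_needed(mindevcount, current_devids):
    """Number of additional copies required to reach the class minimum."""
    return max(0, mindevcount - len(set(current_devids)))


def pick_targets(mindevcount, current_devids, alive_devids):
    """Choose devices that do not yet hold the file, in the order given."""
    have = set(current_devids)
    need = replicas_needed(mindevcount, have)
    candidates = [dev for dev in alive_devids if dev not in have]
    if len(candidates) < need:
        raise RuntimeError("not enough devices to satisfy mindevcount")
    return candidates[:need]
```

For a class with a minimum of 2 and a file that currently lives only on device 1, pick_targets(2, [1], [1, 2, 3]) selects device 2 as the next replication target.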
A look into MogileFS
Since MogileFS is written in Perl, let's take a look at the source code of its programs.
mogdbsetup
This program initializes the meta database when the database node is installed.
Program code analysis
Modules used
use MogileFS::Config;
use MogileFS::Store;
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if 0; # not running under some shell
use strict;
use Getopt::Long;
use lib 'lib';
use MogileFS::Store;
use MogileFS::Config;
#
# usage and option setup omitted
#
MogileFS::Config->load_config;
my $sto = $sclass->new_from_mogdbsetup(
    map { $_ => $args{$_} }
    qw(dbhost dbport dbname
       dbrootuser dbrootpass
       dbuser dbpass)
);
my $dbh = $sto->dbh;
$sto->setup_database
    or die "Database upgrade failed.\n";
my $latestver = MogileFS::Store->latest_schema_version;
if ($opt_noschemabump) {
    warn "\n*\n* Per your request, not upgrading to $latestver. I assume you understand why.\n*\n";
} else {
    $sto->set_schema_vesion($latestver);
}
warn "Done.\n" if $opt_verbose;
exit 0;
The mogdbsetup program calls the setup_database subroutine in MogileFS::Store to initialize the database, and uses the schema version to determine whether the current operation is an installation or an upgrade.
mogilefsd
The tracker node process, which handles task assignment for the entire cluster.
Program code analysis
Modules used
use MogileFS::Server;
#!/usr/bin/perl
......
# Rename binary in process list to make init scripts saner
$0 = "mogilefsd";
my $s = MogileFS::Server->server;
$s->run;
1;
The program simply invokes the run subroutine in MogileFS::Server.
MogileFS as a whole is an event-based cluster.
mogstored
The storage node process, responsible for the actual file operations.
Program code analysis
Modules used
use Perlbal 1.73;
use FindBin qw($Bin $RealScript);
use Mogstored::HTTPServer;
use Mogstored::HTTPServer::Perlbal;
use Mogstored::HTTPServer::Lighttpd;
use Mogstored::HTTPServer::None;
use Mogstored::HTTPServer::Apache;
use Mogstored::HTTPServer::Nginx;
use Mogstored::SideChannelListener;
use Mogstored::SideChannelClient;
......
# Initialize basic required Perlbal machinery, for any HTTP server
my $perlbal_init = qq{
CREATE SERVICE mogstored
SET role = web_server
SET docroot = $docroot
# don't listen... this is just a stub service.
CREATE SERVICE mgmt
SET role = management
ENABLE mgmt
};
$perlbal_init .= "\nSERVER pidfile = $pidfile" if defined($pidfile);
Perlbal::run_manage_commands($perlbal_init, sub { print STDERR "$_[0]\n"; });
# Start HTTP server
my $httpsrv_class = "Mogstored::HTTPServer::" . ucfirst($server);
my $httpsrv = $httpsrv_class->new(
    listen   => $http_listen,
    docroot  => $docroot,
    maxconns => $max_conns,
    bin      => $serverbin,
);
# Configure Perlbal HTTP listener after daemonization since it can create a
# kqueue on *BSD. kqueue descriptors are automatically invalidated on fork(),
# making them unusable after daemonize. For non-Perlbal, starting the
# server before daemonization improves error reporting as daemonization
# redirects stdout/stderr to /dev/null.
$httpsrv->start if $server ne "perlbal";
if ($opt_daemonize) {
    $httpsrv->pre_daemonize;
    Perlbal::daemonize();
} else {
    print "Running.\n";
}
# It's now safe for Perlbal to create a kqueue
$httpsrv->start if $server eq "perlbal";
$httpsrv->post_daemonize;
# Kill our children processes on exit:
my $parent_pid = $$;
$SIG{TERM} = $SIG{INT} = sub {
    return unless $$ == $parent_pid;
    # don't let it be inherited
    kill 'TERM', grep { $_ } keys %on_death;
    POSIX::_exit(0);
};
setup_iostat_pipes();
start_disk_usage_process();
start_iostat_process() if $opt_iostat;
harvest_dead_children();
# every 2 seconds, it reschedules itself
setup_sidechannel_listener();
# now start the main loop
Perlbal::run();
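The ordering in the listing above is easy to misread: the HTTP server starts before daemonization for every backend except Perlbal, which must start afterwards. A minimal sketch of that decision follows; it only mirrors the control flow of the Perl code as a list of step names, and the function name is made up for this example.

```python
def startup_steps(server, daemonize):
    """Return the order in which mogstored runs its startup steps."""
    steps = []
    if server != "perlbal":
        # Non-Perlbal servers start first so errors still reach the terminal.
        steps.append("start")
    if daemonize:
        steps += ["pre_daemonize", "daemonize"]
    if server == "perlbal":
        # Perlbal may create a kqueue, which would not survive the fork.
        steps.append("start")
    steps.append("post_daemonize")
    return steps


if __name__ == "__main__":
    for backend in ("perlbal", "nginx"):
        print(backend, startup_steps(backend, daemonize=True))
```

Comparing the two printed lists shows that only the position of the start step changes between the Perlbal backend and any other HTTP server.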
Managing MogileFS
1. Starting and stopping the system
Start the tracker
mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon
Start a storage node
mogstored --daemon
Stop the tracker
echo '!shutdown' | nc cluster-master01 7001
Stop a storage node
killall mogstored
2. Viewing system status
# mogadm check
Traffic distribution within the system
There are three kinds of traffic in the system:
TCP 7001 on the tracker # request traffic sent by clients to the tracker
TCP 3306 on MySQL # traffic between the tracker and the meta database
TCP 7500 on the storage node # data traffic between clients and storage nodes
TCP 7501
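When a cluster misbehaves, a quick first check is whether these ports are reachable from the machine that needs them. The following is a minimal sketch using only the Python standard library; the ENDPOINTS table reuses the host names and ports from this article, and the function name is an assumption made for the example.

```python
import socket

# Host names and ports taken from the installation above.
ENDPOINTS = {
    "tracker": ("cluster-master01", 7001),
    "meta database": ("cluster-database", 3306),
    "storage node": ("cluster-segment01", 7500),
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for role, (host, port) in ENDPOINTS.items():
        state = "open" if port_open(host, port) else "unreachable"
        print(f"{role}: {host}:{port} is {state}")
```

A successful connection only shows that something is listening on the port; mogadm check remains the way to confirm that the cluster itself is healthy.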