Distributed File System MogileFS

Source: Internet
Author: User



MogileFS is an open source distributed file system used to build distributed file storage clusters. It was developed by Danga Interactive, the company behind LiveJournal; the Danga team's open source projects include Memcached, MogileFS, and Perlbal (Perlbal is a capable reverse proxy server written in Perl). In China, current users of MogileFS include the picture hosting site Yupoo, among others.

MogileFS characteristics

1. Application level: no special kernel components are required.

2. No single point of failure: the three components of a MogileFS setup (storage nodes, trackers, and the tracker database) can each run on multiple machines, so there is no single point of failure. (You can also run trackers and storage nodes on the same machines, so you do not need 4 machines; at least two machines are recommended.)

3. Automatic file replication: based on each file's "class", files are automatically replicated to enough different storage nodes with sufficient free space to satisfy the minimum replica count of that class. For example, on a picture site you can require at least three copies of each original JPEG image but only 1 or 2 copies of thumbnails and scaled versions. If copies are lost, MogileFS can re-create them until the minimum replica count is met again. In this way MogileFS (without RAID) saves disk space that would otherwise be spent storing the same data many times for no benefit.

4. "Better than RAID": in a non-SAN RAID setup the disks are redundant but the host is not, so if the whole machine fails its files become inaccessible. MogileFS replicates files between different machines, so files are always available.

5. Transport neutral, no special protocol: MogileFS clients can communicate with MogileFS storage nodes via NFS or HTTP, but they must first talk to a tracker.

6. Simple namespace: files are identified by a given key in a flat, global namespace. You can create as many namespaces as you like, so that several applications whose keys might otherwise collide can share the same MogileFS installation.

7. Shared nothing: MogileFS does not rely on an expensive SAN with shared disks; each machine maintains only its own disks.

8. No RAID required: the disks in MogileFS storage nodes can be in a RAID or not. If the goal is data safety, RAID is unnecessary, because MogileFS already provides that.
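The replication behaviour in items 3 and 4 can be illustrated with a small sketch. This is not MogileFS code: the function name `pick_devices` and the device fields `id`, `host`, and `free` are hypothetical, and the "most free space first" ordering is an assumption made for the example. It only shows the idea of choosing devices on different hosts until a class's minimum replica count is met.

```python
def pick_devices(devices, mindevcount, size):
    """Pick up to `mindevcount` devices on distinct hosts with room for `size`.

    `devices` is a list of dicts with the hypothetical fields
    'id', 'host' and 'free' (free space, in the same unit as `size`).
    """
    chosen = []
    used_hosts = set()
    # Assumption for this sketch: prefer the devices with the most free space.
    for dev in sorted(devices, key=lambda d: d["free"], reverse=True):
        if dev["free"] < size or dev["host"] in used_hosts:
            continue  # too full, or this host already holds a copy
        chosen.append(dev["id"])
        used_hosts.add(dev["host"])
        if len(chosen) == mindevcount:
            break
    return chosen


devices = [
    {"id": 1, "host": "a", "free": 500},
    {"id": 2, "host": "a", "free": 900},
    {"id": 3, "host": "b", "free": 300},
    {"id": 4, "host": "c", "free": 50},
]
print(pick_devices(devices, 3, 10))   # three copies, three different hosts
print(pick_devices(devices, 3, 100))  # device 4 is too full, so only two copies fit
```

Because no two copies share a host, losing one whole machine still leaves the other copies reachable, which is the point of the "better than RAID" item.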

Structure of MogileFS

Before using MogileFS we need a basic understanding of its structure. It consists of three major parts, the trackers, the database, and the storage nodes, plus the client. It runs as two service processes, mogilefsd and mogstored.

Components of MogileFS

The trackers, database, storage nodes, and client were mentioned above. We will not discuss the client in detail, because the client is essentially a Perl module (.pm): you write a program that calls the module to read from and write to the MogileFS system. There are also related modules for servers such as nginx, and MogileFS can be mounted as a file system through FUSE.

Trackers (tracker, scheduler)

This is the core of MogileFS; put simply, it is a dispatcher. The mogilefsd process is the tracker program. According to the MogileFS wiki, trackers do a lot of work: replication, deletion, query, reaper, monitor, and so on. mogilefsd is the event-based parent process/message bus that manages all communication from client applications (requests for operations to be performed), including balancing the request load across "query workers" and letting the mogilefsd child processes handle the requests. All mogadm and mogtool operations go through the trackers, and clients also need to be configured with the tracker addresses, so it is best to run several trackers at the same time for load balancing. A tracker can also run on a single machine, or alongside other programs (not recommended).

Configuration file: /etc/mogilefs/mogilefsd.conf
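The advice to run several trackers can be sketched from the client side. The following is only an illustration, not the real client library: `ask_any_tracker` and the `send` callback are hypothetical names, and the second tracker address is made up for the example. It tries the configured trackers in random order, which spreads load and fails over when one tracker is down.

```python
import random


def ask_any_tracker(trackers, send, rng=random):
    """Try each tracker in random order; return the first successful reply.

    `send(addr)` is a caller-supplied function (hypothetical) that performs
    one request against the tracker at `addr` and raises OSError on failure.
    """
    order = list(trackers)
    rng.shuffle(order)  # random order spreads requests across trackers
    errors = []
    for addr in order:
        try:
            return send(addr)
        except OSError as exc:  # connection refused, timeout, ...
            errors.append((addr, exc))
    raise ConnectionError(f"all trackers failed: {errors}")


def fake_send(addr):
    # Stand-in for a real network call: pretend the first tracker is down.
    if addr == "10.15.6.28:6001":
        raise OSError("connection refused")
    return f"ok from {addr}"


print(ask_any_tracker(["10.15.6.28:6001", "10.15.6.29:6001"], fake_send))
```

With a single tracker in the list, the same call simply fails when that tracker is unreachable, which is why more than one is recommended.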

Database (MySQL) section

The database stores the MogileFS metadata (the namespaces and where each file is located). The trackers operate on and manage it. You can initialize the database with the mogdbsetup program. Because the database holds all of the MogileFS metadata, the entire MogileFS system becomes unusable if the database goes down, so it is best to deploy it in a high-availability (HA) configuration.

Storage node (Storage Nodes)

This is where the actual files are stored. A storage node is an HTTP server that handles operations such as deleting, storing, and renaming files. Any WebDAV server will do, but mogstored is recommended. mogilefsd can be configured to use two servers on different ports: mogstored for all DAV operations (and monitoring), and a fast HTTP server of your own choice for GET operations (serving files to clients). A typical setup uses one large-capacity SATA disk per mount point, each mounted at /var/mogdata/devNN. Once the configuration file is in place, starting the mogstored program makes the machine a storage node. You then need to use the mogadm tool to add the machine to the cluster.

Configuration file: /etc/mogilefs/mogstored.conf
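To make "a storage node is an HTTP server" concrete, here is a toy stand-in written with Python's standard library. It is not mogstored and does not implement WebDAV; the class name `ToyStorageNode`, the in-memory dictionary, and the path `/dev1/test.fid` are all assumptions for illustration. It only shows files being stored, fetched, and deleted with plain HTTP verbs.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyStorageNode(BaseHTTPRequestHandler):
    files = {}  # path -> bytes; stands in for files under the docroot

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.files[self.path] = self.rfile.read(length)
        self._reply(201)

    def do_GET(self):
        body = self.files.get(self.path)
        if body is None:
            self._reply(404)
        else:
            self._reply(200, body)

    def do_DELETE(self):
        found = self.files.pop(self.path, None) is not None
        self._reply(204 if found else 404)

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def request(port, method, path, body=None):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, path, body=body)
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyStorageNode)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        return [
            request(port, "PUT", "/dev1/test.fid", b"hello"),
            request(port, "GET", "/dev1/test.fid"),
            request(port, "DELETE", "/dev1/test.fid"),
            request(port, "GET", "/dev1/test.fid"),
        ]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The split described above (one server for writes and deletes, another fast server for GETs) works because both serve the same files under the same paths.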

MogileFS Service Process

These correspond to the components described above.

mogilefsd: the main MogileFS daemon, that is, the tracker described above. It is controlled by the configuration file /etc/mogilefs/mogilefsd.conf.

mogstored: the MogileFS storage daemon, that is, the storage node described above. It is controlled by the configuration file /etc/mogilefs/mogstored.conf.



MogileFS consists of 3 parts:
Part 1: the server side, made up of two programs, mogilefsd and mogstored. The former is the tracker, which keeps global information such as domains, classes, and hosts in the database. The latter is the storage node, which is actually an HTTP daemon; it listens on port 7500 by default and accepts file storage requests from clients. After installation, run the mogadm tool to register all storage nodes in the mogilefsd database; mogilefsd then manages and monitors these nodes.
Part 2: the utils (toolset), mainly MogileFS management tools such as mogadm.
Part 3: the client API, currently available for Perl (MogileFS.pm) and PHP. With such a module you can write client programs that implement file storage and management.




CAP theorem: consistency, availability, and partition tolerance. A distributed system cannot meet all three requirements at once; at most two of them can be satisfied.
   C (Consistency): any read always sees the writes completed before it, that is, data can be read as soon as it is written;
   A (Availability): every operation returns within a bounded time, whether it succeeds or fails;
   P (Partition tolerance): the system keeps working when the network is partitioned.

 

BASE model: the counterpart of the ACID model and quite different from it, sacrificing strong consistency for availability or reliability:
   BA: Basically Available; the system remains basically usable and tolerates partial failure (for example, a sharded database);
   S: Soft state; copies may be out of sync for a period of time because updates are applied asynchronously;
   E: Eventually consistent; data becomes consistent in the end, which is a weak form of consistency;
   The BASE approach mainly emphasizes basic availability. If you need high availability, that is, pure performance, you sacrifice consistency or fault tolerance; BASE-style designs still have performance potential to exploit.
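A tiny simulation may help picture "soft state" and "eventually consistent". It is a generic sketch, not how MogileFS or any particular database is implemented; the class `ReplicatedValue` and its methods are invented for this example. A write is acknowledged at once, replicas catch up later, and reads from a replica can be stale in between.

```python
class ReplicatedValue:
    """One primary copy plus replicas that are updated asynchronously."""

    def __init__(self, n_replicas):
        self.primary = None
        self.replicas = [None] * n_replicas
        self.pending = []  # queued (replica index, value) updates

    def write(self, value):
        self.primary = value  # acknowledged immediately: basically available
        for i in range(len(self.replicas)):
            self.pending.append((i, value))

    def read(self, replica):
        return self.replicas[replica]  # may be stale: soft state

    def sync_one(self):
        """Apply one queued update, as a background worker would."""
        if self.pending:
            i, value = self.pending.pop(0)
            self.replicas[i] = value

    def sync_all(self):
        """Drain the queue: all replicas agree, i.e. eventual consistency."""
        while self.pending:
            self.sync_one()


v = ReplicatedValue(2)
v.write("a")
print(v.read(0))             # stale: the replica has not been updated yet
v.sync_all()
print(v.read(0), v.read(1))  # both replicas now hold the written value
```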

Paxos algorithm: a consensus algorithm, presented here as a more lightweight way of coordinating distributed transactions than two-phase commit (2PC). It achieves data consistency on the premise that the Byzantine generals problem does not arise. If the communication channel is not secure, data may be hijacked or altered in transit and can no longer be trusted, so Paxos is only workable when the communication channel itself can be trusted.
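For readers who have not seen Paxos, here is a minimal single-value sketch. It is a teaching toy under strong assumptions: everything runs in one process, every message is delivered, and no participant lies (the non-Byzantine premise stated above). The names `Acceptor` and `propose` are chosen for this example, and proposal numbers are assumed to start at 1.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest proposal number promised so far
        self.accepted = None  # (number, value) last accepted, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None
    # A proposer must adopt the value of the highest-numbered proposal
    # that any promising acceptor has already accepted.
    prior = [acc for acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda nv: nv[0])[1]
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "x"))  # first proposal gets its own value chosen
print(propose(acceptors, 2, "y"))  # a later proposal must keep the chosen value
```

The second call shows the safety property: once a value has been chosen, later proposals can only re-confirm it.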




[The parts of MogileFS]

1. Database (MySQL) section
You can initialize the database with the mogdbsetup program. The database holds all of the MogileFS metadata. You can give it a dedicated server or run it alongside other programs. The database is critical, much as the authentication center is to a mail system: if it goes down, the entire MogileFS system becomes unusable. It is therefore best to deploy it in an HA configuration.

2. Storage node
Starting the mogstored program makes the machine a storage node. By default it reads /etc/mogilefs/mogstored.conf at startup; see the configuration section for how to set it up. After mogstored is started, the machine can be added to the cluster with mogadm. A machine can run mogstored alone as a dedicated storage node, or run other programs at the same time.

3. Trackers (Tracker)
mogilefsd is the tracker program. According to the MogileFS wiki, trackers do a lot of work: replication, deletion, query, reaper, monitor, and so on. All mogadm and mogtool operations go through the trackers, and clients also need to be configured with the tracker addresses, so it is best to run several trackers at the same time for load balancing. A tracker can also run on a single machine or alongside other programs, as long as its configuration file is set up; the default is /etc/mogilefs/mogilefsd.conf.

4. Tools
The main tools are mogadm and mogtool, which are used to control the whole MogileFS system from the command line, view its status, and so on.

5. Client
The client is actually a Perl module (.pm); you write programs that call the module to read from and write to the MogileFS system.

[Concept definition]
See the official wiki for details. In brief:
Domain: the top-level namespace; a key is unique within a domain.
Class: belongs to a domain; for each class you can define the number of copies to keep.
Key: the unique identifier of a file.
File: the stored file itself.
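These four concepts can be modelled in a few lines. The sketch below is an in-memory illustration only; the `Domain` class and its method names are hypothetical and are not the MogileFS client API. It shows that keys are unique per domain, that a class carries the number of copies to keep, and that storing under an existing key replaces the whole file.

```python
class Domain:
    """Toy model of a domain: classes plus a key -> file mapping."""

    def __init__(self, name):
        self.name = name
        self.classes = {}  # class name -> number of copies to keep
        self.files = {}    # key -> (class name, file content)

    def add_class(self, cls, copies):
        self.classes[cls] = copies

    def store(self, key, content, cls):
        if cls not in self.classes:
            raise KeyError(f"unknown class {cls!r} in domain {self.name!r}")
        # Storing under an existing key replaces the whole file.
        self.files[key] = (cls, content)

    def copies_wanted(self, key):
        cls, _content = self.files[key]
        return self.classes[cls]


photos = Domain("testdomain")
photos.add_class("original", 3)
photos.add_class("thumbnail", 1)
photos.store("user1/pic.jpg", b"full-size bytes", "original")

pages = Domain("otherdomain")
pages.add_class("html", 2)
pages.store("user1/pic.jpg", b"<html></html>", "html")  # same key, no clash

print(photos.copies_wanted("user1/pic.jpg"), pages.copies_wanted("user1/pic.jpg"))
```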

Applicability
Because MogileFS does not support random reads and writes within a file, it suits only certain kinds of applications, such as picture services and static HTML services.
That is, applications whose files basically do not need to be modified after they are written. Of course, you can also generate a new file and overwrite the old one.



Three. Configuration

1) Create the database

# mogdbsetup --dbhost=10.15.6.28 --dbname=mogilefs --dbuser=root

2) Configure the tracker

Create a new file /etc/mogilefsd.conf with the following contents:

db_dsn dbi:mysql:mogilefs
db_user mogile
db_pass 123123
conf_port 6001
listener_jobs 5

db_dsn points to the location of your database. If your database is not on the same machine, use the following form instead, with your database host's address in place of 127.0.0.1:

db_dsn dbi:mysql:mogilefs:127.0.0.1

mogilefsd cannot be started as the root user, so add a mogile user:

# adduser mogile

Start the tracker before configuring the storage server:

# su mogile
# mogilefsd -c /etc/mogilefsd.conf --daemon

3) Configure the storage server

Use the mogadm tool to add the storage server to the database:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 host add mogilestorage --ip=10.15.6.28 --port=7500 --status=alive

(Everything runs on one machine here, so the tracker address and the host IP address are the same.) Check that it succeeded with:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 host list

Add a device to your storage server:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 device add mogilestorage 1

Check that it succeeded with:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 device list

A device ID is unique and, once created, cannot be deleted; it can only be marked dead. So if a disk breaks and you mark it dead, then after repairing it you must reformat it and add it under a new device ID. Changing a device from dead back to alive is not supported.

Create a new configuration file /etc/mogstored.conf with the following contents:

httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/opt/mogdata

Create the directory that will store the files:

# mkdir /opt/mogdata

Create the device directory under it:

# mkdir -p /opt/mogdata/dev1

PS: for mogadm parameter usage, see http://search.cpan.org/~dormando/MogileFS-Utils/mogadm

4) Run MogileFS

Start the storage server:

# mogstored -c /etc/mogstored.conf --daemon

Start the tracker:

# su mogile
# mogilefsd -c /etc/mogilefsd.conf --daemon

Check whether all of your services are up:

# ps -ef | grep mogilefsd
# ps -ef | grep mogstored

Four. Test phase

Create a domain:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 domain add testdomain

Add a class to the domain:

# mogadm --lib=/usr/lib/perl5/5.8.8 --trackers=10.15.6.28:6001 class add testdomain testclass
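If you repeat these registration steps often, they can be scripted. The sketch below only builds and prints the mogadm command lines from the walkthrough above (host add, device add, domain add, class add); the helper names `mogadm` and `setup_commands` are hypothetical. It assumes mogadm is on the PATH and can find its Perl libraries, so the library path option used above is left out, and it writes the options in double-dash form.

```python
import shlex

TRACKERS = "10.15.6.28:6001"  # tracker address used in the walkthrough


def mogadm(*args, trackers=TRACKERS):
    """Build the argument vector for one mogadm invocation."""
    return ["mogadm", f"--trackers={trackers}", *args]


def setup_commands(host, ip, devid, domain, cls):
    """Commands that register a host, a device, a domain and a class."""
    return [
        mogadm("host", "add", host, f"--ip={ip}", "--port=7500", "--status=alive"),
        mogadm("device", "add", host, str(devid)),
        mogadm("domain", "add", domain),
        mogadm("class", "add", domain, cls),
    ]


commands = setup_commands("mogilestorage", "10.15.6.28", 1, "testdomain", "testclass")
for argv in commands:
    print(shlex.join(argv))
```

To actually run the commands on a machine with MogileFS installed, each `argv` list could be passed to `subprocess.run(argv, check=True)`; printing first makes it easy to review what would be executed.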






