Personal notes on MogileFS and FastDFS
MogileFS and FastDFS are two open source distributed file systems. Both are mainly used for Internet file sharing, uploading, and downloading, that is, workloads with many uploads and downloads and few modifications. The deployment architectures of MogileFS (M below) and FastDFS (F below) are similar: both are designed to avoid a single point of failure in the cluster.
MogileFS
————————-
Official website: https://code.google.com/p/mogilefs/
Basic architecture: tracker server (tracker + database) + storage server
[Components of MogileFS]
1. Database (MySQL)
The database can be initialized with the mogdbsetup program. It holds all of the MogileFS metadata. It can run on a dedicated server or alongside other programs. The database is critical, much as the authentication center is in a mail system: if it goes down, the whole MogileFS system becomes unusable. It is therefore best to deploy it in an HA configuration.
2. Storage server (storage node)
Starting the mogstored program turns a machine into a storage node. At startup it reads /etc/mogilefs/mogstored.conf by default; refer to the configuration section for the settings. Once mogstored is running, the machine can be added to the cluster with mogadm. A machine can run mogstored alone, as a dedicated storage node, or run other programs at the same time.
3. Tracker server (tracker)
mogilefsd is the tracker program. According to the MogileFS wiki, trackers do a lot of work: replication, deletion, query, reaper, monitor, and so on. All mogadm and mogtool operations go through the trackers, and some client operations also need the trackers to be defined, so it is best to run several trackers at once for load balancing. A tracker can run on a machine by itself or alongside other programs, as long as its configuration file is set up; the default is /etc/mogilefs/mogilefsd.conf.
4. Tools
The main tools are mogadm and mogtool, which control the whole MogileFS system from the command line, view its status, and so on.
To call the interface from a different language, you need to do your own custom development.
5. Client
The client is actually a Perl module (.pm). Programs call this module to use MogileFS, reading from and writing to the whole system.
[Logical principle]
Every file upload and read goes through the front-end tracker server. On receiving a client request, the tracker server queries the database and returns the address of an available back-end storage server for the upload or read. The client then operates on that storage server directly. An upload returns success or failure; a read returns the requested data.
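The request flow described above can be sketched as a toy in-memory model. This is only an illustration of the two-step pattern (ask the tracker, then talk to storage directly); every class and function name here is hypothetical and none of it is the MogileFS API. For simplicity the location is recorded when the tracker picks a node, which is an assumption of this sketch.

```python
# Toy model of the tracker / database / storage interaction.
# All names are hypothetical; this is not the MogileFS client API.

class Database:
    """Stands in for the MySQL metadata store: key -> storage node name."""
    def __init__(self):
        self.locations = {}

class StorageNode:
    """Stands in for a storage node that holds the file bytes."""
    def __init__(self, name):
        self.name = name
        self.files = {}

class Tracker:
    def __init__(self, db, nodes):
        self.db = db
        self.nodes = {n.name: n for n in nodes}

    def pick_for_upload(self, key):
        # Assumed rule: choose the node with the fewest files,
        # then record the choice in the database.
        node = min(self.nodes.values(), key=lambda n: len(n.files))
        self.db.locations[key] = node.name
        return node

    def locate(self, key):
        # Query the database and return the node that holds the key.
        return self.nodes[self.db.locations[key]]

def upload(tracker, key, data):
    node = tracker.pick_for_upload(key)  # step 1: ask the tracker
    node.files[key] = data               # step 2: write directly to storage
    return True                          # upload reports success or failure

def read(tracker, key):
    node = tracker.locate(key)           # step 1: ask the tracker
    return node.files[key]               # step 2: read directly from storage

db = Database()
tracker = Tracker(db, [StorageNode("store1"), StorageNode("store2")])
upload(tracker, "photo.jpg", b"\xff\xd8")
upload(tracker, "notes.txt", b"hello")
```

Note that the file bytes never pass through the tracker in this model; it only answers the question of where to go.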
====================================
FastDFS
————————-
Official website: https://code.google.com/p/fastdfs/
Basic architecture: tracker server + storage server
[Components of FastDFS]
1. Storage server
In other file systems this is commonly called a Trunk server or data server. The storage server stores files directly on the OS file system. FastDFS does not split files into blocks: each file uploaded by a client corresponds one-to-one to a file on the storage server.
2. Tracker server
The tracker server is the central node; its main role is load balancing and scheduling. It keeps information such as groups and storage server status in memory, does not record file index information, and uses very little memory. When a client (application) or storage server contacts the tracker server, it scans the in-memory group and storage server information and then gives its answer.
[Logical principle]
The storage server is the active party: once its service is up, it reports its status and related information to its tracker at regular intervals (the interval is configurable). The tracker server simply records each server's IP under its group, so that it can return a server IP directly on a read. The tracker server keeps a list of servers for each group, and the storage servers in a list back each other up in real time.
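A minimal sketch of that bookkeeping, assuming the tracker keeps a per-group table of the last report time for each storage IP. The class name, the method names, and the idea of dropping a server after a timeout are assumptions made for illustration; they are not taken from the FastDFS source.

```python
import time

class TrackerState:
    """In-memory table: group name -> {storage IP: last report time}.
    Hypothetical structure, not the real FastDFS tracker."""

    def __init__(self, timeout_seconds):
        # Assumed rule: a server that has not reported for this long is skipped.
        self.timeout_seconds = timeout_seconds
        self.groups = {}

    def heartbeat(self, group, ip, now=None):
        # Called when a storage server reports in; it is the active party.
        now = time.time() if now is None else now
        self.groups.setdefault(group, {})[ip] = now

    def alive_servers(self, group, now=None):
        # Return the IPs in the group whose last report is recent enough.
        now = time.time() if now is None else now
        servers = self.groups.get(group, {})
        return sorted(ip for ip, seen in servers.items()
                      if now - seen <= self.timeout_seconds)

state = TrackerState(timeout_seconds=60)
state.heartbeat("group1", "10.0.0.11", now=1000)
state.heartbeat("group1", "10.0.0.12", now=1050)
```

Because nothing about individual files is stored here, the table stays small no matter how many files the cluster holds.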
With a single tracker and multiple storage servers, the upload operation comes first:
The client sends the upload request directly to the tracker server. The tracker applies its own set of rules (configurable) and replies with the IP of a storage server that can accept the upload. The client uploads the file to that storage server, which completes the operation.
Read operation:
The client sends the URL it wants to GET. The tracker looks up the storage servers by the group in the URL and returns the IP address of a server that can be accessed.
The client then accesses the specified storage server directly. That storage server runs an HTTP service such as Nginx with the FastDFS module loaded, and the domain name redirect must be configured in advance; this completes the file read.
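The two lookups above can be sketched as follows. The round-robin upload rule is only one possible choice of the tracker's configurable rules and is an assumption here, as are the class name, the method names, and the example file ID; the sketch only relies on the group name being the first path component, as the text describes.

```python
import itertools

class Tracker:
    """Hypothetical tracker that only knows group -> storage IPs."""

    def __init__(self, groups):
        self.groups = groups
        # One round-robin iterator per group (assumed upload rule).
        self._next = {g: itertools.cycle(ips) for g, ips in groups.items()}

    def storage_for_upload(self, group):
        # Reply with the IP of a storage server the client may upload to.
        return next(self._next[group])

    def storages_for_read(self, file_id):
        # The group is the first path component of the file ID or URL path.
        group = file_id.lstrip("/").split("/", 1)[0]
        return self.groups[group]

demo = Tracker({"group1": ["10.0.0.11", "10.0.0.12"]})
upload_target = demo.storage_for_upload("group1")
```

In both cases the tracker hands back only an address; the client moves the file data to or from the storage server itself.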
System architecture design with multiple trackers and multiple storage servers
According to the FastDFS architecture description, both trackers and storage servers can be scaled horizontally, and there is no standard ratio for pairing trackers with storage servers. My personal advice is to keep 2-3 storage servers in each group, paired with 2 trackers.
For a cluster, the system architecture could be designed roughly as follows:
2 Nginx servers at the very front, handling client read requests. Their main role is load balancing with hot standby. It is best to add location settings to the Nginx configuration so that requests are forwarded directly to the matching storage servers according to the group name.
2 tracker servers, which distribute write operations and can also act as hot standbys for each other. The tracker setting in each back-end storage server's configuration must list both servers.
N storage servers, with 2-3 servers per group. Start small and size the cluster by data volume; to expand, simply add a new group and its storage servers. Only the front-end Nginx configuration then needs to change, and nothing else has to be adjusted.
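The routing idea behind this layout can be sketched in Python rather than as Nginx configuration. The mapping, the function names, and the IP addresses below are all hypothetical; the point is only that the front end needs one table keyed by group name, and that expanding the cluster means adding one entry to it.

```python
# Hypothetical front-end routing table: group name -> storage servers.
GROUP_UPSTREAMS = {
    "group1": ["10.0.0.11", "10.0.0.12"],
    "group2": ["10.0.0.21", "10.0.0.22"],
}

def upstreams_for(path):
    """Return the storage servers for a path such as /group1/M00/00/00/a.jpg."""
    group = path.lstrip("/").split("/", 1)[0]
    return GROUP_UPSTREAMS.get(group, [])

def add_group(name, servers):
    """Expanding the cluster: register a new group with the front end."""
    GROUP_UPSTREAMS[name] = list(servers)

add_group("group3", ["10.0.0.31", "10.0.0.32"])
```

Existing groups are untouched when a group is added, which matches the claim that only the front-end configuration changes.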
====================================
MogileFS vs FastDFS
[Similarities]
1. The architectures are similar: both are clusters made up of tracker and storage parts, and both are easy to scale horizontally.
2. On the storage server side, if a machine goes down or a hard disk is damaged, repair is completed automatically.
3. Neither architecture has a single point of failure by design, and the cluster servers do not need RAID. This avoids the problem seen in Hadoop-like designs, where an outage of the front-end MapReduce node brings down the entire system.
4. Neither can split a large file across machines (Hadoop can), so if a single file exceeds the storage space of one physical machine, these systems cannot store it.
5. The on-disk storage format is not the original file, so even after logging on to a file server you cannot get at the data in the system directly; it must be obtained through the provided interface.
[Differences]
1. F has no database, whereas M must use a database to store the address of each file. In F, however, a relative address is returned after each upload. The application has to save it in a data source of its own so that it can find the file's location in the file system on the next read. This means that in F this data is independent of the file system and can be designed by the user, whereas in M it is part of the file system.
2. F is written in C and M in Perl, so F has the edge in performance.
3. F communicates directly over sockets, which is more efficient than the HTTP used by M. F also transfers files with sendfile, which is zero-copy, so overhead is lower and file transfer is more efficient.
4. Since V1.14, F can keep only one copy of files with identical content, which saves storage space and improves file access performance. M does not have this feature.
5. F has the concept of a group. Each group is a storage cluster, and the storage servers within a group replicate data to each other. M has no groups, but it introduces classes to distinguish how files are stored. For example, with the configuration in Appendix 01, each file can be assigned a class at upload time, which controls how many copies are kept.
6. F is developed by a Chinese developer, so questions can be put directly to the author (QQ group: 212801927). For M, finding relevant information is more troublesome.
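Point 1 above implies that an application using F has to keep its own record of the returned addresses. A minimal sketch of such a record, using SQLite from the Python standard library: the table layout, the function names, and the example file ID are assumptions for illustration, and the upload call itself is left out.

```python
import sqlite3

# The application's own data source: logical name -> address returned by F.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (name TEXT PRIMARY KEY, file_id TEXT NOT NULL)"
)

def remember(name, file_id):
    """Save the relative address returned after an upload."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, file_id) VALUES (?, ?)",
            (name, file_id),
        )

def lookup(name):
    """Find the stored address for the next read, or None if unknown."""
    row = conn.execute(
        "SELECT file_id FROM files WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical address, standing in for whatever the upload returned.
remember("avatar-42", "group1/M00/00/00/example.jpg")
```

If this record is lost, the files are still on the storage servers but the application no longer knows where, so it deserves the same care as the database in M.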
Appendix 01
mogadm class list
domain               class                mindevcount   replpolicy
-------------------- -------------------- ------------- ------------
toast                byhost               2             MultipleHosts()
toast                default              2             MultipleHosts()
toast                four                 4             MultipleHosts()
toast                fourbynamenet        1             HostsPerNetwork(near=2,far=1)

mogadm class add toast twoontwonets --replpolicy "HostsPerNetwork(near=2,far=2)"
mogadm class modify toast twoontwonets --replpolicy "HostsPerNetwork(near=3,far=3)"
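As a rough illustration of what the mindevcount column means, the sketch below computes how many more copies a file needs before it satisfies its class. The class names and counts are copied from the listing above; the function name and the device names are hypothetical, and the replication policy column is ignored here.

```python
# (domain, class) -> mindevcount, copied from the class list above.
CLASSES = {
    ("toast", "byhost"): 2,
    ("toast", "default"): 2,
    ("toast", "four"): 4,
    ("toast", "fourbynamenet"): 1,
}

def copies_missing(domain, cls, devices_holding_file):
    """Number of additional devices needed to reach the class's mindevcount."""
    wanted = CLASSES[(domain, cls)]
    have = len(set(devices_holding_file))  # count each device once
    return max(0, wanted - have)

missing = copies_missing("toast", "four", ["dev1", "dev2"])
```

Choosing a class at upload time is therefore a way of choosing how many copies of that file the system should keep.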
1 comment
- Terry, 2014/01/23
5. The on-disk storage format is not the original file, so even after logging on to a file server you cannot get at the data in the system directly; it must be obtained through the provided interface.
————————————————————————
FastDFS stores files in their original form and cannot split files.