Introduction to Distributed File Systems

1. The origin of the story

Many years ago, I ran a small system, a small system related to payments. Because it was small, everything was simple: one application server and one database server, with files and images stored directly on the application server. Everything was so plain that it was all taken for granted.

Then suddenly one day, payment became a fashionable topic, and the platform was about to become the core system of a new payment company. As a result, traffic to the system was about to soar. All of it came out of nowhere......

2. The simplified system architecture after adjustment

The front end uses a load-balancing device for unified scheduling; the middle tier scales the application servers horizontally; and the database at the back end is upgraded. For file storage, the back end first used a multi-layer NFS architecture, but that was still not enough, and a distributed file system became the inevitable choice. Once the distributed file system was adopted, data access between servers was no longer a one-to-many relationship but a many-to-many one, and performance improved greatly, with no more problems.

3. Introduction to Distributed File Systems

With a distributed file system, you can easily locate and manage shared resources on the network, access every resource through a unified namespace, get reliable load balancing, combine with file replication to gain redundancy across multiple servers, and integrate with system permissions to keep data secure.

In a distributed environment there are too many accidents: data transmission can go wrong at any time, and any server may die at any moment. Many failures are so common that they can hardly be called "exceptions" any more; they must be handled as routine. So for a distributed file system, providing full file-system service under normal conditions is not enough. It must also keep serving, healthily and continuously, through all kinds of distributed mishaps. Otherwise it is useless.

1. Server Error Recovery

In a distributed environment, servers die all the time. Dying itself is not what is terrible; what is terrible is dying when nothing is prepared for it. A qualified distributed system is always prepared: for the failure of each kind of server, it needs a corresponding emergency policy and recovery plan.

Client

In a distributed file system, the least important server is probably the client. After all, as a mere user of the file system, its standing in the whole system is naturally not high. Most of the time, when a client dies, nobody mourns and nobody sympathizes. But when it departs this world at an awkward moment, in the middle of a write (the machine goes down, the network disconnects, and so on...), it causes real trouble. At that moment, the corresponding file on the master server is still open for writing, and the dead client, as the holder of that file, keeps all other clients locked out. So once the client holding a file dies, the file stays occupied and is never released. There has to be a way to solve this problem. That way is the lease...

A lease is a short-term contract that the client signs with the master server when it needs to hold a certain file. The contract has a term, and within that term the client can extend it. Once the term expires, the master server forcibly terminates the lease and is free to grant the right to use the file to someone else...

Before opening or creating a file and preparing to append to it, the client signs a lease with the master server for the given path. The client then polls periodically to renew the lease. On the master server's side, all leases are checked regularly for any that have not been renewed. If everything goes well, the client closes the file and the lease ends normally. If there is an accident, for example the file is deleted or the client dies, the master server revokes the lease, so that resources are not held forever by a client that went down...
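A minimal sketch of this lease flow in Python. The class name LeaseManager, the 60-second term, and the method names are all illustrative assumptions, not taken from any particular file system:

```python
import time

LEASE_TERM = 60.0  # illustrative term in seconds; real systems tune this

class LeaseManager:
    """Master-side bookkeeping: which client currently holds which path."""

    def __init__(self):
        self.leases = {}  # path -> (client_id, expiry_timestamp)

    def acquire(self, client_id, path):
        """Grant a lease if the path is free or the old lease has expired."""
        holder = self.leases.get(path)
        if holder and holder[1] > time.time() and holder[0] != client_id:
            raise RuntimeError(f"{path} is leased to {holder[0]}")
        self.leases[path] = (client_id, time.time() + LEASE_TERM)

    def renew(self, client_id, path):
        """Called by the client's periodic poll to extend its term."""
        holder = self.leases.get(path)
        if not holder or holder[0] != client_id:
            raise RuntimeError("lease lost; stop writing and reopen")
        self.leases[path] = (client_id, time.time() + LEASE_TERM)

    def release(self, client_id, path):
        """Normal close: the client gives the lease back."""
        if self.leases.get(path, (None,))[0] == client_id:
            del self.leases[path]

    def expire_stale(self):
        """Master-side sweep: revoke leases whose holders went silent."""
        now = time.time()
        for path in [p for p, (_, exp) in self.leases.items() if exp <= now]:
            del self.leases[path]  # the file is now free for other clients

# Usage: a client that crashes simply stops calling renew();
# the next expire_stale() sweep frees the file for everyone else.
mgr = LeaseManager()
mgr.acquire("client-A", "/pay/orders.log")
mgr.renew("client-A", "/pay/orders.log")
mgr.release("client-A", "/pay/orders.log")
```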

File Server

A horde of file servers is an even bigger source of instability. If a file server fails and the master server does not know it, the master unwittingly cheats its clients: it hands them lists of read/write servers they cannot connect to, leaving them stuck at every turn. To keep the whole system stable, the data servers must therefore report to the master server constantly, so that the master always has a full, current picture of them. This mechanism is the heartbeat message. Each file server keeps reporting its status to the master server: how much space is free, how much is used, and so on. The master server then uses the reported status as the basis for allocating new data blocks and for load balancing.
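A hedged sketch of the master's heartbeat-driven view. The interval, the silence threshold, and all names here are assumptions for illustration:

```python
import time

HEARTBEAT_INTERVAL = 3.0                # illustrative interval, not from the article
DEAD_AFTER = 10 * HEARTBEAT_INTERVAL    # silence threshold before a server is presumed dead

class MasterView:
    """Master-side picture of the cluster, built purely from heartbeats."""

    def __init__(self):
        self.servers = {}  # server_id -> (last_seen, capacity_bytes, used_bytes)

    def on_heartbeat(self, server_id, capacity_bytes, used_bytes):
        self.servers[server_id] = (time.time(), capacity_bytes, used_bytes)

    def live_servers(self):
        """Only servers heard from recently are candidates for new blocks."""
        now = time.time()
        return [sid for sid, (seen, _, _) in self.servers.items()
                if now - seen < DEAD_AFTER]

    def pick_least_loaded(self):
        """Use the reported space as the basis for new block allocation."""
        live = self.live_servers()
        return min(live, key=lambda s: self.servers[s][2] / self.servers[s][1])

view = MasterView()
view.on_heartbeat("ds-1", capacity_bytes=10**12, used_bytes=4 * 10**11)
view.on_heartbeat("ds-2", capacity_bytes=10**12, used_bytes=1 * 10**11)
print(view.pick_least_loaded())  # -> "ds-2", the emptier server
```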

Master Server

The master control server is the core of the whole distributed file system. As the core, it is also the system's single point of failure: once the master goes down, the whole distributed file service cluster is completely paralyzed. So after the master dies, how to promote a new master server and get it into its working role quickly becomes a problem the system must solve. The solution is the transaction log.

If you are familiar with databases, you will recognize where this idea comes from. On the master server, every key step of every file-directory operation is written to a log (the locations of the data servers holding the actual file content are not logged, because those are built up dynamically...). In addition, at certain points in time the master server serializes the current file directory to local storage; this is called an image. Once an image is saved, the logs written before it, and any earlier images, become pure redundancy: their historical mission is complete and they can be retired and deleted. After the master server dies, or after a clean shutdown and restart, it reconstructs the whole file directory from the most recent image plus all logs written after that image, quickly restoring service to the level before the crash...
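A toy version of the image-plus-log recovery just described; the JSON format, operation names, and file paths are all illustrative assumptions:

```python
import json

class Namespace:
    """A toy file directory: path -> list of block IDs. Block locations are
    deliberately absent; the master relearns those from data-server reports."""

    def __init__(self):
        self.tree = {}

    def apply(self, op):
        # Every namespace mutation is one journaled operation.
        if op["kind"] == "create":
            self.tree[op["path"]] = []
        elif op["kind"] == "add_block":
            self.tree[op["path"]].append(op["block"])
        elif op["kind"] == "delete":
            self.tree.pop(op["path"], None)

def checkpoint(ns, image_path):
    """Serialize the current directory; logs older than this become redundant."""
    with open(image_path, "w") as f:
        json.dump(ns.tree, f)

def recover(image_path, log_ops):
    """Crash recovery: load the latest image, then replay newer log entries."""
    ns = Namespace()
    try:
        with open(image_path) as f:
            ns.tree = json.load(f)
    except FileNotFoundError:
        pass  # no image yet: replay the full log from the beginning
    for op in log_ops:
        ns.apply(op)
    return ns

# Usage: only operations after the last checkpoint need replaying on restart.
ns = recover("image.json", [{"kind": "create", "path": "/a"},
                            {"kind": "add_block", "path": "/a", "block": "blk_1"}])
print(ns.tree)  # {'/a': ['blk_1']}
```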

2. Data correctness

In a complex distributed environment, we firmly believe anything can happen. Even when every server works properly, network transmission can still lose or corrupt data in all sorts of ways. In a distributed file system, the data of one file is stored with heavy redundancy, and the system must keep all those copies fully in sync. Otherwise users will go crazy reading different data from the same file on different clients.

In fact, before we adopted a distributed system, our own setup ran into data inconsistency many times. In serious cases, company leaders paid close attention, yet like weeds with the roots still in the soil, the problem kept growing back however often we cut it down. What could we do? Without weighing the problem from the overall perspective, it could never be solved completely....

To keep the same data correct and consistent everywhere, the distributed file system must do a lot of work. First, every data block carries a version number. Whenever the data in a block changes, the version number is increased. The master server records the current version of every block, and whenever the version reported by a data server disagrees with it, a recovery process is triggered. This mechanism keeps the blocks on the data servers consistent in the large. But because networks are messy, version information alone cannot guarantee that the actual contents match: the version number says nothing about the content, so two copies can share a version yet differ in content. To guarantee the contents themselves are consistent, a signature computed from the content is needed...
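The version check itself is simple bookkeeping. A sketch, with made-up block IDs and versions:

```python
# Master-side view: the authoritative version of each block.
master_versions = {"blk_1": 7, "blk_2": 3}

# What one data server reports (e.g. in a block report after a restart).
reported = {"blk_1": 7, "blk_2": 2}   # blk_2 missed an update while down

def stale_blocks(master_versions, reported):
    """A version mismatch triggers recovery: re-replicate the block from
    an up-to-date copy and discard the stale one."""
    return [blk for blk, v in reported.items()
            if master_versions.get(blk, -1) != v]

print(stale_blocks(master_versions, reported))  # -> ['blk_2']
```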

When the client sends an appended data packet down the chain of data servers, the data in each packet is split into chunks, and the chunk is the basic unit of signature verification. When the packet reaches the last stage of the pipeline, that data server verifies it: if the signature of any transmitted chunk disagrees with the signature computed by the client, the whole packet write is considered invalid and the entire process must be repeated.
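A sketch of that chunk-level signing and verification, using CRC32 as the signature; the 512-byte chunk size is an illustrative assumption (HDFS happens to checksum 512-byte chunks by default, but the article does not name a system):

```python
import zlib

CHUNK_SIZE = 512  # illustrative chunk size, the basic unit of verification

def sign_packet(payload: bytes):
    """Client side: split the packet into chunks and checksum each one."""
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    return [(c, zlib.crc32(c)) for c in chunks]

def verify_packet(signed_chunks):
    """Last data server in the pipeline: recompute every chunk checksum.
    One mismatch invalidates the whole packet, forcing a resend."""
    return all(zlib.crc32(c) == sig for c, sig in signed_chunks)

packet = sign_packet(b"x" * 2000)
assert verify_packet(packet)             # a clean transfer passes
corrupted = [(b"y" + c[1:], sig) for c, sig in packet]
assert not verify_packet(corrupted)      # any corrupted chunk fails the packet
```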

3. Internal load balancing of Distributed File Systems

The load balancing discussed here is balancing in a broad sense. It involves two phases: one is to distribute tasks as reasonably as possible when they are first assigned; the other is to keep monitoring afterwards and adjust in time.

Load balancing is an eternal topic in distributed systems: getting every machine to share the work and play to its own strengths matters. It is also a genuinely hard problem, because "balanced" is a fuzzy notion. For example, in a distributed file system, if three hundred data blocks are spread evenly across ten data servers, is that balanced? Not necessarily. Each block needs several replicas, and the placement of each replica has to take rack locations into account: servers in the same rack talk to each other faster, replicas spread across different racks are safer, and replicas spread across different data centers are safer still, but their response times are harder to control.
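A sketch of such rack-aware placement, loosely modeled on the common "one replica near the writer, the rest on another rack" trade-off; the server names, rack map, and replica count are illustrative assumptions:

```python
# Hypothetical cluster topology: server -> rack.
servers = {
    "ds-1": "rack-A", "ds-2": "rack-A",
    "ds-3": "rack-B", "ds-4": "rack-B",
}

def place_replicas(writer_server, n=3):
    """First replica near the writer (fast: same rack, same machine);
    remaining replicas on a different rack (safer if a whole rack
    loses power or its switch)."""
    local_rack = servers[writer_server]
    replicas = [writer_server]
    remote = [s for s, r in servers.items() if r != local_rack]
    replicas += remote[:n - 1]
    return replicas

print(place_replicas("ds-1"))  # -> ['ds-1', 'ds-3', 'ds-4']
```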

4. Garbage Collection

Taking out the garbage sounds simple, but actually throwing things away is not easy. When I clean up at home, there is always a pile of stuff I am sure I will never use again, but my wife (the master server) and I never reach an agreement: I want to throw a lot of it out, and she says keep it, wait a few days and see. Many times, a few days pass, then maybe a few years, and you just wait for the next big cleanup, or until there is no room left in the house.

In a distributed file system, any data-block replica with no remaining value is garbage. Essentially, all garbage falls into two kinds. The first is produced by the system's normal logic: for example, when a file is deleted, all its data blocks become garbage; when a block is moved by the load balancer, the original copy becomes garbage. The defining feature of this kind is that the master server is itself the culprit that created it, which means the master knows exactly which garbage needs handling. The second kind is caused by system anomalies. For example, a file server goes down for a while, and after it restarts, it turns out that a block it holds was re-replicated onto another server in the meantime; the copy it still carries is out of date and worthless, and must be treated as garbage. This kind is the opposite in nature: the master server cannot directly know about it, so extra policies are needed, for example treating the data on the failed file server as garbage and reconciling all of that server's blocks against the master's records. Such garbage can be cached first and deleted only after a few days, once it is clear nobody wants it restored.
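A sketch distinguishing the two kinds of garbage just described; the data structures and the reconciliation rule are illustrative assumptions, not a specific system's protocol:

```python
def classify_garbage(master_blocks, server_blocks, server_id, moved_away):
    """Two kinds of garbage, mirroring the text:
    1) blocks the master itself orphaned (deleted files, balancer moves) --
       the master already knows these and can delete them outright;
    2) blocks a recovered server still holds that were re-replicated
       elsewhere while it was down -- found only by comparing the server's
       block report against the master's current block map."""
    known_garbage = set(moved_away)  # master-made garbage
    stale = {blk for blk in server_blocks
             if server_id not in master_blocks.get(blk, set())}
    return known_garbage, stale  # stale copies are cached, then reaped later

# Master's map: block -> servers that *should* hold it.
master_blocks = {"blk_1": {"ds-2", "ds-3"}, "blk_2": {"ds-1"}}
known, stale = classify_garbage(master_blocks,
                                server_blocks={"blk_1", "blk_2"},
                                server_id="ds-1",
                                moved_away={"blk_9"})
print(known, stale)  # ds-1's copy of blk_1 was re-replicated while it was down
```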

4. Summary

Looking at the entire distributed file system: three kinds of servers; a master control server that is both the core and the SPOF (single point of failure); a log-based recovery mechanism; and a lease-based keep-in-touch mechanism. All of these can also be seen in distributed computing systems and distributed databases. What is most distinctive about a distributed file system is the redundant storage of file blocks, which directly leads to a complicated write path.

Having written so much and walked through so many exciting concepts: building your own distributed file system is a fine idea, but it is also a real challenge. Unless you can summon great determination and spend untold money and time, choose one of the many existing distributed file systems instead.
