Summary of the distributed blob storage system TFS

Source: Internet
Author: User
Problems solved

The total number of files is far too large for a single server to store. The files can only be spread across the nodes of a cluster, which together act as "one big CPU" and "one big disk"; the client simply performs read/write/modify operations against that abstraction. However, a single general-purpose system cannot satisfy every workload: systems designed around a specific workload tend to be considerably more efficient. [Personal opinion]

Personal understanding of TFS

The GFS architecture, with a central master node and slave (chunkserver) nodes, has shaped the design of many distributed systems. Since GFS can withstand Google's file-storage workload, an ordinary company that designs and implements a similar architecture should be able to meet its own business needs well.

Now for TFS itself. Start with the business requirement: roughly 20 billion images must be stored, and the system must be able to locate any image file quickly and support reads and writes. Assume each image's metadata takes 64 bytes; the metadata alone is then 20 billion × 64 bytes ≈ 1.28 TB, more than 1 TB and enough to fill a host's disk. Worse, if this metadata sits in an ordinary file system, the full directory data cannot be cached in memory, so a single lookup may require around three disk reads, which is very inefficient. Let's see how TFS is designed instead:
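
As a quick sanity check on that figure (the 64-byte metadata size is the assumption from above, and the numbers are only back-of-the-envelope):

    # Back-of-the-envelope metadata sizing (64 bytes per image is the assumption above).
    num_images = 20_000_000_000             # 20 billion images
    meta_bytes = 64
    total_bytes = num_images * meta_bytes   # 1.28e12 bytes
    print(total_bytes / 1000**4, "TB")      # ~1.28 TB -- too much to keep in one host's RAM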

Basic information of the file system

To locate a file through an ordinary file system, you need the file's directory path (to know where the file lives) and its size (to know how many blocks to read); with that information the OS can read the file from disk.

Core Mechanism

Many small files share a single physical file. By controlling the size of these physical files, a single host can hold the meta information of all the small files in memory, so metadata is no longer the bottleneck. The next question is how TFS actually implements this sharing.

Call the physical files Block 1 ~ Block N; each block is tens of megabytes in size (64 MB in the examples below).

An image has an external file name, picname; TFS assigns it a unique 64-bit ID.

The TFS client asks the nameserver to write the small file picname, and the nameserver hands back a TFS "internal file name" that tells the client where to write. That internal name is really just an encoding:

[Figure: structure of the TFS internal file name]

It contains the block ID and a file sequence ID within that block. The corresponding meta information is maintained on the master, so by controlling the block size the master controls how many files the whole TFS cluster can support; writes are appended inside a block. Once a write to TFS succeeds, what should be returned for the application's ordinary file name? A database is needed to record the relationship between the two names, that is, to record where these files live; a simple key-value system would do for this mapping.
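
A minimal sketch of the idea, not TFS's actual encoding: pack a block ID and a file sequence ID into a single 64-bit value, assuming 32 bits for each half.

    # Hypothetical split: high 32 bits = block ID, low 32 bits = file sequence ID.
    # TFS's real internal file names use their own encoding; this only shows the idea.
    def encode_tfs_id(block_id: int, file_seq_id: int) -> int:
        return (block_id << 32) | (file_seq_id & 0xFFFFFFFF)

    def decode_tfs_id(tfs_id: int) -> tuple[int, int]:
        return tfs_id >> 32, tfs_id & 0xFFFFFFFF

    tfs_id = encode_tfs_id(block_id=1024, file_seq_id=7)
    assert decode_tfs_id(tfs_id) == (1024, 7)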

Now let's look at what the master actually stores. For the master, the unit is the block: a map <block_id, meta> holds the meta information, one entry per block rather than per small file, which solves the master node's memory shortage. For the client, whether reading or writing, it only needs to ask the master which dataserver holds the block and at what offset to operate; if the client caches that answer, it can go straight to the corresponding dataserver next time. So this design holds up well.
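
A sketch of the kind of structure the master might hold; the field names (dataservers, version, used_bytes) are assumed here purely for illustration:

    from dataclasses import dataclass

    @dataclass
    class BlockMeta:
        dataservers: list[str]        # addresses of the dataservers holding replicas of this block
        version: int = 0
        used_bytes: int = 0

    # One entry per 64 MB block, not per small file, so the whole map fits in the master's RAM.
    block_map: dict[int, BlockMeta] = {
        1024: BlockMeta(dataservers=["ds-1:3200", "ds-2:3200"]),
    }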

The read process is then obvious: tfs_file name -> block_id -> meta information -> block ID plus file offset -> locate the file and read it out. Read performance should therefore be fairly high.
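
Putting the pieces together, the read path might look like the following sketch; it reuses the hypothetical decode_tfs_id and block_map from the sketches above, and read_from_dataserver is only a stand-in for the real RPC:

    def read_from_dataserver(ds: str, block_id: int, file_seq_id: int,
                             offset: int, length: int) -> bytes:
        """Placeholder for the actual network read RPC to a dataserver."""
        raise NotImplementedError

    def read_file(tfs_id: int, offset: int, length: int) -> bytes:
        block_id, file_seq_id = decode_tfs_id(tfs_id)   # from the encoding sketch above
        meta = block_map[block_id]                      # clients usually cache this lookup
        primary = meta.dataservers[0]
        # The dataserver turns file_seq_id into an offset inside its 64 MB block file
        # and returns `length` bytes starting at `offset` within that small file.
        return read_from_dataserver(primary, block_id, file_seq_id, offset, length)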

Now let's take a look at the write process:

[Figure: TFS write flow among client, nameserver, and dataservers]

A few points. For TFS, writes follow a single-threaded model: all writes to a block are queued and executed one after another, never concurrently. The assumption is that writes are a small fraction of the traffic and reads dominate, so a slower write path has little impact, and the design is easy to implement. The primary dataserver (the master for a block) exchanges control information with the nameserver; if the nameserver had to take part in every interaction it would slow down, so replication is driven by the primary dataserver pushing to the replica dataservers. Only after every step completes is success returned to the client; if any step fails, the whole write is redone, which is quite expensive.
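
A sketch of that write discipline, with all names hypothetical rather than taken from the TFS code: writes to a block are queued and applied one at a time, the primary forwards each write to its replicas, and success is reported only when every copy has acknowledged.

    import queue

    class BlockWriter:
        """Serializes writes to one block; success only when every replica acknowledges."""

        def __init__(self, replica_dataservers: list[str]):
            self.replicas = replica_dataservers      # primary dataserver first, then the others
            self.pending: "queue.Queue[bytes]" = queue.Queue()

        def submit(self, data: bytes) -> bool:
            self.pending.put(data)                   # writes queue up; there is no concurrent writing
            return self._drain()

        def _drain(self) -> bool:
            while not self.pending.empty():
                data = self.pending.get()
                # The primary appends locally and forwards to each replica; if any copy
                # fails, the whole write is redone, which is why failures are costly.
                if not all(self._append(ds, data) for ds in self.replicas):
                    return False
            return True

        def _append(self, dataserver: str, data: bytes) -> bool:
            """Placeholder for the real append RPC to one dataserver."""
            return True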

TFS fault tolerance: disaster recovery and scaling

If the dataservers in a cluster run short of capacity, it is reasonable to expand automatically by adding nodes. Likewise, if a host fails, its replicas should be copied onto other hosts, and this is handled automatically by the master. The master itself should run as a master/slave pair holding identical state, with automatic failover when the master node dies. This matters a great deal but is comparatively simple; a synchronization mechanism is needed between the master and the slave master, otherwise their state can become inconsistent.

The master keeps heartbeat information for every dataserver. What if a dataserver fails to report within the expected interval? The data migration mechanism kicks in. To quickly find which blocks lived on that host, the master should also maintain a map <dataserver, block_ids>, which makes the lookup much faster. New dataservers for those blocks are then chosen from the remaining nodes according to each node's load and free capacity.
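
A sketch of this failure-detection and migration bookkeeping; the timeout value, the structure names, and the placement rule (most free space wins) are all assumptions for illustration, not TFS's real policy:

    import time

    HEARTBEAT_TIMEOUT = 30.0                  # seconds; an assumed value, not TFS's real setting
    BLOCK_SIZE = 64 * 1024 * 1024

    last_heartbeat: dict[str, float] = {}     # dataserver -> time of its last report
    blocks_on: dict[str, set[int]] = {}       # dataserver -> IDs of the blocks it holds
    free_bytes: dict[str, int] = {}           # dataserver -> remaining capacity

    def dead_dataservers(now: float) -> list[str]:
        return [ds for ds, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]

    def plan_migration(dead_ds: str, live: list[str]) -> dict[int, str]:
        """For every block on the dead node, pick the live dataserver with the most free space."""
        plan: dict[int, str] = {}
        for block_id in blocks_on.get(dead_ds, set()):
            if not live:
                break
            target = max(live, key=lambda ds: free_bytes[ds])
            plan[block_id] = target
            free_bytes[target] -= BLOCK_SIZE          # reserve room for the re-replicated block
        return plan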

To guard against the entire TFS cluster going down, dual writes across two data centers are safer, but of course the cost is also higher.

Other points

Point 1: reads may be unevenly distributed. What if most clients request the same block_id at the same time? A cache is still needed, specifically a file cache, which cuts down on disk reads. Of course, if the data has no hot/cold skew, maintaining the cache costs more than it saves, so the application's data should have some degree of hot/cold distinction. And as long as a sudden flood of irrelevant data does not pollute the cache pool, it works; cache pollution would be a bit troublesome. These are questions to discuss once the cache is in place.
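
One simple way to exploit that hot/cold skew is an LRU file cache in front of block reads; the sketch below is generic, not TFS's actual cache:

    from collections import OrderedDict

    class FileCache:
        """A small LRU cache keyed by (block_id, file_seq_id): hot files stay, cold ones get evicted."""

        def __init__(self, max_entries: int = 1024):
            self.max_entries = max_entries
            self._data: OrderedDict[tuple[int, int], bytes] = OrderedDict()

        def get(self, key: tuple[int, int]) -> bytes | None:
            if key not in self._data:
                return None
            self._data.move_to_end(key)               # mark as recently used
            return self._data[key]

        def put(self, key: tuple[int, int], value: bytes) -> None:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.max_entries:
                self._data.popitem(last=False)        # evict the least recently used entry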

Point 2: reading n bytes starting at some offset inside a 64 MB block file. This is actually a significant cost on the TFS read path: you either call lseek() followed by read(), or read the whole block into memory first (which is rather impractical). If this is handled well, read performance stays high. I do not know exactly how TFS does it; the source code is worth a careful look.
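
The two options mentioned, sketched with standard POSIX-style calls; os.pread does the seek and the read in one step, while the commented-out alternative shows the whole-block read:

    import os

    def read_range(path: str, offset: int, n: int) -> bytes:
        """Read n bytes starting at `offset` without pulling the whole block file into memory."""
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.pread(fd, n, offset)    # one syscall; equivalent to lseek(offset) + read(n)
        finally:
            os.close(fd)

    # The heavyweight alternative: read the whole ~64 MB block just to extract n bytes.
    # with open(path, "rb") as f:
    #     data = f.read()[offset:offset + n]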

Over
