DFS selection and function extension, from start to finish

Source: Internet
Author: User

A few days ago, for work reasons, I added a "batch upload" function to DFS. In addition, to make DFS files easier to manage, I added a "yyyymm" prefix to the front of the DFS directory path. This work was developed against dfs 1.21, but it should be usable with any version. Below I describe what was done, the ideas behind it, and the reasons for doing it this way.

The code was made public some time ago. If you are interested, see my earlier blog posts.

Before writing the batch upload function, I first read through the DFS source code, and I needed help from the author, most importantly on the DFS transport protocol and the parsing mechanism between server and client. I would also like to thank the DFS author for helping me solve problems while I was adding the batch upload feature.

Let's first talk about why we chose DFS. During selection we evaluated a number of related systems, such as MFS and Hadoop. We chose DFS for the following reasons:

1. DFS can meet our current needs;

2. DFS is written entirely in C, which keeps it within our control and makes it easy to extend;

3. Compared with MFS or Hadoop, it is much easier for us to contact the author of DFS;

4. DFS passed our functional and performance tests.

Now for why we added the batch upload function. Our starting point was that batch upload is genuinely needed in production. Few websites today upload a single image at a time, especially e-commerce sites: to show product details properly, they typically upload n images at once (n > 5). Network connection cost also argues for this feature: even with a connection pool, establishing connections still costs more than uploading a whole batch over a single connection to the server. So after testing DFS in a real environment, we set out to build the batch upload extension.
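The connection-cost argument can be sketched with a toy length-prefixed protocol: n files ride over one TCP connection instead of n separate ones. Note this framing (a count header, then 4-byte length + body per file) is purely illustrative and is not the real DFS wire protocol.

```python
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes from a socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def serve_one_batch(listener):
    """Stub 'storage server': on one connection, read a file count,
    then that many length-prefixed payloads, and acknowledge the count."""
    conn, _ = listener.accept()
    with conn:
        (count,) = struct.unpack("!I", recv_exact(conn, 4))
        for _ in range(count):
            (size,) = struct.unpack("!I", recv_exact(conn, 4))
            recv_exact(conn, size)            # discard the file body
        conn.sendall(struct.pack("!I", count))

def batch_upload(addr, files):
    """Upload n files over ONE connection: a count header, then each
    file as a 4-byte big-endian length followed by its bytes."""
    with socket.create_connection(addr) as s:
        s.sendall(struct.pack("!I", len(files)))
        for data in files:
            s.sendall(struct.pack("!I", len(data)))
            s.sendall(data)
        (acked,) = struct.unpack("!I", recv_exact(s, 4))
    return acked
```

The point of the sketch is that the TCP/handshake cost is paid once per batch rather than once per file, which is exactly the saving the batch upload extension is after.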

The other feature adds a "year-month" label in front of the original path. The reason is to make image management easier. Many websites, especially today's so-called Web 2.0 sites, hold an unimaginable number of images! On e-commerce sites, an image has a time attribute: after some period, the image expires. That period is usually decided by the business, but it is generally at most half a year (I don't know how long Taobao's is?). For B2C sites with listing-style orders, the period is usually shorter: once an order is purchased, the transaction is over and the life cycle of the listing's or order's images ends. Of course, I am not counting identity images here; those follow the user, so strictly speaking they are not images of the transaction process. Because images have a time attribute, managing them by time is essential. The simplest method is to insert a "yyyymm"-formatted directory into the original path. This change minimizes modifications to the DFS source code, keeps all DFS features intact, and makes DFS upgrades easy. The path then changes from groupname/data/00/00/img.jpg to groupname/data/yyyymm/00/00/img.jpg. This way we can periodically archive our "expired" (service-expired) images, which saves online storage-array resources and reduces management overhead.
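The path change above is a pure string transformation on the client side. A minimal sketch (the helper name and layout assumptions are mine, not DFS code) looks like this:

```python
from datetime import date

def with_month_prefix(path, today=None):
    """Insert a yyyymm directory after the 'groupname/data/' portion of a
    DFS-style path, e.g. group1/data/00/00/img.jpg ->
    group1/data/200906/00/00/img.jpg. Illustrative sketch only."""
    group, data, rest = path.split("/", 2)   # "group1", "data", "00/00/img.jpg"
    stamp = (today or date.today()).strftime("%Y%m")
    return "/".join([group, data, stamp, rest])
```

Archiving then reduces to moving (or deleting) whole `data/yyyymm` directories whose month is older than the business retention period, without touching the rest of the DFS layout.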

There was one more feature we had planned: a heartbeat between client and server to detect the health of the trackers in DFS. On reflection, though, the client only needs to know whether a tracker is reachable: if a tracker cannot be connected, requests to it should simply be suspended for a period of time. This can be implemented entirely on the client side, with no changes to the DFS server code. The approach: the client adds an event and a heartbeat routine (we had already built a connection pool into our DFS client). The event fires when getting a connection from the pool fails because a tracker cannot be reached; its action is to suspend all trackers in the pool that failed to connect, and then start the heartbeat. The heartbeat tries to reconnect to each suspended tracker at an interval (for example, every 5 minutes); once a connection succeeds, that tracker is marked available again, and when no trackers remain suspended, the heartbeat is cancelled. The same scheme also works with short connections, without a connection pool: just replace "connection pool" in the description above with a "suspended-tracker list".
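The suspend-and-heartbeat idea above can be sketched as a small client-side structure. All names here are illustrative (this is not the real DFS client API), and the clock is injected so the retry interval can be tested without waiting:

```python
import time

class TrackerPool:
    """Sketch of client-side tracker suspension: failed trackers are
    suspended, and after retry_interval seconds they become eligible
    for a heartbeat reconnect attempt."""

    def __init__(self, trackers, retry_interval=300, clock=time.monotonic):
        self.trackers = list(trackers)
        self.retry_interval = retry_interval
        self.clock = clock
        self.suspended = {}            # tracker -> time it was suspended

    def mark_failed(self, tracker):
        """Called when connecting to a tracker fails."""
        self.suspended[tracker] = self.clock()

    def mark_alive(self, tracker):
        """Called when a heartbeat reconnect succeeds."""
        self.suspended.pop(tracker, None)

    def available(self):
        """Trackers the client may use right now."""
        return [t for t in self.trackers if t not in self.suspended]

    def due_for_heartbeat(self):
        """Suspended trackers whose retry interval has elapsed."""
        now = self.clock()
        return [t for t, since in self.suspended.items()
                if now - since >= self.retry_interval]
```

A short-connection client would drive the same structure directly (the "suspended-tracker list" variant); a pooled client would call `mark_failed` from its get-connection path and run `due_for_heartbeat` from a timer.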

