A Brief Introduction to a Distributed Pipeline Job Architecture (Ideas and Solutions for Leader Pressure and Bottlenecks)

Source: Internet
Author: User

I have been thinking about Hadoop lately, mainly because I plan to simulate it in .NET. Over the last two days I came across the article at http://blog.csdn.net/cenwenchu79/article/details/7206804, and it seems there is more than one view of the Hadoop architecture. The author is an expert, and experts dare to question other experts, so I will follow along and add a few words of my own.

First of all, frankly speaking, I have never used Hadoop; I have only studied its mechanism. According to the article above, Hadoop's master becomes a bottleneck because it also acts as the reducer, that is, it merges the final results. Never having used Hadoop in practice, I had not noticed this problem myself. While preparing to simulate Hadoop, though, I came to think that the master's role should be kept relatively simple: if its job-division duties (such as cutting up large files) involve heavy computation and data traffic, the master easily becomes the performance bottleneck of the distributed computing system. By analogy with how a real enterprise operates, a distributed pipeline job cluster (a Hadoop cluster included) is equivalent to a contract-processing company with sales, production, warehousing, management, human resources and other roles (a real company also needs finance, but that is not needed here):
Sales is responsible for accepting orders and delivering the finished work. Production is responsible for the actual computation (map, reduce and so on, but not limited to these; anything that fits the characteristics of a production flow qualifies). Warehousing is responsible for storing raw materials, semi-finished products and finished products, that is, the file system or database, and can be handled by a specialized file system. Management is responsible for centralized coordination. To avoid the problem of multiple bosses, management is divided into two roles: the global manager (global leader) and the project manager (transaction leader; each order is equivalent to one project). The global manager (master) is the general manager of the company, responsible for command and coordination, and there is only one. Project managers are responsible for managing specific transactions; there is one per order, so there can be several at a time. Besides command and coordination, the master also takes on human resources, which here means tracking the number and status of producers, project managers, warehouses and other roles. From what I have learned about Hadoop, its master plays four roles: sales, management, producer and human resources. With so many responsibilities it naturally becomes a bottleneck.

We know that in enterprise management, multiple bosses and overlapping responsibilities must be avoided. Of course, separating every role strictly according to the single responsibility principle is, first, unnecessary in a computing system and, second, a source of extra complexity, so the gain is not worth the cost. Computers were born out of human production and life, and many of their practices come from there too. Human production and life are inherently distributed and concurrent. Assembly-line production is an efficient production method in the real world, and MapReduce (MP) in Hadoop is a simple pipeline operation mode. A computing system does not have the full complexity of real production and life, so there is no need to simulate reality completely.

Back to the roles and responsibilities of the Hadoop master. The problem of multiple bosses does not exist there. The human resources role is necessary for coordination (this information is critical to the master and the workload is small, so it does not need to be split out into an independent role). The producer and project management duties, however, are not necessary for the master, and both consume resources. Moreover, because of its special position the global manager cannot be load-balanced and must stay fixed: it is the intermediary for all other roles, and moving it would make communication and ad hoc election too expensive.

In such a distributed computing system, merging the final result is best left to the project manager. A dedicated merging role is not impossible, but it would increase the complexity of the system and the difficulty of implementation and management. Because merging is essentially done per project (per order), it fits the project manager role well. The sales (business) role can be set up separately or taken on by the master, but I think a separate role is better. In fact, it is better for the master not to face the outside world at all and to leave that to sales.

The overall business logic and process of this computing system are as follows:

A) Every role except the master registers with the master and updates its status through a heartbeat service. (When the master crashes, this mechanism can also be used to recover the master, at least partially, but that is more complex.)
B) When sales (the business role) receives an order (a processing task), it hands the order straight to the master. Sales does not need to keep the order information, which is also why it can be load-balanced. It is the intermediary between the customer and the computing system.
C) The master assigns the processing task to a worker chosen by worker status and designates that worker as the project manager. The project manager then handles the management of that specific task (project management).
D) After receiving the task, the project manager splits it as needed, applies to the master for worker resources, assigns the pieces to those workers, and performs project management (project status reports, member progress and so on).
The policies the project manager applies when a member runs into problems are the same as those implemented in the original master; they are simply carried out by the project manager instead. If the master finds that the project manager itself is down, it can appoint a new project manager and start the task again under the same policy. (I will discuss how to handle this case later.)
E) After receiving an assignment from the project manager, each worker executes its production task and reports progress to the project manager at regular intervals. When the task is finished, the worker reports the result to the project manager.
F) If the project manager finds that a member has not completed its task and has lost contact, the project manager designates another worker to complete it. It can reassign within the project team or apply to the master for new resources. (To improve efficiency, once a worker completes its task the project manager can report this to the master and release control of the worker while keeping in touch with it; the link is dropped completely only after the job is cleared.)
G) When the project is complete, the project manager merges the results and reports completion to the master. On receiving the report, the master notifies sales to arrange delivery of the order.
H) The delivery itself is carried out between the customer, sales and the project manager. The outcome is reported to the master.
I) After the order is delivered, the master instructs the project manager to clear the job and then dismisses the project manager. (A member can clear its part of the job as soon as the project manager confirms that part is complete.)
J) If a problem occurs along the way, the process can be restarted from step C or step E.
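The steps above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions, not Hadoop code and not a finished design: the names Master, run_project and count_words are hypothetical, word counting stands in for the real production task, and "workers" are just identifiers rather than remote machines. It shows the master doing nothing but liveness tracking and lending out workers, while the project manager splits, distributes and merges.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Master:
    """Global manager (hypothetical): tracks liveness and lends out workers."""
    timeout: float = 5.0
    last_seen: dict = field(default_factory=dict)  # worker id -> last heartbeat
    busy: set = field(default_factory=set)         # workers currently lent out

    def heartbeat(self, worker_id, now=None):
        # Step A: every role registers and refreshes its status here.
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def idle_workers(self, now=None):
        # A worker is usable if its heartbeat is fresh and it is not lent out.
        now = time.monotonic() if now is None else now
        return sorted(w for w, t in self.last_seen.items()
                      if now - t <= self.timeout and w not in self.busy)

    def acquire(self, count, now=None):
        # Steps C and D: lend out up to `count` live, idle workers.
        granted = self.idle_workers(now)[:count]
        self.busy.update(granted)
        return granted

    def release(self, worker_ids):
        # Step I: workers return to the pool once the job is cleared.
        self.busy.difference_update(worker_ids)


def count_words(text):
    """Stand-in for the production task one worker performs (step E)."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def run_project(master, lines, now=None):
    """Project manager: split, distribute and merge (steps D to G)."""
    manager = master.acquire(1, now)
    if not manager:
        raise RuntimeError("no live worker available to act as project manager")
    # If no other worker is free, the manager does the work itself.
    members = master.acquire(len(lines), now) or manager
    shares = {m: [] for m in members}
    for i, line in enumerate(lines):               # round-robin split
        shares[members[i % len(members)]].append(line)
    merged = {}
    for member, share in shares.items():
        # In a real system this call would run remotely on `member`.
        partial = count_words(" ".join(share))
        for word, n in partial.items():
            merged[word] = merged.get(word, 0) + n
    master.release(manager + members)
    return merged
```

Note that the master never touches the data: all splitting and merging traffic flows through whichever worker was appointed project manager, which is the point of the design described above.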

From the logic and process above, we can see that the master's responsibilities are much simpler than before. The heavy communication and computation are handled by specific workers that can be load-balanced and replaced, so it is hard for the master to become a bottleneck. This is also why the general manager of an enterprise is relatively idle. The disadvantage of this approach is that it is harder to design and implement, and if the project manager fails, restarting the project is relatively expensive (although there are ways to reduce that cost). Nothing is perfect; it is a trade-off to be weighed.
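The replacement of workers mentioned here (step F in the process) might look like the following sketch. It is an assumption-laden illustration, not a prescribed policy: the function name reassign_lost_tasks, its parameters and the timeout rule are all hypothetical. Tasks whose worker has not reported within the timeout are moved to spare workers, and anything that cannot be placed is returned so that the project manager can ask the master for more resources.

```python
def reassign_lost_tasks(assignments, last_report, now, timeout, spare_workers):
    """Move tasks of silent workers to spares (hypothetical policy).

    assignments:   task id -> worker id
    last_report:   worker id -> time of that worker's last progress report
    spare_workers: workers the project manager already holds but has not used
    Returns (updated assignments, tasks that still need a worker).
    """
    spares = list(spare_workers)
    updated = dict(assignments)
    unplaced = []
    for task, worker in sorted(assignments.items()):
        # A worker that never reported counts as silent.
        silent = now - last_report.get(worker, float("-inf")) > timeout
        if not silent:
            continue
        if spares:
            updated[task] = spares.pop(0)   # internal coordination
        else:
            unplaced.append(task)           # apply to the master for resources
    return updated, unplaced
```

Only the unplaced list ever reaches the master, so failure handling adds very little load to it.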

The master scale-out proposed by the expert in the article above is not impossible, but it would be very hard to implement. To guarantee reliability and consistency there must in the end be one unified place, and only a single point can carry that responsibility (although it can have peer backups). Otherwise, experts like those at Google would not have used a leader election algorithm for the master node in their GFS implementation to ensure that there is only one leader at any time. So I feel that the only way to reduce the pressure on the leader is to strip it of non-critical business or functions, and the only way to raise the leader's own capacity is to upgrade its hardware.

In addition, the splitting results produced by the original master might be reusable. In this mode, instructions have to be added that tell the project manager how to manage splitting results (such as the pieces of a large file). There are, of course, still many details to work out. As the saying goes: ideas determine the height, details determine the outcome. A good architecture must also be matched by good management. Many fields borrow from and permeate one another, and at a high enough level they converge. The higher the level, the stronger the unity, until in the end it rises to philosophy.

I am currently considering this flat project-manager responsibility system, which is similar to agile manufacturing in real production. These are just some ideas from my recent study and thinking about implementing my own MapReduce (MP). They are not well written, and criticism is welcome.
