Understanding ZooKeeper Through a Remote Invocation Framework

Last Update:2015-03-17 Source: Internet

Author: User


Remote invocation is the communication mechanism between systems and between processes, and it is a core technology of distributed system development. Remote invocation can join a group of computer systems into a single networked system that presents one overall service to the outside, so that the group behaves like a larger, more powerful computer system.

The architecture design of the remote invocation service

First, we need to answer two questions: Why does application software need a remote invocation service? And what software design problem does a remote invocation service solve?

I once wrote an article on the design of distributed Web site architecture. At its beginning I contrasted the new Web site architecture with the B/S (browser/server) architecture of traditional enterprise software: the components of a Web site that provide business services were abstracted into independent service systems, the logic that receives user input was abstracted into a front-end system, and the service systems and the front-end system communicated through a communication component such as Netty. In the framework design of the remote invocation service, that Netty communication component is abstracted further into an independent communication system, which is the remote invocation service. This is why introducing a remote invocation service amounts to an upgrade of the Web site architecture. If the traditional enterprise B/S architecture is version 1.0, and the architecture that separates the front end from the business servers is version 2.0, then a Web site that introduces a remote invocation service is version 3.0. The benefit of the 3.0 architecture is that any number of front-end systems and any number of service-side systems can be integrated into one whole. As the Web site grows, the number of services it provides grows with it, and existing services can be reused instead of reinventing the wheel each time the site expands.

Figure: Web site architecture, version 3.0

With a remote invocation service we can build clusters at the business level. For example, a manufacturing enterprise generally has purchasing, production, sales, and finance operations. Following traditional thinking, we would develop an independent system for each business. If we introduce a remote invocation service, each business can instead be built as a separate service, and together these services form a business cluster. The remote invocation service acts as the single entry point to all of them, so front-end callers see a unified application. Taobao is the most typical example of such a unified application: we can use a variety of applications within a single Web site, without having to enter a new address or log in to another system just because we switch to a different application. On the service side, we can get rid of the traditional coupled development of client and server, which makes the service side more specialized and more stable, and easier to extend and maintain. If services also need to call each other, they do so through the remote invocation service. Because every call goes through the same service, message formats and calling conventions stay uniform, which standardizes the whole development process. If the remote invocation service also provides load balancing, the whole service cluster becomes a private cloud, so it is no exaggeration to say that the remote invocation service is an important part of cloud computing.

Technical principles of the remote invocation service

Remote invocation framework

In traditional service invocation, the service caller calls the service provider directly. In the architecture diagram we see an additional element, the remote invocation management component. It is an independent service system, and to keep it stable it must itself be a distributed system. But it is a completely different kind of distributed system from a distributed Web application.

A traditional Web application cluster relies on the stateless nature of the HTTP protocol. Each HTTP request is a separate transaction with no relationship to other requests, so we can deploy the Web application on several servers, and whichever server receives a request can serve the user normally. The session mechanism of a Web application is stateful, however, so a traditional Web cluster needs session synchronization, and large Web sites tend to move sessions out into an independent caching system.

The cluster principle of the remote invocation management component is different. The component acts as a registry: it records information about service providers and service callers and pushes that information to the providers or callers. The information is kept in memory so that the system runs efficiently. Imagine what happens if the registration information is lost: the entire system becomes unavailable. The cluster of remote invocation management components is therefore a cluster that guarantees data reliability and robust service delivery, rather than a cluster built on the statelessness of HTTP.
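The registry role described above can be sketched in a few lines. This is only a single-process illustration under stated assumptions: `ServiceRegistry`, `register`, `unregister`, and `subscribe` are hypothetical names chosen for this example, and a real management component would replicate this state across servers instead of holding it in one process.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Address = Tuple[str, int]  # (ip, port) of a service provider
Callback = Callable[[List[Address]], None]


class ServiceRegistry:
    """Hypothetical in-memory registry that pushes provider lists to callers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Set[Address]] = defaultdict(set)
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def register(self, service: str, address: Address) -> None:
        """Record a provider and push the new list to subscribed callers."""
        self._providers[service].add(address)
        self._push(service)

    def unregister(self, service: str, address: Address) -> None:
        """Remove a provider (for example after a failed heartbeat)."""
        self._providers[service].discard(address)
        self._push(service)

    def subscribe(self, service: str, callback: Callback) -> None:
        """Register a caller and send it the current provider list."""
        self._subscribers[service].append(callback)
        callback(sorted(self._providers[service]))

    def _push(self, service: str) -> None:
        snapshot = sorted(self._providers[service])
        for callback in self._subscribers[service]:
            callback(snapshot)
```

A caller would pass a function to `subscribe` and receive a fresh provider list every time a provider registers or is removed.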

Let us imagine how a remote invocation service cluster runs. If 5 servers run the remote invocation service, every server must hold a redundant copy of the registration information, so that when one server fails the data it held is not lost. The cluster also needs a failure detection mechanism, so that a server found to be unavailable is removed promptly. ZooKeeper is the technical framework that solves these problems. Besides stability and availability, the way the cluster stores data matters as well. In addition to the redundancy mechanism just mentioned, the cluster needs a data model suited to fast access, reads, and writes, and ZooKeeper provides exactly such a data model. The remote invocation service I designed is therefore a very good fit for ZooKeeper.

The remote invocation management component also has a heartbeat mechanism, whose job is to check the health and availability of service providers. When a service provider starts, it sends its registration information, including the server's IP address and port number, to the remote invocation management component. The component starts a thread that periodically probes each registered IP address and port. If a provider does not respond, the component retries several times; both the number of retries and the heartbeat interval are configurable. If the provider still does not respond after the retries, the service is assumed to be unavailable. Some readers asked me on QQ why the heartbeat does not also check the service callers. That is completely unnecessary: the caller is the active side and the provider is the passive side. It is like visiting a Web site: if you are sick and do not visit, the site has no need to check whether you are sick.
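A heartbeat check with configurable retries might look like the sketch below. It assumes that "probing" an IP address and port means opening a TCP connection, which the article does not specify; `tcp_probe` and `is_alive` are hypothetical names, and the probe and sleep functions are parameters so that the retry logic can be exercised without a network.

```python
import socket
import time
from typing import Callable, Tuple

Address = Tuple[str, int]  # (ip, port) taken from the registration information


def tcp_probe(address: Address, timeout: float = 1.0) -> bool:
    """Assumed probe: the provider is reachable if a TCP connection opens."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


def is_alive(
    address: Address,
    probe: Callable[[Address], bool] = tcp_probe,
    max_attempts: int = 3,
    retry_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe a provider up to max_attempts times before declaring it down."""
    for attempt in range(max_attempts):
        if probe(address):
            return True
        if attempt < max_attempts - 1:
            sleep(retry_delay)  # wait before the next retry
    return False
```

A background thread in the management component could call `is_alive` for every registered provider at the configured heartbeat interval and remove the providers for which it returns `False`.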

ZooKeeper in detail

In the remote invocation service, ZooKeeper is used inside the remote invocation management component, and the service callers are ZooKeeper clients. The remote invocation management component is the heart of the remote invocation service: if it goes down at runtime, the entire application becomes unavailable. The component must therefore be reliable, even more reliable than the service providers and service callers, so it must be designed as a distributed system, and a reliable one.

The remote invocation management component fits the ZooKeeper usage scenario exactly; it is a standard ZooKeeper application. To make the later discussion easier, let me describe its functions further. As explained above, its central role is to store the configuration information that service providers and service callers need in order to communicate, such as the IP address and port of each service provider and the category of service it offers. It also records the IP address and port number of each service caller (this information is entered in the Web management system of the remote invocation management component) and the relationship between service callers and service providers. For service providers, the component runs a heartbeat mechanism to check their health. If it finds a problem with one of a provider's servers, it promptly updates that provider's configuration information and pushes the change to the service callers. From the point of view of storage, the remote invocation management component is really a system for storing configuration information remotely, and the heartbeat and push mechanisms form an observer pattern. All of these features must work in a distributed environment and must be highly reliable. One of the most typical uses of ZooKeeper is as a configuration service for distributed applications. Put concretely, it plays the role of the configuration files we usually write, except that in a distributed system this role needs a separate system, and the configuration service is dynamic.

Since ZooKeeper can act as a distributed configuration service, we can work backwards from the characteristics of such a service to understand what ZooKeeper does. In Java Web development I use a large number of configuration files, usually properties files. When the service starts, the information in the properties files is read into memory, and the Web system then reads the configuration from memory. This configuration information has several characteristics: the properties files are generally not large (I mean configuration files related to running the system), the configuration is persisted, it is loaded into memory when used, and it is read from memory. ZooKeeper does the same thing, and its characteristics are almost identical to those of traditional configuration files. ZooKeeper has a file system that is intended for small files. Configuration is read from memory, so reads are efficient, and when information is written ZooKeeper persists it. This is why some books describe ZooKeeper's performance as follows:
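The "persist on write, read from memory" behaviour of a configuration file can be sketched as follows. This is a local, single-machine analogy only, not how ZooKeeper is implemented; `ConfigStore` is a hypothetical name, and the simple `key=value` parsing is an assumption that covers only the most basic properties files.

```python
import os
import tempfile
from typing import Dict, Optional


class ConfigStore:
    """Hypothetical store: small properties file on disk, cached in memory."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._cache: Dict[str, str] = {}
        if os.path.exists(path):
            # Load the whole (small) file into memory once, at startup.
            with open(path, encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line and not line.startswith("#") and "=" in line:
                        key, value = line.split("=", 1)
                        self._cache[key.strip()] = value.strip()

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Reads never touch the disk, which is why they are fast."""
        return self._cache.get(key, default)

    def set(self, key: str, value: str) -> None:
        """Persist first, then update memory, so a write is all-or-nothing."""
        updated = dict(self._cache)
        updated[key] = value
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            for k, v in sorted(updated.items()):
                fh.write(f"{k}={v}\n")
        os.replace(tmp_path, self._path)  # atomic rename over the old file
        self._cache = updated
```

The order inside `set` is the design point: the in-memory view changes only after the new content is safely on disk.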

ZooKeeper has a benchmarked throughput of more than 10,000 operations per second for write-dominant workloads, while for read-dominant workloads the throughput is several times higher.

This statement makes sense: small files are quick to write, so 10,000 operations per second is no problem, and reads are served from memory, so it is natural for them to be several times faster.

The configuration information a running system depends on must be highly reliable. Now that a distributed system handles reading and writing the configuration, the correctness of those reads and writes is very important, especially writes: a write must either succeed completely or fail completely. Within a single process this would be a matter of thread safety. But in a distributed system, operations on different servers happen in different processes rather than different threads, so a new technique is needed to make operations safe; in other words, a mechanism for process-level safety. In addition, a distributed configuration service is distributed precisely so that it stays stable and safe and can keep serving users continuously. These two problems seem unrelated, but one solution addresses both at once: ZooKeeper's Zab protocol. To illustrate the Zab protocol, suppose we use 5 servers as ZooKeeper servers. When we send a read or write instruction to the ZooKeeper cluster, the cluster relies on the following two operations:

Operation one: leader election. When ZooKeeper starts, the 5 servers elect one machine as the leader, and the other machines become followers. This phase is complete once more than half of the followers have confirmed their state with the leader. As long as the leader stays healthy, leader election is not triggered again; if the leader runs into a problem, ZooKeeper triggers leader election again. (One question I am not sure about: does the ZooKeeper leader use a heartbeat mechanism to check the health of its followers?)

Operation two: atomic broadcast. If the instruction is a write request, it is forwarded to the leader, and the leader broadcasts the update to the followers. Once a majority of the servers have persisted the change, the leader commits the update, and only then does the client receive a response saying the update succeeded. This consensus protocol is designed to be atomic: a write either succeeds or fails.
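The write path in operation two can be sketched as a simplified, in-process model. This is not ZooKeeper's actual implementation: `Server`, `persist`, `commit`, and `quorum_write` are hypothetical names, the leader is assumed to count itself toward the majority, and proposals that fail to reach a majority are not cleaned up from the logs.

```python
from typing import Dict, List, Tuple

Proposal = Tuple[str, str]  # (key, value)


class Server:
    """Hypothetical ensemble member with a persisted log and in-memory data."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.log: List[Proposal] = []    # stands in for the on-disk log
        self.data: Dict[str, str] = {}   # committed state served to readers

    def persist(self, proposal: Proposal) -> bool:
        """Acknowledge the proposal only if it could be written to the log."""
        if not self.healthy:
            return False
        self.log.append(proposal)
        return True

    def commit(self, proposal: Proposal) -> None:
        key, value = proposal
        self.data[key] = value


def quorum_write(leader: Server, followers: List[Server], key: str, value: str) -> bool:
    """Commit only if more than half of the ensemble persisted the change."""
    proposal = (key, value)
    ensemble = [leader] + followers
    acked = [server for server in ensemble if server.persist(proposal)]
    if len(acked) <= len(ensemble) // 2:
        return False  # no majority: the client is told the write failed
    for server in acked:
        server.commit(proposal)
    return True
```

With 5 servers, the write still succeeds when 2 followers are down (3 acknowledgements out of 5) and fails when 3 are down.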

These two operations guarantee that reads and writes are atomic and that no dirty data appears, and re-electing the leader when necessary guarantees the reliability of the service. What happens if the leader fails? The ZooKeeper cluster simply repeats the leader election described above. This also explains why a ZooKeeper cluster should have an odd number of servers. With 5 servers, the service keeps running normally if 2 of them go down. With 6 servers, still only 2 failures can be tolerated, because if 3 go down the remaining 3 are not more than half of 6, and the whole ZooKeeper cluster stops working. An odd number of servers therefore avoids wasting server resources.
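The arithmetic behind the odd-number rule is short enough to write down. The function names below are hypothetical; the only assumption is the "more than half" rule stated above.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that is more than half of the ensemble."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers may fail while a quorum can still be formed."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

For 5 and 6 servers the tolerated number of failures is 2 in both cases, so the sixth server adds cost without adding fault tolerance.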

For read operations, any ZooKeeper server can answer the client directly from memory without involving the other servers, so reads are efficient. For write operations, ZooKeeper updates the data in memory only after a majority of the servers have persisted it, so writes are much slower than reads.

ZooKeeper addresses stored data with paths, in the same way as paths in a UNIX file system. The in-memory data model is a tree whose nodes are called znodes, and znodes are used to store and read data. The operations on this tree are listed below:

Operation           Description
create              Creates a znode (the parent znode must already exist)
delete              Deletes a znode (the znode must not have any children)
exists              Tests whether a znode exists and queries its metadata
getACL, setACL      Gets/sets the ACL of a znode
getChildren         Gets a list of the children of a znode
getData, setData    Gets/sets the data saved in a znode
sync                Synchronizes a client's view of a znode with ZooKeeper
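To make the rules for create and delete concrete, here is a toy in-memory tree that enforces them. `ZnodeTree` and its method names are hypothetical and only mirror the operations listed above; ACLs, metadata, and sync are left out, and paths are assumed to be absolute and well formed.

```python
from typing import Dict, List


class ZnodeTree:
    """Toy path-addressed tree that mimics a few znode operations."""

    def __init__(self) -> None:
        self._nodes: Dict[str, bytes] = {"/": b""}  # the root always exists

    @staticmethod
    def _parent(path: str) -> str:
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        if path in self._nodes:
            raise KeyError(f"znode already exists: {path}")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"parent znode does not exist: {path}")
        self._nodes[path] = data

    def delete(self, path: str) -> None:
        if self.get_children(path):
            raise ValueError(f"znode has children: {path}")
        del self._nodes[path]

    def exists(self, path: str) -> bool:
        return path in self._nodes

    def get_children(self, path: str) -> List[str]:
        if path not in self._nodes:
            raise KeyError(f"no such znode: {path}")
        prefix = path if path.endswith("/") else path + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def get_data(self, path: str) -> bytes:
        return self._nodes[path]

    def set_data(self, path: str, data: bytes) -> None:
        if path not in self._nodes:
            raise KeyError(f"no such znode: {path}")
        self._nodes[path] = data
```

A provider registration could then be stored under a path such as `/services/sales`, where the path layout is an illustrative choice and not something the article prescribes.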

The configuration information we store is managed with these operations. For example, when a service provider starts, it pushes its configuration information to the remote invocation management component, and the component creates a znode or sets the data saved in an existing znode. Once the data has been saved successfully, the information is pushed to the service callers immediately. ZooKeeper can do this push work too: every znode has a watch mechanism, and when the znode changes in some way the watch notifies the client, which here is the service caller. If the heartbeat mechanism detects that a service provider has failed, the corresponding znode is updated as well, which again triggers the watch and notifies the service callers of the change.
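The notification flow can be sketched as follows. The sketch assumes the classic ZooKeeper behaviour in which a watch fires once and the client then reads the data again and sets a new watch; `WatchedNode` and `ServiceCaller` are hypothetical names for this in-process illustration and are not part of any ZooKeeper client library.

```python
from typing import Callable, List, Optional


class WatchedNode:
    """Toy znode whose watches fire once on the next data change."""

    def __init__(self, data: bytes = b"") -> None:
        self._data = data
        self._watches: List[Callable[[], None]] = []

    def get_data(self, watch: Optional[Callable[[], None]] = None) -> bytes:
        """Return the data and optionally leave a one-time watch."""
        if watch is not None:
            self._watches.append(watch)
        return self._data

    def set_data(self, data: bytes) -> None:
        """Store the data, then notify and clear all pending watches."""
        self._data = data
        fired, self._watches = self._watches, []
        for watch in fired:
            watch()


class ServiceCaller:
    """Caller that keeps a local view of the provider information."""

    def __init__(self, node: WatchedNode) -> None:
        self._node = node
        self.view = b""
        self.refresh()

    def refresh(self) -> None:
        # Re-read the data and re-register the watch in a single step.
        self.view = self._node.get_data(watch=self.refresh)
```

When the management component calls `set_data` after a registration or a failed heartbeat, every `ServiceCaller` watching that node refreshes its `view`.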

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
