ZooKeeper Principles and Architecture

Last Update:2018-07-26 Source: Internet

Author: User

Tags mutex zookeeper scp command


First, ZooKeeper is an Apache project written in Java. It belongs to the Hadoop ecosystem, where it plays the role of administrator.


So let's take a closer look at what ZooKeeper is capable of.

1. Configuration Management

This one is easy to understand. A distributed system has many machines. For example, when I build Hadoop HDFS, I have to prepare the configuration files HDFS needs on one master machine (the master node) and then copy them to the other nodes with the scp command, so that every machine gets the same configuration and the HDFS service can run. ZooKeeper provides a service that manages configuration centrally: we modify the configuration in one central location, and every node interested in that configuration picks up the change. This eliminates manual copying and keeps the configuration reliable and consistent.
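The idea of "change it in one place, and every interested node sees it" can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ConfigStore`, `Worker`, and the key `hdfs/replication` are hypothetical names, not part of the ZooKeeper client API. Real ZooKeeper watches are one-time triggers after which the client re-reads the data itself; the sketch simplifies this by pushing the new value directly.

```python
from typing import Callable, Dict, List, Optional


class ConfigStore:
    """Toy stand-in for a central configuration location (not the ZooKeeper API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = {}

    def watch(self, key: str, callback: Callable[[str, str], None]) -> None:
        # Register interest in a key; the callback runs on every change.
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key: str, value: str) -> None:
        # Store the new value, then notify every registered watcher.
        self._data[key] = value
        for callback in self._watchers.get(key, []):
            callback(key, value)


class Worker:
    """A node that keeps a local copy of one configuration value."""

    def __init__(self, name: str, store: ConfigStore, key: str) -> None:
        self.name = name
        self.value: Optional[str] = None
        store.watch(key, self._on_change)

    def _on_change(self, key: str, value: str) -> None:
        self.value = value


store = ConfigStore()
workers = [Worker(f"node{i}", store, "hdfs/replication") for i in range(3)]

# One central change replaces copying the file to every node by hand.
store.set("hdfs/replication", "3")
print([w.value for w in workers])  # ['3', '3', '3']
```

Compared with the scp approach, no node has to be told explicitly that the configuration changed; registering interest once is enough.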
2. Name Service

This can be thought of as a phone book: phone numbers are hard to remember but names are easy, so to call someone you just look up the name.
In a distributed environment, applications and services often need uniform names so that different services are easy to identify:
 similar to the mapping between domain names and IP addresses, where the domain name is the part that is easy to remember;
 the address of a resource or service is obtained by looking up its name.

3. Distributed Locks

The word "distributed" sounds difficult, but the idea is simple. In a single-machine program, each process must take a lock before it accesses a mutually exclusive resource; in a distributed program, the processes competing for that resource are spread across several hosts. Many distributed systems run several service instances, but at any moment only one of them is allowed to do the work; when that instance fails, its lock is released and the work immediately fails over to another instance. Many distributed systems work this way, and the design has a nicer name: leader election. An everyday example: a bank has several windows, but only one of them serves you at a time. If the clerk serving you suddenly has to leave, what do you do? Find the lobby manager (ZooKeeper)! The lobby manager assigns another window to continue serving you.

4. Cluster Management

In a distributed cluster, nodes join and leave for various reasons such as hardware failure, software failure, and network problems: new nodes join, and old nodes drop out. Some machines in the cluster (such as the master node) need to notice these changes and make decisions accordingly. The NameNode in HDFS notices such changes through DataNode heartbeats, so for now we can assume that ZooKeeper provides this function through a similar heartbeat mechanism.

Features of ZooKeeper

1 Eventual consistency: clients are shown the same view no matter which server they connect to; this is the most important feature of ZooKeeper.
2 Reliability: once a message is accepted by one server, it will be accepted by all servers.
3 Timeliness: ZooKeeper does not guarantee that two clients see newly updated data at the same moment; if you need the latest data, call the sync() interface before reading.
4 Wait-free: slow or failed clients do not interfere with the requests of fast clients.
5 Atomicity: an update either succeeds or fails; there is no intermediate state.
6 Ordering: all servers publish the same messages in the same order.

Systems that use ZooKeeper

The HA scheme of HDFS
The HA scheme of YARN
HBase: depends on ZooKeeper, which stores RegionServer heartbeat information and other key information.
Flume: load balancing and avoiding a single point of failure

Basic architecture of ZooKeeper

1 Each server stores a copy of the data in memory;
2 When ZooKeeper starts, a leader is elected from the server instances (using a Paxos-like election);
3 The leader is responsible for handling data updates and similar operations (Zab protocol);
4 An update succeeds if and only if a majority of servers have successfully modified the data in memory.
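The majority rule in the last item can be made concrete with a short simulation. This is a minimal sketch under stated assumptions, not ZooKeeper's actual Zab implementation: `Server`, `Leader`, `quorum_size`, and `tolerated_failures` are hypothetical names, and a server "acknowledges" a write simply by being alive.

```python
from typing import Dict, List


def quorum_size(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers may fail while a majority is still reachable."""
    return n - quorum_size(n)


class Server:
    """Toy server holding its copy of the data in memory."""

    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.data: Dict[str, str] = {}


class Leader(Server):
    """Toy leader: commits a write only when a majority acknowledges it."""

    def __init__(self, name: str, followers: List[Server]) -> None:
        super().__init__(name)
        self.followers = followers

    def write(self, key: str, value: str) -> bool:
        ensemble = [self] + self.followers
        # In this sketch, every live server acknowledges the proposal.
        acks = [s for s in ensemble if s.alive]
        if len(acks) < quorum_size(len(ensemble)):
            return False  # no majority: the update is not applied anywhere
        for server in acks:
            server.data[key] = value
        return True


for n in (3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))

followers = [Server("f1"), Server("f2", alive=False)]
leader = Leader("l", followers)
print(leader.write("/config", "v1"))  # True: 2 of 3 servers acknowledge

followers[0].alive = False
print(leader.write("/config", "v2"))  # False: only 1 of 3 servers is left
```

The loop prints the ensemble size, the majority needed, and the number of failures tolerated. Going from 3 to 4 servers raises the majority from 2 to 3 while the tolerated failures stay at 1.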
Number of ZooKeeper server nodes

The number of ZooKeeper servers is usually odd.
Leader election uses a Paxos-like majority protocol. The core idea: once a majority of servers have written successfully, the data is considered written successfully. In other words:
If there are 3 servers, 2 successful writes are enough;
If there are 4 or 5 servers, 3 successful writes are needed.
The number of servers is therefore generally odd (3, 5, 7):
If there are 3 servers, at most 1 server may fail;
If there are 4 servers, still at most 1 server may fail.
That being the case, why use 4 servers?

Observer Node

Version 3.3.0 and later add a new role, the observer.
Reason for adding it:
ZooKeeper needs to ensure high availability and strong consistency;
To support more clients, more servers must be added to the cluster, but as the number of servers grows, the voting phase takes longer, which hurts performance. To balance scalability against high throughput, the observer was introduced:
Observers do not participate in voting;
Observers accept client connections and forward write requests to the leader node;
Adding more observer nodes increases scalability without compromising throughput.

ZooKeeper write process:

The client first connects to a server or an observer (which can be regarded as a proxy for a server) and issues a write request. That server forwards the write request to the leader, and the leader forwards it to the other servers. Each server writes the data when it receives the request and responds to the leader. Once the leader has received successful write responses from a majority, it considers the write successful, and finally the server that originally received the request returns the result to the client.

ZooKeeper Data Model

ZooKeeper uses a hierarchical directory structure whose naming follows regular file system conventions;
Each node in the hierarchy is called a znode and is identified by a unique path;
A znode can contain data and child znodes (ephemeral znodes cannot have children);
The data in a znode is versioned: every change increments the znode's version number, and a client can pass the version it expects when updating or deleting, so that the operation fails if the data has changed in the meantime;
Client applications can set a watcher on a znode;
A znode does not support partial reads or writes; its data is read and written in full in a single operation;
A znode is either ephemeral (short-lived) or persistent; the type is determined at creation and cannot be changed afterwards;
When the client session ends, ZooKeeper deletes that session's ephemeral znodes; ephemeral znodes cannot have child nodes;
A persistent znode does not depend on a client session; it is deleted only when a client explicitly deletes it;
Counting the sequential variants, there are four types of znode: PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL.
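The znode rules above, and the way ephemeral sequential znodes are commonly used for the distributed lock and leader election described earlier, can be illustrated with an in-memory toy model. Everything here is an assumption made for illustration: `ZNodeTree`, `lock_holder`, the `/locks` path, and the session names are hypothetical and are not the ZooKeeper client API. The lock convention shown (the contender with the lowest sequence number holds the lock) is the usual recipe, sketched without watches or networking.

```python
from typing import Dict, List, Optional


class ZNodeTree:
    """In-memory toy model of the znode rules (not the ZooKeeper API)."""

    def __init__(self) -> None:
        # path -> {"data", "version", "owner"}; owner is a session id for
        # ephemeral znodes and None for persistent ones.
        self.nodes: Dict[str, dict] = {"/": {"data": b"", "version": 0, "owner": None}}
        self._counters: Dict[str, int] = {}  # parent path -> next sequence number

    def create(self, path: str, data: bytes = b"", session: Optional[str] = None,
               ephemeral: bool = False, sequential: bool = False) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            # Append a zero-padded counter that increases per parent.
            seq = self._counters.get(parent, 0)
            self._counters[parent] = seq + 1
            path = f"{path}{seq:010d}"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = {"data": data, "version": 0,
                            "owner": session if ephemeral else None}
        return path

    def set_data(self, path: str, data: bytes, expected_version: int) -> None:
        node = self.nodes[path]
        if node["version"] != expected_version:
            raise ValueError("version mismatch")
        node["data"] = data  # the whole value is replaced; no partial writes
        node["version"] += 1

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p for p in self.nodes
                      if p.startswith(prefix)
                      and p[len(prefix):] and "/" not in p[len(prefix):])

    def close_session(self, session: str) -> None:
        # Ending a session deletes every ephemeral znode it owns.
        for path in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[path]


def lock_holder(tree: ZNodeTree, lock_dir: str) -> Optional[str]:
    """The contender with the lowest sequence number holds the lock."""
    kids = tree.children(lock_dir)
    return kids[0] if kids else None


tree = ZNodeTree()
tree.create("/locks")  # persistent parent
a = tree.create("/locks/lock-", session="A", ephemeral=True, sequential=True)
b = tree.create("/locks/lock-", session="B", ephemeral=True, sequential=True)
print(lock_holder(tree, "/locks"))  # /locks/lock-0000000000

tree.close_session("A")  # the holder's session ends, its znode disappears
print(lock_holder(tree, "/locks"))  # /locks/lock-0000000001
```

Because the lock znode is ephemeral, a crashed holder releases the lock automatically when its session ends, which is the failover behaviour from the bank-window example.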

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
