The official ZooKeeper website contains this sentence: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
This sums up what ZooKeeper is mainly used for: configuration management, name services, distributed synchronization, and cluster management. But what exactly are these services? Why do we need them? Why use ZooKeeper to implement them, and what advantages does it bring? Below I will introduce each of them and mention some open source systems that use ZooKeeper in this way.
Configuration Management
Besides code, an application also has a good deal of configuration, such as database connections. Usually we put these settings in configuration files and load the files from code. That works well when there are only a few settings, a single server, and infrequent changes. When there are many settings, many servers need them, and the values may change dynamically, configuration files are no longer enough. At that point we need a way to manage configuration centrally: we modify a setting in one central place, and every service interested in that setting picks up the change.
For example, we can put the configuration in a database, and every service that needs it reads it from there. However, because many services depend heavily on this configuration, the centralized configuration service must be highly reliable. We can usually run it as a cluster to improve reliability, but then how do we keep the configuration consistent across the cluster? This is where a service that implements a consistency protocol is needed. ZooKeeper is such a service; it uses the Zab protocol to provide consistency.
Many open source projects use ZooKeeper to maintain configuration. In HBase, for example, a client first connects to ZooKeeper to obtain the necessary information about the HBase cluster before it can do anything else. The open source message queue Kafka uses ZooKeeper to maintain broker information. Dubbo, the SOA framework open-sourced by Alibaba, also makes extensive use of ZooKeeper to manage configuration and implement service governance.
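The idea of changing a setting in one central place and having every interested service pick it up can be sketched as follows. This is a toy in-memory model, not the ZooKeeper client API: `ConfigStore`, `Service`, the `/config/db_url` path, and the JDBC URL are all hypothetical names made up for illustration. One simplification to keep in mind: the callbacks here stay registered forever, whereas classic ZooKeeper watches are one-time triggers that a client has to set again after each notification.

```python
from typing import Callable, Dict, List, Optional


class ConfigStore:
    """Toy in-memory stand-in for a centralized configuration service."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = {}

    def watch(self, path: str, callback: Callable[[str, str], None]) -> None:
        # Register a callback that fires whenever `path` is updated.
        self._watchers.setdefault(path, []).append(callback)

    def set(self, path: str, value: str) -> None:
        # Store the new value, then notify every interested service.
        self._data[path] = value
        for callback in self._watchers.get(path, []):
            callback(path, value)

    def get(self, path: str) -> str:
        return self._data[path]


class Service:
    """A service that keeps a local copy of one configuration value."""

    def __init__(self, name: str, store: ConfigStore, path: str) -> None:
        self.name = name
        self.db_url: Optional[str] = None
        store.watch(path, self._on_change)

    def _on_change(self, path: str, value: str) -> None:
        # Called by the store; refresh the local copy.
        self.db_url = value


store = ConfigStore()
services = [Service(f"svc-{i}", store, "/config/db_url") for i in range(3)]

# One write in the central place reaches all three services.
store.set("/config/db_url", "jdbc:mysql://db-1:3306/app")
```

A single process like this has no consistency problem, of course. The point of ZooKeeper is to give the same "write once, everyone sees it" behavior when the store itself is a cluster.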
Name Service
A name service is easy to understand. To access a system over the network we need to know its IP address, but IP addresses are unfriendly to people, so we use domain names instead. Computers, however, cannot work with a domain name directly. What do we do? Keeping a domain-name-to-IP mapping on every machine solves part of the problem, but what happens when the IP behind a domain name changes? That is why we have DNS: we only need to access one well-known point, and it tells us which IP address a domain name corresponds to. Our applications face many problems of the same kind, especially when we have a very large number of services. Storing every service address locally is inconvenient, but if there is a single well-known access point that provides a unified entry, maintenance becomes much easier.
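A minimal sketch of the "single well-known access point" idea, again as a toy in-memory model rather than any real ZooKeeper or DNS API. `NameRegistry`, the service name `order-service`, and the addresses are hypothetical and chosen only for illustration.

```python
from typing import Dict


class NameRegistry:
    """Toy in-memory name service: maps a service name to its current address."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        # Registering again under the same name replaces the old address.
        self._addresses[name] = address

    def lookup(self, name: str) -> str:
        # Raises KeyError if nothing is registered under `name`.
        return self._addresses[name]


registry = NameRegistry()
registry.register("order-service", "10.0.0.5:8080")
first = registry.lookup("order-service")

# The service moves to another machine; callers keep using the same name.
registry.register("order-service", "10.0.0.9:8080")
second = registry.lookup("order-service")
```

Callers never store the address themselves. They only know the name and the registry, so an address change is made in exactly one place.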
Distributed Locks
As the first article explained, ZooKeeper is a distributed coordination service, so we can use it to coordinate the activities of multiple distributed processes. For example, in a distributed environment we deploy the same service on every server in a cluster to improve reliability. If all of those instances do the same work at once, they have to coordinate with each other, which is very complex to program. If we let only one instance do the work, we get a single point of failure. A common practice is to use a distributed lock: at any moment only one instance holds the lock and does the work, and when that instance fails, the lock is released and the work immediately fails over to another instance. Many distributed systems do this, and the design has a nicer name: leader election. The HBase Master, for example, uses this mechanism. Note, however, that a distributed lock is not the same as a lock inside a single process, so it must be used with more caution.
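The failover behavior described above can be sketched with a toy model of one common ZooKeeper election recipe, in which every candidate creates an ephemeral sequential node and the candidate holding the lowest sequence number is the leader. The code below only simulates that bookkeeping in memory. `ElectionModel`, `join`, `session_lost`, and the server names are hypothetical, and this is not necessarily how HBase implements its election.

```python
import itertools
from typing import Dict, Optional


class ElectionModel:
    """Toy model of leader election via ephemeral sequential nodes."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        # sequence number -> server name, one entry per live candidate
        self._candidates: Dict[int, str] = {}

    def join(self, server: str) -> int:
        # Like creating an ephemeral sequential node: each candidate
        # receives the next, strictly increasing sequence number.
        seq = next(self._counter)
        self._candidates[seq] = server
        return seq

    def session_lost(self, seq: int) -> None:
        # Like an ephemeral node vanishing when its owner's session ends.
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        # The live candidate with the lowest sequence number leads.
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ElectionModel()
seq_a = election.join("server-a")
seq_b = election.join("server-b")
seq_c = election.join("server-c")
leader_before = election.leader()

# server-a crashes; leadership moves to the next candidate in line.
election.session_lost(seq_a)
leader_after = election.leader()
```

The caution about distributed locks applies here too: in a real deployment a server can be declared dead (for example after a long pause) while it still believes it holds the lock, which cannot happen with a lock inside one process.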
Cluster Management
In a distributed cluster, nodes come and go for many reasons, such as hardware failures, software failures, and network problems. New nodes join, and old nodes leave. Other machines in the cluster need to notice these changes and make decisions accordingly. For example, in a distributed storage system a central control node is responsible for allocating storage; when new storage requests arrive, it has to assign storage nodes based on the current state of the cluster, so it needs to perceive that state dynamically. Likewise, in a distributed SOA architecture a service is provided by a cluster, and when a consumer accesses the service it needs some mechanism to discover which nodes can currently serve it. This is known as service discovery; Dubbo, the SOA framework open-sourced by Alibaba, uses ZooKeeper as the underlying mechanism for it. The open source message queue Kafka also uses ZooKeeper to track consumers going online and offline.
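A minimal sketch of how other machines can perceive membership changes, written as a toy in-memory model. With ZooKeeper this is commonly done by having each node create an ephemeral znode under a shared parent while interested parties watch the parent's children, but the code below does not use the ZooKeeper API. `ClusterMembership`, `subscribe`, and the node names are hypothetical.

```python
from typing import Callable, List, Set


class ClusterMembership:
    """Toy in-memory model of cluster membership with change notification."""

    def __init__(self) -> None:
        self._nodes: Set[str] = set()
        self._listeners: List[Callable[[List[str]], None]] = []

    def subscribe(self, listener: Callable[[List[str]], None]) -> None:
        # The listener is called with the full member list after each change.
        self._listeners.append(listener)

    def join(self, node: str) -> None:
        self._nodes.add(node)
        self._notify()

    def leave(self, node: str) -> None:
        # Covers both a clean exit and a node that failed.
        self._nodes.discard(node)
        self._notify()

    def _notify(self) -> None:
        snapshot = sorted(self._nodes)
        for listener in self._listeners:
            listener(snapshot)


# A control node records every view of the cluster it is told about.
history: List[List[str]] = []
cluster = ClusterMembership()
cluster.subscribe(history.append)

cluster.join("storage-1")
cluster.join("storage-2")
cluster.leave("storage-1")
```

After these three changes the control node's latest view contains only `storage-2`, so any new allocation would go there.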