Zookeeper programming notes

Source: Internet
Author: User
Tags: zookeeper, client

These are some notes from my first look at Zookeeper.

Zookeeper provides centralized services such as configuration maintenance, service naming, distributed synchronization, and group management, all of which are commonly needed by distributed applications.

 

Zookeeper Architecture

Zookeeper is an open-source distributed coordination service for distributed applications. It exposes a set of interfaces through which distributed applications can implement configuration maintenance, data synchronization, service naming, group management, and other higher-level services. It adopts a directory-tree data model similar to a file system. Coordination services are hard to get right and are especially prone to errors such as race conditions and deadlocks; Zookeeper's motivation is to relieve distributed applications of the burden of implementing coordination themselves.

Zookeeper lets distributed applications coordinate with each other through a shared hierarchical namespace. It keeps its data in memory, which gives it high throughput and low access latency. Zookeeper places great emphasis on high performance, high availability, and strictly ordered access: its performance lets it serve large-scale distributed systems, its replication eliminates single points of failure, and its strictly sequential access lets clients build sophisticated synchronization primitives.

 

Data Model, Hierarchical Namespace, and Nodes

The Zookeeper namespace consists of znodes, organized much like a file system. Each node acts as both a directory and a file: it is uniquely identified by a path and, unlike in a file system, can hold data content and have child nodes at the same time. Zookeeper is meant to store coordination data such as status, configuration, and location information, so each node stores only a small amount of data (KB level, up to 1024 KB).

Each node maintains a stat structure (including version numbers, ACL changes, and timestamps of data changes) to allow cache validation and coordinated updates. The version number increases whenever the node's data content changes, and when a client reads the data it also receives the data's version number. A node's data content is read and written atomically: a read returns all of the content, and a write replaces all of it. Each node has an access control list (ACL) restricting which users may perform which operations on it.

Zookeeper also has a node type called the ephemeral node. An ephemeral node exists only for the lifetime of the session that created it; when the session ends, the node is deleted automatically.
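To make the data model concrete, here is a minimal sketch (mine, not part of the original notes) that creates an ephemeral znode and reads back its data and version. It assumes the zookeeper_mt C library and a server at localhost:2181; the path /demo-ephemeral is arbitrary.

    #include <stdio.h>
    #include <unistd.h>
    #include <zookeeper/zookeeper.h>

    int main(void) {
        /* A NULL global watcher is permitted; see the init sketch later. */
        zhandle_t *zh = zookeeper_init("localhost:2181", NULL, 30000, 0, 0, 0);
        if (!zh) return 1;

        /* Crude wait for the session; production code would use the
         * watcher and a condition variable instead of polling. */
        while (zoo_state(zh) != ZOO_CONNECTED_STATE)
            sleep(1);

        /* Create an ephemeral node; it disappears when this session ends. */
        char created[128];
        int rc = zoo_create(zh, "/demo-ephemeral", "hello", 5,
                            &ZOO_OPEN_ACL_UNSAFE, ZOO_EPHEMERAL,
                            created, sizeof(created));
        printf("create: %s\n", zerror(rc));

        /* Read the content back; stat.version starts at 0 and grows on
         * every write to the node's data. */
        char buf[64];
        int buflen = sizeof(buf);
        struct Stat stat;
        rc = zoo_get(zh, created, 0, buf, &buflen, &stat);
        if (rc == ZOK)
            printf("data=%.*s version=%d\n", buflen, buf, stat.version);

        zookeeper_close(zh);
        return 0;
    }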

 

Zookeeper Components

As shown in the component diagram (image not preserved), the servers that make up a Zookeeper service come in two roles: leader and follower. Only the leader runs the request processor, which gives it the right to decide on state changes; every server in the service replicates each of the other components. The replicated database is an in-memory database containing the entire data tree. Read requests are served from each server's local replica, while requests that change the state of the Zookeeper service (write requests) go through an agreement protocol: all write requests from clients are forwarded to the leader, the leader broadcasts a proposal to all followers, the followers vote on the proposal, and once it is accepted it is committed and executed. The messaging layer takes care of replacing a failed leader and keeping followers synchronized with the leader.

 

About Zookeeper Watches

All Zookeeper read operations (getData(), getChildren(), and exists()) have the option of setting a watch.
A Zookeeper watch is defined as follows: a watch event is a one-time trigger; when the data that the watch is set on changes, a notification is sent to the client that set the watch, that is, the watcher.

Note the following three points:

1. One-time trigger
When a client sets a watch on a node and the node's content changes, the client receives the event; if the content changes again, the client will not receive another event unless it performs another read operation that sets the watch again.

2. Sent to the client: watch events are delayed
Watch events are sent to the watcher asynchronously. For example, if one client performs a write that changes a node's data content, the watch event may still be in flight to another client after the write call has returned. Zookeeper only guarantees ordering: a client will not see the new data until it has received the watch event. Network latency and other factors may cause different clients to receive watch events and operation return values at different times.

3. What data the watch is set on
Different kinds of node changes are involved. Zookeeper maintains two watch lists per node: a data watch list and a child watch list. getData() and exists() set data watches, while getChildren() sets child watches; correspondingly, the first two return the node's content while the last returns the node's list of children. setData() triggers the node's data watch. create() triggers the data watch of the newly created node and the child watch of its parent. delete() triggers both the data watch and the child watch of the deleted node (all of its children are gone), as well as the child watch of its parent. In short, an operation on a node must be considered for its effect on the node's parent and children as well.

Watches are maintained locally on the server the client is connected to, which keeps setting, maintaining, and dispatching watches lightweight. When a client connects to a new server, its watches are triggered for any session event; while disconnected from a server, the client receives no watch events. After the client reconnects, previously registered watches are re-registered and triggered as needed. Normally this all happens transparently and the user does not notice it. There is one case in which a watch can be lost: an exists watch set on a node that has not yet been created will be missed if the node is created and then deleted while the client is disconnected.

For watches, Zookeeper provides the following guarantees:
1. Watches are ordered with respect to other events, other watches, and asynchronous replies; the Zookeeper client library ensures that everything is dispatched in order.
2. A client watching a node always receives the watch event before it sees the new data of that node.
3. The order of watch events corresponds to the order of the updates as seen by the Zookeeper service.

Things to remember about watches:
1. Watches are one-time triggers; if you receive a watch event and want to be notified of future changes, you must set a new watch (a minimal re-registration sketch follows this list).
2. Because watches are one-time triggers and there is a delay between receiving an event and setting a new watch, you cannot reliably observe every change to a node. Be prepared for this.
3. A watch object is triggered only once for a given notification. For example, if the same watch object is registered through getData() and exists() on the same node and the node is then deleted, the watch object is invoked only once, with the deletion notification.
4. If you are disconnected from the server, you will not receive watch events until the connection is re-established.
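As a concrete illustration of points 1 and 2, here is a minimal sketch (mine, not from the original notes) of the usual re-registration pattern using the C client's watcher-object API; the path /config and the printing are arbitrary:

    #include <stdio.h>
    #include <zookeeper/zookeeper.h>

    /* Re-arm the one-shot watch every time it fires by reading again with
     * the same watcher. Changes between the event and the re-read are
     * collapsed: we always see the latest data, not every version. */
    static void config_watcher(zhandle_t *zh, int type, int state,
                               const char *path, void *ctx) {
        if (type == ZOO_CHANGED_EVENT) {
            char buf[256];
            int buflen = sizeof(buf);
            struct Stat stat;
            /* zoo_wget both fetches the data and registers this watcher again. */
            if (zoo_wget(zh, path, config_watcher, ctx,
                         buf, &buflen, &stat) == ZOK)
                printf("new config (version %d): %.*s\n",
                       stat.version, buflen, buf);
        }
    }

The initial registration is an ordinary zoo_wget(zh, "/config", config_watcher, NULL, buf, &buflen, &stat) call; after that, each trigger re-arms the watch itself.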

 

Zookeeper Access Control

Zookeeper uses access control lists (ACLs) to control access to nodes. An ACL is similar to UNIX file permissions in that it uses permission bits to control which operation types are allowed on a node, but Zookeeper has no concept of owner or group; instead, the ACL itself specifies the permissions of each user. An ACL applies only to the node it is set on, not to that node's children: each node's access permissions are determined by its own ACL alone.

Each client connection has an ID that is unique within Zookeeper, and Zookeeper associates the connection with that ID. When the client tries to access a node, the ID is compared against the node's ACL to determine the client's access rights.

An ACL consists of key-value pairs of the form (scheme:expression, perms), where the format of the expression is specific to the scheme.

ACL permissions

The permission bits (exposed in the C client as the ZOO_PERM_* constants) are: CREATE (create a child node), READ (get the node's data and list its children), WRITE (set the node's data), DELETE (delete a child node), and ADMIN (set the node's ACL). ZOO_PERM_ALL is the union of all of them.

 
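As a sketch of how this looks in the C client (my example, not the article's; the credential alice:secret is hypothetical), the connection can be authenticated with the digest scheme and a node created that only the creator may access:

    #include <zookeeper/zookeeper.h>

    /* Authenticate this connection with the "digest" scheme, then create a
     * node readable/writable only by the authenticated user. The credential
     * "alice:secret" is a made-up example. */
    static int create_protected(zhandle_t *zh) {
        /* zoo_add_auth is asynchronous; a NULL completion is allowed. */
        int rc = zoo_add_auth(zh, "digest", "alice:secret",
                              sizeof("alice:secret") - 1, NULL, NULL);
        if (rc != ZOK) return rc;

        /* ZOO_CREATOR_ALL_ACL grants ZOO_PERM_ALL to the identity that
         * created the node (the "auth" scheme). */
        return zoo_create(zh, "/protected", "data", 4,
                          &ZOO_CREATOR_ALL_ACL, 0, NULL, 0);
    }

ZOO_OPEN_ACL_UNSAFE, by contrast, is the predefined ACL that grants everyone full access.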

Zookeeper Client Development

For C development, Zookeeper provides two libraries: zookeeper_st (single-threaded) and zookeeper_mt (multi-threaded). zookeeper_st has no event loop of its own and is meant to be driven by the event loop of an event-driven application. zookeeper_mt is easier to use and is similar to the Java API: it creates a network I/O thread and an event dispatch thread to maintain the connection and execute callbacks.

Concretely, zookeeper_st provides only asynchronous APIs and the callbacks needed to integrate with the application's event loop; it exists only for platforms where the pthread library is unavailable or unstable, such as FreeBSD 4.x. In all other cases use zookeeper_mt, which provides both synchronous and asynchronous APIs.

A Zookeeper client using the zookeeper_mt library runs three threads:
Thread 1 is the business logic layer, which interacts with the user and calls the APIs provided by the zookeeper library.
Thread 2 is the network I/O layer, responsible for communication with the Zookeeper server: it sends the request data generated by API calls at the business logic layer, and receives server responses and watch event data.
Thread 3 is the event processing layer, responsible for executing watch callbacks.
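A minimal sketch (mine, not the article's) of initializing a zookeeper_mt handle: the business-logic thread blocks until the event thread delivers ZOO_CONNECTED_STATE, illustrating how the threads divide the work. The address localhost:2181 and the 30-second timeout are assumptions.

    #include <pthread.h>
    #include <stdio.h>
    #include <zookeeper/zookeeper.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int connected = 0;

    /* Global watcher, passed to zookeeper_init: runs on the event thread. */
    static void global_watcher(zhandle_t *zh, int type, int state,
                               const char *path, void *ctx) {
        if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE) {
            pthread_mutex_lock(&lock);
            connected = 1;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
    }

    int main(void) {
        /* Session creation is asynchronous: zookeeper_init returns at once,
         * and the handle is usable once ZOO_CONNECTED_STATE arrives. */
        zhandle_t *zh = zookeeper_init("localhost:2181", global_watcher,
                                       30000, 0, 0, 0);
        if (!zh) return 1;

        pthread_mutex_lock(&lock);
        while (!connected)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
        printf("session established\n");

        zookeeper_close(zh);
        return 0;
    }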

 


Watch event types:

ZOO_CREATED_EVENT: node creation event. The watch is set through zoo_exists() on a node that does not yet exist, and fires when the node is created.
ZOO_DELETED_EVENT: node deletion event. The watch is set through zoo_exists() or zoo_get().
ZOO_CHANGED_EVENT: node data change event. The watch is set through zoo_exists() or zoo_get().
ZOO_CHILD_EVENT: child list change event. The watch is set through zoo_get_children() or zoo_get_children2().
ZOO_SESSION_EVENT: session event, triggered when the client is disconnected from or reconnected to the server.
ZOO_NOTWATCHING_EVENT: watch removal event, triggered when the server will, for some reason, no longer watch the node for the client.
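A small sketch (mine, not from the notes) of a watcher that dispatches on these event types; the bodies just log. Note that the ZOO_*_EVENT values are exported constants rather than compile-time constants in the C client, so if/else is used instead of switch/case:

    #include <stdio.h>
    #include <zookeeper/zookeeper.h>

    /* Dispatch on the watch event type delivered to a watcher_fn. */
    static void dispatch_watcher(zhandle_t *zh, int type, int state,
                                 const char *path, void *ctx) {
        if (type == ZOO_CREATED_EVENT)
            printf("%s created\n", path);
        else if (type == ZOO_DELETED_EVENT)
            printf("%s deleted\n", path);
        else if (type == ZOO_CHANGED_EVENT)
            printf("%s data changed\n", path);
        else if (type == ZOO_CHILD_EVENT)
            printf("%s children changed\n", path);
        else if (type == ZOO_SESSION_EVENT)
            printf("session state is now %d\n", state);
        else if (type == ZOO_NOTWATCHING_EVENT)
            printf("watch on %s removed\n", path);
    }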

The relationship between watch events and the Zookeeper read operations that set them:

Event              | Read operations that set the watch
ZOO_CREATED_EVENT  | zoo_exists()
ZOO_DELETED_EVENT  | zoo_exists(), zoo_get()
ZOO_CHANGED_EVENT  | zoo_exists(), zoo_get()
ZOO_CHILD_EVENT    | zoo_get_children(), zoo_get_children2()

API:

 

typedef void (*watcher_fn)(zhandle_t *zh, int type, int state, const char *path, void *watcherCtx);
The watch callback function. There are two ways to be notified of watch events: 1. legacy: implement the watch callback in advance and pass the function pointer to zookeeper_init(), then set watches through the other APIs; 2. watcher object: a function pointer plus a watcher context pointer, invoked together when the watch is triggered. To use this style, call the 'w'-prefixed APIs such as zoo_awexists() and zoo_wget().

zhandle_t *zookeeper_init(const char *host, watcher_fn fn, int recv_timeout, const clientid_t *clientid, void *context, int flags);
Creates a handle for communicating with the Zookeeper server, and a session corresponding to this handle. Session creation is asynchronous; the session is confirmed to be established once the ZOO_CONNECTED_STATE event is received.
int zookeeper_close(zhandle_t *zh);
Closes the handle and releases its resources. After the call, the session becomes unavailable. Before returning, the function sends any unsent requests, which may block. It may be called only once per handle; calling it more than once, or performing other operations on a closed handle, produces undefined results.

const clientid_t *zoo_client_id(zhandle_t *zh);
Returns the client session ID; valid only while the connection to the server is healthy.

int zoo_recv_timeout(zhandle_t *zh);
Returns the session timeout; valid only while the connection to the server is healthy. The value may change after reconnecting to a server.

const void *zoo_get_context(zhandle_t *zh);
Returns the handle's context.

void zoo_set_context(zhandle_t *zh, void *context);
Sets the handle's context.

watcher_fn zoo_set_watcher(zhandle_t *zh, watcher_fn newFn);
Sets the watch callback and returns the previous one.

struct sockaddr *zookeeper_get_connected_host(zhandle_t *zh, struct sockaddr *addr, socklen_t *addr_len);
Returns the network address (sockaddr structure) of the connected server; valid only while connected.

int zookeeper_interest(zhandle_t *zh, int *fd, int *interest, struct timeval *tv);
(The author was unsure about this one.) It appears to report the socket fd and whether Zookeeper wants to read from or write to it, for integration with an event loop.

int zookeeper_process(zhandle_t *zh, int events);
(Also unclear to the author.) It appears to notify Zookeeper of the events that occurred on the fd it is listening on.

typedef void (*void_completion_t)(int rc, const void *data);
Function type definition: the callback type for asynchronous operations.

typedef void (*stat_completion_t)(int rc, const struct Stat *stat, const void *data);
Same as above.

typedef void (*data_completion_t)(int rc, const char *value, int value_len, const struct Stat *stat, const void *data);
Same as above; returns the detailed data.

typedef void (*strings_completion_t)(int rc, const struct String_vector *strings, const void *data);
Same as above.

typedef void (*strings_stat_completion_t)(int rc, const struct String_vector *strings, const struct Stat *stat, const void *data);
Same as above.

typedef void (*string_completion_t)(int rc, const char *value, const void *data);
Same as above.

typedef void (*acl_completion_t)(int rc, struct ACL_vector *acl, struct Stat *stat, const void *data);
Same as above.

int zoo_state(zhandle_t *zh);
Returns the state of the handle.

int zoo_acreate(zhandle_t *zh, const char *path, const char *value, int valuelen, const struct ACL_vector *acl, int flags, string_completion_t completion, const void *data);
Creates a node that did not previously exist. If ZOO_EPHEMERAL is set, the node is automatically deleted when the client session expires. If ZOO_SEQUENCE is set, a unique, monotonically increasing sequence number is appended to the path name; the sequence number is 10 digits wide, zero-padded.

int zoo_adelete(zhandle_t *zh, const char *path, int version, void_completion_t completion, const void *data);
Deletes a node.

int zoo_aexists(zhandle_t *zh, const char *path, int watch, stat_completion_t completion, const void *data);
Checks whether a node exists.

int zoo_awexists(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, stat_completion_t completion, const void *data);
Checks whether a node exists, allowing a watcher object to be specified (a function pointer watcher plus the corresponding context watcherCtx). The function is called when the watch is triggered, with watcherCtx passed as the watcher's input parameter.
int zoo_aget(zhandle_t *zh, const char *path, int watch, data_completion_t completion, const void *data);
Gets node data (legacy style). completion is the callback function; its rc parameter may be ZOK (completed), ZNONODE (node does not exist), or ZNOAUTH (client has no permission).

int zoo_awget(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, data_completion_t completion, const void *data);
Gets node data (watcher object style).

int zoo_aset(zhandle_t *zh, const char *path, const char *buffer, int buflen, int version, stat_completion_t completion, const void *data);
Sets node data.

int zoo_aget_children(zhandle_t *zh, const char *path, int watch, strings_completion_t completion, const void *data);
Gets the child node list (legacy).

int zoo_awget_children(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, strings_completion_t completion, const void *data);
Gets the child node list (watcher object).

int zoo_aget_children2(zhandle_t *zh, const char *path, int watch, strings_stat_completion_t completion, const void *data);
Gets the child node list together with the node's stat; added in version 3.3.0 (legacy).

int zoo_awget_children2(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, strings_stat_completion_t completion, const void *data);
Gets the child node list together with the node's stat; added in version 3.3.0 (watcher object).

int zoo_async(zhandle_t *zh, const char *path, string_completion_t completion, const void *data);
Flushes the leader channel. (The author was not clear on this one.)

int zoo_aget_acl(zhandle_t *zh, const char *path, acl_completion_t completion, const void *data);
Gets the node's ACL. The ACL describes the conditions required to operate on the node, that is, who (which IDs) has permission to perform which operations on it.

int zoo_aset_acl(zhandle_t *zh, const char *path, int version, struct ACL_vector *acl, void_completion_t completion, const void *data);
Sets the node's ACL.

int zoo_amulti(zhandle_t *zh, int count, const zoo_op_t *ops, zoo_op_result_t *results, void_completion_t completion, const void *data);
Executes a series of operations atomically.

const char *zerror(int c);
Returns the error message for an error code.

int zoo_add_auth(zhandle_t *zh, const char *scheme, const char *cert, int certLen, void_completion_t completion, const void *data);
Specifies the application's credentials. Call this function to authenticate the connection: the server authenticates the client connection with the security service named by scheme. If authentication fails, the connection is dropped and the watcher is triggered with the state code ZOO_AUTH_FAILED_STATE.

int is_unrecoverable(zhandle_t *zh);
Checks whether the Zookeeper connection can still be recovered.

void zoo_set_debug_level(ZooLogLevel logLevel);
Sets the debug level.

void zoo_set_log_stream(FILE *logStream);
Sets the file stream used for logging; stderr is used by default, and if logStream is NULL the default (stderr) is used.

void zoo_deterministic_conn_order(int yesOrNo);
Enables or disables randomized ordering of the quorum endpoints; normally used only for testing. If nonzero, the client connects to the quorum peers in their initialization order; if zero, zookeeper_init() reorders the endpoints so that client connections are spread more evenly across them.
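To illustrate the asynchronous style, a minimal sketch (mine, not from the notes) of zoo_aget() with a data_completion_t callback; the path /config is arbitrary:

    #include <stdio.h>
    #include <zookeeper/zookeeper.h>

    /* Completion runs on the event thread once the server replies. */
    static void on_data(int rc, const char *value, int value_len,
                        const struct Stat *stat, const void *data) {
        if (rc == ZOK)
            printf("got %d bytes, version %d\n", value_len, stat->version);
        else
            printf("zoo_aget failed: %s\n", zerror(rc));
    }

    /* Issue the read; it returns immediately and on_data fires later. */
    static int read_async(zhandle_t *zh) {
        return zoo_aget(zh, "/config", 0 /* no watch */, on_data, NULL);
    }

zoo_aget() returns only the result of queueing the request; the outcome of the read itself arrives in on_data() on the event thread.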
int zoo_create(zhandle_t *zh, const char *path, const char *value, int valuelen, const struct ACL_vector *acl, int flags, char *path_buffer, int path_buffer_len);
Synchronously creates a node.

int zoo_delete(zhandle_t *zh, const char *path, int version);
Synchronously deletes a node.

int zoo_exists(zhandle_t *zh, const char *path, int watch, struct Stat *stat);
Synchronously checks whether a node exists (legacy).

int zoo_wexists(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, struct Stat *stat);
Synchronously checks whether a node exists (watcher object).

int zoo_get(zhandle_t *zh, const char *path, int watch, char *buffer, int *buffer_len, struct Stat *stat);
Synchronously gets node data (legacy).

int zoo_wget(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, char *buffer, int *buffer_len, struct Stat *stat);
Synchronously gets node data (watcher object).

int zoo_set(zhandle_t *zh, const char *path, const char *buffer, int buflen, int version);
Synchronously sets node data.

int zoo_set2(zhandle_t *zh, const char *path, const char *buffer, int buflen, int version, struct Stat *stat);
Synchronously sets node data and returns the node's current stat.

int zoo_get_children(zhandle_t *zh, const char *path, int watch, struct String_vector *strings);
Synchronously gets the child node list (legacy).

int zoo_wget_children(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, struct String_vector *strings);
Synchronously gets the child node list (watcher object).

int zoo_get_children2(zhandle_t *zh, const char *path, int watch, struct String_vector *strings, struct Stat *stat);
Synchronously gets the child node list and returns the node's stat (legacy); added in version 3.3.0.

int zoo_wget_children2(zhandle_t *zh, const char *path, watcher_fn watcher, void *watcherCtx, struct String_vector *strings, struct Stat *stat);
Synchronously gets the child node list and returns the node's stat (watcher object); added in version 3.3.0.

int zoo_get_acl(zhandle_t *zh, const char *path, struct ACL_vector *acl, struct Stat *stat);
Synchronously gets the node's ACL.

int zoo_set_acl(zhandle_t *zh, const char *path, int version, const struct ACL_vector *acl);
Synchronously sets the node's ACL.

int zoo_multi(zhandle_t *zh, int count, const zoo_op_t *ops, zoo_op_result_t *results);
Synchronously executes a series of operations atomically.
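Tying the synchronous calls together, a minimal sketch (mine, not from the notes) of a read-modify-write that passes the version obtained from zoo_get() to zoo_set(), so that a concurrent writer makes the write fail with ZBADVERSION and the loop retries. The path /counter and its decimal-text content are assumptions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <zookeeper/zookeeper.h>

    /* Increment an integer stored as text in /counter, retrying if another
     * client wrote in between (detected via the znode version). */
    static int increment(zhandle_t *zh) {
        for (;;) {
            char buf[32];
            int buflen = sizeof(buf) - 1;
            struct Stat stat;
            int rc = zoo_get(zh, "/counter", 0, buf, &buflen, &stat);
            if (rc != ZOK || buflen < 0) return rc;
            buf[buflen] = '\0';

            char next[32];
            int len = snprintf(next, sizeof(next), "%d", atoi(buf) + 1);

            /* Passing the version we read makes the write conditional. */
            rc = zoo_set(zh, "/counter", next, len, stat.version);
            if (rc != ZBADVERSION) return rc; /* done, or a real error */
            /* ZBADVERSION: someone else updated the node first; retry. */
        }
    }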

 
