Tips
For detailed instructions on ZooKeeper deployment and administration, see the official documentation at http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html.
1. Configuring the ZooKeeper Service
The ZooKeeper server has various configuration parameters, which are defined in the zoo.cfg configuration file. Servers deployed in the same ZooKeeper service can share this file if they are configured for the same application; the myid file is what distinguishes each server from the others. The default options in this file usually cover the most common use cases for evaluating or testing an application, but in a production environment it is important to set these parameters deliberately, based on sound reasoning about the workload.
You can also set many configuration parameters using Java system properties of the form zookeeper.propertyName. These properties are set with the -D option when starting the server. However, the parameters defined in the configuration file take precedence over the parameters set on the Java command line with the -D option.
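As a rough illustration of the zookeeper.propertyName form, the following Python sketch renders a set of overrides as -D options. The helper name to_jvm_flags and the values shown are hypothetical, and the sketch assumes every property takes the zookeeper. prefix, which is not true of all of them (requestTraceFile, for example, has no prefix).

```python
def to_jvm_flags(overrides):
    """Render overrides as -Dzookeeper.<name>=<value> options.

    Assumption: every property in `overrides` uses the "zookeeper." prefix.
    """
    return [f"-Dzookeeper.{name}={value}" for name, value in sorted(overrides.items())]


# Illustrative values only; they are not recommendations.
flags = to_jvm_flags({"snapCount": 50000, "preAllocSize": 16384})
print(" ".join(flags))
```

Remember that a value set this way is overridden if the same parameter also appears in the configuration file.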
2. Minimum Configuration
The basic configuration parameters that must be defined in the configuration file of every ZooKeeper server are listed here. These parameters are not predefined and must be set in the configuration file for a ZooKeeper instance to run.
- clientPort: This is the TCP port on which clients connect to the server. The client port can be set to any number, and different servers can be configured to listen on different ports. By convention, the port is 2181.
- dataDir: This is the directory where ZooKeeper stores snapshots of its in-memory database. If the dataLogDir parameter is not defined separately, the transaction log of updates to the database is also stored in this directory. If the server is a member of an ensemble, the myid file is stored here as well. The data directory itself is not performance-sensitive and does not need a dedicated device, provided that the transaction logs are stored in a different location.
- tickTime: This is the length of a single tick, expressed in milliseconds. The tick is the basic unit of time that ZooKeeper uses to determine heartbeats and session timeouts. The default tickTime is 2000 milliseconds. Lowering tickTime allows faster timeouts, but increases network traffic (heartbeats) and processing overhead on the ZooKeeper server.
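The three required parameters above can be put together as follows. This is a minimal Python sketch that renders the contents of a zoo.cfg file; the function name render_minimal_config and the data directory /var/lib/zookeeper are assumptions made for illustration, not values taken from this text.

```python
def render_minimal_config(client_port, data_dir, tick_time_ms):
    """Return zoo.cfg text containing only the three required parameters."""
    if not 1 <= client_port <= 65535:
        raise ValueError("clientPort must be a valid TCP port")
    if tick_time_ms <= 0:
        raise ValueError("tickTime must be a positive number of milliseconds")
    lines = [
        f"tickTime={tick_time_ms}",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical data directory; choose one that suits your system.
config_text = render_minimal_config(2181, "/var/lib/zookeeper", 2000)
print(config_text, end="")
```

Writing the returned text to the zoo.cfg file is left to the caller, so the sketch does not touch the file system.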
3. Storage Configuration
The following advanced parameters configure the storage options of the ZooKeeper service:
- dataLogDir: This is the directory where the ZooKeeper transaction log is stored. The server flushes the transaction log using synchronous writes. It is therefore important to use a dedicated transaction log device, so that logging by the ZooKeeper server is not affected by the I/O activity of other processes on the system. A dedicated log device improves overall throughput and gives requests a stable latency.
- preAllocSize: Set through the zookeeper.preAllocSize Java system property, this is the size of the blocks that are preallocated in the transaction log file. The default block size is 64 MB. Preallocating the transaction log minimizes disk seeks. If snapshots are taken frequently, the transaction log may never grow that large; in that case, this parameter can be lowered to optimize storage usage.
- snapCount: Set through the zookeeper.snapCount Java system property, this is the number of transactions between two consecutive snapshots. After snapCount transactions have been written to a log file, a new snapshot is started and a new transaction log file is created. Taking a snapshot is a performance-sensitive operation, so a small snapCount value can hurt ZooKeeper's performance. The default value of snapCount is 100,000.
- traceFile: Set through the requestTraceFile Java system property, this option enables requests to be logged to a trace file named traceFile.year.month.day. It is useful for debugging, but it affects the overall performance of the ZooKeeper server.
- fsync.warningthresholdms: This is a threshold, in milliseconds, on the time allowed for flushing all outstanding writes to the transaction log, that is, the write-ahead log (WAL). Whenever a sync operation takes longer than this value, a warning message is written to the log. The default value is 1,000.
- autopurge.snapRetainCount: This is the number of snapshots, along with their corresponding transaction logs, to retain in the dataDir and dataLogDir directories respectively. The default value is 3.
- autopurge.purgeInterval: This is the interval, in hours, at which old snapshots and transaction logs are purged. The default value is 0, which means that automatic purging is disabled. Set this option to a positive integer (1 or higher) to enable automatic cleanup. While it is disabled, cleanup can be done manually by running the zkCleanup.sh script in the bin directory of the ZooKeeper distribution.
- syncEnabled: This configuration option was introduced in ZooKeeper 3.4.6. It is set through the zookeeper.observer.syncEnabled Java system property and makes observers log transactions and write snapshots to disk, as followers do by default. Unlike followers, observers do not take part in voting; they only commit the proposals they receive from the leader. Enabling this option reduces the recovery time of an observer when it restarts. The default value is true.
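To make the effect of autopurge.snapRetainCount concrete, here is a small Python sketch that decides which snapshot files would be kept and which would be purged. It is an illustration rather than ZooKeeper's actual purge code: it assumes snapshot files are named snapshot.<zxid in hexadecimal>, it ignores transaction log files, and the helper name split_snapshots is hypothetical.

```python
def split_snapshots(file_names, retain_count=3):
    """Return (keep, purge) lists of snapshot file names.

    Assumption: snapshots are named "snapshot.<hex zxid>" and a larger
    zxid means a more recent snapshot. Other files are ignored.
    """
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    snapshots = [name for name in file_names if name.startswith("snapshot.")]
    # Order from oldest to newest by the numeric value of the hex suffix.
    snapshots.sort(key=lambda name: int(name.split(".", 1)[1], 16))
    keep = snapshots[-retain_count:]
    purge = snapshots[:-retain_count]
    return keep, purge


names = ["snapshot.100", "snapshot.a0", "snapshot.1f4", "snapshot.2bc", "log.1"]
keep, purge = split_snapshots(names, retain_count=3)
print("keep:", keep)
print("purge:", purge)
```

Sorting by the numeric value of the suffix matters here: a plain string sort would place snapshot.a0 after snapshot.2bc even though it is the oldest file in the example.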
4. Network Configuration
The following configuration parameters relate to how clients interact with the ZooKeeper server:
- globalOutstandingLimit: This parameter defines the maximum number of outstanding requests in ZooKeeper. Clients can submit requests faster than ZooKeeper can process them, especially when there are many clients. This parameter lets ZooKeeper apply flow control by throttling clients, which prevents the server from running out of memory because of queued requests. Once globalOutstandingLimit is reached, the ZooKeeper server starts throttling client requests. The default limit is 1,000 requests. (Java system property: zookeeper.globalOutstandingLimit)
- maxClientCnxns: This is the maximum number of concurrent socket connections between a single client and a ZooKeeper server. The client is identified by its IP address. Setting up a TCP connection is resource-intensive, so this limit keeps the server from being overloaded. It also helps prevent certain kinds of DoS attacks, including file descriptor exhaustion. The default value is 60. Setting it to 0 removes the limit on concurrent connections entirely.
- clientPortAddress: This is the IP address on which the server listens for client connections. By default, the ZooKeeper server binds to all interfaces to accept client connections.
- minSessionTimeout: This is the minimum session timeout, in milliseconds, that the server allows a client to negotiate. The default value is twice the tickTime parameter. If this timeout is set very low, client failures may be detected incorrectly (false positives). Setting it to a higher value delays the detection of client failures.
- maxSessionTimeout: This is the maximum session timeout, in milliseconds, that the server allows a client to negotiate. By default, it is 20 times the tickTime parameter.
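The relationship between tickTime and the two session timeout limits can be sketched in Python as follows. The function name negotiate_session_timeout is hypothetical, and the sketch assumes the server simply clamps the timeout requested by the client into the range between minSessionTimeout and maxSessionTimeout, using the defaults of 2 and 20 ticks described above.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_timeout_ms=None, max_timeout_ms=None):
    """Clamp a client's requested session timeout to the server's limits.

    Assumption: unset limits default to 2 * tickTime and 20 * tickTime.
    """
    if min_timeout_ms is None:
        min_timeout_ms = 2 * tick_time_ms
    if max_timeout_ms is None:
        max_timeout_ms = 20 * tick_time_ms
    return max(min_timeout_ms, min(requested_ms, max_timeout_ms))


# With tickTime = 2000 ms the allowed range is 4000 ms to 40000 ms.
for requested in (1000, 30000, 60000):
    print(requested, "->", negotiate_session_timeout(requested))
```

Under these assumptions, a client asking for 1000 ms is raised to 4000 ms, a request for 30000 ms is accepted unchanged, and a request for 60000 ms is capped at 40000 ms.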
10. Managing Apache ZooKeeper Configuration