ZooKeeper Logs and Disk Use
Servers use the transaction log to persist transactions. Before accepting a proposal, a server (follower or leader) persists the transactions in the proposal to the transaction log, which is a file on the server's local disk. Transactions are appended to this file in order. From time to time, the server closes the current file and creates a new one, rolling over the log. (This article is a translation of Local Storage/Logs and Disk Use in Chapter 9 of the ZooKeeper book by Flavio Junqueira and Benjamin Reed.)
Because writing the transaction log is on the critical path of write requests, ZooKeeper needs to do it efficiently. Appending to a file can be done efficiently on hard disks, but ZooKeeper uses two further techniques to make it faster: group commits and padding.
Group commit means appending multiple transactions to the log in a single write to the disk. This allows many transactions to be persisted at the cost of only one disk seek.
There is one important caveat about persisting transactions to disk. Modern operating systems usually cache dirty pages and write them to the disk media asynchronously. However, we need to be sure that a transaction has been persisted before proceeding, so we have to flush it to the disk media. Flushing here means telling the operating system to write the dirty pages to disk and to return only after the operation completes. Because transactions are persisted in the SyncRequestProcessor, this processor is also responsible for flushing. When it is time to flush a transaction to disk in the SyncRequestProcessor, we in fact flush all queued transactions, which is how the group commit optimization is achieved. If only one transaction is queued, the processor still performs the flush. It does not wait for more transactions to be queued, because doing so would increase latency. For a code reference, see the SyncRequestProcessor.run() method.
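The batching behavior described above can be sketched roughly as follows. This is a minimal illustration and not ZooKeeper's SyncRequestProcessor: the class GroupCommitSketch, its method names, and the use of raw byte arrays as transactions are all hypothetical. It only shows the idea of writing everything that is currently queued and then paying for a single flush, without waiting for more work to arrive.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical group-commit sketch; names are illustrative, not ZooKeeper APIs. */
final class GroupCommitSketch {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final FileChannel log;
    private int flushCount = 0;

    GroupCommitSketch(Path file) throws IOException {
        this.log = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Queues a transaction (assumed to be already serialized). */
    void submit(byte[] txn) {
        queue.add(txn);
    }

    /**
     * Waits for at least one queued transaction, appends every transaction
     * that is queued at that moment, and flushes once for the whole batch.
     * Returns the number of transactions covered by the flush.
     */
    int commitBatch() throws IOException, InterruptedException {
        List<byte[]> batch = new ArrayList<>();
        batch.add(queue.take());   // block until there is something to persist
        queue.drainTo(batch);      // take whatever else is queued, without waiting
        for (byte[] txn : batch) {
            ByteBuffer buf = ByteBuffer.wrap(txn);
            while (buf.hasRemaining()) {
                log.write(buf);
            }
        }
        log.force(false);          // one flush for the entire batch
        flushCount++;
        return batch.size();
    }

    int flushCount() {
        return flushCount;
    }

    void close() throws IOException {
        log.close();
    }
}
```

With three transactions queued before commitBatch() is called, all three are appended and a single force() covers them. With one transaction queued, the method still flushes immediately rather than waiting.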
Disk Write Cache
A server acknowledges a proposal only after the transaction has been forced to the transaction log. More precisely, the server calls the commit method of ZKDatabase, which eventually calls FileChannel.force(). In this way, the server guarantees that the transaction has been persisted to disk before acknowledging it. This guarantee comes with a caveat, though. Modern disks have a write cache that holds data waiting to be written to the disk. If the write cache is enabled, a call to force() does not guarantee that the data is on the media when the call returns; it may still be sitting in the write cache. To guarantee that written data is on the media once FileChannel.force() returns, the disk write cache must be disabled. How to disable it differs from one operating system to another.
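As a small illustration of the ordering described in this sidebar, the sketch below appends a record and returns only after FileChannel.force() has returned, so a caller would acknowledge strictly after the force. The class ForceBeforeAck and the method appendAndForce are hypothetical names, and the choice of force(false) (file content only, without file metadata) is an assumption made for this example. Nothing in the code can work around an enabled disk write cache; that has to be handled at the operating system or device level.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper showing "force first, acknowledge afterwards". */
final class ForceBeforeAck {
    /**
     * Appends the record to a channel opened for appending, asks the
     * operating system to push it to the storage device, and returns the
     * file size afterwards. The caller should acknowledge only once this
     * method has returned.
     */
    static long appendAndForce(FileChannel ch, byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        // false: flush file content only; true would also flush file metadata.
        ch.force(false);
        return ch.size();
    }
}
```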
Padding consists of preallocating disk blocks to a file, so that file system metadata updates for block allocation do not significantly affect sequential writes to the file. If transactions are being appended to the log at high speed and blocks have not been preallocated, the file system needs to allocate a new block whenever the end of the file is reached. This causes at least two extra disk seeks: one to update the metadata and another to return to the file.
To avoid interference from other writes on the system, we strongly recommend writing the transaction log to a dedicated disk. A second disk can be used for the operating system files and the snapshots.
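One way to picture padding is the sketch below, which grows a log file in fixed steps by writing zeros ahead of the current write position. It is an illustration under stated assumptions and not ZooKeeper's own padding code: the class PaddingSketch, the method padIfNeeded, and the 64 KB step size are all hypothetical. Whether zero-filling actually reserves physical blocks also depends on the file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical padding sketch; names and sizes are illustrative only. */
final class PaddingSketch {
    /** Assumed growth step for this example. */
    static final long PRE_ALLOC_BYTES = 64 * 1024;

    /**
     * Makes sure the file already covers [writePos, writePos + needed).
     * If it does not, the file is extended in PRE_ALLOC_BYTES steps by
     * writing zeros, so later appends land in existing blocks.
     * The channel must be opened for writing without APPEND mode.
     * Returns the file size after padding.
     */
    static long padIfNeeded(FileChannel ch, long writePos, int needed) throws IOException {
        long size = ch.size();
        if (writePos + needed <= size) {
            return size;                       // enough padded space remains
        }
        long newSize = size;
        while (newSize < writePos + needed) {
            newSize += PRE_ALLOC_BYTES;
        }
        ByteBuffer zeros = ByteBuffer.allocate(8192); // zero-filled by default
        long pos = size;
        while (pos < newSize) {
            zeros.clear();
            zeros.limit((int) Math.min(zeros.capacity(), newSize - pos));
            pos += ch.write(zeros, pos);       // positional write past the old end
        }
        return newSize;
    }
}
```

A writer would call padIfNeeded before each append and keep track of its own write position, since the file size no longer marks the end of the valid data.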
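In a ZooKeeper configuration file, this separation is expressed with the dataDir and dataLogDir settings. The fragment below is only a sketch: both paths are hypothetical and assume that /txnlog is the mount point of a disk reserved for the transaction log.

```properties
# Snapshots (hypothetical path, on the general-purpose disk)
dataDir=/var/lib/zookeeper
# Transaction logs (hypothetical path, on a dedicated disk)
dataLogDir=/txnlog/zookeeper
```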