First, introduction
Since ActiveMQ 5.4, KahaDB has been the default persistence store for ActiveMQ. Compared with the earlier AMQ store, the official documentation claims that KahaDB uses fewer file descriptors and provides a faster recovery mechanism.
Second, KahaDB storage configuration
The configuration in conf/activemq.xml is as follows:
<broker brokerName="broker" ... >
  <persistenceAdapter>
    <kahaDB directory="activemq-data" journalMaxFileLength="32mb"/>
  </persistenceAdapter>
  ...
</broker>
KahaDB is specified inside <persistenceAdapter>. Here it stores its data in the "activemq-data" directory, and the maximum length of each journal file is 32MB.
For example, an actual ActiveMQ data directory under KahaDB storage looks like this:
As you can see, there are four files in the directory above:
①db.data
This is the message index file. It is essentially a B-tree implementation: the B-tree index points to the messages stored in db-*.log.
②db.redo
Used primarily for message recovery.
③db-*.log stores the message content. For each message, the log contains not only the message data itself but also all related events: destinations, subscription relationships, transaction boundaries, and so on.
The data log stores messages as a journal: new data is always appended to the end of the current log file, so message storage is very fast. For a persistent message, the producer sends the message to the broker, the broker first writes the message to disk (controlled by the enableJournalDiskSyncs option), and only then returns an acknowledgement to the producer. The append-only design reduces, to some extent, the time it takes the broker to return the acknowledgement to the producer.
④lock file
In addition, some of KahaDB's configuration options are as follows:
1) indexWriteBatchSize, default 1000: when the number of updated indexes in the metadata cache reaches this value, they are written to the metadata store on disk in one batch. Writing in batches, instead of hitting the disk on every update, significantly reduces the cost of disk writes.
2) indexCacheSize, default 10000: the number of index pages cached in memory. The more index pages are cached, the higher the probability of a cache hit and the more efficient retrieval becomes.
3) journalMaxFileLength, default 32MB: when a journal file reaches 32MB, a new file is created to store further messages. This setting matters when producer and consumer rates differ; for example, if producers are fast and consumers are slow, it is better to configure a larger value.
4) enableJournalDiskSyncs, default true: by default the broker performs a disk sync (ensuring that a message has been physically written to disk) before sending the ACK back to the producer.
5) cleanupInterval, default 30000ms: the interval at which the broker checks for journal files whose messages have all been consumed, so those files can be deleted.
6) checkpointInterval, default 5s: every 5s the in-memory index (metadata cache) is written to the index file on disk (metadata store).
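Putting the options above together, a persistenceAdapter configuration might look like the following sketch. The values shown are the defaults discussed above; check the KahaDB attribute list for your ActiveMQ version before relying on it:

```xml
<broker brokerName="broker" ... >
  <persistenceAdapter>
    <kahaDB directory="activemq-data"
            journalMaxFileLength="32mb"
            indexWriteBatchSize="1000"
            indexCacheSize="10000"
            enableJournalDiskSyncs="true"
            cleanupInterval="30000"
            checkpointInterval="5000"/>
  </persistenceAdapter>
</broker>
```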
Third, a brief analysis of KahaDB's underlying implementation
The figure below shows the architecture of KahaDB:
As you can see, the parts of the diagram correspond one-to-one with the files in the KahaDB storage directory described above.
①The B-tree in memory (the cache) is the metadata cache.
Caching the index in memory speeds up queries (quick retrieval of message data). However, the metadata cache must be synchronized with the metadata store periodically.
This synchronization process is called a checkpoint; the checkpointInterval option determines how often checkpoints take place.
②The B-tree indexes stored on disk are called the metadata store, corresponding to the file db.data, which indexes the data logs in B-tree form.
With it, the broker (message server) can restart and recover quickly, because it is the index of the messages and records the location of every message.
The metadata store is what allows a broker instance to restart rapidly. If the metadata store is damaged or accidentally deleted, the broker can still recover by scanning the entire data logs to rebuild the B-tree index, but the restart then takes a considerable length of time.
③The data logs correspond to the files db-*.log, 32MB each by default.
The data logs store messages in journal form and are the real carrier of the data that producers send. Events of all kinds (messages, acknowledgements, subscriptions, subscription cancellations, transaction boundaries, and so on) are stored in a rolling log.
④The redo log corresponds to the file db.redo.
The redo log is based on the "double write" principle.
A brief summary of my understanding: the storage engine's page size differs from the operating system's page size. For example, with a 16KB storage page and a 4KB OS page, writing one storage page requires four OS-page writes (4 * 4KB = 16KB). If a failure occurs in the middle of the write (for example, a sudden power loss), only part of the page is written. This is known as a partial page write.
With double write, the data is first written to a recovery buffer and then written to the actual destination file. In the ActiveMQ source code, PageFile.java contains the corresponding implementation.
Four, reference documents
KahaDB Storage Engine Analysis for ActiveMQ
"ActiveMQ Tuning": KahaDB optimization
KahaDB Overview
JMS Learning (VII): KahaDB storage for persistent storage of ActiveMQ messages