How does Elasticsearch ensure data is not lost?

Source: Internet
Author: User
Tags commit flush

As mentioned in the previous article, there is a layer of cache between Elasticsearch and the disk: the filesystem cache. Most newly added, modified, or deleted data sits in this cache. Without a flush operation, there is no 100% guarantee that this data survives a sudden power outage or machine crash. Yet by default ES flushes to disk only once every 30 minutes. If an uncontrolled failure occurs within such a long window, is data necessarily lost?

Clearly the ES designers anticipated this problem. So how does ES avoid losing data when a failure occurs between two full commits (flushes)?

ES introduces a transaction log (translog for short). Every operation on the data is recorded in this log, much like the edits log in Hadoop and the WAL in HBase, as shown below:

The translog workflow is as follows:

(1) When a document is indexed, it is added to the memory buffer and appended to the translog.

(2) Each shard performs a refresh once per second. A refresh empties the memory buffer but not the translog.

The process is as follows:

2.1 During the refresh, the documents in the memory buffer are written to a new segment. The segment still lives in the filesystem cache, and no flush is executed.

2.2 The newly generated segment in the cache is opened. From this point the new data is searchable.

2.3 Finally, the memory buffer is emptied.

The above process is shown below:

(3) As more documents are added, the memory buffer is repeatedly refreshed and cleared, but the translog keeps growing, as shown below:

(4) When the default 30 minutes is reached, by which time the translog has also grown very large, the index performs a flush: it creates a new translog and performs a full commit. The process is as follows:

4.1 All documents in the memory buffer are written to a new segment.

4.2 The segment is written to the filesystem cache, and the memory buffer is emptied.

4.3 A commit point is written to disk.

4.4 The filesystem cache is flushed to disk with an fsync.

4.5 Finally, the old translog is deleted and a new translog is created.

This is shown in the following figure:

The translog's role is to provide a persistent record of all operations that have not yet been flushed to disk. When ES restarts, it first recovers all known segments from the last commit point. It then replays from the translog every index change made after that commit point, including adds, deletes, and updates.
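The workflow and the recovery step described above can be sketched with a toy model. This is only an illustration of the idea, not how Elasticsearch is implemented: the class `ToyShard` and all of its attribute names are hypothetical, plain Python lists stand in for the memory buffer, the filesystem cache, and the disk, and the translog is assumed to be durable on every write.

```python
class ToyShard:
    """Toy model of one shard; lists stand in for memory, cache, and disk."""

    def __init__(self):
        self.memory_buffer = []    # lost on a crash
        self.cached_segments = []  # filesystem cache: searchable, lost on a crash
        self.disk_segments = []    # covered by the last commit point: durable
        self.translog = []         # assumed fsynced on every request: durable

    def index(self, doc):
        # Step (1): add to the memory buffer and append to the translog.
        self.memory_buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Step (2): the buffer becomes a new searchable segment in the cache.
        # The translog is left untouched.
        if self.memory_buffer:
            self.cached_segments.append(list(self.memory_buffer))
            self.memory_buffer.clear()

    def flush(self):
        # Step (4): full commit. Everything reaches disk, then the old
        # translog is replaced by a new, empty one.
        self.refresh()
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog = []

    def searchable_docs(self):
        segments = self.disk_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]

    def crash_and_recover(self):
        # A crash wipes the memory buffer and the filesystem cache.
        self.memory_buffer.clear()
        self.cached_segments.clear()
        # Recovery: start from the committed segments, then replay every
        # operation recorded in the translog since the last commit.
        replay = list(self.translog)
        self.translog = []
        for doc in replay:
            self.index(doc)
        self.refresh()


shard = ToyShard()
shard.index("a")
shard.index("b")
shard.flush()            # "a" and "b" are now committed to disk
shard.index("c")
shard.refresh()          # "c" is searchable but only in the cache
shard.index("d")         # "d" is only in the memory buffer
shard.crash_and_recover()
print(shard.searchable_docs())
```

In this sketch "c" and "d" survive the crash only because they were still in the translog; without it, recovery would stop at the last commit point and return just "a" and "b".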

The translog is also used to provide near real-time CRUD operations. When we read, update, or delete a document by ID, ES first checks the translog for recent changes before retrieving the document from the relevant segment. This means ES always has access to the latest version of the document in near real-time.
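The difference between a get by ID and a search can be sketched as follows. This is a minimal illustration that mirrors the description above; `ToyRealtimeShard` and its fields are hypothetical names, and the actual lookup inside Elasticsearch differs between versions.

```python
class ToyRealtimeShard:
    """Toy model: get-by-ID consults the translog, search only sees segments."""

    def __init__(self):
        self.segments = {}   # doc_id -> source, as of the last refresh
        self.translog = []   # (op, doc_id, source) tuples, in arrival order
        self.refreshed = 0   # translog entries already reflected in segments

    def index(self, doc_id, source):
        self.translog.append(("index", doc_id, source))

    def delete(self, doc_id):
        self.translog.append(("delete", doc_id, None))

    def refresh(self):
        # Apply the not-yet-refreshed operations to the searchable segments.
        for op, doc_id, source in self.translog[self.refreshed:]:
            if op == "index":
                self.segments[doc_id] = source
            else:
                self.segments.pop(doc_id, None)
        self.refreshed = len(self.translog)

    def search(self, doc_id):
        # Search only sees what the last refresh made visible.
        return self.segments.get(doc_id)

    def get(self, doc_id):
        # Get-by-ID checks unrefreshed translog entries first, newest first.
        for op, logged_id, source in reversed(self.translog[self.refreshed:]):
            if logged_id == doc_id:
                return source if op == "index" else None
        return self.segments.get(doc_id)


shard = ToyRealtimeShard()
shard.index("1", {"title": "v1"})
print(shard.get("1"), shard.search("1"))   # get sees it, search does not yet
shard.refresh()
shard.index("1", {"title": "v2"})
print(shard.get("1"), shard.search("1"))   # get sees v2, search still sees v1
```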

We know that a flush syncs all data in the filesystem cache to disk, deletes the old translog, and creates a new one. By default, each ES shard flushes automatically every 30 minutes, or when the translog grows beyond a certain size threshold.

The API for the Flush command is as follows:

POST /blogs/_flush  // flush a specific index

POST /_flush?wait_if_ongoing  // flush all indices and return only after the operation has completed
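For readers who call the API from a script, here is a minimal sketch that builds the same two requests with the Python standard library. The helper name `build_flush_request` and the address `http://localhost:9200` are assumptions for illustration; the query parameter is spelled `wait_if_ongoing`, which is the name the official flush API documentation uses. The helper only constructs the request and does not contact a cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_flush_request(base_url, index=None, wait_if_ongoing=False):
    """Build (but do not send) a POST request for the flush API."""
    # Flush one index if a name is given, otherwise flush all indices.
    path = f"/{index}/_flush" if index else "/_flush"
    query = ("?" + urlencode({"wait_if_ongoing": "true"})) if wait_if_ongoing else ""
    return Request(base_url.rstrip("/") + path + query, method="POST")


one_index = build_flush_request("http://localhost:9200", index="blogs")
all_indices = build_flush_request("http://localhost:9200", wait_if_ongoing=True)
print(one_index.get_method(), one_index.full_url)
print(all_indices.get_method(), all_indices.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`, which requires a reachable cluster at the assumed address.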

We rarely need to flush manually. However, before restarting a node or closing an index, it is worth running a flush as an optimization. When ES recovers or reopens an index, it must first replay all the operations in the translog, so the smaller the translog, the faster the recovery.

We know the purpose of the translog is to ensure that operations are not lost. That raises the question: how reliable is the translog itself?

By default, the translog is fsynced every 5 seconds and after each write request (index, delete, update, bulk) completes. This happens on the primary shard and on every replica shard. As a result, the client does not receive a 200 OK response until the request has been fsynced to the translog on the primary and all replicas.

Running a translog fsync after every request still has a performance cost, although the amount of data involved may be small. The default ES translog setting is:

"index.translog.durability": "request"

For a high-volume cluster where the data is not critical, the translog can instead be fsynced asynchronously every 5 seconds, configured as follows:

PUT /my_index/_settings
{
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s"
}
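The same settings update can be prepared from a script. The sketch below only builds the request; the helper name `build_translog_settings_request` and the address `http://localhost:9200` are assumptions for illustration, and whether each setting can be changed on a live index may depend on the Elasticsearch version in use.

```python
import json
from urllib.request import Request


def build_translog_settings_request(base_url, index, durability="async",
                                    sync_interval="5s"):
    """Build (but do not send) a PUT request that updates translog settings."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = {"index.translog.durability": durability}
    # The sync interval only matters when fsync is asynchronous.
    if durability == "async":
        body["index.translog.sync_interval"] = sync_interval
    return Request(
        f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_translog_settings_request("http://localhost:9200", "my_index")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```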

These settings can be configured per index and updated dynamically at any time. If the data is relatively unimportant, enabling asynchronous translog fsync may improve performance, but in the worst case up to 5 seconds of data can be lost. Consider how important the data is to your business before enabling it.
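The trade-off between the two durability modes can be illustrated with a small append-only log that uses `os.fsync`. This is a sketch under stated assumptions, not Elasticsearch code: `ToyTranslogWriter` and `demo` are hypothetical names, and in "async" mode the periodic timer is left out, so nothing is fsynced until `sync()` is called explicitly.

```python
import os
import tempfile


class ToyTranslogWriter:
    """Append-only log that fsyncs per request or only when sync() is called."""

    def __init__(self, path, durability="request"):
        if durability not in ("request", "async"):
            raise ValueError("durability must be 'request' or 'async'")
        self.durability = durability
        self.file = open(path, "ab")
        self.fsync_count = 0
        self.unsynced_ops = 0   # operations that a crash right now would lose

    def append(self, operation):
        self.file.write(operation.encode("utf-8") + b"\n")
        self.unsynced_ops += 1
        if self.durability == "request":
            self.sync()

    def sync(self):
        self.file.flush()               # Python buffer -> OS filesystem cache
        os.fsync(self.file.fileno())    # OS filesystem cache -> disk
        self.fsync_count += 1
        self.unsynced_ops = 0

    def close(self):
        self.sync()
        self.file.close()


def demo(durability, ops=100):
    """Return (fsync calls made, operations still at risk) after `ops` writes."""
    with tempfile.TemporaryDirectory() as directory:
        writer = ToyTranslogWriter(os.path.join(directory, "translog.log"), durability)
        for i in range(ops):
            writer.append(f"index doc {i}")
        result = (writer.fsync_count, writer.unsynced_ops)
        writer.close()
        return result


print("request:", demo("request"))
print("async:  ", demo("async"))
```

In "request" mode every write pays for an fsync and nothing is left at risk. In "async" mode the writes are cheaper, but every operation since the last sync would be lost in a crash, which is the window of up to 5 seconds described above.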

If you are unsure which to choose, keep the ES default, which fsyncs the translog after every request to avoid data loss.
