Force removal of deleted documents in Elasticsearch

Source: Internet
Author: User

Elasticsearch is a real-time distributed search engine built on Apache Lucene. To keep search close to real time, Lucene stores documents in immutable segments: once a segment has been written to storage, it cannot be modified. So how does Lucene delete an indexed document from a segment? In a nutshell, when a user issues a command to delete an indexed document #abc, the document is not immediately removed from the segment that stores it. Instead, it is marked as deleted in a special file. When the user searches for #abc again, Elasticsearch can still find #abc in the segment, but because the document has been marked as deleted, Lucene removes it from the search results returned to the user, so from the user's point of view #abc has been deleted.
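The mark-then-filter behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea, not Lucene's actual data structures; the Segment class and its method names are made up for illustration.

```python
# Toy model of an immutable segment with a separate deletion marker.
# Illustration only: Lucene's real on-disk format is different.

class Segment:
    def __init__(self, docs):
        # Documents are fixed at construction time and never modified.
        self._docs = dict(docs)
        # IDs marked as deleted are kept apart from the document data.
        self._deleted = set()

    def delete(self, doc_id):
        # Deleting only records a marker; the stored document stays in place.
        if doc_id in self._docs:
            self._deleted.add(doc_id)

    def search(self, predicate):
        # Matches are found first, then marked documents are filtered out.
        return [doc_id for doc_id, body in self._docs.items()
                if predicate(body) and doc_id not in self._deleted]

    def stored_count(self):
        # Number of documents still physically held by the segment.
        return len(self._docs)


segment = Segment({"abc": "hello world", "def": "hello again"})
segment.delete("abc")
visible = segment.search(lambda body: "hello" in body)
```

After the delete, a search no longer returns "abc", yet the segment still holds both documents, which is why deleted documents keep consuming resources until a merge rewrites the segment.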

Elasticsearch runs background threads that periodically merge segments according to Lucene's merge policy; users generally do not need to worry about this or take any action. Deleted documents are physically removed only when the segments that contain them are merged. Until then, they continue to occupy resources such as the JVM heap and the operating system's file cache. In some cases we need to force Elasticsearch to merge segments, which can free up a large share of those resources.

POST /{index}/_optimize?only_expunge_deletes=true&wait_for_completion=true

The _optimize command forces a segment merge and removes the documents marked for deletion. (In Elasticsearch 2.1 and later, this API is named _forcemerge.) Segment merging consumes CPU and a large amount of I/O, so be sure to run it while your Elasticsearch cluster is in a maintenance window and has sufficient I/O capacity (for example, SSDs); otherwise it is likely to cause cluster crashes and data loss.
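As a rough sketch, the request above could be prepared from a script using only the Python standard library. The host http://localhost:9200, the index name logs-2015, and the helper name build_expunge_request are assumptions for illustration; the query parameters are copied from the command in this article. The code only builds the request and does not send it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_expunge_request(base_url, index):
    """Build (but do not send) the expunge request shown in the article."""
    params = urlencode({
        "only_expunge_deletes": "true",
        "wait_for_completion": "true",
    })
    url = "{}/{}/_optimize?{}".format(
        base_url.rstrip("/"), quote(index, safe=""), params)
    return Request(url, method="POST")


# Hypothetical host and index name.
request = build_expunge_request("http://localhost:9200", "logs-2015")
# To actually run the merge, pass `request` to urllib.request.urlopen,
# and only do so inside a maintenance window.
```

Keeping construction separate from sending makes it easy to review the exact URL before anything is executed against the cluster.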

The following figure shows the CPU and disk I/O usage we observed while forcing an expunge. The cluster runs on IaaS virtual machines on Microsoft's Azure cloud platform; all data nodes are D13 virtual machines, with data stored on local SSDs. It is a backup cluster, and to ensure a smooth merge, all writes to it were paused during this period, leaving only a small number of read operations. Note that expunge is a last resort, to be used only when Elasticsearch is not purging deleted documents effectively on its own. During the operation, it is also best to stop all read and write operations on the cluster and to pause automatic shard allocation (cluster.routing.allocation.enable=none), in case a node is kicked out of the cluster and the resulting automatic shard reallocation leads to data loss.
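Pausing and later resuming shard allocation is usually done through the cluster settings API (PUT /_cluster/settings). The sketch below only builds the JSON bodies for those two calls; the helper name allocation_settings is made up, and using a transient setting rather than a persistent one is an assumption, chosen here so the change does not outlive a full cluster restart.

```python
import json


def allocation_settings(enabled):
    """Return a JSON body for PUT /_cluster/settings that toggles allocation."""
    value = "all" if enabled else "none"
    # Assumption: a transient setting is enough for a maintenance window.
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": value}})


pause_body = allocation_settings(False)   # send before the expunge
resume_body = allocation_settings(True)   # send once the expunge is done
```

Remember to send the resume body after the merge finishes; a cluster left with allocation disabled will not recover shards on its own.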



The following two settings control how aggressively the expunge proceeds. The values shown are the defaults and can be adjusted as needed; see the merge documentation for details. In addition, you can temporarily set the number of replicas of all indexes to 0, so that the expunge runs only on primary shards and I/O pressure is reduced.


PUT /{index}/_settings
{
  "settings": {
    "index.merge.policy.expunge_deletes_allowed": "10",
    "index.merge.policy.max_merge_at_once_explicit": "30"
  }
}
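The temporary drop to zero replicas mentioned above can be sketched the same way. The helper below builds a pair of bodies for PUT /{index}/_settings: one that sets the replica count to 0 and one that restores the previous value. The function name and the example replica count of 1 are assumptions; in practice, read the current value from the index settings before changing it.

```python
import json


def replica_bodies(original_replicas):
    """Return (drop_body, restore_body) for PUT /{index}/_settings."""
    if original_replicas < 0:
        raise ValueError("replica count cannot be negative")
    drop = json.dumps({"index": {"number_of_replicas": 0}})
    restore = json.dumps(
        {"index": {"number_of_replicas": original_replicas}})
    return drop, restore


# Hypothetical index that normally runs with one replica.
drop_body, restore_body = replica_bodies(1)
```

Running without replicas removes redundancy for the duration of the expunge, so this only makes sense on a cluster whose data can be rebuilt, such as the backup cluster described earlier.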


Reference: Lucene's Handling of Deleted Documents.
  
