Elasticsearch: How do I physically delete historical data for a given period?

1. Preface

When it comes to deletion, the basic operation is delete, which subdivides into deleting documents and deleting indexes. To delete historical data, the obvious approach is to delete the data matching a given condition with delete_by_query.
In practice, two questions come up:
- After documents are deleted, disk space does not decrease immediately; it can even increase.
- Is there a better approach than a scheduled task plus delete_by_query?
2. Common Delete Operations
2.1 Delete a single document

DELETE /twitter/_doc/1
2.2 Delete documents that satisfy a given condition
POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}

Note: version conflicts may occur during a bulk deletion. To make the deletion continue past conflicts instead of aborting, add conflicts=proceed:

POST twitter/_doc/_delete_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}
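The two requests above can also be sent with curl. The following is a minimal sketch, not a definitive script: the host http://localhost:9200, the helper names delete_by_query_url and match_query_body, and the RUN switch are assumptions made for illustration. By default it only prints the command, because delete_by_query is destructive.

```shell
#!/bin/sh
# Assumed values for illustration; adjust them to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-twitter}"

# Build the URL of a delete-by-query request that continues past version conflicts.
# $1 = base URL, $2 = index name
delete_by_query_url() {
    printf '%s/%s/_delete_by_query?conflicts=proceed' "$1" "$2"
}

# Build a JSON body that matches documents on the "message" field.
# $1 = text to match (not JSON-escaped here; keep it free of quotes and backslashes)
match_query_body() {
    printf '{"query":{"match":{"message":"%s"}}}' "$1"
}

url=$(delete_by_query_url "$ES_URL" "$INDEX")
body=$(match_query_body "some message")

# Print the command by default; set RUN=1 to actually send the request.
if [ "${RUN:-0}" = "1" ]; then
    curl -s -H 'Content-Type: application/json' -XPOST "$url" -d "$body"
else
    echo "would run: curl -s -H 'Content-Type: application/json' -XPOST '$url' -d '$body'"
fi
```

Printing first makes it easy to check the index name and the query before anything is deleted.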
2.3 Delete a single index
DELETE /twitter
2.4 Delete all indexes
DELETE /_all

Or

DELETE /*

Deleting all indexes is a very risky operation; use it with great care.
3. What happens in the background when a document is deleted?

The result returned after performing the deletion:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "All",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term":
}

Interpretation:

Every document in an index is versioned.
When deleting a document, you can specify a version to make sure that the document you are deleting is the one you expect and that it has not changed in the meantime.

Every write performed on a document, including a deletion, increments its version.
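As a rough illustration of the version check, the sketch below builds a DELETE request carrying a version parameter, so the delete is rejected if the document has changed since that version. The host, index, document id, and the helper name versioned_delete_url are assumptions; newer Elasticsearch releases favour the if_seq_no and if_primary_term parameters for this kind of check, so verify which form your version accepts.

```shell
#!/bin/sh
# Build a DELETE URL that should only succeed while the document is still at the given version.
# $1 = base URL, $2 = index, $3 = document id, $4 = expected version
versioned_delete_url() {
    printf '%s/%s/_doc/%s?version=%s' "$1" "$2" "$3" "$4"
}

# Hypothetical values: document 1 of index "twitter", expected to be at version 2.
url=$(versioned_delete_url "http://localhost:9200" "twitter" "1" "2")

# Print the command by default; set RUN=1 to actually send the request.
if [ "${RUN:-0}" = "1" ]; then
    curl -s -XDELETE "$url"
else
    echo "would run: curl -s -XDELETE '$url'"
fi
```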

When the document is actually removed:

Deleting a document doesn't immediately remove the document from disk; it just marks it as deleted. Elasticsearch will clean up deleted documents in the background as you continue to index more data.
4. The difference between deleting an index and deleting a document

1) Deleting an index frees disk space immediately; there is no "mark as deleted" step.

2) Deleting a document only marks it as deleted; likewise, updating a document writes a new document and marks the old one as deleted. Whether the disk space is released depends on segment merging: the old document is physically removed only when the segment file that contains it takes part in a merge, which the ES background merge process may or may not trigger.

Because a shard may hold hundreds of segment files, there is a good chance that the segments containing old documents are not merged for a long time, so the space is not reclaimed. To free the space manually, run a force merge on a regular basis with max_num_segments set to 1.

POST /_forcemerge
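To judge whether a force merge is worth running, one option is to compare docs.count and docs.deleted per index, which the _cat/indices API can report. The sketch below is only an illustration: the host, the logstash_* pattern, the 20 percent threshold, the helper name indexes_over_deleted_ratio, and the sample figures are all assumptions.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"

# Read lines of "index docs.count docs.deleted" on stdin and print the names of
# indexes whose share of deleted documents is above the given percentage.
# $1 = threshold in percent
indexes_over_deleted_ratio() {
    awk -v pct="$1" '$2 + $3 > 0 && ($3 * 100) / ($2 + $3) > pct { print $1 }'
}

if [ "${RUN:-0}" = "1" ]; then
    # Ask the cluster for the three columns, without a header row.
    curl -s "$ES_URL/_cat/indices/logstash_*?h=index,docs.count,docs.deleted" \
        | indexes_over_deleted_ratio 20
else
    # Made-up sample data: the first index has 10% deleted documents, the second 50%.
    printf 'logstash_2018.01.01 900 100\nlogstash_2018.01.02 500 500\n' \
        | indexes_over_deleted_ratio 20
    # prints: logstash_2018.01.02
fi
```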
5. How to keep only the last 100 days of data

With the above understanding, the task of keeping only the last 100 days of data breaks down into:
1) Use delete_by_query to delete the data older than 100 days;
2) Run forcemerge to free the disk space manually.

The deletion script is as follows:

#!/bin/sh
# Delete every document in the logstash_* indexes whose "pt" field is older than 100 days.
curl -H 'Content-Type: application/json' -d '{
    "query": {
        "range": {
            "pt": {
                "lt": "now-100d",
                "format": "epoch_millis"
            }
        }
    }
}' -XPOST 'http://192.168.1.101:9200/logstash_*/_delete_by_query?conflicts=proceed'

The merge script is as follows:

#!/bin/sh
curl -XPOST 'http://192.168.1.101:9200/_forcemerge?only_expunge_deletes=true&max_num_segments=1'
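The two scripts can be combined into one job for a scheduled task. The following sketch assumes the same host, index pattern, and pt field as above; the helper names retention_query_body and run_or_print, the RUN switch, and the cron line are illustrative assumptions. It passes only only_expunge_deletes=true to the force merge, on the assumption that some newer releases reject combining it with max_num_segments; check this against your version.

```shell
#!/bin/sh
# Example cron entry (hypothetical path), running every day at 02:00:
#   0 2 * * * RUN=1 /opt/scripts/es_retention.sh
ES_URL="${ES_URL:-http://192.168.1.101:9200}"
INDEX_PATTERN="${INDEX_PATTERN:-logstash_*}"
DATE_FIELD="${DATE_FIELD:-pt}"
KEEP_DAYS="${KEEP_DAYS:-100}"

# Build a range query selecting documents older than the retention period.
# $1 = date field, $2 = number of days to keep
retention_query_body() {
    printf '{"query":{"range":{"%s":{"lt":"now-%sd","format":"epoch_millis"}}}}' "$1" "$2"
}

# Run the given command only when RUN=1; otherwise just print it.
run_or_print() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 1: mark the old documents as deleted.
run_or_print curl -s -H 'Content-Type: application/json' \
    -XPOST "$ES_URL/$INDEX_PATTERN/_delete_by_query?conflicts=proceed" \
    -d "$(retention_query_body "$DATE_FIELD" "$KEEP_DAYS")"

# Step 2: ask Elasticsearch to physically expunge the deleted documents.
run_or_print curl -s -XPOST "$ES_URL/_forcemerge?only_expunge_deletes=true"
```

Keeping both steps in one script ensures the merge always runs after the deletion rather than on an unrelated schedule.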
6. Is there a more general method?

Yes: use Curator, the tool provided on the official ES website.
6.1 Curator introduction

Main purpose: planning and managing ES indexes. It supports common operations such as create, delete, merge, reindex, and snapshot.
6.2 Curator website address

http://t.cn/RuwN0oM

Git address: https://github.com/elastic/curator
6.3 Curator installation guide

Address: http://t.cn/RuwCkBD

Note:
There is no shortage of blog tutorials on Curator, but the old and new versions differ greatly, so follow the latest official manual when deploying.
The command-line style of the old versions is not supported by the new versions.
6.4 Curator command-line operation

$ curator --help
Usage: curator [OPTIONS] ACTION_FILE

  Curator for Elasticsearch indices.

  http://elastic.co/guide/en/elasticsearch/client/curator/current

Options:
  --config PATH  Path to configuration file. Default: ~/.curator/curator.yml
  --dry-run      Do not perform any changes.
  --version      Show the version and exit.
  --help         Show this message and exit.

Core:
- Configuration file config.yml: configures the ES address to connect to, the logging setup, the log level, and so on.
- Action file action.yml: configures the actions to perform (several can be batched) and how indexes are matched (prefix match, regular-expression match, etc.).
6.5 Curator applicable scenarios

The most important point:

Taking deletion as an example, Curator can easily delete indexes older than X days, provided that the index names follow a specific date-based pattern, for example one index per day named like logstash_2018.04.05.
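For the daily logstash_YYYY.MM.DD naming above, an action file for Curator might look like the one generated below. This is a sketch assuming the Curator 5 action-file layout; the helper name print_action_file, the logstash_ prefix, and the 100-day retention are illustrative, so check the filter options against the manual for your Curator version.

```shell
#!/bin/sh
# Print a Curator action file that deletes date-named indexes past a retention period.
# $1 = index name prefix, $2 = retention in days
print_action_file() {
    cat <<EOF
actions:
  1:
    action: delete_indices
    description: Delete $1 indexes older than $2 days, judged by the date in the index name
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: $1
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: $2
EOF
}

# Write the result to a file with: print_action_file "logstash_" 100 > action.yml
print_action_file "logstash_" 100
```

A cautious first run would pass the --dry-run option from the help text, for example curator --config config.yml --dry-run action.yml, and inspect the log before running it for real.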

The naming pattern must correspond to the timestring set under delete_indices in action.yml.
7. Summary
- Refer to the latest official documentation; documentation for old versions easily misleads.
- Practice more; do not stop at just knowing.
- medcl: the new ES version 6.3 has Index Lifecycle Management, which makes it easy to manage the retention period of indexes.


2018-04-22 14:51, written at my bedside at home

Author: Ming Yi World
When reprinting, please credit the source. Original address:
https://blog.csdn.net/laoyang360/article/details/80038930
If you found this article helpful, please click "Like" to show your support; your support is my biggest motivation to keep writing. Thank you.
