How to use filter efficiently in Elasticsearch

Source: Internet
Author: User
Tags: bitset

This is a translated and condensed version of an excellent article. If you are comfortable reading English, I recommend going straight to the original: http://euphonious-intuition.com/2013/05/all-about-elasticsearch-filter-bitsets/

Elasticsearch has a bool filter and also and, or, and not filters. They look very similar, so what is the difference? When should you use the bool filter, and when should you use the and filter?

In fact, the bool filter and the and, or, and not filters work in completely different ways, and the choice between them has a large effect on query performance.

The first thing to understand is how filters work. At their core is a structure called a bitset, which can be thought of as a large array of bits in which each element has two states, 0 and 1 (readers who know Bloom filters will find the idea familiar). A filter only decides whether a document matches or not; it does not score documents. If a document matches the filter, its bit is set to 1; otherwise it is set to 0.
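To make the idea concrete, here is a minimal Python sketch of a bitset. It is only an illustration of the concept, not how Lucene stores bitsets internally: a plain integer stands in for the bit array, and the documents, the `user` field, and the helper names `build_bitset` and `matching_ids` are all made up for this example.

```python
def build_bitset(docs, predicate):
    """Return an int whose bit i is 1 when docs[i] satisfies the predicate."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << doc_id  # mark document doc_id as a match
    return bits


def matching_ids(bits):
    """List the document ids whose bit is set to 1."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


# Hypothetical documents; a filter only answers "match or not", no scoring.
docs = [{"user": "kim"}, {"user": "lee"}, {"user": "kim"}, {"user": "ann"}]
kim_bits = build_bitset(docs, lambda d: d["user"] == "kim")

print(bin(kim_bits))           # 0b101
print(matching_ids(kim_bits))  # [0, 2]
```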

When ES executes a filter, it goes through each Lucene segment and checks whether each document satisfies the filter. The result of that matching can be stored in a bitset. The next time the same filter arrives, ES makes the decision directly from the bitset in memory and does not need to read the Lucene segment files again. Avoiding that IO greatly speeds up query processing, which is why filters are so efficient.

Lucene segments are immutable: Lucene writes new segments, but existing ones never change, so a bitset can be reused for as long as its segment exists. One bitset is generated per filter per segment. A query may also need to intersect several bitsets, and computers are very good at this kind of bitwise operation, so it is very fast.
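The caching and intersection described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Elasticsearch's actual cache: the class `SegmentFilterCache`, the segment id `seg0`, the filter keys, and the document fields are all hypothetical. The cache is keyed by (segment, filter), and two cached bitsets are intersected with a single bitwise AND.

```python
class SegmentFilterCache:
    """Toy cache holding one bitset per (segment, filter) pair."""

    def __init__(self):
        self._cache = {}
        self.scans = 0  # how many times a segment actually had to be scanned

    def bitset(self, segment_id, segment_docs, filter_key, predicate):
        key = (segment_id, filter_key)
        if key not in self._cache:
            # Cache miss: scan the segment once and remember the result.
            self.scans += 1
            bits = 0
            for i, doc in enumerate(segment_docs):
                if predicate(doc):
                    bits |= 1 << i
            self._cache[key] = bits
        return self._cache[key]


# A hypothetical immutable segment with three documents.
segment = [
    {"user": "kim", "year": 2013},
    {"user": "lee", "year": 2013},
    {"user": "kim", "year": 2012},
]
cache = SegmentFilterCache()


def run_query():
    kim = cache.bitset("seg0", segment, "user:kim", lambda d: d["user"] == "kim")
    y2013 = cache.bitset("seg0", segment, "year:2013", lambda d: d["year"] == 2013)
    return kim & y2013  # intersecting two filters is one bitwise AND


first = run_query()
second = run_query()  # served entirely from the cached bitsets

print(bin(first))   # 0b1 (only document 0 matches both filters)
print(cache.scans)  # 2 (the second run did not scan the segment again)
```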

In addition, if a filter matches no documents, every bit in its bitset is 0. After processing the filter, ES ignores such a bitset, which improves performance.

With the basics covered, let's look at the difference between the bool filter and the and, or, and not filters.

The bool filter uses the bitset data structure described above (the bitset camp), while the and, or, and not filters cannot take advantage of bitsets (the non-bitset camp).

The and, or, and not filters work document by document. ES loads the field contents of a document and checks whether they satisfy the filter condition. Documents that do not satisfy it are excluded from the result set, and the process repeats until every document has been checked. This process does not use the bitsets described above, so it cannot reuse cached results.

If you have several filter conditions, that is, a single and, or, or not filter containing multiple sub-filters (an array is supported), each sub-filter passes its result set on to the next one in order. In theory the number of documents to process shrinks at every step, because a filter can only remove documents, never add them. It therefore pays to put the most restrictive conditions first, so that later filters have very few documents left to process; this can greatly improve overall speed.

Besides the number of documents, also consider the cost of each filter. Some filters are expensive to execute, such as geo filters (heavy computation) or script-based filters (dynamic scripts). Put these high-overhead filters last to improve overall processing speed.
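The effect of ordering can be seen in a small simulation. This is only a sketch of the chaining logic described above, not Elasticsearch code: `run_and_chain`, the 1000 synthetic documents, and the `status` and `distance_km` fields are assumptions made for illustration, with the distance check standing in for an expensive geo or script filter.

```python
def run_and_chain(docs, filters):
    """Apply filters in order, handing each filter only the survivors of the previous one.

    Returns the surviving documents and, per filter, how many documents it had to check.
    """
    calls = []
    survivors = list(docs)
    for name, predicate in filters:
        calls.append((name, len(survivors)))
        survivors = [d for d in survivors if predicate(d)]
    return survivors, calls


# Hypothetical data: 10 of 1000 documents are "active".
docs = [
    {"id": i, "status": "active" if i % 100 == 0 else "idle", "distance_km": i}
    for i in range(1000)
]
cheap = ("status", lambda d: d["status"] == "active")    # restrictive and cheap
costly = ("distance", lambda d: d["distance_km"] < 500)  # stands in for a costly filter

good, good_calls = run_and_chain(docs, [cheap, costly])
bad, bad_calls = run_and_chain(docs, [costly, cheap])

print(good_calls)  # [('status', 1000), ('distance', 10)]
print(bad_calls)   # [('distance', 1000), ('status', 500)]
print([d["id"] for d in good])  # [0, 100, 200, 300, 400]
```

Both orders return the same documents, but with the restrictive filter first the costly check runs on 10 documents instead of 1000.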

By now the picture should be clear: and, or, and not work document by document. If your result set is large, that is, the query is loose and matches many documents, the and, or, and not filters are a poor fit. However, some filters have to work document by document anyway, namely:

    • geo* filters
    • script filters
    • numeric_range filters

Apart from these few, which cannot use bitsets, every other filter should be combined with the bool filter to improve query performance.

If a query needs both bitset and non-bitset filters, you can combine the bool filter with the and/or/not filters.

As mentioned earlier, and passes its result set from one filter to the next, so put the fast bitset filters at the front of the and filter and the non-bitset filters at the back. Here is a complex filter that mixes several filter types:

    {
      "and": [
        {
          "bool": {
            "must": [
              { ... },
              { ... },
              { ... }
            ]
          }
        },
        {
          "or": [
            { ... },
            { ... }
          ]
        }
      ]
    }

The outermost wrapper is an and filter. Its first sub-filter is a bool filter with 3 must clauses; once those are processed, the resulting document set is passed to the or filter, whose two sub-filters are evaluated separately. The final document set is the search result.
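As a rough illustration of that nesting, the following Python sketch builds such a filter as a dictionary and prints it as JSON. It assumes the legacy filter DSL this article describes (the and, or, and not filters and the filtered query belong to older Elasticsearch releases). All field names and values (`status`, `lang`, `year`, `location`, `votes`) are hypothetical, as is the choice of which leaf filters go where; the original example does not specify them.

```python
import json


def build_mixed_filter():
    """Build an and filter: bitset-friendly bool part first, non-bitset or part last."""
    # Bitset filters (term, range) go into bool/must. Field names are made up.
    bool_part = {
        "bool": {
            "must": [
                {"term": {"status": "published"}},
                {"term": {"lang": "en"}},
                {"range": {"year": {"gte": 2012, "lte": 2013}}},
            ]
        }
    }
    # Non-bitset filters (geo, script) are wrapped in or and placed at the back.
    or_part = {
        "or": [
            {"geo_distance": {"distance": "10km", "location": {"lat": 40.0, "lon": -70.0}}},
            {"script": {"script": "doc['votes'].value > 10"}},
        ]
    }
    return {"and": [bool_part, or_part]}


# Wrap the filter in a filtered query, the usual container in the legacy DSL.
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": build_mixed_filter(),
        }
    }
}
print(json.dumps(query, indent=2))
```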

In summary, when using filters, prefer bitset-based filters first, then think about filter order and how the filters are combined:

    • geo, script, or numeric_range filters: use the and/or/not filters
    • everything else: use the bool filter
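The rule of thumb above could be captured in a small helper like the one below. This is a hypothetical sketch, not part of any Elasticsearch client: `is_bitset_filter` and `combine` are invented names, each filter is assumed to be a dictionary with exactly one top-level key naming its type, and the example field names are made up.

```python
# Filter types that, per the list above, must run document by document.
NON_BITSET_TYPES = {"script", "numeric_range"}


def is_bitset_filter(f):
    """True when the filter type can use a bitset (assumes one top-level key)."""
    (filter_type,) = f.keys()
    return not (filter_type.startswith("geo_") or filter_type in NON_BITSET_TYPES)


def combine(filters):
    """Put bitset filters in a bool filter and chain non-bitset filters after it."""
    bitset = [f for f in filters if is_bitset_filter(f)]
    other = [f for f in filters if not is_bitset_filter(f)]
    if not other:
        return {"bool": {"must": bitset}}
    if not bitset:
        return {"and": other}
    # bool goes first so the expensive filters see as few documents as possible
    return {"and": [{"bool": {"must": bitset}}] + other}


term = {"term": {"status": "published"}}
geo = {"geo_distance": {"distance": "10km", "location": {"lat": 40.0, "lon": -70.0}}}
rng = {"range": {"year": {"gte": 2012}}}

print(combine([term, geo, rng]))
```

Whatever order the filters are passed in, the bitset filters end up inside the bool filter at the front of the and filter.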

Once you have mastered the points above, writing high-performance queries is not difficult.

This article originates from: http://log.medcl.net/item/2013/09/elasticsearch-inside-the-various-filter/
