[Elasticsearch 2.x] How filters work

1. Changes in Elasticsearch 2.0

1.1 Queries and filters merged

Queries and filters have been merged: all filter clauses are now query clauses. Instead, query clauses can now be used in either query context or filter context:

Query context

Queries used in query context calculate a relevance score and are not cached. Query context is used whenever filter context does not apply.

Filter context

Queries used in filter context do not calculate a relevance score and can be cached. Filter context is introduced by:

  • the constant_score query
  • the must_not and (newly added) filter parameters of the bool query
  • the filter and filters parameters of the function_score query
  • any API called filter, such as the post_filter search parameter, or in aggregations or index aliases

1.2 or and and are implemented via bool

Previously, the or and and filters had a different execution mode from the bool filter. It used to be important to use and/or with certain filter clauses, and bool with others.
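The relationship between the legacy and/or filters and bool clauses can be sketched as a small rewrite. This is only an illustration, not the Elasticsearch implementation: it assumes the list form of the legacy filters, maps and to must and or to should, and uses a hypothetical tag field.

```python
def rewrite_and_or(filter_clause):
    """Recursively rewrite legacy `and`/`or` filters into `bool` clauses.

    Assumption: the legacy filters are given in list form, e.g.
    {"and": [f1, f2]}. Any other clause is returned unchanged.
    """
    if not isinstance(filter_clause, dict):
        return filter_clause
    rewritten = {}
    for name, body in filter_clause.items():
        if name in ("and", "or") and isinstance(body, list):
            # `and` requires every sub-filter, `or` requires at least one.
            occur = "must" if name == "and" else "should"
            rewritten["bool"] = {occur: [rewrite_and_or(sub) for sub in body]}
        else:
            rewritten[name] = body
    return rewritten


legacy = {
    "and": [
        {"term": {"status": "published"}},
        {"or": [{"term": {"tag": "a"}}, {"term": {"tag": "b"}}]},
    ]
}
print(rewrite_and_or(legacy))
```

Writing bool directly avoids relying on this rewrite at all.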

This distinction has been removed: the bool query is now smart enough to handle both cases well. As a result of this change, the or and and filters are now syntactic sugar that is executed internally as a bool query. These filters will be removed in the future.

1.3 filtered query and query filter deprecated

The query filter is deprecated and no longer needed: all queries can be used in query or filter context.

The filtered query is deprecated. A filtered query such as the following:

GET _search
{
  "query": {
    "filtered": {
      "query":  { "match": { "text": "quick brown fox" } },
      "filter": { "term":  { "status": "published" } }
    }
  }
}

should be converted by moving the query and the filter to the must and filter parameters of a bool query:

GET _search
{
  "query": {
    "bool": {
      "must":   { "match": { "text": "quick brown fox" } },
      "filter": { "term":  { "status": "published" } }
    }
  }
}
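For request bodies that are built in application code, the same conversion can be sketched as a small helper. This is a minimal sketch, not an official migration tool: the function name filtered_to_bool is hypothetical, and it only handles a filtered query at the top level of the request body.

```python
import copy


def filtered_to_bool(body):
    """Return a copy of a search body whose top-level `filtered` query
    is rewritten as a `bool` query with `must` and `filter` parameters."""
    body = copy.deepcopy(body)  # leave the caller's dict untouched
    query = body.get("query", {})
    if "filtered" not in query:
        return body  # nothing to convert
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        # The scoring part moves to `must`.
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        # The non-scoring part moves to `filter`.
        bool_query["filter"] = filtered["filter"]
    body["query"] = {"bool": bool_query}
    return body


old_body = {
    "query": {
        "filtered": {
            "query": {"match": {"text": "quick brown fox"}},
            "filter": {"term": {"status": "published"}},
        }
    }
}
new_body = filtered_to_bool(old_body)
print(new_body)
```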
1.4 Filter Automatic Caching

Previously, you could control which filters were cached with the _cache option and provide a custom _cache_key. These options are deprecated and, if present, are ignored.

Query clauses used in filter context are now cached automatically. The algorithm takes into account the frequency of use, the cost of executing the query, and the cost of building the filter.

The terms filter lookup mechanism no longer caches document content; it now relies on the filesystem cache. If the lookup index is not too large, it is recommended to replicate it to all nodes by setting index.auto_expand_replicas: 0-all, which eliminates network overhead.

1.5 Java API query and filter refactoring

org.elasticsearch.index.query.FilterBuilders has been removed in Elasticsearch 2.0 as part of the merge of queries and filters. The filters are now available as methods with the same names in QueryBuilders. All methods that used to accept a FilterBuilder now accept a QueryBuilder.

Previous usage:

// minAge and maxAge are placeholder bounds.
FilterBuilders.boolFilter()
    .must(FilterBuilders.termFilter("name", "John"))
    .mustNot(FilterBuilders.rangeFilter("age").from(minAge).to(maxAge))
    .should(FilterBuilders.termFilter("City", "Beijing"));

The same filter can now be written as follows:

// minAge and maxAge are placeholder bounds.
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.termQuery("name", "John"));
boolQueryBuilder.mustNot(QueryBuilders.rangeQuery("age").from(minAge).to(maxAge));
boolQueryBuilder.should(QueryBuilders.termQuery("City", "Beijing"));
// Wrapping the bool query in constant_score runs it in filter context.
ConstantScoreQueryBuilder queryBuilder = QueryBuilders.constantScoreQuery(boolQueryBuilder);
2. Understanding filters in depth

When Elasticsearch executes a non-scoring query, it performs several operations internally. Take the following documents as an example:

curl -XPUT 'localhost:9200/my_store/products/1' -d '{
  "price": "…",
  "ProductID": "xhdk-a-1293-#fJ3"
}'

curl -XPUT 'localhost:9200/my_store/products/2' -d '{
  "price": "…",
  "ProductID": "kdke-b-9947-#kL5"
}'

curl -XPUT 'localhost:9200/my_store/products/3' -d '{
  "price": "…",
  "ProductID": "jodl-x-1937-#pV7"
}'

curl -XPUT 'localhost:9200/my_store/products/4' -d '{
  "price": "…",
  "ProductID": "qqpx-r-3956-#aD8"
}'

Suppose we want to find the product whose product ID is xhdk-a-1293-#fJ3:

curl -XGET 'localhost:9200/my_store/products/_search?pretty' -d '
{
    "query": {
        "constant_score": {
            "filter": {
                "term": {
                    "ProductID": "xhdk-a-1293-#fJ3"
                }
            }
        }
    }
}'
2.1 Find matching documents

The term query looks up the term xhdk-a-1293-#fJ3 in the inverted index and retrieves the list of all documents that contain it. In this case, only document 1 has the term we are looking for.

2.2 Build a bitset

The filter then builds a bitset, an array of 1s and 0s that describes which documents contain the term. Matching documents receive a 1 bit. In our example, the bitset is [1,0,0,0], because only document 1 has the term we are looking for. Internally, this is represented as a "roaring bitmap", which can efficiently encode both sparse and dense sets.

2.3 Iterate over the bitset(s)

Once a bitset has been generated for each query, Elasticsearch iterates over the bitsets to find the set of documents that satisfy all of the filter criteria. The order of execution is decided heuristically, but generally the sparsest bitset is iterated first, since it excludes the largest number of documents.

2.4 Increment the usage counter

Elasticsearch can cache non-scoring queries for faster access, but it would be wasteful to cache something that is rarely used. Thanks to the inverted index, non-scoring queries are already quite fast, so we only want to cache queries that we know will be reused later, to avoid wasting resources.

To do this, Elasticsearch tracks the history of query usage on a per-index basis. If a query is used more than a few times in the last 256 queries, it is cached in memory. When the bitset is cached, caching is skipped on segments that have fewer than 10,000 documents (or less than 3% of the total index size). These small segments tend to disappear quickly anyway, so caching them would be a waste.
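The usage tracking described above can be sketched as a toy policy. This is a simplified illustration under stated assumptions, not the actual Elasticsearch or Lucene code: the class and method names are hypothetical, and MIN_USES is an assumed value standing in for "more than a few times". The other constants come from the text.

```python
from collections import Counter, deque

HISTORY_SIZE = 256         # from the text: the last 256 queries are tracked
MIN_USES = 5               # assumption: stands in for "more than a few times"
MIN_SEGMENT_DOCS = 10_000  # from the text
MIN_SEGMENT_RATIO = 0.03   # from the text: 3% of the total index size


class UsageTrackingPolicy:
    """Toy per-index policy deciding whether a filter's bitset is cached."""

    def __init__(self):
        self.history = deque()   # most recent query keys, oldest first
        self.counts = Counter()  # uses of each key within the history

    def record(self, query_key):
        """Remember one use of a query, forgetting the oldest if needed."""
        if len(self.history) == HISTORY_SIZE:
            oldest = self.history.popleft()
            self.counts[oldest] -= 1
        self.history.append(query_key)
        self.counts[query_key] += 1

    def should_cache(self, query_key, segment_docs, index_docs):
        """Cache only frequently used queries on sufficiently large segments."""
        if self.counts[query_key] < MIN_USES:
            return False  # not used often enough recently
        if segment_docs < MIN_SEGMENT_DOCS:
            return False  # segment has too few documents
        if segment_docs < MIN_SEGMENT_RATIO * index_docs:
            return False  # segment is too small relative to the index
        return True
```

A caller would invoke record for every filter it executes and consult should_cache once per segment before storing a bitset.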

Although this is not quite what happens in reality (execution is more complicated, depending on how the query planner rearranges things and on heuristics based on query cost), you can conceptually think of non-scoring queries as executing before scoring queries. The job of non-scoring queries is to reduce the number of documents that the more costly scoring queries have to evaluate, which makes the search faster.

Thinking of non-scoring queries as executing first will help you write efficient and fast search requests.
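The steps in sections 2.1 to 2.3 and the filter-first model can be sketched end to end. This is a conceptual sketch only: it uses plain Python lists instead of roaring bitmaps, a dictionary as a stand-in for the inverted index, zero-based positions for documents 1 to 4, and a hypothetical in_stock field as a second filter.

```python
def build_bitset(inverted_index, field, term, num_docs):
    """Sections 2.1 and 2.2: look up the term, then mark matching documents."""
    bitset = [0] * num_docs
    for doc_pos in inverted_index.get((field, term), []):
        bitset[doc_pos] = 1
    return bitset


def intersect(bitsets):
    """Section 2.3: walk the sparsest bitset first, then check the others."""
    ordered = sorted(bitsets, key=sum)  # fewest set bits first
    sparsest, rest = ordered[0], ordered[1:]
    return [
        doc_pos
        for doc_pos, bit in enumerate(sparsest)
        if bit and all(other[doc_pos] for other in rest)
    ]


def search(inverted_index, num_docs, filters, score):
    """Apply every filter first, then score only the surviving documents."""
    bitsets = [build_bitset(inverted_index, f, t, num_docs) for f, t in filters]
    candidates = intersect(bitsets)
    return sorted(((score(pos), pos) for pos in candidates), reverse=True)


index = {
    ("ProductID", "xhdk-a-1293-#fJ3"): [0],  # document 1 is at position 0
    ("in_stock", "true"): [0, 1, 3],         # hypothetical second filter
}
print(build_bitset(index, "ProductID", "xhdk-a-1293-#fJ3", 4))  # [1, 0, 0, 0]
```

Because the filters run first, the score function is called only for positions that survive every filter, which is the saving the text describes.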

Original: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html#_internal_filter_operation
