Elasticsearch Filters Characteristics

Source: Internet
Author: User

Optimizing Queries with Filters

Elasticsearch supports many different types of queries, which you are probably already familiar with. However, queries are not the only way to decide which documents match and which are returned to the user. Most queries in the Elasticsearch Query DSL have a filter counterpart, and these filters can be wrapped in the following query types:

    • constant_score
    • filtered
    • custom_filters_score

So why bother with filters at all, and in which scenarios are queries alone enough? The sections below try to answer these questions.

Filters and caches

First, as you may expect, filters are a good candidate for caching, and Elasticsearch provides a dedicated cache, the filter cache, to store the result sets of filters. Cached filters do not need much memory (the cache keeps only one piece of information: which documents match the filter), and they can be reused by other queries, which greatly improves query performance. Imagine that you are running the following query:

{
    "query" : {
        "bool" : {
            "must" : [
                { "term" : { "name" : "joe" } },
                { "term" : { "year" : 1981 } }
            ]
        }
    }
}

The query returns documents whose name field has the value joe and whose year field has the value 1981. It is a very simple query, but if it were used to look up football players, it would return all the athletes with the given name and year of birth.

If you build the query this way, the query object caches all the conditions bound together, so if we later search for athletes with the same name but a different year of birth, Elasticsearch cannot reuse anything from the query above. So let's try to optimize it. Because 1000 people may have 1000 different names, names are not well suited to caching, but the year is (the year field usually has far fewer distinct values, right?). So we introduce a different query that combines a simple query with a filter:

{
    "query" : {
        "filtered" : {
            "query" : {
                "term" : { "name" : "joe" }
            },
            "filter" : {
                "term" : { "year" : 1981 }
            }
        }
    }
}

We used a query of the filtered type, which contains both a query element and a filter element. After the query runs for the first time, Elasticsearch caches the filter, and any other query that uses the same filter reads the result directly from the cache. This way Elasticsearch does not have to load the same information multiple times.
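To make the reuse concrete, here is a toy model of a filter cache in Python. It is only a sketch of the idea described above, not Elasticsearch code: the DOCS list, the filter_cache dictionary, and the function names are all hypothetical, and a real filter cache stores a compact bit set per index segment rather than a Python set.

```python
# Toy model of a filter cache. This is NOT how Elasticsearch is implemented;
# every name here is made up for illustration.
DOCS = [
    {"name": "joe", "year": 1981},
    {"name": "joe", "year": 1975},
    {"name": "ann", "year": 1981},
    {"name": "bob", "year": 1990},
]

filter_cache = {}  # (field, value) -> frozenset of matching document positions
filter_runs = 0    # how many times a filter had to be evaluated from scratch


def term_filter(field, value):
    """Return positions of documents where field == value, using the cache."""
    global filter_runs
    key = (field, value)
    if key not in filter_cache:
        filter_runs += 1
        filter_cache[key] = frozenset(
            i for i, doc in enumerate(DOCS) if doc.get(field) == value
        )
    return filter_cache[key]


def filtered_search(name, year):
    """Apply the cached year filter, then match the name on the survivors."""
    allowed = term_filter("year", year)
    return [DOCS[i] for i in sorted(allowed) if DOCS[i]["name"] == name]


joes = filtered_search("joe", 1981)
anns = filtered_search("ann", 1981)  # different name, same year filter
print(joes, anns, filter_runs)
```

The second search asks for a different name, yet the year filter is evaluated only once, which is the effect the filtered query is meant to achieve.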

Not all filters will be cached by default.

The cache is powerful, but Elasticsearch does not cache all filters by default. This is because some filters use the field data cache, which is typically used for sorting by field values and for faceting. By default, the following filters are not cached:

    • numeric_range
    • script
    • geo_bbox
    • geo_distance
    • geo_distance_range
    • geo_polygon
    • geo_shape
    • and
    • or
    • not

The last three filters in the list do not use the field data cache, but because they are primarily used to combine other filters, they are not cached themselves; the filters they wrap are still cached when they are used.
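The default behaviour described above can be summarised in a few lines of Python. This is only a sketch under simplifying assumptions: cached_by_default is a hypothetical helper, it accepts and/or filters written as a plain list and a not filter that wraps its inner filter directly, and it only reflects the list given in this article.

```python
# Filter types the article lists as not cached by default.
NOT_CACHED_BY_DEFAULT = {
    "numeric_range", "script", "geo_bbox", "geo_distance",
    "geo_distance_range", "geo_polygon", "geo_shape", "and", "or", "not",
}
WRAPPERS = {"and", "or", "not"}  # filters that only combine other filters


def cached_by_default(filter_body):
    """Return the types of the filters inside filter_body that would be cached."""
    filter_type, params = next(iter(filter_body.items()))
    if filter_type in WRAPPERS:
        # The wrapper itself is not cached, but its inner filters may be.
        children = params if isinstance(params, list) else [params]
        found = []
        for child in children:
            found.extend(cached_by_default(child))
        return found
    return [] if filter_type in NOT_CACHED_BY_DEFAULT else [filter_type]


combined = {"and": [
    {"term": {"year": 1981}},
    {"not": {"term": {"name": "joe"}}},
    {"script": {"script": "doc['year'].value > 1980"}},
]}
print(cached_by_default(combined))
```

For the combined filter, only the two term filters are reported; the and, not, and script filters are skipped.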

Changing the behavior of the Elasticsearch cache

Elasticsearch lets users turn filter caching on or off and name cache entries through the _cache and _cache_key properties. Back to the previous example: assuming that we want to cache the result of the term filter under the cache key year_1981_cache, the query looks like this:

{
    "query" : {
        "filtered" : {
            "query" : {
                "term" : { "name" : "joe" }
            },
            "filter" : {
                "term" : {
                    "year" : 1981,
                    "_cache_key" : "year_1981_cache"
                }
            }
        }
    }
}

You can also turn off caching for the term filter with the following query:

{
    "query" : {
        "filtered" : {
            "query" : {
                "term" : { "name" : "joe" }
            },
            "filter" : {
                "term" : {
                    "year" : 1981,
                    "_cache" : false
                }
            }
        }
    }
}

Why bother to name the cache key?

In other words, is it really necessary to go to the trouble of setting the _cache_key property? Can't Elasticsearch handle this on its own? Of course it can, and it manages the cache when necessary, but sometimes we need more control. For example, some queries are rarely reused, and we want to clear their cache entries periodically. If you do not specify _cache_key, you can only clear the entire filter cache; with a key, you only need to execute the following command to clear that specific entry:

'localhost:9200/users/_cache/clear?filter_keys=year_1981_cache'

When should I change the behavior of the Elasticsearch filter cache?

Of course, there are times when the user knows the business requirements better and should not leave it to Elasticsearch to guess how the data is distributed. For example, suppose you want to limit a query to a set of locations with the geo_distance filter and you use that filter with the same parameter values in many requests, or the same script is used many times with the script filter. In these scenarios it is worth turning on caching for the filter. Always ask yourself: "Will the filter be reused multiple times?" Adding data to the cache consumes machine resources, and users should avoid wasting them unnecessarily.
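As an illustration, the Python sketch below adds the _cache and _cache_key properties to a filter body before it is sent. The helper with_cache_options is hypothetical, and the location field name, the distance, and the coordinates of the geo_distance filter are assumptions chosen only for the example.

```python
import copy
import json


def with_cache_options(filter_body, cache=None, cache_key=None):
    """Return a copy of a single-filter body with _cache / _cache_key set."""
    result = copy.deepcopy(filter_body)  # leave the caller's dictionary untouched
    filter_type = next(iter(result))
    if cache is not None:
        result[filter_type]["_cache"] = cache
    if cache_key is not None:
        result[filter_type]["_cache_key"] = cache_key
    return result


# Assumed example values: "location" is a made-up geo_point field name.
near_office = {
    "geo_distance": {
        "distance": "10km",
        "location": {"lat": 52.52, "lon": 13.40},
    }
}

# geo_distance is not cached by default, so caching is switched on explicitly.
cached_geo = with_cache_options(near_office, cache=True)

# The term filter from the earlier example, stored under a named cache key.
year_filter = with_cache_options(
    {"term": {"year": 1981}}, cache_key="year_1981_cache"
)

print(json.dumps(cached_geo, indent=4))
print(json.dumps(year_filter, indent=4))
```

Keeping the cache settings in one helper makes it easy to see, and to review, which filters were deliberately opted in to caching.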

Terms lookup filter

Caching and standard queries are not the whole story. With Elasticsearch 0.90 we got a clever filter that fetches multiple terms from a document stored in Elasticsearch and uses them as parameters of the query (similar to the SQL IN operator).

Let's look at a simple example. Suppose we have an online bookstore that stores information about its books and about its customers, including the books each customer has bought. The books index is simple (its mappings are stored in the books.json file):

{
    "mappings" : {
        "book" : {
            "properties" : {
                "id" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" },
                "title" : { "type" : "string", "store" : "yes", "index" : "analyzed" }
            }
        }
    }
}

Nothing unusual in the code above: just the ID and the title of the book. Next, let's take a look at the clients.json file, which stores the mappings of the clients index:

{
    "mappings" : {
        "client" : {
            "properties" : {
                "id" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" },
                "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
                "books" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" }
            }
        }
    }
}

The index defines the customer ID, the name, and the list of IDs of the books the customer has purchased. In addition, we need some sample data:

curl -XPUT 'localhost:9200/clients/client/1' -d '{ "id":"1", "name":"Joe Doe", "books":["1","3"]}'
curl -XPUT 'localhost:9200/clients/client/2' -d '{ "id":"2", "name":"Jane Doe", "books":["3"]}'
curl -XPUT 'localhost:9200/books/book/1' -d '{ "id":"1", "title":"Test book one"}'
curl -XPUT 'localhost:9200/books/book/2' -d '{ "id":"2", "title":"Test book two"}'
curl -XPUT 'localhost:9200/books/book/3' -d '{ "id":"3", "title":"Test book three"}'

Next, imagine the following requirement: we want to show all the books that a given customer has purchased, taking the customer with ID 1 as an example. Of course, we could first run curl -XGET 'localhost:9200/clients/client/1' to get the customer's record, then take the values of the books field and run a second query:

curl -XGET 'localhost:9200/books/_search' -d '{
    "query" : {
        "ids" : {
            "type" : "book",
            "values" : [ "1", "3" ]
        }
    }
}'
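In application code, the two-step approach above means building the second request from the result of the first. The Python sketch below shows only that glue step; ids_query_for_client is a hypothetical helper, and client_1 stands in for the _source of the document that the first request would return.

```python
import json


def ids_query_for_client(client_source, doc_type="book"):
    """Build the body of the second request from the fetched client document."""
    return {
        "query": {
            "ids": {
                "type": doc_type,
                "values": list(client_source["books"]),
            }
        }
    }


# Stand-in for the _source returned by GET localhost:9200/clients/client/1.
client_1 = {"id": "1", "name": "Joe Doe", "books": ["1", "3"]}

second_request = ids_query_for_client(client_1)
print(json.dumps(second_request, indent=4))
```

The application has to make one round trip to fetch the customer and another to fetch the books, and it has to carry the list of IDs between them.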

This is too much trouble. Elasticsearch 0.90 introduced the terms lookup filter, which needs only one request to do what the two requests above do. The query using this filter is as follows:

curl -XGET 'localhost:9200/books/_search' -d '{
    "query" : {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "terms" : {
                    "id" : {
                        "index" : "clients",
                        "type" : "client",
                        "id" : "1",
                        "path" : "books"
                    },
                    "_cache_key" : "terms_lookup_client_1_books"
                }
            }
        }
    }
}'

Note the value of the _cache_key parameter: terms_lookup_client_1_books contains the customer ID. If you set the same _cache_key for different queries, you will get unpredictable results, because Elasticsearch stores the filter result under the specified key and then reuses it across queries. Next, look at the response to the query above:

{
    ...
    "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [ {
            "_index" : "books",
            "_type" : "book",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : { "id" : "1", "title" : "Test book one" }
        }, {
            "_index" : "books",
            "_type" : "book",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : { "id" : "3", "title" : "Test book three" }
        } ]
    }
}

That is exactly the result we wanted to see.

How the terms filter works

Look again at the query that we sent to Elasticsearch. As you can see, it is just a simple filtered query that contains a match_all query and a terms filter. The difference is that the terms filter is used in a different way here: instead of listing the term values explicitly, it loads them dynamically from another index.

As you can see, our filter is based on the id field, because the id field alone is enough to retrieve all the other attributes of a book. Next, focus on the new properties inside the id field: index, type, id, and path. The index property names the index from which the terms are loaded (in this case, the clients index). The type property tells Elasticsearch the document type we are interested in (in this case, the client type). The id property identifies the document of that type in that index. Finally, the path property tells Elasticsearch which field the terms should be loaded from, in this case the books field of the client document.

To summarize, Elasticsearch loads the terms from the books field of the document with ID 1 of type client in the clients index. The loaded values are then used by the terms filter to filter the documents of the books index (the index the request is sent to): a document matches when the value of its id field (the field the terms filter is named after) is among the loaded terms.
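The lookup steps can be mimicked with plain dictionaries. The Python sketch below is only a model of the behaviour described above, not Elasticsearch code: INDICES is an in-memory stand-in for the two indexes holding the sample data, and terms_lookup is a hypothetical function whose parameters mirror the index, type, id, and path properties.

```python
# In-memory stand-in for the clients and books indexes (index -> type -> id).
INDICES = {
    "clients": {"client": {
        "1": {"id": "1", "name": "Joe Doe", "books": ["1", "3"]},
        "2": {"id": "2", "name": "Jane Doe", "books": ["3"]},
    }},
    "books": {"book": {
        "1": {"id": "1", "title": "Test book one"},
        "2": {"id": "2", "title": "Test book two"},
        "3": {"id": "3", "title": "Test book three"},
    }},
}


def terms_lookup(field, index, doc_type, doc_id, path, target_index="books"):
    """Filter target_index by the terms stored in another document."""
    # Step 1: load the document that holds the terms.
    source_doc = INDICES[index][doc_type][doc_id]
    # Step 2: read the terms from the field named by path.
    terms = source_doc[path]
    terms = set(terms if isinstance(terms, list) else [terms])
    # Step 3: keep the target documents whose field value is among the terms.
    return [
        doc
        for docs_of_type in INDICES[target_index].values()
        for doc in docs_of_type.values()
        if doc.get(field) in terms
    ]


bought_by_1 = terms_lookup("id", "clients", "client", "1", "books")
bought_by_2 = terms_lookup("id", "clients", "client", "2", "books")
print([doc["title"] for doc in bought_by_1])
print([doc["title"] for doc in bought_by_2])
```

For customer 1 the model returns the same two books as the response shown earlier; for customer 2 it returns only the third book.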
