A Brief Introduction to Mapping in Elasticsearch


Our project is about to start using Elasticsearch, so the first step was to get its concepts straight. I found this article online; it explains the concept of mapping in plain language.


Default Mapping

Elasticsearch (hereinafter ES) is schemaless. When we execute the following command:

curl -XPUT http://localhost:9200/test/item/1 -d '{"name": "Zach", "description": "A pretty cool guy."}'

ES is smart enough to recognize that the "name" and "description" fields are strings, and by default it creates the following mapping:

mappings: {
    item: {
        properties: {
            description: {
                type: string
            },
            name: {
                type: string
            }
        }
    }
}
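
The type guessing described above can be illustrated with a toy sketch. This is not how ES implements dynamic mapping; the function names `guess_field_type` and `guess_mapping` are hypothetical, and the rules are simplified to a few JSON value types.

```python
import json


def guess_field_type(value):
    """Return a simplified mapping type name for a JSON value (toy rules)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(type_name, document):
    """Build a mapping-like dict for a single document of the given type."""
    properties = {
        field: {"type": guess_field_type(value)}
        for field, value in document.items()
    }
    return {"mappings": {type_name: {"properties": properties}}}


doc = json.loads('{"name": "Zach", "description": "A pretty cool guy."}')
mapping = guess_mapping("item", doc)
print(json.dumps(mapping, indent=4, sort_keys=True))
```

Both fields hold JSON strings, so both are guessed as string, which matches the mapping shown above.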

What is mapping

A mapping in ES is very similar to a data type in a statically typed language: once a variable is declared as int, it can only hold int values. Likewise, a field mapped as a number can only store numeric data.

A mapping means more than a language's data type, though. It not only tells ES what type of value a field holds, it also tells ES how to index the data and whether the data can be searched.

When a query does not return the data you expect, there is a good chance the mapping is the problem. When in doubt, check your mapping directly.
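
One way to check a mapping is to request the index's `_mapping` endpoint. The sketch below assumes a node at http://localhost:9200 and the "test" index from the example; the helper names `mapping_url` and `fetch_mapping` are made up for illustration.

```python
import json
import urllib.request


def mapping_url(base_url, index):
    """Build the URL of the _mapping endpoint for one index."""
    return f"{base_url.rstrip('/')}/{index}/_mapping"


def fetch_mapping(base_url, index, timeout=5.0):
    """Request the current mapping and return it as a dict.

    Needs a reachable ES node; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(mapping_url(base_url, index), timeout=timeout) as response:
        return json.load(response)
```

With a running node, `fetch_mapping("http://localhost:9200", "test")` should return the mapping ES created for the index, which can then be compared with what the query expects.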


Analysis of mapping

A mapping consists of one or more analyzers, and an analyzer is composed of one or more filters. When ES indexes a document, it passes the contents of each field to the corresponding analyzer, which in turn passes them through its filters.

What a filter does is easy to understand: a filter is a function that transforms data. It takes a string as input and returns another string. A function that converts a string to lowercase is a good example of a filter.

An analyzer is an ordered sequence of filters. Analysis runs the filters one after another in that order, and ES stores and indexes the final result.

In summary, the role of mapping is to execute a series of instructions to turn the input data into searchable index entries.
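
The idea of an analyzer as an ordered chain of string-to-string filters can be sketched in a few lines. This is only a conceptual model, not ES code; `make_analyzer`, `lowercase` and `strip_punctuation` are hypothetical names.

```python
from functools import reduce


def lowercase(text):
    """Filter: convert the string to lowercase."""
    return text.lower()


def strip_punctuation(text):
    """Filter: keep only letters, digits and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def make_analyzer(*filters):
    """Return a function that applies the filters in the order given."""
    def analyze(text):
        return reduce(lambda value, apply_filter: apply_filter(value), filters, text)
    return analyze


analyzer = make_analyzer(strip_punctuation, lowercase)
print(analyzer("A pretty cool guy."))  # a pretty cool guy
```

Each filter sees the output of the previous one, so changing the order or the set of filters changes what ends up in the index.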


Default Analyzer

Back to our example: ES guessed that the description field is a string, and by default a string mapping uses the global default analyzer. The default analyzer is the Standard Analyzer, which has three filters: the standard token filter, the lowercase filter, and the stop token filter.

We can add the _analyze keyword to a request to see the analysis process. Use the following command to see how the value of the description field is converted:

curl -X GET "http://localhost:9200/test/_analyze?analyzer=standard&pretty=true" -d "A pretty cool guy."
 
{
  "tokens": [{
    "token": "pretty",
    "start_offset": 2,
    "end_offset": 8,
    "type": "<ALPHANUM>",
    "position": 2
  }, {
    "token": "cool",
    "start_offset": 9,
    "end_offset": 13,
    "type": "<ALPHANUM>",
    "position": 3
  }, {
    "token": "guy",
    "start_offset": 14,
    "end_offset": 17,
    "type": "<ALPHANUM>",
    "position": 4
  }]
}

As you can see, the value of our description field is converted to the tokens [pretty], [cool], and [guy]. During the conversion the filters removed the uppercase "A" and the punctuation, and every token was lowercased. The important point is that although ES still stores the complete original data, only these three terms are searchable; everything else is discarded.
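
The token list can be reproduced with a small simulation. This is a rough approximation of the analysis result, not the Standard Analyzer itself: the tokenizer is a plain `\w+` regular expression, the stop word list is an abbreviated, assumed one, and positions are counted from 1 to match the output shown for this example.

```python
import re

# Abbreviated, illustrative stop word list (an assumption, not ES's real list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def analyze(text):
    """Split text into lowercase tokens with offsets, dropping stop words."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # the token is dropped, but the position counter still advances
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in analyze("A pretty cool guy."):
    print(token)
```

The word "A" occupies position 1 and is removed as a stop word, which is why the first surviving token, "pretty", has position 2.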

Now look at the result of a search for the word "a":

curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{
    "query": {
        "text": {"description": "a"}
    }
}'
 
{
  "took": ...,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

A text query runs the search terms through the same analysis and filtering that was used at index time, so searching for "a" returns nothing: the word "a" was never indexed by ES. Conversely, if we search for the word "cool":

curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{
    "query": {
        "text": {"description": "cool"}
    }
}'
 
{
  "took": ...,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [{
      "_index": "test",
      "_type": "item",
      "_id": "1",
      "_score": 0.15342641, "_source": {"name": "Zach", "description": "A pretty cool guy."}
    }]
  }
}

This time we get the right result. It is admittedly a very simple example, but it shows how ES works: do not think of a mapping as a data type, think of it as a set of instructions for how the data is searched. If you do not want the word "a" to be dropped, you need to modify your analyzer.
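
The two searches, and the effect of changing the analyzer, can be imitated with a toy inverted index. It is a conceptual sketch only: `TinyIndex` is a hypothetical class, the stop word list is an assumed subset, and scoring is ignored entirely.

```python
import re
from collections import defaultdict

STOP_WORDS = frozenset({"a", "an", "the"})  # illustrative subset, an assumption


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase the words of the text and drop the stop words."""
    words = (word.lower() for word in re.findall(r"\w+", text))
    return [word for word in words if word not in stop_words]


class TinyIndex:
    """Toy inverted index that analyzes documents and queries the same way."""

    def __init__(self, stop_words=STOP_WORDS):
        self.stop_words = stop_words
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.sources = {}                 # id -> complete original text

    def index(self, doc_id, text):
        self.sources[doc_id] = text  # the full source is stored unchanged
        for term in analyze(text, self.stop_words):
            self.postings[term].add(doc_id)

    def search(self, query):
        hits = set()
        for term in analyze(query, self.stop_words):
            hits |= self.postings.get(term, set())
        return sorted(hits)


index = TinyIndex()
index.index("1", "A pretty cool guy.")
print(index.search("a"))     # []
print(index.search("Cool"))  # ['1']

# A different "analyzer": no stop words, so "a" stays searchable.
keep_all = TinyIndex(stop_words=frozenset())
keep_all.index("1", "A pretty cool guy.")
print(keep_all.search("a"))  # ['1']
```

The stored source is identical in both indexes; only the set of indexed terms differs, and that alone decides whether the query for "a" matches.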

Original: http://euphonious-intuition.com/2012/07/an-introduction-to-mapping-in-elasticsearch/

