Indexing data in Elasticsearch

Source: Internet
Author: User

For a tool that provides full-text search, indexing is the critical step: only through an index operation is the data analyzed and stored and an inverted index built, which is what lets users query for relevant information.

This article describes the index operation in Elasticsearch (ES).

For related material, see: Elasticsearch Data Summary

Index operations

The simplest usage is to specify the index, the type, and the ID (here "index" is the noun, the name of the index, as distinct from the verb "to index"), as in the following example:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

This stores the document with ID 1 under the tweet type in the twitter index.

The result of the index operation is:

{
    "_shards" : {
        "total" : 10,
        "failed" : 0,
        "successful" : 10
    },
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}

The _shards section describes the shards involved: here there are 10 shards in total (5 primary shards and 5 replica shards, all of them available). The rest of the response carries the index, type, ID, and version.

Automatically create indexes

If the twitter index does not exist in ES before the operation above, it is created automatically by default, and the type is created automatically as well. That is, ES does not require a table structure to be defined in advance, as a traditional database does.

Each type in an index has a mapping, which is generated dynamically, so when a new field appears, its mapping is added automatically.

You can turn off automatic index creation by setting action.auto_create_index to false in the configuration file.

Automatic index creation can also be restricted with a blacklist or whitelist of name patterns. For example:

Set action.auto_create_index to +aaa*,-bbb*. The '+' allows automatic creation of indexes whose names start with aaa, and the '-' forbids automatic creation of indexes whose names start with bbb.
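The pattern list can be read as a small matching procedure. The sketch below is a hypothetical Python helper, not ES code. It assumes that patterns are checked left to right, that the first matching pattern decides, and that a name matching no pattern is rejected; those evaluation rules are assumptions, so check them against the documentation for your ES version.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name: str, setting: str) -> bool:
    """Return True if `setting` would allow `index_name` to be auto-created.

    Assumed semantics (not taken from the article): patterns are checked
    left to right, the first match decides, and no match means "reject".
    """
    normalized = setting.strip().lower()
    if normalized in ("true", "false"):
        # Plain on/off switch, as described earlier in the article.
        return normalized == "true"
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allow = not pattern.startswith("-")   # '-' forbids, '+' (or nothing) allows
        glob = pattern.lstrip("+-")           # drop the sign, keep the wildcard pattern
        if fnmatchcase(index_name, glob):
            return allow
    return False


if __name__ == "__main__":
    for name in ("aaa-logs", "bbb-logs", "ccc-logs"):
        print(name, may_auto_create(name, "+aaa*,-bbb*"))
```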

About version numbers

The version number tracks the state of a document, and ES only operates on the copy of a document with the highest version number.

Version numbers can be maintained by ES itself or supplied from an external source; see the official documentation for details.
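As a rough illustration of what "operating on the highest version" means, here is a toy in-memory model in Python. It is not how ES is implemented. TinyStore, VersionConflict, and the expected_version argument are hypothetical names, and the model only assumes that each successful write increments the version and that a write carrying a stale version is rejected.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is no longer current."""


class TinyStore:
    """Toy stand-in for an index: maps document ID -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source, expected_version=None):
        current_version, _ = self._docs.get(doc_id, (0, None))
        # A caller that names a version only succeeds if it is the latest one.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"document {doc_id!r} is at version {current_version}, "
                f"not {expected_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


if __name__ == "__main__":
    store = TinyStore()
    print(store.index("1", {"user": "kimchy"}))                      # first write
    print(store.index("1", {"user": "kimchy"}, expected_version=1))  # update on latest
```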

Operation type (op_type)

ES provides "put-if-absent" behavior through the op_type parameter: with op_type=create, the document is indexed if it does not already exist in ES, and an error is returned if it does.

For example, if a document with ID 1 already exists, a PUT to /twitter/tweet/1?op_type=create returns an error. Using the _create API directly (a PUT to /twitter/tweet/1/_create) has the same effect.
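The two request forms can be written out as follows. This is a minimal Python sketch that only builds the requests and does not send them. The create_only_request helper is a hypothetical name, and the base URL assumes the local node used in the article's other examples.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the article's examples


def create_only_request(index, doc_type, doc_id, source, use_create_endpoint=False):
    """Build a PUT request that should fail if the document already exists."""
    path = f"/{index}/{doc_type}/{doc_id}"
    # Either form asks ES for "create only" semantics.
    path += "/_create" if use_create_endpoint else "?op_type=create"
    body = json.dumps(source).encode("utf-8")
    return Request(BASE_URL + path, data=body, method="PUT")


if __name__ == "__main__":
    doc = {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch",
    }
    for flag in (False, True):
        req = create_only_request("twitter", "tweet", 1, doc, use_create_endpoint=flag)
        print(req.get_method(), req.full_url)
```

Passing either request to urllib.request.urlopen would send it. If the document already exists, the error response should surface as urllib.error.HTTPError.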

Automatic ID generation

In the first example, ES uses the document ID we specified. If you do not specify an ID (a POST to /twitter/tweet/ with no ID in the URL), ES generates one automatically.
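A request without an ID differs from the earlier example only in its method and path. The Python sketch below builds such a request without sending it. The auto_id_request helper is a hypothetical name and the base URL is an assumption matching the article's local node.

```python
import json
from urllib.request import Request


def auto_id_request(index, doc_type, source, base_url="http://localhost:9200"):
    """Build a POST request with no ID in the path, leaving ID generation to ES."""
    body = json.dumps(source).encode("utf-8")
    return Request(f"{base_url}/{index}/{doc_type}/", data=body, method="POST")


if __name__ == "__main__":
    req = auto_id_request("twitter", "tweet", {"user": "kimchy"})
    print(req.get_method(), req.full_url)
```

The generated ID is returned in the _id field of the response.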

Routing

ES uses routing to decide which shards a request goes to. Without routing, a query typically goes through the following steps:

1. A node receives the request and broadcasts it to every shard

2. Each shard receives the request, performs its computation, and returns its results

3. The results are merged and returned

If we set routing information, we are effectively telling ES which shard holds the data. This avoids the broadcast-and-merge steps and so improves query efficiency. Usage:

$ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

Routing is implemented by hashing. If a routing value is specified at index time, the hash is computed from that value and the shard is chosen accordingly; if not, the shard is chosen from the document ID. Because IDs are normally generated randomly, the data load is spread roughly evenly across the shards by default. If specific content needs to live on a particular shard, routing can be used to direct it there. However, as the amount of data grows, this can put excessive pressure on that shard.
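The hashing idea can be sketched in a few lines of Python. This is an illustration only: zlib.crc32 is a stand-in chosen because it is deterministic and in the standard library, whereas ES uses its own hash function. The shard_for helper and the shard count of 5 are assumptions for the example.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int, routing: str = None) -> int:
    """Pick a shard number from the routing value, or from the ID if none is given."""
    key = routing if routing is not None else doc_id
    # Stand-in hash (crc32); the real hash function in ES is different.
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Documents that share a routing value land on the same shard,
    # whatever their IDs are.
    for doc_id in ("1", "2", "3"):
        print(doc_id, shard_for(doc_id, 5, routing="kimchy"), shard_for(doc_id, 5))
```

Because every document routed with the same value maps to one shard, a heavily used routing value concentrates data there, which is the pressure problem mentioned above.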

In addition, routing can be configured when the mapping is defined. If a document of that type is indexed without a routing value, the routing value defined in the mapping is used by default.

Parent-child relationships (parent)

Some documents in ES may depend on others. You can set this relationship with the parent parameter:

$ curl -XPUT 'localhost:9200/blogs/blog_tag/1122?parent=1111' -d '{
    "tag" : "something"
}'

Timestamps (_timestamp)

A timestamp can also be specified during an index operation:

$ curl -XPUT 'localhost:9200/twitter/tweet/1?timestamp=2009-11-15T14%3A12%3A12' -d '{
    "user" : "kimchy",
    "message" : "trying out Elasticsearch"
}'

If you do not specify a timestamp manually, the _timestamp is not present in _source and is set to the time at which the document was indexed. In either case, _timestamp must be enabled in the mapping:

PUT my_index
{
  "mappings": {
    "my_type": {
      "_timestamp": {
        "enabled": true
      }
    }
  }
}

Document expiry (TTL)

ES can also expire documents automatically. The TTL is a positive time interval measured from _timestamp; once it has elapsed, the document is deleted automatically.

Set as a number of milliseconds:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=86400000' -d '{
    "user" : "kimchy",
    "message" : "Trying out Elasticsearch, so far so good?"
}'

Set as a time value:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?ttl=1d' -d '{
    "user" : "kimchy",
    "message" : "Trying out Elasticsearch, so far so good?"
}'
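The two TTL forms above express the same interval, and the expiry time is simply the timestamp plus that interval. The Python sketch below works through the arithmetic. The helper names are hypothetical, and the unit table is an assumed subset of the time units ES accepts, so treat it as illustrative rather than complete.

```python
import re

# Assumed subset of time-value suffixes, expressed in milliseconds.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
}


def ttl_to_millis(ttl) -> int:
    """Convert a TTL given as milliseconds or as a value like '1d' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w)?", str(ttl).strip())
    if match is None:
        raise ValueError(f"unrecognized ttl: {ttl!r}")
    # A bare number is interpreted as milliseconds.
    millis = int(match.group(1)) * _UNIT_MS[match.group(2) or "ms"]
    if millis <= 0:
        raise ValueError("ttl must be a positive interval")
    return millis


def expires_at(timestamp_ms: int, ttl) -> int:
    """Expiry time in epoch milliseconds: the timestamp plus the TTL."""
    return timestamp_ms + ttl_to_millis(ttl)


if __name__ == "__main__":
    print(ttl_to_millis("1d"), ttl_to_millis(86400000))
```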

You can also specify the TTL as a field in the JSON body:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "_ttl" : "1d",
    "user" : "kimchy",
    "message" : "Trying out Elasticsearch, so far so good?"
}'

Manual refresh

ES is a near-real-time search engine rather than a real-time one: by default it takes about 1 second after an index operation before the data can be found by a search. "Search" here means search queries. The Get API, by contrast, is truly real-time. The difference is that a search also involves analysis, relevance scoring, and sorting.

To make a document searchable immediately after it is indexed, you can trigger a refresh manually by adding refresh=true to the request.

This is recommended only in special cases. If every operation in a large batch performs its own refresh, the cost is very high.
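One way to avoid a refresh per operation is to index the whole batch without refresh=true and then refresh once at the end. The Python sketch below only lists the requests such a batch would make. It uses the index-level _refresh endpoint, which is not covered in this article, and the batch_requests helper and base URL are assumptions for illustration.

```python
def batch_requests(doc_ids, base_url="http://localhost:9200", index="twitter", doc_type="tweet"):
    """List (method, url) pairs for indexing a batch with a single refresh at the end."""
    # Plain index operations, with no per-document refresh.
    requests = [("PUT", f"{base_url}/{index}/{doc_type}/{doc_id}") for doc_id in doc_ids]
    if requests:
        # One explicit refresh for the whole index once the batch is done.
        requests.append(("POST", f"{base_url}/{index}/_refresh"))
    return requests


if __name__ == "__main__":
    for method, url in batch_requests([1, 2, 3]):
        print(method, url)
```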

Timeout

Shards are not always available. For example, a shard cannot accept index operations while it is being backed up. In that case the operation has to wait until the shard becomes available. If the wait exceeds the timeout, the operation returns with an error. The wait time can be set with the timeout parameter:

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?timeout=5m' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

That covers the basics of the index operation. There is more advanced material, such as the detailed use of shards and version numbers, but my understanding of ES is not yet thorough enough, so I will not go into it here for fear of introducing mistakes.

If you spot any errors, please correct me.
