Elasticsearch common operations: Document

Source: Internet
Author: User

1. Create a document

1.1 With a specified ID

PUT my_blog/article/1
{
  "id": 1,
  "title": "elasticsearch",
  "posttime": "2017-05-01",
  "content": "elasticsearch is helpfull!"
}

Return Value:

{
  "_index": "my_blog",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

The _version number is incremented automatically each time the document is updated.

1.2 Without a specified ID

If no ID is specified, Elasticsearch generates one automatically. In that case the request must use POST rather than PUT:

POST my_blog/article
{
  "id": 2,
  "title": "spark",
  "posttime": "2017-05-01",
  "content": "spark is helpfull!"
}

Return Value:

{
  "_index": "my_blog",
  "_type": "article",
  "_id": "AWagTCv8O1qbT1zqbREV",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

2. Get a document

2.1 Normal get

Get existing documents:

GET my_blog/article/1

Return Value:

{
  "_index": "my_blog",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 1,
    "title": "elasticsearch",
    "posttime": "2017-05-01",
    "content": "elasticsearch is helpfull!"
  }
}

Obtain a document that does not exist:

GET my_blog/article/2

Return Value:

{
  "_index": "my_blog",
  "_type": "article",
  "_id": "2",
  "found": false
}

2.2 Test whether a document exists

Use HEAD to test whether a document exists:

HEAD my_blog/article/1
200 - OK

HEAD my_blog/article/2
404 - Not Found

2.3 Batch get

Documents from different indices and types:

GET _mget
{
  "docs": [
    {
      "_index": "my_blog",
      "_type": "article",
      "_id": 1
    },
    {
      "_index": "twitter",
      "_type": "tweet",
      "_id": 2
    }
  ]
}

Different types under the same index:

GET my_blog/_mget
{
  "docs": [
    {
      "_type": "article",
      "_id": 1
    },
    {
      "_type": "essay",
      "_id": 2
    }
  ]
}

Same index and type:

GET my_blog/article/_mget
{
  "docs": [
    {"_id": 1},
    {"_id": 2}
  ]
}

Or:

GET my_blog/article/_mget
{
  "ids": [1, 2]
}
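
The request shapes in this section differ only in where the index and type come from: the URL supplies defaults, each entry in docs may set its own, and ids is shorthand for entries that only carry an _id. The following Python sketch resolves a request into explicit (index, type, id) triples to make that visible. The function name resolve_mget is hypothetical, and the logic illustrates the addressing rules only; it is not Elasticsearch code.

```python
def resolve_mget(body, index=None, doc_type=None):
    """Expand an _mget body into explicit (index, type, id) triples.

    `index` and `doc_type` stand for the values taken from the URL;
    they act as defaults that a docs entry can override.
    """
    # "ids" is shorthand for docs entries that only carry an _id.
    docs = body.get("docs") or [{"_id": i} for i in body.get("ids", [])]
    resolved = []
    for doc in docs:
        target = (
            doc.get("_index", index),
            doc.get("_type", doc_type),
            str(doc["_id"]),
        )
        if target[0] is None:
            raise ValueError("no index given in the URL or in the docs entry")
        resolved.append(target)
    return resolved


# GET my_blog/article/_mget {"ids": [1, 2]}
short_form = resolve_mget({"ids": [1, 2]}, index="my_blog", doc_type="article")
# GET my_blog/article/_mget {"docs": [{"_id": 1}, {"_id": 2}]}
long_form = resolve_mget({"docs": [{"_id": 1}, {"_id": 2}]},
                         index="my_blog", doc_type="article")
print(short_form == long_form)
```

Both calls resolve to the same two documents, which is why the ids form can replace the docs form when index and type are already fixed by the URL.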

3. Update the document

When Elasticsearch updates a document, it first retrieves the existing document, applies the change to its content, marks the old document as deleted, and then indexes the resulting document as the latest version.

Add the following document first:

PUT test/type1/1
{
  "counter": 1,
  "tags": ["red"]
}

3.1 Update document fields

Increase the counter value by 4:

POST test/type1/1/_update
{
  "script": {
    "inline": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {
      "count": 4
    }
  }
}

Note1: inline holds the script source given directly in the request, ctx is the context object available to the script, painless is the scripting language built into Elasticsearch, and params is the set of parameters passed to the script;

Note2: In addition to _source, the ctx object also gives access to _index, _type, _id, _version, _routing, _parent, and other fields;

Add a value to the tags field:

POST test/type1/1/_update
{
  "script": {
    "inline": "ctx._source.tags.add(params.tag)",
    "lang": "painless",
    "params": {
      "tag": "blue"
    }
  }
}
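
Both requests in this section keep the changing value in params instead of splicing it into the script text. A minimal Python sketch of building such a request body follows. The helper name script_update_body is hypothetical, and the sketch only produces the JSON; it does not send anything to a cluster.

```python
import json


def script_update_body(script_source, **params):
    """Build the body of an _update request that runs a Painless script.

    The script text stays constant; values that change between calls
    travel in "params" and are read as params.<name> inside the script.
    """
    return {
        "script": {
            "inline": script_source,
            "lang": "painless",
            "params": params,
        }
    }


increment = script_update_body("ctx._source.counter += params.count", count=4)
add_tag = script_update_body("ctx._source.tags.add(params.tag)", tag="blue")

# The resulting JSON is what would be sent to POST test/type1/1/_update.
print(json.dumps(increment, indent=2))
print(json.dumps(add_tag, indent=2))
```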

3.2 Add and remove fields

Add a field called name to test/type1/1:

POST test/type1/1/_update
{
  "script": {
    "inline": "ctx._source.name=\"test\""
  }
}

The preceding command can also be abbreviated to:

{"script": "ctx._source.name=\"test\""}

Remove the name field:

POST test/type1/1/_update
{
  "script": {
    "inline": "ctx._source.remove(\"name\")"
  }
}

3.3 Upsert operation

If the document does not exist, the content of upsert is indexed as a new document. If the document exists, the script is executed as usual. For example:

POST test/type1/2/_update
{
  "script": {
    "inline": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {
      "count": 4
    }
  },
  "upsert": {
    "counter": 1,
    "tag": ["pink"]
  }
}

If test/type1/2 exists, its counter is increased by 4. If not, a document containing the counter and tag fields is created.

Return Value:

{
  "_index": "test",
  "_type": "type1",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
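
The upsert behaviour described in this section can be mimicked on a plain Python dictionary. This is only a sketch of the decision logic: the function name update_with_upsert, the in-memory store, and the add_four callback standing in for the Painless script are all assumptions made for illustration.

```python
import copy


def update_with_upsert(store, doc_id, apply_script, upsert):
    """Mimic the upsert behaviour on an in-memory store.

    `store` maps document IDs to {"_version": int, "_source": dict}.
    `apply_script` stands in for the Painless script and mutates _source.
    """
    if doc_id not in store:
        # Document missing: the upsert content is stored, the script is skipped.
        store[doc_id] = {"_version": 1, "_source": copy.deepcopy(upsert)}
        return "created"
    # Document present: run the script and increment the version.
    apply_script(store[doc_id]["_source"])
    store[doc_id]["_version"] += 1
    return "updated"


def add_four(source):
    # Plays the role of "ctx._source.counter += params.count" with count = 4.
    source["counter"] += 4


store = {}
upsert_doc = {"counter": 1, "tag": ["pink"]}
first = update_with_upsert(store, "2", add_four, upsert_doc)
second = update_with_upsert(store, "2", add_four, upsert_doc)
print(first, second, store["2"])
```

The first call creates the document from the upsert content, matching the "created" result in the return value. The second call finds the document and runs the script instead.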

4. Update by query

POST my_blog/_update_by_query
{
  "script": {
    "inline": "ctx._source.content = params.content",
    "lang": "painless",
    "params": {
      "content": "spark is popular"
    }
  },
  "query": {
    "term": {
      "title": {
        "value": "spark"
      }
    }
  }
}

Return Value:

{
  "took": 33,
  "timed_out": false,
  "total": 1,
  "updated": 1,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

5. Delete a document

DELETE my_blog/article/2

If a routing value was specified when the document was indexed, pass the same routing parameter when deleting it:

DELETE my_blog/article/2?routing=user123

Note1: If the routing value given at deletion is incorrect, the deletion fails;

Note2: When _routing is set to required in the mapping and no routing value is specified, the delete operation throws a RoutingMissingException and rejects the request;

6. Delete by query

POST my_blog/_delete_by_query
{
  "query": {
    "term": {
      "title": {
        "value": "mybatis"
      }
    }
  }
}

Delete all documents under a type:

POST my_blog/article/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

7. Bulk operations

7.1 Command format

Run the following command:

curl -XPOST 'localhost:9200/indexname/_bulk?pretty' --data-binary @accounts.json

The accounts.json file must use the following format:

action_and_meta_data
data

Note1: In the action_and_meta_data line, the action must be index, create, update, or delete. The metadata identifies the target document through _index, _type, and _id;

Note2: The data line holds the document content. It is required whenever a document is added;

Note3: Every line must end with a newline character "\n", including the last one. The newlines are what separate the lines from one another;
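
The format rules above are easy to get wrong by hand, especially the trailing newline. Below is a minimal Python sketch that serializes a list of operations into a bulk body. The function name bulk_body and the (action, metadata, data) tuple layout are assumptions made for this example; the sketch builds the text only and does not talk to a cluster.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, data) tuples into a _bulk request body.

    `data` is None for delete, which has no data line.
    """
    lines = []
    for action, metadata, data in operations:
        if action not in ("index", "create", "update", "delete"):
            raise ValueError("unsupported bulk action: " + action)
        if action == "delete":
            if data is not None:
                raise ValueError("delete takes no data line")
        elif data is None:
            raise ValueError(action + " requires a data line")
        # json.dumps emits one line per object, as the format requires.
        lines.append(json.dumps({action: metadata}))
        if data is not None:
            lines.append(json.dumps(data))
    # Every line, including the last one, ends with "\n".
    return "".join(line + "\n" for line in lines)


body = bulk_body([
    ("index", {"_index": "my_blog", "_type": "article", "_id": "1"},
     {"title": "blog title"}),
    ("delete", {"_index": "my_blog", "_type": "article", "_id": "2"}, None),
])
print(body, end="")
```

The returned string could be written to a file such as accounts.json and passed to curl with --data-binary, which sends the file as-is and so keeps the newlines.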

7.2 Add a document

{"index": {"_index": "my_blog", "_type": "article", "_id": "1"}}
{"title": "blog title"}

Or

{"create": {"_index": "my_blog", "_type": "article", "_id": "1"}}
{"title": "blog title"}

You can also omit the _id and let Elasticsearch generate one.

7.3 Delete a document

{"delete": {"_index": "website", "_type": "blog", "_id": "123"}}

7.4 Combined example

The following content combines delete, create, index, and update requests:

{"delete": {"_index": "website", "_type": "blog", "_id": "123"}}
{"create": {"_index": "website", "_type": "blog", "_id": "123"}}
{"title": "blog title"}
{"index": {"_index": "website", "_type": "blog"}}
{"title": "blog title"}
{"update": {"_index": "website", "_type": "blog", "_id": "123"}}
{"doc": {"title": "blog title"}}

8. Version Control

On a first read you can skip 8.2 and go straight to the commands in 8.3.

When Elasticsearch updates a document, it first reads the existing document, applies the changes to it, and then re-indexes the entire document.

8.1 Lock control

It is very likely that several users modify the same document at the same time. In that case, transaction control or concurrency control is required.

8.1.1 Pessimistic locking

When a thread modifies data, it locks that data first. Other threads that want to access the data have to wait until the lock is released, which ensures that only one thread can access the data at a time. Traditional relational databases make heavy use of such locks, for example row locks, table locks, read locks, and write locks.

8.1.2 Optimistic locking

The data is not locked. Instead, a check for conflicting changes is made only when an update is submitted. Elasticsearch uses this optimistic approach. Optimistic locking suits applications with many more reads than writes, because it avoids the overhead of locking and improves throughput.

8.2 Elasticsearch version control

Since Elasticsearch uses optimistic locking, how does it make sure that old data does not overwrite new data? It uses the _version field for version control: each time a document is updated, its _version is incremented by 1.

Elasticsearch's document version control mechanism mainly includes internal version control and external version control:

  • Internal version control: a request succeeds only if the version number it carries is equal to the current version number of the document;
  • External version control: an update succeeds only if the version number it carries is greater than the current version number of the document;

In fact, no matter whether the request obtains data or updates the data, the version number can be carried. No matter how complicated the situation is, you only need to remember the following two points:

  • 1. For a request that only retrieves data, the version number must match exactly, whichever version type is used:
    • A. No version number is carried: the operation succeeds;
    • B. An internal version number is carried: it must be equal to the current version number of the document;
    • C. An external version number is carried: it must also be equal to the current version number of the document;
  • 2. For an update operation:
    • A. No version number is carried: the operation succeeds and the document version number is incremented by 1;
    • B. An internal version number is carried: it must be equal to the current version number of the document, and the document version number is incremented by 1;
    • C. An external version number is carried: it must be greater than the current version number of the document, and it becomes the new version number of the document;

You can also look at the problem in terms of which version number is carried:

  • 1. No version number: both retrieval and update requests succeed;
  • 2. Internal version number: for any operation it must be equal to the document version number;
  • 3. External version number: for retrieval it must be equal to the document version number, and for an update it must be greater than the document version number;
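
The rules listed above can be written down as a small check. The following Python sketch encodes them as stated in this section; the names VersionConflict, check_read, and next_version are hypothetical, and the code models the rules only, not the Elasticsearch implementation.

```python
class VersionConflict(Exception):
    """Raised when the supplied version does not satisfy the rule."""


def check_read(current, version=None):
    """A read succeeds without a version, or when the version matches exactly."""
    if version is not None and version != current:
        raise VersionConflict("read: got %d, document is at %d" % (version, current))
    return current


def next_version(current, version=None, version_type="internal"):
    """Return the document version after a successful update.

    internal: the supplied version must equal the current one; result is current + 1.
    external: the supplied version must be greater; result is the supplied version.
    """
    if version is None:
        return current + 1
    if version_type == "internal":
        if version != current:
            raise VersionConflict("internal: got %d, document is at %d" % (version, current))
        return current + 1
    if version_type == "external":
        if version <= current:
            raise VersionConflict("external: %d is not greater than %d" % (version, current))
        return version
    raise ValueError("unknown version_type: " + version_type)


print(next_version(1))                                      # no version carried
print(next_version(1, version=1))                           # internal, matching
print(next_version(1, version=5, version_type="external"))  # external, greater
```

Repeating the last call against a document that is already at version 5 raises VersionConflict, which is how an older write is kept from overwriting a newer one.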

I personally don't understand why Elasticsearch is designed this way, but it is indeed a version control mechanism.

8.3 Command operations

GET website/blog/1?version=1

PUT website/blog/1?version=5&version_type=external

9. Routing mechanism

9.1 Shard location calculation and example

When an index has multiple shards, how does Elasticsearch decide which shard a document is stored on? Assume the following environment:

Master node: shard0 (primary), shard1, shard2 (primary)
Common node: shard0, shard1 (primary), shard2

Elasticsearch hashes the routing value of each document, so documents with the same hash value are placed in the same primary shard. By default the routing value is the document ID. The formula is as follows:

shard = hash(routing) % number_of_primary_shards

If we add a document without specifying an ID and the ID generated by Elasticsearch is AWagTCv8O1qbT1zqbREV, then according to the formula above the shard is:

shard = hash("AWagTCv8O1qbT1zqbREV") % 3

Of course, the actual hash function is internal to Elasticsearch. To demonstrate the calculation, we can use the built-in hash() function in Python:

>>> shard = hash("AWagTCv8O1qbT1zqbREV") % 3
>>> shard
2

So this document is stored on shard 2, that is, the third shard.
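
One caveat about the interactive session above: in Python 3 the built-in hash() of a string is randomized per process unless PYTHONHASHSEED is fixed, so the value 2 is not guaranteed to reproduce. The sketch below uses zlib.crc32 as a deterministic stand-in. That choice and the function name shard_for are assumptions made for this example; Elasticsearch uses its own hash function, so the shard printed here will not match a real cluster.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Map a routing value to a shard number with a deterministic hash.

    zlib.crc32 is only a stand-in chosen for this sketch.
    """
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_primary_shards


doc_id = "AWagTCv8O1qbT1zqbREV"
shard = shard_for(doc_id, 3)
print(shard)                          # one of 0, 1, 2
print(shard == shard_for(doc_id, 3))  # the same routing value always gives the same shard
```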

As shown above, the default routing spreads the data evenly across shards, but you can also set a custom routing value to control where a document is stored.

9.2 Elasticsearch query process and custom routing values

If there is an index with 50 shards, the process of executing a query on the cluster is as follows:

  • (1) The query request is first received by one node in the cluster;
  • (2) The node that receives the request broadcasts the query to every shard of the index;
  • (3) Each shard executes the query and returns its results;
  • (4) The results are merged and sorted on the node that received the request, and then returned to the user;
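
The four steps above can be sketched in Python with each shard modelled as a list of (score, doc_id) hits. The function name search_all_shards, the data layout, and the matches predicate standing in for the query are assumptions made for this illustration; real shards run on different nodes and exchange far more than this.

```python
import heapq


def search_all_shards(shards, matches, size=10):
    """Broadcast a query to every shard, then merge the partial results.

    `shards` is a list of lists of (score, doc_id) hits; `matches` is a
    predicate standing in for the query.
    """
    partial_results = []
    for shard in shards:                    # step 2: broadcast to every shard
        hits = [hit for hit in shard if matches(hit)]
        hits.sort(reverse=True)             # step 3: each shard sorts its own hits
        partial_results.append(hits[:size])
    # step 4: the receiving node merges the sorted lists and keeps the top hits
    merged = heapq.merge(*partial_results, reverse=True)
    return list(merged)[:size]


shards = [
    [(0.9, "a1"), (0.2, "a2")],
    [(0.7, "b1")],
    [(0.8, "c1"), (0.1, "c2")],
]
top = search_all_shards(shards, lambda hit: hit[0] > 0.15, size=3)
print(top)
```

Every shard does work for every query in this model, which is the cost that a custom routing value avoids.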

You can customize the route value to avoid this broadcast. The following is a case study.

Normally, we will add a document as follows:

PUT website/article/1
{
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "user": "user123"
}

To find all articles written by user123, you would query:

GET website/article/_search
{
    "query": {
        "term": {"user": "user123"}
    }
}

Obviously, this query follows the process above, that is, the request is sent to all shards. This is what we want to optimize.

Add the document with the user as the routing value:

PUT website/article/1?routing=user123
{
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "user": "user123"
}

Once the routing value is specified, all articles (documents) published by user123 are stored in the same shard, say shard 1. When querying the articles published by user123, you then only need to specify the routing value in the search so that the request is not broadcast. It goes directly to shard 1:

GET website/article/_search?routing=user123
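
The effect of a routing value on both indexing and searching can be sketched in Python as follows. The helper names, the in-memory shards, and the use of zlib.crc32 as the hash are all assumptions made for this example; only the 50-shard index size and the user123 routing value come from the text.

```python
import zlib

NUMBER_OF_SHARDS = 50  # the index size used in the example above


def shard_for(routing):
    # zlib.crc32 is a stand-in hash chosen for this sketch.
    return zlib.crc32(routing.encode("utf-8")) % NUMBER_OF_SHARDS


def index_document(shards, doc_id, source, routing=None):
    """Store a document on the shard chosen by routing (default: the ID)."""
    shard = shard_for(routing if routing is not None else doc_id)
    shards[shard][doc_id] = source
    return shard


def shards_to_search(routing=None):
    """Without routing every shard is queried; with routing only one."""
    if routing is None:
        return list(range(NUMBER_OF_SHARDS))
    return [shard_for(routing)]


shards = [dict() for _ in range(NUMBER_OF_SHARDS)]
placed = {index_document(shards, str(i), {"user": "user123"}, routing="user123")
          for i in range(1, 6)}
print(len(placed))                                    # 1: all five documents share one shard
print(len(shards_to_search()))                        # 50: broadcast to every shard
print(shards_to_search("user123") == sorted(placed))  # True: the search goes to that shard only
```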

Note1: This can also cause problems. For example, if user123 has published hundreds of thousands of articles while other users have published only a few, the data is clearly not distributed evenly across shards;

Note2: You can also specify multiple routing values, separated by commas (,), in which case several shards may be involved. Which of the matching shards Elasticsearch chooses, and by which algorithm, is left for you to study on your own;
