Introduction to Elastic Stack: Using Elasticsearch (V)

Source: Internet
Author: User

I. Preface

The first four parts covered the Elasticsearch APIs and some of the underlying principles, so I believe you now have a reasonable understanding of Elasticsearch. This part covers the path from building an index to later-stage optimization;

II. Building the mapping

Earlier we said that an index is like a database and a type is like a table; the mapping is the bridge that defines how fields relate to the index. When designing a database we follow the three normal forms, so what should we consider when building a mapping? I think the following considerations are necessary:

1. What the field type is;

The field types were introduced earlier; choosing one involves the same basic considerations as choosing a column type in a database;

2. Whether the field needs to be searched, that is, whether it needs to be analyzed (tokenized);

Fields that do not need to be searched: set index to false; string fields that do not need to be analyzed can be mapped directly as the keyword type;

Fields that need to be searched: use index_options to set how much detail is stored in the inverted index. There are four options, docs, freqs, positions, and offsets; choose according to your needs;

3. Whether sorting and aggregation are required;

If no sorting or aggregation is needed: set doc_values to false and fielddata to false;

If a field needs no searching, sorting, or aggregation at all, set enabled to false;

4. Whether additional storage is required;

Set store to true to store the original content of the field separately.
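The four considerations above can be sketched as a single mapping body. This is only an illustration written as a Python dictionary: the field names (title, status, cookie, session_id, raw_payload, summary) are hypothetical, and only the mapping parameters themselves come from the text.

```python
# Hypothetical field names; only the mapping parameters come from the text above.
def build_mapping():
    """Return a mapping body that exercises the four considerations."""
    return {
        "properties": {
            # Full-text field: analyzed, positions kept for phrase queries.
            "title": {"type": "text", "index_options": "positions"},
            # Exact-value string that needs no analysis.
            "status": {"type": "keyword"},
            # Not indexed, so it is not meant to be searched.
            "cookie": {"type": "keyword", "index": False},
            # Searchable, but never sorted or aggregated on.
            "session_id": {"type": "keyword", "doc_values": False},
            # Object that is never searched, sorted, or aggregated
            # ("enabled" applies to object fields).
            "raw_payload": {"type": "object", "enabled": False},
            # Original value kept as a separate stored field.
            "summary": {"type": "text", "store": True},
        }
    }


mapping = build_mapping()
for name, options in mapping["properties"].items():
    print(name, options)
```

The dictionary would be sent as the mapping part of an index creation request; how it is wrapped depends on the Elasticsearch version in use.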

Those are the points to think about when building a mapping. Another question follows: tables in a database can be related to each other, so how is this done in Elasticsearch?

There are two ways to implement relationships in Elasticsearch:

1. Nested object

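As an example, suppose a document contains a user field holding an array of objects, each with a first name and a last name.

By default the user field is mapped as the object type, and Elasticsearch flattens the array into multi-value fields, so the association between the first name and last name of each object is lost.

When we then query for a first name taken from one object and a last name taken from another, the document unexpectedly matches.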

Elasticsearch provides nested objects as the solution for this case: map the user field as the nested type.

With this change in the mapping, repeating the steps above gives the result we want;

With a nested mapping, each object is indexed as a separate hidden document and is queried through a nested query, so a search for first name Alice and last name White returns nothing unless both values belong to the same object;
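The difference between the two mappings can be simulated in a few lines of Python. This is a simplified model, not Elasticsearch code: the sample users are hypothetical, and the functions only imitate how values are matched under the object type (flattened) and the nested type (per object).

```python
def flatten(objects):
    """Imitate object-type indexing: merge an array of objects into multi-value fields."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, set()).add(value.lower())
    return flat


def matches_object(objects, first, last):
    """Object type: each condition is checked against the merged values."""
    flat = flatten(objects)
    return (first.lower() in flat.get("first", set())
            and last.lower() in flat.get("last", set()))


def matches_nested(objects, first, last):
    """Nested type: both conditions must hold inside the same object."""
    return any(
        obj.get("first", "").lower() == first.lower()
        and obj.get("last", "").lower() == last.lower()
        for obj in objects
    )


# Hypothetical sample data: no single user is named "Alice White".
users = [
    {"first": "John", "last": "White"},
    {"first": "Alice", "last": "Smith"},
]

print(matches_object(users, "Alice", "White"))  # True: unexpected cross-object match
print(matches_nested(users, "Alice", "White"))  # False: values are in different objects
print(matches_nested(users, "Alice", "Smith"))  # True: same object
```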

2. Join

The join data type defines parent and child documents within the same index, forming a one-to-many or one-to-one relationship between them.

Note that I am using a version later than 5.0, not 6.0; I forgot to mention this earlier, so pay attention to version differences. The mapping for this example defines a my_join_field field in which question is the parent and answer is the child;

Next we add two parent documents.

The next step is to add two child documents.

The point to note here is that routing must be specified, because a child document must be on the same shard as its parent; the child must also specify its join name and its parent ID;
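The steps above can be sketched as plain request descriptions in Python. The index name, type name, and field names are taken from the search result shown in this article; the helper functions and the URL layout (a 6.x-style path with a doc type) are assumptions for illustration, and no client library is involved.

```python
import json


def join_mapping():
    """Mapping with a join field where question is the parent of answer."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    "my_join_field": {
                        "type": "join",
                        "relations": {"question": "answer"},
                    }
                }
            }
        }
    }


def parent_doc(text):
    """A parent document only names its side of the relation."""
    return {"text": text, "my_join_field": {"name": "question"}}


def child_doc(text, parent_id):
    """A child document names its relation and points at its parent."""
    return {"text": text, "my_join_field": {"name": "answer", "parent": parent_id}}


def index_request(index, doc_id, body, routing=None):
    """Describe an index request; routing keeps a child on its parent's shard."""
    path = f"/{index}/doc/{doc_id}"
    if routing is not None:
        path += f"?routing={routing}"
    return {"method": "PUT", "path": path, "body": body}


index_requests = [
    index_request("my_index", "1", parent_doc("This is a question")),
    index_request("my_index", "2", parent_doc("This is a another question")),
    # Children are routed by the parent ID so they land on the same shard.
    index_request("my_index", "3", child_doc("This was an answer", "1"), routing="1"),
    index_request("my_index", "4", child_doc("This is another answer", "1"), routing="1"),
]

for request in index_requests:
    print(request["method"], request["path"], json.dumps(request["body"]))
```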

Now let's look at the documents we have indexed:

    {
        "took": -,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": 4,
            "max_score": null,
            "hits": [
                {
                    "_index": "my_index",
                    "_type": "doc",
                    "_id": "1",
                    "_score": null,
                    "_source": {
                        "text": "This is a question",
                        "my_join_field": {
                            "name": "question"
                        }
                    },
                    "sort": ["1"]
                },
                {
                    "_index": "my_index",
                    "_type": "doc",
                    "_id": "2",
                    "_score": null,
                    "_source": {
                        "text": "This is a another question",
                        "my_join_field": {
                            "name": "question"
                        }
                    },
                    "sort": ["2"]
                },
                {
                    "_index": "my_index",
                    "_type": "doc",
                    "_id": "3",
                    "_score": null,
                    "_routing": "1",
                    "_source": {
                        "text": "This was an answer",
                        "my_join_field": {
                            "name": "answer",
                            "parent": "1"
                        }
                    },
                    "sort": ["3"]
                },
                {
                    "_index": "my_index",
                    "_type": "doc",
                    "_id": "4",
                    "_score": null,
                    "_routing": "1",
                    "_source": {
                        "text": "This is another answer",
                        "my_join_field": {
                            "name": "answer",
                            "parent": "1"
                        }
                    },
                    "sort": ["4"]
                }
            ]
        }
    }

From the results we can see that the one-to-many relationship has been established. Next, we use the parent ID to query the child documents whose parent ID is 1:

A reminder: this request can also be sent with GET; the form used here is simply more convenient when working in the tool;

The results are as follows:

You can also use has_child to return the parent documents that have matching child documents, and has_parent to return the child documents whose parent matches. I will not write examples here; explore them yourself;
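As a starting point for that exploration, the three query types can be sketched as Python functions that build the request bodies. The function names and the match_all inner queries are illustrative choices; the relation names question and answer follow the example in this article.

```python
def parent_id_query(child_type, parent_id):
    """Children of the given relation whose parent has this ID."""
    return {"query": {"parent_id": {"type": child_type, "id": parent_id}}}


def has_child_query(child_type, child_query):
    """Parents that have at least one child matching child_query."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


def has_parent_query(parent_type, parent_query):
    """Children whose parent matches parent_query."""
    return {"query": {"has_parent": {"parent_type": parent_type, "query": parent_query}}}


# Answers whose parent is document 1.
answers_of_1 = parent_id_query("answer", "1")
# Questions that have any answer at all.
answered_questions = has_child_query("answer", {"match_all": {}})
# Answers that belong to any question.
all_answers = has_parent_query("question", {"match_all": {}})

print(answers_of_1)
print(answered_questions)
print(all_answers)
```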

Let's make a comparison:

Note: do not define too many fields in a mapping;

III. Some optimization suggestions

Recommended parameters in elasticsearch.yml:

Some parameters were already set when we built the cluster earlier. Beyond those introduced before, here are the parameters that should actually be set before going to production:

1. cluster.name: the cluster name;

2. node.name: the node name;

3. node.master: whether the node is eligible to be the master node;

4. node.data: whether the node stores data; storing data on the master node is not recommended;

5. discovery.zen.ping.unicast.hosts: the other nodes of the cluster;

6. network.host: the IP address;

7. path.data and path.logs: the directories where data and logs are stored, by default under the Elasticsearch directory;

8. discovery.zen.minimum_master_nodes: the minimum number of master-eligible nodes required to elect a master, which matters when nodes drop out of the cluster;
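For discovery.zen.minimum_master_nodes, the commonly recommended value is a quorum of the master-eligible nodes, that is (master-eligible nodes / 2) + 1 with integer division, which helps avoid split brain. A minimal sketch, with a function name chosen only for this illustration:

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1 using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible ->", minimum_master_nodes(nodes))
```

With three master-eligible nodes the value is 2, so a single isolated node can never elect itself master.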

JVM settings:

1. Set the minimum heap size (Xms) and the maximum heap size (Xmx) to the same value, so that the heap is not resized at runtime;

2. Set Xmx to no more than 50% of the physical RAM, so that enough physical RAM remains for the kernel file system cache;

3. Do not exceed 32GB, so that the JVM can keep using compressed object pointers;
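The three rules can be combined into a small calculation. This is a sketch under stated assumptions: the function names are made up for the example, and the default cap of 31 GB is a conservative choice that stays just under the 32GB limit mentioned above.

```python
def recommended_heap_gb(physical_ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical RAM, but never more than the cap (assumed 31 GB here)."""
    if physical_ram_gb <= 0:
        raise ValueError("physical RAM must be positive")
    return min(physical_ram_gb * 0.5, cap_gb)


def jvm_options(heap_gb: float) -> list:
    """Xms and Xmx set to the same value, expressed in megabytes."""
    size = f"{int(heap_gb * 1024)}m"
    return [f"-Xms{size}", f"-Xmx{size}"]


for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> {heap} GB heap -> {jvm_options(heap)}")
```

The resulting two options are what would go into the jvm.options file of the node.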

Shard settings:

I suggest reading this article;

Read/write optimization:

For write optimization, refer to Introduction to Elastic Stack: Using Elasticsearch (III), where I described the steps of the write path. We optimize along these 3 directions:

Multi-threaded and bulk writes go without saying as general guidelines. Beyond that, optimize these 3 aspects: refresh, translog, and flush:

1. refresh

Each refresh generates a segment. If the refresh frequency is too high, each segment may contain very few documents, and a large number of segments is produced;

Tuning directions:

Increase refresh_interval at the cost of real-time visibility; the default is 1s, and setting it to -1 disables automatic refresh;

Increase the size of the index buffer with indices.memory.index_buffer_size (a static parameter set in elasticsearch.yml; the node must be restarted for it to take effect); the default is 10%;

2. translog

The goal is to reduce how often the translog is written to disk, which improves efficiency but carries a risk of losing data;

Tuning directions:

Set index.translog.durability to async and use index.translog.sync_interval to set the write interval, for example 10s; the translog is then written to disk every 10s, and data is lost if the node goes down in between;

index.translog.flush_threshold_size defaults to 512MB; a flush is triggered when the translog exceeds that size;

3. Set the number of nodes and shards sensibly; index.routing.allocation.total_shards_per_node limits the total number of primary and replica shards of an index that can be allocated on each node;
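The write-side settings discussed above can be collected into one index settings body. This is a hedged sketch: the function name and the concrete values (30s, 10s, 2 shards per node) are illustrative assumptions, and depending on the Elasticsearch version some of these settings may only be accepted when the index is created.

```python
def write_heavy_settings(refresh_interval="30s", sync_interval="10s",
                         shards_per_node=2):
    """Index settings that trade real-time visibility and durability for write speed."""
    return {
        "index": {
            # Fewer refreshes means fewer, larger segments ("-1" disables refresh).
            "refresh_interval": refresh_interval,
            "translog": {
                # Asynchronous fsync: data in the last interval can be lost on a crash.
                "durability": "async",
                "sync_interval": sync_interval,
                # Flush once the translog grows past this size (the default value).
                "flush_threshold_size": "512mb",
            },
            # Spread the shards of this index across nodes.
            "routing": {"allocation": {"total_shards_per_node": shards_per_node}},
        }
    }


settings = write_heavy_settings()
bulk_load_settings = write_heavy_settings(refresh_interval="-1")

print(settings)
print(bulk_load_settings["index"]["refresh_interval"])
```

For a one-off bulk load, refresh could be disabled with -1 and restored to a normal interval afterwards.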

Read optimization:

1. Set the number of shards sensibly

Measure the performance of a single shard, then calculate a reasonable number of shards from the business requirements. How do you test the performance of a shard? First build an environment identical to production, then create an index with a single shard and no replicas, and run write and query tests against it while watching the monitoring metrics during the stress test. Monitoring is covered in the next part. For stress testing you can use esrally; refer to this article;
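One way to turn single-shard test results into a shard count is sketched below. This is an assumption about how the calculation might be done, not a formula from this article: the function and its inputs (data volume, per-shard capacity, write throughput) are hypothetical, and the numbers in the usage lines are made up.

```python
import math


def estimate_shard_count(total_data_gb, max_shard_size_gb,
                         required_docs_per_sec, docs_per_sec_per_shard):
    """Take the larger of the counts needed for data volume and for write throughput."""
    if max_shard_size_gb <= 0 or docs_per_sec_per_shard <= 0:
        raise ValueError("per-shard measurements must be positive")
    by_size = math.ceil(total_data_gb / max_shard_size_gb)
    by_throughput = math.ceil(required_docs_per_sec / docs_per_sec_per_shard)
    return max(1, by_size, by_throughput)


# Hypothetical measurements from a single-shard stress test.
print(estimate_shard_count(600, 30, 20000, 5000))   # volume-bound: 20
print(estimate_shard_count(100, 50, 30000, 4000))   # throughput-bound: 8
```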

2. Optimize query statements

Use the filter context wherever possible to reduce the cases where scores have to be calculated; filters also have a caching mechanism, which improves query performance;
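A bool query makes the distinction concrete: clauses under must are scored, while clauses under filter only include or exclude documents. The sketch below builds such a request body in Python; the function name and the field names (title, status, created_at) are hypothetical.

```python
def filtered_search(text, status, since):
    """Score only the full-text clause; exact-match conditions go in filter context."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no matching, no scoring, cacheable.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"created_at": {"gte": since}}},
                ],
            }
        }
    }


body = filtered_search("elasticsearch tuning", "published", "2018-01-01")
print(body)
```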

IV. Conclusion

Next I will start covering monitoring. You are welcome to join group 438836709 and to follow my public account!
