Some problems encountered using Elasticsearch in production and how to resolve them (continuously updated)

Last Update:2018-07-24 Source: Internet

Author: User



1. Node dropped from the cluster because of GC
GC pauses stop the JVM. If a GC on a node takes too long, the master's pings to that node fail three times in a row (by default Zen Discovery retries a failed ping 3 times), and the master removes the node from the cluster, causing its shards to be reallocated.
Workaround: (1) Tune GC to reduce pause times. (2) Increase the number of Zen Discovery ping retries (ES parameter: ping_retries) and the ping timeout (ES parameter: ping_timeout).
In our case, it turned out the root cause was that the system disk on one node was full, which degraded system performance.
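To get a feel for why raising ping_retries and ping_timeout helps, here is a rough Python sketch. It assumes a simplified model in which the master evicts a node once ping_retries consecutive pings have each timed out after ping_timeout seconds, so a stop-the-world pause only gets the node evicted if it lasts about ping_retries * ping_timeout. The defaults of 30 s and 3 retries (discovery.zen.fd.ping_timeout and discovery.zen.fd.ping_retries in ES 1.x/2.x) are an assumption of this sketch; the ping interval and network delays are ignored, and the function names are hypothetical.

```python
def tolerated_pause_seconds(ping_timeout_s: float, ping_retries: int) -> float:
    """Approximate longest unresponsive stretch before the master evicts a node.

    Rough model (assumption): eviction happens after `ping_retries`
    consecutive pings each time out after `ping_timeout_s` seconds.
    """
    if ping_timeout_s <= 0 or ping_retries < 1:
        raise ValueError("ping_timeout_s must be > 0 and ping_retries >= 1")
    return ping_timeout_s * ping_retries


def gc_pause_risks_eviction(gc_pause_s: float, ping_timeout_s: float = 30.0,
                            ping_retries: int = 3) -> bool:
    """True if a stop-the-world pause of gc_pause_s may outlast all retries."""
    return gc_pause_s >= tolerated_pause_seconds(ping_timeout_s, ping_retries)


if __name__ == "__main__":
    print(tolerated_pause_seconds(30.0, 3))  # 90.0
    print(gc_pause_risks_eviction(120.0))    # True with the assumed defaults
    # Raising retries and timeout widens the window, as in workaround (2).
    print(gc_pause_risks_eviction(120.0, ping_timeout_s=60.0, ping_retries=6))  # False
```

Note that widening the window also delays the detection of nodes that have really died, so shortening GC pauses (workaround 1) is the better first step.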
2. Out of memory error
By default, ES does not limit the size of the field data cache. Queries load field values into memory; facet queries in particular need a lot of memory, because they load the values into memory and then sort them. This memory keeps being consumed until it runs out, and when there is not enough RAM an OutOfMemoryError may occur.
Workaround: (1) Set the ES cache type to soft references. An object held only by soft references is reclaimed only when memory runs low, so while memory is sufficient it is usually kept; in addition, all soft references are guaranteed to be cleared before Java throws an OutOfMemoryError. This makes soft references well suited for caches: memory is used as fully as possible without causing an OutOfMemoryError. Add index.cache.field.type: soft to the ES configuration file. (2) Limit the number of cache entries and set an expiration time: index.cache.field.max_size: 50000 sets the maximum number of field cache entries to 50000, and index.cache.field.expire: 10m sets the expiration time to 10 minutes.
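The two settings in workaround (2) describe a size-capped, time-expiring cache. The Python class below is only an analogy to illustrate those semantics, not how ES implements its field data cache; the class name is hypothetical, and measuring expiry from the moment an entry was written is an assumption.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class BoundedExpiringCache:
    """Size-capped, time-expiring cache (illustration only, not ES code).

    max_size plays the role of index.cache.field.max_size and expire_s the
    role of index.cache.field.expire. Entries expire expire_s seconds after
    they were written (an assumption about the expiry semantics).
    """

    def __init__(self, max_size: int, expire_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.expire_s = expire_s
        self._clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (written_at, value)

    def put(self, key: Any, value: Any) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key: Any) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        written_at, value = entry
        if self._clock() - written_at > self.expire_s:
            del self._data[key]  # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    cache = BoundedExpiringCache(max_size=50000, expire_s=600)  # 50000 entries, 10m
    cache.put("field:price", [9.5, 12.0])
    print(cache.get("field:price"))  # [9.5, 12.0]
```

Either limit alone bounds memory differently: max_size caps the worst case, while expire only frees entries that have gone unused for a while.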
3. Unable to create native threads
During ES recovery, this error was reported: RecoverFilesRecoveryException[[index][3] Failed to transfer [215] files with total size of [9.4GB]]; nested: OutOfMemoryError[unable to create new native thread]; ]]
At first I thought it was the file handle limit, but hitting that limit earlier had produced a "too many open files" error instead, and it had already been raised. Reading up on it, I learned that the maximum number of threads a JVM process can create is roughly virtual memory / (stack size * 1024 * 1024), so the larger the virtual memory or the smaller the stack, the more threads can be created. After adjusting this, the error still appeared, even though by that estimate there should have been more than enough threads, so I suspected some other system limit. Searching online, I found it was max user processes, whose default is 1024. Judging by the name, this is the maximum number of processes a user can open, but the official description says it is the maximum number of threads a user can create; since every process has at least one thread, it indirectly limits the number of processes as well. After raising this parameter, the error no longer appeared.
Workaround: (1) Increase the JVM's heap memory or reduce the thread stack size set by -Xss (default 512K). (2) Open /etc/security/limits.d/90-nproc.conf and change the 1024 in the line soft nproc 1024 to a larger value.
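The rule of thumb above, and why max user processes can still be the real ceiling, can be sketched in Python. The function names are hypothetical; the stack size is taken in bytes (a size in MB multiplied by 1024 * 1024), and treating the nproc soft limit as a simple cap on one process is a simplification that ignores threads the same user is already running elsewhere. resource.getrlimit and RLIMIT_NPROC are only available on Unix-like systems.

```python
import resource


def estimate_max_threads(virtual_memory_bytes: int, stack_size_bytes: int) -> int:
    """Rough upper bound from the rule of thumb: virtual memory / stack size."""
    if stack_size_bytes <= 0:
        raise ValueError("stack size must be positive")
    return virtual_memory_bytes // stack_size_bytes


def effective_thread_limit(estimate: int, nproc_soft_limit: int) -> int:
    """The nproc limit (max user processes) also caps threads, so the smaller
    number wins. RLIM_INFINITY means nproc imposes no cap.
    Simplification: ignores threads the user already runs in other processes."""
    if nproc_soft_limit == resource.RLIM_INFINITY:
        return estimate
    return min(estimate, nproc_soft_limit)


if __name__ == "__main__":
    # 8 GiB of address space with a 512 KiB stack (the -Xss default quoted above).
    est = estimate_max_threads(8 * 1024**3, 512 * 1024)
    print(est)  # 16384
    if hasattr(resource, "RLIMIT_NPROC"):  # Linux/BSD only
        soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        print(effective_thread_limit(est, soft))
```

With the default nproc of 1024, the second number drops to 1024 no matter how much memory is available, which matches what was observed.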
4. Error when inserting data while the cluster status is yellow
Error: [7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]
This error occurs when the cluster status is yellow, that is, when some replicas are unassigned. At the time, the number of replicas was set to 2 but there was only one node. When the number of replicas is larger than the number of machines they can be assigned to, inserting data reports the error above. This is because ES uses quorum write consistency by default: the number of active shard copies must be at least (number of replicas / 2 + 1), which here is 2/2 + 1 = 2, so the data must be written to at least two copies. With only one node, only one copy is active, so the data reaches only the primary shard, the replicas cannot be found, and the error above is reported.
Workaround: (1) Reduce the number of replicas so that none are left unassigned. (2) Change the write consistency to one, so that a write only needs to succeed on one copy.
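A small Python sketch of the write consistency check. It assumes the ES 1.x rule counts all shard copies (primary plus replicas), requires copies // 2 + 1 active copies under quorum only when there are more than two copies, and otherwise requires 1. Counting the primary can give a different number than the replicas-only formula above for some replica counts; under this assumption the logged error, with 4 copies and 2 active, fails because 3 are needed. The function names are hypothetical.

```python
def required_active_copies(number_of_replicas: int, consistency: str = "quorum") -> int:
    """Active shard copies needed before a write is accepted (assumed ES 1.x rule)."""
    copies = number_of_replicas + 1  # primary + replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption: with one or two copies, quorum degrades to a single copy.
        return copies // 2 + 1 if copies > 2 else 1
    raise ValueError(f"unknown consistency level: {consistency}")


def write_allowed(number_of_replicas: int, active_copies: int,
                  consistency: str = "quorum") -> bool:
    """True if enough shard copies are active for the given consistency level."""
    return active_copies >= required_active_copies(number_of_replicas, consistency)


if __name__ == "__main__":
    print(write_allowed(2, 1))         # False: 2 replicas, one node
    print(write_allowed(2, 1, "one"))  # True: workaround (2)
```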
5. Warning at startup when the JVM locks memory
With bootstrap.mlockall: true set, ES logs the warning "Unknown mlockall error 0" at startup, because by default Linux limits the memory a process may lock to 45k.
Workaround: set the limit to unlimited with the Linux command ulimit -l unlimited
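Before starting ES with bootstrap.mlockall: true, you can check the limit the process actually inherits. This Python sketch reads RLIMIT_MEMLOCK through the standard resource module (Unix only). The function names are hypothetical, and the rule that the limit must at least cover the heap is a simplification, since mlockall locks all of the process's memory, not just the heap.

```python
import resource


def memlock_limit_description() -> str:
    """Describe the soft RLIMIT_MEMLOCK of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft // 1024} KiB"


def mlockall_likely_ok(soft_limit: int, heap_bytes: int) -> bool:
    """Simplified check: the lock limit must be unlimited or at least as large
    as the heap (other JVM memory is ignored here)."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= heap_bytes


if __name__ == "__main__":
    print("memlock limit:", memlock_limit_description())
```

The limit applies to the shell that launches ES, so ulimit -l unlimited has to be run in that same shell (or set in the limits configuration) before starting the node.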
6. Incorrect use of the API freezes the cluster
This was actually a very basic mistake. A feature needed to update some data and possibly delete some of it, but a colleague implemented the deletion with the deleteByQuery API: the IDs to delete were passed into a bool query, and the documents it matched were deleted. The problem is that a bool query allows at most 1024 clauses, and even 100 clauses is already a lot, so a query like this froze the ES cluster instantly.
Workaround: use a BulkRequest to delete the documents in bulk.
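A minimal Python sketch of the bulk approach: split the IDs into batches and build one _bulk request body per batch, with one delete action line per document, so no single request turns into a huge query. The helper names are hypothetical, the batch size of 500 is an arbitrary assumption, and actually sending the bodies (for example as POST requests to the _bulk endpoint, or via BulkRequest in the Java client) is left out.

```python
import json
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_delete_bodies(index: str, doc_type: str, ids: Iterable[str],
                       batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited _bulk bodies, one delete action per ID."""
    for batch in chunked(ids, batch_size):
        lines = [json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
                 for doc_id in batch]
        yield "\n".join(lines) + "\n"  # a bulk body must end with a newline


if __name__ == "__main__":
    ids = [str(i) for i in range(1200)]
    bodies = list(bulk_delete_bodies("index", "index", ids))
    print([body.count("\n") for body in bodies])  # [500, 500, 200]
```

Each delete is a simple lookup by ID, so the cost grows linearly with the number of documents instead of with the size of a single query.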
7. org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Cause: the JDK versions differ between ES nodes.
Workaround: use the same JDK version on all nodes.
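To spot mismatched JDKs, you can compare the JVM version each node reports. The sketch below assumes the JSON returned by the nodes info API (for example GET _nodes/jvm) has the shape {"nodes": {node_id: {"name": ..., "jvm": {"version": ...}}}}; the helper names and the sample versions are made up for illustration.

```python
from typing import Dict, Set


def jvm_versions(nodes_info: dict) -> Dict[str, Set[str]]:
    """Map each reported JVM version to the names of the nodes running it."""
    versions: Dict[str, Set[str]] = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        version = node.get("jvm", {}).get("version", "unknown")
        # Fall back to the node ID when a node has no name in the response.
        versions.setdefault(version, set()).add(node.get("name", node_id))
    return versions


def jdk_is_uniform(nodes_info: dict) -> bool:
    """True when every node reports the same JVM version."""
    return len(jvm_versions(nodes_info)) <= 1


if __name__ == "__main__":
    sample = {  # made-up response, not real cluster output
        "nodes": {
            "a1": {"name": "node-1", "jvm": {"version": "1.7.0_55"}},
            "b2": {"name": "node-2", "jvm": {"version": "1.8.0_20"}},
        }
    }
    print(jdk_is_uniform(sample))  # False
```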

Reference:
http://stackoverflow.com/questions/344203/maximum-number-of-threads-per-process-in-linux
http://www.elasticsearch.org/guide/reference/setup/installation.html
http://blog.sematext.com/2012/05/17/elasticsearch-cache-usage/
http://www.searchtech.pro/articles/2013/02/15/1360942664366.html

Original article: http://blog.csdn.net/laigood12345/article/details/8193170

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
