Problems and Solutions Encountered When Using Elasticsearch in Production (Continuously Updated)

Source: Internet
Author: User
1. Nodes are removed from the cluster due to GC

This happens because the JVM pauses application threads during GC (a stop-the-world pause). If a node's GC pause lasts too long, the master's pings to that node fail; by default Zen discovery retries a failed ping 3 times, and once all of them fail the master removes the node from the cluster, which causes its shards to be reallocated. Solution: (1) Tune GC to reduce pause times. (2) Increase the number of Zen discovery ping retries (ES parameter: ping_retries) and the ping timeout (ES parameter: ping_timeout). In our case, the root cause turned out to be that the disk on one node was full, which degraded that node's performance.

2. Out of memory error

By default, Elasticsearch does not limit the size of the field data cache. At query time, Elasticsearch loads field values into memory, and facet queries are especially demanding: their results are held and sorted in memory, and memory use keeps growing until it is exhausted. When memory runs out, an out of memory error may occur.

Solution: (1) Set the Elasticsearch field cache type to soft references. A softly referenced object behaves like a strongly referenced one while memory is plentiful and is reclaimed only when memory runs short; the JVM clears all softly reachable objects before it throws an OutOfMemoryError. Soft references are therefore commonly used for caches (of images, for example) that make the most of available memory without causing an OutOfMemoryError. Add the following to the ES configuration file: index.cache.field.type: soft. (2) Limit the number of cached field entries and set a cache expiration time: index.cache.field.max_size: 50000 caps the field cache at 50000 entries, and index.cache.field.expire: 10m expires cached entries after 10 minutes.

3. Unable to create a native thread

During recovery, ES reported the error RecoverFilesRecoveryException[[index][3] failed to transfer [215] files with total size of [9.4 GB]]; nested: OutOfMemoryError[unable to create new native thread]. At the time, the data volume had also grown.

The maximum number of threads a JVM process can create is roughly: virtual memory / (stack size × 1024 × 1024), with the virtual memory in bytes and the stack size in MB. In other words, the more virtual memory or the smaller the thread stack, the more threads can be created. After adjusting these settings, however, the error was still reported. In theory the JVM should then have been able to create enough threads, so the limit was probably imposed by the operating system. Searching online later pointed to the max user processes limit (ulimit -u), whose default value is 1024. Nominally it is the maximum number of processes a user can open, but it actually caps the number of threads a user can create; because every process has at least one thread, it indirectly limits the number of processes as well. After this limit was raised, the error no longer occurred.

Solution: (1) Increase the virtual memory available to the JVM process, or reduce the thread stack size set with -Xss (512 KB by default). (2) Open /etc/security/limits.d/90-nproc.conf and raise the value 1024 in the line soft nproc 1024.

4. Errors when inserting data concurrently while the cluster status is yellow

The error message was:

[7]: index [index], type [index], id [1569133], message [UnavailableShardsException[[index][1] [4] shardIt, [2] active: timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@5989fa07]]

The cluster status was yellow, meaning some replica shards were unassigned: the number of replicas was set to 2, but there was only one node. If you configure more replicas than there are nodes to allocate them to, inserting data may fail with the error above. This is because the default write consistency of ES is quorum: a write needs at least (number of replicas / 2 + 1) active shard copies, which here is 2/2 + 1 = 2. With only one node, only the primary shard is active, so the write reaches a single copy, the replicas cannot be found, and the error above is reported. Solution: (1) Remove the unassigned replicas. (2) Change the write consistency to one, so that a write only needs to succeed on a single copy (the primary).

5. JVM startup warning when memory locking is enabled

With bootstrap.mlockall: true set, ES logs the warning Unknown mlockall error 0 at startup, because by default Linux only lets a process lock 45 KB of memory. Solution: remove the limit with the Linux command ulimit -l unlimited.

6. Incorrect API usage stalls the cluster

This was a very basic mistake. A feature updated some data and could also delete some. For the deletion, a colleague used the delete-by-query API with a bool query containing the IDs of the documents to delete. The problem is that a bool query supports at most 1024 conditions by default, and this query contained a very large number of them. As a result, the query suddenly brought the ES cluster to a halt. Solution: delete in batches with a BulkRequest.
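For item 1, the following Python sketch gives a rough feel for how long a GC pause the master tolerates before it removes a node, and how raising the retries changes that. It assumes a simplified model (every ping during the pause waits for the full timeout, and the node is removed after ping_retries consecutive failures) and assumes the full setting names discovery.zen.fd.ping_timeout and discovery.zen.fd.ping_retries used by Zen discovery in ES 1.x through 6.x, with defaults of 30s and 3. The class and function names are hypothetical, and real fault-detection timing also depends on the ping interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZenFaultDetection:
    """Hypothetical holder for the two Zen fault-detection settings from item 1."""
    ping_timeout_s: int = 30  # assumed default of discovery.zen.fd.ping_timeout
    ping_retries: int = 3     # assumed default of discovery.zen.fd.ping_retries

    def tolerated_pause_s(self) -> int:
        # Rough model: every ping sent during the pause times out, and the node
        # is removed once ping_retries consecutive pings have failed.
        return self.ping_timeout_s * self.ping_retries

    def to_yaml(self) -> str:
        # Render the settings as elasticsearch.yml lines.
        return (
            f"discovery.zen.fd.ping_timeout: {self.ping_timeout_s}s\n"
            f"discovery.zen.fd.ping_retries: {self.ping_retries}\n"
        )


def settings_for_pause(observed_pause_s: int, ping_timeout_s: int = 30) -> ZenFaultDetection:
    """Smallest retry count (never below 3) whose tolerated pause exceeds the observed one."""
    retries = observed_pause_s // ping_timeout_s + 1
    return ZenFaultDetection(ping_timeout_s=ping_timeout_s, ping_retries=max(3, retries))


if __name__ == "__main__":
    print(ZenFaultDetection().tolerated_pause_s())  # 90 with the assumed defaults
    print(settings_for_pause(120).to_yaml(), end="")
```

Raising the retries makes a genuinely dead node take longer to detect, so shortening the pauses themselves (or, as in our case, fixing the full disk) remains the primary fix.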
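For item 2, the effect of the two limits in solution (2) can be illustrated with a small Python cache bounded by entry count (analogous to index.cache.field.max_size) and entry age (analogous to index.cache.field.expire). This is a sketch of the general idea only, not Elasticsearch's field cache implementation: it assumes least-recently-used eviction and measures age from when an entry was stored, and the class name is hypothetical.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedExpiringCache:
    """Hypothetical cache limited by entry count (cf. max_size) and entry age (cf. expire)."""

    def __init__(self, max_size: int, expire_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.expire_s = expire_s
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: "OrderedDict[Hashable, tuple[float, Any]]" = OrderedDict()

    def put(self, key: Hashable, value: Any) -> None:
        # Store (timestamp, value) and mark the key as most recently used.
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        # Evict least recently used entries once the size cap is exceeded.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.expire_s:
            del self._entries[key]  # expired: drop it as if it was never cached
            return None
        self._entries.move_to_end(key)
        return value

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    # Mirrors max_size: 50000 and expire: 10m from the text.
    cache = BoundedExpiringCache(max_size=50000, expire_s=600)
    cache.put("field:price", [1, 2, 3])
    print(cache.get("field:price"))
```

Either limit alone leaves a gap: a size cap without expiry keeps stale values forever, and expiry without a size cap can still exhaust memory during a burst of queries.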
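For item 3, the arithmetic behind the thread-count rule of thumb, and the way the max user processes limit caps it, can be checked with the Python sketch below. It applies the formula as given in the text (virtual memory divided by stack size; the stack size is taken in KB here instead of MB, which is equivalent) and reads the current nproc soft limit with Python's standard resource module on Linux. The function names and the 4 GB figure are illustrative assumptions. Because nproc counts all processes and threads belonging to the user, not just the ES JVM, the result is only an upper bound.

```python
import resource


def estimate_max_threads(virtual_memory_bytes: int, stack_size_kb: int) -> int:
    """Rule of thumb from the text: available virtual memory / per-thread stack size."""
    return virtual_memory_bytes // (stack_size_kb * 1024)


def effective_thread_limit(jvm_estimate: int, nproc_soft_limit: int) -> int:
    """The per-user nproc limit also caps threads, so the lower of the two applies."""
    if nproc_soft_limit == resource.RLIM_INFINITY:
        return jvm_estimate
    return min(jvm_estimate, nproc_soft_limit)


if __name__ == "__main__":
    # Hypothetical figures: 4 GB of virtual memory for thread stacks, -Xss512k.
    estimate = estimate_max_threads(4 * 1024 ** 3, 512)
    soft_nproc, _hard_nproc = resource.getrlimit(resource.RLIMIT_NPROC)
    print("JVM estimate:", estimate)  # 8192
    print("effective limit:", effective_thread_limit(estimate, soft_nproc))
```

With the default nproc of 1024, the effective limit is 1024 no matter how the JVM is tuned, which matches the observation that only raising nproc made the error go away.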
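For item 4, the quorum rule as the text states it can be written as a few lines of Python that check whether a given replica count and node count allow writes. The sketch follows the text's formula (number of replicas / 2 + 1, with integer division) and assumes each node holds at most one copy of a given shard. Exact quorum handling differs between ES versions, and the function names are hypothetical.

```python
def required_copies(write_consistency: str, number_of_replicas: int) -> int:
    """Active shard copies a write needs, per the rule described in item 4."""
    if write_consistency == "one":
        return 1
    if write_consistency == "quorum":
        return number_of_replicas // 2 + 1
    if write_consistency == "all":
        return number_of_replicas + 1  # primary plus every replica
    raise ValueError(f"unknown write consistency: {write_consistency!r}")


def can_write(write_consistency: str, number_of_replicas: int, data_nodes: int) -> bool:
    # A node never holds two copies of the same shard, so at most
    # min(data_nodes, 1 + number_of_replicas) copies can be active.
    active_copies = min(data_nodes, 1 + number_of_replicas)
    return active_copies >= required_copies(write_consistency, number_of_replicas)


if __name__ == "__main__":
    # The situation from the text: 2 replicas on a single node.
    print(required_copies("quorum", 2))  # 2
    print(can_write("quorum", 2, 1))     # False: the write waits, then times out
    print(can_write("one", 2, 1))        # True: solution (2)
    print(can_write("quorum", 0, 1))     # True: solution (1), replicas removed
```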
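For item 5, the sketch below reads the current memory-lock limit with Python's standard resource module, as a quick way to confirm that ulimit -l unlimited took effect in the shell that starts ES. RLIMIT_MEMLOCK is reported in bytes, while ulimit -l shows KB; the function names are hypothetical.

```python
import resource


def format_memlock(soft_limit_bytes: int) -> str:
    """Render a memlock limit the way `ulimit -l` shows it (KB or 'unlimited')."""
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft_limit_bytes // 1024} KB"


def memlock_is_unlimited() -> bool:
    # mlockall locks the process's memory, so a small limit makes it fail;
    # an unlimited soft limit avoids the startup warning from item 5.
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("memlock limit:", format_memlock(soft))
    print("unlimited:", memlock_is_unlimited())
```

Note that ulimit -l only changes the limit for the current shell and the processes it starts, and raising it above the hard limit requires root.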
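For item 6, batched deletion can be sketched by building request bodies for the Elasticsearch REST _bulk endpoint, which takes one delete action per line as newline-delimited JSON. This shows the REST format rather than the Java BulkRequest class named in the text, and includes _type for ES versions that still use mapping types. The function name, the batch size of 1000, and the index/type values are illustrative assumptions; sending each body (a POST to /_bulk) is left to whatever client is in use.

```python
import json
from typing import Iterable, Iterator, List


def bulk_delete_bodies(index: str, doc_type: str, ids: Iterable[str],
                       batch_size: int = 1000) -> Iterator[str]:
    """Yield _bulk request bodies, each deleting at most batch_size documents by ID."""
    batch: List[str] = []
    for doc_id in ids:
        # A delete action is a single metadata line with no source line after it.
        batch.append(json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        if len(batch) == batch_size:
            yield "\n".join(batch) + "\n"  # bulk bodies must end with a newline
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"


if __name__ == "__main__":
    ids = [str(i) for i in range(2500)]
    bodies = list(bulk_delete_bodies("index", "index", ids))
    print(len(bodies))  # 3 requests: 1000 + 1000 + 500 deletes
```

Unlike a single delete-by-query with thousands of ID conditions, each bulk request here stays small and independent, so a failure affects only one batch.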

The content of this page comes from the Internet and does not represent Alibaba Cloud's opinion; products and services mentioned on this page have no relationship with Alibaba Cloud. If any content on this page is confusing, please email us, and we will handle the problem within 5 days of receiving your email.