Loggly: Nine Advanced configuration tips for improving Elasticsearch performance


The Loggly log management service uses Elasticsearch as the search engine behind many of its core functions. In his article "Elasticsearch vs. Solr", Jon Gifford noted that log management places especially heavy demands on search technology. In general, it must be able to:

    • Index reliably at large scale in real time (for us, more than 100,000 log events per second);

    • Handle highly concurrent search requests against the same indices reliably and with high performance.

When we built our Gen2 log management service, we went through the Elasticsearch configuration repeatedly to get the best possible indexing and search performance. Unfortunately, these settings are scattered all over the place and not easy to find in one spot. This article summarizes our experience; you can use the items listed here as a reference for tuning ES in your own application.

Tip 1: Before you begin, understand your deployment topology

Loggly runs ES 0.90.13 with a deployment topology that separates master nodes from data nodes. We won't go into the details here, but we do want to emphasize that you should be very clear about your deployment topology before deciding how to configure anything.

In addition, we interact with the data nodes through ES node clients. This makes the data nodes transparent to our applications, which only care about talking to the node client. To make a node a master node or a data node, you only need to set two properties to true or false. For example, to configure an Elasticsearch node as a data node:

node.master: false
node.data: true

Easy, right? Next we'll discuss some advanced ES properties you may be interested in. In most cases the default configuration is sufficient, but if you want your service to behave like the always-on, high-performance log management service we run, the recommendations below will be useful to you.

Tip 2: mlockall offers the biggest bang for the performance-efficiency buck

Linux divides its physical memory (RAM) into chunks called pages. Swapping is the process by which a page of memory is copied to a preconfigured space on disk, called swap space, to free up that page of memory. The combined size of physical memory and swap space is the amount of virtual memory available.

Swapping has a downside: disk is slow compared to RAM. Memory access times are measured in nanoseconds, while disk access times are measured in milliseconds, so going to disk can be tens of thousands of times slower than going to memory. The more swapping that happens, the slower your process runs, so swapping should be avoided at all costs.

The mlockall property allows the ES node to lock its memory so that it is never swapped out. (Note: this applies to Linux/Unix systems only.) It is set in the YAML config file:

bootstrap.mlockall: true

mlockall defaults to false, meaning the ES node is allowed to swap. Note that after setting this property in the config file, you must restart the ES node. You can verify that the setting took effect by running:

curl http://localhost:9200/_nodes/process?pretty

If you decide to set this property, be sure to reserve enough memory for the ES node via the -DXmx option or ES_HEAP_SIZE.
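As a sketch of that verification step, the following Python snippet parses a `_nodes/process` response and flags nodes where mlockall did not take effect. The response body here is a hypothetical, abridged example; a real response from the 0.90-era `_nodes` API contains many more fields per node.

```python
import json

# Hypothetical, abridged response body from GET /_nodes/process.
sample = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "data-node-1",
      "process": {"mlockall": true}
    }
  }
}
""")

def nodes_without_mlockall(nodes_response):
    """Return names of nodes where bootstrap.mlockall did not take effect."""
    return [
        info.get("name", node_id)
        for node_id, info in nodes_response["nodes"].items()
        if not info.get("process", {}).get("mlockall", False)
    ]

print(nodes_without_mlockall(sample))  # [] when every node is locked in RAM
```

An empty list means every node reported mlockall as active; any names in the list point to nodes that still need a config fix and restart.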

Tip 3: The discovery.zen properties control Elasticsearch's discovery protocol

Elasticsearch uses the Zen discovery protocol to find other nodes in the cluster and establish communication with them. The discovery.zen.* properties together make up the Zen discovery protocol. Both unicast and multicast are valid parts of the protocol:

1. Multicast: nodes in the cluster are discovered by sending one or more multicast requests to all nodes.

2. Unicast: a one-to-one connection between a node and the IP addresses listed in discovery.zen.ping.unicast.hosts.

For unicast to take effect, set discovery.zen.ping.multicast.enabled to false. You also need to provide a set of hosts via discovery.zen.ping.unicast.hosts, which should include the hostnames of the master nodes used for communication.

discovery.zen.minimum_master_nodes sets the minimum number of master-eligible nodes a node must "see" in order to perform cluster operations. It is strongly recommended to set this to a value greater than 1 when there are more than two nodes in the cluster. One way to calculate the value is n/2 + 1, where n is the number of master nodes.
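The n/2 + 1 quorum rule above can be sketched in a few lines of Python (a trivial helper for illustration, not part of Elasticsearch itself):

```python
def minimum_master_nodes(n):
    """Quorum for n master-eligible nodes: floor(n/2) + 1 avoids split-brain."""
    return n // 2 + 1

# With Loggly's three master nodes, the setting of 2 used later in this tip follows:
print(minimum_master_nodes(3))  # 2
```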

Data nodes and master nodes detect each other in two different ways:

    • The master node pings all other nodes in the cluster to verify they are up and running.

    • All other nodes ping the master node to confirm it is up and running; otherwise, an election process must start.

Node detection is controlled by the discovery.zen.fd.ping_timeout property. It defines the maximum time a node waits for a response; the default is 30s. If your network is slow or unreliable, adjust this value accordingly: the slower the network, the higher the value should be, and the higher the value, the less likely the cluster is to report a false failure.

Loggly sets the discovery.zen properties as follows:

discovery.zen.fd.ping_timeout: 30s

discovery.zen.minimum_master_nodes: 2

discovery.zen.ping.multicast.enabled: false

discovery.zen.ping.unicast.hosts: ["esmaster01", "esmaster02", "esmaster03"]

The properties above mean: node detection times out after 30s (discovery.zen.fd.ping_timeout); at least two master nodes must be visible to each node (we have three master nodes); and discovery uses unicast, with the host list esmaster01, esmaster02, esmaster03.

Tip 4: Treat delete_all_indices with caution!

One particularly important point is that the curl API in ES does not have a very robust authentication mechanism built in. A single simple curl command can delete all of your indices, and all of your data will be lost. Here is one such command that could cause an accidental deletion:

curl -XDELETE 'http://localhost:9200/*/'

To avoid this kind of tragedy, you only need to set the following property:

action.disable_delete_all_indices: true

This property ensures that even if the curl command above is executed, the indices will not be deleted; an error is returned instead.

Tip 5: The field data cache can cause very slow facet searches

Here is how the Elasticsearch guide describes the field data cache:

The field data cache is used mainly when sorting on or faceting by a field. It loads all of the field values into memory. Building the field data cache for a field is expensive, so allocate enough memory for the cache and keep it loaded.

Keep in mind that setting this value improperly will cause:

    • Poor facet search and sorting performance

    • Out-of-memory errors on ES nodes if you run facet queries against very large indices

For example:

indices.fielddata.cache.size: 25%

When setting this value, it is important to consider what kinds of facet searches your application will perform.

Tip 6: Optimize index requests

At Loggly, we built our own index management system, because the nature of log management means frequent updates and mapping changes. This system manages the indices of our ES cluster: based on a set of configured policies, it checks when indices need to be created or closed. There are many possible policies. For example, when an index grows past a certain size, or becomes older than a certain age, the index management system closes the old index and creates a new one.
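A minimal sketch of such a policy check might look like the Python below. The threshold values and the function name are hypothetical, chosen only for illustration; Loggly's actual policy values are not given in this article.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds for illustration only.
MAX_INDEX_BYTES = 50 * 1024**3      # roll when the index reaches ~50 GB...
MAX_INDEX_AGE = timedelta(days=1)   # ...or when it is older than one day

def should_roll_index(size_bytes, created_at, now=None):
    """Return True if the old index should be closed and a new one created."""
    now = now or datetime.utcnow()
    too_big = size_bytes >= MAX_INDEX_BYTES
    too_old = (now - created_at) >= MAX_INDEX_AGE
    return too_big or too_old

now = datetime(2014, 1, 2)
print(should_roll_index(10 * 1024**3, datetime(2014, 1, 1), now=now))  # True (too old)
```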


When a node receives an index request to process, it updates its own mapping and sends that update to the master node. While the master is processing it, the node may receive a cluster state that still contains an older version of the mapping. If there is a conflict, it is not actually a problem (i.e. the cluster really does hold the correct mapping), but by default the node sends a refresh-mapping request back to the master just in case. To make index requests more efficient, we set this property on our data nodes:

indices.cluster.send_refresh_mapping: false

Conversely, sending a refresh mapping is more important when, for some reason, the mapping in the master's cluster state conflicts with the actual mapping on the node. In that case, the refresh mapping will log a warning on the master node.

Tip 7: Use Elasticsearch's allocation-related properties

Shard allocation is the process of assigning shards to nodes. It can happen during initial recovery, replica allocation, or rebalancing, as well as when nodes are added or removed.

The cluster.routing.allocation.cluster_concurrent_rebalance property specifies how many shards may be rebalanced concurrently. The right value depends on your hardware, such as the number of CPUs and IO capacity. If this property is set improperly, Elasticsearch indexing performance will suffer.

cluster.routing.allocation.cluster_concurrent_rebalance: 2

The default value is 2, meaning that at most two shards may be moving at any point in time. Keeping this value low reduces shard rebalancing so that it does not interfere with indexing.

Another shard allocation property is cluster.routing.allocation.disk.threshold_enabled. If it is set to true, free disk space is taken into account when allocating shards to nodes.

When enabled, shard allocation takes two watermark properties into account: a low value and a high value.

    • The low watermark defines the disk usage at which ES stops allocating new shards to a node. In the example below, ES stops allocating shards once disk usage reaches 97%.

    • The high watermark defines the disk usage at which shards begin moving off the node (99% in the example below).

cluster.routing.allocation.disk.threshold_enabled: true

cluster.routing.allocation.disk.watermark.low: .97

cluster.routing.allocation.disk.watermark.high: .99
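To make the two-watermark behavior concrete, here is a tiny Python sketch of the decision rule described above. This is an illustration only, not Elasticsearch's actual allocator code, and the function name is invented for this example.

```python
def allocation_decision(disk_usage, low=0.97, high=0.99):
    """Illustrate the two-watermark rule: disk_usage is a fraction (0.0 to 1.0)."""
    if disk_usage >= high:
        return "relocate shards away"        # above the high watermark
    if disk_usage >= low:
        return "stop allocating new shards"  # between low and high watermarks
    return "allocate normally"               # below the low watermark

print(allocation_decision(0.98))  # stop allocating new shards
```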

Tip 8: Set recovery-related properties to shorten restart time

ES includes several recovery-related properties that can improve cluster recovery and restart times. We show a few sample values below. The best values for you depend on your hardware, and the only advice we can give is: test, test, and test again.

This property defines how many shards per node may perform recovery at any time. Recovering shards is an IO-intensive operation, so set this value with care:

cluster.routing.allocation.node_initial_primaries_recoveries: 18

This property controls the number of primary shards initialized concurrently on a single node. The number of parallel streams used to transfer recovering shards from a node to its peer is controlled by the indices.recovery.concurrent_streams property. The value below is tuned for Amazon cloud instances; if you run your own hardware, it may need to be higher. The max_bytes_per_sec property sets how many bytes are transferred per second and also needs to be configured according to your hardware:

indices.recovery.concurrent_streams: 4

indices.recovery.max_bytes_per_sec: 40mb

All of the properties above take effect only after the cluster is restarted.

Tip 9: Use the threadpool properties to prevent data loss

An ES node has several thread pool properties that control the number of threads maintained within the node. At Loggly, we use bulk requests extensively, and we found that setting the right value for the bulk thread pool, via the threadpool.bulk.queue_size property, is crucial to avoiding data loss and bulk retries.

threadpool.bulk.queue_size: 3000

This property is about bulk requests. It specifies how many bulk requests may wait in a node's queue when there is no thread left to process them. The value should be set according to your bulk request load. If your bulk request count exceeds the queue size, you will receive a RemoteTransportException like the one shown below.

Note that in ES, the bulk request queue holds one item per shard involved, so if you send bulk requests whose data items span many shards, this property must be set higher than the number of bulk requests you intend to send at once. For example, if a single bulk request contains data items for 10 shards, then even if you send only one bulk request, the queue size must be at least 10. Setting this value too high eats into your JVM heap, but it lets ES make full use of the queue and keeps your clients unblocked.

Either set the value high enough, or handle the RemoteTransportException properly on the client side; if the exception is not handled, you will lose data. Below we simulate the exception by sending more than 10 bulk requests with the queue size set to 10:

RemoteTransportException[[<bantam>][inet[/192.168.76.1:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity) on org.elasticsearch.action.support.replication.Trans...@13fe9be];
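One way to handle that rejection on the client side is to retry with backoff. The sketch below assumes a generic send_bulk callable that raises an exception whose message contains "EsRejectedExecutionException" when the queue overflows; this interface is hypothetical, invented for illustration, and not part of any official client library.

```python
import time

def bulk_with_retry(send_bulk, actions, max_retries=3, backoff_seconds=1.0):
    """Retry a bulk request when the node's bulk queue rejects it."""
    attempt = 0
    while True:
        try:
            return send_bulk(actions)
        except Exception as exc:
            rejected = "EsRejectedExecutionException" in str(exc)
            if not rejected or attempt >= max_retries:
                raise  # not a queue rejection, or retries exhausted
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
            attempt += 1
```

Dropping the exception silently is what actually loses data, so the sketch re-raises anything it cannot retry.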

Summary: ES's configuration properties are essential to its flexibility and scalability

The deeper we have dug into Elasticsearch's configuration, the bigger the payoff for Loggly, because our use case pushes ES's design parameters to their limits (and sometimes beyond, which we will share in future articles). If the default configuration meets your application's current needs, rest assured that you will have plenty of room to optimize as your application grows.

Original address: https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/

Translator: Rizhibang (ID: rizhibang)
